Problem

After joining a node to the k8s cluster, that node's kube-flannel-ds pod is stuck in Init and its kube-proxy pod keeps showing ContainerCreating.

Background: the image source had already been switched to the Aliyun mirror, yet when the node runs kubeadm join, pulling k8s.gcr.io/pause:3.5 times out.

kubeadm join 10.8.0.1:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:6bca936b75c82c2de910425b3a8f33716ab432590ed7b49a7698f7a9beef6ce2
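For context, switching kubeadm to the Aliyun registry is normally done when the control plane is initialized. The command below is a hypothetical example (the actual init invocation used for this cluster is not shown in the logs); --image-repository is the standard kubeadm flag for substituting a mirror for k8s.gcr.io:

# Hypothetical example; only --image-repository is relevant to this issue.
kubeadm init --image-repository registry.aliyuncs.com/k8sxio \
             --kubernetes-version v1.22.0 \
             --pod-network-cidr 10.244.0.0/16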

kubeadm / kubectl version info

[root@node1 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:37:34Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
[root@node1 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:38:50Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T17:57:25Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

Check the logs

[root@node1 ~]# kubectl get pod -n kube-system
NAME                             READY   STATUS              RESTARTS   AGE
coredns-7568f67dbd-5cfz7         1/1     Running             0          95m
coredns-7568f67dbd-hnhtq         1/1     Running             0          95m
etcd-master                      1/1     Running             0          4h46m
kube-apiserver-master            1/1     Running             0          4h46m
kube-controller-manager-master   1/1     Running             0          4h46m
kube-flannel-ds-bm2pd            0/1     Init:0/2            0          27m
kube-flannel-ds-vp5ln            1/1     Running             0          84m
kube-proxy-lqv8g                 0/1     ContainerCreating   0          27m
kube-proxy-nxtr2                 1/1     Running             0          4h46m
kube-scheduler-master            1/1     Running             0          4h46m
[root@node1 ~]# kubectl describe pod kube-proxy-lqv8g -n kube-system
Name:                 kube-proxy-lqv8g
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 node1/10.8.0.14
Start Time:           Tue, 16 Nov 2021 22:40:11 +0800
Labels:               controller-revision-hash=76774c76cf
                      k8s-app=kube-proxy
                      pod-template-generation=1
Annotations:          <none>
Status:               Pending
IP:                   10.8.0.14
IPs:
  IP:  10.8.0.14
Controlled By:  DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:
    Image:         registry.aliyuncs.com/k8sxio/kube-proxy:v1.22.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/kube-proxy
      --config=/var/lib/kube-proxy/config.conf
      --hostname-override=$(NODE_NAME)
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      NODE_NAME:  (v1:spec.nodeName)
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mczsl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  kube-api-access-mczsl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Normal   Scheduled               9m1s                   default-scheduler  Successfully assigned kube-system/kube-proxy-lqv8g to node1
  Warning  FailedCreatePodSandBox  2m44s (x7 over 8m30s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "k8s.gcr.io/pause:3.5": failed to pull image "k8s.gcr.io/pause:3.5": failed to pull and unpack image "k8s.gcr.io/pause:3.5": failed to resolve reference "k8s.gcr.io/pause:3.5": failed to do request: Head "https://k8s.gcr.io/v2/pause/manifests/3.5": dial tcp 64.233.189.82:443: i/o timeout
  Warning  FailedCreatePodSandBox  30s (x5 over 4m8s)     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "k8s.gcr.io/pause:3.5": failed to pull image "k8s.gcr.io/pause:3.5": failed to pull and unpack image "k8s.gcr.io/pause:3.5": failed to resolve reference "k8s.gcr.io/pause:3.5": failed to do request: Head "https://k8s.gcr.io/v2/pause/manifests/3.5": dial tcp 108.177.125.82:443: i/o timeout
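Both events point at the container runtime failing to pull the sandbox image, not at kubelet itself. To confirm this by hand, one can attempt the same pull directly through containerd's stock CLI (the same ctr tool used in the fix below); on a machine without access to k8s.gcr.io it should fail the same way:

# Expected to time out, showing that the runtime resolves k8s.gcr.io
# rather than the mirror configured on the kubelet side.
ctr -n k8s.io images pull k8s.gcr.io/pause:3.5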

Check the kubelet configuration and environment variables

[root@node1 ~]# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2021-11-16 22:40:05 CST; 40min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 18472 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─18472 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/...

Nov 16 23:07:16 node1 kubelet[18472]: E1116 23:07:16.387147 18472 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.5\": failed...
Nov 16 23:07:16 node1 kubelet[18472]: E1116 23:07:16.387214 18472 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.5\": failed to p...
Nov 16 23:07:16 node1 kubelet[18472]: E1116 23:07:16.387246 18472 kuberuntime_manager.go:815] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.5\": failed to p...
Nov 16 23:07:16 node1 kubelet[18472]: E1116 23:07:16.387315 18472 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-proxy-lqv8g_kube-system(bae0bd89-ffd6-4917-811d-720...ube-system(bae0bd
Nov 16 23:07:30 node1 kubelet[18472]: E1116 23:07:30.380925 18472 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.5\": failed...
Nov 16 23:07:30 node1 kubelet[18472]: E1116 23:07:30.380996 18472 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.5\": failed to p...
Nov 16 23:07:30 node1 kubelet[18472]: E1116 23:07:30.381024 18472 kuberuntime_manager.go:815] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.5\": failed to p...
Nov 16 23:07:30 node1 kubelet[18472]: E1116 23:07:30.381095 18472 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-flannel-ds-bm2pd_kube-system(02cb0dbb-cf25-4f63-904...ds-bm2pd_kube-sys
Nov 16 23:07:57 node1 kubelet[18472]: W1116 23:07:57.733926 18472 manager.go:1176] Failed to process watch event {EventType:0 Name:/system.slice/containerd.service/kubepods-burstable-pod02cb0dbb_cf25_4f63_9041_af845af84298.slice:cr...
Nov 16 23:09:43 node1 kubelet[18472]: W1116 23:09:43.628417 18472 manager.go:1176] Failed to process watch event {EventType:0 Name:/system.slice/containerd.service/kubepods-burstable-pod02cb0dbb_cf25_4f63_9041_af845af84298.slice:cr...
Hint: Some lines were ellipsized, use -l to show in full.

# Inspect the kubelet drop-in configuration
[root@node1 ~]# cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

# Inspect the kubeadm-generated environment file
[root@node1 ~]# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --pod-infra-container-image=registry.aliyuncs.com/k8sxio/pause:3.5"

From the configuration, the pause image has clearly been set to registry.aliyuncs.com/k8sxio/pause:3.5, yet k8s.gcr.io/pause:3.5 is still being pulled. Is this a bug? Not quite: when kubelet uses a remote runtime such as containerd, the --pod-infra-container-image flag only exempts the pause image from kubelet's image garbage collection. The sandbox image that actually gets pulled is chosen by containerd itself, whose CRI plugin defaults its sandbox_image setting to k8s.gcr.io/pause:3.5.
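A cleaner, longer-term fix is therefore to point containerd itself at the mirror. This is a sketch, assuming containerd reads its default config at /etc/containerd/config.toml and that the CRI plugin section already contains a sandbox_image line:

# Assumption: default config path and an existing sandbox_image entry.
sed -i 's#sandbox_image = "k8s.gcr.io/pause:3.5"#sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.5"#' /etc/containerd/config.toml
systemctl restart containerd

The workaround actually used below instead retags the mirrored image locally, so containerd's lookup of k8s.gcr.io/pause:3.5 succeeds without any network access.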

Solution

Pull the pause image from the Aliyun mirror on the node yourself, then tag it as k8s.gcr.io/pause:3.5:

[root@node1 ~]# ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
registry.aliyuncs.com/k8sxio/pause:3.5: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:369201a612f7b2b585a8e6ca99f77a36bcdbd032463d815388a96800b63ef2c8: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:019d8da33d911d9baabe58ad63dea2107ed15115cca0fc27fc0f627e82a695c1: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:ed210e3e4a5bae1237f1bb44d72a05a2f1e5c6bfe7a7e73da179e2534269c459: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.9 s total: 4.8 Ki (5.3 KiB/s)
unpacking linux/amd64 sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07...
done: 65.565619ms
[root@node1 ~]# ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5
k8s.gcr.io/pause:3.5
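Optionally, verify that both references now resolve to the same local image (ctr images ls is part of the standard containerd CLI):

# Both tags should show the same digest.
ctr -n k8s.io images ls | grep pause

With the tag in place, check the pods again: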
[root@node1 ~]# kubectl get pod -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
coredns-7568f67dbd-5cfz7         1/1     Running   0          98m
coredns-7568f67dbd-hnhtq         1/1     Running   0          98m
etcd-master                      1/1     Running   0          4h50m
kube-apiserver-master            1/1     Running   0          4h50m
kube-controller-manager-master   1/1     Running   0          4h50m
kube-flannel-ds-bm2pd            1/1     Running   0          30m
kube-flannel-ds-vp5ln            1/1     Running   0          88m
kube-proxy-lqv8g                 1/1     Running   0          30m
kube-proxy-nxtr2                 1/1     Running   0          4h50m
kube-scheduler-master            1/1     Running   0          4h50m
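As a final check (command only; output omitted here), the joined node should now report Ready when queried from the master:

# The new node should show STATUS Ready once flannel is up.
kubectl get nodes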