Checking the error log

After installing flannel, the coredns pods failed to come back up with the error below. Inspecting the events with kubectl describe pod xxx shows a CNI network conflict on the corresponding node.

[root@master1 ~]# kubectl describe pod coredns-7568f67dbd-5cfz7 -n kube-system
Name: coredns-7568f67dbd-5cfz7
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master/10.0.16.13
Start Time: Tue, 16 Nov 2021 21:32:13 +0800
Labels: k8s-app=kube-dns
pod-template-hash=7568f67dbd
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-7568f67dbd
Containers:
coredns:
Container ID:
Image: registry.aliyuncs.com/k8sxio/coredns:v1.8.4
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nsq95 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-nsq95:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 65s default-scheduler Successfully assigned kube-system/coredns-7568f67dbd-5cfz7 to master
Warning FailedCreatePodSandBox 65s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "71141a403941f008a1b38c52d1ebcfbabe6e3cdabbcca7c70705b3fc2cfb2bac": failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24
Warning FailedCreatePodSandBox 53s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "6a5e7790ced985e85e86884617ccbd45035a9b9a90df25abd41935517944e9e3": failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24
Warning FailedCreatePodSandBox 38s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ed58c187e565cdaf4f290ad16cdd951f27d71b6c840eaa4f4d2f4dea4c2607e5": failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24
Warning FailedCreatePodSandBox 26s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c4c8776f595aae106e4cd18544a4a27fc95aa495d0cc1f173dc2759f55a16394": failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24
Warning FailedCreatePodSandBox 12s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7624f0f631f8aa8ee70f3bbdc32b83a9c7b317ae45d87b2961a1e37b7f4249da": failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24
Warning FailedCreatePodSandBox 1s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "983f12b8967893877220c56dab792704338a9bdba1efca1a5e3924191045b122": failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24

Investigating

Checking the cni0 interface on the failing node shows its address is 10.88.0.1/16, which clearly does not match the 10.244.0.1/24 flannel expects in the error above. (10.88.0.0/16 happens to be the default subnet of containerd's built-in CNI bridge config, so the bridge was most likely created before flannel was installed.)

[root@master1 ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

[root@master1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 52:54:00:9c:d3:b7 brd ff:ff:ff:ff:ff:ff
inet 10.0.16.13/22 brd 10.0.19.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe9c:d3b7/64 scope link
valid_lft forever preferred_lft forever
8: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
link/none
inet 10.8.0.1 peer 10.8.0.2/32 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::f497:602f:fb17:45a8/64 scope link flags 800
valid_lft forever preferred_lft forever
9: cni0: <NO-CARRIER,BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 86:a0:2d:45:a2:6b brd ff:ff:ff:ff:ff:ff
inet 10.88.0.1/16 brd 10.88.255.255 scope global cni0
valid_lft forever preferred_lft forever
inet6 2001:4860:4860::1/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::84a0:2dff:fe45:a26b/64 scope link
valid_lft forever preferred_lft forever
12: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 32:3c:2a:3e:11:e8 brd ff:ff:ff:ff:ff:ff
13: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 6e:61:19:99:ff:ef brd ff:ff:ff:ff:ff:ff
inet 10.96.0.10/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.0.1/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
14: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 4a:0f:6d:58:fe:29 brd ff:ff:ff:ff:ff:ff
inet 10.244.0.0/32 brd 10.244.0.0 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::480f:6dff:fe58:fe29/64 scope link
valid_lft forever preferred_lft forever
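
The comparison above can be scripted. A minimal sketch, assuming flannel's default /run/flannel/subnet.env path (the live check is guarded so the script is a no-op on a machine without flannel):

```shell
#!/bin/sh
# Sketch: report whether cni0's address matches the subnet flannel leased.
subnet_mismatch() {
  # $1 = expected address (FLANNEL_SUBNET), $2 = address currently on cni0
  if [ "$1" = "$2" ]; then
    echo "ok"
  else
    echo "mismatch"
  fi
}

# Live check, guarded so the sketch is harmless off-cluster:
if [ -r /run/flannel/subnet.env ]; then
  expected=$(. /run/flannel/subnet.env; echo "$FLANNEL_SUBNET")
  actual=$(ip -4 -o addr show cni0 2>/dev/null | awk '{print $4}')
  subnet_mismatch "$expected" "$actual"
fi
```

On the node above this prints "mismatch", since 10.88.0.1/16 differs from 10.244.0.1/24.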

Fixing the problem

We could reassign the bridge address to 10.244.0.1, but it is simpler to delete the misconfigured interface and let it be recreated automatically. Taking the delete-and-recreate approach: first bring the interface down, then delete it.

ifconfig cni0 down    
ip link delete cni0
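
On newer hosts the legacy net-tools package (which provides ifconfig) may not be installed; the same teardown works with iproute2 alone. A sketch, guarded so it only acts when a cni0 bridge exists (requires root):

```shell
#!/bin/sh
# iproute2-only equivalent of the ifconfig/ip pair above.
down_and_delete() {
  # Bring the link down, then remove it; kubelet/flannel recreate cni0
  # with the correct subnet on the next pod sandbox creation.
  ip link set "$1" down && ip link delete "$1"
}

# Only attempt it when the bridge actually exists:
if ip link show cni0 >/dev/null 2>&1; then
  down_and_delete cni0
fi
```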

Then check the recreated cni0 interface on the node; it is regenerated from flannel's network configuration (the FLANNEL_SUBNET seen earlier).

[root@master1 ~]# ifconfig cni0
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.1 netmask 255.255.255.0 broadcast 10.244.0.255
inet6 fe80::d483:eff:fe7f:a866 prefixlen 64 scopeid 0x20<link>
ether d6:83:0e:7f:a8:66 txqueuelen 1000 (Ethernet)
RX packets 2470 bytes 199331 (194.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2369 bytes 250600 (244.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Cleanup

If you run into other problems during cluster installation, the following commands reset the node:

kubeadm reset
ifconfig cni0 down && ip link delete cni0
ifconfig flannel.1 down && ip link delete flannel.1
rm -rf /var/lib/cni
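
Note that kubeadm reset does not touch the CNI config directory, so a stale config there (for example containerd's default 10.88.0.0/16 bridge config) can recreate the same conflict on the next install. A small cleanup helper, as a sketch; /etc/cni/net.d is the conventional location, adjust for your runtime:

```shell
#!/bin/sh
# Remove stale CNI config files so the runtime regenerates them on restart.
clean_cni_conf() {
  dir="${1:-/etc/cni/net.d}"
  if [ -d "$dir" ]; then
    rm -rf "$dir"/*
    echo "cleaned $dir"
  fi
}

# Typical usage after kubeadm reset (then restart the runtime and kubelet):
#   clean_cni_conf /etc/cni/net.d
#   systemctl restart containerd kubelet
```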