From 1e8153c2af051ce48d5aa08d3dbdc0d0970ea532 Mon Sep 17 00:00:00 2001
From: "Suren A. Chilingaryan"
Date: Wed, 22 Jan 2020 03:16:06 +0100
Subject: Document another problem with lost IPs and exhausting of SDN IP range

---
 docs/troubleshooting.txt | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

(limited to 'docs/troubleshooting.txt')

diff --git a/docs/troubleshooting.txt b/docs/troubleshooting.txt
index fd57150..1f52fe9 100644
--- a/docs/troubleshooting.txt
+++ b/docs/troubleshooting.txt
@@ -132,7 +132,7 @@ etcd (and general operability)
     certificate verification code which introduced in etcd 3.2. There are multiple bug repports on
     the issue.
 
-pods (failed pods, rogue namespaces, etc...)
+pods: very slow scheduling (normal start time in the seconds range), failed pods, rogue namespaces, etc...
 ====
  - OpenShift has numerous problems with clean-up resources after the pods. The problems are more likely
  to happen on the heavily loaded systems: cpu, io, interrputs, etc.
@@ -151,6 +151,8 @@ pods (failed pods, rogue namespaces, etc...)
     * Apart from overloaded nodes (max cpu%, io, interrupts), PLEG issues can be caused by
         1. Excessive amount of resident docker images on the node (see bellow)
         2. This can cause and will be further amplified by the spurious interfaces on OpenVSwich (see bellow)
+        3. Another side effect is exhausting IPs in the pod network on the node, as they are also not cleaned up properly (see below).
+        As IPs get exhausted, scheduling penalties also rise and at some point pods will fail to schedule (but will still be displayed as Ready)
         x. Nuanced issues between kubelet, docker, logging, networking and so on, with remediation of the issue sometimes being brutal (restarting all nodes etc, depending on the case).
             https://github.com/kubernetes/kubernetes/issues/45419#issuecomment-496818225
     * The problem is not bound to CronJobs, but having regular scheduled jobs make it presence significantly more visible.
@@ -194,6 +196,24 @@ pods (failed pods, rogue namespaces, etc...)
         https://bugzilla.redhat.com/show_bug.cgi?id=1518684
         https://bugzilla.redhat.com/show_bug.cgi?id=1518912
 
+ - Another related problem causing long delays in pod scheduling is indicated by lines containing
+ "failed to run CNI IPAM ADD: no IP addresses available in network" in the system logs:
+        Jan 21 14:43:01 ipekatrin2 origin-node: E0121 14:43:01.066719 93115 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "kdb-server-testing-180-build_katrin" network: CNI request failed with status 400: 'failed to run IPAM for 4b56e403e2757d38dca67831ce09e10bc3b3f442b6699c20dcd89556763e2d5d: failed to run CNI IPAM ADD: no IP addresses available in network: openshift-sdn
+        Jan 21 14:43:01 ipekatrin2 origin-node: E0121 14:43:01.068021 93115 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kdb-server-testing-180-build_katrin(65640902-3bd6-11ea-bbd6-0cc47adef0e6)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "kdb-server-testing-180-build_katrin" network: CNI request failed with status 400: 'failed to run IPAM for 4b56e403e2757d38dca67831ce09e10bc3b3f442b6699c20dcd89556763e2d5d: failed to run CNI IPAM ADD: no IP addresses available in network: openshift-sdn
+    * The problem is that OpenShift (due to "ovs" problems or something else) fails to clean up the network interfaces. You can check the currently
+    allocated IPs in:
+        /var/lib/cni/networks/openshift-sdn
+    * This can be cleaned up (but better not from cron, as I don't know what happens if the IP is already assigned but the docker container is not
+    running yet). Anyway, it is better to first disable scheduling on the node (and maybe even evict all running pods):
+        oc adm manage-node --schedulable=false
+        oc adm manage-node --evacuate
+        for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do if [ -z $(docker ps -a | grep $hash | awk '{print $1}') ]; then grep -ilr $hash ./; fi; done | xargs rm
+    After this, the origin-node service should be restarted and scheduling can be re-enabled:
+        systemctl restart origin-node
+        oc adm manage-node --schedulable=true
+    * It doesn't seem to be directly triggered by the lost ovs interfaces (more interfaces are lost than IPs). So, it is not possible to release
+    IPs one by one (a commented equivalent of the cleanup one-liner above is sketched after the patch).
+
  - After crashes / upgrades some pods may end up in 'Error' state. This is quite often happen to
     * kube-service-catalog/controller-manager
     * openshift-template-service-broker/api-server
@@ -322,7 +342,7 @@ MySQL
    the load).
         SHOW PROCESSLIST;
    The remedy is to restart slave MySQL with 'slave_parallel_workers=0', give it a time to go, and then
-   restart back in the standard multithreading mode. This can be achieved by editing 'statefulset/mysql-slave-0'
+   restart back in the standard multithreading mode. This can be achieved by editing 'statefulset/mysql-slave'
    and setting environmental vairable 'MYSQL_SLAVE_WORKERS' to 0 and, then, back to original value (16 currently).
 
  - This could be not end of this. The execution of statments from the log could 'stuck' because of the some "bad"
@@ -344,12 +364,12 @@ MySQL
             SET @@SESSION.GTID_NEXT='4ab8feff-5272-11e8-9320-08002715584a:201840'
         This is the gtid of the next transaction.
    * So, the following commands should be executed on the slave MySQL server (see details, https://www.thegeekdiary.com/how-to-skip-a-transaction-on-mysql-replication-slave-when-gtids-are-enabled/)
-        SLAVE STOP;
+        STOP SLAVE;
         SET @@SESSION.GTID_NEXT='';
         BEGIN; COMMIT;
         SET GTID_NEXT='AUTOMATIC';
-        SLAVE START;
+        START SLAVE;
    * It is also possible to review the stuck transaction on the slave mysql node. In the '/var/lib/mysql/data' run
         mysqlbinlog --start-position=
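A commented equivalent of the stale-IP cleanup one-liner from the patch above, to be run on the affected node. This is only a sketch: the path and the docker-based liveness check come from the patch itself, while the assumption that each file under /var/lib/cni/networks/openshift-sdn is named after an allocated IP and stores the ID of the owning container (and the IPv4 filename filter) is mine:

    #!/bin/bash
    # Release IP allocations whose owning container no longer exists.
    cd /var/lib/cni/networks/openshift-sdn || exit 1
    for f in *; do
        # only look at files named like IPv4 addresses, skip IPAM bookkeeping files
        [[ $f =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] || continue
        # the file is assumed to contain the ID of the container holding this IP;
        # the first 8 characters mirror the 'cut -c 1-8' of the original one-liner
        id=$(head -c 8 "$f")
        # keep the allocation if a container (running or stopped) still matches the ID
        if ! docker ps -a --no-trunc -q | grep -q "^${id}"; then
            echo "releasing $f (container ${id}... is gone)"
            rm -f "$f"
        fi
    done

As in the patch, disable scheduling on the node before running it and restart origin-node afterwards.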
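For the MYSQL_SLAVE_WORKERS remedy in the MySQL section, the environment variable can also be flipped with 'oc set env' instead of editing the statefulset by hand. A sketch, assuming the statefulset is named 'mysql-slave' and 16 is the usual value, both as stated in the text above:

    # switch the slave to single-threaded replication until it catches up
    oc set env statefulset/mysql-slave MYSQL_SLAVE_WORKERS=0
    # then restore the original multi-threaded setting
    oc set env statefulset/mysql-slave MYSQL_SLAVE_WORKERS=16

Each change updates the pod template and therefore restarts the slave pod, which is how the new value takes effect.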