Update monitoring scripts to track leftover OpenVSwitch 'veth' interfaces and clean them up periodically to avoid performance degradation, split kickstart

author: Suren A. Chilingaryan <csa@suren.me> 2018-07-05 06:29:09 +0200
committer: Suren A. Chilingaryan <csa@suren.me> 2018-07-05 06:29:09 +0200
commit: 2c3f1522274c09f7cfdb6309adc0719f05c188e9 (patch)
tree: e54e0c26f581543f48e945f186734e4bd9a8f15a /docs
parent: 8af0865a3a3ef783b36016c17598adc9d932981d (diff)
download: ands-2c3f1522274c09f7cfdb6309adc0719f05c188e9.tar.gz
ands-2c3f1522274c09f7cfdb6309adc0719f05c188e9.tar.bz2
ands-2c3f1522274c09f7cfdb6309adc0719f05c188e9.tar.xz
ands-2c3f1522274c09f7cfdb6309adc0719f05c188e9.zip
6 files changed, 434 insertions, 2 deletions
diff --git a/docs/consistency.txt b/docs/consistency.txt
index caaaf36..dcf311a 100644
--- a/docs/consistency.txt
+++ b/docs/consistency.txt
@@ -39,7 +39,17 @@ Networking
  
  - Ensure, we don't have override of cluster_name to first master (which we do during the
  provisioning of OpenShift plays)
- 
+
+ - Sometimes OpenShift fails to clean up properly after a terminated pod. This causes rogue
+ network interfaces to remain in the OpenVSwitch fabric. This can be detected by errors like:
+    could not open network device vethb9de241f (No such device)
+ reported by 'ovs-vsctl show' or present in the log '/var/log/openvswitch/ovs-vswitchd.log',
+ which may quickly grow beyond 100MB. If the number of rogue interfaces grows too large,
+ pod scheduling will start to time out on the affected node.
+  * The work-around is to delete rogue interfaces with
+    ovs-vsctl del-port br0 <iface>
+ This does not solve the problem, however. New interfaces will still get abandoned by OpenShift.
+
 
 ADEI
 ====
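Note on docs/consistency.txt: a minimal sketch of how the rogue interfaces could be listed, assuming the error text always has the form quoted in the hunk above. The function name `list_rogue_ifaces` is hypothetical and is not taken from the monitoring scripts in this commit.

```shell
#!/bin/bash
# Hypothetical helper: read text from stdin and print the unique interface
# names found in "could not open network device <iface> (No such device)".
list_rogue_ifaces() {
    sed -n 's/.*could not open network device \([^ ]*\) (No such device).*/\1/p' | sort -u
}

# Example usage (needs root on the affected node):
#   ovs-vsctl show | list_rogue_ifaces
#   list_rogue_ifaces < /var/log/openvswitch/ovs-vswitchd.log
```

Because the helper only reads stdin, it can be tried on a copy of the log before it is pointed at a live node.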
diff --git a/docs/kickstart.txt b/docs/kickstart.txt
index 1331542..b94b0f6 100644
--- a/docs/kickstart.txt
+++ b/docs/kickstart.txt
@@ -11,4 +11,14 @@ Troubleshooting
         dmsetup remove_all
         dmsetup remove <name>
         
- 
-\ No newline at end of file
+  - Sometimes even this does not help.
+    > On CentOS 7.4 mdadm does not recognize the disk, but LVM thinks it is
+    part of MD. Then cleaning the last megabytes of the former md partition may help.
+    > On Fedora 28, mdadm detects the old array and tries to tear it down, but
+    fails as the RAID array is already inactive.
+    
+    * If the RAID is still more or less healthy, it can be destroyed with
+        mdadm --zero-superblock /dev/sdb3
+    * Otherwise:
+        dd if=/dev/zero of=/dev/sda4 bs=512 seek=$(( $(blockdev --getsz /dev/sda4) - 1024 )) count=1024
+
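Note on docs/kickstart.txt: `blockdev --getsz` reports the size in 512-byte sectors, so the `dd` command in the hunk above zeroes the last 1024 sectors (512 KiB) of the partition. The sketch below applies the same arithmetic to a regular file, such as a disk image, so it can be tried without touching a real device. The function name `wipe_tail` is hypothetical, and the size is taken with `wc -c` because `blockdev` only works on block devices.

```shell
#!/bin/bash
# Hypothetical helper: zero the last SECTORS 512-byte sectors of a regular file.
# Usage: wipe_tail FILE [SECTORS]
wipe_tail() {
    local target=$1 sectors=${2:-1024}
    local size total
    size=$(wc -c < "$target")        # file size in bytes
    total=$(( size / 512 ))          # same unit as 'blockdev --getsz'
    if [ "$total" -le "$sectors" ]; then
        echo "file is not larger than $sectors sectors" >&2
        return 1
    fi
    # conv=notrunc keeps dd from truncating the regular file before writing.
    dd if=/dev/zero of="$target" bs=512 seek=$(( total - sectors )) \
       count="$sectors" conv=notrunc 2>/dev/null
}
```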
diff --git a/docs/logs.txt b/docs/logs.txt
new file mode 100644
index 0000000..e27b1ff
--- /dev/null
+++ b/docs/logs.txt
@@ -0,0 +1,36 @@
+/var/log/messages
+=================
+ - Various RPC errors. 
+    ... rpc error: code = # desc = xxx ...
+ 
+ - container kill failed because of 'container not found' or 'no such process': Cannot kill container ###: rpc error: code = 2 desc = no such process
+    Despite the error, the containers are actually killed and the pods destroyed. However, this error likely triggers
+    the problem with rogue interfaces staying on the OpenVSwitch bridge.
+
+ - containerd: unable to save f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a starttime: read /proc/81994/stat: no such process
+   containerd: f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a (pid 81994) has become an orphan, killing it
+    Seems to be a bug in docker 1.12* which is resolved in 1.13.0rc2. No side effects according to the issue.
+        https://github.com/moby/moby/issues/28336
+
+ - W0625 03:49:34.231471   36511 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": Unexpected command output nsenter: cannot open /proc/63586/ns/net: No such file or directory
+ - W0630 21:40:20.978177    5552 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "..."
+    Probably covered by the following bug report and can accordingly be ignored.
+        https://bugzilla.redhat.com/show_bug.cgi?id=1434950 
+
+ - E0630 14:05:40.304042    5552 glusterfs.go:148] glusterfs: failed to get endpoints adei-cfg[an empty namespace may not be set when a resource name is provided]
+   E0630 14:05:40.304062    5552 reconciler.go:367] Could not construct volume information: MountVolume.NewMounter failed for volume "kubernetes.io/glusterfs/4
+    I guess this is some configuration issue. It can probably be ignored.
+
+ - kernel: SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
+    There are no adverse effects to this.  It is a potential kernel issue, but should be just ignored by the customer.  Nothing is going to break.
+        https://bugzilla.redhat.com/show_bug.cgi?id=1425278
+
+
+ - E0625 03:59:52.438970   23953 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
+    This seems fine and can be ignored.
+
+    
+/var/log/openvswitch/ovs-vswitchd.log
+=====================================
+ - bridge|WARN|could not open network device veth7d33a20f (No such device)
+    Indicates a pod-cleanup failure and may cause problems during pod scheduling.
diff --git a/docs/problems.txt b/docs/problems.txt
new file mode 100644
index 0000000..4be9dc7
--- /dev/null
+++ b/docs/problems.txt
@@ -0,0 +1,103 @@
+Actions Required
+================
+ * The long-term solution to 'rogue' interfaces is unclear. It may require an update to OpenShift 3.9 or later.
+ However, the proposed work-around should suffice unless the execution rate grows significantly.
+ * All other problems found in logs can be ignored.
+ 
+
+Rogue network interfaces on OpenVSwitch bridge
+==============================================
+ Sometimes OpenShift fails to clean up properly after a terminated pod. The actual reason is unclear.
+  * The issue is discussed here:
+        https://bugzilla.redhat.com/show_bug.cgi?id=1518684
+  * It can be detected by inspecting the output of:
+    ovs-vsctl show
+
+ Problems:
+  * As the number of rogue interfaces grows, it starts to have an impact on performance. Operations with
+  ovs slow down and at some point the pods scheduled to the affected node fail to start due to
+  timeouts. This is indicated in 'oc describe' as: 'failed to create pod sandbox'
+
+ Cause:
+  * Unclear, but it seems the periodic ADEI cron jobs cause the issue.
+  * Could be related to the 'container kill failed' problem explained in the section below.
+     Cannot kill container ###: rpc error: code = 2 desc = no such process
+
+         
+ Solutions:
+  * According to RedHat, the temporary solution is to reboot the affected node (not tested yet). The problem
+  should go away, but may reappear after a while.
+  * The simplest work-around is to just remove the rogue interfaces. They will reappear, but performance
+  problems only start after hundreds accumulate.
+    ovs-vsctl del-port br0 <iface>
+  
+ Status:
+   * A cron job is installed which cleans up rogue interfaces once their number reaches 25.
+
+
+Orphaning / pod termination problems in the logs
+================================================
+ There are several classes of problems with unknown repercussions reported in the system log. Currently, I
+ don't see any negative side effects, except that some of these issues may trigger the "rogue interfaces" problem.
+
+ ! container kill failed because of 'container not found' or 'no such process': Cannot kill container ###: rpc error: code = 2 desc = no such process
+
+   Despite the error, the containers are actually killed and the pods destroyed. However, this error likely triggers
+   the problem with rogue interfaces staying on the OpenVSwitch bridge.
+    
+  Scenario:
+    * happens with short-lived containers
+
+ - containerd: unable to save f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a starttime: read /proc/81994/stat: no such process
+   containerd: f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a (pid 81994) has become an orphan, killing it
+    
+  Scenario:
+    This happens every couple of minutes and is attributed to perfectly alive and running pods.
+    * For instance, ipekatrin1 was complaining about some ADEI pod.
+    * After I removed this pod, it immediately started complaining about the 'glusterfs' replica.
+    * If the 'glusterfs' pod is re-created, the problem persists.
+    * It seems only a single pod is affected at any given moment (at least this was always true
+    on ipekatrin1 & ipekatrin2 while I was researching the problem)
+    
+  Relations:
+    * This problem is not aligned with the 'container not found' problem above. That one happens with short-lived containers which
+    actually get destroyed. This one is triggered for persistent containers which keep running. In fact, this problem is triggered
+    significantly more frequently.
+
+  Cause:
+    * Seems related to docker health checks due to a bug in docker 1.12* which is resolved in 1.13.0rc2
+        https://github.com/moby/moby/issues/28336
+        
+  Problems:
+    * According to the discussion in the issue, the only effect seems to be excessive logging.
+
+  Solution: Ignore for now
+    * docker-1.13 had some problems with groups (I don't remember exactly) and it was decided not to run it with the current version of KaaS.
+    * Only update docker after extensive testing on the development cluster, or not at all.
+
+ - W0625 03:49:34.231471   36511 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": Unexpected command output nsenter: cannot open /proc/63586/ns/net: No such file or directory
+ - W0630 21:40:20.978177    5552 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "..."
+  Scenario:
+    * It seems this can be ignored, see the RH bug.
+    * Happens with short-lived containers (adei cron jobs)
+
+  Relations:
+    * This is also not aligned with 'container not found'. The timestamps in the logs differ significantly.
+    * It is also not aligned with the 'orphan' problem.
+
+  Cause:
+    ? https://bugzilla.redhat.com/show_bug.cgi?id=1434950 
+
+ - E0630 14:05:40.304042    5552 glusterfs.go:148] glusterfs: failed to get endpoints adei-cfg[an empty namespace may not be set when a resource name is provided]
+   E0630 14:05:40.304062    5552 reconciler.go:367] Could not construct volume information: MountVolume.NewMounter failed for volume "kubernetes.io/glusterfs/4
+
+    I guess this is some configuration issue. It can probably be ignored.
+
+  Scenario:
+    * Reported on long-running pods with persistent volumes (katrin, adai-db)
+    * Also seems to be an unrelated set of problems.
+
+
+
+
+
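Note on docs/problems.txt: the status entry mentions a cron job that cleans up rogue interfaces once their number reaches 25. The sketch below shows one way such a threshold check could look. It is an illustration under assumptions, not the script installed by this commit: the function name `clean_rogue_ifaces` and the `OVS_VSCTL` override variable are hypothetical, and the parsing relies on the error text quoted in the docs.

```shell
#!/bin/bash
# Hypothetical threshold-based cleanup of rogue OpenVSwitch interfaces.
# OVS_VSCTL may be overridden, e.g. OVS_VSCTL="echo ovs-vsctl" for a dry run
# of the del-port calls (the 'show' call must still return real output).
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}

# Usage: clean_rogue_ifaces [THRESHOLD] [BRIDGE]
clean_rogue_ifaces() {
    local threshold=${1:-25} bridge=${2:-br0}
    local ifaces count iface
    # Collect unique interface names from the "No such device" errors.
    ifaces=$($OVS_VSCTL show |
        sed -n 's/.*could not open network device \([^ ]*\) (No such device).*/\1/p' |
        sort -u)
    count=$(printf '%s' "$ifaces" | grep -c .)
    # Do nothing until enough rogue interfaces have accumulated.
    if [ "$count" -lt "$threshold" ]; then
        return 0
    fi
    for iface in $ifaces; do
        $OVS_VSCTL del-port "$bridge" "$iface"
    done
}
```

Calling `clean_rogue_ifaces 25 br0` from cron would match the threshold and bridge name used in the docs.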
diff --git a/docs/projects/katrindb.txt b/docs/projects/katrindb.txt
new file mode 100644
index 0000000..0a14a25
--- /dev/null
+++ b/docs/projects/katrindb.txt
@@ -0,0 +1,255 @@
+# Steps to setup KDB infrastructure in OpenShift
+
+Web interface: https://kaas.kit.edu:8443/console/
+
+Commandline interface:
+```
+oc login kaas.kit.edu:8443
+oc project katrin
+```
+
+
+## Overview
+
+The setup uses (at least) three containers:
+* `kdb-backend` is a MySQL/MariaDB container that provides the database backend
+  used by KDB server. It hosts the `katrin` and `katrin_run` databases.
+* `kdb-server` runs the KDB server process inside an Apache environment. It
+  provides the web interface (`kdb-admin.fcgi`) and the KaLi service
+  (`kdb-kali.fcgi`).
+* `run-processing` periodically retrieves run files from several DAQ machines
+  and adds the processed files to the KDB runlist. This process could be
+  distributed over several containers for the individual systems (`fpd` etc.)
+
+> The ADEI server hosting the `adei` MySQL database runs in an independent project with hostname `mysql.adei.svc`.
+
+A persistent storage volume is needed for the MySQL data (volume group `db`)
+and for the copied/processed run files (volume group `katrin`). The latter one
+is shared between the KDB server and run processing applications.
+
+
+## MySQL backend
+
+### Application
+
+This container is based on the official Red Hat MariaDB Docker image. The 
+OpenShift application is created via the CLI:
+```
+oc new-app -e MYSQL_ROOT_PASSWORD=XXX --name=kdb-backend registry.access.redhat.com/rhscl/mariadb-101-rhel7
+```
+Because KDB uses two databases (`katrin`, `katrin_run`) and must be permitted
+to create/edit database users, it is required to define a root password here.
+
+### Volumes
+
+This container needs a persistent storage volume for the database content. In
+OpenShift this is done by removing the default storage and adding a persistent
+volume `kdb-backend` for MySQL data: `db: /kdb/mysql/data -> /var/lib/mysql/data`
+
+### Final steps
+
+It makes sense to add readiness/liveness probes as well: TCP socket, port 3306.
+
+> It is possible to access the MySQL server inside a container: `mysql -h kdb-backend.katrin.svc -u root -p -A`
+
+
+## KDB server
+
+### Application
+
+The container is created from a `Dockerfile` available in GitLab:
+https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles/tree/kdbserver
+
+The app is created via the CLI, but manual changes are necessary later on:
+```
+oc new-app https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles.git --name=kdb-server
+```
+
+> The build fails because the branch name and user credentials are not defined.
+
+The build settings must be adapted before the image can be created.
+* Set the git branch name to `kdbserver`.
+* Add a source secret `katrin-gitlab` that provides the git user credentials,
+  i.e. the `katrin` username and corresponding password for read-only access.
+
+When a container instance (pod) is created in OpenShift, the main script
+`/run-httpd.sh` starts the Apache webserver with the KDB fastcgi module.
+
+### Volumes
+
+Just like the MySQL backend, the container needs persistent storage enabled: `katrin: /data -> /mnt/katrin/data`
+
+### Config Maps
+
+Some default configuration files for the Apache web server and the KDB server
+installation are provided with the Dockerfile. The webserver config should 
+work correctly as it is. The main config must be updated so that the correct
+servers/databases are used. A config map `kdbserver-config` is created with
+mountpoint `/config` in the container:
+* `kdbserver.conf` is the main config for the KDB server instance. For the
+  steps outlined here, it should contain the following entries:
+
+```
+sql_server        = kdb-backend.katrin.svc
+sql_adei_server   = mysql.adei.svc
+
+sql_katrin_dbname = katrin
+sql_run_dbname    = katrin_run
+sql_adei_dbname   = adei_katrin
+
+sql_user          = root
+sql_password      = XXX
+sql_adei_user     = katrin
+sql_adei_password = XXX
+
+use_adei_cache    = true
+adei_service_url  = http://adei-katrin.kaas.kit.edu/adei
+adei_public_url   = http://katrin.kit.edu/adei-katrin
+```
+* `log4cxx.properties` defines the terminal/logfile output settings. By default,
+  all log output is shown on `stdout` (and visible in the OpenShift log).
+
+> Files in `/config` are symlinked to the respective files inside the container by `/run-httpd.sh`.
+
+### Database setup
+
+The KDB server sources provide a SQL dump file to initialize the database. To
+create an empty database with all necessary tables, run the `mysql` command:
+```
+mysql -h kdb-backend.katrin.svc -u root -p < /src/kdbserver/Data/katrin-db.sql
+```
+
+Alternatively, a full backup of the existing database can be imported:
+```
+tar -xJf /src/kdbserver/Data/katrin-db-bkp.sql.xz -C /tmp
+mysql -h kdb-backend.katrin.svc -u root -p < /tmp/katrin-db-bkp.sql
+```
+
+> To clean a database table, execute a MySQL `drop table` statement and re-initialize the dropped tables from the `katrin-db.sql` file.
+
+### IDLE storage
+
+IDLE provides a local storage on the server-side file system. An empty IDLE
+repository with default datasets is created by executing this command:
+```
+/opt/kasper/bin/idle SetupPublicDatasets
+```
+
+This creates a directory `.../storage/idle/KatrinIdle` on the storage volume
+that can be filled with contents from a backup archive. The `oc rsync` command
+allows transferring files to a running container (pod) in OpenShift.
+
+> After restoring one should fix all permissions so that KDB can access the data.
+
+
+
+### Final steps
+
+Again a readiness/liveness probe can be added: TCP socket, port 80.
+
+To make the KDB server interface accessible to the outside, a route must be
+added in OpenShift: `http://kdb.kaas.kit.edu -> kdb-server:80`
+
+> The web interface is now available at http://kdb.kaas.kit.edu/kdb-admin.fcgi
+
+
+## Run processing
+
+### Application
+
+The setup for the run processing service is similar to the KDB server, with
+the container being created from a GitLab `Dockerfile` as well:
+https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles/tree/inlineprocessing
+The app is created via the CLI, but manual changes are necessary later on:
+```
+oc new-app https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles.git --name=run-processing
+```
+
+> The build fails because the branch name and user credentials are not defined.
+
+The build settings must be adapted before the image can be created.
+* Set the git branch name to `inlineprocessing`.
+* Use the source secret `katrin-gitlab` that was created before.
+
+#### Run environment
+
+When a container instance (pod) is created in OpenShift, the main script
+`/run-loop.sh` starts the main processing script `process-system.py`. It
+is executed in a continuous loop with a user-defined delay. The script
+is configured by the following environment variables that can be defined
+in the OpenShift configuration:
+* `PROCESS_SYSTEMS` defines one or more DAQ systems configured in the file
+  `ProcessingConfig.py`: `fpd`, `mos`, etc.
+* `PROCESS_FLAGS` defines additional options passed to the script, e.g.
+  `--pull` to automatically retrieve run files from configured DAQ machines.
+* `REFRESH_INTERVAL` defines the waiting time between consecutive executions.
+  Note that the `/run-loop.sh` script waits until `process-system.py` has finished
+  before the next loop iteration is started, so the delay time is always
+  included regardless of how long the script takes to process all files.
+
+### Volumes
+
+The run processing stores files that need to be accessible by the KDB server
+application. Hence, the same persistent volume is used in this container:
+`katrin: /data -> /mnt/katrin/data`
+
+To ensure that all processes can read/write correctly, the file permissions are
+relaxed (this can be done in an OpenShift terminal or remote shell):
+```
+mkdir -p /mnt/katrin/data/{inbox,archive,storage,workspace,logs,tmp}
+chown -R katrin: /mnt/katrin/data
+chmod -R ug+rw /mnt/katrin/data
+```
+
+### Config Maps
+
+Just like with the KDB server, a config map `run-processing-config` with
+mountpoint `/config` should be added, which defines the configuration of the
+processing script:
+* `ProcessingConfig.py` is the main config where the DAQ machines are defined
+  with their respective storage paths. The file also defines a list of
+  processing steps to be executed for each run file; these steps may have
+  to be adapted where necessary.
+* `datamanager.cfg` defines the interface to the KaLi web service. It must be
+  configured so that the KDB server instance from above is used:
+
+```
+url = http://kdb-server.katrin.svc/kdb-kali.fcgi
+user = katrin
+password = XXX
+timeout_seconds = 300
+cache_age_hours = -1
+```
+* `rsync-filter` is applied with the `rsync` command that copies run files
+  from the DAQ machines. It can be adapted to exclude certain directories,
+  e.g. old run files that do not need to be processed.
+* `log4cxx.properties` configures terminal/logfile output, see above.
+
+> Files in `/config` are symlinked to the respective files inside the container by `/run-loop.sh`.
+
+#### SSH keys
+
+A second config map `run-processing-ssh` is required to provide SSH keys that
+are used to authenticate remote connections to the DAQ machines. The map with
+mountpoint `/.ssh` should contain the files `id_dsa`, `id_dsa.pub` and
+`known_hosts` and must be adapted as necessary.
+
+> This assumes that the SSH credentials have been added to the respective machines beforehand!
+
+> The contents of `known_hosts` should be updated with the output of `ssh-keyscan` for the configured DAQ machines.
+
+### Notes
+
+The script `/run-loop.sh` pulls files from the DAQ machines and processes
+them automatically, newest first. Where necessary, run files can be copied
+manually (FPD example; adapt the options and `rsync-filter` file as required):
+```
+rsync -rltD --verbose --append-verify --partial --stats --compare-dest=/mnt/katrin/data/archive/FPDComm_530 --filter='. /opt/processing/system/rsync-filter' --log-file='/mnt/katrin/data/logs/rsync_fpd.log' katrin@192.168.110.76:/Volumes/DAQSTORAGE/data/ /mnt/katrin/data/inbox/FPDComm_530
+```
+
+If runs were not processed correctly, one can trigger manual reprocessing
+from an OpenShift terminal (with run numbers `START`, `END` as necessary):
+```
+./process-system.py -s fpd -r START END
+```
+
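Note on docs/projects/katrindb.txt: the "Run environment" section describes `/run-loop.sh` as running `process-system.py` in a loop and always waiting `REFRESH_INTERVAL` between runs. The sketch below illustrates that behaviour only; the real script may differ. It assumes that `PROCESS_SYSTEMS` is a space-separated list, that each system is processed with `-s <system>`, and that the interval is given in seconds. `PROCESS_CMD`, `MAX_ITERATIONS` and the default interval of 60 are hypothetical additions for illustration and testing.

```shell
#!/bin/bash
# Hypothetical sketch of the processing loop; not the actual /run-loop.sh.
run_loop() {
    local interval=${REFRESH_INTERVAL:-60}
    local max=${MAX_ITERATIONS:-0}      # 0 means loop forever
    local i=0 system
    while :; do
        # Each call blocks until processing has finished, so the delay
        # below is always added on top of the processing time.
        for system in $PROCESS_SYSTEMS; do
            ${PROCESS_CMD:-./process-system.py} -s "$system" $PROCESS_FLAGS
        done
        i=$(( i + 1 ))
        if [ "$max" -gt 0 ] && [ "$i" -ge "$max" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

With `PROCESS_SYSTEMS="fpd mos"` and `PROCESS_FLAGS="--pull"`, each iteration would call the processing script once per system before sleeping.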
diff --git a/docs/troubleshooting.txt b/docs/troubleshooting.txt
index ae43c52..9fa6f91 100644
--- a/docs/troubleshooting.txt
+++ b/docs/troubleshooting.txt
@@ -134,6 +134,22 @@ etcd (and general operability)
  
 pods (failed pods, rogue namespaces, etc...)
 ====
+ - Pod scheduling may fail on one (or more) of the nodes after a long wait, with 'oc logs' reporting
+ a timeout. 'oc describe' reports 'failed to create pod sandbox'. This can be caused by a failure to clean up
+ properly after a terminated pod, which leaves rogue network interfaces in the OpenVSwitch fabric.
+  * This can be detected by errors reported by 'ovs-vsctl show' or present in the log '/var/log/openvswitch/ovs-vswitchd.log',
+    which may quickly grow beyond 100MB.
+        could not open network device vethb9de241f (No such device)
+  * The work-around is to delete rogue interfaces with 
+        ovs-vsctl del-port br0 <iface>
+    More info:
+        ovs-ofctl -O OpenFlow13 show br0
+        ovs-ofctl -O OpenFlow13 dump-flows br0
+    This does not solve the problem, however. New interfaces will still get abandoned by OpenShift.
+  * The issue is discussed here:
+        https://bugzilla.redhat.com/show_bug.cgi?id=1518684
+        https://bugzilla.redhat.com/show_bug.cgi?id=1518912
+        
  - After crashes / upgrades some pods may end up in 'Error' state. This is quite often happen to
     * kube-service-catalog/controller-manager
     * openshift-template-service-broker/api-server
@@ -185,6 +201,8 @@ pods (failed pods, rogue namespaces, etc...)
                 docker ps -aq --no-trunc | xargs docker rm
 
 
+
+
 Builds
 ======
  - After changing storage for integrated docker registry, it may refuse builds with HTTP error 500. It is necessary