| author | Suren A. Chilingaryan <csa@suren.me> | 2025-12-09 16:14:26 +0000 |
|---|---|---|
| committer | Suren A. Chilingaryan <csa@suren.me> | 2025-12-09 16:14:26 +0000 |
| commit | 77aa9c433f9255d713394e3b25987fa2b4a03a1a (patch) | |
| tree | ddc5d87bf838bd589f36b43b53955ad8207796a2 | |
| parent | d35216ee0cbf9f1a84a6d4151daf870b1ff00395 (diff) | |
| download | ands-77aa9c433f9255d713394e3b25987fa2b4a03a1a.tar.gz ands-77aa9c433f9255d713394e3b25987fa2b4a03a1a.tar.bz2 ands-77aa9c433f9255d713394e3b25987fa2b4a03a1a.tar.xz ands-77aa9c433f9255d713394e3b25987fa2b4a03a1a.zip | |
20 files changed, 652 insertions, 66 deletions
diff --git a/group_vars/all.yml b/group_vars/all.yml
index aef2251..b3a805d 100644
--- a/group_vars/all.yml
+++ b/group_vars/all.yml
@@ -1,4 +1,4 @@
 ansible_ssh_user: root
-ansible_ssh_private_key_file: /home/csa/.ssh/id_dsa
+ansible_ssh_private_key_file: /home/csa/.ssh/id_rsa
 glusterfs_version: 312
diff --git a/inventories/production.erb b/inventories/production.erb
index 575a86f..edd92c3 100644
--- a/inventories/production.erb
+++ b/inventories/production.erb
@@ -1,7 +1,9 @@
 [masters]
+#ipekatrin2.ipe.kit.edu
 ipekatrin[1:2].ipe.kit.edu
 
 [etcd]
+#ipekatrin2.ipe.kit.edu
 ipekatrin[1:3].ipe.kit.edu
 
 [simple_storage_nodes]
@@ -1,81 +1,49 @@
- System
- -------
- 2025.09.28
-  - ipekatrin1:
-     * Raid controller doesn't see 10 disks and behaves erratically.
-     * Turned off the server and ordered a replacement.
-  - Storage:
-     * Restarted degraded GlusterFS nodes and made them work on the remaining 2 nodes (1 replica + metadata for most of our storage needs).
-     * Turned out the 'database' volume was created in Raid-0 mode and was used as backend for the KDB database. So, the data is gone.
-     * Recovered the KDB database from backups and moved it to the glusterfs/openshift volume. Nothing is left on the 'database' volume. Can be turned off.
-
- 2025.10.27
-  - ipekatrin1:
-     * Disconnected all disks from the server and started preparing it as an application node
-  - Software:
-     * I have temporarily suspended all ADEI cronJobs to avoid resource contention on ipekatrin2 (as a restart would be dangerous now) [clean (logs,etc.)/maintain (re-caching,etc.)/update (detecting new databases)]
-  - Research:
-     * DaemonSet/GlusterFS selects nodes based on the following nodeSelector
-         $ oc -n glusterfs get ds glusterfs-storage -o yaml | grep -B 5 -A 5 nodeSelector
-           nodeSelector:
-             glusterfs: storage-host
-       All nodes have corresponding labels in their metadata:
-         $ oc get node/ipekatrin1.ipe.kit.edu --show-labels -o yaml | grep -A 20 labels:
-           labels:
-             ...
-             glusterfs: storage-host
-             ...
-     * That is now removed from ipekatrin1 and should be restored if we bring storage back
-         oc label --dry-run node/ipekatrin1.ipe.kit.edu glusterfs-
-     * We further need to remove 192.168.12.1 from 'endpoints/gfs' (per namespace) to avoid possible problems.
-     * On ipekatrin1, /etc/fstab glusterfs mounts should be changed from 'localhost' to some other server (or commented out altogether). GlusterFS mounts
-       should be changed from localhost to
-            192.168.12.2,192.168.12.3:<vol> /mnt/vol glusterfs defaults,_netdev 0 0
-     * All raid volumes should also be temporarily commented out in /etc/fstab
-     * Further configuration changes are required to run the node without glusterfs while causing no damage to the rest of the system
-       GlusterFS might be referenced via: /etc/hosts, /etc/fstab, /etc/systemd/system/*.mount /etc/auto.*, scripts/cron
-       endpoints (per namespace), inline gluster volumes in PV (global),
-       gluster-block endpoints / tcmu gateway list, sc (heketi storageclass) and controllers (ds,deploy,sts); just in case check heketi cm/secrets),
-  - Plan:
-     * Prepare application node [double-check before implementing]
-        > Adjust /etc/fstab and check systemd based mounts. Shall we do something with hosts?
-        > Check/change cron & monitoring scripts
-        > Adjust node label and edit 'gfs' endpoints in all namespaces.
-        > Check glusterblock/heketi, strange PVs.
-        > Google the other possible culprits listed above.
-        > Boot ipekatrin1 and check that all is fine
-     * cronJobs
-        > Set affinity to ipekatrin1.
-        > Restart cronJobs (maybe reduce intervals)
-     * ToDo
-        > Ideally, eliminate cronJobs altogether for the rest of the KaaS1 lifetime and replace them with a continuously running cron daemon inside a container
-        > Rebuild ipekatrinbackupserv1 as a new gluster node (using its disks) and try connecting it to the cluster
-
 Hardware
 --------
 2024
  - ipekatrin1: Replaced disk in section 9. LSI software reports all is OK, but the hardware LED indicates an error (red). Probably the indicator is broken.
 
 2025.09 (early month)
-  - ipekatrin1: Replaced 3 disks (don't remember slots). Two of them had already been replaced once.
+  - ipekatrin2: Replaced 3 disks (don't remember slots). Two of them had already been replaced once.
  - Ordered spare disks
 
 2025.10.23
-  - ipekatrin1:
-     * Replaced RAID controller. Made an attempt to rebuild, but disks are disconnected after about 30-40 minutes (recovered after shutoff, not reboot)
-     * Checked power issues: cabling bypassing the PSU and monitoring voltages (12V system should not go below 11.9V). No change, voltages seemed fine.
-     * Checked cabling issues by disconnecting first one cable and then another (supported mode, a single cable connects all disks). No change
-     * Tried to improve cooling, setting fan speeds to maximum (kept) and even temporarily installing an external cooler. Radiators were cool, also checked reported temperatures. No change, still goes down in 30-40 minutes.
-     * Suspect backplane problems. The radiators were quite hot before adjusting cooling. Seems to be a known stability problem caused by bad signal management in the firmware if overheated. Firmware updates are suggested to stabilize it.
-     * No support by SuperMicro. Queried Tootlec about the possibility of getting a firmware update and/or ordering a backplane [Order RG_014523_001_Chilingaryan from 16.12.2016, Angebot 14.10, Contract: 28.11]
-       Hardware: Chassis CSE-846BE2C-R1K28B, Backplane BPN-SAS3-846EL2, 2x MCX353A-FCB ConnectX-3 VPI
-     * KATRINBackupServ1 (3 years older) has a backplane with enough bays to mount the disks. We still need to be able to fit the Raid-card and Mellanox ConnectX-3 board/boards with 2 ports (can live with 1).
 - ipekatrin2: Noticed and cleared RAID alarm attributed to the battery subsystem.
    * No apparent problems at the moment. Temperatures are all in order. Battery reports healthy. System works as usual.
-    * Set up temperature monitoring of the RAID card, currently 76-77C
-
+
+ 2025.09.28 - 2025.11.03
+  - ipekatrin1: Raid controller failed. The system was not running stably after replacement (disks disconnect after 20-30m of operation)
+  - ipekatrin1: Temporarily converted into a master-only node (apps scheduling disabled, glusterfs stopped)
+  - ipekatrin1: New disks (from ipekatrinbackupserv1) were assembled in the RAID, assembled in gluster, and manual (file walk-through) healing
+    is running. Expected to take about 2-3 weeks (about 2TB per day rate). No LVM configured, direct mount.
+  - The application node will be recovered once we replace the system SSDs with larger ones (as there is currently no space for images/containers)
+    and I don't want to put it on the new RAID.
+  - The original disks from ipekatrin1 are assembled in ipekatrinbackupserv1. The disconnect problem persists, as some disks stop answering
+    SENSE queries and the backplane restarts a whole bunch of 10 disks. Anyway, all disks are accessible in JBOD mode and can be copied.
+     * The XFS fs is severely damaged and needs repairs. I tried accessing some files via the xfs debugger and it worked. So, directory structure
+       and file content are, at least partially, good and repair should be possible.
+     * If recovery becomes necessary: buy 24 new disks, copy one-by-one, assemble in RAID, recover the FS.
+
+ 2025.12.08
+  - Copied the ipekatrin1 system SSDs to new 4TB drives and reinstalled them in the server (only 2TB is used due to MBR limitations)
+
 Software
 --------
 2023.06.13
  - Instructed MySQL slave to ignore 1062 errors as well (I have skipped a few manually, but errors appeared non-stop)
  - Also ADEI-KATRIN pod got stuck. Pod was running, but apache was stuck and not replying. This caused POD state to report 'not-ready' but for some reason it was still 'live' and pod was not restarted.
+
+ 2025.09.28
+  - Restarted degraded GlusterFS nodes and made them work on the remaining 2 nodes (1 replica + metadata for most of our storage needs).
+  - Turned out the 'database' volume was created in Raid-0 mode and was used as backend for the KDB database. So, the data is gone.
+  - Recovered the KDB database from backups and moved it to the glusterfs/openshift volume. Nothing is left on the 'database' volume. Can be turned off.
+
+ 2025.09.28 - 2025.11.03
+  - GlusterFS endpoints temporarily changed to use only ipekatrin2 (see details in the dedicated logs)
+  - Heketi and gluster-blockd were disabled and will not be available further. Existing heketi volumes preserved.
+
+ 2025.12.09
+  - Re-enabled scheduling on ipekatrin1.
+  - Manually ran 'adei-clean' on katrin & darwin, but kept the 'cron' scripts stopped for now.
+  - Restored configs: fstab restored, */gfs endpoints. Heketi/gluster-block stays disabled. No other system changes.
+  - ToDo: Re-enable the 'cron' scripts if we decide to keep the system running in parallel with KaaS2.
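For reference, a quick way to see which addresses each namespace's 'gfs' endpoint currently lists (a minimal sketch, assuming the per-namespace 'gfs' endpoint name used throughout these logs):

    for ns in $(oc get ns -o name | cut -d/ -f2); do
        ips=$(oc -n "$ns" get ep gfs -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null) || continue
        echo "$ns: $ips"
    done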
diff --git a/logs/filter.sh b/logs/2019.09.26/filter.sh
index 675fb90..675fb90 100755
--- a/logs/filter.sh
+++ b/logs/2019.09.26/filter.sh
diff --git a/logs/filters.txt b/logs/2019.09.26/filters.txt
index daf2bab..daf2bab 100644
--- a/logs/filters.txt
+++ b/logs/2019.09.26/filters.txt
diff --git a/logs/2019.09.26/messages.ipekatrin2 b/logs/2019.09.26/logs/messages.ipekatrin2
index 6374da7..6374da7 100644
--- a/logs/2019.09.26/messages.ipekatrin2
+++ b/logs/2019.09.26/logs/messages.ipekatrin2
diff --git a/logs/2019.09.26/messages.ipekatrin3 b/logs/2019.09.26/logs/messages.ipekatrin3
index d497fc6..d497fc6 100644
--- a/logs/2019.09.26/messages.ipekatrin3
+++ b/logs/2019.09.26/logs/messages.ipekatrin3
diff --git a/logs/2025.11.03.storage-log.txt b/logs/2025.11.03.storage-log.txt
new file mode 100644
index 0000000..a95dc57
--- /dev/null
+++ b/logs/2025.11.03.storage-log.txt
@@ -0,0 +1,140 @@
+Status
+======
+ - Raid controller failed on ipekatrin1
+ - The system was not running stably after replacement (disks disconnect after 20-30m of operation)
+ - ipekatrin1 was temporarily converted into a master-only node (apps scheduling disabled, glusterfs stopped)
+ - Heketi and gluster-blockd were disabled and will not be available further. Existing heketi volumes preserved.
+ - New disks (from ipekatrinbackupserv1) were assembled in the RAID, assembled in gluster, and manual (file walk-through) healing
+   is running. Expected to take about 2-3 weeks (about 2TB per day rate). No LVM configured, direct mount.
+ - The application node will be recovered once we replace the system SSDs with larger ones (as there is currently no space for images/containers)
+   and I don't want to put it on the new RAID.
+
+Recovery Logs
+====
+ 2025.09.28
+ - ipekatrin1:
+    * Raid controller doesn't see 10 disks and behaves erratically.
+    * Turned off the server and ordered a replacement.
+ - Storage:
+    * Restarted degraded GlusterFS nodes and made them work on the remaining 2 nodes (1 replica + metadata for most of our storage needs).
+    * Turned out the 'database' volume was created in Raid-0 mode and was used as backend for the KDB database. So, the data is gone.
+    * Recovered the KDB database from backups and moved it to the glusterfs/openshift volume. Nothing is left on the 'database' volume. Can be turned off.
+
+ 2025.10.23
+ - ipekatrin1:
+    * Replaced RAID controller. Made an attempt to rebuild, but disks are disconnected after about 30-40 minutes (recovered after shutoff, not reboot)
+    * Checked power issues: cabling bypassing the PSU and monitoring voltages (12V system should not go below 11.9V). No change, voltages seemed fine.
+    * Checked cabling issues by disconnecting first one cable and then another (supported mode, a single cable connects all disks). No change
+    * Tried to improve cooling, setting fan speeds to maximum (kept) and even temporarily installing an external cooler. Radiators were cool, also checked reported temperatures. No change, still goes down in 30-40 minutes.
+    * Suspect backplane problems. The radiators were quite hot before adjusting cooling. Seems to be a known stability problem caused by bad signal management in the firmware if overheated. Firmware updates are suggested to stabilize it.
+    * No support by SuperMicro. Queried Tootlec about the possibility of getting a firmware update and/or ordering a backplane [Order RG_014523_001_Chilingaryan from 16.12.2016, Angebot 14.10, Contract: 28.11]
+      Hardware: Chassis CSE-846BE2C-R1K28B, Backplane BPN-SAS3-846EL2, 2x MCX353A-FCB ConnectX-3 VPI
+    * KATRINBackupServ1 (3 years older) has a backplane with enough bays to mount the disks. We still need to be able to fit the Raid-card and Mellanox ConnectX-3 board/boards with 2 ports (can live with 1).
+ - ipekatrin2: Noticed and cleared RAID alarm attributed to the battery subsystem.
+    * No apparent problems at the moment. Temperatures are all in order. Battery reports healthy. System works as usual.
+    * Set up temperature monitoring of the RAID card, currently 76-77C
+
+ 2025.10.27
+ - ipekatrin1:
+    * Disconnected all disks from the server and started preparing it as an application node
+ - Software:
+    * I have temporarily suspended all ADEI cronJobs to avoid resource contention on ipekatrin2 (as a restart would be dangerous now) [clean (logs,etc.)/maintain (re-caching,etc.)/update (detecting new databases)]
+ - Research:
+    * DaemonSet/GlusterFS selects nodes based on the following nodeSelector
+        $ oc -n glusterfs get ds glusterfs-storage -o yaml | grep -B 5 -A 5 nodeSelector
+          nodeSelector:
+            glusterfs: storage-host
+      All nodes have corresponding labels in their metadata:
+        $ oc get node/ipekatrin1.ipe.kit.edu --show-labels -o yaml | grep -A 20 labels:
+          labels:
+            ...
+            glusterfs: storage-host
+            ...
+    * That is now removed from ipekatrin1 and should be restored if we bring storage back
+        oc label --dry-run node/ipekatrin1.ipe.kit.edu glusterfs-
+    * We further need to remove 192.168.12.1 from 'endpoints/gfs' (per namespace) to avoid possible problems.
+    * On ipekatrin1, /etc/fstab glusterfs mounts should be changed from 'localhost' to some other server (or commented out altogether). GlusterFS mounts
+      should be changed from localhost to (or probably just 12.2, as it is the only host containing data and going via an intermediary makes no sense)
+           192.168.12.2,192.168.12.3:<vol> /mnt/vol glusterfs defaults,_netdev 0 0
+    * All raid volumes should also be temporarily commented out in /etc/fstab and systemd
+        systemctl list-units --type=mount | grep gluster
+    * Further configuration changes are required to run the node without glusterfs while causing no damage to the rest of the system
+      GlusterFS might be referenced via: /etc/hosts, /etc/fstab, /etc/systemd/system/*.mount /etc/auto.*, scripts/cron
+      endpoints (per namespace), inline gluster volumes in PV (global),
+      gluster-block endpoints / tcmu gateway list, sc (heketi storageclass) and controllers (ds,deploy,sts); just in case check heketi cm/secrets),
+ - Plan:
+    * Prepare application node [double-check before implementing]
+       + Adjust node label
+       + Edit 'gfs' endpoints in all namespaces.
+       + Check glusterblock/heketi, strange PVs.
+       + Check Ands monitoring & maintenance scripts
+       + Adjust /etc/fstab and check systemd based mounts. Shall we do something with hosts?
+       + /etc/nfs-ganesha on ipekatrin1 & ipekatrin2
+       + Check/change cron & monitoring scripts
+       + Check for backup scripts, it is probably written on the raid controller.
+       + Grep in OpenShift configs (and /etc globally) just in case
+       + Google the other possible culprits listed above.
+       + Boot ipekatrin1 and check that all is fine
+    * cronJobs
+       > Set affinity to ipekatrin1.
+       > Restart cronJobs (maybe reduce intervals)
+    * copy cluster backups out
+    * ToDo
+       > Ideally, eliminate cronJobs altogether for the rest of the KaaS1 lifetime and replace them with a continuously running cron daemon inside a container
+       > Rebuild ipekatrinbackupserv1 as a new gluster node (using its disks) and try connecting it to the cluster
+
+ 2025.10.28-31
+ - Hardware
+    * Re-assembled the ipekatrin1 disks in the ipekatrinbackupserv1 backplane using a new LSI 9361-8i raid controller. Original LSI 9271-8i removed.
+    * Put the old (SAS2) disks from ipekatrinbackupserv1 into ipekatrin1. Imported RAID configs, RAID started and seems to work stably with the SAS2 setup.
+ - Software
+    * Removed glusterfs & fat_storage labels from the ipekatrin1.ipe.kit.edu node
+        oc label node/ipekatrin1.ipe.kit.edu glusterfs-
+        oc label node/ipekatrin1.ipe.kit.edu fat_storage-
+    * Identified all endpoints used in PVs. No PV hardcodes IPs directly (and it seems unsupported anyway)
+      Edited endpoints: gfs glusterfs-dynamic-etcd glusterfs-dynamic-metrics-cassandra-1 glusterfs-dynamic-mongodb glusterfs-dynamic-registry-claim glusterfs-dynamic-sharelatex-docker
+    * Verified that no glusterblock devices are used by pods or outside (no iscsi devices). Checked that the heketi storageClass can be safely disabled without affecting existing volumes
+      Terminated heketi/glusterblock services, removed storageclasses
+    * Checked ands-distributed scripts & crons. None refer to gluster. Monitoring checks the raid status, but this is probably not critical as it would just report an error (which is true)
+    * Set nfsganesha cluster nodes to andstorage2 only on ipekatrin1/2 (no active server on ipekatrin3). The service is inactive at the moment
+      Anyway, double-check that it is disabled on ipekatrin1 on the first boot
+    * Found an active 'block' volume in glusterfs. Checked that it is empty and not used by any active 'pv'. Stopped and deleted.
+    * Backup is done on /mnt/provision, which should work in the new configuration. So, no changes are needed.
+    * Mount points adjusted.
+ - First Boot:
+    * Disable nfs-ganesha on the first boot of ipekatrin1
+    * Verified that glusterfs is not started and gluster mounts are healthy
+    * etcd is running and seems healthy
+        ETCDCTL_API=3 /usr/bin/etcdctl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --endpoints https://`hostname`:2379 member list
+        curl -v --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt -s https://192.168.13.1:2379/v2/stats/self
+    * origin-master-api and origin-master-controllers are running
+    * origin-node and docker failed. /var/lib/docker is on the raid (mounted /var/lib/docker, but used via an lvm thin pool).
+    * Created /var/lib/docker-local for now and configured docker to use overlay2 in /etc/sysconfig/docker-storage
+        DOCKER_STORAGE_OPTIONS="--storage-driver=overlay2 --graph=/var/lib/docker-local"
+    * Adjusted selinux contexts
+        semanage fcontext -a -e /var/lib/docker /var/lib/docker-local
+        restorecon -R -v /var/lib/docker-local
+    * Infrastructure pods are running on ipekatrin1
+    * Check that status and monitoring scripts are working [seems reasonable to me]
+       > Raid is not optimal and low data space is reported (/mnt/ands is not mounted)
+       > Docker is not reporting available Data/Metadata space (as we are on a local folder)
+    * Check that /var/lib/docker-local space usage is monitored
+       > Via data space usage
+ - Problems
+    * We have '*-host' pvs bound to /mnt/hostdisk which are used by adei/mysql (nodes 2&3) and as the katrin temporary data folder. Currently keeping node1 as master, but disabling scheduling
+        oc adm cordon ipekatrin1.ipe.kit.edu
+ - Backup
+    * Backups from the 'provision' volume are taken to the 'kaas-manager' VM
+ - Monitor
+    * Usage in /var/lib/docker-local [ space usage ]
+ - ToDo
+    * Try building the storage RAID in ipekatrinbackupserv1 (SFF-8643 to SFF-8087 cable needed, RAID-to-backplane). Turn on, check data is accessible and turn off.
+    * We shall order a larger SSD for docker (LVM) and KATRIN temporary files (/mnt/hostraid). Once done, uncordon jobs on ipekatrin1
+        oc adm uncordon ipekatrin1.ipe.kit.edu
+    * We might try building a smaller RAID from the stable disk bays and move the ADEI replica here (discuss!), or a larger one from SAS2 drives if it proves more stable.
+    * We might be able to use an Intel RES2SV240 or LSISAS2x28 expander board to reduce SAS3 to SAS2 speeds...
+
+ 2025.11.01-03
+ - Document attempts to recover the storage raid
+ - GlusterFS changes and replication
+
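The 2025.10.28-31 entries above take ipekatrin1 out of the storage pool (labels removed, endpoints edited, node cordoned). A sketch of the reverse steps for when storage is brought back, assuming the label values recorded in the node backups further below and the node storage IP 192.168.12.1 used elsewhere in these scripts:

    oc label node/ipekatrin1.ipe.kit.edu glusterfs=storage-host fat_storage=1
    oc adm uncordon ipekatrin1.ipe.kit.edu
    ./add_endpoints.sh 192.168.12.1   # re-add the node IP to the per-namespace 'gfs' endpoints (script below)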
diff --git a/scripts/disaster/gluster_endpoints/add_endpoints.sh b/scripts/disaster/gluster_endpoints/add_endpoints.sh
new file mode 100644
index 0000000..4badee9
--- /dev/null
+++ b/scripts/disaster/gluster_endpoints/add_endpoints.sh
@@ -0,0 +1,17 @@
+[[ $# -ne 1 ]] && { echo "Usage: $0 <NEW_NODE_IP>"; exit 1; }
+
+NEW_IP="$1"
+
+oc get namespaces -o name | sed 's/namespaces\///' | \
+while read NS; do
+  if oc -n "$NS" get endpoints gfs &>/dev/null; then
+    echo "✓ Patching $NS/gfs with $NEW_IP"
+#    echo oc -n "$NS" patch endpoints gfs --type=strategic --patch="{\"subsets\":[{\"addresses\":[{\"ip\":\"$NEW_IP\"}]}]}"
+#    echo oc -n "$NS" patch ep gfs --type=strategic --patch='{"subsets":[{"addresses":[{"ip":"'"$NEW_IP"'"}]}]}'
+    oc -n "$NS" patch ep gfs --type=json -p='[{"op": "add", "path": "/subsets/0/addresses/-", "value": {"ip": "'"$NEW_IP"'"}}]'
+  else
+    echo "✗ No gfs endpoint in $NS (skipping)"
+  fi
+done
+
+echo "Done. Verify: oc get ep gfs -A -o wide"
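A caveat on the JSON patch above: the path /subsets/0/addresses/- assumes the 'gfs' endpoint still has at least one subset with an addresses list; on an endpoint whose subsets were emptied completely the 'add' operation fails and the subset has to be recreated first (for example with one of the commented-out strategic patches, which write the whole subsets array). A possible guard inside the loop, as a sketch:

    [ -n "$(oc -n "$NS" get ep gfs -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)" ] || continue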
\ No newline at end of file diff --git a/scripts/disaster/gluster_endpoints/backups/ipekatrin1-edited.yaml b/scripts/disaster/gluster_endpoints/backups/ipekatrin1-edited.yaml new file mode 100644 index 0000000..6a8dc63 --- /dev/null +++ b/scripts/disaster/gluster_endpoints/backups/ipekatrin1-edited.yaml @@ -0,0 +1,85 @@ +apiVersion: v1 +kind: Node +metadata: + annotations: + alpha.kubernetes.io/provided-node-ip: 192.168.13.1 + volumes.kubernetes.io/controller-managed-attach-detach: "true" + creationTimestamp: 2018-03-23T04:20:04Z + labels: + beta.kubernetes.io/arch: amd64 + beta.kubernetes.io/os: linux + compute_node: "0" + fat_memory: "0" + fqdn: ipekatrin1.ipe.kit.edu + gpu_node: "0" + hostid: "1" + hostname: ipekatrin1 + kubernetes.io/hostname: ipekatrin1.ipe.kit.edu + master: "1" + node-role.kubernetes.io/master: "true" + openshift-infra: apiserver + permanent: "1" + pod_node: "1" + production: "1" + region: infra + server: "1" + zone: default + name: ipekatrin1.ipe.kit.edu + resourceVersion: "1138908753" + selfLink: /api/v1/nodes/ipekatrin1.ipe.kit.edu + uid: 7616a958-2e51-11e8-969e-0cc47adef108 +spec: + externalID: ipekatrin1.ipe.kit.edu +status: + addresses: + - address: 192.168.13.1 + type: InternalIP + - address: ipekatrin1.ipe.kit.edu + type: Hostname + allocatable: + cpu: "40" + memory: 263757760Ki + pods: "250" + capacity: + cpu: "40" + memory: 263860160Ki + pods: "250" + conditions: + - lastHeartbeatTime: 2025-10-23T19:01:20Z + lastTransitionTime: 2025-10-23T19:02:02Z + message: Kubelet stopped posting node status. + reason: NodeStatusUnknown + status: Unknown + type: OutOfDisk + - lastHeartbeatTime: 2025-10-23T19:01:20Z + lastTransitionTime: 2025-10-23T19:02:02Z + message: Kubelet stopped posting node status. + reason: NodeStatusUnknown + status: Unknown + type: MemoryPressure + - lastHeartbeatTime: 2025-10-23T19:01:20Z + lastTransitionTime: 2025-10-23T19:02:02Z + message: Kubelet stopped posting node status. + reason: NodeStatusUnknown + status: Unknown + type: DiskPressure + - lastHeartbeatTime: 2025-10-23T19:01:20Z + lastTransitionTime: 2025-10-23T19:02:02Z + message: Kubelet stopped posting node status. 
+ reason: NodeStatusUnknown + status: Unknown + type: Ready + daemonEndpoints: + kubeletEndpoint: + Port: 10250 + nodeInfo: + architecture: amd64 + bootID: a87a0b63-abf8-4b1d-9a1a-49197b26817e + containerRuntimeVersion: docker://1.12.6 + kernelVersion: 3.10.0-693.21.1.el7.x86_64 + kubeProxyVersion: v1.7.6+a08f5eeb62 + kubeletVersion: v1.7.6+a08f5eeb62 + machineID: 73b3f7f0088b44adb16582623d7747b1 + operatingSystem: linux + osImage: CentOS Linux 7 (Core) + systemUUID: 00000000-0000-0000-0000-0CC47ADEF108 diff --git a/scripts/disaster/gluster_endpoints/backups/ipekatrin1.yaml b/scripts/disaster/gluster_endpoints/backups/ipekatrin1.yaml new file mode 100644 index 0000000..5e45f12 --- /dev/null +++ b/scripts/disaster/gluster_endpoints/backups/ipekatrin1.yaml @@ -0,0 +1,87 @@ +apiVersion: v1 +kind: Node +metadata: + annotations: + alpha.kubernetes.io/provided-node-ip: 192.168.13.1 + volumes.kubernetes.io/controller-managed-attach-detach: "true" + creationTimestamp: 2018-03-23T04:20:04Z + labels: + beta.kubernetes.io/arch: amd64 + beta.kubernetes.io/os: linux + compute_node: "0" + fat_memory: "0" + fat_storage: "1" + fqdn: ipekatrin1.ipe.kit.edu + glusterfs: storage-host + gpu_node: "0" + hostid: "1" + hostname: ipekatrin1 + kubernetes.io/hostname: ipekatrin1.ipe.kit.edu + master: "1" + node-role.kubernetes.io/master: "true" + openshift-infra: apiserver + permanent: "1" + pod_node: "1" + production: "1" + region: infra + server: "1" + zone: default + name: ipekatrin1.ipe.kit.edu + resourceVersion: "1137118496" + selfLink: /api/v1/nodes/ipekatrin1.ipe.kit.edu + uid: 7616a958-2e51-11e8-969e-0cc47adef108 +spec: + externalID: ipekatrin1.ipe.kit.edu +status: + addresses: + - address: 192.168.13.1 + type: InternalIP + - address: ipekatrin1.ipe.kit.edu + type: Hostname + allocatable: + cpu: "40" + memory: 263757760Ki + pods: "250" + capacity: + cpu: "40" + memory: 263860160Ki + pods: "250" + conditions: + - lastHeartbeatTime: 2025-10-23T19:01:20Z + lastTransitionTime: 2025-10-23T19:02:02Z + message: Kubelet stopped posting node status. + reason: NodeStatusUnknown + status: Unknown + type: OutOfDisk + - lastHeartbeatTime: 2025-10-23T19:01:20Z + lastTransitionTime: 2025-10-23T19:02:02Z + message: Kubelet stopped posting node status. + reason: NodeStatusUnknown + status: Unknown + type: MemoryPressure + - lastHeartbeatTime: 2025-10-23T19:01:20Z + lastTransitionTime: 2025-10-23T19:02:02Z + message: Kubelet stopped posting node status. + reason: NodeStatusUnknown + status: Unknown + type: DiskPressure + - lastHeartbeatTime: 2025-10-23T19:01:20Z + lastTransitionTime: 2025-10-23T19:02:02Z + message: Kubelet stopped posting node status. 
+ reason: NodeStatusUnknown + status: Unknown + type: Ready + daemonEndpoints: + kubeletEndpoint: + Port: 10250 + nodeInfo: + architecture: amd64 + bootID: a87a0b63-abf8-4b1d-9a1a-49197b26817e + containerRuntimeVersion: docker://1.12.6 + kernelVersion: 3.10.0-693.21.1.el7.x86_64 + kubeProxyVersion: v1.7.6+a08f5eeb62 + kubeletVersion: v1.7.6+a08f5eeb62 + machineID: 73b3f7f0088b44adb16582623d7747b1 + operatingSystem: linux + osImage: CentOS Linux 7 (Core) + systemUUID: 00000000-0000-0000-0000-0CC47ADEF108 diff --git a/scripts/disaster/gluster_endpoints/backups/storageclasses_backup_2025-10-29.yaml b/scripts/disaster/gluster_endpoints/backups/storageclasses_backup_2025-10-29.yaml new file mode 100644 index 0000000..77e3452 --- /dev/null +++ b/scripts/disaster/gluster_endpoints/backups/storageclasses_backup_2025-10-29.yaml @@ -0,0 +1,38 @@ +apiVersion: v1 +items: +- apiVersion: storage.k8s.io/v1 + kind: StorageClass + metadata: + creationTimestamp: 2018-03-23T04:24:52Z + name: glusterfs-storage + namespace: "" + resourceVersion: "6403" + selfLink: /apis/storage.k8s.io/v1/storageclasses/glusterfs-storage + uid: 219550a3-2e52-11e8-969e-0cc47adef108 + parameters: + resturl: http://heketi-storage.glusterfs.svc.cluster.local:8080 + restuser: admin + secretName: heketi-storage-admin-secret + secretNamespace: glusterfs + provisioner: kubernetes.io/glusterfs +- apiVersion: storage.k8s.io/v1 + kind: StorageClass + metadata: + creationTimestamp: 2018-03-23T04:25:31Z + name: glusterfs-storage-block + namespace: "" + resourceVersion: "6528" + selfLink: /apis/storage.k8s.io/v1/storageclasses/glusterfs-storage-block + uid: 38ff5088-2e52-11e8-969e-0cc47adef108 + parameters: + chapauthenabled: "true" + hacount: "3" + restsecretname: heketi-storage-admin-secret-block + restsecretnamespace: glusterfs + resturl: http://heketi-storage.glusterfs.svc.cluster.local:8080 + restuser: admin + provisioner: gluster.org/glusterblock +kind: List +metadata: + resourceVersion: "" + selfLink: "" diff --git a/scripts/disaster/gluster_endpoints/check_pv.sh b/scripts/disaster/gluster_endpoints/check_pv.sh new file mode 100644 index 0000000..1f2a7e4 --- /dev/null +++ b/scripts/disaster/gluster_endpoints/check_pv.sh @@ -0,0 +1,50 @@ +#!/bin/bash + +pvs=$(oc get pv -o json | jq -r ' + .items[] + | select(.spec.glusterfs?) + | select(.spec.glusterfs.endpoints != "gfs") + | "\(.metadata.name) → endpoints=\(.spec.glusterfs.endpoints // "NONE")"') + + +echo "PV usage:" +echo + +#pvs=$(oc get pv --no-headers | awk '{print $1}') + +for pv in $pvs; do + # Extract PVC and namespace bound to PV + pvc=$(oc get pv "$pv" -o jsonpath='{.spec.claimRef.name}' 2>/dev/null) + ns=$(oc get pv "$pv" -o jsonpath='{.spec.claimRef.namespace}' 2>/dev/null) + + if [[ -z "$pvc" || -z "$ns" ]]; then + echo "$pv → UNUSED" + echo + continue + fi + + echo "$pv → PVC: $ns/$pvc" + + # Grep instead of JSONPath filter — much safer + pods=$(oc get pods -n "$ns" -o name \ + | while read -r pod; do + oc get "$pod" -n "$ns" -o json \ + | jq -r --arg pvc "$pvc" ' + . as $pod | + .spec.volumes[]? + | select(.persistentVolumeClaim? 
and .persistentVolumeClaim.claimName == $pvc) + | $pod.metadata.name + ' 2>/dev/null + done \ + | sort -u + ) + + if [[ -z "$pods" ]]; then + echo " → PVC bound but no running Pod is using it" + else + echo " → Pods:" + echo "$pods" | sed 's/^/ - /' + fi + + echo +done diff --git a/scripts/disaster/gluster_endpoints/find_inline_gluster_in_pods.sh b/scripts/disaster/gluster_endpoints/find_inline_gluster_in_pods.sh new file mode 100644 index 0000000..e116fb7 --- /dev/null +++ b/scripts/disaster/gluster_endpoints/find_inline_gluster_in_pods.sh @@ -0,0 +1,7 @@ +#! /bin/bash + +for p in $(oc get pods --all-namespaces --no-headers | awk '{print $2":"$1}'); do + pod=${p%:*}; ns=${p#*:}; + echo "=== $ns/$pod ===" + oc -n "$ns" get pod "$pod" -o json | grep gluster +done diff --git a/scripts/disaster/gluster_endpoints/remove_endpoints.sh b/scripts/disaster/gluster_endpoints/remove_endpoints.sh new file mode 100644 index 0000000..f4623f6 --- /dev/null +++ b/scripts/disaster/gluster_endpoints/remove_endpoints.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +TARGET_IP="192.168.12.1" + +for ns in $(oc get ns --no-headers | awk '{print $1}'); do + for epname in gfs glusterfs-dynamic-etcd glusterfs-dynamic-metrics-cassandra-1 glusterfs-dynamic-mongodb glusterfs-dynamic-registry-claim glusterfs-dynamic-sharelatex-docker; do + ep=$(oc get endpoints "$epname" -n "$ns" -o json 2>/dev/null) || continue + + modified="$(printf '%s' "$ep" | jq \ + --arg ip "$TARGET_IP" \ + '(.subsets[]?.addresses |= map(select(.ip != $ip)))' + )" + + if diff <(echo "$ep") <(echo "$modified") >/dev/null; then + continue + fi + + echo -n "Namespace: $ns/$epname:" + echo -n "$ep" | jq '.subsets[].addresses' + echo -n " ===> " + echo -n "$modified" | jq '.subsets[].addresses' + echo + + # When verified, uncomment the following line to APPLY: + echo "$modified" | oc replace -f - -n "$ns" + done +done diff --git a/scripts/disaster/gluster_endpoints/remove_storageclasses.sh b/scripts/disaster/gluster_endpoints/remove_storageclasses.sh new file mode 100644 index 0000000..063650d --- /dev/null +++ b/scripts/disaster/gluster_endpoints/remove_storageclasses.sh @@ -0,0 +1,7 @@ +# Backups provided +oc delete sc glusterfs-storage +oc delete sc glusterfs-storage-block + +# It was a single replica +oc scale dc/glusterblock-storage-provisioner-dc -n glusterfs --replicas=0 +oc scale dc/heketi-storage -n glusterfs --replicas=0 diff --git a/scripts/disaster/walker.sh b/scripts/disaster/walker.sh new file mode 100644 index 0000000..0211105 --- /dev/null +++ b/scripts/disaster/walker.sh @@ -0,0 +1,73 @@ +#! 
/bin/bash + + +#find /mnt/provision/kaas/adei -type f -print0 | xargs -0 -I{} -n 1 sh -c ' dd if="$1" of=/dev/null bs=1M status=none || true; sleep .5' _ "{}" + +#find /mnt/ands/glusterfs/brick-provision/kaas/bora -type f -size 0 -print0 | \ +#while IFS= read -r -d '' f; do +# echo "Remvoing $f" +# setfattr -x trusted.glusterfs.mdata "$f" 2>/dev/null || true +# for a in $(getfattr -d -m trusted.afr -e hex "$f" 2>/dev/null | awk -F= '/trusted\.afr/{print $1}'); do +# setfattr -x "$a" "$f" 2>/dev/null || true +# done +#done + +#echo 3 | sudo tee /proc/sys/vm/drop_caches +#find /mnt/wave/ -type f -print0 | xargs -0 -I{} -n 1 -P 8 sh -c ' +# f="$1" +# dd if="$f" of=/dev/null bs=1M status=none || true; +# sz=$(stat -c%s "$f" 2>/dev/null || echo 0) +# echo "$f $sz" +# if [ "$sz" -eq 0 ]; then +# # give gluster a breath and try again, like you do manually +# sleep 0.5 +# dd if="$f" of=/dev/null bs=1M status=none 2>/dev/null || true +## sz=$(stat -c%s "$f" 2>/dev/null || echo 0) +# fi +# ' _ "{}" + +#find /mnt/datastore/services/gogs -type f -print0 | xargs -0 -n200 -P16 rm - +#find /mnt/datastore/services/gogs -depth -type d -empty -delete +#find /mnt/datastore/services/gogs/repositories -maxdepth 1 -mindepth 1 -type d -print0 | xargs -0 -I{} -n1 -P200 sh -c 'rm -rf "$1"' _ "{}" + + +#echo 3 | sudo tee /proc/sys/vm/drop_caches +#find /mnt/ands/glusterfs/brick-katrin_data -name .glusterfs -prune -o -type f -size 0 -print0 | xargs -0 -I{} -n 1 -P 8 sh -c ' +# fbrick="$1" +# brick_prefix="/mnt/ands/glusterfs/brick-katrin_data" +# mount_prefix="/mnt/katrin" +# fmount="${fbrick/#$brick_prefix/$mount_prefix}" +# dd if="$fmount" of=/dev/null bs=1M status=none || true; +# sz=$(stat -c%s "$fbrick" 2>/dev/null || echo 0) +# echo "$fmount $sz" +# if [ "$sz" -eq 0 ]; then +# # give gluster a breath and try again, like you do manually +# sleep 0.5 +# dd if="$fmount" of=/dev/null bs=1M status=none 2>/dev/null || true +## sz=$(stat -c%s "$fbrick" 2>/dev/null || echo 0) +# fi +# ' _ "{}" +# + +echo 3 | sudo tee /proc/sys/vm/drop_caches +find /mnt/ands/glusterfs/brick-katrin_data -name .glusterfs -prune -o -type f -print0 | xargs -0 -I{} -n 1 -P 8 sh -c ' + fbrick="$1" + mount_prefix="/mnt/katrin" + brick_prefix="/mnt/ands/glusterfs/brick-katrin_data" + fmount="${fbrick/#$brick_prefix/$mount_prefix}" + szbrick=$(stat -c%s "$fbrick" 2>/dev/null || echo 0) + szmount=$(stat -c%s "$fmount" 2>/dev/null || echo 0) + if [ $szbrick -ne $szmount ]; then + dd if="$fmount" of=/dev/null bs=1M status=none 2>/dev/null || true + sz=$(stat -c%s "$fbrick" 2>/dev/null || echo 0) + while [ $sz -ne $szmount ]; do + echo "* $fmount $szmount $szbrick => $sz" + sleep 1 + dd if="$fmount" of=/dev/null bs=1M status=none 2>/dev/null || true + sz=$(stat -c%s "$fbrick" 2>/dev/null || echo 0) + done + echo "$fmount $szmount $szbrick => $sz" + fi + ' _ "{}" + + diff --git a/scripts/maintain/gluster/bricks_move_heketi.sh b/scripts/maintain/gluster/bricks_move_heketi.sh new file mode 100644 index 0000000..36b8602 --- /dev/null +++ b/scripts/maintain/gluster/bricks_move_heketi.sh @@ -0,0 +1,39 @@ +HOST="192.168.12.1" +NEW_BASE="/mnt/ands/glusterfs/vg_ce3a7c1bb6da5c98ce4bb3e76aeacb8b" +GLUSTER_BIN="gluster" +DRYRUN=1 # set to 0 to actually run +GLUSTER_UID=107 # adjust if your gluster user has a different uid/gid + +# get all volumes like vol_<uid> +VOLS=$($GLUSTER_BIN volume list | grep '^vol_') + +for VOL in $VOLS; do + # find bricks on this host + # lines look like: "Brick2: 192.168.12.1:/var/lib/heketi/.../brick" + mapfile -t OLDBRICKS < 
<($GLUSTER_BIN volume info "$VOL" \ + | grep "$HOST:" \ + | awk '{print $2}') + + # skip volumes that don't have a brick on this host + if [ ${#OLDBRICKS[@]} -eq 0 ]; then + continue + fi + + for OLD in "${OLDBRICKS[@]}"; do + BRICKID=$(echo "$OLD" | sed -n 's#.*/\(brick_[^/]*\)/brick#\1#p') + if [ -z "$BRICKID" ]; then + echo "WARN: could not extract brick ID from $OLD" + continue + fi + + NEW="$HOST:$NEW_BASE/$BRICKID" + + echo "=== volume: $VOL ===" + echo "old brick: $OLD" + echo "new brick: $NEW" + + + $GLUSTER_BIN volume replace-brick "$VOL" "$OLD" "$NEW" commit force + + done +done diff --git a/scripts/maintain/gluster/bricks_populate.sh b/scripts/maintain/gluster/bricks_populate.sh new file mode 100644 index 0000000..15790a1 --- /dev/null +++ b/scripts/maintain/gluster/bricks_populate.sh @@ -0,0 +1,11 @@ +for brick in brick-*; do + [ -d $brick/.glusterfs ] && continue + name=${brick#brick-} + + echo "$name - $brick" + + setfattr -n trusted.gfid -v 0sAAAAAAAAAAAAAAAAAAAAAQ== /mnt/ands/glusterfs/$brick + setfattr -n trusted.glusterfs.volume-id -v 0x$(gluster volume info $name | grep 'Volume ID' | awk '{print $3}' | tr -d '-') /mnt/ands/glusterfs/$brick + mkdir -p /mnt/ands/glusterfs/$brick/.glusterfs/{indices,exports,xattrop,locks} + +done diff --git a/scripts/maintain/gluster/heal-walk.sh b/scripts/maintain/gluster/heal-walk.sh new file mode 100644 index 0000000..4c8d134 --- /dev/null +++ b/scripts/maintain/gluster/heal-walk.sh @@ -0,0 +1,35 @@ +#! /bin/bash + + +#find /mnt/provision/kaas/adei -type f -print0 | xargs -0 -I{} -n 1 sh -c ' dd if="$1" of=/dev/null bs=1M status=none || true; sleep .5' _ "{}" + +#find /mnt/ands/glusterfs/brick-provision/kaas/bora -type f -size 0 -print0 | \ +#while IFS= read -r -d '' f; do +# echo "Remvoing $f" +# setfattr -x trusted.glusterfs.mdata "$f" 2>/dev/null || true +# for a in $(getfattr -d -m trusted.afr -e hex "$f" 2>/dev/null | awk -F= '/trusted\.afr/{print $1}'); do +# setfattr -x "$a" "$f" 2>/dev/null || true +# done +#done + +#find /mnt/datastore/services/gogs -type f -print0 | xargs -0 -n200 -P16 rm - +#find /mnt/datastore/services/gogs -depth -type d -empty -delete +#find /mnt/datastore/services/gogs/repositories -maxdepth 1 -mindepth 1 -type d -print0 | xargs -0 -I{} -n1 -P200 sh -c 'rm -rf "$1"' _ "{}" + + +echo 3 | sudo tee /proc/sys/vm/drop_caches +find /mnt/wave/ -type f -print0 | xargs -0 -I{} -n 1 -P 8 sh -c ' + f="$1" + dd if="$f" of=/dev/null bs=1M status=none || true; + sz=$(stat -c%s "$f" 2>/dev/null || echo 0) + echo "$f $sz" + if [ "$sz" -eq 0 ]; then + # give gluster a breath and try again, like you do manually + sleep 0.5 + dd if="$f" of=/dev/null bs=1M status=none 2>/dev/null || true +# sz=$(stat -c%s "$f" 2>/dev/null || echo 0) + fi + ' _ "{}" + + +#find /mnt/wave/ -type f -print0 | xargs -0 -I{} -n 1 -P 8 sh -c 'echo $1; dd if="$1" of=/dev/null bs=1M status=none || true; sleep .5' _ {} |
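The maintenance scripts above rebuild brick directories (bricks_populate.sh), move heketi bricks to the new base path (bricks_move_heketi.sh), and trigger self-heal by re-reading files through the gluster mount (walker.sh / heal-walk.sh). Two quick checks that could accompany them, using only standard gluster/getfattr calls (<name> and <vol> are placeholders):

    # verify the xattrs written by bricks_populate.sh (shown hex-encoded)
    getfattr -d -m trusted -e hex /mnt/ands/glusterfs/brick-<name>

    # watch how many entries still need healing while the walkers run
    gluster volume heal <vol> info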
