diff options
author | OpenShift Merge Robot <openshift-merge-robot@users.noreply.github.com> | 2017-09-19 21:01:59 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2017-09-19 21:01:59 -0700 |
commit | 993436532384cb76ae4039dca224ad132fe43d45 (patch) | |
tree | 3358bb4528f484579a7105912317a9651b6cbb2d /roles/etcd_server_certificates | |
parent | 3eb3a30712edb771e65f37739d5ae6c2471295cc (diff) | |
parent | 91a6e0a86a783ce25c090a4946a6ada20363ba6c (diff) | |
download | openshift-993436532384cb76ae4039dca224ad132fe43d45.tar.gz openshift-993436532384cb76ae4039dca224ad132fe43d45.tar.bz2 openshift-993436532384cb76ae4039dca224ad132fe43d45.tar.xz openshift-993436532384cb76ae4039dca224ad132fe43d45.zip |
Merge pull request #3778 from lhuard1A/rh_subscription_resilient
Automatic merge from submit-queue
Make RH subscription more resilient to temporary failures
subscription-manager can sometimes fail because of server side errors.
Manually replaying the command usually works.
So, let’s make openshift-ansible more resilient to temporary failures of
subscription-manager by retrying the failed commands with a maximum of
3 retries.
Here is an example of such sporadic errors:
```
TASK [rhel_subscribe : Retrieve the OpenShift Pool ID] *************************
ok: [lenaic-node-compute-c96e7]
ok: [lenaic-master-bbe09]
ok: [lenaic-node-compute-2976a]
fatal: [lenaic-node-infra-47ba5]: FAILED! => {"changed": false, "cmd": ["subscription-manager", "list", "--available", "--matches=Red Hat OpenShift Container Platform, Premium*", "--pool-only"], "delta": "0:00:07.152650", "end": "2017-04-04 11:24:59.729405", "failed": true, "rc": 70, "start": "2017-04-04 11:24:52.576755", "stderr": "Unable to verify server's identity: (104, 'Connection reset by peer')", "stdout": "", "stdout_lines": [], "warnings": []}
TASK [rhel_subscribe : Determine if OpenShift Pool Already Attached] ***********
skipping: [lenaic-master-bbe09]
skipping: [lenaic-node-compute-2976a]
skipping: [lenaic-node-compute-c96e7]
TASK [rhel_subscribe : fail] ***************************************************
skipping: [lenaic-node-compute-2976a]
skipping: [lenaic-master-bbe09]
skipping: [lenaic-node-compute-c96e7]
TASK [rhel_subscribe : Attach to OpenShift Pool] *******************************
fatal: [lenaic-node-compute-c96e7]: FAILED! => {"changed": true, "cmd": ["subscription-manager", "subscribe", "--pool", "8a85f9814ff0134a014ff43b44095513"], "delta": "0:00:21.421300", "end": "2017-04-04 11:25:20.655873", "failed": true, "rc": 70, "start": "2017-04-04 11:24:59.234573", "stderr": "Unable to verify server's identity: (104, 'Connection reset by peer')", "stdout": "Successfully attached a subscription for: Red Hat OpenShift Container Platform, Premium (1-2 Sockets)", "stdout_lines": ["Successfully attached a subscription for: Red Hat OpenShift Container Platform, Premium (1-2 Sockets)"], "warnings": []}
changed: [lenaic-master-bbe09]
changed: [lenaic-node-compute-2976a]
```
In this example, subscription-manager was failing on some nodes, but not all. Retrying on the failed nodes would have avoided to abandon those nodes.
Diffstat (limited to 'roles/etcd_server_certificates')
0 files changed, 0 insertions, 0 deletions