| author | Dan Mace <ironcladlou@gmail.com> | 2016-10-07 16:28:17 -0400 | 
|---|---|---|
| committer | Dan Mace <ironcladlou@gmail.com> | 2016-10-07 16:28:17 -0400 | 
| commit | 1bc6d4390661fe18bebbc020b2c7b25972e80b41 (patch) | |
| tree | 8ea07e846259b6bf3f7415149942707ee2346b49 | |
| parent | e5f2d1d43bc12b9bee353dab6a74ae7b79ec2de0 (diff) | |
Retry failed master startup once
Master startup can fail when EC2 transparently reallocates the block
storage, causing etcd writes to temporarily fail. Retry failures blindly just
once to allow time for this transient condition to resolve and for systemd
to restart the master (which will eventually succeed).
https://github.com/coreos/etcd/issues/3864
https://github.com/openshift/origin/issues/6065
https://github.com/openshift/origin/issues/6447
| -rw-r--r-- | roles/openshift_master/tasks/main.yml | 11 | 
1 files changed, 11 insertions, 0 deletions
diff --git a/roles/openshift_master/tasks/main.yml b/roles/openshift_master/tasks/main.yml
index ce2f96723..645871ab4 100644
--- a/roles/openshift_master/tasks/main.yml
+++ b/roles/openshift_master/tasks/main.yml
@@ -168,10 +168,21 @@
 - include: set_loopback_context.yml
   when: openshift.common.version_gte_3_2_or_1_2
 
+# TODO: Master startup can fail when ec2 transparently reallocates the block
+# storage, causing etcd writes to temporarily fail. Retry failures blindly just
+# once to allow time for this transient condition to to resolve and for systemd
+# to restart the master (which will eventually succeed).
+#
+# https://github.com/coreos/etcd/issues/3864
+# https://github.com/openshift/origin/issues/6065
+# https://github.com/openshift/origin/issues/6447
 - name: Start and enable master
   service: name={{ openshift.common.service_type }}-master enabled=yes state=started
   when: not openshift_master_ha | bool
   register: start_result
+  until: not start_result | failed
+  retries: 1
+  delay: 60
   notify: Verify API Server
 
 - name: Check for non-HA master service presence
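For readers on newer Ansible releases, the retried task from this commit could be sketched as follows. This is not part of the commit. It assumes Ansible 2.5 or later, where the `is failed` test syntax replaces the `| failed` filter form, and it rewrites the `service` arguments in multi-line YAML purely for readability. The task name, variables, and retry values are copied from the diff.

```yaml
# Sketch only: same task as in the commit, using the test syntax
# (`is not failed`) that newer Ansible versions expect.
- name: Start and enable master
  service:
    name: "{{ openshift.common.service_type }}-master"
    enabled: yes
    state: started
  when: not openshift_master_ha | bool
  register: start_result
  # Re-run the task if the start failed, waiting 60 seconds in between
  # so that systemd has time to restart the master.
  until: start_result is not failed
  retries: 1
  delay: 60
  notify: Verify API Server
```

The 60-second `delay` is what gives the transient etcd write failure time to clear. The `until` condition only decides whether another attempt is made.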
