author: Dan Mace <ironcladlou@gmail.com> 2016-10-07 16:28:17 -0400
committer: Dan Mace <ironcladlou@gmail.com> 2016-10-07 16:28:17 -0400
commit: 1bc6d4390661fe18bebbc020b2c7b25972e80b41
tree: 8ea07e846259b6bf3f7415149942707ee2346b49
parent: e5f2d1d43bc12b9bee353dab6a74ae7b79ec2de0
Retry failed master startup once
Master startup can fail when ec2 transparently reallocates the block storage, causing etcd writes to temporarily fail. Retry failures blindly just once to allow time for this transient condition to resolve and for systemd to restart the master (which will eventually succeed).

https://github.com/coreos/etcd/issues/3864
https://github.com/openshift/origin/issues/6065
https://github.com/openshift/origin/issues/6447
-rw-r--r-- roles/openshift_master/tasks/main.yml 11
1 files changed, 11 insertions, 0 deletions
diff --git a/roles/openshift_master/tasks/main.yml b/roles/openshift_master/tasks/main.yml
index ce2f96723..645871ab4 100644
--- a/roles/openshift_master/tasks/main.yml
+++ b/roles/openshift_master/tasks/main.yml
@@ -168,10 +168,21 @@
 - include: set_loopback_context.yml
   when: openshift.common.version_gte_3_2_or_1_2
 
+# TODO: Master startup can fail when ec2 transparently reallocates the block
+# storage, causing etcd writes to temporarily fail. Retry failures blindly just
+# once to allow time for this transient condition to to resolve and for systemd
+# to restart the master (which will eventually succeed).
+#
+# https://github.com/coreos/etcd/issues/3864
+# https://github.com/openshift/origin/issues/6065
+# https://github.com/openshift/origin/issues/6447
 - name: Start and enable master
   service: name={{ openshift.common.service_type }}-master enabled=yes state=started
   when: not openshift_master_ha | bool
   register: start_result
+  until: not start_result | failed
+  retries: 1
+  delay: 60
   notify: Verify API Server
 
 - name: Check for non-HA master service presence
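
For readers less familiar with Ansible's retry keywords, the patched task can be sketched in multi-line YAML as below. This is an illustrative rewrite and not part of the commit: the task name, variables, and values are taken from the diff above, while the multi-line `service:` layout and the inline comments are additions. It assumes an Ansible release from the same era as the commit, where `failed` can still be applied with the `|` filter syntax; on newer releases, where tests are written with `is`, the condition would likely read `start_result is succeeded` instead.

```yaml
# Sketch only: the "Start and enable master" task from the diff,
# rewritten in multi-line YAML with comments on the retry keywords.
- name: Start and enable master
  service:
    name: "{{ openshift.common.service_type }}-master"
    enabled: yes
    state: started
  when: not openshift_master_ha | bool
  # Capture the task result so the until condition can inspect it.
  register: start_result
  # Re-run the task while the registered result reports a failure.
  until: not start_result | failed
  # Allow a single retry, matching "retry just once" in the commit message.
  retries: 1
  # Wait 60 seconds before retrying, giving the transient etcd write
  # failures time to clear and systemd time to restart the master.
  delay: 60
  notify: Verify API Server
```

The retry is deliberately blind: it does not inspect why startup failed, so a persistent failure is still reported once the single retry is exhausted.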