Delayed sequential restart of Compute Engine VMs in Managed Instance Groups

Question

I have a Managed Instance Group of Google Compute Engine VMs (based on a template with container deployment on Container-Optimized OS). The MIG is regional (multi-zoned).

I can release an updated container image (docker run, docker tag, docker push), and then I'd like to restart all VMs in the MIG one by one so that they pick up the updated container (I'm not sure whether there's a simpler or better way to refresh the VMs' attached container). I also want to introduce a short delay (say 60 seconds) between each VM's restart, so that only one or two VMs are unavailable at any time.

What are some ways to do this programmatically (either via gcloud CLI or their API)?

I tried a rolling restart of the MIG with the maximum-unavailable and minimum-ready-time flags set:

    gcloud beta compute instance-groups managed rolling-action restart MIG_NAME \
        --project="..." --region="..." \
        --max-unavailable=1 --min-ready=60

... but it returns an error:

ERROR: (gcloud.beta.compute.instance-groups.managed.rolling-action.restart) Could not fetch resource: - Invalid value for field 'resource.updatePolicy.maxUnavailable.fixed': '1'. Fixed updatePolicy.maxUnavailable for regional managed instance group has to be either 0 or at least equal to the number of zones.

Is there a way to perform one-by-one instance restarts with a slight delay in between each action?
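Going by the error message, the smallest non-zero --max-unavailable a regional MIG accepts is the number of zones it spans. A minimal sketch that looks up that number and passes it to the same restart command is below. It assumes the MIG resource lists its zones in distributionPolicy.zones[].zone, and mig_zone_count and restart_by_zone are hypothetical helper names. Note that this is not a strict one-by-one restart: up to that many VMs may be restarting at the same time.

```shell
#!/usr/bin/env bash
# Sketch: rolling restart of a regional MIG with --max-unavailable equal to
# the number of zones it spans (the smallest non-zero value the API accepts).
# Assumption: the MIG resource lists its zones in distributionPolicy.zones[].zone.
set -euo pipefail

# Print how many zones the regional MIG is distributed across.
mig_zone_count() {
  local mig="$1" region="$2" zones
  zones="$(gcloud compute instance-groups managed describe "$mig" \
    --region="$region" --format='value(distributionPolicy.zones[].zone)')"
  # value() joins list items with ';', so count the non-empty entries.
  printf '%s\n' "$zones" | tr ';' '\n' | grep -c .
}

# Rolling restart; each restarted VM must stay ready for min_ready before
# the MIG counts it as available again.
restart_by_zone() {
  local mig="$1" region="$2" min_ready="${3:-60s}" n
  n="$(mig_zone_count "$mig" "$region")"
  gcloud beta compute instance-groups managed rolling-action restart "$mig" \
    --region="$region" --max-unavailable="$n" --min-ready="$min_ready"
}
```

Usage would be something like restart_by_zone MIG_NAME REGION 60s, with MIG_NAME and REGION filled in.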

Unfortunately, this feature is not implemented yet for regional deployments. It works correctly for zonal ones.
– Grzenio, Commented Jan 13, 2023 at 7:17
Thanks @Grzenio, do you think using gcloud beta compute instances update-container iteratively for each instance, with a slight delay (e.g. sleep()) between each call, would be a good workaround?
– Nick, Commented Jan 13, 2023 at 7:22
Frankly, I am not able to figure out what gcloud compute instances update-container actually does, but let me suggest a semi-manual solution using the MIG API.
– Grzenio, Commented Jan 13, 2023 at 9:35
No worries, btw here's the doc on the update-container command: cloud.google.com/compute/docs/containers/…
– Nick, Commented Jan 14, 2023 at 5:26
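The update-container workaround Nick describes could be sketched as follows: list the MIG's instances, then call gcloud compute instances update-container on each one with a pause in between. The MIG name, region and image are placeholders, and zone_of_instance_url and update_containers_one_by_one are hypothetical helpers. One caveat: a MIG recreates instances from its instance template (for example when autohealing), so a change made only with update-container may be lost later unless the template is also updated to the new image.

```shell
#!/usr/bin/env bash
# Sketch of the update-container workaround: one VM at a time, with a pause.
# The MIG name, region and image passed in are placeholders.
set -euo pipefail

# Extract ZONE from an instance URL of the form .../zones/ZONE/instances/NAME.
zone_of_instance_url() {
  local rest="${1#*/zones/}"
  echo "${rest%%/*}"
}

update_containers_one_by_one() {
  local mig="$1" region="$2" image="$3" delay="${4:-60}" url name zone
  for url in $(gcloud compute instance-groups managed list-instances "$mig" \
      --region="$region" --format='value(instance)'); do
    name="${url##*/}"                 # last path segment is the VM name
    zone="$(zone_of_instance_url "$url")"
    echo "Updating container on $name in $zone"
    gcloud compute instances update-container "$name" --zone="$zone" \
      --container-image="$image"
    sleep "$delay"                    # pause before touching the next VM
  done
}
```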

Grzenio · Accepted Answer · 2023-01-13 10:08:56Z


Unfortunately, as of January 2023, MIGs don't handle this use case for regional deployments. You can, however, orchestrate the rolling update yourself along these lines (a sketch; MIG_NAME and the "..." values are placeholders):

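    for INSTANCE in $INSTANCES; do
      # Force restart of this one instance
      gcloud compute instance-groups managed update-instances MIG_NAME \
          --project="..." --region="..." --instances="$INSTANCE" \
          --minimal-action=restart --most-disruptive-allowed-action=restart
      # WAIT until the MIG is stable again, then pause
      gcloud compute instance-groups managed wait-until MIG_NAME \
          --project="..." --region="..." --stable
      sleep 60
      # container_ok: placeholder for your own health check; break and alert
      container_ok "$INSTANCE" || { echo "$INSTANCE unhealthy" >&2; break; }
    done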


5 Comments

Nick Over a year ago

Thanks. How do I get the list of instances across multiple MIGs? Also, how do I set the region dynamically in this case?

Nick Over a year ago

Strangely, the above update-instances command with restart actions still replaces the VMs and allots new IPs, instead of just restarting them and keeping the same IPs.

Grzenio Over a year ago

In GCE, "restart" actually means deleting the VM, creating a new one and reprogramming the networking. Having said that, I would expect the VMs to keep the same IP in the process. Would you be able to ask another question specifically about that (for clarity) and add more details, such as the full configuration of the IGM and the instance before and after the restart?

Nick Over a year ago

From what I've understood reading the docs, "replace" action deletes the VM and creates a new one, "restart" action should just restart/reboot the machine without changing the machine name or IP, as long as the replacement method for it is "replace" instead of "substitute" (as per cloud.google.com/compute/docs/instance-groups/…).

Grzenio Over a year ago

Yeah, it is misleading. Either way, you are correct that the IP should be preserved. My suspicion is that the instance got auto-repaired for some reason.
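One way to cover Nick's first question (instances across several MIGs, each with its own region) is to list the regional MIGs together with their region and then list each MIG's instances. A minimal sketch, assuming region.basename() yields the region name for regional MIGs and is empty for zonal ones (which are skipped here); list_regional_migs and list_all_mig_instances are hypothetical names. Each output line carries the region, so a restart loop can pass --region per MIG.

```shell
#!/usr/bin/env bash
# Sketch: list every instance of every regional MIG in the current project,
# together with the MIG's region, so a restart loop can pass --region per MIG.
set -euo pipefail

# Print "MIG REGION" per regional MIG; zonal MIGs have an empty region
# column and are dropped by the NF check.
list_regional_migs() {
  gcloud compute instance-groups managed list \
    --format='value(name,region.basename())' | awk 'NF == 2 { print $1, $2 }'
}

# Print "MIG REGION INSTANCE" for each instance of each regional MIG.
list_all_mig_instances() {
  list_regional_migs | while read -r mig region; do
    # </dev/null keeps gcloud from consuming the loop's input.
    gcloud compute instance-groups managed list-instances "$mig" \
      --region="$region" --format='value(instance.basename())' </dev/null |
      while read -r instance; do echo "$mig $region $instance"; done
  done
}
```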

CPlus · Accepted Answer · 2023-03-10 05:47:17Z

Try looking into opportunistic updates instead of rolling updates. We have a similar scenario. A rolling update of a MIG, particularly a stateful one, won't work, because it brings down at least a minimum number of instances (at least the number of zones your MIG spans). With opportunistic updates, you can achieve what you are looking for. Currently we implement it as follows:

Set the instance template of the MIG to the new instance template created from the new image:

    gcloud compute instance-groups managed set-instance-template ${instanceName}-group \
        --template=${instanceName}-${tag} --region=${region}
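The answer doesn't show how the new template is built. For a container deployment, one option is gcloud compute instance-templates create-with-container. A minimal sketch, assuming the MIG is named ${instanceName}-group (as in the loop below) and that the image repository path without a tag is passed in; roll_template is a hypothetical helper, and a real template will usually need further flags (machine type, network, service account) that are omitted here.

```shell
#!/usr/bin/env bash
# Sketch: create a template for the new image tag and make it the MIG's
# template. Machine type, network, service account etc. are NOT set here;
# add whatever flags your existing template uses.
set -euo pipefail

roll_template() {
  local instance_name="$1" tag="$2" region="$3" image_repo="$4"
  local template="${instance_name}-${tag}"
  gcloud compute instance-templates create-with-container "$template" \
    --container-image="${image_repo}:${tag}"
  # Only sets the template; running VMs are updated later, one by one.
  gcloud compute instance-groups managed set-instance-template \
    "${instance_name}-group" --template="$template" --region="$region"
}
```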

Run a for loop that updates each VM to the new template. gcloud provides a command (wait-until --stable) that pauses the script until the MIG is stable, which ensures you don't apply the update to another VM until the current instance is stable.

    for (( i = 1; i <= $number_of_nodes; i++ )); do
      echo "Trying to update Kafka Node${i} with new instance template ${instanceName}-${tag}"
      (set -x
       gcloud compute instance-groups managed update-instances ${instanceName}-group \
           --instances=${instanceName}-kafka-node${i} \
           --region=${region}
      )
      echo "Checking for MIG stability"
      (set -x
       gcloud compute instance-groups managed wait-until ${instanceName}-group \
           --stable \
           --region=${region}
      )
    done
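The loop above depends on the kafka-node${i} naming scheme and has no explicit pause between VMs. A variant that asks the MIG which instances are still on an older template, and adds the 60-second delay from the question, might look like this. It assumes list-instances exposes each instance's template URL as version.instanceTemplate; outdated_instances and update_outdated_one_by_one are hypothetical names. Because it only touches instances not yet on the target template, it can simply be re-run after an interruption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the names of instances whose template is not the target template.
# Assumption: list-instances exposes version.instanceTemplate (a URL).
outdated_instances() {
  local mig="$1" region="$2" template="$3"
  gcloud compute instance-groups managed list-instances "$mig" \
    --region="$region" \
    --format='value(instance.basename(),version.instanceTemplate.basename())' |
    awk -v t="$template" '$2 != t { print $1 }'
}

# Update outdated instances one at a time: apply, wait for stability, pause.
update_outdated_one_by_one() {
  local mig="$1" region="$2" template="$3" delay="${4:-60}" inst
  for inst in $(outdated_instances "$mig" "$region" "$template"); do
    echo "Updating $inst to $template"
    gcloud compute instance-groups managed update-instances "$mig" \
      --region="$region" --instances="$inst"
    gcloud compute instance-groups managed wait-until "$mig" \
      --region="$region" --stable
    sleep "$delay"
  done
}
```

Usage would be along the lines of update_outdated_one_by_one "${instanceName}-group" "${region}" "${instanceName}-${tag}" 60.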

You can have a look at this documentation.
