
I have a Managed Instance Group of Google Compute Engine VMs (based on a template with container deployment on Container-Optimized OS). The MIG is regional (multi-zoned).

I can release an updated container image (docker run, docker tag, docker push), and then I'd like to restart all VMs in the MIG one by one so that they pick up the updated container (I'm not sure whether there's a simpler or better way to refresh each VM's attached container). I also want to introduce a slight delay (say 60 seconds) between each VM's restart, so that only one or two VMs are unavailable at any time.

What are some ways to do this programmatically (either via gcloud CLI or their API)?

I tried a rolling restart of the MIG, with maximum unavailable and minimum wait time flags set:

gcloud beta compute instance-groups managed rolling-action restart MIG_NAME \
  --project="..." --region="..." \
  --max-unavailable=1 --min-ready=60

... but it returns an error:

ERROR: (gcloud.beta.compute.instance-groups.managed.rolling-action.restart) Could not fetch resource: - Invalid value for field 'resource.updatePolicy.maxUnavailable.fixed': '1'. Fixed updatePolicy.maxUnavailable for regional managed instance group has to be either 0 or at least equal to the number of zones. 

Is there a way to perform one-by-one instance restarts with a slight delay in between each action?

4
  • Unfortunately, this feature is not implemented yet for regional deployments. It works correctly for zonal ones. Commented Jan 13, 2023 at 7:17
  • Thanks @Grzenio, do you think using gcloud beta compute instances update-container iteratively for each instance, with a slight delay (e.g. sleep()) in between each call will be a good workaround? Commented Jan 13, 2023 at 7:22
  • Frankly, I am not able to figure out what gcloud compute instances update-container actually does, but let me suggest a semi-manual solution using the MIG api. Commented Jan 13, 2023 at 9:35
  • No worries, btw here's the doc on update-container command: cloud.google.com/compute/docs/containers/… Commented Jan 14, 2023 at 5:26

2 Answers

1

Unfortunately, MIGs don't handle this use case for regional deployments as of Jan 2023. You can, however, orchestrate the rolling update yourself along these lines (pseudocode):

for (INSTANCE in instances)
  // Force restart the instance
  gcloud compute instance-groups managed update-instances MIG_NAME \
    --project="..." --region="..." \
    --instances=INSTANCE --minimal-action=RESTART \
    --most-disruptive-allowed-action=RESTART
  WAIT
  if (container on INSTANCE not working correctly)
    // Break and alert the operator
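One way the loop above might look as an actual shell script is sketched below. This is only an illustration under stated assumptions, not a tested production script: the function names `list_mig_instances` and `rolling_restart` are made up for this example, the instance names are assumed to be obtainable from `list-instances` with a `value(instance.basename())` format expression, and the health check is left out because it depends on your container. The 60-second default delay comes from the question.

```shell
#!/usr/bin/env bash
# Sketch: restart the instances of a regional MIG one at a time.
# Function names are hypothetical; MIG name and region are supplied by the caller.

# Print the instance names in the MIG, one per line.
# Assumption: the format expression below yields bare instance names.
list_mig_instances() {
  local mig="$1"
  local region="$2"
  gcloud compute instance-groups managed list-instances "$mig" \
    --region="$region" --format="value(instance.basename())"
}

# Restart every instance in turn, waiting for the group to stabilise
# and then sleeping for a fixed delay before touching the next one.
rolling_restart() {
  local mig="$1"
  local region="$2"
  local delay="${3:-60}"   # seconds between restarts
  local instance
  for instance in $(list_mig_instances "$mig" "$region"); do
    echo "Restarting ${instance}"
    gcloud compute instance-groups managed update-instances "$mig" \
      --region="$region" --instances="$instance" \
      --minimal-action=restart \
      --most-disruptive-allowed-action=restart || return 1
    # Block until the MIG reports itself stable again.
    gcloud compute instance-groups managed wait-until "$mig" \
      --stable --region="$region" || return 1
    # A container-specific health check would go here; stop and alert on failure.
    sleep "$delay"
  done
}
```

After sourcing the file, it could be invoked as `rolling_restart MIG_NAME REGION 60`. Stopping at the first failed command means a bad restart does not cascade to the remaining VMs.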

5 Comments

Thanks. How do I get the list of instances across multiple MIGs? Also, how to dynamically set the region in this case?
Strangely, the above update-instances command with restart actions still replaces the VMs and allots new IPs, instead of just restarting the VMs and keeping the same IPs.
In GCE "restart" actually means deleting the VM, creating a new one and reprogramming of the networking. Having said that, I would expect the VMs to keep the same IP in the process. Would you be able to ask another question specifically about that (for clarity) and add more details, like the full configuration of the IGM, Instance before and after the restart, etc.?
From what I've understood reading the docs, "replace" action deletes the VM and creates a new one, "restart" action should just restart/reboot the machine without changing the machine name or IP, as long as the replacement method for it is "replace" instead of "substitute" (as per cloud.google.com/compute/docs/instance-groups/…).
Yeah, it is misleading. Either way, you are correct that the IP should be preserved. My suspicion is that the instance got auto-repaired for some reason.
1

Try looking into opportunistic updates instead of rolling updates. We have a similar scenario. Rolling updates for a MIG, particularly a stateful one, won't work, as they will bring down at least a minimum number of instances (ideally the number of zones that you have in your MIG). With opportunistic updates, you can try to achieve what you are looking for. Currently we implement it the following way:

  • Set the instance template of the MIG to the new instance template created from new image
gcloud compute instance-groups managed set-instance-template ${instanceName} --template=${instanceName}-${tag}
  • Run a for loop and update each VM with the new template. Google provides a command (wait-until --stable) that pauses the script until the MIG is stable, which ensures that you do not apply updates to another VM until the current instance is stable.
for (( i = 1; i <= number_of_nodes; i++ )); do
  echo "Trying to update Kafka Node${i} with new instance template ${instanceName}-${tag}"
  (
    set -x
    gcloud compute instance-groups managed update-instances ${instanceName}-group \
      --instances=${instanceName}-kafka-node${i}
  )
  echo "Checking for MIG stability"
  (
    set -x
    gcloud compute instance-groups managed wait-until ${instanceName}-group \
      --stable \
      --region=${region}
  )
done

You can have a look at this documentation.
