This container service is in a failed state

Question

All of our AKS clusters have the following error reported in Azure Portal:

This container service is in a failed state. Click here to open a new support request.

It seems we also cannot edit the cluster. When trying to scale out the nodes, I am getting the following error:

Failed to save container service 'test-aks'. Error: Operation is not allowed while cluster is being upgrading or failed in upgrade

When looking at the AKS properties, I see a provisioning state of "Failed".
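The same provisioning state can also be read from the Azure CLI. The sketch below assumes the Azure CLI is installed and you are logged in (`az login`). `get_aks_state` is a hypothetical helper name, and the resource group and cluster names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print the provisioning state of an AKS cluster.
# Assumes an installed, logged-in Azure CLI.
# get_aks_state is a hypothetical helper; its arguments are placeholders.
get_aks_state() {
  local resource_group="$1" cluster_name="$2"
  # --query picks the provisioningState property; -o tsv prints it without JSON quotes.
  az aks show --resource-group "$resource_group" --name "$cluster_name" \
    --query provisioningState -o tsv
}

# Usage (placeholder names):
# get_aks_state myResourceGroup test-aks
```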

We don't know how to troubleshoot this problem.

Did you make any changes to your cluster recently, like upgrading to another version?
– Karishma Tiwari - MSFT, Commented Feb 12, 2019 at 18:39
Use the az aks scale command to scale the cluster nodes using Azure CLI as described here and share the results: learn.microsoft.com/en-us/azure/aks/… It is likely that you exceeded the core quota. Let me know.
– Karishma Tiwari - MSFT, Commented Feb 12, 2019 at 18:45
Any more questions? If it's helpful, you can accept it as the answer.
– Charles Xu, Commented Feb 13, 2019 at 9:19
It was because I submitted an update request to the cluster, but there were no vCPUs available in my subscription. This set the provisioning state to "Failed", with no reason given. I had to increase my quota and rerun the update command.
– Dave New, Commented Feb 13, 2019 at 9:27

Karishma Tiwari - MSFT · Accepted Answer · 2019-02-13 23:55:04Z

Use the az aks scale command to scale the cluster nodes using Azure CLI as described here: https://learn.microsoft.com/en-us/azure/aks/scale-cluster#scale-the-cluster-nodes

az aks show --resource-group myResourceGroup --name myAKSCluster --query agentPoolProfiles

This will show you the descriptive error message in Azure CLI. It is likely that you exceeded the limit of your core quota. More details are discussed in this thread: https://github.com/Azure/AKS/issues/542
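The two steps from this answer, inspecting the node pools and then scaling, can be sketched as the shell functions below. This is a sketch assuming an authenticated Azure CLI; `show_pools` and `scale_pool` are hypothetical helper names, and the pool name and node count are placeholders (check your actual pool name with the first function).

```shell
#!/usr/bin/env bash
# Sketch of the answer's two steps; assumes an installed, logged-in Azure CLI.
# show_pools and scale_pool are hypothetical helpers; all arguments are placeholders.

show_pools() {
  # Lists each agent pool's name, node count, VM size and provisioning state.
  az aks show --resource-group "$1" --name "$2" \
    --query "agentPoolProfiles[].{name:name, count:count, vmSize:vmSize, state:provisioningState}" \
    -o table
}

scale_pool() {
  local resource_group="$1" cluster_name="$2" pool="$3" count="$4"
  # If scaling fails, the CLI prints the underlying error (for example a quota
  # error), which the portal banner does not show.
  az aks scale --resource-group "$resource_group" --name "$cluster_name" \
    --nodepool-name "$pool" --node-count "$count"
}

# Usage (placeholder names and count):
# show_pools myResourceGroup test-aks
# scale_pool myResourceGroup test-aks nodepool1 3
```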

Charles Xu · Accepted Answer · 2019-02-13 03:44:13Z

Regarding the issue you showed:

This container service is in a failed state. Click here to open a new support request.

This also happened to me. Subscriptions usually have limits on how many resources a user can consume. On my side, I can only use 10 vCPUs, so I got this error when I scaled up to more nodes and no vCPUs were left. This may also be the cause in your case, so it is worth checking.
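The quota arithmetic behind this answer can be checked before scaling: the extra vCPUs needed are the added nodes times the vCPUs per node, and the new total must stay within the limit. The `fits_quota` helper below is hypothetical, and the numbers in the usage comments are illustrative (the 10 vCPU limit from this answer and an assumed 2 vCPUs per node).

```shell
#!/usr/bin/env bash
# Sketch: would scaling from current_nodes to target_nodes fit in a vCPU limit?
# fits_quota is a hypothetical helper; all inputs are supplied by the caller.
fits_quota() {
  local current_nodes="$1" target_nodes="$2" vcpus_per_node="$3"
  local used_vcpus="$4" limit_vcpus="$5"
  # Extra vCPUs = added nodes * vCPUs per node.
  local extra=$(( (target_nodes - current_nodes) * vcpus_per_node ))
  local total=$(( used_vcpus + extra ))
  if [ "$total" -le "$limit_vcpus" ]; then
    echo "ok: $total/$limit_vcpus vCPUs"
  else
    echo "over quota: $total/$limit_vcpus vCPUs"
    return 1
  fi
}

# Illustrative: 3 nodes of 2 vCPUs already use 6 of a 10 vCPU limit.
# fits_quota 3 5 2 6 10   # 5 nodes need 10 vCPUs in total, which fits
# fits_quota 3 6 2 6 10   # 6 nodes need 12 vCPUs, over the limit
```

In practice, the used and limit values can be read with `az vm list-usage --location <region> -o table`, which lists current usage and limits per quota (regional total and per VM family); the vCPUs per node depend on the node VM size.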

It was because I submitted an update request to the cluster, but there were no vCPUs available in my subscription. This set the provisioning state to "Failed", with no reason given. I had to increase my quota and rerun the update command. Thanks
@davenewza Do you mean that you increased the quota and it worked? If not, it may be a limit on other resources. You can get more details from the log.
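For the log mentioned here, one option is the resource group's activity log, filtered to failed operations. This is a sketch assuming an authenticated Azure CLI; `failed_ops` is a hypothetical helper, the resource group name is a placeholder, and the one-day window is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: list failed operations from a resource group's activity log.
# Assumes an installed, logged-in Azure CLI; failed_ops is a hypothetical helper.
failed_ops() {
  # --offset 1d looks back one day; the JMESPath query keeps only failed events
  # and shows their timestamp and operation name.
  az monitor activity-log list --resource-group "$1" --offset 1d \
    --query "[?status.value=='Failed'].{time:eventTimestamp, operation:operationName.localizedValue}" \
    -o table
}

# Usage (placeholder name):
# failed_ops myResourceGroup
```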
