
I'm testing my application on a bare-metal Kubernetes cluster (version 1.22.1) and I'm running into an issue when launching it as a Job.

My cluster has two nodes (master and worker), but the worker is cordoned. On the master node, 21 GB of memory is available for the application.

I tried to launch my application as three different Jobs at the same time. Since I set 16 GB of memory as both the resource request and limit, only a single Job started and the remaining two stayed in a Pending state. I have set backoffLimit: 0 on the Jobs.
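Each Job manifest looks roughly like the trimmed sketch below (the names and image are placeholders, not the real ones); the memory request/limit is what shows up as requested: 16000000000 in the events further down:

apiVersion: batch/v1
kind: Job
metadata:
  name: app1                        # app2 / app3 are identical apart from the name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: app
          image: my-app:latest      # placeholder image
          resources:
            requests:
              memory: "16G"         # 16 GB requested
            limits:
              memory: "16G"         # same value as the limit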

NAME            READY   STATUS    RESTARTS   AGE
app1--1-8pp6l   0/1     Pending   0          42s
app2--1-42ssl   0/1     Pending   0          45s
app3--1-gxgwr   0/1     Running   0          46s

After the first Pod completed, only one of the two Pending Pods should have been started (the other should have stayed Pending, since 21 GB is not enough for two 16 GB requests). However, while one did start, the other went into an OutOfMemory status even though no container had ever been started in that Pod.

NAME            READY   STATUS        RESTARTS   AGE
app1--1-8pp6l   0/1     Running       0          90s
app2--1-42ssl   0/1     OutOfmemory   0          93s
app3--1-gxgwr   0/1     Completed     0          94s

The events of the OutOfMemory Pod are as follows:

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  3m41s (x2 over 5m2s)  default-scheduler  0/2 nodes are available: 1 Insufficient memory, 1 node(s) were unschedulable.
  Normal   Scheduled         3m38s                 default-scheduler  Successfully assigned test/app2--1-42ssl to master
  Warning  OutOfmemory       3m38s                 kubelet            Node didn't have enough resource: memory, requested: 16000000000, used: 31946743808, capacity: 37634150400

It seems that the Pod was assigned to the node even though there was no longer enough memory for it once the other Pod had just been started: the kubelet reports a capacity of about 37.6 GB with about 31.9 GB already used (roughly two 16 GB requests, which suggests the just-Completed Pod was still being counted), leaving far less than the requested 16 GB.

I guess this isn't expected behavior of Kubernetes; does anyone know the cause of this issue?

  • You are right, this behaviour is not expected. When I tested locally (the same config as yours: 3 Jobs with limits and requests set), each Job completed when the previous one ended. I see that you have two nodes: do you want to run a Job on a specific one? Why does one of the nodes have the node.kubernetes.io/unreachable: taint? Did you try to wait for app1--1-8pp6l to end and then check? Which Kubernetes solution exactly are you using for bare-metal? The error could be related to the specific solution. Commented Jan 11, 2022 at 20:20
  • I attached the wrong message, sorry. I actually have two nodes and the worker is cordoned (I have edited my post as well). After app1 completed, app2 was still in the OutOfMemory state. I'm using kubeadm to build my k8s cluster. Commented Jan 12, 2022 at 0:48

2 Answers


It's a known issue in the 1.22.x versions; you can find multiple GitHub and Stack Overflow topics about it.

The fix for the issue is included in the 1.23 version:

  • Fix a regression where the Kubelet failed to exclude already completed pods from calculations about how many resources it was currently using when deciding whether to allow more pods. (#104577, @smarterclayton)

So please just upgrade your Kubernetes cluster to the newest stable version.

I hope it will help you, but keep in mind that another, similar issue is open on GitHub even with the fix applied (mentioned about 10 days ago; status as of 13 January 2022):

Linking here for completeness - a similar symptom might get exposed after this fix as described in #106884. The kubelet considers resources for terminating pods to be in use (they are!), but the scheduler ignores terminating pods and schedules new pods. Because the kubelet now considers terminating pods, it rejects those rapidly rescheduled pods.

In that case, probably the only solution is to downgrade to version 1.21.


Can you please post the Pod's YAML?

I had something similar at one of my customers, where they had a typo in the memory limit (860m instead of 860Mi); it's worth a look.
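As an illustration only (a made-up fragment, not your actual spec), this is the kind of slip I mean; with the m suffix the value is interpreted as millibytes, i.e. less than one byte of memory:

resources:
  limits:
    memory: "860m"     # typo: "m" means milli, so this is 0.86 bytes
    # memory: "860Mi"  # intended: 860 mebibytes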
