
Etcd is becoming our performance bottleneck for scalability #32361

@wojtek-t

Description

My recent experiments show that etcd is becoming our performance bottleneck (note that the experiments described below were run with etcd 3.0.4, but using the v2 API).

Test:

  • we are running a 2000-node Kubemark (however with increased QPS limits in the controller manager and scheduler set to 100, and inflight-requests in the apiserver set to 800)
  • we are running the density and load tests on this Kubemark

Outcome:

  • the watch from the apiserver (cacher) to etcd was dropped twice during the test with the following logs:
W0909 03:41:23.740330 3559 reflector.go:330] pkg/storage/cacher.go:194: watch of *api.Pod ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [1043046/1043043]) [1044045]
W0909 03:41:24.746867 3559 cacher.go:463] Terminating all watchers from cacher *api.Pod
W0909 03:41:27.555734 3559 reflector.go:330] pkg/storage/cacher.go:194: watch of *api.Pod ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [1046887/1046856]) [1047886]
W0909 03:41:28.555898 3559 cacher.go:463] Terminating all watchers from cacher *api.Pod
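
For readers less familiar with this code path: the "401: ... outdated and cleared" error means etcd has already compacted the index the cacher tried to resume from, so the cacher has no choice but to terminate all of its watchers, re-list, and start a fresh watch. Below is a minimal Go sketch of that recovery loop; it is an illustration only, and the ListerWatcher/Event names are hypothetical rather than the actual reflector or cacher types.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// errOutdatedIndex stands in for etcd's "401: The event in requested index is
// outdated and cleared" error from the logs above.
var errOutdatedIndex = errors.New("the event in requested index is outdated and cleared")

// Event is a simplified watch event carrying the etcd index it was seen at.
type Event struct {
	Index  uint64
	Object string
}

// ListerWatcher is a hypothetical stand-in for the reflector's source: a full
// list that returns a snapshot index, and a watch that resumes after an index.
type ListerWatcher interface {
	List() (snapshotIndex uint64, err error)
	Watch(fromIndex uint64) (<-chan Event, error)
}

// reflectLoop sketches the recovery path behind the quoted logs: once etcd
// reports that the requested index has been compacted away, the only safe move
// is to drop the watch, re-list, and start watching again from the new index.
func reflectLoop(lw ListerWatcher) {
	for {
		index, err := lw.List()
		if err != nil {
			log.Printf("list failed: %v", err)
			time.Sleep(time.Second)
			continue
		}
		events, err := lw.Watch(index)
		if err != nil {
			log.Printf("watch failed (e.g. %v), re-listing", errOutdatedIndex)
			continue
		}
		for ev := range events {
			index = ev.Index // track the last index actually delivered
			_ = ev.Object    // fan out to the cacher's watchers (omitted)
		}
		// Channel closed: the equivalent of "Terminating all watchers from
		// cacher *api.Pod" - every watcher must restart from a fresh list.
		log.Printf("watch ended at index %d; terminating watchers and re-listing", index)
	}
}

func main() {
	// Wiring up a real ListerWatcher (backed by etcd) is out of scope here.
	_ = reflectLoop
}
```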

Previously we all had a hypothesis that we were simply not processing events fast enough in the apiserver. But after a few recent changes it seems that's no longer the case.

In particular, I expanded the high-water-mark logs for our buffers and found:

  • the highest value for the incoming channel before the first etcd watch drop was:
I0909 03:23:04.883400 3559 etcd_watcher.go:318] watch: 20 objects queued in incoming channel. 

(after the watch drop it went up to 65, but never higher than that)

  • the highest value for the outgoing channel before the first etcd watch drop was:
I0909 03:28:57.772714 3559 etcd_watcher.go:160] watch (*api.Pod): 93 objects queued in outgoing channel. 

[It went to 100 after the watch drop, so we are also close to the limit there, but that's not the biggest problem now.]

Since none of the buffers were full before the first watch drop, all of the above clearly suggests that etcd itself simply wasn't able to deliver watch events fast enough.
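
For context, the high-water-mark logging quoted above follows roughly the pattern sketched below: an atomically maintained maximum that emits a log line only when the channel backlog reaches a new peak. This is a minimal sketch; the HighWaterMark type and the surrounding wiring are illustrative, not the exact code from etcd_watcher.go.

```go
package main

import (
	"log"
	"sync/atomic"
)

// HighWaterMark tracks the largest value ever recorded.
type HighWaterMark int64

// Update records current and returns true if it is a new maximum.
func (hwm *HighWaterMark) Update(current int64) bool {
	for {
		old := atomic.LoadInt64((*int64)(hwm))
		if current <= old {
			return false
		}
		if atomic.CompareAndSwapInt64((*int64)(hwm), old, current) {
			return true
		}
	}
}

func main() {
	// incoming stands in for the buffered channel between the raw etcd watch
	// and the translation loop; the buffer size here is arbitrary.
	incoming := make(chan string, 100)
	var incomingHWM HighWaterMark

	send := func(obj string) {
		incoming <- obj
		// Log only when the backlog reaches a new peak, mirroring the
		// "N objects queued in incoming channel" lines quoted above.
		if incomingHWM.Update(int64(len(incoming))) {
			log.Printf("watch: %d objects queued in incoming channel", len(incoming))
		}
	}

	for i := 0; i < 5; i++ {
		send("event")
	}
}
```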

Potential problems are:

  • we are sending too much data - if so, protobufs would be a solution for it
  • etcd itself is too slow - then maybe etcd3 will make it faster (see the watch sketch below)
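
To make the second option more concrete: the etcd v3 API serves watches over a single gRPC stream with protobuf-encoded events and lets a client resume from an explicit revision, so it touches both bullets at once. Below is a minimal clientv3 watch sketch; the endpoint, key prefix, and starting revision are placeholders, and the import path is the one used in the etcd 3.0.x era (newer releases use go.etcd.io/etcd/client/v3).

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Watch everything under the pods prefix, resuming from a known revision
	// (the prefix and revision are placeholders, not values from this test).
	watchCh := cli.Watch(context.Background(), "/registry/pods/",
		clientv3.WithPrefix(), clientv3.WithRev(1))

	for resp := range watchCh {
		// If the requested revision has already been compacted, the watch is
		// canceled with an error (the v3 analogue of the "outdated and
		// cleared" error above) and the client has to re-list.
		if err := resp.Err(); err != nil {
			log.Printf("watch canceled: %v", err)
			return
		}
		for _, ev := range resp.Events {
			log.Printf("%s %s (revision %d)", ev.Type, ev.Kv.Key, ev.Kv.ModRevision)
		}
	}
}
```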

@kubernetes/sig-scalability @gmarek @xiang90 @hongchaodeng @timothysc @lavalamp @fgrzadkowski

Labels

  • area/etcd
  • sig/api-machinery
  • sig/scalability
