I am running a Patroni cluster (3.4) on Linux with an etcd cluster. Normally the cluster runs perfectly fine, but sometimes I get errors saying that the request to an etcd server failed (ReadTimeoutError, NewConnectionError, ConnectTimeoutError).
etcd: 10.100.10.4, 10.100.11.3, 10.100.11.5
Patroni/PostgreSQL nodes: 10.100.10.10, 10.100.11.6
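The three exceptions come from urllib3 (the log shows `urllib3.connection.HTTPConnection` underneath Patroni's etcd client) and point to different failure modes. The sketch below is a hypothetical triage helper, not part of Patroni: it pulls the etcd endpoint and exception name out of each `ERROR` line and attaches a rough, general description of what that exception usually means at the TCP/HTTP level, so errors can be counted per etcd member. The function name, the category texts and the log path argument are all assumptions for illustration.

```python
import re
import sys
from collections import Counter

# Hypothetical helper: the descriptions are general urllib3/TCP semantics,
# not taken from Patroni's documentation.
CAUSES = {
    "ReadTimeoutError": "TCP connected, but no HTTP response within the read timeout",
    "ConnectTimeoutError": "no reply to the TCP connect within the connect timeout",
    "NewConnectionError": "connection could not be established "
                          "(e.g. [Errno 111] Connection refused)",
}

# Matches "Request to server http://10.100.11.3:2379 failed: ..." as well as
# "Failed to get list of machines from http://10.100.11.5:2379/v2: ..."
SERVER_RE = re.compile(r"http://([\d.]+:\d+)")


def classify(line: str):
    """Return (server, error_type, cause) for an ERROR line, else None."""
    if " ERROR: " not in line:
        return None
    server = SERVER_RE.search(line)
    for error_type, cause in CAUSES.items():
        if error_type in line:
            return (server.group(1) if server else None, error_type, cause)
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python classify_etcd_errors.py /path/to/patroni.log
    counts = Counter()
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for line in log:
            hit = classify(line)
            if hit:
                counts[hit] += 1
    for (server, error_type, cause), n in counts.most_common():
        print(f"{n:5d}  {server}  {error_type}: {cause}")
```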
Patroni log (DEBUG level) from 10.100.10.10:

```
2024-04-21 04:45:42,868 DEBUG: Writing {"conn_url":"postgres://10.100.10.10:5432/postgres","api_url":"http://10.100.10.10:8008/patroni","state":"running","role":"master","version":"3.3.0","xlog_location":1642347112,"timeline":5} to key /db/mycluster/members/postgresql0 ttl=30 dir=False append=False
2024-04-21 04:45:42,869 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:42,871 DEBUG: http://10.100.11.3:2379 "PUT /v2/keys/db/mycluster/members/postgresql0 HTTP/1.1" 200 790
2024-04-21 04:45:42,871 INFO: no action. I am (postgresql0), the leader with the lock
2024-04-21 04:45:46,136 DEBUG: Issuing read for key /db/mycluster/ with args {'recursive': True, 'quorum': False, 'retry': <patroni.utils.Retry object at 0x7f24b8e5bd00>}
2024-04-21 04:45:46,136 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:46,138 DEBUG: http://10.100.11.3:2379 "GET /v2/keys/db/mycluster/?recursive=true&quorum=false HTTP/1.1" 200 None
2024-04-21 04:45:46,139 DEBUG: API thread: 10.100.11.6 - - "GET /cluster HTTP/1.1" 200 - latency: 3.354 ms
2024-04-21 04:45:51,981 DEBUG: Issuing read for key /db/mycluster/ with args {'recursive': True, 'quorum': False, 'retry': <patroni.utils.Retry object at 0x7f24b8e5b7c0>}
2024-04-21 04:45:51,983 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:51,987 DEBUG: http://10.100.11.3:2379 "GET /v2/keys/db/mycluster/?recursive=true&quorum=false HTTP/1.1" 200 None
2024-04-21 04:45:51,989 DEBUG: API thread: 10.100.11.6 - - "GET /cluster HTTP/1.1" 200 - latency: 16.522 ms
2024-04-21 04:45:52,859 DEBUG: Issuing read for key /db/mycluster/ with args {'recursive': True, 'quorum': False, 'retry': <patroni.utils.Retry object at 0x7f24b8e5be50>}
2024-04-21 04:45:52,861 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:56,198 ERROR: Request to server http://10.100.11.3:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'10.100.11.3\', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'10.100.11.3\', port=2379): Read timed out. (read timeout=3.332937417338447)"))')
2024-04-21 04:45:56,198 INFO: Reconnection allowed, looking for another server.
2024-04-21 04:45:56,198 INFO: Retrying on http://10.100.10.4:2379
2024-04-21 04:45:56,199 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:56,199 DEBUG: Starting new HTTP connection (1): 10.100.10.4:2379
2024-04-21 04:45:56,200 ERROR: Request to server http://10.100.10.4:2379 failed: MaxRetryError("HTTPConnectionPool(host='10.100.10.4', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f24b8e5bfa0>: Failed to establish a new connection: [Errno 111] Connection refused'))")
2024-04-21 04:45:56,200 INFO: Reconnection allowed, looking for another server.
2024-04-21 04:45:56,200 INFO: Retrying on http://10.100.11.5:2379
2024-04-21 04:45:56,200 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:56,200 DEBUG: Starting new HTTP connection (1): 10.100.11.5:2379
2024-04-21 04:45:57,870 ERROR: Request to server http://10.100.11.5:2379 failed: MaxRetryError("HTTPConnectionPool(host='10.100.11.5', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f24b8e5bd30>, 'Connection to 10.100.11.5 timed out. (connect timeout=1.6666666666666667)'))")
2024-04-21 04:45:57,870 INFO: Reconnection allowed, looking for another server.
2024-04-21 04:45:57,871 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:57,871 DEBUG: Starting new HTTP connection (1): 10.100.11.5:2379
2024-04-21 04:45:59,540 ERROR: Failed to get list of machines from http://10.100.11.5:2379/v2: MaxRetryError("HTTPConnectionPool(host='10.100.11.5', port=2379): Max retries exceeded with url: /v2/machines (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f24b8e5bb20>, 'Connection to 10.100.11.5 timed out. (connect timeout=1.6666666666666667)'))")
2024-04-21 04:45:59,541 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:59,541 DEBUG: Starting new HTTP connection (1): 10.100.11.3:2379
2024-04-21 04:46:01,210 ERROR: Failed to get list of machines from http://10.100.11.3:2379/v2: MaxRetryError("HTTPConnectionPool(host='10.100.11.3', port=2379): Max retries exceeded with url: /v2/machines (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f24b8e5b850>, 'Connection to 10.100.11.3 timed out. (connect timeout=1.6666666666666667)'))")
2024-04-21 04:46:01,211 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:46:01,211 DEBUG: Starting new HTTP connection (1): 10.100.10.4:2379
2024-04-21 04:46:01,212 ERROR: Failed to get list of machines from http://10.100.10.4:2379/v2: MaxRetryError("HTTPConnectionPool(host='10.100.10.4', port=2379): Max retries exceeded with url: /v2/machines (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f24b8e5bdc0>: Failed to establish a new connection: [Errno 111] Connection refused'))")
2024-04-21 04:46:01,212 DEBUG: Failed to update list of etcd nodes: EtcdException('Could not get the list of servers, maybe you provided the wrong host(s) to connect to?')
2024-04-21 04:46:01,484 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:46:01,484 DEBUG: Starting new HTTP connection (1): 10.100.11.3:2379
2024-04-21 04:46:02,486 ERROR: Request to server http://10.100.11.3:2379 failed: MaxRetryError("HTTPConnectionPool(host='10.100.11.3', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f24bb745160>, 'Connection to 10.100.11.3 timed out. (connect timeout=1.0)'))")
2024-04-21 04:46:02,486 INFO: Reconnection allowed, looking for another server.
```

The firewall should not be the problem, but could this be a timeout issue?
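On the timeout question, the numbers in the log look consistent with a retry budget being split evenly across the three etcd members. Assuming Patroni's default `retry_timeout` of 10 seconds, 10 / 3 ≈ 3.33 s is close to the logged `read timeout=3.332937417338447` (slightly lower, presumably because part of the budget had already elapsed), half of that matches the logged `connect timeout=1.6666666666666667` exactly, and the later `connect timeout=1.0` would fit a 1-second floor. The function below only reproduces that arithmetic under those assumptions; it is a hypothetical reconstruction, not Patroni's code.

```python
# Hypothetical reconstruction of the logged etcd timeouts. Assumptions (not
# taken from Patroni's source): the retry_timeout budget is split evenly over
# the etcd members, and the connect timeout is half of the per-node timeout
# with a floor of 1 second.

def per_node_timeouts(retry_timeout: float, etcd_nodes: int):
    """Return (read_timeout, connect_timeout) per etcd member, in seconds."""
    read_timeout = retry_timeout / etcd_nodes
    connect_timeout = max(1.0, read_timeout / 2.0)
    return read_timeout, connect_timeout


if __name__ == "__main__":
    # 10 s is Patroni's default retry_timeout; three etcd members as above.
    read_t, connect_t = per_node_timeouts(10.0, 3)
    print(f"read={read_t!r}s connect={connect_t!r}s")
```

If this reading is right, one slow or unreachable member can use up a third of the budget before the client moves on to the next one. No timeout value explains the `[Errno 111] Connection refused` from 10.100.10.4:2379, though: a refusal usually means nothing was listening on that address and port at that moment, or that a firewall rule actively rejected the connection.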
This error only appears on one node (10.100.10.10).
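Since the errors only show up on 10.100.10.10, one way to narrow things down is to probe the etcd client port from that node independently of Patroni and line the results up with the Patroni log timestamps. The script below is a hypothetical diagnostic (`probe`, `watch` and `ETCD_ENDPOINTS` are illustrative names): it does a plain TCP connect to each etcd member listed above on port 2379, the client port seen in the log, and reports `ok`, `refused`, `timeout` or another socket error together with the elapsed time. It checks TCP reachability only, not whether etcd answers HTTP requests in time.

```python
import socket
import time

# Hypothetical diagnostic, meant to run on 10.100.10.10 (the node that logs
# the errors). Hosts are the etcd members from the question; 2379 is the
# client port that appears in the Patroni log.
ETCD_ENDPOINTS = [("10.100.10.4", 2379), ("10.100.11.3", 2379), ("10.100.11.5", 2379)]


def probe(host: str, port: int, timeout: float = 1.0):
    """Try a plain TCP connect; return (status, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            status = "ok"
    except ConnectionRefusedError:
        status = "refused"        # usually nothing listening, or a REJECT rule
    except socket.timeout:
        status = "timeout"        # no answer within `timeout` seconds
    except OSError as exc:
        status = f"error: {exc}"  # e.g. no route to host
    return status, time.monotonic() - start


def watch(rounds: int, interval: float = 5.0) -> None:
    """Probe every endpoint `rounds` times, printing one timestamped line each."""
    for _ in range(rounds):
        for host, port in ETCD_ENDPOINTS:
            status, elapsed = probe(host, port)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {host}:{port} {status} ({elapsed * 1000:.1f} ms)")
        time.sleep(interval)
```

For example, `watch(rounds=720, interval=5)` run on 10.100.10.10 probes the three members for a bit over an hour. A `refused` or `timeout` for a member at the same moment Patroni logs an error would point at the network path or the etcd process on that member rather than at Patroni itself.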
If you need more information, please let me know!
Thank you!