Description
In Calico 3.31.4 with eBPF and VXLAN, we are seeing many connection refused errors for services that receive traffic via NodePorts when the calico-node pods on the nodes where the NodePort traffic arrives are restarted.
The problem does not occur under the same conditions in Calico 3.30.6. It also does not occur when calico-node pods are restarted on nodes that receive no NodePort traffic (but do have workload pods present).
The interruption occurs shortly before the calico-node pod becomes ready and affects almost all traffic for about 5 seconds. The test load is a simple, small HTTP GET service receiving 2000 req/s from around 5 source IPs (without keepalive).
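For anyone trying to reproduce this, the following is a minimal probe sketch, not the tooling used for the report above. It opens a fresh connection per request (no keepalive) and tallies how many attempts are refused. The function names `probe` and `run` are hypothetical, and the node address and NodePort you pass in are placeholders for your own environment. It is single-threaded, so it will generate far less than 2000 req/s; it is only meant to make the refused window visible.

```python
import http.client
import time
from collections import Counter


def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Issue one HTTP GET on a fresh connection and classify the outcome."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # "Connection: close" mimics clients that do not use keepalive.
        conn.request("GET", "/", headers={"Connection": "close"})
        conn.getresponse().read()
        return "ok"
    except ConnectionRefusedError:
        # The failure mode described in this issue.
        return "refused"
    except (OSError, http.client.HTTPException):
        # Timeouts, resets and malformed responses are counted separately.
        return "error"
    finally:
        conn.close()


def run(host: str, port: int, duration_s: float, interval_s: float = 0.01) -> Counter:
    """Probe repeatedly for duration_s seconds and tally the outcomes."""
    counts: Counter = Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        counts[probe(host, port)] += 1
        time.sleep(interval_s)
    return counts
```

Calling `run(node_ip, node_port, 60)` while restarting the calico-node pod on that node should show a non-zero `refused` count if the behaviour is reproduced; running several copies from different source IPs gets closer to the traffic pattern described above.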
The FelixConfiguration is:

    spec:
      bpfConnectTimeLoadBalancing: Disabled
      bpfEnabled: true
      bpfExternalServiceMode: DSR
      bpfHostNetworkedNATWithoutCTLB: Enabled
      bpfKubeProxyIptablesCleanupEnabled: true
      bpfKubeProxyMinSyncPeriod: 1s
      bpfLogLevel: "Off"
      bpfMapSizeConntrack: 4096000
      bpfMapSizeConntrackCleanupQueue: 500000
      interfaceExclude: lo,docker0
      ipipEnabled: false
      logSeverityScreen: warning
      reportingInterval: 0s

This is of rather small impact for us, as we drain all nodes of traffic before we restart calico pods for upgrades, so I currently cannot spend a lot of time trying to narrow down the issue further.
I have not tested whether other forms of traffic between nodes/pods are affected.
Expected Behavior
Restarting a calico-node pod does not cause traffic interruptions.