calico 3.31 ebpf, vxlan connection refused on calico-node pod startup #12192

In calico 3.31.4 with eBPF and VXLAN we are seeing many connection refused errors for services that receive traffic via nodeports when the calico-node pods on the nodes where that nodeport traffic arrives are restarted.

The problem does not occur under the same conditions in calico 3.30.6. It also does not occur when calico-node pods are restarted on nodes that receive no nodeport traffic (but do have workload pods present).

The interruption occurs shortly before the calico-node pod becomes ready and affects almost all traffic for about 5 seconds. This was observed with a simple, small HTTP GET service receiving 2,000 req/s from around 5 source IPs (without keepalive).
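For anyone trying to reproduce this, a minimal probe sketch follows. It is not the load generator used for the numbers above; it opens a fresh TCP connection per request (so no keepalive), classifies each attempt, and counts outcomes per second so that a window of refused connections stands out. The function names, the default rate, and the timeout are illustrative assumptions, and a single sequential loop like this will not reach 2,000 req/s.

```python
import socket
import time
from collections import Counter, defaultdict


def probe_once(host, port, timeout=1.0):
    """Open one fresh TCP connection (no keepalive) and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            request = b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n"
            sock.sendall(request)
            # Any byte back counts as success; an immediate close counts as "empty".
            return "ok" if sock.recv(1) else "empty"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


def bucket_by_second(samples):
    """Group (timestamp, outcome) samples into per-second outcome counts."""
    buckets = defaultdict(Counter)
    for ts, outcome in samples:
        buckets[int(ts)][outcome] += 1
    return dict(buckets)


def run(host, port, duration=30.0, rate=50):
    """Probe sequentially at roughly `rate` attempts per second for `duration` seconds."""
    samples = []
    interval = 1.0 / rate
    end = time.time() + duration
    while time.time() < end:
        started = time.time()
        samples.append((started, probe_once(host, port)))
        # Sleep off whatever is left of this attempt's time slot.
        time.sleep(max(0.0, interval - (time.time() - started)))
    return bucket_by_second(samples)
```

To use it, call `run(node_ip, node_port)` against the node whose calico-node pod is about to be restarted, then print the per-second counters in sorted order. Both arguments are placeholders for your own node address and nodeport.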

The felixconfiguration is:

spec:
  bpfConnectTimeLoadBalancing: Disabled
  bpfEnabled: true
  bpfExternalServiceMode: DSR
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfKubeProxyIptablesCleanupEnabled: true
  bpfKubeProxyMinSyncPeriod: 1s
  bpfLogLevel: "Off"
  bpfMapSizeConntrack: 4096000
  bpfMapSizeConntrackCleanupQueue: 500000
  interfaceExclude: lo,docker0
  ipipEnabled: false
  logSeverityScreen: warning
  reportingInterval: 0s

This is of rather small impact for us, as we drain all nodes of traffic before we restart calico pods for upgrades, so I currently cannot spend a lot of time trying to narrow down the issue further.

I have not tested whether other forms of traffic between nodes/pods are affected.

Expected Behavior

Restarting a calico-node pod does not cause traffic interruptions.
