calico 3.31 ebpf, vxlan connection refused on calico-node pod startup #12192

@juliantaylor

Description

In calico 3.31.4 using ebpf and vxlan we are seeing a lot of connection refused errors for services receiving traffic via nodeports when the calico-node pods are restarted on the nodes where the nodeport traffic arrives.

The problem does not occur under the same conditions in calico 3.30.6, nor when calico-node pods are restarted on nodes where no nodeport traffic is arriving (but with workload pods present).

The interruption occurs shortly before the calico-node pod becomes ready and affects almost all traffic for about 5 seconds. The test load is a simple, small HTTP GET service receiving 2,000 req/s from around 5 source IPs (without keepalive).
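The measurement described above could be approximated with a small probe like the following. This is only a minimal sketch, not the tooling used for the report: the function names `probe_once` and `run_probe` are hypothetical, the probe is sequential (so it will not reach 2,000 req/s by itself), and it assumes the service answers a plain `GET /`. Each request opens a fresh TCP connection, mirroring the "without keepalive" condition, and refused connections are counted separately from timeouts.

```python
import socket
import time
from collections import Counter


def probe_once(host, port, timeout=1.0):
    """Open a new TCP connection, send one HTTP GET, and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # "Connection: close" ensures no keepalive reuse between requests.
            request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            data = sock.recv(64)
            return "ok" if data.startswith(b"HTTP/") else "bad_response"
    except ConnectionRefusedError:
        # The error class this report is about.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "other_error"


def run_probe(host, port, duration_s=10.0, rate_per_s=100.0):
    """Probe sequentially at roughly rate_per_s and return a Counter of outcomes."""
    interval = 1.0 / rate_per_s
    results = Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        started = time.monotonic()
        results[probe_once(host, port)] += 1
        # Sleep off whatever is left of this request's time slot.
        remaining = interval - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)
    return results
```

Running something like `run_probe(node_ip, node_port, duration_s=60)` from a few client machines while the calico-node pod on the receiving node restarts should show whether a burst of `refused` results appears shortly before the pod becomes ready; `node_ip` and `node_port` are placeholders for the affected node address and nodeport.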

The felixconfiguration is:

    spec:
      bpfConnectTimeLoadBalancing: Disabled
      bpfEnabled: true
      bpfExternalServiceMode: DSR
      bpfHostNetworkedNATWithoutCTLB: Enabled
      bpfKubeProxyIptablesCleanupEnabled: true
      bpfKubeProxyMinSyncPeriod: 1s
      bpfLogLevel: "Off"
      bpfMapSizeConntrack: 4096000
      bpfMapSizeConntrackCleanupQueue: 500000
      interfaceExclude: lo,docker0
      ipipEnabled: false
      logSeverityScreen: warning
      reportingInterval: 0s

This is of rather small impact for us, as we drain all nodes of traffic before we restart calico pods for upgrades, so I currently cannot spend a lot of time trying to narrow down the issue further.

I have not tested whether other forms of traffic between nodes/pods are affected.

Expected Behavior

Restarting a calico-node pod does not cause traffic interruptions.

Metadata

Labels: area/bpf (eBPF Dataplane issues)