I am experimenting with netfilter in Docker containers. I have three containers: one acting as a "router" and two as "endpoints". They are connected via pipework, so an external (host) bridge exists for each endpoint<->router connection. Something like this:
    containerA (eth1) -- hostbridgeA -- (eth1) containerR
    containerB (eth1) -- hostbridgeB -- (eth2) containerR

Then within the "router" container containerR, I have a bridge br0 configured like so:
    bridge name     bridge id               STP enabled     interfaces
    br0             8000.3a047f7a7006       no              eth1
                                                            eth2

I have net.bridge.bridge-nf-call-iptables=0 set on the host, as that setting was interfering with some of my other tests.
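(For reference, this is how I confirm that state on the host; a trivial sketch using only stock lsmod/sysctl. Note the net.bridge.* sysctls only exist while the br_netfilter module is loaded:)

    # Confirm br_netfilter is loaded and the bridge-nf knob is off.
    # The net.bridge.* sysctls only exist while br_netfilter is loaded.
    lsmod | grep br_netfilter
    sysctl net.bridge.bridge-nf-call-iptables   # expect: ... = 0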
containerA has IP 192.168.10.1/24 and containerB has 192.168.10.2/24.
I then have a very simple ruleset that traces forwarded packets:
    flush ruleset

    table bridge filter {
        chain forward {
            type filter hook forward priority 0; policy accept;
            meta nftrace set 1
        }
    }

With this, I find that only ARP packets are traced, and not ICMP packets. In other words, if I run nft monitor while containerA is pinging containerB, I can see the ARP packets traced, but not the ICMP echoes. This surprises me, because based on my understanding of nftables' bridge-family chain types, the only time a packet wouldn't go through the forward stage is when it's delivered locally, via input, to the host (in this case containerR). Per the Linux netfilter packet flow diagram, I would still expect ICMP packets to take the forward path, just like ARP. I do see the packets if I trace the prerouting and postrouting hooks.

So my question is: what's happening here? Is there a flowtable or some other short-circuit I'm not aware of? Is it specific to container networking and/or Docker? I can check with VMs rather than containers, but I'm interested to hear whether others are aware of this or have encountered it themselves.
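(When I say I trace the other hooks, I mean a ruleset along these lines, hooking every bridge-family stage at once. This is a sketch: the table and chain names are my own, and priority -10 is just an arbitrary early value:)

    flush ruleset

    table bridge tracer {
        # One base chain per bridge-family hook, all doing nothing but tracing.
        chain pre  { type filter hook prerouting  priority -10; policy accept; meta nftrace set 1 }
        chain in   { type filter hook input       priority -10; policy accept; meta nftrace set 1 }
        chain fwd  { type filter hook forward     priority -10; policy accept; meta nftrace set 1 }
        chain out  { type filter hook output      priority -10; policy accept; meta nftrace set 1 }
        chain post { type filter hook postrouting priority -10; policy accept; meta nftrace set 1 }
    }

Loading that with nft -f and running nft monitor trace then shows, per trace id, exactly which hooks a given packet traverses.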
Edit: I have since created a similar setup with a set of Alpine Linux virtual machines in VirtualBox. There, ICMP packets do reach the forward chain, so it seems something on the host, or in Docker, is interfering with my expectations. I will leave this unanswered until I, or somebody else, can identify the reason, in case it's useful for others to know.
Thanks!
Minimal reproducible example
For this I'm using Alpine Linux 3.19.1 in a VM, with the community repository enabled in /etc/apk/repositories.
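(The repositories file looks something like this; a sketch, since the mirror URL varies per install. The point is that the v3.19 community line is present and uncommented:)

    # /etc/apk/repositories: main plus community enabled
    https://dl-cdn.alpinelinux.org/alpine/v3.19/main
    https://dl-cdn.alpinelinux.org/alpine/v3.19/community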
    # Prerequisites on the host
    apk add bridge bridge-utils iproute2 docker openrc
    service docker start
    # When using Linux bridges instead of openvswitch, disable iptables on bridges
    sysctl net.bridge.bridge-nf-call-iptables=0

    # Pipework to let me avoid docker's IPAM
    git clone https://github.com/jpetazzo/pipework.git
    cp pipework/pipework /usr/local/bin/

    # Create two containers, each on their own network (bridge)
    pipework brA $(docker run -itd --name hostA alpine:3.19) 192.168.10.1/24
    pipework brB $(docker run -itd --name hostB alpine:3.19) 192.168.10.2/24

    # Create the bridge-filtering container, then connect it to both of the other networks
    R=$(docker run --cap-add NET_ADMIN -itd --name hostR alpine:3.19)
    pipework brA -i eth1 $R 0/0
    pipework brB -i eth2 $R 0/0
    # Note: `hostR` doesn't have/need an IP address on the bridge for this example

    # Add bridge tools and netfilter to the bridging container
    docker exec hostR apk add bridge bridge-utils nftables
    docker exec hostR brctl addbr br
    docker exec hostR brctl addif br eth1 eth2
    docker exec hostR ip link set dev br up

    # hostA should be able to ping hostB
    docker exec hostA ping -c 1 192.168.10.2
    # 64 bytes from 192.168.10.2...

    # Set nftables rules
    docker exec hostR nft add table bridge filter
    docker exec hostR nft add chain bridge filter forward '{type filter hook forward priority 0;}'
    docker exec hostR nft add rule bridge filter forward meta nftrace set 1

    # Now ping hostB from hostA while nft monitor is running...
    docker exec hostA ping -c 4 192.168.10.2 &
    docker exec hostR nft monitor
    # Ping will succeed, but nft monitor will not show any echo-request/echo-reply
    # packets traced, only ARPs. Example:
    trace id abc bridge filter forward packet: iif "eth2" oif "eth1" ether saddr ... daddr ... arp operation request
    trace id abc bridge filter forward rule meta nftrace set 1 (verdict continue)
    trace id abc bridge filter forward verdict continue
    trace id abc bridge filter forward policy accept
    ...
    trace id def bridge filter forward packet: iif "eth1" oif "eth2" ether saddr ... daddr ... arp operation reply
    trace id def bridge filter forward rule meta nftrace set 1 (verdict continue)
    trace id def bridge filter forward verdict continue
    trace id def bridge filter forward policy accept

    # Add tracing in prerouting and the ICMP packets are visible:
    docker exec hostR nft add chain bridge filter prerouting '{type filter hook prerouting priority 0;}'
    docker exec hostR nft add rule bridge filter prerouting meta nftrace set 1

    # Run again
    docker exec hostA ping -c 4 192.168.10.2 &
    docker exec hostR nft monitor
    # Ping still works (obviously), but now its packets are visible in prerouting
    # and then never reach the forward chain, while ARP shows up in both. Example:
    trace id abc bridge filter prerouting packet: iif "eth1" ether saddr ... daddr ... ... icmp type echo-request ...
    trace id abc bridge filter prerouting rule meta nftrace set 1 (verdict continue)
    trace id abc bridge filter prerouting verdict continue
    trace id abc bridge filter prerouting policy accept
    ...
    trace id def bridge filter prerouting packet: iif "eth2" ether saddr ... daddr ... ... icmp type echo-reply ...
    trace id def bridge filter prerouting rule meta nftrace set 1 (verdict continue)
    trace id def bridge filter prerouting verdict continue
    trace id def bridge filter prerouting policy accept
    ...
    trace id 123 bridge filter prerouting packet: iif "eth1" ether saddr ... daddr ... ... arp operation request
    trace id 123 bridge filter prerouting rule meta nftrace set 1 (verdict continue)
    trace id 123 bridge filter prerouting verdict continue
    trace id 123 bridge filter prerouting policy accept
    trace id 123 bridge filter forward packet: iif "eth1" oif "eth2" ether saddr ... daddr ... arp operation request
    trace id 123 bridge filter forward rule meta nftrace set 1 (verdict continue)
    trace id 123 bridge filter forward verdict continue
    trace id 123 bridge filter forward policy accept
    ...
    trace id 456 bridge filter prerouting packet: iif "eth2" ether saddr ... daddr ... ... arp operation reply
    trace id 456 bridge filter prerouting rule meta nftrace set 1 (verdict continue)
    trace id 456 bridge filter prerouting verdict continue
    trace id 456 bridge filter prerouting policy accept
    trace id 456 bridge filter forward packet: iif "eth2" oif "eth1" ether saddr ... daddr ... arp operation reply
    trace id 456 bridge filter forward rule meta nftrace set 1 (verdict continue)
    trace id 456 bridge filter forward verdict continue
    trace id 456 bridge filter forward policy accept
    # Note the matching trace ids across the prerouting and forward chains

I tried this with openvswitch as well, but for simplicity I went with a Linux bridge example, which yields the same result anyway. The only real difference with openvswitch is that net.bridge.bridge-nf-call-iptables=0 isn't needed, IIRC.
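In case someone wants to help hunt for the short-circuit: this is roughly what I've been checking on the host to see whether anything could claim the packets before the container's bridge forward hook fires. A sketch only; output obviously varies per system, and none of these commands are part of the reproduction above:

    # On the Alpine host, not in the containers:
    nft list ruleset                          # any bridge-family tables or flowtables, e.g. from Docker?
    iptables-save 2>/dev/null | head -n 20    # legacy/iptables-nft rules, if present
    lsmod | grep -E 'br_netfilter|nf_flow'    # bridge-netfilter and flowtable offload modules
    sysctl -a 2>/dev/null | grep bridge-nf    # all bridge-nf-call-* knobs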

