
Background

I have a Linux machine with bridge interfaces as shown below...

  ---{prenat}-->                        ---{postnat}-->
  source: 172.25.0.3                    source: 192.0.2.1

+--------------------+                 +-----------------------+
| br-2f0c8e39d468    |------{linux}----| br-dee49672169b       |
| 172.25.0.0/16      |                 | 192.0.2.0/24          |
| Docker Compose     |                 | containerlab Docker   |
| Hosts              |                 | Hosts                 |
+--------------------+                 +-----------------------+

I want to NAT the IPv4 source address of any traffic from br-2f0c8e39d468 to br-dee49672169b using the interface IP address of br-dee49672169b (192.0.2.1).
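
In iptables terms, the intent is roughly the following source NAT (only a sketch of the goal, using the interface and subnet from the diagram above; this is not what I have working yet):

# Rewrite the source of anything from the Compose subnet that leaves via the
# containerlab bridge to that bridge's own address.
sudo iptables -t nat -A POSTROUTING -s 172.25.0.0/16 -o br-dee49672169b \
    -j SNAT --to-source 192.0.2.1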

$ ip route show
default via 10.100.50.1 dev ens160 proto static
10.100.50.0/28 dev ens160 proto kernel scope link src 10.100.50.5
172.16.0.0/16 via 192.0.2.2 dev br-dee49672169b
172.25.0.0/16 dev br-2f0c8e39d468 proto kernel scope link src 172.25.0.1
192.0.2.0/24 dev br-dee49672169b proto kernel scope link src 192.0.2.1
$

Docker Compose bridge

This is my Docker Compose YAML for the Zabbix br-2f0c8e39d468 segment...

version: '3.3'

services:

  # Zabbix database
  zabbix-db:
    container_name: zabbix-db
    image: mariadb:10.11.4
    restart: always
    volumes:
      - ${ZABBIX_DATA_PATH}/zabbix-db/mariadb:/var/lib/mysql:rw
      - ${ZABBIX_DATA_PATH}/zabbix-db/backups:/backups
    command:
      - mariadbd
      - --character-set-server=utf8mb4
      - --collation-server=utf8mb4_bin
      - --default-authentication-plugin=mysql_native_password
    environment:
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
    stop_grace_period: 1m
    networks:
      - statics

  # Zabbix server
  zabbix-server:
    container_name: zabbix-server
    image: zabbix/zabbix-server-mysql:ubuntu-6.4-latest
    restart: always
    ports:
      - 10051:10051
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/alertscripts:/usr/lib/zabbix/alertscripts:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/externalscripts:/usr/lib/zabbix/externalscripts:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/dbscripts:/var/lib/zabbix/dbscripts:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/export:/var/lib/zabbix/export:rw
      - ${ZABBIX_DATA_PATH}/zabbix-server/modules:/var/lib/zabbix/modules:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/enc:/var/lib/zabbix/enc:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/ssh_keys:/var/lib/zabbix/ssh_keys:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/mibs:/var/lib/zabbix/mibs:ro
    environment:
      - MYSQL_ROOT_USER=root
      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
      - DB_SERVER_HOST=zabbix-db
      - ZBX_STARTPINGERS=${ZBX_STARTPINGERS}
    depends_on:
      - zabbix-db
    stop_grace_period: 30s
    sysctls:
      - net.ipv4.ip_local_port_range=1024 65000
      - net.ipv4.conf.all.accept_redirects=0
      - net.ipv4.conf.all.secure_redirects=0
      - net.ipv4.conf.all.send_redirects=0
    networks:
      - statics

  # Zabbix web UI
  zabbix-web:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: zabbix-web
    image: zabbix/zabbix-web-nginx-mysql:ubuntu-6.4-latest
    restart: always
    ports:
      - 9000:8080
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ${ZABBIX_DATA_PATH}/zabbix-web/nginx:/etc/ssl/nginx:ro
      - ${ZABBIX_DATA_PATH}/zabbix-web/modules/:/usr/share/zabbix/modules/:ro
    environment:
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - DB_SERVER_HOST=zabbix-db
      - ZBX_SERVER_HOST=zabbix-server
      - ZBX_SERVER_NAME=Zabbix Docker
      - PHP_TZ=America/Chicago
    depends_on:
      - zabbix-db
      - zabbix-server
    stop_grace_period: 10s
    networks:
      - statics

networks:
  statics:
    driver: macvlan
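
Note that the statics network above uses the macvlan driver rather than the default bridge driver. How Docker actually realizes that network on the host can be double-checked with something like the following (the network name on the host is normally prefixed with the Compose project name; <project> is a placeholder):

# List the networks Compose created, then show the driver and options
# of the one backing "statics".
docker network ls
docker network inspect <project>_statics --format '{{ .Driver }} {{ json .Options }}'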

containerlab bridge

This is the YAML for the Cisco CSR1000V containerlab bridged segment on 192.0.2.0/24...

name: rr

mgmt:
  network: statics
  ipv4-subnet: 192.0.2.0/24
  ipv4-range: 192.0.2.0/24

# ACCESS for linux:
#   docker exec -it <container_name> bash
# ACCESS for frr:
#   docker exec -it <container_name> vtysh
# ACCESS for srlinux:
#   docker exec -it <container_name> sr_cli
# ACCESS for vr-csr:
#   telnet <container_ip> 5000

topology:
  nodes:
    csr01:
      kind: vr-csr
      image: vrnetlab/vr-csr:16.12.08
      startup-config: config/csr01/config.txt
      mgmt-ipv4: 192.0.2.2
    csr02:
      kind: vr-csr
      image: vrnetlab/vr-csr:16.12.08
      startup-config: config/csr02/config.txt
      mgmt-ipv4: 192.0.2.3
    csr03:
      kind: vr-csr
      image: vrnetlab/vr-csr:16.12.08
      startup-config: config/csr03/config.txt
      mgmt-ipv4: 192.0.2.6
    PC01:
      kind: linux
      image: ubuntu:22.04
      mgmt-ipv4: 192.0.2.4
    PC02:
      kind: linux
      image: ubuntu:22.04
      mgmt-ipv4: 192.0.2.5
      #image: netenglabs/suzieq:latest
    # Manual creation of bridge required before deploying the topology
    #   sudo brctl addbr br-clab
    br-clab:
      kind: bridge

  links:
    - endpoints: ["csr01:eth3", "csr02:eth3"]
    - endpoints: ["csr01:eth4", "csr02:eth4"]
    - endpoints: ["csr03:eth3", "csr01:eth5"]
    - endpoints: ["csr03:eth4", "csr02:eth5"]
    - endpoints: ["PC01:eth1", "csr01:eth6"]
    - endpoints: ["PC02:eth1", "csr02:eth6"]
    - endpoints: ["br-clab:eth1", "csr01:eth2"]
    - endpoints: ["br-clab:eth2", "csr02:eth2"]
    - endpoints: ["br-clab:eth3", "csr03:eth2"]
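
For completeness, the br-clab bridge referenced above has to exist before the lab is deployed; a typical bring-up looks something like this (the topology file name is an assumption):

# Create the plain Linux bridge used by the topology, bring it up,
# then deploy the containerlab topology.
sudo brctl addbr br-clab
sudo ip link set br-clab up
sudo containerlab deploy -t rr.clab.yml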

What I tried

To circumvent the DOCKER-ISOLATION-* chains, I used this...

sudo iptables -I INPUT -i br-2f0c8e39d468 -j ACCEPT
sudo iptables -I FORWARD -i br-2f0c8e39d468 -j ACCEPT
sudo iptables -I FORWARD -o br-2f0c8e39d468 -j ACCEPT

This results in the following iptables rules:

$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere

Chain FORWARD (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (4 references)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             172.25.0.2           tcp dpt:http-alt
ACCEPT     tcp  --  anywhere             172.25.0.3           tcp dpt:zabbix-trapper

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (4 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             /* set by containerlab */
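
Listing the same chains with -v -n --line-numbers also shows packet counters, which is a quick way to confirm whether the ACCEPT rules above are actually being hit:

sudo iptables -L INPUT -v -n --line-numbers
sudo iptables -L FORWARD -v -n --line-numbers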

I used:

$ sudo sysctl net.bridge.bridge-nf-call-iptables=1
$ sudo sysctl net.bridge.bridge-nf-call-arptables=1
$ sudo sysctl -w net.ipv4.ip_forward=1
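
To make these settings survive a reboot they can be persisted under /etc/sysctl.d/ (the file name below is arbitrary; note the net.bridge.* keys only exist while the br_netfilter module is loaded):

sudo tee /etc/sysctl.d/99-bridge-nat.conf >/dev/null <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system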

I also used sudo iptables -t nat -A POSTROUTING -o br-dee49672169b -j MASQUERADE.

When I ping the containerlab container at 192.0.2.2 from the Docker Compose container at 172.25.0.3 and sniff the 192.0.2.0/24 bridge interface on the {linux} host, I see:

19:32:31.870772 IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 10775, seq 0, length 64
19:32:31.870807 IP 192.0.2.2 > 172.25.0.3: ICMP echo reply, id 10775, seq 0, length 64
19:32:32.871777 IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 10775, seq 1, length 64
19:32:32.871811 IP 192.0.2.2 > 172.25.0.3: ICMP echo reply, id 10775, seq 1, length 64
19:32:33.871761 IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 10775, seq 2, length 64
19:32:33.871794 IP 192.0.2.2 > 172.25.0.3: ICMP echo reply, id 10775, seq 2, length 64

As you can see, the source NAT is applied to the echo request toward 192.0.2.2, but the echo reply is addressed to 172.25.0.3 rather than 192.0.2.1, so something is rather broken here.
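
If it helps, the kernel's connection-tracking view of this flow can be dumped on the {linux} host (this needs the conntrack-tools package installed); the ICMP entry should show the original tuple from 172.25.0.3 and the translated reply tuple pointing at 192.0.2.1:

sudo conntrack -L -p icmp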

What commands should I use to implement this NAT correctly?

  • Where/how are you running tcpdump? Where are you running the ping command? Commented Jun 21, 2024 at 21:00
  • @larsks I updated the question Commented Jun 22, 2024 at 18:36
  • Apparently it's because the "reverse DNAT" is applied once the replies have entered the system, regardless of whether the first interface they "got through" is a "bridge port/slave". (I do wonder if it's implemented like this for a certain reason, or if it's just some sort of "miss" that happens to "still work"...) Commented Jun 23, 2024 at 10:32
  • Wait, why is your Docker network using the macvlan driver (without specifying a parent device explicitly)? You neglected to mention that in your question earlier. That is critical information that completely changes the traffic flow. Using the macvlan driver means there isn't any bridge device associated with that network. (I'm not even sure that using macvlan without an explicit parent device leads to a useful configuration.) Commented Jun 24, 2024 at 13:54

1 Answer


Docker by default masquerades all traffic leaving a Docker network.
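
You can see the rules responsible for that directly in the nat table on the host:

iptables -t nat -S POSTROUTING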

I've set up an environment to reproduce your configuration, like this:

networks:
  net172:
    driver_opts:
      com.docker.network.bridge.name: br172
    ipam:
      config:
        - subnet: 172.25.0.0/16
  net192:
    driver_opts:
      com.docker.network.bridge.name: br192
    ipam:
      config:
        - subnet: 192.0.2.0/24

services:
  node-172-0:
    image: docker.io/alpine:latest
    networks:
      net172:
        ipv4_address: 172.25.0.2
    init: true
    command:
      - sh
      - -c
      - |
        apk add tcpdump
        sleep inf
  node-192-0:
    image: docker.io/alpine:latest
    networks:
      net192:
        ipv4_address: 192.0.2.2
    init: true
    command:
      - sh
      - -c
      - |
        apk add tcpdump
        sleep inf
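
If you want to reproduce this, bring the environment up and open a shell in the first container (service names as in the compose file above):

docker compose up -d
docker compose exec node-172-0 sh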

Initially, if I try to ping 192.0.2.2 from node-172-0, it will simply fail:

/ # ping -c1 192.0.2.2
PING 192.0.2.2 (192.0.2.2): 56 data bytes

--- 192.0.2.2 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

This is because the packets never make it to the 192.0.2.0/24 network. Running tcpdump -nn -i br172, we see:

23:01:31.473515 IP 172.25.0.2 > 192.0.2.2: ICMP echo request, id 22, seq 0, length 64 

But running tcpdump -nn -i br192, we never see the packet arrive. That's because of the rules Docker sets up in the FORWARD chain. First we hit this rule:

-A DOCKER-ISOLATION-STAGE-1 -i br172 ! -o br172 -j DOCKER-ISOLATION-STAGE-2 

Which leads us to:

-A DOCKER-ISOLATION-STAGE-2 -o br192 -j DROP 
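
You can confirm this on your own host by dumping both isolation chains:

iptables -S DOCKER-ISOLATION-STAGE-1
iptables -S DOCKER-ISOLATION-STAGE-2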

So the first thing we need to do is allow the kernel to forward packets between these two networks. We can do that by adding a rule to the DOCKER-USER chain, since that gets called before any other Docker rules:

iptables -A DOCKER-USER -i br172 -o br192 -j ACCEPT 

We might as well add one for the reverse direction as well, since we know we're going to need it:

iptables -A DOCKER-USER -i br192 -o br172 -j ACCEPT 

Also, double-check whether the first rule in the DOCKER-USER chain is -j RETURN; if you see this:

# iptables -S DOCKER-USER
-N DOCKER-USER
-A DOCKER-USER -j RETURN

Then you'll need to remove it, because otherwise the ACCEPT rules appended above will never be reached:

iptables -D DOCKER-USER -j RETURN 
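
Alternatively (just a variation on the same idea), you can leave the RETURN in place and insert the two ACCEPT rules ahead of it:

# -I DOCKER-USER 1 inserts at the top of the chain, before the RETURN rule.
iptables -I DOCKER-USER 1 -i br172 -o br192 -j ACCEPT
iptables -I DOCKER-USER 1 -i br192 -o br172 -j ACCEPT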

With these changes in place, we can now successfully ping from node-172-0 to node-192-0:

/ # ping -c1 192.0.2.2
PING 192.0.2.2 (192.0.2.2): 56 data bytes
64 bytes from 192.0.2.2: seq=0 ttl=63 time=0.233 ms

--- 192.0.2.2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.233/0.233/0.233 ms

If on the host we run tcpdump -nn -i any icmp, we see:

  1. The packet comes in on br172:

    23:13:54.292707 vethce5d047 P IP 172.25.0.2 > 192.0.2.2: ICMP echo request, id 33, seq 0, length 64
    23:13:54.292707 br172 In IP 172.25.0.2 > 192.0.2.2: ICMP echo request, id 33, seq 0, length 64
  2. It gets masqueraded when it exits on br192 (because masquerading happens in the POSTROUTING chain, which is part of the output path, not the input path):

    23:13:54.292776 br192 Out IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 33, seq 0, length 64
    23:13:54.292783 veth6c3d3ea Out IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 33, seq 0, length 64

    That's due to the rule that Docker added to the nat POSTROUTING chain:

    -A POSTROUTING -s 172.25.0.0/16 ! -o br172 -j MASQUERADE 
  3. Container node-192-0 sends a reply to the (masqueraded) source address:

    23:13:54.292815 veth6c3d3ea P IP 192.0.2.2 > 192.0.2.1: ICMP echo reply, id 33, seq 0, length 64 
  4. But when this enters br192, the destination address gets de-masqueraded, since connection tracking recognizes it as a reply to the original request (see the conntrack example below):

    23:13:54.292815 br192 In IP 192.0.2.2 > 172.25.0.2: ICMP echo reply, id 33, seq 0, length 64
    23:13:54.292830 br172 Out IP 192.0.2.2 > 172.25.0.2: ICMP echo reply, id 33, seq 0, length 64
    23:13:54.292832 vethce5d047 Out IP 192.0.2.2 > 172.25.0.2: ICMP echo reply, id 33, seq 0, length 64

(NB: The above demonstrates the configuration set up by Docker version 26.1.4.)
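
The reverse translation in step 4 is performed by connection tracking, which recorded the mapping when the MASQUERADE rule matched the first packet, rather than by any rule you can point at in the ruleset. If the conntrack-tools package is installed, you can watch the NAT being applied and reversed while the ping is running; each event shows both the original tuple (172.25.0.2 > 192.0.2.2) and the reply tuple toward the masqueraded address 192.0.2.1:

# Print conntrack events for ICMP as they happen.
conntrack -E -p icmp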

  • Unfortunately there is no POSTROUTING chain in my topology. Perhaps this is because you're simulating with two Docker Compose bridges, but there is only one Docker Compose bridge in my topology. The br-dee49672169b bridge in my question comes from containerlab and it's running some Cisco CSR1000V containers. I will update my question with the full dump of my iptables rules. Commented Jun 24, 2024 at 12:41
  • This would all be correct if you weren't using the macvlan driver on your docker network (which you hadn't mentioned in your original question). Because the masquerade rules we care about are those for the source network, it wouldn't matter that your second network is managed with containerlab. Commented Jun 24, 2024 at 13:56
  • The device is the eth0 on {linux} and this actually works without specifying a parent. There is quite a bit of weird behavior that containerlab introduces, and I'm not sure we can solve this on the site because, after working on this for a while, I think the details of the CSR1000V containers are pretty relevant. I'll accept your answer though. Commented Jun 24, 2024 at 14:04
