An example of NAT routing in docker-compose

Came across this gem. This is an example of setting up NAT inside Docker in a fairly self-explanatory and minimal way. Thanks to Arne Wendt for the insight.

Full code is available on GitHub: https://github.com/wasya-co/docker-nat-router-container-example

Why is this setup not recommended in production? I assume the reason is performance, since the router adds a network hop to every request. However, if production already has a NAT and a physical firewall, a different setup would not save any hops. Dear reader, if you know why such a setup is not recommended for production, please leave a comment.

First, let's look over the docker-compose file:

version: '3.8'
services:
  internal:
    image: alpine:latest
    init: true
    networks:
      routed:
    depends_on:
      - router
    volumes:
      - ./data/resolv.conf:/etc/resolv.conf
  router:
    build: ./router
    init: true
    networks:
      routed:
        priority: 1000
      default:
        priority: 1
    volumes:
      - ./data/resolv.conf:/data/resolv.conf
    # environment:
    #   - ROUTE_NET=${ROUTE_NET}
    #   - ROUTE_GATEWAY=${ROUTE_GATEWAY}
    cap_add:
      - NET_ADMIN
  external:
    init: true
    image: alpine:latest
networks:
  routed:
    driver: macvlan

Here we have set up two networks and three containers. The external container represents the internet and is connected to the default network, which is a bridge network on the host machine.

The routed network is the internal one: a macvlan network without a parent interface, for which Docker automatically creates a virtual interface as parent. The internal container is attached to this network. Containers on this network can communicate with each other but, by default, cannot reach targets outside it. Optionally, you can specify a custom subnet and gateway in the compose file. If left unspecified, Docker assigns a random subnet and IP address.

The router container is attached to both networks. It performs NAT routing from internal to external network, as well as DNS forwarding.

You can specify the routed subnet or gateway. Find the commented sections in docker-compose.yml relating to ROUTE_NET and ROUTE_GATEWAY. To use a specific subnet, set up the IPAM config and pass the subnet in CIDR notation to the router container as ROUTE_NET=<my.su.bn.et/prefix>. To use a specific gateway, set it in the IPAM config and pass it to the router container as ROUTE_GATEWAY.
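
Since ROUTE_NET has to be in CIDR notation, it can help to check the value before starting the stack. Below is a minimal sketch of such a check in plain shell. The function names is_ipv4_cidr and in_range are my own and are not part of the repository, and the check only covers IPv4.

```shell
# Succeeds if $1 is a decimal number of at most 3 digits that is <= $2.
in_range() {
    case $1 in
        '' | *[!0-9]*) return 1 ;;
    esac
    [ "${#1}" -le 3 ] && [ "$1" -le "$2" ]
}

# Succeeds if $1 looks like an IPv4 subnet in CIDR notation, e.g. 10.0.0.0/8.
# (Hypothetical helper, not part of the original router script.)
is_ipv4_cidr() {
    local addr prefix a b c d extra
    case $1 in
        */*/* | */ | /* | *./*) return 1 ;;  # malformed slash or dot placement
        */*) ;;                              # exactly one slash: keep going
        *) return 1 ;;                       # no prefix given
    esac
    addr=${1%/*}
    prefix=${1#*/}
    # split the address on dots; anything beyond four fields lands in "extra"
    IFS=. read -r a b c d extra <<EOF
$addr
EOF
    [ -z "$extra" ] &&
        in_range "$a" 255 && in_range "$b" 255 &&
        in_range "$c" 255 && in_range "$d" 255 &&
        in_range "$prefix" 32
}
```

It could be used as a guard, for example: is_ipv4_cidr "${ROUTE_NET}" || echo "ROUTE_NET is not a CIDR subnet".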

~ * ~ * ~ * ~

Network setup

Docker assigns a gateway address, as described above, to all containers on the internal network. The router assigns itself this gateway address on its internal network interface, using either the explicitly specified address or an address deduced from the interface's subnet.
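
To make the "deduce the gateway from the subnet" step concrete, here is a small sketch that computes the first usable address of a subnet with plain shell arithmetic. The router script itself relies on the netmask tool for this. The function names below are mine, and the sketch assumes IPv4 with a prefix shorter than /31.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the first usable address of the subnet containing $1 (addr/prefix),
# keeping the prefix, e.g. 172.20.0.0/16 -> 172.20.0.1/16.
first_usable() {
    local addr prefix mask net
    addr=${1%/*}
    prefix=${1#*/}
    # network mask as an integer, e.g. /24 -> 0xFFFFFF00
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    # network address = address AND mask; the gateway is one above it
    net=$(( $(ip_to_int "$addr") & mask ))
    echo "$(int_to_ip $(( net + 1 )))/${prefix}"
}
```

For example, first_usable 192.168.5.77/24 prints 192.168.5.1/24.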

NAT routing is performed using iptables rules. Modifying iptables requires the container to be run with the NET_ADMIN capability. Note that routing gives the internal network and its containers access to all other networks attached to the router container!
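
If the iptables calls fail, a missing NET_ADMIN capability is a likely cause. Here is a rough sketch for checking it from inside the container by reading the effective capability mask from /proc. CAP_NET_ADMIN is capability number 12 on Linux. The function names are my own and not part of the router image.

```shell
# Succeeds if the hex capability mask in $1 has CAP_NET_ADMIN (bit 12) set.
# $1 is the value of the CapEff line in /proc/<pid>/status, without "0x".
has_net_admin() {
    [ $(( (0x$1 >> 12) & 1 )) -eq 1 ]
}

# Print the effective capability mask of the current process (Linux only).
current_cap_eff() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}
```

A possible guard at the top of an entrypoint: has_net_admin "$(current_cap_eff)" || echo "NET_ADMIN is missing, iptables will fail".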

dnsmasq forwards DNS queries from containers on the internal subnet to the router's DNS resolver, which is provided by Docker. Docker's DNS resolver only resolves container names on the same network. As the router is attached to both networks, its local resolver can resolve names from both. The router writes and updates a resolv.conf file in the data/ directory, which the internal containers mount as /etc/resolv.conf.
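
The router script further down writes the shared resolv.conf with a dense tac and sed pipeline. As far as I can tell, the awk sketch below produces the same result and is easier to try out on a scratch file: it copies its input and adds a nameserver line for the gateway after each existing nameserver line. The function name and the sample addresses are made up for illustration.

```shell
# Read resolv.conf content on stdin and print it with an extra
# "nameserver <gateway>" line after each existing nameserver line.
# $1 is the gateway address, with or without a /prefix suffix.
add_gateway_nameserver() {
    # ${1%/*} strips a trailing /prefix, if there is one
    awk -v gw="${1%/*}" '{ print } /^nameserver/ { print "nameserver " gw }'
}
```

Trying it with printf 'nameserver 127.0.0.11\noptions ndots:0\n' | add_gateway_nameserver 172.20.0.1/16 shows the gateway entry placed between the two input lines.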

The routing container requires the following packages:

apt update && apt install -y iproute2 iptables dnsmasq jq netmask bsdmainutils conntrack

Perhaps the most complicated part of this setup is the script that does the actual network address translation. NAT is a method routers use to translate private IP addresses on a local network into a public IP address before sending data to the internet, and back again when receiving data. The script is reproduced in full below.

#!/bin/bash
# ENV:
#   - ROUTE_NET
#   - (ROUTE_GATEWAY)
if [ -z "${ROUTE_NET}" ]; then
    echo "no routed network specified by ROUTE_NET; selecting from first interface"
    
    # filter interfaces and get first one from list:
    #   - no loopback interfaces
    #   - no "NOARP" flags
    #   - interface/link is up
    IF_INFO="$(ip -json a | jq -r '[.[] | select(.operstate=="UP" and (.flags | any(.=="LOOPBACK") | not) and (.flags | any(.=="NOARP") | not))][0]')"
    # pull required data from interface
    IF_NAME="$(echo ${IF_INFO} | jq -r .ifname)"
    IF_ADDR_INFO="$(echo ${IF_INFO} | jq -r '[.addr_info | .[] | select(.family=="inet")][0]')"
    IF_ADDR="$(echo ${IF_ADDR_INFO} | jq -r .local)"
    IF_PREFIX="$(echo ${IF_ADDR_INFO} | jq -r .prefixlen)"
    ROUTE_NET=$(netmask ${IF_ADDR}/${IF_PREFIX} | grep -oP '\S*')
    echo "using ${ROUTE_NET} from ${IF_NAME}"
fi
# sanitize routing subnet definition
ROUTE_NET_USER=${ROUTE_NET}
ROUTE_NET=$(netmask ${ROUTE_NET} | grep -oP '\S*')
echo "network to route from ${ROUTE_NET} (${ROUTE_NET_USER})"
# find interface info from subnet info
for IFACE_ADDRS in $(ip -json a | jq -r '.[] as {$ifname, $addr_info} | $addr_info | map("\(.local)/\(.prefixlen);\($ifname)") | .[]'); do
    IFS=';' read -r -a IFACE_ADDRS <<< "${IFACE_ADDRS}"    
    NET="$(netmask ${IFACE_ADDRS[0]} | grep -oP '\S*')"
    if [ "${NET}" == "${ROUTE_NET}" ]; then
        ROUTE_ADDRESS="${IFACE_ADDRS[0]}"
        ROUTE_IF="${IFACE_ADDRS[1]}"
    fi
done
# break if interface not found
if [ ! ${ROUTE_IF} ]; then
    echo "could not get routing interface for net ${ROUTE_NET}"
    echo "known interfaces and addresses:"
    ip -json a | jq -r '.[] as {$ifname, $addr_info} | $addr_info | map("\($ifname) \(.local)/\(.prefixlen)") | .[]' | column -t -s' '
    exit 1
fi
# check for IPv4 forwarding
if [ $(cat /proc/sys/net/ipv4/ip_forward) != 1 ]; then
    echo "ipv4 forwarding is disabled!"
    exit 1
fi
# echo "removing ${ROUTE_ADDRESS} on interface ${ROUTE_IF}"
# ip addr del ${ROUTE_ADDRESS} dev ${ROUTE_IF}
# determine gateway address
if [ ! ${ROUTE_GATEWAY} ]; then
    # deduce gateway address from subnet definition
    # use docker behavior: first usable address in subnet
    ROUTE_NET_LOWER=$(netmask -r ${ROUTE_NET} | sed -E 's/^\s*(.*)-.*/\1/')
    ROUTE_NET_BRC=$(echo ${ROUTE_NET_LOWER} | sed -E 's/([[:digit:]]{1,3}\.){3}//')
    ROUTE_NET_24=$(echo ${ROUTE_NET_LOWER} | grep -oP '([[:digit:]]{1,3}\.){3}')
    let ROUTE_NET_GW=ROUTE_NET_BRC+1
    ROUTE_GATEWAY="${ROUTE_NET_24}${ROUTE_NET_GW}"
fi
# add prefix if not present on gateway address
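if [[ "${ROUTE_GATEWAY}" != *"/"* ]]; then
    ROUTE_NET_SUBNET=$(echo ${ROUTE_NET} | grep -oP '/\d+$')
    echo "adding subnet ${ROUTE_NET_SUBNET} to gateway address"
    # actually append the prefix; the steps below expect address/prefix
    ROUTE_GATEWAY="${ROUTE_GATEWAY}${ROUTE_NET_SUBNET}"
fi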
echo "using gateway address ${ROUTE_GATEWAY}"
# add gateway address to *internal* interface
echo "adding ${ROUTE_GATEWAY} on interface ${ROUTE_IF}"
ip addr add ${ROUTE_GATEWAY} dev ${ROUTE_IF}
# populate shared resolv.conf with own (gateway) address
echo "writing resolv.conf for routed clients in /data/resolv.conf"
tac /etc/resolv.conf | sed "/^nameserver.*/i nameserver $(echo ${ROUTE_GATEWAY} | sed -E 's#/[[:digit:]]+$##')" | tac > /data/resolv.conf
# enable routing
for TARGET_IF in $(ip -json a | jq -r  --arg IFNAME "${ROUTE_IF}" '.[] | select(.ifname!=$IFNAME and (.flags | any(.=="LOOPBACK") | not) and (.flags | any(.=="NOARP") | not)) | .ifname'); do
    echo "enabling NAT and FORWARD from ${ROUTE_IF} to ${TARGET_IF}"
    
    iptables -A FORWARD -o ${TARGET_IF} -i ${ROUTE_IF} -s ${ROUTE_NET} -m conntrack --ctstate NEW -j ACCEPT
    iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    iptables -t nat -I POSTROUTING 1 -o ${TARGET_IF} -j MASQUERADE
done
# run dnsmasq as local DNS forwarder; enable name resolution of containers on *external* networks
echo "starting dnsmasq..."
dnsmasq -q -d &
# execute command; or monitor using conntrack
if [ $# -gt 0 ]; then
    echo "executing supplied command line: ${*}"
    # exec "${@}"
    "${@}"
else
    echo "monitoring NAT connections..."
    conntrack -E
fi

~ * ~ * ~ * ~

Running the examples

There are three examples that you can run in this setup: a positive example, a negative example, and a chatty positive example for debugging and demonstration purposes.

Run the positive example with:

docker-compose -f docker-compose.yml -f examples/up.yml up

The override file it layers on top:

## examples/up.yml
version: '3.8'
services:
  internal:
    tty: true
    command: ['/bin/sh']
  external:
    tty: true
    command: ['/bin/sh']

You can also run the chatty and the negative examples:

docker-compose -f docker-compose.yml  -f examples/hello.yml  up
docker-compose -f docker-compose.yml  -f examples/reverse-fail.yml  up
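
With the positive example running, you can also check the routing by hand from a second terminal. The sketch below is one way to do that, assuming the service names internal and external from the compose file above and the ping that ships with the alpine image. The helper names and the COMPOSE variable are my own. I would expect the first direction to succeed and the reverse one to fail, which is presumably what the negative example demonstrates.

```shell
# Command used to talk to the stack; override it if you use "docker compose".
COMPOSE=${COMPOSE:-docker-compose}

# Send one ping from service $1 to host $2, waiting at most 2 seconds.
ping_from() {
    $COMPOSE exec -T "$1" ping -c 1 -W 2 "$2"
}

# Report reachability in both directions.
check_nat() {
    if ping_from internal external >/dev/null 2>&1; then
        echo "internal -> external: reachable"
    else
        echo "internal -> external: NOT reachable"
    fi
    if ping_from external internal >/dev/null 2>&1; then
        echo "external -> internal: reachable"
    else
        echo "external -> internal: NOT reachable"
    fi
}
```

Run check_nat from the directory that contains docker-compose.yml while the containers are up.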
