An example of NAT routing in docker-compose
I came across this gem: a fairly minimal, self-explanatory example of setting up NAT inside Docker. Thanks to Arne Wendt for the insight.
Full code is available on GitHub: https://github.com/wasya-co/docker-nat-router-container-example
Why is this setup not recommended in production? I'm assuming it is for performance reasons, because it increases the number of network hops that each request has to go through. However, if you already have a NAT and a physical firewall in production, a different setup does not save you any hops. My dear reader, if you have insight as to why such a setup is not recommended for production, please leave it in the comments.
First, let's look over the docker-compose file:
version: '3.8'
services:
  internal:
    image: alpine:latest
    init: true
    networks:
      routed:
    depends_on:
      - router
    volumes:
      - ./data/resolv.conf:/etc/resolv.conf
  router:
    build: ./router
    init: true
    networks:
      routed:
        priority: 1000
      default:
        priority: 1
    volumes:
      - ./data/resolv.conf:/data/resolv.conf
    # environment:
    #   - ROUTE_NET=${ROUTE_NET}
    #   - ROUTE_GATEWAY=${ROUTE_GATEWAY}
    cap_add:
      - NET_ADMIN
  external:
    init: true
    image: alpine:latest
networks:
  routed:
    driver: macvlan
Here we have set up two networks and three containers. The external container represents the internet and is connected to the default network, which is a bridge network on the host machine.
The routed network is the internal one: a macvlan network without a parent interface. The internal container is attached to this network. By default, containers in this network can communicate with each other but cannot reach targets outside the network. Docker automatically creates a virtual interface to act as the parent. Optionally, you can specify a custom subnet and gateway in the compose file. If left unspecified, Docker picks the subnet and IP addresses itself.
The router container is attached to both networks. It performs NAT routing from the internal to the external network, as well as DNS forwarding.
You can specify the routed subnet or gateway. Find the commented sections in docker-compose.yml relating to ROUTE_NET and/or ROUTE_GATEWAY. To use a specific subnet, set up an IPAM config and provide the subnet in CIDR notation to the router container as ROUTE_NET=<my.su.bn.et/prefix>. To use a specific gateway, provide ROUTE_GATEWAY in both the IPAM config and the router container's environment.
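Since ROUTE_NET has to be in CIDR notation, it can be handy to sanity-check the value before handing it to the router container. The following is only a sketch: the is_ipv4_cidr helper and the sample values are my own assumptions and are not part of the original repository. It is written in plain POSIX shell, so it should also run under bash.

```shell
# Hypothetical helper, not part of the original repository.
# Succeeds (status 0) if $1 looks like an IPv4 subnet in CIDR notation.
is_ipv4_cidr() (
    set -f                      # no globbing while we split on dots
    cidr="$1"

    # Require an address part and a prefix part separated by "/".
    case "${cidr}" in
        */*) ;;
        *) exit 1 ;;
    esac
    addr="${cidr%/*}"
    prefix="${cidr##*/}"

    # Prefix length: one or two digits, at most 32.
    case "${prefix}" in
        [0-9]|[0-9][0-9]) ;;
        *) exit 1 ;;
    esac
    [ "${prefix}" -le 32 ] || exit 1

    # Address: exactly four dot-separated octets, each 0..255.
    case "${addr}" in
        *.|.*|*..*) exit 1 ;;
    esac
    IFS=.
    set -- ${addr}
    [ "$#" -eq 4 ] || exit 1
    for octet in "$@"; do
        case "${octet}" in
            [0-9]|[0-9][0-9]|[0-9][0-9][0-9]) ;;
            *) exit 1 ;;
        esac
        [ "${octet}" -le 255 ] || exit 1
    done
    exit 0
)

# Sample values (assumptions, chosen for illustration only).
for candidate in "10.10.0.0/24" "10.10.0.0"; do
    if is_ipv4_cidr "${candidate}"; then
        echo "${candidate}: looks like CIDR"
    else
        echo "${candidate}: not CIDR"
    fi
done
```

The function runs in a subshell (note the parentheses around its body), so changing IFS and disabling globbing does not leak into the calling script.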
~ * ~ * ~ * ~
Network setup
Docker assigns a gateway address, as described above, to all containers on the internal network. The router assigns this gateway address to itself on its internal network interface, using either the explicitly specified address or one deduced from the interface's subnet.
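To make the "deduced from the subnet" part concrete: the router script follows Docker's behavior of using the first usable address in the subnet, and it gets there with the netmask tool and sed. Purely for illustration, here is a sketch of the same idea using only shell arithmetic. The first_host function name and the sample subnets are my own assumptions, and this is not the code the script actually runs.

```shell
# Hypothetical helper: prints the first usable host address of an IPv4
# subnet given as address/prefix (network address + 1).
# Assumes 64-bit shell arithmetic and octets without leading zeros.
first_host() (
    set -f
    cidr="$1"
    addr="${cidr%/*}"
    prefix="${cidr##*/}"

    # Split the dotted quad into four octets and pack them into one integer.
    IFS=.
    set -- ${addr}
    ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))

    # Build the netmask from the prefix length, then clear the host bits.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( ip & mask ))

    # First usable address is the network address plus one.
    gw=$(( net + 1 ))
    echo "$(( (gw >> 24) & 255 )).$(( (gw >> 16) & 255 )).$(( (gw >> 8) & 255 )).$(( gw & 255 ))"
)

first_host 172.20.0.0/16    # prints 172.20.0.1
first_host 10.1.2.200/26    # prints 10.1.2.193
```

Working on the whole 32-bit address, rather than only on the last octet, keeps the arithmetic uniform for any prefix length.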
NAT routing is performed using iptables rules. Modifying iptables requires the container to be run with the NET_ADMIN capability. Note that routing gives the internal network and its containers access to all other networks attached to the router container!
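If you want to see which rules are involved without touching iptables (and without needing NET_ADMIN), a dry run helps. The sketch below only prints the three commands that the router script issues for one pair of interfaces. The print_nat_rules helper, the interface names eth0 and eth1, and the subnet 10.10.0.0/24 are assumptions made up for this example.

```shell
# Hypothetical dry-run helper: prints the iptables commands for one
# internal/external interface pair instead of executing them.
print_nat_rules() (
    route_if="$1"    # interface on the internal (routed) network
    route_net="$2"   # internal subnet in CIDR notation
    target_if="$3"   # interface on the external network

    # Allow new connections from the internal subnet out via the target.
    echo "iptables -A FORWARD -o ${target_if} -i ${route_if} -s ${route_net} -m conntrack --ctstate NEW -j ACCEPT"
    # Allow traffic belonging to connections that are already tracked.
    echo "iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    # Rewrite the source address to the router's own address on the way out.
    echo "iptables -t nat -I POSTROUTING 1 -o ${target_if} -j MASQUERADE"
)

# Sample values (assumptions): eth0 is internal, eth1 is external.
print_nat_rules eth0 10.10.0.0/24 eth1
```

The MASQUERADE rule is what makes this NAT rather than plain routing: packets leaving through the external interface carry the router's address as their source.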
dnsmasq performs DNS forwarding for containers on the internal subnet, passing their queries to the router's DNS resolver provided by Docker. Docker's DNS resolver only resolves container names on the same network. As the router is attached to both networks, its local resolver is able to resolve names from both. The router mounts and updates a resolv.conf file in the data/ directory, which the internal containers then mount as /etc/resolv.conf.
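To get a feel for what ends up in the shared resolv.conf, here is a small sketch. The router script builds the file with a tac, sed, tac pipeline; as far as I can tell, the awk one-liner below has the same effect, which is to add a nameserver line for the gateway after each existing nameserver line. The add_gateway_nameserver helper, the gateway 10.10.0.1, and the sample input are assumptions for illustration (127.0.0.11 is the address of Docker's embedded DNS server).

```shell
# Hypothetical helper: reads a resolv.conf on stdin and prints it with
# "nameserver <gateway>" added after every existing nameserver line.
add_gateway_nameserver() {
    awk -v gw="$1" '{ print } /^nameserver/ { print "nameserver " gw }'
}

# Sample input (an assumption), piped through the helper.
printf 'nameserver 127.0.0.11\noptions ndots:0\n' \
    | add_gateway_nameserver 10.10.0.1
# prints:
#   nameserver 127.0.0.11
#   nameserver 10.10.0.1
#   options ndots:0
```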
The routing container requires the following packages:
apt update && apt install -y iproute2 iptables dnsmasq jq netmask bsdmainutils conntrack
Perhaps the most complicated part of this setup is the script that does the actual network address translation (NAT). NAT is the method routers use to translate private IP addresses on a local network into a public IP address before sending data to the internet, and back again when the replies come in. The script is reproduced in full below.
#!/bin/bash
# ENV:
# - ROUTE_NET
# - (ROUTE_GATEWAY)
if [ -z "${ROUTE_NET}" ]; then
    echo "no routed network specified by ROUTE_NET; selecting from first interface"
    # filter interfaces and get first one from list:
    # - no loopback interfaces
    # - no "NOARP" flags
    # - interface/link is up
    IF_INFO="$(ip -json a | jq -r '[.[] | select(.operstate=="UP" and (.flags | any(.=="LOOPBACK") | not) and (.flags | any(.=="NOARP") | not))][0]')"
    # pull required data from interface
    IF_NAME="$(echo ${IF_INFO} | jq -r .ifname)"
    IF_ADDR_INFO="$(echo ${IF_INFO} | jq -r '[.addr_info | .[] | select(.family=="inet")][0]')"
    IF_ADDR="$(echo ${IF_ADDR_INFO} | jq -r .local)"
    IF_PREFIX="$(echo ${IF_ADDR_INFO} | jq -r .prefixlen)"
    ROUTE_NET=$(netmask ${IF_ADDR}/${IF_PREFIX} | grep -oP '\S*')
    echo "using ${ROUTE_NET} from ${IF_NAME}"
fi
# sanitize routing subnet definition
ROUTE_NET_USER=${ROUTE_NET}
ROUTE_NET=$(netmask ${ROUTE_NET} | grep -oP '\S*')
echo "network to route from ${ROUTE_NET} (${ROUTE_NET_USER})"
# find interface info from subnet info
for IFACE_ADDRS in $(ip -json a | jq -r '.[] as {$ifname, $addr_info} | $addr_info | map("\(.local)/\(.prefixlen);\($ifname)") | .[]'); do
    IFS=';' read -r -a IFACE_ADDRS <<< "${IFACE_ADDRS}"
    NET="$(netmask ${IFACE_ADDRS[0]} | grep -oP '\S*')"
    if [ "${NET}" == "${ROUTE_NET}" ]; then
        ROUTE_ADDRESS="${IFACE_ADDRS[0]}"
        ROUTE_IF="${IFACE_ADDRS[1]}"
    fi
done
# break if interface not found
if [ -z "${ROUTE_IF}" ]; then
    echo "could not get routing interface for net ${ROUTE_NET}"
    echo "known interfaces and addresses:"
    ip -json a | jq -r '.[] as {$ifname, $addr_info} | $addr_info | map("\($ifname) \(.local)/\(.prefixlen)") | .[]' | column -t -s' '
    exit 1
fi
# check for IPv4 forwarding
if [ "$(cat /proc/sys/net/ipv4/ip_forward)" != 1 ]; then
    echo "ipv4 forwarding is disabled!"
    exit 1
fi
# echo "removing ${ROUTE_ADDRESS} on interface ${ROUTE_IF}"
# ip addr del ${ROUTE_ADDRESS} dev ${ROUTE_IF}
# determine gateway address
if [ -z "${ROUTE_GATEWAY}" ]; then
    # deduce gateway address from subnet definition
    # use docker behavior: first usable address in subnet
    ROUTE_NET_LOWER=$(netmask -r ${ROUTE_NET} | sed -E 's/^\s*(.*)-.*/\1/')
    ROUTE_NET_BRC=$(echo ${ROUTE_NET_LOWER} | sed -E 's/([[:digit:]]{1,3}\.){3}//')
    ROUTE_NET_24=$(echo ${ROUTE_NET_LOWER} | grep -oP '([[:digit:]]{1,3}\.){3}')
    let ROUTE_NET_GW=ROUTE_NET_BRC+1
    ROUTE_GATEWAY="${ROUTE_NET_24}${ROUTE_NET_GW}"
fi
# add prefix if not present on gateway address
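if [[ "${ROUTE_GATEWAY}" != *"/"* ]]; then
    ROUTE_NET_SUBNET=$(echo ${ROUTE_NET} | grep -oP '/\d+$')
    echo "adding subnet ${ROUTE_NET_SUBNET} to gateway address"
    # actually append the prefix; without this line the echo above is the only effect
    ROUTE_GATEWAY="${ROUTE_GATEWAY}${ROUTE_NET_SUBNET}"
fi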
echo "using gateway address ${ROUTE_GATEWAY}"
# add gateway address to *internal* interface
echo "adding ${ROUTE_GATEWAY} on interface ${ROUTE_IF}"
ip addr add ${ROUTE_GATEWAY} dev ${ROUTE_IF}
# populate shared resolv.conf with own (gateway) address
echo "writing resolv.conf for routed clients in /data/resolv.conf"
tac /etc/resolv.conf | sed "/^nameserver.*/i nameserver $(echo ${ROUTE_GATEWAY} | sed -E 's#/[[:digit:]]+$##')" | tac > /data/resolv.conf
# enable routing
for TARGET_IF in $(ip -json a | jq -r --arg IFNAME "${ROUTE_IF}" '.[] | select(.ifname!=$IFNAME and (.flags | any(.=="LOOPBACK") | not) and (.flags | any(.=="NOARP") | not)) | .ifname'); do
    echo "enabling NAT and FORWARD from ${ROUTE_IF} to ${TARGET_IF}"
    iptables -A FORWARD -o ${TARGET_IF} -i ${ROUTE_IF} -s ${ROUTE_NET} -m conntrack --ctstate NEW -j ACCEPT
    iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    iptables -t nat -I POSTROUTING 1 -o ${TARGET_IF} -j MASQUERADE
done
# run dnsmasq as local DNS forwarder; enable name resolution of containers on *external* networks
echo "starting dnsmasq..."
dnsmasq -q -d &
# execute command; or monitor using conntrack
# (test the argument count; "[ ${@} ]" breaks when several arguments are passed)
if [ $# -gt 0 ]; then
    echo "executing supplied command line: ${*}"
    # exec "${@}"
    "${@}"
else
    echo "monitoring NAT connections..."
    conntrack -E
fi
~ * ~ * ~ * ~
Running the examples
There are three examples that you can run in this setup: a positive example, a negative example, and a chatty positive example for debugging and demonstration purposes.
Run the positive example with:
docker-compose -f docker-compose.yml -f examples/up.yml up
## examples/up.yml
version: '3.8'
services:
  internal:
    tty: true
    command: ['/bin/sh']
  external:
    tty: true
    command: ['/bin/sh']
You can also run the chatty and the negative examples:
docker-compose -f docker-compose.yml -f examples/hello.yml up
docker-compose -f docker-compose.yml -f examples/reverse-fail.yml up
.^.