Per process routing take 2: using cgroups, iptables and policy routing
Cgroups Iptables Linux Desktop Netfilter Policy Routing
6 minutes
In a previous article we saw how it's possible to do per-process routing using namespaces. In this one we will achieve the same thing by using cgroups, iptables and policy routing. Perhaps the use case is a bit marginal (see the introduction in the mentioned article), but this article is a tribute to the extreme flexibility of cgroups.
You will need a Linux kernel >= 3.14 and a modern iptables. The former is easy to obtain (at least on Debian, via back-ported kernels or directly on Jessie), the latter is a bit more difficult. Anyway, I prepared a compiled binary: once it is unpacked in the root directory, just point the IPT variable to it:
IPT=/mnt/scratch/iptables/sbin/iptables
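Before going on, it may be worth a quick sanity check that both requirements are met; something along these lines should do (these exact commands are just a suggested check, not part of the setup):
uname -r                 # should report 3.14 or newer
"$IPT" --version         # should be a recent iptables
"$IPT" -m cgroup --help  # must not complain that the cgroup match can't be loaded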
You have to correctly mount the cgroup file-system, the easiest way on Jessie is by installing the package cgroupfs-mount.
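If you would rather not install the package, mounting the net_cls hierarchy by hand should also work; a minimal sketch, assuming nothing is mounted on that path yet:
mkdir -p /sys/fs/cgroup/net_cls
mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls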
The method will be based on the 3 technologies mentioned in the title:
- the cgroups net_cls controller will be used to set the classid of packets originating from the process.
- iptables will be used to mark the packets. This is possible thanks to the patch by Daniel Borkmann, see this thread for more information. If the entire patch had been accepted, we would have used the newly proposed controller. But since the proliferation of cgroup controllers is a bad thing, and the semantics of fwmark would have become unclear (it would be modifiable both by cgroups and by iptables), only the netfilter part of the patch made it into the v3.14 kernel.
- policy routing will be used to define a new routing table with a different default route, selected by the fwmark.
In my opinion this method has two advantages over the one presented in the previous article:
- it’s much easier to change the default route for processes (even already running) because it’s easier to move a process into or out of a control group.
- you don’t need the bridging thing.
The clear disadvantage is that it’s built on newer technologies not available out of the box on older distributions, like Debian Wheezy for example.
Now let’s see how it works. First define a control group for the net_cls controller:
mkdir /sys/fs/cgroup/net_cls/new_route
cd /sys/fs/cgroup/net_cls/new_route
echo 0x00110011 > net_cls.classid
Packets generated by processes in this control group will be annotated with the given classid 0x00110011 (11:11). Next, use iptables to fwmark those packets:
$IPT -t mangle -A OUTPUT -m cgroup --cgroup 0x00110011 -j MARK --set-mark 11
Note that it's very important to put the rule in this specific table and chain to trigger rerouting. Check out this picture on Wikipedia: it's worth more than a thousand words in describing the journey of a packet through the Linux network stack. Finally, we have to declare an additional routing table for policy routing:
echo 11 new_route >> /etc/iproute2/rt_tables    # just once!
ip rule add fwmark 11 table new_route
ip route add default via 10.0.10.58 table new_route
Here 10.0.10.58 is the default gateway for the processes in the new_route control group. Now it's really easy to change the default route for a process: just add it to the control group. It's quite entertaining to have a ping running and see how the RTT changes based on the default gateway. You can find the PID of ping in the usual ways (ps is your best friend); let's say it's 2345:
cd /sys/fs/cgroup/net_cls/new_route
echo 2345 > tasks
and you can take it out from the control group easily:
echo 2345 > ../tasks
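To double-check where a process currently sits, /proc shows its net_cls group (the hierarchy number at the beginning of the line varies from system to system):
grep net_cls /proc/2345/cgroup   # e.g. 3:net_cls:/new_route, or 3:net_cls:/ after moving it back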
Keep in mind that when a process in a net_cls control group forks, its child will be in the same group. But if you move the parent, the child will stay where it is. The normal cgroup semantics apply.
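Thanks to this inheritance you can also start a program in the group from the very beginning, either by moving the launching shell first or with cgexec from the cgroup-tools package if it is installed; both lines are just sketches and the ping target is an arbitrary example:
sh -c 'echo $$ > /sys/fs/cgroup/net_cls/new_route/tasks && exec ping 8.8.8.8'
cgexec -g net_cls:new_route ping 8.8.8.8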
This example gives just another application of the powerful cgroups concept. Others are of course possible, like per-process dynamic firewall rules or traffic control disciplines.
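For instance, the very same classid can feed an ordinary firewall rule or the cls_cgroup classifier of tc; the lines below are only a rough sketch, with eth0, the HTB handles and the rate chosen arbitrarily:
# drop everything sent by processes in the new_route group
$IPT -A OUTPUT -m cgroup --cgroup 0x00110011 -j DROP
# or shape it: packets tagged 0x00110011 (11:11) end up in class 11:11
tc qdisc add dev eth0 root handle 11: htb
tc class add dev eth0 parent 11: classid 11:11 htb rate 1mbit
tc filter add dev eth0 parent 11: protocol ip prio 10 handle 1: cgroup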
Comments
- Dan on 2014-09-09 06:04:26 +0100
On Debian Jessie I receive this error:
# $IPT -t mangle -A OUTPUT -m cgroup --cgroup 0x00110011 -j MARK --set-mark 11
iptables v1.4.21: Couldn't load match `cgroup':No such file or directory
Try `iptables -h' or 'iptables --help' for more information.
# dpkg --list | grep cgroup
ii cgroupfs-mount 1.0 all Light-weight package to set up cgroupfs mounts
- Christian Pellegrin on 2014-09-16 13:11:04 +0100
It looks like the iptables you are using does not have cgroup support compiled in. Just try:
/mnt/scratch/iptables/sbin/iptables -m cgroup --help
to check it.
- Kris on 2016-02-28 22:55:27 +0100
Hi,
Regarding "Couldn't load match `cgroup':No such file or directory", compiling the latest iptables (using this procedure: http://www.linuxfromscratch.org/blfs/view/cvs/postlfs/iptables.html ) and running this solved it:
cgroupfs-mount
sudo "$IPT" (yes, run as root)
But I'm facing some issues.
Particularly with VPN, I can't bypass my VPN interface tun0 with this technique.
Even with "ip route add … src " the VPN tunnel is sort of bypassed (packets are sent in the clear), but they are sent with the tun0 source IP, never with the eth0 source IP! So I never get my ping reply.
Not using cgroups and adding static routes to the default table DOES work, but whenever using cgroups with the custom new_route table, everything that goes out from this table uses the VPN tun0 source IP… any clue?
- John on 2016-08-03 22:22:54 +0100
Hi there,
Thanks for taking the time to write this post. It helped me get to my ultimate goal of creating a script for my Debian 8 system to run an app/process under a different networking regime from the rest of the system.
My search led me to this post and also to this superuser page:
http://superuser.com/questions/271915/route-the-traffic-over-specific-interface-for-a-process-in-linux
…and KrisWebDev's very good bash script to automate the process.
Kris’ script had a use case that didn’t quite fit my needs, so I modified his script quite extensively to make it more generic. I posted a gist of it here in case you find it useful.
https://gist.github.com/level323/54a921216f0baaa163127d960bfebbf0
Cheers
- Fred Scott on 2017-11-18 07:37:19 +0100
Great article.
Works great when the processes are running on a physical Linux host. However, if I have a docker container with 2 processes and want to do per process routing on packets from each process, it seems the classID flags are not set correctly when the packet hits the mangle table. More details on this and a kernel patch to fix this are at https://github.com/moby/moby/issues/19802 and https://lists.linuxfoundation.org/pipermail/containers/2014-January/033848.html respectively.
Has anyone successfully managed to do per process routing on two different processes in a docker container ?
- Alexander Martin on 2019-05-13 23:15:56 +0100
Hey thanks for the script!
It is working, however there appears to be an issue with ipv6. When I visit wtfismyip.com, the ipv4 address is correct, but somehow the ipv6 address of the other network card is being leaked.
My janky solution was to just turn off ipv6, but ideally I would prefer to not do this.
I tried to remedy this by mimicking your script with ip6tables, but unfortunately it seems like ipv6 has a different setup. Would be happy to compensate you for a fix. Hopefully this gets to you; if so, reach out at alexandermartin006 at that google service everyone uses.