A few months ago I was strolling in the Debian bug tracking system and found a curious bug filed against Arping, a program I maintain.
It said that unlike Arping 2.09, in Arping 2.11 the ARP cache was not updated after a successful reply. I thought that was odd, since Arping has no code that touches the ARP cache, neither to read it nor to write it. Surely this behaviour hasn't changed?
I tried to reproduce the behaviour and sure enough, with Arping 2.09 the ARP cache is updated, while with 2.11 it's not.

    $ arp -na | grep 192.168.0.123
    $ # --- First try Arping 2.11 ---
    $ sudo ./arping-2.11 -c 1 192.168.0.123
    ARPING 192.168.0.123
    60 bytes from 00:22:33:44:55:66 (192.168.0.123): index=0 time=1.188 msec
    --- 192.168.0.123 statistics ---
    1 packets transmitted, 1 packets received, 0% unanswered (0 extra)
    $ arp -na | grep 192.168.0.123
    $ # --- Ok, that didn't change the ARP cache. Now try 2.09 ---
    $ sudo ./arping-2.09 -c 1 192.168.0.123
    ARPING 192.168.0.123
    60 bytes from 00:22:33:44:55:66 (192.168.0.123): index=0 time=794.888 usec
    --- 192.168.0.123 statistics ---
    1 packets transmitted, 1 packets received, 0% unanswered (0 extra)
    $ arp -na | grep 192.168.0.123
    ? (192.168.0.123) at 00:22:33:44:55:66 [ether] on wlan0

How could that be? I suspected that maybe the kernel saw the ARP reply, and snooped it into the ARP table. But I quickly confirmed that the packets going over the wire were the same for 2.09 and 2.11 (as they should be).
So what changed between 2.09 and 2.11?
    $ git log --pretty=oneline arping-2.09..arping-2.11 | wc -l
    43

Ugh. Before doing a bisection I skimmed through the descriptions. Most were comments, compile fixes and documentation. The only functionality changes were:
- Switching to clock_gettime() (various patches). Read "gettimeofday() should never be used to measure time" for why.
- Switching to select() from poll()
- Adding support to use getifaddrs() to find the correct output interface
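As an aside on the first item: gettimeofday() reads the wall clock, which can be stepped forwards or backwards while a measurement is in progress, whereas a monotonic clock cannot. Arping itself is written in C, so the following is only a minimal Python sketch of the idea. `timed` is a hypothetical helper, and `time.monotonic()` stands in for a C call to clock_gettime() with CLOCK_MONOTONIC.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds).

    Hypothetical helper for illustration. time.monotonic() never goes
    backwards, so the difference of two readings is a valid duration
    even if the wall clock is adjusted in between.
    """
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed


if __name__ == "__main__":
    # Time a short sleep, much like timing a request/reply round trip.
    _, elapsed = timed(lambda: time.sleep(0.01))
    print(f"time={elapsed * 1000:.3f} msec")
```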
getifaddrs() or some minor change that shouldn't have mattered.
Between Arping 2.09 and 2.10 I changed the interface finding code. The old code was an ugly hack that ran /sbin/ip route get 1.1.1.1 to get the outgoing interface from the routing table. Since the output of the various "show me the routing table" commands differs between OSs, I had to implement this subprocess (ugly) and parsing (ugly) several times. The new implementation uses getifaddrs() to traverse the interfaces programmatically.
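To give a feel for why the subprocess-and-parse approach is fragile, here is a rough Python sketch of it for the Linux case. It is not Arping's actual code (which is C). The function names are made up for illustration, and it assumes the interface name always follows the word "dev" in the output, as it does in the ip route get output shown later in this post.

```python
import subprocess


def parse_route_get(output):
    """Return the interface name following the 'dev' keyword, or None.

    Assumption: the output looks like
    '192.168.0.123 dev wlan0 src 192.168.0.100 ...'.
    """
    tokens = output.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok == "dev":
            return tokens[i + 1]
    return None


def interface_via_ip(target, ip_path="/sbin/ip"):
    """Run 'ip route get <target>' and parse the outgoing interface."""
    out = subprocess.run(
        [ip_path, "route", "get", target],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_route_get(out)


if __name__ == "__main__":
    sample = ("192.168.0.123 dev wlan0  src 192.168.0.100\n"
              "    cache  mtu 1500 advmss 1460 hoplimit 64\n")
    print(parse_route_get(sample))
```

Every other OS would need its own command and its own parser, which is the duplication described above.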
The old code was still there as a fallback. It would never actually get used unless there's a Linux system out there that doesn't have getifaddrs(). It seems it was added to glibc 2.3 back in 2002. Anyway, it was trivial to temporarily switch interface selection back to the old method. I confirmed that this was indeed what caused this change of behaviour.
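The "new method first, old method as fallback" arrangement might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the interface list is passed in as plain (name, address/prefix) pairs standing in for what a getifaddrs() traversal would yield, and the longest-matching-subnet rule is a guess at a reasonable selection policy, not a description of what Arping does.

```python
import ipaddress


def interface_for(target, interfaces):
    """Pick the interface whose subnet contains target, or None.

    interfaces: iterable of (name, "address/prefixlen") pairs, a stand-in
    for the result of a getifaddrs() traversal. Longest prefix wins.
    """
    addr = ipaddress.ip_address(target)
    best_name, best_len = None, -1
    for name, cidr in interfaces:
        net = ipaddress.ip_interface(cidr).network
        if addr.version == net.version and addr in net and net.prefixlen > best_len:
            best_name, best_len = name, net.prefixlen
    return best_name


def find_interface(target, interfaces, fallback):
    """Try the programmatic lookup; call fallback(target) only if it fails."""
    return interface_for(target, interfaces) or fallback(target)


if __name__ == "__main__":
    ifaces = [("lo", "127.0.0.1/8"), ("wlan0", "192.168.0.100/24")]
    # The fallback here is a dummy; in the old code it would run a subprocess.
    print(find_interface("192.168.0.123", ifaces, lambda target: None))
```

The relevant point for this bug is that the first path never spawns ip route get, so whatever side effect that command has simply stops happening.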
Surely ip route get doesn't send an ARP request and populate the ARP cache when it gets the reply? No.
So if ip route get 1.1.1.1 doesn't do it, and arping-2.11 1.1.1.1 doesn't do it, then surely ip route get 1.1.1.1 ; arping-2.11 1.1.1.1 doesn't do it?
Yes, yes it does. It seems ip route get 1.1.1.1 followed by arping-2.11 1.1.1.1 will cause 1.1.1.1 to show up in the ARP cache. And it doesn't matter if ip route get is run as an ordinary user or as root! (Arping of course has to run as root or have the CAP_NET_RAW capability.)
Only the exact address given to ip route get will be "open to be filled" by the second command, so the effect seems to be per address, and ip route get apparently modifies state in the kernel.
    $ arp -na | grep 192.168.0.123
    $ sudo ./arping-2.11 -i wlan0 -q -c 1 192.168.0.123
    $ arp -na | grep 192.168.0.123
    $ # --- Ok, still no entry in the ARP cache. Now try running both commands ---
    $ ip route get 192.168.0.123 ; sudo ./arping-2.11 -i wlan0 -q -c 1 192.168.0.123
    192.168.0.123 dev wlan0 src 192.168.0.100
        cache mtu 1500 advmss 1460 hoplimit 64
    $ arp -na | grep 192.168.0.123
    ? (192.168.0.123) at 00:22:33:44:55:66 [ether] on wlan0
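The check repeated in these transcripts, grepping arp -na for one address, could be scripted when testing many combinations. The Python sketch below assumes the Linux output format shown above. `arp_entry` and `ARP_LINE` are hypothetical names, and an entry without a resolved MAC address is deliberately treated as absent.

```python
import re

# Matches lines such as:
#   ? (192.168.0.123) at 00:22:33:44:55:66 [ether] on wlan0
ARP_LINE = re.compile(
    r"^\S+ \((?P<ip>[^)]+)\) at (?P<mac>[0-9A-Fa-f:]{17}) \[\w+\] on (?P<dev>\S+)"
)


def arp_entry(arp_output, ip):
    """Return (mac, dev) for ip from 'arp -na' output, or None if absent."""
    for line in arp_output.splitlines():
        m = ARP_LINE.match(line.strip())
        if m and m.group("ip") == ip:
            return m.group("mac"), m.group("dev")
    return None


if __name__ == "__main__":
    sample = "? (192.168.0.123) at 00:22:33:44:55:66 [ether] on wlan0\n"
    print(arp_entry(sample, "192.168.0.123"))
```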
I closed the bug since it's working as intended.
I have not dived into the kernel source to find the reason for this, but I may come back and update this post if and when I do.