Each IP packet carries a set of header values (source address, destination address, protocol, identification, fragment offset, the MF “more fragments” bit…).
Within a flow of packets that share the same source address, destination address and protocol, the IP ID (identification) field changes with each packet, usually increasing by one, and so uniquely identifies each packet.
So, if we received a flow of packets (for example, in a TCP connection) sent by a system whose IP implementation behaves this way, we would see a continuous sequence of IP ID numbers such as 44567, 44568, 44569…
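As a rough illustration, the behaviour described above can be modelled in a few lines of Python. This is only a toy sketch: the class name SequentialIpId is made up for this example, and it assumes a host that keeps one counter and increases it by one per packet. The only hard fact it relies on is that the Identification field is 16 bits wide, so the counter wraps from 65535 back to 0.

```python
class SequentialIpId:
    """Toy model of a host that uses one incrementing IP ID counter."""

    def __init__(self, start):
        # The Identification field is 16 bits wide.
        self.value = start & 0xFFFF

    def next_id(self):
        """Return the ID for the next packet and advance the counter."""
        current = self.value
        self.value = (self.value + 1) & 0xFFFF  # wrap from 65535 back to 0
        return current


host = SequentialIpId(44567)
ids = [host.next_id() for _ in range(3)]
print(ids)
```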
Nowadays it’s very common to find devices such as NATs (Network Address Translators) and load-balancers in computer networks. These devices are generically called “middle-boxes”, and they hide a number of different devices behind a single network address.
For example, behind the IP address 128.0.0.1 (the address visible from the outside) there could be other devices that are invisible to everybody. Although the existence of more than one system behind the same IP address is transparent to users, it’s possible to detect this kind of network configuration through the “Identification” (ID) field.
Let’s see how. Suppose that the system 128.0.0.1 is a load-balancer that distributes incoming requests across a “farm” of servers. Using the hping utility (http://www.hping.org) we can send a number of connection requests (SYN segments) and observe the IP ID field of the replies.
For example:
#hping2 -c 10 -i 1 -p 80 -S 128.0.0.1
HPING 128.0.0.1 (eth0 128.0.0.1): S set, 40 headers + 0 data bytes
46 bytes from 128.0.0.1: flags=SA seq=0 ttl=56 id=57645 win=16616 rtt=21.2 ms
46 bytes from 128.0.0.1: flags=SA seq=1 ttl=56 id=57650 win=16616 rtt=21.4 ms
46 bytes from 128.0.0.1: flags=SA seq=2 ttl=56 id=18574 win=0 rtt=21.3 ms
46 bytes from 128.0.0.1: flags=SA seq=3 ttl=56 id=18587 win=0 rtt=21.1 ms
46 bytes from 128.0.0.1: flags=SA seq=4 ttl=56 id=18588 win=0 rtt=21.2 ms
46 bytes from 128.0.0.1: flags=SA seq=5 ttl=56 id=57741 win=16616 rtt=21.2 ms
46 bytes from 128.0.0.1: flags=SA seq=6 ttl=56 id=18589 win=0 rtt=21.2 ms
46 bytes from 128.0.0.1: flags=SA seq=7 ttl=56 id=57742 win=16616 rtt=21.7 ms
46 bytes from 128.0.0.1: flags=SA seq=8 ttl=56 id=57743 win=16616 rtt=21.6 ms
46 bytes from 128.0.0.1: flags=SA seq=9 ttl=56 id=57744 win=16616 rtt=21.3 ms
--- 128.0.0.1 hping statistic ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 21.1/21.3/21.7 ms
As you can see, the IP ID values fall into 2 separate numerical sequences: one running from 57645 to 57744 and another from 18574 to 18589. The replies in each sequence also report a different window size (16616 versus 0). This suggests that the system 128.0.0.1 is in fact a load-balancer that distributes the requests it receives between 2 different systems, which are the ones that actually process the information.
With the same technique it’s possible to detect, in many cases, the number of systems behind other types of middle-boxes, such as NATs.
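Spotting the sequences by eye works for ten replies, but it can also be sketched in code. The snippet below is a minimal sketch, not a finished tool: the function names and the max_gap tolerance of 1000 are my own assumptions, and it only works for systems that increase the IP ID in small steps. It pulls the id= values out of the reply lines and greedily attaches each one to a sequence it follows closely.

```python
import re

# Reply lines from the capture above, trimmed to the relevant fields.
SAMPLE = """\
flags=SA seq=0 ttl=56 id=57645 win=16616
flags=SA seq=1 ttl=56 id=57650 win=16616
flags=SA seq=2 ttl=56 id=18574 win=0
flags=SA seq=3 ttl=56 id=18587 win=0
flags=SA seq=4 ttl=56 id=18588 win=0
flags=SA seq=5 ttl=56 id=57741 win=16616
flags=SA seq=6 ttl=56 id=18589 win=0
flags=SA seq=7 ttl=56 id=57742 win=16616
flags=SA seq=8 ttl=56 id=57743 win=16616
flags=SA seq=9 ttl=56 id=57744 win=16616
"""


def ids_from_hping(output):
    """Extract the id= values from hping reply lines, in arrival order."""
    return [int(value) for value in re.findall(r"\bid=(\d+)", output)]


def group_sequences(ids, max_gap=1000):
    """Greedily split IDs into sequences of closely increasing values.

    max_gap is an assumed tolerance: an ID joins an existing sequence only
    if it is at most max_gap ahead of that sequence's last value.
    """
    sequences = []
    for ip_id in ids:
        for seq in sequences:
            # Forward distance modulo 2**16 copes with counter wraparound.
            gap = (ip_id - seq[-1]) % 65536
            if 0 < gap <= max_gap:
                seq.append(ip_id)
                break
        else:
            sequences.append([ip_id])  # no sequence fits: start a new one
    return sequences


sequences = group_sequences(ids_from_hping(SAMPLE))
print(len(sequences), "sequence(s) found")
for seq in sequences:
    print(seq)
```

The number of sequences is only an estimate of the number of systems: a server that receives none of the probes stays invisible, and systems that do not use small increments will confuse the grouping.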
But that’s not all: you can also detect IP aliases (an “IP alias” is an additional IP address assigned to a network interface that already has another address). For example:
128.0.0.1, 172.0.0.1 and 170.210.17.150 could belong to the same network interface.
So, using hping as before, we can find out (by observing the IP ID field) whether one IP address is an alias of another. We send some requests to the first IP address and analyze the IP ID values, and then we do the same with the other address. If the values from both addresses follow the same sequence, the second address is very likely an alias of the first.
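The alias check can be sketched like this. It is a minimal sketch under my own assumptions: the probes are sent to the two addresses alternately, the replies are recorded in order as (address, IP ID) pairs, and the host increases one shared counter in small steps. The function name, the max_gap tolerance and the ID values in the two sample lists are invented for illustration, they are not real captures.

```python
def looks_like_shared_counter(samples, max_gap=1000):
    """Return True if the IDs, taken in probe order, form one increasing run.

    samples is a list of (address, ip_id) pairs in the order the replies
    arrived. max_gap is an assumed tolerance for traffic sent in between.
    """
    ids = [ip_id for _address, ip_id in samples]
    for previous, current in zip(ids, ids[1:]):
        gap = (current - previous) % 65536  # forward distance with wraparound
        if not 0 < gap <= max_gap:
            return False
    return True


# Invented values: both addresses draw their IDs from the same counter.
alias_samples = [
    ("128.0.0.1", 4100), ("172.0.0.1", 4101),
    ("128.0.0.1", 4103), ("172.0.0.1", 4104),
]

# Invented values: each address has its own, unrelated counter.
distinct_samples = [
    ("128.0.0.1", 4100), ("172.0.0.1", 31200),
    ("128.0.0.1", 4101), ("172.0.0.1", 31201),
]

print(looks_like_shared_counter(alias_samples))
print(looks_like_shared_counter(distinct_samples))
```

A positive result is a strong hint rather than proof, so it is worth repeating the test a few times before drawing conclusions.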
However, not all systems increase the IP ID by one for each transmitted packet. Some keep it constant at a predetermined value, and others increase it by a fixed amount, for example +256 for each packet sent within a flow (for example, in a TCP connection).
So, by following how the IP ID field varies, you can narrow down the possible operating system. This information contributes to detecting the operating system, but on its own the result is only roughly accurate.
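To make the idea concrete, here is a small sketch that labels a list of observed IP IDs with one of the patterns mentioned above. The labels, the function name and the max_small threshold are my own choices, and the function deliberately stops at naming the pattern: mapping a pattern to a specific operating system needs more evidence than this field alone.

```python
def classify_ip_id_pattern(ids, max_small=50):
    """Return a rough label for how the IP ID changes between replies.

    max_small is an assumed limit for what still counts as a small step.
    """
    if len(ids) < 3:
        return "not enough samples"
    # Forward differences modulo 2**16, so wraparound is not misread.
    steps = [(b - a) % 65536 for a, b in zip(ids, ids[1:])]
    if all(step == 0 for step in steps):
        return "constant"
    if all(0 < step <= max_small for step in steps):
        return "small increments"
    if all(step > 0 and step % 256 == 0 and step // 256 <= max_small
           for step in steps):
        return "increments in multiples of 256"
    return "random or mixed"


print(classify_ip_id_pattern([0, 0, 0, 0]))
print(classify_ip_id_pattern([18587, 18588, 18589]))
print(classify_ip_id_pattern([256, 512, 768, 1024]))
print(classify_ip_id_pattern([57645, 18574, 57741]))
```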
With this technique you can also find out whether a machine is sending information to other machines, and the amount of data sent…
Do you know how?
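One possible answer, sketched under a strong assumption: the target uses a single counter for everything it sends and increases it by one per packet. If so, the difference between the IP IDs of two of our own replies tells us how many packets the machine sent in between, and whatever is left after subtracting our own reply went to somebody else. The function name is made up for this example, and the result is a packet count, not a byte count. It is also only valid if fewer than 65536 packets were sent between the two probes.

```python
def packets_sent_to_others(id_before, id_after, own_replies=1):
    """Estimate packets the target sent to other hosts between two probes.

    Assumes one global counter that increases by one per packet sent.
    own_replies is the number of replies to us counted in the interval
    (normally just the reply to the second probe).
    """
    total = (id_after - id_before) % 65536  # handles counter wraparound
    return max(total - own_replies, 0)


# IDs of two consecutive replies from the second server in the capture above.
busy = packets_sent_to_others(18574, 18587)
idle = packets_sent_to_others(18587, 18588)
print(busy, idle)
```

Probing at regular intervals and plotting these estimates gives a rough picture of how busy the machine is over time.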
The solution to these information leaks is to “randomize” the identification field of the packets sent. More generally, communication protocols should allow as many values as possible to be randomized.
Systems like OpenBSD (http://www.openbsd.org) do exactly this. It prevents some attacks, as well as the gathering of extra information like that described in this article.
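The effect of randomization is easy to see in a toy sketch. The code below simply draws each ID uniformly from the 16-bit range with Python's secrets module. It is not OpenBSD's actual algorithm, only an illustration of the idea, and the function name is invented for this example.

```python
import secrets


def random_ip_id():
    """Return an unpredictable 16-bit value to use as an IP ID."""
    return secrets.randbelow(65536)  # uniform over 0..65535


ids = [random_ip_id() for _ in range(10)]
print(ids)
```

With IDs like these, consecutive replies no longer form a recognizable sequence, so the sequence-based tricks described in this article stop revealing anything. A real implementation also has to avoid reusing an ID too soon, since the field is still needed to reassemble fragments.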