Increasing the transparency of the IoT with the SPIN PCAP reader

Insight into smart devices' live and historical network traffic

Tuesday 23 February 2021
Article by: Caspar Schutijser

The Internet of Things (IoT) is becoming bigger every day. The promises are endless: the world supposedly will become safer and more efficient, amongst other things. However, there is a downside. Many IoT devices have security and privacy issues. Furthermore, it's often not very transparent to the user what an IoT device is doing in terms of network activity. SIDN Labs' SPIN project aims to give users greater insight into what their IoT devices are doing. In addition to increasing IoT transparency, SPIN enables users to exercise more control over their IoT devices and the data they share, for example by blocking their IoT devices from connecting to certain domains.

In this blog post, we will look more closely at the transparency aspect of SPIN. To that end, we explain how SPIN extracts information from observed network traffic, and how that is visually presented to the user. Additionally, this blog post introduces a new component called the PCAP reader, which has recently been added to SPIN.

SPIN: a brief introduction

SPIN stands for Security and Privacy for In-home Networks. SIDN Labs created SPIN to help address privacy and security issues related to the Internet of Things (IoT). For example, IoT devices continue to be recruited to botnets that can be used for DDoS attacks. Furthermore, IoT devices often do not respect the user's privacy, something which is not always clear to the end user. That's why we think it is necessary to enable users to gain insight into what their IoT devices are doing (e.g. in terms of what types of data their IoT devices are collecting about them and where they are sending it), and to allow users to protect their home networks.

With SPIN, a user experiences greater transparency. For example, it is possible to inspect live network traffic that is generated by IoT devices. Additionally, the user has more control over their IoT devices with SPIN, since SPIN enables them to 'interfere' with the IoT device's network traffic, for example by blocking a domain name.

Figure 1: A schematic overview of a typical SPIN deployment.

SPIN is typically deployed by installing it on a home router, as shown in Figure 1. SPIN is then able to see (and, if necessary, manipulate) all the IoT devices’ network traffic. In technical terms, a process called the SPIN daemon that runs on the home router is responsible for analysing the network traffic. The SPIN daemon does so by extracting various types of traffic measurement-related information from the operating system kernel, such as the domain name that an IoT device connects to. Additionally, the SPIN daemon can instruct the kernel to block certain traffic flows. In the next part of this blog post, we look more closely at what types of information SPIN analyses, and how users can use that information.

Analysing network traffic

In this part of the blog post, we discuss the kinds of information SPIN collects to gain insight into what a device is doing. To that end, we will present an example and explain what the user sees when using the SPIN web interface. The screenshot below shows a couple of devices’ network traffic visualised by SPIN.

Figure 2: Screenshot of the SPIN web interface. The traffic of a couple of devices is shown.

Local devices vs. remote hosts

The first thing we observe when looking at the SPIN web interface (see Figure 2) is various nodes and arrows. The nodes represent hosts, while an arrow represents traffic between two hosts. In the picture, there are a few grey nodes as well as a few blue and green nodes. If a node is grey, it indicates that the host is a device that resides on the local network, whereas the other nodes are remote hosts that the device connects to. It is useful to make this distinction because some of SPIN's actions can only be applied to local devices. For example, downloading a PCAP file of the device's traffic is possible only where local devices are concerned.

Luckily, it is not hard to determine whether a host that we are talking to is a local device or not. Hosts in an IPv4 network use the Address Resolution Protocol (ARP) to determine the MAC addresses of hosts that reside in the same subnet. The results of this mapping are stored in the ARP table. By inspecting the addresses in the ARP table, we are able to determine which ones belong to local devices. For hosts connected via IPv6, we use ARP's counterpart, the Neighbor Discovery Protocol (NDP).
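
To illustrate the idea, here is a minimal Python sketch that derives the set of local devices from an ARP table. It is not taken from the SPIN source code; it assumes the table is available as text in the layout that Linux exposes in /proc/net/arp, and the function name is our own.

```python
def local_devices_from_arp(arp_text: str) -> dict[str, str]:
    """Return {ip: mac} for resolved entries in /proc/net/arp-style text."""
    devices = {}
    for line in arp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed lines
        ip, _hw_type, flags, mac = fields[:4]
        # Flag 0x2 marks a completed entry, i.e. the MAC address is known.
        if int(flags, 16) & 0x2:
            devices[ip] = mac
    return devices
```

On a Linux router, the input could be obtained with `open("/proc/net/arp").read()`. The IPv6 neighbour table would need a separate source, which this sketch does not cover.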

Who are devices talking to?

Going back to the SPIN web interface (see Figure 2), let us turn our attention to the arrows between the nodes. They indicate traffic flows between the nodes (hosts). SPIN uses flow records to represent the traffic flows. A flow record typically includes the source and destination IP addresses, the transport layer protocol and the transport layer port numbers, if applicable. Additionally, such records include counters that represent the numbers of packets and bytes that were observed. As such, they provide structured summaries of observed network traffic.
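
A flow record as described above could be modelled roughly as follows. This is a sketch in Python with class and field names of our own choosing; it does not reflect SPIN's internal data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """The fields that identify a flow."""
    src_ip: str
    dst_ip: str
    protocol: int      # IP protocol number, e.g. 6 = TCP, 17 = UDP
    src_port: int = 0  # 0 when the protocol has no ports (e.g. ICMP)
    dst_port: int = 0


@dataclass
class FlowCounters:
    """The counters kept per flow."""
    packet_count: int = 0
    byte_count: int = 0


class FlowTable:
    """Accumulates per-flow packet and byte counters."""

    def __init__(self):
        self.flows = {}

    def observe(self, key: FlowKey, size: int) -> None:
        counters = self.flows.setdefault(key, FlowCounters())
        counters.packet_count += 1
        counters.byte_count += size
```

Because `FlowKey` is frozen, it is hashable and can serve directly as a dictionary key, so every packet of the same flow updates the same counters.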

Resolving names to IP addresses

Hosts on the Internet address each other using IP addresses. However, IP addresses are not very convenient for humans to work with. Also, in practice, a connection to an IP address is usually preceded by a DNS lookup to obtain the relevant IP address. Therefore, we also look at DNS traffic to allow us to map an IP address to the domain name that was used to look up that address. For example, the screenshot (see Figure 2) shows that the receiver connects to youtu.be. Under the hood, youtu.be resolves to many IP addresses, e.g. 2a00:1450:400e:804::200e.
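
The bookkeeping this requires can be sketched as a small reverse-lookup cache, shown here in Python. The class and method names are hypothetical and the sketch ignores details such as TTL expiry.

```python
import ipaddress
from collections import defaultdict


class DnsCache:
    """Remembers which domain names were seen resolving to which IP address."""

    def __init__(self):
        self._names_by_ip = defaultdict(set)

    def record_answer(self, name: str, ip: str) -> None:
        # Parsing the address normalises different spellings of the same IPv6 address.
        key = ipaddress.ip_address(ip)
        self._names_by_ip[key].add(name.rstrip(".").lower())

    def names_for(self, ip: str) -> list[str]:
        key = ipaddress.ip_address(ip)
        return sorted(self._names_by_ip.get(key, ()))
```

After `cache.record_answer("youtu.be", "2a00:1450:400e:804::200e")`, a later flow to that address can be labelled with the name instead of the raw address.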

Live capture of network traffic

Now that we know what types of information we are interested in when we are analysing traffic, let's take a look at how SPIN actually collects that information for live network traffic analysis.

Handling the ARP and NDP tables is straightforward: the SPIN daemon periodically scans the ARP and NDP tables for new entries to see whether any new devices have appeared on the network. SPIN records details of all devices in a table-like structure, and queries the table whenever it needs to know whether a new device has appeared.

SPIN collects the flow records using the Netfilter Connection Tracking System in the Linux kernel. As implied by the name, this system keeps track of traffic flowing through the router. Information about the associated connections (e.g. the numbers of bytes and packets observed) can be exported from the kernel to the user space. This information already has a flow record-like structure, which we can easily use for our purposes.

To make sure we are able to analyse DNS traffic, we instruct Netfilter to log packets on port 53. Packets logged by Netfilter can be inspected by userspace programs. Using this feature, we are able to direct DNS traffic to the SPIN daemon, which allows us to analyse the contents of DNS packets and take any appropriate action.

Inspecting recorded network traffic

Until recently, SPIN only supported live network traffic capture. However, it is also useful to be able to inspect network traffic from the past, for example when a user wants to investigate recorded network traffic provided by a third party. Such recordings are usually stored in PCAP files. The PCAP file format is a de facto standard; many popular network traffic analysis tools (e.g. tcpdump and Wireshark) are able to read and/or write PCAP files. We would like to be able to use SPIN to analyse network traffic recorded in PCAP files as well.

In order to support the analysis of recorded network traffic, we have made some changes to SPIN. The most important change to the SPIN daemon is the addition of a second way for SPIN to learn about network traffic. In practical terms, what we have done is add an interface to the SPIN daemon to supply the three types of information that we talked about earlier: changes in the ARP and NDP tables, flow records and DNS packets. The PCAP reader extracts that information from PCAP files and submits it to the SPIN daemon (using the new interface).

A new PCAP reader

The PCAP reader is a new program in the SPIN suite. It reads a PCAP file, extracts the three types of information (changes in ARP and NDP tables, flow records and DNS packets) from the packets contained in the PCAP file, and submits that information to the SPIN daemon at the recorded speed. The PCAP reader is a separate program; it supplies the relevant information to a SPIN daemon that is already running, through UNIX domain sockets. Let’s now look at how the PCAP reader extracts the information from a PCAP file.

Extracting the flow records is straightforward: for each packet contained in the PCAP file, we extract the relevant information (IP addresses, port numbers, etc.), construct a flow record, and send it to the SPIN daemon. Currently, the PCAP reader sends a “digest” to the SPIN daemon for every packet. We are considering ways to make that process more efficient; we elaborate on that in the final part of this blog post.
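
The per-packet extraction can be sketched in a few lines of Python using only the standard library. This is an illustration, not the PCAP reader's actual code: it assumes a classic PCAP file (not pcapng) with microsecond timestamps, Ethernet framing without VLAN tags, and IPv4 only, and the function names are our own.

```python
import socket
import struct


def read_pcap(stream):
    """Yield (timestamp, frame_bytes) for each record in a classic PCAP stream."""
    header = stream.read(24)  # global file header
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written in little-endian byte order
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written in big-endian byte order
    else:
        raise ValueError("not a classic microsecond-resolution PCAP file")
    while True:
        record = stream.read(16)  # per-packet record header
        if len(record) < 16:
            return
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        yield ts_sec + ts_usec / 1e6, stream.read(incl_len)


def flow_digest(frame):
    """Return (src, dst, protocol, src_port, dst_port, size) for an IPv4 frame, else None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # EtherType 0x0800 = IPv4
        return None
    header_len = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    if header_len < 20:
        return None
    protocol = frame[23]
    src = socket.inet_ntoa(frame[26:30])
    dst = socket.inet_ntoa(frame[30:34])
    src_port = dst_port = 0
    l4 = 14 + header_len  # offset of the transport layer header
    if protocol in (6, 17) and len(frame) >= l4 + 4:  # TCP or UDP
        src_port, dst_port = struct.unpack("!HH", frame[l4:l4 + 4])
    # Size is the captured length, which may be shorter than the original packet.
    return src, dst, protocol, src_port, dst_port, len(frame)
```

A caller would open the file in binary mode and feed each frame from `read_pcap` to `flow_digest`, skipping the frames for which it returns `None`.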

When it comes to DNS packets, the approach taken by the PCAP reader and the SPIN daemon is very similar. The PCAP reader analyses packets on port 53 and communicates any name-to-IP-address mapping that is found to the SPIN daemon.
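
As an illustration of what analysing a DNS packet involves, the following Python sketch extracts the name-to-address mappings from the payload of a DNS response. It follows the standard DNS message layout, handles only A and AAAA records, and is not the code used in SPIN; the function names are ours.

```python
import socket
import struct


def _read_name(msg, offset):
    """Decode a (possibly compressed) DNS name; return (name, offset after it)."""
    labels = []
    resume_at = None
    hops = 0
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # compression pointer to an earlier name
            if resume_at is None:
                resume_at = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            hops += 1
            if hops > 32:
                raise ValueError("too many compression pointers")
            continue
        offset += 1
        if length == 0:  # root label terminates the name
            break
        labels.append(msg[offset:offset + length].decode("ascii", "replace"))
        offset += length
    return ".".join(labels), (resume_at if resume_at is not None else offset)


def dns_address_answers(msg):
    """Return [(name, ip)] for the A and AAAA answers in a DNS response payload."""
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    if not flags & 0x8000:  # QR bit clear: this is a query, not a response
        return []
    offset = 12
    for _ in range(qdcount):  # skip the question section
        _name, offset = _read_name(msg, offset)
        offset += 4  # QTYPE and QCLASS
    answers = []
    for _ in range(ancount):
        name, offset = _read_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        rdata = msg[offset:offset + rdlength]
        offset += rdlength
        if rtype == 1 and rdlength == 4:  # A record
            answers.append((name, socket.inet_ntop(socket.AF_INET, rdata)))
        elif rtype == 28 and rdlength == 16:  # AAAA record
            answers.append((name, socket.inet_ntop(socket.AF_INET6, rdata)))
    return answers
```

Note that when a response contains a CNAME chain, the name attached to the address record is the canonical name rather than the name the device originally asked for; a fuller implementation would follow the chain back to the queried name.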

Emulating the ARP and NDP tables is a little more difficult. While reading a PCAP file, we don't have access to the ARP and NDP tables of the machine that the traffic was captured on. Also, we cannot infer from the PCAP file what the network configuration of that machine was, for example. There are multiple ways to solve that problem, each with its own advantages and disadvantages.

The first option is the method currently implemented in the PCAP reader. This approach involves looking for ARP and NDP replies. Some background: a device broadcasts an ARP request when it wants to know which MAC address belongs to the IP address specified in the request, and an ARP reply contains the answer to that question. When an ARP reply is observed in the PCAP file, we know that the IP address contained in the reply belongs to a device that is present on the local network. The PCAP reader then relays that information to the SPIN daemon. An advantage of this approach is that it is easy to implement. A disadvantage is that, during the replay of a PCAP file, we do not always know upfront that an observed IP address belongs to a local device; we only know that once the corresponding ARP reply has been observed.
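
Recognising an ARP reply in a captured frame could look roughly like this in Python. The sketch assumes plain Ethernet framing and IPv4 ARP, leaves out the NDP counterpart, and uses a function name of our own; it is not the PCAP reader's implementation.

```python
import socket
import struct


def local_device_from_arp_reply(frame: bytes):
    """Return (ip, mac) of the sender if `frame` is an Ethernet ARP reply, else None."""
    if len(frame) < 42 or frame[12:14] != b"\x08\x06":  # EtherType 0x0806 = ARP
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    # Only accept Ethernet/IPv4 ARP with opcode 2 (reply).
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4) or oper != 2:
        return None
    mac = ":".join(f"{b:02x}" for b in frame[22:28])  # sender hardware address
    ip = socket.inet_ntoa(frame[28:32])               # sender protocol address
    return ip, mac
```

Each non-`None` result identifies a device on the local network and could be reported to the daemon as it is encountered during replay.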

An alternative method for inspecting ARP and NDP packets would be to process the PCAP file in two passes. In the first pass, we could scan the entire PCAP file for ARP and NDP replies and note what addresses are involved. Then, in the second pass, we could go through the PCAP file again and actually process all the packets that we observe. Since we would know from the first pass which IP addresses belonged to local devices, we would be able to submit that information to the SPIN daemon before the local device actually started transmitting packets. This method would provide a more accurate view of the network traffic to the SPIN daemon than the method currently implemented. Therefore we are planning to implement this method too, but we may not enable it by default until we have weighed up all the implications of always processing the PCAP file in two passes.
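
The two-pass idea can be sketched as follows in Python. To keep the example short, the capture is represented as a list of already-decoded events, either `("arp_reply", ip)` or `("packet", src, dst)`; that representation and the two callback parameters are assumptions made for illustration only.

```python
def replay_two_pass(events, announce_local, submit_packet):
    """Announce all local devices first, then replay the packets in order."""
    # Pass 1: collect every address that appears in an ARP (or NDP) reply.
    local_ips = {event[1] for event in events if event[0] == "arp_reply"}
    for ip in sorted(local_ips):
        announce_local(ip)
    # Pass 2: replay the actual traffic, now that local devices are known.
    for event in events:
        if event[0] == "packet":
            submit_packet(event[1], event[2])
    return local_ips
```

The cost is visible in the sketch: the whole capture must be available and is traversed twice, which matters for large files and rules out input that can only be read once, such as a pipe.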

Finally, scanning the PCAP file for ARP and NDP packets would not be necessary if SPIN were aware of what IP ranges belonged to the local network. The easiest way to implement that would be to have the user supply the information when the PCAP file is replayed to SPIN. However, that would require more network-specific information from the user. SPIN might also be able to derive the relevant information from network configuration files. Of course, such an implementation would be very platform-specific. Neither option seems ideal at first sight; we are therefore still considering how best to solve the problem in a user-friendly manner.
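
If the user were to supply the local ranges, the check itself would be simple. The Python sketch below uses the standard `ipaddress` module; the function name and the idea of passing the ranges as CIDR strings are assumptions, not an existing SPIN option.

```python
import ipaddress


def make_local_check(cidr_ranges):
    """Build a predicate that tells whether an address lies in a user-supplied range."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidr_ranges]

    def is_local(address: str) -> bool:
        ip = ipaddress.ip_address(address)
        # Membership is False when the address and network differ in IP version.
        return any(ip in network for network in networks)

    return is_local
```

For example, `make_local_check(["192.168.1.0/24", "fd00::/8"])` yields a function that accepts addresses from both the IPv4 and the IPv6 range.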

Currently, the PCAP reader submits a separate message to the SPIN daemon for each item of information it encounters, regardless of its type.

Changes to the SPIN daemon

That, then, is how the PCAP reader obtains the three types of information (changes in ARP and NDP tables, flow records and DNS packets). However, we still need a way to feed that information to the SPIN daemon, since SPIN has so far worked only with live network traffic. That is where a newly developed facility named external source (extsrc) comes in. extsrc opens a UNIX domain socket on startup. The PCAP reader (and other programs) can submit the three types of information on this UNIX domain socket to inform the SPIN daemon about various types of network activity. extsrc will then inject those messages into the system.
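
To give an impression of what talking to such a socket involves, here is a Python sketch of a client that submits messages over a UNIX domain socket. The message format (one JSON object per line), the use of a stream socket and all function names are hypothetical; the actual extsrc wire format is not described in this post and may well differ.

```python
import json
import socket


def connect_extsrc(path: str) -> socket.socket:
    """Connect to a UNIX domain socket at `path` (the path is deployment-specific)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock


def submit(sock: socket.socket, kind: str, payload: dict) -> None:
    """Send one message; `kind` names the type of information (hypothetical format)."""
    line = json.dumps({"type": kind, "data": payload}) + "\n"
    sock.sendall(line.encode("utf-8"))


def receive_all(sock: socket.socket) -> list:
    """Receiving side of the sketch: read until EOF and decode one message per line."""
    with sock.makefile("r", encoding="utf-8") as reader:
        return [json.loads(line) for line in reader]
```

Keeping the transport behind a small function like `submit` means that other programs besides the PCAP reader could feed the daemon in the same way.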

Figure 3: Diagram showing how the components of the SPIN suite work together.

The diagram in Figure 3 shows how all the components come together. The upper right part of the diagram shows the information exchange between the kernel and the SPIN daemon process. That already existed before the work described in this blog post. The rest of the diagram shows the new components introduced in this blog post (highlighted in blue): the UNIX domain socket of the new extsrc facility, and the PCAP reader that uses that socket to share information with the SPIN daemon.

Conclusion and future work

In this blog post, we have given an overview of how SPIN analyses network traffic. We have also introduced a way for SPIN to analyse previously recorded network traffic, a new feature that we recently added to the SPIN open-source software. Together, SPIN's live traffic capture facilities and the PCAP reader increase the transparency of the IoT for end users.

We already have some ideas for future improvements to the PCAP reader. For instance, if we add support for running the PCAP reader on a different host from the SPIN daemon’s host, we can separate the collection of live network traffic (done by the PCAP reader) from the analysis of network traffic (which is still done by the SPIN daemon). That would allow us to run many collection nodes within a network and then run the analysis on a central node. Right now, the PCAP reader is already capable of listening to a network interface and reporting what it observes there to the SPIN daemon. The piece that's missing for this scenario is that the PCAP reader should be able to connect to a running SPIN daemon over the network. That's one of the items on our wish list.

Another potential improvement, which would be easier to implement, would be to control the speed at which the PCAP reader replays traffic. Right now, the PCAP reader either replays the packets at the recorded speed, or it replays packets as soon as possible. New options could include replaying packets at twice the recorded speed, for example. We also plan to work on optimising the PCAP reader. As discussed earlier, the PCAP reader sends a message to the SPIN daemon for every packet. At the cost of some complexity in the PCAP reader, we can significantly reduce the number of messages that need to be exchanged between the PCAP reader and the SPIN daemon. For instance, the PCAP reader could submit an update every second instead of sending information about network traffic right away. That would allow us to send one message per second, instead of one message per packet.
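
Both ideas are easy to sketch. The Python functions below are illustrations under our own assumptions, not planned SPIN code: `replay_delays` computes the pause before each packet for a given speed factor (with `None` meaning "as fast as possible"), and `batch_per_second` groups per-packet digests of the hypothetical form `(timestamp, flow_key, size)` into one summary per second.

```python
from collections import defaultdict


def replay_delays(timestamps, speed=1.0):
    """Yield the pause in seconds before each packet; speed=2.0 replays twice as fast."""
    previous = None
    for timestamp in timestamps:
        if previous is None or speed is None:
            yield 0.0  # first packet, or replay without any pacing
        else:
            yield max(0.0, (timestamp - previous) / speed)
        previous = timestamp


def batch_per_second(digests):
    """Combine per-packet digests into one (second, {flow_key: (packets, bytes)}) per second."""
    batches = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for timestamp, flow_key, size in digests:
        counters = batches[int(timestamp)][flow_key]
        counters[0] += 1     # packets seen for this flow in this second
        counters[1] += size  # bytes seen for this flow in this second
    return [
        (second, {key: tuple(value) for key, value in flows.items()})
        for second, flows in sorted(batches.items())
    ]
```

With batching, a busy flow that produces hundreds of packets in a second results in a single counter update rather than hundreds of messages, at the price of up to a second of delay before the daemon learns about the traffic.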

The PCAP reader is included in SPIN version 0.12, so be sure to check that out if you're interested. We look forward to your feedback! Please let us know if you make use of our work, or if you have suggestions or comments.

Article by:

Caspar Schutijser

Research engineer