Thesis: using IoT honeypots for automatic detection of attacks on IoT devices

A deep dive into the world of virtualised IoT devices


The original blog is in Dutch. This is the English translation.

Honeypots are widely used to study attacks on Internet of Things (IoT) devices. For my MSc thesis, I developed and evaluated an extension to the conventional honeypot, which enables automated tracing of execution paths leading to previously undetected vulnerabilities in IoT firmware. The honeypot then emulates the IoT firmware and monitors the vulnerable execution paths to see whether attackers are in fact utilising them. My research has shown that it's possible to run 'execution path-aware' IoT honeypots on an automated basis, generating information to assist security researchers. However, the components on which such honeypots are based remain at an early stage of development, meaning that the honeypots are not yet mature enough for use in a production environment.

IoT honeypots in a nutshell

Traditional IoT honeypots, such as IoTPot, are machines with special software that makes them seem to be real IoT devices with security vulnerabilities. A honeypot might disguise itself as a smart lightbulb with the manufacturer's default password still active, for example. The honeypot is connected to the internet and simply waits for attackers to try to exploit its vulnerabilities. When that happens, the honeypot gathers data about the attack and the malware that's installed. That enables security researchers to build attack profiles, detailing the behaviour of botnets such as Mirai and Hajime. Unfortunately, conventional IoT honeypots don't actually behave in quite the same way as real-world devices. So attackers may realise that they're dealing with a honeypot, not an actual device. The problem is that honeypots only mimic the external behaviour of an IoT device; they don't run the same firmware as that installed on a real device. Newer IoT honeypots, such as Honware, are better: they do emulate the firmware of IoT devices, in much the same way as virtual machines (VMs). As a result, their behaviour is much more like that of a real device. Attackers therefore persist with their attempts to compromise the 'device', giving security researchers more opportunity to observe their behaviour and thus identify previously unknown vulnerabilities.

My research: EPA IoT honeypots

The aim of my work was to investigate what is required to extend the new generation of IoT honeypots so that they (1) automatically detect paths in IoT firmware that lead to vulnerable parts of the code, and (2) monitor those paths to see whether attackers are actively abusing them. In my thesis, I refer to honeypots with those capabilities as 'execution path-aware' IoT honeypots, or EPA IoT honeypots for short. The added value of EPA IoT honeypots is that they are able to automatically detect attacks that exploit previously unknown vulnerabilities, thus improving security researchers' understanding of vulnerabilities and their abuse.

The approach I adopted involved a literature study followed by a prototype implementation to test the feasibility of the concept. My methodology combined static analysis tools with firmware emulators. The static analysis tools were used to identify vulnerabilities and exploitable execution paths in IoT firmware. The firmware emulators enabled me to run the firmware of the mimicked devices in a virtualised environment. Combining the two opened the way for automated identification of firmware vulnerabilities (using analysis tools), followed by monitoring of the vulnerable paths for malicious exploitation (using a firmware emulator). In the remainder of this blog, I will discuss the main components of an EPA IoT honeypot, and how I implemented and evaluated my prototype.

Architecture of an EPA IoT honeypot

Figure 1 shows the architecture of an EPA IoT honeypot, consisting of three components:

  • Firmware Analyser: detects vulnerabilities in IoT firmware. Each time a vulnerability is detected, the Firmware Analyser collects full details of the execution paths in the firmware that lead to the vulnerability (vulnerable paths), including the vulnerability location data.

  • Honeypot: emulates the firmware on an IoT device. From the output of the GNU Debugger (GDB) and the Firmware Analyser, the EPA IoT honeypot can tell what commands (inputs) an attacker needs to send to the honeypot in order to exploit the execution paths leading to the vulnerability. That knowledge can then be used to detect actual attacks.

  • Input Analyser: reconstructs the symbolic formula of the vulnerability from details of the vulnerable paths detected by the Firmware Analyser and analyses the commands sent to the honeypot over the internet by real attackers. It is then possible to ascertain from the collected inputs whether the attacker has succeeded in reaching and abusing a vulnerable function.


Figure 1: Architecture of an EPA IoT honeypot.

Our implementation used two tools. For the Firmware Analyser, we used Karonte, albeit heavily modified to make it suitable for our scenario. Karonte uses static analysis and symbolic execution to identify vulnerable paths in firmware that depend on attackers' inputs (commands). For the honeypot, we used FirmAE: a recent open-source IoT firmware emulator capable of emulating far more firmware images than its predecessor Firmadyne. Each of the EPA IoT honeypot's three components is considered in more detail below.

Firmware Analyser

Our focus was on 'taint-style vulnerabilities': vulnerabilities that depend on the data that an attacker propagates to a vulnerable function in the honeypot firmware. An example would be a program that asks the user to enter a password. From then on, the variable in which the program saves the password is tainted, meaning that all subsequent interactions with the variable are suspect and require review. If the program later fails to check the input, a vulnerability can arise. The mechanism can be compared to a person getting dirt on their hands. Once their hands are dirty, everything they touch (door handles, remote controls, etc.) is made dirty as well. Often, there are no significant adverse consequences. But problems can arise if the person touches some food (the vulnerability) without first washing their hands.
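The dirty-hands analogy can be made concrete with a small Python sketch. It is only an illustration of the idea of taint propagation, not how Karonte implements it: the `Value` class, the `sanitise` check and the `copy_into_buffer` sink are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A string together with a flag saying whether an attacker controls it."""
    data: str
    tainted: bool = False

    def __add__(self, other: "Value") -> "Value":
        # Taint propagates: the result is tainted if either operand is.
        return Value(self.data + other.data, self.tainted or other.tainted)


def sanitise(value: Value, max_len: int) -> Value:
    """Hypothetical check ('washing hands'): truncate and clear the taint flag."""
    return Value(value.data[:max_len], tainted=False)


def copy_into_buffer(value: Value, buffer_size: int) -> str:
    """Hypothetical sink ('the food'): report what a copy into the buffer would do."""
    if value.tainted and len(value.data) > buffer_size:
        return "overflow"
    return "ok"


password = Value("A" * 64, tainted=True)   # attacker-supplied input
greeting = Value("user=") + password       # taint spreads to the result

print(copy_into_buffer(greeting, 16))                # unchecked input reaches the sink
print(copy_into_buffer(sanitise(greeting, 16), 16))  # checked input is harmless
```

The first call reports an overflow because tainted, oversized data reaches the sink unchecked; the second does not, because the input was checked first.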

Figure 2 shows the main Karonte modules we used for the Firmware Analyser: Border Binaries Finder, Binary Dependency Graph, Bug Finder and Sources Parser.


Figure 2: Karonte modules used for the Firmware Analyser.

A binary of a firmware image has numerous possible execution paths. Because we are interested specifically in taint-style vulnerabilities, the Firmware Analyser begins by running the Border Binaries Finder, a module that seeks to identify all the binaries in a firmware image. We can then produce a graph of all the data flows through the IoT firmware. The Border Binaries Finder assigns a score to each binary on the basis of the presence of 'data keys'. A data key is a string referring to a network interaction, e.g. 'http_referer' or 'content_type', by means of which input from an attacker can potentially penetrate the firmware. The binaries in the cluster with the highest score are known as border binaries.

In the next step, the border binaries are used as input for the Binary Dependency Graph, a Karonte component that detects patterns of communication between binaries and represents them as a graph. The Firmware Analyser starts with the data keys in each border binary and performs a symbolic execution of the functions in which data keys occur. The symbolic execution generates formulas from which it is possible to determine the conditions under which an execution path can reach a given function, e.g. by inputting certain characters or certain string lengths.

To do so, the Analyser uses the Communication Paradigm Finder (CPF), which detects whether the binaries exchange network data. With Karonte, we can now detect five types of user input, including inbound network traffic. If the CPF observes communication between binaries, it creates a 'role' for a binary, containing information about the communication point in the firmware, whether the function is a 'setter' or a 'getter' (whether information is sent or received) and which data key is used. The CPF then looks for other binaries that use the same data key and adds them to the list of binaries requiring analysis.
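To illustrate what a role contains and how binaries sharing a data key end up linked, here is a minimal Python sketch. It assumes a much simpler model than Karonte's: the `Role` fields mirror the description above, but the class, the `build_edges` helper, the binary names and the addresses are all made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    binary: str    # binary in which the communication point was found
    address: int   # location of the communication point (made-up values below)
    kind: str      # "setter" (sends data) or "getter" (receives data)
    data_key: str  # key under which the data is exchanged


def build_edges(roles):
    """Link every setter to each getter in another binary that shares its data key."""
    getters = defaultdict(list)
    for role in roles:
        if role.kind == "getter":
            getters[role.data_key].append(role)

    edges = set()
    for role in roles:
        if role.kind != "setter":
            continue
        for getter in getters[role.data_key]:
            if getter.binary != role.binary:
                edges.add((role.binary, getter.binary, role.data_key))
    return sorted(edges)


# Hypothetical roles; the binary names do not come from a real firmware image.
roles = [
    Role("httpd", 0x401000, "setter", "http_referer"),
    Role("cgi_handler", 0x402000, "getter", "http_referer"),
    Role("logger", 0x403000, "getter", "content_type"),
]

for src, dst, key in build_edges(roles):
    print(f"{src} -> {dst} via {key}")
```

Only the setter and getter that share 'http_referer' are connected; the 'content_type' getter has no matching setter in this toy input, so it contributes no edge.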
Finally, the Firmware Analyser is able to generate a binary dependency graph on the basis of the detected roles.

The component that ultimately traces the vulnerabilities is the Bug Finder. The module performs a symbolic execution using each of the roles in the binary dependency graph. The Bug Finder detects vulnerabilities in various ways. For example, a Buffer Size Detection Analysis Module maps the amount of space allocated to buffers on the stack and the heap. Taint analysis is also performed. Whenever the Firmware Analyser detects a vulnerability, it records full details of the associated possible execution paths. That enables the honeypot to subsequently detect whether a malicious input is able to reach and exploit the relevant vulnerability. Finally, the Sources Parser converts the inputs into a format that is readable for the honeypot and maps all the symbolic links in the firmware image.

For my research I modified Karonte extensively. My modifications included a complete rewrite of the Bug Finder, with a view to achieving more accurate taint propagation. I also rewrote the Buffer Size Detection Analysis Module and upgraded the tool to Python 3. The modifications have since been merged into Karonte, so that others can make use of them.

Honeypot

Having identified the vulnerabilities, the honeypot can be set up to automatically detect attacks that seek to exploit them. The open-source emulator FirmAE (see Figure 3) provided us with full access to the emulated firmware. Using the Custom FirmAE Communicator, the honeypot collected malicious input commands sent to the honeypot over the internet.


Figure 3: Honeypot components of an EPA IoT honeypot.

The honeypot collected the input commands using custom debugging scripts automatically generated from the results of the Firmware Analyser (input sources). In addition, the honeypot operator has the opportunity to add manual detection rules, such as 'there is a vulnerability in this program at location x.' Manual rule definition support is useful for vulnerabilities for which there is no existing Intrusion Detection System signature. For example, we defined a manual rule for CVE-2021-29302 (a CVE is an identifier that refers to an entry in a public vulnerability database).
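The scripts used in the prototype are not reproduced in this blog, but the idea of turning a rule of the form 'program, location x' into a debugging script can be sketched as follows. This is an assumption-laden illustration: the `DetectionRule` class and `to_gdb_script` helper are hypothetical, the address and register are placeholders rather than the real location of any vulnerability, and only standard GDB commands (`break`, `commands`, `silent`, `printf`, `x/s`, `continue`) are emitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionRule:
    name: str      # label written to the log
    address: int   # breakpoint location in the emulated binary (placeholder)
    register: str  # register assumed to hold a pointer to the attacker's input


def to_gdb_script(rule: DetectionRule) -> str:
    """Render a rule as a GDB command file that logs the input and resumes."""
    return "\n".join([
        f"break *{rule.address:#x}",
        "commands",
        "silent",
        f'printf "[{rule.name}] hit\\n"',
        f"x/s ${rule.register}",  # dump the string the register points to
        "continue",               # let the firmware keep running
        "end",
    ])


# Placeholder values for illustration only.
rule = DetectionRule(name="manual-rule-1", address=0x0001A2B4, register="r0")
print(to_gdb_script(rule))
```

Resuming execution after logging keeps the emulated device responsive, so the attacker's session continues while the input is recorded.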

Input Analyser

The Input Analyser examines the malicious input commands sent to the honeypot over the internet, and determines whether each can reach and exploit a vulnerability. That process involves four steps: (1) reconstructing the symbolic formula, (2) marking the user input, (3) determining whether the input is able to reach the path to the vulnerability, and (4) checking whether the input actually exploits the vulnerability.

In order to establish whether an input is able to reach a vulnerable function, we first have to reconstruct the symbolic formula. That involves linking together the collected 'constraints' in the path to the vulnerable function. An example of a constraint is an 'if' statement in an execution path, where an input has to satisfy the 'if' statement conditions in order to trigger execution of the associated code. In the Firmware Analyser, the location where the input was collected was a symbolic variable. In order to determine whether the relevant path is reachable, we add an extra constraint that sets the variable equal to the input collected by the honeypot. Once the symbolic formula has been defined and constrained, we can establish whether the path is reachable by solving the formula with a constraint solver. We opted to use Z3 for that purpose.

If the solver says that a solution is possible, the path is reachable and all that remains to be done is to verify that the input is able to exploit the vulnerability. We do that using a sink rule. For each type of vulnerable function, we define a rule which can be used to verify whether a code path is able to exploit a vulnerability. For example, we devised the following sink rule for memcpy functions:

taint & src > dst

In that example, an attacker can exploit the relevant vulnerability if the inputs have reached the vulnerable function and the collected input for the src is greater than the space allocated in the dst.
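Steps (3) and (4) can be sketched in plain Python. Note the simplification: the prototype solves a symbolic formula with Z3, whereas this sketch evaluates each path constraint directly on the collected input, which gives the same answer only because the symbolic variable is pinned to one concrete value. The `VulnerablePath` class, the example constraint and the buffer size are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class VulnerablePath:
    constraints: List[Callable[[bytes], bool]]  # one predicate per branch on the path
    dst_size: int                               # bytes allocated for the memcpy destination


def path_reachable(path: VulnerablePath, user_input: bytes) -> bool:
    """Step 3: the input must satisfy every constraint collected along the path."""
    return all(check(user_input) for check in path.constraints)


def memcpy_sink_rule(tainted: bool, src_len: int, dst_size: int) -> bool:
    """Step 4: the rule 'taint & src > dst' from the text."""
    return tainted and src_len > dst_size


def is_exploit(path: VulnerablePath, user_input: bytes) -> bool:
    # Input collected by the honeypot comes from the network, so treat it as tainted.
    return path_reachable(path, user_input) and memcpy_sink_rule(
        True, len(user_input), path.dst_size
    )


# Made-up path: the handler only runs for inputs starting with "name=".
path = VulnerablePath(
    constraints=[lambda data: data.startswith(b"name=")],
    dst_size=32,
)

print(is_exploit(path, b"name=" + b"A" * 100))  # reaches the sink and overflows
print(is_exploit(path, b"name=bob"))            # reaches the sink, fits in dst
print(is_exploit(path, b"id=" + b"A" * 100))    # never reaches the sink
```

Only the first input is flagged: it satisfies the path constraint and is larger than the destination buffer. The other two fail either the sink rule or the reachability check.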

Evaluation

In order to evaluate the EPA IoT honeypot illustrated in Figure 1, we connected it to the internet. However, when selecting a firmware image, we discovered that FirmAE's emulation success rate wasn't quite as good as its authors claimed. For example, many of the images could not complete the setup phase due to missing dependencies, while for others not all services could be emulated. We were most successful with firmware images of TP-Link and D-Link devices, which FirmAE appeared able to emulate with no great difficulty. We ultimately decided to use a firmware image of the TP-Link Archer C3200 V1, because it had a vulnerability that we had identified with the Firmware Analyser and a vulnerability that we could use as a custom debugging rule.

We found that our EPA IoT honeypot was indeed effective in detecting attacks. That was established by performing proof-of-concept attacks targeting the identified vulnerabilities. A proof-of-concept attack involves the use of code designed to demonstrate how a given vulnerability can be exploited. An example of such an attack is available here. We additionally used variants of our proof-of-concept attacks to investigate the possibility of generating false positives. The sink rules proved to be very reliable for detecting inputs capable of triggering vulnerable functions. In some cases, Karonte may not yet support a function in an execution path, making false positives possible.

In practice, however, the vulnerability we identified with the Firmware Analyser is almost impossible to abuse, because the vulnerable module is not accessible via the web interface. Nevertheless, by using the custom debugging rules, we were able to successfully identify attacks aimed at another vulnerability in the TP-Link Archer C3200 V1, namely CVE-2021-29302.

Conclusions

In my research, I used my prototype EPA IoT honeypot to demonstrate that such honeypots are potentially able to detect attacks aimed at previously unknown vulnerabilities, such as CVE-2021-29302. I was also able to demonstrate the practical viability of EPA IoT honeypot implementation, which to the best of my knowledge had not previously been achieved. Nevertheless, I believe that a lot of work remains to be done in the field of EPA IoT honeypots. For example, we were able to successfully emulate only a small number of vulnerable firmware images, meaning that the system is unfortunately not yet suitable for practical use. By contrast, the custom detection rules are independent of the vulnerabilities detected by the Firmware Analyser and therefore more suitable for use with other IoT device types. Those areas therefore represent fertile ground for further research.

Want to know more? It's open source!

Full details of my research are available from the SIDN Labs website. Furthermore, all the code I used is available on an open-source basis (FirmAE, Static analysis, Honeypot).

Interested in doing your thesis at SIDN Labs?

The work described in this blog formed my MSc thesis research project, which I carried out as an intern at SIDN Labs. If you're interested in doing a thesis on IoT honeypots or another internet security topic, by all means get in touch.