DNS vulnerability, configuration errors can cause DDoS

Current RFCs do not fully cover the problem


Authors: Giovane C. M. Moura (1,2), Sebastian Castro (3), John Heidemann (4), Wes Hardaker (4). Affiliations: 1: SIDN Labs, 2: TU Delft, 3: IE Domain Registry, 4: USC/ISI

Last year we discovered a DNS vulnerability that, combined with a configuration error, can lead to massive DNS traffic surges. Since then, we've studied it carefully, carried out responsible disclosure, helped large operators such as Google and Cisco to fix their services, and submitted an Internet Draft to the IETF DNS working group to mitigate it.

The DNS is one of the internet's core services. Every web page visit requires multiple DNS queries and responses, and when the DNS breaks, it ends up on the front pages of prominent news sites. My colleagues and I at SIDN Labs, TU Delft, IE Domain Registry and USC/ISI recently came across a DNS vulnerability that can be exploited to create traffic surges on authoritative DNS servers – the servers that know the contents of entire DNS zones such as .org and .com. We first came across it when comparing traffic from the Netherlands' .nl and New Zealand's .nz authoritative servers for a study of internet centralisation. Digging into what we saw, we recognised that a known problem was actually much more serious than anticipated.

The vulnerability, which we named tsuNAME, is caused by loops in DNS zones – known as cyclic dependencies – combined with buggy resolvers, clients or forwarders. We scrutinise it in this research paper. Importantly, current RFCs do not fully cover the problem: they prevent resolvers from looping, but they do not prevent clients and forwarders from looping, which in turn causes resolvers to flood the authoritative servers. We have therefore submitted an Internet Draft to the IETF DNS working group describing how to fix the problem, and developed CycleHunter, a tool that detects the zone loops that enable such attacks. We have also carried out a responsible disclosure, working with Google and Cisco engineers who mitigated the issue on their public DNS services. Below is a brief summary of tsuNAME and our research so far.

What causes the tsuNAME vulnerability?

For this vulnerability to arise, we first need two DNS records to be misconfigured in a loop, such as:

; .com zone:
example.com    NS    cat.example.nl

; .nl zone:
example.nl     NS    dog.example.com

In that example, a DNS resolver that attempts to resolve example.com is referred to cat.example.nl and must therefore query the .nl authoritative servers. From those servers, it learns that the authoritative server for example.nl (and thus for cat.example.nl) is dog.example.com, which in turn can only be resolved by going back to the .com servers – closing the loop. As a result, no domains under example.com or example.nl can be resolved.

Why is that even a problem? When a user queries such a name, their resolver follows the loop, amplifying one query into several. Some parts of the DNS resolving infrastructure – clients, resolvers, forwarders and stub resolvers – cannot cope with that scenario. The result may look like Figure 1, which shows two domains in the .nz zone that had barely any traffic and then started to receive more than 100 million queries a day, each.
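To make the loop concrete, here is a minimal, illustrative sketch of how a resolver would chase the NS referrals from the example above. The DELEGATIONS table and helper functions are ours, invented purely for illustration; this is not code from our tooling or from any real resolver.

# Minimal sketch (illustration only) of chasing the looping delegations above.
DELEGATIONS = {
    "example.com": "cat.example.nl",   # .com zone: example.com NS cat.example.nl
    "example.nl":  "dog.example.com",  # .nl zone:  example.nl  NS dog.example.com
}

def zone_of(name: str) -> str:
    """Return the registered domain a host name falls under (toy logic)."""
    return ".".join(name.split(".")[-2:])

def follow_delegation(qname: str, max_steps: int = 10) -> None:
    """Chase NS referrals; the visited list exposes the cycle."""
    seen = []
    current = qname
    for step in range(max_steps):
        ns_host = DELEGATIONS[zone_of(current)]
        print(f"step {step}: to resolve {current} we first need {ns_host}")
        if ns_host in seen:
            print("cycle detected: resolution can never complete")
            return
        seen.append(ns_host)
        current = ns_host
    print("gave up after max_steps (an unguarded resolver would keep going)")

follow_delegation("www.example.com")

Running this stops after two referrals because the same name server comes around again – which is exactly why a resolver without a guard, or a client that keeps retrying, can generate so much traffic.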

Figure 1: Daily queries to two domains under the .nz zone affected by a tsuNAME event.

In that example, .nz's operators fixed the loop in their zone, and the problem was resolved. However, until the loop was fixed, total .nz traffic increased by 50% (Figure 2), just because of two bogus domain names.

Figure 2: Daily queries to all .nz authoritative servers during the tsuNAME event.

Now, what happens if an attacker holds many domain names and misconfigures them with such loops? Or if an attacker sends many queries for looped names? In those circumstances, a small number of queries can be amplified enormously, perhaps overwhelming the authoritative servers. That represents a new amplification threat.

So why does this traffic surge occur?

We found three root causes of tsuNAME surges:

  1. Old resolvers (such as the Microsoft Windows Server 2008 DNS server) that loop indefinitely

  2. Clients/forwarders that loop indefinitely – they keep retrying when they receive SERVFAIL responses from resolvers (Figure 3)

  3. Both of the above in combination

Figure 3: Root causes of tsuNAME events.
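The second root cause can be sketched as a toy client; this is hypothetical code, not taken from any real implementation. The naive variant keeps retrying a persistently failing name once per second, while a well-behaved one backs off and eventually gives up:

import time

# Toy illustration of root cause 2. resolve() stands in for a stub-resolver
# call; it always fails because the queried name sits in a cyclic delegation.
def resolve(qname: str) -> str:
    return "SERVFAIL"   # the loop means resolution can never succeed

def naive_client(qname: str, attempts: int = 3) -> None:
    """Retries once per second with no backoff; capped here only so the
    example terminates – a real buggy client never stops."""
    for attempt in range(attempts):
        print(f"naive attempt {attempt}: {resolve(qname)}, retrying in 1s")
        time.sleep(1)

def well_behaved_client(qname: str, max_attempts: int = 3) -> None:
    """Backs off exponentially and eventually gives up."""
    delay = 1.0
    for attempt in range(max_attempts):
        if resolve(qname) != "SERVFAIL":
            return
        print(f"polite attempt {attempt}: SERVFAIL, backing off {delay:.0f}s")
        time.sleep(delay)
        delay *= 2
    print("giving up: the name is persistently unresolvable")

naive_client("example.com")
well_behaved_client("example.com")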

That may all seem rather obvious. However, as mentioned earlier, the problem has not been fully addressed at the IETF level.

Current RFCs prevent only resolvers from looping. But what happens if the loop occurs elsewhere? Imagine a client that loops, sending a new query to the resolver every second after each SERVFAIL response it receives (Figure 3). Under the current RFCs, every new client query triggers a new set of queries from the resolver to the authoritative servers, even though each individual resolution is bounded. That is what Google's Public DNS experienced in February 2020. In our previous study of internet centralisation, we found that Google's Public DNS sent far more A/AAAA queries to .nz than to .nl. Google was following the RFCs, and yet looping clients caused Google Public DNS to send large volumes of queries to the .nz authoritative servers.

Fixing the tsuNAME problem

The way to fix tsuNAME depends on where you sit in the DNS infrastructure. If you are a resolver operator, the solution is simple: resolvers should cache these looping records, so that all new queries sent by clients, forwarders and stubs can be answered directly from the cache, protecting the authoritative servers. That is what we have proposed in this IETF draft, and that is how Google Public DNS fixed it. If you are an authoritative server operator, you need to make sure there are no such misconfigurations in your zone files. To help with that, we have developed CycleHunter, an open-source tool that detects such loops in DNS zones. We suggest operators regularly check their zones for loops.
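For resolver operators, the caching idea can be sketched roughly as follows. This is a minimal illustration under our own assumptions: the cache layout, the 600-second TTL and the full_resolution() stand-in are invented, and this is not how Google or any particular resolver implements it.

import time

# Rough sketch of the resolver-side mitigation: remember that a name failed
# because of a cyclic delegation and answer repeat queries from that memory
# instead of re-contacting the authoritative servers.
FAILURE_TTL = 600                       # seconds to remember a looping name (illustrative)
_loop_cache: dict[str, float] = {}      # qname -> expiry timestamp

def full_resolution(qname: str) -> str:
    """Stand-in for real recursion; here every attempt hits the loop."""
    return "SERVFAIL_CYCLIC"

def resolve_with_loop_cache(qname: str) -> str:
    now = time.time()
    expiry = _loop_cache.get(qname)
    if expiry is not None and expiry > now:
        return "SERVFAIL (answered from cache, no upstream traffic)"
    result = full_resolution(qname)             # normal recursive resolution
    if result == "SERVFAIL_CYCLIC":             # loop detected while resolving
        _loop_cache[qname] = now + FAILURE_TTL  # remember the failure
        return "SERVFAIL"
    return result

# Looping clients keep asking, but only the first query reaches upstream.
for _ in range(3):
    print(resolve_with_loop_cache("example.com"))

With the failure cached, looping clients and forwarders keep hitting the resolver's cache rather than the authoritative servers, which is the protective behaviour described above.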

Responsible disclosure

We were able to reproduce tsuNAME in various scenarios and from various vantage points, including RIPE Atlas and a sinkhole; see our paper for details. We found more than 3,600 resolvers in 2,600 autonomous systems that were vulnerable to tsuNAME. We therefore decided to carry out a responsible disclosure to notify operators. We notified Google, Cisco and multiple other parties in early 2021, and on 6 May we publicly disclosed the vulnerability, having given the parties enough time to fix their bugs, which they had done. After the disclosure, several TLD operators approached us and said they had been victims of tsuNAME-related events in the past. One of them, an EU-based ccTLD operator, was kind enough to share a time series of their traffic (Figure 4): during their tsuNAME event they saw ten times more traffic than usual.

Figure 4: tsuNAME event affecting an EU-based ccTLD operator. Each coloured band corresponds to a different authoritative server. The event starts around 19:00 UTC and ends at 11:00 UTC the next day, when the cyclic dependencies were removed from the zone.

What’s next

We've shown that, despite being known about for a long time, loops in DNS zones are still a problem – one that current RFCs do not fully address. We have also scrutinised tsuNAME, a type of event that many operators had seen before but that was not publicly known, showing how it can be weaponised and what its root causes are. We will continue to work on a revised version of our Internet Draft, incorporating feedback from the community. Regardless of what happens to the draft, we are happy to have helped Google and Cisco protect their public DNS servers against this vulnerability, thus making a small contribution to internet security.


Further reading