Fragmentation, truncation, and timeouts: are large DNS messages falling to bits?

Our analysis based on 164 billion DNS queries

The Domain Name System (DNS) provides one of the core services of the Internet. DNS employs both UDP and TCP as transport protocols, and most responses are sent over UDP because it is fast (1 RTT). UDP, however, is not always suitable for delivering large DNS responses: packets can be dropped or fragmented, so there is a risk that clients will not receive the answers, which can lead to unreachability. To determine how serious the problem of large messages in DNS is, we analyze 164 billion DNS queries/responses collected at the authoritative servers of the Netherlands' .nl ccTLD, covering three full months of data (July 2019, July 2020, and October 2020). In this blog we present the main results of a paper we published at the Passive and Active Measurement Conference (PAM2021).

DNS delays

Nobody really likes to wait for a page to load on the Internet, and the DNS can be one of the reasons for slow page load times: domain names need to be resolved before pages can be loaded. The fastest responses are obtained with DNS over UDP (DNS/UDP), which requires one round-trip time (RTT). However, because UDP by design provides no delivery guarantees, DNS can also be used over TCP (DNS/TCP), which takes 2 RTTs to retrieve the same response (TCP requires an extra RTT for its handshake).

DNS/UDP vs DNS/TCP

DNS/UDP is faster than DNS/TCP, but it has a tough time handling large messages: the original DNS specification limited UDP messages to 512 bytes. That was not enough for many cases, so in 1999 EDNS0 was proposed, extending UDP message sizes up to 64 kilobytes. With EDNS0, DNS clients (resolvers) advertise their UDP buffer size to the authoritative servers, which use that value as an upper limit when sending responses. If a response is larger than the EDNS0 buffer size advertised by the client, the authoritative server truncates it and sets the TC bit, which signals the resolver to repeat the query, this time using DNS/TCP.
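The truncation rule described above can be sketched in a few lines of Python. This is only an illustration, not the code of any real name server: the function names are hypothetical, and the only rules encoded are the 512-byte default for queries without EDNS0 and the EDNS0 requirement that advertised values below 512 are treated as 512.

```python
CLASSIC_UDP_LIMIT = 512  # bytes; limit from the original DNS specification


def effective_udp_limit(edns0_buffer_size=None):
    """Largest UDP response (in bytes) the server may send to this client.

    edns0_buffer_size is the value advertised by the resolver, or None
    when the query carries no EDNS0 option.
    """
    if edns0_buffer_size is None:
        return CLASSIC_UDP_LIMIT
    # EDNS0 says advertised values lower than 512 are treated as 512.
    return max(edns0_buffer_size, CLASSIC_UDP_LIMIT)


def must_truncate(response_size, edns0_buffer_size=None):
    """True if the server has to set the TC bit instead of sending it all."""
    return response_size > effective_udp_limit(edns0_buffer_size)


print(must_truncate(600))          # no EDNS0: 600 > 512, so truncate
print(must_truncate(600, 1232))    # fits in the advertised buffer
print(must_truncate(1500, 1232))   # too large, resolver should retry over TCP
```

A resolver that sees the TC bit is then expected to repeat the query over TCP, at the cost of the extra round trip.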

Silently discarded

The issue was that all of this was done at the application layer, which is agnostic to the network layer. In other words, these buffer negotiations did not consider the maximum transmission unit (MTU) of the path between client and authoritative server, and the most common MTU in the core of the Internet is 1500 bytes. If DNS responses were larger than the path MTU, the packets would simply be fragmented or discarded along the way. And IPv4 fragmentation is so poorly designed that it is nowadays considered fragile and should be avoided. The worst case is when responses are silently discarded and clients never receive a DNS response, which effectively blocks them from reaching their desired URL.
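To make the mismatch between buffer size and path MTU concrete, here is a small sketch that checks whether a DNS message fits in a single unfragmented packet. It assumes fixed header sizes (20 bytes for IPv4 without options, 40 bytes for IPv6 without extension headers, 8 bytes for UDP); the function names are made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, assuming no extension headers
UDP_HEADER = 8    # bytes


def max_dns_payload(mtu, ip_version):
    """Largest DNS message that fits in one packet for a given MTU."""
    if ip_version == 4:
        ip_header = IPV4_HEADER
    elif ip_version == 6:
        ip_header = IPV6_HEADER
    else:
        raise ValueError("ip_version must be 4 or 6")
    return mtu - ip_header - UDP_HEADER


def fits_in_one_packet(dns_size, mtu=1500, ip_version=4):
    """True if a DNS message of dns_size bytes needs no fragmentation."""
    return dns_size <= max_dns_payload(mtu, ip_version)


print(max_dns_payload(1500, 4))   # 1472
print(max_dns_payload(1500, 6))   # 1452
print(fits_in_one_packet(4096))   # False: exceeds a 1500-byte MTU
```

A resolver that advertises a 4096-byte buffer therefore invites responses that cannot cross a 1500-byte path in one piece.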

164 billion queries

While several other works investigated this issue, we take a different vantage point: two anycast authoritative servers of the Netherlands' ccTLD (.nl). We analyze 164 billion queries, collected with our DNS big data analysis tool ENTRADA, as shown in the table below:

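| | July 2019 IPv4 | July 2019 IPv6 | July 2020 IPv4 | July 2020 IPv6 | October 2020 IPv4 |
|---|---|---|---|---|---|
| Queries/responses | 29.79B | 7.80B | 45.38B | 15.87B | 48.58B |
| UDP | 28.68B | 7.54B | 43.75B | 15.01B | 46.94B |
| UDP TC off | 27.80B | 7.24B | 42.06B | 13.88B | 45.49B |
| UDP TC on | 0.87B | 0.31B | 1.69B | 1.14B | 1.44B |
| UDP TC on ratio (%) | 2.93% | 3.91% | 3.72% | 7.15% | 2.96% |
| TCP | 1.11B | 0.25B | 1.36B | 0.85B | 0.36B |
| TCP ratio (%) | 3.72% | 3.32% | 3.59% | 5.37% | 3.17% |
| Resolvers, UDP TC off | 3.09M | 0.35M | 2.99M | 0.67M | 3.12M |
| Resolvers, UDP TC on | 0.61M | 0.08M | 0.85M | 0.12M | 0.87M |
| Resolvers, TCP | 0.61M | 0.08M | 0.83M | 0.12M | 0.87M |
| ASes, UDP TC off | 44.8k | 8.3k | 45.6k | 8.5k | 46.4k |
| ASes, UDP TC on | 23.3k | 8.3k | 27.6k | 5.4k | 28.2k |
| ASes, TCP | 23.5k | 4.3k | 27.3k | 5.2k | 27.9k |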

Table 1: Datasets from the .nl zone.

Datasets

We collect data from two .nl anycast authoritative servers (NS1 and NS3, run by two different anycast providers) and show them combined in Table 1. We take yearly snapshots (July 2019 and July 2020) plus October 2020, the first month after the DNS Flag Day 2020.

At our vantage points, we see that a small fraction of responses is truncated: 2.93% to 7.15%, depending on the year and IP version. This is the starting point of our analysis.
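As a rough sanity check, the truncation ratios can be recomputed from the rounded counts in Table 1. The sketch below assumes the ratio is truncated UDP responses divided by all queries/responses; that denominator is our reading of the table, not something the text states, and because the inputs are rounded the results only match the published percentages approximately.

```python
# (total queries/responses, truncated UDP responses) in billions,
# copied from Table 1.
table1 = {
    ("July 2019", "IPv4"): (29.79, 0.87),
    ("July 2019", "IPv6"): (7.80, 0.31),
    ("July 2020", "IPv4"): (45.38, 1.69),
    ("July 2020", "IPv6"): (15.87, 1.14),
    ("October 2020", "IPv4"): (48.58, 1.44),
}


def truncation_ratio(total, truncated):
    """Share of truncated responses as a percentage of the total."""
    return 100.0 * truncated / total


for (month, ip_version), (total, truncated) in table1.items():
    ratio = truncation_ratio(total, truncated)
    print(f"{month} {ip_version}: {ratio:.2f}%")
```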

Finding #1: Large responses are rare

The first analysis we do is to calculate the distribution of the response sizes our servers sent. We see in Figure 1 that 99.99% of the responses from the .nl servers are smaller than 1232 bytes (vertical dashed line), which is the size proposed by the DNS Flag Day 2020. One could say "well, that's only valid for the .nl zone". But Google Public DNS, the largest public resolver service on the Internet, reports that 99.7% of their traffic is also smaller than 1232 bytes.

Figure 1: response sizes per server/IP version for July 2020.
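A figure like "99.99% of responses are within 1232 bytes" is simply the empirical CDF evaluated at that threshold. A minimal sketch with made-up sample sizes (the real analysis ran over billions of rows in ENTRADA, and the function name here is hypothetical):

```python
# Size recommended by DNS Flag Day 2020:
# 1280 (IPv6 minimum MTU) - 40 (IPv6 header) - 8 (UDP header).
FLAG_DAY_SIZE = 1232


def share_at_most(sizes, limit=FLAG_DAY_SIZE):
    """Fraction of response sizes (in bytes) that do not exceed limit."""
    sizes = list(sizes)
    if not sizes:
        raise ValueError("need at least one response size")
    return sum(1 for size in sizes if size <= limit) / len(sizes)


# Hypothetical response sizes, for illustration only.
sample = [120, 340, 512, 801, 1232, 1400, 90, 255]
print(f"{100 * share_at_most(sample):.1f}% of responses fit in 1232 bytes")
```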

Contrary to what we expected, the largest responses are for A and AAAA records of the .nl authoritative servers, and not for DNSSEC records. The size of the responses also differs per server: NS1 is configured to return minimal responses, while NS3 is not. Minimal responses prevent extra records from being added to the additional section, reducing the response size.

Finding #2: Fragmentation rarely occurs on the server side

IP fragmentation can happen on the server side and along the way (the latter only for IPv4; IPv6 forbids in-network fragmentation). We analyze, per server and per IP version, the number of fragmented responses sent by our servers. Figure 2 shows the results. Very few responses are fragmented: fewer than 10k a day, compared with 1 to 2 billion daily queries in total (Table 1). In the paper we also describe an active measurement with RIPE Atlas to measure in-network fragmentation. We found that 4.4% of queries are fragmented at the network level in the wild over IPv4.

Figure 2: UDP fragmented queries for .nl authoritative servers.
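For readers wondering how a fragmented IPv4 packet can be recognized in a capture: the header carries a "more fragments" (MF) flag and a 13-bit fragment offset, and a packet is a fragment if either is non-zero. The sketch below illustrates that check on raw header bytes; it is not the method used in the study, and the function name is our own.

```python
import struct

MF_FLAG = 0x2000      # "more fragments" bit in the flags/offset field
OFFSET_MASK = 0x1FFF  # lower 13 bits: fragment offset in 8-byte units


def is_ipv4_fragment(ip_header: bytes) -> bool:
    """True if the IPv4 header belongs to a fragment of a larger datagram."""
    if len(ip_header) < 20:
        raise ValueError("an IPv4 header has at least 20 bytes")
    # Bytes 6-7 hold 3 flag bits followed by the fragment offset.
    (flags_and_offset,) = struct.unpack("!H", ip_header[6:8])
    more_fragments = bool(flags_and_offset & MF_FLAG)
    offset = flags_and_offset & OFFSET_MASK
    return more_fragments or offset != 0


def example_header(flags_and_offset: int) -> bytes:
    """Build a dummy 20-byte header with only the relevant field set."""
    header = bytearray(20)
    header[0] = 0x45  # version 4, header length 5 words
    header[6:8] = struct.pack("!H", flags_and_offset)
    return bytes(header)


print(is_ipv4_fragment(example_header(0x4000)))  # DF only: not a fragment
print(is_ipv4_fragment(example_header(0x2000)))  # first fragment (MF set)
print(is_ipv4_fragment(example_header(0x00B9)))  # last fragment (offset 185)
```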

Finding #3: Small EDNS0 buffers lead to truncation, larger ones don't prevent it

We see in Table 1 that 2.93% to 7.15% of the UDP responses are truncated. Now we investigate why. Figure 3 shows the CDF of both response sizes and EDNS0 buffer sizes for NS1. We see that most truncated DNS/UDP responses are smaller than 512 bytes, independent of the IP version.

Figure 3: NS1: CDF of DNS/UDP TC responses for .nl: July 2020.

We also see that most buffer sizes are equal to 512 bytes (left dashed vertical line), which is rather small. Oddly, the purple line for IPv4 shows that NS1 receives 13% of its queries without the EDNS0 extension. We found that these queries come from two ASes that behave oddly and query only NS1 (sticky resolvers).

When a resolver receives a truncated response, it should repeat the same query using DNS/TCP. We found that this happens in 80% of the cases, as shown in Figure 4.

Figure 4: DNS/UDP TC responses followed by TCP queries.
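One way to estimate how often a truncated response is followed by a TCP retry is to match each TC response with a later TCP query from the same resolver for the same name and type. The sketch below shows that idea on a toy log; the record layout and the 5-second matching window are assumptions made for this example, not the parameters used in the paper.

```python
from collections import namedtuple

# Hypothetical, simplified log record; tc marks a truncated UDP response.
Record = namedtuple("Record", "time resolver qname qtype proto tc")


def tcp_retry_share(records, window=5.0):
    """Fraction of truncated UDP responses followed by a matching TCP query
    from the same resolver within `window` seconds."""
    tcp_times = {}
    for r in records:
        if r.proto == "tcp":
            key = (r.resolver, r.qname, r.qtype)
            tcp_times.setdefault(key, []).append(r.time)

    truncated = [r for r in records if r.proto == "udp" and r.tc]
    if not truncated:
        return 0.0

    followed = 0
    for r in truncated:
        times = tcp_times.get((r.resolver, r.qname, r.qtype), [])
        if any(r.time <= t <= r.time + window for t in times):
            followed += 1
    return followed / len(truncated)


log = [
    Record(0.0, "192.0.2.1", "example.nl.", "DNSKEY", "udp", True),
    Record(0.2, "192.0.2.1", "example.nl.", "DNSKEY", "tcp", False),
    Record(1.0, "192.0.2.2", "example.nl.", "DNSKEY", "udp", True),
    Record(2.0, "192.0.2.3", "example.nl.", "A", "udp", False),
]
print(tcp_retry_share(log))  # one of the two truncated responses is retried
```

Matching on resolver IP address alone undercounts retries from resolver farms that send the TCP query from a different address, so a production analysis would need a more careful grouping.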

Finding #4: Direct DNS Flag Day 2020 uptake was rather small, but operators adapt slowly

The DNS Flag Day 2020 proposed that resolver operators configure their EDNS0 buffer sizes to 1232 bytes. That, in turn, would reduce the large buffer sizes we see in Figure 5, and avoid both fragmentation and truncation. We use the October 2020 dataset and compare it against the July 2020 dataset to measure the uptake of the Flag Day: we take all resolvers seen in both datasets and count how many have migrated to an EDNS0 buffer size of 1232 bytes.

Of 1.85 million resolvers (unique IP addresses), only 11,338 adopted 1232 bytes compared to July 2020, suggesting that the flag day did not cause operators to change their settings immediately.
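The comparison between the two months boils down to a set intersection followed by a filter. A minimal sketch, assuming each dataset has been reduced to a mapping from resolver IP address to its advertised EDNS0 buffer size (the mappings and the function name below are hypothetical):

```python
def flag_day_adopters(before, after, target=1232):
    """Return (resolvers seen in both snapshots, those that moved to target).

    before/after map resolver IP address -> advertised EDNS0 buffer size.
    """
    common = before.keys() & after.keys()
    adopters = {
        ip for ip in common
        if before[ip] != target and after[ip] == target
    }
    return common, adopters


# Toy data, for illustration only.
july = {"192.0.2.1": 4096, "192.0.2.2": 512, "192.0.2.3": 1232, "192.0.2.4": 4096}
october = {"192.0.2.1": 1232, "192.0.2.2": 512, "192.0.2.3": 1232, "192.0.2.5": 1232}

common, adopters = flag_day_adopters(july, october)
print(len(common), "resolvers in both months,", len(adopters), "adopted 1232")
```

Resolvers that already announced 1232 bytes in the earlier snapshot are deliberately not counted as adopters, and resolvers that appear in only one month are ignored.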

But we also investigated the daily distribution over a 1.5-year period, as shown in Figure 5. By the end of May 2021, 9% of the resolvers announce 1232 bytes, twice as many as one year earlier. However, the majority still announces either 4096 bytes or other values.

Figure 5: Daily EDNS buffer distribution by resolvers (y axis in log-2 scale).

Summary

This study complements previous ones on fragmentation and truncation in DNS. While rather rare, large responses exist in DNS, and they can be prevented by the increased adoption of smaller buffer sizes. Server-side fragmentation is very rare, for both IPv4 and IPv6, but in-network fragmentation is still present (4.4% for IPv4, similar to previous studies). The DNS Flag Day 2020 had some impact, but DNS operators adopted its recommendations only slowly.

This blog is based on a peer-reviewed paper.