Algorithm rollover: the effects on our network traffic and resolvers

Significantly more TCP queries

The original blog is in Dutch. This is the English translation. At the end of July, our DNS Team successfully completed the process of migrating .nl to a new cryptographic algorithm. The 'rollover' went without a hitch, so .nl is now secured by a more modern and efficient algorithm. This article describes the (sometimes surprising) effects that the procedure had on our network traffic.

For the past few weeks, .nl has been secured using the ECDSA P-256/SHA-256 cryptographic algorithm instead of RSA/SHA-256. One consequence is that the responses sent by our name servers when queried about .nl domain names are now shorter. However, safe migration from RSA/SHA-256 to ECDSA P-256/SHA-256 required a short period when our zone was signed using both algorithms. During that transitional double-signing period, our name servers' responses were actually considerably larger.

Rollover phases

The algorithm rollover was divided into several phases to ensure that resolvers were able to validate the .nl zone at all times. For details of the various phases, see our previous blog post. Here at SIDN Labs, we were particularly interested in the phase when records were signed using 2 sets of keys: 1 based on the old algorithm and 1 based on the new algorithm. That phase ran from 5 to 22 July.

During the double-signing phase, the DNSKEY record set contained 4 keys: 2 ZSKs (zone-signing keys) and 2 KSKs (key-signing keys). In addition, every signed record had an extra signature. The table below shows the sizes of responses to queries about a non-existent domain name (NXDOMAIN), about the keys for .nl (DNSKEY), and about name servers (NS) before, during and after the rollover.

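Response size (bytes). NXDOMAIN responses are for the query 9bfg7ty4qc.nl/A.

| Server                                  | NXDOMAIN before | NXDOMAIN during | NXDOMAIN after | DNSKEY before | DNSKEY during | DNSKEY after | NS before | NS during | NS after |
|-----------------------------------------|-----------------|-----------------|----------------|---------------|---------------|--------------|-----------|-----------|----------|
| ns1.dns.nl. (194.0.28.53)               | 1,015           | 1,402           | 759            | 766           | 1,024         | 310          | 1,214     | 1,022     | 928      |
| ns1.dns.nl. (2001:678:2c:0:194:0:28:53) | 1,015           | 1,402           | 759            | 766           | 1,024         | 310          | 1,214     | 1,022     | 928      |
| ns3.dns.nl. (194.0.25.24)               | 1,016           | 1,403           | 760            | 767           | 1,025         | 311          | 1,187     | 1,199     | 929      |
| ns3.dns.nl. (2001:678:20::24)           | 1,016           | 1,403           | 760            | 767           | 1,025         | 311          | 1,199     | 1,235     | 929      |
| ns4.dns.nl. (185.159.199.200)           | 1,006           | 1,393           | 750            | 753           | 1,011         | 297          | 1,201     | 1,009     | 915      |
| ns4.dns.nl. (2620:10a:80ac::200)        | 1,002           | 1,389           | 750            | 753           | 1,011         | 297          | 1,201     | 1,009     | 915      |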

Table 1: Size of responses to various DNS query types with the DO flag set before, during and after the rollover (sources: DNSViz before, during, after)

A number of things stand out from the data. First, it's clear that the migration to ECDSA yielded significantly smaller responses. Second, the responses differ a little in size, depending on which of the 3 name servers they come from. The reason for the discrepancies is that the name servers run different open-source name server software packages, which don't all implement DNS name compression in the same way.

Finally, it's striking that responses with the NXDOMAIN code were particularly large during the transitional phase. To prove that a domain name does not exist, we use NSEC3 for .nl. An NXDOMAIN response therefore contains up to 3 extra NSEC3 records and the SOA record, and each record is signed. Consequently, during the rollover, a response to a query about a non-existent domain name contained 8 signatures, instead of the usual 4.

Packet sizes and the DNS

Size can influence whether a packet is sent over UDP or TCP. When sending a query, a resolver indicates the maximum packet size it can accept by UDP transport. Name servers apply packet size limits as well, which they communicate to resolvers. A limit is communicated by specifying a value for the DNS packet's 'EDNS(0) buffer size' parameter. The lower of the 2 limit values is taken as the maximum size for a packet transported over UDP. If a packet exceeds the limit, the name server asks the resolver to resend the query, this time using TCP. That's done by setting the 'TC flag' in the response.
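The negotiation described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the logic of any particular name server implementation; the function names `effective_udp_limit` and `must_truncate` are hypothetical.

```python
def effective_udp_limit(resolver_bufsize: int, server_bufsize: int) -> int:
    """Return the largest UDP response both sides accept: the lower of the 2 limits."""
    return min(resolver_bufsize, server_bufsize)


def must_truncate(response_size: int, resolver_bufsize: int, server_bufsize: int) -> bool:
    """Return True if the response exceeds the UDP limit, so the TC flag would be set."""
    return response_size > effective_udp_limit(resolver_bufsize, server_bufsize)


# A resolver advertising 4,096 bytes querying a name server limited to 1,232 bytes.
print(effective_udp_limit(4096, 1232))   # the lower limit applies
print(must_truncate(1402, 4096, 1232))   # a 1,402-byte response does not fit
print(must_truncate(759, 4096, 1232))    # a 759-byte response fits
```

Note that a resolver advertising a large buffer gains nothing if the name server's own limit is lower.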

Since DNS Flag Day 2020, the recommended EDNS(0) buffer size setting has been 1,232 bytes. Of course, it's also important that both resolvers and name servers support DNS over TCP.

Our ns1.dns.nl and ns4.dns.nl name servers signal an EDNS buffer size of 1,232 bytes, while ns3.dns.nl signals a buffer size of 1,400 bytes. A third of the queries we receive come from resolvers whose maximum supported message size is 1,232 bytes. A further 17.4 per cent are sent by resolvers whose maximum EDNS buffer size is 1,400 bytes, and 14 per cent by resolvers that claim to be able to handle messages of up to 4,096 bytes. A breakdown of the packet size limits encountered over recent years is presented on stats.sidnlabs.nl. From that data, it's apparent that the proportion of queries with a specified buffer size of 1,232 bytes has increased since DNS Flag Day 2020.

Given that the maximum packet size is determined by the lower of the 2 buffer sizes (the resolver's and name server's), the largest packet that any of our name servers can send over UDP is 1,400 bytes. As the table above shows, during the rollover, all responses regarding non-existent domain names were larger than the maximum buffer size supported by the name servers. In addition, responses containing NS records sent by ns3.dns.nl using IPv6 were larger than 1,232 bytes, and therefore too big for at least a third of the resolvers.
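As a check on that reading of Table 1, the sketch below compares the 'during' response sizes with each name server's own buffer size. The sizes and buffer sizes are taken from the text and the table; the dictionary layout and the helper name `exceeds_server_limit` are ours, chosen for illustration.

```python
# EDNS buffer sizes (bytes) signalled by the name servers, as stated in the text.
SERVER_BUFSIZE = {"ns1": 1232, "ns3": 1400, "ns4": 1232}

# NXDOMAIN response sizes (bytes) during the rollover, from Table 1.
NXDOMAIN_DURING = {
    ("ns1", "IPv4"): 1402, ("ns1", "IPv6"): 1402,
    ("ns3", "IPv4"): 1403, ("ns3", "IPv6"): 1403,
    ("ns4", "IPv4"): 1393, ("ns4", "IPv6"): 1389,
}

# NS response size (bytes) from ns3 over IPv6 during the rollover, from Table 1.
NS3_IPV6_NS_DURING = 1235


def exceeds_server_limit(sizes: dict) -> dict:
    """Map each (server, transport) pair to whether its response exceeds the server's limit."""
    return {key: size > SERVER_BUFSIZE[key[0]] for key, size in sizes.items()}


# Every NXDOMAIN response was larger than the responding server's buffer size.
print(all(exceeds_server_limit(NXDOMAIN_DURING).values()))

# The ns3 IPv6 NS response fits ns3's own limit, but not a 1,232-byte resolver limit.
print(NS3_IPV6_NS_DURING <= SERVER_BUFSIZE["ns3"], NS3_IPV6_NS_DURING > 1232)
```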

TCP traffic: what we expected

In view of the packet sizes expected during the rollover, and the supported buffer sizes, it was easy to predict that TCP traffic would increase. But by how much?

Ahead of the rollover, we made an estimate of what the upturn would be. That involved calculating how many responses would be too big for UDP transport during the rollover, triggering the TC flag to be set in the response. To that end, we assumed that all queries about non-existent domain names also requesting DNSSEC-related records such as signatures (signalled using the DO flag) would get TC-flagged responses.

We additionally assumed that some responses about existent domain names would not go via UDP. To estimate the number involved, we worked out how many responses would be too big for a UDP packet if the response size increased by 98 bytes (the typical size of an extra ECDSA signature).
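The counting step for existent domain names might look like the sketch below. It is a simplified illustration, not our actual analysis code: the function name `count_newly_truncated` and the sample sizes are hypothetical, and a single UDP limit is assumed for all responses.

```python
EXTRA_SIGNATURE_BYTES = 98  # typical size of an extra ECDSA signature (from the text)


def count_newly_truncated(response_sizes, udp_limit, extra=EXTRA_SIGNATURE_BYTES):
    """Count responses that fit within the UDP limit now, but not after growing by `extra` bytes."""
    return sum(1 for size in response_sizes if size <= udp_limit < size + extra)


# Hypothetical response sizes in bytes; real input would come from traffic data.
sample_sizes = [900, 1150, 1200, 1230, 1300]

# 1,150, 1,200 and 1,230 cross the 1,232-byte limit once 98 bytes are added;
# 900 still fits, and 1,300 was already too big before the rollover.
print(count_newly_truncated(sample_sizes, udp_limit=1232))
```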

Our estimate suggested that, during the rollover, we would be sending 5.6 times as many responses with the TC flag set as we did before. However, we knew from our previous research that 5.6 times as many TC-flagged responses doesn't automatically equate to 5.6 times as many TCP queries. At least 15 per cent of responses that have the TC flag set do not lead to queries being sent over TCP. Taking that observation into account, we anticipated that TCP traffic would increase to 4.7 times its pre-rollover level.
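The step from TC-flagged responses to TCP queries is simple arithmetic, sketched below with variable names of our own choosing. Using exactly 15 per cent gives an upper bound of about 4.76, which is in line with the figure of roughly 4.7 given that the share is 'at least' 15 per cent.

```python
# Figures from the text.
tc_response_growth = 5.6        # expected growth factor of TC-flagged responses
min_share_without_retry = 0.15  # at least this share of TC responses triggers no TCP query

# Upper bound on the expected growth factor of TCP queries.
expected_tcp_growth = tc_response_growth * (1 - min_share_without_retry)

print(f"{expected_tcp_growth:.2f}")
```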

Proof of the pudding

Once the zone was actually signed using the new algorithm, it became apparent that our estimate was somewhat conservative. The total number of TCP queries went up from an average of 359 per second to 2,421 per second: nearly a seven-fold increase. As an average proportion of all DNS traffic, TCP queries increased from 0.8 per cent to 4.9 per cent.
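For comparison with the estimate, the observed growth factor follows directly from the 2 averages quoted above. The variable names in this small sketch are illustrative.

```python
# Average TCP queries per second, from the text.
tcp_qps_before = 359
tcp_qps_during = 2421

# Observed growth factor of TCP queries during the rollover.
observed_growth = tcp_qps_during / tcp_qps_before

print(f"{observed_growth:.2f}")
```

The result, roughly 6.7, is well above the anticipated 4.7.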

As anticipated, the growth was to a large extent driven by queries about non-existent domain names. Before the rollover, about half of incoming TCP queries related to non-existent domain names; while the rollover was in progress, more than 3 quarters did.

Figure 1: UDP and TCP traffic before and after additionally signing the .nl zone using ECDSA P256 (grey line).

However, the growth in queries about non-existent domain names wasn't restricted to the TCP traffic flow. As the following chart shows, the proportion of all traffic accounted for by such queries increased from 8.5 per cent to 13 per cent. During the rollover, the overall number of queries about non-existent domain names grew by a factor of more than 1.6.

Figure 2: Responses with NXDOMAIN return codes before and after additionally signing the .nl zone using ECDSA P256 (grey line).

So, who caused the growth in NXDOMAIN queries?

As mentioned above, not all responses with the TC flag set result in the queries being resent over TCP. From data collected during the rollover, we can see that some resolvers that receive TC-flagged responses persist with UDP instead of sending TCP queries.

With the aim of understanding which resolvers would behave that way, we dug deeper into our data. Every day, we receive queries from more than 1.5 million resolvers. For the purpose of the analysis described here, we therefore focused on resolvers that sent more queries the day after the rollover than the day before. We selected a subset of 5,000 resolvers sending more than 1,000 queries a day that, in absolute terms, exhibited the biggest pre-rollover to post-rollover query growth.
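A selection of that kind could be sketched as follows. This is an illustration rather than our actual pipeline: the function name `select_growing_resolvers`, the input format (resolver address mapped to queries per day) and the sample data are hypothetical, and we assume here that the 1,000-query threshold applies to the day after the rollover.

```python
def select_growing_resolvers(before, after, min_queries=1000, top_n=5000):
    """Return the resolvers with the largest absolute query growth.

    `before` and `after` map a resolver address to its query count on the
    day before and the day after the rollover, respectively.
    """
    growth = {
        ip: after[ip] - before.get(ip, 0)
        for ip in after
        # Assumption: the threshold is applied to the post-rollover day.
        if after[ip] > min_queries and after[ip] > before.get(ip, 0)
    }
    # Sort by absolute growth, largest first, and keep the top_n resolvers.
    return sorted(growth, key=growth.get, reverse=True)[:top_n]


# Hypothetical data, using documentation addresses (192.0.2.0/24).
before = {"192.0.2.1": 5000, "192.0.2.2": 200, "192.0.2.3": 9000, "192.0.2.4": 1500}
after = {"192.0.2.1": 12000, "192.0.2.2": 900, "192.0.2.3": 8000, "192.0.2.4": 1600}

print(select_growing_resolvers(before, after, top_n=2))
```

In the sample, 192.0.2.2 is dropped for sending too few queries and 192.0.2.3 for sending fewer queries than before.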

More than half of the resolvers in question sent at least twice as many queries after the rollover as they did before, and the growth in queries about non-existent domain names was even greater. Half of the resolvers sent 4 times as many queries about non-existent domain names, and a quarter sent 8 times as many.

In some cases, the resolvers' behaviour was sufficient to trigger the automatic application of response rate limiting (RRL) by our name servers. RRL causes the TC flag to be set in the response, or even no response to be sent. The number of unanswered queries therefore increased during the rollover from 218 per second to 554 per second.

Lack of TCP support

We suspected that resolvers that went on sending the same query over UDP didn't properly support TCP. Examination of the data shows, however, that only some of the resolvers that persist with UDP lack TCP support. For example, certain resolvers that are used for large-scale measurements by universities received numerous TC-flagged responses, but didn't resend any queries over TCP.

The impact of such behaviour on end users is unclear. A resolver that doesn't receive a reply is likely to wait briefly, and then send a SERVFAIL error message back to the client. If the client doesn't have access to an alternative, functional resolver, it won't be able to reach the queried domain. However, we didn't receive any complaints about unreachable domains during the rollover.

Traffic developments after the rollover

Once the rollover was complete, things quietened down. Queries about non-existent domain names returned to their pre-rollover level. As the following chart shows, the volume of queries received over TCP also fell back, but didn't drop below its old level.

Figure 3: UDP and TCP traffic after the .nl zone was signed only using ECDSA P256 (grey line).

Also, the proportion of resolvers sending a TCP query at least once on a given day fell: about 11 per cent of resolvers sometimes sent TCP queries before the rollover. After the rollover, only 5 per cent did so. It would appear, therefore, that a small number of resolvers no longer need to use slower TCP connections.

Other effects have come to light as well. For example, the average size of the responses sent by our name servers has fallen by 18 per cent, from 373 bytes (353 bytes median) to 306 bytes (290 bytes median). That's not an insignificant reduction, considering that our name servers send 3.6 billion responses or more per day: it means we're sending about 211 GB less data over the internet every day.
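The relative reduction in response size can be reproduced from the averages and medians quoted above. The helper name `relative_reduction` in this short sketch is ours.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100


# Response sizes in bytes, from the text.
mean_reduction = relative_reduction(373, 306)
median_reduction = relative_reduction(353, 290)

print(f"mean: {mean_reduction:.1f}%, median: {median_reduction:.1f}%")
```

Both the mean and the median shrink by about 18 per cent, so the reduction is not driven by a few unusually large responses.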

Conclusions and plans

During the rollover, our name servers had to handle significantly more TCP queries. As previously observed, that didn't cause any problems, but did yield some surprising insights.

For example, it seems that some resolvers still don't have reliable DNS-over-TCP support, despite TCP always having been a central feature of the DNS protocol. The growth in queries about non-existent domain names that probably resulted from that situation had no material consequences for our name servers. However, some resolvers' inability to use TCP could be problematic for end users of the poorly configured networks in question.

We would therefore advise anyone who hasn't already done so to check that their resolvers do support TCP. And we hope that everyone enjoys getting smaller responses from .nl.