Thesis: Impact of scanning on authoritative name servers
Pascal Huppert investigated the risks of DNS scanning for the DNS infrastructure as part of his master's thesis.
The Domain Name System (DNS) normally answers individual queries that take the form “What is the IP address of domain X?” But many parties perform DNS scans, sending thousands or millions of queries to find out about available domain names, for example. Such scans can be identified, and make up around 30% of all traffic to SIDN’s DNS servers.
This research was performed at SIDN Labs as part of a master’s thesis at the University of Münster, Germany.
DNS traffic is normally the result of the actions of users, for example when visiting a website or sending an email. But, like most things involving computers and the internet, DNS lookups can also be automated to gather data about domain names in bulk. This is often done for research purposes and for domaining – checking the availability of a list of domain names. Open-source software exists for this purpose, and checking millions of names is easy on an average computer, potentially generating heavy traffic.
This raises the question of what we can find out about DNS scanning: how popular is it, who does it, and why? And ultimately: does it pose a problem for the DNS infrastructure?
Scans can be found by looking for outliers in resolver traffic.
Most scans are performed from the networks of hosting or cloud providers, but they can also be found in traffic from many public resolvers, research networks and other companies.
We estimate that around 30% of total traffic to the .nl servers is caused by scanning.
Domain names have become a valuable resource, sometimes being bought and sold for thousands or even millions of euros. This has made them attractive assets for domainers, who systematically search for, buy and sell domain names for profit.
On the other hand, the DNS is a critical part of the internet’s infrastructure, and knowing which subdomains exist can be valuable for hackers or penetration testing. The existence or non-existence of DNS entries and their data is also used in research (OpenINTEL, DNSdb), for example to gain more insight into the deployment of new protocols.
It is well known among DNS operators that scanning is a common activity, undertaken even by academic organisations (SIDN itself runs DMAP, for example). Generally speaking, it is not known how extensive scanning is, and it is unclear how many parties perform which kinds of scans using what data from which sources, and what that means for the DNS infrastructure. To fill this gap, this research provides insight into the types and sizes of scans as well as resolver behaviour in general.
We used query data collected by ENTRADA from all authoritative name servers for .nl. In other words, this research uses passive DNS measurements to spot active DNS measurements.
In order to identify scans, we needed a definition of what constitutes a scan, plus some manual exploration of the data. Domaining – systematically querying a large number of names – is clearly scanning. We also count monitoring, as well as bulk emailing, because the latter cannot always be distinguished from other types. More generally, any significant traffic relating to generated domain names is regarded as scanning.
We expect DNS queries to follow a certain pattern, as prescribed by the protocol. For example, a resolver should retry its query over TCP if the original response does not fit in a single UDP packet, and a resolver should cache a response for some time so as to avoid repeating a query too frequently. Other patterns arise because of traffic content. For example, a resolver serving clients will query popular names hundreds of times each, while less popular names may be queried just once.
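The caching behaviour described above can be illustrated with a minimal sketch. The class and its API here are purely illustrative, not any real resolver's code: a resolver stores each answer for the record's TTL and answers from the cache until the entry expires, at which point it must query again.

```python
import time


class DnsCache:
    """Minimal TTL-respecting cache sketch (names and API are illustrative)."""

    def __init__(self):
        self._store = {}  # (name, qtype) -> (answer, expiry timestamp)

    def put(self, name, qtype, answer, ttl):
        # remember the answer until its TTL runs out
        self._store[(name, qtype)] = (answer, time.monotonic() + ttl)

    def get(self, name, qtype):
        entry = self._store.get((name, qtype))
        if entry is None:
            return None
        answer, expiry = entry
        if time.monotonic() >= expiry:
            # expired: the resolver must send the query upstream again
            del self._store[(name, qtype)]
            return None
        return answer
```

A resolver without this behaviour (or with a broken implementation of it) repeats identical queries, which is one of the deviations that makes scans and misconfigured clients visible.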
The key to finding scans is recognising patterns in normal traffic that scan traffic does not share. Domainers, for example, can be expected to query a large number of different domain names without redundant queries. Calculating, for each source IP address, the fraction of its queries that ask for a name not queried before by that source, and creating a histogram of the sources, yields figure 1.
Figure 1: Number of resolvers with each distinct name percentage (histogram, boxplot). Distinct name percentage: the proportion of queries relating to domain names not previously queried by this resolver.
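The distinct name percentage can be computed per source with a short sketch. The `(source_ip, qname)` input format is an assumption for illustration, standing in for the actual query log fields:

```python
from collections import defaultdict


def distinct_name_percentages(queries):
    """For each source IP, the percentage of its queries asking for a name
    it has not queried before.  `queries` is an iterable of
    (source_ip, qname) tuples in arrival order (hypothetical format)."""
    seen = defaultdict(set)    # source -> names already queried
    first = defaultdict(int)   # source -> count of first-seen names
    total = defaultdict(int)   # source -> total query count
    for src, name in queries:
        total[src] += 1
        if name not in seen[src]:
            seen[src].add(name)
            first[src] += 1
    return {src: 100.0 * first[src] / total[src] for src in total}
```

A scanning source that never repeats a name scores close to 100%, while a busy recursive resolver serving real users scores far lower.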
Looking more closely at the resolvers that send almost 100% non-repeated names (on the far right of figure 1), they turn out to be scans. Their traffic also often follows some kind of order, e.g. most popular names are queried first, or shortest names, or names are in alphabetical order. In these cases, it can be assumed that what we see is a scan.
By studying the query behaviour of many different sources, we can gain new ideas about the patterns that are indicative of scanning behaviour. For example:
An extremely high or low repeated name count, whereas only some names are repeated often in normal traffic
A large number of queries in total, such as multiple millions, although most resolvers send fewer than 500,000 per day
An unusual fraction of NX domains, which is around 5 to 10% for normal traffic and can be close to 0 or 100% for different scans
Unusual distribution of queries over the 24 hours of the day, with inexplicable peaks, for example
Querying in batches: many requests within a few milliseconds, and then periods of no querying
Unusual usage of query types, for example just querying A records or querying 4 different record types for exactly 25% of queries each, whereas public resolvers will have a long tail of uncommon query types
Deviation in other characteristics, such as the distribution of starting letters of domain names, length of names, or number of queries for the most queried names
Incorrect patterns in technical fields (non-random query IDs, non-random source ports)
Violation of conventions or standards, such as asking our name servers for recursion, which authoritative name servers do not perform, sending invalid queries, running into rate limiting, sending duplicate queries, or not repeating truncated queries using TCP
These patterns arise both from query content and from resolver implementation. While the former provides insight into intent, both are relevant, because scans are often pragmatic and use simpler software or configurations than real recursive resolvers.
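A few of the indicators listed above can be sketched as simple per-source statistics. The field names and the input format here are illustrative, not the thesis's actual feature definitions:

```python
import statistics


def scan_indicators(qnames, nx_flags):
    """A few per-source scan indicators (sketch).
    qnames: list of queried names in arrival order;
    nx_flags: parallel list of booleans, True where the response was NXDOMAIN."""
    distinct = len(set(qnames))
    return {
        # near 100% suggests a scan that never repeats a name
        "distinct_name_pct": 100.0 * distinct / len(qnames),
        # normal traffic sits around 5-10%; scans can be near 0 or 100%
        "nxdomain_pct": 100.0 * sum(nx_flags) / len(nx_flags),
        # most resolvers send far fewer queries than a large scan
        "query_count": len(qnames),
        # generated name lists often have unusual length distributions
        "mean_name_length": statistics.mean(len(n) for n in qnames),
    }
```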
Most patterns can be described using histograms, or scatterplots like figure 2, where each query is represented by a dot, with time plotted on the x-axis, queried name on the y-axis, and the query type indicated by colour.
Figure 2: Scan traffic is often in alphabetical order (diagonal lines) and involves specific record types (colour).
As well as revealing the alphabetical ordering of queries, a scatterplot can also reveal abnormalities in query volume or query types. For example, the traffic in the scatterplot above shows clear ascending diagonal lines, meaning names were queried in order A-Z, with SOA queries (dark green) being sent later than A queries (purple).
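The alphabetical ordering visible in figure 2 can also be detected numerically. A simple heuristic (not the thesis's exact method) is the fraction of consecutive query pairs in ascending order:

```python
def ascending_fraction(qnames):
    """Fraction of consecutive query pairs in which the second name sorts
    after (or equal to) the first.  Values near 1.0 suggest an alphabetical
    scan; normal traffic hovers near 0.5 (heuristic sketch)."""
    if len(qnames) < 2:
        return 0.0
    pairs = zip(qnames, qnames[1:])
    ascending = sum(1 for a, b in pairs if a <= b)
    return ascending / (len(qnames) - 1)
```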
Scans can use anything from 1 IP address up to several hundred, or even make use of open resolvers. They are abnormal in 1 or more ways, usually because scanning traffic is more homogeneous than normal queries. Scans are harder to identify in mixed traffic, such as that from open resolvers.
Data from 2 different days was used in this research: one day for exploration and design of the methods, and another day for an evaluation that is as independent as possible from the data seen previously, to confirm the results.
Clearly, these analyses can be intriguing and yield many results. But to gain deeper insight, find larger patterns and create statistics, we cannot rely on just looking at examples.
To find patterns in the resolver population, we use feature vectors describing patterns close to the ones mentioned above. These features include descriptions of:
Percentage of queries with unique names, NX responses, different query types, Punycode, TCP and other characteristics
Length of queried domain names
Distribution over time
Query counts per name
Query repetitions
And many other features
Clustering is used to find groups of similar resolvers, meaning similar types of scanning or even sources performing a scan together. Most importantly, the feature space allows us to find previously unseen behaviour, identify groups and perform classification using typical machine learning algorithms.
Only resolvers sending more than 10k queries were processed, because smaller ones can be difficult to classify even manually and are not expected to perform scans. Where many of the features are concerned, the behaviour of the resolvers previously identified as associated with scanning is much more diverse (larger deviation in feature dimensions) than that of those not associated with scanning, which in turn is less diverse than the total population. Figure 3 shows one example feature – the standard deviation over the list of numbers describing how often each domain name was queried – calculated for 40 randomly chosen sources of each of 3 types: scanning sources, non-scanning sources and unlabelled sources. Most other features show a similar pattern, with the standard deviation of samples labelled as scans generally being much larger than that of the non-scan samples.
Figure 3: Strip plot of feature values describing the deviation of the query counts for each domain name. Colour = classification label
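The example feature from figure 3 can be computed as follows (sketch; the list-of-names input format is an assumption):

```python
from collections import Counter
import statistics


def name_count_std(qnames):
    """Standard deviation over how often each distinct name was queried
    (the example feature from figure 3; sketch implementation)."""
    counts = list(Counter(qnames).values())
    if len(counts) < 2:
        return 0.0
    return statistics.stdev(counts)
```

A scan that queries every name exactly once yields a standard deviation of 0, while a resolver serving real users, with a few very popular names, yields a much larger value.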
We can also see that each attribute will, by itself, not suffice for classification, because there is still a wide overlap between feature values of different classes. Therefore, use of multiple features is necessary. Our interest lies in finding out about different groups and types, and so we apply clustering using k-Means on the features, after weighting each by its relevance for the use case.
Weights were defined by judging each feature's relevance and then assigning each to 1 of 5 groups, with the weight doubling from each group to the next, plus 1 group of excluded features. Tests confirmed that no significant improvement in our ability to distinguish accurately between scans and non-scans could be achieved by changing a weight, indicating that a stable weighting had been found. The most relevant features turned out to be:
Distinct name percentage
Percentage of queries with response code 0 and no response, respectively
Average domain name length
Repetition percentage
Unusual distributions in time, query types or name starting characters
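The weighted clustering step can be sketched as follows. Scaling each feature axis by its weight before Euclidean k-means is equivalent to using a weighted distance; the weights and data below are illustrative, and a small Lloyd's-algorithm implementation stands in for whatever library was actually used:

```python
import numpy as np


def weighted_kmeans(X, weights, k, iters=50, seed=0):
    """Lloyd's k-means on feature-weighted data (sketch).
    X: (n_resolvers, n_features) matrix; weights: per-feature weights."""
    Xw = X * weights  # apply per-feature weights by scaling each axis
    rng = np.random.default_rng(seed)
    # initialise centres on randomly chosen data points
    centres = Xw[rng.choice(len(Xw), size=k, replace=False)]
    labels = np.zeros(len(Xw), dtype=int)
    for _ in range(iters):
        # assign each resolver to its nearest centre
        d = np.linalg.norm(Xw[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centres[j] = Xw[labels == j].mean(axis=0)
    return labels
```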
Figure 4 shows the feature space and clusters, reduced to 2 dimensions using t-SNE. This visualisation shows that the algorithm can separate the different resolvers into distinct groups, just like a human can identify groups of points in the visualisation. We then manually inspect each group to identify scans.
Figure 4: Visualisation of the feature space reduced to 2 dimensions. Each point is a resolver, and each colour is a cluster.
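The thesis used t-SNE for this reduction. As a simpler, dependency-free stand-in, the same idea – projecting the feature space down to 2 dimensions for visualisation – can be sketched with PCA:

```python
import numpy as np


def pca_2d(X):
    """Project feature vectors to 2 dimensions via PCA (the research used
    t-SNE; PCA is shown here as a simpler stand-in)."""
    Xc = X - X.mean(axis=0)  # centre each feature
    # the right singular vectors of the centred data are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T  # coordinates along the 2 strongest axes
```

Unlike PCA, t-SNE preserves local neighbourhoods rather than global variance, which is why it tends to separate clusters more visibly in plots like figure 4.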
Besides domaining, we found many other types of scanning within the traffic. A significant proportion involve subdomain scanning/enumeration: attempting to guess the subdomains of known 2LDs, often using lists of common 3LDs. This should not be visible at the TLD level, because a resolver only needs to query us once to learn a 2LD's name servers; after that, subdomain queries go directly to those servers. Nevertheless, defective or insufficient caching can lead to repeat queries being sent to us. We see this most commonly in the traffic from open recursive resolvers. Domaining, on the other hand, is most often performed from the networks of hosting/cloud providers using dedicated IP addresses.
Other types of scanning identified include monitoring and scanning for similar names, which might be done for trademark protection or for identifying phishing domains. Academic scans from OpenINTEL, SIDN itself (DMAP), TU Munich and others could also be found. Web scraping was expected to be visible, but could not be identified.
In total, about 12% of all traffic from 1 day was labelled during the research, leading us to estimate that 10 to 50% of the traffic was associated with scanning. A single subdomain scanning operation alone was responsible for 4% of the traffic (250 million queries), whereas other scans usually total no more than 30 million queries. Extrapolating from the labelled data suggests that about 30% of all traffic stems from scanning on an average day.
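The scale these figures imply can be sanity-checked with simple arithmetic, assuming the 4% share and the 250 million queries refer to the same day:

```python
# If one subdomain scan of ~250 million queries made up ~4% of a day's
# traffic, the day's total is roughly:
scan_queries = 250e6
scan_share = 0.04
total_queries = scan_queries / scan_share  # about 6.25 billion queries/day

# At the estimated 30% scan share, scanning then accounts for roughly:
scanning_queries = 0.30 * total_queries    # about 1.9 billion queries/day
```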
Exceptional days also exist: the SIDN Labs website’s DNS statistics show a distortion in traffic on May 21st of this year, and the reason for this can easily be identified as a very large scan. Even though that led to 2.6 billion excess queries, no significant increase in query processing times could be observed. This leads us to believe that even large scans are not detrimental to server performance, and that rate limiting is an effective means of mitigation.
Unfortunately, no previous studies have been done to find scans in DNS traffic, so no numbers are available for comparison. This study still leaves many questions unanswered and raises many new ones, but the results provide reassurance that regular DNS scanning is not problematic in relation to the availability of SIDN’s authoritative DNS servers, and is probably not problematic for other TLDs either. You can find more details of this research in my thesis.
Article by:
Graduation trainee
Pascal conducted research at SIDN Labs as part of his master’s thesis at the University of Münster, Germany.