Recursive resolver classification
Wrapping up my master's thesis
Recursive resolvers act as middlemen between clients and DNS name servers. Operators of authoritative name servers want a better understanding of the recursive resolvers that query them, for example so that they can optimise their own services. Building a classifier for recursive resolvers was therefore the goal of the research I did for my master's thesis at SIDN Labs.
Resolvers can serve a variety of clients, ranging from end users who want to visit their favourite video streaming websites to scripts that crawl the internet for marketing or research purposes. A thorough understanding of which resolvers are most important allows operators of authoritative DNS services (such as SIDN) to understand how they should set up their server infrastructures to optimise interaction with those resolvers so as to provide the best possible service to the clients using them. Also, knowing the origins of the resolvers allows researchers to measure the adoption of new technologies in the DNS and could even enable us to estimate the number of users impacted by major changes to the DNS, such as the Root KSK rollover. Like my colleagues at .nz, I have been working on a project that involves the classification of recursive resolvers to increase our understanding of the aforementioned issues. The main difference between my project and the .nz project is that I sought not only to differentiate “real” recursive resolvers from resolvers used for monitoring purposes, but also to identify various additional kinds of resolver, such as cloud providers' resolvers, ISP resolvers and so on.
I have classified recursive resolvers based on query data collected on the .nl name servers. In principle, however, data collected on any large authoritative name server should be adequate. Recursive resolvers follow various patterns when querying .nl domain names. For example, while 82 per cent of the queries are sent by 20 per cent of resolvers for A or AAAA records, some resolvers query almost exclusively for NS records. I have collected data on twenty-seven distinctive features of nearly 1.4 million unique resolvers over the course of a single day. I have also mapped known IP addresses from known companies to their serving sectors to create seven different sector types: ISPs, hosting companies, cloud providers, IT firms, research foundations, telecommunications companies and open resolvers. That dataset served as my ground truth.
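As a rough illustration of what a per-resolver feature can look like, the sketch below computes the share of each query type per resolver IP address. It is not the feature extraction used in the thesis: the function name `qtype_shares`, the input format (pairs of resolver IP and query type) and the sample data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def qtype_shares(queries):
    """Return, for each resolver IP, the fraction of its queries per record type.

    `queries` is an iterable of (resolver_ip, query_type) pairs. This input
    format is hypothetical and chosen only for illustration.
    """
    counts = defaultdict(Counter)
    for ip, qtype in queries:
        counts[ip][qtype] += 1

    shares = {}
    for ip, per_type in counts.items():
        total = sum(per_type.values())
        shares[ip] = {qtype: n / total for qtype, n in per_type.items()}
    return shares


# Made-up sample data; the addresses come from documentation ranges.
sample_queries = [
    ("192.0.2.1", "A"), ("192.0.2.1", "AAAA"),
    ("192.0.2.1", "A"), ("192.0.2.1", "NS"),
    ("198.51.100.7", "NS"), ("198.51.100.7", "NS"),
]

shares = qtype_shares(sample_queries)
print(shares["192.0.2.1"])     # mixed A / AAAA / NS profile
print(shares["198.51.100.7"])  # queries only for NS records
```

A resolver whose profile is dominated by NS queries, like the second one here, would stand out from resolvers that mostly ask for A or AAAA records.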
Figure 1 - Companies and their traffic percentages on .nl NSs in March 2019
The pie chart in Figure 1 shows the companies and their traffic shares on .nl NSs in March 2019. I categorised the resolvers manually, depending on the type of autonomous system they belong to. This manual analysis shows that ISPs, large open DNS services, cloud firms and IT-related companies account for half of the traffic handled by .nl NSs. Next, I used the labelled data, consisting of twenty-seven feature columns and 39,361 unique IP addresses, to analyse the relevance of each feature. In view of the results, I decided to use the fifteen best features for the classification, in order to reduce the dimensionality of the dataset and prevent overfitting. The most significant features are the operating system used (identified from the TTL field of the IP packet), whether DNSSEC information is requested by the resolver, and whether certain record types are requested by the resolver.
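To give an idea of how an operating system hint can be derived from the TTL field, here is a minimal sketch of a widely used heuristic: round the observed TTL up to the nearest common initial value. The initial values (64, 128, 255) and the OS families attached to them are general rules of thumb, not the mapping used in the thesis, and the names `guess_initial_ttl` and `OS_FAMILY_BY_INITIAL_TTL` are invented for this example.

```python
# Common default initial TTLs and the OS families usually associated with
# them. This mapping is an assumption for illustration, not the thesis's.
OS_FAMILY_BY_INITIAL_TTL = {
    64: "Linux/Unix-like",
    128: "Windows",
    255: "network equipment or other",
}


def guess_initial_ttl(observed_ttl):
    """Round an observed IP TTL up to the nearest common initial TTL."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for initial in sorted(OS_FAMILY_BY_INITIAL_TTL):
        if observed_ttl <= initial:
            return initial


def guess_os_family(observed_ttl):
    """Return (OS family guess, estimated hop count) for an observed TTL."""
    initial = guess_initial_ttl(observed_ttl)
    return OS_FAMILY_BY_INITIAL_TTL[initial], initial - observed_ttl


print(guess_os_family(52))   # ('Linux/Unix-like', 12)
print(guess_os_family(117))  # ('Windows', 11)
```

The guess is coarse: operators can change the default TTL, and a path longer than the gap between two initial values would be misread.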
To finalise the research, I evaluated the performance of different classifiers. Table 1 shows the F1 scores of all the algorithms used. The F1 score is the harmonic mean of precision and recall, and reaches its best value at 1. Of the various algorithms that are popular for internet packet classification, the random forest algorithm showed the best F1 score for all class types. It was therefore used as the main algorithm for the analysis of unlabelled data.
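For readers who want to see how a per-class F1 score is obtained, the sketch below computes it from true and predicted labels using F1 = 2 · precision · recall / (precision + recall). The labels and predictions are made up for illustration and are not results from the thesis; `f1_per_class` is a hypothetical helper name.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each class label to its F1 score."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


# Made-up labels for six resolvers, for illustration only.
y_true = ["isp", "isp", "cloud", "open", "isp", "cloud"]
y_pred = ["isp", "cloud", "cloud", "open", "isp", "isp"]

scores = f1_per_class(y_true, y_pred)
for label, score in scores.items():
    print(f"{label}: {score:.2f}")
```

Reporting the score per class, as Table 1 does, makes it visible when a classifier does well on one class and poorly on another, which a single overall accuracy figure would hide.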
Table 1 - F1 score of each classifier for each class type
For some classes, I had fewer training examples than for others, which might have had a negative impact on the classification. For example, while the ground truth of open resolvers consisted of precise IP addresses obtained from open resolver companies, I manually mapped research, telecommunications and hosting companies’ IP addresses to their sectors. This ultimately resulted in 98 per cent accuracy for the open resolver class and rather lower accuracies for the other classes. Nonetheless, creating this ground truth allowed me to measure the accuracy of the classification algorithms that were used in the research.
Figure 2 shows the key results of my classification. ISP resolvers are most common.
Figure 2 - Number of IP addresses in each class on 20 March 2019 and 22 May 2019

I ran our classifier on two separate days and Figure 2 shows the results. Resolvers classified as belonging to ISPs were most common on both days, followed by resolvers run in cloud environments and public resolving services. In the future, we might see a shift towards public resolving services, if DNS over HTTPS becomes more widely deployed in applications.
To conclude, the research achieved its goals, but it became clear that classification that is 100 per cent accurate is rarely possible. My hope is that my research will suggest new angles to other researchers, draw attention to the subject and thus lead to the improvement of online DNS services. These results need to be treated with a degree of caution, however. My ground truth was both biased and ambiguous. For example, an autonomous system may host a small enterprise's recursive resolver or an open resolver. An important focus for future work is therefore to find sufficient IP addresses of each class to support improved class representation for the classifier. You can e-mail your questions and/or opinions to metinacikalinn@gmail.com or moritz.muller@sidn.nl.
For a detailed account of the research, take a look at my thesis.
Profiling recursive resolvers at authoritative name servers Acikalin MA EEMCS pdf (6.9 MB)