Using logo detection technology to identify malicious .nl websites
LogoMotive helps the fight against internet crime by flagging up unauthorised logo use
The original blog is in Dutch. This is the English translation.
Logos give a website a familiar feel and promote trust. Scammers take advantage of that by using well-known organisations' logos on malicious websites. Unsuspecting internet users see the logos and think that they're looking at a legitimate webshop or government website, when it's actually a phishing site, a fake webshop or a site set up to spread misinformation. So, here at SIDN Labs, we've developed LogoMotive: a prototype tool designed to help analysts identify abusive domain names more quickly by flagging up .nl sites that seem to be making unauthorised use of logos. In this blog we explain why we developed LogoMotive, the research on which it's based and how it works.
At SIDN Labs, we're constantly looking for new ways of tackling domain name abuse, in order to protect .nl domain name users against internet crime.
One thing we've observed is that phishing sites, fake webshops and other malicious websites often use the logos of authoritative organisations. Examples include a phishing site with a government logo on its forged DigiD login page (Figure 1a) and a fake webshop with the Trustpilot logo and the logo of a standardisation body (Figure 1b). By putting such logos on their sites, scammers lull visitors into a false sense of security. It's then easier to trick them into parting with money or data, or accepting false information.
Figure 1a: A phishing website with a government logo on its forged DigiD login page.
Figure 1b: A fake webshop that has Trustpilot and ISO logos (lower left), although it isn't affiliated or certified.
Against that background, we set ourselves the goal of helping abuse analysts to identify suspect websites in the .nl zone by checking for unauthorised logo use. We've now developed a prototype tool called LogoMotive, by building on the findings of the pilot project we did with Currence last year.
For a tool like LogoMotive, we need two key components. First, an algorithm capable of automatically detecting logos on .nl websites. The .nl zone has more than 6.2 million domain names, which are changing all the time. So the algorithm needs to be very efficient; otherwise, regular scanning of the entire zone would be too time-consuming.
The algorithm also has to be capable of recognising a variety of logos, and ultimately we'd like to be able to add new logos over time on a semi-automated basis. Flexibility is important, because scammers don't always use the same logos. Recently, for example, we've seen an upturn in the use of postal logos on phishing sites, probably in response to the growth of online shopping.
LogoMotive's second essential component is a dashboard on which abuse analysts can easily check out the websites flagged up by the algorithm. For example, it needs to be possible for the analyst to record whether a flagged domain name is legitimate or malicious, and what needs to be done next, such as disabling a malicious domain name or associating a legitimate domain name with an organisation.
For ethical reasons, we want to deliberately constrain LogoMotive in certain ways. For example, we want the algorithm to look exclusively at logos, not other visual objects, such as faces. To make sure it does that, we're training the algorithm ourselves, using only logos. LogoMotive will also be limited to analysing screenshots of public web pages.
Another requirement is that LogoMotive must operate on the basis of the 'human in the loop' principle. In other words, as with our other anti-abuse tools, there won't be any autonomous robot decision-making about things such as the deactivation of domain names. LogoMotive's output will be presented on a dashboard to help human anti-abuse analysts to assess suspect domain names thoroughly and efficiently.
Let's begin by outlining how LogoMotive works. We'll then move on to taking a closer look at its two main components: the logo-detection algorithm and the dashboard.
LogoMotive starts with a list of domain names; in this case, a list of all the domain names in the .nl zone. Working from that list, it compiles a bank of screenshots. That involves a robot visiting the websites linked to the domain names on the list and screenshotting all the relevant pages. Screenshotting is a time-consuming process, because the content of each page – including images, backgrounds and other formatting – has to be loaded before a screenshot can be taken. We therefore try to be smart about deciding which websites the robot should visit. For example, we skip any site whose content hasn't changed since the previous week, and any page that's identical to one previously found on another site.
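The two skip rules above can be sketched as follows. This is a minimal illustration, not LogoMotive's actual crawler: it assumes each page's raw HTML can be fetched cheaply before any screenshot is taken, and it uses a SHA-256 fingerprint to decide whether content has changed. The function names and data layout are hypothetical.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def fingerprint(content: str) -> str:
    """Hash a page's raw content (assumption: SHA-256 over the HTML)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_screenshots(
    sites: Dict[str, List[Tuple[str, str]]],
    previous_site_hashes: Dict[str, str],
) -> Tuple[List[str], Dict[str, str]]:
    """Return the page URLs that need a new screenshot, plus updated site fingerprints.

    sites maps a domain name to this week's (url, html) pairs;
    previous_site_hashes maps a domain name to last week's site fingerprint.
    """
    to_capture: List[str] = []
    queued_pages: Set[str] = set()  # fingerprints of pages already queued in this run
    new_site_hashes: Dict[str, str] = {}

    for domain, pages in sites.items():
        page_hashes = [fingerprint(html) for _, html in pages]
        # One fingerprint per site, independent of page order.
        site_hash = fingerprint("".join(sorted(page_hashes)))
        new_site_hashes[domain] = site_hash

        # Rule 1: skip a site whose content hasn't changed since last week.
        if previous_site_hashes.get(domain) == site_hash:
            continue

        for (url, _), page_hash in zip(pages, page_hashes):
            # Rule 2: skip a page identical to one already queued,
            # e.g. a parking page shared by many domain names.
            if page_hash in queued_pages:
                continue
            queued_pages.add(page_hash)
            to_capture.append(url)

    return to_capture, new_site_hashes
```

If two parked domain names serve the same page, only the first is queued; and calling the function again with the fingerprints it returned (and unchanged content) yields an empty list, because every site matches last week's fingerprint.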
Next, the screenshots are analysed by our logo-detection algorithm (see below). At present, the algorithm supports sixteen organisations' logos that are often used for phishing, including iDEAL, the Dutch national government, SIDN, various banks, accreditation schemes and postal/courier firms.
The analysis yields a list of logos found on website screenshots, details of which are stored in a database. Finally, an abuse analyst using a straightforward dashboard can work through the 'hits' to assess whether anything is amiss (see below).
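The blog doesn't describe the database itself. As a hypothetical sketch, a single table in SQLite (via Python's built-in `sqlite3` module) could hold each detection together with the analyst's verdict, so that the dashboard only has to list unreviewed hits at or above a logo's confidence threshold. The table name, columns and helper functions are illustrative assumptions.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema: the source only says detections are "stored in a database".
SCHEMA = """
CREATE TABLE IF NOT EXISTS detection (
    id          INTEGER PRIMARY KEY,
    domain      TEXT NOT NULL,
    url         TEXT NOT NULL,
    logo        TEXT NOT NULL,       -- e.g. 'sidn'
    confidence  REAL NOT NULL,       -- detector score in [0, 1]
    x REAL, y REAL, w REAL, h REAL,  -- bounding box in pixels
    label       TEXT,                -- analyst verdict, NULL until reviewed
    action      TEXT                 -- optional follow-up chosen by the analyst
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, domain: str, url: str, logo: str,
          confidence: float, box: Tuple[float, float, float, float]) -> None:
    """Record one logo found on one screenshot."""
    x, y, w, h = box
    conn.execute(
        "INSERT INTO detection (domain, url, logo, confidence, x, y, w, h) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (domain, url, logo, confidence, x, y, w, h),
    )


def pending_hits(conn: sqlite3.Connection, logo: str,
                 threshold: float) -> List[Tuple[str, float]]:
    """Unreviewed hits for one logo at or above its threshold, most confident first."""
    return conn.execute(
        "SELECT domain, confidence FROM detection "
        "WHERE logo = ? AND confidence >= ? AND label IS NULL "
        "ORDER BY confidence DESC",
        (logo, threshold),
    ).fetchall()


def review(conn: sqlite3.Connection, domain: str, logo: str,
           label: str, action: Optional[str] = None) -> None:
    """Store the analyst's verdict, e.g. 'legitimate' or 'malicious'."""
    conn.execute(
        "UPDATE detection SET label = ?, action = ? WHERE domain = ? AND logo = ?",
        (label, action, domain, logo),
    )
```

Keeping the verdict next to the detection means a reviewed hit drops out of the analyst's queue immediately, while the history remains available for later evaluation.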
At the heart of LogoMotive is the logo-detection algorithm. We opted to use the existing YOLO (You Only Look Once) algorithm. YOLO is a neural network that's specially designed for object detection and able to detect objects more quickly than most other algorithms of its kind. Another factor behind our choice was that we had been impressed by YOLO when we used it in an earlier pilot carried out in partnership with Currence.
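Object detectors in the YOLO family typically propose several overlapping boxes around the same object and then apply non-maximum suppression (NMS) to keep only the highest-scoring one. The blog doesn't describe LogoMotive's post-processing, so the following is a generic, self-contained sketch of greedy NMS based on intersection over union (IoU), not code taken from YOLO or from LogoMotive.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it doesn't overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Detection libraries commonly apply this step separately for each class, so that, for example, an iDEAL logo next to a bank logo would not suppress it.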
During the development process, we also considered an alternative approach based on SIFT. The SIFT algorithm identifies the characteristic elements of images and calculates how many elements two images have in common. In the context of our work, a screenshot could be considered to contain a logo if the screenshot and the logo had sufficient shared elements. The advantage of using SIFT would have been that it requires less computational power than a neural network, and doesn't need training. However, we soon discovered that SIFT was relatively slow and didn't perform as well as YOLO.
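The SIFT-based decision rule described above can be sketched as follows, assuming the SIFT descriptors for the logo and the screenshot have already been extracted (for example with OpenCV). A logo descriptor counts as a shared element if its nearest screenshot descriptor is clearly closer than the second nearest (Lowe's ratio test). The ratio of 0.75 and the minimum of 10 matches are illustrative assumptions, not values from our experiments.

```python
import math
from typing import List, Sequence

Descriptor = Sequence[float]  # real SIFT descriptors have 128 dimensions


def count_good_matches(logo: List[Descriptor], screenshot: List[Descriptor],
                       ratio: float = 0.75) -> int:
    """Count logo descriptors that pass Lowe's ratio test against the screenshot."""
    good = 0
    for d in logo:
        # Euclidean distance from this logo descriptor to every screenshot descriptor.
        dists = sorted(math.dist(d, s) for s in screenshot)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            good += 1
    return good


def contains_logo(logo: List[Descriptor], screenshot: List[Descriptor],
                  min_matches: int = 10, ratio: float = 0.75) -> bool:
    """Hypothetical rule: enough distinctive shared elements means 'logo present'."""
    return count_good_matches(logo, screenshot, ratio) >= min_matches
```

This brute-force matching compares every logo descriptor with every descriptor in the screenshot, so its cost grows with the amount of detail on each page and with the number of logos being checked.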
Before YOLO can recognise objects in images, it needs to be trained using a large volume of data. Our training data consisted of screenshots known to contain one or more logos, plus associated labels for describing where the logos appear in the images. One option for compiling the training data was to manually annotate relevant screenshots. However, to arrive at a reasonable dataset that way, we would have had to invest scores of hours per logo. As well as implying considerable time input during development, that would have complicated the task of adding support for new logos in the future. We therefore devised a smart way of automatically generating training data.
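For reference, a common way of storing such labels for YOLO training (the plain-text format used by Darknet and Ultralytics) is one line per object: a class index followed by the box centre, width and height, all normalised to the image size. The blog doesn't state which label format was used for LogoMotive, so treat this converter as an assumption.

```python
from typing import Tuple


def to_yolo_label(class_id: int, box: Tuple[float, float, float, float],
                  img_w: int, img_h: int) -> str:
    """Convert a pixel box (x1, y1, x2, y2) to a YOLO-format label line."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w  # box centre, as a fraction of image width
    yc = (y1 + y2) / 2 / img_h  # box centre, as a fraction of image height
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For instance, a logo occupying pixels (100, 50) to (300, 150) on a 1000 × 500 screenshot becomes `3 0.200000 0.200000 0.200000 0.200000` if that logo is assigned class index 3.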
Our method requires two inputs: an organisation's logo and a set of screenshots from a few thousand randomly selected websites. Both are easy to obtain: the logo is available from the organisation's website and the screenshot set can be compiled by automatically crawling a random subset of the .nl zone.
We then proceeded to paste the logos onto the screenshots, randomly varying the position, size, sharpness, colour and visibility percentage of the logos. The purpose of the variations was to ensure that the neural network could handle the variety it would encounter on real web pages. The method enabled us to quickly generate large, good-quality datasets, without deploying a disproportionate amount of human capacity. Figure 2 illustrates how we generated a data point.
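The random variations can be sketched as a parameter sampler. In this hypothetical example, the position, size, blur radius, colour (hue) shift and the minimum visible share of the logo are drawn at random, and the returned box covers only the part of the logo that lands inside the screenshot, which is what the label needs to describe. The ranges are illustrative assumptions, 'visibility' is interpreted here as the share of the logo that stays inside the screenshot, and the actual pixel operations (resizing, blurring, recolouring, pasting) would be done with an image library such as Pillow.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PasteParams:
    x: int              # left edge of the logo; may lie partly outside the screenshot
    y: int              # top edge of the logo
    width: int          # logo width after rescaling, in pixels
    height: int         # logo height after rescaling, in pixels
    blur_radius: float  # 0.0 means sharp
    hue_shift: float    # colour change in degrees on the hue wheel
    min_visible: float  # lower bound on the share of the logo's width inside the screenshot


def sample_paste(
    rng: random.Random, logo_w: int, logo_h: int, img_w: int, img_h: int
) -> Tuple[PasteParams, Tuple[int, int, int, int]]:
    """Draw random paste parameters and return them with the visible box (x1, y1, x2, y2)."""
    # Size: 3 to 30 per cent of the screenshot width, keeping the aspect ratio (assumed range).
    width = max(8, int(img_w * rng.uniform(0.03, 0.30)))
    height = min(img_h, max(1, round(width * logo_h / logo_w)))  # clamp very tall logos

    # Visibility: at least half of the logo's width stays inside the screenshot.
    min_visible = rng.uniform(0.5, 1.0)
    # Letting x fall below 0 or beyond img_w - width crops the logo at the left or right edge.
    x = int(rng.uniform(-(1 - min_visible) * width, img_w - min_visible * width))
    y = rng.randint(0, img_h - height)

    params = PasteParams(
        x=x, y=y, width=width, height=height,
        blur_radius=rng.uniform(0.0, 2.0),
        hue_shift=rng.uniform(-30.0, 30.0),
        min_visible=min_visible,
    )
    # The training label covers only the part of the logo inside the screenshot.
    box = (max(x, 0), y, min(x + width, img_w), y + height)
    return params, box
```

Passing a seeded `random.Random` makes a generated dataset reproducible, which helps when comparing training runs.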
Figure 2: We generated training data automatically by combining logos with random screenshots to form training data points.
Figure 3 shows the LogoMotive dashboard, with examples of websites that feature the SIDN logo (i.e. www.sidn.nl and www.sidnlabs.nl). The dashboard uses the output of the YOLO algorithm after training as described above.
Figure 3: The LogoMotive dashboard shows websites on which a particular logo has been found.
Figure 4 shows the screen that an anti-abuse analyst sees if they click on one of the result lines shown in Figure 3 – in this case sidn.nl. The dashboard shows them the screenshot featuring the detected logo, together with the degree of confidence (in this case 0.98, or 98 per cent). The dashboard also displays information to help the analyst to assess the flagged websites thoroughly and efficiently. The analyst can then label the result at the bottom of the screen and, in appropriate cases, select a follow-up action. The labels can easily be modified for each individual logo, in line with the analyst's preferences.
Figure 4: Information about each website can be viewed. The user sees the screenshot(s) of the sites where the logo has been found, and can add notes and comments.
A key question is, of course, how good is LogoMotive at detecting logos? Answering that question isn't easy, however. A balance must always be struck between false negatives (instances of actual logo use that are overlooked) and false positives (reported finds where the relevant logo isn't really used). That implies deciding how certain you want the algorithm to be before flagging up a website (the 'confidence threshold'). What's more, the number of false negatives is always unknown, because you can't say how many sites have been overlooked without knowing the total number of sites with relevant logos in the .nl zone.
We have therefore configured the system to yield an expected false positive rate of 10 per cent, meaning that roughly one in ten websites flagged up for the analysts is expected not to use the logo in question. That is the percentage that we use for other detection systems, and is acceptable to the anti-abuse analysts who process the output. We could configure LogoMotive to deliver fewer false positives, but the price would be more false negatives.
We set the desired confidence threshold for each individual logo by manually examining a small sample of websites. With the government logo, for example, we found that a confidence threshold of 90 per cent yielded good results, while for the Thuiswinkel Waarborg logo 85 per cent was best. We don't know exactly why a higher threshold is needed for the government logo, but we suspect that it may be because the logo features a dark blue square, which is not very distinctive. The system can detect logos even if they are only half-visible, shown in a different colour, or reproduced as a very small background image. We therefore think that the number of logos overlooked is very small. Nevertheless, we intend to investigate that more closely in due course.
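The per-logo threshold selection can be expressed as a small search. Assuming the 10 per cent target means that at most one in ten flagged websites may turn out not to use the logo, the sketch below takes a manually labelled sample of (confidence, logo really present) pairs for one logo and returns the lowest threshold that meets the target. Picking the lowest acceptable value keeps false negatives down. The function names and the toy sample are hypothetical, not our actual evaluation data.

```python
from typing import List, Optional, Tuple

Sample = List[Tuple[float, bool]]  # (detector confidence, analyst confirms the logo)


def false_positive_share(sample: Sample, threshold: float) -> Tuple[float, int]:
    """Share of flagged hits that are false, and how many hits are flagged."""
    flagged = [ok for conf, ok in sample if conf >= threshold]
    if not flagged:
        return 0.0, 0
    return flagged.count(False) / len(flagged), len(flagged)


def choose_threshold(sample: Sample, max_fp_share: float = 0.10) -> Optional[float]:
    """Lowest threshold whose flagged hits contain at most max_fp_share false positives."""
    for t in sorted({conf for conf, _ in sample}):
        share, n = false_positive_share(sample, t)
        if n > 0 and share <= max_fp_share:
            return t
    return None  # no threshold meets the target on this sample


# Toy sample (hypothetical): the hits at 0.85, 0.70 and 0.60 turned out to be false.
toy = [(0.95, True), (0.93, True), (0.91, True), (0.90, True), (0.89, True),
       (0.88, True), (0.87, True), (0.86, True), (0.85, False), (0.80, True),
       (0.70, False), (0.60, False)]
print(choose_threshold(toy))  # 0.8
```

At 0.80, ten hits are flagged and one of them is false, which just meets the 10 per cent target; tightening the target to zero false positives would push the threshold up to 0.86 and leave the hit at 0.80 unflagged, illustrating the trade-off with false negatives.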
Over the last few months, we have improved our prototype logo-detection system considerably. LogoMotive can now serve as a solid basis for various follow-up studies.
In the period ahead, we intend to evaluate how the system works in practice. That will mean focusing primarily on our own Anti-Abuse Desk, but also reaching out to large organisations with authoritative logos that are often abused. Recently, for example, we started a pilot with the Dutch national government's Publicity and Communication Service (DPC). We're expecting the DPC to use LogoMotive to detect not only malicious websites making unauthorised use of the government logo, such as the phishing site in Figure 1a, but also legitimate government websites that have slipped under the radar. We'll be publishing a blog about the pilot's results before too long.
Another thing we want to do is take a closer look at the machine learning issues. What is the maximum number of logos that the neural network can look for at the same time, for example? We'd also like to investigate the scope for automatically classifying the motive for a site's logo use by drawing on other data sources. We know from our previous research into fake webshops that some data, such as the time of domain registration, has predictive value in the context of abuse detection. It may well be possible, therefore, to discern similar patterns in the field of malicious logo use. We'll explore those ideas more fully in future blogs. Finally, we intend to integrate LogoMotive output with DEX, the tool we developed a while ago for studying the ecosystem around a suspect domain name.
Article by:
Research Engineer
The main focus of my work is the application of machine learning to make the internet more secure and trustworthy. My expertise with big data and algorithms is valuable for identifying patterns associated with abuse, enabling detection and intervention.