Statistic of the month: fake webshop detections

Our self-teaching system keeps getting smarter

Fake webshops tempt online consumers with designer shoes, bags and other luxury goods at knock-down prices. What buyers actually get, however, are shoddy fakes, if they get anything at all. We've been proactively searching for fake webshops for a couple of years, because we don't want internet scams undermining consumer confidence in .nl. This blog post presents figures on the number of suspect webshops we've detected using FaDe, the new self-teaching system we've developed at SIDN Labs. It also looks at why detections peak in November.


Labs develops robust detector

In an earlier piece, we described our plans for a robust new detection system based on machine learning. The system was implemented last summer and now scans for suspect webshops every day. Before going over the numbers that shed light on its effectiveness, we should outline how it works. Our system, dubbed FaDe (Fake Detector), scans newly registered domains every morning (bottom left of Figure 1). FaDe decides whether a domain is suspect by looking at ten attributes, including the registrar and the network hosting the associated website. Our support team's anti-abuse experts then review every domain flagged as suspect (top of Figure 1). If they conclude that a flagged domain really is being used for a fake webshop (a 'true positive'), they work with the relevant registrar to get it taken down. (See this blog for details of our intervention activities.) Finally, the anti-abuse team's conclusions are fed back into FaDe, so that the system can learn from the outcomes of previous detections (bottom right of Figure 1).

Figure 1: Robust continuous detection process
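To make the feedback loop in Figure 1 a little more concrete, here is a minimal Python sketch of a classifier that flags domains for review and learns from the experts' verdicts. It is not FaDe's actual model, which this post doesn't describe: it swaps in a simple categorical naive Bayes classifier, uses only the two attributes named above (registrar and hosting network) instead of all ten, and the class name `FeedbackClassifier`, the attribute values and the 0.5 decision threshold are all hypothetical.

```python
import math
from collections import defaultdict


class FeedbackClassifier:
    """Categorical naive Bayes that learns incrementally from expert verdicts.

    Illustrative stand-in only, not FaDe's model. Attribute names, values and
    the decision threshold are hypothetical.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.class_counts = {True: 0, False: 0}  # True = confirmed fake webshop
        # counts[label][attribute][value] -> number of reviewed domains
        self.counts = {label: defaultdict(lambda: defaultdict(int))
                       for label in (True, False)}
        self.values = defaultdict(set)  # distinct values seen per attribute

    def learn(self, attrs, is_fake):
        """Record one expert verdict (the feedback step in Figure 1)."""
        self.class_counts[is_fake] += 1
        for name, value in attrs.items():
            self.counts[is_fake][name][value] += 1
            self.values[name].add(value)

    def fake_probability(self, attrs):
        """Estimated probability that a domain with these attributes is fake."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label in (True, False):
            # Laplace (add-one) smoothing keeps unseen values from zeroing a class
            score = math.log((self.class_counts[label] + 1) / (total + 2))
            for name, value in attrs.items():
                n_values = len(self.values[name] | {value})
                count = self.counts[label][name].get(value, 0)
                score += math.log((count + 1) / (self.class_counts[label] + n_values))
            log_scores[label] = score
        # Normalise in log space to avoid underflow with many attributes
        top = max(log_scores.values())
        p_fake = math.exp(log_scores[True] - top)
        p_legit = math.exp(log_scores[False] - top)
        return p_fake / (p_fake + p_legit)

    def is_suspect(self, attrs):
        """Flag the domain for expert review if the score reaches the threshold."""
        return self.fake_probability(attrs) >= self.threshold


# Hypothetical expert verdicts fed back into the classifier
clf = FeedbackClassifier()
clf.learn({"registrar": "registrar-A", "hosting_network": "AS-X"}, is_fake=True)
clf.learn({"registrar": "registrar-A", "hosting_network": "AS-X"}, is_fake=True)
clf.learn({"registrar": "registrar-B", "hosting_network": "AS-Y"}, is_fake=False)
print(clf.is_suspect({"registrar": "registrar-A", "hosting_network": "AS-X"}))  # True
```

Each call to `learn` plays the role of the anti-abuse team's conclusions flowing back into the detector, so later scores reflect earlier verdicts. Naive Bayes was chosen here only because it updates with simple counts and fits in a few lines.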

How good is the detector?

Now that FaDe has been running for three months, we can get an idea of how well it works. Figure 2 shows how many suspect domains FaDe detected: 480 in September, 406 in October and 2,263 in November. (The November figure covers the period up to the 22nd.) The huge jump in November really stands out; we'll come back to that. First, let's look at another feature of the data in Figure 2: what our anti-abuse experts made of the suspect domains. In September, 79.6 per cent of suspects turned out to be fake webshops, and in October the figure was 82.3 per cent. The remaining domains were either judged to be legitimate or could not reliably be confirmed as fake webshops. In November, our experts have so far been able to review 35.4 per cent of the suspect domains. Of those, 91.4 per cent were confirmed as fake webshops.

Figure 2: Monthly number of suspect domains flagged up by FaDe.
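The percentages above can be turned back into approximate counts, and the monthly totals into a daily rate, which makes the partial November figure easier to compare with the full months. The short Python sketch below does that arithmetic. The counts are only approximate because the percentages are rounded, and it assumes that the November figure covers 1 to 22 November inclusive (22 days); the helper names are hypothetical.

```python
# Figures quoted above; November only covers the period up to the 22nd
flagged = {"September": 480, "October": 406, "November": 2263}
# Assumption: November covers 1 to 22 November inclusive (22 days)
days_covered = {"September": 30, "October": 31, "November": 22}
# Share of flagged domains that the experts confirmed as fake webshops
confirmed_share = {"September": 0.796, "October": 0.823}


def implied_count(total, share):
    """Whole number of domains implied by a percentage rounded to 0.1."""
    return round(total * share)


def flagged_per_day(month):
    """Average number of suspect domains flagged per day in a month."""
    return flagged[month] / days_covered[month]


for month in ("September", "October"):
    confirmed = implied_count(flagged[month], confirmed_share[month])
    print(f"{month}: ~{confirmed} confirmed, {flagged_per_day(month):.1f} flagged/day")

# November: 35.4 per cent reviewed so far, 91.4 per cent of those confirmed
reviewed_nov = implied_count(flagged["November"], 0.354)
confirmed_nov = implied_count(reviewed_nov, 0.914)
print(f"November: ~{reviewed_nov} reviewed, ~{confirmed_nov} confirmed, "
      f"{flagged_per_day('November'):.1f} flagged/day")
```

This works out at roughly 382 and 334 confirmed fake webshops in September and October, and about 801 domains reviewed and 732 confirmed so far in November. On a per-day basis, detections rise from about 16 and 13 a day to about 103 a day.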

Why so many detections in November?

We can't yet put a reliable figure on the number of confirmed fake webshops in November. However, the detections assessed so far indicate that the detector's performance is stable, and the number of detections is certainly way up on the previous two months. That raises the question: why? We suspect two factors, which may interact. First, FaDe is getting better. It's a self-teaching system: it learns from the experts' feedback, which increases its ability to distinguish between suspect and legitimate domains. Because data is constantly fed back into the system, FaDe keeps improving, so it may now be flagging fake webshops that it wouldn't previously have recognised. Second, we believe a seasonal factor is at work. We already know that fake webshops are most active during the festive period: with consumers buying lots of gifts online, scammers try to grab a share of the spending bonanza. To test that hypothesis, we looked at the age of domains at the time of detection. Figure 3 shows that most domains flagged as suspect by FaDe are less than a year old. In November, however, the age distribution is much narrower. That is reflected in the median domain age: in September and October, 50 per cent of detected domains were 82 days old or younger, whereas in November, 50 per cent were just 20 days old or younger. The finding that many fake webshops are using recently registered domain names suggests that the registrations were made specifically for the festive period.

Figure 3: Age distribution of suspect domains flagged up by FaDe. The orange line represents the median, and 50 per cent of detected domains have an age within the range represented by the blue bar.
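The statistics in Figure 3 come down to two steps: computing each domain's age at detection, and summarising those ages by their median and the range covering the middle 50 per cent. The Python sketch below shows both steps, assuming (as the caption suggests) that the blue bar spans the first to third quartile. The function names, dates and ages are hypothetical and are not taken from our data.

```python
from datetime import date
from statistics import median, quantiles


def domain_age_days(registered, detected):
    """Age of a domain at the time of detection, in whole days."""
    return (detected - registered).days


def age_summary(ages):
    """Return the median age and the interquartile range (first to third quartile)."""
    q1, _, q3 = quantiles(ages, n=4)
    return median(ages), (q1, q3)


# Hypothetical registration and detection dates, for illustration only
print(domain_age_days(date(2020, 11, 1), date(2020, 11, 21)))  # 20

# Hypothetical ages (in days) of seven detected domains
ages = [3, 8, 12, 20, 25, 40, 90]
print(age_summary(ages))  # (20, (8.0, 40.0))
```

Saying that "50 per cent of detected domains were 20 days old or younger" is the same as saying that the median age is 20 days, which is why a shift in the median is a compact way to show that November's suspect domains were registered much more recently.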

What next?

From the stats presented above, it's clear that we're heading in the right direction. We're detecting a lot of suspect sites, and the vast majority are being confirmed as fake webshops. We're therefore sharing our approach with other top-level domains (TLDs) in Europe and beyond, so that they can join the fight against fake webshops. We also plan to make stats like those in Figures 2 and 3 available on stats.sidnlabs.nl.