Machine learning method identifies brand logos on fake webshops

Practical evaluation shows that 95.9% of the detections are correct: the flagged webshops really do display an iDEAL logo.

The original blog is in Dutch. This is the English translation. Authors: Thymen Wabeke (SIDN Labs) and Alice van den Wittenboer (Currence)

Two key features of any webshop are its domain name and its payment set-up. Fraudulent webshops have those features too: without a domain name, a scam site can't be found, and without a payment mechanism it can't take your money. A robust strategy for tackling fake webshops should therefore look at both features. This blog describes how SIDN Labs and Currence have developed a machine learning model that can spot 87 per cent of the webshops displaying an iDEAL logo. Then, if a shop is found to be fraudulent, SIDN can disable its domain name and Currence can block its iDEAL access.

Currence and SIDN both committed to internet security


Currence owns the iDEAL brand, while SIDN manages the .nl domain. Both are characteristically Dutch organisations operating in the digital world. iDEAL is easily the most popular online payment mechanism with Dutch consumers, and 80 per cent of Dutch internet users prefer to buy from a .nl domain. Currence and SIDN share a determination to make online shopping easy and secure, and a desire to see that the internet is safe for everyone to use.

Over the last few months, SIDN and Currence have been exploring ways of working together to tackle fake webshops. We've focused on a method for detecting webshops that offer iDEAL as a payment option; verifying whether those shops have actually implemented iDEAL was outside the scope of this research. Having come up with a detection method, we wanted to see what its potential was and whether it could be useful in the fight against abuse. We haven't yet addressed aspects such as scalability and efficiency in any great detail.

Object detection algorithm used to spot iDEAL logos

Webshops often display the logos of the payment methods they support. The idea is that consumers can see at a glance how they can pay. The detection method we've been researching is built around the practice of displaying payment logos: we take a screenshot of a webshop and analyse it using a machine learning algorithm for object detection. That's an algorithm designed to recognise 'objects' -- items such as iDEAL logos -- within screenshots.
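By way of illustration, the screenshot step might look something like the sketch below. Note that this is a minimal sketch assuming headless Chrome driven by Selenium; the exact rendering set-up isn't covered in this blog.

```python
# Minimal screenshot sketch (assumes Selenium with headless Chrome;
# the rendering set-up used in our pipeline isn't detailed in this blog).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def screenshot(domain: str, out_path: str) -> None:
    """Render a webshop's landing page and save it as a PNG."""
    opts = Options()
    opts.add_argument("--headless")
    opts.add_argument("--window-size=1366,2000")  # tall viewport catches logos in the footer
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(f"https://{domain}")
        driver.save_screenshot(out_path)
    finally:
        driver.quit()

screenshot("example.nl", "example.nl.png")
```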

For visual object detection, it's best to use a deep learning algorithm. Algorithms of this kind consist of artificial neural networks made up of multiple layers of neurons. Such networks are highly complex, but also highly effective. They're the basis for Google Lens, for example, and for face recognition applications. Various scientific articles have been published describing the successful use of neural networks for object detection, and in many cases one or more prototype implementations are also available. As a result, object detection has become a readily accessible field.

For our exploratory research, we decided to use the YOLOv3 (You Only Look Once) algorithm. YOLO divides an image into a grid (see Figure 1). For each cell of the grid, it estimates the boundaries of any objects and the associated class probabilities. Those boundary and class-probability estimates are then combined to produce the final detections. Because the various steps are integrated into a single network pass, YOLO is relatively fast. It's also accurate, and multiple implementations are available. We chose to use the Ultralytics LLC implementation.

Figure 1: The YOLO detection process.
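Running a trained detector on a screenshot might then look as follows. This is a hedged sketch: recent versions of the ultralytics/yolov3 repository expose a torch.hub interface as shown, whereas at the time of our research detection was typically driven via the repository's detect.py script, and 'best.pt' stands in for the trained weights.

```python
# Sketch of inference with a trained YOLOv3 model via torch.hub
# (recent ultralytics/yolov3 versions; older ones used detect.py instead).
import torch

# 'best.pt' stands in for the weights produced by the training described below.
model = torch.hub.load("ultralytics/yolov3", "custom", path="best.pt")

results = model("example.nl.png")
# results.xyxy[0] holds one row per detection:
# [x_min, y_min, x_max, y_max, confidence, class]
for *box, conf, cls in results.xyxy[0].tolist():
    print(f"iDEAL logo candidate at {box}, confidence {conf:.2f}")
```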

Experimentation-based detection model development

The free availability of neural network implementations doesn't mean that they can be used just as they are. First, the network's weights have to be fitted to your data set. That process is known as 'training' the model. Central to the training process are annotated examples. We obtained our examples by using the COCO Annotator to annotate the locations of iDEAL logos on screenshots of randomly selected webshops in the .nl domain. That gave us 1,400 usable examples, of which 1,120 (80 per cent) were used to train the model, and 280 (20 per cent) were used for evaluation.
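One practical detail: COCO Annotator exports annotations in the COCO JSON format, while YOLO expects one plain-text label file per image with normalised coordinates. A small conversion script along the following lines bridges the two; the file layouts are standard, and the single class 0 for the iDEAL logo reflects our set-up.

```python
# Convert COCO Annotator output to YOLO txt labels (class 0 = iDEAL logo).
import json
import pathlib

def coco_to_yolo(coco_json: str, out_dir: str) -> None:
    data = json.loads(pathlib.Path(coco_json).read_text())
    pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)
    images = {img["id"]: img for img in data["images"]}
    for ann in data["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]              # COCO: top-left corner + size, in pixels
        iw, ih = img["width"], img["height"]
        # YOLO wants: class x_center y_center width height, relative to image size
        line = (f"0 {(x + w / 2) / iw:.6f} {(y + h / 2) / ih:.6f} "
                f"{w / iw:.6f} {h / ih:.6f}\n")
        label = pathlib.Path(out_dir) / (pathlib.Path(img["file_name"]).stem + ".txt")
        with label.open("a") as f:
            f.write(line)
```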

Training a model is an empirical process involving numerous choices that influence quality. For instance, we discovered by experimentation that greyscale images work better than colour images. We think that's probably because details such as the exact shade of pink or the presence of a sheen affect the pixels' RGB values but are irrelevant to the output (whether the object is or isn't an iDEAL logo). We also found it helpful to reuse examples in slightly modified form, rotated a few degrees, say. That technique is known as data augmentation. Not everything we tried had a positive outcome, however. For example, the model wasn't improved by automatically generating extra examples by placing the iDEAL logo in random positions on a screenshot (Figure 2). That may well be because our task is relatively simple, meaning that these cheap but less representative examples don't add anything useful to the training.

Figure 2: Unsupervised example generation involving random placement of a logo on a screenshot was not very helpful.
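For illustration, the two input tricks that did help, greyscale conversion and small random rotations, can be sketched as follows. In practice such augmentations are applied on the fly by the training pipeline, and the angle range here is illustrative.

```python
# Illustrative preprocessing: greyscale plus a small random rotation.
import random
from PIL import Image

def preprocess(path: str) -> Image.Image:
    img = Image.open(path).convert("L")  # greyscale: exact shades and sheen no longer matter
    angle = random.uniform(-3, 3)        # rotate a few degrees at most
    return img.rotate(angle, fillcolor=128)  # grey fill for the exposed corners
```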

The experiments described above were concerned with the input to the neural network. However, YOLO also has a range of settings, known as hyper-parameters, which influence the way the model works. The YOLO implementation we used features a useful script that lets the algorithm evolve towards the best settings. It involves running three hundred cycles, or 'generations', of training. At the end of each generation, a retrospective analysis identifies the generation that yielded the best result so far. The settings used for that generation are then slightly modified (mutated) and used for the next generation. The results of the three hundred generations are shown in Figure 3. Each graph shows one hyper-parameter, with the parameter value (setting) on the x axis and the associated fitness on the y axis; a higher fitness score implies a better result. Each orange dot represents a generation, and the one with a blue ring is the generation with the greatest fitness. In each case, the fittest generation's setting was adopted for the model training proper.

Figure 3: Results of evolution towards the best hyper-parameter values (one graph per hyper-parameter; parameter value on the x axis, fitness on the y axis).
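The mechanism behind that script can be captured in a toy sketch. To be clear, this is not the Ultralytics code itself: the hyper-parameter names are an illustrative subset and the training run is stubbed out.

```python
# Toy sketch of evolutionary hyper-parameter search: keep the fittest
# settings so far, mutate them slightly, and train the next generation.
import random

def train_and_evaluate(hyp: dict) -> float:
    """Stand-in for a full YOLO training run; returns a fitness score."""
    return random.random()  # placeholder so the sketch runs end to end

hyp = {"lr0": 0.01, "momentum": 0.937, "weight_decay": 0.0005}  # illustrative subset
best, best_fitness = dict(hyp), 0.0

for generation in range(300):
    # Mutate the best settings so far by a few per cent and try again.
    candidate = {k: v * random.gauss(1.0, 0.1) for k, v in best.items()}
    fitness = train_and_evaluate(candidate)
    if fitness > best_fitness:
        best, best_fitness = candidate, fitness
```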

Figure 4 illustrates the development of the final model. The successive iterations are plotted on the x axis; in each iteration, the model became slightly better adapted to the examples. The y axis shows two loss scores. A loss score measures the discrepancy between the model's predictions and the annotations. As the graphs show, the loss initially declines sharply, reflecting rapid learning by the model. Over time, the rate of improvement drops off. Once the curve flattens, further training adds no value and can even lead to overfitting.

Figure 4: Loss scores during model training. Successive iterations are plotted on the x axis and the associated scores on the y axis.
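That observation is the rationale behind stopping criteria such as the toy sketch below. The training and validation functions are stand-ins, and the patience value is illustrative.

```python
# Toy early-stopping sketch: keep training while the validation loss keeps
# improving, and stop once the curve flattens.
import random

def train_one_iteration() -> None:
    pass  # placeholder for one training iteration

def validate() -> float:
    return random.random()  # placeholder validation loss

best_loss, patience, stalled = float("inf"), 10, 0
for iteration in range(1000):
    train_one_iteration()
    loss = validate()
    if loss < best_loss:
        best_loss, stalled = loss, 0   # still improving: remember and continue
    else:
        stalled += 1
        if stalled >= patience:        # curve has flattened
            break                      # stop before overfitting sets in
```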

Evaluation of the final detection model

The loss scores in Figure 4 are promising, but they are derived from the training examples and are therefore not a good evaluation measure. For evaluating the model, it's best to use new, previously unseen examples. We accordingly used the 280 test examples held back from the batch produced at the outset.

Evaluation using the unseen examples yielded good results: 87 per cent of the iDEAL logos were successfully detected (recall) and 88 per cent of the detections were correct (precision). Figure 5 shows two webshops where iDEAL logos were detected.

Figure 5: Two screenshots on which iDEAL logos were detected.
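For reference, the two scores work as follows; the round numbers in the example are chosen to land near the values quoted above.

```python
# The two evaluation metrics, expressed as functions. With tp = logos
# correctly detected, fp = spurious detections and fn = logos missed:
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)  # share of detections that are correct

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # share of actual logos that were found

# Example with round numbers: 87 of 100 logos found, 12 spurious detections.
print(recall(87, 13), precision(87, 12))  # ~0.87 recall, ~0.88 precision
```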

Using the detection model on COVID-19 webshops

The final phase of our exploratory study was practical validation of the model: using it in a field situation to see whether it yielded useful information. Our validation exercise focused on webshops with links to COVID-19, because numerous domain names relating to COVID-19 had been registered recently and we know that offering 'popular' or scarce products is a common criminal tactic. The webshops we looked at were typically selling things such as plexiglass, facemasks or DIY test kits.

From 16 April to 13 May, inclusive, we took daily screenshots of COVID-19 webshops with .nl domain names. Whenever the model detected an iDEAL logo, the domain name was referred to Currence, whose Risk Team analysed the associated webshop. The analysis yielded a lot of information about the types of webshop offering COVID-19-related products, the nature of the products involved, whether the shops actually supported iDEAL payment and, if so, through which service provider.
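Put together with the earlier sketches, the daily loop might look like this. The domain feed and the referral step are illustrative stand-ins, and 'screenshot' and 'model' refer to the sketches earlier in this post.

```python
# Hedged sketch of the daily validation loop; the helpers below are stand-ins.
def candidate_domains() -> list[str]:
    return ["example-covid-shop.nl"]  # stand-in for the COVID-19 domain feed

def refer_to_currence(domain: str) -> None:
    print(f"referred {domain} to the Currence Risk Team")  # stand-in

for domain in candidate_domains():
    path = f"screenshots/{domain}.png"
    screenshot(domain, path)          # helper from the screenshot sketch above
    results = model(path)             # trained YOLOv3 detector from the inference sketch
    if len(results.xyxy[0]) > 0:      # at least one iDEAL logo detected
        refer_to_currence(domain)
```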

Of the 145 webshops detected, 139 (95.9%) were indeed displaying iDEAL logos. Manual assessment of 138 webshops revealed that 90 per cent of them did in fact support iDEAL payment. In a few cases, Currence initiated further investigation of the site.

Conclusions and follow-up

Overall, the project was a success. The method used to establish whether a webshop offers iDEAL as a payment option is promising and the initial practical evaluation yielded a lot of useful information and follow-up pointers.

In the future, for example, we could look at whether webshops offering iDEAL do in fact enable visitors to pay using iDEAL. Another useful research focus would be the scalability of the system. After all, producing and analysing screenshots of the sites linked to all 5.9 million .nl domain names would be time-consuming and require considerable processing capacity. We would also like to investigate ways of improving accuracy: annotating more examples or improving input normalisation, for instance.

Finally, logo detection has other potentially valuable applications, such as accreditation logo recognition. It's not unusual for malicious webshops to make fraudulent use of such logos to create a false sense of reliability. Use of an accreditation logo by a shop that isn't actually registered with the relevant accreditation body is therefore a good sign that the webshop in question is suspicious. That has considerable potential in the context of fake webshop detection.