Our machine learning workshop for the TMA PhD school

Introducing PhD students to machine learning in an operational setting


Thursday 21 July 2022
Article by: Thijs van den Hout, Thymen Wabeke, Giovane Moura

We gave a workshop on operationalizing machine learning models at the 10th edition of the TMA PhD school, which was held at the University of Twente on June 27-28. Our workshop attracted around 50 PhD students, with whom we explored the differences between using such models in an academic and in an operational environment. A key part of the workshop consisted of two hands-on assignments closely related to our work at SIDN Labs on recognizing suspicious .nl registrations, the models for which we are currently incorporating into SIDN’s operational services. In this blog, we describe the TMA PhD school and our workshop, and reflect on our experience.

TMA PhD school

The PhD school covered the first two days of the Network Traffic Measurement and Analysis (TMA) conference, a highly selective venue for peer-reviewed research on various aspects of network measurements. TMA is one of the conferences we regularly attend and submit papers to because it aligns with our research on Internet security. This year's conference took place in the last week of June at the University of Twente in Enschede, the Netherlands.

As in previous years, the goal of this year's PhD school was to enable PhD students in the field of network measurements to interact with seasoned academic and industry experts as well as with their fellow students. PhD students could join in interactive presentations, discussions, and hands-on workshops around this year’s theme, which was “Network Intelligence and Measurements”. The PhD school also included the usual poster sessions, which allow students to present their ongoing work to other students and lecturers and receive feedback.

SIDN Labs as co-organizer

For this 10th edition of the TMA PhD school, we at SIDN Labs were one of the co-organizers. One of our main contributions was our workshop on the operationalization of machine learning models. Its goal was to let the students experience machine learning in a setting different from their familiar academic one and learn some new machine learning techniques along the way.

Figure 1: Thijs van den Hout presents during a workshop on operationalizing machine learning models during TMA PhD School (photo: Mattijs Jonker).

We were also one of the TMA sponsors this year, which enabled the organizers to rent meeting rooms, organize a social event (at Oyfo Museum of Technology), and provide travel grants for students to come to Enschede, among other things.

Quiz: Machine learning in academia versus operations

The first part of our workshop focused on the differences between applying machine learning in an academic setting and in an operational setting. We hosted a quiz in which participants voted, for instance, on the importance of explainable AI in academia and/or operations, and discussed why they considered certain aspects of machine learning more important in one setting than in the other.

After that we introduced the problem statement for the two coding assignments. We chose a problem in the space of Internet security, which is the focus of our research at SIDN Labs. Specifically, we picked a project we are currently working on, namely the classification of malicious .nl domain name registrations. The data on which we train the classifiers for this purpose contains a lot of personally identifiable information from the .nl registration database, which we clearly didn’t want to make available in the workshop. We therefore chose a publicly available dataset with a similar premise and characteristics: the classification of fraudulent credit card transactions.

Figure 2: Thymen Wabeke presents during a workshop on operationalizing machine learning models during TMA PhD School (photo: Mattijs Jonker).

Assignment 1: Developing a machine learning classifier

The goal of the first assignment was to develop an initial machine learning model that could classify credit card transactions as fraudulent or legitimate. We highlighted the importance of picking the right evaluation metric (e.g., the precision-recall curve) and of discussing with stakeholders (e.g., abuse analysts and product managers) which types of mistakes cause problems and which matter less. A related consideration is whether the machine learning model should prioritize high accuracy or good interpretability.
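To illustrate why the choice of metric matters for a rare class such as fraud, here is a minimal sketch in plain Python. The labels and scores are made-up toy values, not the workshop dataset, and the function names are our own for illustration. It shows that a model that never flags anything can still look accurate, and that moving the decision threshold trades precision against recall.

```python
def confusion_counts(y_true, scores, threshold):
    """Count true positives, false positives and false negatives for the fraud class (label 1)."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    return tp, fp, fn


def precision_recall(y_true, scores, threshold):
    """Precision and recall of the fraud class at a given score threshold."""
    tp, fp, fn = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(y_true, scores, threshold):
    """Fraction of transactions classified correctly at a given score threshold."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(y_true, scores))
    return correct / len(y_true)


# Hypothetical toy data: 10 transactions, 2 of them fraudulent (label 1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.60, 0.90]

# A threshold above 1 flags nothing, mimicking an "always legitimate" model.
for threshold in (0.5, 0.8, 1.01):
    p, r = precision_recall(y_true, scores, threshold)
    a = accuracy(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} accuracy={a:.2f}")
```

With these toy values, the "always legitimate" model still reaches 80% accuracy while catching no fraud at all, which is the kind of outcome worth discussing with stakeholders before settling on a metric.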

Finally, participants evaluated their initial models on a previously unseen dataset, only to discover that their model was not performing so well… which led right to the second assignment: improving their model so that it could be used in an operational setting.

Assignment 2: Iteratively improving the model using active learning

For the second assignment, we simulated a scenario in which each week, experts could label 50 credit card transactions as being either fraudulent or legitimate. After each iteration, the students could train the machine learning model again on the now larger set of training data, improving it bit by bit.
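The weekly labeling cycle can be sketched as a simple loop. This is a toy simulation under our own assumptions, not the workshop code: each transaction is reduced to a single made-up feature, `oracle` stands in for the human experts, and `train` fits a deliberately simple threshold model. Only the budget of 50 labels per week comes from the scenario above.

```python
import random

WEEKLY_BUDGET = 50  # labels the experts can provide per week (from the scenario)


def oracle(x):
    """Stand-in for the human expert; the 0.8 cut-off is a hypothetical ground truth."""
    return 1 if x >= 0.8 else 0


def train(labeled):
    """Fit a toy 1-D model: put the threshold midway between the two classes."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    if not pos or not neg:
        return 0.5  # fallback while one class has no labeled examples yet
    return (max(neg) + min(pos)) / 2


def accuracy(threshold, data):
    """Fraction of (feature, label) pairs the threshold classifies correctly."""
    return sum((x >= threshold) == (y == 1) for x, y in data) / len(data)


rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]  # unlabeled transactions
test_set = [(x, oracle(x)) for x in (rng.random() for _ in range(500))]

labeled = []
history = []
for week in range(1, 5):
    # The pool is in random order, so taking the first 50 is random selection.
    batch, pool = pool[:WEEKLY_BUDGET], pool[WEEKLY_BUDGET:]
    labeled.extend((x, oracle(x)) for x in batch)  # experts label the batch
    threshold = train(labeled)                     # retrain on the larger set
    history.append((week, len(labeled), accuracy(threshold, test_set)))

for week, n_labels, acc in history:
    print(f"week {week}: {n_labels} labels, test accuracy {acc:.3f}")
```

The batch is chosen at random here, which serves as a baseline; the selection step is the part of the loop that different sampling strategies would replace.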

This concept is called active learning; it allows machine learning models to be improved iteratively with as little ground truth data as possible, and thereby reduces labeling costs. The success of active learning depends heavily on selecting informative data samples from a large pool of unlabeled data. Various strategies for selecting informative samples exist, and we asked the students to explore these strategies and pick one that performed well for the current task.
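One commonly used selection strategy is uncertainty sampling: ask the experts to label the samples the current model is least sure about. The sketch below is our own illustration of that idea, not necessarily a strategy the students used. It assumes a binary classifier that outputs a fraud probability per sample; the function name and the probabilities are hypothetical.

```python
def select_most_uncertain(fraud_probabilities, budget):
    """Return the indices of the `budget` samples whose predicted fraud
    probability is closest to 0.5, i.e. where the model is least certain."""
    ranked = sorted(
        range(len(fraud_probabilities)),
        key=lambda i: abs(fraud_probabilities[i] - 0.5),
    )
    return ranked[:budget]


# Hypothetical model outputs for six unlabeled transactions.
probabilities = [0.02, 0.48, 0.97, 0.55, 0.10, 0.60]

to_label = select_most_uncertain(probabilities, budget=3)
print(to_label)  # → [1, 3, 5]
```

The confidently scored transactions (0.02, 0.97, 0.10) are skipped, so the labeling budget is spent on the samples nearest the decision boundary. In practice this is often compared against random selection, since uncertainty sampling can overlook regions the model is confidently wrong about.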

At the end of the assignment, the students had improved their machine learning model and we wrapped up the workshop with a discussion of the various strategies they used and the performance they achieved.

We look back on a successful workshop

We are very happy to have had the opportunity to host a workshop during the TMA PhD school and help the attendees to further understand the basics of the application of machine learning in an operational environment. The feedback we received was very positive overall, and the students particularly enjoyed the hands-on assignments.

In the future, we plan to give this workshop at other venues and come up with other assignments to inspire the global Internet community to further improve Internet security through the (responsible) use of machine learning.

Check out the workshop’s assignments

We have published the assignments’ code and the accompanying slides so that you can also go through the workshop yourself (even if you’re not a PhD student ;-)).

We’d be happy to receive your feedback or explore possibilities for collaboration.
