Turning noise into knowledge: Interview with scientist Elena Merdjanovska on machine learning and noisy labels

SCIoI researcher Elena Merdjanovska talks to the press team about her work on improving the reliability of AI models by addressing noisy labels – faulty annotations – in training data, and about new benchmarks for evaluating such methods. Her current project focuses on improving the way models learn from imperfect data, contributing to the development of responsible AI and enhancing its trustworthiness.

Thank you for joining us today, Elena. To start, could you tell us a bit about your scientific background and where you come from?

Elena: Of course. I’m originally from North Macedonia, and I have a background in electrical and computer engineering. My early research was in biomedical engineering, specifically working with EKG signals and applying machine learning techniques, mainly deep learning, to analyze this data. I worked in Slovenia, and during my master’s I published several papers in this field.

Can you describe your current project and its goals?

Elena: My current project focuses on machine learning, specifically learning with noisy labels. In machine learning, the quality of labeled data is crucial for training accurate models. However, obtaining high-quality labeled data can be expensive and time-consuming, especially in fields like medical research where data is limited and expert annotations are required. My project aims to develop methods to train models effectively even when the data has incorrect or noisy labels.

Could you provide an example to illustrate the challenges you’re addressing?

Elena: Sure. We’re mainly dealing with textual data, so let’s take the example of training a model to classify news articles by topic. While it’s easy to gather news articles, labeling them accurately by topic is challenging. For instance, distinguishing between ‘politics’ and ‘economics’ can be subjective and prone to variation among different annotators. Similarly, in medical data, even labels provided by doctors might contain mistakes due to human error. These inaccuracies can significantly impact the performance of machine learning models.

How does your work address these labeling issues?

Elena: Our goal is to improve the way models learn from imperfect data. Left unchecked, models tend to memorize incorrect labels, which reduces their overall accuracy. We are developing methods that help models identify and disregard these noisy labels during training. By doing this, the models can focus on learning the general patterns in the data rather than memorizing errors.
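To make the idea concrete, here is a minimal sketch of one widely used family of techniques for this, sometimes called “small-loss” selection: examples with unusually high loss in a batch are treated as likely mislabeled and excluded from the update. This illustrates the general principle only, not the specific method developed in this project; the function name and the keep_ratio parameter are placeholders.

```python
# A minimal sketch (assumption: a standard PyTorch classification setup).
# Examples with the largest per-sample loss are treated as likely noisy
# and dropped from the current update.
import torch
import torch.nn.functional as F

def small_loss_selection(logits, labels, keep_ratio=0.8):
    """Return indices of the examples most likely to be correctly labeled."""
    per_sample_loss = F.cross_entropy(logits, labels, reduction="none")
    n_keep = max(1, int(keep_ratio * labels.size(0)))
    # Keep the examples with the smallest loss; treat the rest as likely noise.
    return torch.argsort(per_sample_loss)[:n_keep]

# During training, the loss would then be computed only on the kept subset:
#   keep_idx = small_loss_selection(model(x), y)
#   loss = F.cross_entropy(model(x)[keep_idx], y[keep_idx])
```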

How do you measure the effectiveness of these methods?

Elena: We use a combination of approaches. One method involves looking at when certain patterns are learned during the training process. If a model learns a pattern early on, it’s likely a general feature. If it only learns it later, it might be noise or an anomaly. By measuring these learning timelines, we can adjust the model to focus on reliable data. Additionally, we create benchmarks with varying levels of noise to evaluate our methods under different conditions.
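As a rough illustration of the “learning timeline” idea described above (a sketch under assumed bookkeeping, not the project’s exact measurement), one can record the epoch at which the model first predicts each training example correctly, and then treat examples that are only fitted very late, or never, as suspect:

```python
# Hypothetical sketch: track when each training example is first learned.
# Examples learned only late in training (or never) are more likely noisy.

def update_first_learned(first_learned, epoch, sample_ids, preds, labels):
    """Record the first epoch at which each example was classified correctly."""
    for sid, pred, label in zip(sample_ids, preds, labels):
        if pred == label and first_learned.get(sid) is None:
            first_learned[sid] = epoch
    return first_learned

def likely_noisy(first_learned, late_threshold):
    """Flag examples first learned after `late_threshold` (or never learned)."""
    return [sid for sid, ep in first_learned.items()
            if ep is None or ep >= late_threshold]
```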

You mentioned that human intelligence can sense and disregard errors. How does your work relate to this?

Elena: Yes, that’s a crucial aspect. Human intelligence has an innate ability to filter out incorrect information. For example, if someone tells you a fact that doesn’t seem right, you might doubt it and not rely on it. Current artificial intelligence lacks this capability. By making models more robust to noisy labels, we’re trying to bridge this gap between human and artificial intelligence, making AI systems more reliable and trustworthy.

How does this project fit into the broader context of AI research?

Elena: Our work contributes to the development of responsible AI. By improving models’ ability to handle noisy data, we enhance their reliability and trustworthiness. This is particularly important as AI becomes more integrated into daily life. Better handling of uncertain information can lead to more ethical and trustworthy AI systems, which is a significant societal benefit.

Could you talk about your role and the impact of your research at Science of Intelligence (SCIoI)?

Elena: Certainly. At SCIoI, our primary aim is to explore the principles of intelligence, both biological and artificial. My research on noisy labels fits into this broader objective by addressing how artificial systems can better mimic human learning processes. By making AI more robust to errors, we not only improve the technology but also gain insights into how human intelligence copes with imperfect information. This contributes to SCIoI’s goal of understanding and replicating intelligent behavior across different domains.

How does your research integrate with other projects at SCIoI?

Elena: Collaboration is a key component at SCIoI. My work on noisy labels intersects with various projects that involve data quality and model robustness. For instance, some of my colleagues are working on continuous learning and interactive learning environments. The insights from my research can be applied to these projects to improve how models learn over time and interact with uncertain data. We often share findings and methodologies to enhance our collective understanding of intelligence.

What’s the next step for your research?

Elena: We’ve already contributed to the field with the development of a new benchmark, called NoiseBench, that better reflects real-world data imperfections in natural language processing. The next step is to refine our methods and test them across various domains to ensure their generalizability. We also aim to collaborate with other researchers to apply our techniques to different types of data, further enhancing the robustness and applicability of machine learning models.
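For readers unfamiliar with how robustness is typically evaluated: one simple approach is to corrupt a fraction of the labels at several controlled rates and measure how much a method degrades. The sketch below shows uniform synthetic noise injection only; it is not NoiseBench itself, which is built around real-world annotation noise, and the helper names are placeholders.

```python
# Minimal sketch of building a controlled-noise evaluation (assumption:
# integer class labels). NOT the NoiseBench benchmark, which uses real noise.
import random

def inject_uniform_label_noise(labels, num_classes, noise_rate, seed=0):
    """Flip a fraction of labels to a different, randomly chosen class."""
    rng = random.Random(seed)
    noisy = list(labels)
    flipped = rng.sample(range(len(labels)), int(noise_rate * len(labels)))
    for i in flipped:
        noisy[i] = rng.choice([c for c in range(num_classes) if c != labels[i]])
    return noisy

# for rate in (0.1, 0.2, 0.4):
#     noisy_labels = inject_uniform_label_noise(clean_labels, num_classes, rate)
#     train_and_evaluate(texts, noisy_labels)  # hypothetical helper
```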

Thank you so much for your time, Elena.

Elena: Thank you. It’s been a pleasure discussing our work.
