Venturing into uncharted territory, MediaFutures scientists conducted a study to assess the effectiveness of Norwegian pre-trained language models (LMs) in discerning the veracity of claims. The results demonstrate that Norwegian pre-trained LMs outperform an SVM baseline system. The research yields valuable insights into the future of automated claim detection tools for fact-checking in low- and medium-resourced languages.

In today’s landscape, characterized by the proliferation of fake news and the increasing prevalence of misinformation, fact-checking has become an indispensable part of journalism.

Traditional, manually conducted fact-checking, however, proves to be a tedious task, considering the staggering quantities of information and the speed with which it spreads.

Automated fact-checking systems have been introduced as a means to alleviate the time and effort required for traditional fact verification processes.

“Automated fact-checking is a multifaceted process involving three main tasks: claim detection, evidence retrieval, and claim verification,” explains researcher Ghazaal Sheikhi. “By monitoring sources like social media or politics, we can identify claims or statements that need to be checked, gather relevant sources that help to support or refute the claim in question, and provide a verdict by determining the accuracy and validity of the claim. The development of automation tools has been tailored to meet the needs of fact-checkers, with claim detection being the top priority”, adds Sheikhi.  

Building upon this notion, a recent collaborative effort across two interdisciplinary work packages, involving researchers Ghazaal Sheikhi, Samia Touileb, and Sohail Ahmed Khan, sought to explore to what extent Norwegian pre-trained language models can be used for automated claim detection for fact-checking in a low-resource setting.

A Norwegian first

Researcher Samia Touileb highlights the growing interest in automated claim detection for fact-checking, noting that it has become an appealing area of research in the field of natural language processing. “With the amount of misinformation spreading so quickly nowadays, automated claim detection has garnered significant attention in recent years”, she says.

While previous research has delved into fine-tuning LMs for claim detection in various languages, the potential of Norwegian LMs in this domain remained unexplored – until now: “To the best of our knowledge, our recent study marks the first attempt at automated claim detection for Norwegian using LLMs”, says Touileb.

The research in question first sought to ascertain the performance of Norwegian LMs in automated claim detection when compared to a simple Support Vector Machine (SVM) baseline; this comparison serves as a benchmark for evaluating the effectiveness of Norwegian LMs in this context. Second, the study aimed to identify specific challenges and areas where these LMs encounter difficulties in the claim detection process.
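
A baseline of this kind is typically built from TF-IDF features fed into a linear SVM. The sketch below shows what such a setup might look like in scikit-learn; the example sentences, feature choices, and hyperparameters are illustrative assumptions rather than the exact configuration used in the paper.

```python
# A minimal sketch of a TF-IDF + linear SVM baseline for binary
# check-worthiness classification. The example sentences and
# hyperparameters are illustrative assumptions, not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set: 1 = check-worthy claim, 0 = not check-worthy.
train_texts = [
    "Arbeidsledigheten har doblet seg siden 2020.",  # factual claim
    "For en fin dag det er i dag!",                  # opinion / small talk
    "Norge har over 5,5 millioner innbyggere.",      # factual claim
    "Jeg gleder meg til helgen.",                    # not a claim
]
train_labels = [1, 0, 1, 0]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LinearSVC(C=1.0),
)
baseline.fit(train_texts, train_labels)

# Predict check-worthiness for an unseen sentence.
print(baseline.predict(["Regjeringen har kuttet skattene med ti prosent."]))
```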

To address these questions, the researchers first fine-tuned four Norwegian LMs on a small dataset provided by the non-profit fact-checking organization Faktisk.no, comprising claims manually annotated with labels reflecting their check-worthiness. Then they manually analysed the misclassifications of each model and provided an error analysis.
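
For readers curious about what this fine-tuning step looks like in practice, the sketch below shows a typical Hugging Face Transformers setup for binary check-worthiness classification with a Norwegian BERT-style model. The model name, placeholder sentences, and training hyperparameters are assumptions chosen for illustration; the annotated Faktisk.no data itself is not reproduced here.

```python
# A minimal sketch of fine-tuning a Norwegian pre-trained LM for binary
# claim detection with Hugging Face Transformers. Model name, placeholder
# data, and hyperparameters are illustrative assumptions.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model_name = "NbAiLab/nb-bert-base"  # one of several Norwegian LMs on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Placeholder sentences standing in for claims annotated with
# check-worthiness labels (1 = check-worthy, 0 = not check-worthy).
data = Dataset.from_dict({
    "text": [
        "Arbeidsledigheten har doblet seg siden 2020.",
        "For en fin dag det er i dag!",
    ],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="claim-detector",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    ),
    train_dataset=data,
)
trainer.train()
```

After training, the misclassified examples on a held-out set can be inspected one by one, which is the kind of manual error analysis the researchers describe.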

“The results showed that language models outperform the baseline system. Different models can be selected for different purposes. If the overall performance is to be prioritized, the NorBERT2 model is the best performing. If recall is the focus, then the largest NB-BERT (Kummervold et al., 2021) model is to be selected”, explains Touileb.

The scientists emphasized that there is more to be uncovered from the behaviour of these models, and they plan to explore this in future work.


The research paper “Automated Claim Detection for Fact-checking: A Case Study using Norwegian Pre-trained Language Models” can be accessed here.

References:

Per E. Kummervold, Javier De la Rosa, Freddy Wetjen, and Svein Arne Brygfjeld. 2021. Operationalizing a national digital library: The case for a Norwegian transformer model. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 20–29, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.

Illustration: Insurtech Insights