Norwegian Language Technologies

/ Introduction

Language technologies are at the core of media technologies. This work package aims to provide datasets and models for Norwegian (Bokmål/Nynorsk) that support the automated understanding as well as the automated production of media texts in this language. 

Objective: WP5 adopts theoretical approaches and methodologies based primarily on linguistic data science, including neural learning. Large corpora will be annotated, drawing on media language data from the user partners and on data and tools from the research partners. The labelled examples in these corpora will be used to train and evaluate supervised models that demonstrate advanced approaches in areas such as robust deep language analysis, adaptive language generation, event identification and extraction, and opinion analysis. The partners will cooperate to explore the use of such models for innovative purposes.

/ People

Lilja Øvrelid

Work Package leader

Samia Touileb

Work Package co-leader

Lubos Steskal

Industry co-leader


Koenraad De Smedt


Erik Velldal


Huiling You

PhD Candidate

University of Oslo 

Emiliano Guevara


Magnus Breder Birkenes


/ Publications


Samia Touileb; Jeanett Murstad; Petter Mæhlum; Lubos Steskal; Lilja Charlotte Storset; Huiling You; Lilja Øvrelid

EDEN: A Dataset for Event Detection in Norwegian News Conference

LREC-COLING 2024, 2024.

Ghazaal Sheikhi; Samia Touileb; Sohail Ahmed Khan

Automated Claim Detection for Fact-checking: A Case Study using Norwegian Pre-trained Language Models Conference


Jeremy Barnes; Samia Touileb; Petter Mæhlum; Pierre Lison

Identifying Token-Level Dialectal Features in Social Media Conference


David Samuel; Andrey Kutuzov; Samia Touileb; Erik Velldal; Lilja Øvrelid; Egil Rønningstad; Elina Sigdel; Anna Palatkina

NorBench – A Benchmark for Norwegian Language Models Conference


Samia Touileb; Lilja Øvrelid; Erik Velldal

Measuring Normative and Descriptive Biases in Language Models Using Census Data Conference


Samia Touileb; Debora Nozza

Measuring Harmful Representations in Scandinavian Language Models Conference


Petter Mæhlum; Andre Kåsen; Samia Touileb; Jeremy Barnes

Annotating Norwegian language varieties on Twitter for Part-of-speech Workshop


Samia Touileb; Lilja Øvrelid; Erik Velldal

Occupational Biases in Norwegian and Multilingual Language Models Workshop


Samia Touileb; Lilja Øvrelid; Erik Velldal

Gender and sentiment, critics and authors: a dataset of Norwegian book reviews Journal Article

In: Gender Bias in Natural Language Processing. Association for Computational Linguistics, 2020, (Pre SFI).

J Barnes; Erik Velldal; Lilja Øvrelid

Improving sentiment analysis with multi-task learning of negation Journal Article

In: 2020, (Pre SFI).

J Barnes; Lilja Øvrelid; Erik Velldal

Sentiment analysis is not solved! Assessing and probing sentiment classification Proceedings

2020, (Pre SFI).

Wafia Adouane; Samia Touileb; Jean-Philippe Bernardy

Identifying Sentiments in Algerian Code-switched User-generated Comments Conference

2020, (Pre SFI).

Paul Meurer; Victoria Rosén; Koenraad De Smedt

Interactive Visualizations in INESS Book Chapter

In: Butt, Miriam; Hautli-Janisz, Annette; Lyding, Verena (Eds.): 2020, (Pre SFI).

Lilja Øvrelid; Petter Mæhlum; Jeremy Barnes; Erik Velldal

A Fine-Grained Sentiment Dataset for Norwegian Proceedings

2020, (Pre SFI).

F Jørgensen; T Aasmoe; ASR Husevåg; Lilja Øvrelid; Erik Velldal

NorNE: Annotating Named Entities for Norwegian Proceedings

2020, (Pre SFI).

Pierre Lison; Aliaksandr Hubin; Jeremy Barnes; Samia Touileb

Named Entity Recognition without Labelled Data: A Weak Supervision Approach Journal Article

In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1518–1533, 2020, (Pre SFI).

Koenraad De Smedt; Dimitris Koureas; Peter Wittenburg

FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units Journal Article

In: 2020, (Pre SFI).

Jeremy Barnes; Samia Touileb; Lilja Øvrelid; Erik Velldal

Lexicon information in neural sentiment analysis: a multi-task learning approach Conference

Linköping University Electronic Press, 2019, (Pre SFI).

Andrey Kutuzov; Lilja Øvrelid; Terrence Szymanski; Erik Velldal

Diachronic word embeddings and semantic shifts: a survey Proceedings

2018, (Pre SFI).

Erik Velldal; Lilja Øvrelid; Eivind Alexander Bergem; Cathrine Stadsnes; Samia Touileb; Fredrik Jørgensen

NoReC: The Norwegian Review Corpus Proceedings

2018, (Pre SFI).

Samia Touileb; Truls Pedersen; Helle Sjøvaag

Automatic identification of unknown names with specific roles Journal Article

In: Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 150-158, 2017, (Pre SFI).

Andrei Kutuzov; Murhaf Fares; Stephan Oepen; Erik Velldal

Word vectors, reuse, and replicability: Towards a community repository of large-text resources Proceedings

2017, (Pre SFI).

Victoria Rosén; Martha Thunes; Petter Haugereid; Gyri S. Losnegaard; Helge Dyvik; Paul Meurer; Gunn Lyse; Koenraad De Smedt

The enrichment of lexical resources through incremental parsebanking Journal Article

In: 2016, (Pre SFI).

Helge Dyvik; Paul Meurer; Victoria Rosén; Koenraad De Smedt; Petter Haugereid; Gyri S. Losnegaard; Gunn Lyse; Martha Thunes

NorGramBank: A 'Deep' Treebank for Norwegian. Proceedings of LREC Proceedings

2016, (Pre SFI).

Lilja Øvrelid; Petter Hohle

Universal dependencies for Norwegian Proceedings

2016, (Pre SFI).

Victoria Rosén; Koenraad De Smedt; Gyri S. Losnegaard; Eduard Bejcek; Agata Savary; Petya Osenova

MWEs in Treebanks: From Survey to Guidelines Proceedings

2016, (Pre SFI).

Emanuele Lapponi; Jonathon Read; Lilja Øvrelid

Representing and resolving negation for sentiment analysis Proceedings

2012, (Pre SFI).

Erik Velldal; Lilja Øvrelid; Jonathon Read; Stephan Oepen

Speculation and negation: Rules, rankers, and the role of syntax Journal Article

In: 2012, (Pre SFI).

Find us

Lars Hilles gate 30
5008 Bergen

Responsible Editor:
Centre Director Prof. Dr. Christoph Trattner


Subscribe to our monthly Newsletter by sending mail to office@mediafutures.no


Copyright © University of Bergen 2024