Norwegian Language Technologies

/ Introduction

Language technologies are at the core of media technologies. This work package aims to provide datasets and models for Norwegian (Bokmål/Nynorsk) that support the automated understanding as well as the automated production of media texts in this language. 

Objective: WP5 adopts theoretical approaches and methodologies primarily based on linguistic data science, including neural learning. Large corpora will be annotated, drawing on media language data from the user partners and on data and tools from the research partners. The labelled examples in these corpora will be used for training and evaluating supervised models that demonstrate advanced approaches in areas such as robust deep language analysis, adaptive language generation, event identification and extraction, and opinion analysis. The partners will cooperate to explore the use of such models for innovative purposes.

/ People

Lilja Øvrelid

Work Package Leader

Koenraad De Smedt

Work Package Co-Leader

Lubos Steskal

TV2

Eivind Throndsen

Work Package Industry Leader

Schibsted

Samia Touileb

Researcher

Huiling You

PhD Candidate

University of Oslo 

Emiliano Guevara

Amedia

Erik Velldal

UiO

Magnus Breder Birkenes

Nasjonalbiblioteket

/ Publications

2023

David Samuel; Andrey Kutuzov; Samia Touileb; Erik Velldal; Lilja Øvrelid; Egil Rønningstad; Elina Sigdel; Anna Palatkina

NorBench – A Benchmark for Norwegian Language Models Conference

2023.

Jeremy Barnes; Samia Touileb; Petter Mæhlum; Pierre Lison

Identifying Token-Level Dialectal Features in Social Media Conference

2023.

Ghazaal Sheikhi; Samia Touileb; Sohail Ahmed Khan

Automated Claim Detection for Fact-checking: A Case Study using Norwegian Pre-trained Language Models Conference

2023.

Samia Touileb; Lilja Øvrelid; Erik Velldal

Measuring Normative and Descriptive Biases in Language Models Using Census Data Conference

2023.

2022

Samia Touileb; Debora Nozza

Measuring Harmful Representations in Scandinavian Language Models Conference

2022.

Petter Mæhlum; Andre Kåsen; Samia Touileb; Jeremy Barnes

Annotating Norwegian language varieties on Twitter for Part-of-speech Workshop

2022.

Samia Touileb; Lilja Øvrelid; Erik Velldal

Occupational Biases in Norwegian and Multilingual Language Models Workshop

2022.

2020

Samia Touileb; Lilja Øvrelid; Erik Velldal

Gender and sentiment, critics and authors: a dataset of Norwegian book reviews Journal Article

In: Gender Bias in Natural Language Processing. Association for Computational Linguistics, 2020, (Pre SFI).

Jeremy Barnes; Erik Velldal; Lilja Øvrelid

Improving sentiment analysis with multi-task learning of negation Journal Article

In: 2020, (Pre SFI).

Jeremy Barnes; Lilja Øvrelid; Erik Velldal

Sentiment analysis is not solved! Assessing and probing sentiment classification Proceedings

2020, (Pre SFI).

Wafia Adouane; Samia Touileb; Jean-Philippe Bernardy

Identifying Sentiments in Algerian Code-switched User-generated Comments Conference

2020, (Pre SFI).

Paul Meurer; Victoria Rosén; Koenraad De Smedt

Interactive Visualizations in INESS Book Chapter

In: Butt, Miriam; Hautli-Janisz, Annette; Lyding, Verena (Eds.): 2020, (Pre SFI).

Fredrik Jørgensen; Tobias Aasmoe; Anne-Stine Ruud Husevåg; Lilja Øvrelid; Erik Velldal

NorNE: Annotating Named Entities for Norwegian Proceedings

2020, (Pre SFI).

Lilja Øvrelid; Petter Mæhlum; Jeremy Barnes; Erik Velldal

A Fine-Grained Sentiment Dataset for Norwegian Proceedings

2020, (Pre SFI).

Pierre Lison; Aliaksandr Hubin; Jeremy Barnes; Samia Touileb

Named Entity Recognition without Labelled Data: A Weak Supervision Approach Journal Article

In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1518–1533, 2020, (Pre SFI).

Koenraad De Smedt; Dimitris Koureas; Peter Wittenburg

FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units Journal Article

In: 2020, (Pre SFI).

2019

Jeremy Barnes; Samia Touileb; Lilja Øvrelid; Erik Velldal

Lexicon information in neural sentiment analysis: a multi-task learning approach Conference

Linköping University Electronic Press, 2019, (Pre SFI).

2018

Andrey Kutuzov; Lilja Øvrelid; Terrence Szymanski; Erik Velldal

Diachronic word embeddings and semantic shifts: a survey Proceedings

2018, (Pre SFI).

Erik Velldal; Lilja Øvrelid; Eivind Alexander Bergem; Cathrine Stadsnes; Samia Touileb; Fredrik Jørgensen

NoReC: The Norwegian Review Corpus Proceedings

2018, (Pre SFI).

2017

Samia Touileb; Truls Pedersen; Helle Sjøvaag

Automatic identification of unknown names with specific roles Journal Article

In: Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 150-158, 2017, (Pre SFI).

Andrei Kutuzov; Murhaf Fares; Stephan Oepen; Erik Velldal

Word vectors, reuse, and replicability: Towards a community repository of large-text resources Proceedings

2017, (Pre SFI).

2016

Victoria Rosén; Martha Thunes; Petter Haugereid; Gyri S. Losnegaard; Helge Dyvik; Paul Meurer; Gunn Lyse; Koenraad De Smedt

The enrichment of lexical resources through incremental parsebanking Journal Article

In: 2016, (Pre SFI).

Helge Dyvik; Paul Meurer; Victoria Rosén; Koenraad De Smedt; Petter Haugereid; Gyri S. Losnegaard; Gunn Lyse; Martha Thunes

NorGramBank: A 'Deep' Treebank for Norwegian. Proceedings of LREC Proceedings

2016, (Pre SFI).

Victoria Rosén; Koenraad De Smedt; Gyri S. Losnegaard; Eduard Bejcek; Agata Savary; Petya Osenova

MWEs in Treebanks: From Survey to Guidelines Proceedings

2016, (Pre SFI).

Lilja Øvrelid; Petter Hohle

Universal dependencies for Norwegian Proceedings

2016, (Pre SFI).

2012

Emanuele Lapponi; Jonathon Read; Lilja Øvrelid

Representing and resolving negation for sentiment analysis Proceedings

2012, (Pre SFI).

Erik Velldal; Lilja Øvrelid; Jonathon Read; Stephan Oepen

Speculation and negation: Rules, rankers, and the role of syntax Journal Article

In: 2012, (Pre SFI).

