Research on Norwegian Open Large Language Models at UiO

December 11 @ 09:15 - 11:45

The Language Technology Group (LTG) at the Department of Informatics hosts a seminar on open large language models for the languages of Norway, with emphasis on ongoing experimentation at LTG with end-to-end model training and evaluation, as well as the introduction of an experimental chatbot prototype.

Large language models (LLMs) are the main engine under the hood of what is often called “generative AI”. To many, it may look as if they emerged out of the blue, and they are mostly associated with commercial black-box services run by Big Tech companies, with trade secrets locked behind armored gates.

But in fact, LLMs originally came from research labs, and both models and training data used to be open, in the same sense as open-source software and open science. Nowadays, open LLMs again show signs of catching up in performance with their closed, proprietary counterparts, including for languages other than English.

Through the seminar, LTG will present research on open and transparent LLMs conducted at UiO. We will reflect on experiences in adaptation to Norwegian and Sámi, limitations in training and evaluation data, pre-training and fine-tuning of open models, and open methodological questions. For this work, LTG also provides a chatbot prototype, based on our most recent Norwegian model NorMistral-11B.

In addition to presentations by LTG researchers, the Norwegian Language Council will review their findings comparing open and closed LLMs with regard to their Norwegian language skills, and the National Library will present on LLM training data for research in Norway.

Everyone is welcome!

Registration

To help us plan for the event, we ask that prospective participants register online.

Program

Coffee is served from 09:15. Moderator: Yves Scherrer (LTG).

9:30 – 9:40 Welcome and introduction (Lilja Øvrelid, LTG)

9:40 – 9:55 The importance of openness in the era of generative AI (Andrey Kutuzov, LTG)

9:55 – 10:05 Web-derived LLM training data for Norwegian (Stephan Oepen, LTG)

10:05 – 10:15 Training data for Norwegian LLM research (National Library of Norway)

10:15 – 10:30 Coffee break

10:30 – 10:50 Developing NorMistral-11B, with chat interface demonstration (David Samuel, LTG)

10:50 – 11:05 NorEval: Native Benchmarking for Norwegian LLMs (Vladislav Mikhailov, LTG)

11:05 – 11:25 The language quality of the Norwegian output in language models. Test results from the Language Council (Kristine Eide, Language Council of Norway)

11:25 – 11:45 Outlook, Q&A (Erik Velldal, LTG)

Details

Organizer

  • Language Technology Group

Venue

  • Seminar room Logo 2438, UiO