Human-AI Interaction for Video Content: Designing and Engineering Multimodal Conversational Agents

Our PhD Candidate Peter Andrews will defend his thesis on September 29th at the University of Bergen.
The trial lecture starts at 09:15 and the defense at 10:30.
We encourage you to attend his trial lecture and defense and learn more about HCI.
Title:
Human-AI Interaction for Video Content: Designing and Engineering Multimodal Conversational Agents
Abstract:
As young adults increasingly shift away from conventional news sources, interactive and AI-driven media present a new frontier for engaging them in news consumption. Young adults often prefer more interactive video content on streaming platforms, challenging the traditional model of passive video consumption. Second screening, interacting with a second device while watching a primary display, has emerged to satisfy this need for interaction by providing additional content and context. However, second screening can hinder comprehension, revealing the need to synchronize the two experiences.
This thesis unifies the second screening experience with Computer Vision (CV) and Deep Learning (DL), building an interactive video framework that follows the From Video to Data → From Data to Narrative → From Narrative to Interaction paradigm. The result is a Multimodal Conversational Agent (MCA) that can hyper-contextualize video content. The framework encompasses three research questions: 1) How can recent advances in computer vision and artificial intelligence facilitate interaction with video content? 2) How can interactive video increase subjective understanding of the content? 3) How do young adults perceive the user experience of interactive video for news broadcasts? Answering these questions clarifies what is needed to build an end-to-end interactive video framework with AI, while empirical research shows how the framework's capabilities can improve user experience and comprehension.
To address these questions, I developed prototypes for interactive video in sports (football) and politics. I approached the video framework in a modular manner through four in-house design prototypes: FootyVision, the Automated Commentary System (ACS), AiCommentator, and AiModerator. Collectively, these four prototypes demonstrate how CV- and NLP-based event detection and LLM-powered MCAs can synchronize and facilitate real-time interaction with video content. I tested the prototypes in lab-based mixed-methods studies and found that interactive video with an MCA can enhance engagement, immersion, and subjective understanding. However, a Human-AI Interaction (HAI) trade-off between automation and user control emerges: while a high degree of automation can tightly synchronize the experience, it comes at the cost of user control. The affordances of MCAs include multimodal feedback and remediation. Multimodal feedback supports subjective understanding, in line with the Cognitive Theory of Multimedia Learning (CTML). Remediation repurposes traditional roles in innovative ways: MCAs transform sports commentators and political moderators into remediated personas, leading to increased engagement. Moreover, MCAs can also push the user into a more objective viewing state, highlighting a trade-off between objectivity and emotional involvement. Finally, trust is paramount in high-stakes environments where transparency is crucial.
Overall, my research challenges traditional linear media by integrating CV, DL, and NLP into an interactive framework that provides on-demand information augmented by the surrounding information space. However, future systems must address key concerns regarding the aforementioned trade-offs and the management of cognitive load. I recommend variable autonomy and transparency to give users control over the experience, reinforcing both trust and understanding through Human-Centered AI (HCAI). By synthesizing these findings within HAI and multimedia learning frameworks, my work provides valuable insights for researchers, developers, and broadcasters looking to engage the next generation of news consumers through interactive video.
Opponents:
- Dr. Petra Isenberg (Research Director, DR2), Laboratoire Interdisciplinaire des Sciences du Numérique, Université Paris-Saclay
- Prof. Huamin Qu, Department of Computer Science and Engineering, Hong Kong University of Science and Technology
Head of the committee:
Prof. Miroslav Bachinski
Moderator of the defense:
Prof. Bjørnar Tessem