|dc.description.abstract||Podcasts, a contemporary medium of audio-only content, have grown rapidly in both consumption and production across the internet. With this accelerating popularity in recent years, effectively publicising podcast shows has become a need for podcast creators, listeners and streaming platforms alike. To improve the overall visibility of podcast content and enhance user engagement, a summary of an episode has become valuable both to users and to search and recommendation systems, where it can replace or complement keywords, manual descriptions and transcripts.
Since manual summarisation of podcast episodes takes considerable time, automatic summarisation becomes a valuable task. In the specific context of automatic summarisation of spoken documents, we must consider that the salient information to be extracted depends not only on what is said but also on how it is said. Motivated by this, this thesis investigates summarisation models for podcasts and proposes a multimodal approach that exploits acoustic as well as linguistic features. Accordingly, we aimed to explore how to automatically generate an extractive summary of a podcast episode in a multimodal way. For our research, we employed a lexical-only pre-trained
transformer model (i.e. SentenceBERT) to embed the sentences of the transcripts. In this work, speech summarisation of podcasts is framed as a classification task: our aim was to extract meaningful sentences from the transcribed text, where the importance of each sentence is predicted by combining acoustic and linguistic information. To build an experimental setup for analysing the impact of acoustic features, we integrated a two-layer multilayer perceptron on top of the SentenceBERT model. Feature projection, ranking and selection were also performed to analyse the importance of the acoustic features. After projection and selection of acoustic features, our proposed multimodal model outperforms the text-only baseline, achieving moderately better ROUGE scores. With this project, we aim not to provide a complete solution for the automatic summarisation of podcast episodes but to understand a critical part of the puzzle: incorporating acoustic features into podcast summarisation.||
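The multimodal scoring described in the abstract (sentence embeddings fused with acoustic features, classified by a two-layer perceptron) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the SentenceBERT embeddings and per-sentence acoustic features are replaced by random placeholders of assumed dimensionality (768 and 12), and the MLP weights are randomly initialised rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real inputs (hypothetical shapes):
# - sent_emb: SentenceBERT embeddings of transcript sentences (n, 768)
# - acoustic: per-sentence acoustic features, e.g. pitch/energy stats (n, 12)
n_sentences = 5
sent_emb = rng.normal(size=(n_sentences, 768))
acoustic = rng.normal(size=(n_sentences, 12))

# Multimodal fusion: concatenate the two modalities per sentence.
x = np.concatenate([sent_emb, acoustic], axis=1)  # (n, 780)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two-layer MLP head (randomly initialised here; trained in practice).
d_in, d_hidden = x.shape[1], 64
w1 = rng.normal(scale=0.02, size=(d_in, d_hidden))
b1 = np.zeros(d_hidden)
w2 = rng.normal(scale=0.02, size=(d_hidden, 1))
b2 = np.zeros(1)

# Probability that each sentence belongs in the extractive summary.
scores = sigmoid(relu(x @ w1 + b1) @ w2 + b2).ravel()

# Select the top-k highest-scoring sentences as the summary.
k = 2
summary_idx = np.argsort(scores)[::-1][:k]
print(sorted(summary_idx.tolist()))
```

Treating extractive summarisation as per-sentence binary classification, as here, lets the same head score transcripts of any length; only the fusion step (concatenation) and the head architecture are fixed.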
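The feature ranking and selection step mentioned in the abstract can likewise be sketched. The example below is an assumption-laden illustration, not the thesis method: it ranks synthetic acoustic features by absolute Pearson correlation with binary "in-summary" labels and keeps the top-k, with one feature (index 3) made informative by construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 sentences, 12 acoustic features, binary
# "in-summary" labels; feature 3 is made informative on purpose.
n, d = 200, 12
acoustic = rng.normal(size=(n, d))
labels = (acoustic[:, 3] + 0.1 * rng.normal(size=n) > 0).astype(float)

# Rank features by absolute Pearson correlation with the labels.
centred_x = acoustic - acoustic.mean(axis=0)
centred_y = labels - labels.mean()
corr = (centred_x * centred_y[:, None]).sum(axis=0) / (
    np.linalg.norm(centred_x, axis=0) * np.linalg.norm(centred_y) + 1e-12
)
ranking = np.argsort(-np.abs(corr))

# Keep the top-k acoustic features before fusing with text embeddings.
k = 4
selected = np.sort(ranking[:k])
print(selected.tolist())
```

In practice such a ranking helps identify which acoustic cues (pitch, energy, pauses, and so on) actually carry importance signal, so that only those are projected and concatenated with the sentence embeddings.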