
dc.rights.license: CC-BY-NC-ND
dc.contributor: Berk Calik
dc.contributor.advisor: Schraagen, Marijn
dc.contributor.author: Çalik, Berk
dc.date.accessioned: 2023-03-01T00:00:47Z
dc.date.available: 2023-03-01T00:00:47Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/43582
dc.description.abstract: Podcasts, a contemporary medium of audio-only content, have grown rapidly in both consumption and production across the internet. With this accelerating popularity, effectively publicising podcast shows has become a need for creators, listeners and streaming platforms alike. To improve the visibility of podcast content and enhance user engagement, episode summaries are needed both for users and for search and recommendation systems, where they can replace or complement keywords, manual descriptions and transcripts. Since manually summarising podcast episodes is time-consuming, automatic summarisation becomes a valuable task. Specifically, in automatic summarisation of spoken documents, the extracted salient information relies not only on what is said but also on how it is said. This thesis therefore investigates summarisation models for podcasts and proposes a multimodal approach exploiting acoustic and linguistic features. Accordingly, we explore how to automatically generate an extractive summary of a podcast episode in a multimodal way. We employ a lexical-only pre-trained transformer model (SentenceBERT) to embed the sentences in the transcripts. Speech summarisation of podcasts is defined as a classification task: meaningful sentences are extracted from the transcribed text, with sentence importance predicted by combining acoustic and linguistic information. To build an experimental setup for analysing the impact of acoustic features, we integrate a two-layer multilayer perceptron on top of the SentenceBERT model. Feature projection, ranking and selection were also performed to analyse the importance of the acoustic features. After projection and selection of acoustic features, the proposed multimodal model outperforms the text-only baseline and achieves moderately better ROUGE scores. With this project, we aim not to provide a complete solution for the automatic summarisation of podcast episodes but to understand the critical piece of the puzzle concerning the incorporation of acoustic features into podcast summarisation.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Since manually summarising podcast episodes is time-consuming, automatic summarisation becomes a valuable task. Specifically, in automatic summarisation tasks for spoken documents, we need to consider that the extracted salient information relies not only on what is said but also on how it is said. Understanding acoustic saliency in speech summarisation tasks through state-of-the-art transformers will be the main task for this project.
dc.title: A Multimodal Approach: Acoustic-Linguistic Modelling for Neural Extractive Speech Summarisation on Podcasts
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Podcast Summarisation; Speech Summarisation; Multimodality; Extractive Summarisation; Acoustic-Linguistic features
dc.subject.courseuu: Business Informatics
dc.thesis.id: 5352
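
The abstract describes the architecture only in prose. As a minimal sketch of the described setup, assuming PyTorch and the sentence-transformers library, the code below embeds transcript sentences with SentenceBERT, projects per-sentence acoustic features into a small dense space, concatenates the two, and scores each sentence with a two-layer MLP for extractive selection. The model name, all dimensions, the acoustic feature set, and the 0.5 threshold are illustrative assumptions, not the thesis's actual configuration.

import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

class MultimodalSentenceScorer(nn.Module):
    def __init__(self, text_dim=384, acoustic_dim=12, hidden_dim=128):
        super().__init__()
        # Project acoustic features (e.g. pitch, energy, pause statistics)
        # into a small dense space before fusion -- one plausible reading of
        # the "feature projection" step mentioned in the abstract.
        self.acoustic_proj = nn.Linear(acoustic_dim, 32)
        # Two-layer MLP on top of the fused representation, predicting a
        # per-sentence "include in summary" probability.
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + 32, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, text_emb, acoustic_feats):
        fused = torch.cat(
            [text_emb, torch.relu(self.acoustic_proj(acoustic_feats))], dim=-1
        )
        return torch.sigmoid(self.mlp(fused)).squeeze(-1)

# Usage sketch: embed transcript sentences with a lexical-only SentenceBERT
# model, pair each embedding with that sentence's acoustic feature vector,
# and keep the top-scoring sentences as the extractive summary.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any 384-dim SBERT model
sentences = ["Welcome to the show.", "Today we discuss multimodal summarisation."]
text_emb = torch.tensor(encoder.encode(sentences))
acoustic = torch.randn(len(sentences), 12)  # placeholder acoustic features
with torch.no_grad():
    scores = MultimodalSentenceScorer()(text_emb, acoustic)
summary = [s for s, p in zip(sentences, scores.tolist()) if p > 0.5]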

