        What you hear is what you get: An exploration of audio level feature extraction for music recommendations

        thesis.pdf (4.231Mb)
        Publication date
        2025
        Author
        Sahu, Kshitij
        Summary
        In the age of digital music streaming, the ability to automatically understand and organize audio content is critical for applications such as recommendation, retrieval, and genre detection. This thesis introduces a self-supervised learning approach for extracting semantic embeddings from raw audio waveforms using a hybrid CNN-Transformer model. These embeddings are trained using contrastive learning (Barlow Twins) and are intended for use in content-based music recommendation systems. By combining local acoustic detail extraction via CNNs with sequence modelling via Transformers, the system learns rich representations without labelled data. We evaluate the embeddings using t-SNE visualizations, FAISS-based similarity retrieval, and a prototype interactive recommendation demo. The results demonstrate the effectiveness of our approach in organizing music meaningfully and enabling cold-start recommendation without user history.
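
        The similarity retrieval step mentioned in the summary can be illustrated with a minimal sketch. This is not the thesis implementation: it replaces FAISS with a brute-force cosine-similarity search in plain Python, and the track names, the three-dimensional embeddings, and the helper `top_k_similar` are all hypothetical. It only shows how a new track's embedding could be matched against a catalog without any user history.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_similar(query, catalog, k=2):
    """Return the k catalog entries most similar to the query embedding.

    Brute-force stand-in for an index-based search (assumption: cosine
    similarity is the retrieval metric; the summary does not state it).
    """
    q = l2_normalize(query)
    scored = []
    for track_id, embedding in catalog.items():
        e = l2_normalize(embedding)
        similarity = sum(a * b for a, b in zip(q, e))
        scored.append((similarity, track_id))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(track_id, similarity) for similarity, track_id in scored[:k]]


# Hypothetical toy embeddings; real embeddings would come from the trained model.
catalog = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.8, 0.2, 0.1],
    "track_c": [0.0, 0.1, 0.9],
}

# Embedding of a newly added track with no listening history (cold start).
results = top_k_similar([1.0, 0.0, 0.0], catalog, k=2)
for track_id, similarity in results:
    print(f"{track_id}: {similarity:.3f}")
```

        At catalog scale, a library such as FAISS would replace the loop with an index lookup; the ranking logic stays the same under the cosine-similarity assumption.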
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49799
        Collections
        • Theses