View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        An Implementation and Assessment of Semantic Search Few-Shot Classification

        StephanHavermans_5974410_ThesisADS.pdf (1.231Mb)
        Publication date
        2021
        Author
        Havermans, S.A.C.
        Summary
        This thesis compares multiple methods of classification that follow the cosine-similarity calculation in semantic search with Sentence-BERT (SBERT), as well as various class representations for few-shot classification with SBERT. The performance of SBERT is then compared to that of DistilBERT on several natural language processing (NLP) tasks (clickbait classification, sentiment analysis, spam detection and topic classification) and datasets, in order to determine for which tasks SBERT semantic search is an effective alternative to fine-tuning more traditional BERT models. The multilingual versions of both SBERT and DistilBERT are used for topic classification on a German dataset to assess the performance of multilingual SBERT. The best implementation of SBERT semantic search for few-shot classification uses similarity-based classification together with average embeddings as class representations. Both SBERT and DistilBERT show signs of diminishing returns at around 25 samples per class when performing few-shot classification. Fine-tuning a DistilBERT model matches or outperforms SBERT semantic search on all assessed NLP tasks, at the cost of slightly more instability.
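The combination the summary reports as best (average embeddings as class representations, followed by similarity-based classification) can be illustrated with a minimal sketch. This is one plausible reading of that setup, not the thesis's actual implementation: the three-dimensional vectors are toy stand-ins for SBERT sentence embeddings, and the names `cosine_similarity`, `class_centroids` and `classify` are hypothetical helpers introduced here for illustration.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_centroids(examples):
    """Average the embeddings of the few labelled examples per class."""
    grouped = defaultdict(list)
    for label, vector in examples:
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(query, centroids):
    """Return the label whose average embedding is most similar to the query."""
    return max(centroids, key=lambda label: cosine_similarity(query, centroids[label]))


# Toy "embeddings" (assumption): in practice these would come from an SBERT encoder.
support = [
    ("spam", [0.9, 0.1, 0.0]),
    ("spam", [0.8, 0.2, 0.1]),
    ("ham", [0.1, 0.9, 0.2]),
    ("ham", [0.0, 0.8, 0.3]),
]

centroids = class_centroids(support)
prediction = classify([0.7, 0.3, 0.1], centroids)
print(prediction)
```

With two labelled examples per class, the query vector lies closer in angle to the averaged "spam" embedding, so that label is returned. Adding more examples per class only changes the averages, which is consistent with the summary's observation that gains level off as the number of samples per class grows.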
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/41192
        Collections
        • Theses