        Large vision and language models-based investment predictions on entrepreneurial pitch videos

        View/Open
        thesis_zihan.pdf (2.133Mb)
        Publication date
        2025
        Author
        Zou, Zihan
        Summary
        This research explores the potential of predicting investment decisions in entrepreneurial pitches from multimodal signals, particularly visual and linguistic features. Entrepreneurial pitches are critical for securing funding, and both verbal content and non-verbal cues, such as gestures and facial expressions, play a crucial role in shaping investors' decisions. However, existing studies have largely focused on isolated forms of signaling, leaving a gap in understanding how multimodal features interact to influence investment outcomes. This study proposes a machine-learning approach that uses visual and linguistic cues from pitch videos to predict the likelihood of investment. Using the "Data Management Entrepreneurial Pitches" dataset, the research addresses several key questions, including the efficacy of visual and linguistic unimodal models, the benefits of combining modalities into a unified linguistic space, and the performance of multimodal fusion models. To this end, a series of neural network models is designed and tested, drawing on techniques from Natural Language Processing (NLP) and Computer Vision such as BERT, MEGA, VideoMAE, and VideoLLaVA. The thesis evaluates the effectiveness of visual and linguistic multimodal models in predicting the probability of entrepreneurial investment and compares the performance of unimodal models with that of multimodal fusion models.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/48669
        Collections
        • Theses