View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        Statistical Modeling at the Syntax-Semantics Interface: Exploiting Automatically Induced Lexical Classes Evaluated through Variational Bayesian Inference

        J__Kamp___RMA_Thesis_approved29aug2019.pdf (658.7Kb)
        Publication date
        2019
        Author
        Kamp, J.B.
        Summary
        So far, automatic verb classification has been widely explored with both supervised and unsupervised machine learning techniques, based on syntactic and semantic features and closely tied to argument structure theory and Levin's (1993) verb classes. The present study goes a step beyond previous research in this field (e.g., Lapata and Brew, 2004; Merlo and Stevenson, 2001; Sun and Korhonen, 2009) by using automatically induced verb classes not as a goal but as a starting point for a lexicon induction experiment for individual verbs. Inspired by Rooth, Riezler, Prescher, Carroll, and Beil (1999), a first experiment clusters verbs represented by co-occurrence vectors of argument nouns extracted from the subcategorization frames of transitive and intransitive verbs. Building on the resulting model, a second experiment shows that lexicons of argument nouns for fixed verbs can be created by re-estimating the nouns' absolute frequencies with respect to the same verb, modified by cluster-related probabilities from the model. Besides relying on relatively simple statistical inference steps, the study is relevant for the detailed, combined evaluation system used for model selection, which includes a pseudo-disambiguation task, in-depth cluster metrics, and a Variational Bayesian Gaussian Mixture model. Argument selectional preference was found to be a good indicator of verb classes, especially for the data set that included verbs of the alternation in which the object of the transitive is the subject of the intransitive. Moreover, a quantitative, WordNet-based method showed that these classes correspond relatively poorly to Levin's classes. Future research could explore adjunct slots and extend the evaluation architecture to other clustering tasks within NLP.
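        The summary does not give the thesis's exact model or re-estimation formula, so the following is only a minimal sketch of a Rooth et al. (1999)-style latent-class model, p(v, n) = sum_c p(c) p(v|c) p(n|c), trained with EM on verb-noun co-occurrence counts. Two further steps are assumptions made for illustration: the lexicon re-estimation spreads a verb's total frequency f(v) over nouns according to p(n|v) = sum_c p(c|v) p(n|c), and the pseudo-disambiguation check simply asks whether the model prefers an attested noun over a confounder. All function names and the toy data are hypothetical; the Variational Bayesian Gaussian Mixture and the cluster metrics are not covered.

```python
import random
from collections import defaultdict


def train_latent_classes(pairs, num_classes, iterations=50, seed=0):
    """EM for p(v, n) = sum_c p(c) p(v|c) p(n|c), in the style of Rooth et al. (1999).

    pairs: dict mapping (verb, noun) -> observed frequency (hypothetical input format).
    Returns (p_c, p_v, p_n): class priors and per-class verb/noun distributions.
    """
    rng = random.Random(seed)
    verbs = sorted({v for v, _ in pairs})
    nouns = sorted({n for _, n in pairs})

    def random_dist(keys):
        # Random strictly positive distribution to break symmetry between classes.
        weights = [rng.random() + 0.1 for _ in keys]
        total = sum(weights)
        return {k: w / total for k, w in zip(keys, weights)}

    p_c = [1.0 / num_classes] * num_classes
    p_v = [random_dist(verbs) for _ in range(num_classes)]
    p_n = [random_dist(nouns) for _ in range(num_classes)]

    for _ in range(iterations):
        # E-step: accumulate expected counts freq * p(c | v, n).
        c_count = [0.0] * num_classes
        v_count = [defaultdict(float) for _ in range(num_classes)]
        n_count = [defaultdict(float) for _ in range(num_classes)]
        for (v, n), freq in pairs.items():
            joint = [p_c[c] * p_v[c][v] * p_n[c][n] for c in range(num_classes)]
            z = sum(joint)
            for c in range(num_classes):
                post = freq * joint[c] / z
                c_count[c] += post
                v_count[c][v] += post
                n_count[c][n] += post
        # M-step: renormalise the expected counts into probabilities.
        total = sum(c_count)
        for c in range(num_classes):
            p_c[c] = c_count[c] / total
            p_v[c] = {v: v_count[c][v] / c_count[c] for v in verbs}
            p_n[c] = {n: n_count[c][n] / c_count[c] for n in nouns}
    return p_c, p_v, p_n


def noun_given_verb(model, verb):
    """p(n | v) = sum_c p(c | v) p(n | c), where p(c | v) is proportional to p(c) p(v | c)."""
    p_c, p_v, p_n = model
    weights = [p_c[c] * p_v[c][verb] for c in range(len(p_c))]
    z = sum(weights)
    result = defaultdict(float)
    for c, w in enumerate(weights):
        for n, p in p_n[c].items():
            result[n] += (w / z) * p
    return dict(result)


def reestimate_lexicon(pairs, model, verb):
    """Hypothetical re-estimation: distribute f(v) over nouns by p(n | v)."""
    f_v = sum(f for (v, _), f in pairs.items() if v == verb)
    return {n: f_v * p for n, p in noun_given_verb(model, verb).items()}


def pseudo_disambiguate(model, verb, seen_noun, confounder):
    """True if the model prefers the attested noun over the confounder for this verb."""
    probs = noun_given_verb(model, verb)
    return probs[seen_noun] > probs[confounder]
```

A usage example on toy counts (not data from the thesis): `model = train_latent_classes(pairs, num_classes=2)` followed by `reestimate_lexicon(pairs, model, "devour")` returns smoothed noun frequencies for one verb, which by construction sum to that verb's observed total. Because EM only finds a local optimum, results can vary with the random seed.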
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/34028
        Collections
        • Theses