Utrecht University Student Theses Repository
        On how transformers learn to understand and evaluate nested arithmetic expressions

        Thesis_Daan_Grashoff_final_.pdf (1.230Mb)
        Publication date
        2022
        Author
        Grashoff, Daan
        Summary
In this thesis, we studied whether self-attention networks can learn compositional semantics using an arithmetic language, in which the task is to evaluate the meaning of nested expressions. We find that self-attention networks can learn to evaluate these nested expressions, either by taking shortcuts on less complex expressions or by using deeper layers on complex expressions as the nesting depth grows. Complexity depends on whether expressions are left-branching (easy) or right-branching (hard) and, in the case of right-branching expressions, on whether plus (easy) or minus (hard) operators are used. We find that increasing the number of heads does not always help with more complex expressions, whereas increasing the number of layers consistently helps the networks generalize to deeper expressions. Finally, to better understand what the self-attention networks are doing, we analyzed the attention scores and found striking patterns, such as numbers attending to the preceding operators and nested sub-expressions attending to preceding operators. These patterns may explain why the self-attention networks take shortcuts on less complex expressions, but cannot do so on more complex ones, given the way they attempt to solve them.
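To illustrate the left- vs right-branching distinction the summary describes, here is a minimal Python sketch of a fully parenthesised arithmetic language with plus and minus operators. This is an assumed toy grammar for illustration, not necessarily the exact language used in the thesis; note how the same numbers and operators yield different values depending on the branching direction when minus is involved.

```python
# Minimal sketch of a nested arithmetic language with left- and
# right-branching expressions (assumed grammar, not the thesis's exact one).
import operator

OPS = {"+": operator.add, "-": operator.sub}

def left_branching(numbers, ops):
    """Build a left-branching expression, e.g. ((1-2)-3)."""
    expr = str(numbers[0])
    for op, n in zip(ops, numbers[1:]):
        expr = f"({expr}{op}{n})"
    return expr

def right_branching(numbers, ops):
    """Build a right-branching expression, e.g. (1-(2-3))."""
    expr = str(numbers[-1])
    for op, n in zip(reversed(ops), reversed(numbers[:-1])):
        expr = f"({n}{op}{expr})"
    return expr

def evaluate(expr):
    """Recursively evaluate a fully parenthesised expression."""
    if expr.lstrip("-").isdigit():
        return int(expr)
    inner = expr[1:-1]                 # strip the outer parentheses
    depth = 0
    for i, ch in enumerate(inner):     # locate the top-level operator
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and ch in OPS and i > 0:
            return OPS[ch](evaluate(inner[:i]), evaluate(inner[i + 1:]))
    raise ValueError(f"malformed expression: {expr!r}")

left = left_branching([1, 2, 3], ["-", "-"])    # "((1-2)-3)"
right = right_branching([1, 2, 3], ["-", "-"])  # "(1-(2-3))"
print(left, "=", evaluate(left))    # ((1-2)-3) = -4
print(right, "=", evaluate(right))  # (1-(2-3)) = 2
```

With plus operators the two branching directions evaluate to the same result, which is one way to see why right-branching minus expressions are the hard case: their value cannot be computed by a simple left-to-right shortcut.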
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/533
        Collections
        • Theses