On how transformers learn to understand and evaluate nested arithmetic expressions
In this thesis, we study whether self-attention networks can learn compositional semantics using an arithmetic language, in which the meaning of a sentence is the outcome of evaluating a nested arithmetic expression. We find that self-attention networks can learn to evaluate such nested expressions, taking shortcuts on less complex expressions and recruiting deeper layers as the nesting depth of more complex expressions grows. Complexity depends on whether an expression is left-branching (easy) or right-branching (hard) and, for right-branching expressions, on whether plus (easy) or minus (hard) operators are used. We find that increasing the number of attention heads does not always help with more complex expressions, whereas increasing the number of layers consistently helps the networks generalize to deeper expressions. Finally, to better understand what the self-attention networks are doing, we analyze the attention scores and find striking patterns, such as numbers attending to the preceding operators and nested sub-expressions attending to preceding operators. These patterns may explain why the self-attention networks can take shortcuts on less complex expressions but not on more complex ones, given the way they attempt to solve them.
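To make the left- versus right-branching distinction concrete, the following is a minimal sketch (not the thesis's actual data-generation code; the function names and bracketing style are illustrative assumptions) of how such nested arithmetic expressions can be built. A left-branching expression can be evaluated incrementally from left to right, while a right-branching one forces the innermost sub-expression to be resolved first, which is consistent with the easy/hard distinction above.

```python
# Illustrative sketch of a nested arithmetic language (hypothetical helpers,
# not the thesis's implementation): integers combined with + and -,
# fully parenthesized, nesting either on the left or on the right.

def left_branching(numbers, ops):
    """Build '( ( a op b ) op c )': nesting grows on the left (easy)."""
    expr = str(numbers[0])
    for op, n in zip(ops, numbers[1:]):
        expr = f"( {expr} {op} {n} )"
    return expr

def right_branching(numbers, ops):
    """Build '( a op ( b op c ) )': nesting grows on the right (hard)."""
    expr = str(numbers[-1])
    for op, n in zip(reversed(ops), reversed(numbers[:-1])):
        expr = f"( {n} {op} {expr} )"
    return expr

left = left_branching([5, 2, 3], ["-", "+"])    # "( ( 5 - 2 ) + 3 )" -> 6
right = right_branching([5, 2, 3], ["-", "+"])  # "( 5 - ( 2 + 3 ) )" -> 0
```

Note that with a minus operator the two branchings give different results (6 versus 0 here), so a network cannot simply ignore the bracket structure; with only plus operators the outcome is branching-invariant, which is one reason right-branching minus expressions are the hardest case.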