Zipf's Law in L1 Attrition
Summary
The present study attempts to connect the psycholinguistic study of lexical attrition with the tools used in lexical statistics to measure lexical diversity, focusing on Zipf’s law and its potential as a lexical diversity measure. The theoretical introduction outlines theoretical accounts of why language and vocabulary are forgotten and compares the tools used to measure lexical diversity in previous research. Zipf’s law is then presented, with attention to its origin and to the potential meaningfulness of its exponent as a measure of lexical diversity. Using data collected by Keijzer (2007), the study analyzed the spontaneous speech of a group of attriters, a group of controls, and a group of language acquirers. Three research questions were investigated: whether attriter speech conforms to Zipf’s law; whether it does so with the same slope as the speech of controls (and acquirers) or a different one; and whether the exponent is a strong predictor of group membership compared with other measures of lexical diversity.
Applying the approach of van Egmond et al. (2015) yielded a good fit of the law in all three groups, with an unexpectedly better fit in the attriter group than in the control group. The exponent of the law also differed significantly among groups. The exponent results are consistent with previous findings on aphasic speech and language acquisition, suggesting that the exponent might have linguistic relevance. The results are discussed in relation to the different theoretical accounts of the origin of the law. To assess whether the exponent is a good predictor of group membership, it was compared with other measures of lexical diversity (number of types, number of tokens, type-token ratio, sample-independent type-token ratio, vocd-D, and MATTR), and the sociolinguistic variables of gender, education level and region of upbringing were included in the analysis. The results show that the exponent failed to predict grouping by education level or region of upbringing, whereas sample-size-independent measures such as vocd-D performed better. This suggests that the exponent is not the most effective tool for measuring lexical diversity.
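For readers unfamiliar with the measures named above, the following is a minimal illustrative sketch of three of them: the type-token ratio, MATTR (the moving-average type-token ratio), and a Zipf exponent. The exponent is estimated here by ordinary least squares on the log-log rank-frequency curve, which is a common simplification and should not be taken as the fitting procedure of van Egmond et al. (2015). All function names and the default window size of 50 tokens are assumptions made for illustration, not details of the study.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by number of tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average type-token ratio: mean TTR over all sliding windows.

    The window size of 50 is an illustrative default. If the sample is
    no longer than the window, this falls back to the plain TTR.
    """
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def zipf_exponent(tokens):
    """Estimate alpha in f(r) ~ r**(-alpha) from a token list.

    Simplifying assumption: alpha is the negated slope of a least-squares
    line through (log rank, log frequency).
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat".split()
    print(type_token_ratio(sample))
    print(mattr(sample, window=5))
    print(zipf_exponent(sample))
```

Because the plain type-token ratio falls as a sample grows, it cannot be compared directly across speech samples of different lengths. MATTR addresses this by averaging over windows of fixed size, which is one reason sample-size-independent measures are included in the comparison.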