Computational Models of Classifier Choice in Mandarin: From Rule-Based to BERT
Summary
As in many Asian languages, Mandarin grammar often requires that a noun be preceded by a classifier word in certain syntactic positions. Many nouns can appear with several different classifiers, and the choice of classifier can significantly affect the meaning of the whole sentence. No dictionary-based approach or finite set of rules can cover all possible classifier-head noun combinations exhaustively, as the correct classifier sometimes depends on the larger context surrounding the noun. This makes predicting the correct classifier in a given context a challenging problem.
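As a hypothetical illustration of this limitation (the noun-classifier pairs below are standard textbook examples, not the study's data), a pure dictionary lookup from head noun to classifier necessarily ignores the surrounding context:

```python
# A toy noun-to-classifier dictionary; real inventories are far larger.
NOUN_TO_CLASSIFIER = {
    "书": "本",  # books take 本
    "鱼": "条",  # fish take 条
    "花": "朵",  # flowers take 朵 (a single blossom)
}

def predict_classifier(noun: str) -> str:
    """Return the single most typical classifier for a head noun."""
    return NOUN_TO_CLASSIFIER.get(noun, "个")  # fall back to the generic 个

print(predict_classifier("花"))  # 朵, as in 一朵花 "a (single) flower"
# But in 一束花 "a bouquet of flowers" the correct choice is 束, which no
# context-free lookup on the noun 花 alone can recover.
```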
Earlier studies have proposed various computational methods for this task. However, because these studies examined very broad selections of classifiers that were neither explicitly listed nor defined in a computational way, their results and subsequent analyses have left room for further study. Hence, this study aims to produce an extensive and transparent analysis of the classifier prediction task, including explicit lists for different categorizations of classifiers. In particular, it takes into account the broad agreement among linguists that Mandarin classifiers should be divided into two clearly distinct categories: true classifiers and measure words. To also evaluate the impact of context on classifier choice, we examine a subset of classifiers that we consider to add information.
We used four different models to predict classifiers in sentences: a simple rule-based model, a bidirectional LSTM model, a BERT masked language model, and a BERT classification model. With the BERT classification model, we produced a new state-of-the-art result for generating Mandarin classifiers. We also showed that all the models perform better on true classifiers than on measure words or other types of classifiers; in particular, the results indicate that even a simple rule-based model can generate true classifiers reasonably well. In addition, the context-aware BERT classification model clearly outperforms the other models in predicting both measure words and classifiers that add information. However, we theorize that certain classifiers may still not be predictable accurately in all situations with current solutions, as the classifiers themselves carry meaning that is not evident from the rest of the sentence context.
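To make the masked-language-model approach concrete, the sketch below masks the classifier slot in a sentence and ranks candidate tokens with a pretrained Chinese BERT. The checkpoint name `bert-base-chinese`, the example sentence, and the small candidate list are illustrative assumptions rather than the study's actual setup:

```python
# Minimal sketch: predict a Mandarin classifier with BERT as a masked LM.
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")  # assumed checkpoint
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

# Mask the classifier slot in 我买了一[MASK]书 "I bought a [CL] book";
# the conventional classifier for 书 "book" is 本.
inputs = tokenizer("我买了一[MASK]书。", return_tensors="pt")
mask_pos = int((inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0])

with torch.no_grad():
    logits = model(**inputs).logits

# Restrict the ranking to a fixed classifier inventory, here a tiny
# illustrative list; an explicit classifier list would be used in practice.
candidates = ["本", "个", "张", "条", "只"]
candidate_ids = tokenizer.convert_tokens_to_ids(candidates)
scores = logits[0, mask_pos, candidate_ids]
print(candidates[int(scores.argmax())])  # expected: 本
```

A BERT classification model would presumably replace the full-vocabulary ranking with a dedicated classification head over the classifier inventory; the exact formulation in the study may differ.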