Breaking left-to-right generation in Transformer models: arbitrary orderings on tagging tasks
Summary
Transformers have achieved great success in various natural
language processing tasks. Traditionally, generation happens in a left-to-right
fashion in such models, mimicking how we process and produce text as humans.
However, for tasks in which tokens do not exhibit strong left-to-right sequential
dependencies, such as Combinatory Categorial Grammar (CCG) tagging, alternative
generation orders may be more effective. This thesis explores a novel approach
to break away from the traditional left-to-right ordering in Transformer-based
models. We introduce TagInsert, a sequence-to-sequence model designed to
perform tagging tasks with an ordering that is learned by the model itself, potentially
reducing the error propagation typically associated with auto-regressive models.
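As a rough illustration of the idea (and only an illustration, not the actual TagInsert architecture described later in the thesis), arbitrary-order tagging can be pictured as an insertion-style decoding loop in which the model commits whichever tag it is currently most confident about, rather than always the leftmost unfilled one. The insertion_decode function and the toy lexical scorer below are hypothetical names standing in for a learned Transformer scorer.

# Minimal illustrative sketch (not the thesis's TagInsert implementation):
# an insertion-style decoder that fills one tag per step, always committing
# the (position, tag) pair the scorer is most confident about, so the
# generation order is chosen by the model rather than fixed left-to-right.
from typing import Callable, Dict, List, Optional

def insertion_decode(
    tokens: List[str],
    score: Callable[[List[str], List[Optional[str]]], Dict[int, Dict[str, float]]],
) -> List[str]:
    tags: List[Optional[str]] = [None] * len(tokens)
    while any(t is None for t in tags):
        # score() returns, for every still-empty position, a probability per candidate tag.
        proposals = score(tokens, tags)
        pos, tag = max(
            ((p, max(dist, key=dist.get)) for p, dist in proposals.items()),
            key=lambda pt: proposals[pt[0]][pt[1]],
        )
        tags[pos] = tag  # commit the most confident prediction, then re-score
    return [t for t in tags if t is not None]

# Toy lexical scorer standing in for a Transformer conditioned on the partial tagging.
LEXICON = {"dogs": {"NOUN": 0.9, "VERB": 0.1}, "bark": {"NOUN": 0.4, "VERB": 0.6}}

def toy_score(tokens, partial):
    return {i: LEXICON[tok] for i, (tok, t) in enumerate(zip(tokens, partial)) if t is None}

print(insertion_decode(["dogs", "bark"], toy_score))  # ['NOUN', 'VERB']

This greedy, confidence-based loop is just one way of realising a model-chosen order; in TagInsert the ordering is learned by the model itself rather than supplied as a fixed heuristic.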
We evaluate the performance of our architecture across three tasks: Part-of-Speech
tagging, CCG non-constructive tagging, and CCG constructive supertagging.
Our results show that an arbitrary generation order improves performance on Part-of-Speech
and CCG non-constructive tagging when compared to a left-to-right model.
Notably, for CCG non-constructive tagging, we observe a statistically significant
advantage over the standard left-to-right approach, indicating that breaking
the traditional ordering may yield better results for tasks with weak left-to-right
sequential dependencies. For constructive supertagging, we present another
version of TagInsert adapted to the task. While it reaches results comparable to
the existing literature, better ways of introducing arbitrary ordering into this task
are likely needed to exploit the benefits of the architecture more effectively.
We also present extensive qualitative analyses of both the non-constructive and
constructive models, in order to identify the conditions under which breaking the
left-to-right ordering is beneficial.