Reliability of writing evaluation methods in the EFL classroom
Summary
The evaluation of writing is challenging for foreign language teachers in many different regards. Different evaluation methods have frequently been the topic of writing research, but their reliability has rarely been studied side by side. This study aimed to analyse the inter- and intra-rater reliability of three evaluation methods in an EFL context: holistic, analytic, and relative evaluation. Four secondary school teachers were selected to rate twenty written products by beginner and advanced EFL students, applying each evaluation method once to each written product. Raters used an adapted version of the ESL Composition Profile for analytic evaluation and a single reference text for relative evaluation. Results indicated a high degree of agreement between raters and high internal consistency for individual raters, highlighting differences between L1 and L2 writing evaluation procedures. However, no significant differences were found between the correlation coefficients of the evaluation methods: neither the reliability of individual raters nor the reliability across multiple raters was significantly affected by the evaluation method used. Various explanations for these findings are discussed, together with classroom implications and recommendations for further research on the reliability of writing evaluation methods.