Automated summary scoring using a linguistic feature approach

Meelker, C.M.

Publication date

2021

Summary

Summary-writing tasks are often used to assess students' reading comprehension. Grading these tasks is time-consuming, and teachers have difficulty grading consistently. The goal of this research is therefore to explore and evaluate the possibilities of automating summary grading. Previous research has shown that students with an extensive mental model, and thus a good understanding of the original text, write high-quality summaries. Linguistic features can therefore be used to measure summary quality. A total of 82 linguistic features are calculated for a dataset of 914 short Dutch summaries that have been graded by teachers. Through cross-validated feature selection, an optimal set of features is selected for both a regression model and a classification model. The regression model predicts a grade and has an explained variance of 0.71. The classification model predicts a 'Fail' or 'Pass' label and has an area under the ROC curve (AUC) of 0.91. It can therefore be concluded that linguistic feature-based models can successfully be used to automate summary grading. The models developed in this research could potentially replace a second or third reader.
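To make the terms in the summary concrete, the following minimal Python sketch shows what a few surface-level linguistic features and the two reported evaluation measures might look like in code. It is an illustration under stated assumptions, not the thesis's implementation: the three features and the function names `surface_features`, `r_squared`, and `roc_auc` are hypothetical, the 82 features used in the thesis are not listed on this page, and "explained variance" is assumed here to mean the coefficient of determination (R²).

```python
import re


def surface_features(summary: str) -> dict:
    """Compute three illustrative surface features of a summary.

    These are hypothetical examples, not the features used in the thesis.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", summary) if s.strip()]
    words = re.findall(r"\w+", summary.lower())
    n_words = len(words)
    return {
        "n_words": n_words,
        # Lexical diversity: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
    }


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is non-empty and not constant.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen 'Pass' (label 1)
    receives a higher score than a randomly chosen 'Fail' (label 0),
    counting ties as one half. Assumes both classes are present.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(surface_features("De kat zit. De kat slaapt."))
    print(r_squared([6, 7, 8], [6, 7, 8]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a setup like the one described, feature values of this kind would form the input to the regression and classification models, and the two measures would be computed on held-out predictions. The pairwise AUC computation is quadratic in the number of summaries, which is acceptable for a dataset of a few hundred items.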

URI

https://studenttheses.uu.nl/handle/20.500.12932/41186

Collections

Theses