Using LLMs to Generate Automatic Feedback on Object-Oriented Programs with Multiple Classes
Summary
With the public introduction of Large Language Models (LLMs) such as ChatGPT, their potential for automatic hint and feedback generation in programming education has become a
topic of research interest. However, most of this research focuses on beginner programmers
creating small programs. In this thesis, we investigate the capabilities of LLMs to give
feedback on misconception characteristics related to object-oriented programming (OOP) in larger code solutions. Due to
the lack of available datasets for this purpose, we additionally investigate how well LLMs
can generate code solutions for larger programming exercises. Specifically, we use Mistral
Large to generate code solutions containing one of six misconception characteristics, taken
from previous work. Next, we create a system that generates feedback for code solutions
spread over multiple files, and use this system to generate feedback for the code solutions
in our dataset. Our results show that the generated feedback is correct and appropriate
for high-level beginner students, but that the LLMs used can only consistently detect the more general characteristics and struggle to identify the more complex ones.
Overall, our work extends the existing literature by exploring the capabilities of current LLMs
on larger, more complex code solutions, and finds that although their performance is decent,
extensive prompting is needed to obtain these results, and their ability to detect more
complex misconception characteristics remains limited.