        Integrating Contextual Metrics in LLM-Based Hint Generation for Programming Exercises

        final_thesis.pdf (1.045 MB)
        Publication date
        2025
        Author
        Tweel, Siem van den
        Summary
        Large Language Models (LLMs) show promise for generating programming hints, but current systems largely ignore behavioral data from student programming sessions. This thesis investigates whether integrating contextual metrics such as time spent on task, error patterns, and help-seeking behavior can improve hint quality in LLM-based hint systems for introductory Python programming exercises. We operationalized four contextual metrics and developed seven hint generation approaches, which we applied to the CSEDM 2019 dataset of novice programming sessions from an introductory Python course, generating 273 hints across 39 student sessions. We evaluated the approaches from multiple perspectives: an LLM assessment of all generation approaches, expert validation with three educators on the two most promising approaches and a baseline, and a user study with 16 novice programmers comparing the final approach selected by the prior evaluations against a baseline.

        Our findings reveal that the impact of contextual metrics depends on the evaluator's perspective. LLM evaluation showed that contextual approaches improved overall hint quality, though modestly. Experts, however, showed a modest preference for baseline hints, often penalizing hints generated with contextual metrics for revealing too much information and not letting the student solve the problem themselves. Students demonstrated a slight preference for hints using the time-on-task metric, perceiving them as more useful for overcoming immediate struggles.

        These contrasting outcomes highlight a fundamental challenge: hint quality assessment depends heavily on the evaluator's perspective and priorities. Students prioritize actionable guidance, while experts focus on long-term pedagogical goals. Our analysis also revealed the difficulty of using prompt engineering to achieve consistent LLM behavior for subtle, context-dependent guidance requirements. This work demonstrates that simply adding contextual metrics does not guarantee improved perceived hint quality.
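
        To make the idea concrete, below is a minimal sketch of how a contextual metric such as time on task might be computed from logged session events and folded into a hint-generation prompt. The event schema, metric definitions, and prompt wording are illustrative assumptions, not the thesis's actual implementation; the seven approaches evaluated in the thesis are described in the PDF.

# Illustrative sketch only: the event schema, metric definitions, and prompt
# wording below are assumptions, not the approaches evaluated in the thesis.
from dataclasses import dataclass


@dataclass
class SessionEvent:
    timestamp: float  # seconds since the session started (assumed log format)
    kind: str         # e.g. "edit", "run", "error", "help_request"


def time_on_task(events: list[SessionEvent]) -> float:
    """Elapsed seconds between the first and last recorded event."""
    if len(events) < 2:
        return 0.0
    return events[-1].timestamp - events[0].timestamp


def build_hint_prompt(exercise: str, code: str, events: list[SessionEvent]) -> str:
    """Embed contextual metrics as plain text in the prompt sent to the LLM."""
    minutes = time_on_task(events) / 60
    errors = sum(1 for e in events if e.kind == "error")
    help_requests = sum(1 for e in events if e.kind == "help_request")
    return (
        f"A novice Python student is working on this exercise: {exercise}\n"
        f"Their current code:\n{code}\n"
        f"Context: {minutes:.1f} minutes on task, {errors} errors, "
        f"{help_requests} prior help requests.\n"
        "Give a short hint that guides the student toward the fix "
        "without revealing the full solution."
    )


# Example with fabricated events:
events = [SessionEvent(0.0, "edit"),
          SessionEvent(310.0, "error"),
          SessionEvent(540.0, "help_request")]
print(build_hint_prompt("sum the numbers in a list",
                        "total = 0\nfor n in nums: total = n", events))

        One design point the sketch surfaces: the metrics enter the LLM purely as prompt text, which is why, as the summary notes, achieving consistent context-dependent behavior becomes a prompt engineering problem.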
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49887
        Collections
        • Theses