Development and evaluation of the Python educational ontology
Summary
This thesis explores the development of an educational ontology for Python, designed to serve as a domain model for adaptive learning systems and to support the annotation of programming exercises using large language models (LLMs). Two ontologies were created: one focusing on syntactic concepts, and the other representing programming patterns that extend beyond syntax to capture how Python constructs are used in context (e.g., nested loops). To evaluate the quality of the knowledge components (KCs), represented as ontology classes, interaction data from six datasets collected on the Mastery Grid and StudyLens learning platforms were examined using learning curve analysis. This approach enabled a comparison of the two ontologies and ultimately informed the refinement of the educational Python ontology. Findings indicate that fine-grained distinctions, such as conditionals inside loops and concatenation, often serve as effective KCs. To address the scalability problem of manually extracting KCs from educational content, the ontology was used to support KC annotation with LLMs, which proved accurate, particularly when KCs were provided to the model in small batches. This thesis provides a foundation for future research on representing programming knowledge beyond syntax and on improving exercise annotation through automated methods.
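Learning curve analysis, as referred to in the summary, is commonly done by computing, for each KC, the error rate at each practice opportunity and checking whether it declines roughly as a power law. The sketch below illustrates that general idea under stated assumptions; it is not the thesis's actual pipeline. The log format (chronological `(student, kc, correct)` tuples), the function names `error_rates_by_opportunity` and `fit_power_law`, and the KC label `nested_loops` are all hypothetical.

```python
import math
from collections import defaultdict


def error_rates_by_opportunity(attempts):
    """Build a per-KC learning curve.

    Assumes `attempts` is an iterable of (student, kc, correct) tuples in
    chronological order (a hypothetical log format). Returns
    {kc: [error rate at opportunity 1, 2, ...]}.
    """
    seen = defaultdict(int)                          # (student, kc) -> opportunities so far
    errors = defaultdict(lambda: defaultdict(int))   # kc -> opportunity -> error count
    totals = defaultdict(lambda: defaultdict(int))   # kc -> opportunity -> attempt count
    for student, kc, correct in attempts:
        seen[(student, kc)] += 1
        opp = seen[(student, kc)]
        totals[kc][opp] += 1
        if not correct:
            errors[kc][opp] += 1
    curves = {}
    for kc, per_opp in totals.items():
        # Opportunities are contiguous per student, so every index 1..max has data.
        curves[kc] = [errors[kc][o] / per_opp[o] for o in range(1, max(per_opp) + 1)]
    return curves


def fit_power_law(curve):
    """Fit error = a * opportunity**(-b) by least squares in log-log space.

    Points with a zero error rate are skipped because log(0) is undefined.
    Returns (a, b), or None if fewer than two usable points remain.
    """
    pts = [(math.log(i + 1), math.log(e)) for i, e in enumerate(curve) if e > 0]
    if len(pts) < 2:
        return None
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope  # b > 0 means the error rate decreases with practice


attempts = [
    ("s1", "nested_loops", False), ("s1", "nested_loops", False), ("s1", "nested_loops", True),
    ("s2", "nested_loops", False), ("s2", "nested_loops", True), ("s2", "nested_loops", True),
]
curves = error_rates_by_opportunity(attempts)
print(curves["nested_loops"])                 # [1.0, 0.5, 0.0]
print(fit_power_law(curves["nested_loops"]))  # approximately (1.0, 1.0)
```

Under this kind of analysis, a KC whose curve shows a clear decline (a clearly positive `b`) behaves like a plausible unit of learning, whereas a flat or irregular curve can suggest that the KC is too coarse or mislabeled. This heuristic is offered only as an illustration and is not a restatement of the thesis's evaluation criteria.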