dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Gatt, A.
dc.contributor.author: Schilder, Sjoerd
dc.date.accessioned: 2024-09-12T23:02:09Z
dc.date.available: 2024-09-12T23:02:09Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/47736
dc.description.abstract: The rise of large language models (LLMs) has significantly increased productivity and reduced workloads by enabling the automatic generation of software code from user prompts. However, the indiscriminate use of AI to generate code has also raised concerns, such as the generation of malicious software, cheating in programming education, and encroachment on intellectual property. These issues call for robust methods to automatically detect Artificial Intelligence Generated Coding Content (AIGCC). This paper contributes to research on the detection of AIGCC by exploring classical detection methods as well as transformer-based detection methods built on CodeBERT. These methods were applied to the tasks of binary classification and author detection on solutions to the Automated Programming Progress Standard (APPS) generated by open-source LLMs. All the detection methods used surpass the previous state of the art, DetectGPT4Code, on the task of binary classification, albeit with reduced generalizability to out-of-distribution data. We also show the importance of natural language comments to the performance of a detector, as well as the performance gain of specific detectors over more general detectors. A fine-tuned CodeBERT model can also identify the author of a sample with reasonable performance. These results indicate the potential utility of these detection methods in various applications, though their deployment should be considered carefully.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Research for the detection of Artificial Intelligence Generated Coding Content (AIGCC) using various machine learning techniques in various experimental settings
dc.title: The Detection of AI Generated Coding Content
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 39291