
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Frenken, Koen
dc.contributor.author: Raedts, Cas
dc.date.accessioned: 2025-09-22T23:02:03Z
dc.date.available: 2025-09-22T23:02:03Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/50426
dc.description.abstract: This thesis develops and evaluates Automated Literature-Based Innovation Output (ALBIO), a modular pipeline that applies locally run large language models (LLMs) to extract, structure, enrich, and deduplicate innovation records from unstructured text. The study asks: How can LLMs automate and enhance LBIO while preserving data control and measurement quality? ALBIO is applied to 6,568 Dutch agricultural trade-journal articles (2015–2025) and is informally compared with SWINNO. The pipeline combines schema-validated JSON generation for innovation extraction, an LLM-as-judge validation step, targeted (but currently generic) web enrichment, and hybrid duplicate resolution (lexical + semantic). A held-out ground truth is used for evaluation. Results show high precision (~81%) and moderate recall (~56%) on innovation identification (F1 ≈ 0.67), positioning ALBIO as a precision-first complement to manual LBIO rather than a full substitute at present. A ~14% discovery rate indicates that ALBIO surfaces valid innovations initially missed by human coders. Substantively, the approach counters biases of traditional innovation output indicators, such as technology bias and dependency on respondents. Furthermore, it broadens coverage beyond patents, surveys, and traditional LBIO by explicitly recording process and service innovations and by classifying innovator types beyond incumbent firms. A comparison with SWINNO shows broadly similar output patterns but flags overconfidence where evidence is scarce (for example, much lower “unknown” rates for novelty-to-market), underscoring the need for cautious interpretation and better enrichment. The main bottleneck is the availability and clarity of information in source texts (and web data), not model capacity; variables that require external enrichment (e.g., location, industry, size, finance) are most affected.
A manual audit suggests that deduplication is acceptable overall but degrades for larger clusters, where drift effects appear. This thesis contributes a transparent, local LLM pipeline for LBIO, empirical evidence on precision-first scalability with discovery benefits, and a roadmap to next-generation indicators. Priorities for future work include structured enrichment (registries, patent databases, sector statistics), multi-label/probabilistic and temporal/relational representations (e.g., knowledge graphs), and more robust, auditable deduplication. Together, these advances can move automated, text-based indicators beyond counting toward richer characterisation of innovation.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This thesis explores the possibilities of using local LLMs to automatically extract innovation output indicators from trade journal texts.
dc.title: Towards Automated Literature Based Innovation Output (ALBIO) Indicators
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Innovation Output; Local LLM application; Literature Based Innovation Output (LBIO); Automated Literature Based Innovation Output (ALBIO)
dc.subject.courseuu: Innovation Sciences
dc.thesis.id: 54116