Monkey Moves AI Camera: Investigating the Creativity of Multimodal LLMs for Photo-Based Children’s Storytelling
Summary
This project investigates how AI can support children’s movement and creativity by developing an AI Camera Storytelling feature for the Monkey Moves Play app. The feature uses Multimodal Large Language Models (MLLMs) to generate personalized, creative, and age-appropriate stories from photos provided by users, encouraging physical activity and imaginative play. The study focuses on the creativity of the generated content, given its crucial role in children’s development. Stories were generated with GPT-4o and GPT-4.1 mini using structured prompt engineering. Human evaluation was combined with large-scale LLM evaluation to assess language quality and aspects of creativity. The results show satisfactory overall creativity, with average ratings of 4.7/5 from the LLM evaluator and 3.91/5 from human evaluators, but very limited correlation and agreement between the two sets of ratings. The study shows how multimodal prompt engineering can guide creative and age-appropriate storytelling. It also shows that LLM-as-a-judge evaluation scales well but lacks in-depth understanding.