Rhythm and Reason: Controlling Rhythmic Complexity in Generated Symbolic Music for Serious Game Interventions
Summary
Music is essential in video games: it enhances player immersion and engagement. However, players often disengage from game music due to excessive repetition or differing musical preferences. This issue is especially problematic in serious games for therapeutic use, such as Musical Attention Control Training (MACT), where sustained engagement and interaction with the game music are essential to the intervention. To improve therapeutic outcomes, these applications often use dynamic difficulty adjustment: by varying the music’s cognitive load, they stimulate attention and working memory. In rhythmic training, this is achieved by adjusting levels of syncopation. Controllable automatic music generation may offer a scalable solution, enabling the creation of a greater variety of music that can be adapted through syncopation to a player’s abilities, ultimately improving engagement and intervention effects.
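For concreteness, syncopation is commonly quantified by comparing note onsets against a hierarchy of metrical weights. The sketch below implements a simplified form of the Longuet-Higgins and Lee (LHL) syncopation measure in Python. It is illustrative only: this summary does not specify which measure underlies the training labels, and the weight table assumes a 4/4 meter at sixteenth-note resolution.

```python
# Longuet-Higgins & Lee metrical weights for one 4/4 bar at
# sixteenth-note resolution: 0 marks the downbeat, and more
# negative values mark metrically weaker positions.
LHL_WEIGHTS = [0, -4, -3, -4, -2, -4, -3, -4,
               -1, -4, -3, -4, -2, -4, -3, -4]

def lhl_syncopation(grid: list[int]) -> int:
    """Score an onset grid (1 = note onset, 0 = rest or tie) spanning
    whole 4/4 bars. Every onset followed by silence on a metrically
    stronger position contributes the weight difference to the total,
    so off-beat, "hanging" notes raise the score."""
    onsets = [i for i, v in enumerate(grid) if v]
    score = 0
    for k, i in enumerate(onsets):
        nxt = onsets[k + 1] if k + 1 < len(onsets) else len(grid)
        rest_weights = [LHL_WEIGHTS[j % 16] for j in range(i + 1, nxt)]
        if rest_weights and max(rest_weights) > LHL_WEIGHTS[i % 16]:
            score += max(rest_weights) - LHL_WEIGHTS[i % 16]
    return score

# An off-beat pattern scores higher than a plain on-beat one:
lhl_syncopation([1,0,0,0, 0,0,1,0, 0,1,0,0, 0,0,0,0])  # -> 4
lhl_syncopation([1,0,0,0, 1,0,0,0, 1,0,0,0, 1,0,0,0])  # -> 0
```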
To this end, we introduce RhythGen, a novel transformer-based music generator that extends the pretrained NotaGen model with time-varying control over rhythmic complexity. Its primary control mechanism targets syncopation levels to adapt the music to a player’s training needs and abilities. We implement this control via a custom, lightweight fine-tuning procedure that requires only 1,000 to 1,500 songs. The procedure supports a variety of control representations and conditioning mechanisms, including a novel attention modulation mechanism, which we compare against established methods such as in-attention and in-text conditioning. RhythGen is conditioned on one of several control representations: note-density labels, syncopation labels, or weight profiles derived from Inner Metric Analysis (IMA).
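To illustrate one of these mechanisms, the sketch below shows how in-attention conditioning can be realized in a single decoder block: a learned embedding of the active control label is added to the hidden states entering every transformer layer, so the condition is re-injected throughout the network rather than only at the input. This is a minimal PyTorch sketch under assumed hyperparameters; the names (InAttentionBlock, n_levels, control_ids) are illustrative and not RhythGen’s actual implementation.

```python
import torch
import torch.nn as nn

class InAttentionBlock(nn.Module):
    """One decoder block with in-attention conditioning: a learned
    embedding of the control label active at each position is added
    to the hidden states entering the block."""

    def __init__(self, d_model: int, n_heads: int, n_levels: int):
        super().__init__()
        self.control_emb = nn.Embedding(n_levels, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, control_ids: torch.Tensor,
                causal_mask: torch.Tensor | None = None) -> torch.Tensor:
        # x: (batch, seq, d_model); control_ids: (batch, seq) holding the
        # discrete syncopation level governing each token position.
        h = x + self.control_emb(control_ids)  # re-inject the condition
        a, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        h = self.norm1(h + a)
        return self.norm2(h + self.ff(h))
```

Because the control embedding varies along the sequence, the label can change at bar boundaries, which is what makes the control time-varying.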
Our evaluation explores the tradeoff between generation quality and control adherence across these conditioning methods. We find that models combining in-attention conditioning with discrete syncopation labels and targeted, voice-specific labelling and training generate music that matches the specified rhythmic complexity. In contrast, in-text conditioning is largely ineffective. Our novel attention modulation mechanism successfully controls note density when used with IMA weight profiles, but fails to capture syncopation. Finally, a user study (n=40) empirically confirms that our in-attention conditioned models can produce enjoyable music with noticeable variations in rhythmic complexity and recognizable section boundaries, demonstrating RhythGen’s potential for use in MACT and beyond.
