Human-Robot Musical Synchronization

Multimodal Approaches for Real-Time Musical Collaboration

Sutirtha Chakraborty
Maynooth University
National University of Ireland Maynooth

Abstract

The emerging field of "Robotic Musicianship" focuses on developing machine intelligence, in terms of algorithms and cognitive models, to capture musical perception, composition, and performance, and to transplant these skills into a robot that can then reproduce them in any context.

In such technologically augmented ensemble settings, it must be assumed that humans will not play rigidly; rather, they will move and express with the 'feel' of the music, and the roles of 'leader' and 'follower' within the troupe may change in a fluid manner. Hence, for machines to participate in cooperative musical performances, where synchronization and adaptation play a vital role, they need to operate at a higher cognitive level.

"This thesis explores how real-time collaborations between humans and machines can be improved by integrating models from the technologies of Oscillator Synchronization and Machine Learning, thus driving closer towards a vision of a human-robot symphonic orchestra."

We develop an approach based on the following strategy:

We consider each musician, human or machine, as a separate oscillator, wherein mathematical models of oscillator coupling, such as the well-known Kuramoto model, can be used to establish and maintain synchronization.


Thesis Structure

Chapter 1: Introduction

Historical context of automated musical systems and the central challenge of human-robot musical interaction.

Chapter 2: Literature Review

Comprehensive review of existing work on human-robot musical synchronization and identification of research gaps.

Chapter 3: Cyborg Philharmonic Framework

Development of a comprehensive multimodal synchronization framework integrating audio, visual, and gestural cues.

Chapter 4: LeaderSTeM

Audio-based ensemble leadership tracking using advanced machine learning techniques.

Chapter 5: Visual Cues

Exploration of visual signals and gestural information in musical synchronization.

Chapter 6: Multimodal Synchronization

Integration of multiple modalities for enhanced human-robot musical collaboration.

Chapter 7: Implementation

Real-world implementation for human-robot musical ensemble with experimental validation.

Chapter 8: Conclusion

Summary of contributions, limitations, and future research directions.

Research Overview

The Central Challenge

Achieving effective synchronization in human-robot musical ensembles presents a significant challenge. Human musicians naturally adapt to each other using subtle cues and variations in rhythm, tempo, and dynamics, elements that are not easily replicated by machines.

Key Innovation Areas

[Figure: Computer Assisted Music Making (CAMM) Venn diagram, showing the relationship between Composition (CAMC) and Performance (CAMP)]

Research Methodology

Dual Strategy Approach

🎵 Mapping Phase

Ensures control and sensing of musical instrument components, along with sound synthesis parameters. This includes:

  • Sensor integration
  • Control mechanisms
  • Sound synthesis parameters
  • MIDI (Musical Instrument Digital Interface) event generation (see the sketch after this list)
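
As a concrete illustration of the mapping idea, the following minimal sketch turns an oscillator's beat phase into MIDI note events. It assumes the Python mido library; the note number, velocity, and wrap-around beat rule are illustrative choices, not the thesis's actual mapping.

```python
import mido

def phase_to_midi(phase, prev_phase, note=42, velocity=96):
    """Emit a note_on when the beat phase wraps past zero (a new beat)."""
    if phase < prev_phase:  # phase wrap-around marks the downbeat
        return mido.Message('note_on', note=note, velocity=velocity)
    return None

# Hypothetical real-time usage:
# out = mido.open_output()             # default system MIDI port
# msg = phase_to_midi(phase, prev_phase)
# if msg is not None:
#     out.send(msg)
```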

🤖 Modeling Phase

Focuses on capturing the overall representation of the musical process through:

  • Deep learning models
  • Predictive algorithms
  • Feature extraction (see the sketch after this list)
  • Pattern recognition
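
To make the modeling phase concrete, here is a minimal feature-extraction sketch. It assumes the librosa library and an illustrative file name; the thesis does not prescribe this particular toolkit.

```python
import librosa

# Load a performance recording (file name is illustrative).
y, sr = librosa.load("ensemble_take.wav", sr=22050)

# Onset strength envelope: a per-frame measure of rhythmic activity.
onset_env = librosa.onset.onset_strength(y=y, sr=sr)

# Global tempo estimate and beat positions derived from the envelope.
tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print(f"estimated tempo: {float(tempo):.1f} BPM over {len(beat_times)} beats")
```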

Mathematical Foundation

$$\dot{\theta}_i = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i)$$
Kuramoto Model for Oscillator Synchronization
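
A minimal numerical sketch of this model, treating each "musician" as one oscillator; parameter values are illustrative, not taken from the thesis.

```python
import numpy as np

def kuramoto_step(theta, omega, K, dt):
    """One Euler step of the Kuramoto model for N coupled oscillators."""
    N = len(theta)
    # coupling[i] = sum_j sin(theta_j - theta_i)
    coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    return (theta + (omega + (K / N) * coupling) * dt) % (2 * np.pi)

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 4)                 # initial beat phases
omega = 2 * np.pi * np.array([1.9, 2.0, 2.05, 2.1])  # tempi of ~114-126 BPM
K, dt = 2.0, 0.01

for _ in range(3000):                                # 30 s of simulated time
    theta = kuramoto_step(theta, omega, K, dt)

# Order parameter r in [0, 1]: r near 1 indicates phase-locked playing.
r = abs(np.exp(1j * theta).mean())
print(f"order parameter r = {r:.3f}")
```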

Key Contributions

🎼 Cyborg Philharmonic Framework

Novel multimodal synchronization system integrating audio, visual, and gestural cues for human-robot musical collaboration.
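
One simple way such heterogeneous cues could be combined, shown here as a sketch rather than the framework's actual fusion rule, is a confidence-weighted circular mean of per-modality beat-phase estimates:

```python
import numpy as np

def fuse_phases(phases, weights):
    """Weighted circular mean of beat-phase estimates (radians);
    weights can encode per-modality confidence."""
    z = np.sum(np.asarray(weights) * np.exp(1j * np.asarray(phases)))
    return float(np.angle(z) % (2 * np.pi))

# Hypothetical per-modality estimates of the current beat phase:
fused = fuse_phases(phases=[0.10, 0.35, 0.22],   # audio, visual, gesture
                    weights=[0.5, 0.3, 0.2])
print(f"fused phase estimate: {fused:.3f} rad")
```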

🎯 LeaderSTeM Algorithm

Advanced machine learning approach for dynamic leader identification in musical ensembles using LSTM networks.
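
A structural sketch of such a model in PyTorch follows; the feature dimensionality, hidden size, and ensemble size are illustrative assumptions, not the published LeaderSTeM configuration.

```python
import torch
import torch.nn as nn

class LeaderClassifier(nn.Module):
    """Toy LSTM mapping a sequence of ensemble audio features to
    per-timestep leader logits over n_musicians players."""
    def __init__(self, n_features=16, hidden=64, n_musicians=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_musicians)

    def forward(self, x):          # x: (batch, time, n_features)
        out, _ = self.lstm(x)      # (batch, time, hidden)
        return self.head(out)      # (batch, time, n_musicians)

model = LeaderClassifier()
features = torch.randn(2, 100, 16)        # 2 clips, 100 frames, 16 features
leader = model(features).argmax(dim=-1)   # most likely leader per frame
```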

πŸ‘οΈ Visual Synchronization

Integration of computer vision techniques for gesture-based musical synchronization and conductor following.
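
As a rough illustration of beat estimation from motion, the sketch below picks peaks from a one-dimensional "wrist height" trajectory. The synthetic signal and the peak-picking rule are stand-ins; in practice the trajectory would come from a pose estimator.

```python
import numpy as np
from scipy.signal import find_peaks

fps = 30
t = np.arange(0, 10, 1 / fps)          # 10 s of motion sampled at 30 fps
# Synthetic wrist-height trajectory: 2 Hz gesture (120 BPM) plus noise.
rng = np.random.default_rng(1)
wrist_y = np.sin(2 * np.pi * 2.0 * t) + 0.05 * rng.standard_normal(t.size)

# Treat each local maximum as a beat gesture, at least 0.25 s apart.
peaks, _ = find_peaks(wrist_y, distance=fps // 4)
ibi = np.diff(t[peaks])                # inter-beat intervals in seconds
print(f"estimated tempo: {60.0 / ibi.mean():.1f} BPM")
```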

🔄 Real-time Adaptation

Continuous learning mechanisms that adapt to individual performance styles and musical preferences.
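
One plausible mechanism for such adaptation, again a sketch rather than the thesis's actual rule, is to adjust the Kuramoto coupling gain K online in proportion to the observed phase error:

```python
class AdaptiveCoupling:
    """Toy online rule: strengthen coupling while phase error is large,
    relax it once the ensemble is locked (hypothetical heuristic)."""
    def __init__(self, K=1.0, rate=0.1, target_error=0.05):
        self.K, self.rate, self.target = K, rate, target_error

    def update(self, phase_error):
        # Proportional adaptation toward the target error level.
        self.K = max(0.0, self.K + self.rate * (abs(phase_error) - self.target))
        return self.K

coupler = AdaptiveCoupling()
for err in [0.4, 0.3, 0.1, 0.05, 0.02]:   # phase error shrinking over time
    K = coupler.update(err)
print(f"adapted coupling gain K = {K:.2f}")
```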

Publications and Impact

This research has resulted in several peer-reviewed publications and has been presented at international conferences in the fields of music technology, robotics, and human-computer interaction.

📚 Research Publications

13 Publications 125+ Citations
The Cyborg Philharmonic: Synchronizing interactive musical performances between humans and machines

S Chakraborty, S Dutta & J Timoney

Humanities and Social Sciences Communications 8 (1), 1–9

🔥 25 Citations 2021 🏆 Most Cited
A cooperative and interactive gesture-based drumming interface with application to the Internet of Musical Things

A Yaseen, S Chakraborty & J Timoney

International Conference on Human-Computer Interaction, pp. 85–92

📈 17 Citations 2022 🤖 IoMT
Adaptive touchless whole-body interaction for casual ubiquitous musical activities

S Chakraborty, A Yaseen, J Timoney, V Lazzarini & D Keller

International Computer Music Conference 2022, pp. 132–138

📈 15 Citations 2022 👐 Ubimus
Multimodal synchronization in musical ensembles: Investigating audio and visual cues

S Chakraborty & J Timoney

Companion Publication of the 25th International Conference on Multimodal Interaction

📈 13 Citations 2023 🎭 Multimodal
Integrity checking using third party auditor in cloud storage

S Chakraborty, S Singh & S Thokchom

2018 Eleventh International Conference on Contemporary Computing (IC3), pp. 1–6

📈 13 Citations 2018 ☁️ Cloud Computing
Banging interaction: A ubimus-design strategy for the musical internet

D Keller, A Yaseen, J Timoney, S Chakraborty & V Lazzarini

Future Internet 15 (4):125

📊 10 Citations 2023 🌐 Ubimus
A new method for detecting onset and offset for singing in real-time and offline environments

B Faghih, S Chakraborty, A Yaseen & J Timoney

Applied Sciences 12 (15):7391

📊 9 Citations 2022 🎤 Voice Processing
LeaderSTeM-A LSTM model for dynamic leader identification within musical streams

S Chakraborty, S Kishor, S Patil & J Timoney

Joint Conference on AI Music Creativity (AIMC 2020), Stockholm, Sweden

📊 9 Citations 2020 🧠 LSTM
Robot human synchronization for musical ensemble: progress and challenges

S Chakraborty & J Timoney

2020 5th International Conference on Robotics and Automation Engineering

📊 6 Citations 2020 🤖 Robotics
Beat Estimation from Musician Visual Cues

S Chakraborty, S Aktaş, W Clifford & J Timoney

18th Sound and Music Computing Conference (SMC 2021), pp. 46–52

📊 5 Citations 2021 👁️ Visual
Gesture Mediated Timbre-Led Design based Music Interface for Socio-musical Interaction

A Yaseen, S Chakraborty & J Timoney

International Conference on Human-Computer Interaction, pp. 335–347

📊 4 Citations 2023 🎨 HCI
Dynamic Drum Collective: Introducing Kuramoto-Based Temporalities in Banging Interaction

S Chakraborty, D Keller, A Yaseen & J Timoney

Ubiquitous Music Symposium 2024 (UbiMus 2024), p. 121

🆕 New 2024 ⚡ Kuramoto

📊 Research Impact Metrics

  • 125+ total citations
  • 13 publications
  • 6 years active
  • 25 citations for the most-cited paper