Human-Robot Musical Synchronization

Multimodal Approaches for Real-Time Musical Collaboration

Sutirtha Chakraborty
Maynooth University
National University of Ireland Maynooth

Abstract

The emerging field of "Robotic Musicianship" focuses on developing machine intelligence, in terms of algorithms and cognitive models, to capture musical perception, composition, and performance, and to transplant these skills into a robot that can then reproduce them in any context.

In such technologically augmented ensemble settings, it must be assumed that humans will not play rigidly; rather, they will move and express with the 'feel' of the music, and the roles of 'leader' and 'follower' within the troupe may change in a fluid manner. Hence, for machines to participate in cooperative musical performances, where synchronization and adaptation play a vital role, they need to operate at a higher cognitive level.

"This thesis explores how real-time collaborations between humans and machines can be improved by integrating models from the technologies of Oscillator Synchronization and Machine Learning, thus driving closer towards a vision of a human-robot symphonic orchestra."

We develop an approach based on the following strategy:

We consider each musician, human or machine, as a separate oscillator, wherein mathematical models of oscillator coupling, such as the well-known Kuramoto model, can be used to establish and maintain synchronization.


Thesis Structure

Chapter 1: Introduction

Historical context of automated musical systems and the central challenge of human-robot musical interaction.

Chapter 2: Literature Review

Comprehensive review of existing work on human-robot musical synchronization and identification of research gaps.

Chapter 3: Cyborg Philharmonic Framework

Development of a comprehensive multimodal synchronization framework integrating audio, visual, and gestural cues.

Chapter 4: LeaderSTeM

Audio-based ensemble leadership tracking using advanced machine learning techniques.

Chapter 5: Visual Cues

Exploration of visual signals and gestural information in musical synchronization.

Chapter 6: Multimodal Synchronization

Integration of multiple modalities for enhanced human-robot musical collaboration.

Chapter 7: Implementation

Real-world implementation for human-robot musical ensemble with experimental validation.

Chapter 8: Conclusion

Summary of contributions, limitations, and future research directions.

Research Overview

The Central Challenge

Achieving effective synchronization in human-robot musical ensembles presents a significant challenge. Human musicians naturally adapt to each other using subtle cues and variations in rhythm, tempo, and dynamics, elements that are not easily replicated by machines.

Key Innovation Areas

[Figure: Computer Assisted Music Making (CAMM) Venn diagram, showing the relationship between Composition (CAMC) and Performance (CAMP)]

Research Methodology

Dual Strategy Approach

🎵 Mapping Phase

Ensures control and sensing of musical instrument components, along with sound synthesis parameters. This includes:

  • Sensor integration
  • Control mechanisms
  • Sound synthesis parameters
  • MIDI (Musical Instrument Digital Interface) event generation (see the sketch after this list)
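
As a concrete illustration of the mapping idea, the following minimal sketch turns an oscillator's beat phase into MIDI note events. It assumes the Python mido library; the note number, velocity, and wrap-around beat rule are illustrative choices, not the thesis's actual mapping.

```python
import mido

def phase_to_midi(phase, prev_phase, note=42, velocity=96):
    """Emit a note_on when the beat phase wraps past zero (a new beat)."""
    if phase < prev_phase:  # phase wrap-around marks the downbeat
        return mido.Message('note_on', note=note, velocity=velocity)
    return None

# Hypothetical real-time usage:
# out = mido.open_output()             # default system MIDI port
# msg = phase_to_midi(phase, prev_phase)
# if msg is not None:
#     out.send(msg)
```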

🤖 Modeling Phase

Focuses on capturing the overall representation of the musical process through:

  • Deep learning models
  • Predictive algorithms
  • Feature extraction (see the sketch after this list)
  • Pattern recognition
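
To make the modeling phase concrete, here is a minimal feature-extraction sketch. It assumes the librosa library and an illustrative file name; the thesis does not prescribe this particular toolkit.

```python
import librosa

# Load a performance recording (file name is illustrative).
y, sr = librosa.load("ensemble_take.wav", sr=22050)

# Onset strength envelope: a per-frame measure of rhythmic activity.
onset_env = librosa.onset.onset_strength(y=y, sr=sr)

# Global tempo estimate and beat positions derived from the envelope.
tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print(f"estimated tempo: {float(tempo):.1f} BPM over {len(beat_times)} beats")
```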

Mathematical Foundation

$$\dot{\theta}_i = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i)$$
Kuramoto Model for Oscillator Synchronization
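
A minimal numerical sketch of this model, treating each "musician" as one oscillator; parameter values are illustrative, not taken from the thesis.

```python
import numpy as np

def kuramoto_step(theta, omega, K, dt):
    """One Euler step of the Kuramoto model for N coupled oscillators."""
    N = len(theta)
    # coupling[i] = sum_j sin(theta_j - theta_i)
    coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    return (theta + (omega + (K / N) * coupling) * dt) % (2 * np.pi)

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 4)                 # initial beat phases
omega = 2 * np.pi * np.array([1.9, 2.0, 2.05, 2.1])  # tempi of ~114-126 BPM
K, dt = 2.0, 0.01

for _ in range(3000):                                # 30 s of simulated time
    theta = kuramoto_step(theta, omega, K, dt)

# Order parameter r in [0, 1]: r near 1 indicates phase-locked playing.
r = abs(np.exp(1j * theta).mean())
print(f"order parameter r = {r:.3f}")
```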

Key Contributions

🎼 Cyborg Philharmonic Framework

Novel multimodal synchronization system integrating audio, visual, and gestural cues for human-robot musical collaboration.
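
One simple way such heterogeneous cues could be combined, shown here as a sketch rather than the framework's actual fusion rule, is a confidence-weighted circular mean of per-modality beat-phase estimates:

```python
import numpy as np

def fuse_phases(phases, weights):
    """Weighted circular mean of beat-phase estimates (radians);
    weights can encode per-modality confidence."""
    z = np.sum(np.asarray(weights) * np.exp(1j * np.asarray(phases)))
    return float(np.angle(z) % (2 * np.pi))

# Hypothetical per-modality estimates of the current beat phase:
fused = fuse_phases(phases=[0.10, 0.35, 0.22],   # audio, visual, gesture
                    weights=[0.5, 0.3, 0.2])
print(f"fused phase estimate: {fused:.3f} rad")
```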

🎯 LeaderSTeM Algorithm

Advanced machine learning approach for dynamic leader identification in musical ensembles using LSTM networks.
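
A structural sketch of such a model in PyTorch follows; the feature dimensionality, hidden size, and ensemble size are illustrative assumptions, not the published LeaderSTeM configuration.

```python
import torch
import torch.nn as nn

class LeaderClassifier(nn.Module):
    """Toy LSTM mapping a sequence of ensemble audio features to
    per-timestep leader logits over n_musicians players."""
    def __init__(self, n_features=16, hidden=64, n_musicians=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_musicians)

    def forward(self, x):          # x: (batch, time, n_features)
        out, _ = self.lstm(x)      # (batch, time, hidden)
        return self.head(out)      # (batch, time, n_musicians)

model = LeaderClassifier()
features = torch.randn(2, 100, 16)        # 2 clips, 100 frames, 16 features
leader = model(features).argmax(dim=-1)   # most likely leader per frame
```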

πŸ‘οΈ Visual Synchronization

Integration of computer vision techniques for gesture-based musical synchronization and conductor following.
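
As a rough illustration of beat estimation from motion, the sketch below picks peaks from a one-dimensional "wrist height" trajectory. The synthetic signal and the peak-picking rule are stand-ins; in practice the trajectory would come from a pose estimator.

```python
import numpy as np
from scipy.signal import find_peaks

fps = 30
t = np.arange(0, 10, 1 / fps)          # 10 s of motion sampled at 30 fps
# Synthetic wrist-height trajectory: 2 Hz gesture (120 BPM) plus noise.
rng = np.random.default_rng(1)
wrist_y = np.sin(2 * np.pi * 2.0 * t) + 0.05 * rng.standard_normal(t.size)

# Treat each local maximum as a beat gesture, at least 0.25 s apart.
peaks, _ = find_peaks(wrist_y, distance=fps // 4)
ibi = np.diff(t[peaks])                # inter-beat intervals in seconds
print(f"estimated tempo: {60.0 / ibi.mean():.1f} BPM")
```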

🔄 Real-time Adaptation

Continuous learning mechanisms that adapt to individual performance styles and musical preferences.
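
One plausible mechanism for such adaptation, again a sketch rather than the thesis's actual rule, is to adjust the Kuramoto coupling gain K online in proportion to the observed phase error:

```python
class AdaptiveCoupling:
    """Toy online rule: strengthen coupling while phase error is large,
    relax it once the ensemble is locked (hypothetical heuristic)."""
    def __init__(self, K=1.0, rate=0.1, target_error=0.05):
        self.K, self.rate, self.target = K, rate, target_error

    def update(self, phase_error):
        # Proportional adaptation toward the target error level.
        self.K = max(0.0, self.K + self.rate * (abs(phase_error) - self.target))
        return self.K

coupler = AdaptiveCoupling()
for err in [0.4, 0.3, 0.1, 0.05, 0.02]:   # phase error shrinking over time
    K = coupler.update(err)
print(f"adapted coupling gain K = {K:.2f}")
```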

Publications and Impact

This research has resulted in several peer-reviewed publications and has been presented at international conferences in the fields of music technology, robotics, and human-computer interaction.

📚 Research Publications

13 Publications 125+ Citations
The Cyborg Philharmonic: Synchronizing interactive musical performances between humans and machines

S Chakraborty, S Dutta & J Timoney

Humanities and Social Sciences Communications 8 (1), 1–9

🔥 25 Citations 2021 🏆 Most Cited
A cooperative and interactive gesture-based drumming interface with application to the Internet of Musical Things

A Yaseen, S Chakraborty & J Timoney

International Conference on Human-Computer Interaction, pp. 85–92

📈 17 Citations 2022 🤖 IoMT
Adaptive touchless whole-body interaction for casual ubiquitous musical activities

S Chakraborty, A Yaseen, J Timoney, V Lazzarini & D Keller

International Computer Music Conference 2022, pp. 132–138

📈 15 Citations 2022 👐 Ubimus
Multimodal synchronization in musical ensembles: Investigating audio and visual cues

S Chakraborty & J Timoney

Companion Publication of the 25th International Conference on Multimodal Interaction

📈 13 Citations 2023 🎭 Multimodal
Integrity checking using third party auditor in cloud storage

S Chakraborty, S Singh & S Thokchom

2018 Eleventh International Conference on Contemporary Computing (IC3), pp. 1–6

📈 13 Citations 2018 ☁️ Cloud Computing
Banging interaction: A ubimus-design strategy for the musical internet

D Keller, A Yaseen, J Timoney, S Chakraborty & V Lazzarini

Future Internet 15 (4):125

📊 10 Citations 2023 🌐 Ubimus
A new method for detecting onset and offset for singing in real-time and offline environments

B Faghih, S Chakraborty, A Yaseen & J Timoney

Applied Sciences 12 (15):7391

📊 9 Citations 2022 🎤 Voice Processing
LeaderSTeM-A LSTM model for dynamic leader identification within musical streams

S Chakraborty, S Kishor, S Patil & J Timoney

Joint Conference on AI Music Creativity (AIMC 2020), Stockholm, Sweden

📊 9 Citations 2020 🧠 LSTM
Robot human synchronization for musical ensemble: progress and challenges

S Chakraborty & J Timoney

2020 5th International Conference on Robotics and Automation Engineering

📊 6 Citations 2020 🤖 Robotics
Beat Estimation from Musician Visual Cues

S Chakraborty, S Aktaş, W Clifford & J Timoney

18th Sound and Music Computing Conference (SMC 2021), pp. 46–52

📊 5 Citations 2021 👁️ Visual
Gesture Mediated Timbre-Led Design based Music Interface for Socio-musical Interaction

A Yaseen, S Chakraborty & J Timoney

International Conference on Human-Computer Interaction, pp. 335–347

📊 4 Citations 2023 🎨 HCI
Dynamic Drum Collective: Introducing Kuramoto-Based Temporalities in Banging Interaction

S Chakraborty, D Keller, A Yaseen & J Timoney

Ubiquitous Music Symposium 2024 (UbiMus 2024), p. 121

🆕 New 2024 ⚡ Kuramoto

📊 Research Impact Metrics

  • 125+ total citations
  • 13 publications
  • 6 years active
  • 25 citations for the most-cited paper