During the past eight years I have co-supervised a number of M.Sc. students, mainly from the
						Politecnico di Milano.
						
I currently co-supervise M.Sc. students who carry out their thesis work as interns at BdSound.
						Conducting a thesis at BdSound involves finding a trade-off between computational complexity,
						real-time feasibility, and performance.
						
For any questions, do not hesitate to contact me.
					
				
				Past theses
				
				
					
Methods for providing input gain robustness to DNN-based real-time speech processing systems
					Yilmaz Ugur Ozcan, July 2024 
					Short abstract 
					Input gain variations can significantly impact the performance of DNN-based real-time speech processing systems. 
					This thesis explores three methods to enhance robustness against these variations: Gain-Augmented Training, Differential Features, and Smoothed Frame Normalization. 
Experimental results show that these approaches improve the consistency and reliability of DNN outputs under varying input gain conditions.
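					As a purely illustrative aside (not code from the thesis), a smoothed frame normalization of the kind mentioned above can be sketched as follows: each frame is divided by a recursively smoothed RMS estimate, so a static input gain scales both numerator and denominator and largely cancels before the features reach the DNN. The smoothing constant, epsilon and feature shape are assumptions made for the example.

```python
import numpy as np

def smoothed_frame_normalization(frames, alpha=0.9, eps=1e-8):
    """Scale each frame by a recursively smoothed RMS estimate.

    frames: array-like of shape (num_frames, frame_len).
    alpha:  smoothing constant of the running level tracker (example value).
    A constant input gain scales both the frame and the level estimate,
    so it largely cancels in the ratio fed to the DNN front-end.
    """
    frames = np.asarray(frames, dtype=np.float64)
    out = np.empty_like(frames)
    level = None
    for i, frame in enumerate(frames):
        rms = np.sqrt(np.mean(frame ** 2) + eps)
        level = rms if level is None else alpha * level + (1.0 - alpha) * rms
        out[i] = frame / (level + eps)
    return out
```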
						
					Read the full abstract
					
				
				
					
					A Lightweight Speaker Verification System for Real-Time Applications
					Eray Ozgunay, July 2024
					Short abstract 
					This work tackles key challenges in Speaker Verification (SV) by introducing a novel, lightweight SV system designed for real-time applications in noisy and reverberant environments. 
					The system leverages advanced convolutional techniques within a Deep Neural Network (DNN) and real-time pooling layers to enhance responsiveness and stability across various acoustic conditions. 
					While it may not achieve the highest performance levels, it excels in real-time processing, making it ideal for dynamic environments where speed and computational efficiency are crucial.
						
					
				
				
					
						A cascade approach for speech enhancement based on deep learning
						Filippo Gualtieri, April 2023
						Short abstract
We propose a cascaded network with a lightweight phase-unaware stage and an optional, more
						computationally demanding phase-aware stage to perform single-channel Speech Enhancement based on Deep Learning
						(DL). Our solution performs as well as baselines that are more complex in terms of parameters and
						Floating Point Operations (FLOPs), according to both objective quality
						metrics and subjective evaluations.
						
						
					
				
				
					
						Real-time multimicrophone speaker separation for the automotive scenario, using a
							lightweight convolutional neural network
						Federico Maver, December 2022
						Short abstract
						We address the multichannel speaker separation problem, and we propose two causal and
						lightweight Deep Neural Network (DNN) models that can
adapt to a wide range of microphone positions and distances. The work focuses on the
						automotive scenario.
						
						
					
				
				
					
Real-time speech dereverberation using a small-footprint convolutional neural
							network
						Federico Di Marzo, April 2022
						Short abstract
We propose an innovative technique based on the use of a Convolutional Neural Network (CNN),
						designed to offer a small footprint and optimized computational performance, for systems that
						work in real time with minimal latency.
						
						
					
				
				
					
						Speaker recognition with small-footprint CNN
						Francesco Salani, December 2021
						Short abstract
						A speaker recognition system is a technology that aims to recognize
						a person's identity based on their voice.
						In this thesis, we propose a low-latency speaker recognition system based on
						Deep Neural Networks.
						
						
					
				
				
					
						A deep real-time talk state detector for acoustic echo cancellation
						Daniele Foscarin, September 2021
						Short abstract
We propose a novel approach that uses a talk state detector (TSD) to enhance the performance of a linear
						acoustic echo canceller.
						It consists of a fully convolutional neural network classifier that performs causal processing
						to meet real-time requirements with fewer than 8,000 trainable parameters.
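						For readers curious about the scale involved, here is a minimal PyTorch sketch of a causal, fully convolutional frame classifier of comparable size; it is not the thesis model, and the feature dimension, channel widths and talk-state set are assumptions chosen only to keep the parameter count below the 8,000 budget.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalTSD(nn.Module):
    """Illustrative causal, fully convolutional talk-state classifier.

    Assumes 16 feature bands per frame and 4 talk states
    (silence, near-end, far-end, double-talk); both are example choices,
    not the configuration used in the thesis.
    """

    def __init__(self, n_feats=16, n_states=4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(n_feats, 16, kernel_size=3),
            nn.Conv1d(16, 24, kernel_size=3),
            nn.Conv1d(24, 32, kernel_size=3),
        ])
        self.head = nn.Conv1d(32, n_states, kernel_size=1)

    def forward(self, x):  # x: (batch, n_feats, time)
        for conv in self.convs:
            # left-pad only, so frame t never sees frames after t (causal processing)
            x = F.relu(conv(F.pad(x, (conv.kernel_size[0] - 1, 0))))
        return self.head(x)  # per-frame talk-state logits

model = TinyCausalTSD()
print(sum(p.numel() for p in model.parameters()))  # about 4.4k parameters, well below 8,000
```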
						
						
					
				
				
					
						A real-time solution for speech enhancement using dilated convolutional neural
							networks
						Fabio Segato, July 2021
						Short abstract 
						In this work, we propose a speech enhancement solution based on Deep Neural Networks that
meets the strict
						requirements imposed by embedded devices in terms of memory footprint and processing power.
						The proposed approach operates in real-time, extracting perceptually-relevant features in
						an efficient fashion.
						
						Read the full abstract
					
				
				
					
						A hybrid approach for computationally-efficient beamforming using sparse linear microphone
							arrays
						Davide Balsarri, December 2020
Short abstract We propose a hybrid beamforming solution that combines
						two methods: one that is efficient for signals with a high input SNR and one for signals with a low input SNR.
						Results show that our SCM-based hybrid
						solution outperforms most SCM-based methods and exhibits a lower computational complexity.
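						The abstract does not spell out the two component beamformers, so the following toy sketch only illustrates the general idea of SNR-driven switching, using delay-and-sum and an MVDR weight computed from a noise spatial covariance matrix (SCM) as stand-ins; the threshold and the single-bin narrowband formulation are assumptions of the example.

```python
import numpy as np

def hybrid_beamform(X, steering, noise_scm, snr_db, snr_threshold_db=10.0):
    """Per-frame switch between two beamformers (toy illustration).

    X:          (mics, frames) STFT snapshots of one frequency bin.
    steering:   (mics,) steering vector towards the target.
    noise_scm:  (mics, mics) noise spatial covariance matrix estimate.
    snr_db:     (frames,) per-frame input SNR estimate.
    Above the threshold, a cheap delay-and-sum weight is used; below it,
    an MVDR weight computed from the noise SCM. Threshold is an example value.
    """
    mics, frames = X.shape
    w_das = steering / mics                          # delay-and-sum (cheap)
    inv_scm = np.linalg.pinv(noise_scm)
    w_mvdr = inv_scm @ steering / (steering.conj() @ inv_scm @ steering)
    out = np.empty(frames, dtype=complex)
    for t in range(frames):
        w = w_das if snr_db[t] >= snr_threshold_db else w_mvdr
        out[t] = w.conj() @ X[:, t]
    return out
```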
						
						Read the full abstract
					
				
				
					
						Voice activity detection using small-footprint deep learning
						Luca Menescardi, December 2019
						Short abstract Techniques employed to detect the presence or absence of human
						voice in an audio signal are called Voice Activity Detection
						(VAD) algorithms. Our approach optimizes both the feature extraction and the classification
						performed by the deep neural network. The goal is to comply with requirements imposed by
						embedded systems.
						
						Read the full abstract
					
				
				
					
						Automatic playlist generation using recurrent neural network
						Rosilde Tatiana Irene, July 2018
						Short abstract In this study we propose an automatic playlist generation
						approach which analyzes hand-crafted playlists, understands their
						structure and generates new playlists accordingly. We have adopted a deep learning architecture,
						in particular a Recurrent Neural
						Network, which is specialized in sequence modeling. 
						Read the thesis
					
				
				
					
Beat tracking using recurrent neural network: a transfer learning approach
						Davide Fiocchi, April 2018
						Short abstract In this work, we propose an approach to apply transfer learning
						for beat tracking.
						We use a deep RNN as the starting network trained on popular music, and we transfer it to track
						beats of folk music.
						Moreover, we test if the resultant models are able to deal with highly variable music, such as
						Greek folk music.
						Read the thesis
					
				
				
					
						Learning a personalized similarity metric for musical content
Luca Carloni, April 2018
						Short abstract We present a hybrid model for personalized
						similarity modeling that relies on both content-based and user-related similarity information.
We exploit a non-metric scaling technique to first build a
						low-dimensional space (or embedding) that satisfies the similarity information provided by the
						user, and a regression technique to learn a mapping between
						content-based information and embedding-related information. 
						Read the thesis
					
				
				
					
						A personalized metric for music similarity using Siamese deep neural networks
						Federico Sala, April 2018
						Short abstract In this thesis we propose
						an approach to model a personalized music similarity metric based on a Deep Neural Network.
We use a first stage for learning a generic music similarity metric relying on a large amount of
						data,
						and a second stage for customizing it using personalized annotations collected through a survey.
						
						Read the thesis
					
				
				
					
Analysis of musical structure: an approach based on deep learning
						Davide Andreoletti, July 2015
Short abstract We propose a Music Structural
						Analysis algorithm where we use a Deep Belief Network to extract a sequence of descriptors that
						is subsequently
						given as input to several Music Structural Analysis algorithms presented in the literature. 
						Read the thesis
					
				
				
					
						A music search engine based on a contextual related semantic model
						Alessandro Gallo, April 2014
Short abstract In this work we propose an approach for high-level
						music description and music retrieval, which we named the Contextual-related Semantic Model. Our method
						defines different semantic contexts and dimensional semantic relations between music descriptors
						belonging to the same context. 
						Read the thesis
					
				
			
			
			
				
				
				
				
				 
					 
						Audio speech source separation and enhancement in an automotive scenario using different microphone configurations
						Federico Maver, Daniele Foscarin, Davide Balsarri, Luca Menescardi, Michele Buccoli, Simone Pecorino, Antonio Grosso
						2024 AES 5th International Conference on Automotive Audio
					
 
				
				  
						An empirical evaluation of in-car acoustic measurements for the sports car scenario
						David Badiane, Filippo Gualtieri, Alessandro Proverbio, Michele Buccoli, Simone Pecorino, Antonio Grosso, Michele Ebri, Alfonso Oliva, Luca Battisti, Marco Olivieri
						2024 AES 5th International Conference on Automotive Audio
						In collaboration with Teoresi and Ferrari
					
 
				
				
				
				
					
						Real-Time Multichannel Speech Separation and Enhancement Using a Beamspace-Domain-Based Lightweight CNN | Link
						Marco Olivieri, Luca Comanducci, Mirco Pezzoli, Davide Balsarri, Luca Menescardi, Michele Buccoli, Simone Pecorino, Antonio Grosso, Fabio Antonacci, Augusto Sarti
						IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
					
				
				
					
						Towards a general framework for the annotation of dance motion sequences | Link
						Katerina El Raheb, Michele Buccoli, Massimiliano Zanoni, Akrivi Katifori, Aristotelis
						Kasomoulis, Augusto Sarti, Yannis Ioannidis
						Multimedia Tools and Applications, 2022
					
				
				
					
						Deep music on air | Link
						Massimiliano Zanoni, Michele Buccoli, Guglielmo Cassinelli, Giorgio Rinolfi
						Proceedings of the 9th International Conference on Digital and Interactive Arts, 2019
					
				
				
					
						A Presence- and Performance-Driven Framework to Investigate Interactive Networked Music
							Learning Scenarios | Link
						Stefano Delle Monache, Luca Comanducci, Michele Buccoli, Massimiliano Zanoni, Augusto Sarti,
						Enrico Pietrocola, Filippo Berbenni, and Giovanni Cospito
						Wireless Communications and Mobile Computing, 2019
					
				
				
					
						Automatic playlist generation using Convolutional Neural Networks and Recurrent Neural
							Networks | Link
						Rosilde Tatiana Irene, Clara Borrelli, Massimiliano Zanoni, Michele Buccoli, Augusto Sarti
						Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 2019
					
				
				
					
						Virtual Reality and Choreographic Practice: The Potential for New Creative Methods | Link
						Rosa E. Cisneros, Karen Wood, Sarah Whatley, Michele Buccoli, Massimiliano Zanoni, Augusto
						Sarti
						Body, Space & Technology 18 (1), 2019 
					
				
				
					
						Three-dimensional mapping of high-level music features for music browsing | Link
						Stefano Cherubin, Clara Borrelli, Massimiliano Zanoni, Michele Buccoli, Augusto Sarti, Stefano
						Tubaro
						Proceedings of the International Workshop on Multilayer Music Representation and Processing
						(MMRP), Milan, Italy, 2019
					
				
				
					
						Investigating Networked Music Performances in Pedagogical Scenarios for the InterMUSIC
							Project | Link
						Luca Comanducci, Michele Buccoli, Massimiliano Zanoni, Augusto Sarti, Stefano Delle Monache,
						Giovanni Cospito, Enrico Pietrocola, Filippo Berbenni
						Proceedings of the 23rd Conference of Open Innovations Association (FRUCT), Bologna, Italy,
						2018
					
				
				
					
						Time is not on my side: network latency, presence and performance in remote music
interaction | Link
						Stefano Delle Monache, Michele Buccoli, Luca Comanducci, Augusto Sarti, Giovanni Cospito, Enrico
						Pietrocola, Filippo Berbenni
						Proceedings of the XXIII Colloquio di Informatica Musicale (CIM), Udine, Italy, 2018
					
				
				
					
						WhoLoDancE: Whole-body Interaction Learning for Dance Education
						Anna Rizzo, Katerina El Raheb, Sarah Whatley, Rosa Maria Cisneros, Massimiliano Zanoni, Antonio
						Camurri, Vladimir Viro, Jean-Marc Matos, Stefano Piana, Michele Buccoli, Amalia Markatzi, Pablo
						Palacio, Oshri Zohar Even, Augusto Sarti, Yannis Ioannidis, Edwin-Morley Fletcher
						EUROMED International Conference on Digital Heritage, 2018
					
				
				
					
						Beat tracking using recurrent neural network: a transfer learning approach
						Davide Fiocchi, Michele Buccoli, Massimiliano Zanoni, Fabio Antonacci, Augusto Sarti
Proceedings of the 26th European Signal Processing Conference (EUSIPCO), 2018
					
				
				
					
						Using multi-dimensional correlation for matching and alignment of MoCap and video
							signals
						Michele Buccoli, Bruno Di Giorgi, Massimiliano Zanoni, Fabio Antonacci, Augusto Sarti
						Proceedings of the IEEE 19th International Workshop on Multimedia Signal Processing (MMSP),
						Luton, United Kingdom, 2017
						The paper won the Top-10% award
					
				
				
					
						Unsupervised feature learning for Music Structural Analysis
						
						Michele Buccoli, Davide Andreoletti, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
Proceedings of the 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, 2016
					
				
				
					
						A higher-dimensional expansion of affective norms for English terms for music tagging
						
						Michele Buccoli, Massimiliano Zanoni, György Fazekas, Augusto Sarti, Mark Sandler and Stefano
						Tubaro
Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR),
						New York City, USA, 2016
					
				
				
					
						A Dimensional Contextual Semantic Model For Music Description And Retrieval
						
						Michele Buccoli, Alessandro Gallo, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
						DMRN+10: Digital Music Research Network One-day Workshop 2015, London, UK, 2015
					
				
				
					
						Feature-Based Analysis of the Effects of Packet Delay on Networked Musical Interactions
						
						Cristina Emma Margherita Rottondi, Michele Buccoli, Massimiliano Zanoni, Dario Garao, Giacomo
						Verticale, Augusto Sarti
Journal of the Audio Engineering Society 63 (11), 864-875, 2015
					
				
				
					
						An Unsupervised Approach To The Semantic Description Of The Sound Quality Of
							Violins
						Michele Buccoli, Massimiliano Zanoni, Francesco Setragno, Augusto Sarti, Fabio Antonacci
						European Signal Processing Conference (EUSIPCO), Nice, France, 2015
					
				
				
					
						A Dimensional Contextual Semantic Model For Music Description And Retrieval
Michele Buccoli, Alessandro Gallo, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
						IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane,
						Australia, 2015
					
				
				
					
						Unsupervised Feature Learning For Bootleg Detection Using Deep Learning
							Architectures
						Michele Buccoli, Paolo Bestagini, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
						IEEE International Workshop on Information Forensics and Security (WIFS), Atlanta, USA, 2014
					
				
				
					
A Music Search Engine Based On Semantic Text-Based Query
						Michele Buccoli, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
Proceedings of the IEEE 15th International Workshop on Multimedia Signal Processing (MMSP),
						Pula (Sardinia), Italy, 2013
					
				
			
			
				
				
I am a hackathon enthusiast. Participating in hackathons is the best way to
						meet new people, learn new packages, test my coding-under-stress skills and
						quickly build new projects. 
					This section helps me keep track of the hackathons I have participated in so far. The teammates for
						each project are usually indicated in the source code. 
				
				
					Winner of the hackathon
					
					remote, July 8-10 2022
						Pacifier
						👶Pacifier💤 is a tool for converting the melody of any song into a lullaby to put your baby to
						sleep.
						Source code
					
				
				
					
					remote, September 11-12 2020
						Unnamed project
						This was supposed to generate music from landscape pictures, but it failed miserably.
						Source code
					
				
				
					Winner of challenge "daily routine"
					
					remote, April 3-6 2020
						Corunner Virus
🏃Corunner Virus🦠 is a tool for running from home with an AI-based augmented-reality
						treadmill.
						Learn more (in Italian)
						Demo
					
				
				
					
					Abbey Road Studios, London, November 10-11, 2018
						Abbey Blues
Abbey Blues is a tool / live performance for guitar and lyrics. The tool recognizes the sentiment
						of the lyrics being sung and triggers different background music and guitar
						effects.
						Source code
					
				
				
					Best hack on music creation or performance
					
					Vienna, September 29-30, 2017
						Samosa
Samosa is a tool / live performance for face and guitar. A camera recognizes the emotion
						expressed by the face of the performer, triggering different background music and effects for the
						guitar performer.
						Source code
					
				
				
					
					Spotify HQ, New York, August 6th 2016
						Unnamed project
						This was supposed to recognize people rapping and scoring the quality of their rhymes, but it
						failed miserably.
						Source code
					
				
			
			
			
				
				
Teaching assistant for the Creative Computing and Programming course
					2019 - present 
						M.Sc. in Computer Engineering - Politecnico di Milano 
						The course is taught in English.
						You can find the material of the course by connecting to the beep portal and logging in with
						your PoliMI credentials.
					
				 
				
					Teaching assistant for the Information Retrieval and Data Mining course
					A.Y 2015/2016; 2016/2017; 2017/2018; 2018/2019
						M.Sc. in Computer Engineering - Politecnico di Milano 
						The course was taught in English.
						You can find the material of the course by connecting to the beep portal and logging in with
						your PoliMI credentials.
					
				 
				
					Organizer and Teacher of the workshop Creative Computing for Artistic Performances
					A.Y 2017/2018 
						B.Sc. and M.Sc. in Engineering - Politecnico di Milano 
					
				 
				
Exercises on Multimedia Signal Processing, 1st module
					A.Y 2014/2015; 2016/2017; 2017/2018; 2018/2019 
						M.Sc. in Computer Engineering - Politecnico di Milano 
						The course was taught in English.
					
				 
				
					Matlab Tutoring
					A.Y 2014/2015 
M.Sc. in Computer Engineering - Como Campus, Politecnico di Milano 
						The course was taught in English.