During the past eight years I have co-supervised a number of M.Sc. students, mainly from the
Politecnico di Milano.
I currently co-supervise M.Sc. students who carry out their thesis as interns at BdSound.
Conducting a thesis at BdSound involves finding a trade-off between computational complexity,
real-time feasibility and performance.
For any question, do not hesitate to contact me.
Past theses
Methods for providing input gain robustness to DNN-based real-time speech processing systems
Yilmaz Ugur Ozcan, July 2024
Short abstract
Input gain variations can significantly impact the performance of DNN-based real-time speech processing systems.
This thesis explores three methods to enhance robustness against these variations: Gain-Augmented Training, Differential Features, and Smoothed Frame Normalization.
Experimental results show that these approaches improve the consistency and reliability of DNN outputs under varying input gain conditions.
Read the full abstract
A Lightweight Speaker Verification System for Real-Time Applications
Eray Ozgunay, July 2024
Short abstract
This work tackles key challenges in Speaker Verification (SV) by introducing a novel, lightweight SV system designed for real-time applications in noisy and reverberant environments.
The system leverages advanced convolutional techniques within a Deep Neural Network (DNN) and real-time pooling layers to enhance responsiveness and stability across various acoustic conditions.
While it may not achieve the highest performance levels, it excels in real-time processing, making it ideal for dynamic environments where speed and computational efficiency are crucial.
A cascade approach for speech enhancement based on deep learning
Filippo Gualtieri, April 2023
Short abstract
We propose a cascaded network with a lightweight phase-unaware stage and an optional, more
computationally demanding phase-aware stage to perform single-channel Speech Enhancement based
on Deep Learning (DL). According to both objective quality metrics and subjective evaluations,
our solution performs as well as baselines that are more complex in terms of parameters and
Floating Point Operations (FLOPs).
Real-time multimicrophone speaker separation for the automotive scenario, using a
lightweight convolutional neural network
Federico Maver, December 2022
Short abstract
We address the multichannel speaker separation problem, and we propose two causal and
lightweight Deep Neural Network (DNN) models that can
adapt to a wide range of microphone positions and distances. The problem focuses on the
automotive scenario.
Real-time speech dereverberation using a small-footprint convolutional neural
network
Federico Di Marzo, April 2022
Short abstract
We propose an innovative technique based on a Convolutional Neural Network (CNN),
designed to offer a small footprint and optimized computational performance for systems that
work in real time with minimal latency.
Speaker recognition with small-footprint CNN
Francesco Salani, December 2021
Short abstract
A speaker recognition system is a technology that aims to recognize
a person's identity based on their voice.
In this thesis, we propose a low-latency speaker recognition system based on
Deep Neural Networks.
A deep real-time talk state detector for acoustic echo cancellation
Daniele Foscarin, September 2021
Short abstract
We propose a novel approach that uses a talk state detector (TSD) to enhance the performance of
linear acoustic echo cancellation.
The detector is a fully convolutional neural network classifier that performs causal processing
to meet the real-time requirement with fewer than 8,000 trainable parameters.
A real-time solution for speech enhancement using dilated convolutional neural
networks
Fabio Segato, July 2021
Short abstract
In this work, we propose a speech enhancement solution based on Deep Neural Networks that
withstands the strict
requirements imposed by embedded devices in terms of memory footprint and processing power.
The proposed approach operates in real-time, extracting perceptually-relevant features in
an efficient fashion.
Read the full abstract
A hybrid approach for computationally-efficient beamforming using sparse linear microphone
arrays
Davide Balsarri, December 2020
Short abstract
We propose a hybrid beamforming solution that combines
two methods: one that is efficient for signals with high input SNR and one that is efficient for signals with low input SNR.
Results show that our SCM-based hybrid
solution outperforms most SCM-based methods and exhibits a lower computational complexity.
Read the full abstract
Voice activity detection using small-footprint deep learning
Luca Menescardi, December 2019
Short abstract
Techniques employed to detect the presence or absence of human
voice in an audio signal are called Voice Activity Detection
(VAD) algorithms. Our approach optimizes both the feature extraction and the classification
performed by the deep neural network. The goal is to comply with requirements imposed by
embedded systems.
Read the full abstract
Automatic playlist generation using recurrent neural network
Rosilde Tatiana Irene, July 2018
Short abstract
In this study we propose an automatic playlist generation
approach which analyzes hand-crafted playlists, understands their
structure and generates new playlists accordingly. We have adopted a deep learning architecture,
in particular a Recurrent Neural
Network, which is specialized in sequence modeling.
Read the thesis
Beat tracking using recurrent neural network : a transfer learning approach
Davide Fiocchi, April 2018
Short abstract
In this work, we propose an approach to apply transfer learning
for beat tracking.
We use a deep RNN as the starting network trained on popular music, and we transfer it to track
beats of folk music.
Moreover, we test if the resultant models are able to deal with highly variable music, such as
Greek folk music.
Read the thesis
Learning a personalized similarity metric for musical content
Luca Carloni, April 2018
Short abstract
We present a hybrid model for personalized
similarity modeling that relies on both content-based and user-related similarity information.
We exploit a non-metric scaling technique to first elaborate a
low-dimensional space (or embedding) which fulfills the similarity information provided by the
user, and a regression technique to learn a mapping between
content-based information and embedding-related information.
Read the thesis
A personalized metric for music similarity using Siamese deep neural networks
Federico Sala, April 2018
Short abstract
In this thesis we propose
an approach to model a personalized music similarity metric based on a Deep Neural Network.
We use a first stage for learning a generic music similarity metric relying on a great amount of
data,
and a second stage for customizing it using personalized annotations collected through a survey.
Read the thesis
Analysis of musical structure : an approach based on deep learning
Davide Andreoletti, July 2015
Short abstract
We propose a Music Structural Analysis algorithm in which a Deep Belief Network extracts a
sequence of descriptors that is subsequently given as input to several Music Structural
Analysis algorithms presented in the literature.
Read the thesis
A music search engine based on a contextual related semantic model
Alessandro Gallo, April 2014
Short abstract
In this work we propose an approach for high-level music
description and music retrieval, which we named the Contextual-related semantic model. Our method
defines different semantic contexts and dimensional semantic relations between music descriptors
belonging to the same context.
Read the thesis
Audio speech source separation and enhancement in an automotive scenario using different microphone configurations
Federico Maver, Daniele Foscarin, Davide Balsarri, Luca Menescardi, Michele Buccoli, Simone Pecorino, Antonio Grosso
2024 AES 5th International Conference on Automotive Audio
An empirical evaluation of in-car acoustic measurements for the sports car scenario
David Badiane, Filippo Gualtieri, Alessandro Proverbio, Michele Buccoli, Simone Pecorino, Antonio Grosso, Michele Ebri, Alfonso Oliva, Luca Battisti, Marco Olivieri
2024 AES 5th International Conference on Automotive Audio
In collaboration with Teoresi and Ferrari
Real-Time Multichannel Speech Separation and Enhancement Using a Beamspace-Domain-Based Lightweight CNN | Link
Marco Olivieri, Luca Comanducci, Mirco Pezzoli, Davide Balsarri, Luca Menescardi, Michele Buccoli, Simone Pecorino, Antonio Grosso, Fabio Antonacci, Augusto Sarti
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
Towards a general framework for the annotation of dance motion sequences | Link
Katerina El Raheb, Michele Buccoli, Massimiliano Zanoni, Akrivi Katifori, Aristotelis
Kasomoulis, Augusto Sarti, Yannis Ioannidis
Multimedia Tools and Applications, 2022
Deep music on air | Link
Massimiliano Zanoni, Michele Buccoli, Guglielmo Cassinelli, Giorgio Rinolfi
Proceedings of the 9th International Conference on Digital and Interactive Arts, 2019
A Presence- and Performance-Driven Framework to Investigate Interactive Networked Music
Learning Scenarios | Link
Stefano Delle Monache, Luca Comanducci, Michele Buccoli, Massimiliano Zanoni, Augusto Sarti,
Enrico Pietrocola, Filippo Berbenni, and Giovanni Cospito
Wireless Communications and Mobile Computing, 2019
Automatic playlist generation using Convolutional Neural Networks and Recurrent Neural
Networks | Link
Rosilde Tatiana Irene, Clara Borrelli, Massimiliano Zanoni, Michele Buccoli, Augusto Sarti
Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 2019
Virtual Reality and Choreographic Practice: The Potential for New Creative Methods | Link
Rosa E. Cisneros, Karen Wood, Sarah Whatley, Michele Buccoli, Massimiliano Zanoni, Augusto
Sarti
Body, Space & Technology 18 (1), 2019
Three-dimensional mapping of high-level music features for music browsing | Link
Stefano Cherubin, Clara Borrelli, Massimiliano Zanoni, Michele Buccoli, Augusto Sarti, Stefano
Tubaro
Proceedings of the International Workshop on Multilayer Music Representation and Processing
(MMRP), Milan, Italy, 2019
Investigating Networked Music Performances in Pedagogical Scenarios for the InterMUSIC
Project | Link
Luca Comanducci, Michele Buccoli, Massimiliano Zanoni, Augusto Sarti, Stefano Delle Monache,
Giovanni Cospito, Enrico Pietrocola, Filippo Berbenni
Proceedings of the 23rd Conference of Open Innovations Association (FRUCT), Bologna, Italy,
2018
Time is not on my side: network latency, presence and performance in remote music
interaction | Link
Stefano Delle Monache, Michele Buccoli, Luca Comanducci, Augusto Sarti, Giovanni Cospito, Enrico
Pietrocola, Filippo Berbenni
Proceedings of the XXIII Colloquio di Informatica Musicale (CIM), Udine, Italy, 2018
WhoLoDancE: Whole-body Interaction Learning for Dance Education
Anna Rizzo, Katerina El Raheb, Sarah Whatley, Rosa Maria Cisneros, Massimiliano Zanoni, Antonio
Camurri, Vladimir Viro, Jean-Marc Matos, Stefano Piana, Michele Buccoli, Amalia Markatzi, Pablo
Palacio, Oshri Zohar Even, Augusto Sarti, Yannis Ioannidis, Edwin Morley-Fletcher
EUROMED International Conference on Digital Heritage, 2018
Beat tracking using recurrent neural network: a transfer learning approach
Davide Fiocchi, Michele Buccoli, Massimiliano Zanoni, Fabio Antonacci, Augusto Sarti
Proc. of the 26th European Signal Processing Conference (EUSIPCO), 2018
Using multi-dimensional correlation for matching and alignment of MoCap and video
signals
Michele Buccoli, Bruno Di Giorgi, Massimiliano Zanoni, Fabio Antonacci, Augusto Sarti
Proceedings of the IEEE 19th International Workshop on Multimedia Signal Processing (MMSP),
Luton, United Kingdom, 2017
The paper won the Top-10% award
Unsupervised feature learning for Music Structural Analysis
Michele Buccoli, Davide Andreoletti, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
Proceedings of 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, 2016
A higher-dimensional expansion of affective norms for English terms for music tagging
Michele Buccoli, Massimiliano Zanoni, György Fazekas, Augusto Sarti, Mark Sandler and Stefano
Tubaro
Proceedings of 17th International Society for Music Information Retrieval Conference (ISMIR),
New York City, USA, 2016
A Dimensional Contextual Semantic Model For Music Description And Retrieval
Michele Buccoli, Alessandro Gallo, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
DMRN+10: Digital Music Research Network One-day Workshop 2015, London, UK, 2015
Feature-Based Analysis of the Effects of Packet Delay on Networked Musical Interactions
Cristina Emma Margherita Rottondi, Michele Buccoli, Massimiliano Zanoni, Dario Garao, Giacomo
Verticale, Augusto Sarti
Journal of the Audio Engineering Society 63 (11), 864-875
An Unsupervised Approach To The Semantic Description Of The Sound Quality Of
Violins
Michele Buccoli, Massimiliano Zanoni, Francesco Setragno, Augusto Sarti, Fabio Antonacci
European Signal Processing Conference (EUSIPCO), Nice, France, 2015
A Dimensional Contextual Semantic Model For Music Description And Retrieval
Michele Buccoli, Alessandro Gallo, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane,
Australia, 2015
Unsupervised Feature Learning For Bootleg Detection Using Deep Learning
Architectures
Michele Buccoli, Paolo Bestagini, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
IEEE International Workshop on Information Forensics and Security (WIFS), Atlanta, USA, 2014
A Music Search Engine Based On Semantic Text-based Query
Michele Buccoli, Massimiliano Zanoni, Augusto Sarti, Stefano Tubaro
Proceedings of the 15th International Workshop on Multimedia Signal Processing (MMSP),
Pula (Sardinia), Italy, September 30 - October 2, 2013
I am a hackathon enthusiast. Participating in hackathons is the best way to
meet new people, learn new packages, test my coding-under-stress skills and
quickly realize new projects.
This section helps me keep track of the hackathons I have participated in so far. The teammates for
each project are usually indicated in the source code.
Winner of the hackathon
remote, July 8-10 2022
Pacifier
👶Pacifier💤 is a tool for converting the melody of any song into a lullaby to put your baby to
sleep.
Source code
remote, September 11-12 2020
Unnamed project
This was supposed to generate music from landscape pictures, but it failed miserably.
Source code
Winner of challenge "daily routine"
remote, April 3-6 2020
Corunner Virus
🏃Corunner Virus🦠 is a tool for running from home with an AI-based augmented-reality
treadmill.
Learn more (in Italian)
Demo
Abbey Road Studios, London, November 10-11, 2018
Abbey Blues
Abbey Blues is a tool / live performance for guitar and lyrics. The tool recognizes the sentiment
of the lyrics that are being sung and triggers different background music and guitar
effects.
Source code
Best hack on music creation or performance
Vienna, September 29-30, 2017
Samosa
Samosa is a tool / live performance for face and guitar. A camera recognizes the emotion
expressed by the face of the performer, triggering different background music and effects for the
guitar performer.
Source code
Spotify HQ, New York, August 6th 2016
Unnamed project
This was supposed to recognize people rapping and score the quality of their rhymes, but it
failed miserably.
Source code
Teaching assistant for Creative Computing and Programming course
2019 - present
M.Sc. in Computer Engineering - Politecnico di Milano
The course is taught in English.
You can find the material of the course by connecting to the beep portal and logging in with
your PoliMI credentials.
Teaching assistant for the Information Retrieval and Data Mining course
A.Y 2015/2016; 2016/2017; 2017/2018; 2018/2019
M.Sc. in Computer Engineering - Politecnico di Milano
The course was taught in English.
You can find the material of the course by connecting to the beep portal and logging in with
your PoliMI credentials.
Organizer and Teacher of the workshop Creative Computing for Artistic Performances
A.Y 2017/2018
B.Sc. and M.Sc. in Engineering - Politecnico di Milano
Exercises on Multimedia Signal Processing, 1st module
A.Y 2014/2015; 2016/2017; 2017/2018; 2018/2019
M.Sc. in Computer Engineering - Politecnico di Milano
The course was taught in English.
Matlab Tutoring
A.Y 2014/2015
M.Sc. in Computer Engineering - Como Campus, Politecnico di Milano
The course was taught in English.