Theses
List of theses I co-supervised.
Theses since 2019 were conducted as internships at BdSound S.r.l. Apply here to be the next one.
Theses before 2019 were conducted as a PhD student or post-doc at Politecnico di Milano.
2025
A small-footprint real-time approach for in-cabin Emergency Vehicle Detection with deep learning
Francesco Bossio
This work proposes a real-time, small-footprint emergency siren detection system based on deep learning, designed to operate efficiently inside car cabin environments while meeting strict latency and computational constraints.
Bandwidth extension from wide-band to fullband for speech signals using a low-resource data-driven approach
Marcello Grati
This work proposes a data-driven, spectrum-based bandwidth extension (BWE) method that leverages the source-filter representation of speech. The model predicts an amplitude mask that is applied to a shaped noise signal. Results show that the proposed approach achieves competitive perceptual quality while drastically reducing computational complexity.
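As a rough illustration of the masking idea, here is a minimal sketch of one STFT frame, assuming the missing high-band bins are filled with random-phase noise whose magnitude follows a spectral envelope and is then scaled by the predicted mask. The function `fill_high_band`, the envelope, and the bin layout are hypothetical and are not taken from the thesis; the mask is treated as a given model output.

```python
import numpy as np

def fill_high_band(wb_spec, hb_mask, hb_envelope, rng):
    """Fill the empty high-band bins of one STFT frame (hypothetical sketch).

    wb_spec:     complex full-band spectrum whose high-band bins are missing.
    hb_mask:     amplitude mask, one value per high-band bin (assumed model output).
    hb_envelope: magnitude envelope used to shape the noise (assumed input).
    """
    n_hb = hb_mask.shape[0]
    # Complex Gaussian noise normalized to unit magnitude: only a random phase survives.
    noise = rng.standard_normal(n_hb) + 1j * rng.standard_normal(n_hb)
    noise /= np.abs(noise)
    # Shape the noise with the envelope, then apply the predicted amplitude mask.
    shaped = noise * hb_envelope
    out = wb_spec.copy()
    out[-n_hb:] = hb_mask * shaped
    return out
```

For example, at 48 kHz with 960-sample frames the real FFT has 481 bins spaced 50 Hz apart; a 16 kHz wide-band input only fills bins 0 to 160 (up to 8 kHz), so the last 320 bins are the ones to synthesize.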
2024
Methods for providing input gain robustness to DNN-based real-time speech processing systems
Yilmaz Ugur Ozcan
Input gain variations can significantly impact the performance of DNN-based real-time speech processing systems. This thesis explores three methods to enhance robustness against these variations: Gain-Augmented Training, Differential Features, and Smoothed Frame Normalization. Experimental results show that these approaches improve the consistency and reliability of DNN outputs under varying input gain conditions.
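The three ideas can be sketched generically as follows. This is one plausible reading of each method, not the thesis implementation: the dB range, the use of frame-to-frame log-spectral differences as "differential features", and the exponential smoothing constant are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def random_gain(x, rng, min_db=-20.0, max_db=20.0):
    """Gain-augmented training: scale a waveform by a random gain drawn uniformly in dB.
    The [-20, 20] dB range is an assumption."""
    gain_db = rng.uniform(min_db, max_db)
    return x * 10.0 ** (gain_db / 20.0)

def log_spec_deltas(frames, eps=1e-8):
    """Differential features (assumed form): frame-to-frame differences of
    log-magnitude spectra. frames has shape (n_frames, frame_len)."""
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=-1)) + eps)
    return np.diff(log_mag, axis=0)

def smoothed_frame_norm(frames, alpha=0.95, eps=1e-8):
    """Causal normalization of each frame by an exponentially smoothed RMS level.
    alpha is an assumed smoothing constant."""
    out = np.empty_like(frames, dtype=float)
    level = None
    for t, frame in enumerate(frames):
        rms = np.sqrt(np.mean(frame ** 2))
        # Running level: initialized with the first frame, then smoothed over time.
        level = rms if level is None else alpha * level + (1.0 - alpha) * rms
        out[t] = frame / (level + eps)
    return out
```

Both feature-side methods cancel a constant gain g up to the small `eps` terms: in the log domain g becomes an additive log g that the frame difference removes, and in the smoothed normalization the frame and its running level both scale by g.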
A Lightweight Speaker Verification System for Real-Time Applications
Eray Ozgunay
This work tackles key challenges in Speaker Verification (SV) by introducing a novel, lightweight SV system designed for real-time applications in noisy and reverberant environments. The system leverages advanced convolutional techniques within a Deep Neural Network (DNN) and real-time pooling layers to enhance responsiveness and stability across various acoustic conditions. While it may not achieve the highest performance levels, it excels in real-time processing, making it ideal for dynamic environments where speed and computational efficiency are crucial.
2023
A cascade approach for speech enhancement based on deep learning
Filippo Gualtieri
We propose a cascaded network with a lightweight phase-unaware stage and an optional, more computationally demanding phase-aware stage to perform single-channel Speech Enhancement based on Deep Learning (DL). Our solution performs as well as baselines that are more complex in terms of parameters and Floating Point Operations (FLOPs), according to both objective quality metrics and subjective evaluations.
2022
Real-time speech dereverberation using a small-footprint convolutional neural network
Federico Di Marzo
We propose an innovative technique based on a Convolutional Neural Network (CNN), designed to have a small footprint and optimized computational performance, for systems that work in real time with minimal latency.
Real-time multi-microphone speaker separation for the automotive scenario, using a lightweight convolutional neural network
Federico Maver
We address the multichannel speaker separation problem and propose two causal, lightweight Deep Neural Network (DNN) models that can adapt to a wide range of microphone positions and distances. The work focuses on the automotive scenario.
2021
A real-time solution for speech enhancement using dilated convolutional neural networks
Fabio Segato
In this work, we propose a speech enhancement solution based on Deep Neural Networks that meets the strict memory-footprint and processing-power requirements of embedded devices. The proposed approach operates in real time and extracts perceptually relevant features efficiently.
A deep real-time talk state detector for acoustic echo cancellation
Daniele Foscarin
A novel approach that uses a talk state detector (TSD) to enhance the performance of a linear acoustic echo canceller. It consists of a fully convolutional neural network classifier that performs causal processing to meet real-time requirements with fewer than 8,000 trainable parameters.
Speaker recognition with small-footprint CNN
Francesco Salani
A speaker recognition system is a technology that aims to recognize a person’s identity based on their voice. In this thesis, we propose a low-latency speaker recognition system based on Deep Neural Networks.
2020
A hybrid approach for computationally-efficient beamforming using sparse linear microphone arrays
Davide Balsarri
We propose a hybrid beamforming solution that combines two methods: one that is effective for signals with high input SNR and one suited to signals with low input SNR. Results show that our SCM-based hybrid solution outperforms most SCM-based methods while exhibiting lower computational complexity.
2019
Voice activity detection using small-footprint deep learning
Luca Menescardi
Techniques employed to detect the presence or absence of human voice in an audio signal are called Voice Activity Detection (VAD) algorithms. Our approach optimizes both the feature extraction and the classification performed by the deep neural network. The goal is to comply with requirements imposed by embedded systems.
2018
Learning a personalized similarity metric for musical content
Luca Carloni
We present a hybrid model for personalized similarity modeling that relies on both content-based and user-related similarity information. We exploit a non-metric scaling technique to first build a low-dimensional space (or embedding) that satisfies the similarity information provided by the user, and then a regression technique to learn a mapping from content-based information to the embedding.
Beat tracking using recurrent neural network: a transfer learning approach
Davide Fiocchi
In this work, we propose an approach that applies transfer learning to beat tracking. We start from a deep RNN trained on popular music and transfer it to track beats in folk music. Moreover, we test whether the resulting models can handle highly variable music, such as Greek folk music.
A personalized metric for music similarity using Siamese deep neural networks
Federico Sala
In this thesis, we propose an approach to model a personalized music similarity metric based on a Deep Neural Network. A first stage learns a generic music similarity metric from a large amount of data, and a second stage customizes it using personalized annotations collected through a survey.
Automatic playlist generation using recurrent neural network
Rosilde Tatiana Irene
In this study, we propose an automatic playlist generation approach that analyzes hand-crafted playlists, learns their structure, and generates new playlists accordingly. We adopt a deep learning architecture, specifically a Recurrent Neural Network, which is well suited to sequence modeling.
2015
Analysis of musical structure: an approach based on deep learning
Davide Andreoletti
An analysis approach in which a Deep Belief Network extracts a sequence of descriptors that is then given as input to several Music Structural Analysis algorithms from the literature.
2014
A music search engine based on a contextual related semantic model
Alessandro Gallo
In this work, we propose an approach for high-level music description and music retrieval, which we named the Contextual-related semantic model. Our method defines different semantic contexts and dimensional semantic relations between music descriptors belonging to the same context.