theses

List of theses I co-supervised.

Theses since 2019 were conducted as internships at BdSound S.r.l. Apply here to be the next one.

Theses before 2019 were conducted as a PhD student or post-doc at Politecnico di Milano.

2025


A small-footprint real-time approach for in-cabin Emergency Vehicle Detection with deep learning

Francesco Bossio

This work proposes a real-time, small-footprint emergency siren detection system based on deep learning, designed to operate efficiently inside car cabin environments while meeting strict latency and computational constraints.

Bandwidth extension from wide-band to fullband for speech signals using a low-resource data-driven approach

Marcello Grati

This work proposes a data-driven and spectrum-based BWE method that leverages the source-filter representation of speech. The model predicts an amplitude mask applied to a shaped noise signal. Results demonstrated that the proposed approach achieves competitive perceptual quality, while drastically reducing computational complexity.

2024


Methods for providing input gain robustness to DNN-based real-time speech processing systems

Yilmaz Ugur Ozcan

Input gain variations can significantly impact the performance of DNN-based real-time speech processing systems. This thesis explores three methods to enhance robustness against these variations: Gain-Augmented Training, Differential Features, and Smoothed Frame Normalization. Experimental results show that these approaches improve the consistency and reliability of DNN outputs under varying input gain conditions.

A Lightweight Speaker Verification System for Real-Time Applications

Eray Ozgunay

This work tackles key challenges in Speaker Verification (SV) by introducing a novel, lightweight SV system designed for real-time applications in noisy and reverberant environments. The system leverages advanced convolutional techniques within a Deep Neural Network (DNN) and real-time pooling layers to enhance responsiveness and stability across various acoustic conditions. While it may not achieve the highest performance levels, it excels in real-time processing, making it ideal for dynamic environments where speed and computational efficiency are crucial.

2023


A cascade approach for speech enhancement based on deep learning

Filippo Gualtieri

We propose a cascaded network with a lightweight phase-unaware stage and an optional, more computationally demanding phase-aware stage to perform single-channel Speech Enhancement based on Deep Learning (DL). Our solution performs as well as baselines that are more complex in terms of parameters and Floating Point Operations (FLOPs), according to both objective quality metrics and subjective evaluations.

2022


Real-time speech dereverberation using a small-footprint convolutional neural network

Federico Di Marzo

We propose an innovative technique based on a Convolutional Neural Network (CNN), designed to offer a small footprint and optimized computational performance for systems that work in real time with minimal latency.

Real-time multimicrophone speaker separation for the automotive scenario, using a lightweight convolutional neural network

Federico Maver

We address the multichannel speaker separation problem and propose two causal, lightweight Deep Neural Network (DNN) models that can adapt to a wide range of microphone positions and distances. The work focuses on the automotive scenario.

2021


A real-time solution for speech enhancement using dilated convolutional neural networks

Fabio Segato

In this work, we propose a speech enhancement solution based on Deep Neural Networks that withstands the strict requirements imposed by embedded devices in terms of memory footprint and processing power. The proposed approach operates in real-time, extracting perceptually-relevant features in an efficient fashion.

A deep real-time talk state detector for acoustic echo cancellation

Daniele Foscarin

A novel approach that uses a talk state detector (TSD) to enhance the performance of linear acoustic echo cancellation. It consists of a fully convolutional neural network classifier with fewer than 8,000 trainable parameters that performs causal processing to meet the real-time requirement.

Speaker recognition with small-footprint CNN

Francesco Salani

A speaker recognition system is a technology that aims to recognize a person’s identity based on their voice. In this thesis, we propose a low-latency speaker recognition system based on Deep Neural Networks.

2020


A hybrid approach for computationally-efficient beamforming using sparse linear microphone arrays

Davide Balsarri

We propose a hybrid beamforming solution that combines two methods: one that is efficient for signals with high input SNR and one that is efficient for signals with low input SNR. Results show that our SCM-based hybrid solution outperforms most SCM-based methods and exhibits a lower computational complexity.

2019


Voice activity detection using small-footprint deep learning

Luca Menescardi

Techniques employed to detect the presence or absence of human voice in an audio signal are called Voice Activity Detection (VAD) algorithms. Our approach optimizes both the feature extraction and the classification performed by the deep neural network. The goal is to comply with requirements imposed by embedded systems.

2018


Learning a personalized similarity metric for musical content

Luca Carloni

We present a hybrid model for personalized similarity modeling that relies on both content-based and user-related similarity information. We exploit a non-metric scaling technique to first elaborate a low-dimensional space (or embedding) which fulfills the similarity information provided by the user, and a regression technique to learn a mapping between content-based information and embedding-related information.

Beat tracking using recurrent neural network: a transfer learning approach

Davide Fiocchi

In this work, we propose an approach to apply transfer learning for beat tracking. We use a deep RNN as the starting network trained on popular music, and we transfer it to track beats of folk music. Moreover, we test if the resultant models are able to deal with highly variable music, such as Greek folk music.

A personalized metric for music similarity using Siamese deep neural networks

Federico Sala

In this thesis, we propose an approach to model a personalized music similarity metric based on a Deep Neural Network. A first stage learns a generic music similarity metric from a large amount of data, and a second stage customizes it using personalized annotations collected through a survey.

Automatic playlist generation using recurrent neural network

Rosilde Tatiana Irene

In this study we propose an automatic playlist generation approach which analyzes hand-crafted playlists, understands their structure and generates new playlists accordingly. We have adopted a deep learning architecture, in particular a Recurrent Neural Network, which is specialized in sequence modeling.

2015


Analysis of musical structure: an approach based on deep learning

Davide Andreoletti

We use a Deep Belief Network to extract a sequence of descriptors that is then given as input to several Music Structural Analysis algorithms presented in the literature.

2014


A music search engine based on a contextual related semantic model

Alessandro Gallo

In this work, we propose an approach for high-level music description and music retrieval, which we named the Contextual-related semantic model. Our method defines different semantic contexts and dimensional semantic relations between music descriptors belonging to the same context.