► Tutorials

New Mathematical Tools for Audio Signal Processing

Tutorial 1

► Title: “Introduction to Volterra series and applications to physical audio signal processing”

► Speaker: Thomas Hélie (CNRS-IRCAM, France)

Abstract: Many audio systems and physical phenomena introduce distortion as soon as the sound level increases. In this regime, the tools of linear signal processing are no longer suitable.

A Volterra series is an input-to-output representation suited to dynamical systems with analytic nonlinearities. It extends the notions of linear filtering: the “impulse response” and the “transfer function” are generalized into multivariate “convolution kernels” and “transfer kernels”, respectively. These kernels isolate and sort the linear, quadratic, cubic, etc. homogeneous contributions of the dynamics.
In this tutorial, Volterra series, their basic properties and their links with standard linear tools are presented.
A practical two-step method for solving nonlinear differential problems with Volterra series is proposed. It answers the following questions in turn:
(1) How can the transfer kernels be derived for a given problem?
(2) How can a realization and a simulation be built from the transfer kernels?
Then, applications to audio and acoustical problems are presented.
Finally, some computable results on convergence domains and guaranteed error bounds are given.
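
To make the idea of convolution kernels concrete, here is a minimal sketch of a discrete-time Volterra series truncated at order 2. It is an illustration only, not the method presented in the tutorial: the function name `volterra_order2`, the kernel names `h1` and `h2`, the finite memory length and the zero initial conditions are all assumptions made for this example.

```python
import numpy as np

def volterra_order2(x, h1, h2):
    """Discrete-time Volterra series truncated at order 2 (illustrative).

    y[n] = sum_i h1[i] x[n-i] + sum_{i,j} h2[i, j] x[n-i] x[n-j]

    h1 is the usual impulse response (linear kernel, length M);
    h2 is the quadratic convolution kernel (M x M matrix).
    """
    x = np.asarray(x, dtype=float)
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    M = len(h1)
    if h2.shape != (M, M):
        raise ValueError("h2 must be an M x M matrix with M = len(h1)")
    # Assume the input is zero before n = 0 (zero initial conditions).
    xp = np.concatenate([np.zeros(M - 1), x])
    y = np.zeros(len(x))
    for n in range(len(x)):
        w = xp[n:n + M][::-1]        # w[i] = x[n - i]
        y[n] = h1 @ w + w @ h2 @ w   # linear + quadratic contributions
    return y
```

With `h2` set to zero, the output reduces to ordinary linear filtering by `h1`. Scaling the input by a factor `a` scales the linear contribution by `a` and the quadratic one by `a**2`, which is the sense in which the kernels sort the homogeneous contributions.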

Bio: Thomas Hélie received the Dipl. Ing. degree from the Ecole Nationale Supérieure des Télécommunications de Bretagne (France) in 1997, two M.S. degrees (Acoustics, Signal Processing and Informatics applied to Music, Université Paris 6, and Automatic Control and Signal Processing, Univ. Paris-Sud) in 1999, and the PhD degree in Automatic Control and Signal Processing (Univ. Paris-Sud) in 2002. After postdoctoral research in the Laboratory of NonLinear Systems at the Swiss Federal Institute of Lausanne in 2003 and a lecturer position (Univ. Paris-Sud) in 2004, he has been, since 2004, a researcher at the French national research council (CNRS) in the Analysis/Synthesis Team at IRCAM-CNRS UMR 9912, Paris, France. His research topics include physics of musical instruments, physical modeling, nonlinear dynamical systems, and inversion processes.

Tutorial 2

► Title: “Music source separation”

► Speaker: Emmanuel Vincent (INRIA-Metiss, France)

Abstract: Source separation consists of extracting the signal produced by each sound source from a recording. It is a mainstream topic in music and audio processing, with applications ranging from speech enhancement and automatic speech recognition to 3D music upmixing and post-production. In this tutorial, I will present the sound cues which can be exploited for source separation and explain how they translate into three main paradigms: computational auditory scene analysis, probabilistic linear modeling and probabilistic variance modeling. I will give example algorithms for each paradigm, including the popular ICA and NMF algorithms, and illustrate the performance via a number of sound examples. Finally, I will show that the last of these paradigms, probabilistic variance modeling, leads to a flexible audio source separation framework able to jointly exploit a wide range of prior information about the sources.
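
As a rough illustration of the NMF algorithm named above, the following sketch factorizes a nonnegative matrix `V` (for instance a magnitude spectrogram) as `W @ H` using the standard multiplicative updates for the generalized Kullback-Leibler divergence. It is not taken from the tutorial: the function names `nmf_kl` and `kl_div`, the random initialization and the iteration count are assumptions chosen for the example.

```python
import numpy as np

def kl_div(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence between nonnegative matrices."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)

def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """NMF by multiplicative updates for the KL divergence (illustrative).

    V    : nonnegative matrix of shape (F, N), e.g. a magnitude spectrogram
    rank : number of components
    Returns W (F, rank), the spectral templates, and H (rank, N),
    their activations over time.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # H <- H * (W^T (V / WH)) / (W^T 1)
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        # W <- W * ((V / WH) H^T) / (1 H^T)
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

In a separation setting, one common choice (again an assumption here) is to group the columns of `W` by source and to apply the ratio of each group's model to the full model `W @ H` as a soft mask on the mixture spectrogram.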

Bio: Emmanuel Vincent is a tenured Research Scientist with the French National Institute for Research in Computer Science and Control (INRIA, Rennes, France). Prior to that, he received the Ph.D. degree in music signal processing from IRCAM (Paris, France) in 2004 and worked as a Research Assistant with the Centre for Digital Music at Queen Mary, University of London (London, U.K.) from 2004 to 2006. His research focuses on probabilistic machine learning for speech and music source separation, remixing, information retrieval and coding. He is the founding chair of the annual Signal Separation Evaluation Campaign (SiSEC) and a member of the steering committee of the International Conference on Latent Variable Analysis and Signal Separation.

Tutorial 3

► Title: “Applications of Information Geometry to Audio Signal Processing”

► Speakers: Arshia Cont, Arnaud Dessein (Ircam)

Abstract: In this tutorial, we present some applications of information geometry to audio signal processing. In general terms, information geometry is a field of mathematics that studies the notions of probability and of information by means of differential geometry. This provides a comprehensive framework that makes it possible to quantify, process and represent the information contained in audio signals. We focus on the computational aspects of information geometry, and discuss generic tools to deal with exponential families, which encompass most of the distributions commonly used in statistical learning. Moreover, exponential families possess a canonical dually flat geometry which generalizes the standard self-dual Euclidean geometry, with two dual Bregman divergences instead of the self-dual Euclidean distance, as well as dual geodesics, a generalized Pythagorean theorem and dual projections. We demonstrate a Matlab toolbox implementing several machine learning algorithms that have been recently generalized to these geometries, such as centroid computation and hard clustering (k-means), parameter estimation and soft clustering (expectation-maximization), and proximity queries in ball trees (nearest-neighbor search, range search). We show some applications to audio processing, in particular to segmentation into quasi-stationary chunks that form consistent informative entities. These entities can then be treated as symbols for applications such as music similarity analysis, musical structure discovery, query by similarity, audio recombination by concatenative synthesis, and computer-assisted improvisation.
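
The Bregman divergences mentioned above can be sketched in a few lines. The example below is a Python illustration written for this page, not the Matlab toolbox from the tutorial, and all function names in it are assumptions. It uses the general definition D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for a strictly convex generator F, and shows two classical generators: half the squared Euclidean norm, and the negative Shannon entropy, which yields the generalized Kullback-Leibler divergence.

```python
import numpy as np

def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - grad_F(q) @ (p - q)

# Generator 1: half squared Euclidean norm -> half squared Euclidean distance.
def sq_norm(x):
    return 0.5 * (x @ x)

def sq_norm_grad(x):
    return x

# Generator 2: negative Shannon entropy -> generalized KL divergence.
# Only defined here for vectors with strictly positive entries.
def neg_entropy(x):
    return np.sum(x * np.log(x))

def neg_entropy_grad(x):
    return np.log(x) + 1.0

def total_divergence(F, grad_F, points, c):
    """Sum of D_F(x_i, c) over the rows x_i of `points`."""
    return sum(bregman(F, grad_F, x, c) for x in points)

def right_centroid(points):
    """Minimizer of sum_i D_F(x_i, c) over c: the arithmetic mean,
    whatever the generator F."""
    return np.mean(points, axis=0)
```

Unlike the Euclidean distance, the divergence obtained from `neg_entropy` is not symmetric in its two arguments, which is why two dual divergences appear in this geometry. The fact that `right_centroid` does not depend on F is what allows a k-means-style hard clustering to be generalized to any Bregman divergence.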

Bio: Arshia Cont received BS degrees in Electrical Engineering and Applied Mathematics from Virginia Tech, a Master's degree in Acoustics, Signal Proc. and Computer Science Applied to Music from University of Paris 6, and a joint PhD from University of California at San Diego and Ircam-Centre Pompidou in Paris. He is currently a researcher at Ircam-Centre Pompidou in Paris, where he also serves as the scientific liaison on several electronic arts projects, with premieres in concert halls worldwide featuring applications of his research on real-time music processing systems.

Bio: Arnaud Dessein is currently pursuing a Ph.D. degree in Computer Music, Signal Processing and Applied Mathematics at Institut de Recherche et Coordination Acoustique/Musique, Université Pierre et Marie Curie, Paris, France. His research interests include information geometry, machine learning, and their applications in audio analysis and processing, with a particular focus on real-time systems for musical interactions. He obtained an engineering degree in Acoustics, Micro/Nanotechnologies and Telecoms from Ecole Centrale de Lille, Lille, France in 2008, and an M.Sc. degree in Acoustics, Signal Processing and Computer Science Applied to Music from Institut de Recherche et Coordination Acoustique/Musique, Université Pierre et Marie Curie and Telecom ParisTech, Paris, France in 2009. He also spent two months as a visiting researcher at the Japanese-French Laboratory for Informatics, University of Tokyo, Tokyo, Japan in 2010.