
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). The WASPAA meeting is a traditional event supported by the Audio and Acoustic Signal Processing Committee of the IEEE Signal Processing Society. The first WASPAA meeting was convened in 1986 and since 1989 it has been held every other year.

The mixing matrix of a Feedback Delay Network (FDN) reverberator is used to control the mixing time and echo density profile. In this work, we investigate the effect of the mixing matrix on the modes (poles) of the FDN with the goal of using this information to better design the various FDN parameters. We find the modal decomposition of delay network reverberators using a state space formulation, showing how modes of the system can be extracted by eigenvalue decomposition of the state transition matrix.
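As a rough illustration of the state space idea described above, the following Python sketch builds a state transition matrix for a small FDN and reads off its modes as eigenvalues. The function names, the delay lengths, and the 2x2 normalized Hadamard mixing matrix are all assumptions chosen for the example; the delay-line state layout is one common realization and is not necessarily the formulation used in the paper.

```python
import numpy as np

def fdn_transition_matrix(feedback, delays):
    """Hypothetical helper: state transition matrix of an FDN.

    Each delay line of length m is stored as a chain of m state variables.
    The last variable of line j is that line's output; the first variable
    of line i is fed by sum_j feedback[i, j] * output_j.
    """
    feedback = np.asarray(feedback, dtype=float)
    delays = [int(m) for m in delays]
    n = len(delays)
    if feedback.shape != (n, n) or min(delays) < 1:
        raise ValueError("need an n x n matrix and n delays of at least 1 sample")
    starts = np.concatenate(([0], np.cumsum(delays)[:-1])).astype(int)
    ends = starts + np.array(delays) - 1
    total = sum(delays)
    T = np.zeros((total, total))
    for i in range(n):
        # Shift samples one step along delay line i.
        for k in range(1, delays[i]):
            T[starts[i] + k, starts[i] + k - 1] = 1.0
        # Mix the outputs of all lines into the input of line i.
        for j in range(n):
            T[starts[i], ends[j]] = feedback[i, j]
    return T

def fdn_poles(feedback, delays):
    """Modes (poles) as eigenvalues of the state transition matrix."""
    return np.linalg.eigvals(fdn_transition_matrix(feedback, delays))

# Assumed example values: orthogonal (lossless) mixing matrix, short delays.
mixing = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
delays = [3, 5]

poles_lossless = fdn_poles(mixing, delays)
poles_lossy = fdn_poles(0.9 * mixing, delays)  # uniform attenuation, assumed

print(len(poles_lossless))            # one pole per sample of total delay
print(np.abs(poles_lossless).round(6))
print(np.abs(poles_lossy).max())
```

In this sketch the number of modes equals the total delay length in samples. An orthogonal mixing matrix places every pole on the unit circle, and scaling it by a gain below one moves all poles inside the circle.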


Objective metrics, such as the perceptual evaluation of speech quality (PESQ), have become standard measures for evaluating speech. These metrics enable efficient and costless evaluations, where ratings are often computed by comparing a degraded speech signal to its underlying clean reference signal. Reference-based metrics, however, cannot be used to evaluate real-world signals that have inaccessible references. This project develops a nonintrusive framework for evaluating the perceptual quality of noisy and enhanced speech.


Speech enhancement is important for applications such as telecommunications, hearing aids, automatic speech recognition, and voice-controlled systems. Enhancement algorithms aim to reduce interfering noise while minimizing any speech distortion. In this work, we propose to use polynomial matrices for speech enhancement in order to exploit the spatial, spectral, and temporal correlations between the speech signals received by the microphone array.


We propose an analytical method of 2.5-dimensional exterior sound field reproduction by using a multipole loudspeaker array. The method reproduces the sound field modeled by expansion coefficients of spherical harmonics based on multipole superposition. We also present an analytical method for converting the expansion coefficients of spherical harmonics to weighting coefficients for multipole superposition.


This paper provides a 3D localized sound zone generation method using a planar omni-directional loudspeaker array. In the proposed method, multiple co-centered circular arrays are arranged on the horizontal plane and an additional loudspeaker is located at the array’s center. The sound field produced by this center loudspeaker is then cancelled using the multiple circular arrays. A localized 3D sound zone can thus be generated inside a sphere whose maximum radius equals that of the circular arrays, because the residual sound field is contained within the sphere.


Speaker segmentation is an essential part of any diarization system. Applications of diarization include tasks such as speaker indexing, improving automatic speech recognition (ASR) performance and making single speaker-based algorithms available for use in multi-speaker environments. This paper proposes a multiple hypothesis tracking (MHT) method that exploits the harmonic structure associated with the pitch in voiced speech in order to segment the onsets and end-points of speech from multiple, overlapping speakers.


Audio processing methods operating on a time-frequency representation of the signal can introduce unpleasant sounding artifacts known as musical noise. These artifacts are observed in the context of audio coding, speech enhancement, and source separation. The change in kurtosis of the power spectrum introduced during the processing was shown to correlate with the human perception of musical noise in the context of speech enhancement, leading to the proposal of measures based on it. These baseline measures are here shown to correlate with human perception only in a limited manner.
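To make the notion of a kurtosis change in the power spectrum concrete, the sketch below computes a log kurtosis ratio between the power spectrogram of a signal before and after processing. This is one common way to express such a change; whether it matches the baseline measures evaluated in the paper is an assumption. The function names, the framing parameters, and the toy spectral subtraction step are hypothetical and serve only as an illustration.

```python
import numpy as np

def power_spectrogram(x, frame_len=256, hop=128):
    """Hann-windowed short-time power spectrum (frames x frequency bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def kurtosis(values):
    """Non-excess (Pearson) kurtosis over all time-frequency bins."""
    v = np.ravel(values)
    centered = v - v.mean()
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2

def log_kurtosis_ratio(power_before, power_after):
    """Log of the kurtosis after processing over the kurtosis before."""
    return np.log(kurtosis(power_after) / kurtosis(power_before))

# Assumed test signal: white Gaussian noise with a fixed seed.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
power = power_spectrogram(noise)

# Hypothetical processing: over-subtract the per-bin mean power and floor
# the result at zero, which leaves isolated peaks in the spectrogram.
processed = np.maximum(power - 2.0 * power.mean(axis=0, keepdims=True), 0.0)

print(log_kurtosis_ratio(power, processed))
```

A positive value indicates that the processing made the distribution of power values heavier-tailed, which is the kind of change associated with isolated spectral peaks. As the abstract notes, measures of this kind track human perception of musical noise only to a limited extent.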
