Recent work on zero-resource word discovery makes intensive use of audio fragment clustering to find repeating speech patterns. In the absence of acoustic models, the clustering step traditionally relies on dynamic time warping (DTW) to compare two samples, and it therefore inherits the known limitations of this technique. We propose a new sample comparison method, called 'similarity by iterative classification', that exploits the modeling capacity of hidden Markov models (HMMs) without any supervision.
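For reference, the DTW baseline that the abstract contrasts against can be sketched in a few lines. This is a minimal textbook version, assuming each sample is a sequence of feature frames of shape (frames, dims), a Euclidean frame distance, and the standard three-way step pattern. It is not the proposed HMM-based comparison method, and the function name `dtw_distance` is hypothetical.

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Classic DTW cost between two feature sequences of shape (frames, dims).

    Assumptions (illustrative only): Euclidean frame distance, steps
    (i-1, j), (i, j-1), (i-1, j-1), no path normalization.
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between every frame of x and every frame of y.
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # Accumulated cost matrix with an infinite border so paths start at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in x only
                acc[i, j - 1],      # advance in y only
                acc[i - 1, j - 1],  # advance in both
            )
    return float(acc[n, m])


a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [0.0], [1.0], [2.0]])
print(dtw_distance(a, b))  # 0.0: the repeated first frame of b is absorbed by warping
```

The quadratic cost in sequence length and the reliance on a fixed frame distance are the kinds of limitations usually cited for DTW-based fragment comparison.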

When different parts of speech content, such as voice-overs and narration, are recorded in real-world environments with different acoustic properties and background noise, the difference in sound quality between the recordings is typically clearly audible and therefore undesirable. We propose an algorithm to equalize multiple such speech recordings so that they sound as if they were recorded in the same environment. Because the timbral content of the speech and that of the background noise typically differ considerably, simple equalization matching results in a noticeable mismatch in the output signals.
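To make the "simple equalization matching" baseline concrete, here is a minimal sketch, assuming it means matching the long-term average spectrum of a target recording to that of a reference. The function name `match_eq` and the Welch segment length are hypothetical choices, and this is not the proposed algorithm. Because the gain is derived from the whole signal, speech and background noise receive the same correction, which is exactly why their differing timbres leave a mismatch.

```python
import numpy as np
from scipy.signal import welch


def match_eq(target: np.ndarray, reference: np.ndarray, fs: float,
             nperseg: int = 1024, eps: float = 1e-12) -> np.ndarray:
    """Naive long-term spectrum matching (illustrative baseline only).

    Applies a zero-phase gain so that the average power spectrum of
    `target` matches that of `reference`.
    """
    # Long-term average power spectra (Welch estimates) of both recordings.
    f, p_ref = welch(reference, fs=fs, nperseg=nperseg)
    _, p_tgt = welch(target, fs=fs, nperseg=nperseg)
    # Magnitude gain per Welch frequency bin; eps avoids division by zero.
    gain = np.sqrt((p_ref + eps) / (p_tgt + eps))
    # Interpolate the gain onto the full-length FFT grid and apply it.
    spec = np.fft.rfft(target)
    freqs = np.fft.rfftfreq(len(target), d=1.0 / fs)
    spec *= np.interp(freqs, f, gain)
    return np.fft.irfft(spec, n=len(target))


rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
out = match_eq(0.5 * ref, ref, fs=16000)  # a quieter copy is brought back to reference level
```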

Direction-of-arrival (DoA) estimation of high-resolution beams is critical for cell search and for maintaining communication in millimeter-wave (mmWave) cellular systems. All-digital solutions for DoA estimation, although desirable for their flexibility and performance, are impractical because they require high-speed, power-hungry analog-to-digital converters (ADCs) at each antenna element. In this paper, we take a novel approach to formulating a fully digital DoA estimation solution.

Sound zones are two or more regions within a listening space in which listeners are provided with personal audio. Acoustic contrast control (ACC) is a sound zoning method that maximizes the average squared sound pressure in one zone, subject to a constraint of constant pressure in the other zones. State-of-the-art time-domain broadband acoustic contrast control (BACC) methods are designed for anechoic environments and cannot realize a flat frequency response in a limited frequency range within a reverberant environment.
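The ACC criterion can be illustrated at a single frequency. Maximizing the bright-zone energy while holding the dark-zone energy fixed leads, via a Lagrangian, to the generalized eigenvalue problem $R_B q = \lambda R_D q$, whose largest eigenvector gives the loudspeaker weights. The sketch below assumes hypothetical complex transfer matrices from loudspeakers to control microphones in each zone and adds a small diagonal loading to $R_D$; it is the classic narrowband formulation, not the time-domain BACC method discussed above.

```python
import numpy as np
from scipy.linalg import eigh


def acc_weights(G_bright: np.ndarray, G_dark: np.ndarray,
                reg: float = 1e-6) -> np.ndarray:
    """Narrowband ACC loudspeaker weights (illustrative sketch).

    G_bright, G_dark: hypothetical (num_mics, num_loudspeakers) complex
    transfer matrices at one frequency for the bright and dark zones.
    """
    # Spatial correlation matrices, averaged over control microphones.
    R_b = G_bright.conj().T @ G_bright / G_bright.shape[0]
    R_d = G_dark.conj().T @ G_dark / G_dark.shape[0]
    # Diagonal loading keeps R_d positive definite (assumed regularization).
    n = R_d.shape[0]
    R_d = R_d + reg * np.trace(R_d).real / n * np.eye(n)
    # Generalized Hermitian eigenproblem; eigenvalues are returned ascending.
    _, vecs = eigh(R_b, R_d)
    q = vecs[:, -1]  # eigenvector with the largest bright/dark energy ratio
    return q / np.linalg.norm(q)


def contrast_db(G_bright: np.ndarray, G_dark: np.ndarray, q: np.ndarray) -> float:
    """Acoustic contrast: mean bright-zone energy over mean dark-zone energy, in dB."""
    e_b = np.mean(np.abs(G_bright @ q) ** 2)
    e_d = np.mean(np.abs(G_dark @ q) ** 2)
    return float(10 * np.log10(e_b / e_d))
```

In a reverberant room the transfer matrices would be measured, and repeating this per frequency bin says nothing about the resulting frequency response in the bright zone, which is the gap the abstract points to.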

In complex environments, the presence or absence of multipath signals depends not only on the relative motion between the GNSS receiver and the navigation satellites but also on the environment in which the receiver is located. It is therefore difficult to use a specific propagation model to accurately capture the dynamics of multipath signal parameters when the GNSS receiver is moving through urban canyons or other severely obstructed areas. This paper introduces a statistical model for the line-of-sight and multipath signals received by a GNSS receiver.

Automated objective methods of audio evaluation are fast, cheap, and require little effort from the investigator. However, objective evaluation methods do not exist for the output of every audio processing algorithm, their output often correlates poorly with human quality assessments, and they require ground-truth data in their calculation. Subjective human ratings of audio quality are the gold standard for many tasks, but they are expensive and slow, and recruiting subjects and running listening tests require a great deal of effort.

This paper presents a feature learning approach for speaker identification based on nonnegative matrix factorisation. Recent studies have shown that, with such models, the dictionary atoms can represent the speaker identity well. The approaches proposed so far have focused only on speaker variability and not on session variability. However, this latter point is crucial to the success of the i-vector approach, which is now the state of the art in speaker identification.
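As background on the factorisation itself, the sketch below uses the standard Lee and Seung multiplicative updates for the Euclidean cost, assuming `V` is a nonnegative feature matrix such as a magnitude spectrogram (frequency bins by frames). The columns of `W` are the dictionary atoms referred to above, and `H` holds their activations. The function name `nmf` is hypothetical, and the session-variability modelling proposed in the paper is not shown.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, n_iter: int = 200,
        eps: float = 1e-10, seed: int = 0):
    """Euclidean NMF via multiplicative updates: V ~= W @ H with W, H >= 0.

    V must be nonnegative (e.g. a magnitude spectrogram, bins x frames).
    Returns the dictionary W (bins x rank) and activations H (rank x frames).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    # Random positive initialization; eps keeps entries strictly positive.
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve nonnegativity and do not
        # increase the squared reconstruction error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(2)
V = rng.random((20, 3)) @ rng.random((3, 50))  # synthetic nonnegative data of rank 3
W, H = nmf(V, rank=3, n_iter=300)
```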
