ICASSP is the world's largest and most comprehensive technical conference on signal processing and its applications. It provides a networking opportunity for like-minded professionals from around the world. The ICASSP 2017 conference will feature presentations by internationally renowned speakers and cutting-edge session topics.

As part of ongoing research into extracting mission-critical information from Search and Rescue speech communications, a corpus of unscripted, goal-oriented, two-party spoken conversations has been designed and collected. The Sheffield Search and Rescue (SSAR) corpus comprises about 12 hours of data from 96 conversations by 24 native speakers of British English with a southern accent. Each conversation centers on a collaborative task of exploring and estimating a simulated indoor environment.

In frequency warping (FW)-based Voice Conversion (VC), the source spectrum is modified to match the frequency axis of the target spectrum, followed by Amplitude Scaling (AS) to compensate for the amplitude differences between the warped spectrum and the actual target spectrum. In this paper, we propose a novel AS technique that linearly transfers the amplitude of the frequency-warped spectrum using knowledge of a Gaussian Mixture Model (GMM)-based converted spectrum, without adding any spurious peaks.

The success of Empirical Mode Decomposition (EMD) resides in its practical approach to dissecting non-stationary data. EMD repeatedly passes over the entire data span to iteratively extract Intrinsic Mode Functions (IMFs). This approach, however, is not suitable for data streams, as the entire data set has to be reconsidered every time a new point is added. To overcome this, we propose Online EMD, an algorithm that extracts IMFs on the fly.
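
To make the batch limitation concrete, the following is a minimal sketch of classical (offline) EMD sifting, not the Online EMD algorithm of the abstract. It rests on several assumptions that are not in the source: the function names (`local_extrema`, `sift_once`, `emd`) are hypothetical, the envelopes use linear interpolation via `np.interp` instead of the cubic splines usually used, and sifting stops after a fixed number of passes rather than a convergence criterion.

```python
import numpy as np

def local_extrema(x):
    """Indices of strict interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes.

    Returns None when x has too few extrema to build envelopes.
    """
    t = np.arange(len(x))
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Assumption: linear envelopes (np.interp holds the end values constant
    # outside the first/last extremum); cubic splines are the usual choice.
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return x - 0.5 * (upper + lower)

def emd(x, max_imfs=5, n_sift=10):
    """Batch EMD: every IMF is sifted from the ENTIRE remaining signal."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if sift_once(residual) is None:
            break  # residual has too few extrema to hold another IMF
        h = residual
        for _ in range(n_sift):  # assumption: fixed number of sifting passes
            nxt = sift_once(h)
            if nxt is None:
                break
            h = nxt
        imfs.append(h)
        residual = residual - h
    return imfs, residual
```

Because each envelope spans the whole array, appending a single sample would in principle require rerunning `emd` from scratch, which is the cost a streaming variant aims to avoid. By construction, the extracted IMFs and the residual sum back to the input signal.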

A novel speaker segmentation approach based on a deep neural network is proposed and investigated. This approach uses deep speaker vectors (d-vectors) to represent speaker characteristics and to find speaker change points. The d-vector is a frame-level speaker recognition feature whose discriminative training process corresponds to the goal of discriminating a speaker change point from a single-speaker speech segment in a short time window.
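
The abstract does not spell out the decision rule for locating change points, so the following is only an illustrative sketch of one generic option: compare the averaged frame-level vectors in two adjacent sliding windows and treat a large cosine distance as evidence of a speaker change. The function name `change_scores`, the window length, and the use of cosine distance are all assumptions, and the d-vectors are assumed to be already extracted as rows of an array.

```python
import numpy as np

def change_scores(dvecs, win):
    """Cosine distance between the mean vectors of the windows before and
    after each frame t. Frames too close to either edge keep a score of 0.
    """
    dvecs = np.asarray(dvecs, dtype=float)
    n = len(dvecs)
    scores = np.zeros(n)
    for t in range(win, n - win + 1):
        left = dvecs[t - win:t].mean(axis=0)    # frames just before t
        right = dvecs[t:t + win].mean(axis=0)   # frames from t onward
        cos = left @ right / (np.linalg.norm(left) * np.linalg.norm(right))
        scores[t] = 1.0 - cos
    return scores
```

Peaks in the returned score sequence would then be candidate change points; choosing a threshold or peak-picking rule is left open here.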

The growing demand for wireless connectivity has turned bandwidth into a scarce resource that has to be carefully managed and fairly distributed to users. However, the variability of the wireless channel can severely degrade the service received by each user. The Double Relay Communication Protocol (DRCP) is a transmission scheme that addresses these problems by exploiting spatial diversity to enhance the fairness of the system without requiring any additional infrastructure (i.e., relay nodes or a backhaul connection).

The recently reported Wirtinger flow (WF) algorithm has been demonstrated as a promising method for solving the phase retrieval problem by applying a gradient descent scheme. In practice, the stepsize is chosen by an empirical rule, and this heuristic selection is not optimal. To accelerate convergence, we propose an improved WF with optimal stepsize. It is revealed that this optimal stepsize is the solution of a univariate cubic equation with real-valued coefficients.
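
To see why a cubic with real coefficients arises, here is a hedged sketch that assumes the common intensity least-squares objective f(z) = (1/2m) Σ (|a_i^H z|² − y_i)²; the paper's exact formulation may differ. Along a search direction d, each residual |a_i^H (z + μd)|² − y_i is a real quadratic in μ, so f is a quartic in μ and its derivative is a cubic. The function names (`wf_loss`, `wf_gradient`, `optimal_step`, `wf_step`) are hypothetical, and the rows of `A` are assumed to hold the vectors a_i^H.

```python
import numpy as np

def wf_loss(z, A, y):
    """f(z) = (1/2m) * sum((|A z|^2 - y)^2), with rows of A equal to a_i^H."""
    return np.sum((np.abs(A @ z) ** 2 - y) ** 2) / (2 * len(y))

def wf_gradient(z, A, y):
    """Wirtinger gradient: (1/m) * sum((|a_i^H z|^2 - y_i) * a_i * a_i^H z)."""
    u = A @ z
    return A.conj().T @ ((np.abs(u) ** 2 - y) * u) / len(y)

def optimal_step(z, d, A, y):
    """Exact line search along d by solving the cubic d f(z + mu*d)/d mu = 0."""
    u, v = A @ z, A @ d
    alpha = np.abs(v) ** 2                  # coefficient of mu^2 in residual i
    beta = 2 * np.real(np.conj(u) * v)      # coefficient of mu
    gamma = np.abs(u) ** 2 - y              # constant term
    coeffs = [2 * np.sum(alpha ** 2),
              3 * np.sum(alpha * beta),
              np.sum(beta ** 2 + 2 * alpha * gamma),
              np.sum(beta * gamma)]
    quartic = lambda mu: np.sum((alpha * mu ** 2 + beta * mu + gamma) ** 2)
    # The global minimizer over real mu is a real root of the cubic. Taking
    # the real part of every root only adds extra harmless candidates.
    candidates = np.roots(coeffs).real
    return min(candidates, key=quartic)

def wf_step(z, A, y):
    """One gradient step with the exact (line-search) stepsize."""
    d = -wf_gradient(z, A, y)
    return z + optimal_step(z, d, A, y) * d
```

Since the step is an exact minimizer along a descent direction, the loss cannot increase from one `wf_step` to the next. The sketch does not handle the degenerate case of a zero gradient, where all cubic coefficients vanish.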
