
Sound event localization and detection (SELD) is the task of classifying sound events and estimating their direction of arrival (DoA) from multichannel acoustic signals. Prior studies employ spectral and channel information as the embedding for temporal attention. However, this usage limits the ability of the deep neural network to extract meaningful features from the spectral or spatial domains.


Existing frame-wise neural beamformers for speech extraction perform well in relatively high signal-to-noise ratio (SNR) scenarios with small microphone arrays, but they still suffer from performance degradation in relatively low SNR environments, e.g., SNR < -5 dB. To address this problem, this paper proposes an all-neural beamformer based on Kronecker product decomposition, denoted by NeuKP-BF, for large-scale microphone arrays.
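As a rough illustration of why a Kronecker product decomposition can help with large arrays, the sketch below factors a long weight vector as `w = kron(w1, w2)` and applies it to one array snapshot without ever forming `w`. This is not the NeuKP-BF network itself: the array size, the random weights, and the function name `kron_beamform` are assumptions made for the example, and the abstract does not specify how the decomposition is used inside the model.

```python
import numpy as np

def kron_beamform(w1, w2, x):
    """Compute kron(w1, w2)^H x for one snapshot x without building the long filter."""
    m1, m2 = w1.shape[0], w2.shape[0]
    # Row-major reshape, so that x[i * m2 + j] == X[i, j]
    X = x.reshape(m1, m2)
    # kron(w1, w2)[i * m2 + j] == w1[i] * w2[j], hence the bilinear form below
    return w1.conj() @ X @ w2.conj()

# Hypothetical sizes: a 32-channel array factored as 4 x 8
rng = np.random.default_rng(0)
m1, m2 = 4, 8
w1 = rng.standard_normal(m1) + 1j * rng.standard_normal(m1)
w2 = rng.standard_normal(m2) + 1j * rng.standard_normal(m2)
x = rng.standard_normal(m1 * m2) + 1j * rng.standard_normal(m1 * m2)

full = np.vdot(np.kron(w1, w2), x)   # reference: explicit 32-tap filter
fast = kron_beamform(w1, w2, x)      # factored form: 4 + 8 free weights
```

In this toy setting the factored filter has `m1 + m2` free weights instead of `m1 * m2`, which is the usual motivation for this kind of decomposition when the number of microphones grows.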


In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques, with the aim of preserving and accurately reconstructing crucial spatial cues embedded in multi-channel recordings. We propose a neural spatial audio coding framework that achieves a high compression ratio, leveraging a single-channel neural sub-band codec and SpatialCodec.


The coded aperture snapshot spectral imager (CASSI) system senses spatial and spectral information using a binary coded aperture and a dispersive element, so the quality of the reconstructed hyperspectral images is mainly determined by the structure of the coded aperture. Traditional coded apertures (Random, Bernoulli, etc.), which encode hyperspectral images in the focal array plane, suffer from suboptimal reconstruction accuracy. Therefore, optimizing the coded aperture design improves the reconstruction quality for the scene.
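To make the role of the coded aperture concrete, here is a minimal sketch of a CASSI-style forward model: each spectral band is masked by the same binary aperture, shifted along one spatial axis by the dispersive element, and summed on the detector. The one-pixel shift per band, the noiseless detector, the cube size, and the function name `cassi_forward` are simplifying assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def cassi_forward(cube, mask):
    """Simulate a single snapshot from a hyperspectral cube of shape (H, W, L).

    Assumes a single disperser that shifts band l by l pixels along the width axis.
    """
    h, w, bands = cube.shape
    y = np.zeros((h, w + bands - 1))
    for l in range(bands):
        # Mask the band with the coded aperture, then place it at its dispersed position
        y[:, l:l + w] += mask * cube[:, :, l]
    return y

# Hypothetical toy scene and a Bernoulli(0.5) coded aperture
rng = np.random.default_rng(0)
cube = rng.random((8, 8, 4))
mask = (rng.random((8, 8)) < 0.5).astype(float)
y = cassi_forward(cube, mask)
```

Because the detector sees only this masked and sheared sum, the pattern in `mask` decides which spatial and spectral samples can be separated again during reconstruction, which is why the aperture design matters.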


We present an algorithm for the resolution of delayed and overlapping pulses of a common unknown shape from multi-channel measurements. We show that just a few Fourier samples acquired by a Time Encoding Machine (TEM) suffice to solve this challenging problem. This acquisition scheme is desirable for ultra-low-power applications in wearables, such as EMG skin sensor tattoos.

