Audio Processing Systems

Acquisition of Asynchronous Data and Parameter Estimation based on Double-Cross-Correlation Processor with Phase Transform (Demo at WASPAA 2021)

Coherent processing of signals captured by a wireless acoustic sensor network (WASN) requires estimating parameters such as the sampling-rate offset (SRO) and sampling-time offset (STO). The asynchronous signals acquired by such a WASN exhibit an accumulating time drift (ATD) that grows linearly with time and depends on the SRO and STO values. In our demonstration, we present a real WASN based on Raspberry Pi computers, in which SRO and ATD values are estimated using the recently proposed double-cross-correlation processor with phase transform (DXCP-PhaT).
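
The linear growth of the ATD can be illustrated with a minimal sketch. It assumes a simple model in which the drift, measured in samples, equals the SRO (in parts per million) times the number of elapsed samples, plus the STO. The function names, the ppm unit, and the two-point slope estimate are illustrative assumptions; this is not the DXCP-PhaT estimator itself.

```python
def accumulated_time_drift(n_samples, sro_ppm, sto_samples=0.0):
    """Drift in samples after n_samples, under an ASSUMED linear model.

    sro_ppm     -- sampling-rate offset in parts per million (assumed unit)
    sto_samples -- sampling-time offset in samples (assumed unit)
    """
    return sro_ppm * 1e-6 * n_samples + sto_samples


def sro_from_two_drifts(drift_a, drift_b, n_a, n_b):
    """Recover the SRO in ppm as the slope between two drift readings."""
    if n_a == n_b:
        raise ValueError("the two readings must be taken at different times")
    return (drift_b - drift_a) / (n_b - n_a) * 1e6


if __name__ == "__main__":
    fs = 16000  # hypothetical sampling rate in Hz
    for seconds in (10, 60, 600):
        drift = accumulated_time_drift(fs * seconds, sro_ppm=50.0, sto_samples=2.0)
        print(f"{seconds:4d} s -> drift of about {drift:.1f} samples")
```

Under this model an SRO of 50 ppm at 16 kHz adds roughly 0.8 samples of drift per second, which is why the offset has to be estimated and compensated before the signals can be processed coherently.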

2021_WASPAA_DXCPPhaT_Demo_slides.pdf (393)

Categories:: Sensor Array Processing
Audio Processing Systems

169 Views

VADOI: VOICE-ACTIVITY-DETECTION OVERLAPPING INFERENCE FOR END-TO-END LONG-FORM SPEECH RECOGNITION

While end-to-end models have shown great success on the automatic speech recognition task, performance degrades severely when the target sentences are long-form. The previously proposed methods, overlapping inference and partial overlapping inference, have been shown to be effective for long-form decoding. For both methods, word error rate (WER) decreases monotonically as the overlapping percentage increases. Setting aside computational cost, the setup with 50% overlapping during inference achieves the best performance. However, a lower overlapping percentage has the advantage of faster inference.
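
The trade-off between overlap and inference speed can be seen in a small sketch of the segmentation step alone. It assumes fixed-length segments and a hop of `segment_len * (1 - overlap)`; the function name and parameters are hypothetical, and the merging of hypotheses from overlapping segments, which the methods above also require, is not shown.

```python
def segment_bounds(total_len, segment_len, overlap):
    """Return (start, end) index pairs covering a long input.

    overlap is the fraction of each segment shared with the next one,
    e.g. 0.5 for 50% overlap. Illustrative only: real systems also
    have to merge the hypotheses decoded from overlapping segments.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if total_len <= 0 or segment_len <= 0:
        raise ValueError("lengths must be positive")
    hop = max(1, round(segment_len * (1.0 - overlap)))
    bounds = []
    start = 0
    while True:
        end = min(start + segment_len, total_len)
        bounds.append((start, end))
        if end == total_len:
            break
        start += hop
    return bounds


if __name__ == "__main__":
    for overlap in (0.0, 0.25, 0.5):
        n = len(segment_bounds(total_len=1000, segment_len=100, overlap=overlap))
        print(f"overlap {overlap:.2f}: {n} segments to decode")
```

A larger overlap means a smaller hop and therefore more segments to decode for the same input, which is where the extra computational cost comes from.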

VADOI Poster.pdf

Poster (291)

Categories:: Audio Processing Systems

29 Views

TEXT ADAPTIVE DETECTION FOR CUSTOMIZABLE KEYWORD SPOTTING

poster.pdf (299)

Categories:: Audio Processing Systems

53 Views

ATTENTIVE MAX FEATURE MAP AND JOINT TRAINING FOR ACOUSTIC SCENE CLASSIFICATION

Various attention mechanisms are being widely applied to acoustic scene classification. However, we empirically found that the attention mechanism can excessively discard potentially valuable information, despite improving performance. We propose the attentive max feature map that combines two effective techniques, attention and a max feature map, to further elaborate the attention mechanism and mitigate the above-mentioned phenomenon. We also explore various joint training methods, including multi-task learning, that allocate additional abstract labels for each audio recording.
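
For readers unfamiliar with the max feature map, the following sketch shows the plain operation as it is commonly defined: the channels are split into two halves and the element-wise maximum is taken, which halves the channel count. This is not the attentive variant proposed in the paper, whose exact formulation is not given here; the function name and the nested-list layout (channels by frames) are assumptions made for illustration.

```python
def max_feature_map(channels):
    """Plain max feature map over a list of channels.

    channels -- list of equal-length lists, one per channel.
    Returns len(channels) // 2 channels, each the element-wise maximum
    of channel i and channel i + half.
    """
    if len(channels) % 2 != 0:
        raise ValueError("the number of channels must be even")
    half = len(channels) // 2
    first, second = channels[:half], channels[half:]
    return [
        [max(a, b) for a, b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(first, second)
    ]


if __name__ == "__main__":
    features = [[1, 5], [3, 2], [4, 0], [-1, 7]]  # 4 channels, 2 frames
    print(max_feature_map(features))
```

Because the maximum always passes one of the two competing values through unchanged, the operation selects features rather than scaling them down, which is one way to read its pairing with attention in the abstract above.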

ICASSP2022_AMFM_poster_final.pdf (271)

Categories:: Audio Processing Systems

18 Views

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

UniSpeech-SAT_presentation.pdf

UniSpeech-SAT_slide (274)

Categories:: Audio Processing Systems

173 Views

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

UniSpeech-SAT_poster.pdf

UniSpeech-SAT_poster (298)

Categories:: Audio Processing Systems

125 Views

SPEECH RECOVERY FOR REAL-WORLD SELF-POWERED INTERMITTENT DEVICES

icassp-2022-poster.pdf

poster (236)

Categories:: Audio Processing Systems

16 Views

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions. Conventional maximum a posteriori estimation performs point estimation of a huge number of model parameters and thus risks the curse of dimensionality; instead, we focus on estimating a manageable number of DNN latent variables via a VB inference framework.