- Read more about Acquisition of Asynchronous Data and Parameter Estimation based on Double-Cross-Correlation Processor with Phase Transform (Demo at WASPAA 2021)
Coherent processing of signals captured by a wireless acoustic sensor network (WASN) requires estimating parameters such as the sampling-rate offset (SRO) and the sampling-time offset (STO). The asynchronous signals acquired by such a WASN exhibit an accumulating time drift (ATD) that grows linearly with time and depends on the SRO and STO values. In our demonstration, we present a real WASN based on Raspberry Pi computers, where SRO and ATD values are estimated using the recently proposed double-cross-correlation processor with phase transform (DXCP-PhaT).
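The linear growth of the ATD can be illustrated with a small sketch. This is not the DXCP-PhaT estimator from the demo. It only assumes a simple drift model, drift(n) = SRO * n + STO, with the SRO given in parts per million and the STO in samples, and recovers both values from noise-free drift readings with an ordinary least-squares line fit. The function names, the 16 kHz spacing, and the example values are hypothetical.

```python
def accumulated_time_drift(n, sro_ppm, sto_samples):
    """Drift in samples after n samples, under the assumed linear model."""
    return sro_ppm * 1e-6 * n + sto_samples


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical example: one drift reading per second at 16 kHz,
# with an SRO of 50 ppm and an STO of 3 samples.
sample_indices = [k * 16000 for k in range(10)]
drifts = [accumulated_time_drift(n, sro_ppm=50.0, sto_samples=3.0)
          for n in sample_indices]

slope, intercept = fit_line(sample_indices, drifts)
estimated_sro_ppm = slope * 1e6   # slope of the drift line is the SRO
estimated_sto = intercept         # drift at n = 0 is the STO
```

In a real network the drift readings would be noisy estimates taken from the recorded signals, which is the part a dedicated estimator has to solve.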
- Read more about VADOI: VOICE-ACTIVITY-DETECTION OVERLAPPING INFERENCE FOR END-TO-END LONG-FORM SPEECH RECOGNITION
While end-to-end models have shown great success on the automatic speech recognition task, performance degrades severely when target sentences are long-form. The previously proposed methods, overlapping inference and partial overlapping inference, have been shown to be effective for long-form decoding. For both methods, word error rate (WER) decreases monotonically as the overlapping percentage increases. Setting aside computational cost, the setup with 50% overlapping during inference achieves the best performance. However, a lower overlapping percentage has the advantage of faster inference.
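The cost side of this trade-off can be sketched by counting the segments a decoder would have to process. The snippet below is a minimal illustration, not the authors' implementation: it only splits a long recording into fixed-length windows with a chosen overlap, and the function name and parameters are hypothetical. Merging the per-segment hypotheses is not shown.

```python
def overlapping_segments(num_samples, segment_len, overlap_ratio):
    """Return (start, end) sample ranges covering a long recording.

    Consecutive segments share roughly overlap_ratio * segment_len samples.
    The last segment is clipped to the end of the recording.
    """
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0, 1)")
    hop = max(1, round(segment_len * (1.0 - overlap_ratio)))
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, num_samples)
        segments.append((start, end))
        if end == num_samples:
            break
        start += hop
    return segments


# More overlap means more segments to decode for the same audio.
no_overlap = overlapping_segments(100, 20, 0.0)
half_overlap = overlapping_segments(100, 20, 0.5)
```

With these toy numbers, 50% overlap nearly doubles the number of segments compared with no overlap, which is why a lower overlapping percentage runs faster.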
VADOI Poster.pdf
- Read more about TEXT ADAPTIVE DETECTION FOR CUSTOMIZABLE KEYWORD SPOTTING
poster.pdf
- Read more about ATTENTIVE MAX FEATURE MAP AND JOINT TRAINING FOR ACOUSTIC SCENE CLASSIFICATION
Various attention mechanisms are being widely applied to acoustic scene classification. However, we empirically found that the attention mechanism can excessively discard potentially valuable information, despite improving performance. We propose the attentive max feature map that combines two effective techniques, attention and a max feature map, to further elaborate the attention mechanism and mitigate the above-mentioned phenomenon. We also explore various joint training methods, including multi-task learning, that allocate additional abstract labels for each audio recording.
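For readers unfamiliar with the max feature map, the plain operation can be sketched as follows. This shows only the standard max feature map (split the channels into two halves and keep the element-wise maximum), not the attentive variant proposed in the paper, whose details are not given in this abstract. The function name and the toy feature values are hypothetical.

```python
def max_feature_map(channels):
    """Halve the channel count by an element-wise max over channel pairs.

    channels: list with an even number of equally sized feature vectors.
    Channel i is paired with channel i + len(channels) // 2.
    """
    if len(channels) % 2 != 0:
        raise ValueError("max feature map needs an even number of channels")
    half = len(channels) // 2
    return [
        [max(a, b) for a, b in zip(first, second)]
        for first, second in zip(channels[:half], channels[half:])
    ]


# Toy input: 4 channels with 3 features each.
features = [
    [0.1, 0.9, -0.3],
    [0.5, 0.2, 0.4],
    [0.7, -0.1, 0.0],
    [0.3, 0.8, 0.6],
]
reduced = max_feature_map(features)
```

Each output value keeps the stronger of two competing activations, so the operation selects features rather than scaling them.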
- Read more about UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
- Read more about SPEECH RECOVERY FOR REAL-WORLD SELF-POWERED INTERMITTENT DEVICES
icassp-2022-poster.pdf
- Read more about A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer
We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions. Conventional maximum a posteriori estimation performs point estimation and risks the curse of dimensionality when estimating a huge number of model parameters. Instead, we focus on estimating a manageable number of latent variables of DNNs via a VB inference framework.
- Read more about EXPLAINING DEEP LEARNING MODELS FOR SPOOFING AND DEEPFAKE DETECTION WITH SHAPLEY ADDITIVE EXPLANATIONS
ICASSP - poster.pdf
- Read more about A Method for Estimating the Grouping of Participants in Classroom Group Work Using Only Audio Information