
Although dated, this student thesis is republished because the proposed negative-feedback topology and the current-mode arrangement of silicon bipolar junction transistors are rarely elaborated in the many excellent contemporary books on audio power amplifier design.

Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
Audio Spectrogram Transformer models dominate the field of Audio Tagging, outperforming the previously dominant Convolutional Neural Networks (CNNs). Their superiority rests on the ability to scale up and exploit large-scale datasets such as AudioSet. However, Transformers are demanding in terms of model size and computational requirements compared to CNNs. We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex Transformers.
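As a rough sketch of an offline KD objective of the kind described above (the temperature, weighting, and exact loss used in the paper are not reproduced here; all names are illustrative), the student is trained against a mix of hard labels and pre-computed teacher logits:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax, numerically stabilized.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence from teacher to student at
    # temperature T, scaled by T^2 (standard distillation practice).
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - log_p_s), axis=-1)) * T * T
    # Hard-label term: ordinary cross-entropy at temperature 1.
    log_p = np.log(softmax(student_logits) + 1e-12)
    ce = -np.mean(log_p[np.arange(len(labels)), labels])
    return alpha * ce + (1.0 - alpha) * kl
```

Because the teacher logits can be computed once and stored ("offline" KD), the expensive Transformer never has to run during student training.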

MASKED MODELING DUO: LEARNING REPRESENTATIONS BY ENCOURAGING BOTH NETWORKS TO MODEL THE INPUT
Masked Autoencoders (MAE) is a simple yet powerful self-supervised learning method; however, it learns representations indirectly by reconstructing masked input patches. Several methods learn representations directly by predicting the representations of masked patches, but we argue that using all patches to encode the training-signal representations is suboptimal. We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only the masked patches.
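The key data-flow decision above is the partition of patches: the visible set goes to one network and only the masked set to the other. A minimal sketch of that partition (mask ratio and patch count are illustrative, not the paper's settings):

```python
import numpy as np

def split_patches(n_patches, mask_ratio=0.6, seed=0):
    """Randomly partition patch indices into a visible set and a masked set.
    M2D-style training feeds the visible patches to one network and the
    masked patches (only) to the other, rather than encoding all patches."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_patches)
    n_masked = int(n_patches * mask_ratio)
    return perm[n_masked:], perm[:n_masked]  # (visible, masked)

visible, masked = split_patches(196, mask_ratio=0.6)
```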

Acquisition of Asynchronous Data and Parameter Estimation based on Double-Cross-Correlation Processor with Phase Transform (Demo at WASPAA 2021)
Coherent processing of signals captured by a wireless acoustic sensor network (WASN) requires estimating parameters such as the sampling-rate offset (SRO) and sampling-time offset (STO). The acquired asynchronous signals of such a WASN exhibit an accumulating time drift (ATD) that grows linearly with time and depends on the SRO and STO values. In our demonstration, we present a real WASN based on Raspberry Pi computers, where the SRO and ATD values are estimated using the recently proposed double-cross-correlation processor with phase transform (DXCP-PhaT).
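The phase-transform building block underlying DXCP-PhaT is the classic GCC-PHAT delay estimator: the cross-spectrum is magnitude-whitened so that only its phase, which carries the time offset, survives. A minimal sketch (the doubled cross-correlation stage and SRO tracking of the actual DXCP-PhaT method are not reproduced here):

```python
import numpy as np

def gcc_phat(x, y, fs=1.0):
    """Generalized cross-correlation with phase transform (GCC-PHAT).
    Returns the estimated delay of y relative to x, in seconds."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12          # phase transform: keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(np.abs(cc)) - max_shift
    return -lag / fs
```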

VADOI: VOICE-ACTIVITY-DETECTION OVERLAPPING INFERENCE FOR END-TO-END LONG-FORM SPEECH RECOGNITION
While end-to-end models have shown great success on the automatic speech recognition task, performance degrades severely when target sentences are long-form. Previously proposed methods, (partial) overlapping inference, have been shown to be effective for long-form decoding. For both methods, the word error rate (WER) decreases monotonically as the overlapping percentage increases. Setting computational cost aside, a setup with 50% overlap during inference achieves the best performance; however, a lower overlapping percentage has the advantage of faster inference.
VADOI Poster.pdf
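As a rough illustration of overlapping inference, a long utterance can be cut into fixed-length windows whose neighbours overlap by a configurable fraction; each window is decoded independently and the hypotheses in the overlapped regions are then merged. This is only a segment-schedule sketch (the paper's VAD-based boundaries and merging strategy are not reproduced):

```python
def overlapping_segments(n_samples, seg_len, overlap=0.5):
    """Return (start, end) decoding windows over a long input, where
    consecutive windows overlap by the given fraction of seg_len."""
    hop = max(1, int(seg_len * (1.0 - overlap)))
    starts = list(range(0, max(n_samples - seg_len, 0) + 1, hop))
    if starts[-1] + seg_len < n_samples:      # make sure the tail is covered
        starts.append(max(n_samples - seg_len, 0))
    return [(s, min(s + seg_len, n_samples)) for s in starts]
```

Lowering `overlap` produces fewer windows (faster decoding) at the cost of the accuracy benefit the abstract describes.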


TEXT ADAPTIVE DETECTION FOR CUSTOMIZABLE KEYWORD SPOTTING
poster.pdf


ATTENTIVE MAX FEATURE MAP AND JOINT TRAINING FOR ACOUSTIC SCENE CLASSIFICATION
Various attention mechanisms are widely applied to acoustic scene classification. However, we empirically found that an attention mechanism can excessively discard potentially valuable information, despite improving performance. We propose the attentive max feature map, which combines two effective techniques, attention and the max feature map, to further elaborate the attention mechanism and mitigate the above-mentioned phenomenon. We also explore various joint training methods, including multi-task learning, that allocate additional abstract labels to each audio recording.
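The max feature map half of the proposed combination (known from LightCNN) splits the channel dimension in two and keeps only the element-wise stronger feature of each pair. A minimal sketch of that operation alone (the attentive part of the paper's module is not reproduced here):

```python
import numpy as np

def max_feature_map(x):
    """Max feature map (MFM): split the channel (last) axis in half and
    take the element-wise maximum, a competitive alternative to ReLU in
    which only the stronger of each feature pair survives."""
    c = x.shape[-1]
    a, b = x[..., :c // 2], x[..., c // 2:]
    return np.maximum(a, b)
```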

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training