ICASSP 2022

ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2022 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Domain Robust Deep Embedding Learning for Speaker Recognition

Slides_of_Domain_Robust_Deep_Embedding_Learning_for_Speaker_Recognition.pdf (234)

Categories: Speaker Recognition and Characterization (SPE-SPKR)

19 Views

Fine-Grained Dynamic Loss for Accurate Single-Image Super-Resolution

6873.pdf (195)

Categories: Image/Video Processing

18 Views

A NON-CONVEX PROXIMAL APPROACH FOR CENTROID-BASED CLASSIFICATION

slides_ICASSP_2022.pdf

slides (220)

Categories: Signal and System Modeling, Representation and Estimation

17 Views

SPE-89.4: UNSUPERVISED DATA SELECTION FOR SPEECH RECOGNITION WITH CONTRASTIVE LOSS RATIOS

This paper proposes an unsupervised data selection method by using a submodular function based on contrastive loss ratios of target and training data sets. A model using a contrastive loss function is trained on both sets. Then the ratio of frame-level losses for each model is used by a submodular function. By using the submodular function, a training set for automatic speech recognition matching the target data set is selected.

park_ICASSP2022_in_person_poster.pdf

Poster for the in-person conference of ICASSP 2022 (241)

Presentation slides for the in-person conference of ICASSP 2022.pdf

Presentation slides for the in-person conference of ICASSP 2022 (296)

Categories: General Topics in Speech Recognition (SPE-GASR)

41 Views

Constant Q Cepstral Coefficients for Normal vs. Pathological Infant Cry

icassp_2022_infant_cry_ppt.pdf

classification approach for normal vs. pathological infant cry (186)

Categories: Speech Analysis (SPE-ANLS)

27 Views

DEEPFAKE SPEECH DETECTION THROUGH EMOTION RECOGNITION: A SEMANTIC APPROACH

In recent years, audio and video deepfake technology has advanced relentlessly, severely impacting people's reputation and reliability.
Several factors have facilitated the growing deepfake threat.
On the one hand, the hyper-connected society of social and mass media enables the spread of multimedia content worldwide in real time, facilitating the dissemination of counterfeit material.

ICASSP_2022_poster.pdf (308)

Categories: Multimedia Forensics

38 Views

Grassmannian Dimensionality Reduction Using Triplet Margin Loss for UME Classification of 3D Point Clouds

ICASSP2022_presentation_TLGDRUME.pdf

Grassmannian Dimensionality Reduction Using Triplet Margin Loss for UME Classification of 3D Point Clouds Presentation pdf file (188)

Categories: Other applications of machine learning (MLR-APPL)

14 Views

Group-wise Feature Selection for Supervised Learning

Feature selection has been explored in two ways: global feature selection and instance-wise feature selection. Global feature selection picks the same feature selector for the entire dataset, while instance-wise feature selection allows different feature selectors for different data instances. We propose group-wise feature selection, a new setting that sits between global and instance-wise feature selection.
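
The three settings described above can be illustrated with a minimal Python sketch. It assumes the group assignments and per-group feature masks are already given; the abstract does not say how they are learned, and the names `select_features` and `group_wise_select` are hypothetical, not taken from the paper.

```python
def select_features(instance, mask):
    """Keep only the features of one instance whose mask entry is True."""
    return [x for x, keep in zip(instance, mask) if keep]


def group_wise_select(instances, group_ids, group_masks):
    """Apply to each instance the feature mask shared by its group.

    Assumption: group_ids[i] is the group of instances[i], and
    group_masks maps a group id to a boolean mask over features.
    """
    return [
        select_features(x, group_masks[g])
        for x, g in zip(instances, group_ids)
    ]


instances = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Group-wise: the first two instances share one selector, the third has another.
group_ids = [0, 0, 1]
group_masks = {0: [True, False, True], 1: [False, True, False]}
selected = group_wise_select(instances, group_ids, group_masks)
```

Under this framing, global selection is the special case where every instance belongs to a single group, and instance-wise selection is the case where every instance forms its own group.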

ICASSP_poster.pdf (201)

Categories: Pattern recognition and classification (MLR-PATT)

22 Views

VADOI: VOICE-ACTIVITY-DETECTION OVERLAPPING INFERENCE FOR END-TO-END LONG-FORM SPEECH RECOGNITION

While end-to-end models have shown great success on the Automatic Speech Recognition task, performance degrades severely when target sentences are long-form. The previously proposed methods, overlapping inference and partial overlapping inference, have been shown to be effective for long-form decoding. For both methods, word error rate (WER) decreases monotonically as the overlapping percentage increases. Setting aside computational cost, the setup with 50% overlapping during inference achieves the best performance. However, a lower overlapping percentage has the advantage of faster inference.
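
The speed side of this trade-off can be seen in a minimal Python sketch of how a long utterance might be cut into overlapping segments. It assumes fixed-length segments measured in frames; the abstract does not specify the segmentation or how segment outputs are merged, and the function name `overlapping_segments` is hypothetical.

```python
def overlapping_segments(num_frames, segment_len, overlap):
    """Return (start, end) frame ranges that cover [0, num_frames).

    Assumption: consecutive segments share a fraction `overlap` of
    `segment_len`, e.g. overlap=0.5 for 50% overlapping.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if num_frames <= 0:
        return []

    # Hop between segment starts; a smaller overlap gives a larger hop.
    hop = max(1, round(segment_len * (1.0 - overlap)))

    segments = []
    start = 0
    while True:
        end = min(start + segment_len, num_frames)
        segments.append((start, end))
        if end >= num_frames:
            break
        start += hop
    return segments


half_overlap = overlapping_segments(10, 4, 0.5)
no_overlap = overlapping_segments(10, 4, 0.0)
```

In this toy case, 50% overlapping yields four segments to decode while no overlapping yields three, which is why a lower overlapping percentage means less decoding work for the same audio.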