
ICASSP 2022, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world's largest and most comprehensive technical conference focused on signal processing and its applications. ICASSP 2022 features world-class presentations by internationally renowned speakers and cutting-edge session topics, and provides a fantastic opportunity to network with like-minded professionals from around the world.

In this paper, we introduce score difficulty classification as a subtask of music information retrieval (MIR), with applications in music education technologies, personalised curriculum generation, and score retrieval. We introduce a novel dataset for this task, Mikrokosmos-difficulty, containing 147 piano pieces in symbolic representation together with the corresponding difficulty labels assigned by the composer Béla Bartók and by publishers. As part of our methodology, we propose piano technique feature representations based on different piano fingering algorithms.
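As a purely illustrative sketch (not the paper's method), the snippet below derives a naive hand-movement feature from a symbolic piano part and fits a difficulty classifier on it; the interval statistics merely stand in for the fingering-based representations proposed here, and all data and names are hypothetical.

```python
# Hypothetical sketch: a naive difficulty feature for symbolic piano pieces.
import numpy as np
from sklearn.linear_model import LogisticRegression

def interval_features(midi_pitches):
    """Summarize consecutive pitch intervals of one piece (MIDI note numbers)."""
    intervals = np.abs(np.diff(midi_pitches))
    # Mean jump, largest jump, and fraction of jumps wider than a sixth.
    return np.array([intervals.mean(), intervals.max(), (intervals > 9).mean()])

# pieces: pitch sequences; labels: difficulty classes (toy values)
pieces = [[60, 62, 64, 65, 67], [60, 72, 59, 71, 58]]
labels = [0, 2]
X = np.stack([interval_features(p) for p in pieces])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```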


Recent advances in cross-lingual text-to-speech (TTS) have made it possible to synthesize speech in a language foreign to a monolingual speaker. However, there is still a large gap between the pronunciation of generated cross-lingual speech and that of native speakers in terms of naturalness and intelligibility. In this paper, a triplet training scheme is proposed to enhance cross-lingual pronunciation by allowing previously unseen content and speaker combinations to be seen during training.
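A minimal sketch of a generic triplet objective of this flavour, assuming embeddings are available for anchor, positive, and negative utterances; the paper's construction of triplets from unseen content and speaker combinations is more involved, and all names here are illustrative.

```python
# Generic triplet margin objective over utterance embeddings (illustrative).
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor toward the positive, push it away from the negative.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

emb = lambda: torch.randn(8, 256)  # stand-in embeddings (batch of 8, dim 256)
loss = triplet_loss(emb(), emb(), emb())
# PyTorch also ships an equivalent built-in criterion:
loss_builtin = torch.nn.TripletMarginLoss(margin=0.2)(emb(), emb(), emb())
```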


In wild conditions, Facial Expression Recognition is often challenged by low-quality data and imbalanced, ambiguous labels. The field has benefited greatly from CNN-based approaches; however, CNN models are structurally limited in attending to distant facial regions. As a remedy, Transformers have been introduced to vision tasks with a global receptive field, but they require adjusting the input spatial size to that of the pretrained models in order to benefit from the inductive bias those models provide.
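For illustration, the snippet below shows the kind of spatial-size adjustment implied here, resampling face crops to the fixed resolution a pretrained vision Transformer expects; the 224x224 target and bilinear resampling are assumptions, not the paper's setup.

```python
# Resampling arbitrary face crops to a pretrained ViT's input resolution.
import torch
import torch.nn.functional as F

faces = torch.randn(4, 3, 100, 100)  # batch of low-resolution face crops
faces_224 = F.interpolate(faces, size=(224, 224),
                          mode="bilinear", align_corners=False)
# faces_224 can now be fed to a model pretrained at 224x224 resolution.
```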


Emotion recognition in conversations (ERC) has attracted increasing interest in recent years due to its wide range of applications, such as customer service analysis and healthcare consultation. One key challenge of ERC is that a user's emotions can change under the influence of other speakers' emotions; that is, emotions can spread among the participants of a conversation. However, this spreading effect is rarely addressed in existing research.
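As a toy illustration of the spreading idea only (not the paper's model), the snippet below mixes each participant's emotion state with the average state of the other speakers at every conversational step.

```python
# Toy emotion-spread dynamics among conversation participants.
import numpy as np

def spread_step(states, alpha=0.7):
    """states: (num_participants, num_emotions) rows of emotion probabilities."""
    others_mean = (states.sum(axis=0, keepdims=True) - states) / (len(states) - 1)
    # Convex mixture: keep alpha of one's own state, absorb the rest from others.
    return alpha * states + (1 - alpha) * others_mean

states = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])  # e.g. [positive, negative]
for _ in range(3):
    states = spread_step(states)
print(states.round(2))  # states drift toward each other over turns
```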


The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments. This challenge improves and extends the tasks of the L3DAS21 edition. We generated a new dataset that maintains the general characteristics of the L3DAS21 datasets but contains an extended number of data points and adds constraints that improve the baseline model's efficiency and overcome the major difficulties encountered by participants in the previous challenge.


Many activities in the oil and gas industries, such as drilling and exploration, rely on identifying seismic faults. We propose a method for detecting faults in seismic data that uses graph high-frequency components as inputs to a graph convolutional network. Graph Signal Processing (GSP) maps digital signal processing (DSP) concepts onto graphs to define processing techniques for signals residing on them. As a first step, we extract patches of the seismic data centered around the points of concern. Each patch is then represented in a graph domain, with the seismic amplitudes as the graph signals.
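A minimal sketch of the high-frequency extraction step, assuming a combinatorial Laplacian built from some patch adjacency matrix W and the seismic amplitudes as the graph signal; the actual graph construction and filter design in the paper may differ.

```python
# Graph high-pass filtering via the Laplacian's graph Fourier basis.
import numpy as np

def graph_highpass(W, x, keep=10):
    """Keep only the `keep` highest-frequency graph Fourier components of x."""
    L = np.diag(W.sum(axis=1)) - W   # combinatorial graph Laplacian
    eigvals, U = np.linalg.eigh(L)   # eigenvectors, ascending graph frequency
    x_hat = U.T @ x                  # graph Fourier transform of the signal
    x_hat[:-keep] = 0.0              # zero out the low-frequency components
    return U @ x_hat                 # inverse graph Fourier transform

n = 64
W = np.random.rand(n, n); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
x = np.random.randn(n)               # seismic amplitudes on one patch
x_hf = graph_highpass(W, x)          # high-frequency input feature for a GCN
```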


This paper describes our systems submitted to tracks 1 and 2 of the 2022 ADD challenge. Our approach is based on the combination of a pre-trained wav2vec2 feature extractor and a downstream classifier to detect spoofed audio. This method exploits the contextualized speech representations at the different transformer layers to fully capture discriminative information. Furthermore, the classification model is adapted to the application scenario using different data augmentation techniques.
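A hedged sketch of the layer-wise feature idea using the Hugging Face transformers API: hidden states from all transformer layers are fused with learned weights before a simple classification head. The model name, pooling, and head here are illustrative choices, not the submitted system.

```python
# Weighted fusion of wav2vec2 layer representations for spoofing detection.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class LayerWeightedSpoofClassifier(nn.Module):
    def __init__(self, name="facebook/wav2vec2-base"):
        super().__init__()
        self.backbone = Wav2Vec2Model.from_pretrained(name)
        n_layers = self.backbone.config.num_hidden_layers + 1  # + embedding layer
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))
        self.head = nn.Linear(self.backbone.config.hidden_size, 2)  # bona fide / spoof

    def forward(self, waveform):  # waveform: (batch, samples) at 16 kHz
        out = self.backbone(waveform, output_hidden_states=True)
        h = torch.stack(out.hidden_states)               # (layers, B, T, D)
        w = torch.softmax(self.layer_weights, dim=0)
        fused = (w[:, None, None, None] * h).sum(dim=0)  # learned layer mix
        return self.head(fused.mean(dim=1))              # mean-pool over time
```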

