ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world's largest and most comprehensive technical conference focused on signal processing and its applications.

While we have seen significant advances in automatic summarization for text, research on speech summarization is still limited. In this work, we address the challenge of automatically generating teasers for TED talks. In a first step, we create a corpus for automatic summarization of TED and TEDx talks, consisting of the talks' recordings, their transcripts, and their descriptions. The corpus is then used to build a speech summarization system for the task. We adapt and combine pre-trained models for automatic speech recognition (ASR) and text summarization using the collected data.
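The cascaded architecture described above can be sketched as a two-stage pipeline: an ASR stage maps audio to a transcript, and a summarization stage maps the transcript to a teaser. The functions below are hypothetical stand-ins, not the authors' models; in the actual system each stage would be a pre-trained neural model adapted on the TED/TEDx corpus.

```python
def transcribe(audio_frames):
    # Hypothetical ASR stand-in: a real system maps audio features
    # to text with a pre-trained speech recognizer.
    return " ".join(audio_frames)

def summarize(transcript, max_words=8):
    # Hypothetical summarizer stand-in: a real model would generate
    # an abstractive teaser; truncation here is only a placeholder.
    return " ".join(transcript.split()[:max_words])

def teaser_for_talk(audio_frames):
    # Cascade: ASR output feeds directly into the summarizer.
    return summarize(transcribe(audio_frames))
```

One design point of such a cascade is that ASR errors propagate into the summarizer, which is why adapting both stages jointly on in-domain data, as the abstract describes, matters.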

Multi-temporal remotely sensed observations acquired by multi-spectral sensors contain a wealth of information related to the Earth's state. Deep learning methods have demonstrated great potential in analyzing such observations. However, traditional 2D and 3D approaches are unable to effectively extract valuable information encoded across all available dimensions.

The recently proposed conformer architecture has been used successfully in end-to-end automatic speech recognition (ASR) systems, achieving state-of-the-art performance on different datasets. To the best of our knowledge, the impact of using a conformer acoustic model for hybrid ASR has not been investigated. In this paper, we present and evaluate a competitive conformer-based hybrid model training recipe. We study different training aspects and methods to improve word-error rate as well as to increase training speed.

In recent years, speech emotion recognition (SER) has been used in wide-ranging applications, from healthcare to the commercial sector. In addition to signal processing approaches, methods for SER now also use deep learning techniques, which provide transfer learning possibilities. However, generalizing over languages, corpora, and recording conditions is still an open challenge. In this work, we address this gap by exploring loss functions that aid in transferability, specifically to non-tonal languages.

The best performance in Air-tissue boundary (ATB) segmentation of real-time Magnetic Resonance Imaging (rtMRI) videos in speech production is known to be achieved by a 3-dimensional convolutional neural network (3D-CNN) model. However, the evaluation of this model, as well as other ATB segmentation techniques reported in the literature, is done using Dynamic Time Warping (DTW) distance between the entire original and predicted contours. Such an evaluation measure may not capture local errors in the predicted contour.
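The DTW distance mentioned above can be illustrated with the standard textbook recurrence over two contours (sequences of 2-D points). This is a generic sketch of the evaluation measure, not the paper's exact evaluation code:

```python
from math import dist

def dtw_distance(contour_a, contour_b):
    """Dynamic Time Warping (DTW) distance between two contours,
    each a sequence of (x, y) points, computed with the standard
    textbook recurrence over an accumulated-cost matrix."""
    n, m = len(contour_a), len(contour_b)
    INF = float("inf")
    # D[i][j] = minimal accumulated cost of aligning the first i
    # points of contour_a with the first j points of contour_b.
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(contour_a[i - 1], contour_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # step in contour_a only
                                 D[i][j - 1],      # step in contour_b only
                                 D[i - 1][j - 1])  # step in both
    return D[n][m]
```

Because the score is accumulated over the entire warping path, a sharp but localized deviation between the original and predicted contour contributes only a small fraction of the total, which illustrates the limitation the paragraph points out: a global DTW distance may not capture local errors.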
