IEEE ICASSP 2024

IEEE ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

LEVERAGING EFFECTIVE LANGUAGE AND SPEAKER CONDITIONING IN INDIC TTS FOR LIMMITS 2024 CHALLENGE

In this paper, we describe the model developed by the NLP_POSTECH team for the LIMMITS 2024 Grand Challenge. Among the three tracks, we focus on Track 1, which requires building a few-shot text-to-speech (TTS) system that generates natural speech across diverse languages. To realize multilingual capability, we incorporate a learnable language embedding. In addition, for precise imitation of target speaker voices, we leverage an inductive speaker bias conditioning methodology.

ICASSP 2024.pptx.pdf

Categories: Speech Processing

116 Views

GENERATING PERSONA-AWARE EMPATHETIC RESPONSES WITH RETRIEVAL-AUGMENTED PROMPT LEARNING

Empathetic response generation requires perceiving and understanding the user’s emotion to deliver a suitable response. However, existing models generally remain oblivious to an interlocutor’s persona, which has been shown to play a vital role in expressing appropriate empathy to different users. To address this problem, we propose a novel Transformer-based architecture that incorporates retrieval-augmented prompt learning to generate persona-aware empathetic responses.

ICASSPslide_llwang.pptx

Categories: Other

47 Views

TEN-GUARD: TENSOR DECOMPOSITION FOR BACKDOOR ATTACK DETECTION IN DEEP NEURAL NETWORKS

As deep neural networks and the datasets used to train them get larger, the default approach to integrating them into research and commercial projects is to download a pre-trained model and fine-tune it. But these models can have uncertain provenance, opening up the possibility that they embed hidden malicious behavior such as trojans or backdoors, where small changes to an input (triggers) can cause the model to produce incorrect outputs (e.g., to misclassify). This paper

ICASSP_Poster_Khondoker.pdf

Categories: Machine Learning for Signal Processing

31 Views

Enhanced Axle-Based Vehicle Classification Using Angle-Based Micro-Doppler Signature

This study introduces an angle-based micro-Doppler analysis using Frequency Modulated Continuous Wave (FMCW) radar tailored for axle-based vehicle classification. The novel approach exploits the signal angle of arrival to separate incoming signals and noise from distinct targets. This is done by analysing the phase difference of a dual antenna radar system based on the time-frequency representation of the radar beat signal. Vehicles driving side by side can now be discriminated. Multipath signals and clutter are more easily identified and filtered out.
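As a rough illustration of how a phase difference between two antennas maps to an angle of arrival, the sketch below uses the textbook far-field relation delta_phi = 2*pi*d*sin(theta)/lambda for a single time-frequency bin. It is not the authors' implementation: the function names, the sign convention, the antenna spacing, and the wavelength are all assumptions chosen for illustration.

```python
import cmath
import math


def phase_difference(bin_a: complex, bin_b: complex) -> float:
    """Phase of antenna A relative to antenna B for one time-frequency bin, in radians."""
    # Multiplying by the conjugate subtracts the phases; cmath.phase returns a value in [-pi, pi].
    return cmath.phase(bin_a * bin_b.conjugate())


def angle_of_arrival(delta_phi: float, spacing_m: float, wavelength_m: float) -> float:
    """Angle of arrival in radians, from delta_phi = 2*pi*d*sin(theta)/lambda."""
    sin_theta = wavelength_m * delta_phi / (2.0 * math.pi * spacing_m)
    # Clamp so that noise cannot push the argument outside the domain of asin.
    sin_theta = max(-1.0, min(1.0, sin_theta))
    return math.asin(sin_theta)


# Hypothetical values: half-wavelength antenna spacing and a 90 degree phase offset.
wavelength = 0.0125  # metres, illustrative only
bin_a = cmath.exp(1j * math.pi / 2)  # complex time-frequency value at antenna A
bin_b = 1 + 0j                       # complex time-frequency value at antenna B
theta = angle_of_arrival(phase_difference(bin_a, bin_b), wavelength / 2, wavelength)
print(math.degrees(theta))
```

With half-wavelength spacing the mapping from phase difference to angle is unambiguous over the full field of view, which is why that spacing is assumed here. Bins whose estimated angles differ can then be attributed to different targets.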

Poster_ICASSP_A0_PDF.pdf

Categories: Other

16 Views

Vision Transformer MST++: Efficient Hyperspectral Skin Reconstruction

Channel reconstruction transforms a subsampled multispectral image into a hyperspectral one, offering the benefits of hyperspectral imaging without a dedicated camera. MST++ is a state-of-the-art channel reconstruction technique, but it faces memory limitations for high-spatial-resolution images. In this context, we introduce VITMST++, a novel architecture incorporating Vision Transformer embedding and compression, multi-resolution image context, and a channel-weighted loss. Developed for the ICASSP 2024 Hyperspectral Skin Chal-

ICASSP_VITMST++_final.pdf

Categories: Other applications of machine learning (MLR-APPL)

100 Views

KEEP_KNOWLEDGE_IN_PERCEPTION_slides

These are the slides for our ICASSP 2024 paper, KEEP KNOWLEDGE IN PERCEPTION: ZERO-SHOT IMAGE AESTHETIC ASSESSMENT.

KEEP KNOWLEDGE IN PERCEPTION.pptx

Categories: Quality Assessment

28 Views

CO-OCCURRENCE GRAPH-ENHANCED HIERARCHICAL PREDICTION OF ICD CODES

Recent healthcare applications of natural language processing involve multi-label classification of health records using the International Classification of Diseases (ICD). While prior research highlights intricate text models and explores external knowledge like hierarchical ICD ontology, fewer studies integrate code relationships from whole datasets to enhance ICD coding accuracy. This study presents a modular approach, sequentially combining graph-based integration of ICD code co-occurrence with a hard-coded hierarchical enriched text representation drawn from the ICD ontology.
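To make the idea of dataset-level code relationships concrete, the sketch below counts how often pairs of ICD codes appear on the same record, which gives the edge weights of a simple co-occurrence graph. This only illustrates the co-occurrence statistic, not the graph model or text encoder in the paper; the function name and the toy records are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how many records contain each unordered pair of codes."""
    counts = Counter()
    for codes in records:
        # Deduplicate and sort so each pair is stored once, as (smaller, larger).
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
    return counts


# Toy label sets, one list of ICD-10 codes per health record (illustrative only).
records = [
    ["I10", "E11.9"],
    ["I10", "E11.9", "N18.3"],
    ["I10"],
]
edges = cooccurrence_counts(records)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each key in `edges` can be read as a weighted, undirected edge between two codes; a downstream model could normalise these counts before using them as graph weights.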

Poster_for_ICASSP_2024__CO_OCCURRENCE_GRAPH_ENHANCED_HIERARCHICAL_PREDICTION_OF_ICD_CODES_final.pdf

Categories: Other

19 Views

Enabling Device Control Planning Capabilities of Small Language Model

Smart home device control is a difficult task when the instruction is abstract and the planner must adapt to dynamic home configurations. With their increasing capability, large language models (LLMs) have become the customary choice for zero-shot planning tasks such as smart home device control. Although cloud-supported large language models can seamlessly perform device control tasks, on-device small language models show limited capabilities. In this work, we show how large language models can be leveraged to enable small language models to perform device control tasks.

icassp_sudipta.pptx

Categories: Other

21 Views

ELECTROENCEPHALOGRAM SENSOR DATA COMPRESSION USING AN ASYMMETRICAL SPARSE AUTOENCODER WITH A DISCRETE COSINE TRANSFORM LAYER

Electroencephalogram (EEG) data compression is necessary for wireless recording applications to reduce the amount of data that needs to be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder module of the autoencoder has a combination of a fully connected linear layer and the DCT layer to reduce redundant data using hard-thresholding nonlinearity.
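The role of a DCT followed by hard thresholding can be sketched in a few lines: transform a signal window, zero the small coefficients, and reconstruct from what remains. The code below is a minimal, non-learned illustration only. It omits the fully connected layer and the trained decoder of the proposed autoencoder, and the function names, threshold value, and test signal are assumptions.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def hard_threshold(coeffs, tau):
    """Keep coefficients whose magnitude exceeds tau; set the rest to zero."""
    return [c if abs(c) > tau else 0.0 for c in coeffs]


# Hypothetical window: a constant signal, whose energy sits in the first DCT coefficient.
signal = [1.0, 1.0, 1.0, 1.0]
sparse = hard_threshold(dct2(signal), tau=0.1)
reconstructed = idct2(sparse)
print(sparse)
print(reconstructed)
```

Only the nonzero entries of `sparse` would need to be transmitted, which is the source of the compression; a larger threshold discards more coefficients at the cost of reconstruction error.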

Presentation_EEG_Compression.pptx

EEG data compression, autoencoder, DCT layer (354)

Categories: Bio Imaging and Signal Processing

26 Views

dklement_dvbx_slides

Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework for updating the VBx parameters using discriminative training, which directly optimizes a predefined loss.

DVBx-slides_fin.pdf

Categories: Other

24 Views
