IEEE ICASSP 2024, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

In this paper, we describe the model developed by the NLP\_POSTECH team for the LIMMITS 2024 Grand Challenge. Of the three tracks, we focus on Track 1, which requires building a few-shot text-to-speech (TTS) system that generates natural speech across diverse languages. To achieve multilingual capability, we incorporate a learnable language embedding. In addition, to imitate target speakers' voices precisely, we use an inductive speaker bias conditioning method.
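
A learnable language embedding is commonly realized as a lookup table holding one trainable vector per language, which is added to (or concatenated with) the text-encoder output. The sketch below assumes the additive variant; the class name `LanguageEmbedding`, the dimension, and the language codes are hypothetical placeholders and are not taken from the NLP\_POSTECH system.

```python
import random


class LanguageEmbedding:
    """Hypothetical lookup table: one trainable vector per language (additive conditioning)."""

    def __init__(self, languages, dim, seed=0):
        rng = random.Random(seed)
        self.index = {lang: i for i, lang in enumerate(languages)}
        # Small random initialization; during training these values would be learned.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in languages]

    def lookup(self, lang):
        return self.table[self.index[lang]]

    def condition(self, encoder_states, lang):
        """Add the language vector to every frame of the text-encoder output."""
        emb = self.lookup(lang)
        return [[h + e for h, e in zip(frame, emb)] for frame in encoder_states]


emb = LanguageEmbedding(["hi", "mr", "te"], dim=4)  # placeholder language codes
states = [[0.0] * 4 for _ in range(3)]              # 3 dummy encoder frames
out = emb.condition(states, "mr")
print(len(out), len(out[0]))  # 3 4
```

Because the same vector is shared by every frame of an utterance, the model can switch languages at inference time simply by changing the lookup key.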

Empathetic response generation requires perceiving and understanding the user’s emotion to deliver a suitable response. However, existing models generally remain oblivious to an interlocutor’s persona, which has been shown to play a vital role in expressing appropriate empathy to different users. To address this problem, we propose a novel Transformer-based architecture that incorporates retrieval-augmented prompt learning to generate persona-aware empathetic responses.
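
Retrieval-augmented prompting generally means retrieving stored examples that resemble the current input and prepending them to the model's prompt. As a rough illustration, and assuming a bag-of-words cosine similarity over personas, a small in-memory example store, and a made-up prompt template (none of which come from the paper), persona-matched demonstrations could be selected like this:

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(persona, utterance, memory, k=1):
    """Retrieve the k stored (persona, response) pairs closest to the user's persona
    and prepend them as demonstrations before the current utterance."""
    query = Counter(persona.lower().split())
    ranked = sorted(memory, key=lambda ex: cosine(query, Counter(ex[0].lower().split())), reverse=True)
    demos = "\n".join(f"Persona: {p}\nResponse: {r}" for p, r in ranked[:k])
    return f"{demos}\nPersona: {persona}\nUser: {utterance}\nResponse:"


memory = [  # hypothetical example store
    ("i love dogs and hiking", "That sounds hard. Maybe a walk with your dog would help?"),
    ("i am a nurse working night shifts", "Night shifts are exhausting; you deserve some rest."),
]
prompt = build_prompt("i work night shifts at a hospital", "I'm so tired lately.", memory)
print(prompt)
```

A learned dense retriever would normally replace the word-overlap similarity; the lexical version is used here only to keep the sketch dependency-free.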

As deep neural networks and the datasets used to train them grow larger, the default approach to integrating them into research and commercial projects is to download a pre-trained model and fine-tune it. But these models can have uncertain provenance, opening up the possibility that they embed hidden malicious behavior such as trojans or backdoors, where small changes to an input (triggers) can cause the model to produce incorrect outputs (e.g., to misclassify). This paper …
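
To make the notion of a trigger concrete, the toy below stamps a 2×2 patch onto an image and feeds both versions to a hypothetical "trojaned" classifier. In a real backdoored network the trigger response is learned implicitly in the weights; the explicit check here is purely illustrative, and the function names and class labels are made up.

```python
def stamp_trigger(image, patch, row=0, col=0):
    """Return a copy of a 2-D image with a small trigger patch pasted at (row, col)."""
    out = [r[:] for r in image]
    for i, prow in enumerate(patch):
        for j, v in enumerate(prow):
            out[row + i][col + j] = v
    return out


def toy_trojaned_model(image, patch, target_class=7):
    """Hypothetical stand-in for a backdoored network: it classifies by mean brightness
    unless the trigger appears in the top-left corner, in which case it returns the
    attacker's target class."""
    if all(image[i][j] == v for i, prow in enumerate(patch) for j, v in enumerate(prow)):
        return target_class  # hidden malicious behavior fired by the trigger
    mean = sum(map(sum, image)) / (len(image) * len(image[0]))
    return 1 if mean > 0.5 else 0  # the model's "normal" behavior


clean = [[0.9] * 4 for _ in range(4)]
trigger = [[0.0, 1.0], [1.0, 0.0]]
poisoned = stamp_trigger(clean, trigger)
print(toy_trojaned_model(clean, trigger), toy_trojaned_model(poisoned, trigger))  # 1 7
```

Only 4 of 16 pixels change, and the poisoned image is still "bright" on average, yet the output flips to the attacker's class.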

This study introduces an angle-based micro-Doppler analysis using Frequency Modulated Continuous Wave (FMCW) radar, tailored for axle-based vehicle classification. The novel approach exploits the angle of arrival of the signal to separate incoming signals and noise from distinct targets. This is done by analysing the phase difference between the two antennas of a dual-antenna radar system, based on the time-frequency representation of the radar beat signal. As a result, vehicles driving side by side can be discriminated, and multipath signals and clutter are more easily identified and filtered out.
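
For two antennas spaced a distance d apart, a plane wave arriving at angle θ produces a phase difference Δφ = 2πd·sin(θ)/λ, so θ = arcsin(λΔφ / (2πd)). The sketch below applies this to a single DFT bin of a simulated beat signal; the 77 GHz carrier, half-wavelength spacing, and sign convention (antenna 2 lagging or leading antenna 1) are assumptions, and the paper works on a full time-frequency representation rather than one bin.

```python
import cmath
import math


def aoa_from_phase(x1, x2, wavelength, spacing):
    """Estimate the angle of arrival (radians) from two complex antenna samples
    belonging to the same time-frequency bin."""
    dphi = cmath.phase(x2 * x1.conjugate())  # phase difference in (-pi, pi]
    s = wavelength * dphi / (2 * math.pi * spacing)
    return math.asin(max(-1.0, min(1.0, s)))  # clip in case noise pushes |s| above 1


def dft_bin(samples, k):
    """Single DFT bin k of a list of complex samples (an STFT would also apply a window)."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))


# Hypothetical 77 GHz setup with half-wavelength antenna spacing.
wavelength = 3e8 / 77e9
spacing = wavelength / 2
theta_true = math.radians(20.0)
dphi_true = 2 * math.pi * spacing * math.sin(theta_true) / wavelength

n, k = 64, 5  # beat tone placed exactly on bin 5
rx1 = [cmath.exp(2j * math.pi * k * i / n) for i in range(n)]
rx2 = [x * cmath.exp(1j * dphi_true) for x in rx1]
theta_est = aoa_from_phase(dft_bin(rx1, k), dft_bin(rx2, k), wavelength, spacing)
print(round(math.degrees(theta_est), 3))  # 20.0
```

With spacing d = λ/2 the phase difference stays within (-π, π] for all angles, which avoids ambiguity; wider spacing would alias.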

Channel reconstruction transforms a subsampled multispectral image into a hyperspectral one, offering the benefits of hyperspectral imaging without a dedicated camera. MST++ is a state-of-the-art channel reconstruction technique, but it faces memory limitations for images with high spatial resolution. In this context, we introduce VITMST++, a novel architecture incorporating Vision Transformer embedding and compression, multi-resolution image context, and a channel-weighted loss. Developed for the ICASSP 2024 Hyperspectral Skin Challenge …
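
The abstract does not define the channel weighting, so the following is only a plausible form: a per-channel mean absolute error, weighted by a non-negative weight per spectral band and normalized by the total weight. The function name and the weights are hypothetical.

```python
def channel_weighted_l1(pred, target, weights):
    """pred, target: [channels][pixels] nested lists; weights: one non-negative weight per channel.
    Returns sum_c w_c * mean|pred_c - target_c| / sum_c w_c."""
    if not (len(pred) == len(target) == len(weights)):
        raise ValueError("one weight per spectral channel is required")
    total = 0.0
    for p_c, t_c, w in zip(pred, target, weights):
        mae = sum(abs(p - t) for p, t in zip(p_c, t_c)) / len(p_c)  # per-channel error
        total += w * mae
    return total / sum(weights)


pred = [[0.1, 0.2], [0.5, 0.5], [0.9, 1.0]]
target = [[0.0, 0.2], [0.5, 0.7], [1.0, 1.0]]
loss = channel_weighted_l1(pred, target, [1.0, 2.0, 1.0])  # hypothetical weights
print(loss)
```

With equal weights this reduces to the plain mean over channels; raising a band's weight makes errors in that band count more during training.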

These are the presentation slides for our ICASSP 2024 paper, "Keep Knowledge in Perception: Zero-Shot Image Aesthetic Assessment."

Recent healthcare applications of natural language processing involve multi-label classification of health records using International Classification of Diseases (ICD) codes. While prior research emphasizes intricate text models and explores external knowledge such as the hierarchical ICD ontology, fewer studies exploit code relationships across whole datasets to improve ICD coding accuracy. This study presents a modular approach that sequentially combines graph-based integration of ICD code co-occurrence with a text representation enriched by hard-coded hierarchical information drawn from the ICD ontology.
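
Code relationships "from whole datasets" are often captured as a co-occurrence graph over the training labels. Assuming edges are weighted by the conditional probability P(j | i) of code j appearing given code i (one common choice, not necessarily the paper's), the graph could be built as follows; the records are toy data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(label_sets):
    """Count how often each pair of ICD codes is assigned to the same record and
    convert the counts to directed edge weights P(code_j | code_i)."""
    code_freq = Counter()
    pair_freq = Counter()
    for codes in label_sets:
        uniq = sorted(set(codes))
        code_freq.update(uniq)
        pair_freq.update(combinations(uniq, 2))
    edges = {}
    for (a, b), n in pair_freq.items():
        edges[(a, b)] = n / code_freq[a]  # P(b | a)
        edges[(b, a)] = n / code_freq[b]  # P(a | b)
    return edges


records = [["I10", "E11.9"], ["I10", "E11.9", "N18.3"], ["I10"]]  # toy label sets
graph = cooccurrence_graph(records)
print(graph[("I10", "E11.9")], graph[("E11.9", "I10")])
```

The resulting weighted adjacency can then feed a graph neural network or label-propagation step that shares information between related codes.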

Smart home device control is a difficult task when instructions are abstract and the planner must adapt to dynamic home configurations. As large language models (LLMs) have grown more capable, they have become the customary choice for zero-shot planning tasks similar to smart home device control. Although cloud-hosted LLMs can handle device control tasks seamlessly, small on-device language models show limited capabilities. In this work, we show how large language models can be leveraged to enable small language models to perform the device control task.

Electroencephalogram (EEG) data compression is necessary in wireless recording applications to reduce the amount of data that must be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder combines a fully connected linear layer with the DCT layer, which removes redundant data through a hard-thresholding nonlinearity.
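
The DCT-plus-hard-thresholding step can be sketched on its own: transform a segment with an orthonormal DCT-II, zero every coefficient whose magnitude falls below a threshold τ, and invert. The fixed τ, the synthetic segment, and the omission of the fully connected layer are simplifying assumptions; the paper's layer may learn its threshold.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct2(c):
    """Inverse of the orthonormal DCT-II (the orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def hard_threshold(c, tau):
    """Hard-thresholding nonlinearity: keep coefficients with |c| >= tau, zero the rest."""
    return [v if abs(v) >= tau else 0.0 for v in c]


# Synthetic stand-in for an EEG segment: a slow oscillation plus a small fast ripple.
n = 32
x = [math.cos(math.pi * (i + 0.5) * 2 / n) + 0.01 * math.cos(math.pi * (i + 0.5) * 20 / n)
     for i in range(n)]
coeffs = hard_threshold(dct2(x), tau=0.1)
kept = sum(1 for v in coeffs if v != 0.0)  # only nonzero coefficients need transmitting
x_hat = idct2(coeffs)
err = max(abs(a - b) for a, b in zip(x, x_hat))
print(kept, round(err, 3))  # 1 0.01
```

Here one of 32 coefficients survives, and the reconstruction error is bounded by the discarded ripple's amplitude.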

Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework for updating the VBx parameters using discriminative training, which directly optimizes a predefined loss.
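
To illustrate how the HMM models speaker turns, the sketch below builds a VBx-style transition matrix, A[i][j] = P_loop·[i = j] + (1 − P_loop)·π_j, from a loop probability and speaker priors π, and runs a normalized forward pass over toy frame likelihoods. This follows the VBx formulation as we understand it; it omits PLDA, the variational updates, the backward pass, and the paper's discriminative training, and all numbers are made up.

```python
def vbx_transitions(p_loop, priors):
    """Speaker-turn transition matrix: stay with probability p_loop, otherwise
    jump to speaker j according to its prior (which may also be the same speaker)."""
    s = len(priors)
    return [[p_loop * (i == j) + (1 - p_loop) * priors[j] for j in range(s)] for i in range(s)]


def forward_filter(likelihoods, trans, init):
    """Normalized HMM forward pass: p(speaker at t | x_1..x_t) for every frame t."""
    alpha = [p * l for p, l in zip(init, likelihoods[0])]
    z = sum(alpha)
    alpha = [a / z for a in alpha]
    out = [alpha]
    for lik in likelihoods[1:]:
        pred = [sum(alpha[i] * trans[i][j] for i in range(len(alpha))) for j in range(len(alpha))]
        alpha = [p * l for p, l in zip(pred, lik)]
        z = sum(alpha)
        alpha = [a / z for a in alpha]
        out.append(alpha)
    return out


A = vbx_transitions(p_loop=0.99, priors=[0.5, 0.5])
# Toy per-frame likelihoods p(x_t | speaker); frame 2 weakly favors speaker 1.
lik = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
post = forward_filter(lik, A, init=[0.5, 0.5])
print([round(p[0], 3) for p in post])
```

With a high loop probability the posterior stays with speaker 0 through the ambiguous frame, whereas with P_loop = 0 each frame is decided independently; this smoothing is what makes the loop probability an important parameter.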
