IEEE ICASSP 2024 - The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. IEEE ICASSP 2024 will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.
- LEVERAGING EFFECTIVE LANGUAGE AND SPEAKER CONDITIONING IN INDIC TTS FOR LIMMITS 2024 CHALLENGE
In this paper, we describe the model developed by the NLP_POSTECH team for the LIMMITS 2024 Grand Challenge. Of the three tracks, we focus on Track 1, which requires building a few-shot text-to-speech (TTS) system that generates natural speech across diverse languages. To realize multilingual capability, we incorporate a learnable language embedding. In addition, to imitate target speakers' voices precisely, we leverage an inductive speaker bias conditioning methodology.
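The abstract does not say where the learnable language embedding enters the model. A common pattern, assumed here rather than taken from the paper, keeps one trainable vector per language and adds it to every input token embedding. The minimal sketch below shows only that lookup-and-add step; `LanguageEmbedding`, `condition_on_language`, the language names, and the dimensions are hypothetical, and training is omitted.

```python
import random
from typing import Dict, List


class LanguageEmbedding:
    """Lookup table with one vector per language (a sketch, not the paper's module)."""

    def __init__(self, languages: List[str], dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.index: Dict[str, int] = {lang: i for i, lang in enumerate(languages)}
        # Small random initialisation; in a real TTS model these values would be
        # updated by gradient descent together with the other parameters.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in languages]

    def __call__(self, lang: str) -> List[float]:
        return self.table[self.index[lang]]


def condition_on_language(
    token_embs: List[List[float]], lang_vec: List[float]
) -> List[List[float]]:
    """Add the same language vector to every token embedding (broadcast add)."""
    return [[t + l for t, l in zip(tok, lang_vec)] for tok in token_embs]


# Hypothetical languages and toy phoneme embeddings of dimension 4.
emb = LanguageEmbedding(["hindi", "bengali", "english"], dim=4)
tokens = [[0.0] * 4, [1.0] * 4, [2.0] * 4]
conditioned = condition_on_language(tokens, emb("bengali"))
```

Adding (rather than concatenating) keeps the encoder input width unchanged; whether the paper adds, concatenates, or conditions elsewhere is not stated.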
- GENERATING PERSONA-AWARE EMPATHETIC RESPONSES WITH RETRIEVAL-AUGMENTED PROMPT LEARNING
Empathetic response generation requires perceiving and understanding the user’s emotion to deliver a suitable response. However, existing models generally remain oblivious to an interlocutor’s persona, which has been shown to play a vital role in expressing appropriate empathy to different users. To address this problem, we propose a novel Transformer-based architecture that incorporates retrieval-augmented prompt learning to generate persona-aware empathetic responses.
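The retrieval half of "retrieval-augmented prompting" can be illustrated generically: rank stored examples by similarity to a persona embedding and prepend the best matches to the prompt. This is a minimal sketch under stated assumptions, not the paper's architecture. The toy 2-D vectors stand in for a real sentence encoder, the text template is hypothetical, and a plain-text (hard) prompt is used even though the paper's "prompt learning" may involve learned prompt parameters. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Each memory entry pairs a (hypothetical) embedding vector with a stored example.
Memory = List[Tuple[Sequence[float], str]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Sequence[float], memory: Memory, k: int) -> List[str]:
    """Return the k stored examples most similar to the query embedding."""
    ranked = sorted(memory, key=lambda item: cosine(query, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(persona: str, context: str, examples: List[str]) -> str:
    """Prepend retrieved examples to the persona and dialogue context."""
    demos = "\n".join(f"Example: {ex}" for ex in examples)
    return f"{demos}\nPersona: {persona}\nContext: {context}\nEmpathetic response:"


memory: Memory = [
    ([0.9, 0.1], "I know how stressful exams can be for a student like you."),
    ([0.1, 0.9], "Losing a pet is hard, especially when they were family."),
]
persona_vec = [1.0, 0.0]  # toy embedding of the user's persona
examples = retrieve(persona_vec, memory, k=1)
prompt = build_prompt("university student", "I failed my midterm.", examples)
```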
- TEN-GUARD: TENSOR DECOMPOSITION FOR BACKDOOR ATTACK DETECTION IN DEEP NEURAL NETWORKS
As deep neural networks and the datasets used to train them grow larger, the default approach to integrating them into research and commercial projects is to download a pre-trained model and fine-tune it. However, these models can have uncertain provenance, opening up the possibility that they embed hidden malicious behavior such as trojans or backdoors, where small changes to an input (triggers) can cause the model to produce incorrect outputs (e.g., to misclassify). This paper
- Enhanced Axle-Based Vehicle Classification Using Angle-Based Micro-Doppler Signature
This study introduces an angle-based micro-Doppler analysis using frequency-modulated continuous-wave (FMCW) radar, tailored for axle-based vehicle classification. The novel approach exploits the signal's angle of arrival to separate incoming signals and noise originating from distinct targets. It does so by analysing the phase difference between the two antennas of a dual-antenna radar system in the time-frequency representation of the radar beat signal. As a result, vehicles driving side by side can be discriminated, and multipath signals and clutter are more easily identified and filtered out.
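The angle-of-arrival step can be sketched with the standard far-field two-element relation Δφ = 2πd·sin(θ)/λ, applied to one time-frequency bin observed at both antennas. This is a minimal sketch, not the paper's processing chain: the carrier wavelength (about 24 GHz), the half-wavelength spacing, the sign convention (antenna 1 as reference), and the function names are all assumptions.

```python
import cmath
import math


def phase_difference(rx1: complex, rx2: complex) -> float:
    """Phase of rx2 relative to rx1 (radians, in (-pi, pi]) for one time-frequency bin."""
    return cmath.phase(rx2 * rx1.conjugate())


def angle_of_arrival(delta_phi: float, wavelength_m: float, spacing_m: float) -> float:
    """Angle of arrival in degrees from the inter-antenna phase difference.

    Far-field two-element model: delta_phi = 2*pi*d*sin(theta)/lambda,
    so theta = asin(lambda * delta_phi / (2*pi*d)).
    """
    s = wavelength_m * delta_phi / (2.0 * math.pi * spacing_m)
    if not -1.0 <= s <= 1.0:
        raise ValueError("phase difference is inconsistent with the antenna spacing")
    return math.degrees(math.asin(s))


wavelength = 0.0125          # hypothetical: roughly a 24 GHz carrier
spacing = wavelength / 2.0   # hypothetical half-wavelength antenna spacing
bin_rx1 = complex(1.0, 0.0)
bin_rx2 = cmath.exp(1j * math.pi / 2)  # same bin, shifted by 90 degrees at antenna 2
theta = angle_of_arrival(phase_difference(bin_rx1, bin_rx2), wavelength, spacing)
```

Here a 90-degree phase shift at half-wavelength spacing maps to an arrival angle of about 30 degrees. Bins whose angles fall in different ranges could then be attributed to different targets or lanes; how the paper makes that assignment is not stated.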
- Vision Transformer MST++: Efficient Hyperspectral Skin Reconstruction
Channel reconstruction transforms a subsampled multispectral image into a hyperspectral one, offering the benefits of hyperspectral imaging without a dedicated camera. MST++ is a state-of-the-art channel reconstruction technique, but it faces memory limitations for high-spatial-resolution images. In this context, we introduce VITMST++, a novel architecture incorporating Vision Transformer embedding and compression, multi-resolution image context, and a channel-weighted loss. Developed for the ICASSP 2024 Hyperspectral Skin Challenge
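A channel-weighted loss can be read as a weighted average of per-channel reconstruction errors, so that some spectral bands count more than others. The abstract does not give the error measure or the weights, so this sketch assumes a per-channel mean absolute error with caller-supplied weights; the function name and example values are hypothetical.

```python
from typing import Sequence


def channel_weighted_l1(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted mean of per-channel mean absolute errors.

    pred and target are indexed [channel][pixel]; weights has one entry per channel.
    """
    if not (len(pred) == len(target) == len(weights)):
        raise ValueError("pred, target and weights must have the same number of channels")
    total = 0.0
    for p_ch, t_ch, w in zip(pred, target, weights):
        channel_mae = sum(abs(p - t) for p, t in zip(p_ch, t_ch)) / len(p_ch)
        total += w * channel_mae
    return total / sum(weights)


pred = [[1.0, 1.0], [0.0, 0.0]]    # two channels, two pixels each (toy data)
target = [[0.0, 0.0], [0.0, 0.0]]
loss = channel_weighted_l1(pred, target, weights=[3.0, 1.0])
```

With uniform weights this reduces to the ordinary mean absolute error over channels; non-uniform weights shift the training signal toward the emphasised bands.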
- KEEP_KNOWLEDGE_IN_PERCEPTION_slides
These are the slides for our paper KEEP KNOWLEDGE IN PERCEPTION: ZERO-SHOT IMAGE AESTHETIC ASSESSMENT, presented at ICASSP 2024.
- CO-OCCURRENCE GRAPH-ENHANCED HIERARCHICAL PREDICTION OF ICD CODES
Recent healthcare applications of natural language processing involve multi-label classification of health records using the International Classification of Diseases (ICD). While prior research emphasizes intricate text models and explores external knowledge such as the hierarchical ICD ontology, fewer studies integrate code relationships drawn from whole datasets to improve ICD coding accuracy. This study presents a modular approach that sequentially combines graph-based integration of ICD code co-occurrence with a hard-coded, hierarchy-enriched text representation drawn from the ICD ontology.
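The abstract does not say how the co-occurrence graph is built or weighted. One simple construction, assumed here, counts how often each pair of codes appears in the same record and normalises those counts into conditional probabilities that can serve as directed edge weights. The records are made-up toy data using ICD-10-style code strings, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrence_counts(records: Iterable[List[str]]) -> Counter:
    """Count how often each unordered pair of codes appears in the same record."""
    pairs: Counter = Counter()
    for codes in records:
        # sorted(set(...)) de-duplicates codes and gives each pair a canonical order
        for a, b in combinations(sorted(set(codes)), 2):
            pairs[(a, b)] += 1
    return pairs


def code_counts(records: Iterable[List[str]]) -> Counter:
    """Number of records in which each code appears."""
    return Counter(code for codes in records for code in set(codes))


def conditional_weight(pairs: Counter, singles: Counter, given: str, other: str) -> float:
    """Estimate P(other | given) as an edge weight for a directed co-occurrence graph."""
    key: Tuple[str, str] = (given, other) if given < other else (other, given)
    return pairs[key] / singles[given]


records = [["I10", "E11.9"], ["I10", "E11.9", "N18.3"], ["I10"]]
pairs = cooccurrence_counts(records)
singles = code_counts(records)
w = conditional_weight(pairs, singles, given="I10", other="E11.9")
```

The conditional form is asymmetric: here E11.9 always co-occurs with I10, while I10 co-occurs with E11.9 in only two of its three records.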
- Enabling Device Control Planning Capabilities of Small Language Model
Smart home device control is a difficult task when the instruction is abstract and the planner must adapt to dynamic home configurations. As large language models (LLMs) have grown more capable, they have become the customary choice for zero-shot planning tasks such as smart home device control. Although cloud-hosted large language models can perform device control tasks seamlessly, on-device small language models show limited capabilities. In this work, we show how large language models can be leveraged to enable small language models to perform device control tasks.
- ELECTROENCEPHALOGRAM SENSOR DATA COMPRESSION USING AN ASYMMETRICAL SPARSE AUTOENCODER WITH A DISCRETE COSINE TRANSFORM LAYER
Electroencephalogram (EEG) data compression is necessary in wireless recording applications to reduce the amount of data that must be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder combines a fully connected linear layer with the DCT layer and uses a hard-thresholding nonlinearity to remove redundant data.
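The fixed part of this idea, transforming a segment with a DCT and zeroing small coefficients by hard thresholding, can be sketched directly. This is a minimal sketch, not the proposed encoder: the trainable fully connected layer and the asymmetrical decoder are omitted, an orthonormal DCT-II is assumed, and the threshold and toy segment are hypothetical.

```python
import math
from typing import List, Sequence


def _scale(k: int, n: int) -> float:
    # Orthonormal DCT scaling factors
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D signal (O(n^2) reference version)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct_ii(c: Sequence[float]) -> List[float]:
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def hard_threshold(c: Sequence[float], tau: float) -> List[float]:
    """Zero every coefficient whose magnitude does not exceed tau."""
    return [v if abs(v) > tau else 0.0 for v in c]


segment = [1.0, 1.0, 1.0, 1.0]  # toy, perfectly redundant segment
coeffs = hard_threshold(dct_ii(segment), tau=1e-6)
recon = idct_ii(coeffs)
```

For this constant segment only the DC coefficient survives thresholding, yet the inverse transform still recovers the segment; only the non-zero coefficients (and their positions) would need to be transmitted.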
- dklement_dvbx_slides
Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework for updating the VBx parameters using discriminative training, which directly optimizes a predefined loss.
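The HMM over speaker turns can be made concrete through its transition matrix. In the VBx formulation as it is commonly described, the model stays with the current speaker with a loop probability and otherwise moves to a speaker chosen according to the speaker priors. This sketch builds only that matrix; `loop_prob` and `speaker_priors` are our own names, the values are hypothetical, and PLDA scoring, Bayesian inference, and the discriminative training proposed in the paper are not shown.

```python
from typing import List, Sequence


def speaker_turn_transitions(
    loop_prob: float, speaker_priors: Sequence[float]
) -> List[List[float]]:
    """HMM transition matrix A[i][j] = P(speaker j at t | speaker i at t-1).

    Stay with the current speaker with probability loop_prob; otherwise jump
    to speaker j (possibly the same one) with probability proportional to its prior.
    """
    if not 0.0 <= loop_prob <= 1.0:
        raise ValueError("loop_prob must lie in [0, 1]")
    total = sum(speaker_priors)
    pi = [p / total for p in speaker_priors]
    n = len(pi)
    return [
        [(loop_prob if i == j else 0.0) + (1.0 - loop_prob) * pi[j] for j in range(n)]
        for i in range(n)
    ]


A = speaker_turn_transitions(loop_prob=0.99, speaker_priors=[0.5, 0.3, 0.2])
```

A loop probability close to 1 encodes the prior that speaker turns last many x-vector frames, which discourages rapid speaker switching in the diarization output.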