- Poster for ICASSP 2024 paper "Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion"
We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms single-modality baseline models. We also develop a novel multi-task instruction fine-tuning strategy to further benefit from LLM-encoded knowledge for understanding the tasks and conversational contexts, leading to additional improvements.
- Poster for ICASSP 2024 paper "Hot-Fixing Wake Word Recognition for End-to-End ASR via Neural Model Reprogramming"
This paper proposes two novel variants of neural reprogramming to enhance wake word recognition in streaming end-to-end ASR models without updating model weights. The first, "trigger-frame reprogramming", prepends the learned trigger frames of the target wake word to the input speech feature sequence, adjusting the ASR model's hidden states for improved wake word recognition. The second, "predictor-state initialization", trains only the initial state vectors (cell and hidden states) of the LSTMs in the prediction network.
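As a rough illustration of the prepending step only, the sketch below places a short block of trigger frames in front of an input feature sequence. The function name `prepend_trigger_frames`, the toy values, and the feature dimension are hypothetical; how the trigger frames are learned and how the frozen ASR model consumes them are not covered here.

```python
from typing import List

Frame = List[float]  # one acoustic feature vector (assumed layout, e.g. one filterbank frame)


def prepend_trigger_frames(trigger_frames: List[Frame], features: List[Frame]) -> List[Frame]:
    """Return a new sequence with the trigger frames placed before the input frames."""
    if trigger_frames and features and len(trigger_frames[0]) != len(features[0]):
        raise ValueError("trigger frames and input frames must share a feature dimension")
    # Copy each frame so the caller's lists are left untouched.
    return [list(f) for f in trigger_frames] + [list(f) for f in features]


# Hypothetical toy values: 2 trigger frames, 3 input frames, feature dimension 4.
trigger = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
utterance = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
augmented = prepend_trigger_frames(trigger, utterance)
print(len(augmented))
```

In this sketch only the trigger frames would be trainable; the model weights stay fixed, which matches the "without updating model weights" constraint stated in the abstract.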
- Improving Medical Dialogue Generation with Abstract Meaning Representations
Medical dialogue generation plays a critical role in telemedicine by facilitating the dissemination of medical expertise to patients. Existing studies focus on incorporating textual representations, which limits their ability to represent text semantics; for example, they can ignore important medical entities.
- Privacy Preserving Federated Learning from Multi-input Functional Proxy Re-encryption
Federated learning (FL) allows different participants to collaborate on model training without transmitting raw data, thereby protecting user data privacy. However, FL faces a series of security and privacy issues (e.g., the leakage of raw data from publicly shared parameters). Several privacy protection technologies, such as homomorphic encryption, differential privacy, and functional encryption, have been introduced to enhance privacy in FL. Among them, FL frameworks based on functional encryption better balance security and performance and are therefore receiving increasing attention.
- Poster for IMAGE ATTRIBUTION BY GENERATING IMAGES
We introduce GPNN-CAM, a novel method for CNN explanation that bridges two distinct areas of computer vision: Image Attribution, which aims to explain a predictor by highlighting image regions it finds important, and Single Image Generation (SIG), which focuses on learning how to generate variations of a single sample. GPNN-CAM leverages samples generated by Generative
- A UNIFIED DNN-BASED SYSTEM FOR INDUSTRIAL PIPELINE SEGMENTATION
This paper presents a unified system tailored for autonomous pipe segmentation within an industrial setting. To this end, it is designed to analyze RGB images captured by Unmanned Aerial Vehicle (UAV)-mounted cameras to predict binary pipe segmentation maps.
- AUDIO-VISUAL SPEECH RECOGNITION IN-THE-WILD: MULTI-ANGLE VEHICLE CABIN CORPUS AND ATTENTION-BASED METHOD
- Lightning Talk - Situation-Aware Transmit Beamforming for Automotive Radar
Millimeter-wave radar is a common sensor modality used in automotive driving for target detection and perception. These radars can benefit from side information on the environment being sensed, such as lane topologies or data from other sensors. Existing radars do not leverage this information to adapt waveforms or perform prior-aware inference. In this paper, we model the side information as an occupancy map and design transmit beamformers that are customized to the map. Our method maximizes the probability of detection in regions with a higher uncertainty on the presence of a target.
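To make the idea of "regions with a higher uncertainty" concrete, the following sketch scores each cell of an occupancy map by its binary entropy and normalizes the scores into weights. Using entropy as the uncertainty measure, and the names `binary_entropy` and `uncertainty_weights`, are assumptions made for illustration; the abstract does not state the actual design criterion or how the beamformer is optimized.

```python
import math
from typing import List


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a Bernoulli occupancy probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a cell known to be empty or occupied carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def uncertainty_weights(occupancy: List[float]) -> List[float]:
    """Normalize per-cell entropies so they sum to 1 (assumed weighting scheme)."""
    entropies = [binary_entropy(p) for p in occupancy]
    total = sum(entropies)
    if total == 0.0:
        # No uncertain cells: fall back to uniform weights.
        return [1.0 / len(occupancy)] * len(occupancy)
    return [h / total for h in entropies]


# Hypothetical occupancy map over four angular sectors.
occupancy_map = [0.0, 0.5, 1.0, 0.5]
weights = uncertainty_weights(occupancy_map)
print(weights)
```

Under this assumed scheme, sectors whose occupancy is already certain receive zero weight, so a beam design driven by these weights would concentrate transmit energy on the ambiguous sectors.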
- DISCOVERING MALICIOUS SIGNATURES IN SOFTWARE FROM STRUCTURAL INTERACTIONS
Malware represents a significant security concern in today's digital landscape, as it can destroy or disable operating systems, steal sensitive user information, and occupy valuable disk space.
However, current malware detection methods, such as static-based and dynamic-based approaches, struggle to identify newly developed ("zero-day") malware and are limited by customized virtual machine (VM) environments.
To overcome these limitations, we propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science.
- MLPs Compass: What is Learned When MLPs are Combined with PLMs?
While Transformer-based pre-trained language models (PLMs) and their variants exhibit strong semantic representation capabilities, how to understand the information gain derived from the additional components of PLMs remains an open question. Motivated by recent work showing that Multilayer Perceptron (MLP) modules achieve robust structural capture capabilities, even outperforming Graph Neural Networks (GNNs), this paper aims to quantify whether simple MLPs can further enhance the already potent ability of PLMs to capture linguistic information.