Sorry, you need to enable JavaScript to visit this website.

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit website.

In this paper, we consider the design of deep neural networks augmented with multiple auxiliary classifiers departing from the main (backbone) network. These classifiers can be used to perform early-exit from the network at various layers, making them convenient for energy-constrained applications such as IoT, embedded devices, or Fog computing. However, designing an optimized early-exit strategy is a difficult task, generally requiring a large amount of manual fine-tuning.

Categories:
108 Views

Multiple-object tracking (MOT) and classification are core technologies for processing moving point clouds in radar or lidar applications. For accurate object classification, the one-to-one association relationship between the model of each objects' motion (trackers) and the observation sequences including auxiliary features (e.g., radar cross section) is important.

Categories:
64 Views

Multiple-object tracking (MOT) and classification are core technologies for processing moving point clouds in radar or lidar applications. For accurate object classification, the one-to-one association relationship between the model of each objects' motion (trackers) and the observation sequences including auxiliary features (e.g., radar cross section) is important.

Categories:
39 Views

Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) is a perceptual-driven approach for single image super-resolution that is able to produce photorealistic images. Despite the visual quality of these generated images, there is still room for improvement. In this fashion, the model is extended to further improve the perceptual quality of the images. We have designed a network architecture with a novel basic block to replace the one used by the original ESRGAN. Moreover, we introduce noise inputs to the generator network in order to exploit stochastic variation.

Categories:
233 Views

Phoneme boundary detection plays an essential first step for a variety of speech processing applications such as speaker diarization, speech science, keyword spotting, etc. In this work, we propose a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection. First, we evaluated our model when the spoken phonemes were not given as input.

Categories:
33 Views

Convolutional Neural Networks (CNNs) currently represent the best tool for classification of image content. CNNs are trained in order to develop generalized expressions in form of unique features to distinguish different classes. During this process, one or more filter weights might develop the same or similar values. In this case, the redundant filters can be pruned without damaging accuracy.Unlike normal pruning methods, we investigate the possibility of replacing a full-sized convolutional neural network with an ensemble of its narrow versions.

Categories:
64 Views

Depression detection from speech continues to attract significant research attention but remains a major challenge, particularly when the speech is acquired from diverse smartphones in natural environments. Analysis methods based on vocal tract coordination have shown great promise in depression and cognitive impairment detection for quantifying relationships between features over time through eigenvalues of multi-scale cross-correlations.

Categories:
47 Views

Audio steganography is a technology that embeds messages into audio without raising any suspicion from hearing it. Current steganography methods are based on heuristic cost designs. In this work, we proposed a framework based on Generative Adversarial Network (GAN) to approach optimal embedding for audio steganography in the temporal domain. This is the first attempt to approach optimal embedding with GAN and automatically learn the embedding probability/cost for audio steganography.

Categories:
145 Views

Pages