Sorry, you need to enable JavaScript to visit this website.

ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2021 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit website.

Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art object segmentation network Mask R-CNN was applied to the decoded frame.

Categories:
4 Views

Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art object segmentation network Mask R-CNN was applied to the decoded frame.

Categories:
40 Views

Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity. Prior studies show that it is possible to disentangle emotional prosody using an encoder-decoder network conditioned on discrete representation, such as one-hot emotion labels. Such networks learn to remember a fixed set of emotional styles.

Categories:
33 Views

Computer Aided Diagnosis (CAD) systems are increasingly utilizing image analysis and Deep Learning (DL) techniques, due to their high accuracy in several medical imaging fields, including the detection of Acute Lymphoblastic (or Lymphocytic) Leukemia (ALL) from peripheral blood samples. However, no method in the literature has specifically analyzed the focus quality of ALL images or proposed a technique for sharpening the samples in an adaptive way for the purpose of classification.

Categories:
34 Views

Change detection plays a vital role in monitoring and analyzing temporal changes in Earth observation tasks. This paper proposes a novel adaptive multi-scale and multi-level features fusion network for change detection in very-high-resolution bi-temporal remote sensing images. The proposed approach has three advantages. Firstly, it excels in abstracting high-level representations empowered by a highly effective feature extraction module.

Categories:
19 Views

Online prediction for streaming time series data has practical use for many real-world applications where downstream decisions depend on accurate forecasts for the future. Deployment in dynamic environments requires models to adapt quickly to changing data distributions without overfitting. We propose POLA (Predicting Online by Learning rate Adaptation) to automatically regulate the learning rate of recurrent neural network models to adapt to changing time series patterns across time.

Categories:
9 Views

Pages