ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.
Scene text detection has received attention for years and has achieved impressive performance across various benchmarks. In this work, we propose an efficient and accurate approach to detect multi-oriented text in scene images. The proposed feature fusion mechanism allows us to use a shallower network to reduce the computational complexity. A self-attention mechanism is adopted to suppress false positive detections.
High-dimensional data structures, known as tensors, are fundamental in many applications, including multispectral imaging and color video processing. Compressing the huge amounts of multidimensional data collected over time is of paramount importance, and it requires quantizing the measurements into discrete values. Furthermore, noise and issues related to the acquisition and transmission of signals frequently lead to unobserved, lost or corrupted measurements.
Presentation: Trapezoidal Segment Sequencing: A Novel Approach for Fusion of Human-produced Continuous Annotations
Generating accurate ground truth representations of human subjective experiences and judgements is essential for advancing our understanding of human-centered constructs such as emotions. Often, this requires the collection and fusion of annotations from several people where each one is subject to valuation disagreements, distraction artifacts, and other error sources.
DEEP MATRIX COMPLETION ON GRAPHS: APPLICATION IN DRUG TARGET INTERACTION PREDICTION
Reconstruction of FRI Signals Using Deep Neural Network Approaches
Finite Rate of Innovation (FRI) theory considers sampling and reconstruction of classes of non-bandlimited continuous signals that have a small number of free parameters, such as a stream of Diracs. The task of reconstructing FRI signals from discrete samples is often transformed into a spectral estimation problem and solved using Prony's method or the matrix pencil method, both of which involve estimating signal subspaces. These methods achieve the optimal performance given by the Cramér-Rao bound, yet break down at a certain peak signal-to-noise ratio (PSNR).
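As an illustration of the classical spectral estimation route mentioned above (not the deep neural network approach of the paper), the following is a minimal, noise-free sketch of Prony's method for a periodic stream of K Diracs. It assumes a unit period, real amplitudes, and that the Fourier coefficients X[m] = sum_k a_k exp(-2j*pi*m*t_k) have already been obtained from the samples; the function name `prony_diracs` and its parameters are hypothetical.

```python
import numpy as np

def prony_diracs(X, K):
    """Estimate locations t_k in [0, 1) and amplitudes a_k of K Diracs.

    Assumption: X[m] = sum_k a_k * u_k**m for m = 0..N-1,
    with u_k = exp(-2j*pi*t_k) and N >= 2*K (noise-free setting).
    """
    X = np.asarray(X, dtype=complex)
    N = len(X)
    # Toeplitz system T @ h = 0; row i is [X[i+K], X[i+K-1], ..., X[i]].
    T = np.array([X[i:i + K + 1][::-1] for i in range(N - K)])
    # The annihilating filter h spans the null space of T: take the right
    # singular vector belonging to the smallest singular value.
    _, _, Vh = np.linalg.svd(T)
    h = Vh[-1].conj()
    # The roots of h[0]*z**K + ... + h[K] are the u_k.
    u = np.roots(h)
    t = np.sort(np.mod(-np.angle(u) / (2 * np.pi), 1.0))
    # Amplitudes follow from a Vandermonde least-squares fit.
    V = np.exp(-2j * np.pi * np.outer(np.arange(N), t))
    a = np.linalg.lstsq(V, X, rcond=None)[0]
    return t, a.real  # real amplitudes are an assumption of this sketch
```

With noisy coefficients the null-space step becomes a subspace estimate, and that step is where the breakdown at low PSNR originates.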
Generalized Coherence-based Signal Enhancement
This contribution presents a novel approach for coherence-based signal enhancement. An estimator for the coherent-to-diffuse ratio (CDR) is devised, which exploits the concept of generalized magnitude coherence and thus, unlike common state-of-the-art schemes, can simultaneously take advantage of more than two microphones. Moreover, the speech enhancement by CDR-based spectral weighting is not performed as a post-filtering step, but by enhancing the most appropriate microphone signal.
A Turing machine is a model describing the fundamental limits of any realizable computer, digital signal processor (DSP), or field programmable gate array (FPGA). This paper shows that there exist very simple linear time-invariant (LTI) systems which cannot be simulated on a Turing machine. In particular, this paper considers the linear system described by the voltage-current relation of an ideal capacitor. For this system, it is shown that there exist continuously differentiable and computable input signals such that the output signal is a continuous function which is not computable.
WHAT IS BEST FOR SPOKEN LANGUAGE UNDERSTANDING: SMALL BUT TASK-DEPENDANT EMBEDDINGS OR HUGE BUT OUT-OF-DOMAIN EMBEDDINGS?
Word embeddings have been shown to be a great asset for several Natural Language and Speech Processing tasks. While they have already been evaluated on various NLP tasks, their evaluation on spoken or natural language understanding (SLU) is less studied. The goal of this study is two-fold: firstly, it focuses on the semantic evaluation of common word embedding approaches for the SLU task; secondly, it investigates the use of two different data sets to train the embeddings: a small, task-dependent corpus or a huge, out-of-domain corpus.
Mutual-Information-Based Sensor Placement for Spatial Sound Field Recording
A sensor (microphone) placement method based on mutual information for spatial sound field recording is proposed. Sound field recording methods using distributed sensors enable the estimation of the sound field inside a target region of arbitrary shape; however, finding the best placement of the sensors is a difficult task. We focus on the mutual-information-based sensor placement method in which spatial phenomena are modeled as a Gaussian process (GP).
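To make the idea concrete, here is a minimal sketch of the standard greedy mutual-information placement rule for a GP over a finite set of candidate positions. It is not necessarily the procedure used in the paper: the squared-exponential kernel, its length scale, the noise variance, and all function names are assumptions chosen for illustration, and a kernel tailored to sound fields would replace `rbf_kernel` in practice.

```python
import numpy as np

def rbf_kernel(P, length_scale=0.3, noise=1e-2):
    """Covariance of candidate positions P (n x d); the kernel is an assumption."""
    d2 = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2) + noise * np.eye(len(P))

def conditional_variance(K, y, S):
    """Variance of candidate y given observations at the index list S."""
    S = list(S)
    if not S:
        return K[y, y]
    K_SS = K[np.ix_(S, S)]
    k_Sy = K[S, y]
    return K[y, y] - k_Sy @ np.linalg.solve(K_SS, k_Sy)

def greedy_mi_placement(K, num_sensors):
    """Greedily pick indices that maximise the mutual-information gain."""
    n = K.shape[0]
    selected = []
    for _ in range(num_sensors):
        best, best_score = None, -np.inf
        for y in range(n):
            if y in selected:
                continue
            # Candidates that are neither selected nor the one under test.
            rest = [i for i in range(n) if i != y and i not in selected]
            # Ratio of the variance given the chosen sensors to the variance
            # given all remaining candidates; maximising it maximises the gain.
            score = (conditional_variance(K, y, selected)
                     / conditional_variance(K, y, rest))
            if score > best_score:
                best, best_score = y, score
        selected.append(best)
    return selected
```

Calling `greedy_mi_placement(rbf_kernel(P), m)` on an array `P` of candidate coordinates returns the indices of the `m` chosen positions. The greedy rule is used because searching for the exact optimum over all subsets is combinatorial.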