ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Recent studies have shown that convolutional neural networks (CNNs) can boost the performance of audio steganalysis. In this paper, we propose a fully CNN-based architecture for MP3 steganalysis based on rich high-pass filtering (HPF). On the one hand, multiple types of HPFs are employed for "residual" extraction to amplify the traces of the embedded signal, since the signal introduced by secret messages can be regarded as high-frequency noise.
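The residual-extraction idea can be illustrated with a minimal sketch. The two difference kernels and the names `HPF_BANK`, `residual`, and `rich_residuals` are assumptions chosen for illustration; the abstract does not list the filters the authors use, and the sketch works on a plain 1-D sequence rather than on MP3 data.

```python
# Illustrative high-pass kernels (assumed, not taken from the paper):
# a first-order and a second-order difference filter.
HPF_BANK = {
    "first_order": [-1, 1],
    "second_order": [1, -2, 1],
}


def residual(signal, kernel):
    """Slide the kernel over the signal (valid positions only) and return the filtered values."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def rich_residuals(signal):
    """Apply every kernel in the bank and return one residual sequence per filter."""
    return {name: residual(signal, kernel) for name, kernel in HPF_BANK.items()}


if __name__ == "__main__":
    cover = [10, 10, 10, 10, 10, 10]   # smooth content
    stego = [10, 10, 10, 11, 10, 10]   # one sample perturbed by a hypothetical embedding
    print(rich_residuals(cover))
    print(rich_residuals(stego))
```

In this toy case the smooth cover sequence is suppressed to zeros, while the single perturbed sample survives in every residual, which is the effect the high-pass stage is meant to produce before classification.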

Moving platforms enable sparse arrays to achieve higher degrees of freedom and lead to an increased number of lags. In essence, array motion can fill the holes in the spatial autocorrelation lags associated with a fixed platform and, therefore, increase the number of sources detectable by the same number of physical array sensors. In this paper, we consider coprime arrays and assume quasi-stationarity of the environment, where the source locations and waveforms are considered invariant over an array motion of half a wavelength.
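The hole-filling effect can be checked numerically with a small sketch. It assumes a prototype coprime array with sensors at multiples of M and N (in half-wavelength units) and models the motion, under the quasi-stationarity assumption, as adding a copy of the array shifted by one unit. The function names and the choice M = 3, N = 5 are illustrative and do not come from the paper.

```python
from math import gcd


def coprime_positions(m, n):
    """Sensor positions (half-wavelength units) of a prototype coprime array."""
    if gcd(m, n) != 1:
        raise ValueError("m and n must be coprime")
    return sorted({m * k for k in range(n)} | {n * k for k in range(m)})


def difference_lags(positions):
    """Set of all pairwise position differences (the difference coarray)."""
    return {a - b for a in positions for b in positions}


def lags_with_motion(positions, shift=1):
    """Lags of the synthetic array formed by the original and the shifted positions."""
    combined = set(positions) | {p + shift for p in positions}
    return difference_lags(combined)


def holes(lags):
    """Integer lags missing between the smallest and largest available lag."""
    return [x for x in range(min(lags), max(lags) + 1) if x not in lags]


if __name__ == "__main__":
    pos = coprime_positions(3, 5)
    print("positions:", pos)
    print("holes, fixed platform:", holes(difference_lags(pos)))
    print("holes, after half-wavelength motion:", holes(lags_with_motion(pos)))
```

For this configuration the fixed array misses the lags ±8 and ±11, and the shifted copy supplies them, so the synthetic array has a hole-free set of lags from -13 to 13.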

Polar codes have drawn much attention and been adopted in 5G New Radio (NR) due to their capacity-achieving performance. Recently, as deep learning (DL) has achieved breakthroughs in many fields, neural network decoders have been proposed to obtain faster convergence and better performance than belief propagation (BP) decoding. However, neural networks are memory-intensive, which hinders the deployment of DL in communication systems. In this work, a low-complexity recurrent neural network (RNN) polar decoder with codebook-based weight quantization is proposed.
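As a rough illustration of codebook-based weight quantization in general, the sketch below clusters a list of weights into a small codebook and stores only the index of the nearest entry for each weight. Building the codebook with 1-D k-means is an assumption made here for illustration; the abstract does not say how the authors construct theirs, and all function names are hypothetical.

```python
def nearest(value, codebook):
    """Index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(value - codebook[i]))


def build_codebook(weights, k, iters=20):
    """Fit k shared values to the weights with 1-D k-means (assumed method)."""
    lo, hi = min(weights), max(weights)
    # Start with centroids spread evenly over the weight range.
    codebook = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for w in weights:
            buckets[nearest(w, codebook)].append(w)
        # Move each centroid to the mean of its bucket; keep it if the bucket is empty.
        codebook = [sum(b) / len(b) if b else c for b, c in zip(buckets, codebook)]
    return codebook


def quantize(weights, codebook):
    """Replace each weight by the index of its nearest codebook entry."""
    return [nearest(w, codebook) for w in weights]


def dequantize(indices, codebook):
    """Recover approximate weights from stored indices."""
    return [codebook[i] for i in indices]


if __name__ == "__main__":
    weights = [-1.0, -0.9, -1.1, 1.0, 0.9, 1.1]
    codebook = build_codebook(weights, k=2)
    indices = quantize(weights, codebook)
    print(codebook, indices, dequantize(indices, codebook))
```

With a codebook of k entries, each weight needs only about log2(k) bits for its index plus the shared codebook itself, which is where the memory saving comes from.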

Video object tracking (VOT) in realistic scenarios is a difficult task. Image factors such as occlusion, clutter, confusion, object shape, and zooming, among others, affect the performance of video trackers. While these conditions do affect tracker performance, there is no clear distinction between scene-content challenges, such as occlusion and clutter, and challenges due to distortions generated by the capture, compression, processing, and transmission of videos. This paper is concerned with the latter interpretation of quality as it affects VOT performance.

The localization of acoustic sources benefits signal processing applications such as speech enhancement, dereverberation, separation, and tracking. Position estimation is difficult in real-world environments because coherent reflections degrade the performance of subspace localization techniques. This paper proposes a multiple signal classification (MUSIC) subspace localization method that is suitable for reverberant rooms. The method is based on the modal decomposition of a room's region-to-region transfer function, which is assumed to be known.

Music has a powerful influence on a listener's emotions. In this paper, we represent lyrics and chords in a shared vector space using a phrase-aligned chord-and-lyrics corpus. We show that models that use these shared representations predict a listener's emotion while hearing musical passages better than models that do not use these representations. Additionally, we conduct a visual analysis of these learnt shared vector representations and explain how they support existing theories in music.

Active noise cancellation (ANC) is a well-researched topic for minimizing unwanted acoustic noise, and spatial ANC is a recently introduced concept that focuses on continuous spatial regions. Adaptive filter design for spatial ANC is often based on a frequency-domain spherical harmonic decomposition method, whose major limitation is increased system latency. In this paper, we develop a time-domain spherical-harmonic-based signal decomposition method and use it to develop two time-space domain feed-forward adaptive filters for spatial ANC.

This work proposes a new neural network framework to simultaneously rank multiple hypotheses generated by one or more automatic speech recognition (ASR) engines for a speech utterance. The features fed into the framework include not only those calculated from the ASR information but also natural language understanding (NLU) related features, such as trigger features capturing long-distance constraints between word/slot pairs and BLSTM features representing intent-sensitive sentence embeddings.

Despite great success in learning representations for image data, it remains challenging to learn stochastic latent features from natural language using variational inference. The difficulty in stochastic sequential learning stems from posterior collapse, caused by an autoregressive decoder that tends to be so strong that insufficient latent information is learned during optimization. To compensate for this weakness in the learning procedure, a sophisticated latent structure is required to ensure good convergence so that random features are sufficiently captured for sequential decoding.

Despite great advances, most recently developed automatic speech recognition systems are designed to work in a server-client manner and thus often incur a high computational cost, for example in storage size and memory accesses. This, however, does not satisfy the increasing demand for a compact model that can run smoothly on embedded devices such as smartphones.
