ICASSP 2018

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Joint License Plate Super-Resolution and Recognition in One Multi-Task GAN Framework

poster_ICASSP2018_MinghuiZhang.pdf (506)

Categories: Image/Video Processing

46 Views

Edge-aware Context Encoder for Image Inpainting

We present the Edge-aware Context Encoder (E-CE), an image inpainting model which takes scene structure and context into account. Unlike the previous CE, which predicts the missing regions using context from the entire image, E-CE learns to recover the texture according to edge structures, attempting to avoid context blending across boundaries. In our approach, edges are extracted from the masked image and completed by a fully convolutional network. The completed edge map, together with the original masked image, is then input into the modified CE network to predict the missing region.

Poster_ICASSP2018_0413.pdf

Poster: Edge-aware Context Encoder for Image Inpainting (558)

Categories: Image/Video Processing

260 Views

Attention-based Dialog State Tracking for Conversational Interview Coaching

This study proposes an approach to dialog state tracking (DST) in a conversational interview coaching system. For the interview coaching task, the semantic slots, used mostly in traditional dialog systems, are difficult to define manually. This study adopts the topic profile of the response from the interviewee as the dialog state representation. In addition, as the response generally consists of several sentences, the summary vector obtained from a long short-term memory neural network (LSTM) is likely to contain noisy information from many irrelevant sentences.

ICASSP2018_Poster_20180410-3_Wu.pdf (602)

Categories: Spoken and Multimodal Dialog Systems and Applications (SLP-SMMD)

36 Views

FINITE-ALPHABET NOMA FOR TWO-USER UPLINK CHANNEL

We consider the non-orthogonal multiple access (NOMA) design for a classical two-user multiple access channel (MAC) with finite-alphabet inputs. In contrast to the majority of existing NOMA schemes using continuous Gaussian distributed inputs, we consider practical quadrature amplitude modulation (QAM) constellations at both transmitters, whose sizes are not necessarily the same.
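
As a rough illustration of what finite-alphabet inputs imply on a two-user uplink, the sketch below superposes a 4-QAM and a 16-QAM constellation and measures the minimum distance of the resulting noiseless sum constellation. The helper names (qam, sum_constellation, min_distance) and the weights h1 and h2 are hypothetical choices for illustration only; they are not taken from the paper and do not reproduce its design.

```python
from itertools import product


def qam(m):
    """Square M-QAM points on an odd-integer grid (unnormalized power)."""
    side = int(round(m ** 0.5))
    if side * side != m:
        raise ValueError("m must be a perfect square")
    levels = [2 * i - (side - 1) for i in range(side)]
    return [complex(a, b) for a, b in product(levels, repeat=2)]


def sum_constellation(c1, c2, h1, h2):
    """Noiseless received points h1*s1 + h2*s2 for every symbol pair."""
    return [h1 * s1 + h2 * s2 for s1, s2 in product(c1, c2)]


def min_distance(points):
    """Smallest pairwise distance; 0.0 means two symbol pairs collide."""
    return min(abs(p - q) for i, p in enumerate(points) for q in points[i + 1:])


user1 = qam(4)    # user 1 sends 4-QAM
user2 = qam(16)   # user 2 sends 16-QAM (the sizes need not match)

# Hypothetical weights: scaling user 1 by 4 keeps all 64 pairs distinct.
separated = sum_constellation(user1, user2, h1=4, h2=1)
# Equal weights make several symbol pairs land on the same received point.
collided = sum_constellation(user1, user2, h1=1, h2=1)

print(len(set(separated)), min_distance(separated))
print(len(set(collided)), min_distance(collided))
```

When the minimum distance is zero, the receiver cannot tell some symbol pairs apart even without noise, which is why the relative scaling of the two finite constellations matters.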

poster.pdf (815)

Categories: Signal Processing for Communications and Networking
Communications and Networking

22 Views

SOURCE AND DIRECTION OF ARRIVAL ESTIMATION BASED ON MAXIMUM LIKELIHOOD COMBINED WITH GMM AND EIGENANALYSIS

This paper proposes a method for estimating a source signal and its direction of arrival (DOA). It is based on maximum likelihood (ML) estimation of the transfer function between microphones combined with the EM algorithm for a Gaussian mixture model (GMM), assuming that the signal is captured at each microphone with a delay corresponding to the travel time of sound and some decay. With this modeling, the search for the maximum log-likelihood in the ML estimation can be realized simply by eigenvalue decomposition of a properly designed matrix.

ICASSP2018_B0.pdf

Poster (487)

Categories: Sensor Array Processing

32 Views

Insense: Incoherent Sensor Selection for Sparse Signals

Sensor selection refers to the problem of intelligently selecting a small subset of a collection of available sensors to reduce the sensing cost while preserving signal acquisition performance. The majority of sensor selection algorithms find the subset of sensors that best recovers an arbitrary signal from a number of linear measurements that is larger than the dimension of the signal.

ICASSP_Insense_Poster.pdf (577)

Categories: Nonlinear Systems and Signal Processing

3 Views

Spatiotemporal Attention Based Deep Neural Networks for Emotion Recognition

We propose spatiotemporal attention based deep neural networks for dimensional emotion recognition in facial videos. To learn the spatiotemporal attention that selectively focuses on emotionally salient parts within facial videos, we formulate the spatiotemporal encoder-decoder network using Convolutional LSTM (ConvLSTM) modules, which can be learned implicitly without any pixel-level annotations. By leveraging the spatiotemporal attention, we also formulate the 3D convolutional neural networks (3D-CNNs) to robustly recognize the dimensional emotion in facial videos.

2018_ICASSP_jylee.pdf (447)

ICASSP18_camera_ready.pdf (531)

Categories: Image/Video Processing

83 Views

SPARSE DISPARITY ESTIMATION USING GLOBAL PHASE ONLY CORRELATION FOR STEREO MATCHING ACCELERATION

In this study, we propose an efficient stereo matching method which estimates sparse disparities using global phase only correlation (POC). Conventionally, cost functions must be calculated for all disparity candidates, and the associated computational cost has been an impediment to achieving real-time performance. Therefore, we consider using full-image 2D phase only correlation (FIPOC) to detect the valid disparity candidates. This requires comparatively fewer calculations for the same number of disparities.
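
To make the idea of phase only correlation concrete, here is a minimal one-dimensional sketch, assuming a circular shift between two short signals. It is not the authors' FIPOC method, which operates on full 2D images; the function names (dft, phase_only_correlation, estimate_shift) and the toy signals are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (sufficient for a short demo)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def phase_only_correlation(f, g, eps=1e-12):
    """POC surface: inverse DFT of the magnitude-normalized cross spectrum."""
    F, G = dft(f), dft(g)
    cross = [b * a.conjugate() for a, b in zip(F, G)]
    # Discard the magnitude so that only the phase difference remains.
    normalized = [c / max(abs(c), eps) for c in cross]
    return [v.real for v in dft(normalized, inverse=True)]


def estimate_shift(f, g):
    """Return d such that g[t] is approximately f[t - d] (circular)."""
    poc = phase_only_correlation(f, g)
    n = len(poc)
    peak = max(range(n), key=poc.__getitem__)
    # Peaks in the upper half of the index range are negative shifts.
    return peak if peak <= n // 2 else peak - n


left = [0, 1, 3, 2, 0, 0, 1, 0]
right = [left[(t - 3) % len(left)] for t in range(len(left))]  # shifted by 3
print(estimate_shift(left, right))
```

Because the correlation surface is computed once for the whole signal, its peaks indicate which shifts are plausible without evaluating a cost function for every candidate, which is the general motivation described in the abstract.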

poster_ICASSP_1.00.pdf

Shimada_4399 (459)

Categories: Image/Video Processing

45 Views

End-to-End Multimodal Speech Recognition

Transcription or sub-titling of open-domain videos is still a challenging domain for Automatic Speech Recognition (ASR) due to the data’s challenging acoustics, variable signal processing and the essentially unrestricted domain of the data. In previous work, we have shown that the visual channel – specifically object and scene features – can help to adapt the acoustic model (AM) and language model (LM) of a recognizer, and we are now expanding this work to end-to-end approaches.

icassp-poster-end.pdf (620)

Categories: General Topics in Speech Recognition (SPE-GASR)

16 Views

Multi-Task Autoencoder For Noise-Robust Speech Recognition

For speech recognition in noisy environments, we propose a multi-task autoencoder which estimates not only clean speech but also noise from noisy speech. We introduce the deSpeeching autoencoder, which excludes speech signals from noisy speech, and combine it with the conventional denoising autoencoder to form a unified multi-task autoencoder (MTAE). We evaluate it using the Aurora 2 data set and a 6-hour noise data set that we collected ourselves. It reduced WER by 15.7% compared with the conventional denoising autoencoder on the Aurora 2 test set A.

haoy-icassp18.pdf (549)

Categories: Robust Speech Recognition (SPE-ROBU)

125 Views
