
Multimodal signal processing

Multi-Layer Content Interaction Through Quaternion Product for Visual Question Answering


Multi-modality fusion technologies have greatly improved the performance of neural network-based Video Description/Captioning, Visual Question Answering (VQA) and Audio-Visual Scene-aware Dialog (AVSD) in recent years. Most previous approaches exploit only the last layer of multi-layer feature fusion and overlook the importance of the intermediate layers. To address this, we propose an efficient Quaternion Block Network (QBN) that learns interactions not only at the last layer but at all intermediate layers simultaneously.
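
At the heart of such a block is the quaternion (Hamilton) product, which multiplies two 4-component features so that every component of one operand mixes with every component of the other. A minimal sketch of the operation on split feature vectors follows; the layer shapes, names, and the way features are divided into quaternion components are illustrative assumptions, not the paper's actual code.

```python
import torch

def hamilton_product(q, p):
    """Quaternion (Hamilton) product of two feature tensors.

    q, p: tensors of shape (..., 4*d); the last dimension is split into
    the four quaternion components (r, x, y, z), each of size d.
    Every component of q interacts with every component of p, which is
    the dense cross-component mixing quaternion-based fusion exploits.
    """
    r1, x1, y1, z1 = q.chunk(4, dim=-1)
    r2, x2, y2, z2 = p.chunk(4, dim=-1)
    r = r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2
    x = r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2
    y = r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2
    z = r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2
    return torch.cat([r, x, y, z], dim=-1)

# Hypothetical use: fuse per-layer question and image features of size 4*d.
text_feat = torch.randn(8, 512)   # batch of question features
img_feat = torch.randn(8, 512)    # batch of visual features
fused = hamilton_product(text_feat, img_feat)  # shape (8, 512)
```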

Paper Details

Authors: Lei Shi, Shijie Geng, Kai Shuang, Chiori Hori, Songxiang Liu, Peng Gao, Sen Su
Submitted On: 16 May 2020 - 1:27am
Short Link: http://sigport.org/5370
Document Files: Icassp2020_Multi-Layer_Content_Interaction_Through_Quaternion_Product_for_Visual_Question_Answering

WHAT MAKES THE SOUND?: A DUAL-MODALITY INTERACTING NETWORK FOR AUDIO-VISUAL EVENT LOCALIZATION


The presence of auditory and visual senses enables humans to obtain a profound understanding of the real-world scenes. While audio and visual signals are capable of providing scene knowledge individually, the combination of both offers a better insight about the underlying event. In this paper, we address the problem of audio-visual event localization where the goal is to identify the presence of an event that is both audible and visible in a video, using fully or weakly supervised learning.
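
The abstract does not spell out the dual-modality interacting network itself, so the sketch below is only a rough illustration of the general idea: audio and visual segment features attend to each other before a per-segment event score is produced. All module names, dimensions, and the attention-based design are our assumptions.

```python
import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    """Let audio segments attend to visual segments and vice versa."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.event_head = nn.Linear(2 * dim, 1)  # audible-and-visible score

    def forward(self, audio, visual):
        # audio, visual: (batch, segments, dim)
        a, _ = self.a2v(audio, visual, visual)   # audio queries visual
        v, _ = self.v2a(visual, audio, audio)    # visual queries audio
        joint = torch.cat([a, v], dim=-1)
        return torch.sigmoid(self.event_head(joint)).squeeze(-1)

model = CrossModalInteraction()
scores = model(torch.randn(2, 10, 256), torch.randn(2, 10, 256))  # (2, 10)
```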

Paper Details

Authors: Janani Ramaswamy
Submitted On: 16 May 2020 - 1:09am
Short Link: http://sigport.org/5369
Document Files: What_makes_the_sound_ICASSP2020.pdf

Spectrogram Analysis Via Self-Attention for Realizing Cross-Modal Visual-Audio Generation


Human cognition is supported by the combination of multi-modal information from different sources of perception. The two most important modalities are visual and audio. Cross-modal visual-audio generation enables the synthesis of data in one modality following the acquisition of data in another, bringing about the full experience that only the combination of the two can achieve. In this paper, the self-attention mechanism is applied to cross-modal visual-audio generation for the first time.
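
As a hedged illustration of applying self-attention to spectrograms (the layer names and dimensions below are our assumptions, not the paper's architecture), each time frame can attend to every other frame, capturing long-range structure that convolutions alone tend to miss:

```python
import torch
import torch.nn as nn

class SpectrogramSelfAttention(nn.Module):
    """Scaled dot-product self-attention over spectrogram time frames."""
    def __init__(self, n_mels=128, dim=256):
        super().__init__()
        self.proj = nn.Linear(n_mels, dim)
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, spec):
        # spec: (batch, frames, n_mels) -- every frame attends to all
        # other frames, so distant parts of the signal can interact.
        h = self.proj(spec)
        attn = torch.softmax(
            self.q(h) @ self.k(h).transpose(1, 2) / h.size(-1) ** 0.5, dim=-1)
        return attn @ self.v(h)

layer = SpectrogramSelfAttention()
out = layer(torch.randn(4, 100, 128))  # (4, 100, 256)
```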

Paper Details

Authors: Huadong Tan, Guang Wu, Pengcheng Zhao, Yanxiang Chen
Submitted On: 13 May 2020 - 9:29pm
Short Link: http://sigport.org/5179
Document Files: SA-CMGAN

Intra Prediction in the Emerging VVC Video Coding Standard

Paper Details

Authors: Alexey Filippov, Vasily Rufitskiy, Jianle Chen, and Elena Alshina
Submitted On: 30 March 2020 - 10:14am
Short Link: http://sigport.org/5061
Document Files: Intra Prediction in the Emerging VVC Video Coding Standard

Lightweight Deep Convolutional Neural Networks for Facial Expression Recognition

Paper Details

Submitted On: 25 September 2019 - 12:50am
Short Link: http://sigport.org/4838
Document Files: MMSP_poster_A0_v3.2.pdf

An Occlusion Probability Model for Improving the Rendering Quality of Views


Occlusion, a common phenomenon on object surfaces, can seriously affect the collection of light-field information. Most prior light field rendering (LFR) algorithms idealize and neglect occlusions when visualizing light-field datasets. However, incorrect samples captured at occlusion discontinuities can cause the 3D spatial structure of some features to be lost. To solve this problem, we propose an occlusion probability (OCP) model that improves both the captured information and the rendering quality of views with occlusion for the LFR.
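
The abstract does not give the model's form, so the sketch below is only an illustrative guess at the general idea: assign each candidate light-field sample a probability of being occlusion-free and use it to weight that sample's contribution when rendering a view. The Gaussian depth-consistency test and all names are our assumptions, not the authors' formulation.

```python
import numpy as np

def render_pixel(samples, depths, ref_depth, sigma=0.1):
    """Blend light-field samples, down-weighting likely-occluded ones.

    samples:   (n,) candidate color samples for one output ray
    depths:    (n,) depth estimates of the surfaces those samples hit
    ref_depth: depth of the surface the output ray should see

    A sample whose depth disagrees with ref_depth probably comes from an
    occluder, so it receives a low occlusion-free probability.
    """
    p_visible = np.exp(-((depths - ref_depth) ** 2) / (2 * sigma ** 2))
    w = p_visible / p_visible.sum()   # normalized per-sample weights
    return float(w @ samples)

# The third sample's depth nearly matches ref_depth, so it dominates.
color = render_pixel(np.array([0.8, 0.2, 0.81]),
                     np.array([1.0, 0.4, 1.02]), ref_depth=1.0)
```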

Paper Details

Submitted On: 20 September 2019 - 5:41am
Short Link: http://sigport.org/4772
Document Files: occlusion_MMSP2019.pdf

FAST: Flow-Assisted Shearlet Transform for Densely-sampled Light Field Reconstruction


The Shearlet Transform (ST) is one of the most effective methods for Densely-Sampled Light Field (DSLF) reconstruction from a Sparsely-Sampled Light Field (SSLF). However, ST requires a precise disparity estimation of the SSLF. To this end, a state-of-the-art optical flow method, PWC-Net, is employed to estimate bidirectional disparity maps between neighboring views in the SSLF. Moreover, to take full advantage of optical flow and ST for DSLF reconstruction, we propose a novel learning-based method, referred to as Flow-Assisted Shearlet Transform (FAST).
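
A hedged sketch of the disparity-estimation step follows. Since we do not model PWC-Net's real interface, the flow estimator is a stand-in stub; only the bidirectional pairing of neighboring views reflects the description above.

```python
import numpy as np

def estimate_flow(view_a, view_b):
    """Stand-in for a learned optical-flow estimator such as PWC-Net.
    Returns a per-pixel horizontal displacement map; a real system would
    run the network here (we do not model PWC-Net's actual API)."""
    return np.zeros(view_a.shape[:2])  # placeholder: zero displacement

def bidirectional_disparity(views):
    """Forward/backward disparity between neighboring SSLF views.
    For a horizontally sampled light field, the horizontal optical-flow
    component between adjacent views is the disparity that guides the
    shearlet-transform reconstruction."""
    fwd = [estimate_flow(a, b) for a, b in zip(views[:-1], views[1:])]
    bwd = [estimate_flow(b, a) for b, a in zip(views[:-1], views[1:])]
    return fwd, bwd

views = [np.random.rand(64, 64, 3) for _ in range(5)]  # sparse view stack
fwd, bwd = bidirectional_disparity(views)  # 4 forward + 4 backward maps
```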

Paper Details

Authors: Reinhard Koch, Robert Bregovic, Atanas Gotchev
Submitted On: 16 September 2019 - 12:24pm
Short Link: http://sigport.org/4643
Document Files: ICIP2019_FAST.pdf

AUDIO FEATURE GENERATION FOR MISSING MODALITY PROBLEM IN VIDEO ACTION RECOGNITION


Despite the recent success of multi-modal action recognition in videos, in practice we often face situations where some data are unavailable beforehand, especially multi-modal data. For example, while both vision and audio data are required for multi-modal action recognition, audio tracks in videos are easily lost due to broken files or device limitations. To cope with this sound-missing problem, we present an approach that simulates deep audio features from spatial-temporal vision data alone.
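
A minimal sketch of such feature simulation, under our own assumptions about shapes and architecture: a small network regresses the deep audio features a pretrained audio model would produce, trained on videos that still have sound and applied when the track is missing.

```python
import torch
import torch.nn as nn

class AudioFeatureGenerator(nn.Module):
    """Hypothetical generator mapping spatio-temporal visual features to
    the deep audio features a pretrained audio network would produce."""
    def __init__(self, vis_dim=1024, audio_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vis_dim, 512), nn.ReLU(),
            nn.Linear(512, audio_dim))

    def forward(self, vis_feat):
        return self.net(vis_feat)

gen = AudioFeatureGenerator()
vis = torch.randn(16, 1024)          # e.g., pooled 3D-CNN clip features
real_audio = torch.randn(16, 128)    # e.g., features from an audio network
loss = nn.functional.mse_loss(gen(vis), real_audio)  # regression objective
loss.backward()                      # train on videos that still have sound
```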

Paper Details

Authors: Hu-Cheng Lee, Chih-Yu Lin, Pin-Chun Hsu, Winston H. Hsu
Submitted On: 14 May 2019 - 5:08am
Short Link: http://sigport.org/4504
Document Files: 20190516_AUDIO_FEATURE_GENERATION_FOR_MISSING_MODALITY_PROBLEM_IN_VIDEO_ACTION_RECOGNITION.pptx

Dynamic Temporal Alignment of Speech to Lips

Paper Details

Authors: Shmuel Peleg
Submitted On: 8 May 2019 - 2:14am
Short Link: http://sigport.org/4017
Document Files: ICASSP 2019 poster.pdf

Learning Shared Vector Representations of Lyrics and Chords in Music


Music has a powerful influence on a listener's emotions. In this paper, we represent lyrics and chords in a shared vector space using a phrase-aligned chord-and-lyrics corpus. We show that models that use these shared representations predict the emotion a listener feels while hearing musical passages better than models that do not. Additionally, we conduct a visual analysis of these learnt shared vector representations and explain how they support existing theories in music.
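
The abstract does not state the training objective, so the following is only an illustrative sketch of learning a shared space from a phrase-aligned corpus: two encoders map lyric phrases and chord sequences into one vector space, trained so aligned pairs score higher than mismatched ones (an InfoNCE-style assumption on our part; all names and sizes are hypothetical).

```python
import torch
import torch.nn as nn

class PhraseEncoder(nn.Module):
    """Encode a token sequence (lyric words or chord symbols) to a vector."""
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, tokens):                 # (batch, seq_len) token ids
        _, h = self.rnn(self.emb(tokens))
        return nn.functional.normalize(h[-1], dim=-1)   # (batch, dim)

lyric_enc = PhraseEncoder(vocab_size=10000)
chord_enc = PhraseEncoder(vocab_size=100)     # chord symbols as tokens

lyrics = torch.randint(0, 10000, (8, 12))     # phrase-aligned pairs: row i
chords = torch.randint(0, 100, (8, 6))        # of lyrics matches row i here
sim = lyric_enc(lyrics) @ chord_enc(chords).t()        # (8, 8) similarities
loss = nn.functional.cross_entropy(sim / 0.1, torch.arange(8))
loss.backward()                               # pull aligned pairs together
```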

Paper Details

Authors: Timothy Greer, Karan Singla, Benjamin Ma, and Shrikanth Narayanan
Submitted On: 7 May 2019 - 8:12pm
Short Link: http://sigport.org/3971
Document Files: Learning_Shared_Reps_ICASSP_Pres_2(1).pdf
