
Audio and Acoustic Signal Processing

The influence of syllable structure and prosodic strengthening on consonant production in Shanghai Chinese



Paper Details

Authors:
Bijun Ling, Jie Liang
Submitted On:
15 October 2016 - 4:47am
Short Link:
http://sigport.org/1225


A Regression Approach to Binaural Speech Segregation via Deep Neural Network


This paper proposes a novel regression approach to binaural speech segregation based on a deep neural network (DNN). In contrast to the conventional ideal binary mask (IBM) method, which uses a DNN with the interaural time difference (ITD) and interaural level difference (ILD) as auditory features, the log-power spectra (LPS) of the target speech are predicted directly by a regression DNN that takes the concatenated monaural LPS and binaural features as its input.
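
As a rough illustration of this setup, the sketch below builds such a regression network in PyTorch (an assumption; the paper does not name a toolkit here). The layer sizes, feature dimensions, and training data are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

N_LPS = 257          # LPS bins per frame (assumed, e.g. a 512-point STFT)
N_BINAURAL = 64      # binaural (ITD/ILD) feature dimension (assumed)

# Regression DNN: concatenated monaural LPS + binaural features in,
# target-speech LPS out (no binary mask anywhere in the pipeline).
model = nn.Sequential(
    nn.Linear(N_LPS + N_BINAURAL, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, N_LPS),
)

# One training step: minimise MSE between predicted and clean target LPS.
noisy = torch.randn(32, N_LPS + N_BINAURAL)   # dummy batch of input frames
clean = torch.randn(32, N_LPS)                # dummy clean LPS targets
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
opt.step()
```

At run time, the predicted LPS would typically be combined with the noisy phase to resynthesise a waveform, as is common in LPS-domain enhancement.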

Paper Details

Authors:
Nana Fan, Jun Du, Lirong Dai
Submitted On:
14 October 2016 - 11:07pm
Short Link:
http://sigport.org/1207


The Design and Implementation of HMM-based Dai Speech Synthesis


More than 1.2 million Dai people in Yunnan province speak the Dai language, so research on Dai speech synthesis is of great significance for advancing the informatization of Dai. This paper describes an implementation of Dai speech synthesis built on the HMM speech-synthesis framework and the STRAIGHT synthesizer. The collection and selection of a Dai text corpus, the recording of the speech corpus, text normalization, segmentation, Romanization, and acoustic model training are described.

Paper Details

Authors:
Wang Zhan, Yang Jian, Yang Xin
Submitted On:
14 October 2016 - 11:30am
Short Link:
http://sigport.org/1204


Contributions of the Piriform Fossa of Female Speakers to Vowel Spectra


The bilateral cavities of the piriform fossa are side branches of the vocal tract and produce anti-resonances in its transfer function. This effect is well documented for male vocal tracts, but female data have been scarce. This study investigates the contributions of the piriform fossa to vowel spectra in female vocal tracts by means of MRI-based vocal-tract modeling and acoustic experiments using the water-filling technique. Results from three female subjects indicate that the piriform fossa generates one or two spectral dips in the 4-6 kHz region.
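
For a quick plausibility check of that frequency range, a side branch closed at one end acts approximately as a quarter-wavelength resonator, producing an anti-resonance near f = c/(4L). The sketch below evaluates this for a few assumed fossa depths; the depths are illustrative, not the paper's MRI measurements.

```python
# Quarter-wavelength estimate of a side-branch anti-resonance: f = c / (4 * L).
C = 350.0  # approx. speed of sound in warm, humid vocal-tract air, m/s

for depth_cm in (1.5, 1.75, 2.0):      # assumed fossa depths, not measured values
    L = depth_cm / 100.0               # branch length in metres
    f = C / (4.0 * L)
    print(f"depth {depth_cm:.2f} cm -> anti-resonance near {f:.0f} Hz")

# Depths of 1.5-2 cm give roughly 5.8 kHz down to 4.4 kHz, consistent with
# the 4-6 kHz dips reported above.
```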

Paper Details

Authors:
Congcong Zhang, Kiyoshi Honda, Ju Zhang, Jianguo Wei
Submitted On:
15 October 2016 - 12:24am
Short Link:
http://sigport.org/1203


A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin


This paper presents a multi-channel, multi-speaker 3D audio-visual corpus for Mandarin continuous speech recognition and related fields such as speech visualization and speech synthesis. The corpus comprises 24 speakers and about 18k utterances, roughly 20 hours in total. For each utterance, the audio streams were recorded by two professional microphones, in near-field and far-field respectively, while a marker-based 3D facial motion capture system with six infrared cameras recorded the accompanying facial movements.

Paper Details

Authors:
Jun Yu, Rongfeng Su, Lan Wang, Wenpeng Zhou
Submitted On:
14 October 2016 - 10:40am
Short Link:
http://sigport.org/1200


Long Short-term Memory Recurrent Neural Network based Segment Features for Music Genre Classification


In conventional frame-feature-based music genre classification, the audio is represented by independent frames, and the sequential nature of audio is ignored entirely. If this sequential knowledge is well modeled and combined, classification performance can be significantly improved. The long short-term memory (LSTM) recurrent neural network (RNN), which uses a set of special memory cells to model long-range feature sequences, has been used successfully for many sequence labeling and sequence prediction tasks.
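
As a minimal sketch of this idea (assuming PyTorch; the abstract does not give the toolkit or dimensions), an LSTM reads the frame-feature sequence of a segment, and its final hidden state serves as a segment-level feature for genre classification:

```python
import torch
import torch.nn as nn

# Sizes below are illustrative assumptions, not the paper's configuration.
N_FRAME_FEATS, N_HIDDEN, N_GENRES = 40, 128, 10

lstm = nn.LSTM(N_FRAME_FEATS, N_HIDDEN, batch_first=True)
classifier = nn.Linear(N_HIDDEN, N_GENRES)

frames = torch.randn(8, 300, N_FRAME_FEATS)   # 8 segments, 300 frames each (dummy)
_, (h_n, _) = lstm(frames)                    # h_n: final hidden state per segment
logits = classifier(h_n[-1])                  # segment-level genre scores
print(logits.shape)                           # torch.Size([8, 10])
```

The final hidden state summarises the whole frame sequence, which is precisely the sequential knowledge that independent-frame classifiers discard.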

Paper Details

Authors:
Jia Dai, Shan Liang, Wei Xue, Chongjia Ni, Wenju Liu
Submitted On:
14 October 2016 - 9:18am
Short Link:
http://sigport.org/1195


Mismatched Training Data Enhancement for Automatic Recognition of Children’s Speech using DNN-HMM


The increasing profusion of commercial automatic speech recognition applications has been driven by big-data techniques that make use of high-quality labelled speech datasets. Children's speech displays greater time- and frequency-domain variability than typical adult speech, lacks the depth and breadth of training material available for adult speech, and presents difficulties relating to capture quality. All of these factors reduce the achievable performance of systems that recognise children's speech.

Paper Details

Authors:
Ian McLoughlin, Wu Guo, Lirong Dai
Submitted On:
14 October 2016 - 5:48am
Short Link:
http://sigport.org/1186


Detection of Mood Disorder Using Speech Emotion Profiles and LSTM


In mood disorder diagnosis, bipolar disorder (BD) patients are often misdiagnosed with unipolar depression (UD) on initial presentation. Establishing an accurate distinction between BD and UD is crucial for a correct and early diagnosis, leading to improvements in treatment and course of illness. To address this misdiagnosis problem, this study elicited subjects' emotions using six emotion-eliciting video clips; after watching each clip, the subjects' speech responses were collected during an interview with a clinician.

Paper Details

Authors:
Tsung-Hsien Yang, Kun-Yi Huang, and Ming-Hsiang Su
Submitted On:
14 October 2016 - 9:30pm
Short Link:
http://sigport.org/1183


The Correlation Between Signal Distance and Consonant Pronunciation in Mandarin Words


In spoken Mandarin, some consonant and vowel pairs are hard to distinguish and pronounce clearly, even for some native speakers. This study investigates the signal distance between consonants compared in pairs, from a signal processing point of view, to reveal the correlation between signal distance and consonant pronunciation. Several popular objective speech-quality measures are applied in a novel way to obtain the signal distance.
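
As one concrete example of such an objective measure (the abstract does not list which measures the paper uses), the log-spectral distance between two aligned recordings can be computed from their short-time power spectra. A minimal sketch, assuming NumPy; the frame parameters and signals are placeholders:

```python
import numpy as np

def log_spectral_distance(x, y, n_fft=512, hop=256, eps=1e-10):
    """Mean log-spectral distance (dB) between two aligned signals."""
    win = np.hanning(n_fft)
    dists = []
    for i in range(0, min(len(x), len(y)) - n_fft, hop):
        X = np.abs(np.fft.rfft(win * x[i:i + n_fft])) ** 2
        Y = np.abs(np.fft.rfft(win * y[i:i + n_fft])) ** 2
        d = 10 * np.log10((X + eps) / (Y + eps))   # per-bin level difference, dB
        dists.append(np.sqrt(np.mean(d ** 2)))     # RMS across frequency bins
    return float(np.mean(dists))

# Dummy signals standing in for two consonant recordings (illustrative only).
rng = np.random.default_rng(0)
a, b = rng.standard_normal(16000), rng.standard_normal(16000)
print(log_spectral_distance(a, b))
```

A larger distance between a pair of consonant recordings would indicate that the pair is more acoustically separable, which is the kind of correlate the study examines.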

Paper Details

Authors:
Huijun Ding, Chenxi Xie, Lei Zeng, Yang Xu, Guo Dan
Submitted On:
14 October 2016 - 12:32am
Short Link:
http://sigport.org/1180


Pronunciation Error Detection using DNN Articulatory Model based on Multi-lingual and Multi-task Learning

Paper Details

Submitted On:
13 October 2016 - 9:31am
Short Link:
http://sigport.org/1175

