Welcome to ISCSLP 2016 - October 17-20, 2016, Tianjin, China

ISCSLP 2016 will be hosted by Tianjin University. Tianjin has a reputation throughout China for being extremely friendly and safe, and for its delicious food. We welcome you to Tianjin for ISCSLP 2016. The 10th International Symposium on Chinese Spoken Language Processing (ISCSLP 2016) will be held on October 17-20, 2016 in Tianjin. ISCSLP is a biennial conference for scientists, researchers, and practitioners to report and discuss the latest progress in all theoretical and technological aspects of spoken language processing. While ISCSLP focuses primarily on Chinese languages, work on other languages that may be applied to Chinese speech and language is also encouraged. The working language of ISCSLP is English.


Nasal Finals play an important role in distinguishing lexical meanings in Standard Chinese, but it is still unclear what the primary perceptual cues for nasal Finals are. The present study looks into this question, especially the primary perceptual cues for native Chinese listeners. We conducted two perceptual experiments with three-formant synthetic stimuli in which the second formant (F2) and the third formant (F3) were varied. Experiment I varied F2 and F3 simultaneously in the vowel part (including vowel nucleus and nasalized vowel).


Spoken keyword search in low-resource conditions suffers from the out-of-vocabulary (OOV) problem and from insufficient text data for language model (LM) training. Web-crawled text data is used to expand the vocabulary and to augment the language model. However, the mismatch between web text and the target speech data makes the web data difficult to use effectively. New words from web data need to be evaluated in order to exclude noisy words or to assign proper probabilities. In this paper, several criteria to rank new words from web data are investigated and are used as features


This paper establishes CTC-based systems for the Chinese Mandarin ASR task. Three levels of output units are explored: characters, context-independent phonemes, and context-dependent phonemes. To make training stable, we propose the Newbob-Trn strategy; furthermore, a blank label prior cost is proposed to improve performance. We also establish the CTC-trained UniLSTM-RC model, which meets the real-time requirement of an online system while bringing a performance gain on the Chinese Mandarin ASR task.


The boundary positions of /a/-/ɤ/ were significantly different among the four tone conditions, with much less identification of the /a/ category under the high-falling tone condition than under the other three tones. Moreover, the maximum identification scores of the /ɤ/ category were significantly lower under the falling-rising tone than under the other tone conditions. The relation between F0 and F1, as well as the substantial feature of the falling-rising tone, might account for these effects of tone categories on Mandarin vowel perception.


The present study investigated the production and perception of focus in the L2 Mandarin of Qiang speakers. Three target sentences were uttered by 10 Qiang-Mandarin speakers under four focus conditions: initial, medial, final, and neutral focus. Systematic acoustic analysis showed that: (1) In Qiang-Mandarin, on-focus words exhibit a significant F0 rise, intensity increase, and duration lengthening. There is no Post-focus Compression (PFC). The duration of pre-focus and post-focus words remains largely intact.


This paper proposes two types of machine-extracted linguistic features from unlimited text input for Mandarin prosody generation. One is the improved punctuation confidence (iPC), a modified version of the previously proposed punctuation confidence, which represents the likelihood of inserting major punctuation marks (PMs) at word boundaries. The other is the quotation confidence (QC), which measures the likelihood that a word string is quoted as a meaningful or emphasized unit.

