The bilateral cavities of the piriform fossa are side branches of the vocal tract and produce anti-resonance(s) in the transfer function. This effect is known for male vocal tracts, but data for female vocal tracts have been scarce. This study investigates the contributions of the piriform fossa to vowel spectra in female vocal tracts by means of MRI-based vocal-tract modeling and acoustic experiments using the water-filling technique. Results from three female subjects indicate that the piriform fossa generates one or two dips in the 4-6 kHz frequency region.
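
A minimal worked sketch of why a side cavity yields a spectral dip, assuming the piriform fossa can be idealized as a uniform tube closed at its lower end (an assumption for illustration, not the study's MRI-based model): the branch's input impedance vanishes when its depth L equals an odd number of quarter wavelengths, which shunts the main tract and produces anti-resonances at f_n = (2n - 1)c / (4L). The speed of sound (350 m/s) and the depths below are illustrative values, not measurements from the three subjects.

```python
SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air in the vocal tract


def side_branch_zeros(depth_m, count=2, c=SPEED_OF_SOUND):
    """Anti-resonance frequencies (Hz) of a closed-end side branch, idealized as a
    uniform quarter-wavelength resonator: f_n = (2n - 1) * c / (4 * L)."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return [(2 * n - 1) * c / (4.0 * depth_m) for n in range(1, count + 1)]


def depth_for_zero(freq_hz, c=SPEED_OF_SOUND):
    """Inverse relation: branch depth (m) whose first anti-resonance lies at freq_hz."""
    return c / (4.0 * freq_hz)


for depth_cm in (1.5, 1.8, 2.0):  # hypothetical depths, for illustration only
    first = side_branch_zeros(depth_cm / 100.0, count=1)[0]
    print(f"{depth_cm:.1f} cm -> first dip near {first / 1000:.2f} kHz")
```

Under these assumptions, a first dip anywhere in the 4-6 kHz region corresponds to a branch depth of roughly 1.5-2.2 cm (`depth_for_zero(6000)` to `depth_for_zero(4000)`). Real piriform fossae are not uniform tubes, so this only gives an order-of-magnitude estimate.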

This paper presents a multi-channel/multi-speaker 3D audiovisual
corpus for Mandarin continuous speech recognition and other
fields such as speech visualization and speech synthesis.
The corpus contains about 18k utterances from 24 speakers,
about 20 hours in total. For each utterance, the audio
streams were recorded by two professional microphones, one in
the near field and one in the far field, while a marker-based 3D
facial motion capture system with six infrared cameras was

In conventional frame-feature-based music genre
classification methods, audio data are represented as
independent frames, and the sequential nature of audio is
ignored entirely. If this sequential information is modeled
well and incorporated, classification performance can be
significantly improved. The long short-term memory (LSTM)
recurrent neural network (RNN), which uses a set of special
memory cells to model long-range feature sequences, has been
successfully used for many sequence labeling and sequence

The growing number of commercial automatic speech recognition applications has been driven by big-data techniques that make use of high-quality labelled speech datasets. Children's speech shows greater time- and frequency-domain variability than typical adult speech, lacks training material of comparable depth and breadth, and presents difficulties related to capture quality. All of these factors reduce the achievable performance of systems that recognise children's speech.

In mood disorder diagnosis, patients with bipolar disorder (BD) are often misdiagnosed as having unipolar depression (UD) on initial presentation. Accurately distinguishing BD from UD is crucial for a correct and early diagnosis, which leads to improvements in treatment and in the course of illness. To address this misdiagnosis problem, this study elicited subjects' emotions using six emotion-eliciting video clips. After each clip, the subjects' speech responses were collected during an interview with a clinician.

In spoken Mandarin, some consonant and vowel pairs are hard to distinguish and to pronounce clearly, even for some native speakers. This study investigates, from a signal processing point of view, the signal distance between paired consonants in order to reveal the correlation between signal distance and consonant pronunciation. Several popular objective speech quality measures are applied, in a novel way, to obtain the signal distance.
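
The abstract does not name the objective measures used, so the sketch below uses the log-spectral distance (LSD), one widely used spectral measure, as an example of turning two speech segments into a single signal distance. The frame length, hop size, Hann window, sampling rate, and the assumption that the two consonant segments are already time-aligned are illustrative choices; real recordings would usually need alignment (for example, dynamic time warping) first.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame via a direct DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)))
                for k, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_spectral_distance(frame_a, frame_b, eps=1e-12):
    """RMS difference in dB between the log power spectra of two equal-length frames."""
    pa, pb = power_spectrum(frame_a), power_spectrum(frame_b)
    sq = [(10 * math.log10(a + eps) - 10 * math.log10(b + eps)) ** 2 for a, b in zip(pa, pb)]
    return math.sqrt(sum(sq) / len(sq))


def segment_distance(sig_a, sig_b, frame_len=256, hop=128):
    """Mean frame-wise LSD over two segments, assumed to be time-aligned already."""
    n = min(len(sig_a), len(sig_b))
    if n < frame_len:
        raise ValueError("segments are shorter than one frame")
    dists = [log_spectral_distance(sig_a[s:s + frame_len], sig_b[s:s + frame_len])
             for s in range(0, n - frame_len + 1, hop)]
    return sum(dists) / len(dists)


# Usage with synthetic stand-ins (hypothetical; real input would be consonant recordings).
FS = 16000  # assumed sampling rate in Hz


def tone(freq_hz, length=512):
    return [math.sin(2 * math.pi * freq_hz * t / FS) for t in range(length)]


d_same = segment_distance(tone(1000), tone(1000))  # identical signals give 0.0
d_diff = segment_distance(tone(1000), tone(3000))  # different spectra give a positive distance
```

LSD is symmetric and zero for identical segments, so it behaves like a distance between pronunciations; other objective measures (for example, cepstral distance) could be substituted in `log_spectral_distance` without changing the segment-level averaging.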

Speech production requires coordinated control of different articulatory organs. In natural speech, articulatory co-variation is more common than compensation, but few studies support this view. This study examined the coordination of lip and tongue articulation during speech using articulatory data. Native speakers of Chinese served as subjects. The speech materials consisted of short Chinese sentences containing words with the cardinal vowels at different positions in the sentence, produced with and without emphasis.

This study examines the relationship between the perception
and production of Mandarin tones by Kazak minor learners
from China. An eight-day perceptual training course on
Mandarin tones is designed. Perception is assessed with an
identification test. Production data are collected at both
pretest and post-test and evaluated by native speakers of
Mandarin Chinese. The pretest and post-test perception
results show that training Kazak learners to perceive
Mandarin tones is effective, with

The Directions into Velocities of Articulators (DIVA) model is a self-adaptive neural network model that controls the movements of a simulated vocal tract to produce words, syllables, or phonemes. However, the DIVA model lacks emotion functions. To implement an emotion function in the DIVA model, we investigate the process of affective speech production based on the combination of fundamental frequency (F0) and formant frequencies, as well as the relations between F0 and formants in emotional speech.
