Current talking avatars mostly generate co-speech gestures from the audio and text of the utterance, without considering the speaker's non-speaking motion. Furthermore, previous work on co-speech gesture generation has designed network structures around individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements.

Video conferencing tools have seen a significant increase in usage in the past few years, but they still consume substantial bandwidth, from ~100 Kbps to a few Mbps. In this work, we present Txt2Vid-Web: a practical, web-based, low-bandwidth video conferencing platform that builds upon the Txt2Vid work [1].

Bipolar Disorder (BD) is one of the most common mental illnesses. Rating scales are one approach to diagnosing and tracking BD patients, but the evaluation process demands considerable manpower and time. To reduce the cost in social and medical resources, this study collects user data through a smartphone app and builds a database of heterogeneous digital phenotyping data: location data, self-report scales, daily mood, sleeping time, and multimedia records.

Elderly people commonly face health problems related to a sedentary lifestyle, so their physical strength, mental capability, and motor skills decline. Moreover, overweight and physical problems are becoming a serious health concern around the world. Elderly people also suffer from social isolation, which directly affects their physical and mental health. Gamification for elderly people has emerged as a way to motivate them to exercise and socialize with their peers through social interaction on mobile devices.

Eye trackers are found on various electronic devices. In this paper, we propose to exploit the gaze information acquired by an eye tracker for depth estimation. The data collected from the eye tracker in a fixation interval are used to estimate the depth of a gazed object. The proposed method can be used to construct a sparse depth map of an augmented reality space. The resulting depth map can be applied to, for example, controlling the visual information displayed to the viewer.

We explore new aspects of assistive living in smart human-robot interaction (HRI) that involve automatic recognition and online validation of speech and gestures in a natural interface, providing social features for HRI. We introduce a complete framework and resources for a real-life scenario in which elderly subjects are supported by an assistive bathing robot, addressing health and hygiene care issues.

Lips deliver visually active clues for speech articulation. Affective states define how humans articulate speech; hence, they also change the articulation of lip motion. In this paper, we investigate the effect of phonetic classes on affect recognition from lip articulations. The affect recognition problem is formalized in terms of discrete activation, valence, and dominance attributes. We use the symmetric Kullback-Leibler divergence (KLD) to identify phonetic classes with larger discrimination across different affective states. We perform experimental evaluations using the IEMOCAP database.
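As a rough illustration of the symmetric KLD mentioned above, the sketch below computes it for two discrete distributions. It assumes the common definition D(P||Q) + D(Q||P); the abstract does not say which variant (sum or average) or which feature distributions are used, so the function names, the `eps` smoothing term, and the toy histograms are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(p || q) in nats.

    eps is an assumed smoothing constant that avoids log(0) and division
    by zero; it is not taken from the abstract.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kld(p, q):
    """Symmetric KLD, assumed here to be D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical histograms of one lip feature for a single phonetic class,
# observed under two different affective states.
low_activation = [0.7, 0.2, 0.1]
high_activation = [0.2, 0.3, 0.5]

score = symmetric_kld(low_activation, high_activation)
```

Under this reading, a phonetic class whose distributions differ more between affective states gets a larger score, which is what makes the measure usable for comparing classes.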

Natural and affective handshakes of two participants define the course of dyadic interaction. Affective states of the participants are expected to be correlated with the nature of the dyadic interaction. In this paper, we extract two classes of dyadic interaction based on temporal clustering of affective states. We use k-means temporal clustering to define the interaction classes, and use a support vector machine based classifier to estimate the interaction class types from multimodal (speech and motion) features.

We propose a new algorithm for source localization on rigid surfaces, which allows one to convert everyday objects into human-computer touch interfaces using surface-mounted vibration sensors. This is achieved by estimating the time differences of arrival (TDOA) of the signals across the sensors. In this work, we employ a smooth parametrized function to model the gradual noise-to-signal energy transition at each sensor. Specifically, the noise-to-signal transition is modeled by a four-parameter logistic function.
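A minimal sketch of a four-parameter logistic function is given below. The abstract does not state the parametrization, so the one used here (lower level, upper level, slope, midpoint) is an assumption, as are all names and values. Treating the fitted midpoint as a per-sensor arrival time, and differencing midpoints to get a TDOA, is likewise only one plausible reading and not necessarily the authors' estimator.

```python
import math


def logistic4(t, low, high, slope, midpoint):
    """Four-parameter logistic (assumed parametrization).

    low      : energy level before the transition (noise floor)
    high     : energy level after the transition (signal level)
    slope    : steepness of the transition
    midpoint : time at which the curve is halfway between low and high
    """
    z = slope * (t - midpoint)
    # Evaluate the sigmoid in a numerically stable way for large |z|.
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return low + (high - low) * s


def tdoa_from_midpoints(midpoint_a, midpoint_b):
    """Hypothetical TDOA: difference of fitted transition midpoints."""
    return midpoint_a - midpoint_b


# Illustrative values only: two sensors whose energy envelopes rise at
# different times (in seconds).
energy_at_onset = logistic4(0.010, low=1.0, high=5.0, slope=2000.0, midpoint=0.010)
delay = tdoa_from_midpoints(0.010, 0.0125)
```

In practice the four parameters would be fitted to each sensor's measured energy envelope; the fitting step is omitted here because the abstract does not describe it.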
