Recent advancements in text-to-image diffusion models have demonstrated remarkable success, yet they often struggle to fully capture the user's intent.
Existing approaches that combine textual inputs with bounding boxes or region masks fall short of providing precise spatial guidance, often leading to misaligned or unintended object orientations.
To address these limitations, we propose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that utilizes simple user-provided scribbles as visual prompts to guide image generation.

Current talking avatars mostly generate co-speech gestures from the audio and text of the utterance, without considering the speaker's non-speaking motion. Furthermore, previous work on co-speech gesture generation has designed network structures around individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements.

Video conferencing tools have seen a significant increase in usage in the past few years, yet they still consume substantial bandwidth, from roughly 100 kbps to a few Mbps. In this work, we present Txt2Vid-Web: a practical, web-based, low-bandwidth video conferencing platform built upon the Txt2Vid work [1].

Bipolar Disorder (BD) is one of the most common mental illnesses. Using rating scales is one approach for diagnosing and tracking BD patients, but the evaluation process demands considerable manpower and time. To reduce the cost of social and medical resources, this study collects users' data through a smartphone app, comprising location data, self-report scales, daily mood, sleeping time, and multimedia records, heterogeneous digital phenotyping data used to build a database.

Elderly people commonly face health problems related to a sedentary lifestyle, which diminishes their physical strength, mental capability, and motor skills. Moreover, obesity and related physical problems are becoming serious health concerns around the world. Many elderly people also suffer from social isolation, which directly affects their physical and mental health. Gamification for the elderly has emerged to motivate them to exercise and to socialize with their peers through social interaction on mobile devices.

Eye trackers are found on various electronic devices. In this paper, we propose to exploit the gaze information acquired by an eye tracker for depth estimation. The data collected from the eye tracker in a fixation interval are used to estimate the depth of a gazed object. The proposed method can be used to construct a sparse depth map of an augmented reality space. The resulting depth map can be applied to, for example, controlling the visual information displayed to the viewer.
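
The abstract does not specify how depth is recovered from fixation data; one common approach, shown here as a minimal sketch and not necessarily the paper's method, is binocular vergence triangulation: intersect the two eyes' gaze rays and read off the depth of their closest-approach midpoint. The eye positions, gaze directions, and the `gaze_intersection` helper are all hypothetical.

```python
import numpy as np

def gaze_intersection(e1, d1, e2, d2):
    """Midpoint of closest approach between two gaze rays.

    e1, e2: 3-D eye positions; d1, d2: gaze direction vectors.
    Standard two-line triangulation; undefined for parallel rays.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = e1 - e2
    b = d1 @ d2                  # cosine between the gaze directions
    d = d1 @ w
    e = d2 @ w
    denom = 1.0 - b * b          # zero only for parallel gaze rays
    t = (b * e - d) / denom      # parameter along the left ray
    s = (e - b * d) / denom      # parameter along the right ray
    return (e1 + t * d1 + e2 + s * d2) / 2

# Toy example: eyes 6 cm apart, both fixating a point 2 m ahead
eye_l = np.array([-0.03, 0.0, 0.0])
eye_r = np.array([0.03, 0.0, 0.0])
target = np.array([0.0, 0.0, 2.0])
point = gaze_intersection(eye_l, target - eye_l, eye_r, target - eye_r)
depth = point[2]
```

Averaging such estimates over a fixation interval, as the abstract describes, would smooth out eye-tracker noise before the point is entered into the sparse depth map.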

We explore new aspects of assistive living in smart human-robot interaction (HRI), involving automatic recognition and online validation of speech and gestures in a natural interface that provides social features for HRI. We introduce a complete framework and resources for a real-life scenario in which elderly subjects are supported by an assistive bathing robot, addressing health and hygiene care issues.

Lips deliver visually active cues for speech articulation. Affective states define how humans articulate speech; hence, they also change the articulation of lip motion. In this paper, we investigate the effect of phonetic classes on affect recognition from lip articulations. The affect recognition problem is formulated over discrete activation, valence, and dominance attributes. We use the symmetric Kullback-Leibler divergence (KLD) to rank phonetic classes by their discrimination across different affective states. We perform experimental evaluations on the IEMOCAP database.
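
The symmetric KLD score mentioned above can be sketched as follows; this is a minimal illustration assuming discretized lip-feature histograms per phonetic class, with the histogram values here invented as a toy example (the paper's actual features and binning are not specified in the abstract).

```python
import numpy as np

def symmetric_kld(p, q, eps=1e-12):
    """Symmetric KL divergence D(p||q) + D(q||p) between two
    discrete distributions; eps guards against zero bins."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

# Toy lip-feature histograms for one phonetic class under two
# affective states; a larger score means the class discriminates
# better between the states.
hist_state_a = [0.1, 0.4, 0.5]
hist_state_b = [0.3, 0.3, 0.4]
score = symmetric_kld(hist_state_a, hist_state_b)
```

Ranking phonetic classes by this score, as the abstract describes, selects the classes whose lip-motion statistics differ most across affective states.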

Natural and affective handshakes between two participants define the course of a dyadic interaction, and the participants' affective states are expected to correlate with its nature. In this paper, we extract two classes of dyadic interaction based on temporal clustering of affective states. We use k-means temporal clustering to define the interaction classes, and a support vector machine (SVM) classifier to estimate the interaction class from multimodal (speech and motion) features.
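
The two-stage pipeline (k-means clustering to define interaction classes, then an SVM to predict them from multimodal features) can be sketched as below. All data here are synthetic stand-ins: the affective-state features, the multimodal feature dimensions, and the train-on-all evaluation are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical per-window affective-state features (e.g. activation,
# valence); two synthetic groups stand in for two interaction types.
affect = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
                    rng.normal(3.0, 1.0, (50, 2))])

# Stage 1: k-means over the affective trajectories defines the two
# interaction classes used as labels.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(affect)

# Stage 2: an SVM estimates the interaction class from multimodal
# (speech + motion) features; here a toy 6-D feature vector.
multimodal = np.hstack([affect, rng.normal(0.0, 1.0, (100, 4))])
clf = SVC(kernel="rbf").fit(multimodal, labels)
acc = clf.score(multimodal, labels)
```

In practice the SVM would be evaluated on held-out dyads rather than the training windows, but the two-stage structure matches what the abstract describes.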
