Sorry, you need to enable JavaScript to visit this website.

Generative models are popular tools with a wide range of applications. Nevertheless, it is as vulnerable to adversarial samples as classifiers. The existing attack methods mainly focus on generating adversarial examples by adding imperceptible perturbations to input, which leads to wrong result. However, we focus on another aspect of attack, i.e., cheating models by significant changes. The former induces Type II error and the latter causes Type I error. In this paper, we propose Type I attack to generative models such as VAE and GAN.

Categories:
50 Views

This paper introduces the application of a visual relationship classifier as a standalone system that is meant to be used with external detectors. Through these lens, we propose a training scheme that uses unannotated pairs of objects as negative samples in order to improve precision. The proposed network architecture incorporates common techniques presented in related state-of-the-art solutions with a novel positional encoding scheme.

Categories:
33 Views

Unsupervised Learning (UL) models are a class of Machine Learning (ML) which concerns with reducing dimensionality, data factorization, disentangling and learning the representations among the data. The UL models gain their popularity due to their abilities to learn without any predefined label, and they are able to reduce the noise and redundancy among the data samples.

Categories:
58 Views

Spatial-temporal graph convolutional networks (ST-GCN) have achieved outstanding performances on human action recognition, however, it might be less superior on a two-person interaction recognition (TPIR) task due to the relationship of each skeleton is not considered. In this study, we present an improvement of the ST-GCN model that focused on TPIR by employing the pairwise adjacency matrix to capture the relationship of person-person skeletons (ST-GCN-PAM). To validate the effectiveness of the proposed ST-GCN-PAM model on TPIR, experiments were conducted on NTU RGB+D 120.

Categories:
108 Views

This paper presents an approach to perform human activity recognition in videos through the employment of a deep recurrent network, taking as inputs appearance and optical flow information. Our method proposes a novel architecture named BubbleNET, which is based on a recurrent layer dispersed into several modules (referred to as bubbles) along with an attention mechanism based on squeeze-and-excitation strategy, responsible to modulate each bubble contribution.

Categories:
5 Views

State-of-the-art hearing aids (HA) are limited in recognizing acoustic environments. Much effort is spent on research to improve listening experience for HA users in every acoustic situation. There is, however, no dedicated public database to train acoustic environment recognition algorithms with a specific focus on HA applications accounting for their requirements. Existing acoustic scene classification databases are inappropriate for HA signal processing.

Categories:
200 Views

As handwriting input becomes more prevalent, the large symbol inventory required to support Chinese handwriting recognition poses unique challenges. This paper describes how the Apple deep learning recognition system can accurately handle up to 30,000 Chinese characters while running in real-time across a range of mobile devices.

Categories:
24 Views

We propose a general projection-free metric learning framework, where the minimization objective $\min_{\M \in \cS} Q(\M)$ is a convex differentiable function of the metric matrix $\M$, and $\M$ resides in the set $\cS$ of generalized graph Laplacian matrices for connected graphs with positive edge weights and node degrees.
Unlike low-rank metric matrices common in the literature, $\cS$ includes the important positive-diagonal-only matrices as a special case in the limit.

Categories:
28 Views

Pages