- Mixture Model Auto-Encoders: Deep Clustering through Dictionary Learning
- Neural Collapse in Deep Homogeneous Classifiers and the Role of Weight Decay
Neural Collapse is a phenomenon recently discovered in deep classifiers where the last-layer activations collapse onto their class means, while the means and last-layer weights take on the structure of dual equiangular tight frames. In this paper, we present results showing the role of weight decay in the emergence of Neural Collapse in deep homogeneous networks. We show that certain near-interpolating minima of deep networks satisfy the Neural Collapse condition, and that this can be derived from the gradient flow on the regularized square loss.
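As a rough illustration of the two collapse properties described above, the following sketch (not from the paper; the `features` and `labels` names and the exact metric definitions are assumptions) measures within-class variability relative to between-class variability and how far the centered class means are from a simplex equiangular tight frame:

```python
# Hedged sketch: two commonly used Neural Collapse signatures on last-layer
# features. `features` is (num_samples, dim) and `labels` holds integer class
# ids; both names and the metrics are illustrative assumptions.
import numpy as np

def neural_collapse_metrics(features, labels):
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])

    # NC1: within-class variability relative to between-class variability.
    within = np.mean([
        np.mean(np.sum((features[labels == c] - class_means[i]) ** 2, axis=1))
        for i, c in enumerate(classes)
    ])
    between = np.mean(np.sum((class_means - global_mean) ** 2, axis=1))

    # NC2: centered, normalized class means approach a simplex equiangular
    # tight frame, i.e. pairwise cosines of -1/(K-1).
    centered = class_means - global_mean
    normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    cosines = normed @ normed.T
    k = len(classes)
    off_diag = cosines[~np.eye(k, dtype=bool)]
    etf_gap = np.max(np.abs(off_diag + 1.0 / (k - 1)))
    return within / between, etf_gap
```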
- Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training
Transformer-based architectures have been the subject of research aimed at understanding their overparameterization and the non-uniform importance of their layers. Applying these approaches to Automatic Speech Recognition, we demonstrate that state-of-the-art Conformer models generally have multiple ambient layers. We study the stability of these layers across runs and model sizes, propose that group normalization may be used without disrupting their formation, and examine their correlation with the model weight updates in each layer.
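One way to probe this per-layer heterogeneity, sketched below purely as an illustration (the checkpoint names and the use of relative parameter-update norms are assumptions, not the paper's procedure), is to compare how much each layer's parameters move between two training checkpoints:

```python
# Hedged sketch: relative per-parameter-tensor movement between two checkpoints
# of the same model; layers that barely move are candidates for cheaper handling.
import torch

def per_layer_relative_update(state_before, state_after):
    updates = {}
    for name, w_before in state_before.items():
        w_after = state_after[name]
        if not torch.is_floating_point(w_before):
            continue  # skip integer buffers such as step counters
        denom = w_before.norm() + 1e-12
        updates[name] = ((w_after - w_before).norm() / denom).item()
    return updates

# Hypothetical usage with two saved state dicts:
# updates = per_layer_relative_update(torch.load("ckpt_10k.pt"), torch.load("ckpt_20k.pt"))
# print(sorted(updates.items(), key=lambda kv: kv[1])[:10])
```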
- Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers
It has recently been shown that, despite their strong performance across many fields, deep neural networks are vulnerable to adversarial examples. In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers. The adversarial perturbation in our method is constrained to be block-sparse, so that the resulting adversarial example differs from the original sentence in only a few words. Due to the discrete nature of textual data, we perform gradient projection to find the minimizer of our proposed optimization problem.
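As one hedged reading of the block-sparsity constraint (not the authors' code; the function and shapes below are assumptions), a dense perturbation of the word embeddings can be projected onto a block-sparse set by keeping only the few word positions with the largest perturbation norms:

```python
# Hedged sketch: project a dense embedding perturbation onto a block-sparse set,
# where each word position (one embedding row) forms a block and only the
# `num_words` largest blocks are kept.
import torch

def block_sparse_project(perturbation, num_words):
    # perturbation: (seq_len, embed_dim)
    block_norms = perturbation.norm(dim=1)                               # (seq_len,)
    keep = torch.topk(block_norms, k=min(num_words, perturbation.size(0))).indices
    mask = torch.zeros_like(block_norms, dtype=torch.bool)
    mask[keep] = True
    return perturbation * mask.unsqueeze(1)                              # zero out all other words
```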
- SparseBFA: Attacking Sparse Deep Neural Networks with the Worst-Case Bit Flips on Coordinates
- Towards Robust Visual Transformer Networks via K-Sparse Attention
Transformer networks, originally developed by the machine translation community to eliminate the sequential nature of recurrent neural networks, have shown impressive results in other natural language processing and machine vision tasks. Self-attention, the core module behind visual transformers, globally mixes the image information. This module drastically reduces the intrinsic inductive biases imposed by CNNs, such as locality, but it also exhibits insufficient robustness against some adversarial attacks.
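A minimal sketch of what a K-sparse attention step could look like (an assumed form for illustration, not the authors' implementation) is to keep only the top-K scores per query and mask the rest before the softmax:

```python
# Hedged sketch: K-sparse scaled dot-product attention that keeps only the
# top_k largest scores per query row and masks out the rest.
import torch
import torch.nn.functional as F

def k_sparse_attention(q, k, v, top_k):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)           # (B, H, L, L)
    kth = torch.topk(scores, k=min(top_k, scores.size(-1)), dim=-1).values[..., -1:]
    scores = scores.masked_fill(scores < kth, float("-inf"))         # drop small scores
    return F.softmax(scores, dim=-1) @ v
```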
- GLassoformer: A Query-Sparse Transformer for Post-Fault Power Grid Voltage Prediction
We propose GLassoformer, a novel and efficient transformer architecture that leverages group Lasso regularization to reduce the number of queries in the standard self-attention mechanism. Thanks to the sparsified queries, GLassoformer is more computationally efficient than standard transformers. On the power grid post-fault voltage prediction task, GLassoformer shows remarkably better prediction accuracy and stability than many existing benchmark algorithms.
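As an illustrative reading of the query sparsification (an assumption, not the released GLassoformer code), a group-Lasso penalty can be placed on the per-position query vectors so that entire queries are driven toward zero:

```python
# Hedged sketch: group-Lasso penalty over per-position query vectors; each
# position's query is one group, so the penalty encourages whole queries to
# vanish rather than individual entries.
import torch

def query_group_lasso(queries, lam):
    # queries: (batch, seq_len, d_k)
    return lam * queries.norm(dim=-1).sum(dim=-1).mean()

# Hypothetical training usage: add the penalty to the forecasting loss.
# loss = prediction_loss + query_group_lasso(queries, lam=1e-4)
```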
- CDX-Net: Cross-Domain Multi-Feature Fusion Modeling via Deep Neural Networks for Multivariate Time Series Forecasting in AIOps