Pattern recognition and classification (MLR-PATT)

SSL-Net: A Synergistic Spectral and Learning-based Network for Efficient Bird Sound Classification

Efficient and accurate bird sound classification is important for ecology, habitat protection, and scientific research, as it plays a central role in monitoring the distribution and abundance of species. However, prevailing methods typically demand extensively labeled audio datasets and rely on highly customized frameworks, imposing substantial computational and annotation loads. In this study, we present an efficient and general framework called SSL-Net, which combines spectral and learned features to identify different bird sounds.

ICASSP_2024_submit_version.pdf (8)

Categories: Pattern recognition and classification (MLR-PATT)

3 Views

Neural Network Training Strategy to Enhance Anomaly Detection Performance: A Perspective on Reconstruction Loss Amplification

Unsupervised anomaly detection (UAD) is widely adopted in industry because anomalies are rare and data are imbalanced. A desirable characteristic of a UAD model is a contained generalization ability: the model excels at reconstructing seen normal patterns but struggles with unseen anomalies. Recent studies have sought to contain the generalization capability of UAD models in reconstruction from different perspectives, such as neural network (NN) structure design and training strategy.

ICASSP24_MLSP-P32.9_LAMP_YeongHyeonPark.pdf (48)

Categories: Image/Video Processing
Pattern recognition and classification (MLR-PATT)

169 Views

SPASE: SPAtial Saliency Explanation for time series models

Recent advances in Machine Learning (ML), Deep Learning (DL), and Artificial Intelligence (AI) have produced models that are increasingly complex and large in architecture and parameter count. These complex ML/DL models have beaten the state of the art in most fields of computer science, such as computer vision, NLP, tabular data prediction, and time series forecasting. As model performance has increased, explainability and interpretability have become essential to explain and justify model outcomes, especially for business use cases.

ICASSP_SPASE.pdf

Paper pre-print (24)

Categories: Pattern recognition and classification (MLR-PATT)

33 Views

Multivariate Fourier Distribution Perturbation: Domain Shifts with Uncertainty in Frequency Domain

Techniques that diversify training data have achieved tremendous success in Domain Generalization (DG) tasks. The key to diversifying domain data is increasing the variety of domain styles. Viewed from the perspective of the Fourier transform, the domain cue is found to be implicitly encoded in the amplitude component of Fourier features, which is more indicative of domain-specific information than statistics (means and standard deviations).
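To make the amplitude/phase distinction concrete, here is a minimal NumPy sketch of splitting an image into Fourier amplitude and phase and then randomly rescaling only the amplitude. It is a generic illustration, not the method proposed in the paper: the function names, the Gaussian multiplicative noise, and the `scale_std` parameter are all assumptions introduced for this example.

```python
import numpy as np


def split_amplitude_phase(image):
    """Return the amplitude and phase of the 2-D Fourier transform of an image."""
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.abs(spectrum), np.angle(spectrum)


def perturb_amplitude(image, scale_std=0.1, rng=None):
    """Hypothetical helper: jitter the Fourier amplitude and keep the phase.

    The multiplicative Gaussian noise is an assumption made for illustration;
    it is not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    amplitude, phase = split_amplitude_phase(image)

    # Multiplicative noise centred on 1, clipped so amplitudes stay non-negative.
    noise = rng.normal(loc=1.0, scale=scale_std, size=amplitude.shape)
    noise = np.clip(noise, 0.0, None)

    # Recombine the perturbed amplitude with the original phase.
    spectrum = (amplitude * noise) * np.exp(1j * phase)

    # The noise is not Hermitian-symmetric, so the inverse transform is
    # complex in general; keeping the real part is a common simplification.
    return np.real(np.fft.ifft2(spectrum, axes=(0, 1)))
```

With `scale_std=0.0` the function returns the input image up to floating-point error, which is a convenient sanity check that the amplitude and phase recombine correctly.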

ICASSP-2195.pptx (13)

Categories: Pattern recognition and classification (MLR-PATT)

3 Views

SCENE TEXT RECOGNITION MODELS EXPLAINABILITY USING LOCAL FEATURES

Explainable AI (XAI) is the study of how humans can understand the cause of a model’s prediction. In this work, the problem of interest is Scene Text Recognition (STR) explainability: using XAI to understand the cause of an STR model’s prediction. The recent XAI literature on STR provides only a simple analysis and does not fully explore other XAI

SceneTextRecognitionPaper.pdf (50)

Categories: Pattern recognition and classification (MLR-PATT)

16 Views

COVARIANCE-AWARE FEATURE ALIGNMENT WITH PRE-COMPUTED SOURCE STATISTICS FOR TEST-TIME ADAPTATION TO MULTIPLE IMAGE CORRUPTIONS

Real-world image recognition systems often face corrupted input images, which cause distribution shifts and degrade model performance. These systems often use a single prediction model on a central server and process images sent from various environments, such as cameras distributed across cities or mounted in cars. Such single models face images corrupted in heterogeneous ways at test time. Thus, they need to adapt instantly to multiple corruptions during testing rather than being retrained at high cost.

presentation_ICIP2023.pdf

Presentation slide (64)

eposter_ICIP2023.pdf

Poster (56)

Categories: Pattern recognition and classification (MLR-PATT)
Neural network learning (MLR-NNLR)

92 Views

MaskDUL: Data Uncertainty Learning in Masked Face Recognition

Since mask occlusion causes a substantial loss of facial features, Masked Face Recognition (MFR) is a challenging image processing task, and the recognition results are susceptible to noise. However, existing MFR methods are mostly deterministic point embedding models, which are limited in representing noisy images. Moreover, Data Uncertainty Learning (DUL) fails to achieve reasonable performance in MFR.

MaskDUL_ICASSP.pdf (45)

Poster.pdf (71)

Categories: Pattern recognition and classification (MLR-PATT)
Other

32 Views

Self-supervised learning for infant cry analysis

In this paper, we explore self-supervised learning (SSL) for analyzing a first-of-its-kind database of cry recordings containing clinical indications of more than a thousand newborns. Specifically, we target cry-based detection of neurological injury as well as identification of cry triggers such as pain, hunger, and discomfort.

Poster_SSL_for_cry_analysis.pdf (81)

Paper_SSL_for_cry_analysis.pdf (70)

Categories: Bioacoustics and Medical Acoustics
Pattern recognition and classification (MLR-PATT)

60 Views

Cross-site Generalization for imbalanced epileptic classification

Recently, many studies have been conducted on automated epileptic seizure detection. However, few of these techniques are applied in clinical settings, for several reasons. One is the imbalanced nature of the seizure detection task. Additionally, current detection techniques do not generalize well to other patient populations. To address these issues, we present a hybrid CNN-LSTM model that is robust to cross-site variability. We investigate the use of data augmentation (DA) methods as an efficient tool for solving imbalanced training problems.

Cross-site Generalization for imbalanced epileptic classification Poster.pdf

Poster of Cross-site Generalization for imbalanced epileptic classification article (44)

Categories: Biomedical signal processing
Pattern recognition and classification (MLR-PATT)

4 Views

Rate-Distortion-Classification Model In Lossy Image Compression

Rate-distortion (RD) theory is a fundamental theory for lossy image compression that addresses compressing the original images to a specified bitrate with minimal signal distortion, which is an essential metric in practical applications. Moreover, with the development of visual analysis applications (such as classification, detection, and segmentation), the semantic distortion in compressed images is also an important dimension in the theoretical analysis of lossy image compression.

DCC2023_RDC_poster_.pdf

Poster file (74)

Categories: Multimodal signal processing
Pattern recognition and classification (MLR-PATT)

55 Views
