
IEEE ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2023 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Masked Autoencoders (MAE) is a simple yet powerful self-supervised learning method. However, it learns representations indirectly, by reconstructing masked input patches. Several methods learn representations directly by predicting the representations of masked patches; however, we think that using all patches to encode the representations that serve as training signals is suboptimal. We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals from the masked patches only.
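
The distinguishing step (the target branch encodes only the masked patches) can be sketched without any deep-learning framework. This is a minimal sketch under stated assumptions, not the authors' implementation: the encoders and predictor are passed in as plain callables (real ones would be deep networks), the loss is assumed to be a squared error between l2-normalized vectors, and the target network is assumed to follow the online network as an exponential moving average (EMA), as in BYOL-style methods. All function names are hypothetical.

```python
import math
import random

def split_patches(n_patches, mask_ratio, rng):
    """Randomly split patch indices into (visible, masked)."""
    idx = list(range(n_patches))
    rng.shuffle(idx)
    n_masked = int(round(n_patches * mask_ratio))
    return sorted(idx[n_masked:]), sorted(idx[:n_masked])

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def normalized_mse(pred, target):
    """Squared distance between l2-normalized vectors (assumed loss)."""
    p, t = l2_normalize(pred), l2_normalize(target)
    return sum((a - b) ** 2 for a, b in zip(p, t))

def m2d_style_loss(patches, online_encode, predictor, target_encode, mask_ratio, rng):
    visible, masked = split_patches(len(patches), mask_ratio, rng)
    # Online branch: sees only the visible patches.
    context = online_encode([patches[i] for i in visible], visible)
    # Predictor: one predicted representation per masked position.
    preds = predictor(context, masked)
    # Target branch: encodes ONLY the masked patches to form the training signal.
    targets = target_encode([patches[i] for i in masked], masked)
    return sum(normalized_mse(p, t) for p, t in zip(preds, targets)) / len(masked)

def ema_update(target_params, online_params, decay=0.99):
    """Assumed update rule: target weights track online weights as an EMA."""
    return [decay * t + (1 - decay) * o for t, o in zip(target_params, online_params)]
```

Because the target encoder never sees the visible patches, its representations carry information only about what the online branch must infer, which is the motivation the abstract gives for using masked patches alone.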

Stuttering is a complicated speech disorder. The most common form is developmental stuttering, which begins in childhood. Early monitoring and intervention are essential for treating children who stutter. Automatic speech recognition technology has shown great potential for identifying fluency disorders, but previous work has not considered the privacy of users' data. To this end, we propose federated intelligent terminals for the automatic monitoring of stuttered speech in different contexts.
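
Federated learning keeps recordings on each terminal and shares only model parameters. The abstract does not name the aggregation rule, so the sketch below assumes standard federated averaging (FedAvg), where the server takes a sample-count-weighted mean of client parameters. Parameters are flat lists of floats, and `local_update` is a hypothetical stand-in for on-device training.

```python
def fed_avg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg aggregation)."""
    total = sum(client_sizes)
    return [sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
            for k in range(len(client_weights[0]))]

def federated_round(global_weights, clients, local_update):
    """One round: each terminal trains locally on its private data and
    uploads only parameters; the raw speech never leaves the device."""
    updates, sizes = [], []
    for data in clients:
        updates.append(local_update(list(global_weights), data))
        sizes.append(len(data))
    return fed_avg(updates, sizes)
```

Weighting by sample count means terminals that record more speech influence the global model proportionally more; other weightings are possible.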

A novel matrix completion problem is considered herein, in which observations from fully sampled columns are combined with quasi-polynomial side information. The framework is motivated by quantum chemistry problems in which computing the full matrix is expensive and partial computations yield only column information. The proposed algorithm successfully estimates the row space of the true matrix given a priori knowledge of it. A theoretical error bound is provided that accounts for possible inaccuracies in the side information.

We describe recursive unique projection-aggregation (RUPA) decoding and iterative unique projection-aggregation (IUPA) decoding of Reed-Muller (RM) codes, which remove non-unique projections from the recursive projection-aggregation (RPA) and iterative projection-aggregation (IPA) algorithms respectively.
We show that these algorithms have competitive error-correcting performance while requiring up to 95% fewer projections than the baseline RPA algorithm.
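
In RPA, a received word indexed by the points of F_2^m is projected onto the cosets of one-dimensional subspaces B = {0, b}. The sketch below is our own illustration of where non-unique projections can arise, not the authors' RUPA/IUPA decoders. It uses hard-decision (binary) words, where a projection XORs the bits within each coset, and shows that projecting along b1 and then b2 depends only on span{b1, b2}. Two recursion levels therefore revisit each two-dimensional subspace three times. The function names are hypothetical.

```python
def span(basis):
    """All F_2-linear combinations of the given vectors (ints as bit-vectors)."""
    vecs = {0}
    for b in basis:
        vecs |= {v ^ b for v in vecs}
    return frozenset(vecs)

def project(y, b):
    """One projection step on a binary word. y maps each coset (frozenset of
    points of F_2^m) to a bit; cosets C and C+b merge, their bits are XORed."""
    out = {}
    for coset, bit in y.items():
        merged = coset | frozenset(z ^ b for z in coset)
        out[merged] = out.get(merged, 0) ^ bit
    return out

def two_level_counts(m):
    """Return (#two-level projection paths in RPA, #distinct 2-D subspaces)."""
    n = 2 ** m
    paths, subspaces = 0, set()
    for b1 in range(1, n):
        seen = set()
        for b2 in range(1, n):
            if b2 == b1:
                continue
            rep = min(b2, b2 ^ b1)  # one direction per nonzero coset of {0, b1}
            if rep in seen:
                continue
            seen.add(rep)
            paths += 1
            subspaces.add(span([b1, b2]))
    return paths, len(subspaces)
```

For m = 4 the recursion performs 105 second-level projections but only 35 of them are distinct, in line with the closed forms (2^m - 1)(2^(m-1) - 1) and (2^m - 1)(2^m - 2)/6.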

Epileptic seizure detection from long recordings of scalp electroencephalography (EEG) is a challenging task owing to the unpredictable nature of seizures, together with noise, artifacts, and subject dependency. We hypothesize that the selection of training EEG data plays an important role in model performance. We therefore introduce an active-learning-based method for selecting and modifying training data, combined with Riemannian geometry, centroid alignment, tangent space mapping, and a support vector machine classifier.
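
The Riemannian part of such a pipeline can be sketched for two-channel epochs, where matrix square roots and logarithms of 2x2 symmetric matrices have closed forms. Assumptions, not taken from the abstract: each subject's covariances are re-centred by the arithmetic mean (a Riemannian mean is also common), and tangent vectors are taken at the identity after alignment with off-diagonal terms scaled by sqrt(2). The active-learning selection step is omitted, and all function names are hypothetical.

```python
import math

def mat_fn(S, f):
    """Apply scalar f to symmetric 2x2 S via its eigenvalues (Sylvester's formula)."""
    (a, b), (_, c) = S
    mid = (a + c) / 2
    rad = math.hypot((a - c) / 2, b)
    if rad < 1e-12:  # repeated eigenvalue: S is a multiple of the identity
        v = f(mid)
        return ((v, 0.0), (0.0, v))
    l1, l2 = mid + rad, mid - rad
    f1, f2, d = f(l1), f(l2), 2 * rad
    off = (f1 - f2) * b / d
    return (((f1 * (a - l2) - f2 * (a - l1)) / d, off),
            (off, (f1 * (c - l2) - f2 * (c - l1)) / d))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def covariance(x):
    """Sample covariance of a 2-channel epoch x = [ch0, ch1]."""
    n = len(x[0])
    means = [sum(ch) / n for ch in x]
    return tuple(tuple(sum((x[i][t] - means[i]) * (x[j][t] - means[j])
                           for t in range(n)) / n for j in range(2)) for i in range(2))

def align_and_map(covs):
    """Centroid alignment, then tangent-space vectors at the identity."""
    R = tuple(tuple(sum(C[i][j] for C in covs) / len(covs) for j in range(2))
              for i in range(2))
    R_isqrt = mat_fn(R, lambda v: 1 / math.sqrt(v))
    feats = []
    for C in covs:
        A = matmul(matmul(R_isqrt, C), R_isqrt)  # re-centred covariance
        L = mat_fn(A, math.log)                  # matrix logarithm
        feats.append([L[0][0], math.sqrt(2) * L[0][1], L[1][1]])
    return feats
```

The resulting Euclidean feature vectors can then be fed to any linear classifier, such as an SVM; alignment is what makes features from different subjects comparable.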

In this paper, we propose a restoration method for time-varying graph signals, i.e., signals on a graph whose values change over time, using deep algorithm unrolling. Deep algorithm unrolling learns the parameters of an iterative optimization algorithm using deep learning techniques. It is expected to improve convergence speed and accuracy while keeping the iterative steps interpretable. In the proposed method, the minimization problem is formulated so that the time-varying graph signal is smooth in both the time and spatial domains.
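
A minimal sketch of the unrolling idea, assuming a simple quadratic objective rather than the authors' exact formulation: a masked data-fidelity term plus spatial smoothness over graph edges and temporal smoothness between consecutive samples. Each "layer" is one gradient step with its own (step, lam_g, lam_t). In deep unrolling these per-layer parameters would be learned from data; here they are fixed by hand, and the function names are hypothetical.

```python
def gradient(X, Y, mask, edges, lam_g, lam_t):
    """Gradient of
         sum mask * (X - Y)^2
       + lam_g * sum_t sum_{(i,j) in edges} (X[i][t] - X[j][t])^2   (spatial)
       + lam_t * sum_i sum_t (X[i][t+1] - X[i][t])^2                (temporal)
    with X, Y stored as N x T nested lists (nodes x time)."""
    N, T = len(X), len(X[0])
    G = [[2 * mask[i][t] * (X[i][t] - Y[i][t]) for t in range(T)] for i in range(N)]
    for i, j in edges:
        for t in range(T):
            d = X[i][t] - X[j][t]
            G[i][t] += 2 * lam_g * d
            G[j][t] -= 2 * lam_g * d
    for i in range(N):
        for t in range(T - 1):
            d = X[i][t + 1] - X[i][t]
            G[i][t + 1] += 2 * lam_t * d
            G[i][t] -= 2 * lam_t * d
    return G

def unrolled_restore(Y, mask, edges, layers):
    """Run one gradient step per layer; layers = [(step, lam_g, lam_t), ...]."""
    X = [row[:] for row in Y]
    for step, lam_g, lam_t in layers:
        G = gradient(X, Y, mask, edges, lam_g, lam_t)
        X = [[x - step * g for x, g in zip(xr, gr)] for xr, gr in zip(X, G)]
    return X
```

Because every layer is an ordinary optimization step, each intermediate estimate remains interpretable; training would only adjust the per-layer step sizes and weights.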

Instrument playing technique (IPT) is a key element of musical presentation. However, most existing work on IPT detection considers only monophonic music signals, and little has been done to detect IPTs in polyphonic instrumental solo pieces with overlapping or mixed IPTs. In this paper, we formulate the task as a frame-level multi-label classification problem and apply it to the Guzheng, a Chinese plucked string instrument. We create a new dataset, Guzheng Tech99, containing Guzheng recordings with onset, offset, pitch, and IPT annotations for each note.
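
Framing the task as frame-level multi-label classification means converting note-level annotations into a multi-hot target per frame, so overlapping IPTs simply activate several labels at once. A minimal sketch, assuming annotations given as (onset, offset, IPT name) in seconds and a fixed hop size; the IPT names and hop value below are hypothetical, not taken from Guzheng Tech99.

```python
import math

def notes_to_frame_labels(notes, ipt_classes, duration, hop):
    """Convert (onset, offset, ipt) note annotations (seconds) into a
    frame-level multi-hot matrix: labels[f][k] == 1 if IPT k is active in frame f."""
    n_frames = math.ceil(duration / hop)
    index = {name: k for k, name in enumerate(ipt_classes)}
    labels = [[0] * len(ipt_classes) for _ in range(n_frames)]
    for onset, offset, ipt in notes:
        first = math.floor(onset / hop)
        last = min(n_frames, math.ceil(offset / hop))
        for f in range(first, last):
            labels[f][index[ipt]] = 1  # overlapping notes set several labels
    return labels

labels = notes_to_frame_labels(
    [(0.0, 1.0, "vibrato"), (0.5, 1.5, "tremolo")],
    ["vibrato", "glissando", "tremolo"], duration=2.0, hop=0.25)
```

A model trained on such targets would typically use an independent sigmoid per class rather than a softmax, since more than one IPT can be active in a frame.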

We recently proposed a color constancy method based on the observation that the human visual system might be "discounting the illuminant" by using the space-average color and the highest-luminance patches. Based on these observations, our algorithm relies on two assumptions: (i) there are several bright pixels in the scene, and (ii) the world is gray, on average. The main idea of the algorithm is to estimate the illuminant by finding the deviation of the brightest pixels from the gray value. In experiments, we observed that some pixels degrade the performance of the method.
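
The two assumptions correspond to two classic estimators: Gray World (the space-average color is achromatic) and bright pixels (the brightest pixels reflect the illuminant). The abstract does not give the authors' exact estimator. The sketch below therefore simply averages the unit-normalized Gray World and bright-pixel estimates. The top-5% threshold, the r+g+b brightness proxy, and the averaging rule are our assumptions, and the function names are hypothetical.

```python
import math

def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def estimate_illuminant(pixels, top_fraction=0.05):
    """pixels: list of (r, g, b) linear RGB values; returns a unit RGB vector."""
    # Gray World: average color of the whole scene.
    mean = [sum(p[c] for p in pixels) / len(pixels) for c in range(3)]
    # Bright pixels: average of the top fraction by r + g + b.
    k = max(1, int(len(pixels) * top_fraction))
    brightest = sorted(pixels, key=sum, reverse=True)[:k]
    bright = [sum(p[c] for p in brightest) / k for c in range(3)]
    gw, bp = _unit(mean), _unit(bright)
    return _unit([(a + b) / 2 for a, b in zip(gw, bp)])

def correct(pixels, illum):
    """Diagonal (von Kries-style) correction; a white illuminant leaves pixels unchanged."""
    s = math.sqrt(3)
    return [tuple(p[c] / (illum[c] * s) for c in range(3)) for p in pixels]
```

Pixels that violate either assumption, such as saturated highlights or large uniformly colored regions, bias both terms, which is consistent with the observation that some pixels degrade performance.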

Cricket sounds are usually regarded as pleasant and can therefore be used as suitable test signals in psychoacoustic experiments that assess human listening acuity for specific temporal and spectral features. In addition, the simple structure of cricket sounds makes them amenable to reverse engineering, so that they can be analyzed and re-synthesized with desired alterations to their defining parameters.
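
Re-synthesis with altered parameters can be illustrated by a simple parametric model: a sinusoidal carrier gated into pulses, with pulses grouped into periodically repeating chirps. This is a generic sketch, not the authors' analysis/re-synthesis model. The parameter names and default values (carrier frequency, pulse and gap durations, pulses per chirp, chirp period) are illustrative assumptions, not measured cricket data.

```python
import math

def synth_cricket(fs=44100, carrier_hz=4500.0, pulse_ms=15.0, gap_ms=15.0,
                  pulses_per_chirp=4, chirp_period_ms=500.0, n_chirps=2):
    """Return a list of samples in [-1, 1] for a cricket-like chirp sequence."""
    pulse_len = round(fs * pulse_ms / 1000)
    gap_len = round(fs * gap_ms / 1000)
    period = round(fs * chirp_period_ms / 1000)
    if pulses_per_chirp * (pulse_len + gap_len) > period:
        raise ValueError("pulses do not fit in one chirp period")
    out = [0.0] * (period * n_chirps)
    for c in range(n_chirps):
        for p in range(pulses_per_chirp):
            start = c * period + p * (pulse_len + gap_len)
            for n in range(pulse_len):
                env = math.sin(math.pi * n / pulse_len)  # half-sine envelope avoids clicks
                out[start + n] = env * math.sin(2 * math.pi * carrier_hz * (start + n) / fs)
    return out
```

Each argument isolates one temporal or spectral feature, so a single parameter (for example `gap_ms`) can be varied while the others stay fixed, which is the kind of controlled alteration a listening experiment needs.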
