IEEE ICASSP 2024, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Federated learning (FL) has emerged as a promising paradigm for decentralized machine learning that preserves data privacy. However, under communication constraints, the standard FL protocol is vulnerable to client dropout. Although prior research has addressed this risk from the perspectives of communication optimization and privacy protection, handling client dropout remains challenging in dynamic networks, where clients may join or leave the training process at any time.
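
To make the dropout problem concrete, here is a minimal sketch that assumes a FedAvg-style server (an assumption; the abstract does not name an aggregation rule) which averages only the updates of clients that actually report in a round, weighted by local sample count. The helper `fedavg_round`, the client ids and the sample sizes are hypothetical.

```python
import numpy as np

def fedavg_round(global_w, client_updates, client_sizes):
    """One FedAvg-style aggregation over the clients that reported this round.

    client_updates: dict client id -> locally trained weight vector, or None
        if that client dropped out (hypothetical format).
    client_sizes: dict client id -> number of local training samples.
    """
    survivors = [c for c, w in client_updates.items() if w is not None]
    if not survivors:
        # Every client dropped: keep the previous global model.
        return global_w.copy()
    total = sum(client_sizes[c] for c in survivors)
    # Sample-size-weighted average over surviving clients only; data held
    # by dropped clients does not influence this round's model.
    return sum(client_sizes[c] / total * client_updates[c] for c in survivors)

w0 = np.zeros(2)
updates = {"a": np.array([1.0, 1.0]), "b": None, "c": np.array([3.0, 3.0])}
sizes = {"a": 10, "b": 50, "c": 30}
w1 = fedavg_round(w0, updates, sizes)  # client "b" holds most data but is ignored
```

In this toy round the client holding the most data contributes nothing, which illustrates how dropout can bias the aggregated model rather than merely slowing training.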

We present AEGIS-Net, a novel indoor place recognition model that takes RGB point clouds as input and generates global place descriptors by aggregating lower-level color and geometry features with higher-level implicit semantic features. Rather than simply concatenating these features, it employs self-attention modules to select the local features that best describe an indoor place. AEGIS-Net consists of a semantic encoder, a semantic decoder, and an attention-guided feature embedding.
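
The general idea of attention-weighted aggregation can be sketched with single-head scaled dot-product self-attention over per-point features followed by pooling into one unit-norm descriptor. This is not AEGIS-Net's actual module: the random projection matrices, the feature sizes and the mean pooling step are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pooled_descriptor(feats, w_q, w_k, w_v):
    """Self-attention over local features, then pooling into a global descriptor.

    feats: (N, d) local features (e.g. color/geometry/semantic per point).
    w_q, w_k, w_v: (d, d_k) projections (random here; learned in a real model).
    """
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)  # (N, N), rows sum to 1
    attended = attn @ v                                      # (N, d_k)
    desc = attended.mean(axis=0)                             # assumed mean pooling
    return desc / np.linalg.norm(desc)                       # unit norm for retrieval

rng = np.random.default_rng(0)
feats = rng.standard_normal((128, 32))  # 128 hypothetical local features
w_q, w_k, w_v = (rng.standard_normal((32, 16)) for _ in range(3))
desc = attention_pooled_descriptor(feats, w_q, w_k, w_v)
```

Normalizing the descriptor lets places be compared by cosine similarity (a dot product), which is a common choice for retrieval-style place recognition.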

In the metallurgical industry, taking measurements during production can be infeasible or undesirable, so often only the completed process can be measured. This poses problems for regression models, because the intermediate target values of a time series are hidden in the accumulated end-of-process measurement. Limited data quality and quantity also often restrict modeling to linear estimators, as neural networks struggle to converge or overfit on scarce, noisy data.
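
Under two simplifying assumptions not stated in the abstract (the end-of-process value is the sum of per-step targets, and each per-step target is linear in that step's features), the hidden intermediate targets can still be estimated with a linear model: because y_end = Σ_t x_tᵀw = (Σ_t x_t)ᵀw, ordinary least squares on time-summed features recovers w. The data, sizes and coefficients below are synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_processes, n_steps, n_feats = 200, 50, 3
true_w = np.array([0.5, -1.0, 2.0])  # hypothetical per-step coefficients

# X[i, t] holds the features of process i at step t; per-step targets are hidden.
X = rng.standard_normal((n_processes, n_steps, n_feats))
hidden_steps = X @ true_w                                   # (n_processes, n_steps)
y_end = hidden_steps.sum(axis=1) + 0.1 * rng.standard_normal(n_processes)  # only this is measured

# Linearity moves the sum inside: y_end = (sum_t x_t) @ w, so fit OLS on summed features.
X_sum = X.sum(axis=1)                                        # (n_processes, n_feats)
w_hat, *_ = np.linalg.lstsq(X_sum, y_end, rcond=None)

# Applying the fitted per-step model estimates the hidden intermediate targets.
est_steps = X @ w_hat
```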

Traditional social learning frameworks consider environments with a homogeneous state, where each agent receives observations conditioned on the same hypothesis. In this work, we study the distributed hypothesis testing problem for graphs with a community structure, assuming that each cluster receives data conditioned on a different true state. This situation arises in many scenarios, for example when sensors are spatially distributed or when individuals in a social network hold differing views or opinions.
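
For reference, here is a sketch of the traditional adapt-then-combine (log-linear) social learning update that the abstract contrasts with: each agent performs a Bayesian update with its private likelihood, then geometrically averages its neighbours' intermediate beliefs. It does not implement the paper's method. The unit-variance Gaussian observation model, the two hypotheses and the combination matrix with two communities are assumptions chosen for illustration.

```python
import numpy as np

def social_learning_step(log_mu, obs, means, A):
    """One adapt-then-combine step of log-linear social learning.

    log_mu: (K, H) log-beliefs of K agents over H hypotheses.
    obs: (K,) scalar observation received by each agent.
    means: (H,) Gaussian mean under each hypothesis (unit variance assumed).
    A: (K, K) left-stochastic matrix; A[l, k] is the weight agent k gives agent l.
    """
    # Adapt: multiply by the private Gaussian likelihood (add log-likelihoods).
    log_psi = log_mu - 0.5 * (obs[:, None] - means[None, :]) ** 2
    # Combine: geometric averaging of neighbours' intermediate beliefs.
    log_mu_new = A.T @ log_psi
    # Renormalize with log-sum-exp so each agent's belief sums to one.
    log_mu_new -= log_mu_new.max(axis=1, keepdims=True)
    return log_mu_new - np.log(np.exp(log_mu_new).sum(axis=1, keepdims=True))

rng = np.random.default_rng(1)
K, means = 6, np.array([0.0, 1.0])
community = np.array([0, 0, 0, 1, 1, 1])        # each cluster observes a different true state
block = (community[:, None] == community[None, :]) / 3.0
A = 0.95 * block + 0.05 * np.ones((K, K)) / K   # strong intra-, weak inter-community links
log_mu = np.full((K, 2), np.log(0.5))           # uniform prior
for _ in range(100):
    obs = means[community] + rng.standard_normal(K)
    log_mu = social_learning_step(log_mu, obs, means, A)
beliefs = np.exp(log_mu)
```

Inspecting `beliefs` after running shows how the homogeneous-state update behaves when clusters actually observe different states.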

Signal decomposition techniques aim to break down nonstationary signals into their oscillatory components, serving as a preliminary step in many practical signal processing applications. This has motivated researchers to explore different strategies, yielding several distinct approaches. A well-known optimization-based method, Variational Mode Decomposition (VMD), formulates the decomposition as an optimization problem built on constant-bandwidth Wiener filters. However, this formulation is limited by its constant bandwidth and by the need to specify the number of components in advance.
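
In standard VMD, each mode k is updated in the Fourier domain as (f̂(ω) − Σ_{i≠k} û_i(ω) + λ̂(ω)/2) / (1 + 2α(ω − ω_k)²). This is a Wiener-like filter centred at ω_k, with a single penalty α shared by all K modes, and K fixed beforehand. The sketch below evaluates only that filter gain to show why the bandwidth is constant: the half-power point sits at |ω − ω_k| = 1/√(2α) for every centre frequency. The value of α and the centre frequencies are hypothetical.

```python
import numpy as np

def vmd_wiener_gain(omega, omega_k, alpha):
    """Gain of the standard VMD mode-update filter centred at omega_k."""
    return 1.0 / (1.0 + 2.0 * alpha * (omega - omega_k) ** 2)

def half_power_halfwidth(alpha):
    # Solve 1 / (1 + 2*alpha*d**2) = 1/2  ->  d = 1 / sqrt(2*alpha)
    return 1.0 / np.sqrt(2.0 * alpha)

alpha = 2000.0                 # one penalty shared by every mode (hypothetical value)
centres = [0.05, 0.2, 0.4]     # hypothetical normalised centre frequencies
d = half_power_halfwidth(alpha)
# Gain at omega_k + d is one half for every centre: the width never adapts to the mode.
gains = [vmd_wiener_gain(wk + d, wk, alpha) for wk in centres]
```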

Appropriate prosodic choices depend on the context. One approach is for a human-in-the-loop (HitL) to pick the best prosody. Often, specific nuanced prosodic choices convey the intended meaning in a given context. We propose a system in which HitL users can provide any number of prosodic controls. This allows for flexibility and removes the redundant, inefficient work of defining the entire prosodic specification.
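
The abstract does not describe the control interface. The following hypothetical sketch shows one way "any number of prosodic controls" could work: a model predicts a full per-word prosody specification, and the user overrides only the attributes they care about. The `Prosody` fields, their units and the `apply_controls` helper are all assumptions.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Prosody:
    pitch: float     # relative F0 offset (hypothetical unit: semitones)
    duration: float  # relative duration scale
    energy: float    # relative loudness scale

def apply_controls(predicted, controls):
    """Override only the attributes the HitL user specified.

    predicted: list of Prosody, one per word, from an assumed prosody predictor.
    controls: dict word_index -> dict of attribute overrides; may be empty.
    """
    return [replace(p, **controls.get(i, {})) for i, p in enumerate(predicted)]

predicted = [Prosody(0.0, 1.0, 1.0) for _ in range(4)]
# The user emphasises only word 2; every other value still comes from the model.
final = apply_controls(predicted, {2: {"pitch": 3.0, "duration": 1.3}})
```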

Self-supervised pre-trained speech models have substantially improved speech recognition, yet they remain sensitive to domain shifts and to accented or atypical speech. Many of these models rely on quantisation or clustering to learn discrete acoustic units. We propose to correct the discovered discrete units for accented speech back to a standard pronunciation in an unsupervised manner. A masked language model is trained on discrete units from a standard accent and iteratively corrects an accented token sequence by masking unexpected cluster sequences and predicting their common variant.
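
The mask-and-repredict loop can be illustrated without a neural network. The sketch below swaps the masked language model for a much simpler technique: counts of each centre unit given its left and right neighbours, learned from standard-accent sequences. It is a toy stand-in, not the paper's model. The integer unit IDs and the probability threshold are hypothetical.

```python
from collections import Counter, defaultdict

def train_context_model(sequences):
    """Count centre units given (left, right) neighbours: a toy stand-in for an MLM."""
    model = defaultdict(Counter)
    for seq in sequences:
        for i in range(1, len(seq) - 1):
            model[(seq[i - 1], seq[i + 1])][seq[i]] += 1
    return model

def prob(model, left, centre, right):
    counts = model.get((left, right))
    if not counts:
        return None  # unseen context: no evidence either way
    return counts[centre] / sum(counts.values())

def correct_units(seq, model, threshold=0.2, max_iters=10):
    """Iteratively mask the most unexpected unit and replace it with its likeliest variant."""
    seq = list(seq)
    for _ in range(max_iters):
        scored = []
        for i in range(1, len(seq) - 1):
            p = prob(model, seq[i - 1], seq[i], seq[i + 1])
            if p is not None and p < threshold:
                scored.append((p, i))
        if not scored:
            break  # nothing looks unexpected any more
        _, i = min(scored)                    # most unexpected position
        counts = model[(seq[i - 1], seq[i + 1])]
        seq[i] = counts.most_common(1)[0][0]  # "predict" the masked unit
    return seq

standard = [[5, 1, 2, 3, 6], [5, 1, 2, 3, 6], [7, 1, 2, 3, 8]]
model = train_context_model(standard)
corrected = correct_units([5, 1, 9, 3, 6], model)  # unit 9 is the "accented" variant
```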

Recent CNN- and Transformer-based models have tried to exploit frequency and periodicity information for long-term time series forecasting. However, most existing work relies on the Fourier transform, which cannot capture fine-grained, local frequency structure. In this paper, we propose a Wavelet-Fourier Transform Network (WFTNet) for long-term time series forecasting.
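
The motivation for combining the two transforms can be shown on a toy series. This is not WFTNet's architecture. The Fourier spectrum recovers the global period but says nothing about where a short burst occurs, while one level of a Haar wavelet transform localizes the burst in time. The series length, period and burst position are hypothetical.

```python
import numpy as np

def dominant_period(x):
    """Period of the strongest non-DC Fourier component (global, no time localization)."""
    spec = np.abs(np.fft.rfft(x - x.mean()))
    k = np.argmax(spec[1:]) + 1
    return len(x) / k

def haar_details(x):
    """One level of the Haar wavelet transform: detail coefficients localized in time."""
    x = x[: len(x) // 2 * 2]
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * t / 32)   # period-32 seasonality
x[150] += 3.0                    # a short local burst
period = dominant_period(x)                             # Fourier: global period
burst_at = np.argmax(np.abs(haar_details(x))) * 2       # wavelet: burst location
```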

Real-time audio communication over IP has become essential to our daily lives. Packet-switched networks, however, are inherently prone to jitter and data loss, creating a strong need for effective packet loss concealment (PLC) techniques.
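
The abstract does not say which PLC technique it proposes. As a point of reference only, here is a sketch of a simple packet-repetition baseline: each lost frame is filled with the last correctly received frame, attenuated geometrically over a burst of consecutive losses. The frame layout and the fade factor are assumptions.

```python
import numpy as np

def conceal(frames, received, fade=0.5):
    """Fill lost frames by repeating the last good frame with a geometric fade-out.

    frames: (n_frames, frame_len) float array of decoded audio; lost rows are ignored.
    received: (n_frames,) bool mask, False where the packet was lost.
    fade: gain applied per consecutive lost frame (hypothetical default).
    """
    out = np.zeros_like(frames)
    last_good = np.zeros(frames.shape[1])
    gain = 1.0
    for i, ok in enumerate(received):
        if ok:
            out[i] = frames[i]
            last_good = frames[i]
            gain = 1.0
        else:
            gain *= fade  # attenuate longer bursts to avoid buzzy repetition
            out[i] = gain * last_good
    return out

frames = np.ones((5, 4))
received = [True, False, False, True, False]
out = conceal(frames, received)
```

Learned PLC methods typically aim to replace such repetition with predictions that follow the signal's spectral and pitch evolution more naturally.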

Solar energy adoption is moving at a rapid pace, but the variability of solar energy production causes grid stability issues and hinders mass adoption. Addressing these issues requires more accurate photovoltaic power forecasting systems. In intra-hour forecasting, the most challenging issue is the high output fluctuation caused by cloud motion, as clouds can occlude the sun.
