This paper proposes a benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in the sound event detection task. The submissions are evaluated according to the two polyphonic sound detection score scenarios proposed for DCASE 2021 Challenge Task 4, which makes it possible to analyze whether submissions are designed for fine-grained temporal segmentation, for coarse-grained temporal segmentation, or to be polyvalent across the proposed scenarios.

Catastrophic forgetting is a thorny challenge when updating keyword spotting (KWS) models after deployment. To tackle this challenge, we propose a progressive continual learning strategy for small-footprint spoken keyword spotting (PCL-KWS). Specifically, the proposed PCL-KWS framework introduces a network instantiator that generates task-specific sub-networks for remembering previously learned keywords. As a result, the PCL-KWS approach incrementally learns new keywords without forgetting prior knowledge.

Existing speech-based coronavirus disease 2019 (COVID-19) detection systems provide poor interpretability and limited robustness to unseen data conditions. In this paper, we propose a system to overcome these limitations. In particular, we propose to fuse two different feature modalities with patient metadata in order to capture different properties of the disease. The first feature set is based on modulation spectral properties of speech. The second comprises spectral shape/descriptor features recently used for COVID-19 detection.

In this paper, we demonstrate how a generative model can be used to build a better recognizer through the control of content and style. We are building an online handwriting recognizer from a modest amount of training samples. By training our controllable handwriting synthesizer on the same data, we can synthesize handwriting with previously underrepresented content (e.g., URLs and email addresses) and style (e.g., cursive and slanted). Moreover, we propose a framework to analyze a recognizer that is trained with a mixture of real and synthetic training data.

Rank and select queries are fundamental building blocks of compressed data structures. On a given bit string of length n, counting the number of set bits up to a certain position is called the rank query, and finding the position of the kth set bit is called the select query. We present a new data structure, and procedures on it, to support rank/select operations.
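
To make the two queries concrete, here is a minimal Python sketch of rank and select over a bit string. It is a naive prefix-count baseline for illustration only, not the data structure presented in the paper. The class name `RankSelect`, the 0-based indexing, and the choice to make rank inclusive of the queried position are all assumptions of this sketch.

```python
from bisect import bisect_left


class RankSelect:
    """Naive rank/select over a bit string (illustrative baseline only)."""

    def __init__(self, bits: str) -> None:
        # prefix[i] holds the number of set bits in bits[0:i].
        self.prefix = [0]
        for b in bits:
            self.prefix.append(self.prefix[-1] + (1 if b == "1" else 0))

    def rank(self, i: int) -> int:
        """Count the set bits in positions 0..i (inclusive, by assumption)."""
        return self.prefix[i + 1]

    def select(self, k: int) -> int:
        """Return the 0-based position of the k-th set bit, for k >= 1."""
        if not 1 <= k <= self.prefix[-1]:
            raise ValueError("k out of range")
        # The first index j with prefix[j] >= k marks the bit at position j - 1.
        return bisect_left(self.prefix, k) - 1


rs = RankSelect("0110100")
print(rs.rank(2), rs.select(3))
```

This baseline stores one counter per bit, so it uses far more than n bits of extra space; the point of succinct rank/select structures is to answer the same queries with much less overhead.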

Gagie and Nekrich (2009) gave an algorithm for adaptive prefix-free coding that, given a string $S [1..n]$ over an alphabet of size $\sigma = o (n / \log^{5 / 2} n)$, encodes $S$ in at most $n (H + 1) + o (n)$ bits, where $H$ is the empirical entropy of $S$, such that encoding and decoding $S$ take $O (n)$ time. They also proved their bound on the encoding length is optimal, even when the empirical entropy is high. Their algorithm is impractical, however, because it uses complicated data structures.
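
As a rough illustration of the quantities in the bound above, the following Python sketch computes the empirical entropy $H$ of a string and the leading term $n (H + 1)$ of the encoding length. It does not implement the coding algorithm itself. The function names are hypothetical, and the sketch assumes that $H$ denotes the zeroth-order empirical entropy in bits per symbol.

```python
import math
from collections import Counter


def empirical_entropy(s: str) -> float:
    """Zeroth-order empirical entropy of s, in bits per symbol (assumed meaning of H)."""
    n = len(s)
    counts = Counter(s)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def leading_term_bits(s: str) -> float:
    """Leading term n * (H + 1) of the bound; the o(n) term is ignored."""
    return len(s) * (empirical_entropy(s) + 1)


print(empirical_entropy("aabb"), leading_term_bits("aabb"))
```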
