FSPEN: An Ultra-Lightweight Network for Real Time Speech Enhancement

Deep learning-based speech enhancement methods have shown promising results in recent years. However, in practical applications, model size and computational complexity are important factors that limit their use in end products. Products that require real-time speech enhancement with limited resources, such as TWS headsets, hearing aids, and IoT devices, therefore need ultra-lightweight models. In this paper, an ultra-lightweight network, FSPEN, is proposed for the real-time speech enhancement task. We propose a full-band and sub-band network structure for extracting global and local features, and an inter-frame path extension method that enhances the network's modeling capacity while keeping computational complexity low. Experiments demonstrate that the proposed FSPEN achieves a PESQ of 2.97 on the VoiceBank+Demand dataset with 89M multiply-accumulate operations per second (MACs) and 79k parameters.
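To make the reported budget (89M MACs per second, 79k parameters) concrete, the short sketch below converts it into per-frame compute and weight memory. The abstract gives only the two totals; the 16 kHz sample rate and 256-sample hop used here are hypothetical assumptions chosen for illustration, not FSPEN's stated configuration, and the byte widths are generic storage formats.

```python
# Back-of-the-envelope complexity arithmetic for the figures in the abstract.
# ASSUMPTION: SAMPLE_RATE_HZ and HOP_SAMPLES are hypothetical; the abstract
# does not state FSPEN's framing parameters.
SAMPLE_RATE_HZ = 16_000
HOP_SAMPLES = 256

REPORTED_MACS_PER_SECOND = 89e6  # from the abstract
REPORTED_PARAMETERS = 79_000     # from the abstract (rounded "79k")


def frames_per_second(sample_rate_hz: int, hop_samples: int) -> float:
    """Number of new frames a streaming model must process per second."""
    return sample_rate_hz / hop_samples


def macs_per_frame(macs_per_second: float, fps: float) -> float:
    """Per-frame multiply-accumulate budget implied by a per-second figure."""
    return macs_per_second / fps


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Storage needed for the weights at a given numeric precision."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    fps = frames_per_second(SAMPLE_RATE_HZ, HOP_SAMPLES)
    print(f"frames/s: {fps}")
    print(f"MACs/frame: {macs_per_frame(REPORTED_MACS_PER_SECOND, fps):,.0f}")
    for name, width in [("float32", 4), ("float16", 2), ("int8", 1)]:
        print(f"{name} weights: {weight_bytes(REPORTED_PARAMETERS, width):,} bytes")
```

Under these assumptions the model would run 62.5 frames per second at about 1.42M MACs per frame, and its weights would occupy roughly 316 KB in float32 or 79 KB in int8, which is the scale of memory available on the headset and hearing-aid class of devices mentioned above. A different hop size changes the per-frame figure but not the per-second total.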
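The full-band/sub-band idea can be illustrated with a generic sketch: the full-band view treats all frequency bins of a frame together (global features), while the sub-band view splits the bins into contiguous groups that are processed separately (local features). The band widths below, narrow at low frequencies and wide at high frequencies, are a hypothetical choice for a 257-bin spectrum; the abstract does not specify how FSPEN partitions the spectrum, so this shows only the common splitting technique, not the paper's implementation.

```python
import numpy as np

# ASSUMPTION: hypothetical band layout for a 512-point STFT (257 bins).
# Narrow bands at low frequencies, wider bands at high frequencies.
BAND_WIDTHS = [2] * 16 + [4] * 16 + [8] * 8 + [16] * 4 + [33]  # sums to 257


def split_into_subbands(spectrum: np.ndarray, band_widths: list[int]) -> list[np.ndarray]:
    """Split the last (frequency) axis into contiguous sub-bands."""
    if sum(band_widths) != spectrum.shape[-1]:
        raise ValueError("band widths must cover every frequency bin exactly once")
    edges = np.cumsum([0] + band_widths)
    return [spectrum[..., start:end] for start, end in zip(edges[:-1], edges[1:])]


# Example: a random magnitude spectrogram of shape (frames, frequency bins).
rng = np.random.default_rng(0)
mag = np.abs(rng.standard_normal((100, 257)))

# "Global" view: the whole frame at once, shape (100, 257).
full_band_input = mag

# "Local" view: one summary value (mean energy) per sub-band, shape (100, 45).
bands = split_into_subbands(mag, BAND_WIDTHS)
band_energy = np.stack([np.mean(b ** 2, axis=-1) for b in bands], axis=-1)
print(full_band_input.shape, band_energy.shape)
```

Grouping bins this way lets a small network spend its parameters on per-band patterns instead of modelling every bin individually; the exact band layout, per-band features, and how the two views are fused are design choices the abstract leaves open.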
