Robust End-To-End Keyword Spotting And Voice Command Recognition For Mobile Game
- Citation Author(s):
- Submitted by:
- Shounan An
- Last updated:
- 13 May 2020 - 11:20pm
- Document Type:
- Presentation Slides
- Document Year:
- 2020
- Event:
- Presenters:
- Shounan An
- Paper Code:
- S&T-P4
- Categories:
We present an effective method for small-footprint keyword spotting (KWS) and a voice-command-based user interface for mobile games. For the KWS task, our goal is to design and deploy a computationally very light deep neural network model on mobile devices while improving accuracy in various noisy environments. We propose a simple yet effective convolutional neural network (CNN), deployed with Google’s TensorFlow Lite on Android and Apple’s Core ML on iOS. TensorFlow provides post-training integer quantization, and with TensorFlow Lite we can deploy an 8-bit quantized model on Android devices. On iOS devices, we choose Apple’s Core ML because it performs better than TensorFlow Lite. Our CNN model is 0.2 MB in size and uses 7 MB of memory in total, and its CPU usage is around 1% (test phones: Galaxy S8 and iPhone 8). To improve the overall accuracy of KWS in noisy environments, we design a hybrid thresholding method that makes use of both the average inference score and the volume of the incoming speech signal. We also propose a voice command SDK that runs game command recognition on the fly inside the mobile device. We design a convolutional recurrent neural network transducer (CNN-RNN-T) as our automatic speech recognition (ASR) model. Text-to-speech (TTS) is used to generate game command voices, together with various data augmentation techniques. Our CNN-RNN-T achieves a character error rate (CER) of 17%, the model size is 7 MB, and the processing time is less than 1 second for a 3-second incoming speech signal, which provides real-time voice command recognition for further game actions. To the best of our knowledge, this is the first work that attempts to solve both KWS and voice command recognition running on-device for mobile games.
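The abstract says only that the hybrid thresholding method combines the average inference score with the volume of the incoming speech; it gives no formula. The following is a minimal sketch of one way such a rule could look, not the authors' implementation. The class name `HybridThresholdDetector`, the sliding-window averaging, the RMS level in dBFS as the volume measure, and all threshold values are assumptions made for illustration.

```python
import math
from collections import deque


class HybridThresholdDetector:
    """Hypothetical detector: fires only when both the averaged score and the loudness pass."""

    def __init__(self, score_threshold=0.8, volume_threshold_db=-35.0, window=5):
        # All three defaults are illustrative assumptions, not values from the source.
        self.score_threshold = score_threshold
        self.volume_threshold_db = volume_threshold_db
        self.scores = deque(maxlen=window)  # most recent keyword scores

    @staticmethod
    def rms_dbfs(samples):
        """RMS level of float samples in [-1, 1], in dB relative to full scale."""
        if not samples:
            return float("-inf")
        mean_square = sum(s * s for s in samples) / len(samples)
        if mean_square == 0.0:
            return float("-inf")
        return 10.0 * math.log10(mean_square)

    def update(self, keyword_score, samples):
        """Record one inference score and its audio frame; return True if the keyword fires."""
        self.scores.append(keyword_score)
        window_full = len(self.scores) == self.scores.maxlen
        average_score = sum(self.scores) / len(self.scores)
        loud_enough = self.rms_dbfs(samples) >= self.volume_threshold_db
        # Both conditions must hold, so quiet background speech with a high
        # score, or loud noise with a low score, does not trigger.
        return window_full and average_score >= self.score_threshold and loud_enough
```

In this sketch, `update` would be called once per inference step with the model's keyword score and the matching audio frame. Requiring a full window before firing is a further assumption, added so that a single high-scoring frame cannot trigger the keyword on its own.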
We will give a live demonstration of the KWS and game-command-based user interface integrated into a full-sized, production-quality mobile game, A3: Still Alive, which is one of the major games from Netmarble this year and will be available on the market soon.
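For readers unfamiliar with the character error rate (CER) quoted in the abstract, the sketch below shows the standard definition: the character-level edit distance between recognized and reference text, divided by the number of reference characters. It is a generic illustration, not the authors' evaluation code; the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(references, hypotheses):
    """Total character edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars
```

As a usage example with made-up commands, `character_error_rate(["attack", "open map"], ["atack", "open mop"])` counts one deletion and one substitution over 14 reference characters, giving 2/14, or about 0.143.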