Xerox Conversational AI Agent (XCAI) for Enterprise Knowledgebase Q&A

Citation Author(s):
Vivek Tyagi, Arunasish Sen, Sriranjani R, Pragathi Praveena
Submitted by:
Niranjan Viladkar
Last updated:
18 March 2016 - 1:50am
Document Type:
Poster
Document Year:
2016
In the past five years, significant advances in Large Vocabulary Speech Recognition (LVSR), Deep Learning (DL), and Spoken Language Understanding (SLU), along with the explosive growth of wireless network bandwidth, have given rise to three compelling conversational AI agents available on Android, iOS, and Microsoft smartphones. These agents — Google Now, Apple Siri, and Microsoft Cortana — are now the preferred way to perform mobile web search and to command and control various smartphone apps.

However, these AI agents are tightly coupled to the World Wide Web (WWW) and to smartphone command and control, and are generally not available as a voice interface for querying an "Enterprise Knowledge Base".

In this demonstration we present the Xerox Conversational AI Agent (XCAI), which provides spoken question-and-answer (Q&A) functionality in real time over an enterprise knowledge base using LVSR, SLU, and Text-to-Speech (TTS) [5]. This has enormous applications in customer care centers, where XCAI can provide automated self-care service for the repetitive troubleshooting and WH (what, where, why, who, how) queries, while the more complicated open-ended queries are escalated to human agents.
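The split between automated self-care and human escalation described above could be sketched as follows. This is a hypothetical illustration, not the actual XCAI routing logic; the function name and WH word list are assumptions.

```python
# Hypothetical routing between automated self-care (WH factoid questions)
# and human agents (open-ended requests). Not the actual XCAI implementation.
WH_WORDS = {"what", "where", "why", "who", "whom", "whose", "when", "which", "how"}

def route_query(transcript):
    """Return 'self-care' for WH factoid questions, 'human-agent' otherwise."""
    words = transcript.lower().strip(" ?!.").split()
    if words and words[0] in WH_WORDS:
        return "self-care"
    return "human-agent"
```

A real system would of course use the SLU module's semantic analysis rather than a surface keyword check, but the routing decision has the same shape.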

To develop XCAI, we built a highly accurate real-time LVSR system. We used the Kaldi [1] library to train a state-of-the-art acoustic model (AM) with Linear Discriminant Analysis (LDA) and Maximum Likelihood Linear Transform (MLLT) features, and a real-time LVSR decoder built on the GStreamer [2] framework. The AM is trained on the TED-LIUM [3] corpus, which consists of about 1,200 speakers (360 hours) with diverse accents (US English, UK English, Continental European, Indian, Chinese), yielding a multi-accent English LDA+MLLT AM. We apply online Cepstral Mean and Variance Normalization (CMVN), which we observed to significantly improve recognition accuracy with the LDA+MLLT AM.

We then use the OpenEphyra [4] QA system to convert the LVSR output into a query (SLU), which is searched over the Web for answer extraction, with the World Wide Web serving as the knowledge base. OpenEphyra uses a combination of pattern matching and semantic and statistical approaches to create a semantic representation of the query, which is then passed to the knowledge base. In this demonstration we restrict the XCAI domain to Q&A over historical and geographical factoid and list questions, and show a task completion accuracy of about 90% even with multi-accented English queries. By replacing the WWW with an enterprise knowledge base (such as that of a customer care center, factory shop, hospital, airport, or municipal citizen service), XCAI can provide a conversational AI Q&A system for querying these service centers automatically through spoken queries on smartphones or even networked kiosks and communication devices.
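The online CMVN step mentioned above is causal: each feature frame is normalized with statistics accumulated over the frames seen so far, so no look-ahead is needed for real-time decoding. A minimal sketch of that idea (pure Python, per-dimension running statistics; Kaldi's actual implementation additionally uses a sliding window and speaker priors):

```python
import math

def online_cmvn(frames, eps=1e-8):
    """Causal cepstral mean and variance normalization (sketch only).

    Each frame is normalized using the mean and variance of all frames
    observed up to and including it, so the transform can run in real time.
    """
    dim = len(frames[0])
    count = 0
    sums = [0.0] * dim      # running per-dimension sum
    sq_sums = [0.0] * dim   # running per-dimension sum of squares
    out = []
    for frame in frames:
        count += 1
        for d in range(dim):
            sums[d] += frame[d]
            sq_sums[d] += frame[d] ** 2
        mean = [sums[d] / count for d in range(dim)]
        # eps floors the variance so early frames do not divide by zero
        var = [max(sq_sums[d] / count - mean[d] ** 2, eps) for d in range(dim)]
        out.append([(frame[d] - mean[d]) / math.sqrt(var[d]) for d in range(dim)])
    return out
```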

Our XCAI system’s LVSR, SLU (OpenEphyra), and TTS modules run in the cloud (a Xerox datacenter) and can therefore provide real-time conversational AI technology for information seeking, product troubleshooting, and various other self-services in environments ranging from customer care centers to factory shops, hospitals, and municipal citizen service centers.
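The end-to-end request flow of the cloud-hosted modules above can be sketched as a simple composition: audio goes through LVSR to text, the QA module maps the transcript to an answer, and TTS renders the answer as speech. The function names and stub components below are illustrative assumptions, not the actual Xerox APIs.

```python
# Minimal sketch of the XCAI request flow (names are assumed for
# illustration): speech -> LVSR transcript -> SLU/QA answer -> TTS audio.
def answer_spoken_query(audio, asr, qa, tts):
    transcript = asr(audio)   # LVSR: audio -> text
    answer = qa(transcript)   # SLU + knowledge-base lookup
    return tts(answer)        # TTS: text -> speech

# Usage with stub components standing in for the real cloud services:
reply = answer_spoken_query(
    b"...",  # raw audio bytes (elided)
    asr=lambda a: "who invented the telephone",
    qa=lambda q: "Alexander Graham Bell",
    tts=lambda t: "<audio:" + t + ">",
)
```

Keeping the three stages behind plain function interfaces like this is also what makes it straightforward to swap the WWW-backed QA module for an enterprise knowledge base, as the abstract proposes.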

Demo video link : https://drive.google.com/open?id=0B1BDkMhT0QPiNEp0aDlIbUd6T1E

[1] Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, and Karel Vesely, “The Kaldi speech recognition toolkit,” in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, Dec. 2011, IEEE Signal Processing Society, IEEE Catalog No.: CFP11SRW-USB.
[2] “GStreamer: Open source multimedia framework,” http://gstreamer.freedesktop.org/.
[3] Anthony Rousseau, Paul Deléglise, and Yannick Estève, “TED-LIUM: an Automatic Speech Recognition Dedicated Corpus,” in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, May 2012, European Language Resources Association (ELRA).
[4] Nico Schlaefer, Petra Gieselmann, Thomas Schaaf, and Alex Waibel, “A pattern learning approach to question answering within the Ephyra framework,” in Proceedings of the 9th International Conference on Text, Speech and Dialogue (TSD’06), Berlin, Heidelberg, 2006, pp. 687–694, Springer-Verlag.
[5] “MaryTTS: Text-to-Speech System,” http://mary.dfki.de/.
