Computing Matching Statistics on Wheeler DFAs

Citation Author(s):: Nicola Cotumaccio
Travis Gagie
Marinella Sciortino

Alessio Conte, Nicola Cotumaccio, Travis Gagie, Giovanni Manzini, Nicola Prezza, Marinella Sciortino
Submitted by:: Nicola Cotumaccio
Last updated:: 14 March 2023 - 1:37pm
Document Type:: Presentation Slides
Document Year:: 2023
Event:: DCC 2023
Presenters:: Nicola Cotumaccio

Categories:: Other
Keywords:: Data Compression

Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time and space efficient algorithm for computing matching statistics which relies on some components of a compressed suffix tree - notably, the longest common prefix (LCP) array.
In this paper, we show how their algorithm can be generalized from strings to Wheeler deterministic finite automata. Most importantly, we introduce a notion of LCP array for Wheeler automata, thus establishing a first clear step towards extending (compressed) suffix tree functionalities to labeled graphs.

Computing Matching Statistics on Wheeler DFAs.pdf

Computing Matching Statistics on Wheeler DFAs (275)

Thumbs Up

CITE

Documents

Presentation Slides

Computing Matching Statistics on Wheeler DFAs

Computing Matching Statistics on Wheeler DFAs.pdf

QUESTIONS?