Documents
Presentation Slides
Constructing the CDAWG CFG using LCP-Intervals
- Citation Author(s):
- Submitted by:
- Alan Cleary
- Last updated:
- 21 February 2023 - 12:25pm
- Document Type:
- Presentation Slides
- Document Year:
- 2023
- Event:
- Presenters:
- Jordan Dood
- Paper Code:
- 240
- Categories:
- Keywords:
- Log in to post comments
It is known that a context-free grammar (CFG) that produces a single string can be derived from the compact directed acyclic word graph (CDAWG) for the same string. In this work, we show that the CFG derived from a CDAWG is deeply connected to the maximal repeat content of the string it produces and thus has O(m) rules, where m is the number of maximal repeats in the string. We then provide a generic algorithm based on this insight for constructing the CFG from the LCP-intervals of a string in O(n) time, where n is the length of the string. This includes a novel data-structure to support stabbing queries on LCP-intervals in O(1 + k) time after O(n) preprocessing time, where k is the number of intervals stabbed. These results connect the CFG to properties of the string it produces and relates it to other string data-structures, allowing it to be studied independently of the CDAWG and providing opportunity for innovation of grammar-based compression algorithms.