Semantrix: A Compressed Semantic Matrix

We present a compact data structure to represent both the duration and length of homogeneous segments of trajectories from moving objects in a way that, as a data warehouse, it allows us to efficiently answer cumulative queries. The division of trajectories into relevant segments has been studied in the literature under the topic of Trajectory Segmentation. In this paper, we design a data structure to compactly represent them and the algorithms to answer the more relevant queries. We experimentally evaluate our proposal in the real context of an enterprise with mobile workers (truck drivers) where we aim at analyzing the time they spend in different activities. To test our proposal under higher stress conditions we generated a huge amount of synthetic realistic trajectories and evaluated our system with those data to have a good idea about its space needs and its efficiency when answering different types of queries

DCC20.pdf

DCC20.pdf (692)

Thumbs Up

Comments

Thank's for the illustrated

Permalink Submitted by Dominik Koeppl on 5 April 2020 - 8:43am

Thank's for the illustrated presentation!
The problem looks interesting to work on.
Following your talk and shortly reading the arXiv paper, I have the following two questions:
1. I wonder whether there are publicly available datasets for this problem.
2. The space usage of Semantrix is much higher than storing the plain matrix. To shrink this large gap:
- Do you see any particularity in the obtained matrix OS of your used datasets (which hopefully reflect a real-world situation)? You used run-length encoding and encode the run lengths by the bit vector.
However, I think that using a grammar-based approach could give better compression if the OS sequence is highly repetitive.
For instance, ESP also exploits run-length encoding, and there are index data structures supporting pattern search queries.
- the rows of all matrices S1 to S7 are non-decreasing sequences. How about storing these values like the PLCP array unary with a select support data structure? Is there a problem that the values become much larger than the width of the rows?

Thank's for your review,

Permalink Submitted by Tirso Rodeiro on 29 April 2020 - 12:30pm

Thank's for your review, Dominik!

First, we are currently generating the truck data over real truck trips for a private company, I do not think that dataset would be public any time soon. On the other hand, as far as I am concerned, there are not public datasets for this particular problem available.

Talking about other possible alternatives on used structures, it could be interesting to apply ESP over our runs to solve pattern queries or even SLP-based indexes.
However, as we are focused on time response in acumulative queries, I am nor really sure that using PLCP would improve our matrices for our particular interests.

Stay safe, Dominik!

Tirso, thank's for your

Permalink Submitted by Dominik Koeppl on 30 April 2020 - 5:35am

Tirso, thank's for your answers.
About the PLCP-based compression: I think that you gain compression, but perhaps you will be a little bit slower since you have to evaluate a select operation for each access.

CITE

Documents

Presentation Slides