Sorry, you need to enable JavaScript to visit this website.

Semantrix: A Compressed Semantic Matrix

Primary tabs

Citation Author(s):
Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro
Submitted by:
Tirso Rodeiro
Last updated:
20 March 2020 - 3:35pm
Document Type:
Presentation Slides
Document Year:
Presenters Name:
Tirso Varela Rodeiro



We present a compact data structure to represent both the duration and length of homogeneous segments of trajectories from moving objects in a way that, as a data warehouse, it allows us to efficiently answer cumulative queries. The division of trajectories into relevant segments has been studied in the literature under the topic of Trajectory Segmentation. In this paper, we design a data structure to compactly represent them and the algorithms to answer the more relevant queries. We experimentally evaluate our proposal in the real context of an enterprise with mobile workers (truck drivers) where we aim at analyzing the time they spend in different activities. To test our proposal under higher stress conditions we generated a huge amount of synthetic realistic trajectories and evaluated our system with those data to have a good idea about its space needs and its efficiency when answering different types of queries

1 user has voted: Dominik Koeppl


Thank's for the illustrated presentation!
The problem looks interesting to work on.
Following your talk and shortly reading the arXiv paper, I have the following two questions:
1. I wonder whether there are publicly available datasets for this problem.
2. The space usage of Semantrix is much higher than storing the plain matrix. To shrink this large gap:
- Do you see any particularity in the obtained matrix OS of your used datasets (which hopefully reflect a real-world situation)? You used run-length encoding and encode the run lengths by the bit vector.
However, I think that using a grammar-based approach could give better compression if the OS sequence is highly repetitive.
For instance, ESP also exploits run-length encoding, and there are index data structures supporting pattern search queries.
- the rows of all matrices S1 to S7 are non-decreasing sequences. How about storing these values like the PLCP array unary with a select support data structure? Is there a problem that the values become much larger than the width of the rows?

Thank's for your review, Dominik!

First, we are currently generating the truck data over real truck trips for a private company, I do not think that dataset would be public any time soon. On the other hand, as far as I am concerned, there are not public datasets for this particular problem available.

Talking about other possible alternatives on used structures, it could be interesting to apply ESP over our runs to solve pattern queries or even SLP-based indexes.
However, as we are focused on time response in acumulative queries, I am nor really sure that using PLCP would improve our matrices for our particular interests.

Stay safe, Dominik!

Tirso, thank's for your answers.
About the PLCP-based compression: I think that you gain compression, but perhaps you will be a little bit slower since you have to evaluate a select operation for each access.

Dataset Files