TrajMesa: a distributed NoSQL-based trajectory data management system

Published in TKDE, 2021

Download paper here

Abstract: With the development of positioning technology, a large number of trajectories have been generated, which are very useful for many urban applications. However, it is challenging to manage trajectory data for its spatio-temporal dynamics and high-volume properties. Existing trajectory data management frameworks suffer from efficiency or scalability problem, and support only limited trajectory query types. This paper takes the first attempt to build a holistic distributed NoSQL trajectory query engine, named TrajMesa, based on GeoMesa, an open-source indexing toolkit for spatio-temporal data. TrajMesa can manage a prohibitively large number of trajectories, and support plenty of query types efficiently. Specifically, we first design a novel trajectory storage schema, which reduces the storage size tremendously. We then devise a novel indexing key schema for time ranges, based on which ID (i.e. moving object identifier) temporal query can be supported efficiently. To reduce the amount of retrieved trajectory data for a spatial range query, we propose a position code to indicate the spatial location of trajectories accurately. We also propose a bunch of pruning strategies for similarity query and k-NN query in the NoSQL environment. Extensive experiments are conducted using two real datasets and one synthetic dataset, verifying the powerful query efficiency and scalability of TrajMesa. The results show that TrajMesa is about 100 ∼ 1000 times faster than the state-of-the-art trajectory management frameworks in our experimental settings. TrajMesa is currently deployed in JD company, processing over 1T trajectories of JD Logistics every day

image-20220406203748626

Li R, He H, Wang R, et al. TrajMesa: a distributed NoSQL-based trajectory data management system[J]. IEEE Transactions on Knowledge and Data Engineering, 2021.