JUST: JD Urban Spatio-Temporal Data Engine

Published in ICDE, 2020

Download paper here

Abstract: With the prevalence of positioning techniques, a prodigious number of spatio-temporal data is generated constantly. To effectively support sophisticated urban applications, e.g., location-based services, based on spatio-temporal data, it is desirable for an efficient, scalable, update-enabled, and easy-touse spatio-temporal data management system. This paper presents JUST, i.e., JD Urban Spatio-Temporal data engine, which can efficiently manage big spatio-temporal data in a convenient way. JUST incorporates the distributed NoSQL data store, i.e., Apache HBase, as the underlying storage, GeoMesa as the spatio-temporal data indexing tool, and Apache Spark as the execution engine. We creatively design two indexing techniques, i.e., Z2T and XZ2T, which accelerates spatiotemporal queries tremendously. Furthermore, we introduce a compression mechanism, which not only greatly reduces the storage cost, but also improves the query efficiency. To make JUST easy-to-use, we design and implement a complete SQL engine, with which all operations can be performed through a SQL-like query language, i.e., JustQL. JUST also supports inherently new data insertions and historical data updates without index reconstruction. JUST is deployed as a PaaS in JD with multi-users support. Many applications have been developed based on the SDKs provided by JUST. Extensive experiments are carried out with six state-of-the-art distributed spatio-temporal data management systems based on two real datasets and one synthetic dataset. The results show that JUST has a competitive query performance and is much more scalable than them.

image-20220406203150325

Li R, He H, Wang R, et al. Just: Jd urban spatio-temporal data engine[C]//2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020: 1558-1569.