Abstract: The rapid development of positioning technology produces an extremely large volume of spatio-temporal data with various geometry types such as point, line string, polygon, or a mixed combination of them. As one of the most basic but time-consuming operations, 𝑘 nearest neighbors join (𝑘NN join) has attracted much attention. However, most existing works for 𝑘NN join either ignore temporal information or consider point data only. This paper proposes a novel and useful problem, i.e., ST-𝑘NN join, which considers both spatial closeness and temporal concurrency. To support ST-𝑘NN join over a huge amount of spatio-temporal data with any geometry types efficiently, we propose a novel distributed solution based on Apache Spark. Specifically, our method adopts a two-round join framework. In the first round join, we propose a new spatio-temporal partitioning method that achieves spatiotemporal locality and load balance at the same time. We also propose a lightweight index structure, i.e., Time Range Count Index (TRCindex), to enable efficient ST-𝑘NN join. In the second round join, to reduce the data transmission among different machines, we remove duplicates based on spatio-temporal reference points before shuffling local results. Extensive experiments are conducted using three real big datasets, showing that our method is much more scalable and achieves 9X faster than baselines. A demonstration system is deployed and the source code is released.
Li R, Wang R, Liu J, He H, et al. Distributed Spatio-Temporal k Nearest Neighbors Join[C]//Proceedings of the 29th International Conference on Advances in Geographic Information Systems. 2021: 435-445. [SIGSPATIAL 2021].