A Hybrid MPI+OpenMP Application for Processing
Big Trajectory Data
Natalija STOJANOVIC, Dragan STOJANOVIC
University of Nis, Faculty of Electronic Engineering,
14, A. Medvedeva, 18000 Nis, Serbia
Abstract: In this paper, we present the use of the parallel/distributed programming frameworks MPI and OpenMP in the processing and analysis of big trajectory data. We developed a distributed application that first performs a spatial join between big trajectory data and regions of interest, and then aggregates the join results to provide an analysis of movement. The solution was implemented using a hybrid distributed/parallel programming model, based on the MPI and OpenMP programming interfaces. The experimental evaluation, which detects the most popular places in a city from a large-scale trajectory dataset, demonstrates the performance gains and feasibility of our approach.
Keywords: High performance computing, Big data processing, Geospatial analysis, MPI, OpenMP.
CITE THIS PAPER AS:
Natalija STOJANOVIC, Dragan STOJANOVIC, A Hybrid MPI+OpenMP Application for Processing Big Trajectory Data, Studies in Informatics and Control, ISSN 1220-1766, vol. 24 (2), pp. 229-236, 2015. https://doi.org/10.24846/v24i2y201511
Many of today’s compute- and data-intensive applications, such as computer games, database and Web searching, financial and economic forecasting, climate modelling, environment monitoring, and bioinformatics, demand acceleration and significant performance improvements. Several approaches can be employed to improve the performance of such applications. All of them are based on the parallel computing paradigm, either on single multi/many-core computer systems or over a distributed computing infrastructure within a cluster or cloud architecture. Such solutions employ parallel and distributed programming models, frameworks and interfaces for multi-core CPUs, many-core graphics processing units (GPUs), networks/clusters of workstations and PaaS in computer clouds.
Advances in remote sensing technologies, sensor networks and the proliferation of mobile devices in everyday use have resulted in the acquisition of massive amounts of geospatial data and moving object trajectories. In addition, large-scale geo-scientific modelling and simulations, as well as geo-social network activities (e.g. Twitter and Facebook), are generating petabytes of spatio-temporal data per day. These ever-increasing volumes of spatio-temporal data call for new models and computationally effective algorithms to efficiently store, process, analyze and visualize such big data in advanced data-intensive systems and applications. Recently, high-performance computing (HPC) has been promoted as a way to meet the requirements of advanced Geographic Information Systems (GIS) applications.
The recent proliferation of distributed and cloud computing infrastructures and platforms, both public clouds (e.g., Amazon EC2) and private computer clouds and computer clusters, has given further impetus to the processing and analysis of complex Big data. In particular, implementations that run on clusters of multicore shared-memory computers (nodes) have made this paradigm an emerging research and development topic. In this paper, we employ MPI (Message Passing Interface) for message-passing parallel programming across a cluster of computers/nodes, and OpenMP for shared-memory parallel programming within a node, to implement an application for large-scale trajectory data processing and analysis.
The Message Passing Interface (MPI) has become the major model for programming distributed-memory applications. Message passing works by creating processes that are distributed among a group of computing nodes. When an MPI program runs, all processes execute the same code.
OpenMP directives can be added to a sequential program to define how the work is shared among the threads that execute on different processor cores and to order access to shared data as needed. OpenMP supports the fork-join, shared-memory programming model.
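As an illustration of the fork-join model, the following minimal sketch adds a single OpenMP directive to an otherwise sequential loop. The function name `sum_parallel` is hypothetical and is not taken from the application described in this paper; the example only shows how a loop can be shared among threads while a `reduction` clause orders the updates to a shared accumulator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum an array with an OpenMP work-sharing loop.
   The directive forks a team of threads, divides the loop iterations among
   them, and joins the team at the end of the loop. Each thread accumulates
   into a private copy of `total`; the reduction clause combines the copies. */
long sum_parallel(const int *values, size_t n)
{
    assert(values != NULL || n == 0);
    long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < (long)n; i++) {
        total += values[i];
    }
    return total;
}
```

When compiled with OpenMP support (for example `-fopenmp` in GCC), the loop runs on multiple threads. Without it, the directive is ignored and the function runs sequentially with the same result.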
The hybrid MPI+OpenMP approach integrates different levels of parallelism. It combines distributed-memory parallelism based on message passing with shared-memory parallelism based on multithreading. MPI is used for communication between processes on different multicore nodes, and OpenMP is used for communication between threads within a multicore node.
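The two levels of parallelism can be pictured as a two-level split of the input. The sketch below assumes a simple contiguous block partitioning, which is an assumption made for illustration and not necessarily the scheme used in the implementation described here. The names `range_t`, `block_range` and `hybrid_range` are hypothetical. In a real hybrid program, `rank` would typically come from `MPI_Comm_rank` and `thread` from `omp_get_thread_num`; they are plain parameters here so the arithmetic can be read in isolation.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end). */
typedef struct { size_t begin; size_t end; } range_t;

/* Split [0, n) into `parts` nearly equal contiguous blocks and return the
   block with the given index. The first (n % parts) blocks receive one
   extra element. */
range_t block_range(size_t n, size_t parts, size_t index)
{
    assert(parts > 0 && index < parts);
    size_t base = n / parts;
    size_t extra = n % parts;
    size_t begin = index * base + (index < extra ? index : extra);
    size_t len = base + (index < extra ? 1 : 0);
    range_t r = { begin, begin + len };
    return r;
}

/* Two-level decomposition: first take the block owned by an MPI process
   (rank), then subdivide that block among the threads of the process. */
range_t hybrid_range(size_t n, size_t num_ranks, size_t rank,
                     size_t num_threads, size_t thread)
{
    range_t outer = block_range(n, num_ranks, rank);
    range_t inner = block_range(outer.end - outer.begin, num_threads, thread);
    range_t r = { outer.begin + inner.begin, outer.begin + inner.end };
    return r;
}
```

For example, with 10 items, 3 processes and 2 threads per process, process 1 owns items 4 to 6 and its second thread handles item 6 only.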
The main contributions of the paper are:
- We propose the use of a hybrid programming model and develop a hybrid MPI+OpenMP application that performs a spatial join between a trajectory data set and spatial regions around points/places of interest (POI), and further aggregates the join results to detect the most popular POIs in the city (the Popular Places algorithm).
- We perform an experimental evaluation that indicates performance improvements with respect to pure MPI-based and sequential (single-node) solutions and shows the feasibility of using a hybrid programming model for data-intensive GIS computing.
- We analyze the effects of the hybrid MPI/OpenMP implementation and propose hints for large-scale spatio-temporal data processing in other GIS domains.
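The spatial join and aggregation named in the first contribution can be sketched as follows. This is a minimal brute-force illustration under stated assumptions, not the implementation evaluated in the paper: regions are assumed to be circles in planar coordinates, and the types `point_t` and `region_t` and the functions `count_visits` and `most_popular` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } point_t;          /* one trajectory sample */
typedef struct { double x, y, radius; } region_t; /* assumed circular region around a POI */

/* Spatial join plus aggregation: counts[j] becomes the number of trajectory
   points that fall inside region j. The outer loop is shared among OpenMP
   threads; each thread writes only its own counts[j], so no locking is needed. */
void count_visits(const point_t *pts, size_t n_pts,
                  const region_t *regions, size_t n_regions, long *counts)
{
    assert(counts != NULL || n_regions == 0);
    #pragma omp parallel for
    for (long j = 0; j < (long)n_regions; j++) {
        long c = 0;
        double r2 = regions[j].radius * regions[j].radius;
        for (size_t i = 0; i < n_pts; i++) {
            double dx = pts[i].x - regions[j].x;
            double dy = pts[i].y - regions[j].y;
            if (dx * dx + dy * dy <= r2) {
                c++;
            }
        }
        counts[j] = c;
    }
}

/* Return the index of the region with the highest count (first one on ties). */
size_t most_popular(const long *counts, size_t n_regions)
{
    assert(n_regions > 0);
    size_t best = 0;
    for (size_t j = 1; j < n_regions; j++) {
        if (counts[j] > counts[best]) {
            best = j;
        }
    }
    return best;
}
```

In a distributed setting, each MPI process could run `count_visits` on its own share of the trajectory points and the per-process counts could then be summed, for instance with `MPI_Reduce` and `MPI_SUM`. That step is omitted here to keep the sketch free of MPI dependencies.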
The rest of the paper is structured as follows. Section II presents the research work related to high-performance processing and analysis of large-scale spatial and spatio-temporal data using existing HPC paradigms. Section III describes the pure MPI and the hybrid MPI+OpenMP implementations for processing a big trajectory data set over a set of places of interest (POI). Section IV presents the results and the evaluation of the hybrid MPI+OpenMP, pure MPI and sequential implementations of the Popular Places algorithm. Section V concludes the paper and gives directions for future research.