
A Pragmatic Approach for Refined Feature Selection for
the Prediction of Road Accident Severity

1 Anna University (CEG Campus)
Guindy, Chennai, Tamilnadu, 600 025, India
2 Rathinam Technical Campus
Coimbatore, Tamilnadu, 641 021, India

Abstract: Road accident analysis is a very challenging task, and investigating the dependencies between attributes becomes complex because of the many environmental and road-related factors involved. Extensive research is being conducted to identify the factors that most influence fatal accidents. In this paper we propose a novel methodology called Voting Algorithm for Aggregated Feature Selection (VAAFS), which selects an optimal number of significant features by majority vote among more than one feature mining algorithm. The features selected by VAAFS are then passed to classifiers to model accident severity on an Indian road accident data set obtained from the Coimbatore City Traffic Head Quarters, Tamilnadu, India, and on international data sets obtained from the Fatality Analysis Reporting System (FARS), USA, and the STATS19 data collection system maintained by the government of the United Kingdom (UK). The output from VAAFS shows that type of vehicle, high-risk road users such as pedestrians and two-wheelers, young road users, government holidays, selected week days, manner of collision, seating position, etc. are the most significant factors in modeling accident severity. Compared with previous studies, the proposed method reduces the misclassification error rate and improves predictive accuracy with the optimal features. It seems very promising for observing road accident patterns.

Keywords: Chi-Square, Classification, Ensemble, Fatality, Feature Selection, Majority Voting, Road Accidents.

R. GEETHA RAMANI, Shanthi SELVARAJ, A Pragmatic Approach for Refined Feature Selection for the Prediction of Road Accident Severity, Studies in Informatics and Control, ISSN 1220-1766, vol. 23 (1), pp. 41-52, 2014.

1. Introduction

The costs of fatalities and injuries due to traffic accidents have a great impact on society. The World Health Organization (WHO, 2009) predicts that road collisions will rise from the ninth leading cause of death in 2004 to the fifth in 2030. Road accidents have earned India a dubious distinction: more people die in road accidents in India than anywhere else in the world. According to the report of the National Crime Records Bureau (NCRB), India, the total number of deaths every year due to road accidents has now passed the 1,35,000 mark. About 60,000 lives are lost every year in road accidents and this rate is 25 times more than that of the U.S.A. The alarming rate of increase in fatalities due to road accidents in the country warrants a method to understand the causes, the road users and their behavior.

Road traffic accidents are influenced by many factors. With the exponential growth of the population, the number of vehicles and the need for their use, understanding the multiple causes of road accident fatalities has become more significant, especially with the advent of sophisticated technology [20]. In recent years, with the growth in the volume and travel speed of road traffic, the number of traffic accidents, especially severe crashes, has been increasing rapidly on a yearly basis [13]. Identification of these factors can help improve the overall driving safety situation, not only by preventing accidents but also by reducing their severity [14]. It is crucial for engineers to extract useful information from existing data to analyze the causes of traffic accidents, so that traffic administrations can be more accurately informed and better policies can be introduced [19].

The characteristics and availability of fatal road-crash databases have been listed worldwide [17]. Most of the listed databases contain only summary data rather than individual accident information. The ever-increasing amount of data has far exceeded the human ability to comprehend it without the use of powerful tools [10]. Data mining is the process of analyzing data from various perspectives and summarizing it into useful, meaningful and related information [10, 7]. This information can then be seen as a kind of outline of the input data, and can be used in further investigation or applied in the field of machine learning and predictive analytics. Many data mining algorithms and tools have been developed for feature selection, classification and clustering. These algorithms are used to discern and uncover knowledge patterns and to extract significant and meaningful information associated with the application domain. Though many studies have dealt with this problem, a lack of consensus is still visible in the analysis of such data sets. The difficulty is compounded by complex features arising from varied geographical, environmental, and social practices. In this paper, a novel method of traffic accident data mining, based on an aggregated voting method and a comparative analysis of a variety of traffic accident data mining techniques, is put forward to identify the significance of different attributes and their respective values. The proposed method is validated on an Indian accident data set and a foreign data set. Precisely, this work has the following objectives:

  1. it attempts to initiate a scientific process that uses data mining tools effectively and to provide reasonable findings for site-specific traffic management;
  2. a novel accident severity prediction framework for accident datasets with enhanced prediction accuracy is proposed;
  3. a set of optimal and significant features is identified to predict accident severity;
  4. the performance of machine learning algorithms on the binary class categorization of two accident datasets is compared and evaluated.
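
The aggregated voting idea behind the feature selection step can be illustrated with a minimal sketch. This is not the authors' VAAFS implementation: the function name `majority_vote_features`, the `min_votes` parameter, the default threshold (more than half of the selectors) and the feature names are all assumptions made for illustration only. Each inner list stands for the features chosen by one hypothetical feature mining algorithm.

```python
from collections import Counter


def majority_vote_features(selections, min_votes=None):
    """Keep the features picked by at least `min_votes` selectors.

    `selections` is a list of feature-name lists, one per selector.
    If `min_votes` is omitted, a strict majority of selectors is required
    (an assumed default, not taken from the paper).
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    # Each selector gets one vote per feature, so duplicates are collapsed.
    votes = Counter(f for selected in selections for f in set(selected))
    return sorted(f for f, count in votes.items() if count >= min_votes)


# Hypothetical outputs of three feature mining algorithms.
selections = [
    ["vehicle_type", "collision_manner", "age_group", "weekday"],
    ["vehicle_type", "seating_position", "age_group"],
    ["vehicle_type", "collision_manner", "light_condition"],
]

chosen = majority_vote_features(selections)
print(chosen)
```

With three selectors the assumed default requires two votes, so features proposed by a single algorithm are dropped. Raising `min_votes` to the number of selectors keeps only the features on which every algorithm agrees.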

The work is demonstrated on road accident datasets collected from the Traffic Head Quarters, Coimbatore City, Tamilnadu, India, for the year 2012; from the Fatality Analysis Reporting System (FARS), which is available in the University of Alabama's Critical Analysis Reporting Environment (CARE) system, USA; and from the road accident training dataset obtained from the STATS19 data collection system, maintained by the government of the United Kingdom (UK).

The paper is organized as follows. Section 2 gives a review of previous models studied for the analysis of road traffic accidents. Section 3 describes the nature of the input data and provides the necessary details about the modeling strategies used in this study. Results are discussed in Section 4, with concluding remarks in Section 5.


References

  1. PANDE, A., M. ABDEL-ATY, A. DAS, A Classification Tree Based Modelling Approach for Segment Related Crashes on Multilane Highways, Journal of Safety Research, vol. 41, 2010, pp. 391-397.
  2. AMEEN, J. R. M., J. A. NAJI, Causal Models for Road Accident Fatalities in Yemen, Accident Analysis and Prevention, Elsevier, vol. 33, 2001, pp. 547-561.
  3. AYRAMO, S., P. PIRTALA, J. KAUTTONEN, K. NAVEED, T. KARKKAINEN, Mining Road Traffic Accidents, Reports of the Department of Mathematical Information Technology Series C. Software and Computational Engineering, No. C. 2/2009.
  4. PRATO, C. G., V. GITELMAN, S. BEKHOR, Mapping Patterns and Characteristics of Fatal Road Accidents in Israel, Proceedings of the 12th WCTR Conference, July 11-15, Lisbon, Portugal, 2010.
  5. CHERNOFF, H., E. L. LEHMANN, The Use of Maximum Likelihood Estimates in χ2 Tests for Goodness of Fit, Annals of Mathematical Statistics, vol. 25, issue 3, 1954, pp. 579-586.
  6. CHONG, M., A. ABRAHAM, M. PAPRZYCKI, Traffic Accident Analysis Using Machine Learning Paradigms, Informatica, vol. 29, 2005, pp. 89-98.
  7. CIOS, K., W. PEDRYCZ, R. SWINIARSKI, Data Mining Methods for Knowledge Discovery, Boston: Kluwer Academic Publishers, 1998.
  8. DARBY, P., W. MURRAY, R. RAESIDE, Applying Online Fleet Driver Assessment to Help Identify, Target and Reduce Occupational Road Safety Risks, Safety Science, Science Direct, vol. 47, 2009, pp. 436-442.
  9. DURDURAN, S. S., A Decision Making System to Automatic Recognize of Traffic Accidents on the Basis of a GIS Platform, Expert Systems with Applications, vol. 37, 2010, pp. 7729-7736.
  10. HAN, J., M. KAMBER, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2000.
  11. HALL, M. A., L. A. SMITH, Feature Selection for Machine Learning Comparing a Correlation based Filter Approach to the Wrapper, Proceedings of the 12th International Florida Artificial Intelligence Research Society Conf., 1998.
  12. JAMES, M., Classification Algorithms, John Wiley, 1985.
  13. XI, J., Z. GAO, S. NIU, T. DING, G. NING, A Hybrid Algorithm of Traffic Accident Data Mining on Cause Analysis, Mathematical Problems in Engineering, 2013, 2013/302627.
  14. KASHANI, A. T., A. SHARIAT-MOHAYMANY, A. RANJBARI, A Data Mining Approach to Identify Key Factors of Traffic Injury Severity, Promet – Traffic & Transportation, vol. 23, no. 1, 2011, pp. 11-17.
  15. KIM, D., Y. LEE, S. WASHINGTON, K. CHOI, Modelling Crash Outcome Probabilities at Rural Intersections: Application of Hierarchical Binomial Logistic Models, Accident Analysis and Prevention, Elsevier, vol. 39, 2007, pp. 125-134.
  16. KIRA, K., L. A. RENDELL, A Practical Approach to Feature Selection, Proceedings of the 9th International Conference on Machine Learning (ICML 1992), 1992.
  17. LUOMA, J., M. SIVAK, Characteristics and Availability of Fatal Road-Crash Databases Worldwide, The University of Michigan, Transportation Research Institute, 2006.
  18. MARUKATAT, Structure-based Rule Selection Framework for Association Rule Mining of Traffic Accident Data, Computational Intelligence and Security, vol. 4456, 2007, pp. 231-239.
  19. NABI, H., L. R. SALMI, S. LAFONT, M. CHIRON, M. ZINS, E. LAGARDE, Attitudes Associated with Behavioural Predictors of Serious Road Traffic Crashes: Results from the Gazel Cohort, Injury Prevention, vol. 13, no. 1, 2007, pp. 26-31.
  20. SINGH, R. K., S. K. SUMAN, Accident Analysis and Prediction of Model on National Highways, International Journal of Advanced Technology in Civil Engineering, ISSN: 2231–5721, vol. 1, Issue 2, 2012, pp. 25-30.
  21. ARYANI SOEMITRO, R. A., Y. S. BAHAT, Accident Analysis Assessment to the Accident Influence Factors on Traffic Safety Improvement Case: Palangka Raya Tangkiling National Road, Proceedings of the Eastern Asia Society for Transportation Studies, Vol. 5, 2005, pp. 2091-2105.
  22. SHANTHI, S., R. GEETHA RAMANI, Feature Relevance Analysis and Classification of Road Traffic Accident Data through Data Mining Techniques, Proceedings of IAENG-World Congress on Engineering and Computer Science, San Francisco, USA, vol. 1, 2012, pp. 122-127.
  23. SHANTHI, S., R. GEETHA RAMANI, Vehicle Safety Device (Airbag) Specific Classification of Road Traffic Accident Patterns through Data Mining Techniques, Springer Publications: Advances in Intelligent Systems and Computing, Proceedings of the Second International Conference on Advances in Computing and Information Technology, Chennai, vol. 177, 2012, pp. 433-443.
  24. SHANTHI, S., R. GEETHA RAMANI, A Comparative Evaluation of Classification Methods in the Prediction of Road Traffic Accident Patterns, Proceedings of the International Conference on Future Communication and Computer Technology, Beijing, China, ISBN: 978-988-15121-4-7, 2012.
  25. SHANTHI, S., R. GEETHA RAMANI, Gender Specific Classification of Road Accident Patterns through Data Mining Techniques, IEEE International Conference on Advances in Engineering, Science and Management, March 30-31, 2012, pp. 359-369, ISBN: 978-81-909042-2-3.
  26. SHANTHI, S., R. GEETHA RAMANI, Classification of Seating Position Specific Patterns in Road Traffic Accident Data through Data Mining Techniques, Second International Conference on Computer Applications, ICCA 2012, vol. 5, 2012, pp. 98-104.
  27. SHANTHI, S., R. GEETHA RAMANI, Classification of Vehicle Collision Patterns in Road Accidents using Data Mining Algorithms, International Journal of Computer Applications, vol. 35, No. 12, 2012, pp. 30-37.
  28. SOHN, S. Y., S. H. LEE, Data Fusion, Ensemble and Clustering to Improve the Classification Accuracy for the Severity of Road Traffic Accidents in Korea, Safety Science, vol. 41, issue 1, 2003, pp. 1-14.
  29. TESEMA, T. B., A. ABRAHAM, C. GROSAN, Rule Mining and Classification of Road Traffic Accidents using Adaptive Regression Trees, Journal of Simulation, vol. 6, issues 10 & 11, 2005.
  30. QUINLAN, R., C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers: San Mateo, CA., 1993.
  31. VALLI, P. P., Road Accident Models for Large Metropolitan Cities of India, IATSS Research, vol. 29, issue 1, 2005, pp. 57-65.
  32. WITTEN, I. H., E. FRANK, Data Mining: Practical Machine Learning Tools and Techniques with JAVA Implementations, San Francisco, Morgan Kaufmann Publishers, 2000.
  33. SUN, Y., Iterative Relief for Feature Weighting: Algorithms, Theories, and Applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, issue 6, 2007.