California Traffic Records Collision Severity Prediction

Here we analyze the California Statewide Integrated Traffic Records System (SWITRS) dataset, covering January 1, 2001 to mid-October 2020, and use it to predict collision severity. We chose this prediction task because it can help inform which safety measures to take to reduce collision fatalities. From the dataset, we selected a variety of features to classify each collision by severity, ranging from property damage only, to injury, to fatal. We compared three classification models: Logistic Regression, Decision Tree, and Naive Bayes. The best performance came from Logistic Regression with victim data included, which achieved a balanced accuracy of 66.25%. Of the features evaluated, the most important were whether the collision was a hit-and-run misdemeanor, whether towing was required, and whether it was a rear-end collision.
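Balanced accuracy matters here because severity classes are imbalanced: fatal collisions are far rarer than property-damage-only ones, so plain accuracy can look high for a model that never predicts the rare classes. The page does not say how the metric was computed. The sketch below assumes the standard definition (the mean of per-class recall, as used by scikit-learn's `balanced_accuracy_score`). The function names and the toy labels are hypothetical and are not drawn from the SWITRS data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: every true class counts equally, however rare."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples in each class
    # correctly predicted examples per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (NOT from the SWITRS dataset)
y_true = ["property_damage"] * 8 + ["injury"] + ["fatal"]
# A degenerate model that always predicts the majority class
y_pred = ["property_damage"] * 10

print(f"accuracy:          {accuracy(y_true, y_pred):.3f}")           # 0.800
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.3f}")  # 0.333
```

The majority-class model scores 80% plain accuracy but only 1/3 balanced accuracy, which is chance level for three classes. A balanced accuracy of 66.25% therefore reflects real discrimination between severity levels rather than a bias toward the most common outcome.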

Margot Wagner
Postdoctoral Researcher

Interested in applying data science and AI to mental health, and in using neuroscience to inspire next-generation AI tools.