Crash trend plot


Traffic-related injuries are among the leading causes of death worldwide. Every year, about 1.3 million people die in traffic crashes. India alone records about 150 thousand road deaths annually, a significant concern for road safety researchers. As a transportation researcher, I often collect crash-related data from various sources (for example, police FIR records). This valuable data is used to understand severe crashes and their patterns and to identify blackspots.

In this article, we will use one such data record (a dummy one). Once we get such data, we (road safety researchers) usually clean it and prepare it for exploratory data analysis.

Context and Goal of the Article

In this article, we will learn how to use Python, Matplotlib, and Seaborn to create an excellent crash trend plot.

Note: Please refer to the earlier articles in the series if you are still getting familiar with Matplotlib and Seaborn libraries.

Article Outline

This article comprises the following sections:

  • Introduction
  • Loading libraries
  • Loading dataset
  • Changing column names
  • Converting the wide dataset into a long format
  • Generating barplot
  • Saving the plot

Loading libraries

The first step is to load the relevant libraries in your Python environment. We use NumPy, pandas, Matplotlib, and Seaborn to achieve the goal here.

import numpy as np              # Array manipulation
import pandas as pd             # Data manipulation
import matplotlib.pyplot as plt # Plotting
import seaborn as sns           # Generating statistical plots

Data description

This crash dataset mirrors the structure of the data we gather from different agencies as researchers (here, I used dummy values).

Imagine you work as a road safety researcher and want to understand the crash trend for a particular highway. You ask the authorities to provide road crash counts for that highway (say, Highway-123) from 2012 to 2022, and the authority sends you the datasheet via email. You clean and compile the data by year and crash type in a file (crash_data.csv). To understand the trend, you now want to generate a plot using Python.
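If you do not have such a file at hand, the sketch below builds a stand-in with the same five columns. Everything in it is an assumption for illustration: the counts are random numbers, not real crash records, and the file name crash_data_dummy.csv is hypothetical (pass it to read_csv in place of crash_data.csv if you want to follow along with it).

```python
import numpy as np
import pandas as pd

# Hypothetical dummy values: random counts that describe no real highway
rng = np.random.default_rng(seed=42)
years = np.arange(2012, 2023)  # 2012 to 2022 inclusive

dummy = pd.DataFrame({
    "year": years,
    "fatal_killed": rng.integers(20, 60, size=len(years)),
    "grevious_injured": rng.integers(50, 120, size=len(years)),
    "minor_injured": rng.integers(80, 200, size=len(years)),
    "without_injury": rng.integers(30, 90, size=len(years)),
})

# Write without the index so the file reloads with exactly five columns
dummy.to_csv("crash_data_dummy.csv", index=False)
```

The fixed seed only makes the dummy numbers reproducible between runs.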

Loading dataset

Load the dataset using the pandas read_csv() function and print the first five rows.

The dataset contains five columns: year, fatal_killed, grevious_injured, minor_injured, and without_injury. Each row corresponds to one year, and each cell holds the count for that category.

crash_data = pd.read_csv("crash_data.csv")
crash_data.head()  # Show the first five rows

First five observations

We can also print the column names using the .columns attribute and convert them into a list using .to_list().

crash_data.columns.to_list()

['year', 'fatal_killed', 'grevious_injured', 'minor_injured', 'without_injury']

Changing column names

Next, we will change the column names to make them more descriptive. We will use the .rename() method with a dictionary that maps the old names to the new ones.

The new column names are Fatal, Greviously Injured, Minor Injury, and Without Injury.

crash_data.rename(columns = {
                     "fatal_killed": "Fatal",
                     "grevious_injured": "Greviously Injured",
                     "minor_injured": "Minor Injury",
                     "without_injury": "Without Injury"},
                  inplace = True)

Renamed columns
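As a side note, .rename() also works without inplace = True, in which case it returns a new DataFrame and leaves the original untouched. The sketch below shows this on a tiny hypothetical frame (the two rows of counts are made up) that uses the same column names as crash_data.csv.

```python
import pandas as pd

# Tiny hypothetical frame with the same column names as crash_data.csv
raw = pd.DataFrame({
    "year": [2012, 2013],
    "fatal_killed": [30, 28],
    "grevious_injured": [75, 80],
    "minor_injured": [120, 110],
    "without_injury": [40, 45],
})

new_names = {
    "fatal_killed": "Fatal",
    "grevious_injured": "Greviously Injured",
    "minor_injured": "Minor Injury",
    "without_injury": "Without Injury",
}

# Without inplace=True, rename returns a new DataFrame and leaves `raw` as it was
renamed = raw.rename(columns=new_names)

print(renamed.columns.to_list())
# ['year', 'Fatal', 'Greviously Injured', 'Minor Injury', 'Without Injury']
```

Keeping the raw frame around can be handy when you want to compare the data before and after cleaning.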

Converting wide to long

The data shown above is in a wide format, which means each category has its own column. To generate a meaningful plot, we need to put all these injury categories (Fatal, Greviously Injured, Minor Injury, and Without Injury) into a single column and their values into another column. We keep the year column, but its values now repeat, once for each category.

This way, we can plot the years on the x-axis in increasing order, show the counts as bar heights, and color the bars by crash type.

This is done by converting the wide dataset into a long one. pandas offers the melt() function for this: we pass the constant column (year) to the id_vars argument and the category column names (Fatal, Greviously Injured, Minor Injury, and Without Injury) to the value_vars argument.

This outputs a long dataset with year, variable, and value columns.

crash_data = pd.melt(crash_data,
                     id_vars = ['year'],
                     value_vars = ['Fatal',
                                   'Greviously Injured',
                                   'Minor Injury',
                                   'Without Injury'])

Long data format
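To see what the reshaping does, here is a minimal sketch on a two-year hypothetical frame (the counts are made up). It also uses melt's var_name and value_name arguments, which name the two new columns directly; this is an optional variation, not the route taken in this article.

```python
import pandas as pd

# Two hypothetical years in wide format: one column per crash type
wide = pd.DataFrame({
    "year": [2012, 2013],
    "Fatal": [30, 28],
    "Greviously Injured": [75, 80],
    "Minor Injury": [120, 110],
    "Without Injury": [40, 45],
})

long = pd.melt(
    wide,
    id_vars=["year"],                      # column kept as is (its values repeat)
    value_vars=["Fatal", "Greviously Injured",
                "Minor Injury", "Without Injury"],
    var_name="Crash Type",                 # name for the former column headers
    value_name="Crash Count",              # name for the former cell values
)

print(wide.shape)  # (2, 5)
print(long.shape)  # (8, 3): 2 years x 4 crash types
print(long.head(3))
```

Each year now appears once per crash type, so two wide rows become eight long rows.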

Renaming columns

Next, we again rename these columns as Year, Crash Type, and Crash Count.

crash_data.rename(columns = {"year":"Year",
                     "variable":"Crash Type",
                     "value":"Crash Count"}, 
                  inplace=True)  #supports inplace = True

Renamed columns

Generating a Crash Trend Barplot

The barplot generation and customization require the following steps:

Part A: Instantiation of the subplots

The first step is to instantiate a plot object using plt.subplots() and supply a figure size of 10 inches in width and 6 inches in height. We save the returned figure and axes objects into fig and ax.

Next, we need to define the color categories for each bar group.

Part B: Generating the plot

In this part, we generate the plot using the sns.barplot() function, which takes data = crash_data, x = 'Year', y = 'Crash Count', hue = 'Crash Type', ax = ax, and palette = sns.color_palette(color).
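A list of colors is matched to the crash types by position. As an alternative sketch, the palette argument of sns.barplot() also accepts a dictionary that ties each hue level to a color by name. The frame below and the type_colors mapping are hypothetical and only cover two crash types to keep the example short.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical long-format frame with two crash types
long = pd.DataFrame({
    "Year": [2012, 2012, 2013, 2013],
    "Crash Type": ["Fatal", "Minor Injury", "Fatal", "Minor Injury"],
    "Crash Count": [30, 120, 28, 110],
})

# Hypothetical mapping: each crash type is tied to its color explicitly
type_colors = {"Fatal": "red", "Minor Injury": "yellow"}

fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=long, x="Year", y="Crash Count",
            hue="Crash Type", palette=type_colors, ax=ax)
```

With a dictionary, the colors stay attached to the right categories even if the order of the crash types in the data changes.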

Part C: Adding bar labels

  • The next part is to iterate through the ax.patches objects
  • Setting each patch/bar edge color to black
  • Annotating labels by getting the patch object x, y coordinates, and heights
  • Setting the size and rotation
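The manual loop described above gives full control over the label position. In Matplotlib 3.4 and later, a shorter alternative is ax.bar_label(), which labels every bar in a container. The sketch below is one possible variation rather than the approach used in this article, and its small frame holds made-up counts.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical long-format frame
long = pd.DataFrame({
    "Year": [2012, 2012, 2013, 2013],
    "Crash Type": ["Fatal", "Minor Injury", "Fatal", "Minor Injury"],
    "Crash Count": [30, 120, 28, 110],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=long, x="Year", y="Crash Count", hue="Crash Type", ax=ax)

# Outline every bar in black, as in the manual loop
for p in ax.patches:
    p.set_edgecolor("black")

# One container per crash type; bar_label writes each bar's height above it
for container in ax.containers:
    ax.bar_label(container, fmt="%d", padding=3, size=8, rotation=90)
```

Here padding is the gap in points between the bar top and the label, which replaces the hand-tuned x and y offsets of the annotate call.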

Part D: Removing spines

In this part, we use ax.spines to set the visibility of the top and right spines to False.
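Seaborn also ships a helper for this step: sns.despine() removes the top and right spines by default. The sketch below shows it on a bare, hypothetical axes object as an alternative to setting the visibility by hand.

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(["2012", "2013"], [30, 28])  # hypothetical counts

# With default arguments, despine hides the top and right spines only
sns.despine(ax=ax)

print(ax.spines["top"].get_visible(), ax.spines["left"].get_visible())
# False True
```

Both routes give the same result, so the choice is mostly a matter of taste.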

Part E: Adding axes labels and tick parameters

Here, we customize the axes and tick labels

Part F: Customizing legend

In this section, we use ax.legend( ) method to customize the legend size and title.

Part G: Customizing plot title

Here we use ax.set_title() to set the plot title and fix its y position.

# Part A
fig, ax = plt.subplots(figsize=(10, 6))

color = ["red", "orange", "yellow", "#a3ff00"]

# Part B
# Generating a bar plot using the seaborn library
sns.barplot(data = crash_data,
            x = "Year",
            y = "Crash Count",
            hue = "Crash Type",
            ax = ax,
            palette = sns.color_palette(color))

# Part C
# Adding bar labels: outline each bar and annotate it with its height

for p in ax.patches:
    p.set_edgecolor("black") # Add black border across all bars
    t = ax.annotate(str(np.int64(p.get_height())), xy = (p.get_x() + 0.03, p.get_height() + 7))
    t.set(size = 8, rotation=90)

# Part D
# Removing spines
for s in ["top", "right"]:
    ax.spines[s].set_visible(False) # Hide the top and right spines

# Part E
# Adding axes labels and tick parameters
ax.set_xlabel("Year", size = 14)
ax.set_ylabel("Count", size = 14)
ax.tick_params(labelsize = 12)

# Part F
# Legend customisation
ax.legend(fontsize = 11,
          title = "Crash Type",
          title_fontsize = 12)

# Part G
ax.set_title("Crash Trend (2012-2022)",
             fontsize = 14,
             y = 1.05)

Final crash trend barplot (2012–2022)

Saving the plot

Once we are satisfied with the plot configuration and aesthetics, we can save it using the .savefig() method. We name the file crash_plot.png, set the resolution to dpi = 300, and set bbox_inches = 'tight', which trims the extra white space around the figure.

# Saving the plot
fig.savefig("crash_plot.png",
            dpi = 300,
            bbox_inches = 'tight')
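Journals sometimes ask for a vector format in addition to a PNG. As a minimal sketch, the same figure can be written to several formats in a loop, because savefig picks the format from the file extension. The figure content and the crash_plot_example file names below are hypothetical.

```python
import os
import matplotlib.pyplot as plt

# Hypothetical stand-in figure; in the article this would be the crash trend plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(["2012", "2013"], [30, 28])

# savefig infers the output format from the file extension
for ext in ("png", "pdf"):
    fig.savefig(f"crash_plot_example.{ext}", dpi=300, bbox_inches="tight")

plt.close(fig)  # Release the figure once it is saved

print(os.path.exists("crash_plot_example.png"))
```

For the PDF, the dpi setting only matters for any rasterized elements, since the bars and text are stored as vectors.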

This is one of the simplest ways to generate a publication-ready, high-quality dodged bar plot that shows the crash trend over the years. Next, apply this skill to your own dataset and see how it goes.

Click here for the data and code

I hope you learned something new! 😃

If you learned something new and liked this article, share it with your friends and colleagues. If you have any suggestions, drop a comment.