Introduction
Road traffic injuries are a leading cause of death worldwide. Every year, about 1.3 million people die in traffic crashes. India alone records roughly 150 thousand road deaths annually, a significant concern for road safety researchers. As a transportation researcher, I often collect crash-related data from various sources (for example, police FIR records). This valuable data is used to understand severe crashes and their patterns and to identify blackspots.
In this article, we will use one such data record (a dummy one). Once we get such data, we (road safety researchers) usually clean it and prepare it for exploratory data analysis.
Context and Goal of the Article
In this article, we will learn how to use Python, Matplotlib, and Seaborn to create an excellent crash trend plot.
Note: Please refer to the earlier articles in the series if you are still getting familiar with the Matplotlib and Seaborn libraries.
Article Outline
This article comprises the following sections:
- Introduction
- Loading libraries
- Loading dataset
- Changing column names
- Converting the wide dataset into a long format
- Generating barplot
- Saving the plot
Loading libraries
The first step is to load the relevant libraries in your Python environment. We use pandas, Matplotlib, and seaborn to achieve the goal here, along with NumPy for a small type conversion when labeling the bars.
import numpy as np # Array manipulation
import pandas as pd # Data manipulation
import matplotlib.pyplot as plt # Plotting
import seaborn as sns # Generating statistical plots
Data description
This crash dataset mirrors the structure of the data that we, as researchers, gather from different agencies; the values used here are dummy values.
Imagine you work as a road safety researcher and are interested in understanding the crash trend for a particular highway. You ask the authorities to provide road crash counts for that highway (say, Highway-123) from 2012 to 2022, and the authority sends you the datasheet via email. You clean and compile the data by year and crash type in a file (crash_data.csv). Now, to understand the trend, you want to generate a plot using Python.
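If you do not have such a file at hand, the following is a minimal sketch that writes a stand-in CSV with the same five columns used in this article. The counts are made up for illustration only, and the file name dummy_crash_data.csv is an assumption chosen so that an existing crash_data.csv is not overwritten.

```python
import pandas as pd

# Hypothetical dummy counts for illustration only; not real crash statistics.
years = list(range(2012, 2023))  # 2012 to 2022 inclusive
n = len(years)

dummy = pd.DataFrame({
    "year": years,
    "fatal_killed":     [40 + 3 * i for i in range(n)],
    "grevious_injured": [90 + 5 * i for i in range(n)],
    "minor_injured":    [150 + 4 * i for i in range(n)],
    "without_injury":   [60 + 2 * i for i in range(n)],
})

# Assumed file name; rename it to crash_data.csv to follow the article as written.
dummy.to_csv("dummy_crash_data.csv", index=False)
print(dummy.head())
```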
Loading dataset
Load the dataset using the pandas read_csv() function and print the first five rows.
The dataset contains five columns: year, fatal_killed, grevious_injured, minor_injured, and without_injury. Each row corresponds to one year, and each cell holds the count for that category.
crash_data = pd.read_csv("crash_data.csv")
crash_data.head()
We can also print the column names using the .columns attribute and convert them into a list using .to_list().
crash_data.columns.to_list()
['year', 'fatal_killed', 'grevious_injured', 'minor_injured', 'without_injury']
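Before going further, it can help to confirm that the loaded frame has the columns we expect and no missing counts. The sketch below runs on a tiny stand-in frame named sample (a hypothetical name with made-up values); with real data you would run the same checks on crash_data.

```python
import pandas as pd

# Tiny stand-in with the same columns as crash_data (values are made up).
sample = pd.DataFrame({
    "year": [2012, 2013],
    "fatal_killed": [40, 43],
    "grevious_injured": [90, 95],
    "minor_injured": [150, 154],
    "without_injury": [60, 62],
})

expected = ["year", "fatal_killed", "grevious_injured",
            "minor_injured", "without_injury"]

# Columns we expect but cannot find in the frame.
missing = [c for c in expected if c not in sample.columns]
print("Missing columns:", missing)

# Number of empty cells per column; all zeros means no gaps.
print(sample.isna().sum())
```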
Changing column names
Next, we will change the column names to make them more descriptive. We will use the .rename() method with a dictionary that maps the old names to the new ones.
The new column names are Fatal, Greviously Injured, Minor Injury, and Without Injury.
crash_data.rename(columns = {
    "fatal_killed": "Fatal",
    "grevious_injured": "Greviously Injured",
    "minor_injured": "Minor Injury",
    "without_injury": "Without Injury"},
    inplace = True)
crash_data.head()
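One thing to keep in mind with .rename() is that, by default, a dictionary key that does not match any column is silently ignored. The sketch below uses a small stand-in frame and a deliberately misspelled key to show this, and then uses the errors="raise" option of pandas to surface the problem instead.

```python
import pandas as pd

# Stand-in frame; note the column is spelled "grevious_injured" in the data.
sample = pd.DataFrame({"year": [2012], "grevious_injured": [90]})

# The key below is misspelled on purpose, so nothing is renamed.
unchanged = sample.rename(columns={"grievous_injured": "Greviously Injured"})
print(list(unchanged.columns))

# With errors="raise", pandas reports a key that matches no column.
try:
    sample.rename(columns={"grievous_injured": "Greviously Injured"},
                  errors="raise")
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)
```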
Converting wide to long
The data shown above is in a wide format, which means each category has its own column. To generate a meaningful plot, we need to put all these injury categories (Fatal, Greviously Injured, Minor Injury, and Without Injury) into a single column and their values into another column. We keep the year column as it is, but its values repeat for each category.
This way, we can plot the year on the x-axis in ascending order, show the counts as bar heights, and color code the bars by crash type.
This can be done by converting the wide dataset into a long one. Pandas offers the .melt() method: we pass the constant column, i.e., year, to the id_vars argument and the category column names (Fatal, Greviously Injured, Minor Injury, and Without Injury) to the value_vars argument.
This outputs a long dataset with year, variable, and value columns.
crash_data = pd.melt(crash_data,
                     id_vars=['year'],
                     value_vars=['Fatal',
                                 'Greviously Injured',
                                 'Minor Injury',
                                 'Without Injury'])
crash_data.head(10)
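To see what the wide-to-long conversion does on a scale that is easy to check by eye, here is a small sketch with two years and two categories. The frame names wide and long and the counts are made up for illustration.

```python
import pandas as pd

# Wide format: one row per year, one column per category (made-up counts).
wide = pd.DataFrame({
    "year": [2012, 2013],
    "Fatal": [40, 43],
    "Minor Injury": [150, 154],
})

# Long format: one row per (year, category) pair.
long = pd.melt(wide, id_vars=["year"], value_vars=["Fatal", "Minor Injury"])
print(long)
```

The two rows become four: the Fatal counts for 2012 and 2013 come first, followed by the Minor Injury counts for the same years.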
Renaming columns
Next, we rename these columns to Year, Crash Type, and Crash Count.
crash_data.rename(columns = {"year": "Year",
                             "variable": "Crash Type",
                             "value": "Crash Count"},
                  inplace=True) #supports inplace = True
crash_data
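As an optional shortcut, .melt() also accepts var_name and value_name arguments, so the new column names can be set during the reshape itself. The sketch below is one possible way to combine the two steps on a small stand-in frame; it is an alternative, not the approach used in the rest of this article.

```python
import pandas as pd

# Small wide-format stand-in (made-up counts).
wide = pd.DataFrame({
    "year": [2012, 2013],
    "Fatal": [40, 43],
    "Minor Injury": [150, 154],
})

# Name the melted columns directly, then rename only the year column.
# Leaving out value_vars melts every column that is not in id_vars.
long = (
    wide.melt(id_vars="year", var_name="Crash Type", value_name="Crash Count")
        .rename(columns={"year": "Year"})
)
print(long)
```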
Generating a Crash Trend Barplot
The barplot generation and customization require the following steps:
Part A: Instantiation of the subplots
The first step is to instantiate a plot object using plt.subplots() and supply a figure size of 10 inches in width and 6 inches in height. We then save the returned figure and axes objects into fig and ax.
Next, we define the colors for the four crash types in each bar group.
Part B: Generating the plot
In this part, we generate the plot using the sns.barplot() function, which takes data = crash_data, x = 'Year', y = 'Crash Count', hue = 'Crash Type', ax = ax, and palette = sns.color_palette(color).
Part C: Adding bar labels
- Iterating through the ax.patches objects
- Setting each patch/bar edge color to black
- Annotating labels using each patch object's x-coordinate and height
- Setting the label size and rotation
Part D: Removing spines
In this part, we use ax.spines to set the visibility of the top and right spines to False.
Part E: Adding axes labels and tick parameters
Here, we set the axis labels and the tick label size.
Part F: Customizing legend
In this section, we use the ax.legend() method to customize the legend font size and title.
Part G: Customizing plot title
Here, we use ax.set_title() to set the plot title and adjust its y position.
# Part A
fig, ax = plt.subplots(figsize=(10, 6))
color = ["red", "orange", "yellow", "#a3ff00"]
# Part B
# Generating a bar plot using the seaborn library
sns.barplot(data = crash_data,
            x = "Year",
            y = "Crash Count",
            hue = "Crash Type",
            ax = ax,
            palette = sns.color_palette(color)
            )
# Part C
# Adding bar labels
for p in ax.patches:
    p.set_edgecolor("black") # Add a black border around each bar
    # Label each bar with its integer height, placed slightly above the bar
    t = ax.annotate(str(np.int64(p.get_height())), xy = (p.get_x() + 0.03, p.get_height() + 7))
    t.set(size = 8, rotation = 90)
# Part D
# Removing spines
for s in ["top", "right"]:
    ax.spines[s].set_visible(False)
# Part E
# Adding axes labels and tick parameters
ax.set_xlabel("Year", size = 14)
ax.set_ylabel("Count", size = 14)
ax.tick_params(labelsize = 12)
# Part F
# Legend customisation
ax.legend(fontsize = 11,
          title = "Crash Type",
          title_fontsize = 12)
# Part G
ax.set_title("Crash Trend (2012-2022)",
             fontsize = 14,
             y = 1.05)
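As an alternative to computing label positions by hand in Part C, Matplotlib 3.4 and later provide ax.bar_label(), which places a label on every bar in a container. The sketch below is a hedged illustration on a small stand-in frame with made-up counts; the Agg backend line is an assumption so that the example runs without a display, and it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend; drop in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small long-format stand-in (made-up counts).
long = pd.DataFrame({
    "Year": [2012, 2013, 2012, 2013],
    "Crash Type": ["Fatal", "Fatal", "Minor Injury", "Minor Injury"],
    "Crash Count": [40, 43, 150, 154],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=long, x="Year", y="Crash Count", hue="Crash Type", ax=ax)

# Each hue level is drawn as its own bar container;
# bar_label writes the bar height just above every bar in it.
for container in ax.containers:
    ax.bar_label(container, fmt="%d", padding=3, size=8, rotation=90)

labels = [t.get_text() for t in ax.texts]
print(labels)
```

Here, padding is the gap in points between the bar and its label, so it does not need to be tuned to the scale of the counts the way a fixed y offset does.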
Saving the plot
Once we are satisfied with the plot configuration and aesthetics, we can save it using the .savefig() method, where we name the file crash_plot.png, set the resolution with dpi = 300, and set bbox_inches = 'tight', which trims the extra white space around the figure.
# Saving the plot
fig.savefig('crash_plot.png',
            dpi = 300,
            bbox_inches = 'tight')
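If a journal or report asks for a vector format, the same .savefig() call can write a PDF by changing the file extension. The sketch below is a minimal illustration with a placeholder bar chart; the temporary output folder and the Agg backend line are assumptions made so that the example is self-contained.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend; drop in a notebook
import matplotlib.pyplot as plt

# Placeholder figure standing in for the crash trend plot (made-up counts).
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar([2012, 2013], [40, 43])

# Assumed output location: a temporary folder, to avoid cluttering the project.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "crash_plot.png")
pdf_path = os.path.join(out_dir, "crash_plot.pdf")

fig.savefig(png_path, dpi=300, bbox_inches="tight")  # Raster image
fig.savefig(pdf_path, bbox_inches="tight")           # Vector image
plt.close(fig)

print(os.path.getsize(png_path) > 0, os.path.getsize(pdf_path) > 0)
```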
This is one of the simplest ways to generate a publication-ready, high-quality dodged bar plot that shows the crash trend over the years. Next, apply this skill to your own dataset and see how it goes.
I hope you learned something new! 😃
If you learned something new and liked this article, share it with your friends and colleagues. If you have any suggestions, drop a comment.