**Introduction**

A bar plot is a graphical representation that shows the relationship between a categorical and a numerical variable. Two types are especially popular for data visualization: dodged and stacked bar plots.

A dodged bar plot is used to compare the levels of a grouping variable, with the groups plotted side by side. It can be used to compare categorical counts or relative proportions, and more generally to compare numerical statistics such as the mean or median.

In the current article, we will deal with count-based bar plots where we compare the proportions corresponding to a grouping variable.

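**Article outline**

The current article will cover the following:

- Loading libraries
- Basic bar plot using the Rectangle method
- Dodged bar plot [matplotlib style]
- Dodged bar plot using the pandas DataFrame’s plot( ) method
- Dodged bar plot with a pandas DataFrame [seaborn style]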

## Loading libraries

The first step is to load the required libraries.

```
import numpy as np # array manipulation
import pandas as pd # Data Manipulation
import matplotlib.pyplot as plt # Visualisation
import seaborn as sns # Visualisation
```

**Basic knowledge of matplotlib’s subplots**

If you have basic knowledge of matplotlib’s **subplots( )** method, you can proceed with the article; otherwise, I highly recommend reading the first blog in this visualisation guide series.

**Link:** Introduction to Line Plot — Matplotlib, Pandas and Seaborn Visualization Guide (Part 1)

**Basic barplot using Rectangle method**

In this article, we will learn how to generate dodged bar plots. But before we proceed with such an advanced statistical plot, we first need to become familiar with how matplotlib builds a bar plot step by step.

To build a bar plot, we need to go through the following steps:

**Step 1**: From **matplotlib.patches**, import **Rectangle**.

**Step 2**: Using **plt.subplots( )**, instantiate figure (**fig**) and axes (**ax**) objects.

**Step 3**: Use the **Rectangle( )** method to generate a patch/rectangle object. The **Rectangle( )** method takes the **x** and **y** coordinates of the lower-left corner as a tuple, then the **width** and **height** of the bar.

**Step 4**: Generate two such Rectangle/patch objects (**rec1** and **rec2**) that we are going to add to the axes (**ax**) object.

**Step 5**: Add these patch/rectangle objects to the axes (**ax**) using the **add_patch( )** method.

```
from matplotlib.patches import Rectangle
fig, ax = plt.subplots()
# Define rectangle
# Rectangle((x, y), width, height)
rec1 = Rectangle((0.1, 0), 0.2, 0.9)
rec2 = Rectangle((0.5, 0), 0.2, 0.5)
# Adding patch object/ rectangles
ax.add_patch(rec1)
ax.add_patch(rec2)
```
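Placing every Rectangle by hand quickly becomes repetitive. As a rough sketch (the heights, the 0.2 width and the 0.3 spacing are made-up values), a loop can compute each bar's lower-left corner instead:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

heights = [0.9, 0.5, 0.7]  # made-up bar heights
width = 0.2                # bar width in data units
spacing = 0.3              # distance between the left edges of neighbouring bars

fig, ax = plt.subplots()
for i, h in enumerate(heights):
    # Lower-left corner is (0.1 + i * spacing, 0); every bar starts at y = 0
    ax.add_patch(Rectangle((0.1 + i * spacing, 0), width, h))

print(len(ax.patches))                       # 3
print([p.get_height() for p in ax.patches])  # [0.9, 0.5, 0.7]
```

Even with a loop, you still have to work out the positions yourself, which is the bookkeeping that the bar( ) method takes over.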

*Help on the methods*

You can get help using Python’s built-in **help( )** function, where you can supply any object name (for example, **Rectangle**) to get information on the associated attributes and methods.

`help(Rectangle)`

**Check for the Patches**

Let’s check whether the axes (**ax**) object contains the patches/rectangles. We can check that by applying the **patches** attribute on the axes object (**ax**). The output clearly shows that the axes object contains two patches.

`ax.patches`

<Axes.ArtistList of 2 patches>

**Changing the Rectangle/Patch Colour**

We can customise patch properties. Let’s change the colour of the 2nd rectangle by accessing the object via **ax.patches[1]** and applying the **set_color("red")** method.

```
ax.patches[1].set_color("red")
fig
```
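Note that set_color( ) changes both the fill and the border of a patch. A minimal sketch contrasting it with set_facecolor( ), which only touches the fill (the black edge colour is an assumption added so the difference is visible):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()
# add_patch() returns the patch it added, so we can keep a reference to it
rec1 = ax.add_patch(Rectangle((0.1, 0), 0.2, 0.9, edgecolor="black"))
rec2 = ax.add_patch(Rectangle((0.5, 0), 0.2, 0.5, edgecolor="black"))

rec1.set_facecolor("red")  # fill becomes red, black border is kept
rec2.set_color("red")      # fill AND border become red

print(rec1.get_facecolor(), rec1.get_edgecolor())  # (1.0, 0.0, 0.0, 1.0) (0.0, 0.0, 0.0, 1.0)
print(rec2.get_facecolor(), rec2.get_edgecolor())  # (1.0, 0.0, 0.0, 1.0) (1.0, 0.0, 0.0, 1.0)
```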

Now you have a basic idea of how matplotlib generates the rectangles of a bar plot. This approach works, but it becomes tedious when we have many bars to plot. To overcome this, we can use a more convenient method offered by the axes object (**ax**) called **bar( )**.

Let’s proceed step by step:

Step 1: Instantiate figure (**fig**) and axes (**ax**) objects.

Step 2: Generate lists of x-axis and y-axis values.

Step 3: Use the **bar( )** method of the axes (**ax**) object and pass the **x** and **y** lists.

This way you can generate a basic bar plot.

```
# Adding bars using defined values
fig, ax = plt.subplots()
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 2, 7]
# Use ax.bar()
ax.bar(x, y)
```
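ax.bar( ) also returns a BarContainer, a tuple-like object holding exactly the rectangles it just drew. A short sketch with the same x and y lists (the name bars is just an illustrative variable) showing that the container and ax.patches refer to the same Rectangle objects:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 2, 7]
bars = ax.bar(x, y)  # BarContainer holding one Rectangle per bar

print(len(bars))                 # 5
print(bars[3] is ax.patches[3])  # True: both give the same Rectangle object
bars[3].set_color("red")         # same effect as ax.patches[3].set_color("red")
```

Keeping the container is handy when a plot has several series, because you can address one series of bars without counting patch indices.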

Again, let’s check the patches of the axes (**ax**) object using the **patches** attribute. Now you can observe that it contains **5** patches/rectangles.

```
# Check number of patches
ax.patches
```

<Axes.ArtistList of 5 patches>

As before, we can change the colour of the rectangle/patch objects. Let’s change the 4th patch’s colour to red. It uses the same **set_color( )** method, but here we need to apply it to the **4th** patch using **patches[3]**. We supplied 3 because Python uses zero-based indexing.

```
# Change the 4th patch (index 3) colour to "red"
ax.patches[3].set_color("red")
fig
```

We are now familiar with the bar plot and how to generate one from scratch. Now let’s proceed to a new form of plot called the “**dodged bar plot**”.

## Dodged barplot [matplotlib style]

A dodged bar plot presents counts, proportions or statistics (mean/median) for two or more groups side by side, which makes comparisons between the groups easy.

For the current plot, we are going to use the tips dataset.

*Source:*

*Bryant, P. G. and Smith, M. A. (1995), Practical Data Analysis: Case Studies in Business Statistics, Richard D. Irwin Publishing, Homewood, IL.*

The Tips dataset contains **244 observations** and **7 variables** (excluding the index). The variable descriptions are as follows:

- **bill**: Total bill (cost of the meal), including tax, in US dollars
- **tip**: Tip (gratuity) in US dollars
- **sex**: Sex of person paying for the meal (Male, Female)
- **smoker**: Presence of smoker in a party? (No, Yes)
- **weekday**: day of the week (Saturday, Sunday, Thursday and Friday)
- **time**: time of day (Dinner/Lunch)
- **size**: the size of the party

Let’s load the tips dataset using the pandas **read_csv( )** function and print the first 4 observations using the **head( )** method.

```
tips = pd.read_csv("datasets/tips.csv")
tips.head(4)
```

## Aim of the plot

The aim is to calculate the sex-wise smoker proportions and present them using a dodged bar plot. The figure below represents the final plot that we are going to create using various approaches (matplotlib, pandas and seaborn). In the plot, we will present the sex categories on the x-axis and the proportion of each smoker category on the y-axis. Further, we are going to add labels on top of the bars and customise the legend.

## Estimate gender/sex wise smoker percentage

To generate this dodged plot, we need to compute the sex-wise smoker and non-smoker proportions. To achieve this, we go through the following steps:

**Step 1**: Apply the **groupby( )** method to group the data based on ‘**sex**’ and select the ‘**smoker**’ column from each group.

**Step 2**: Then apply the **value_counts( )** method and supply **normalize = True** to compute proportions.

**Step 3**: Next, multiply by 100 using **.mul(100)** and **round** the result to **2 decimal places**.

**Step 4**: Apply the **unstack( )** method so that the sex labels are presented in the index, the smoker status in the columns and the percentage values in the cells.

**Step 5**: Save the output into the **df** variable.

```
df = (tips
.groupby("sex")["smoker"]
.value_counts(normalize=True)
.mul(100)
.round(2)
.unstack())
df
```
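For reference, pandas’ crosstab( ) with normalize="index" divides each row of a frequency table by its row total, so it should give the same table as the groupby/value_counts/unstack chain. A sketch on a small made-up DataFrame (toy is a hypothetical stand-in, not the real tips data) that checks the two agree:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the tips data: only the two columns used here
toy = pd.DataFrame({
    "sex":    ["Female"] * 4 + ["Male"] * 5,
    "smoker": ["No", "No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes"],
})

# Approach used in the article
via_groupby = (toy
               .groupby("sex")["smoker"]
               .value_counts(normalize=True)
               .mul(100)
               .round(2)
               .unstack())

# Alternative: row-normalised contingency table
via_crosstab = (pd.crosstab(toy["sex"], toy["smoker"], normalize="index")
                .mul(100)
                .round(2))

print(via_crosstab)
print(np.allclose(via_groupby.to_numpy(), via_crosstab.to_numpy()))  # True
```

One difference worth knowing: if a group has no rows for some smoker category, unstack( ) leaves NaN in that cell while crosstab( ) reports 0 (unstack(fill_value=0) closes that gap).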

Next, we will take out the DataFrame index using **df.index** and save it in **label**, and generate a range of bar positions using the **np.arange( )** function. We will need these two objects to customise the plots. If we print these objects, we can observe that **label** contains the sex labels (**Female** and **Male**) and **x** contains **0** and **1** as a NumPy array.

```
# Generating labels and index
label = df.index
x = np.arange(len(label))
print(label)
print(x)
```

Index(['Female', 'Male'], dtype='object', name='sex')

[0 1]

## Understanding the plotting mechanism

The very first thing we need to do is use the **subplots( )** method from matplotlib to generate axes (**ax**) and figure (**fig**) objects. The figure size is set to 8 by 6 inches.

Next, set the bar width to **0.2** and apply the **bar( )** method to the axes object (**ax**), over which we will impose the bars.

In the **bar( )** method, we need to supply each column of the **df** object separately. In the first call, we supplied **x** (the previously generated object) and the ‘**No**’ column as the x and y positions. Then we supplied the **width** value, a **label** (to mark the bars) and the bar border colour using the **edgecolor** argument, and saved the bar object to **rect1**.

Similarly, for the ‘**Yes**’ column, we created another bar object and saved it to **rect2**.

Now, if we look at the plot, we can observe that the blue and orange bars sit in a single column, which is far from the desired dodged plot. This is because both groups (No/Yes) are drawn at the same x positions, so the orange bars are drawn on top of the blue ones.

To rectify the situation, we need to move the blue bars to the left by 0.1 (half the bar width) and the orange bars to the right by 0.1.

```
#create the base axis
fig, ax = plt.subplots(figsize = (8,6))
#set the width of the bars
width = 0.2
#add first pair of bars
rect1 = ax.bar(x,
df["No"],
width = width,
label = "No",
edgecolor = "black")
#add second pair of bars
rect2 = ax.bar(x,
df["Yes"],
width = width,
label = "Yes",
edgecolor = "black")
```

Now, if we subtract 0.1 from the blue bars’ x-axis positions (x - width/2), add 0.1 to the orange bars’ positions (x + width/2) and plot again, we can see that the bars now look like dodged bars.

There is one remaining problem: the x-axis tick labels do not match the final plot we want. We need to correct them.

```
#create the base axis
fig, ax = plt.subplots(figsize = (8,6))
#set the width of the bars
width = 0.2
# create the first bar by shifting it to left side by width/2
rect1 = ax.bar(x - width/2,
df["No"],
width = width,
label = "No",
edgecolor = "black")
# create the second bar by shifting it to the right side by width/2
rect2 = ax.bar(x + width/2,
df["Yes"],
width = width,
label = "Yes",
edgecolor = "black")
```
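The x - width/2 and x + width/2 shifts are the two-group case of a general rule: with n groups, bar i (counting from 0) is centred at x + (i - (n - 1)/2) * width. A sketch of a hypothetical helper, dodge_offsets( ), plus a loop that uses it for every column of a wide DataFrame (the percentages are illustrative values, not computed output):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt


def dodge_offsets(n_groups, width):
    """Hypothetical helper: x-offsets that centre n_groups bars around each tick."""
    return [(i - (n_groups - 1) / 2) * width for i in range(n_groups)]


# Illustrative wide table shaped like df (sex in the index, smoker status in the columns)
df = pd.DataFrame({"No": [62.07, 61.78], "Yes": [37.93, 38.22]},
                  index=pd.Index(["Female", "Male"], name="sex"))
x = np.arange(len(df.index))
width = 0.2

fig, ax = plt.subplots(figsize=(8, 6))
for offset, column in zip(dodge_offsets(df.shape[1], width), df.columns):
    ax.bar(x + offset, df[column], width=width, label=column, edgecolor="black")

print(dodge_offsets(2, 0.2))  # [-0.1, 0.1]
```

For two groups this reproduces the -0.1/+0.1 shifts used above; with three groups the middle bar stays on the tick and its neighbours move one bar width to each side.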

Let’s reset the x-axis ticks using **set_xticks(x)**, which places a tick at each value stored in **x**. But we still need to label the ticks by sex.

```
# Reset x-ticks
ax.set_xticks(x)
fig
```

Next, set the x-tick labels using the **set_xticklabels( )** method by supplying the **label** object (created earlier). Now we have the desired x-tick labels.

```
# Setting x-axis tick labels
ax.set_xticklabels(label)
fig
```
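In matplotlib 3.5 and later, set_xticks( ) also accepts a labels argument, so the two steps can be combined into one call. A sketch, assuming such a version, with illustrative bar heights and the same Female/Male labels:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

label = ["Female", "Male"]
x = np.arange(len(label))

fig, ax = plt.subplots()
ax.bar(x, [62.07, 61.78], width=0.2)  # illustrative heights
ax.set_xticks(x, labels=label)        # tick positions and tick labels in one call

fig.canvas.draw()  # make sure the tick labels are populated before reading them
print([t.get_text() for t in ax.get_xticklabels()])  # ['Female', 'Male']
```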

## Concept of Patch objects (groups)

Now let’s move to one of the important topics in bar plots: patches. Every rectangle you see in a bar plot is known as a **patch** object, which holds information such as the bar’s height, width, x and y position, colour, etc. Let’s inspect the patches of our axes (**ax**) object. If we apply the **.patches** attribute on the axes (**ax**), it shows that it contains 4 patch objects corresponding to the 4 bars.

```
# Number of patches
ax.patches
```

<Axes.ArtistList of 4 patches>

To retrieve the information and make use of it, we need to know the order of the patches.

- The blue patches contain information regarding the “**No**” column and the orange patches contain information regarding the “**Yes**” column.
- The order will be blue Female bar (*patch 0*), blue Male bar (*patch 1*), orange Female bar (*patch 2*), orange Male bar (*patch 3*).

Let’s retrieve the height of the first patch. To do so, select the first patch object using **.patches[0]** and apply the **get_height( )** method, which reveals the height, i.e., 62.07.

```
# 0 & 1 are blue pair; 2 & 3 are orange pair (left to right)
ax.patches[0].get_height()
```

62.07
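Besides get_height( ), every Rectangle exposes get_x( ) (its left edge) and get_width( ), so you can confirm the patch order described above by listing each bar’s centre and height. A sketch that rebuilds the dodged bars from illustrative values rather than the real df:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

x = np.arange(2)          # 0 = Female, 1 = Male
width = 0.2
no = [62.07, 61.78]       # illustrative "No" percentages
yes = [37.93, 38.22]      # illustrative "Yes" percentages

fig, ax = plt.subplots()
ax.bar(x - width / 2, no, width=width, label="No")
ax.bar(x + width / 2, yes, width=width, label="Yes")

for i, p in enumerate(ax.patches):
    centre = p.get_x() + p.get_width() / 2  # get_x() is the left edge of the bar
    print(f"patch {i}: centre={centre:.2f}, height={p.get_height():.2f}")
```

The centres come out as -0.10, 0.90, 0.10 and 1.10: both "No" bars first (Female, Male), then both "Yes" bars.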

## Labelling bars

Now that we know the concept of patches, we can add a label to each bar by retrieving the height from each patch object in a for loop. To achieve this, follow these steps:

**Step 1**: Loop through each patch object (**ax.patches**) and save it to a temporary variable ‘**p**’.

**Step 2**: Use the **ax.annotate( )** method to annotate the labels. It takes the label text and the x and y positions. We can retrieve the height using **get_height( )** and convert it to a string using **str( )** so that we can append a percentage (**%**) symbol. Further, the **x** and **y** positions can be retrieved using the **get_x( )** and **get_height( )** methods. To place each label just inside the left edge and slightly above the top of its bar, we add a padding of **0.03** (in the x-direction) and **1** (in the y-direction). Next, save the annotation to a variable ‘**t**’.

**Step 3**: Use the **set( )** method to change the annotated text properties.

```
# Adding bar values
for p in ax.patches:
    t = ax.annotate(str(p.get_height()) + "%", xy = (p.get_x() + 0.03, p.get_height() + 1))
    t.set(color = "black", size = 14)
fig
```
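The fixed 0.03 x-offset only roughly centres each label. A variant (a sketch, not the original code) anchors the label at the bar’s horizontal centre, p.get_x( ) + p.get_width( )/2, and lets ha="center" do the alignment; the bar values are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

x = np.arange(2)
width = 0.2
fig, ax = plt.subplots(figsize=(8, 6))
ax.bar(x - width / 2, [62.07, 61.78], width=width, edgecolor="black")
ax.bar(x + width / 2, [37.93, 38.22], width=width, edgecolor="black")

labels = []
for p in ax.patches:
    centre = p.get_x() + p.get_width() / 2
    t = ax.annotate(f"{p.get_height():.2f}%",
                    xy=(centre, p.get_height() + 1),  # 1 data unit above the bar top
                    ha="center", color="black", size=14)
    labels.append(t)

print([t.get_text() for t in labels])  # ['62.07%', '61.78%', '37.93%', '38.22%']
```

Formatting with :.2f also keeps two decimals even when a value such as 40.0 would otherwise print as "40.0".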

**Customising bar plot**

The first customisation step is to remove some spines (plot border lines). I usually prefer turning off the top and right spines. To achieve this, loop over the spine names, select each one using **ax.spines[position]** and apply **set_visible( )** with **False**.

We can also alter the tick parameters [using **tick_params( )**] and axis labels [using **set_xlabel( )** and **set_ylabel( )**] to make the plot informative and aesthetically pleasing.

```
# Remove spines
for s in ["top", "right"]:
    ax.spines[s].set_visible(False)
# Adding axes and tick labels
ax.tick_params(axis = "x", labelsize = 14, labelrotation = 45)
ax.set_ylabel("Percentage", size = 14)
ax.set_xlabel("Sex", size = 14)
fig
```
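Since matplotlib 3.4, ax.spines can also be indexed with a list of names, which removes the need for the loop. A sketch assuming that version, on illustrative bars:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0, 1], [62.07, 61.78])  # illustrative bars

ax.spines[["top", "right"]].set_visible(False)  # hide both spines in one call
print(ax.spines["top"].get_visible(), ax.spines["left"].get_visible())  # False True
```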

Last but not least, let’s customise the legend. Here, using the **ax.legend( )** method, I have modified the existing labels to “**N**” and “**Y**”.

By default, the legend anchor is given in axes coordinates, where the plot area ranges from 0 to 1 in both the x and y directions. We can use this to move the legend towards the middle of the plot. We can access the legend using **ax.legend_** and set its position using **.set_bbox_to_anchor( )**, supplying the **x** and **y** positions as a list.

Now our plot is finalized and ready to use.

```
# Customize legend
ax.legend(labels = ["N", "Y"],
fontsize = 12,
title = "Smoker",
title_fontsize = 18)
# Fix legend position
ax.legend_.set_bbox_to_anchor([0.6, 0.5])
fig
```
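The same legend can also be configured in a single ax.legend( ) call by passing the bar containers as handles and the position as bbox_to_anchor. A sketch with illustrative data; loc="center" is an assumption (the original keeps the default loc) that makes the legend’s centre sit exactly at the anchor point:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

x = np.arange(2)
width = 0.2
fig, ax = plt.subplots(figsize=(8, 6))
rect1 = ax.bar(x - width / 2, [62.07, 61.78], width=width, edgecolor="black")
rect2 = ax.bar(x + width / 2, [37.93, 38.22], width=width, edgecolor="black")

leg = ax.legend(handles=[rect1, rect2],   # BarContainers work as legend handles
                labels=["N", "Y"],
                fontsize=12,
                title="Smoker",
                title_fontsize=18,
                loc="center",
                bbox_to_anchor=(0.6, 0.5))  # axes-fraction coordinates

print([t.get_text() for t in leg.get_texts()], leg.get_title().get_text())  # ['N', 'Y'] Smoker
```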

*Saving the plot*

To save a plot, we can use the figure object (**fig**) and apply the **savefig( )** method, where we supply the path (**images/**), the file name (**dodged_barplot.png**) and the resolution (**dpi=300**).

```
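import os
# Create the output folder if it does not exist, then save the figure object
os.makedirs("images", exist_ok = True)
fig.savefig("images/dodged_barplot.png", dpi = 300)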
```
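savefig( ) also accepts a file-like object, which is handy when you need the image bytes rather than a file on disk (for example, to embed them elsewhere). A sketch; bbox_inches="tight", which trims surplus whitespace around the axes, is an optional extra not used in the original:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0, 1], [62.07, 61.78])  # illustrative bars

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=300, bbox_inches="tight")  # write PNG bytes to memory
png_bytes = buf.getvalue()
print(png_bytes[:8])  # b'\x89PNG\r\n\x1a\n'
```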

# Dodged bar plot using pandas DataFrame’s plot( ) method

The next step is to generate the same dodged plot, but using the pandas DataFrame’s **plot( )** method.

The first step is to prepare the data, exactly as we did for the last plot.

```
tips = pd.read_csv("datasets/tips.csv")
df = (tips
.groupby("sex")["smoker"]
.value_counts(normalize=True)
.mul(100)
.round(2)
.unstack())
df
```

## Pandas plot( ) method

Let’s generate the dodged plot using the pandas **plot( )** method. To achieve this, follow these steps:

Step 1: Use the **subplots( )** method from matplotlib to generate axes (**ax**) and figure (**fig**) objects. Set the figure size to 10 by 4 inches.

Step 2: Apply the **plot( )** method on the DataFrame (**df**) object. Specify kind = “**bar**”, ax = **ax** and edgecolor = “**black**”.

**Bam**! Your plot framework is almost ready.

```
fig, ax = plt.subplots(figsize = (10, 4))
df.plot(kind = "bar",
ax = ax,
edgecolor = "black")
```
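df.plot( ) returns the axes it drew on and, like the manual version, adds the bars column by column: first all "No" bars, then all "Yes" bars. A sketch with an illustrative stand-in for df that checks this order; rot=0 (not in the original call) keeps the tick labels horizontal, which the original achieves later with tick_params( ):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Illustrative stand-in for df: sex in the index, smoker status in the columns
df = pd.DataFrame({"No": [62.07, 61.78], "Yes": [37.93, 38.22]},
                  index=pd.Index(["Female", "Male"], name="sex"))

fig, ax = plt.subplots(figsize=(10, 4))
returned = df.plot(kind="bar", ax=ax, edgecolor="black", rot=0)

print(returned is ax)                                    # True
print([round(float(p.get_height()), 2) for p in ax.patches])  # "No" heights first, then "Yes"
```

Because the order matches the matplotlib version, the same patch-based labelling loop works unchanged.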

The next part is labelling and customising the plot, which is essentially the same as in the raw matplotlib approach. Here, I did not alter the legend labels [“**No**”, “**Yes**”].

```
# Add data labels
for p in ax.patches:
    t = ax.annotate(str(p.get_height()) + "%", xy = (p.get_x() + 0.03, p.get_height() + 1))
    t.set(color = "black", size = 14)
# Remove spines
for s in ["top", "right"]:
    ax.spines[s].set_visible(False)
# Add axes labels and tick parameters
ax.set_xlabel("Sex", size = 14)
ax.set_ylabel("Percentage", size = 14)
ax.tick_params(labelsize = 14, labelrotation = 0)
# Customise legend
ax.legend(labels = ["No", "Yes"],
fontsize = 12,
title = "Smoker",
title_fontsize = 18)
# Fix legend position
ax.legend_.set_bbox_to_anchor([0.5, 0.3])
fig
```
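Because pandas draws the bars with matplotlib’s ax.bar( ), each column’s bars are also stored as a BarContainer in ax.containers. With matplotlib 3.4 or later, ax.bar_label( ) can then replace the annotate loop. A sketch with the same illustrative stand-in for df (padding is measured in points above the bar):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

df = pd.DataFrame({"No": [62.07, 61.78], "Yes": [37.93, 38.22]},
                  index=pd.Index(["Female", "Male"], name="sex"))

fig, ax = plt.subplots(figsize=(10, 4))
df.plot(kind="bar", ax=ax, edgecolor="black", rot=0)

texts = []
for container in ax.containers:  # one BarContainer per DataFrame column
    # "%.2f%%" formats 62.07 as "62.07%"; labels are centred on each bar
    texts += ax.bar_label(container, fmt="%.2f%%", padding=3, size=14)

print([t.get_text() for t in texts])  # ['62.07%', '61.78%', '37.93%', '38.22%']
```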

# Dodged barplot with pandas DataFrame [seaborn style]

Next, we will generate the same plot using the **seaborn** plotting style. Here, we need the input data as a pandas DataFrame with one row per sex/smoker combination.

The process of calculating group-wise proportions is similar, with a small difference. Here, we name the values **percent** using **rename( )** and use the **reset_index( )** method instead of **unstack( )** to convert the index levels to columns. The output is a pandas DataFrame in long format, where sex, smoker and percent are all ordinary columns.

```
df = (
tips
.groupby("sex")["smoker"]
.value_counts(normalize = True)
.mul(100)
.rename('percent')
.reset_index()
.round(2)
)
df
```
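If you already have the wide table from the earlier sections (sex in the index, No/Yes as columns), melt( ) produces the same long layout. A sketch on illustrative values, with wide as a hypothetical stand-in for that table:

```python
import pandas as pd

wide = pd.DataFrame({"No": [62.07, 61.78], "Yes": [37.93, 38.22]},
                    index=pd.Index(["Female", "Male"], name="sex"))

long = (wide
        .reset_index()  # "sex" becomes an ordinary column
        .melt(id_vars="sex", var_name="smoker", value_name="percent"))
print(long)
```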

## Plotting a dodged plot [seaborn method]

Here, we are going to use the **catplot( )** function from the seaborn library. We need to supply the x variable as “**sex**”, the y variable as “**percent**”, the fill colour, i.e., hue = “**smoker**”, kind = “**bar**”, the DataFrame object (**df**) and legend = **False**.

As **catplot( )** does not take an axes (**ax**) object, we need to retrieve the axes (**ax**) and figure (**fig**) objects ourselves.

We can retrieve the axes (**ax**) object using **plt.gca( )** and the figure (**fig**) object using **plt.gcf( )**. Here, **gca** stands for “get current axes” and **gcf** stands for “get current figure”.

```
sns.catplot(x = "sex",
y = 'percent',
hue = "smoker",
kind = 'bar',
data = df,
legend = False)
ax = plt.gca()
fig = plt.gcf()
```
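If you would rather keep the plt.subplots( ) workflow from the previous sections, seaborn’s axes-level barplot( ) accepts an ax argument directly, so gca( )/gcf( ) are not needed. A sketch, assuming a long-format df like the one above (the values are illustrative):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({"sex": ["Female", "Female", "Male", "Male"],
                   "smoker": ["No", "Yes", "No", "Yes"],
                   "percent": [62.07, 37.93, 61.78, 38.22]})

fig, ax = plt.subplots(figsize=(8, 4))
sns.barplot(data=df, x="sex", y="percent", hue="smoker", ax=ax)  # draws on our own axes

heights = {round(float(p.get_height()), 2) for p in ax.patches}
print(sorted(heights))
```

The trade-off: catplot( ) can facet into several panels, while barplot( ) draws into exactly one axes that you control.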

The next step is to customise the plot, i.e., add data labels and modify the ticks and axis labels.

Lastly, we will fix the size of the plot using **fig.set_size_inches( )**.

```
sns.catplot(x = "sex",
y = 'percent',
hue = "smoker",
kind='bar',
data = df,
legend = False)
################################
# Customization
################################
# Retrieve axis and fig objects from the current plot environment
ax = plt.gca()
fig = plt.gcf()
# Add bar labels
for p in ax.patches:
    p.set_edgecolor("black") # Add black border across all bars
    t = ax.annotate(str(round(p.get_height(), 2)) + "%", xy = (p.get_x() + 0.1, p.get_height() + 1))
    t.set(size = 14)
# Adding axes labels and tick parameters
ax.set_xlabel("Sex", size = 16)
ax.set_ylabel("Percentage", size = 16)
ax.tick_params(labelsize = 14)
# Legend customisation
ax.legend(fontsize = 12,
title = "Smoker",
title_fontsize = 12)
# Resetting figure size
fig.set_size_inches(8, 4)
```
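Instead of resizing afterwards with set_size_inches( ), catplot( ) can size the figure up front: each facet is height inches tall and height × aspect inches wide. Because catplot( ) returns a FacetGrid, its ax and figure attributes also give direct access to the axes and figure. A sketch with illustrative data:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({"sex": ["Female", "Female", "Male", "Male"],
                   "smoker": ["No", "Yes", "No", "Yes"],
                   "percent": [62.07, 37.93, 61.78, 38.22]})

g = sns.catplot(data=df, x="sex", y="percent", hue="smoker",
                kind="bar", legend=False,
                height=4, aspect=2)  # 4 inches tall, 4 * 2 = 8 inches wide
ax = g.ax        # the single Axes of the grid
fig = g.figure   # the underlying matplotlib Figure
print(fig.get_size_inches())
```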

Once you learn base matplotlib, you can customise plots in many ways. I hope you now know several ways to generate a dodged plot. Try applying these concepts to your own datasets.

**References**:

*Click here for the data and code*

**I hope you learned something new!**