
Introduction

A bar plot is a graphical representation that shows the relationship between a categorical and a numerical variable. Two types are popular in data visualization: the dodged bar plot and the stacked bar plot.

A stacked bar plot is used to represent a grouping variable, with the group counts or relative proportions plotted on top of one another. Occasionally, it is used to display relative proportions that sum to 100%.

Article Outline

The article comprises the following segments:

Table of content

Loading libraries

The very first step is to load the required libraries: numpy, pandas and matplotlib.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Basic knowledge of matplotlib’s subplots

If you have basic knowledge of matplotlib’s subplots( ) method, you can proceed with this article; otherwise, I highly recommend reading the first blog in this visualisation guide series.

Link: Introduction to Line Plot — Matplotlib, Pandas and Seaborn Visualization Guide (Part 1)

Dataset Description

For the current article, we are going to use the tips dataset.

Source:
Bryant, P. G. and Smith, M. A. (1995), Practical Data Analysis: Case Studies in Business Statistics, Richard D. Irwin Publishing, Homewood, IL.

The Tips dataset contains 244 observations and 7 variables (excluding the index). The variable descriptions are as follows:

bill: Total bill (cost of the meal), including tax, in US dollars
tip: Tip (gratuity) in US dollars
sex: Sex of person paying for the meal (Male, Female)
smoker: Presence of a smoker in the party (No, Yes)
weekday: Day of the week (Saturday, Sunday, Thursday and Friday)
time: Time of day (Dinner, Lunch)
size: Size of the party

Reading dataset

The first step is to read the tips dataset using the pandas read_csv( ) method and print the first 4 rows.

tips = pd.read_csv("datasets/tips.csv")
tips.head(4)
First 4 observations

Objective

The goal of this article is to generate the following stacked bar plot, which represents the gender-wise smoker proportion, where the smoker categories for each gender group sum to 100%.

Final plot

Data Preparation

To generate the stacked bar plot, we need to compute the sex-wise smoker proportion. To achieve this, go through the following steps.

Step 1: Group the data by sex and select the smoker column from each group.

Step 2: Apply the value_counts( ) method and supply normalize = True to convert the counts to proportions.

Step 3: Multiply the proportions by 100 using the mul( ) method.

Step 4: Round the result to 2 decimal places using the round( ) method.

Step 5: Use the unstack( ) method so that the sex labels appear in the index, the smoker status in the columns, and the percentage values in the cells.

We will save the final output to the df object and use it to generate the final plot.

df = (tips
      .groupby("sex")["smoker"]
      .value_counts(normalize=True)
      .mul(100)
      .round(2)
      .unstack())
df
Unstacked output
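
As a cross-check on the steps above, the same table can also be built with pd.crosstab( ). The sketch below runs both routes on a hypothetical eight-row stand-in for the tips data (toy is made up for illustration, not the real dataset). The crosstab route is an alternative I am suggesting here, not the approach used in the rest of this article.

```python
import pandas as pd

# Hypothetical miniature stand-in for the tips data (illustration only)
toy = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Female",
            "Male", "Male", "Male", "Male"],
    "smoker": ["No", "No", "No", "Yes",
               "No", "Yes", "Yes", "Yes"],
})

# Route 1: the groupby / value_counts chain described in the steps above
by_groupby = (toy
              .groupby("sex")["smoker"]
              .value_counts(normalize=True)
              .mul(100)
              .round(2)
              .unstack())

# Route 2 (alternative): crosstab with row-wise normalisation
by_crosstab = (pd.crosstab(toy["sex"], toy["smoker"], normalize="index")
               .mul(100)
               .round(2))

# Each row (sex group) should sum to 100
row_totals = by_groupby.sum(axis=1)
print(by_groupby)
print(by_crosstab)
print(row_totals)
```

Both routes give one row per sex, and checking that every row sums to 100 is a quick sanity test before plotting.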

Stacked bar plot using Matplotlib style

First, we are going to use raw matplotlib syntax to achieve the final plot. Follow these steps:

Step 1: Call the subplots( ) method with a 12 inch width and 6 inch height and save the figure and axes objects to fig and ax.

Step 2: Use the ax.bar( ) method, supplying the index values (df.index) for the x-axis and the “No” column values for the y-axis. Label these bars No and set the bar width to 0.3.

Step 3: Use the ax.bar( ) method again, this time supplying the “Yes” column that we want to stack on top of the “No” bars. Because each “Yes” bar must start where the corresponding “No” bar ends, we tell the bar( ) method that the starting position (the bottom argument) is the df.No column values (the “No” bar heights). Label these bars Yes and set the width to 0.3.

fig, ax = plt.subplots(figsize = (12,6))
ax.bar(df.index, df["No"], label = "No", width = 0.3) 
ax.bar(df.index, df["Yes"], bottom = df.No, label = "Yes", width = 0.3) 
Initial stack bar plot
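
The bottom idea extends to more than two categories by keeping a running total of the heights drawn so far. The sketch below is a minimal illustration with made-up numbers: demo, fig_demo, ax_demo and the category names A, B and C are hypothetical. The Agg backend line is only there so the sketch runs without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentages with three categories per group
demo = pd.DataFrame({"A": [50.0, 20.0],
                     "B": [30.0, 30.0],
                     "C": [20.0, 50.0]},
                    index=["Female", "Male"])

fig_demo, ax_demo = plt.subplots(figsize=(12, 6))

# Running total: where the next layer of bars has to start
bottom = pd.Series(0.0, index=demo.index)
for col in demo.columns:
    ax_demo.bar(demo.index, demo[col], bottom=bottom, label=col, width=0.3)
    bottom = bottom + demo[col]  # the next layer starts where this one ends

ax_demo.legend()
```

After the loop, bottom holds the total height of each stack, which should be 100 for percentage data.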

Stacked bar plot customisation

In the customisation part, the first thing is to add the data labels inside the bars. To do so, as with the dodged bar plot, we need to get familiar with the plot internals.

Let’s understand the container objects of the bar plot, which will help us achieve our desired plot. The axes (ax) object has an attribute called containers. If we run ax.containers, it displays a list of 2 objects, where each object is a bar container of 2 artists. In simple terms, the first container holds the pair of No (blue) bars and the second holds the pair of Yes (orange) bars.

# Check for containers
ax.containers

[<BarContainer object of 2 artists>, <BarContainer object of 2 artists>]

We can take the first and second objects out of containers by index and print them separately, which displays the same output.

# Print what containers 0 and 1 hold
print(ax.containers[0])
print(ax.containers[1])

<BarContainer object of 2 artists>
<BarContainer object of 2 artists>

Now, let’s say we want to go deeper inside each container and print the properties of each bar. We can index into container 0 first and then into container 1.

The output shows that the first container holds two blue bars with heights 62.07 and 61.78. Similarly, the second container holds two orange bars with heights 37.93 and 38.22.

# Accessing what each container holds
print(ax.containers[0][0])
print(ax.containers[0][1])
print(ax.containers[1][0])
print(ax.containers[1][1])

Rectangle(xy=(-0.15, 0), width=0.3, height=62.07, angle=0)
Rectangle(xy=(0.85, 0), width=0.3, height=61.78, angle=0)
Rectangle(xy=(-0.15, 62.07), width=0.3, height=37.93, angle=0)
Rectangle(xy=(0.85, 61.78), width=0.3, height=38.22, angle=0)

We can use two for loops to achieve the same output: first we loop through each container, and then through each bar inside it.

# Access what each of the containers contain using for loop
for c in ax.containers:
    for v in c:
        print(v)

Rectangle(xy=(-0.15, 0), width=0.3, height=62.07, angle=0)
Rectangle(xy=(0.85, 0), width=0.3, height=61.78, angle=0)
Rectangle(xy=(-0.15, 62.07), width=0.3, height=37.93, angle=0)
Rectangle(xy=(0.85, 61.78), width=0.3, height=38.22, angle=0)

Getting the bar height

To get the height of each bar while looping through the bars, we can use the get_height( ) method and round the values to 2 decimal places.

The first two outputs show the heights of the blue bars (from left to right) and the remaining two show the heights of the orange bars.

# Accessing the heights from each rectangle
for c in ax.containers:
    for v in c:
        print(round(v.get_height(), 2))  # built-in round() also works if the height is a plain Python float

62.07
61.78
37.93
38.22

To label the stacked bars, we need the bar heights from each container in the form of a list.

To achieve this, we can use a list comprehension. Go through the following steps:

Step 1: Loop through each container and save it to a temporary variable c.

Step 2: Use a list comprehension that loops through the bars in each container (c).

Step 3: Use a conditional expression that returns the height of the bar if the height is greater than 0 and an empty string otherwise, so that a bar label is added only when the bar height is above 0.

# Looping and printing each container object's height
for c in ax.containers:
    # Optional: if the segment is small or 0, customize the labels
    print([round(v.get_height(), 2) if v.get_height() > 0 else '' for v in c])

[62.07, 61.78]
[37.93, 38.22]
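
The tips data has no empty segment, so the else branch never fires in the output above. The sketch below uses made-up heights (fig_z, ax_z and the groups G1 and G2 are hypothetical) in which one segment is 0, to show what the conditional does in that case.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt

fig_z, ax_z = plt.subplots()

# Hypothetical data: the "Yes" segment of group G1 is empty
ax_z.bar(["G1", "G2"], [100.0, 70.0], label="No", width=0.3)
ax_z.bar(["G1", "G2"], [0.0, 30.0], bottom=[100.0, 70.0], label="Yes", width=0.3)

# Same conditional as above: keep the height if positive, else an empty string
label_rows = [[round(float(v.get_height()), 2) if v.get_height() > 0 else ""
               for v in c]
              for c in ax_z.containers]
print(label_rows)
```

The zero-height bar yields an empty string, so no label would be drawn for it when the list is later used for labelling.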

Adding labels, removing spines, modifying axes labels and legend

Adding labels: To add the labels, we follow the approach discussed above and save the list of bar heights in labels.

Note: Here, we have appended a % symbol after converting the height values to strings using the str( ) method.

Then we use the dedicated bar labelling method ax.bar_label( ), to which we supply the container object (c) and the labels (labels = labels). Further, we specify the position of the labels (label_type = 'center') and their size (14).

Removing spines: To achieve this, we use a for loop that loops through the position list [“top”, “right”], supplies each position to ax.spines[position] and sets the visibility to False using the set_visible( ) method.

Modifying axes labels: The next step is to alter the tick parameters [using tick_params( )] and the axis labels [using set_xlabel( ) and set_ylabel( )] to make the plot informative and aesthetically pleasing.

Adding legend: The final step is to customise the legend using the ax.legend( ) method. Here, I have:

  • Modified the existing labels to “no” and “yes”.
  • Set the legend and title font sizes to 12 and 18 respectively.
  • Added a legend title called “Smoker”.
  • Positioned the legend using bbox_to_anchor by supplying the x and y position.
# Add labels
for c in ax.containers:
    labels = [str(round(v.get_height(), 2)) + "%" if v.get_height() > 0 else '' for v in c]
    ax.bar_label(c,
                 label_type='center',
                 labels = labels,
                 size = 14) # add a container object "c" as first argument
# Remove spines
for s in ["top", "right"]:
    ax.spines[s].set_visible(False)
# Modify tick and axis labels
ax.tick_params(labelsize = 14, labelrotation = 0)
ax.set_ylabel("Percentage", size = 14)
ax.set_xlabel("Sex", size = 14)
# Add legend
ax.legend(labels = ["no", "yes"],
          fontsize = 12,
          title = "Smoker",
          title_fontsize = 18,
          bbox_to_anchor = [0.55, 0.7])
# Fix legend position
# ax.legend_.set_bbox_to_anchor([0.55, 0.7])
fig
Final stacked bar plot
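
When no segment needs to be hidden, ax.bar_label( ) can also format the values itself through its fmt argument, so the labels list does not have to be built by hand. The sketch below is a minimal illustration, assuming Matplotlib 3.4 or later (where bar_label( ) is available). It re-enters the percentages printed earlier as plain lists, and fig_f, ax_f and label_texts are hypothetical names.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt

groups = ["Female", "Male"]
no_pct = [62.07, 61.78]    # heights of the "No" bars shown earlier
yes_pct = [37.93, 38.22]   # heights of the "Yes" bars shown earlier

fig_f, ax_f = plt.subplots(figsize=(12, 6))
ax_f.bar(groups, no_pct, label="No", width=0.3)
ax_f.bar(groups, yes_pct, bottom=no_pct, label="Yes", width=0.3)

label_texts = []
for c in ax_f.containers:
    # "%.2f%%" prints two decimals followed by a literal percent sign
    texts = ax_f.bar_label(c, fmt="%.2f%%", label_type="center", size=14)
    label_texts.append([t.get_text() for t in texts])

print(label_texts)
```

The trade-off is that fmt labels every bar, including zero-height ones, so the explicit labels list remains the better choice when empty segments should stay unlabelled.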

Saving the plot

To save the stacked bar plot, we can use the figure object (fig) and apply the savefig( ) method, where we need to supply the path (images/), the plot file name (stackedbarplot.png) and the resolution (dpi = 300).

# Save figure
fig.savefig("images/stackedbarplot.png", dpi = 300)
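
One practical detail: savefig( ) does not create missing folders, so the target directory has to exist first. The sketch below is a minimal illustration that creates the folder with pathlib before saving. It writes into a temporary directory only to stay self-contained (in a project you would point out_dir at images/ instead), and fig_s, ax_s, out_dir and out_path are hypothetical names. The bbox_inches = "tight" option is my addition, so that elements placed near the figure edge are not cropped.

```python
from pathlib import Path
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt

fig_s, ax_s = plt.subplots(figsize=(12, 6))
ax_s.bar(["Female", "Male"], [62.07, 61.78], width=0.3)

# Temporary location used only for this sketch
out_dir = Path(tempfile.mkdtemp()) / "images"
out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

out_path = out_dir / "stackedbarplot.png"
fig_s.savefig(out_path, dpi=300, bbox_inches="tight")
print(out_path)
```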

Stacked barplot with pandas DataFrame [pandas plot( ) method]

Next, we will generate the same plot using the pandas DataFrame based approach. Let’s print df.

df

Generating stacked bar plot

The next step is to generate the same stacked bar plot, but now using the pandas DataFrame plot( ) method. Go through the following steps:

Step 1: Call the subplots( ) method with a 12 inch width and 6 inch height and save the figure and axes objects to fig and ax.

Step 2: Use the df.plot( ) method. Tell the plot method that the kind of plot is bar and, since it should be a stacked bar plot, set stacked = True. Then supply the axes (ax) object to ax, a bar width of 0.3 and the edge color “black”.

This will generate the basic framework for the stacked bar plot.

fig, ax = plt.subplots(figsize = (12, 6))
# Plot
df.plot(kind = "bar",
        stacked = True,
        ax = ax,
        width = 0.3,
        edgecolor = "black")
Basic stack bar plot framework

OR

Another approach is to use the df.plot.bar( ) method to generate the above plot.

# OR
fig, ax = plt.subplots(figsize = (12, 6))
# Plot
df.plot.bar(stacked = True,
            ax = ax,
            width = 0.3,
            edgecolor = "black")
Basic stack bar plot framework
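
The pandas route draws onto the same Matplotlib axes, which is why the container-based customisation carries over. The sketch below is a minimal check of that on made-up percentages: demo, fig_p, ax_p and segments are hypothetical names and the numbers are not the tips results.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentages in the same layout as df (sex in index, smoker in columns)
demo = pd.DataFrame({"No": [60.0, 55.0], "Yes": [40.0, 45.0]},
                    index=pd.Index(["Female", "Male"], name="sex"))

fig_p, ax_p = plt.subplots(figsize=(12, 6))
returned_ax = demo.plot(kind="bar",
                        stacked=True,
                        ax=ax_p,
                        width=0.3,
                        edgecolor="black")

# (start, height) of every bar, container by container
segments = [(round(float(bar.get_y()), 2), round(float(bar.get_height()), 2))
            for c in ax_p.containers
            for bar in c]
print(returned_ax is ax_p)
print(segments)
```

The plot( ) call hands back the axes object that was passed in, and each “Yes” bar starts at the height of the matching “No” bar, just as with the bottom argument in the Matplotlib version.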

Customising bar plot

The stacked bar plot customisation part is exactly the same.

# Adding bar labels
for c in ax.containers:
    labels = [str(round(v.get_height(), 2)) + "%" if v.get_height() > 0 else '' for v in c]
    ax.bar_label(c,
                 label_type='center',
                 labels = labels,
                 size = 14) # add a container object "c" as first argument
# Removing spines
for s in ["top", "right"]:
    ax.spines[s].set_visible(False)
# Adding tick and axes labels
ax.tick_params(labelsize = 14, labelrotation = 0)
ax.set_ylabel("Percentage", size = 14)
ax.set_xlabel("Sex", size = 14)
# Customising legend
ax.legend(labels = ["no", "yes"],
          fontsize = 12,
          title = "Smoker",
          title_fontsize = 18)
# Fixing legend position
ax.legend_.set_bbox_to_anchor([0.55, 0.7])
fig
Stacked bar plot using pandas plot method

Once you learn base Matplotlib, you can customize the plots in various ways. I hope you now know various ways to generate a stacked bar plot. Apply the learned concepts to your datasets.



I hope you learned something new!