Introduction
Unlike binary logistic regression, where the dependent variable has only two categories, ordered logistic regression handles outcomes with three or more categories that have a natural ordering (ordinal rather than nominal).
The basic idea behind this model is that as a predictor increases, the response tends to shift toward one end of the spectrum of the ordinal responses.
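Formally, the model links the cumulative probabilities of the ordered outcome to the predictors through a logit. Using the same notation as the worked equations later in this article, the general form is:
- logit(P(Y <= j)) = tau_j - (b1 * x1 + b2 * x2 + ... + bk * xk)
where the tau_j are the tau-cuts (thresholds) separating the ordered categories, and the coefficients b1, ..., bk are shared across all categories, which is exactly the proportional odds assumption.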
When to use an ordinal logistic regression model
There are various scenarios where an ordinal regression could be useful:
- When your outcome variable has a finite number of categories with a natural ordering.
- When the proportional odds assumption holds (a quick way to check this is sketched right after this list).
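One practical way to check the proportional odds assumption, once the model has been fitted (see the Model Fitting section below), is the brant( ) function from the brant package. This is an optional sketch and assumes the brant package is installed; it is not used elsewhere in this article.
# Optional: test the proportional odds assumption of a fitted polr model
# install.packages("brant")   # run once if the package is not installed
library(brant)
brant(model)   # non-significant p-values suggest the assumption is reasonable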
To understand how to model ordinal outcomes and interpret the model coefficients, in this article we will model students' graduate school application (apply) status using the R statistical programming language.
Article Outline
This article contains the following:
- Introduction
- Loading libraries
- Data description
- Data types
- Model fitting
- Adding p-values
- Calculating Odds ratio
- Estimating 95% Confidence Intervals (CI)
- Model interpretation
- Probability plots
Loading libraries
The first step is to install and load the following libraries in R.
# Loading libraries
library(MASS)
library(foreign)
library(tidyverse)
library(margins)
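If any of these packages are missing on your machine, install them first (a one-time step):
# Install the required packages (only needed once)
install.packages(c("MASS", "foreign", "tidyverse", "margins"))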
Data Description
Here we are going to use the application to graduate school dataset obtained from the UCLA website: https://stats.idre.ucla.edu
It contains four variables, where apply is the ordinal outcome/dependent variable, and gpa, pared, and public are the independent variables.
apply (1/2/3): the outcome variable, with 3 levels, namely unlikely, somewhat likely, and very likely, coded as 1, 2, and 3 respectively, with 3 being the highest and 1 the lowest.
gpa: the student's grade point average.
pared (0/1): a binary variable indicating whether at least one parent went to graduate school; a value of 1 means at least one parent did.
public (0/1): refers to the type of undergraduate institution, where 1 indicates that the undergraduate institution is public, and 0 that it is private.
Data Preparation
Let’s load the dataset using the read_csv( ) function and print the first six rows using the head( ) function.
data <- read_csv("admission.csv")
head(data)
Let’s check the data types using the str( ) function. The apply column is of character type, and the rest of the columns are numeric.
str(data)
Fixing Data Types
Next, we will fix the data types to suit the model requirements. First, we need to convert the apply column to an ordered factor. We can do this using the ordered( ) function, where we supply the apply column and its levels in their natural order. Next, we convert the pared and public columns to factor (categorical) variables.
Print the levels to see the order.
data$apply <- ordered(data$apply, levels = c("unlikely", "somewhat likely", "very likely"))
data$pared <- as.factor(data$pared)
data$public <- as.factor(data$public)
levels(data$apply)
levels(data$pared)
levels(data$public)
Model Fitting
Now that we have a polished dataset, we can fit the model using the polr( ) function from the MASS package, where we need to supply the formula apply ~ pared + public + gpa and Hess = TRUE.
- Next, print the model summary using summary(model).
- The model outputs coefficients and tau-cuts (thresholds).
- These coefficients are used to estimate the latent score (y*), which we can then compare with the tau-cuts.
model <- polr(apply ~ pared + public + gpa, data = data, Hess = TRUE)
summary(model)
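If you want to pull the estimates out of the fitted object directly, for example to plug them into the latent-score calculation later in this article, here is a minimal sketch:
# Extract the slope coefficients and the tau-cuts (thresholds) from the polr fit
coef(model)   # coefficients for pared, public, and gpa
model$zeta    # tau-cuts: unlikely|somewhat likely and somewhat likely|very likely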
Adding P-values
Next, we will add p-values to report the significant variables at the 95% confidence level. We can estimate them using the pnorm( ) function with the t-values from the summary table. Next, we can add these p-values to the summary table using the cbind( ) function, which binds them column-wise.
(ctable <- coef(summary(model)))
p <- pnorm(abs(ctable[,"t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable,
"p-value" = round(p, digits = 2)))
Calculating 95% CI
You can also estimate the 95% confidence intervals of the estimates using the confint( ) function.
(ci <- confint(model))
Odds Ratio
We already discussed that model coefficients are not directly interpretable as they are in log-odds terms. One of the best ways to interpret them is to estimate the odds ratio. We can obtain an odds ratio by taking an exponent of the coefficient.
# Odd ratios
round(exp(coef(model)), digits = 2)
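We can also express the confidence intervals computed earlier on the odds-ratio scale by exponentiating them alongside the coefficients. A small sketch reusing the ci object created with confint( ) above:
# Odds ratios together with their 95% confidence intervals
round(exp(cbind(OR = coef(model), ci)), digits = 2)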
The interpretation of the odds ratio
GPA: When a student’s GPA increases by one unit, the odds of them being more likely to apply (very or somewhat likely versus unlikely) are multiplied by 1.85, equivalent to an 85% increase, keeping all other variables constant.
Parents: When comparing students whose parents attended graduate school to those whose parents did not, the odds of the former being more likely to apply (i.e., very or somewhat likely versus unlikely) are 2.85 times those of the latter, keeping all other variables constant.
Public: When comparing students from public institutions to those from private institutions, the odds of the former being more likely to apply (i.e., very or somewhat likely versus unlikely) are about 5.7% lower, which is equivalent to (1 - 0.943) x 100%, keeping all other variables constant.
Probability calculations
While calculating probabilities, we must remember that:
a) ordinal logistic regression models the log-odds of cumulative probabilities, and
b) the cumulative logit is obtained by subtracting the linear predictor (the model estimates) from the corresponding tau-cut.
Equations
Here we get two equations, as the probability of the third category can be obtained by subtraction (the total probabilities sum up to 1):
- logit(P(Y<=1)) = logit(F_unlikely) = 2.20 - (1.05 * Pared - 0.06 * Public + 0.616 * GPA)
- logit(P(Y<=2)) = logit(F_somewhat_likely) = 4.30 - (1.05 * Pared - 0.06 * Public + 0.616 * GPA)
Interpretation of an observation
Let’s compute logit(F_unlikely) and logit(F_somewhat_likely) for an observation.
Here we are going to take the second observation from our dataset, where pared = 1, public = 0, and gpa = 3.21.
The estimation shows that logit_F_unlikely = -0.82736 and logit_F_somewhat = 1.27264. The next step is to estimate the odds.
# Let's take the second observation from original dataset
pared = 1 # at least one parent went to graduate school
public = 0 # attended a private institution
gpa = 3.21 # gpa secured
logit_F_unlikely = 2.20 - (1.05 * pared - 0.06 * public + 0.616 * gpa)
logit_F_unlikely
logit_F_somewhat = 4.30 - (1.05 * pared - 0.06 * public + 0.616 * gpa)
logit_F_somewhat
# Note: logit_F_unlikely and logit_F_somewhat are in logit form
[1] -0.82736
[1] 1.27264
Converting it to odds
We can compute the odds by taking the exponent. The estimation shows that odds_F_unlikely = 0.437202 and odds_F_somewhat = 3.570266.
# Convert to odds
Odds_F_unlikely = exp(logit_F_unlikely)
Odds_F_unlikely
Odds_F_somewhat = exp(logit_F_somewhat)
Odds_F_somewhat
[1] 0.437202
[1] 3.570266
Convert to probabilities
Next, we will estimate the cumulative probabilities using the formula probability = odds / (1 + odds). The cumulative probabilities are F_unlikely = 0.30 and F_somewhat = 0.78. As these are cumulative probabilities, we can derive from them the individual probabilities of being unlikely, somewhat likely, and very likely.
F_unlikely = Odds_F_unlikely / (1 + Odds_F_unlikely)
F_unlikely
F_somewhat = Odds_F_somewhat / (1 + Odds_F_somewhat)
F_somewhat
[1] 0.3042036
[1] 0.7811943
Getting probabilities for individual categories
In this stage, we estimate the probabilities of being unlikely, somewhat likely, and very likely.
Here are the probability estimates:
P_unlikely = 0.30
P_somewhat = 0.78 - 0.30 = 0.48
P_very_likely = 1 - 0.78 = 0.22
- So the total probabilities (P_unlikely + P_somewhat + P_very_likely = 0.30 + 0.48 + 0.22 = 1) sum up to 1.
# Get probabilities for individual categories based on the concept of cumulative probabilities
# for unlikely i.e., P(unlikely)
P_unlikely = F_unlikely
P_unlikely
# for somewhat likely i.e., P(somewhat) = P(Y<=2) - P(Y<=1)
P_somewhat_likely = F_somewhat - F_unlikely
P_somewhat_likely
# for very likely i.e., P(very) = 1 - P(Y<=2)
P_very_likely = 1- F_somewhat
P_very_likely
[1] 0.3042036
[1] 0.4769908
[1] 0.2188057
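As a sanity check, we can compare these hand-calculated values with the model's own predictions for the same observation. A minimal sketch, assuming the second row of the dataset is the observation used above (pared = 1, public = 0, gpa = 3.21); small differences are expected because the manual calculation uses rounded coefficients:
# Cross-check the manual calculation against the fitted model
predict(model, newdata = data[2, ], type = "probs")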
Plotting probabilities for the original data
Now that we have trained the model, we can use it for making predictions. We can utilize the predict( ) function to predict the probabilities (type = "probs") for each observation and combine them with the existing data using cbind( ). This way, we can visualize how the probabilities change over a variable's range.
data1 <- cbind(data, predict(model, data, type = "probs"))
head(data1)
As the probabilities are in separate columns (wide format), we need to bring these column headings into one column and their probability values into another. So, one approach is to convert the wide format into a long format using tidyverse's pivot_longer( ) function, where we need to pass the column names that we want to put under the Level column and their corresponding values into the Probability column.
plotdata <- data1 %>%
pivot_longer(
cols = c("unlikely", "somewhat likely", "very likely"),
names_to = "Level",
values_to = "Probability"
)
head(plotdata)
Once we have the long data format, we can use the ggplot2 library to visualize it.
- Here we plot gpa on the x-axis and Probability on the y-axis and color the lines based on Level.
- Next, we create a faceted plot: row-wise facets based on pared and column-wise facets based on public.
- We add a black-and-white theme [theme_bw( )].
- Then we save the plot using the ggsave( ) function, supplying width = 20, height = 15, units = "cm", and dpi = 600.
# Plotting probabilities
ggplot(plotdata, aes(x = gpa, y = Probability, color = Level)) +
geom_line(size = 0.8) +
facet_grid(pared ~ public, labeller = "label_both") +
theme_bw()
# Saving the plot
ggsave("original_data_probs.png",
width = 20,
height = 15,
units = "cm",
dpi = 600)
Plotting the probabilities for a new dataset
We can also use the trained model to estimate the probabilities of another dataset (a test dataset).
Here, to illustrate, we are going to generate a dummy dataset:
- Generating 400 observations and saving them to newdata.
- Converting pared and public into factor variables.
- Predicting probabilities and combining them with newdata.
# Generating a new dataset
newdata <- data.frame(
pared = rep(0:1, 200),
public = rep(0:1, each = 200),
gpa = rep(seq(from = 1.9, to = 4, length.out = 100), 4)
)
# Converting it to categorical variable
newdata$pared <- as.factor(newdata$pared)
newdata$public <- as.factor(newdata$public)
# Estimating and combining probabilities to newdata
newdata <- cbind(newdata, predict(model, newdata, type = "probs"))
head(newdata)
The next step is to convert the wide format to a long format.
# Wide to long
plotdata1 <- newdata %>%
pivot_longer(
cols = c("unlikely", "somewhat likely", "very likely"),
names_to = "Level",
values_to = "Probability"
)
head(plotdata1)
The next step is plotting the probabilities and saving the plot in .png format at 600 dpi resolution.
# Plotting probabilities
ggplot(plotdata1, aes(x = gpa, y = Probability, color = Level)) +
geom_line() +
facet_grid(pared ~ public, labeller = "label_both") +
theme_bw()
# Saving the plot
ggsave("new_data_probs.png",
width = 20,
height = 15,
units = "cm",
dpi = 600)
Ordered logistic regression is instrumental when you want to predict an ordered outcome. It has several applications in social science, transportation, econometrics, and other relevant domains.
I hope you now understand how to fit an ordered logistic regression model and how to interpret it. Try this approach on your data and see how it goes.
Note: The same can be done using Python as well, using the pandas and statsmodels libraries.
Thank you note:
I am grateful to the R community for providing exceptional packages that have been instrumental in my work. I also extend my heartfelt appreciation to Dr. Gregg Harbaugh for his insightful explanation of ordinal regression, which can be found at: https://www.youtube.com/watch?v=jWIJ7P1G9P4. Additionally, I express my gratitude to UCLA for their invaluable regression tutorials that utilize R and have been tremendously helpful in my learning journey.
Click here for the data and code
I hope you’ve found this article useful!