Unlike binary logistic regression (two categories in the dependent variable), ordered logistic regression can have three or more categories assuming they can have a natural ordering (not nominal).
The basic philosophy behind this regression model is that as one variable increases, it would result in a shift towards either end of the spectrum of the ordinal responses.
When to use an ordinal logistic regression model
There are various scenarios where an ordinal regression could be useful
- When your outcome variable has a finite number of categories with a natural ordering.
- The proportional odds assumption holds.
To understand how to model ordinal outcomes and interpret the model coefficients, in this article, we will model the student’s college
apply status using R statistical programming language.
This article contains the following:
- Loading libraries
- Data description
- Data types
- Model fitting
- Adding p-values
- Calculating Odds ratio
- Estimating 95% Confidence Intervals (CI)
- Model interpretation
- Probability plots
The first step is to install and import the following libraries in R.
# Loading libraries library(MASS) library(foreign) library(tidyverse) library(margins)
Here we are going to use the application to graduate school dataset obtained from the UCLA website: https://stats.idre.ucla.edu
It contains four variables where
apply is the ordinal outcome/dependent variable, and
publicare the independent variables.
apply (1/2/3): It has 3 levels namely
somewhat likely, and
very likely, coded as
3 respectively. The
3 being highest and
1 being lowest.
gpa: grade point average (gpa).
pared (0/1): a binary variable that indicates if at least one parent went to graduate school. The value of 1 indicates at least one parent went to graduate school.
public (0/1): refers to the type of undergraduate institute, where
1 indicates that the undergraduate institution is public, and
Let’s load the dataset using the
read_csv( ) function and print the first six rows using the
head( ) function.
data <- read_csv("admission.csv") head(data)
Let’s check the data types using the
str( ) function. The
apply column is of
string type, and the rest of the columns are of type
Fixing Data Types
Next, we will fix the data type to suit the model requirements. First, we need to convert the
apply column to an ordinal column. We can do this using the
ordered( ) function where we need to supply the
apply column and its labels in a natural order. Next, convert the
public columns to a categorical variable.
Print the labels to see the order.
data$apply <- ordered(data$apply, levels = c("unlikely", "somewhat likely", "very likely")) data$pared <- as.factor(data$pared) data$public <- as.factor(data$public) levels(data$apply) levels(data$pared) levels(data$public)
Now we have a polished dataset; we can fit the model using the
porl( ) function from MASS package, where we need to supply the
apply ~ pared + public + gpa and
Hess = True .
- Next, print the model summary using
- The model outputs
- These coefficients are used to estimate the
latent score formula (y*)and then we can compare it with
model <- polr(apply ~ pared + public + gpa, data = data, Hess = TRUE) summary(model)
Next, we will add the p-values to report the significant variables at a 95% confidence interval. We can estimate it using the
pnorm( ) function using
t-value from the summary table. Next, we can add these p-values to the summary table using
cbind( ) function which binds it column-wise.
(ctable <- coef(summary(model))) p <- pnorm(abs(ctable[,"t value"]), lower.tail = FALSE) * 2 (ctable <- cbind(ctable, "p-value" = round(p, digits = 2)))
Calculating 95% CI
You can also estimate the 95% confidence intervals of the estimates using the
confint( ) function.
(ci <- confint(model))
We already discussed that model coefficients are not directly interpretable as they are in log-odds terms. One of the best ways to interpret them is to estimate the odds ratio. We can obtain an odds ratio by taking an exponent of the coefficient.
# Odd ratios round(exp(coef(model)), digits = 2)
The interpretation of the odds ratio
GPA: When a student’s GPA increases by one unit, the likelihood of them being more likely to apply (very or somewhat likely versus unlikely) is multiplied by 1.85, equivalent to an 85% increase while keeping all other variables constant.
Parents: When comparing students whose parents attended college to those whose parents did not, the odds of the former being more likely to apply (i.e., very or somewhat likely versus unlikely) is 2.85 times that of the latter while keeping all other variables constant.
Public: When comparing public schools to those in private schools, the odds of public school being more likely to apply (i.e., very or somewhat likely versus unlikely) is 5.71% lower, which is equivalent to (1–0.943) x 100%, while keeping all other variables constant.
While calculating probabilities, we must remember that
a) Ordinal logistic regression uses log-odds of cumulative probabilities,
b) Cumulative logit(.) requires subtracting the model estimates.
Here we get two equations as the probability of the third one can be estimated by subtracting it from 1 (total probabilities sum up to 1)
- logit(P(Y<=1)) = logit(F_unlikely) = 2.20 — (1.05 * Pared — 0.06 * Public + 0.616 * GPA)
- logit(P(Y<=2)) = logit(F_somewhat_likely) = 4.30 — (1.05 * Pared — 0.06 * Public + 0.616 * GPA)
Interpretation of an observation
Let’s compute the logit(F_unlikely) and logit(F_somewhat_likely) for an observation.
Here we are going to take the second observation from our dataset, where,
pared = 1 ,
public = 0 , and
gpa = 3.21 .
The estimation shows that
logit_F_unlikely = -0.82736 and
logit_F_somewhat = 1.27264 . The next step is to estimate the odds.
# Let's take the second observation from original dataset pared = 1 # at least one parent went to graduate school public = 0 # attended a private institution gpa = 3.21 # gpa secured logit_F_unlikely = 2.20 - (1.05 * pared - 0.06 * public + 0.616 * gpa) logit_F_unlikely logit_F_somewhat = 4.30 - (1.05 * pared - 0.06 * public + 0.616 * gpa) logit_F_somewhat # Note: logit_F_unlikely and logit_F_somewhat are in logit form
Converting it to odds
We can compute the odds by taking the exponent. The estimation shows that
odds_F_unlikely = 0.437202 and
odds_F_somewhat = 3.570266 .
# Convert to odds Odds_F_unlikely = exp(logit_F_unlikely) Odds_F_unlikely Odds_F_somewhat = exp(logit_F_somewhat) Odds_F_somewhat
Convert to probabilities
Next, we will estimate the probabilities using the formula,
odds/(1 + odds) . The cumulative probabilities for
F-unlikely = 0.30 and
F_somewhat = 0.78 . As this represents cumulative probabilities, we can estimate the probabilities as unlikely, somewhat likely, and very likely.
F_unlikely = Odds_F_unlikely / (1 + Odds_F_unlikely) F_unlikely F_somewhat = Odds_F_somewhat / (1 + Odds_F_somewhat) F_somewhat
Getting probabilities for individual categories
In this stage, we estimate the probabilities of being unlikely, somewhat likely, and very likely.
Here are the probability estimates:
P_unlikely = 0.30
P_somewhat = 0.78 — 0.30 = 0.48
P_very_likely = 1 — 0.78 = 0.22
- So total probabilities
(P_unlikely + P_somewhat + P_very_likely = 0.30 + 0.48 + 0.22 = 1)sum up to 1.
# Get probabilities for individual categories based on the concept of cumulative probabilities # for unlikely i.e., P(unlikely) P_unlikely = F_unlikely P_unlikely # for somewhat likely i.e., P(somewhat) = P(Y<=2) - P(Y<=1) P_somewhat_likely = F_somewhat - F_unlikely P_somewhat_likely # for somewhat likely i.e., P(very) = 1 - P(Y<=2) P_very_likely = 1- F_somewhat P_very_likely
Plotting probabilities for the original data
Now we have trained the model, we can use the model for making predictions. We can utilize the
predict( ) function to predict the probabilities
(type = “probs”)for each observation and combine them with existing data using
cbind( ). This way, we can visualize the probability changes over a variable range.
data1 <- cbind(data, predict(model, data, type = "probs")) head(data1)
As the probabilities are in their separate columns
(wide format) , we need to bring these column headings into one column and their probability values into another. So, one approach is to convert the
wide format into a
long format using tidyverse’s
pivot_longer( ) function, where we need to pass the column names that we want to put under
Level column and their corresponding values into
plotdata <- data1 %>% pivot_longer( cols = c("unlikely", "somewhat likely", "very likely"), names_to = "Level", values_to = "Probability" ) head(plotdata)
Once we have a long data format, we can use
ggplot library to visualize it.
- Here we plot the
gpain x-axis and
Probabilityon the y-axis and color the lines based on
- Next, we create a faceted plot. Row-wise faced based on
paredand column-wise facet based on
- Adding a black-and-white theme [
theme_bw( )] to it.
- Then we can save the plot using
ggsave( )function with supplying
width = 20, height = 15, units = ‘cm’, and dpi = 600.
# Plotting probabilities ggplot(plotdata, aes(x = gpa, y = Probability, color = Level)) + geom_line(size = 0.8) + facet_grid(pared ~ public, labeller = "label_both") + theme_bw() # Saving the plot ggsave("original_data_probs.png", width = 20, height = 15, units = "cm", dpi = 600)
Plotting the probabilities for a new dataset
We can also use the trained model to estimate the probabilities of another dataset (a test dataset).
Here to illustrate, we are going to generate a dummy dataset
- Generating 400 observations and saving them to
publicinto factor variable.
- Predicting probabilities and combining them with
# Generating a new dataset newdata <- data.frame( pared = rep(0:1, 200), public = rep(0:1, each = 200), gpa = rep(seq(from = 1.9, to = 4, length.out = 100), 4) ) # Converting it to categorical variable newdata$pared <- as.factor(newdata$pared) newdata$public <- as.factor(newdata$public) # Estimating and combining probabilities to newdata newdata <- cbind(newdata, predict(model, newdata, type = "probs")) head(newdata)
The next step is to convert
wide format to
# Wide to long plotdata1 <- newdata %>% pivot_longer( cols = c("unlikely", "somewhat likely", "very likely"), names_to = "Level", values_to = "Probability" ) head(plotdata1)
The next step is plotting the probabilities and saving the plot in
.png format with
600 dpi resolution.
# Plotting probabilities ggplot(plotdata1, aes(x = gpa, y = Probability, color = Level)) + geom_line() + facet_grid(pared ~ public, labeller = "label_both") + theme_bw() # Saving the plot ggsave("new_data_probs.png", width = 20, height = 15, units = "cm", dpi = 600)
Ordered logistic regression is instrumental when you want to predict an ordered outcome. It has several applications in social science, transportation, econometrics, and other relevant domains.
I hope you now understand how to fit an ordered logistic regression model and how to interpret it. Try this approach on your data and see how it goes.
Note: The same can be done using Python as well, using the
Thank you note:
I am grateful to the R community for providing exceptional packages that have been instrumental in my work. I also extend my heartfelt appreciation to Dr. Gregg Harbaugh for his insightful explanation of ordinal regression, which can be found at: https://www.youtube.com/watch?v=jWIJ7P1G9P4. Additionally, I express my gratitude to UCLA for their invaluable regression tutorials that utilize R and have been tremendously helpful in my learning journey.
Click here for the data and code
I hope you’ve found this article useful!