Ordered logit model

Introduction

Unlike binary logistic regression (two categories in the dependent variable), ordered logistic regression can have three or more categories assuming they can have a natural ordering (not nominal).

The basic philosophy behind this regression model is that as one variable increases, it would result in a shift towards either end of the spectrum of the ordinal responses.

When to use an ordinal logistic regression model

There are various scenarios where an ordinal regression could be useful

  • When your outcome variable has a finite number of categories with a natural ordering.
  • The proportional odds assumption holds.

To understand how to model ordinal outcomes and interpret the model coefficients, in this article, we will model the student’s college apply status using R statistical programming language.

Article Outline

This article contains the following:

  • Introduction
  • Loading libraries
  • Data description
  • Data types
  • Model fitting
  • Adding p-values
  • Calculating Odds ratio
  • Estimating 95% Confidence Intervals (CI)
  • Model interpretation
  • Probability plots

Loading libraries

The first step is to install and import the following libraries in R.

# Loading libraries
library(MASS)
library(foreign)
library(tidyverse)
library(margins)

Data Description

Here we are going to use the application to graduate school dataset obtained from the UCLA website: https://stats.idre.ucla.edu

It contains four variables where apply is the ordinal outcome/dependent variable, and gpapared, and publicare the independent variables.

apply (1/2/3): It has 3 levels namely unlikely, somewhat likely, and very likely, coded as 1, 2, and 3 respectively. The 3 being highest and 1 being lowest.
gpa: grade point average (gpa).
pared (0/1): a binary variable that indicates if at least one parent went to graduate school. The value of 1 indicates at least one parent went to graduate school.
public (0/1): refers to the type of undergraduate institute, where 1 indicates that the undergraduate institution is public, and 0 private.

Data Preparation

Let’s load the dataset using the¬†read_csv( )¬†function and print the first six rows using the¬†head( )¬†function.

data <- read_csv("admission.csv")
head(data)
Top six rows

Let’s check the data types using the¬†str( )¬†function. The¬†apply¬†column is of¬†string¬†type, and the rest of the columns are of type¬†numeric.

str(data)
Structure of the dataset

Fixing Data Types

Next, we will fix the data type to suit the model requirements. First, we need to convert the apply column to an ordinal column. We can do this using the ordered( ) function where we need to supply the apply column and its labels in a natural order. Next, convert the pared and public columns to a categorical variable.

Print the labels to see the order.

data$apply <- ordered(data$apply, levels = c("unlikely", "somewhat likely", "very likely"))
data$pared <- as.factor(data$pared)
data$public <- as.factor(data$public)

levels(data$apply)
levels(data$pared)
levels(data$public)
Data Types

Model Fitting

Now we have a polished dataset; we can fit the model using the porl( ) function from MASS package, where we need to supply the apply ~ pared + public + gpa and Hess = True .

  • Next, print the model summary using¬†summary(model)¬†.
  • The model outputs¬†coefficients¬†and¬†tau-cuts
  • These coefficients are used to estimate the¬†latent score formula (y*)¬†and then we can compare it with¬†tau-cuts¬†.
model <- polr(apply ~ pared + public + gpa, data = data, Hess = TRUE)
summary(model)
Model estimates

Adding P-values

Next, we will add the p-values to report the significant variables at a 95% confidence interval. We can estimate it using the pnorm( ) function using  t-value from the summary table. Next, we can add these p-values to the summary table using cbind( ) function which binds it column-wise.

(ctable <- coef(summary(model)))

p <- pnorm(abs(ctable[,"t value"]), lower.tail = FALSE) * 2

(ctable <- cbind(ctable,
                 "p-value" = round(p, digits = 2)))
Model summary
Model estimates with P-value

Calculating 95% CI

You can also estimate the 95% confidence intervals of the estimates using the confint( ) function.

(ci <- confint(model))
95% CI Range

Odds Ratio

We already discussed that model coefficients are not directly interpretable as they are in log-odds terms. One of the best ways to interpret them is to estimate the odds ratio. We can obtain an odds ratio by taking an exponent of the coefficient.

# Odd ratios
round(exp(coef(model)), digits = 2)
Odds Ratio

The interpretation of the odds ratio

GPA: When a student’s GPA increases by one unit, the likelihood of them being more likely to apply (very or somewhat likely versus unlikely) is multiplied by 1.85, equivalent to an 85% increase while keeping all other variables constant.

Parents: When comparing students whose parents attended college to those whose parents did not, the odds of the former being more likely to apply (i.e., very or somewhat likely versus unlikely) is 2.85 times that of the latter while keeping all other variables constant.

Public: When comparing public schools to those in private schools, the odds of public school being more likely to apply (i.e., very or somewhat likely versus unlikely) is 5.71% lower, which is equivalent to (1‚Äď0.943) x 100%, while keeping all other variables constant.

Probability calculations

While calculating probabilities, we must remember that
a) Ordinal logistic regression uses log-odds of cumulative probabilities,
b) Cumulative logit(.) requires subtracting the model estimates.

Equations

Here we get two equations as the probability of the third one can be estimated by subtracting it from 1 (total probabilities sum up to 1)

  • logit(P(Y<=1)) = logit(F_unlikely) = 2.20 ‚ÄĒ (1.05 * Pared ‚ÄĒ 0.06 * Public + 0.616 * GPA)
  • logit(P(Y<=2)) = logit(F_somewhat_likely) = 4.30 ‚ÄĒ (1.05 * Pared ‚ÄĒ 0.06 * Public + 0.616 * GPA)

Interpretation of an observation

Let’s compute the logit(F_unlikely) and logit(F_somewhat_likely) for an observation.

Here we are going to take the second observation from our dataset, where, pared = 1 , public = 0 , and gpa = 3.21 .

The estimation shows that logit_F_unlikely = -0.82736 and logit_F_somewhat = 1.27264 . The next step is to estimate the odds.

# Let's take the second observation from original dataset
pared = 1  # at least one parent went to graduate school
public = 0 # attended a private institution
gpa = 3.21 # gpa secured

logit_F_unlikely = 2.20 - (1.05 * pared - 0.06 * public + 0.616 * gpa)
logit_F_unlikely

logit_F_somewhat = 4.30 - (1.05 * pared - 0.06 * public + 0.616 * gpa)
logit_F_somewhat

# Note: logit_F_unlikely and logit_F_somewhat are in logit form

[1] -0.82736
[1] 1.27264

Converting it to odds

We can compute the odds by taking the exponent. The estimation shows that odds_F_unlikely = 0.437202 and odds_F_somewhat = 3.570266 .

# Convert to odds
Odds_F_unlikely = exp(logit_F_unlikely)
Odds_F_unlikely

Odds_F_somewhat = exp(logit_F_somewhat)
Odds_F_somewhat

[1] 0.437202
[1] 3.570266

Convert to probabilities

Next, we will estimate the probabilities using the formula, probability = odds/(1 + odds) . The cumulative probabilities for F-unlikely = 0.30 and F_somewhat = 0.78 . As this represents cumulative probabilities, we can estimate the probabilities as unlikely, somewhat likely, and very likely.

F_unlikely = Odds_F_unlikely / (1 + Odds_F_unlikely)
F_unlikely

F_somewhat = Odds_F_somewhat / (1 + Odds_F_somewhat)
F_somewhat

[1] 0.3042036
[1] 0.7811943

Getting probabilities for individual categories

In this stage, we estimate the probabilities of being unlikely, somewhat likely, and very likely.

Here are the probability estimates:

  • P_unlikely = 0.30
  • P_somewhat = 0.78 ‚ÄĒ 0.30 = 0.48
  • P_very_likely = 1 ‚ÄĒ 0.78 = 0.22
  • So total probabilities¬†(P_unlikely + P_somewhat + P_very_likely = 0.30 + 0.48 + 0.22 = 1)¬†sum up to 1.
# Get probabilities for individual categories based on the concept of cumulative probabilities
# for unlikely i.e., P(unlikely)
P_unlikely = F_unlikely
P_unlikely

# for somewhat likely i.e., P(somewhat) = P(Y<=2) - P(Y<=1)
P_somewhat_likely = F_somewhat - F_unlikely
P_somewhat_likely

# for somewhat likely i.e., P(very) = 1 - P(Y<=2)
P_very_likely = 1- F_somewhat
P_very_likely

[1] 0.3042036
[1] 0.4769908
[1] 0.2188057

Plotting probabilities for the original data

Now we have trained the model, we can use the model for making predictions. We can utilize the¬†predict( )¬†function to predict the probabilities¬†(type = ‚Äúprobs‚ÄĚ)for each observation and combine them with existing data using¬†cbind( ). This way, we can visualize the probability changes over a variable range.

data1 <- cbind(data, predict(model, data, type = "probs"))
head(data1)
Predicted probabilities

As the probabilities are in their separate columns¬†(wide format)¬†, we need to bring these column headings into one column and their probability values into another. So, one approach is to convert the¬†wide format¬†into a¬†long format¬†using tidyverse’s¬†pivot_longer( )¬†function, where we need to pass the column names that we want to put under¬†Level¬†column and their corresponding values into¬†Probability¬†column.

plotdata <- data1 %>% 
    pivot_longer(
        cols = c("unlikely", "somewhat likely", "very likely"),
        names_to = "Level",
        values_to = "Probability"
    )
head(plotdata)
Long format

Once we have a long data format, we can use ggplot library to visualize it.

  • Here we plot the¬†gpa¬†in x-axis and¬†Probability¬†on the y-axis and color the lines based on¬†Label¬†.
  • Next, we create a faceted plot. Row-wise faced based on¬†pared¬†and column-wise facet based on¬†public¬†.
  • Adding a black-and-white theme¬†[theme_bw( )]¬†to it.
  • Then we can save the plot using¬†ggsave( )¬†function with supplying¬†width = 20, height = 15, units = ‚Äėcm‚Äô, and dpi = 600¬†.
# Plotting probabilities
ggplot(plotdata, aes(x = gpa, y = Probability, color = Level)) +
    geom_line(size = 0.8) +
    facet_grid(pared ~ public, labeller = "label_both") +
    theme_bw()

# Saving the plot
ggsave("original_data_probs.png",
       width = 20,
       height = 15,
       units = "cm",
       dpi = 600)
Probability plot for original train data

Plotting the probabilities for a new dataset

We can also use the trained model to estimate the probabilities of another dataset (a test dataset).

Here to illustrate, we are going to generate a dummy dataset

  • Generating 400 observations and saving them to¬†newdata¬†.
  • Converting¬†pared¬†and¬†public¬†into factor variable.
  • Predicting probabilities and combining them with¬†newdata¬†.
# Generating a new dataset
newdata <- data.frame(
    pared = rep(0:1, 200),
    public = rep(0:1, each = 200),
    gpa = rep(seq(from = 1.9, to = 4, length.out = 100), 4)
)

# Converting it to categorical variable
newdata$pared <- as.factor(newdata$pared)
newdata$public <- as.factor(newdata$public)

# Estimating and combining probabilities to newdata
newdata <- cbind(newdata, predict(model, newdata, type = "probs"))

head(newdata)
Probabilities of ‘newdata'

The next step is to convert wide format to long format

# Wide to long
plotdata1 <- newdata %>% 
    pivot_longer(
        cols = c("unlikely", "somewhat likely", "very likely"),
        names_to = "Level",
        values_to = "Probability"
        )

head(plotdata1)
Long data format

The next step is plotting the probabilities and saving the plot in .png format with 600 dpi resolution.

# Plotting probabilities
ggplot(plotdata1, aes(x = gpa, y = Probability, color = Level)) +
    geom_line() +
    facet_grid(pared ~ public, labeller = "label_both") +
    theme_bw()

# Saving the plot
ggsave("new_data_probs.png",
       width = 20,
       height = 15,
       units = "cm",
       dpi = 600)
Probability plot for ‚Äėnewdata‚Äô

Ordered logistic regression is instrumental when you want to predict an ordered outcome. It has several applications in social science, transportation, econometrics, and other relevant domains.

I hope you now understand how to fit an ordered logistic regression model and how to interpret it. Try this approach on your data and see how it goes.

Note: The same can be done using Python as well, using the pandas and statsmodels library.

Thank you note:

I am grateful to the R community for providing exceptional packages that have been instrumental in my work. I also extend my heartfelt appreciation to Dr. Gregg Harbaugh for his insightful explanation of ordinal regression, which can be found at: https://www.youtube.com/watch?v=jWIJ7P1G9P4. Additionally, I express my gratitude to UCLA for their invaluable regression tutorials that utilize R and have been tremendously helpful in my learning journey.

Click here for the data and code

I hope you’ve found this article useful!