## Introduction

Unlike binary logistic regression (two categories in the dependent variable), ordered logistic regression can have three or more categories assuming they can have a natural ordering (not nominal).

The basic philosophy behind this regression model is that as one variable increases, it would result in a shift towards either end of the spectrum of the ordinal responses.

## When to use an ordinal logistic regression model

There are various scenarios where an ordinal regression could be useful

• When your outcome variable has a finite number of categories with a natural ordering.
• The proportional odds assumption holds.

To understand how to model ordinal outcomes and interpret the model coefficients, in this article, we will model the student’s college `apply` status using R statistical programming language.

## Article Outline

• Introduction
• Data description
• Data types
• Model fitting
• Calculating Odds ratio
• Estimating 95% Confidence Intervals (CI)
• Model interpretation
• Probability plots

The first step is to install and import the following libraries in R.

``````# Loading libraries
library(MASS)
library(foreign)
library(tidyverse)
library(margins)``````

## Data Description

Here we are going to use the application to graduate school dataset obtained from the UCLA website: https://stats.idre.ucla.edu

It contains four variables where `apply` is the ordinal outcome/dependent variable, and `gpa``pared`, and `public`are the independent variables.

apply (1/2/3): It has 3 levels namely `unlikely``somewhat likely`, and `very likely`, coded as `1``2`, and `3` respectively. The `3` being highest and `1` being lowest.
pared (0/1): a binary variable that indicates if at least one parent went to graduate school. The value of 1 indicates at least one parent went to graduate school.
public (0/1): refers to the type of undergraduate institute, where `1` indicates that the undergraduate institution is public, and `0` private.

## Data Preparation

Let’s load the dataset using the `read_csv( )` function and print the first six rows using the `head( )` function.

``````data <- read_csv("admission.csv")

Let’s check the data types using the `str( )` function. The `apply` column is of `string` type, and the rest of the columns are of type `numeric`.

``str(data)``

## Fixing Data Types

Next, we will fix the data type to suit the model requirements. First, we need to convert the `apply` column to an ordinal column. We can do this using the `ordered( )` function where we need to supply the `apply` column and its labels in a natural order. Next, convert the `pared` and `public` columns to a categorical variable.

Print the labels to see the order.

``````data\$apply <- ordered(data\$apply, levels = c("unlikely", "somewhat likely", "very likely"))
data\$pared <- as.factor(data\$pared)
data\$public <- as.factor(data\$public)

levels(data\$apply)
levels(data\$pared)
levels(data\$public)``````

## Model Fitting

Now we have a polished dataset; we can fit the model using the `porl( )` function from MASS package, where we need to supply the `apply ~ pared + public + gpa` and `Hess = True` .

• Next, print the model summary using `summary(model)` .
• The model outputs `coefficients` and `tau-cuts`
• These coefficients are used to estimate the `latent score formula (y*)` and then we can compare it with `tau-cuts` .
``````model <- polr(apply ~ pared + public + gpa, data = data, Hess = TRUE)
summary(model)``````

Next, we will add the p-values to report the significant variables at a 95% confidence interval. We can estimate it using the `pnorm( )` function using  `t-value` from the summary table. Next, we can add these p-values to the summary table using `cbind( )` function which binds it column-wise.

``````(ctable <- coef(summary(model)))

p <- pnorm(abs(ctable[,"t value"]), lower.tail = FALSE) * 2

(ctable <- cbind(ctable,
"p-value" = round(p, digits = 2)))``````

## Calculating 95% CI

You can also estimate the 95% confidence intervals of the estimates using the `confint( )` function.

``(ci <- confint(model))``

## Odds Ratio

We already discussed that model coefficients are not directly interpretable as they are in log-odds terms. One of the best ways to interpret them is to estimate the odds ratio. We can obtain an odds ratio by taking an exponent of the coefficient.

``````# Odd ratios
round(exp(coef(model)), digits = 2)``````

### The interpretation of the odds ratio

GPA: When a student’s GPA increases by one unit, the likelihood of them being more likely to apply (very or somewhat likely versus unlikely) is multiplied by 1.85, equivalent to an 85% increase while keeping all other variables constant.

Parents: When comparing students whose parents attended college to those whose parents did not, the odds of the former being more likely to apply (i.e., very or somewhat likely versus unlikely) is 2.85 times that of the latter while keeping all other variables constant.

Public: When comparing public schools to those in private schools, the odds of public school being more likely to apply (i.e., very or somewhat likely versus unlikely) is 5.71% lower, which is equivalent to (1–0.943) x 100%, while keeping all other variables constant.

# Probability calculations

While calculating probabilities, we must remember that
a) Ordinal logistic regression uses log-odds of cumulative probabilities,
b) Cumulative logit(.) requires subtracting the model estimates.

## Equations

Here we get two equations as the probability of the third one can be estimated by subtracting it from 1 (total probabilities sum up to 1)

• logit(P(Y<=1)) = logit(F_unlikely) = 2.20 — (1.05 * Pared — 0.06 * Public + 0.616 * GPA)
• logit(P(Y<=2)) = logit(F_somewhat_likely) = 4.30 — (1.05 * Pared — 0.06 * Public + 0.616 * GPA)

## Interpretation of an observation

Let’s compute the logit(F_unlikely) and logit(F_somewhat_likely) for an observation.

Here we are going to take the second observation from our dataset, where, `pared = 1` , `public = 0` , and `gpa = 3.21` .

The estimation shows that `logit_F_unlikely = -0.82736` and `logit_F_somewhat = 1.27264` . The next step is to estimate the odds.

``````# Let's take the second observation from original dataset
pared = 1  # at least one parent went to graduate school
public = 0 # attended a private institution
gpa = 3.21 # gpa secured

logit_F_unlikely = 2.20 - (1.05 * pared - 0.06 * public + 0.616 * gpa)
logit_F_unlikely

logit_F_somewhat = 4.30 - (1.05 * pared - 0.06 * public + 0.616 * gpa)
logit_F_somewhat

# Note: logit_F_unlikely and logit_F_somewhat are in logit form``````

[1] -0.82736
[1] 1.27264

## Converting it to odds

We can compute the odds by taking the exponent. The estimation shows that `odds_F_unlikely = 0.437202` and `odds_F_somewhat = 3.570266` .

``````# Convert to odds
Odds_F_unlikely = exp(logit_F_unlikely)
Odds_F_unlikely

Odds_F_somewhat = exp(logit_F_somewhat)
Odds_F_somewhat``````

[1] 0.437202
[1] 3.570266

## Convert to probabilities

Next, we will estimate the probabilities using the formula, `probability =`` odds/(1 + odds)` . The cumulative probabilities for `F-unlikely = 0.30` and `F_somewhat = 0.78` . As this represents cumulative probabilities, we can estimate the probabilities as unlikely, somewhat likely, and very likely.

``````F_unlikely = Odds_F_unlikely / (1 + Odds_F_unlikely)
F_unlikely

F_somewhat = Odds_F_somewhat / (1 + Odds_F_somewhat)
F_somewhat``````

[1] 0.3042036
[1] 0.7811943

## Getting probabilities for individual categories

In this stage, we estimate the probabilities of being unlikely, somewhat likely, and very likely.

Here are the probability estimates:

• `P_unlikely = 0.30`
• `P_somewhat = 0.78 — 0.30 = 0.48`
• `P_very_likely = 1 — 0.78 = 0.22`
• So total probabilities `(P_unlikely + P_somewhat + P_very_likely = 0.30 + 0.48 + 0.22 = 1)` sum up to 1.
``````# Get probabilities for individual categories based on the concept of cumulative probabilities
# for unlikely i.e., P(unlikely)
P_unlikely = F_unlikely
P_unlikely

# for somewhat likely i.e., P(somewhat) = P(Y<=2) - P(Y<=1)
P_somewhat_likely = F_somewhat - F_unlikely
P_somewhat_likely

# for somewhat likely i.e., P(very) = 1 - P(Y<=2)
P_very_likely = 1- F_somewhat
P_very_likely``````

[1] 0.3042036
[1] 0.4769908
[1] 0.2188057

## Plotting probabilities for the original data

Now we have trained the model, we can use the model for making predictions. We can utilize the `predict( )` function to predict the probabilities `(type = “probs”)`for each observation and combine them with existing data using `cbind( )`. This way, we can visualize the probability changes over a variable range.

``````data1 <- cbind(data, predict(model, data, type = "probs"))

As the probabilities are in their separate columns `(wide format)` , we need to bring these column headings into one column and their probability values into another. So, one approach is to convert the `wide format` into a `long format` using tidyverse’s `pivot_longer( )` function, where we need to pass the column names that we want to put under `Level` column and their corresponding values into `Probability` column.

``````plotdata <- data1 %>%
pivot_longer(
cols = c("unlikely", "somewhat likely", "very likely"),
names_to = "Level",
values_to = "Probability"
)

Once we have a long data format, we can use `ggplot` library to visualize it.

• Here we plot the `gpa` in x-axis and `Probability` on the y-axis and color the lines based on `Label` .
• Next, we create a faceted plot. Row-wise faced based on `pared` and column-wise facet based on `public` .
• Adding a black-and-white theme [`theme_bw( )`] to it.
• Then we can save the plot using `ggsave( )` function with supplying `width = 20, height = 15, units = ‘cm’, and dpi = 600` .
``````# Plotting probabilities
ggplot(plotdata, aes(x = gpa, y = Probability, color = Level)) +
geom_line(size = 0.8) +
facet_grid(pared ~ public, labeller = "label_both") +
theme_bw()

# Saving the plot
ggsave("original_data_probs.png",
width = 20,
height = 15,
units = "cm",
dpi = 600)``````

## Plotting the probabilities for a new dataset

We can also use the trained model to estimate the probabilities of another dataset (a test dataset).

Here to illustrate, we are going to generate a dummy dataset

• Generating 400 observations and saving them to `newdata` .
• Converting `pared` and `public` into factor variable.
• Predicting probabilities and combining them with `newdata` .
``````# Generating a new dataset
newdata <- data.frame(
pared = rep(0:1, 200),
public = rep(0:1, each = 200),
gpa = rep(seq(from = 1.9, to = 4, length.out = 100), 4)
)

# Converting it to categorical variable
newdata\$pared <- as.factor(newdata\$pared)
newdata\$public <- as.factor(newdata\$public)

# Estimating and combining probabilities to newdata
newdata <- cbind(newdata, predict(model, newdata, type = "probs"))

The next step is to convert `wide format` to `long format`

``````# Wide to long
plotdata1 <- newdata %>%
pivot_longer(
cols = c("unlikely", "somewhat likely", "very likely"),
names_to = "Level",
values_to = "Probability"
)

The next step is plotting the probabilities and saving the plot in `.png` format with `600 dpi` resolution.

``````# Plotting probabilities
ggplot(plotdata1, aes(x = gpa, y = Probability, color = Level)) +
geom_line() +
facet_grid(pared ~ public, labeller = "label_both") +
theme_bw()

# Saving the plot
ggsave("new_data_probs.png",
width = 20,
height = 15,
units = "cm",
dpi = 600)``````

Ordered logistic regression is instrumental when you want to predict an ordered outcome. It has several applications in social science, transportation, econometrics, and other relevant domains.

I hope you now understand how to fit an ordered logistic regression model and how to interpret it. Try this approach on your data and see how it goes.

Note: The same can be done using Python as well, using the `pandas `and `statsmodels` library.

Thank you note:

I am grateful to the R community for providing exceptional packages that have been instrumental in my work. I also extend my heartfelt appreciation to Dr. Gregg Harbaugh for his insightful explanation of ordinal regression, which can be found at: https://www.youtube.com/watch?v=jWIJ7P1G9P4. Additionally, I express my gratitude to UCLA for their invaluable regression tutorials that utilize R and have been tremendously helpful in my learning journey.