Discrete-time survival analysis using logit models are different from
the Cox proportional hazards models in the last lecture:
1. How Time is Treated:
- Discrete-Time Logit Models are specifically
designed for situations where time is measured in discrete intervals
(e.g., years, months, grades in school). The event (e.g., promotion,
dropout) is considered to occur within one of these distinct time
periods. The data is structured in “person-period” format, where each
individual contributes multiple rows, one for each time period they are
at risk. The outcome variable is binary, indicating whether the event
occurred in that specific interval (1) or not (0), given they survived
until the beginning of the interval.
- Cox Proportional Hazards (Cox PH) Models are
designed for continuous time. They analyze the hazard rate, which is the
instantaneous risk of the event occurring at a specific point in time,
given that the individual has survived up to that time.3 While Cox
models can handle tied event times (multiple events occurring at the
same recorded time), they fundamentally assume an underlying continuous
time process.
2. The Outcome Variable and Modeling Approach:
- Discrete-Time Logit Models model the probability of
the event occurring within a specific discrete time interval,
conditional on survival up to the start of that interval. This
probability is typically modeled using a logistic regression framework,
hence the “logit” in the name. The log-odds of the event occurring in
each interval are predicted by the covariates.
- Cox PH Models directly models the hazard rate. It
estimates the hazard ratio, which is the ratio of the hazard rates
between two groups or for a one-unit change in a covariate. The model is
semi-parametric because it estimates the effect of covariates on the
hazard without making assumptions about the shape of the baseline hazard
function (the hazard when all covariates are zero).
3. The Baseline Hazard:
- In Discrete-Time Logit Models, the baseline hazard
(the probability of the event in each time interval when all covariates
are zero) is often treated as time-dependent. In the example code below,
this is implicitly handled by including
time
and
I(time^2)
as predictors. More explicitly, one could include
dummy variables for each time period to allow the baseline hazard to
vary freely across time.
- The Cox PH models has a non-parametric baseline
hazard that is not explicitly estimated. The focus is on how the
covariates proportionally scale this baseline hazard. The “proportional
hazards” assumption implies that the ratio of the hazard rates for any
two individuals remains constant over time.
4. The Proportional Hazards Assumption:
- While the term “discrete-time proportional hazards
model” exists (often using a complementary log-log link), the
logit model itself does not inherently assume proportional hazards. The
effects of covariates can vary across different time intervals,
especially when time-by-covariate interactions are included in the
model.
- A key assumption of the Cox model is that the
hazards are proportional over time. This means that the effect of a
covariate on the hazard rate is constant across all time points.
Violations of this assumption can lead to biased results. Statistical
tests and graphical methods are used to assess this assumption.
5. Handling of Ties:
- Discrete-Time Logit Models: Tied events (multiple
events occurring at the same recorded time) are naturally handled within
the discrete time intervals. If multiple promotions happen in the same
year, they all contribute to the event count for that year.
- Cox PH Models: Tied event times require specific
methods to handle them in the partial likelihood estimation (e.g.,
Breslow, Efron methods).
6. Data Structure:
- Discrete-Time Logit Models requires reshaping the
data into a long (person-period) format.
- Cox PH Models__ typically use a data structure where each row
represents an individual with their survival time and event status.
Time |
Discrete intervals |
Continuous |
Outcome |
Probability of event in an interval |
Hazard rate |
Modeling Approach |
Logistic Regression |
Semi-parametric hazard regression |
Baseline Hazard |
Can be time-dependent |
Non-parametric, not directly estimated |
Proportional Hazards |
Not inherently assumed |
Key assumption |
Handling Ties |
Naturally handled within intervals |
Requires specific tie-handling methods |
Data Structure |
Long (person-period) |
Typically one row per subject |
When to Choose Which:
- Choose Discrete-Time Logit Models when:
- Time is naturally measured in discrete intervals.
- The proportional hazards assumption of the Cox model is
violated.
- You want to directly model the probability of the event in each time
interval.
- You want more flexibility in how the effects of covariates change
over time.
- Choose Cox Proportional Hazards Models when:
- Time is continuous or can be reasonably approximated as
continuous.
- The proportional hazards assumption holds.
- You are primarily interested in estimating hazard ratios.
- You have a large dataset with many unique event times.
Load Libraries
library(tidyverse)
: This line loads the tidyverse package. Think
of this as bringing in your essential toolkit for data manipulation and
visualization in R. It includes powerful functions like
mutate
, pivot_longer
, left_join
,
and filter
, which we’ll see in action.
- library(haven): The haven
package is specifically designed to read data files from various
statistical software packages, including SAS, SPSS, and Stata. In this
case, we’ll use it to read a SAS dataset.
library(tidyverse)
library(haven)
Read in the data
The read_sas()
function from the haven package reads the
SAS dataset located at the provided URL and stores it in a data frame
called rank. This data contains information about associate professors,
their characteristics, and their career progression.
rank <- read_sas("https://statisticalhorizons.com/wp-content/uploads/2022/01/rank.sas7bdat")
#assign id numbers to the 301 case:
rank <- rank %>% mutate(id =1:301)
Here, we’re using the pipe operator %>%
from
dplyr
(part of tidyverse
). It allows us to
chain operations together in a readable way.
- mutate()
is a function that adds new variables or
modifies existing ones in a data frame.
- id = 1:301
creates a new variable named id and assigns a
unique integer from 1 to 301 to each row. This is important for tracking
individual cases, especially when we reshape the data later. We’re
essentially giving each person in the dataset a unique identifier.
Reshape the data from wide to long for articles
Why reshape to long format? In discrete-time survival analysis, we
are interested in the probability of an event (promotion in this case)
occurring at each discrete time point. The long format allows us to have
each person-year as a separate row, which is the structure needed for
fitting our logit model.
rankyrs <- rank %>%
pivot_longer(
cols = starts_with("art"),
names_to = "atime",
values_to = "art"
) %>%
mutate(time = as.numeric(str_remove(atime, "art")))
#reshape the data from wide to long for citations:
rankyrs2 <- rank %>%
pivot_longer(
cols = starts_with("cit"),
names_to = "ctime",
values_to = "cit"
) %>%
mutate(time = as.numeric(str_remove(ctime, "cit")))
pivot_longer()
is a powerful function for reshaping
data from a “wide” format (where each time point has its own column) to
a “long” format (where each row represents a specific time point for an
individual). This is crucial for discrete-time survival analysis.
cols = starts_with("art")
specifies that we want to
gather the columns whose names start with “art”. These represent the
number of articles published at different time points (e.g., art1, art2,
art3…).
names_to = "atime"
creates a new column named “atime”
that will contain the original column names (“art1”, “art2”,
etc.).
values_to = "art"
creates a new column named “art” that
will contain the values from the original “art” columns (the actual
number of articles).
The same logic applies to the block for citations
(starts_with("cit")
), creating ctime
and
cit
columns.
mutate(time = as.numeric(str_remove(atime, "art"))
):
After pivoting, the atime
column (e.g., “art1”, “art2”) is
still in character format. We want to extract the time information (the
number).
str_remove(atime, "art")
from the stringr
package (part of tidyverse
) removes the “art” prefix from
the atime
column, leaving just the number.
as.numeric()
converts this remaining string into a
numeric value, which we store in a new column called time
.
We do the same for the citations.
Merge articles and citations
rankyrs <- rankyrs %>%
left_join(rankyrs2 %>% select(id, time, cit), by = c("id", "time"))
left_join():
This function combines two data frames
based on common columns. We are merging rankyrs
(which now
has the article information in long format) with a subset of
rankyrs2
(which has the citation information in long
format).
rankyrs2 %>% select(id, time, cit)
: We first select
only the id
, time
, and cit
columns from rankyrs2
since these are the variables we want
to join.
by = c("id", "time")
: This specifies that the join
should be based on matching values in both the id
and
time
columns. This ensures that the number of articles and
citations for the same individual at the same time point are combined
into a single row in rankyrs
.
Sort by id and time
rankyrs<-rankyrs %>%
arrange(id, time)
arrange(id, time)
sorts the rankyrs
data
frame first by the id
column (so all observations for the
same individual are together) and then by the time
column
(so the observations for each individual are in chronological order).
This is helpful for understanding the sequence of events.
Recode event variable to be 0 before event occurrence
In discrete-time survival analysis, we model the probability of the
event occurring in a specific time interval, given that it has not
occurred before. Our promo variable captures this: for each person-year,
it’s 1 if the promotion happened at that exact time, and 0
otherwise.
ranksub <- ranksub %>%
mutate(promo = if_else(time == dur, event, 0))
mutate(promo = if_else(time == dur, event, 0))
creates
a new binary outcome variable called promo
.
if_else(condition, value_if_true, value_if_false)
is a
conditional function.
time == dur
is the condition checked to determine
whether the current time point (time
) is equal to the time
of the event (dur
).
event
: If the condition is true (i.e., we are at the
time of promotion), the value of promo
is set to the value
of the event
variable (which is 1 for promotion and 0
otherwise).
- If the condition is false (i.e., we are at a time point before the
promotion), the value of promo is set to
0
, indicating that
the event did not occur at that specific person-year.
Change the missings in jobtime to 11
The choice of 11 might be based on the context of the data (e.g., the
maximum observed job time plus one.
ranksub <- ranksub %>%
mutate(jobtime = if_else(is.na(jobtime), 11, jobtime))
mutate(jobtime = if_else(is.na(jobtime), 11, jobtime))
:
is.na(jobtime)
checks for missing values (represented
as NA
) in the jobtime
column.
- If a value is missing (TRUE), it’s replaced with 11.
- If a value is not missing (FALSE), it remains its original
value.
Define jobpres, a time-varying coviariate
This demonstrates how to handle time-varying covariates in
discrete-time survival analysis. The value of jobpres
can
change for an individual as time progresses, and our data structure in
long format allows us to capture these changes.
ranksub <- ranksub %>%
mutate(jobpres = if_else(time<jobtime,prest1, prest2))
mutate(jobpres = if_else(time < jobtime, prest1, prest2))
creates a new variable jobpres
.
time < jobtime
is the condition checked where the
current time point (time) is less than the jobtime.
- If the condition is true (i.e., before
jobtime
), the
value of jobpres
is taken from the prest1
variable (likely prestige at an earlier job).
- If the condition is false (i.e., at or after
jobtime
),
the value of jobpres
is taken from the prest2
variable (likely prestige at a later or current job).
Fit discrete time logit
This glm()
call fits a discrete-time logit model where
the outcome is the log-odds of promotion at a specific time point. The
coefficients from this model will tell us how each predictor variable is
associated with this log-odds, holding other variables constant. We will
need to exponentiate these coefficients to interpret them as odds
ratios.
myfit<-glm(promo~undgrad + phdmed + phdprest + art + cit + jobpres + time + I(time^2), family=binomial,data=ranksub)
library(gtsummary)
# Create a summary table of the model results
model_table <- tbl_regression(
myfit,
exponentiate = TRUE, # Display odds ratios instead of log-odds
pvalue_fun = ~style_pvalue(.x, digits = 3), # Format p-values
label = list(
"undgrad" ~ "Undergraduate Institution Prestige",
"phdmed" ~ "Medical PhD",
"phdprest" ~ "PhD Program Prestige",
"art" ~ "Number of Articles",
"cit" ~ "Number of Citations",
"jobpres" ~ "Job Prestige",
"time" ~ "Time (Years)",
"I(time^2)" ~ "Time Squared" # Use the exact string "I(time^2)"
) # Custom labels for variables
)
# Print the table to the console
model_table
Characteristic |
OR |
95% CI |
p-value |
Undergraduate Institution Prestige |
1.21 |
1.07, 1.38 |
0.002 |
Medical PhD |
0.79 |
0.56, 1.11 |
0.170 |
PhD Program Prestige |
1.03 |
0.86, 1.23 |
0.772 |
Number of Articles |
1.08 |
1.04, 1.12 |
<0.001 |
Number of Citations |
1.00 |
1.00, 1.00 |
0.924 |
Job Prestige |
0.78 |
0.62, 0.97 |
0.026 |
Time (Years) |
8.02 |
5.18, 13.0 |
<0.001 |
Time Squared |
0.85 |
0.82, 0.89 |
<0.001 |
Abbreviations: CI = Confidence Interval, OR = Odds Ratio |
# You can further customize the table, for example, adding a title and caption
model_table <- model_table %>%
as_gt() %>% # Convert to a gt table for more advanced formatting
gt::tab_header(
title = "Discrete-Time Logit Model Results",
subtitle = "Predicting Probability of Promotion"
) %>%
gt::tab_caption("Odds ratios and p-values are reported.")
# Print the customized table
model_table
---
title: "Discrete Time Logit Models"
output: 
  html_notebook:
    toc: true
    toc_float: true
---
Discrete-time survival analysis using logit models are different from the Cox proportional hazards models in the last lecture:

### 1. How Time is Treated:
 - __Discrete-Time Logit Models__ are specifically designed for situations where time is measured in discrete intervals (e.g., years, months, grades in school). The event (e.g., promotion, dropout) is considered to occur within one of these distinct time periods. The data is structured in "person-period" format, where each individual contributes multiple rows, one for each time period they are at risk. The outcome variable is binary, indicating whether the event occurred in that specific interval (1) or not (0), given they survived until the beginning of the interval.  
 - __Cox Proportional Hazards (Cox PH) Models__ are designed for continuous time. They analyze the hazard rate, which is the instantaneous risk of the event occurring at a specific point in time, given that the individual has survived up to that time.3 While Cox models can handle tied event times (multiple events occurring at the same recorded time), they fundamentally assume an underlying continuous time process.
 
### 2. The Outcome Variable and Modeling Approach:
 - __Discrete-Time Logit Models__ model the probability of the event occurring within a specific discrete time interval, conditional on survival up to the start of that interval. This probability is typically modeled using a logistic regression framework, hence the "logit" in the name. The log-odds of the event occurring in each interval are predicted by the covariates.  
 - __Cox PH Models__ directly models the hazard rate. It estimates the hazard ratio, which is the ratio of the hazard rates between two groups or for a one-unit change in a covariate. The model is semi-parametric because it estimates the effect of covariates on the hazard without making assumptions about the shape of the baseline hazard function (the hazard when all covariates are zero).  
 
### 3. The Baseline Hazard:
 - In __Discrete-Time Logit Models__, the baseline hazard (the probability of the event in each time interval when all covariates are zero) is often treated as time-dependent. In the example code below, this is implicitly handled by including `time` and `I(time^2)` as predictors. More explicitly, one could include dummy variables for each time period to allow the baseline hazard to vary freely across time.  
 - The __Cox PH models__  has a non-parametric baseline hazard that is not explicitly estimated. The focus is on how the covariates proportionally scale this baseline hazard. The "proportional hazards" assumption implies that the ratio of the hazard rates for any two individuals remains constant over time.  
 
### 4. The Proportional Hazards Assumption:
 - While the term __"discrete-time proportional hazards model"__ exists (often using a complementary log-log link), the logit model itself does not inherently assume proportional hazards. The effects of covariates can vary across different time intervals, especially when time-by-covariate interactions are included in the model.    
 - A key assumption of __the Cox model__ is that the hazards are proportional over time. This means that the effect of a covariate on the hazard rate is constant across all time points. Violations of this assumption can lead to biased results. Statistical tests and graphical methods are used to assess this assumption.  
 
### 5. Handling of Ties:  
 - __Discrete-Time Logit Models:__ Tied events (multiple events occurring at the same recorded time) are naturally handled within the discrete time intervals. If multiple promotions happen in the same year, they all contribute to the event count for that year.  
 - __Cox PH Models:__ Tied event times require specific methods to handle them in the partial likelihood estimation (e.g., Breslow, Efron methods).

### 6. Data Structure:
 - __Discrete-Time Logit Models__ requires reshaping the data into a long (person-period) format.  
 - Cox PH Models__ typically use a data structure where each row represents an individual with their survival time and event status.  
 

| Feature              | Discrete-Time Logit Models          | Cox Proportional Hazards Models        |
|----------------------|-------------------------------------|----------------------------------------|
| Time                 | Discrete intervals                  | Continuous                             |
| Outcome              | Probability of event in an interval | Hazard rate                            |
| Modeling Approach    | Logistic Regression                 | Semi-parametric hazard regression      |
| Baseline Hazard      | Can be time-dependent               | Non-parametric, not directly estimated |
| Proportional Hazards | Not inherently assumed              | Key assumption                         |
| Handling Ties        | Naturally handled within intervals  | Requires specific tie-handling methods |
| Data Structure       | Long (person-period)                | Typically one row per subject          |  

## When to Choose Which:
 - Choose Discrete-Time Logit Models when:  
   - Time is naturally measured in discrete intervals.
   - The proportional hazards assumption of the Cox model is violated.  
   - You want to directly model the probability of the event in each time interval.  
   - You want more flexibility in how the effects of covariates change over time.  
 - Choose Cox Proportional Hazards Models when:  
   - Time is continuous or can be reasonably approximated as continuous.  
   - The proportional hazards assumption holds.  
   - You are primarily interested in estimating hazard ratios.  
   - You have a large dataset with many unique event times.  

## Load Libraries
 - `library(tidyverse)`: This line loads the [tidyverse](https://www.tidyverse.org/packages/) package. Think of this as bringing in your essential toolkit for data manipulation and visualization in R. It includes powerful functions like `mutate`, `pivot_longer`, `left_join`, and `filter`, which we'll see in action.    
 - library(haven): The [haven](https://haven.tidyverse.org/) package is specifically designed to read data files from various statistical software packages, including SAS, SPSS, and Stata. In this case, we'll use it to read a SAS dataset.  
```{r}
library(tidyverse)
library(haven)
```

## Read in the data
The `read_sas()` function from the haven package reads the SAS dataset located at the provided URL and stores it in a data frame called rank. This data contains information about associate professors, their characteristics, and their career progression.

```{r}
rank <- read_sas("https://statisticalhorizons.com/wp-content/uploads/2022/01/rank.sas7bdat")

#assign id numbers to the 301 case:
rank <- rank %>% mutate(id =1:301)
```
Here, we're using the pipe operator `%>%` from `dplyr` (part of `tidyverse`). It allows us to chain operations together in a readable way.  
 - `mutate()` is a function that adds new variables or modifies existing ones in a data frame.  
 - `id = 1:301` creates a new variable named id and assigns a unique integer from 1 to 301 to each row. This is important for tracking individual cases, especially when we reshape the data later. We're essentially giving each person in the dataset a unique identifier.  

## Reshape the data from wide to long for articles
Why reshape to long format? In discrete-time survival analysis, we are interested in the probability of an event (promotion in this case) occurring at each discrete time point. The long format allows us to have each person-year as a separate row, which is the structure needed for fitting our logit model.  

```{r}
rankyrs <- rank %>%
  pivot_longer(
    cols = starts_with("art"),
    names_to  = "atime",
    values_to = "art"
  ) %>%
  mutate(time = as.numeric(str_remove(atime, "art")))

#reshape the data from wide to long for citations:
rankyrs2 <- rank %>%
  pivot_longer(
    cols = starts_with("cit"), 
    names_to = "ctime",
    values_to = "cit"
  ) %>%
  mutate(time = as.numeric(str_remove(ctime, "cit")))
```

 - `pivot_longer()` is a powerful function for reshaping data from a "wide" format (where each time point has its own column) to a "long" format (where each row represents a specific time point for an individual). This is crucial for discrete-time survival analysis.  
 
   - `cols = starts_with("art")` specifies that we want to gather the columns whose names start with "art". These represent the number of articles published at different time points (e.g., art1, art2, art3...).  
   - `names_to = "atime"` creates a new column named "atime" that will contain the original column names ("art1", "art2", etc.).  
   - `values_to = "art"` creates a new column named "art" that will contain the values from the original "art" columns (the actual number of articles).  
 - The same logic applies to the block for citations (`starts_with("cit")`), creating `ctime` and `cit` columns.  
 - mutate(`time = as.numeric(str_remove(atime, "art"))`): After pivoting, the `atime` column (e.g., "art1", "art2") is still in character format. We want to extract the time information (the number).  
   - `str_remove(atime, "art")` from the `stringr` package (part of `tidyverse`) removes the "art" prefix from the `atime` column, leaving just the number.  
   - `as.numeric()` converts this remaining string into a numeric value, which we store in a new column called `time`. We do the same for the citations.  

## Merge articles and citations
```{r}
rankyrs <- rankyrs %>%
  left_join(rankyrs2 %>% select(id, time, cit), by = c("id", "time"))
```

 - `left_join():` This function combines two data frames based on common columns. We are merging `rankyrs` (which now has the article information in long format) with a subset of `rankyrs2` (which has the citation information in long format).  
   - `rankyrs2 %>% select(id, time, cit)`: We first select only the `id`, `time`, and `cit` columns from `rankyrs2` since these are the variables we want to join.  
   - `by = c("id", "time")`: This specifies that the join should be based on matching values in both the `id` and `time` columns. This ensures that the number of articles and citations for the same individual at the same time point are combined into a single row in `rankyrs`.  

## Sort by id and time
```{r}
rankyrs<-rankyrs %>%
  arrange(id, time)
```

 - `arrange(id, time)` sorts the `rankyrs` data frame first by the `id` column (so all observations for the same individual are together) and then by the `time` column (so the observations for each individual are in chronological order). This is helpful for understanding the sequence of events.  

## Get rid of person years after a promotion  
This filtering creates the "risk set" at each time point. For each individual at each time point before or at the time of their event (if it occurred), they are considered to be at risk of the event happening at that time.  
```{r}
ranksub<- rankyrs %>%
  filter(time<=dur)
```

 - `filter(time <= dur)` filters the `rankyrs` data frame to keep only the rows where the time is less than or equal to the `dur` variable. The `dur` variable represents the time until the event (promotion) occurred for those who were promoted, or the total observation time for those who were not. By filtering this way, we are only considering the person-years before or at the time of promotion.  
 
## Recode event variable to be 0 before event occurrence  
In discrete-time survival analysis, we model the probability of the event occurring in a specific time interval, given that it has not occurred before. Our promo variable captures this: for each person-year, it's 1 if the promotion happened at that exact time, and 0 otherwise.  

```{r}
ranksub <- ranksub %>%
  mutate(promo = if_else(time == dur, event, 0))
```

 - `mutate(promo = if_else(time == dur, event, 0))` creates a new binary outcome variable called `promo`.  
   - `if_else(condition, value_if_true, value_if_false)` is a conditional function.  
   - `time == dur` is the condition checked to determine whether the current time point (`time`) is equal to the time of the event (`dur`).  
   - `event`: If the condition is true (i.e., we are at the time of promotion), the value of `promo` is set to the value of the `event` variable (which is 1 for promotion and 0 otherwise).  
   - If the condition is false (i.e., we are at a time point before the promotion), the value of promo is set to `0`, indicating that the event did not occur at that specific person-year.

## Change the missings in jobtime to 11  
The choice of 11 might be based on the context of the data (e.g., the maximum observed job time plus one. 
```{r}
ranksub <- ranksub %>%
  mutate(jobtime = if_else(is.na(jobtime), 11,  jobtime))
```

 - `mutate(jobtime = if_else(is.na(jobtime), 11, jobtime))`:
   - `is.na(jobtime)` checks for missing values (represented as `NA`) in the `jobtime` column.  
   - If a value is missing (TRUE), it's replaced with 11.  
   - If a value is not missing (FALSE), it remains its original value.

## Define jobpres, a time-varying coviariate
This demonstrates how to handle time-varying covariates in discrete-time survival analysis. The value of `jobpres` can change for an individual as time progresses, and our data structure in long format allows us to capture these changes.

```{r}
ranksub <- ranksub %>%
  mutate(jobpres = if_else(time<jobtime,prest1, prest2))
```
 - `mutate(jobpres = if_else(time < jobtime, prest1, prest2))` creates a new variable `jobpres`.  
   - `time < jobtime` is the condition checked where the current time point (time) is less than the jobtime.  
   - If the condition is true (i.e., before `jobtime`), the value of `jobpres` is taken from the `prest1` variable (likely prestige at an earlier job).  
   - If the condition is false (i.e., at or after `jobtime`), the value of `jobpres` is taken from the `prest2` variable (likely prestige at a later or current job).  

## Fit discrete time logit  
This `glm()` call fits a discrete-time logit model where the outcome is the log-odds of promotion at a specific time point. The coefficients from this model will tell us how each predictor variable is associated with this log-odds, holding other variables constant. We will need to exponentiate these coefficients to interpret them as odds ratios.

```{r}
myfit<-glm(promo~undgrad + phdmed + phdprest + art + cit + jobpres + time + I(time^2), family=binomial,data=ranksub)
```


```{r}
library(gtsummary)

# Create a summary table of the model results
model_table <- tbl_regression(
  myfit,
  exponentiate = TRUE, # Display odds ratios instead of log-odds
  pvalue_fun = ~style_pvalue(.x, digits = 3), # Format p-values
  label = list(
    "undgrad" ~ "Undergraduate Institution Prestige",
    "phdmed" ~ "Medical PhD",
    "phdprest" ~ "PhD Program Prestige",
    "art" ~ "Number of Articles",
    "cit" ~ "Number of Citations",
    "jobpres" ~ "Job Prestige",
    "time" ~ "Time (Years)",
    "I(time^2)" ~ "Time Squared" # Use the exact string "I(time^2)"
  ) # Custom labels for variables
)

# Print the table to the console
model_table
```

```{r}
# You can further customize the table, for example, adding a title and caption
model_table <- model_table %>%
  as_gt() %>% # Convert to a gt table for more advanced formatting
  gt::tab_header(
    title = "Discrete-Time Logit Model Results",
    subtitle = "Predicting Probability of Promotion"
  ) %>%
  gt::tab_caption("Odds ratios and p-values are reported.")

# Print the customized table
model_table