MOOC Econometrics

Test Exercise 5

Notes:

• See website for how to submit your answers and how feedback is organized

Goals and skills being used:

• Get experience with the mathematical structure of the logit model • Get experience with the interpretation of parameters of the logit model

Questions

Consider again the application in lecture 5.5, where we have analyzed response to a direct mailing using the following logit specification Pr[respi = 1] = exp(β0 + β1malei + β2activei + β3agei + β4(agei/10)2)1 + exp(β0 + β1malei + β2activei + β3agei + β4(agei/10)2) for i = 1, . . . , 925.

The maximum likelihood estimates of the parameters are given by

Variable Coefficient Std. Error t-value p-value Intercept -2.488 0.890 -2.796 0.005 Male 0.954 0.158 6.029 0.000 Active 0.914 0.185 4.945 0.000 Age 0.070 0.036 1.964 0.050 (Age/10)2 -0.069 0.034 -2.015 0.044

(a) Show that ∂ Pr[respi = 1]/∂agei + ∂ Pr[respi = 0]/∂agei= 0.

So loading the data and fiiting the model next:

library(readxl)
library(ggplot2)
library(dplyr)

df <- read_excel("Data Lecture 5-5.xls", col_names = TRUE)
df <- df%>% mutate(age2 = (age/10)^2)
head(df)
## # A tibble: 6 x 6
##   `repond id` response  male activity   age  age2
##         <dbl>    <dbl> <dbl>    <dbl> <dbl> <dbl>
## 1           1        1     0        0    58 33.6 
## 2           2        1     1        0    50 25   
## 3           3        1     1        0    40 16   
## 4           4        1     1        0    36 13.0 
## 5           5        1     1        0    28  7.84
## 6           6        1     1        0    70 49

So fitting the model:

logit <- glm(df$response ~ df$male + df$activity + df$age +df$age2,
             family = binomial(link = "logit"))
summary(logit)
## 
## Call:
## glm(formula = df$response ~ df$male + df$activity + df$age + 
##     df$age2, family = binomial(link = "logit"))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6926  -1.2156   0.7389   1.0982   1.8473  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.48836    0.88999  -2.796  0.00517 ** 
## df$male      0.95369    0.15818   6.029 1.65e-09 ***
## df$activity  0.91375    0.18478   4.945 7.61e-07 ***
## df$age       0.06995    0.03561   1.964  0.04948 *  
## df$age2     -0.06869    0.03410  -2.015  0.04394 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1282.1  on 924  degrees of freedom
## Residual deviance: 1203.7  on 920  degrees of freedom
## AIC: 1213.7
## 
## Number of Fisher Scoring iterations: 4
library(pscl)
## Warning: package 'pscl' was built under R version 4.0.2
LogLikelihood   <- pR2(logit)['llh']
## fitting null model for pseudo-r2
Likelihood      <- exp(pR2(logit)['llh'])
## fitting null model for pseudo-r2
McFaddenR2      <- pR2(logit)['McFadden']
## fitting null model for pseudo-r2
NagelkerkeR2    <- pR2(logit)['r2CU']
## fitting null model for pseudo-r2
library(fmsb)
NagelkerkeR2 <- NagelkerkeR2(logit)$R2
NagelkerkeR2
## [1] 0.1083013

So showing that :

# logit odd ratios
exp(logit$coefficients)
## (Intercept)     df$male df$activity      df$age     df$age2 
##  0.08304624  2.59527990  2.49365102  1.07244947  0.93361419
#regression marginal effects:

ols <- lm(df$response ~ df$male + df$activity + df$age +df$age2)
ols$coefficients
## (Intercept)     df$male df$activity      df$age     df$age2 
## -0.06088779  0.22400196  0.20826836  0.01549425 -0.01520940
#logit average marginal effects:

log_scal <- mean(dlogis(predict(logit, type = "link")))
log_scal*coef(logit)
## (Intercept)     df$male df$activity      df$age     df$age2 
## -0.57116043  0.21890442  0.20973538  0.01605475 -0.01576709
# Predicting probabilities logit 

polsreg <- predict(logit, type = "response")
summary(polsreg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1404  0.4265  0.5403  0.5081  0.5595  0.7613
#matrix accuracy

logit_matrix <- table(True = df$response, predi = round(fitted(logit)))
round(prop.table(logit_matrix),2)
##     predi
## True    0    1
##    0 0.21 0.28
##    1 0.10 0.40
# R2 McFadden

logi0 <- update(logit, formula. = df$response ~ 1)
R2mc <-1- as.vector(logLik(logit)/logLik(logi0))
R2mc
## [1] 0.06111504

With that information we can say:

# RPTA. 

#  ∂Pr[respi=1]/∂agei+∂Pr[respi=0]/∂agei=0

#∂Pr[respi=1]∂agei+∂Pr[respi=0]∂agei  = ∂Pr[respi=1]∂agei + ∂(1−Pr[respi=1])∂agei

#                                     = ∂Pr[respi=1]∂agei + −∂(Pr[respi=1])∂agei

#                                     = ∂Pr[respi=1]∂agei − ∂(Pr[respi=1])∂agei

#                                     = 0

(b) Assume that you recode the dependent variable as follows: respnew i =−respi+1. Hence, positive response is now defined to be equal to zero and negative response to be equal to 1. Use the odds ratio to show that this transformation implies that the sign of all parameters change.

The odd ratios was:

# logit odd ratios
exp(logit$coefficients)
## (Intercept)     df$male df$activity      df$age     df$age2 
##  0.08304624  2.59527990  2.49365102  1.07244947  0.93361419

So the condiction:

#The condition respinew=−respi+1 means that the odds ratio of each new response is inversed, or:


# Pr[respinew=1]/Pr[respinew=0] =  1 /( Pr[respi=1]/Pr[respi=0] ) = Pr[respi=0]/Pr[respi=1]

# ⇒ Pr[respinew=0]/Pr[respinew=1] =  Pr[respi=1]Pr[respi=0]

And the odds ratio formula is given as:

#Pr[respi=1]/Pr[respi=0] = exp(β0+∑j=2kβj∗xji) 
#                         = exp(β0+β1∗malei+β2∗activei+β3∗agei+β4∗(agei/10))2


#⇒Pr[respi=0]Pr[respi=1] = 1/  exp(β0+∑j=2kβj∗xji)
#                         =1 / exp(β0+β1∗malei+β2∗activei+β3∗agei+β4∗(agei/10)2)

So finally:

# Pr[respinew=1]/Pr[respinew=0] = Pr[respi=0]/Pr[respi=1] 

#                               = 1 / exp(β0+∑j=2kβj∗xji)

#                               =exp(−β0−∑j=2kβj∗xji))

#                               =exp(−β0−β1∗malei−β2∗activei−β3∗agei−β4∗(agei/10)2)

Where the transformation indeed implies that the sign of all parameters change.

(c) Consider again the odds ratio positive response versus negative response:Pr[respi = 1] Pr[respi = 0] = exp(β0 + β1malei + β2activei + β3agei + β4(agei/10)2).

During lecture 5.5 you have seen that this odds ratio obtains its maximum value for age equal to 50 years for males as well as females. Suppose now that you want to extend the logit model and allow that this age valueis possibly different for males than for females. Discuss how you can extend the logit specification.

Recording that the marginal effects are:

#regression marginal effects:

ols <- lm(df$response ~ df$male + df$activity + df$age +df$age2)
ols$coefficients
## (Intercept)     df$male df$activity      df$age     df$age2 
## -0.06088779  0.22400196  0.20826836  0.01549425 -0.01520940
#logit average marginal effects:

log_scal <- mean(dlogis(predict(logit, type = "link")))
log_scal*coef(logit)
## (Intercept)     df$male df$activity      df$age     df$age2 
## -0.57116043  0.21890442  0.20973538  0.01605475 -0.01576709
# RPTA


#One way is to split the sample in two (male−only and female−only) groups, and estimate the logit model seperately for each group.

#In this case, the remaining activity variable parameter β2 could also differ for each one of the groups and should be estimated seperately.
# Another way is to modify the logit model in order to use combinations of male and age variables (and related parameters) and leave the sample united as is.

# For example, we could add the variables malei∗agei and malei∗(agei/10)2 to the logit specification, thus extending it by 2 more variables with parameters β5 and β6.

# Then, the model odds ratio would be:

# Pr[respi=1]/Pr[respi=0] =            exp(β0+β1∗malei+β2∗activei+β3∗agei+β4∗(agei/10)2+β5∗malei∗agei+β6∗malei∗(agei/10)2)
#Or, we could replace the age related variables and parameters
#   β3∗agei+β4∗(agei/10)2

#with their male/female related combinations:

#   β3∗malei∗agei+β4∗malei∗(agei/10)2+β5∗(1−malei)∗agei+β6∗(1−malei)∗(agei/10)2

Then, the model odds ratio would be:

# Pr[respi=1]/ Pr[respi=0]
#                         =exp(β0+β1∗malei+β2∗activei+β3∗malei∗agei+β4∗malei∗(agei/10)2
#                           +β5∗(1−malei)∗agei+β6∗(1−malei)∗(agei/10)2).

These second way variations would allow for similar flexibility to study on the age effect throught the different variables and parameters they incorporate.

# Predicting the errors:
#matrix accuracy

logit_matrix <- table(True = df$response, predi = round(fitted(logit)))
round(prop.table(logit_matrix),2)
##     predi
## True    0    1
##    0 0.21 0.28
##    1 0.10 0.40
plot(residuals.glm(logit))