ps7

Problem 1

library(tidyverse)

## ── Attaching packages ────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.3
## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## Warning: package 'ggplot2' was built under R version 3.6.2

## Warning: package 'tibble' was built under R version 3.6.2

## Warning: package 'tidyr' was built under R version 3.6.2

## Warning: package 'dplyr' was built under R version 3.6.2

## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(texreg)

## Version:  1.36.23
## Date:     2017-03-03
## Author:   Philip Leifeld (University of Glasgow)
## 
## Please cite the JSS article in your publications -- see citation("texreg").

## 
## Attaching package: 'texreg'

## The following object is masked from 'package:tidyr':
## 
##     extract

library(AER)

## Loading required package: car

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following object is masked from 'package:purrr':
## 
##     some

## Loading required package: lmtest

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: sandwich

## Loading required package: survival

## Warning: package 'survival' was built under R version 3.6.2

library(mfx)

## Loading required package: MASS

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

## Loading required package: betareg

load("charity.rdata")

The variable \(respond\) is a dummy variable equal to one if a person responded with a contribution on the most recent mailing sent by a charitable organization.

The variable \(resplast\) is a dummy variable equal to one if the person responded to the previous mailing,

\(avggift\) is the average of past gifts (in Dutch guilders),

and \(propresp\) is the proportion of times the person has responded to past mailings.

# The fraction of people who have responded at least once in the past

# This identical to PEA (for discrete characteristics)

charity %>% summarise(mean(respond))

##   mean(respond)
## 1     0.3999531

There are 4,268 individuals in the database with 1,707 individuals responding with a donation to the most recent request. That means 39.995% of the individuals responded most recently with a donation.

# Estimate the linear probabilty model

formula = respond ~ resplast + weekslast + propresp + mailsyear + avggift

respond_lm = lm(formula, data = charity)

coeftest(respond_lm, vcov. = vcovHC(respond_lm, type = "HC0"))

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept)  1.6518e-02  3.4693e-02  0.4761  0.634008    
## resplast     6.6461e-02  2.0795e-02  3.1960  0.001403 ** 
## weekslast   -1.0783e-03  1.8963e-04 -5.6863 1.385e-08 ***
## propresp     6.5034e-01  3.8537e-02 16.8755 < 2.2e-16 ***
## mailsyear    5.1975e-02  1.0559e-02  4.9222 8.880e-07 ***
## avggift      1.8272e-04  2.0546e-05  8.8934 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Estimate the probit model

respond_probit = glm(formula, family = binomial(link = "probit"), data = charity)

coeftest(respond_probit, vcov. = vcovHC(respond_probit, type = "HC0"))

## 
## z test of coefficients:
## 
##                Estimate  Std. Error  z value  Pr(>|z|)    
## (Intercept) -1.29490453  0.11232683 -11.5280 < 2.2e-16 ***
## resplast     0.12610481  0.05800261   2.1741    0.0297 *  
## weekslast   -0.00456630  0.00068823  -6.6348 3.249e-11 ***
## propresp     1.84930622  0.11442517  16.1617 < 2.2e-16 ***
## mailsyear    0.14615331  0.03272541   4.4660 7.968e-06 ***
## avggift      0.00111549  0.00096581   1.1550    0.2481    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Estimate the logit model

respond_logit = glm(formula, family = binomial(link = "logit"), data = charity)

coeftest(respond_logit, vcov. = vcovHC(respond_logit, type = "HC0"))

## 
## z test of coefficients:
## 
##               Estimate Std. Error  z value  Pr(>|z|)    
## (Intercept) -2.1138918  0.1902751 -11.1097 < 2.2e-16 ***
## resplast     0.1912431  0.0951495   2.0099   0.04444 *  
## weekslast   -0.0078934  0.0011891  -6.6382 3.174e-11 ***
## propresp     3.0356951  0.1912578  15.8723 < 2.2e-16 ***
## mailsyear    0.2434161  0.0549584   4.4291 9.463e-06 ***
## avggift      0.0019482  0.0015670   1.2433   0.21377    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Are the results similar across the models?

list(lpm = respond_lm, probit =  respond_probit, logit = respond_logit) %>% screenreg(digits = 3)

## 
## ==========================================================
##                 lpm           probit         logit        
## ----------------------------------------------------------
## (Intercept)        0.017         -1.295 ***     -2.114 ***
##                   (0.036)        (0.114)        (0.193)   
## resplast           0.066 ***      0.126 *        0.191 *  
##                   (0.019)        (0.057)        (0.094)   
## weekslast         -0.001 ***     -0.005 ***     -0.008 ***
##                   (0.000)        (0.001)        (0.001)   
## propresp           0.650 ***      1.849 ***      3.036 ***
##                   (0.037)        (0.114)        (0.191)   
## mailsyear          0.052 ***      0.146 ***      0.243 ***
##                   (0.010)        (0.032)        (0.054)   
## avggift            0.000 *        0.001          0.002    
##                   (0.000)        (0.001)        (0.002)   
## ----------------------------------------------------------
## R^2                0.214                                  
## Adj. R^2           0.214                                  
## Num. obs.       4268           4268           4268        
## RMSE               0.435                                  
## AIC                            4764.437       4769.028    
## BIC                            4802.590       4807.181    
## Log Likelihood                -2376.218      -2378.514    
## Deviance                       4752.437       4757.028    
## ==========================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

resplast, weekslast, propresp, and mailsyear are all significant at 5% level in each model.

avggift is significant at 5% in the lpm.

The estimates are different. But as expected, we find the smallest difference between the probit and the logit model

What are the interpretations of the coefficient \(b_j\) in each of the models?

In general, for the linear probability model, \(b_j\) is the effect on the probability that the dependent variable is equal to one:

\[\mathbb{P}(y=1) \Rightarrow \Delta x_j =1\]

With the Non-linear probability models (nlpm), we lose the clean interpretation of the partial effects from the coefficients of the explanatory variables:

\[\approx f(x'_i b) b_j) \Delta x_{ij}\]

The only meaningful we can get out of the estimates for the nlpm is that the sign of the estimates are equal to sign of the partial effect:

\[\partial \mathbb{P}(y=1) / \partial x_{ji} = f(x'_i \beta) \beta_j\]

The complication is that the average partial effects is not just a function of its estimates (includes the hole set of x’s).

To obtain a meaningful interpretation of the \(x'_i\), is to compute either

The Average Partial Effect (APE), or
The Partial Effect at the Average (PEA)

3. APE (marginal effect, mean effect):

An estimation of the average partial effect across the population.

Such that the partial effect is calculated for each observation, so that we can average over the population.

PEA:

More straightforward computation. Only a single partial effect to compute, the average of the regressors.

The interpretation is less clear, since it is not always well-defined:

e.g.,

for discrete characteristics: The PEA uses the proportion of units with value unity. Not meaningful value.
for non-linear terms (interaction, higher order terms)

Only valid for continuous regressors.

# APE: atmean = F
formula = respond ~ mailsyear + resplast

ape = probitmfx(formula, data = charity, atmean = F)

# PEA: atmean = T
pea = probitmfx(formula, data = charity, atmean = T)

pe = lm(formula, data = charity)

list(APE = ape, PEA = pea, PE = pe) %>% screenreg(digits = 6)

## 
## ===================================================================
##                 APE               PEA               PE             
## -------------------------------------------------------------------
## mailsyear           0.080609 ***      0.089012 ***     0.081177 ***
##                    (0.010391)        (0.011696)       (0.010535)   
## resplast            0.340682 ***      0.344383 ***     0.340874 ***
##                    (0.015269)        (0.015412)       (0.014900)   
## (Intercept)                                            0.119445 ***
##                                                       (0.023144)   
## -------------------------------------------------------------------
## Num. obs.        4268              4268             4268           
## Log Likelihood  -2609.925003      -2609.925003                     
## Deviance         5219.850006       5219.850006                     
## AIC              5225.850006       5225.850006                     
## BIC              5244.926707       5244.926707                     
## R^2                                                    0.121696    
## Adj. R^2                                               0.121284    
## RMSE                                                   0.459274    
## ===================================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

ps7

Jonathan Riis

5/27/2021

Problem 1