library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.3
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tibble' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.2
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(texreg)
## Version: 1.36.23
## Date: 2017-03-03
## Author: Philip Leifeld (University of Glasgow)
##
## Please cite the JSS article in your publications -- see citation("texreg").
##
## Attaching package: 'texreg'
## The following object is masked from 'package:tidyr':
##
## extract
library(AER)
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
## Warning: package 'survival' was built under R version 3.6.2
library(mfx)
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## Loading required package: betareg
load("charity.rdata")
The variable \(respond\) is a dummy variable equal to one if a person responded with a contribution on the most recent mailing sent by a charitable organization.
The variable \(resplast\) is a dummy variable equal to one if the person responded to the previous mailing,
\(avggift\) is the average of past gifts (in Dutch guilders),
and \(propresp\) is the proportion of times the person has responded to past mailings.
1.
# The fraction of people who have responded at least once in the past
# This identical to PEA (for discrete characteristics)
charity %>% summarise(mean(respond))
## mean(respond)
## 1 0.3999531
There are 4,268 individuals in the database with 1,707 individuals responding with a donation to the most recent request. That means 39.995% of the individuals responded most recently with a donation.
2.
# Estimate the linear probabilty model
formula = respond ~ resplast + weekslast + propresp + mailsyear + avggift
respond_lm = lm(formula, data = charity)
coeftest(respond_lm, vcov. = vcovHC(respond_lm, type = "HC0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6518e-02 3.4693e-02 0.4761 0.634008
## resplast 6.6461e-02 2.0795e-02 3.1960 0.001403 **
## weekslast -1.0783e-03 1.8963e-04 -5.6863 1.385e-08 ***
## propresp 6.5034e-01 3.8537e-02 16.8755 < 2.2e-16 ***
## mailsyear 5.1975e-02 1.0559e-02 4.9222 8.880e-07 ***
## avggift 1.8272e-04 2.0546e-05 8.8934 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Estimate the probit model
respond_probit = glm(formula, family = binomial(link = "probit"), data = charity)
coeftest(respond_probit, vcov. = vcovHC(respond_probit, type = "HC0"))
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.29490453 0.11232683 -11.5280 < 2.2e-16 ***
## resplast 0.12610481 0.05800261 2.1741 0.0297 *
## weekslast -0.00456630 0.00068823 -6.6348 3.249e-11 ***
## propresp 1.84930622 0.11442517 16.1617 < 2.2e-16 ***
## mailsyear 0.14615331 0.03272541 4.4660 7.968e-06 ***
## avggift 0.00111549 0.00096581 1.1550 0.2481
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Estimate the logit model
respond_logit = glm(formula, family = binomial(link = "logit"), data = charity)
coeftest(respond_logit, vcov. = vcovHC(respond_logit, type = "HC0"))
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.1138918 0.1902751 -11.1097 < 2.2e-16 ***
## resplast 0.1912431 0.0951495 2.0099 0.04444 *
## weekslast -0.0078934 0.0011891 -6.6382 3.174e-11 ***
## propresp 3.0356951 0.1912578 15.8723 < 2.2e-16 ***
## mailsyear 0.2434161 0.0549584 4.4291 9.463e-06 ***
## avggift 0.0019482 0.0015670 1.2433 0.21377
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Are the results similar across the models?
list(lpm = respond_lm, probit = respond_probit, logit = respond_logit) %>% screenreg(digits = 3)
##
## ==========================================================
## lpm probit logit
## ----------------------------------------------------------
## (Intercept) 0.017 -1.295 *** -2.114 ***
## (0.036) (0.114) (0.193)
## resplast 0.066 *** 0.126 * 0.191 *
## (0.019) (0.057) (0.094)
## weekslast -0.001 *** -0.005 *** -0.008 ***
## (0.000) (0.001) (0.001)
## propresp 0.650 *** 1.849 *** 3.036 ***
## (0.037) (0.114) (0.191)
## mailsyear 0.052 *** 0.146 *** 0.243 ***
## (0.010) (0.032) (0.054)
## avggift 0.000 * 0.001 0.002
## (0.000) (0.001) (0.002)
## ----------------------------------------------------------
## R^2 0.214
## Adj. R^2 0.214
## Num. obs. 4268 4268 4268
## RMSE 0.435
## AIC 4764.437 4769.028
## BIC 4802.590 4807.181
## Log Likelihood -2376.218 -2378.514
## Deviance 4752.437 4757.028
## ==========================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05
resplast, weekslast, propresp, and mailsyear are all significant at 5% level in each model.
avggift is significant at 5% in the lpm.
The estimates are different. But as expected, we find the smallest difference between the probit and the logit model
What are the interpretations of the coefficient \(b_j\) in each of the models?
In general, for the linear probability model, \(b_j\) is the effect on the probability that the dependent variable is equal to one:
\[\mathbb{P}(y=1) \Rightarrow \Delta x_j =1\]
With the Non-linear probability models (nlpm), we lose the clean interpretation of the partial effects from the coefficients of the explanatory variables:
\[\approx f(x'_i b) b_j) \Delta x_{ij}\]
The only meaningful we can get out of the estimates for the nlpm is that the sign of the estimates are equal to sign of the partial effect:
\[\partial \mathbb{P}(y=1) / \partial x_{ji} = f(x'_i \beta) \beta_j\]
The complication is that the average partial effects is not just a function of its estimates (includes the hole set of x’s).
To obtain a meaningful interpretation of the \(x'_i\), is to compute either
The Average Partial Effect (APE), or
The Partial Effect at the Average (PEA)
3. APE (marginal effect, mean effect):
An estimation of the average partial effect across the population.
Such that the partial effect is calculated for each observation, so that we can average over the population.
PEA:
More straightforward computation. Only a single partial effect to compute, the average of the regressors.
The interpretation is less clear, since it is not always well-defined:
e.g.,
Only valid for continuous regressors.
# APE: atmean = F
formula = respond ~ mailsyear + resplast
ape = probitmfx(formula, data = charity, atmean = F)
# PEA: atmean = T
pea = probitmfx(formula, data = charity, atmean = T)
pe = lm(formula, data = charity)
list(APE = ape, PEA = pea, PE = pe) %>% screenreg(digits = 6)
##
## ===================================================================
## APE PEA PE
## -------------------------------------------------------------------
## mailsyear 0.080609 *** 0.089012 *** 0.081177 ***
## (0.010391) (0.011696) (0.010535)
## resplast 0.340682 *** 0.344383 *** 0.340874 ***
## (0.015269) (0.015412) (0.014900)
## (Intercept) 0.119445 ***
## (0.023144)
## -------------------------------------------------------------------
## Num. obs. 4268 4268 4268
## Log Likelihood -2609.925003 -2609.925003
## Deviance 5219.850006 5219.850006
## AIC 5225.850006 5225.850006
## BIC 5244.926707 5244.926707
## R^2 0.121696
## Adj. R^2 0.121284
## RMSE 0.459274
## ===================================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05