2024-11-08
This lecture follows Long (1997, Chapters 7-8)
Many variables in the social sciences consist of integer counts
E.g., number of times a candidate runs for office, frequency of conflict, number of terror attacks, number of war casualties, number of positive statements about a candidate, number of homicides in a city, etc.
Least squares is inappropriate for the reasons we’ve already discussed: \(\textbf{Nonlinearity, nonadditivity and heteroskedasticity}\).
\[p(y|\mu)={{exp(-\mu)\mu^y}\over{y!}}\]
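A quick check of this density in R (the values \(\mu=2\) and \(y=3\) here are purely illustrative):

```r
mu <- 2
y  <- 3
exp(-mu) * mu^y / factorial(y)  # 0.1804470, by the formula above
dpois(y, lambda = mu)           # same value via the built-in density
```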
\[\mu_i=E(y_i|x_i)=exp(\alpha+\beta x_i)\]
The linear predictor \(\alpha+\beta x_i\) can take any real value, but \(exp(\alpha+\beta x_i)\) is always positive; the rate parameter must be positive.
Non-linear regression model with heteroskedasticity (Long 1997, p. 223), \(\mu_i=exp(\alpha+\beta x_i)\)
Let’s just assume \(\alpha=-0.25\), and \(\beta=0.13\) as Long does (p. 221). If we calculate the expected values of \(y\) for a number of \(x\) values, then
When \(x=1\), then the expected number of counts – the rate parameter – is 0.89 (exp(-0.25 + 0.13* 1))
When \(x=10\) then the expected number of counts is 2.85
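Checking these rates in R:

```r
alpha <- -0.25
beta  <- 0.13
exp(alpha + beta * 1)   # 0.887 when x = 1
exp(alpha + beta * 10)  # 2.858 when x = 10
```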
\(E(y|x)=exp(\alpha+\beta x_i)=var(y|x)\)
\[\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i})\]
\[p(y|x)={{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]
The likelihood of the PRM with \(K\) predictors is,
\[\prod_{i=1}^{N}p(y_i|\mu_i)=\prod_{i=1}^{N}{{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]
The log of the likelihood is then,
\[\tiny log(\prod_{i=1}^{N}p(y_i|\mu_i))=\sum_{i=1}^{N}log{{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]
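A minimal sketch of maximizing this log likelihood directly with optim(), using data simulated from Long's parameter values above (the sample size and predictor distribution are assumptions for illustration):

```r
set.seed(1)
x <- rnorm(500)
y <- rpois(500, exp(-0.25 + 0.13 * x))   # simulate from the PRM
negll <- function(par) {
  mu <- exp(par[1] + par[2] * x)         # rate for each observation
  -sum(dpois(y, mu, log = TRUE))         # negative Poisson log likelihood
}
optim(c(0, 0), negll)$par                # estimates near (-0.25, 0.13)
```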
\(\tiny E(y|x)=exp(\alpha+\sum_K \beta_k x_{k,i})\).
The partial derivative of \(E(y|x)\) with respect to \(x_k\)
Using the chain rule, let
\(\tiny u=\alpha+\sum_K \beta_k x_{k,i}\).
\(\tiny {{\partial y}\over{\partial u}}{{\partial u}\over{\partial x}}\)
\[\tiny {{\partial E(Y|X)}\over{\partial x_k}}={{\partial exp(u)}\over{\partial u}}{{\partial u}\over{\partial x_k}}\]
\[\tiny exp(\alpha+\sum_K \beta_k x_{k,i})\beta_k=E(Y|X)\beta_k\] - The effect of \(x_k\) on \(y\) is now a function of the rate parameter and the expected effect of \(x_k\) on that rate parameter
\[\tiny {{E(y|X, x_k+d_k)}\over{E(y|X, x_k)}}\]
The numerator: \(\tiny E(y|X, x_k+d_k)=exp(\beta_0)exp(\beta_1 x_1)exp(\beta_2 x_2)...exp(\beta_k x_k)exp(\beta_k d_k)\)
The denominator: \(\tiny E(y|X, x_k)=exp(\beta_0)exp(\beta_1 x_1)exp(\beta_2 x_2)...exp(\beta_k x_k)\)
We’re left with \(\tiny exp(\beta_k d_k)\): for a one-unit increase in \(x_k\) (\(d_k=1\)), the expected count is multiplied by \(exp(\beta_k)\). E.g., with \(\beta_k=0.13\), a one-unit increase multiplies the expected count by \(exp(0.13)\approx 1.14\), a 14% increase.
\[\tiny pr(y=m|x)={{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{m}}\over{m!}}\]
We might encounter either under- or overdispersion brought about by unobserved heterogeneity.
It is useful to rely on an alternative model that doesn’t treat \(\mu\) as fixed, but instead draws it from a distribution, i.e., \[\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i}+\epsilon_i)\]
The NBRM stems from the negative binomial distribution
Recall that the binomial density is the PMF stemming from \(n\) independent Bernoulli trials. The \(\theta\) parameter governs its shape.
We can modify the logic (and code) slightly to generate a probability of observing \(r\) successes, given \(n\) trials.
For instance, how many times would we need to flip a coin in order for three heads to appear, or four heads, and so forth? \(\binom{s+f-1}{f}\theta^{s}(1-\theta)^{f}\)
\(f=\)number of failures, \(s=\)number of successes. The total number of trials is just \(s+f\).
If \(\theta\) represents the probability of striking out, how many at bats are expected before a batter strikes out once, or twice, or 10 times?
Or, how many games must the Minnesota Vikings play in the 2022 NFL season before they lose three games?
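R's dnbinom() returns exactly this probability of \(f\) failures before the \(s\)-th success; a quick check with the coin example:

```r
theta <- 0.5   # probability of heads ("success")
s <- 3         # stop at the third head
f <- 2         # two tails along the way
choose(s + f - 1, f) * theta^s * (1 - theta)^f  # 0.1875, by the formula
dnbinom(f, size = s, prob = theta)              # same value, built in
```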
The binomial coefficient can also be written with a negative argument, which is where the distribution gets its name: \[\binom{s+f-1}{f}=(-1)^{f}\binom{-s}{f}\]
\[\binom{n}{k}={{\Gamma(n+1)}\over {\Gamma(k+1)\Gamma(n-k+1)}}\]
If we have overdispersion (\(E(y)<var(y)\)), then standard errors will be too small and we will be overconfident in our results.
Instead, let’s assume that the rate parameter is subject to error
That is, it follows some distribution (we’ll use gamma)
\[\mu^*_i=exp(\alpha+\sum_K \beta_k x_{k,i}+\epsilon_i)\]
\[\mu^*_i=exp(\alpha)exp(\beta_1 x_{1,i})exp(\beta_2 x_{2,i})...exp(\beta_k x_{k,i})exp(\epsilon_i)\]
Writing \(d_i=exp(\epsilon_i)\), this is \(\mu^*_i=\mu_i d_i\). Assuming \(E(d_i)=1\), the expected count is unchanged,
\[E(Y_i|X)=\mu_i\]
\[p(y|\mu_i d_i)={{exp(-\mu_i d_i)(\mu_i d_i)^y}\over{y!}}\]
\[p(y_i|x_i)=\int_0^{\infty} pr(y_i|x_i, d_i)\,pr(d_i)\,\mathrm{d}d_i\]
where \(d_i\) is distributed gamma,
\[pr(d_i)={{v_i^{v_i}}\over {\Gamma(v_i)}}d_i^{v_i-1}exp(-d_i v_i)\]
If we combine these two, the equation becomes
\[\tiny p(y|x_i)={{{\Gamma(y_i+v_i)}\over {y_i!\Gamma(v_i)}} ({{v_i}\over{v_i+\mu_i}})^{v_i} ({{\mu_i}\over{v_i+\mu_i}})^{y_i}}\]
\[E(y|x)=\mu_i\]
\[var(y_i|x)=\mu_i(1+({{\mu_i}\over{v_i}}))\]
The \(v_i\) parameter governs the shape of the gamma density. Another way to write this is in terms of the dispersion parameter \(\alpha=1/v_i\), \[var(y_i|x)=\mu_i(1+\alpha\mu_i)\]
The negative binomial model is an example of a mixture model with a closed-form maximum likelihood solution, where we maximize,
\[\prod_{i=1}^{N}p(y_i|x_i,\beta)=\prod_{i=1}^{N}{{\Gamma(y_i+\alpha^{-1})}\over{y_i!\,\Gamma(\alpha^{-1})}}\left({{\alpha^{-1}}\over{\alpha^{-1}+\mu_i}}\right)^{\alpha^{-1}}\left({{\mu_i}\over{\alpha^{-1}+\mu_i}}\right)^{y_i}\]
Notice that the shape of the distribution shifts with different values of \(\mu\).
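A sketch reproducing this with dnbinom(), whose mu/size parameterization matches the density above (size plays the role of \(v\)):

```r
y <- 0:15
# NB probabilities for two rates, holding the gamma shape fixed at v = 1
dnbinom(y, size = 1, mu = 1)   # mass concentrated near zero
dnbinom(y, size = 1, mu = 4)   # mass shifts right as mu grows
```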
[1] "art" "fem" "mar" "kid5" "phd" "ment"
Fit a PRM to the data. Notice how the standard errors are quite small.
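A sketch of the fit below (assuming the bioChemists data from the pscl package):

```r
library(pscl)   # bioChemists: articles published by biochemistry PhD students
m1 <- glm(art ~ kid5, family = poisson(link = "log"), data = bioChemists)
summary(m1)
```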
Call:
glm(formula = art ~ kid5, family = poisson(link = "log"), data = bioChemists)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.55960 0.02988 18.728 <2e-16 ***
kid5 -0.06978 0.03450 -2.023 0.0431 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1817.4 on 914 degrees of freedom
Residual deviance: 1813.2 on 913 degrees of freedom
AIC: 3485
Number of Fisher Scoring iterations: 5
Generate some predictions from a model with several predictors.
Call:
glm(formula = art ~ female + kid5 + married, family = poisson(link = "log"),
data = bioChemists)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.63656 0.05457 11.665 < 2e-16 ***
female -0.28549 0.05433 -5.255 1.48e-07 ***
kid5 -0.16118 0.03934 -4.097 4.19e-05 ***
married 0.13271 0.06092 2.179 0.0294 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1817.4 on 914 degrees of freedom
Residual deviance: 1776.7 on 911 degrees of freedom
AIC: 3452.5
Number of Fisher Scoring iterations: 5
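One way to compute the quantity reported below is with predict(); a sketch where the model object name m2 is an assumption and the covariate values are reverse-engineered from the reported number:

```r
m2 <- glm(art ~ female + kid5 + married,
          family = poisson(link = "log"), data = bioChemists)
newdat <- data.frame(female = 1, kid5 = 4, married = 0)
predict(m2, newdata = newdat, type = "response")
# equivalently: exp(0.63656 - 0.28549 - 0.16118 * 4) ~ 0.746
```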
Expected Articles, 4 Kid (All Variables=Min): 0.7455263
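For comparison, refit the same model as a quasi-Poisson, which estimates the dispersion parameter rather than fixing it at 1; a sketch (the object name m2q is an assumption):

```r
m2q <- glm(art ~ female + kid5 + married,
           family = quasipoisson(link = "log"), data = bioChemists)
summary(m2q)  # dispersion estimated (~2.08); standard errors inflated
```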
Call:
glm(formula = art ~ female + kid5 + married, family = quasipoisson(link = "log"),
data = bioChemists)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.63656 0.07864 8.094 1.83e-15 ***
female -0.28549 0.07830 -3.646 0.000281 ***
kid5 -0.16118 0.05670 -2.843 0.004572 **
married 0.13271 0.08779 1.512 0.130965
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.076853)
Null deviance: 1817.4 on 914 degrees of freedom
Residual deviance: 1776.7 on 911 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
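The estimated dispersion (about 2.08) and the inflated standard errors (married is no longer significant at the .05 level) point to overdispersion, so the NBRM is a natural next step; a sketch using MASS::glm.nb():

```r
library(MASS)
# glm.nb estimates the NBRM by ML; theta corresponds to v (alpha = 1/theta)
m_nb <- glm.nb(art ~ female + kid5 + married, data = bioChemists)
summary(m_nb)
```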