Advanced quantitative data analysis

class: center, middle, inverse, title-slide

.title[
# Advanced quantitative data analysis
]
.subtitle[
## Random effect and model selection
]
.author[
### Mengni Chen
]
.institute[
### Department of Sociology, University of Copenhagen
]

---

#Let's get ready

```r
library(tidyverse) # load the tidyverse packages for data wrangling
library(haven) # handle labelled data
library(broom) # transform regression results into a data frame
library(splitstackshape) # transform wide data (with stacked variables) to long data
library(plm) # linear models for panel data
library(lmtest) # coeftest(), used to report coefficients with robust standard errors
```

---
#Does partnership make you happier? Fixed effect
<img src="https://github.com/fancycmn/slide9/blob/main/S9_pic15.PNG?raw=true" width="70%" style="display: block; margin-left:150px;">

`$$\text{Fixed effect}:Sat_{i,t}= \beta_{1}*partner_{i,t} + u_{i} + \epsilon_{i,t}$$`
`$$u_{i}:\text{person-specific unobserved component, time-constant}$$`

`$$\epsilon_{i,t}:\text{person-and-time-specific component, time-varying}$$`

`$$\text{Exogeneity assumption}:E(\epsilon_{i,t}|x_{i,t})= 0$$`
`$$\text{Unobserved time-constant component may be correlated with the independent variable (IV)}:Cov(u_{i},partner_{i,t})\neq 0$$`
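
---
#Fixed effect: a simulated check

A minimal sketch with simulated data (not the course data) of why the fixed effect estimator tolerates `$Cov(u_{i},partner_{i,t})\neq 0$`: subtracting each person's mean removes `$u_{i}$`. The sample sizes, the true effect of 0.5 and all variable names are assumptions made for illustration.

```r
# Simulated data, NOT the course data: 500 persons observed over 5 waves
set.seed(123)
n_id <- 500; n_wave <- 5
id <- rep(seq_len(n_id), each = n_wave)
u  <- rep(rnorm(n_id), each = n_wave)            # person-specific, time-constant u_i
# having a partner is more likely for persons with a high u_i, so Cov(u_i, ptner) != 0
ptner <- rbinom(n_id * n_wave, size = 1, prob = plogis(u))
sat   <- 0.5 * ptner + u + rnorm(n_id * n_wave)  # true beta_1 = 0.5

# Pooled OLS leaves u_i in the error term
b_pooled <- coef(lm(sat ~ ptner))["ptner"]

# Within transformation: subtract each person's mean from sat and ptner
sat_dm   <- sat   - ave(sat, id)
ptner_dm <- ptner - ave(ptner, id)
b_within <- coef(lm(sat_dm ~ ptner_dm - 1))["ptner_dm"]

c(pooled = unname(b_pooled), within = unname(b_within))
```

In this setup pooled OLS should land well above 0.5 because it picks up `$u_{i}$`, while the demeaned regression should stay close to 0.5.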
---
#Does partnership make you happier? Random effect

---
#Another example: random effect 
- Very often used in psychology and public health
- Another example: some individual genetic traits are correlated with blood pressure but not with smoking
<img src="https://github.com/fancycmn/slide11/blob/main/S11_Pic2.PNG?raw=true" width="70%" style="display: block; margin-left:150px;">

---
#Random effect: mathematical demonstration
`$$\text{Random effect}:Sat_{i,t}= \beta_{1}*partner_{i,t} + u_{i} + \epsilon_{i,t}$$`
`$$u_{i}:\text{person-specific unobserved component, time-constant}$$`

`$$\epsilon_{i,t}:\text{person-and-time-specific component, time-varying}$$`

`$$\text{Assumption 1}:E(\epsilon_{i,t}|x_{i,t})= 0$$`
`$$\text{Assumption 2: unobserved time-constant component is uncorrelated with the IV }: Cov(u_{i},partner_{i,t})= 0$$`
Unlike in the fixed effect model, here `$u_{i}$` varies randomly across persons. This means that `$u_{i}$` has zero mean and constant variance, and is independent of the Xs and of `$\epsilon_{i,t}$`.

**Then can we just run an OLS regression, because the  `$u_{i}$` is random?**
---
#Random effect estimator
**Even though `$u_{i}$` is random, a plain OLS regression is problematic because leaving `$u_{i}$` in the error term creates serial correlation.**
`$$\text{Random effect}:Sat_{i,t}= \beta_{1}*partner_{i,t} + 7 + \epsilon_{i,t}$$`
Note: suppose `$u_{i}$` is 7 for person `$i$`. Although `$u_{i}$` is random, if we do not account for it, it becomes part of the error term. The composite error `$u_{i}+\epsilon_{i,t}$` then carries the same 7 in every wave, so it is correlated over time (serial correlation). That is, the error at time `$t$` is correlated with the error at time `$t-1$`. As a result, the OLS standard errors are unreliable and the estimates are inefficient.

**We will use feasible generalized least squares (FGLS) to get a random effect estimation**.
If you want to know what FGLS is, [click here](https://www.youtube.com/watch?v=--H9uI_BFIc)
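
---
#Random effect estimator: serial correlation in a simulation

A small sketch of the serial correlation argument, using simulated errors rather than the course data. The variances (both set to 1) and the object names are assumptions for illustration. With equal variances, theory gives a correlation of `$\sigma^2_{u}/(\sigma^2_{u}+\sigma^2_{\epsilon})=0.5$` between the composite errors of two waves.

```r
# Simulated errors, NOT the course data: 2000 persons, 2 consecutive waves
set.seed(42)
n_id <- 2000
u    <- rnorm(n_id, sd = 1)   # time-constant u_i, variance 1
eps1 <- rnorm(n_id, sd = 1)   # epsilon at time t-1, variance 1
eps2 <- rnorm(n_id, sd = 1)   # epsilon at time t, drawn independently of eps1

# The idiosyncratic errors alone are not correlated over time
cor_eps <- cor(eps1, eps2)

# The composite errors share the same u_i, so they are correlated over time
v1 <- u + eps1
v2 <- u + eps2
cor_v <- cor(v1, v2)

round(c(eps_only = cor_eps, composite = cor_v), 2)
```

The first correlation should be near zero and the second near 0.5, which is the pattern that FGLS is designed to account for.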

---
#Does partnership make you happier?
  - [Prepare the data](https://rpubs.com/fancycmn/974109)

```r
panel_data <- pdata.frame(long_data, index=c("id", "wave")) #define the dataset as a panel data
```
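
---
#Does partnership make you happier?
  - What the long data should look like

The toy data below is hypothetical and only illustrates the layout that `pdata.frame()` expects: one row per person (`id`) and wave (`wave`). The object names `toy_long` and `toy_panel` are made up for this sketch.

```r
library(plm)

# Hypothetical toy data in long format: 3 persons, 3 waves each
toy_long <- data.frame(
  id    = rep(1:3, each = 3),
  wave  = rep(1:3, times = 3),
  sat   = c(7, 8, 8, 5, 5, 6, 9, 8, 9),
  ptner = c(0, 1, 1, 0, 0, 1, 1, 1, 1)
)

# Declare which columns identify the person and the time point
toy_panel <- pdata.frame(toy_long, index = c("id", "wave"))

pdim(toy_panel)   # number of persons, number of waves, balanced or not
```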

---
#Does partnership make you happier?
  - Random effect: modelling

```r
random <- plm(sat ~ ptner + hlt, data=panel_data, model="random") # include one covariate "hlt" (health status)
summary(random)
random_robust <- coeftest(random, vcov. = vcovHC, type = "HC1") # coefficients with robust standard errors
random_robust
```
<img src="https://github.com/fancycmn/slide11/blob/main/S11_Pic6.3.PNG?raw=true" width="50%" style="display: block; margin-top:10px;">

---
#Does partnership make you happier?
  - Random effect: interpretation
    - When a person has a partner, life satisfaction is 0.363 points higher than when they do not.
    - When a person's self-rated health increases by 1 point, life satisfaction increases by 0.354 points.
<img src="https://github.com/fancycmn/slide11/blob/main/S11_Pic6.3.PNG?raw=true" width="50%" style="display: block; margin-top:10px;">

---
#Does partnership make you happier? Compare pooled OLS, fixed effect, random effect
  - Pooled OLS
  - Random effect 
  - Fixed effect

```r
pols <- plm(sat ~ ptner + hlt, data=panel_data, model="pooling") 
summary(pols)
pols_robust <- coeftest(pols, vcov. = vcovHC, type = "HC1") # pooled OLS coefficients with robust standard errors

fixed <- plm(sat ~ ptner + hlt, data=panel_data, model="within") 
summary(fixed)
fixed_robust <- coeftest(fixed, vcov. = vcovHC, type = "HC1") # fixed effect coefficients with robust standard errors

# random_robust was created on the random effect modelling slide
texreg::htmlreg(list(pols_robust, fixed_robust, random_robust), 
                custom.model.names=c("Pooled OLS", "Fixed effect", "Random effect"),
                include.ci = FALSE, omit.coef = "factor", center=TRUE, file = "compare1.html") 
```

---
#Does partnership make you happier? Compare pooled OLS, fixed effect, random effect
<img src="https://github.com/fancycmn/slide11/blob/main/S11_Pic8.PNG?raw=true" width="70%" style="display: block; margin-top:10px;">

---
#Which model should I use?
  - Criterion 1: run statistical tests
    - Breusch and Pagan Lagrange Multiplier Test: random effect vs pooled OLS
    - Hausman Test: fixed effect vs random effect

---
#BP-LM test
  - The null hypothesis is that **the variance of the random effect is zero**. That is, the variance of `$u_{i}$` is zero.
  - If the null hypothesis is rejected, we should use random effect.
  - If the null hypothesis is not rejected, we should use pooled OLS.

```r
plmtest(pols, type=c("bp"))
```

```
## 
##  Lagrange Multiplier Test - (Breusch-Pagan)
## 
## data:  sat ~ ptner + hlt
## chisq = 2440.9, df = 1, p-value < 2.2e-16
## alternative hypothesis: significant effects
```
The p-value is far below 0.05 here, so we reject the null hypothesis that the variance of `$u_{i}$` is zero. We should not use pooled OLS. We **should use** random effect.
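
---
#BP-LM test: a simulated check

A sketch of how the test behaves when we know the truth, using simulated data rather than the course data. The two outcomes `sat_with_u` and `sat_no_u`, the sample sizes and the effect of 0.4 are assumptions for illustration. Only the first outcome contains a person-specific `$u_{i}$`.

```r
library(plm)

# Simulated panel, NOT the course data: 300 persons, 4 waves
set.seed(2024)
n_id <- 300; n_wave <- 4
sim <- data.frame(
  id    = rep(seq_len(n_id), each = n_wave),
  wave  = rep(seq_len(n_wave), times = n_id),
  ptner = rbinom(n_id * n_wave, size = 1, prob = 0.5)
)
u     <- rep(rnorm(n_id), each = n_wave)        # u_i with variance 1
noise <- rnorm(n_id * n_wave)
sim$sat_with_u <- 0.4 * sim$ptner + u + noise   # Var(u_i) > 0
sim$sat_no_u   <- 0.4 * sim$ptner + noise       # Var(u_i) = 0

sim_panel <- pdata.frame(sim, index = c("id", "wave"))

pols_with_u <- plm(sat_with_u ~ ptner, data = sim_panel, model = "pooling")
pols_no_u   <- plm(sat_no_u ~ ptner, data = sim_panel, model = "pooling")

bp_with_u <- plmtest(pols_with_u, type = "bp")
bp_no_u   <- plmtest(pols_no_u, type = "bp")

bp_with_u$p.value   # should be tiny: reject the null, random effect over pooled OLS
bp_no_u$p.value     # the null is true by construction, so this varies with the seed
```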

---
#Hausman Test
  - The null hypothesis is that `$Cov(u_{i},partner_{i,t})=0$`, that is, `$u_{i}$` is uncorrelated with the independent variables.
  - If the null hypothesis is rejected, we should use fixed effect.
  - If the null hypothesis is not rejected, we should use random effect.

```r
phtest(fixed, random)
```

```
## 
##  Hausman Test
## 
## data:  sat ~ ptner + hlt
## chisq = 238.8, df = 2, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent
```
The p-value is less than 0.05 here, so we reject the null hypothesis. We should not use random effect. We **should use** fixed effect.
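
---
#Hausman Test: a simulated check

A sketch of why the Hausman test rejects when `$u_{i}$` is correlated with the independent variable. The data are simulated, not the course data, and the true effect of 0.5 and all object names are assumptions for illustration.

```r
library(plm)

# Simulated panel, NOT the course data: 500 persons, 5 waves
set.seed(99)
n_id <- 500; n_wave <- 5
u <- rep(rnorm(n_id), each = n_wave)
sim <- data.frame(
  id   = rep(seq_len(n_id), each = n_wave),
  wave = rep(seq_len(n_wave), times = n_id)
)
# ptner depends on u_i, so the random effect assumption Cov(u_i, ptner) = 0 fails
sim$ptner <- rbinom(n_id * n_wave, size = 1, prob = plogis(u))
sim$sat   <- 0.5 * sim$ptner + u + rnorm(n_id * n_wave)   # true effect = 0.5

sim_panel  <- pdata.frame(sim, index = c("id", "wave"))
fixed_sim  <- plm(sat ~ ptner, data = sim_panel, model = "within")
random_sim <- plm(sat ~ ptner, data = sim_panel, model = "random")

coef(fixed_sim)["ptner"]    # should be close to the true 0.5
coef(random_sim)["ptner"]   # pulled upward by the correlation with u_i
hausman_sim <- phtest(fixed_sim, random_sim)
hausman_sim$p.value         # should be small: reject the null, prefer fixed effect
```

The test compares the two sets of coefficients: when they differ by more than sampling noise can explain, the random effect assumption is in doubt.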
---
#Which model should I use?
  - Criterion 1: run statistical tests
    - Breusch and Pagan Lagrange Multiplier Test: random effect vs pooled OLS
    - Hausman Test: fixed effect vs random effect
  - Criterion 2: theoretical consideration
    - Consider theoretically whether `$Cov(u_{i},X_{i,t})=0$` or `$Cov(u_{i},X_{i,t})\neq 0$`
    - In political science and economics, fixed effect is the standard model
    - In psychology, random effect is often preferred
  - Criterion 3: how many IDs you have in your dataset
    - Choose fixed effect when you have a very small number of IDs that are not randomly sampled
      - e.g. you have followed several individuals for a long time
      - e.g. you have observed several countries for a long time
    - Choose fixed effect when you have sampled all the units
      - e.g. you sampled all the states in a country
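
---
#Which model should I use? The two tests in one helper

The function below is a hypothetical helper, not part of `plm`. It only covers the first criterion (the BP-LM test, then the Hausman test) and should not replace the theoretical considerations. The name `choose_panel_model` and the 0.05 cutoff are assumptions. The usage lines rely on `Grunfeld`, an example panel shipped with `plm`, not on the course data.

```r
library(plm)

# Hypothetical helper: run the BP-LM test, then the Hausman test
choose_panel_model <- function(formula, data, alpha = 0.05) {
  pols   <- plm(formula, data = data, model = "pooling")
  fixed  <- plm(formula, data = data, model = "within")
  random <- plm(formula, data = data, model = "random")

  p_bp      <- unname(plmtest(pols, type = "bp")$p.value)
  p_hausman <- unname(phtest(fixed, random)$p.value)

  if (p_bp >= alpha) {
    choice <- "pooled OLS"       # no evidence of a person-specific effect
  } else if (p_hausman >= alpha) {
    choice <- "random effect"    # effect present, no evidence against Cov(u, X) = 0
  } else {
    choice <- "fixed effect"     # effect present and Cov(u, X) = 0 rejected
  }
  list(choice = choice, p_bp = p_bp, p_hausman = p_hausman)
}

# Usage with the Grunfeld data (first two columns, firm and year, are used as index)
data("Grunfeld", package = "plm")
result <- choose_panel_model(inv ~ value + capital, data = Grunfeld)
result$choice
```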

---
#Take home
  - Understand what a random effect model is
  - Understand the difference between fixed effect and random effect
  - Know how to run a random effect model
  - Know how to run tests to select a model
  - Important code:
    - `plm(Y ~ X, data=your own data, model="random")`
    - `plmtest(your pooled regression, type=c("bp"))` to select pooled OLS or random effect
    - `phtest(fixed, random)` to select fixed or random effect

---
class: center, middle
#[Exercise](https://rpubs.com/fancycmn/970816)