class: center, middle, inverse, title-slide

.title[
# Advanced quantitative data analysis
]
.subtitle[
## Random effect and model selection
]
.author[
### Mengni Chen
]
.institute[
### Department of Sociology, University of Copenhagen
]

---

<style type="text/css">
.remark-slide-content {
  font-size: 20px;
  padding: 20px 80px 20px 80px;
}
.remark-code, .remark-inline-code {
  background: #f0f0f0;
}
.remark-code {
  font-size: 14px;
}
</style>

#Let's get ready

```r
library(tidyverse)       # load the tidyverse packages
library(haven)           # handle labelled data
library(broom)           # transform regression results into a data frame
library(splitstackshape) # transform wide data (with stacked variables) to long data
library(plm)             # linear models for panel data
library(lmtest)          # coeftest() for coefficient tables with robust standard errors
```

---

#Does partnership make you happier?

Fixed effect

<img src="https://github.com/fancycmn/slide9/blob/main/S9_pic15.PNG?raw=true" width="70%" style="display: block; margin-left:150px;">

`$$\text{Fixed effect}:Sat_{i,t}= \beta_{1}*partner_{i,t} + u_{i} + \epsilon_{i,t}$$`

`$$u_{i}:\text{person-specific unobserved component, time-constant}$$`

`$$\epsilon_{i,t}:\text{person-and-time-specific component, time-varying}$$`

`$$\text{Exogeneity assumption}:E(\epsilon_{i,t}|x_{i,t})= 0$$`

`$$\text{Unobserved time-constant component may be correlated with the independent variable}:Cov(u_{i},partner_{i,t})\neq 0$$`

---

#Does partnership make you happier?
Random effect

<img src="https://github.com/fancycmn/slide11/blob/main/S11_Pic3.PNG?raw=true" width="75%" style="display: block;margin-left:150px">

---

#Another example: random effect

- Random effect models are very often used in psychology and public health
- Another example: some individual genetic traits are not correlated with smoking but are correlated with blood pressure

<img src="https://github.com/fancycmn/slide11/blob/main/S11_Pic2.PNG?raw=true" width="70%" style="display: block; margin-left:150px;">

---

#Random effect: mathematical demonstration

`$$\text{Random effect}:Sat_{i,t}= \beta_{1}*partner_{i,t} + u_{i} + \epsilon_{i,t}$$`

`$$u_{i}:\text{person-specific unobserved component, time-constant}$$`

`$$\epsilon_{i,t}:\text{person-and-time-specific component, time-varying}$$`

`$$\text{Assumption 1}:E(\epsilon_{i,t}|x_{i,t})= 0$$`

`$$\text{Assumption 2: unobserved time-constant component is uncorrelated with the independent variable}: Cov(u_{i},partner_{i,t})= 0$$`

Unlike in the fixed effect model, here `\(u_{i}\)` varies randomly. This means that `\(u_{i}\)` has zero mean and constant variance, and is independent of the Xs and of `\(\epsilon_{i,t}\)`.

**Then can we just run an OLS regression, because `\(u_{i}\)` is random?**

---

#Random effect estimator

**Even though `\(u_{i}\)` is random, a plain OLS regression is problematic, because leaving `\(u_{i}\)` in the error term causes serial correlation.**

`$$\text{Random effect}:Sat_{i,t}= \beta_{1}*partner_{i,t} + 7 + \epsilon_{i,t}$$`

Note: suppose `\(u_{i}\)` is 7 for person `\(i\)`. Although `\(u_{i}\)` is random, if we do not account for it, the error term becomes `\(7 + \epsilon_{i,t}\)` and carries the same 7 in every wave. The error in wave `\(t\)` is therefore correlated with the error in wave `\(t-1\)` (that is serial correlation). The OLS coefficients remain consistent, but the usual standard errors are wrong and the estimator is inefficient.

**We will use feasible generalized least squares (FGLS) to get random effect estimates**. If you want to know what FGLS is, [click here](https://www.youtube.com/watch?v=--H9uI_BFIc)

---

#Does partnership make you happier?
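Before turning to the real data, the serial-correlation argument for the random effect estimator can be checked on simulated data. The sketch below is only an illustration: the sample size, the coefficient 0.4, and the variances are invented, and `sim`, `res`, and `res_lag` are hypothetical names, not part of the course data. It draws a `u_i` that is independent of `ptner`, fits plain OLS, and correlates each residual with the same person's residual from the previous wave.

```r
set.seed(1)
n_id   <- 500   # number of persons (made up)
n_wave <- 5     # number of waves per person (made up)

id    <- rep(seq_len(n_id), each = n_wave)
wave  <- rep(seq_len(n_wave), times = n_id)
u     <- rep(rnorm(n_id, sd = 1), each = n_wave)  # time-constant, independent of ptner
ptner <- rbinom(n_id * n_wave, size = 1, prob = 0.5)
eps   <- rnorm(n_id * n_wave, sd = 1)             # time-varying error
sat   <- 5 + 0.4 * ptner + u + eps

sim <- data.frame(id, wave, sat, ptner)

# Plain OLS ignores u, so u ends up in the residual
ols <- lm(sat ~ ptner, data = sim)
sim$res <- resid(ols)

# Lag the residual within each person (rows are sorted by id, then wave)
sim$res_lag <- ave(sim$res, sim$id, FUN = function(r) c(NA, head(r, -1)))

# Correlation between the residual in wave t and in wave t-1
rho <- cor(sim$res, sim$res_lag, use = "complete.obs")
rho
```

With equal variances for `u` and `eps`, theory suggests a correlation of roughly one half, since the shared `u` makes up half of the composite error variance. With a truly independent error the correlation would be near zero.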
- [Prepare the data](https://rpubs.com/fancycmn/974109)

```r
panel_data <- pdata.frame(long_data, index=c("id", "wave")) # define the dataset as panel data (person id, survey wave)
```

---

#Does partnership make you happier?

- Random effect: modelling

```r
random <- plm(sat ~ ptner + hlt, data=panel_data, model="random") # include one covariate "hlt" (health status)
summary(random)
coeftest(random, vcov. = vcovHC, type = "HC1") # coefficients with robust (HC1) standard errors
random_robust <- coeftest(random, vcov. = vcovHC, type = "HC1") # store the robust results for the comparison table
```

<img src="https://github.com/fancycmn/slide11/blob/main/S11_Pic6.3.PNG?raw=true" width="50%" style="display: block; margin-top:10px;">

---

#Does partnership make you happier?

- Random effect: interpretation
  - When a person has a partner, life satisfaction is on average 0.363 points higher than when not, holding self-rated health constant.
  - When a person's self-rated health increases by 1 point, life satisfaction increases by 0.354 points, holding partnership status constant.

<img src="https://github.com/fancycmn/slide11/blob/main/S11_Pic6.3.PNG?raw=true" width="50%" style="display: block; margin-top:10px;">

---

#Does partnership make you happier?

Compare pooled OLS, fixed effect, random effect

- Pooled OLS
- Random effect
- Fixed effect

```r
pols <- plm(sat ~ ptner + hlt, data=panel_data, model="pooling")
summary(pols)
pols_robust <- coeftest(pols, vcov. = vcovHC, type = "HC1") # robust (HC1) standard errors

fixed <- plm(sat ~ ptner + hlt, data=panel_data, model="within")
summary(fixed)
fixed_robust <- coeftest(fixed, vcov. = vcovHC, type = "HC1") # robust (HC1) standard errors

# put the three sets of robust results side by side in one HTML table
texreg::htmlreg(list(pols_robust, fixed_robust, random_robust),
                custom.model.names=c("Pooled OLS", "Fixed effect", "Random effect"),
                include.ci = FALSE, omit.coef = "factor", center=TRUE,
                file = "compare1.html")
```

---

#Does partnership make you happier?
Compare pooled OLS, fixed effect, random effect

<img src="https://github.com/fancycmn/slide11/blob/main/S11_Pic8.PNG?raw=true" width="70%" style="display: block; margin-top:10px;">

---

#Which model should I use?

- Criterion 1: run the tests
  - Breusch and Pagan Lagrange Multiplier Test: random effect vs pooled OLS
  - Hausman Test: fixed effect vs random effect

---

#BP-LM test

- The null hypothesis is that **the variance of the random effect is zero**. That is, the variance of `\(u_{i}\)` is zero.
- If the null hypothesis is rejected, we should use random effect.
- If the null hypothesis is not rejected, we should use pooled OLS.

```r
plmtest(pols, type=c("bp"))
```

```
## 
## Lagrange Multiplier Test - (Breusch-Pagan)
## 
## data: sat ~ ptner + hlt
## chisq = 2440.9, df = 1, p-value < 2.2e-16
## alternative hypothesis: significant effects
```

The p-value is far below 0.05 here. This means that we reject the null hypothesis. We should not use pooled OLS. We **should use** random effect.

---

#Hausman Test

- The null hypothesis is that `\(Cov(u_{i},partner_{i,t})=0\)`.
- If the null hypothesis is rejected, we should use fixed effect.
- If the null hypothesis is not rejected, we should use random effect.

```r
phtest(fixed, random)
```

```
## 
## Hausman Test
## 
## data: sat ~ ptner + hlt
## chisq = 238.8, df = 2, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent
```

The p-value is less than 0.05 here. This means that we reject the null hypothesis. We should not use random effect. We **should use** fixed effect.

---

#Which model should I use?

- Criterion 1: run the tests
  - Breusch and Pagan Lagrange Multiplier Test: random effect vs pooled OLS
  - Hausman Test: fixed effect vs random effect
- Criterion 2: theoretical consideration
  - Consider theoretically whether `\(Cov(u_{i},X_{i,t})=0\)` or `\(Cov(u_{i},X_{i,t})\neq 0\)`
  - In political science and economics, fixed effect is the standard model
  - In psychology, random effect is often preferred.
- Criterion 3: how many IDs you have in your dataset
  - Choose fixed effect when you have a very small number of IDs that are not randomly sampled.
      - e.g. you have followed several individuals for a long time
      - e.g. you have observed several countries for a long time
  - Choose fixed effect when you have sampled all the units
      - e.g. you sampled all the states in a country

---

#Take home

- Understand what random effect is
- Understand the difference between fixed effect and random effect
- Know how to run a random effect model
- Know how to run tests to select models
- Important code:
  - `plm(Y ~ X, data=your own data, model="random")`
  - `plmtest(your pooled regression, type=c("bp"))` to select pooled OLS or random effect
  - `phtest(fixed, random)` to select fixed or random effect

---
class: center, middle

#[Exercise](https://rpubs.com/fancycmn/970816)
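
---

#Take home: a runnable sketch

The three calls from the take-home list can be tried end to end without the course data. This is a minimal sketch, assuming the `plm` package is installed. It uses the `Grunfeld` firm panel that ships with `plm` as a stand-in, so the variables (`inv`, `value`, `capital`) and the object names ending in `_g` are illustrative and have nothing to do with the partnership example.

```r
library(plm)

# Example panel shipped with plm: firms observed over years
data("Grunfeld", package = "plm")
pg <- pdata.frame(Grunfeld, index = c("firm", "year"))

# Same formula, three estimators
pooled_g <- plm(inv ~ value + capital, data = pg, model = "pooling")
fixed_g  <- plm(inv ~ value + capital, data = pg, model = "within")
random_g <- plm(inv ~ value + capital, data = pg, model = "random")

# Breusch-Pagan LM test: pooled OLS vs random effect
bp <- plmtest(pooled_g, type = "bp")
print(bp)

# Hausman test: fixed effect vs random effect
ht <- phtest(fixed_g, random_g)
print(ht)
```

Read the two p-values with the same decision rules as on the BP-LM and Hausman slides. The conclusion can differ from the partnership example, because it depends on the data.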