class: center, middle, inverse, title-slide

.title[
# Advanced quantitative data analysis
]
.subtitle[
## Fixed effect II
]
.author[
### Mengni Chen
]
.institute[
### Department of Sociology, University of Copenhagen
]

---

<style type="text/css">
.remark-slide-content {
    font-size: 20px;
    padding: 20px 80px 20px 80px;
}
.remark-code, .remark-inline-code {
    background: #f0f0f0;
}
.remark-code {
    font-size: 14px;
}
</style>

#Let's get ready

```r
#install.packages("lmtest")
library(tidyverse)        # load the tidyverse packages (dplyr, ggplot2, ...)
library(haven)            # handle labelled data
library(broom)            # tidy model output
library(splitstackshape)  # transform wide data (with stacked variables) to long data
library(plm)              # linear models for panel data
library(lmtest)           # coefficient tests with robust standard errors in fixed effect models
```

---

#Does partnership make you happier?

- [Prepare the data](https://rpubs.com/fancycmn/1118077)
- Fixed effect

```r
panel_data <- pdata.frame(long_data, index = c("id", "wave"))  # declare id and wave as the panel index
fixed <- plm(sat ~ ptner, data = panel_data, model = "within")  # within (fixed effect) estimator
summary(fixed)
```

```
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = sat ~ ptner, data = panel_data, model = "within")
## 
## Unbalanced Panel: n = 2591, T = 1-6, N = 10370
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -7.83333 -0.50000  0.00000  0.56371  5.66667 
## 
## Coefficients:
##          Estimate Std. Error t-value  Pr(>|t|)    
## ptnerYes 0.308867   0.037296  8.2815 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    12535
## Residual Sum of Squares: 12426
## R-Squared:      0.0087404
## Adj. R-Squared: -0.32147
## F-statistic: 68.5825 on 1 and 7778 DF, p-value: < 2.22e-16
```

---

#Get robust standard errors in fixed effect models

In panel data, errors are potentially heteroscedastic and autocorrelated (i.e., serially correlated over time). If so, the conventional standard errors are no longer valid, and t tests and p-values based on them can be misleading. The coefficient estimates themselves are not affected.
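The following base-R sketch shows why this matters, using simulated data. The sample sizes, the AR(1) coefficient of 0.7 and the true effect of 0.3 are arbitrary assumptions, not course data. It computes the within estimator by hand and then derives two standard errors. The first is the conventional one. The second is an Arellano-type cluster-robust one, which sums the scores within each person before squaring. No small-sample correction is applied, so this sketch is not meant to reproduce the `type = "HC1"` numbers from `coeftest()`.

```r
# Sketch on simulated data (not the course data): conventional vs. cluster-robust
# standard errors for a within (fixed effect) estimator, base R only.
set.seed(2023)
n_id <- 300                                   # persons
n_t  <- 6                                     # waves per person
id   <- rep(seq_len(n_id), each = n_t)

# Hypothetical regressor that stays at 1 once it switches on (like forming a union)
x <- unlist(lapply(seq_len(n_id), function(i) cumsum(rbinom(n_t, 1, 0.2)) > 0)) * 1
alpha <- rep(rnorm(n_id), each = n_t)         # time-constant person effect
# AR(1) errors within person: serial correlation over waves
e <- unlist(lapply(seq_len(n_id), function(i) as.numeric(arima.sim(list(ar = 0.7), n = n_t))))
y <- alpha + 0.3 * x + e                      # true within effect = 0.3

# Within transformation: subtract each person's mean
demean <- function(v) v - ave(v, id)
xd <- demean(x)
yd <- demean(y)

beta <- sum(xd * yd) / sum(xd^2)              # within estimator
u    <- yd - beta * xd                        # within residuals
df_resid <- length(y) - n_id - 1              # N - n - K, as in a within model

# Conventional SE: assumes homoscedastic, serially uncorrelated errors
se_conventional <- sqrt(sum(u^2) / df_resid / sum(xd^2))

# Cluster-robust SE (Arellano-type, no small-sample correction):
# sum the scores xd * u within each person before squaring
score_by_id <- tapply(xd * u, id, sum)
se_cluster  <- sqrt(sum(score_by_id^2)) / sum(xd^2)

round(c(beta = beta, se_conventional = se_conventional, se_cluster = se_cluster), 4)
```

Comparing the two standard errors shows how much the assumption of independent errors within a person matters for inference. The exact numbers depend on the simulated draw.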
.pull-left[
- Homoscedasticity

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic1.JPG?raw=true" width="100%" style="display: block; margin: 5px ;">
]

.pull-right[
- Serial correlation

When error terms from different (usually adjacent) periods (or cross-section observations) are correlated, the error term is serially correlated, e.g. `\(\epsilon_{i,t}\)` is correlated with `\(\epsilon_{i,t-1}\)`. Serial correlation occurs in time-series and panel studies when the errors associated with a given period carry over into future periods.
]

---

#Get robust standard errors in fixed effect models

```r
# coeftest() comes from the lmtest package; vcov. specifies the covariance matrix to use.
# vcovHC() (plm method) uses the Arellano estimator clustered by individual by default,
# which is robust to heteroscedasticity and serial correlation; type = "HC1" adds a
# small-sample degrees-of-freedom correction.
coeftest(fixed, vcov. = vcovHC, type = "HC1")
```

```
## 
## t test of coefficients:
## 
##          Estimate Std. Error t value  Pr(>|t|)    
## ptnerYes 0.308867   0.040715  7.5861 3.679e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

#Modelling time and order

Such estimation ignores temporal order!
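This base-R sketch illustrates what "ignores temporal order" means, using two made-up people whose values are purely illustrative. Person A forms a union in wave 3 and becomes happier. Person B's history is A's played backwards, so B separates in wave 3 and becomes less happy. The hand-coded within estimator returns the same coefficient for both, because de-meaning treats each person's waves as an unordered set.

```r
# Within estimator by hand: de-mean y and x by person, then take the OLS slope
within_beta <- function(y, x, id) {
  xd <- x - ave(x, id)          # within-person de-meaning of the predictor
  yd <- y - ave(y, id)          # within-person de-meaning of the outcome
  sum(xd * yd) / sum(xd^2)
}

# Person A: single in waves 1-2, partnered in waves 3-4, happier afterwards
ptner_A <- c(0, 0, 1, 1); sat_A <- c(5, 5, 6, 6)
# Person B: the same history reversed, i.e. a union dissolution
ptner_B <- rev(ptner_A);   sat_B <- rev(sat_A)

beta_A <- within_beta(sat_A, ptner_A, id = rep(1, 4))
beta_B <- within_beta(sat_B, ptner_B, id = rep(1, 4))
beta_pooled <- within_beta(c(sat_A, sat_B), c(ptner_A, ptner_B), id = rep(1:2, each = 4))
c(formation = beta_A, dissolution = beta_B, pooled = beta_pooled)  # all three equal 1
```

Because the formation and the dissolution yield the same coefficient, the pooled estimate cannot tell them apart.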
<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic2.JPG?raw=true" width="70%" style="display: block; margin: 5px ;">

---

#Modelling time and order

`$$Sat_{i,t}-\overline{Sat}_{i} = \beta_{1}(partner_{i,t}-\overline{partner}_{i}) + (\epsilon_{i,t}-\bar{\epsilon}_{i})$$`

```r
# recode the partner factor into a 0/1 numeric variable
panel_data$ptner1 <- case_when(panel_data$ptner == "Yes" ~ 1,
                               panel_data$ptner == "No"  ~ 0)
panel_data <- panel_data %>%
  group_by(id) %>%
  mutate(gr_mean = mean(ptner1),      # person-specific mean of ptner1
         de_mean = ptner1 - gr_mean)  # within-person de-meaned partner status
```

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic7.JPG?raw=true" width="90%" style="display: block; margin: 5px ;">

---

#Modelling time and order

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic4.JPG?raw=true" width="90%" style="display: block; margin: 5px ;">

---

#What is the problem with the estimation?

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic5.JPG?raw=true" width="70%" style="display: block; margin: 5px ;">

- Current setup:
  - Initial status is defined ("no partner")
  - The partner effect is modelled by a simple dummy (yes vs. no)
- Problem:
  - The subsequent order is not defined
- Consequences:
  - The sequence of cause and effect is unclear
  - The estimation mixes the effects of union formation and union dissolution
  - The estimation includes repeated events (multiple formations and dissolutions)

---

#Modelling time and temporal order

- How to define the sample?
  - Include only those who can experience the event (the population at risk). In this case, we only include people who are initially single.
- How to define the event?
  - Decide whether to remove or keep reverse and repeated transitions
  - Anchor the event in time

---

#How to define the event?

Whether to remove or keep reverse and repeated transitions

- Option 1
  - Keep reverse/repeated transitions if the interest is in finding a partner
- Option 2
  - Remove reverse/repeated transitions if the interest is in finding and keeping a partner
- Options vary across cases.
  - In the partnership case, Option 2 might be better.
  - In the divorce case, Option 1 might be better, because remarriage can be a path to economic recovery, which could improve life satisfaction over time.

---

#How to define the event?

Whether to remove or keep reverse and repeated transitions

```r
panel_data <- panel_data %>%
  group_by(id) %>%
  mutate(
    wave = as.numeric(wave),
    partnerwave  = case_when(havepartner == 1 ~ wave),   # wave(s) in which a partner is found
    breakwave    = case_when(breakpartner == 1 ~ wave),  # wave(s) in which a union dissolves
    partnerwave1 = min(partnerwave, na.rm = TRUE),       # timing of first union formation
    breakwave1   = min(breakwave, na.rm = TRUE)          # timing of first union dissolution
  )
# For persons who never form (or never dissolve) a union, min() receives a zero-length
# vector after na.rm = TRUE, returns Inf and issues a warning. The warnings do not stop
# the code; the Inf values are recoded to NA below.
panel_data$partnerwave1[is.infinite(panel_data$partnerwave1)] <- NA
panel_data$breakwave1[is.infinite(panel_data$breakwave1)] <- NA
```

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic5.PNG?raw=true" width="80%" style="display: block; margin: 5px ;">

---

#How to define the event?

Whether to remove or keep reverse and repeated transitions

```r
# remove observations in and after the wave of first separation
panel_data <- panel_data %>%
  mutate(dropcase = case_when(!is.na(breakwave1) & wave >= breakwave1 ~ 1,
                              TRUE ~ 0))
panel_data2 <- filter(panel_data, dropcase == 0)
```

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic6.PNG?raw=true" width="80%" style="display: block; margin: 5px ;">

---

#Compare results of keeping and removing

```r
panel_data2 <- pdata.frame(panel_data2, index = c("id", "wave"))
fixed_2 <- plm(sat ~ ptner, data = panel_data2, model = "within")
coeftest(fixed_2, vcov. = vcovHC, type = "HC1") # results after removing reverse and repeated transitions
```

```
## 
## t test of coefficients:
## 
##          Estimate Std. Error t value  Pr(>|t|)    
## ptnerYes 0.292990   0.044926  6.5216 7.455e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
coeftest(fixed, vcov. = vcovHC, type = "HC1") # original results, keeping all transitions
```

```
## 
## t test of coefficients:
## 
##          Estimate Std. Error t value  Pr(>|t|)    
## ptnerYes 0.308867   0.040715  7.5861 3.679e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

After removing reverse and repeated transitions

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic9.JPG?raw=true" width="80%" style="display: block; margin: 5px ;">

Original (all transitions kept)

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic4.JPG?raw=true" width="80%" style="display: block; margin: 5px ;">

---

#How to define the event?

Anchoring the event in time

With panel data we can investigate the time path of a causal effect:

- This time path is termed the "impact function" (IF)
- The major types of impact function are shown below

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic10.JPG?raw=true" width="65%" style="display: block; margin: 5px ;">

---

#How to define the event?

Anchoring the event in time

- Step impact

```r
panel_impact <- select(panel_data2, id, wave, sat, ptner, havepartner, ptner1, partnerwave1)
```

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic12.JPG?raw=true" width="65%" style="display: block; margin: 5px ;">

---

#How to define the event?

Anchoring the event in time

- Linear impact

```r
panel_impact <- panel_impact %>%
  mutate(
    wave = as.numeric(wave),
    index = wave - partnerwave1,       # time relative to first union formation (NA if never partnered)
    duration = case_when(index < 0 ~ 0,
                         index >= 0 ~ index,
                         TRUE ~ 0),    # linear setup: 0 before formation, then 0, 1, 2, ...
    duration2 = duration^2             # quadratic setup
  )
```

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic14.JPG?raw=true" width="65%" style="display: block; margin: 5px ;">

---

#How to define the event?
Anchoring the event in time

- Dummy impact

```r
panel_impact <- panel_impact %>%
  mutate(
    index = as.numeric(index),
    dummy = case_when(index == 0 ~ "year of formation",
                      index == -1 ~ "1 year before",
                      index %in% c(-2:-5) ~ "2+ year before",
                      index == 1 ~ "1 year after",
                      index > 1 ~ "2+ year after",
                      is.na(index) ~ "2+ year before") %>%  # never-partnered cases join "2+ year before"
      as_factor()  # levels follow the order of first appearance; the first level is the reference category
  ) # setup for dummy impact
```

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic15.JPG?raw=true" width="65%" style="display: block; margin: 5px ;">

---

#How to define the event?

Anchoring the event in time

```r
# panel_impact <- pdata.frame(panel_impact, index = c("id", "wave"))
# If data is not a pdata.frame, plm() takes the first two columns (here id, wave) as the panel index.
step <- plm(sat ~ ptner1, data = panel_impact, model = "within")                             # step impact
linear <- plm(sat ~ ptner1 + duration, data = panel_impact, model = "within")                # linear impact
quadratic <- plm(sat ~ ptner1 + duration + duration2, data = panel_impact, model = "within") # quadratic impact
dummyimpact <- plm(sat ~ dummy, data = panel_impact, model = "within")                       # dummy impact

texreg::htmlreg(list(step, linear, quadratic, dummyimpact),
                custom.model.names = c("step", "linear", "quadratic", "dummyimpact"),
                include.ci = FALSE, center = TRUE, file = "impact.html")
```

```
## The table was written to the file 'impact.html'.
```

---

#How to define the event?

Anchoring the event in time

<img src="https://github.com/fancycmn/slide10/blob/main/S10_Pic16.JPG?raw=true" width="50%" style="display: block; margin: 5px ;">

---

#Take home

- Obtain robust standard errors with `coeftest()` and `vcovHC()`
- Define your sample: keep only cases at risk of experiencing the event
- Define the event: keep or remove reverse and repeated transitions
- Estimate the temporal impact of the event
  - Step impact
  - Linear or quadratic impact
  - Dummy impact

---

class: center, middle

#[Exercise](https://rpubs.com/fancycmn/1118080)
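---

#Appendix: impact codings for one hypothetical person

This base-R sketch applies the codings from the "Anchoring the event in time" slides to one hypothetical person. The person is observed in waves 1 to 6 and first finds a partner in wave 3; both values are assumptions chosen for illustration. The sketch differs from the slides in three ways:

- It derives the step dummy from the relative time `index`, whereas the slides use the observed status `ptner1`. For an initially single person whose observations after a first separation have been removed, the two should coincide.
- It uses base `cut()` instead of `case_when()` for the dummy coding.
- It sets the category order explicitly.

```r
wave         <- 1:6
partnerwave1 <- 3                          # first wave with a partner (assumed)
index        <- wave - partnerwave1        # time relative to union formation

step      <- as.numeric(index >= 0)        # step impact: 0 before, 1 from formation on
duration  <- pmax(index, 0)                # linear impact: 0 before, then 0, 1, 2, ...
duration2 <- duration^2                    # quadratic impact term
dummy <- cut(index,                        # dummy impact; first level is the reference
             breaks = c(-Inf, -2, -1, 0, 1, Inf),
             labels = c("2+ year before", "1 year before", "year of formation",
                        "1 year after", "2+ year after"))

data.frame(wave, index, step, duration, duration2, dummy)
```

Each column corresponds to one regressor in the `step`, `linear`, `quadratic` and `dummyimpact` models.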