class: center, middle, inverse, title-slide .title[ # Advanced quantitative data analysis ] .subtitle[ ## Cross-sectional data analysis II ] .author[ ### Mengni Chen ] .institute[ ### Department of Sociology, University of Copenhagen ] --- <style type="text/css"> .remark-slide-content { font-size: 20px; padding: 20px 80px 20px 80px; } .remark-code, .remark-inline-code { background: #f0f0f0; } .remark-code { font-size: 14px; } </style> #Let's get ready ```r #install one new package install.packages("oaxaca") #for Blinder-Oaxaca Decomposition ``` ```r library(tidyverse) #for recoding library(haven) #for importing data library(janitor) #for tabulation library(estimatr) #robust standard error OLS library(texreg) #export regression result library(ggplot2) #for plotting library(oaxaca) #for Blinder-Oaxaca Decomposition ``` Prepare a dataset of age, sex, relationship status, life satisfaction - DV: life satisfaction (sat6) - IV: - age(age), - sex(sex_gen), - relationship status(relstat), - number of children (nkids), - health (hlt1) - religion (sd30) --- #Prepare data ```r wave1 <- read_dta("anchor1_50percent_Eng.dta") # sample size =6201 ``` ```r tabyl(wave1,age) #no missing tabyl(wave1,sex_gen) #no missing tabyl(wave1,relstat) # 34 cases reporting -7, needs cleaning tabyl(wave1, nkids) #no missing tabyl(wave1, hlt1) #10 cases reporting -1 or -2, needs cleaning tabyl(wave1,sd30) #28 cases reporting -1 or -2, needs cleaning ``` --- #Prepare data ```r wave1b <- wave1 %>% transmute( age, nkids, sat6=case_when(sat6<0 ~ as.numeric(NA), #specify when sat should be considered missing TRUE ~ as.numeric(sat6)), gender=as_factor(sex_gen) %>% fct_drop(), #treat sex_gen as categorical, and drop unused level relstat=as_factor(relstat), #treat relationship status as categorical relstat_new1=case_when( relstat=="-7 Incomplete data" ~ as.character(NA),#specify when it should be missing TRUE ~ as.character(relstat) ) %>% as_factor() %>% fct_drop(), #make relstat as a factor, and then drop unused levels in relstat_new1 health=case_when( hlt1<0 ~ as.numeric(NA),#specify when it should be missing TRUE ~ hlt1), religion=case_when( sd30<0 ~ as.character(NA), #specify when it should be missing sd30==7 ~ "No",#specify when it should be "no religion" sd30 %in% c(1:6) ~ "Yes" #specify when it should be "have religion" ) %>% as_factor()%>%fct_relevel("No", "Yes") #use "No" as reference level )%>% drop_na() #drop all observations with missing values in the sample # sample size change from 6201 to 6132 ``` --- #The relationship between having a partner and life satisfaction - Generate a new variable for relationship status ```r wave1c <- wave1b %>% mutate( relstat_new2=case_when( relstat_new1 %in% c("1 Never married single") ~ "single", #treat 'never married single' as 'single' relstat_new1 %in% c("2 Never married LAT", "3 Never married COHAB", "4 Married COHAB", "5 Married noncohabiting") ~ 'partnered', #treat the 4 situations as "partnered" relstat_new1 %in% c("6 Divorced/separated single", "7 Divorced/separated LAT", "8 Divorced/separated COHAB") ~ 'separated', #treat the 3 situations as "separated" relstat_new1 %in% c("9 Widowed single", "10 Widowed LAT") ~ 'widowed' #treat the 2 situations as "widowed" ) %>% as_factor()# make relstat_new2 as factor ) %>% filter(relstat_new2!= "widowed" & relstat_new2!= "separated") #only 4 widowed and 284 separated, dropping. # sample size change to 5845 after dropping widowed and separated ``` --- #Exploring the relationship between having a partner and life satisfaction First, we can estimate the average sat6 across relationship status to have a first look .pull-left[ ```r tabyl(wave1c,relstat_new2) ``` ``` ## relstat_new2 n percent ## single 2435 0.4165954 ## partnered 3410 0.5834046 ## separated 0 0.0000000 ## widowed 0 0.0000000 ``` ```r #check the frequency of each category ``` ] .pull-right[ ```r wave1c$relstat_new2 <- fct_drop(wave1c$relstat_new2) #drop unused levels compare<- wave1c %>% group_by(relstat_new2) %>% dplyr::summarise( mean_sat6=mean(sat6), #calculate the mean of sat6 by marital1 ) compare ``` ``` ## # A tibble: 2 × 2 ## relstat_new2 mean_sat6 ## <fct> <dbl> ## 1 single 7.51 ## 2 partnered 7.78 ``` ] --- #Why those partnered are happier than those singles ```r wave1c$relstat_new2 <- fct_drop(wave1c$relstat_new2) #drop unused levels compare<- wave1c %>% group_by(relstat_new2) %>% dplyr::summarise( mean_sat6=mean(sat6), #calculate the mean of sat6 by marital1 mean_age=mean(age), mean_nkids=mean(nkids), mean_health=mean(health), prop_religion=sum(religion=="Yes")/n() #the no. of Yes/the total size of the group ) compare ``` ``` ## # A tibble: 2 × 6 ## relstat_new2 mean_sat6 mean_age mean_nkids mean_health prop_religion ## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 single 7.51 20.3 0.0583 3.90 0.744 ## 2 partnered 7.78 29.1 0.969 3.79 0.725 ``` ```r #compare to the single group, the partnered group are older, had more children, slightly lower in health and religion. ``` --- #Why those partnered are happier than those singles ```r OLS_single <- lm_robust(formula = sat6 ~ age + nkids + health + religion, subset=(relstat_new2=="single"), data = wave1c) OLS_partner <- lm_robust(formula = sat6 ~ age + nkids + health + religion, subset=(relstat_new2=="partnered"), data = wave1c) #subset=() here is to specify which sample for the OLS screenreg(list(OLS_single,OLS_partner),include.ci = FALSE, digits = 3, custom.model.names = c("OLS for single","OLS for partnered"),#rename the models single.row =TRUE #to make the coef. and standard error in the same row ) ``` ``` ## ## ======================================================= ## OLS for single OLS for partnered ## ------------------------------------------------------- ## (Intercept) 7.038 (0.225) *** 5.947 (0.184) *** ## age -0.070 (0.006) *** -0.013 (0.005) ** ## nkids 0.076 (0.144) 0.060 (0.033) ## health 0.464 (0.042) *** 0.502 (0.031) *** ## religionYes 0.107 (0.082) 0.337 (0.061) *** ## ------------------------------------------------------- ## R^2 0.147 0.100 ## Adj. R^2 0.146 0.099 ## Num. obs. 2435 3410 ## RMSE 1.661 1.557 ## ======================================================= ## *** p < 0.001; ** p < 0.01; * p < 0.05 ``` ```r #screen reg is to show the result in your screen in a formatted way ``` --- #Why those partnered are happier than those singles ```r OLS_single <- lm_robust(formula = sat6 ~ age + nkids + health + religion, subset=(relstat_new2=="single"), data = wave1c) OLS_partner <- lm_robust(formula = sat6 ~ age + nkids + health + religion, subset=(relstat_new2=="partnered"), data = wave1c) #subset=() here is to specify which sample for the OLS screenreg(list(OLS_single,OLS_partner),include.ci = FALSE, digits = 3, custom.model.names = c("OLS for single","OLS for partnered"),#rename the models single.row =TRUE #to make the coef. and standard error in the same row ) ``` ``` ## ## ======================================================= ## OLS for single OLS for partnered ## ------------------------------------------------------- ## (Intercept) 7.038 (0.225) *** 5.947 (0.184) *** ## age -0.070 (0.006) *** -0.013 (0.005) ** ## nkids 0.076 (0.144) 0.060 (0.033) ## health 0.464 (0.042) *** 0.502 (0.031) *** ## religionYes 0.107 (0.082) 0.337 (0.061) *** ## ------------------------------------------------------- ## R^2 0.147 0.100 ## Adj. R^2 0.146 0.099 ## Num. obs. 2435 3410 ## RMSE 1.661 1.557 ## ======================================================= ## *** p < 0.001; ** p < 0.01; * p < 0.05 ``` ```r #screen reg is to show the result in your screen in a formatted way ``` <img src="https://github.com/fancycmn/25-Session3/blob/main/s5_f1.png?raw=true" width="50%" style="display: block; margin: auto;" > --- #Why those partnered are happier than those singles - Blinder-Oaxaca Decomposition - an econometric method - often used to explain the difference in the means of the dependent variable between two groups - A good example is the gender difference in wage: why men's average wage is higher than women's wage - because men are more educated, more experienced, working long hours than women? (composition difference) - or the return of education, experience, labor is higher for men than women? (discrimination) - today, we learn the so-call two-fold Blinder-Oaxaca decomposition --- # OLS regression `$$y_i= \alpha+\beta*x_i + e_i$$` -- - Group A (treated)=Those partnered, the life satisfaction can be modelled as the following `$$y_i^a= \alpha^a +\beta_1^a*age_i^a +... +\beta_4^a*religion_i^a+ e_i^a$$` -- - Then, Group A (treated) =Those partnered, the average life satisfaction is `$$\overline{y^a}= \alpha^a +\beta_1^a*\overline{age_i^a} + ...+\beta_4^a*\overline{religion_i^a}$$` -- - Group B (control)=Those singled, the life satisfaction can be modeled as the following `$$y_i^b= \alpha^b +\beta_1^b*age_i^b + ... + \beta_4^b*religion_i^b + e_i^b$$` -- - Then,Group B (control)=Those singled, the average life satisfaction is `$$\overline{y^b}= \alpha^b +\beta_1^b*\overline{age_i^b} + ...+\beta_4^b*\overline{religion_i^b}$$` --- #Blinder-Oaxaca Decomposition The difference between the average life satisfaction between those singles and partnered can be described: `$$\overline{y^a}-\overline{y^b}= \alpha^a - \alpha^b + \beta_1^a*\overline{age_i^a}- \beta_1^b*\overline{age_i^b} + ...+ \beta_4^a*\overline{religion_i^a}-\beta_4^b*\overline{religion_i^b}$$` some transformation `$$\overline{y^a}-\overline{y^b}=$$` `$$\alpha^a - \alpha^b +$$` `$$\color{red}\beta_\color{red}1^\color{red}b*(\overline{age_i^a}-\overline{age_i^b})+ \overline{age_i^a}*(\beta_1^a-\beta_1^b)+$$` `$$...+$$` `$$\color{red}\beta_\color{red}4^\color{red}b*(\overline{religion_i^a}-\overline{religion_i^b})+ \overline{religion_i^a}*(\beta_4^a-\beta_4^b)$$` *note: `\(a\)` represents "partnered group"; `\(b\)` represents "single group" --- #Blinder-Oaxaca Decomposition <img src="https://github.com/fancycmn/25-Session3/blob/main/s5_f5.png?raw=true" width="100%" style="display: block; margin: auto;" > --- #Blinder-Oaxaca Decomposition - explained part: attribute to cross-group differences in the explanatory variables. often called the composition effect, or endowment effect; - unexplained part: attribute to difference in the intercept + different in the regression coefficients.often called the coefficient effect; <img src="https://github.com/fancycmn/25-Session3/blob/main/s5_f6.png?raw=true" width="80%" style="display: block; margin: auto;" > --- #Blinder-Oaxaca Decomposition: coding - first step: identify group, here is "relstat_new3" ```r wave1c <- wave1c %>% mutate( relstat_new3=case_when( relstat_new2 %in% c("single") ~ TRUE, #oaxaca treat those TRUE as the control group,i.e. group B relstat_new2 %in% c("partnered") ~ FALSE) #oaxaca treat those FALSE as the treated group, i.e. group A ) ``` - second step: decompose. Remember, to use regression coefficients of Group B (control,i.e. single) as reference, i.e. `\(\color{red}\beta^\color{red}b\)`, we need to specific group.weights=0 ```r #oaxaca(#formula = y ~ x1 + x2.. | your group variable, data=your data) result <- oaxaca(formula = sat6 ~ age + nkids + health + religion | relstat_new3 , group.weights = 0, data = wave1c) ``` ``` ## oaxaca: oaxaca() performing analysis. Please wait. ``` ``` ## ## Bootstrapping standard errors: ``` ``` ## 1 / 100 (1%) ``` ``` ## 10 / 100 (10%) ``` ``` ## 20 / 100 (20%) ``` ``` ## 30 / 100 (30%) ``` ``` ## 40 / 100 (40%) ``` ``` ## 50 / 100 (50%) ``` ``` ## 60 / 100 (60%) ``` ``` ## 70 / 100 (70%) ``` ``` ## 80 / 100 (80%) ``` ``` ## 90 / 100 (90%) ``` ``` ## 100 / 100 (100%) ``` --- #Blinder-Oaxaca Decomposition: understand results ```r result #look at what the result of decomposition looks ``` ``` ## $beta ## $beta$beta.A ## (Intercept) age nkids health religionYes ## 5.94657309 -0.01288393 0.05987523 0.50224933 0.33670445 ## ## $beta$beta.B ## (Intercept) age nkids health religionYes ## 7.03838495 -0.07037895 0.07564373 0.46434948 0.10704506 ## ## $beta$beta.diff ## (Intercept) age nkids health religionYes ## -1.09181186 0.05749502 -0.01576851 0.03789985 0.22965939 ## ## $beta$beta.R ## (Intercept) age nkids health religionYes ## [1,] 0.0000000 7.038385 -0.07037895 0.07564373 0.4643495 0.1070451 ## [2,] 1.0000000 5.946573 -0.01288393 0.05987523 0.5022493 0.3367045 ## [3,] 0.5000000 6.492479 -0.04163144 0.06775948 0.4832994 0.2218748 ## [4,] 0.5834046 6.401417 -0.03683609 0.06644431 0.4864604 0.2410294 ## [5,] -1.0000000 6.109504 -0.02582568 0.19722216 0.4957332 0.2645122 ## [6,] -2.0000000 6.705597 -0.03890896 0.14254406 0.4953289 0.2609984 ## [7,] 0.0000000 7.038385 -0.07037895 0.07564373 0.4643495 0.1070451 ## ## ## $call ## oaxaca(formula = sat6 ~ age + nkids + health + religion | relstat_new3, ## data = wave1c, group.weights = 0) ## ## $n ## $n$n.A ## [1] 3410 ## ## $n$n.B ## [1] 2435 ## ## $n$n.pooled ## [1] 5845 ## ## ## $R ## [1] 100 ## ## $reg ## $reg$reg.A ## ## Call: ## NULL ## ## Coefficients: ## (Intercept) age nkids health religionYes ## 5.94657 -0.01288 0.05988 0.50225 0.33670 ## ## ## $reg$reg.B ## ## Call: ## NULL ## ## Coefficients: ## (Intercept) age nkids health religionYes ## 7.03838 -0.07038 0.07564 0.46435 0.10705 ## ## ## $reg$reg.pooled.1 ## ## Call: ## NULL ## ## Coefficients: ## (Intercept) age nkids health religionYes ## 6.10950 -0.02583 0.19722 0.49573 0.26451 ## ## ## $reg$reg.pooled.2 ## ## Call: ## NULL ## ## Coefficients: ## (Intercept) age nkids health ## 6.70560 -0.03891 0.14254 0.49533 ## religionYes relstat_new3TRUE ## 0.26100 -0.54465 ## ## ## ## $threefold ## $threefold$overall ## coef(endowments) se(endowments) coef(coefficients) se(coefficients) ## -0.60740111 0.13557500 0.39187202 0.05264004 ## coef(interaction) se(interaction) ## 0.48570318 0.13776063 ## ## $threefold$variables ## coef(endowments) se(endowments) coef(coefficients) se(coefficients) ## (Intercept) 0.000000000 0.000000000 -1.0918118635 0.253885841 ## age -0.622738578 0.054574435 1.1658856925 0.144769249 ## nkids 0.068858905 0.142748998 -0.0009195598 0.009373424 ## health -0.051432520 0.012033382 0.1478172132 0.193272198 ## religionYes -0.002088922 0.002529319 0.1709005412 0.083773119 ## coef(interaction) se(interaction) ## (Intercept) 0.000000000 0.000000000 ## age 0.508736888 0.065393110 ## nkids -0.014354159 0.145256582 ## health -0.004197883 0.005454114 ## religionYes -0.004481668 0.003987870 ## ## ## $twofold ## $twofold$overall ## group.weight coef(explained) se(explained) coef(unexplained) ## [1,] 0.0000000 -0.6074011 0.13557500 0.8775752 ## [2,] 1.0000000 -0.1216979 0.03378946 0.3918720 ## [3,] 0.5000000 -0.3645495 0.07082835 0.6347236 ## [4,] 0.5834046 -0.3240396 0.08115709 0.5942137 ## [5,] -1.0000000 -0.1090531 0.02572224 0.3792272 ## [6,] -2.0000000 -0.2744792 0.03222091 0.5446532 ## [7,] 0.0000000 -0.6074011 0.13557500 0.8775752 ## se(unexplained) coef(unexplained A) se(unexplained A) coef(unexplained B) ## [1,] 0.14207581 8.775752e-01 1.420758e-01 0.0000000 ## [2,] 0.05264004 0.000000e+00 0.000000e+00 0.3918720 ## [3,] 0.08205947 4.387876e-01 7.103790e-02 0.1959360 ## [4,] 0.09120432 3.655938e-01 8.288768e-02 0.2286199 ## [5,] 0.03733074 1.579843e-01 1.587636e-02 0.2212429 ## [6,] 0.05173187 -1.841235e-13 1.438947e-13 0.5446532 ## [7,] 0.14207581 8.775752e-01 1.420758e-01 0.0000000 ## se(unexplained B) ## [1,] 0.00000000 ## [2,] 0.05264004 ## [3,] 0.02632002 ## [4,] 0.02192960 ## [5,] 0.02193859 ## [6,] 0.05173187 ## [7,] 0.00000000 ## ## $twofold$variables ## $twofold$variables[[1]] ## group.weight coef(explained) se(explained) coef(unexplained) ## (Intercept) 0 0.000000000 0.000000000 -1.09181186 ## age 0 -0.622738578 0.054574435 1.67462258 ## nkids 0 0.068858905 0.142748998 -0.01527372 ## health 0 -0.051432520 0.012033382 0.14361933 ## religionYes 0 -0.002088922 0.002529319 0.16641887 ## se(unexplained) coef(unexplained A) se(unexplained A) ## (Intercept) 0.25388584 -1.09181186 0.25388584 ## age 0.20919778 1.67462258 0.20919778 ## nkids 0.15458849 -0.01527372 0.15458849 ## health 0.18799208 0.14361933 0.18799208 ## religionYes 0.08166684 0.16641887 0.08166684 ## coef(unexplained B) se(unexplained B) ## (Intercept) 0 0 ## age 0 0 ## nkids 0 0 ## health 0 0 ## religionYes 0 0 ## ## $twofold$variables[[2]] ## group.weight coef(explained) se(explained) coef(unexplained) ## (Intercept) 1 0.00000000 0.000000000 -1.0918118635 ## age 1 -0.11400169 0.034992847 1.1658856925 ## nkids 1 0.05450475 0.028104487 -0.0009195598 ## health 1 -0.05563040 0.012540701 0.1478172132 ## religionYes 1 -0.00657059 0.004763773 0.1709005412 ## se(unexplained) coef(unexplained A) se(unexplained A) ## (Intercept) 0.253885841 0 0 ## age 0.144769249 0 0 ## nkids 0.009373424 0 0 ## health 0.193272198 0 0 ## religionYes 0.083773119 0 0 ## coef(unexplained B) se(unexplained B) ## (Intercept) -1.0918118635 0.253885841 ## age 1.1658856925 0.144769249 ## nkids -0.0009195598 0.009373424 ## health 0.1478172132 0.193272198 ## religionYes 0.1709005412 0.083773119 ## ## $twofold$variables[[3]] ## group.weight coef(explained) se(explained) coef(unexplained) ## (Intercept) 0.5 0.000000000 0.00000000 -1.091811864 ## age 0.5 -0.368370134 0.03213051 1.420254137 ## nkids 0.5 0.061681825 0.07286083 -0.008096639 ## health 0.5 -0.053531462 0.01198328 0.145718271 ## religionYes 0.5 -0.004329756 0.00325111 0.168659707 ## se(unexplained) coef(unexplained A) se(unexplained A) ## (Intercept) 0.25388584 -0.545905932 0.12694292 ## age 0.17689505 0.837311290 0.10459889 ## nkids 0.08196257 -0.007636859 0.07729424 ## health 0.19063092 0.071809665 0.09399604 ## religionYes 0.08270265 0.083209437 0.04083342 ## coef(unexplained B) se(unexplained B) ## (Intercept) -0.5459059318 0.126942920 ## age 0.5829428463 0.072384625 ## nkids -0.0004597799 0.004686712 ## health 0.0739086066 0.096636099 ## religionYes 0.0854502706 0.041886559 ## ## $twofold$variables[[4]] ## group.weight coef(explained) se(explained) coef(unexplained) ## (Intercept) 0.5834046 0.000000000 0.00000000 -1.091811864 ## age 0.5834046 -0.325939128 0.03476201 1.377823130 ## nkids 0.5834046 0.060484622 0.08419713 -0.006899436 ## health 0.5834046 -0.053881584 0.01194847 0.146068394 ## religionYes 0.5834046 -0.004703547 0.00305306 0.169033499 ## se(unexplained) coef(unexplained A) se(unexplained A) ## (Intercept) 0.25388584 -0.454843779 0.14811817 ## age 0.18227368 0.697640031 0.12204695 ## nkids 0.09407699 -0.006362961 0.09018764 ## health 0.19019056 0.059831149 0.10967545 ## religionYes 0.08252742 0.069329334 0.04764481 ## coef(unexplained B) se(unexplained B) ## (Intercept) -0.6369680846 0.105767668 ## age 0.6801830986 0.060310200 ## nkids -0.0005364754 0.003904925 ## health 0.0862372450 0.080516305 ## religionYes 0.0997041652 0.034899494 ## ## $twofold$variables[[5]] ## group.weight coef(explained) se(explained) coef(unexplained) ## (Intercept) -1 0.00000000 0.000000000 -1.0918119 ## age -1 -0.22851506 0.028544991 1.2803991 ## nkids -1 0.17953241 0.028338986 -0.1259472 ## health -1 -0.05490866 0.012016393 0.1470955 ## religionYes -1 -0.00516180 0.003687676 0.1694918 ## se(unexplained) coef(unexplained A) se(unexplained A) ## (Intercept) 0.25388584 -0.16293056 0.12247469 ## age 0.17061320 0.37694667 0.09199642 ## nkids 0.02158975 -0.13303722 0.01556796 ## health 0.19109404 0.02469249 0.07927063 ## religionYes 0.08289376 0.05231292 0.03395066 ## coef(unexplained B) se(unexplained B) ## (Intercept) -0.928881301 0.142790443 ## age 0.903452391 0.091166905 ## nkids 0.007089994 0.008933804 ## health 0.122402984 0.114545031 ## religionYes 0.117178832 0.050120768 ## ## $twofold$variables[[6]] ## group.weight coef(explained) se(explained) coef(unexplained) ## (Intercept) -2 0.000000000 0.000000000 -1.09181186 ## age -2 -0.344280679 0.030421263 1.39616468 ## nkids -2 0.129758640 0.028520507 -0.07617345 ## health -2 -0.054863884 0.012282192 0.14705069 ## religionYes -2 -0.005093231 0.003751362 0.16942318 ## se(unexplained) coef(unexplained A) se(unexplained A) ## (Intercept) 0.25388584 -0.75902411 0.10827295 ## age 0.17246244 0.75801540 0.09251316 ## nkids 0.02057542 -0.08007483 0.01372315 ## health 0.19106955 0.02622441 0.07907386 ## religionYes 0.08290507 0.05485913 0.03371728 ## coef(unexplained B) se(unexplained B) ## (Intercept) -0.332787750 0.154066808 ## age 0.638149284 0.084030240 ## nkids 0.003901374 0.008849679 ## health 0.120826281 0.113495881 ## religionYes 0.114564050 0.049932253 ## ## $twofold$variables[[7]] ## group.weight coef(explained) se(explained) coef(unexplained) ## (Intercept) 0 0.000000000 0.000000000 -1.09181186 ## age 0 -0.622738578 0.054574435 1.67462258 ## nkids 0 0.068858905 0.142748998 -0.01527372 ## health 0 -0.051432520 0.012033382 0.14361933 ## religionYes 0 -0.002088922 0.002529319 0.16641887 ## se(unexplained) coef(unexplained A) se(unexplained A) ## (Intercept) 0.25388584 -1.09181186 0.25388584 ## age 0.20919778 1.67462258 0.20919778 ## nkids 0.15458849 -0.01527372 0.15458849 ## health 0.18799208 0.14361933 0.18799208 ## religionYes 0.08166684 0.16641887 0.08166684 ## coef(unexplained B) se(unexplained B) ## (Intercept) 0 0 ## age 0 0 ## nkids 0 0 ## health 0 0 ## religionYes 0 0 ## ## ## ## $x ## $x$x.mean.A ## (Intercept) age nkids health religionYes ## 1.0000000 29.1263930 0.9686217 3.7894428 0.7246334 ## ## $x$x.mean.B ## (Intercept) age nkids health religionYes ## 1.00000000 20.27802875 0.05831622 3.90020534 0.74414784 ## ## $x$x.mean.diff ## (Intercept) age nkids health religionYes ## 0.00000000 8.84836421 0.91030548 -0.11076252 -0.01951441 ## ## ## $y ## $y$y.A ## [1] 7.77654 ## ## $y$y.B ## [1] 7.506366 ## ## $y$y.diff ## [1] 0.2701741 ## ## ## attr(,"class") ## [1] "oaxaca" ``` --- #Blinder-Oaxaca Decomposition: understand results how to read the result? you need to know what is [list in R](https://www.youtube.com/watch?v=X8lNTDeiKiE&t=171s) ```r #I want to see the resutl that uses the equally weighted average coefficent to decompose variable <- result$twofold$variables[[1]] overall <- result$twofold$overall[1,] result2<- rbind(variable, overall)[,c(1:5)] #rbind is to bind rows of "variable" and "overall" dataset. [,c(1:5)] after rbind is to select column 1 to 5. result2 ``` ``` ## group.weight coef(explained) se(explained) coef(unexplained) ## (Intercept) 0 0.000000000 0.000000000 -1.09181186 ## age 0 -0.622738578 0.054574435 1.67462258 ## nkids 0 0.068858905 0.142748998 -0.01527372 ## health 0 -0.051432520 0.012033382 0.14361933 ## religionYes 0 -0.002088922 0.002529319 0.16641887 ## overall 0 -0.607401115 0.135574997 0.87757520 ## se(unexplained) ## (Intercept) 0.25388584 ## age 0.20919778 ## nkids 0.15458849 ## health 0.18799208 ## religionYes 0.08166684 ## overall 0.14207581 ``` --- #Blinder-Oaxaca Decomposition: visualize results .pull-left[ ```r plot(result, decomposition = "twofold", group.weight = 0, type="overall", title = "Overall") ``` <img src="https://github.com/fancycmn/25-Session3/blob/main/s5_f2.png?raw=true" width="80%" style="display: block; margin: auto;" > ] .pull-right[ ```r plot(result, decomposition = "twofold", group.weight =0, type="variables", title="by variable") ``` <img src="https://github.com/fancycmn/25-Session3/blob/main/s5_f3.png?raw=true" width="80%" style="display: block; margin: auto;" > ] --- #Take home 1. Exploratory analysis: two-fold oaxaca decomposition 2. Two-fold Oaxaca decomposition - explained part - unexplained part - reference set of coefficient in OB decomposition 4. Important code - oaxaca(formula = y ~ x1 + x2 + x3 | your group var , data = your dataset) - plot(result, decomposition = "twofold", type="", group.weight =, title = "") --- class: center, middle #[Exercise](https://rpubs.com/fancycmn/1348401)