You have 7 small tasks. You may work in groups to complete them in class. The answers provided here are not the only correct ones; you are welcome to code them your own way.
Import anchor1_50percent_Eng data
#Question: Import anchor1_50percent_Eng data
library(tidyverse)
library(haven)
wave1 <- read_dta("anchor1_50percent_Eng.dta")
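If `read_dta()` cannot find the file, the usual cause is that the file is not in the working directory. The sketch below illustrates the import step on a small made-up data frame written to a temporary file; `toy`, `path` and `back` are hypothetical names used only for illustration, not part of the course data.

```r
library(haven)

# Hypothetical stand-in for the real file: a tiny data frame saved as .dta
toy <- data.frame(id = c(1, 2, 3), age = c(16, 35, 27))
path <- tempfile(fileext = ".dta")
write_dta(toy, path)

# Check that the file exists before importing it
file.exists(path)

# read_dta() returns a tibble with the same columns
back <- read_dta(path)
names(back)
nrow(back)
```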
Keep the variables id, age, sex_gen, cohort, homosex_new, yeduc, and relstat, as well as one variable that reflects attitudes towards family and one variable that reflects subjective wellbeing. The function that allows you to do this is called “______”. Save the result as a new dataset.
We show the data in this tab.
#Question: Keep the variables id, age, sex_gen, cohort, homosex_new, yeduc, and relstat, as well as one variable that reflects attitudes towards family and one variable that reflects subjective wellbeing. The function that allows you to do this is called "______". Save the result as a new dataset.
(wave1a <- select(wave1, id, age, sex_gen, cohort, homosex_new, yeduc,
relstat, val1i7,sat6))
## # A tibble: 6,201 × 9
## id age sex_gen cohort homose…¹ yeduc relstat val1i7 sat6
## <dbl> <dbl+lbl> <dbl+lb> <dbl+l> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl>
## 1 267206000 16 2 [2 Fe… 1 [1 1… -1 [-1 … 0 [0 c… 1 [1 N… 3 7
## 2 112963000 35 1 [1 Ma… 3 [3 1… -1 [-1 … 10.5 1 [1 N… 2 6
## 3 327937000 16 2 [2 Fe… 1 [1 1… 0 [0 H… 0 [0 c… -7 [-7 … 4 8
## 4 318656000 27 2 [2 Fe… 2 [2 1… 0 [0 H… 11.5 4 [4 M… 5 [5 A… 9
## 5 717889000 37 1 [1 Ma… 3 [3 1… 0 [0 H… 11.5 4 [4 M… 4 7
## 6 222517000 15 1 [1 Ma… 1 [1 1… -1 [-1 … 0 [0 c… 1 [1 N… 5 [5 A… 9
## 7 144712000 16 2 [2 Fe… 1 [1 1… -1 [-1 … 0 [0 c… 1 [1 N… 4 8
## 8 659357000 17 2 [2 Fe… 1 [1 1… 0 [0 H… 0 [0 c… 2 [2 N… 5 [5 A… 7
## 9 506367000 37 1 [1 Ma… 3 [3 1… 0 [0 H… 10.5 4 [4 M… 1 [1 D… 9
## 10 64044000 15 2 [2 Fe… 1 [1 1… -1 [-1 … 0 [0 c… 1 [1 N… 1 [1 D… 7
## # … with 6,191 more rows, and abbreviated variable name ¹homosex_new
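A minimal sketch of how `select()` keeps columns, using a made-up tibble (`toy` and its columns are assumptions for illustration, not part of the course data):

```r
library(dplyr)

# Made-up data with one column we do not want to keep
toy <- tibble(id = 1:3, age = c(16, 35, 27), extra = c("a", "b", "c"))

# Keep only the named columns, in the order given
kept <- select(toy, id, age)
names(kept)

# Wrapping an assignment in parentheses also prints the result
(kept <- select(toy, id, age))
```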
Change the variables into numeric and factors appropriately. For that, you need the two functions, “______” for numeric and “______” for categorical variables.
#Question: Change the variables into numeric and factors appropriately. For that, you need the two functions, "______" for numeric and "______" for categorical variables.
wave1a <- mutate(wave1a,
id=zap_labels(id),
age=zap_labels(age),
yeduc=zap_labels(yeduc),
sat6=zap_labels(sat6),
sex_gen=as_factor(sex_gen),
cohort=as_factor(cohort),
homosex_new=as_factor(homosex_new),
relstat=as_factor(relstat),
val1i7=as_factor(val1i7)
)
wave1a
## # A tibble: 6,201 × 9
## id age sex_gen cohort homosex_new yeduc relstat val1i7 sat6
## <dbl> <dbl> <fct> <fct> <fct> <dbl> <fct> <fct> <dbl>
## 1 267206000 16 2 Female 1 1991-1993 -1 No partner 0 1 Neve… 3 7
## 2 112963000 35 1 Male 3 1971-1973 -1 No partner 10.5 1 Neve… 2 6
## 3 327937000 16 2 Female 1 1991-1993 0 Hetero 0 -7 Inc… 4 8
## 4 318656000 27 2 Female 2 1981-1983 0 Hetero 11.5 4 Marr… 5 Agr… 9
## 5 717889000 37 1 Male 3 1971-1973 0 Hetero 11.5 4 Marr… 4 7
## 6 222517000 15 1 Male 1 1991-1993 -1 No partner 0 1 Neve… 5 Agr… 9
## 7 144712000 16 2 Female 1 1991-1993 -1 No partner 0 1 Neve… 4 8
## 8 659357000 17 2 Female 1 1991-1993 0 Hetero 0 2 Neve… 5 Agr… 7
## 9 506367000 37 1 Male 3 1971-1973 0 Hetero 10.5 4 Marr… 1 Dis… 9
## 10 64044000 15 2 Female 1 1991-1993 -1 No partner 0 1 Neve… 1 Dis… 7
## # … with 6,191 more rows
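The difference between the two conversions can be seen on a small labelled vector. This is a sketch on made-up values; `sex`, `sex_num` and `sex_fct` are hypothetical names, and the labels here carry no number prefix, unlike those in the course data.

```r
library(haven)

# Hypothetical labelled vector, similar in spirit to sex_gen
sex <- labelled(c(1, 2, 2), labels = c(Male = 1, Female = 2))

# zap_labels() drops the value labels and keeps the underlying numbers
sex_num <- zap_labels(sex)
class(sex_num)

# as_factor() turns the value labels into factor levels
sex_fct <- as_factor(sex)
levels(sex_fct)
```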
#Question: Z-standardize yeduc and age.
# First, check whether the values of those variables make sense.
summary(wave1a$yeduc)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -7.000 0.000 11.000 8.933 13.000 20.000
summary(wave1a$age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.00 17.00 26.00 25.84 35.00 38.00
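# yeduc has negative values, which are not valid years of education,
# so set them to NA before standardizing.
wave1a <- mutate(wave1a,
  yeduc = case_when(
    yeduc < 0 ~ NA_real_,
    TRUE ~ as.numeric(yeduc)
  ),
  # z-score: subtract the mean, then divide by the standard deviation
  z_yeduc = (yeduc - mean(yeduc, na.rm = TRUE)) / sd(yeduc, na.rm = TRUE),
  z_age = (age - mean(age)) / sd(age)
)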
summary(wave1a$z_yeduc)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -1.4513 -1.4513 0.3226 0.0000 0.6451 1.7739 26
summary(wave1a$z_age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.41586 -1.05703 0.01946 0.00000 1.09595 1.45478
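As a cross-check on the manual formula, base R's `scale()` performs the same centering and scaling. The sketch below uses made-up values (the vector `yeduc` here is a stand-in, not the course data) and treats negative values as missing, as in the answer above.

```r
# Made-up years of education; -7 stands for a missing-value code
yeduc <- c(-7, 0, 10.5, 11.5, 13, 20)

# Recode negative codes to NA before standardizing
yeduc[yeduc < 0] <- NA

# Manual z-score: (x - mean) / sd, ignoring missing values
z_manual <- (yeduc - mean(yeduc, na.rm = TRUE)) / sd(yeduc, na.rm = TRUE)

# scale() centers and scales in one step; as.numeric() drops the matrix attributes
z_scale <- as.numeric(scale(yeduc))

all.equal(z_manual, z_scale)
mean(z_manual, na.rm = TRUE)  # approximately 0
sd(z_manual, na.rm = TRUE)    # approximately 1
```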
Show the frequency of relstat. Recode relstat to “no partner” when the person is never married, regardless of cohabiting status, and to “with partner” when the person is married, regardless of cohabiting status. Show the frequency of relstat again. What are the frequencies for “no partner” and “with partner”?
#Question: Show the frequency of relstat. Recode relstat to "no partner" when the person is never married, regardless of cohabiting status, and to "with partner" when the person is married, regardless of cohabiting status. Show the frequency of relstat again. What are the frequencies for "no partner" and "with partner"?
table(wave1a$relstat)
##
## -7 Incomplete data 1 Never married single
## 34 2448
## 2 Never married LAT 3 Never married COHAB
## 1012 660
## 4 Married COHAB 5 Married noncohabiting
## 1735 23
## 6 Divorced/separated single 7 Divorced/separated LAT
## 146 63
## 8 Divorced/separated COHAB 9 Widowed single
## 76 3
## 10 Widowed LAT 11 Widowed COHAB
## 1 0
wave1a <- mutate(wave1a,
relstat = case_when(
relstat %in% c("1 Never married single" ,"2 Never married LAT", "3 Never married COHAB") ~ "no partner",
relstat %in% c("4 Married COHAB" ,"5 Married noncohabiting") ~ "with partner",
TRUE ~ as.character(relstat)
),
relstat = factor(relstat)
)
table(wave1a$relstat)
##
## -7 Incomplete data 10 Widowed LAT
## 34 1
## 6 Divorced/separated single 7 Divorced/separated LAT
## 146 63
## 8 Divorced/separated COHAB 9 Widowed single
## 76 3
## no partner with partner
## 4120 1758
# Remember: %in% is an operator that means "belongs to"
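A short sketch of what `%in%` returns, using a few category names from the table above; the vectors `status` and `never_married` are hypothetical names for illustration.

```r
status <- c("1 Never married single", "4 Married COHAB", "9 Widowed single")
never_married <- c("1 Never married single", "2 Never married LAT", "3 Never married COHAB")

# TRUE where the element of status appears anywhere in never_married
in_set <- status %in% never_married
in_set

# Equivalent, but longer, with == and |
by_hand <- status == never_married[1] |
  status == never_married[2] |
  status == never_married[3]
identical(in_set, by_hand)
```

Using `%in%` keeps the condition short when a category has to be matched against several values at once.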
Now look up prop.table() and find out how to use it. Check which cohort has the highest proportion of people reporting a homosexual orientation.
#Question: Now look up prop.table() and find out how to use it. Check which cohort has the highest proportion of people reporting a homosexual orientation.
# margin = 1 gives row proportions: for each orientation, the share falling in each cohort
prop.table(table(wave1a$homosex_new, wave1a$cohort), margin = 1)
##
## -7 Incomplete data 0 former capikid first interview
## -7 Incomplete data
## -1 No partner 0.0000000 0.0000000
## 0 Hetero 0.0000000 0.0000000
## 1 Gay 0.0000000 0.0000000
## 2 Lesbian 0.0000000 0.0000000
##
## 1 1991-1993 2 1981-1983 3 1971-1973 4 2001-2003
## -7 Incomplete data
## -1 No partner 0.6233218 0.2408899 0.1357883 0.0000000
## 0 Hetero 0.1520633 0.3835500 0.4643867 0.0000000
## 1 Gay 0.1200000 0.5200000 0.3600000 0.0000000
## 2 Lesbian 0.2258065 0.4838710 0.2903226 0.0000000
##
## 9 former capikid re-interview
## -7 Incomplete data
## -1 No partner 0.0000000
## 0 Hetero 0.0000000
## 1 Gay 0.0000000
## 2 Lesbian 0.0000000
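The `margin` argument decides which direction sums to 1. Because the question asks about shares within each cohort, column proportions may be the more direct comparison when cohort forms the columns. The sketch below uses made-up counts (not the course data) to show both options; all object names are hypothetical.

```r
# Made-up counts: rows are orientation, columns are cohort
counts <- matrix(c(150, 380, 460,
                     3,  13,   9),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(orientation = c("Hetero", "Gay"),
                                 cohort = c("1991-1993", "1981-1983", "1971-1973")))
tab <- as.table(counts)

# margin = 1: each row sums to 1 (distribution of cohorts within an orientation)
row_prop <- prop.table(tab, margin = 1)
rowSums(row_prop)

# margin = 2: each column sums to 1 (distribution of orientations within a cohort)
col_prop <- prop.table(tab, margin = 2)
colSums(col_prop)

# Cohort with the largest share of "Gay" respondents among its own members
top_cohort <- names(which.max(col_prop["Gay", ]))
top_cohort
```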
Show a frequency table for variable “val1i7” for the whole sample, and then show a frequency table for people who are aged >30
#Question: Show a frequency table for variable "val1i7" for the whole sample, and then show a frequency table for people who are aged >30
prop.table(table(wave1a$val1i7))
##
## -5 Inconsistent value -4 Filter error / Incorrect entry
## 0.000000000 0.000000000
## -3 Does not apply -2 No answer
## 0.000000000 0.001612643
## -1 Don't know 1 Disagree completely
## 0.008547009 0.147234317
## 2 3
## 0.129011450 0.214159007
## 4 5 Agree completely
## 0.185776488 0.313659087
prop.table(table(wave1a$val1i7[wave1a$age > 30]))
##
## -5 Inconsistent value -4 Filter error / Incorrect entry
## 0.000000000 0.000000000
## -3 Does not apply -2 No answer
## 0.000000000 0.002481390
## -1 Don't know 1 Disagree completely
## 0.006947891 0.211910670
## 2 3
## 0.133995037 0.216377171
## 4 5 Agree completely
## 0.151364764 0.276923077
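The tables above still list categories that never occur. One optional way to tidy this is base R's `droplevels()`, which removes unused factor levels before tabulating. The sketch below uses made-up values; `val`, `age` and `older` are stand-ins, not the course data.

```r
# Made-up factor with a level that never occurs
val <- factor(c("1 Disagree completely", "3", "5 Agree completely", "3"),
              levels = c("-3 Does not apply", "1 Disagree completely",
                         "3", "5 Agree completely"))
age <- c(16, 35, 37, 27)

table(val)              # includes the empty level with a count of 0
table(droplevels(val))  # empty level removed

# Subset to respondents older than 30, then drop levels that no longer occur
older <- droplevels(val[age > 30])
table(older)
```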