This R Markdown is to accompany our (Gurleen Bassy, Mateja Dokic, Katherine Liu) submission for the Final Report. All the tables and visualizations used in our submission along with the code behind them are found in this Markdown.
After we set up our data set and all necessary packages for our visualizations and regressions, we will take a look at our variables. We are working with variables ‘polviews’, ‘reliten’, and ‘sexeduc’. These variables represent ideological position, religiosity, and attitudes on public school sex education respectively.
# Check the initial state of the variables
table(gss$polviews)
##
## extremely liberal liberal
## 2081 7623
## slightly liberal moderate, middle of the road
## 7900 23992
## slightly conservative conservative
## 9596 9361
## extremely conservative don't know
## 2165 0
## iap I don't have a job
## 0 0
## dk, na, iap no answer
## 0 0
## not imputable_(2147483637) not imputable_(2147483638)
## 0 0
## refused skipped on web
## 0 0
## uncodeable not available in this release
## 0 0
## not available in this year see codebook
## 0 0
unique(gss$polviews)
## [1] <NA> moderate, middle of the road
## [3] slightly conservative conservative
## [5] liberal extremely conservative
## [7] slightly liberal extremely liberal
## 20 Levels: extremely liberal liberal ... see codebook
table(gss$reliten)
##
## strong not very strong
## 22652 23738
## somewhat strong (vol.) no religion
## 5736 7629
## don't know iap
## 0 0
## I don't have a job dk, na, iap
## 0 0
## no answer not imputable_(2147483637)
## 0 0
## not imputable_(2147483638) refused
## 0 0
## skipped on web uncodeable
## 0 0
## not available in this release not available in this year
## 0 0
## see codebook
## 0
unique(gss$reliten)
## [1] <NA> strong not very strong
## [4] somewhat strong (vol.) no religion
## 17 Levels: strong not very strong somewhat strong (vol.) ... see codebook
table(gss$sexeduc)
##
## favor oppose
## 35639 5127
## depends on age/grade (vol.) don't know
## 9 0
## iap I don't have a job
## 0 0
## dk, na, iap no answer
## 0 0
## not imputable_(2147483637) not imputable_(2147483638)
## 0 0
## refused skipped on web
## 0 0
## uncodeable not available in this release
## 0 0
## not available in this year see codebook
## 0 0
unique(gss$sexeduc)
## [1] <NA> favor
## [3] oppose depends on age/grade (vol.)
## 16 Levels: favor oppose depends on age/grade (vol.) don't know ... see codebook
table(gss$educ)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## 177 49 158 268 326 410 866 896 2786 2172 3010 3942 21401
## 13 14 15 16 17 18 19 20
## 5905 8208 3307 9994 2392 2945 1112 1803
unique(gss$educ)
## [1] 16 10 12 17 14 13 6 9 8 11 7 15 20 18 3 2 4 5 19 1 NA 0
table(gss$childs)
##
## 0 1 2 3 4 5 6 7 8
## 19927 11495 18138 11196 5753 2573 1328 718 1001
unique(gss$childs)
## [1] 0 5 4 2 1 3 6 8 7 NA
Lets recode some of them and clean them up, minding syntax.
# Recoding polviews into three categories: Liberal, Moderate, Conservative
gss <- gss %>%
mutate(polviews_recoded = case_when(
polviews %in% c("extremely liberal", "liberal", "slightly liberal") ~ "Liberal",
polviews %in% c("moderate, middle of the road") ~ "Moderate",
polviews %in% c("extremely conservative", "conservative", "slightly conservative") ~ "Conservative",
TRUE ~ NA_character_
)) %>%
mutate(polviews_recoded = factor(polviews_recoded, levels = c("Liberal", "Moderate", "Conservative"))) %>%
filter(!is.na(polviews_recoded))
# Check the recoding
gss %>%
count(polviews_recoded)
## polviews_recoded n
## 1 Liberal 17604
## 2 Moderate 23992
## 3 Conservative 21122
# Recoding sexeduc into a dichotomous variable
gss <- gss %>%
mutate(sexeduc_recoded = case_when(
sexeduc %in% c("favor") ~ "In Favour",
sexeduc %in% c("oppose") ~ "Oppose",
TRUE ~ NA_character_
)) %>%
mutate(sexeduc_recoded = factor(sexeduc_recoded, levels = c("Oppose", "In Favour"))) %>%
filter(!is.na(sexeduc_recoded))
# Check the recoding
gss %>%
count(sexeduc_recoded)
## sexeduc_recoded n
## 1 Oppose 4625
## 2 In Favour 33833
# Recoding reliten into different levels of religiosity
gss <- gss %>%
mutate(reliten_recoded = case_when(
reliten %in% c("no religion") ~ "No Religion",
reliten %in% c("not very strong") ~ "Low Religiosity",
reliten %in% c("somewhat strong (vol1.)", "strong") ~ "High Religiosity",
TRUE ~ NA_character_
)) %>%
mutate(reliten_recoded = factor(reliten_recoded, levels = c("No Religion", "Low Religiosity", "High Religiosity"))) %>%
filter(!is.na(reliten_recoded))
# Check the recoding
gss %>%
count(reliten_recoded)
## reliten_recoded n
## 1 No Religion 4251
## 2 Low Religiosity 13172
## 3 High Religiosity 12117
# Recoding educ into meaningful categories
gss <- gss %>%
mutate(educ_recoded = case_when(
educ >= 0 & educ <= 11 ~ "Did Not Complete Highschool",
educ == 12 ~ "High School Graduate",
educ >= 13 & educ <= 16 ~ "College/University Graduate or Less",
educ >= 17 & educ <= 19 ~ "Graduate Level and Above",
TRUE ~ NA_character_
)) %>%
mutate(educ_recoded = factor(educ_recoded, levels = c(
"Did Not Complete Highschool",
"High School Graduate",
"College/University Graduate or Less",
"Graduate Level and Above"))) %>%
filter(!is.na(educ_recoded))
# Check the recoding
gss %>%
count(educ_recoded)
## educ_recoded n
## 1 Did Not Complete Highschool 5840
## 2 High School Graduate 8938
## 3 College/University Graduate or Less 11430
## 4 Graduate Level and Above 2576
#Recoding childs
gss <- gss %>%
mutate(childs_recoded = case_when(
childs == 0 ~ "Childless",
childs >= 1 & childs <= 8 ~ "Has Children",
TRUE ~ NA_character_
)) %>%
mutate(childs_recoded = factor(childs_recoded, levels = c(
"Childless",
"Has Children"
))) %>%
filter(!is.na(childs_recoded))
#Check recoding
gss %>%
count(childs_recoded)
## childs_recoded n
## 1 Childless 7932
## 2 Has Children 20798
We will need to convert our variables to numeric ones in order to make our descriptive statistics table and our regression table later on.
# Converting categorical variables to numeric for analysis purposes
gss <- gss %>%
mutate(
polviews_recoded_numeric = case_when(
polviews_recoded == "Liberal" ~ 0,
polviews_recoded == "Moderate" ~ 1,
polviews_recoded == "Conservative" ~ 2,
TRUE ~ NA_real_
),
reliten_recoded_numeric = case_when(
reliten_recoded == "No Religion" ~ 0,
reliten_recoded == "Low Religiosity" ~ 1,
reliten_recoded == "High Religiosity" ~ 2,
TRUE ~ NA_real_
),
sexeduc_recoded_numeric = case_when(
sexeduc_recoded == "Oppose" ~ 0,
sexeduc_recoded == "In Favour" ~ 1,
TRUE ~ NA_real_
),
educ_recoded_numeric = case_when(
educ_recoded == "Did Not Complete Highschool" ~ 0,
educ_recoded == "High School Graduate" ~ 1,
educ_recoded == "College/University Graduate or Less" ~ 2,
educ_recoded == "Graduate Level and Above" ~ 3,
TRUE ~ NA_real_
),
childs_recoded_numeric = case_when(
childs_recoded == "Childless" ~ 0,
childs_recoded == "Has Children" ~ 1,
TRUE ~ NA_real_
)
)
# Renaming variables for clarity
gss_renamed <- gss %>%
rename(
"Ideological Position" = polviews_recoded_numeric,
"Religiosity" = reliten_recoded_numeric,
"Attitude on Public School Sex Education" = sexeduc_recoded_numeric,
"Education" = educ_recoded_numeric,
"Parenthood" = childs_recoded_numeric
)
# Creating our descriptive statistic summary table with datasummary_skim
datasummary_skim(
gss_renamed %>% select(
"Ideological Position",
"Religiosity",
"Attitude on Public School Sex Education",
"Education",
"Parenthood"
),
histogram = TRUE
)
| Unique | Missing Pct. | Mean | SD | Min | Median | Max | Histogram | |
|---|---|---|---|---|---|---|---|---|
| Ideological Position | 3 | 0 | 1.1 | 0.8 | 0.0 | 1.0 | 2.0 | |
| Religiosity | 3 | 0 | 1.3 | 0.7 | 0.0 | 1.0 | 2.0 | |
| Attitude on Public School Sex Education | 2 | 0 | 0.9 | 0.3 | 0.0 | 1.0 | 1.0 | |
| Education | 4 | 0 | 1.4 | 0.9 | 0.0 | 1.0 | 3.0 | |
| Parenthood | 2 | 0 | 0.7 | 0.4 | 0.0 | 1.0 | 1.0 |
We will be making 2 visualizations of our data, a diverging stacked bar chart and a grouped bar plot. For this part we need to make certain we use our original categorical variables or else our chosen visualizations will not work.
# Diverging stacked bar chart using the original categorical variables
ggplot(gss, aes(x = reliten_recoded, fill = sexeduc_recoded)) +
geom_bar(position = "fill", width = 0.7) +
facet_wrap(~ polviews_recoded, ncol = 3) +
scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
labs(
title = "Attitudes on Sex Education by Religiosity\nand Ideological Position",
x = "Religiosity",
y = "Proportion",
fill = "Attitude on Sex Education"
) +
coord_flip() +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "top",
legend.title = element_blank(),
strip.background = element_rect(fill = "#f0f0f0", color = NA),
strip.text = element_text(face = "bold"),
plot.margin = margin(20, 20, 20, 20)
)
ggplot(gss, aes(x = educ_recoded, fill = sexeduc_recoded)) +
geom_bar(position = "fill", width = 0.7) +
facet_wrap(~ polviews_recoded, ncol = 3) +
scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
labs(
title = "Attitudes on Sex Education by\nEducation and Ideological Position",
x = "Education",
y = "Proportion",
fill = "Attitude on Sex Education"
) +
coord_flip() +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "top",
legend.title = element_blank(),
strip.background = element_rect(fill = "#f0f0f0", color = NA),
strip.text = element_text(face = "bold"),
plot.margin = margin(20, 20, 20, 20)
)
ggplot(gss, aes(x = childs_recoded, fill = sexeduc_recoded)) +
geom_bar(position = "fill", width = 0.7) +
facet_wrap(~ polviews_recoded, ncol = 3) +
scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
labs(
title = "Attitudes on Sex Education by Parenthood\nand Ideological Position",
x = "Parenthood",
y = "Proportion",
fill = "Attitude on Sex Education"
) +
coord_flip() +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "top",
legend.title = element_blank(),
strip.background = element_rect(fill = "#f0f0f0", color = NA),
strip.text = element_text(face = "bold"),
plot.margin = margin(20, 20, 20, 20)
)
# Our last visualization will be an interaction graph to explore the interaction between ideological position and religiosity
m.interact <- lm(sexeduc_recoded_numeric ~ reliten_recoded:polviews_recoded, data = gss)
summary(m.interact)
##
## Call:
## lm(formula = sexeduc_recoded_numeric ~ reliten_recoded:polviews_recoded,
## data = gss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.96811 0.06399 0.07612 0.14212 0.26213
##
## Coefficients: (1 not defined because of singularities)
## Estimate
## (Intercept) 0.737874
## reliten_recodedNo Religion:polviews_recodedLiberal 0.230234
## reliten_recodedLow Religiosity:polviews_recodedLiberal 0.198135
## reliten_recodedHigh Religiosity:polviews_recodedLiberal 0.136502
## reliten_recodedNo Religion:polviews_recodedModerate 0.217056
## reliten_recodedLow Religiosity:polviews_recodedModerate 0.186007
## reliten_recodedHigh Religiosity:polviews_recodedModerate 0.120010
## reliten_recodedNo Religion:polviews_recodedConservative 0.155559
## reliten_recodedLow Religiosity:polviews_recodedConservative 0.138142
## reliten_recodedHigh Religiosity:polviews_recodedConservative NA
## Std. Error t value
## (Intercept) 0.004574 161.32
## reliten_recodedNo Religion:polviews_recodedLiberal 0.008768 26.26
## reliten_recodedLow Religiosity:polviews_recodedLiberal 0.007069 28.03
## reliten_recodedHigh Religiosity:polviews_recodedLiberal 0.007790 17.52
## reliten_recodedNo Religion:polviews_recodedModerate 0.009686 22.41
## reliten_recodedLow Religiosity:polviews_recodedModerate 0.006343 29.32
## reliten_recodedHigh Religiosity:polviews_recodedModerate 0.006732 17.83
## reliten_recodedNo Religion:polviews_recodedConservative 0.012215 12.73
## reliten_recodedLow Religiosity:polviews_recodedConservative 0.006872 20.10
## reliten_recodedHigh Religiosity:polviews_recodedConservative NA NA
## Pr(>|t|)
## (Intercept) <2e-16 ***
## reliten_recodedNo Religion:polviews_recodedLiberal <2e-16 ***
## reliten_recodedLow Religiosity:polviews_recodedLiberal <2e-16 ***
## reliten_recodedHigh Religiosity:polviews_recodedLiberal <2e-16 ***
## reliten_recodedNo Religion:polviews_recodedModerate <2e-16 ***
## reliten_recodedLow Religiosity:polviews_recodedModerate <2e-16 ***
## reliten_recodedHigh Religiosity:polviews_recodedModerate <2e-16 ***
## reliten_recodedNo Religion:polviews_recodedConservative <2e-16 ***
## reliten_recodedLow Religiosity:polviews_recodedConservative <2e-16 ***
## reliten_recodedHigh Religiosity:polviews_recodedConservative NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3217 on 28721 degrees of freedom
## Multiple R-squared: 0.04672, Adjusted R-squared: 0.04645
## F-statistic: 175.9 on 8 and 28721 DF, p-value: < 2.2e-16
plot(allEffects(m.interact),
multiline = TRUE,
main = "Interaction between Religiosity and Ideological Position\non Attitudes toward Sex Education",
xlab = "Religiosity",
ylab = "Attitudes toward Sex Education (Numeric)",
lwd = 2,
col = c("blue", "red", "green")
)
We will be conducting a linear regression with 3 models (2 single predictor 1 multiple predictor). It is important to use the numeric version of our variables here in order for the linear regression to make sense. We are using sjPlot as it is the most reader friendly way to analyze coefficients in our opinion
# Fitting our regression models using the categorical variables from gss
model1 <- glm(sexeduc_recoded_numeric ~ polviews_recoded, data = gss, family = binomial(link = "logit"))
model2 <- glm(sexeduc_recoded_numeric ~ reliten_recoded, data = gss, family = binomial(link = "logit"))
model3 <- glm(sexeduc_recoded_numeric ~ educ_recoded, data = gss, family = binomial(link = "logit"))
model4 <- glm(sexeduc_recoded_numeric ~ childs_recoded, data = gss, family = binomial(link = "logit"))
# Creating a regression table with sjPlot
tab_model(model1, model2, model3, model4,
show.aic = TRUE)
## Profiled confidence intervals may take longer time to compute.
## Use `ci_method="wald"` for faster computation of CIs.
## Profiled confidence intervals may take longer time to compute.
## Use `ci_method="wald"` for faster computation of CIs.
## Profiled confidence intervals may take longer time to compute.
## Use `ci_method="wald"` for faster computation of CIs.
## Profiled confidence intervals may take longer time to compute.
## Use `ci_method="wald"` for faster computation of CIs.
| sexeduc_recoded_numeric | sexeduc_recoded_numeric | sexeduc_recoded_numeric | sexeduc_recoded_numeric | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Predictors | Odds Ratios | CI | p | Odds Ratios | CI | p | Odds Ratios | CI | p | Odds Ratios | CI | p |
| (Intercept) | 12.06 | 11.11 – 13.10 | <0.001 | 18.51 | 16.15 – 21.33 | <0.001 | 3.32 | 3.13 – 3.53 | <0.001 | 10.02 | 9.29 – 10.82 | <0.001 |
|
polviews recoded [Moderate] |
0.77 | 0.69 – 0.85 | <0.001 | |||||||||
|
polviews recoded [Conservative] |
0.35 | 0.31 – 0.38 | <0.001 | |||||||||
|
reliten recoded [Low Religiosity] |
0.56 | 0.48 – 0.66 | <0.001 | |||||||||
|
reliten recoded [High Religiosity] |
0.23 | 0.20 – 0.27 | <0.001 | |||||||||
|
educ recoded [High School Graduate] |
2.17 | 1.99 – 2.37 | <0.001 | |||||||||
|
educ recoded [College/University Graduate or Less] |
3.36 | 3.07 – 3.67 | <0.001 | |||||||||
|
educ recoded [Graduate Level and Above] |
3.84 | 3.28 – 4.52 | <0.001 | |||||||||
|
childs recoded [Has Children] |
0.63 | 0.58 – 0.69 | <0.001 | |||||||||
| Observations | 28730 | 28730 | 28730 | 28730 | ||||||||
| R2 Tjur | 0.023 | 0.028 | 0.030 | 0.004 | ||||||||
| AIC | 20896.296 | 20709.083 | 20748.392 | 21414.129 | ||||||||
~ Gurleen Bassy, Mateja Dokic, Katherine Liu