Introduction

This R Markdown is to accompany our (Gurleen Bassy, Mateja Dokic, Katherine Liu) submission for the Final Report. All the tables and visualizations used in our submission along with the code behind them are found in this Markdown.

Setup

After we set up our data set and all necessary packages for our visualizations and regressions, we will take a look at our variables. We are working with variables ‘polviews’, ‘reliten’, and ‘sexeduc’. These variables represent ideological position, religiosity, and attitudes on public school sex education respectively.

# Check the initial state of the variables
table(gss$polviews)
## 
##             extremely liberal                       liberal 
##                          2081                          7623 
##              slightly liberal  moderate, middle of the road 
##                          7900                         23992 
##         slightly conservative                  conservative 
##                          9596                          9361 
##        extremely conservative                    don't know 
##                          2165                             0 
##                           iap            I don't have a job 
##                             0                             0 
##                   dk, na, iap                     no answer 
##                             0                             0 
##    not imputable_(2147483637)    not imputable_(2147483638) 
##                             0                             0 
##                       refused                skipped on web 
##                             0                             0 
##                    uncodeable not available in this release 
##                             0                             0 
##    not available in this year                  see codebook 
##                             0                             0
unique(gss$polviews)
## [1] <NA>                         moderate, middle of the road
## [3] slightly conservative        conservative                
## [5] liberal                      extremely conservative      
## [7] slightly liberal             extremely liberal           
## 20 Levels: extremely liberal liberal ... see codebook
table(gss$reliten)
## 
##                        strong               not very strong 
##                         22652                         23738 
##        somewhat strong (vol.)                   no religion 
##                          5736                          7629 
##                    don't know                           iap 
##                             0                             0 
##            I don't have a job                   dk, na, iap 
##                             0                             0 
##                     no answer    not imputable_(2147483637) 
##                             0                             0 
##    not imputable_(2147483638)                       refused 
##                             0                             0 
##                skipped on web                    uncodeable 
##                             0                             0 
## not available in this release    not available in this year 
##                             0                             0 
##                  see codebook 
##                             0
unique(gss$reliten)
## [1] <NA>                   strong                 not very strong       
## [4] somewhat strong (vol.) no religion           
## 17 Levels: strong not very strong somewhat strong (vol.) ... see codebook
table(gss$sexeduc)
## 
##                         favor                        oppose 
##                         35639                          5127 
##   depends on age/grade (vol.)                    don't know 
##                             9                             0 
##                           iap            I don't have a job 
##                             0                             0 
##                   dk, na, iap                     no answer 
##                             0                             0 
##    not imputable_(2147483637)    not imputable_(2147483638) 
##                             0                             0 
##                       refused                skipped on web 
##                             0                             0 
##                    uncodeable not available in this release 
##                             0                             0 
##    not available in this year                  see codebook 
##                             0                             0
unique(gss$sexeduc)
## [1] <NA>                        favor                      
## [3] oppose                      depends on age/grade (vol.)
## 16 Levels: favor oppose depends on age/grade (vol.) don't know ... see codebook
table(gss$educ)
## 
##     0     1     2     3     4     5     6     7     8     9    10    11    12 
##   177    49   158   268   326   410   866   896  2786  2172  3010  3942 21401 
##    13    14    15    16    17    18    19    20 
##  5905  8208  3307  9994  2392  2945  1112  1803
unique(gss$educ)
##  [1] 16 10 12 17 14 13  6  9  8 11  7 15 20 18  3  2  4  5 19  1 NA  0
table(gss$childs)
## 
##     0     1     2     3     4     5     6     7     8 
## 19927 11495 18138 11196  5753  2573  1328   718  1001
unique(gss$childs)
##  [1]  0  5  4  2  1  3  6  8  7 NA

Cleaning Variables

Lets recode some of them and clean them up, minding syntax.

# Recoding polviews into three categories: Liberal, Moderate, Conservative
gss <- gss %>%
  mutate(polviews_recoded = case_when(
    polviews %in% c("extremely liberal", "liberal", "slightly liberal") ~ "Liberal",
    polviews %in% c("moderate, middle of the road") ~ "Moderate",
    polviews %in% c("extremely conservative", "conservative", "slightly conservative") ~ "Conservative",
    TRUE ~ NA_character_
  )) %>%
  mutate(polviews_recoded = factor(polviews_recoded, levels = c("Liberal", "Moderate", "Conservative"))) %>%
  filter(!is.na(polviews_recoded))

# Check the recoding
gss %>%
  count(polviews_recoded)
##   polviews_recoded     n
## 1          Liberal 17604
## 2         Moderate 23992
## 3     Conservative 21122
# Recoding sexeduc into a dichotomous variable
gss <- gss %>%
  mutate(sexeduc_recoded = case_when(
    sexeduc %in% c("favor") ~ "In Favour",
    sexeduc %in% c("oppose") ~ "Oppose",
    TRUE ~ NA_character_
  )) %>%
  mutate(sexeduc_recoded = factor(sexeduc_recoded, levels = c("Oppose", "In Favour"))) %>%
  filter(!is.na(sexeduc_recoded))

# Check the recoding
gss %>%
  count(sexeduc_recoded)
##   sexeduc_recoded     n
## 1          Oppose  4625
## 2       In Favour 33833
# Recoding reliten into different levels of religiosity
gss <- gss %>%
  mutate(reliten_recoded = case_when(
    reliten %in% c("no religion") ~ "No Religion",
    reliten %in% c("not very strong") ~ "Low Religiosity",
    reliten %in% c("somewhat strong (vol1.)", "strong") ~ "High Religiosity",
    TRUE ~ NA_character_
  )) %>%
  mutate(reliten_recoded = factor(reliten_recoded, levels = c("No Religion", "Low Religiosity", "High Religiosity"))) %>%
  filter(!is.na(reliten_recoded))

# Check the recoding
gss %>%
  count(reliten_recoded)
##    reliten_recoded     n
## 1      No Religion  4251
## 2  Low Religiosity 13172
## 3 High Religiosity 12117
# Recoding educ into meaningful categories
gss <- gss %>%
  mutate(educ_recoded = case_when(
    educ >= 0 & educ <= 11 ~ "Did Not Complete Highschool",
    educ == 12 ~ "High School Graduate",
    educ >= 13 & educ <= 16 ~ "College/University Graduate or Less",
    educ >= 17 & educ <= 19 ~ "Graduate Level and Above",
    TRUE ~ NA_character_
  )) %>%
  mutate(educ_recoded = factor(educ_recoded, levels = c(
    "Did Not Complete Highschool", 
    "High School Graduate", 
    "College/University Graduate or Less", 
    "Graduate Level and Above"))) %>%
  filter(!is.na(educ_recoded))

# Check the recoding
gss %>%
  count(educ_recoded)
##                          educ_recoded     n
## 1         Did Not Complete Highschool  5840
## 2                High School Graduate  8938
## 3 College/University Graduate or Less 11430
## 4            Graduate Level and Above  2576
#Recoding childs
gss <- gss %>%
  mutate(childs_recoded = case_when(
    childs == 0 ~ "Childless",
    childs >= 1 & childs <= 8 ~ "Has Children",
    TRUE ~ NA_character_
  )) %>% 
  mutate(childs_recoded = factor(childs_recoded, levels = c(
    "Childless",
    "Has Children"
  ))) %>% 
  filter(!is.na(childs_recoded))

#Check recoding
gss %>% 
  count(childs_recoded)
##   childs_recoded     n
## 1      Childless  7932
## 2   Has Children 20798

Descriptive Statistics

We will need to convert our variables to numeric ones in order to make our descriptive statistics table and our regression table later on.

# Converting categorical variables to numeric for analysis purposes
gss <- gss %>%
  mutate(
    polviews_recoded_numeric = case_when(
      polviews_recoded == "Liberal" ~ 0,
      polviews_recoded == "Moderate" ~ 1,
      polviews_recoded == "Conservative" ~ 2,
      TRUE ~ NA_real_
    ),
    reliten_recoded_numeric = case_when(
      reliten_recoded == "No Religion" ~ 0,
      reliten_recoded == "Low Religiosity" ~ 1,
      reliten_recoded == "High Religiosity" ~ 2,
      TRUE ~ NA_real_
    ),
    sexeduc_recoded_numeric = case_when(
      sexeduc_recoded == "Oppose" ~ 0,
      sexeduc_recoded == "In Favour" ~ 1,
      TRUE ~ NA_real_
    ),
    educ_recoded_numeric = case_when(
      educ_recoded == "Did Not Complete Highschool" ~ 0,
      educ_recoded == "High School Graduate" ~ 1,
      educ_recoded == "College/University Graduate or Less" ~ 2,
      educ_recoded == "Graduate Level and Above" ~ 3,
      TRUE ~ NA_real_
    ),
    childs_recoded_numeric = case_when(
      childs_recoded == "Childless" ~ 0,
      childs_recoded == "Has Children" ~ 1,
      TRUE ~ NA_real_
    )
  )

# Renaming variables for clarity
gss_renamed <- gss %>%
  rename(
    "Ideological Position" = polviews_recoded_numeric,
    "Religiosity" = reliten_recoded_numeric,
    "Attitude on Public School Sex Education" = sexeduc_recoded_numeric,
    "Education" = educ_recoded_numeric,
    "Parenthood" = childs_recoded_numeric
  )

# Creating our descriptive statistic summary table with datasummary_skim
datasummary_skim(
  gss_renamed %>% select(
    "Ideological Position",
    "Religiosity",
    "Attitude on Public School Sex Education",
    "Education",
    "Parenthood"
  ),
  histogram = TRUE
)
tinytable_dot277ozxgbemqsc2coa
Unique Missing Pct. Mean SD Min Median Max Histogram
Ideological Position 3 0 1.1 0.8 0.0 1.0 2.0
Religiosity 3 0 1.3 0.7 0.0 1.0 2.0
Attitude on Public School Sex Education 2 0 0.9 0.3 0.0 1.0 1.0
Education 4 0 1.4 0.9 0.0 1.0 3.0
Parenthood 2 0 0.7 0.4 0.0 1.0 1.0

Exploratory Data Visualization

We will be making 2 visualizations of our data, a diverging stacked bar chart and a grouped bar plot. For this part we need to make certain we use our original categorical variables or else our chosen visualizations will not work.

# Diverging stacked bar chart using the original categorical variables
ggplot(gss, aes(x = reliten_recoded, fill = sexeduc_recoded)) +
  geom_bar(position = "fill", width = 0.7) +
  facet_wrap(~ polviews_recoded, ncol = 3) +
  scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
  labs(
    title = "Attitudes on Sex Education by Religiosity\nand Ideological Position",
    x = "Religiosity",
    y = "Proportion",
    fill = "Attitude on Sex Education"
  ) +
  coord_flip() +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "top",
    legend.title = element_blank(),
    strip.background = element_rect(fill = "#f0f0f0", color = NA),
    strip.text = element_text(face = "bold"),
    plot.margin = margin(20, 20, 20, 20)
  )

ggplot(gss, aes(x = educ_recoded, fill = sexeduc_recoded)) +
  geom_bar(position = "fill", width = 0.7) +
  facet_wrap(~ polviews_recoded, ncol = 3) +
  scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
  labs(
    title = "Attitudes on Sex Education by\nEducation and Ideological Position",
    x = "Education",
    y = "Proportion",
    fill = "Attitude on Sex Education"
  ) +
  coord_flip() +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "top",
    legend.title = element_blank(),
    strip.background = element_rect(fill = "#f0f0f0", color = NA),
    strip.text = element_text(face = "bold"),
    plot.margin = margin(20, 20, 20, 20)
  )

ggplot(gss, aes(x = childs_recoded, fill = sexeduc_recoded)) +
  geom_bar(position = "fill", width = 0.7) +
  facet_wrap(~ polviews_recoded, ncol = 3) +
  scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
  labs(
    title = "Attitudes on Sex Education by Parenthood\nand Ideological Position",
    x = "Parenthood",
    y = "Proportion",
    fill = "Attitude on Sex Education"
  ) +
  coord_flip() +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "top",
    legend.title = element_blank(),
    strip.background = element_rect(fill = "#f0f0f0", color = NA),
    strip.text = element_text(face = "bold"),
    plot.margin = margin(20, 20, 20, 20)
  )

# Our last visualization will be an interaction graph to explore the interaction between ideological position and religiosity 
m.interact <- lm(sexeduc_recoded_numeric ~ reliten_recoded:polviews_recoded, data = gss)
summary(m.interact)
## 
## Call:
## lm(formula = sexeduc_recoded_numeric ~ reliten_recoded:polviews_recoded, 
##     data = gss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.96811  0.06399  0.07612  0.14212  0.26213 
## 
## Coefficients: (1 not defined because of singularities)
##                                                              Estimate
## (Intercept)                                                  0.737874
## reliten_recodedNo Religion:polviews_recodedLiberal           0.230234
## reliten_recodedLow Religiosity:polviews_recodedLiberal       0.198135
## reliten_recodedHigh Religiosity:polviews_recodedLiberal      0.136502
## reliten_recodedNo Religion:polviews_recodedModerate          0.217056
## reliten_recodedLow Religiosity:polviews_recodedModerate      0.186007
## reliten_recodedHigh Religiosity:polviews_recodedModerate     0.120010
## reliten_recodedNo Religion:polviews_recodedConservative      0.155559
## reliten_recodedLow Religiosity:polviews_recodedConservative  0.138142
## reliten_recodedHigh Religiosity:polviews_recodedConservative       NA
##                                                              Std. Error t value
## (Intercept)                                                    0.004574  161.32
## reliten_recodedNo Religion:polviews_recodedLiberal             0.008768   26.26
## reliten_recodedLow Religiosity:polviews_recodedLiberal         0.007069   28.03
## reliten_recodedHigh Religiosity:polviews_recodedLiberal        0.007790   17.52
## reliten_recodedNo Religion:polviews_recodedModerate            0.009686   22.41
## reliten_recodedLow Religiosity:polviews_recodedModerate        0.006343   29.32
## reliten_recodedHigh Religiosity:polviews_recodedModerate       0.006732   17.83
## reliten_recodedNo Religion:polviews_recodedConservative        0.012215   12.73
## reliten_recodedLow Religiosity:polviews_recodedConservative    0.006872   20.10
## reliten_recodedHigh Religiosity:polviews_recodedConservative         NA      NA
##                                                              Pr(>|t|)    
## (Intercept)                                                    <2e-16 ***
## reliten_recodedNo Religion:polviews_recodedLiberal             <2e-16 ***
## reliten_recodedLow Religiosity:polviews_recodedLiberal         <2e-16 ***
## reliten_recodedHigh Religiosity:polviews_recodedLiberal        <2e-16 ***
## reliten_recodedNo Religion:polviews_recodedModerate            <2e-16 ***
## reliten_recodedLow Religiosity:polviews_recodedModerate        <2e-16 ***
## reliten_recodedHigh Religiosity:polviews_recodedModerate       <2e-16 ***
## reliten_recodedNo Religion:polviews_recodedConservative        <2e-16 ***
## reliten_recodedLow Religiosity:polviews_recodedConservative    <2e-16 ***
## reliten_recodedHigh Religiosity:polviews_recodedConservative       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3217 on 28721 degrees of freedom
## Multiple R-squared:  0.04672,    Adjusted R-squared:  0.04645 
## F-statistic: 175.9 on 8 and 28721 DF,  p-value: < 2.2e-16
plot(allEffects(m.interact), 
     multiline = TRUE,
     main = "Interaction between Religiosity and Ideological Position\non Attitudes toward Sex Education",
     xlab = "Religiosity",
     ylab = "Attitudes toward Sex Education (Numeric)",
     lwd = 2, 
     col = c("blue", "red", "green") 
)

Initial Regression Analysis

We will be conducting a linear regression with 3 models (2 single predictor 1 multiple predictor). It is important to use the numeric version of our variables here in order for the linear regression to make sense. We are using sjPlot as it is the most reader friendly way to analyze coefficients in our opinion

# Fitting our regression models using the categorical variables from gss
model1 <- glm(sexeduc_recoded_numeric ~ polviews_recoded, data = gss, family = binomial(link = "logit"))
model2 <- glm(sexeduc_recoded_numeric ~ reliten_recoded, data = gss, family = binomial(link = "logit"))
model3 <- glm(sexeduc_recoded_numeric ~ educ_recoded, data = gss, family = binomial(link = "logit"))
model4 <- glm(sexeduc_recoded_numeric ~ childs_recoded, data = gss, family = binomial(link = "logit"))

# Creating a regression table with sjPlot 
tab_model(model1, model2, model3, model4,
          show.aic = TRUE)
## Profiled confidence intervals may take longer time to compute.
##   Use `ci_method="wald"` for faster computation of CIs.
## Profiled confidence intervals may take longer time to compute.
##   Use `ci_method="wald"` for faster computation of CIs.
## Profiled confidence intervals may take longer time to compute.
##   Use `ci_method="wald"` for faster computation of CIs.
## Profiled confidence intervals may take longer time to compute.
##   Use `ci_method="wald"` for faster computation of CIs.
  sexeduc_recoded_numeric sexeduc_recoded_numeric sexeduc_recoded_numeric sexeduc_recoded_numeric
Predictors Odds Ratios CI p Odds Ratios CI p Odds Ratios CI p Odds Ratios CI p
(Intercept) 12.06 11.11 – 13.10 <0.001 18.51 16.15 – 21.33 <0.001 3.32 3.13 – 3.53 <0.001 10.02 9.29 – 10.82 <0.001
polviews recoded
[Moderate]
0.77 0.69 – 0.85 <0.001
polviews recoded
[Conservative]
0.35 0.31 – 0.38 <0.001
reliten recoded [Low
Religiosity]
0.56 0.48 – 0.66 <0.001
reliten recoded [High
Religiosity]
0.23 0.20 – 0.27 <0.001
educ recoded [High School
Graduate]
2.17 1.99 – 2.37 <0.001
educ recoded
[College/University
Graduate or Less]
3.36 3.07 – 3.67 <0.001
educ recoded [Graduate
Level and Above]
3.84 3.28 – 4.52 <0.001
childs recoded [Has
Children]
0.63 0.58 – 0.69 <0.001
Observations 28730 28730 28730 28730
R2 Tjur 0.023 0.028 0.030 0.004
AIC 20896.296 20709.083 20748.392 21414.129

Thank you for reading our R Markdown

~ Gurleen Bassy, Mateja Dokic, Katherine Liu