Introduction

This R Markdown is to accompany our (Gurleen Bassy, Mateja Dokic, Katherine Liu) submission for the ‘Stepping Stone 2’ assignment. All the tables and visualizations used in our submission along with the code behind them are found in this Markdown.

Setup

After we set up our data set and all necessary packages for our visualizations and regressions, we will take a look at our variables. We are working with variables ‘polviews’, ‘reliten’, and ‘sexeduc’. These variables represent political views, strength of religious affiliation, and attitudes on public school sex education respectively.

# Check the initial state of the variables
table(gss$polviews)
## 
##             extremely liberal                       liberal 
##                          2081                          7623 
##              slightly liberal  moderate, middle of the road 
##                          7900                         23992 
##         slightly conservative                  conservative 
##                          9596                          9361 
##        extremely conservative                    don't know 
##                          2165                             0 
##                           iap            I don't have a job 
##                             0                             0 
##                   dk, na, iap                     no answer 
##                             0                             0 
##    not imputable_(2147483637)    not imputable_(2147483638) 
##                             0                             0 
##                       refused                skipped on web 
##                             0                             0 
##                    uncodeable not available in this release 
##                             0                             0 
##    not available in this year                  see codebook 
##                             0                             0
unique(gss$polviews)
## [1] <NA>                         moderate, middle of the road
## [3] slightly conservative        conservative                
## [5] liberal                      extremely conservative      
## [7] slightly liberal             extremely liberal           
## 20 Levels: extremely liberal liberal ... see codebook
table(gss$reliten)
## 
##                        strong               not very strong 
##                         22652                         23738 
##        somewhat strong (vol.)                   no religion 
##                          5736                          7629 
##                    don't know                           iap 
##                             0                             0 
##            I don't have a job                   dk, na, iap 
##                             0                             0 
##                     no answer    not imputable_(2147483637) 
##                             0                             0 
##    not imputable_(2147483638)                       refused 
##                             0                             0 
##                skipped on web                    uncodeable 
##                             0                             0 
## not available in this release    not available in this year 
##                             0                             0 
##                  see codebook 
##                             0
unique(gss$reliten)
## [1] <NA>                   strong                 not very strong       
## [4] somewhat strong (vol.) no religion           
## 17 Levels: strong not very strong somewhat strong (vol.) ... see codebook
table(gss$sexeduc)
## 
##                         favor                        oppose 
##                         35639                          5127 
##   depends on age/grade (vol.)                    don't know 
##                             9                             0 
##                           iap            I don't have a job 
##                             0                             0 
##                   dk, na, iap                     no answer 
##                             0                             0 
##    not imputable_(2147483637)    not imputable_(2147483638) 
##                             0                             0 
##                       refused                skipped on web 
##                             0                             0 
##                    uncodeable not available in this release 
##                             0                             0 
##    not available in this year                  see codebook 
##                             0                             0
unique(gss$sexeduc)
## [1] <NA>                        favor                      
## [3] oppose                      depends on age/grade (vol.)
## 16 Levels: favor oppose depends on age/grade (vol.) don't know ... see codebook

Cleaning Variables

Lets recode some of them and clean them up, minding syntax.

# Recoding polviews
gss <- gss %>%
  mutate(polviews_recoded = case_when(
    polviews %in% c("extremely liberal") ~ "Extremely Liberal",
    polviews %in% c("slightly liberal") ~ "Slightly Liberal",
    polviews %in% c("liberal") ~ "Liberal",
    polviews %in% c("moderate, middle of the road") ~ "Moderate",
    polviews %in% c("extremely conservative") ~ "Extremely Conservative",
    polviews %in% c("slightly conservative") ~ "Slightly Conservative",
    polviews %in% c("conservative") ~ "Conservative",
    TRUE ~ NA_character_
  ),
  polviews_recoded = factor(polviews_recoded, levels = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate", "Slightly Conservative", "Conservative", "Extremely Conservative"))
  ) %>%
  filter(!is.na(polviews_recoded))

# Check the recoding
gss %>%
  count(polviews_recoded)
##         polviews_recoded     n
## 1      Extremely Liberal  2081
## 2                Liberal  7623
## 3       Slightly Liberal  7900
## 4               Moderate 23992
## 5  Slightly Conservative  9596
## 6           Conservative  9361
## 7 Extremely Conservative  2165
# Recoding sexeduc, we will make this variable dichotomous as the response "depends on age/grade" does not help our research and only has 9 responses anyways
gss <- gss %>%
  mutate(sexeduc_recoded = case_when(
    sexeduc %in% c("favor") ~ "In Favour",
    sexeduc %in% c("oppose") ~ "Oppose",
    TRUE ~ NA_character_
  ),
  sexeduc_recoded = factor(sexeduc_recoded, levels = c("Oppose", "In Favour"))
  ) %>%
  filter(!is.na(sexeduc_recoded))

# Check the recoding
gss %>%
  count(sexeduc_recoded)
##   sexeduc_recoded     n
## 1          Oppose  4625
## 2       In Favour 33833
# Recoding reliten
gss <- gss %>%
  mutate(reliten_recoded = case_when(
    reliten %in% c("strong") ~ "Strong",
    reliten %in% c("somewhat strong (vol.)") ~ "Somewhat Strong",
    reliten %in% c("not very strong") ~ "Not Very Strong",
    reliten %in% c("no religion") ~ "No Religious Affiliation",
    TRUE ~ NA_character_
  ),
  reliten_recoded = factor(reliten_recoded, levels = c("No Religious Affiliation", "Not Very Strong", "Somewhat Strong", "Strong"))
  ) %>%
  filter(!is.na(reliten_recoded))

# Check the recoding
gss %>%
  count(reliten_recoded)
##            reliten_recoded     n
## 1 No Religious Affiliation  4251
## 2          Not Very Strong 13172
## 3          Somewhat Strong  2897
## 4                   Strong 12117

Descriptive Statistics

We will need to convert our variables to numeric ones in order to make our descriptive statistics table and our regression table later on.

# Converting categorical variables to numeric for analysis purposes
gss <- gss %>%
  mutate(
    polviews_recoded_numeric = case_when(
      polviews_recoded == "Extremely Liberal" ~ 0,
      polviews_recoded == "Liberal" ~ 1,
      polviews_recoded == "Slightly Liberal" ~ 2,
      polviews_recoded == "Moderate" ~ 3,
      polviews_recoded == "Slightly Conservative" ~ 4,
      polviews_recoded == "Conservative" ~ 5,
      polviews_recoded == "Extremely Conservative" ~ 6,
      TRUE ~ NA_real_
    ),
    reliten_recoded_numeric = case_when(
      reliten_recoded == "No Religious Affiliation" ~ 0,
      reliten_recoded == "Not Very Strong" ~ 1,
      reliten_recoded == "Somewhat Strong" ~ 2,
      reliten_recoded == "Strong" ~ 3,
      TRUE ~ NA_real_
    ),
    sexeduc_recoded_numeric = case_when(
      sexeduc_recoded == "Oppose" ~ 0,
      sexeduc_recoded == "In Favour" ~ 1,
      TRUE ~ NA_real_
    )
  )

# Renaming variables for clarity
gss_renamed <- gss %>%
  rename(
    "Political Views" = polviews_recoded_numeric,
    "Strength of Religious Affiliation" = reliten_recoded_numeric,
    "Attitude on Public School Sex Education" = sexeduc_recoded_numeric
  )

# Creating our descriptive statistic summary table with datasummary_skim
datasummary_skim(
  gss_renamed %>% select(
    "Political Views",
    "Strength of Religious Affiliation",
    "Attitude on Public School Sex Education"
  ),
  histogram = TRUE
)
tinytable_n9ft05fosjitty1346g3
Unique Missing Pct. Mean SD Min Median Max Histogram
Political Views 7 0 3.1 1.4 0.0 3.0 6.0
Strength of Religious Affiliation 4 0 1.7 1.1 0.0 1.0 3.0
Attitude on Public School Sex Education 2 0 0.9 0.3 0.0 1.0 1.0

Exploratory Data Visualization

We will be making 2 visualizations of our data, a diverging stacked bar chart and a grouped bar plot. For this part we need to make certain we use our original categorical variables or else our chosen visualizations will not work.

# Diverging stacked bar chart using the original categorical variables
ggplot(gss, aes(x = polviews_recoded, fill = sexeduc_recoded)) +
  geom_bar(position = "fill", width = 0.7) +
  facet_wrap(~ reliten_recoded, ncol = 2) +
  scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
  labs(
    title = "Attitudes on Sex Education by Political Views\nand Religious Affiliation",
    x = "Political Views",
    y = "Proportion",
    fill = "Attitude on Sex Education"
  ) +
  coord_flip() +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "top",
    legend.title = element_blank(),
    strip.background = element_rect(fill = "#f0f0f0", color = NA),
    strip.text = element_text(face = "bold"),
    plot.margin = margin(20, 20, 20, 20)
  )

# Grouped bar plot using the original categorical variables
ggplot(gss, aes(x = polviews_recoded, fill = sexeduc_recoded)) +
  geom_bar(position = "dodge", color = "black", width = 0.7) +
  facet_wrap(~ reliten_recoded, ncol = 2) +
  scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
  labs(
    title = "Attitudes on Sex Education by Political Views\nand Religious Affiliation",
    x = "Political Views",
    y = "Count",
    fill = "Attitude on Sex Education"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"),
    legend.position = "top",
    legend.title = element_blank(),
    strip.background = element_rect(fill = "#f0f0f0", color = NA),
    strip.text = element_text(face = "bold"),
    panel.grid.major = element_line(color = "grey80"),
    panel.grid.minor = element_blank()
  )

Initial Regression Analysis

We will be conducting a linear regression with 3 models (2 single predictor 1 multiple predictor). It is important to use the numeric version of our variables here in order for the linear regression to make sense. We are using sjPlot as it is the most reader friendly way to analyze coefficients in our opinion

# Fitting our regression models while ensuring to use the numeric variables
model1 <- lm(`Attitude on Public School Sex Education` ~ `Political Views`, data = gss_renamed)
model2 <- lm(`Attitude on Public School Sex Education` ~ `Strength of Religious Affiliation`, data = gss_renamed)
model3 <- lm(`Attitude on Public School Sex Education` ~ `Political Views` + `Strength of Religious Affiliation`, data = gss_renamed)

# Creating a regression table with sjPlot 
tab_model(
  model1, model2, model3,
  dv.labels = c(
    "Model 1<br>Political Views<br>VS<br>Attitude on Public School Sex Education",
    "Model 2<br>Strength of Religious Affiliation<br>VS<br>Attitude on Public School Sex Education",
    "Model 3<br>Political Views + Strength of Religious Affiliation<br>VS<br>Attitude on Public School Sex Education"
  ),
  string.pred = "Predictors",  # Customize column names if necessary
  string.est = "Coefficient", 
  string.ci = "CI",
  string.p = "P-Value"
)
  Model 1
Political Views
VS
Attitude on Public School Sex Education
Model 2
Strength of Religious Affiliation
VS
Attitude on Public School Sex Education
Model 3
Political Views + Strength of Religious Affiliation
VS
Attitude on Public School Sex Education
Predictors Coefficient CI P-Value Coefficient CI P-Value Coefficient CI P-Value
(Intercept) 1.00 0.99 – 1.01 <0.001 0.96 0.95 – 0.97 <0.001 1.05 1.04 – 1.06 <0.001
Political Views -0.04 -0.04 – -0.04 <0.001 -0.03 -0.04 – -0.03 <0.001
Strength of Religious
Affiliation
-0.05 -0.05 – -0.04 <0.001 -0.04 -0.04 – -0.04 <0.001
Observations 32437 32437 32437
R2 / R2 adjusted 0.028 / 0.028 0.025 / 0.025 0.045 / 0.045

Thank you for reading our R Markdown

~ Gurleen Bassy, Mateja Dokic, Katherine Liu