This R Markdown is to accompany our (Gurleen Bassy, Mateja Dokic, Katherine Liu) submission for the ‘Stepping Stone 2’ assignment. All the tables and visualizations used in our submission along with the code behind them are found in this Markdown.
After we set up our data set and all necessary packages for our visualizations and regressions, we will take a look at our variables. We are working with variables ‘polviews’, ‘reliten’, and ‘sexeduc’. These variables represent political views, strength of religious affiliation, and attitudes on public school sex education respectively.
# Check the initial state of the variables
table(gss$polviews)
##
## extremely liberal liberal
## 2081 7623
## slightly liberal moderate, middle of the road
## 7900 23992
## slightly conservative conservative
## 9596 9361
## extremely conservative don't know
## 2165 0
## iap I don't have a job
## 0 0
## dk, na, iap no answer
## 0 0
## not imputable_(2147483637) not imputable_(2147483638)
## 0 0
## refused skipped on web
## 0 0
## uncodeable not available in this release
## 0 0
## not available in this year see codebook
## 0 0
unique(gss$polviews)
## [1] <NA> moderate, middle of the road
## [3] slightly conservative conservative
## [5] liberal extremely conservative
## [7] slightly liberal extremely liberal
## 20 Levels: extremely liberal liberal ... see codebook
table(gss$reliten)
##
## strong not very strong
## 22652 23738
## somewhat strong (vol.) no religion
## 5736 7629
## don't know iap
## 0 0
## I don't have a job dk, na, iap
## 0 0
## no answer not imputable_(2147483637)
## 0 0
## not imputable_(2147483638) refused
## 0 0
## skipped on web uncodeable
## 0 0
## not available in this release not available in this year
## 0 0
## see codebook
## 0
unique(gss$reliten)
## [1] <NA> strong not very strong
## [4] somewhat strong (vol.) no religion
## 17 Levels: strong not very strong somewhat strong (vol.) ... see codebook
table(gss$sexeduc)
##
## favor oppose
## 35639 5127
## depends on age/grade (vol.) don't know
## 9 0
## iap I don't have a job
## 0 0
## dk, na, iap no answer
## 0 0
## not imputable_(2147483637) not imputable_(2147483638)
## 0 0
## refused skipped on web
## 0 0
## uncodeable not available in this release
## 0 0
## not available in this year see codebook
## 0 0
unique(gss$sexeduc)
## [1] <NA> favor
## [3] oppose depends on age/grade (vol.)
## 16 Levels: favor oppose depends on age/grade (vol.) don't know ... see codebook
Lets recode some of them and clean them up, minding syntax.
# Recoding polviews
gss <- gss %>%
mutate(polviews_recoded = case_when(
polviews %in% c("extremely liberal") ~ "Extremely Liberal",
polviews %in% c("slightly liberal") ~ "Slightly Liberal",
polviews %in% c("liberal") ~ "Liberal",
polviews %in% c("moderate, middle of the road") ~ "Moderate",
polviews %in% c("extremely conservative") ~ "Extremely Conservative",
polviews %in% c("slightly conservative") ~ "Slightly Conservative",
polviews %in% c("conservative") ~ "Conservative",
TRUE ~ NA_character_
),
polviews_recoded = factor(polviews_recoded, levels = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate", "Slightly Conservative", "Conservative", "Extremely Conservative"))
) %>%
filter(!is.na(polviews_recoded))
# Check the recoding
gss %>%
count(polviews_recoded)
## polviews_recoded n
## 1 Extremely Liberal 2081
## 2 Liberal 7623
## 3 Slightly Liberal 7900
## 4 Moderate 23992
## 5 Slightly Conservative 9596
## 6 Conservative 9361
## 7 Extremely Conservative 2165
# Recoding sexeduc, we will make this variable dichotomous as the response "depends on age/grade" does not help our research and only has 9 responses anyways
gss <- gss %>%
mutate(sexeduc_recoded = case_when(
sexeduc %in% c("favor") ~ "In Favour",
sexeduc %in% c("oppose") ~ "Oppose",
TRUE ~ NA_character_
),
sexeduc_recoded = factor(sexeduc_recoded, levels = c("Oppose", "In Favour"))
) %>%
filter(!is.na(sexeduc_recoded))
# Check the recoding
gss %>%
count(sexeduc_recoded)
## sexeduc_recoded n
## 1 Oppose 4625
## 2 In Favour 33833
# Recoding reliten
gss <- gss %>%
mutate(reliten_recoded = case_when(
reliten %in% c("strong") ~ "Strong",
reliten %in% c("somewhat strong (vol.)") ~ "Somewhat Strong",
reliten %in% c("not very strong") ~ "Not Very Strong",
reliten %in% c("no religion") ~ "No Religious Affiliation",
TRUE ~ NA_character_
),
reliten_recoded = factor(reliten_recoded, levels = c("No Religious Affiliation", "Not Very Strong", "Somewhat Strong", "Strong"))
) %>%
filter(!is.na(reliten_recoded))
# Check the recoding
gss %>%
count(reliten_recoded)
## reliten_recoded n
## 1 No Religious Affiliation 4251
## 2 Not Very Strong 13172
## 3 Somewhat Strong 2897
## 4 Strong 12117
We will need to convert our variables to numeric ones in order to make our descriptive statistics table and our regression table later on.
# Converting categorical variables to numeric for analysis purposes
gss <- gss %>%
mutate(
polviews_recoded_numeric = case_when(
polviews_recoded == "Extremely Liberal" ~ 0,
polviews_recoded == "Liberal" ~ 1,
polviews_recoded == "Slightly Liberal" ~ 2,
polviews_recoded == "Moderate" ~ 3,
polviews_recoded == "Slightly Conservative" ~ 4,
polviews_recoded == "Conservative" ~ 5,
polviews_recoded == "Extremely Conservative" ~ 6,
TRUE ~ NA_real_
),
reliten_recoded_numeric = case_when(
reliten_recoded == "No Religious Affiliation" ~ 0,
reliten_recoded == "Not Very Strong" ~ 1,
reliten_recoded == "Somewhat Strong" ~ 2,
reliten_recoded == "Strong" ~ 3,
TRUE ~ NA_real_
),
sexeduc_recoded_numeric = case_when(
sexeduc_recoded == "Oppose" ~ 0,
sexeduc_recoded == "In Favour" ~ 1,
TRUE ~ NA_real_
)
)
# Renaming variables for clarity
gss_renamed <- gss %>%
rename(
"Political Views" = polviews_recoded_numeric,
"Strength of Religious Affiliation" = reliten_recoded_numeric,
"Attitude on Public School Sex Education" = sexeduc_recoded_numeric
)
# Creating our descriptive statistic summary table with datasummary_skim
datasummary_skim(
gss_renamed %>% select(
"Political Views",
"Strength of Religious Affiliation",
"Attitude on Public School Sex Education"
),
histogram = TRUE
)
| Unique | Missing Pct. | Mean | SD | Min | Median | Max | Histogram | |
|---|---|---|---|---|---|---|---|---|
| Political Views | 7 | 0 | 3.1 | 1.4 | 0.0 | 3.0 | 6.0 | |
| Strength of Religious Affiliation | 4 | 0 | 1.7 | 1.1 | 0.0 | 1.0 | 3.0 | |
| Attitude on Public School Sex Education | 2 | 0 | 0.9 | 0.3 | 0.0 | 1.0 | 1.0 |
We will be making 2 visualizations of our data, a diverging stacked bar chart and a grouped bar plot. For this part we need to make certain we use our original categorical variables or else our chosen visualizations will not work.
# Diverging stacked bar chart using the original categorical variables
ggplot(gss, aes(x = polviews_recoded, fill = sexeduc_recoded)) +
geom_bar(position = "fill", width = 0.7) +
facet_wrap(~ reliten_recoded, ncol = 2) +
scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
labs(
title = "Attitudes on Sex Education by Political Views\nand Religious Affiliation",
x = "Political Views",
y = "Proportion",
fill = "Attitude on Sex Education"
) +
coord_flip() +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "top",
legend.title = element_blank(),
strip.background = element_rect(fill = "#f0f0f0", color = NA),
strip.text = element_text(face = "bold"),
plot.margin = margin(20, 20, 20, 20)
)
# Grouped bar plot using the original categorical variables
ggplot(gss, aes(x = polviews_recoded, fill = sexeduc_recoded)) +
geom_bar(position = "dodge", color = "black", width = 0.7) +
facet_wrap(~ reliten_recoded, ncol = 2) +
scale_fill_manual(values = c("In Favour" = "#1f78b4", "Oppose" = "#e31a1c")) +
labs(
title = "Attitudes on Sex Education by Political Views\nand Religious Affiliation",
x = "Political Views",
y = "Count",
fill = "Attitude on Sex Education"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold"),
legend.position = "top",
legend.title = element_blank(),
strip.background = element_rect(fill = "#f0f0f0", color = NA),
strip.text = element_text(face = "bold"),
panel.grid.major = element_line(color = "grey80"),
panel.grid.minor = element_blank()
)
We will be conducting a linear regression with 3 models (2 single predictor 1 multiple predictor). It is important to use the numeric version of our variables here in order for the linear regression to make sense. We are using sjPlot as it is the most reader friendly way to analyze coefficients in our opinion
# Fitting our regression models while ensuring to use the numeric variables
model1 <- lm(`Attitude on Public School Sex Education` ~ `Political Views`, data = gss_renamed)
model2 <- lm(`Attitude on Public School Sex Education` ~ `Strength of Religious Affiliation`, data = gss_renamed)
model3 <- lm(`Attitude on Public School Sex Education` ~ `Political Views` + `Strength of Religious Affiliation`, data = gss_renamed)
# Creating a regression table with sjPlot
tab_model(
model1, model2, model3,
dv.labels = c(
"Model 1<br>Political Views<br>VS<br>Attitude on Public School Sex Education",
"Model 2<br>Strength of Religious Affiliation<br>VS<br>Attitude on Public School Sex Education",
"Model 3<br>Political Views + Strength of Religious Affiliation<br>VS<br>Attitude on Public School Sex Education"
),
string.pred = "Predictors", # Customize column names if necessary
string.est = "Coefficient",
string.ci = "CI",
string.p = "P-Value"
)
|
Model 1 Political Views VS Attitude on Public School Sex Education |
Model 2 Strength of Religious Affiliation VS Attitude on Public School Sex Education |
Model 3 Political Views + Strength of Religious Affiliation VS Attitude on Public School Sex Education |
|||||||
|---|---|---|---|---|---|---|---|---|---|
| Predictors | Coefficient | CI | P-Value | Coefficient | CI | P-Value | Coefficient | CI | P-Value |
| (Intercept) | 1.00 | 0.99 – 1.01 | <0.001 | 0.96 | 0.95 – 0.97 | <0.001 | 1.05 | 1.04 – 1.06 | <0.001 |
| Political Views | -0.04 | -0.04 – -0.04 | <0.001 | -0.03 | -0.04 – -0.03 | <0.001 | |||
|
Strength of Religious Affiliation |
-0.05 | -0.05 – -0.04 | <0.001 | -0.04 | -0.04 – -0.04 | <0.001 | |||
| Observations | 32437 | 32437 | 32437 | ||||||
| R2 / R2 adjusted | 0.028 / 0.028 | 0.025 / 0.025 | 0.045 / 0.045 | ||||||
~ Gurleen Bassy, Mateja Dokic, Katherine Liu