First, I load the tidyverse package.
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 2.0.0 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.3.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
We are going to look at data from a Pew survey (the Core Trends Survey conducted January 3-10, 2018), which I have already uploaded to R. Now I just have to read it into this R Notebook.
pew <- read_csv("January 3-10, 2018 - Core Trends Survey/January 3-10, 2018 - Core Trends Survey - CSV.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## usr = col_character(),
## `pial11ao@` = col_character()
## )
## See spec(...) for full column specifications.
The Pew survey data set contains, among many other things, a list of the top social media sites. Participants were asked whether or not they use each of these sites; the possible responses were “Yes”, “No”, “Don’t know”, and a refusal to answer.
In this case, I am most interested in learning about how many survey respondents use Twitter. The relevant variable from the dataset is called web1a.
I am particularly interested in seeing the number of Twitter users sorted by educational attainment. Do those with a college degree seem to be more likely to use Twitter than those with only some college, for example?
# Convert the numeric Twitter codes to a factor, then label them.
# Codes 8 and 9 (the "Don't know" and refusal answers) become NA.
pew <- pew %>%
  mutate(web1a = as.factor(web1a))
pew <- pew %>%
  mutate(web1a = fct_recode(web1a, "Yes" = "1", "No" = "2", NULL = "8", NULL = "9"))
# Do the same for educational attainment; codes 98 and 99 become NA.
pew <- pew %>%
  mutate(educ2 = as.factor(educ2))
pew <- pew %>%
  mutate(educ2 = fct_recode(educ2,
                            "Less than HS" = "1", "Some HS" = "2",
                            "HS graduate" = "3", "Some college" = "4",
                            "Associate degree" = "5", "College degree" = "6",
                            "Some grad school" = "7", "Grad degree" = "8",
                            NULL = "98", NULL = "99"))
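To make the recoding step easier to follow, here is a minimal sketch of how `fct_recode()` behaves on a tiny factor. The vector `raw` is made up for illustration and only stands in for a column such as web1a; it is not taken from the Pew file.

```r
library(forcats)

# Hypothetical raw codes standing in for a survey column (not Pew data)
raw <- factor(c("1", "2", "2", "8", "1", "9"))

# New label on the left, old level on the right.
# Naming a level NULL removes it, so those answers become NA.
recoded <- fct_recode(raw, "Yes" = "1", "No" = "2", NULL = "8", NULL = "9")

levels(recoded)
table(recoded, useNA = "ifany")
```

Because the removed codes turn into `NA`, they can later be dropped with `drop_na()` before plotting.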
pew %>%
  count(web1a)
I also want to see how many participants fell into each category of educational attainment, so I am creating a table to display that as well.
pew %>%
  count(educ2)
Finally, it would be interesting to see these two variables (web1a and educ2, that is, Twitter usage and educational attainment) listed all in one table. I am now going to create that table to take a look.
pew %>%
  count(web1a, educ2)
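Raw counts are hard to compare when the education groups differ in size, so one way to approach the question of who is "more likely" to use Twitter is to compute the share of each answer within each education level. The sketch below does this on a small made-up data frame; `toy` and its values are purely illustrative and are not Pew results.

```r
library(dplyr)

# Made-up answers for illustration only; these are not Pew figures
toy <- data.frame(
  educ2 = c("HS graduate", "HS graduate", "HS graduate", "HS graduate",
            "College degree", "College degree"),
  web1a = c("Yes", "No", "No", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Count each education/answer pair, then divide by the education group's total
shares <- toy %>%
  count(educ2, web1a) %>%
  group_by(educ2) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

shares
```

The same three steps could presumably be applied to `pew` after dropping missing values with `drop_na(web1a, educ2)`.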
# Drop respondents with no Twitter answer, then draw one bar per education level
pew %>%
  drop_na(web1a) %>%
  ggplot(aes(x = web1a, fill = educ2)) +
  scale_fill_viridis_d() +
  geom_bar(position = "dodge") +
  coord_flip() +
  theme_minimal() +
  labs(y = "Number of people",
       x = "Do you use Twitter?",
       title = "Twitter Usage by Educational Attainment")
# Collapse the eight education levels into two broader groups
pew <- pew %>%
  mutate(educ2_simple = fct_collapse(educ2,
                                     Some_college_or_less = c("Less than HS",
                                                              "Some HS",
                                                              "HS graduate",
                                                              "Some college"),
                                     Associate_degree_or_more = c("Associate degree",
                                                                  "College degree",
                                                                  "Some grad school",
                                                                  "Grad degree")))
pew %>%
  count(educ2_simple)
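For readers unfamiliar with `fct_collapse()`, here is a minimal sketch of what the collapsing step does, using a short made-up factor `educ` rather than the survey column. Each new level name is followed by the old levels that should be merged into it.

```r
library(forcats)

# Hypothetical education answers (not Pew data)
educ <- factor(c("Some HS", "HS graduate", "College degree",
                 "Grad degree", "Some HS"))

# Merge the old levels listed on the right into the new level named on the left
educ_simple <- fct_collapse(educ,
  Some_college_or_less = c("Some HS", "HS graduate"),
  Associate_degree_or_more = c("College degree", "Grad degree"))

table(educ_simple)
```

Two broad groups are easier to read in a chart than eight, at the cost of hiding differences within each group.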
# Same chart as before, now filled by the two collapsed education groups
pew %>%
  drop_na(web1a) %>%
  ggplot(aes(x = web1a, fill = educ2_simple)) +
  scale_fill_viridis_d() +
  geom_bar(position = "dodge") +
  coord_flip() +
  theme_minimal() +
  labs(y = "Number of people",
       x = "Do you use Twitter?",
       title = "Twitter Usage by Educational Attainment")