Hi! My name is Marcus Mann and I’m an assistant professor in sociology at Purdue University. I got my Ph.D. in sociology at Duke University. I mainly study politics and knowledge (e.g. political identity and attitudes toward science and scientists) and use a variety of computational methods in my research. I also teach these methods at the graduate level here at Purdue.
At the end of this course, you will know:
Three families of automated text analysis
You always want to make sure you’re working from the same directory on your computer
To check what your current directory is, you can type getwd()
And to set a new directory, you can type setwd("~/your/working/directory/here")
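As a minimal sketch of how these two functions fit together, the snippet below saves the current directory, switches to another one, and then switches back. It uses R's temporary directory (`tempdir()`) only as a stand-in for a real project folder, and the variable names are illustrative.

```r
# Remember where we started so we can return later
original_dir <- getwd()

# tempdir() is a placeholder here; substitute your own project folder
project_dir <- tempdir()

# setwd() invisibly returns the directory we were in before the switch
previous_dir <- setwd(project_dir)

# Confirm that the switch worked
new_dir <- getwd()
print(new_dir)

# Go back to the original directory
setwd(original_dir)
```

Saving the starting directory first makes it easy to undo the change, which helps when a script is run from several different places.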
The pacman package allows you to install AND load packages at the same time through its “p_load” function, using only one line of code.
This means you only need to install the pacman package once and then can use p_load for every package after.
First install the package as normal with install.packages("pacman")
Then you can install and load all packages you’ll need for the rest of class
pacman::p_load( devtools, harrypotter, textdata, tidyverse, stringr, tidytext, dplyr)
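To make the "install if needed, then load" idea concrete, here is a rough base-R sketch of the check that has to happen before a package can be loaded. It is not pacman's actual implementation: `missing_packages()` is a hypothetical helper written for this illustration, and the package names in `wanted` are only examples.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be found, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("stats", "utils", "tidytext")
to_install <- missing_packages(wanted)
print(to_install)

# Install only what is missing (needs an internet connection), then load:
# if (length(to_install) > 0) install.packages(to_install)
# invisible(lapply(wanted, library, character.only = TRUE))
```

The p_load call above folds these separate steps into a single line.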
First we’re going to load Bradley Boehmke’s Harry Potter dataset, which he has made publicly available on GitHub. It includes all of the text from the Harry Potter series, organized into its separate books.
To download this corpus, we use the “devtools” package, which can install user-contributed R packages that are not available through CRAN.
install.packages("devtools")
library(devtools)
install_github("bradleyboehmke/harrypotter")
# Load the package so the book objects used below are available
library(harrypotter)
titles <- c("Philosopher's Stone", "Chamber of Secrets", "Prisoner of Azkaban", "Goblet of Fire", "Order of the Phoenix", "Half-Blood Prince", "Deathly Hallows")
books <- list(philosophers_stone, chamber_of_secrets, prisoner_of_azkaban, goblet_of_fire, order_of_the_phoenix, half_blood_prince, deathly_hallows)
series <- tibble()
for (i in seq_along(titles)) {
  temp <- tibble(chapter = seq_along(books[[i]]), text = books[[i]]) %>%
    unnest_tokens(word, text) %>%
    mutate(book = titles[i]) %>%
    select(book, everything())
  series <- rbind(series, temp)
}
series$book <- factor(series$book, levels = rev(titles))
series
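To see what the loop does on a smaller scale, the sketch below applies the same steps to two made-up "books" of two chapters each. The toy text, titles, and variable names are invented for illustration and are not part of the harrypotter package. It assumes the tibble, dplyr, and tidytext packages are installed.

```r
library(tibble)
library(dplyr)
library(tidytext)

# Made-up stand-ins for the real book objects: one string per chapter
toy_titles <- c("Book One", "Book Two")
toy_books <- list(
  c("The boy who lived.", "The vanishing glass."),
  c("The worst birthday.", "The warning.")
)

toy_series <- tibble()
for (i in seq_along(toy_titles)) {
  temp <- tibble(chapter = seq_along(toy_books[[i]]),
                 text = toy_books[[i]]) %>%
    unnest_tokens(word, text) %>%    # one lowercased word per row
    mutate(book = toy_titles[i]) %>% # tag every row with its book title
    select(book, everything())       # move the book column to the front
  toy_series <- rbind(toy_series, temp)
}

# Count how often each word appears across the toy corpus
toy_counts <- count(toy_series, word, sort = TRUE)
print(toy_series)
print(toy_counts)
```

Each row of the result holds a single word along with the book and chapter it came from, which is the layout the later text analysis steps build on.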