This analysis is based on the dataset behind FiveThirtyEight’s story “Why Many Americans Don’t Vote”, which explores why some Americans do not vote in presidential elections.
The original dataset contains survey responses covering a variety of demographic, socioeconomic, and attitudinal factors that may be related to non-voting.
In this analysis, we will be selecting a subset of variables to explore the relationships between non-voting and key demographic and socioeconomic factors.
First, let’s load the libraries we will be using:
```r
library(dplyr)  # data manipulation: select(), rename()
library(readr)  # CSV import: read_csv()
```
Next, we will load the original dataset into R:
```r
# Raw CSV from FiveThirtyEight's data repository on GitHub
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/non-voters/nonvoters_data.csv"
nonvoters <- read_csv(url)
```
The dataset has 5,836 rows, one per survey respondent, and 119 columns.
Here are the column names:
```r
colnames(nonvoters)
```

```
##   [1] "RespId"         "weight"         "Q1"             "Q2_1"
##   [5] "Q2_2"           "Q2_3"           "Q2_4"           "Q2_5"
##   [9] "Q2_6"           "Q2_7"           "Q2_8"           "Q2_9"
##  [13] "Q2_10"          "Q3_1"           "Q3_2"           "Q3_3"
##  [17] "Q3_4"           "Q3_5"           "Q3_6"           "Q4_1"
##  [21] "Q4_2"           "Q4_3"           "Q4_4"           "Q4_5"
##  [25] "Q4_6"           "Q5"             "Q6"             "Q7"
##  [29] "Q8_1"           "Q8_2"           "Q8_3"           "Q8_4"
##  [33] "Q8_5"           "Q8_6"           "Q8_7"           "Q8_8"
##  [37] "Q8_9"           "Q9_1"           "Q9_2"           "Q9_3"
##  [41] "Q9_4"           "Q10_1"          "Q10_2"          "Q10_3"
##  [45] "Q10_4"          "Q11_1"          "Q11_2"          "Q11_3"
##  [49] "Q11_4"          "Q11_5"          "Q11_6"          "Q14"
##  [53] "Q15"            "Q16"            "Q17_1"          "Q17_2"
##  [57] "Q17_3"          "Q17_4"          "Q18_1"          "Q18_2"
##  [61] "Q18_3"          "Q18_4"          "Q18_5"          "Q18_6"
##  [65] "Q18_7"          "Q18_8"          "Q18_9"          "Q18_10"
##  [69] "Q19_1"          "Q19_2"          "Q19_3"          "Q19_4"
##  [73] "Q19_5"          "Q19_6"          "Q19_7"          "Q19_8"
##  [77] "Q19_9"          "Q19_10"         "Q20"            "Q21"
##  [81] "Q22"            "Q23"            "Q24"            "Q25"
##  [85] "Q26"            "Q27_1"          "Q27_2"          "Q27_3"
##  [89] "Q27_4"          "Q27_5"          "Q27_6"          "Q28_1"
##  [93] "Q28_2"          "Q28_3"          "Q28_4"          "Q28_5"
##  [97] "Q28_6"          "Q28_7"          "Q28_8"          "Q29_1"
## [101] "Q29_2"          "Q29_3"          "Q29_4"          "Q29_5"
## [105] "Q29_6"          "Q29_7"          "Q29_8"          "Q29_9"
## [109] "Q29_10"         "Q30"            "Q31"            "Q32"
## [113] "Q33"            "ppage"          "educ"           "race"
## [117] "gender"         "income_cat"     "voter_category"
```
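We now keep only the variables most relevant to our analysis: respondent age (`ppage`), highest level of education (`educ`), race, gender, income category (`income_cat`), and voter category (`voter_category`), which groups respondents into those who vote always, sporadically, or rarely/never: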
```r
nonvoters <- nonvoters %>%
  select(ppage, educ, race, gender, income_cat, voter_category)
```
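If the column list is kept in a character vector, tidyselect's `all_of()` helper (re-exported by dplyr) selects exactly those columns and raises an error when one is missing, whereas `any_of()` would silently skip it. The sketch below is illustrative only: it runs on two rows copied from the `head()` output printed below plus a placeholder `Q1` column filled with `NA`, not on the downloaded survey, and `keep_cols` is a hypothetical name.

```r
library(dplyr)

# Two rows copied from the head() output printed below, plus a hypothetical
# placeholder column Q1 (filled with NA) standing in for the dropped columns
sample_rows <- tibble(
  Q1 = NA,
  ppage = c(73, 58),
  educ = c("College", "Some college"),
  race = c("White", "Black"),
  gender = c("Female", "Female"),
  income_cat = c("$75-125k", "$40-75k"),
  voter_category = c("always", "sporadic")
)

# Hypothetical name: the columns to keep, listed once
keep_cols <- c("ppage", "educ", "race", "gender", "income_cat", "voter_category")

# all_of() keeps exactly these columns, in this order
selected <- sample_rows %>% select(all_of(keep_cols))

# A name that does not exist in the raw data (income_category is only created
# by the later rename() step) makes all_of() throw an error instead of being skipped
typo_caught <- tryCatch({
  sample_rows %>% select(all_of(c("ppage", "income_category")))
  FALSE
}, error = function(e) TRUE)

selected
typo_caught
```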
Let’s view the first 10 rows:
```r
head(nonvoters, 10)
```

```
## # A tibble: 10 × 6
##    ppage educ                race        gender income_cat    voter_category
##    <dbl> <chr>               <chr>       <chr>  <chr>         <chr>
##  1    73 College             White       Female $75-125k      always
##  2    90 College             White       Female $125k or more always
##  3    53 College             White       Male   $125k or more sporadic
##  4    58 Some college        Black       Female $40-75k       sporadic
##  5    81 High school or less White       Male   $40-75k       always
##  6    61 High school or less White       Female $40-75k       rarely/never
##  7    80 High school or less White       Female $125k or more always
##  8    68 Some college        Other/Mixed Female $75-125k      always
##  9    70 College             White       Male   $125k or more always
## 10    83 Some college        White       Male   $125k or more always
```
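`head()` shows individual rows; `count()` summarizes a column instead. As a minimal sketch, here it is applied to the voter categories of the ten rows printed above, re-entered by hand so the example runs offline. The resulting counts describe only these ten respondents, not the survey; on the loaded data the equivalent call would be `nonvoters %>% count(voter_category, sort = TRUE)`.

```r
library(dplyr)

# Voter categories of the ten rows printed above, re-entered by hand
first_ten <- tibble(
  voter_category = c("always", "always", "sporadic", "sporadic", "always",
                     "rarely/never", "always", "always", "always", "always")
)

# One row per category with its count n, most frequent first
category_counts <- first_ten %>% count(voter_category, sort = TRUE)
category_counts
# always: 7, sporadic: 2, rarely/never: 1 (for these ten rows only)
```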
Next, let’s replace the abbreviated column names with clearer ones:
```r
# rename() takes new_name = old_name pairs; unlisted columns keep their names
nonvoters <- nonvoters %>%
  rename(
    age = ppage,
    highest_education_level = educ,
    income_category = income_cat
  )
```
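As an aside, `select()` also accepts `new_name = old_name` pairs, so the selection and renaming steps could be merged into a single call. This is a sketch on two hand-copied rows from the earlier `head()` output; the `RespId` column is a placeholder filled with `NA` (its values are not shown in this write-up) to show that unlisted columns are dropped.

```r
library(dplyr)

# Two rows from the earlier head() output under the original names;
# RespId is a placeholder set to NA
raw_rows <- tibble(
  RespId = NA,
  ppage = c(73, 90),
  educ = c("College", "College"),
  race = c("White", "White"),
  gender = c("Female", "Female"),
  income_cat = c("$75-125k", "$125k or more"),
  voter_category = c("always", "always")
)

# new_name = old_name inside select() renames while selecting;
# RespId is not listed, so it is dropped
tidy_rows <- raw_rows %>%
  select(
    age = ppage,
    highest_education_level = educ,
    race,
    gender,
    income_category = income_cat,
    voter_category
  )

names(tidy_rows)
```

Keeping `select()` and `rename()` separate, as above, makes each step easier to read; combining them is mainly a matter of taste.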
The renamed columns now look like this:
```r
head(nonvoters, 10)
```

```
## # A tibble: 10 × 6
##      age highest_education_level race      gender income_category voter_category
##    <dbl> <chr>                   <chr>     <chr>  <chr>           <chr>
##  1    73 College                 White     Female $75-125k        always
##  2    90 College                 White     Female $125k or more   always
##  3    53 College                 White     Male   $125k or more   sporadic
##  4    58 Some college            Black     Female $40-75k         sporadic
##  5    81 High school or less     White     Male   $40-75k         always
##  6    61 High school or less     White     Female $40-75k         rarely/never
##  7    80 High school or less     White     Female $125k or more   always
##  8    68 Some college            Other/Mi… Female $75-125k        always
##  9    70 College                 White     Male   $125k or more   always
## 10    83 Some college            White     Male   $125k or more   always
```
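The `<chr>` type markers show that the categorical columns are stored as plain text, so R sorts them alphabetically ("College" before "High school or less", "always" before "rarely/never"). One option, sketched below, is to convert them to ordered factors. The level sets are an assumption based on the values visible in the rows above; any value outside them would become `NA`, which is why the sketch checks for missing values afterwards. The income categories are left alone here because not all of their labels appear in the printed rows.

```r
library(dplyr)

# Rows 1, 4 and 6 of the renamed output above (two columns only)
renamed_rows <- tibble(
  highest_education_level = c("College", "Some college", "High school or less"),
  voter_category = c("always", "sporadic", "rarely/never")
)

# Assumed complete level sets, in substantive order (lowest first)
education_levels <- c("High school or less", "Some college", "College")
voting_levels <- c("rarely/never", "sporadic", "always")

ordered_rows <- renamed_rows %>%
  mutate(
    highest_education_level = factor(highest_education_level,
                                     levels = education_levels, ordered = TRUE),
    voter_category = factor(voter_category,
                            levels = voting_levels, ordered = TRUE)
  )

# Values outside the assumed levels would have become NA; fail loudly if so
stopifnot(!anyNA(ordered_rows$highest_education_level),
          !anyNA(ordered_rows$voter_category))

# arrange() now sorts by level order instead of alphabetically
sorted_rows <- arrange(ordered_rows, voter_category)
sorted_rows
```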
In conclusion, the non-voters dataset describes the demographics, attitudes, and voting habits of survey respondents, ranging from those who always vote to those who rarely or never do. By selecting six of its 119 columns, we created a smaller, more manageable subset that contains only the variables relevant to this analysis, and we renamed the abbreviated columns to make them easier to interpret.
To extend and verify the work in the original article, one could analyze in more detail why members of different demographic groups do not vote. For example, statistical tests could determine whether the reasons for non-voting differ significantly between age, race, or income groups. It would also be interesting to examine the effectiveness of voter outreach and mobilization strategies, such as door-to-door canvassing, phone banking, or social media campaigns; such analyses could help identify effective ways to increase voter turnout in future elections.
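As a starting point for the group comparisons described above, one could tabulate the share of each voter category within each income group. The sketch applies this to the ten rows printed earlier, which are far too few for any conclusion, so it only demonstrates the mechanics. The shares are unweighted: the raw file also contains a `weight` column, presumably survey weights, which the `select()` step drops; keeping it and passing `wt = weight` to `count()` would give weighted counts. The chi-squared test in the final comment is one possible test of independence, not the article's method.

```r
library(dplyr)

# Income and voter category of the ten rows printed earlier, re-entered by hand
first_ten <- tibble(
  income_category = c("$75-125k", "$125k or more", "$125k or more", "$40-75k",
                      "$40-75k", "$40-75k", "$125k or more", "$75-125k",
                      "$125k or more", "$125k or more"),
  voter_category = c("always", "always", "sporadic", "sporadic", "always",
                     "rarely/never", "always", "always", "always", "always")
)

# Unweighted share of each voter category within each income group
shares <- first_ten %>%
  count(income_category, voter_category) %>%
  group_by(income_category) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

shares

# On the full data (not these ten rows), one possible test of independence:
# chisq.test(table(nonvoters$income_category, nonvoters$voter_category))
```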