Overview

https://projects.fivethirtyeight.com/non-voters-poll-2020-election/

This dataset comes from a 2020 poll in which FiveThirtyEight surveyed more than 8,000 people about their voting history. The FiveThirtyEight team used the data to show how voting history varies by race, income, age, education, and political party affiliation. They also explored the barriers people ran into when trying to vote. The CSV file used in this assignment contains 5,836 rows.

https://raw.githubusercontent.com/fivethirtyeight/data/master/non-voters/nonvoters_data.csv

Import the data from the URL and explore all the variables in the FiveThirtyEight dataset

nonvoters <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/non-voters/nonvoters_data.csv')
head(nonvoters)
##   RespId weight Q1 Q2_1 Q2_2 Q2_3 Q2_4 Q2_5 Q2_6 Q2_7 Q2_8 Q2_9 Q2_10 Q3_1 Q3_2
## 1 470001 0.7516  1    1    1    2    4    1    4    2    2    4     2    1    1
## 2 470002 1.0267  1    1    2    2    3    1    1    2    1    1     3    3    3
## 3 470003 1.0844  1    1    1    2    2    1    1    2    1    4     3    2    2
## 4 470007 0.6817  1    1    1    1    3    1    1    1    1    1     2    1    1
## 5 480008 0.9910  1    1    1   -1    1    1    1    1    1    1     1    4   -1
## 6 480009 1.0591  1    3    2    3    4    1    3    3    1    1     4    1    2
##   Q3_3 Q3_4 Q3_5 Q3_6 Q4_1 Q4_2 Q4_3 Q4_4 Q4_5 Q4_6 Q5 Q6 Q7 Q8_1 Q8_2 Q8_3
## 1    4    4    3    2    2    1    2    2    2    2  1  2  1    3    4    2
## 2    4    3    3    2    2    2    2    3    3    1  1  2  2    2    3    2
## 3    3    3    2    2    2    2    3    3    2    3  1  1  1    3    2    1
## 4    4    4    2    1    1    2    2    2    2    2  1  3  1    3    2    2
## 5    1    1    2    4    1    1    1    1    1    1  1  2  2    1    3    2
## 6   -1    2    2    2    4    3    3    3    4    2  2  4  1    3    3    3
##   Q8_4 Q8_5 Q8_6 Q8_7 Q8_8 Q8_9 Q9_1 Q9_2 Q9_3 Q9_4 Q10_1 Q10_2 Q10_3 Q10_4
## 1    1    1    1    1    2    4    2    2    4    4     2     2     2     2
## 2    2    2    2    3    2    2    1    1    3    4     2     2     2     2
## 3    1    2    2    2    2    1    1    2    4    4     2     2     1     2
## 4    2    2    2    2    2    2    1    2    4    4     2     2     2     2
## 5    3    3    3    4    2    2    1    4    3    4     2     2     2     2
## 6    2    3    3    2    2    2   -1   -1   -1    4     2     2     2     2
##   Q11_1 Q11_2 Q11_3 Q11_4 Q11_5 Q11_6 Q14 Q15 Q16 Q17_1 Q17_2 Q17_3 Q17_4 Q18_1
## 1     2     2     2     2     2     2   5   1   1     1     1     1     3     2
## 2     2     2     1     2     2     2   1   1   2     2     2     2     3     2
## 3     2     2     1     2     1     2   5   2   1     1     3     1     1     2
## 4     1     2     2     2     1     2   5   1   4     1     1     1     1     2
## 5     2     2     1     2     2     2   1   5   1     2     2     4     4     2
## 6     2     2     2     1     2     2  -1  -1  -1    -1    -1    -1    -1     2
##   Q18_2 Q18_3 Q18_4 Q18_5 Q18_6 Q18_7 Q18_8 Q18_9 Q18_10 Q19_1 Q19_2 Q19_3
## 1     2     2     2     2     2     2     2     2      2    -1    -1     1
## 2     2     2     2     2     2     2     2     2      2    -1     1    -1
## 3     2     2     2     2     2     1     2     2      2    -1     1    -1
## 4     2     2     2     2     2     2     2     2      2    -1    -1     1
## 5     2     2     2     2     2     2     2     2      2    -1    -1    -1
## 6     2     2     2     2     2     2     2     2      2    -1    -1    -1
##   Q19_4 Q19_5 Q19_6 Q19_7 Q19_8 Q19_9 Q19_10 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27_1
## 1     1     1     1     1    -1    -1     -1   1   1  NA   2   1   1   1     1
## 2    -1    -1    -1    -1    -1    -1     -1   1   1  NA   1   3   3   1     1
## 3     1    -1    -1    -1     1     1     -1   1   1  NA   2   1   2   1     1
## 4    -1    -1    -1    -1     1    -1      1   1   1  NA   2   1   2   1     1
## 5    -1    -1    -1    -1    -1    -1     -1   1   1  NA   1   3   1   1     1
## 6    -1    -1    -1    -1    -1    -1     -1   2   2   7  -1   4   3   4     2
##   Q27_2 Q27_3 Q27_4 Q27_5 Q27_6 Q28_1 Q28_2 Q28_3 Q28_4 Q28_5 Q28_6 Q28_7 Q28_8
## 1     1     1     1     1     1     1     1     1     1    -1    -1     1    -1
## 2     1     1     1     1     1     1    -1    -1    -1    -1     1    -1    -1
## 3     1     1     1     1     1     1    -1    -1    -1    -1    -1     1    -1
## 4     1     1     1     1     1     1     1    -1     1    -1    -1    -1    -1
## 5     1     1     1     1     1     1     1     1    -1     1    -1     1    -1
## 6     2     2     2     2     2    NA    NA    NA    NA    NA    NA    NA    NA
##   Q29_1 Q29_2 Q29_3 Q29_4 Q29_5 Q29_6 Q29_7 Q29_8 Q29_9 Q29_10 Q30 Q31 Q32 Q33
## 1    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA   2  NA   1  NA
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA   3  NA  NA   1
## 3    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA   2  NA   2  NA
## 4    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA   2  NA   1  NA
## 5    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA   1  -1  NA  NA
## 6    -1    -1    -1    -1    -1    -1    -1    -1     1     -1   5  NA  NA  -1
##   ppage                educ  race gender    income_cat voter_category
## 1    73             College White Female      $75-125k         always
## 2    90             College White Female $125k or more         always
## 3    53             College White   Male $125k or more       sporadic
## 4    58        Some college Black Female       $40-75k       sporadic
## 5    81 High school or less White   Male       $40-75k         always
## 6    61 High school or less White Female       $40-75k   rarely/never

I’m assuming the columns that start with Q (Q1, Q2_1, and so on) are for data visualization purposes, but I won’t be using them because I don’t yet know how to create charts from that type of data. For this assignment I’m going to generate a new data frame that keeps RespId and the last six columns. I’m also not sure how to work with the weight column since I don’t know the math behind it, so I will ignore it for this assignment.

library(dplyr)  # provides %>% and transmute()

new_nonvoters <- nonvoters %>% transmute(
  RespId,
  participant_age = ppage,
  education = educ,
  race,
  gender,
  income_cat,
  voter_cat = voter_category
)

head(new_nonvoters)
##   RespId participant_age           education  race gender    income_cat
## 1 470001              73             College White Female      $75-125k
## 2 470002              90             College White Female $125k or more
## 3 470003              53             College White   Male $125k or more
## 4 470007              58        Some college Black Female       $40-75k
## 5 480008              81 High school or less White   Male       $40-75k
## 6 480009              61 High school or less White Female       $40-75k
##      voter_cat
## 1       always
## 2       always
## 3     sporadic
## 4     sporadic
## 5       always
## 6 rarely/never
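
Although the weight column is set aside in this assignment, a rough sketch may help show what it would be used for. Assuming it is an ordinary survey weight (an assumption; the dataset's own documentation should be checked), a weighted share is the sum of the weights in a group divided by the sum of all weights. The toy data frame below is made up for illustration and is not taken from the survey.

```r
# Toy stand-in for the survey; these values are invented for illustration
toy <- data.frame(
  weight = c(0.5, 1.5, 1.0, 1.0),
  voter_category = c("always", "sporadic", "always", "rarely/never"),
  stringsAsFactors = FALSE
)

# Unweighted share: every respondent counts once
unweighted <- prop.table(table(toy$voter_category))

# Weighted share: every respondent counts `weight` times
weighted <- tapply(toy$weight, toy$voter_category, sum) / sum(toy$weight)

print(unweighted)
print(weighted)
```

In the toy data the two "always" respondents make up half of the rows, but a smaller share once the weights are applied, because one of them carries a weight below 1.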

Check the data type of each column and see whether any columns need conversion. I don’t see any columns that need to be converted to Boolean, so we should be good here.

str(new_nonvoters)
## 'data.frame':    5836 obs. of  7 variables:
##  $ RespId         : int  470001 470002 470003 470007 480008 480009 480010 470008 470010 470011 ...
##  $ participant_age: int  73 90 53 58 81 61 80 68 70 83 ...
##  $ education      : chr  "College" "College" "College" "Some college" ...
##  $ race           : chr  "White" "White" "White" "Black" ...
##  $ gender         : chr  "Female" "Female" "Male" "Female" ...
##  $ income_cat     : chr  "$75-125k" "$125k or more" "$125k or more" "$40-75k" ...
##  $ voter_cat      : chr  "always" "always" "sporadic" "sporadic" ...
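
One possible way to back up that kind of judgment is to count the distinct values in each column and look for columns that only hold yes/no answers. The sketch below uses a made-up data frame; the `registered` column is hypothetical and does not exist in the FiveThirtyEight data.

```r
# Toy data; the `registered` column is hypothetical, added only to show
# what a Boolean candidate would look like
toy <- data.frame(
  gender = c("Female", "Male", "Female"),
  registered = c("Yes", "No", "Yes"),
  education = c("College", "Some college", "High school or less"),
  stringsAsFactors = FALSE
)

# Number of distinct values in each column
distinct_counts <- sapply(toy, function(col) length(unique(col)))

# Columns whose values are all "Yes" or "No" are candidates for conversion
is_yes_no <- sapply(toy, function(col) all(col %in% c("Yes", "No")))
yes_no_cols <- names(toy)[is_yes_no]

# Convert each candidate column to logical (TRUE for "Yes")
for (col_name in yes_no_cols) {
  toy[[col_name]] <- toy[[col_name]] == "Yes"
}

print(distinct_counts)
str(toy)
```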

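Check whether any rows have NA values. Note that summary() only reports NA counts for numeric columns, so the output confirms that RespId and participant_age have none; for the character columns it shows only the length, class, and mode.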

summary(new_nonvoters)
##      RespId       participant_age  education             race          
##  Min.   :470001   Min.   :22.00   Length:5836        Length:5836       
##  1st Qu.:472070   1st Qu.:36.00   Class :character   Class :character  
##  Median :474152   Median :54.00   Mode  :character   Mode  :character  
##  Mean   :474654   Mean   :51.69                                        
##  3rd Qu.:476218   3rd Qu.:65.00                                        
##  Max.   :488325   Max.   :94.00                                        
##     gender           income_cat         voter_cat        
##  Length:5836        Length:5836        Length:5836       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 
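
Because summary() does not count NA values in character columns, a more direct check may be useful. The sketch below counts NA values per column and the number of rows with at least one NA, using a made-up data frame rather than the survey data.

```r
# Toy data with deliberately missing values; invented for illustration
toy <- data.frame(
  participant_age = c(73, NA, 53),
  race = c("White", NA, "Black"),
  gender = c("Female", "Male", "Male"),
  stringsAsFactors = FALSE
)

# NA count for every column, regardless of type
na_per_column <- colSums(is.na(toy))

# Number of rows that contain at least one NA
rows_with_na <- sum(!complete.cases(toy))

print(na_per_column)
print(rows_with_na)
```

The same two calls could be pointed at new_nonvoters, for example colSums(is.na(new_nonvoters)), to cover the character columns as well.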

Conclusion - To check FiveThirtyEight’s findings, I believe I would have to reproduce their analysis with the same dataset and see whether I arrive at the same results. To do that, I would first have to learn how to visualize the data in the Q columns, which I hope to do in the near future.