Exploratory data analysis on variables that may affect why people don’t vote

Loading the data into a data frame

non_voter = read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/non-voters/nonvoters_data.csv", header = TRUE, sep = ",")

Creating a subset, renaming columns, and dropping NA values

subset_voter = subset(non_voter,select = c(ppage,educ,race,gender,income_cat))

colnames(subset_voter)=c("age","education","race","gender","income")
subset_voter = na.omit(subset_voter)

head(subset_voter)
##   age           education  race gender        income
## 1  73             College White Female      $75-125k
## 2  90             College White Female $125k or more
## 3  53             College White   Male $125k or more
## 4  58        Some college Black Female       $40-75k
## 5  81 High school or less White   Male       $40-75k
## 6  61 High school or less White Female       $40-75k
hist(subset_voter$age)

Distribution for age of non-voters
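
The shape of the age distribution can be examined more closely with narrower bins, an overlaid density curve, and a normality test. The following is a minimal sketch under stated assumptions: `ages_demo` is a simulated, hypothetical vector standing in for `subset_voter$age`, and the choice of 30 bins is arbitrary.

```r
# Simulated, hypothetical ages with two clusters (not the survey data);
# substitute subset_voter$age to apply this to the real sample
set.seed(1)
ages_demo = c(rnorm(300, mean = 28, sd = 5), rnorm(300, mean = 63, sd = 7))

# Narrower bins make multiple peaks easier to see than the default breaks
hist(ages_demo, breaks = 30, freq = FALSE,
     main = "Age (simulated)", xlab = "Age")

# Overlay a kernel density estimate on the histogram
lines(density(ages_demo))

# Shapiro-Wilk test of normality (accepts between 3 and 5000 values)
normality = shapiro.test(ages_demo)
normality$p.value
```

A small p-value here would be consistent with a non-normal distribution, though with large samples even minor departures from normality tend to be flagged, so the plot remains the more informative check.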

income_count = table(subset_voter$income)
barplot(income_count)

Bar graph for the incomes of non-voters
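
By default, `table()` orders character categories alphabetically, so the income bars may not run from lowest to highest. One possible fix is to convert the column to a factor with explicit levels first. This is a sketch only: `income_demo` is a hypothetical vector, and the label "Less than $40k" and the level order are assumptions that should be checked against `unique(subset_voter$income)`.

```r
# Hypothetical income values for illustration (not the survey data);
# the label "Less than $40k" is an assumption
income_demo = c("$40-75k", "$125k or more", "Less than $40k",
                "$75-125k", "$40-75k")

# Assumed low-to-high ordering of the categories
income_levels = c("Less than $40k", "$40-75k", "$75-125k", "$125k or more")

# A factor with explicit levels keeps the bars in that order
income_ordered = factor(income_demo, levels = income_levels)
income_count_ordered = table(income_ordered)

barplot(income_count_ordered, xlab = "Income", ylab = "Respondents")
```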

education_count = table(subset_voter$education)
barplot(education_count)

Bar graph for education of non-voters

gender_count = table(subset_voter$gender)
barplot(gender_count)

Gender count of non-voters

Conclusion:

1. The age distribution of non-voters is not normal; there appear to be two peaks, one in the late 20s and one in the 60s.
2. People in every income group are about equally likely to not vote, except for the $75-125k group, which is slightly more likely to not vote.
3. People with more education seem more likely to not vote, as there are noticeably more college-educated people in this sample.
4. Men and women are equally likely to not vote.
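
Statements such as "equally likely" are easier to check with proportions and a simple test than by reading bar heights. The sketch below shows one way to do that, assuming made-up counts: `gender_demo` is hypothetical and would be replaced by `gender_count` (or any of the other count tables) from the analysis above.

```r
# Hypothetical counts for illustration only, not the survey results
gender_demo = c(Female = 2900, Male = 2936)

# Share of each group rather than raw counts
gender_share = prop.table(gender_demo)
gender_share

# Chi-squared goodness-of-fit test against equal proportions
fit = chisq.test(gender_demo)
fit$p.value
```

A large p-value would be consistent with the groups being equally represented; a small one would suggest the counts differ by more than chance alone.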