Exploratory data analysis on variables that may affect why people don’t vote
Loading the data into a data frame
non_voter = read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/non-voters/nonvoters_data.csv", header= TRUE, sep=",")
Creating a subset, renaming columns, and dropping NA values
subset_voter = subset(non_voter,select = c(ppage,educ,race,gender,income_cat))
colnames(subset_voter)=c("age","education","race","gender","income")
subset_voter = na.omit(subset_voter)
head(subset_voter)
##   age           education  race gender        income
## 1  73             College White Female      $75-125k
## 2  90             College White Female $125k or more
## 3  53             College White   Male $125k or more
## 4  58        Some college Black Female       $40-75k
## 5  81 High school or less White   Male       $40-75k
## 6  61 High school or less White Female       $40-75k
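Since `na.omit()` silently drops every row with a missing value, it can help to check how many rows are lost and which columns are responsible. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the survey data); the same calls could be applied to `subset_voter` before the `na.omit()` step.

```r
# Toy data frame for illustration only (not the survey data).
toy = data.frame(
  age = c(73, NA, 53, 58),
  income = c("$75-125k", "$40-75k", NA, "$40-75k")
)

# Number of missing values in each column.
na_per_column = colSums(is.na(toy))
print(na_per_column)

# Number of rows removed by dropping incomplete cases.
rows_before = nrow(toy)
rows_after = nrow(na.omit(toy))
print(rows_before - rows_after)
```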
hist(subset_voter$age)
Distribution for age of non-voters
income_count = table(subset_voter$income)
barplot(income_count)
Bar graph for the incomes of non-voters
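Because `table()` sorts character values alphabetically, the income bars may not appear in ascending order of income. One possible fix is to convert the column to a factor with explicit levels. This is a minimal sketch on made-up values: the label "Less than $40k" is an assumption (only three categories appear in the `head()` output above), and `income_levels` is a hypothetical helper name.

```r
# Toy income values for illustration only (not the survey data).
income = c("$75-125k", "$125k or more", "$40-75k", "Less than $40k", "$40-75k")

# Assumed category labels, listed from lowest to highest income.
income_levels = c("Less than $40k", "$40-75k", "$75-125k", "$125k or more")

# An ordered factor makes table() and barplot() follow income order
# instead of alphabetical order.
income_ordered = factor(income, levels = income_levels, ordered = TRUE)
income_count = table(income_ordered)

barplot(income_count, xlab = "Household income", ylab = "Respondents")
```

The same approach would apply to the education column if its levels should read from least to most schooling.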
education_count = table(subset_voter$education)
barplot(education_count)
Bar graph for education of non-voters
gender_count = table(subset_voter$gender)
barplot(gender_count)
Gender count of non-voters
Conclusion:
1: The age distribution of non-voters is not normal; there appear to be two peaks, one in the late 20s and one in the 60s.
2: People in every income group appear about equally likely not to vote, except the $75-125k group, which appears slightly more likely not to vote.
3: More educated people seem more likely not to vote, as there are considerably more college-educated people in this sample.
4: Men and women appear equally likely not to vote.
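Raw counts show how many respondents fall in each group, but a claim about which group is more likely not to vote needs the share of non-voters within each group. The sketch below shows one way to compute row-wise shares with `prop.table()`. It assumes the CSV includes a voting-frequency column, here called `voter_category`, with values such as "always", "sporadic", and "rarely/never"; that column is not shown in the code above, so treat the name and values as assumptions. The data frame `toy` is made up for illustration.

```r
# Toy data for illustration only; a real analysis would use non_voter.
toy = data.frame(
  educ = c("College", "College", "Some college", "High school or less",
           "High school or less", "College"),
  voter_category = c("always", "sporadic", "rarely/never", "rarely/never",
                     "sporadic", "always")
)

# Rows: education level; columns: voting frequency.
counts = table(toy$educ, toy$voter_category)

# margin = 1 converts each row to proportions that sum to 1, so groups
# of different sizes can be compared directly.
shares = prop.table(counts, margin = 1)
print(round(shares, 2))
```

Comparing the "rarely/never" column across rows would then indicate whether one group abstains more often than another, regardless of how many respondents each group contains.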