For this dataset, I’ll be using the first one posted by Farhana and it holds vote counts for two states. This data has one row per political candidate and includes the following columns:
The last 2 columns are vote counts for those states.
vote_data <- read.csv("https://raw.githubusercontent.com/addsding/data607/main/project2/vote-counts.csv")
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on
## 'https://raw.githubusercontent.com/addsding/data607/main/project2/vote-counts.csv'
head(vote_data)
To get this data tidy, we’ll be pivoting the last two columns to have one column per candidate and state combination.
Pivot time!
vote_data_pivot <- pivot_longer(vote_data, cols=2:3, names_to="state", values_to="votes")
head(vote_data_pivot)
Data looks good, no other cleaning necessary!
Overall, who has the most votes? And what percentage of votes did they get?
votes <- vote_data_pivot |>
group_by(Candidate) |>
summarise(sum_votes = sum(votes),
.groups = 'drop') |>
mutate(freq = formattable::percent(sum_votes / sum(sum_votes)))
votes
From this summary, it looks like Hilary Clinton had the most votes at 54.97% for CA and FL with Donald Trump in 2nd place at 41.11%. Does this change if we group by state as well?
cali_votes <- vote_data_pivot[vote_data_pivot$state=="CA",] |>
group_by(Candidate) |>
summarise(sum_votes = sum(votes),
.groups = 'drop') |>
mutate(freq = formattable::percent(sum_votes / sum(sum_votes)))
cali_votes
fl_votes <- vote_data_pivot[vote_data_pivot$state=="FL",] |>
group_by(Candidate) |>
summarise(sum_votes = sum(votes),
.groups = 'drop') |>
mutate(freq = formattable::percent(sum_votes / sum(sum_votes)))
fl_votes
By state, Clinton is very much the majority in CA. in FL however, it’s much closer with Trump actually taking the lead. This makes sense as FL tends to lean more Republican while CA is very much a Democratic state.
With the help of pivoting, this data was easy to tidy up and then
analyze with dplyr. It would be more interesting to see
other states’ voting data as well as perhaps what percentage of the
state is in which political party to see if the votes lined up with
affiliation. It’d also be interesting to see the percentage of folks who
didn’t vote if possible and see how that swung the outcome of this
election in another analysis.