Introduction
The generic_ballot_averages.csv dataset from FiveThirtyEight contains polling averages for the generic congressional ballot. The generic ballot is a survey question that asks voters which party they would support in an upcoming congressional election without naming specific candidates. Each row gives one party's estimated support (pct_estimate), with lower and upper bounds, on a given date within an election cycle. This makes the dataset useful for tracking how support for Democrats and Republicans shifts over time and across election cycles.
Article information: https://projects.fivethirtyeight.com/polls/generic-ballot/
# Loading the necessary libraries
library(dplyr)
# Load the dataset directly from FiveThirtyEight's URL
data_url <- "https://projects.fivethirtyeight.com/polls/data/generic_ballot_averages.csv"
df <- read.csv(data_url)
# Display the first few rows of the dataset to get an overview
head(df)
## candidate pct_estimate lo hi date election cycle
## 1 Democrats 43.94449 39.33347 48.55550 2017-04-15 2018-11-06 2018
## 2 Republicans 39.54969 34.93867 44.16071 2017-04-15 2018-11-06 2018
## 3 Democrats 43.74965 39.14054 48.35876 2017-04-16 2018-11-06 2018
## 4 Republicans 39.59254 34.98343 44.20165 2017-04-16 2018-11-06 2018
## 5 Democrats 43.74553 39.13599 48.35508 2017-04-17 2018-11-06 2018
## 6 Republicans 39.58794 34.97840 44.19749 2017-04-17 2018-11-06 2018
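Because read.csv() does not parse dates, the date and election columns arrive as text (character vectors in R 4.0 and later). The sketch below converts them to Date objects and counts rows per cycle and party. The counts confirm the long layout visible above, with one row per party per date. To run offline, it uses a hypothetical toy data frame (sample_df) typed in from the six rows printed above. With the real data you would apply the same steps to df. The days_to_election column is an illustrative addition, not part of the original file.

```r
library(dplyr)

# Hypothetical toy input: the six rows printed by head(df) above, typed in
# by hand so the sketch runs without a network connection.
sample_df <- data.frame(
  candidate = rep(c("Democrats", "Republicans"), times = 3),
  pct_estimate = c(43.94449, 39.54969, 43.74965, 39.59254, 43.74553, 39.58794),
  lo = c(39.33347, 34.93867, 39.14054, 34.98343, 39.13599, 34.97840),
  hi = c(48.55550, 44.16071, 48.35876, 44.20165, 48.35508, 44.19749),
  date = rep(c("2017-04-15", "2017-04-16", "2017-04-17"), each = 2),
  election = "2018-11-06",
  cycle = 2018
)

# Parse the ISO-formatted strings into Date objects, then use date
# arithmetic to get the number of days remaining until the election.
sample_df <- sample_df %>%
  mutate(
    date = as.Date(date),
    election = as.Date(election),
    days_to_election = as.numeric(election - date)
  )

# Count rows per cycle and party: in long format each party should
# appear once per polling date.
row_counts <- sample_df %>%
  count(cycle, candidate)
print(row_counts)
```

On these six rows each party appears three times, once per date. On the full file the same count shows how many daily estimates each cycle contains.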
Creating Transformations
In this section, we clean and reshape the dataset. We select the relevant columns, rename them for clarity, and expand any abbreviated party names into full names. Finally, we display the first rows of the transformed dataset.
# Select the columns used in the analysis (the seven columns shown by
# head(df) above), which also fixes their order
df_subset <- df %>%
  select(
    candidate,
    pct_estimate,
    lo,
    hi,
    date,
    election,
    cycle
  )
# Rename columns to descriptive names; the election column holds the
# date of the election (e.g. 2018-11-06), so it becomes election_date
df_subset <- df_subset %>%
  rename(
    candidate_name = candidate,
    percentage_estimate = pct_estimate,
    lower_bound = lo,
    upper_bound = hi,
    polling_date = date,
    election_date = election,
    election_cycle = cycle
  )
# Expand single-letter party codes ("D", "R") into full names, if present.
# Judging by the rows shown above, candidate_name already holds full party
# names, so this step is a safeguard and likely leaves the data unchanged.
df_subset <- df_subset %>%
  mutate(candidate_name = case_when(
    candidate_name == "D" ~ "Democrats",
    candidate_name == "R" ~ "Republicans",
    TRUE ~ candidate_name
  ))
# Display the first few rows of the transformed dataset
head(df_subset)
## candidate_name percentage_estimate lower_bound upper_bound polling_date
## 1 Democrats 43.94449 39.33347 48.55550 2017-04-15
## 2 Republicans 39.54969 34.93867 44.16071 2017-04-15
## 3 Democrats 43.74965 39.14054 48.35876 2017-04-16
## 4 Republicans 39.59254 34.98343 44.20165 2017-04-16
## 5 Democrats 43.74553 39.13599 48.35508 2017-04-17
## 6 Republicans 39.58794 34.97840 44.19749 2017-04-17
## election_date election_cycle
## 1 2018-11-06 2018
## 2 2018-11-06 2018
## 3 2018-11-06 2018
## 4 2018-11-06 2018
## 5 2018-11-06 2018
## 6 2018-11-06 2018
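Since the data is in long format, comparing the two parties requires pairing their rows within each polling date. The sketch below computes a daily Democratic margin (Democratic estimate minus Republican estimate) using only dplyr. It assumes each date has exactly one row per party. The toy data frame toy_subset is hypothetical: it holds the six rows printed above under the renamed columns. With the real data you would start from df_subset instead.

```r
library(dplyr)

# Hypothetical toy input: the six rows printed above, with renamed columns.
toy_subset <- data.frame(
  candidate_name = rep(c("Democrats", "Republicans"), times = 3),
  percentage_estimate = c(43.94449, 39.54969, 43.74965, 39.59254, 43.74553, 39.58794),
  polling_date = rep(c("2017-04-15", "2017-04-16", "2017-04-17"), each = 2),
  election_cycle = 2018
)

# Within each cycle and date, subtract the Republican estimate from the
# Democratic one (assumes exactly one row per party per date).
daily_margin <- toy_subset %>%
  group_by(election_cycle, polling_date) %>%
  summarise(
    dem_margin = percentage_estimate[candidate_name == "Democrats"] -
      percentage_estimate[candidate_name == "Republicans"],
    .groups = "drop"
  )
print(daily_margin)
```

On these rows the margin is about 4.39 points on 2017-04-15 and about 4.16 points on the next two days. Plotting dem_margin against polling_date for the full dataset is one way to follow the shifts in sentiment the conclusion refers to.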
Conclusion
The first rows of the dataset, from 2017-04-15 to 2017-04-17 in the 2018 cycle, put Democrats at roughly 43.9% and Republicans at roughly 39.5%, a Democratic lead of about 4.4 points. These are single-day estimates rather than averages over the whole cycle. The lower and upper bounds for the two parties also overlap: on 2017-04-15, for example, the range is 39.3 to 48.6 for Democrats and 34.9 to 44.2 for Republicans. Summarizing the full cycle requires aggregating all of its rows rather than reading off the first few. Further work could then investigate the factors behind shifts in voter sentiment over time.
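A cycle-wide figure requires averaging every daily estimate for each party. The sketch below does this with group_by() and summarise(). It runs on a hypothetical toy data frame (toy_subset) made from the six rows printed above, so its output describes only those three days, not the 2018 cycle. With the real data you would replace toy_subset with df_subset.

```r
library(dplyr)

# Hypothetical toy input: the six rows printed above, with renamed columns.
toy_subset <- data.frame(
  candidate_name = rep(c("Democrats", "Republicans"), times = 3),
  percentage_estimate = c(43.94449, 39.54969, 43.74965, 39.59254, 43.74553, 39.58794),
  election_cycle = 2018
)

# Average each party's daily estimates within each cycle and record how
# many daily estimates went into the mean.
cycle_averages <- toy_subset %>%
  group_by(election_cycle, candidate_name) %>%
  summarise(
    mean_estimate = mean(percentage_estimate),
    n_days = n(),
    .groups = "drop"
  )
print(cycle_averages)
```

On the toy rows the averages come out near 43.81% for Democrats and 39.58% for Republicans, each from three daily estimates.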