This analysis explores the Generic Ballot Poll Data published by FiveThirtyEight. The dataset contains polling averages of which party's candidate voters say they would choose in a generic election for the House of Representatives.
# Load required packages
library(readr)
library(dplyr)
# Read the data from the URL
data_url <- "https://raw.githubusercontent.com/yvargas1590/congress-generic-ballot/main/generic_ballot_averages.csv"
data <- read_csv(data_url)
## Rows: 3986 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): candidate
## dbl (4): pct_estimate, lo, hi, cycle
## date (2): date, election
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Fix column name by removing the tab character
colnames(data)[2] <- "pct_estimate"
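The positional assignment above only works if pct_estimate is the second column. A sketch that does not depend on column order strips tab characters from every column name instead. It assumes the only problem in the raw header is stray tabs; the helper strip_tabs_from_names() is hypothetical, and the toy data frame and its values are made up for illustration.

```r
# Hypothetical helper (not part of any package): remove tab characters
# from all column names, regardless of their position.
# Assumption: stray tabs are the only problem in the raw header.
strip_tabs_from_names <- function(df) {
  names(df) <- gsub("\t", "", names(df), fixed = TRUE)
  df
}

# Made-up toy data whose second column name ends in a tab
toy <- data.frame(candidate = c("Democrats", "Republicans"),
                  estimate = c(43.9, 39.5))
names(toy) <- c("candidate", "pct_estimate\t")

toy <- strip_tabs_from_names(toy)
print(names(toy))
```

Because the cleanup matches on the tab character rather than a column index, running it twice, or on a file whose columns are reordered, leaves the names unchanged.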
# Subset the columns in the dataset
columns <- c("candidate", "pct_estimate", "cycle")
subset_data <- data %>% select(all_of(columns))
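all_of() makes select() strict: if any requested column is missing, it throws an error instead of skipping the column as any_of() would. A base-R sketch of the same strict behaviour, using a hypothetical helper select_strict() and made-up toy rows that share column names with the FiveThirtyEight file:

```r
# Hypothetical helper: select columns by name, stopping if any are missing
# (mirrors the strict behaviour of tidyselect's all_of()).
select_strict <- function(df, cols) {
  missing <- setdiff(cols, names(df))
  if (length(missing) > 0) {
    stop("Columns not found: ", paste(missing, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}

# Made-up toy rows with some of the same column names as the real file
toy <- data.frame(candidate = c("Democrats", "Republicans"),
                  pct_estimate = c(43.9, 39.5),
                  cycle = c(2018, 2018),
                  election = as.Date(c("2018-11-06", "2018-11-06")))

columns <- c("candidate", "pct_estimate", "cycle")
toy_subset <- select_strict(toy, columns)
print(toy_subset)

# A misspelled column name raises an error instead of being ignored
res <- tryCatch(select_strict(toy, c("candidate", "pct")),
                error = function(e) conditionMessage(e))
print(res)
```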
# Confirm the subset has the expected column names
stopifnot(identical(colnames(subset_data), c("candidate", "pct_estimate", "cycle")))
# Preview the resulting subset of the dataset
subset_data
## # A tibble: 3,986 × 3
## candidate pct_estimate cycle
## <chr> <dbl> <dbl>
## 1 Democrats 43.9 2018
## 2 Republicans 39.5 2018
## 3 Democrats 43.7 2018
## 4 Republicans 39.6 2018
## 5 Democrats 43.7 2018
## 6 Republicans 39.6 2018
## 7 Democrats 43.7 2018
## 8 Republicans 39.6 2018
## 9 Democrats 43.4 2018
## 10 Republicans 38.8 2018
## # ℹ 3,976 more rows
# Keep only the Democratic and Republican estimates
subset_data <- subset_data %>% filter(candidate %in% c("Democrats", "Republicans"))
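Before filtering, it can help to check which values candidate actually takes and how many rows each party has per cycle. A base-R sketch on made-up toy data; the "Other" label is hypothetical and only there to show a row being dropped by the same %in% test used in the dplyr filter.

```r
# Made-up toy data; "Other" is a hypothetical label used only to show filtering
toy <- data.frame(candidate = c("Democrats", "Republicans", "Other",
                                "Democrats", "Republicans"),
                  pct_estimate = c(43.9, 39.5, 5.0, 43.7, 39.6),
                  cycle = c(2018, 2018, 2018, 2018, 2018))

# Count rows per party and cycle before filtering
print(table(toy$candidate, toy$cycle))

# Keep only the two major parties with the same %in% condition
two_party <- toy[toy$candidate %in% c("Democrats", "Republicans"), ]
print(table(two_party$candidate))
```

The table also shows that each party has several rows within a single cycle, which matters when cycle is used as the x-axis of a plot.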
library(ggplot2)
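# Each cycle holds many dated estimates, so average them to get one point per party per cycle
cycle_means <- subset_data %>%
  group_by(cycle, candidate) %>%
  summarise(mean_pct = mean(pct_estimate, na.rm = TRUE), .groups = "drop")
# Plot the per-cycle averages for each party
ggplot(cycle_means, aes(x = cycle, y = mean_pct, color = candidate, linetype = candidate)) +
  geom_line() +
  geom_point() +
  labs(x = "Election cycle", y = "Average estimate (%)", title = "Public Opinion on Republicans vs Democrats") +
  theme_minimal()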
In this analysis, we subset the Generic Ballot Poll Data from FiveThirtyEight, removed a stray tab character from the “pct_estimate” column name, and kept only the Democratic and Republican estimates. The resulting dataset contains the columns “candidate”, “pct_estimate”, and “cycle”.
To extend this analysis, one could plot the estimates against the “date” column to see how opinion shifts within each cycle, or show the range given by the “lo” and “hi” columns around each estimate. One could also compare the final estimate before each election with the actual House results to check the accuracy of the generic ballot polls.