Introduction

The generic_ballot_averages.csv dataset from FiveThirtyEight contains daily generic ballot polling averages. The generic ballot asks voters which party they would support in a congressional election without naming individual candidates, which makes the dataset useful for tracking voter sentiment towards the political parties across election cycles.

Article information: https://projects.fivethirtyeight.com/polls/generic-ballot/

# Load the necessary libraries
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Import data through URL
data_url <- "https://projects.fivethirtyeight.com/polls/data/generic_ballot_averages.csv"

# Load the dataset from the URL
df <- read.csv(data_url)

# Display the first few rows of the dataset to get an overview
head(df)
##     candidate pct_estimate       lo       hi       date   election cycle
## 1   Democrats     43.94449 39.33347 48.55550 2017-04-15 2018-11-06  2018
## 2 Republicans     39.54969 34.93867 44.16071 2017-04-15 2018-11-06  2018
## 3   Democrats     43.74965 39.14054 48.35876 2017-04-16 2018-11-06  2018
## 4 Republicans     39.59254 34.98343 44.20165 2017-04-16 2018-11-06  2018
## 5   Democrats     43.74553 39.13599 48.35508 2017-04-17 2018-11-06  2018
## 6 Republicans     39.58794 34.97840 44.19749 2017-04-17 2018-11-06  2018
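
The date and election columns print like dates, but read.csv() does not parse dates, so they arrive as character strings (or factors in R versions before 4.0). The sketch below shows one way to convert them, using two rows copied from the output above instead of the full download. The names sample_df and days_to_election are illustrative and not part of the original analysis.

```r
# Illustrative rows copied from the head(df) output above
sample_df <- data.frame(
  candidate    = c("Democrats", "Republicans"),
  pct_estimate = c(43.94449, 39.54969),
  date         = c("2017-04-15", "2017-04-15"),
  election     = c("2018-11-06", "2018-11-06"),
  stringsAsFactors = FALSE
)

# Convert the ISO-formatted strings into Date objects
sample_df$date     <- as.Date(sample_df$date)
sample_df$election <- as.Date(sample_df$election)

# Number of days between each polling average and the election (hypothetical helper column)
sample_df$days_to_election <- as.numeric(sample_df$election - sample_df$date)

# Inspect the column types after conversion
str(sample_df)
```

With proper Date columns, the averages can be filtered or plotted by time without string comparisons.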

Creating Transformations

In this section, we clean the dataset: we select the relevant columns, rename them for clarity, and replace any non-intuitive abbreviations. Finally, we display the first rows of the transformed dataset.

# Load necessary libraries
library(dplyr)

# Load the dataset from the provided URL
data_url <- "https://projects.fivethirtyeight.com/polls/data/generic_ballot_averages.csv"
df <- read.csv(data_url)

# Create a subset of the columns
df_subset <- df %>%
  select(
    candidate,
    pct_estimate,
    lo,
    hi,
    date,
    election,
    cycle
  )

# Rename columns
df_subset <- df_subset %>%
  rename(
    candidate_name = candidate,
    percentage_estimate = pct_estimate,
    lower_bound = lo,
    upper_bound = hi,
    polling_date = date,
    election_date = election,  # this column holds the election date, not a type
    election_cycle = cycle
  )

# Replace any non-intuitive abbreviations if necessary
# For example, if "D" represents Democrats and "R" represents Republicans
# (the rows shown earlier already use full party names, so they pass through unchanged):
df_subset$candidate_name <- ifelse(df_subset$candidate_name == "D", "Democrats",
                                   ifelse(df_subset$candidate_name == "R", "Republicans", df_subset$candidate_name))

# Display the first few rows of the transformed dataset
head(df_subset)
##   candidate_name percentage_estimate lower_bound upper_bound polling_date
## 1      Democrats            43.94449    39.33347    48.55550   2017-04-15
## 2    Republicans            39.54969    34.93867    44.16071   2017-04-15
## 3      Democrats            43.74965    39.14054    48.35876   2017-04-16
## 4    Republicans            39.59254    34.98343    44.20165   2017-04-16
## 5      Democrats            43.74553    39.13599    48.35508   2017-04-17
## 6    Republicans            39.58794    34.97840    44.19749   2017-04-17
##   election_date election_cycle
## 1    2018-11-06           2018
## 2    2018-11-06           2018
## 3    2018-11-06           2018
## 4    2018-11-06           2018
## 5    2018-11-06           2018
## 6    2018-11-06           2018
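
Because each polling date has one row per party, a natural next step is to put the two estimates side by side and compute the gap between them. The following is a minimal sketch of that idea, not part of the original analysis. It uses the six rows printed above, and the names sample_subset, daily_margin, and margin are illustrative.

```r
library(dplyr)

# Illustrative rows copied from the head(df_subset) output above
sample_subset <- data.frame(
  candidate_name      = rep(c("Democrats", "Republicans"), times = 3),
  percentage_estimate = c(43.94449, 39.54969, 43.74965, 39.59254, 43.74553, 39.58794),
  polling_date        = rep(c("2017-04-15", "2017-04-16", "2017-04-17"), each = 2),
  stringsAsFactors = FALSE
)

# One row per polling date, with the Democratic lead in percentage points
daily_margin <- sample_subset %>%
  group_by(polling_date) %>%
  summarise(
    democrats   = percentage_estimate[candidate_name == "Democrats"],
    republicans = percentage_estimate[candidate_name == "Republicans"],
    margin      = democrats - republicans
  )

print(daily_margin)
```

This assumes exactly one Democrats row and one Republicans row per date, which holds for the rows shown but should be checked before applying it to the full dataset.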

Conclusion

Analyzing the generic ballot polling averages gives a first look at party-level voter preferences in the 2018 election cycle. In the earliest rows shown above (2017-04-15), the Democrats led with an estimate of about 43.9%, against about 39.5% for the Republicans, a gap of roughly 4.4 percentage points. These are single-day estimates rather than averages over the whole cycle, so a fuller analysis would aggregate the estimates over time and investigate the factors that drive shifts in voter sentiment.
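
A cycle-level average can be computed by grouping on party and election cycle. The sketch below illustrates the approach on the six rows printed earlier, so the resulting means describe only those three days and not the whole 2018 cycle. The names sample_subset, cycle_means, mean_estimate, and n_days are illustrative.

```r
library(dplyr)

# Illustrative rows copied from the output shown earlier (three days only)
sample_subset <- data.frame(
  candidate_name      = rep(c("Democrats", "Republicans"), times = 3),
  percentage_estimate = c(43.94449, 39.54969, 43.74965, 39.59254, 43.74553, 39.58794),
  election_cycle      = 2018,
  stringsAsFactors = FALSE
)

# Mean estimate per party within each election cycle, with the number of days averaged
cycle_means <- sample_subset %>%
  group_by(election_cycle, candidate_name) %>%
  summarise(
    mean_estimate = mean(percentage_estimate),
    n_days        = n(),
    .groups       = "drop"
  )

print(cycle_means)
```

Running the same pipeline on the full df_subset would give one mean per party for every cycle in the file.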