In this analysis, I use the polls_2008 dataset from the dslabs package. This dataset contains polling data from the 2008 U.S. presidential election in two columns: day, the number of days relative to Election Day (negative values, from -155 to -1), and margin, Obama’s lead over McCain, stored as a proportion (0.05 means a 5-point lead).
For this visualization, I created a scatter plot to show the relationship between the number of days before the election (day) and the polling margin (margin). Each point represents a specific day’s polling data, with the margin showing how much Obama was leading by that particular day. A linear regression line has been added to the plot to visualize the trend over time, showing how Obama’s polling margin changed as the election day approached.
The x-axis represents the number of days before the election, and the y-axis represents Obama’s polling margin. The y-axis label reads “Polling Margin (%)”, although the plotted values are proportions, so 0.05 corresponds to a 5-point lead. The plot uses a minimal theme to maintain clarity and focus on the data, with the title “Polling Margin in 2008 U.S. Presidential Election” placed at the top for context.
By examining this plot, we can see how Obama’s polling margin fluctuated and identify any trends leading up to the election. I present two graphs: the scatter plot with a regression line described above, and a second plot that connects the daily values with a line.
## Load necessary packages
library(dslabs)
library(tidyverse)
library(ggthemes)
summary(polls_2008)
      day              margin
 Min.   :-155.00   Min.   :-0.05000
 1st Qu.:-111.50   1st Qu.: 0.02417
 Median : -72.00   Median : 0.04500
 Mean   : -74.31   Mean   : 0.04223
 3rd Qu.: -35.00   3rd Qu.: 0.06083
 Max.   :  -1.00   Max.   : 0.12000
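The summary shows that margin is stored as a proportion rather than a percentage. As a minimal sketch, assuming a percentage-point scale is preferred for reporting, the column can be rescaled like this (the names `polls_pct` and `margin_pct` are illustrative and not part of dslabs):

```r
library(dslabs)

# margin is a proportion, so multiplying by 100 gives percentage points
polls_pct <- transform(polls_2008, margin_pct = 100 * margin)

# Smallest and largest lead, now in percentage points
range(polls_pct$margin_pct)
```

The original margin column is left untouched, so the plots in this report still work on the unscaled values.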
ggplot(polls_2008, aes(x = day, y = margin)) +
  geom_point(alpha = 0.6, color = "blue") +  # Plot individual points with some transparency
  geom_smooth(method = "lm", se = FALSE, color = "green") +  # Add a linear regression line
  scale_x_continuous(name = "Days Before Election") +  # Label for the x-axis
  scale_y_continuous(name = "Polling Margin (%)") +  # Label for the y-axis
  ggtitle("Polling Margin in 2008 U.S. Presidential Election") +  # Title of the plot
  theme_minimal() +  # Use a clean, minimal theme
  theme(plot.title = element_text(hjust = 0.5))  # Center the title
`geom_smooth()` using formula = 'y ~ x'
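Because the y-axis label mentions percent while the data are proportions, one possible adjustment is to let the scale format the tick labels as percentages. This is a sketch, not the plot used in this report; it assumes the scales package is installed (it comes with ggplot2), and `p_pct` is just an illustrative name:

```r
library(dslabs)
library(ggplot2)

p_pct <- ggplot(polls_2008, aes(x = day, y = margin)) +
  geom_point(alpha = 0.6, color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "green") +
  scale_x_continuous(name = "Days Before Election") +
  # label_percent() multiplies each tick value by 100 and appends "%"
  scale_y_continuous(name = "Polling Margin", labels = scales::label_percent()) +
  ggtitle("Polling Margin in 2008 U.S. Presidential Election") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

p_pct
```

Only the tick labels change; the underlying data and the fitted line are the same as before.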
## First visualization interpretation
This graph shows the polling margin in the 2008 U.S. presidential election over the days leading up to the election. The x-axis represents the number of days before the election, and the y-axis shows the polling margin, the difference in support between Obama and McCain. The blue dots represent individual polling data points, and the green line is a linear regression fit showing the trend over time. The slope of the line indicates whether the margin tended to widen or narrow as Election Day approached, and the straight line summarizes the general trend without following every small fluctuation. The graph uses a minimal theme for a clean look and centers the title for better readability.
Interpretation of the graph:
X-axis (Days Before Election): This axis shows the number of days remaining until the election, recorded as negative values (-155 to -1). As you move from left to right, the days get closer to the election date.
Y-axis (Polling Margin %): The y-axis represents the polling margin, the difference in support between Obama and McCain. The values are proportions, so 0.05 is a 5-point lead. Positive values indicate Obama leading, and negative values indicate McCain ahead.
Blue Dots (Individual Data Points): Each blue dot represents a polling result on a specific day leading up to the election. The transparency (alpha = 0.6) makes it easier to see overlapping points.
Green Line (Linear Regression): The green line shows the general trend of polling results over time. This line is calculated using linear regression, which helps us understand whether the polling margin was, on average, increasing or decreasing as election day got closer.
Title: The title at the top of the graph (“Polling Margin in 2008 U.S. Presidential Election”) gives context to the graph and helps us know exactly what we are looking at.
Theme and Aesthetics: The minimal theme makes the graph look clean and simple. The title is centered to make it more readable and visually appealing.
In short, the blue dots are the individual polling results and the green line is the overall linear trend. Because the line is straight, it shows only whether the margin was generally increasing or decreasing over time, not the shorter swings along the way.
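The direction of the regression line can also be checked numerically. The following is a minimal sketch that fits the same kind of model `geom_smooth(method = "lm")` draws, using base R's `lm()`; the names `fit` and `slope` are illustrative:

```r
library(dslabs)

# Same straight-line model as the plot: margin as a linear function of day
fit <- lm(margin ~ day, data = polls_2008)

# Slope: average change in margin (as a proportion) per day
slope <- unname(coef(fit)["day"])
slope

# The same slope expressed in percentage points per day
100 * slope
```

A positive slope means the margin tended to grow as Election Day approached, and a negative slope means it tended to shrink.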
## Creating a scatterplot with a connecting line
ggplot(polls_2008, aes(x = day, y = margin)) +  # polls_2008 provides the day and margin columns
  geom_line(color = "blue", linewidth = 1) +  # Line connecting consecutive observations
  geom_point(color = "purple", alpha = 0.6) +  # Individual data points
  theme_minimal() +  # Clean theme
  labs(title = "Polling Margin in 2008 Election", x = "Date", y = "Obama Lead (%)") +
  theme(plot.title = element_text(hjust = 0.5))  # Center the title
## Second visualization interpretation
In this visualization, I created a plot to show the polling margin for Obama in the 2008 election as the election date approached. Here’s how I interpret it:
X-axis (Date): This shows the number of days before Election Day (Nov 4, 2008). As the days get closer to the election, we can see how polling data changes.
Y-axis (Obama Lead %): This shows the margin by which Obama was leading in the polls. The values are proportions, so 0.05 is a 5-point lead. Positive values mean Obama was ahead, while negative values mean McCain was ahead.
Blue Line (Connecting line): The line joins consecutive data points, so it traces how Obama’s polling lead moved from one day to the next as the election got closer. It is not a fitted trend, but it still helps us see whether Obama’s lead was rising or falling over time.
Purple Points (Data Points): Each purple point represents the polling result for a certain day, showing Obama’s lead on that exact day.
Title: The title “Polling Margin in 2008 Election” tells us what the plot is about.
Labels: The X-axis and Y-axis are clearly labeled to tell us what the data represents.
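Since the day column counts days relative to Election Day (Nov 4, 2008), calendar dates can be recovered by adding it to the election date. This is a sketch of one way to do that; `election_day` and `polls_dated` are illustrative names, and the `date` column it creates is not part of the dataset as shipped:

```r
library(dslabs)

election_day <- as.Date("2008-11-04")

# day is negative, so adding it moves backwards from Election Day
polls_dated <- transform(polls_2008, date = election_day + day)

head(polls_dated)
```

With a column like this, `date` could be mapped to the x-axis if a calendar axis is preferred over a day count.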
In a nutshell, this plot helps us understand how Obama’s lead in the polls changed over time leading up to the election. The blue line traces the movement from one day to the next, and the points represent daily polling results.
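If a smoothed trend is wanted instead of a line joining every point, one option is a loess curve via `geom_smooth()`. This is only a sketch of an alternative, not the plot described above; the `span = 0.3` value is an arbitrary assumption that controls how closely the curve follows the data, and `p_smooth` is an illustrative name:

```r
library(dslabs)
library(ggplot2)

p_smooth <- ggplot(polls_2008, aes(x = day, y = margin)) +
  geom_point(color = "purple", alpha = 0.6) +
  # Local regression (loess): smaller span follows the points more closely
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3,
              se = FALSE, color = "blue") +
  theme_minimal() +
  labs(title = "Polling Margin in 2008 Election",
       x = "Days Before Election", y = "Obama Lead (proportion)") +
  theme(plot.title = element_text(hjust = 0.5))

p_smooth
```

Trying a few span values is a reasonable way to see how sensitive the apparent trend is to the amount of smoothing.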