Data Description

Project 538 provided over thirty thousand rows of data from the 2016, 2020, and 2024 presidential elections. Twenty thousand rows related to specific states were excluded.

Key variables included:

Methods

Load assessment data

Load spending data

New names: 112 unnamed columns were renamed `...73` through `...184`.

Load demographic data

Warning: One or more parsing issues, call `problems()` on your data frame
for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 62 Columns: 5
── Column specification ────────────────────────────────────────────
Delimiter: ","
chr (3): County, FIPS, Rank within US (of 3143 counties)
dbl (2): Value (Percent), People (Unemployed)

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
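The column specification above parses `FIPS` as character, while the later files parse it as a number, which can complicate joins on that key. Following the hint in the message, a minimal sketch of declaring the column types explicitly might look like this. The inline CSV text and its values are made-up placeholders, since the real files and paths are not shown in this report.

```r
library(readr)

# Hypothetical two-row extract standing in for one of the demographic files;
# county names and values are placeholders, not the report's data.
csv_text <- I("County,FIPS,Value (Percent)\nCounty A,01001,1.5\nCounty B,01003,2.5\n")

demo <- read_csv(
  csv_text,
  col_types = cols(
    County = col_character(),
    FIPS = col_character(),            # keep leading zeros; consistent join key
    `Value (Percent)` = col_double()
  )
)

# List any cells readr could not parse with the requested types
problems(demo)
```

Passing `col_types` also silences the column specification message, so `show_col_types = FALSE` is not needed in that case.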

Adding in new data

Warning: One or more parsing issues, call `problems()` on your data frame
for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 64 Columns: 5
── Column specification ────────────────────────────────────────────
Delimiter: ","
chr (2): County, Rank within US (of 3143 counties)
dbl (3): FIPS, Value (Percent), People (Education: Less Than 9th...

Warning: One or more parsing issues, call `problems()` on your data frame
for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 64 Columns: 5
── Column specification ────────────────────────────────────────────
Delimiter: ","
chr (2): County, Rank within US (of 3143 counties)
dbl (3): FIPS, Value (Percent), People(Education: Less Than High...

Warning: One or more parsing issues, call `problems()` on your data frame
for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 64 Columns: 5
── Column specification ────────────────────────────────────────────
Delimiter: ","
chr (2): County, Rank within US (of 3143 counties)
dbl (3): FIPS, Value (Percent), People (Education: At Least Bach...

Warning: One or more parsing issues, call `problems()` on your data frame
for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 65 Columns: 4
── Column specification ────────────────────────────────────────────
Delimiter: ","
chr (2): County, Rank within US (of 3142 counties)
dbl (1): FIPS
num (1): Value (Dollars)

Warning: One or more parsing issues, call `problems()` on your data frame
for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 65 Columns: 4
── Column specification ────────────────────────────────────────────
Delimiter: ","
chr (2): County, Rank within US (of 3142 counties)
dbl (1): FIPS
num (1): Value (Dollars)

Warning: One or more parsing issues, call `problems()` on your data frame
for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 64 Columns: 4
── Column specification ────────────────────────────────────────────
Delimiter: ","
chr (2): County, Rank within US (of 3135 counties)
dbl (1): FIPS
num (1): Value (Dollars)


Joined data

Correlations

Create test/training data
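A minimal sketch of one way to produce the `t_train` and `t_test` objects referred to in this report. The helper name `split_train_test`, the 70/30 proportion, and the seed are assumptions, and the built-in `mtcars` data stands in for the joined data, which is not shown here.

```r
# Hypothetical helper: the report's own splitting function is not shown.
split_train_test <- function(data, prop = 0.7, seed = 1) {
  set.seed(seed)                                   # reproducible split
  n_train <- floor(prop * nrow(data))
  train_idx <- sample(seq_len(nrow(data)), size = n_train)
  list(
    t_train = data[train_idx, , drop = FALSE],     # rows used to fit the model
    t_test  = data[-train_idx, , drop = FALSE]     # held-out rows for evaluation
  )
}

# Stand-in data (mtcars), not the project's joined data
parts   <- split_train_test(mtcars, prop = 0.7)
t_train <- parts$t_train
t_test  <- parts$t_test

c(train = nrow(t_train), test = nrow(t_test))
```

Returning both pieces in a named list keeps the split in a single call, so the same helper can be reused on a resampled data set later.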

Linear Regression Model


Call:
lm(formula = proficiency ~ enroll + tlocrev + ppcstot + unemployed + 
    at_least_bachelor_education + household_income, data = t_train)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.7360 -2.9337 -0.2146  2.3756  8.5571 

Coefficients:
                               Estimate  Std. Error t value Pr(>|t|)   
(Intercept)                  8.59405726 10.62009737   0.809  0.42418   
enroll                       0.00045645  0.00052416   0.871  0.39014   
tlocrev                     -0.00001838  0.00008596  -0.214  0.83201   
ppcstot                      0.00070458  0.00044490   1.584  0.12281   
unemployed                  -0.19554527  0.27793175  -0.704  0.48663   
at_least_bachelor_education  0.39826973  0.13013574   3.060  0.00437 **
household_income             0.00007607  0.00012306   0.618  0.54074   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.912 on 33 degrees of freedom
Multiple R-squared:  0.5629,    Adjusted R-squared:  0.4834 
F-statistic: 7.082 on 6 and 33 DF,  p-value: 0.00006584

Root Mean Squared Error (RMSE): 4.939345 
R-squared: 0.3152698 
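For reference, held-out RMSE and R-squared figures like the two above can be computed as sketched below. This is an illustration on the built-in `mtcars` data with an assumed split and formula, not the report's code. The report does not say which R-squared definition it used, so both common variants are shown.

```r
# Stand-in data and split (assumptions; the report's data are not shown)
set.seed(1)
idx     <- sample(seq_len(nrow(mtcars)), size = 22)
t_train <- mtcars[idx, ]
t_test  <- mtcars[-idx, ]

# Fit on the training rows, predict on the held-out rows
fit  <- lm(mpg ~ wt + hp, data = t_train)
pred <- predict(fit, newdata = t_test)

# RMSE: square root of the mean squared prediction error
rmse <- sqrt(mean((t_test$mpg - pred)^2))

# R-squared, variant 1: squared correlation of observed and predicted values
r2_cor <- cor(t_test$mpg, pred)^2

# R-squared, variant 2: 1 - SSE/SST (can be negative on held-out data)
sse    <- sum((t_test$mpg - pred)^2)
sst    <- sum((t_test$mpg - mean(t_test$mpg))^2)
r2_sse <- 1 - sse / sst

cat("Root Mean Squared Error (RMSE):", rmse, "\n")
cat("R-squared:", r2_cor, "\n")
```

A test-set R-squared well below the training value, as in the output above (0.315 versus 0.563), is a common sign that the model fits the training rows better than it generalizes.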
Error in usmap::map_with_data(data, values = values, include = include,  : 
  `data` must be a data.frame containing either a `state` or `fips` column.
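The error above says the data passed to the mapping function needs a column named `state` or `fips`. The loaded files name the column `FIPS` in upper case, so one plausible fix, sketched here under assumptions, is to rename it and store it as a five-character string. The data frame, its values, and the `include = "NY"` filter are hypothetical placeholders.

```r
# Hypothetical county-level frame; the FIPS codes are valid county codes used
# only as placeholders, and the proficiency values are made up.
county_data <- data.frame(
  FIPS = c(36001, 36005),
  proficiency = c(10, 20)
)

# Rename FIPS -> fips and zero-pad to five characters
map_data <- county_data
names(map_data)[names(map_data) == "FIPS"] <- "fips"
map_data$fips <- sprintf("%05d", as.integer(map_data$fips))

# Only attempt the map when usmap is installed
if (requireNamespace("usmap", quietly = TRUE)) {
  p <- usmap::plot_usmap(
    regions = "counties",
    include = "NY",            # assumption: restrict the map to one state
    data    = map_data,
    values  = "proficiency"
  )
}
```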

PCA

K-means

Decision Tree

Cross-validation

Bootstrapping

We don’t have an equal number of spam and non-spam messages, so we can use bootstrapping (resampling with replacement) to create a more balanced dataset.

Note that we re-used the function from earlier to return a t_train and a t_test using our bootstrapped sample.
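A minimal sketch of the idea, assuming the smaller class is resampled with replacement up to the size of the larger one. The helper names `balance_by_bootstrap` and `split_train_test`, the 70/30 split, and the example data frame are all hypothetical, since the report's own function and data are not shown.

```r
# Hypothetical stand-in for "the function from earlier"
split_train_test <- function(data, prop = 0.7, seed = 1) {
  set.seed(seed)
  train_idx <- sample(seq_len(nrow(data)), size = floor(prop * nrow(data)))
  list(t_train = data[train_idx, , drop = FALSE],
       t_test  = data[-train_idx, , drop = FALSE])
}

# Resample every class with replacement up to the size of the largest class
balance_by_bootstrap <- function(data, class_col, seed = 1) {
  set.seed(seed)
  groups <- split(data, data[[class_col]])
  target <- max(vapply(groups, nrow, integer(1)))
  resampled <- lapply(groups, function(g) {
    g[sample(seq_len(nrow(g)), size = target, replace = TRUE), , drop = FALSE]
  })
  out <- do.call(rbind, resampled)
  rownames(out) <- NULL
  out
}

# Made-up imbalanced data: 80 non-spam rows and 20 spam rows
messages <- data.frame(
  label   = rep(c("non-spam", "spam"), times = c(80, 20)),
  n_chars = seq_len(100)
)

balanced <- balance_by_bootstrap(messages, "label")
table(balanced$label)

# Reuse the splitting helper on the bootstrapped sample
parts   <- split_train_test(balanced, prop = 0.7)
t_train <- parts$t_train
t_test  <- parts$t_test
```

Because rows are drawn with replacement before the split, copies of the same original row can land in both `t_train` and `t_test`. Splitting first and bootstrapping only the training rows avoids that leakage.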

