An Introduction to Florida State Senate District 37

This Florida State Senate district has become a hotbed of controversy following the 2020 elections. Why is that, you ask?

Because of stories like these

This leads us to the issue of ghost candidates.

Ghost candidates are candidates who enter a race not to win it, but to disrupt the vote either in favor of, or against, another candidate. In this case, the ghost candidate, Alex Rodriguez, ran a no party affiliation (NPA) campaign funded by Republican operatives in an attempt to siphon votes from the popular Democratic incumbent, Jose Javier Rodriguez, with whom he shares a last name.

The Republican candidate, Ileana Garcia, won by just 32 votes in an election where over 200,000 votes were cast. Even more peculiar, the ghost candidate managed to garner over 6,000 votes. This raises the question: where did support for Alex Rodriguez come from?

To get a baseline for an independent candidate's performance, I will also introduce Senate District 39.

Review of the extant literature

Candidates with the same name

In this analysis we will use the Florida Voter File alongside a file containing the 2020 election results for these two senate districts, aggregated by precinct.

Loading Data

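# Packages used below: readr (read_delim), dplyr, tidyr (separate), lubridate (ymd)
library(tidyverse)
library(lubridate)

load("~/Downloads/precinct-results-spread.RData")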

###### Helper functions ######

## Age in completed years between two dates (e.g. birth date and Election Day)
age = function(from, to) {
  from_lt = as.POSIXlt(from)
  to_lt = as.POSIXlt(to)
  
  age = to_lt$year - from_lt$year
  
  ifelse(to_lt$mon < from_lt$mon |
           (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
         age - 1, age)
}
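As a quick cross-check on age(), here is a minimal base-R sketch of the same completed-years calculation using the common YYYYMMDD-integer trick: encode both dates as integers, subtract, and integer-divide by 10000. The name age_check() is hypothetical and not part of the original analysis; on the cases below it should agree with age().

```r
# Hypothetical helper: age in completed years, computed by subtracting
# dates encoded as YYYYMMDD integers and integer-dividing by 10000.
age_check <- function(from, to) {
  from_num <- as.integer(format(as.Date(from), "%Y%m%d"))
  to_num   <- as.integer(format(as.Date(to), "%Y%m%d"))
  (to_num - from_num) %/% 10000L
}

# Born the day after Election Day 2000 -> still 19 on 2020-11-03;
# born on Election Day 2000 -> 20
age_check(as.Date(c("2000-11-04", "2000-11-03")), "2020-11-03")

# A Feb 29 birthday has not yet had its 21st birthday on 2021-02-28
age_check(as.Date("2000-02-29"), "2021-02-28")
```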

## Read tab-delimited voter files (no header, every column as character) and stack them
load_voter_file <- function(files2read, colnames2use) {
  files2read %>%
    lapply(function(file2read) {
      read_delim(file = file2read,
                 col_types = strrep("c", length(colnames2use)),
                 delim = "\t",
                 escape_double = FALSE,
                 col_names = colnames2use)
    }) %>%
    bind_rows()
}
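For readers without the tidyverse installed, the same read-and-stack idea can be sketched in base R with read.delim() and do.call(rbind, ...). This is a hypothetical counterpart (load_voter_file_base() is not part of the original code), and quote = "" is my assumption that quote characters in the voter file should be read literally; it is demonstrated on two tiny temporary files with made-up rows.

```r
# Hypothetical base-R counterpart to load_voter_file(): read every
# tab-delimited, headerless file as all-character columns and stack them.
load_voter_file_base <- function(files2read, colnames2use) {
  pieces <- lapply(files2read, function(f) {
    read.delim(f, header = FALSE, col.names = colnames2use,
               colClasses = "character", quote = "",
               na.strings = character(0))
  })
  do.call(rbind, pieces)
}

# Usage on two tiny temporary files with made-up rows
cols <- c("CountyCode", "VoterID", "PartyAffiliation")
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("DAD\t100012467\tDEM", "DAD\t100021401\tREP"), f1)
writeLines("DAD\t100024317\tNPA", f2)
combined <- load_voter_file_base(c(f1, f2), cols)
```

Reading everything as character mirrors the original col_types string, so IDs and codes are never silently converted before the recoding step.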


##########################################
# Read Florida 2021 Recap file #
##########################################


# Use it to obtain voters needing assistance and vote history for the 2020 GE

colnames2userecap <- c("CountyCode", "VoterID", "NameLast", "NameSuffix", "NameFirst", 
                       "NameMiddle","RequestedPublicRecordsExemption","ResidenceAddressLine1",
                       "ResidenceAddressLine2", "ResidenceCity", "ResidenceState", "ResidenceZipcode",
                       "MailingAddressLine1", "MailingAddressLine2", "MailingAddresLine3", 
                       "MailingCity", "MailingState", "MailingZipcode", "MailingCountry",
                       "Gender", "Race", "BirthDate","RegistrationDate", "PartyAffiliation",
                       "Precinct", "PrecinctGroup", "PrecinctSplit", "PrecinctSuffix",
                       "VoterStatus","CongressionalDistrict", "HouseDistrict", "SenateDistrict",
                       "CountyCommissionDistrict", "SchoolBoardDistrict","DaytimeAreaCode", 
                       "DaytimePhoneNumber", "DaytimePhoneExtension", "FormerName",
                       "VotingAssistance", "PollWorker", "Birthplace", "Military", "MilitaryDependent",
                       "Overseas", "VoteHistoryCode", "EmailAddress")

files2readrecap <- Sys.glob(paths = "/Users/Andy/Downloads/10866_DETAIL_DAD.txt")

DADERecap <- load_voter_file(files2readrecap, colnames2userecap)

Cutting down the recap file by removing unnecessary variables

keeps <- c("CountyCode", "VoterID", 
           "ResidenceAddressLine1", "ResidenceCity", "ResidenceZipcode",  "Gender", "Race", "BirthDate","RegistrationDate", "PartyAffiliation",
           "Precinct", "PrecinctGroup", "PrecinctSplit", "PrecinctSuffix",
           "VoterStatus","CongressionalDistrict", "HouseDistrict", "SenateDistrict",
           "VotingAssistance", "VoteHistoryCode", "Birthplace", "Military", "MilitaryDependent",
           "Overseas", "EmailAddress")

#cut down recap file
DADERecap <- DADERecap %>% 
  select(all_of(keeps))

Adjusting VoterID, Party Affiliation, Race, Registration and Birth Date Columns

# Recode voter ID, party, race, and date fields
DADERecap <- DADERecap %>%
  mutate(
    # Make voter ID a numeric column
    VoterID = as.numeric(VoterID),

    # Limit parties to DEM, REP, NPA, and Other
    PartyAffiliation = case_when(
      tolower(PartyAffiliation) == "dem" ~ "DEM",
      tolower(PartyAffiliation) == "rep" ~ "REP",
      tolower(PartyAffiliation) == "npa" ~ "NPA",
      TRUE ~ "Other"),

    # Recode race codes: 3 = Black, 4 = Hispanic, 5 = White, everything else = Other
    Race = case_when(
      Race == 3 ~ 'Black',
      Race == 4 ~ 'Hispanic',
      Race == 5 ~ 'White',
      TRUE ~ 'Other'),

    # Parse date fields stored as MM/DD/YYYY
    BirthDate = as.Date(BirthDate, "%m/%d/%Y"),
    RegistrationDate = as.Date(RegistrationDate, "%m/%d/%Y"),

    # Copy the registration date so it can be split into its parts
    reg.date = ymd(RegistrationDate)) %>%

  # Split "YYYY-MM-DD" into registration year, month, and day columns
  separate(reg.date, sep = "-", into = c("reg.year", "reg.month", "reg.day"))
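The party and race recodes above are simple lookups, so a minimal base-R sketch with named vectors shows the same logic outside dplyr. The helper recode_lookup() and the sample inputs are hypothetical; like the case_when() calls, anything not in the lookup (including NA) falls through to "Other".

```r
# Hypothetical helper: map values through a named vector; unmatched
# values and NA become `other`, mirroring the TRUE ~ "Other" branch.
recode_lookup <- function(x, lookup, other = "Other") {
  out <- unname(lookup[as.character(x)])
  out[is.na(out)] <- other
  out
}

race_lookup  <- c("3" = "Black", "4" = "Hispanic", "5" = "White")
party_lookup <- c("dem" = "DEM", "rep" = "REP", "npa" = "NPA")

# Made-up inputs
race  <- recode_lookup(c("3", "4", "5", "1", NA), race_lookup)
party <- recode_lookup(tolower(c("DEM", "Rep", "npa", "LPF")), party_lookup)
```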

Adjusting Age Columns

## Compute each voter's age as of Election Day, November 3, 2020
DADERecap$Age <- age(DADERecap$BirthDate,"2020-11-03")


DADERecap <- DADERecap %>%
  mutate(
    Agecat = case_when(
      Age >= 18 & Age < 30 ~ "18-29",
      Age >= 30 & Age < 45 ~ "30-44",
      Age >= 45 & Age < 65 ~ "45-64",
      Age >= 65 & Age < 106 ~ "65-105",
      TRUE ~ "Other"
    ),
    
    Agecat2 = case_when(
      Age >= 18 & Age < 24 ~ "18-23",
      Age >= 24 & Age < 30 ~ "24-29", 
      Age >= 30 & Age < 45 ~ "30-44", 
      Age >= 45 & Age < 65 ~ "45-64",
      Age >= 65 & Age < 106 ~ "65-105", 
      TRUE ~ "Other"
    )
  )
as_tibble(DADERecap)

## # A tibble: 1,602,095 × 31
##    CountyCode   VoterID ResidenceAddressL… ResidenceCity ResidenceZipcode Gender
##    <chr>          <dbl> <chr>              <chr>         <chr>            <chr> 
##  1 DAD        100012467 "421NW 150Th St "  Miami         33168            F     
##  2 DAD        100021401 "35SW 21St AVE "   Miami         33135            F     
##  3 DAD        100024317 "35303SW 180Th AV… Florida City  33034            F     
##  4 DAD        100024558 "3929NE 171St St " N Miami Beach 33160            F     
##  5 DAD        100036452 "1840NE 124Th St " North Miami   33181            F     
##  6 DAD        100036893 "1 Andalusia Ave " Coral Gables  33134            F     
##  7 DAD        100056221 "906 Escobar AVE " Coral Gables  33134            F     
##  8 DAD        100056222 "906 Escobar AVE " Coral Gables  33134            M     
##  9 DAD        100056328 "303 East Ridge V… Cutler Bay    33157            F     
## 10 DAD        100064671 "13900SW 108Th AV… Miami         331766553        F     
## # … with 1,602,085 more rows, and 25 more variables: Race <chr>,
## #   BirthDate <date>, RegistrationDate <date>, PartyAffiliation <chr>,
## #   Precinct <chr>, PrecinctGroup <chr>, PrecinctSplit <chr>,
## #   PrecinctSuffix <chr>, VoterStatus <chr>, CongressionalDistrict <chr>,
## #   HouseDistrict <chr>, SenateDistrict <chr>, VotingAssistance <chr>,
## #   VoteHistoryCode <chr>, Birthplace <chr>, Military <chr>,
## #   MilitaryDependent <chr>, Overseas <chr>, EmailAddress <chr>, …
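The Agecat bins are half-open intervals ([18, 30), [30, 45), and so on), which base R's cut() expresses directly with right = FALSE. The sketch below is a hypothetical equivalent of the Agecat column (age_category() is not part of the original code); out-of-range and missing ages become "Other", as in the case_when() version.

```r
# Hypothetical base-R version of Agecat using cut(); right = FALSE makes each
# bin include its lower bound and exclude its upper bound, e.g. [18, 30).
age_category <- function(age) {
  cats <- cut(age, breaks = c(18, 30, 45, 65, 106), right = FALSE,
              labels = c("18-29", "30-44", "45-64", "65-105"))
  out <- as.character(cats)
  out[is.na(out)] <- "Other"  # under 18, 106 and over, or missing age
  out
}

age_category(c(17, 18, 29, 30, 44, 45, 64, 65, 105, 106, NA))
```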

Filtering the recap file so the analysis keeps only those who voted

#Creating variable column for people whose votes were counted
DADERecap$Voted <- with(DADERecap, ifelse(VoteHistoryCode == 'Y' | VoteHistoryCode == 'A' | VoteHistoryCode == 'E', 1, 0))

#Cutting the recap file to just voters
DADEVotersRecap <- filter(DADERecap, Voted == 1)
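The "voted" flag is a set-membership test on the vote history code, using the three codes counted above (Y, A, E). A minimal sketch with made-up codes:

```r
# Codes treated as a counted ballot in this analysis (Y, A, E)
counted_codes <- c("Y", "A", "E")

# Made-up vote history codes, including one missing value
history <- c("Y", "A", "E", "N", NA)
voted <- as.integer(history %in% counted_codes)
```

One difference from the ifelse() version: %in% turns a missing code into 0 rather than NA. Since filter(Voted == 1) drops NA rows anyway, the filtered set of voters is the same either way.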

Creating dummy (0/1) columns for possible independent variables

#####Creating multiple variable columns#####
#Creating variable column for early voters
DADEVotersRecap$Early <- with(DADEVotersRecap, ifelse(VoteHistoryCode == 'E', 1, 0))

#Creating variable column for mail voters
DADEVotersRecap$VBM <- with(DADEVotersRecap, ifelse(VoteHistoryCode == 'A', 1, 0))

#Creating variable column for in person voters
DADEVotersRecap$InPerson <- with(DADEVotersRecap, ifelse(VoteHistoryCode == 'Y', 1, 0))

#Creating variable column for 18-29 y/o's
DADEVotersRecap$Age_18_29 <- with(DADEVotersRecap, ifelse(Agecat == '18-29', 1, 0))

#Creating variable column for 30-44 y/o's
DADEVotersRecap$Age_30_44 <- with(DADEVotersRecap, ifelse(Agecat == '30-44', 1, 0))

#Creating variable column for 45-64 y/o's
DADEVotersRecap$Age_45_64 <- with(DADEVotersRecap, ifelse(Agecat == '45-64', 1, 0))

#Creating variable column for 65-105 y/o's
DADEVotersRecap$Age_65_105 <- with(DADEVotersRecap, ifelse(Agecat == '65-105', 1, 0))

#Creating variable column for Hispanics in general
DADEVotersRecap$Hispanic <- with(DADEVotersRecap, ifelse(Race == 'Hispanic', 1, 0))

#Creating variable column for Black in general
DADEVotersRecap$Black <- with(DADEVotersRecap, ifelse(Race == 'Black', 1, 0))

#Creating variable column for Whites in general
DADEVotersRecap$White <- with(DADEVotersRecap, ifelse(Race == 'White', 1, 0))

#Creating variable column for non-Hispanics in general
DADEVotersRecap$NonHispanic <- with(DADEVotersRecap, ifelse(Race != 'Hispanic', 1, 0))

#Creating variable column for Independents in general
DADEVotersRecap$NPA <- with(DADEVotersRecap, ifelse(PartyAffiliation == 'NPA', 1, 0))

#Creating variable column for Dems in general
DADEVotersRecap$DEM <- with(DADEVotersRecap, ifelse(PartyAffiliation == 'DEM', 1, 0))

#Creating variable column for Republicans in general
DADEVotersRecap$REP <- with(DADEVotersRecap, ifelse(PartyAffiliation == 'REP', 1, 0))

#Creating variable column for young dems
DADEVotersRecap$YoungDems <- with(DADEVotersRecap, ifelse(Age_18_29 == '1' & DEM == '1', 1, 0))

#Creating variable column for young independents
DADEVotersRecap$YoungNPA <- with(DADEVotersRecap, ifelse(Age_18_29 == '1' & NPA == '1', 1, 0))

#Creating variable column for young hispanics
DADEVotersRecap$YoungHispanic <- with(DADEVotersRecap, ifelse(Age_18_29 == '1' & Hispanic == '1', 1, 0))

#Creating variable column for young VBM voters
DADEVotersRecap$YoungVBM <- with(DADEVotersRecap, ifelse(Age_18_29 == '1' & VBM == '1', 1, 0))

#Creating variable column for young dems who VBM
DADEVotersRecap$YoungDemsVBM <- with(DADEVotersRecap, ifelse(Age_18_29 == '1' & DEM == '1' & VBM == '1', 1, 0))

#Creating variable column for independent hispanics
DADEVotersRecap$NPAHispanic <- with(DADEVotersRecap, ifelse(Hispanic == '1' & NPA == '1', 1, 0))

#Creating variable column for independents who VBM
DADEVotersRecap$NPAVBM <- with(DADEVotersRecap, ifelse(VBM == '1' & NPA == '1', 1, 0))

#Creating variable column for hispanics who VBM
DADEVotersRecap$VBMHispanic <- with(DADEVotersRecap, ifelse(Hispanic == '1' & VBM == '1', 1, 0))

#Creating variable column for dems who VBM
DADEVotersRecap$VBMDems <- with(DADEVotersRecap, ifelse(VBM == '1' & DEM == '1', 1, 0))

#Creating variable column for REM voters
DADEVotersRecap$REM <- with(DADEVotersRecap, ifelse(VoterStatus == 'REM', 1, 0))

#Creating variable column for other race voters
DADEVotersRecap$OtherRace <- with(DADEVotersRecap, ifelse(Race == 'Other', 1, 0))

#Creating variable column for other party voters
DADEVotersRecap$OtherParty <- with(DADEVotersRecap, ifelse(PartyAffiliation == 'Other', 1, 0))

#Creating variable column for Hispanic Dems
DADEVotersRecap$HispanicDem <- with(DADEVotersRecap, ifelse(Hispanic == '1' & DEM == '1', 1, 0))

#Creating variable column for Hispanic Reps
DADEVotersRecap$HispanicRep <- with(DADEVotersRecap, ifelse(Hispanic == '1' & REP == '1', 1, 0))

#Creating variable column for Hispanic Others
DADEVotersRecap$HispanicOtherParty <- with(DADEVotersRecap, ifelse(Hispanic == '1' & OtherParty == '1', 1, 0))

#Creating variable column for Black Dems
DADEVotersRecap$BlackDem <- with(DADEVotersRecap, ifelse(Black == '1' & DEM == '1', 1, 0))

#Creating variable column for Black Reps
DADEVotersRecap$BlackRep <- with(DADEVotersRecap, ifelse(Black == '1' & REP == '1', 1, 0))

#Creating variable column for Black Others
DADEVotersRecap$BlackOtherParty <- with(DADEVotersRecap, ifelse(Black == '1' & OtherParty == '1', 1, 0))

#Creating variable column for Black NPA
DADEVotersRecap$BlackNPA <- with(DADEVotersRecap, ifelse(Black == '1' & NPA == '1', 1, 0))

#Creating variable column for White Dems
DADEVotersRecap$WhiteDem <- with(DADEVotersRecap, ifelse(White == '1' & DEM == '1', 1, 0))

#Creating variable column for White Reps
DADEVotersRecap$WhiteRep <- with(DADEVotersRecap, ifelse(White == '1' & REP == '1', 1, 0))

#Creating variable column for White Others
DADEVotersRecap$WhiteOtherParty <- with(DADEVotersRecap, ifelse(White == '1' & OtherParty == '1', 1, 0))

#Creating variable column for White NPA
DADEVotersRecap$WhiteNPA <- with(DADEVotersRecap, ifelse(White == '1' & NPA == '1', 1, 0))

#Creating variable column for Other Race Dems
DADEVotersRecap$OtherRaceDem <- with(DADEVotersRecap, ifelse(OtherRace == '1' & DEM == '1', 1, 0))

#Creating variable column for Other Race Reps
DADEVotersRecap$OtherRaceRep <- with(DADEVotersRecap, ifelse(OtherRace == '1' & REP == '1', 1, 0))

#Creating variable column for Other Race Others
DADEVotersRecap$OtherRaceOtherParty <- with(DADEVotersRecap, ifelse(OtherRace == '1' & OtherParty == '1', 1, 0))

#Creating variable column for Other Race NPA
DADEVotersRecap$OtherRaceNPA <- with(DADEVotersRecap, ifelse(OtherRace == '1' & NPA == '1', 1, 0))
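The race-by-party dummies above follow one pattern, so they can also be generated in a loop. This is a hypothetical sketch on a toy data frame: the Race_Party naming scheme (e.g. Hispanic_REP) is illustrative only, while the analysis itself uses names such as HispanicRep and NPAHispanic.

```r
# Hypothetical sketch: build every race-by-party interaction dummy in one
# loop instead of one line per combination. Toy data only.
voters <- data.frame(
  Race = c("Hispanic", "Black", "White", "Other", "Hispanic"),
  PartyAffiliation = c("DEM", "NPA", "REP", "Other", "REP"),
  stringsAsFactors = FALSE
)

races   <- c("Hispanic", "Black", "White", "Other")
parties <- c("DEM", "REP", "NPA", "Other")

for (r in races) {
  for (p in parties) {
    col <- paste0(r, "_", p)
    voters[[col]] <- as.integer(voters$Race == r & voters$PartyAffiliation == p)
  }
}
```

Generating the columns programmatically keeps the 16 combinations consistent and makes it harder to mistype one of them.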

Creating a county_precinct key variable to merge on later

#Creating county precinct identifier
DADEVotersRecap$county_precinct <- paste(DADEVotersRecap$CountyCode, DADEVotersRecap$PrecinctSplit, sep = "_")

Adjusting the precinct results file so the two files can be merged

# Keep only the Miami-Dade rows of the precinct results
DadePrecinctResults <- filter(precinctresultsspread, CountyCode == "DAD")

# Insert a "." after the 7th character so the key matches the voter-file format (e.g. DAD_001.0)
DadePrecinctResults$county_precinct <- paste0(
  substr(DadePrecinctResults$county_precinct, 1, 7),
  ".",
  substr(DadePrecinctResults$county_precinct, 8, nchar(DadePrecinctResults$county_precinct))
)

# Make sure the senate-race vote totals are numeric
DadePrecinctResults$FLSS_DEM <- as.numeric(DadePrecinctResults$FLSS_DEM)
DadePrecinctResults$FLSS_REP <- as.numeric(DadePrecinctResults$FLSS_REP)
DadePrecinctResults$FLSS_NPA <- as.numeric(DadePrecinctResults$FLSS_NPA)

# Sum the votes within each precinct key
DadePrecinctTest <- DadePrecinctResults %>%
  group_by(county_precinct) %>%
  summarise(FLSS_DEM = sum(FLSS_DEM, na.rm = TRUE),
            FLSS_REP = sum(FLSS_REP, na.rm = TRUE),
            FLSS_NPA = sum(FLSS_NPA, na.rm = TRUE))
as_tibble(DadePrecinctTest)

## # A tibble: 866 × 4
##    county_precinct FLSS_DEM FLSS_REP FLSS_NPA
##    <chr>              <dbl>    <dbl>    <dbl>
##  1 DAD_001.0              0        0        0
##  2 DAD_002.0              0        0        0
##  3 DAD_003.0              0        0        0
##  4 DAD_004.0              0        0        0
##  5 DAD_005.0              0        0        0
##  6 DAD_006.0              0        0        0
##  7 DAD_007.0              0        0        0
##  8 DAD_008.0              0        0        0
##  9 DAD_009.0              0        0        0
## 10 DAD_010.0              0        0        0
## # … with 856 more rows
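The key fix above is easiest to see on a few strings. This sketch assumes the raw results keys look like "DAD_0010", which is inferred from the substr() calls rather than shown in the data; add_split_dot() is a hypothetical name.

```r
# Hypothetical helper: insert "." after the 7th character of each key,
# exactly as the paste0()/substr() step above does.
add_split_dot <- function(key) {
  paste0(substr(key, 1, 7), ".", substr(key, 8, nchar(key)))
}

# Assumed raw key format
add_split_dot(c("DAD_0010", "DAD_5050", "DAD_2811"))
```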

Aggregating the voter file by precinct and senate district, moving from individual-level to precinct-level data

# Aggregate individual-level dummies to counts per precinct and senate district.
# Cuban, NonCubanHispanic, PuertoRican, Venezuelan and Dominican are not created in the code above and must already exist.
# Keep this column order: the percentage code below refers to columns by position.
DADEVotersRecap_mergefile <- DADEVotersRecap %>%
  group_by(county_precinct, SenateDistrict) %>%
  summarise(across(c(Hispanic, Black, NonHispanic, Voted, NPA, DEM, REP, Cuban, NonCubanHispanic, PuertoRican, Venezuelan), ~ sum(.x, na.rm = TRUE)),
            Colombian = sum(Dominican, na.rm = TRUE),  # labeled Colombian but sums the Dominican column; check which was intended
            across(c(InPerson, Early, VBM, Age_18_29, Age_30_44, Age_45_64, Age_65_105, YoungDems, YoungNPA, YoungHispanic, YoungVBM, YoungDemsVBM,
                     NPAHispanic, NPAVBM, White, OtherRace, OtherParty, HispanicDem, HispanicRep, HispanicOtherParty, BlackNPA, BlackOtherParty, BlackRep, BlackDem,
                     WhiteNPA, WhiteOtherParty, WhiteRep, WhiteDem, OtherRaceNPA, OtherRaceDem, OtherRaceRep, OtherRaceOtherParty), ~ sum(.x, na.rm = TRUE)),
            MedianAge = median(Age, na.rm = TRUE))

## `summarise()` has grouped output by 'county_precinct'. You can override using the `.groups` argument.

as_tibble(DADEVotersRecap_mergefile)

## # A tibble: 825 × 47
##    county_precinct SenateDistrict Hispanic Black NonHispanic Voted   NPA   DEM
##    <chr>           <chr>             <dbl> <dbl>       <dbl> <dbl> <dbl> <dbl>
##  1 DAD_*           *                  1567   782        1902  3469   699  1353
##  2 DAD_001.0       38                  156     2         448   604   220   168
##  3 DAD_002.0       38                  479    10         699  1178   405   365
##  4 DAD_003.0       38                  488    21        1509  1997   684   592
##  5 DAD_004.0       38                  785    77        1016  1801   683   561
##  6 DAD_005.0       38                  815    62        1548  2363   920   677
##  7 DAD_006.0       38                  325    22        1406  1731   605   448
##  8 DAD_007.0       38                  626    31        1130  1756   558   752
##  9 DAD_008.0       38                  418    30         635  1053   368   447
## 10 DAD_009.0       38                 1133    49        1925  3058  1064  1076
## # … with 815 more rows, and 39 more variables: REP <dbl>, Cuban <dbl>,
## #   NonCubanHispanic <dbl>, PuertoRican <dbl>, Venezuelan <dbl>,
## #   Colombian <dbl>, InPerson <dbl>, Early <dbl>, VBM <dbl>, Age_18_29 <dbl>,
## #   Age_30_44 <dbl>, Age_45_64 <dbl>, Age_65_105 <dbl>, YoungDems <dbl>,
## #   YoungNPA <dbl>, YoungHispanic <dbl>, YoungVBM <dbl>, YoungDemsVBM <dbl>,
## #   NPAHispanic <dbl>, NPAVBM <dbl>, White <dbl>, OtherRace <dbl>,
## #   OtherParty <dbl>, HispanicDem <dbl>, HispanicRep <dbl>, …

Finally, merging the two files together

#Merging the two files together
DADE_Merged_File <- DADEVotersRecap_mergefile %>%
  left_join(DadePrecinctTest, by = "county_precinct")
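A left join silently fills unmatched keys with NA, so it is worth listing keys that appear on only one side. A minimal sketch on toy keys (DAD_999.0 is made up; DAD_* mirrors the catch-all row in the voter-file output above):

```r
# Toy keys: voter-file precincts vs. results-file precincts
voter_keys   <- c("DAD_*", "DAD_001.0", "DAD_047.0", "DAD_051.0")
results_keys <- c("DAD_001.0", "DAD_047.0", "DAD_051.0", "DAD_999.0")

# Voter-file keys that would get NA vote totals after the left join
unmatched_in_results <- setdiff(voter_keys, results_keys)

# Results keys that never appear in the voter file
unused_results <- setdiff(results_keys, voter_keys)
```

On the real tables, dplyr's anti_join(DADEVotersRecap_mergefile, DadePrecinctTest, by = "county_precinct") performs the same first check and returns the unmatched rows.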

Checking our final merged file to see if there are any duplicates that need to be "handled"

#Checking for duplicate precincts in the final merge file
length(unique(DADE_Merged_File$county_precinct)) == nrow(DADE_Merged_File)

## [1] FALSE

Duplicate_Test <- data.frame(table(DADE_Merged_File$county_precinct))
Duplicate_Test[Duplicate_Test$Freq > 1,]

##          Var1 Freq
## 414 DAD_505.0    2
## 502 DAD_593.0    2

# We find duplicates in precincts 505.0 and 593.0

# Cut out those two unlucky voters who appear to be in the wrong SD. They did not vote
# in the SD elections anyway, so their inclusion or exclusion is irrelevant to our analysis.
# Note: 415 and 503 are row positions in the current sort order; re-check them if the data change.
DADE_Merged_File <- DADE_Merged_File[-c(415, 503), ]

##Checking for duplicate precincts in the final merge file
length(unique(DADE_Merged_File$county_precinct)) == nrow(DADE_Merged_File)

## [1] TRUE

Duplicate_Test <- data.frame(table(DADE_Merged_File$county_precinct))
Duplicate_Test[Duplicate_Test$Freq > 1,]

## [1] Var1 Freq
## <0 rows> (or 0-length row.names)

#No more duplicates!!! Woohoo!
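Dropping rows by hard-coded position works here, but it breaks silently if the file is re-sorted or re-pulled. A hypothetical alternative is to keep, for each precinct, the senate-district row with the most voters. The toy counts and district numbers below are made up; ties would be broken by the existing row order.

```r
# Toy merged file with two duplicated precincts
merged <- data.frame(
  county_precinct = c("DAD_504.0", "DAD_505.0", "DAD_505.0", "DAD_593.0", "DAD_593.0"),
  SenateDistrict  = c("37", "37", "39", "39", "40"),
  Voted           = c(900, 1200, 1, 1, 850),
  stringsAsFactors = FALSE
)

# Sort so the largest Voted comes first within each precinct, then keep first rows
merged <- merged[order(merged$county_precinct, -merged$Voted), ]
deduped <- merged[!duplicated(merged$county_precinct), ]
```

In dplyr, group_by(county_precinct) followed by slice_max(Voted, n = 1, with_ties = FALSE) expresses the same rule.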

Separating Senate Districts 37 and 39 to analyze how the NPA candidates performed in each

#Separating between SD37 and SD39
Merged_SD37 <- filter(DADE_Merged_File, SenateDistrict == "37")
Merged_SD39 <- filter(DADE_Merged_File, SenateDistrict == "39")
as_tibble(Merged_SD37)

## # A tibble: 171 × 50
##    county_precinct SenateDistrict Hispanic Black NonHispanic Voted   NPA   DEM
##    <chr>           <chr>             <dbl> <dbl>       <dbl> <dbl> <dbl> <dbl>
##  1 DAD_047.0       37                   25     4         294   319   106    96
##  2 DAD_051.0       37                 3607    19        2963  6570  2500  1921
##  3 DAD_052.0       37                    8     0          10    18     3    13
##  4 DAD_281.1       37                   61     5           9    70    15    28
##  5 DAD_284.0       37                  822    23         120   942   266   375
##  6 DAD_285.0       37                  123     0           9   132    37    42
##  7 DAD_288.0       37                    4     0           1     5     2     1
##  8 DAD_289.0       37                   36     1          12    48    13    12
##  9 DAD_345.1       37                    2     0           0     2     1     0
## 10 DAD_426.0       37                 2668    54         593  3261   842   845
## # … with 161 more rows, and 42 more variables: REP <dbl>, Cuban <dbl>,
## #   NonCubanHispanic <dbl>, PuertoRican <dbl>, Venezuelan <dbl>,
## #   Colombian <dbl>, InPerson <dbl>, Early <dbl>, VBM <dbl>, Age_18_29 <dbl>,
## #   Age_30_44 <dbl>, Age_45_64 <dbl>, Age_65_105 <dbl>, YoungDems <dbl>,
## #   YoungNPA <dbl>, YoungHispanic <dbl>, YoungVBM <dbl>, YoungDemsVBM <dbl>,
## #   NPAHispanic <dbl>, NPAVBM <dbl>, White <dbl>, OtherRace <dbl>,
## #   OtherParty <dbl>, HispanicDem <dbl>, HispanicRep <dbl>, …

as_tibble(Merged_SD39)

## # A tibble: 152 × 50
##    county_precinct SenateDistrict Hispanic Black NonHispanic Voted   NPA   DEM
##    <chr>           <chr>             <dbl> <dbl>       <dbl> <dbl> <dbl> <dbl>
##  1 DAD_167.0       39                  349    89         191   540   166   229
##  2 DAD_168.0       39                 1775   517         861  2636   845  1176
##  3 DAD_330.0       39                 2146    12         230  2376   678   553
##  4 DAD_353.0       39                 1795    10         241  2036   697   456
##  5 DAD_367.0       39                 1910    16         204  2114   757   652
##  6 DAD_376.0       39                 2921    12         367  3288  1013   644
##  7 DAD_396.0       39                 1665    13         290  1955   622   555
##  8 DAD_398.0       39                  746    19         146   892   295   232
##  9 DAD_399.0       39                  744     5         124   868   310   194
## 10 DAD_402.0       39                 2538     7         291  2829   778   515
## # … with 142 more rows, and 42 more variables: REP <dbl>, Cuban <dbl>,
## #   NonCubanHispanic <dbl>, PuertoRican <dbl>, Venezuelan <dbl>,
## #   Colombian <dbl>, InPerson <dbl>, Early <dbl>, VBM <dbl>, Age_18_29 <dbl>,
## #   Age_30_44 <dbl>, Age_45_64 <dbl>, Age_65_105 <dbl>, YoungDems <dbl>,
## #   YoungNPA <dbl>, YoungHispanic <dbl>, YoungVBM <dbl>, YoungDemsVBM <dbl>,
## #   NPAHispanic <dbl>, NPAVBM <dbl>, White <dbl>, OtherRace <dbl>,
## #   OtherParty <dbl>, HispanicDem <dbl>, HispanicRep <dbl>, …

Converting each count column into a percentage of the precinct's voters (the Voted column), and the FLSS_DEM, FLSS_REP and FLSS_NPA totals into each candidate's share of the senate-race vote

#Adjusting SD37 merged file to get normalized results for precinct statistics
Merged_SD37$PercentHispanic <- (Merged_SD37[, 3] / Merged_SD37[[6]])*100
Merged_SD37$PercentBlack <- (Merged_SD37[, 4] / Merged_SD37[[6]])*100
Merged_SD37$PercentWhite <- (Merged_SD37[, 29] / Merged_SD37[[6]])*100
Merged_SD37$PercentOtherRace <- (Merged_SD37[, 30] / Merged_SD37[[6]])*100
Merged_SD37$PercentNonHispanic <- (Merged_SD37[, 5] / Merged_SD37[[6]])*100
Merged_SD37$PercentNPA <- (Merged_SD37[, 7] / Merged_SD37[[6]])*100
Merged_SD37$PercentDEM <- (Merged_SD37[, 8] / Merged_SD37[[6]])*100
Merged_SD37$PercentREP <- (Merged_SD37[, 9] / Merged_SD37[[6]])*100
Merged_SD37$PercentCuban <- (Merged_SD37[, 10] / Merged_SD37[[6]])*100
Merged_SD37$PercentNonCubanHispanic <- (Merged_SD37[, 11] / Merged_SD37[[6]])*100
Merged_SD37$PercentInPerson <- (Merged_SD37[, 15] / Merged_SD37[[6]])*100
Merged_SD37$PercentEarly <- (Merged_SD37[, 16] / Merged_SD37[[6]])*100
Merged_SD37$PercentVBM <- (Merged_SD37[, 17] / Merged_SD37[[6]])*100
Merged_SD37$Percent18_29 <- (Merged_SD37[, 18] / Merged_SD37[[6]])*100
Merged_SD37$Percent30_44 <- (Merged_SD37[, 19] / Merged_SD37[[6]])*100
Merged_SD37$Percent45_64 <- (Merged_SD37[, 20] / Merged_SD37[[6]])*100
Merged_SD37$Percent65_105 <- (Merged_SD37[, 21] / Merged_SD37[[6]])*100
Merged_SD37$PercentYoungDems <- (Merged_SD37[, 22] / Merged_SD37[[6]])*100
Merged_SD37$PercentYoungNPA <- (Merged_SD37[, 23] / Merged_SD37[[6]])*100
Merged_SD37$PercentYoungHispanic <- (Merged_SD37[, 24] / Merged_SD37[[6]])*100
Merged_SD37$PercentYoungVBM <- (Merged_SD37[, 25] / Merged_SD37[[6]])*100
Merged_SD37$PercentYoungDemsVBM <- (Merged_SD37[, 26] / Merged_SD37[[6]])*100
Merged_SD37$PercentNPAHispanic <- (Merged_SD37[, 27] / Merged_SD37[[6]])*100
Merged_SD37$PercentWhiteNPA <- (Merged_SD37[, 39] / Merged_SD37[[6]])*100
Merged_SD37$PercentBlackNPA <- (Merged_SD37[, 35] / Merged_SD37[[6]])*100
Merged_SD37$PercentOtherRaceNPA <- (Merged_SD37[, 43] / Merged_SD37[[6]])*100
Merged_SD37$PercentHispanicDem <- (Merged_SD37[, 32] / Merged_SD37[[6]])*100
Merged_SD37$PercentWhiteDem <- (Merged_SD37[, 42] / Merged_SD37[[6]])*100
Merged_SD37$PercentBlackDem <- (Merged_SD37[, 38] / Merged_SD37[[6]])*100
Merged_SD37$PercentOtherRaceDem <- (Merged_SD37[, 44] / Merged_SD37[[6]])*100
Merged_SD37$PercentHispanicRep <- (Merged_SD37[, 33] / Merged_SD37[[6]])*100
Merged_SD37$PercentWhiteRep <- (Merged_SD37[, 41] / Merged_SD37[[6]])*100
Merged_SD37$PercentBlackRep <- (Merged_SD37[, 37] / Merged_SD37[[6]])*100
Merged_SD37$PercentOtherRaceRep <- (Merged_SD37[, 45] / Merged_SD37[[6]])*100
Merged_SD37$PercentHispanicOtherParty <- (Merged_SD37[, 34] / Merged_SD37[[6]])*100
Merged_SD37$PercentWhiteOtherParty <- (Merged_SD37[, 40] / Merged_SD37[[6]])*100
Merged_SD37$PercentBlackOtherParty <- (Merged_SD37[, 36] / Merged_SD37[[6]])*100
Merged_SD37$PercentOtherRaceOtherParty <- (Merged_SD37[, 46] / Merged_SD37[[6]])*100
Merged_SD37$PercentNPAVBM <- (Merged_SD37[, 28] / Merged_SD37[[6]])*100
Merged_SD37$PercentNPAVote <- (Merged_SD37[, 50] / (Merged_SD37[[50]] + Merged_SD37[[48]] + Merged_SD37[[49]]))*100
Merged_SD37$PercentDEMVote <- (Merged_SD37[, 48] / (Merged_SD37[[50]] + Merged_SD37[[48]] + Merged_SD37[[49]]))*100
Merged_SD37$PercentREPVote <- (Merged_SD37[, 49] / (Merged_SD37[[50]] + Merged_SD37[[48]] + Merged_SD37[[49]]))*100
Merged_SD37$PercentVotedAtPolls <- ((Merged_SD37[, 15] + Merged_SD37[, 16]) / Merged_SD37[[6]])*100

#Adjusting SD39 merged file to get normalized results for precinct statistics
Merged_SD39$PercentHispanic <- (Merged_SD39[, 3] / Merged_SD39[[6]])*100
Merged_SD39$PercentBlack <- (Merged_SD39[, 4] / Merged_SD39[[6]])*100
Merged_SD39$PercentWhite <- (Merged_SD39[, 29] / Merged_SD39[[6]])*100
Merged_SD39$PercentOtherRace <- (Merged_SD39[, 30] / Merged_SD39[[6]])*100
Merged_SD39$PercentNonHispanic <- (Merged_SD39[, 5] / Merged_SD39[[6]])*100
Merged_SD39$PercentNPA <- (Merged_SD39[, 7] / Merged_SD39[[6]])*100
Merged_SD39$PercentDEM <- (Merged_SD39[, 8] / Merged_SD39[[6]])*100
Merged_SD39$PercentREP <- (Merged_SD39[, 9] / Merged_SD39[[6]])*100
Merged_SD39$PercentCuban <- (Merged_SD39[, 10] / Merged_SD39[[6]])*100
Merged_SD39$PercentNonCubanHispanic <- (Merged_SD39[, 11] / Merged_SD39[[6]])*100
Merged_SD39$PercentInPerson <- (Merged_SD39[, 15] / Merged_SD39[[6]])*100
Merged_SD39$PercentEarly <- (Merged_SD39[, 16] / Merged_SD39[[6]])*100
Merged_SD39$PercentVBM <- (Merged_SD39[, 17] / Merged_SD39[[6]])*100
Merged_SD39$Percent18_29 <- (Merged_SD39[, 18] / Merged_SD39[[6]])*100
Merged_SD39$Percent30_44 <- (Merged_SD39[, 19] / Merged_SD39[[6]])*100
Merged_SD39$Percent45_64 <- (Merged_SD39[, 20] / Merged_SD39[[6]])*100
Merged_SD39$Percent65_105 <- (Merged_SD39[, 21] / Merged_SD39[[6]])*100
Merged_SD39$PercentYoungDems <- (Merged_SD39[, 22] / Merged_SD39[[6]])*100
Merged_SD39$PercentYoungNPA <- (Merged_SD39[, 23] / Merged_SD39[[6]])*100
Merged_SD39$PercentYoungHispanic <- (Merged_SD39[, 24] / Merged_SD39[[6]])*100
Merged_SD39$PercentYoungVBM <- (Merged_SD39[, 25] / Merged_SD39[[6]])*100
Merged_SD39$PercentYoungDemsVBM <- (Merged_SD39[, 26] / Merged_SD39[[6]])*100
Merged_SD39$PercentNPAHispanic <- (Merged_SD39[, 27] / Merged_SD39[[6]])*100
Merged_SD39$PercentWhiteNPA <- (Merged_SD39[, 39] / Merged_SD39[[6]])*100
Merged_SD39$PercentBlackNPA <- (Merged_SD39[, 35] / Merged_SD39[[6]])*100
Merged_SD39$PercentOtherRaceNPA <- (Merged_SD39[, 43] / Merged_SD39[[6]])*100
Merged_SD39$PercentHispanicDem <- (Merged_SD39[, 32] / Merged_SD39[[6]])*100
Merged_SD39$PercentWhiteDem <- (Merged_SD39[, 42] / Merged_SD39[[6]])*100
Merged_SD39$PercentBlackDem <- (Merged_SD39[, 38] / Merged_SD39[[6]])*100
Merged_SD39$PercentOtherRaceDem <- (Merged_SD39[, 44] / Merged_SD39[[6]])*100
Merged_SD39$PercentHispanicRep <- (Merged_SD39[, 33] / Merged_SD39[[6]])*100
Merged_SD39$PercentWhiteRep <- (Merged_SD39[, 41] / Merged_SD39[[6]])*100
Merged_SD39$PercentBlackRep <- (Merged_SD39[, 37] / Merged_SD39[[6]])*100
Merged_SD39$PercentOtherRaceRep <- (Merged_SD39[, 45] / Merged_SD39[[6]])*100
Merged_SD39$PercentHispanicOtherParty <- (Merged_SD39[, 34] / Merged_SD39[[6]])*100
Merged_SD39$PercentWhiteOtherParty <- (Merged_SD39[, 40] / Merged_SD39[[6]])*100
Merged_SD39$PercentBlackOtherParty <- (Merged_SD39[, 36] / Merged_SD39[[6]])*100
Merged_SD39$PercentOtherRaceOtherParty <- (Merged_SD39[, 46] / Merged_SD39[[6]])*100
Merged_SD39$PercentNPAVBM <- (Merged_SD39[, 28] / Merged_SD39[[6]])*100
Merged_SD39$PercentNPAVote <- (Merged_SD39[, 50] / (Merged_SD39[[50]] + Merged_SD39[[48]] + Merged_SD39[[49]]))*100
Merged_SD39$PercentDEMVote <- (Merged_SD39[, 48] / (Merged_SD39[[50]] + Merged_SD39[[48]] + Merged_SD39[[49]]))*100
Merged_SD39$PercentREPVote <- (Merged_SD39[, 49] / (Merged_SD39[[50]] + Merged_SD39[[48]] + Merged_SD39[[49]]))*100
Merged_SD39$PercentVotedAtPolls <- ((Merged_SD39[, 15] + Merged_SD39[, 16]) / Merged_SD39[[6]])*100
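The same percentages can be computed by column name rather than by position, which survives column reordering and yields plain numeric vectors. A minimal sketch on a toy two-precinct data frame (the counts are made up; column names follow the merged file):

```r
# Toy precinct data; column names follow the merged file above
precincts <- data.frame(
  county_precinct = c("P1", "P2"),
  Voted = c(200, 1000), Hispanic = c(50, 600), NPA = c(40, 250),
  FLSS_DEM = c(90, 450), FLSS_REP = c(100, 500), FLSS_NPA = c(10, 50)
)

# Share of each precinct's voters in a group
for (v in c("Hispanic", "NPA")) {
  precincts[[paste0("Percent", v)]] <- 100 * precincts[[v]] / precincts$Voted
}

# Each candidate's share of the senate-race vote
total_votes <- precincts$FLSS_DEM + precincts$FLSS_REP + precincts$FLSS_NPA
for (party in c("DEM", "REP", "NPA")) {
  precincts[[paste0("Percent", party, "Vote")]] <-
    100 * precincts[[paste0("FLSS_", party)]] / total_votes
}
```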

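Because Merged_SD37[, 3] on a tibble returns a one-column tibble rather than a vector, each new proportion column is stored as a nested data frame (which R treats as a list) instead of a plain numeric vector. This does not work for my desired style of analysis, so I will unlist() each of these newly created variables. Indexing with [[3]] instead of [, 3] would return a numeric vector directly.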
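A small base-R demonstration of the difference. Base data frames drop to a vector with [, j] by default, so drop = FALSE is used here to reproduce the tibble behavior; the data are made up.

```r
df <- data.frame(Hispanic = c(50, 600), Voted = c(200, 1000))

# Single-bracket indexing that keeps the data-frame structure (as tibbles always do)
as_frame  <- (df[, 1, drop = FALSE] / df[[2]]) * 100

# Double-bracket indexing returns a plain numeric vector: no unlist() needed
as_vector <- (df[[1]] / df[[2]]) * 100
```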

#Unlisting All These Variables (SD37 & SD39)
percent_vars <- c("PercentHispanic", "PercentBlack", "PercentWhite", "PercentOtherRace",
                  "PercentNonHispanic", "PercentNPA", "PercentDEM", "PercentREP",
                  "Percent18_29", "Percent30_44", "Percent45_64", "Percent65_105",
                  "PercentYoungDems", "PercentYoungNPA", "PercentYoungHispanic",
                  "PercentYoungVBM", "PercentYoungDemsVBM", "PercentNPAVBM",
                  "PercentNPAHispanic", "PercentHispanicRep", "PercentHispanicDem",
                  "PercentHispanicOtherParty", "PercentBlackDem", "PercentBlackRep",
                  "PercentBlackNPA", "PercentBlackOtherParty", "PercentWhiteDem",
                  "PercentWhiteRep", "PercentWhiteNPA", "PercentWhiteOtherParty",
                  "PercentOtherRaceDem", "PercentOtherRaceRep", "PercentOtherRaceNPA",
                  "PercentOtherRaceOtherParty", "PercentInPerson", "PercentEarly",
                  "PercentVBM", "PercentNPAVote", "PercentREPVote", "PercentDEMVote",
                  "PercentVotedAtPolls")

# Flatten each nested proportion column into a plain numeric vector in both districts
for (col_name in percent_vars) {
  Merged_SD37[[col_name]] <- unlist(Merged_SD37[[col_name]])
  Merged_SD39[[col_name]] <- unlist(Merged_SD39[[col_name]])
}

Descriptive Stats for SD37

DescriptiveStats_Merged_SD37 <- select(Merged_SD37, county_precinct, SenateDistrict, starts_with("Percent"))
summary(DescriptiveStats_Merged_SD37)

##  county_precinct    SenateDistrict     PercentHispanic   PercentBlack    
##  Length:171         Length:171         Min.   :  0.00   Min.   : 0.0000  
##  Class :character   Class :character   1st Qu.: 41.63   1st Qu.: 0.6674  
##  Mode  :character   Mode  :character   Median : 59.38   Median : 1.4808  
##                                        Mean   : 60.82   Mean   : 5.0612  
##                                        3rd Qu.: 83.20   3rd Qu.: 3.8483  
##                                        Max.   :100.00   Max.   :75.5282  
##   PercentWhite     PercentOtherRace  PercentNonHispanic   PercentNPA    
##  Min.   :  0.000   Min.   :  0.000   Min.   :  0.00     Min.   :  0.00  
##  1st Qu.:  6.448   1st Qu.:  5.659   1st Qu.: 16.80     1st Qu.: 26.23  
##  Median : 24.332   Median :  6.793   Median : 40.62     Median : 28.55  
##  Mean   : 26.315   Mean   :  7.800   Mean   : 39.18     Mean   : 29.53  
##  3rd Qu.: 42.678   3rd Qu.:  8.614   3rd Qu.: 58.37     3rd Qu.: 31.77  
##  Max.   :100.000   Max.   :100.000   Max.   :100.00     Max.   :100.00  
##    PercentDEM      PercentREP     PercentCuban.Cuban 
##  Min.   : 0.00   Min.   :  0.00   Min.   :  0.00000  
##  1st Qu.:30.02   1st Qu.: 26.34   1st Qu.:  7.58410  
##  Median :35.84   Median : 34.40   Median : 14.97882  
##  Mean   :35.74   Mean   : 33.35   Mean   : 19.94448  
##  3rd Qu.:40.58   3rd Qu.: 40.08   3rd Qu.: 31.03620  
##  Max.   :79.75   Max.   :100.00   Max.   :100.00000  
##  PercentNonCubanHispanic.NonCubanHispanic PercentInPerson   PercentEarly   
##  Min.   :  0.00000                        Min.   : 0.000   Min.   :  0.00  
##  1st Qu.: 24.67039                        1st Qu.: 7.455   1st Qu.: 40.30  
##  Median : 28.72678                        Median : 9.720   Median : 42.81  
##  Mean   : 29.58336                        Mean   :10.801   Mean   : 43.32  
##  3rd Qu.: 33.99328                        3rd Qu.:13.300   3rd Qu.: 45.85  
##  Max.   :100.00000                        Max.   :34.091   Max.   :100.00  
##    PercentVBM      Percent18_29    Percent30_44    Percent45_64    
##  Min.   :  0.00   Min.   : 0.00   Min.   : 0.00   Min.   :  6.338  
##  1st Qu.: 42.56   1st Qu.:12.40   1st Qu.:17.08   1st Qu.: 31.946  
##  Median : 45.67   Median :14.72   Median :21.20   Median : 34.807  
##  Mean   : 45.88   Mean   :15.59   Mean   :21.54   Mean   : 35.274  
##  3rd Qu.: 50.00   3rd Qu.:17.23   3rd Qu.:25.07   3rd Qu.: 38.589  
##  Max.   :100.00   Max.   :65.21   Max.   :50.00   Max.   :100.000  
##  Percent65_105   PercentYoungDems PercentYoungNPA  PercentYoungHispanic
##  Min.   : 0.00   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000      
##  1st Qu.:21.80   1st Qu.: 4.665   1st Qu.: 3.887   1st Qu.: 7.138      
##  Median :26.92   Median : 6.197   Median : 4.720   Median : 9.318      
##  Mean   :27.59   Mean   : 6.728   Mean   : 5.144   Mean   : 8.972      
##  3rd Qu.:33.09   3rd Qu.: 7.475   3rd Qu.: 5.811   3rd Qu.:10.825      
##  Max.   :92.09   Max.   :32.038   Max.   :50.000   Max.   :50.000      
##  PercentYoungVBM  PercentYoungDemsVBM PercentNPAHispanic PercentWhiteNPA 
##  Min.   : 0.000   Min.   :0.000       Min.   :  0.00     Min.   : 0.000  
##  1st Qu.: 4.262   1st Qu.:2.072       1st Qu.: 12.55     1st Qu.: 1.485  
##  Median : 5.618   Median :2.894       Median : 18.58     Median : 5.942  
##  Mean   : 6.049   Mean   :3.099       Mean   : 19.05     Mean   : 6.492  
##  3rd Qu.: 7.680   3rd Qu.:4.035       3rd Qu.: 23.85     3rd Qu.: 9.892  
##  Max.   :20.423   Max.   :9.859       Max.   :100.00     Max.   :33.333  
##  PercentBlackNPA   PercentOtherRaceNPA PercentHispanicDem PercentWhiteDem 
##  Min.   : 0.0000   Min.   : 0.000      Min.   : 0.00      Min.   : 0.000  
##  1st Qu.: 0.1210   1st Qu.: 2.360      1st Qu.:11.67      1st Qu.: 2.234  
##  Median : 0.3455   Median : 2.919      Median :17.66      Median : 9.604  
##  Mean   : 0.8032   Mean   : 3.182      Mean   :17.73      Mean   :11.351  
##  3rd Qu.: 0.7675   3rd Qu.: 3.939      3rd Qu.:23.54      3rd Qu.:17.713  
##  Max.   :11.1111   Max.   :11.290      Max.   :50.00      Max.   :55.556  
##  PercentBlackDem   PercentOtherRaceDem PercentHispanicRep PercentWhiteRep 
##  Min.   : 0.0000   Min.   : 0.000      Min.   : 0.00      Min.   : 0.000  
##  1st Qu.: 0.3280   1st Qu.: 1.659      1st Qu.:15.80      1st Qu.: 2.188  
##  Median : 0.9605   Median : 2.446      Median :22.09      Median : 6.514  
##  Mean   : 4.0318   Mean   : 2.627      Mean   :23.21      Mean   : 8.100  
##  3rd Qu.: 2.9275   3rd Qu.: 3.164      3rd Qu.:30.90      3rd Qu.:11.667  
##  Max.   :66.9014   Max.   :19.355      Max.   :58.99      Max.   :66.667  
##  PercentBlackRep   PercentOtherRaceRep PercentHispanicOtherParty
##  Min.   :0.00000   Min.   :  0.0000    Min.   : 0.0000          
##  1st Qu.:0.00000   1st Qu.:  0.8352    1st Qu.: 0.4474          
##  Median :0.09294   Median :  1.3314    Median : 0.7316          
##  Mean   :0.16620   Mean   :  1.8751    Mean   : 0.8301          
##  3rd Qu.:0.23023   3rd Qu.:  1.6532    3rd Qu.: 1.1086          
##  Max.   :1.45631   Max.   :100.0000    Max.   :10.0000          
##  PercentWhiteOtherParty PercentBlackOtherParty PercentOtherRaceOtherParty
##  Min.   :0.00000        Min.   :0.00000        Min.   :0.00000           
##  1st Qu.:0.06373        1st Qu.:0.00000        1st Qu.:0.00000           
##  Median :0.28490        Median :0.00000        Median :0.08163           
##  Mean   :0.37169        Mean   :0.05998        Mean   :0.11508           
##  3rd Qu.:0.51283        3rd Qu.:0.05695        3rd Qu.:0.17975           
##  Max.   :5.55556        Max.   :1.27389        Max.   :0.71942           
##  PercentNPAVBM    PercentNPAVote   PercentDEMVote  PercentREPVote   
##  Min.   :  0.00   Min.   : 0.000   Min.   : 0.00   Min.   :  8.974  
##  1st Qu.: 11.35   1st Qu.: 1.977   1st Qu.:39.73   1st Qu.: 43.134  
##  Median : 12.82   Median : 3.076   Median :47.85   Median : 49.307  
##  Mean   : 13.38   Mean   : 3.456   Mean   :47.86   Mean   : 48.679  
##  3rd Qu.: 14.11   3rd Qu.: 3.746   3rd Qu.:54.30   3rd Qu.: 56.241  
##  Max.   :100.00   Max.   :50.000   Max.   :87.73   Max.   :100.000  
##  PercentVotedAtPolls
##  Min.   :  0.00     
##  1st Qu.: 50.00     
##  Median : 54.33     
##  Mean   : 54.12     
##  3rd Qu.: 57.44     
##  Max.   :100.00

Descriptive Stats for SD39

DescriptiveStats_Merged_SD39 <- select(Merged_SD39, county_precinct, SenateDistrict, starts_with("Percent"))
summary(DescriptiveStats_Merged_SD39)

##  county_precinct    SenateDistrict     PercentHispanic   PercentBlack    
##  Length:152         Length:152         Min.   :  0.00   Min.   : 0.0000  
##  Class :character   Class :character   1st Qu.: 55.12   1st Qu.: 0.3736  
##  Mode  :character   Mode  :character   Median : 71.30   Median : 2.3686  
##                                        Mean   : 68.68   Mean   :10.6583  
##                                        3rd Qu.: 86.42   3rd Qu.:16.7697  
##                                        Max.   :100.00   Max.   :71.3450  
##   PercentWhite    PercentOtherRace  PercentNonHispanic   PercentNPA   
##  Min.   : 0.000   Min.   :  0.000   Min.   :  0.00     Min.   : 0.00  
##  1st Qu.: 5.863   1st Qu.:  5.347   1st Qu.: 13.58     1st Qu.:25.30  
##  Median : 8.141   Median :  6.284   Median : 28.70     Median :27.76  
##  Mean   :12.417   Mean   :  8.246   Mean   : 31.32     Mean   :28.32  
##  3rd Qu.:11.438   3rd Qu.:  7.613   3rd Qu.: 44.88     3rd Qu.:31.24  
##  Max.   :75.000   Max.   :100.000   Max.   :100.00     Max.   :52.17  
##    PercentDEM       PercentREP    PercentCuban.Cuban
##  Min.   : 7.368   Min.   : 0.00   Min.   : 0.00000  
##  1st Qu.:23.438   1st Qu.:24.66   1st Qu.:14.59262  
##  Median :31.387   Median :34.80   Median :22.05368  
##  Mean   :34.939   Mean   :35.51   Mean   :24.25851  
##  3rd Qu.:43.994   3rd Qu.:48.12   3rd Qu.:37.55718  
##  Max.   :80.000   Max.   :68.42   Max.   :55.17241  
##  PercentNonCubanHispanic.NonCubanHispanic PercentInPerson   PercentEarly  
##  Min.   : 0.00000                         Min.   : 0.000   Min.   : 0.00  
##  1st Qu.:26.16456                         1st Qu.: 8.806   1st Qu.:44.49  
##  Median :30.46891                         Median :11.461   Median :46.60  
##  Mean   :30.67344                         Mean   :12.101   Mean   :46.04  
##  3rd Qu.:36.80516                         3rd Qu.:14.800   3rd Qu.:49.10  
##  Max.   :66.66667                         Max.   :40.000   Max.   :75.86  
##    PercentVBM      Percent18_29     Percent30_44    Percent45_64  
##  Min.   : 20.69   Min.   :  0.00   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.: 37.65   1st Qu.: 13.54   1st Qu.:17.96   1st Qu.:33.33  
##  Median : 41.18   Median : 16.03   Median :21.24   Median :35.34  
##  Mean   : 41.86   Mean   : 17.16   Mean   :22.77   Mean   :35.61  
##  3rd Qu.: 44.81   3rd Qu.: 18.53   3rd Qu.:26.20   3rd Qu.:38.29  
##  Max.   :100.00   Max.   :100.00   Max.   :66.67   Max.   :75.00  
##  Percent65_105   PercentYoungDems PercentYoungNPA  PercentYoungHispanic
##  Min.   : 0.00   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000      
##  1st Qu.:17.64   1st Qu.: 4.073   1st Qu.: 4.393   1st Qu.: 9.739      
##  Median :23.62   Median : 6.039   Median : 5.106   Median :11.135      
##  Mean   :24.46   Mean   : 7.287   Mean   : 5.887   Mean   :11.217      
##  3rd Qu.:32.49   3rd Qu.: 8.478   3rd Qu.: 6.466   3rd Qu.:12.936      
##  Max.   :56.33   Max.   :50.000   Max.   :50.000   Max.   :35.135      
##  PercentYoungVBM   PercentYoungDemsVBM PercentNPAHispanic PercentWhiteNPA 
##  Min.   :  0.000   Min.   : 0.000      Min.   : 0.00      Min.   : 0.000  
##  1st Qu.:  4.396   1st Qu.: 1.832      1st Qu.:16.70      1st Qu.: 1.221  
##  Median :  5.271   Median : 2.471      Median :21.51      Median : 1.918  
##  Mean   :  6.201   Mean   : 3.151      Mean   :20.38      Mean   : 2.952  
##  3rd Qu.:  5.955   3rd Qu.: 3.234      3rd Qu.:24.51      3rd Qu.: 3.038  
##  Max.   :100.000   Max.   :50.000      Max.   :34.78      Max.   :25.000  
##  PercentBlackNPA    PercentOtherRaceNPA PercentHispanicDem PercentWhiteDem 
##  Min.   : 0.00000   Min.   : 0.000      Min.   : 0.00      Min.   : 0.000  
##  1st Qu.: 0.07564   1st Qu.: 2.220      1st Qu.:15.15      1st Qu.: 1.384  
##  Median : 0.36804   Median : 2.738      Median :18.95      Median : 2.689  
##  Mean   : 1.53813   Mean   : 3.454      Mean   :19.08      Mean   : 3.969  
##  3rd Qu.: 2.48541   3rd Qu.: 3.380      3rd Qu.:22.39      3rd Qu.: 4.348  
##  Max.   :20.00000   Max.   :50.000      Max.   :60.00      Max.   :50.000  
##  PercentBlackDem   PercentOtherRaceDem PercentHispanicRep PercentWhiteRep 
##  Min.   : 0.0000   Min.   : 0.000      Min.   : 0.00      Min.   : 0.000  
##  1st Qu.: 0.2064   1st Qu.: 1.301      1st Qu.:18.03      1st Qu.: 2.253  
##  Median : 1.5015   Median : 2.094      Median :25.54      Median : 3.217  
##  Mean   : 8.7201   Mean   : 3.169      Mean   :28.29      Mean   : 5.326  
##  3rd Qu.:13.2380   3rd Qu.: 2.942      3rd Qu.:41.66      3rd Qu.: 4.831  
##  Max.   :63.0744   Max.   :53.333      Max.   :63.16      Max.   :50.000  
##  PercentBlackRep  PercentOtherRaceRep PercentHispanicOtherParty
##  Min.   :0.0000   Min.   : 0.0000     Min.   :0.0000           
##  1st Qu.:0.0000   1st Qu.: 0.9411     1st Qu.:0.6273           
##  Median :0.1050   Median : 1.3577     Median :0.9291           
##  Mean   :0.3316   Mean   : 1.5579     Mean   :0.9266           
##  3rd Qu.:0.4925   3rd Qu.: 1.8160     3rd Qu.:1.2351           
##  Max.   :5.8824   Max.   :20.0000     Max.   :3.4483           
##  PercentWhiteOtherParty PercentBlackOtherParty PercentOtherRaceOtherParty
##  Min.   :0.00000        Min.   :0.00000        Min.   :0.00000           
##  1st Qu.:0.00000        1st Qu.:0.00000        1st Qu.:0.00000           
##  Median :0.07654        Median :0.00000        Median :0.03903           
##  Mean   :0.17034        Mean   :0.06846        Mean   :0.06587           
##  3rd Qu.:0.23423        3rd Qu.:0.07305        3rd Qu.:0.10888           
##  Max.   :1.63265        Max.   :0.86957        Max.   :0.57971           
##  PercentNPAVBM    PercentNPAVote    PercentDEMVote   PercentREPVote 
##  Min.   : 0.000   Min.   : 0.0000   Min.   : 18.68   Min.   : 0.00  
##  1st Qu.: 9.934   1st Qu.: 0.9863   1st Qu.: 32.50   1st Qu.:43.22  
##  Median :11.607   Median : 1.3855   Median : 42.11   Median :56.73  
##  Mean   :11.830   Mean   : 1.5701   Mean   : 44.59   Mean   :53.84  
##  3rd Qu.:13.202   3rd Qu.: 1.8557   3rd Qu.: 54.81   3rd Qu.:66.49  
##  Max.   :50.000   Max.   :13.6364   Max.   :100.00   Max.   :81.32  
##  PercentVotedAtPolls
##  Min.   : 0.00      
##  1st Qu.:55.19      
##  Median :58.82      
##  Mean   :58.14      
##  3rd Qu.:62.35      
##  Max.   :79.31

With the data fixed and cleaned up, we move on to…

Regression Analysis

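It is important to note that the race, age, voting-method, and party variables each split a precinct's voters into categories whose shares add up to (roughly) 100%. Including every category alongside the intercept would therefore make the model perfectly, or nearly perfectly, collinear. We leave out PercentHispanic, Percent18_29, PercentVBM, and PercentNPA as the reference categories. Each remaining coefficient should be read relative to the reference category of its group.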
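Below is a small simulated illustration of why one category per group has to be dropped. The data are made up, and the variables black, white, hispanic and npa_vote are hypothetical, not columns from the merged file. When three shares always sum to 100, lm() cannot estimate all three alongside an intercept and reports NA for the aliased one. Dropping one category gives the same fitted values, with the dropped category acting as the reference.

```r
set.seed(42)
n <- 60

# Hypothetical precinct shares (in percent) that always sum to exactly 100
black    <- runif(n, 0, 30)
white    <- runif(n, 0, 40)
hispanic <- 100 - black - white

# Simulated outcome; the true coefficients here are arbitrary
npa_vote <- 2 + 0.05 * black + 0.02 * white + rnorm(n, sd = 1)

# Including every category: the last, perfectly collinear term is aliased (NA)
all_cats <- lm(npa_vote ~ black + white + hispanic)
coef(all_cats)

# Leaving Hispanic out makes it the reference category
with_ref <- lm(npa_vote ~ black + white)
coef(with_ref)

# Both models span the same predictor space, so their fitted values agree
all.equal(fitted(all_cats), fitted(with_ref))
```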

Senate District 37 Analysis

####Regression Analysis Results NPA SD37 & SD39
#SD37 NPA Model
SD37_GhostCand_Regression_NPA_McDonaldProject = lm(PercentNPAVote ~ PercentBlack + PercentWhite + PercentOtherRace + PercentDEM + PercentREP + PercentInPerson + PercentEarly + Percent30_44 + Percent45_64 + Percent65_105, data=Merged_SD37)
summary(SD37_GhostCand_Regression_NPA_McDonaldProject)

## 
## Call:
## lm(formula = PercentNPAVote ~ PercentBlack + PercentWhite + PercentOtherRace + 
##     PercentDEM + PercentREP + PercentInPerson + PercentEarly + 
##     Percent30_44 + Percent45_64 + Percent65_105, data = Merged_SD37)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7828  -1.4547  -0.3402   1.0095  26.2654 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       2.051418   5.338823   0.384   0.7013    
## PercentBlack      0.079242   0.036592   2.166   0.0318 *  
## PercentWhite      0.005052   0.021967   0.230   0.8184    
## PercentOtherRace -0.202572   0.041526  -4.878 2.57e-06 ***
## PercentDEM       -0.127290   0.049514  -2.571   0.0111 *  
## PercentREP       -0.022721   0.043234  -0.526   0.5999    
## PercentInPerson   0.162428   0.068432   2.374   0.0188 *  
## PercentEarly      0.008011   0.038593   0.208   0.8358    
## Percent30_44     -0.039603   0.055312  -0.716   0.4750    
## Percent45_64      0.220181   0.048654   4.525 1.17e-05 ***
## Percent65_105    -0.045555   0.050900  -0.895   0.3721    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.687 on 160 degrees of freedom
## Multiple R-squared:  0.3763, Adjusted R-squared:  0.3373 
## F-statistic: 9.653 on 10 and 160 DF,  p-value: 1.67e-12
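The SD37 Percent45_64 estimate is easier to interpret on a concrete scale, so here is a short worked calculation using the numbers printed above. The age shares sum to 100. A 10-point higher 45-64 share, with the other included age shares held fixed, therefore means a 10-point lower share in the 18-29 reference group. The variable names below are illustrative. With the fitted model in memory, confint(SD37_GhostCand_Regression_NPA_McDonaldProject, "Percent45_64") should return the same interval directly.

```r
# Values copied from the SD37 summary() output (Percent45_64 row)
b_45_64  <- 0.220181   # estimate
se_45_64 <- 0.048654   # standard error
df_resid <- 160        # residual degrees of freedom

# Predicted difference in NPA vote share (percentage points) between two
# precincts whose 45-64 share differs by 10 points (shifted from the 18-29
# reference group), all other predictors equal
effect_10pts <- 10 * b_45_64

# 95% confidence interval for the coefficient, using the t distribution
# with the residual degrees of freedom (as confint() does for lm models)
ci_coef <- b_45_64 + c(-1, 1) * qt(0.975, df = df_resid) * se_45_64

# The same interval expressed on the 10-point scale
ci_10pts <- 10 * ci_coef

round(c(effect = effect_10pts, lower = ci_10pts[1], upper = ci_10pts[2]), 2)
```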

Senate District 39 Analysis

#SD39 NPA Model
SD39_GhostCand_Regression_NPA_McDonaldProject = lm(PercentNPAVote ~ PercentBlack + PercentWhite + PercentOtherRace + PercentDEM + PercentREP + PercentInPerson + PercentEarly + Percent30_44 + Percent45_64 + Percent65_105, data=Merged_SD39)
summary(SD39_GhostCand_Regression_NPA_McDonaldProject)

## 
## Call:
## lm(formula = PercentNPAVote ~ PercentBlack + PercentWhite + PercentOtherRace + 
##     PercentDEM + PercentREP + PercentInPerson + PercentEarly + 
##     Percent30_44 + Percent45_64 + Percent65_105, data = Merged_SD39)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2396 -0.4716 -0.1189  0.2521  8.2616 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.4521554  2.0715474   2.149 0.033326 *  
## PercentBlack      0.0452584  0.0147542   3.067 0.002589 ** 
## PercentWhite      0.0239682  0.0087435   2.741 0.006913 ** 
## PercentOtherRace -0.0023690  0.0121881  -0.194 0.846165    
## PercentDEM       -0.0976615  0.0258416  -3.779 0.000231 ***
## PercentREP       -0.0707505  0.0198323  -3.567 0.000493 ***
## PercentInPerson   0.0793811  0.0190526   4.166 5.37e-05 ***
## PercentEarly      0.0485239  0.0156474   3.101 0.002329 ** 
## Percent30_44      0.0002108  0.0160003   0.013 0.989507    
## Percent45_64     -0.0335061  0.0175334  -1.911 0.058036 .  
## Percent65_105     0.0112605  0.0184071   0.612 0.541689    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.167 on 141 degrees of freedom
## Multiple R-squared:  0.3373, Adjusted R-squared:  0.2903 
## F-statistic: 7.178 on 10 and 141 DF,  p-value: 4.32e-09

Visualizing the estimates of both regression models through graphs

#Combined Plot Showing Regression Results in SD37 & SD39
combine_plots(
  plotlist = list(
    ggcoefstats(SD37_GhostCand_Regression_NPA_McDonaldProject, stats.labels = FALSE, exclude.intercept = TRUE) +
      # lm() was fit on the raw percentage variables, so these estimates are unstandardized
      ggplot2::labs(x = "regression coefficient (unstandardized)",
                    y = "fixed effects",
                    title = "Senate District 37"
                    ),
    ggcoefstats(SD39_GhostCand_Regression_NPA_McDonaldProject, stats.labels = FALSE, exclude.intercept = TRUE) +
      ggplot2::labs(
        x = "regression coefficient (unstandardized)",
        y = "fixed effects",
        title = "Senate District 39"
      )
  ),
  plotgrid.args = list(ncol = 2),
  annotation.args = list(title = "NPA Vote Share Regression")
)

From these visualizations and regressions, I draw four takeaways:

1. The NPA candidate performs about the same among Hispanic voters in SD37 as in SD39.

2. The NPA candidate performs about the same among VBM voters in SD37 as in SD39.

3. The NPA candidate performs about the same among NPA voters in SD37 as in SD39.

4. The NPA candidate performs clearly better among voters aged 45-64 in SD37 than in SD39. The Percent45_64 coefficient is 0.22 (p < 0.001) in SD37, versus -0.03 (p = 0.058) in SD39.
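The takeaways above compare coefficients from two separately fitted models by eye. A common way to check whether a slope differs between two independent samples is a z-test on the difference between the coefficients: z = (b1 - b2) / sqrt(se1^2 + se2^2). The sketch below applies this test to numbers copied from the two summary() outputs. It assumes the SD37 and SD39 estimates are independent, since the districts are separate sets of precincts, and it uses a normal approximation. The helper coef_diff_test() is a hypothetical name, not an existing package function. The test only applies to variables that have their own coefficient, so the reference categories (Hispanic, VBM, NPA registration) cannot be compared this way.

```r
# Hypothetical helper: z-test for the difference between one predictor's
# coefficients from two regressions fit on independent samples
coef_diff_test <- function(b1, se1, b2, se2) {
  diff    <- b1 - b2
  se_diff <- sqrt(se1^2 + se2^2)   # SE of the difference under independence
  z       <- diff / se_diff
  p       <- 2 * pnorm(-abs(z))    # two-sided p-value, normal approximation
  c(difference = diff, se = se_diff, z = z, p = p)
}

# Percent45_64: SD37 versus SD39 (estimate, std. error from each summary())
age45_64 <- coef_diff_test(0.220181, 0.048654, -0.0335061, 0.0175334)

# PercentBlack: SD37 versus SD39, for contrast
black <- coef_diff_test(0.079242, 0.036592, 0.0452584, 0.0147542)

round(rbind(age45_64, black), 4)
```

With the fitted models in memory, the inputs can be pulled with coef(summary(SD37_GhostCand_Regression_NPA_McDonaldProject))["Percent45_64", c("Estimate", "Std. Error")] instead of being typed in. Another option is to pool both districts into one lm() with district interactions. That approach assumes a common residual variance, and the residual standard errors printed above (3.687 in SD37 versus 1.167 in SD39) suggest this assumption may not hold.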

Based on these results, I end up rejecting all of my initial hypotheses. The strong positive association in SD37 between a precinct's share of voters aged 45-64 and the NPA candidate's vote share is interesting. It may be that this group was the one most targeted by a mailing campaign sent by the dark money groups funding the NPA candidate in SD37.

A Tale of Two Rodriguez

Andy Jarrin

12/17/2021

An NPA candidate also ran in SD39, but this candidate did not share a name with any of the other candidates in that senate district.

H1: In SD37 precincts where voters aged 18-29 make up a higher proportion of the voting population, I expect the NPA candidate's vote share to be relatively higher than in SD39, even after accounting for other factors.

H2: In SD37 precincts where NPA voters make up a higher proportion of the voting population, I expect the NPA candidate's vote share to be relatively higher than in SD39, even after accounting for other factors.

H3: In SD37 precincts where Hispanic voters make up a higher proportion of the voting population, I expect the NPA candidate's vote share to be relatively higher than in SD39, even after accounting for other factors.
