My final project examines the 2022 Maryland primary. My general question concerns the vote totals for the Trump-backed Republican gubernatorial nominee, Dan Cox: is there an unusual number of Republican voters who voted only for governor and not in the other primary contests? To find out, I will compare the number of votes cast in the Maryland Republican gubernatorial primary with the number of Republican votes cast in the state Attorney General primary, the U.S. Senate primary, and the sum of the Republican primary votes cast in U.S. House races. I will use the unofficial 2022 results from the Maryland State Board of Elections website, updating the data as new results come in with each part of the project.
library(rvest)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ readr::guess_encoding() masks rvest::guess_encoding()
## ✖ dplyr::lag() masks stats::lag()
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(tidyr)
library(dplyr)
#Specifying the url for desired website to be scraped
Gubernatorial_2022 <- 'https://elections.maryland.gov/elections/2022/primary_results/gen_results_2022_1.html'
webpage <- read_html(Gubernatorial_2022)
name_data_html <- html_nodes(webpage,'.NameCol')
name_data <- html_text(name_data_html)
head(name_data)
## [1] "Name"
## [2] "\t\t\r\nDan Cox and Gordana Schifanelli\t\t\r\n"
## [3] "\t\t\r\nRobin Ficker and LeRoy F. Yegge, Jr.\t\t\r\n"
## [4] "\t\t\r\nKelly Schulz and Jeff Woolford\t\t\r\n"
## [5] "\t\t\r\nJoe Werner and Minh Thanh Luong\t\t\r\n"
## [6] "Name"
name_data <- gsub('\t\t\r\n', "", name_data)
head(name_data)
## [1] "Name"
## [2] "Dan Cox and Gordana Schifanelli"
## [3] "Robin Ficker and LeRoy F. Yegge, Jr."
## [4] "Kelly Schulz and Jeff Woolford"
## [5] "Joe Werner and Minh Thanh Luong"
## [6] "Name"
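The cleaned vector still contains the repeated "Name" header cells, and the `gsub()` pattern only matches that exact run of tabs and line breaks. A minimal sketch of a more general cleanup, using a toy vector built from the values printed above (`raw_names` and `clean_names` are illustrative names, not objects from the analysis):

```r
# Toy vector mimicking the scraped cells shown in the printout above.
raw_names <- c("Name",
               "\t\t\r\nDan Cox and Gordana Schifanelli\t\t\r\n",
               "\t\t\r\nKelly Schulz and Jeff Woolford\t\t\r\n",
               "Name")

# trimws() strips leading and trailing spaces, tabs, and line breaks.
clean_names <- trimws(raw_names)

# Drop the repeated table-header cells so only candidate tickets remain.
clean_names <- clean_names[clean_names != "Name"]

print(clean_names)
```

Filtering out the header cells at this stage only works if the same rows are also dropped from the vote and percentage vectors, so that the vectors stay aligned.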
party_data_html <- html_nodes(webpage, '.PartyCol')
party_data <- html_text(party_data_html)
head(party_data)
## [1] "Party" "Republican" "Republican" "Republican" "Republican"
## [6] "Party"
votes_data_html <- html_nodes(webpage, '.VotesCol')
votes_data <- html_text(votes_data_html)
head(votes_data)
## [1] "Early Voting" "Election Day"
## [3] "Mail-In Ballot / Provisional" "Total"
## [5] " 26,455" " 106,724"
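# as.numeric() cannot parse header text or comma-formatted counts such as " 26,455", so these values are coerced to NA
votes_data <- as.numeric(votes_data)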
## Warning: NAs introduced by coercion
head(votes_data)
## [1] NA NA NA NA NA NA
length(votes_data)
## [1] 64
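The `NA` values arise because the scraped strings contain header text, leading spaces, and thousands separators. One possible fix, sketched in base R on a toy vector (`raw_votes`, `is_count`, and `votes_num` are illustrative names), is to remove the non-digit characters before converting. `readr::parse_number()` is another option.

```r
# Toy vector mixing header cells and comma-formatted counts, as in the scrape.
raw_votes <- c("Early Voting", "Total", " 26,455", " 106,724")

# Keep only cells that contain at least one digit (this drops header cells).
is_count <- grepl("[0-9]", raw_votes)

# Remove every non-digit character (spaces, commas), then convert to numeric.
votes_num <- as.numeric(gsub("[^0-9]", "", raw_votes[is_count]))

print(votes_num)
```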
percent_data_html <- html_nodes(webpage, '.PercentCol')
percent_data <- html_text(percent_data_html)
head(percent_data)
## [1] "Percentage" " 52.05%" " 2.79%" " 43.46%" " 1.71%"
## [6] "Percentage"
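# as.numeric() cannot parse header text or strings that end in "%", so these values are coerced to NA
percent_data <- as.numeric(percent_data)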
## Warning: NAs introduced by coercion
head(percent_data)
## [1] NA NA NA NA NA NA
length(percent_data)
## [1] 16
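The lengths reported above (64 vote cells, 16 percentage cells) suggest four vote cells per table row. Assuming that layout holds, the parallel vectors could be combined into one data frame roughly as follows. The data here are made-up toy values, and `to_number()`, `results`, and the column names are hypothetical.

```r
# Toy vectors with the same shape as the scraped ones (NOT real results).
name_raw    <- c("Name", "Ticket A", "Ticket B")
percent_raw <- c("Percentage", " 60.00%", " 40.00%")
votes_raw   <- c("Early Voting", "Election Day", "Mail-In Ballot / Provisional", "Total",
                 " 1,000", " 2,000", " 500", " 3,500",
                 " 700", " 1,300", " 333", " 2,333")

# Helper: strip everything except digits and the decimal point, then convert.
to_number <- function(x) as.numeric(gsub("[^0-9.]", "", x))

# Four vote cells per table row, filled row by row.
vote_matrix <- matrix(votes_raw, ncol = 4, byrow = TRUE)

# Header rows are the ones whose name cell is the literal column title.
is_header <- name_raw == "Name"

results <- data.frame(
  ticket       = name_raw[!is_header],
  early        = to_number(vote_matrix[!is_header, 1]),
  election_day = to_number(vote_matrix[!is_header, 2]),
  mail_prov    = to_number(vote_matrix[!is_header, 3]),
  total        = to_number(vote_matrix[!is_header, 4]),
  percent      = to_number(percent_raw[!is_header])
)

print(results)
```

Checking that the three vote columns sum to the total column is a quick way to confirm the assumed layout on the real page.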
The outcome (response, Y) and predictor (explanatory, X)
variables you will use to answer your question. Is the sum of votes
cast for other offices lower than the number of votes cast in the
Republican gubernatorial primary? How do the total votes in the other
statewide Republican primaries compare with the total votes cast for
governor? Does the same pattern appear in the Democratic
primaries?
The comparison groups you will use, if applicable. I will compare four contests: governor, attorney general, senator, and comptroller.
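A rough sketch of how the comparison across contests could be computed is given here. It assumes a long table with `election`, `party`, and `vote` columns; the numbers are invented for illustration, and `toy`, `totals`, and `dropoff` are hypothetical names.

```r
library(dplyr)

# Invented vote counts for illustration only (NOT real results).
toy <- tibble(
  election = c("Governor", "Governor", "Attorney General",
               "Attorney General", "Comptroller"),
  party    = "Republican",
  vote     = c(600, 400, 500, 300, 750)
)

# Total votes cast in each contest, per party.
totals <- toy %>%
  group_by(party, election) %>%
  summarise(total_votes = sum(vote), .groups = "drop")

# Express each contest's total as a percentage of the same party's
# gubernatorial total; values below 100 indicate drop-off.
dropoff <- totals %>%
  group_by(party) %>%
  mutate(governor_total  = total_votes[election == "Governor"],
         pct_of_governor = 100 * total_votes / governor_total) %>%
  ungroup()

print(dropoff)
```

Grouping by party before looking up the gubernatorial total means the same code can be run on both parties at once, which is what the Democratic comparison would need.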
Very preliminary exploratory data analysis, including some summary statistics and visualizations, along with some explanation on how they help you learn more about your data. (You can add to these later as you work on your project.)
The method(s) that you believe will be useful in answering your question(s). (You can update these later as you work on your project.)
What results from these specific statistical methods are needed to support your hypothesized answer?
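For the preliminary exploratory analysis, a grouped bar chart of total votes per contest is one possible starting point. The sketch below uses invented totals, and `toy_totals` and `p` are hypothetical names; it only illustrates the shape of the plot, not any actual finding.

```r
library(ggplot2)

# Invented totals for illustration only (NOT real results).
toy_totals <- data.frame(
  election    = rep(c("Governor", "Attorney General", "Comptroller"), times = 2),
  party       = rep(c("Democrat", "Republican"), each = 3),
  total_votes = c(1500, 1400, 1350, 1000, 800, 750)
)

# Side-by-side bars make the gap between the gubernatorial total and the
# other contests visible within each party.
p <- ggplot(toy_totals, aes(x = election, y = total_votes, fill = party)) +
  geom_col(position = "dodge") +
  labs(x = "Contest", y = "Total votes", fill = "Party")

print(p)
```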
library(tidyverse)
setwd("/Users/shoshigilg/Desktop/Data 101")
primary_csv <- read_csv("Primary_2022.csv")
## Rows: 33 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Candidate, Election, Party
## dbl (1): Vote
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(primary_csv)
##   Candidate           Election            Party                Vote
##  Length:33          Length:33          Length:33          Min.   :  2922
##  Class :character   Class :character   Class :character   1st Qu.: 14085
##  Mode  :character   Mode  :character   Mode  :character   Median : 33173
##                                                           Mean   :109315
##                                                           3rd Qu.:152900
##                                                           Max.   :526643
primary_csv$Candidate <- as.factor(primary_csv$Candidate)
primary_csv$Vote <- as.numeric(primary_csv$Vote)
setwd("/Users/shoshigilg/Desktop/Data 101")
Counties <- read_csv("MD_Primary_2022_Counties.csv")
## New names:
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## Rows: 792 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Candidate, Election, Party, County
## dbl (1): Vote
## lgl (4): ...6, ...7, ...8, ...9
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
primary_csv$Election <- as.factor(primary_csv$Election)
primary_csv$Party <- as.factor(primary_csv$Party)
str(primary_csv)
## spec_tbl_df [33 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Candidate: Factor w/ 33 levels "Adams","Baker",..: 21 29 10 2 12 18 17 3 26 16 ...
## $ Election : Factor w/ 4 levels "Attorney General",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Party : Factor w/ 2 levels "Democrat","Republican": 1 1 1 1 1 1 1 1 1 1 ...
## $ Vote : num [1:33] 215464 197914 140478 26250 25022 ...
## - attr(*, "spec")=
## .. cols(
## .. Candidate = col_character(),
## .. Election = col_character(),
## .. Party = col_character(),
## .. Vote = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
str(Counties)
## spec_tbl_df [792 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Candidate: chr [1:792] "Brown" "Brown" "Brown" "Brown" ...
## $ Election : chr [1:792] "Attorney General" "Attorney General" "Attorney General" "Attorney General" ...
## $ Party : chr [1:792] "Democrat" "Democrat" "Democrat" "Democrat" ...
## $ Vote : num [1:792] 1237 25647 47465 46346 3939 ...
## $ County : chr [1:792] "Allegany" "Anne Arundel" "Baltimore City" "Baltimore County" ...
## $ ...6 : logi [1:792] NA NA NA NA NA NA ...
## $ ...7 : logi [1:792] NA NA NA NA NA NA ...
## $ ...8 : logi [1:792] NA NA NA NA NA NA ...
## $ ...9 : logi [1:792] NA NA NA NA NA NA ...
## - attr(*, "spec")=
## .. cols(
## .. Candidate = col_character(),
## .. Election = col_character(),
## .. Party = col_character(),
## .. Vote = col_double(),
## .. County = col_character(),
## .. ...6 = col_logical(),
## .. ...7 = col_logical(),
## .. ...8 = col_logical(),
## .. ...9 = col_logical()
## .. )
## - attr(*, "problems")=<externalptr>
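The four logical columns `...6` to `...9` contain only `NA`, which usually means the CSV has trailing empty columns. A minimal base R sketch for dropping any all-`NA` column is shown on a toy data frame; `toy`, `keep`, and `cleaned` are illustrative names.

```r
# Toy data frame with two real columns and two empty ones, as in str() above.
toy <- data.frame(c("Brown", "Brown"), c(1237, 25647), c(NA, NA), c(NA, NA))
names(toy) <- c("candidate", "vote", "...6", "...7")

# A column is kept only if it has at least one non-missing value.
keep <- colSums(!is.na(toy)) > 0
cleaned <- toy[, keep, drop = FALSE]

print(names(cleaned))
```

With dplyr 1.0 or later, `select(where(~ !all(is.na(.x))))` expresses the same idea inside a pipe.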
names(primary_csv) <- tolower(names(primary_csv))
names(primary_csv) <- gsub("/", "_", names(primary_csv))
number_of_votes <- primary_csv %>%
  select(candidate, election, party, vote)
head(number_of_votes)
## # A tibble: 6 × 4
## candidate election party vote
## <fct> <fct> <fct> <dbl>
## 1 Moore Governor Democrat 215464
## 2 T. Perez Governor Democrat 197914
## 3 Franchot Governor Democrat 140478
## 4 Baker Governor Democrat 26250
## 5 Gansler Governor Democrat 25022
## 6 King Governor Democrat 24286
names(Counties) <- tolower(names(Counties))
MD_Counties <- Counties %>%
  select(candidate, election, party, vote, county)
head(MD_Counties)
## # A tibble: 6 × 5
## candidate election party vote county
## <chr> <chr> <chr> <dbl> <chr>
## 1 Brown Attorney General Democrat 1237 Allegany
## 2 Brown Attorney General Democrat 25647 Anne Arundel
## 3 Brown Attorney General Democrat 47465 Baltimore City
## 4 Brown Attorney General Democrat 46346 Baltimore County
## 5 Brown Attorney General Democrat 3939 Calvert
## 6 Brown Attorney General Democrat 708 Caroline
summary(MD_Counties)
##   candidate           election            party                vote
##  Length:792         Length:792         Length:792         Min.   :     6.0
##  Class :character   Class :character   Class :character   1st Qu.:   297.0
##  Mode  :character   Mode  :character   Mode  :character   Median :   925.5
##                                                           Mean   :  4554.8
##                                                           3rd Qu.:  3070.0
##                                                           Max.   :119114.0
##     county
##  Length:792
##  Class :character
##  Mode  :character
##
##
##
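Because the county table has one row per candidate, county, and contest, the county-level comparison needs the votes summed per contest first. The following is a sketch under the assumption that the columns are named as in `MD_Counties`; the counts are invented, and `toy_counties`, `county_wide`, and `ag_share_of_governor` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Invented county-level counts for illustration only (NOT real results).
toy_counties <- tibble(
  county   = c("Allegany", "Allegany", "Allegany",
               "Calvert", "Calvert", "Calvert"),
  party    = "Republican",
  election = c("Governor", "Governor", "Attorney General",
               "Governor", "Governor", "Attorney General"),
  vote     = c(300, 200, 450, 500, 500, 900)
)

# Sum candidates within each contest, then put contests side by side
# so that one row describes one county and party.
county_wide <- toy_counties %>%
  group_by(county, party, election) %>%
  summarise(total_votes = sum(vote), .groups = "drop") %>%
  pivot_wider(names_from = election, values_from = total_votes) %>%
  mutate(ag_share_of_governor = `Attorney General` / Governor)

print(county_wide)
```

A share well below 1 in a county would point to voters there who cast a ballot for governor but skipped the other contest.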
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
number_of_votes2 <- number_of_votes %>%
mutate()
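The `mutate()` call above has no arguments yet, so `number_of_votes2` is currently a copy of `number_of_votes`. One hypothetical use, shown on invented data, would be to add each candidate's share of the vote within their own party's contest. `toy_votes`, `toy_votes2`, and `vote_share` are illustrative names, not part of the analysis.

```r
library(dplyr)

# Invented counts for illustration only (NOT real results).
toy_votes <- tibble(
  candidate = c("A", "B", "C", "D"),
  election  = "Governor",
  party     = c("Democrat", "Democrat", "Republican", "Republican"),
  vote      = c(300, 100, 450, 50)
)

# Within each contest and party, divide each candidate's votes by the total.
toy_votes2 <- toy_votes %>%
  group_by(election, party) %>%
  mutate(vote_share = 100 * vote / sum(vote)) %>%
  ungroup()

print(toy_votes2)
```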