Deliverables

  1. Proposal: due August 2, 2022
  2. Presentation: due August 16, 2022
  3. Executive summary: due August 18, 2022

Proposal Rubric

This is a draft of the introduction section of your project, as well as a data analysis plan and your dataset.

Section 1 - Introduction: The introduction should introduce your general research question and your data (where it came from, how it was collected, what are the cases, what are the variables, etc.).

My final project will focus on the Maryland 2022 primary election. My general research question concerns the vote totals for Trump-backed Republican gubernatorial nominee Dan Cox: is there an outlier group of Republican voters who voted only in the governor's race and not in the other primary contests? To answer this, I will compare the number of votes cast in the Maryland Republican gubernatorial primary with the number of Republican votes cast in the Attorney General primary, the U.S. Senate primary, and the sum of the Republican primary votes cast in the U.S. House races. I will use the unofficial 2022 primary results published on the Maryland State Board of Elections website, and I will update the data as new results come in at each stage of the project.
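A minimal sketch of the comparison described above, assuming the results sit in a data frame with `election`, `party`, and `vote` columns (as in `Primary_2022.csv` once its names are lowercased). The function name `dropoff_vs_governor()`, the race labels `"Senate"` and `"House"`, and the toy vote counts are hypothetical placeholders, not actual results.

```r
# Hypothetical helper: compare each race's total Republican vote with the
# Republican gubernatorial primary total. Assumes columns `election`,
# `party` and `vote`; the race labels in the toy data are placeholders.
dropoff_vs_governor <- function(results, party = "Republican",
                                governor = "Governor") {
  # Keep only the rows for the party of interest
  rows <- results[results$party == party, ]
  # Total votes per race, summed over candidates (and over districts for
  # any race, such as the House, that is spread across several rows)
  totals <- sapply(split(rows$vote, as.character(rows$election)), sum)
  gov_total <- totals[[governor]]
  data.frame(
    election = names(totals),
    total_votes = unname(totals),
    # Positive values mean fewer votes than in the governor's race
    dropoff = gov_total - unname(totals),
    ratio_to_governor = unname(totals) / gov_total,
    stringsAsFactors = FALSE
  )
}

# Toy numbers for illustration only, not 2022 results
toy_races <- data.frame(
  election = c("Governor", "Governor", "Attorney General",
               "Senate", "House", "House"),
  party = "Republican",
  vote = c(150, 100, 200, 180, 120, 70),
  stringsAsFactors = FALSE
)
toy_dropoff <- dropoff_vs_governor(toy_races)
print(toy_dropoff)
```

With the real data, `dropoff_vs_governor(number_of_votes)` would apply the same calculation, provided the race labels in the file match the ones passed in.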

Section 2 - Data: Place your data in the /data folder, and add dimensions and codebook to the README in that folder.

library(rvest)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()         masks stats::filter()
## ✖ readr::guess_encoding() masks rvest::guess_encoding()
## ✖ dplyr::lag()            masks stats::lag()
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(tidyr)
library(dplyr)
# URL of the Maryland State Board of Elections results page to be scraped
Gubernatorial_2022 <- 'https://elections.maryland.gov/elections/2022/primary_results/gen_results_2022_1.html'

webpage <- read_html(Gubernatorial_2022)
name_data_html <- html_nodes(webpage,'.NameCol') 

name_data <- html_text(name_data_html)

head(name_data)
## [1] "Name"                                                
## [2] "\t\t\r\nDan Cox and Gordana Schifanelli\t\t\r\n"     
## [3] "\t\t\r\nRobin Ficker and LeRoy F. Yegge, Jr.\t\t\r\n"
## [4] "\t\t\r\nKelly Schulz and Jeff Woolford\t\t\r\n"      
## [5] "\t\t\r\nJoe Werner and Minh Thanh Luong\t\t\r\n"     
## [6] "Name"
name_data <- gsub('\t\t\r\n', "", name_data)

head(name_data)
## [1] "Name"                                
## [2] "Dan Cox and Gordana Schifanelli"     
## [3] "Robin Ficker and LeRoy F. Yegge, Jr."
## [4] "Kelly Schulz and Jeff Woolford"      
## [5] "Joe Werner and Minh Thanh Luong"     
## [6] "Name"
party_data_html <- html_nodes(webpage, '.PartyCol')

party_data <- html_text(party_data_html)

head(party_data)
## [1] "Party"      "Republican" "Republican" "Republican" "Republican"
## [6] "Party"
votes_data_html <- html_nodes(webpage, '.VotesCol')

votes_data <- html_text(votes_data_html)

head(votes_data)
## [1] "Early Voting"                 "Election Day"                
## [3] "Mail-In Ballot / Provisional" "Total"                       
## [5] " 26,455"                      " 106,724"
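# The vote cells are text such as " 26,455"; the thousands separator makes
# as.numeric() return NA for every element. readr::parse_number() strips the
# formatting; header cells such as "Early Voting" still become NA.
votes_data <- parse_number(votes_data)
head(votes_data)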
length(votes_data)
## [1] 64
percent_data_html <- html_nodes(webpage, '.PercentCol')

percent_data <- html_text(percent_data_html)

head(percent_data)
## [1] "Percentage" " 52.05%"    " 2.79%"     " 43.46%"    " 1.71%"    
## [6] "Percentage"
# Percentages are text such as " 52.05%"; the "%" sign makes as.numeric()
# return NA, while parse_number() returns 52.05. "Percentage" headers become NA.
percent_data <- parse_number(percent_data)
head(percent_data)
length(percent_data)
## [1] 16
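The scraped `name_data`, `party_data`, `votes_data`, and `percent_data` vectors are still separate, and each results table on the page repeats its own header row (the `"Name"` and `"Party"` entries in the output). A minimal sketch of combining them into one data frame, assuming (from the output shown) that every table row has exactly four `.VotesCol` cells in the order Early Voting, Election Day, Mail-In Ballot / Provisional, Total, and that the header cells line up across the four vectors. The helper names `to_number()` and `build_results_table()` are hypothetical, and the function expects the raw text, e.g. `build_results_table(name_data, party_data, html_text(votes_data_html), html_text(percent_data_html))`.

```r
# Hypothetical helpers for combining the scraped text vectors.
# Assumes four .VotesCol cells per row (Early Voting, Election Day,
# Mail-In Ballot / Provisional, Total) and one header row per table.
to_number <- function(x) {
  # Remove spaces, thousands separators and "%" signs; cells with no digits
  # (header text) become "" and therefore NA
  suppressWarnings(as.numeric(gsub("[^0-9.]", "", x)))
}

build_results_table <- function(names_txt, party_txt, votes_txt, percent_txt) {
  stopifnot(length(votes_txt) %% 4 == 0)
  # Reshape the flat vote cells into one row per table line
  votes <- matrix(votes_txt, ncol = 4, byrow = TRUE)
  stopifnot(nrow(votes) == length(names_txt),
            length(party_txt) == length(names_txt),
            length(percent_txt) == length(names_txt))
  out <- data.frame(
    candidate = trimws(names_txt),
    party = trimws(party_txt),
    early_voting = to_number(votes[, 1]),
    election_day = to_number(votes[, 2]),
    mail_in_provisional = to_number(votes[, 3]),
    total = to_number(votes[, 4]),
    percent = to_number(percent_txt),
    stringsAsFactors = FALSE
  )
  # Drop the header row that each table repeats
  out <- out[out$candidate != "Name", , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy input in the same format as the scraped text (not real results)
toy_names <- c("Name", "\t\t\r\nCandidate A\t\t\r\n", "\t\t\r\nCandidate B\t\t\r\n")
toy_party <- c("Party", "Republican", "Republican")
toy_votes <- c("Early Voting", "Election Day", "Mail-In Ballot / Provisional", "Total",
               " 1,000", " 2,500", " 500", " 4,000",
               " 200", " 300", " 100", " 600")
toy_percent <- c("Percentage", " 86.96%", " 13.04%")
toy_table <- build_results_table(toy_names, toy_party, toy_votes, toy_percent)
print(toy_table)
```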

Then print out the output of glimpse() or skim() of your data frame.
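For the README, a minimal sketch that reports the dimensions of a data frame and builds a simple codebook with one row per variable (its class, number of missing values, and the first value as an example). The helper name `make_codebook()` is hypothetical, and the toy data frame only mirrors the column names of `Primary_2022.csv`; its values are placeholders.

```r
# Hypothetical helper: one codebook row per column of a data frame
make_codebook <- function(df) {
  data.frame(
    variable = names(df),
    class = vapply(df, function(col) class(col)[1], character(1)),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    example = vapply(df, function(col) as.character(col[1]), character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Placeholder data with the same columns as Primary_2022.csv
toy_primary <- data.frame(
  Candidate = c("Candidate A", "Candidate B"),
  Election = factor(c("Governor", "Governor")),
  Party = factor(c("Republican", "Democrat")),
  Vote = c(1200, NA),
  stringsAsFactors = FALSE
)
cat("Dimensions:", nrow(toy_primary), "rows x", ncol(toy_primary), "columns\n")
codebook <- make_codebook(toy_primary)
print(codebook)
```

On the real data, `make_codebook(primary_csv)` could be pasted into the README (for example via `knitr::kable()`) alongside the glimpse() output.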

Section 3 - Data analysis plan:

  • The outcome (response, Y) and predictor (explanatory, X) variables you will use to answer your question. The outcome is the number of votes (`vote`), and the explanatory variables are the race (`election`) and `party`. Is there a drop in the total number of votes cast for other offices compared with the Republican gubernatorial primary? How does the total number of votes in the other Republican primaries for statewide office compare with the number of votes cast for governor? Is this pattern consistent with the Democratic primaries?

  • The comparison groups you will use, if applicable. I will compare four races: governor, attorney general, U.S. Senate, and comptroller.

  • Very preliminary exploratory data analysis, including some summary statistics and visualizations, along with some explanation on how they help you learn more about your data. (You can add to these later as you work on your project.)

  • The method(s) that you believe will be useful in answering your question(s). (You can update these later as you work on your project.)

  • What results from these specific statistical methods are needed to support your hypothesized answer?
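One way to approach the Section 3 questions, sketched under assumptions: express each race's total vote as a share of the governor's race total, separately for each party, so any Republican drop-off can be set beside the Democratic one. This assumes the lowercased `election`, `party`, and `vote` columns of `primary_csv`; the function name `ratio_by_party()` and the toy numbers are hypothetical.

```r
# Hypothetical helper: each race's total vote as a share of the governor's
# race total, computed separately for each party.
ratio_by_party <- function(results, governor = "Governor") {
  # Sum candidates within each race and party
  totals <- aggregate(vote ~ election + party, data = results, FUN = sum)
  totals$election <- as.character(totals$election)
  totals$party <- as.character(totals$party)
  # Governor's-race total for each party, attached to every row of that party
  gov <- totals[totals$election == governor, c("party", "vote")]
  names(gov)[names(gov) == "vote"] <- "governor_vote"
  out <- merge(totals, gov, by = "party")
  out$ratio_to_governor <- out$vote / out$governor_vote
  out <- out[order(out$party, out$election), ]
  rownames(out) <- NULL
  out
}

# Toy numbers for illustration only, not 2022 results
toy_parties <- data.frame(
  election = rep(c("Governor", "Attorney General", "Senate"), each = 2),
  party = rep(c("Democrat", "Republican"), times = 3),
  vote = c(400, 250, 380, 200, 390, 230),
  stringsAsFactors = FALSE
)
party_ratios <- ratio_by_party(toy_parties)
print(party_ratios)
```

Comparing the Republican and Democratic ratios for the same race indicates whether a drop-off is specific to the Republican primary. Because these are aggregate totals, they can show that fewer votes were cast in a down-ballot race, but not which individual voters skipped it.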

Each section should be no more than one page (excluding figures); you can check a print preview to confirm the length. The grading scheme for the project proposal is as follows. Note that after you receive feedback on your proposal, you can improve it based on the feedback and resubmit it. If you resubmit, your final score for the proposal will be the average of the two scores you receive (first and second submissions).

library(tidyverse)
setwd("/Users/shoshigilg/Desktop/Data 101")
primary_csv <- read_csv("Primary_2022.csv")
## Rows: 33 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Candidate, Election, Party
## dbl (1): Vote
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(primary_csv)
##   Candidate           Election            Party                Vote       
##  Length:33          Length:33          Length:33          Min.   :  2922  
##  Class :character   Class :character   Class :character   1st Qu.: 14085  
##  Mode  :character   Mode  :character   Mode  :character   Median : 33173  
##                                                           Mean   :109315  
##                                                           3rd Qu.:152900  
##                                                           Max.   :526643
primary_csv$Candidate <- as.factor(primary_csv$Candidate)
primary_csv$Vote <- as.numeric(primary_csv$Vote)
Counties <- read_csv("MD_Primary_2022_Counties.csv")
## New names:
## Rows: 792 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Candidate, Election, Party, County
## dbl (1): Vote
## lgl (4): ...6, ...7, ...8, ...9
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
primary_csv$Election <- as.factor(primary_csv$Election)
primary_csv$Party <- as.factor(primary_csv$Party)
str(primary_csv)
## spec_tbl_df [33 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Candidate: Factor w/ 33 levels "Adams","Baker",..: 21 29 10 2 12 18 17 3 26 16 ...
##  $ Election : Factor w/ 4 levels "Attorney General",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Party    : Factor w/ 2 levels "Democrat","Republican": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Vote     : num [1:33] 215464 197914 140478 26250 25022 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Candidate = col_character(),
##   ..   Election = col_character(),
##   ..   Party = col_character(),
##   ..   Vote = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
str(Counties)
## spec_tbl_df [792 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Candidate: chr [1:792] "Brown" "Brown" "Brown" "Brown" ...
##  $ Election : chr [1:792] "Attorney General" "Attorney General" "Attorney General" "Attorney General" ...
##  $ Party    : chr [1:792] "Democrat" "Democrat" "Democrat" "Democrat" ...
##  $ Vote     : num [1:792] 1237 25647 47465 46346 3939 ...
##  $ County   : chr [1:792] "Allegany" "Anne Arundel" "Baltimore City" "Baltimore County" ...
##  $ ...6     : logi [1:792] NA NA NA NA NA NA ...
##  $ ...7     : logi [1:792] NA NA NA NA NA NA ...
##  $ ...8     : logi [1:792] NA NA NA NA NA NA ...
##  $ ...9     : logi [1:792] NA NA NA NA NA NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Candidate = col_character(),
##   ..   Election = col_character(),
##   ..   Party = col_character(),
##   ..   Vote = col_double(),
##   ..   County = col_character(),
##   ..   ...6 = col_logical(),
##   ..   ...7 = col_logical(),
##   ..   ...8 = col_logical(),
##   ..   ...9 = col_logical()
##   .. )
##  - attr(*, "problems")=<externalptr>
names(primary_csv) <- tolower(names(primary_csv))
names(primary_csv) <- gsub("/", "_", names(primary_csv))
number_of_votes <- primary_csv %>%
  select('candidate', 'election','party','vote')
head(number_of_votes)
## # A tibble: 6 × 4
##   candidate election party      vote
##   <fct>     <fct>    <fct>     <dbl>
## 1 Moore     Governor Democrat 215464
## 2 T. Perez  Governor Democrat 197914
## 3 Franchot  Governor Democrat 140478
## 4 Baker     Governor Democrat  26250
## 5 Gansler   Governor Democrat  25022
## 6 King      Governor Democrat  24286
names(Counties) <- tolower(names(Counties))

MD_Counties <- Counties %>%
  select('candidate','election', 'party', 'vote', 'county')
head(MD_Counties)
## # A tibble: 6 × 5
##   candidate election         party     vote county          
##   <chr>     <chr>            <chr>    <dbl> <chr>           
## 1 Brown     Attorney General Democrat  1237 Allegany        
## 2 Brown     Attorney General Democrat 25647 Anne Arundel    
## 3 Brown     Attorney General Democrat 47465 Baltimore City  
## 4 Brown     Attorney General Democrat 46346 Baltimore County
## 5 Brown     Attorney General Democrat  3939 Calvert         
## 6 Brown     Attorney General Democrat   708 Caroline
summary(MD_Counties)
##   candidate           election            party                vote         
##  Length:792         Length:792         Length:792         Min.   :     6.0  
##  Class :character   Class :character   Class :character   1st Qu.:   297.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :   925.5  
##                                                           Mean   :  4554.8  
##                                                           3rd Qu.:  3070.0  
##                                                           Max.   :119114.0  
##     county         
##  Length:792        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
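# mutate() with no arguments returns the data unchanged; the new columns for
# the analysis still need to be added here
number_of_votes2 <- number_of_votes %>%
  mutate()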
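The county file supports the same comparison county by county. A minimal sketch, assuming the lowercased `election`, `party`, `vote`, and `county` columns of `Counties`/`MD_Counties` and that the county file labels the governor's race `"Governor"` as `Primary_2022.csv` does; it also shows one way to drop the empty `...6` to `...9` columns that `read_csv()` created. The function names `drop_empty_columns()` and `county_dropoff()` and the toy counts are hypothetical.

```r
# Hypothetical helpers for the county file.
# Drop columns that contain only NA (such as the unnamed ...6 to ...9)
drop_empty_columns <- function(df) {
  df[, colSums(!is.na(df)) > 0, drop = FALSE]
}

# Ratio of another race's total to the governor's-race total, by county
county_dropoff <- function(counties, party = "Republican",
                           governor = "Governor", other = "Attorney General") {
  rows <- counties[counties$party == party, ]
  race_totals <- function(race) {
    r <- rows[rows$election == race, ]
    # Sum candidates within each county
    sapply(split(r$vote, as.character(r$county)), sum)
  }
  gov <- race_totals(governor)
  oth <- race_totals(other)
  county <- sort(union(names(gov), names(oth)))
  out <- data.frame(
    county = county,
    governor_votes = unname(gov[county]),
    other_votes = unname(oth[county]),
    stringsAsFactors = FALSE
  )
  out$ratio <- out$other_votes / out$governor_votes
  # Counties with the largest drop-off (lowest ratio) first
  out <- out[order(out$ratio), ]
  rownames(out) <- NULL
  out
}

# Toy numbers for illustration only, not 2022 results
toy_counties <- data.frame(
  candidate = c("A", "B", "C", "A", "B", "C"),
  election = c("Governor", "Governor", "Attorney General",
               "Governor", "Governor", "Attorney General"),
  party = "Republican",
  vote = c(300, 100, 360, 50, 50, 70),
  county = rep(c("County X", "County Y"), each = 3),
  stringsAsFactors = FALSE
)
toy_counties[["...6"]] <- NA  # mimics an empty column created by read_csv()
clean_counties <- drop_empty_columns(toy_counties)
county_ratios <- county_dropoff(clean_counties)
print(county_ratios)
```

On the real data, `county_dropoff(drop_empty_columns(Counties))` (after lowercasing the names) would list the counties where the Attorney General total falls furthest below the governor total; those would be natural places to look for governor-only Republican voting.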