R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.

We will start by loading the needed packages.

library(rvest)
## Loading required package: xml2
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Now we will use read_html from the rvest package to load the web page, then use a CSS selector to pull the link text out of the zip code table.

zips <- read_html("http://www.zipcodestogo.com/Michigan/") # read the HTML from the web page
vec <- zips %>% html_nodes("tr:nth-child(1) a") %>% html_text() %>% as.vector() # extract the link text from the table as a character vector
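To see what the selection step returns without depending on the live site, here is a minimal sketch that parses a small inline HTML string instead. The markup, the object names toy_page and toy_text, and the simplified selector "td a" are assumptions made for illustration; the real page's structure may differ.

```r
library(rvest)

# Hypothetical markup that loosely imitates a table of linked cells
toy_page <- read_html('
  <table>
    <tr>
      <td><a href="#">48001</a></td>
      <td><a href="#">Saint Clair</a></td>
      <td><a href="#">View Map</a></td>
    </tr>
    <tr>
      <td><a href="#">48003</a></td>
      <td><a href="#">Lapeer</a></td>
      <td><a href="#">View Map</a></td>
    </tr>
  </table>')

# Select every link inside a table cell and keep only its text
toy_text <- toy_page %>% html_nodes("td a") %>% html_text()
toy_text
```

The result is a single flat character vector in document order, which is why zip codes, counties, and "View Map" entries end up interleaved.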

Now we will use logical vectors to drop the unneeded "View Map" and "Michigan" entries (extra columns and headers), then remove the last 9 values, which are web page junk.

not_map <- vec != "View Map" # logical vector that is FALSE wherever the value is "View Map"
vec <- vec[not_map] # keep only the TRUE positions
not_state <- vec != "Michigan" # logical vector that is FALSE wherever the value is "Michigan"
vec <- vec[not_state] # keep only the TRUE positions
vec <- head(vec, -9) # drop the last 9 values (web page junk)
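The same filtering idea can be tried on a small made-up vector. This is only a sketch: the toy values and the names toy and keep are hypothetical, and the two comparisons are combined into one %in% test as an alternative to the two-step version.

```r
# Hypothetical vector: two real-looking rows followed by two junk values
toy <- c("48001", "Saint Clair", "Michigan", "View Map",
         "48003", "Lapeer", "Michigan", "View Map",
         "Home", "Contact")

# FALSE wherever the value is one of the unwanted labels
keep <- !(toy %in% c("View Map", "Michigan"))
toy <- toy[keep]

# A negative n in head() drops that many values from the end
toy <- head(toy, -2)
toy
```

After both steps only the zip codes and county names remain, in their original order.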

Now we have one character vector that holds both zip codes and counties. We will separate the numeric values from the non-numeric ones, combine them into a two-column data frame named df, convert the zip codes to numeric, and print the head of df.

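is_county <- is.na(as.numeric(vec)) # TRUE where the value is not a number, i.e. a county name
## Warning: NAs introduced by coercion
zip <- as.numeric(vec[!is_county]) # the numeric values are the zip codes
county <- vec[is_county] # everything else is a county name
df <- as.data.frame(cbind(zip, county), stringsAsFactors = FALSE) # cbind() coerces both columns to character
df$zip <- as.numeric(df$zip) # so convert zip back to numeric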
head(df)
##     zip      county
## 1 48001 Saint Clair
## 2 48002 Saint Clair
## 3 48003      Lapeer
## 4 48004 Saint Clair
## 5 48005      Macomb
## 6 48006 Saint Clair
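As a variation on the split above, the sketch below silences the expected coercion warning, checks that zips and counties pair up one to one, and builds the data frame directly so the zip column stays numeric. The toy values and the names toy, is_num, and toy_df are assumptions for illustration, not part of the original analysis.

```r
# Hypothetical vector with alternating zip codes and county names
toy <- c("48001", "Saint Clair", "48003", "Lapeer")

# TRUE where the string parses as a number; the NA warning is expected here
is_num <- !is.na(suppressWarnings(as.numeric(toy)))

# The pairing only makes sense if there is one county per zip code
stopifnot(sum(is_num) == sum(!is_num))

# data.frame() keeps each column's own type, so no reconversion is needed
toy_df <- data.frame(zip = as.numeric(toy[is_num]),
                     county = toy[!is_num],
                     stringsAsFactors = FALSE)
str(toy_df)
```

The length check is worth having because a county name that happened to be missing from the scraped text would silently misalign every row after it.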

Now we will supply a data frame of new zip codes with unknown counties (only 3, to keep it simple) and match them against df with the left_join function from dplyr. We then print the new zips, followed by the matched list.

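zip <- c(48001, 48103, 48433) # zip codes whose counties we want to look up
newzips <- as.data.frame(zip) # one-column data frame with a column named zip
complete <- left_join(newzips, df) # keep every row of newzips and add the matching county from df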
## Joining by: "zip"
newzips
##     zip
## 1 48001
## 2 48103
## 3 48433
complete
##     zip      county
## 1 48001 Saint Clair
## 2 48103   Washtenaw
## 3 48433     Genesee
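One thing the three-zip example does not show is what left_join does when a zip code has no match. The sketch below uses a tiny lookup table built from the counties printed above plus one made-up zip code (99999); the names lookup, query, and matched are hypothetical. It also passes by = "zip" explicitly, which avoids the "Joining by" message.

```r
library(dplyr)

# Small lookup table using zip/county pairs from the output above
lookup <- data.frame(zip = c(48001, 48103, 48433),
                     county = c("Saint Clair", "Washtenaw", "Genesee"),
                     stringsAsFactors = FALSE)

# One known zip code and one made-up zip code with no entry in lookup
query <- data.frame(zip = c(48103, 99999))

# left_join keeps every row of query; unmatched rows get NA for county
matched <- left_join(query, lookup, by = "zip")
matched
```

Because unmatched rows are kept rather than dropped, is.na(matched$county) gives a quick way to list zip codes that still need a county.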