This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks.
We will start by loading the needed packages.
library(rvest)
## Loading required package: xml2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Now we will use read_html from the rvest package to load the data from the web page, then select the correct table.
zips <- read_html("http://www.zipcodestogo.com/Michigan/") #read html from webpage
vec <- zips %>% html_nodes("tr:nth-child(1) a") %>% html_text() %>% as.vector() #select table
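To see why the result needs cleaning afterwards, here is a minimal offline sketch of the same idea. The HTML string is a made-up miniature of a zip code table, not the real page's markup, and the names page and cells are only for illustration. It shows that html_text() returns one flat character vector in which zip codes, counties and link labels are all mixed together.

```r
library(rvest)

# Hypothetical miniature table; the real page's markup may differ.
page <- read_html('
<table>
  <tr>
    <td><a href="#">48001</a></td>
    <td><a href="#">Saint Clair</a></td>
    <td><a href="#">View Map</a></td>
  </tr>
  <tr>
    <td><a href="#">48002</a></td>
    <td><a href="#">Saint Clair</a></td>
    <td><a href="#">View Map</a></td>
  </tr>
</table>')

# Select every link inside a table row and keep only its text
cells <- page %>% html_nodes("tr a") %>% html_text()
cells
```

Every link on a row ends up in the vector, so the "View Map" labels arrive alongside the values we actually want.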
Next we will use logical vectors to remove the unneeded columns and headers, then drop the last 9 values (web page junk).
keep <- vec != "View Map" # FALSE wherever the value is "View Map"
vec <- vec[keep] # drop those values
keep <- vec != "Michigan" # FALSE wherever the value is "Michigan"
vec <- vec[keep] # drop those values
vec <- head(vec, -9) # drop the last 9 values (web page junk)
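The same filtering can be written in one step with %in%. This is only a sketch on made-up values: vec_demo, unwanted and cleaned are hypothetical names, and the two trailing "junk" entries stand in for the 9 leftover values on the real page.

```r
# Hypothetical scraped values standing in for the real vector
vec_demo <- c("48001", "Saint Clair", "Michigan", "View Map",
              "48002", "Saint Clair", "Michigan", "View Map",
              "junk 1", "junk 2")

# Values we never want to keep
unwanted <- c("View Map", "Michigan")

# %in% builds one logical vector covering both unwanted labels
cleaned <- vec_demo[!(vec_demo %in% unwanted)]

# A negative n in head() drops that many values from the end
cleaned <- head(cleaned, -2)
cleaned
```

Using %in% keeps the list of unwanted labels in one place, which is easier to extend if the page adds another label.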
Now we have one character vector that holds both the zip codes and the counties. We will split the numeric values from the non-numeric ones into separate vectors, combine them into a two-column data frame named df with the zip codes stored as numbers, and print the head of df.
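is_county <- is.na(as.numeric(vec)) # TRUE where the value is not a number, i.e. a county name
## Warning: NAs introduced by coercion
zip <- as.numeric(vec[!is_county]) # zip codes as numbers
county <- vec[is_county] # county names as character strings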
df <- data.frame(zip = zip, county = county, stringsAsFactors = FALSE) # zip is already numeric here
head(df)
## zip county
## 1 48001 Saint Clair
## 2 48002 Saint Clair
## 3 48003 Lapeer
## 4 48004 Saint Clair
## 5 48005 Macomb
## 6 48006 Saint Clair
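The split can also be done with a regular expression, which avoids the coercion warning. This is a sketch on made-up values, with hypothetical names (vec_demo, is_zip, df_demo), and it assumes every zip code is exactly five digits. Note that the pairing is purely by position, so it only works if each zip code has exactly one county next to it.

```r
# Hypothetical cleaned values: zip codes and counties alternate
vec_demo <- c("48001", "Saint Clair", "48003", "Lapeer")

# TRUE where the value is exactly five digits
is_zip <- grepl("^[0-9]{5}$", vec_demo)

zip_demo <- as.numeric(vec_demo[is_zip])
county_demo <- vec_demo[!is_zip]

# The two vectors are matched by position, so they must be the same length
stopifnot(length(zip_demo) == length(county_demo))

df_demo <- data.frame(zip = zip_demo, county = county_demo,
                      stringsAsFactors = FALSE)
df_demo
```

Converting to numeric is fine for Michigan, but it would drop the leading zero from zip codes that start with 0, so for other states it may be safer to keep the zip codes as character strings.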
Now we will supply a data frame of new zip codes with unknown counties (only 3 of them, to keep it simple) and match them against df with the left_join function from dplyr. We print the new zips first, then the matched list.
zip <- c(48001, 48103, 48433) # new zip codes whose counties we want to look up
newzips <- as.data.frame(zip)
complete <- left_join(newzips, df)
## Joining by: "zip"
newzips
## zip
## 1 48001
## 2 48103
## 3 48433
complete
## zip county
## 1 48001 Saint Clair
## 2 48103 Washtenaw
## 3 48433 Genesee
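A left join keeps every row of the first data frame, even when no match is found. The sketch below uses a small hand-built lookup table taken from the output above, plus a zip code (99999) that is made up to show what an unmatched row looks like. The names lookup, query and matched are hypothetical. Passing by = "zip" names the join column explicitly instead of letting dplyr guess it.

```r
library(dplyr)

# Small lookup table using values from the output above
lookup <- data.frame(zip = c(48001, 48103, 48433),
                     county = c("Saint Clair", "Washtenaw", "Genesee"),
                     stringsAsFactors = FALSE)

# 99999 is a made-up zip code that is not in the lookup table
query <- data.frame(zip = c(48103, 99999))

# Name the join column explicitly; unmatched rows get NA for county
matched <- left_join(query, lookup, by = "zip")
matched
```

Rows with NA in the county column are the zip codes that were not found, which makes them easy to spot with is.na(matched$county).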