We are required to load three files describing the same three books: HTML, XML, and JSON.
File #1 - HTML
We will read an HTML file from GitHub.
URL <- "https://raw.githubusercontent.com/mgroysman/Data-607-Upload-Assignment-Week-7/master/Books.html"
# Save the download to a temporary file so the script does not depend on a local folder
destfile <- tempfile(fileext = ".html")
download.file(URL, destfile)
#install.packages("XML")
library(XML)
# readHTMLTable() returns a list with one data frame per <table> in the page
html_data <- readHTMLTable(destfile)
html_data
## $`NULL`
## Title
## 1 The Language of SQl
## 2 SQL Queries for Mere Mortals
## 3 Sam Teach Yourself. Microsoft SQL Server T-SQL in 10 Minutes
## Author1 Author2 Publisher
## 1 Larry Rockoff Course Technology PTR
## 2 Michael J. Hernandez John L. Viescas Addison-Wesley
## 3 Ben Forta Pearson Education
## CountryPrinted
## 1 United States of America
## 2 United States of America
## 3 United States of America
typeof(html_data)
## [1] "list"
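Because readHTMLTable() returns a list with one data frame per `<table>` (here a single table without an id, printed as $`NULL`), the data frame itself still has to be taken out of the list. A minimal sketch, using a small inline table that is only assumed to resemble Books.html rather than the downloaded file:

```r
library(XML)

# Hypothetical two-column table standing in for Books.html
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Title</th><th>Author1</th></tr>",
  "<tr><td>SQL Queries for Mere Mortals</td><td>Michael J. Hernandez</td></tr>",
  "</table></body></html>"
)

# Parse the string as HTML content rather than treating it as a file name
doc <- htmlParse(html_text, asText = TRUE)

# header = TRUE takes column names from the first row;
# stringsAsFactors = FALSE keeps the text columns as character
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# The result is a list; take its first (and only) element
books_html <- tables[[1]]
str(books_html)
```

With the real download, html_data[[1]] gives the same kind of data frame.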
File #2 - XML
We will read an XML file from GitHub.
URL <- "https://raw.githubusercontent.com/mgroysman/Data-607-Upload-Assignment-Week-7/master/Books.xml"
# Save the download to a temporary file so the script does not depend on a local folder
destfile <- tempfile(fileext = ".xml")
download.file(URL, destfile)
#install.packages("plyr")
library(plyr)
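# xmlToList() turns each record into a list (element attributes go under .attrs);
# ldply() converts each list to a one-row data frame and stacks them, adding an .id column
xml_data <- ldply(xmlToList(destfile), data.frame)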
xml_data
## .id Title
## 1 Books The Language of SQl
## 2 Books SQL Queries for Mere Mortals
## 3 Books Sam Teach Yourself. Microsoft SQL Server T-SQL in 10 Minutes
## Author1 Author2 Publisher
## 1 Larry Rockoff Course Technology PTR
## 2 Michael J. Hernandez John L. Viescas Addison-Wesley
## 3 Ben Forta Pearson Education, Inc.
## CountryPrinted .attrs
## 1 United States of America 1
## 2 United States of America 2
## 3 United States of America 3
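The .id column (the element name, Books) and the .attrs column (each record's attribute value) are bookkeeping added by ldply() and xmlToList(), not book data, so one option is to drop them after binding. A minimal sketch on an inline document; its layout (a root element holding `<Books id="...">` records) is an assumption inferred from the output above, not taken from Books.xml:

```r
library(XML)
library(plyr)

# Hypothetical XML mimicking the record layout suggested by the output above
xml_text <- paste0(
  "<Catalog>",
  "<Books id=\"1\"><Title>The Language of SQl</Title><Author1>Larry Rockoff</Author1></Books>",
  "<Books id=\"2\"><Title>SQL Queries for Mere Mortals</Title><Author1>Michael J. Hernandez</Author1></Books>",
  "</Catalog>"
)

doc <- xmlParse(xml_text, asText = TRUE)

# One list per <Books> element; its id attribute is stored under ".attrs"
records <- xmlToList(doc)

# One-row data frame per record, stacked; the list names go into .id
xml_df <- ldply(records, data.frame, stringsAsFactors = FALSE)

# Keep only the book fields
xml_df <- xml_df[, setdiff(names(xml_df), c(".id", ".attrs")), drop = FALSE]
xml_df
```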
File #3 - JSON
We will read a JSON file from GitHub.
URL <- "https://raw.githubusercontent.com/mgroysman/Data-607-Upload-Assignment-Week-7/master/Books.json"
# Save the download to a temporary file so the script does not depend on a local folder
destfile <- tempfile(fileext = ".json")
download.file(URL, destfile)
#install.packages("RJSONIO")
library("RJSONIO")
isValidJSON(destfile)
## [1] TRUE
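# Parse into plain lists (simplify = FALSE), mapping JSON nulls to NA
json_data <- fromJSON(destfile, nullValue = NA, simplify = FALSE)
# Turn each record into a one-row data frame and bind them row-wise
json_data1 <- do.call("rbind", lapply(json_data, data.frame, stringsAsFactors = FALSE))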
json_data1
## Title
## 1 The Language of SQl
## 2 SQL Queries for Mere Mortals
## 3 Sam Teach Yourself. Microsoft SQL Server T-SQL in 10 Minutes
## Author1 Author2 Publisher
## 1 Larry Rockoff Course Technology PTR
## 2 Michael J. Hernandez John L. Viescas Addison-Wesley
## 3 Ben Forta Pearson Education
## PrintCountry
## 1 United States of America
## 2 United States of America
## 3 United States of America
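fromJSON() with simplify = FALSE returns one named list per book; lapply() turns each list into a one-row data frame and do.call("rbind", ...) stacks them. nullValue = NA matters when a field is null: left as NULL, data.frame() would drop that column for that record, and rbind() would then fail on mismatched column names. A minimal sketch on an inline two-record array, assumed (not confirmed) to mirror the layout of Books.json:

```r
library(RJSONIO)

# Hypothetical JSON array standing in for Books.json; the first record's
# Author2 is null to show the effect of nullValue = NA
json_text <- paste0(
  "[",
  "{\"Title\": \"The Language of SQl\", \"Author2\": null},",
  "{\"Title\": \"SQL Queries for Mere Mortals\", \"Author2\": \"John L. Viescas\"}",
  "]"
)

# simplify = FALSE keeps each record as a list; nulls become NA
records <- fromJSON(json_text, nullValue = NA, simplify = FALSE)

# One single-row data frame per record, then stack them row-wise
books_json <- do.call("rbind", lapply(records, data.frame, stringsAsFactors = FALSE))
books_json
```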
All three data frames captured the book data. I personally prefer the JSON data frame: because it was built with stringsAsFactors = FALSE, I do not have to deal with factors, and unlike the XML one it has no additional .id and .attrs columns.
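The three results are not identical as printed: the JSON version names the last column PrintCountry rather than CountryPrinted, the XML version carries the extra .id and .attrs columns, and the XML output shows the third publisher as "Pearson Education, Inc.". One way to check that they hold the same data is to bring them to a common shape and compare. A sketch with a hypothetical helper, normalize_books(), exercised here on tiny made-up data frames rather than the downloaded ones:

```r
# Hypothetical helper: bring any of the three book data frames to a common shape
normalize_books <- function(df) {
  # Drop the bookkeeping columns added by xmlToList()/ldply()
  df <- df[, setdiff(names(df), c(".id", ".attrs")), drop = FALSE]
  # Align the JSON column name with the HTML/XML one
  names(df)[names(df) == "PrintCountry"] <- "CountryPrinted"
  # Convert factors to plain character
  df[] <- lapply(df, as.character)
  # Reset row names so they do not affect comparisons
  rownames(df) <- NULL
  df
}

# Made-up one-row examples shaped like the JSON and XML results
from_json <- data.frame(Title = "X", PrintCountry = "USA", stringsAsFactors = FALSE)
from_xml <- data.frame(.id = "Books", Title = factor("X"), CountryPrinted = "USA",
                       .attrs = "1", stringsAsFactors = FALSE)

identical(normalize_books(from_json), normalize_books(from_xml))
```

With the real objects, something like all.equal(normalize_books(xml_data), normalize_books(json_data1)) would then list any remaining cell-level differences, such as the Publisher value for the third book.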