First, we load our required packages and data.
library(jsonlite)
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:utils':
##
## View
library(XML)
library(RCurl)
## Loading required package: bitops
xbooks <- "https://raw.githubusercontent.com/bkreis84/Kreis-Week-7/master/books.xml"
jbooks <- "https://raw.githubusercontent.com/bkreis84/Kreis-Week-7/master/books.json"
hbooks <- "https://raw.githubusercontent.com/bkreis84/Kreis-Week-7/master/books.html"
I used the getURL function to make the HTTPS request, and the xmlToDataFrame function to convert the XML file into an R data frame. The result is sloppy: it collapses all of the authors into a single column because it doesn’t recognize the “subchildren” nested under the author element. I could have entered each author as a direct child of the book, in which case all three tables would have been identical, but I went this way just to see what it would look like.
xbooks <- getURL(xbooks)
xframe <- xmlToDataFrame(xbooks)
xframe
##                                  title year_published pages       genre
## 1                    A Storm of Swords           2000   992     fantasy
## 2     Automated Data Collection with R           2015   480 educational
## 3 A Short History of Nearly Everything           2003   544 non-fiction
##                                                      author
## 1                                  George R.R. MartinNANANA
## 2 Simon MunzertChristian Rubba Peter MeissnerDominic Nyhuis
## 3                                         Bill BrysonNANANA
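One way to keep nested authors readable is to walk the book nodes and join each book's author names explicitly rather than relying on xmlToDataFrame. The following is only a minimal sketch: the inline XML and the `name` child element are assumptions made for illustration, since the real books.xml may use different element names.

```r
library(XML)

# Hypothetical XML resembling the structure described above: each <book>
# has an <author> element whose children (<name>, an assumed tag) hold
# the individual authors.
xml_text <- '<books>
  <book>
    <title>A Storm of Swords</title>
    <author><name>George R.R. Martin</name></author>
  </book>
  <book>
    <title>Automated Data Collection with R</title>
    <author>
      <name>Simon Munzert</name>
      <name>Christian Rubba</name>
      <name>Peter Meissner</name>
      <name>Dominic Nyhuis</name>
    </author>
  </book>
</books>'

doc <- xmlParse(xml_text, asText = TRUE)
books <- getNodeSet(doc, "//book")

# Build one row per book; authors are joined with "; " so the names
# are not run together in a single string.
xclean <- data.frame(
  title = sapply(books, function(b) xpathSApply(b, "./title", xmlValue)),
  authors = sapply(books, function(b) {
    paste(xpathSApply(b, "./author/name", xmlValue), collapse = "; ")
  }),
  stringsAsFactors = FALSE
)
xclean
```

The same idea could be extended to spread the authors over separate columns, which would bring the XML table in line with the other two.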
JSON was the most straightforward: a single call to fromJSON, which accepts the URL directly, was able to create the R data frame.
jframe <- fromJSON(jbooks)
jframe
##                                  Title Year Published Pages       Genre
## 1                    A Storm of Swords           2000   992     fantasy
## 2     Automated Data Collection with R           2015   480 educational
## 3 A Short History of Nearly Everything           2003   544 non-fiction
##               Author         Author2        Author3        Author4
## 1 George R.R. Martin              NA             NA             NA
## 2      Simon Munzert Christian Rubba Peter Meissner Dominic Nyhuis
## 3        Bill Bryson              NA             NA             NA
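To illustrate why a single call is enough, here is a small sketch of how fromJSON simplifies an array of records into a data frame. The inline JSON is a made-up stand-in, not the contents of books.json, whose exact layout is not shown here.

```r
library(jsonlite)

# Hypothetical JSON: an array of objects, one per book. The first record
# has no "Author2" key, so that cell is filled with NA.
json_text <- '[
  {"Title": "A Storm of Swords", "Pages": 992, "Author": "George R.R. Martin"},
  {"Title": "Automated Data Collection with R", "Pages": 480,
   "Author": "Simon Munzert", "Author2": "Christian Rubba"}
]'

# fromJSON turns an array of similar objects into a data frame by default.
jdemo <- fromJSON(json_text)
jdemo
```

Each key becomes a column, so giving every author its own key is what produces the separate author columns.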
Again I used the getURL function, followed by a relatively simple function call to read the HTML table and convert it into a data frame.
hbooks <- getURL(hbooks)
hframe <- data.frame(readHTMLTable(hbooks))
hframe
##                             NULL.Title NULL.Year.Published NULL.Pages
## 1                    A Storm of Swords                2000        992
## 2     Automated Data Collection with R                2015        480
## 3 A Short History of Nearly Everything                2003        544
##    NULL.Genre        NULL.Author   NULL.Author.2  NULL.Author.3
## 1     fantasy George R.R. Martin            <NA>           <NA>
## 2 educational      Simon Munzert Christian Rubba Peter Meissner
## 3 non-fiction        Bill Bryson            <NA>           <NA>
##    NULL.Author.4
## 1           <NA>
## 2 Dominic Nyhuis
## 3           <NA>
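The NULL. prefix on the column names most likely appears because readHTMLTable returns a list of tables, and wrapping that list in data.frame() prepends the name of the list element, which is "NULL" for a table with no id. A minimal sketch of one way to avoid it is to pull the first table out of the list instead. The inline HTML below is a hypothetical stand-in for books.html.

```r
library(XML)

# Hypothetical HTML with a single table that has no id attribute.
html_text <- '<html><body><table>
  <tr><th>Title</th><th>Pages</th><th>Author</th></tr>
  <tr><td>A Storm of Swords</td><td>992</td><td>George R.R. Martin</td></tr>
  <tr><td>Automated Data Collection with R</td><td>480</td><td>Simon Munzert</td></tr>
</table></body></html>'

doc <- htmlParse(html_text, asText = TRUE)

# readHTMLTable returns a list with one data frame per <table> element.
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# Taking the first element directly keeps the header names unprefixed.
hdemo <- tables[[1]]
hdemo
```

With the real file, the equivalent change would be to index the result of readHTMLTable with `[[1]]` rather than passing the whole list to data.frame().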
The HTML and JSON data frames match apart from their column names, while the XML data frame differs in its author column. As I mentioned, I easily could have left each author as a child and had all of the tables match.