Week 7

First, we load our required packages and data.

library(jsonlite)

## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:utils':
## 
##     View

library(XML)
library(RCurl)

## Loading required package: bitops

xbooks <- "https://raw.githubusercontent.com/bkreis84/Kreis-Week-7/master/books.xml"
jbooks <- "https://raw.githubusercontent.com/bkreis84/Kreis-Week-7/master/books.json"
hbooks <- "https://raw.githubusercontent.com/bkreis84/Kreis-Week-7/master/books.html"

I used the getURL function to make the HTTPS request. The xmlToDataFrame function then converts the XML file into an R data frame. The result is sloppy: it collapses all of the authors into a single column because it doesn’t recognize the “subchildren” nested under each author element. I could have entered each author as a direct child of the book, in which case all 3 tables would have been identical. I structured it this way just to see what the result would look like.

xbooks <- getURL(xbooks)
xframe <- xmlToDataFrame(xbooks)
xframe

##                                  title year_published pages       genre
## 1                    A Storm of Swords           2000   992     fantasy
## 2     Automated Data Collection with R           2015   480 educational
## 3 A Short History of Nearly Everything           2003   544 non-fiction
##                                                      author
## 1                                  George R.R. MartinNANANA
## 2 Simon MunzertChristian Rubba Peter MeissnerDominic Nyhuis
## 3                                         Bill BrysonNANANA
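One way to keep the nested authors separate is to walk the XML tree by hand instead of relying on xmlToDataFrame. The following is only a minimal sketch: the inline XML, its tag names (a1, a2) and the AuthorN column names are hypothetical stand-ins, and the real books.xml may be laid out differently.

```r
library(XML)

# Hypothetical XML that mimics authors nested as "subchildren";
# the tag names here are assumptions, not the layout of the real books.xml.
xml_text <- '
<books>
  <book>
    <title>A Storm of Swords</title>
    <author><a1>George R.R. Martin</a1></author>
  </book>
  <book>
    <title>Automated Data Collection with R</title>
    <author><a1>Simon Munzert</a1><a2>Christian Rubba</a2></author>
  </book>
</books>'

doc   <- xmlParse(xml_text, asText = TRUE)
books <- getNodeSet(doc, "//book")

# One character vector of author names per book
authors <- lapply(books, function(b) {
  as.character(unlist(xpathSApply(b, "./author/*", xmlValue)))
})

# Pad the shorter vectors with NA so every book has the same number of columns
n_max <- max(lengths(authors))
author_mat <- do.call(rbind, lapply(authors, function(a) {
  c(a, rep(NA_character_, n_max - length(a)))
}))
colnames(author_mat) <- paste0("Author", seq_len(n_max))

titles <- as.character(xpathSApply(doc, "//book/title", xmlValue))
xframe_wide <- data.frame(Title = titles, author_mat, stringsAsFactors = FALSE)
xframe_wide
```

Padding with NA gives one author per column, which is the same wide shape the JSON and HTML versions produce.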

JSON was very straightforward: a single function, fromJSON, reads the file directly from the URL and returns an R data frame.

jframe <- fromJSON(jbooks)
jframe

##                                  Title Year Published Pages       Genre
## 1                    A Storm of Swords           2000   992     fantasy
## 2     Automated Data Collection with R           2015   480 educational
## 3 A Short History of Nearly Everything           2003   544 non-fiction
##               Author         Author2        Author3        Author4
## 1 George R.R. Martin              NA             NA             NA
## 2      Simon Munzert Christian Rubba Peter Meissner Dominic Nyhuis
## 3        Bill Bryson              NA             NA             NA
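To see why this works in one call, here is a small sketch using an inline JSON string rather than the real books.json (the records below are an assumed, cut-down version of the data). When the JSON is an array of objects, fromJSON simplifies it to a data frame and fills any field missing from a record with NA.

```r
library(jsonlite)

# Assumed, cut-down stand-in for books.json: an array of records,
# where only the second record has a second author.
json_text <- '[
  {"Title": "A Storm of Swords", "Author": "George R.R. Martin"},
  {"Title": "Automated Data Collection with R",
   "Author": "Simon Munzert", "Author2": "Christian Rubba"}
]'

books_demo <- fromJSON(json_text)

# Inspect the structure: one row per record, one column per field
str(books_demo)
```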

Again we used the getURL function, followed by a relatively simple function, readHTMLTable, to read the HTML information and convert it into a data frame.

hbooks <- getURL(hbooks)
hframe <- data.frame(readHTMLTable(hbooks))
hframe

##                             NULL.Title NULL.Year.Published NULL.Pages
## 1                    A Storm of Swords                2000        992
## 2     Automated Data Collection with R                2015        480
## 3 A Short History of Nearly Everything                2003        544
##    NULL.Genre        NULL.Author   NULL.Author.2  NULL.Author.3
## 1     fantasy George R.R. Martin            <NA>           <NA>
## 2 educational      Simon Munzert Christian Rubba Peter Meissner
## 3 non-fiction        Bill Bryson            <NA>           <NA>
##    NULL.Author.4
## 1           <NA>
## 2 Dominic Nyhuis
## 3           <NA>
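The NULL. prefix on the column names appears because readHTMLTable returns a list of tables, and a table with no id attribute is named NULL in that list. Wrapping the list in data.frame() then prepends the list name to every column. A minimal sketch of one way around this, using a made-up inline table rather than the real books.html, is to pull the first element out of the list:

```r
library(XML)

# Made-up HTML table standing in for books.html (an assumption for illustration)
html_text <- '
<html><body>
<table>
  <tr><th>Title</th><th>Author</th></tr>
  <tr><td>A Storm of Swords</td><td>George R.R. Martin</td></tr>
  <tr><td>A Short History of Nearly Everything</td><td>Bill Bryson</td></tr>
</table>
</body></html>'

doc    <- htmlParse(html_text, asText = TRUE)
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Take the first table directly instead of calling data.frame() on the list
hframe_clean <- tables[[1]]
names(hframe_clean)
```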

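The HTML and JSON data frames have the same structure, with one column per author. The XML data frame differs only because of how I nested the authors; as I mentioned, I easily could have left each author as a child and had all of the tables match.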
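A quick way to check whether two of the tables really match is to normalize the column names and types and then compare. The sketch below uses two small hand-built data frames (hypothetical stand-ins for jframe and hframe, not the real results) and only base R.

```r
# Hypothetical stand-ins: one frame with clean names and integer pages,
# one with the NULL. prefix and everything read in as text.
jframe_demo <- data.frame(Title = c("A", "B"),
                          Pages = c(992L, 480L),
                          stringsAsFactors = FALSE)
hframe_demo <- data.frame(NULL.Title = c("A", "B"),
                          NULL.Pages = c("992", "480"),
                          stringsAsFactors = FALSE)

# Strip the prefix, then let R re-guess each column's type
names(hframe_demo) <- sub("^NULL\\.", "", names(hframe_demo))
hframe_demo[] <- lapply(hframe_demo, type.convert, as.is = TRUE)

# TRUE when the two frames agree in names, types and values
frames_match <- isTRUE(all.equal(jframe_demo, hframe_demo))
frames_match
```

With the real data, column names that contain spaces or dots (for example Year Published versus Year.Published) would need the same kind of normalization before comparing.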

Brian Kreis

October 16, 2015