For our week 7 assignment, we were asked to choose three books on a topic that interests us. The first thing that came to mind was the Dr. Seuss books I read as a child, so I’m going to use those. My favorites were:

Title: “Green Eggs and Ham”
Authors: Dr. Seuss and James Stevenson (illustrations)
Pages: 72

Title: “One fish two fish red fish blue fish”
Authors: Dr. Seuss and James Stevenson (illustrations)
Pages: 72

Title: “I Am NOT Going to Get up Today!”
Authors: Dr. Seuss and James Stevenson (illustrations)
Pages: 48

They are all by the same author, but part of the assignment requested that we choose something with multiple authors, so I’ll be including the illustrator James Stevenson as well.

#install.packages('XML', dependencies = TRUE)
#install.packages('jsonlite')
library(XML)
library(RCurl)
## Loading required package: bitops
library(bitops)
library(jsonlite)

XML

https://raw.githubusercontent.com/excelsiordata/DATA607/master/Week%207%20Table.xml

http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.xml

seussXML = "http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.xml"
seussXMLParsed <- xmlParse(seussXML)
seussXMLRoot <- xmlRoot(seussXMLParsed)
seuss.XML.df <- xmlToDataFrame(seussXMLRoot)
seuss.XML.df
##                                  Title    Author     Illustrator Pages
## 1                   Green Eggs and Ham Dr. Seuss James Stevenson    72
## 2 One fish two fish red fish blue fish Dr. Seuss James Stevenson    72
## 3       I Am NOT Going to Get up Today Dr. Seuss James Stevenson    48
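The same steps can be tried without a network connection. This is a minimal sketch, assuming the XML file has one element per book with `Title`, `Author`, `Illustrator`, and `Pages` children; the element names `booklist` and `book` and the inline string `seussXMLText` are my stand-ins for the hosted file, not taken from it. Passing `stringsAsFactors = FALSE` should keep the columns as character values rather than factors.

```r
library(XML)

# Hypothetical inline copy of the book table; the element names
# <booklist> and <book> are assumptions, not read from the real file.
seussXMLText <- paste0(
  "<booklist>",
  "<book><Title>Green Eggs and Ham</Title><Author>Dr. Seuss</Author>",
  "<Illustrator>James Stevenson</Illustrator><Pages>72</Pages></book>",
  "<book><Title>One fish two fish red fish blue fish</Title><Author>Dr. Seuss</Author>",
  "<Illustrator>James Stevenson</Illustrator><Pages>72</Pages></book>",
  "<book><Title>I Am NOT Going to Get up Today</Title><Author>Dr. Seuss</Author>",
  "<Illustrator>James Stevenson</Illustrator><Pages>48</Pages></book>",
  "</booklist>"
)

# asText = TRUE tells xmlParse the argument is XML content, not a file name or URL
seussXMLDoc <- xmlParse(seussXMLText, asText = TRUE)

# One row per <book>, one column per child element
seuss.XML.sketch <- xmlToDataFrame(seussXMLDoc, stringsAsFactors = FALSE)
seuss.XML.sketch
```

Working from an inline string also makes the example reproducible if the hosted file moves.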

HTML

https://raw.githubusercontent.com/excelsiordata/DATA607/master/Week%207%20Table.html

http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.html

seussHTML = xmlParse("http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.html", isHTML = TRUE)

#Create the data frame from the HTML data
seuss.HTML.df <- as.data.frame(readHTMLTable(seussHTML))
seuss.HTML.df
##                             NULL.Title NULL.Author NULL.Illustrator
## 1                   Green Eggs and Ham   Dr. Seuss  James Stevenson
## 2 One fish two fish red fish blue fish   Dr. Seuss  James Stevenson
## 3      I Am NOT Going to Get Up Today!   Dr. Seuss  James Stevenson
##   NULL.Pages
## 1         72
## 2         72
## 3         78
#Rename the columns
names(seuss.HTML.df)[names(seuss.HTML.df)=="NULL.Title"] <- "Title"
names(seuss.HTML.df)[names(seuss.HTML.df)=="NULL.Author"] <- "Author"
names(seuss.HTML.df)[names(seuss.HTML.df)=="NULL.Illustrator"] <- "Illustrator"
names(seuss.HTML.df)[names(seuss.HTML.df)=="NULL.Pages"] <- "Pages"
seuss.HTML.df
##                                  Title    Author     Illustrator Pages
## 1                   Green Eggs and Ham Dr. Seuss James Stevenson    72
## 2 One fish two fish red fish blue fish Dr. Seuss James Stevenson    72
## 3      I Am NOT Going to Get Up Today! Dr. Seuss James Stevenson    78
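The `NULL.` prefix appears because `readHTMLTable()` returns a list of tables when given a whole document, and the single table in that list has no name, so `as.data.frame()` builds the column names from the missing name. A possible way to avoid the renaming step is to ask for the first table directly with `which = 1`. This is a sketch only: the inline string `seussHTMLText` is a hypothetical stand-in for the hosted file, and I am assuming the real table uses a header row of `th` cells.

```r
library(XML)

# Hypothetical inline copy of the HTML table (the markup layout is an assumption)
seussHTMLText <- paste0(
  "<html><body><table>",
  "<tr><th>Title</th><th>Author</th><th>Illustrator</th><th>Pages</th></tr>",
  "<tr><td>Green Eggs and Ham</td><td>Dr. Seuss</td>",
  "<td>James Stevenson</td><td>72</td></tr>",
  "<tr><td>One fish two fish red fish blue fish</td><td>Dr. Seuss</td>",
  "<td>James Stevenson</td><td>72</td></tr>",
  "<tr><td>I Am NOT Going to Get Up Today!</td><td>Dr. Seuss</td>",
  "<td>James Stevenson</td><td>78</td></tr>",
  "</table></body></html>"
)

# Parse the string as HTML content rather than as a file name
seussHTMLDoc <- htmlParse(seussHTMLText, asText = TRUE)

# which = 1 returns the first table itself instead of a list of tables,
# so the column names come straight from the header row
seuss.HTML.sketch <- readHTMLTable(seussHTMLDoc, which = 1,
                                   stringsAsFactors = FALSE)
names(seuss.HTML.sketch)
```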

JSON

https://raw.githubusercontent.com/excelsiordata/DATA607/master/Week%207%20Table.json

https://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.json

seussJSON <- fromJSON("http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.json")
seuss.JSON.df <- as.data.frame(seussJSON)
seuss.JSON.df
##                         booklist.Title booklist.Author
## 1                   Green Eggs and Ham       Dr. Seuss
## 2 One fish two fish red fish blue fish       Dr. Seuss
## 3       I Am NOT Going to Get up Today       Dr. Seuss
##   booklist.Illustrator booklist.Pages
## 1      James Stevenson             72
## 2      James Stevenson             72
## 3      James Stevenson             48
#Rename the columns
names(seuss.JSON.df)[names(seuss.JSON.df)=="booklist.Title"] <- "Title"
names(seuss.JSON.df)[names(seuss.JSON.df)=="booklist.Author"] <- "Author"
names(seuss.JSON.df)[names(seuss.JSON.df)=="booklist.Illustrator"] <- "Illustrator"
names(seuss.JSON.df)[names(seuss.JSON.df)=="booklist.Pages"] <- "Pages"
seuss.JSON.df
##                                  Title    Author     Illustrator Pages
## 1                   Green Eggs and Ham Dr. Seuss James Stevenson    72
## 2 One fish two fish red fish blue fish Dr. Seuss James Stevenson    72
## 3       I Am NOT Going to Get up Today Dr. Seuss James Stevenson    48
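The `booklist.` prefix comes from calling `as.data.frame()` on the whole parsed list, whose only element is named `booklist`. Pulling that element out directly is one way to skip the renaming. The sketch below assumes the JSON file is an object with a `booklist` array of records (inferred from the prefix in the output above) and that `Pages` is stored as a string; `seussJSONText` is a hypothetical inline stand-in for the hosted file.

```r
library(jsonlite)

# Hypothetical inline copy of the JSON file; the layout is inferred, not confirmed
seussJSONText <- '{"booklist": [
  {"Title": "Green Eggs and Ham", "Author": "Dr. Seuss",
   "Illustrator": "James Stevenson", "Pages": "72"},
  {"Title": "One fish two fish red fish blue fish", "Author": "Dr. Seuss",
   "Illustrator": "James Stevenson", "Pages": "72"},
  {"Title": "I Am NOT Going to Get up Today", "Author": "Dr. Seuss",
   "Illustrator": "James Stevenson", "Pages": "48"}
]}'

# fromJSON simplifies an array of records into a data frame by default
seussJSONList <- fromJSON(seussJSONText)

# Take the data frame stored under "booklist" instead of flattening the list
seuss.JSON.sketch <- seussJSONList$booklist
names(seuss.JSON.sketch)
```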

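Now that I’ve updated the column names, the three data frames have the same structure, and the main difference between them is the data type. The JSON data frame is all character values, while the XML and HTML data frames are factors. The output above also shows that the HTML file differs slightly in content: its third title reads “I Am NOT Going to Get Up Today!” and its page count is 78 rather than 48.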
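To check the column types and line them up, something like the following could be used. It is a minimal sketch on two small hand-built data frames, `df.factor` and `df.char`, which are hypothetical stand-ins for the parsed results rather than the real objects.

```r
# Hypothetical stand-ins: the same data stored as factors and as characters
df.factor <- data.frame(
  Title = c("Green Eggs and Ham", "One fish two fish red fish blue fish"),
  Pages = c("72", "72"),
  stringsAsFactors = TRUE
)
df.char <- data.frame(
  Title = c("Green Eggs and Ham", "One fish two fish red fish blue fish"),
  Pages = c("72", "72"),
  stringsAsFactors = FALSE
)

# Inspect the class of every column in each data frame
sapply(df.factor, class)
sapply(df.char, class)

# Convert only the factor columns to character, leaving other columns alone
isFactor <- sapply(df.factor, is.factor)
df.factor[isFactor] <- lapply(df.factor[isFactor], as.character)

# After the conversion the two data frames should compare as equal
all.equal(df.factor, df.char)
```

Converting factors to character (rather than the reverse) avoids having to reconcile factor levels between the data frames.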