For our week 7 assignment, we were challenged to choose three books on a topic that interests us. The first thing that came to mind was the Dr. Seuss books I read as a child, so we’re going to use those. My favorites were:
Title: “Green Eggs and Ham”
Authors: Dr. Seuss and James Stevenson (illustrations)
Pages: 72
Title: “One fish two fish red fish blue fish”
Authors: Dr. Seuss and James Stevenson (illustrations)
Pages: 72
Title: “I Am NOT Going to Get up Today!”
Authors: Dr. Seuss and James Stevenson (illustrations)
Pages: 48
They are all by the same author, but part of the assignment requested that we choose something with multiple authors, so I’ll be including the illustrator James Stevenson as well.
#install.packages('XML', dependencies = TRUE)
#install.packages('jsonlite')
library(XML)
library(RCurl)
## Loading required package: bitops
library(bitops)
library(jsonlite)
https://raw.githubusercontent.com/excelsiordata/DATA607/master/Week%207%20Table.xml
http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.xml
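#URL of the XML version of the book table
seussXML <- "http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.xml"
#Parse the document and get its root node
seussXMLParsed <- xmlParse(seussXML)
seussXMLRoot <- xmlRoot(seussXMLParsed)
#Create the data frame from the XML data (one row per child of the root)
seuss.XML.df <- xmlToDataFrame(seussXMLRoot)
seuss.XML.df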
##                                  Title    Author     Illustrator Pages
## 1                   Green Eggs and Ham Dr. Seuss James Stevenson    72
## 2 One fish two fish red fish blue fish Dr. Seuss James Stevenson    72
## 3       I Am NOT Going to Get up Today Dr. Seuss James Stevenson    48
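For anyone who can't reach the file, here is a minimal sketch of the same XML step run on an inline string instead of the URL. The wrapper tags (booklist, book) and the layout of xml_text are my assumptions about what such a file could look like, not a copy of the real one; only the four child element names are taken from the columns printed above.

```r
library(XML)

# Hypothetical XML text: the wrapper tag names are assumptions for illustration
xml_text <- '<booklist>
  <book><Title>Green Eggs and Ham</Title><Author>Dr. Seuss</Author><Illustrator>James Stevenson</Illustrator><Pages>72</Pages></book>
  <book><Title>One fish two fish red fish blue fish</Title><Author>Dr. Seuss</Author><Illustrator>James Stevenson</Illustrator><Pages>72</Pages></book>
  <book><Title>I Am NOT Going to Get up Today</Title><Author>Dr. Seuss</Author><Illustrator>James Stevenson</Illustrator><Pages>48</Pages></book>
</booklist>'

# asText = TRUE tells xmlParse the argument is XML content, not a file name or URL
doc <- xmlParse(xml_text, asText = TRUE)

# Each child of the root becomes one row; its own children become the columns
books <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
books
```

Passing stringsAsFactors = FALSE keeps every column as character, which makes the result easier to compare with data read from other formats.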
https://raw.githubusercontent.com/excelsiordata/DATA607/master/Week%207%20Table.html
http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.html
#Parse the HTML file (isHTML = TRUE makes xmlParse use the HTML parser)
seussHTML = xmlParse("http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.html", isHTML = TRUE)
#Create the data frame from the HTML data
seuss.HTML.df <- as.data.frame(readHTMLTable(seussHTML))
seuss.HTML.df
##                             NULL.Title NULL.Author NULL.Illustrator
## 1                   Green Eggs and Ham   Dr. Seuss  James Stevenson
## 2 One fish two fish red fish blue fish   Dr. Seuss  James Stevenson
## 3      I Am NOT Going to Get Up Today!   Dr. Seuss  James Stevenson
##   NULL.Pages
## 1         72
## 2         72
## 3         78
#Rename the columns
names(seuss.HTML.df)[names(seuss.HTML.df)=="NULL.Title"] <- "Title"
names(seuss.HTML.df)[names(seuss.HTML.df)=="NULL.Author"] <- "Author"
names(seuss.HTML.df)[names(seuss.HTML.df)=="NULL.Illustrator"] <- "Illustrator"
names(seuss.HTML.df)[names(seuss.HTML.df)=="NULL.Pages"] <- "Pages"
seuss.HTML.df
##                                  Title    Author     Illustrator Pages
## 1                   Green Eggs and Ham Dr. Seuss James Stevenson    72
## 2 One fish two fish red fish blue fish Dr. Seuss James Stevenson    72
## 3      I Am NOT Going to Get Up Today! Dr. Seuss James Stevenson    78
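The four rename lines above all strip the same "NULL." prefix, so the step can also be done in one pass. This is only a sketch on a hypothetical stand-in data frame (df is my own name, built by hand so it runs without the download); it uses base R's sub() with a regular expression.

```r
# Stand-in data frame with the same prefixed column names as the HTML result
df <- data.frame(
  NULL.Title = c("Green Eggs and Ham", "One fish two fish red fish blue fish"),
  NULL.Author = c("Dr. Seuss", "Dr. Seuss"),
  NULL.Illustrator = c("James Stevenson", "James Stevenson"),
  NULL.Pages = c("72", "72"),
  stringsAsFactors = FALSE
)

# Remove a leading "NULL." from every column name in a single step
names(df) <- sub("^NULL\\.", "", names(df))
names(df)
```

The same pattern would work for any other prefix by changing the regular expression.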
https://raw.githubusercontent.com/excelsiordata/DATA607/master/Week%207%20Table.json
https://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.json
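#Read the JSON file; the result is a list whose booklist element holds the table
seussJSON <- fromJSON("http://cdn.rawgit.com/excelsiordata/DATA607/master/Week%207%20Table.json")
#Create the data frame from the JSON data (column names get a booklist. prefix)
seuss.JSON.df <- as.data.frame(seussJSON)
seuss.JSON.df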
##                         booklist.Title booklist.Author
## 1                   Green Eggs and Ham       Dr. Seuss
## 2 One fish two fish red fish blue fish       Dr. Seuss
## 3       I Am NOT Going to Get up Today       Dr. Seuss
##   booklist.Illustrator booklist.Pages
## 1      James Stevenson             72
## 2      James Stevenson             72
## 3      James Stevenson             48
#Rename the columns
names(seuss.JSON.df)[names(seuss.JSON.df)=="booklist.Title"] <- "Title"
names(seuss.JSON.df)[names(seuss.JSON.df)=="booklist.Author"] <- "Author"
names(seuss.JSON.df)[names(seuss.JSON.df)=="booklist.Illustrator"] <- "Illustrator"
names(seuss.JSON.df)[names(seuss.JSON.df)=="booklist.Pages"] <- "Pages"
seuss.JSON.df
##                                  Title    Author     Illustrator Pages
## 1                   Green Eggs and Ham Dr. Seuss James Stevenson    72
## 2 One fish two fish red fish blue fish Dr. Seuss James Stevenson    72
## 3       I Am NOT Going to Get up Today Dr. Seuss James Stevenson    48
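The booklist. prefix appears because as.data.frame() is applied to the whole parsed list rather than to the table inside it. Here is a small sketch of that behaviour on an inline JSON string. The layout of json_text (a top-level "booklist" array with quoted page counts) is my assumption, inferred from the column names in the output, and may not match the real file exactly.

```r
library(jsonlite)

# Hypothetical JSON text: the structure is inferred from the column names above
json_text <- '{"booklist": [
  {"Title": "Green Eggs and Ham", "Author": "Dr. Seuss", "Illustrator": "James Stevenson", "Pages": "72"},
  {"Title": "One fish two fish red fish blue fish", "Author": "Dr. Seuss", "Illustrator": "James Stevenson", "Pages": "72"},
  {"Title": "I Am NOT Going to Get up Today", "Author": "Dr. Seuss", "Illustrator": "James Stevenson", "Pages": "48"}
]}'

parsed <- fromJSON(json_text)

# Converting the whole list prefixes every column with the element name
prefixed <- as.data.frame(parsed)
names(prefixed)

# Pulling out the element directly keeps the plain column names
books <- parsed$booklist
names(books)
```

With the second approach no renaming step would be needed, at the cost of having to know the element name in advance.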
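Now that I’ve updated the column names, the three data frames have the same structure, and the main difference between them is the data type: the JSON data frame holds character values, while the XML and HTML data frames hold factors. The HTML file also differs slightly in content: it lists the third title as “I Am NOT Going to Get Up Today!” and gives 78 pages, where the other two sources give 48.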
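One way to check the data types is to look at the class of every column and then convert factors to character before comparing. The sketch below uses two small hand-built data frames (as_factors and as_chars are hypothetical stand-ins, not the data frames loaded above) so that it runs on its own.

```r
# Stand-ins: the same values stored once as factors and once as character
as_factors <- data.frame(
  Title = c("Green Eggs and Ham", "One fish two fish red fish blue fish"),
  Pages = c("72", "72"),
  stringsAsFactors = TRUE
)
as_chars <- data.frame(
  Title = c("Green Eggs and Ham", "One fish two fish red fish blue fish"),
  Pages = c("72", "72"),
  stringsAsFactors = FALSE
)

# Class of each column before any conversion
classes_factors <- sapply(as_factors, class)
classes_chars <- sapply(as_chars, class)

# Convert every factor column to character, keeping the data frame structure
as_factors[] <- lapply(as_factors, as.character)

# After conversion the two data frames should compare as equal
same <- isTRUE(all.equal(as_factors, as_chars))
same
```

The same check could be run on the three real data frames to confirm that only the storage type differs between the XML and JSON versions.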