Goal of Assignment:
First, create three files (XML, HTML, and JSON) containing the same data: your favorite books. For each book, include the title, authors, and two or three other attributes that you find interesting.
Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?
# Load required packages
library(rjson)
library(RCurl)
## Loading required package: bitops
library(XML)
## Warning: package 'XML' was built under R version 3.4.3
library(stringr)
# Read the XML file from GitHub (getURL() returns the file's contents as a string)
xml.url <- getURL("https://raw.githubusercontent.com/rickidonsingh/Data607/master/books.xml")
file.xml <- xmlParse(xml.url, asText = TRUE)
# Next, convert the parsed XML document into a data frame
df.xml <- xmlToDataFrame(file.xml)
# Let's see what it looks like
df.xml
## Title Author
## 1 The Catcher in the Rye JD Salinger
## 2 Moby Dick Herman Melville
## 3 The Grapes of Wrath John Steinbeck
## Brief ISBN
## 1 Salinger’s sort of autobiographical account 7543321726
## 2 Captain Ahab on the hunt for the monstrous white whale 1503280780
## 3 Poor family driven from their land in the Great Depression 0143039431
## Pages
## 1 240
## 2 378
## 3 464
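The str() output further down shows that xmlToDataFrame() stored every column as a factor, including Pages. A minimal sketch of a cleaner load follows. It assumes books.xml has a root element wrapping `<book>` records with Title, Author, Brief, ISBN and Pages children (inferred from the column names, not confirmed), and it uses an inline string holding two of the three books in place of the downloaded file so it runs offline. ISBN is deliberately left as character so the leading zero in 0143039431 is not lost.

```r
library(XML)

# Hypothetical inline copy of books.xml (two books); the real file's
# root and element names may differ. No whitespace between tags, so the
# parser produces no blank text nodes that could be mistaken for records.
xml.text <- paste0(
  "<books>",
  "<book><Title>Moby Dick</Title><Author>Herman Melville</Author>",
  "<Brief>Captain Ahab on the hunt for the monstrous white whale</Brief>",
  "<ISBN>1503280780</ISBN><Pages>378</Pages></book>",
  "<book><Title>The Grapes of Wrath</Title><Author>John Steinbeck</Author>",
  "<Brief>Poor family driven from their land in the Great Depression</Brief>",
  "<ISBN>0143039431</ISBN><Pages>464</Pages></book>",
  "</books>")

doc <- xmlParse(xml.text, asText = TRUE)

# Keep text as character rather than factor
df.xml2 <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Pages is a count, so store it as integer; ISBN stays character
df.xml2$Pages <- as.integer(as.character(df.xml2$Pages))
str(df.xml2)
```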
# Read the HTML file from GitHub
html.url <- getURL("https://raw.githubusercontent.com/rickidonsingh/Data607/master/books.html")
# readHTMLTable() returns a list with one element per <table>; the unnamed table appears as $`NULL`
df.html <- readHTMLTable(html.url, header = TRUE, as.data.frame = TRUE)
# Let's see what it looks like
df.html
## $`NULL`
## Title Author
## 1 The Catcher in the Rye JD Salinger
## 2 Moby Dick Herman Melville
## 3 The Grapes of Wrath John Steinbeck
## Brief ISBN
## 1 Salingerâ\u0080\u0099s sort of autobiographical account 7543321726
## 2 Captain Ahab on the hunt for the monstrous white whale 1503280780
## 3 Poor family driven from their land in the Great Depression 0143039431
## Pages
## 1 240
## 2 378
## 3 464
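Two things stand out in this output: the result is a list rather than a data frame, and the curly apostrophe in the first Brief comes back as "Salingerâ\u0080\u0099s". That pattern looks like UTF-8 bytes decoded as Latin-1, which is a plausible cause but not a confirmed one. A minimal sketch, assuming the table's first row holds the column names and using an inline one-book HTML string instead of the downloaded file, parses with an explicit encoding and pulls the single table out of the list.

```r
library(XML)

# Hypothetical inline copy of books.html (one row); the real markup may differ
html.text <- paste0(
  "<html><body><table>",
  "<tr><th>Title</th><th>Author</th><th>Brief</th><th>ISBN</th><th>Pages</th></tr>",
  "<tr><td>The Catcher in the Rye</td><td>JD Salinger</td>",
  "<td>Salinger\u2019s sort of autobiographical account</td>",
  "<td>7543321726</td><td>240</td></tr>",
  "</table></body></html>")

# Declare the encoding instead of letting the HTML parser guess it
doc <- htmlParse(html.text, asText = TRUE, encoding = "UTF-8")

# readHTMLTable() returns a list of tables; [[1]] extracts the only one
df.html2 <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)[[1]]

# as.character() first, so this also works if the column came back as a factor
df.html2$Pages <- as.integer(as.character(df.html2$Pages))
str(df.html2)
```

Taking `[[1]]` is only safe when the page contains exactly one table; with several tables, selecting by index or by an `id` attribute would be more robust.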
# Read the JSON file from GitHub
json.url <- getURL("https://raw.githubusercontent.com/rickidonsingh/Data607/master/books.json")
# Parse the JSON text with rjson, then coerce the nested list to a data frame
data.json <- fromJSON(json_str = json.url)
df.json <- as.data.frame(data.json)
# Let's see what it looks like
df.json
## book_table.book.Title book_table.book.Author
## 1 The Catcher in the Rye JD Salinger
## book_table.book.Brief book_table.book.ISBN
## 1 Salinger’s sort of autobiographical account 7543321726
## book_table.book.Pages book_table.book.Title.1 book_table.book.Author.1
## 1 240 Moby Dick Herman Melville
## book_table.book.Brief.1
## 1 Captain Ahab on the hunt for the monstrous white whale
## book_table.book.ISBN.1 book_table.book.Pages.1 book_table.book.Title.2
## 1 1503280780 378 The Grapes of Wrath
## book_table.book.Author.2
## 1 John Steinbeck
## book_table.book.Brief.2
## 1 Poor family driven from their land in the Great Depression
## book_table.book.ISBN.2 book_table.book.Pages.2
## 1 0143039431 464
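Calling as.data.frame() on the whole nested list flattens all three books into a single row of 15 columns (book_table.book.Title, book_table.book.Title.1, and so on). A minimal sketch of the alternative builds one row per book and stacks them. It assumes the file has the shape `{"book_table": {"book": [ {...}, ... ]}}`, inferred from the flattened column names, with Pages and ISBN stored as strings (consistent with their factor type above). An inline string with two books stands in for the download.

```r
library(rjson)

# Hypothetical inline copy of books.json; structure inferred from the
# flattened column names (book_table -> book -> array of records)
json.text <- '{"book_table": {"book": [
  {"Title": "Moby Dick", "Author": "Herman Melville",
   "Brief": "Captain Ahab on the hunt for the monstrous white whale",
   "ISBN": "1503280780", "Pages": "378"},
  {"Title": "The Grapes of Wrath", "Author": "John Steinbeck",
   "Brief": "Poor family driven from their land in the Great Depression",
   "ISBN": "0143039431", "Pages": "464"}
]}}'

data.json2 <- fromJSON(json_str = json.text)

# Each book is a named list: turn it into a one-row data frame, then stack rows
books <- data.json2$book_table$book
df.json2 <- do.call(rbind, lapply(books, as.data.frame, stringsAsFactors = FALSE))

# Pages as integer; ISBN stays character to keep the leading zero
df.json2$Pages <- as.integer(df.json2$Pages)
str(df.json2)
```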
# Compare the structure of the three objects
str(df.xml)
## 'data.frame': 3 obs. of 5 variables:
## $ Title : Factor w/ 3 levels "Moby Dick","The Catcher in the Rye",..: 2 1 3
## $ Author: Factor w/ 3 levels "Herman Melville",..: 2 1 3
## $ Brief : Factor w/ 3 levels "Captain Ahab on the hunt for the monstrous white whale",..: 3 1 2
## $ ISBN : Factor w/ 3 levels "0143039431","1503280780",..: 3 2 1
## $ Pages : Factor w/ 3 levels "240","378","464": 1 2 3
str(df.html)
## List of 1
## $ NULL:'data.frame': 3 obs. of 5 variables:
## ..$ Title : Factor w/ 3 levels "Moby Dick","The Catcher in the Rye",..: 2 1 3
## ..$ Author: Factor w/ 3 levels "Herman Melville",..: 2 1 3
## ..$ Brief : Factor w/ 3 levels "Captain Ahab on the hunt for the monstrous white whale",..: 3 1 2
## ..$ ISBN : Factor w/ 3 levels "0143039431","1503280780",..: 3 2 1
## ..$ Pages : Factor w/ 3 levels "240","378","464": 1 2 3
str(df.json)
## 'data.frame': 1 obs. of 15 variables:
## $ book_table.book.Title : Factor w/ 1 level "The Catcher in the Rye": 1
## $ book_table.book.Author : Factor w/ 1 level "JD Salinger": 1
## $ book_table.book.Brief : Factor w/ 1 level "Salinger’s sort of autobiographical account": 1
## $ book_table.book.ISBN : Factor w/ 1 level "7543321726": 1
## $ book_table.book.Pages : Factor w/ 1 level "240": 1
## $ book_table.book.Title.1 : Factor w/ 1 level "Moby Dick": 1
## $ book_table.book.Author.1: Factor w/ 1 level "Herman Melville": 1
## $ book_table.book.Brief.1 : Factor w/ 1 level "Captain Ahab on the hunt for the monstrous white whale": 1
## $ book_table.book.ISBN.1 : Factor w/ 1 level "1503280780": 1
## $ book_table.book.Pages.1 : Factor w/ 1 level "378": 1
## $ book_table.book.Title.2 : Factor w/ 1 level "The Grapes of Wrath": 1
## $ book_table.book.Author.2: Factor w/ 1 level "John Steinbeck": 1
## $ book_table.book.Brief.2 : Factor w/ 1 level "Poor family driven from their land in the Great Depression": 1
## $ book_table.book.ISBN.2 : Factor w/ 1 level "0143039431": 1
## $ book_table.book.Pages.2 : Factor w/ 1 level "464": 1
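The str() output answers the assignment's question: as loaded, the three objects are not identical. df.xml is a 3 × 5 data frame, df.html is a list holding a 3 × 5 data frame, and df.json is a 1 × 15 data frame. Even after reshaping, the HTML copy's first Brief would likely still differ because of the mis-decoded apostrophe. The sketch below shows one way to test equality once the shapes match. It uses a hypothetical helper, normalize_books(), and small hand-built frames rather than the real objects.

```r
# Hypothetical helper: put a books data frame into a common shape so that
# identical() compares content rather than storage details
normalize_books <- function(df) {
  # Factors become their labels; every column becomes character
  df <- as.data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
  df$Pages <- as.integer(df$Pages)
  rownames(df) <- NULL
  df
}

# Two small frames holding the same content, one stored as factors
a <- data.frame(Title = "Moby Dick", ISBN = "1503280780", Pages = "378",
                stringsAsFactors = TRUE)
b <- data.frame(Title = "Moby Dick", ISBN = "1503280780", Pages = 378L,
                stringsAsFactors = FALSE)

identical(a, b)                                    # FALSE: column types differ
identical(normalize_books(a), normalize_books(b))  # TRUE after normalizing
all.equal(a, b)                                    # lists the mismatches
```

Applied to the real objects, this would mean comparing `normalize_books(df.xml)` with `normalize_books(df.html[[1]])`, and comparing either of those with a row-per-book version of the JSON data.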