Assignment
Create three files (HTML, XML, and JSON) that each contain information on three books. Load the files into R and compare them to see whether there are any differences.
Load Libraries
library(knitr)
library(RCurl)
library(XML)
library(jsonlite)
library(plyr)
Load HTML File
htmlurl <- getURL("https://raw.githubusercontent.com/dquarshie89/Data607/master/books.html")
html <- readHTMLTable(htmlurl, header=TRUE, which=1)
knitr::kable(html)
| Title | Author | Publisher | Pages | Rating |
|---|---|---|---|---|
| R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics | Paul Teetor | O’Reilly Media | 438 | 4.5 |
| R for Data Science: Import, Tidy, Transform, Visualize, and Model Data | Hadley Wickham | O’Reilly Media | 522 | 5 |
| R Graphics Cookbook: Practical Recipes for Visualizing Data | Winston Chang | O’Reilly Media | 416 | 4.5 |
Load XML File
xmlurl <- getURL("https://raw.githubusercontent.com/dquarshie89/Data607/master/books.xml")
xml <- xmlToDataFrame(xmlurl)
knitr::kable(xml)
| Title | Author | Publisher | Pages | Rating |
|---|---|---|---|---|
| R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics | Paul Teetor | O’Reilly Media | 438 | 4.5 |
| R for Data Science: Import, Tidy, Transform, Visualize, and Model Data | Hadley Wickham | O’Reilly Media | 522 | 5 |
| R Graphics Cookbook: Practical Recipes for Visualizing Data | Winston Chang | O’Reilly Media | 416 | 4.5 |
Load JSON File
jsonurl <- getURL("https://raw.githubusercontent.com/dquarshie89/Data607/master/books.json")
json <- fromJSON(jsonurl)
json <- data.frame(json)
json
## books.table.book.Title
## 1 R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics
## 2 R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
## 3 R Graphics Cookbook: Practical Recipes for Visualizing Data
## books.table.book.Author books.table.book.Publisher
## 1 Paul Teetor O'Reilly Media
## 2 Hadley Wickham O'Reilly Media
## 3 Winston Chang O'Reilly Media
## books.table.book.Pages books.table.book.Rating
## 1 438 4.5
## 2 522 5
## 3 416 4.5
# Drop the "books.table.book." prefix that data.frame() added to each column name
json <- rename(json, c("books.table.book.Title" = "Title",
                       "books.table.book.Author" = "Author",
                       "books.table.book.Publisher" = "Publisher",
                       "books.table.book.Pages" = "Pages",
                       "books.table.book.Rating" = "Rating"))
knitr::kable(json)
| Title | Author | Publisher | Pages | Rating |
|---|---|---|---|---|
| R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics | Paul Teetor | O’Reilly Media | 438 | 4.5 |
| R for Data Science: Import, Tidy, Transform, Visualize, and Model Data | Hadley Wickham | O’Reilly Media | 522 | 5 |
| R Graphics Cookbook: Practical Recipes for Visualizing Data | Winston Chang | O’Reilly Media | 416 | 4.5 |
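The `books.table.book.` prefix on the column names suggests that the JSON file nests its records under `books`, `table`, and `book`. The sketch below assumes that structure; the inline JSON is a shortened, hypothetical stand-in for books.json, not its actual contents. Under that assumption, indexing into the parsed list returns a data frame that already has clean column names, so the renaming step would not be needed:

```r
library(jsonlite)

# Hypothetical, shortened stand-in for books.json (structure is an assumption
# inferred from the "books.table.book.*" column names)
txt <- '{"books": {"table": {"book": [
  {"Title": "R Cookbook", "Pages": "438"},
  {"Title": "R for Data Science", "Pages": "522"}
]}}}'

parsed <- fromJSON(txt)

# Walk down the nesting to reach the array of book records,
# which jsonlite simplifies to a data frame
books <- parsed$books$table$book

names(books)
```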
Compare JSON, XML, and HTML
knitr::kable(json == xml)
| Title | Author | Publisher | Pages | Rating |
|---|---|---|---|---|
| TRUE | TRUE | TRUE | TRUE | TRUE |
| TRUE | TRUE | TRUE | TRUE | TRUE |
| TRUE | TRUE | TRUE | TRUE | TRUE |
knitr::kable(json == html)
| Title | Author | Publisher | Pages | Rating |
|---|---|---|---|---|
| TRUE | TRUE | TRUE | TRUE | TRUE |
| TRUE | TRUE | TRUE | TRUE | TRUE |
| TRUE | TRUE | TRUE | TRUE | TRUE |
knitr::kable(html == xml)
| Title | Author | Publisher | Pages | Rating |
|---|---|---|---|---|
| TRUE | TRUE | TRUE | TRUE | TRUE |
| TRUE | TRUE | TRUE | TRUE | TRUE |
| TRUE | TRUE | TRUE | TRUE | TRUE |
## 'data.frame': 3 obs. of 5 variables:
## $ Title : chr "R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics" "R for Data Science: Import, Tidy, Transform, Visualize, and Model Data" "R Graphics Cookbook: Practical Recipes for Visualizing Data"
## $ Author : chr "Paul Teetor" "Hadley Wickham" "Winston Chang"
## $ Publisher: chr "O'Reilly Media" "O'Reilly Media" "O'Reilly Media"
## $ Pages : chr "438" "522" "416"
## $ Rating : chr "4.5" "5" "4.5"
## 'data.frame': 3 obs. of 5 variables:
## $ Title : Factor w/ 3 levels "R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics",..: 1 2 3
## $ Author : Factor w/ 3 levels "Hadley Wickham",..: 2 1 3
## $ Publisher: Factor w/ 1 level "O'Reilly Media": 1 1 1
## $ Pages : Factor w/ 3 levels "416","438","522": 2 3 1
## $ Rating : Factor w/ 2 levels "4.5","5": 1 2 1
## 'data.frame': 3 obs. of 5 variables:
## $ Title : Factor w/ 3 levels "R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics",..: 1 2 3
## $ Author : Factor w/ 3 levels "Hadley Wickham",..: 2 1 3
## $ Publisher: Factor w/ 1 level "O'Reilly Media": 1 1 1
## $ Pages : Factor w/ 3 levels "416","438","522": 2 3 1
## $ Rating : Factor w/ 2 levels "4.5","5": 1 2 1
## Title Author Publisher Pages Rating
## "character" "character" "character" "character" "character"
## Title Author Publisher Pages Rating
## "integer" "integer" "integer" "integer" "integer"
## Title Author Publisher Pages Rating
## "integer" "integer" "integer" "integer" "integer"
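The structure and type listings above show why `==` alone does not tell the whole story: it compares values element by element and does not check how the columns are stored. A minimal sketch on two small data frames (hypothetical stand-ins built inline, not the loaded files) illustrates the difference between `==`, `identical()`, and `typeof()`:

```r
# Hypothetical stand-ins: the same values stored as character and as factor
chr_df <- data.frame(Pages = c("438", "522", "416"), stringsAsFactors = FALSE)
fct_df <- data.frame(Pages = c("438", "522", "416"), stringsAsFactors = TRUE)

# Elementwise comparison looks only at the values, so every cell matches
all(chr_df == fct_df)      # TRUE

# identical() also compares column types, so it reports a difference
identical(chr_df, fct_df)  # FALSE

# Underlying storage type of each column: factors are stored as integer codes
sapply(chr_df, typeof)
sapply(fct_df, typeof)
```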
Conclusion
All values in the three files are equal. However, inspecting the data frames shows that the JSON import stores every column as character, while the HTML and XML imports store every column as a factor.
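If matching types are wanted, one option is to convert the factor columns to character after loading. The sketch below uses a small inline data frame as a hypothetical stand-in for the HTML or XML result. It also shows a common pitfall: calling `as.integer()` directly on a factor returns the level codes (the `2 3 1` seen in the structure listing for Pages), not the page counts.

```r
# Hypothetical stand-in for a table loaded with factor columns
fct_df <- data.frame(Author = c("Paul Teetor", "Hadley Wickham", "Winston Chang"),
                     Pages  = c("438", "522", "416"),
                     stringsAsFactors = TRUE)

# Converting a factor straight to integer yields level codes, not values
as.integer(fct_df$Pages)  # 2 3 1

# Convert every factor column to character, leaving other columns untouched
fct_df[] <- lapply(fct_df, function(col) {
  if (is.factor(col)) as.character(col) else col
})

# With character data, numeric conversion gives the actual page counts
fct_df$Pages <- as.numeric(fct_df$Pages)

str(fct_df)
```

Alternatively, `readHTMLTable()` and `xmlToDataFrame()` appear to accept a `stringsAsFactors` argument in the XML package versions I am aware of; passing `stringsAsFactors = FALSE` at load time would avoid the conversion, though this should be checked against the installed version.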