This assignment reads three files from the web (GitHub): one HTML, one XML and
one JSON. The data describes three books, with attributes such as title, author,
ISBN and number of pages. Let's see how R reads in these three files.

library(jsonlite)
library(XML)
library(xml2)
library(RCurl)
library(rvest)
# Load plyr before dplyr, as plyr's own startup message recommends;
# otherwise plyr masks dplyr functions such as mutate, rename and summarise.
library(plyr)
library(dplyr)

Load the JSON, HTML and XML data

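# JSON: fromJSON() reads directly from the URL
jsonURL <- "https://raw.githubusercontent.com/jonathan1987/CUNYSPS_IS607/master/Week7/books.json"
books_json <- fromJSON(jsonURL)

# HTML: parse the page, then extract its tables
# (html_table() on a whole document returns a list, one element per <table>)
htmlURL <- "https://raw.githubusercontent.com/jonathan1987/CUNYSPS_IS607/master/Week7/books.html"
books_HTML <- read_html(htmlURL)
books_HTML <- html_table(books_HTML)

# XML: download the text with getURL(), parse it and keep the root node
xmlURL <- "https://raw.githubusercontent.com/jonathan1987/CUNYSPS_IS607/master/Week7/Books.xml"
books_XML <- xmlRoot(xmlParse(getURL(xmlURL))) # get XML file contents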

# make into a dataset with ldply
books_XML_df <- ldply(xmlToList(books_XML), data.frame)

# remove the .id column
books_XML_df <- books_XML_df %>% select(-.id)
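
To see what the ldply() step does without hitting the network, here is a small sketch on an inline XML string. The sample data and the tag names (books, book, title, author) are made up for illustration and are only an assumption about how Books.xml might be laid out.

```r
library(XML)
library(plyr)

# Hypothetical sample, NOT the contents of the real Books.xml.
# Built without whitespace between tags to avoid blank text nodes.
xml_text <- paste0(
  "<books>",
  "<book><title>Book A</title><author>Author One</author><author>Author Two</author></book>",
  "<book><title>Book B</title><author>Author Three</author></book>",
  "</books>"
)

# Parse the string (asText = TRUE means "this is XML, not a file name")
root <- xmlRoot(xmlParse(xml_text, asText = TRUE))

# One list element per <book>; repeated <author> tags become repeated entries
book_list <- xmlToList(root)

# data.frame() turns each book into a one-row data frame, and ldply()
# row-binds them, filling columns a book does not have with NA
sample_df <- ldply(book_list, data.frame)

names(sample_df)
sample_df
```

Because each repeated author tag becomes its own list entry, a book with two authors ends up with two author columns, and a book with one author gets NA in the second one.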

We can see that the datasets are similar, except when it comes to fields with
multiple values (authors, in this case).

JSON takes all the authors of a book and puts them in one column.

HTML spreads the authors of a book across multiple columns.

XML, like HTML, takes multiple authors for a book and puts them in multiple
columns.
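
The JSON behaviour can be reproduced on a small inline example. This is only a sketch: the sample records and the field name "authors" are hypothetical and may not match the real books.json.

```r
library(jsonlite)

# Hypothetical sample, NOT the contents of the real books.json
json_text <- '[
  {"title": "Book A", "authors": ["Author One", "Author Two"]},
  {"title": "Book B", "authors": ["Author Three"]}
]'

sample_json <- fromJSON(json_text)

# The authors column is a list: one character vector per book
class(sample_json$authors)

# Optionally collapse each book's authors into a single string
sample_json$authors_flat <- vapply(
  sample_json$authors, paste, character(1), collapse = ", "
)

sample_json[, c("title", "authors_flat")]
```

Keeping the authors together in one column means every book has the same set of columns no matter how many authors it has, whereas the HTML and XML results need one column per author.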