Assignment 8

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.

Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats.

Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?

`Loading the required packages`

library(XML)

## Warning: package 'XML' was built under R version 3.2.3

library(jsonlite)

## 
## Attaching package: 'jsonlite'

## The following object is masked from 'package:utils':
## 
##     View

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(RCurl)

## Loading required package: bitops

HTML

html <- "file:///C:/Users/Gurpreet/Documents/IS607/ois.html"
books.html <- readHTMLTable(html)

books.html

## $`NULL`
##                                  Title
## 1 Open Intro Statistics: Third Edition
##                                                      Author(s)  Updated on
## 1 David M. Diez, Christopher D. Barr and Mine Citenkaya-Rundel Jan 13,2016
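As the `$`NULL`` label in the output shows, `readHTMLTable` returns a list with one element per table rather than a single data frame. A minimal sketch of pulling out the first table follows; the inline HTML, the placeholder book values, and the names `html_text`, `tables`, and `books_html_df` are assumptions made for illustration, not the contents of `ois.html`.

```r
library(XML)

# Hypothetical inline HTML standing in for a hand-written books.html
html_text <- '<html><body><table>
<tr><th>Title</th><th>Authors</th><th>Year</th></tr>
<tr><td>Book A</td><td>Ann One; Bob Two</td><td>2015</td></tr>
</table></body></html>'

# Parse the string as an HTML document
doc <- htmlParse(html_text, asText = TRUE)

# readHTMLTable() on a document returns a list of tables
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# Take the first (and here only) table as a data frame
books_html_df <- tables[[1]]
books_html_df
```

Values read from an HTML table arrive as text, so a column such as `Year` would likely need an explicit `as.integer()` before it can match a numeric column loaded from JSON.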

JSON

books.json <- fromJSON("http://raw.githubusercontent.com/gpsingh12/IS-607-MSDA/master/python.json")
books.json <- do.call("rbind", lapply(books.json, data.frame))
books.json

##                              title    author      Publisher
## book :Learning Python, 5th Edition Mark Lutz O'Reilly Media
##                              Formats Print Ebook Pages
## book Print Ebook Safari Books Online  2013  2013  1600
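When the JSON file is written as an array of book objects, `fromJSON` can often build the data frame directly, without the `do.call("rbind", ...)` step. The sketch below assumes a hypothetical layout with an `authors` array per book; the inline JSON and the names `json_text` and `books_json_df` are illustrative only and do not reproduce `python.json`.

```r
library(jsonlite)

# Hypothetical JSON: an array of book objects, each with an array of authors
json_text <- '[
  {"title": "Book A", "authors": ["Ann One", "Bob Two"], "year": 2015},
  {"title": "Book B", "authors": ["Cy Three"], "year": 2013}
]'

# An array of objects is simplified to a data frame, one row per book
books_json_df <- fromJSON(json_text)

# The authors field comes back as a list column; collapse it to one string
books_json_df$authors <- vapply(books_json_df$authors, paste,
                                character(1), collapse = "; ")
books_json_df
```

Collapsing the authors into one string is only one possible design; keeping the list column preserves the structure but makes the frame harder to compare with the HTML and XML versions.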

XML

download.file("https://raw.githubusercontent.com/gpsingh12/IS-607-MSDA/master/book.xml","books.xml")

books.xml<-xmlToList(xmlInternalTreeParse("books.xml"))

books.xml<-data.frame(do.call(bind_rows, lapply(books.xml, data.frame)))
books.xml

##                                                                                  title
## 1 Automated Data Collection with R: A practical guide to Web Scrapping and Text Mining
##           author        author.1      author.2       author.3 firstPubDate
## 1 Munzert  Simon Rubba Christian Meibner Peter Nyhuis Dominic         2015
##    Publisher Publication
## 1 John Wiley       WILEY
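In the output above, the four authors end up in separate columns (`author`, `author.1`, and so on). An alternative, sketched here under assumed element names (`book`, `title`, `author`, `year`) that may differ from those in `book.xml`, is to walk the book nodes with XPath and collapse the repeated `author` elements into one column.

```r
library(XML)

# Hypothetical XML standing in for a hand-written books.xml
xml_text <- '<books>
  <book><title>Book A</title><author>Ann One</author><author>Bob Two</author><year>2015</year></book>
  <book><title>Book B</title><author>Cy Three</author><year>2013</year></book>
</books>'

doc <- xmlParse(xml_text, asText = TRUE)

# One node per book
book_nodes <- getNodeSet(doc, "//book")

# Build one row per book, joining repeated <author> elements with "; "
books_xml_df <- do.call(rbind, lapply(book_nodes, function(node) {
  data.frame(
    title   = xpathSApply(node, "./title", xmlValue),
    authors = paste(xpathSApply(node, "./author", xmlValue), collapse = "; "),
    year    = as.integer(xpathSApply(node, "./year", xmlValue)),
    stringsAsFactors = FALSE
  )
}))
books_xml_df
```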

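The three data frames are not identical. Each source here holds a different book with a different set of columns, `books.html` is a list containing a data frame rather than a data frame itself, and the XML result spreads the four authors across separate columns (`author`, `author.1`, `author.2`, `author.3`).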
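Whether two data frames are identical can be checked in code rather than by eye. The sketch below uses base R only; the two small frames `from_json` and `from_html` are hypothetical stand-ins that mimic a common difference between sources (a year stored as an integer versus as text).

```r
# Hypothetical frames: same books, but the year column differs in type
from_json <- data.frame(title = c("Book A", "Book B"),
                        year  = c(2015L, 2013L),
                        stringsAsFactors = FALSE)
from_html <- data.frame(title = c("Book A", "Book B"),
                        year  = c("2015", "2013"),
                        stringsAsFactors = FALSE)

# identical() is strict: column names, types, and values must all match
same_before <- identical(from_json, from_html)

# all.equal() returns TRUE or a description of the differences
diff_report <- all.equal(from_json, from_html)

# After converting the text column, the two frames match
from_html$year <- as.integer(from_html$year)
same_after <- identical(from_json, from_html)

same_before
same_after
```

`all.equal()` is the more forgiving of the two and its report is useful for finding which column differs, while `identical()` answers the yes-or-no question asked in the assignment.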

GP SINGH

March 14, 2016
