Week 8

For this project, I created three documents (XML, HTML, and JSON) in Notepad. I have never done this before, so my apologies if these documents make no sense; I copied the basic formats from Gaston Sanchez's slide deck and from Google searches on HTML tables.

I then loaded the data into R using the following packages:

library(XML)
library(jsonlite)

## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:utils':
## 
##     View

library(RJSONIO)

## 
## Attaching package: 'RJSONIO'
## 
## The following objects are masked from 'package:jsonlite':
## 
##     fromJSON, toJSON

library(rvest)

## Loading required package: xml2
## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:XML':
## 
##     xml

library(RCurl)

## Loading required package: bitops
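
The messages above show that RJSONIO masks fromJSON and toJSON from jsonlite, so a bare fromJSON() call now goes to RJSONIO. The short sketch below shows one way to pick the parser explicitly with the pkg::function form. The JSON string is a made-up stand-in, not the contents of my Books.JSON file.

```r
library(jsonlite)

# Hypothetical one-record JSON string, used only for illustration.
json_txt <- '[{"Title": "Automated Data Collection with R", "Published": 2015}]'

# The pkg::function form calls jsonlite's parser even if another attached
# package (such as RJSONIO) defines its own fromJSON.
books <- jsonlite::fromJSON(json_txt)

# jsonlite turns an array of records into a data frame.
is.data.frame(books)
```

Writing RJSONIO::fromJSON() instead would select the other parser in the same way, so the order of the library() calls no longer decides which one runs.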

For the week 8 assignment I created the following files, which are downloaded from GitHub below:

# getURL() returns the contents of each file as a single character string.
url_html <- getURL("https://raw.githubusercontent.com/mfarris9505/Week-8-Hwk/master/Books.HTML")
url_json <- getURL("https://raw.githubusercontent.com/mfarris9505/Week-8-Hwk/master/Books.JSON")
url_xml <- getURL("https://raw.githubusercontent.com/mfarris9505/Week-8-Hwk/master/Books.xml")

# The XML and JSON reads are left commented out; only the HTML table is read,
# from a local copy of the file.
#Data_XML <- xmlToDataFrame(url_xml)
#Data_JSON <- fromJSON("data/Books.JSON")
Data_HTML <- readHTMLTable("data/Books.HTML")
#Data_XML
#Data_JSON
Data_HTML

## $`NULL`
##                                   Title         Author1         Author2
## 1                Counte of Monte Cristo Alexander Dumas              NA
## 2      Automated Data Collection with R   Simon Munzert Christian Rubba
## 3 Goosebumps #13: Welcome to Dead House      R.L. Stine              NA
##         Author3        Author4             Publisher            Genre
## 1            NA             NA       Penquin Classic Historical Novel
## 2 Peter Meibner Dominic Nyhuis                 Wiley         Textbook
## 3            NA             NA Scholastic Paperbacks           Horror
##   Published
## 1      1844
## 2      2015
## 3      2010

As you can see, the data is read differently depending on the package (and because I wrote it slightly differently in each format).
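
The commented-out XML and JSON reads could be completed roughly as follows, by parsing text that is already in memory (as returned by getURL) instead of a local file. This is only a sketch: the three strings are small hypothetical stand-ins, and the real Books files may be structured differently, so the resulting data frames may not match.

```r
library(XML)       # htmlParse, readHTMLTable, xmlParse, xmlToDataFrame
library(jsonlite)  # fromJSON

# Hypothetical stand-ins for the text that getURL() returns.
html_txt <- paste0(
  "<table><tr><th>Title</th><th>Publisher</th></tr>",
  "<tr><td>Automated Data Collection with R</td><td>Wiley</td></tr></table>"
)
json_txt <- '[{"Title": "Automated Data Collection with R", "Publisher": "Wiley"}]'
xml_txt <- paste0(
  "<books><book><Title>Automated Data Collection with R</Title>",
  "<Publisher>Wiley</Publisher></book></books>"
)

# asText = TRUE tells the parser that the argument is content, not a path.
# readHTMLTable() on a whole document returns a list; take the first table.
Data_HTML <- readHTMLTable(htmlParse(html_txt, asText = TRUE),
                           stringsAsFactors = FALSE)[[1]]

# Namespace-qualified so that RJSONIO's fromJSON is not picked up by mistake.
Data_JSON <- jsonlite::fromJSON(json_txt)

# Each child of the root node becomes one row of the data frame.
Data_XML <- xmlToDataFrame(xmlParse(xml_txt, asText = TRUE),
                           stringsAsFactors = FALSE)
```

With the real data, url_html, url_json, and url_xml would take the place of the three stand-in strings.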

Section 2

October 18, 2015