For this project, I created three documents in XML, HTML, and JSON using Notepad. (I have never done this before. My apologies if these documents make no sense; I copied the basic formats from Gaston Sanchez's slide deck and from Google searches on HTML tables.)
I then loaded the data into R using the following packages:
library(XML)
library(jsonlite)
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:utils':
##
## View
library(RJSONIO)
##
## Attaching package: 'RJSONIO'
##
## The following objects are masked from 'package:jsonlite':
##
## fromJSON, toJSON
library(rvest)
## Loading required package: xml2
##
## Attaching package: 'rvest'
##
## The following object is masked from 'package:XML':
##
## xml
library(RCurl)
## Loading required package: bitops
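The masking messages above matter in practice: because RJSONIO is attached after jsonlite, an unqualified `fromJSON()` call goes to the RJSONIO version, which returns nested lists rather than a data frame. Below is a minimal sketch of calling each version explicitly with the `::` operator. The inline JSON string is a made-up stand-in, not the contents of Books.JSON.

```r
# Hypothetical JSON standing in for the real Books.JSON file
json_text <- '[
  {"Title": "Book A", "Published": 1844},
  {"Title": "Book B", "Published": 2015}
]'

# jsonlite simplifies an array of records into a data frame
books_df <- jsonlite::fromJSON(json_text)

# RJSONIO keeps the same input as a list with one element per record
books_list <- RJSONIO::fromJSON(json_text)

class(books_df)
class(books_list)
length(books_list)
```

Qualifying the call this way avoids depending on the order in which the two packages were attached.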
For the Week 8 assignment, I created the following files, hosted on GitHub:
url_html <- getURL("https://raw.githubusercontent.com/mfarris9505/Week-8-Hwk/master/Books.HTML")
url_json <- getURL("https://raw.githubusercontent.com/mfarris9505/Week-8-Hwk/master/Books.JSON")
url_xml <- getURL("https://raw.githubusercontent.com/mfarris9505/Week-8-Hwk/master/Books.xml")
#Data_XML <- xmlToDataFrame(url_xml)
#Data_JSON <- fromJSON("data/Books.JSON")
Data_HTML <- readHTMLTable("data/Books.HTML")
#Data_XML
#Data_JSON
Data_HTML
## $`NULL`
##                                   Title         Author1         Author2
## 1                Counte of Monte Cristo Alexander Dumas              NA
## 2      Automated Data Collection with R   Simon Munzert Christian Rubba
## 3 Goosebumps #13: Welcome to Dead House      R.L. Stine              NA
##         Author3        Author4             Publisher            Genre
## 1            NA             NA       Penquin Classic Historical Novel
## 2 Peter Meibner Dominic Nyhuis                 Wiley         Textbook
## 3            NA             NA Scholastic Paperbacks           Horror
##   Published
## 1      1844
## 2      2015
## 3      2010
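The `NULL` label at the top of the output appears because `readHTMLTable()` returns a list with one element per table it finds, and, as far as I understand, a table without an `id` attribute or caption ends up with the name `NULL`. Below is a hedged sketch of pulling out a single data frame, using a small invented HTML string in place of Books.HTML.

```r
library(XML)

# Hypothetical one-table page standing in for Books.HTML
html_text <- '<html><body><table>
  <tr><th>Title</th><th>Published</th></tr>
  <tr><td>Book A</td><td>1844</td></tr>
  <tr><td>Book B</td><td>2015</td></tr>
</table></body></html>'

doc <- htmlParse(html_text, asText = TRUE)

# Default call: a list of data frames, one per <table>
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# Either index the list or ask for a specific table with `which`
books <- tables[[1]]
books_direct <- readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)

str(books)
```

Indexing with `[[1]]` or passing `which = 1` gives a plain data frame, so later code does not have to refer to an element named `NULL`.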
As you can see, the data is read differently depending on the package (and because I wrote each file slightly differently).
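One way to compare the formats side by side is to parse text that is already in memory, such as the strings returned by `getURL()`, instead of reading from local paths. The sketch below assumes each string holds a flat list of book records. The inline strings are invented stand-ins so the example runs offline, and the element names (`books`, `book`) are assumptions, not the structure of the real files.

```r
library(XML)

# Hypothetical stand-ins for the strings returned by getURL()
xml_text <- '<books>
  <book><Title>Book A</Title><Published>1844</Published></book>
  <book><Title>Book B</Title><Published>2015</Published></book>
</books>'
json_text <- '[{"Title":"Book A","Published":1844},
               {"Title":"Book B","Published":2015}]'

# XML: parse the string first, then flatten child nodes into columns
data_xml <- xmlToDataFrame(xmlParse(xml_text, asText = TRUE),
                           stringsAsFactors = FALSE)

# JSON: jsonlite turns an array of records into a data frame
data_json <- jsonlite::fromJSON(json_text)

# Same columns, but XML values arrive as text while JSON keeps numbers
sapply(data_xml, class)
sapply(data_json, class)
```

Even with identical records, the column types differ between the two parsers, which is one concrete way the packages read the data differently.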