607 Wk 7

As always, we will start with our dependencies. The data for this assignment can be found on GitHub.

library(XML, quietly = TRUE)
suppressPackageStartupMessages(library(tidyverse, quietly = TRUE))
library(rjson)

Reading an HTML table is straightforward. `readHTMLTable()` returns a list containing one data frame per table in the file.

data.file <- "data.html"
html.data <- readHTMLTable(data.file)
html.data

## $`NULL`
##                       Title         Author  Year          ISBN 
## 1 The Universe in a Nutshell Steven Hawking  2001 0-553-80202-X
## 2       The Elegant Universe    Brian Green  1999 0-393-05858-1
## 3                     Cosmos     Carl Sagan  1980 0-394-50294-9
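Because `readHTMLTable()` returns a list, the table still has to be pulled out of it. Every column also arrives as text (or a factor). Below is a minimal sketch that handles both steps. It assumes the page holds a single table with a header row. The inline HTML string is a hypothetical stand-in for `data.html`, rebuilt from the three rows printed above so the example runs without the file.

```r
library(XML)

# Hypothetical stand-in for data.html, rebuilt from the rows printed above
html.text <- "<table>
<thead><tr><th>Title</th><th>Author</th><th>Year</th><th>ISBN</th></tr></thead>
<tbody>
<tr><td>The Universe in a Nutshell</td><td>Steven Hawking</td><td>2001</td><td>0-553-80202-X</td></tr>
<tr><td>The Elegant Universe</td><td>Brian Green</td><td>1999</td><td>0-393-05858-1</td></tr>
<tr><td>Cosmos</td><td>Carl Sagan</td><td>1980</td><td>0-394-50294-9</td></tr>
</tbody>
</table>"

# Parse the string as HTML, then read every <table> into a list of data frames
html.doc <- htmlParse(html.text, asText = TRUE)
tables <- readHTMLTable(html.doc)

# Take the first (and only) table out of the list
books.html <- tables[[1]]

# Cells come back as character or factor; convert Year to integer
books.html$Year <- as.integer(as.character(books.html$Year))
str(books.html)
```

Passing `which = 1` to `readHTMLTable()` is another way to get the first table back directly as a data frame.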

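For the XML file, we parse the document and take its root node. A nested `xmlSApply()` then extracts the value of every child node beneath each record. That call returns a matrix with one column per book, so we transpose it before storing it as the data frame below.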

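data.file <- "data.xml"
xml.data <- xmlParse(data.file)
rootNode <- xmlRoot(xml.data)
# For each record under the root, collect the text of its child nodes;
# the result is a matrix with one column per record
xml.data <- xmlSApply(rootNode, function(x) xmlSApply(x, xmlValue))
# Transpose so each record becomes a row; row.names = NULL discards the node names
xml.frame <- data.frame(t(xml.data), row.names = NULL)
xml.frame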

##                        title         author   year            isbn
## 1 The Universe in a Nutshell Steven Hawking   2001   0-553-80202-X
## 2       The Elegant Universe  Briane Greene   1999   0-393-05858-1
## 3                    Cosmos     Carl Sagan   1980   0-394-50294-9
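The transpose step is easier to see on a small input. Below is a minimal sketch. It assumes `data.xml` has a root element with one child per book and one grandchild per field. The element names `books` and `book` are hypothetical; the field names and values match the output printed above.

```r
library(XML)

# Hypothetical stand-in for data.xml; the books/book element names are assumed
xml.text <- paste0(
  "<books>",
  "<book><title>The Universe in a Nutshell</title><author>Steven Hawking</author>",
  "<year>2001</year><isbn>0-553-80202-X</isbn></book>",
  "<book><title>The Elegant Universe</title><author>Briane Greene</author>",
  "<year>1999</year><isbn>0-393-05858-1</isbn></book>",
  "<book><title>Cosmos</title><author>Carl Sagan</author>",
  "<year>1980</year><isbn>0-394-50294-9</isbn></book>",
  "</books>"
)

root <- xmlRoot(xmlParse(xml.text, asText = TRUE))

# Inner call: one named character vector per <book> (title, author, year, isbn).
# Outer call: sapply() binds those vectors as columns, giving a fields x books matrix.
book.matrix <- xmlSApply(root, function(x) xmlSApply(x, xmlValue))
dim(book.matrix)

# Transposing gives one row per book; row.names = NULL drops the repeated "book" labels
book.frame <- data.frame(t(book.matrix), row.names = NULL)
book.frame
```

The matrix is 4 by 3 (fields by books), which is why the transpose is needed before building the data frame.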

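JSON is simpler still, because `fromJSON()` already returns the file as a nested R list. With the XML, by contrast, we had to walk the node structure ourselves. Each book is an object in the `books` array, following the standard JSON structure. Note that calling `data.frame()` directly on that list flattens all three books into a single wide row. The suffixes `.1` and `.2` are appended to the column names of the second and third books.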

json.data <- fromJSON(file = "data.json", simplify = TRUE)
data.frame(json.data)

##                  books.title   books.author books.year    books.isbn
## 1 The Universe in a Nutshell Steven Hawking       2001 0-553-80202-X
##          books.title.1 books.author.1 books.year.1  books.isbn.1
## 1 The Elegant Universe    Brian Green         1999 0-393-05858-1
##   books.title.2 books.author.2 books.year.2  books.isbn.2
## 1        Cosmos     Carl Sagan         1980 0-394-50294-9
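Because `data.frame()` spreads the three books across one row, it can help to bind the elements of `books` row by row instead. Below is a minimal sketch. It assumes `data.json` has the shape `{"books": [ {...}, {...}, {...} ]}`, which is consistent with the column names printed above. The inline string is a hypothetical stand-in for the file. `do.call(rbind, ...)` is one base-R option among several.

```r
library(rjson)

# Hypothetical stand-in for data.json, assuming a top-level "books" array of objects
json.text <- '{"books": [
  {"title": "The Universe in a Nutshell", "author": "Steven Hawking", "year": 2001, "isbn": "0-553-80202-X"},
  {"title": "The Elegant Universe", "author": "Brian Green", "year": 1999, "isbn": "0-393-05858-1"},
  {"title": "Cosmos", "author": "Carl Sagan", "year": 1980, "isbn": "0-394-50294-9"}
]}'

json.data <- fromJSON(json_str = json.text)

# Each book is a named list; turn each one into a one-row data frame
book.rows <- lapply(json.data$books, function(b) {
  data.frame(as.list(b), stringsAsFactors = FALSE)
})

# Stack the one-row frames so each book becomes its own row
books.json <- do.call(rbind, book.rows)
books.json
```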

This file can be found on GitHub, and a rendered HTML version is available on RPubs.

simplymathematics

October 14, 2018