As always, we will start with our dependencies. The data for this can be found on GitHub.
library(XML, quietly = TRUE)
suppressPackageStartupMessages(library(tidyverse, quietly = TRUE))
library(rjson)
Reading an HTML table is very straightforward: readHTMLTable returns a list with one data frame per table in the file.
data.file <- "data.html"
html.data <- readHTMLTable(data.file)
html.data
## $`NULL`
##                        Title         Author Year          ISBN
## 1 The Universe in a Nutshell Steven Hawking 2001 0-553-80202-X
## 2       The Elegant Universe    Brian Green 1999 0-393-05858-1
## 3                     Cosmos     Carl Sagan 1980 0-394-50294-9
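If we only want the data frame rather than a one-element list, a minimal sketch along these lines should work. The inline HTML string is a made-up stand-in for data.html (the real file may be laid out differently), and html.text, html.doc and html.frame are names introduced here purely for illustration.

```r
library(XML)

# Hypothetical stand-in for data.html: a single table with a header row.
html.text <- '<table>
  <tr><th>Title</th><th>Author</th><th>Year</th></tr>
  <tr><td>Cosmos</td><td>Carl Sagan</td><td>1980</td></tr>
  <tr><td>The Elegant Universe</td><td>Brian Green</td><td>1999</td></tr>
</table>'

# Parse the string as HTML, then ask for the first table only.
# With `which = 1`, readHTMLTable returns the data frame itself
# instead of a list of data frames.
html.doc <- htmlParse(html.text, asText = TRUE)
html.frame <- readHTMLTable(html.doc, which = 1, stringsAsFactors = FALSE)
html.frame
```

Every column comes back as text here, so Year would still need an as.integer() conversion before doing any arithmetic on it.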
In the XML file, we must first take the root node, then extract the value of each node below it. This gives a matrix with one column per book, so finally we transpose it and store the result as the data frame we see below.
data.file <- "data.xml"
xml.data <- xmlParse(data.file)
rootNode <- xmlRoot(xml.data)
xml.data <- xmlSApply(rootNode, function(x) xmlSApply(x, xmlValue))
xml.frame <- data.frame(t(xml.data), row.names = NULL)
xml.frame
##                        title         author year          isbn
## 1 The Universe in a Nutshell Steven Hawking 2001 0-553-80202-X
## 2       The Elegant Universe  Briane Greene 1999 0-393-05858-1
## 3                     Cosmos     Carl Sagan 1980 0-394-50294-9
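For flat records like these, the XML package also provides xmlToDataFrame, which can replace the nested xmlSApply and the transpose. The sketch below is only an illustration: the element names books and book are assumptions (only the column names title, author, year and isbn are known from the output above), and xml.text, xml.doc and xml.frame2 are hypothetical names.

```r
library(XML)

# Hypothetical stand-in for data.xml; the <books>/<book> element names
# are assumed, the child element names match the columns shown above.
xml.text <- '<books>
  <book>
    <title>Cosmos</title>
    <author>Carl Sagan</author>
    <year>1980</year>
    <isbn>0-394-50294-9</isbn>
  </book>
  <book>
    <title>The Elegant Universe</title>
    <author>Briane Greene</author>
    <year>1999</year>
    <isbn>0-393-05858-1</isbn>
  </book>
</books>'

# Each child of the root becomes one row; each grandchild becomes a column.
xml.doc <- xmlParse(xml.text, asText = TRUE)
xml.frame2 <- xmlToDataFrame(xml.doc, stringsAsFactors = FALSE)
xml.frame2
```

This shortcut only suits records that are one level deep. For nested or irregular XML, the explicit approach with xmlRoot and xmlSApply gives more control.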
JSON is a bit simpler again, given that this file is already structured as a list (as opposed to the structure we defined on the fly for the XML). Each entry is an instance of the ‘books’ object that conforms to the JSON standard.
json.data <- fromJSON(file = "data.json", simplify = TRUE)
data.frame(json.data)
##                  books.title   books.author books.year    books.isbn
## 1 The Universe in a Nutshell Steven Hawking       2001 0-553-80202-X
##          books.title.1 books.author.1 books.year.1  books.isbn.1
## 1 The Elegant Universe    Brian Green         1999 0-393-05858-1
##   books.title.2 books.author.2 books.year.2  books.isbn.2
## 1        Cosmos     Carl Sagan         1980 0-394-50294-9
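Note that data.frame() spreads the three books across the columns of a single row. To get one row per book, like the HTML and XML results, one option is to convert each entry separately and stack the rows. This is a sketch under an assumption: the JSON structure (a top-level "books" array of objects) is inferred from the column names above, and json.text, json.list, book.rows and json.frame are hypothetical names.

```r
library(rjson)

# Assumed structure of data.json, inferred from the column names above:
# a top-level "books" array holding one object per book.
json.text <- '{"books": [
  {"title": "The Elegant Universe", "author": "Brian Green",
   "year": "1999", "isbn": "0-393-05858-1"},
  {"title": "Cosmos", "author": "Carl Sagan",
   "year": "1980", "isbn": "0-394-50294-9"}
]}'

json.list <- fromJSON(json_str = json.text)

# Turn each book (a named list) into a one-row data frame,
# then bind the rows together into a single frame.
book.rows <- lapply(json.list$books, function(book) {
  data.frame(book, stringsAsFactors = FALSE)
})
json.frame <- do.call(rbind, book.rows)
json.frame
```

The rbind step relies on every book having the same fields. If some entries were missing a field, the rows would need to be padded before stacking.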
This file can be found on GitHub, and is also rendered as HTML on RPubs.