This project imports the same table of book data, stored in three different file formats, into R data frames for downstream processing.
Three formats are used:
1. HTML format
2. XML format
3. JSON format
Each table has 3 rows (one per book) and 8 variables:
id
author
author_2 (second author, if applicable)
title
genre
isbn
year
comments
We start by loading the libraries and defining the file URLs.
library(XML)    # readHTMLTable(), xmlToDataFrame()
library(rjson)  # fromJSON()
library(curl)   # curl() connections for reading the remote files
html_file = "https://raw.githubusercontent.com/bsnacks000/IS607-DataAq/master/Data/books.html"
json_file = "https://raw.githubusercontent.com/bsnacks000/IS607-DataAq/master/Data/books.json"
xml_file = "https://raw.githubusercontent.com/bsnacks000/IS607-DataAq/master/Data/books.xml"
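Both HTML and XML can be converted to data frames with the XML package: readHTMLTable() parses the HTML and xmlToDataFrame() parses the XML. readHTMLTable() returns a list with one data frame per table, so its result is wrapped in as.data.frame() to coerce that list directly into a single data frame.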
conn_html = curl(html_file)
html_table = readLines(conn_html)
close(conn_html)
conn_xml = curl(xml_file)
xml_table = readLines(conn_xml)
close(conn_xml)
h = as.data.frame(readHTMLTable(html_table, header = TRUE, stringsAsFactors = FALSE))
x = xmlToDataFrame(xml_table, stringsAsFactors = FALSE)
colnames(h) <- colnames(x) # align the HTML column names with the XML ones
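The `colnames(h) <- colnames(x)` step is needed because of how as.data.frame() treats a named list of data frames. The base-R sketch below uses a hypothetical list, `tables`, standing in for readHTMLTable()'s return value; it shows that coercing the whole list prepends the list element's name to every column, whereas extracting the element with `[[1]]` keeps the original names. The data values are illustrative only.

```r
# Hypothetical stand-in for the list returned by readHTMLTable():
# one named element holding a single data frame.
tables <- list(books = data.frame(id = c("1", "2"),
                                  title = c("Dune", "Hunter's Run"),
                                  stringsAsFactors = FALSE))

# Coercing the whole list prefixes each column with the element name.
via_coerce <- as.data.frame(tables)
print(names(via_coerce))   # "books.id" "books.title"

# Extracting the first (and only) table keeps the original column names,
# so no renaming step is needed afterwards.
via_extract <- tables[[1]]
print(names(via_extract))  # "id" "title"
```

Either route yields the same rows; extracting with `[[1]]` simply avoids the renaming step, assuming the page holds exactly one table.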
Because of how the JSON file is structured, the nested list returned by fromJSON() does not coerce directly to a data frame. Instead, lapply() converts each record into a one-row data frame, and do.call() passes the resulting list to rbind() to stack the rows into a single data frame.
conn_json = curl(json_file)
j = fromJSON(file = conn_json)
close(conn_json)
# one-row data frame per record, then stack the rows with rbind()
jdf = do.call("rbind", lapply(j, data.frame, stringsAsFactors=FALSE))
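The do.call()/lapply() pattern can be tried offline on a hand-built list. This sketch assumes the parsed JSON is a list of named records, one per book; `records`, `rows`, and `books` are hypothetical names, and the fields are trimmed to three for brevity.

```r
# Hypothetical stand-in for the list that fromJSON() returns:
# one named list per book.
records <- list(
  list(id = "1", Title = "Do Androids Dream of Electric Sheep?", Year = "1968"),
  list(id = "2", Title = "Dune", Year = "1965"),
  list(id = "3", Title = "Hunter's Run", Year = "2007")
)

# Step 1: turn every record into a one-row data frame.
rows <- lapply(records, data.frame, stringsAsFactors = FALSE)

# Step 2: stack the rows. do.call() passes the list elements as separate
# arguments, i.e. rbind(rows[[1]], rows[[2]], rows[[3]]).
books <- do.call("rbind", rows)

print(dim(books))  # 3 3
```

rbind() on data frames matches columns by name, so the records only need the same set of fields, not the same field order.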
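Below are the three data frames. They hold the same records but are not strictly identical: the JSON version capitalizes every column name except id, and the genre is spelled "Sci-Fi" or "Sci-fi" inconsistently across the source files.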
h
## id author author_2
## 1 1 Dick, Philip K. NA
## 2 2 Herbert, Frank NA
## 3 3 Martin, George R.R. Dozois, Gardner
## title genre isbn year
## 1 Do Androids Dream of Electric Sheep? Sci-Fi 0-345-40447 1968
## 2 Dune Sci-Fi 0-441-17271-7 1965
## 3 Hunter's Run Sci-Fi 0-06-137329-X 2007
## comments
## 1 A depressed detective hunts androids masquerading as humans in a dystopian future. A brilliant movie based on this book was made by Ridley Scott in the 80s.
## 2 A whiney prince becomes a prophet and leader of a tribe of drug addicted desert nomads. A not-so-brilliant movie based on this book was made by David Lynch in the 80s.
## 3 A scruffy loner kills a man and while on the run discovers an alien hideout on a strange planet. No movie based on this book has yet to be made.
x
## id author author_2
## 1 1 Dick, Philip K. NA
## 2 2 Herbert, Frank NA
## 3 3 Martin, George R.R. Dozois, Gardner
## title genre isbn year
## 1 Do Androids Dream of Electric Sheep? Sci-fi 0-345-40447 1968
## 2 Dune Sci-Fi 0-441-17271-7 1965
## 3 Hunter's Run Sci-Fi 0-06-137329-X 2007
## comments
## 1 A depressed detective hunts androids masquerading as humans in a dystopian future. A brilliant movie based on this book was made by Ridley Scott in the 80s.
## 2 A whiney prince becomes a prophet and leader of a tribe of drug addicted desert nomads. A not-so-brilliant movie based on this book was made by David Lynch in the 80s.
## 3 A scruffy loner kills a man and while on the run discovers an alien hideout on a strange planet. No movie based on this book has yet to be made.
jdf
## id Author Author_2
## 1 1 Dick, Philip K. NA
## 2 2 Herbert, Frank NA
## 3 3 Martin, George R.R. Dozois, Gardner
## Title Genre ISBN Year
## 1 Do Androids Dream of Electric Sheep? Sci-fi 0-345-40447 1968
## 2 Dune Sci-fi 0-441-17271-7 1965
## 3 Hunter's Run Sci-fi 0-06-137329-X 2007
## Comments
## 1 A depressed detective hunts androids masquerading as humans in a dystopian future. A brilliant movie based on this book was made by Ridley Scott in the 80s.
## 2 A whiney prince becomes a prophet and leader of a tribe of drug addicted desert nomads. A not-so-brilliant movie based on this book was made by David Lynch in the 80s.
## 3 A scruffy loner kills a man and while on the run discovers an alien hideout on a strange planet. No movie based on this book has yet to be made.
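Because the three data frames differ only in column-name case and in the capitalization of the genre, a quick programmatic check can confirm that the records otherwise match. This sketch uses a hypothetical helper, `normalize_books()`, that lowercases both the column names and the `genre` values. The two small data frames are made-up miniatures of `x` and `jdf`, not the real imports.

```r
# Hypothetical helper: make column names and genre labels case-insensitive
# so that purely cosmetic differences do not count as mismatches.
normalize_books <- function(df) {
  names(df) <- tolower(names(df))
  if ("genre" %in% names(df)) {
    df$genre <- tolower(df$genre)
  }
  df
}

# Made-up miniatures of the XML and JSON imports.
x_demo <- data.frame(id = c("1", "2"), genre = c("Sci-fi", "Sci-Fi"),
                     stringsAsFactors = FALSE)
jdf_demo <- data.frame(id = c("1", "2"), Genre = c("Sci-fi", "Sci-fi"),
                       stringsAsFactors = FALSE)

identical(x_demo, jdf_demo)                                    # FALSE
identical(normalize_books(x_demo), normalize_books(jdf_demo))  # TRUE
```

identical() also compares column types, so if the different parsers return id or year as different classes, those columns would need converting as well; checking each data frame with str() first is a reasonable precaution.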