Create three files that store the books' information in HTML (using an HTML table), XML, and JSON formats. (The files are uploaded to GitHub.)
Write R code to load the information from each of the three sources into separate R data frames.
library(rjson)
library(RCurl)
## Loading required package: bitops
library(XML)
library(stringr)
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.4.4
##
## Attaching package: 'jsonlite'
## The following objects are masked from 'package:rjson':
##
## fromJSON, toJSON
# Import the HTML file; getURL() returns the file contents as a single string
book.html.url <- getURL("https://raw.githubusercontent.com/xiaoxiaogao-DD/DATA607_Assignment7/master/books.html")
# Read the HTML table(s); readHTMLTable() returns a list of tables
book.html.table <- readHTMLTable(book.html.url, header = TRUE)
# Convert the list of tables into a data frame
book.html.dataframe <- as.data.frame(book.html.table)
# Adjust the column names: drop the 5-character table-name prefix (e.g. "NULL.") added by as.data.frame()
colnames(book.html.dataframe) <- substring(colnames(book.html.dataframe), 6)
book.html.dataframe
## title
## 1 Python for Data Analysis
## 2 Hands-On Machine Learning with Scikit-Learn and TensorFlow
## 3 R for Data Science
## ISBN editors price
## 1 978-1-449-31979-3 Julie Steele;Meghan Blanchette 39.99
## 2 978-1-491-96229-9 Nicole Tache 49.99
## 3 978-1-491-91039-9 Marie Beaugureau;Mike Loukides 39.99
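The offset of 6 used when adjusting the column names works because as.data.frame() prefixes each column with the name of the list element it came from. The following is a minimal base-R sketch of that behaviour. It assumes the list element is named "NULL" (the label readHTMLTable() is understood to give a table without an id); the list here is a hand-built stand-in, not real parser output, and the names `inner`, `tables`, `combined`, `cleaned`, and `first_table` are illustrative only.

```r
# Hand-built stand-in for the list returned by readHTMLTable().
# The element name "NULL" is an assumption about how an unnamed table is labelled.
inner <- data.frame(
  title = c("Python for Data Analysis", "R for Data Science"),
  ISBN  = c("978-1-449-31979-3", "978-1-491-91039-9"),
  stringsAsFactors = FALSE
)
tables <- setNames(list(inner), "NULL")

# Converting the whole list prefixes every column with the element name.
combined <- as.data.frame(tables)
print(colnames(combined))

# Option 1: strip the prefix by pattern instead of by character position.
cleaned <- combined
colnames(cleaned) <- sub("^NULL\\.", "", colnames(cleaned))
print(colnames(cleaned))

# Option 2: take the first list element directly, which keeps the original names.
first_table <- tables[[1]]
print(colnames(first_table))
```

Either option avoids hard-coding the prefix length, which would silently break if the table had an id of a different length.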
#Import JSON file
book.json.url <- getURL("https://raw.githubusercontent.com/xiaoxiaogao-DD/DATA607_Assignment7/master/books.json")
#Convert data in JSON into a data frame
book.json.dataframe <- flatten(as.data.frame(fromJSON(book.json.url)))
book.json.dataframe
## title
## 1 Python for Data Analysis
## 2 Hands-On Machine Learning with Scikit-Learn and TensorFlow
## 3 R for Data Science
## ISBN editors price
## 1 978-1-449-31979-3 Julie Steele;Meghan Blanchette 39.99
## 2 978-1-491-96229-9 Nicole Tache 49.99
## 3 978-1-491-91039-9 Marie Beaugureau;Mike Loukides 39.99
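To see how jsonlite chooses column types, here is a small sketch using inline JSON text. The layout (a top-level array of objects with an unquoted price) is an assumption made for illustration; the real books.json may be structured differently.

```r
library(jsonlite)

# Hypothetical JSON text: a top-level array of objects with an unquoted price.
# This is an illustrative layout, not a copy of the real books.json.
json_text <- '[
  {"title": "Python for Data Analysis", "ISBN": "978-1-449-31979-3", "price": 39.99},
  {"title": "R for Data Science", "ISBN": "978-1-491-91039-9", "price": 39.99}
]'

books <- fromJSON(json_text)

# An array of same-shaped objects is simplified to a data frame,
# with column types inferred from the JSON value types.
str(books)
```

Under this assumption the price column is numeric while title and ISBN are character, because JSON distinguishes numbers from strings and the parser carries that distinction over.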
#Import XML file
book.xml.url <- getURL("https://raw.githubusercontent.com/xiaoxiaogao-DD/DATA607_Assignment7/master/books.xml")
#Convert data in XML into a data frame
book.xml.dataframe <- xmlToDataFrame(xmlParse(book.xml.url))
book.xml.dataframe
## title
## 1 Python for Data Analysis
## 2 Hands-On Machine Learning with Scikit-Learn and TensorFlow
## 3 R for Data Science
## isbn editors price
## 1 978-1-449-31979-3 Julie Steele;Meghan Blanchette 39.99
## 2 978-1-491-96229-9 Nicole Tache 49.99
## 3 978-1-491-91039-9 Marie Beaugureau;Mike Loukides 39.99
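XML carries no type information for element text, so the columns produced by xmlToDataFrame() arrive as text. The sketch below uses a hypothetical inline document (the element names are assumptions, not the real books.xml) and converts the price column explicitly.

```r
library(XML)

# Hypothetical XML text; element names are assumptions, not the real books.xml.
xml_text <- paste0(
  "<books>",
  "<book><title>Python for Data Analysis</title><isbn>978-1-449-31979-3</isbn><price>39.99</price></book>",
  "<book><title>R for Data Science</title><isbn>978-1-491-91039-9</isbn><price>39.99</price></book>",
  "</books>"
)

# asText = TRUE tells the parser the string is XML content, not a file name.
books_xml <- xmlToDataFrame(xmlParse(xml_text, asText = TRUE), stringsAsFactors = FALSE)

# Every column arrives as text, so numeric fields need an explicit conversion.
books_xml$price <- as.numeric(books_xml$price)
str(books_xml)
```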
Are the three data frames identical?
Although the HTML, JSON, and XML files have different structures, the three data frames are very similar after conversion, but they are not strictly identical. The HTML and XML data frames are the closest match, although the XML column is named isbn where the HTML and JSON data frames use ISBN. For the JSON data frame, row numbers are created and the data types are assigned automatically by the parser.
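The question can also be checked programmatically. The sketch below uses small hand-built stand-ins for the three data frames (two rows and three columns taken from the printed output), so it runs without network access. Treating the JSON price as numeric is an assumption about how books.json stores it, and `normalize()` is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical stand-ins built from two rows of the printed output.
# Treating the JSON price as numeric is an assumption about books.json.
html_df <- data.frame(
  title = c("Python for Data Analysis", "R for Data Science"),
  ISBN  = c("978-1-449-31979-3", "978-1-491-91039-9"),
  price = c("39.99", "39.99"),
  stringsAsFactors = FALSE
)
xml_df  <- setNames(html_df, c("title", "isbn", "price"))  # lower-case isbn
json_df <- html_df
json_df$price <- as.numeric(json_df$price)                 # numeric price

# Direct comparisons: identical() gives TRUE/FALSE, all.equal() describes differences.
print(identical(html_df, xml_df))
print(all.equal(html_df, json_df))

# Hypothetical helper: harmonise column names and types before comparing.
normalize <- function(df) {
  names(df) <- tolower(names(df))
  df[] <- lapply(df, as.character)   # factors or numbers become plain text
  df$price <- as.numeric(df$price)   # then price becomes numeric everywhere
  rownames(df) <- NULL
  df
}

print(identical(normalize(html_df), normalize(xml_df)))
print(identical(normalize(html_df), normalize(json_df)))
```

Applied to the real data frames, the same pattern would show whether the remaining differences are limited to column names and types or extend to the values themselves.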