We acquire data in several formats and use the appropriate packages to load each one and process it into an R data frame.
Loading Packages Used
knitr::opts_chunk$set(message = FALSE, echo = TRUE)
# Library to read data file
library(RCurl)
library(knitr)
# For loading XML, JSON Data files
library(XML)
library(jsonlite)
library(htmltab)
# Library for data display in tabular format
library(DT)
Data files in HTML, JSON and XML formats are loaded.
The GitHub links to the data files are given below.
html.giturl <- "https://raw.githubusercontent.com/DataDriven-MSDA/DATA607/master/Week7A/myfavbookshtml.html"
# htmltab() from the htmltab package returns a data frame directly.
book.html <- htmltab::htmltab(html.giturl)
class(book.html)
## [1] "data.frame"
# Verifying records and variables
nrow(book.html)
## [1] 5
ncol(book.html)
## [1] 8
# Renaming Columns
colnames(book.html) <- c("Title", "Authors", "Genre", "Language", "Publisher", "Pages",
"Rating", "Price(INR)")
# Display data frame content
datatable(book.html, rownames = FALSE)
json.giturl <- "https://raw.githubusercontent.com/DataDriven-MSDA/DATA607/master/Week7A/myfavbooksjson.json"
json.baseurlstr <- paste(readLines(json.giturl), collapse = "")
book.jsondata <- fromJSON(json.baseurlstr)
class(book.jsondata)
## [1] "list"
# Converting to data frame
book.json <- as.data.frame(book.jsondata)
# Verifying records and variables
nrow(book.json)
## [1] 5
ncol(book.json)
## [1] 8
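The "list" class reported above is what fromJSON() returns when the JSON stores one array per field rather than one object per record. The layout of the actual file is not shown here, so that is only an assumption. The minimal sketch below uses two small inline JSON strings (hypothetical samples, not the real file) to show how jsonlite simplifies each layout and why as.data.frame() is needed in the second case.

```r
library(jsonlite)

# Hypothetical samples: the same two records in two common JSON layouts
rows.txt <- '[{"Title": "Oliver Twist", "Pages": 736},
              {"Title": "R In Action", "Pages": 608}]'
cols.txt <- '{"Title": ["Oliver Twist", "R In Action"], "Pages": [736, 608]}'

by.rows <- fromJSON(rows.txt)  # array of objects: simplified to a data frame
by.cols <- fromJSON(cols.txt)  # object of arrays: returned as a named list

class(by.rows)
class(by.cols)

# A named list of equal-length vectors converts cleanly to a data frame
book.sample <- as.data.frame(by.cols, stringsAsFactors = FALSE)
dim(book.sample)
```

fromJSON() also accepts a URL or file path directly, so the readLines() and paste() step is optional.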
# Renaming Columns
colnames(book.json) <- c("Title", "Authors", "Genre", "Language", "Publisher", "Pages",
"Rating", "Price(INR)")
# Display data frame content
datatable(book.json, rownames = FALSE)
xml.giturl <- "https://raw.githubusercontent.com/DataDriven-MSDA/DATA607/master/Week7A/myfavbooksxml.xml"
book.xmldata <- xmlParse(getURL(xml.giturl)) # get XML file contents
class(book.xmldata)
## [1] "XMLInternalDocument" "XMLAbstractDocument"
xmlSize(book.xmldata)
## [1] 1
# Converting to data frame
book.xml <- xmlToDataFrame(book.xmldata)
# Verifying records and variables
nrow(book.xml)
## [1] 5
ncol(book.xml)
## [1] 8
head(book.xml)
##                         Title                                      Authors
## 1            Birbal the Witty                           Kamala Chandrakant
## 2 Tales from the Panchatantra                Anant Pai, Kamala Chandrakant
## 3                Oliver Twist                              Charles Dickens
## 4              The Road Ahead Bill Gates, Nathan Myhrvold, Peter Rinearson
## 5                 R In Action                           Robert L. Kabacoff
##                   Genre Language                   Publisher Pages Rating
## 1      Children's Books  English           Amar Chitra Katha    32      5
## 2      Children's Books  English           Amar Chitra Katha    96      5
## 3           Young Adult  English Vintage Children's Classics   736      5
## 4 Young Adult, Business  English               Penguin Books   352    4.5
## 5      Computer Science  English                     Manning   608      4
##   PriceINR
## 1       67
## 2      250
## 3      150
## 4     1250
## 5      575
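xmlToDataFrame() reads every field as text, so numeric-looking columns such as Pages and Rating are not numeric yet. The sketch below shows one way to convert them, using a small inline XML string whose element names are assumptions modelled on the column names above, not the real file.

```r
library(XML)

# Hypothetical two-record sample; the element names are assumed
xml.txt <- paste0(
  "<books>",
  "<book><Title>Oliver Twist</Title><Pages>736</Pages><Rating>5</Rating></book>",
  "<book><Title>The Road Ahead</Title><Pages>352</Pages><Rating>4.5</Rating></book>",
  "</books>"
)

sample.xml <- xmlToDataFrame(xmlParse(xml.txt, asText = TRUE),
                             stringsAsFactors = FALSE)

# Record the column classes as parsed: all text at this point
classes.before <- sapply(sample.xml, class)

# Convert each column to the most specific type that fits its values
sample.xml[] <- lapply(sample.xml, type.convert, as.is = TRUE)
classes.after <- sapply(sample.xml, class)

classes.before
classes.after
```

The same lapply() line could be applied to book.xml before any numeric work on Pages, Rating or the price column.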
# Renaming Columns
colnames(book.xml) <- c("Title", "Authors", "Genre", "Language", "Publisher", "Pages",
"Rating", "Price(INR)")
# Display data frame content
datatable(book.xml, rownames = FALSE)
The same data frame content is obtained from all three sources, although the source data formats differ.
Thus data in different formats can be acquired and loaded into R by using the appropriate loading packages and processing steps.
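The claim that the sources yield the same content can be checked in code instead of by eye. The sketch below defines a hypothetical helper, same_content(), that compares two data frames cell by cell after converting every column to text, so that differences in column type or column name between loaders do not count as mismatches. The two small data frames are made-up stand-ins for the loaded objects.

```r
# Hypothetical helper: TRUE when two data frames have the same shape and
# the same cell values once every column is rendered as text.
# Column names and column types are deliberately ignored.
same_content <- function(a, b) {
  identical(dim(a), dim(b)) &&
    identical(unname(lapply(a, as.character)),
              unname(lapply(b, as.character)))
}

# Stand-ins: a typed frame (as from JSON) and an all-text frame (as from XML)
from.json <- data.frame(Title = c("Oliver Twist", "R In Action"),
                        Pages = c(736L, 608L),
                        stringsAsFactors = FALSE)
from.xml <- data.frame(Title = c("Oliver Twist", "R In Action"),
                       Pages = c("736", "608"),
                       stringsAsFactors = FALSE)

same_content(from.json, from.xml)
```

With the objects built in this document, the equivalent calls would be same_content(book.html, book.json) and same_content(book.json, book.xml).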