Loading packages

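First, let’s load the necessary packages. RCurl’s getURL() downloads the raw files from GitHub, the XML package parses the HTML and XML files, jsonlite parses the JSON file, and DT displays the resulting data frames as interactive tables.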

library(RCurl)

## Loading required package: bitops

library(XML)
library(jsonlite)
library(DT)
library(stringr)
library(tidyr)

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:RCurl':
## 
##     complete

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
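The attach and masking messages above are expected (for example, tidyr’s complete() now masks RCurl’s). Below is a minimal sketch, not the author’s setup, that first checks which of the packages are installed and then attaches them with suppressPackageStartupMessages() so the report stays quiet. In an R Markdown chunk, the knitr option message = FALSE has a similar effect on the rendered output.

```r
# Packages used in this report
pkgs <- c("RCurl", "XML", "jsonlite", "DT", "stringr", "tidyr", "dplyr")

# TRUE for each package that is installed and whose namespace can be loaded
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!available)) {
  message("Missing packages: ", paste(pkgs[!available], collapse = ", "))
}

# Attach the installed packages without printing startup or masking messages
for (p in pkgs[available]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

If a masked function is needed later, calling it with an explicit namespace (for example RCurl::complete) sidesteps the masking entirely.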

HTML

Let’s read the HTML file from GitHub using the getURL() function and then extract its table with readHTMLTable(). Since html_book is a list (readHTMLTable() returns one element per table on the page), I used the data.frame() function to convert it to a data frame.

html <- getURL("https://raw.githubusercontent.com/ekhahm/datascience/master/week7/books.html")
html_book<- readHTMLTable(html)
html_book

## $`NULL`
##                                                                 Title
## 1                            Harry potter and the philosopher's stone
## 2                               Hitchhiker's guide to the galaxy book
## 3 Good Omens: The Nice and Accurate Prophecies of Agnes Nutter, Witch
##                         Author       ISBN                 Genre
## 1                J. K. Rowling 0747532699               Fantasy
## 2                Douglas Adams 0330258648 Comic science fiction
## 3 Terry Pratchetth,Neil Gaiman 057504800X                Horror

class(html_book)

## [1] "list"

html_df <- data.frame(html_book)
html_df
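Because readHTMLTable() returns a list, wrapping it in data.frame() prefixes every column with the list element’s name (here `NULL`, giving names such as NULL.Title). A minimal sketch of the alternative is to take the first element with [[1]]. The inline HTML string below is a hypothetical stand-in for books.html, and header = TRUE is passed explicitly so the first row supplies the column names.

```r
library(XML)

# Hypothetical stand-in for books.html: one table whose first row holds the headers
html_text <- '<html><body><table>
<tr><th>Title</th><th>Author</th></tr>
<tr><td>Book A</td><td>Author A</td></tr>
<tr><td>Book B</td><td>Author B</td></tr>
</table></body></html>'

# Parse the string as HTML, then read every <table> into a list of data frames
doc <- htmlParse(html_text, asText = TRUE)
tables <- readHTMLTable(doc, header = TRUE)

# The page has a single table, so take the first list element directly
books_df <- tables[[1]]
str(books_df)
```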

Now, let’s take a look at the HTML data table.

options(DT.options = list(dom = 't', scrollX = TRUE))
datatable(html_df)

XML

Let’s get the XML file from GitHub using the getURL() function and then parse it with xmlParse(). For each field, I used the getNodeSet() function to find the matching nodes in the XML tree, converted them to a one-column data frame with xmlToDataFrame(), and named that column with setNames(). Lastly, cbind() combines the four columns into a single data frame.

xml <-getURL("https://raw.githubusercontent.com/ekhahm/datascience/master/week7/books.xml")
xml_book <- xmlParse(xml)

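# Extract each field as a one-column data frame and name the column
title  <- setNames(xmlToDataFrame(nodes = getNodeSet(xml_book, "//root/book/Title")), "Title")
author <- setNames(xmlToDataFrame(nodes = getNodeSet(xml_book, "//root/book/Author")), "Author")
isbn   <- setNames(xmlToDataFrame(nodes = getNodeSet(xml_book, "//root/book/ISBN")), "ISBN")
genre  <- setNames(xmlToDataFrame(nodes = getNodeSet(xml_book, "//root/book/Genre")), "Genre")

# Combine the four columns side by side (assumes exactly one of each field per book)
xml_book <- cbind(title, author, isbn, genre)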
xml_book
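As a possible shortcut, xmlToDataFrame() can also be pointed at the <book> nodes themselves: each <book> becomes one row and its child elements become the columns, so the four getNodeSet() calls and the cbind() collapse into one step, and each row stays tied to its own book. This is a minimal sketch on a hypothetical inline XML string that assumes the //root/book/<field> layout used above; the real books.xml may differ.

```r
library(XML)

# Hypothetical XML mirroring the //root/book/<field> layout queried above
xml_text <- '<root>
<book><Title>Book A</Title><Author>Author A</Author><ISBN>111</ISBN><Genre>Fantasy</Genre></book>
<book><Title>Book B</Title><Author>Author B</Author><ISBN>222</ISBN><Genre>Horror</Genre></book>
</root>'

doc <- xmlParse(xml_text, asText = TRUE)

# One row per <book> node; child element names become the column names
books <- xmlToDataFrame(nodes = getNodeSet(doc, "//root/book"),
                        stringsAsFactors = FALSE)
print(books)
```

If some book lacked a field, xmlToDataFrame() would fill that cell with NA rather than shifting values between rows, which the column-by-column cbind() approach cannot guarantee.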

Now, let’s take a look at the XML data table.

datatable(xml_book)

JSON

Let’s read the JSON file from GitHub using the getURL() function and then convert it to an R object with fromJSON(). Since json_book is a list, I used the data.frame() function to convert it to a data frame.

json <- getURL("https://raw.githubusercontent.com/ekhahm/datascience/master/week7/books.json")
json_book <- fromJSON(json)
json_book

json_df <- data.frame(json_book)
json_df
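fromJSON() returns a list when the top level of the file is a JSON object; an array of records stored under one of its keys is simplified to a data frame on its own. This minimal sketch assumes a hypothetical books.json shaped like {"books": [...]} (the key name "books" is an assumption, since the file’s contents are not shown) and extracts that element directly instead of calling data.frame() on the whole list, which would prefix each column name with "books.".

```r
library(jsonlite)

# Hypothetical JSON: a top-level object whose "books" key holds an array of records
json_text <- '{"books": [
  {"Title": "Book A", "Author": "Author A", "ISBN": "111", "Genre": "Fantasy"},
  {"Title": "Book B", "Author": "Author B", "ISBN": "222", "Genre": "Horror"}
]}'

parsed <- fromJSON(json_text)  # a list, because the top level is an object
books <- parsed$books          # the array of records, simplified to a data frame
str(books)
```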

Now, let’s look at the data table of json_df.

datatable(json_df)

Assignment week7

Eunkyu Hahm

10/11/2019
