Data 607 Assignment, Week 6

by Catherine Cho

For this assignment, I created three different files using HTML, XML, and JSON to store information about my three favorite fictional novels. The following content will load the information from each file type into their respective R dataframes.

1. JSON

The following .json file is read from GitHub and parsed into a dataframe.

books.json

# jsonlite provides fromJSON(); rjson is not needed and would only be masked
library(jsonlite)
# fromJSON() accepts a URL and downloads the raw file from GitHub
url <- "https://raw.githubusercontent.com/catcho1632/607-assignment-wk6/main/books.json"
parsedJSON <- fromJSON(url)
# The book records sit under the top-level "Fiction" element
jsondf <- parsedJSON$Fiction
jsondf <- as.data.frame(jsondf)
jsondf
##                Title                   authors         genre             theme
## 1 Brothers Karamazov         Fyodor Dostoevsky psychological  Craving of Faith
## 2       The Talisman Stephen King Peter Straub        horror     Coming of Age
## 3   The Great Gatsby        F Scott Fitzgerald       tragedy Society and Class
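
To illustrate why parsedJSON$Fiction comes back in tabular form, here is a minimal sketch that parses an inline JSON string instead of the remote file. The layout (a top-level "Fiction" key holding an array of records) is an assumption inferred from the code above, since books.json itself is not reproduced here; json_text and books are illustrative names.

```r
library(jsonlite)

# Hypothetical stand-in for books.json: an array of records under "Fiction"
json_text <- '{
  "Fiction": [
    {"Title": "Brothers Karamazov", "authors": "Fyodor Dostoevsky", "genre": "psychological"},
    {"Title": "The Great Gatsby", "authors": "F Scott Fitzgerald", "genre": "tragedy"}
  ]
}'

# By default fromJSON() simplifies an array of same-shaped objects to a data frame
parsed <- fromJSON(json_text)
books <- parsed$Fiction

class(books)
names(books)
```

Under this assumed layout the element is already a data frame, so the as.data.frame() call in the assignment code acts as a harmless safeguard rather than a required step.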

2. HTML

The following books.html file is read from GitHub and parsed with readHTMLTable(), which returns a list of tables rather than a single dataframe.

books.html

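library(RCurl)
library(XML)
# Download the raw HTML text from GitHub
html_raw <- getURL("https://raw.githubusercontent.com/catcho1632/607-assignment-wk6/main/books.html")
# readHTMLTable() returns one list element per <table>; the dataframe itself is htmldf[[1]]
htmldf <- readHTMLTable(html_raw)
htmldf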
## $`NULL`
##                 Book                    Author         Genre             Theme
## 1 Brothers Karamazov         Fyodor Dostoevsky Psychological Craving for Faith
## 2           Talisman Stephen King Peter Straub        Horror     Coming of Age
## 3       Great Gatsby        F Scott Fitzgerald       tragedy Soceity and Class
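
The `$`NULL`` line in the output shows that htmldf is a list holding one unnamed table. A minimal sketch of pulling the dataframe out of that list follows; the inline HTML is a hypothetical stand-in for books.html (the real markup is not shown here), and html_text, tables, and books_html are illustrative names.

```r
library(XML)

# Hypothetical stand-in for books.html: a single <table> with a header row
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Book</th><th>Author</th></tr>",
  "<tr><td>Brothers Karamazov</td><td>Fyodor Dostoevsky</td></tr>",
  "<tr><td>Great Gatsby</td><td>F Scott Fitzgerald</td></tr>",
  "</table></body></html>"
)

# Parse the string as HTML, then extract every <table> into a list
doc <- htmlParse(html_text, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# Take the first (and here only) table to get a plain data frame
books_html <- tables[[1]]
class(books_html)
```

Indexing with [[1]] is enough when the page has a single table; a page with several tables would need the right position or table id.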

3. XML

The following XML file is read from GitHub and parsed into a dataframe.

books.xml

# Only XML and RCurl are used below; httr and curl are not required
library(XML)
library(RCurl)
# Download the raw XML text from GitHub
url_raw <- getURL("https://raw.githubusercontent.com/catcho1632/607-assignment-wk6/main/books.xml")
# xmlToDataFrame() turns each child of the root node into one row
xmldf <- xmlToDataFrame(url_raw)
xmldf
##                Title                   Author         genre             theme
## 1 Brothers Karamazov         Fyodor Dostoesky psychological Craving for Faith
## 2       The Talisman Stephen KingPeter Straub        horror     Coming of Age
## 3   The Great Gatsby       F Scott Fitzgerald       tragedy Society and Class
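
The run-together value "Stephen KingPeter Straub" is what xmlToDataFrame() produces when a field contains nested elements, because it concatenates their text with no separator. The sketch below reproduces that behaviour and shows one possible repair using XPath. The XML layout (nested name elements inside Author) is an assumption, since books.xml is not shown here; xml_text, flat, and authors are illustrative names.

```r
library(XML)

# Hypothetical stand-in for books.xml: the second book has two nested author names
xml_text <- '<books>
  <book>
    <Title>The Great Gatsby</Title>
    <Author><name>F Scott Fitzgerald</name></Author>
  </book>
  <book>
    <Title>The Talisman</Title>
    <Author><name>Stephen King</name><name>Peter Straub</name></Author>
  </book>
</books>'

doc <- xmlParse(xml_text, asText = TRUE)

# Flat conversion: the text of nested elements is pasted together as-is
flat <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
flat$Author

# Repair: collect the names per book and join them with an explicit separator
book_nodes <- getNodeSet(doc, "//book")
authors <- vapply(
  book_nodes,
  function(b) paste(xpathSApply(b, "./Author/name", xmlValue), collapse = ", "),
  character(1)
)
flat$Author <- authors
flat
```

The separator ", " is a design choice; keeping the authors in a list column would be another option if they need to be handled individually later.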

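The three dataframes generated from JSON, HTML, and XML hold the same three books in the same four-column layout, as shown below, but they are not strictly identical. The column names differ (Title/authors, Book/Author, Title/Author), htmldf is a one-element list rather than a bare dataframe, and the source files contain small text differences: "Craving of Faith" versus "Craving for Faith", the misspellings "Soceity" and "Dostoesky", the missing "The" in the HTML titles, and the two XML author names run together as "Stephen KingPeter Straub".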

jsondf
##                Title                   authors         genre             theme
## 1 Brothers Karamazov         Fyodor Dostoevsky psychological  Craving of Faith
## 2       The Talisman Stephen King Peter Straub        horror     Coming of Age
## 3   The Great Gatsby        F Scott Fitzgerald       tragedy Society and Class
htmldf
## $`NULL`
##                 Book                    Author         Genre             Theme
## 1 Brothers Karamazov         Fyodor Dostoevsky Psychological Craving for Faith
## 2           Talisman Stephen King Peter Straub        Horror     Coming of Age
## 3       Great Gatsby        F Scott Fitzgerald       tragedy Soceity and Class
xmldf
##                Title                   Author         genre             theme
## 1 Brothers Karamazov         Fyodor Dostoesky psychological Craving for Faith
## 2       The Talisman Stephen KingPeter Straub        horror     Coming of Age
## 3   The Great Gatsby       F Scott Fitzgerald       tragedy Society and Class
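
One way to test whether two of these dataframes really match is to align their column names and compare them cell by cell. The sketch below rebuilds the JSON and XML results by hand from the printed output above, so it runs without network access; the lowercase column names and the variable names json_books, xml_books, and diff_cells are illustrative choices.

```r
# Values copied from the printed jsondf output
json_books <- data.frame(
  Title   = c("Brothers Karamazov", "The Talisman", "The Great Gatsby"),
  authors = c("Fyodor Dostoevsky", "Stephen King Peter Straub", "F Scott Fitzgerald"),
  genre   = c("psychological", "horror", "tragedy"),
  theme   = c("Craving of Faith", "Coming of Age", "Society and Class"),
  stringsAsFactors = FALSE
)

# Values copied from the printed xmldf output
xml_books <- data.frame(
  Title  = c("Brothers Karamazov", "The Talisman", "The Great Gatsby"),
  Author = c("Fyodor Dostoesky", "Stephen KingPeter Straub", "F Scott Fitzgerald"),
  genre  = c("psychological", "horror", "tragedy"),
  theme  = c("Craving for Faith", "Coming of Age", "Society and Class"),
  stringsAsFactors = FALSE
)

# Use one shared set of column names so only the cell values are compared
common_names <- c("title", "author", "genre", "theme")
names(json_books) <- common_names
names(xml_books) <- common_names

# Strict equality check, then the row/column positions of mismatching cells
identical(json_books, xml_books)
diff_cells <- which(json_books != xml_books, arr.ind = TRUE)
diff_cells
```

The same pattern would apply to the HTML result after extracting the dataframe from its list and renaming its columns.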