Project Overview

This goal of this assignment is to understand and create HTML, XML, and JSON files. Data on three books were collected and stored in the three file types. These files were then imported into separate R dataframes.

Create files

These are the books I chose for this assignment:

  1. Social Pyschology Eighth Edition by Elliot Aronson, Timothy D. Wilson, & Robin M Akert

  2. The Hidden Brain by Shankar Vedantam

  3. The Stranger by Albert Camus

I created the following files including each book’s title, author(s), copyright, publisher, and genre:

Load Libraries

library("RCurl") #To get URL data
library("XML") #To read HTML and XML files
library("jsonlite")  #To read JSON files
library("kableExtra") # To create HTML tables
library("dplyr") #To transform dataframes

Import Files and Create Dataframes

HTML

#Import file
urlHTML <- getURL("https://raw.githubusercontent.com/KatherineEvers/607-Week-7-Assignment/master/books.html")

#Read data in HTML format
booksHTML <- htmlParse(urlHTML)

#Convert to dataframe
booksHtmlDf <- as.data.frame(readHTMLTable(booksHTML))

#Rename columns
names(booksHtmlDf) <- c("Book Title", "Author(s)", "Copyright", "Publisher", "Genre", "Pages")

#Manipulate table and display
booksHtmlDf  %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
  scroll_box(width = "100%", height = "200px")
Book Title Author(s) Copyright Publisher Genre Pages
Social Pyschology Eighth Edition Elliot Aronson, Timothy D. Wilson, Robin M Akert 2013 Pearson Non-fiction 576
The Hidden Brain Shankar Vedantam 2010 Spiegel and Grau Non-fiction 270
The Stranger Albert Camus 1988 Vintage International Fiction 123

XML

#Read in file
urlXML <- getURL("https://raw.githubusercontent.com/KatherineEvers/607-Week-7-Assignment/master/books.xml")

#Read data in XML format
booksXML <- xmlParse(urlXML)

#Convert to dataframe
booksXmlDf <- xmlToDataFrame(booksXML)

#Rename columns
names(booksXmlDf) <- c("Book Title", "Author(s)", "Copyright", "Publisher", "Genre", "Pages")

#Manipulate table and display
booksXmlDf  %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
  scroll_box(width = "100%", height = "200px")
Book Title Author(s) Copyright Publisher Genre Pages
Social Pyschology Eighth Edition Elliot AronsonTimothy D. WilsonRobin M Akert 2013 Pearson Non-fiction 576
The Hidden Brain Shankar Vedantam 2010 Spiegel and Grau Non-fiction 270
The Stranger Albert Camus 1988 Vintage International Fiction 123

JSON

#Read in file
file <- 'http://raw.githubusercontent.com/KatherineEvers/607-Week-7-Assignment/master/books.json' 
con = file(file, "r") 

#Read lines from url
booksJSON <- readLines(con, -1L) 

#Remove EOF markers for proper import into dataframe
booksJSON<- paste(booksJSON, collapse="")

#Read data in JSON format
booksJSON <- parse_json(booksJSON)

#Convert list to dataframe
booksJsonDf  <- bind_rows(booksJSON, .id = "column_label")

#Create subset of dataframe
booksJsonDf <- subset(booksJsonDf, select=c("Title", "Author(s)", "Copyright", "Publisher", "Genre", "Pages"))

#Rename first column
names(booksJsonDf)[1] <- "Book Title"

#Manipulate table and display
booksJsonDf  %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
  scroll_box(width = "100%", height = "200px")
Book Title Author(s) Copyright Publisher Genre Pages
Social Psychology Eighth Edition Elliot Aronson, Timothy D. Wilson, Robin M Akert 2013 Pearson Non-fiction 576
The Hidden Brain Shankar Vedantam 2010 Spiegel and Grau Non-fiction 270
The Stranger Albert Camus 1988 Vintage International Fiction 123

Conclusion

Although the three files had different structures and required different r commands, the three resulting dataframes are identical.