1. Take three files with identical data in HTML, XML, and JSON.
2. Load the data into three separate R data frames.
3. Review the three data frames and determine whether they’re identical.

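1-2. (HTML) Begin by collecting data from the HTML file.

# The HTML file is stored on GitHub for ease of access.
# Assumes the RCurl, XML, jsonlite, stringr, knitr, and magrittr packages are loaded.
# getURL() returns the page contents as a string; readHTMLTable() returns a list of tables.

html.url <- getURL("https://raw.githubusercontent.com/JeremyOBrien16/DATA-607/master/favbooks.html")
data.html <- readHTMLTable(html.url, header = TRUE, as.data.frame = TRUE)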

# We give it a visual check.

data.html %>% kable()
| author | title | genre | publisher_name | price | publication_year | review | pagenums |
|---|---|---|---|---|---|---|---|
| Charles Stross | Palimpsest | Science Fiction | Subterranean | 38.88 | 2011 | Welcome to the Stasis, the clandestine, near-omnipotent organization | 136 |
| Naomi Duguid, Jeffrey Alford | Beyond the Great Wall | Cooking | Artisan | 33.88 | 2008 | Bring home the enticing flavors of the outlying areas of China | 376 |
| Norman Davies | Vanished Kingdoms: The Rise and Fall of States and Nations | History | Viking | 40.00 | 2012 | A dozen-plus exammples from Europearn history constitute this ruminative disquisition of the impermanence of polities | 848 |
# Attempted to use kable_styling to improve look of resulting tables - strangely, nothing seemed to make any changes.
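
Because readHTMLTable() returns a list of tables rather than a single data frame, the table has to be pulled out of that list before it can be compared with the others. The following is a minimal sketch of that step, assuming the XML package is installed. The inline HTML string and the names html_text, tables, and df.html.example are hypothetical stand-ins, not part of the original analysis.

```r
library(XML)

# Hypothetical inline HTML standing in for favbooks.html (two columns only).
html_text <- "<html><body><table>
<tr><th>author</th><th>price</th></tr>
<tr><td>Charles Stross</td><td>38.88</td></tr>
<tr><td>Norman Davies</td><td>40.00</td></tr>
</table></body></html>"

# Parse the string explicitly as text rather than as a file name or URL.
doc <- htmlParse(html_text, asText = TRUE)

# readHTMLTable() returns a list with one element per <table> in the page.
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Pull out the first (and here only) table as a plain data frame.
df.html.example <- tables[[1]]
str(df.html.example)
```

Setting stringsAsFactors = FALSE keeps the cells as character strings instead of factors, which makes later type conversion more direct.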

1-2. (XML) Next, collect data from the XML file.

# Ditto, the XML file is stored on GitHub for ease of access.

xml.url <- getURL("https://raw.githubusercontent.com/JeremyOBrien16/DATA-607/master/favbooks.xml")
# getURL() returns the document contents, so we parse the string as text.
file.xml <- xmlParse(file = xml.url, asText = TRUE)

# We call xmlToDataFrame() from the XML package to get the data into a data frame.

df.xml <- xmlToDataFrame(file.xml)

# We give it a visual check.

df.xml %>% kable()
| author | title | genre | publisher_name | price | publication_year | review | pagenums |
|---|---|---|---|---|---|---|---|
| Charles Stross | Palimpsest | Science Fiction | Subterranean | 38.88 | 2011 | Welcome to the Stasis, the clandestine, near-omnipotent organization | 136 |
| Naomi Duguid, Jeffrey Alford | Beyond the Great Wall | Cooking | Artisan | 33.88 | 2008 | Bring home the enticing flavors of the outlying areas of China | 376 |
| Norman Davies | Vanished Kingdoms: The Rise and Fall of States and Nations | History | Viking | 40.00 | 2012 | A dozen-plus examples from European history constitute this ruminative disquisition of the impermanence of polities | 848 |
# Attempted to use kable_styling to improve look of resulting tables - strangely, nothing seemed to make any changes.
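
XML element content is always text, so xmlToDataFrame() has no way of knowing that a price or a page count is numeric. A minimal sketch of reading the values as strings and converting them afterwards is shown below, assuming the XML package is installed. The element names favbooks and book and the object names xml_text and df.xml.example are hypothetical, since the actual layout of favbooks.xml is not shown here.

```r
library(XML)

# Hypothetical inline XML standing in for favbooks.xml (three fields only).
xml_text <- "<favbooks>
  <book><author>Charles Stross</author><price>38.88</price><pagenums>136</pagenums></book>
  <book><author>Norman Davies</author><price>40.00</price><pagenums>848</pagenums></book>
</favbooks>"

# Parse the string as text, then flatten each <book> into one row.
doc <- xmlParse(xml_text, asText = TRUE)
df.xml.example <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Every column arrives as character, so numeric fields are converted by hand.
df.xml.example$price    <- as.numeric(df.xml.example$price)
df.xml.example$pagenums <- as.integer(df.xml.example$pagenums)

str(df.xml.example)
```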

1-2. (JSON) Lastly, collect data from the JSON file.

# Likewise, the JSON file is stored on GitHub for ease of access.

json.url <- getURL("https://raw.githubusercontent.com/JeremyOBrien16/DATA-607/master/favbooks.json")
file.json <- json.url

# We call jsonlite's fromJSON() to get the data into R, and then coerce the result into a data frame.

data.json <- fromJSON(file.json)
df.json <- as.data.frame(data.json)

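# We clean up the column headers by keeping the letters that follow the dot.
# Note: [[:alpha:]]+ stops at an underscore, which appears to be why publisher_name and
# publication_year show up as publisher and publication in the output.

colnames(df.json) <- str_extract(colnames(df.json), "(?<=\\.)[[:alpha:]]+")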
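
To illustrate how the choice of pattern affects the cleaned headers, here is a small base R sketch that compares a letters-only match with simply dropping the prefix up to the first dot. The prefixed names in nms are assumed for illustration (the favbooks. prefix is a guess at what as.data.frame() produced) and are not taken from the actual file.

```r
# Hypothetical prefixed column names, as as.data.frame() might produce them.
nms <- c("favbooks.author", "favbooks.publisher_name", "favbooks.publication_year")

# The letters-only pattern stops at the first underscore.
short <- regmatches(nms, regexpr("(?<=\\.)[[:alpha:]]+", nms, perl = TRUE))

# Dropping everything up to and including the first dot keeps the full name.
full <- sub("^[^.]*\\.", "", nms)

print(short)
print(full)
```

With the second approach the JSON column names would line up with the HTML and XML ones, which matters if the data frames are later compared column by column.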

# We give it a visual check.

df.json %>% kable()
| author | title | genre | publisher | price | publication | review | pagenums |
|---|---|---|---|---|---|---|---|
| Stross, Charles | Palimpsest | Science Fiction | Subterranean | 38.88 | 2011 | Welcome to the Stasis, the clandestine, near-omnipotent organization | 136 |
| Duguid, Naomi, Alford, Jeffrey | Beyond the Great Wall | Cooking | Artisan | 33.88 | 2008 | Bring home the enticing flavors of the outlying areas of China | 376 |
| Davies, Norman | Vanished Kingdoms: The Rise and Fall of States and Nations | History | Viking | 40.00 | 2012 | A dozen-plus examples from European history constitute this ruminative disquisition of the impermanence of polities | 848 |
# Attempted to use kable_styling to improve look of resulting tables - strangely, nothing seemed to make any changes.
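
When a JSON field holds an array of varying length, such as a book with two authors, fromJSON() keeps that field as a list column. One possible way to flatten it into an ordinary character column is sketched below, assuming the jsonlite package is installed. The top-level key favbooks, the inline JSON, and the "; " separator are assumptions made for illustration.

```r
library(jsonlite)

# Hypothetical inline JSON standing in for favbooks.json (three fields only).
json_text <- '{"favbooks": [
  {"author": ["Stross, Charles"], "title": "Palimpsest", "price": 38.88},
  {"author": ["Duguid, Naomi", "Alford, Jeffrey"], "title": "Beyond the Great Wall", "price": 33.88}
]}'

# fromJSON() returns a list; the array of records is already a data frame.
parsed <- fromJSON(json_text)
books <- parsed$favbooks

# author is a list column because the records hold different numbers of authors.
author_was_list <- is.list(books$author)

# Collapse each record's authors into a single string.
books$author <- vapply(books$author, paste, character(1), collapse = "; ")

str(books)
```

Extracting the element by name also avoids the prefixed column names that as.data.frame() adds to the whole parsed list.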

3. Review the three data frames and determine whether they’re identical.
# Examine the structure of each data frame.
str(data.html)
## List of 1
##  $ NULL:'data.frame':    3 obs. of  8 variables:
##   ..$ author          : Factor w/ 3 levels "Charles Stross",..: 1 2 3
##   ..$ title           : Factor w/ 3 levels "Beyond the Great Wall",..: 2 1 3
##   ..$ genre           : Factor w/ 3 levels "Cooking","History",..: 3 1 2
##   ..$ publisher_name  : Factor w/ 3 levels "Artisan","Subterranean",..: 2 1 3
##   ..$ price           : Factor w/ 3 levels "33.88","38.88",..: 2 1 3
##   ..$ publication_year: Factor w/ 3 levels "2008","2011",..: 2 1 3
##   ..$ review          : Factor w/ 3 levels "A dozen-plus exammples from Europearn history constitute this ruminative disquisition of the impermanence of polities",..: 3 2 1
##   ..$ pagenums        : Factor w/ 3 levels "136","376","848": 1 2 3
str(df.xml)
## 'data.frame':    3 obs. of  8 variables:
##  $ author          : Factor w/ 3 levels "Charles Stross",..: 1 2 3
##  $ title           : Factor w/ 3 levels "Beyond the Great Wall",..: 2 1 3
##  $ genre           : Factor w/ 3 levels "Cooking","History",..: 3 1 2
##  $ publisher_name  : Factor w/ 3 levels "Artisan","Subterranean",..: 2 1 3
##  $ price           : Factor w/ 3 levels "33.88","38.88",..: 2 1 3
##  $ publication_year: Factor w/ 3 levels "2008","2011",..: 2 1 3
##  $ review          : Factor w/ 3 levels "A dozen-plus examples from European history constitute this ruminative disquisition of the impermanence of polities",..: 3 2 1
##  $ pagenums        : Factor w/ 3 levels "136","376","848": 1 2 3
str(df.json)
## 'data.frame':    3 obs. of  8 variables:
##  $ author     :List of 3
##   ..$ : chr "Stross, Charles"
##   ..$ : chr  "Duguid, Naomi" "Alford, Jeffrey"
##   ..$ : chr "Davies, Norman"
##  $ title      : chr  "Palimpsest" "Beyond the Great Wall" "Vanished Kingdoms: The Rise and Fall of States and Nations"
##  $ genre      : chr  "Science Fiction" "Cooking" "History"
##  $ publisher  : chr  "Subterranean" "Artisan" "Viking"
##  $ price      : num  38.9 33.9 40
##  $ publication: int  2011 2008 2012
##  $ review     : chr  "Welcome to the Stasis, the clandestine, near-omnipotent organization" "Bring home the enticing flavors of the outlying areas of China" "A dozen-plus examples from European history constitute this ruminative disquisition of the impermanence of polities"
##  $ pagenums   : int  136 376 848
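
Beyond reading the str() output by eye, the comparison can also be made programmatically. The sketch below uses only base R. The two small data frames xml_like and json_like are hypothetical stand-ins that mimic the column classes seen above, not the real data.

```r
# Hypothetical stand-ins: factor columns (as from XML) versus typed columns (as from JSON).
xml_like <- data.frame(title = factor(c("Palimpsest", "Beyond the Great Wall")),
                       price = factor(c("38.88", "33.88")))
json_like <- data.frame(title = c("Palimpsest", "Beyond the Great Wall"),
                        price = c(38.88, 33.88),
                        stringsAsFactors = FALSE)

# identical() is strict: values, classes, and attributes must all match.
same <- identical(xml_like, json_like)

# A side-by-side view of column classes shows where the two differ.
class_table <- data.frame(column    = names(xml_like),
                          xml_like  = sapply(xml_like, class),
                          json_like = sapply(json_like, class),
                          row.names = NULL)

print(same)
print(class_table)
```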

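The three data frames are not identical. data.html is a list holding a single data frame, while df.xml and df.json are data frames in their own right. The HTML and XML versions store every column as a factor, whereas the JSON version keeps title, genre, publisher, and review as character, price as numeric, publication and pagenums as integer, and author as a list (the second book has two authors). Author names are also written differently ("Charles Stross" versus "Stross, Charles"), two column names differ (publisher_name and publication_year versus publisher and publication), and the HTML review text contains two typos ("exammples", "Europearn") that the XML and JSON versions do not.

As a next step, we could harmonize the data frames by coercing each variable to a common class.

(Gave this a shot but it got messy so moved on.)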
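
For reference, one way the harmonization might be approached is sketched here in base R. The helper harmonize() and the two small data frames are hypothetical, and the target classes (numeric price, integer pagenums) are an assumption, since the intended classes are not listed above.

```r
# Hypothetical helper: bring factor-based and typed data frames to the same classes.
harmonize <- function(df) {
  # Convert factors to character first; as.numeric() on a factor
  # would return the level codes rather than the printed values.
  df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)
  df$price    <- as.numeric(df$price)
  df$pagenums <- as.integer(df$pagenums)
  df
}

# Hypothetical stand-ins for the factor-based and typed versions of the data.
from_factors <- data.frame(price    = factor(c("38.88", "33.88")),
                           pagenums = factor(c("136", "376")))
from_typed   <- data.frame(price    = c(38.88, 33.88),
                           pagenums = c(136L, 376L))

# After harmonizing, the two should compare as identical.
harmonized_match <- identical(harmonize(from_factors), harmonize(from_typed))
print(harmonized_match)
```

The real data would need further steps not covered by this sketch, such as aligning the column names, flattening the author list column, and reconciling the differing author formats and review text.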