In this assignment, we will compare the data frame objects created from the same book data stored in three file formats: HTML, XML, and JSON. At this point, I can’t really tell if I love or hate JSON data (it’s got to be one or the other). I suppose we will find out.

library(RCurl)
library(jsonlite)
library(rvest)
library(XML)
library(janitor)

Here are the files. The JSON data loads quite cleanly, straight into a data.frame object:

html_url <- "https://raw.githubusercontent.com/TheWerefriend/data607/master/assignment7/books.html"
xml_url <- "https://raw.githubusercontent.com/TheWerefriend/data607/master/assignment7/books.xml"
json_url <- "https://raw.githubusercontent.com/TheWerefriend/data607/master/assignment7/books.json"

json_df <- fromJSON(json_url, simplifyDataFrame = TRUE)
json_df
##                                             title        author1
## 1 Super Nezh: Rashid Nezhmetdinov, Chess Assassin   Alex Pishkin
## 2             Tigran Petrosian: Master of Defence    P.H. Clarke
## 3                      Dvoretsky’s Endgame Manual Mark Dvoretsky
##            author2        author3 priceUsed priceNew year
## 1                                   $591.99  $768.57 2000
## 2                                                    1992
## 3 Vladimir Kramnik Karsten Muller    $32.48   $34.95 2003
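As a small illustration of why the JSON load is so painless, here is a sketch using an inline JSON string instead of the real file. It assumes books.json is a top-level array of objects (which the printed output suggests, but I have not shown the raw file here); `toy_json` and its values are made up for the example.

```r
library(jsonlite)

# Hypothetical inline JSON: an array of objects that share the same keys
toy_json <- '[
  {"title": "Book A", "author1": "Author One", "year": 2000},
  {"title": "Book B", "author1": "Author Two", "year": 1992}
]'

# Each object becomes a row and each key becomes a column
toy_df <- fromJSON(toy_json, simplifyDataFrame = TRUE)

# Inspect the column types that jsonlite inferred
str(toy_df)
```

With this shape there is no header row to promote and no tree to walk, which is presumably why `fromJSON()` needs no extra steps.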

These other formats… Not so much.

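# Parse the page, pull out its table, and coerce it to a data frame
html_df <- read_html(html_url) %>%
  html_table() %>%
  data.frame() %>%
  # The header row arrives as the first data row, so promote it to column names
  row_to_names(row_number = 1)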

html_df
##                                             title        author1
## 2 Super Nezh: Rashid Nezhmetdinov, Chess Assassin   Alex Pishkin
## 3             Tigran Petrosian: Master of Defence    P.H. Clarke
## 4                      Dvoretsky’s Endgame Manual Mark Dvoretsky
##            author2        author3 priceUsed priceNew year
## 2                                   $591.99  $768.57 2000
## 3                                                    1992
## 4 Vladimir Kramnik Karsten Muller    $32.48   $34.95 2003

xml_df <- getURL(xml_url) %>%
  xmlParse() %>%
  xmlRoot() %>%
  xmlToDataFrame()

xml_df
##                                             title        author1
## 1 Super Nezh: Rashid Nezhmetdinov, Chess Assassin   Alex Pishkin
## 2             Tigran Petrosian: Master of Defence    P.H. Clarke
## 3                      Dvoretsky’s Endgame Manual Mark Dvoretsky
##            author2        author3 priceUsed priceNew year
## 1                                   $591.99  $768.57 2000
## 2                                                    1992
## 3 Vladimir Kramnik Karsten Muller    $32.48   $34.95 2003
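The printed frames look alike, but printing hides differences such as column types and row names (note that html_df is numbered 2 to 4 after its header row was promoted). Below is a minimal sketch of one way to compare such frames, using toy stand-ins rather than the real data. The names `json_like`, `html_like`, and `normalize` are hypothetical, and the assumption that the HTML route yields text columns while the JSON route yields numbers is mine, not something verified above.

```r
# Toy stand-in for a frame with inferred types and default row names
json_like <- data.frame(
  title = c("Book A", "Book B"),
  year  = c(2000L, 1992L),
  stringsAsFactors = FALSE
)

# Toy stand-in for a frame with text columns and row names left at 2, 3
html_like <- data.frame(
  title = c("Book A", "Book B"),
  year  = c("2000", "1992"),
  stringsAsFactors = FALSE,
  row.names = c("2", "3")
)

# Hypothetical helper: compare everything as text with row names reset to 1..n
normalize <- function(df) {
  df[] <- lapply(df, as.character)
  rownames(df) <- NULL
  df
}

identical(json_like, html_like)                        # FALSE
identical(normalize(json_like), normalize(html_like))  # TRUE
```

Running `str()` on each of the real frames would show which of these differences actually apply to json_df, html_df, and xml_df.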