In this assignment, we will compare the data frame objects created from the same set of books stored in three file formats: HTML, XML, and JSON. At this point, I can’t really tell if I love or hate JSON data (it’s got to be one or the other). I suppose we will find out.
library(RCurl)
library(jsonlite)
library(rvest)
library(XML)
library(janitor)
Here are the three files on GitHub. The JSON data loads cleanly, straight into a data.frame object:
html_url <- "https://raw.githubusercontent.com/TheWerefriend/data607/master/assignment7/books.html"
xml_url <- "https://raw.githubusercontent.com/TheWerefriend/data607/master/assignment7/books.xml"
json_url <- "https://raw.githubusercontent.com/TheWerefriend/data607/master/assignment7/books.json"
json_df <- fromJSON(json_url, simplifyDataFrame = TRUE)
json_df
##                                             title        author1
## 1 Super Nezh: Rashid Nezhmetdinov, Chess Assassin   Alex Pishkin
## 2             Tigran Petrosian: Master of Defence    P.H. Clarke
## 3                      Dvoretsky’s Endgame Manual Mark Dvoretsky
##            author2        author3 priceUsed priceNew year
## 1                                   $591.99  $768.57 1891
## 2                                                    1992
## 3 Vladimir Kramnik Karsten Muller    $32.48   $34.95 2003
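The JSON file needs no cleanup because, with simplifyDataFrame = TRUE, jsonlite turns an array of objects into a data frame: one row per object, one column per key. The printed result suggests books.json is shaped that way, though that is an inference from the output rather than something checked in the file. A minimal sketch on a hypothetical two-record string (made-up titles and authors, not the contents of books.json), including what happens when a record is missing a key:

```r
library(jsonlite)

# Hypothetical JSON array of two book records (illustrative values only).
# The first record has no "author2" key at all.
books_json <- '[
  {"title": "Book A", "author1": "Author One", "year": 2001},
  {"title": "Book B", "author1": "Author Two", "author2": "Author Three", "year": 1999}
]'

# simplifyDataFrame = TRUE turns an array of objects into a data.frame:
# each object becomes a row and each key becomes a column.
books_df <- fromJSON(books_json, simplifyDataFrame = TRUE)
str(books_df)

# A key missing from a record is filled with NA rather than dropped.
books_df$author2
```

An empty string in the file, by contrast, stays an empty string rather than becoming NA, which is presumably why the blank author cells above print as blanks.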
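These other formats… not so much. The HTML table comes in with its header as the first data row, so janitor’s row_to_names() has to promote it to column names, and the XML has to be downloaded with getURL(), parsed, and converted starting from its root node.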
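html_df <- read_html(html_url) %>%
  html_table() %>%              # list of the tables found on the page
  data.frame() %>%              # the page's single table as a data.frame
  row_to_names(row_number = 1)  # promote the first row to column names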
html_df
##                                             title        author1
## 2 Super Nezh: Rashid Nezhmetdinov, Chess Assassin   Alex Pishkin
## 3             Tigran Petrosian: Master of Defence    P.H. Clarke
## 4                      Dvoretsky’s Endgame Manual Mark Dvoretsky
##            author2        author3 priceUsed priceNew year
## 2                                   $591.99  $768.57 2000
## 3                                                    1992
## 4 Vladimir Kramnik Karsten Muller    $32.48   $34.95 2003
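The row_to_names() step, and the row names running 2 to 4 in the output above, suggest that books.html marks its header row with td cells rather than th cells. That is an assumption about the file, not something verified here. By default rvest’s html_table() only turns the first row into column names when that row is made of th cells, and its header argument can force it. A sketch on a small made-up table (the markup and values are hypothetical, not the contents of books.html):

```r
library(rvest)

# Hypothetical table whose header row uses <td> cells instead of <th>
# (an assumption about how books.html might be marked up).
page <- read_html('<table>
  <tr><td>title</td><td>year</td></tr>
  <tr><td>Book A</td><td>2001</td></tr>
  <tr><td>Book B</td><td>1999</td></tr>
</table>')

# Default header = NA: a <td> header row is not recognised, so the columns
# are named X1, X2 and "title"/"year" end up in the first data row.
raw_tbl <- html_table(page)[[1]]
raw_tbl

# header = TRUE forces the first row to become the column names, so no
# row_to_names() step is needed and the row names start at 1.
books_df <- as.data.frame(html_table(page, header = TRUE)[[1]])
books_df
```

If that assumption about books.html holds, html_table(header = TRUE) could stand in for the row_to_names() step; the janitor approach works either way.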
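xml_df <- getURL(xml_url) %>%   # download the raw XML text (RCurl)
  xmlParse() %>%                # parse the text into an XML document
  xmlRoot() %>%                 # step down to the root (top-level) node
  xmlToDataFrame()              # one row per child node, one column per field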
xml_df
##                                             title        author1
## 1 Super Nezh: Rashid Nezhmetdinov, Chess Assassin   Alex Pishkin
## 2             Tigran Petrosian: Master of Defence    P.H. Clarke
## 3                      Dvoretsky’s Endgame Manual Mark Dvoretsky
##            author2        author3 priceUsed priceNew year
## 1                                   $591.99  $768.57 2000
## 2                                                    1992
## 3 Vladimir Kramnik Karsten Muller    $32.48   $34.95 2003
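Once all three load, the printed frames are nearly identical. Two differences show up in the output: html_df keeps the row names 2 to 4 left over from row_to_names(), and the JSON frame gives Super Nezh a year of 1891 where the HTML and XML frames say 2000. Rather than eyeballing printouts, a small base-R helper can list the cells that disagree. diff_cells() below is a hypothetical sketch, not part of any package; it compares every column as text and reports rows by position. The sample data are the title (abbreviated) and year columns transcribed from the printouts above.

```r
# Hypothetical helper (not from any package): list the cells where two
# data frames with the same shape and column names disagree.
diff_cells <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)), identical(names(a), names(b)))
  # Compare everything as text, so 2000 and "2000" count as equal.
  a[] <- lapply(a, as.character)
  b[] <- lapply(b, as.character)
  ma <- as.matrix(a)
  mb <- as.matrix(b)
  # Rows are reported by position, so row names such as 2 to 4 are ignored.
  # Cells that are NA in either frame compare as NA and are skipped by which().
  hits <- which(ma != mb, arr.ind = TRUE)
  data.frame(row = unname(hits[, "row"]),
             column = names(a)[hits[, "col"]],
             a = ma[hits], b = mb[hits],
             stringsAsFactors = FALSE)
}

# Title (abbreviated) and year columns transcribed from the printouts above.
json_part <- data.frame(title = c("Super Nezh", "Tigran Petrosian", "Endgame Manual"),
                        year = c(1891, 1992, 2003))
html_part <- data.frame(title = c("Super Nezh", "Tigran Petrosian", "Endgame Manual"),
                        year = c("2000", "1992", "2003"),
                        row.names = c("2", "3", "4"))

# One row: row 1, column "year", a = "1891", b = "2000".
diff_cells(json_part, html_part)
```

With the real objects it should be callable as diff_cells(json_df, html_df) or diff_cells(json_df, xml_df), assuming the three frames share the same column names, as the printouts suggest.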