For this assignment, I picked three books from one of my favorite genres, horror fiction. I stored each book's information in three files (HTML, XML, and JSON) and read each file into R.
I loaded the tidyverse package and used a pipe to keep the code that reads each file readable. Note that the pipe used below (|>) is R's native pipe, available in base R from version 4.1, so it does not itself depend on the tidyverse. The package used for each file format is loaded immediately before the code that reads that format.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(htmltab)
file <- "https://raw.githubusercontent.com/josh1den/DATA-607/main/HW/HW7/DATA607_HW7.html"
html <- htmltab(file, which = 1) |>
  as.data.frame()
print(html)
## title author
## 2 Other Terrors: An Inclusive Anthology Vince Liaguno and Rena Mason
## 3 The Book Of Accidents Chuck Wendig
## 4 The Rim Of Morning: Two Tales of Cosmic Horror William Sloane
## genre published pages
## 2 Horror, Fiction 2022 363
## 3 Horror, Fiction 2021 560
## 4 Horror, Fiction 2015 480
library(XML)
library(xml2)
file <- "https://raw.githubusercontent.com/josh1den/DATA-607/main/HW/HW7/DATA607_HW7.xml"
xml <- xml2::read_xml(file) |>
  XML::xmlParse() |>
  XML::xmlToDataFrame()
print(xml)
## title author
## 1 Other Terrors: An Inclusive Anthology Vince Liaguno and Rena Mason
## 2 The Book Of Accidents Chuck Wendig
## 3 The Rim Of Morning: Two Tales of Cosmic Horror William Sloane
## genre published pages
## 1 Horror, Fiction 2022 363
## 2 Horror, Fiction 2021 560
## 3 Horror, Fiction 2015 480
library(rjson)
file <- "https://raw.githubusercontent.com/josh1den/DATA-607/main/HW/HW7/DATA607_HW7.json"
json <- fromJSON(file=file) |>
  as.data.frame()
print(json)
## books.title books.author
## 1 Other Terrors: An Inclusive Anthology Vince Liaguno and Rena Mason
## books.genre books.published books.pages books.title.1
## 1 Horror, Fiction 2022 363 The Book Of Accidents
## books.author.1 books.genre.1 books.published.1 books.pages.1
## 1 Chuck Wendig Horror, Fiction 2021 560
## books.title.2 books.author.2 books.genre.2
## 1 The Rim Of Morning: Two Tales of Cosmic Horror William Sloane Horror, Fiction
## books.published.2 books.pages.2
## 1 2015 480
The HTML and XML data frames match one another apart from their row numbering, but the JSON data frame does not: all three books are spread across a single row of 15 columns. Upon investigation, I found that the structure of my JSON file was the cause, and I was unable to resolve it in code. After restructuring the JSON file, I was able to read it into R and produce output equivalent to that of the XML and HTML files:
file_v2 <- "https://raw.githubusercontent.com/josh1den/DATA-607/main/HW/HW7/DATA607_HW7_V2.json"
json_v2 <- fromJSON(file=file_v2) |>
  as.data.frame()
print(json_v2)
## title author
## 1 Other Terrors: An Inclusive Anthology Vince Liaguno and Rena Mason
## 2 The Book Of Accidents Chuck Wendig
## 3 The Rim Of Morning: Two Tales of Cosmic Horror William Sloane
## genre published pages
## 1 Horror, Fiction 2022 363
## 2 Horror, Fiction 2021 560
## 3 Horror, Fiction 2015 480
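The equivalence described above can also be checked in code rather than by eye. The sketch below uses two small hypothetical data frames, html_like and xml_like, as stand-ins for the real objects (the values are made up), and assumes every column is read as character. It resets the row names first, since the HTML data frame above is numbered from 2, and then compares the contents.

```r
# Hypothetical stand-ins for the data frames above; the values are illustrative
html_like <- data.frame(
  title = c("Book A", "Book B"),
  pages = c("100", "200"),
  row.names = c("2", "3")   # mimics the row numbering of the htmltab output
)
xml_like <- data.frame(
  title = c("Book A", "Book B"),
  pages = c("100", "200")
)

# Drop the custom row names so that only the contents are compared
rownames(html_like) <- NULL

# all.equal() returns TRUE or a description of the differences
same <- isTRUE(all.equal(html_like, xml_like))
print(same)
```

If a comparison like this fails on the real objects, column types are one possible cause, since a JSON parser may return numbers where the HTML and XML readers return text.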
Although I did not resolve the output problem for the first version of the JSON file, one insight I gleaned is that the structure of a JSON file strongly affects how it is read into a data frame. Understanding the structure of the source file is essential to writing code that produces the desired output.
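To illustrate how structure drives the result, here is a minimal sketch and not a tested fix for the original file. It assumes the first JSON file stores its records as an array under a books key, which is inferred only from the books. prefixes in the output above. The inline json_text and its two-field records are hypothetical. Calling as.data.frame() on the whole parsed list yields one wide row, while converting each record separately and stacking the results yields one row per book.

```r
library(rjson)

# Hypothetical JSON with an array of records under a "books" key
json_text <- '{"books": [
  {"title": "Book A", "pages": 100},
  {"title": "Book B", "pages": 200}
]}'

parsed <- fromJSON(json_str = json_text)

# Converting the whole nested list at once spreads every field across one row
wide <- as.data.frame(parsed)

# Converting each record to a one-row data frame and stacking them
# gives one row per book
books <- do.call(rbind, lapply(parsed$books, as.data.frame))

print(dim(wide))
print(books)
```

Whether this applies to the original file depends on its actual layout, which should be confirmed by inspecting the parsed list with str() first.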