The purpose of this assignment is to familiarize ourselves with different formats for stored data. I have created three identical tables in three different file types (XML, HTML, and JSON). We will now load these files into our R environment.
Here we load our XML file and take a look at its structure.
library(xml2)
library(XML)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
books_XML <- read_xml(url("https://raw.githubusercontent.com/bwolin99/TestRepo/refs/heads/main/Assignment%207/Books.XML"))
xml_structure(books_XML)
## <table>
## <book>
## <Title>
## {text}
## <Author1>
## {text}
## <Author2>
## {text}
## <Genre>
## {text}
## <book>
## <Title>
## {text}
## <Author1>
## {text}
## <Author2>
## <Genre>
## {text}
## <book>
## <Title>
## {text}
## <Author1>
## {text}
## <Author2>
## <Genre>
## {text}
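Next we extract each column and combine the results into an R data frame. Note that data.frame() rewrites the names "Author 1" and "Author 2" as Author.1 and Author.2, because its check.names argument defaults to TRUE.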
Names <- books_XML %>%
xml_find_all("//Title") %>%
xml_text()
Author1 <- books_XML %>%
xml_find_all("//Author1") %>%
xml_text()
Author2 <- books_XML %>%
xml_find_all("//Author2") %>%
xml_text()
Genre <- books_XML %>%
xml_find_all("//Genre") %>%
xml_text()
books_xml_final <- data.frame("Title" = Names, "Author 1" = Author1, "Author 2" = Author2, "Genre" = Genre)
books_xml_final
## Title Author.1 Author.2
## 1 Stardance Spider Robinson Jeanne Robinson
## 2 Hyperion Dan Simmons
## 3 Do Androids Dream of Electric Sheep? Philip K. Dick
## Genre
## 1 Sci Fi
## 2 Sci Fi, Thriller
## 3 Sci Fi, Thriller
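The column-by-column extraction above works because every book contains all four elements, so the vectors have equal length. If a book lacked an element entirely, the vectors would no longer line up. A minimal sketch of a per-book alternative follows, assuming the element names shown in the structure output; the inline XML string and the names `books_xml_string`, `field`, and `books_df` are illustrative and are not the contents of the real file.

```r
library(xml2)

# Hypothetical inline XML mirroring the structure printed above;
# the third book deliberately has no <Author2> element at all.
books_xml_string <- '<table>
  <book><Title>Stardance</Title><Author1>Spider Robinson</Author1><Author2>Jeanne Robinson</Author2><Genre>Sci Fi</Genre></book>
  <book><Title>Hyperion</Title><Author1>Dan Simmons</Author1><Author2/><Genre>Sci Fi, Thriller</Genre></book>
  <book><Title>Do Androids Dream of Electric Sheep?</Title><Author1>Philip K. Dick</Author1><Genre>Sci Fi, Thriller</Genre></book>
</table>'

doc <- read_xml(books_xml_string)
books <- xml_find_all(doc, "//book")

# xml_find_first() returns one result per <book>, so a missing element
# becomes NA instead of shortening the vector.
field <- function(name) xml_text(xml_find_first(books, paste0("./", name)))

books_df <- data.frame(
  Title = field("Title"),
  Author1 = field("Author1"),
  Author2 = field("Author2"),
  Genre = field("Genre"),
  stringsAsFactors = FALSE
)
```

In this sketch an empty element still yields an empty string, while an absent element yields NA, so the two cases remain distinguishable.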
Here we load our HTML table and tidy it into a neat data frame.
library(rvest)
books_html <- read_html(url("https://raw.githubusercontent.com/bwolin99/TestRepo/refs/heads/main/Assignment%207/Books.html"))
books_html_final <- books_html %>%
html_element("body") %>%
html_table()
books_html_final
## # A tibble: 3 × 4
## Title `Author 1` `Author 2` Genre
## <chr> <chr> <chr> <chr>
## 1 Stardance Spider Robinson "Jeanne Robinson" Sci Fi
## 2 Hyperion Dan Simmons "" Sci Fi…
## 3 Do Androids Dream of Electric Sheep? Philip K. Dick "" Sci Fi…
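The empty second-author cells come through as empty strings rather than missing values. A minimal sketch of one way to recode them, assuming a table shaped like the one above; the inline HTML is a shortened stand-in for Books.html, and `books_page` and `books_tbl` are illustrative names.

```r
library(rvest)

# Hypothetical inline HTML standing in for Books.html (shortened to two rows).
books_page <- read_html('<html><body><table>
  <tr><th>Title</th><th>Author 1</th><th>Author 2</th><th>Genre</th></tr>
  <tr><td>Stardance</td><td>Spider Robinson</td><td>Jeanne Robinson</td><td>Sci Fi</td></tr>
  <tr><td>Hyperion</td><td>Dan Simmons</td><td></td><td>Sci Fi, Thriller</td></tr>
</table></body></html>')

# Select the <table> node directly, then convert it to a tibble.
books_tbl <- html_table(html_element(books_page, "table"))

# Convert to a plain data frame and recode empty strings as NA.
books_tbl <- as.data.frame(books_tbl)
books_tbl[books_tbl == ""] <- NA
```

Selecting the table node itself, instead of the whole body, keeps the call unambiguous if a page ever contains more than one table.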
With the jsonlite library, we don’t even need to tidy our JSON file; the fromJSON function does this for us.
library(jsonlite)
books_json <- fromJSON(url("https://raw.githubusercontent.com/bwolin99/TestRepo/refs/heads/main/Assignment%207/Books.json"))
books_json
## Title Author1 Author2
## 1 Stardance Spider Robinson Jeanne Robinson
## 2 Hyperion Dan Simmons
## 3 Do Androids Dream of Electric Sheep? Philip K. Dick
## Genre
## 1 Sci Fi
## 2 Sci Fi, Thriller
## 3 Sci Fi, Thriller
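The automatic tidying comes from fromJSON simplifying an array of records into a data frame. A minimal sketch of that behaviour, assuming Books.json is an array of objects with identical keys (the file itself is not shown here); the inline JSON and the names `books_json_text`, `as_df`, and `as_list` are illustrative.

```r
library(jsonlite)

# Hypothetical inline JSON: an array of objects with identical keys.
books_json_text <- '[
  {"Title": "Stardance", "Author1": "Spider Robinson", "Author2": "Jeanne Robinson", "Genre": "Sci Fi"},
  {"Title": "Hyperion", "Author1": "Dan Simmons", "Author2": "", "Genre": "Sci Fi, Thriller"}
]'

# Default behaviour: the array of records is simplified to a data frame.
as_df <- fromJSON(books_json_text)

# With simplification switched off, the same input stays a list of records.
as_list <- fromJSON(books_json_text, simplifyVector = FALSE)
```

With simplification turned off, each book would have to be bound into rows by hand, which is the step the default setting saves.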