Loading the HTML, JSON, & XML files into R
###HTML
##Load rvest (for read_html and html_table) and knitr (for kable)
library(rvest)
library(knitr)
##read_html returns the parsed HTML document (its head and body), not a table, so I still need to extract the table from it
books_html <- read_html("https://raw.githubusercontent.com/Mattr5541/DATA-607/main/Week%206/books.html")
##html_table returns a list with one element per table found in the document. I still need to take the first element and convert it into an R dataframe for completeness
books_html <- html_table(books_html)
books_html <- as.data.frame(books_html[[1]])
kable(books_html)
| DUNE | Frank Herbert | Sci-Fi | 1965 | 20,000,000 | y |
| The Three Body Problem | Liu Cixin | Sci-Fi | 2008 | 8,000,000 | n |
| The Long Earth | Terry Pratchett and Stephen Baxter | Sci-Fi | 2012 | 100,000,000 | n |
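The commas in the sales column mean it is stored as text rather than as a number. Below is a minimal sketch of how that could be cleaned up, assuming a small inline table in place of books.html. The column names `Title` and `Sales` are placeholders and may not match the real file.

```r
library(rvest)

# Hypothetical stand-in for books.html; the column names are assumed
page <- read_html("<table>
  <tr><th>Title</th><th>Sales</th></tr>
  <tr><td>DUNE</td><td>20,000,000</td></tr>
  <tr><td>The Three Body Problem</td><td>8,000,000</td></tr>
</table>")

# html_table() on a whole document returns a list, one element per <table>
tables <- html_table(page)
books <- as.data.frame(tables[[1]])

# Strip the thousands separators, then convert the text to numbers
books$Sales <- as.numeric(gsub(",", "", books$Sales, fixed = TRUE))
```

With a numeric column, sums and comparisons work directly, and the commas can be added back at display time.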
###JSON
##This code will load the JSON table into R (fromJSON comes from the jsonlite package). However, the sales values are displayed in scientific notation. format() with scientific = FALSE fixes that, although it also converts the column to character
library(jsonlite)
books_json <- fromJSON("https://raw.githubusercontent.com/Mattr5541/DATA-607/main/Week%206/books_new.json")
books_json$`Sales (estimate)` <- format(books_json$`Sales (estimate)`, scientific = FALSE)
kable(books_json)
| DUNE | Frank Herbert | Sci-Fi | 1965 | 20000000 | y |
| The Three Body Problem | Liu Cixin | Sci-Fi | 2008 | 8000000 | n |
| The Long Earth | Terry Pratchett and Stephen Baxter | Sci-Fi | 2012 | 100000000 | n |
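Because format() returns character strings, the formatted column can no longer be used for arithmetic. Below is a minimal sketch of one alternative, assuming an inline JSON string in place of books_new.json. Only the `Sales (estimate)` key is taken from the code above; the `Title` key and the `Sales_display` column are assumptions for illustration. It keeps the numeric column and adds a separate column for display.

```r
library(jsonlite)

# Hypothetical stand-in for books_new.json
json_text <- '[
  {"Title": "DUNE", "Sales (estimate)": 20000000},
  {"Title": "The Three Body Problem", "Sales (estimate)": 8000000}
]'
books <- fromJSON(json_text)

# Keep the numeric column untouched for calculations
sales <- books$`Sales (estimate)`

# Build a separate character column used only for display
books$Sales_display <- format(sales, big.mark = ",", scientific = FALSE, trim = TRUE)
```

The `big.mark = ","` argument also makes the displayed values match the comma-separated style of the HTML table.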
###XML
##And finally, let's load in an XML table (read_xml and the xml_* functions come from the xml2 package)
library(xml2)
books_xml <- read_xml("https://raw.githubusercontent.com/Mattr5541/DATA-607/main/Week%206/books.xml")
##But read_xml returns the document as a tree of nodes rather than a table, so I'll have to extract each field by its element name. This is a little less straightforward than the html table, but it should be manageable
Title <- xml_text(xml_find_all(books_xml, "//Title"))
Author <- xml_text(xml_find_all(books_xml, "//Author"))
Genre <- xml_text(xml_find_all(books_xml, "//Genre"))
Release <- xml_text(xml_find_all(books_xml, "//Release"))
`Sales (estimate)` <- xml_text(xml_find_all(books_xml, "//Sales"))
Adapted_Into_Movie_Format <- xml_text(xml_find_all(books_xml, "//Adapted_Into_Movie_Format"))
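##check.names = FALSE keeps the `Sales (estimate)` column name as written; by default data.frame would rename it to Sales..estimate.
books_xml <- data.frame(Title = Title,
                        Author = Author,
                        Genre = Genre,
                        Release = as.numeric(Release),
                        `Sales (estimate)` = `Sales (estimate)`,
                        Adapted_Into_Movie_Format = Adapted_Into_Movie_Format,
                        check.names = FALSE)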
kable(books_xml)
| DUNE | Frank Herbert | Sci-Fi | 1965 | 20,000,000 | y |
| The Three Body Problem | Liu Cixin | Sci-Fi | 2008 | 8,000,000 | n |
| The Long Earth | Terry Pratchett and Stephen Baxter | Sci-Fi | 2012 | 100,000,000 | n |
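Writing one xml_find_all call per column works for a small table, but it has to be updated whenever a field is added. Below is a minimal sketch of a more generic approach, assuming each record is wrapped in its own element. The `Books` and `Book` element names and the inline XML are hypothetical and may not match the layout of books.xml.

```r
library(xml2)

# Hypothetical stand-in for books.xml; the Books/Book element names are assumed
doc <- read_xml("<Books>
  <Book><Title>DUNE</Title><Author>Frank Herbert</Author><Sales>20,000,000</Sales></Book>
  <Book><Title>The Three Body Problem</Title><Author>Liu Cixin</Author><Sales>8,000,000</Sales></Book>
</Books>")

# One node per record
records <- xml_find_all(doc, "//Book")

# For each record, turn its child elements into a named list (name -> text)
rows <- lapply(records, function(rec) {
  fields <- xml_children(rec)
  setNames(as.list(xml_text(fields)), xml_name(fields))
})

# Convert each named list to a one-row dataframe and stack the rows
books <- do.call(rbind, lapply(rows, as.data.frame, stringsAsFactors = FALSE))
```

Every column comes back as character here, so numeric fields would still need an explicit conversion such as as.numeric.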
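To summarize the results of this exercise: the three dataframes were
rather similar, but not entirely identical, and required some cleaning
for standardization purposes. The most visible difference is the sales
column, which came through as comma-formatted text from the HTML and XML
files (20,000,000) but as a number from the JSON file (20000000 once
scientific notation was turned off).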
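One way to check whether the dataframes really match after cleaning is to standardize the differing column and compare them with all.equal. The sketch below uses two small hand-made dataframes in place of the real ones; the `standardize` helper and the `Sales` column name are assumptions for illustration.

```r
# Hypothetical stand-ins for the HTML-derived and JSON-derived dataframes
from_html <- data.frame(Title = c("DUNE", "The Long Earth"),
                        Sales = c("20,000,000", "100,000,000"),
                        stringsAsFactors = FALSE)
from_json <- data.frame(Title = c("DUNE", "The Long Earth"),
                        Sales = c(20000000, 100000000),
                        stringsAsFactors = FALSE)

# Hypothetical helper: drop thousands separators and store Sales as numeric
standardize <- function(df) {
  df$Sales <- as.numeric(gsub(",", "", as.character(df$Sales), fixed = TRUE))
  df
}

# Compare before and after standardizing
same_before <- isTRUE(all.equal(from_html, from_json))
same_after <- isTRUE(all.equal(standardize(from_html), standardize(from_json)))
```

In this sketch `same_before` is FALSE because one Sales column is character and the other numeric, while `same_after` is TRUE once both are numeric.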