In this assignment, our objective is to choose three books (at least one of which has more than one author) and record a few attributes that may be interesting. The books I’ve selected come from a fantastic series: The Hitchhiker’s Guide to the Galaxy. There are six books. The first five were written by the series’ creator, Douglas Adams. The sixth was written by Eoin Colfer with the support of Jane Belson (Adams’s widow). We’ll credit Colfer and Belson as authors of that book, along with a token credit for Douglas Adams, so it will have three authors in our dataset. The books are chosen mostly at random. The third defaults to “And Another Thing…” because it has more than one author (we forced it to; it didn’t get much of a choice). The other two are drawn at random from Adams’s five. Here’s the random selection:
```r
booklist <- c("The Hitchhiker's Guide to the Galaxy", "The Restaurant at the End of the Universe", "Life, the Universe and Everything", "So Long, and Thanks for All the Fish", "Mostly Harmless")
# Draw two of Adams's five books at random
sample(booklist, 2)
```
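`sample()` gives a different pair on every run, so the draw above can't be repeated exactly as written. If you want reruns to give the same draw, one option is to fix the random seed first. This is a minimal sketch. The seed value `607` is an arbitrary choice of mine, so it will not necessarily reproduce the pair reported below.

```r
booklist <- c("The Hitchhiker's Guide to the Galaxy",
              "The Restaurant at the End of the Universe",
              "Life, the Universe and Everything",
              "So Long, and Thanks for All the Fish",
              "Mostly Harmless")

# Fix the random number generator state (607 is an arbitrary seed)
set.seed(607)
picks <- sample(booklist, 2)

# Re-seeding with the same value repeats exactly the same draw
set.seed(607)
again <- sample(booklist, 2)
identical(picks, again)  # TRUE: same seed, same draw
picks
```

R 3.6.0 changed the default algorithm behind `sample()`, so the same seed can give a different draw on older R versions.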
The results: “The Hitchhiker’s Guide to the Galaxy” and “Life, the Universe and Everything”! Congrats to the winners. Now we need to select a few attributes:
```r
bookattributes <- c("book cover color", "main character", "love interest", "main planet", "publish date", "adapted for radio", "bad guys", "main planet/location/ship", "publisher", "goodreads book rating", "number of pages")
sample(bookattributes, 4)
```
The results: “main planet”, “publisher”, “book cover color”, and “bad guys” (which will be the name of the main bad guys or bad guy species/group).
Here’s what the table looks like in R. I also re-created it separately in HTML, JSON, and XML formats (see the import section below for the links).
```r
library(knitr)  # for kable()

# Start from an empty 3 x 6 data frame and fill it column by column
books <- data.frame(matrix(vector(), 3, 6), stringsAsFactors = FALSE)
colnames(books) <- c("name", "author", "planet", "publisher", "color", "baddie")
books$name <- c("The Hitchhiker's Guide to the Galaxy", "Life, the Universe and Everything", "And Another Thing...")
books$author <- c("Douglas Adams", "Douglas Adams", "Eoin Colfer, Jane Belson, Douglas Adams")
books$planet <- c("Space", "Krikkit", "Earth...ish")
books$publisher <- c("Del Rey Books", "Del Rey Books", "Hyperion")
books$color <- c("green", "purple", "blue")
books$baddie <- c("Vogons", "People of Krikkit", "Vogons")
kable(books)
```
name | author | planet | publisher | color | baddie |
---|---|---|---|---|---|
The Hitchhiker’s Guide to the Galaxy | Douglas Adams | Space | Del Rey Books | green | Vogons |
Life, the Universe and Everything | Douglas Adams | Krikkit | Del Rey Books | purple | People of Krikkit |
And Another Thing… | Eoin Colfer, Jane Belson, Douglas Adams | Earth…ish | Hyperion | blue | Vogons |
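Pre-allocating an empty matrix and then filling each column works. The same table can also be built in a single `data.frame()` call, which keeps each column's values next to its name. Here is a sketch that should give the same six character columns. Afterwards, `kable(books)` displays it as before.

```r
books <- data.frame(
  name      = c("The Hitchhiker's Guide to the Galaxy",
                "Life, the Universe and Everything",
                "And Another Thing..."),
  author    = c("Douglas Adams", "Douglas Adams",
                "Eoin Colfer, Jane Belson, Douglas Adams"),
  planet    = c("Space", "Krikkit", "Earth...ish"),
  publisher = c("Del Rey Books", "Del Rey Books", "Hyperion"),
  color     = c("green", "purple", "blue"),
  baddie    = c("Vogons", "People of Krikkit", "Vogons"),
  stringsAsFactors = FALSE  # keep text as character rather than factor
)
str(books)
```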
To continue our stampede toward the objective, we need to export this table to HTML, XML, and JSON. First, let’s load some packages that will help:
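```r
library(jsonlite)  # toJSON(), fromJSON()
library(XML)       # xmlParse(), xmlToList()
library(plyr)      # ldply()
library(RCurl)     # getURL()
library(htmltab)   # htmltab()
library(xtable)    # xtable(), for the HTML export
```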
Now we’ll use these packages to export our table into the various file types:
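```r
# HTML file: printing an xtable object with type = "html" writes an HTML table
books.html1 <- print(xtable(books), type = "html", file = "books.html")

# XML file
# I couldn't get an XML export working, so the XML file imported below was written by hand.

# JSON file: serialize the data frame (one JSON object per row) and write it to disk
books.json1 <- toJSON(books, pretty = TRUE)
file.output <- file("books.json")
writeLines(books.json1, file.output)
close(file.output)
```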
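The XML export is the one step that didn't get done above. Here is a minimal sketch of one way to do it with the `XML` package that was already loaded. It assumes a layout with one `<book>` element per row, one child element per column, and a numeric `id` attribute. That layout is my own guess at a reasonable shape, not necessarily the one the hand-made file uses. The helper `books_to_xml` and the `demo` table are hypothetical names for illustration.

```r
library(XML)

# Hypothetical helper: write a data frame as <books><book id="i">...</book></books>
books_to_xml <- function(df, file) {
  doc  <- newXMLDoc()
  root <- newXMLNode("books", doc = doc)  # document root element
  for (i in seq_len(nrow(df))) {
    # One <book> per row, numbered through an id attribute
    book <- newXMLNode("book", attrs = c(id = as.character(i)), parent = root)
    for (col in names(df)) {
      # One child element per column, holding the cell value as text
      newXMLNode(col, as.character(df[i, col]), parent = book)
    }
  }
  saveXML(doc, file = file)
  invisible(file)
}

# Small stand-in table so the sketch runs on its own
demo <- data.frame(name   = c("Life, the Universe and Everything", "And Another Thing..."),
                   baddie = c("People of Krikkit", "Vogons"),
                   stringsAsFactors = FALSE)
out <- books_to_xml(demo, tempfile(fileext = ".xml"))
cat(readLines(out), sep = "\n")
```

On the full table, `books_to_xml(books, "books.xml")` would write `books.xml` next to the other exports. This layout keeps the three authors of “And Another Thing…” in a single comma-separated `<author>` element.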
I also uploaded hand-made versions of the files (created in Notepad++) to my GitHub account, and those are the versions we import back into R below. Where a local export exists, importing the local file is shown in a comment as the alternative. XML has no local alternative, because I couldn’t export it.

Disclaimer: I cheated a bit here. Instead of using the HTML file we exported above, I hand-made a version to show off the differences between the formats.
```r
# HTML file
# alternative: books.html <- htmltab(doc = "books.html")
html.url <- getURL("https://raw.githubusercontent.com/chrisgmartin/DATA607/master/books.html")
books.html2 <- htmltab(doc = html.url)
## Argument 'which' was left unspecified. Choosing first table.
kable(books.html2)
```
| | name | author | planet | publisher | color | baddie |
|---|---|---|---|---|---|---|
| 2 | The Hitchhicker’s Guide to the Galaxy | Douglas Adams | Space | Del Rey Books | green | Vogons |
| 3 | Life, the Universe and Everything | Douglas Adams | Krikkit | Del Rey Books | purple | People of Krikkit |
| 4 | And Another Thing… | Eoin Colfer | Earth…ish | Hyperion | blue | Vogons |
| 5 | And Another Thing… | Jane Belson | Earth…ish | Hyperion | blue | Vogons |
| 6 | And Another Thing… | Douglas Adams | Earth…ish | Hyperion | blue | Vogons |
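The hand-made HTML file stores one row per author, which is why “And Another Thing…” appears three times. The R data frame instead packs all three names into one comma-separated `author` cell. The base-R sketch below reshapes that cell into the same long layout. It assumes authors are separated by `", "` exactly. `books_demo` is a hypothetical two-column stand-in for `books`.

```r
# Stand-in with two of the columns of `books`
books_demo <- data.frame(
  name   = c("The Hitchhiker's Guide to the Galaxy", "And Another Thing..."),
  author = c("Douglas Adams", "Eoin Colfer, Jane Belson, Douglas Adams"),
  stringsAsFactors = FALSE
)

# Split each author string into a character vector of names
authors <- strsplit(books_demo$author, ", ", fixed = TRUE)

# Repeat each book's row once per author, then fill in one author per row
long <- books_demo[rep(seq_len(nrow(books_demo)), lengths(authors)), ]
long$author <- unlist(authors)
rownames(long) <- NULL
long
```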
```r
# XML file
xml.url <- getURL("https://raw.githubusercontent.com/chrisgmartin/DATA607/master/books.xml", ssl.verifyPeer=FALSE)
books.xml <- xmlParse(xml.url)
# xmlToList() returns one list per <book>; ldply() binds them into a data frame
books.xml2 <- ldply(xmlToList(books.xml), data.frame)
kable(books.xml2)
```
.id | name | author | planet | publisher | color | baddie | .attrs |
---|---|---|---|---|---|---|---|
book | The Hitchhiker’s Guide to the Galaxy | Douglas Adams | Space | Del Rey Books | green | Vogons | 1 |
book | Life, the Universe and Everything | Douglas Adams | Krikkit | Del Rey Books | purple | People of Krikkit | 2 |
book | And Another Thing… | Eoin Colfer | Earth…ish | Hyperion | blue | Vogons | 3 |
book | And Another Thing… | Jane Belson | Earth…ish | Hyperion | blue | Vogons | 3 |
book | And Another Thing… | Douglas Adams | Earth…ish | Hyperion | blue | Vogons | 3 |
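`ldply()` adds two columns to this result. The `.id` column holds the element name (`book`), and `.attrs` holds the book's attribute from the hand-made file. The sketch below tidies the result and collapses the authors back into one row per book. It assumes the column names shown in the table above. `xml_df` is a hypothetical stand-in with a subset of those columns. `.attrs` goes through `as.character()` first because older R versions may have read it in as a factor.

```r
# Stand-in with the shape of books.xml2 (a subset of its columns)
xml_df <- data.frame(
  .id    = "book",
  name   = c("Life, the Universe and Everything", rep("And Another Thing...", 3)),
  author = c("Douglas Adams", "Eoin Colfer", "Jane Belson", "Douglas Adams"),
  .attrs = c("2", "3", "3", "3"),
  stringsAsFactors = FALSE
)

tidy <- xml_df[, setdiff(names(xml_df), ".id")]   # drop the element-name column
names(tidy)[names(tidy) == ".attrs"] <- "id"       # give the attribute a clearer name
tidy$id <- as.integer(as.character(tidy$id))       # safe even if it arrived as a factor

# One row per book, with its authors joined by ", " in their original order
wide <- aggregate(author ~ id + name, data = tidy,
                  FUN = function(a) paste(a, collapse = ", "))
wide
```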
```r
# JSON file
# alternative: books.json2 <- fromJSON("books.json")
books.json2 <- fromJSON("https://raw.githubusercontent.com/chrisgmartin/DATA607/master/books.json")
kable(books.json2)
```
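By default, `toJSON()` from jsonlite writes a data frame as an array with one object per row. `fromJSON()` simplifies an array like that back into a data frame. The small round-trip check below shows this. It assumes character columns with no `NA`s, because row-wise output omits missing values. `demo` is a hypothetical stand-in table.

```r
library(jsonlite)

demo <- data.frame(
  name   = c("Life, the Universe and Everything", "And Another Thing..."),
  baddie = c("People of Krikkit", "Vogons"),
  stringsAsFactors = FALSE
)

json <- toJSON(demo, pretty = TRUE)  # [{"name": ..., "baddie": ...}, ...]
back <- fromJSON(json)               # simplified back into a data.frame

json
all.equal(demo, back)
```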
As you can see from the examples, each format has its own features, pros, and cons. For instance, the hand-made HTML and XML files list each of the three authors of “And Another Thing…” on a separate row, while the R data frame packs them into one comma-separated cell. As Douglas Adams himself put it: “A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools.”