Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.
Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats. Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?
require(XML)
## Loading required package: XML
require(rvest)
## Loading required package: rvest
## Loading required package: xml2
##
## Attaching package: 'rvest'
## The following object is masked from 'package:XML':
##
## xml
require(kableExtra)
## Loading required package: kableExtra
require(RCurl)
## Loading required package: RCurl
## Loading required package: bitops
require(methods)
require(xml2)
require(jsonlite)
## Loading required package: jsonlite
The first file to load is the HTML table. The HTML file was first uploaded to GitHub so that it can be read from its raw URL.
htmlgithub<-getURL("https://raw.githubusercontent.com/Luz917/books.html/master/book.html")
cat(htmlgithub)
## <table>
## <tr>
## <th>Title</th>
## <th>Author 1</th>
## <th>Author 2</th>
## <th>Pages</th>
## <th>Publisher</th>
## <th>ISBN</th>
## </tr>
## <body>
## <tr>
## <td>Harry Potter and the Soccerer's Stone</td>
## <td> J.K Rowling</td>
## <td>Mary GrandPre(Illustrator)
## <td>322</td>
## <td>Scholastic Inc</td>
## <td>0439554934</td>
## </tr>
## <tr>
## <td>Beautiful Creatures</td>
## <td>Kami Garcia</td>
## <td>Margaret Stohl</td>
## <td>563</td>
## <td>Little, Brown and Company</td>
## <td>0316042676</td>
## </tr>
## <tr>
## <td>The Hunger Games</td>
## <td>Suzanne Collins</td>
## <td>None</td>
## <td>374</td>
## <td>Scholastic Press</td>
## <td>0439023483</td>
## </tr>
## </body>
## </table>
Here we read in the HTML and convert it to a data frame. readHTMLTable returns a list of tables, so converting that list with data.frame prefixes every column name with "NULL."; we reset the column names to remove the prefix. Finally, we use kableExtra to style the table.
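# readHTMLTable() returns a list of tables; data.frame() flattens it
books_html<-readHTMLTable(htmlgithub)
books_html<-data.frame(books_html)
# Replace the "NULL."-prefixed names with clean column names
colnames(books_html)<-c("Title","Author1","Author2","Pages","Publisher","ISBN")
kable(books_html) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "right")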
| Title | Author1 | Author2 | Pages | Publisher | ISBN |
|---|---|---|---|---|---|
| Harry Potter and the Soccerer’s Stone | J.K Rowling | Mary GrandPre(Illustrator) | 322 | Scholastic Inc | 0439554934 |
| Beautiful Creatures | Kami Garcia | Margaret Stohl | 563 | Little, Brown and Company | 0316042676 |
| The Hunger Games | Suzanne Collins | None | 374 | Scholastic Press | 0439023483 |
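The "NULL." prefix on the column names can be reproduced without any HTML at all. The sketch below is only an illustration: the hand-built list is an assumption standing in for the output of readHTMLTable, and its single element is presumed to be named "NULL" because the table has no id. It also shows that extracting the element with [[1]] keeps the original column names, which would make the renaming step unnecessary.

```r
# Hypothetical stand-in for readHTMLTable() output: a list holding one
# data frame, with the list element named "NULL" (assumed for illustration).
tables <- list("NULL" = data.frame(Title = "The Hunger Games",
                                   Pages = "374",
                                   stringsAsFactors = FALSE))

# Converting the whole list prefixes each column with the element name.
prefixed <- data.frame(tables)
print(names(prefixed))

# Extracting the first element returns the data frame with its own names.
direct <- tables[[1]]
print(names(direct))
```

The first print should show names that start with "NULL.", while the second shows the plain Title and Pages names.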
Next we load the XML from GitHub. XML structure is slightly different from HTML, but the concept is the same: every element needs an opening tag and a closing tag, written as < > and </ >.
xmlgithub<-getURLContent("https://raw.githubusercontent.com/Luz917/books.xml/master/book.xml")
cat(xmlgithub)
## <fiction>
## <book id = "Rowling">
## <Title>Harry Potter and the Soccerer's Stone</Title>
## <Author1>J.K Rowling</Author1>
## <Author2>Mary GrandPre(Illustrator)</Author2>
## <Pages>322</Pages>
## <Publisher>Scholastic Inc</Publisher>
## <ISBN>0439554934</ISBN>
## </book>
## <book id = "Garcia">
## <Title>Beautiful Creatures</Title>
## <Author1>Kami Garcia</Author1>
## <Author2>Margaret Stohl</Author2>
## <Pages>563</Pages>
## <Publisher>Little, Brown and Company</Publisher>
## <ISBN>0316042676</ISBN>
## </book>
## <book id= "Collins">
## <Title>The Hunger Games</Title>
## <Author1>Suzanne Collins</Author1>
## <Author2>None</Author2>
## <Pages>374</Pages>
## <Publisher>Scholastic Press</Publisher>
## <ISBN>0439023483</ISBN>
## </book>
## </fiction>
For XML we first have to parse the data; after that we can convert it to a data frame. With XML we don't have to worry about changing the column names as we did with HTML, because the element names become the column names. Again, we style the table with kableExtra.
# Parse the XML text, then turn each <book> node into one row
books_xml<-xmlParse(xmlgithub)
books_xmldf<-xmlToDataFrame(books_xml)
kable(books_xmldf) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "right")
| Title | Author1 | Author2 | Pages | Publisher | ISBN |
|---|---|---|---|---|---|
| Harry Potter and the Soccerer’s Stone | J.K Rowling | Mary GrandPre(Illustrator) | 322 | Scholastic Inc | 0439554934 |
| Beautiful Creatures | Kami Garcia | Margaret Stohl | 563 | Little, Brown and Company | 0316042676 |
| The Hunger Games | Suzanne Collins | None | 374 | Scholastic Press | 0439023483 |
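The id attribute on each book element does not appear in the table, since only child elements become columns. The following is a minimal sketch, using a shortened inline sample (an assumption, not the real file) and the XML package's xpathSApply and xmlGetAttr, of how the attribute could be read separately and added as a column. The variable names are made up for illustration.

```r
library(XML)

# Inline sample mirroring the structure of the file (one book only).
xml_text <- '<fiction>
  <book id="Collins">
    <Title>The Hunger Games</Title>
    <Pages>374</Pages>
  </book>
</fiction>'

doc <- xmlParse(xml_text, asText = TRUE)

# Child element names (Title, Pages) become the column names.
books <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# The id attribute is not a child element, so it is read separately.
ids <- xpathSApply(doc, "//book", xmlGetAttr, "id")
books$id <- ids
print(books)
```

Whether the id is worth keeping depends on the analysis; here it simply repeats the first author's surname.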
Next we load the JSON file from GitHub and read in the document. JSON structure is completely different from XML and HTML, and it is less repetitive than the other formats, which makes it a little easier to write.
jsongithub<-getURLContent("https://raw.githubusercontent.com/Luz917/-books.json/master/books.json")
cat(jsongithub)
## [
## {
## "Title": "Harry Potter and the Soccerer's Stone",
## "Author1": "J.K Rowling",
## "Author2": "Mary GrandPre(Illustrator)",
## "Pages": 322,
## "Publisher": "Scholastic Inc",
## "ISBN": "0439554934" },
## {
## "Title": "Beautiful Creatures",
## "Author1": "Kami Garcia",
## "Author2": "Margaret Stohl",
## "Pages": 563,
## "Publisher": "Little, Brown and Company",
## "ISBN": "0316042676" },
## {
## "Title": "The Hunger Games",
## "Author1": "Suzanne Collins",
## "Author2": "None",
## "Pages": 374,
## "Publisher": "Scholastic Press",
## "ISBN": "0439023483" }
## ]
With JSON there is nothing extra to worry about: fromJSON alone converts the array of objects into a data frame. Once again we style the table with kableExtra.
# fromJSON() simplifies an array of objects straight into a data frame
books_json<-fromJSON(jsongithub)
kable(books_json) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "right")
| Title | Author1 | Author2 | Pages | Publisher | ISBN |
|---|---|---|---|---|---|
| Harry Potter and the Soccerer’s Stone | J.K Rowling | Mary GrandPre(Illustrator) | 322 | Scholastic Inc | 0439554934 |
| Beautiful Creatures | Kami Garcia | Margaret Stohl | 563 | Little, Brown and Company | 0316042676 |
| The Hunger Games | Suzanne Collins | None | 374 | Scholastic Press | 0439023483 |
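One detail the printed table hides is that fromJSON keeps the JSON value types. The sketch below uses a one-book inline sample (an assumption, trimmed from the file) to show that the unquoted Pages value arrives as a number, while the quoted ISBN stays text and so keeps its leading zero.

```r
library(jsonlite)

# Inline sample with the same field style as the file (one book only).
json_text <- '[
  {"Title": "The Hunger Games", "Pages": 374, "ISBN": "0439023483"}
]'

books <- fromJSON(json_text)

# Column classes follow the JSON value types.
print(sapply(books, class))

# The quoted ISBN is kept as text, including the leading zero.
print(books$ISBN)
```

Storing the ISBN as a JSON string is the safer choice, since a number cannot carry a leading zero.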
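The HTML, XML, and JSON files are structured differently, but the three tables they produce show identical content. The column types may still differ between the data frames: Pages, for example, is an unquoted number in the JSON file but plain text in the HTML and XML files.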
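Whether the data frames are strictly identical, and not just the printed tables, can be checked with base R's identical(). The sketch below uses two small hand-built data frames as hypothetical stand-ins (not the real data) for a text-based source and the JSON source, and shows how a type difference in one column affects the comparison.

```r
# Hypothetical stand-ins for two of the data frames (not the real data):
# a text-based source gives a character Pages column, JSON gives integers.
from_xml <- data.frame(Title = c("Beautiful Creatures", "The Hunger Games"),
                       Pages = c("563", "374"),
                       stringsAsFactors = FALSE)
from_json <- data.frame(Title = c("Beautiful Creatures", "The Hunger Games"),
                        Pages = c(563L, 374L),
                        stringsAsFactors = FALSE)

# Same printed values, but the Pages columns have different types.
same_before <- identical(from_xml, from_json)

# Align the types, then compare again.
from_xml$Pages <- as.integer(from_xml$Pages)
same_after <- identical(from_xml, from_json)

print(c(before = same_before, after = same_after))
```

The same check could be run on books_html, books_xmldf, and books_json after converting their columns to matching types.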
KableExtra https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
HTML Intro https://www.w3schools.com/html/html_intro.asp
JSON Intro https://www.w3schools.com/js/js_json_intro.asp
XML Intro https://www.w3schools.com/xml/