Introduction:
Below, I have created three files in HTML, JSON, and XML format. Each file contains the title, author, and favorite attributes of three of my favorite books. I uploaded the files to GitHub, retrieved them in R, loaded each one into a data frame, and printed each data frame as a table.
library(RCurl)
library(XML)
library(jsonlite)
library(knitr)
Loading the HTML version of my favorite books into a data frame called “books_html_format”.
# Retrieve the HTML file and parse its table. readHTMLTable() returns a list
# with one data frame per <table>, so which = 1 keeps only the first table.
books_html_format <- readHTMLTable(getURLContent("https://raw.githubusercontent.com/GitHub-Vlad/Data-Science/main/books_html_format.html"), which = 1)
# Use kable() to output the HTML data in a tabular format.
kable(books_html_format)
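Because readHTMLTable() returns a list with one data frame per <table> element, it helps to see that behavior on a small input. The sketch below parses a hypothetical two-row HTML string (not the actual books_html_format.html file) and uses the which argument to get the first table back directly as a data frame.

```r
library(XML)

# Hypothetical HTML snippet standing in for books_html_format.html
html_text <- "<html><body><table>
  <tr><th>Name</th><th>Author</th></tr>
  <tr><td>A Tale of Two Cities</td><td>Charles Dickens</td></tr>
  <tr><td>The Adventures of Tom Sawyer</td><td>Mark Twain</td></tr>
</table></body></html>"

# Parse the string as HTML content rather than as a file name or URL
doc <- htmlParse(html_text, asText = TRUE)

# Without `which`, readHTMLTable() returns a list of data frames (one per table)
all_tables <- readHTMLTable(doc, header = TRUE)
length(all_tables)

# With which = 1, only the first table is returned, as a data frame
books <- readHTMLTable(doc, header = TRUE, which = 1)
books
```

Passing which = 1 avoids handing kable() a list, which is what the original call produced.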
Loading the JSON version of my favorite books into a data frame called “books_json_format”.
# Retrieve and parse the JSON file.
json_data <- fromJSON("https://raw.githubusercontent.com/GitHub-Vlad/Data-Science/main/books_json_format.json")
# Load the first element of the parsed JSON list into a data frame.
books_json_format <- as.data.frame(json_data[[1]])
# Use kable() to output the JSON data in a tabular format.
kable(books_json_format)
| Name | Author | Favirote Attributes |
|---|---|---|
| The Adventure of Tom Sawyer | Mark Twain | adventure , free spirited, leadership |
| BEYOND THE END OF THE WORLD | Amie Kaufman , Meagan Spooner | action , prophetic, romantic |
| A Tale of Two Cities | Charles Dickens | revolution, betreyal , murder , revenge |
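The json_data[[1]] step works because fromJSON() simplifies a JSON array of objects into a data frame. The sketch below assumes, hypothetically, that the file wraps the books in a single top-level key (here called books); the real file's layout may differ.

```r
library(jsonlite)

# Hypothetical JSON text: one top-level key holding an array of book objects
json_text <- '{"books": [
  {"Name": "A Tale of Two Cities", "Author": "Charles Dickens"},
  {"Name": "The Adventures of Tom Sawyer", "Author": "Mark Twain"}
]}'

# fromJSON() returns a named list; the array of objects becomes a data frame
json_data <- fromJSON(json_text)
names(json_data)   # "books"

# json_data[[1]] selects that first element, as in the loading code above
books <- as.data.frame(json_data[[1]])
books
```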
Loading the XML version of my favorite books into a data frame called “books_xml_format”.
# Retrieve and parse the XML file.
xml_data <- xmlParse(getURL("https://raw.githubusercontent.com/GitHub-Vlad/Data-Science/main/books_xml_format.xml"))
# Load the XML data into a data frame.
books_xml_format <- xmlToDataFrame(xml_data)
# Use kable() to output the XML data in a tabular format.
kable(books_xml_format)
| Name | Author | FaviroteAttirbutes |
|---|---|---|
| The Adventures of Tom Sawyer | Mark Twain | adventure, free spirited, leadership |
| BEYOND THE END OF THE WORLD | Amie Kaufman, Meagan Spooner | action, prophetic, romantic |
| A Tale of Two Cities | Charles Dickens | revolution, betreyal, murder, revenge |
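xmlToDataFrame() treats each child of the root node as a row and each of that child's sub-elements as a column. A minimal sketch on a hypothetical inline XML string (the element names books, book, Name, and Author are assumptions, not taken from books_xml_format.xml):

```r
library(XML)

# Hypothetical XML text: <books> root, one <book> per row, one child per column
xml_text <- "<books>
  <book><Name>A Tale of Two Cities</Name><Author>Charles Dickens</Author></book>
  <book><Name>The Adventures of Tom Sawyer</Name><Author>Mark Twain</Author></book>
</books>"

# asText = TRUE parses the string itself instead of treating it as a file path
xml_data <- xmlParse(xml_text, asText = TRUE)

# Each <book> becomes a row; <Name> and <Author> become columns
books <- xmlToDataFrame(xml_data, stringsAsFactors = FALSE)
books
```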
Conclusion:
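All three data frames look nearly identical. Comparing the JSON and XML outputs above, the only differences are small inconsistencies in the source files: "The Adventure of Tom Sawyer" versus "The Adventures of Tom Sawyer", extra spaces before some commas, and the column header "Favirote Attributes" versus "FaviroteAttirbutes". However, each file format serves a different purpose. HTML is a markup language that describes the structure of a web page for display in a browser. XML is a markup language with user-defined tags, designed for storing and transporting data. JSON is a lightweight format that stores and transmits data as key-value pairs (objects) and arrays.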