DATA 607 - Homework Assignment # 5

Vladimir Nimchenko

INTRODUCTION:

Below, I created three files in HTML, JSON, and XML format. Each file contains the title, author(s), and attributes of three of my favorite books. I uploaded the files to GitHub, retrieved them from their raw URLs, and loaded each one into an R data frame. Finally, I printed each data frame with kable().

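library(RCurl)    # getURL() / getURLContent(): download the files over HTTPS
library(XML)      # readHTMLTable(), xmlParse(), xmlToDataFrame()
library(jsonlite) # fromJSON(): parse JSON into R lists and data frames
library(knitr)    # kable(): print data frames as tables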

Loading the HTML version of the favorite books into a data frame called “books_html_format”.

#retrieve the HTML file and parse its table; readHTMLTable() returns a list of data frames (one per <table>), so [[1]] keeps the first one.
books_html_format <- readHTMLTable(getURLContent("https://raw.githubusercontent.com/GitHub-Vlad/Data-Science/main/books_html_format.html"))[[1]]

#use the kable() function to output the HTML data in a tabular format.
kable(books_html_format)
| Name | Author | Favirote Attributes |
|:-----|:-------|:--------------------|
| The Adventures of Tom Sawyer | Mark Twain | adventure, free spirited, leadership |
| BEYOND THE END OF THE WORLD | Amie Kaufman,Meagan Spooner | action,prophetic,romantic |
| A Tale of Two Cities | Charles Dickens | revolution,betreyal,murder,revenge |
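readHTMLTable() does not return a data frame directly: it returns a list with one data frame for each <table> it finds in the page. The sketch below shows this offline, assuming a small inline HTML string (html_txt, a hypothetical stand-in for the GitHub file) parsed with htmlParse(asText = TRUE) instead of a download.

```r
library(XML)

# Hypothetical stand-in for books_html_format.html: one table, header row in <thead>
html_txt <- paste0(
  "<html><body><table>",
  "<thead><tr><th>Name</th><th>Author</th></tr></thead>",
  "<tbody>",
  "<tr><td>A Tale of Two Cities</td><td>Charles Dickens</td></tr>",
  "<tr><td>The Adventures of Tom Sawyer</td><td>Mark Twain</td></tr>",
  "</tbody></table></body></html>"
)

# Parse the string itself (asText = TRUE) rather than a URL or file
doc <- htmlParse(html_txt, asText = TRUE)

# readHTMLTable() returns a list with one element per <table> in the document
tables <- readHTMLTable(doc)
length(tables)

# Extract the first (and only) table to get an actual data frame
books <- tables[[1]]
str(books)
```

Indexing with [[1]] is only safe when the page has a single table; for pages with several tables, the which argument of readHTMLTable() selects a table by position.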

Loading the JSON version of the favorite books into a data frame called “books_json_format”.

#retrieve and read data from the JSON file
json_data <- fromJSON("https://raw.githubusercontent.com/GitHub-Vlad/Data-Science/main/books_json_format.json")

#fromJSON() returns a list; take its first element and convert it to a data frame
books_json_format <- as.data.frame(json_data[[1]])

#use the kable() function to output json data in a tabular format.
kable(books_json_format)
| Name | Author | Favirote Attributes |
|:-----|:-------|:--------------------|
| The Adventure of Tom Sawyer | Mark Twain | adventure , free spirited, leadership |
| BEYOND THE END OF THE WORLD | Amie Kaufman , Meagan Spooner | action , prophetic, romantic |
| A Tale of Two Cities | Charles Dickens | revolution, betreyal , murder , revenge |
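The json_data[[1]] step works because fromJSON() simplifies a JSON array of objects into a data frame, and when that array sits under a top-level key the data frame is returned inside a list. The sketch below assumes such a wrapper; the key name "books" and the inline string json_txt are hypothetical, not taken from the actual file.

```r
library(jsonlite)

# Hypothetical JSON: a wrapper object whose "books" key holds an array of records
json_txt <- '{
  "books": [
    {"Name": "A Tale of Two Cities", "Author": "Charles Dickens"},
    {"Name": "The Adventures of Tom Sawyer", "Author": "Mark Twain"}
  ]
}'

# fromJSON() accepts a JSON string as well as a URL or file path
json_data <- fromJSON(json_txt)

# The outer object becomes a named list; the array of objects becomes a data frame
class(json_data)        # "list"
class(json_data[[1]])   # "data.frame"

# Same extraction step as above
books <- as.data.frame(json_data[[1]])
```

Because the element is already a data frame, as.data.frame() is a no-op here; it only matters if the file's structure differs from this assumption.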

Loading the XML version of the favorite books into a data frame called “books_xml_format”.

#retrieve and parse the XML file
xml_data <- xmlParse(getURL("https://raw.githubusercontent.com/GitHub-Vlad/Data-Science/main/books_xml_format.xml"))

#load the XML data into a data frame
books_xml_format <- xmlToDataFrame(xml_data)

#use the kable() function to output xml data in a tabular format.
kable(books_xml_format)
| Name | Author | FaviroteAttirbutes |
|:-----|:-------|:-------------------|
| The Adventures of Tom Sawyer | Mark Twain | adventure, free spirited, leadership |
| BEYOND THE END OF THE WORLD | Amie Kaufman, Meagan Spooner | action, prophetic, romantic |
| A Tale of Two Cities | Charles Dickens | revolution, betreyal, murder, revenge |
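xmlToDataFrame() treats each child of the root element as one row and each of that child's sub-elements as one column, using the tag name as the column name. Since XML tag names cannot contain spaces, this is why the XML column appears as a single word. A minimal offline sketch, assuming a hypothetical <books>/<book> layout written as an inline string (xml_txt):

```r
library(XML)

# Hypothetical XML: one <book> element per record; each child tag becomes a column
xml_txt <- paste0(
  "<books>",
  "<book><Name>A Tale of Two Cities</Name><Author>Charles Dickens</Author></book>",
  "<book><Name>The Adventures of Tom Sawyer</Name><Author>Mark Twain</Author></book>",
  "</books>"
)

# Parse the string itself instead of downloading a file
xml_doc <- xmlParse(xml_txt, asText = TRUE)

# Children of the root (<book>) become rows; their sub-elements become columns
books <- xmlToDataFrame(xml_doc, stringsAsFactors = FALSE)
str(books)
```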

Conclusion:

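All three data frames contain the same three books and look nearly identical. The small differences come from the source files themselves: the JSON file lists the first title as "The Adventure of Tom Sawyer", the spacing around commas varies, and the XML column is named FaviroteAttirbutes with no space, because XML element names cannot contain spaces. However, each format serves a different purpose. HTML is a markup language that describes the structure of a web page for display in a browser. XML is a markup language with user-defined tags, designed to store and transport data. JSON is a lightweight text format that stores and transmits data as key-value pairs (objects) and ordered lists (arrays).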
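To make the JSON description concrete, jsonlite's toJSON() can serialize R objects back into JSON. In this sketch, the one-row data frame and the attribute vector are illustrative values chosen for the example: a data frame becomes an array of key-value objects (one object per row), and a named list holding a character vector becomes an object whose value is an array.

```r
library(jsonlite)

# One record from the books table, as a one-row data frame
books <- data.frame(
  Name   = "A Tale of Two Cities",
  Author = "Charles Dickens",
  stringsAsFactors = FALSE
)

# Data frames are serialized row-wise: an array of key-value objects
json_out <- toJSON(books)
cat(json_out)
# [{"Name":"A Tale of Two Cities","Author":"Charles Dickens"}]

# A named list becomes an object; a character vector becomes an array
attr_json <- toJSON(list(Attributes = c("revolution", "murder", "revenge")))
cat(attr_json)
# {"Attributes":["revolution","murder","revenge"]}
```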