The Assignment – Working with XML and JSON in R

I picked three books:

1- Rich Dad Poor Dad.

2- Shoe Dog: A Memoir by the Creator of Nike.

3- Fear: Trump in the White House.

Besides the book title, I am using four other attributes: Author, Genre, YearPublished, and NumberPages. I created three files to store the information in HTML, XML, and JSON formats. I will run code that pulls each file from my GitHub repository and loads it into R so the resulting data can be compared.

Please find the Rpubs here

Please find the Rmd, HTML, XML, and JSON files on GitHub

CSV File

I will start by reading the CSV file, which serves as the baseline for comparing the other formats.

df_csv <- read.csv("https://raw.githubusercontent.com/akarimhammoud/CUNY-SPS/master/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/W7%20-%20Web%20Technologies%3B%20MongoDB/Books.csv")
df_csv
##                                   BookTitle                          Author
## 1                         Rich Dad Poor Dad Robert Kiyosaki, Sharon Lechter
## 2 Shoe Dog: A Memoir by the Creator of Nike                     Phil Knight
## 3            Fear: Trump in the White House                    Bob Woodward
##              Genre YearPublished NumberPages
## 1 Personal Finance          1997         207
## 2        Biography          2016         386
## 3        Biography          2018         448

XML File

#running the required libraries
library("RCurl", quietly = TRUE)
library("XML", quietly = TRUE)

#downloading the raw XML text from the url
xml_text <- getURL("https://raw.githubusercontent.com/akarimhammoud/CUNY-SPS/master/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/W7%20-%20Web%20Technologies%3B%20MongoDB/Books.xml")

#converting the XML text into a data frame
df_xml <- xmlToDataFrame(xml_text)
df_xml
##                                   BookTitle                          Author
## 1                         Rich Dad Poor Dad Robert Kiyosaki, Sharon Lechter
## 2 Shoe Dog: A Memoir by the Creator of Nike                     Phil Knight
## 3            Fear: Trump in the White House                    Bob Woodward
##              Genre YearPublished NumberPages
## 1 Personal Finance          1997         207
## 2        Biography          2016         386
## 3        Biography          2018         448
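The printed XML result matches the CSV, but `xmlToDataFrame()` reads every node as text, so the numeric columns may arrive as character strings. The sketch below illustrates this with a small inline XML string. The tag layout (`books`, `book`) is an assumption made for illustration and may not match the actual Books.xml file.

```r
library(XML)

# Hypothetical inline XML standing in for Books.xml (the real layout may differ)
xml_text <- '<books>
  <book>
    <BookTitle>Rich Dad Poor Dad</BookTitle>
    <Author>Robert Kiyosaki, Sharon Lechter</Author>
    <YearPublished>1997</YearPublished>
    <NumberPages>207</NumberPages>
  </book>
  <book>
    <BookTitle>Fear: Trump in the White House</BookTitle>
    <Author>Bob Woodward</Author>
    <YearPublished>2018</YearPublished>
    <NumberPages>448</NumberPages>
  </book>
</books>'

# Parse the string explicitly as XML text rather than as a file name
doc <- xmlParse(xml_text, asText = TRUE)

# Each <book> node becomes one row; all columns come back as character
books_xml <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert columns that hold only numbers into numeric types
books_xml_typed <- type.convert(books_xml, as.is = TRUE)
str(books_xml_typed)
```

Calling `type.convert()` afterwards is one possible way to get the same column types that `read.csv()` infers automatically.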

HTML File

#running the required libraries (read_html() comes from xml2, html_table() from rvest)
library("rvest", quietly = TRUE)
library("xml2", quietly = TRUE)

#reading the HTML page from the url
html_page <- read_html('https://raw.githubusercontent.com/akarimhammoud/CUNY-SPS/master/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/W7%20-%20Web%20Technologies%3B%20MongoDB/Books.html')

#extracting every table on the page; html_table() returns a list of tables
df_html <- html_table(html_page, dec = ".", fill = TRUE)
df_html
## [[1]]
##                                   BookTitle                          Author
## 1                         Rich Dad Poor Dad Robert Kiyosaki, Sharon Lechter
## 2 Shoe Dog: A Memoir by the Creator of Nike                     Phil Knight
## 3            Fear: Trump in the White House                    Bob Woodward
##              Genre YearPublished NumberPages
## 1 Personal Finance          1997         207
## 2        Biography          2016         386
## 3        Biography          2018         448
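The `[[1]]` in the output shows that `html_table()` on a whole page returns a list with one element per table. A minimal sketch of pulling out the single table as a data frame follows. The inline HTML is a hypothetical stand-in for Books.html with fewer columns.

```r
library(rvest)

# Hypothetical inline HTML standing in for Books.html
html_text <- '<html><body><table>
  <tr><th>BookTitle</th><th>YearPublished</th></tr>
  <tr><td>Rich Dad Poor Dad</td><td>1997</td></tr>
  <tr><td>Fear: Trump in the White House</td><td>2018</td></tr>
</table></body></html>'

page <- read_html(html_text)

# One list element per <table> found on the page
tables <- html_table(page)

# Take the first (and here only) table as a plain data frame
books_html <- as.data.frame(tables[[1]])
books_html
```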

JSON File

#running the required libraries
library(jsonlite, quietly = TRUE)

#reading the JSON data from the url
df_json <- read_json("https://raw.githubusercontent.com/akarimhammoud/CUNY-SPS/master/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/W7%20-%20Web%20Technologies%3B%20MongoDB/Books.json", simplifyVector = TRUE)

df_json
## $books
##                                   BookTitle                          Author
## 1                         Rich Dad Poor Dad Robert Kiyosaki, Sharon Lechter
## 2 Shoe Dog: A Memoir by the Creator of Nike                     Phil Knight
## 3            Fear: Trump in the White House                    Bob Woodward
##              Genre YearPublished NumberPages
## 1 Personal Finance          1997         207
## 2        Biography          2016         386
## 3        Biography          2018         448
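The `$books` line in the output shows that the parsed JSON is a list whose `books` element holds the data frame. A small sketch of extracting it, using an inline JSON string as a hypothetical stand-in for Books.json (only the top-level `books` key is taken from the output above):

```r
library(jsonlite)

# Hypothetical inline JSON standing in for Books.json
json_text <- '{"books": [
  {"BookTitle": "Rich Dad Poor Dad", "YearPublished": 1997},
  {"BookTitle": "Fear: Trump in the White House", "YearPublished": 2018}
]}'

# fromJSON() simplifies an array of objects into a data frame by default
parsed <- fromJSON(json_text)

# The data frame sits under the top-level "books" key
books_json <- parsed$books
books_json
```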

Conclusion

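The data loaded from the HTML, XML, and JSON files prints identically in R Markdown and matches the CSV baseline. The row with two authors is also read the same way in every format. The only visible difference is the container: the HTML and JSON results come back wrapped in a list (`[[1]]` and `$books`), while the CSV and XML results are plain data frames.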
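Comparing printed output by eye can hide differences in column types, so a programmatic check may be more reliable. The sketch below uses base R only and hand-built stand-in data frames (not the real files): one with integer columns, as `read.csv()` would produce, and one with the same values stored as text.

```r
# Stand-in data: the same rows as they might arrive from two sources
from_csv <- data.frame(
  BookTitle = c("Rich Dad Poor Dad", "Shoe Dog: A Memoir by the Creator of Nike"),
  YearPublished = c(1997L, 2016L),
  NumberPages = c(207L, 386L),
  stringsAsFactors = FALSE
)
from_xml <- data.frame(
  BookTitle = from_csv$BookTitle,
  YearPublished = c("1997", "2016"),  # numbers stored as text
  NumberPages = c("207", "386"),
  stringsAsFactors = FALSE
)

# all.equal() reports differences, so wrap it in isTRUE() for a yes/no answer
same_before <- isTRUE(all.equal(from_csv, from_xml))

# Normalize column types, then compare again
from_xml_typed <- type.convert(from_xml, as.is = TRUE)
same_after <- isTRUE(all.equal(from_csv, from_xml_typed))

c(before = same_before, after = same_after)
```

The two data frames print the same way, yet they only compare as equal once the text columns are converted to numbers.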

Here is a reference on how to create an HTML table

Here is a reference on how to create an XML table

Here is a reference on how to create a JSON table