I picked three books:
1- Rich Dad Poor Dad.
2- Shoe Dog: A Memoir by the Creator of Nike.
3- Fear: Trump in the White House.
Besides the book title, I am using four other attributes: Author, Genre, YearPublished, and NumberPages (number of pages). I created three files to store the information in HTML, XML, and JSON formats. The code below pulls each file from my GitHub repository so that the resulting data can be compared.
Please find the RPubs page here
Please find the Rmd, HTML, XML, and JSON files at the GitHub link
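For illustration, a JSON file with this layout could also be generated from R instead of being written by hand. The sketch below is an assumption about one possible workflow, not necessarily how Books.json was produced. The `books` data frame is retyped from the table in this report, and the top-level key `books` is chosen to match the `$books` element that appears in the JSON output later on.

```r
library(jsonlite)

# Book data retyped from the table in this report
books <- data.frame(
  BookTitle = c("Rich Dad Poor Dad",
                "Shoe Dog: A Memoir by the Creator of Nike",
                "Fear: Trump in the White House"),
  Author = c("Robert Kiyosaki, Sharon Lechter", "Phil Knight", "Bob Woodward"),
  Genre = c("Personal Finance", "Biography", "Biography"),
  YearPublished = c(1997L, 2016L, 2018L),
  NumberPages = c(207L, 386L, 448L),
  stringsAsFactors = FALSE
)

# Wrap the data frame in a named list so the JSON has a top-level "books" array
books_json <- toJSON(list(books = books), pretty = TRUE)
cat(books_json)

# Reading the JSON text back should give the same table
books_back <- fromJSON(books_json)$books
```

Passing `books_json` to `writeLines()` would save it to a file. The two authors of the first book stay in a single comma-separated string, which mirrors how the Author column is shown in this report.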
I will start by reading the CSV file, so there is a reference table to compare the other formats against.
df_csv <- read.csv("https://raw.githubusercontent.com/akarimhammoud/CUNY-SPS/master/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/W7%20-%20Web%20Technologies%3B%20MongoDB/Books.csv")
df_csv
##                                   BookTitle                          Author
## 1                         Rich Dad Poor Dad Robert Kiyosaki, Sharon Lechter
## 2 Shoe Dog: A Memoir by the Creator of Nike                     Phil Knight
## 3            Fear: Trump in the White House                    Bob Woodward
##              Genre YearPublished NumberPages
## 1 Personal Finance          1997         207
## 2        Biography          2016         386
## 3        Biography          2018         448
# Load the required libraries
library("RCurl", quietly = TRUE)
library("XML", quietly = TRUE)
# Download the raw XML text from GitHub
df_xml <- getURL("https://raw.githubusercontent.com/akarimhammoud/CUNY-SPS/master/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/W7%20-%20Web%20Technologies%3B%20MongoDB/Books.xml")
# Parse the XML text into a data frame
df_xml <- xmlToDataFrame(df_xml)
df_xml
##                                   BookTitle                          Author
## 1                         Rich Dad Poor Dad Robert Kiyosaki, Sharon Lechter
## 2 Shoe Dog: A Memoir by the Creator of Nike                     Phil Knight
## 3            Fear: Trump in the White House                    Bob Woodward
##              Genre YearPublished NumberPages
## 1 Personal Finance          1997         207
## 2        Biography          2016         386
## 3        Biography          2018         448
# Load the required libraries
library("textreadr", quietly = TRUE)
library("rvest", quietly = TRUE)
library("xml2", quietly = TRUE)
# Read the HTML document from GitHub
df_html <- read_html('https://raw.githubusercontent.com/akarimhammoud/CUNY-SPS/master/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/W7%20-%20Web%20Technologies%3B%20MongoDB/Books.html')
# Extract the tables in the document; the result is a list with one element per table
df_html <- html_table(df_html, dec = ".", fill = TRUE)
df_html
## [[1]]
##                                   BookTitle                          Author
## 1                         Rich Dad Poor Dad Robert Kiyosaki, Sharon Lechter
## 2 Shoe Dog: A Memoir by the Creator of Nike                     Phil Knight
## 3            Fear: Trump in the White House                    Bob Woodward
##              Genre YearPublished NumberPages
## 1 Personal Finance          1997         207
## 2        Biography          2016         386
## 3        Biography          2018         448
# Load the required library
library(jsonlite, quietly = TRUE)
# Read the JSON file from GitHub; simplifyVector = TRUE turns the array of records into a data frame
df_json <- read_json("https://raw.githubusercontent.com/akarimhammoud/CUNY-SPS/master/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/W7%20-%20Web%20Technologies%3B%20MongoDB/Books.json", simplifyVector = TRUE)
df_json
## $books
##                                   BookTitle                          Author
## 1                         Rich Dad Poor Dad Robert Kiyosaki, Sharon Lechter
## 2 Shoe Dog: A Memoir by the Creator of Nike                     Phil Knight
## 3            Fear: Trump in the White House                    Bob Woodward
##              Genre YearPublished NumberPages
## 1 Personal Finance          1997         207
## 2        Biography          2016         386
## 3        Biography          2018         448
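The HTML, XML, and JSON files all produce the same table when read and printed in R Markdown, and it matches the CSV data. Even the book with two authors comes through the same way in every format. The only visible difference is that the HTML and JSON results are wrapped in a list, as the [[1]] and $books lines in the output show.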
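The comparison above is made by eye on the printed output. A programmatic check is sketched below under some assumptions: the inline strings are small hypothetical stand-ins for Books.xml, Books.json, and Books.html (two books, two columns), and `normalize()` is a helper introduced here for illustration. The parsers can infer column types differently (for example, `xmlToDataFrame()` returns text columns), so the sketch converts every column to character before comparing.

```r
library(XML)       # xmlParse(), xmlToDataFrame()
library(jsonlite)  # fromJSON()
library(rvest)     # read_html(), html_table()

# Hypothetical inline stand-ins for Books.xml, Books.json and Books.html
xml_txt <- paste0(
  "<books>",
  "<book><Author>Robert Kiyosaki, Sharon Lechter</Author>",
  "<YearPublished>1997</YearPublished></book>",
  "<book><Author>Phil Knight</Author>",
  "<YearPublished>2016</YearPublished></book>",
  "</books>")
json_txt <- paste0(
  '{"books":[',
  '{"Author":"Robert Kiyosaki, Sharon Lechter","YearPublished":1997},',
  '{"Author":"Phil Knight","YearPublished":2016}]}')
html_txt <- paste0(
  "<table><tr><th>Author</th><th>YearPublished</th></tr>",
  "<tr><td>Robert Kiyosaki, Sharon Lechter</td><td>1997</td></tr>",
  "<tr><td>Phil Knight</td><td>2016</td></tr></table>")

# Parse each format; HTML and JSON return a list, so pull out the table itself
from_xml  <- xmlToDataFrame(xmlParse(xml_txt, asText = TRUE), stringsAsFactors = FALSE)
from_json <- fromJSON(json_txt)$books
from_html <- html_table(read_html(html_txt))[[1]]

# Hypothetical helper: plain data frame, character columns, default row names
normalize <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  df[] <- lapply(df, as.character)
  rownames(df) <- NULL
  df
}

same_xml_json <- isTRUE(all.equal(normalize(from_xml), normalize(from_json)))
same_xml_html <- isTRUE(all.equal(normalize(from_xml), normalize(from_html)))
print(c(xml_vs_json = same_xml_json, xml_vs_html = same_xml_html))
```

The same check could be applied to the objects in this report by passing `df_xml`, `df_html[[1]]`, and `df_json$books` to `normalize()`.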
Here is a reference on how to create HTML tables
Here is a reference on how to create XML tables
Here is a reference on how to create JSON tables