Title: CUNY SPS MDS DATA607_WK7Assignmt

Author: Charles Ugiagbe

Date: 10/10/2021

Load the required R Packages

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.1
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## Warning: package 'tibble' was built under R version 4.1.1
## Warning: package 'readr' was built under R version 4.1.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(XML)
## Warning: package 'XML' was built under R version 4.1.1
library(rvest)
## 
## Attaching package: 'rvest'
## The following object is masked from 'package:readr':
## 
##     guess_encoding
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
## 
##     flatten
library(knitr)
library(httr)

Read the HTML data into R

url <- "https://raw.githubusercontent.com/omocharly/DATA607_WK7Assignmt/main/Favourite%20Books.html"
df_HTML <- url %>%
  read_html(encoding = 'UTF-8') %>%
  html_table(header = NA, trim = TRUE) %>%
  .[[1]]

kable(df_HTML)
| Title | Author | Publisher | Year | Edition | ISBN |
|---|---|---|---|---|---|
| Engineering Mathematics | K.A Stroud; Dexter Booth | Macmillian Education | 2020 | 8th | 978-1-352-01027-5 |
| Data Science for Business | Foster Provost, Tom Fawcett | O’Reilly Media, Inc | 2013 | 1st | 978-1-449-36132-7 |
| The Language of SQL | Larry Rockoff | Addison Wesley Professional | 2016 | 2nd | 978-0-134-65825-4 |
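To make the HTML step easier to inspect without a network call, here is a minimal sketch that parses a small inline table the same way. The `html_text` string and its two columns are hypothetical stand-ins for the real Favourite Books.html file, whose exact markup is not shown here.

```r
library(rvest)

# Hypothetical inline page standing in for the real HTML file
html_text <- '<html><body><table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>The Language of SQL</td><td>2016</td></tr>
  <tr><td>Data Science for Business</td><td>2013</td></tr>
</table></body></html>'

# Parse the document, select the first table, and convert it to a data frame
books_html <- read_html(html_text) %>%
  html_element("table") %>%
  html_table(header = NA, trim = TRUE)

# Inspect the object class and the type that each column was given
class(books_html)
sapply(books_html, class)
```

Checking the class and column types at this point is useful later, because `html_table()` converts column types by default and the other importers may not make the same choices.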

Read the XML data into R

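# GET() returns an httr response object (not a URL string); its body holds the XML document
url2 <- GET("https://raw.githubusercontent.com/omocharly/DATA607_WK7Assignmt/main/Favourite%20Books.xml",
      add_headers(c(Accept = "application/xml",
                    Authorization = "5c81-e875-48f7-98-ee78")))
df_XML <- url2 %>%
  content(as = "text", encoding = "UTF-8") %>%  # extract the response body as a string
  xmlParse(asText = TRUE) %>%                   # parse the string rather than a file path
  xmlRoot() %>%
  xmlToDataFrame(stringsAsFactors = FALSE)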
kable(df_XML)
| Title | Author | Publisher | Year | Edition | ISBN |
|---|---|---|---|---|---|
| Engineering Mathematics | K.A Stroud; Dexter Booth | Macmillian Education | 2020 | 8th | 978-1-352-01027-5 |
| Data Science for Business | Foster Provost; Tom Fawcett | O’Reilly Media, Inc | 2013 | 1st | 978-1-449-36132-7 |
| The Language of SQL | Larry Rockoff | Addison Wesley Professional | 2016 | 2nd | 978-0-134-65825-4 |
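The XML step can be sketched offline in the same spirit. The element names in `xml_text` (`books`, `book`, `Title`, `Year`) are assumptions made for illustration, not the layout of the real Favourite Books.xml file. The sketch shows that `xmlToDataFrame()` returns every column as character unless `colClasses` is supplied, so numeric fields such as the year need an explicit conversion.

```r
library(XML)

# Hypothetical inline document; the real file's element names may differ
xml_text <- '<books>
  <book><Title>The Language of SQL</Title><Year>2016</Year></book>
  <book><Title>Data Science for Business</Title><Year>2013</Year></book>
</books>'

# asText = TRUE tells xmlParse() that the input is XML content, not a path or URL
doc <- xmlParse(xml_text, asText = TRUE)
books_xml <- xmlToDataFrame(xmlRoot(doc), stringsAsFactors = FALSE)

# Every column comes back as character
sapply(books_xml, class)

# Convert the year explicitly if a numeric column is wanted
books_xml$Year <- as.integer(books_xml$Year)
```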

Read the JSON data into R

url3 <- "https://raw.githubusercontent.com/omocharly/DATA607_WK7Assignmt/main/Favourite%20Books.json"
df_JSON <- fromJSON(url3)
kable(df_JSON)
| Title | Author | Publisher | Year | Edition | ISBN |
|---|---|---|---|---|---|
| Engineering Mathematics | K.A Stroud , Dexter Booth | Macmillian Education | 2020 | 8th | 978-1-352-01027-5 |
| Data Science for Business | Foster Provost, Tom Fawcett | O’Reilly Media, Inc | 2013 | 1st | 978-1-449-36132-7 |
| The Language of SQL | Larry Rockoff | Addison Wesley Professional | 2016 | 2nd | 978-0-134-65825-4 |
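As a small offline illustration of the JSON step, the sketch below feeds `fromJSON()` an inline array of objects, which it simplifies to a data frame by default. The `json_text` layout is an assumption, since the structure of the real Favourite Books.json file is not shown here.

```r
library(jsonlite)

# Hypothetical inline JSON: an array of records with matching keys
json_text <- '[
  {"Title": "The Language of SQL", "Year": 2016},
  {"Title": "Data Science for Business", "Year": 2013}
]'

# An array of objects is simplified to a data frame, one row per object
books_json <- fromJSON(json_text)

# Column types follow the JSON value types (strings vs. numbers)
class(books_json)
sapply(books_json, class)
```

If the file stored the year as a quoted string instead, the same column would come back as character, which is one way the three imports can end up with different types.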

Comparison of the HTML, XML and JSON Data Frames

identical(df_HTML, df_XML)
## [1] FALSE
identical(df_HTML, df_JSON)
## [1] FALSE
identical(df_XML, df_JSON)
## [1] FALSE

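The three data frames are not identical, although they describe the same three books. identical() requires an exact match of class, column types, and values, and the printed tables already show that the Author strings differ in punctuation between sources (for example, "K.A Stroud; Dexter Booth" in the HTML and XML versions versus "K.A Stroud , Dexter Booth" in the JSON version). The three importers may also return different object classes and column types, and any one of these differences is enough for identical() to return FALSE.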
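One way to see why identical() fails, and to compare contents rather than representation, is sketched below. The data frames `a` and `b` and the helper `normalise()` are hypothetical and exist only for illustration. They mimic two imports that agree on the values but disagree on the type of the Year column.

```r
# Two toy data frames with the same values but different column types
a <- data.frame(Title = "The Language of SQL", Year = 2016L,
                stringsAsFactors = FALSE)
b <- data.frame(Title = "The Language of SQL", Year = "2016",
                stringsAsFactors = FALSE)

# identical() is strict about types, so these do not match
identical(a, b)

# all.equal() describes the differences instead of returning a bare FALSE
all.equal(a, b)

# Hypothetical helper: coerce to a plain data.frame of trimmed character columns
normalise <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  df[] <- lapply(df, function(col) trimws(as.character(col)))
  df
}

# After normalisation only the cell contents are compared
identical(normalise(a), normalise(b))
```

Applied to df_HTML, df_XML and df_JSON, the same approach would remove class and type differences, but the Author columns would still differ wherever the source files use different separators.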