Title: CUNY SPS MDS DATA607_WK7Assignmt
Author: Charles Ugiagbe
Date: 10/10/2021
Load the required R Packages
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.1
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## Warning: package 'tibble' was built under R version 4.1.1
## Warning: package 'readr' was built under R version 4.1.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(XML)
## Warning: package 'XML' was built under R version 4.1.1
library(rvest)
##
## Attaching package: 'rvest'
## The following object is masked from 'package:readr':
##
## guess_encoding
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
library(knitr)
library(httr)
Read the HTML data into R
url <- "https://raw.githubusercontent.com/omocharly/DATA607_WK7Assignmt/main/Favourite%20Books.html"
# Parse the page and keep the first table it contains
df_HTML <- url %>%
  read_html(encoding = 'UTF-8') %>%
  html_table(header = NA, trim = TRUE) %>%
  .[[1]]
kable(df_HTML)
| Engineering Mathematics | K.A Stroud; Dexter Booth | Macmillian Education | 2020 | 8th | 978-1-352-01027-5 |
| Data Science for Business | Foster Provost, Tom Fawcett | O’Reilly Media, Inc | 2013 | 1st | 978-1-449-36132-7 |
| The Language of SQL | Larry Rockoff | Addison Wesley Professional | 2016 | 2nd | 978-0-134-65825-4 |
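To see what kind of object the HTML step produces, a small offline sketch can help. The inline table below is a hypothetical stand-in for Favourite Books.html (its column names `Title` and `Year` are assumptions, not taken from the real file); the point is only to show how the class and column types of the result can be inspected.

```r
library(rvest)

# Hypothetical two-row table standing in for the real HTML file
page <- read_html(paste0(
  "<table>",
  "<tr><th>Title</th><th>Year</th></tr>",
  "<tr><td>Engineering Mathematics</td><td>2020</td></tr>",
  "<tr><td>The Language of SQL</td><td>2016</td></tr>",
  "</table>"
))

# html_table() on a whole document returns a list of tables; keep the first
tables <- html_table(page, header = NA, trim = TRUE)
books_html <- tables[[1]]

class(books_html)          # class of the parsed table
sapply(books_html, class)  # column types chosen by html_table()
```

Checking the column types this way matters later, because `identical()` treats a numeric year and a character year as different.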
Read the XML data into R
# Request the XML file; the Accept header asks for an XML response
url2 <- GET("https://raw.githubusercontent.com/omocharly/DATA607_WK7Assignmt/main/Favourite%20Books.xml",
            add_headers(c(Accept = "application/xml",
                          Authorization = "5c81-e875-48f7-98-ee78")))
# Parse the response, take the root node, and turn each child node into a row
df_XML <- url2 %>%
  xmlParse() %>%
  xmlRoot() %>%
  xmlToDataFrame(stringsAsFactors = FALSE)
kable(df_XML)
| Engineering Mathematics | K.A Stroud; Dexter Booth | Macmillian Education | 2020 | 8th | 978-1-352-01027-5 |
| Data Science for Business | Foster Provost; Tom Fawcett | O’Reilly Media, Inc | 2013 | 1st | 978-1-449-36132-7 |
| The Language of SQL | Larry Rockoff | Addison Wesley Professional | 2016 | 2nd | 978-0-134-65825-4 |
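The XML step can be tried offline in the same way. This is a minimal sketch, assuming a flat layout with one element per book; the tag names `books`, `book`, `title`, and `year` are hypothetical and may not match the real Favourite Books.xml. It shows that `xmlToDataFrame()` returns text columns, so any numeric field has to be converted explicitly.

```r
library(XML)

# Hypothetical XML standing in for the real file (tag names are assumptions)
xml_text <- paste0(
  "<books>",
  "<book><title>Engineering Mathematics</title><year>2020</year></book>",
  "<book><title>The Language of SQL</title><year>2016</year></book>",
  "</books>"
)

# Parse from a string instead of a URL or HTTP response
doc <- xmlParse(xml_text, asText = TRUE)
books_xml <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Record the column types before any conversion
raw_types <- sapply(books_xml, class)
raw_types

# Convert the year explicitly if a numeric column is wanted
books_xml$year <- as.integer(books_xml$year)
```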
Read the JSON data into R
url3 <- "https://raw.githubusercontent.com/omocharly/DATA607_WK7Assignmt/main/Favourite%20Books.json"
df_JSON <- fromJSON(url3)
kable(df_JSON)
| Engineering Mathematics | K.A Stroud , Dexter Booth | Macmillian Education | 2020 | 8th | 978-1-352-01027-5 |
| Data Science for Business | Foster Provost, Tom Fawcett | O’Reilly Media, Inc | 2013 | 1st | 978-1-449-36132-7 |
| The Language of SQL | Larry Rockoff | Addison Wesley Professional | 2016 | 2nd | 978-0-134-65825-4 |
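With JSON, the column types follow how each value is written in the file. The sketch below uses two hypothetical inline JSON arrays (the keys `Title` and `Year` are assumptions, not taken from Favourite Books.json) to show that an unquoted year is read as a number while a quoted year is read as text.

```r
library(jsonlite)

# Hypothetical records: year written as a JSON number
year_as_number <- fromJSON(
  '[{"Title":"Engineering Mathematics","Year":2020},
    {"Title":"The Language of SQL","Year":2016}]'
)

# The same records with the year written as a JSON string
year_as_string <- fromJSON(
  '[{"Title":"Engineering Mathematics","Year":"2020"},
    {"Title":"The Language of SQL","Year":"2016"}]'
)

sapply(year_as_number, class)  # Year is numeric here
sapply(year_as_string, class)  # Year is character here
```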
Comparison Between the HTML, XML and JSON Data Frames
identical(df_HTML, df_XML)
## [1] FALSE
identical(df_HTML, df_JSON)
## [1] FALSE
identical(df_XML, df_JSON)
## [1] FALSE
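The three data frames are not identical, although they describe the same three books. The printed tables show one visible difference: the author separators vary between sources (for example, Foster Provost and Tom Fawcett are separated by a comma in the HTML and JSON tables but by a semicolon in the XML table). identical() also compares classes and column types, so any difference there gives FALSE as well.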
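A quick way to find out why `identical()` returns FALSE is to look at column types and then compare again after normalising them. The following is a sketch only: `a` and `b` are small hypothetical data frames, and `normalise()` is a hypothetical helper, none of which appear in the assignment itself.

```r
# Hypothetical stand-ins: same content, but Year is integer in one
# data frame and character in the other
a <- data.frame(Title = c("Book A", "Book B"),
                Year = c(2020L, 2013L),
                stringsAsFactors = FALSE)
b <- data.frame(Title = c("Book A", "Book B"),
                Year = c("2020", "2013"),
                stringsAsFactors = FALSE)

identical(a, b)   # FALSE, because the Year columns have different types
all.equal(a, b)   # describes the mismatch instead of returning TRUE
sapply(a, class)
sapply(b, class)

# Hypothetical helper: plain data.frame, every column as trimmed text
normalise <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  df[] <- lapply(df, function(col) trimws(as.character(col)))
  df
}

identical(normalise(a), normalise(b))   # TRUE once types are aligned
```

Applied to df_HTML, df_XML, and df_JSON, the same approach would still report differences where the cell text itself differs, such as the author separators, and it assumes the column names already match.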