A Preview of the data in .csv format

# read.csv() already returns a data frame, so no extra data.frame() call is needed
tnsbooks <- read.csv(file = "https://raw.githubusercontent.com/tagensingh/sps-data607-week7/main/SPS-D607-A-7.csv")

tnsbooks
##                                                                           Book.Name
## 1 The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race
## 2                                                                      Freakonomics
## 3                                           How to Win Friends and Influence People
## 4                                                                          King Rat
## 5                                                    Outliers: The Story of Success
## 6            R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
## 7                                                                                  
## 8                                                                                  
##        Book.Author     Book.Author.1         ISBN                 Publisher
## 1  Walter Isaacson                   9.781982e+12          Simon & Schuster
## 2 Steven D. Levitt  Stephen J Dubner 9.780062e+12  HarperCollins Publishers
## 3    Dale Carnegie                   9.780671e+12             Gallery Books
## 4    James Clavell                   9.781983e+12     Blackstone Publishing
## 5 Malcolm Gladwell                   9.780316e+12 Little, Brown and Company
## 6   Hadley Wickham Garrett Grolemund 9.781492e+12            O'Reilly Media
## 7                                              NA                          
## 8                                              NA                          
##   Publication.Date Pages BN.Sales.Rank
## 1         3/9/2021   560            13
## 2         9/1/2009   336         42497
## 3        10/1/1998   288           340
## 4        8/13/2019    NA         76054
## 5         6/7/2011   336          3474
## 6         1/7/2017   520        100080
## 7                     NA            NA
## 8                     NA            NA
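
The preview above shows two things worth handling at import time: the ISBN column is read as a number and printed in scientific notation (9.781982e+12), and rows 7 and 8 are empty. The following is a minimal sketch of one way to address both. It assumes a small inline sample (`csv_text`, hypothetical) in place of the real file and uses only three of its columns.

```r
# Hypothetical inline sample standing in for the real CSV file
csv_text <- "Book.Name,ISBN,Pages
King Rat,9781982537593,
Freakonomics,9780061956270,336
,,
,,"

books <- read.csv(
  text = csv_text,
  colClasses = c(ISBN = "character"),  # keep every ISBN digit as text
  na.strings = c("", "NA")             # treat empty fields as missing
)

# Drop rows in which every column is missing
books <- books[rowSums(!is.na(books)) > 0, ]
str(books)
```

Reading ISBN as character keeps it an identifier rather than a quantity, which also makes it directly comparable with the character ISBN column in htmldf.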

Loading the data for each DataFrame

HTML Frame Load
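
The code for this step is not shown in the rendered output. As a hedged sketch of one way to do it, the example below parses a small inline HTML table (`html_text`, a hypothetical stand-in for the real file) with `htmlParse()` and `readHTMLTable()` from the XML package. Which functions the original code used is an assumption. The `NULL.` prefix on the htmldf column names further down is consistent with wrapping the whole list returned by `readHTMLTable()` in `data.frame()`; taking the first list element instead avoids that prefix.

```r
library(XML)

# Hypothetical inline sample; the real page would be fetched from its URL
html_text <- "<html><body><table>
<tr><th>Book.Name</th><th>Pages</th></tr>
<tr><td>Freakonomics</td><td>336</td></tr>
<tr><td>Outliers: The Story of Success</td><td>336</td></tr>
</table></body></html>"

doc <- htmlParse(html_text, asText = TRUE)

# readHTMLTable() returns a list with one data frame per <table>
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# Take the table itself rather than calling data.frame(tables)
htmldf <- tables[[1]]
str(htmldf)
```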

XML Frame Load
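
The code for this step is also hidden. The xmldf output further down has a single row whose columns each hold one whole book, which would be consistent with the records sitting one level deeper than the loader expected. The sketch below shows one way to handle that case with the XML package by selecting the `<book>` nodes explicitly. The sample document (`xml_text`), its element names, and the choice of `xmlToDataFrame()` are assumptions, not the original code.

```r
library(XML)

# Hypothetical inline sample with a wrapper level above the <book> records
xml_text <- "<library>
  <shelf>
    <subject>TNS-Books</subject>
    <book><title>King Rat</title><publisher>Blackstone Publishing</publisher></book>
    <book><title>Freakonomics</title><publisher>HarperCollins Publishers</publisher></book>
  </shelf>
</library>"

doc <- xmlParse(xml_text, asText = TRUE)

# Select the <book> nodes so that each one becomes a row
xmldf <- xmlToDataFrame(nodes = getNodeSet(doc, "//book"),
                        stringsAsFactors = FALSE)
str(xmldf)
```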

JSON Frame Load
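
The JSON loading code is not shown either. A minimal sketch, assuming the jsonlite package and a hypothetical inline sample (`json_text`) in which the authors are stored as an array and the ISBN as a string:

```r
library(jsonlite)

# Hypothetical inline sample standing in for the real JSON file
json_text <- '[
  {"Book.Name": "King Rat",
   "Book.Author": ["James Clavell"],
   "ISBN": "9781982537593"},
  {"Book.Name": "Freakonomics",
   "Book.Author": ["Steven D. Levitt", "Stephen J Dubner"],
   "ISBN": "9780061956270"}
]'

# An array of objects is simplified into a data frame
jsdf <- fromJSON(json_text)
str(jsdf)
```

Because the author arrays differ in length, `Book.Author` comes back as a list column, which matches the structure reported by `str(jsdf)` later in this document. Storing the ISBN as a JSON string rather than a number keeps it from being printed in scientific notation.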

Comparing the DataFrames

htmldf
##                                                                      NULL.Book.Name
## 1 The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race
## 2                                                                      Freakonomics
## 3                                           How to Win Friends and Influence People
## 4                                                                          King Rat
## 5                                                    Outliers: The Story of Success
## 6            R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
##   NULL.Book.Author NULL.Book.Author.1     NULL.ISBN            NULL.Publisher
## 1  Walter Isaacson                    9781982115852          Simon & Schuster
## 2 Steven D. Levitt   Stephen J Dubner 9780061956270  HarperCollins Publishers
## 3    Dale Carnegie                    9780671027032             Gallery Books
## 4    James Clavell                    9781982537593     Blackstone Publishing
## 5 Malcolm Gladwell                    9780316017930 Little, Brown and Company
## 6   Hadley Wickham  Garrett Grolemund 9781491910399            O'Reilly Media
##   NULL.Publication.Date NULL.Pages NULL.BN.Sales.Rank
## 1              3/9/2021        560                 13
## 2              9/1/2009        336              42497
## 3             10/1/1998        288                340
## 4             8/13/2019                         76054
## 5              6/7/2011        336               3474
## 6              1/7/2017        520             100080
xmldf
##     subject
## 1 TNS-Books
##                                                                                  book
## 1 he Code BreakerSimon and Schuster9781982115852560Walter Isaacson3/9/202113Knowledge
##                                                                                          NA
## 1 FreakonomicHarperCollins Publishers9780061956270336Steven D. Levitt9/1/200942497Knowledge
##                                                                                                       NA
## 1 How to Win Friends and Influence PeopleGallery Books9780671027032288Dale Carnegie10/1/1998340Knowledge
##                                                                               NA
## 1 King RatBlackstone Publishing9781982537593James Clavell8/13/201976054Knowledge
##                                                                                                                    NA
## 1 Outliers: The Story of Success id="5"Little, Brown and Company9780316017930336Malcolm Gladwell6/7/20113474Knowledge
##                                                                                                                                          NA
## 1 R for Data Science: Import, Tidy, Transform, Visualize, and Model DataO'Reilly Media9781491910399520Hadley Wickham1/7/2017100080Knowledge
jsdf
##                                                                           Book.Name
## 1 The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race
## 2                                                                      Freakonomics
## 3                                           How to Win Friends and Influence People
## 4                                                                          King Rat
## 5                                                    Outliers: The Story of Success
## 6            R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
##                          Book.Author         ISBN                 Publisher
## 1                    Walter Isaacson 9.781982e+12          Simon & Schuster
## 2 Steven D. Levitt, Stephen J Dubner 9.780062e+12  HarperCollins Publishers
## 3                      Dale Carnegie 9.780671e+12             Gallery Books
## 4                      James Clavell 9.781983e+12     Blackstone Publishing
## 5                   Malcolm Gladwell 9.780316e+12 Little, Brown and Company
## 6  Hadley Wickham, Garrett Grolemund 9.781492e+12            O'Reilly Media
##   Publication.Date Pages BN.Sales.Rank
## 1         3/9/2021   560            13
## 2         9/1/2009   336         42497
## 3        10/1/1998   288           340
## 4        8/13/2019    NA         76054
## 5         6/7/2011   336          3474
## 6         1/7/2017   520        100080
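
The htmldf output above differs from the other frames in its `NULL.` column prefix and in its blank Pages value for King Rat. A base R sketch of one way to bring it in line with the others, using a hypothetical two-row stand-in for htmldf:

```r
# Hypothetical stand-in for htmldf with the same naming and typing issues
htmldf <- data.frame(
  NULL.Book.Name = c("King Rat", "Freakonomics"),
  NULL.ISBN = c("9781982537593", "9780061956270"),
  NULL.Pages = c("", "336"),
  stringsAsFactors = FALSE
)

# Strip the "NULL." prefix from every column name
names(htmldf) <- sub("^NULL\\.", "", names(htmldf))

# Convert Pages from character to integer; the blank field becomes NA
htmldf$Pages <- type.convert(htmldf$Pages, as.is = TRUE)
str(htmldf)
```

ISBN is deliberately left as character here so that all 13 digits stay visible.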

Comparing the DataFrame Structures

str(htmldf)
## 'data.frame':    6 obs. of  8 variables:
##  $ NULL.Book.Name       : chr  "The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race" "Freakonomics" "How to Win Friends and Influence People" "King Rat" ...
##  $ NULL.Book.Author     : chr  "Walter Isaacson" "Steven D. Levitt" "Dale Carnegie" "James Clavell" ...
##  $ NULL.Book.Author.1   : chr  "" "Stephen J Dubner" "" "" ...
##  $ NULL.ISBN            : chr  "9781982115852" "9780061956270" "9780671027032" "9781982537593" ...
##  $ NULL.Publisher       : chr  "Simon & Schuster" "HarperCollins Publishers" "Gallery Books" "Blackstone Publishing" ...
##  $ NULL.Publication.Date: chr  "3/9/2021" "9/1/2009" "10/1/1998" "8/13/2019" ...
##  $ NULL.Pages           : chr  "560" "336" "288" "" ...
##  $ NULL.BN.Sales.Rank   : chr  "13" "42497" "340" "76054" ...
str(xmldf)
## 'data.frame':    1 obs. of  7 variables:
##  $ subject: chr "TNS-Books"
##  $ book   : chr "he Code BreakerSimon and Schuster9781982115852560Walter Isaacson3/9/202113Knowledge"
##  $ NA     : chr "FreakonomicHarperCollins Publishers9780061956270336Steven D. Levitt9/1/200942497Knowledge"
##  $ NA     : chr "How to Win Friends and Influence PeopleGallery Books9780671027032288Dale Carnegie10/1/1998340Knowledge"
##  $ NA     : chr "King RatBlackstone Publishing9781982537593James Clavell8/13/201976054Knowledge"
##  $ NA     : chr "Outliers: The Story of Success id=\"5\"Little, Brown and Company9780316017930336Malcolm Gladwell6/7/20113474Knowledge"
##  $ NA     : chr "R for Data Science: Import, Tidy, Transform, Visualize, and Model DataO'Reilly Media9781491910399520Hadley Wick"| __truncated__
str(jsdf)
## 'data.frame':    6 obs. of  7 variables:
##  $ Book.Name       : chr  "The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race" "Freakonomics" "How to Win Friends and Influence People" "King Rat" ...
##  $ Book.Author     :List of 6
##   ..$ : chr "Walter Isaacson"
##   ..$ : chr  "Steven D. Levitt" "Stephen J Dubner"
##   ..$ : chr "Dale Carnegie"
##   ..$ : chr "James Clavell"
##   ..$ : chr "Malcolm Gladwell"
##   ..$ : chr  "Hadley Wickham" "Garrett Grolemund"
##  $ ISBN            : num  9.78e+12 9.78e+12 9.78e+12 9.78e+12 9.78e+12 ...
##  $ Publisher       : chr  "Simon & Schuster" "HarperCollins Publishers" "Gallery Books" "Blackstone Publishing" ...
##  $ Publication.Date: chr  "3/9/2021" "9/1/2009" "10/1/1998" "8/13/2019" ...
##  $ Pages           : int  560 336 288 NA 336 520
##  $ BN.Sales.Rank   : int  13 42497 340 76054 3474 100080
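
In the structure above, `Book.Author` in jsdf is a list column, while the other frames hold authors as plain character columns. If a flat column is preferred, each author vector can be collapsed into a single string. A base R sketch on a hypothetical two-row stand-in for jsdf:

```r
# Hypothetical stand-in for jsdf with a list column of authors
jsdf <- data.frame(
  Book.Name = c("King Rat", "Freakonomics"),
  stringsAsFactors = FALSE
)
jsdf$Book.Author <- list(
  "James Clavell",
  c("Steven D. Levitt", "Stephen J Dubner")
)

# Collapse each author vector into one comma-separated string
jsdf$Book.Author <- vapply(jsdf$Book.Author, paste, character(1),
                           collapse = ", ")
str(jsdf)
```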

My Conclusions

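The dataset can be read into R from all three formats. However, the XML and JSON files must be carefully formatted to be read correctly by the parsing package: in the output above, xmldf collapsed into a single observation of 7 variables, with each book's fields concatenated into one string, while htmldf and jsdf each contain 6 observations. The column types also differ: htmldf stores every column as character, whereas jsdf keeps Pages and BN.Sales.Rank as integers and stores Book.Author as a list. The power of the JSON and XML formats lies in the amount of data that can be stored and then uploaded into a relational database, for example; I have recently uploaded JSON data files of upwards of 350 MB. HTML and XML files look almost identical, while JSON has a different structure but a visually similar look.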

Note: The RCurl package produces an error when used on desktop RStudio. This work was produced using the rstudio.cloud version of RStudio.

The error encountered was:

Quitting from lines 38-56 (SPS-DATA607-A7_v2.Rmd)
Error in function (type, msg, asError = TRUE):
error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Calls: … eval -> getURL -> curlPerform -> -> fun
Execution halted
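
The `tlsv1 alert protocol version` message suggests that the server rejected the TLS version offered by the client. One possible workaround, assuming the limitation lies in the libcurl build that RCurl uses on the desktop machine, is to fetch the file with base R connections instead of `getURL()`. The helper name `fetch_text` is hypothetical, and this approach has not been verified against the original environment.

```r
# Hypothetical helper: read a text resource through a base R connection
fetch_text <- function(location) {
  paste(readLines(location, warn = FALSE), collapse = "\n")
}

# Demonstrated on a local temporary file so that the sketch runs offline;
# passing an https URL instead is assumed to work the same way
tmp <- tempfile(fileext = ".csv")
writeLines(c("Book.Name,Pages", "Freakonomics,336"), tmp)

csv_text <- fetch_text(tmp)
books <- read.csv(text = csv_text)
str(books)
```

The returned text can then be passed on to whichever parser suits the format, just as the string returned by `getURL()` would be.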