##Assignment - Working with XML and JSON in R

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.

Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an HTML table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats.

Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?

##Data Source

For this assignment I created three data files in TextPad, saving each file with a different extension (.json, .xml, or .html). I then uploaded the files to my GitHub repository.

#These packages were needed for this assignment; after installing them once, I commented out the install lines.
#install.packages('rvest')  - for loading html files
#install.packages('XML')    - for loading xml files           
#install.packages('jsonlite')  - for loading json files

library(rvest)
library(XML)
library(jsonlite)
library(httr)

##Loading my “.json” file

# Reading in the .json file from GitHub
gitJson <- "https://raw.githubusercontent.com/carolc57/Data607-Fall23/main/booklist.json"
booklistjson <- fromJSON(gitJson, flatten = TRUE)
head(booklistjson)
##                            title         author_1      author_2        author_3
## 1 Things I Wish I Told My Mother  Susan Patterson Susan Dilallo James Patterson
## 2                      The Relic  Douglas Preston Lincoln Child            <NA>
## 3                The Glass Ocean Beatriz Williams Lauren Willis     Karen White
##                                  genre num_pgs      type ratings
## 1       Fiction, Romance, Contemporary     320 hardcover    3.99
## 2            Horror, Thriller, Mystery     480 paperback    4.05
## 3 Historical Fiction, Romance, Mystery     624 paperback    3.88
#ratings column came in as chr type; need to change to numeric
booklistjson <- transform(booklistjson, ratings = as.numeric(ratings))

class(booklistjson$ratings)
## [1] "numeric"
booklistjson
##                            title         author_1      author_2        author_3
## 1 Things I Wish I Told My Mother  Susan Patterson Susan Dilallo James Patterson
## 2                      The Relic  Douglas Preston Lincoln Child            <NA>
## 3                The Glass Ocean Beatriz Williams Lauren Willis     Karen White
##                                  genre num_pgs      type ratings
## 1       Fiction, Romance, Contemporary     320 hardcover    3.99
## 2            Horror, Thriller, Mystery     480 paperback    4.05
## 3 Historical Fiction, Romance, Mystery     624 paperback    3.88
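
The same load-and-convert pattern can be tried without a network connection. The sketch below is a minimal example, assuming a hypothetical two-record JSON string modeled on the columns shown above (it is not the contents of the actual booklist.json, and `sample_books` is an illustrative name). It shows that `fromJSON()` fills a field that is absent from one record with `NA`, which is one way a two-author book can sit next to a three-author book in the same data frame.

```r
library(jsonlite)

# Hypothetical sample records, not the real booklist.json
json_txt <- '[
  {"title": "The Relic", "author_1": "Douglas Preston",
   "author_2": "Lincoln Child", "ratings": "4.05"},
  {"title": "The Glass Ocean", "author_1": "Beatriz Williams",
   "author_2": "Lauren Willis", "author_3": "Karen White", "ratings": "3.88"}
]'

# fromJSON() accepts a JSON string as well as a URL or file path
sample_books <- fromJSON(json_txt)

# Ratings stored as quoted strings arrive as character; convert explicitly
sample_books$ratings <- as.numeric(sample_books$ratings)

str(sample_books)
```

Storing the ratings as unquoted numbers in the JSON file would likely make the conversion step unnecessary.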

##Loading my “.html” file

# Reading from HTML
gitHtml <- "https://github.com/carolc57/Data607-Fall23/blob/main/booklist.html"
booklisthtml <- gitHtml %>% 
  read_html() %>%
  html_table(fill = TRUE) %>%
  .[[1]]
head(booklisthtml)
## # A tibble: 3 x 6
##   title                          author            genre num_pages type  ratings
##   <chr>                          <chr>             <chr>     <int> <chr> <chr>  
## 1 Things I Wish I Told My Mother Susan Patterson,~ Fict~       320 hard~ 3.99   
## 2 the Relic                      Douglas Preston,~ Horr~       480 soft~ 4.05<  
## 3 The Glass Ocean                Beatriz Williams~ Hist~       624 soft~ 3.88
#convert to dataframe
booklisthtml_df <- as.data.frame(booklisthtml)

#ratings column came in as chr type; need to change to numeric
#note: this converts booklisthtml (the tibble), not booklisthtml_df, so booklisthtml_df keeps ratings as chr
booklisthtml <- transform(booklisthtml, ratings = as.numeric(ratings))
## Warning in eval(substitute(list(...)), `_data`, parent.frame()): NAs introduced
## by coercion
#the warning comes from the value "4.05<", which cannot be parsed as a number and becomes NA
class(booklisthtml$ratings)
## [1] "numeric"
booklisthtml_df
##                            title
## 1 Things I Wish I Told My Mother
## 2                      the Relic
## 3                The Glass Ocean
##                                            author
## 1 Susan Patterson, Susan Dilallo, James Patterson
## 2                  Douglas Preston, Lincoln Child
## 3    Beatriz Williams, Lauren Willis, Karen White
##                                  genre num_pages      type ratings
## 1       Fiction, Romance, Contemporary       320 hardcover    3.99
## 2            Horror, Thriller, Mystery       480 softcover   4.05<
## 3 Historical Fiction, Romance, Mystery       624 softcover    3.88
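
In the output above, one ratings value reads `4.05<`, which `as.numeric()` turns into `NA` with a coercion warning. A minimal sketch of one possible workaround follows, assuming a small stand-in data frame (`html_like` is a hypothetical name, with values copied from the printed table) rather than the live HTML file. It strips stray characters before converting.

```r
# Hypothetical stand-in for the parsed HTML table, using values from the output above
html_like <- data.frame(
  title = c("Things I Wish I Told My Mother", "the Relic", "The Glass Ocean"),
  ratings = c("3.99", "4.05<", "3.88"),
  stringsAsFactors = FALSE
)

# Drop every character that is not a digit or a decimal point, then convert
html_like$ratings <- as.numeric(gsub("[^0-9.]", "", html_like$ratings))

html_like$ratings
class(html_like$ratings)
```

Correcting the stray `<` in the HTML file itself would be the cleaner remedy; the regular expression is only a load-time workaround.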

##Loading my “.xml” file

library(xml2)

booklistxml <- xmlParse(read_xml('https://raw.githubusercontent.com/carolc57/Data607-Fall23/main/booklist.xml'))
booklistxml
## <?xml version="1.0" encoding="UTF-8"?>
## <Books>
##   <Book>
##     <title>Things I Wish I Told My Mother</title>
##     <author>Susan Patterson, Susan Dilallo, James Patterson</author>
##     <genre>Fiction, Romance, Contemporary</genre>
##     <pages>320</pages>
##     <type>hardcover</type>
##     <ratings>3.99</ratings>
##   </Book>
##   <Book>
##     <title>The Relic</title>
##     <author>Douglas Preston, Lincoln Child</author>
##     <genre>Horror, Thriller, Mystery</genre>
##     <pages>480</pages>
##     <type>softcover</type>
##     <ratings>4.05</ratings>
##   </Book>
##   <Book>
##     <title>The Glass Ocean</title>
##     <author>Beatriz Williams, Lauren Willis, Karen White</author>
##     <genre>Historical Fiction, Romance, Mystery</genre>
##     <pages>624</pages>
##     <type>softcover</type>
##     <ratings>3.88</ratings>
##   </Book>
## </Books>
## 
#transform xml file to dataframe
booklistxml_df <- xmlToDataFrame(booklistxml)
booklistxml_df
##                            title
## 1 Things I Wish I Told My Mother
## 2                      The Relic
## 3                The Glass Ocean
##                                            author
## 1 Susan Patterson, Susan Dilallo, James Patterson
## 2                  Douglas Preston, Lincoln Child
## 3    Beatriz Williams, Lauren Willis, Karen White
##                                  genre pages      type ratings
## 1       Fiction, Romance, Contemporary   320 hardcover    3.99
## 2            Horror, Thriller, Mystery   480 softcover    4.05
## 3 Historical Fiction, Romance, Mystery   624 softcover    3.88
class(booklistxml_df$ratings)
## [1] "character"
#ratings column came in as chr type; need to change to numeric
booklistxml_df <- transform(booklistxml_df, ratings = as.numeric(ratings))
booklistxml_df
##                            title
## 1 Things I Wish I Told My Mother
## 2                      The Relic
## 3                The Glass Ocean
##                                            author
## 1 Susan Patterson, Susan Dilallo, James Patterson
## 2                  Douglas Preston, Lincoln Child
## 3    Beatriz Williams, Lauren Willis, Karen White
##                                  genre pages      type ratings
## 1       Fiction, Romance, Contemporary   320 hardcover    3.99
## 2            Horror, Thriller, Mystery   480 softcover    4.05
## 3 Historical Fiction, Romance, Mystery   624 softcover    3.88
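
An alternative that stays within the xml2 package is sketched below, offered as a possible approach and not as the method used above. It assumes an inline XML string cut down from the document printed earlier, and `field()` and `xml_books` are hypothetical names introduced for illustration. Each column is pulled out with an XPath query, so its type can be set at the moment the data frame is built.

```r
library(xml2)

# Shortened sample based on the XML printed above (two books, three fields)
xml_txt <- '<Books>
  <Book><title>The Relic</title><pages>480</pages><ratings>4.05</ratings></Book>
  <Book><title>The Glass Ocean</title><pages>624</pages><ratings>3.88</ratings></Book>
</Books>'

doc <- read_xml(xml_txt)
book_nodes <- xml_find_all(doc, "//Book")

# Hypothetical helper: text of the named child element, one value per <Book>
field <- function(name) xml_text(xml_find_first(book_nodes, paste0("./", name)))

xml_books <- data.frame(
  title = field("title"),
  pages = as.integer(field("pages")),
  ratings = as.numeric(field("ratings")),
  stringsAsFactors = FALSE
)

str(xml_books)
```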

##Are the three data frames identical?
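
Because `identical()` only answers yes or no, it can help to look at where two data frames differ. The sketch below is a minimal example using two small stand-in data frames (`json_like` and `xml_like` are hypothetical, modeled on the column names and types seen in the outputs above). It compares names and column classes, then harmonizes them before checking again.

```r
# Hypothetical stand-ins: same books, different column name and column type
json_like <- data.frame(
  title = c("The Relic", "The Glass Ocean"),
  num_pgs = c(480L, 624L),
  ratings = c(4.05, 3.88),
  stringsAsFactors = FALSE
)
xml_like <- data.frame(
  title = c("The Relic", "The Glass Ocean"),
  pages = c("480", "624"),
  ratings = c(4.05, 3.88),
  stringsAsFactors = FALSE
)

before <- identical(json_like, xml_like)

# Which column names appear in one data frame but not the other?
only_in_json <- setdiff(names(json_like), names(xml_like))
only_in_xml <- setdiff(names(xml_like), names(json_like))

# Column classes side by side
sapply(json_like, class)
sapply(xml_like, class)

# Harmonize: use one column name and one type for the page counts
names(json_like)[names(json_like) == "num_pgs"] <- "pages"
xml_like$pages <- as.integer(xml_like$pages)

after <- identical(json_like, xml_like)
print(c(before = before, after = after))
```

`all.equal()` is a gentler alternative to `identical()` here, since it reports the specific differences it finds.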

```r
# Checking if the dataframes are identical
identical(booklistjson, booklistxml_df)
## [1] FALSE
identical(booklisthtml_df, booklistxml_df)  
## [1] FALSE
identical(booklistjson, booklisthtml_df)  
## [1] FALSE
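# They differ in column names (num_pgs, num_pages, pages), in how authors are stored
# (three columns in JSON, one column in HTML and XML), and in column types
print("No, the three data frames are not identical to each other.")
## [1] "No, the three data frames are not identical to each other."
```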