DATA 607 Assignment 7: Working with XML and JSON in R (BKvarnstrom)

#INSTRUCTIONS

Working with XML and JSON in R

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.

Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats.

Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?

Your deliverable is the three source files and the R code. If you can, package your assignment solution up into an .Rmd file and publish to rpubs.com. [This will also require finding a way to make your three text files accessible from the web].

#SOLUTION

Three books on skincare for Black Women are:

Title: “The Black Skin Care Guide: How to Achieve and Maintain Flawless Skin for Black Women”
Authors: Candace Jenkin, Agnes Jean-Baptiste
Other attributes: Publication Year: 2019; Paperback: 162 pages

Title: “Skincare for Women of Color: How to Fade Dark Spots, Even Skin Tone, and Achieve a Healthy, Radiant Complexion”
Authors: Celia J. Anderson, Danielle J. Glover
Other attributes: Publication Year: 2020; Paperback: 208 pages

Title: “Beautiful Black Skin: For Women of Color - Unlocking the Secrets to Beautiful Skin”
Authors: Carla Trowell
Other attributes: Publication Year: 2017; Paperback: 126 pages

Load Packages

pkges <- c("XML", "xml2", "jsonlite", "kableExtra", "git2r", "gh", "RCurl", "rvest")

# Loop through the packages: install any that are missing, then load each one
for (p in pkges) {
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
# The following data frame contains the information for the three skincare books
books <- data.frame(
  BookTitle = c("The Black Skin Care Guide: How to Achieve and Maintain Flawless Skin for Black Women",
            "Skincare for Women of Color: How to Fade Dark Spots, Even Skin Tone, and Achieve a Healthy, Radiant Complexion",
            "Beautiful Black Skin: For Women of Color - Unlocking the Secrets to Beautiful Skin"),
  Authors = c("Candace Jenkin, Agnes Jean-Baptiste",
              "Celia J. Anderson, Danielle J. Glover",
              "Carla Trowell"),
  Publication_Year = c(2019, 2020, 2017),
  Format = c("Paperback", "Paperback", "Paperback"),
  Book_Price = c("$44.95", "$35.95", "$42")
)

books %>% kable() %>% 
  kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
  scroll_box(height = "100%", width = "100%")
| BookTitle | Authors | Publication_Year | Format | Book_Price |
|---|---|---|---|---|
| The Black Skin Care Guide: How to Achieve and Maintain Flawless Skin for Black Women | Candace Jenkin, Agnes Jean-Baptiste | 2019 | Paperback | $44.95 |
| Skincare for Women of Color: How to Fade Dark Spots, Even Skin Tone, and Achieve a Healthy, Radiant Complexion | Celia J. Anderson, Danielle J. Glover | 2020 | Paperback | $35.95 |
| Beautiful Black Skin: For Women of Color - Unlocking the Secrets to Beautiful Skin | Carla Trowell | 2017 | Paperback | $42 |
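
Because the Authors column stores several names in one comma-separated string, it can help to see how those names could be separated when individual authors are needed. The following is a minimal sketch in base R, not part of the original solution; the names `authors`, `author_list`, and `author_counts` are illustrative only.

```r
# Author strings copied from the books data frame above
authors <- c("Candace Jenkin, Agnes Jean-Baptiste",
             "Celia J. Anderson, Danielle J. Glover",
             "Carla Trowell")

# Split on a comma followed by optional whitespace; the result is a list
# with one character vector of names per book
author_list <- strsplit(authors, ",\\s*")

# Number of authors per book
author_counts <- lengths(author_list)
author_counts
```

A list like this maps naturally onto repeated `<author>` elements in XML or an array in JSON, whereas an HTML table cell would still need a single string.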

##File Names

HTML_file <- "skincare_books.html"

XML_file <- "skincare_books.xml"

JSON_file <- "skincare_books.json"

Create Files (HTML, XML, and JSON)

The following code creates the three files locally. I tried to upload the files to my GitHub repository automatically, but that code did not work, so I uploaded them manually. (The files will also be created on any local computer that the code is run on.)

HTML file: skincare_books.html. The code below creates the HTML file locally.

# Generate the HTML file
html_table <- paste0("<table>\n",
                     "<tr>\n<th>Title</th>\n<th>Authors</th>\n<th>Publication Year</th>\n<th>Format</th>\n<th>Price</th>\n</tr>\n")

for (i in 1:nrow(books)) {
  html_table <- paste0(html_table,
                       "<tr>\n",
                       "<td>", books$BookTitle[i], "</td>\n",
                       "<td>", books$Authors[i], "</td>\n",
                       "<td>", books$Publication_Year[i], "</td>\n",
                       "<td>", books$Format[i], "</td>\n",
                       "<td>", books$Book_Price[i], "</td>\n",
                       "</tr>\n")
}

html_books <- paste0(html_table, "</table>")

#Save the HTML file in local directory
writeLines(html_books, HTML_file)


HTMLfile_GIT <- getURL("https://raw.githubusercontent.com/BeshkiaKvarnstrom/MSDS-DATA607/main/skincare_books.html")

#Parse the downloaded HTML text using htmlParse (asText = TRUE because the input is a string, not a file path)
HTMLbooks_parsed <- htmlParse(file = HTMLfile_GIT, asText = TRUE)
HTMLbooks_parsed
## <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
## <html><body><table>
## <tr>
## <th>Title</th>
## <th>Authors</th>
## <th>Publication Year</th>
## <th>Format</th>
## <th>Price</th>
## </tr>
## <tr>
## <td>The Black Skin Care Guide: How to Achieve and Maintain Flawless Skin for Black Women</td>
## <td>Candace Jenkin, Agnes Jean-Baptiste</td>
## <td>2019</td>
## <td>Paperback</td>
## <td>$44.95</td>
## </tr>
## <tr>
## <td>Skincare for Women of Color: How to Fade Dark Spots, Even Skin Tone, and Achieve a Healthy, Radiant Complexion</td>
## <td>Celia J. Anderson, Danielle J. Glover</td>
## <td>2020</td>
## <td>Paperback</td>
## <td>$35.95</td>
## </tr>
## <tr>
## <td>Beautiful Black Skin: For Women of Color - Unlocking the Secrets to Beautiful Skin</td>
## <td>Carla Trowell</td>
## <td>2017</td>
## <td>Paperback</td>
## <td>$42</td>
## </tr>
## </table></body></html>
## 
#Load the HTML File into a Dataframe
HTMLfile_DF <- HTMLfile_GIT %>%
  read_html(encoding = 'UTF-8') %>%
  html_table(header = NA, trim = TRUE) %>%
  .[[1]]

#Display the contents of the HTML file
HTMLfile_DF %>% kable() %>% 
  kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
  scroll_box(height = "100%", width = "100%")
| Title | Authors | Publication Year | Format | Price |
|---|---|---|---|---|
| The Black Skin Care Guide: How to Achieve and Maintain Flawless Skin for Black Women | Candace Jenkin, Agnes Jean-Baptiste | 2019 | Paperback | $44.95 |
| Skincare for Women of Color: How to Fade Dark Spots, Even Skin Tone, and Achieve a Healthy, Radiant Complexion | Celia J. Anderson, Danielle J. Glover | 2020 | Paperback | $35.95 |
| Beautiful Black Skin: For Women of Color - Unlocking the Secrets to Beautiful Skin | Carla Trowell | 2017 | Paperback | $42 |
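
The HTML table above is assembled by pasting cell values directly between `<td>` tags. That works for these three books, but a value containing `&`, `<`, or `>` would produce malformed HTML. The sketch below shows one way the cells could be escaped first, using base R only. The helpers `escape_html` and `cell` are hypothetical and not part of the original solution, and the sample title is made up for illustration.

```r
# Hypothetical helper: replace the characters that are special in HTML text.
# The ampersand must be handled first so later entities are not double-escaped.
escape_html <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x
}

# Hypothetical helper: wrap an escaped value in a table cell
cell <- function(x) paste0("<td>", escape_html(x), "</td>")

# Made-up values that would break an unescaped table
row_html <- paste0("<tr>", cell("Skin & Hair <2nd ed.>"), cell("A. Author"), "</tr>")
row_html
```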

XML file: skincare_books.xml. The code below creates the XML file locally.

# Generate the XML file
xml_books <- xml_new_root("books")

for (i in 1:nrow(books)) {
  xml_book <- xml_add_child(xml_books, "book")
  
  xml_add_child(xml_book, "title", books$BookTitle[i])
  xml_add_child(xml_book, "authors", books$Authors[i])
  xml_add_child(xml_book, "year", as.character(books$Publication_Year[i]))
  xml_add_child(xml_book, "format", books$Format[i])
  xml_add_child(xml_book, "price", books$Book_Price[i])
}

xml_books <- as.character(xml2::as_xml_document(xml_books))

#Save the xml file in local directory
writeLines(xml_books, XML_file)

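#Parse the XML text generated above (asText = TRUE because the input is a string, not a file path)
xmlbooks_parsed <- xmlParse(file = xml_books, asText = TRUE)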
xmlbooks_parsed
## <?xml version="1.0" encoding="UTF-8"?>
## <books>
##   <book>
##     <title>The Black Skin Care Guide: How to Achieve and Maintain Flawless Skin for Black Women</title>
##     <authors>Candace Jenkin, Agnes Jean-Baptiste</authors>
##     <year>2019</year>
##     <format>Paperback</format>
##     <price>$44.95</price>
##   </book>
##   <book>
##     <title>Skincare for Women of Color: How to Fade Dark Spots, Even Skin Tone, and Achieve a Healthy, Radiant Complexion</title>
##     <authors>Celia J. Anderson, Danielle J. Glover</authors>
##     <year>2020</year>
##     <format>Paperback</format>
##     <price>$35.95</price>
##   </book>
##   <book>
##     <title>Beautiful Black Skin: For Women of Color - Unlocking the Secrets to Beautiful Skin</title>
##     <authors>Carla Trowell</authors>
##     <year>2017</year>
##     <format>Paperback</format>
##     <price>$42</price>
##   </book>
## </books>
## 
XMLfile_GIT <- getURL("https://raw.githubusercontent.com/BeshkiaKvarnstrom/MSDS-DATA607/main/skincare_books.xml")


#Load the XML File into a Dataframe
XMLfile_DF <- xmlToDataFrame(XMLfile_GIT)

#Display the contents of the XML file
XMLfile_DF %>% kable() %>% 
  kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
  scroll_box(height = "100%", width = "100%")
| title | authors | year | format | price |
|---|---|---|---|---|
| The Black Skin Care Guide: How to Achieve and Maintain Flawless Skin for Black Women | Candace Jenkin, Agnes Jean-Baptiste | 2019 | Paperback | $44.95 |
| Skincare for Women of Color: How to Fade Dark Spots, Even Skin Tone, and Achieve a Healthy, Radiant Complexion | Celia J. Anderson, Danielle J. Glover | 2020 | Paperback | $35.95 |
| Beautiful Black Skin: For Women of Color - Unlocking the Secrets to Beautiful Skin | Carla Trowell | 2017 | Paperback | $42 |
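
The XML data frame above comes from `xmlToDataFrame` in the XML package. As an alternative, the same `<books>/<book>` structure could be read with xml2, which is already loaded, by selecting nodes with XPath and choosing each column's type explicitly. This is only a sketch under that assumption; the inline XML text and the helper `field` are illustrative stand-ins, not the published file.

```r
library(xml2)

# Small stand-in document with the same element names as skincare_books.xml
xml_text_demo <- '<books>
  <book><title>Book A</title><authors>Author One, Author Two</authors><year>2019</year></book>
  <book><title>Book B</title><authors>Author Three</authors><year>2020</year></book>
</books>'

doc <- read_xml(xml_text_demo)

# One node per <book> element
book_nodes <- xml_find_all(doc, ".//book")

# Hypothetical helper: text of the named child element for every book
field <- function(name) xml_text(xml_find_first(book_nodes, paste0("./", name)))

xml_demo_df <- data.frame(
  title   = field("title"),
  authors = field("authors"),
  year    = as.integer(field("year")),  # convert explicitly; XML stores text only
  stringsAsFactors = FALSE
)
xml_demo_df
```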

JSON file: skincare_books.json. The code below creates the JSON file locally and loads the contents of the file from GitHub into a data frame.

# Generate the JSON file
JSON_books <- toJSON(list(books = books), auto_unbox = TRUE)

#Save the JSON file in local directory
writeLines(JSON_books, JSON_file)

JSON_file_GIT <- read_json("https://raw.githubusercontent.com/BeshkiaKvarnstrom/MSDS-DATA607/main/skincare_books.json", simplifyVector = TRUE)

#Load the JSON File into a Dataframe (the parsed JSON is a list; its "books" element holds the data frame)
booksJSONDF <- JSON_file_GIT$books

#Use the fromJSON function to parse the JSON text generated above
JSONbooks_parsed <- fromJSON(JSON_books)
JSONbooks_parsed
## $books
##                                                                                                        BookTitle
## 1                           The Black Skin Care Guide: How to Achieve and Maintain Flawless Skin for Black Women
## 2 Skincare for Women of Color: How to Fade Dark Spots, Even Skin Tone, and Achieve a Healthy, Radiant Complexion
## 3                             Beautiful Black Skin: For Women of Color - Unlocking the Secrets to Beautiful Skin
##                                 Authors Publication_Year    Format Book_Price
## 1   Candace Jenkin, Agnes Jean-Baptiste             2019 Paperback     $44.95
## 2 Celia J. Anderson, Danielle J. Glover             2020 Paperback     $35.95
## 3                         Carla Trowell             2017 Paperback        $42
head(booksJSONDF) %>% kable() %>% 
  kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
  scroll_box(height = "100%", width = "100%")
| BookTitle | Authors | Publication_Year | Format | Book_Price |
|---|---|---|---|---|
| The Black Skin Care Guide: How to Achieve and Maintain Flawless Skin for Black Women | Candace Jenkin, Agnes Jean-Baptiste | 2019 | Paperback | $44.95 |
| Skincare for Women of Color: How to Fade Dark Spots, Even Skin Tone, and Achieve a Healthy, Radiant Complexion | Celia J. Anderson, Danielle J. Glover | 2020 | Paperback | $35.95 |
| Beautiful Black Skin: For Women of Color - Unlocking the Secrets to Beautiful Skin | Carla Trowell | 2017 | Paperback | $42 |
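
Because the JSON is written as `list(books = books)`, parsing it returns a list with a `books` element, as the `$books` header in the parsed output shows, and the data frame sits inside that element. The sketch below illustrates the round trip with jsonlite on a small stand-in data frame; `books_demo` and its values are made up for illustration.

```r
library(jsonlite)

# Stand-in data frame with a few of the same column names as the books data
books_demo <- data.frame(
  BookTitle = c("Book A", "Book B"),
  Publication_Year = c(2019, 2020),
  Book_Price = c("$44.95", "$35.95"),
  stringsAsFactors = FALSE
)

# Wrap the data frame in a named list, as in the solution above,
# and pretty-print so the file structure is easy to read
json_text <- toJSON(list(books = books_demo), auto_unbox = TRUE, pretty = TRUE)
writeLines(json_text)

# Parsing gives back a list; the data frame is the "books" element
parsed <- fromJSON(json_text)
round_trip <- parsed$books
str(round_trip)
```

Calling `str()` on the result is a quick way to see which column types the parser chose.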

Are the three data frames identical?

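The three source files (HTML, XML, JSON) differ in structure. The three data frames hold the same values and look the same when displayed, but they are not strictly identical. The column names differ (Title, Publication Year, Price from HTML; title, year, price from XML; BookTitle, Publication_Year, Book_Price from JSON), and the parsers can assign different column types. For example, xmlToDataFrame reads every value as text, so the year column in the XML data frame is not numeric.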
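
The comparison can also be made in code rather than by eye. The sketch below uses small stand-in data frames that mimic the column names shown in the outputs above; the column types are assumptions about what each parser returns. It first tests strict equality with `identical()`, then compares again after giving every frame the same names and converting every column to character. The helper `harmonize` is hypothetical.

```r
# Stand-in frames; names follow the outputs above, types are assumed
html_df <- data.frame(Title = c("Book A", "Book B", "Book C"),
                      `Publication Year` = c(2019L, 2020L, 2017L),
                      check.names = FALSE, stringsAsFactors = FALSE)
xml_df <- data.frame(title = c("Book A", "Book B", "Book C"),
                     year = c("2019", "2020", "2017"),
                     stringsAsFactors = FALSE)
json_df <- data.frame(BookTitle = c("Book A", "Book B", "Book C"),
                      Publication_Year = c(2019L, 2020L, 2017L),
                      stringsAsFactors = FALSE)

# Strict comparison: names, types, and values must all match
strict_same <- identical(html_df, xml_df) && identical(xml_df, json_df)

# Hypothetical helper: common column names and character columns throughout
harmonize <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  names(df) <- c("title", "year")
  df[] <- lapply(df, as.character)
  df
}

# Value comparison after harmonizing names and types
values_same <- identical(harmonize(html_df), harmonize(xml_df)) &&
  identical(harmonize(xml_df), harmonize(json_df))

c(strict = strict_same, values = values_same)
```

With these stand-ins the strict check fails while the harmonized check passes, which is the pattern to expect when only names and types differ.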