Week 7 HW 607

Introduction

This assignment creates three different file types (XML, JSON, and HTML) to practice loading information into R. The three books I picked are ones I am currently looking at, since French cuisine is something I want to learn more about. For this exercise, each file records the book's title, its author(s), and notes describing some of its characteristics.

.XML File

#Libraries Used
library(xml2)
library(RCurl)

#Gathering information from Github
doc <- read_xml( getURL("https://raw.githubusercontent.com/Jlok17/Data-Science-Projects/main/FrenchBooks.xml"))

# Create an empty data frame to store the information
XML <- data.frame(title = character(),
                 author = character(),
                 notes = character(),
                 stringsAsFactors = FALSE)

# Get the root element of the XML file
root <- xml_find_first(doc, "/books")

# Transfer each book's title, author, and notes into the data frame
for (book in xml_children(root)) {
  title <- xml_text(xml_find_first(book, ".//title"))
  author <- xml_text(xml_find_first(book, ".//author"))
  notes <- xml_text(xml_find_first(book, ".//notes"))
  XML <- rbind(XML, data.frame(title = title,
                             author = author,
                             notes = notes,
                             stringsAsFactors = FALSE))
}

# Results
print(XML)

##                                                                    title
## 1                                    Mastering the Art of French Cooking
## 2                                            The French Laundry Cookbook
## 3 Essential Pépin: More Than 700 All-Time Favorites from My Life in Food
##          author                   notes
## 1   Julia Child                 Classic
## 2 Thomas Keller        Signature dishes
## 3 Jacques Pépin Favorite French recipes
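
The loop above grows the data frame one row at a time with rbind. As a possible alternative, the sketch below builds the same three columns in one step, relying on xml_find_first() accepting a whole node set. It uses an inline XML string rather than the GitHub file so it runs offline; the <book> element name, the layout of the string, and the names xml_string, doc_example, and xml_df are assumptions made for illustration.

```r
library(xml2)

# Inline stand-in for FrenchBooks.xml. The <book> element name and this
# layout are assumptions inferred from the XPath queries used above.
xml_string <- '<books>
  <book>
    <title>Mastering the Art of French Cooking</title>
    <author>Julia Child</author>
    <notes>Classic</notes>
  </book>
  <book>
    <title>The French Laundry Cookbook</title>
    <author>Thomas Keller</author>
    <notes>Signature dishes</notes>
  </book>
</books>'

doc_example <- read_xml(xml_string)

# All child nodes of the root, one per book
books <- xml_children(doc_example)

# xml_find_first() is applied to every node in the set, so each column
# is extracted in a single call instead of inside a loop
xml_df <- data.frame(
  title  = xml_text(xml_find_first(books, "./title")),
  author = xml_text(xml_find_first(books, "./author")),
  notes  = xml_text(xml_find_first(books, "./notes")),
  stringsAsFactors = FALSE
)

print(xml_df)
```

This avoids repeated rbind calls, which copy the data frame on every iteration; for three books the difference is negligible, but the pattern scales better.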

.JSON File

#Libraries Used
library(jsonlite)

#Gathering information from Github
Json <- fromJSON("https://raw.githubusercontent.com/Jlok17/Data-Science-Projects/main/FrenchBooks.json")

# Results
head(Json)

## $books
##                                                                    title
## 1                                    Mastering the Art of French Cooking
## 2                                            The French Laundry Cookbook
## 3 Essential Pépin: More Than 700 All-Time Favorites from My Life in Food
##                                          author
## 1 Julia Child, Louisette Bertholle, Simone Beck
## 2                                 Thomas Keller
## 3                                 Jacques Pépin
##                                              notes
## 1                                          Classic
## 2 Signature dishes, Michelin-starred, high-quality
## 3          Favorite French recipes, easy-to-follow
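
The output above shows that fromJSON() returns a list whose books element holds the data frame. The sketch below pulls that element out and flattens the multi-valued fields into plain strings. It assumes the authors and notes are stored as JSON arrays, which the printed output suggests but does not confirm; the inline json_string and the names json_example and json_df are hypothetical stand-ins for the GitHub file.

```r
library(jsonlite)

# Inline stand-in for FrenchBooks.json. Storing author and notes as
# arrays is an assumption; the real file may use plain strings.
json_string <- '{
  "books": [
    {"title": "Mastering the Art of French Cooking",
     "author": ["Julia Child", "Louisette Bertholle", "Simone Beck"],
     "notes": ["Classic"]},
    {"title": "The French Laundry Cookbook",
     "author": ["Thomas Keller"],
     "notes": ["Signature dishes", "Michelin-starred", "high-quality"]}
  ]
}'

json_example <- fromJSON(json_string)

# The data frame lives inside the "books" element of the list
json_df <- json_example$books

# Collapse each multi-valued field into one comma-separated string
json_df$author <- vapply(json_df$author, paste, character(1), collapse = ", ")
json_df$notes  <- vapply(json_df$notes,  paste, character(1), collapse = ", ")

str(json_df)
```

Flattening to character columns makes the JSON result easier to compare with the XML and HTML data frames, at the cost of losing the separate author entries.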

.HTML File

#Libraries Used
library(rvest)

#Gathering information from Github
HTML <- read_html("https://raw.githubusercontent.com/Jlok17/Data-Science-Projects/main/FrenchBooks.html")
HTML_df <- HTML %>% html_nodes("table") %>% .[[1]] %>% html_table()

# Results
print(HTML_df)

## # A tibble: 3 × 3
##   Title                                                             Author Notes
##   <chr>                                                             <chr>  <chr>
## 1 Mastering the Art of French Cooking                               Julia… Clas…
## 2 The French Laundry Cookbook                                       Thoma… Sign…
## 3 Essential Pépin: More Than 700 All-Time Favorites from My Life i… Jacqu… Favo…
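
The tibble printout above truncates the Author and Notes columns. One way to see the full values is to convert the result to a base data frame before printing, as sketched below. The inline html_string is a hypothetical one-row stand-in for FrenchBooks.html, and the names page, tables, and html_example are illustrative only.

```r
library(rvest)

# Inline stand-in for FrenchBooks.html; this one-row table is an
# assumption that mirrors the column headers shown above.
html_string <- '<table>
  <tr><th>Title</th><th>Author</th><th>Notes</th></tr>
  <tr>
    <td>The French Laundry Cookbook</td>
    <td>Thomas Keller</td>
    <td>Signature dishes</td>
  </tr>
</table>'

page <- read_html(html_string)

# html_table() on a set of <table> nodes returns a list of tibbles
tables <- html_table(html_elements(page, "table"))
html_example <- tables[[1]]

# A base data frame prints every column in full, without the "…" truncation
print(as.data.frame(html_example))
```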

Results

We now have three data frames holding information about the same French cuisine books. Although they describe the same books, the data frames are not equal to one another: the column names differ in capitalization (title in the XML result, Title in the HTML result), and the XML file lists a single author and note per book while the JSON file lists several. For further analysis and manipulation, the values will need to be picked from each frame and combined into a single new data frame.
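
To make the comparison concrete, the sketch below checks two of the results for equality and then stacks them into one data frame. It uses small hand-typed stand-ins that copy the first two rows of the XML and JSON printouts above, with the JSON fields already flattened to strings; the names xml_df, json_df, and combined, and the added source column, are assumptions for illustration.

```r
# Stand-ins copied from the first two rows of the printed XML and JSON results
xml_df <- data.frame(
  title  = c("Mastering the Art of French Cooking",
             "The French Laundry Cookbook"),
  author = c("Julia Child", "Thomas Keller"),
  notes  = c("Classic", "Signature dishes"),
  stringsAsFactors = FALSE
)

json_df <- data.frame(
  title  = c("Mastering the Art of French Cooking",
             "The French Laundry Cookbook"),
  author = c("Julia Child, Louisette Bertholle, Simone Beck",
             "Thomas Keller"),
  notes  = c("Classic",
             "Signature dishes, Michelin-starred, high-quality"),
  stringsAsFactors = FALSE
)

# FALSE: the author and notes values differ between the two sources
print(identical(xml_df, json_df))

# Row and column positions of the cells that disagree
print(which(xml_df != json_df, arr.ind = TRUE))

# Stack both results, tagging each row with the file it came from
combined <- rbind(
  cbind(source = "xml",  xml_df),
  cbind(source = "json", json_df)
)
print(combined)
```

Keeping a source column preserves where each row came from, so conflicting values can be resolved later rather than silently overwritten.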