The goal of this assignment is to create three different types of files (XML, JSON, and HTML) and practice loading their contents into R. The three books I picked are ones I am currently looking at, since French cuisine is something I want to learn more about. Each file records the title of each book, its author(s), and notes describing the book's characteristics.
#Libraries Used
library(xml2)
library(RCurl)
#Gathering information from GitHub
doc <- read_xml( getURL("https://raw.githubusercontent.com/Jlok17/Data-Science-Projects/main/FrenchBooks.xml"))
# Create an empty data frame to store the information
XML <- data.frame(title = character(),
author = character(),
notes = character(),
stringsAsFactors = FALSE)
# Get the root element of the XML file
root <- xml_find_first(doc, "/books")
# Transfer each book's information into the data frame
for (book in xml_children(root)) {
title <- xml_text(xml_find_first(book, ".//title"))
author <- xml_text(xml_find_first(book, ".//author"))
notes <- xml_text(xml_find_first(book, ".//notes"))
XML <- rbind(XML, data.frame(title = title,
author = author,
notes = notes,
stringsAsFactors = FALSE))
}
# Results
print(XML)
## title
## 1 Mastering the Art of French Cooking
## 2 The French Laundry Cookbook
## 3 Essential Pépin: More Than 700 All-Time Favorites from My Life in Food
## author notes
## 1 Julia Child Classic
## 2 Thomas Keller Signature dishes
## 3 Jacques Pépin Favorite French recipes
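The XML result lists only Julia Child for the first book, while the JSON result below lists three authors. One plausible cause is that `xml_find_first()` returns only the first matching element, so if a `<book>` repeats `<author>` or `<notes>`, the rest are dropped. The sketch below assumes that layout (the inline XML is hypothetical, not the actual `FrenchBooks.xml`), collects every matching child with `xml_find_all()`, and builds the data frame in one step instead of growing it with `rbind()` inside a loop. `collapse_children` is a hypothetical helper.

```r
library(xml2)

# Hypothetical XML layout: assumes each <book> may repeat <author> and <notes>
xml_text_in <- '<books>
  <book>
    <title>Mastering the Art of French Cooking</title>
    <author>Julia Child</author>
    <author>Louisette Bertholle</author>
    <author>Simone Beck</author>
    <notes>Classic</notes>
  </book>
  <book>
    <title>The French Laundry Cookbook</title>
    <author>Thomas Keller</author>
    <notes>Signature dishes</notes>
    <notes>Michelin-starred</notes>
    <notes>high-quality</notes>
  </book>
</books>'

doc <- read_xml(xml_text_in)
books <- xml_find_all(doc, "/books/book")

# Hypothetical helper: join the text of every matching child of one book
collapse_children <- function(book, tag) {
  paste(xml_text(xml_find_all(book, tag)), collapse = ", ")
}

# Build all columns at once rather than calling rbind() once per book
xml_books <- data.frame(
  title  = xml_text(xml_find_first(books, "./title")),
  author = vapply(books, collapse_children, character(1), tag = "./author"),
  notes  = vapply(books, collapse_children, character(1), tag = "./notes"),
  stringsAsFactors = FALSE
)
print(xml_books)
```

If the real file stores all authors inside a single `<author>` element instead, the original `xml_find_first()` code already returns the full string and this change is unnecessary.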
#Libraries Used
library(jsonlite)
#Gathering information from GitHub
Json <- fromJSON("https://raw.githubusercontent.com/Jlok17/Data-Science-Projects/main/FrenchBooks.json")
# Results
head(Json)
## $books
## title
## 1 Mastering the Art of French Cooking
## 2 The French Laundry Cookbook
## 3 Essential Pépin: More Than 700 All-Time Favorites from My Life in Food
## author
## 1 Julia Child, Louisette Bertholle, Simone Beck
## 2 Thomas Keller
## 3 Jacques Pépin
## notes
## 1 Classic
## 2 Signature dishes, Michelin-starred, high-quality
## 3 Favorite French recipes, easy-to-follow
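`fromJSON()` returns a list, so the table itself sits under `Json$books`. When a field holds an array per book, jsonlite typically parses that column as a list column, which prints with commas but is awkward to compare or merge. The sketch below assumes `author` and `notes` are JSON arrays (the inline JSON is hypothetical, not the actual `FrenchBooks.json`) and collapses any list column into plain text. `flatten_col` is a hypothetical helper.

```r
library(jsonlite)

# Hypothetical JSON layout: assumes "author" and "notes" are arrays per book
json_text <- '{"books": [
  {"title": "Mastering the Art of French Cooking",
   "author": ["Julia Child", "Louisette Bertholle", "Simone Beck"],
   "notes": ["Classic"]},
  {"title": "The French Laundry Cookbook",
   "author": ["Thomas Keller"],
   "notes": ["Signature dishes", "Michelin-starred", "high-quality"]}
]}'

parsed <- fromJSON(json_text)
books <- parsed$books  # the data frame lives under the "books" key

# Hypothetical helper: collapse a list column (one vector per book) into text
flatten_col <- function(col) {
  if (is.list(col)) vapply(col, paste, character(1), collapse = ", ") else col
}
books$author <- flatten_col(books$author)
books$notes  <- flatten_col(books$notes)
str(books)
```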
#Libraries Used
library(rvest)
#Gathering information from GitHub
HTML <- read_html("https://raw.githubusercontent.com/Jlok17/Data-Science-Projects/main/FrenchBooks.html")
HTML_df <- HTML %>% html_nodes("table") %>% .[[1]] %>% html_table()
# Results
print(HTML_df)
## # A tibble: 3 × 3
## Title Author Notes
## <chr> <chr> <chr>
## 1 Mastering the Art of French Cooking Julia… Clas…
## 2 The French Laundry Cookbook Thoma… Sign…
## 3 Essential Pépin: More Than 700 All-Time Favorites from My Life i… Jacqu… Favo…
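The HTML result comes back as a tibble with capitalized headers, unlike the other two frames. A minimal sketch, using a hypothetical inline table in place of `FrenchBooks.html`, that selects the first table with `html_element()` (which returns only the first match, so `.[[1]]` is not needed) and lower-cases the headers so the columns line up with the XML and JSON frames:

```r
library(rvest)

# Hypothetical HTML snippet standing in for FrenchBooks.html
html_text <- '<table>
  <tr><th>Title</th><th>Author</th><th>Notes</th></tr>
  <tr><td>Mastering the Art of French Cooking</td><td>Julia Child</td><td>Classic</td></tr>
  <tr><td>The French Laundry Cookbook</td><td>Thomas Keller</td><td>Signature dishes</td></tr>
</table>'

page <- read_html(html_text)

# The <th> row becomes the column names
html_books <- html_table(html_element(page, "table"))

# Match the lowercase column names used by the XML and JSON frames
names(html_books) <- tolower(names(html_books))
html_books <- as.data.frame(html_books)
print(html_books)
```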
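We now have three data frames holding the same information about French cookbooks. Although the content overlaps, the data frames themselves are not equal to one another: the XML frame uses lowercase column names but lists a single author per book and shorter notes for the second and third books, the JSON result is a list whose `books` element holds a data frame with every author and note, and the HTML result is a tibble with capitalized column names. For further analysis and manipulation, the values will need to be selected from each frame and combined into a single new data frame.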
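One way to do that combining step, sketched under assumptions: after the column names are harmonized and any list columns are collapsed to text, the frames can be aligned by `title` with `merge()` so each source's values sit side by side, and one value per field is then picked. The stand-in frames below hold two of the three books, with values copied from the XML and JSON output above; choosing the JSON columns is an illustrative choice, not the only reasonable one.

```r
# Hypothetical stand-ins for the XML and JSON frames (two books for brevity)
xml_df <- data.frame(
  title  = c("Mastering the Art of French Cooking", "The French Laundry Cookbook"),
  author = c("Julia Child", "Thomas Keller"),
  notes  = c("Classic", "Signature dishes"),
  stringsAsFactors = FALSE
)
json_df <- data.frame(
  title  = c("Mastering the Art of French Cooking", "The French Laundry Cookbook"),
  author = c("Julia Child, Louisette Bertholle, Simone Beck", "Thomas Keller"),
  notes  = c("Classic", "Signature dishes, Michelin-starred, high-quality"),
  stringsAsFactors = FALSE
)

# Same books, different values, so the frames are not identical
identical(xml_df, json_df)

# Align rows by title; suffixes mark which source each column came from
side_by_side <- merge(xml_df, json_df, by = "title", suffixes = c("_xml", "_json"))

# Keep one value per field, here the more complete JSON columns
combined <- data.frame(
  title  = side_by_side$title,
  author = side_by_side$author_json,
  notes  = side_by_side$notes_json,
  stringsAsFactors = FALSE
)
print(combined)
```

Keeping the intermediate `side_by_side` frame makes the disagreements between sources easy to inspect before deciding which values to keep.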