## Intro

In the following code, we create a table of books in HTML, XML, and JSON, with the goal of reading each format into a data frame in R. The packages we need are loaded below:

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rvest)
## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:readr':
## 
##     guess_encoding
library(XML)
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 4.3.3
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:purrr':
## 
##     flatten

## HTML

First, I created the HTML table and stored it on GitHub. `read_html()` reads the file into R.

table_HTML <- read_html("https://raw.githubusercontent.com/sokkarbishoy/DATA607/main/Assignment%207%20HTML%20table")

Second, we load the table into a data frame in R using the following code.

Books_table_HTML <- table_HTML %>% 
  html_node("table") %>%
  html_table(header = TRUE, fill = TRUE)
Books_table_HTML
## # A tibble: 3 × 6
##   Title                   Authors `Publication Year` Summary Genre `Main Themes`
##   <chr>                   <chr>                <int> <chr>   <chr> <chr>        
## 1 Guns, Germs, and Steel  Jared …               1997 Explor… Non-… Geography, A…
## 2 A People's History of … Howard…               1980 Provid… Non-… Social Justi…
## 3 The Rise and Fall of t… Willia…               1960 Chroni… Non-… World War II…
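
The same steps can be tried without a network connection. The following is a minimal sketch, assuming a small inline HTML string (`html_input`, a hypothetical stand-in for the file hosted on GitHub) with only three of the six columns. It relies on `read_html()` accepting literal HTML as well as a URL.

```r
library(rvest)

# Hypothetical inline HTML standing in for the file stored on GitHub
html_input <- "<table>
  <tr><th>Title</th><th>Authors</th><th>Publication Year</th></tr>
  <tr><td>Guns, Germs, and Steel</td><td>Jared Diamond</td><td>1997</td></tr>
  <tr><td>A People's History of the United States</td><td>Howard Zinn</td><td>1980</td></tr>
</table>"

# read_html() parses a literal HTML string the same way it parses a URL
page <- read_html(html_input)

# Select the first <table> element and convert it to a tibble,
# using the first row as column names
books_html <- page %>%
  html_node("table") %>%
  html_table(header = TRUE)

books_html
```

Because the header cell contains a space, the year column has to be referenced with backticks, as in ``books_html$`Publication Year` ``.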

## XML

As in the first part, we store our XML code on GitHub and read it into R.

library(xml2)
library(xmlconvert)
## Warning: package 'xmlconvert' was built under R version 4.3.3
url <- "https://raw.githubusercontent.com/sokkarbishoy/DATA607/main/Books%20.XML"

xmltable <- read_xml(url)
xmltable
## {xml_document}
## <books>
## [1] <book>\n  <title>Guns, Germs, and Steel</title>\n  <authors>Jared Diamond ...
## [2] <book>\n  <title>A People's History of the United States</title>\n  <auth ...
## [3] <book>\n  <title>The Rise and Fall of the Third Reich</title>\n  <authors ...
xml_data <- xmlParse(xmltable)
titles <- xpathSApply(xml_data, "//book/title", xmlValue)
authors <- xpathSApply(xml_data, "//book/authors", xmlValue)
years <- xpathSApply(xml_data, "//book/publication_year", xmlValue)
summaries <- xpathSApply(xml_data, "//book/summary", xmlValue)
main_themes <- xpathSApply(xml_data, "//book/main_themes", xmlValue)
genres <- xpathSApply(xml_data, "//book/genre", xmlValue)
Books_data_XML <- data.frame(
  Title = titles,
  Authors = authors,
  Publication_Year = years,
  Summary = summaries,
  Genre = genres,
  Main_Themes = main_themes)

print(Books_data_XML)
##                                     Title                          Authors
## 1                  Guns, Germs, and Steel                    Jared Diamond
## 2 A People's History of the United States                      Howard Zinn
## 3    The Rise and Fall of the Third Reich William L. Shirer, Ron Rosenbaum
##   Publication_Year
## 1             1997
## 2             1980
## 3             1960
##                                                                                                                                                             Summary
## 1                                                                              Explores the reasons for the dominance of Eurasian civilizations throughout history.
## 2                                Provides a different perspective on American history, focusing on the experiences of ordinary people rather than political elites.
## 3 Chronicles the history of Nazi Germany, from its rise to power to its defeat in World War II, written by a journalist who witnessed many of the events firsthand.
##         Genre                                             Main_Themes
## 1 Non-fiction                      Geography, Anthropology, Sociology
## 2 Non-fiction           Social Justice, Labor Movements, Civil Rights
## 3 Non-fiction World War II, Totalitarianism, Rise and Fall of Empires
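
Since `xmlValue` returns text, `Publication_Year` above is stored as character rather than as a number. As an alternative sketch (not the approach used above), the extraction can be done with `xml2` alone and the year converted with `as.integer()`. The inline string `xml_input` and the helper `child_text()` are hypothetical names introduced for illustration. The element names follow the XPath expressions used above.

```r
library(xml2)

# Hypothetical inline XML standing in for the file stored on GitHub
xml_input <- "<books>
  <book>
    <title>Guns, Germs, and Steel</title>
    <authors>Jared Diamond</authors>
    <publication_year>1997</publication_year>
  </book>
  <book>
    <title>A People's History of the United States</title>
    <authors>Howard Zinn</authors>
    <publication_year>1980</publication_year>
  </book>
</books>"

doc <- read_xml(xml_input)

# One node per <book> element
book_nodes <- xml_find_all(doc, "//book")

# Hypothetical helper: text of the named child element for each book.
# xml_find_first() keeps one result per book, so a missing child yields NA
# instead of silently shortening the vector.
child_text <- function(nodes, name) {
  xml_text(xml_find_first(nodes, paste0("./", name)))
}

books_xml <- data.frame(
  Title = child_text(book_nodes, "title"),
  Authors = child_text(book_nodes, "authors"),
  # Convert the year from character to integer
  Publication_Year = as.integer(child_text(book_nodes, "publication_year"))
)

str(books_xml)
```

Querying relative to each `<book>` node keeps the columns aligned even when a book lacks one of the child elements.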

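## JSON

Finally, we read the JSON file from GitHub with `fromJSON()` and convert the result to a data frame.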

Json_file <- "https://raw.githubusercontent.com/sokkarbishoy/DATA607/main/books.JSON"
json_raw <- fromJSON(Json_file)
Books_table_json <- as.data.frame(json_raw, col.names = c(""))
Books_table_json
##                                     title                          authors
## 1                  Guns, Germs, and Steel                    Jared Diamond
## 2 A People's History of the United States                      Howard Zinn
## 3    The Rise and Fall of the Third Reich William L. Shirer, Ron Rosenbaum
##   publication_year
## 1             1997
## 2             1980
## 3             1960
##                                                                                                                                                             summary
## 1                                                                              Explores the reasons for the dominance of Eurasian civilizations throughout history.
## 2                                Provides a different perspective on American history, focusing on the experiences of ordinary people rather than political elites.
## 3 Chronicles the history of Nazi Germany, from its rise to power to its defeat in World War II, written by a journalist who witnessed many of the events firsthand.
##         genre                                             main_themes
## 1 Non-fiction                      Geography, Anthropology, Sociology
## 2 Non-fiction           Social Justice, Labor Movements, Civil Rights
## 3 Non-fiction World War II, Totalitarianism, Rise and Fall of Empires
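
When the JSON wraps its records in a top-level key, extracting that element by name is one way to get the data frame without prefixed column names. The sketch below assumes a hypothetical inline string `json_input` with a top-level `"books"` array. The structure of the actual file on GitHub is not shown here, so the key name is an assumption.

```r
library(jsonlite)

# Hypothetical inline JSON; the top-level "books" key is an assumption
json_input <- '{
  "books": [
    {"title": "Guns, Germs, and Steel",
     "authors": "Jared Diamond",
     "publication_year": 1997},
    {"title": "The Rise and Fall of the Third Reich",
     "authors": "William L. Shirer, Ron Rosenbaum",
     "publication_year": 1960}
  ]
}'

# fromJSON() simplifies an array of objects into a data frame by default,
# so the result here is a list whose "books" element is a data frame
parsed <- fromJSON(json_input)

# Pull the data frame out of the wrapper by name
books_json <- parsed$books

str(books_json)
```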

## Conclusion

In this assignment, I learned how to pull data from HTML, XML, and JSON formats into R. This was probably simpler than a real-world case because I created the tables myself, but over the next weeks I will practice web scraping more and apply what I learned from this assignment.