Conclusion: Although the HTML, JSON, and XML files have different structures, the three data frames created from them are very similar after some manipulation, especially the ones from HTML and JSON, which print identically. As demonstrated below, the XML data is somewhat harder to manipulate. In the HTML and JSON files I stored both authors of a book in a single field, whereas in the XML file I created a nested list of author elements.
# 1. HTML - read the table and load it into a data frame
library(rvest)  # provides read_html(), html_nodes(), html_table()
htmlURL <- "https://raw.githubusercontent.com/yathdeep/data607_labs/main/books.html"
readHtml <- read_html(htmlURL)
tables <- html_nodes(readHtml,"table")
tables_ls <- html_table(tables, fill = TRUE)
booksHTML.df <- as.data.frame(tables_ls)
booksHTML.df
## title ISBN
## 1 Python for Data Analysis 978-1-449-31979-3
## 2 Hands-On Machine Learning with Scikit-Learn and TensorFlow 978-1-491-96229-9
## 3 R for Data Science 978-1-491-91039-9
## authors price
## 1 Julie Steele;Meghan Blanchette 39.99
## 2 Nicole Tache 49.99
## 3 Marie Beaugureau;Mike Loukides 39.99
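Because the HTML table keeps both authors of a book in one semicolon-separated string, it can help to split that field when individual authors are needed. The following is only a minimal sketch in base R: the `authors` vector is copied by hand from the output above, and `author_list`, `n_authors`, and `authors_long` are illustrative names that do not appear in the original analysis.

```r
# Author strings copied from the printed data frame above (not re-read from the URL)
authors <- c(
  "Julie Steele;Meghan Blanchette",
  "Nicole Tache",
  "Marie Beaugureau;Mike Loukides"
)

# Split each string on the literal ";" separator; the result is a list of character vectors
author_list <- strsplit(authors, ";", fixed = TRUE)

# Number of authors per book
n_authors <- lengths(author_list)  # 2 1 2

# Long format: one row per (book, author) pair
authors_long <- data.frame(
  book   = rep(seq_along(author_list), n_authors),
  author = unlist(author_list),
  stringsAsFactors = FALSE
)
authors_long
```

The long format is one possible design choice; keeping `author_list` as a list column would work as well if one row per book is preferred.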
# 2. XML - read from a local copy of books.xml, because loading it from GitHub caused problems
library(XML)  # provides xmlToDataFrame(), xmlToList()
# First approach: convert the XML file directly to a data frame
xmldataframe <- xmlToDataFrame("books.xml")
xmldataframe
## title isbn
## 1 Python for Data Analysis 978-1-449-31979-3
## 2 Hands-On Machine Learning with Scikit-Learn and TensorFlow 978-1-491-96229-9
## 3 R for Data Science 978-1-491-91039-9
## Authors price
## 1 DS1DS2 39.99
## 2 Nicole Tache 49.99
## 3 AJ1AJ2 39.99
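# Second approach: convert the XML to a nested list, then bind each book into one data frame row
library(plyr)  # provides ldply()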
ldply(xmlToList("books.xml"), data.frame)
## .id title
## 1 book Python for Data Analysis
## 2 book Hands-On Machine Learning with Scikit-Learn and TensorFlow
## 3 book R for Data Science
## isbn Authors.Author Authors.Author.1 price Author
## 1 978-1-449-31979-3 DS1 DS2 39.99 <NA>
## 2 978-1-491-96229-9 <NA> <NA> 49.99 Nicole Tache
## 3 978-1-491-91039-9 AJ1 AJ2 39.99 <NA>
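Both approaches above leave the authors in an awkward shape: `xmlToDataFrame()` concatenates the nested author names (`DS1DS2`), and `ldply()` spreads them over several columns. One possible alternative, sketched here under assumptions, is to walk the `<book>` nodes with XPath and join the author names with `;` so the result resembles the HTML and JSON data frames. The inline XML is a hypothetical stand-in whose structure is only inferred from the printed output (the real `books.xml` is not shown), and `xml_text`, `books`, and `books_df` are illustrative names.

```r
library(XML)

# Hypothetical sample; element names and nesting are inferred from the output above
xml_text <- '<books>
  <book>
    <title>Python for Data Analysis</title>
    <isbn>978-1-449-31979-3</isbn>
    <Authors><Author>DS1</Author><Author>DS2</Author></Authors>
    <price>39.99</price>
  </book>
  <book>
    <title>Hands-On Machine Learning with Scikit-Learn and TensorFlow</title>
    <isbn>978-1-491-96229-9</isbn>
    <Authors><Author>Nicole Tache</Author></Authors>
    <price>49.99</price>
  </book>
  <book>
    <title>R for Data Science</title>
    <isbn>978-1-491-91039-9</isbn>
    <Authors><Author>AJ1</Author><Author>AJ2</Author></Authors>
    <price>39.99</price>
  </book>
</books>'

doc   <- xmlParse(xml_text, asText = TRUE)
books <- getNodeSet(doc, "//book")

# Build one row per book; ".//Author" matches Author elements at any depth below the book
books_df <- do.call(rbind, lapply(books, function(b) {
  data.frame(
    title   = xpathSApply(b, "./title", xmlValue),
    isbn    = xpathSApply(b, "./isbn", xmlValue),
    authors = paste(xpathSApply(b, ".//Author", xmlValue), collapse = ";"),
    price   = as.numeric(xpathSApply(b, "./price", xmlValue)),
    stringsAsFactors = FALSE
  )
}))
books_df
```

To try this on the real file, `xmlParse("books.xml")` could replace the inline text, provided the file follows the assumed structure.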
# 3. JSON - read the file and load it into a data frame
library(jsonlite)  # provides fromJSON(), flatten()
jsonURL <- "https://raw.githubusercontent.com/yathdeep/data607_labs/main/books.json"
# Convert the JSON data into a data frame
book.json.dataframe <- flatten(as.data.frame(fromJSON(jsonURL)))
book.json.dataframe
## title ISBN
## 1 Python for Data Analysis 978-1-449-31979-3
## 2 Hands-On Machine Learning with Scikit-Learn and TensorFlow 978-1-491-96229-9
## 3 R for Data Science 978-1-491-91039-9
## authors price
## 1 Julie Steele;Meghan Blanchette 39.99
## 2 Nicole Tache 49.99
## 3 Marie Beaugureau;Mike Loukides 39.99
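The similarity of the three data frames can also be checked programmatically rather than by eye. The sketch below rebuilds small stand-in data frames from the printed outputs above instead of re-reading the files, so it stays self-contained. The names `html_df`, `json_df`, `xml_df`, and `normalize()` are illustrative, and storing the XML prices as character is an assumption about how `xmlToDataFrame()` returned them.

```r
titles <- c(
  "Python for Data Analysis",
  "Hands-On Machine Learning with Scikit-Learn and TensorFlow",
  "R for Data Science"
)
isbns <- c("978-1-449-31979-3", "978-1-491-96229-9", "978-1-491-91039-9")

# Stand-ins copied from the HTML and JSON outputs, which print identically
html_df <- data.frame(
  title   = titles,
  ISBN    = isbns,
  authors = c("Julie Steele;Meghan Blanchette", "Nicole Tache",
              "Marie Beaugureau;Mike Loukides"),
  price   = c(39.99, 49.99, 39.99),
  stringsAsFactors = FALSE
)
json_df <- html_df

# Stand-in copied from the first XML output (assumed: price stored as text)
xml_df <- data.frame(
  title   = titles,
  isbn    = isbns,
  Authors = c("DS1DS2", "Nicole Tache", "AJ1AJ2"),
  price   = c("39.99", "49.99", "39.99"),
  stringsAsFactors = FALSE
)

# Lower-case the column names and make price numeric so the frames are comparable
normalize <- function(df) {
  names(df) <- tolower(names(df))
  df$price  <- as.numeric(df$price)
  df
}

html_json_equal <- isTRUE(all.equal(normalize(html_df), normalize(json_df)))
same_columns    <- identical(names(normalize(html_df)), names(normalize(xml_df)))
authors_match   <- normalize(html_df)$authors == normalize(xml_df)$authors

html_json_equal  # TRUE
same_columns     # TRUE
authors_match    # FALSE TRUE FALSE
```

After normalization only the author field differs between the XML frame and the other two, which reflects the placeholder author names used in the XML file.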