Required Libraries
library(XML)
library(plyr)
library(RCurl)
library(jsonlite)
Converting HTML file to data frame
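# Download the raw HTML as a character string
myhtml_file <- getURL("https://raw.githubusercontent.com/Raji030/data_607_html/main/new0000%201.html")
# readHTMLTable() returns a list with one element per <table>
myhtml_table <- readHTMLTable(myhtml_file)
# Converting the whole list prefixes each column with the table's name ("NULL" here)
myhtml_data <- as.data.frame(myhtml_table)
str(myhtml_data)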
## 'data.frame': 3 obs. of 5 variables:
## $ NULL.Title : chr "Applied Fluid Mechanics" "Fluid Mechanics with Engineering Applications" "Basics of Fluid Mechanics"
## $ NULL.Author : chr "Robert L. Mott" "Robert L. Daugherty, Joseph B. Franzini, E. John Finnemore" "Genick Bar-Meir"
## $ NULL.Genre : chr "Education" "Education" "Education"
## $ NULL.First_Published: chr "1972" "1984" "2009"
## $ NULL.Price : chr "157.26" "174.33" "33.50"
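The NULL. prefix in the column names above comes from converting the whole list of tables at once. A minimal sketch of one way to avoid it is to pick the first table out of the list before using it. The inline HTML string, the trimmed-down set of columns, and the name books are assumptions made for illustration; they are not the contents of the original file.

```r
library(XML)

# Hypothetical stand-in for the downloaded HTML (not the original file)
html_text <- '<html><body><table>
  <tr><th>Title</th><th>Author</th><th>Price</th></tr>
  <tr><td>Applied Fluid Mechanics</td><td>Robert L. Mott</td><td>157.26</td></tr>
  <tr><td>Basics of Fluid Mechanics</td><td>Genick Bar-Meir</td><td>33.50</td></tr>
</table></body></html>'

# Parse the string explicitly as text rather than as a file name or URL
doc <- htmlParse(html_text, asText = TRUE)

# readHTMLTable() returns a list of tables; take the first element
# instead of calling as.data.frame() on the whole list
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
books <- tables[[1]]

# Every column is read as character, so convert numeric columns explicitly
books$Price <- as.numeric(books$Price)

str(books)
```

Selecting the table with [[1]] keeps the header names unchanged, and the explicit as.numeric() call makes Price usable in calculations.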
Converting JSON file to data frame
# fromJSON() accepts a URL and simplifies an array of records to a data frame
jsonfile <- fromJSON("https://raw.githubusercontent.com/Raji030/myjsondata_607/main/data_607.json")
myjson_data <- as.data.frame(jsonfile)
str(myjson_data)
## 'data.frame': 3 obs. of 5 variables:
## $ Title : chr "Applied Fluid Mechanics" "Fluid Mechanics with Engineering Applications" "Basics of Fluid Mechanics"
## $ Author :List of 3
## ..$ : chr "Robert L. Mott"
## ..$ : chr "Robert L. Daugherty" "Joseph B. Franzini" "E. John Finnemore"
## ..$ : chr "Genick Bar-Meir"
## $ Genre : chr "Education" "Education" "Education"
## $ First_Published: chr "1992" "1984" "2009"
## $ Price : chr "157.26" "174.33" "33.50"
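In the output above, Author is a list column because the books have different numbers of authors. If a plain character column is preferred, one possible approach is to collapse each book's authors into a single string. The sketch below uses a small inline JSON string as a stand-in for the real file; that string and the name books are assumptions made for illustration.

```r
library(jsonlite)

# Hypothetical stand-in for the JSON file (not the original contents)
json_text <- '[
  {"Title": "Applied Fluid Mechanics",
   "Author": ["Robert L. Mott"]},
  {"Title": "Fluid Mechanics with Engineering Applications",
   "Author": ["Robert L. Daugherty", "Joseph B. Franzini", "E. John Finnemore"]},
  {"Title": "Basics of Fluid Mechanics",
   "Author": ["Genick Bar-Meir"]}
]'

# Author arrays of different lengths are kept as a list column
books <- fromJSON(json_text)

# Collapse each book's author vector into one comma-separated string
books$Author <- vapply(books$Author, paste, character(1), collapse = ", ")

str(books)
```

Using vapply() with character(1) guarantees exactly one string per book, so the result is always an ordinary character column.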
Converting XML file to data frame
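# Download the raw XML as a character string
xmlfile <- getURL("https://raw.githubusercontent.com/Raji030/data_607_xmlfile/main/data607.xml")
# Parse the document and convert it to a nested list, one element per Book
xmllist <- xmlToList(xmlParse(xmlfile))
# Row-bind the per-book data frames; repeated Author elements become extra columns
myxml_data <- ldply(xmllist, data.frame)
str(myxml_data)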
## 'data.frame': 3 obs. of 8 variables:
## $ .id : chr "Book" "Book" "Book"
## $ Title : chr "Applied Fluid Mechanics" "Fluid Mechanics with Engineering Applications" "Basics of Fluid Mechanics"
## $ Author : chr "Robert L. Mott" "Robert L. Daugherty" "Genick Bar-Meir"
## $ Genre : chr "Education" "Education" "Education"
## $ First_Published: chr "1972" "1984" "2009"
## $ Price : chr "157.26" "174.33" "33.50"
## $ Author.1 : chr NA "Joseph B. Franzini" NA
## $ Author.2 : chr NA "E. John Finnemore" NA
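The Author.1 and Author.2 columns above appear because one Book element repeats the Author tag. A possible alternative, sketched below, is to walk the Book nodes with XPath and join the repeated Author values into one string per book. The inline XML, the root element name Books, and the name books are assumptions made for illustration; only the Book, Title and Author element names are taken from the output above.

```r
library(XML)

# Hypothetical stand-in for the XML file; the root name "Books" is assumed
xml_text <- '<Books>
  <Book>
    <Title>Applied Fluid Mechanics</Title>
    <Author>Robert L. Mott</Author>
  </Book>
  <Book>
    <Title>Fluid Mechanics with Engineering Applications</Title>
    <Author>Robert L. Daugherty</Author>
    <Author>Joseph B. Franzini</Author>
    <Author>E. John Finnemore</Author>
  </Book>
  <Book>
    <Title>Basics of Fluid Mechanics</Title>
    <Author>Genick Bar-Meir</Author>
  </Book>
</Books>'

doc <- xmlParse(xml_text, asText = TRUE)
book_nodes <- getNodeSet(doc, "//Book")

# Build one row per Book, joining all of its Author children into one string
rows <- lapply(book_nodes, function(node) {
  data.frame(
    Title  = xpathSApply(node, "./Title", xmlValue),
    Author = paste(xpathSApply(node, "./Author", xmlValue), collapse = ", "),
    stringsAsFactors = FALSE
  )
})
books <- do.call(rbind, rows)

str(books)
```

This keeps one Author column regardless of how many authors a book has, at the cost of storing several names in one string.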
Conclusion: The three file formats do not produce identical data
frames. The HTML and JSON data frames both contain 3 observations and 5
variables, but they represent the Author column differently: the HTML
version stores all of a book's authors in a single character string,
while the JSON version stores them as a list column with one character
vector per book. The HTML column names also carry a NULL. prefix. The
XML data frame, in contrast, contains 3 observations and 8 variables:
the three authors of the second book are split across three columns
(Author, Author.1, Author.2) rather than kept in a single column, and
ldply() adds an .id column. The values differ in one place as well:
First_Published for the first book is 1992 in the JSON data frame but
1972 in the other two. Thus, each data frame has a different layout.
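The comparison above can also be checked programmatically. The sketch below uses only base R and two small hand-built data frames, html_like and json_like, which are hypothetical stand-ins for the HTML and JSON results rather than the real objects.

```r
# Hypothetical stand-in for the HTML result: authors in one string
html_like <- data.frame(
  Title  = c("Applied Fluid Mechanics",
             "Fluid Mechanics with Engineering Applications"),
  Author = c("Robert L. Mott",
             "Robert L. Daugherty, Joseph B. Franzini, E. John Finnemore"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the JSON result: authors in a list column
json_like <- data.frame(
  Title = c("Applied Fluid Mechanics",
            "Fluid Mechanics with Engineering Applications"),
  stringsAsFactors = FALSE
)
json_like$Author <- list(
  "Robert L. Mott",
  c("Robert L. Daugherty", "Joseph B. Franzini", "E. John Finnemore")
)

# Same shape ...
same_dims <- identical(dim(html_like), dim(json_like))

# ... but different column classes, so the frames are not identical
html_classes <- sapply(html_like, class)
json_classes <- sapply(json_like, class)
same_frames <- identical(html_like, json_like)

print(html_classes)
print(json_classes)
```

Here same_dims is TRUE while same_frames is FALSE, which mirrors the HTML and JSON case: equal dimensions do not imply equal data frames.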