library(XML)
library(rjson)
library(plyr)
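file_html <- "C:/Users/JGARCIA/Desktop/books.html"
# readHTMLTable() returns a list of the tables found in the page
html <- readHTMLTable(file_html)
htmldataframe <- data.frame(html, stringsAsFactors = FALSE)
# Strip the literal "NULL." prefix that data.frame() adds to the column names
names(htmldataframe) <- gsub("^NULL\\.", "", names(htmldataframe))
htmldataframe <- rename(htmldataframe, c("QR.Code" = "QR Code"))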
## The following `from` values were not present in `x`: QR.Code
print(htmldataframe)
##    ID                            Title
## 1 001 Elements of Statistical Learning
## 2 002      Studies in Ethnomethodology
## 3 003               R for Data Science
##                                             Authors          Subject
## 1 Trevor Hastie, Robert Tibshirani, Jerome Friedman Machine Learning
## 2                                  Harold Garfinkel        Sociology
## 3                                    Hadley Wickham     Data Science
##       Publisher Year
## 1      Springer 2009
## 2 Prentice-Hall 1967
## 3       Oreilly 2016
file_xml <- "C:/Users/JGARCIA/Desktop/books.xml"
xmldataframe <- xmlToDataFrame(file_xml, stringsAsFactors = FALSE)
print(xmldataframe)
##                              Title
## 1 Elements of Statistical Learning
## 2      Studies in Ethnomethodology
## 3               R for Data Science
##                                             Authors          Subject
## 1 Trevor Hastie, Robert Tibshirani, Jerome Friedman Machine Learning
## 2                                  Harold Garfinkel        Sociology
## 3                                    Hadley Wickham     Data Science
##       Publisher Year
## 1      Springer 2009
## 2 Prentice-Hall 1967
## 3       Oreilly 2016
file_json <- "C:/Users/JGARCIA/Desktop/books.json"
# fromJSON() parses the file into a nested R list
json <- fromJSON(file = file_json)
jsondataframe <- as.data.frame(json, stringsAsFactors = FALSE)
jsondataframe <- rename(jsondataframe, c("QR.Code" = "QR Code"))
## The following `from` values were not present in `x`: QR.Code
print(jsondataframe)
##    ID                            Title
## 1 001 Elements of Statistical Learning
## 2 002      Studies in Ethnomethodology
## 3 003               R for Data Science
##                                             Authors          Subject
## 1 Trevor Hastie, Robert Tibshirani, Jerome Friedman Machine Learning
## 2                                  Harold Garfinkel        Sociology
## 3                                    Hadley Wickham     Data Science
##       Publisher Year
## 1      Springer 2009
## 2 Prentice-Hall 1967
## 3       Oreilly 2016
The contents of the three files look nearly identical. The HTML data frame is the largest in size, followed by the XML data frame, and the JSON data frame is the smallest.
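A size comparison like this one can be checked with base R's `object.size()`, which estimates the memory an object occupies. The following is only a minimal sketch: `books` and `books_no_id` are hypothetical stand-ins built inline so the example runs on its own, and they are not the data frames loaded above.

```r
# Hypothetical stand-in for one of the book data frames (an assumption for
# illustration); in practice htmldataframe, xmldataframe and jsondataframe
# would be measured instead.
books <- data.frame(
  ID = c("001", "002", "003"),
  Title = c("Elements of Statistical Learning",
            "Studies in Ethnomethodology",
            "R for Data Science"),
  Year = c("2009", "1967", "2016"),
  stringsAsFactors = FALSE
)

# Dropping a column gives a second, smaller object to compare against.
books_no_id <- books[, c("Title", "Year")]

# object.size() returns an estimate of the object's memory use in bytes.
sizes <- c(
  with_id = as.numeric(object.size(books)),
  without_id = as.numeric(object.size(books_no_id))
)

# Sort from largest to smallest to rank the objects by size.
ranked <- sort(sizes, decreasing = TRUE)
print(ranked)
```

Passing the three real data frames to `object.size()` in the same way would rank them by in-memory size. If the comparison is meant to be about the files on disk instead, `file.size()` on the three paths would be the corresponding check.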