I picked the three books below at random from the Barnes & Noble website:
I then created the following three files by hand to capture their key attributes:
books.html
books.json
books.xml
We now load the information from each of the three files into separate R data frames and compare the structures:
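library(RCurl)  # getURL
library(XML)    # xmlParse, xmlRoot, xmlSApply, xmlValue
library(DT)     # datatable
# Download the XML file and parse it into a document tree
theXmlUrl <- "https://raw.githubusercontent.com/kamathvk1982/Data607-Week07/master/books.xml"
XmlUrlData <- getURL(theXmlUrl)
xml.data <- xmlParse(XmlUrlData)
xml.root <- xmlRoot(xml.data)
# Collect the child values of each book node, then transpose so each book is a row
xml.df <- data.frame(t(xmlSApply(xml.root, function(x) xmlSApply(x, xmlValue))), row.names = NULL)
datatable(xml.df)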
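To see what the nested `xmlSApply` call is doing, a tiny inline document can stand in for books.xml. This is only a sketch: the element names (`books`, `book`, `title`, `year`) and the values are hypothetical placeholders, not the contents of the actual file.

```r
library(XML)

# Hypothetical stand-in for books.xml; element names and values are assumptions
xml.text <- paste0(
  "<books>",
  "<book><title>Book A</title><year>2001</year></book>",
  "<book><title>Book B</title><year>2002</year></book>",
  "</books>"
)

demo.root <- xmlRoot(xmlParse(xml.text, asText = TRUE))

# Inner xmlSApply: one named character vector of child values per <book>
# Outer xmlSApply: binds those vectors into a matrix with one column per book
demo.mat <- xmlSApply(demo.root, function(x) xmlSApply(x, xmlValue))

# t() flips the matrix so that each book becomes a row
demo.df <- data.frame(t(demo.mat), row.names = NULL)
str(demo.df)
```

Note that every value comes back as text, so numeric fields such as a year or price would still need an explicit conversion.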
Comments
Based on the above, we can see that the three data structures look very similar. For the HTML file, the column names came through with an additional "NULL." in them. During actual processing, we may need to check how the data types are handled for each format.
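One possible way to handle both points is sketched below in base R. The data frames `html.demo` and `xml.demo` are hypothetical stand-ins with placeholder values, and the sketch assumes the extra "NULL." appears at the start of each column name.

```r
# Hypothetical data frames standing in for the parsed HTML and XML results;
# the column names and values are placeholders, not the real book data
html.demo <- data.frame(NULL.Title = c("Book A", "Book B"),
                        NULL.Year  = c(2001L, 2002L))
xml.demo  <- data.frame(Title = c("Book A", "Book B"),
                        Year  = c("2001", "2002"))

# Strip a leading "NULL." from the column names (assumed to be a prefix)
names(html.demo) <- sub("^NULL\\.", "", names(html.demo))

# Compare column names, then list the column classes side by side
same.names  <- identical(names(html.demo), names(xml.demo))
col.classes <- data.frame(html = sapply(html.demo, class),
                          xml  = sapply(xml.demo, class))
print(same.names)
print(col.classes)
```

Listing the classes next to each other makes it easy to spot a column that is numeric in one data frame but text in another before the data frames are combined or compared.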