Working with XML and JSON in R

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”).

Working with JSON

I have use JSONLITE library to read and parse data in JSON.

json_file <- "C:/data/book.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
jsonDF <- data.frame(json_data)
for (i in 1:3) {
  url <- rep(jsonDF$Amazon[i], 1)
  jsonDF$Amazon[i] <- paste0("[", "AmazonURL", "](", url, ")")
   if (i==1) {
    image <- paste0("[![](1.jpg)](", jsonDF$Image[i], ")") 
    jsonDF$Image[1] <- sprintf(image)
  } else if (i==2) {
    image <- paste0("[![](2.jpg)](", jsonDF$Image[i], ")")
    jsonDF$Image[2] <- sprintf(image)
  } else if (i==3) {
    image <- paste0("[![](3.jpg)](", jsonDF$Image[i], ")")
    jsonDF$Image[3] <- sprintf("![](3.jpg)")
  } 
}
kable(jsonDF)

Title	ISBN10	Publisher	Author	Language	Amazon
Applied predictive modeling	1461468485	Springer	Max Kuhn, Kjell Johnson	English	AmazonURL
Happiness: a guide to developing life’s most important skill	316167258	Little Brown	Matthieu Ricard, Daniel Goleman	English	AmazonURL
Why we believe what we believe: our biological need for meaning, spirituality, and truth	743274970	Free Press	Andrew B. Newberg, Mark Robert Waldman	English	AmazonURL

Working with XML

For XML, I have used XML library to read and parse data in XML.

rawXML <- xmlParse("C:/data/book.xml")
xmlDF <- xmlToDataFrame(rawXML)
rXML <- xmlToList(rawXML)
xmlDF$Amazon <- NULL
xmlDF$Image <- NULL
for (i in 1:nrow(xmlDF)) {
  if (i==1) {
    url <- rXML$Book1$Image
    xmlDF$Amazon[i] <- paste0("[", "AmazonURL", "](", url, ")")
    image <- paste0("[![](1.jpg)](", url, ")") 
    xmlDF$Image[1] <- sprintf(image)
  } else if (i==2) {
    url <- rXML$Book2$Image
    xmlDF$Amazon[i] <- paste0("[", "AmazonURL", "](", url, ")")
    image <- paste0("[![](2.jpg)](", url, ")")
    xmlDF$Image[2] <- sprintf(image)
  } else if (i==3) {
    url <- rXML$Book3$Image
    xmlDF$Amazon[i] <- paste0("[", "AmazonURL", "](", url, ")")
    image <- paste0("[![](3.jpg)](", xmlDF$Image[i], ")")
    xmlDF$Image[3] <- sprintf("![](3.jpg)")
  } 
}
kable(xmlDF)

Title	ISBN10	Publisher	Author	Language	Amazon
Applied predictive modeling	1461468485	Springer	Max Kuhn, Kjell Johnson	English	AmazonURL
Happiness: a guide to developing life’s most important skill	316167258	Little Brown	Matthieu Ricard, Daniel Goleman	English	AmazonURL
Why we believe what we believe: our biological need for meaning, spirituality, and truth	743274970	Free Press	Andrew B. Newberg, Mark Robert Waldman	English	AmazonURL

Working with HTML

For extraxting data from table tag from HTML document, I have used HTMLTAB library.

html_data <- htmltab(doc = "C:/data/book.html")

## Argument 'which' was left unspecified. Choosing first table.

for (i in 1:3) {
  url <- rep(html_data$Amazon[i], 1)
  html_data$Amazon[i] <- paste0("[", "AmazonURL", "](", url, ")")
  if (i==1) {
    image <- paste0("[![](1.jpg)](", html_data$Image[i], ")") 
    html_data$Image[1] <- sprintf(image)
  } else if (i==2) {
    image <- paste0("[![](2.jpg)](", html_data$Image[i], ")")
    html_data$Image[2] <- sprintf(image)
  } else if (i==3) {
    image <- paste0("[![](3.jpg)](", html_data$Image[i], ")")
    html_data$Image[3] <- sprintf("![](3.jpg)")
  } 
}
kable(html_data)

	Title	ISBN10	Publisher	Author	Language	Amazon
2	Applied predictive modeling	1461468485	Springer	Max Kuhn, Kjell Johnson	English	AmazonURL
3	Happiness: a guide to developing life’s most important skill	316167258	Little Brown	Matthieu Ricard, Daniel Goleman	English	AmazonURL
4	Why we believe what we believe: our biological need for meaning, spirituality, and truth	743274970	Free Press	Andrew B. Newberg, Mark Robert Waldman	English	AmazonURL

Working with XML and JSON in R

Dhananjay Kumar

October 16, 2016