Introduction

   Working with XML and JSON in R. In this exercise, three similar data files in different formats (HTML, XML, and JSON) are fetched and loaded using the appropriate libraries, in order to compare the techniques involved in retrieving data from each file format.

Methods

   Data formats and the methods used to read them
      HTML - Hypertext Markup Language; the data is stored in a table and extracted with the readHTMLTable function
      XML - Extensible Markup Language; each element is enclosed in tags, and the data is extracted with the xmlToDataFrame function
      JSON - JavaScript Object Notation, a minimal, readable format for structured data made of name/value pairs; the data is extracted with the fromJSON function and as.data.frame
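
The three functions listed above can be tried without any network access. The following is a minimal sketch, not the code used in this exercise: the inline strings html_text, xml_text, and json_text are hypothetical stand-ins for the real files, reduced to two books and two columns, and the actual files may be laid out differently.

```r
library(XML)
library(jsonlite)

# Hypothetical inline stand-ins for the three files (assumed layout)
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Title</th><th>Pages</th></tr>",
  "<tr><td>To Kill A Mockingbird</td><td>322</td></tr>",
  "<tr><td>The Hobbit: Or, There and Back Again</td><td>333</td></tr>",
  "</table></body></html>"
)
xml_text <- paste0(
  "<books>",
  "<book><Title>To Kill A Mockingbird</Title><Pages>322</Pages></book>",
  "<book><Title>The Hobbit: Or, There and Back Again</Title><Pages>333</Pages></book>",
  "</books>"
)
json_text <- paste0(
  '[{"Title": "To Kill A Mockingbird", "Pages": 322},',
  ' {"Title": "The Hobbit: Or, There and Back Again", "Pages": 333}]'
)

# HTML: parse the text, then read the first <table> into a data frame
html_df <- readHTMLTable(htmlParse(html_text, asText = TRUE),
                         header = TRUE, which = 1)

# XML: each child of the root element becomes a row,
# and each of its child elements becomes a column
xml_df <- xmlToDataFrame(xmlParse(xml_text, asText = TRUE))

# JSON: an array of objects is simplified to a data frame
json_df <- fromJSON(json_text)

# Inspect the structure of each result
str(html_df)
str(xml_df)
str(json_df)
```

Comparing the str() output is a quick way to see how each parser represents the same values, for example whether Pages comes back as text or as a number.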

HTML Method

library(knitr)
library(kableExtra)
library(XML)
library(RCurl)
library(jsonlite)

# Read the data from the GitHub repository
html_format <- getURL("https://raw.githubusercontent.com/thasleem1/DATA607/master/booksw7/books.html")
# Extract to an R data frame using the readHTMLTable function
html_dataframe <- readHTMLTable(html_format, which = 1)
# Display the data as a styled table
kable(data.frame(html_dataframe)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872")
Title                                | Authors                     | ASIN       | Publisher                       | Published | Pages
To Kill A Mockingbird                | Harper Lee                  | B00K1XOV5G | Cornerstone Digital             | 8-Jul-14  | 322
Pride & Prejudice                    | Inkwater Press, Jane Austen | 1853260002 | Wordsworth Editions Ltd         | 1-Sep-97  | 329
The Hobbit: Or, There and Back Again | J.R.R. Tolkien              | 061815082X | Young Readers Paperback Tolkien | 1-Sep-01  | 333

XML Method

# Read the data from the GitHub repository
xml_format <- getURL("https://raw.githubusercontent.com/thasleem1/DATA607/master/booksw7/books.xml")
# Extract to an R data frame using the xmlToDataFrame function
xml_dataframe <- xmlToDataFrame(xml_format)
# Display the data as a styled table
kable(data.frame(xml_dataframe)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872")
Title                                | Authors                     | ASIN       | Publisher                       | Published | Pages
To Kill A Mockingbird                | Harper Lee                  | B00K1XOV5G | Cornerstone Digital             | 8-Jul-14  | 322
Pride & Prejudice                    | Inkwater Press, Jane Austen | 1853260002 | Wordsworth Editions Ltd         | 1-Sep-97  | 329
The Hobbit: Or, There and Back Again | J.R.R. Tolkien              | 061815082X | Young Readers Paperback Tolkien | 1-Sep-01  | 333

JSON Method

# Read the data from the GitHub repository
json_format <- getURL("https://raw.githubusercontent.com/thasleem1/DATA607/master/booksw7/books.json")
# Parse the JSON text using the fromJSON function
json_dataframe <- fromJSON(json_format)
# Convert the parsed result to an R data frame
json_dataframe <- as.data.frame(json_dataframe)
# Display the data as a styled table
kable(data.frame(json_dataframe)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872")
Title                                | Authors                     | ASIN       | Publisher                       | Published | Pages
To Kill A Mockingbird                | Harper Lee                  | B00K1XOV5G | Cornerstone Digital             | 8-Jul-14  | 322
Pride & Prejudice                    | Inkwater Press, Jane Austen | 1853260002 | Wordsworth Editions Ltd         | 1-Sep-97  | 329
The Hobbit: Or, There and Back Again | J.R.R. Tolkien              | 061815082X | Young Readers Paperback Tolkien | 1-Sep-01  | 333

Conclusion

   R is flexible enough to read data from most popular data file formats, such as HTML, XML, and JSON. All three methods produced the same table in this exercise; of the three, JSON was the most comfortable to work with.
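
One way to check that the three reads really agree is to compare the resulting data frames by value. The sketch below is an illustration only: same_values is a hypothetical helper that is not part of the original exercise, and the two small data frames stand in for the parsed results. Columns are coerced to character before comparing, because different parsers can return different column types for the same data (for example, a page count as a number from JSON but as text from XML).

```r
# Hypothetical helper: TRUE when two data frames have the same column
# names and the same values, ignoring differences in column type
same_values <- function(a, b) {
  identical(lapply(a, as.character), lapply(b, as.character))
}

# Stand-in data frames (assumed shapes): Pages is numeric in one, text in the other
from_json <- data.frame(Title = "To Kill A Mockingbird", Pages = 322L,
                        stringsAsFactors = FALSE)
from_xml  <- data.frame(Title = "To Kill A Mockingbird", Pages = "322",
                        stringsAsFactors = FALSE)

# Logical result of the value-wise comparison
values_match <- same_values(from_json, from_xml)
print(values_match)
```

With the data frames from this exercise, the same call could be applied to html_dataframe, xml_dataframe, and json_dataframe in pairs.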