DATA607_Week7_Assignment

Introduction

Working with XML and JSON in R. In this exercise, three similar data files in different formats (HTML, XML and JSON) are fetched from GitHub and parsed with the appropriate R library, to find out which techniques are involved in retrieving data from each file format.

Methods

   Different types of data and the methods used:
      HTML - HyperText Markup Language; the data is stored as an HTML table, extracted with the readHTMLTable function
      XML - Extensible Markup Language; each element is enclosed in tags, extracted with the xmlToDataFrame function
      JSON - JavaScript Object Notation; a minimal, human-readable format for structured data made of name/value pairs, extracted with the fromJSON function followed by as.data.frame

HTML Method

library(knitr)
library(kableExtra)
library(XML)
library(RCurl)
library(jsonlite)

#Read the data from the GitHub repository
html_format <- getURL("https://raw.githubusercontent.com/thasleem1/DATA607/master/booksw7/books.html")
#Extract the first table into an R data frame using the readHTMLTable function
html_dataframe <- readHTMLTable(html_format, which = 1)
#Display the data
kable(data.frame(html_dataframe)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872")

Title	Authors	ASIN	Publisher	Published	Pages
To Kill A Mockingbird	Harper Lee	B00K1XOV5G	Cornerstone Digital	8-Jul-14	322
Pride & Prejudice	Inkwater Press, Jane Austen	1853260002	Wordsworth Editions Ltd	1-Sep-97	329
The Hobbit: Or, There and Back Again	J.R.R. Tolkien	061815082X	Young Readers Paperback Tolkien	1-Sep-01	333

XML Method

#Read the data from the GitHub repository
xml_format <- getURL("https://raw.githubusercontent.com/thasleem1/DATA607/master/booksw7/books.xml")
#Extract to an R data frame using the xmlToDataFrame function
xml_dataframe <- xmlToDataFrame(xml_format)
#Display the data
kable(data.frame(xml_dataframe)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872")

Title	Authors	ASIN	Publisher	Published	Pages
To Kill A Mockingbird	Harper Lee	B00K1XOV5G	Cornerstone Digital	8-Jul-14	322
Pride & Prejudice	Inkwater Press, Jane Austen	1853260002	Wordsworth Editions Ltd	1-Sep-97	329
The Hobbit: Or, There and Back Again	J.R.R. Tolkien	061815082X	Young Readers Paperback Tolkien	1-Sep-01	333
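xmlToDataFrame treats each child of the root element as one row and that child's sub-elements as columns. The following is a minimal sketch of that mapping, assuming a hypothetical books.xml layout of a <books> root containing <book> records; the element names are illustrative and not taken from the actual file.

```r
library(XML)

# Hypothetical XML standing in for books.xml: <books> root, one <book> per record
xml_text <- paste0(
  '<?xml version="1.0" encoding="UTF-8"?>',
  "<books>",
  "<book><Title>To Kill A Mockingbird</Title><Pages>322</Pages></book>",
  "<book><Title>The Hobbit: Or, There and Back Again</Title><Pages>333</Pages></book>",
  "</books>"
)

# Parse the string, then map each <book> to a row and each sub-element to a column
doc <- xmlParse(xml_text, asText = TRUE)
books <- xmlToDataFrame(doc)

# Values are read as text; as.character() guards against factor columns
books$Pages <- as.integer(as.character(books$Pages))

str(books)
```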

JSON Method

#Read the data from the GitHub repository
json_format <- getURL("https://raw.githubusercontent.com/thasleem1/DATA607/master/booksw7/books.json")
#Extract to an R data frame using the fromJSON function
json_dataframe <- fromJSON(json_format)
json_dataframe <- as.data.frame(json_dataframe)
#Display the data
kable(data.frame(json_dataframe)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872")

Title	Authors	ASIN	Publisher	Published	Pages
To Kill A Mockingbird	Harper Lee	B00K1XOV5G	Cornerstone Digital	8-Jul-14	322
Pride & Prejudice	Inkwater Press, Jane Austen	1853260002	Wordsworth Editions Ltd	1-Sep-97	329
The Hobbit: Or, There and Back Again	J.R.R. Tolkien	061815082X	Young Readers Paperback Tolkien	1-Sep-01	333
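Whether the as.data.frame step is needed depends on how the JSON is shaped, which is not shown here. The sketch below compares two hypothetical layouts: with jsonlite's default simplification, a top-level array of objects already comes back as a data frame, while an array wrapped in a named field (here an assumed "books" key) comes back as a list, and calling as.data.frame on that list prefixes the column names.

```r
library(jsonlite)

# Layout 1 (hypothetical): a top-level array of book objects
json_text <- '[
  {"Title": "To Kill A Mockingbird", "Pages": 322},
  {"Title": "Pride & Prejudice", "Pages": 329}
]'
books <- fromJSON(json_text)   # simplified straight to a data.frame
class(books)

# Layout 2 (hypothetical): the same array wrapped in a "books" field
wrapped_text <- '{"books": [
  {"Title": "To Kill A Mockingbird", "Pages": 322},
  {"Title": "Pride & Prejudice", "Pages": 329}
]}'
wrapped <- fromJSON(wrapped_text)  # a list whose $books element is a data.frame
flat <- as.data.frame(wrapped)     # columns become "books.Title", "books.Pages"
names(flat)
```

Indexing the field directly (wrapped$books) keeps the original column names.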

Conclusion

R is flexible enough to read data from most popular file formats, such as HTML, XML and JSON, and all three files produced the same table here. Of the three, JSON was the most comfortable format to read the data from in this exercise.