Working with XML and JSON in R
In this exercise, three similar data files in different formats (HTML, XML, and JSON) are fetched and parsed with the appropriate R library, to explore the techniques involved in retrieving data from each file format.
Data formats and the methods used to read them
HTML - Hypertext Markup Language; the data is laid out as a table, so the readHTMLTable function is used to extract it.
XML - Extensible Markup Language; each element is enclosed in tags, so the xmlToDataFrame function is used to extract the data.
JSON - JavaScript Object Notation; a minimal, readable format for structured data built from name/value pairs, so the fromJSON function (followed by as.data.frame) is used to extract the data.
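To make the three methods concrete, here is a minimal sketch that parses one record from the book list written in all three formats. The inline strings are hypothetical stand-ins for the GitHub files (their exact layout is an assumption), and they are passed through htmlParse()/xmlParse() with asText = TRUE so the example runs offline.

```r
library(XML)
library(jsonlite)

# Hypothetical inline samples holding one record from the book list, so the
# sketch runs without network access; the real files may be laid out differently.
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Title</th><th>Pages</th></tr>",
  "<tr><td>To Kill A Mockingbird</td><td>322</td></tr>",
  "</table></body></html>"
)
xml_text <- paste0(
  "<books><book><Title>To Kill A Mockingbird</Title>",
  "<Pages>322</Pages></book></books>"
)
json_text <- '[{"Title": "To Kill A Mockingbird", "Pages": 322}]'

# HTML: parse the page, then take the first <table>; header = TRUE uses the
# first row (the <th> cells) as column names
from_html <- readHTMLTable(htmlParse(html_text, asText = TRUE),
                           header = TRUE, which = 1)

# XML: each child of the root becomes a row, each grandchild a column
from_xml <- xmlToDataFrame(xmlParse(xml_text, asText = TRUE),
                           stringsAsFactors = FALSE)

# JSON: an array of objects is simplified to a data frame by fromJSON()
from_json <- as.data.frame(fromJSON(json_text))

str(from_html)
str(from_xml)
str(from_json)
```

All three produce a one-row data frame with Title and Pages columns; the difference is that the HTML and XML readers return Pages as text, while the JSON number arrives as a number.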
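library(knitr)       # kable() for rendering tables
library(kableExtra)  # kable_styling(), row_spec() and the %>% pipe
library(XML)         # readHTMLTable(), xmlToDataFrame()
library(RCurl)       # getURL() for downloading the raw files
library(jsonlite)    # fromJSON()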
# Read the HTML file from the GitHub repository
html_format <- getURL("https://raw.githubusercontent.com/thasleem1/DATA607/master/booksw7/books.html")
# Extract the first table into an R data frame using readHTMLTable
html_dataframe <- readHTMLTable(html_format, which = 1)
# Display the data
kable(html_dataframe) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872")
| Title | Authors | ASIN | Publisher | Published | Pages |
|---|---|---|---|---|---|
| To Kill A Mockingbird | Harper Lee | B00K1XOV5G | Cornerstone Digital | 8-Jul-14 | 322 |
| Pride & Prejudice | Inkwater Press, Jane Austen | 1853260002 | Wordsworth Editions Ltd | 1-Sep-97 | 329 |
| The Hobbit: Or, There and Back Again | J.R.R. Tolkien | 061815082X | Young Readers Paperback Tolkien | 1-Sep-01 | 333 |
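readHTMLTable typically returns every cell as text, so Pages and Published in the table above are not yet numbers or dates. The sketch below uses a hypothetical helper, tidy_books(), to convert them; the format string "%d-%b-%y" is an assumption based on values such as "8-Jul-14", and the helper temporarily switches LC_TIME to the C locale because %b matches month abbreviations in the current locale.

```r
# Hypothetical helper: convert the text columns returned by the readers into
# proper types before any analysis.
tidy_books <- function(df) {
  # as.character() first, in case the columns came back as factors
  df$Pages <- as.integer(as.character(df$Pages))

  # %b matches month names of the current locale, so parse in the C locale
  # (English abbreviations such as "Jul") and restore the old setting on exit
  old_locale <- Sys.getlocale("LC_TIME")
  Sys.setlocale("LC_TIME", "C")
  on.exit(Sys.setlocale("LC_TIME", old_locale), add = TRUE)
  df$Published <- as.Date(as.character(df$Published), format = "%d-%b-%y")
  df
}

# Character columns shaped like the table above (two rows copied from it)
books <- data.frame(
  Title = c("To Kill A Mockingbird", "Pride & Prejudice"),
  Published = c("8-Jul-14", "1-Sep-97"),
  Pages = c("322", "329"),
  stringsAsFactors = FALSE
)
books <- tidy_books(books)
str(books)
```

With a two-digit year, %y maps 00 to 68 onto the 2000s and 69 to 99 onto the 1900s, so "1-Sep-97" becomes 1997-09-01.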
# Read the XML file from the GitHub repository
xml_format <- getURL("https://raw.githubusercontent.com/thasleem1/DATA607/master/booksw7/books.xml")
# Extract to an R data frame using xmlToDataFrame
xml_dataframe <- xmlToDataFrame(xml_format)
# Display the data
kable(xml_dataframe) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872")
| Title | Authors | ASIN | Publisher | Published | Pages |
|---|---|---|---|---|---|
| To Kill A Mockingbird | Harper Lee | B00K1XOV5G | Cornerstone Digital | 8-Jul-14 | 322 |
| Pride & Prejudice | Inkwater Press, Jane Austen | 1853260002 | Wordsworth Editions Ltd | 1-Sep-97 | 329 |
| The Hobbit: Or, There and Back Again | J.R.R. Tolkien | 061815082X | Young Readers Paperback Tolkien | 1-Sep-01 | 333 |
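xmlToDataFrame works best on a record-per-element layout: one child of the root per row and one grandchild element per column. The sketch below assumes such a layout (the tag names books, book, Title and Pages are hypothetical, not taken from the actual file) and also shows that a title like "Pride & Prejudice" must be written with the &amp; entity to be well-formed XML.

```r
library(XML)

# Hypothetical XML: root <books>, one <book> per row, one child per column.
# The "&" in the title is escaped as &amp; so the document is well-formed.
xml_text <- paste0(
  "<books>",
  "<book><Title>Pride &amp; Prejudice</Title><Pages>329</Pages></book>",
  "<book><Title>The Hobbit: Or, There and Back Again</Title>",
  "<Pages>333</Pages></book>",
  "</books>"
)

doc <- xmlParse(xml_text, asText = TRUE)
books <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# The parser turns &amp; back into a plain "&"
print(books$Title)

# Element values arrive as character, so convert numeric columns explicitly
books$Pages <- as.integer(books$Pages)
```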
# Read the JSON file from the GitHub repository
json_format <- getURL("https://raw.githubusercontent.com/thasleem1/DATA607/master/booksw7/books.json")
# Extract to an R data frame using fromJSON; as.data.frame makes sure the result is a data frame
json_dataframe <- fromJSON(json_format)
json_dataframe <- as.data.frame(json_dataframe)
# Display the data
kable(json_dataframe) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872")
| Title | Authors | ASIN | Publisher | Published | Pages |
|---|---|---|---|---|---|
| To Kill A Mockingbird | Harper Lee | B00K1XOV5G | Cornerstone Digital | 8-Jul-14 | 322 |
| Pride & Prejudice | Inkwater Press, Jane Austen | 1853260002 | Wordsworth Editions Ltd | 1-Sep-97 | 329 |
| The Hobbit: Or, There and Back Again | J.R.R. Tolkien | 061815082X | Young Readers Paperback Tolkien | 1-Sep-01 | 333 |
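Whether the as.data.frame step is needed depends on the shape of the JSON. The sketch below assumes two hypothetical layouts: a top-level array of objects, which fromJSON already simplifies to a data frame (so as.data.frame leaves it unchanged), and an object wrapping the records under a key such as "books", where selecting that element is the simpler route.

```r
library(jsonlite)

# Hypothetical JSON: a top-level array of objects, one object per book
json_text <- '[
  {"Title": "To Kill A Mockingbird", "Pages": 322},
  {"Title": "The Hobbit: Or, There and Back Again", "Pages": 333}
]'

books <- fromJSON(json_text)  # simplifyDataFrame = TRUE by default
str(books)                    # Pages arrives as integer, no conversion needed

# For an array, fromJSON() already returns a data frame, so as.data.frame()
# returns it unchanged
identical(books, as.data.frame(books))

# If the records are wrapped in an object such as {"books": [...]},
# fromJSON() returns a list; select the element that holds the records
wrapped <- fromJSON('{"books": [{"Title": "Pride & Prejudice", "Pages": 329}]}')
wrapped_books <- wrapped$books
```

Because JSON numbers keep their type, no column conversion is needed here, unlike the HTML and XML readers.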
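R is flexible enough to read data from most popular file formats, such as HTML, XML, and JSON, and all three files above produced the same table. Of the three, JSON was the most convenient to read, because its name/value structure maps directly onto the columns of a data frame.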