Working with XML and JSON in R
Our Goal
Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.
Take the information that you’ve selected about these three books, and separately create three files which store the books’ information in HTML (using an HTML table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats.
Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?
Loading the required packages
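- Install the package XML and load the library
- Install the package RCurl and load the library
- Install the package bitops and load the library
- Install the package rjson and load the library
- Install the package jsonlite and load the library
- Install the package tidyr and load the library
- Install the package dplyr and load the library
- Install the package stringr and load the library
# options(repos = c(CRAN = 'http://cran.rstudio.com'))
# install.packages(c('XML', 'RCurl', 'bitops', 'rjson', 'jsonlite', 'tidyr', 'dplyr', 'stringr'))
suppressWarnings(suppressMessages(library(XML)))
suppressWarnings(suppressMessages(library(RCurl)))    # provides getURL()
suppressWarnings(suppressMessages(library(bitops)))
suppressWarnings(suppressMessages(library(rjson)))
# jsonlite is attached after rjson, so its fromJSON() masks rjson::fromJSON()
suppressWarnings(suppressMessages(library(jsonlite)))
suppressWarnings(suppressMessages(library(tidyr)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(stringr)))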
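Because both rjson and jsonlite are loaded and both export a function called fromJSON(), it is worth seeing how they differ. The sketch below parses a small, made-up JSON string with each package, calling them through their namespaces so the masking order does not matter; the string and the variable names are illustrative only.

```r
# A tiny, hypothetical JSON document: an array of book objects under "books".
txt <- '{"books": [{"Title": "The Lost Art of True Beauty", "Year_Published": 2010}]}'

# rjson keeps the array of objects as nested R lists.
via_rjson <- rjson::fromJSON(txt)

# jsonlite simplifies an array of objects into a data frame by default.
via_jsonlite <- jsonlite::fromJSON(txt)

str(via_rjson$books)
str(via_jsonlite$books)
```

The data-frame simplification is what makes jsonlite the more convenient choice when the goal is an R data frame.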
Importing dataset Books from XML Table
- Load the XML table of Books into an R data frame
- Get the Books dataset using the getURL() function and assign it to xml_books
- Convert xml_books from XML to a data frame using xmlToDataFrame()
Title
1 When God writes your love story
2 Discerning the Voice of God: How to Recognize When God Speaks
3 The Lost Art of True Beauty
Author Year_Published OriginalLanguage_written
1 Leslie Ludy, Eric Ludy 1999 English
2 Priscilla Shirer 2006 English
3 Leslie Ludy 2010 English
Goodreads_rating
1 4.1
2 4.5
3 4.12
The data frame above has the following columns: 1) Title, 2) Author (Book 1 has two authors; Books 2 and 3 have one author each), 3) Year_Published, 4) OriginalLanguage_written, 5) Goodreads_rating.
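The XML import code itself is not shown above. The following is a minimal sketch of the same two steps, assuming Books.xml has a <books> root with one <book> element per book and one child element per column. To stay self-contained it parses an inline string holding two of the three books instead of calling getURL(); with the real file, the string returned by getURL() would take its place.

```r
library(XML)

# Hypothetical inline stand-in for Books.xml (two books only).
xml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book>
    <Title>When God writes your love story</Title>
    <Author>Leslie Ludy, Eric Ludy</Author>
    <Year_Published>1999</Year_Published>
    <OriginalLanguage_written>English</OriginalLanguage_written>
    <Goodreads_rating>4.1</Goodreads_rating>
  </book>
  <book>
    <Title>The Lost Art of True Beauty</Title>
    <Author>Leslie Ludy</Author>
    <Year_Published>2010</Year_Published>
    <OriginalLanguage_written>English</OriginalLanguage_written>
    <Goodreads_rating>4.12</Goodreads_rating>
  </book>
</books>'

# Parse the string as XML text rather than treating it as a file name.
doc <- xmlParse(xml_text, asText = TRUE)

# Each child of the root becomes a row and each of its child elements a
# column; the values are read as text, so every column is character.
xml_books <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
str(xml_books)
```

Because every column comes back as character, numeric columns such as Goodreads_rating would need as.numeric() before any arithmetic.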
Importing dataset Books from HTML Table
- Load the HTML table of Books into an R data frame
- Get the Books dataset using the getURL() function and assign it to html_books
- Convert html_books from HTML to a data frame using readHTMLTable()
$`NULL`
Title
1 When God writes your love story
2 Discerning the Voice of God: How to Recognize When God Speaks
3 The Lost Art of True Beauty
Author Year_Published OriginalLanguage_written
1 Leslie Ludy, Eric Ludy 1999 English
2 Priscilla Shirer 2006 English
3 Leslie Ludy 2010 English
Goodreads_rating
1 4.1
2 4.5
3 4.12
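The data frame above has the following columns: 1) Title, 2) Author (Book 1 has two authors; Books 2 and 3 have one author each), 3) Year_Published, 4) OriginalLanguage_written, 5) Goodreads_rating. Note that readHTMLTable() returns a list with one data frame per table in the page, named after each table's id attribute; the $`NULL` label shows that the table in Books.html has no id.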
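The HTML import code is likewise not shown. Here is a minimal sketch, assuming a single <table> whose first row holds <th> header cells, and again using an inline string (two books, three of the five columns) in place of getURL(). It also shows the which argument, which returns one table as a data frame instead of a list.

```r
library(XML)

# Hypothetical inline stand-in for Books.html: one table, header row first.
html_text <- '<html><body>
<table>
  <tr><th>Title</th><th>Author</th><th>Year_Published</th></tr>
  <tr><td>When God writes your love story</td><td>Leslie Ludy, Eric Ludy</td><td>1999</td></tr>
  <tr><td>The Lost Art of True Beauty</td><td>Leslie Ludy</td><td>2010</td></tr>
</table>
</body></html>'

doc <- htmlParse(html_text, asText = TRUE)

# Default behaviour: a list of data frames, one per <table> element.
all_tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# which = 1 returns only the first table, directly as a data frame.
html_books <- readHTMLTable(doc, header = TRUE, which = 1,
                            stringsAsFactors = FALSE)
str(html_books)
```

Using which = 1 (or adding an id attribute to the table) avoids having to index the result with $`NULL`.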
Importing dataset Books from JSON Table
- Load the JSON table of Books into an R data frame
- Get the Books dataset using the fromJSON() function and assign it to json_books
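# jsonlite::fromJSON() reads the file straight from the URL and simplifies the array of books into a data frame
json_books <- jsonlite::fromJSON("https://raw.githubusercontent.com/PriyaShaji/Data607/master/Assignment_7/Books.json")
# the top-level key in Books.json is "books:" (colon included), hence the backticks
json_books$`books:`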
Title
1 When God writes your love story
2 Discerning the Voice of God: How to Recognize When God Speaks
3 The Lost Art of True Beauty
Author Year_Published OriginalLanguage_written
1 Leslie Ludy, Eric Ludy 1999 English
2 Priscilla Shirer 2006 English
3 Leslie Ludy 2010 English
Goodreads_rating
1 4.1
2 4.5
3 4.12
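The data frame above has the following columns: 1) Title, 2) Author (Book 1 has two authors; Books 2 and 3 have one author each), 3) Year_Published, 4) OriginalLanguage_written, 5) Goodreads_rating. fromJSON() itself returns a list; the data frame is stored in its `books:` element.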
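To see why the data frame sits under `books:`, here is a sketch with an inline JSON string that uses the same top-level key. The values are assumed to be stored as JSON strings, and only three columns are included. It also shows one alternative for the multi-author book, storing the authors as a JSON array, which jsonlite turns into a list column.

```r
library(jsonlite)

# Hypothetical inline stand-in for Books.json, using the "books:" key
# that the printed output above shows.
json_text <- '{
  "books:": [
    {"Title": "When God writes your love story",
     "Author": "Leslie Ludy, Eric Ludy",
     "Year_Published": "1999"},
    {"Title": "The Lost Art of True Beauty",
     "Author": "Leslie Ludy",
     "Year_Published": "2010"}
  ]
}'

# The result is a list; the array of book objects becomes a data frame.
json_books <- fromJSON(json_text)
books_df <- json_books[["books:"]]

# Alternative layout: authors as a JSON array give a list column.
alt <- fromJSON('[{"Title": "When God writes your love story",
                   "Author": ["Leslie Ludy", "Eric Ludy"]}]')
str(alt)
```

A list column keeps each author separate, at the cost of needing something like tidyr::unnest() before the data frame can be compared cell by cell with the XML and HTML versions.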
Conclusion
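The XML, HTML and JSON files each have their own structure, and the R objects returned by the three readers differ accordingly: xmlToDataFrame() returns a data frame directly, readHTMLTable() returns a list of tables (printed as $`NULL`), and fromJSON() returns a list whose `books:` element holds the data frame. Once extracted, the three data frames have the same five columns and the same three books with the same values, so they are identical in content. They need not be identical as R objects, however, since each parser may assign different column types.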
The XML Table Format
Title
1 When God writes your love story
2 Discerning the Voice of God: How to Recognize When God Speaks
3 The Lost Art of True Beauty
Author Year_Published OriginalLanguage_written
1 Leslie Ludy, Eric Ludy 1999 English
2 Priscilla Shirer 2006 English
3 Leslie Ludy 2010 English
Goodreads_rating
1 4.1
2 4.5
3 4.12
The HTML Table Format
$`NULL`
Title
1 When God writes your love story
2 Discerning the Voice of God: How to Recognize When God Speaks
3 The Lost Art of True Beauty
Author Year_Published OriginalLanguage_written
1 Leslie Ludy, Eric Ludy 1999 English
2 Priscilla Shirer 2006 English
3 Leslie Ludy 2010 English
Goodreads_rating
1 4.1
2 4.5
3 4.12
The JSON Table Format
$`books:`
Title
1 When God writes your love story
2 Discerning the Voice of God: How to Recognize When God Speaks
3 The Lost Art of True Beauty
Author Year_Published OriginalLanguage_written
1 Leslie Ludy, Eric Ludy 1999 English
2 Priscilla Shirer 2006 English
3 Leslie Ludy 2010 English
Goodreads_rating
1 4.1
2 4.5
3 4.12
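To answer "Are the three data frames identical?" programmatically rather than by eye, one option is to compare them after putting them on a common footing. The sketch below uses two small hypothetical data frames holding the same values with different column types, as can happen when one parser returns text and another returns numbers. normalise() is a helper defined here, not part of any package.

```r
# Hypothetical stand-ins: same values, different column types.
from_xml <- data.frame(
  Year_Published   = c("1999", "2006", "2010"),
  Goodreads_rating = c("4.1", "4.5", "4.12"),
  stringsAsFactors = FALSE
)
from_json <- data.frame(
  Year_Published   = c(1999L, 2006L, 2010L),
  Goodreads_rating = c(4.1, 4.5, 4.12)
)

# Convert every column (including factors) to character and reset row
# names, so only column names and cell text are compared.
normalise <- function(df) {
  df[] <- lapply(df, as.character)
  rownames(df) <- NULL
  df
}

# Strict comparison also checks column types, so it is FALSE here.
strict_same <- identical(from_xml, from_json)

# Content comparison after normalising.
content_same <- identical(normalise(from_xml), normalise(from_json))

# Row/column positions of any cells whose text still differs.
diff_cells <- which(normalise(from_xml) != normalise(from_json), arr.ind = TRUE)
```

With the real data, the inputs would be the data frames extracted from the three imports above; any rows in diff_cells point to cells whose text differs between the source files.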