Working with XML and JSON in R

Our Goal

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.

Take the information you’ve selected about these three books and create three separate files that store it in HTML (using an HTML table), XML, and JSON formats (e.g., “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats.

Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?

Importing the Books Dataset from XML

To load the XML file of books into an R data frame:

  1. Download the XML file with getURL() from the RCurl package.
  2. Assign the downloaded text to xml_books.
  3. Convert xml_books from XML to a data frame with xmlToDataFrame() from the XML package.
                                                                Title
1                                     When God writes your love story
2 Discerning the Voice of GothicGod: How to Recognize When God Speaks
3                                         The Lost Art of True Beauty
                  Author Year_Published OriginalLanguage_written
1 Leslie Ludy, Eric Ludy           1999                  English
2       Priscilla Shirer           2006                  English
3            Leslie Ludy           2010                  English
  Goodreads_rating
1              4.1
2              4.5
3             4.12

The data frame above has five columns: (1) Title, (2) Author (book 1 has two authors; books 2 and 3 have one author each), (3) Year_Published, (4) OriginalLanguage_written, and (5) Goodreads_rating.
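The XML steps can be sketched as follows. This is a minimal offline version, not the original code: it assumes books.xml has a <books> root with one <book> element per book, whose child elements are named after the columns above, and it parses an inline string with xmlParse(..., asText = TRUE) instead of downloading the file with getURL(), whose URL is not given here.

```r
# Sketch: parse a hand-written XML books file with the XML package.
# ASSUMPTION: <books> root, one <book> per row, child elements named after the columns.
library(XML)

# Inline stand-in for the downloaded file; in the original workflow the text
# came from RCurl::getURL() on the URL of books.xml.
books_xml_txt <- '<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book>
    <Title>When God writes your love story</Title>
    <Author>Leslie Ludy, Eric Ludy</Author>
    <Year_Published>1999</Year_Published>
    <OriginalLanguage_written>English</OriginalLanguage_written>
    <Goodreads_rating>4.1</Goodreads_rating>
  </book>
  <book>
    <Title>Discerning the Voice of God: How to Recognize When God Speaks</Title>
    <Author>Priscilla Shirer</Author>
    <Year_Published>2006</Year_Published>
    <OriginalLanguage_written>English</OriginalLanguage_written>
    <Goodreads_rating>4.5</Goodreads_rating>
  </book>
  <book>
    <Title>The Lost Art of True Beauty</Title>
    <Author>Leslie Ludy</Author>
    <Year_Published>2010</Year_Published>
    <OriginalLanguage_written>English</OriginalLanguage_written>
    <Goodreads_rating>4.12</Goodreads_rating>
  </book>
</books>'

# Parse the string (asText = TRUE because it is XML text, not a file path),
# then turn each child of the root (<book>) into one row of a data frame.
xml_doc   <- xmlParse(books_xml_txt, asText = TRUE)
xml_books <- xmlToDataFrame(xml_doc, stringsAsFactors = FALSE)

# Every column comes back as character, including Year_Published and Goodreads_rating.
str(xml_books)
```

Because xmlToDataFrame() reads every value as text, the ratings and years need as.numeric() before any arithmetic.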

Importing the Books Dataset from HTML

To load the HTML table of books into an R data frame:

  1. Download the HTML file with getURL() from the RCurl package.
  2. Assign the downloaded text to html_books.
  3. Convert the table in html_books to a data frame with readHTMLTable() from the XML package.
$`NULL`
                                                          Title
1                               When God writes your love story
2 Discerning the Voice of God: How to Recognize When God Speaks
3                                   The Lost Art of True Beauty
                  Author Year_Published OriginalLanguage_written
1 Leslie Ludy, Eric Ludy           1999                  English
2    F. Scott Fitzgerald           2006                  English
3            Leslie Ludy           2010                  English
  Goodreads_rating
1              4.1
2              4.5
3             4.12

The data frame above has the same five columns as the XML version: Title, Author, Year_Published, OriginalLanguage_written, and Goodreads_rating.
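The HTML steps can be sketched the same way. This is an assumption-laden sketch, not the original code: it assumes books.html contains a single <table> whose first row holds <th> headers named after the columns, and it parses an inline string instead of calling getURL(). It also shows where the $`NULL` label in the output comes from: readHTMLTable() returns a list of data frames, one per table, and a table without an id attribute typically appears under the name NULL.

```r
# Sketch: read a hand-written HTML table with the XML package.
# ASSUMPTION: one <table> whose first row holds <th> headers named after the columns.
library(XML)

# Inline stand-in for the downloaded page (originally fetched with RCurl::getURL()).
books_html_txt <- '<html><body>
<table>
  <tr><th>Title</th><th>Author</th><th>Year_Published</th>
      <th>OriginalLanguage_written</th><th>Goodreads_rating</th></tr>
  <tr><td>When God writes your love story</td><td>Leslie Ludy, Eric Ludy</td>
      <td>1999</td><td>English</td><td>4.1</td></tr>
  <tr><td>Discerning the Voice of God: How to Recognize When God Speaks</td>
      <td>Priscilla Shirer</td><td>2006</td><td>English</td><td>4.5</td></tr>
  <tr><td>The Lost Art of True Beauty</td><td>Leslie Ludy</td>
      <td>2010</td><td>English</td><td>4.12</td></tr>
</table>
</body></html>'

html_doc <- htmlParse(books_html_txt, asText = TRUE)

# readHTMLTable() returns a list with one data frame per <table>;
# header = TRUE uses the first row as column names.
html_tables <- readHTMLTable(html_doc, header = TRUE, stringsAsFactors = FALSE)
html_books  <- html_tables[[1]]   # unwrap the single table from the list
```

Indexing with [[1]] unwraps the data frame regardless of how the list element is named, so the result can be compared directly with the XML data frame.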

Importing the Books Dataset from JSON

To load the JSON file of books into an R data frame:

  1. Read and parse the file with fromJSON().
  2. Assign the result to json_books.
$`books:`
                                                          Title
1                               When God writes your love story
2 Discerning the Voice of God: How to Recognize When God Speaks
3                                  The Lost Art of True Beauty 
                  Author Year_Published OriginalLanguage_written
1 Leslie Ludy, Eric Ludy           1999                  English
2       Priscilla Shirer           2006                  English
3            Leslie Ludy           2010                  English
  Goodreads_rating
1              4.1
2              4.5
3             4.12

The data frame above has the same five columns as the XML and HTML versions: Title, Author, Year_Published, OriginalLanguage_written, and Goodreads_rating.
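A minimal sketch of the JSON step, assuming the jsonlite package (the output does not say which fromJSON() was used) and a file whose top-level object has one key holding an array of book objects. The printed label $`books:` suggests the key in the original file is literally "books:", colon included; the sketch uses "books" and unwraps the data frame with [[1]], which works for either spelling.

```r
# Sketch: parse a hand-written JSON books file with jsonlite.
# ASSUMPTION: top-level object with one key whose value is an array of book objects.
library(jsonlite)

# Inline stand-in for books.json; values are stored as JSON strings.
books_json_txt <- '{
  "books": [
    {"Title": "When God writes your love story", "Author": "Leslie Ludy, Eric Ludy",
     "Year_Published": "1999", "OriginalLanguage_written": "English", "Goodreads_rating": "4.1"},
    {"Title": "Discerning the Voice of God: How to Recognize When God Speaks",
     "Author": "Priscilla Shirer", "Year_Published": "2006",
     "OriginalLanguage_written": "English", "Goodreads_rating": "4.5"},
    {"Title": "The Lost Art of True Beauty", "Author": "Leslie Ludy",
     "Year_Published": "2010", "OriginalLanguage_written": "English", "Goodreads_rating": "4.12"}
  ]
}'

# With the default simplifyVector = TRUE, an array of objects sharing the same keys
# becomes a data frame, returned inside a list named after the top-level key.
json_parsed <- fromJSON(books_json_txt)
json_books  <- json_parsed[[1]]   # works whether the key is "books" or "books:"
```

If the years and ratings were written as JSON numbers instead of strings, fromJSON() would return numeric columns, and that alone would make this data frame differ from the all-character XML and HTML versions.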

Conclusion

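The three formats store the same records in different structures: XML uses nested, named elements, HTML uses table rows and cells, and JSON uses key-value objects inside an array. The readers also return different objects. xmlToDataFrame() returns a data frame directly, whereas readHTMLTable() and fromJSON() return a list wrapping the data frame, which is why their output starts with $`NULL` and $`books:`. After unwrapping, the three data frames have the same dimensions (3 rows, 5 columns), the same column names, and mostly the same values, but they are not identical: the XML version spells the second title "Discerning the Voice of GothicGod: ...", the HTML version lists F. Scott Fitzgerald instead of Priscilla Shirer as the second book's author, and the third title in the JSON version appears to end with a trailing space. These look like data-entry slips in the hand-written files rather than effects of the formats themselves.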
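The comparison can also be checked in code rather than by eye. The sketch below rebuilds the three data frames from the printed values (so it runs without the original files) and uses identical() plus a small hypothetical helper, diff_cells(), to list the cells where two data frames disagree. The trailing space in the third JSON title is inferred from its alignment in the printed output and is an assumption.

```r
# Sketch: compare the three books data frames cell by cell in base R.
# Values are copied from the printed output; the trailing space in the third JSON
# title is ASSUMED from its alignment in that output.

# Hypothetical helper: build a books data frame, varying only the cells that differ.
make_books <- function(title2, author2, title3) {
  data.frame(
    Title = c("When God writes your love story", title2, title3),
    Author = c("Leslie Ludy, Eric Ludy", author2, "Leslie Ludy"),
    Year_Published = c("1999", "2006", "2010"),
    OriginalLanguage_written = rep("English", 3),
    Goodreads_rating = c("4.1", "4.5", "4.12"),
    stringsAsFactors = FALSE
  )
}

title2     <- "Discerning the Voice of God: How to Recognize When God Speaks"
xml_books  <- make_books("Discerning the Voice of GothicGod: How to Recognize When God Speaks",
                         "Priscilla Shirer", "The Lost Art of True Beauty")
html_books <- make_books(title2, "F. Scott Fitzgerald", "The Lost Art of True Beauty")
json_books <- make_books(title2, "Priscilla Shirer", "The Lost Art of True Beauty ")

# Hypothetical helper: list every cell where two same-shaped data frames disagree.
diff_cells <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)), identical(names(a), names(b)))
  ma  <- as.matrix(a)
  mb  <- as.matrix(b)
  idx <- which(ma != mb, arr.ind = TRUE)   # (row, col) positions of mismatches
  data.frame(row = unname(idx[, "row"]), column = names(a)[idx[, "col"]],
             first = ma[idx], second = mb[idx], stringsAsFactors = FALSE)
}

identical(xml_books, json_books)    # FALSE: one differing cell is enough
diff_cells(xml_books, json_books)   # the GothicGod title and the trailing space
diff_cells(html_books, json_books)  # the trailing space and the author of book 2

# Trimming whitespace removes the trailing-space difference but not the data-entry errors.
json_trimmed <- as.data.frame(lapply(json_books, trimws), stringsAsFactors = FALSE)
diff_cells(xml_books, json_trimmed) # only the GothicGod title remains
```

identical() only answers yes or no; listing the mismatching cells makes it clear whether a difference comes from the data (a typo) or from formatting (whitespace, column types).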
