Homework 7

Cesar L. Espitia

3/18/2017

Assignment – Working with XML and JSON in R

This assignment involves importing data stored in three different file formats into R. Three files were generated, one per format:
  • JSON - Books.json
  • XML - Books.xml
  • HTML - Books.html

    Each file contains the same three books: “I, Robot”, “Brave New World”, and “The Talisman.”

    The files were generated using Notepad++.

    Figure 1. Notepad++ Data.

    Importing the Data

    The HTML and XML files were imported with the XML package, and the JSON file was imported with the jsonlite package.
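The output further down reports class "list" for JSONBooks, whereas jsonlite normally simplifies a top-level JSON array of records into a data frame. One possible cause, not confirmed by this report, is that the file wraps its records in a top-level object. A minimal sketch on inline JSON strings; the key `books` and the two sample records are hypothetical stand-ins for the contents of CesarBooks.json.

```r
library(jsonlite)

# Hypothetical records: a bare top-level array of objects
books_array <- '[
  {"ID": 1, "Author": "Aldous Huxley", "Title": "Brave New World"},
  {"ID": 3, "Author": "Isaac Asimov", "Title": "I, Robot"}
]'

# The same records wrapped under a hypothetical top-level key "books"
books_wrapped <- '{"books": [
  {"ID": 1, "Author": "Aldous Huxley", "Title": "Brave New World"},
  {"ID": 3, "Author": "Isaac Asimov", "Title": "I, Robot"}
]}'

# A top-level array of objects is simplified to a data frame
from_array <- fromJSON(books_array, flatten = TRUE)
class(from_array)

# A top-level object becomes a named list; the data frame sits inside it
from_wrapped <- fromJSON(books_wrapped, flatten = TRUE)
class(from_wrapped)
class(from_wrapped$books)
```

If the real file has this wrapped shape, indexing into the list (for example `JSONBooks[[1]]`) would give the data frame directly.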

    # Import and parse the data
    library(jsonlite)
    library(XML)
    
    # Import JSON
    JSONBooks <- fromJSON("CesarBooks.json", flatten = TRUE)
    class(JSONBooks)
    colnames(JSONBooks)
    
    # Import XML: parse the document, then convert its records to a data frame
    XMLBooks <- xmlToDataFrame(xmlParse("CesarBooks.xml"))
    class(XMLBooks)
    
    # Import HTML: readHTMLTable() returns a list with one data frame per table
    HTMLBooks <- readHTMLTable("CesarBooks.html", header = TRUE)
    class(HTMLBooks)
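`xmlParse()` on its own returns a parsed document object, not a data frame; a data.frame like the one reported for XMLBooks further down is obtained by converting the parsed records with `xmlToDataFrame()`. A minimal sketch on an inline XML string, assuming one `<book>` element per record; the element names and records are hypothetical stand-ins for CesarBooks.xml. The string is built without whitespace between tags so that no blank text nodes end up among the records.

```r
library(XML)

# Hypothetical XML: one <book> child of the root per record
xml_text <- paste0(
  "<books>",
  "<book><id>1</id><author>Aldous Huxley</author><title>Brave New World</title></book>",
  "<book><id>3</id><author>Isaac Asimov</author><title>I, Robot</title></book>",
  "</books>"
)

# Parsing alone gives an XML document object, not a table
doc <- xmlParse(xml_text, asText = TRUE)
class(doc)

# Each child of the root becomes a row; each of its children becomes a column
books <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
class(books)
books
```

Note that `xmlToDataFrame()` reads every value as text, so `id` comes back as character rather than numeric.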

    Structures of Each File

    R imports each file into a different type of object.
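The HTML result is a list because `readHTMLTable()` returns one data frame per `<table>` element it finds. A minimal sketch on an inline HTML string, assuming a single table whose first row holds the column names; the markup and records are hypothetical stand-ins for CesarBooks.html.

```r
library(XML)

# Hypothetical HTML page containing a single table
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>ID</th><th>author</th><th>title</th></tr>",
  "<tr><td>1</td><td>Aldous Huxley</td><td>Brave New World</td></tr>",
  "<tr><td>3</td><td>Isaac Asimov</td><td>I, Robot</td></tr>",
  "</table></body></html>"
)

doc <- htmlParse(html_text, asText = TRUE)

# A list with one data frame per <table>, here of length 1
tables <- readHTMLTable(doc, header = TRUE)
class(tables)
length(tables)

# Extract the single table to work with a plain data frame
books <- tables[[1]]
class(books)
```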

    # Check the class of each imported object
    class(JSONBooks)
    
    class(XMLBooks) 
    
    class(HTMLBooks) 
    
    library(knitr)
    kable(head(JSONBooks), caption = "Table 1. JSON Table")
    
    kable(head(XMLBooks), caption = "Table 2. XML Table")
    
    kable(head(HTMLBooks), caption = "Table 3. HTML Table")
    class(JSONBooks)
    ## [1] "list"
    class(XMLBooks) 
    ## [1] "data.frame"
    class(HTMLBooks) 
    ## [1] "list"
    library(knitr)
    kable(head(JSONBooks), caption = "Table 1. JSON Table")
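    Table 1. JSON Table

    | ID | Author                     | Title           | Genre           | Year | Language |
    |----|----------------------------|-----------------|-----------------|------|----------|
    | 1  | Aldous Huxley              | Brave New World | Science Fiction | 1931 | English  |
    | 2  | Stephen King, Peter Straub | The Talisman    | Dark Fantasy    | 1984 | English  |
    | 3  | Isaac Asimov               | I, Robot        | Science Fiction | 1950 | English  |

    kable(head(XMLBooks), caption = "Table 2. XML Table")
    Table 2. XML Table

    | id | author                     | title           | genre           | year | Language |
    |----|----------------------------|-----------------|-----------------|------|----------|
    | 1  | Aldous Huxley              | Brave New World | Science Fiction | 1931 | English  |
    | 2  | Stephen King, Peter Straub | The Talisman    | Dark Fantasy    | 1984 | English  |
    | 3  | Isaac Asimov               | I, Robot        | Science Fiction | 1950 | English  |

    kable(head(HTMLBooks), caption = "Table 3. HTML Table")
    Table 3. HTML Table

    | ID | author                     | title           | genre           | year | language |
    |----|----------------------------|-----------------|-----------------|------|----------|
    | 1  | Aldous Huxley              | Brave New World | Science Fiction | 1931 | English  |
    | 2  | Stephen King, Peter Straub | The Talisman    | Dark Fantasy    | 1984 | English  |
    | 3  | Isaac Asimov               | I, Robot        | Sceince Fiction | 1950 | English  |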
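Apart from one misspelled genre, the three tables hold the same records; most of the differences lie in column-name capitalization and in the container and value types each package returns. A minimal base-R sketch of bringing two results into one shape before comparing: `normalize_books()` is a hypothetical helper (not part of jsonlite, XML, or knitr), and the two small frames are stand-ins built from the ID and genre columns printed above.

```r
# Hypothetical helper: unwrap a one-table list and standardize the frame
normalize_books <- function(x) {
  # readHTMLTable() (and some fromJSON() results) return a list; take the first table
  if (!is.data.frame(x)) x <- x[[1]]
  df <- as.data.frame(x, stringsAsFactors = FALSE)
  names(df) <- tolower(names(df))   # ID/id, Author/author -> one spelling
  df[] <- lapply(df, as.character)  # numbers and factors -> character
  rownames(df) <- NULL              # reset to default row names
  df
}

# Stand-ins: a list wrapping a frame with capitalized names and integer IDs,
# and a plain frame with lowercase names and character IDs
json_like <- list(data.frame(ID = c(1L, 3L),
                             Genre = c("Science Fiction", "Science Fiction"),
                             stringsAsFactors = FALSE))
xml_like <- data.frame(id = c("1", "3"),
                       genre = c("Science Fiction", "Science Fiction"),
                       stringsAsFactors = FALSE)

a <- normalize_books(json_like)
b <- normalize_books(xml_like)
identical(a, b)  # TRUE once names and types agree
```

Converting every column to character is a blunt choice, but it sidesteps the fact that the XML package reads all values as text while jsonlite keeps numbers numeric.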

    Are they identical?

    identical(JSONBooks, XMLBooks)
    ## [1] FALSE
    identical(JSONBooks, HTMLBooks)
    ## [1] FALSE
    identical(XMLBooks, HTMLBooks)
    ## [1] FALSE

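    No two of the three objects are identical. They differ in class (JSONBooks and HTMLBooks are lists, while XMLBooks is a data frame), in column-name capitalization (ID/id, Author/author, Language/language), and in one value: the HTML file spells the genre of “I, Robot” as “Sceince Fiction”.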
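`identical()` only answers yes or no. Once two tables share column names and types, a cell-by-cell comparison shows exactly where they disagree. A minimal base-R sketch, assuming both frames have the same column names and row order; the values are copied from the id and genre columns of Tables 2 and 3.

```r
# Stand-ins copied from the id and genre columns of Tables 2 and 3
xml_df <- data.frame(id = c("1", "2", "3"),
                     genre = c("Science Fiction", "Dark Fantasy", "Science Fiction"),
                     stringsAsFactors = FALSE)
html_df <- data.frame(id = c("1", "2", "3"),
                      genre = c("Science Fiction", "Dark Fantasy", "Sceince Fiction"),
                      stringsAsFactors = FALSE)

identical(xml_df, html_df)  # FALSE

# Logical matrix marking each cell that differs between the two frames
diff_cells <- xml_df != html_df
which(diff_cells, arr.ind = TRUE)  # row 3, column 2 ("genre")

# all.equal() describes the mismatch instead of returning a bare FALSE
all.equal(xml_df, html_df)
```

`all.equal()` is a useful companion to `identical()` here because it reports which component differs and how, which makes a typo like “Sceince Fiction” easy to locate.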