Assignment – Working with XML and JSON in R
This assignment involves importing different data sources into R. Three files were created, each in a different format: HTML, XML, and JSON. Each file contains the same three books: “I, Robot”, “Brave New World”, and “The Talisman.”
The files were generated using Notepad++.
Figure 1. Notepad++ Data.
Importing the Data
The HTML and XML files were imported using the XML package, and the JSON file was imported using the jsonlite package.
# Import and parse the data
library(jsonlite)
library(XML)
#Import Json
JSONBooks <- fromJSON("CesarBooks.json", flatten=TRUE)
class(JSONBooks)
colnames(JSONBooks)
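#Import XML
# xmlParse() alone returns an XMLInternalDocument; the "data.frame" class reported
# in the output implies a conversion step, assumed here to be xmlToDataFrame()
XMLBooks <- xmlToDataFrame(xmlParse("CesarBooks.xml"))
class(XMLBooks)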
#Import HTML
HTMLBooks<-readHTMLTable('CesarBooks.html', header = TRUE)
# Replace all \n by spaces
class(HTMLBooks)
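readHTMLTable() returns one data frame per table element in the page, collected in a list, which is why the HTML import needs one more step before it can be used as a table. The following is a minimal sketch of that step, not the code used for this assignment: the inline HTML string and the names html_text, tables, and books are hypothetical stand-ins for CesarBooks.html and its contents.

```r
library(XML)

# Hypothetical inline HTML standing in for CesarBooks.html
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>id</th><th>title</th></tr>",
  "<tr><td>1</td><td>Brave New World</td></tr>",
  "<tr><td>2</td><td>The Talisman</td></tr>",
  "</table></body></html>"
)

# Parse the string as HTML, then read every <table> into a list of data frames
doc <- htmlParse(html_text, asText = TRUE)
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

class(tables)          # the wrapper is a list, one element per <table>
books <- tables[[1]]   # pull out the first (here, only) table
class(books)
```

With the real file, the same indexing would be applied to HTMLBooks, assuming the books table is the first table in the page.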
Structures of Each File
R represents each imported file as a different kind of object, so the class of each result is checked first.
# See the structure of each file.
class(JSONBooks)
class(XMLBooks)
class(HTMLBooks)
library(knitr)
kable(head(JSONBooks), caption = "Table 1. JSON Table")
kable(head(XMLBooks), caption = "Table 2. XML Table")
kable(head(HTMLBooks), caption = "Table 3. HTML Table")
class(JSONBooks)
## [1] "list"
class(XMLBooks)
## [1] "data.frame"
class(HTMLBooks)
## [1] "list"
library(knitr)
kable(head(JSONBooks), caption = "Table 1. JSON Table")
kable(head(XMLBooks), caption = "Table 2. XML Table")
id | author | title | genre | year | Language |
---|---|---|---|---|---|
1 | Aldous Huxley | Brave New World | Science Fiction | 1931 | English |
2 | Stephen King, Peter Straub | The Talisman | Dark Fantasy | 1984 | English |
3 | Isaac Asimov | I, Robot | Science Fiction | 1950 | English |
kable(head(HTMLBooks), caption = "Table 3. HTML Table")
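The "list" class reported for JSONBooks is what fromJSON() returns when the records sit under a top-level key rather than in a bare array. The sketch below illustrates this under an assumption: the layout of json_text, including the "books" key, is hypothetical, and the real structure of CesarBooks.json may differ.

```r
library(jsonlite)

# Hypothetical JSON with a top-level key; the real file layout may differ
json_text <- '{"books": [
  {"id": 1, "title": "Brave New World", "year": 1931},
  {"id": 2, "title": "The Talisman", "year": 1984}
]}'

parsed <- fromJSON(json_text, flatten = TRUE)
class(parsed)           # "list": the wrapper object around the records

books <- parsed$books   # the array of records is simplified to a data frame
class(books)            # "data.frame"
```

If the real file follows this shape, names(JSONBooks) would show which element holds the table.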
Are they identical?
identical(JSONBooks, XMLBooks)
## [1] FALSE
identical(JSONBooks, HTMLBooks)
## [1] FALSE
identical(XMLBooks, HTMLBooks)
## [1] FALSE
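None of these objects are identical. JSONBooks and HTMLBooks are lists while XMLBooks is a data frame, and identical() returns TRUE only when two objects match exactly in type, structure, and values.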
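Even two data frames holding the same books can fail identical() when their column types differ, for example when one import yields text and another yields numbers. The sketch below shows that effect and one possible way to align the types using base R's type.convert(). The data frames from_xml and from_json are hypothetical stand-ins, not the imported objects, and the assumption that the XML import yields character columns is illustrative.

```r
# Hypothetical stand-ins for two imports of the same table (not the real files)
from_xml <- data.frame(
  id = c("1", "2"),
  year = c("1931", "1984"),
  stringsAsFactors = FALSE
)
from_json <- data.frame(
  id = c(1L, 2L),
  year = c(1931L, 1984L)
)

# Same values, but character columns versus integer columns
identical(from_xml, from_json)   # FALSE

# Convert character columns to their natural types, then compare again
from_xml_typed <- type.convert(from_xml, as.is = TRUE)
all.equal(from_xml_typed, from_json)
```

all.equal() is used for the second comparison because it reports which columns still differ instead of returning a bare FALSE.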