============================================================
Libraries required
library(XML)
library(rjson)
library(plyr)
library(dplyr)
============================================================
XML File
# Using the XML library. The file could not be hosted on GitHub, so it is read from another server.
url <- "http://delineator.org/storage/travelogues.xml"
xml <- xmlParse(file = url)
xml2DF <- xmlToDataFrame(xml)
xml2DF
## author coauthor genre location subgenre
## 1 Meriwether Lewis William Clark Adventure North America Travel
## 2 Paul Theroux Travel Oceania Adventure
## 3 Alfred Lansing History Antarctica Adventure
## title year
## 1 The Definitive Journals of Lewis and Clark 2002
## 2 The Happy Isles of Oceania 1992
## 3 Endurance- Shackleton's Incredible Voyage 2015
============================================================
JSON File
The conversion function below was adapted from a Stack Overflow answer.
#using rjson library
json <- fromJSON(file = "https://raw.githubusercontent.com/RobertSellers/R/master/data/IS607_Homework8/travelogues.json")
# Function adapted from a Stack Overflow answer; rbind.fill() below comes from plyr.
json <- lapply(json, function(j) {
  # Replace nested list elements with NA so each record flattens to a single row
  as.data.frame(replace(j, sapply(j, is.list), NA))
})
json2DF <- rbind.fill(json)
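To make the replace step concrete, here is a minimal sketch in base R. The two records are hypothetical stand-ins for parsed JSON (they are not the contents of travelogues.json), and `flatten_record` is an illustrative name rather than anything from the original code.

```r
# Hypothetical records shaped like parsed JSON; the second has a nested, empty list.
records <- list(
  list(title = "Book A", coauthor = "Someone", year = 2002),
  list(title = "Book B", coauthor = list(),    year = 1992)
)

flatten_record <- function(j) {
  # sapply(j, is.list) flags nested elements; replace() swaps each of them for NA
  as.data.frame(replace(j, sapply(j, is.list), NA), stringsAsFactors = FALSE)
}

rows  <- lapply(records, flatten_record)
# Base rbind is enough here because both records share the same fields;
# plyr::rbind.fill additionally pads fields that are missing from some records.
books <- do.call(rbind, rows)
print(books)
```

Without the replace step, the empty nested list would give a zero-length column, and as.data.frame() would fail on the mismatched lengths.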
============================================================
HTML File
# Using the XML library again; this file is also hosted outside GitHub.
url <- "http://delineator.org/storage/travelogues.html"
# readHTMLTable() returns a list of tables, one entry per table on the page
html <- readHTMLTable(url)
html2DF <- as.data.frame.list(html)
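The "NULL." prefix that appears on the HTML column names in the comparison most likely comes from flattening the whole list of tables rather than extracting the single table inside it. The sketch below reproduces the effect in base R with a hand-built list. The list name "NULL" is an assumption meant to mirror what readHTMLTable() appears to assign to a table without an id, and the two-column table is a toy subset of the real data.

```r
# Stand-in for readHTMLTable() output: a list holding one table named "NULL"
# (assumed to mirror an HTML table that has no id attribute).
tables <- list("NULL" = data.frame(
  Title = c("The Definitive Journals of Lewis and Clark", "The Happy Isles of Oceania"),
  Year  = c("2002", "1992")
))

prefixed <- as.data.frame.list(tables)  # flattens the list; the list name is prepended
plain    <- tables[[1]]                 # extracts the table itself; names are unchanged

print(names(prefixed))
print(names(plain))
```

Indexing with `[[1]]` is only appropriate when the page holds a single table of interest, as seems to be the case here.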
============================================================
Data Comparison
glimpse(html2DF)
## Observations: 3
## Variables: 7
## $ NULL.Title (fctr) The Definitive Journals of Lewis and Clark, The...
## $ NULL.Author (fctr) Meriwether Lewis, Paul Theroux, Alfred Lansing
## $ NULL.Coauthor (fctr) William Clark, ,
## $ NULL.Year (fctr) 2002, 1992, 2015
## $ NULL.Genre (fctr) Adventure, Travel, History
## $ NULL.Subgenre (fctr) Travel, Adventure, Adventure
## $ NULL.Location (fctr) North America, Oceania, Antarctica
glimpse(json2DF)
## Observations: 3
## Variables: 7
## $ title (fctr) The Definitive Journals of Lewis and Clark, The Happ...
## $ author (fctr) Meriwether Lewis, Paul Theroux, Alfred Lansing
## $ coauthor (fctr) William Clark, ,
## $ year (fctr) 2002, 1992, 2015
## $ genre (fctr) Adventure, Travel, History
## $ subgenre (fctr) Travel, Adventure, Adventure
## $ location (fctr) North America, Oceania, Antarctica
glimpse(xml2DF)
## Observations: 3
## Variables: 7
## $ author (fctr) Meriwether Lewis, Paul Theroux, Alfred Lansing
## $ coauthor (fctr) William Clark, ,
## $ genre (fctr) Adventure, Travel, History
## $ location (fctr) North America, Oceania, Antarctica
## $ subgenre (fctr) Travel, Adventure, Adventure
## $ title (fctr) The Definitive Journals of Lewis and Clark, The Happ...
## $ year (fctr) 2002, 1992, 2015
============================================================
The three imports differ in column order, and the HTML import carries a “NULL.” prefix on every column name. Each format needs its own loading mechanism, and the follow-up transformation, not least setting the data types, also differs between file types. Further refinement would be needed to make the three data frames identical.
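One possible shape for that refinement is sketched below in base R. The data frames are two-row, three-column toy stand-ins built from values in the output above, and `normalize` is a hypothetical helper, not part of the original analysis. It strips the prefix, lowercases the names, fixes the column order, and converts factors to plain types.

```r
# Toy stand-ins for two of the imports: same data, different names, order, and factor columns.
html_df <- data.frame(
  NULL.Title  = factor(c("The Definitive Journals of Lewis and Clark", "The Happy Isles of Oceania")),
  NULL.Author = factor(c("Meriwether Lewis", "Paul Theroux")),
  NULL.Year   = factor(c("2002", "1992"))
)
xml_df <- data.frame(
  author = factor(c("Meriwether Lewis", "Paul Theroux")),
  title  = factor(c("The Definitive Journals of Lewis and Clark", "The Happy Isles of Oceania")),
  year   = factor(c("2002", "1992"))
)

# Hypothetical helper: bring a data frame to a common name set, column order, and types.
normalize <- function(df, cols = c("title", "author", "year")) {
  names(df) <- tolower(sub("^NULL\\.", "", names(df)))  # drop the prefix, lowercase names
  df <- df[, cols]                                       # impose one column order
  df[] <- lapply(df, as.character)                       # factors -> character
  df$year <- as.integer(df$year)                         # year as integer, not text
  df
}

same <- identical(normalize(html_df), normalize(xml_df))
print(same)
```

Converting through as.character() before as.integer() matters for factors, since as.integer() on a factor returns the internal level codes rather than the printed values.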
============================================================