Create three data files that store the same information in three different formats: an HTML table, XML, and JSON. Load the data from the three files into separate R data frames. Are the three data frames identical?
First, we load the packages that we will be using.
library(XML)
## Warning: package 'XML' was built under R version 3.2.2
library(plyr)
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.2.2
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:utils':
##
## View
We will start with the HTML table.
html_info_source <- "https://raw.githubusercontent.com/karenweigandt/IS607/master/FairyTales.html" ## Get the source file
html_info <- readLines(con = html_info_source) ## Read the info into a variable
## Warning in readLines(con = html_info_source): incomplete final line
## found on 'https://raw.githubusercontent.com/karenweigandt/IS607/master/
## FairyTales.html'
html_info_2 <- readHTMLTable(html_info, stringsAsFactors = FALSE) ## extract the table info
html_info_df <- as.data.frame(html_info_2) ## convert list to data frame
html_info_df
##                                        NULL.Book
## 1                    Grimms Complete Fairy Tales
## 2 Hans Christian Andersen's Complete Fairy Tales
## 3                              Tikki Tikki Tembo
##                  NULL.Author      NULL.Publisher NULL.Pages
## 1 Jacob Grimm, Wilhelm Grimm Canterbury Classics        652
## 2    Hans Christian Andersen Canterbury Classics        784
## 3               Arlene Mosel         Square Fish         48
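The `NULL.` prefix on every column name presumably comes from converting the whole list returned by `readHTMLTable()` with `as.data.frame()`: the single table sits in the list under the name `NULL`, and that name gets prepended to its columns. A minimal sketch of one way around this, using a hand-built stand-in (`tables_like` is hypothetical, with values copied from the output above), is to take the first element of the list instead:

```r
# Hypothetical stand-in for the list that readHTMLTable() returned above:
# a single table stored under the name "NULL". Values copied from the output.
tables_like <- list(
  "NULL" = data.frame(
    Book = c("Grimms Complete Fairy Tales",
             "Hans Christian Andersen's Complete Fairy Tales",
             "Tikki Tikki Tembo"),
    Author = c("Jacob Grimm, Wilhelm Grimm",
               "Hans Christian Andersen",
               "Arlene Mosel"),
    Publisher = c("Canterbury Classics", "Canterbury Classics", "Square Fish"),
    Pages = c("652", "784", "48"),
    stringsAsFactors = FALSE
  )
)

# Converting the whole list prefixes each column with the element name.
prefixed <- as.data.frame(tables_like)
names(prefixed)

# Extracting the first element keeps the table's own column names.
html_clean <- tables_like[[1]]
names(html_clean)
```

With the real data, the equivalent step would be `html_info_2[[1]]`, assuming the page contains only the one table.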
Next, the XML file.
xml_info_source <- "https://raw.githubusercontent.com/karenweigandt/IS607/master/FairyTales.xml" ## Get the source file
xml_info <- readLines(con = xml_info_source) ## Read the info into a variable
## Warning in readLines(con = xml_info_source): incomplete final line
## found on 'https://raw.githubusercontent.com/karenweigandt/IS607/master/
## FairyTales.xml'
xml_info_2 <- xmlToList(xml_info) ## Parse into a list so that a book with 2 authors keeps both author entries
xml_info_df <- ldply(xml_info_2, data.frame) ## Convert the parsed list to a data frame (reuses xml_info_2 instead of parsing twice)
xml_info_df
##    .id                                           name
## 1 book                    Grimms Complete Fairy Tales
## 2 book                    Grimms Complete Fairy Tales
## 3 book Hans Christian Andersen's Complete Fairy Tales
## 4 book                              Tikki Tikki Tembo
##                    author           publisher pages .attrs
## 1             Jacob Grimm Canterbury Classics   652      1
## 2           Wilhelm Grimm Canterbury Classics   652      1
## 3 Hans Christian Andersen Canterbury Classics   784      2
## 4            Arlene Mosel         Square Fish    48      3
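The XML route yields one row per author, so the Grimm book appears twice. If a one-row-per-book layout like the HTML table is wanted, the author rows can be collapsed. The sketch below is one possible approach using base R's `aggregate()` on a hand-built stand-in (`xml_like` is hypothetical; its values are copied from the output above, and storing `pages` as text is an assumption):

```r
# Hypothetical stand-in for xml_info_df, without the .id and .attrs columns.
xml_like <- data.frame(
  name = c("Grimms Complete Fairy Tales",
           "Grimms Complete Fairy Tales",
           "Hans Christian Andersen's Complete Fairy Tales",
           "Tikki Tikki Tembo"),
  author = c("Jacob Grimm", "Wilhelm Grimm",
             "Hans Christian Andersen", "Arlene Mosel"),
  publisher = c("Canterbury Classics", "Canterbury Classics",
                "Canterbury Classics", "Square Fish"),
  pages = c("652", "652", "784", "48"),  # assumed to be text
  stringsAsFactors = FALSE
)

# Group by the book-level columns and paste each book's authors together.
xml_collapsed <- aggregate(
  author ~ name + publisher + pages,
  data = xml_like,
  FUN = function(a) paste(a, collapse = ", ")
)
xml_collapsed
```

Note that `aggregate()` reorders the rows by the grouping columns, so the row order may differ from the other two data frames.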
And last, the JSON file.
json_info_source <- fromJSON("https://raw.githubusercontent.com/karenweigandt/IS607/master/FairyTales.json") ## Download and parse the JSON file
json_info_df <- data.frame(json_info_source, stringsAsFactors = FALSE) ## Convert the parsed result to a data frame
json_info_df
##                        Fairy.and.Folk.Tales.book
## 1                    Grimms Complete Fairy Tales
## 2 Hans Christian Andersen's Complete Fairy Tales
## 3                              Tikki Tikki Tembo
##   Fairy.and.Folk.Tales.author Fairy.and.Folk.Tales.publisher
## 1  Jacob Grimm, Wilhelm Grimm            Canterbury Classics
## 2     Hans Christian Andersen            Canterbury Classics
## 3                Arlene Mosel                    Square Fish
##   Fairy.and.Folk.Tales.pages
## 1                        652
## 2                        784
## 3                         48
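The JSON column names carry a long `Fairy.and.Folk.Tales.` prefix, which looks like the top-level key of the file. A small sketch of stripping that prefix with `sub()`, shown on a hypothetical two-column stand-in (`json_like`) rather than the real download:

```r
# Hypothetical stand-in for json_info_df with two of its columns.
json_like <- data.frame(
  "Fairy.and.Folk.Tales.book" = c("Grimms Complete Fairy Tales",
                                  "Tikki Tikki Tembo"),
  "Fairy.and.Folk.Tales.author" = c("Jacob Grimm, Wilhelm Grimm",
                                    "Arlene Mosel"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Remove the shared prefix; the dots are escaped so they match literally.
names(json_like) <- sub("^Fairy\\.and\\.Folk\\.Tales\\.", "", names(json_like))
names(json_like)
```

The same call on `names(json_info_df)` should leave `book`, `author`, `publisher` and `pages`, assuming every column shares that prefix as the output suggests.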
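One thing I noticed with JSON is that the choice of package makes a huge difference to the structure of the result. The "incomplete final line" warnings for the HTML and XML files come from readLines(), which warns when a file does not end with a newline character; the data is still read in full.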
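The "incomplete final line" warning can be reproduced locally. The following is a minimal sketch, assuming the cause is a missing newline at the end of the file; the temporary file and its contents are made up for illustration:

```r
# Write two lines with no newline after the last one.
tmp <- tempfile(fileext = ".txt")
cat("first line\nlast line", file = tmp)

# Read the file, collecting any warnings instead of printing them.
msgs <- character(0)
txt <- withCallingHandlers(
  readLines(tmp),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
txt   # both lines are still returned
msgs  # the warning text that readLines() raised

# warn = FALSE silences the warning when it is known to be harmless.
quiet <- readLines(tmp, warn = FALSE)
unlink(tmp)
```

Both lines come back either way, so the warning does not indicate lost data.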
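The three data frames are definitely not identical. The HTML and JSON versions have three rows, with both Grimm authors in a single cell, while the XML version has four rows (one per author) plus extra .id and .attrs columns. The column names also differ in all three.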
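The comparison can also be made programmatically rather than by eye. The sketch below uses `identical()` and `all.equal()` on two small hypothetical frames (`frame_a`, `frame_b`) that differ only in the type of `pages`; whether the real data frames differ in this particular way is an assumption, not something the output above confirms:

```r
# Two hypothetical frames with the same values but different column types.
frame_a <- data.frame(book = "Tikki Tikki Tembo", pages = "48",
                      stringsAsFactors = FALSE)
frame_b <- data.frame(book = "Tikki Tikki Tembo", pages = 48L,
                      stringsAsFactors = FALSE)

# identical() is strict: names, types and values must all match.
same_before <- identical(frame_a, frame_b)
same_before

# all.equal() returns a description of the differences instead of FALSE.
all.equal(frame_a, frame_b)

# After converting the column to the same type, the frames match.
frame_b$pages <- as.character(frame_b$pages)
same_after <- identical(frame_a, frame_b)
same_after
```

For the three data frames in this exercise, the column names and row counts would need to be aligned first before a check like this could return TRUE.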