Create 3 data files, storing the same information in 3 different formats: an HTML table, XML, and JSON. Load the data from the 3 files into separate R data frames. Are the three data frames identical?

First, we load the packages that we will be using.

library(XML)
## Warning: package 'XML' was built under R version 3.2.2
library(plyr)
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.2.2
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:utils':
## 
##     View

We will start with the HTML table.

html_info_source <- "https://raw.githubusercontent.com/karenweigandt/IS607/master/FairyTales.html" ## Get the source file
html_info <- readLines(con = html_info_source) ## Read the info into a variable
## Warning in readLines(con = html_info_source): incomplete final line
## found on 'https://raw.githubusercontent.com/karenweigandt/IS607/master/
## FairyTales.html'
html_info_2 <- readHTMLTable(html_info, stringsAsFactors = FALSE) ## extract the table info
html_info_df <- as.data.frame(html_info_2) ## convert list to data frame 
html_info_df
##                                        NULL.Book
## 1                    Grimms Complete Fairy Tales
## 2 Hans Christian Andersen's Complete Fairy Tales
## 3                              Tikki Tikki Tembo
##                  NULL.Author      NULL.Publisher NULL.Pages
## 1 Jacob Grimm, Wilhelm Grimm Canterbury Classics        652
## 2    Hans Christian Andersen Canterbury Classics        784
## 3               Arlene Mosel         Square Fish         48
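The NULL. prefix on the column names above seems to come from converting the whole list of tables at once: readHTMLTable() returns a list, and a table without an id appears to be stored under the name "NULL". The sketch below uses a hypothetical stand-in list called tables (not the real download, and with only two columns) to show the effect and one possible way around it, which is to extract the first list element instead of calling as.data.frame() on the list.

```r
# Hypothetical stand-in for the list returned by readHTMLTable():
# one data frame stored under the name "NULL" (an assumption based on
# the column names printed above).
tables <- list("NULL" = data.frame(
  Book  = c("Grimms Complete Fairy Tales", "Tikki Tikki Tembo"),
  Pages = c("652", "48"),
  stringsAsFactors = FALSE
))

# Converting the whole list prefixes every column with the element name.
prefixed <- as.data.frame(tables)
names(prefixed)

# Pulling out the first element keeps the original column names.
clean <- tables[[1]]
names(clean)

# The page counts arrive as text; convert them if numbers are needed.
clean$Pages <- as.integer(clean$Pages)
str(clean)
```

With the real data, the same idea would be html_info_2[[1]] in place of as.data.frame(html_info_2).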

Next, the XML file.

xml_info_source <- "https://raw.githubusercontent.com/karenweigandt/IS607/master/FairyTales.xml" ## Get the source file
xml_info <- readLines(con = xml_info_source) ## Read the info into a variable
## Warning in readLines(con = xml_info_source): incomplete final line
## found on 'https://raw.githubusercontent.com/karenweigandt/IS607/master/
## FairyTales.xml'
xml_info_2 <- xmlToList(xml_info) ## Read the info into a list variable to maintain the list of 2 author names
xml_info_df <- ldply(xml_info_2, data.frame) ## convert the parsed list to a data frame
xml_info_df
##    .id                                           name
## 1 book                    Grimms Complete Fairy Tales
## 2 book                    Grimms Complete Fairy Tales
## 3 book Hans Christian Andersen's Complete Fairy Tales
## 4 book                              Tikki Tikki Tembo
##                    author           publisher pages .attrs
## 1             Jacob Grimm Canterbury Classics   652      1
## 2           Wilhelm Grimm Canterbury Classics   652      1
## 3 Hans Christian Andersen Canterbury Classics   784      2
## 4            Arlene Mosel         Square Fish    48      3
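The XML version ends up with one row per author, so the Grimm book appears twice. If one row per book is wanted, the author rows can be collapsed again. The sketch below is only an illustration: xml_df is a hand-typed stand-in for xml_info_df (values copied from the output above, with the .id and .attrs columns left out and pages kept as text), and aggregate() from base R is just one of several ways to do this.

```r
# Stand-in for xml_info_df, limited to the columns needed here.
xml_df <- data.frame(
  name      = c("Grimms Complete Fairy Tales",
                "Grimms Complete Fairy Tales",
                "Hans Christian Andersen's Complete Fairy Tales",
                "Tikki Tikki Tembo"),
  author    = c("Jacob Grimm", "Wilhelm Grimm",
                "Hans Christian Andersen", "Arlene Mosel"),
  publisher = c("Canterbury Classics", "Canterbury Classics",
                "Canterbury Classics", "Square Fish"),
  pages     = c("652", "652", "784", "48"),
  stringsAsFactors = FALSE
)

# Collapse to one row per book, joining multiple authors with ", ".
one_row_per_book <- aggregate(
  xml_df["author"],
  by  = xml_df[c("name", "publisher", "pages")],
  FUN = paste, collapse = ", "
)
one_row_per_book
```

Note that aggregate() orders the result by the grouping columns, so the rows may not come back in the original order.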

And last, the JSON file.

json_info_source <- fromJSON("https://raw.githubusercontent.com/karenweigandt/IS607/master/FairyTales.json") ## Download and parse the JSON file
json_info_df <- data.frame(json_info_source, stringsAsFactors = FALSE)  ## make a dataframe
json_info_df
##                        Fairy.and.Folk.Tales.book
## 1                    Grimms Complete Fairy Tales
## 2 Hans Christian Andersen's Complete Fairy Tales
## 3                              Tikki Tikki Tembo
##   Fairy.and.Folk.Tales.author Fairy.and.Folk.Tales.publisher
## 1  Jacob Grimm, Wilhelm Grimm            Canterbury Classics
## 2     Hans Christian Andersen            Canterbury Classics
## 3                Arlene Mosel                    Square Fish
##   Fairy.and.Folk.Tales.pages
## 1                        652
## 2                        784
## 3                         48

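One thing I noticed with JSON is that the choice of parsing package makes a big difference to the structure you get back. As for the "incomplete final line" warnings on the HTML and XML files, they only mean that those files do not end with a newline character; readLines() still reads every line.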
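The "incomplete final line" warning can be reproduced locally with a small temporary file, which suggests it only signals a missing newline at the end of the file. The file name and contents below are made up for the illustration.

```r
# Write a small file whose last line has no trailing newline.
# cat() does not add a newline on its own.
tmp <- tempfile(fileext = ".txt")
cat("first line\nsecond line", file = tmp)

# readLines() warns about the incomplete final line,
# but still returns both lines. Here the warning is printed as a message.
lines_read <- withCallingHandlers(
  readLines(tmp),
  warning = function(w) {
    message("Caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
lines_read

# warn = FALSE reads the same lines without the warning.
quiet_lines <- readLines(tmp, warn = FALSE)
identical(lines_read, quiet_lines)
```

So for the HTML and XML downloads, passing warn = FALSE to readLines() (or adding a final newline to the source files) should be enough to silence the message.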

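The 3 data frames are definitely not identical. The column names differ in all three (the HTML version carries a NULL. prefix and the JSON version a Fairy.and.Folk.Tales. prefix), and the XML version has 4 rows instead of 3, because the two Grimm authors are split into separate rows, plus extra .id and .attrs columns.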
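The comparison can also be made explicit with identical(). The sketch below uses hand-typed stand-ins for the HTML and JSON data frames (values and column names copied from the output above; storing the page counts as text is an assumption) together with a hypothetical helper, harmonize(), that strips the prefixes, lower-cases the names, and converts the page counts to integers.

```r
# Stand-ins for html_info_df and json_info_df, typed in from the output above.
books      <- c("Grimms Complete Fairy Tales",
                "Hans Christian Andersen's Complete Fairy Tales",
                "Tikki Tikki Tembo")
authors    <- c("Jacob Grimm, Wilhelm Grimm", "Hans Christian Andersen", "Arlene Mosel")
publishers <- c("Canterbury Classics", "Canterbury Classics", "Square Fish")
pages      <- c("652", "784", "48")  # assumed to be stored as text

html_df <- data.frame(NULL.Book = books, NULL.Author = authors,
                      NULL.Publisher = publishers, NULL.Pages = pages,
                      stringsAsFactors = FALSE)
json_df <- data.frame(Fairy.and.Folk.Tales.book = books,
                      Fairy.and.Folk.Tales.author = authors,
                      Fairy.and.Folk.Tales.publisher = publishers,
                      Fairy.and.Folk.Tales.pages = pages,
                      stringsAsFactors = FALSE)

# As loaded, the two frames differ (at least in their column names).
identical(html_df, json_df)

# Hypothetical helper: keep only the text after the last dot in each
# column name, lower-case it, and store the page counts as integers.
harmonize <- function(df) {
  names(df) <- tolower(sub("^.*\\.", "", names(df)))
  df$pages  <- as.integer(as.character(df$pages))
  df
}

# After harmonizing, the stand-ins hold the same data.
identical(harmonize(html_df), harmonize(json_df))
```

The XML data frame would need its author rows collapsed and its .id and .attrs columns dropped before it could be compared in the same way.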