A regular CSV import, for comparison
dfcsv <- read.csv("https://raw.githubusercontent.com/davidblumenstiel/CUNY-MSDS-Data-607/master/Homework%207/books.csv", stringsAsFactors = FALSE)
dfcsv
## Title
## 1 The_Elements:_A_Visual_Exploration_of_Every_Known_Atom_in_the_Universe
## 2 Superheavy:_Making_and_Breaking_the_Periodic_Table
## 3 Oxygen:_The_Molecule_that_Made_the_World
## Authors Amazon_Price Date_Published
## 1 Theodore_Gray,_Nick_Mann 15.99 4/3/2012
## 2 Kit_Chapman 11.99 6/13/2019
## 3 Nick_Lane 12.62 7/1/2016
This uses the RCurl library to retrieve the contents of the URL, and the XML library for its convenient to-dataframe function
library("RCurl")
library("XML")
xmlurl <- getURL("https://raw.githubusercontent.com/davidblumenstiel/CUNY-MSDS-Data-607/master/Homework%207/books.xml")
dfxml <- xmlToDataFrame(xmlurl)
dfxml
## Title
## 1 The_Elements:_A_Visual_Exploration_of_Every_Known_Atom_in_the_Universe
## 2 Superheavy:_Making_and_Breaking_the_Periodic_Table
## 3 Oxygen:_The_Molecule_that_Made_the_World
## Authors Amazon_Price Date_Published
## 1 Theodore_Gray,_Nick_Mann 15.99 4/3/2012
## 2 Kit_Chapman 11.99 6/13/2019
## 3 Nick_Lane 12.62 7/1/2016
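To make the XML-to-dataframe step more concrete, here is a minimal sketch that runs xmlToDataFrame on an inline XML string instead of the downloaded file. The books and book element names are assumptions for illustration, since the layout of books.xml is not shown here; only the four field names come from the output above. stringsAsFactors = FALSE is passed explicitly so that the columns arrive as plain text.

```r
library("XML")

# Hypothetical XML layout: the <books>/<book> element names are assumed,
# only the four field names are taken from the printed output.
xml_text <- '<books>
  <book>
    <Title>Superheavy:_Making_and_Breaking_the_Periodic_Table</Title>
    <Authors>Kit_Chapman</Authors>
    <Amazon_Price>11.99</Amazon_Price>
    <Date_Published>6/13/2019</Date_Published>
  </book>
  <book>
    <Title>Oxygen:_The_Molecule_that_Made_the_World</Title>
    <Authors>Nick_Lane</Authors>
    <Amazon_Price>12.62</Amazon_Price>
    <Date_Published>7/1/2016</Date_Published>
  </book>
</books>'

# Parse the string, then turn each child of the root into one row
doc <- xmlParse(xml_text, asText = TRUE)
df_sketch <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# XML carries no type information, so every column is text at this point
sapply(df_sketch, class)

# Convert the price column to numeric explicitly
df_sketch$Amazon_Price <- as.numeric(df_sketch$Amazon_Price)
str(df_sketch)
```

Because every value starts out as text, any numeric or date column has to be converted by hand after the import.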
A few libraries were needed to read this into a dataframe simply. The result of html_table comes out as a list of two, so only the first element (the data) is kept as the dataframe
library("textreadr")
library("rvest")
## Loading required package: xml2
##
## Attaching package: 'xml2'
## The following object is masked from 'package:textreadr':
##
## read_html
##
## Attaching package: 'rvest'
## The following object is masked from 'package:XML':
##
## xml
library("xml2")
dfhtml <- read_html('https://raw.githubusercontent.com/davidblumenstiel/CUNY-MSDS-Data-607/master/Homework%207/books.htm')
dfhtml <- html_table(dfhtml, dec = ".")[[1]]
dfhtml
## Title
## 1 The_Elements:_A_Visual_Exploration_of_Every_Known_Atom_in_the_Universe
## 2 Superheavy:_Making_and_Breaking_the_Periodic_Table
## 3 Oxygen:_The_Molecule_that_Made_the_World
## Authors Amazon_Price Date_Published
## 1 Theodore_Gray,_Nick_Mann 15.99 4/3/2012
## 2 Kit_Chapman 11.99 6/13/2019
## 3 Nick_Lane 12.62 7/1/2016
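The list-then-subset step can be sketched on a small inline page. The HTML below is made up for illustration (two tables with shortened values), since the actual markup of books.htm is not shown here. It only demonstrates that html_table applied to a whole document yields one list element per table, which is why [[1]] is needed.

```r
library("xml2")
library("rvest")

# Hypothetical page with two tables; not the real books.htm markup
html_text <- '<html><body>
  <table>
    <tr><th>Title</th><th>Amazon_Price</th></tr>
    <tr><td>Superheavy</td><td>11.99</td></tr>
    <tr><td>Oxygen</td><td>12.62</td></tr>
  </table>
  <table>
    <tr><th>Note</th></tr>
    <tr><td>layout only</td></tr>
  </table>
</body></html>'

# read_html() accepts literal markup as well as a URL or file path
page <- read_html(html_text)

# html_table() on a whole document returns one element per <table>
tables <- html_table(page)
length(tables)

# Keep only the first table, as in the import above
books <- tables[[1]]
books
```

Selecting the table by position is simple but fragile; if the page layout changes, the index may need to change with it.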
Used the jsonlite library for an easy import. Originally, the file was read as a long list, but setting ‘simplifyVector’ to TRUE reduced it to a dataframe
library(jsonlite)
dfjson <- read_json("https://raw.githubusercontent.com/davidblumenstiel/CUNY-MSDS-Data-607/master/Homework%207/books.json", simplifyVector = TRUE)
dfjson
## Title
## 1 The_Elements:_A_Visual_Exploration_of_Every_Known_Atom_in_the_Universe
## 2 Superheavy:_Making_and_Breaking_the_Periodic_Table
## 3 Oxygen:_The_Molecule_that_Made_the_World
## Authors Amazon_Price Date_Published
## 1 Theodore_Gray,_Nick_Mann 15.99 4/3/2012
## 2 Kit_Chapman 11.99 6/13/2019
## 3 Nick_Lane 12.62 7/1/2016
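The effect of ‘simplifyVector’ can be seen on a small inline example. The JSON below is an assumed layout (an array of objects with shortened values); the real books.json may be structured differently. fromJSON is used here only because it accepts a JSON string directly; unlike read_json, it simplifies by default, so the argument is spelled out in both calls.

```r
library(jsonlite)

# Hypothetical JSON layout (an array of objects); the real file may differ
json_text <- '[
  {"Title": "Superheavy", "Amazon_Price": 11.99},
  {"Title": "Oxygen", "Amazon_Price": 12.62}
]'

# Without simplification: a list holding one named list per record
as_list <- fromJSON(json_text, simplifyVector = FALSE)
str(as_list)

# With simplification: records with matching keys collapse into a data frame
as_df <- fromJSON(json_text, simplifyVector = TRUE)
as_df
```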
All of these are pretty much identical. The only difference I can spot is that the columns from the XML file all came across as factors.
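One way to check for this kind of difference is to compare column classes directly. The sketch below uses two small stand-in dataframes with shortened values rather than the imports themselves, so the names df_chr, df_fct, and df_fixed are hypothetical. Whether an import produces factors at all may depend on the R and package versions in use.

```r
# Stand-ins for two of the imports (values shortened for illustration)
df_chr <- data.frame(
  Title = c("Superheavy", "Oxygen"),
  Amazon_Price = c(11.99, 12.62),
  stringsAsFactors = FALSE
)
df_fct <- data.frame(
  Title = factor(c("Superheavy", "Oxygen")),
  Amazon_Price = factor(c("11.99", "12.62"))
)

# Compare the column classes side by side
classes <- rbind(chr = sapply(df_chr, class), fct = sapply(df_fct, class))
classes

# Convert factors back through character; as.numeric() directly on a factor
# would return the internal level codes instead of the printed values
df_fixed <- df_fct
df_fixed$Title <- as.character(df_fixed$Title)
df_fixed$Amazon_Price <- as.numeric(as.character(df_fixed$Amazon_Price))

# After conversion the two dataframes should match
all.equal(df_chr, df_fixed)
```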