For this homework, I created a table in Numbers from MacOS and then exported the table as an .CSV file. Then I used the following online conversion tools to load the .CSV file and convert it to to the required formats.
Convert CSV to JSON, XML
http://www.convertcsv.com/csv-to-json.htm http://www.convertcsv.com/csv-to-xml.htm http://www.convertcsv.com/csv-to-html.htm
# Load HTML file
HTML.File <- "/Users/josemawyin/Library/Mobile Documents/com~apple~CloudDocs/Data Science Masters /607/607Week7/FavBooks.htm"
# Parse HTML data
HTML.Table <- readHTMLTable(HTML.File)
The easiest load was from HTM into a data.frame using “readHTMLTable” from the XML package. Right away, you could use the created data.frame for further analysis.
HTML.Table
## $`NULL`
## Title
## 1 Evolution
## 2 The Player of Games
## 3 Eye of Terra
## Authors
## 1 Stephen Baxter
## 2 Iain M. Banks
## 3 Graham McNeill, Aaron Dembski-Bowden, Chris Wraight, Gav Thorpe, Matthew Farrer, Rob Sanders
## Published Date Publisher ISBN Pages
## 1 January 1, 2003 Del Rey 345457838 672
## 2 March 26, 2008 Orbit 316005401 416
## 3 June 27, 2017 Black Library B074R7ZCDF 480
Extracting data from a XML file is a bit more elaborate because we need to specify the XML tag that defines the individual entries. To load the .XML file we use the “xmlParse” function from the XML package.
# Load XML file
XML.File <- "/Users/josemawyin/Library/Mobile Documents/com~apple~CloudDocs/Data Science Masters /607/607Week7/favbooks.xml"
# Parse XML data
xmlfile <- xmlParse(XML.File)
xmlfile
## <?xml version="1.0" encoding="UTF-8"?>
## <root>
## <row>
## <Title>Evolution</Title>
## <Authors>Stephen Baxter</Authors>
## <Published_Date>January 1, 2003</Published_Date>
## <Publisher>Del Rey</Publisher>
## <ISBN>345457838</ISBN>
## <Pages>672</Pages>
## </row>
## <row>
## <Title>The Player of Games</Title>
## <Authors>Iain M. Banks</Authors>
## <Published_Date>March 26, 2008</Published_Date>
## <Publisher>Orbit</Publisher>
## <ISBN>316005401</ISBN>
## <Pages>416</Pages>
## </row>
## <row>
## <Title>Eye of Terra</Title>
## <Authors>Graham McNeill, Aaron Dembski-Bowden, Chris Wraight, Gav Thorpe, Matthew Farrer, Rob Sanders</Authors>
## <Published_Date>June 27, 2017</Published_Date>
## <Publisher>Black Library</Publisher>
## <ISBN>B074R7ZCDF</ISBN>
## <Pages>480</Pages>
## </row>
## </root>
##
Here we define the XML tag that defines a node or entry from the encoded table using the “xmlToDataFrame” function from the XML package.
# Get place nodes
XML.Table <- xmlToDataFrame(nodes=getNodeSet(xmlfile,"//row"))
XML.Table
colnames(XML.Table)
## [1] "Title" "Authors" "Published_Date" "Publisher"
## [5] "ISBN" "Pages"
Working with a .JSON file was the most difficult from all the file types. I used the “fromJSON” function fromt the rJason package to load the file into a table.
#Load JSON file
JSON.File <- "/Users/josemawyin/Library/Mobile Documents/com~apple~CloudDocs/Data Science Masters /607/607Week7/FavBooks.json"
# You can pass directly the filename
my.JSON <- fromJSON(file=JSON.File)
str(my.JSON)
## List of 3
## $ :List of 6
## ..$ Title : chr "Evolution"
## ..$ Authors : chr "Stephen Baxter"
## ..$ Published Date: chr "January 1, 2003"
## ..$ Publisher : chr "Del Rey"
## ..$ ISBN : chr "345457838"
## ..$ Pages : num 672
## $ :List of 6
## ..$ Title : chr "The Player of Games"
## ..$ Authors : chr "Iain M. Banks"
## ..$ Published Date: chr "March 26, 2008"
## ..$ Publisher : chr "Orbit"
## ..$ ISBN : chr "316005401"
## ..$ Pages : num 416
## $ :List of 6
## ..$ Title : chr "Eye of Terra"
## ..$ Authors : chr "Graham McNeill, Aaron Dembski-Bowden, Chris Wraight, Gav Thorpe, Matthew Farrer, Rob Sanders"
## ..$ Published Date: chr "June 27, 2017"
## ..$ Publisher : chr "Black Library"
## ..$ ISBN : chr "B074R7ZCDF"
## ..$ Pages : num 480
The “fromJSON” function creates a list of list that now we need to parse through to create a data.frame. This we do below through the function “lapply”.
df <- lapply(my.JSON, function(play) # Loop through each "play"
{
# Convert each group to a data frame.
# This assumes you have 6 elements each time
data.frame(matrix(unlist(play), ncol=6, byrow=T))
})
# Now you have a list of data frames, connect them together in
# one single dataframe
df <- do.call(rbind, df)
# Make column names nicer, remove row names
#colnames(df) <- names(my.JSON[[1]][[1]])
colnames(df) <- colnames(XML.Table)
rownames(df) <- NULL
df
We have shown how loaded data from 3 different formats (HTML, XML and JSON) and successfully loaded the data into a data.frame. The conversion was easier with the HML and XML formats. However, JSON encapsulation of data into a string makes it suitable as a vehicle for tabulated data as well.