I have always been fascinated by leadership, so I recorded four classic books from my shelf in a file. I wrote HTML, XML, and JSON versions of that file and uploaded them to GitHub. I then loaded the required packages.
require(XML)
require(RCurl)
require(RJSONIO)
The tricky part was getting R to read the HTML file hosted on GitHub. I solved it with the getURL function from RCurl, which returns the raw file contents as a single string. From there, the readHTMLTable function from XML made the rest straightforward.
testhtml <- getURL("https://raw.githubusercontent.com/jeffnieman11/Data607_HW8/master/books.html")
testhtml
## [1] "\n\n<!DOCTYPE html>\n<html>\n <head>\n <title>books</title>\n </head>\n <body> \n <h1>Books On Leadership</h1>\n <table>\n <tbody>\n <tr>\n <th>Title</th>\n <th>Author1</th>\n <th>Author2</th>\n <th>Edition</th>\n <th>Pages</th>\n <th>Year</th>\n <th>Publisher</th>\n </tr>\n <tr>\n <td>On Becoming a Leader</td>\n <td>Warren Bennis</td>\n <td>NA</td>\n <td>1</td>\n <td>226</td>\n <td>1994</td>\n <td>Addison-Wesley</td>\n </tr>\n <tr>\t\n <td>Leaders</td>\n <td>Warren Bennis</td>\n <td>Burt Nannus</td>\n <td>2</td>\n <td>235</td>\n <td>1997</td>\n <td>Harper Business</td>\n </tr>\n <tr>\n <td>The Leadership Challenge</td>\n <td>James Kouzes</td>\n <td>Barry Postner</td>\n <td>3</td>\n <td>458</td>\n <td>2002</td>\n <td>Jossey-Bass</td>\n </tr>\n <tr>\n <td>Leading Change</td>\n <td>John Kotter</td>\n <td>NA</td>\n <td>1</td>\n <td>186</td>\n <td>1996</td>\n <td>Harvard Business School Press</td>\n </tr>\n </tbody>\n </table>\n </body>\n</html>\n"
bookshtml <- readHTMLTable(testhtml, which = 1, header = TRUE, stringsAsFactors = FALSE)
bookshtml
## Title Author1 Author2 Edition Pages Year
## 1 On Becoming a Leader Warren Bennis NA 1 226 1994
## 2 Leaders Warren Bennis Burt Nannus 2 235 1997
## 3 The Leadership Challenge James Kouzes Barry Postner 3 458 2002
## 4 Leading Change John Kotter NA 1 186 1996
## Publisher
## 1 Addison-Wesley
## 2 Harper Business
## 3 Jossey-Bass
## 4 Harvard Business School Press
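The printed table does not show column types. As far as I can tell, readHTMLTable called this way leaves every cell as text, so Pages and Year would be character columns and the NA in Author2 would be the literal string "NA" rather than a missing value. The following is a minimal sketch of cleaning that up, assuming a two-row excerpt of the table inlined as a string (the names html_text, doc, and books are mine, not part of the original code) so that it runs without network access.

```r
library(XML)

# Two-row excerpt of the books table, inlined so the sketch needs no network access.
# The <tbody>/<th> layout mirrors the file on GitHub.
html_text <- paste0(
  "<html><body><table><tbody>",
  "<tr><th>Title</th><th>Author2</th><th>Pages</th><th>Year</th></tr>",
  "<tr><td>On Becoming a Leader</td><td>NA</td><td>226</td><td>1994</td></tr>",
  "<tr><td>Leaders</td><td>Burt Nannus</td><td>235</td><td>1997</td></tr>",
  "</tbody></table></body></html>"
)

# Parse the string explicitly as HTML content, then read the first table.
doc <- htmlParse(html_text, asText = TRUE)
books <- readHTMLTable(doc, which = 1, header = TRUE, stringsAsFactors = FALSE)

# Convert the numeric-looking columns explicitly.
books$Pages <- as.integer(books$Pages)
books$Year  <- as.integer(books$Year)

# Replace the literal text "NA" with a real missing value.
# %in% never returns NA, so this is safe even if a cell is already missing.
books$Author2[books$Author2 %in% "NA"] <- NA

str(books)
```

After this step, functions such as mean(books$Pages) or is.na(books$Author2) behave as expected.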
I followed the same process to get the XML file from GitHub into R. I then parsed the string with xmlParse and used the xmlToDataFrame function to create the table. The printed table is identical to the one created from the HTML file above.
testxml <- getURL("https://raw.githubusercontent.com/jeffnieman11/Data607_HW8/master/books.xml")
testxml
## [1] "<U+FEFF><?xml version=\"1.0\" encoding=\"UTF-8\"?> \n<Books_on_Leadership>\n <book id=\"1\">\n <Title>On Becoming a Leader</Title>\n <Author1>Warren Bennis</Author1>\n <Author2>NA</Author2>\n <Edition>1</Edition>\n <Pages>226</Pages>\n <Year>1994</Year>\n <Publisher>Addison-Wesley</Publisher>\n </book>\n <book id=\"2\">\n <Title>Leaders</Title>\n <Author1>Warren Bennis</Author1>\n <Author2>Burt Nannus</Author2>\n <Edition>2</Edition>\n <Pages>235</Pages>\n <Year>1997</Year>\n <Publisher>Harper Business</Publisher>\n </book>\n <book id=\"3\">\n <Title>The Leadership Challenge</Title>\n <Author1>James Kouzes</Author1>\n <Author2>Barry Postner</Author2>\n <Edition>3</Edition>\n <Pages>458</Pages>\n <Year>2002</Year>\n <Publisher>Jossey-Bass</Publisher>\n </book>\n <book id=\"4\">\n <Title>Leading Change</Title>\n <Author1>John Kotter</Author1>\n <Author2>NA</Author2>\n <Edition>1</Edition>\n <Pages>186</Pages>\n <Year>1996</Year>\n <Publisher>Harvard Business School Press</Publisher>\n </book>\n</Books_on_Leadership>\n\n\t"
parsexml <- xmlParse(testxml)
parsexml
## <?xml version="1.0" encoding="UTF-8"?>
## <Books_on_Leadership>
## <book id="1">
## <Title>On Becoming a Leader</Title>
## <Author1>Warren Bennis</Author1>
## <Author2>NA</Author2>
## <Edition>1</Edition>
## <Pages>226</Pages>
## <Year>1994</Year>
## <Publisher>Addison-Wesley</Publisher>
## </book>
## <book id="2">
## <Title>Leaders</Title>
## <Author1>Warren Bennis</Author1>
## <Author2>Burt Nannus</Author2>
## <Edition>2</Edition>
## <Pages>235</Pages>
## <Year>1997</Year>
## <Publisher>Harper Business</Publisher>
## </book>
## <book id="3">
## <Title>The Leadership Challenge</Title>
## <Author1>James Kouzes</Author1>
## <Author2>Barry Postner</Author2>
## <Edition>3</Edition>
## <Pages>458</Pages>
## <Year>2002</Year>
## <Publisher>Jossey-Bass</Publisher>
## </book>
## <book id="4">
## <Title>Leading Change</Title>
## <Author1>John Kotter</Author1>
## <Author2>NA</Author2>
## <Edition>1</Edition>
## <Pages>186</Pages>
## <Year>1996</Year>
## <Publisher>Harvard Business School Press</Publisher>
## </book>
## </Books_on_Leadership>
##
booksxml <- xmlToDataFrame(parsexml)
booksxml
## Title Author1 Author2 Edition Pages Year
## 1 On Becoming a Leader Warren Bennis NA 1 226 1994
## 2 Leaders Warren Bennis Burt Nannus 2 235 1997
## 3 The Leadership Challenge James Kouzes Barry Postner 3 458 2002
## 4 Leading Change John Kotter NA 1 186 1996
## Publisher
## 1 Addison-Wesley
## 2 Harper Business
## 3 Jossey-Bass
## 4 Harvard Business School Press
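Note that the id attribute on each book element does not appear in the table: xmlToDataFrame builds its columns from the child elements only. If the id is needed, it can be read separately. The following is a small sketch of one way to do that, assuming a two-book excerpt of the XML inlined as a string (xml_text, doc, and books are illustrative names).

```r
library(XML)

# Two-book excerpt of the XML file, inlined so the sketch needs no network access.
xml_text <- paste0(
  '<Books_on_Leadership>',
  '<book id="1"><Title>On Becoming a Leader</Title><Pages>226</Pages></book>',
  '<book id="2"><Title>Leaders</Title><Pages>235</Pages></book>',
  '</Books_on_Leadership>'
)

doc <- xmlParse(xml_text, asText = TRUE)
books <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Read the id attribute of every <book> node and attach it as a column.
books$id <- as.integer(xpathSApply(doc, "//book", xmlGetAttr, "id"))

# Element values arrive as text, so convert numeric columns explicitly.
books$Pages <- as.integer(books$Pages)

books
```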
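Once again I followed the same process to get the JSON file from GitHub into R. I used the fromJSON function from the RJSONIO package to parse the text into a nested list, pulled out the “Books on Leadership” element to simplify it, and then used sapply to build the data frame. The result matches the others except for one cell: the JSON file spells the second author of The Leadership Challenge "Barry Pousner", whereas the HTML and XML files have "Barry Postner".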
testjson <- getURL("https://raw.githubusercontent.com/jeffnieman11/Data607_HW8/master/books.json")
testjson
## [1] "{\"Books on Leadership\" :[\n {\n \"Title\" : \"On Becoming a Leader\",\n \"Author1\" : \"Warren Bennis\",\n \"Author2\" : \"NA\",\n \"Edition\" : 1,\n \"Pages\" : 226,\n\t\"Year\" : 1994,\n\t\"Publisher\" : \"Addison-Wesley\"\n },\n {\n \"Title\" : \"Leaders\",\n \"Author1\" : \"Warren Bennis\",\n \"Author2\" : \"Burt Nannus\",\n \"Edition\" : 2,\n \"Pages\" : 235,\n\t\"Year\" : 1997,\n\t\"Publisher\" : \"Harper Business\"\n },\n {\n \"Title\" : \"The Leadership Challenge\",\n \"Author1\" : \"James Kouzes\",\n \"Author2\" : \"Barry Pousner\",\n \"Edition\" : 3,\n \"Pages\" : 458,\n\t\"Year\" : 2002,\n\t\"Publisher\" : \"Jossey-Bass\"\n },\n {\n \"Title\" : \"Leading Change\",\n \"Author1\" : \"John Kotter\",\n \"Author2\" : \"NA\",\n \"Edition\" : 1,\n \"Pages\" : 186,\n\t\"Year\" : 1996,\n\t\"Publisher\" : \"Harvard Business School Press\"\n }]\n\n}\n"
extractjson <- fromJSON(testjson)
extractjson
## $`Books on Leadership`
## $`Books on Leadership`[[1]]
## $`Books on Leadership`[[1]]$Title
## [1] "On Becoming a Leader"
##
## $`Books on Leadership`[[1]]$Author1
## [1] "Warren Bennis"
##
## $`Books on Leadership`[[1]]$Author2
## [1] "NA"
##
## $`Books on Leadership`[[1]]$Edition
## [1] 1
##
## $`Books on Leadership`[[1]]$Pages
## [1] 226
##
## $`Books on Leadership`[[1]]$Year
## [1] 1994
##
## $`Books on Leadership`[[1]]$Publisher
## [1] "Addison-Wesley"
##
##
## $`Books on Leadership`[[2]]
## $`Books on Leadership`[[2]]$Title
## [1] "Leaders"
##
## $`Books on Leadership`[[2]]$Author1
## [1] "Warren Bennis"
##
## $`Books on Leadership`[[2]]$Author2
## [1] "Burt Nannus"
##
## $`Books on Leadership`[[2]]$Edition
## [1] 2
##
## $`Books on Leadership`[[2]]$Pages
## [1] 235
##
## $`Books on Leadership`[[2]]$Year
## [1] 1997
##
## $`Books on Leadership`[[2]]$Publisher
## [1] "Harper Business"
##
##
## $`Books on Leadership`[[3]]
## $`Books on Leadership`[[3]]$Title
## [1] "The Leadership Challenge"
##
## $`Books on Leadership`[[3]]$Author1
## [1] "James Kouzes"
##
## $`Books on Leadership`[[3]]$Author2
## [1] "Barry Pousner"
##
## $`Books on Leadership`[[3]]$Edition
## [1] 3
##
## $`Books on Leadership`[[3]]$Pages
## [1] 458
##
## $`Books on Leadership`[[3]]$Year
## [1] 2002
##
## $`Books on Leadership`[[3]]$Publisher
## [1] "Jossey-Bass"
##
##
## $`Books on Leadership`[[4]]
## $`Books on Leadership`[[4]]$Title
## [1] "Leading Change"
##
## $`Books on Leadership`[[4]]$Author1
## [1] "John Kotter"
##
## $`Books on Leadership`[[4]]$Author2
## [1] "NA"
##
## $`Books on Leadership`[[4]]$Edition
## [1] 1
##
## $`Books on Leadership`[[4]]$Pages
## [1] 186
##
## $`Books on Leadership`[[4]]$Year
## [1] 1996
##
## $`Books on Leadership`[[4]]$Publisher
## [1] "Harvard Business School Press"
nextjson <- extractjson$`Books on Leadership`
nextjson
## [[1]]
## [[1]]$Title
## [1] "On Becoming a Leader"
##
## [[1]]$Author1
## [1] "Warren Bennis"
##
## [[1]]$Author2
## [1] "NA"
##
## [[1]]$Edition
## [1] 1
##
## [[1]]$Pages
## [1] 226
##
## [[1]]$Year
## [1] 1994
##
## [[1]]$Publisher
## [1] "Addison-Wesley"
##
##
## [[2]]
## [[2]]$Title
## [1] "Leaders"
##
## [[2]]$Author1
## [1] "Warren Bennis"
##
## [[2]]$Author2
## [1] "Burt Nannus"
##
## [[2]]$Edition
## [1] 2
##
## [[2]]$Pages
## [1] 235
##
## [[2]]$Year
## [1] 1997
##
## [[2]]$Publisher
## [1] "Harper Business"
##
##
## [[3]]
## [[3]]$Title
## [1] "The Leadership Challenge"
##
## [[3]]$Author1
## [1] "James Kouzes"
##
## [[3]]$Author2
## [1] "Barry Pousner"
##
## [[3]]$Edition
## [1] 3
##
## [[3]]$Pages
## [1] 458
##
## [[3]]$Year
## [1] 2002
##
## [[3]]$Publisher
## [1] "Jossey-Bass"
##
##
## [[4]]
## [[4]]$Title
## [1] "Leading Change"
##
## [[4]]$Author1
## [1] "John Kotter"
##
## [[4]]$Author2
## [1] "NA"
##
## [[4]]$Edition
## [1] 1
##
## [[4]]$Pages
## [1] 186
##
## [[4]]$Year
## [1] 1996
##
## [[4]]$Publisher
## [1] "Harvard Business School Press"
booksjson <- data.frame(t(sapply(nextjson,c)))
booksjson
## Title Author1 Author2 Edition Pages Year
## 1 On Becoming a Leader Warren Bennis NA 1 226 1994
## 2 Leaders Warren Bennis Burt Nannus 2 235 1997
## 3 The Leadership Challenge James Kouzes Barry Pousner 3 458 2002
## 4 Leading Change John Kotter NA 1 186 1996
## Publisher
## 1 Addison-Wesley
## 2 Harper Business
## 3 Jossey-Bass
## 4 Harvard Business School Press
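One caveat about the sapply approach: because each record is a list, the resulting data frame appears to store every column as a list rather than as a plain character or numeric vector, which can get in the way later. A possible alternative, sketched here on a two-book excerpt of the JSON (json_text, records, rows, and books are illustrative names), is to turn each record into a one-row data frame and stack the rows.

```r
library(RJSONIO)

# Two-book excerpt of the JSON file, inlined so the sketch needs no network access.
json_text <- '{"Books on Leadership": [
  {"Title": "On Becoming a Leader", "Author2": "NA", "Pages": 226, "Year": 1994},
  {"Title": "Leaders", "Author2": "Burt Nannus", "Pages": 235, "Year": 1997}
]}'

# simplify = FALSE keeps every record as a named list.
records <- fromJSON(json_text, asText = TRUE, simplify = FALSE)[["Books on Leadership"]]

# One single-row data frame per record, then stack them row by row.
rows <- lapply(records, function(r) as.data.frame(r, stringsAsFactors = FALSE))
books <- do.call(rbind, rows)

# Each column is now an ordinary vector; JSON numbers stay numeric.
sapply(books, class)
```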
Conclusion: Using functions from the XML, RCurl, and RJSONIO packages, I was able to read the HTML, XML, and JSON versions of the file into R as data frames with the same layout.
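Comparing printed tables by eye is error-prone, so a programmatic check may be worthwhile. The sketch below uses only base R and two small data frames whose Title and Author2 values are copied from the printed HTML and JSON tables above (from_html, from_json, and diff_cells are illustrative names); it locates any cells that differ.

```r
# Title and Author2 values copied from the printed HTML table.
from_html <- data.frame(
  Title = c("On Becoming a Leader", "Leaders",
            "The Leadership Challenge", "Leading Change"),
  Author2 = c("NA", "Burt Nannus", "Barry Postner", "NA"),
  stringsAsFactors = FALSE
)

# The printed JSON table differs in one Author2 cell.
from_json <- from_html
from_json$Author2[3] <- "Barry Pousner"

# A single TRUE/FALSE answer for the whole table.
identical(from_html, from_json)

# Row and column positions of every differing cell.
diff_cells <- which(from_html != from_json, arr.ind = TRUE)
diff_cells

# Name of the column that contains the difference.
names(from_html)[diff_cells[, "col"]]
```

Applied to the full data frames, the same comparison would flag the spelling difference in the JSON file, which is a data-entry inconsistency in the source files rather than a parsing problem.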