1 Assignment Instructions

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.

Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats.

Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?

2 Prerequisites: Available Libraries

  • XML
  • htmltab
  • rvest
  • RJSONIO
  • jsonlite
  • rjson
  • tidyjson
  • dplyr
  • plyr
  • DT
  • sqldf
  • knitr
  • kableExtra

3 HTML (>1993)

Used for websites since 1993; less structured and less human-readable than XML or JSON.

  • An HTML table is defined with the <table> tag.
  • Each table row is defined with the <tr> tag.
  • A table header is defined with the <th> tag. By default, table headings are bold and centered.
  • A table data cell is defined with the <td> tag.
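
The four tags above are enough to build a table that R can read. The following is a minimal sketch, assuming the XML package is installed; the two-row table and the names tiny_html, tiny_doc, all_tables, and tiny_books are made up for illustration and are not the BookList.html used in this report.

```r
library(XML)

# Hypothetical hand-written table: one header row (<th>) and two data rows (<td>)
tiny_html <- "
<table>
  <tr><th>Title</th><th>Pages</th></tr>
  <tr><td>Deep Learning</td><td>775</td></tr>
  <tr><td>Amazon Echo</td><td>118</td></tr>
</table>"

# asText = TRUE tells the parser that the input is markup, not a file name
tiny_doc <- htmlParse(tiny_html, asText = TRUE)

# Without `which`, readHTMLTable returns a list with one element per table
all_tables <- readHTMLTable(tiny_doc, stringsAsFactors = FALSE)

# With which = 1, only the first table comes back, as a data frame
tiny_books <- readHTMLTable(tiny_doc, which = 1, stringsAsFactors = FALSE)
tiny_books
```

The header row is picked up from the <th> cells, so the resulting columns are named Title and Pages.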

3.1 Read HTML File into R from local working directory

myWorkingDir <- getwd()
myHTMLfile <- paste0(myWorkingDir,"/BookList.html")
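
As an alternative sketch, base R's file.path() builds the same path without a hard-coded slash, and file.exists() can confirm that the file is present before parsing. The file name is the one used above; the check itself is an addition, not part of the original workflow.

```r
# file.path() joins path components with the platform's separator
myHTMLfile <- file.path(getwd(), "BookList.html")

# Report a missing file before any parser is called on it
if (!file.exists(myHTMLfile)) {
  message("File not found: ", myHTMLfile)
}
```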

3.2 Show Raw Data

Show the raw HTML data by parsing the HTML file with several different functions, as follows.

3.2.1 Using htmlParse

Parse the HTML file with htmlParse.

parsedHTML <- htmlParse(file = myHTMLfile,encoding = "UTF-8")
parsedHTML
## <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
## <html><body><table class="table table-bordered table-hover table-condensed">
## <thead><tr>
## <th title="Field #1">Book#</th>
## <th title="Field #2">Topic</th>
## <th title="Field #3">Title</th>
## <th title="Field #4">Authors</th>
## <th title="Field #5">Rating</th>
## <th title="Field #6">Type</th>
## <th title="Field #7">Pages</th>
## <th title="Field #8">Publisher</th>
## <th title="Field #9">LatestReleaseDate</th>
## <th title="Field #10">List Price</th>
## <th title="Field #11">BookCover</th>
## <th title="Field #12">AmazonLink</th>
## </tr></thead>
## <tbody>
## <tr>
## <td align="right">1</td>
## <td>Machine Learning</td>
## <td>Deep Learning (Adaptive Computation and Machine Learning series)</td>
## <td>Ian Goodfellow, Yoshua Bengio, Aaron Courville</td>
## <td align="right">4.8</td>
## <td>HardCover</td>
## <td align="right">775</td>
## <td>The MIT Press</td>
## <td>18-Nov-16</td>
## <td>$28.99</td>
## <td><img src="1%20-%20Deep%20Learning.jpg"></td>
## <td>https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&amp;psc=1&amp;refRID=9HK33PPS16VDZN3B1N8G</td>
## </tr>
## <tr>
## <td align="right">2</td>
## <td>Machine Learning</td>
## <td>Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time</td>
## <td>Andrew Butler</td>
## <td align="right">4.2</td>
## <td>PaperBack</td>
## <td align="right">118</td>
## <td>CreateSpace Independent Publishing Platform</td>
## <td>12-Aug-16</td>
## <td>$9.95</td>
## <td><img src="2%20-%20Amazon%20Echo.jpg"></td>
## <td>https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&amp;psc=1&amp;refRID=9HK33PPS16VDZN3B1N8G</td>
## </tr>
## <tr>
## <td align="right">3</td>
## <td>Machine Learning</td>
## <td>An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)</td>
## <td>Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani</td>
## <td align="right">4.8</td>
## <td>HardCover</td>
## <td align="right">426</td>
## <td>Springer</td>
## <td>1-Sep-17</td>
## <td>$68.61</td>
## <td><img src="3%20-%20An%20Introduction%20to%20Statistical%20Learning.jpg"></td>
## <td>https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&amp;psc=1&amp;refRID=9HK33PPS16VDZN3B1N8G</td>
## </tr>
## </tbody>
## </table></body></html>
## 

3.2.2 Using read_html

readHTML <- read_html(myHTMLfile)
names(readHTML)
## [1] "node" "doc"
#print(readHTML)
readHTML
## {xml_document}
## <html>
## [1] <body><table class="table table-bordered table-hover table-condensed ...

3.2.3 Using readHTMLTable

Use the readHTMLTable function from the XML package to read the HTML file. By default it returns a list of tables; with which = 1 it returns only the first table, as a data frame.

readHTMLTable <- readHTMLTable(myHTMLfile, which = 1)
#names(readHTMLTable)
#print(readHTMLTable)
readHTMLTable
##   Book#            Topic
## 1     1 Machine Learning
## 2     2 Machine Learning
## 3     3 Machine Learning
##                                                                                            Title
## 1                               Deep Learning (Adaptive Computation and Machine Learning series)
## 2                                Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 3 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
##                                                          Authors Rating
## 1                 Ian Goodfellow, Yoshua Bengio, Aaron Courville    4.8
## 2                                                  Andrew Butler    4.2
## 3 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani    4.8
##        Type Pages                                   Publisher
## 1 HardCover   775                               The MIT Press
## 2 PaperBack   118 CreateSpace Independent Publishing Platform
## 3 HardCover   426                                    Springer
##   LatestReleaseDate List Price BookCover
## 1         18-Nov-16     $28.99          
## 2         12-Aug-16      $9.95          
## 3          1-Sep-17     $68.61          
##                                                                                                                                                          AmazonLink
## 1                 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 2                     https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 3 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G

3.2.4 Using htmltab

readHTMLTab <- htmltab(doc = myHTMLfile)
## Argument 'which' was left unspecified. Choosing first table.
## Warning: Columns [BookCover] seem to have no data and are removed. Use
## rm_nodata_cols = F to suppress this behavior
#names(readHTMLTable)
#print(readHTMLTable)
readHTMLTab
##   Book#            Topic
## 2     1 Machine Learning
## 3     2 Machine Learning
## 4     3 Machine Learning
##                                                                                            Title
## 2                               Deep Learning (Adaptive Computation and Machine Learning series)
## 3                                Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 4 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
##                                                          Authors Rating
## 2                 Ian Goodfellow, Yoshua Bengio, Aaron Courville    4.8
## 3                                                  Andrew Butler    4.2
## 4 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani    4.8
##        Type Pages                                   Publisher
## 2 HardCover   775                               The MIT Press
## 3 PaperBack   118 CreateSpace Independent Publishing Platform
## 4 HardCover   426                                    Springer
##   LatestReleaseDate List Price
## 2         18-Nov-16     $28.99
## 3         12-Aug-16      $9.95
## 4          1-Sep-17     $68.61
##                                                                                                                                                          AmazonLink
## 2                 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 3                     https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 4 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G

3.3 Show Data as Table

Show the HTML data loaded by the different approaches above in various tabular formats.

df_parsedHTML <- readHTMLTable(parsedHTML) %>% ldply(data.frame) %>% select(-.id) # getting rid of the .id column
df_readHTML <- html_nodes(readHTML,"table") %>% html_table(trim = TRUE, header = TRUE, fill = TRUE) %>% as.data.frame()
df_readHTMLTable <-  data.frame(readHTMLTable)
df_readHTMLTab <- readHTMLTab

image_to_df <- function(dataframe) {
  # Cover image files, in the same order as the three book rows
  covers <- c("1 - Deep Learning.jpg",
              "2 - Amazon Echo.jpg",
              "3 - An Introduction to Statistical Learning.jpg")
  # Replace each URL with a Markdown link labelled "AmazonLink"
  # (as.character guards against the column having been read as a factor)
  dataframe$AmazonLink <- paste0("[AmazonLink](", as.character(dataframe$AmazonLink), ")")
  # Embed each cover as a Markdown image that links to the BookCover value
  dataframe$BookCover <- paste0("[![](", covers, ")](", dataframe$BookCover, ")")
  return(dataframe)
}

#df_parsedHTML <- image_to_df(df_parsedHTML)
#df_readHTML <- image_to_df(df_readHTML)
#df_readHTMLTable <- image_to_df(df_readHTMLTable)
df_readHTMLTab <- image_to_df(df_readHTMLTab)

3.3.1 Kable

kable(df_readHTMLTab)
|   | Book# | Topic | Title | Authors | Rating | Type | Pages | Publisher | LatestReleaseDate | List Price | AmazonLink | BookCover |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | Machine Learning | Deep Learning (Adaptive Computation and Machine Learning series) | Ian Goodfellow, Yoshua Bengio, Aaron Courville | 4.8 | HardCover | 775 | The MIT Press | 18-Nov-16 | $28.99 | AmazonLink | |
| 3 | 2 | Machine Learning | Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time | Andrew Butler | 4.2 | PaperBack | 118 | CreateSpace Independent Publishing Platform | 12-Aug-16 | $9.95 | AmazonLink | |
| 4 | 3 | Machine Learning | An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | 4.8 | HardCover | 426 | Springer | 1-Sep-17 | $68.61 | AmazonLink | |

3.3.2 Data Table

DT::datatable(df_readHTML, options = list(pageLength = 5))

3.3.3 Select

DT::datatable(select(df_readHTML, `Topic`:List.Price), options = list(pageLength = 5))

3.3.4 SQL

sqldf("select * from df_readHTML") 
##   Book.            Topic
## 1     1 Machine Learning
## 2     2 Machine Learning
## 3     3 Machine Learning
##                                                                                            Title
## 1                               Deep Learning (Adaptive Computation and Machine Learning series)
## 2                                Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 3 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
##                                                          Authors Rating
## 1                 Ian Goodfellow, Yoshua Bengio, Aaron Courville    4.8
## 2                                                  Andrew Butler    4.2
## 3 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani    4.8
##        Type Pages                                   Publisher
## 1 HardCover   775                               The MIT Press
## 2 PaperBack   118 CreateSpace Independent Publishing Platform
## 3 HardCover   426                                    Springer
##   LatestReleaseDate List.Price BookCover
## 1         18-Nov-16     $28.99        NA
## 2         12-Aug-16      $9.95        NA
## 3          1-Sep-17     $68.61        NA
##                                                                                                                                                          AmazonLink
## 1                 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 2                     https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 3 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
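
The Book. and List.Price headers in the output above differ from the Book# and List Price headers in the HTML file. A likely cause is the as.data.frame() step, since base R checks column names by default when it builds a data frame. The sketch below reproduces the effect on a toy data frame; raw and fixed are illustrative names, not objects from this report.

```r
# Toy data frame whose column names are not syntactically valid R names
raw <- data.frame(`Book#` = 1:2,
                  `List Price` = c("$28.99", "$9.95"),
                  check.names = FALSE)
names(raw)
# [1] "Book#"      "List Price"

# Passing it back through data.frame() with the default check.names = TRUE
# runs the names through make.names(), which replaces invalid characters with "."
fixed <- data.frame(raw)
names(fixed)
# [1] "Book."      "List.Price"
```

Differences like this matter when comparing the data frames produced by different readers, because identical() also compares column names.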

3.3.5 Knitr

knitr::kable(df_parsedHTML, format = "html")
| Book. | Topic | Title | Authors | Rating | Type | Pages | Publisher | LatestReleaseDate | List.Price | BookCover | AmazonLink |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Machine Learning | Deep Learning (Adaptive Computation and Machine Learning series) | Ian Goodfellow, Yoshua Bengio, Aaron Courville | 4.8 | HardCover | 775 | The MIT Press | 18-Nov-16 | $28.99 | | https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G |
| 2 | Machine Learning | Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time | Andrew Butler | 4.2 | PaperBack | 118 | CreateSpace Independent Publishing Platform | 12-Aug-16 | $9.95 | | https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G |
| 3 | Machine Learning | An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | 4.8 | HardCover | 426 | Springer | 1-Sep-17 | $68.61 | | https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G |

4 XML (>1996)

Used for document storage and data transfer; more structured and more human-readable than HTML.

  • Each XML element is delimited by a start tag (<name>) and a matching end tag (</name>).
  • Hierarchical data is nested inside its parent element.
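
The nesting described above maps naturally onto rows and columns. The following is a minimal sketch, assuming the XML package is installed; the two-book document and the names tiny_xml, tiny_doc, tiny_books, and book_ids are made up for illustration and are not the BookList.xml used in this report.

```r
library(XML)

# Hypothetical hand-written document: a root element with two <book> children
tiny_xml <- "<bookList>
  <book id='1'><Title>Deep Learning</Title><Pages>775</Pages></book>
  <book id='2'><Title>Amazon Echo</Title><Pages>118</Pages></book>
</bookList>"

tiny_doc <- xmlParse(tiny_xml, asText = TRUE)

# Each <book> child of the root becomes a row; each nested element becomes a column
tiny_books <- xmlToDataFrame(tiny_doc, stringsAsFactors = FALSE)
tiny_books

# Attributes such as id are not columns in that data frame; read them separately
book_ids <- xpathSApply(tiny_doc, "//book", xmlGetAttr, "id")
book_ids
```

Values arrive as character strings, so numeric fields such as Pages need an explicit as.numeric() before any arithmetic.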

4.1 Read XML File into R from local working directory
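
The code for this step is not shown, although myXMLfile is used in the sections that follow. A minimal sketch, assuming the file is named BookList.xml by analogy with BookList.html and BookList.json (the file name is an assumption, not taken from the original):

```r
myWorkingDir <- getwd()
# Assumed file name; the original chunk that defines myXMLfile is not shown
myXMLfile <- paste0(myWorkingDir, "/BookList.xml")
```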

4.2 Show Raw Data

Show the raw XML data by parsing the XML file with several different functions, as follows.

4.2.1 Using xmlParse

Parse the XML file with xmlParse.

parsedXML <- xmlParse(file = myXMLfile, useInternalNodes = TRUE)
parsedXML
## <?xml version="1.0" encoding="UTF-8"?>
## <bookList>
##   <book id="1">
##     <Topic>Machine Learning</Topic>
##     <Title>Deep Learning (Adaptive Computation and Machine Learning series)</Title>
##     <Authors>Ian Goodfellow, Yoshua Bengio, Aaron Courville</Authors>
##     <Rating>4.8</Rating>
##     <Type>HardCover</Type>
##     <Pages>775</Pages>
##     <Publisher>The MIT Press</Publisher>
##     <LatestReleaseDate>18-Nov-16</LatestReleaseDate>
##     <ListPrice>$28.99</ListPrice>
##     <image>
##       <src>1 - Deep Learning.jpg</src>
##     </image>
##     <AmazonLink>https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8</AmazonLink>
##   </book>
##   <book id="2">
##     <Topic>Machine Learning</Topic>
##     <Title>Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time</Title>
##     <Authors>Andrew Butler</Authors>
##     <Rating>4.2</Rating>
##     <Type>PaperBack</Type>
##     <Pages>118</Pages>
##     <Publisher>CreateSpace Independent Publishing Platform</Publisher>
##     <LatestReleaseDate>12-Aug-16</LatestReleaseDate>
##     <ListPrice>$9.95</ListPrice>
##     <image>
##       <src>2 - Amazon Echo.jpg</src>
##     </image>
##     <AmazonLink>https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8</AmazonLink>
##   </book>
##   <book id="3">
##     <Topic>Machine Learning</Topic>
##     <Title>An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)</Title>
##     <Authors>Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani</Authors>
##     <Rating>4.8</Rating>
##     <Type>HardCover</Type>
##     <Pages>426</Pages>
##     <Publisher>Springer</Publisher>
##     <LatestReleaseDate>1-Sep-17</LatestReleaseDate>
##     <ListPrice>$68.61</ListPrice>
##     <image>
##       <src>3 - An Introduction to Statistical Learning.jpg</src>
##     </image>
##     <AmazonLink>https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8</AmazonLink>
##   </book>
## </bookList>
## 

4.2.2 Using xmlTreeParse

xmlTParse <- xmlTreeParse(myXMLfile) %>% xmlRoot() %>% xmlSApply(xmlValue)
xmlTParse
##                                                                                                                                                                                                                                                                                                                                                                                              book 
##                                                                                   "Machine LearningDeep Learning (Adaptive Computation and Machine Learning series)Ian Goodfellow, Yoshua Bengio, Aaron Courville4.8HardCover775The MIT Press18-Nov-16$28.991 - Deep Learning.jpghttps://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8" 
##                                                                                                                                                                                                                                                                                                                                                                                              book 
##                                                                                              "Machine LearningAmazon Echo: The Ultimate Guide to Learn Amazon Echo In No TimeAndrew Butler4.2PaperBack118CreateSpace Independent Publishing Platform12-Aug-16$9.952 - Amazon Echo.jpghttps://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8" 
##                                                                                                                                                                                                                                                                                                                                                                                              book 
## "Machine LearningAn Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani4.8HardCover426Springer1-Sep-17$68.613 - An Introduction to Statistical Learning.jpghttps://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8"

4.2.3 Using xmlToList

Use the xmlToList function from the XML package to read the XML file into a nested list.

xmlList <- xmlToList(myXMLfile)
xmlList
## $book
## $book$Topic
## [1] "Machine Learning"
## 
## $book$Title
## [1] "Deep Learning (Adaptive Computation and Machine Learning series)"
## 
## $book$Authors
## [1] "Ian Goodfellow, Yoshua Bengio, Aaron Courville"
## 
## $book$Rating
## [1] "4.8"
## 
## $book$Type
## [1] "HardCover"
## 
## $book$Pages
## [1] "775"
## 
## $book$Publisher
## [1] "The MIT Press"
## 
## $book$LatestReleaseDate
## [1] "18-Nov-16"
## 
## $book$ListPrice
## [1] "$28.99"
## 
## $book$image
## $book$image$src
## [1] "1 - Deep Learning.jpg"
## 
## 
## $book$AmazonLink
## [1] "https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8"
## 
## $book$.attrs
##  id 
## "1" 
## 
## 
## $book
## $book$Topic
## [1] "Machine Learning"
## 
## $book$Title
## [1] "Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time"
## 
## $book$Authors
## [1] "Andrew Butler"
## 
## $book$Rating
## [1] "4.2"
## 
## $book$Type
## [1] "PaperBack"
## 
## $book$Pages
## [1] "118"
## 
## $book$Publisher
## [1] "CreateSpace Independent Publishing Platform"
## 
## $book$LatestReleaseDate
## [1] "12-Aug-16"
## 
## $book$ListPrice
## [1] "$9.95"
## 
## $book$image
## $book$image$src
## [1] "2 - Amazon Echo.jpg"
## 
## 
## $book$AmazonLink
## [1] "https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8"
## 
## $book$.attrs
##  id 
## "2" 
## 
## 
## $book
## $book$Topic
## [1] "Machine Learning"
## 
## $book$Title
## [1] "An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)"
## 
## $book$Authors
## [1] "Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani"
## 
## $book$Rating
## [1] "4.8"
## 
## $book$Type
## [1] "HardCover"
## 
## $book$Pages
## [1] "426"
## 
## $book$Publisher
## [1] "Springer"
## 
## $book$LatestReleaseDate
## [1] "1-Sep-17"
## 
## $book$ListPrice
## [1] "$68.61"
## 
## $book$image
## $book$image$src
## [1] "3 - An Introduction to Statistical Learning.jpg"
## 
## 
## $book$AmazonLink
## [1] "https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8"
## 
## $book$.attrs
##  id 
## "3"

4.2.4 Using xmlToDataFrame

Use the xmlToDataFrame function from the XML package to read the XML file directly into a data frame.

xmlDataFrame <- xmlToDataFrame(myXMLfile)
xmlDataFrame
##              Topic
## 1 Machine Learning
## 2 Machine Learning
## 3 Machine Learning
##                                                                                            Title
## 1                               Deep Learning (Adaptive Computation and Machine Learning series)
## 2                                Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 3 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
##                                                          Authors Rating
## 1                 Ian Goodfellow, Yoshua Bengio, Aaron Courville    4.8
## 2                                                  Andrew Butler    4.2
## 3 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani    4.8
##        Type Pages                                   Publisher
## 1 HardCover   775                               The MIT Press
## 2 PaperBack   118 CreateSpace Independent Publishing Platform
## 3 HardCover   426                                    Springer
##   LatestReleaseDate ListPrice
## 1         18-Nov-16    $28.99
## 2         12-Aug-16     $9.95
## 3          1-Sep-17    $68.61
##                                             image
## 1                           1 - Deep Learning.jpg
## 2                             2 - Amazon Echo.jpg
## 3 3 - An Introduction to Statistical Learning.jpg
##                                                                                                                        AmazonLink
## 1                 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8
## 2                     https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8
## 3 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8

4.3 Show Data as Table

Show the XML data loaded by the different approaches above in various tabular formats.

df_parsedXML <- xmlRoot(parsedXML) %>% xmlToDataFrame()
df_xmlTParse <- xmlTParse %>% ldply(data.frame) %>% select(-c(.id))
df_xmlList <- ldply(xmlList, data.frame) %>% select(-c(.id))
df_xmlDataFrame <- xmlDataFrame

image_to_df <- function(dataframe) {
  for (i in 1:3) {
    if (i==1) {
      url <- "dataframe$AmazonLink[i]"
      dataframe$AmazonLink[i] <- paste0("[", "AmazonLink", "](", url, ")")
      image <- paste0("[![](1 - Deep Learning.jpg)](", dataframe$BookCover[i], ")") 
      dataframe$BookCover[1] <- sprintf(image)
    } else if (i==2) {
      url <- dataframe$AmazonLink[i]
      dataframe$AmazonLink[i] <- paste0("[", "AmazonLink", "](", url, ")")
      image <- paste0("[![](2 - Amazon Echo.jpg)](", dataframe$BookCover[i], ")")
      dataframe$BookCover[2] <- sprintf(image)
    } else if (i==3) {
      url <- dataframe$AmazonLink[i]
      dataframe$AmazonLink[i] <- paste0("[", "AmazonLink", "](", url, ")")
      image <- paste0("[![](3 - An Introduction to Statistical Learning.jpg)](", dataframe$BookCover[i], ")")
      dataframe$BookCover[3] <- sprintf(image)
    } 
  }
  return(dataframe)
}

#df_parsedHTML <- image_to_df(df_parsedHTML)
#df_readHTML <- image_to_df(df_readHTML)
#df_readHTMLTable <- image_to_df(df_readHTMLTable)
df_xmlDataFrame <- image_to_df(df_xmlDataFrame)
## Warning in `[<-.factor`(`*tmp*`, i, value = "[AmazonLink]
## (dataframe$AmazonLink[i])"): invalid factor level, NA generated

## Warning in `[<-.factor`(`*tmp*`, i, value = "[AmazonLink]
## (dataframe$AmazonLink[i])"): invalid factor level, NA generated

## Warning in `[<-.factor`(`*tmp*`, i, value = "[AmazonLink]
## (dataframe$AmazonLink[i])"): invalid factor level, NA generated
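
The warnings above come from assigning a new string into a factor column: a value that is not an existing level becomes NA. The first branch of the function also sets url to the quoted text "dataframe$AmazonLink[i]" rather than to the column value. The sketch below is one possible variant that avoids both problems. It assumes the link and image columns are named AmazonLink and image, as in the xmlToDataFrame output; link_books and the toy data are hypothetical and not part of the original code.

```r
link_books <- function(dataframe) {
  # Convert factors to character first so that new strings can be assigned freely
  urls   <- as.character(dataframe$AmazonLink)
  covers <- as.character(dataframe$image)
  # Markdown link labelled "AmazonLink" pointing at each URL
  dataframe$AmazonLink <- paste0("[AmazonLink](", urls, ")")
  # Markdown image for each cover file
  dataframe$BookCover <- paste0("![](", covers, ")")
  dataframe
}

# Toy input with factor columns, mimicking the situation that triggered the warnings
toy <- data.frame(
  AmazonLink = factor(c("https://example.com/a", "https://example.com/b")),
  image      = factor(c("a.jpg", "b.jpg"))
)
linked <- link_books(toy)
linked$AmazonLink
```

Because the whole column is replaced at once, the function also works for any number of rows rather than exactly three.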

4.3.1 Kable

kable(df_xmlDataFrame)
| Topic | Title | Authors | Rating | Type | Pages | Publisher | LatestReleaseDate | ListPrice | image | AmazonLink | BookCover |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Machine Learning | Deep Learning (Adaptive Computation and Machine Learning series) | Ian Goodfellow, Yoshua Bengio, Aaron Courville | 4.8 | HardCover | 775 | The MIT Press | 18-Nov-16 | $28.99 | 1 - Deep Learning.jpg | NA | |
| Machine Learning | Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time | Andrew Butler | 4.2 | PaperBack | 118 | CreateSpace Independent Publishing Platform | 12-Aug-16 | $9.95 | 2 - Amazon Echo.jpg | NA | |
| Machine Learning | An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | 4.8 | HardCover | 426 | Springer | 1-Sep-17 | $68.61 | 3 - An Introduction to Statistical Learning.jpg | NA | |

4.3.2 Data Table

DT::datatable(df_parsedXML, options = list(pageLength = 5))

4.3.3 Select

DT::datatable(select(df_parsedXML, Topic:ListPrice), options = list(pageLength = 5))

4.3.4 SQL

sqldf("select * from df_xmlTParse") 
##                                                                                                                                                                                                                                                                                                                                                                                            X..i..
## 1                                                                                   Machine LearningDeep Learning (Adaptive Computation and Machine Learning series)Ian Goodfellow, Yoshua Bengio, Aaron Courville4.8HardCover775The MIT Press18-Nov-16$28.991 - Deep Learning.jpghttps://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8
## 2                                                                                              Machine LearningAmazon Echo: The Ultimate Guide to Learn Amazon Echo In No TimeAndrew Butler4.2PaperBack118CreateSpace Independent Publishing Platform12-Aug-16$9.952 - Amazon Echo.jpghttps://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8
## 3 Machine LearningAn Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani4.8HardCover426Springer1-Sep-17$68.613 - An Introduction to Statistical Learning.jpghttps://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8

4.3.5 Knitr

knitr::kable(df_xmlList, format = "html")
| Topic | Title | Authors | Rating | Type | Pages | Publisher | LatestReleaseDate | ListPrice | src | AmazonLink | .attrs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Machine Learning | Deep Learning (Adaptive Computation and Machine Learning series) | Ian Goodfellow, Yoshua Bengio, Aaron Courville | 4.8 | HardCover | 775 | The MIT Press | 18-Nov-16 | $28.99 | 1 - Deep Learning.jpg | https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8 | 1 |
| Machine Learning | Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time | Andrew Butler | 4.2 | PaperBack | 118 | CreateSpace Independent Publishing Platform | 12-Aug-16 | $9.95 | 2 - Amazon Echo.jpg | https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8 | 2 |
| Machine Learning | An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | 4.8 | HardCover | 426 | Springer | 1-Sep-17 | $68.61 | 3 - An Introduction to Statistical Learning.jpg | https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8 | 3 |

5 JSON (>2001)

JSON is often used as a replacement for XML because it is lightweight. Of the three formats covered here (HTML, XML, and JSON), it is the most structured and the most human-readable.
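
To make the structure concrete, the following minimal sketch parses a small JSON string. It assumes the jsonlite package is installed; the two-book document and the names tiny_json, tiny, and tiny_books are made up for illustration and are not the BookList.json used in this report.

```r
library(jsonlite)

# Hypothetical hand-written JSON: one key holding an array of two objects
tiny_json <- '{
  "BookList": [
    {"Title": "Deep Learning", "Pages": 775},
    {"Title": "Amazon Echo",   "Pages": 118}
  ]
}'

tiny <- fromJSON(tiny_json)

# jsonlite simplifies an array of objects with the same keys into a data frame
tiny_books <- tiny$BookList
str(tiny_books)
```

Unlike the HTML and XML readers, the JSON parser keeps unquoted numbers as numeric values, so Pages needs no conversion here.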

5.1 Read JSON File into R from local working directory

myWorkingDir <- getwd()
myJSONfile <- paste0(myWorkingDir,"/BookList.json")

5.2 Show Raw Data

Show the raw JSON data by parsing the JSON file with several different functions, as follows.

5.2.1 Using RJSONIO

Parse the JSON file with fromJSON from the RJSONIO package.

readJSONIO <- RJSONIO::fromJSON(myJSONfile)
readJSONIO
## $BookList
## $BookList[[1]]
## $BookList[[1]]$`Book#`
## [1] "1"
## 
## $BookList[[1]]$Topic
## [1] "Machine Learning"
## 
## $BookList[[1]]$Title
## [1] "Deep Learning (Adaptive Computation and Machine Learning series)"
## 
## $BookList[[1]]$Authors
## [1] "Ian Goodfellow, Yoshua Bengio, Aaron Courville"
## 
## $BookList[[1]]$Rating
## [1] "4.8"
## 
## $BookList[[1]]$Type
## [1] "HardCover"
## 
## $BookList[[1]]$Pages
## [1] 775
## 
## $BookList[[1]]$Publisher
## [1] "The MIT Press"
## 
## $BookList[[1]]$`Latest Release Date`
## [1] "18-Nov-16"
## 
## $BookList[[1]]$`List Price`
## [1] "$28.99"
## 
## $BookList[[1]]$BookCover
## [1] "1 - Deep Learning.jpg"
## 
## $BookList[[1]]$AmazonLink
## [1] "https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
## 
## 
## $BookList[[2]]
## $BookList[[2]]$`Book#`
## [1] "2"
## 
## $BookList[[2]]$Topic
## [1] "Machine Learning"
## 
## $BookList[[2]]$Title
## [1] "Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time"
## 
## $BookList[[2]]$Authors
## [1] "Andrew Butler"
## 
## $BookList[[2]]$Rating
## [1] "4.2"
## 
## $BookList[[2]]$Type
## [1] "PaperBack"
## 
## $BookList[[2]]$Pages
## [1] 118
## 
## $BookList[[2]]$Publisher
## [1] "CreateSpace Independent Publishing Platform"
## 
## $BookList[[2]]$`Latest Release Date`
## [1] "12-Aug-16"
## 
## $BookList[[2]]$`List Price`
## [1] "$9.95"
## 
## $BookList[[2]]$BookCover
## [1] "2 - Amazon Echo.jpg"
## 
## $BookList[[2]]$AmazonLink
## [1] "https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
## 
## 
## $BookList[[3]]
## $BookList[[3]]$`Book#`
## [1] "3"
## 
## $BookList[[3]]$Topic
## [1] "Machine Learning"
## 
## $BookList[[3]]$Title
## [1] "An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)"
## 
## $BookList[[3]]$Authors
## [1] "Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani"
## 
## $BookList[[3]]$Rating
## [1] "4.8"
## 
## $BookList[[3]]$Type
## [1] "HardCover"
## 
## $BookList[[3]]$Pages
## [1] 426
## 
## $BookList[[3]]$Publisher
## [1] "Springer"
## 
## $BookList[[3]]$`Latest Release Date`
## [1] "1-Sep-17"
## 
## $BookList[[3]]$`List Price`
## [1] "$68.61"
## 
## $BookList[[3]]$BookCover
## [1] "3 - An Introduction to Statistical Learning.jpg"
## 
## $BookList[[3]]$AmazonLink
## [1] "https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"

5.2.2 Using jsonlite

readJSONlite <- jsonlite::fromJSON(myJSONfile)
readJSONlite
## $BookList
##   Book#            Topic
## 1     1 Machine Learning
## 2     2 Machine Learning
## 3     3 Machine Learning
##                                                                                            Title
## 1                               Deep Learning (Adaptive Computation and Machine Learning series)
## 2                                Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 3 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
##                                                          Authors Rating
## 1                 Ian Goodfellow, Yoshua Bengio, Aaron Courville    4.8
## 2                                                  Andrew Butler    4.2
## 3 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani    4.8
##        Type Pages                                   Publisher
## 1 HardCover   775                               The MIT Press
## 2 PaperBack   118 CreateSpace Independent Publishing Platform
## 3 HardCover   426                                    Springer
##   Latest Release Date List Price
## 1           18-Nov-16     $28.99
## 2           12-Aug-16      $9.95
## 3            1-Sep-17     $68.61
##                                         BookCover
## 1                           1 - Deep Learning.jpg
## 2                             2 - Amazon Echo.jpg
## 3 3 - An Introduction to Statistical Learning.jpg
##                                                                                                                                                          AmazonLink
## 1                 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 2                     https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 3 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
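
jsonlite returns a data frame here because, by default, it simplifies a JSON array of objects into rows. The following is a minimal sketch of that behaviour, not the code used for this report: `json_txt` and its two records are made-up stand-ins for the book file.

```r
library(jsonlite)

# Hypothetical inline JSON mirroring the shape of the book file
json_txt <- '{"BookList": [
  {"Book#": "1", "Title": "Book A", "Rating": "4.8", "Pages": 775},
  {"Book#": "2", "Title": "Book B", "Rating": "4.2", "Pages": 118}
]}'

# Default behaviour: the array of objects is simplified to a data frame
simplified <- fromJSON(json_txt)
class(simplified$BookList)

# With simplification switched off, the result is a nested list of records
nested <- fromJSON(json_txt, simplifyVector = FALSE)
class(nested$BookList)

# Quoted numbers stay character; unquoted whole numbers become integer
sapply(simplified$BookList, class)
```

This is also why `Rating` prints as text while `Pages` prints as a number: the type follows how each value is written in the JSON source.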

5.2.3 Using rjson

readJSON <- rjson::fromJSON(file=myJSONfile, method='C')
readJSON
## $BookList
## $BookList[[1]]
## $BookList[[1]]$`Book#`
## [1] "1"
## 
## $BookList[[1]]$Topic
## [1] "Machine Learning"
## 
## $BookList[[1]]$Title
## [1] "Deep Learning (Adaptive Computation and Machine Learning series)"
## 
## $BookList[[1]]$Authors
## [1] "Ian Goodfellow, Yoshua Bengio, Aaron Courville"
## 
## $BookList[[1]]$Rating
## [1] "4.8"
## 
## $BookList[[1]]$Type
## [1] "HardCover"
## 
## $BookList[[1]]$Pages
## [1] 775
## 
## $BookList[[1]]$Publisher
## [1] "The MIT Press"
## 
## $BookList[[1]]$`Latest Release Date`
## [1] "18-Nov-16"
## 
## $BookList[[1]]$`List Price`
## [1] "$28.99"
## 
## $BookList[[1]]$BookCover
## [1] "1 - Deep Learning.jpg"
## 
## $BookList[[1]]$AmazonLink
## [1] "https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
## 
## 
## $BookList[[2]]
## $BookList[[2]]$`Book#`
## [1] "2"
## 
## $BookList[[2]]$Topic
## [1] "Machine Learning"
## 
## $BookList[[2]]$Title
## [1] "Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time"
## 
## $BookList[[2]]$Authors
## [1] "Andrew Butler"
## 
## $BookList[[2]]$Rating
## [1] "4.2"
## 
## $BookList[[2]]$Type
## [1] "PaperBack"
## 
## $BookList[[2]]$Pages
## [1] 118
## 
## $BookList[[2]]$Publisher
## [1] "CreateSpace Independent Publishing Platform"
## 
## $BookList[[2]]$`Latest Release Date`
## [1] "12-Aug-16"
## 
## $BookList[[2]]$`List Price`
## [1] "$9.95"
## 
## $BookList[[2]]$BookCover
## [1] "2 - Amazon Echo.jpg"
## 
## $BookList[[2]]$AmazonLink
## [1] "https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
## 
## 
## $BookList[[3]]
## $BookList[[3]]$`Book#`
## [1] "3"
## 
## $BookList[[3]]$Topic
## [1] "Machine Learning"
## 
## $BookList[[3]]$Title
## [1] "An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)"
## 
## $BookList[[3]]$Authors
## [1] "Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani"
## 
## $BookList[[3]]$Rating
## [1] "4.8"
## 
## $BookList[[3]]$Type
## [1] "HardCover"
## 
## $BookList[[3]]$Pages
## [1] 426
## 
## $BookList[[3]]$Publisher
## [1] "Springer"
## 
## $BookList[[3]]$`Latest Release Date`
## [1] "1-Sep-17"
## 
## $BookList[[3]]$`List Price`
## [1] "$68.61"
## 
## $BookList[[3]]$BookCover
## [1] "3 - An Introduction to Statistical Learning.jpg"
## 
## $BookList[[3]]$AmazonLink
## [1] "https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"

5.3 Show Data as Table

Convert the JSON data loaded by each of the packages above into data frames, then display them in several tabular formats.

df_readJSONIO <- data.frame(readJSONIO)
df_readJSONlite <- as.data.frame(readJSONlite)
df_readJSON <- ldply(readJSON, data.frame)

# Turn each book's cover file name and Amazon URL into Markdown
# (a linked image and a named link) so they render in the tables below.
image_to_df <- function(dataframe) {
  # Loop over however many rows there are instead of hard-coding 1:3
  for (i in seq_len(nrow(dataframe))) {
    url <- dataframe$BookList.AmazonLink[i]
    cover <- dataframe$BookList.BookCover[i]
    dataframe$BookList.AmazonLink[i] <- paste0("[AmazonLink](", url, ")")
    # The image file name is the BookCover value itself
    dataframe$BookList.BookCover[i] <- paste0("[![](", cover, ")](", cover, ")")
  }
  return(dataframe)
}

#df_readJSONIO <- image_to_df(df_readJSONIO)
df_readJSONlite <- image_to_df(df_readJSONlite)
#df_readJSON <- image_to_df(df_readJSON)

5.3.1 Kable

kable(df_readJSONlite)
BookList.Book. BookList.Topic BookList.Title BookList.Authors BookList.Rating BookList.Type BookList.Pages BookList.Publisher BookList.Latest.Release.Date BookList.List.Price BookList.BookCover BookList.AmazonLink
1 Machine Learning Deep Learning (Adaptive Computation and Machine Learning series) Ian Goodfellow, Yoshua Bengio, Aaron Courville 4.8 HardCover 775 The MIT Press 18-Nov-16 $28.99 AmazonLink
2 Machine Learning Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time Andrew Butler 4.2 PaperBack 118 CreateSpace Independent Publishing Platform 12-Aug-16 $9.95 AmazonLink
3 Machine Learning An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani 4.8 HardCover 426 Springer 1-Sep-17 $68.61 AmazonLink

5.3.2 Data Table

DT::datatable(df_readJSONlite, options = list(pageLength = 5))

5.3.3 Select

DT::datatable(dplyr::select(df_readJSONlite, BookList.Topic:BookList.List.Price), options = list(pageLength = 5))

5.3.4 SQL

sqldf("select * from df_readJSON") 
##        .id Book.            Topic
## 1 BookList     1 Machine Learning
##                                                              Title
## 1 Deep Learning (Adaptive Computation and Machine Learning series)
##                                          Authors Rating      Type Pages
## 1 Ian Goodfellow, Yoshua Bengio, Aaron Courville    4.8 HardCover   775
##       Publisher Latest.Release.Date List.Price             BookCover
## 1 The MIT Press           18-Nov-16     $28.99 1 - Deep Learning.jpg
##                                                                                                                                          AmazonLink
## 1 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
##   Book..1          Topic.1
## 1       2 Machine Learning
##                                                           Title.1
## 1 Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
##       Authors.1 Rating.1    Type.1 Pages.1
## 1 Andrew Butler      4.2 PaperBack     118
##                                   Publisher.1 Latest.Release.Date.1
## 1 CreateSpace Independent Publishing Platform             12-Aug-16
##   List.Price.1         BookCover.1
## 1        $9.95 2 - Amazon Echo.jpg
##                                                                                                                                    AmazonLink.1
## 1 https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
##   Book..2          Topic.2
## 1       3 Machine Learning
##                                                                                          Title.2
## 1 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
##                                                        Authors.2 Rating.2
## 1 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani      4.8
##      Type.2 Pages.2 Publisher.2 Latest.Release.Date.2 List.Price.2
## 1 HardCover     426    Springer              1-Sep-17       $68.61
##                                       BookCover.2
## 1 3 - An Introduction to Statistical Learning.jpg
##                                                                                                                                                        AmazonLink.2
## 1 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
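
The single wide row above appears to come from applying `data.frame` to the outer list, so the three records land side by side as columns rather than as rows. One possible fix, sketched here in base R, converts each record to a one-row data frame and stacks the results. The list `records` is a hypothetical stand-in for `readJSON$BookList`, with illustrative values.

```r
# Hypothetical stand-in for readJSON$BookList: a list of per-book records
records <- list(
  list(`Book#` = "1", Title = "Book A", Pages = 775),
  list(`Book#` = "2", Title = "Book B", Pages = 118),
  list(`Book#` = "3", Title = "Book C", Pages = 426)
)

# Turn each record into a one-row data frame, then bind the rows together
rows <- lapply(records, function(rec) as.data.frame(rec, stringsAsFactors = FALSE))
books <- do.call(rbind, rows)

dim(books)    # one row per book, one column per field
names(books)  # "Book#" is sanitised to "Book." by as.data.frame
```

With plyr, the equivalent would presumably be to call `ldply` on the inner list (`readJSON$BookList`) instead of on `readJSON` itself.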

5.3.5 Knitr

knitr::kable(df_readJSONIO, format = "html")
BookList.Book. BookList.Topic BookList.Title BookList.Authors BookList.Rating BookList.Type BookList.Pages BookList.Publisher BookList.Latest.Release.Date BookList.List.Price BookList.BookCover BookList.AmazonLink BookList.Book..1 BookList.Topic.1 BookList.Title.1 BookList.Authors.1 BookList.Rating.1 BookList.Type.1 BookList.Pages.1 BookList.Publisher.1 BookList.Latest.Release.Date.1 BookList.List.Price.1 BookList.BookCover.1 BookList.AmazonLink.1 BookList.Book..2 BookList.Topic.2 BookList.Title.2 BookList.Authors.2 BookList.Rating.2 BookList.Type.2 BookList.Pages.2 BookList.Publisher.2 BookList.Latest.Release.Date.2 BookList.List.Price.2 BookList.BookCover.2 BookList.AmazonLink.2
1 Machine Learning Deep Learning (Adaptive Computation and Machine Learning series) Ian Goodfellow, Yoshua Bengio, Aaron Courville 4.8 HardCover 775 The MIT Press 18-Nov-16 $28.99 1 - Deep Learning.jpg https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G 2 Machine Learning Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time Andrew Butler 4.2 PaperBack 118 CreateSpace Independent Publishing Platform 12-Aug-16 $9.95 2 - Amazon Echo.jpg https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G 3 Machine Learning An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani 4.8 HardCover 426 Springer 1-Sep-17 $68.61 3 - An Introduction to Statistical Learning.jpg https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G

6 Conclusion

  • Each of the three source formats needs its own parsing method.
  • The data frames built from the three source files are not exactly identical; the column names in particular differ.
  • With minor modifications, using various libraries, the final data frames can be made the same.
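
The comparison behind these conclusions can be made explicit with `identical()`. The sketch below assumes two small hypothetical data frames, `df_a` and `df_b`, that hold the same values but differ in column names and in one column type, a mismatch similar in kind to the one described above. They are not the actual data frames from this report.

```r
# Hypothetical frames: same values, different column names and types
df_a <- data.frame(Title = c("Book A", "Book B"),
                   Pages = c("775", "118"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(BookList.Title = c("Book A", "Book B"),
                   BookList.Pages = c(775L, 118L),
                   stringsAsFactors = FALSE)

before <- identical(df_a, df_b)  # FALSE: names and types differ

# Normalise: strip the "BookList." prefix and coerce Pages to integer
names(df_b) <- sub("^BookList\\.", "", names(df_b))
df_a$Pages <- as.integer(df_a$Pages)

after <- identical(df_a, df_b)   # TRUE once names and types agree
c(before = before, after = after)
```

`all.equal()` is a gentler alternative when a description of the remaining differences is more useful than a single TRUE or FALSE.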