Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.
Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats.
Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?
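HTML has been used for websites since 1993; it is less structured and less human-readable than XML or JSON.
# Packages used by the code in this report
library(XML)      # htmlParse, readHTMLTable, xmlParse, xmlToList, xmlToDataFrame
library(rvest)    # read_html, html_nodes, html_table
library(htmltab)  # htmltab
library(plyr)     # ldply (load before dplyr)
library(dplyr)    # select, %>%
library(knitr)    # kable
library(sqldf)    # sqldf
myWorkingDir <- getwd()
myHTMLfile <- paste0(myWorkingDir,"/BookList.html")
Show the raw HTML data by parsing the HTML file with several functions, as follows.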
Parse the html file with htmlParse
parsedHTML <- htmlParse(file = myHTMLfile,encoding = "UTF-8")
parsedHTML
## <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
## <html><body><table class="table table-bordered table-hover table-condensed">
## <thead><tr>
## <th title="Field #1">Book#</th>
## <th title="Field #2">Topic</th>
## <th title="Field #3">Title</th>
## <th title="Field #4">Authors</th>
## <th title="Field #5">Rating</th>
## <th title="Field #6">Type</th>
## <th title="Field #7">Pages</th>
## <th title="Field #8">Publisher</th>
## <th title="Field #9">LatestReleaseDate</th>
## <th title="Field #10">List Price</th>
## <th title="Field #11">BookCover</th>
## <th title="Field #12">AmazonLink</th>
## </tr></thead>
## <tbody>
## <tr>
## <td align="right">1</td>
## <td>Machine Learning</td>
## <td>Deep Learning (Adaptive Computation and Machine Learning series)</td>
## <td>Ian Goodfellow, Yoshua Bengio, Aaron Courville</td>
## <td align="right">4.8</td>
## <td>HardCover</td>
## <td align="right">775</td>
## <td>The MIT Press</td>
## <td>18-Nov-16</td>
## <td>$28.99</td>
## <td><img src="1%20-%20Deep%20Learning.jpg"></td>
## <td>https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G</td>
## </tr>
## <tr>
## <td align="right">2</td>
## <td>Machine Learning</td>
## <td>Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time</td>
## <td>Andrew Butler</td>
## <td align="right">4.2</td>
## <td>PaperBack</td>
## <td align="right">118</td>
## <td>CreateSpace Independent Publishing Platform</td>
## <td>12-Aug-16</td>
## <td>$9.95</td>
## <td><img src="2%20-%20Amazon%20Echo.jpg"></td>
## <td>https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G</td>
## </tr>
## <tr>
## <td align="right">3</td>
## <td>Machine Learning</td>
## <td>An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)</td>
## <td>Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani</td>
## <td align="right">4.8</td>
## <td>HardCover</td>
## <td align="right">426</td>
## <td>Springer</td>
## <td>1-Sep-17</td>
## <td>$68.61</td>
## <td><img src="3%20-%20An%20Introduction%20to%20Statistical%20Learning.jpg"></td>
## <td>https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G</td>
## </tr>
## </tbody>
## </table></body></html>
##
readHTML <- read_html(myHTMLfile)
names(readHTML)
## [1] "node" "doc"
#print(readHTML)
readHTML
## {xml_document}
## <html>
## [1] <body><table class="table table-bordered table-hover table-condensed ...
Use the readHTMLTable function from the XML package to read the HTML file. By default it returns a list of tables; with which = 1 it returns the first table as a data frame.
readHTMLTable <- readHTMLTable(myHTMLfile, which = 1)
#names(readHTMLTable)
#print(readHTMLTable)
readHTMLTable
## Book# Topic
## 1 1 Machine Learning
## 2 2 Machine Learning
## 3 3 Machine Learning
## Title
## 1 Deep Learning (Adaptive Computation and Machine Learning series)
## 2 Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 3 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
## Authors Rating
## 1 Ian Goodfellow, Yoshua Bengio, Aaron Courville 4.8
## 2 Andrew Butler 4.2
## 3 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani 4.8
## Type Pages Publisher
## 1 HardCover 775 The MIT Press
## 2 PaperBack 118 CreateSpace Independent Publishing Platform
## 3 HardCover 426 Springer
## LatestReleaseDate List Price BookCover
## 1 18-Nov-16 $28.99
## 2 12-Aug-16 $9.95
## 3 1-Sep-17 $68.61
## AmazonLink
## 1 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 2 https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 3 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
readHTMLTab <- htmltab(doc = myHTMLfile)
## Argument 'which' was left unspecified. Choosing first table.
## Warning: Columns [BookCover] seem to have no data and are removed. Use
## rm_nodata_cols = F to suppress this behavior
#names(readHTMLTable)
#print(readHTMLTable)
readHTMLTab
## Book# Topic
## 2 1 Machine Learning
## 3 2 Machine Learning
## 4 3 Machine Learning
## Title
## 2 Deep Learning (Adaptive Computation and Machine Learning series)
## 3 Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 4 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
## Authors Rating
## 2 Ian Goodfellow, Yoshua Bengio, Aaron Courville 4.8
## 3 Andrew Butler 4.2
## 4 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani 4.8
## Type Pages Publisher
## 2 HardCover 775 The MIT Press
## 3 PaperBack 118 CreateSpace Independent Publishing Platform
## 4 HardCover 426 Springer
## LatestReleaseDate List Price
## 2 18-Nov-16 $28.99
## 3 12-Aug-16 $9.95
## 4 1-Sep-17 $68.61
## AmazonLink
## 2 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 3 https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 4 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
Show the HTML data loaded by the approaches above in several tabular formats.
df_parsedHTML <- readHTMLTable(parsedHTML) %>% ldply(data.frame) %>% select(-.id) # getting rid of the .id column
df_readHTML <- html_nodes(readHTML,"table") %>% html_table(trim = TRUE, header = TRUE, fill = TRUE) %>% as.data.frame()
df_readHTMLTable <- data.frame(readHTMLTable)
df_readHTMLTab <- readHTMLTab
image_to_df <- function(dataframe) {
  # Wrap each Amazon URL in a Markdown link labelled "AmazonLink"
  dataframe$AmazonLink <- paste0("[AmazonLink](", dataframe$AmazonLink, ")")
  # Wrap each cover path in Markdown link syntax with an empty label;
  # if the BookCover column is missing, this adds it as "[]()" for every row
  dataframe$BookCover <- paste0("[](", dataframe$BookCover, ")")
  return(dataframe)
}
#df_parsedHTML <- image_to_df(df_parsedHTML)
#df_readHTML <- image_to_df(df_readHTML)
#df_readHTMLTable <- image_to_df(df_readHTMLTable)
df_readHTMLTab <- image_to_df(df_readHTMLTab)
kable(df_readHTMLTab)
| | Book# | Topic | Title | Authors | Rating | Type | Pages | Publisher | LatestReleaseDate | List Price | AmazonLink | BookCover |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | Machine Learning | Deep Learning (Adaptive Computation and Machine Learning series) | Ian Goodfellow, Yoshua Bengio, Aaron Courville | 4.8 | HardCover | 775 | The MIT Press | 18-Nov-16 | $28.99 | AmazonLink | |
| 3 | 2 | Machine Learning | Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time | Andrew Butler | 4.2 | PaperBack | 118 | CreateSpace Independent Publishing Platform | 12-Aug-16 | $9.95 | AmazonLink | |
| 4 | 3 | Machine Learning | An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | 4.8 | HardCover | 426 | Springer | 1-Sep-17 | $68.61 | AmazonLink | |
DT::datatable(df_readHTML, options = list(pageLength = 5))
DT::datatable(select(df_readHTML, `Topic`:List.Price), options = list(pageLength = 5))
sqldf("select * from df_readHTML")
## Book. Topic
## 1 1 Machine Learning
## 2 2 Machine Learning
## 3 3 Machine Learning
## Title
## 1 Deep Learning (Adaptive Computation and Machine Learning series)
## 2 Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 3 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
## Authors Rating
## 1 Ian Goodfellow, Yoshua Bengio, Aaron Courville 4.8
## 2 Andrew Butler 4.2
## 3 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani 4.8
## Type Pages Publisher
## 1 HardCover 775 The MIT Press
## 2 PaperBack 118 CreateSpace Independent Publishing Platform
## 3 HardCover 426 Springer
## LatestReleaseDate List.Price BookCover
## 1 18-Nov-16 $28.99 NA
## 2 12-Aug-16 $9.95 NA
## 3 1-Sep-17 $68.61 NA
## AmazonLink
## 1 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 2 https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 3 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
knitr::kable(df_parsedHTML, format = "html")
| Book. | Topic | Title | Authors | Rating | Type | Pages | Publisher | LatestReleaseDate | List.Price | BookCover | AmazonLink |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Machine Learning | Deep Learning (Adaptive Computation and Machine Learning series) | Ian Goodfellow, Yoshua Bengio, Aaron Courville | 4.8 | HardCover | 775 | The MIT Press | 18-Nov-16 | $28.99 | | https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G |
| 2 | Machine Learning | Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time | Andrew Butler | 4.2 | PaperBack | 118 | CreateSpace Independent Publishing Platform | 12-Aug-16 | $9.95 | | https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G |
| 3 | Machine Learning | An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | 4.8 | HardCover | 426 | Springer | 1-Sep-17 | $68.61 | | https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G |
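In every version of the HTML table above, List Price still carries its dollar sign and LatestReleaseDate is day-month-year text, so neither can be sorted or summed as-is. The following is a minimal base-R sketch of one way to convert them. It uses a small hand-typed stand-in for the data frame; the name `books` and the assumption that two-digit years fall in the 2000s are illustrative and not part of the original code.

```r
# Hypothetical stand-in holding three of the columns shown above, all as text.
books <- data.frame(
  List.Price = c("$28.99", "$9.95", "$68.61"),
  LatestReleaseDate = c("18-Nov-16", "12-Aug-16", "1-Sep-17"),
  Pages = c("775", "118", "426"),
  stringsAsFactors = FALSE
)

# Strip the currency symbol (and any thousands separator) before converting.
books$List.Price <- as.numeric(gsub("[$,]", "", books$List.Price))
books$Pages <- as.integer(books$Pages)

# Split "18-Nov-16" into day, month abbreviation and two-digit year.
# month.abb is always English, so this does not depend on the session locale.
parts <- strsplit(books$LatestReleaseDate, "-", fixed = TRUE)
day   <- as.integer(vapply(parts, function(p) p[1], character(1)))
month <- match(vapply(parts, function(p) p[2], character(1)), month.abb)
year  <- 2000L + as.integer(vapply(parts, function(p) p[3], character(1)))  # assumption: 20xx

books$LatestReleaseDate <- as.Date(sprintf("%04d-%02d-%02d", year, month, day))

str(books)
```

After the conversion, expressions such as `mean(books$List.Price)` or `books[order(books$LatestReleaseDate), ]` behave as expected.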
XML is used for document storage and data transfer; it is more structured and more human-readable than HTML.
Show the raw XML data by parsing the XML file with several functions, as follows.
Parse the xml file with xmlParse
parsedXML <- xmlParse(file = myXMLfile, useInternalNodes = TRUE)
parsedXML
## <?xml version="1.0" encoding="UTF-8"?>
## <bookList>
## <book id="1">
## <Topic>Machine Learning</Topic>
## <Title>Deep Learning (Adaptive Computation and Machine Learning series)</Title>
## <Authors>Ian Goodfellow, Yoshua Bengio, Aaron Courville</Authors>
## <Rating>4.8</Rating>
## <Type>HardCover</Type>
## <Pages>775</Pages>
## <Publisher>The MIT Press</Publisher>
## <LatestReleaseDate>18-Nov-16</LatestReleaseDate>
## <ListPrice>$28.99</ListPrice>
## <image>
## <src>1 - Deep Learning.jpg</src>
## </image>
## <AmazonLink>https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8</AmazonLink>
## </book>
## <book id="2">
## <Topic>Machine Learning</Topic>
## <Title>Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time</Title>
## <Authors>Andrew Butler</Authors>
## <Rating>4.2</Rating>
## <Type>PaperBack</Type>
## <Pages>118</Pages>
## <Publisher>CreateSpace Independent Publishing Platform</Publisher>
## <LatestReleaseDate>12-Aug-16</LatestReleaseDate>
## <ListPrice>$9.95</ListPrice>
## <image>
## <src>2 - Amazon Echo.jpg</src>
## </image>
## <AmazonLink>https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8</AmazonLink>
## </book>
## <book id="3">
## <Topic>Machine Learning</Topic>
## <Title>An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)</Title>
## <Authors>Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani</Authors>
## <Rating>4.8</Rating>
## <Type>HardCover</Type>
## <Pages>426</Pages>
## <Publisher>Springer</Publisher>
## <LatestReleaseDate>1-Sep-17</LatestReleaseDate>
## <ListPrice>$68.61</ListPrice>
## <image>
## <src>3 - An Introduction to Statistical Learning.jpg</src>
## </image>
## <AmazonLink>https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8</AmazonLink>
## </book>
## </bookList>
##
xmlTParse <- xmlTreeParse(myXMLfile) %>% xmlRoot() %>% xmlSApply(xmlValue)
xmlTParse
## book
## "Machine LearningDeep Learning (Adaptive Computation and Machine Learning series)Ian Goodfellow, Yoshua Bengio, Aaron Courville4.8HardCover775The MIT Press18-Nov-16$28.991 - Deep Learning.jpghttps://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8"
## book
## "Machine LearningAmazon Echo: The Ultimate Guide to Learn Amazon Echo In No TimeAndrew Butler4.2PaperBack118CreateSpace Independent Publishing Platform12-Aug-16$9.952 - Amazon Echo.jpghttps://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8"
## book
## "Machine LearningAn Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani4.8HardCover426Springer1-Sep-17$68.613 - An Introduction to Statistical Learning.jpghttps://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8"
Use the xmlToList function from the XML package to read the XML file into a nested list.
xmlList <- xmlToList(myXMLfile)
xmlList
## $book
## $book$Topic
## [1] "Machine Learning"
##
## $book$Title
## [1] "Deep Learning (Adaptive Computation and Machine Learning series)"
##
## $book$Authors
## [1] "Ian Goodfellow, Yoshua Bengio, Aaron Courville"
##
## $book$Rating
## [1] "4.8"
##
## $book$Type
## [1] "HardCover"
##
## $book$Pages
## [1] "775"
##
## $book$Publisher
## [1] "The MIT Press"
##
## $book$LatestReleaseDate
## [1] "18-Nov-16"
##
## $book$ListPrice
## [1] "$28.99"
##
## $book$image
## $book$image$src
## [1] "1 - Deep Learning.jpg"
##
##
## $book$AmazonLink
## [1] "https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8"
##
## $book$.attrs
## id
## "1"
##
##
## $book
## $book$Topic
## [1] "Machine Learning"
##
## $book$Title
## [1] "Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time"
##
## $book$Authors
## [1] "Andrew Butler"
##
## $book$Rating
## [1] "4.2"
##
## $book$Type
## [1] "PaperBack"
##
## $book$Pages
## [1] "118"
##
## $book$Publisher
## [1] "CreateSpace Independent Publishing Platform"
##
## $book$LatestReleaseDate
## [1] "12-Aug-16"
##
## $book$ListPrice
## [1] "$9.95"
##
## $book$image
## $book$image$src
## [1] "2 - Amazon Echo.jpg"
##
##
## $book$AmazonLink
## [1] "https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8"
##
## $book$.attrs
## id
## "2"
##
##
## $book
## $book$Topic
## [1] "Machine Learning"
##
## $book$Title
## [1] "An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)"
##
## $book$Authors
## [1] "Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani"
##
## $book$Rating
## [1] "4.8"
##
## $book$Type
## [1] "HardCover"
##
## $book$Pages
## [1] "426"
##
## $book$Publisher
## [1] "Springer"
##
## $book$LatestReleaseDate
## [1] "1-Sep-17"
##
## $book$ListPrice
## [1] "$68.61"
##
## $book$image
## $book$image$src
## [1] "3 - An Introduction to Statistical Learning.jpg"
##
##
## $book$AmazonLink
## [1] "https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8"
##
## $book$.attrs
## id
## "3"
Use the xmlToDataFrame function from the XML package to read the XML file directly into a data frame.
xmlDataFrame <- xmlToDataFrame(myXMLfile)
xmlDataFrame
## Topic
## 1 Machine Learning
## 2 Machine Learning
## 3 Machine Learning
## Title
## 1 Deep Learning (Adaptive Computation and Machine Learning series)
## 2 Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 3 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
## Authors Rating
## 1 Ian Goodfellow, Yoshua Bengio, Aaron Courville 4.8
## 2 Andrew Butler 4.2
## 3 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani 4.8
## Type Pages Publisher
## 1 HardCover 775 The MIT Press
## 2 PaperBack 118 CreateSpace Independent Publishing Platform
## 3 HardCover 426 Springer
## LatestReleaseDate ListPrice
## 1 18-Nov-16 $28.99
## 2 12-Aug-16 $9.95
## 3 1-Sep-17 $68.61
## image
## 1 1 - Deep Learning.jpg
## 2 2 - Amazon Echo.jpg
## 3 3 - An Introduction to Statistical Learning.jpg
## AmazonLink
## 1 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8
## 2 https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8
## 3 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8
Show the XML data loaded by the approaches above in several tabular formats.
df_parsedXML <- xmlRoot(parsedXML) %>% xmlToDataFrame()
df_xmlTParse <- xmlTParse %>% ldply(data.frame) %>% select(-c(.id))
df_xmlList <- ldply(xmlList, data.frame) %>% select(-c(.id))
df_xmlDataFrame <- xmlDataFrame
image_to_df <- function(dataframe) {
  # Build the Markdown link from the actual URL (not the quoted expression text);
  # as.character() guards against factor columns
  dataframe$AmazonLink <- paste0("[AmazonLink](", as.character(dataframe$AmazonLink), ")")
  # The XML data frame stores the cover file name in the `image` column
  dataframe$BookCover <- paste0("[](", as.character(dataframe$image), ")")
  return(dataframe)
}
#df_parsedHTML <- image_to_df(df_parsedHTML)
#df_readHTML <- image_to_df(df_readHTML)
#df_readHTMLTable <- image_to_df(df_readHTMLTable)
df_xmlDataFrame <- image_to_df(df_xmlDataFrame)
kable(df_xmlDataFrame)
DT::datatable(df_parsedXML, options = list(pageLength = 5))
DT::datatable(select(df_parsedXML, Topic:ListPrice), options = list(pageLength = 5))
sqldf("select * from df_xmlTParse")
## X..i..
## 1 Machine LearningDeep Learning (Adaptive Computation and Machine Learning series)Ian Goodfellow, Yoshua Bengio, Aaron Courville4.8HardCover775The MIT Press18-Nov-16$28.991 - Deep Learning.jpghttps://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8
## 2 Machine LearningAmazon Echo: The Ultimate Guide to Learn Amazon Echo In No TimeAndrew Butler4.2PaperBack118CreateSpace Independent Publishing Platform12-Aug-16$9.952 - Amazon Echo.jpghttps://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8
## 3 Machine LearningAn Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani4.8HardCover426Springer1-Sep-17$68.613 - An Introduction to Statistical Learning.jpghttps://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8
knitr::kable(df_xmlList, format = "html")
| Topic | Title | Authors | Rating | Type | Pages | Publisher | LatestReleaseDate | ListPrice | src | AmazonLink | .attrs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Machine Learning | Deep Learning (Adaptive Computation and Machine Learning series) | Ian Goodfellow, Yoshua Bengio, Aaron Courville | 4.8 | HardCover | 775 | The MIT Press | 18-Nov-16 | $28.99 | 1 - Deep Learning.jpg | https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8 | 1 |
| Machine Learning | Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time | Andrew Butler | 4.2 | PaperBack | 118 | CreateSpace Independent Publishing Platform | 12-Aug-16 | $9.95 | 2 - Amazon Echo.jpg | https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8 | 2 |
| Machine Learning | An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | 4.8 | HardCover | 426 | Springer | 1-Sep-17 | $68.61 | 3 - An Introduction to Statistical Learning.jpg | https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8 | 3 |
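Readers such as xmlToDataFrame can return factor columns (whether they do depends on the package version and the stringsAsFactors setting, so treat this as an assumption about any particular run). Assigning a new string into a single element of a factor yields NA together with an "invalid factor level" warning. The following is a minimal base-R sketch of the pitfall and one remedy; the data frame `links` and its example.com URLs are hypothetical placeholders.

```r
# Hypothetical two-row stand-in with a factor column (placeholder URLs).
links <- data.frame(
  AmazonLink = factor(c("https://example.com/a", "https://example.com/b"))
)

# Pitfall: the new string is not an existing factor level, so the cell becomes NA.
# suppressWarnings() only hides the "invalid factor level" warning for this demo.
broken <- links
suppressWarnings(
  broken$AmazonLink[1] <- "[AmazonLink](https://example.com/a)"
)
is.na(broken$AmazonLink[1])

# Remedy: convert the column to character before editing individual cells.
fixed <- links
fixed$AmazonLink <- as.character(fixed$AmazonLink)
fixed$AmazonLink[1] <- paste0("[AmazonLink](", fixed$AmazonLink[1], ")")
fixed$AmazonLink
```

Replacing a whole column at once with `paste0()` also sidesteps the problem, because `paste0()` converts the factor to character and the result overwrites the column rather than a single level.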
JSON is often used as a replacement for XML because it is lightweight; of the three formats (HTML, XML, and JSON) it is the most structured and the most human-readable.
myWorkingDir <- getwd()
myJSONfile <- paste0(myWorkingDir,"/BookList.json")
Show the raw JSON data by parsing the JSON file with several functions, as follows.
Parse the json file with fromJSON
readJSONIO <- RJSONIO::fromJSON(myJSONfile)
readJSONIO
## $BookList
## $BookList[[1]]
## $BookList[[1]]$`Book#`
## [1] "1"
##
## $BookList[[1]]$Topic
## [1] "Machine Learning"
##
## $BookList[[1]]$Title
## [1] "Deep Learning (Adaptive Computation and Machine Learning series)"
##
## $BookList[[1]]$Authors
## [1] "Ian Goodfellow, Yoshua Bengio, Aaron Courville"
##
## $BookList[[1]]$Rating
## [1] "4.8"
##
## $BookList[[1]]$Type
## [1] "HardCover"
##
## $BookList[[1]]$Pages
## [1] 775
##
## $BookList[[1]]$Publisher
## [1] "The MIT Press"
##
## $BookList[[1]]$`Latest Release Date`
## [1] "18-Nov-16"
##
## $BookList[[1]]$`List Price`
## [1] "$28.99"
##
## $BookList[[1]]$BookCover
## [1] "1 - Deep Learning.jpg"
##
## $BookList[[1]]$AmazonLink
## [1] "https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
##
##
## $BookList[[2]]
## $BookList[[2]]$`Book#`
## [1] "2"
##
## $BookList[[2]]$Topic
## [1] "Machine Learning"
##
## $BookList[[2]]$Title
## [1] "Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time"
##
## $BookList[[2]]$Authors
## [1] "Andrew Butler"
##
## $BookList[[2]]$Rating
## [1] "4.2"
##
## $BookList[[2]]$Type
## [1] "PaperBack"
##
## $BookList[[2]]$Pages
## [1] 118
##
## $BookList[[2]]$Publisher
## [1] "CreateSpace Independent Publishing Platform"
##
## $BookList[[2]]$`Latest Release Date`
## [1] "12-Aug-16"
##
## $BookList[[2]]$`List Price`
## [1] "$9.95"
##
## $BookList[[2]]$BookCover
## [1] "2 - Amazon Echo.jpg"
##
## $BookList[[2]]$AmazonLink
## [1] "https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
##
##
## $BookList[[3]]
## $BookList[[3]]$`Book#`
## [1] "3"
##
## $BookList[[3]]$Topic
## [1] "Machine Learning"
##
## $BookList[[3]]$Title
## [1] "An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)"
##
## $BookList[[3]]$Authors
## [1] "Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani"
##
## $BookList[[3]]$Rating
## [1] "4.8"
##
## $BookList[[3]]$Type
## [1] "HardCover"
##
## $BookList[[3]]$Pages
## [1] 426
##
## $BookList[[3]]$Publisher
## [1] "Springer"
##
## $BookList[[3]]$`Latest Release Date`
## [1] "1-Sep-17"
##
## $BookList[[3]]$`List Price`
## [1] "$68.61"
##
## $BookList[[3]]$BookCover
## [1] "3 - An Introduction to Statistical Learning.jpg"
##
## $BookList[[3]]$AmazonLink
## [1] "https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
readJSONlite <- jsonlite::fromJSON(myJSONfile)
readJSONlite
## $BookList
## Book# Topic
## 1 1 Machine Learning
## 2 2 Machine Learning
## 3 3 Machine Learning
## Title
## 1 Deep Learning (Adaptive Computation and Machine Learning series)
## 2 Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## 3 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
## Authors Rating
## 1 Ian Goodfellow, Yoshua Bengio, Aaron Courville 4.8
## 2 Andrew Butler 4.2
## 3 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani 4.8
## Type Pages Publisher
## 1 HardCover 775 The MIT Press
## 2 PaperBack 118 CreateSpace Independent Publishing Platform
## 3 HardCover 426 Springer
## Latest Release Date List Price
## 1 18-Nov-16 $28.99
## 2 12-Aug-16 $9.95
## 3 1-Sep-17 $68.61
## BookCover
## 1 1 - Deep Learning.jpg
## 2 2 - Amazon Echo.jpg
## 3 3 - An Introduction to Statistical Learning.jpg
## AmazonLink
## 1 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 2 https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## 3 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
readJSON <- rjson::fromJSON(file=myJSONfile, method='C')
readJSON
## $BookList
## $BookList[[1]]
## $BookList[[1]]$`Book#`
## [1] "1"
##
## $BookList[[1]]$Topic
## [1] "Machine Learning"
##
## $BookList[[1]]$Title
## [1] "Deep Learning (Adaptive Computation and Machine Learning series)"
##
## $BookList[[1]]$Authors
## [1] "Ian Goodfellow, Yoshua Bengio, Aaron Courville"
##
## $BookList[[1]]$Rating
## [1] "4.8"
##
## $BookList[[1]]$Type
## [1] "HardCover"
##
## $BookList[[1]]$Pages
## [1] 775
##
## $BookList[[1]]$Publisher
## [1] "The MIT Press"
##
## $BookList[[1]]$`Latest Release Date`
## [1] "18-Nov-16"
##
## $BookList[[1]]$`List Price`
## [1] "$28.99"
##
## $BookList[[1]]$BookCover
## [1] "1 - Deep Learning.jpg"
##
## $BookList[[1]]$AmazonLink
## [1] "https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
##
##
## $BookList[[2]]
## $BookList[[2]]$`Book#`
## [1] "2"
##
## $BookList[[2]]$Topic
## [1] "Machine Learning"
##
## $BookList[[2]]$Title
## [1] "Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time"
##
## $BookList[[2]]$Authors
## [1] "Andrew Butler"
##
## $BookList[[2]]$Rating
## [1] "4.2"
##
## $BookList[[2]]$Type
## [1] "PaperBack"
##
## $BookList[[2]]$Pages
## [1] 118
##
## $BookList[[2]]$Publisher
## [1] "CreateSpace Independent Publishing Platform"
##
## $BookList[[2]]$`Latest Release Date`
## [1] "12-Aug-16"
##
## $BookList[[2]]$`List Price`
## [1] "$9.95"
##
## $BookList[[2]]$BookCover
## [1] "2 - Amazon Echo.jpg"
##
## $BookList[[2]]$AmazonLink
## [1] "https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
##
##
## $BookList[[3]]
## $BookList[[3]]$`Book#`
## [1] "3"
##
## $BookList[[3]]$Topic
## [1] "Machine Learning"
##
## $BookList[[3]]$Title
## [1] "An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)"
##
## $BookList[[3]]$Authors
## [1] "Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani"
##
## $BookList[[3]]$Rating
## [1] "4.8"
##
## $BookList[[3]]$Type
## [1] "HardCover"
##
## $BookList[[3]]$Pages
## [1] 426
##
## $BookList[[3]]$Publisher
## [1] "Springer"
##
## $BookList[[3]]$`Latest Release Date`
## [1] "1-Sep-17"
##
## $BookList[[3]]$`List Price`
## [1] "$68.61"
##
## $BookList[[3]]$BookCover
## [1] "3 - An Introduction to Statistical Learning.jpg"
##
## $BookList[[3]]$AmazonLink
## [1] "https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G"
Show the JSON data loaded by the approaches above in several tabular formats.
df_readJSONIO <- data.frame(readJSONIO)
df_readJSONlite <- as.data.frame(readJSONlite)
df_readJSON <- ldply(readJSON, data.frame)
image_to_df <- function(dataframe) {
  # Wrap each Amazon URL in a Markdown link labelled "AmazonLink"
  dataframe$BookList.AmazonLink <- paste0("[AmazonLink](", dataframe$BookList.AmazonLink, ")")
  # Wrap each cover file name in Markdown link syntax with an empty label
  dataframe$BookList.BookCover <- paste0("[](", dataframe$BookList.BookCover, ")")
  return(dataframe)
}
#df_readJSONIO <- image_to_df(df_readJSONIO)
df_readJSONlite <- image_to_df(df_readJSONlite)
#df_readJSON <- image_to_df(df_readJSON)
kable(df_readJSONlite)
| BookList.Book. | BookList.Topic | BookList.Title | BookList.Authors | BookList.Rating | BookList.Type | BookList.Pages | BookList.Publisher | BookList.Latest.Release.Date | BookList.List.Price | BookList.BookCover | BookList.AmazonLink |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Machine Learning | Deep Learning (Adaptive Computation and Machine Learning series) | Ian Goodfellow, Yoshua Bengio, Aaron Courville | 4.8 | HardCover | 775 | The MIT Press | 18-Nov-16 | $28.99 | | AmazonLink |
| 2 | Machine Learning | Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time | Andrew Butler | 4.2 | PaperBack | 118 | CreateSpace Independent Publishing Platform | 12-Aug-16 | $9.95 | | AmazonLink |
| 3 | Machine Learning | An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | 4.8 | HardCover | 426 | Springer | 1-Sep-17 | $68.61 | | AmazonLink |
DT::datatable(df_readJSONlite, options = list(pageLength = 5))
DT::datatable(select(df_readJSONlite, BookList.Topic:BookList.List.Price), options = list(pageLength = 5))
sqldf("select * from df_readJSON")
## .id Book. Topic
## 1 BookList 1 Machine Learning
## Title
## 1 Deep Learning (Adaptive Computation and Machine Learning series)
## Authors Rating Type Pages
## 1 Ian Goodfellow, Yoshua Bengio, Aaron Courville 4.8 HardCover 775
## Publisher Latest.Release.Date List.Price BookCover
## 1 The MIT Press 18-Nov-16 $28.99 1 - Deep Learning.jpg
## AmazonLink
## 1 https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## Book..1 Topic.1
## 1 2 Machine Learning
## Title.1
## 1 Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time
## Authors.1 Rating.1 Type.1 Pages.1
## 1 Andrew Butler 4.2 PaperBack 118
## Publisher.1 Latest.Release.Date.1
## 1 CreateSpace Independent Publishing Platform 12-Aug-16
## List.Price.1 BookCover.1
## 1 $9.95 2 - Amazon Echo.jpg
## AmazonLink.1
## 1 https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
## Book..2 Topic.2
## 1 3 Machine Learning
## Title.2
## 1 An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
## Authors.2 Rating.2
## 1 Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani 4.8
## Type.2 Pages.2 Publisher.2 Latest.Release.Date.2 List.Price.2
## 1 HardCover 426 Springer 1-Sep-17 $68.61
## BookCover.2
## 1 3 - An Introduction to Statistical Learning.jpg
## AmazonLink.2
## 1 https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G
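The output above is a single row whose columns repeat with .1 and .2 suffixes: when data.frame() is applied to a list of records, it binds the records side by side instead of stacking them. The following is a minimal base-R sketch of the difference, using hypothetical records with only three fields and shortened titles rather than the real parsed JSON.

```r
# Hypothetical records shaped like the parsed JSON (fields and titles shortened).
records <- list(
  list(Title = "Deep Learning", Pages = 775, Rating = "4.8"),
  list(Title = "Amazon Echo", Pages = 118, Rating = "4.2"),
  list(Title = "An Introduction to Statistical Learning", Pages = 426, Rating = "4.8")
)

# data.frame() on the whole list places the records side by side in one row.
wide <- data.frame(records, stringsAsFactors = FALSE)
dim(wide)

# Converting each record to a one-row data frame and stacking them
# gives one row per book instead.
tall <- do.call(
  rbind,
  lapply(records, function(r) data.frame(r, stringsAsFactors = FALSE))
)
dim(tall)
```

jsonlite::fromJSON performs this simplification automatically, which is why readJSONlite above already prints as a three-row table while the RJSONIO and rjson results remain nested lists.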
knitr::kable(df_readJSONIO, format = "html")
| BookList.Book. | BookList.Topic | BookList.Title | BookList.Authors | BookList.Rating | BookList.Type | BookList.Pages | BookList.Publisher | BookList.Latest.Release.Date | BookList.List.Price | BookList.BookCover | BookList.AmazonLink | BookList.Book..1 | BookList.Topic.1 | BookList.Title.1 | BookList.Authors.1 | BookList.Rating.1 | BookList.Type.1 | BookList.Pages.1 | BookList.Publisher.1 | BookList.Latest.Release.Date.1 | BookList.List.Price.1 | BookList.BookCover.1 | BookList.AmazonLink.1 | BookList.Book..2 | BookList.Topic.2 | BookList.Title.2 | BookList.Authors.2 | BookList.Rating.2 | BookList.Type.2 | BookList.Pages.2 | BookList.Publisher.2 | BookList.Latest.Release.Date.2 | BookList.List.Price.2 | BookList.BookCover.2 | BookList.AmazonLink.2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Machine Learning | Deep Learning (Adaptive Computation and Machine Learning series) | Ian Goodfellow, Yoshua Bengio, Aaron Courville | 4.8 | HardCover | 775 | The MIT Press | 18-Nov-16 | $28.99 | 1 - Deep Learning.jpg | https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=zg_bs_3887_1?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G | 2 | Machine Learning | Amazon Echo: The Ultimate Guide to Learn Amazon Echo In No Time | Andrew Butler | 4.2 | PaperBack | 118 | CreateSpace Independent Publishing Platform | 12-Aug-16 | $9.95 | 2 - Amazon Echo.jpg | https://www.amazon.com/Amazon-Echo-Ultimate-services-internet/dp/1536822043/ref=zg_bs_3887_2?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G | 3 | Machine Learning | An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | 4.8 | HardCover | 426 | Springer | 1-Sep-17 | $68.61 | 3 - An Introduction to Statistical Learning.jpg | https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=zg_bs_3887_18?_encoding=UTF8&psc=1&refRID=9HK33PPS16VDZN3B1N8G |
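On the question of whether the data frames are identical: the output above already shows visible differences. The XML version has no Book# column, names its columns ListPrice and image, and carries shorter Amazon URLs, while the jsonlite version prefixes every column name with BookList. For a programmatic check, base R offers identical() for an exact comparison and all.equal() for a description of the differences. The following is a minimal sketch with two small hypothetical stand-in data frames (not the real ones) that differ only in the type of Pages.

```r
# Hypothetical stand-ins: same values, but Pages is text in one and integer in the other.
from_html <- data.frame(
  Title = c("Deep Learning", "Amazon Echo"),
  Pages = c("775", "118"),
  stringsAsFactors = FALSE
)
from_json <- data.frame(
  Title = c("Deep Learning", "Amazon Echo"),
  Pages = c(775L, 118L),
  stringsAsFactors = FALSE
)

# Exact comparison: FALSE because the column types differ.
same_before <- identical(from_html, from_json)

# all.equal() returns TRUE or a character vector describing each difference.
differences <- all.equal(from_html, from_json)
print(differences)

# After harmonising the column type, the two data frames match exactly.
from_html$Pages <- as.integer(from_html$Pages)
same_after <- identical(from_html, from_json)
```

The same pattern applies to the real data frames once their column names, column order and types have been aligned.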