Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.
Take the information that you’ve selected about these three books, and separately create three files which store the books’ information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”).
To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats.
Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?
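library(XML)        # xmlTreeParse(), xmlSApply(), xmlValue(), readHTMLTable()
library(dplyr)      # %>%, slice()
library(tibble)     # rownames_to_column(), as_tibble()
library(knitr)      # kable()
library(kableExtra) # kable_styling(), scroll_box()
xmlBooks <- xmlTreeParse("books.xml")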
As the class check below shows, xmlTreeParse() returns an XMLDocument object rather than a data frame.
xmlBooks %>% class()
## [1] "XMLDocument" "XMLAbstractDocument"
Printing the object shows that the books are still nested XML nodes, not rows of a table.
xmlBooks
## $doc
## $file
## [1] "books.xml"
##
## $version
## [1] "1.0"
##
## $children
## $children$catalog
## <catalog>
## <book id="bk101">
## <author>Gambardella, Matthew</author>
## <title>XML Developer's Guide</title>
## <genre>Computer</genre>
## <price>44.95</price>
## <publish_date>2000-10-01</publish_date>
## <description>An in-depth look at creating applications
## with XML.</description>
## </book>
## <book id="bk102">
## <author>Ralls, Kim</author>
## <title>Midnight Rain</title>
## <genre>Fantasy</genre>
## <price>5.95</price>
## <publish_date>2000-12-16</publish_date>
## <description>A former architect battles corporate zombies,
## an evil sorceress, and her own childhood to become queen
## of the world.</description>
## </book>
## <book id="bk103">
## <author>Corets, Eva</author>
## <title>Maeve Ascendant</title>
## <genre>Fantasy</genre>
## <price>5.95</price>
## <publish_date>2000-11-17</publish_date>
## <description>After the collapse of a nanotechnology
## society in England, the young survivors lay the
## foundation for a new society.</description>
## </book>
## <book id="bk104">
## <author>Corets, Eva</author>
## <title>Oberon's Legacy</title>
## <genre>Fantasy</genre>
## <price>5.95</price>
## <publish_date>2001-03-10</publish_date>
## <description>In post-apocalypse England, the mysterious
## agent known only as Oberon helps to create a new life
## for the inhabitants of London. Sequel to Maeve
## Ascendant.</description>
## </book>
## <book id="bk105">
## <author>Corets, Eva</author>
## <title>The Sundered Grail</title>
## <genre>Fantasy</genre>
## <price>5.95</price>
## <publish_date>2001-09-10</publish_date>
## <description>The two daughters of Maeve, half-sisters,
## battle one another for control of England. Sequel to
## Oberon's Legacy.</description>
## </book>
## <book id="bk106">
## <author>Randall, Cynthia</author>
## <title>Lover Birds</title>
## <genre>Romance</genre>
## <price>4.95</price>
## <publish_date>2000-09-02</publish_date>
## <description>When Carla meets Paul at an ornithology
## conference, tempers fly as feathers get ruffled.</description>
## </book>
## <book id="bk107">
## <author>Thurman, Paula</author>
## <title>Splish Splash</title>
## <genre>Romance</genre>
## <price>4.95</price>
## <publish_date>2000-11-02</publish_date>
## <description>A deep sea diver finds true love twenty
## thousand leagues beneath the sea.</description>
## </book>
## <book id="bk108">
## <author>Knorr, Stefan</author>
## <title>Creepy Crawlies</title>
## <genre>Horror</genre>
## <price>4.95</price>
## <publish_date>2000-12-06</publish_date>
## <description>An anthology of horror stories about roaches,
## centipedes, scorpions and other insects.</description>
## </book>
## <book id="bk109">
## <author>Kress, Peter</author>
## <title>Paradox Lost</title>
## <genre>Science Fiction</genre>
## <price>6.95</price>
## <publish_date>2000-11-02</publish_date>
## <description>After an inadvertant trip through a Heisenberg
## Uncertainty Device, James Salway discovers the problems
## of being quantum.</description>
## </book>
## <book id="bk110">
## <author>O'Brien, Tim</author>
## <title>Microsoft .NET: The Programming Bible</title>
## <genre>Computer</genre>
## <price>36.95</price>
## <publish_date>2000-12-09</publish_date>
## <description>Microsoft's .NET initiative is explored in
## detail in this deep programmer's reference.</description>
## </book>
## <book id="bk111">
## <author>O'Brien, Tim</author>
## <title>MSXML3: A Comprehensive Guide</title>
## <genre>Computer</genre>
## <price>36.95</price>
## <publish_date>2000-12-01</publish_date>
## <description>The Microsoft MSXML3 parser is covered in
## detail, with attention to XML DOM interfaces, XSLT processing,
## SAX and more.</description>
## </book>
## <book id="bk112">
## <author>Galos, Mike</author>
## <title>Visual Studio 7: A Comprehensive Guide</title>
## <genre>Computer</genre>
## <price>49.95</price>
## <publish_date>2001-04-16</publish_date>
## <description>Microsoft Visual Studio 7 is explored in depth,
## looking at how Visual Basic, Visual C++, C#, and ASP+ are
## integrated into a comprehensive development
## environment.</description>
## </book>
## </catalog>
##
##
## attr(,"class")
## [1] "XMLDocumentContent"
##
## $dtd
## $external
## NULL
##
## $internal
## NULL
##
## attr(,"class")
## [1] "DTDList"
##
## attr(,"class")
## [1] "XMLDocument" "XMLAbstractDocument"
# Extract the text of every child node of every <book>: the result is a matrix
# with one row per field (author, title, ...) and one column per book
xmlTop <- xmlSApply(xmlBooks, function(x) xmlSApply(x, xmlValue))
book_dt <- data.frame(xmlTop)
# Move the row names (the field names) into a column
book_dt <- book_dt %>% rownames_to_column()
# Transpose so that each book becomes a row, then drop the first row,
# which holds the field names rather than data
bookXml_dt <- book_dt %>% t() %>% as_tibble() %>% slice(2:n())
# Use the field names as column headers
headers <- book_dt[, 1]
names(bookXml_dt) <- headers
bookXml_dt %>%
  kable() %>%
  kable_styling(full_width = TRUE) %>%
  scroll_box(width = "300")
| author | title | genre | price | publish_date | description |
|---|---|---|---|---|---|
| Gambardella, Matthew | XML Developer’s Guide | Computer | 44.95 | 2000-10-01 | An in-depth look at creating applications with XML. |
| Ralls, Kim | Midnight Rain | Fantasy | 5.95 | 2000-12-16 | A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world. |
| Corets, Eva | Maeve Ascendant | Fantasy | 5.95 | 2000-11-17 | After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society. |
| Corets, Eva | Oberon’s Legacy | Fantasy | 5.95 | 2001-03-10 | In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant. |
| Corets, Eva | The Sundered Grail | Fantasy | 5.95 | 2001-09-10 | The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy. |
| Randall, Cynthia | Lover Birds | Romance | 4.95 | 2000-09-02 | When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled. |
| Thurman, Paula | Splish Splash | Romance | 4.95 | 2000-11-02 | A deep sea diver finds true love twenty thousand leagues beneath the sea. |
| Knorr, Stefan | Creepy Crawlies | Horror | 4.95 | 2000-12-06 | An anthology of horror stories about roaches, centipedes, scorpions and other insects. |
| Kress, Peter | Paradox Lost | Science Fiction | 6.95 | 2000-11-02 | After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum. |
| O’Brien, Tim | Microsoft .NET: The Programming Bible | Computer | 36.95 | 2000-12-09 | Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference. |
| O’Brien, Tim | MSXML3: A Comprehensive Guide | Computer | 36.95 | 2000-12-01 | The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more. |
| Galos, Mike | Visual Studio 7: A Comprehensive Guide | Computer | 49.95 | 2001-04-16 | Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C++, C#, and ASP+ are integrated into a comprehensive development environment. |
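Because xmlSApply() keeps the field names as the row names of xmlTop, transposing the matrix directly may be enough to get one row per book, without the rownames_to_column() and slice() steps. The sketch below assumes exactly that layout; field_matrix is a hypothetical two-book stand-in for xmlTop, not the real parse result. It also converts price from text to a number, which the XML route above leaves as text.

```r
# Hypothetical stand-in for xmlTop: one row per field, one column per book
# (only two books and three fields, filled column by column)
field_matrix <- matrix(
  c("Gambardella, Matthew", "XML Developer's Guide", "44.95",
    "Ralls, Kim",           "Midnight Rain",         "5.95"),
  nrow = 3,
  dimnames = list(c("author", "title", "price"), NULL)
)

# Transpose so each book becomes a row and the field names become columns
xml_books <- as.data.frame(t(field_matrix), stringsAsFactors = FALSE)

# The values arrive as text; convert price to a number
xml_books$price <- as.numeric(xml_books$price)

xml_books
```

With the real data, the same two lines would be applied to xmlTop itself.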
jsonBooks <- fromJSON("books.json")
As the class check below shows, fromJSON() returns a nested R list rather than a data frame.
jsonBooks %>% class()
## [1] "list"
Printing it shows one named character vector per book, nested under catalog and book, so the list still has to be flattened.
jsonBooks
## $catalog
## $catalog$book
## $catalog$book[[1]]
## author
## "Gambardella, Matthew"
## title
## "XML Developer's Guide"
## genre
## "Computer"
## price
## "44.95"
## publish_date
## "2000-10-01"
## description
## "An in-depth look at creating applications \n with XML."
##
## $catalog$book[[2]]
## author
## "Ralls, Kim"
## title
## "Midnight Rain"
## genre
## "Fantasy"
## price
## "5.95"
## publish_date
## "2000-12-16"
## description
## "A former architect battles corporate zombies, \n an evil sorceress, and her own childhood to become queen \n of the world."
##
## $catalog$book[[3]]
## author
## "Corets, Eva"
## title
## "Maeve Ascendant"
## genre
## "Fantasy"
## price
## "5.95"
## publish_date
## "2000-11-17"
## description
## "After the collapse of a nanotechnology \n society in England, the young survivors lay the \n foundation for a new society."
##
## $catalog$book[[4]]
## author
## "Corets, Eva"
## title
## "Oberon's Legacy"
## genre
## "Fantasy"
## price
## "5.95"
## publish_date
## "2001-03-10"
## description
## "In post-apocalypse England, the mysterious \n agent known only as Oberon helps to create a new life \n for the inhabitants of London. Sequel to Maeve \n Ascendant."
##
## $catalog$book[[5]]
## author
## "Corets, Eva"
## title
## "The Sundered Grail"
## genre
## "Fantasy"
## price
## "5.95"
## publish_date
## "2001-09-10"
## description
## "The two daughters of Maeve, half-sisters, \n battle one another for control of England. Sequel to \n Oberon's Legacy."
##
## $catalog$book[[6]]
## author
## "Randall, Cynthia"
## title
## "Lover Birds"
## genre
## "Romance"
## price
## "4.95"
## publish_date
## "2000-09-02"
## description
## "When Carla meets Paul at an ornithology \n conference, tempers fly as feathers get ruffled."
##
## $catalog$book[[7]]
## author
## "Thurman, Paula"
## title
## "Splish Splash"
## genre
## "Romance"
## price
## "4.95"
## publish_date
## "2000-11-02"
## description
## "A deep sea diver finds true love twenty \n thousand leagues beneath the sea."
##
## $catalog$book[[8]]
## author
## "Knorr, Stefan"
## title
## "Creepy Crawlies"
## genre
## "Horror"
## price
## "4.95"
## publish_date
## "2000-12-06"
## description
## "An anthology of horror stories about roaches,\n centipedes, scorpions and other insects."
##
## $catalog$book[[9]]
## author
## "Kress, Peter"
## title
## "Paradox Lost"
## genre
## "Science Fiction"
## price
## "6.95"
## publish_date
## "2000-11-02"
## description
## "After an inadvertant trip through a Heisenberg\n Uncertainty Device, James Salway discovers the problems \n of being quantum."
##
## $catalog$book[[10]]
## author
## "O'Brien, Tim"
## title
## "Microsoft .NET: The Programming Bible"
## genre
## "Computer"
## price
## "36.95"
## publish_date
## "2000-12-09"
## description
## "Microsoft's .NET initiative is explored in \n detail in this deep programmer's reference."
##
## $catalog$book[[11]]
## author
## "O'Brien, Tim"
## title
## "MSXML3: A Comprehensive Guide"
## genre
## "Computer"
## price
## "36.95"
## publish_date
## "2000-12-01"
## description
## "The Microsoft MSXML3 parser is covered in \n detail, with attention to XML DOM interfaces, XSLT processing, \n SAX and more."
##
## $catalog$book[[12]]
## author
## "Galos, Mike"
## title
## "Visual Studio 7: A Comprehensive Guide"
## genre
## "Computer"
## price
## "49.95"
## publish_date
## "2001-04-16"
## description
## "Microsoft Visual Studio 7 is explored in depth,\n looking at how Visual Basic, Visual C++, C#, and ASP+ are \n integrated into a comprehensive development\n environment."
# Flatten the nested list into a data frame. jsonBooks has a single element
# (catalog), so lapply() runs once over all twelve books
df <- lapply(jsonBooks, function(x)
{
  # unlist() strings all field values together; refold them into a matrix
  # with one row per book. This assumes every book has exactly 6 fields,
  # always in the same order
  data.frame(matrix(unlist(x), ncol = 6, byrow = TRUE))
})
# Bind the resulting data frames into one; the list name "catalog"
# becomes the row-name prefix (catalog.1, catalog.2, ...)
jsonBooks_dt <- do.call(rbind, df)
names(jsonBooks_dt) <- headers
jsonBooks_dt %>%
  kable() %>%
  kable_styling(full_width = TRUE) %>%
  scroll_box(width = "300")
|  | author | title | genre | price | publish_date | description |
|---|---|---|---|---|---|---|
| catalog.1 | Gambardella, Matthew | XML Developer’s Guide | Computer | 44.95 | 2000-10-01 | An in-depth look at creating applications with XML. |
| catalog.2 | Ralls, Kim | Midnight Rain | Fantasy | 5.95 | 2000-12-16 | A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world. |
| catalog.3 | Corets, Eva | Maeve Ascendant | Fantasy | 5.95 | 2000-11-17 | After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society. |
| catalog.4 | Corets, Eva | Oberon’s Legacy | Fantasy | 5.95 | 2001-03-10 | In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant. |
| catalog.5 | Corets, Eva | The Sundered Grail | Fantasy | 5.95 | 2001-09-10 | The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy. |
| catalog.6 | Randall, Cynthia | Lover Birds | Romance | 4.95 | 2000-09-02 | When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled. |
| catalog.7 | Thurman, Paula | Splish Splash | Romance | 4.95 | 2000-11-02 | A deep sea diver finds true love twenty thousand leagues beneath the sea. |
| catalog.8 | Knorr, Stefan | Creepy Crawlies | Horror | 4.95 | 2000-12-06 | An anthology of horror stories about roaches, centipedes, scorpions and other insects. |
| catalog.9 | Kress, Peter | Paradox Lost | Science Fiction | 6.95 | 2000-11-02 | After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum. |
| catalog.10 | O’Brien, Tim | Microsoft .NET: The Programming Bible | Computer | 36.95 | 2000-12-09 | Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference. |
| catalog.11 | O’Brien, Tim | MSXML3: A Comprehensive Guide | Computer | 36.95 | 2000-12-01 | The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more. |
| catalog.12 | Galos, Mike | Visual Studio 7: A Comprehensive Guide | Computer | 49.95 | 2001-04-16 | Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C++, C#, and ASP+ are integrated into a comprehensive development environment. |
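A shorter route may also work for the JSON list. Since each element of jsonBooks$catalog$book is a named character vector (as the printout shows), rbind() can stack them and reuse the names as column headers, so neither the hard-coded ncol = 6 nor the headers object from the XML step is needed, and no catalog.N row names appear. The sketch assumes that structure; book_list is a hypothetical two-book stand-in for the real list.

```r
# Hypothetical stand-in for jsonBooks$catalog$book: one named character
# vector per book (two books, three fields shown)
book_list <- list(
  c(author = "Gambardella, Matthew", title = "XML Developer's Guide", price = "44.95"),
  c(author = "Ralls, Kim",           title = "Midnight Rain",         price = "5.95")
)

# rbind() stacks the vectors into a character matrix and takes the column
# names from the first vector's names, so no separate header step is needed
json_books <- as.data.frame(do.call(rbind, book_list), stringsAsFactors = FALSE)

json_books
```

With the real data this would be applied to jsonBooks$catalog$book, the list returned by fromJSON() before it is flattened. Note that rbind() matches the vectors by position, not by name, so it still assumes every book lists its fields in the same order.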
htmlBooks <- readHTMLTable("bookshtml.html")
As the class check below shows, readHTMLTable() returns a list with one data frame per table found in the page.
htmlBooks %>% class()
## [1] "list"
No tidying is needed: the first element of the list is already a data frame with one row per book.
htmlBooks[[1]] %>%
kable() %>%
kable_styling(full_width = TRUE) %>%
scroll_box(width = "300")
| author | title | genre | price | publish_date | description |
|---|---|---|---|---|---|
| Gambardella, Matthew | XML Developer’s Guide | Computer | 44.95 | 2000-10-01 | An in-depth look at creating applications with XML. |
| Ralls, Kim | Midnight Rain | Fantasy | 5.95 | 2000-12-16 | A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world. |
| Corets, Eva | Maeve Ascendant | Fantasy | 5.95 | 2000-11-17 | After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society. |
| Corets, Eva | Oberon’s Legacy | Fantasy | 5.95 | 2001-03-10 | In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant. |
| Corets, Eva | The Sundered Grail | Fantasy | 5.95 | 2001-09-10 | The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy. |
| Randall, Cynthia | Lover Birds | Romance | 4.95 | 2000-09-02 | When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled. |
| Thurman, Paula | Splish Splash | Romance | 4.95 | 2000-11-02 | A deep sea diver finds true love twenty thousand leagues beneath the sea. |
| Knorr, Stefan | Creepy Crawlies | Horror | 4.95 | 2000-12-06 | An anthology of horror stories about roaches, centipedes, scorpions and other insects. |
| Kress, Peter | Paradox Lost | Science Fiction | 6.95 | 2000-11-02 | After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum. |
| O’Brien, Tim | Microsoft .NET: The Programming Bible | Computer | 36.95 | 2000-12-09 | Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference. |
| O’Brien, Tim | MSXML3: A Comprehensive Guide | Computer | 36.95 | 2000-12-01 | The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more. |
| Galos, Mike | Visual Studio 7: A Comprehensive Guide | Computer | 49.95 | 2001-04-16 | Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C++, C#, and ASP+ are integrated into a comprehensive development environment. |
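All three sources yield the same values, but the three data frames are not strictly identical. xmlTreeParse() returns an XMLDocument object, while fromJSON() and readHTMLTable() return lists. The XML and JSON results had to be reshaped (transposed or flattened) and given column headers before they formed a data frame, whereas the HTML table was read directly as one. The resulting objects also differ in their details: the XML version is a tibble, the JSON version carries the row names catalog.1 to catalog.12, and in neither of those two is price stored as a number.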
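To answer the question in code, one option is to compare the data frames with identical() after normalizing the differences that do not reflect content. The sketch below uses two hypothetical miniature data frames: one shaped like the HTML result, and one with catalog.N row names and a factor column, as a JSON-derived frame might have. normalize() is a helper introduced here for illustration, not part of any package.

```r
# Hypothetical miniature versions of two results: same values,
# different row names and column types
from_html <- data.frame(author = c("Gambardella, Matthew", "Ralls, Kim"),
                        price  = c("44.95", "5.95"),
                        stringsAsFactors = FALSE)
from_json <- data.frame(author = factor(c("Gambardella, Matthew", "Ralls, Kim")),
                        price  = c("44.95", "5.95"),
                        row.names = c("catalog.1", "catalog.2"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: strip differences that do not reflect content
normalize <- function(df) {
  df <- as.data.frame(df)           # turns a tibble into a plain data.frame
  rownames(df) <- NULL              # drop labels such as catalog.1
  df[] <- lapply(df, as.character)  # factors and numbers become text
  df
}

identical(from_html, from_json)                        # FALSE
identical(normalize(from_html), normalize(from_json))  # TRUE
```

Applied to the three real objects, the same helper would also need their column names to match, which the header assignment in each step above takes care of.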