About the Assignment

This assignment uses three of my favorite books on one of my favorite subjects, each with more than one author. For each book, the title, authors, genre, and publication date were stored in three formats (an HTML table, an XML file, and a JSON file), and each file was then read into R as a data frame.

HTML Table

Title	Author	Genre	Year Published
Good Omens	Terry Pratchett, Neil Gaiman	Mystery	5/10/1990
Heads You Lose	Lisa Lutz, David Hayward	Homorous Fiction	4/5/2011
Between The Lines	Jodi Picoult, Samantha van Leer	Fantasy Fiction	6/26/2012

Read Books.html

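library(XML)  # provides readHTMLTable()

# Local path to the HTML file; adjust it for your own machine.
url <- "file:///C:/Users/newma/OneDrive/Desktop/MSDS%20Fall%202021/DATA%20607%20-%20Data%20Acquisition%20and%20Mgt/html%20course/Book.html"
# Read the first table in the document into a data frame.
Myhtml <- readHTMLTable(url, which = 1)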

#Display class
class(Myhtml)

## [1] "data.frame"

#Display data
Myhtml

##               Title                          Author            Genre
## 1        Good Omens    Terry Pratchett, Neil Gaiman          Mystery
## 2    Heads You Lose        Lisa Lutz, David Hayward Homorous Fiction
## 3 Between The Lines Jodi Picoult, Samantha van Leer  Fantasy Fiction
##   Year Published
## 1      5/10/1990
## 2       4/5/2011
## 3      6/26/2012
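The file path above only resolves on the author's machine. As a minimal, reproducible sketch, the same table can be parsed from an inline string instead. The html_text variable and its markup are assumptions rebuilt from the rows shown above; they are not the contents of the original Book.html file.

```r
library(XML)

# Assumed markup: a plain <table> rebuilt from the rows shown above
# (spelling of the genres kept as in the source data).
html_text <- paste0(
  "<table>",
  "<tr><th>Title</th><th>Author</th><th>Genre</th><th>Year Published</th></tr>",
  "<tr><td>Good Omens</td><td>Terry Pratchett, Neil Gaiman</td>",
  "<td>Mystery</td><td>5/10/1990</td></tr>",
  "<tr><td>Heads You Lose</td><td>Lisa Lutz, David Hayward</td>",
  "<td>Homorous Fiction</td><td>4/5/2011</td></tr>",
  "<tr><td>Between The Lines</td><td>Jodi Picoult, Samantha van Leer</td>",
  "<td>Fantasy Fiction</td><td>6/26/2012</td></tr>",
  "</table>"
)

# Parse the string as HTML (asText = TRUE), then extract the first table,
# treating the first row as the header.
doc <- htmlParse(html_text, asText = TRUE)
books_html <- readHTMLTable(doc, header = TRUE, which = 1,
                            stringsAsFactors = FALSE)

# Inspect the column names and types of the result.
str(books_html)
```

Passing stringsAsFactors = FALSE keeps every column as character, which makes later comparisons with the JSON and XML results simpler.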

JSON Format

[
  {
    "title": "Good Omens",
    "authors": ["Neil Gaiman", "Terry Pratchett"],
    "Genre": "Homorous Fiction",
    "Year_Published": "5/10/1990"
  },
  {
    "title": "Heads You Lose",
    "authors": ["Lisa Lutz", "David Hayward"],
    "Genre": "Mystery",
    "Year_Published": "4/5/2011"
  },
  {
    "title": "Between The Lines",
    "authors": ["Jodi Picoult", "Samantha van Leer"],
    "Genre": "Fantasy Fiction",
    "Year_Published": "6/26/2012"
  }
]

Read Books.json

library(jsonlite)  # provides fromJSON()

# Read the JSON file from its URL into a data frame.
Myjson <- fromJSON(txt = "https://raw.githubusercontent.com/nnaemeka-git/global-datasets/main/Books.json")

# Display the class
class(Myjson)

## [1] "data.frame"

# Printing the result.
Myjson

##               title                         authors            Genre
## 1        Good Omens    Neil Gaiman, Terry Pratchett Homorous Fiction
## 2    Heads You Lose        Lisa Lutz, David Hayward          Mystery
## 3 Between The Lines Jodi Picoult, Samantha van Leer  Fantasy Fiction
##   Year_Published
## 1      5/10/1990
## 2       4/5/2011
## 3      6/26/2012
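Because each "authors" entry in the JSON is an array, the authors column printed above is most likely a list column (one character vector per book) rather than a plain character column. The sketch below parses an inline copy of the JSON and collapses that column into single strings; the names json_text and books_json are illustrative assumptions.

```r
library(jsonlite)

# Inline copy of the JSON shown above, so the example needs no network access.
json_text <- '[
  {"title": "Good Omens", "authors": ["Neil Gaiman", "Terry Pratchett"],
   "Genre": "Homorous Fiction", "Year_Published": "5/10/1990"},
  {"title": "Heads You Lose", "authors": ["Lisa Lutz", "David Hayward"],
   "Genre": "Mystery", "Year_Published": "4/5/2011"},
  {"title": "Between The Lines", "authors": ["Jodi Picoult", "Samantha van Leer"],
   "Genre": "Fantasy Fiction", "Year_Published": "6/26/2012"}
]'

books_json <- fromJSON(json_text)

# Check how the array-valued field was stored.
print(class(books_json$authors))

# Collapse each author vector into one comma-separated string so the column
# becomes an ordinary character vector.
books_json$authors <- vapply(books_json$authors, paste, character(1),
                             collapse = ", ")
print(books_json$authors)
```

After this step the JSON result has the same one-string-per-book author layout as the HTML table.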

XML Format

<Person>


<book>
    <title>Good Omens</title>
    <first_author>Neil Gaiman</first_author>
    <second_author>Terry Pratchett</second_author>
    <Genre>Homorous Fiction</Genre>
    <Year_Published>5/10/1990</Year_Published>
</book>
<book>
    <title>Heads You Lose</title>
    <first_author>Lisa Lutz</first_author>
    <second_author>David Hayward</second_author>
    <Genre>Mystery</Genre>
    <Year_Published>4/5/2011</Year_Published>
</book>
<book>
    <title>Between The Lines</title>
    <first_author>Jodi Picoult</first_author>
    <second_author>Samantha van Leer</second_author>
    <Genre>Fantasy Fiction</Genre>
    <Year_Published>6/26/2012</Year_Published>
</book>


</Person>

Read Books.xml

library(XML)  # provides xmlToDataFrame()

# Local path to the XML file; adjust it for your own machine.
url <- "file:///C:/Users/newma/OneDrive/Desktop/MSDS%20Fall%202021/DATA%20607%20-%20Data%20Acquisition%20and%20Mgt/html%20course/Books.xml"

# Convert each <book> node under the root element into one data frame row.
Myxml <- xmlToDataFrame(url)

#Display class
class(Myxml)

## [1] "data.frame"

#Printing the dataframe  
print(Myxml)

##               title first_author     second_author            Genre
## 1        Good Omens  Neil Gaiman   Terry Pratchett Homorous Fiction
## 2    Heads You Lose    Lisa Lutz     David Hayward          Mystery
## 3 Between The Lines Jodi Picoult Samantha van Leer  Fantasy Fiction
##   Year_Published
## 1      5/10/1990
## 2       4/5/2011
## 3      6/26/2012
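The XML result can be brought closer to the other two by merging the two author columns. The following is a sketch only: it parses an inline copy of the XML shown above, and the names xml_text and books_xml, as well as the choice of a comma separator, are assumptions.

```r
library(XML)

# Inline copy of the XML shown above, so the example does not need the local file.
xml_text <- "<Person>
<book><title>Good Omens</title><first_author>Neil Gaiman</first_author>
<second_author>Terry Pratchett</second_author><Genre>Homorous Fiction</Genre>
<Year_Published>5/10/1990</Year_Published></book>
<book><title>Heads You Lose</title><first_author>Lisa Lutz</first_author>
<second_author>David Hayward</second_author><Genre>Mystery</Genre>
<Year_Published>4/5/2011</Year_Published></book>
<book><title>Between The Lines</title><first_author>Jodi Picoult</first_author>
<second_author>Samantha van Leer</second_author><Genre>Fantasy Fiction</Genre>
<Year_Published>6/26/2012</Year_Published></book>
</Person>"

# Parse the string as XML, then turn each <book> node into a row.
doc <- xmlParse(xml_text, asText = TRUE)
books_xml <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Merge the two author columns into one and keep the four shared fields.
books_xml$authors <- paste(books_xml$first_author, books_xml$second_author,
                           sep = ", ")
books_xml <- books_xml[, c("title", "authors", "Genre", "Year_Published")]

print(books_xml)
```

With the authors merged, the XML data frame has the same four columns as the JSON data frame.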

Miscellaneous Question

Are the three data frames identical?

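Not exactly. The data frames from the HTML table and the JSON file have the same shape (three rows, four columns), but they are not identical: the column names differ (Title, Author, Year Published versus title, authors, Year_Published), the JSON authors column appears to hold a vector of names per book rather than a single string, the authors of Good Omens are listed in a different order, and the genres recorded for Good Omens and Heads You Lose are swapped between the two source files. The data frame from the XML file differs further: it has five columns because each author is stored in a separate column (first_author and second_author).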
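A comparison like this can be checked in code rather than by eye. The sketch below uses base R only and rebuilds two of the data frames by hand from the printed outputs above; that is an assumption, since the real objects may differ in column types.

```r
# Data frames rebuilt by hand from the printed outputs above (an assumption).
books_html <- data.frame(
  Title  = c("Good Omens", "Heads You Lose", "Between The Lines"),
  Author = c("Terry Pratchett, Neil Gaiman", "Lisa Lutz, David Hayward",
             "Jodi Picoult, Samantha van Leer"),
  Genre  = c("Mystery", "Homorous Fiction", "Fantasy Fiction"),
  `Year Published` = c("5/10/1990", "4/5/2011", "6/26/2012"),
  check.names = FALSE, stringsAsFactors = FALSE
)

books_json <- data.frame(
  title   = c("Good Omens", "Heads You Lose", "Between The Lines"),
  authors = c("Neil Gaiman, Terry Pratchett", "Lisa Lutz, David Hayward",
              "Jodi Picoult, Samantha van Leer"),
  Genre   = c("Homorous Fiction", "Mystery", "Fantasy Fiction"),
  Year_Published = c("5/10/1990", "4/5/2011", "6/26/2012"),
  stringsAsFactors = FALSE
)

# Strict test: names, types, and values must all match.
same_object <- identical(books_html, books_json)

# Looser checks that show where the two frames agree or differ.
same_shape  <- identical(dim(books_html), dim(books_json))
name_diff   <- setdiff(names(books_html), names(books_json))
genre_match <- books_html$Genre == books_json$Genre

print(same_object)
print(same_shape)
print(name_diff)
print(genre_match)
```

identical() is deliberately strict; all.equal() is a more tolerant alternative that reports the differences it finds instead of returning a bare FALSE.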

Reading HTML table, XML and JSON file formats

Nnaemeka Newman Okereafor

10/9/2021