Part 1: Introduction

The purpose of this assignment was to record attributes of three books in three different file formats: HTML, XML and JSON. The attributes were written into a table structure in each respective file. Finally, these files were ingested into R and displayed.

Each file was written by hand using W3schools.com as a reference.
https://www.w3schools.com/html/html_tables.asp
https://www.w3schools.com/js/js_json_intro.asp
https://www.w3schools.com/xml/ajax_applications.asp

Part 2: Writing HTML, XML and JSON Files

Below is the HTML Code:

<!DOCTYPE html>
<html>
<style>
table, th, td {
  border:1px solid black;
}
</style>
<body>

<h2>Fantasy Books</h2>

<table style="width:100%">
  <tr>
    <th>Title</th>
    <th>Authors</th>
    <th>Genre</th>
    <th>Language</th>
    <th>Series</th>
  </tr>
  <tr>
    <td>The Talisman</td>
    <td>Karl Stephen King, Peter Straub</td>
    <td>Fantasy</td>
    <td>English</td>
    <td>Jack Sawyer Trilogy</td>
  </tr>
  <tr>
    <td>Return of the King</td>
    <td>John Ronald Reuel Tolkien</td>
    <td>Fantasy</td>
    <td>English</td>
    <td>Lord of the Rings</td>
  </tr>
  <tr>
    <td>A Dance with Dragons</td>
    <td>George R. R. Martin</td>
    <td>Fantasy</td>
    <td>English</td>
    <td>A Song of Ice and Fire</td>
  </tr>
</table>
</body>
</html>

Below is the XML Code:

<?xml version="1.0" encoding="UTF-8"?>
<fantast_books>
    <book_name>
        <Title>The Talisman</Title>
        <Authors>Karl Stephen King, Peter Straub</Authors>
        <Genre>Fantasy</Genre>
        <Language>English</Language>
        <Series>Jack Sawyer Trilogy</Series>
    </book_name>
    <book_name>
        <Title>Return of the King</Title>
        <Authors>John Ronald Reuel Tolkien</Authors>
        <Genre>Fantasy</Genre>
        <Language>English</Language>
        <Series>Lord of the Rings</Series>
    </book_name>
    <book_name>
        <Title>A Dance with Dragons</Title>
        <Authors>George R. R. Martin</Authors>
        <Genre>Fantasy</Genre>
        <Language>English</Language>
        <Series>A Song of Ice and Fire</Series>
    </book_name>
</fantast_books>

Finally, below is the JSON Code:


{"FantasyBooks":[
    {"Title":"The Talisman","Authors":["Karl Stephen King, Peter Straub"],"Genre":"Fantasy","Language":"English","Series":"Jack Sawyer Trilogy"},
    {"Title":"Return of the King","Authors":"John Ronald Reuel Tolkien","Genre":"Fantasy","Language":"English","Series":"Lord of the Rings"},
    {"Title":"A Dance with Dragons","Authors":["George R. R. Martin"],"Genre":"Fantasy","Language":"English","Series":"A Song of Ice and Fire"}
    ]}

Part 3: R Ingestion

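Below is the R code for ingestion. The package list is an assumption inferred from the functions called in this report:

library(jsonlite)    # fromJSON()
library(xml2)        # read_xml()
library(XML)         # xmlTreeParse(), xmlToDataFrame()
library(rvest)       # read_html(), html_table()
library(data.table)  # data.table()
library(dplyr)       # glimpse()
library(tidyr)       # unnest()
library(DT)          # datatable()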

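JSON file

# Download and parse the JSON file from GitHub
JASN_File <- "https://raw.githubusercontent.com/goygoyummm/Data607_R/main/20230307_Data_607_Assignment_7_Fantasy_Books_JSON.json"
JASN_Table <- fromJSON(JASN_File)
# Keep the array of book records stored under the "FantasyBooks" key
JASN_Table <- JASN_Table[['FantasyBooks']]
JASN_Table <- data.table(JASN_Table)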

XML file

# Read the XML file from GitHub
XML_File <- read_xml("https://raw.githubusercontent.com/goygoyummm/Data607_R/main/20230307_Data_607_Assignment_7_Fantasy_Books_eX.xml")
# Parse it into an XML tree, then turn each <book_name> record into a row
XML_Table <- xmlTreeParse(XML_File, useInternalNodes = TRUE)
XML_Table <- xmlToDataFrame(XML_Table)
XML_Table <- data.table(XML_Table)
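
As a possible alternative to passing the document through two XML parsers, the same table could be built with xml2 alone. The following is only a sketch under stated assumptions: it parses a shortened, inline stand-in for the XML file rather than the hosted file, and `xml_text_src`, `field` and `XML_Sketch` are hypothetical names that do not appear in the original code.

```r
library(xml2)

# Shortened stand-in for the XML file shown in Part 2 (two fields, two books)
xml_text_src <- '<?xml version="1.0" encoding="UTF-8"?>
<fantast_books>
    <book_name>
        <Title>The Talisman</Title>
        <Series>Jack Sawyer Trilogy</Series>
    </book_name>
    <book_name>
        <Title>Return of the King</Title>
        <Series>Lord of the Rings</Series>
    </book_name>
</fantast_books>'

doc   <- read_xml(xml_text_src)              # read_xml() also accepts literal XML
books <- xml_find_all(doc, ".//book_name")   # one node per book record

# Hypothetical helper: text of the named child element for every book
field <- function(name) xml_text(xml_find_first(books, paste0("./", name)))

XML_Sketch <- data.frame(
  Title  = field("Title"),
  Series = field("Series"),
  stringsAsFactors = FALSE
)
print(XML_Sketch)
```

Each column is extracted by element name, so the result does not depend on the order of the child elements inside a record.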

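HTML file

# Read the HTML page from GitHub
HTML_File <- read_html("https://raw.githubusercontent.com/goygoyummm/Data607_R/main/20230307_Data_607_Assignment_7_Fantasy_Books_Hypertext.html")
# html_table() returns a list of tables; the page contains one, so take the first
HTML_Table <- html_table(HTML_File)[[1]]
HTML_Table <- data.table(HTML_Table)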

Part 4: Results

A glimpse of each table shows one difference in how the files were ingested into R. In the table read from the JSON file, the Authors column is a list, because Authors was written as an array in the JSON source. In the tables read from the HTML and XML files, Authors is a character column.

glimpse(HTML_Table)
## Rows: 3
## Columns: 5
## $ Title    <chr> "The Talisman", "Return of the King", "A Dance with Dragons"
## $ Authors  <chr> "Karl Stephen King, Peter Straub", "John Ronald Reuel Tolkien…
## $ Genre    <chr> "Fantasy", "Fantasy", "Fantasy"
## $ Language <chr> "English", "English", "English"
## $ Series   <chr> "Jack Sawyer Trilogy", "Lord of the Rings", "A Song of Ice an…
glimpse(XML_Table)
## Rows: 3
## Columns: 5
## $ Title    <chr> "The Talisman", "Return of the King", "A Dance with Dragons"
## $ Authors  <chr> "Karl Stephen King, Peter Straub", "John Ronald Reuel Tolkien…
## $ Genre    <chr> "Fantasy", "Fantasy", "Fantasy"
## $ Language <chr> "English", "English", "English"
## $ Series   <chr> "Jack Sawyer Trilogy", "Lord of the Rings", "A Song of Ice an…
glimpse(JASN_Table)
## Rows: 3
## Columns: 5
## $ Title    <chr> "The Talisman", "Return of the King", "A Dance with Dragons"
## $ Authors  <list> "Karl Stephen King, Peter Straub", "John Ronald Reuel Tolkien…
## $ Genre    <chr> "Fantasy", "Fantasy", "Fantasy"
## $ Language <chr> "English", "English", "English"
## $ Series   <chr> "Jack Sawyer Trilogy", "Lord of the Rings", "A Song of Ice a…
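
The list type reported for Authors appears to follow from how the JSON was written rather than from the data itself. The sketch below assumes the jsonlite package and uses two shortened, inline stand-ins for the JSON file (`json_arrays` and `json_strings` are hypothetical names). It compares records where Authors is an array with records where Authors is a plain string.

```r
library(jsonlite)

# Stand-in mirroring the file above: Authors is an array in one record
json_arrays <- '{"FantasyBooks":[
  {"Title":"The Talisman","Authors":["Karl Stephen King, Peter Straub"]},
  {"Title":"Return of the King","Authors":"John Ronald Reuel Tolkien"}
]}'

# Same records, but Authors is written as a plain string everywhere
json_strings <- '{"FantasyBooks":[
  {"Title":"The Talisman","Authors":"Karl Stephen King, Peter Straub"},
  {"Title":"Return of the King","Authors":"John Ronald Reuel Tolkien"}
]}'

from_arrays  <- fromJSON(json_arrays)[["FantasyBooks"]]
from_strings <- fromJSON(json_strings)[["FantasyBooks"]]

# Column classes for each variant
print(sapply(from_arrays, class))
print(sapply(from_strings, class))
```

If Authors is written as a plain string in every record, the column should arrive as character and no later conversion is needed.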

The list-formatted Authors column was converted to character format with unnest(), as shown below.

JASN_Table<-unnest(JASN_Table, Authors)
glimpse(JASN_Table)
## Rows: 3
## Columns: 5
## $ Title    <chr> "The Talisman", "Return of the King", "A Dance with Dragons"
## $ Authors  <chr> "Karl Stephen King, Peter Straub", "John Ronald Reuel Tolkien…
## $ Genre    <chr> "Fantasy", "Fantasy", "Fantasy"
## $ Language <chr> "English", "English", "English"
## $ Series   <chr> "Jack Sawyer Trilogy", "Lord of the Rings", "A Song of Ice an…
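
One caveat: unnest() expands a list column into rows, so a book whose Authors entry held two separate strings would become two rows. A base R sketch of an alternative that keeps one row per book is given below. It runs on a stand-in table with values copied from this report, and the "; " separator is an arbitrary choice made for the illustration.

```r
# Stand-in table (values copied from the report) with Authors as a list column
books <- data.frame(
  Title = c("The Talisman", "Return of the King", "A Dance with Dragons"),
  stringsAsFactors = FALSE
)
books$Authors <- list(
  "Karl Stephen King, Peter Straub",
  "John Ronald Reuel Tolkien",
  "George R. R. Martin"
)

# Collapse each list element to a single string so every book stays on one row
books$Authors <- vapply(books$Authors, paste, character(1), collapse = "; ")

str(books)
```

For the three books here both approaches give the same result, since every Authors element holds exactly one string.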

After converting the list format to character format, the tables produced by each file type were identical, as seen below.

datatable(
  HTML_Table,
  extensions = 'FixedColumns',
  options = list(
    dom = 't',
    scrollX = TRUE,
    scrollCollapse = TRUE
  )
)
datatable(
  XML_Table,
  extensions = 'FixedColumns',
  options = list(
    dom = 't',
    scrollX = TRUE,
    scrollCollapse = TRUE
  )
)
datatable(
  JASN_Table,
  extensions = 'FixedColumns',
  options = list(
    dom = 't',
    scrollX = TRUE,
    scrollCollapse = TRUE
  )
)
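
The tables can also be compared programmatically rather than by eye. The sketch below uses base R only. `same_table` is a hypothetical helper, not part of the original code, and the small data frames are stand-ins for the three ingested tables.

```r
# Hypothetical helper: TRUE when two tables hold the same values,
# ignoring class differences such as data.table versus tibble
same_table <- function(a, b) {
  isTRUE(all.equal(as.data.frame(a), as.data.frame(b), check.attributes = FALSE))
}

# Stand-ins for the ingested tables
html_like <- data.frame(
  Title    = c("The Talisman", "Return of the King"),
  Language = c("English", "English"),
  stringsAsFactors = FALSE
)
xml_like  <- html_like
json_like <- html_like
json_like$Language[2] <- "changed"  # deliberate mismatch for the demonstration

print(same_table(html_like, xml_like))   # TRUE
print(same_table(html_like, json_like))  # FALSE
```

With the real objects, the calls would be same_table(HTML_Table, XML_Table) and same_table(HTML_Table, JASN_Table), assuming the column names match across the three tables.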

Part 5: Conclusion

All three tables are identical after the list-formatted Authors column is converted to character format. In terms of writing and manipulating HTML, XML and JSON files, writing the table in JSON was a little more intuitive than writing it in HTML or XML.