The objective of this assignment is to create HTML and JSON data files containing information about selected adventure books and import them into R for analysis.
Approach
For this assignment, I created a small dataset of three adventure books by authors whose works I enjoy reading: Alexandre Dumas, James Rollins, and H. Rider Haggard.
The selected books are:
The Three Musketeers
Sandstorm
King Solomon’s Mines
The dataset contains the following variables:
Title – Name of the book
Authors – Author or authors of the book
Publication Year – Year the book was originally published
Publisher – Publishing company responsible for the book
Genre – Literary genre of the book
The same dataset is stored in two different formats:
HTML table – representing structured tabular data often found on websites.
JSON file – representing hierarchical data commonly used by APIs.
Working with both formats will demonstrate how structured information from web sources can be imported and converted into R data frames.
Anticipated Challenges
One of the main challenges when working with web-based data formats is that they represent the same information in different structures. In HTML files, tabular data is stored inside table elements, and a package such as rvest is needed to extract it. JSON files, on the other hand, store information as nested objects and arrays, which must be parsed with a package such as jsonlite. Another challenge involves handling nested values such as lists of authors. In JSON, multiple authors may be stored as an array, which must be converted into a format suitable for analysis in R, for example a single comma-separated string.
Implementation of Data Import
The following code demonstrates how the HTML table and JSON file can be imported into R and converted into data frames for further analysis.
library(rvest)
library(jsonlite)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(gt)
library(stringr)
The HTML file contains a table that lists the books and their attributes. Using the rvest package, the table can be extracted and converted into a data frame.
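The extraction step described above can be sketched as follows. This is a minimal illustration, not the assignment's actual code: the inline HTML string is a reduced three-column stand-in for the real file (whose contents are not shown here), and the names `html_text` and `demo_html_df` are hypothetical. For a file on disk, the same calls would start from `read_html()` with the file's path.

```r
library(rvest)

# Hypothetical stand-in for the assignment's HTML file (reduced to three columns)
html_text <- "
<table>
  <tr><th>Title</th><th>Authors</th><th>Publication Year</th></tr>
  <tr><td>The Three Musketeers</td><td>Alexandre Dumas</td><td>1844</td></tr>
  <tr><td>Sandstorm</td><td>James Rollins</td><td>2004</td></tr>
  <tr><td>King Solomon's Mines</td><td>H. Rider Haggard</td><td>1885</td></tr>
</table>"

# Parse the markup, select the first <table>, and convert it to a data frame
page <- read_html(html_text)
demo_html_df <- page %>%
  html_element("table") %>%
  html_table()

# html_table() returns a tibble; drop the tibble class and tidy the names
demo_html_df <- as.data.frame(demo_html_df)
names(demo_html_df) <- c("title", "authors", "publication_year")

str(demo_html_df)
```

The header row supplies the column names, and `html_table()` converts the year column to a numeric type by default, so no manual type conversion is needed in this sketch.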
The JSON file stores the same dataset but in a hierarchical format. The jsonlite package can be used to convert the JSON structure into an R data frame.
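A comparable sketch for the JSON side is shown below. It assumes the file is an array of book objects with `authors` stored as an array; the inline JSON string is a reduced stand-in for the real file, and `json_text`, `demo_json_raw`, and `demo_json_df` are hypothetical names. For a file on disk, `fromJSON()` accepts the file path directly.

```r
library(jsonlite)

# Hypothetical stand-in for the assignment's JSON file (reduced to three fields)
json_text <- '[
  {"title": "The Three Musketeers", "authors": ["Alexandre Dumas"], "publication_year": 1844},
  {"title": "Sandstorm", "authors": ["James Rollins"], "publication_year": 2004},
  {"title": "King Solomon\'s Mines", "authors": ["H. Rider Haggard"], "publication_year": 1885}
]'

# An array of objects is simplified to a data frame;
# the nested "authors" arrays become a list column
demo_json_raw <- fromJSON(json_text)
is.list(demo_json_raw$authors)

# Collapse each author vector into one comma-separated string
demo_json_df <- demo_json_raw
demo_json_df$authors <- vapply(
  demo_json_df$authors, paste, character(1), collapse = ", "
)

str(demo_json_df)
```

Here `vapply()` is used instead of `sapply()` so that the result is guaranteed to be a character vector. The `collapse` argument only changes the value when a record lists more than one author, but it keeps the column type consistent either way.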
# Convert to base data.frame (removes tibble class differences)
books_html_df <- as.data.frame(books_html_df)
books_json_df <- as.data.frame(books_json_df)

# Standardize column names
colnames(books_html_df) <- c("title", "authors", "publication_year", "publisher", "genre")

# Convert JSON authors list to string
books_json_df$authors <- sapply(books_json_df$authors, paste, collapse = ", ")

# Final comparison
identical(books_html_df, books_json_df)
[1] TRUE
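Because identical() is strict about types and attributes, it can return FALSE even when two data frames hold the same values. The sketch below uses two small hypothetical data frames, `a` and `b` (not the assignment's data), to show one way all.equal() might be used to check whether a mismatch is only a storage-type difference.

```r
# Two hypothetical data frames with the same values but different column types
a <- data.frame(
  title = c("Sandstorm", "King Solomon's Mines"),
  publication_year = c(2004L, 1885L)   # integer
)
b <- data.frame(
  title = c("Sandstorm", "King Solomon's Mines"),
  publication_year = c(2004, 1885)     # double
)

# Strict comparison: the integer vs. double difference counts as a mismatch
strict_match <- identical(a, b)

# Tolerant comparison: numeric values are compared within a tolerance
tolerant_match <- isTRUE(all.equal(a, b))

# Aligning the column type lets the strict comparison succeed
b$publication_year <- as.integer(b$publication_year)
strict_after_fix <- identical(a, b)

c(strict_match, tolerant_match, strict_after_fix)
```

When all.equal() does find differences, it returns a character vector describing them rather than FALSE, which is why the result is wrapped in isTRUE().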
Conclusion
The HTML and JSON datasets were imported into R and converted into data frames. Because the JSON file stored authors as an array while the HTML table stored them as a single string, the JSON author field was collapsed into a comma-separated string. After standardizing the column names and data frame structures, the identical() function confirmed that the two datasets were identical.
Reference
OpenAI. (2026, March 15). Conversation about comparing HTML and JSON datasets in R using identical() and all.equal(). ChatGPT. https://chat.openai.com/