Assignment - Working with HTML and JSON

Author

Ciara Bonnett

Published

March 12, 2026

Introduction

For this assignment, I selected three books related to social justice, personal narratives, and systemic history. My goal is to show that the same dataset can be represented in two file formats, HTML and JSON, and then loaded into R for comparison.

Selected books:

  • The Talk by Darrin Bell (2024)
  • Concrete Rose by Angie Thomas (2021)
  • Stamped: Racism, Antiracism, and You by Jason Reynolds and Ibram X. Kendi (2020)

Approach

My strategy is to manually author the data files to gain a better understanding of their syntax. I will then host these files in a public GitHub repository. In the next phase, I will use the rvest package to scrape the HTML table and the jsonlite package to parse the JSON objects into R data frames.
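As a rough sketch of the loading step, the example below reads a small HTML table and a JSON array into data frames. It assumes rvest 1.0 or later (for html_element()) and uses inline strings as stand-ins for the hosted files; the variable names html_text, json_text, books_html, and books_json are placeholders, and the real version would pass the raw GitHub URLs instead.

```r
library(rvest)
library(jsonlite)

# Inline stand-ins for the hosted files (assumption: the real code
# would pass the raw GitHub URLs to read_html() and fromJSON()).
html_text <- '
<table>
  <tr><th>Title</th><th>Authors</th><th>Year</th></tr>
  <tr><td>Concrete Rose</td><td>Angie Thomas</td><td>2021</td></tr>
  <tr><td>Stamped: Racism, Antiracism, and You</td><td>Jason Reynolds, Ibram X. Kendi</td><td>2020</td></tr>
</table>'

json_text <- '[
  {"Title": "Concrete Rose", "Authors": ["Angie Thomas"], "Year": 2021},
  {"Title": "Stamped: Racism, Antiracism, and You",
   "Authors": ["Jason Reynolds", "Ibram X. Kendi"], "Year": 2020}
]'

# Parse the markup, select the first <table>, and convert it to a data frame.
page       <- read_html(html_text)
table_node <- html_element(page, "table")
books_html <- html_table(table_node)

# fromJSON() turns an array of objects into a data frame;
# Authors arrives as a list column because the arrays differ in length.
books_json <- fromJSON(json_text)

str(books_html)
str(books_json)
```

Inspecting both objects with str() makes the structural differences visible before any cleaning is attempted.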

Anticipated Challenges

I anticipate two challenges. The first is that the JSON file stores authors as an array, which jsonlite reads as a list column, whereas the HTML table stores authors as a single comma-separated string. I will need to use purrr::map_chr() together with paste(collapse = ", ") to collapse each list element into one string so the two data frames can be compared accurately.
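The collapsing step could look roughly like the sketch below. It assumes the Authors column is a list column of character vectors, which is how jsonlite represents arrays of unequal length; json_text and books_json are placeholder names.

```r
library(jsonlite)
library(purrr)

# Two of the selected books, inlined as a stand-in for the hosted JSON file.
json_text <- '[
  {"Title": "Concrete Rose", "Authors": ["Angie Thomas"]},
  {"Title": "Stamped: Racism, Antiracism, and You",
   "Authors": ["Jason Reynolds", "Ibram X. Kendi"]}
]'
books_json <- fromJSON(json_text)

# Authors is a list column: one character vector per book.
# map_chr() applies the function to each vector and returns one string per book.
books_json$Authors <- map_chr(
  books_json$Authors,
  function(a) paste(a, collapse = ", ")
)

print(books_json$Authors)
```

A base R equivalent would be vapply(books_json$Authors, paste, character(1), collapse = ", "), which avoids the purrr dependency.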

The second challenge is column types. Depending on the rvest version and the convert argument of html_table(), scraped columns can come back as character, while jsonlite reads Year as a number. I will likely need to convert the Year column to an integer in both data frames so that all.equal() or identical() does not fail because of type differences alone.
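The effect of a type mismatch on the comparison can be illustrated with base R alone. The two data frames below are hand-built stand-ins for the scraped and parsed results, with Year deliberately stored as character in one of them; they are not actual scraper output.

```r
# Hypothetical stand-ins for the two loaded data frames.
books_html <- data.frame(
  Title = c("The Talk", "Concrete Rose"),
  Year  = c("2024", "2021"),          # character, as a scraper might return it
  stringsAsFactors = FALSE
)
books_json <- data.frame(
  Title = c("The Talk", "Concrete Rose"),
  Year  = c(2024L, 2021L),            # integer
  stringsAsFactors = FALSE
)

# Before coercion the comparison fails only because of the column type.
same_before <- identical(books_html, books_json)

# Coerce Year to integer in both data frames.
books_html$Year <- as.integer(books_html$Year)
books_json$Year <- as.integer(books_json$Year)

# identical() is strict; all.equal() reports differences, so wrap it in isTRUE().
same_strict <- identical(books_html, books_json)
same_loose  <- isTRUE(all.equal(books_html, books_json))

print(c(before = same_before, strict = same_strict, loose = same_loose))
```

With real data, html_table() in recent rvest versions returns a tibble rather than a plain data frame, so wrapping it in as.data.frame() would likely also be needed before identical() can return TRUE.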

Early Draft Files

books.html

```html
<table>
  <thead>
    <tr><th>Title</th><th>Authors</th><th>Year</th><th>Publisher</th></tr>
  </thead>
  <tbody>
    <tr><td>The Talk</td><td>Darrin Bell</td><td>2024</td><td>Macmillan Audio</td></tr>
    <tr><td>Concrete Rose</td><td>Angie Thomas</td><td>2021</td><td>HarperCollins</td></tr>
    <tr><td>Stamped: Racism, Antiracism, and You</td><td>Jason Reynolds, Ibram X. Kendi</td><td>2020</td><td>Little, Brown Books for Young Readers</td></tr>
  </tbody>
</table>
```

books.json

```json
[
  {
    "Title": "The Talk",
    "Authors": ["Darrin Bell"],
    "Year": 2024,
    "Publisher": "Macmillan Audio"
  },
  {
    "Title": "Concrete Rose",
    "Authors": ["Angie Thomas"],
    "Year": 2021,
    "Publisher": "HarperCollins"
  },
  {
    "Title": "Stamped: Racism, Antiracism, and You",
    "Authors": ["Jason Reynolds", "Ibram X. Kendi"],
    "Year": 2020,
    "Publisher": "Little, Brown Books for Young Readers"
  }
]
```