Assignment 7 HTML and JSON

Author

Mei Qi Ng

Published

March 12, 2026

Approach Overview

The goal of this assignment is to gain experience working with structured data in HTML and JSON formats and to prepare those data for use in R as data frames.

I am using book data centered on the subject of personal growth, written by women authors. The data set consists of three books, one of which has multiple authors. These books will be used to demonstrate how the same list of records looks in each data format.

The selected books are:

  • Girlhood by Melissa Febos (2021)

  • The High 5 Habit by Mel Robbins (2021)

  • Burnout: The Secret to Unlocking the Stress Cycle by Emily Nagoski and Amelia Nagoski (2019)

Data Description

Book record attributes: Title, Author, Publication Year, Publisher, Genre

I chose these attributes because they are common details found on book websites and because they are represented differently in each file format.

First, I will manually create:

  • An HTML file showing a table of book information. Each row will be a book, and each column will be a book attribute. If a book has more than one author, the authors will be listed as a single text string separated by semicolons.

  • A JSON file storing the same book information as nested objects and arrays in a hierarchical structure. Each book will be stored as an object with named attributes, and the authors will be stored in an array so that books with multiple authors can be handled.
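
As a rough sketch of what the two hand-written files described above might contain, the R snippet below holds each file's text in a string and writes it to disk. Only two of the three books are shown for brevity. The file names (`books.html`, `books.json`), the JSON key names, and the temporary output folder are assumptions made for illustration, and the publisher and genre values are placeholders rather than real data.

```r
# Illustrative file contents only. "PLACEHOLDER" marks publisher and genre
# values that are NOT real data; key and file names are assumptions.

# HTML: one row per book, one column per attribute.
# Multiple authors share one cell, separated by semicolons.
html_txt <- '
<table>
  <tr>
    <th>Title</th><th>Author</th><th>Publication Year</th>
    <th>Publisher</th><th>Genre</th>
  </tr>
  <tr>
    <td>Girlhood</td><td>Melissa Febos</td><td>2021</td>
    <td>PLACEHOLDER</td><td>PLACEHOLDER</td>
  </tr>
  <tr>
    <td>Burnout: The Secret to Unlocking the Stress Cycle</td>
    <td>Emily Nagoski; Amelia Nagoski</td><td>2019</td>
    <td>PLACEHOLDER</td><td>PLACEHOLDER</td>
  </tr>
</table>'

# JSON: an array of book objects; authors are always an array,
# even when a book has a single author.
json_txt <- '
[
  {
    "title": "Girlhood",
    "authors": ["Melissa Febos"],
    "publication_year": 2021,
    "publisher": "PLACEHOLDER",
    "genre": "PLACEHOLDER"
  },
  {
    "title": "Burnout: The Secret to Unlocking the Stress Cycle",
    "authors": ["Emily Nagoski", "Amelia Nagoski"],
    "publication_year": 2019,
    "publisher": "PLACEHOLDER",
    "genre": "PLACEHOLDER"
  }
]'

# Write both files to a temporary folder (hypothetical location).
html_path <- file.path(tempdir(), "books.html")
json_path <- file.path(tempdir(), "books.json")
writeLines(html_txt, html_path)
writeLines(json_txt, json_path)
```

Keeping the authors in an array for every book, including single-author books, means every record has the same shape, which should make the later conversion to a data frame more uniform.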

Data Strategy Proposal

I will load R packages (rvest, jsonlite, dplyr, stringr, janitor, and purrr) to read the HTML and JSON files into data frames in R and to perform the transformations needed so that the resulting data frames share the same structure, column names, and data types for smooth analysis and comparison.
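
One possible way to carry out this strategy is sketched below. It is a minimal example under stated assumptions, not the final implementation: the inline HTML and JSON strings stand in for the real files, only three of the five attributes are included, and the JSON key names (`title`, `authors`, `publication_year`) are assumptions chosen to match the cleaned HTML column names. It requires R 4.1 or later for the native pipe and the `\(b)` function shorthand.

```r
library(rvest)     # read_html(), html_element(), html_table()
library(jsonlite)  # fromJSON()
library(dplyr)     # mutate(), bind_rows(), tibble()
library(purrr)     # map()
library(janitor)   # clean_names()

# Stand-ins for the real files (assumed structure, reduced to three columns).
html_txt <- '<table>
<tr><th>Title</th><th>Author</th><th>Publication Year</th></tr>
<tr><td>Girlhood</td><td>Melissa Febos</td><td>2021</td></tr>
<tr><td>Burnout: The Secret to Unlocking the Stress Cycle</td><td>Emily Nagoski; Amelia Nagoski</td><td>2019</td></tr>
</table>'

json_txt <- '[
  {"title": "Girlhood", "authors": ["Melissa Febos"], "publication_year": 2021},
  {"title": "Burnout: The Secret to Unlocking the Stress Cycle",
   "authors": ["Emily Nagoski", "Amelia Nagoski"], "publication_year": 2019}
]'

# HTML: parse the table, then convert header names to snake_case.
html_df <- read_html(html_txt) |>
  html_element("table") |>
  html_table() |>
  clean_names() |>
  mutate(publication_year = as.integer(publication_year))

# JSON: keep the nested lists, then flatten each book into one row,
# collapsing the author array into the same "; " string used in the HTML.
json_df <- fromJSON(json_txt, simplifyVector = FALSE) |>
  map(\(b) tibble(
    title = b$title,
    author = paste(unlist(b$authors), collapse = "; "),
    publication_year = as.integer(b$publication_year)
  )) |>
  bind_rows()

# Check that both data frames share column names and author values.
identical(names(html_df), names(json_df))
identical(html_df$author, json_df$author)
```

Parsing the JSON with `simplifyVector = FALSE` keeps each book as a plain list, so the author array can be collapsed explicitly instead of relying on jsonlite's automatic simplification, which can behave differently depending on how many authors each book has.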