Assignment 7 HTML and JSON
Approach Overview
The goal of this assignment is to gain experience working with structured data in HTML and JSON formats and to prepare these data for use in R as data frames.
I am using book data centered on the subject of personal growth, written by women authors. The data set consists of three books, one of which has multiple authors. This will be used to demonstrate how the different data formats handle a list of values.
The selected books are:
Girlhood by Melissa Febos (2021)
The High 5 Habit by Mel Robbins (2021)
Burnout: The Secret to Unlocking the Stress Cycle by Emily Nagoski and Amelia Nagoski (2019)
Data Description
Book record attributes: Title, Author, Publication Year, Publisher, Genre
I chose these attributes because they are common details found on websites and they are represented differently in different file formats.
First, I will manually create:
An HTML file showing a table containing the book information. Each row will be a book and each column will be a book attribute. If a book has more than one author, the authors will be listed as a single text string separated by semicolons.
A JSON file with the same book information stored as nested objects and arrays in a hierarchical structure. Each book will be stored as an object with named attributes, and the authors will be stored in an array so that books with multiple authors can be handled.
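As a rough illustration of the two layouts described above, the sketch below writes both files from R using only base functions. The file names (books.html, books.json), the JSON key names, and the "TBD" values for Publisher and Genre are placeholders I am assuming for the example; they are not the final data.

```r
# Sketch only: file names and JSON keys are assumed, and "TBD" is a
# placeholder because Publisher and Genre are not listed in the text above.

# HTML: one row per book, one column per attribute.
# Multiple authors share one cell, separated by a semicolon.
html_lines <- c(
  "<table>",
  "  <tr><th>Title</th><th>Author</th><th>Publication Year</th><th>Publisher</th><th>Genre</th></tr>",
  "  <tr><td>Girlhood</td><td>Melissa Febos</td><td>2021</td><td>TBD</td><td>TBD</td></tr>",
  "  <tr><td>The High 5 Habit</td><td>Mel Robbins</td><td>2021</td><td>TBD</td><td>TBD</td></tr>",
  "  <tr><td>Burnout: The Secret to Unlocking the Stress Cycle</td><td>Emily Nagoski; Amelia Nagoski</td><td>2019</td><td>TBD</td><td>TBD</td></tr>",
  "</table>"
)

# JSON: each book is an object; authors are always an array,
# even when there is only one author.
json_lines <- c(
  "{",
  "  \"books\": [",
  "    {\"title\": \"Girlhood\", \"authors\": [\"Melissa Febos\"], \"publication_year\": 2021, \"publisher\": \"TBD\", \"genre\": \"TBD\"},",
  "    {\"title\": \"The High 5 Habit\", \"authors\": [\"Mel Robbins\"], \"publication_year\": 2021, \"publisher\": \"TBD\", \"genre\": \"TBD\"},",
  "    {\"title\": \"Burnout: The Secret to Unlocking the Stress Cycle\", \"authors\": [\"Emily Nagoski\", \"Amelia Nagoski\"], \"publication_year\": 2019, \"publisher\": \"TBD\", \"genre\": \"TBD\"}",
  "  ]",
  "}"
)

# Write both files to a temporary folder so nothing in the project is overwritten.
html_path <- file.path(tempdir(), "books.html")
json_path <- file.path(tempdir(), "books.json")
writeLines(html_lines, html_path)
writeLines(json_lines, json_path)
```

Keeping the authors as an array in JSON but as a semicolon-separated string in HTML is what makes the two files need different handling once they are loaded into R.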
Data Strategy Proposal
I will load R packages (rvest, jsonlite, dplyr, stringr, janitor, and purrr) to read the HTML and JSON files into data frames in R and to perform the transformations needed so that the resulting data frames share the same structure, column names, and data types for smooth data analysis and comparison.
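The strategy above could look roughly like the sketch below. To keep the example self-contained, it parses two small inline strings instead of the actual files and uses only the Title, Author, and Publication Year attributes for two of the books. The object names (html_text, json_text, books_html, books_json) and the JSON keys are assumptions made for illustration, not the final implementation.

```r
library(rvest)     # read_html(), html_element(), html_table()
library(jsonlite)  # fromJSON()
library(dplyr)     # mutate(), select(), as_tibble(), %>%
library(stringr)   # str_squish(), str_c()
library(janitor)   # clean_names()
library(purrr)     # map_chr()

# Inline stand-ins for the HTML and JSON files (assumed layout, reduced columns).
html_text <- "<table>
<tr><th>Title</th><th>Author</th><th>Publication Year</th></tr>
<tr><td>Girlhood</td><td>Melissa Febos</td><td>2021</td></tr>
<tr><td>Burnout: The Secret to Unlocking the Stress Cycle</td><td>Emily Nagoski; Amelia Nagoski</td><td>2019</td></tr>
</table>"

json_text <- '[
  {"title": "Girlhood", "authors": ["Melissa Febos"], "publication_year": 2021},
  {"title": "Burnout: The Secret to Unlocking the Stress Cycle",
   "authors": ["Emily Nagoski", "Amelia Nagoski"], "publication_year": 2019}
]'

# HTML: pull the table, standardize the column names, and fix the types.
books_html <- read_html(html_text) %>%
  html_element("table") %>%
  html_table() %>%
  clean_names() %>%
  mutate(
    author = str_squish(author),
    publication_year = as.integer(publication_year)
  )

# JSON: authors arrive as a list column, so collapse each author vector
# into one semicolon-separated string to match the HTML version.
books_json <- fromJSON(json_text, simplifyMatrix = FALSE) %>%
  as_tibble() %>%
  mutate(
    author = map_chr(authors, ~ str_c(.x, collapse = "; ")),
    publication_year = as.integer(publication_year)
  ) %>%
  select(title, author, publication_year)

# Check that both data frames ended up with the same columns and types.
identical(names(books_html), names(books_json))
identical(sapply(books_html, class), sapply(books_json, class))
```

Collapsing the JSON author array into a string is only one option. Splitting the HTML author string into a list column instead would also make the two data frames match, and it may be the better choice if later analysis needs one row per author.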