For this assignment, I will select three books on the same subject. For each book, I will record the title, author(s), and a few additional attributes such as publication year, publisher, and genre. At least one book will have multiple authors, as the assignment requires.
First, I will manually create an HTML file that contains a table with the book information. The table will include columns for each attribute so the data is clearly organized and easy to read.
Next, I will manually create a JSON file that contains the same information but in JSON format. Each book will be represented as an object with fields for the title, authors, publication year, publisher, and genre.
After creating the two files, I will use R to load the data from both sources. I plan to use the rvest package to read the HTML table and convert it into a data frame, and the jsonlite package to read the JSON file and convert it into another data frame.
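The loading step could be sketched roughly as follows. This is only an illustration under stated assumptions: the book titles, authors, and publishers are placeholders rather than the books I will actually choose, the column names are my own guess at a reasonable layout, and the HTML and JSON are given as inline strings (html_src and json_src, both hypothetical names) instead of being read from the files on disk.

```r
# Sketch only: placeholder data, inline strings instead of real files.
library(rvest)     # read_html(), html_element(), html_table()
library(jsonlite)  # fromJSON()

# Hypothetical HTML table; multiple authors share one cell as plain text.
html_src <- '
<table>
  <tr><th>title</th><th>authors</th><th>year</th><th>publisher</th><th>genre</th></tr>
  <tr><td>Book A</td><td>Author One; Author Two</td><td>2015</td><td>Publisher X</td><td>Genre G</td></tr>
  <tr><td>Book B</td><td>Author Three</td><td>2018</td><td>Publisher Y</td><td>Genre G</td></tr>
  <tr><td>Book C</td><td>Author Four</td><td>2020</td><td>Publisher Z</td><td>Genre G</td></tr>
</table>'

# Hypothetical JSON with the same records; authors is an array per book.
json_src <- '[
  {"title": "Book A", "authors": ["Author One", "Author Two"], "year": 2015, "publisher": "Publisher X", "genre": "Genre G"},
  {"title": "Book B", "authors": ["Author Three"], "year": 2018, "publisher": "Publisher Y", "genre": "Genre G"},
  {"title": "Book C", "authors": ["Author Four"], "year": 2020, "publisher": "Publisher Z", "genre": "Genre G"}
]'

# Parse the HTML, pick the first <table>, and convert it to a data frame.
page <- read_html(html_src)
books_html <- html_table(html_element(page, "table"))

# Parse the JSON array of objects into a data frame.
books_json <- fromJSON(json_src)

# Inspect how each source represents the authors column.
str(books_html$authors)
str(books_json$authors)
```

With the real files, the inline strings would be replaced by the file paths, since both read_html() and fromJSON() also accept a path.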
One challenge I anticipate is that the HTML and JSON data may not load into R with exactly the same structure. For example, the authors field from JSON may load as a list column when a book has multiple authors, while the HTML table stores them as a single text string in one cell. Because of this, I may need to clean the data before comparing the two data frames, for example by collapsing each book's authors into one delimited string.
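One possible way to handle the authors mismatch is sketched below in base R. It assumes the JSON data arrived with authors as a list column and that the HTML table separates multiple authors with "; ". Both are assumptions for illustration, and the data frame here is built by hand with placeholder values to stand in for the loaded JSON.

```r
# Placeholder data frame standing in for the data loaded from JSON.
books_json <- data.frame(
  title = c("Book A", "Book B", "Book C"),
  stringsAsFactors = FALSE
)

# authors as a list column: one character vector per book.
books_json$authors <- list(
  c("Author One", "Author Two"),
  "Author Three",
  "Author Four"
)

# Collapse each book's authors into a single string so the column
# becomes a plain character vector, like the HTML version.
# The "; " separator is an assumption and must match the HTML cells.
books_json$authors <- vapply(
  books_json$authors, paste, character(1), collapse = "; "
)

print(books_json)
```

The opposite direction, splitting the HTML string into a list with strsplit(), would also work. Collapsing is simply the smaller change, since it leaves both data frames with ordinary character columns.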
Finally, I will compare the two data frames in R, for example with all.equal() or identical(), to check whether they contain the same data. A match will confirm that the HTML and JSON files represent the same dataset even though they use different formats.
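A minimal sketch of the comparison, again in base R with placeholder values: the two data frames below are built by hand to stand in for the cleaned HTML and JSON results, and the column order of the second one is deliberately different to show why aligning columns first can matter.

```r
# Placeholder stand-in for the cleaned data from the HTML table.
books_html <- data.frame(
  title   = c("Book A", "Book B", "Book C"),
  authors = c("Author One; Author Two", "Author Three", "Author Four"),
  year    = c(2015L, 2018L, 2020L),
  stringsAsFactors = FALSE
)

# Placeholder stand-in for the cleaned data from the JSON file,
# with the same values but a different column order.
books_json <- data.frame(
  year    = c(2015L, 2018L, 2020L),
  title   = c("Book A", "Book B", "Book C"),
  authors = c("Author One; Author Two", "Author Three", "Author Four"),
  stringsAsFactors = FALSE
)

# Put the columns in the same order before comparing.
books_json <- books_json[, names(books_html)]

# all.equal() returns TRUE or a description of the differences,
# so isTRUE() turns the result into a single logical value.
same_data <- isTRUE(all.equal(books_html, books_json))
print(same_data)
```

With the real data, one source may come back as a tibble and the other as a plain data frame, so converting both with as.data.frame() before the comparison is probably a sensible precaution.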