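# Load the tidyverse (readr, dplyr, ggplot2, stringr, ...) and lubridate
library(tidyverse)
library(lubridate)
# Read the Goodreads dataset from the shared CSV download link
goodreads_df <-
  read_csv("https://myxavier-my.sharepoint.com/:u:/g/personal/ferravillalobosd_xavier_edu/IQByyhKcA9t_TL7hMzr7EElQATVQOA60pdSZ8VKuRWEGAbQ?download=1")
Books You Should Read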
INTRODUCTION
Goodreads hosts a huge number of book ratings and reviews, which makes it a good source of information about reading trends and popular books.
For this project, I scraped data from its list "Books That Everyone Should Read at Least Once."
Reading was a hobby I loved growing up, but it has been harder to keep up with as life has gotten busier. Looking at this Goodreads data helps me figure out which books are worth adding to my to-be-read (TBR) list, especially the average ratings and how many people rated each book.
This project uses that dataset to answer two questions:
What do readers value?
Which books stand out?
THE DATA
Background
I scraped the data from the Goodreads list page using R. The scrape collected each book's title, author, Goodreads rating text, and link. To import the resulting dataset, use the read_csv() call shown at the top of this page.
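For readers who want to try the scraping step themselves, here is a minimal sketch using the rvest package. It is not the code used for this project. The helper name scrape_book_rows() is hypothetical. The CSS selectors (the book-row selector, a.bookTitle, a.authorName, span.minirating) are assumptions about how Goodreads list pages are marked up, and they may not match the live page.

```r
library(rvest)
library(tibble)

# HYPOTHETICAL helper: one row per book from a parsed Goodreads list page.
# All CSS selectors are ASSUMPTIONS about the page markup and may need updating.
scrape_book_rows <- function(page,
                             row_css    = "tr[itemtype='http://schema.org/Book']",
                             title_css  = "a.bookTitle",
                             author_css = "a.authorName",
                             rating_css = "span.minirating") {
  rows <- html_elements(page, row_css)          # one node per book row
  title_nodes <- html_element(rows, title_css)  # first title link in each row (missing -> NA)
  tibble(
    title       = html_text2(title_nodes),
    author      = html_text2(html_element(rows, author_css)),
    rating_text = html_text2(html_element(rows, rating_css)),
    link        = html_attr(title_nodes, "href")  # book page URL, as written in the HTML
  )
}
```

Usage would look like scrape_book_rows(read_html(list_url)), where list_url stands in for the list page's address.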
After scraping, I cleaned the dataset by separating the rating text into two numeric variables:
Average rating
Number of ratings
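The cleaning step can be sketched with readr and stringr. This sketch makes two assumptions. First, the combined text sits in a column called rating_text, which is a hypothetical name. Second, the text follows a pattern like "4.34 avg rating - 9,876,543 ratings". The helper takes the number in front of "avg rating" as the average. It takes the comma-grouped number in front of "ratings" as the count.

```r
library(dplyr)
library(readr)
library(stringr)

# HYPOTHETICAL helper; ASSUMES a `rating_text` column worded like
# "4.34 avg rating - 9,876,543 ratings".
split_rating_text <- function(df) {
  df %>%
    mutate(
      # number immediately before " avg rating"
      avg_rating  = parse_number(str_extract(rating_text, "[0-9.]+(?= avg rating)")),
      # comma-grouped number immediately before " rating" or " ratings"
      num_ratings = parse_number(str_extract(rating_text, "[0-9,]+(?= ratings?)"))
    )
}
```

parse_number() drops the thousands separators, so "9,876,543" becomes the number 9876543.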
Variables
The final dataset covers 100 books. For each book, it records the:
title
author
rating text (average rating and number of ratings combined)
link
average rating
number of ratings
VISUALIZATIONS
1. Distribution of Average Goodreads Ratings
Most books on the list have an average rating between 4.0 and 4.4, so high ratings should be expected across the board.
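A histogram like this one can be sketched with ggplot2. A small helper can also put a number on "most books" by computing the share of averages that fall in [4, 4.4]. The column name avg_rating and both helper names are hypothetical. The bin width of 0.1 stars is an arbitrary choice.

```r
library(ggplot2)
library(dplyr)

# HYPOTHETICAL helper; ASSUMES a numeric `avg_rating` column.
plot_avg_rating_hist <- function(df) {
  ggplot(df, aes(x = avg_rating)) +
    geom_histogram(binwidth = 0.1, boundary = 0) +  # 0.1-star bins with edges on multiples of 0.1
    labs(x = "Average Goodreads rating", y = "Number of books")
}

# Share of books whose average rating falls in [lower, upper] (inclusive).
share_between <- function(df, lower = 4, upper = 4.4) {
  mean(between(df$avg_rating, lower, upper), na.rm = TRUE)
}
```

Calling share_between(goodreads_df) on the real data would show how large "most" actually is.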
2. The Number of Ratings
This chart shows how many ratings a typical book on the list has received. The number of ratings varies widely, from thousands to millions. That is why the chart uses a log scale: it spreads the books out evenly instead of squeezing the less-rated ones against the axis.
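On a log10 axis, equal distances mean equal ratios. For example, 10,000 ratings (log10 = 4) and 1,000,000 ratings (log10 = 6) end up just 2 units apart. The sketch below assumes a hypothetical num_ratings column and uses 20 bins, which is an arbitrary choice.

```r
library(ggplot2)
library(scales)

# HYPOTHETICAL helper; ASSUMES a positive numeric `num_ratings` column.
plot_num_ratings_log <- function(df) {
  ggplot(df, aes(x = num_ratings)) +
    geom_histogram(bins = 20) +
    scale_x_log10(labels = label_comma()) +  # log10 axis, labels printed as 10,000 etc.
    labs(x = "Number of ratings (log scale)", y = "Number of books")
}
```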
So far, the first two visualizations show what to expect from the ratings. Most average ratings fall between 4.0 and 4.4, and around a million ratings is typical for a book on this list.
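To back these "typical" values with numbers, a short summary helps. The sketch uses medians rather than means, because a few books with tens of millions of ratings would pull a mean far above the typical book. The column names avg_rating and num_ratings, and the helper name, are hypothetical.

```r
library(dplyr)

# HYPOTHETICAL helper; ASSUMES numeric `avg_rating` and `num_ratings` columns.
summarise_expectations <- function(df) {
  df %>%
    summarise(
      median_avg_rating  = median(avg_rating, na.rm = TRUE),
      q25_avg_rating     = quantile(avg_rating, 0.25, na.rm = TRUE),  # lower quartile
      q75_avg_rating     = quantile(avg_rating, 0.75, na.rm = TRUE),  # upper quartile
      median_num_ratings = median(num_ratings, na.rm = TRUE)
    )
}
```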
3. Relationship between Number of Ratings and Average Rating
In the following visual, there is little relationship between how many ratings a book has and how highly it is rated. Popularity does not guarantee a higher rating.
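One way to check "little relationship" numerically is a Spearman rank correlation. Spearman compares rankings, so the handful of books with enormous rating counts does not dominate the result. A value near 0 means there is no consistent upward or downward trend. Below is a sketch of that check and of the scatter plot, assuming hypothetical num_ratings and avg_rating columns.

```r
library(ggplot2)

# HYPOTHETICAL helpers; ASSUME numeric `num_ratings` and `avg_rating` columns.
rating_popularity_cor <- function(df) {
  cor(df$num_ratings, df$avg_rating, method = "spearman", use = "complete.obs")
}

plot_rating_vs_popularity <- function(df) {
  ggplot(df, aes(x = num_ratings, y = avg_rating)) +
    geom_point(alpha = 0.6) +
    # straight line fitted against log10(number of ratings), since the x scale is log10
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    scale_x_log10() +
    labs(x = "Number of ratings (log scale)", y = "Average rating")
}
```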
4. Most Rated Books
These are the books with the most ratings. Many of them are modern, belong to a series, and have been adapted into movies!
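Pulling out the most-rated books takes a single slice_max() call, and a bar chart shows them in order. The sketch assumes hypothetical title and num_ratings columns. The cutoff of 10 books is an arbitrary default.

```r
library(dplyr)
library(ggplot2)

# HYPOTHETICAL helpers; ASSUME `title` and numeric `num_ratings` columns.
top_rated_books <- function(df, n = 10) {
  df %>%
    slice_max(num_ratings, n = n, with_ties = FALSE) %>%  # exactly n rows, even with ties
    arrange(desc(num_ratings))
}

plot_top_rated <- function(df, n = 10) {
  top_rated_books(df, n) %>%
    # reorder() puts the most-rated title at the top of the y axis
    ggplot(aes(x = num_ratings, y = reorder(title, num_ratings))) +
    geom_col() +
    scale_x_continuous(labels = scales::label_comma()) +
    labs(x = "Number of ratings", y = NULL)
}
```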
CONCLUSION
This analysis shows that even among widely recommended books, average ratings still vary, and the number of ratings varies far more. Knowing what counts as a "typical" number of ratings and what a strong average rating looks like helps make sense of which books truly resonate with people. Exploring this data helped me think more intentionally about which books deserve a spot on my TBR list. So, maybe it's finally time to read the Harry Potter series. Hopefully this analysis helps you decide what matters most to you as a reader and inspires your next pick too.