Books You Should Read

Author

Diana Ferra Villalobos

Published

May 7, 2026

INTRODUCTION

Goodreads hosts a large number of book reviews and ratings, which makes it a good source of information about reading trends and popular books.

For this project, I scraped data from the Goodreads list "Books That Everyone Should Read at Least Once."

Reading was a hobby I loved growing up, but it has been harder to keep up with as life has gotten busier. Looking at this Goodreads data, especially average ratings and how many people rated each book, helps me figure out which books are worth adding to my TBR (to-be-read) list.

This project uses that dataset to answer two questions:

  • What do readers value?

  • Which books stand out?

THE DATA

Background

I scraped the data from the Goodreads list page using R. To import the resulting dataset, run:

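library(tidyverse)  # read_csv() plus data-wrangling and plotting tools
library(lubridate)  # date-handling helpers

# Download the scraped Goodreads list into a data frame
goodreads_df <- 
  read_csv("https://myxavier-my.sharepoint.com/:u:/g/personal/ferravillalobosd_xavier_edu/IQByyhKcA9t_TL7hMzr7EElQATVQOA60pdSZ8VKuRWEGAbQ?download=1")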

The scrape collected each book's title, author, Goodreads rating text, and link.

After scraping, I cleaned the dataset by separating the rating text into two numeric variables:

  • Average rating

  • Number of ratings
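The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the exact code used for this project: the column names rating_text, avg_rating, and num_ratings are assumptions, and the two sample rows are made up to mimic the way Goodreads list pages typically show ratings.

```r
library(tidyverse)

# Made-up sample rows; the column name and text format are assumptions
sample_df <- tibble(
  title = c("Book A", "Book B"),
  rating_text = c(
    "4.28 avg rating \u2014 5,234,567 ratings",
    "3.93 avg rating \u2014 912,345 ratings"
  )
)

cleaned_df <- sample_df %>%
  mutate(
    # parse_number() keeps the first number it finds, here the average rating
    avg_rating = parse_number(rating_text),
    # Grab the digits and commas right before " ratings", then drop the commas
    num_ratings = rating_text %>%
      str_extract("[\\d,]+(?= ratings)") %>%
      parse_number()
  )
```

Keeping the original rating_text column next to the two new columns makes it easy to spot-check that the parsing worked.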

Variables

This gave us information on 100 books. For each one, we now know the:

  • title

  • author

  • rating text (average rating and number of ratings combined)

  • link

  • average rating

  • number of ratings

VISUALIZATIONS

1. Distribution of Average Goodreads Ratings

Most books on the list have an average rating between 4.0 and 4.4, so high ratings are the norm here.
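A histogram like this one could be drawn along these lines. The sketch uses made-up placeholder values rather than the real Goodreads data, and the column name avg_rating is an assumption.

```r
library(tidyverse)

# Made-up placeholder values, not the real Goodreads data
ratings_df <- tibble(
  title = paste("Book", LETTERS[1:8]),
  avg_rating = c(3.85, 4.02, 4.10, 4.15, 4.21, 4.28, 4.33, 4.47)
)

# Histogram with bins 0.1 wide, aligned so one bin edge sits exactly at 4.0
rating_hist <- ggplot(ratings_df, aes(x = avg_rating)) +
  geom_histogram(binwidth = 0.1, boundary = 4,
                 fill = "steelblue", color = "white") +
  labs(x = "Average rating", y = "Number of books",
       title = "Distribution of Average Goodreads Ratings")

# Share of books whose average rating falls between 4.0 and 4.4 (inclusive)
share_mid <- mean(between(ratings_df$avg_rating, 4.0, 4.4))
```

Printing rating_hist displays the plot, and share_mid puts a number on how concentrated the ratings are.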

2. The Number of Ratings

This looks at how many ratings a book typically receives. The number of ratings varies widely, from thousands to millions, which is why the plot uses a log scale: it spreads the values out into a more balanced distribution.
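Here is a rough sketch of how a log scale can be applied to a histogram of rating counts. The values are made up for illustration, and num_ratings is an assumed column name.

```r
library(tidyverse)

# Made-up placeholder counts spanning thousands to millions
counts_df <- tibble(
  title = paste("Book", LETTERS[1:6]),
  num_ratings = c(8500, 64000, 410000, 1200000, 3900000, 9800000)
)

# scale_x_log10() spaces the x-axis by powers of ten, so small and
# huge counts both stay visible in the same plot
count_hist <- ggplot(counts_df, aes(x = num_ratings)) +
  geom_histogram(bins = 10, fill = "darkorange", color = "white") +
  scale_x_log10(labels = scales::label_comma()) +
  labs(x = "Number of ratings (log scale)", y = "Number of books")

# For skewed counts, the median is a steadier "typical" value than the mean
typical_count <- median(counts_df$num_ratings)
```

Without the log scale, the few books with millions of ratings would squeeze everything else into a single bar.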

So far, the last couple of visualizations have shown us what to expect: average ratings fall between 4.0 and 4.4, and about a million ratings is typical for a book on this list.

3. Relationship Between Number of Ratings and Average Rating

In the following visual, we see that there is not much of a relationship at all between how many ratings a book has and how highly it is rated. Popularity does not guarantee a higher rating.
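One way to check a relationship like this is a scatter plot paired with a rank correlation. The sketch below uses made-up values and assumed column names (avg_rating, num_ratings), so the correlation it produces says nothing about the real list.

```r
library(tidyverse)

# Made-up placeholder values, not the real Goodreads data
books_df <- tibble(
  avg_rating  = c(4.05, 4.31, 3.92, 4.18, 4.40, 4.12),
  num_ratings = c(250000, 90000, 4200000, 1500000, 700000, 8800000)
)

# Scatter plot with the skewed rating counts on a log-scaled x-axis
scatter_plot <- ggplot(books_df, aes(x = num_ratings, y = avg_rating)) +
  geom_point() +
  scale_x_log10() +
  labs(x = "Number of ratings (log scale)", y = "Average rating")

# Spearman correlation compares ranks, so the extreme counts do not dominate;
# values near 0 suggest little monotonic relationship
rank_cor <- cor(books_df$num_ratings, books_df$avg_rating, method = "spearman")
```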

4. Most Rated Books

These are the books with the most ratings. Many of them are modern, belong to a series, and have their own movie adaptations!
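Picking out the most-rated books can be sketched like this. The titles and counts are placeholders, and num_ratings is an assumed column name.

```r
library(tidyverse)

# Made-up placeholder values, not the real Goodreads data
books_df <- tibble(
  title = paste("Book", LETTERS[1:5]),
  num_ratings = c(120000, 9500000, 3400000, 640000, 5100000)
)

# Keep the 3 rows with the largest rating counts, largest first
top_books <- books_df %>%
  slice_max(num_ratings, n = 3)

# Horizontal bars, ordered so the most-rated book appears at the top
top_plot <- ggplot(top_books,
                   aes(x = num_ratings, y = fct_reorder(title, num_ratings))) +
  geom_col(fill = "seagreen") +
  scale_x_continuous(labels = scales::label_comma()) +
  labs(x = "Number of ratings", y = NULL)
```

Changing n adjusts how many books make the chart.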

5. Authors on the MUST READ List

I can take a guess at who will take the #1 spot. And to no one’s surprise:

Great authors have great books on this list.
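Counting how many books each author has on a list can be sketched as below. The author names are placeholders and the author column name is an assumption.

```r
library(tidyverse)

# Made-up placeholder rows, not the real Goodreads data
books_df <- tibble(
  title  = paste("Book", LETTERS[1:6]),
  author = c("Author X", "Author Y", "Author X",
             "Author Z", "Author X", "Author Y")
)

# One row per author with the number of books, most frequent first
author_counts <- books_df %>%
  count(author, sort = TRUE)

# Bar chart ordered by book count
author_plot <- ggplot(author_counts,
                      aes(x = n, y = fct_reorder(author, n))) +
  geom_col(fill = "purple4") +
  labs(x = "Books on the list", y = NULL)
```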

CONCLUSION

This analysis shows that even among widely recommended books, there is a lot of variation in both average ratings and the number of ratings. Understanding what counts as a “typical” number of ratings and what a strong average rating looks like helps make sense of which books truly resonate with people. Exploring this data helped me think more intentionally about which books deserve a spot on my TBR list. So, maybe it’s finally time to read the Harry Potter series. Hopefully this analysis helps you decide what matters most to you as a reader and inspires your next pick too.