Goodreads Top 100

Author

Ken

Method

To explore the relationship between popularity and perceived quality, I scraped data from the Goodreads Top 100 Highest Rated Books list, which features books with both high ratings and at least 10,000 reviews, making it a reliable source for this analysis.

Using R with the ‘rvest’, ‘tidyverse’, and ‘httr’ packages, I extracted the following variables: book title, author name, average rating, and number of ratings. A polite user agent was used to identify the scraping session, and only the necessary HTML elements were collected.
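As a rough illustration of the extraction step, the sketch below parses one record per table row, so each title stays paired with its own author and rating text. The CSS selectors (`a.bookTitle`, `a.authorName`, `span.minirating`), the helper name `parse_book_rows`, the layout of the rating text, and the two demo rows are all assumptions made for this example; they are not taken from the original scraping script and should be checked against the live page markup.

```r
library(rvest)

# Hypothetical helper: extract one record per <tr> so fields stay aligned.
# The selectors below are assumptions about the list page's markup.
parse_book_rows <- function(page) {
  rows <- html_elements(page, "tr")
  # html_element() returns exactly one result per row (NA if missing)
  rating_text <- html_text2(html_element(rows, "span.minirating"))
  data.frame(
    title  = html_text2(html_element(rows, "a.bookTitle")),
    author = html_text2(html_element(rows, "a.authorName")),
    # Assumed text layout: "<rating> avg rating ... <count> ratings"
    avg_rating = as.numeric(
      sub("^.*?([0-9]\\.[0-9]+) avg rating.*$", "\\1", rating_text, perl = TRUE)
    ),
    num_ratings = as.numeric(gsub(
      ",", "",
      sub("^.*?([0-9,]+) ratings.*$", "\\1", rating_text, perl = TRUE)
    )),
    stringsAsFactors = FALSE
  )
}

# Offline demo on made-up rows (not real Goodreads data)
demo_page <- minimal_html('
  <table>
    <tr><td><a class="bookTitle">Book A</a> <a class="authorName">Author A</a>
      <span class="minirating">4.76 avg rating - 350,000 ratings</span></td></tr>
    <tr><td><a class="bookTitle">Book B</a> <a class="authorName">Author B</a>
      <span class="minirating">4.70 avg rating - 15,000 ratings</span></td></tr>
  </table>')
demo_df <- parse_book_rows(demo_page)

# Live use (not run): list_url and the agent string are placeholders
# page <- read_html(httr::GET(list_url, httr::user_agent("course project scraper")))
# goodreads_df <- parse_book_rows(page)
```

Extracting every field from the same row node, rather than collecting all titles and all authors as separate page-wide vectors, avoids rows drifting out of alignment when one element is missing.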

After cleaning and formatting the data, I used summary tables and visualizations to examine how average rating and total number of ratings relate. Histograms, boxplots, and scatterplots were created to reveal patterns in reader reception across titles.

| title | author | avg_rating |
| --- | --- | --- |
| Heaven Official’s Blessing: Tian Guan Ci Fu (Novel) Vol. 8 | Alena Mornštajnová | 4.81 |
| Words of Radiance (The Stormlight Archive, #2) | Brandon Sanderson | 4.76 |
| Light Bringer (Red Rising Saga, #6) | Pierce Brown | 4.76 |
| Berserk, Vol. 12 | Bruce D. Perry | 4.75 |
| Magical Midlife Battle (Leveling Up, #8) | Santiago Posteguillo | 4.75 |
| The Warden and the Wolf King (The Wingfeather Saga, #4) | Andrew Peterson | 4.74 |
| It’s a Magical World (Calvin and Hobbes, #11) | Bill Watterson | 4.73 |
| Kingdom of Ash (Throne of Glass, #7) | Sarah J. Maas | 4.71 |
| Grandmaster of Demonic Cultivation: Mo Dao Zu Shi (Novel) Vol. 4 | Mò Xiāng Tóng Xiù | 4.71 |
| பொன்னியின் செல்வன், முழுத்தொகுப்பு | Will Wight | 4.70 |

The top 10 highest-rated books include a mix of novels, graphic novels, and fantasy sagas. Interestingly, most of these titles belong to long-running series, suggesting that readers who are already invested in a series are more likely to give high ratings. The highest-rated title is Heaven Official’s Blessing (Vol. 8), which appears to combine a high average rating with wide popularity, a sign of strong reader loyalty.

| title | author | num_ratings |
| --- | --- | --- |
| Heaven Official’s Blessing: Tian Guan Ci Fu (Novel) Vol. 8 | Alena Mornštajnová | 4.81 |
| Words of Radiance (The Stormlight Archive, #2) | Brandon Sanderson | 4.76 |
| Light Bringer (Red Rising Saga, #6) | Pierce Brown | 4.76 |
| Berserk, Vol. 12 | Bruce D. Perry | 4.75 |
| Magical Midlife Battle (Leveling Up, #8) | Santiago Posteguillo | 4.75 |
| The Warden and the Wolf King (The Wingfeather Saga, #4) | Andrew Peterson | 4.74 |
| It’s a Magical World (Calvin and Hobbes, #11) | Bill Watterson | 4.73 |
| Kingdom of Ash (Throne of Glass, #7) | Sarah J. Maas | 4.71 |
| Grandmaster of Demonic Cultivation: Mo Dao Zu Shi (Novel) Vol. 4 | Mò Xiāng Tóng Xiù | 4.71 |
| பொன்னியின் செல்வன், முழுத்தொகுப்பு | Will Wight | 4.70 |

The top 10 most popular books (by number of ratings) closely mirror the highest-rated list. This overlap suggests that popularity and quality, as perceived by Goodreads users, may be strongly correlated. However, it’s important to note that high visibility and built-in fanbases (e.g., sequels or fantasy sagas) may inflate both numbers.
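One way to check how closely the two rankings overlap is to build each top-N list from its own sort column and intersect the titles. The sketch below uses base R on a small made-up data frame that stands in for `goodreads_df`; the helper `top_n_by` is a hypothetical name, and `n = 3` is used only because the toy data is small (the report's tables would use `n = 10`).

```r
# Illustrative stand-in for goodreads_df; all values are made up.
goodreads_df <- data.frame(
  title       = paste("Book", LETTERS[1:6]),
  avg_rating  = c(4.81, 4.76, 4.70, 4.62, 4.60, 4.58),
  num_ratings = c(12000, 350000, 15000, 900000, 40000, 600000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n rows with the largest value in `column`
top_n_by <- function(df, column, n) {
  ordered <- df[order(df[[column]], decreasing = TRUE), ]
  ordered[seq_len(min(n, nrow(ordered))), ]
}

top_rated  <- top_n_by(goodreads_df, "avg_rating", 3)   # sorted by rating
most_rated <- top_n_by(goodreads_df, "num_ratings", 3)  # sorted by count

# Titles that appear in both lists
shared_titles <- intersect(top_rated$title, most_rated$title)
length(shared_titles) / nrow(top_rated)  # share of the top list that overlaps
```

Sorting each table by its own column before comparing them makes the overlap claim something that can be counted rather than judged by eye.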

The boxplot summarizes the spread and distribution of the average ratings. Most books fall within a tight range: the box starts at about 4.57 and the median rating is around 4.61, which is relatively high, with a few outliers above 4.75. This tells us that Goodreads users generally rate books positively, and the top-rated books stand out as statistical outliers, which strengthens their case for being high quality: not just slightly better, but meaningfully better than the rest.
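A boxplot like the one described can be sketched as follows, together with the quartiles and outliers behind it. The ratings here are made up for illustration, and the plot styling is an assumption chosen to match the other figures in this report. Note that `boxplot.stats()` uses Tukey's hinges while `geom_boxplot()` uses `quantile()`, so the two can differ slightly on small samples.

```r
library(ggplot2)

# Illustrative stand-in for goodreads_df; ratings are made up.
goodreads_df <- data.frame(
  avg_rating = c(4.55, 4.57, 4.58, 4.60, 4.61, 4.62, 4.64, 4.66, 4.68, 4.81)
)

# Quartiles that define the box and its median line
rating_quartiles <- quantile(goodreads_df$avg_rating, probs = c(0.25, 0.5, 0.75))

# Points beyond 1.5 * IQR from the hinges (the usual boxplot outlier rule)
rating_outliers <- boxplot.stats(goodreads_df$avg_rating)$out

rating_boxplot <- ggplot(goodreads_df, aes(y = avg_rating)) +
  geom_boxplot(fill = "darkgreen", alpha = 0.6) +
  labs(title = "Spread of Average Ratings", y = "Average Rating")

rating_boxplot
```

Printing `rating_quartiles` and `rating_outliers` next to the figure lets the numbers quoted in the text be read directly from the data.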

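#| label: histogram
#| echo: false
# Histogram of average ratings across the scraped books
ggplot(goodreads_df, aes(x = avg_rating)) +
  geom_histogram(bins = 10, fill = "darkgreen", color = "white") +
  labs(title = "Distribution of Average Ratings",
       x = "Average Rating",
       y = "Number of Books")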

The histogram shows how many books fall into each rating bucket. The majority of books have ratings between 4.55 and 4.65, and only a small number exceed 4.75. This reinforces what is shown in the boxplot: most books are very well liked, but only a handful rise to the top in terms of rating. This small group can be considered exceptional, and many of the books in it are part of fantasy series or graphic novels. This suggests that genre and format may also influence high ratings, even when these books are not the ones with the most ratings.

# Scatterplot: one point per book, popularity on x and rating on y
goodreads_df %>%
  ggplot(aes(x = num_ratings, y = avg_rating)) +
  geom_point(color = "steelblue") +
  labs(title = "Rating vs Number of Ratings",
       x = "Number of Ratings",
       y = "Average Rating")

The scatterplot shows the relationship between a book’s average rating and how many ratings it has received. At first glance, you might expect more ratings to be a sign of higher quality. However, the plot suggests that books with more ratings don’t necessarily have the highest average ratings. Most books cluster tightly in the 4.5 to 4.8 range, with no strong trend between popularity and rating. This supports the idea that popularity and quality aren’t always linked: a book can be widely read without being the highest rated, and vice versa.
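The visual impression of "no strong trend" can be backed by a rank correlation. The sketch below uses Spearman's rho, which is a choice made for this example because rating counts tend to be heavily skewed; it is not part of the original analysis, and the data frame is a made-up stand-in for `goodreads_df`.

```r
# Illustrative stand-in for goodreads_df; all values are made up.
goodreads_df <- data.frame(
  avg_rating  = c(4.81, 4.76, 4.70, 4.62, 4.60, 4.58),
  num_ratings = c(12000, 350000, 15000, 900000, 40000, 600000)
)

# Spearman's rho compares ranks, so a few very popular books
# do not dominate the result the way they would with Pearson's r.
rho <- cor(goodreads_df$num_ratings, goodreads_df$avg_rating,
           method = "spearman")

# Same statistic with a p-value for the null hypothesis of no association
rho_test <- cor.test(goodreads_df$num_ratings, goodreads_df$avg_rating,
                     method = "spearman")
rho_test$p.value
```

A value of rho near zero on the real data would support the reading of the scatterplot, while a clearly positive or negative value would call for revisiting it.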