This case study explores the top 50 bestselling books on Amazon between 2009 and 2019. The goal is to identify trends related to genre, pricing, ratings, and author recurrence. These insights can help publishers and retailers understand what drives book sales and make data-driven publishing and marketing decisions.
Dataset: bestsellers with categories.csv
library(tidyverse)
library(janitor)
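# Load the data, standardise column names, drop exact duplicate rows,
# then fix column types and trim stray whitespace from text fields.
books <- read_csv("bestsellers with categories.csv") %>%
  clean_names() %>%
  distinct() %>%
  mutate(
    year = as.integer(year),
    genre = as.factor(genre),
    author = str_trim(author),
    name = str_trim(name)
  )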
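Before plotting, it can help to confirm that the cleaned data looks as expected. The sketch below is not part of the original analysis: check_books is a hypothetical helper that reports how many rows each year contributes and how many values are missing in each column. It assumes only that the data frame has a year column, as the cleaned books table does.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original analysis).
# Returns a list with two tibbles:
#   rows_per_year     - number of rows for each year
#   missing_by_column - number of NA values in every column
check_books <- function(df) {
  list(
    rows_per_year = count(df, year, name = "n_books"),
    missing_by_column = summarise(df, across(everything(), ~ sum(is.na(.x))))
  )
}
```

Calling check_books(books) after the loading step gives a quick view of whether any year is under- or over-represented once duplicates are dropped.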
# Total number of bestsellers per genre, pooled across all years.
books %>%
  count(genre) %>%
  ggplot(aes(x = genre, y = n, fill = genre)) +
  geom_col(width = 0.6) +
  labs(title = "Fiction vs Nonfiction Bestsellers",
       x = "Genre", y = "Count") +
  scale_fill_manual(values = c("#FF6F61", "#3E92CC")) +
  theme_minimal()
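📌 Insight: Fiction dominates the bestsellers list overall, while Non Fiction grows steadily after 2015, which may indicate a shift in reader interest toward real-world topics. Because this chart pools all years, that growth is visible only in the year-by-year chart, not here.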
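To put numbers behind the bar chart, the counts can be turned into shares. This is a minimal sketch rather than part of the original analysis: genre_share is a hypothetical helper name, and it assumes the data frame has a genre column like the cleaned books table.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original analysis).
# Counts rows per genre and adds each genre's share of the total.
genre_share <- function(df) {
  df %>%
    count(genre, name = "n_books") %>%
    mutate(share = n_books / sum(n_books))
}
```

Running genre_share(books) returns one row per genre, which makes it easy to check how large the gap between the two genres actually is.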
# Scatter plot of price against user rating, with a linear trend line.
ggplot(books, aes(x = price, y = user_rating)) +
  geom_point(alpha = 0.6, color = "#2E4057") +
  geom_smooth(method = "lm", se = FALSE, color = "#FF5733") +
  labs(title = "Book Price vs User Rating",
       x = "Price (USD)", y = "User Rating") +
  theme_minimal()
📌 Insight: Price and user rating show no strong correlation. Readers rate books highly regardless of price, which suggests that satisfaction depends more on content than on cost.
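The scatter plot gives a visual impression only; a correlation coefficient makes the claim checkable. The sketch below is an addition, not the original author's method. It uses cor.test from base R to compute the Pearson correlation between price and user_rating, and price_rating_cor is a hypothetical helper name.

```r
# Hypothetical helper (an assumption, not from the original analysis).
# Returns the Pearson correlation between price and user rating,
# together with the p-value of the test that the correlation is zero.
price_rating_cor <- function(df) {
  test <- cor.test(df$price, df$user_rating, method = "pearson")
  c(estimate = unname(test$estimate), p_value = test$p.value)
}
```

An estimate close to zero from price_rating_cor(books) would support the reading above. A value further from zero would call for a more careful statement.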
# Scatter plot of review count against user rating, with a linear trend line.
ggplot(books, aes(x = reviews, y = user_rating)) +
  geom_point(alpha = 0.5, color = "#1B998B") +
  geom_smooth(method = "lm", se = FALSE, color = "#EDC948") +
  labs(title = "Number of Reviews vs User Rating",
       x = "Number of Reviews", y = "User Rating") +
  theme_minimal()
📌 Insight: Books with high review counts tend to keep high ratings, suggesting that visibility and strong content reinforce each other.
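Review counts are often heavily right-skewed, so a rank-based measure may describe this relationship better than a straight line. The sketch below is an addition under that assumption: it swaps in the Spearman rank correlation from base R's cor in place of the linear fit shown in the plot, and reviews_rating_spearman is a hypothetical helper name.

```r
# Hypothetical helper (an assumption, not from the original analysis).
# Spearman rank correlation between review count and user rating;
# rows with a missing value in either column are ignored.
reviews_rating_spearman <- function(df) {
  cor(df$reviews, df$user_rating,
      method = "spearman", use = "complete.obs")
}
```

A clearly positive value from reviews_rating_spearman(books) would be consistent with the insight; a value near zero would suggest that review volume and rating move largely independently.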
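# Number of bestsellers per genre in each year, shown side by side.
books %>%
  count(year, genre) %>%
  ggplot(aes(x = year, y = n, fill = genre)) +
  geom_col(position = "dodge") +
  # Label every year explicitly so each bar group gets an integer tick.
  scale_x_continuous(breaks = 2009:2019) +
  labs(title = "Bestseller Genre Trend Over Time",
       x = "Year", y = "Number of Bestsellers") +
  scale_fill_manual(values = c("#E69F00", "#56B4E9")) +
  theme_minimal()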
📌 Insight: Fiction kept its lead throughout the decade, while Non Fiction rose noticeably from 2016 to 2019, which may reflect the cultural and political climate of those years.
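The year-by-year trend is easier to verify as a table of shares than by reading bar heights. The following sketch is an addition, not part of the original analysis: genre_share_by_year is a hypothetical helper that assumes year and genre columns as in the cleaned books table.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original analysis).
# For each year, counts rows per genre and computes each genre's
# share of that year's bestsellers.
genre_share_by_year <- function(df) {
  df %>%
    count(year, genre, name = "n_books") %>%
    group_by(year) %>%
    mutate(share = n_books / sum(n_books)) %>%
    ungroup()
}
```

Filtering the result of genre_share_by_year(books) to a single genre shows directly which years that genre led and whether its share rose after 2015.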
This analysis uncovers useful business and behavioral patterns in reader preferences. The results suggest that consistent authorship, quality storytelling, and strong visibility all contribute to making a book a long-term bestseller.
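Author recurrence is one of the stated goals, but the charts above do not measure it directly. A minimal sketch of how it could be checked follows. It is an addition, not the original author's method: author_recurrence and its n_authors parameter are hypothetical names, and the code assumes author, name, and year columns as in the cleaned books table.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original analysis).
# For each author: total list appearances, number of distinct titles,
# and number of distinct years on the list. Returns the n_authors
# authors with the most appearances.
author_recurrence <- function(df, n_authors = 10) {
  df %>%
    group_by(author) %>%
    summarise(
      appearances = n(),
      distinct_titles = n_distinct(name),
      years_on_list = n_distinct(year),
      .groups = "drop"
    ) %>%
    arrange(desc(appearances)) %>%
    slice_head(n = n_authors)
}
```

In the output of author_recurrence(books), an author with many appearances but few distinct titles has books that stay on the list for several years, while many distinct titles point to repeated new releases. Author names spelled inconsistently in the source file would be counted as separate authors by this sketch.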