📘 Business Task

This case study explores Amazon's annual top 50 bestselling books from 2009 to 2019. The goal is to identify trends in genre, pricing, ratings, and author recurrence. These insights can help publishers and retailers understand what drives book sales and make data-driven publishing and marketing decisions.


📂 Data Source


🧹 Data Cleaning

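# tidyverse supplies readr, dplyr, stringr and ggplot2; janitor supplies clean_names()
library(tidyverse)
library(janitor)

books <- read_csv("bestsellers with categories.csv") %>%
  clean_names() %>%              # convert headers to snake_case (e.g. user_rating)
  mutate(
    year = as.integer(year),
    genre = as.factor(genre),
    author = str_trim(author),   # trim stray whitespace before de-duplicating,
    name = str_trim(name)        # so near-identical rows collapse correctly
  ) %>%
  distinct()                     # drop exact duplicate rows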
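
Before charting, it can help to confirm that the cleaned table has the expected shape. The sketch below runs on a small made-up table (`toy_books` is hypothetical, not the real data). With the real `books` table, one would expect 50 rows per year if every yearly list is complete, and that expectation is an assumption based on the "top 50" description.

```r
library(dplyr)

# Hypothetical stand-in for the cleaned `books` table (made-up rows).
toy_books <- tibble(
  name = c("Book A", "Book B", "Book A", "Book C"),
  author = c("Author X", "Author Y", "Author X", "Author Z"),
  user_rating = c(4.7, 4.5, 4.7, NA),
  year = c(2009L, 2009L, 2010L, 2010L)
)

# Rows per year: a quick completeness check for each yearly list.
rows_per_year <- toy_books %>%
  count(year, name = "n_books")

# Number of missing values in every column.
missing_per_column <- toy_books %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

print(rows_per_year)
print(missing_per_column)
```

Swapping `toy_books` for `books` applies the same two checks to the real data.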

🔍 Analysis & Visualizations

🖊️ Top 10 Authors by Bestseller Appearances

books %>%
  count(author, sort = TRUE) %>%   # each row is one book in one year's list, so n counts appearances
  slice_max(n, n = 10) %>%         # ties are kept, so more than 10 authors may appear
  ggplot(aes(x = reorder(author, n), y = n)) +
  geom_col(fill = "#6A0572") +
  coord_flip() +
  labs(title = "Top 10 Authors by Bestseller Appearances (2009–2019)",
       x = "Author", y = "Number of Appearances") +
  theme_minimal()

📌 Insight: Series authors like Jeff Kinney and Suzanne Collins appear most often, highlighting the commercial power of recurring characters and established franchises.
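
Because a title can reappear in several yearly lists, an author's appearance count can differ from the number of distinct titles they placed. A minimal sketch of separating the two, using a hypothetical `toy_books` table in place of the real data:

```r
library(dplyr)

# Hypothetical rows: "Series 1" appears in two different years.
toy_books <- tibble(
  name = c("Series 1", "Series 1", "Series 2", "Standalone"),
  author = c("Author X", "Author X", "Author X", "Author Y"),
  year = c(2009L, 2010L, 2010L, 2009L)
)

author_summary <- toy_books %>%
  group_by(author) %>%
  summarise(
    appearances = n(),                 # rows, i.e. list appearances
    distinct_titles = n_distinct(name), # unique book titles
    .groups = "drop"
  ) %>%
  arrange(desc(appearances))

print(author_summary)
```

A large gap between the two columns points to a few titles that stay on the list for years, while similar values point to many different titles, as a series would produce.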


📚 Fiction vs Nonfiction Distribution

books %>%
  count(genre) %>%
  ggplot(aes(x = genre, y = n, fill = genre)) +
  geom_col(width = 0.6) +
  labs(title = "Fiction vs Nonfiction Bestsellers",
       x = "Genre", y = "Count") +
  scale_fill_manual(values = c("#FF6F61", "#3E92CC")) +
  theme_minimal()

📌 Insight: Fiction dominates the bestsellers list, but Non Fiction grows steadily post-2015, indicating a shift in reader interest toward real-world topics.
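
The bar chart above aggregates all years, so it cannot show a change over time on its own. One way to check a year-level trend is to count genres per year and convert the counts to shares. The sketch below uses a hypothetical `toy_books` table; the genre labels "Fiction" and "Non Fiction" are assumed to match the data's factor levels.

```r
library(dplyr)

# Hypothetical rows covering two years.
toy_books <- tibble(
  year = c(2014L, 2014L, 2014L, 2016L, 2016L, 2016L),
  genre = c("Fiction", "Fiction", "Non Fiction",
            "Fiction", "Non Fiction", "Non Fiction")
)

genre_by_year <- toy_books %>%
  count(year, genre) %>%          # appearances per year and genre
  group_by(year) %>%
  mutate(share = n / sum(n)) %>%  # each genre's share of that year's list
  ungroup()

print(genre_by_year)
```

With the real `books` table, plotting `share` against `year` with one line per genre would show whether either genre gains ground after 2015.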


💵 Price vs User Rating

ggplot(books, aes(x = price, y = user_rating)) +
  geom_point(alpha = 0.6, color = "#2E4057") +
  geom_smooth(method = "lm", se = FALSE, color = "#FF5733") +
  labs(title = "Book Price vs User Rating",
       x = "Price (USD)", y = "User Rating") +
  theme_minimal()

📌 Insight: No strong correlation found. Readers rate books highly regardless of price, which suggests that content quality drives satisfaction more than pricing.
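
A fitted line on a scatter plot gives a visual impression only; a correlation coefficient puts a number on it. The sketch below computes two coefficients with base R's `cor()` on a hypothetical `toy_books` table, which is made-up data and not the real result.

```r
# Hypothetical toy data; replace `toy_books` with the cleaned `books` table.
toy_books <- data.frame(
  price = c(5, 10, 15, 20),
  user_rating = c(4.5, 4.7, 4.7, 4.5)
)

# Pearson measures linear association; Spearman is rank-based and less
# sensitive to outliers such as a few unusually expensive titles.
pearson_r <- cor(toy_books$price, toy_books$user_rating, method = "pearson")
spearman_r <- cor(toy_books$price, toy_books$user_rating, method = "spearman")

cat("Pearson:", round(pearson_r, 3), " Spearman:", round(spearman_r, 3), "\n")
```

Values near 0 would support the "no strong correlation" reading, while values near 1 or -1 would contradict it.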


💬 Reviews vs Ratings

ggplot(books, aes(x = reviews, y = user_rating)) +
  geom_point(alpha = 0.5, color = "#1B998B") +
  geom_smooth(method = "lm", se = FALSE, color = "#EDC948") +
  labs(title = "Number of Reviews vs User Rating",
       x = "Number of Reviews", y = "User Rating") +
  theme_minimal()

📌 Insight: Books with high review counts tend to maintain high ratings, suggesting that visibility and strong content reinforce each other.
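
Review counts are often heavily skewed, so a few very popular titles can dominate a scatter plot. As a complementary check, books can be grouped into review-count bands and their ratings compared band by band. This is a sketch on a hypothetical `toy_books` table, and the choice of four bands is arbitrary.

```r
library(dplyr)

# Hypothetical rows with a wide spread of review counts.
toy_books <- tibble(
  reviews = c(100, 200, 400, 800, 1600, 3200, 6400, 12800),
  user_rating = c(4.2, 4.4, 4.3, 4.5, 4.6, 4.6, 4.7, 4.8)
)

rating_by_review_band <- toy_books %>%
  mutate(review_band = ntile(reviews, 4)) %>%  # 1 = fewest reviews, 4 = most
  group_by(review_band) %>%
  summarise(
    n_books = n(),
    median_reviews = median(reviews),
    mean_rating = mean(user_rating),
    .groups = "drop"
  )

print(rating_by_review_band)
```

If mean ratings rise from the lowest band to the highest in the real data, that would be consistent with the insight above; flat or falling values would not.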


📌 Key Takeaways


🧠 Recommendations


✅ Conclusion

This analysis uncovers useful business and behavioral patterns in reader preferences. Consistent authorship, quality storytelling, and strong visibility all contribute to making a book a long-term bestseller.