How do genre, price, and ratings influence the popularity of books at a fictional online bookstore?
Explanation
This question explores the dynamics of book selection in a fictional online bookstore, focusing on how different genres fare in terms of pricing, ratings, and stock availability. Although the bookstore is fictional, the analysis provides a template for gaining insight into consumer behavior and preferences, which is valuable for bookstore management and marketing strategies.
How I Will Answer this Question
I will use data from Books.toscrape.com (https://books.toscrape.com/index.html), a fictional bookstore website that is meant to help people practice web scraping. Although the bookstore is fictional, the analysis provides a template for how people can analyze a real bookstore. To answer the question, I will analyze visualizations that show the average rating by genre, the average price vs. rating by genre, the average price by genre, a density plot of prices by genre, and a violin plot of ratings by genre.
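The scraping step itself is not shown in this report. As a minimal sketch of how one listing could be parsed with rvest, the example below works on a hand-written HTML fragment rather than the live site. The fragment, the CSS selectors, and the `scraped` and `Availability` names are assumptions made for illustration and should be checked against the actual page markup.

```r
library(rvest)

# Hand-written stand-in for a single book listing (assumed markup, not fetched
# from the live site).
page <- minimal_html('
  <article class="product_pod">
    <p class="star-rating Three"></p>
    <h3><a href="#" title="Example Book">Example ...</a></h3>
    <p class="price_color">&pound;51.77</p>
    <p class="instock availability">In stock</p>
  </article>
')

books <- html_elements(page, "article.product_pod")

# Pull the raw text or attributes for each field (selectors are assumptions).
title        <- html_attr(html_element(books, "h3 a"), "title")
price_text   <- html_text2(html_element(books, "p.price_color"))
rating_class <- html_attr(html_element(books, "p.star-rating"), "class")
stock_text   <- html_text2(html_element(books, "p.availability"))

# Strip the currency symbol and convert the rating word to a number from 1 to 5.
rating_word <- sub("star-rating ", "", rating_class, fixed = TRUE)
scraped <- data.frame(
  Title        = title,
  Price        = as.numeric(gsub("[^0-9.]", "", price_text)),
  Rating       = match(rating_word, c("One", "Two", "Three", "Four", "Five")),
  Availability = tolower(stock_text)
)
```

Parsing a fixed fragment keeps the sketch runnable offline; pointing `read_html()` at a real listing page would be the natural next step.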
Data Wrangling
I performed the necessary data wrangling in a separate R script. I aggregated the data for the 4 genres present. I also recoded the in-stock variable from “in stock” and “out of stock” into a binary variable coded as either 1 or 2. Below is the final dataset for my analysis.
library(readr)
library(purrr)
library(rvest)
library(dplyr)
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Title, Genre
dbl (3): Price, Rating, Stock
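The wrangling script is not included here, so the following is only a sketch of how the in-stock recoding described above could be done with dplyr. The `raw` data frame, the `Availability` column name, and the choice of 1 for “in stock” and 2 for “out of stock” are assumptions; the report does not say which value maps to which code.

```r
library(dplyr)

# Made-up rows standing in for the scraped data; Availability is a hypothetical
# name for the raw text column.
raw <- data.frame(
  Title        = c("Book A", "Book B", "Book C"),
  Availability = c("in stock", "out of stock", "in stock")
)

# Assumed coding: 1 = in stock, 2 = out of stock.
final <- raw %>%
  mutate(Stock = if_else(Availability == "in stock", 1, 2)) %>%
  select(Title, Stock)
```

The resulting `Stock` column is numeric, which is consistent with the `dbl` type shown for `Stock` in the column specification.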
Visualization 1
Travel has the highest average price among the genres present. This could be due to high production costs or to greater demand for such books. Sequential art also has a relatively high average price, which reflects the normally high production costs of graphic novels. Mystery has the lowest average price, which suggests that production costs are likely low and that demand may be lower. Price alone cannot settle which genre is most popular: lower prices could make a genre more popular because its books are more affordable, while higher prices could instead reflect higher demand, as may be the case for travel books.
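The averages behind a chart like this can be computed with a grouped summary. The sketch below uses made-up prices, not the scraped data, and the `avg_by_genre`, `mean_price`, and `n_books` names are illustrative.

```r
library(dplyr)

# Made-up prices for illustration only; not the scraped data.
books <- data.frame(
  Genre = c("Travel", "Travel", "Mystery", "Mystery"),
  Price = c(40, 50, 10, 20)
)

# One row per genre, most expensive genre first.
avg_by_genre <- books %>%
  group_by(Genre) %>%
  summarise(mean_price = mean(Price), n_books = n()) %>%
  arrange(desc(mean_price))
```

Keeping the book count next to each mean makes it easier to see when an average rests on only a handful of titles.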
Visualization 2
Travel has the widest price distribution, with a peak at about 30 dollars. Mystery, on the other hand, has the narrowest distribution, with a peak of around 15, meaning it has the most consistent pricing among the genres. More variety in pricing caters to a wider range of customers by offering both budget and premium options. However, a broader price distribution can also lead to struggles with availability because demand is harder to predict, whereas the mystery genre’s more consistent pricing makes demand more predictable.
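The peak and width read off a density plot can also be estimated numerically. The sketch below uses base R's `density()` on made-up prices; the `peak_price` helper and the `price_shape` table are hypothetical names, and the standard deviation is used here as one possible measure of width.

```r
# Made-up prices for illustration only; not the scraped data.
books <- data.frame(
  Genre = rep(c("Travel", "Mystery"), times = c(5, 6)),
  Price = c(10, 25, 30, 32, 55, 13, 14, 15, 15, 16, 17)
)

# Price at which the kernel density estimate is highest.
peak_price <- function(prices) {
  d <- density(prices)
  d$x[which.max(d$y)]
}

# Peak location and spread for each genre.
by_genre <- split(books$Price, books$Genre)
price_shape <- data.frame(
  Genre = names(by_genre),
  peak  = sapply(by_genre, peak_price),
  sd    = sapply(by_genre, sd),
  row.names = NULL
)
```

A larger `sd` corresponds to a wider curve in the plot, and `peak` approximates where the curve is tallest.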
Visualization 3
Historical fiction has the highest average rating, while travel has the lowest. The average ratings are fairly similar, suggesting there is not a meaningful difference in customer satisfaction by genre. Travel has the lowest average rating despite having the highest average price, which could mean that a few low ratings are pulling its average down or that people do not feel the books are worth the high prices. Historical fiction may therefore be the most popular genre, as it has the 2nd lowest average price in the store while also having the highest average rating.
Visualization 4
Historical fiction has the highest average rating, but its ratings seem to be concentrated around 3, suggesting a narrow range of ratings. Travel, on the other hand, has ratings ranging from 1 to 5 but not many ratings overall, suggesting that there are not a lot of options in the travel genre. From this chart, historical fiction seems to satisfy people consistently, while satisfaction with travel books varies a lot depending on the book.
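The report does not show the plotting code, so the following is a sketch of how a violin plot of ratings by genre could be drawn, assuming ggplot2 was used. The ratings are made up, and `scale = "count"` is a choice made here so that genres with few books produce visibly smaller violins.

```r
library(ggplot2)

# Made-up ratings for illustration only; not the scraped data.
books <- data.frame(
  Genre  = rep(c("Historical Fiction", "Travel"), each = 6),
  Rating = c(3, 3, 3, 4, 3, 2, 1, 5, 2, 4, 1, 5)
)

# scale = "count" makes each violin's area proportional to the number of
# books in that genre.
violin <- ggplot(books, aes(x = Genre, y = Rating, fill = Genre)) +
  geom_violin(scale = "count") +
  labs(title = "Ratings by genre", x = "Genre", y = "Rating (1 to 5)") +
  theme(legend.position = "none")
```

Printing `violin` renders the chart; a genre whose ratings cluster around one value shows a short, wide shape, while a genre spread from 1 to 5 shows a tall, thin one.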
Visualization 5
This scatter plot seems to indicate that, while higher ratings can contribute to popularity, ratings are not solely dependent on price, as books at various prices have a wide range of ratings. Travel’s average rating seems to be held back by a few low-rated books, while its other books are rated higher despite being more expensive.
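The relationship above is read off the scatter plot. One way to put a number on it, which is an addition here and not part of the original analysis, is a rank correlation between price and rating. The values below are made up for illustration.

```r
# Made-up values for illustration only; not the scraped data.
books <- data.frame(
  Price  = c(10, 20, 30, 40, 50),
  Rating = c(3, 1, 5, 2, 4)
)

# Spearman's rank correlation treats the 1-to-5 ratings as ordered categories.
rho <- cor(books$Price, books$Rating, method = "spearman")
```

A value near 0 would be consistent with ratings that do not track price, while values near 1 or -1 would suggest a strong monotonic relationship.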
Conclusion
Popularity at this fictional bookstore seems to be multi-faceted and not solely dependent on any one factor of genre, price, or ratings. Instead, it is the intersection of these factors that shapes consumer behavior. Historical fiction appears to strike a balance between price and satisfaction, potentially making it a consistently popular choice. Mystery’s affordability might drive higher volume sales, making it popular in terms of units sold. The travel genre, with its wide price and rating range, could have polarized popularity, with certain high-quality offerings achieving success despite higher prices. Ultimately, stocking a variety of books across genres, prices, and anticipated ratings would likely serve the bookstore well in catering to a broad range of customer preferences and maximizing overall popularity.