Rows: 1000 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): title, category, availability
dbl (3): price_gbp, rating, page_scraped_from
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Bar chart: average price by genre
books_data %>%
  group_by(category) %>%
  summarise(mean_price = mean(price_gbp)) %>%
  ggplot(aes(x = reorder(category, mean_price), y = mean_price)) +
  geom_col(fill = "#2a9d8f") +
  coord_flip() +
  labs(title = "Average Price by Genre", x = NULL, y = "Mean Price (GBP)") +
  theme_minimal()

# Boxplot: star rating distribution by genre, ordered by median rating
books_data %>%
  ggplot(aes(x = reorder(category, rating, FUN = median), y = rating)) +
  geom_boxplot(fill = "#457b9d", alpha = 0.7) +
  coord_flip() +
  labs(title = "Star Rating Distribution by Genre", x = NULL, y = "Rating (1-5)") +
  theme_minimal()
Analysis & Results
The bar chart shows the average price per genre, and the spread across categories is noticeable. Art and Food & Drink are the most expensive genres, averaging well above the others, while Children's and Poetry sit at the lower end. That pattern makes intuitive sense, since specialty or illustrated books tend to be pricier regardless of content. It supports the idea that genre is at least somewhat tied to pricing, even if it is not the only factor.
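To put numbers behind the bar chart, the per-genre means can be printed as a table alongside the book count and standard deviation for each genre. This is a minimal sketch, not the code used for the report: `toy_books` is a hypothetical stand-in for `books_data`, and its prices are made up purely for illustration.

```r
library(dplyr)

# Hypothetical stand-in for books_data; these prices are invented for illustration.
toy_books <- data.frame(
  category  = c("Art", "Art", "Poetry", "Poetry", "Fantasy", "Fantasy"),
  price_gbp = c(50, 40, 20, 30, 35, 45)
)

# Mean price per genre, with the group size and spread shown next to it
# so that a mean based on very few books is easy to spot.
price_summary <- toy_books %>%
  group_by(category) %>%
  summarise(
    n_books    = n(),
    mean_price = mean(price_gbp, na.rm = TRUE),
    sd_price   = sd(price_gbp, na.rm = TRUE),
    .groups    = "drop"
  ) %>%
  arrange(desc(mean_price))

print(price_summary)
```

Swapping `toy_books` for the real `books_data` should give the same figures the bar chart plots, assuming the column names match the column specification above.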
The boxplot tells a different story for ratings. Most genres have a median rating of around 3 to 4 stars, and the boxes are similar in size across the board, so the spread within each genre does not differ dramatically either. Fantasy and Science Fiction sit slightly higher, while Self Help shows a bit more variation, with some lower ratings pulling it down. Overall, no genre stands out as consistently better or worse rated than the rest.
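The box positions and sizes described above correspond to the median and interquartile range of `rating` within each genre, so they can also be read off a summary table. The sketch below assumes a hypothetical `toy_ratings` data frame with made-up ratings; it is not the report's data.

```r
library(dplyr)

# Hypothetical stand-in for books_data; these ratings are invented for illustration.
toy_ratings <- data.frame(
  category = rep(c("Fantasy", "Self Help"), each = 5),
  rating   = c(3, 4, 4, 5, 4,
               1, 2, 3, 4, 5)
)

# Median marks the line inside each box; IQR is the height of the box.
rating_summary <- toy_ratings %>%
  group_by(category) %>%
  summarise(
    median_rating = median(rating, na.rm = TRUE),
    iqr_rating    = IQR(rating, na.rm = TRUE),
    .groups       = "drop"
  )

print(rating_summary)
```

Genres with similar `median_rating` and `iqr_rating` values are the ones whose boxes look alike in the plot.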
Taken together, the two charts suggest that genre has a real relationship with price but not much of one with rating. Price is not a reliable signal of quality here, since the more expensive genres are not getting better reviews. For a pricing model or recommendation system, genre would be a useful input for estimating cost but a weak one for predicting how well received a book will be.
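The conclusions above rest on visual comparison, so one possible follow-up is a pair of quick numeric checks: a rank correlation between price and rating, and a Kruskal-Wallis test of whether price differs across genres. This is only a sketch under stated assumptions. `toy_check` is a hypothetical data frame with invented values, and the choice of Spearman correlation and Kruskal-Wallis is an assumption made here, not a method taken from the report.

```r
# Hypothetical stand-in for books_data; all values are invented for illustration.
toy_check <- data.frame(
  category  = rep(c("Art", "Poetry", "Fantasy"), each = 4),
  price_gbp = c(48, 52, 45, 55,  20, 25, 22, 18,  35, 30, 38, 33),
  rating    = c(3, 4, 2, 5,  4, 3, 5, 2,  3, 5, 2, 4)
)

# Spearman rank correlation: values near 0 mean price says little about rating.
rho <- cor(toy_check$price_gbp, toy_check$rating, method = "spearman")

# Kruskal-Wallis: a small p-value suggests price differs between genres.
kw_price <- kruskal.test(price_gbp ~ category, data = toy_check)

print(rho)
print(kw_price)
```

Run against the real data, a correlation close to zero together with a small Kruskal-Wallis p-value for price would be consistent with the reading above, though neither check shows that genre causes the price difference.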