Assignment 7
Part 1: Question / My Interest
Which book categories on an online bookstore have the lowest availability, and is there a relationship between availability, price, and ratings?
I think this is interesting because low availability exposes inventory limitations, which can indicate high demand and supply chain constraints. Comparing availability across categories could also help publishers identify pressure points in their inventory.
Part 2: Data Collection Process
Website: https://books.toscrape.com/
I collected:
Book Title
Category
Price (in Pounds)
Rating (1-5 stars)
Availability (number available and original text)
I wrote a function that scrapes one page of books within a single category. A loop calls it for every page of that category, and an outer loop repeats this for multiple categories. The category names and their unique URLs were extracted from the main page and paired together.
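The page-by-page loop described above could be sketched roughly as follows. This is only an illustration, not the code used for the assignment: it assumes Python, assumes the site marks its pagination link with an `<li class="next">` element, and uses a plain regular expression where a real scraper would more likely use an HTML parser. The names `fetch`, `category_pages`, and `NEXT_LINK` are hypothetical.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the "next page" link is wrapped in <li class="next">.
NEXT_LINK = re.compile(r'<li class="next">\s*<a href="([^"]+)"')


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def category_pages(start_url, fetch_page=fetch):
    """Yield (url, html) for every page of one category.

    Starts at start_url and follows the "next" link until none is found.
    fetch_page is injectable so the loop can be tried without network access.
    """
    url = start_url
    while url:
        html = fetch_page(url)
        yield url, html
        match = NEXT_LINK.search(html)
        # Relative links such as "page-2.html" are resolved against the current URL.
        url = urljoin(url, match.group(1)) if match else None
```

An outer loop over the category URLs would call `category_pages` once per category and hand each page's HTML to the single-page scraping function.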
Part 3: Data Wrangling
Converted price from text to a number and removed the pound symbol
Converted rating from text (One, Two, etc.) to a number (1-5)
Extracted the number available from the availability text; defaulted to 1 if no number was present
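The three wrangling steps above might look something like this in Python. This is a sketch under assumptions: the function names are hypothetical, and the exact form of the scraped strings (for example "In stock (22 available)") is assumed rather than taken from the assignment's code.

```python
import re

# Maps the rating words used on the site to numbers 1-5.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text):
    """Strip the pound symbol (and any other non-numeric characters) and return a float."""
    return float(re.sub(r"[^0-9.]", "", text))


def parse_rating(word):
    """Convert a rating word such as 'Three' to its number."""
    return RATING_WORDS[word]


def parse_availability(text):
    """Return the first number found in the availability text, or 1 if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 1
```

Defaulting to 1 treats a plain "In stock" label as a single copy, which matches the rule stated above but may undercount categories whose pages do not show a quantity.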
Part 4: CSV file
Visualizations:
Average Price by Rating:
This bar chart compares the average price in Pounds of books across their ratings from 1-5. Surprisingly, 3-star ratings show the highest average price, while 4 and 5-star books are priced similarly to the lower-rated books.
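The averages behind a chart like this can be computed with a simple group-and-mean step. The sketch below assumes Python and assumes each cleaned row is a dictionary with numeric `rating` and `price` keys; those key names and the function name are hypothetical, not taken from the assignment's CSV.

```python
from collections import defaultdict


def mean_price_by_rating(rows):
    """Return {rating: average price} for an iterable of row dictionaries."""
    totals = defaultdict(lambda: [0.0, 0])  # rating -> [sum of prices, count]
    for row in rows:
        bucket = totals[row["rating"]]
        bucket[0] += row["price"]
        bucket[1] += 1
    # Sort by rating so the result is in the same order as the chart's x-axis.
    return {rating: total / count for rating, (total, count) in sorted(totals.items())}
```

The same pattern works for the category charts by grouping on the category key instead of the rating.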
Price Distribution by Rating:
The box plot shows how prices vary across star ratings; 5-star books have the widest price range. Their median price is not the highest, but the spread suggests that top-rated books can be either budget-friendly or premium editions. This challenges the assumption that higher ratings mean higher prices.
Average Rating by Category:
This bar chart highlights how readers rate different book categories on average. Poetry stands out with the highest average rating, while Science Fiction receives the lowest, which might reflect higher expectations among readers who enjoy the genre.
Average Price by Category:
This chart compares average book prices across categories. Travel books are the most expensive, while Mystery books are the most affordable.
Price by Rating:
The scatter plot shows the relationship between book ratings and prices. Prices vary widely across all ratings, and the densest clusters appear around 4- and 5-star ratings, which suggests these ratings are the most common. There is no clear trend in price as ratings increase, so popularity or reader satisfaction does not appear to drive pricing in this dataset.