Amazon Scrape
Introduction
I am going to scrape Amazon’s website to answer a question that I find interesting: do product attributes such as ratings and number of reviews affect the price of a product? The product category I am using is Electronics on Amazon’s website. I think this question is interesting because understanding whether ratings and the number of reviews affect price matters to businesses, sellers, and consumers. For example, sellers on platforms like Amazon could refine their pricing strategies by knowing how ratings and reviews shape customers’ perception of value.
How I Am Going to Answer This Question
To answer this question, I will scrape data from Amazon’s website containing each product’s name, price, rating, and number of reviews. From there, I will loop through the search-result pages to gather as much information as I can. Once the data is scraped into a data frame, I will clean it so that there are no nulls and every column has the expected type. Then, I will analyze the relationship between the product attributes (rating and num_reviews) and price using correlations and visual analysis such as scatterplots and boxplots.
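The page loop described above could be sketched in Python roughly as follows. This is not the code used for this report: the class names (product-card, product-title, and so on) are hypothetical placeholders for Amazon’s real markup, which is different and changes over time, and fetch_page is a hypothetical caller-supplied function that returns the HTML for one results page. Only the standard library’s html.parser is used.

```python
from html.parser import HTMLParser

# Hypothetical CSS class names; Amazon's real markup is different and changes often.
CARD_CLASS = "product-card"
FIELD_CLASSES = {
    "product-title": "name",
    "product-price": "price",
    "product-rating": "rating",
    "product-reviews": "num_reviews",
}


class ProductCardParser(HTMLParser):
    """Collect the text of each tagged field inside every product card."""

    def __init__(self):
        super().__init__()
        self.products = []    # one dict per product card seen so far
        self._current = None  # dict for the card currently being filled
        self._field = None    # field name whose text is being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if CARD_CLASS in classes:
            self._current = {}
            self.products.append(self._current)
        elif self._current is not None:
            for cls in classes:
                if cls in FIELD_CLASSES:
                    self._field = FIELD_CLASSES[cls]

    def handle_endtag(self, tag):
        # Fields are assumed to be flat elements, so any closing tag ends one.
        self._field = None

    def handle_data(self, data):
        if self._current is not None and self._field and data.strip():
            self._current[self._field] = data.strip()


def scrape_pages(fetch_page, num_pages):
    """Parse result pages 1..num_pages and return one dict per product."""
    rows = []
    for page in range(1, num_pages + 1):
        parser = ProductCardParser()
        parser.feed(fetch_page(page))  # fetch_page: hypothetical, returns HTML text
        parser.close()
        rows.extend(parser.products)
    return rows


# Made-up HTML snippet standing in for a downloaded results page.
SAMPLE_PAGE = (
    '<div class="product-card">'
    '<span class="product-title">USB-C Cable</span>'
    '<span class="product-price">$9.99</span>'
    '<span class="product-rating">4.6 out of 5 stars</span>'
    '<span class="product-reviews">12,345</span>'
    "</div>"
)

rows = scrape_pages(lambda page: SAMPLE_PAGE, num_pages=2)
print(len(rows))  # 2: one card per page, two pages
```

Keeping the download step inside fetch_page, separate from the parsing, makes it possible to test the parser on saved HTML before sending any real requests.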
Clean the Dataset
I have to clean the data set so that the data types are consistent and the numeric columns (price, rating, and number of reviews) are actually stored as numbers, which lets us plot them and look at their distributions. To do this, I took the raw data set “amazon_data_15_pages” and cleaned and filtered it into a new data set called “amazon_data_clean.”
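The type conversions described above could look like the following sketch. It assumes the scraped fields arrive as strings such as "$1,299.99", "4.6 out of 5 stars", and "12,345"; those formats and the helper names (parse_price, parse_rating, parse_reviews, clean_rows) are assumptions for illustration, not the actual cleaning code. Rows with any missing value are dropped, matching the goal of having no nulls.

```python
import re


def parse_price(text):
    """'$1,299.99' -> 1299.99; None if the value is missing or has no number."""
    if not isinstance(text, str):  # covers None and pandas' NaN
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text):
    """'4.6 out of 5 stars' -> 4.6 (the first number in the string)."""
    if not isinstance(text, str):
        return None
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_reviews(text):
    """'12,345' -> 12345; assumes the count is written out in full (not '1.2K')."""
    if not isinstance(text, str):
        return None
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def clean_rows(raw_rows):
    """Convert scraped strings to numbers and drop rows with any missing value."""
    cleaned = []
    for row in raw_rows:
        name = row.get("name")
        record = {
            "name": name.strip() if isinstance(name, str) else "",
            "price": parse_price(row.get("price")),
            "rating": parse_rating(row.get("rating")),
            "num_reviews": parse_reviews(row.get("num_reviews")),
        }
        if record["name"] and None not in record.values():
            cleaned.append(record)
    return cleaned


# Made-up example rows: the second one has no price and is dropped.
raw = [
    {"name": "USB-C Cable", "price": "$9.99",
     "rating": "4.6 out of 5 stars", "num_reviews": "12,345"},
    {"name": "Monitor", "price": None,
     "rating": "4.2 out of 5 stars", "num_reviews": "87"},
]
print(clean_rows(raw))
# [{'name': 'USB-C Cable', 'price': 9.99, 'rating': 4.6, 'num_reviews': 12345}]
```

With pandas, the same helpers could be applied column by column, for example df["price"].map(parse_price), followed by df.dropna() to remove the rows that failed to parse.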
Scatterplot: Price vs. Ratings
This scatterplot shows Price vs. Ratings. It shows a slightly negative trend between Ratings (x-axis) and Price (y-axis), as indicated by the downward slope of the red regression line. From this plot, ratings alone do not seem to strongly affect the price of a product. Although there is a slight inverse relationship, other attributes would need to be included in the analysis to draw firmer conclusions about what influences price.
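Assuming the red line is an ordinary least-squares fit (which is what plotting helpers such as seaborn’s regplot draw by default), its slope is the covariance of rating and price divided by the variance of rating, so a negative slope is exactly the downward trend described above. The sketch below computes that line with the standard library; the four points are made-up toy numbers, not the scraped data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy points: rating on x, price on y.
ratings = [3.5, 4.0, 4.5, 5.0]
prices = [60.0, 50.0, 40.0, 30.0]
slope, intercept = fit_line(ratings, prices)
print(slope, intercept)  # -20.0 130.0
```

The slope only says which way the fitted line tilts; how tightly the points cluster around it is what the correlation coefficient measures.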
Boxplot: Price by Rating Category
The boxplot suggests no strong relationship between higher ratings and higher prices. Products in the 4-5 rating category are generally about as affordable as those in the 3-4 category, although they include a few high-priced outliers. This implies that high ratings are not tied exclusively to premium pricing, since highly rated products exist at various price points.
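A boxplot like this one can be reproduced numerically. In this sketch the rating cut-offs are assumptions (a rating of exactly 4.0 is placed in "4-5"), quartiles come from statistics.quantiles with method="inclusive", which interpolates the same way as NumPy’s default percentile, and outliers are points more than 1.5 * IQR beyond the box, the default whisker rule in matplotlib’s boxplot. The rows are made-up toy values.

```python
import statistics


def rating_category(rating):
    """Hypothetical bins: [4, 5] -> '4-5', [3, 4) -> '3-4', anything lower -> '<3'."""
    if rating >= 4:
        return "4-5"
    if rating >= 3:
        return "3-4"
    return "<3"


def box_stats(prices):
    """Quartiles and 1.5 * IQR outliers, the quantities a boxplot draws."""
    q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [p for p in prices if p < low or p > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


def boxplot_summary(rows):
    """Group prices by rating category and summarize each group."""
    groups = {}
    for row in rows:
        groups.setdefault(rating_category(row["rating"]), []).append(row["price"])
    # statistics.quantiles needs at least two values per group on older Pythons.
    return {cat: box_stats(p) for cat, p in groups.items() if len(p) >= 2}


# Made-up toy rows (rating, price), not the scraped data.
toy_rows = [
    {"rating": r, "price": p}
    for r, p in [(4.5, 20), (4.2, 25), (4.8, 30), (4.0, 35), (4.6, 40), (4.7, 300),
                 (3.5, 15), (3.9, 30), (3.1, 45)]
]
summary = boxplot_summary(toy_rows)
print(summary["4-5"])
# {'q1': 26.25, 'median': 32.5, 'q3': 38.75, 'outliers': [300]}
```

In the toy data, a single expensive item in the 4-5 group shows up as an outlier without moving that group's median much, which is the pattern the boxplot above describes.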
Correlation Matrix
For price and rating, there is a moderate negative correlation (-0.33), indicating that as price increases, rating tends to decrease. This suggests that more expensive products are not necessarily rated higher by customers.
The correlation between price and the number of reviews is close to zero, so there is essentially no linear relationship between them. This implies that a product’s price is not linearly associated with how many reviews it receives.
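The coefficients in the matrix are presumably Pearson correlations, which is what pandas’ DataFrame.corr() computes by default. The sketch below computes a Pearson correlation matrix with the standard library only; the column values are made-up toy numbers. Squaring r also helps with interpretation: an r of -0.33 means a straight-line fit on rating explains only about 0.33² ≈ 11% of the variance in price, which is consistent with calling the relationship moderate at most.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up toy columns (not the scraped data).
toy = {
    "price": [10.0, 20.0, 30.0, 40.0],
    "rating": [5.0, 4.0, 3.0, 2.0],
    "num_reviews": [5, 1, 1, 5],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The toy columns are built so that price and rating are perfectly negatively correlated (-1) while num_reviews is uncorrelated (0) with both; the real data sit between these extremes.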
Conclusion
Ratings and Price: There appears to be a moderate negative relationship between product ratings and price (correlation of -0.33): products with higher ratings tend to be somewhat less expensive, although exceptions exist, such as the high-priced outliers in the 4-5 rating category.
Number of Reviews and Price: The number of reviews has little to no relationship with the price of a product. This suggests that while review count may be an important factor for consumers, it does not directly track price.
Additional Factors: Other factors (such as brand, product category, features) likely play a more significant role in determining product price. Price cannot be fully explained by ratings and reviews alone.
Thus, while ratings seem to have some (inverse) association with price, the number of reviews does not appear to be strongly related to the price of a product.