Hogsmeade Sales EDA
Introduction
This is an Exploratory Data Analysis of the Hogsmeade Sales data set using RStudio and Quarto. It includes at least 8 graphs/tables with clear interpretation, well-documented code, and creative touches. I will be using ggplot2 for visualization and dplyr for data manipulation extensively.
The final report is structured with an Introduction, analysis sections with headings, and a Conclusion, and we’ll finish by showing the published report on RPubs.
1 Bar chart of Orders by Region
From this chart, we can observe that Diagon Alley has the highest number of orders (its bar is tallest), making it the largest market among the regions.
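A chart like this can be built by counting rows per region and plotting the counts. The sketch below is a minimal example on toy data, not the report's actual code: the column name `Region`, the data frame name `orders`, every region other than Diagon Alley, and all values are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the real data set (assumed): one row per order.
# Region names other than "Diagon Alley" are made up for this example.
orders <- data.frame(
  Region = c("Diagon Alley", "Diagon Alley", "Hogsmeade Village",
             "Godric's Hollow", "Diagon Alley", "Hogsmeade Village")
)

# Counting rows per region gives the number of orders per region,
# sorted from most to fewest orders.
orders_by_region <- orders %>%
  count(Region, name = "n_orders", sort = TRUE)

# Order the bars from tallest to shortest so the largest market comes first.
p <- ggplot(orders_by_region, aes(x = reorder(Region, -n_orders), y = n_orders)) +
  geom_col() +
  labs(x = "Region", y = "Number of orders", title = "Orders by Region")
```

Printing `p` (or leaving it as the last expression of a Quarto chunk) renders the chart.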
2 Bar Chart - Total Sales by Product Category
The Total Sales by Product Category chart shows which categories bring in the most revenue.
The categories are ranked from tallest to shortest bar. Books appear to have the largest overall sales (their bar is a little higher than the others), with Wands and Brooms trailing closely behind. Robes, Potions, Herbs, and Artefacts come in somewhat lower.
3 Table of top 5 products by total sales
| Product Name | total_sales |
|---|---|
| The History of Magic | 41961.00 |
| Flying Carpet | 40822.70 |
| Firebolt | 33882.71 |
| Advanced Potion-Making | 33752.43 |
| Time-Turner | 32382.69 |
The Top 5 Products by Total Sales table lists five product names and their total sales (the sum of all order sales for that product):
- The History of Magic has the highest overall sales (about $42,000).
- Flying Carpet is second, with about $40,800.
- Firebolt, a high-end broom, has total sales of about $33,900.
- Advanced Potion-Making, a well-known book, has total sales of about $33,750.
- Time-Turner, a priceless relic, is in the low $30,000s (about $32,400).
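A table like the one above can be produced by summing sales per product and keeping the five largest totals. This is a minimal sketch on toy data: `Product Name` and `total_sales` follow the table header, while the `Sales` column name, the data frame name `orders`, and all values are assumptions.

```r
library(dplyr)

# Toy order-level data (assumed values); products are labelled A to F.
orders <- data.frame(
  `Product Name` = c("A", "B", "A", "C", "D", "E", "F", "B"),
  Sales = c(100, 250, 50, 75, 300, 20, 10, 25),
  check.names = FALSE  # keep the space in the column name
)

top_products <- orders %>%
  group_by(`Product Name`) %>%
  summarise(total_sales = sum(Sales), .groups = "drop") %>%
  slice_max(total_sales, n = 5)  # five largest totals, in descending order
```

Note that `slice_max()` keeps ties by default, so it can return more than five rows when several products share the fifth-largest total.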
4 Histogram of the distribution of order Sales values
The Distribution of Order Sales Values histogram shows how order totals are distributed. The histogram appears right-skewed, with a longer tail on the right side. The majority of orders fall in the lower sales ranges; for instance, a significant portion of orders appears to be in the $0–$200 range (the first few bins).
The higher bins contain fewer orders. The bars generally get shorter as the sales value rises (moving right on the x-axis), signifying a decline in frequency.
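One quick way to back up a visual impression of right skew is to compare the mean and the median: in a right-skewed distribution the long tail typically pulls the mean above the median. The sketch below uses toy values, and the `Sales` column name and the bin width of 100 are assumptions rather than the settings used for the chart.

```r
library(ggplot2)

# Toy order values (assumed): many small orders and a few large ones.
orders <- data.frame(Sales = c(20, 35, 40, 55, 60, 80, 95, 120, 150, 400, 900))

sales_mean <- mean(orders$Sales)
sales_median <- median(orders$Sales)

# A mean well above the median is consistent with a right-skewed distribution.
is_right_skewed <- sales_mean > sales_median

# Histogram with bins starting at 0, so the first bins cover $0-$100 and $100-$200.
p <- ggplot(orders, aes(x = Sales)) +
  geom_histogram(binwidth = 100, boundary = 0) +
  labs(x = "Order sales", y = "Number of orders",
       title = "Distribution of Order Sales Values")
```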
5 Scatter plot of Units Sold vs. Sales for each order
Each point in the Units Sold vs. Order Sales scatter plot indicates the size (in terms of items) and total sales value of a single order. The data shows that no order contained more than ten items, hence the x-axis runs from 1 to 10 units. The order’s sales value, in currency, is shown on the y-axis.
There is a general upward trend: orders with more units typically result in larger sales (the cloud of dots extends higher as Units Sold increases). This makes sense because purchasing more items typically results in more spending.
6 Line graph of Total Sales Over Time (Monthly)
After converting Order Date to a Date type and aggregating sales by month, the Total Sales Over Time (Monthly) line graph displays the monthly sales total from the beginning of 2022 to the end of 2024. Important takeaways from this time frame are:
- Seasonal Peaks: Every year around December, there are discernible peaks in the line.
In fact, every December (2022, 2023, and 2024) hits a peak; December 2024 is probably the highest month in the data, and December 2022 and 2023 are also among the best months.
This clearly points to a year-end spike in sales during the holiday season (perhaps as a result of holiday shopping or events).
- Yearly Trend: Looking at the general trajectory year by year, sales throughout these three years appear relatively stable with regular seasonal variation, rather than showing a sharp upward or downward trend.
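The monthly series behind this graph comes from converting Order Date to a Date type and summing sales by month. The sketch below shows one way to do that on toy data; the date format (`"%Y-%m-%d"`), the `Sales` column name, the data frame name `orders`, and all values are assumptions, so the format string may need adjusting for the real file.

```r
library(dplyr)
library(ggplot2)

# Toy data (assumed values); dates are stored as text, as they often are after import.
orders <- data.frame(
  `Order Date` = c("2022-11-03", "2022-12-05", "2022-12-20", "2023-01-15"),
  Sales = c(100, 250, 300, 80),
  check.names = FALSE
)

monthly_sales <- orders %>%
  mutate(
    order_date = as.Date(`Order Date`, format = "%Y-%m-%d"),
    # Collapse each date to the first day of its month so orders group by month.
    month = as.Date(format(order_date, "%Y-%m-01"))
  ) %>%
  group_by(month) %>%
  summarise(total_sales = sum(Sales), .groups = "drop")

p <- ggplot(monthly_sales, aes(x = month, y = total_sales)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y") +
  labs(x = "Month", y = "Total sales", title = "Total Sales Over Time (Monthly)")
```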
7 Boxplot of order sales by customer type
According to the boxplot, all customer types (Hogwarts Staff, Hogwarts Students, Witches, and Wizards) have similar order sales, with each group’s median sales value falling within the same range.
The wide range of sales values in each category, however, shows that customers’ spending per order varies considerably.
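The comparison of medians and spread can also be checked numerically with a per-group summary alongside the boxplot. This is a minimal sketch on toy data: the column names `Customer Type` and `Sales`, the data frame name `orders`, and all values are assumptions; only the four customer types come from the report.

```r
library(dplyr)
library(ggplot2)

# Toy data (assumed values): three orders per customer type.
orders <- data.frame(
  `Customer Type` = rep(c("Hogwarts Staff", "Hogwarts Students",
                          "Witches", "Wizards"), each = 3),
  Sales = c(50, 100, 300, 40, 110, 280, 60, 105, 320, 45, 95, 310),
  check.names = FALSE
)

# Median shows the typical order; IQR shows how much spending varies within a group.
sales_by_type <- orders %>%
  group_by(`Customer Type`) %>%
  summarise(
    median_sales = median(Sales),
    iqr_sales = IQR(Sales),
    n_orders = n(),
    .groups = "drop"
  )

p <- ggplot(orders, aes(x = `Customer Type`, y = Sales)) +
  geom_boxplot() +
  labs(x = "Customer type", y = "Order sales",
       title = "Order Sales by Customer Type")
```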
8 Scatter plot of Profit vs Sales
The graph shows a strong positive linear relationship between sales and profit per order. Each point represents an individual order, and the trend line indicates that as sales increase, profit also rises proportionally.
The close clustering of points around the line suggests a consistent pattern, meaning higher sales generally lead to higher profits across the dataset.
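The strength of a linear relationship like this can be quantified with a correlation coefficient and a simple linear fit. The sketch below uses toy values, and the column names `Sales` and `Profit` and the data frame name `orders` are assumptions; it illustrates the check rather than reproducing the report's numbers.

```r
library(ggplot2)

# Toy order-level data (assumed values).
orders <- data.frame(
  Sales = c(100, 200, 300, 400, 500),
  Profit = c(22, 39, 61, 80, 98)
)

# Pearson correlation: values close to 1 indicate a strong positive linear relationship.
sales_profit_cor <- cor(orders$Sales, orders$Profit)

# Simple linear regression; the slope is the extra profit per extra unit of sales.
fit <- lm(Profit ~ Sales, data = orders)
slope <- coef(fit)[["Sales"]]

# Scatter plot with the fitted trend line.
p <- ggplot(orders, aes(x = Sales, y = Profit)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Sales", y = "Profit", title = "Profit vs Sales")
```

On the real data, a slope that stays roughly constant across the sales range would support the reading that profit rises proportionally with sales.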
Bonus MEME
“When you finally fix one error in R… only to trigger two more. Story of every data analyst’s life.”