ggplot2: A visualization package that implements the Grammar of Graphics to create complex and customizable plots.
tidyverse: A collection of R packages designed for data science, emphasizing tidy data principles for streamlined data manipulation and visualization.
dplyr: A data manipulation package that provides functions for transforming and summarizing tabular data with a clear syntax.
completejourney: Contains datasets related to customer purchases, demographics, and promotional details for analyzing consumer behavior.
lubridate: Simplifies the manipulation and formatting of date-time objects for easier time series analysis.
kableExtra: Enhances knitr::kable() with additional formatting options for creating visually appealing tables in HTML and LaTeX.
scales: Provides functions to format axis labels and control scales in ggplot2 plots, improving readability.
plotly: An interactive graphing library that allows the creation of web-based interactive plots, integrating well with ggplot2.
RColorBrewer: Offers color palettes for creating visually appealing plots, especially useful in maps and categorical data visualizations.
viridis: Provides perceptually uniform color palettes that enhance accessibility and clarity in visualizations.
broom: Converts statistical model objects into tidy data frames, summarizing results into a standard format for easier analysis.
transactions <- transactions_sample
products <- products
combined_data <- transactions %>%
left_join(products, by = "product_id")
department_sales <- combined_data %>%
group_by(department) %>%
summarise(Total_Sales = sum(quantity), .groups = 'drop') %>%
arrange(desc(Total_Sales))
department_sales <- department_sales %>%
mutate(Percentage = Total_Sales / sum(Total_Sales) * 100)
num_departments <- nrow(department_sales)
custom_colors <- viridis::viridis(num_departments)
ggplot(department_sales, aes(x = "", y = Total_Sales, fill = department)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y") +
labs(title = "Sales Contribution by Department",
fill = "Department") +
theme_void() +
scale_fill_manual(values = custom_colors) +
theme(legend.position = "right")
We analyzed the sales contribution of different departments and the sales contribution by each department was represented by a pie graph, this was done to understand which department contributes the most sales which will allow us to plan for better resource allocations, promotions and inventory management. We utilized the transaction_sample and products datasets from the complete packages journey, then we merged these datasets with the product_id column. After joining the datasets the sales were aggregated by departments by summing up all the sales made by each department.
We found that the department with the highest sales was fuel, but in the pie graph you can also see the other departments that came second and third. By visualizing the percentage of sales for each department it becomes clear how sales are spread across different categories, this allows us to see if were only dependent on one department for most of the overall sales. When we understand which department contributes the most sales, it helps us tailor our marketing efforts so that the high-performance department which is fuel can benefit from enhanced promotions to capitalize sales but also see targeted offers to boost sales for other departments. I would recommend Regork to capitalize on their fuel sales and analyze all the other departments to understand the low sales and develop strategies to improve sales.
monthly_sales <- combined_data %>%
mutate(transaction_date = as.Date(transaction_timestamp)) %>%
group_by(month = floor_date(transaction_date, "month")) %>%
summarise(Total_Sales = sum(quantity), .groups = 'drop')
ggplot(monthly_sales, aes(x = month, y = Total_Sales)) +
geom_line(color = "blue", size = 1) +
geom_point(color = "blue", size = 2) +
labs(title = "Monthly Sales Trend",
x = "Month",
y = "Total Sales") +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous(labels = scales::comma)
We analyzed the monthly sales trends in the datasets to focus on how the quantity of items fluctuates over time. We wanted to analyze peak sales months or periods of decline throughout the year. We used the transaction_sample and products database from the complete journey package to identify the sales trends over time. The transaction_timestamp field was converted into a date format then the sales were grouped by date using floor_date function to round each date down to the beginning of its respective month. The total quantity of items sold was calculated using summarize ().
We found out that sales peak around September and plummet in January. By identifying what months have a peak in sales it can give us a better understanding of consumer behavior overtime and could that be in correlation to the holidays coming up or certain events of the year. We propose to drive promotions and deals in high volume sales months to maximize revenue by launching new campaigns and promos. We also propose to address the slow periods like January and suggest that we increase those campaigns to drive sales up when they plummet, especially after the holidays after everyone has spent their money.
top_products <- combined_data %>%
group_by(department) %>%
summarise(Total_Sales = sum(quantity),
Avg_Price = mean(sales_value, na.rm = TRUE),
.groups = 'drop') %>%
arrange(desc(Total_Sales)) %>%
head(5)
ggplot() +
geom_bar(data = top_products, aes(x = reorder(department, Total_Sales), y = Total_Sales),
stat = "identity", fill = "steelblue", position = "dodge") +
geom_line(data = top_products, aes(x = reorder(department, Total_Sales),
y = Avg_Price * max(top_products$Total_Sales) / max(top_products$Avg_Price)),
color = "orange", size = 1, group = 1) +
geom_point(data = top_products, aes(x = reorder(department, Total_Sales),
y = Avg_Price * max(top_products$Total_Sales) / max(top_products$Avg_Price)),
color = "orange", size = 2) +
labs(title = "Top 5 Products: Total Sales and Average Price",
x = "Product Department",
y = "Total Sales") +
scale_y_continuous(labels = scales::comma,
sec.axis = sec_axis(~ . * max(top_products$Avg_Price) / max(top_products$Total_Sales),
name = "Average Price")) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
We decided to identify the top 5 departments by total sales and compared their performances based one “total sales” and “average price of products” the analysis helps identify which departments not only sell the most products but also maintain higher average prices. We decided to use combines_data and merge it to transaction_sample and products databases and focus on the sales and pricing at the department level. For each department the total sales and average price are calculates and we chose the top five departments based on total sales, a bar chart is chosen to visualize the top 5 departments while a line graph is used to show the price average for each department. We found that the top five departments were fuel miscellaneous, grocery, drug gm, and produce with produce being the lowest and fuel being the highest. We recommend an adjustment in pricing strategy for departments with high total sales, but low average prices might benefit from price increases if the demand is inelastic while departments that have low sales might benefit from a price reduction. We also suggest targeted promotion towards low sales to increase sales volumes.
model <- lm(Total_Sales ~ month, data = monthly_sales)
future_months <- data.frame(month = seq(max(monthly_sales$month) + months(1), by = "month", length.out = 12))
predictions <- predict(model, newdata = future_months, interval = "confidence")
forecast_data <- bind_rows(
monthly_sales %>% mutate(Type = "Historical"),
future_months %>% mutate(Total_Sales = predictions[, "fit"], Type = "Predicted"),
future_months %>% mutate(Lower_CI = predictions[, "lwr"], Upper_CI = predictions[, "upr"], Type = "Confidence Interval")
)
ggplot(forecast_data, aes(x = month)) +
geom_line(aes(y = Total_Sales, color = Type), size = 1) +
geom_point(aes(y = Total_Sales, color = Type), size = 2) +
geom_ribbon(data = forecast_data %>% filter(Type == "Confidence Interval"),
aes(ymin = Lower_CI, ymax = Upper_CI),
fill = "lightblue", alpha = 0.5) +
labs(title = "Monthly Sales Trend and Forecast with Confidence Intervals",
x = "Month",
y = "Total Sales") +
scale_y_continuous(labels = scales::comma) +
scale_color_manual(values = c("Historical" = "blue", "Predicted" = "orange", "Confidence Interval" = "lightblue")) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
This predictive analysis uses historical sales data and a linear regression model to forecast future trends. The model predicts a rise in sales over the next 12 months, signaling positive consumer demand. However, variability in predictions, indicated by confidence intervals, suggests uncertainty in exact sales figures, emphasizing the need for ongoing monitoring.
Key actionable insights include increasing inventory levels to meet rising demand, planning marketing campaigns to capitalize on growth, and optimizing promotions around peak sales periods. If signs of a downturn appear, businesses should investigate factors affecting demand, adjust marketing strategies, and prepare for supply chain and inventory challenges.
Overall, the analysis offers critical insights into future sales trends, helping guide strategic decisions. Continuous comparison of actual sales against predictions will enable businesses to respond flexibly to market changes, ensuring they remain prepared for both growth opportunities and potential risks.
department_sales <- combined_data %>%
group_by(department) %>%
summarise(Total_Sales = sum(quantity),
Avg_Sales_Per_Transaction = mean(quantity, na.rm = TRUE),
Total_Transactions = n(),
.groups = 'drop') %>%
mutate(Percentage_Contribution = Total_Sales / sum(Total_Sales) * 100) %>%
arrange(desc(Total_Sales))
summary_table <- department_sales %>%
select(department, Total_Sales, Avg_Sales_Per_Transaction, Total_Transactions, Percentage_Contribution)
kable(summary_table, caption = "Detailed Summary of Sales by Department") %>%
kable_styling(full_width = FALSE, position = "left")
| department | Total_Sales | Avg_Sales_Per_Transaction | Total_Transactions | Percentage_Contribution |
|---|---|---|---|---|
| FUEL | 6479260 | 9613.1454006 | 674 | 83.2307603 |
| MISCELLANEOUS | 1208045 | 4718.9257812 | 256 | 15.5182079 |
| GROCERY | 63730 | 1.3442028 | 47411 | 0.8186577 |
| DRUG GM | 10199 | 1.2841853 | 7942 | 0.1310135 |
| PRODUCE | 9691 | 1.2731214 | 7612 | 0.1244879 |
| MEAT-PCKGD | 4241 | 1.3195395 | 3214 | 0.0544787 |
| MEAT | 3358 | 1.3257008 | 2533 | 0.0431359 |
| DELI | 2009 | 1.0847732 | 1852 | 0.0258071 |
| PASTRY | 1422 | 1.3339587 | 1066 | 0.0182666 |
| NUTRITION | 1290 | 1.3451512 | 959 | 0.0165710 |
| SEAFOOD-PCKGD | 384 | 1.2268371 | 313 | 0.0049328 |
| SALAD BAR | 325 | 1.1245675 | 289 | 0.0041749 |
| COSMETICS | 242 | 1.0297872 | 235 | 0.0031087 |
| FLORAL | 164 | 1.2058824 | 136 | 0.0021067 |
| SEAFOOD | 109 | 1.1122449 | 98 | 0.0014002 |
| SPIRITS | 74 | 1.0882353 | 68 | 0.0009506 |
| GARDEN CENTER | 43 | 1.7916667 | 24 | 0.0005524 |
| CHEF SHOPPE | 29 | 1.0740741 | 27 | 0.0003725 |
| TRAVEL & LEISURE | 21 | 1.0000000 | 21 | 0.0002698 |
| COUPON | 20 | 1.3333333 | 15 | 0.0002569 |
| RESTAURANT | 16 | 1.0666667 | 15 | 0.0002055 |
| GM MERCH EXP | 9 | 1.2857143 | 7 | 0.0001156 |
| FROZEN GROCERY | 6 | 1.0000000 | 6 | 0.0000771 |
| AUTOMOTIVE | 3 | 1.0000000 | 3 | 0.0000385 |
| PHOTO & VIDEO | 2 | 2.0000000 | 1 | 0.0000257 |
| CNTRL/STORE SUP | 1 | 1.0000000 | 1 | 0.0000128 |
| NA | 1 | 0.0045045 | 222 | 0.0000128 |
Our analysis focused on the sales contribution by department at Regork, revealing that fuel was the top-selling category. We recommend that Regork prioritize the growth of high-performing departments while investigating the factors contributing to lower performance in other areas.
We examined monthly sales trends and discovered that sales dramatically increase in September and decline sharply in January. Based on this trend, we suggest that Regork adjust its marketing strategies to capitalize on high-demand periods and effectively address low-demand months, thereby driving sales volumes.
In our analysis of top departments, sales, and pricing, we identified that produce had the lowest sales compared to the other departments, while fuel ranked the highest. We recommend that Regork adjust its pricing strategies to take advantage of the inelastic demand for fuel and consider ways to boost sales in lower-performing categories, thereby increasing overall sales volume.