The dataset for this personal project was sourced from Kaggle. The values in the dataset are made up.
The Global Superstore dataset consists of various columns but the ones that we need for this project are; Order Date which start on 2011-01-01 and end on 2014-12-01, Category and Sub-Category of the products and lastly, Sales which is the selling price of each products.
Data Preparation consisted of removing all the rows that had missing values and removing duplicates. This was done in the excel sheet.
The two graphs below serves as a big picture analysis where we can observe the total sales by category and sub-category
# Calculate total sales for each 'Category'
total_sales_category <- data %>%
group_by(Category) %>%
summarise(Total_Sales_Millions = sum(Sales) / 1e6) # Convert total sales to millions
# Create a bar chart for 'Category' with labels and different colors
ggplot(total_sales_category, aes(x = Category, y = Total_Sales_Millions, fill = Category)) +
geom_bar(stat = "identity") +
geom_text(aes(label = paste0(round(Total_Sales_Millions, 2), "M")), vjust = -0.5) +
labs(title = "Distribution of Sales by Category", x = "Category", y = "Total Sales (Millions)") +
scale_fill_discrete() # Use different colors for each category
# Calculate total sales for each 'Sub-Category'
total_sales_subcategory <- data %>%
group_by(SubCategory) %>%
summarise(Total_Sales_Millions = sum(Sales) / 1e6) # Convert total sales to millions
# Create a bar chart for 'Sub-Category' with labels and different colors
ggplot(total_sales_subcategory, aes(x = SubCategory, y = Total_Sales_Millions, fill = SubCategory)) +
geom_bar(stat = "identity", position = position_dodge(width = 1)) +
geom_text(aes(label = paste0(round(Total_Sales_Millions, 2), "M")),
position = position_dodge(width = 1),
vjust = -0.5, hjust = 0.5, size = 3) +
labs(title = "Distribution of Sales by Sub-Category", x = "Sub-Category", y = "Total Sales (Millions)") +
scale_fill_discrete() + # Use different colors for each sub-category
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) # Display x-axis labels vertically
There are three categories of products namely Furniture, Office Supplies and Technology. Technology is the top performing category with a total sales amount of $4.74 millions while Office supplies is the worst performing category with a total sales amount of $3.79 millions.
As for sub-categories, Phones is the best performing sub-category with $1.71 millions in sales while Labels is the worst performing sub-category with $0.07 millions in sales.
# Create a bar chart for the sales amount of 'Furniture' by month and year
ggplot(data1, aes(x = Date, y = Furniture)) +
geom_bar(stat = "identity", fill = "coral") +
labs(title = "Monthly Sales of Furniture",
x = "Date",
y = "Sales($)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
# Create a bar chart for the quantity of 'Office Supplies' by month and year
ggplot(data1, aes(x = Date, y = OfficeSupplies)) +
geom_bar(stat = "identity", fill = "lightgreen") +
labs(title = "Monthly Sales of Office Supplies",
x = "Date",
y = "Sales($)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
# Assuming 'data1' is your dataset
# Create a bar chart for the quantity of 'Technology' by month and year
ggplot(data1, aes(x = Date, y = Technology)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(title = "Monthly Sales of Technology",
x = "Date",
y = "Sales($)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
For all the categories, it can be observed that sales increases steadily from start till end of each year. It is also observed that every start of the year sales goes down drastically. Every year, November and December have the highest sales, which can be attributed to holiday season and consumers are spending more.
# Create a time series using the ts function
furniture_ts <- ts(data1$Furniture, start = c(2011, 1), end = c(2014, 12), frequency = 12)
# Plot the time series
plot(furniture_ts, main = "Monthly Sales of Furniture", xlab = "Date", ylab = "Sales($)", type = "l",col = "coral", lwd = 2)
# Create a time series using the ts function
officesupplies_ts <- ts(data1$OfficeSupplies, start = c(2011, 1), end = c(2014, 12), frequency = 12)
# Plot the time series
plot(officesupplies_ts, main = "Monthly Sales of Office Supplies", xlab = "Date", ylab = "Sales($)", type = "l",col = "lightgreen", lwd = 2)
# Create a time series using the ts function
technology_ts <- ts(data1$Technology, start = c(2011, 1), end = c(2014, 12), frequency = 12)
# Plot the time series with a sky blue line and increased thickness
plot(technology_ts, main = "Monthly Sales of Technology", xlab = "Date", ylab = "Sales($)", type = "l", col = "skyblue", lwd = 2)
The time series plot for each category reconfirms that sales increase steadily from start to end of each year and peaking in the last two months. In January, sales goes back down and the cycle starts again.
Since the time series plots above shows a clear trend and seasonality, we applied Exponential Smoothing Forecast to predict future sales.
Exponential smoothing is a time series forecasting method that assigns exponentially decreasing weights to older observations, emphasizing more recent data.
It is most effective when the values of the time series follow a gradual trend and display seasonal behavior in which the values follow a repeated cyclical pattern over a given number of time steps.
The method involves updating a forecast by combining the current observation with a weighted sum of past observations.
# Create an exponential smoothing model
ets_furnituremodel <- ets(furniture_ts)
# Forecast future values until the end of 2025
forecast_values_furniture <- forecast(ets_furnituremodel, h = 12 * (2019 - 2015 + 1))
# Plot the original time series and the forecast with skyblue line
autoplot(forecast_values_furniture, series = "Original") + ylab("Sales($)") + xlab("Date") +
autolayer(forecast_values_furniture, series = "Point Forecast", linetype = "solid", col = "coral") +
autolayer(forecast_values_furniture, series = "Lo 80", col = "coral") +
autolayer(forecast_values_furniture, series = "Hi 80", col = "coral") +
ggtitle("Exponential Smoothing Forecast for Furniture Sales ") +
scale_y_continuous(labels = scales::comma_format(scale = 1e-3, suffix = "K"))
# Create an exponential smoothing model
ets_officesuppliesmodel <- ets(officesupplies_ts)
# Forecast future values until the end of 2025
forecast_values_office <- forecast(ets_officesuppliesmodel, h = 12 * (2019 - 2015 + 1))
# Plot the original time series and the forecast with skyblue line
autoplot(forecast_values_office, series = "Original") + ylab("Sales($)") + xlab("Date") +
autolayer(forecast_values_office, series = "Point Forecast", linetype = "solid", col = "lightgreen") +
autolayer(forecast_values_office, series = "Lo 80", col = "lightgreen") +
autolayer(forecast_values_office, series = "Hi 80", col = "lightgreen") +
ggtitle("Exponential Smoothing Forecast for Office Supplies Sales ") +
scale_y_continuous(labels = scales::comma_format(scale = 1e-3, suffix = "K"))
# Create an exponential smoothing model
ets_model <- ets(technology_ts)
# Forecast future values until the end of 2025
forecast_values <- forecast(ets_model, h = 12 * (2019 - 2015 + 1))
# Plot the original time series and the forecast with skyblue line
autoplot(forecast_values, series = "Original") + ylab("Sales($)") + xlab("Date") +
autolayer(forecast_values, series = "Point Forecast", linetype = "solid", col = "skyblue") +
autolayer(forecast_values, series = "Lo 80", col = "skyblue") +
autolayer(forecast_values, series = "Hi 80", col = "skyblue") +
ggtitle("Exponential Smoothing Forecast for Technology Sales") +
scale_y_continuous(labels = scales::comma_format(scale = 1e-3, suffix = "K")) # Format y-axis labels
After applying the Exponential Smoothing Forecast, we can observe that the upward trend continues for each category with sales peaking in the last two months of the year and sales going back down in January. We are forecast that Furniture will generate revenue of around $310,000, Office supplies will generate around $240,000 and Technology will generate around $350,000 in 2020.