The primary aim of this project is to perform an in-depth analysis of retail sales data. By examining various aspects, we seek to derive actionable insights that could assist in informed decision-making.
company <- read.csv("retail.csv")
head(company)
## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 1 1 CA-2016-152156 11/8/16 11/11/16 Second Class CG-12520
## 2 2 CA-2016-152156 11/8/16 11/11/16 Second Class CG-12520
## 3 3 CA-2016-138688 6/12/16 6/16/16 Second Class DV-13045
## 4 4 US-2015-108966 10/11/15 10/18/15 Standard Class SO-20335
## 5 5 US-2015-108966 10/11/15 10/18/15 Standard Class SO-20335
## 6 6 CA-2014-115812 6/9/14 6/14/14 Standard Class BH-11710
## Segment Product.ID Category Sub.Category
## 1 Consumer FUR-BO-10001798 Furniture Bookcases
## 2 Consumer FUR-CH-10000454 Furniture Chairs
## 3 Corporate OFF-LA-10000240 Office Supplies Labels
## 4 Consumer FUR-TA-10000577 Furniture Tables
## 5 Consumer OFF-ST-10000760 Office Supplies Storage
## 6 Consumer FUR-FU-10001487 Furniture Furnishings
## Product.Name Sales
## 1 Bush Somerset Collection Bookcase 261.9600
## 2 Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back 731.9400
## 3 Self-Adhesive Address Labels for Typewriters by Universal 14.6200
## 4 Bretford CR4500 Series Slim Rectangular Table 957.5775
## 5 Eldon Fold 'N Roll Cart System 22.3680
## 6 Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood 48.8600
## Quantity Discount Profit
## 1 2 0.00 41.9136
## 2 3 0.00 219.5820
## 3 2 0.00 6.8714
## 4 5 0.45 -383.0310
## 5 2 0.20 2.5164
## 6 7 0.00 14.1694
tail(company)
## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 9989 9989 CA-2017-163629 11/17/17 11/21/17 Standard Class RA-19885
## 9990 9990 CA-2014-110422 1/21/14 1/23/14 Second Class TB-21400
## 9991 9991 CA-2017-121258 2/26/17 3/3/17 Standard Class DB-13060
## 9992 9992 CA-2017-121258 2/26/17 3/3/17 Standard Class DB-13060
## 9993 9993 CA-2017-121258 2/26/17 3/3/17 Standard Class DB-13060
## 9994 9994 CA-2017-119914 5/4/17 5/9/17 Second Class CC-12220
## Segment Product.ID Category Sub.Category
## 9989 Corporate TEC-PH-10004006 Technology Phones
## 9990 Consumer FUR-FU-10001889 Furniture Furnishings
## 9991 Consumer FUR-FU-10000747 Furniture Furnishings
## 9992 Consumer TEC-PH-10003645 Technology Phones
## 9993 Consumer OFF-PA-10004041 Office Supplies Paper
## 9994 Consumer OFF-AP-10002684 Office Supplies Appliances
## Product.Name
## 9989 Panasonic KX - TS880B Telephone
## 9990 Ultra Door Pull Handle
## 9991 Tenex B1-RE Series Chair Mats for Low Pile Carpets
## 9992 Aastra 57i VoIP phone
## 9993 It's Hot Message Books with Stickers, 2 3/4" x 5"
## 9994 Acco 7-Outlet Masterpiece Power Center, Wihtout Fax/Phone Line Protection
## Sales Quantity Discount Profit
## 9989 206.100 5 0.0 55.6470
## 9990 25.248 3 0.2 4.1028
## 9991 91.960 2 0.0 15.6332
## 9992 258.576 2 0.2 19.3932
## 9993 29.600 4 0.0 13.3200
## 9994 243.160 2 0.0 72.9480
dim(company)
## [1] 9994 15
names(company)
## [1] "Row.ID" "Order.ID" "Order.Date" "Ship.Date" "Ship.Mode"
## [6] "Customer.ID" "Segment" "Product.ID" "Category" "Sub.Category"
## [11] "Product.Name" "Sales" "Quantity" "Discount" "Profit"
Check data type for each column
str(company)
## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
## $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
# Converting columns to appropriate data types
company$Row.ID <- as.character(company$Row.ID)
company$Order.ID <- as.character(company$Order.ID)
company$Order.Date <- as.Date(company$Order.Date, "%m/%d/%y")
company$Ship.Date <- as.Date(company$Ship.Date, "%m/%d/%y")
company$Customer.ID <- as.character(company$Customer.ID)
company$Product.ID <- as.character(company$Product.ID)
company$Product.Name <- as.character(company$Product.Name)
# Inspecting the structure of the dataframe after changes
str(company)
## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : chr "1" "2" "3" "4" ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : Date, format: "2016-11-08" "2016-11-08" ...
## $ Ship.Date : Date, format: "2016-11-11" "2016-11-11" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
colSums(is.na(company))
## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 0 0 0 0 0 0
## Segment Product.ID Category Sub.Category Product.Name Sales
## 0 0 0 0 0 0
## Quantity Discount Profit
## 0 0 0
anyNA(company)
## [1] FALSE
summary (company)
## Row.ID Order.ID Order.Date
## Length:9994 Length:9994 Min. :2014-01-03
## Class :character Class :character 1st Qu.:2015-05-23
## Mode :character Mode :character Median :2016-06-26
## Mean :2016-04-30
## 3rd Qu.:2017-05-14
## Max. :2017-12-30
## Ship.Date Ship.Mode Customer.ID Segment
## Min. :2014-01-07 Length:9994 Length:9994 Length:9994
## 1st Qu.:2015-05-27 Class :character Class :character Class :character
## Median :2016-06-29 Mode :character Mode :character Mode :character
## Mean :2016-05-03
## 3rd Qu.:2017-05-18
## Max. :2018-01-05
## Product.ID Category Sub.Category Product.Name
## Length:9994 Length:9994 Length:9994 Length:9994
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Sales Quantity Discount Profit
## Min. : 0.444 Min. : 1.00 Min. :0.0000 Min. :-6599.978
## 1st Qu.: 17.280 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 1.729
## Median : 54.490 Median : 3.00 Median :0.2000 Median : 8.666
## Mean : 229.858 Mean : 3.79 Mean :0.1562 Mean : 28.657
## 3rd Qu.: 209.940 3rd Qu.: 5.00 3rd Qu.:0.2000 3rd Qu.: 29.364
## Max. :22638.480 Max. :14.00 Max. :0.8000 Max. : 8399.976
We thoroughly inspected the company dataset, which
consists of 9,994 observations across various retail metrics. Key
columns were converted to appropriate data types, such as character and
Date, to align with the nature of the data they represent.
Order.Date field.aggregate(Profit~Segment,company,mean)
## Segment Profit
## 1 Consumer 25.83687
## 2 Corporate 30.45667
## 3 Home Office 33.81866
aggregate(Profit~Segment,company,var)
## Segment Profit
## 1 Consumer 58851.86
## 2 Corporate 53817.63
## 3 Home Office 45105.89
aggregate(Profit~Segment,company,sd)
## Segment Profit
## 1 Consumer 242.5940
## 2 Corporate 231.9863
## 3 Home Office 212.3815
boxplot(company$Profit)
Mean Profit
Profit Variability
1. Which Product Category Generates the Most Revenue?
# Aggregate Sales by Product Category
category_sales <- aggregate(Sales ~ Category, company, sum)
# Sort by Sales in Descending Order
category_sales <- category_sales[order(-category_sales$Sales),]
category_sales
## Category Sales
## 3 Technology 836154.0
## 1 Furniture 741999.8
## 2 Office Supplies 719047.0
Answer: The Technology category stands out as the most lucrative, suggesting that products in this category are either higher-priced or higher in demand compared to other categories.
2. What Is the Seasonal Trend in Sales?
# Convert Order.Date to month-year format
company$Order.MonthYear <- format(company$Order.Date, "%Y-%m")
# Aggregate Sales by Month-Year
monthly_sales <- aggregate(Sales ~ Order.MonthYear, company, sum)
monthly_sales
## Order.MonthYear Sales
## 1 2014-01 14236.895
## 2 2014-02 4519.892
## 3 2014-03 55691.009
## 4 2014-04 28295.345
## 5 2014-05 23648.287
## 6 2014-06 34595.128
## 7 2014-07 33946.393
## 8 2014-08 27909.468
## 9 2014-09 81777.351
## 10 2014-10 31453.393
## 11 2014-11 78628.717
## 12 2014-12 69545.621
## 13 2015-01 18174.076
## 14 2015-02 11951.411
## 15 2015-03 38726.252
## 16 2015-04 34195.209
## 17 2015-05 30131.686
## 18 2015-06 24797.292
## 19 2015-07 28765.325
## 20 2015-08 36898.332
## 21 2015-09 64595.918
## 22 2015-10 31404.924
## 23 2015-11 75972.564
## 24 2015-12 74919.521
## 25 2016-01 18542.491
## 26 2016-02 22978.815
## 27 2016-03 51715.875
## 28 2016-04 38750.039
## 29 2016-05 56987.728
## 30 2016-06 40344.534
## 31 2016-07 39261.963
## 32 2016-08 31115.374
## 33 2016-09 73410.025
## 34 2016-10 59687.745
## 35 2016-11 79411.966
## 36 2016-12 96999.043
## 37 2017-01 43971.374
## 38 2017-02 20301.133
## 39 2017-03 58872.353
## 40 2017-04 36521.536
## 41 2017-05 44261.110
## 42 2017-06 52981.726
## 43 2017-07 45264.416
## 44 2017-08 63120.888
## 45 2017-09 87866.652
## 46 2017-10 77776.923
## 47 2017-11 118447.825
## 48 2017-12 83829.319
3. Which Customer Segment Is Most Profitable?
# Aggregate Profit by Customer Segment
segment_profit <- aggregate(Profit ~ Segment, company, mean)
segment_profit
## Segment Profit
## 1 Consumer 25.83687
## 2 Corporate 30.45667
## 3 Home Office 33.81866
Answer: The Home Office segment not only has the highest average profit but also leads by a noticeable margin compared to the Corporate and Consumer segments.
4. Are Discounts Affecting Profits?
correlation <- cor(company$Discount, company$Profit)
correlation
## [1] -0.2194875
Answer: The correlation is negative, indicating that as discounts increase, profits tend to decrease.
5. How Do Sales Vary by Category and Sub-Category?
library(ggplot2)
# Create a cross-tabulation of Sales by Category and Sub-Category
sales_xtabs <- xtabs(Sales ~ Category + Sub.Category, data=company)
# Generate the heatmap
ggplot(as.data.frame(as.table(sales_xtabs)), aes(x=Category, y=Sub.Category)) +
geom_tile(aes(fill=Freq), color="white") +
scale_fill_gradient(low="white", high="red") +
theme_minimal() +
labs(title="Heatmap of Sales by Category and Sub-Category",
x="Category",
y="Sub-Category",
fill="Sales")
Answer:
Hot Spots: Areas with a darker shade of red on the heatmap indicate higher sales. These are the combinations of Categories and Sub-Categories that are the most lucrative.
Cool Spots: Areas with lighter colors represent lower sales. These could be areas for improvement or may indicate niche products.
6. How Does Product Quantity Vary Across Different Customer Segments and Product Categories?
# Create a cross-tabulation of Quantity by Segment and Category
quantity_xtabs <- xtabs(Quantity ~ Segment + Category, data=company)
# Generate the plot
plot(quantity_xtabs, main="Cross-Tabulation of Quantity by Segment and Category", xlab="Segment", ylab="Category")
7. Is There a Relationship Between Discount and Quantity?
# Scatter plot to explore the relationship between Discount and Quantity
plot(company$Discount, company$Quantity, main="Scatter Plot of Discount vs Quantity", xlab="Discount", ylab="Quantity")
8. What Are the Trends in Profit Over Time?
# Convert Order.Date to a Date object and sort
company$Order.Date <- as.Date(company$Order.Date)
sorted_company <- company[order(company$Order.Date),]
# Plot Profit over time
plot(sorted_company$Order.Date, sorted_company$Profit, type="l", main="Profit Trends Over Time", xlab="Time", ylab="Profit")
9. What Is the Average Profit per Product?
# Aggregate Profit by Product Name
product_profit <- aggregate(Profit ~ Product.Name, company, mean)
# Generate a plot (showing top 10 profitable products for readability)
top_products <- head(product_profit[order(-product_profit$Profit),], 10)
barplot(top_products$Profit, names.arg=top_products$Product.Name, las=2, main="Top 10 Profitable Products", xlab="Product", ylab="Average Profit", cex.names = 0.4)
Overview
We have conducted a comprehensive analysis of the ‘company’ dataset, focusing on key business questions related to revenue generation, customer segmentation, seasonality, and discounts. Our findings offer several actionable insights that can significantly influence business strategies.
Key Findings
Most Profitable Category: Technology leads in revenue, suggesting a high demand or premium pricing for these products.
Seasonal Trends: Sales peak in March and September, with a noticeable dip in February, indicating seasonality in customer buying behavior.
Profitable Customer Segments: The Home Office segment is the most profitable, suggesting a focused customer engagement strategy for this segment could be beneficial.
Impact of Discounts: A weak negative correlation between discounts and profits suggests that while discounts may boost sales, they should be managed carefully to not erode profit margins.
Sales Variation by Category and Sub-Category: The heatmap provides a visual representation of how sales are distributed across different Categories and Sub-Categories, highlighting areas of both strength and weakness.
Business Recommendations
Invest in Technology: Given its high revenue, further investment in technology products, either through increased inventory or marketing, is recommended.
Seasonal Marketing Campaigns: Implement targeted marketing campaigns during peak sales months like March and September, and consider promotions to boost sales during slow months like February.
Focus on Home Office Segment: Develop tailored marketing campaigns and special offers targeted at the Home Office customer segment to capitalize on its higher profitability.
Strategic Discounting: Use data analytics to offer targeted discounts to specific customer segments or on particular product categories, rather than broad discounting which may affect profits.
Product Bundling and Cross-Selling: Utilize the insights from the heatmap to create bundles combining high-sales and low-sales Sub-Categories, aiming to uplift the sales of the latter.
Data-Driven Decision-Making: Continue leveraging data analytics to refine these strategies over time, measuring the impact of any changes to ensure they are producing the desired outcomes.
By adopting these recommendations, the company can aim for a balanced approach to boost revenue, increase profitability, and improve customer engagement.