Since no summary is available for this data, the situation resembles real life at many companies, which do little analysis of their sales data. Most companies use basic tools for reporting, but very few look at cross sections of the data to understand it better. I would like to conduct an analysis that brings some insight into how this company operates.
These insights might help the company improve its sales and profitability. Let's go!
# BONUS: Fetch the data from github...
library(RCurl)
## Loading required package: bitops
fileURL <- "https://raw.githubusercontent.com/rajk11040/RBridge/master/RetailSalesMarketingProfitCost.csv"
marketingDF <- read.csv(text = getURL(fileURL), header = TRUE, sep = ",")
#Code below for quick reading from a local drive :)
#myFilePath = file.path("C:", "Users", "User", "Documents", "CUNY Bridge Program","R","RetailSalesMarketingProfitCost.csv" )
#marketingDF <- read.csv(file=myFilePath, header = TRUE, sep = ",")
#total observations
nrow(marketingDF)
## [1] 84672
#total columns
ncol(marketingDF)
## [1] 14
#understanding the structure of the dataset
str(marketingDF)
## 'data.frame': 84672 obs. of 14 variables:
## $ Year : int 2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
## $ Product.line : Factor w/ 5 levels "Camping Equipment",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Product.type : Factor w/ 21 levels "Binoculars","Climbing Accessories",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Product : Factor w/ 144 levels "Aloe Relief",..: 139 139 139 139 139 139 139 139 139 139 ...
## $ Order.method.type: Factor w/ 7 levels "E-mail","Fax",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ Retailer.country : Factor w/ 21 levels "Australia","Austria",..: 21 5 14 4 12 13 6 16 1 15 ...
## $ Revenue : num 315044 13445 NA NA 181120 ...
## $ Planned.revenue : num 437477 14313 NA NA 235237 ...
## $ Product.cost : num 158372 6299 NA NA 89413 ...
## $ Quantity : int 66385 2172 NA NA 35696 NA 15205 7833 NA 14328 ...
## $ Unit.cost : num 2.55 2.9 NA NA 2.66 ...
## $ Unit.price : num 6.59 6.59 NA NA 6.59 NA 6.59 6.59 NA 6.59 ...
## $ Gross.profit : num 156673 7146 NA NA 91707 ...
## $ Unit.sale.price : num 5.2 6.19 NA NA 5.49 ...
#summary of the data set
summary(marketingDF)
## Year Product.line
## Min. :2004 Camping Equipment :24108
## 1st Qu.:2005 Golf Equipment : 8820
## Median :2006 Mountaineering Equipment:12348
## Mean :2006 Outdoor Protection : 8820
## 3rd Qu.:2006 Personal Accessories :30576
## Max. :2007
##
## Product.type Product
## Eyewear : 9408 Aloe Relief : 588
## Watches : 7644 Astro Pilot : 588
## Lanterns : 7056 Auto Pilot : 588
## Cooking Gear : 5880 Bear Edge : 588
## Navigation : 5880 Bear Survival Edge: 588
## Climbing Accessories: 4116 Bella : 588
## (Other) :44688 (Other) :81144
## Order.method.type Retailer.country Revenue
## E-mail :12096 Australia: 4032 Min. : 0
## Fax :12096 Austria : 4032 1st Qu.: 18579
## Mail :12096 Belgium : 4032 Median : 59867
## Sales visit:12096 Brazil : 4032 Mean : 189418
## Special :12096 Canada : 4032 3rd Qu.: 190193
## Telephone :12096 China : 4032 Max. :10054289
## Web :12096 (Other) :60480 NA's :59929
## Planned.revenue Product.cost Quantity Unit.cost
## Min. : 16 Min. : 6 Min. : 1 Min. : 0.85
## 1st Qu.: 19557 1st Qu.: 9432 1st Qu.: 328 1st Qu.: 11.43
## Median : 63907 Median : 32784 Median : 1043 Median : 36.83
## Mean : 198818 Mean : 111625 Mean : 3607 Mean : 84.89
## 3rd Qu.: 203996 3rd Qu.: 111371 3rd Qu.: 3288 3rd Qu.: 80.00
## Max. :10054289 Max. :6756853 Max. :313628 Max. :690.00
## NA's :59929 NA's :59929 NA's :59929 NA's :59929
## Unit.price Gross.profit Unit.sale.price
## Min. : 2.06 Min. : -18160 Min. : 0.00
## 1st Qu.: 23.00 1st Qu.: 8333 1st Qu.: 20.15
## Median : 66.77 Median : 25794 Median : 62.65
## Mean : 155.99 Mean : 77793 Mean : 147.23
## 3rd Qu.: 148.30 3rd Qu.: 78254 3rd Qu.: 140.96
## Max. :1359.72 Max. :3521098 Max. :1307.80
## NA's :59929 NA's :59929 NA's :59929
# print basic info to make sure data shows correctly
head(marketingDF,3)
## Year Product.line Product.type Product
## 1 2004 Camping Equipment Cooking Gear TrailChef Water Bag
## 2 2004 Camping Equipment Cooking Gear TrailChef Water Bag
## 3 2004 Camping Equipment Cooking Gear TrailChef Water Bag
## Order.method.type Retailer.country Revenue Planned.revenue
## 1 Telephone United States 315044.33 437477.15
## 2 Telephone Canada 13444.68 14313.48
## 3 Telephone Mexico NA NA
## Product.cost Quantity Unit.cost Unit.price Gross.profit Unit.sale.price
## 1 158371.8 66385 2.552857 6.59 156672.57 5.195714
## 2 6298.8 2172 2.900000 6.59 7145.88 6.190000
## 3 NA NA NA NA NA NA
So from the above we see that that data has 84672 observations of 14 variables - Year: Year of Observation - Product.line: 1st level of product hierarchy - Product.type: 2nd level of product hierarchy - Product: Product under the 2nd level hierarchy - Order.method.type: Order method like email, phone etc - Retailer.country: Country - Revenue: Actual revenue generated - Planned.revenue: Planned revenue - Product.cost: Total cost of the item - Quantity: Quntity of product sold - Unit.cost: Unit cost of each item - Unit.price: Unit Price of the each item - Gross.profit: Gross Profit for this item - Unit.sale.price: Unit sale price for each item
# Data cleanup
# Remove any rows that do not contain proper data
marketingDF <- marketingDF[complete.cases(marketingDF),]
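As a quick sanity check (a sketch): the summary reported 59,929 NA's in each numeric column, so if the NA's co-occur row-wise (as head() suggests), the cleaned frame should hold 84672 - 59929 = 24743 complete rows.
# Verify the cleanup: count remaining rows and confirm no NA values survive
nrow(marketingDF)
stopifnot(!anyNA(marketingDF))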
#mean
mean(marketingDF$Revenue)
## [1] 189418.3
mean(marketingDF$Gross.profit)
## [1] 77793.11
#median
median(marketingDF$Revenue)
## [1] 59867.27
median(marketingDF$Gross.profit)
## [1] 25793.76
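The mean is roughly three times the median for both Revenue and Gross.profit, which indicates a right-skewed distribution driven by a few very large orders. A quick ratio (a sketch, assuming Gross.profit and Revenue are on the same per-row basis) gives the overall gross margin:
# Overall gross margin across all complete rows
sum(marketingDF$Gross.profit) / sum(marketingDF$Revenue)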
#install.packages("dplyr")
#dplyr will be used for data manipulation
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#Simplify the column names in the data frame using dplyr
marketingDF <- rename(marketingDF,
Order.method = Order.method.type,
Country = Retailer.country)
# Arrange the products by their respective profits
profitbyproduct <- aggregate(Gross.profit~Product, marketingDF, mean)
# Sort products by highest first
profitbyproduct <- arrange(profitbyproduct, desc(Gross.profit))
# View the most profitable products by mean profit over the last 4 years
head(profitbyproduct)
## Product Gross.profit
## 1 Zone 414862.6
## 2 TX 403548.7
## 3 Maximus 390558.5
## 4 Infinity 373489.7
## 5 Retro 369931.6
## 6 Inferno 359893.4
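The same aggregation can be written as a single dplyr pipeline (a sketch equivalent to the aggregate/arrange steps above):
# dplyr version: mean gross profit per product, sorted highest first
marketingDF %>%
  group_by(Product) %>%
  summarize(Gross.profit = mean(Gross.profit)) %>%
  arrange(desc(Gross.profit)) %>%
  head()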
# Another way to aggregate: by Year and Product
profitbyproduct <- aggregate( Gross.profit ~ Year + Product , marketingDF , mean )
# Sort highest first; Year is numerically tiny relative to Gross.profit,
# so desc(Year + Gross.profit) effectively sorts by profit
profitbyproduct <- arrange(profitbyproduct, desc(Year + Gross.profit))
# View the mean profit of the most profitable products over the last 4 years
head(profitbyproduct)
## Year Product Gross.profit
## 1 2006 Infinity 720290.1
## 2 2006 Maximus 660768.2
## 3 2006 Zone 650529.2
## 4 2006 Inferno 624382.2
## 5 2006 TX 612577.8
## 6 2007 Inferno 602423.7
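A more explicit two-key sort is shown below (a sketch; profitbyproduct2 is just an illustrative name). Unlike desc(Year + Gross.profit), it orders strictly by Year first, then by profit within each year:
# Sort by Year, then by Gross.profit within each year
profitbyproduct2 <- arrange(profitbyproduct, desc(Year), desc(Gross.profit))
head(profitbyproduct2)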
#profitbyproduct[which(profitbyproduct$Year %in% c(2006,2007)), ]
#Dump this dataframe into a local csv file to double check the results
#writeFilePath = file.path("C:", "Users", "User", "Documents", "CUNY Bridge Program","R","RetailSalesMarketingProfitCostClean.csv" )
#write.csv(file=writeFilePath, x=marketingDF)
#yearlySummary = group_by(marketingDF, Year)
#summarize(yearlySummary, sum(Revenue), sum(Planned.revenue), sum(Product.cost), sum(Gross.profit))
# More data wrangling so that we can summarize these values
yearGroupDF = group_by(marketingDF, Year)
# Summarize these values by year
yearlySummaryDF=summarize(yearGroupDF, sum(Revenue), sum(Planned.revenue), sum(Product.cost), sum(Gross.profit))
yearlySummaryDF <- rename(yearlySummaryDF,
Revenue = "sum(Revenue)",
planned.revenue = "sum(Planned.revenue)",
Product.cost = "sum(Product.cost)",
Gross.profit = "sum(Gross.profit)")
yearlySummaryDF
## # A tibble: 4 x 5
## Year Revenue planned.revenue Product.cost Gross.profit
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 2004 914352804 968475138 546734895 367617909
## 2 2005 1159195590 1218177467 690511899 468683692
## 3 2006 1495891101 1560738880 872630528 623260572
## 4 2007 1117336274 1171950878 652063452 465272822
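From this summary we can also derive two useful ratios (a sketch using mutate; Realization and Margin are illustrative column names): how much of the planned revenue was actually realized, and the gross margin per year.
# Revenue realization (actual / planned) and gross margin per year
mutate(yearlySummaryDF,
       Realization = Revenue / planned.revenue,
       Margin = Gross.profit / Revenue)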
We draw a simple scatterplot to understand how revenue and profit did. Notice that this does not give us a good understanding: there is too much overlap among the points to read the data.
#GGPlot2
#install.packages("ggplot2")
library(ggplot2)
# PLOT 1: Scatter plot: as overall revenue has risen, how did the company's profit do?
ggplot(marketingDF, aes(Revenue, Gross.profit, color=Year)) +
geom_point() +
theme_bw()+
labs(title="Profit Vs. Revenue by Year")+
theme(plot.title = element_text(hjust = 0.5))
# PLOT 2: Scatter plot: how much did each product line contribute to revenue and profit?
g2 <- ggplot(marketingDF, aes(Revenue, Gross.profit, color=Year)) +
geom_point() +
theme_bw()+
labs(title="Profit Vs. Revenue By Product Line")+
theme(plot.title = element_text(hjust = 0.5))
g2 <- g2 + facet_wrap(~Product.line)
g2
This plot makes clear that the product line "Personal Accessories" is the biggest line for this company.
Staying on the subject of product lines:
# Plot 3: This bar chart shows all the product categories and their prevalence in each country
ggplot(marketingDF, aes(Country, fill=Product.line)) +
geom_bar() +
theme_bw()+
labs(title="Product Line by Country")+
theme(plot.title = element_text(hjust = 0.5))
This graph shows how many sales were recorded for each product line, by country.
Let's now visualize how the product lines and their profit margins fared across the countries.
# Plot 4: Scatter plot
ggplot(marketingDF, aes(x=Country, y=Gross.profit, shape=Product.line, color=Product.line)) +
geom_point() +
theme_bw()+
labs(title="Profit Vs. Revenue By Product Line")+
theme(plot.title = element_text(hjust = 0.5))
As visible in this graph, the most important countries were: 1. United States, 2. Japan, 3. United Kingdom, with China and Canada close behind. It is also clear that in most countries the product line "Personal Accessories" does best.
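A quick aggregation (a sketch) can back up the country ranking read off the plot:
# Total gross profit by country, highest first
head(arrange(aggregate(Gross.profit ~ Country, marketingDF, sum),
             desc(Gross.profit)))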
Now let's draw a frequency polygon to see how the unit prices of the items are distributed in the dataset.
# Plot 5: Frequency polygon
# Shows how the unit price of the items varies
ggplot(marketingDF, aes(x=Unit.price)) +
geom_freqpoly(binwidth=50, col="blue") +
theme_bw() +
labs(title="Unit Prices for Products")+
theme(plot.title = element_text(hjust = 0.5))
As seen from the graph, the products span a wide range of unit prices. While most products are priced under $250, some items in the dataset are priced close to $1,400.
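Quantiles make the price spread concrete (a sketch; the earlier summary put the maximum unit price at $1,359.72):
# Median and upper-tail quantiles of unit price
quantile(marketingDF$Unit.price, probs = c(0.5, 0.75, 0.9, 0.99, 1))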
Let's also evaluate how order methods (email, web, phone, etc.) performed across these categories.
# Plot 6: This bar chart shows how most customers ordered products, by country
# Clearly Web was the most popular means of ordering, followed by Sales visit
ggplot(marketingDF, aes(Country, fill=Order.method)) +
geom_bar() +
theme_bw() +
labs(title="Sales Channels (Order Type) by Country")+
theme(plot.title = element_text(hjust = 0.5))
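To quantify the Web channel's lead, we can compute each order method's share of total revenue (a sketch; ordershare is an illustrative name):
# Revenue share by order method, highest first
ordershare <- aggregate(Revenue ~ Order.method, marketingDF, sum)
ordershare$Share <- ordershare$Revenue / sum(ordershare$Revenue)
arrange(ordershare, desc(Share))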
Let's plot revenue by year with a bar chart to understand how revenue moved across the years.
# PLOT 7: Weighted bar charts
# Understand how a continuous variable varies with respect to a categorical variable
# install.packages("gridExtra")
require(gridExtra)
## Loading required package: gridExtra
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
RevByYear <- ggplot(marketingDF, aes(x=factor(Year), weights=Revenue)) +
geom_bar(col="green") +
theme_bw() +
labs(title="Revenue by Year", x="Year", y="Dollar Amount") +
theme(plot.title = element_text(hjust = 0.5))
RevByYear
#PLOT 8:
RevByYearByPL <- ggplot(marketingDF, aes(x=factor(Year), weights=Revenue)) +
geom_bar() +
theme_bw() +
facet_grid(Product.line ~ .) +
labs(title="Revenue by Product Lines", x="Year", y="Dollar Amount") +
theme(plot.title = element_text(hjust = 0.5))
RevByYearByPL
# Use gridExtra to place both graphs side by side in the same grid
#PLOT 9:
grid.arrange(RevByYear, RevByYearByPL, ncol=2)
Wow! The above grid gives us a very clean breakdown of how revenue performed year over year for the four years, and the second chart facets the same by product line.
Let's now graph a year-over-year comparison of the key financial figures: actual revenue, planned revenue, total product cost, and gross profit. This will show whether there is any correlation between these numbers.
# ggplot: merge the major financial values by year onto a single frame
library(reshape2)
yearGroupDF = group_by(marketingDF, Year)
yearlySummaryDF=summarize(yearGroupDF, sum(Revenue), sum(Planned.revenue), sum(Product.cost), sum(Gross.profit))
yearlySummaryDF <- rename(yearlySummaryDF,
Revenue = "sum(Revenue)",
Planned.revenue = "sum(Planned.revenue)",
Product.cost = "sum(Product.cost)",
Gross.profit = "sum(Gross.profit)")
yearlySummaryDF
## # A tibble: 4 x 5
## Year Revenue Planned.revenue Product.cost Gross.profit
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 2004 914352804 968475138 546734895 367617909
## 2 2005 1159195590 1218177467 690511899 468683692
## 3 2006 1495891101 1560738880 872630528 623260572
## 4 2007 1117336274 1171950878 652063452 465272822
tempDF <- melt(yearlySummaryDF, id.vars="Year")
# Everything on the same plot
# Plot 10:
ggplot(tempDF, aes(Year,value, col=variable)) +
geom_point() +
stat_smooth() +
theme_bw() +
labs(title="Key Financials by Year",y="Gross Profit")
## `geom_smooth()` using method = 'loess'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## (loess warnings repeat for each series — "pseudoinverse used at 2004",
## "neighborhood radius 2.015", "reciprocal condition number 0", "other near
## singularities" — because there are only four x values; truncated here)
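The loess warnings arise because each series has only four x values, too few for the default smoother. A sketch that simply connects the yearly totals with geom_line() avoids them:
# Connect yearly totals with straight segments instead of a loess smooth
ggplot(tempDF, aes(Year, value, col = variable)) +
  geom_point() +
  geom_line() +
  theme_bw() +
  labs(title="Key Financials by Year", y="Dollar Amount")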
Since these all tracked very closely together and each dipped in 2007, there were clearly other factors in the sales responsible for the drop in the company's profit.
Let's look at a product price-level comparison.
# Plot 11: Unit prices of items
# Understand the diversity of the product assortment in this data
# Item prices range from close to $0 to around $1,400
ggplot(marketingDF, aes(Unit.price)) +
geom_histogram(binwidth=10, fill="palegreen4", col="green") +
theme_bw()+
labs(title="Unit Sale Prices by Products")+
theme(plot.title = element_text(hjust = 0.5))
Let's now see how the top 10 items contributed to profit.
# Recompute mean gross profit by Year and Product (same aggregation as above)
profitbyproductDF <- aggregate( Gross.profit ~ Year + Product , marketingDF , mean )
profitbyproductDF <- arrange(profitbyproductDF, desc(Year+Gross.profit))
head(profitbyproductDF)
## Year Product Gross.profit
## 1 2006 Infinity 720290.1
## 2 2006 Maximus 660768.2
## 3 2006 Zone 650529.2
## 4 2006 Inferno 624382.2
## 5 2006 TX 612577.8
## 6 2007 Inferno 602423.7
# Select the top 10 items per year for easier analysis (boxplot follows)
top2004items <- profitbyproductDF[which(profitbyproductDF$Year == 2004),]
top2004items <- top2004items[1:10,]
top2005items <- profitbyproductDF[which(profitbyproductDF$Year == 2005),]
top2005items <- top2005items[1:10,]
top2006items <- profitbyproductDF[which(profitbyproductDF$Year == 2006),]
top2006items <- top2006items[1:10,]
top2007items <- profitbyproductDF[which(profitbyproductDF$Year == 2007),]
top2007items <- top2007items[1:10,]
topitemsDF <- union(top2004items, union(top2005items, union(top2006items, top2007items)))
nrow(topitemsDF)
## [1] 40
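The four per-year blocks above can be collapsed into one dplyr pipeline (a sketch; topitemsDF2 is an illustrative name, and top_n keeps ties, so the count can exceed 40 if profits tie):
# Keep the ten most profitable products within each year
topitemsDF2 <- profitbyproductDF %>%
  group_by(Year) %>%
  top_n(10, Gross.profit) %>%
  ungroup()
nrow(topitemsDF2)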
topitemsDF <- mutate(topitemsDF, Year=as.factor(Year))
#Plot 12: Box Plot for top items across the years
ggplot(topitemsDF, aes(Product, Gross.profit)) +
geom_boxplot() +
theme_bw() +
labs(title="Top Items vs. Profit")+
theme(plot.title = element_text(hjust = 0.5))
# Rank the union of top items by total gross profit
topitemsDF <- group_by(topitemsDF, Product)
topitemsDF <- summarize(topitemsDF, sum(Gross.profit))
topitemsDF <- rename(topitemsDF, Gross.profit = "sum(Gross.profit)")
topitemsDF <- arrange(topitemsDF, desc(Gross.profit))
topitemsDF
## # A tibble: 15 x 2
## Product Gross.profit
## <fctr> <dbl>
## 1 Zone 1788581
## 2 Maximus 1751140
## 3 TX 1698563
## 4 Infinity 1595949
## 5 Inferno 1516148
## 6 Hailstorm Titanium Woods Set 1296648
## 7 Retro 1009365
## 8 Legend 1003858
## 9 Cat Eye 758239
## 10 Hawk Eye 710121
## 11 Venue 477406
## 12 Hailstorm Titanium Irons 460765
## 13 Dante 420196
## 14 Fairway 363215
## 15 Star Lite 149333
We need to analyze all the top items from 2006 and 2007 to find out whether there is a significant anomaly among the products that could point to the reason for the lower revenue and profit in 2007.
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
# This function compares two data frames of top products and totals the
# profit of products unique to each; it is written for the 2006 vs. 2007
# comparison (the printed labels are hardcoded)
compareYears <- function(a1, a2){
  # products in a1 but not in a2
  a1NotIna2 <- sqldf('SELECT Product FROM a1 EXCEPT SELECT Product FROM a2')
  missinga2item <- as.vector(a1NotIna2$Product)
  a1extra <- a1[which(a1[ ,2] %in% missinga2item),]
  # products in a2 but not in a1
  a2NotIna1 <- sqldf('SELECT Product FROM a2 EXCEPT SELECT Product FROM a1')
  missinga1item <- as.vector(a2NotIna1$Product)
  a2extra <- a2[which(a2[ ,2] %in% missinga1item),]
  if(sum(a1extra$Gross.profit) > sum(a2extra$Gross.profit)){
    print("2006 more profitable for top 10 items by")
    return (sum(a1extra$Gross.profit) - sum(a2extra$Gross.profit))
  }else{
    print("2007 more profitable for top 10 items by")
    return (sum(a2extra$Gross.profit) - sum(a1extra$Gross.profit))
  }
}
compareYears(top2006items, top2007items)
## [1] "2006 more profitable for top 10 items by"
## [1] 83370.71
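The sqldf EXCEPT queries can also be expressed with dplyr's anti_join, which keeps the rows of its first argument that have no Product match in the second (a sketch; only2006 and only2007 are illustrative names):
# Profit from products unique to each year's top-10 list
only2006 <- anti_join(top2006items, top2007items, by = "Product")
only2007 <- anti_join(top2007items, top2006items, by = "Product")
sum(only2006$Gross.profit) - sum(only2007$Gross.profit)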
This comparison points to some of the reasons revenue declined in 2007. Many of the items that did not sell well in 2007 were high-value items with greater profit margins. The top-10 comparison alone shows a profit loss of around $83,371.
The following conclusions can be drawn from the above analysis:

- Revenue drop in 2007: the company experienced a drop in both revenue and profit in the last year of observation. Sales revenue (planned and actual) appears to have dropped roughly evenly across all categories (Plot 10). Since planned revenue also dropped, the company evidently anticipated the decline, for reasons not visible in this dataset.

The following reasons can still be inferred from the dataset:

1. Based on Plots 8 and 9: sales of the product category "Outdoor Protection" declined to near zero in 2007.
2. Based on Plot 4: sales of some higher-value products declined from 2006 to 2007, pulling overall revenue and profit down.
3. The most profitable product lines were identified as Personal Accessories and Camping Equipment.

The most profitable products (by total gross profit) were identified as:

1. Zone: 1788581
2. Maximus: 1751140
3. TX: 1698563
4. Infinity: 1595949
5. Inferno: 1516148
6. Hailstorm Titanium Woods Set: 1296648
7. Retro: 1009365
8. Legend: 1003858
9. Cat Eye: 758239
10. Hawk Eye: 710121

The most profitable order method was identified as "Web", as it generated the greatest profit and contributed nearly 50% of sales in the top categories.
Further analysis: additional profitability analysis could be done on the products by sales channel (order type) to create recommendations for the sales team on which products to suggest to customers.