‘Retail Sales Marketing Profit Cost’ has been collected from the sales section of the website of IBM Watson analytics.IBM (International Business Machines Corporation) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries.IBM manufactures and markets computer hardware, middleware and software, and provides hosting and consulting services in areas ranging from mainframe computers to nanotechnology. IBM is also a major research organization, holding the record for most patents generated by a business (as of 2018) for 25 consecutive years.IBM has continually shifted its business mix by commoditizing markets focusing on higher-value, more profitable markets.IBM says that Watson Analytics offering is intended to to provide the benefits of advanced analytics without the complexity.
This paper addresses the issues concerning the ‘Gross Profit’ with respect to the year and order method and product types.
The specific objective of this Study was to investigate the variation of the gross profit with respect to the Year,product type and order method type.This study analyzed the gross profit and also analyzed the comparison between planned revenue and revenue of the product.
Here we expect to review product-related information like Cost, Revenue, Price, etc. across Years and Ordering Method.This dataset can also be used in the Explore feature to better understand the hidden trends & patterns.
For this study, we collected data from the sales section in the website of IBM Watson Analytics(https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/).IBM Watson Analytics provides us some real data sets to analyze those in many different ways, to uncover insights that can have a profound effect on business. It could also help getting the track of marketing strategy.
In this data set we will analyze how the gross profit of the company is changing yearly according to their product lines, product type, order method, unit sale price, product cost etc.
We proposed the best fit model here :
Gross.profit = b0 + b1Year + b2Product.line + b3Product.type + b4Order.method.type + b5Revenue + b6Quantity + b7Unit.cost + b8Unit.sale.price
retail.df <- read.csv(paste("RetailSales.csv", sep=""))
attach(retail.df)
M1 <- Gross.profit ~ Year + Product.line + Product.type + Order.method.type + Revenue + Quantity + Unit.cost + Unit.sale.price
f1 <- lm(M1, data = retail.df)
summary(f1)
##
## Call:
## lm(formula = M1, data = retail.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -673249 -6267 -351 5057 632447
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error t value
## (Intercept) -8.113e+05 3.673e+05 -2.209
## Year 3.983e+02 1.832e+02 2.175
## Product.lineGolf Equipment -9.208e+03 2.173e+03 -4.238
## Product.lineMountaineering Equipment 1.478e+04 1.647e+03 8.973
## Product.lineOutdoor Protection 1.665e+04 1.648e+03 10.099
## Product.linePersonal Accessories 1.153e+04 1.524e+03 7.569
## Product.typeClimbing Accessories 4.176e+03 1.375e+03 3.037
## Product.typeCooking Gear 1.246e+04 1.452e+03 8.579
## Product.typeEyewear -4.920e+02 1.119e+03 -0.440
## Product.typeFirst Aid -5.830e+03 1.307e+03 -4.460
## Product.typeGolf Accessories 2.660e+04 2.403e+03 11.068
## Product.typeInsect Repellents 4.491e+03 1.305e+03 3.440
## Product.typeIrons 1.577e+04 1.712e+03 9.210
## Product.typeKnives -9.449e+02 1.241e+03 -0.761
## Product.typeLanterns 1.246e+04 1.465e+03 8.505
## Product.typeNavigation -2.798e+03 1.234e+03 -2.268
## Product.typePacks 1.001e+04 1.401e+03 7.146
## Product.typePutters 3.213e+04 2.476e+03 12.975
## Product.typeRope -9.106e+02 1.716e+03 -0.531
## Product.typeSafety -6.357e+03 1.584e+03 -4.012
## Product.typeSleeping Bags 1.098e+04 1.450e+03 7.570
## Product.typeSunscreen NA NA NA
## Product.typeTents NA NA NA
## Product.typeTools NA NA NA
## Product.typeWatches 4.195e+03 1.142e+03 3.673
## Product.typeWoods NA NA NA
## Order.method.typeFax -6.615e+02 9.385e+02 -0.705
## Order.method.typeMail -6.971e+02 1.067e+03 -0.653
## Order.method.typeSales visit -5.205e+02 7.789e+02 -0.668
## Order.method.typeSpecial 5.059e+02 1.222e+03 0.414
## Order.method.typeTelephone 1.447e+02 7.936e+02 0.182
## Order.method.typeWeb 4.505e+03 7.119e+02 6.328
## Revenue 3.988e-01 6.373e-04 625.803
## Quantity -2.634e-01 2.726e-02 -9.663
## Unit.cost -7.364e+02 1.477e+01 -49.866
## Unit.sale.price 4.386e+02 8.627e+00 50.847
## Pr(>|t|)
## (Intercept) 0.027215 *
## Year 0.029655 *
## Product.lineGolf Equipment 2.26e-05 ***
## Product.lineMountaineering Equipment < 2e-16 ***
## Product.lineOutdoor Protection < 2e-16 ***
## Product.linePersonal Accessories 3.90e-14 ***
## Product.typeClimbing Accessories 0.002389 **
## Product.typeCooking Gear < 2e-16 ***
## Product.typeEyewear 0.660234
## Product.typeFirst Aid 8.24e-06 ***
## Product.typeGolf Accessories < 2e-16 ***
## Product.typeInsect Repellents 0.000582 ***
## Product.typeIrons < 2e-16 ***
## Product.typeKnives 0.446418
## Product.typeLanterns < 2e-16 ***
## Product.typeNavigation 0.023361 *
## Product.typePacks 9.19e-13 ***
## Product.typePutters < 2e-16 ***
## Product.typeRope 0.595745
## Product.typeSafety 6.04e-05 ***
## Product.typeSleeping Bags 3.87e-14 ***
## Product.typeSunscreen NA
## Product.typeTents NA
## Product.typeTools NA
## Product.typeWatches 0.000240 ***
## Product.typeWoods NA
## Order.method.typeFax 0.480917
## Order.method.typeMail 0.513625
## Order.method.typeSales visit 0.503988
## Order.method.typeSpecial 0.678894
## Order.method.typeTelephone 0.855360
## Order.method.typeWeb 2.52e-10 ***
## Revenue < 2e-16 ***
## Quantity < 2e-16 ***
## Unit.cost < 2e-16 ***
## Unit.sale.price < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29010 on 24711 degrees of freedom
## (59929 observations deleted due to missingness)
## Multiple R-squared: 0.9664, Adjusted R-squared: 0.9663
## F-statistic: 2.292e+04 on 31 and 24711 DF, p-value: < 2.2e-16
Here we established the effect of the year, product line, product type, order method type, revenue, quantity, unit cost, unit sale price on the gross profit of the company.We regressed gross profit on the basis of those variables.We estimated model, using linear least squares.The best fit model will be always the model with greatest adjusted R squared value.
Here we contructed two models M1 and M2. Now comparing their adjusted R squared value we got the best model M1 as it’s R squared value is 0.9663 which is greatest. So we can see that gross profit is highly dependent on all those x variables in model 1. The p-value is also here <0.05.
Our aim was here to focus on the changes of gross profit according to those varibles. According to our analysis in 2006 the gross profit and the revenue were the highest, and we could also find that the gross profit is also highest when the order method is Web. The unique contribution of this paper is that we investigated the gross profit and how it is being changed with revenue, sale price, cost price, quantity, order method, product type.
Sales section in the website of IBM Watson Analytics :
https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/
retail.df <- read.csv(paste("RetailSales.csv", sep=""))
attach(retail.df)
## The following objects are masked from retail.df (pos = 3):
##
## Gross.profit, Order.method.type, Planned.revenue, Product,
## Product.cost, Product.line, Product.type, Quantity,
## Retailer.country, Revenue, Unit.cost, Unit.price,
## Unit.sale.price, Year
library(psych)
describe(retail.df)
## vars n mean sd median trimmed
## Year 1 84672 2005.50 1.12 2005.50 2005.50
## Product.line* 2 84672 3.15 1.66 3.00 3.19
## Product.type* 3 84672 10.51 6.23 10.00 10.39
## Product* 4 84672 72.50 41.57 72.50 72.50
## Order.method.type* 5 84672 4.00 2.00 4.00 4.00
## Retailer.country* 6 84672 11.00 6.06 11.00 11.00
## Revenue 7 24743 189418.25 390750.90 59867.27 106149.19
## Planned.revenue 8 24743 198817.54 402535.53 63906.84 112896.09
## Product.cost 9 24743 111625.14 238415.56 32783.72 61046.41
## Quantity 10 24743 3606.56 8777.72 1043.00 1829.23
## Unit.cost 11 24743 84.89 131.11 36.83 51.88
## Unit.price 12 24743 155.99 246.80 66.77 94.20
## Gross.profit 13 24743 77793.11 158122.34 25793.76 44033.05
## Unit.sale.price 14 24743 147.23 232.05 62.65 89.23
## mad min max range skew
## Year 1.48 2004.00 2007.00 3.00 0.00
## Product.line* 2.97 1.00 5.00 4.00 -0.14
## Product.type* 8.90 1.00 21.00 20.00 0.16
## Product* 53.37 1.00 144.00 143.00 0.00
## Order.method.type* 2.97 1.00 7.00 6.00 0.00
## Retailer.country* 7.41 1.00 21.00 20.00 0.00
## Revenue 75285.35 0.00 10054289.00 10054289.00 6.90
## Planned.revenue 80672.73 15.69 10054289.00 10054273.31 6.68
## Product.cost 42496.83 5.76 6756852.78 6756847.02 7.23
## Quantity 1294.31 1.00 313628.00 313627.00 9.50
## Unit.cost 43.48 0.85 690.00 689.15 2.45
## Unit.price 77.10 2.06 1359.72 1357.66 2.72
## Gross.profit 31742.48 -18159.60 3521097.76 3539257.36 6.90
## Unit.sale.price 72.14 0.00 1307.80 1307.80 2.70
## kurtosis se
## Year -1.36 0.00
## Product.line* -1.62 0.01
## Product.type* -1.27 0.02
## Product* -1.20 0.14
## Order.method.type* -1.25 0.01
## Retailer.country* -1.21 0.02
## Revenue 85.05 2484.13
## Planned.revenue 79.43 2559.05
## Product.cost 96.19 1515.68
## Quantity 175.46 55.80
## Unit.cost 5.72 0.83
## Unit.price 7.74 1.57
## Gross.profit 83.84 1005.23
## Unit.sale.price 7.62 1.48
mytable <- with(retail.df, table(Order.method.type))
mytable
## Order.method.type
## E-mail Fax Mail Sales visit Special Telephone
## 12096 12096 12096 12096 12096 12096
## Web
## 12096
mytable2 <- with(retail.df, table(Retailer.country))
mytable2
## Retailer.country
## Australia Austria Belgium Brazil Canada
## 4032 4032 4032 4032 4032
## China Denmark Finland France Germany
## 4032 4032 4032 4032 4032
## Italy Japan Korea Mexico Netherlands
## 4032 4032 4032 4032 4032
## Singapore Spain Sweden Switzerland United Kingdom
## 4032 4032 4032 4032 4032
## United States
## 4032
ProductlinebyOrdertype <- xtabs(~ retail.df$Product.line + retail.df$Order.method.type , data=retail.df)
ProductlinebyOrdertype
## retail.df$Order.method.type
## retail.df$Product.line E-mail Fax Mail Sales visit Special Telephone
## Camping Equipment 3444 3444 3444 3444 3444 3444
## Golf Equipment 1260 1260 1260 1260 1260 1260
## Mountaineering Equipment 1764 1764 1764 1764 1764 1764
## Outdoor Protection 1260 1260 1260 1260 1260 1260
## Personal Accessories 4368 4368 4368 4368 4368 4368
## retail.df$Order.method.type
## retail.df$Product.line Web
## Camping Equipment 3444
## Golf Equipment 1260
## Mountaineering Equipment 1764
## Outdoor Protection 1260
## Personal Accessories 4368
addmargins(ProductlinebyOrdertype)
## retail.df$Order.method.type
## retail.df$Product.line E-mail Fax Mail Sales visit Special
## Camping Equipment 3444 3444 3444 3444 3444
## Golf Equipment 1260 1260 1260 1260 1260
## Mountaineering Equipment 1764 1764 1764 1764 1764
## Outdoor Protection 1260 1260 1260 1260 1260
## Personal Accessories 4368 4368 4368 4368 4368
## Sum 12096 12096 12096 12096 12096
## retail.df$Order.method.type
## retail.df$Product.line Telephone Web Sum
## Camping Equipment 3444 3444 24108
## Golf Equipment 1260 1260 8820
## Mountaineering Equipment 1764 1764 12348
## Outdoor Protection 1260 1260 8820
## Personal Accessories 4368 4368 30576
## Sum 12096 12096 84672
ProducttypebyOrdertype <- xtabs(~ retail.df$Product.type + retail.df$Order.method.type , data=retail.df)
ProducttypebyOrdertype
## retail.df$Order.method.type
## retail.df$Product.type E-mail Fax Mail Sales visit Special Telephone Web
## Binoculars 504 504 504 504 504 504 504
## Climbing Accessories 588 588 588 588 588 588 588
## Cooking Gear 840 840 840 840 840 840 840
## Eyewear 1344 1344 1344 1344 1344 1344 1344
## First Aid 420 420 420 420 420 420 420
## Golf Accessories 336 336 336 336 336 336 336
## Insect Repellents 420 420 420 420 420 420 420
## Irons 336 336 336 336 336 336 336
## Knives 588 588 588 588 588 588 588
## Lanterns 1008 1008 1008 1008 1008 1008 1008
## Navigation 840 840 840 840 840 840 840
## Packs 504 504 504 504 504 504 504
## Putters 252 252 252 252 252 252 252
## Rope 336 336 336 336 336 336 336
## Safety 336 336 336 336 336 336 336
## Sleeping Bags 588 588 588 588 588 588 588
## Sunscreen 420 420 420 420 420 420 420
## Tents 504 504 504 504 504 504 504
## Tools 504 504 504 504 504 504 504
## Watches 1092 1092 1092 1092 1092 1092 1092
## Woods 336 336 336 336 336 336 336
addmargins(ProducttypebyOrdertype)
## retail.df$Order.method.type
## retail.df$Product.type E-mail Fax Mail Sales visit Special Telephone
## Binoculars 504 504 504 504 504 504
## Climbing Accessories 588 588 588 588 588 588
## Cooking Gear 840 840 840 840 840 840
## Eyewear 1344 1344 1344 1344 1344 1344
## First Aid 420 420 420 420 420 420
## Golf Accessories 336 336 336 336 336 336
## Insect Repellents 420 420 420 420 420 420
## Irons 336 336 336 336 336 336
## Knives 588 588 588 588 588 588
## Lanterns 1008 1008 1008 1008 1008 1008
## Navigation 840 840 840 840 840 840
## Packs 504 504 504 504 504 504
## Putters 252 252 252 252 252 252
## Rope 336 336 336 336 336 336
## Safety 336 336 336 336 336 336
## Sleeping Bags 588 588 588 588 588 588
## Sunscreen 420 420 420 420 420 420
## Tents 504 504 504 504 504 504
## Tools 504 504 504 504 504 504
## Watches 1092 1092 1092 1092 1092 1092
## Woods 336 336 336 336 336 336
## Sum 12096 12096 12096 12096 12096 12096
## retail.df$Order.method.type
## retail.df$Product.type Web Sum
## Binoculars 504 3528
## Climbing Accessories 588 4116
## Cooking Gear 840 5880
## Eyewear 1344 9408
## First Aid 420 2940
## Golf Accessories 336 2352
## Insect Repellents 420 2940
## Irons 336 2352
## Knives 588 4116
## Lanterns 1008 7056
## Navigation 840 5880
## Packs 504 3528
## Putters 252 1764
## Rope 336 2352
## Safety 336 2352
## Sleeping Bags 588 4116
## Sunscreen 420 2940
## Tents 504 3528
## Tools 504 3528
## Watches 1092 7644
## Woods 336 2352
## Sum 12096 84672
boxplot(Gross.profit ~ Order.method.type ,data=retail.df, main="Effect of Order method type on Gross profit", ylab="Order type", xlab="Gross profit", horizontal=FALSE)
boxplot(Quantity ~ Order.method.type ,data=retail.df, main="Effect of Order method type on quantity", ylab="Order type", xlab="Quantity", horizontal=FALSE)
boxplot(Revenue ~ Order.method.type ,data=retail.df, main="Effect of Order method type on revenue", ylab="Order type", xlab="Revenue", horizontal=FALSE)
boxplot(Revenue ~ Year ,data=retail.df, main="Effect of Year on revenue", ylab="Year", xlab="Revenue", horizontal=FALSE)
boxplot(Gross.profit ~ Year ,data=retail.df, main="Distribution of gross profit with year", ylab="Year", xlab="Gross profit", horizontal=FALSE)
library(lattice)
histogram(~Gross.profit, data = retail.df,
main = "Distribution of Gross profit", xlab="gross profit", col='red' )
library(lattice)
histogram(~Revenue, data = retail.df,
main = "Distribution of Revenue", xlab="revenue", col='blue' )
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
# Scatter plot of Gross profit with order method
scatterplot(Gross.profit ~ Order.method.type ,data=retail.df, main="Scatterplot of Gross profit with order type", xlab="Order method type", ylab="Gross profit", horizontal=TRUE)
## [1] "7457" "7604" "16574" "16584" "9956" "1590" "16731" "58925"
## [9] "16133" "17172" "37912" "16009" "37177" "38059" "16597" "37765"
## [17] "38206" "16163" "80115" "2780" "16113" "16554" "16407" "23031"
## [25] "1568" "14344" "22737" "16848" "1996" "16701" "1497" "22662"
## [33] "22956" "1492" "2668" "1494" "71758" "46033" "1499" "22666"
## [41] "16828" "25061" "16093" "16534" "22996" "16240" "23885" "16823"
## [49] "16088" "16387" "1471" "22639" "1765" "22933" "3529" "1618"
## [57] "7351" "3676" "9850" "17053" "49288" "51787" "52228" "58843"
## [65] "51934" "28561" "52522" "49729" "31060" "72955"
# Scatterplot of Gross profit with year
scatterplot(Gross.profit ~ Year ,data=retail.df, main="Scatterplot of Gross profit with year", ylab="Gross profit", xlab="Year", horizontal=TRUE)
## Warning in plot.window(...): "horizontal" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "horizontal" is not a graphical
## parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "horizontal"
## is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "horizontal"
## is not a graphical parameter
## Warning in box(...): "horizontal" is not a graphical parameter
## Warning in title(...): "horizontal" is not a graphical parameter
# Average gross profit by order method
avgprofit = aggregate(Gross.profit ~ Order.method.type , data = retail.df, mean)
avgprofit
## Order.method.type Gross.profit
## 1 E-mail 35324.29
## 2 Fax 16319.28
## 3 Mail 16386.31
## 4 Sales visit 29109.29
## 5 Special 14006.19
## 6 Telephone 36904.41
## 7 Web 139377.47
library(car)
scatterplotMatrix(~ Gross.profit + Revenue , data=retail.df,
main="Gross profit")
# Scatterplot Matrices from the car Package
library(car)
scatterplotMatrix(~Gross.profit+Year+Product.type+Order.method.type , data=retail.df)
library(Hmisc)
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
##
## describe
## The following objects are masked from 'package:base':
##
## format.pval, units
colretails <- c("Gross.profit","Product.cost","Quantity")
corMatrix <- rcorr(as.matrix(retail.df[,colretails]))
corMatrix
## Gross.profit Product.cost Quantity
## Gross.profit 1.00 0.94 0.49
## Product.cost 0.94 1.00 0.51
## Quantity 0.49 0.51 1.00
##
## n= 24743
##
##
## P
## Gross.profit Product.cost Quantity
## Gross.profit 0 0
## Product.cost 0 0
## Quantity 0 0
library(Hmisc)
library(car)
library(corrgram)
colretails <- c("Gross.profit","Product.cost","Quantity")
corrgram(retail.df[,colretails], order=TRUE,
main="Gross profit vs others",
lower.panel=panel.pts, upper.panel=panel.pie,
diag.panel=panel.minmax, text.panel=panel.txt)
cor.test(retail.df[,"Gross.profit"], retail.df[,"Quantity"])
##
## Pearson's product-moment correlation
##
## data: retail.df[, "Gross.profit"] and retail.df[, "Quantity"]
## t = 87.538, df = 24741, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4767203 0.4957483
## sample estimates:
## cor
## 0.486292
cor.test(retail.df[,"Gross.profit"], retail.df[,"Revenue"])
##
## Pearson's product-moment correlation
##
## data: retail.df[, "Gross.profit"] and retail.df[, "Revenue"]
## t = 736.41, df = 24741, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9773903 0.9784778
## sample estimates:
## cor
## 0.9779407
So, Gross profit and Quantity & Gross profit and Revenue are highly correlated.
t.test(retail.df$Revenue,retail.df$Planned.revenue)
##
## Welch Two Sample t-test
##
## data: retail.df$Revenue and retail.df$Planned.revenue
## t = -2.6355, df = 49440, p-value = 0.008405
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16389.583 -2408.994
## sample estimates:
## mean of x mean of y
## 189418.3 198817.5
Since the p-value is less than 0.05, we fail to reject the null hypothesis that planned revenue and revenue are equal.
In this model we try regressing Gross profit on ALL the remaining columns.
M1 <- Gross.profit ~ Year + Product.line + Product.type + Order.method.type + Revenue + Quantity + Unit.cost + Unit.sale.price
f1 <- lm(M1, data = retail.df)
summary(f1)
##
## Call:
## lm(formula = M1, data = retail.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -673249 -6267 -351 5057 632447
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error t value
## (Intercept) -8.113e+05 3.673e+05 -2.209
## Year 3.983e+02 1.832e+02 2.175
## Product.lineGolf Equipment -9.208e+03 2.173e+03 -4.238
## Product.lineMountaineering Equipment 1.478e+04 1.647e+03 8.973
## Product.lineOutdoor Protection 1.665e+04 1.648e+03 10.099
## Product.linePersonal Accessories 1.153e+04 1.524e+03 7.569
## Product.typeClimbing Accessories 4.176e+03 1.375e+03 3.037
## Product.typeCooking Gear 1.246e+04 1.452e+03 8.579
## Product.typeEyewear -4.920e+02 1.119e+03 -0.440
## Product.typeFirst Aid -5.830e+03 1.307e+03 -4.460
## Product.typeGolf Accessories 2.660e+04 2.403e+03 11.068
## Product.typeInsect Repellents 4.491e+03 1.305e+03 3.440
## Product.typeIrons 1.577e+04 1.712e+03 9.210
## Product.typeKnives -9.449e+02 1.241e+03 -0.761
## Product.typeLanterns 1.246e+04 1.465e+03 8.505
## Product.typeNavigation -2.798e+03 1.234e+03 -2.268
## Product.typePacks 1.001e+04 1.401e+03 7.146
## Product.typePutters 3.213e+04 2.476e+03 12.975
## Product.typeRope -9.106e+02 1.716e+03 -0.531
## Product.typeSafety -6.357e+03 1.584e+03 -4.012
## Product.typeSleeping Bags 1.098e+04 1.450e+03 7.570
## Product.typeSunscreen NA NA NA
## Product.typeTents NA NA NA
## Product.typeTools NA NA NA
## Product.typeWatches 4.195e+03 1.142e+03 3.673
## Product.typeWoods NA NA NA
## Order.method.typeFax -6.615e+02 9.385e+02 -0.705
## Order.method.typeMail -6.971e+02 1.067e+03 -0.653
## Order.method.typeSales visit -5.205e+02 7.789e+02 -0.668
## Order.method.typeSpecial 5.059e+02 1.222e+03 0.414
## Order.method.typeTelephone 1.447e+02 7.936e+02 0.182
## Order.method.typeWeb 4.505e+03 7.119e+02 6.328
## Revenue 3.988e-01 6.373e-04 625.803
## Quantity -2.634e-01 2.726e-02 -9.663
## Unit.cost -7.364e+02 1.477e+01 -49.866
## Unit.sale.price 4.386e+02 8.627e+00 50.847
## Pr(>|t|)
## (Intercept) 0.027215 *
## Year 0.029655 *
## Product.lineGolf Equipment 2.26e-05 ***
## Product.lineMountaineering Equipment < 2e-16 ***
## Product.lineOutdoor Protection < 2e-16 ***
## Product.linePersonal Accessories 3.90e-14 ***
## Product.typeClimbing Accessories 0.002389 **
## Product.typeCooking Gear < 2e-16 ***
## Product.typeEyewear 0.660234
## Product.typeFirst Aid 8.24e-06 ***
## Product.typeGolf Accessories < 2e-16 ***
## Product.typeInsect Repellents 0.000582 ***
## Product.typeIrons < 2e-16 ***
## Product.typeKnives 0.446418
## Product.typeLanterns < 2e-16 ***
## Product.typeNavigation 0.023361 *
## Product.typePacks 9.19e-13 ***
## Product.typePutters < 2e-16 ***
## Product.typeRope 0.595745
## Product.typeSafety 6.04e-05 ***
## Product.typeSleeping Bags 3.87e-14 ***
## Product.typeSunscreen NA
## Product.typeTents NA
## Product.typeTools NA
## Product.typeWatches 0.000240 ***
## Product.typeWoods NA
## Order.method.typeFax 0.480917
## Order.method.typeMail 0.513625
## Order.method.typeSales visit 0.503988
## Order.method.typeSpecial 0.678894
## Order.method.typeTelephone 0.855360
## Order.method.typeWeb 2.52e-10 ***
## Revenue < 2e-16 ***
## Quantity < 2e-16 ***
## Unit.cost < 2e-16 ***
## Unit.sale.price < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29010 on 24711 degrees of freedom
## (59929 observations deleted due to missingness)
## Multiple R-squared: 0.9664, Adjusted R-squared: 0.9663
## F-statistic: 2.292e+04 on 31 and 24711 DF, p-value: < 2.2e-16
As the Adjusted R-squared value is 0.9663, so this holds a strong relationship.
library(leaps)
leap1 <- regsubsets(M1, data = retail.df, nbest=1)
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax,
## force.in = force.in, : 4 linear dependencies found
## Reordering variables and trying again:
# summary(leap1)
plot(leap1, scale="adjr2")
M2 <- Gross.profit ~ Revenue + Quantity + Unit.cost + Unit.sale.price
f2 <- lm(M2, data = retail.df)
summary(f2)
##
## Call:
## lm(formula = M2, data = retail.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -689795 -4859 -1966 4932 621548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.582e+03 2.435e+02 14.714 <2e-16 ***
## Revenue 4.009e-01 6.026e-04 665.249 <2e-16 ***
## Quantity -2.611e-01 2.634e-02 -9.912 <2e-16 ***
## Unit.cost -7.684e+02 9.672e+00 -79.447 <2e-16 ***
## Unit.sale.price 4.377e+02 5.432e+00 80.590 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29370 on 24738 degrees of freedom
## (59929 observations deleted due to missingness)
## Multiple R-squared: 0.9655, Adjusted R-squared: 0.9655
## F-statistic: 1.731e+05 on 4 and 24738 DF, p-value: < 2.2e-16
library(leaps)
leap2 <- regsubsets(M2, data = retail.df, nbest=1)
# summary(leap2)
plot(leap2, scale="adjr2")
# the Adjusted R Squared for Model 2 is less than Model 1
summary(f1)$adj.r.squared
## [1] 0.966345
summary(f2)$adj.r.squared
## [1] 0.9655065
# the AIC for Model 2 is less than Model 1
AIC(f1)
## [1] 578736.4
AIC(f2)
## [1] 579318.4
Thus, Model 1 is our ‘best’ ordinary least squares model.
Model 1 predicts the Gross.profit as a function of the following explanatory variables: Year,Product.line,Product.type,Order.method.type,Revenue,Quantity,Unit.cost,Unit.sale.price.