1. Introduction

‘Retail Sales Marketing Profit Cost’ has been collected from the sales section of the website of IBM Watson analytics.IBM (International Business Machines Corporation) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries.IBM manufactures and markets computer hardware, middleware and software, and provides hosting and consulting services in areas ranging from mainframe computers to nanotechnology. IBM is also a major research organization, holding the record for most patents generated by a business (as of 2018) for 25 consecutive years.IBM has continually shifted its business mix by commoditizing markets focusing on higher-value, more profitable markets.IBM says that Watson Analytics offering is intended to to provide the benefits of advanced analytics without the complexity.
This paper addresses the issues concerning the ‘Gross Profit’ with respect to the year and order method and product types.

Literature Review

The specific objective of this Study was to investigate the variation of the gross profit with respect to the Year,product type and order method type.This study analyzed the gross profit and also analyzed the comparison between planned revenue and revenue of the product.
Here we expect to review product-related information like Cost, Revenue, Price, etc. across Years and Ordering Method.This dataset can also be used in the Explore feature to better understand the hidden trends & patterns.

Data Description

For this study, we collected data from the sales section in the website of IBM Watson Analytics(https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/).IBM Watson Analytics provides us some real data sets to analyze those in many different ways, to uncover insights that can have a profound effect on business. It could also help getting the track of marketing strategy.
In this data set we will analyze how the gross profit of the company is changing yearly according to their product lines, product type, order method, unit sale price, product cost etc.

Model Analysis

We proposed the best fit model here :
Gross.profit = b0 + b1Year + b2Product.line + b3Product.type + b4Order.method.type + b5Revenue + b6Quantity + b7Unit.cost + b8Unit.sale.price

Read the data set

retail.df <- read.csv(paste("RetailSales.csv", sep=""))
attach(retail.df)

Regression model

M1 <- Gross.profit ~  Year + Product.line + Product.type + Order.method.type + Revenue + Quantity + Unit.cost + Unit.sale.price
f1 <- lm(M1, data = retail.df)
summary(f1)
## 
## Call:
## lm(formula = M1, data = retail.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -673249   -6267    -351    5057  632447 
## 
## Coefficients: (4 not defined because of singularities)
##                                        Estimate Std. Error t value
## (Intercept)                          -8.113e+05  3.673e+05  -2.209
## Year                                  3.983e+02  1.832e+02   2.175
## Product.lineGolf Equipment           -9.208e+03  2.173e+03  -4.238
## Product.lineMountaineering Equipment  1.478e+04  1.647e+03   8.973
## Product.lineOutdoor Protection        1.665e+04  1.648e+03  10.099
## Product.linePersonal Accessories      1.153e+04  1.524e+03   7.569
## Product.typeClimbing Accessories      4.176e+03  1.375e+03   3.037
## Product.typeCooking Gear              1.246e+04  1.452e+03   8.579
## Product.typeEyewear                  -4.920e+02  1.119e+03  -0.440
## Product.typeFirst Aid                -5.830e+03  1.307e+03  -4.460
## Product.typeGolf Accessories          2.660e+04  2.403e+03  11.068
## Product.typeInsect Repellents         4.491e+03  1.305e+03   3.440
## Product.typeIrons                     1.577e+04  1.712e+03   9.210
## Product.typeKnives                   -9.449e+02  1.241e+03  -0.761
## Product.typeLanterns                  1.246e+04  1.465e+03   8.505
## Product.typeNavigation               -2.798e+03  1.234e+03  -2.268
## Product.typePacks                     1.001e+04  1.401e+03   7.146
## Product.typePutters                   3.213e+04  2.476e+03  12.975
## Product.typeRope                     -9.106e+02  1.716e+03  -0.531
## Product.typeSafety                   -6.357e+03  1.584e+03  -4.012
## Product.typeSleeping Bags             1.098e+04  1.450e+03   7.570
## Product.typeSunscreen                        NA         NA      NA
## Product.typeTents                            NA         NA      NA
## Product.typeTools                            NA         NA      NA
## Product.typeWatches                   4.195e+03  1.142e+03   3.673
## Product.typeWoods                            NA         NA      NA
## Order.method.typeFax                 -6.615e+02  9.385e+02  -0.705
## Order.method.typeMail                -6.971e+02  1.067e+03  -0.653
## Order.method.typeSales visit         -5.205e+02  7.789e+02  -0.668
## Order.method.typeSpecial              5.059e+02  1.222e+03   0.414
## Order.method.typeTelephone            1.447e+02  7.936e+02   0.182
## Order.method.typeWeb                  4.505e+03  7.119e+02   6.328
## Revenue                               3.988e-01  6.373e-04 625.803
## Quantity                             -2.634e-01  2.726e-02  -9.663
## Unit.cost                            -7.364e+02  1.477e+01 -49.866
## Unit.sale.price                       4.386e+02  8.627e+00  50.847
##                                      Pr(>|t|)    
## (Intercept)                          0.027215 *  
## Year                                 0.029655 *  
## Product.lineGolf Equipment           2.26e-05 ***
## Product.lineMountaineering Equipment  < 2e-16 ***
## Product.lineOutdoor Protection        < 2e-16 ***
## Product.linePersonal Accessories     3.90e-14 ***
## Product.typeClimbing Accessories     0.002389 ** 
## Product.typeCooking Gear              < 2e-16 ***
## Product.typeEyewear                  0.660234    
## Product.typeFirst Aid                8.24e-06 ***
## Product.typeGolf Accessories          < 2e-16 ***
## Product.typeInsect Repellents        0.000582 ***
## Product.typeIrons                     < 2e-16 ***
## Product.typeKnives                   0.446418    
## Product.typeLanterns                  < 2e-16 ***
## Product.typeNavigation               0.023361 *  
## Product.typePacks                    9.19e-13 ***
## Product.typePutters                   < 2e-16 ***
## Product.typeRope                     0.595745    
## Product.typeSafety                   6.04e-05 ***
## Product.typeSleeping Bags            3.87e-14 ***
## Product.typeSunscreen                      NA    
## Product.typeTents                          NA    
## Product.typeTools                          NA    
## Product.typeWatches                  0.000240 ***
## Product.typeWoods                          NA    
## Order.method.typeFax                 0.480917    
## Order.method.typeMail                0.513625    
## Order.method.typeSales visit         0.503988    
## Order.method.typeSpecial             0.678894    
## Order.method.typeTelephone           0.855360    
## Order.method.typeWeb                 2.52e-10 ***
## Revenue                               < 2e-16 ***
## Quantity                              < 2e-16 ***
## Unit.cost                             < 2e-16 ***
## Unit.sale.price                       < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29010 on 24711 degrees of freedom
##   (59929 observations deleted due to missingness)
## Multiple R-squared:  0.9664, Adjusted R-squared:  0.9663 
## F-statistic: 2.292e+04 on 31 and 24711 DF,  p-value: < 2.2e-16

Here we established the effect of the year, product line, product type, order method type, revenue, quantity, unit cost, unit sale price on the gross profit of the company.We regressed gross profit on the basis of those variables.We estimated model, using linear least squares.The best fit model will be always the model with greatest adjusted R squared value.

Discussion

Here we contructed two models M1 and M2. Now comparing their adjusted R squared value we got the best model M1 as it’s R squared value is 0.9663 which is greatest. So we can see that gross profit is highly dependent on all those x variables in model 1. The p-value is also here <0.05.

Conclusion

Our aim was here to focus on the changes of gross profit according to those varibles. According to our analysis in 2006 the gross profit and the revenue were the highest, and we could also find that the gross profit is also highest when the order method is Web. The unique contribution of this paper is that we investigated the gross profit and how it is being changed with revenue, sale price, cost price, quantity, order method, product type.

References

Sales section in the website of IBM Watson Analytics :
https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/

Appendix 1

Read the data set in R

retail.df <- read.csv(paste("RetailSales.csv", sep=""))
attach(retail.df)
## The following objects are masked from retail.df (pos = 3):
## 
##     Gross.profit, Order.method.type, Planned.revenue, Product,
##     Product.cost, Product.line, Product.type, Quantity,
##     Retailer.country, Revenue, Unit.cost, Unit.price,
##     Unit.sale.price, Year

Get a summary of data set

library(psych)
describe(retail.df)
##                    vars     n      mean        sd   median   trimmed
## Year                  1 84672   2005.50      1.12  2005.50   2005.50
## Product.line*         2 84672      3.15      1.66     3.00      3.19
## Product.type*         3 84672     10.51      6.23    10.00     10.39
## Product*              4 84672     72.50     41.57    72.50     72.50
## Order.method.type*    5 84672      4.00      2.00     4.00      4.00
## Retailer.country*     6 84672     11.00      6.06    11.00     11.00
## Revenue               7 24743 189418.25 390750.90 59867.27 106149.19
## Planned.revenue       8 24743 198817.54 402535.53 63906.84 112896.09
## Product.cost          9 24743 111625.14 238415.56 32783.72  61046.41
## Quantity             10 24743   3606.56   8777.72  1043.00   1829.23
## Unit.cost            11 24743     84.89    131.11    36.83     51.88
## Unit.price           12 24743    155.99    246.80    66.77     94.20
## Gross.profit         13 24743  77793.11 158122.34 25793.76  44033.05
## Unit.sale.price      14 24743    147.23    232.05    62.65     89.23
##                         mad       min         max       range  skew
## Year                   1.48   2004.00     2007.00        3.00  0.00
## Product.line*          2.97      1.00        5.00        4.00 -0.14
## Product.type*          8.90      1.00       21.00       20.00  0.16
## Product*              53.37      1.00      144.00      143.00  0.00
## Order.method.type*     2.97      1.00        7.00        6.00  0.00
## Retailer.country*      7.41      1.00       21.00       20.00  0.00
## Revenue            75285.35      0.00 10054289.00 10054289.00  6.90
## Planned.revenue    80672.73     15.69 10054289.00 10054273.31  6.68
## Product.cost       42496.83      5.76  6756852.78  6756847.02  7.23
## Quantity            1294.31      1.00   313628.00   313627.00  9.50
## Unit.cost             43.48      0.85      690.00      689.15  2.45
## Unit.price            77.10      2.06     1359.72     1357.66  2.72
## Gross.profit       31742.48 -18159.60  3521097.76  3539257.36  6.90
## Unit.sale.price       72.14      0.00     1307.80     1307.80  2.70
##                    kurtosis      se
## Year                  -1.36    0.00
## Product.line*         -1.62    0.01
## Product.type*         -1.27    0.02
## Product*              -1.20    0.14
## Order.method.type*    -1.25    0.01
## Retailer.country*     -1.21    0.02
## Revenue               85.05 2484.13
## Planned.revenue       79.43 2559.05
## Product.cost          96.19 1515.68
## Quantity             175.46   55.80
## Unit.cost              5.72    0.83
## Unit.price             7.74    1.57
## Gross.profit          83.84 1005.23
## Unit.sale.price        7.62    1.48

One way contigency table

mytable <- with(retail.df, table(Order.method.type))
mytable
## Order.method.type
##      E-mail         Fax        Mail Sales visit     Special   Telephone 
##       12096       12096       12096       12096       12096       12096 
##         Web 
##       12096
mytable2 <- with(retail.df, table(Retailer.country))
mytable2
## Retailer.country
##      Australia        Austria        Belgium         Brazil         Canada 
##           4032           4032           4032           4032           4032 
##          China        Denmark        Finland         France        Germany 
##           4032           4032           4032           4032           4032 
##          Italy          Japan          Korea         Mexico    Netherlands 
##           4032           4032           4032           4032           4032 
##      Singapore          Spain         Sweden    Switzerland United Kingdom 
##           4032           4032           4032           4032           4032 
##  United States 
##           4032

Two way contigency tables

ProductlinebyOrdertype <- xtabs(~ retail.df$Product.line + retail.df$Order.method.type , data=retail.df)
ProductlinebyOrdertype
##                           retail.df$Order.method.type
## retail.df$Product.line     E-mail  Fax Mail Sales visit Special Telephone
##   Camping Equipment          3444 3444 3444        3444    3444      3444
##   Golf Equipment             1260 1260 1260        1260    1260      1260
##   Mountaineering Equipment   1764 1764 1764        1764    1764      1764
##   Outdoor Protection         1260 1260 1260        1260    1260      1260
##   Personal Accessories       4368 4368 4368        4368    4368      4368
##                           retail.df$Order.method.type
## retail.df$Product.line      Web
##   Camping Equipment        3444
##   Golf Equipment           1260
##   Mountaineering Equipment 1764
##   Outdoor Protection       1260
##   Personal Accessories     4368
addmargins(ProductlinebyOrdertype)
##                           retail.df$Order.method.type
## retail.df$Product.line     E-mail   Fax  Mail Sales visit Special
##   Camping Equipment          3444  3444  3444        3444    3444
##   Golf Equipment             1260  1260  1260        1260    1260
##   Mountaineering Equipment   1764  1764  1764        1764    1764
##   Outdoor Protection         1260  1260  1260        1260    1260
##   Personal Accessories       4368  4368  4368        4368    4368
##   Sum                       12096 12096 12096       12096   12096
##                           retail.df$Order.method.type
## retail.df$Product.line     Telephone   Web   Sum
##   Camping Equipment             3444  3444 24108
##   Golf Equipment                1260  1260  8820
##   Mountaineering Equipment      1764  1764 12348
##   Outdoor Protection            1260  1260  8820
##   Personal Accessories          4368  4368 30576
##   Sum                          12096 12096 84672
ProducttypebyOrdertype <- xtabs(~ retail.df$Product.type + retail.df$Order.method.type , data=retail.df)
ProducttypebyOrdertype
##                       retail.df$Order.method.type
## retail.df$Product.type E-mail  Fax Mail Sales visit Special Telephone  Web
##   Binoculars              504  504  504         504     504       504  504
##   Climbing Accessories    588  588  588         588     588       588  588
##   Cooking Gear            840  840  840         840     840       840  840
##   Eyewear                1344 1344 1344        1344    1344      1344 1344
##   First Aid               420  420  420         420     420       420  420
##   Golf Accessories        336  336  336         336     336       336  336
##   Insect Repellents       420  420  420         420     420       420  420
##   Irons                   336  336  336         336     336       336  336
##   Knives                  588  588  588         588     588       588  588
##   Lanterns               1008 1008 1008        1008    1008      1008 1008
##   Navigation              840  840  840         840     840       840  840
##   Packs                   504  504  504         504     504       504  504
##   Putters                 252  252  252         252     252       252  252
##   Rope                    336  336  336         336     336       336  336
##   Safety                  336  336  336         336     336       336  336
##   Sleeping Bags           588  588  588         588     588       588  588
##   Sunscreen               420  420  420         420     420       420  420
##   Tents                   504  504  504         504     504       504  504
##   Tools                   504  504  504         504     504       504  504
##   Watches                1092 1092 1092        1092    1092      1092 1092
##   Woods                   336  336  336         336     336       336  336
addmargins(ProducttypebyOrdertype)
##                       retail.df$Order.method.type
## retail.df$Product.type E-mail   Fax  Mail Sales visit Special Telephone
##   Binoculars              504   504   504         504     504       504
##   Climbing Accessories    588   588   588         588     588       588
##   Cooking Gear            840   840   840         840     840       840
##   Eyewear                1344  1344  1344        1344    1344      1344
##   First Aid               420   420   420         420     420       420
##   Golf Accessories        336   336   336         336     336       336
##   Insect Repellents       420   420   420         420     420       420
##   Irons                   336   336   336         336     336       336
##   Knives                  588   588   588         588     588       588
##   Lanterns               1008  1008  1008        1008    1008      1008
##   Navigation              840   840   840         840     840       840
##   Packs                   504   504   504         504     504       504
##   Putters                 252   252   252         252     252       252
##   Rope                    336   336   336         336     336       336
##   Safety                  336   336   336         336     336       336
##   Sleeping Bags           588   588   588         588     588       588
##   Sunscreen               420   420   420         420     420       420
##   Tents                   504   504   504         504     504       504
##   Tools                   504   504   504         504     504       504
##   Watches                1092  1092  1092        1092    1092      1092
##   Woods                   336   336   336         336     336       336
##   Sum                   12096 12096 12096       12096   12096     12096
##                       retail.df$Order.method.type
## retail.df$Product.type   Web   Sum
##   Binoculars             504  3528
##   Climbing Accessories   588  4116
##   Cooking Gear           840  5880
##   Eyewear               1344  9408
##   First Aid              420  2940
##   Golf Accessories       336  2352
##   Insect Repellents      420  2940
##   Irons                  336  2352
##   Knives                 588  4116
##   Lanterns              1008  7056
##   Navigation             840  5880
##   Packs                  504  3528
##   Putters                252  1764
##   Rope                   336  2352
##   Safety                 336  2352
##   Sleeping Bags          588  4116
##   Sunscreen              420  2940
##   Tents                  504  3528
##   Tools                  504  3528
##   Watches               1092  7644
##   Woods                  336  2352
##   Sum                  12096 84672

Appendix 2

Effect of order method on gross profit

boxplot(Gross.profit ~ Order.method.type ,data=retail.df, main="Effect of Order method type on Gross profit", ylab="Order type", xlab="Gross profit", horizontal=FALSE)

Effect of order method on quantity

boxplot(Quantity ~ Order.method.type ,data=retail.df, main="Effect of Order method type on quantity", ylab="Order type", xlab="Quantity", horizontal=FALSE)

Effect of order method on Revenue

boxplot(Revenue ~ Order.method.type ,data=retail.df, main="Effect of Order method type on revenue", ylab="Order type", xlab="Revenue", horizontal=FALSE)

Effect of year on Revenue

boxplot(Revenue ~ Year ,data=retail.df, main="Effect of Year on revenue", ylab="Year", xlab="Revenue", horizontal=FALSE)

Gross profit according to year

boxplot(Gross.profit ~ Year ,data=retail.df, main="Distribution of gross profit with year", ylab="Year", xlab="Gross profit", horizontal=FALSE)

Histograms

library(lattice)
histogram(~Gross.profit, data = retail.df,
main = "Distribution of Gross profit", xlab="gross profit", col='red' ) 

library(lattice)
histogram(~Revenue, data = retail.df,
main = "Distribution of Revenue", xlab="revenue", col='blue' ) 

Scatterplots

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
# Scatter plot of Gross profit with order method
scatterplot(Gross.profit ~ Order.method.type ,data=retail.df, main="Scatterplot of Gross profit with order type", xlab="Order method type", ylab="Gross profit", horizontal=TRUE)

##  [1] "7457"  "7604"  "16574" "16584" "9956"  "1590"  "16731" "58925"
##  [9] "16133" "17172" "37912" "16009" "37177" "38059" "16597" "37765"
## [17] "38206" "16163" "80115" "2780"  "16113" "16554" "16407" "23031"
## [25] "1568"  "14344" "22737" "16848" "1996"  "16701" "1497"  "22662"
## [33] "22956" "1492"  "2668"  "1494"  "71758" "46033" "1499"  "22666"
## [41] "16828" "25061" "16093" "16534" "22996" "16240" "23885" "16823"
## [49] "16088" "16387" "1471"  "22639" "1765"  "22933" "3529"  "1618" 
## [57] "7351"  "3676"  "9850"  "17053" "49288" "51787" "52228" "58843"
## [65] "51934" "28561" "52522" "49729" "31060" "72955"
# Scatterplot of Gross profit with year
scatterplot(Gross.profit ~ Year ,data=retail.df, main="Scatterplot of Gross profit with year", ylab="Gross profit", xlab="Year", horizontal=TRUE)
## Warning in plot.window(...): "horizontal" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "horizontal" is not a graphical
## parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "horizontal"
## is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "horizontal"
## is not a graphical parameter
## Warning in box(...): "horizontal" is not a graphical parameter
## Warning in title(...): "horizontal" is not a graphical parameter

# Average gross profit by order method
avgprofit = aggregate(Gross.profit ~ Order.method.type , data = retail.df, mean)
avgprofit
##   Order.method.type Gross.profit
## 1            E-mail     35324.29
## 2               Fax     16319.28
## 3              Mail     16386.31
## 4       Sales visit     29109.29
## 5           Special     14006.19
## 6         Telephone     36904.41
## 7               Web    139377.47
library(car)
scatterplotMatrix(~ Gross.profit + Revenue , data=retail.df,
    main="Gross profit")

# Scatterplot Matrices from the car Package
library(car)
scatterplotMatrix(~Gross.profit+Year+Product.type+Order.method.type , data=retail.df)

Appendix 3

CORRELATION MATRIX

library(Hmisc)
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
## 
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
## 
##     describe
## The following objects are masked from 'package:base':
## 
##     format.pval, units
colretails <- c("Gross.profit","Product.cost","Quantity")
corMatrix <- rcorr(as.matrix(retail.df[,colretails]))
corMatrix
##              Gross.profit Product.cost Quantity
## Gross.profit         1.00         0.94     0.49
## Product.cost         0.94         1.00     0.51
## Quantity             0.49         0.51     1.00
## 
## n= 24743 
## 
## 
## P
##              Gross.profit Product.cost Quantity
## Gross.profit               0            0      
## Product.cost  0                         0      
## Quantity      0            0

CORRGRAM

library(Hmisc)
library(car)
library(corrgram)
colretails <- c("Gross.profit","Product.cost","Quantity")
corrgram(retail.df[,colretails], order=TRUE,
         main="Gross profit vs others",
         lower.panel=panel.pts, upper.panel=panel.pie,
         diag.panel=panel.minmax, text.panel=panel.txt)

Pearson’s Correlation test

cor.test(retail.df[,"Gross.profit"], retail.df[,"Quantity"])
## 
##  Pearson's product-moment correlation
## 
## data:  retail.df[, "Gross.profit"] and retail.df[, "Quantity"]
## t = 87.538, df = 24741, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4767203 0.4957483
## sample estimates:
##      cor 
## 0.486292
cor.test(retail.df[,"Gross.profit"], retail.df[,"Revenue"])
## 
##  Pearson's product-moment correlation
## 
## data:  retail.df[, "Gross.profit"] and retail.df[, "Revenue"]
## t = 736.41, df = 24741, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9773903 0.9784778
## sample estimates:
##       cor 
## 0.9779407

So, Gross profit and Quantity & Gross profit and Revenue are highly correlated.

T-test

t.test(retail.df$Revenue,retail.df$Planned.revenue)
## 
##  Welch Two Sample t-test
## 
## data:  retail.df$Revenue and retail.df$Planned.revenue
## t = -2.6355, df = 49440, p-value = 0.008405
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16389.583  -2408.994
## sample estimates:
## mean of x mean of y 
##  189418.3  198817.5

Since the p-value is less than 0.05, we fail to reject the null hypothesis that planned revenue and revenue are equal.

R code

REGRESSION

MODEL 1

In this model we try regressing Gross profit on ALL the remaining columns.

Independent Variables(x1,x2,x3..): {Year,Product.line,Product.type,Order.method.type,Revenue,Quantity,Unit.cost,Unit.sale.price}

Dependent Variable(y): Gross.profit

M1 <- Gross.profit ~  Year + Product.line + Product.type + Order.method.type + Revenue + Quantity + Unit.cost + Unit.sale.price
f1 <- lm(M1, data = retail.df)
summary(f1)
## 
## Call:
## lm(formula = M1, data = retail.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -673249   -6267    -351    5057  632447 
## 
## Coefficients: (4 not defined because of singularities)
##                                        Estimate Std. Error t value
## (Intercept)                          -8.113e+05  3.673e+05  -2.209
## Year                                  3.983e+02  1.832e+02   2.175
## Product.lineGolf Equipment           -9.208e+03  2.173e+03  -4.238
## Product.lineMountaineering Equipment  1.478e+04  1.647e+03   8.973
## Product.lineOutdoor Protection        1.665e+04  1.648e+03  10.099
## Product.linePersonal Accessories      1.153e+04  1.524e+03   7.569
## Product.typeClimbing Accessories      4.176e+03  1.375e+03   3.037
## Product.typeCooking Gear              1.246e+04  1.452e+03   8.579
## Product.typeEyewear                  -4.920e+02  1.119e+03  -0.440
## Product.typeFirst Aid                -5.830e+03  1.307e+03  -4.460
## Product.typeGolf Accessories          2.660e+04  2.403e+03  11.068
## Product.typeInsect Repellents         4.491e+03  1.305e+03   3.440
## Product.typeIrons                     1.577e+04  1.712e+03   9.210
## Product.typeKnives                   -9.449e+02  1.241e+03  -0.761
## Product.typeLanterns                  1.246e+04  1.465e+03   8.505
## Product.typeNavigation               -2.798e+03  1.234e+03  -2.268
## Product.typePacks                     1.001e+04  1.401e+03   7.146
## Product.typePutters                   3.213e+04  2.476e+03  12.975
## Product.typeRope                     -9.106e+02  1.716e+03  -0.531
## Product.typeSafety                   -6.357e+03  1.584e+03  -4.012
## Product.typeSleeping Bags             1.098e+04  1.450e+03   7.570
## Product.typeSunscreen                        NA         NA      NA
## Product.typeTents                            NA         NA      NA
## Product.typeTools                            NA         NA      NA
## Product.typeWatches                   4.195e+03  1.142e+03   3.673
## Product.typeWoods                            NA         NA      NA
## Order.method.typeFax                 -6.615e+02  9.385e+02  -0.705
## Order.method.typeMail                -6.971e+02  1.067e+03  -0.653
## Order.method.typeSales visit         -5.205e+02  7.789e+02  -0.668
## Order.method.typeSpecial              5.059e+02  1.222e+03   0.414
## Order.method.typeTelephone            1.447e+02  7.936e+02   0.182
## Order.method.typeWeb                  4.505e+03  7.119e+02   6.328
## Revenue                               3.988e-01  6.373e-04 625.803
## Quantity                             -2.634e-01  2.726e-02  -9.663
## Unit.cost                            -7.364e+02  1.477e+01 -49.866
## Unit.sale.price                       4.386e+02  8.627e+00  50.847
##                                      Pr(>|t|)    
## (Intercept)                          0.027215 *  
## Year                                 0.029655 *  
## Product.lineGolf Equipment           2.26e-05 ***
## Product.lineMountaineering Equipment  < 2e-16 ***
## Product.lineOutdoor Protection        < 2e-16 ***
## Product.linePersonal Accessories     3.90e-14 ***
## Product.typeClimbing Accessories     0.002389 ** 
## Product.typeCooking Gear              < 2e-16 ***
## Product.typeEyewear                  0.660234    
## Product.typeFirst Aid                8.24e-06 ***
## Product.typeGolf Accessories          < 2e-16 ***
## Product.typeInsect Repellents        0.000582 ***
## Product.typeIrons                     < 2e-16 ***
## Product.typeKnives                   0.446418    
## Product.typeLanterns                  < 2e-16 ***
## Product.typeNavigation               0.023361 *  
## Product.typePacks                    9.19e-13 ***
## Product.typePutters                   < 2e-16 ***
## Product.typeRope                     0.595745    
## Product.typeSafety                   6.04e-05 ***
## Product.typeSleeping Bags            3.87e-14 ***
## Product.typeSunscreen                      NA    
## Product.typeTents                          NA    
## Product.typeTools                          NA    
## Product.typeWatches                  0.000240 ***
## Product.typeWoods                          NA    
## Order.method.typeFax                 0.480917    
## Order.method.typeMail                0.513625    
## Order.method.typeSales visit         0.503988    
## Order.method.typeSpecial             0.678894    
## Order.method.typeTelephone           0.855360    
## Order.method.typeWeb                 2.52e-10 ***
## Revenue                               < 2e-16 ***
## Quantity                              < 2e-16 ***
## Unit.cost                             < 2e-16 ***
## Unit.sale.price                       < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29010 on 24711 degrees of freedom
##   (59929 observations deleted due to missingness)
## Multiple R-squared:  0.9664, Adjusted R-squared:  0.9663 
## F-statistic: 2.292e+04 on 31 and 24711 DF,  p-value: < 2.2e-16

As the Adjusted R-squared value is 0.9663, so this holds a strong relationship.

FIND THE BEST PREDICTORS

library(leaps)
leap1 <- regsubsets(M1, data = retail.df, nbest=1)
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax,
## force.in = force.in, : 4 linear dependencies found
## Reordering variables and trying again:
# summary(leap1)
plot(leap1, scale="adjr2")

MODEL 2

M2 <- Gross.profit ~   Revenue + Quantity + Unit.cost + Unit.sale.price
f2 <- lm(M2, data = retail.df)
summary(f2)
## 
## Call:
## lm(formula = M2, data = retail.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -689795   -4859   -1966    4932  621548 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3.582e+03  2.435e+02  14.714   <2e-16 ***
## Revenue          4.009e-01  6.026e-04 665.249   <2e-16 ***
## Quantity        -2.611e-01  2.634e-02  -9.912   <2e-16 ***
## Unit.cost       -7.684e+02  9.672e+00 -79.447   <2e-16 ***
## Unit.sale.price  4.377e+02  5.432e+00  80.590   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29370 on 24738 degrees of freedom
##   (59929 observations deleted due to missingness)
## Multiple R-squared:  0.9655, Adjusted R-squared:  0.9655 
## F-statistic: 1.731e+05 on 4 and 24738 DF,  p-value: < 2.2e-16
library(leaps)
leap2 <- regsubsets(M2, data = retail.df, nbest=1)
# summary(leap2)
plot(leap2, scale="adjr2")

# the Adjusted R Squared for Model 2 is less than Model 1
summary(f1)$adj.r.squared
## [1] 0.966345
summary(f2)$adj.r.squared
## [1] 0.9655065
# the AIC for Model 2 is less than Model 1
AIC(f1)
## [1] 578736.4
AIC(f2)
## [1] 579318.4

Thus, Model 1 is our ‘best’ ordinary least squares model.
Model 1 predicts the Gross.profit as a function of the following explanatory variables: Year,Product.line,Product.type,Order.method.type,Revenue,Quantity,Unit.cost,Unit.sale.price.