The dataset consists of monthly sales data of a steel company with details on source plant, product type, GST levied, invoice number, net weight, buyer details, sale region etc.
The analysis is aimed understanding patterns between sale regions, product types, total sales etc. to improve profits and overcome setbacks.
setwd("~/R")
sales.df <- read.csv(paste("company_sales_nov.csv",sep=""))
attach(sales.df)
dim(sales.df)
## [1] 48121 20
colnames(sales.df)
## [1] "Sold.to.Party.No" "Sold.to.Party.Name" "Sold.to.Party.City"
## [4] "Sold.to.Party.State" "Dc.description" "Source.Plant"
## [7] "Sales.District" "Shipment.Date" "G.L.Description"
## [10] "ODN.No" "Net.Wt" "Basic.Price"
## [13] "Basic.value" "EW1.Freight" "Total.value"
## [16] "Integratd.GST.." "Integratd.GST" "Total.Value"
## [19] "Producttyp" "So.Description"
str(sales.df)
## 'data.frame': 48121 obs. of 20 variables:
## $ Sold.to.Party.No : int 40024488 40007108 40007108 40007108 40007108 40007108 40007108 40007108 40035511 40035411 ...
## $ Sold.to.Party.Name : Factor w/ 905 levels "?SteelGate GmbH",..: 863 437 437 437 437 437 437 437 356 763 ...
## $ Sold.to.Party.City : Factor w/ 442 levels "","4705-564 Braga",..: 383 319 319 319 319 319 319 319 431 124 ...
## $ Sold.to.Party.State: Factor w/ 31 levels "Andaman und Nico.In.",..: 10 19 19 19 19 19 19 19 11 12 ...
## $ Dc.description : Factor w/ 7 levels "Auction","Job Work",..: 3 5 5 5 5 5 5 5 1 5 ...
## $ Source.Plant : int 1013 1013 1013 1013 1013 1013 1013 1013 1013 1013 ...
## $ Sales.District : Factor w/ 5 levels "CENTRE","EAST",..: 5 5 5 5 5 5 5 5 5 3 ...
## $ Shipment.Date : Factor w/ 47 levels "","10/1/2017",..: 26 26 26 26 26 26 26 26 27 27 ...
## $ G.L.Description : Factor w/ 5 levels "","DOMESTIC SALES",..: 2 2 2 2 2 2 2 2 2 5 ...
## $ ODN.No : Factor w/ 13961 levels "KL2700015898",..: 10785 10786 10786 10786 10786 10786 10786 10787 10788 10789 ...
## $ Net.Wt : num 4.46 3.39 4.25 3.94 3.93 ...
## $ Basic.Price : num 65443 56325 55625 55625 55625 ...
## $ Basic.value : num 292203 190942 236684 219163 218606 ...
## $ EW1.Freight : num 9408 2119 2659 2463 2456 ...
## $ Total.value : num 301611 193061 239343 221626 221062 ...
## $ Integratd.GST.. : int 18 0 0 0 0 0 0 0 18 18 ...
## $ Integratd.GST : int 54290 0 0 0 0 0 0 0 110102 53204 ...
## $ Total.Value : num 355901 227811 282425 261518 260854 ...
## $ Producttyp : Factor w/ 7 levels "Coil","Corrugated Sheet",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ So.Description : Factor w/ 2 levels "JSW Coated Dom Sales",..: 1 1 1 1 1 1 1 1 1 1 ...
library(psych)
describe(sales.df)
## vars n mean sd median
## Sold.to.Party.No 1 48121 38681805.49 7196640.68 40022812.00
## Sold.to.Party.Name* 2 48121 474.73 264.00 460.00
## Sold.to.Party.City* 3 48121 215.18 125.10 206.00
## Sold.to.Party.State* 4 48121 16.19 7.30 17.00
## Dc.description* 5 48121 4.56 1.88 5.00
## Source.Plant 6 48121 1014.55 2.11 1013.00
## Sales.District* 7 48121 4.37 0.95 5.00
## Shipment.Date* 8 48121 29.76 10.40 31.00
## G.L.Description* 9 48121 2.59 1.53 2.00
## ODN.No* 10 48121 7063.90 3974.53 6885.00
## Net.Wt 11 48121 5.23 40.99 3.59
## Basic.Price 12 48121 56008.80 12325.29 55555.20
## Basic.value 13 48121 241089.79 247974.52 208936.42
## EW1.Freight 14 48121 3439.18 6036.95 0.00
## Total.value 15 48121 177530.98 258134.87 146254.00
## Integratd.GST.. 16 48121 8.75 9.00 0.00
## Integratd.GST 17 48121 18770.19 27566.57 0.00
## Total.Value 18 48121 276742.91 289073.58 231208.00
## Producttyp* 19 48121 2.06 1.92 1.00
## So.Description* 20 48121 1.24 0.43 1.00
## trimmed mad min max range
## Sold.to.Party.No 40020465.44 18553.26 1013.00 40036770 40035757.00
## Sold.to.Party.Name* 477.99 343.96 1.00 905 904.00
## Sold.to.Party.City* 215.41 148.26 1.00 442 441.00
## Sold.to.Party.State* 15.89 8.90 1.00 31 30.00
## Dc.description* 4.66 2.97 1.00 7 6.00
## Source.Plant 1014.31 0.00 1013.00 1018 5.00
## Sales.District* 4.56 0.00 1.00 5 4.00
## Shipment.Date* 30.34 10.38 1.00 47 46.00
## G.L.Description* 2.49 1.48 1.00 5 4.00
## ODN.No* 7084.04 5042.32 1.00 13961 13960.00
## Net.Wt 3.61 1.64 0.02 5000 4999.98
## Basic.Price 55976.68 9524.62 2.37 231700 231697.63
## Basic.value 204817.99 95926.33 7.00 5901399 5901392.00
## EW1.Freight 2163.41 0.00 0.00 89429 89429.00
## Total.value 137184.90 159201.59 0.00 5901399 5901399.00
## Integratd.GST.. 8.69 0.00 0.00 18 18.00
## Integratd.GST 14858.96 0.00 0.00 744770 744770.00
## Total.Value 234543.24 115708.03 8.00 6963651 6963643.00
## Producttyp* 1.58 0.00 1.00 7 6.00
## So.Description* 1.18 0.00 1.00 2 1.00
## skew kurtosis se
## Sold.to.Party.No -5.19 24.92 32806.70
## Sold.to.Party.Name* -0.02 -1.23 1.20
## Sold.to.Party.City* 0.00 -1.13 0.57
## Sold.to.Party.State* 0.24 -0.85 0.03
## Dc.description* -0.14 -1.01 0.01
## Source.Plant 0.96 -0.94 0.01
## Sales.District* -1.52 1.62 0.00
## Shipment.Date* -0.62 0.33 0.05
## G.L.Description* 0.75 -1.05 0.01
## ODN.No* 0.00 -1.13 18.12
## Net.Wt 94.18 9634.80 0.19
## Basic.Price 2.06 35.38 56.19
## Basic.value 10.26 198.13 1130.42
## EW1.Freight 3.10 21.33 27.52
## Total.value 9.50 177.36 1176.74
## Integratd.GST.. 0.06 -2.00 0.04
## Integratd.GST 7.13 158.77 125.67
## Total.Value 10.65 209.46 1317.77
## Producttyp* 1.84 2.00 0.01
## So.Description* 1.21 -0.54 0.00
#Based on Domestic and International Sales.
xtabs(~So.Description)
## So.Description
## JSW Coated Dom Sales JSW Coated Exp Sales
## 36506 11615
Domestic sales are more than three times that of Exports for the company.
#Based on Product type.
xtabs(~Producttyp)
## Producttyp
## Coil Corrugated Sheet Others Producttyp
## 31707 4501 5857 1
## Profile Sheet Rectangular Sheet
## 664 175 5216
The company predominantly produces coils compared to other products.
#Based on end-use.
xtabs(~Dc.description)
## Dc.description
## Auction Job Work OEM Others
## 3893 94 15656 292
## Retail SEZ/Deemed Export Stock Transfer
## 14932 29 13225
The company acts as an OEM producing finished equipment, majorly.
#Based on region.
xtabs(~Sales.District)
## Sales.District
## CENTRE EAST NORTH SOUTH WEST
## 641 2380 5135 10108 29857
Most sales occur towards the western regions in the country, much greater than the sum of all other regions.
#Based on Source plant.
xtabs(~Source.Plant)
## Source.Plant
## 1013 1014 1018
## 24450 10967 12704
The source plant, 1013, in Vasind produces more steel equipment than the other two plants put together.
aggregate(Total.Value/10000000,by=list(Dc.description),sum)
## Group.1 x
## 1 Auction 87.4387746
## 2 Job Work 0.6697823
## 3 OEM 314.1363746
## 4 Others 35.6956470
## 5 Retail 519.0652377
## 6 SEZ/Deemed Export 0.3287486
## 7 Stock Transfer 374.3799894
aggregate(Total.value/10000000,by=list(Producttyp),sum)
## Group.1 x
## 1 Coil 548.970450
## 2 Corrugated Sheet 102.936574
## 3 Others 85.793591
## 4 Producttyp 0.000000
## 5 Profile Sheet 7.055564
## 6 Rectangular 2.577263
## 7 Sheet 106.963381
aggregate(Total.value/10000000,by=list(Sales.District),sum)
## Group.1 x
## 1 CENTRE 10.6484
## 2 EAST 47.7538
## 3 NORTH 101.6898
## 4 SOUTH 230.1184
## 5 WEST 464.0864
aggregate(Total.value/10000000,by=list(Source.Plant),sum)
## Group.1 x
## 1 1013 448.7490
## 2 1014 129.6502
## 3 1018 275.8976
We can infer that each of these: WEST sales region, Retail products, Vasind plant and Coils contribute majorly towards Total Sales Value.
#Based on region and end-use.
xtabs(~Sales.District+Dc.description)
## Dc.description
## Sales.District Auction Job Work OEM Others Retail SEZ/Deemed Export
## CENTRE 362 0 152 9 118 0
## EAST 19 0 396 0 1965 0
## NORTH 1019 0 1614 3 1889 0
## SOUTH 250 0 4261 0 4582 29
## WEST 2243 94 9233 280 6378 0
## Dc.description
## Sales.District Stock Transfer
## CENTRE 0
## EAST 0
## NORTH 610
## SOUTH 986
## WEST 11629
#Based on region and Product Type.
xtabs(~Sales.District+Producttyp)
## Producttyp
## Sales.District Coil Corrugated Sheet Others Producttyp Profile Sheet
## CENTRE 489 60 44 0 12
## EAST 2270 9 92 0 0
## NORTH 4014 192 401 0 45
## SOUTH 8293 338 843 0 160
## WEST 16641 3902 4477 1 447
## Producttyp
## Sales.District Rectangular Sheet
## CENTRE 0 36
## EAST 0 9
## NORTH 19 464
## SOUTH 40 434
## WEST 116 4273
#Based on plant and end-use.
xtabs(~Source.Plant+Dc.description)
## Dc.description
## Source.Plant Auction Job Work OEM Others Retail SEZ/Deemed Export
## 1013 1345 0 10158 109 6643 29
## 1014 1684 0 1310 63 2939 0
## 1018 864 94 4188 120 5350 0
## Dc.description
## Source.Plant Stock Transfer
## 1013 6166
## 1014 4971
## 1018 2088
#Based on plant and Product Type.
xtabs(~Source.Plant+Producttyp)
## Producttyp
## Source.Plant Coil Corrugated Sheet Others Producttyp Profile Sheet
## 1013 8036 4501 5857 1 664
## 1014 10967 0 0 0 0
## 1018 12704 0 0 0 0
## Producttyp
## Source.Plant Rectangular Sheet
## 1013 175 5216
## 1014 0 0
## 1018 0 0
As we can see from the contingency tables, some source plants have null productions for certain equipments. Also, some sales regions have zero sales taking place for certain product types. These areas can have scope for improvement.
boxplot(Total.Value,main="Boxplot for Total Sales Value",horizontal = TRUE)
boxplot(Net.Wt,main="Net weight of sold product",horizontal = TRUE)
boxplot(Basic.Price,main="Basic price of sold product",horizontal = TRUE)
hist(Net.Wt,breaks = 10)
hist(Total.Value,breaks = 30)
hist(Basic.Price,breaks = 10)
plot(Total.Value~Source.Plant)
We see that the plant at Vasind(1013), produces highly valued end products.
plot(Total.Value~Sales.District)
As we can see, the WEST region contributes to a good no. of outliers for the total sales value.
plot(Total.Value~Producttyp)
plot(Total.Value~Dc.description)
col = c("Total.Value","Net.Wt","Basic.value","EW1.Freight","Integratd.GST")
tc <- cor(sales.df[,col])
round(tc,2)
## Total.Value Net.Wt Basic.value EW1.Freight Integratd.GST
## Total.Value 1.00 0.08 1.00 0.09 0.26
## Net.Wt 0.08 1.00 0.08 0.00 0.00
## Basic.value 1.00 0.08 1.00 0.04 0.22
## EW1.Freight 0.09 0.00 0.04 1.00 0.57
## Integratd.GST 0.26 0.00 0.22 0.57 1.00
library(corrplot)
## Warning: package 'corrplot' was built under R version 3.4.3
## corrplot 0.84 loaded
corrplot(tc,method = "circle")
We can see that the Total.Value is positively correlated to the Net.Wt, Basic.value, Freight, GST, which is a logical consequence.
Running chi-square test,
p1 <- xtabs(~So.Description+Sales.District)
addmargins(p1)
## Sales.District
## So.Description CENTRE EAST NORTH SOUTH WEST Sum
## JSW Coated Dom Sales 478 1920 4187 8440 21481 36506
## JSW Coated Exp Sales 163 460 948 1668 8376 11615
## Sum 641 2380 5135 10108 29857 48121
chisq.test(p1)
##
## Pearson's Chi-squared test
##
## data: p1
## X-squared = 692.92, df = 4, p-value < 2.2e-16
Since, p-value<0.01, a dependency is indicated between above variables.
Running chi-square test,
p1 <- xtabs(~Source.Plant+Dc.description)
addmargins(p1)
## Dc.description
## Source.Plant Auction Job Work OEM Others Retail SEZ/Deemed Export
## 1013 1345 0 10158 109 6643 29
## 1014 1684 0 1310 63 2939 0
## 1018 864 94 4188 120 5350 0
## Sum 3893 94 15656 292 14932 29
## Dc.description
## Source.Plant Stock Transfer Sum
## 1013 6166 24450
## 1014 4971 10967
## 1018 2088 12704
## Sum 13225 48121
chisq.test(p1)
##
## Pearson's Chi-squared test
##
## data: p1
## X-squared = 5872.8, df = 12, p-value < 2.2e-16
Since, p-value<0.01, a dependency is indicated between above variables.
Running chi-square test,
p1 <- xtabs(~Source.Plant+Sales.District)
addmargins(p1)
## Sales.District
## Source.Plant CENTRE EAST NORTH SOUTH WEST Sum
## 1013 237 238 1626 2860 19489 24450
## 1014 302 778 1079 1442 7366 10967
## 1018 102 1364 2430 5806 3002 12704
## Sum 641 2380 5135 10108 29857 48121
chisq.test(p1)
##
## Pearson's Chi-squared test
##
## data: p1
## X-squared = 12507, df = 8, p-value < 2.2e-16
Since, p-value<0.01, a dependency is indicated between above variables.
Total.Value = b0 + b1.(Basic.Price) + b2.(Integratd.GST..) + b3.(Net.Wt)
sheet.df <- sales.df[(Producttyp=='Sheet'),]
fit1<-lm(sheet.df$Total.Value~sheet.df$Basic.Price+sheet.df$Integratd.GST..+sheet.df$Net.Wt)
summary(fit1)
##
## Call:
## lm(formula = sheet.df$Total.Value ~ sheet.df$Basic.Price + sheet.df$Integratd.GST.. +
## sheet.df$Net.Wt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2231295 -19168 706 37374 4217342
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.730e+05 8.659e+03 -43.073 <2e-16 ***
## sheet.df$Basic.Price 7.322e+00 1.468e-01 49.895 <2e-16 ***
## sheet.df$Integratd.GST.. -4.386e+02 1.905e+02 -2.302 0.0214 *
## sheet.df$Net.Wt 5.105e+04 2.631e+02 194.042 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 111800 on 5212 degrees of freedom
## Multiple R-squared: 0.8847, Adjusted R-squared: 0.8846
## F-statistic: 1.333e+04 on 3 and 5212 DF, p-value: < 2.2e-16
corrsheet.df <- sales.df[(Producttyp=='Corrugated Sheet'),]
fit2<-lm(corrsheet.df$Total.Value~corrsheet.df$Basic.Price+corrsheet.df$Integratd.GST..+corrsheet.df$Net.Wt)
summary(fit2)
##
## Call:
## lm(formula = corrsheet.df$Total.Value ~ corrsheet.df$Basic.Price +
## corrsheet.df$Integratd.GST.. + corrsheet.df$Net.Wt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5410654 -57190 11276 93804 3139865
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.226e+05 1.779e+04 -40.621 < 2e-16 ***
## corrsheet.df$Basic.Price 1.440e+01 2.939e-01 48.999 < 2e-16 ***
## corrsheet.df$Integratd.GST.. -3.438e+03 4.426e+02 -7.766 9.96e-15 ***
## corrsheet.df$Net.Wt 4.748e+04 4.953e+02 95.872 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 238000 on 4497 degrees of freedom
## Multiple R-squared: 0.6883, Adjusted R-squared: 0.6881
## F-statistic: 3310 on 3 and 4497 DF, p-value: < 2.2e-16
The p-values are less than 0.05, the R-squared values account for 88.47% and 68.83% of the variation in response data. Also, all the parameters are significant.
From above, we can see the highest increase in Total.Value(278.4 and 256.6 respectively) per unit positive change in Net.Wt, in comaparison to other paramters. Also, there is an increase in Total.Value(7.3 and 1.44 respectively) per unit positive change in Basic.Price.
So, we can increase Total Value by: * increasing production to increase net weight. * increasing basic price. Obviously, a presence of strong demand for the company’s products is a necessity, for the results to show their effects.