Monthly Sales Analysis of JSW Steel Coated Products Ltd.

The dataset consists of monthly sales data of a steel company with details on source plant, product type, GST levied, invoice number, net weight, buyer details, sale region etc.

The analysis is aimed understanding patterns between sale regions, product types, total sales etc. to improve profits and overcome setbacks.

1. Reading Data into R.

setwd("~/R")
sales.df <- read.csv(paste("company_sales_nov.csv",sep=""))
attach(sales.df)

2. Visualizing length and breadth of dataset.

dim(sales.df)
## [1] 48121    20
colnames(sales.df)
##  [1] "Sold.to.Party.No"    "Sold.to.Party.Name"  "Sold.to.Party.City" 
##  [4] "Sold.to.Party.State" "Dc.description"      "Source.Plant"       
##  [7] "Sales.District"      "Shipment.Date"       "G.L.Description"    
## [10] "ODN.No"              "Net.Wt"              "Basic.Price"        
## [13] "Basic.value"         "EW1.Freight"         "Total.value"        
## [16] "Integratd.GST.."     "Integratd.GST"       "Total.Value"        
## [19] "Producttyp"          "So.Description"
str(sales.df)
## 'data.frame':    48121 obs. of  20 variables:
##  $ Sold.to.Party.No   : int  40024488 40007108 40007108 40007108 40007108 40007108 40007108 40007108 40035511 40035411 ...
##  $ Sold.to.Party.Name : Factor w/ 905 levels "?SteelGate GmbH",..: 863 437 437 437 437 437 437 437 356 763 ...
##  $ Sold.to.Party.City : Factor w/ 442 levels "","4705-564 Braga",..: 383 319 319 319 319 319 319 319 431 124 ...
##  $ Sold.to.Party.State: Factor w/ 31 levels "Andaman und Nico.In.",..: 10 19 19 19 19 19 19 19 11 12 ...
##  $ Dc.description     : Factor w/ 7 levels "Auction","Job Work",..: 3 5 5 5 5 5 5 5 1 5 ...
##  $ Source.Plant       : int  1013 1013 1013 1013 1013 1013 1013 1013 1013 1013 ...
##  $ Sales.District     : Factor w/ 5 levels "CENTRE","EAST",..: 5 5 5 5 5 5 5 5 5 3 ...
##  $ Shipment.Date      : Factor w/ 47 levels "","10/1/2017",..: 26 26 26 26 26 26 26 26 27 27 ...
##  $ G.L.Description    : Factor w/ 5 levels "","DOMESTIC SALES",..: 2 2 2 2 2 2 2 2 2 5 ...
##  $ ODN.No             : Factor w/ 13961 levels "KL2700015898",..: 10785 10786 10786 10786 10786 10786 10786 10787 10788 10789 ...
##  $ Net.Wt             : num  4.46 3.39 4.25 3.94 3.93 ...
##  $ Basic.Price        : num  65443 56325 55625 55625 55625 ...
##  $ Basic.value        : num  292203 190942 236684 219163 218606 ...
##  $ EW1.Freight        : num  9408 2119 2659 2463 2456 ...
##  $ Total.value        : num  301611 193061 239343 221626 221062 ...
##  $ Integratd.GST..    : int  18 0 0 0 0 0 0 0 18 18 ...
##  $ Integratd.GST      : int  54290 0 0 0 0 0 0 0 110102 53204 ...
##  $ Total.Value        : num  355901 227811 282425 261518 260854 ...
##  $ Producttyp         : Factor w/ 7 levels "Coil","Corrugated Sheet",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ So.Description     : Factor w/ 2 levels "JSW Coated Dom Sales",..: 1 1 1 1 1 1 1 1 1 1 ...

3. Creating descriptive statistics of each variable.

library(psych)
describe(sales.df)
##                      vars     n        mean         sd      median
## Sold.to.Party.No        1 48121 38681805.49 7196640.68 40022812.00
## Sold.to.Party.Name*     2 48121      474.73     264.00      460.00
## Sold.to.Party.City*     3 48121      215.18     125.10      206.00
## Sold.to.Party.State*    4 48121       16.19       7.30       17.00
## Dc.description*         5 48121        4.56       1.88        5.00
## Source.Plant            6 48121     1014.55       2.11     1013.00
## Sales.District*         7 48121        4.37       0.95        5.00
## Shipment.Date*          8 48121       29.76      10.40       31.00
## G.L.Description*        9 48121        2.59       1.53        2.00
## ODN.No*                10 48121     7063.90    3974.53     6885.00
## Net.Wt                 11 48121        5.23      40.99        3.59
## Basic.Price            12 48121    56008.80   12325.29    55555.20
## Basic.value            13 48121   241089.79  247974.52   208936.42
## EW1.Freight            14 48121     3439.18    6036.95        0.00
## Total.value            15 48121   177530.98  258134.87   146254.00
## Integratd.GST..        16 48121        8.75       9.00        0.00
## Integratd.GST          17 48121    18770.19   27566.57        0.00
## Total.Value            18 48121   276742.91  289073.58   231208.00
## Producttyp*            19 48121        2.06       1.92        1.00
## So.Description*        20 48121        1.24       0.43        1.00
##                          trimmed       mad     min      max       range
## Sold.to.Party.No     40020465.44  18553.26 1013.00 40036770 40035757.00
## Sold.to.Party.Name*       477.99    343.96    1.00      905      904.00
## Sold.to.Party.City*       215.41    148.26    1.00      442      441.00
## Sold.to.Party.State*       15.89      8.90    1.00       31       30.00
## Dc.description*             4.66      2.97    1.00        7        6.00
## Source.Plant             1014.31      0.00 1013.00     1018        5.00
## Sales.District*             4.56      0.00    1.00        5        4.00
## Shipment.Date*             30.34     10.38    1.00       47       46.00
## G.L.Description*            2.49      1.48    1.00        5        4.00
## ODN.No*                  7084.04   5042.32    1.00    13961    13960.00
## Net.Wt                      3.61      1.64    0.02     5000     4999.98
## Basic.Price             55976.68   9524.62    2.37   231700   231697.63
## Basic.value            204817.99  95926.33    7.00  5901399  5901392.00
## EW1.Freight              2163.41      0.00    0.00    89429    89429.00
## Total.value            137184.90 159201.59    0.00  5901399  5901399.00
## Integratd.GST..             8.69      0.00    0.00       18       18.00
## Integratd.GST           14858.96      0.00    0.00   744770   744770.00
## Total.Value            234543.24 115708.03    8.00  6963651  6963643.00
## Producttyp*                 1.58      0.00    1.00        7        6.00
## So.Description*             1.18      0.00    1.00        2        1.00
##                       skew kurtosis       se
## Sold.to.Party.No     -5.19    24.92 32806.70
## Sold.to.Party.Name*  -0.02    -1.23     1.20
## Sold.to.Party.City*   0.00    -1.13     0.57
## Sold.to.Party.State*  0.24    -0.85     0.03
## Dc.description*      -0.14    -1.01     0.01
## Source.Plant          0.96    -0.94     0.01
## Sales.District*      -1.52     1.62     0.00
## Shipment.Date*       -0.62     0.33     0.05
## G.L.Description*      0.75    -1.05     0.01
## ODN.No*               0.00    -1.13    18.12
## Net.Wt               94.18  9634.80     0.19
## Basic.Price           2.06    35.38    56.19
## Basic.value          10.26   198.13  1130.42
## EW1.Freight           3.10    21.33    27.52
## Total.value           9.50   177.36  1176.74
## Integratd.GST..       0.06    -2.00     0.04
## Integratd.GST         7.13   158.77   125.67
## Total.Value          10.65   209.46  1317.77
## Producttyp*           1.84     2.00     0.01
## So.Description*       1.21    -0.54     0.00

4. 1-way Contingency tables.

#Based on Domestic and International Sales.
xtabs(~So.Description)
## So.Description
## JSW Coated Dom Sales JSW Coated Exp Sales 
##                36506                11615

Domestic sales are more than three times that of Exports for the company.

#Based on Product type.
xtabs(~Producttyp)
## Producttyp
##             Coil Corrugated Sheet           Others       Producttyp 
##            31707             4501             5857                1 
##    Profile Sheet      Rectangular            Sheet 
##              664              175             5216

The company predominantly produces coils compared to other products.

#Based on end-use.
xtabs(~Dc.description)
## Dc.description
##           Auction          Job Work               OEM            Others 
##              3893                94             15656               292 
##            Retail SEZ/Deemed Export    Stock Transfer 
##             14932                29             13225

The company acts as an OEM producing finished equipment, majorly.

#Based on region.
xtabs(~Sales.District)
## Sales.District
## CENTRE   EAST  NORTH  SOUTH   WEST 
##    641   2380   5135  10108  29857

Most sales occur towards the western regions in the country, much greater than the sum of all other regions.

#Based on Source plant.
xtabs(~Source.Plant)
## Source.Plant
##  1013  1014  1018 
## 24450 10967 12704

The source plant, 1013, in Vasind produces more steel equipment than the other two plants put together.

5. Aggregating total sales,in terms of value (in crores):

aggregate(Total.Value/10000000,by=list(Dc.description),sum)
##             Group.1           x
## 1           Auction  87.4387746
## 2          Job Work   0.6697823
## 3               OEM 314.1363746
## 4            Others  35.6956470
## 5            Retail 519.0652377
## 6 SEZ/Deemed Export   0.3287486
## 7    Stock Transfer 374.3799894
aggregate(Total.value/10000000,by=list(Producttyp),sum)
##            Group.1          x
## 1             Coil 548.970450
## 2 Corrugated Sheet 102.936574
## 3           Others  85.793591
## 4       Producttyp   0.000000
## 5    Profile Sheet   7.055564
## 6      Rectangular   2.577263
## 7            Sheet 106.963381
aggregate(Total.value/10000000,by=list(Sales.District),sum)
##   Group.1        x
## 1  CENTRE  10.6484
## 2    EAST  47.7538
## 3   NORTH 101.6898
## 4   SOUTH 230.1184
## 5    WEST 464.0864
aggregate(Total.value/10000000,by=list(Source.Plant),sum)
##   Group.1        x
## 1    1013 448.7490
## 2    1014 129.6502
## 3    1018 275.8976

We can infer that each of these: WEST sales region, Retail products, Vasind plant and Coils contribute majorly towards Total Sales Value.

6. 2-way Contingency tables.

#Based on region and end-use.
xtabs(~Sales.District+Dc.description)
##               Dc.description
## Sales.District Auction Job Work   OEM Others Retail SEZ/Deemed Export
##         CENTRE     362        0   152      9    118                 0
##         EAST        19        0   396      0   1965                 0
##         NORTH     1019        0  1614      3   1889                 0
##         SOUTH      250        0  4261      0   4582                29
##         WEST      2243       94  9233    280   6378                 0
##               Dc.description
## Sales.District Stock Transfer
##         CENTRE              0
##         EAST                0
##         NORTH             610
##         SOUTH             986
##         WEST            11629
#Based on region and Product Type.
 xtabs(~Sales.District+Producttyp)
##               Producttyp
## Sales.District  Coil Corrugated Sheet Others Producttyp Profile Sheet
##         CENTRE   489               60     44          0            12
##         EAST    2270                9     92          0             0
##         NORTH   4014              192    401          0            45
##         SOUTH   8293              338    843          0           160
##         WEST   16641             3902   4477          1           447
##               Producttyp
## Sales.District Rectangular Sheet
##         CENTRE           0    36
##         EAST             0     9
##         NORTH           19   464
##         SOUTH           40   434
##         WEST           116  4273
#Based on plant and end-use.
 xtabs(~Source.Plant+Dc.description)
##             Dc.description
## Source.Plant Auction Job Work   OEM Others Retail SEZ/Deemed Export
##         1013    1345        0 10158    109   6643                29
##         1014    1684        0  1310     63   2939                 0
##         1018     864       94  4188    120   5350                 0
##             Dc.description
## Source.Plant Stock Transfer
##         1013           6166
##         1014           4971
##         1018           2088
#Based on plant and Product Type.
 xtabs(~Source.Plant+Producttyp)
##             Producttyp
## Source.Plant  Coil Corrugated Sheet Others Producttyp Profile Sheet
##         1013  8036             4501   5857          1           664
##         1014 10967                0      0          0             0
##         1018 12704                0      0          0             0
##             Producttyp
## Source.Plant Rectangular Sheet
##         1013         175  5216
##         1014           0     0
##         1018           0     0

As we can see from the contingency tables, some source plants have null productions for certain equipments. Also, some sales regions have zero sales taking place for certain product types. These areas can have scope for improvement.

7. Boxplots for variables: Total.Value, Net.Wt, Basic..

boxplot(Total.Value,main="Boxplot for Total Sales Value",horizontal = TRUE)

boxplot(Net.Wt,main="Net weight of sold product",horizontal = TRUE)

boxplot(Basic.Price,main="Basic price of sold product",horizontal = TRUE)

8. Histograms for net weight of product and total sales value.

hist(Net.Wt,breaks = 10)

hist(Total.Value,breaks = 30)

hist(Basic.Price,breaks = 10)

9. Plots.

plot(Total.Value~Source.Plant)

We see that the plant at Vasind(1013), produces highly valued end products.

plot(Total.Value~Sales.District)

As we can see, the WEST region contributes to a good no. of outliers for the total sales value.

plot(Total.Value~Producttyp)

plot(Total.Value~Dc.description)

10. Correlation matrix for Monthly Sales.

col = c("Total.Value","Net.Wt","Basic.value","EW1.Freight","Integratd.GST")
tc <- cor(sales.df[,col])
round(tc,2)
##               Total.Value Net.Wt Basic.value EW1.Freight Integratd.GST
## Total.Value          1.00   0.08        1.00        0.09          0.26
## Net.Wt               0.08   1.00        0.08        0.00          0.00
## Basic.value          1.00   0.08        1.00        0.04          0.22
## EW1.Freight          0.09   0.00        0.04        1.00          0.57
## Integratd.GST        0.26   0.00        0.22        0.57          1.00

11. Visualizing correlations using corrgram.

library(corrplot)
## Warning: package 'corrplot' was built under R version 3.4.3
## corrplot 0.84 loaded
corrplot(tc,method = "circle")

We can see that the Total.Value is positively correlated to the Net.Wt, Basic.value, Freight, GST, which is a logical consequence.

12a. Testing impact of Sales Region on number of Domestic/ International Sales.

Running chi-square test,

p1 <- xtabs(~So.Description+Sales.District)
addmargins(p1)
##                       Sales.District
## So.Description         CENTRE  EAST NORTH SOUTH  WEST   Sum
##   JSW Coated Dom Sales    478  1920  4187  8440 21481 36506
##   JSW Coated Exp Sales    163   460   948  1668  8376 11615
##   Sum                     641  2380  5135 10108 29857 48121
chisq.test(p1)
## 
##  Pearson's Chi-squared test
## 
## data:  p1
## X-squared = 692.92, df = 4, p-value < 2.2e-16

Since, p-value<0.01, a dependency is indicated between above variables.

12b. Testing impact of Source Plant on type of Product produce based on End-Use.

Running chi-square test,

p1 <- xtabs(~Source.Plant+Dc.description)
addmargins(p1)
##             Dc.description
## Source.Plant Auction Job Work   OEM Others Retail SEZ/Deemed Export
##         1013    1345        0 10158    109   6643                29
##         1014    1684        0  1310     63   2939                 0
##         1018     864       94  4188    120   5350                 0
##         Sum     3893       94 15656    292  14932                29
##             Dc.description
## Source.Plant Stock Transfer   Sum
##         1013           6166 24450
##         1014           4971 10967
##         1018           2088 12704
##         Sum           13225 48121
chisq.test(p1)
## 
##  Pearson's Chi-squared test
## 
## data:  p1
## X-squared = 5872.8, df = 12, p-value < 2.2e-16

Since, p-value<0.01, a dependency is indicated between above variables.

12c. Testing impact of Source Plant on Sale Region.

Running chi-square test,

p1 <- xtabs(~Source.Plant+Sales.District)
addmargins(p1)
##             Sales.District
## Source.Plant CENTRE  EAST NORTH SOUTH  WEST   Sum
##         1013    237   238  1626  2860 19489 24450
##         1014    302   778  1079  1442  7366 10967
##         1018    102  1364  2430  5806  3002 12704
##         Sum     641  2380  5135 10108 29857 48121
chisq.test(p1)
## 
##  Pearson's Chi-squared test
## 
## data:  p1
## X-squared = 12507, df = 8, p-value < 2.2e-16

Since, p-value<0.01, a dependency is indicated between above variables.

13. Formulating a regression model.

Total.Value = b0 + b1.(Basic.Price) + b2.(Integratd.GST..) + b3.(Net.Wt)

Consider this model for two of the product types:

  • Sheets:
sheet.df <- sales.df[(Producttyp=='Sheet'),]

fit1<-lm(sheet.df$Total.Value~sheet.df$Basic.Price+sheet.df$Integratd.GST..+sheet.df$Net.Wt)
summary(fit1)
## 
## Call:
## lm(formula = sheet.df$Total.Value ~ sheet.df$Basic.Price + sheet.df$Integratd.GST.. + 
##     sheet.df$Net.Wt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2231295   -19168      706    37374  4217342 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -3.730e+05  8.659e+03 -43.073   <2e-16 ***
## sheet.df$Basic.Price      7.322e+00  1.468e-01  49.895   <2e-16 ***
## sheet.df$Integratd.GST.. -4.386e+02  1.905e+02  -2.302   0.0214 *  
## sheet.df$Net.Wt           5.105e+04  2.631e+02 194.042   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 111800 on 5212 degrees of freedom
## Multiple R-squared:  0.8847, Adjusted R-squared:  0.8846 
## F-statistic: 1.333e+04 on 3 and 5212 DF,  p-value: < 2.2e-16
  • Corrugated Sheets:
corrsheet.df <- sales.df[(Producttyp=='Corrugated Sheet'),]

fit2<-lm(corrsheet.df$Total.Value~corrsheet.df$Basic.Price+corrsheet.df$Integratd.GST..+corrsheet.df$Net.Wt)
summary(fit2)
## 
## Call:
## lm(formula = corrsheet.df$Total.Value ~ corrsheet.df$Basic.Price + 
##     corrsheet.df$Integratd.GST.. + corrsheet.df$Net.Wt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -5410654   -57190    11276    93804  3139865 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -7.226e+05  1.779e+04 -40.621  < 2e-16 ***
## corrsheet.df$Basic.Price      1.440e+01  2.939e-01  48.999  < 2e-16 ***
## corrsheet.df$Integratd.GST.. -3.438e+03  4.426e+02  -7.766 9.96e-15 ***
## corrsheet.df$Net.Wt           4.748e+04  4.953e+02  95.872  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 238000 on 4497 degrees of freedom
## Multiple R-squared:  0.6883, Adjusted R-squared:  0.6881 
## F-statistic:  3310 on 3 and 4497 DF,  p-value: < 2.2e-16

The p-values are less than 0.05, the R-squared values account for 88.47% and 68.83% of the variation in response data. Also, all the parameters are significant.

From above, we can see the highest increase in Total.Value(278.4 and 256.6 respectively) per unit positive change in Net.Wt, in comaparison to other paramters. Also, there is an increase in Total.Value(7.3 and 1.44 respectively) per unit positive change in Basic.Price.

So, we can increase Total Value by: * increasing production to increase net weight. * increasing basic price. Obviously, a presence of strong demand for the company’s products is a necessity, for the results to show their effects.