D3M Case Soup Pricing

Load Libraries

pacman::p_load(sjPlot,  ggeffects, rio, dplyr, ggplot2, DT,
               dplyr, corrplot,FactoMineR,  highcharter, lubridate, plotly,
              broom, stargazer, plm, GGally, explor, ggfortify, DescTools)

Case: Countercyclical Pricing

A large body of empirical literature in economics has documented that on average sellers set lower prices when the demand is high. Thus economic anomaly, i.e., a fall in the retail prices of products during periods of peak demand, is often referred to as countercyclical pricing.

The Canned Soup Industry

Packaged soups were a USD 5 billion market in 2015. Campbell Soup Company’s brand Campbell was the category leader with ~42% market share by value, with General Mills Progresso standing a distant second at ~15% of market share. The rest of the soup-scape had a tough competition between a multitude of private label and niche brands such as those from Unilever, Hain Celestial Group, Campbell Soup Company, Walmart, Heinz, ConAgra etc.

Soups are available in various formats such as shelf-stable, chilled, frozen, dehydrated, instant, UHT and are a highly seasonal category with demand peaking in winter months. In 2015, UHT soups posted the strongest growth rates, partly due to the premium positioning and partly due to the convenience factor (no can openers needed and reseal-able). This trend of strong growth rates in the UHT category is likely to continue till 2020.

Campbell Soup Company has three major product divisions - the simple meals division which consists largely of soups, the baked snacks division and the health beverage division. Progresso was started by Italian immigrants and was purchased by Pillsbury in 1988 and became a part of the General Mills family in 2001, when GM bought Pillsbury.

Scenario

You are hired as the Brand Manager for Progresso which employs a peculiar pricing strategy of significantly reducing prices in periods of high demand (i.e. Winters) and then raising the prices during off-peak months.

Let’s go ahead to import the data.

# Load Transaction
soup <- import("Countercyclical_Pricing.xlsx" , sheet ="soup_new")
soup$Date<-as.Date(soup$Date, "%m/%d/%Y" )

## Warning in as.POSIXlt.POSIXct(x, tz = tz): unknown timezone '%m/%d/%Y'

# Month Dummy
soup$month_dummy<-months(soup$Date)
soup$month_dummy = factor(soup$month_dummy, levels = month.name)

# Load Demographic
store_demo <- import("Countercyclical_Pricing.xlsx" , sheet ="store_demo")

# Merge with "soup" file
soup <- merge(soup, store_demo, by="IRI_KEY")

Let’s begin with some summary statistics:

summary(soup)

##     IRI_KEY             Date               Winter         
##  Min.   : 200039   Min.   :2001-06-01   Length:88409      
##  1st Qu.: 239034   1st Qu.:2002-11-01   Class :character  
##  Median : 265775   Median :2004-04-01   Mode  :character  
##  Mean   : 360391   Mean   :2004-03-17                     
##  3rd Qu.: 291530   3rd Qu.:2005-08-01                     
##  Max.   :8032406   Max.   :2006-12-01                     
##                                                           
##     Region          Volume.Campbell     Volume.Other     
##  Length:88409       Min.   :   46.42   Min.   :   0.075  
##  Class :character   1st Qu.: 1357.65   1st Qu.: 119.668  
##  Mode  :character   Median : 2219.27   Median : 283.020  
##                     Mean   : 2949.86   Mean   : 469.450  
##                     3rd Qu.: 3629.39   3rd Qu.: 607.454  
##                     Max.   :42134.66   Max.   :9627.755  
##                                                          
##    Volume.PL         Volume.Progresso   Price.Campbell    Price.Other     
##  Min.   :    0.656   Min.   :    1.16   Min.   :0.4177   Min.   : 0.3248  
##  1st Qu.:  332.067   1st Qu.:  260.79   1st Qu.:1.2802   1st Qu.: 1.7194  
##  Median :  593.048   Median :  658.52   Median :1.4180   Median : 1.9681  
##  Mean   :  846.418   Mean   : 1408.26   Mean   :1.4318   Mean   : 1.9670  
##  3rd Qu.: 1054.568   3rd Qu.: 1608.28   3rd Qu.:1.5723   3rd Qu.: 2.1971  
##  Max.   :13695.612   Max.   :53857.27   Max.   :2.5191   Max.   :26.4000  
##                                                                           
##     Price.PL      Price.Progresso  Total_Volume       month_dummy   
##  Min.   :0.2594   Min.   :0.670   Min.   :  115.2   December: 7984  
##  1st Qu.:0.9889   1st Qu.:1.381   1st Qu.: 2483.7   November: 7885  
##  Median :1.0953   Median :1.642   Median : 4186.4   July    : 7848  
##  Mean   :1.1182   Mean   :1.648   Mean   : 5674.0   August  : 7829  
##  3rd Qu.:1.2238   3rd Qu.:1.892   3rd Qu.: 7110.5   June    : 7825  
##  Max.   :3.1803   Max.   :2.962   Max.   :81870.1   October : 7822  
##                                                     (Other) :41216  
##       FIPS           NAME            State_Name           REGION         
##  Min.   : 1003   Length:88409       Length:88409       Length:88409      
##  1st Qu.:17031   Class :character   Class :character   Class :character  
##  Median :34005   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :29861                                                           
##  3rd Qu.:45013                                                           
##  Max.   :55139                                                           
##                                                                          
##     DIVISION       Zip_name            Zip_key        per_kids18    
##  Min.   :1.000   Length:88409       Min.   : 1020   Min.   :0.0200  
##  1st Qu.:3.000   Class :character   1st Qu.:21244   1st Qu.:0.2200  
##  Median :5.000   Mode  :character   Median :46062   Median :0.2500  
##  Mean   :5.084                      Mean   :49104   Mean   :0.2492  
##  3rd Qu.:7.000                      3rd Qu.:77504   3rd Qu.:0.2800  
##  Max.   :9.000                      Max.   :99223   Max.   :0.4200  
##                                                                     
##    per_kids9         per_old         per_white        per_black     
##  Min.   :0.0100   Min.   :0.0200   Min.   :0.0100   Min.   :0.0000  
##  1st Qu.:0.1200   1st Qu.:0.0900   1st Qu.:0.6900   1st Qu.:0.0200  
##  Median :0.1400   Median :0.1200   Median :0.8300   Median :0.0500  
##  Mean   :0.1379   Mean   :0.1225   Mean   :0.7779   Mean   :0.1051  
##  3rd Qu.:0.1500   3rd Qu.:0.1500   3rd Qu.:0.9200   3rd Qu.:0.1200  
##  Max.   :0.2400   Max.   :0.4400   Max.   :0.9900   Max.   :0.9600  
##                                                                     
##      hhsize       Per_college     per_hh_less20K     median_inc    
##  Min.   :1.240   Min.   :0.0300   Min.   :0.0200   Min.   : 12262  
##  1st Qu.:2.350   1st Qu.:0.1700   1st Qu.:0.1100   1st Qu.: 38085  
##  Median :2.550   Median :0.2400   Median :0.1700   Median : 46731  
##  Mean   :2.573   Mean   :0.2724   Mean   :0.1818   Mean   : 49981  
##  3rd Qu.:2.750   3rd Qu.:0.3500   3rd Qu.:0.2300   3rd Qu.: 58113  
##  Max.   :4.540   Max.   :0.7900   Max.   :0.6500   Max.   :185466  
##                                                                    
##    capita_inc      Share_Camp      Share_Other          Share_PL        
##  Min.   : 7411   Min.   :0.2048   Min.   :0.001769   Min.   :0.0001807  
##  1st Qu.:18651   1st Qu.:0.4645   1st Qu.:0.042262   1st Qu.:0.0980898  
##  Median :22764   Median :0.5529   Median :0.065623   Median :0.1459749  
##  Mean   :24536   Mean   :0.5406   Mean   :0.078985   Mean   :0.1555552  
##  3rd Qu.:27635   3rd Qu.:0.6316   3rd Qu.:0.104115   3rd Qu.:0.2054529  
##  Max.   :92940   Max.   :0.8890   Max.   :0.365949   Max.   :0.4584570  
##                                                                         
##    Share_prog       
##  Min.   :0.0004484  
##  1st Qu.:0.1242095  
##  Median :0.2058253  
##  Mean   :0.2248098  
##  3rd Qu.:0.3045534  
##  Max.   :0.6187502  
##

A closer look at the key variables of Volume and Price:

Desc(soup$Volume.Campbell, main = "Volume - Campbell", plotit = TRUE)

## ------------------------------------------------------------------------- 
## Volume - Campbell
## 
##        length           n         NAs      unique          0s        mean
##        88'409      88'409           0      88'339           0  2'949.8631
##                    100.0%        0.0%                    0.0%            
##                                                                          
##           .05         .10         .25      median         .75         .90
##      629.2092    852.6389  1'357.6492  2'219.2727  3'629.3864  5'653.8541
##                                                                          
##         range          sd       vcoef         mad         IQR        skew
##   42'088.2362  2'675.0062      0.9068  1'516.8702  2'271.7372      3.4488
##                                                                          
##       meanCI
##   2'932.2299
##   2'967.4962
##             
##          .95
##   7'519.3469
##             
##         kurt
##      20.2964
##             
## lowest : 46.4238, 57.9708, 69.3526, 72.3775, 72.4185
## highest: 38'960.4970, 40'190.9550, 40'435.9262, 41'720.8094, 42'134.6600

Desc(soup$Volume.Other, main = "Volume - Other", plotit = TRUE)

## ------------------------------------------------------------------------- 
## Volume - Other
## 
##        length          n        NAs     unique         0s         mean
##        88'409     88'409          0     85'312          0    469.45022
##                   100.0%       0.0%                  0.0%             
##                                                                       
##           .05        .10        .25     median        .75          .90
##      31.88108   52.59388  119.66790  283.02020  607.45380  1'122.78656
##                                                                       
##         range         sd      vcoef        mad        IQR         skew
##   9'627.68030  551.02982    1.17378  293.25754  487.78590      2.91962
##                                                                       
##        meanCI
##     465.81793
##     473.08251
##              
##           .95
##   1'549.16618
##              
##          kurt
##      14.70433
##              
## lowest : 0.075, 0.0938, 0.15 (2), 0.225, 0.4688 (2)
## highest: 7'897.64220, 7'954.53590, 8'379.26860, 9'480.01210, 9'627.75530

Desc(soup$Volume.PL, main = "Volume - PL", plotit = TRUE)

## ------------------------------------------------------------------------- 
## Volume - PL
## 
##        length         n       NAs    unique          0s        mean
##        88'409    88'409         0    87'952           0    846.4176
##                  100.0%      0.0%                  0.0%            
##                                                                    
##           .05       .10       .25    median         .75         .90
##      131.6927  191.8408  332.0672  593.0481  1'054.5680  1'736.3935
##                                                                    
##         range        sd     vcoef       mad         IQR        skew
##   13'694.9552  865.7465    1.0228  464.8446    722.5008      3.4398
##                                                                    
##       meanCI
##     840.7107
##     852.1244
##             
##          .95
##   2'353.7521
##             
##         kurt
##      19.6550
##             
## lowest : 0.6563 (9), 0.6719 (5), 1.175, 1.1875 (4), 1.3126 (6)
## highest: 12'182.3865, 12'901.2449, 13'034.3355, 13'454.1288, 13'695.6115

Desc(soup$Volume.Progresso, main = "Volume - Progresso", plotit = TRUE)

## ------------------------------------------------------------------------- 
## Volume - Progresso
## 
##        length           n       NAs    unique          0s        mean
##        88'409      88'409         0    83'773           0  1'408.2573
##                    100.0%      0.0%                  0.0%            
##                                                                      
##           .05         .10       .25    median         .75         .90
##       65.0219    112.6943  260.7885  658.5154  1'608.2761  3'606.0861
##                                                                      
##         range          sd     vcoef       mad         IQR        skew
##   53'856.1147  2'151.2545    1.5276  723.7958  1'347.4876      4.7911
##                                                                      
##       meanCI
##   1'394.0766
##   1'422.4380
##             
##          .95
##   5'476.1484
##             
##         kurt
##      47.2010
##             
## lowest : 1.1563 (14), 1.1625, 1.1875 (25), 2.25, 2.3125 (2)
## highest: 42'918.3728, 44'710.0529, 46'431.8416, 51'025.4025, 53'857.2710

Desc(soup$Price.Campbell, main = "Price - Campbell", plotit = TRUE)

## ------------------------------------------------------------------------- 
## Price - Campbell
## 
##      length          n        NAs     unique         0s       mean
##      88'409     88'409          0     88'403          0  1.4318183
##                 100.0%       0.0%                  0.0%           
##                                                                   
##         .05        .10        .25     median        .75        .90
##   1.0779546  1.1560454  1.2801623  1.4179616  1.5723059  1.7254870
##                                                                   
##       range         sd      vcoef        mad        IQR       skew
##   2.1014035  0.2292716  0.1601262  0.2157014  0.2921436  0.3210238
##                                                                   
##      meanCI
##   1.4303070
##   1.4333296
##            
##         .95
##   1.8321768
##            
##        kurt
##   0.5083971
##            
## lowest : 0.4176857, 0.4549892, 0.465928, 0.4712422, 0.4761005
## highest: 2.4534204, 2.4651145, 2.4737757, 2.4962544, 2.5190893

Desc(soup$Price.Other, main = "Price - Other", plotit = TRUE)

## ------------------------------------------------------------------------- 
## Price - Other
## 
##       length          n        NAs     unique         0s        mean
##       88'409     88'409          0     87'971          0   1.9669873
##                  100.0%       0.0%                  0.0%            
##                                                                     
##          .05        .10        .25     median        .75         .90
##    1.3821671  1.5079833  1.7194224  1.9681458  2.1970589   2.4150271
##                                                                     
##        range         sd      vcoef        mad        IQR        skew
##   26.0752088  0.4009550  0.2038422  0.3542450  0.4776364  10.0905767
##                                                                     
##        meanCI
##     1.9643442
##     1.9696303
##              
##           .95
##     2.5489843
##              
##          kurt
##   527.2405552
##              
## lowest : 0.3247912, 0.3532738, 0.4575335, 0.5721617, 0.5786118
## highest: 20.0, 20.1492537, 23.8666667, 25.2, 26.4

Desc(soup$Price.PL, main = "Price - PL", plotit = TRUE)

## ------------------------------------------------------------------------- 
## Price - PL
## 
##      length          n        NAs     unique         0s       mean
##      88'409     88'409          0     88'304          0  1.1182068
##                 100.0%       0.0%                  0.0%           
##                                                                   
##         .05        .10        .25     median        .75        .90
##   0.8548219  0.9042625  0.9888903  1.0952711  1.2237778  1.3580829
##                                                                   
##       range         sd      vcoef        mad        IQR       skew
##   2.9208347  0.1995527  0.1784578  0.1718325  0.2348875  1.5766351
##                                                                   
##      meanCI
##   1.1168914
##   1.1195223
##            
##         .95
##   1.4412235
##            
##        kurt
##   9.2990006
##            
## lowest : 0.2594244, 0.2736749, 0.2997819, 0.3012192, 0.3016202
## highest: 3.0584527, 3.0820091, 3.1081014, 3.1763026, 3.180259

Desc(soup$Price.Progresso, main = "Price - Progresso", plotit = TRUE)

## ------------------------------------------------------------------------- 
## Price - Progresso
## 
##      length          n        NAs     unique         0s       mean
##      88'409     88'409          0     88'126          0  1.6484764
##                 100.0%       0.0%                  0.0%           
##                                                                   
##         .05        .10        .25     median        .75        .90
##   1.0765335  1.1818494  1.3812681  1.6417210  1.8923186  2.1248687
##                                                                   
##       range         sd      vcoef        mad        IQR       skew
##   2.2918816  0.3589708  0.2177591  0.3792848  0.5110505  0.2007816
##                                                                   
##       meanCI
##    1.6461101
##    1.6508427
##             
##          .95
##    2.2550999
##             
##         kurt
##   -0.3779045
##             
## lowest : 0.6700189, 0.7009323, 0.7130135, 0.7171439, 0.72131
## highest: 2.910654, 2.914275, 2.9367074, 2.943178, 2.9619004

The volume data is very skewed so we ought to do a log transformation of the data.

Model Building: Preliminaries

Let’s see how volume is impacted by price.

M1 <- lm(Volume.Progresso ~ Price.Progresso, data = soup) # Linear regression
M2 <- lm(log(Volume.Progresso) ~ Price.Progresso, data = soup) # Semi-log Regression
M3 <- lm(log(Volume.Progresso) ~ log(Price.Progresso), data = soup) # Log-log Regression

# See All 3 Models in same Table
stargazer(M1, M2, M3, type = "text", dep.var.labels.include = FALSE, column.labels = c("Linear", 
    "Semi-log", "Log-log"))

## 
## ==========================================================================
##                                             Dependent variable:           
##                                  -----------------------------------------
##                                     Linear       Semi-log       Log-log   
##                                       (1)           (2)           (3)     
## --------------------------------------------------------------------------
## Price.Progresso                  -2,827.165***   -1.921***                
##                                    (17.772)       (0.011)                 
##                                                                           
## log(Price.Progresso)                                           -3.209***  
##                                                                 (0.017)   
##                                                                           
## Constant                         6,068.772***    9.613***      7.972***   
##                                    (29.982)       (0.018)       (0.009)   
##                                                                           
## --------------------------------------------------------------------------
## Observations                        88,409        88,409        88,409    
## R2                                   0.223         0.259         0.282    
## Adjusted R2                          0.223         0.259         0.282    
## Residual Std. Error (df = 88407)   1,896.833       1.165         1.147    
## F Statistic (df = 1; 88407)      25,307.760*** 30,947.870*** 34,708.030***
## ==========================================================================
## Note:                                          *p<0.1; **p<0.05; ***p<0.01

The R2 is quite low.

Let us look at the own and cross-price elasticity with additional controls.

# Create New Column in data for Log Volume of Progresso

soup$log_prog_vol <- log(soup$Volume.Progresso)

# Create Log Price for all brands

soup$log_price_prog <- log(soup$Price.Progresso)
soup$log_price_camp <- log(soup$Price.Campbell)
soup$log_price_PL <- log(soup$Price.PL)
soup$log_price_other <- log(soup$Price.Other)

# Log REGRESSION WITH OWN & Competitor Prices. Lets store the output in
# 'Log_m1'

Log_m1 <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + 
    log_price_other, data = soup)


# Add Winter Dummy
Log_m2 <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + 
    log_price_other + month_dummy, data = soup)


# Add Region
Log_m3 <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + 
    log_price_other + month_dummy + Region, data = soup)


# See All 3 Models in same Table
stargazer(Log_m1, Log_m2, Log_m3, type = "text", dep.var.labels.include = FALSE, 
    column.labels = c("Price Only", "Add Seasonality Control", "Add Region Control"))

## 
## ==============================================================================================================
##                                                         Dependent variable:                                   
##                      -----------------------------------------------------------------------------------------
##                               Price Only              Add Seasonality Control         Add Region Control      
##                                   (1)                           (2)                           (3)             
## --------------------------------------------------------------------------------------------------------------
## log_price_prog                 -3.818***                     -3.098***                     -2.685***          
##                                 (0.018)                       (0.019)                       (0.017)           
##                                                                                                               
## log_price_camp                 1.580***                      1.802***                      1.216***           
##                                 (0.027)                       (0.026)                       (0.023)           
##                                                                                                               
## log_price_PL                   1.299***                      1.299***                      0.607***           
##                                 (0.024)                       (0.023)                       (0.020)           
##                                                                                                               
## log_price_other                -0.401***                     -0.235***                     0.552***           
##                                 (0.021)                       (0.020)                       (0.018)           
##                                                                                                               
## month_dummyFebruary                                          -0.197***                     -0.217***          
##                                                               (0.018)                       (0.015)           
##                                                                                                               
## month_dummyMarch                                             -0.397***                     -0.416***          
##                                                               (0.018)                       (0.015)           
##                                                                                                               
## month_dummyApril                                             -0.677***                     -0.758***          
##                                                               (0.018)                       (0.016)           
##                                                                                                               
## month_dummyMay                                               -0.839***                     -0.939***          
##                                                               (0.018)                       (0.016)           
##                                                                                                               
## month_dummyJune                                              -1.269***                     -1.373***          
##                                                               (0.018)                       (0.015)           
##                                                                                                               
## month_dummyJuly                                              -1.144***                     -1.236***          
##                                                               (0.018)                       (0.015)           
##                                                                                                               
## month_dummyAugust                                            -0.956***                     -1.064***          
##                                                               (0.017)                       (0.015)           
##                                                                                                               
## month_dummySeptember                                         -0.555***                     -0.627***          
##                                                               (0.017)                       (0.015)           
##                                                                                                               
## month_dummyOctober                                           -0.181***                     -0.229***          
##                                                               (0.017)                       (0.015)           
##                                                                                                               
## month_dummyNovember                                          -0.193***                     -0.256***          
##                                                               (0.017)                       (0.015)           
##                                                                                                               
## month_dummyDecember                                          -0.050***                     -0.118***          
##                                                               (0.017)                       (0.015)           
##                                                                                                               
## RegionMidWest                                                                              -0.609***          
##                                                                                             (0.010)           
##                                                                                                               
## RegionSouth                                                                                -1.463***          
##                                                                                             (0.009)           
##                                                                                                               
## RegionWest                                                                                 -0.588***          
##                                                                                             (0.009)           
##                                                                                                               
## Constant                       7.853***                      7.869***                      8.258***           
##                                 (0.015)                       (0.017)                       (0.015)           
##                                                                                                               
## --------------------------------------------------------------------------------------------------------------
## Observations                    88,409                        88,409                        88,409            
## R2                               0.352                         0.430                         0.579            
## Adjusted R2                      0.352                         0.430                         0.579            
## Residual Std. Error       1.090 (df = 88404)            1.022 (df = 88393)            0.879 (df = 88390)      
## F Statistic          12,029.580*** (df = 4; 88404) 4,453.655*** (df = 15; 88393) 6,748.284*** (df = 18; 88390)
## ==============================================================================================================
## Note:                                                                              *p<0.1; **p<0.05; ***p<0.01

library(GGally)
ggcoef(Log_m3, vline = TRUE, conf.int = TRUE, exclude_intercept = TRUE)

Panel Structure

Our data is monthly sales (2001-2006) from approximately 2000 stores. This is a panel data where each store is being repeated several time. A relatively new package “plm” provides a number of easily implemented commands to account for panel structure.

library(plm)
# Elasticity from Store Fixed Effects
FE <- plm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + log_price_other + 
    month_dummy, data = soup, model = "within")
summary(FE)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = log_prog_vol ~ log_price_prog + log_price_camp + 
##     log_price_PL + log_price_other + month_dummy, data = soup, 
##     model = "within")
## 
## Unbalanced Panel: n=2042, T=1-67, N=88409
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
##  -5.170  -0.209   0.016   0.238   3.160 
## 
## Coefficients :
##                        Estimate Std. Error  t-value  Pr(>|t|)    
## log_price_prog       -2.2342485  0.0086725 -257.625 < 2.2e-16 ***
## log_price_camp        0.7096679  0.0139396   50.910 < 2.2e-16 ***
## log_price_PL          0.1532217  0.0131995   11.608 < 2.2e-16 ***
## log_price_other       0.1919614  0.0107465   17.863 < 2.2e-16 ***
## month_dummyFebruary  -0.1833443  0.0073508  -24.942 < 2.2e-16 ***
## month_dummyMarch     -0.4043601  0.0073915  -54.706 < 2.2e-16 ***
## month_dummyApril     -0.7832886  0.0076056 -102.988 < 2.2e-16 ***
## month_dummyMay       -0.9328994  0.0078110 -119.434 < 2.2e-16 ***
## month_dummyJune      -1.3419711  0.0076522 -175.371 < 2.2e-16 ***
## month_dummyJuly      -1.2001555  0.0076320 -157.254 < 2.2e-16 ***
## month_dummyAugust    -1.0335084  0.0074145 -139.391 < 2.2e-16 ***
## month_dummySeptember -0.5803225  0.0071783  -80.844 < 2.2e-16 ***
## month_dummyOctober   -0.2068965  0.0070913  -29.176 < 2.2e-16 ***
## month_dummyNovember  -0.2226919  0.0071144  -31.301 < 2.2e-16 ***
## month_dummyDecember  -0.1020968  0.0070961  -14.388 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    62412
## Residual Sum of Squares: 15400
## R-Squared:      0.75326
## Adj. R-Squared: 0.74738
## F-statistic: 17574.6 on 15 and 86352 DF, p-value: < 2.22e-16

# Elasticity from Store Random effects
RE <- plm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + log_price_other + 
    month_dummy, data = soup, model = "random")
summary(RE)

## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = log_prog_vol ~ log_price_prog + log_price_camp + 
##     log_price_PL + log_price_other + month_dummy, data = soup, 
##     model = "random")
## 
## Unbalanced Panel: n=2042, T=1-67, N=88409
## 
## Effects:
##                  var std.dev share
## idiosyncratic 0.1783  0.4223 0.198
## individual    0.7231  0.8503 0.802
## theta  : 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5552  0.9277  0.9385  0.9287  0.9394  0.9394 
## 
## Residuals :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -5.2900 -0.2020  0.0301  0.0052  0.2500  2.8900 
## 
## Coefficients :
##                        Estimate Std. Error  t-value  Pr(>|t|)    
## (Intercept)           7.5978641  0.0209263  363.077 < 2.2e-16 ***
## log_price_prog       -2.2414010  0.0087063 -257.446 < 2.2e-16 ***
## log_price_camp        0.7235901  0.0139657   51.812 < 2.2e-16 ***
## log_price_PL          0.1695027  0.0132090   12.832 < 2.2e-16 ***
## log_price_other       0.1888008  0.0107724   17.526 < 2.2e-16 ***
## month_dummyFebruary  -0.1835607  0.0073865  -24.851 < 2.2e-16 ***
## month_dummyMarch     -0.4046573  0.0074269  -54.486 < 2.2e-16 ***
## month_dummyApril     -0.7829887  0.0076410 -102.472 < 2.2e-16 ***
## month_dummyMay       -0.9330506  0.0078456 -118.926 < 2.2e-16 ***
## month_dummyJune      -1.3429752  0.0076848 -174.757 < 2.2e-16 ***
## month_dummyJuly      -1.2011877  0.0076647 -156.718 < 2.2e-16 ***
## month_dummyAugust    -1.0338843  0.0074478 -138.817 < 2.2e-16 ***
## month_dummySeptember -0.5805795  0.0072122  -80.500 < 2.2e-16 ***
## month_dummyOctober   -0.2069477  0.0071253  -29.044 < 2.2e-16 ***
## month_dummyNovember  -0.2227362  0.0071483  -31.159 < 2.2e-16 ***
## month_dummyDecember  -0.1023453  0.0071295  -14.355 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    64450
## Residual Sum of Squares: 15918
## R-Squared:      0.75307
## Adj. R-Squared: 0.75303
## F-statistic: 17966.9 on 15 and 88393 DF, p-value: < 2.22e-16

Exploring Hetrogeniety in Elasticity

There are a number of ways to approach this. First, instead of using Store Fixed effects, we could include observed characteristics of the store (for example, local demographics). We could also consider seperate models for different regions, states, etc. These are simple but can be extremely insighful in understanding the patterns of your data.

Let us add store demographics

# Add Store Demographics: FOCUS ON INCOME ONLY HERE. It is often useful to
# transform a continous variable to discrete.  For example, we can divide
# the stores into quintiles of income (Lowest 20%, ...Highest 20%)

# Assign stores to Income Quintiles based on ZIP code trading area
soup$IncQuint <- with(soup, cut(soup$median_inc, quantile(soup$median_inc, (0:5)/5), 
    include.lowest = TRUE))

# Add Labels to Quintile

soup$IncQuint <- factor(soup$IncQuint, labels = c("Low Income", "Quint2", "Quint3", 
    "Quint4", "High Income"))

# Lets also do Log Income
soup$LogIncome = log(soup$median_inc)

# Regression with logincome
Log_Inc <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + 
    log_price_other + month_dummy + Region + LogIncome, data = soup)

# Regression with Income Quintile
Inc_Q <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + 
    log_price_other + month_dummy + Region + IncQuint, data = soup)

# See models with different ways of entering Income into regression

stargazer(Log_Inc, Inc_Q, type = "text", dep.var.labels.include = FALSE, column.labels = c("With Log Income", 
    "Income Quintile"))

## 
## ================================================================================
##                                          Dependent variable:                    
##                      -----------------------------------------------------------
##                             With Log Income               Income Quintile       
##                                   (1)                           (2)             
## --------------------------------------------------------------------------------
## log_price_prog                 -2.601***                     -2.598***          
##                                 (0.016)                       (0.016)           
##                                                                                 
## log_price_camp                 0.969***                      0.971***           
##                                 (0.022)                       (0.022)           
##                                                                                 
## log_price_PL                   0.349***                      0.365***           
##                                 (0.019)                       (0.019)           
##                                                                                 
## log_price_other                0.453***                      0.470***           
##                                 (0.017)                       (0.017)           
##                                                                                 
## month_dummyFebruary            -0.204***                     -0.206***          
##                                 (0.015)                       (0.015)           
##                                                                                 
## month_dummyMarch               -0.399***                     -0.400***          
##                                 (0.015)                       (0.015)           
##                                                                                 
## month_dummyApril               -0.741***                     -0.744***          
##                                 (0.015)                       (0.015)           
##                                                                                 
## month_dummyMay                 -0.903***                     -0.907***          
##                                 (0.015)                       (0.015)           
##                                                                                 
## month_dummyJune                -1.328***                     -1.333***          
##                                 (0.015)                       (0.015)           
##                                                                                 
## month_dummyJuly                -1.190***                     -1.194***          
##                                 (0.015)                       (0.015)           
##                                                                                 
## month_dummyAugust              -1.029***                     -1.033***          
##                                 (0.014)                       (0.014)           
##                                                                                 
## month_dummySeptember           -0.601***                     -0.604***          
##                                 (0.014)                       (0.014)           
##                                                                                 
## month_dummyOctober             -0.215***                     -0.217***          
##                                 (0.014)                       (0.014)           
##                                                                                 
## month_dummyNovember            -0.238***                     -0.240***          
##                                 (0.014)                       (0.014)           
##                                                                                 
## month_dummyDecember            -0.104***                     -0.106***          
##                                 (0.014)                       (0.014)           
##                                                                                 
## RegionMidWest                  -0.632***                     -0.656***          
##                                 (0.009)                       (0.009)           
##                                                                                 
## RegionSouth                    -1.385***                     -1.396***          
##                                 (0.008)                       (0.008)           
##                                                                                 
## RegionWest                     -0.552***                     -0.569***          
##                                 (0.009)                       (0.009)           
##                                                                                 
## LogIncome                      0.861***                                         
##                                 (0.009)                                         
##                                                                                 
## IncQuintQuint2                                               0.299***           
##                                                               (0.009)           
##                                                                                 
## IncQuintQuint3                                               0.492***           
##                                                               (0.009)           
##                                                                                 
## IncQuintQuint4                                               0.643***           
##                                                               (0.009)           
##                                                                                 
## IncQuintHigh Income                                          0.803***           
##                                                               (0.009)           
##                                                                                 
## Constant                       -0.931***                     7.892***           
##                                 (0.101)                       (0.015)           
##                                                                                 
## --------------------------------------------------------------------------------
## Observations                    88,409                        88,409            
## R2                               0.616                         0.618            
## Adjusted R2                      0.616                         0.617            
## Residual Std. Error       0.839 (df = 88389)            0.838 (df = 88386)      
## F Statistic          7,457.186*** (df = 19; 88389) 6,486.468*** (df = 22; 88386)
## ================================================================================
## Note:                                                *p<0.1; **p<0.05; ***p<0.01

# Even though R-square is not that different, AIC/BIC suggest Income
# Quintile may be the better way to enter Income in regression
AIC(Log_Inc, Inc_Q)

##         df      AIC
## Log_Inc 21 219954.5
## Inc_Q   24 219569.2

BIC(Log_Inc, Inc_Q)

##         df      BIC
## Log_Inc 21 220151.7
## Inc_Q   24 219794.6

Price elasticity BY each State

reg <- soup %>% 
  group_by(State_Name, Region) %>% 
  do(fit = lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + month_dummy, .))

d3 <- reg %>% 
  tidy(fit) %>% 
  filter(term == "log_price_prog") %>% 
  arrange(desc(estimate))

datatable(d3)

What have done here? In those 2-3 lines of code we have run a regression model for each state separately. We then use “broom” package in R to extract all parameters. google “broom dplyr” for more example on this concept.

Let’s look at Progresso price elasticity across all stores.

ggplot(d3, aes(x = reorder(State_Name, -estimate), y = estimate, color = Region)) + 
    geom_point() + theme_minimal() + labs(x = "State") + geom_errorbar(aes(ymin = estimate - 
    std.error, ymax = estimate + std.error, color = Region), width = 0.2) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + ggtitle("Price elasticity for Progresso Across US States")

Elasticity Matrix by Census Region

reg1 <- soup %>% 
  group_by(Region) %>% 
  do(fit = lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + month_dummy, .))

d <- reg1 %>% 
  tidy(fit) %>% 
  filter(term != "(Intercept)")

e <- filter(d, term == "log_price_prog" | term == "log_price_camp" | term == 
    "log_price_PL")

ggplot(e, aes(x = Region, y = estimate, fill = term)) + 
  geom_bar(position = position_dodge(), 
    stat = "identity", colour = "black") + 
  geom_errorbar(aes(ymin = estimate - 
    std.error, ymax = estimate + std.error), width = 0.2, position = position_dodge(0.9)) +
      theme_minimal() + labs(fill = "Brand Price") + 
  ggtitle("Own & Cross Price elasticity for Progresso Across Census Regions")

Exercise

Use the Income Quintiles as a “grouping” variable and estimate the price elasticity for each Income Quintile. Make sure to add other control variables (prices, seasonality, census regions)

Let’s take a quick look at the income distribution

Desc(soup$median_inc, main = "Income distribution", plotit = TRUE)

## ------------------------------------------------------------------------- 
## Income distribution
## 
##       length          n        NAs     unique         0s       mean
##       88'409     88'409          0      1'657          0  49'980.88
##                  100.0%       0.0%                  0.0%           
##                                                                    
##          .05        .10        .25     median        .75        .90
##    28'760.00  32'228.00  38'085.00  46'731.00  58'113.00  70'854.00
##                                                                    
##        range         sd      vcoef        mad        IQR       skew
##   173'204.00  17'176.81       0.34  14'274.47  20'028.00       1.57
##                                                                    
##      meanCI
##   49'867.66
##   50'094.11
##            
##         .95
##   81'863.00
##            
##        kurt
##        4.72
##            
## lowest : 12'262.0 (17), 16'257.0 (65), 16'919.0 (21), 17'999.0 (3), 18'158.0 (29)
## highest: 139'997.0 (67), 141'428.0 (57), 146'755.0 (44), 154'817.0 (67), 185'466.0 (8)

There is no missing data. The mean income is $49,981. The distribution is positively skewed (1.57), witha minimum income of $12,262 and a maximum income of $139,997.

Let us use the income quintiles to group the instances.

soup_income <- mutate(soup, income=cut(median_inc, 
                                       quantile(median_inc,c(0,0.2,0.4,0.6,0.8,1)), 
                                       include.lowest=TRUE,
                                       labels=1:5))
table(soup_income$income)

## 
##     1     2     3     4     5 
## 17702 17687 17680 17715 17625

Let us now create 5 new datasets for each of the income quintiles.

Store Demographics

Examine the store demographics in greater detail.