pacman::p_load(sjPlot, ggeffects, rio, dplyr, ggplot2, DT,
dplyr, corrplot,FactoMineR, highcharter, lubridate, plotly,
broom, stargazer, plm, GGally, explor, ggfortify, DescTools)
A large body of empirical literature in economics has documented that on average sellers set lower prices when the demand is high. Thus economic anomaly, i.e., a fall in the retail prices of products during periods of peak demand, is often referred to as countercyclical pricing.
Packaged soups were a USD 5 billion market in 2015. Campbell Soup Company’s brand Campbell was the category leader with ~42% market share by value, with General Mills Progresso standing a distant second at ~15% of market share. The rest of the soup-scape had a tough competition between a multitude of private label and niche brands such as those from Unilever, Hain Celestial Group, Campbell Soup Company, Walmart, Heinz, ConAgra etc.
Soups are available in various formats such as shelf-stable, chilled, frozen, dehydrated, instant, UHT and are a highly seasonal category with demand peaking in winter months. In 2015, UHT soups posted the strongest growth rates, partly due to the premium positioning and partly due to the convenience factor (no can openers needed and reseal-able). This trend of strong growth rates in the UHT category is likely to continue till 2020.
Campbell Soup Company has three major product divisions - the simple meals division which consists largely of soups, the baked snacks division and the health beverage division. Progresso was started by Italian immigrants and was purchased by Pillsbury in 1988 and became a part of the General Mills family in 2001, when GM bought Pillsbury.
You are hired as the Brand Manager for Progresso which employs a peculiar pricing strategy of significantly reducing prices in periods of high demand (i.e. Winters) and then raising the prices during off-peak months.
Let’s go ahead to import the data.
# Load Transaction
soup <- import("Countercyclical_Pricing.xlsx" , sheet ="soup_new")
soup$Date<-as.Date(soup$Date, "%m/%d/%Y" )
## Warning in as.POSIXlt.POSIXct(x, tz = tz): unknown timezone '%m/%d/%Y'
# Month Dummy
soup$month_dummy<-months(soup$Date)
soup$month_dummy = factor(soup$month_dummy, levels = month.name)
# Load Demographic
store_demo <- import("Countercyclical_Pricing.xlsx" , sheet ="store_demo")
# Merge with "soup" file
soup <- merge(soup, store_demo, by="IRI_KEY")
Let’s begin with some summary statistics:
summary(soup)
## IRI_KEY Date Winter
## Min. : 200039 Min. :2001-06-01 Length:88409
## 1st Qu.: 239034 1st Qu.:2002-11-01 Class :character
## Median : 265775 Median :2004-04-01 Mode :character
## Mean : 360391 Mean :2004-03-17
## 3rd Qu.: 291530 3rd Qu.:2005-08-01
## Max. :8032406 Max. :2006-12-01
##
## Region Volume.Campbell Volume.Other
## Length:88409 Min. : 46.42 Min. : 0.075
## Class :character 1st Qu.: 1357.65 1st Qu.: 119.668
## Mode :character Median : 2219.27 Median : 283.020
## Mean : 2949.86 Mean : 469.450
## 3rd Qu.: 3629.39 3rd Qu.: 607.454
## Max. :42134.66 Max. :9627.755
##
## Volume.PL Volume.Progresso Price.Campbell Price.Other
## Min. : 0.656 Min. : 1.16 Min. :0.4177 Min. : 0.3248
## 1st Qu.: 332.067 1st Qu.: 260.79 1st Qu.:1.2802 1st Qu.: 1.7194
## Median : 593.048 Median : 658.52 Median :1.4180 Median : 1.9681
## Mean : 846.418 Mean : 1408.26 Mean :1.4318 Mean : 1.9670
## 3rd Qu.: 1054.568 3rd Qu.: 1608.28 3rd Qu.:1.5723 3rd Qu.: 2.1971
## Max. :13695.612 Max. :53857.27 Max. :2.5191 Max. :26.4000
##
## Price.PL Price.Progresso Total_Volume month_dummy
## Min. :0.2594 Min. :0.670 Min. : 115.2 December: 7984
## 1st Qu.:0.9889 1st Qu.:1.381 1st Qu.: 2483.7 November: 7885
## Median :1.0953 Median :1.642 Median : 4186.4 July : 7848
## Mean :1.1182 Mean :1.648 Mean : 5674.0 August : 7829
## 3rd Qu.:1.2238 3rd Qu.:1.892 3rd Qu.: 7110.5 June : 7825
## Max. :3.1803 Max. :2.962 Max. :81870.1 October : 7822
## (Other) :41216
## FIPS NAME State_Name REGION
## Min. : 1003 Length:88409 Length:88409 Length:88409
## 1st Qu.:17031 Class :character Class :character Class :character
## Median :34005 Mode :character Mode :character Mode :character
## Mean :29861
## 3rd Qu.:45013
## Max. :55139
##
## DIVISION Zip_name Zip_key per_kids18
## Min. :1.000 Length:88409 Min. : 1020 Min. :0.0200
## 1st Qu.:3.000 Class :character 1st Qu.:21244 1st Qu.:0.2200
## Median :5.000 Mode :character Median :46062 Median :0.2500
## Mean :5.084 Mean :49104 Mean :0.2492
## 3rd Qu.:7.000 3rd Qu.:77504 3rd Qu.:0.2800
## Max. :9.000 Max. :99223 Max. :0.4200
##
## per_kids9 per_old per_white per_black
## Min. :0.0100 Min. :0.0200 Min. :0.0100 Min. :0.0000
## 1st Qu.:0.1200 1st Qu.:0.0900 1st Qu.:0.6900 1st Qu.:0.0200
## Median :0.1400 Median :0.1200 Median :0.8300 Median :0.0500
## Mean :0.1379 Mean :0.1225 Mean :0.7779 Mean :0.1051
## 3rd Qu.:0.1500 3rd Qu.:0.1500 3rd Qu.:0.9200 3rd Qu.:0.1200
## Max. :0.2400 Max. :0.4400 Max. :0.9900 Max. :0.9600
##
## hhsize Per_college per_hh_less20K median_inc
## Min. :1.240 Min. :0.0300 Min. :0.0200 Min. : 12262
## 1st Qu.:2.350 1st Qu.:0.1700 1st Qu.:0.1100 1st Qu.: 38085
## Median :2.550 Median :0.2400 Median :0.1700 Median : 46731
## Mean :2.573 Mean :0.2724 Mean :0.1818 Mean : 49981
## 3rd Qu.:2.750 3rd Qu.:0.3500 3rd Qu.:0.2300 3rd Qu.: 58113
## Max. :4.540 Max. :0.7900 Max. :0.6500 Max. :185466
##
## capita_inc Share_Camp Share_Other Share_PL
## Min. : 7411 Min. :0.2048 Min. :0.001769 Min. :0.0001807
## 1st Qu.:18651 1st Qu.:0.4645 1st Qu.:0.042262 1st Qu.:0.0980898
## Median :22764 Median :0.5529 Median :0.065623 Median :0.1459749
## Mean :24536 Mean :0.5406 Mean :0.078985 Mean :0.1555552
## 3rd Qu.:27635 3rd Qu.:0.6316 3rd Qu.:0.104115 3rd Qu.:0.2054529
## Max. :92940 Max. :0.8890 Max. :0.365949 Max. :0.4584570
##
## Share_prog
## Min. :0.0004484
## 1st Qu.:0.1242095
## Median :0.2058253
## Mean :0.2248098
## 3rd Qu.:0.3045534
## Max. :0.6187502
##
A closer look at the key variables of Volume and Price:
Desc(soup$Volume.Campbell, main = "Volume - Campbell", plotit = TRUE)
## -------------------------------------------------------------------------
## Volume - Campbell
##
## length n NAs unique 0s mean
## 88'409 88'409 0 88'339 0 2'949.8631
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 629.2092 852.6389 1'357.6492 2'219.2727 3'629.3864 5'653.8541
##
## range sd vcoef mad IQR skew
## 42'088.2362 2'675.0062 0.9068 1'516.8702 2'271.7372 3.4488
##
## meanCI
## 2'932.2299
## 2'967.4962
##
## .95
## 7'519.3469
##
## kurt
## 20.2964
##
## lowest : 46.4238, 57.9708, 69.3526, 72.3775, 72.4185
## highest: 38'960.4970, 40'190.9550, 40'435.9262, 41'720.8094, 42'134.6600
Desc(soup$Volume.Other, main = "Volume - Other", plotit = TRUE)
## -------------------------------------------------------------------------
## Volume - Other
##
## length n NAs unique 0s mean
## 88'409 88'409 0 85'312 0 469.45022
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 31.88108 52.59388 119.66790 283.02020 607.45380 1'122.78656
##
## range sd vcoef mad IQR skew
## 9'627.68030 551.02982 1.17378 293.25754 487.78590 2.91962
##
## meanCI
## 465.81793
## 473.08251
##
## .95
## 1'549.16618
##
## kurt
## 14.70433
##
## lowest : 0.075, 0.0938, 0.15 (2), 0.225, 0.4688 (2)
## highest: 7'897.64220, 7'954.53590, 8'379.26860, 9'480.01210, 9'627.75530
Desc(soup$Volume.PL, main = "Volume - PL", plotit = TRUE)
## -------------------------------------------------------------------------
## Volume - PL
##
## length n NAs unique 0s mean
## 88'409 88'409 0 87'952 0 846.4176
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 131.6927 191.8408 332.0672 593.0481 1'054.5680 1'736.3935
##
## range sd vcoef mad IQR skew
## 13'694.9552 865.7465 1.0228 464.8446 722.5008 3.4398
##
## meanCI
## 840.7107
## 852.1244
##
## .95
## 2'353.7521
##
## kurt
## 19.6550
##
## lowest : 0.6563 (9), 0.6719 (5), 1.175, 1.1875 (4), 1.3126 (6)
## highest: 12'182.3865, 12'901.2449, 13'034.3355, 13'454.1288, 13'695.6115
Desc(soup$Volume.Progresso, main = "Volume - Progresso", plotit = TRUE)
## -------------------------------------------------------------------------
## Volume - Progresso
##
## length n NAs unique 0s mean
## 88'409 88'409 0 83'773 0 1'408.2573
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 65.0219 112.6943 260.7885 658.5154 1'608.2761 3'606.0861
##
## range sd vcoef mad IQR skew
## 53'856.1147 2'151.2545 1.5276 723.7958 1'347.4876 4.7911
##
## meanCI
## 1'394.0766
## 1'422.4380
##
## .95
## 5'476.1484
##
## kurt
## 47.2010
##
## lowest : 1.1563 (14), 1.1625, 1.1875 (25), 2.25, 2.3125 (2)
## highest: 42'918.3728, 44'710.0529, 46'431.8416, 51'025.4025, 53'857.2710
Desc(soup$Price.Campbell, main = "Price - Campbell", plotit = TRUE)
## -------------------------------------------------------------------------
## Price - Campbell
##
## length n NAs unique 0s mean
## 88'409 88'409 0 88'403 0 1.4318183
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 1.0779546 1.1560454 1.2801623 1.4179616 1.5723059 1.7254870
##
## range sd vcoef mad IQR skew
## 2.1014035 0.2292716 0.1601262 0.2157014 0.2921436 0.3210238
##
## meanCI
## 1.4303070
## 1.4333296
##
## .95
## 1.8321768
##
## kurt
## 0.5083971
##
## lowest : 0.4176857, 0.4549892, 0.465928, 0.4712422, 0.4761005
## highest: 2.4534204, 2.4651145, 2.4737757, 2.4962544, 2.5190893
Desc(soup$Price.Other, main = "Price - Other", plotit = TRUE)
## -------------------------------------------------------------------------
## Price - Other
##
## length n NAs unique 0s mean
## 88'409 88'409 0 87'971 0 1.9669873
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 1.3821671 1.5079833 1.7194224 1.9681458 2.1970589 2.4150271
##
## range sd vcoef mad IQR skew
## 26.0752088 0.4009550 0.2038422 0.3542450 0.4776364 10.0905767
##
## meanCI
## 1.9643442
## 1.9696303
##
## .95
## 2.5489843
##
## kurt
## 527.2405552
##
## lowest : 0.3247912, 0.3532738, 0.4575335, 0.5721617, 0.5786118
## highest: 20.0, 20.1492537, 23.8666667, 25.2, 26.4
Desc(soup$Price.PL, main = "Price - PL", plotit = TRUE)
## -------------------------------------------------------------------------
## Price - PL
##
## length n NAs unique 0s mean
## 88'409 88'409 0 88'304 0 1.1182068
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 0.8548219 0.9042625 0.9888903 1.0952711 1.2237778 1.3580829
##
## range sd vcoef mad IQR skew
## 2.9208347 0.1995527 0.1784578 0.1718325 0.2348875 1.5766351
##
## meanCI
## 1.1168914
## 1.1195223
##
## .95
## 1.4412235
##
## kurt
## 9.2990006
##
## lowest : 0.2594244, 0.2736749, 0.2997819, 0.3012192, 0.3016202
## highest: 3.0584527, 3.0820091, 3.1081014, 3.1763026, 3.180259
Desc(soup$Price.Progresso, main = "Price - Progresso", plotit = TRUE)
## -------------------------------------------------------------------------
## Price - Progresso
##
## length n NAs unique 0s mean
## 88'409 88'409 0 88'126 0 1.6484764
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 1.0765335 1.1818494 1.3812681 1.6417210 1.8923186 2.1248687
##
## range sd vcoef mad IQR skew
## 2.2918816 0.3589708 0.2177591 0.3792848 0.5110505 0.2007816
##
## meanCI
## 1.6461101
## 1.6508427
##
## .95
## 2.2550999
##
## kurt
## -0.3779045
##
## lowest : 0.6700189, 0.7009323, 0.7130135, 0.7171439, 0.72131
## highest: 2.910654, 2.914275, 2.9367074, 2.943178, 2.9619004
The volume data is very skewed so we ought to do a log transformation of the data.
Let’s see how volume is impacted by price.
M1 <- lm(Volume.Progresso ~ Price.Progresso, data = soup) # Linear regression
M2 <- lm(log(Volume.Progresso) ~ Price.Progresso, data = soup) # Semi-log Regression
M3 <- lm(log(Volume.Progresso) ~ log(Price.Progresso), data = soup) # Log-log Regression
# See All 3 Models in same Table
stargazer(M1, M2, M3, type = "text", dep.var.labels.include = FALSE, column.labels = c("Linear",
"Semi-log", "Log-log"))
##
## ==========================================================================
## Dependent variable:
## -----------------------------------------
## Linear Semi-log Log-log
## (1) (2) (3)
## --------------------------------------------------------------------------
## Price.Progresso -2,827.165*** -1.921***
## (17.772) (0.011)
##
## log(Price.Progresso) -3.209***
## (0.017)
##
## Constant 6,068.772*** 9.613*** 7.972***
## (29.982) (0.018) (0.009)
##
## --------------------------------------------------------------------------
## Observations 88,409 88,409 88,409
## R2 0.223 0.259 0.282
## Adjusted R2 0.223 0.259 0.282
## Residual Std. Error (df = 88407) 1,896.833 1.165 1.147
## F Statistic (df = 1; 88407) 25,307.760*** 30,947.870*** 34,708.030***
## ==========================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The R2 is quite low.
Let us look at the own and cross-price elasticity with additional controls.
# Create New Column in data for Log Volume of Progresso
soup$log_prog_vol <- log(soup$Volume.Progresso)
# Create Log Price for all brands
soup$log_price_prog <- log(soup$Price.Progresso)
soup$log_price_camp <- log(soup$Price.Campbell)
soup$log_price_PL <- log(soup$Price.PL)
soup$log_price_other <- log(soup$Price.Other)
# Log REGRESSION WITH OWN & Competitor Prices. Lets store the output in
# 'Log_m1'
Log_m1 <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL +
log_price_other, data = soup)
# Add Winter Dummy
Log_m2 <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL +
log_price_other + month_dummy, data = soup)
# Add Region
Log_m3 <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL +
log_price_other + month_dummy + Region, data = soup)
# See All 3 Models in same Table
stargazer(Log_m1, Log_m2, Log_m3, type = "text", dep.var.labels.include = FALSE,
column.labels = c("Price Only", "Add Seasonality Control", "Add Region Control"))
##
## ==============================================================================================================
## Dependent variable:
## -----------------------------------------------------------------------------------------
## Price Only Add Seasonality Control Add Region Control
## (1) (2) (3)
## --------------------------------------------------------------------------------------------------------------
## log_price_prog -3.818*** -3.098*** -2.685***
## (0.018) (0.019) (0.017)
##
## log_price_camp 1.580*** 1.802*** 1.216***
## (0.027) (0.026) (0.023)
##
## log_price_PL 1.299*** 1.299*** 0.607***
## (0.024) (0.023) (0.020)
##
## log_price_other -0.401*** -0.235*** 0.552***
## (0.021) (0.020) (0.018)
##
## month_dummyFebruary -0.197*** -0.217***
## (0.018) (0.015)
##
## month_dummyMarch -0.397*** -0.416***
## (0.018) (0.015)
##
## month_dummyApril -0.677*** -0.758***
## (0.018) (0.016)
##
## month_dummyMay -0.839*** -0.939***
## (0.018) (0.016)
##
## month_dummyJune -1.269*** -1.373***
## (0.018) (0.015)
##
## month_dummyJuly -1.144*** -1.236***
## (0.018) (0.015)
##
## month_dummyAugust -0.956*** -1.064***
## (0.017) (0.015)
##
## month_dummySeptember -0.555*** -0.627***
## (0.017) (0.015)
##
## month_dummyOctober -0.181*** -0.229***
## (0.017) (0.015)
##
## month_dummyNovember -0.193*** -0.256***
## (0.017) (0.015)
##
## month_dummyDecember -0.050*** -0.118***
## (0.017) (0.015)
##
## RegionMidWest -0.609***
## (0.010)
##
## RegionSouth -1.463***
## (0.009)
##
## RegionWest -0.588***
## (0.009)
##
## Constant 7.853*** 7.869*** 8.258***
## (0.015) (0.017) (0.015)
##
## --------------------------------------------------------------------------------------------------------------
## Observations 88,409 88,409 88,409
## R2 0.352 0.430 0.579
## Adjusted R2 0.352 0.430 0.579
## Residual Std. Error 1.090 (df = 88404) 1.022 (df = 88393) 0.879 (df = 88390)
## F Statistic 12,029.580*** (df = 4; 88404) 4,453.655*** (df = 15; 88393) 6,748.284*** (df = 18; 88390)
## ==============================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
library(GGally)
ggcoef(Log_m3, vline = TRUE, conf.int = TRUE, exclude_intercept = TRUE)
Our data is monthly sales (2001-2006) from approximately 2000 stores. This is a panel data where each store is being repeated several time. A relatively new package “plm” provides a number of easily implemented commands to account for panel structure.
library(plm)
# Elasticity from Store Fixed Effects
FE <- plm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + log_price_other +
month_dummy, data = soup, model = "within")
summary(FE)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = log_prog_vol ~ log_price_prog + log_price_camp +
## log_price_PL + log_price_other + month_dummy, data = soup,
## model = "within")
##
## Unbalanced Panel: n=2042, T=1-67, N=88409
##
## Residuals :
## Min. 1st Qu. Median 3rd Qu. Max.
## -5.170 -0.209 0.016 0.238 3.160
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## log_price_prog -2.2342485 0.0086725 -257.625 < 2.2e-16 ***
## log_price_camp 0.7096679 0.0139396 50.910 < 2.2e-16 ***
## log_price_PL 0.1532217 0.0131995 11.608 < 2.2e-16 ***
## log_price_other 0.1919614 0.0107465 17.863 < 2.2e-16 ***
## month_dummyFebruary -0.1833443 0.0073508 -24.942 < 2.2e-16 ***
## month_dummyMarch -0.4043601 0.0073915 -54.706 < 2.2e-16 ***
## month_dummyApril -0.7832886 0.0076056 -102.988 < 2.2e-16 ***
## month_dummyMay -0.9328994 0.0078110 -119.434 < 2.2e-16 ***
## month_dummyJune -1.3419711 0.0076522 -175.371 < 2.2e-16 ***
## month_dummyJuly -1.2001555 0.0076320 -157.254 < 2.2e-16 ***
## month_dummyAugust -1.0335084 0.0074145 -139.391 < 2.2e-16 ***
## month_dummySeptember -0.5803225 0.0071783 -80.844 < 2.2e-16 ***
## month_dummyOctober -0.2068965 0.0070913 -29.176 < 2.2e-16 ***
## month_dummyNovember -0.2226919 0.0071144 -31.301 < 2.2e-16 ***
## month_dummyDecember -0.1020968 0.0070961 -14.388 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 62412
## Residual Sum of Squares: 15400
## R-Squared: 0.75326
## Adj. R-Squared: 0.74738
## F-statistic: 17574.6 on 15 and 86352 DF, p-value: < 2.22e-16
# Elasticity from Store Random effects
RE <- plm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + log_price_other +
month_dummy, data = soup, model = "random")
summary(RE)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = log_prog_vol ~ log_price_prog + log_price_camp +
## log_price_PL + log_price_other + month_dummy, data = soup,
## model = "random")
##
## Unbalanced Panel: n=2042, T=1-67, N=88409
##
## Effects:
## var std.dev share
## idiosyncratic 0.1783 0.4223 0.198
## individual 0.7231 0.8503 0.802
## theta :
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5552 0.9277 0.9385 0.9287 0.9394 0.9394
##
## Residuals :
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5.2900 -0.2020 0.0301 0.0052 0.2500 2.8900
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 7.5978641 0.0209263 363.077 < 2.2e-16 ***
## log_price_prog -2.2414010 0.0087063 -257.446 < 2.2e-16 ***
## log_price_camp 0.7235901 0.0139657 51.812 < 2.2e-16 ***
## log_price_PL 0.1695027 0.0132090 12.832 < 2.2e-16 ***
## log_price_other 0.1888008 0.0107724 17.526 < 2.2e-16 ***
## month_dummyFebruary -0.1835607 0.0073865 -24.851 < 2.2e-16 ***
## month_dummyMarch -0.4046573 0.0074269 -54.486 < 2.2e-16 ***
## month_dummyApril -0.7829887 0.0076410 -102.472 < 2.2e-16 ***
## month_dummyMay -0.9330506 0.0078456 -118.926 < 2.2e-16 ***
## month_dummyJune -1.3429752 0.0076848 -174.757 < 2.2e-16 ***
## month_dummyJuly -1.2011877 0.0076647 -156.718 < 2.2e-16 ***
## month_dummyAugust -1.0338843 0.0074478 -138.817 < 2.2e-16 ***
## month_dummySeptember -0.5805795 0.0072122 -80.500 < 2.2e-16 ***
## month_dummyOctober -0.2069477 0.0071253 -29.044 < 2.2e-16 ***
## month_dummyNovember -0.2227362 0.0071483 -31.159 < 2.2e-16 ***
## month_dummyDecember -0.1023453 0.0071295 -14.355 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 64450
## Residual Sum of Squares: 15918
## R-Squared: 0.75307
## Adj. R-Squared: 0.75303
## F-statistic: 17966.9 on 15 and 88393 DF, p-value: < 2.22e-16
There are a number of ways to approach this. First, instead of using Store Fixed effects, we could include observed characteristics of the store (for example, local demographics). We could also consider seperate models for different regions, states, etc. These are simple but can be extremely insighful in understanding the patterns of your data.
Let us add store demographics
# Add Store Demographics: FOCUS ON INCOME ONLY HERE. It is often useful to
# transform a continous variable to discrete. For example, we can divide
# the stores into quintiles of income (Lowest 20%, ...Highest 20%)
# Assign stores to Income Quintiles based on ZIP code trading area
soup$IncQuint <- with(soup, cut(soup$median_inc, quantile(soup$median_inc, (0:5)/5),
include.lowest = TRUE))
# Add Labels to Quintile
soup$IncQuint <- factor(soup$IncQuint, labels = c("Low Income", "Quint2", "Quint3",
"Quint4", "High Income"))
# Lets also do Log Income
soup$LogIncome = log(soup$median_inc)
# Regression with logincome
Log_Inc <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL +
log_price_other + month_dummy + Region + LogIncome, data = soup)
# Regression with Income Quintile
Inc_Q <- lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL +
log_price_other + month_dummy + Region + IncQuint, data = soup)
# See models with different ways of entering Income into regression
stargazer(Log_Inc, Inc_Q, type = "text", dep.var.labels.include = FALSE, column.labels = c("With Log Income",
"Income Quintile"))
##
## ================================================================================
## Dependent variable:
## -----------------------------------------------------------
## With Log Income Income Quintile
## (1) (2)
## --------------------------------------------------------------------------------
## log_price_prog -2.601*** -2.598***
## (0.016) (0.016)
##
## log_price_camp 0.969*** 0.971***
## (0.022) (0.022)
##
## log_price_PL 0.349*** 0.365***
## (0.019) (0.019)
##
## log_price_other 0.453*** 0.470***
## (0.017) (0.017)
##
## month_dummyFebruary -0.204*** -0.206***
## (0.015) (0.015)
##
## month_dummyMarch -0.399*** -0.400***
## (0.015) (0.015)
##
## month_dummyApril -0.741*** -0.744***
## (0.015) (0.015)
##
## month_dummyMay -0.903*** -0.907***
## (0.015) (0.015)
##
## month_dummyJune -1.328*** -1.333***
## (0.015) (0.015)
##
## month_dummyJuly -1.190*** -1.194***
## (0.015) (0.015)
##
## month_dummyAugust -1.029*** -1.033***
## (0.014) (0.014)
##
## month_dummySeptember -0.601*** -0.604***
## (0.014) (0.014)
##
## month_dummyOctober -0.215*** -0.217***
## (0.014) (0.014)
##
## month_dummyNovember -0.238*** -0.240***
## (0.014) (0.014)
##
## month_dummyDecember -0.104*** -0.106***
## (0.014) (0.014)
##
## RegionMidWest -0.632*** -0.656***
## (0.009) (0.009)
##
## RegionSouth -1.385*** -1.396***
## (0.008) (0.008)
##
## RegionWest -0.552*** -0.569***
## (0.009) (0.009)
##
## LogIncome 0.861***
## (0.009)
##
## IncQuintQuint2 0.299***
## (0.009)
##
## IncQuintQuint3 0.492***
## (0.009)
##
## IncQuintQuint4 0.643***
## (0.009)
##
## IncQuintHigh Income 0.803***
## (0.009)
##
## Constant -0.931*** 7.892***
## (0.101) (0.015)
##
## --------------------------------------------------------------------------------
## Observations 88,409 88,409
## R2 0.616 0.618
## Adjusted R2 0.616 0.617
## Residual Std. Error 0.839 (df = 88389) 0.838 (df = 88386)
## F Statistic 7,457.186*** (df = 19; 88389) 6,486.468*** (df = 22; 88386)
## ================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# Even though R-square is not that different, AIC/BIC suggest Income
# Quintile may be the better way to enter Income in regression
AIC(Log_Inc, Inc_Q)
## df AIC
## Log_Inc 21 219954.5
## Inc_Q 24 219569.2
BIC(Log_Inc, Inc_Q)
## df BIC
## Log_Inc 21 220151.7
## Inc_Q 24 219794.6
reg <- soup %>%
group_by(State_Name, Region) %>%
do(fit = lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + month_dummy, .))
d3 <- reg %>%
tidy(fit) %>%
filter(term == "log_price_prog") %>%
arrange(desc(estimate))
datatable(d3)
What have done here? In those 2-3 lines of code we have run a regression model for each state separately. We then use “broom” package in R to extract all parameters. google “broom dplyr” for more example on this concept.
Let’s look at Progresso price elasticity across all stores.
ggplot(d3, aes(x = reorder(State_Name, -estimate), y = estimate, color = Region)) +
geom_point() + theme_minimal() + labs(x = "State") + geom_errorbar(aes(ymin = estimate -
std.error, ymax = estimate + std.error, color = Region), width = 0.2) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + ggtitle("Price elasticity for Progresso Across US States")
reg1 <- soup %>%
group_by(Region) %>%
do(fit = lm(log_prog_vol ~ log_price_prog + log_price_camp + log_price_PL + month_dummy, .))
d <- reg1 %>%
tidy(fit) %>%
filter(term != "(Intercept)")
e <- filter(d, term == "log_price_prog" | term == "log_price_camp" | term ==
"log_price_PL")
ggplot(e, aes(x = Region, y = estimate, fill = term)) +
geom_bar(position = position_dodge(),
stat = "identity", colour = "black") +
geom_errorbar(aes(ymin = estimate -
std.error, ymax = estimate + std.error), width = 0.2, position = position_dodge(0.9)) +
theme_minimal() + labs(fill = "Brand Price") +
ggtitle("Own & Cross Price elasticity for Progresso Across Census Regions")
Use the Income Quintiles as a “grouping” variable and estimate the price elasticity for each Income Quintile. Make sure to add other control variables (prices, seasonality, census regions)
Let’s take a quick look at the income distribution
Desc(soup$median_inc, main = "Income distribution", plotit = TRUE)
## -------------------------------------------------------------------------
## Income distribution
##
## length n NAs unique 0s mean
## 88'409 88'409 0 1'657 0 49'980.88
## 100.0% 0.0% 0.0%
##
## .05 .10 .25 median .75 .90
## 28'760.00 32'228.00 38'085.00 46'731.00 58'113.00 70'854.00
##
## range sd vcoef mad IQR skew
## 173'204.00 17'176.81 0.34 14'274.47 20'028.00 1.57
##
## meanCI
## 49'867.66
## 50'094.11
##
## .95
## 81'863.00
##
## kurt
## 4.72
##
## lowest : 12'262.0 (17), 16'257.0 (65), 16'919.0 (21), 17'999.0 (3), 18'158.0 (29)
## highest: 139'997.0 (67), 141'428.0 (57), 146'755.0 (44), 154'817.0 (67), 185'466.0 (8)
There is no missing data. The mean income is $49,981. The distribution is positively skewed (1.57), witha minimum income of $12,262 and a maximum income of $139,997.
Let us use the income quintiles to group the instances.
soup_income <- mutate(soup, income=cut(median_inc,
quantile(median_inc,c(0,0.2,0.4,0.6,0.8,1)),
include.lowest=TRUE,
labels=1:5))
table(soup_income$income)
##
## 1 2 3 4 5
## 17702 17687 17680 17715 17625
Let us now create 5 new datasets for each of the income quintiles.
Examine the store demographics in greater detail.