Workshop 5, Business Analytics for Decision Making

Authors

Alberto Dorantes D., Ph.D.

Monterrey Tech, EGADE Business School

Abstract
In this workshop we apply the regression model to a practical example in Marketing: estimating price elasticity of demand.

1 Workshop Directions

You have to work on Google Colab for this workshop. In Google Colab, you MUST LOGIN with your @tec.mx account and then create a google colab document for each workshop.

You must share each Colab document (workshop) with the following accounts:

  • cdorante@tec.mx

You must give Edit privileges to this account.

Rename your Notebook as “W2-YourFirstName YourLastname”.

You must submit your workshop by uploading your Google Colab link in Canvas. What you have to write in your workshop? You have to:

  • You have to REPLICATE and RUN all the Python code, and

  • DO ALL CHALLENGES stated in sections. These challenges can be Python code or just responding QUESTIONS with your own words and in CAPITAL LETTERS. You have to WRITE CLEARLY so that I can see your LINE OF THINKING!

I strongly recommended you to write your OWN NOTES about the topics as if it were your study NOTEBOOK.

2 Application 1: Estimating price elasticity of demand

2.1 Price chage versus sales volume change

A company is analyzing a price promotion for one of its consumer products for next month. The current price for each unit of the product is $100.00, and the total volume sold in this month was 1,000 units. If a company offers a sales promotion of 20% off for next month, what is the minimum % increase in volume (% in units) of the product needed in order to end up with no loss in value sales (0% of value sales change) for the next month?

More generally, for each of the following price discounts, what should be the % increase in sales volume in order to achieve the same value sales than this month?

% Discount depth % Volume growth to level sales
-5% ?
-10% ?
-15% ?
-20% ?
-25% ?
-30% ?
-35% ?
-40% ?
-45% ?
-50% ?

Can you find a way to quickly calculate the missing values of this table in R?

Many people would think that the solution is to apply the same % price discount to the % volume increase, but let see what happens.

For the case of -20% price change, we would have a discount price of $80.00. Then if next month volume sales increase in +20%, then the units sold would be 1,200 units, and the total value sales would be $80.00 * 1,200, which is$96,000.00. Then, value sales would be down $4,000.00 compared to this month value sales.

2.1.1 Data management to create a table of price scenarios

We will define the variables for the problem, and create a data frame with the price scenarios.

We create the variables for the problem:

import pandas as pd
import numpy as np
# We create variables for price and volume of the product in the current month:
Price0 = 100
Vol0 = 1000
# We calculate value sales of this month:
Sales0 = Price0 * Vol0

For price, volume and value sales we will create a table for the different price discount scenarios. We create a data frame for the table:

# We create a vector as a sequence for the discount scenarios:
D = np.arange(start=0, stop=-0.55, step=-0.05)
# We create a DataFrame using the discount vector
scenarios = pd.DataFrame({'D': D})

Now we can create columns for the different prices, volume and % increases for each scenario:

scenarios['Price1'] = Price0 * (1 + scenarios['D'])

We start assigning the intuitive (but wrong) volume increase for each scenario and see what happens to value sales:

# G column will be the volume growth in %, and will be equal to 
#  the % price discount:
scenarios['G'] = -1 * scenarios['D']

# Create the column for volume sales for each scenario (Vol1):
scenarios['Vol1'] = Vol0 * (1 + scenarios['G'])

Now we can create a column for value sales in money:

scenarios['Sales1'] = scenarios['Price1'] * scenarios['Vol1']

We see that volume sales increases, while value sales decreases with each discount depth scenario:

print(scenarios)
Discount Scenarios
       D  Price1     G    Vol1    Sales1
0   0.00   100.0 -0.00  1000.0  100000.0
1  -0.05    95.0  0.05  1050.0   99750.0
2  -0.10    90.0  0.10  1100.0   99000.0
3  -0.15    85.0  0.15  1150.0   97750.0
4  -0.20    80.0  0.20  1200.0   96000.0
5  -0.25    75.0  0.25  1250.0   93750.0
6  -0.30    70.0  0.30  1300.0   91000.0
7  -0.35    65.0  0.35  1350.0   87750.0
8  -0.40    60.0  0.40  1400.0   84000.0
9  -0.45    55.0  0.45  1450.0   79750.0
10 -0.50    50.0  0.50  1500.0   75000.0

Then we need to find higher % volume growth (values of G) for each discount depth to end up with the same value sales for all discount depths. We can do this by trial and error, but we might take some time. Then, the question is: can we find a formula for G that depends of the discount depth? Let’s play with the relationship between price, price discount, volume, volume growth and value sales:

Value sales is equal to unit price times volume sales. I will use Then:

S_{0}=Price_{0}*Vol_{0}

S_{1}=Price_{1}*Vol_{1} Where

S_0 = Value sales this month S_1 = Value sales next month Price_0 = Unit price this month Price_1 = Unit price next month Vol_0 = Volume sales this month Vol_1 = Volume sales next month

We can express unit price and volume of this month with respect to the previous price and volume: Price_{1}=Price_{0}*\left(1+D\right) Vol_{1}=Vol_{0}*\left(1+G\right) Where: D = % unit price change from this month to the next month G = % growth of units sold from this month to the next

If we want to force that the value sales of next month (S_1) be the same as the value sales of this month (S_0), then:

S_{0}=S_{1} Price_{0}*Vol_{0}=Price_{1}*Vol_{1} Expressing Price_1 and Vol_1 as functions of Price_0 and Vol_0:

Price_{0}*Vol_{0}=Price_{0}*\left(1+D\right)*Vol_{0}*\left(1+G\right) Dividing both sides by Price_0 * Vol_0:

1=\left(1+D\right)*\left(1+G\right) Dividing by (1+G) both sides:

\frac{1}{\left(1+D\right)}=\left(1+G\right)

Subtracting 1 in both sides:

G=\frac{1}{\left(1+D\right)}-1

Let’s apply this formula to update G, Vol1 and Sales1 in our table of discount scenarios:

scenarios['G'] = 1 / (1 + scenarios['D']) - 1
scenarios['Vol1'] = Vol0 * ( 1+scenarios['G'])
scenarios['Sales1'] = scenarios['Price1'] * scenarios['Vol1']

We see the content of the scenarios table:

print(scenarios)
       D  Price1         G         Vol1    Sales1
0   0.00   100.0  0.000000  1000.000000  100000.0
1  -0.05    95.0  0.052632  1052.631579  100000.0
2  -0.10    90.0  0.111111  1111.111111  100000.0
3  -0.15    85.0  0.176471  1176.470588  100000.0
4  -0.20    80.0  0.250000  1250.000000  100000.0
5  -0.25    75.0  0.333333  1333.333333  100000.0
6  -0.30    70.0  0.428571  1428.571429  100000.0
7  -0.35    65.0  0.538462  1538.461538  100000.0
8  -0.40    60.0  0.666667  1666.666667  100000.0
9  -0.45    55.0  0.818182  1818.181818  100000.0
10 -0.50    50.0  1.000000  2000.000000  100000.0
Discount Scenarios
       D  Price1         G         Vol1    Sales1
0   0.00   100.0  0.000000  1000.000000  100000.0
1  -0.05    95.0  0.052632  1052.631579  100000.0
2  -0.10    90.0  0.111111  1111.111111  100000.0
3  -0.15    85.0  0.176471  1176.470588  100000.0
4  -0.20    80.0  0.250000  1250.000000  100000.0
5  -0.25    75.0  0.333333  1333.333333  100000.0
6  -0.30    70.0  0.428571  1428.571429  100000.0
7  -0.35    65.0  0.538462  1538.461538  100000.0
8  -0.40    60.0  0.666667  1666.666667  100000.0
9  -0.45    55.0  0.818182  1818.181818  100000.0
10 -0.50    50.0  1.000000  2000.000000  100000.0

We got the expected result. Sales for next month (Sales1) is the same for each discount scenario. Interestingly, we can see that usually we need to have a higher % of volume growth compared to the absolute value of % in price change. For a discount of 50%, we need to sell 100% more (the double) to get the same value sales!

Once we played with the relationship between % of price change and % of volume sales, we will review the concept of price elasticity of demand.

2.2 Price elasticity of demand

Price elasticity measures the sensitivity of changes in volume sales with respect to changes in unit price. We can use % as a measure of change in volume sales and price change. Then, we can state price elasticity of demand of a product as:

E=\frac{\%\Delta Volume}{\%\Delta Price} Where E= Price Elasticity of demand

Using our notation: E=\frac{G}{D} It is easy to understand the concept of price elasticity, but the estimation of elasticity is not quite straight forward.Let’s see how we can calculate E for each price discount of our example. We add a new column for price elasticity in our scenarios table:

scenarios['E'] = scenarios['G'] / scenarios['D']

We see the content of the scenarios table:

#|eval: false
print(scenarios)
       D  Price1         G         Vol1    Sales1         E
0   0.00   100.0  0.000000  1000.000000  100000.0       NaN
1  -0.05    95.0  0.052632  1052.631579  100000.0 -1.052632
2  -0.10    90.0  0.111111  1111.111111  100000.0 -1.111111
3  -0.15    85.0  0.176471  1176.470588  100000.0 -1.176471
4  -0.20    80.0  0.250000  1250.000000  100000.0 -1.250000
5  -0.25    75.0  0.333333  1333.333333  100000.0 -1.333333
6  -0.30    70.0  0.428571  1428.571429  100000.0 -1.428571
7  -0.35    65.0  0.538462  1538.461538  100000.0 -1.538462
8  -0.40    60.0  0.666667  1666.666667  100000.0 -1.666667
9  -0.45    55.0  0.818182  1818.181818  100000.0 -1.818182
10 -0.50    50.0  1.000000  2000.000000  100000.0 -2.000000
#|echo: false
tools.display_dataframe_to_user(name="Discount Scenarios", dataframe=scenarios)
Discount Scenarios
       D  Price1         G         Vol1    Sales1         E
0   0.00   100.0  0.000000  1000.000000  100000.0       NaN
1  -0.05    95.0  0.052632  1052.631579  100000.0 -1.052632
2  -0.10    90.0  0.111111  1111.111111  100000.0 -1.111111
3  -0.15    85.0  0.176471  1176.470588  100000.0 -1.176471
4  -0.20    80.0  0.250000  1250.000000  100000.0 -1.250000
5  -0.25    75.0  0.333333  1333.333333  100000.0 -1.333333
6  -0.30    70.0  0.428571  1428.571429  100000.0 -1.428571
7  -0.35    65.0  0.538462  1538.461538  100000.0 -1.538462
8  -0.40    60.0  0.666667  1666.666667  100000.0 -1.666667
9  -0.45    55.0  0.818182  1818.181818  100000.0 -1.818182
10 -0.50    50.0  1.000000  2000.000000  100000.0 -2.000000

We see that for each discount depth, the estimation for E is different. This sounds weird, isn’t? Is this estimation is correct, then do we need to calculate a elasticity value for different values of discount depth?

Before responding to these questions, we will show a scatter plot to better appreciate the relationship between price % change and volume % change considering all discount scenarios:

import matplotlib.pyplot as plt
plt.scatter(scenarios['D'], scenarios['G'], marker='o',color='skyblue',edgecolor='blue')
plt.xlabel('Discount (D)')
plt.ylabel('Volume Growth (G)')
plt.grid(True)
plt.show()

We see a curvilinear relationship between price % change and volume % change to keep the same level of value sales.

We can also see the relationship between price and volume (not in % changes; in values):

plt.clf()
plt.scatter(scenarios['Price1'], scenarios['Vol1'], marker='o',color='skyblue',edgecolor='blue')
plt.xlabel('Price')
plt.ylabel('Volume')
plt.grid(True)
plt.show()

We also see a curvilinear relationship between price and volume to keep the same level of value sales.

Usually volume % change is a function of price % change; the deeper the price discount the higher the volume % change. What we plot is a mathematical function of the %volume change with respect to %price change:

G=f(D) Then, Elasticity can be defined as the derivative of G with respect to D, since elasticity is how much G changes with changes in D:

E=\frac{\delta G}{\delta D}=G'

Remember that the derivative is the slope of a tangent to the function in a specific value of the X variable. If we see the previous plot, we can see that if we draw an imaginary tangent to the curve for each values of discount %, we will see that the more negative the discount, the more inclined the tangent, so the elasticity changes with different %price discounts.

Then, the question can be: can we have a measure of elasticity with the same value for any value of %price discount?

The response is YES! If we calculate price elasticity using continuously compounded (cc) % changes for price and volume, instead of simple % changes, then we will end up with a measure of elasticity that does not change with changes in %price.

One way to calculate cc % change of a variable is just to take the natural log of 1 + simple % change. To calculate the cc % change of price and volume:

d=log\left(1+D\right) g=log\left(1+G\right) Then we use these cc % changes to estimate elasticity:

e=\frac{g}{d} I used small letters (d, g) for the cc % changes of discount and sales volume growth; and e for the new elasticity measure using the cc % changes.

Let’s do these calculations for our example:

We first calculate the continuously compounded % change of price and volume. We will add these variables as columns of our scenarios table:

scenarios['d'] = np.log(1+scenarios['D'])
scenarios['g'] = np.log(1+scenarios['G'])
scenarios['e'] = scenarios['d'] / scenarios['g']

We see the content of the scenarios table:

print(scenarios)
Discount Scenarios
       D  Price1         G         Vol1  ...         E         d         g    e
0   0.00   100.0  0.000000  1000.000000  ...       NaN  0.000000  0.000000  NaN
1  -0.05    95.0  0.052632  1052.631579  ... -1.052632 -0.051293  0.051293 -1.0
2  -0.10    90.0  0.111111  1111.111111  ... -1.111111 -0.105361  0.105361 -1.0
3  -0.15    85.0  0.176471  1176.470588  ... -1.176471 -0.162519  0.162519 -1.0
4  -0.20    80.0  0.250000  1250.000000  ... -1.250000 -0.223144  0.223144 -1.0
5  -0.25    75.0  0.333333  1333.333333  ... -1.333333 -0.287682  0.287682 -1.0
6  -0.30    70.0  0.428571  1428.571429  ... -1.428571 -0.356675  0.356675 -1.0
7  -0.35    65.0  0.538462  1538.461538  ... -1.538462 -0.430783  0.430783 -1.0
8  -0.40    60.0  0.666667  1666.666667  ... -1.666667 -0.510826  0.510826 -1.0
9  -0.45    55.0  0.818182  1818.181818  ... -1.818182 -0.597837  0.597837 -1.0
10 -0.50    50.0  1.000000  2000.000000  ... -2.000000 -0.693147  0.693147 -1.0

[11 rows x 9 columns]

Interestingly the magnitudes of d and g are exactly the same for each discount scenarios, but with opposite sign. Then the new elasticity measure e is equal to -1 for all discount scenarios! Now we can ONLY use 1 measure for price elasticity of demand no matter the discount % to be applied to unit price. Another advantage of using cc % changes of price and volume is that the relationship between d and g are now linear instead of curvi-linear:

plt.clf()
plt.scatter(scenarios['d'], scenarios['g'], marker='o',color='skyblue',edgecolor='blue')
plt.xlabel('Discount (d) in log %')
plt.ylabel('Volume Growth (g) in log %')
plt.grid(True)
plt.show()

The slope of a tangent of this line will always be equal to -1, that is the elasticity measure e!

Another interesting relationship is between the log of price and the log of volume:

plt.clf()
plt.scatter(np.log(scenarios['Price1']), np.log(scenarios['Vol1']), marker='o',color='skyblue',edgecolor='blue')
plt.xlabel('Log of Price')
plt.ylabel('Log of Volume')
plt.grid(True)
plt.show()

This relationship is also linear, and the slope of the tangent for each level of values for log of price is also -1, which is the new elasticity measure.

2.3 How sales moves with changes in price elasticity

We can simulate how sales volume and sales value changes with different values of price elasticity of a product. We need to express volume growth G in terms of elasticity e, and recalculate most of the columns for our scenario table.

We start with the e formula, and then get the g, and then get the g (cc % volume change) into G (simple % volume change).

Since e=\frac{g}{d}, then g=e*d. Now we can get G from g. Since g=log\left(1+G\right), then we apply the exponential function to both sides and we get:

e^g=\left(1+G\right)

Remember that e (the Euler number 2.71…) is the base for the natural logarithm.

Then we can update the columns for g, G, Vol1, and Sales1 in our scenario table with specific values for elasticity e.

Let’s start moving elasticity from -1 to -2 and see how the sales volume (Vol1) and sales value (Sales1) changes:

e = -1.5
scenarios['e']=e
scenarios['g']=scenarios['e'] * scenarios['d']
scenarios['G']= np.exp(scenarios['g']) - 1
scenarios['Vol1'] = Vol0 * (1+scenarios['G'])
scenarios['Sales1'] =scenarios['Price1'] * scenarios['Vol1']

We see the content of the scenarios table:

print(scenarios)
Discount Scenarios
       D  Price1         G         Vol1  ...         E         d         g    e
0   0.00   100.0  0.000000  1000.000000  ...       NaN  0.000000 -0.000000 -1.5
1  -0.05    95.0  0.079977  1079.977213  ... -1.052632 -0.051293  0.076940 -1.5
2  -0.10    90.0  0.171214  1171.213948  ... -1.111111 -0.105361  0.158041 -1.5
3  -0.15    85.0  0.276062  1276.061517  ... -1.176471 -0.162519  0.243778 -1.5
4  -0.20    80.0  0.397542  1397.542486  ... -1.250000 -0.223144  0.334715 -1.5
5  -0.25    75.0  0.539601  1539.600718  ... -1.333333 -0.287682  0.431523 -1.5
6  -0.30    70.0  0.707469  1707.469442  ... -1.428571 -0.356675  0.535012 -1.5
7  -0.35    65.0  0.908227  1908.226686  ... -1.538462 -0.430783  0.646174 -1.5
8  -0.40    60.0  1.151657  2151.657415  ... -1.666667 -0.510826  0.766238 -1.5
9  -0.45    55.0  1.451636  2451.635864  ... -1.818182 -0.597837  0.896756 -1.5
10 -0.50    50.0  1.828427  2828.427125  ... -2.000000 -0.693147  1.039721 -1.5

[11 rows x 9 columns]

We can plot the discount % vs volume growth:

plt.clf()
plt.scatter(scenarios['D'], scenarios['G'], marker='o',color='skyblue',edgecolor='blue')
plt.xlabel('Price Discount (D)')
plt.ylabel('Volume % growth (G)')
plt.grid(True)
plt.show()

We can see that not only volume sales increases, but also sales value increases with each level of discount depth.

Then, what can you learn/conclude about price elasticity e with respect to changes in sales? In which situations might be a good idea to do aggressive discounts for a product? briefly explain.

2.4 Estimation of price elasticity with real data

In the previous sections we estimated price elasticity of a hypothetical product with specific scenarios of discount depth. Now the question is, how can we estimate price elasticity of a consumer product?

First we need historical sales data of a the consumer product. We need historical sales in value ($) and sales in volume, and we can calculate an average unit price per period. We can get daily, weekly, monthly or quarterly data. Using monthly or quarterly data is a good idea since the volatility of sales at a weekly and daily level is high so that it is harder to get robust estimation of elasticity. In addition, using monthly or quarterly data allows us to easily measure the seasonality of a consumer product.

Imagine we have monthly data for sales volume and sales value for a consumer product for the past 36 months. How can you estimate price elasticity?

It is easily to calculate unit price for each month. We just divide sales value by sales volume for each month. Now we will have a column for historical prices, a column for sales volume and a column for sales value. Then we can estimate price elasticity by using a simple regression model.

For the previous hypothetical exercise, we can also estimate price elasticity using a simple regression model. The way we set up this simple regression is using the natural logarithm of sales volume as the dependent variable, and the natural log of price as the independent variable. We can express this regression model as:

lnvol_{t}=b_{0}+b_{1}*lnprice_{t}+\varepsilon_{t}

Where:

lnvol_{t} = log of sales volume at period t, which is the dependent variable of the model.

lnprice_{t} = log of unit price at period t, which is the independent or explanatory variable of the model.

\varepsilon_{t} is called the error of the model, and it is supposed to behave like a normal distribution with mean=0 and a specific standard deviation: \varepsilon_{t}\sim N(0,\sigma_{\varepsilon})

b_{0}= coefficient for the intercept; it represents the value of the dependent variable, in this case, log of sales volume when the value of the independent variable (log of price) is equal to zero. In other words, it is the point where the regression line crosses the Y axis.

b_{1} = coefficient for the slope of the regression line. This coefficient is a measure of sensitivity of how much the dependent variable moves for each 1-unit move in the independent variable.

The equation for the regression model is actually the expected value of the dependent variable. Since the mean of the error is zero, then:

E\left[lnvol\right]=b_{0}+b_{1}*lnprice_{t}

This regression equation is actually a line that represents all the pairs of values for log of price and log of sales volume.

Let’s run a regression model using the hypothetical example of the discount scenarios:

import statsmodels.formula.api as smf
scenarios['logVol'] = np.log(scenarios['Vol1'])
scenarios['logPrice'] = np.log(scenarios['Price1'])
# I estimate the OLS regression model:
model1 = smf.ols('logVol ~ logPrice',data=scenarios).fit()
# I display the summary of the regression: 
print(model1.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 logVol   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 1.409e+28
Date:              mié., 19 feb. 2025   Prob (F-statistic):          1.09e-123
Time:                        20:48:19   Log-Likelihood:                 341.10
No. Observations:                  11   AIC:                            -678.2
Df Residuals:                       9   BIC:                            -677.4
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     13.8155   5.43e-14   2.54e+14      0.000      13.816      13.816
logPrice      -1.5000   1.26e-14  -1.19e+14      0.000      -1.500      -1.500
==============================================================================
Omnibus:                        0.211   Durbin-Watson:                   0.006
Prob(Omnibus):                  0.900   Jarque-Bera (JB):                0.293
Skew:                          -0.250   Prob(JB):                        0.864
Kurtosis:                       2.375   Cond. No.                         89.5
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The coefficient b_{0} or intercept = 13.81. b_{1}=-1.5, which is the same value of elasticity e we got previously using cc % changes in price and volume. In this hypothetical case, all the points of the data lie on the regression line, so all errors are equal to zero and the standard errors (standard deviation of the regression coefficients) are actually zero.

Now move to a real-world consumer product and estimate price elasticity. We will download a dataset of historical monthly data of 4 consumer products sold in retail stores. The dataset was constructed using a real-world dataset of consumer products of one specific category and sold in big retail stores in Mexico. Do the following to directly download the dataset into your R:

import requests
from io import BytesIO

# URL of the Excel file
url = 'http://www.apradie.com/datos/salesdata1.xlsx'

# Define headers to mimic a web browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# Send a GET request to download the file with custom headers
response = requests.get(url, headers=headers)

# Check if the request was successful
if response.status_code == 200:
    # Use BytesIO to create a file-like object in memory
    excel_file = BytesIO(response.content)
    
    # Read the Excel file into a pandas DataFrame
    sales = pd.read_excel(excel_file)
    sales = sales.set_index('month') 
  
else:
    print(f"Failed to retrieve the file. Status code: {response.status_code}")

We can see see the content of this file.

    print(sales)
Monthly sales of skus
             qsku1    vsku1   qsku2  ...    vsku3    qsku4     vsku4
month                                ...                            
2017-05-01  266745  1733845  213811  ...  3882460  1737617  19353875
2017-06-01  293427  1819250  235841  ...  4048995  1743237  18915640
2017-07-01  231065  1432600  201680  ...  3813775  1690972  18800880
2017-08-01  196930  1201275  190614  ...  3831065  1835160  20683400
2017-09-01  174157  1132020  171971  ...  5493565  2067265  21311065
2017-10-01  168432  1077965  177789  ...  7939530  2512034  24061980
2017-11-01  178442  1070650  158040  ...  7179055  2370224  22126260
2017-12-01  167541  1072265  168935  ...  5035570  1997318  20296275
2018-01-01  163678  1063905  165500  ...  6999505  2084443  20516960
2018-02-01  142067   966055  137946  ...  8786170  2253161  20238705
2018-03-01  197475  1204600  172930  ...  9197805  2507851  23259990
2018-04-01  179316  1165555  181410  ...  7192640  2119443  22118375
2018-05-01  172579  1173535  183918  ...  6668525  1964131  21308690
2018-06-01  169240  1150830  179595  ...  6677075  1953905  20883660
2018-07-01  146817   998355  182320  ...  5813525  1793574  19867255
2018-08-01  138727   984960  171000  ...  7079685  1968629  20860765
2018-09-01  128471   937840  154578  ...  7479160  2138188  21187755
2018-10-01  133185   958930  152312  ...  7889180  2047528  20564745
2018-11-01  112220   796765  136824  ...  5891615  1676519  17866080
2018-12-01  109825   779760  140745  ...  6275985  1710498  18380125
2019-01-01  115084   817095  146799  ...  7612255  1868065  19110580
2019-02-01  112394   798000  135993  ...  7548415  2007173  19015390
2019-03-01  134466   941260  169565  ...  8287705  2219548  21716905
2019-04-01  139431   906300  164997  ...  7375990  2000818  20130690
2019-05-01  149960  1019730  176048  ...  7911790  2151228  21210080
2019-06-01  141456  1004340  200725  ...  7754945  2147632  20699170
2019-07-01  104685   753730  164731  ...  5834520  1694306  17421480
2019-08-01  126706   912285  204915  ...  6291945  2027945  20202605
2019-09-01  132727   968905  190135  ...  5816565  2199420  20475635
2019-10-01  142338  1081765  165350  ...  4309960  1771625  18403970
2019-11-01  118651   913615  154887  ...  3938700  1585935  16310740
2019-12-01   82863   629755  147879  ...  2947280  1624348  16548050
2020-01-01   95525   725990   99323  ...  2777990  1604222  15487280
2020-02-01   82477   635075  130238  ...  2284655  1730077  16541685
2020-03-01  112063   862885  166859  ...  2325600  1913984  19594795
2020-04-01   90136   676020  151963  ...  1364390  1457315  16395955
2020-05-01   91567   686755  159098  ...  1313945  1382876  15917535

[37 rows x 8 columns]

The columns that start with v are value sales, and the columns that start with q are volume sales (in units). Then, vsku1 is the value sales (in $) of product 1, qsku1 is the # of unit sold of product 1, etc.

We will work with product 1. We can calculate an average monthly price for product 1, and also columns for the log of value sales and log of price:

sales['psku1'] = sales['vsku1'] / sales['qsku1']
sales['logpsku1'] = np.log(sales['psku1'])
sales['logqsku1'] = np.log(sales['qsku1'])

We can start visualizing the sales data for this product:

plt.clf()
plt.plot(sales.index, sales['qsku1'], marker='o', linestyle='-')
plt.xlabel('Month')
plt.ylabel('Volume Sales')
plt.title('Volume Sales of qsku1')
plt.grid(True)
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
(array([17287., 17410., 17532., 17652., 17775., 17897., 18017., 18140.,
       18262., 18383.]), [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

We can see that the number of units sold of product 1 is dramatically going down for the last 36 months.

We can calculate the % decline of volume sales from the first month until the most recent month:

# Calculate the percentage change
first_value = sales['qsku1'].iloc[0]
last_value = sales['qsku1'].iloc[-1]
volume_growth = ((last_value - first_value) / first_value) * 100

The % decline in volume sales of product 1 is -6567.2459%. If you were the responsible for this product in a company, you must be very, very concerned.

Now we do a plot for historical price of product 1:

plt.clf()
plt.plot(sales.index, sales['psku1'], marker='o', linestyle='-')
plt.xlabel('Month')
plt.ylabel('Price')
plt.title('Price of qsku1')
plt.grid(True)
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
(array([17287., 17410., 17532., 17652., 17775., 17897., 18017., 18140.,
       18262., 18383.]), [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

We can see that the unit price of product 1 has been going up for the last 36 months. We can calculate the % price increase in the whole period:

# Calculate the percentage change
first_value = sales['psku1'].iloc[0]
last_value = sales['psku1'].iloc[-1]
price_growth = ((last_value - first_value) / first_value) * 100

The price increased 1538.4869% in the last 36 months (from the first month up to the last month of the data).

Now we can visualize how sales volume changes with different price levels. We plot price vs volume in log scale:

import seaborn as sns

plt.clf()
plt.figure(figsize=(10, 6))
sns.regplot(x='logpsku1', y='logqsku1', data=sales, scatter_kws={'alpha':0.5}, line_kws={'color':'red'}) # Use regplot for regression line
plt.xlabel('Log of psku1')
plt.ylabel('Log of qsku1')
plt.title('Scatter Plot of Log(psku1) vs. Log(qsku1) with Regression Line ')
plt.grid(True)
plt.show()

I also plotted the regression line between price and volume (in log scale). We clearly see a negative relationship between price and sales volume.

Now we can run a simple linear regression model using the log of sales volume as dependent variable, and the log of price as independent variable:

# I estimate the OLS regression model:
model2 = smf.ols('logqsku1 ~ logpsku1',data=sales).fit()
# I display the summary of the regression: 
print(model2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:               logqsku1   R-squared:                       0.705
Model:                            OLS   Adj. R-squared:                  0.697
Method:                 Least Squares   F-statistic:                     83.67
Date:              mié., 19 feb. 2025   Prob (F-statistic):           8.29e-11
Time:                        20:48:23   Log-Likelihood:                 15.085
No. Observations:                  37   AIC:                            -26.17
Df Residuals:                      35   BIC:                            -22.95
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     18.4376      0.720     25.596      0.000      16.975      19.900
logpsku1      -3.4044      0.372     -9.147      0.000      -4.160      -2.649
==============================================================================
Omnibus:                        4.792   Durbin-Watson:                   1.003
Prob(Omnibus):                  0.091   Jarque-Bera (JB):                3.829
Skew:                           0.782   Prob(JB):                        0.147
Kurtosis:                       3.191   Cond. No.                         64.9
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Then, the beta 1 coefficient of this regression is -3.4, which is an estimation of price elasticity of demand.

We can interpret price elasticity as follows: for each 1% increase in the price of product, its own demand is expected to decrease in about 3.4%. This result make sense after looking the volume decline vs the price increase.

It is important to note that this measure of price elasticity does not consider important factors that happen with consumer products in real life. Important drivers of demand for consumer products are 1) seasonality, 2) organic growing trend, and 3) cross-elasticities with competing/substitute products and with complementary products, 4) other commercial factors such as distribution, advertising, promotions, and 5) external events such as economic recessions.

2.5 Alternative measures for direct elasticity of demand

In the case of consumer products it is recommended to include some of the factors mentioned above in the econometric model in order to calculate a more robust measure of direct price elasticity, and also measures of cross-elasticities with significant competing or complementary products. We need to use more advanced econometric models. Due to the complexity of these models, we cannot cover them in detail in this workshop, but we will mention some of the most important econometric models used that consider these factors.

The most popular models are called ARIMA/SARIMA and ARIMAX models. ARIMA stands for Autoregressive Integrated Moving Average models. SARIMA stands for Seasonal ARIMA models. ARIMAX stands for ARIMA/SARIMA models with explanatory variables (X’s variables) or factors.

3 References