import pandas as pd
import numpy as np
# We create variables for price and volume of the product in the current month:
= 100
Price0 = 1000
Vol0 # We calculate value sales of this month:
= Price0 * Vol0 Sales0
Workshop 5, Business Analytics for Decision Making
1 Workshop Directions
You have to work on Google Colab for this workshop. In Google Colab, you MUST LOGIN with your @tec.mx account and then create a google colab document for each workshop.
You must share each Colab document (workshop) with the following accounts:
- cdorante@tec.mx
You must give Edit privileges to this account.
Rename your Notebook as “W2-YourFirstName YourLastname”.
You must submit your workshop by uploading your Google Colab link in Canvas. What you have to write in your workshop? You have to:
You have to REPLICATE and RUN all the Python code, and
DO ALL CHALLENGES stated in sections. These challenges can be Python code or just responding QUESTIONS with your own words and in CAPITAL LETTERS. You have to WRITE CLEARLY so that I can see your LINE OF THINKING!
I strongly recommended you to write your OWN NOTES about the topics as if it were your study NOTEBOOK.
2 Application 1: Estimating price elasticity of demand
2.1 Price chage versus sales volume change
A company is analyzing a price promotion for one of its consumer products for next month. The current price for each unit of the product is $100.00, and the total volume sold in this month was 1,000 units. If a company offers a sales promotion of 20% off for next month, what is the minimum % increase in volume (% in units) of the product needed in order to end up with no loss in value sales (0% of value sales change) for the next month?
More generally, for each of the following price discounts, what should be the % increase in sales volume in order to achieve the same value sales than this month?
% Discount depth | % Volume growth to level sales |
---|---|
-5% | ? |
-10% | ? |
-15% | ? |
-20% | ? |
-25% | ? |
-30% | ? |
-35% | ? |
-40% | ? |
-45% | ? |
-50% | ? |
Can you find a way to quickly calculate the missing values of this table in R?
Many people would think that the solution is to apply the same % price discount to the % volume increase, but let see what happens.
For the case of -20% price change, we would have a discount price of $80.00. Then if next month volume sales increase in +20%, then the units sold would be 1,200 units, and the total value sales would be $80.00 * 1,200, which is$96,000.00. Then, value sales would be down $4,000.00 compared to this month value sales.
2.1.1 Data management to create a table of price scenarios
We will define the variables for the problem, and create a data frame with the price scenarios.
We create the variables for the problem:
For price, volume and value sales we will create a table for the different price discount scenarios. We create a data frame for the table:
# We create a vector as a sequence for the discount scenarios:
= np.arange(start=0, stop=-0.55, step=-0.05)
D # We create a DataFrame using the discount vector
= pd.DataFrame({'D': D}) scenarios
Now we can create columns for the different prices, volume and % increases for each scenario:
'Price1'] = Price0 * (1 + scenarios['D']) scenarios[
We start assigning the intuitive (but wrong) volume increase for each scenario and see what happens to value sales:
# G column will be the volume growth in %, and will be equal to
# the % price discount:
'G'] = -1 * scenarios['D']
scenarios[
# Create the column for volume sales for each scenario (Vol1):
'Vol1'] = Vol0 * (1 + scenarios['G']) scenarios[
Now we can create a column for value sales in money:
'Sales1'] = scenarios['Price1'] * scenarios['Vol1'] scenarios[
We see that volume sales increases, while value sales decreases with each discount depth scenario:
print(scenarios)
Discount Scenarios
D Price1 G Vol1 Sales1
0 0.00 100.0 -0.00 1000.0 100000.0
1 -0.05 95.0 0.05 1050.0 99750.0
2 -0.10 90.0 0.10 1100.0 99000.0
3 -0.15 85.0 0.15 1150.0 97750.0
4 -0.20 80.0 0.20 1200.0 96000.0
5 -0.25 75.0 0.25 1250.0 93750.0
6 -0.30 70.0 0.30 1300.0 91000.0
7 -0.35 65.0 0.35 1350.0 87750.0
8 -0.40 60.0 0.40 1400.0 84000.0
9 -0.45 55.0 0.45 1450.0 79750.0
10 -0.50 50.0 0.50 1500.0 75000.0
Then we need to find higher % volume growth (values of G) for each discount depth to end up with the same value sales for all discount depths. We can do this by trial and error, but we might take some time. Then, the question is: can we find a formula for G that depends of the discount depth? Let’s play with the relationship between price, price discount, volume, volume growth and value sales:
Value sales is equal to unit price times volume sales. I will use Then:
S_{0}=Price_{0}*Vol_{0}
S_{1}=Price_{1}*Vol_{1} Where
S_0 = Value sales this month S_1 = Value sales next month Price_0 = Unit price this month Price_1 = Unit price next month Vol_0 = Volume sales this month Vol_1 = Volume sales next month
We can express unit price and volume of this month with respect to the previous price and volume: Price_{1}=Price_{0}*\left(1+D\right) Vol_{1}=Vol_{0}*\left(1+G\right) Where: D = % unit price change from this month to the next month G = % growth of units sold from this month to the next
If we want to force that the value sales of next month (S_1) be the same as the value sales of this month (S_0), then:
S_{0}=S_{1} Price_{0}*Vol_{0}=Price_{1}*Vol_{1} Expressing Price_1 and Vol_1 as functions of Price_0 and Vol_0:
Price_{0}*Vol_{0}=Price_{0}*\left(1+D\right)*Vol_{0}*\left(1+G\right) Dividing both sides by Price_0 * Vol_0:
1=\left(1+D\right)*\left(1+G\right) Dividing by (1+G) both sides:
\frac{1}{\left(1+D\right)}=\left(1+G\right)
Subtracting 1 in both sides:
G=\frac{1}{\left(1+D\right)}-1
Let’s apply this formula to update G, Vol1 and Sales1 in our table of discount scenarios:
'G'] = 1 / (1 + scenarios['D']) - 1
scenarios['Vol1'] = Vol0 * ( 1+scenarios['G'])
scenarios['Sales1'] = scenarios['Price1'] * scenarios['Vol1'] scenarios[
We see the content of the scenarios table:
print(scenarios)
D Price1 G Vol1 Sales1
0 0.00 100.0 0.000000 1000.000000 100000.0
1 -0.05 95.0 0.052632 1052.631579 100000.0
2 -0.10 90.0 0.111111 1111.111111 100000.0
3 -0.15 85.0 0.176471 1176.470588 100000.0
4 -0.20 80.0 0.250000 1250.000000 100000.0
5 -0.25 75.0 0.333333 1333.333333 100000.0
6 -0.30 70.0 0.428571 1428.571429 100000.0
7 -0.35 65.0 0.538462 1538.461538 100000.0
8 -0.40 60.0 0.666667 1666.666667 100000.0
9 -0.45 55.0 0.818182 1818.181818 100000.0
10 -0.50 50.0 1.000000 2000.000000 100000.0
Discount Scenarios
D Price1 G Vol1 Sales1
0 0.00 100.0 0.000000 1000.000000 100000.0
1 -0.05 95.0 0.052632 1052.631579 100000.0
2 -0.10 90.0 0.111111 1111.111111 100000.0
3 -0.15 85.0 0.176471 1176.470588 100000.0
4 -0.20 80.0 0.250000 1250.000000 100000.0
5 -0.25 75.0 0.333333 1333.333333 100000.0
6 -0.30 70.0 0.428571 1428.571429 100000.0
7 -0.35 65.0 0.538462 1538.461538 100000.0
8 -0.40 60.0 0.666667 1666.666667 100000.0
9 -0.45 55.0 0.818182 1818.181818 100000.0
10 -0.50 50.0 1.000000 2000.000000 100000.0
We got the expected result. Sales for next month (Sales1) is the same for each discount scenario. Interestingly, we can see that usually we need to have a higher % of volume growth compared to the absolute value of % in price change. For a discount of 50%, we need to sell 100% more (the double) to get the same value sales!
Once we played with the relationship between % of price change and % of volume sales, we will review the concept of price elasticity of demand.
2.2 Price elasticity of demand
Price elasticity measures the sensitivity of changes in volume sales with respect to changes in unit price. We can use % as a measure of change in volume sales and price change. Then, we can state price elasticity of demand of a product as:
E=\frac{\%\Delta Volume}{\%\Delta Price} Where E= Price Elasticity of demand
Using our notation: E=\frac{G}{D} It is easy to understand the concept of price elasticity, but the estimation of elasticity is not quite straight forward.Let’s see how we can calculate E for each price discount of our example. We add a new column for price elasticity in our scenarios table:
'E'] = scenarios['G'] / scenarios['D'] scenarios[
We see the content of the scenarios table:
#|eval: false
print(scenarios)
D Price1 G Vol1 Sales1 E
0 0.00 100.0 0.000000 1000.000000 100000.0 NaN
1 -0.05 95.0 0.052632 1052.631579 100000.0 -1.052632
2 -0.10 90.0 0.111111 1111.111111 100000.0 -1.111111
3 -0.15 85.0 0.176471 1176.470588 100000.0 -1.176471
4 -0.20 80.0 0.250000 1250.000000 100000.0 -1.250000
5 -0.25 75.0 0.333333 1333.333333 100000.0 -1.333333
6 -0.30 70.0 0.428571 1428.571429 100000.0 -1.428571
7 -0.35 65.0 0.538462 1538.461538 100000.0 -1.538462
8 -0.40 60.0 0.666667 1666.666667 100000.0 -1.666667
9 -0.45 55.0 0.818182 1818.181818 100000.0 -1.818182
10 -0.50 50.0 1.000000 2000.000000 100000.0 -2.000000
#|echo: false
="Discount Scenarios", dataframe=scenarios) tools.display_dataframe_to_user(name
Discount Scenarios
D Price1 G Vol1 Sales1 E
0 0.00 100.0 0.000000 1000.000000 100000.0 NaN
1 -0.05 95.0 0.052632 1052.631579 100000.0 -1.052632
2 -0.10 90.0 0.111111 1111.111111 100000.0 -1.111111
3 -0.15 85.0 0.176471 1176.470588 100000.0 -1.176471
4 -0.20 80.0 0.250000 1250.000000 100000.0 -1.250000
5 -0.25 75.0 0.333333 1333.333333 100000.0 -1.333333
6 -0.30 70.0 0.428571 1428.571429 100000.0 -1.428571
7 -0.35 65.0 0.538462 1538.461538 100000.0 -1.538462
8 -0.40 60.0 0.666667 1666.666667 100000.0 -1.666667
9 -0.45 55.0 0.818182 1818.181818 100000.0 -1.818182
10 -0.50 50.0 1.000000 2000.000000 100000.0 -2.000000
We see that for each discount depth, the estimation for E is different. This sounds weird, isn’t? Is this estimation is correct, then do we need to calculate a elasticity value for different values of discount depth?
Before responding to these questions, we will show a scatter plot to better appreciate the relationship between price % change and volume % change considering all discount scenarios:
import matplotlib.pyplot as plt
'D'], scenarios['G'], marker='o',color='skyblue',edgecolor='blue')
plt.scatter(scenarios['Discount (D)')
plt.xlabel('Volume Growth (G)')
plt.ylabel(True)
plt.grid( plt.show()
We see a curvilinear relationship between price % change and volume % change to keep the same level of value sales.
We can also see the relationship between price and volume (not in % changes; in values):
plt.clf()'Price1'], scenarios['Vol1'], marker='o',color='skyblue',edgecolor='blue')
plt.scatter(scenarios['Price')
plt.xlabel('Volume')
plt.ylabel(True)
plt.grid( plt.show()
We also see a curvilinear relationship between price and volume to keep the same level of value sales.
Usually volume % change is a function of price % change; the deeper the price discount the higher the volume % change. What we plot is a mathematical function of the %volume change with respect to %price change:
G=f(D) Then, Elasticity can be defined as the derivative of G with respect to D, since elasticity is how much G changes with changes in D:
E=\frac{\delta G}{\delta D}=G'
Remember that the derivative is the slope of a tangent to the function in a specific value of the X variable. If we see the previous plot, we can see that if we draw an imaginary tangent to the curve for each values of discount %, we will see that the more negative the discount, the more inclined the tangent, so the elasticity changes with different %price discounts.
Then, the question can be: can we have a measure of elasticity with the same value for any value of %price discount?
The response is YES! If we calculate price elasticity using continuously compounded (cc) % changes for price and volume, instead of simple % changes, then we will end up with a measure of elasticity that does not change with changes in %price.
One way to calculate cc % change of a variable is just to take the natural log of 1 + simple % change. To calculate the cc % change of price and volume:
d=log\left(1+D\right) g=log\left(1+G\right) Then we use these cc % changes to estimate elasticity:
e=\frac{g}{d} I used small letters (d, g) for the cc % changes of discount and sales volume growth; and e for the new elasticity measure using the cc % changes.
Let’s do these calculations for our example:
We first calculate the continuously compounded % change of price and volume. We will add these variables as columns of our scenarios table:
'd'] = np.log(1+scenarios['D'])
scenarios['g'] = np.log(1+scenarios['G'])
scenarios['e'] = scenarios['d'] / scenarios['g'] scenarios[
We see the content of the scenarios table:
print(scenarios)
Discount Scenarios
D Price1 G Vol1 ... E d g e
0 0.00 100.0 0.000000 1000.000000 ... NaN 0.000000 0.000000 NaN
1 -0.05 95.0 0.052632 1052.631579 ... -1.052632 -0.051293 0.051293 -1.0
2 -0.10 90.0 0.111111 1111.111111 ... -1.111111 -0.105361 0.105361 -1.0
3 -0.15 85.0 0.176471 1176.470588 ... -1.176471 -0.162519 0.162519 -1.0
4 -0.20 80.0 0.250000 1250.000000 ... -1.250000 -0.223144 0.223144 -1.0
5 -0.25 75.0 0.333333 1333.333333 ... -1.333333 -0.287682 0.287682 -1.0
6 -0.30 70.0 0.428571 1428.571429 ... -1.428571 -0.356675 0.356675 -1.0
7 -0.35 65.0 0.538462 1538.461538 ... -1.538462 -0.430783 0.430783 -1.0
8 -0.40 60.0 0.666667 1666.666667 ... -1.666667 -0.510826 0.510826 -1.0
9 -0.45 55.0 0.818182 1818.181818 ... -1.818182 -0.597837 0.597837 -1.0
10 -0.50 50.0 1.000000 2000.000000 ... -2.000000 -0.693147 0.693147 -1.0
[11 rows x 9 columns]
Interestingly the magnitudes of d and g are exactly the same for each discount scenarios, but with opposite sign. Then the new elasticity measure e is equal to -1 for all discount scenarios! Now we can ONLY use 1 measure for price elasticity of demand no matter the discount % to be applied to unit price. Another advantage of using cc % changes of price and volume is that the relationship between d and g are now linear instead of curvi-linear:
plt.clf()'d'], scenarios['g'], marker='o',color='skyblue',edgecolor='blue')
plt.scatter(scenarios['Discount (d) in log %')
plt.xlabel('Volume Growth (g) in log %')
plt.ylabel(True)
plt.grid( plt.show()
The slope of a tangent of this line will always be equal to -1, that is the elasticity measure e!
Another interesting relationship is between the log of price and the log of volume:
plt.clf()'Price1']), np.log(scenarios['Vol1']), marker='o',color='skyblue',edgecolor='blue')
plt.scatter(np.log(scenarios['Log of Price')
plt.xlabel('Log of Volume')
plt.ylabel(True)
plt.grid( plt.show()
This relationship is also linear, and the slope of the tangent for each level of values for log of price is also -1, which is the new elasticity measure.
2.3 How sales moves with changes in price elasticity
We can simulate how sales volume and sales value changes with different values of price elasticity of a product. We need to express volume growth G in terms of elasticity e, and recalculate most of the columns for our scenario table.
We start with the e formula, and then get the g, and then get the g (cc % volume change) into G (simple % volume change).
Since e=\frac{g}{d}, then g=e*d. Now we can get G from g. Since g=log\left(1+G\right), then we apply the exponential function to both sides and we get:
e^g=\left(1+G\right)
Remember that e (the Euler number 2.71…) is the base for the natural logarithm.
Then we can update the columns for g, G, Vol1, and Sales1 in our scenario table with specific values for elasticity e.
Let’s start moving elasticity from -1 to -2 and see how the sales volume (Vol1) and sales value (Sales1) changes:
= -1.5
e 'e']=e
scenarios['g']=scenarios['e'] * scenarios['d']
scenarios['G']= np.exp(scenarios['g']) - 1
scenarios['Vol1'] = Vol0 * (1+scenarios['G'])
scenarios['Sales1'] =scenarios['Price1'] * scenarios['Vol1'] scenarios[
We see the content of the scenarios table:
print(scenarios)
Discount Scenarios
D Price1 G Vol1 ... E d g e
0 0.00 100.0 0.000000 1000.000000 ... NaN 0.000000 -0.000000 -1.5
1 -0.05 95.0 0.079977 1079.977213 ... -1.052632 -0.051293 0.076940 -1.5
2 -0.10 90.0 0.171214 1171.213948 ... -1.111111 -0.105361 0.158041 -1.5
3 -0.15 85.0 0.276062 1276.061517 ... -1.176471 -0.162519 0.243778 -1.5
4 -0.20 80.0 0.397542 1397.542486 ... -1.250000 -0.223144 0.334715 -1.5
5 -0.25 75.0 0.539601 1539.600718 ... -1.333333 -0.287682 0.431523 -1.5
6 -0.30 70.0 0.707469 1707.469442 ... -1.428571 -0.356675 0.535012 -1.5
7 -0.35 65.0 0.908227 1908.226686 ... -1.538462 -0.430783 0.646174 -1.5
8 -0.40 60.0 1.151657 2151.657415 ... -1.666667 -0.510826 0.766238 -1.5
9 -0.45 55.0 1.451636 2451.635864 ... -1.818182 -0.597837 0.896756 -1.5
10 -0.50 50.0 1.828427 2828.427125 ... -2.000000 -0.693147 1.039721 -1.5
[11 rows x 9 columns]
We can plot the discount % vs volume growth:
plt.clf()'D'], scenarios['G'], marker='o',color='skyblue',edgecolor='blue')
plt.scatter(scenarios['Price Discount (D)')
plt.xlabel('Volume % growth (G)')
plt.ylabel(True)
plt.grid( plt.show()
We can see that not only volume sales increases, but also sales value increases with each level of discount depth.
Then, what can you learn/conclude about price elasticity e with respect to changes in sales? In which situations might be a good idea to do aggressive discounts for a product? briefly explain.
2.4 Estimation of price elasticity with real data
In the previous sections we estimated price elasticity of a hypothetical product with specific scenarios of discount depth. Now the question is, how can we estimate price elasticity of a consumer product?
First we need historical sales data of a the consumer product. We need historical sales in value ($) and sales in volume, and we can calculate an average unit price per period. We can get daily, weekly, monthly or quarterly data. Using monthly or quarterly data is a good idea since the volatility of sales at a weekly and daily level is high so that it is harder to get robust estimation of elasticity. In addition, using monthly or quarterly data allows us to easily measure the seasonality of a consumer product.
Imagine we have monthly data for sales volume and sales value for a consumer product for the past 36 months. How can you estimate price elasticity?
It is easily to calculate unit price for each month. We just divide sales value by sales volume for each month. Now we will have a column for historical prices, a column for sales volume and a column for sales value. Then we can estimate price elasticity by using a simple regression model.
For the previous hypothetical exercise, we can also estimate price elasticity using a simple regression model. The way we set up this simple regression is using the natural logarithm of sales volume as the dependent variable, and the natural log of price as the independent variable. We can express this regression model as:
lnvol_{t}=b_{0}+b_{1}*lnprice_{t}+\varepsilon_{t}
Where:
lnvol_{t} = log of sales volume at period t, which is the dependent variable of the model.
lnprice_{t} = log of unit price at period t, which is the independent or explanatory variable of the model.
\varepsilon_{t} is called the error of the model, and it is supposed to behave like a normal distribution with mean=0 and a specific standard deviation: \varepsilon_{t}\sim N(0,\sigma_{\varepsilon})
b_{0}= coefficient for the intercept; it represents the value of the dependent variable, in this case, log of sales volume when the value of the independent variable (log of price) is equal to zero. In other words, it is the point where the regression line crosses the Y axis.
b_{1} = coefficient for the slope of the regression line. This coefficient is a measure of sensitivity of how much the dependent variable moves for each 1-unit move in the independent variable.
The equation for the regression model is actually the expected value of the dependent variable. Since the mean of the error is zero, then:
E\left[lnvol\right]=b_{0}+b_{1}*lnprice_{t}
This regression equation is actually a line that represents all the pairs of values for log of price and log of sales volume.
Let’s run a regression model using the hypothetical example of the discount scenarios:
import statsmodels.formula.api as smf
'logVol'] = np.log(scenarios['Vol1'])
scenarios['logPrice'] = np.log(scenarios['Price1'])
scenarios[# I estimate the OLS regression model:
= smf.ols('logVol ~ logPrice',data=scenarios).fit()
model1 # I display the summary of the regression:
print(model1.summary())
OLS Regression Results
==============================================================================
Dep. Variable: logVol R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 1.409e+28
Date: mié., 19 feb. 2025 Prob (F-statistic): 1.09e-123
Time: 20:48:19 Log-Likelihood: 341.10
No. Observations: 11 AIC: -678.2
Df Residuals: 9 BIC: -677.4
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 13.8155 5.43e-14 2.54e+14 0.000 13.816 13.816
logPrice -1.5000 1.26e-14 -1.19e+14 0.000 -1.500 -1.500
==============================================================================
Omnibus: 0.211 Durbin-Watson: 0.006
Prob(Omnibus): 0.900 Jarque-Bera (JB): 0.293
Skew: -0.250 Prob(JB): 0.864
Kurtosis: 2.375 Cond. No. 89.5
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The coefficient b_{0} or intercept = 13.81. b_{1}=-1.5, which is the same value of elasticity e we got previously using cc % changes in price and volume. In this hypothetical case, all the points of the data lie on the regression line, so all errors are equal to zero and the standard errors (standard deviation of the regression coefficients) are actually zero.
Now move to a real-world consumer product and estimate price elasticity. We will download a dataset of historical monthly data of 4 consumer products sold in retail stores. The dataset was constructed using a real-world dataset of consumer products of one specific category and sold in big retail stores in Mexico. Do the following to directly download the dataset into your R:
import requests
from io import BytesIO
# URL of the Excel file
= 'http://www.apradie.com/datos/salesdata1.xlsx'
url
# Define headers to mimic a web browser
= {
headers 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
# Send a GET request to download the file with custom headers
= requests.get(url, headers=headers)
response
# Check if the request was successful
if response.status_code == 200:
# Use BytesIO to create a file-like object in memory
= BytesIO(response.content)
excel_file
# Read the Excel file into a pandas DataFrame
= pd.read_excel(excel_file)
sales = sales.set_index('month')
sales
else:
print(f"Failed to retrieve the file. Status code: {response.status_code}")
We can see see the content of this file.
print(sales)
Monthly sales of skus
qsku1 vsku1 qsku2 ... vsku3 qsku4 vsku4
month ...
2017-05-01 266745 1733845 213811 ... 3882460 1737617 19353875
2017-06-01 293427 1819250 235841 ... 4048995 1743237 18915640
2017-07-01 231065 1432600 201680 ... 3813775 1690972 18800880
2017-08-01 196930 1201275 190614 ... 3831065 1835160 20683400
2017-09-01 174157 1132020 171971 ... 5493565 2067265 21311065
2017-10-01 168432 1077965 177789 ... 7939530 2512034 24061980
2017-11-01 178442 1070650 158040 ... 7179055 2370224 22126260
2017-12-01 167541 1072265 168935 ... 5035570 1997318 20296275
2018-01-01 163678 1063905 165500 ... 6999505 2084443 20516960
2018-02-01 142067 966055 137946 ... 8786170 2253161 20238705
2018-03-01 197475 1204600 172930 ... 9197805 2507851 23259990
2018-04-01 179316 1165555 181410 ... 7192640 2119443 22118375
2018-05-01 172579 1173535 183918 ... 6668525 1964131 21308690
2018-06-01 169240 1150830 179595 ... 6677075 1953905 20883660
2018-07-01 146817 998355 182320 ... 5813525 1793574 19867255
2018-08-01 138727 984960 171000 ... 7079685 1968629 20860765
2018-09-01 128471 937840 154578 ... 7479160 2138188 21187755
2018-10-01 133185 958930 152312 ... 7889180 2047528 20564745
2018-11-01 112220 796765 136824 ... 5891615 1676519 17866080
2018-12-01 109825 779760 140745 ... 6275985 1710498 18380125
2019-01-01 115084 817095 146799 ... 7612255 1868065 19110580
2019-02-01 112394 798000 135993 ... 7548415 2007173 19015390
2019-03-01 134466 941260 169565 ... 8287705 2219548 21716905
2019-04-01 139431 906300 164997 ... 7375990 2000818 20130690
2019-05-01 149960 1019730 176048 ... 7911790 2151228 21210080
2019-06-01 141456 1004340 200725 ... 7754945 2147632 20699170
2019-07-01 104685 753730 164731 ... 5834520 1694306 17421480
2019-08-01 126706 912285 204915 ... 6291945 2027945 20202605
2019-09-01 132727 968905 190135 ... 5816565 2199420 20475635
2019-10-01 142338 1081765 165350 ... 4309960 1771625 18403970
2019-11-01 118651 913615 154887 ... 3938700 1585935 16310740
2019-12-01 82863 629755 147879 ... 2947280 1624348 16548050
2020-01-01 95525 725990 99323 ... 2777990 1604222 15487280
2020-02-01 82477 635075 130238 ... 2284655 1730077 16541685
2020-03-01 112063 862885 166859 ... 2325600 1913984 19594795
2020-04-01 90136 676020 151963 ... 1364390 1457315 16395955
2020-05-01 91567 686755 159098 ... 1313945 1382876 15917535
[37 rows x 8 columns]
The columns that start with v are value sales, and the columns that start with q are volume sales (in units). Then, vsku1 is the value sales (in $) of product 1, qsku1 is the # of unit sold of product 1, etc.
We will work with product 1. We can calculate an average monthly price for product 1, and also columns for the log of value sales and log of price:
'psku1'] = sales['vsku1'] / sales['qsku1']
sales['logpsku1'] = np.log(sales['psku1'])
sales['logqsku1'] = np.log(sales['qsku1']) sales[
We can start visualizing the sales data for this product:
plt.clf()'qsku1'], marker='o', linestyle='-')
plt.plot(sales.index, sales['Month')
plt.xlabel('Volume Sales')
plt.ylabel('Volume Sales of qsku1')
plt.title(True)
plt.grid(=45) # Rotate x-axis labels for better readability plt.xticks(rotation
(array([17287., 17410., 17532., 17652., 17775., 17897., 18017., 18140.,
18262., 18383.]), [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
# Adjust layout to prevent labels from overlapping
plt.tight_layout() plt.show()
We can see that the number of units sold of product 1 is dramatically going down for the last 36 months.
We can calculate the % decline of volume sales from the first month until the most recent month:
# Calculate the percentage change
= sales['qsku1'].iloc[0]
first_value = sales['qsku1'].iloc[-1]
last_value = ((last_value - first_value) / first_value) * 100 volume_growth
The % decline in volume sales of product 1 is -6567.2459%. If you were the responsible for this product in a company, you must be very, very concerned.
Now we do a plot for historical price of product 1:
plt.clf()'psku1'], marker='o', linestyle='-')
plt.plot(sales.index, sales['Month')
plt.xlabel('Price')
plt.ylabel('Price of qsku1')
plt.title(True)
plt.grid(=45) # Rotate x-axis labels for better readability plt.xticks(rotation
(array([17287., 17410., 17532., 17652., 17775., 17897., 18017., 18140.,
18262., 18383.]), [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
# Adjust layout to prevent labels from overlapping
plt.tight_layout() plt.show()
We can see that the unit price of product 1 has been going up for the last 36 months. We can calculate the % price increase in the whole period:
# Calculate the percentage change
= sales['psku1'].iloc[0]
first_value = sales['psku1'].iloc[-1]
last_value = ((last_value - first_value) / first_value) * 100 price_growth
The price increased 1538.4869% in the last 36 months (from the first month up to the last month of the data).
Now we can visualize how sales volume changes with different price levels. We plot price vs volume in log scale:
import seaborn as sns
plt.clf()=(10, 6))
plt.figure(figsize='logpsku1', y='logqsku1', data=sales, scatter_kws={'alpha':0.5}, line_kws={'color':'red'}) # Use regplot for regression line
sns.regplot(x'Log of psku1')
plt.xlabel('Log of qsku1')
plt.ylabel('Scatter Plot of Log(psku1) vs. Log(qsku1) with Regression Line ')
plt.title(True)
plt.grid( plt.show()
I also plotted the regression line between price and volume (in log scale). We clearly see a negative relationship between price and sales volume.
Now we can run a simple linear regression model using the log of sales volume as dependent variable, and the log of price as independent variable:
# I estimate the OLS regression model:
= smf.ols('logqsku1 ~ logpsku1',data=sales).fit()
model2 # I display the summary of the regression:
print(model2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: logqsku1 R-squared: 0.705
Model: OLS Adj. R-squared: 0.697
Method: Least Squares F-statistic: 83.67
Date: mié., 19 feb. 2025 Prob (F-statistic): 8.29e-11
Time: 20:48:23 Log-Likelihood: 15.085
No. Observations: 37 AIC: -26.17
Df Residuals: 35 BIC: -22.95
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 18.4376 0.720 25.596 0.000 16.975 19.900
logpsku1 -3.4044 0.372 -9.147 0.000 -4.160 -2.649
==============================================================================
Omnibus: 4.792 Durbin-Watson: 1.003
Prob(Omnibus): 0.091 Jarque-Bera (JB): 3.829
Skew: 0.782 Prob(JB): 0.147
Kurtosis: 3.191 Cond. No. 64.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Then, the beta 1 coefficient of this regression is -3.4, which is an estimation of price elasticity of demand.
We can interpret price elasticity as follows: for each 1% increase in the price of product, its own demand is expected to decrease in about 3.4%. This result make sense after looking the volume decline vs the price increase.
It is important to note that this measure of price elasticity does not consider important factors that happen with consumer products in real life. Important drivers of demand for consumer products are 1) seasonality, 2) organic growing trend, and 3) cross-elasticities with competing/substitute products and with complementary products, 4) other commercial factors such as distribution, advertising, promotions, and 5) external events such as economic recessions.
2.5 Alternative measures for direct elasticity of demand
In the case of consumer products it is recommended to include some of the factors mentioned above in the econometric model in order to calculate a more robust measure of direct price elasticity, and also measures of cross-elasticities with significant competing or complementary products. We need to use more advanced econometric models. Due to the complexity of these models, we cannot cover them in detail in this workshop, but we will mention some of the most important econometric models used that consider these factors.
The most popular models are called ARIMA/SARIMA and ARIMAX models. ARIMA stands for Autoregressive Integrated Moving Average models. SARIMA stands for Seasonal ARIMA models. ARIMAX stands for ARIMA/SARIMA models with explanatory variables (X’s variables) or factors.