Research question.
Write one clear sentence.
Data.
Where does it come from? What is the unit of observation?
Model.
Write the regression equation you estimate.
# Put your dataset inside a /data subfolder.
# df <- read_csv(here("data", "DATASET_NAME.csv"))
df <- read_csv("block3_functional_forms_demand.csv")
glimpse(df)
## Rows: 250
## Columns: 4
## $ quantity <dbl> 3999.55, 3607.85, 6241.21, 7953.41, 6447.59, 5452.35, 4513.27…
## $ income <dbl> 35902.04, 37138.25, 33830.93, 34923.83, 24469.31, 19435.45, 1…
## $ price <dbl> 12.77, 10.93, 8.32, 9.30, 6.85, 7.72, 8.72, 13.56, 8.50, 11.7…
## $ promo <dbl> 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1…
skim(df)
| Name | df |
| Number of rows | 250 |
| Number of columns | 4 |
| _______________________ | |
| Column type frequency: | |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| quantity | 0 | 1 | 4656.59 | 1772.77 | 1536.09 | 3364.93 | 4296.77 | 5533.88 | 15742.40 | ▇▆▁▁▁ |
| income | 0 | 1 | 27640.62 | 10276.05 | 9561.82 | 20673.81 | 26259.82 | 32559.07 | 74606.70 | ▆▇▂▁▁ |
| price | 0 | 1 | 9.88 | 2.50 | 4.66 | 8.15 | 9.61 | 11.58 | 20.13 | ▃▇▅▁▁ |
| promo | 0 | 1 | 0.44 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
df %>%
summarise(
n = n(),
q_mean = mean(quantity),
p_mean = mean(price),
inc_mean = mean(income),
promo_share = mean(promo)
)
ggplot(df, aes(x = price, y = quantity)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE)
ggplot(df, aes(x = income, y = quantity, color = factor(promo))) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE)
Does it look linear or ‘curved’? The line looks to be linear.
Does the promo effect look like an offset (level) or does it change the slope? We can see that the promo does affect the slope of the line since the lines arent parallel, hence the relationship between income and when promo = 0 or 1 is different.
df <- df %>%
mutate(
ln_quantity = log(quantity),
ln_price = log(price),
ln_income = log(income)
)
m_level_level <- lm(quantity ~ price + income + promo, data = df)
summary(m_level_level)
##
## Call:
## lm(formula = quantity ~ price + income + promo, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1857.9 -589.9 -103.9 380.0 7436.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.528e+03 3.197e+02 26.678 < 2e-16 ***
## price -5.431e+02 2.512e+01 -21.616 < 2e-16 ***
## income 4.735e-02 6.129e-03 7.726 2.82e-13 ***
## promo 4.229e+02 1.265e+02 3.344 0.000955 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 991.4 on 246 degrees of freedom
## Multiple R-squared: 0.6911, Adjusted R-squared: 0.6873
## F-statistic: 183.4 on 3 and 246 DF, p-value: < 2.2e-16
# Add controls / dummies / interactions if required
# m2 <- lm(y ~ x + z1 + z2, data = df)
# summary(m2)
m_log_level <- lm(ln_quantity ~ price + income + promo, data = df)
summary(m_log_level)
##
## Call:
## lm(formula = ln_quantity ~ price + income + promo, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46192 -0.10058 0.00345 0.08543 0.56559
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.272e+00 5.098e-02 181.879 < 2e-16 ***
## price -1.209e-01 4.006e-03 -30.164 < 2e-16 ***
## income 9.201e-06 9.774e-07 9.414 < 2e-16 ***
## promo 1.108e-01 2.017e-02 5.495 9.77e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1581 on 246 degrees of freedom
## Multiple R-squared: 0.8102, Adjusted R-squared: 0.8079
## F-statistic: 350.1 on 3 and 246 DF, p-value: < 2.2e-16
beta_price <- coef(m_log_level)["price"]
100 * (exp(beta_price) - 1)
## price
## -11.38342
# Clean table for reports
# modelsummary(list("Baseline" = m1, "Extended" = m2), output = "markdown")
m_level_log <- lm(quantity ~ ln_price + ln_income + promo, data = df)
summary(m_level_log)
##
## Call:
## lm(formula = quantity ~ ln_price + ln_income + promo, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2089.9 -508.1 -12.3 319.8 6501.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4022.2 1667.1 2.413 0.0166 *
## ln_price -5683.9 222.8 -25.514 < 2e-16 ***
## ln_income 1306.3 153.9 8.485 2.04e-15 ***
## promo 460.1 112.6 4.087 5.92e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 883.5 on 246 degrees of freedom
## Multiple R-squared: 0.7546, Adjusted R-squared: 0.7516
## F-statistic: 252.2 on 3 and 246 DF, p-value: < 2.2e-16
m_log_log <- lm(ln_quantity ~ ln_price + ln_income + promo, data = df)
summary(m_log_log)
##
## Call:
## lm(formula = ln_quantity ~ ln_price + ln_income + promo, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46045 -0.10185 0.01338 0.09069 0.42295
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.38841 0.28192 29.754 < 2e-16 ***
## ln_price -1.21395 0.03767 -32.223 < 2e-16 ***
## ln_income 0.26395 0.02603 10.139 < 2e-16 ***
## promo 0.11996 0.01904 6.301 1.35e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1494 on 246 degrees of freedom
## Multiple R-squared: 0.8305, Adjusted R-squared: 0.8285
## F-statistic: 401.8 on 3 and 246 DF, p-value: < 2.2e-16
beta_promo <- coef(m_log_log)["promo"]
100 * (exp(beta_promo) - 1)
## promo
## 12.74478
m_poly <- lm(quantity ~ price + I(price^2) + income + promo, data = df)
summary(m_poly)
##
## Call:
## lm(formula = quantity ~ price + I(price^2) + income + promo,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2252.8 -481.6 -13.1 363.3 6017.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.411e+04 6.670e+02 21.156 < 2e-16 ***
## price -1.661e+03 1.235e+02 -13.451 < 2e-16 ***
## I(price^2) 5.203e+01 5.658e+00 9.195 < 2e-16 ***
## income 4.750e-02 5.295e-03 8.971 < 2e-16 ***
## promo 5.392e+02 1.100e+02 4.903 1.72e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 856.5 on 245 degrees of freedom
## Multiple R-squared: 0.7703, Adjusted R-squared: 0.7666
## F-statistic: 205.4 on 4 and 245 DF, p-value: < 2.2e-16
Does the price^2 sign suggest curvature? Yes,since its coefficient is 52.03 which is positive, and has a high (p < 2e-16), and the price coefficient is -1661 which is negative. Hence it proves a curvature in the demand curve.
Does it make economic sense for demand to stop falling at high prices? No it doesnt because according to the law of demand, higher prices means that demand will fall.
modelsummary(
list(
"Level-Level" = m_level_level,
"Log-Level" = m_log_level,
"Level-Log" = m_level_log,
"Log-Log" = m_log_log,
"Poly" = m_poly
),
output = "markdown"
)
| Level-Level | Log-Level | Level-Log | Log-Log | Poly | |
|---|---|---|---|---|---|
| (Intercept) | 8528.000 | 9.272 | 4022.184 | 8.388 | 14110.566 |
| (319.670) | (0.051) | (1667.121) | (0.282) | (666.973) | |
| price | -543.074 | -0.121 | -1660.635 | ||
| (25.124) | (0.004) | (123.458) | |||
| income | 0.047 | 0.000 | 0.048 | ||
| (0.006) | (0.000) | (0.005) | |||
| promo | 422.873 | 0.111 | 460.091 | 0.120 | 539.242 |
| (126.465) | (0.020) | (112.572) | (0.019) | (109.993) | |
| ln_price | -5683.933 | -1.214 | |||
| (222.775) | (0.038) | ||||
| ln_income | 1306.274 | 0.264 | |||
| (153.945) | (0.026) | ||||
| I(price^2) | 52.030 | ||||
| (5.658) | |||||
| Num.Obs. | 250 | 250 | 250 | 250 | 250 |
| R2 | 0.691 | 0.810 | 0.755 | 0.831 | 0.770 |
| R2 Adj. | 0.687 | 0.808 | 0.752 | 0.828 | 0.767 |
| AIC | 4165.0 | -206.9 | 4107.4 | -235.1 | 4092.8 |
| BIC | 4182.6 | -189.3 | 4125.0 | -217.5 | 4114.0 |
| Log.Lik. | -2077.486 | 108.429 | -2048.688 | 122.556 | -2040.425 |
| F | 183.417 | 350.132 | 252.181 | 401.838 | 205.426 |
| RMSE | 983.39 | 0.16 | 876.39 | 0.15 | 847.90 |
set.seed(123)
n <- nrow(df)
idx_train <- sample(1:n, size = round(0.7 * n))
train <- df[idx_train, ]
test <- df[-idx_train, ]
m1 <- lm(quantity ~ price + income + promo, data = train)
m2 <- lm(ln_quantity ~ price + income + promo, data = train)
m3 <- lm(quantity ~ ln_price + ln_income + promo, data = train)
m4 <- lm(ln_quantity ~ ln_price + ln_income + promo, data = train)
m5 <- lm(quantity ~ price + I(price^2) + income + promo, data = train)
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))
pred1 <- predict(m1, newdata = test)
pred3 <- predict(m3, newdata = test)
pred5 <- predict(m5, newdata = test)
pred2 <- exp(predict(m2, newdata = test))
pred4 <- exp(predict(m4, newdata = test))
tibble(model = c("Level-Level", "Log-Level", "Level-Log", "Log-Log", "Poly"), RMSE = c( rmse(test$quantity, pred1), rmse(test$quantity, pred2),
rmse(test$quantity, pred3), rmse(test$quantity, pred4), rmse(test$quantity, pred5) )) %>% arrange(RMSE)
The Log-Log model is chosen as the preferred specification because it has the lowest RMSE in the out-of-sample prediction exercise, indicating the best predictive performance. In addition, the model has a relatively high R² of about 0.83, meaning it explains a large share of the variation in quantity demanded. The log-log form is also convenient because its coefficients can be interpreted as elasticities.
The price coefficient is -1.21, implying that a 1% increase in price reduces quantity demanded by about 1.21%. Since the elasticity is greater than one in absolute value, demand appears to be elastic to price.
The income coefficient is about 0.26, meaning that a 1% increase in income raises quantity demanded by roughly 0.26%. This shows that the product behaves as a normal good.
The promotion coefficient is 0.11996. Using the transformation 100(exp(β)−1), promotions increase expected demand by about 12.75%, holding other variables constant.
One limitation of the model is that it assumes constant elasticities across observations. As a possible extension, interaction terms such as price:promo could be included to test whether promotions affect consumers’ sensitivity to price.