1 Overview (2–4 sentences)

Research question.
Write one clear sentence.

Data.
Where does it come from? What is the unit of observation?

Model.
Write the regression equation you estimate.

# Put your dataset inside a /data subfolder.
# df <- read_csv(here("data", "DATASET_NAME.csv"))

df <- read_csv("block3_functional_forms_demand.csv")


glimpse(df)
## Rows: 250
## Columns: 4
## $ quantity <dbl> 3999.55, 3607.85, 6241.21, 7953.41, 6447.59, 5452.35, 4513.27…
## $ income   <dbl> 35902.04, 37138.25, 33830.93, 34923.83, 24469.31, 19435.45, 1…
## $ price    <dbl> 12.77, 10.93, 8.32, 9.30, 6.85, 7.72, 8.72, 13.56, 8.50, 11.7…
## $ promo    <dbl> 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1…
skim(df)
Data summary
Name df
Number of rows 250
Number of columns 4
_______________________
Column type frequency:
numeric 4
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
quantity 0 1 4656.59 1772.77 1536.09 3364.93 4296.77 5533.88 15742.40 ▇▆▁▁▁
income 0 1 27640.62 10276.05 9561.82 20673.81 26259.82 32559.07 74606.70 ▆▇▂▁▁
price 0 1 9.88 2.50 4.66 8.15 9.61 11.58 20.13 ▃▇▅▁▁
promo 0 1 0.44 0.50 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▆
df %>% 
  summarise(
    n = n(),
    q_mean = mean(quantity),
    p_mean = mean(price),
    inc_mean = mean(income),
    promo_share = mean(promo)
  )
ggplot(df, aes(x = price, y = quantity)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE)

ggplot(df, aes(x = income, y = quantity, color = factor(promo))) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE)

Does it look linear or ‘curved’? The line looks to be linear.

Does the promo effect look like an offset (level) or does it change the slope? We can see that the promo does affect the slope of the line since the lines arent parallel, hence the relationship between income and when promo = 0 or 1 is different.

df <- df %>%
  mutate(
    ln_quantity = log(quantity),
    ln_price = log(price),
    ln_income = log(income)
  )
m_level_level <- lm(quantity ~ price + income + promo, data = df)

summary(m_level_level)
## 
## Call:
## lm(formula = quantity ~ price + income + promo, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1857.9  -589.9  -103.9   380.0  7436.2 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.528e+03  3.197e+02  26.678  < 2e-16 ***
## price       -5.431e+02  2.512e+01 -21.616  < 2e-16 ***
## income       4.735e-02  6.129e-03   7.726 2.82e-13 ***
## promo        4.229e+02  1.265e+02   3.344 0.000955 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 991.4 on 246 degrees of freedom
## Multiple R-squared:  0.6911, Adjusted R-squared:  0.6873 
## F-statistic: 183.4 on 3 and 246 DF,  p-value: < 2.2e-16
# Add controls / dummies / interactions if required
# m2 <- lm(y ~ x + z1 + z2, data = df)
# summary(m2)
m_log_level <- lm(ln_quantity ~ price + income + promo, data = df)

summary(m_log_level)
## 
## Call:
## lm(formula = ln_quantity ~ price + income + promo, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46192 -0.10058  0.00345  0.08543  0.56559 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.272e+00  5.098e-02 181.879  < 2e-16 ***
## price       -1.209e-01  4.006e-03 -30.164  < 2e-16 ***
## income       9.201e-06  9.774e-07   9.414  < 2e-16 ***
## promo        1.108e-01  2.017e-02   5.495 9.77e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1581 on 246 degrees of freedom
## Multiple R-squared:  0.8102, Adjusted R-squared:  0.8079 
## F-statistic: 350.1 on 3 and 246 DF,  p-value: < 2.2e-16
beta_price <- coef(m_log_level)["price"]

100 * (exp(beta_price) - 1)
##     price 
## -11.38342
# Clean table for reports
# modelsummary(list("Baseline" = m1, "Extended" = m2), output = "markdown")
m_level_log <- lm(quantity ~ ln_price + ln_income + promo, data = df)

summary(m_level_log)
## 
## Call:
## lm(formula = quantity ~ ln_price + ln_income + promo, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2089.9  -508.1   -12.3   319.8  6501.8 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4022.2     1667.1   2.413   0.0166 *  
## ln_price     -5683.9      222.8 -25.514  < 2e-16 ***
## ln_income     1306.3      153.9   8.485 2.04e-15 ***
## promo          460.1      112.6   4.087 5.92e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 883.5 on 246 degrees of freedom
## Multiple R-squared:  0.7546, Adjusted R-squared:  0.7516 
## F-statistic: 252.2 on 3 and 246 DF,  p-value: < 2.2e-16
m_log_log <- lm(ln_quantity ~ ln_price + ln_income + promo, data = df)

summary(m_log_log)
## 
## Call:
## lm(formula = ln_quantity ~ ln_price + ln_income + promo, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46045 -0.10185  0.01338  0.09069  0.42295 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.38841    0.28192  29.754  < 2e-16 ***
## ln_price    -1.21395    0.03767 -32.223  < 2e-16 ***
## ln_income    0.26395    0.02603  10.139  < 2e-16 ***
## promo        0.11996    0.01904   6.301 1.35e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1494 on 246 degrees of freedom
## Multiple R-squared:  0.8305, Adjusted R-squared:  0.8285 
## F-statistic: 401.8 on 3 and 246 DF,  p-value: < 2.2e-16
beta_promo <- coef(m_log_log)["promo"]

100 * (exp(beta_promo) - 1)
##    promo 
## 12.74478
m_poly <- lm(quantity ~ price + I(price^2) + income + promo, data = df)

summary(m_poly)
## 
## Call:
## lm(formula = quantity ~ price + I(price^2) + income + promo, 
##     data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2252.8  -481.6   -13.1   363.3  6017.6 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.411e+04  6.670e+02  21.156  < 2e-16 ***
## price       -1.661e+03  1.235e+02 -13.451  < 2e-16 ***
## I(price^2)   5.203e+01  5.658e+00   9.195  < 2e-16 ***
## income       4.750e-02  5.295e-03   8.971  < 2e-16 ***
## promo        5.392e+02  1.100e+02   4.903 1.72e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 856.5 on 245 degrees of freedom
## Multiple R-squared:  0.7703, Adjusted R-squared:  0.7666 
## F-statistic: 205.4 on 4 and 245 DF,  p-value: < 2.2e-16

Does the price^2 sign suggest curvature? Yes,since its coefficient is 52.03 which is positive, and has a high (p < 2e-16), and the price coefficient is -1661 which is negative. Hence it proves a curvature in the demand curve.

Does it make economic sense for demand to stop falling at high prices? No it doesnt because according to the law of demand, higher prices means that demand will fall.

modelsummary(
  list(
    "Level-Level" = m_level_level,
    "Log-Level" = m_log_level,
    "Level-Log" = m_level_log,
    "Log-Log" = m_log_log,
    "Poly" = m_poly
  ),
  output = "markdown"
)
Level-Level Log-Level Level-Log Log-Log Poly
(Intercept) 8528.000 9.272 4022.184 8.388 14110.566
(319.670) (0.051) (1667.121) (0.282) (666.973)
price -543.074 -0.121 -1660.635
(25.124) (0.004) (123.458)
income 0.047 0.000 0.048
(0.006) (0.000) (0.005)
promo 422.873 0.111 460.091 0.120 539.242
(126.465) (0.020) (112.572) (0.019) (109.993)
ln_price -5683.933 -1.214
(222.775) (0.038)
ln_income 1306.274 0.264
(153.945) (0.026)
I(price^2) 52.030
(5.658)
Num.Obs. 250 250 250 250 250
R2 0.691 0.810 0.755 0.831 0.770
R2 Adj. 0.687 0.808 0.752 0.828 0.767
AIC 4165.0 -206.9 4107.4 -235.1 4092.8
BIC 4182.6 -189.3 4125.0 -217.5 4114.0
Log.Lik. -2077.486 108.429 -2048.688 122.556 -2040.425
F 183.417 350.132 252.181 401.838 205.426
RMSE 983.39 0.16 876.39 0.15 847.90
set.seed(123)

n <- nrow(df)

idx_train <- sample(1:n, size = round(0.7 * n))

train <- df[idx_train, ]

test <- df[-idx_train, ]
m1 <- lm(quantity ~ price + income + promo, data = train)

m2 <- lm(ln_quantity ~ price + income + promo, data = train)

m3 <- lm(quantity ~ ln_price + ln_income + promo, data = train)

m4 <- lm(ln_quantity ~ ln_price + ln_income + promo, data = train)

m5 <- lm(quantity ~ price + I(price^2) + income + promo, data = train)
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))

pred1 <- predict(m1, newdata = test)

pred3 <- predict(m3, newdata = test)

pred5 <- predict(m5, newdata = test)
pred2 <- exp(predict(m2, newdata = test))
pred4 <- exp(predict(m4, newdata = test))

tibble(model = c("Level-Level", "Log-Level", "Level-Log", "Log-Log", "Poly"), RMSE = c( rmse(test$quantity, pred1), rmse(test$quantity, pred2),
rmse(test$quantity, pred3), rmse(test$quantity, pred4), rmse(test$quantity, pred5) )) %>% arrange(RMSE)

The Log-Log model is chosen as the preferred specification because it has the lowest RMSE in the out-of-sample prediction exercise, indicating the best predictive performance. In addition, the model has a relatively high R² of about 0.83, meaning it explains a large share of the variation in quantity demanded. The log-log form is also convenient because its coefficients can be interpreted as elasticities.

The price coefficient is -1.21, implying that a 1% increase in price reduces quantity demanded by about 1.21%. Since the elasticity is greater than one in absolute value, demand appears to be elastic to price.

The income coefficient is about 0.26, meaning that a 1% increase in income raises quantity demanded by roughly 0.26%. This shows that the product behaves as a normal good.

The promotion coefficient is 0.11996. Using the transformation 100(exp(β)−1), promotions increase expected demand by about 12.75%, holding other variables constant.

One limitation of the model is that it assumes constant elasticities across observations. As a possible extension, interaction terms such as price:promo could be included to test whether promotions affect consumers’ sensitivity to price.

2 Reproducibility checklist (must be TRUE before submission)