1 Overview (2–4 sentences)

Research question.
Write one clear sentence.

Data.
Where does it come from? What is the unit of observation?

Model.
Write the regression equation you estimate.

# Put your dataset inside a /data subfolder.
# df <- read_csv(here("data", "DATASET_NAME.csv"))

df <- read_csv("block3_functional_forms_demand.csv")


glimpse(df)

## Rows: 250
## Columns: 4
## $ quantity <dbl> 3999.55, 3607.85, 6241.21, 7953.41, 6447.59, 5452.35, 4513.27…
## $ income   <dbl> 35902.04, 37138.25, 33830.93, 34923.83, 24469.31, 19435.45, 1…
## $ price    <dbl> 12.77, 10.93, 8.32, 9.30, 6.85, 7.72, 8.72, 13.56, 8.50, 11.7…
## $ promo    <dbl> 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1…

skim(df)

Data summary
Name	df
Number of rows	250
Number of columns	4
_______________________
Column type frequency:
numeric	4
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
quantity	1	4656.59	1772.77	1536.09	3364.93	4296.77	5533.88	15742.40	▇▆▁▁▁
income	1	27640.62	10276.05	9561.82	20673.81	26259.82	32559.07	74606.70	▆▇▂▁▁
price	1	9.88	2.50	4.66	8.15	9.61	11.58	20.13	▃▇▅▁▁
promo	1	0.44	0.50	0.00	0.00	0.00	1.00	1.00	▇▁▁▁▆

df %>% 
  summarise(
    n = n(),
    q_mean = mean(quantity),
    p_mean = mean(price),
    inc_mean = mean(income),
    promo_share = mean(promo)
  )

ggplot(df, aes(x = price, y = quantity)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE)

ggplot(df, aes(x = income, y = quantity, color = factor(promo))) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE)

Does it look linear or ‘curved’? The line looks to be linear.

Does the promo effect look like an offset (level) or does it change the slope? We can see that the promo does affect the slope of the line since the lines arent parallel, hence the relationship between income and when promo = 0 or 1 is different.

df <- df %>%
  mutate(
    ln_quantity = log(quantity),
    ln_price = log(price),
    ln_income = log(income)
  )

m_level_level <- lm(quantity ~ price + income + promo, data = df)

summary(m_level_level)

## 
## Call:
## lm(formula = quantity ~ price + income + promo, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1857.9  -589.9  -103.9   380.0  7436.2 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.528e+03  3.197e+02  26.678  < 2e-16 ***
## price       -5.431e+02  2.512e+01 -21.616  < 2e-16 ***
## income       4.735e-02  6.129e-03   7.726 2.82e-13 ***
## promo        4.229e+02  1.265e+02   3.344 0.000955 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 991.4 on 246 degrees of freedom
## Multiple R-squared:  0.6911, Adjusted R-squared:  0.6873 
## F-statistic: 183.4 on 3 and 246 DF,  p-value: < 2.2e-16

# Add controls / dummies / interactions if required
# m2 <- lm(y ~ x + z1 + z2, data = df)
# summary(m2)

m_log_level <- lm(ln_quantity ~ price + income + promo, data = df)

summary(m_log_level)

## 
## Call:
## lm(formula = ln_quantity ~ price + income + promo, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46192 -0.10058  0.00345  0.08543  0.56559 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.272e+00  5.098e-02 181.879  < 2e-16 ***
## price       -1.209e-01  4.006e-03 -30.164  < 2e-16 ***
## income       9.201e-06  9.774e-07   9.414  < 2e-16 ***
## promo        1.108e-01  2.017e-02   5.495 9.77e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1581 on 246 degrees of freedom
## Multiple R-squared:  0.8102, Adjusted R-squared:  0.8079 
## F-statistic: 350.1 on 3 and 246 DF,  p-value: < 2.2e-16

beta_price <- coef(m_log_level)["price"]

100 * (exp(beta_price) - 1)

##     price 
## -11.38342

# Clean table for reports
# modelsummary(list("Baseline" = m1, "Extended" = m2), output = "markdown")
m_level_log <- lm(quantity ~ ln_price + ln_income + promo, data = df)

summary(m_level_log)

## 
## Call:
## lm(formula = quantity ~ ln_price + ln_income + promo, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2089.9  -508.1   -12.3   319.8  6501.8 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4022.2     1667.1   2.413   0.0166 *  
## ln_price     -5683.9      222.8 -25.514  < 2e-16 ***
## ln_income     1306.3      153.9   8.485 2.04e-15 ***
## promo          460.1      112.6   4.087 5.92e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 883.5 on 246 degrees of freedom
## Multiple R-squared:  0.7546, Adjusted R-squared:  0.7516 
## F-statistic: 252.2 on 3 and 246 DF,  p-value: < 2.2e-16

m_log_log <- lm(ln_quantity ~ ln_price + ln_income + promo, data = df)

summary(m_log_log)

## 
## Call:
## lm(formula = ln_quantity ~ ln_price + ln_income + promo, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46045 -0.10185  0.01338  0.09069  0.42295 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.38841    0.28192  29.754  < 2e-16 ***
## ln_price    -1.21395    0.03767 -32.223  < 2e-16 ***
## ln_income    0.26395    0.02603  10.139  < 2e-16 ***
## promo        0.11996    0.01904   6.301 1.35e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1494 on 246 degrees of freedom
## Multiple R-squared:  0.8305, Adjusted R-squared:  0.8285 
## F-statistic: 401.8 on 3 and 246 DF,  p-value: < 2.2e-16

beta_promo <- coef(m_log_log)["promo"]

100 * (exp(beta_promo) - 1)

##    promo 
## 12.74478

m_poly <- lm(quantity ~ price + I(price^2) + income + promo, data = df)

summary(m_poly)

## 
## Call:
## lm(formula = quantity ~ price + I(price^2) + income + promo, 
##     data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2252.8  -481.6   -13.1   363.3  6017.6 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.411e+04  6.670e+02  21.156  < 2e-16 ***
## price       -1.661e+03  1.235e+02 -13.451  < 2e-16 ***
## I(price^2)   5.203e+01  5.658e+00   9.195  < 2e-16 ***
## income       4.750e-02  5.295e-03   8.971  < 2e-16 ***
## promo        5.392e+02  1.100e+02   4.903 1.72e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 856.5 on 245 degrees of freedom
## Multiple R-squared:  0.7703, Adjusted R-squared:  0.7666 
## F-statistic: 205.4 on 4 and 245 DF,  p-value: < 2.2e-16

Does the price^2 sign suggest curvature? Yes,since its coefficient is 52.03 which is positive, and has a high (p < 2e-16), and the price coefficient is -1661 which is negative. Hence it proves a curvature in the demand curve.

Does it make economic sense for demand to stop falling at high prices? No it doesnt because according to the law of demand, higher prices means that demand will fall.

modelsummary(
  list(
    "Level-Level" = m_level_level,
    "Log-Level" = m_log_level,
    "Level-Log" = m_level_log,
    "Log-Log" = m_log_log,
    "Poly" = m_poly
  ),
  output = "markdown"
)

	Level-Level	Log-Level	Level-Log	Log-Log	Poly
(Intercept)	8528.000	9.272	4022.184	8.388	14110.566
	(319.670)	(0.051)	(1667.121)	(0.282)	(666.973)
price	-543.074	-0.121			-1660.635
	(25.124)	(0.004)			(123.458)
income	0.047	0.000			0.048
	(0.006)	(0.000)			(0.005)
promo	422.873	0.111	460.091	0.120	539.242
	(126.465)	(0.020)	(112.572)	(0.019)	(109.993)
ln_price			-5683.933	-1.214
			(222.775)	(0.038)
ln_income			1306.274	0.264
			(153.945)	(0.026)
I(price^2)					52.030
					(5.658)
Num.Obs.	250	250	250	250	250
R2	0.691	0.810	0.755	0.831	0.770
R2 Adj.	0.687	0.808	0.752	0.828	0.767
AIC	4165.0	-206.9	4107.4	-235.1	4092.8
BIC	4182.6	-189.3	4125.0	-217.5	4114.0
Log.Lik.	-2077.486	108.429	-2048.688	122.556	-2040.425
F	183.417	350.132	252.181	401.838	205.426
RMSE	983.39	0.16	876.39	0.15	847.90

set.seed(123)

n <- nrow(df)

idx_train <- sample(1:n, size = round(0.7 * n))

train <- df[idx_train, ]

test <- df[-idx_train, ]

m1 <- lm(quantity ~ price + income + promo, data = train)

m2 <- lm(ln_quantity ~ price + income + promo, data = train)

m3 <- lm(quantity ~ ln_price + ln_income + promo, data = train)

m4 <- lm(ln_quantity ~ ln_price + ln_income + promo, data = train)

m5 <- lm(quantity ~ price + I(price^2) + income + promo, data = train)

rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))

pred1 <- predict(m1, newdata = test)

pred3 <- predict(m3, newdata = test)

pred5 <- predict(m5, newdata = test)

pred2 <- exp(predict(m2, newdata = test))
pred4 <- exp(predict(m4, newdata = test))

tibble(model = c("Level-Level", "Log-Level", "Level-Log", "Log-Log", "Poly"), RMSE = c( rmse(test$quantity, pred1), rmse(test$quantity, pred2),
rmse(test$quantity, pred3), rmse(test$quantity, pred4), rmse(test$quantity, pred5) )) %>% arrange(RMSE)

The Log-Log model is chosen as the preferred specification because it has the lowest RMSE in the out-of-sample prediction exercise, indicating the best predictive performance. In addition, the model has a relatively high R² of about 0.83, meaning it explains a large share of the variation in quantity demanded. The log-log form is also convenient because its coefficients can be interpreted as elasticities.

The price coefficient is -1.21, implying that a 1% increase in price reduces quantity demanded by about 1.21%. Since the elasticity is greater than one in absolute value, demand appears to be elastic to price.

The income coefficient is about 0.26, meaning that a 1% increase in income raises quantity demanded by roughly 0.26%. This shows that the product behaves as a normal good.

The promotion coefficient is 0.11996. Using the transformation 100(exp(β)−1), promotions increase expected demand by about 12.75%, holding other variables constant.

One limitation of the model is that it assumes constant elasticities across observations. As a possible extension, interaction terms such as price:promo could be included to test whether promotions affect consumers’ sensitivity to price.

Econometrics I – Lab Report

Lab 06: Functional Forms & Transformations

Name Surname

2026-03-18

1 Overview (2–4 sentences)

2 Reproducibility checklist (must be TRUE before submission)