#Step 1: Conceptual model
Dependable variable includes Revenue in million pounds.
Independent variables s MAN (management quality), STA (staff quality), TYP (1 = modern pub, 0 = traditional pub), COM (competitor density), INCOME (local income).
The independent variables MAN, STA and INCOME will have a positive effect on revenue.
The independent variable TYP will have a likely positive effect on revenue.
The independent variables COM will have a ambiguous effect on revenue.
#Step 2: Load data and summary statistics
data <- read.csv("companions_data.csv")
head(data)
## pub rev man sta typ income com
## 1 1 14.46734 7 5 1 12.60 0.4758482
## 2 2 15.04026 6 8 1 9.98 0.3994318
## 3 3 14.87616 6 7 0 10.30 0.9406128
## 4 4 16.82898 8 6 1 8.90 0.3127369
## 5 5 10.79394 4 5 0 9.00 0.7215752
## 6 6 12.35022 5 8 1 8.90 0.5239477
summary(data)
## pub rev man sta typ
## Min. : 1.0 Min. : 6.68 Min. : 3.000 Min. :1.00 Min. :0.0
## 1st Qu.: 250.8 1st Qu.:13.03 1st Qu.: 5.000 1st Qu.:5.00 1st Qu.:0.0
## Median : 500.5 Median :14.59 Median : 6.000 Median :6.00 Median :0.5
## Mean : 500.5 Mean :14.57 Mean : 6.182 Mean :5.76 Mean :0.5
## 3rd Qu.: 750.2 3rd Qu.:16.21 3rd Qu.: 7.000 3rd Qu.:7.00 3rd Qu.:1.0
## Max. :1000.0 Max. :20.51 Max. :10.000 Max. :9.00 Max. :1.0
## income com
## Min. : 7.00 Min. :0.08551
## 1st Qu.: 9.00 1st Qu.:0.40239
## Median :10.34 Median :0.50554
## Mean :10.67 Mean :0.50542
## 3rd Qu.:12.30 3rd Qu.:0.61342
## Max. :14.30 Max. :1.09986
#Step 3: Scatter plot
#This assignment requires the following packages
#install.packages(“stargazer”)
#install.packages(“dplyr”)
#install.packages(“GGally”)
##install.packages(“ggplot2”)
library(stargazer)
## Warning: package 'stargazer' was built under R version 4.5.2
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(GGally)
## Warning: package 'GGally' was built under R version 4.5.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.5.2
library(ggplot2)
ggpairs(data[,c("rev", "man", "sta", "income", "com")])
Revenue increases with management and staff quality (as expected), while income and competition show very weak relationships, which is not fully consistent with prior economic expectations.
#Step 4: Correlation Matrix
cor_matrix <- cor(data[, c("rev","man","sta","typ","income","com")])
cor_matrix
## rev man sta typ income com
## rev 1.00000000 0.866677747 0.52764040 0.22769740 0.04748579 0.032137991
## man 0.86667775 1.000000000 0.37534361 0.13791423 -0.03755771 0.008185842
## sta 0.52764040 0.375343612 1.00000000 0.02504888 -0.01949465 -0.019629220
## typ 0.22769740 0.137914233 0.02504888 1.00000000 0.01269144 0.065605394
## income 0.04748579 -0.037557712 -0.01949465 0.01269144 1.00000000 -0.049762416
## com 0.03213799 0.008185842 -0.01962922 0.06560539 -0.04976242 1.000000000
If |correlation| > 0.85 drop one variable.
There are no correlations above this threshold.
#Step 5: Choose Model
We keep all the variables
#Step 6: Estimate model
model1 <- lm(rev ~ man + sta + typ + income + com, data=data)
summary(model1)
##
## Call:
## lm(formula = rev ~ man + sta + typ + income + com, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.66824 -0.65635 0.01647 0.66823 3.11067
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.57062 0.29180 5.383 9.16e-08 ***
## man 1.55939 0.03002 51.939 < 2e-16 ***
## sta 0.29834 0.01808 16.500 < 2e-16 ***
## typ 0.51786 0.06227 8.316 2.97e-16 ***
## income 0.11035 0.01847 5.975 3.21e-09 ***
## com 0.40926 0.20390 2.007 0.045 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9726 on 994 degrees of freedom
## Multiple R-squared: 0.8192, Adjusted R-squared: 0.8183
## F-statistic: 900.6 on 5 and 994 DF, p-value: < 2.2e-16
#Step 7: Compare with alternative models
model2 <- lm(rev ~ man + sta + income + com, data=data) # drop TYPE
model3 <- lm(rev ~ man + income + com, data=data) # drop STA & TYPE
AIC(model1, model2, model3)
## df AIC
## model1 7 2790.195
## model2 6 2855.454
## model3 5 3075.995
BIC(model1, model2, model3)
## df BIC
## model1 7 2824.550
## model2 6 2884.901
## model3 5 3100.534
Choosen model is model1 as it has the lowest AIC/BIC.
#Step 8: Coefficient Interpretation
A one unit increase in management quality(man) increases revenue by 1.56 million pounds.
A one unit increase in staff quality(sta) increases revenue by 0.30 million pounds.
Since it is a modern style pub (type = 1), the revenue increases by 0.52 million pounds as compared to a traditional pub.
A 1000 pound increase in local income increases revenue by 0.11 million pounds.
A one unit increase in competitor(com) increases revenue by 0.41 million pounds which indicates competition cluster attracts customers.
#Step 9: Prediction of revenue for Site A and B
We use our chosen regression model model1 to predict revenue for 2 new sites.
#Site Characteristics (Assumed)
| Variable | Site A | Site B |
|---|---|---|
| MAN | 6 (experienced owner) | 6 (expected competent manager) |
| STA | 7 (loyal local staff) | 6 (average, new hires) |
| TYPE | 0 (traditional) | 1 (modern pub conversion) |
| INCOME | 8 (rural low-income) | 14.8(urban high-income) |
| COM | 0.1 (very low competition) | 9 (high competition area) |
#Scenerio Calculation
Revenue= 1.5706 + 1.5594(MAN) + 0.2983(STA) + 0.5179(TYP) + 0.1104(INCOME) + 0.4093(COM)
Revenue (Site A) = 1.5706+ 1.5594 (6)+ 0.2983(7) + 0.5179(0) + 0.1104(8) + 0.4093(0.1) = 13.94 million pounds
Revenue (Site B) = 1.5706+ 1.5594 (6)+ 0.2983(6) + 0.5179(1) + 0.1104(14.8) + 0.4093(9) = 18.55 million pounds
#Sensitivity Analysis (uncertain income)
We test 3 possible income levels for each location to assess how predictions change under uncertainty.
| INCOME | Site A Predicted Revenue | Site B Predicted Revenue |
|---|---|---|
| 6 | 13.72 | 18.33 |
| 8 (base) | 13.94 | 18.55 |
| 10 | 14.16 | 18.77 |
#Conclusion:
Companions should choose Site B because it consistently shows significantly higher predicted revenues across all tested income levels.
#Step 10: Unexpected shock (eg: Covid)
An unexpected shock like COVID would reduce foot traffic and on-premise sales, lowering revenues for both sites.
Under such circumstances, Site A is comparatively more secure due to its lower cost structure, whereas Site B becomes riskier because its higher fixed costs might escalate losses.