#Step 1: Conceptual model

Dependable variable includes Revenue in million pounds.

Independent variables s MAN (management quality), STA (staff quality), TYP (1 = modern pub, 0 = traditional pub), COM (competitor density), INCOME (local income).

The independent variables MAN, STA and INCOME will have a positive effect on revenue.

The independent variable TYP will have a likely positive effect on revenue.

The independent variables COM will have a ambiguous effect on revenue.

#Step 2: Load data and summary statistics

data <- read.csv("companions_data.csv")
head(data)
##   pub      rev man sta typ income       com
## 1   1 14.46734   7   5   1  12.60 0.4758482
## 2   2 15.04026   6   8   1   9.98 0.3994318
## 3   3 14.87616   6   7   0  10.30 0.9406128
## 4   4 16.82898   8   6   1   8.90 0.3127369
## 5   5 10.79394   4   5   0   9.00 0.7215752
## 6   6 12.35022   5   8   1   8.90 0.5239477
summary(data)
##       pub              rev             man              sta            typ     
##  Min.   :   1.0   Min.   : 6.68   Min.   : 3.000   Min.   :1.00   Min.   :0.0  
##  1st Qu.: 250.8   1st Qu.:13.03   1st Qu.: 5.000   1st Qu.:5.00   1st Qu.:0.0  
##  Median : 500.5   Median :14.59   Median : 6.000   Median :6.00   Median :0.5  
##  Mean   : 500.5   Mean   :14.57   Mean   : 6.182   Mean   :5.76   Mean   :0.5  
##  3rd Qu.: 750.2   3rd Qu.:16.21   3rd Qu.: 7.000   3rd Qu.:7.00   3rd Qu.:1.0  
##  Max.   :1000.0   Max.   :20.51   Max.   :10.000   Max.   :9.00   Max.   :1.0  
##      income           com         
##  Min.   : 7.00   Min.   :0.08551  
##  1st Qu.: 9.00   1st Qu.:0.40239  
##  Median :10.34   Median :0.50554  
##  Mean   :10.67   Mean   :0.50542  
##  3rd Qu.:12.30   3rd Qu.:0.61342  
##  Max.   :14.30   Max.   :1.09986

#Step 3: Scatter plot

#This assignment requires the following packages

#install.packages(“stargazer”)

#install.packages(“dplyr”)

#install.packages(“GGally”)

##install.packages(“ggplot2”)

library(stargazer)
## Warning: package 'stargazer' was built under R version 4.5.2
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(GGally)
## Warning: package 'GGally' was built under R version 4.5.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.5.2
library(ggplot2)

ggpairs(data[,c("rev", "man", "sta", "income", "com")])

Revenue increases with management and staff quality (as expected), while income and competition show very weak relationships, which is not fully consistent with prior economic expectations.

#Step 4: Correlation Matrix

cor_matrix <- cor(data[, c("rev","man","sta","typ","income","com")])
cor_matrix
##               rev          man         sta        typ      income          com
## rev    1.00000000  0.866677747  0.52764040 0.22769740  0.04748579  0.032137991
## man    0.86667775  1.000000000  0.37534361 0.13791423 -0.03755771  0.008185842
## sta    0.52764040  0.375343612  1.00000000 0.02504888 -0.01949465 -0.019629220
## typ    0.22769740  0.137914233  0.02504888 1.00000000  0.01269144  0.065605394
## income 0.04748579 -0.037557712 -0.01949465 0.01269144  1.00000000 -0.049762416
## com    0.03213799  0.008185842 -0.01962922 0.06560539 -0.04976242  1.000000000

If |correlation| > 0.85 drop one variable.

There are no correlations above this threshold.

#Step 5: Choose Model

We keep all the variables

#Step 6: Estimate model

model1 <- lm(rev ~ man + sta + typ + income + com, data=data)
summary(model1)
## 
## Call:
## lm(formula = rev ~ man + sta + typ + income + com, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.66824 -0.65635  0.01647  0.66823  3.11067 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.57062    0.29180   5.383 9.16e-08 ***
## man          1.55939    0.03002  51.939  < 2e-16 ***
## sta          0.29834    0.01808  16.500  < 2e-16 ***
## typ          0.51786    0.06227   8.316 2.97e-16 ***
## income       0.11035    0.01847   5.975 3.21e-09 ***
## com          0.40926    0.20390   2.007    0.045 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9726 on 994 degrees of freedom
## Multiple R-squared:  0.8192, Adjusted R-squared:  0.8183 
## F-statistic: 900.6 on 5 and 994 DF,  p-value: < 2.2e-16

#Step 7: Compare with alternative models

model2 <- lm(rev ~ man + sta + income + com, data=data)     # drop TYPE
model3 <- lm(rev ~ man + income + com, data=data)           # drop STA & TYPE

AIC(model1, model2, model3)
##        df      AIC
## model1  7 2790.195
## model2  6 2855.454
## model3  5 3075.995
BIC(model1, model2, model3)
##        df      BIC
## model1  7 2824.550
## model2  6 2884.901
## model3  5 3100.534

Choosen model is model1 as it has the lowest AIC/BIC.

#Step 8: Coefficient Interpretation

A one unit increase in management quality(man) increases revenue by 1.56 million pounds.

A one unit increase in staff quality(sta) increases revenue by 0.30 million pounds.

Since it is a modern style pub (type = 1), the revenue increases by 0.52 million pounds as compared to a traditional pub.

A 1000 pound increase in local income increases revenue by 0.11 million pounds.

A one unit increase in competitor(com) increases revenue by 0.41 million pounds which indicates competition cluster attracts customers.

#Step 9: Prediction of revenue for Site A and B

We use our chosen regression model model1 to predict revenue for 2 new sites.

#Site Characteristics (Assumed)

Variable Site A Site B
MAN 6 (experienced owner) 6 (expected competent manager)
STA 7 (loyal local staff) 6 (average, new hires)
TYPE 0 (traditional) 1 (modern pub conversion)
INCOME 8 (rural low-income) 14.8(urban high-income)
COM 0.1 (very low competition) 9 (high competition area)

#Scenerio Calculation

Revenue= 1.5706 + 1.5594(MAN) + 0.2983(STA) + 0.5179(TYP) + 0.1104(INCOME) + 0.4093(COM)

Revenue (Site A) = 1.5706+ 1.5594 (6)+ 0.2983(7) + 0.5179(0) + 0.1104(8) + 0.4093(0.1) = 13.94 million pounds

Revenue (Site B) = 1.5706+ 1.5594 (6)+ 0.2983(6) + 0.5179(1) + 0.1104(14.8) + 0.4093(9) = 18.55 million pounds

#Sensitivity Analysis (uncertain income)

We test 3 possible income levels for each location to assess how predictions change under uncertainty.

INCOME Site A Predicted Revenue Site B Predicted Revenue
6 13.72 18.33
8 (base) 13.94 18.55
10 14.16 18.77

#Conclusion:

Companions should choose Site B because it consistently shows significantly higher predicted revenues across all tested income levels.

#Step 10: Unexpected shock (eg: Covid)

An unexpected shock like COVID would reduce foot traffic and on-premise sales, lowering revenues for both sites.

Under such circumstances, Site A is comparatively more secure due to its lower cost structure, whereas Site B becomes riskier because its higher fixed costs might escalate losses.