Evidencia

# Load packages
library(readxl) # for the read_xlsx() function
library(wooldridge)
library(psych) # for the describe() function
library(tseries) # for the jarque.bera.test() function

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(lmtest) # for bptest(), coeftest(), dwtest() & bgtest() functions

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(sandwich) # for vcovHC() function
library(dplyr) # for lag() function

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following object is masked from 'package:psych':
## 
##     logit

library(data.table)

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

1. Executive Summary

This project consisted in developing a model that could help Alicia, an entrepreneur who wants to construct a hotel and a gas station, predict some factors that could help her decide whether it would be a good or bad decision to do the investment. By analyzing the future behavior of some variables such as the gas sales in the area of Yucatan and Quintana Roo, the hotel occupancy of the area and the exchange rate (USD-MXN), Alicia could get to know the operating margin for her business. She was worried about this, because depending on this factor, she would decide whether to invest or not; after all the analysis, we can say that Alicia is ready to do the investment.

First of all, we were provided a database with some historical information about gasoline, hotel occupancy and the exchange rate. With this information we had to develop a regression model, for each one of these three variables, that could help to predict future values of these factors. For gasoline, we were given information about the price of the gasoline in the area, the demand, the number of vehicles and the oil GDP; all from 2000 to 2019. After making some analysis, we decided that the best model is one that includes the data of vehicles and oilGdp, all in a logarithmic form, as well as the sales. For the hotel occupancy, we were given data about the GDP of the touristic activity in the area, the number of national and foreign tourists and the average time of stay of these people. With this information we saw that the best model was one that included the number of foreign tourists and the average time of stay. Lastly, with information about financial instruments such as the Cetes for 28 days, Cetes for 91 days, the oil price (WTI), gold and the Standard & Poors Index, we had to find the best model for predicting the exchange rate. We got to the conclusion that the information of Cetes 28, Cetes 91 and the Standard and Poors, all in a logarithmic way, had the best results.

After testing that all these models accomplished the principles of a correct regression model, which include linearity, normality of errors, homoscedasticity, no multicollinearity, and others, we used them to predict some other factors that would led us to the operating margin. With predicted information for the next 10 years of the variables that we used to create our initial models, we were able to predict. Alicia told us that we could calculate the revenue of the gasoline by multiplying the total sales by 6%. On the other hand, knowing that the hotel will have 20 rooms with a $800 fee for the night and the hotel occupancy that the model gives us, we could calculate the hotel revenue. This information also helped to calculate the costs, which included a variable expense equivalent to 33% of the revenue of the gasoline and 20% of the revenue of the hotel, as well as almost 2.5 million in fixed costs. The last cost would be the for the debt, this because Alicia decided to get a loan for 6 million pesos with an annual interest rate of 3%. Knowing this information, as well as that the time of the loan will be 10 years, we could calculate that the annual interest payment for the loan in pesos is of a little more than 35,000.

Getting to know our earning and costs allowed us to calculate the profit, which is the result of subtracting the costs to the revenue. Also, by knowing the profit, we could easily calculate the operating profit, which is the result of dividing the profit by the total earnings. We got to know that the base profit is, of about 1.6 million, or 26% margin, which is exactly what Alicia expected. However, these results can be understood as the base or realistic scenario, but Alicia also wanted to know that she could expect in a pessimist or optimistic scenario. By making a process in which we calculate some normal random numbers to create different scenarios, including the variance of our used variables and by using our models, we could again predict the value of the variables sales of gasoline, hotel occupancy and the exchange rate. This step is very important, because as we use random numbers, we are making different scenarios that affect all the variables.

After doing this, and calculating again all the factors such as revenues, costs and operating margin, we got to know the expected results for the other two scenarios, pessimist and optimistic. In the base scenario, Alicia got a 26% operating margin, which is exactly what she was expecting. For the pessimist scenario she needs no less than 23% margin, our analysis tell us that, using a 5% cutoff, she will get a 24% margin. On the other hand, in the best possible scenario, using a 95% cutoff, she could expect a 29% margin. In conclusion, the analysis helped us to decide that the investment seems to be a good decision, based on the conditions that Alicia has for the project.

2. Problem to solve

Alice is an entrepreneur that wants to venture into roadside ancillary services in Yucatan and Quintana Roo states. She is interested in constructing a hotel and a gas station, with the intention to expand into other services in the future, such as a convenience store and restaurants, in the same service area. Alicia is planning to build the gas station with two pumps and the hotel with 20 rooms (charging $800 per room/night), with no intention of expanding during the first 10 years.

Alicia knows that the exoected revenue of the gasoline sales is 6% of the total gas sales, according to the current legislation and business model in Mexico.

Alicia must apply for a loan of 6,000,000 MXN to build this project. However, during the research of the credit loans in pesos, she realized that interest rates in Mexico are extremely high and it would be very expensive, at the moment, to take a loan in that exchange rate. Therefore, she is assessing to take a loan in USD for 299,102.7 dollars with an effective annual rate of 3.0%, payable over 10 years (in annual payments).

To take a decision on this opportunity, Alicia must calculate her operating profit, using the forecast of both the hotel and gas station revenues, as well as the forecast of the exchange rate MXN/USD to calculate the debt in MXN. Regarding the expenses, Alicia has the following information: variable expense equivalent to 33% of the revenue of the gasoline and 20% of the revenue of the hotel, aswell as almost 2.5 million in fixed expenses.

The main questions Alicia is facing are the following: 1. What variables are related to gas sales, hotel occupancy, and the exchange rate? Is it possible to create a model to forecast 10 years of these variables? 2. Are these models statistically valid? Is it necessary to make some statistical fit to the models in order to use them? 3. Knowing the models are statistically valid, which would be a correct way to simulate revenues and expenses? 4. Which would be the base scenario of the operating cash flow for this business? 5. Which would be the optimistic and pessimistic scenario for this business in terms of operating cash flow?

3. Resolution Methodology

For this problem we decided to use regression models for each one of the three main variables: gas sales, hotel occupancy and exchange rate. As we already said, by taking the historical information of the given “x” or predictory variables, we ran different models to see which one would be statiscally more significant. For exaple, for the gas sales, we first ran a regression model including all the variables in a normal form, but we quickly discovered that some variables were not significant at all. Then, after taking out these variables and running the model again, we saw that the model faced some issues with the principles of the regression models, so we decided to try with less variables or by transforming them in a different form. In summary, we ran different models to see which wone was highly significant and that had no issue with the principles. All this process was made for the three main variables.

After having the models validated, we used the models to predict the future values of the main variables using information that we were given of the predictory variables for the next 10 years. The only issue was that, for the exchange rate, we were only given information about two variables, being the Cetes91 and the Standard and Poors, so according to the model that I chose, i had to research about future values for the Cetes 28. Having this, we proceeded to calculate the revenue of the gasoline, the revenye of the hotel, the costs and the profit/ operating margin. We knew that this was only the base scenario, because the used information for the variables is the one expected for the next years, however, Alicia also wanted to know what she could expect in a bad and good scenario. So, by making a covariance matrix in which we correlate the variance of the variables, and by creating more than 1 million different scenarios for the used variables, we could get to see how the profit would look like in many different scenarios. Using the models, and affecting the predicted values for the different variables with these scenarios, we calculated again the profit for one year in the future. It was great news that, by using quantiles of these scenario, we saw that the base margin is about 26%, the worse in 24% and the best in 29%.

4. Explanatory variables research

Alicia wanted to calculate the gas sales, hotel occupancy and exchange rate, in order to get to the profit. To do this, we were given this information:

Gas Sales: -Sales of gasoline in the area -Price of the gasoline in the area -Demand of gasoline in the area -Vehicles in the area -Oil gross domestic product of the area All from 2000 to 2019
Hotel Occupancy -Average occupancy rate of the hotels of the area -Growth domestic product of the touristic activity in the area -Total of national tourists in the area -Total of foreign tourists in the area -Average days of stay of the tourists in the area All from 2003 to 2019
Exchange Rate -Exchange rate (USDMXN) -Return rate for the Cete of 28 days -Return rate for the Cete of 91 days -Market value of the crude oil (WTI) -Index price for the Standard and Poors All from December of 2013 to August of 2019

All these variables were used to calculate the best model for each one of the three main variables. Some of them were more significant than others to predict each variable, and therefore, used for the models. Later on, as we already said, we also got information for the next 10 years of almost all of these explanatory variables, except for the exchange rate.

5. Regression models and statistical inference in R // 6. Detection and correction of the violations of the assumptions in R

First we read the excel file in which we have the information about all the different variables. We store them in three different variables because the information is separated in three sheets, for gasoline, occupancy and exchange rate:

# Leemos la info del excel, separando las hojas
gas <- read_xlsx("FP_Hotel&Gas_Data1.xlsx", sheet = "gasoline")

## New names:
## * `` -> ...7
## * `` -> ...8
## * `` -> ...9
## * `` -> ...10
## * `` -> ...11
## * ...

head(gas, 5)

## # A tibble: 5 x 19
##      yr  sales price demand vehicles oilGdp ...7   ...8   ...9  ...10  ...11
##   <dbl>  <dbl> <dbl>  <dbl>    <dbl>  <dbl> <lgl> <dbl>  <dbl>  <dbl>  <dbl>
## 1  2000 1.55e7  5.27   15.0   1.56e7 6.07e5 NA     4.96 5.15e6 2.00e5 5.35e6
## 2  2001 1.62e7  5.61   15.3   1.77e7 6.20e5 NA     5.06 5.83e6 2.04e5 6.03e6
## 3  2002 1.64e7  5.86   16.5   1.84e7 6.57e5 NA     5.44 6.06e6 2.17e5 6.27e6
## 4  2003 1.66e7  6.04   16.6   1.84e7 6.59e5 NA     5.49 6.09e6 2.18e5 6.31e6
## 5  2004 1.66e7  6.21   17.4   1.91e7 6.73e5 NA     5.73 6.31e6 2.22e5 6.54e6
## # … with 8 more variables: ...12 <dbl>, ...13 <dbl>, ...14 <lgl>, ...15 <dbl>,
## #   ...16 <dbl>, ...17 <dbl>, ...18 <dbl>, ...19 <dbl>

occ <- read_xlsx("FP_Hotel&Gas_Data1.xlsx", sheet = "occupancy")

## New names:
## * `` -> ...7
## * `` -> ...8
## * `` -> ...9
## * `` -> ...10
## * `` -> ...11

head(occ, 5)

## # A tibble: 5 x 11
##      yr   occ tourGdp domTour intTour avgStay ...7   ...8  ...9 ...10 ...11
##   <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <lgl> <dbl> <dbl> <dbl> <dbl>
## 1  2003  68.7    75.8 106756. 320576.    4.76 NA     8.29  57.6  65.9 2.77 
## 2  2004  66.1    76.3 140338. 342074.    4.71 NA     8.85  57.0  65.8 0.307
## 3  2005  70.6    78.2 135418. 371493.    4.96 NA     9.61  60.0  69.6 1.06 
## 4  2006  75.2    83.4 117063. 418025.    5.13 NA    10.8   62.1  72.9 2.30 
## 5  2007  73.0    83.5 106490. 392934.    5.03 NA    10.2   60.8  71.0 1.99

exr <- read_xlsx("FP_Hotel&Gas_Data1.xlsx", sheet = "exrate")

## New names:
## * `` -> ...8
## * `` -> ...9

head(exr, 5)

## # A tibble: 5 x 9
##   Dates                 MXN CETE28 CETE91   WTI  GOLD   SnP ...8      ...9
##   <dttm>              <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <lgl>    <dbl>
## 1 2013-12-31 00:00:00  13.0   3.16   3.44  98.4 1202. 1848. NA    -0.00253
## 2 2014-01-31 00:00:00  13.4   3.18   3.42  97.5 1245. 1783. NA    -0.0103 
## 3 2014-02-28 00:00:00  13.2   3.15   3.34 103.  1326. 1859. NA    -0.00523
## 4 2014-03-31 00:00:00  13.1   3.19   3.3  102.  1284. 1872. NA     0.0155 
## 5 2014-04-30 00:00:00  13.1   3.25   3.42  99.7 1292. 1884. NA    -0.00511

##. Gasoline regression model

As a first step, we ran a multiple regression model for gasoline, including the information about all the explanatory variablles, which are the price, demand, vehicles and oilGdp. The importance of this first model is to determine which variables are significant or not.

# Model1: Gasoline
modGas <- lm(sales ~ price + demand + vehicles + oilGdp, data = gas)
modGasSum <- summary(modGas)
modGasSum

## 
## Call:
## lm(formula = sales ~ price + demand + vehicles + oilGdp, data = gas)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1478875  -357762   -41949   443314  1513857 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.942e+06  1.361e+06   2.895 0.011100 *  
## price        4.468e+04  1.209e+05   0.369 0.716993    
## demand      -3.200e+05  2.591e+05  -1.235 0.235725    
## vehicles     4.203e-01  1.323e-01   3.176 0.006268 ** 
## oilGdp       1.508e+01  3.222e+00   4.681 0.000296 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 769500 on 15 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.9403, Adjusted R-squared:  0.9244 
## F-statistic: 59.05 on 4 and 15 DF,  p-value: 5.328e-09

We realize that only the intercept, vehicles and oilGdp are significant or that can explain the sales of gasoline. Therefore, we decide to eliminate the other variables.

This second model includes only the variables that were seemed as significant, vehicles and oilGdp:

# Model1.2: Gasoline
modGas1 <- lm(sales ~ vehicles + oilGdp, data = gas)
modGasSum1 <- summary(modGas1)
modGasSum1

## 
## Call:
## lm(formula = sales ~ vehicles + oilGdp, data = gas)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1452572  -384514    -3055   391498  1778490 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.871e+06  1.129e+06   3.430 0.003192 ** 
## vehicles    2.354e-01  6.934e-02   3.395 0.003443 ** 
## oilGdp      1.259e+01  2.624e+00   4.799 0.000167 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 790500 on 17 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.9286, Adjusted R-squared:  0.9202 
## F-statistic: 110.5 on 2 and 17 DF,  p-value: 1.809e-10

The adjusted R-squared does not change that much, but now all the variables seem significant. Now we had to see if this model was valid by looking at the principles of a regression model:

#Graphs of the model
plot(modGas1)

By plotting the model, we see some good things. However, we know that this is only a visual support, so we run some tests in order to validate the model:

jarque.bera.test(modGas1$residuals)

## 
##  Jarque Bera Test
## 
## data:  modGas1$residuals
## X-squared = 1.0564, df = 2, p-value = 0.5897

vif(modGas1)

## vehicles   oilGdp 
## 3.554923 3.554923

bptest(modGas1)

## 
##  studentized Breusch-Pagan test
## 
## data:  modGas1
## BP = 5.1897, df = 2, p-value = 0.07466

We get to see that the model barely accomplishes the principles or assumptions, so we take it an option. Nevertheless, we have to look if there is a better model.

This third model includes again all the variables, but now in a logarithmic form:

# Model3: Gasoline
modGas3 <- lm(log(sales) ~ log(price) + log(demand) + log(vehicles) + log(oilGdp), data = gas)
modGasSum3 <- summary(modGas3)
modGasSum3

## 
## Call:
## lm(formula = log(sales) ~ log(price) + log(demand) + log(vehicles) + 
##     log(oilGdp), data = gas)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.067857 -0.016999 -0.003489  0.024604  0.071207 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.14940    2.51344   0.059  0.95339    
## log(price)     0.08081    0.06471   1.249  0.23083    
## log(demand)   -0.52313    0.26328  -1.987  0.06551 .  
## log(vehicles)  0.53953    0.15550   3.470  0.00343 ** 
## log(oilGdp)    0.65506    0.12244   5.350  8.1e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03443 on 15 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.9541, Adjusted R-squared:  0.9418 
## F-statistic: 77.92 on 4 and 15 DF,  p-value: 7.523e-10

We realize that, again, only vehicles and oilGdp are significant, so we can not use this model.

This fourth model includes all the variables again, but without including the intercept. This practice is not always the most recommended one, but other experts say that this shold not affect as the intercept is almost not significant in any case.

# Model4: Gasoline
modGas4 <- lm(log(sales) ~ log(price) + log(demand) + log(vehicles) + log(oilGdp) - 1, data = gas)
modGasSum4 <- summary(modGas4)
modGasSum4

## 
## Call:
## lm(formula = log(sales) ~ log(price) + log(demand) + log(vehicles) + 
##     log(oilGdp) - 1, data = gas)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.068119 -0.016870 -0.003546  0.025617  0.070768 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## log(price)     0.08252    0.05619   1.469 0.161337    
## log(demand)   -0.53665    0.12851  -4.176 0.000714 ***
## log(vehicles)  0.54733    0.08075   6.778 4.44e-06 ***
## log(oilGdp)    0.65907    0.09888   6.666 5.43e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03334 on 16 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.267e+06 on 4 and 16 DF,  p-value: < 2.2e-16

By taking away the intercept, something changes: now the demand also becomes significant, so we only need to take out the price.

# Model5: Gasoline
modGas5 <- lm(log(sales) ~ log(demand) + log(vehicles) + log(oilGdp) -1, data = gas)
modGasSum5 <- summary(modGas5)
modGasSum5

## 
## Call:
## lm(formula = log(sales) ~ log(demand) + log(vehicles) + log(oilGdp) - 
##     1, data = gas)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.055673 -0.021188 -0.004044  0.025193  0.066426 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## log(demand)   -0.35968    0.04613  -7.796 5.18e-07 ***
## log(vehicles)  0.53761    0.08317   6.464 5.83e-06 ***
## log(oilGdp)    0.64476    0.10169   6.341 7.39e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03446 on 17 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.582e+06 on 3 and 17 DF,  p-value: < 2.2e-16

The R-squared, similar to the last model, becomes 1. This could make us think that this model is perfect; however, it is a consequence of taking out the intercept, so we should avoid looking at this R2 as a critical factor now.

Again, we run some tests to validate this model:

#Graphs of the model
plot(modGas5)

jarque.bera.test(modGas5$residuals)

## 
##  Jarque Bera Test
## 
## data:  modGas5$residuals
## X-squared = 0.50288, df = 2, p-value = 0.7777

vif(modGas5)

## Warning in vif.default(modGas5): No intercept: vifs may not be sensible.

##   log(demand) log(vehicles)   log(oilGdp) 
##      338.8836    33640.9575    32115.0009

gas$Z <- 0.3333*gas$demand + 0.333*gas$vehicles + 0.333*gas$oilGdp
modGas5F <- lm(log(sales) ~ log(Z) - 1, data = gas)
modGas5FSum <- summary(modGas5F)
modGas5FSum

## 
## Call:
## lm(formula = log(sales) ~ log(Z) - 1, data = gas)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25763 -0.05328 -0.02989  0.06275  0.21582 
## 
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)    
## log(Z)  1.05408    0.00144     732   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1025 on 19 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 5.358e+05 on 1 and 19 DF,  p-value: < 2.2e-16

bptest(modGas5)

## 
##  studentized Breusch-Pagan test
## 
## data:  modGas5
## BP = 3.7407, df = 2, p-value = 0.1541

Finally, we learned that this model is good except for two things: it has multicollinearity and heteroskedasticity. We solved the multicollinearity by making an index of all three variables, as seen above, but we could not solve the heteroskedasticity issue. We can keep this model as another option, but also remembering that it has an issue.

Finally, this last model includes only the two variables that, since the beginning were seemed as significant, but everything in logarithmic.

# Model5: Gasoline
modGas6 <- lm(log(sales) ~ log(vehicles) + log(oilGdp), data = gas)
modGasSum6 <- summary(modGas6)
modGasSum6

## 
## Call:
## lm(formula = log(sales) ~ log(vehicles) + log(oilGdp), data = gas)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.057162 -0.020533 -0.001987  0.020857  0.083520 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    5.10470    0.71058   7.184 1.53e-06 ***
## log(vehicles)  0.26664    0.08606   3.098 0.006531 ** 
## log(oilGdp)    0.52661    0.11253   4.680 0.000215 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03669 on 17 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.9409, Adjusted R-squared:  0.934 
## F-statistic: 135.3 on 2 and 17 DF,  p-value: 3.614e-11

As this model does not take out the intercept, the R2 can be analyzed, and we see that is very good at 93%. Also, all the factors are significant and positive.

We proceed to validate the model:

#Gráficos del modelo
plot(modGas6)

jarque.bera.test(modGas6$residuals)

## 
##  Jarque Bera Test
## 
## data:  modGas6$residuals
## X-squared = 1.0352, df = 2, p-value = 0.5959

vif(modGas6)

## log(vehicles)   log(oilGdp) 
##      4.728478      4.728478

No hay multicolinearidad

bptest(modGas6)

## 
##  studentized Breusch-Pagan test
## 
## data:  modGas6
## BP = 3.535, df = 2, p-value = 0.1708

We discover that this model is great. It passed all the tests, meaning that is good for predicting.

We decide to keep the model 6 for gasoline for the next steps.

##. Occupancy

Similar to the first step with gasoline, we ran a multiple regression model to test all the variables:

# Model1: Occupancy
modOcc <- lm(occ ~ tourGdp + domTour + intTour + avgStay, data = occ)
modOccSum <- summary(modOcc)
modOccSum

## 
## Call:
## lm(formula = occ ~ tourGdp + domTour + intTour + avgStay, data = occ)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0604 -1.7841  0.0743  1.2059  4.8907 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  1.887e+00  2.239e+01   0.084  0.93425   
## tourGdp     -1.444e-01  2.300e-01  -0.628  0.54173   
## domTour     -1.227e-05  3.525e-05  -0.348  0.73392   
## intTour      3.721e-05  9.204e-06   4.043  0.00163 **
## avgStay      1.376e+01  3.721e+00   3.697  0.00305 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.833 on 12 degrees of freedom
## Multiple R-squared:  0.832,  Adjusted R-squared:  0.776 
## F-statistic: 14.86 on 4 and 12 DF,  p-value: 0.0001346

We quickly discovered that only the national tourists and the average days for stay are significant. We proceed to remove the other variables in the next model:

# Model1: Occupancy
modOcc1 <- lm(occ ~ intTour + avgStay, data = occ)
modOccSum1 <- summary(modOcc1)
modOccSum1

## 
## Call:
## lm(formula = occ ~ intTour + avgStay, data = occ)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0052 -2.3257  0.6618  1.1284  3.9807 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.896e+01  1.254e+01  -1.511    0.153    
## intTour      3.046e-05  4.022e-06   7.575 2.57e-06 ***
## avgStay      1.572e+01  2.420e+00   6.496 1.41e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.762 on 14 degrees of freedom
## Multiple R-squared:  0.8138, Adjusted R-squared:  0.7872 
## F-statistic:  30.6 on 2 and 14 DF,  p-value: 7.754e-06

The R2 does not change that much, but now we see that the intercept is not significant at all. Again, we test a new model but now taking out the intercept, knowing that it may affect the R2.

# Model2: Occupancy
modOcc2 <- lm(occ ~ intTour + avgStay -1, data = occ)
modOccSum2 <- summary(modOcc2)
modOccSum2

## 
## Call:
## lm(formula = occ ~ intTour + avgStay - 1, data = occ)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6425 -2.0826  0.9628  1.9995  2.9011 
## 
## Coefficients:
##          Estimate Std. Error t value Pr(>|t|)    
## intTour 2.586e-05  2.737e-06   9.449 1.05e-07 ***
## avgStay 1.210e+01  3.606e-01  33.551 1.59e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.877 on 15 degrees of freedom
## Multiple R-squared:  0.9985, Adjusted R-squared:  0.9983 
## F-statistic:  4904 on 2 and 15 DF,  p-value: < 2.2e-16

The R2 soars to almost 1, but we know that we should not look at this factor anymore. The important thing is that now the variables included in the model are significant.

We proceed to test this model:

#Gráficos del modelo
plot(modOcc2)

jarque.bera.test(modOcc2$residuals)

## 
##  Jarque Bera Test
## 
## data:  modOcc2$residuals
## X-squared = 2.1746, df = 2, p-value = 0.3371

vif(modOcc2)

## Warning in vif.default(modOcc2): No intercept: vifs may not be sensible.

##  intTour  avgStay 
## 5.484057 5.484057

bptest(modOcc2)

## 
##  studentized Breusch-Pagan test
## 
## data:  modOcc2
## BP = 0.074118, df = 1, p-value = 0.7854

After validating the model, we can confirm that is good for predicting. It passed all the tests in a great way. We decide to keep this model for the next steps.

##. Exchange Rate

Lastly, for the exchange rate we first have to do something different. As we are dealing with numeric factors that represent the performance of some financial assets, we should use them but in a logarithmic return. For this reason, we proceed to add new columns that include all the variables but expressed in logarithmic return:

exr$mxnlog <- log(exr$MXN/lag(exr$MXN))
exr$c28log <- log(exr$CETE28/lag(exr$CETE28))
exr$c91log <- log(exr$CETE91/lag(exr$CETE91))
exr$wtilog <- log(exr$WTI/lag(exr$WTI))
exr$goldlog <- log(exr$GOLD/lag(exr$GOLD))
exr$snplog <- log(exr$SnP/lag(exr$SnP))
head(exr,10)

## # A tibble: 10 x 15
##    Dates                 MXN CETE28 CETE91   WTI  GOLD   SnP ...8      ...9
##    <dttm>              <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <lgl>    <dbl>
##  1 2013-12-31 00:00:00  13.0   3.16   3.44  98.4 1202. 1848. NA    -0.00253
##  2 2014-01-31 00:00:00  13.4   3.18   3.42  97.5 1245. 1783. NA    -0.0103 
##  3 2014-02-28 00:00:00  13.2   3.15   3.34 103.  1326. 1859. NA    -0.00523
##  4 2014-03-31 00:00:00  13.1   3.19   3.3  102.  1284. 1872. NA     0.0155 
##  5 2014-04-30 00:00:00  13.1   3.25   3.42  99.7 1292. 1884. NA    -0.00511
##  6 2014-05-30 00:00:00  12.9   3.31   3.38 103.  1250. 1924. NA    -0.0650 
##  7 2014-06-30 00:00:00  13.0   2.9    2.91 105.  1327. 1960. NA     0      
##  8 2014-07-31 00:00:00  13.2   2.8    2.91  98.2 1283. 1931. NA    -0.00601
##  9 2014-08-29 00:00:00  13.1   2.76   2.87  96.0 1287. 2003. NA     0.0105 
## 10 2014-09-30 00:00:00  13.4   2.85   2.94  91.2 1208. 1972. NA    -0.00296
## # … with 6 more variables: mxnlog <dbl>, c28log <dbl>, c91log <dbl>,
## #   wtilog <dbl>, goldlog <dbl>, snplog <dbl>

Now, similar to the last models, we firt run a multiple regression to see which variables are significant:

# Model1: EXR
modEXR <- lm(mxnlog ~ c28log + c91log + wtilog + goldlog + snplog, data = exr)
modEXRSum <- summary(modEXR)
modEXRSum

## 
## Call:
## lm(formula = mxnlog ~ c28log + c91log + wtilog + goldlog + snplog, 
##     data = exr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08848 -0.01124 -0.00108  0.01424  0.05205 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.007095   0.003854   1.841   0.0704 . 
## c28log      -0.347776   0.151767  -2.292   0.0253 * 
## c91log       0.451402   0.145571   3.101   0.0029 **
## wtilog      -0.040015   0.040641  -0.985   0.3287   
## goldlog     -0.081694   0.093495  -0.874   0.3856   
## snplog      -0.236540   0.118310  -1.999   0.0500 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02903 on 62 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:   0.27,  Adjusted R-squared:  0.2111 
## F-statistic: 4.586 on 5 and 62 DF,  p-value: 0.001267

We get to discover that only the Cete 28, Cete 91 and the Standard & Poors are significant. So we create a second model that includes only these variables, and again taking away the intercept, which also was not significant.

# Model2: EXR
modEXR1 <- lm(mxnlog ~ c28log + c91log + snplog -1, data = exr)
modEXRSum1 <- summary(modEXR1)
modEXRSum1

## 
## Call:
## lm(formula = mxnlog ~ c28log + c91log + snplog - 1, data = exr)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.080685 -0.009017  0.005317  0.025949  0.060389 
## 
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)   
## c28log  -0.3244     0.1521  -2.132  0.03675 * 
## c91log   0.4826     0.1455   3.317  0.00149 **
## snplog  -0.2257     0.1074  -2.102  0.03943 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02963 on 65 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.2322, Adjusted R-squared:  0.1968 
## F-statistic: 6.554 on 3 and 65 DF,  p-value: 0.0006113

Something to see is that our R2 is very low, at nearly 20%. We could take this model by also knowing that the variables are significant, even though it has not a lot of power for explaining the exchange rate. We decide to test the model, and if it accomplishes the principles, we will use it:

#Gráficos del modelo
plot(modEXR1)

jarque.bera.test(modEXR1$residuals)

## 
##  Jarque Bera Test
## 
## data:  modEXR1$residuals
## X-squared = 2.7779, df = 2, p-value = 0.2493

vif(modEXR1)

## Warning in vif.default(modEXR1): No intercept: vifs may not be sensible.

##   c28log   c91log   snplog 
## 3.700288 3.698831 1.029766

bptest(modEXR1)

## 
##  studentized Breusch-Pagan test
## 
## data:  modEXR1
## BP = 1.8223, df = 2, p-value = 0.4021

We decide to keep this model as the final one, as it passes all the tests in a good way. Even though the R2 is not that great, it is a significant model.

7. Scenarios’ analysis in R

Base scenario

Having all the three models validated, we proceed to calculate some factors that will led us to the final thing that Alicia wants to know: the operating margin. However, in order to do this, we have to use our models to predict the gasoline sales, hotel occupancy and the exchange rate, because this will give us the revenue, costs, and therefore the profit. So, Alicia gave us a new database with future information for the next 10 years about each one of the explanatory variables. The only issue was that, as we already explained, we were missing information about the possible future values of Cete 28, so we had to do some research and make up some numbers to use our model for the exchange rate. We found information in a report from Scotiabank, which showed that from 2021 to 2022 the return rate of the Cetes will increase, and that this pattern should be present for the next years also.

We read the excel files, also separetely because the information is in different sheets:

fGas <- read_xlsx("FP_Hotel&Gas_Fcst1.xlsx", sheet = "gasoline")
head(fGas, 5)

## # A tibble: 5 x 5
##      yr price demand  vehicles   oilGdp
##   <dbl> <dbl>  <dbl>     <dbl>    <dbl>
## 1  2020  20.1   29.7 32570097.  990185.
## 2  2021  20.8   30.4 33346790. 1008428.
## 3  2022  21.5   31.2 34123482. 1026672.
## 4  2023  22.3   31.9 34900175. 1044916.
## 5  2024  23.0   32.7 35676868. 1063159.

fOcc <- read_xlsx("FP_Hotel&Gas_Fcst1.xlsx", sheet = "occupancy")
head(fOcc, 5)

## # A tibble: 5 x 6
##      yr  days tourGdp domTour  intTour avgStay
##   <dbl> <dbl>   <dbl>   <dbl>    <dbl>   <dbl>
## 1  2020   366    118. 274637.  918505.    4.87
## 2  2021   365    123. 284366.  959017.    4.72
## 3  2022   365    127. 294094.  999528.    4.08
## 4  2023   365    131. 303823. 1040040.    4.45
## 5  2024   366    136. 313552. 1080552.    4.55

fEXR <- read_xlsx("FP_Hotel&Gas_Fcst1.xlsx", sheet = "lrexrate")
head(fEXR, 5)

## # A tibble: 5 x 4
##      yr  c28log  c91log snplog
##   <dbl>   <dbl>   <dbl>  <dbl>
## 1  2020 -0.301  -0.0716  0.118
## 2  2021  0.0512 -0.0407  0.11 
## 3  2022  0.0458  0       0.11 
## 4  2023  0.0792  0       0.11 
## 5  2024  0.0669  0       0.11

Now, having the information, is time to make the operations and predictions:

#Revenue
fGas$nsales = predict(modGas6,fGas)
fGas$fnsales = exp(fGas$nsales)
fGas$revGas = 0.06 * fGas$fnsales
revGas = fGas$revGas
head(fGas,10)

## # A tibble: 10 x 8
##       yr price demand  vehicles   oilGdp nsales   fnsales   revGas
##    <dbl> <dbl>  <dbl>     <dbl>    <dbl>  <dbl>     <dbl>    <dbl>
##  1  2020  20.1   29.7 32570097.  990185.   17.0 23853375. 1431202.
##  2  2021  20.8   30.4 33346790. 1008428.   17.0 24235628. 1454138.
##  3  2022  21.5   31.2 34123482. 1026672.   17.0 24616200. 1476972.
##  4  2023  22.3   31.9 34900175. 1044916.   17.0 24995136. 1499708.
##  5  2024  23.0   32.7 35676868. 1063159.   17.0 25372480. 1522349.
##  6  2025  23.8   33.4 36453561. 1081403.   17.1 25748271. 1544896.
##  7  2026  24.7   34.1 37230254. 1099647.   17.1 26122549. 1567353.
##  8  2027  25.5   34.9 38006947. 1117890.   17.1 26495350. 1589721.
##  9  2028  26.4   35.6 38783639. 1136134.   17.1 26866709. 1612003.
## 10  2029  27.4   36.4 39560332. 1154378.   17.1 27236661. 1634200.

fOcc$nocc = predict(modOcc2,fOcc)
fOcc$revHot = (20 * 800 * fOcc$days) * (fOcc$nocc/100)
revHot = fOcc$revHot
head(fOcc,10)

## # A tibble: 10 x 8
##       yr  days tourGdp domTour  intTour avgStay  nocc   revHot
##    <dbl> <dbl>   <dbl>   <dbl>    <dbl>   <dbl> <dbl>    <dbl>
##  1  2020   366    118. 274637.  918505.    4.87  82.6 4839512.
##  2  2021   365    123. 284366.  959017.    4.72  81.9 4780974.
##  3  2022   365    127. 294094.  999528.    4.08  75.2 4391076.
##  4  2023   365    131. 303823. 1040040.    4.45  80.7 4711843.
##  5  2024   366    136. 313552. 1080552.    4.55  83.0 4859039.
##  6  2025   365    141. 323281. 1121063.    4.40  82.2 4798654.
##  7  2026   365    146. 333010. 1161575.    4.33  82.5 4816327.
##  8  2027   365    151. 342739. 1202086.    4.47  85.1 4971716.
##  9  2028   366    156. 352468. 1242598.    4.82  90.4 5295061.
## 10  2029   365    162. 362197. 1283109.    4.11  82.8 4838374.

totRev = fGas$revGas + fOcc$revHot
totRev

##       1       2       3       4       5       6       7       8       9      10 
## 6270715 6235112 5868048 6211551 6381388 6343551 6383680 6561437 6907064 6472574 
##      11 
## 6486285

In this part we first use the gasoline model to predict the gasoline sales, using the future values that we just got from the database. This instantly gives us the gasoline sales, but as the model is in logarithmic, we need to transform it using the exponential function. Having the real number, we multiply it by 6%, because the problem told us that this was the way to calculate the revenue.

Then, we calculate the hotel occupancy by also predicting the data using our model. Having the predicted hotel occupancy allows us to calculate the revenue for the hotel, which is the result of multiplying 20 rooms times $800 for the night times the day of the year, and this all, times the hotel occupancy. Finally, having this two revenues allows us to calculate the total one, which is the result of adding them.

In this next part we now calculate all the costs:

#Costs
varcost = (fGas$revGas*0.33) + (fOcc$revHot*0.2)
fixcost = rep(512342 + 1900000, length(varcost))

int = 0.03
mxnDebt = 6000000
usdDebt = 299102.7
spot = mxnDebt/usdDebt
n = 10
annualDebt = usdDebt/(1/int-1/(int*(1+int)^n))
annualDebt

## [1] 35063.96

fEXR$nEXR = predict(modEXR1,fEXR)
fEXR$fnEXR = spot*cumprod(exp(fEXR$nEXR))
fEXR$tEXR = spot*exp(cumsum(fEXR$nEXR))
fEXR$payment = fEXR$tEXR * annualDebt
head(fEXR,10)

## # A tibble: 10 x 8
##       yr  c28log  c91log snplog     nEXR fnEXR  tEXR payment
##    <dbl>   <dbl>   <dbl>  <dbl>    <dbl> <dbl> <dbl>   <dbl>
##  1  2020 -0.301  -0.0716  0.118  0.0366   20.8  20.8 729580.
##  2  2021  0.0512 -0.0407  0.11  -0.0610   19.6  19.6 686383.
##  3  2022  0.0458  0       0.11  -0.0397   18.8  18.8 659690.
##  4  2023  0.0792  0       0.11  -0.0505   17.9  17.9 627199.
##  5  2024  0.0669  0       0.11  -0.0465   17.1  17.1 598680.
##  6  2025  0       0       0.11  -0.0248   16.7  16.7 584002.
##  7  2026  0.0300  0       0.11  -0.0345   16.1  16.1 564174.
##  8  2027  0.0544  0       0.11  -0.0425   15.4  15.4 540724.
##  9  2028 -0.0843  0       0.11   0.00253  15.5  15.5 542093.
## 10  2029  0.0300  0       0.11  -0.0345   14.9  14.9 523688.

paymentMXN = annualDebt * fEXR$tEXR
paymentMXN

##        1        2        3        4        5        6        7        8 
## 729580.0 686382.6 659690.4 627199.3 598679.7 584001.9 564174.0 540723.8 
##        9       10       11 
## 542092.7 523687.7 515837.7

totCost = varcost + fixcost + fEXR$payment
totCost

##       1       2       3       4       5       6       7       8       9      10 
## 4582121 4534785 4437648 4476814 4485205 4465891 4457008 4472017 4545408 4442990 
##      11 
## 4440758

Alicia told us that there were two main costs: variables and fix. The variable expenses are equivalent to 33% of the gasoline revenue and 20% of the hotel revenue, so we first make this calculation. Then, we just include the fix expenses, which are 512342 + 1900000. The next part is a little bit more complicated, because we need to calculate the financial cost that Alicia will have for the loan.

Knowing that the loan is of 299102.7 dollars with an annual interest rate of 3% and for 10 years, we need to calculate the annual payment that Alicia will have to do in pesos. We use an annuality formula to make this operation, which tells us that the payment is of a little more than 35,000 dollars. However, we also need to predict the exchange rate to traduce this number in pesos. Again, we use our model to predict this factor, but we also make some operations that include the compound effect that the exchange rate has along the years. Finally, by having the predicted USDMXN, we just multiply it by the annual payment in dollars.

This allows us to calculate the total cost, which is the result of adding the variable and fix expenses, aswell as the financial cost.

Having the revenues and costs, leds us to calculate the profit and operating margin:

#Profit and margin
profit = totRev - totCost
profit

##       1       2       3       4       5       6       7       8       9      10 
## 1688593 1700327 1430399 1734737 1896183 1877660 1926672 2089420 2361656 2029583 
##      11 
## 2045528

margin = profit/totRev
margin

##         1         2         3         4         5         6         7         8 
## 0.2692824 0.2727019 0.2437607 0.2792761 0.2971428 0.2959951 0.3018121 0.3184394 
##         9        10        11 
## 0.3419190 0.3135667 0.3153620

In this simple step, we calculate the profit by subtracting the costs to the revenue. Also, we calculate the margin by dividing the profit by the total revenue.

We can see all the previously calculated factors in this data frame:

#Base scenario
escenarioBase = data.frame(yr=fGas$yr, revGas, revHot, varcost, fixcost, paymentMXN, profit, margin)
escenarioBase

##      yr  revGas  revHot varcost fixcost paymentMXN  profit    margin
## 1  2020 1431202 4839512 1440199 2412342   729580.0 1688593 0.2692824
## 2  2021 1454138 4780974 1436060 2412342   686382.6 1700327 0.2727019
## 3  2022 1476972 4391076 1365616 2412342   659690.4 1430399 0.2437607
## 4  2023 1499708 4711843 1437272 2412342   627199.3 1734737 0.2792761
## 5  2024 1522349 4859039 1474183 2412342   598679.7 1896183 0.2971428
## 6  2025 1544896 4798654 1469547 2412342   584001.9 1877660 0.2959951
## 7  2026 1567353 4816327 1480492 2412342   564174.0 1926672 0.3018121
## 8  2027 1589721 4971716 1518951 2412342   540723.8 2089420 0.3184394
## 9  2028 1612003 5295061 1590973 2412342   542092.7 2361656 0.3419190
## 10 2029 1634200 4838374 1506961 2412342   523687.7 2029583 0.3135667
## 11 2030 1656314 4829971 1512578 2412342   515837.7 2045528 0.3153620

avrgMargen = mean(margin)
avrgMargen

## [1] 0.2953871

This data frame can be understood as the base or realistic scenario. We used expected values for the different explanatory variables to calculate factors such as the gasoline revenue, hotel revenue, exchange rate, etc; all the necesary things to calculate the margin. But as they were all the real values, this scenario is also the real or base one. For this scenario Alicia was expecting an operating margin of no less than 26%. We are comfortable by looking that the average operating margin for all these realistic scenarios from 2020 to 2029 show an average margin of 29%.

Pessimist and optimistic scenario

Alicia was not comfortable by only knowing what the operating margin would be in the real scenario. As a business woman who puts in risk a significant amount of money in a project, she also would like to know what she could expect if things were wrong or better than reality. Therefore, we also need to calculate the operating margin for a pessimist and optimistic scenario. To do this, we will use the Monte Carlo Simulation.

The first step is to read a covariance matrix of the gasoline sales, hotel occupancy and the exchange rate. This step is critical, because this allows us to include the variance or error in the variables to later on consider them in the predicted scenarios:

#Reading the file with the matrix
covMa <- read.csv("FP_Hotel&Gas_Cov1.csv")
covMa = as.matrix(covMa)
covMa

##         lgasSales          occ        lfxr
## [1,]  0.001346211 -0.027284355 0.000134531
## [2,] -0.027284355  8.027332044 0.007933282
## [3,]  0.000134531  0.007933282 0.000925052

We just make a quick confirmation of the previous matrix with the Cholesky decomposition:

#Cholesky decomposition
a = chol(covMa)
t(a) %*% a

##              lgasSales          occ        lfxr
## lgasSales  0.001346211 -0.027284355 0.000134531
## occ       -0.027284355  8.027332044 0.007933282
## lfxr       0.000134531  0.007933282 0.000925052

Now, we proceed to create the simulations or number of scenarios. It is recommended to use as many as possible, so we decided to create 1 million scenarios for the gasoline sales, hotel occupancy and the exchange rate. It is important to mention that we make this by creating some random numbers that follow a normal distribution. As seen next, we finally make a frame including these scenarios.

#Generating the random numbers for the scenarios
epsgas = rnorm(1000000)
epsocc = rnorm(1000000)
epsexr = rnorm(1000000)
epsilons = cbind(epsgas, epsocc, epsexr)
head(epsilons)

##           epsgas    epsocc     epsexr
## [1,] -0.23151360  1.021898  0.1231206
## [2,]  0.34894920 -0.262030 -0.1366416
## [3,] -0.36298376 -0.768503 -0.7232324
## [4,] -0.05002359  1.210032 -0.5231462
## [5,]  0.72322561 -1.586222  1.3244731
## [6,] -1.68955835 -1.336187 -2.2891880

Now, in this next step is where we join the created scenarios with the covariances. We make this by multiplying the matrix with the random normal scenarios and the covariance matrix. With the results, we create some new matrixes:

#Generating the random numbers for the scenarios
corrScen = epsilons %*% a
colnames(corrScen) = c("devGas", "devOcc", "devFxr")
head(corrScen)

##            devGas     devOcc       devFxr
## [1,] -0.008494407  2.9659537  0.006821862
## [2,]  0.012803207 -0.9758596 -0.003833273
## [3,] -0.013318146 -1.8311038 -0.025980997
## [4,] -0.001835403  3.3453353 -0.011128369
## [5,]  0.026535689 -4.8744242  0.036121682
## [6,] -0.061991160 -2.3966284 -0.079943251

hist(corrScen[,1])

cor(epsilons)

##               epsgas        epsocc        epsexr
## epsgas  1.0000000000 -0.0007681201  0.0006081770
## epsocc -0.0007681201  1.0000000000 -0.0005353758
## epsexr  0.0006081770 -0.0005353758  1.0000000000

cor(corrScen)

##            devGas     devOcc    devFxr
## devGas  1.0000000 -0.2630092 0.1207979
## devOcc -0.2630092  1.0000000 0.0912163
## devFxr  0.1207979  0.0912163 1.0000000

We now have the scenarios for the three different variables that include the variance of the factors, meaning that we can use them to affect the explanatory variables and create many different scenarios that can be seemed as pessimist, real or optimistic. Then we only created a matrix to see the correlation between the random numbers created for the three variables, which of course are very low because they were generated independently. Lastly, the last matrix also shows us that the scenarios are also lowly correlated.

Next, we proceed to calculate this new scenarios for the first future year: 2020. We have to make the calculations for each year separately:

#Creating the scenarios
corrScen = data.frame(corrScen) #data frame so we can access to the columns

corrScen$gasSim = fGas$nsales[1] + corrScen$devGas #escenarios en logarítmico, predicción +- desviación
corrScen$gasSimF = exp(corrScen$gasSim) #escenarios ya en númerico

corrScen$occSim = fOcc$nocc[1] + corrScen$devOcc

corrScen$EXRSim = fEXR$nEXR[1] + corrScen$devFxr #escenarios en logarítmico, predicción +- desviación
corrScen$EXRSimF  = spot*exp(corrScen$EXRSim)

corrScen$revGasSim = corrScen$gasSimF * 0.06
corrScen$revHotSim = 800*20*fOcc$days[1]*(corrScen$occSim/100)
corrScen$revTotSim = corrScen$revGasSim + corrScen$revHotSim

corrScen$varCostSim = corrScen$revGasSim*0.3333 + corrScen$revHotSim*0.2
corrScen$fixCostSim = 512342 + 1900000
corrScen$debtMXNSim = annualDebt*corrScen$EXRSimF

corrScen$profit = corrScen$revTotSim - corrScen$varCostSim - corrScen$fixCostSim - corrScen$debtMXNSim

corrScen$margin = corrScen$profit/corrScen$revTotSim
hist(corrScen$margin)

We start by transforming the matrix with the random scenarios into a data frame, so we can access to the columns. After doing this, we calculate again the gas revenue, which is the result of the previously calculated sales for the first year, plus the standard error or deviation that we just just from the scenarios. As it is in logarithmic, we change it into the normal number by using the exponential function. Then, we simply do the same calculation for the hotel occupancy, taking the previously predicted occupancy and adding the deviation. Lastly, we do the same for the exchange rate, and we also convert it into normal with the exponential function, as the number was in logarithmic.

Again, we proceed to calculate the total revenue with the gas and hotel revenue, which follow the same calculations. We do the same thing for the costs. This allows us to calculate the profit and the margin. We created a histogram that shows the distribution of the margin, and we see something very good: the average margin is between 26 and 28%, but we also wee that it is not highly probable that the worst margin can be below 23%, which is what Alicia was expecting. Also, we can say that the best margin will be a little more than 28%.

However, these are only visual inferences. We need to calculate the quantiles for the margin, in order to detect where are the scenarios.

First, only to do a check and learn more about the scenarios, we calculate the quantiles for the predicted gasoline sales, hotel occupancy and exchange rate:

#Quantiles for best and worse scenario
quantile(corrScen$gasSimF, c(0,0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95,1))

##       0%       5%      10%      25%      50%      75%      90%      95% 
## 19983044 22454296 22757519 23270945 23853142 24450217 24999489 25331972 
##     100% 
## 28375514

quantile(corrScen$occSim, c(0,0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95,1))

##       0%       5%      10%      25%      50%      75%      90%      95% 
## 68.12800 77.98962 79.01456 80.73697 82.64883 84.55559 86.27547 87.30853 
##     100% 
## 98.49102

quantile(corrScen$EXRSimF, c(0,0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95,1))

##       0%       5%      10%      25%      50%      75%      90%      95% 
## 18.13873 19.79099 20.01088 20.38417 20.80645 21.23885 21.63606 21.87687 
##     100% 
## 24.04449

The 50% quantiles are what we already calculated in our base scenario. Now, it is important to mention that we will use a 5% cutoff for the pessimist scenario and 95% for the best scenario. We decided to do this to be more strict with our analysis; we realize that Alicia is taking a big risk with this business, so we want to give her values that are not too far from reality.

The important results are the following ones:

#Quantiles for best and worse scenario - margin 
quantile(corrScen$profit, c(0,0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95,1))

##      0%      5%     10%     25%     50%     75%     90%     95%    100% 
## 1011452 1475138 1521278 1598669 1684568 1770115 1847132 1893461 2307853

quantile(corrScen$margin, c(0,0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95,1))

##        0%        5%       10%       25%       50%       75%       90%       95% 
## 0.1846202 0.2452690 0.2505598 0.2592352 0.2685731 0.2775726 0.2854381 0.2900467 
##      100% 
## 0.3271138

Focusing on the margin, we already know that the base one (50% quantile) is of about 26%. However, it is great to see the next two things: for the pessimist scenario, using a 5% cutoff, we see that the worst margin is of about 24%, which is more than the 23% that Alicia was expecting. On the other hand, we see that the best margin can be of almost 29%.

8. Conclusions and further research

In conclusion, our analysis indicates that Alicia is ready to initiate the project. This because, as we saw, she can expect an operating margin of no less than 24% and up to 29%. She can be comfortable as she already knows what to expect for her business in different scenarios. This is the advantage of making all this process, that we get to understand how the factor will behave in different conditions, which is what one should do when trying to understand the future of a business; many factors are in place, and we can not expect that they all will behave as expected. Doing this can led to a bad planning and complicating the business´survival.

Our models were not perfect at a 100% level, they could have some details such a not 1 R2, not 100% significant variables, or other details. However, they passed the needed tests for validation, and showed good results. This is important to mention, because these models were the basis of our whole analysis. Alicia can be comfortable as our basis were strong and valid.

Nevertheless, one important thing that Alicia should consider is that there were some details regarding the models that are kind of strange. For instance, the predicted exchange rate becomes lower and lower as the time advances. This is not because the models fails or gives a bad prediction; it happens because of the given information used to predict. It would be good to check again the data for the next 10 years to see if it is correct, because obviously this causes that the annual payment is lower and lower. However, in reality, the exchange rate increases every year, and if that happens the annual payment of the loan will also increase and affect the profit margin.

So, Alicia should re-check all the information, just to be sure that the results are valid. She should not worry about the models and the calculations being incorrect. And to this point, with the data that we have, she can start her business idea.

9. Bibliography

S/a. (2021). Problem Setup: Hotel & Gas Station Investment. From: Tecnológico de Monterrey. Available at: https://experiencia21.tec.mx/courses/142745/pages/problem-setup.

S/a. (2020). Resumen Variables Financieras. De: Scotiabank. Disponible en: https://scotiabankfiles.azureedge.net/scotia-bank-mexico/spanish/pdf/acerca-de/economia-y-finanzas/analisis-economico-y-politco/pronosticos/pvf.pdf.

Escobar, J. [Jacob Escobar]. (2021, Febrero 26). 21S1 EM | Gp1 | S06 Normality, Multicol, Hetero & Autocorr [archivo de vídeo]. Disponible en: https://www.youtube.com/watch?v=-bcPVS0UBuE&list=PLydYpDVPXkgOagFlApq7t0DTtme8hKuMz&index=7.