Introduction

In our current era there are many things people want to buy, whether it is a chocolate bar, a bicycle, or some other good. For the most part, everybody knows how much a candy bar costs in the stores around them. But what about goods that are not sold frequently and for which it is hard to find similar goods to compare against? In this study we will explore data obtained from BVR to try to predict the value of a private business.

There are many reasons why someone might be interested in selling or buying a business. An owner could want to sell because they are getting older and want to retire, or because they simply want to get into a different industry. Maybe they were grandfathered in and do not want to be the owners at all. The list of reasons to sell stretches far and wide. So, who would want to buy? A big reason to buy a business is to expand one's own: not only does it save the purchaser the time of planning another location, it also spares them from competing with the organization for customers. So we understand why someone might be interested in selling or buying a business, but surely there is a deterministic way to price one. The business has assets, liabilities, and equity; how hard could it be? It turns out to be very labor-intensive and, as a result, expensive.

This is where data analytics comes in. Through predictive analysis we can give the client a range with a certain degree of accuracy. But would a client even be interested in a range? How is that useful when they want to know exactly what the business is worth? Selling or acquiring a business is typically no small transaction: it takes time, effort, and money to make the deal go through. Knowing ahead of time how much to expect can either motivate or deter sellers and buyers alike.

This study will choose the best model for predicting business multiples. In particular, we will look at multiples of the form (Market Value of Invested Capital)/Sales and (Market Value of Invested Capital)/EBIT. As is the main assumption of multiple analysis, we assume throughout this study that certain financial ratios can help estimate the value of a company.

Methods

Data reduction

The original data contained 123 observations and 199 variables. Many of these variables will not help with our analysis (e.g., ID, target country, acquirer name). After logically working through the original data we are left with 26 variables, 4 of which are potential outcome variables (MVIC, MVIC/EBIT, MVIC/EBITDA, MVIC/Net Sales).

Transformations

There are a couple of different transformations that might be of interest for our dataset. One option, since this is financial data, would be to take log dollars, as most of the features in our models have $ units and are right-skewed. Another option would be to normalize the data by subtracting the mean and dividing by the standard deviation. In this analysis we chose the latter for two reasons. First, some of our variables have negative and zero values. When log-transforming variables with zero values it is typical to simply shift the data by one, but with negative values it gets messier. Second, many of the algorithms we will use (e.g., PCR) require standardized data, so we might as well standardize from the start.
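As a minimal sketch of this standardization step (assuming the cleaned data lives in a data frame, here read from the DealStats_ind.csv file used later), caret's preProcess can center and scale the numeric features:

library(caret)

deals <- read.csv("DealStats_ind.csv")

# Estimate the centering/scaling parameters from the data
pp <- preProcess(deals, method = c("center", "scale"))

# Apply them: each numeric column now has mean 0 and sd 1
deals_std <- predict(pp, deals)

In the modeling code later we instead let caret do this internally via preProc = c("center", "scale"), which has the added benefit that the parameters are re-estimated inside each cross-validation fold.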

Analysis Methods

The analysis will emphasize predictive statistics with many methods applied (a shotgun approach). To begin, we will look at basic visualizations and correlations to get a sense of the data. Next, we will process our data as described in the Transformations section above. A preliminary analysis showed that basic algorithms like linear regression are sensitive to the random seed, so we will use repeated cross-validation as the main data-splitting method. To find optimal hyperparameters we will use the entire dataset. When comparing models against each other we will split the data into an 80/20 training/testing split, while still using repeated cross-validation to train the models. We will consider simple, transparent models alongside black-box algorithms. When comparing models we will treat RMSE as the gold standard; other metrics will also be observed and taken into consideration. Finally, we will build the best predictive model, the one that minimizes variation in the MVIC/Net Sales, MVIC/EBIT, MVIC/EBITDA, and MVIC predictions, and deploy some of the algorithms in a Shiny application.

In addition to the above, we will see that our data is rather sparse. We will analyze the type of missingness (MCAR, MAR, MNAR) in a later section to decide whether imputing our data is desirable. Of course, if you impute your data you should always compare the results against the non-imputed data. So the levels in our comparison will include different types of imputed data, different dependent variables, and different models.

Models

The following is a list of potential regression algorithms we can use to predict multiples in this study (a brief description of each is to come later):

  • Lasso Regression
  • Ridge Regression
  • Partial Least Squares Generalized Linear
  • Principal Component Regression
  • Projection Pursuit Regression
  • Neural Networks
  • Forward selection Linear Regression
  • Backward selection Linear Regression
  • Stepwise selection Linear Regression

Evaluation Criteria

As described above, RMSE will be our gold standard for evaluating models. We will also use criteria such as \(R^2\) when a tie-breaker is needed. As a side note, since we are using k-fold CV, we will use its RMSE as an estimate of the model's prediction deviation; we will discuss this more in the Best Models section.
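For reference, over \(n\) held-out observations the RMSE is

\[RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2},\]

which is on the same scale as the response, a convenient property when the response is in dollars or dollar multiples.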

Data Visualizations

Correlation between dependent variables

As discussed earlier, we are interested in 4 potential outcomes, namely MVIC/Net Sales, MVIC/EBIT, MVIC/EBITDA, and MVIC. An honest question is whether we have to predict each of these individually or can predict one and derive the rest. As the graph below shows, these outcome variables are not highly correlated with each other, although the linear correlation between MVIC/EBITDA and MVIC/EBIT is around 70%. Thus it is not in our best interest to predict only one variable and call it a day; we will need to build models for each dependent variable to find the best estimate of each.
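As a sketch of how such a correlation matrix can be computed (column names taken from the modeling code later in this report), pairwise-complete correlations handle the missing values:

data <- read.csv("DealStats_ind.csv")
outcomes <- c("MVIC.Sales", "MVIC.EBIT", "MVIC.EBITDA", "MVIC.Price")

# Pairwise-complete correlations, since the outcomes contain missing values
round(cor(data[, outcomes], use = "pairwise.complete.obs"), 2)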

Missing Values

Below is a visualization of how sparse our data is. Note that one of our variables of interest (MVIC/EBITDA) has the most missing values. This is not good: we want to use as many observations as possible, but the need for that variable is high since it is one of the quantities we are tasked with predicting. So what can we do about it? One appealing method is imputing our data, but before doing that we need to test a couple of things first.
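One way to produce a missingness map like this is with the naniar package (an assumed choice on our part; packages such as VIM offer similar plots):

library(naniar)

data <- read.csv("DealStats_ind.csv")
vis_miss(data)     # heatmap of observed vs. missing cells
gg_miss_var(data)  # count of missing values per variable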

The first thing is to test whether the missing values are missing completely at random (MCAR). This can be accomplished with Little's test, whose null hypothesis is:

\[H_0: \text{the data are MCAR}\]

Note that it is not possible to test for Missing at Random (MAR) versus Missing Not at Random (MNAR).

Little's test gives a p-value of \(1.63 \times 10^{-14}\), so we reject the null hypothesis. This means we have evidence that our data is MAR or MNAR. Therefore, we decided to try a common imputation method, multiple imputation, on this data.
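A sketch of both steps, assuming the naniar and mice packages (the original analysis may have used different implementations):

library(naniar)  # mcar_test() implements Little's MCAR test
library(mice)    # multiple imputation by chained equations

data <- read.csv("DealStats_ind.csv")

mcar_test(data)  # a tiny p-value means we reject H0: data are MCAR

# Multiple imputation: m = 5 completed datasets via predictive mean matching
imp <- mice(data, m = 5, method = "pmm", seed = 381)
data_mi <- complete(imp, 1)  # extract the first completed dataset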

Importance of variables

One thing that caught my eye in this analysis was the process of fitting a lasso regression. Recall that lasso regression is a type of regression that shrinks parameters to 0 as we add more bias to the model. One property of this is that as the penalty goes to infinity all parameters shrink to 0, albeit not all at the same time.
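Concretely, the lasso estimate solves

\[\hat{\beta}^{lasso} = \underset{\beta}{\arg\min} \; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|,\]

so increasing the penalty \(\lambda\) trades variance for bias and pushes individual coefficients to exactly 0. This can be seen in the graph below: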

(Figure: lasso coefficient paths, from An Introduction to Statistical Learning with Applications in R.)

Note that the horizontal axis of the graph is the L1 norm of the penalized coefficients expressed as a proportion of the L1 norm of the full least-squares coefficients. Another, more intuitive way to read the axis is to imagine something like \(1/\lambda\), where \(\lambda\) is the penalty: this keeps us within \((0,1)\), and values close to 0 correspond to a large penalty and thus few remaining parameters. Now give your attention to the graph below, which uses the same horizontal units.

As we can see from the graph above, this particular lasso model minimizes its RMSE when the penalty is rather large, meaning the algorithm found it best to shrink away many parameters. This is interesting because it tells us that only a few variables are actually important when building a linear regression model.
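To see which variables survive, one can refit the lasso with glmnet directly (a sketch; the models in this report were fit through caret) and read off the nonzero coefficients at the cross-validated optimum:

library(glmnet)

# x must be a numeric predictor matrix (e.g., via model.matrix), y the response
cv_fit <- cv.glmnet(x, y, alpha = 1)                 # alpha = 1 is the lasso
coefs  <- as.matrix(coef(cv_fit, s = "lambda.min"))  # coefficients at best lambda
rownames(coefs)[coefs[, 1] != 0]                     # the surviving variables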

Model Parameter Optimization

Below is a brief table showing the different types of models we will use along with the hyperparameters we will have to tune.

Model                Hyperparameter(s)                    Number Tested
Lasso                Lambda                               10
Ridge                Lambda                               10
PLSR                 # of components, p-value threshold   16
PCR                  # of components                      10
Projection Pursuit   Number of terms                      10
NN                   # of hidden units, weight decay      16
Forward              # of parameters                      NA
Backward             # of parameters                      NA
Stepwise             # of parameters                      NA

As stated previously, we will use RMSE as the gold standard when judging which model best suits our prediction needs. Below are a couple of graphics of the hyperparameter profiles for the best models (discussed soon).

We will choose the hyperparameters at which RMSE is minimized in the models below.
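With caret this is automatic: each fitted train object keeps its resampled RMSE profile and the hyperparameter values that minimized it. A minimal sketch, where fit stands for any fitted train object from the code below:

plot(fit)     # RMSE across the tuning grid
fit$bestTune  # hyperparameter values that minimized the CV RMSE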

Fitting Models

With the help of recent R packages it is easy to fit models en masse. The code below is one of many ways to do this.

library(tidyverse)
library(caret)
library(caTools)    # provides sample.split(), used below
library(modelgrid)

# Set up: nested list to hold a fitted model grid per imputation type and response
Models <- list(
  MI = list(
    MVICSales = list(),
    MVICEBIT = list(),
    MVICEBITDA = list(),
    Price = list()
  ),
  NA_RM = list(
    MVICSales = list(),
    MVICEBIT = list(),
    MVICEBITDA = list(),
    Price = list()
  )
)

control <- trainControl(method = "repeatedcv", number = 10, repeats = 10)  # 10-fold CV, repeated 10 times
mg <- model_grid() %>% 
  add_model(
    model_name = "Lasso Regression",
    method = "lasso",
    tuneLength = 10
    #tuneGrid = data.frame(fraction = seq(0,1,by=.1))
  ) %>%
  add_model(
    model_name = "Ridge Regression",
    method = "ridge",
    tuneLength = 10
  ) %>%
  # This one takes a while to fit
  add_model(
    model_name = "Partial Least Squares Generalized Linear",
    method = "plsRglm",
    tuneLength = 4
  ) %>%
  add_model(
    model_name = "Principal Component Regression",
    method = 'pcr',
    tuneLength = 10
  ) %>%
  add_model(
    model_name = "Projection Pursuit Regression",
    method = "ppr",
    tuneLength = 10
  ) %>%
  add_model(
    model_name = "Neural Networks",
    method = "nnet",
    tuneLength = 4
  ) %>%
  add_model(
    model_name = "Linear Regression Backwards",
    method = "leapBackward",
    tuneLength = 15
  ) %>%
  add_model(
    model_name = "Linear Regreassion Forwards",
    method = "leapForward",
    tuneLength = 15
  ) %>%
  add_model(
    model_name = "Stepwise",
    method = "lmStepAIC"
  )


for (i in 1:2) {
  # i = 1: multiply imputed data; i = 2: rows with missing values removed
  if (i == 1) {
    data <- read_csv("DealStats_MI.csv")#multiple imputations
    Y_nam <- c("MVICSales","MVICEBIT","MVICEBITDA","Price")
    # 80/20 train/test split (logical index vectors)
    set.seed(3.14519)
    train <- sample.split(data$MVICSales, SplitRatio = 0.8)
    test <- !train
  }else {
    Y_nam <- c("MVIC.Sales","MVIC.EBIT","MVIC.EBITDA","MVIC.Price")
    data <- read.csv("DealStats_ind.csv") %>% 
      dplyr::select(-c(X,other.region, Franchise,
                       Total.Assets.PPA,
                       Employment.Agreement,
                       S.Atlantic,
                       Pacific)) %>% # drop linearly dependent columns
      na.omit()
    set.seed(3.14519)
    train <- sample.split(data$MVIC.Sales, SplitRatio = 0.8)
    test <- !train
  }
  
  
  for (k in 1:4) {
    # 4 types of responses
    mg <- mg %>% 
      share_settings(
        y = data[[Y_nam[k]]][train],
        x = data[train,] %>% dplyr::select(-dplyr::all_of(Y_nam)),
        trControl = control,
        preProc = c("center", "scale")
      )
    
    # Fit every model in the grid for this data/response combination
    Models[[i]][[k]] <- train(mg, resample_seed = 381)
    
    
  }
}
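After training, the fitted caret models sit in each model grid's model_fits slot (per the modelgrid documentation), so the cross-validated results can be compared with caret's resamples(); a sketch:

fits <- Models[["NA_RM"]][["MVICSales"]]$model_fits  # list of caret train objects
rs <- resamples(fits)        # collect the shared resampling results
summary(rs)                  # RMSE and R^2 distributions per model
bwplot(rs, metric = "RMSE")  # side-by-side visual comparison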

Best Models

To Impute or Not to Impute

Note from below that the multiply imputed (MI) data had rather high variability in RMSE among the best models. As mentioned previously, we use \(R^2\) as a tie-breaker in tough decisions. On the question of whether to use multiple imputation for modeling, this analysis decided to go with the non-imputed data. The reason is that when we simply remove the missing values, the models still do well under cross-validation. Another reason is that those models are consistently good; with MI the RMSE can be off by a lot, which is worrisome for real data later on.

Note that the graph above uses the plain RMSE. If we take this to be an unbiased estimator of the model's prediction deviation, this RMSE is comparable to the error of a single prediction. Although our final models use the data with missing values removed, as mentioned above, it is interesting to see by how much we could shrink a prediction interval (on average).
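To make that concrete: treating the cross-validated RMSE as an estimate of the prediction standard deviation and assuming roughly normal errors (a simplifying assumption on our part), an approximate 95% prediction interval for a single new observation is

\[\hat{y} \pm 1.96 \cdot RMSE,\]

so any reduction in RMSE shrinks the interval proportionally. This is shown, for fun, in the graph below.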

Best Algorithms

From the graphic below we can see that the best-performing algorithms include PLSR, forward-selection linear regression, and, for one of the responses, a neural network.

Best Response

Another thing of interest is comparing the dependent variables to see which is "easier" to predict. Note that for this graphic we transformed the residuals into $ units; for example, (MVIC/EBIT) * EBIT = MVIC. The only response that did not do as well as the others was MVIC/Net Sales.
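A sketch of that conversion (the object names here are hypothetical):

# Turn multiple-scale residuals into dollar-scale residuals:
# predicted MVIC = predicted (MVIC/EBIT) multiple * observed EBIT
resid_dollars <- (pred_multiple - actual_multiple) * ebit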

Conclusion

In conclusion, the models that performed best as measured by RMSE were PLSR for MVIC/EBIT, MVIC/EBITDA, and MVIC, while the model that best predicted MVIC/Net Sales was a neural network. It is noteworthy that multiple regression was not far behind these more complex models, so we would not lose much if we decided to use the simpler model for simplicity's sake.

After finding the best model for each response, an app was made to deploy the models; it can be found here and the code accessed at https://github.com/SeanCranston/BVal.

Some future improvements would be to use different imputation methods, since we know the data is not MCAR. This would help shrink our prediction intervals and hopefully our RMSE, provided the imputed data is reasonably close to the true values. More improvements could also be made in the Shiny application; specifically, it would be nice to have a page that explains what is "under the hood" of the models, which would give users more confidence when predicting with them. Lastly, since this analysis was performed only on tavern businesses, it would be interesting to repeat it on a variety of businesses. If they are similar to each other, this could increase our sample size and give the app more flexibility in which types of business it can predict.

References

  • Chris VanSchooneveld
  • Multiple Imputation by Chained Equations for Systematically and Sporadically Missing Multilevel Data
  • To Impute or Not to Impute: That's the Question
  • caret documentation
  • modelgrid documentation
  • An Introduction to Statistical Learning with Applications in R