In statistics, multicollinearity (also collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. Source: Wikipedia
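One common way to put a number on this is the variance inflation factor (VIF). The VIF is not part of the definition above and is brought in here only as an illustration. The sketch below covers the two-predictor case, where the VIF reduces to 1 / (1 - r^2), with r the Pearson correlation between the two predictors. The function names and the data are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors that move together fairly closely (r = 0.8)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print("correlation:", pearson_r(x1, x2))
print("VIF:", vif_two_predictors(x1, x2))
```

A VIF near 1 suggests little collinearity. Larger values mean one predictor is increasingly well predicted by the other.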
Stepwise regression selects a model by automatically adding or removing individual predictors, a step at a time, based on their statistical significance. The end result of this process is a single regression model, which makes it nice and simple. You can control the details of the process, including the significance level and whether the process can only add terms, remove terms, or both. Source: http://blog.minitab.com/
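A minimal sketch of the add-only (forward) variant, under stated assumptions: a fixed F-to-enter threshold of 4.0 stands in for the significance level, the least-squares fit uses a small hand-written solver, and the predictor names and data are invented. This is not Minitab's implementation.

```python
def ols_sse(columns, y):
    """Residual sum of squares for least squares of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [
        [sum(row[r] * row[s] for row in X) for s in range(p)]
        + [sum(row[r] * yi for row, yi in zip(X, y))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting (assumes X'X is invertible)
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_stepwise(predictors, y, f_to_enter=4.0):
    """Add one predictor per step while its partial F statistic is >= f_to_enter."""
    n = len(y)
    selected = []
    remaining = sorted(predictors)
    current_sse = ols_sse([], y)  # intercept-only model
    while remaining:
        trials = [
            (ols_sse([predictors[m] for m in selected + [name]], y), name)
            for name in remaining
        ]
        new_sse, best_name = min(trials)
        df_resid = n - len(selected) - 2  # n - (predictors after adding) - 1
        if df_resid <= 0:
            break
        if new_sse == 0:
            f_stat = float("inf")
        else:
            f_stat = (current_sse - new_sse) / (new_sse / df_resid)
        if f_stat < f_to_enter:
            break
        selected.append(best_name)
        remaining.remove(best_name)
        current_sse = new_sse
    return selected


# Hypothetical data: y is roughly 2*x1 + 3*x2, and x3 is unrelated
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 0, 1, 0, 1, 0],
    "x3": [1, 4, 2, 6, 3, 5],
}
y = [5.1, 3.9, 8.9, 8.1, 13.1, 11.9]
print(forward_stepwise(predictors, y))
```

A backward or bidirectional variant would add a matching removal test at each step.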
Best Subsets compares all possible models using a specified set of predictors, and displays the best-fitting models that contain one predictor, two predictors, and so on. The end result is a number of models and their summary statistics. It is up to you to compare and choose one. Sometimes the results do not point to one best model and your judgment is required. Source: http://blog.minitab.com/
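The exhaustive search can be sketched as follows, assuming the residual sum of squares as the only ranking statistic (real tools also report several summary statistics per model). The solver, the function names, and the data are illustrative assumptions.

```python
from itertools import combinations


def ols_sse(columns, y):
    """Residual sum of squares for least squares of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [
        [sum(row[r] * row[s] for row in X) for s in range(p)]
        + [sum(row[r] * yi for row, yi in zip(X, y))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting (assumes X'X is invertible)
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_subsets(predictors, y):
    """For each model size, return (sse, names) of the lowest-SSE subset."""
    names = sorted(predictors)
    best = {}
    for size in range(1, len(names) + 1):
        best[size] = min(
            (ols_sse([predictors[m] for m in subset], y), subset)
            for subset in combinations(names, size)
        )
    return best


# Hypothetical data: y is exactly 2*x1 + 3*x2
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 0, 1, 0, 1, 0],
    "x3": [1, 4, 2, 6, 3, 5],
}
y = [5, 4, 9, 8, 13, 12]
for size, (sse, subset) in best_subsets(predictors, y).items():
    print(size, subset, round(sse, 4))
```

The number of candidate models doubles with each added predictor, so this brute-force loop is only practical for small predictor sets.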
The t statistic is the coefficient divided by its standard error. The standard error is an estimate of the standard deviation of the coefficient, the amount it varies across cases. It can be thought of as a measure of the precision with which the regression coefficient is measured. If a coefficient is large compared to its standard error, then it is probably different from 0. Source: https://dss.princeton.edu/online_help/analysis/interpreting_regression.htm
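The ratio described above can be worked through by hand for a regression with a single predictor. This is a minimal sketch using the standard textbook formulas for the slope and its standard error. The function name and the data are made up.

```python
import math


def slope_t_statistic(x, y):
    """Return (slope, standard error, t statistic) for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (intercept and slope)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    residual_variance = sse / (n - 2)
    std_error = math.sqrt(residual_variance / sxx)
    return slope, std_error, slope / std_error


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, std_error, t_stat = slope_t_statistic(x, y)
print("slope:", slope, "std error:", std_error, "t:", t_stat)
```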
A t-test is commonly used to determine whether the mean of a population significantly differs from a specific value (called the hypothesized mean) or from the mean of another population. Source: http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen
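For the first case, comparing a sample mean with a hypothesized mean, the test statistic is (sample mean - hypothesized mean) / (s / sqrt(n)). The sketch below computes only the statistic and leaves out the p-value, which needs the t distribution. The sample values and the hypothesized mean are invented.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, hypothesized_mean):
    """t statistic for H0: the population mean equals hypothesized_mean."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # stdev uses n - 1
    return (mean(sample) - hypothesized_mean) / standard_error


# Hypothetical measurements tested against a hypothesized mean of 5.0
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
print("t:", one_sample_t(sample, 5.0), "with", len(sample) - 1, "degrees of freedom")
```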
Residual plots
A residual plot plots the residuals on the y-axis vs. the predicted values of the dependent variable on the x-axis. We would like the residuals to be unbiased: have an average value of zero in any thin vertical strip, and homoscedastic, which means “same stretch”: the spread of the residuals should be the same in any thin vertical strip. Source: http://condor.depaul.edu/sjost/it223/documents/regress.htm
Residual plots. Source: https://drsimonj.svbtle.com/visualising-residuals
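The "thin vertical strip" check can also be done numerically. As a rough sketch, the code below sorts the points by predicted value, cuts them into a few equal-count strips, and reports the mean and spread of the residuals in each. The strip count, the function name, and the data are assumptions for illustration, and it needs at least as many points as strips.

```python
from statistics import mean, pstdev


def strip_summary(predicted, residuals, n_strips=3):
    """Mean and standard deviation of residuals within strips of predicted values."""
    pairs = sorted(zip(predicted, residuals))
    size = len(pairs) // n_strips
    summary = []
    for i in range(n_strips):
        start = i * size
        # The last strip absorbs any leftover points
        stop = (i + 1) * size if i < n_strips - 1 else len(pairs)
        strip = [r for _, r in pairs[start:stop]]
        summary.append((mean(strip), pstdev(strip)))
    return summary


# Hypothetical fitted values and residuals
predicted = [1, 2, 3, 4, 5, 6]
residuals = [1, -1, 2, -2, 1, -1]
for strip_mean, strip_spread in strip_summary(predicted, residuals):
    print("mean:", strip_mean, "spread:", strip_spread)
```

Strip means near zero suggest unbiased residuals. Similar spreads across strips suggest homoscedasticity, and here the middle strip has a visibly larger spread than the other two.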
R-squared reference: use predicted R-squared to determine how well a regression model makes predictions. Source: http://statisticsbyjim.com/regression/interpret-adjusted-r-squared-predicted-r-squared-regression/
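Predicted R-squared is usually computed from the PRESS statistic: leave each observation out in turn, refit the model, predict the held-out point, and compare the summed squared prediction errors with the total sum of squares. The sketch below does this for a one-predictor regression by brute-force refitting. The function names and the data are made up.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_r_squared(x, y):
    """1 - PRESS / SST, with PRESS from leave-one-out refits."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((b - mean_y) ** 2 for b in y)
    press = 0.0
    for i in range(n):
        # Refit without observation i, then predict it
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (intercept + slope * x[i])) ** 2
    return 1.0 - press / sst


# Hypothetical data: the ordinary R-squared of this fit is 0.6
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print("predicted R-squared:", predicted_r_squared(x, y))
```

A predicted R-squared far below the ordinary R-squared, or negative as in this tiny example, suggests the model does not predict new observations well.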
In general, an F-test in regression compares the fits of different linear models. Unlike t-tests that can assess only one regression coefficient at a time, the F-test can assess multiple coefficients simultaneously.
The F-test of the overall significance is a specific form of the F-test. It compares a model with no predictors to the model that you specify. A regression model that contains no predictors is also known as an intercept-only model.
While R-squared provides an estimate of the strength of the relationship between your model and the response variable, it does not provide a formal hypothesis test for this relationship. The overall F-test determines whether this relationship is statistically significant. If the P value for the overall F-test is less than your significance level, you can conclude that the R-squared value is significantly different from zero.
The F-statistic is a good indicator of whether there is a relationship between our predictor and response variables. The further the F-statistic is above 1, the stronger the evidence for a relationship. However, how much larger than 1 it needs to be depends on both the number of data points and the number of predictors. Generally, when the number of data points is large, an F-statistic only a little larger than 1 is already sufficient to reject the null hypothesis (H0: there is no relationship between speed and distance). Conversely, when the number of data points is small, a large F-statistic is required to conclude that there may be a relationship between the predictor and response variables. In our example the F-statistic is 89.5671065, which is much larger than 1 given the size of our data.
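The link between R-squared and the overall F-test can be made concrete. For a model with p predictors fitted to n observations, F = (R^2 / p) / ((1 - R^2) / (n - p - 1)). The sketch below evaluates that formula. The function name and the input values are illustrative and are not taken from the example above.

```python
def overall_f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic comparing the model with the intercept-only model."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


# Hypothetical fit: R-squared of 0.6 from 5 observations and 1 predictor
print("F:", overall_f_statistic(0.6, 5, 1))

# The same R-squared with more data gives a much larger F
print("F:", overall_f_statistic(0.6, 50, 1))
```

The second call shows why the required size of F depends on the sample: the same R-squared yields far stronger evidence with 50 observations than with 5.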
https://feliperego.github.io/blog/2015/10/23/Interpreting-Model-Output-In-R
https://www.andrew.cmu.edu/user/achoulde/94842/homework/regression_diagnostics.html
https://www.theanalysisfactor.com/assessing-the-fit-of-regression-models/
Use adjusted R-squared to compare the goodness-of-fit for regression models that contain differing numbers of independent variables. Source: http://statisticsbyjim.com/regression/interpret-adjusted-r-squared-predicted-r-squared-regression/
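Adjusted R-squared applies a penalty for each predictor: adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1). A minimal sketch of that formula follows, with a made-up function name and inputs.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a model with an intercept and n_predictors terms."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical comparison: a second predictor that leaves R-squared unchanged
print("1 predictor: ", adjusted_r_squared(0.6, 5, 1))
print("2 predictors:", adjusted_r_squared(0.6, 5, 2))
```

Because the extra predictor does not raise R-squared, the adjusted value drops. This is what makes it usable for comparing models of different sizes.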
We will use the following parameters to explain model performance and the intrinsic differences in the fit of various models. We can extract all of these results from the fit statement, which has a list of stored values for each model.
AIC- Akaike’s Information Criterion offers a relative estimate of the information lost when a given model is used to fit the data. It deals with the trade-off between the goodness of fit of the model and the complexity of the model. The lower the AIC, the better the model.
BIC- Bayesian Information Criterion (also called the Schwarz Criterion) offers a similar trade-off between goodness of fit and model complexity, but it penalizes complexity more heavily than AIC: each added parameter costs ln(n) in BIC versus 2 in AIC, so with 8 or more observations BIC values are larger than AIC values for the same model. The lower the BIC, the better the model.
MSE- Mean Square Error is the average of the squared differences between the observed values and the predicted values. The lower the MSE, the more accurate the model.
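As a rough illustration, all three quantities can be computed from a model's residuals. The sketch below assumes a least-squares fit with Gaussian errors and drops additive constants, so AIC = n ln(SSE/n) + 2k and BIC = n ln(SSE/n) + k ln(n), where k is the number of estimated parameters. Fitting software may report values shifted by a constant. The function name and the residuals are made up.

```python
import math


def fit_criteria(residuals, n_params):
    """MSE, AIC and BIC from residuals (Gaussian least-squares form, constants dropped)."""
    n = len(residuals)
    sse = sum(r ** 2 for r in residuals)
    goodness_of_fit = n * math.log(sse / n)
    return {
        "MSE": sse / n,
        "AIC": goodness_of_fit + 2 * n_params,
        "BIC": goodness_of_fit + n_params * math.log(n),
    }


# Hypothetical residuals from a model with 2 estimated parameters
residuals = [1.0, -1.0] * 5
for name, value in fit_criteria(residuals, 2).items():
    print(name, value)
```

Only differences in AIC or BIC between models fitted to the same data are meaningful, which is why the dropped constants do not affect the comparison.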
1. ROC - AUC: https://medium.com/greyatom/lets-learn-about-auc-roc-curve-4a94b4d88152
2. Confusion metrics: https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c
3. How to tune RF & GBM: https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/gbm-randomforest/GBM_RandomForest_Example.py