The objective of this exercise is to predict the tip amount a waiter receives using linear regression.
Scenario 1:
Scenario 2:
Scenario 3:
The dataset used for this exercise is provided as part of the OpenIntro package. It contains 95 observations of 5 variables. It is a simulated dataset of tips collected over a few weeks, on a couple of days per week. Each tip is associated with a single group, which may include several bills and tables (i.e. groups that paid in one lump sum in the simulation).
After comparing the 3 models for this dataset, I conclude that Model #3 is the best model because its sum of squared errors (SSE) is lower than that of the other two models.
The average tip amount received by the waiter is 7.3386842.
Model #3 is the best model because its SSE is lower than that of the other two models.
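The SSE-based comparison can be sketched as follows. This is a minimal illustration, not the analysis behind this report: the data are simulated here with NumPy, the predictor names `bill` and `party` and the three candidate models are hypothetical, and `sse_of_fit` is a helper introduced only for this example.

```python
import numpy as np


def sse_of_fit(predictors, y):
    """Fit y on an intercept plus the given predictors; return the SSE."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return float(np.sum(residuals ** 2))


# Simulated stand-in data (assumption: NOT the OpenIntro tips dataset).
rng = np.random.default_rng(0)
n = 95
bill = rng.uniform(10, 100, n)                # hypothetical predictor
party = rng.integers(1, 7, n).astype(float)   # hypothetical predictor
tip = 1.0 + 0.12 * bill + 0.5 * party + rng.normal(0, 1.5, n)

# Three hypothetical candidate models, from simplest to richest.
candidates = {
    "model_1": [],             # intercept only
    "model_2": [bill],         # one explanatory variable
    "model_3": [bill, party],  # two explanatory variables
}

sse_by_model = {name: sse_of_fit(p, tip) for name, p in candidates.items()}
best = min(sse_by_model, key=sse_by_model.get)  # model with the lowest SSE

for name, sse in sse_by_model.items():
    print(f"{name}: SSE = {sse:.2f}")
print("lowest SSE:", best)
```

The model with the smallest SSE leaves the least unexplained variation in the response, which is the criterion applied in the conclusion above.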
This report was generated on November 21, 2018.
Linear regression is a linear approach to modelling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables).
The case of one explanatory variable is called simple linear regression.
The case of more than one explanatory variable is called multiple linear regression.
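The difference between the two cases can be sketched with a small least-squares fit. The numbers below are made up for illustration and are not taken from the tips dataset; the variable names and the `fit_ols` helper are assumptions introduced here.

```python
import numpy as np

# Made-up illustrative data (assumption: NOT the OpenIntro tips dataset).
bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # hypothetical predictor 1
party = np.array([1.0, 2.0, 2.0, 4.0, 3.0])      # hypothetical predictor 2
tip = np.array([2.0, 3.5, 5.0, 7.5, 8.0])        # response


def fit_ols(predictors, y):
    """Least-squares fit; returns [intercept, slope_1, ..., slope_k]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Simple linear regression: one explanatory variable.
simple_coef = fit_ols([bill], tip)

# Multiple linear regression: more than one explanatory variable.
multiple_coef = fit_ols([bill, party], tip)

print("simple   (intercept, bill):       ", simple_coef)
print("multiple (intercept, bill, party):", multiple_coef)
```

The simple fit returns two coefficients (intercept and one slope), while the multiple fit returns one slope per explanatory variable in addition to the intercept.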
The Total Sum of Squares (SST) is the sum, over all observations, of the squared differences between each observed value and the overall mean of the response.
The Sum of Squares Regression (SSR) is the sum of the squared differences between the predicted value for each observation and the overall mean of the response.
The Sum of Squares Due to Error (SSE), also known as the residual sum of squares, measures the total deviation of the observed response values from the fitted values.
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
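The definitions above can be checked numerically. The sketch below uses a tiny made-up dataset (not the tips data) and assumes an ordinary least-squares fit with an intercept, in which case SST = SSR + SSE and R-squared equals 1 - SSE/SST.

```python
import numpy as np

# Made-up illustrative data (assumption: NOT the tips dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Simple linear regression with an intercept, fitted by least squares.
X = np.column_stack([np.ones(len(y)), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef   # fitted values
y_bar = y.mean()   # overall mean of the response

sst = np.sum((y - y_bar) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y_bar) ** 2)  # sum of squares regression
sse = np.sum((y - y_hat) ** 2)      # sum of squares due to error
r_squared = 1.0 - sse / sst         # coefficient of determination

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"R-squared = {r_squared:.3f}")
```

For this toy data the fitted line is y = 2.2 + 0.6x, giving SST = 6, SSR = 3.6, SSE = 2.4 and R-squared = 0.6 (up to floating-point rounding), so 60% of the variation in the response is accounted for by the fit.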