Multiple Regression
Name(s):
Name 1, Name 2, Name 3, Name 4, Name 5
Questions
Multiple Regression
Load any packages (installing if necessary).
library("tidyverse")
library("openxlsx")
The estimated regression equation for a model involving two independent variables and 10 observations is as follows: \(y = 37.814 + .6023x_1 + .0199x_2\). The values of \(SS_{Total}\) and \(SS_{Regression}\) were \(7819.661\) and \(7145.691\), respectively.
Find \(SS_{Residual}\) (also called \(SS_{Error}\)).
7819.661-7145.691
## [1] 673.97
sserror =673.97
Compute \(R^2\).
7145.691/7819.661
## [1] 0.9138108
Compute \(R^2_{Adjusted}\).
1-(1-.914)*(10-1)/(10-2-1)
## [1] 0.8894286
\(R^2\) adjusted = 0.889
Compute the \(F\).
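A minimal sketch of this computation in R, using only the sums of squares given above (two regressors, \(n = 10\)):

```r
ss_reg <- 7145.691
ss_err <- 673.97
p <- 2    # number of regressors
n <- 10   # number of observations
ms_reg <- ss_reg / p             # mean square regression
ms_err <- ss_err / (n - p - 1)   # mean square error
f_stat <- ms_reg / ms_err        # roughly 37.1
f_stat
pf(f_stat, p, n - p - 1, lower.tail = FALSE)   # p-value for the overall F-test
```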
Comment on how well the model, overall, fits.
Answer:
The auditing division within a large accounting firm investigated the link between audit delay (Delay), the length of time from a company’s fiscal year-end to date of the auditor’s report, and variables that describe the client and the auditor. Some of the independent variables that were included in the study were:
Industry: A dummy variable coded 1 if the firm was an industrial company or 0 if the firm was a bank, savings and loan, or insurance company.
Public: A dummy variable coded 1 if the company was traded on an organized exchange or over the counter; otherwise coded 0.
Quality: A measure of overall quality of internal controls, as judged by the auditor, on a five-point scale ranging from “virtually none” (1) to “excellent” (5).
Finished: A measure ranging from 1 to 4, as judged by the auditor, where 1 indicates “all work performed subsequent to year-end” and 4 indicates “most work performed prior to year-end.”
The AuditDelay data file for 40 companies can be found here as CSV file, here as an SPSS file, or in the following directory: http://bit.ly/MSBA_Data.
```r
data = read_csv('https://www.dropbox.com/s/2796a0zvgek8yip/AuditDelay.csv?dl=1')
```
```
## Parsed with column specification:
## cols(
## Delay = col_double(),
## Industry = col_double(),
## Public = col_double(),
## Quality = col_double(),
## Finished = col_double()
## )
```
Develop the estimated regression equation using all of the independent variables.
lm(Delay ~ Industry+Public+Quality+Finished,data = data)
##
## Call:
## lm(formula = Delay ~ Industry + Public + Quality + Finished,
## data = data)
##
## Coefficients:
## (Intercept) Industry Public Quality Finished
## 80.429 11.944 -4.816 -2.624 -4.073
Develop a scatter diagram showing Delay as a function of Finished.
plot(Delay ~ Finished, data = data)
Develop an estimated regression equation that can be used to predict Delay by using Industry and Quality.
lm(Delay ~ Industry+Quality,data = data)
##
## Call:
## lm(formula = Delay ~ Industry + Quality, data = data)
##
## Coefficients:
## (Intercept) Industry Quality
## 70.634 12.737 -2.919
As a consultant for a movie theater, you have been asked to estimate weekly gross revenue as a function of advertising expenditures. Historical data for a sample of eight weeks during the summer in which advertising campaigns were used follow. The data file Showtime can be found here as CSV file, here as an SPSS file, or in the following directory: http://bit.ly/MSBA_Data. Hint: Be careful with the units of the data and the units of the question.
```r
consult <- read_csv('https://www.dropbox.com/s/9w0zdr7899qcasz/Showtime.csv?dl=1')
```
```
## Parsed with column specification:
## cols(
## WeeklyGross = col_double(),
## TelevisionAds = col_double(),
## NewspaperAds = col_double()
## )
```
Develop an estimated regression equation with the amount of television advertising as the independent variable.
lm(WeeklyGross ~ TelevisionAds, data = consult)
##
## Call:
## lm(formula = WeeklyGross ~ TelevisionAds, data = consult)
##
## Coefficients:
## (Intercept) TelevisionAds
## 88.638 1.604
Develop an estimated regression equation with both television advertising and newspaper advertising as the independent variables.
lm(WeeklyGross ~ TelevisionAds + NewspaperAds, data = consult)
##
## Call:
## lm(formula = WeeklyGross ~ TelevisionAds + NewspaperAds, data = consult)
##
## Coefficients:
## (Intercept) TelevisionAds NewspaperAds
## 83.230 2.290 1.301
Is the estimated regression equation coefficient for television advertising expenditures the same in part (a) and in part (b)? Interpret the coefficient in each case.
Answer:
What is the estimate of the weekly gross revenue for a week when $3,500 is spent on television advertising and $1800 is spent on newspaper advertising?
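A minimal sketch of the same estimate using `predict()`, refitting the two-predictor model from above; the advertising amounts are entered in $1,000s to match the units of the data:

```r
showtime_both <- lm(WeeklyGross ~ TelevisionAds + NewspaperAds, data = consult)
# $3,500 of television and $1,800 of newspaper advertising, entered in $1,000s
predict(showtime_both, newdata = data.frame(TelevisionAds = 3.5, NewspaperAds = 1.8))
```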
83.230 + (3.5 * 2.290) +(1.8 * 1.301)
## [1] 93.5868
Note that this question extends question 3 above (i.e., the Showtime data) and is thus based on the same data.
For the model \(y=\beta_0+\beta_1x_1+\beta_2x_2+\epsilon\), where \(x_1\) = television advertising ($1,000s) and \(x_2\) = newspaper advertising ($1,000s), use \(\alpha\)=.05 to test the null hypothesis \(H_0: \beta_1=\beta_2=0\) against the alternative hypothesis \(H_a\): \(\beta_1\) and/or \(\beta_2\) is not equal to zero.
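A minimal sketch of the overall test, assuming the `consult` data loaded above; `summary()` reports the F-statistic and its p-value for this null hypothesis, and `qf()` gives the critical value:

```r
showtime_both <- lm(WeeklyGross ~ TelevisionAds + NewspaperAds, data = consult)
summary(showtime_both)                         # overall F on 2 and 5 df, with its p-value
qf(.95, df1 = 2, df2 = nrow(consult) - 2 - 1)  # critical value at alpha = .05
```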
Use \(\alpha\)=.05 to test the significance of \(\beta_1\). Should \(x_1\) be dropped from the model? Why or why not?
Answer:
summary(lm(WeeklyGross ~ NewspaperAds, data = consult))
##
## Call:
## lm(formula = WeeklyGross ~ NewspaperAds, data = consult)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7704 -0.2554 0.2876 1.2274 2.2081
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 93.85641 2.23745 41.95 1.23e-08 ***
## NewspaperAds -0.04299 0.85473 -0.05 0.962
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.061 on 6 degrees of freedom
## Multiple R-squared: 0.0004215, Adjusted R-squared: -0.1662
## F-statistic: 0.00253 on 1 and 6 DF, p-value: 0.9615
The p-value shown above is 0.962, which is greater than .05, so we fail to reject the null hypothesis that this coefficient is zero.
Use \(\alpha\)=.05 to test the significance of \(\beta_2\). Should \(x_2\) be dropped from the model? Why or why not?
Answer:
summary(lm(WeeklyGross ~ TelevisionAds, data = consult))
##
## Call:
## lm(formula = WeeklyGross ~ TelevisionAds, data = consult)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8454 -0.6498 -0.1522 0.7512 1.5507
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 88.6377 1.5824 56.016 2.17e-09 ***
## TelevisionAds 1.6039 0.4778 3.357 0.0153 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.215 on 6 degrees of freedom
## Multiple R-squared: 0.6526, Adjusted R-squared: 0.5946
## F-statistic: 11.27 on 1 and 6 DF, p-value: 0.01529
The p-value shown above is 0.0153, which is less than .05, so we reject the null hypothesis that this coefficient is zero.
Given the following incomplete ANOVA table for a multiple regression analysis of 50 individuals, fill in the appropriate values (a–g).
ANOVA_Table <- as.data.frame(
rbind(
c("Regression", "a", 4, 8, "b", "c"),
c("Residual", 184, "d", "e", "", ""),
c("Total", "f", "g", "", "", "")
))
colnames(ANOVA_Table) <- c("Source", "$SS$", "$df$", "$MS$", "$F$", "Significance $F$")
knitr::kable(ANOVA_Table, caption="ANOVA Source Table")
| Source | \(SS\) | \(df\) | \(MS\) | \(F\) | Significance \(F\) |
|---|---|---|---|---|---|
| Regression | a | 4 | 8 | b | c |
| Residual | 184 | d | e | | |
| Total | f | g | | | |
ANOVA_Table <- as.data.frame(
rbind(
c("Regression", 4*8, 4, 8, round(8/(184/(50-4-1)),2), 1-pf(1.96,4,45)),
c("Residual", 184, 50-4-1, round(184/(50-4-1),2), "", ""),
c("Total",(4*8)+184 , 50-1, "", "", "")
))
colnames(ANOVA_Table) <- c("Source", "$SS$", "$df$", "$MS$", "$F$", "Significance $F$")
knitr::kable(ANOVA_Table, caption="ANOVA Source Table")
| Source | \(SS\) | \(df\) | \(MS\) | \(F\) | Significance \(F\) |
|---|---|---|---|---|---|
| Regression | 32 | 4 | 8 | 1.96 | 0.117 |
| Residual | 184 | 45 | 4.09 | | |
| Total | 216 | 49 | | | |
Barron’s conducts an annual review of online brokers, including both Web-based brokers, which can be accessed via a Web browser, and direct-access brokers, which connect customers directly with the broker’s network server. Each broker’s offerings and performance are evaluated in six areas, using a scale of 0–5 in each category (the higher the better). The results are weighted to obtain an overall score, and a final star rating, ranging from zero to five stars, is assigned to each broker. Trade execution, ease of use, and range of offerings are three of the areas evaluated. The following data show the point values for trade execution, ease of use, range of offerings, and the star rating for a sample of 10 of the online brokers that Barron’s evaluated. The Brokers data file can be found here as CSV file, here as an SPSS file, or in the following directory: http://bit.ly/MSBA_Data.
brokers <- read_csv('https://www.dropbox.com/s/jfu08pc6lrg00dy/Brokers.csv?dl=1')
## Parsed with column specification:
## cols(
## Broker = col_character(),
## TradeEx = col_double(),
## Use = col_double(),
## Range = col_double(),
## Rating = col_double()
## )
model <- lm(Rating ~ TradeEx + Use+Range, data = brokers)
model
##
## Call:
## lm(formula = Rating ~ TradeEx + Use + Range, data = brokers)
##
## Coefficients:
## (Intercept) TradeEx Use Range
## 0.3451 0.2548 0.1325 0.4585
brokers <- brokers %>%
mutate(TradeEx = as.factor(TradeEx), Use = as.factor(Use), Range = as.factor(Range),Rating = as.factor(Rating))
Determine the estimated regression equation (using all predictor variables) that can be used to predict the star rating given the point values for execution, ease of use, and range of offerings.
Answer:
\(\hat{y} = 0.3451 + 0.2548x_1 + 0.1325x_2 + 0.4585x_3\), where \(x_1\) = trade execution, \(x_2\) = ease of use, and \(x_3\) = range of offerings.
Use the \(F\)-test to determine the overall significance of the relationship. What is the conclusion at the \(\alpha\)=.05 level of significance?
summary(model)
##
## Call:
## lm(formula = Rating ~ TradeEx + Use + Range, data = brokers)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3251 -0.1171 -0.0599 0.1460 0.3366
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.34510 0.53067 0.650 0.53958
## TradeEx 0.25482 0.08556 2.978 0.02469 *
## Use 0.13249 0.14043 0.944 0.38185
## Range 0.45852 0.12319 3.722 0.00983 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2431 on 6 degrees of freedom
## Multiple R-squared: 0.8856, Adjusted R-squared: 0.8284
## F-statistic: 15.49 on 3 and 6 DF, p-value: 0.00313
qf(.95,3,6)
## [1] 4.757063
The F-statistic (15.49 on 3 and 6 df) exceeds the critical value of 4.76 and the p-value (.00313) is less than .05, so the overall relationship is statistically significant.
Use the \(t\)-test to determine the significance of each independent variable. What is your conclusion at the \(\alpha\)=.05 level of significance?
summary(model)
##
## Call:
## lm(formula = Rating ~ TradeEx + Use + Range, data = brokers)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3251 -0.1171 -0.0599 0.1460 0.3366
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.34510 0.53067 0.650 0.53958
## TradeEx 0.25482 0.08556 2.978 0.02469 *
## Use 0.13249 0.14043 0.944 0.38185
## Range 0.45852 0.12319 3.722 0.00983 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2431 on 6 degrees of freedom
## Multiple R-squared: 0.8856, Adjusted R-squared: 0.8284
## F-statistic: 15.49 on 3 and 6 DF, p-value: 0.00313
Answer:
Remove any independent variable that is not significant from the estimated regression equation. What is your recommended estimated regression equation? Compare the \(R^2\) with the value of \(R^2\) from part (a). Discuss the differences.
Answer:
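A minimal sketch of the refit, assuming `Use` is the variable removed (it was not significant above) and that the broker variables are numeric as originally read (the factor conversion earlier would need to be skipped or undone for this to run):

```r
# Drop the non-significant predictor and compare R-squared with the full model's 0.8856
reduced_model <- lm(Rating ~ TradeEx + Range, data = brokers)
summary(reduced_model)
```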
Consider multicollinearity in the context of regression.
In your own words, what does multicollinearity mean?
Answer:
In your own words, what are the implications of multicollinearity?
Answer:
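As an illustration only, a minimal sketch of two common multicollinearity checks, assuming the `brokers` columns are numeric as originally read and that the `car` package (not loaded in this document) is installed:

```r
# Pairwise correlations among the predictors; very high values hint at collinearity
cor(brokers[, c("TradeEx", "Use", "Range")])
# Variance inflation factors; values well above roughly 5 to 10 suggest a problem
car::vif(lm(Rating ~ TradeEx + Use + Range, data = brokers))
```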
Understanding the factors that contribute to housing values is an important consideration of investment firms interested in the residential market. The Mishawaka residential real estate market was evaluated as “opportunistic” by an investment firm. To better understand the Mishawaka market, the firm seeks to model the asking price of a house by various characteristics of the house. This group hires you as a consultant to help them understand the market as it exists today, in hopes of capitalizing on changes in the marketplace in the future. The data set is available for download here as CSV file, here as an SPSS file, or in the following directory: http://bit.ly/MSBA_Data.
house <- read_csv('https://www.dropbox.com/sh/4wm9mff88ium0wv/AABgY6zWFMOt9wd7FIj3pFGIa/Mishawaka_Asking.csv?dl=1')
## Parsed with column specification:
## cols(
## MLS = col_double(),
## Asking = col_double(),
## Square_Footage = col_double(),
## Proportion_Brick_or_Stone = col_double(),
## Baths = col_double(),
## Bedrooms = col_double(),
## Bed_and_Bathrooms = col_double(),
## Garage = col_double(),
## Proportion_Granite = col_double(),
## Proportion_Hardwood = col_double(),
## Basement = col_double(),
## Basement_Finished = col_double(),
## Stories2 = col_double(),
## Lot_Size = col_double(),
## Acreage = col_double(),
## Culdasac = col_double(),
## Fireplaces = col_double(),
## Year_Built = col_double()
## )
What are some relevant descriptive statistics of the data?
summary(house)
## MLS Asking Square_Footage Proportion_Brick_or_Stone
## Min. : 111113 Min. : 16900 Min. : 520 Min. :0.0000
## 1st Qu.:3567585 1st Qu.: 64900 1st Qu.:1008 1st Qu.:0.0000
## Median :6016570 Median : 112450 Median :1366 Median :0.0000
## Mean :5638786 Mean : 152970 Mean :1548 Mean :0.0675
## 3rd Qu.:7837547 3rd Qu.: 170475 3rd Qu.:1797 3rd Qu.:0.0000
## Max. :9803406 Max. :2500000 Max. :6451 Max. :1.0000
## Baths Bedrooms Bed_and_Bathrooms Garage
## Min. :1.00 Min. :1.00 Min. :2.00 Min. :0.00
## 1st Qu.:1.00 1st Qu.:2.00 1st Qu.:3.50 1st Qu.:1.00
## Median :1.50 Median :3.00 Median :4.50 Median :2.00
## Mean :1.64 Mean :2.93 Mean :4.57 Mean :1.48
## 3rd Qu.:2.00 3rd Qu.:3.00 3rd Qu.:5.50 3rd Qu.:2.00
## Max. :4.00 Max. :6.00 Max. :9.00 Max. :4.00
## Proportion_Granite Proportion_Hardwood Basement Basement_Finished
## Min. :0.0000 Min. :0.0000 Min. :0.00 Min. :0.0
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.00 1st Qu.:0.0
## Median :0.0000 Median :0.1000 Median :1.00 Median :0.0
## Mean :0.0585 Mean :0.1005 Mean :0.83 Mean :0.3
## 3rd Qu.:0.0000 3rd Qu.:0.1000 3rd Qu.:1.00 3rd Qu.:1.0
## Max. :1.0000 Max. :1.0000 Max. :1.00 Max. :1.0
## Stories2 Lot_Size Acreage Culdasac Fireplaces
## Min. :0.00 Min. : 1914 Min. :0.00 Min. :0.00 Min. :0.00
## 1st Qu.:0.00 1st Qu.: 5040 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00
## Median :0.00 Median : 8955 Median :0.00 Median :0.00 Median :0.00
## Mean :0.28 Mean : 19751 Mean :0.07 Mean :0.01 Mean :0.41
## 3rd Qu.:1.00 3rd Qu.: 18578 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:1.00
## Max. :1.00 Max. :261360 Max. :1.00 Max. :1.00 Max. :1.00
## Year_Built
## Min. :1882
## 1st Qu.:1922
## Median :1950
## Mean :1948
## 3rd Qu.:1972
## Max. :2008
What are some relevant visualizations that help one understand the data?
pairs(house[2:18])
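Beyond the scatterplot matrix, a couple of hedged ggplot2 sketches (tidyverse is already loaded) that could help describe the data:

```r
# Distribution of asking prices
ggplot(house, aes(x = Asking)) +
  geom_histogram(bins = 30)
# Asking price versus square footage
ggplot(house, aes(x = Square_Footage, y = Asking)) +
  geom_point()
```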
The data set contains, among other things, the variables Baths (total number of bathrooms), Bedrooms (total number of bedrooms), and another variable, Bed_and_Bathrooms (which is the sum of Baths and Bedrooms). Explain why including (a) Baths, (b) Bedrooms, and (c) Bed_and_Bathrooms would or would not be a problem in a regression model (i.e., a model that includes all three variables)?
Answer:
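Because Bed_and_Bathrooms is the exact sum of Baths and Bedrooms, a model with all three predictors is perfectly multicollinear. A minimal sketch of what R does in that case (it drops one aliased coefficient and reports it as `NA`):

```r
collinear_fit <- lm(Asking ~ Baths + Bedrooms + Bed_and_Bathrooms, data = house)
coef(collinear_fit)    # one of the three coefficients is NA (aliased)
alias(collinear_fit)   # shows the exact linear dependency among the predictors
```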
Fit a model in which the only predictors are Bed_and_Bathrooms (variable 1) and Square_Footage (variable 2). Provide the estimated regression equation.
lm(Asking ~ Bed_and_Bathrooms + Square_Footage, data = house)
##
## Call:
## lm(formula = Asking ~ Bed_and_Bathrooms + Square_Footage, data = house)
##
## Coefficients:
## (Intercept) Bed_and_Bathrooms Square_Footage
## -89587.1 -45720.7 291.7
Remove MLS=1575935 and rerun the model above (i.e., with Bed and Bathrooms and Square Footage).
house2 <- house %>%
filter(MLS != 1575935)
lm(Asking ~ Bed_and_Bathrooms + Square_Footage, data = house2)
##
## Call:
## lm(formula = Asking ~ Bed_and_Bathrooms + Square_Footage, data = house2)
##
## Coefficients:
## (Intercept) Bed_and_Bathrooms Square_Footage
## -18931.68 227.37 98.23
Describe what major change happened in the output from d and e above. Explain why.
Answer: MLS 1575935 is an extreme outlier (the $2,500,000 house); removing it changes the coefficients considerably. In particular, the Bed_and_Bathrooms coefficient flips from about -45,721 to 227.
Again with MLS=1575935 removed, fit a model with Square Footage, Year Built, Basement Finished, and Lot Size and save the unstandardized residuals and the unstandardized predicted values. Provide a visual assessment and comments on the normality of errors assumption.
lm(Asking ~ Square_Footage + Year_Built+ Basement_Finished+Lot_Size, data = house2)
##
## Call:
## lm(formula = Asking ~ Square_Footage + Year_Built + Basement_Finished +
## Lot_Size, data = house2)
##
## Coefficients:
## (Intercept) Square_Footage Year_Built Basement_Finished
## -1.583e+06 6.647e+01 8.172e+02 5.078e+04
## Lot_Size
## 3.356e-01
For the model and data in part h, provide comments and a visual assessment of the homoscedasticity assumption. Do this by plotting the saved values (with the unstandardized residuals on the ordinate [\(y\)-axis] and the unstandardized predicted values on the abscissa [\(x\)-axis]).
```r
model <-lm(Asking ~ Square_Footage + Year_Built+ Basement_Finished+Lot_Size, data = house2)
```
plot(model)
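A minimal sketch that saves the unstandardized residuals and predicted values from `model` explicitly (as parts h and i ask) and plots them with residuals on the \(y\)-axis and predicted values on the \(x\)-axis, plus quick normality checks:

```r
resid_unstd <- resid(model)    # unstandardized residuals
pred_unstd  <- fitted(model)   # unstandardized predicted values
# Homoscedasticity: residuals versus predicted values
plot(pred_unstd, resid_unstd,
     xlab = "Unstandardized predicted values",
     ylab = "Unstandardized residuals")
abline(h = 0, lty = 2)
# Normality of errors
hist(resid_unstd)
qqnorm(resid_unstd); qqline(resid_unstd)
```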
For the model and data in part h, remove all houses that have an Asking price of $225,000 or above. Now, like h, provide a visual assessment and comments on the normality of errors assumption.
house3 <- house2 %>%
filter(Asking < 225000)
model.2 <- lm(Asking ~ Square_Footage + Year_Built+ Basement_Finished+Lot_Size, data = house3)
plot(model.2)
For the model and data in part j, provide comments and a visual assessment of the homoscedasticity assumption as you did in part i. Again, do this by plotting the saved values (with the unstandardized residuals on the ordinate [\(y\)-axis] and the unstandardized predicted values on the abscissa [\(x\)-axis]).
plot(model.2)
Do these data (i.e., no houses at or above $225,000 [as in j and k]) satisfy the assumptions any better or any worse than the data that excluded only the house with an asking price of $2,500,000 (as in h and i)?
Answer: Removing these houses greatly improves homoscedasticity, so the assumptions are satisfied better than in parts h and i.
What are the implications and are there any cautions to consider by conditioning the analysis to houses that are less than $225,000?
Answer: We need to use caution when predicting asking prices at or above $225,000 because those houses were removed; the model is only calibrated for houses below that price.
Provide a matrix plot for Asking, Square Footage, Year Built, Basement Finished, and Lot Size.
house3 %>%
select(Asking,Square_Footage, Year_Built,Basement_Finished,Lot_Size) %>%
pairs()
What is the predicted asking price for a house with Square Footage = 1,800, Year Built = 1996, Basement Finished = 1, and Lot Size = 43,560?
new <- data.frame(Square_Footage =1800,
Year_Built =1996,
Basement_Finished=1,
Lot_Size=43560)
predict(model.2, newdata = new)
## 1
## 190527.7
i. What is the 95% confidence interval for the mean asking price of houses like this?
new <- data.frame(Square_Footage =1800,
Year_Built =1996,
Basement_Finished=1,
Lot_Size=43560)
predict(model.2, newdata = new,interval ='confidence')
## fit lwr upr
## 1 190527.7 174407.1 206648.3
ii. What is the 95% prediction interval for such a house you are considering purchasing?
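Because this asks about an individual house rather than the mean, a minimal sketch using `interval = "prediction"`, which produces the wider interval appropriate for a single observation (same house characteristics as in the answer below):

```r
new2 <- data.frame(Square_Footage = 2628,
                   Year_Built = 2015,
                   Basement_Finished = 0,
                   Lot_Size = 26136)
# interval = "prediction" accounts for the residual variance of an individual
# house, not just the uncertainty in the estimated mean
predict(model.2, newdata = new2, interval = "prediction")
```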
new2 <- data.frame(Square_Footage =2628,
Year_Built =2015,
Basement_Finished=0,
Lot_Size=26136)
predict(model.2, newdata = new2, interval ='confidence')
## fit lwr upr
## 1 221725.9 201907.5 241544.3
In the last assignment you used the FairValue data to find three correlation coefficients. Take the FairValue data and now apply a multiple regression model in which FairValue is modeled from SharePrice and EarningsPerShare. The Fair Value data set is available here.
fair <- read_csv('https://www.dropbox.com/s/nr20yk7mcmg8s49/FairValue.csv?dl=1')
## Parsed with column specification:
## cols(
## Company = col_character(),
## Fair_Value = col_double(),
## Share_Price = col_double(),
## Earnings_Per_Share = col_double()
## )
fairmodel <- lm(Fair_Value ~ Share_Price+Earnings_Per_Share,data = fair)
fairmodel
##
## Call:
## lm(formula = Fair_Value ~ Share_Price + Earnings_Per_Share, data = fair)
##
## Coefficients:
## (Intercept) Share_Price Earnings_Per_Share
## 17.5142 0.7109 1.3834
summary(fairmodel)
##
## Call:
## lm(formula = Fair_Value ~ Share_Price + Earnings_Per_Share, data = fair)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.1698 -8.7624 -0.7402 9.8206 18.7756
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.51418 3.70684 4.725 6.38e-05 ***
## Share_Price 0.71085 0.05527 12.861 5.01e-13 ***
## Earnings_Per_Share 1.38339 1.27796 1.082 0.289
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.65 on 27 degrees of freedom
## Multiple R-squared: 0.9144, Adjusted R-squared: 0.908
## F-statistic: 144.1 on 2 and 27 DF, p-value: 3.896e-15
Overall, does the model provide any inferential evidence that it accounts for more variance in \(y\) (the dependent variable) than would have been expected by chance alone? Describe how you know this.
Answer:
What is a descriptive summary of the overall fit of the model (specifically, how much variance in \(y\) do you estimate that the model accounts for in the population)?
Answer:
Describe, separately, how well each of the regressors “worked” or “did not work” in modeling the outcome variable.
Answer:
Describe, in words, an overall managerial summary of the model and the implications of the model that would be relevant to the top-management team (who are interested in the findings).
Answer:
Provide output in a way that would be reasonable in a report to the top-management team (do not include the raw data).
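One possible presentation, as a minimal sketch: a rounded coefficient table from `fairmodel` rendered with `knitr::kable()`, which is already used earlier in this document:

```r
coef_table <- summary(fairmodel)$coefficients
knitr::kable(round(coef_table, 3),
             caption = "Multiple regression of Fair Value on Share Price and Earnings Per Share")
```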
# The answer is....
Here is a college football data set on bowl games. The code below downloads the Excel data set and then uses the openxlsx package to load it into the R workspace.
Bowl_Game <- openxlsx::read.xlsx("http://www.dropbox.com/s/yw2wj8htfew7i5i/Bowl_Game_Spreads.xlsx?dl=1")
# tibble::glimpse(Bowl_Game)
# summary(Bowl_Game)
Fit a simple or multiple regression model of your choice and summarize your findings.
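A hypothetical sketch of the pattern only; the column names `Spread` and `Actual_Margin` are assumptions and are not confirmed by the data set (uncomment `glimpse(Bowl_Game)` above to see the real names):

```r
# Hypothetical: replace Spread and Actual_Margin with actual column names
bowl_model <- lm(Actual_Margin ~ Spread, data = Bowl_Game)
summary(bowl_model)
```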
# The answer is....
In the last assignment you fitted multiple simple regressions. Here, fit the multiple regression model with all of the regressors.
# The answer is....