Assignment 2

Question 2

Carefully explain the differences between the KNN classifier and KNN regression methods.

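The KNN classifier is used when the response variable is categorical (qualitative). For a test observation, it finds the K nearest training observations, estimates the probability of each class as the fraction of those neighbors in that class, and predicts the most common class. KNN regression is used when the response is numerical (quantitative). It predicts Y as the average of the responses of the K nearest training observations, so the prediction can take any continuous value.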
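The contrast can be made concrete with a small base R sketch. This is only an illustration under stated assumptions: the toy data and the helper names (knn_neighbors, knn_regress, knn_classify) are hypothetical and are not part of the assignment or of any package.

```r
# Hypothetical toy data: one predictor, one numeric and one categorical response
x_train <- c(1, 2, 3, 6, 7, 8)
y_num   <- c(1.0, 1.5, 2.0, 5.0, 5.5, 6.0)
y_class <- factor(c("A", "A", "A", "B", "B", "B"))

# Indices of the k training points closest to x0
knn_neighbors <- function(x0, x, k) order(abs(x - x0))[1:k]

# KNN regression: average the neighbors' numeric responses
knn_regress <- function(x0, x, y, k) mean(y[knn_neighbors(x0, x, k)])

# KNN classification: majority vote among the neighbors' classes
knn_classify <- function(x0, x, y, k) {
  votes <- table(y[knn_neighbors(x0, x, k)])
  names(votes)[which.max(votes)]
}

knn_regress(2.5, x_train, y_num, k = 3)     # 1.5
knn_classify(2.5, x_train, y_class, k = 3)  # "A"
```

Both functions use the same neighbors; they differ only in how the neighbors' responses are combined (a mean versus a vote).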

Question 9

This question involves the use of multiple linear regression on the Auto data set.

Produce a scatterplot matrix which includes all of the variables in the data set.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1     ✔ purrr   1.0.1
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.4     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(ISLR)
data(Auto)
auto <- na.omit(Auto)
plot(Auto)

Compute the matrix of correlations between the variables using the function cor(). You will need to exclude the name variable, which is qualitative.

Auto1<-Auto
Auto1$name=NULL
cor(Auto1)

##                     mpg  cylinders displacement horsepower     weight
## mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268 -0.8322442
## cylinders    -0.7776175  1.0000000    0.9508233  0.8429834  0.8975273
## displacement -0.8051269  0.9508233    1.0000000  0.8972570  0.9329944
## horsepower   -0.7784268  0.8429834    0.8972570  1.0000000  0.8645377
## weight       -0.8322442  0.8975273    0.9329944  0.8645377  1.0000000
## acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955 -0.4168392
## year          0.5805410 -0.3456474   -0.3698552 -0.4163615 -0.3091199
## origin        0.5652088 -0.5689316   -0.6145351 -0.4551715 -0.5850054
##              acceleration       year     origin
## mpg             0.4233285  0.5805410  0.5652088
## cylinders      -0.5046834 -0.3456474 -0.5689316
## displacement   -0.5438005 -0.3698552 -0.6145351
## horsepower     -0.6891955 -0.4163615 -0.4551715
## weight         -0.4168392 -0.3091199 -0.5850054
## acceleration    1.0000000  0.2903161  0.2127458
## year            0.2903161  1.0000000  0.1815277
## origin          0.2127458  0.1815277  1.0000000

Use the lm() function to perform a multiple linear regression with mpg as the response and all other variables except name as the predictors. Use the summary() function to print the results. Comment on the output. For instance:

model1<-lm(mpg~ .-name,data=Auto)
summary(model1)

## 
## Call:
## lm(formula = mpg ~ . - name, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5903 -2.1565 -0.1169  1.8690 13.0604 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
## cylinders     -0.493376   0.323282  -1.526  0.12780    
## displacement   0.019896   0.007515   2.647  0.00844 ** 
## horsepower    -0.016951   0.013787  -1.230  0.21963    
## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
## acceleration   0.080576   0.098845   0.815  0.41548    
## year           0.750773   0.050973  14.729  < 2e-16 ***
## origin         1.426141   0.278136   5.127 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182 
## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

Is there a relationship between the predictors and the response? Yes, there is a relationship between the predictors and the response, as shown by the F-statistic p-value of less than 2.2e-16, which is well below 0.05, so we reject the null hypothesis that all coefficients are zero.
1. Which predictors appear to have a statistically significant relationship to the response? Displacement, weight, year, and origin have a statistically significant relationship with mpg, as shown by their p-values (all below 0.01).
2. What does the coefficient for the year variable suggest? The coefficient of the year variable is positive and significant, which suggests that, with all other variables held constant, mpg increases by about 0.75 on average for each additional model year.

Use the plot() function to produce diagnostic plots of the linear regression fit. Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?

par(mfrow=c(2,2))
plot(model1)

plot(predict(model1),rstudent(model1))

plot(hatvalues(model1))

which.max(hatvalues(model1))

## 14 
## 14

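The first graph (Residuals vs Fitted) shows a curved pattern, which suggests a non-linear relationship between the response and the predictors. The second graph (Normal Q-Q) shows that the residuals are roughly normally distributed but right skewed in the upper tail. The third graph (Scale-Location) shows that the constant variance of error assumption does not hold for this model. The largest residual in the summary (13.06, almost four times the residual standard error of 3.328) points to at least one possible outlier. The fourth graph (Residuals vs Leverage) shows that most observations have low leverage. However, one observation stands out as a potential high leverage point (labeled 14 on the graph), which agrees with the which.max(hatvalues(model1)) result.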
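Reading outliers and leverage off the plots can be backed up numerically. The sketch below assumes the ISLR package is installed. The cutoffs (studentized residuals beyond 3 in absolute value, leverage above three times the average) are common rules of thumb chosen here for illustration, not part of the assignment, and the object names are hypothetical.

```r
library(ISLR)  # provides the Auto data set

fit <- lm(mpg ~ . - name, data = Auto)

n <- nrow(Auto)
p <- length(coef(fit)) - 1     # number of predictors (7 here)
avg_leverage <- (p + 1) / n    # average leverage over all observations

# Assumed rule-of-thumb cutoffs; other choices are equally defensible
possible_outliers <- which(abs(rstudent(fit)) > 3)
high_leverage     <- which(hatvalues(fit) > 3 * avg_leverage)

possible_outliers
high_leverage
```

Observations that appear in either vector are worth inspecting individually before deciding whether to keep them in the fit.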

Use the * and : symbols to fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?

model2 = lm(mpg ~.-name+displacement:weight, data = Auto)
summary(model2)

## 
## Call:
## lm(formula = mpg ~ . - name + displacement:weight, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.9027 -1.8092 -0.0946  1.5549 12.1687 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -5.389e+00  4.301e+00  -1.253   0.2109    
## cylinders            1.175e-01  2.943e-01   0.399   0.6899    
## displacement        -6.837e-02  1.104e-02  -6.193 1.52e-09 ***
## horsepower          -3.280e-02  1.238e-02  -2.649   0.0084 ** 
## weight              -1.064e-02  7.136e-04 -14.915  < 2e-16 ***
## acceleration         6.724e-02  8.805e-02   0.764   0.4455    
## year                 7.852e-01  4.553e-02  17.246  < 2e-16 ***
## origin               5.610e-01  2.622e-01   2.139   0.0331 *  
## displacement:weight  2.269e-05  2.257e-06  10.054  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.964 on 383 degrees of freedom
## Multiple R-squared:  0.8588, Adjusted R-squared:  0.8558 
## F-statistic: 291.1 on 8 and 383 DF,  p-value: < 2.2e-16

model3 = lm(mpg ~.-name+displacement:cylinders+displacement:weight+acceleration:horsepower, data=Auto)
summary(model3)

## 
## Call:
## lm(formula = mpg ~ . - name + displacement:cylinders + displacement:weight + 
##     acceleration:horsepower, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3344 -1.6333  0.0188  1.4740 11.9723 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -1.725e+01  5.328e+00  -3.237  0.00131 ** 
## cylinders                6.354e-01  6.106e-01   1.041  0.29870    
## displacement            -6.805e-02  1.337e-02  -5.088 5.68e-07 ***
## horsepower               6.026e-02  2.601e-02   2.317  0.02105 *  
## weight                  -8.864e-03  1.097e-03  -8.084 8.43e-15 ***
## acceleration             6.257e-01  1.592e-01   3.931  0.00010 ***
## year                     7.845e-01  4.470e-02  17.549  < 2e-16 ***
## origin                   4.668e-01  2.595e-01   1.799  0.07284 .  
## cylinders:displacement  -1.337e-03  2.726e-03  -0.490  0.62415    
## displacement:weight      2.071e-05  3.638e-06   5.694 2.49e-08 ***
## horsepower:acceleration -7.467e-03  1.784e-03  -4.185 3.55e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.905 on 381 degrees of freedom
## Multiple R-squared:  0.865,  Adjusted R-squared:  0.8615 
## F-statistic: 244.2 on 10 and 381 DF,  p-value: < 2.2e-16

model4 = lm(mpg ~.-name+displacement:cylinders+displacement:weight+year:origin+acceleration:horsepower, data=Auto)
summary(model4)

## 
## Call:
## lm(formula = mpg ~ . - name + displacement:cylinders + displacement:weight + 
##     year:origin + acceleration:horsepower, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.6504 -1.6476  0.0381  1.4254 12.7893 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              5.287e+00  9.074e+00   0.583 0.560429    
## cylinders                4.249e-01  6.079e-01   0.699 0.485011    
## displacement            -7.322e-02  1.334e-02  -5.490 7.38e-08 ***
## horsepower               5.252e-02  2.586e-02   2.031 0.042913 *  
## weight                  -8.689e-03  1.086e-03  -7.998 1.54e-14 ***
## acceleration             5.796e-01  1.582e-01   3.665 0.000283 ***
## year                     5.116e-01  9.976e-02   5.129 4.66e-07 ***
## origin                  -1.220e+01  4.161e+00  -2.933 0.003560 ** 
## cylinders:displacement  -4.368e-04  2.712e-03  -0.161 0.872156    
## displacement:weight      1.992e-05  3.608e-06   5.522 6.21e-08 ***
## year:origin              1.630e-01  5.341e-02   3.051 0.002440 ** 
## horsepower:acceleration -6.735e-03  1.781e-03  -3.781 0.000181 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.874 on 380 degrees of freedom
## Multiple R-squared:  0.8683, Adjusted R-squared:  0.8644 
## F-statistic: 227.7 on 11 and 380 DF,  p-value: < 2.2e-16

model5 = lm(mpg ~.-name-cylinders-acceleration+year:origin+displacement:weight+
displacement:weight+acceleration:horsepower+acceleration:weight, data=Auto)
summary(model5)

## 
## Call:
## lm(formula = mpg ~ . - name - cylinders - acceleration + year:origin + 
##     displacement:weight + displacement:weight + acceleration:horsepower + 
##     acceleration:weight, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5074 -1.6324  0.0599  1.4577 12.7376 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              1.868e+01  7.796e+00   2.396 0.017051 *  
## displacement            -7.794e-02  9.026e-03  -8.636  < 2e-16 ***
## horsepower               8.719e-02  3.167e-02   2.753 0.006183 ** 
## weight                  -1.350e-02  1.287e-03 -10.490  < 2e-16 ***
## year                     4.911e-01  9.825e-02   4.998 8.83e-07 ***
## origin                  -1.262e+01  4.109e+00  -3.071 0.002288 ** 
## year:origin              1.686e-01  5.277e-02   3.195 0.001516 ** 
## displacement:weight      2.253e-05  2.184e-06  10.312  < 2e-16 ***
## horsepower:acceleration -9.164e-03  2.222e-03  -4.125 4.56e-05 ***
## weight:acceleration      2.784e-04  7.087e-05   3.929 0.000101 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.861 on 382 degrees of freedom
## Multiple R-squared:  0.8687, Adjusted R-squared:  0.8656 
## F-statistic: 280.8 on 9 and 382 DF,  p-value: < 2.2e-16

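Of the models fitted above, the last one (model5) is the only one in which every term, including all the interaction terms, is statistically significant. The interactions displacement:weight, horsepower:acceleration, year:origin, and weight:acceleration are significant, while cylinders:displacement is not (p = 0.62 in model3 and p = 0.87 in model4). Model5 is likely the best of these combinations of predictors and interaction terms: its R-squared of 0.8687 means that about 87% of the variation in the response is explained by these predictors.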
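Besides reading individual p-values, nested models can be compared directly. The following is a minimal sketch, assuming the ISLR package is installed; base_fit and int_fit are hypothetical names for refits of the first two models above.

```r
library(ISLR)  # provides the Auto data set

# Baseline model and the same model plus one interaction term
base_fit <- lm(mpg ~ . - name, data = Auto)
int_fit  <- lm(mpg ~ . - name + displacement:weight, data = Auto)

# Partial F-test: the null hypothesis is that the interaction coefficient is zero
anova(base_fit, int_fit)

# Adjusted R-squared penalizes extra terms, so it is a fairer comparison than R-squared
c(base        = summary(base_fit)$adj.r.squared,
  interaction = summary(int_fit)$adj.r.squared)
```

With a single added term, the F-test is equivalent to the t-test on that term in the larger model, so it should lead to the same conclusion as the summary output.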

Try a few different transformations of the variables, such as log(X), √X, X^2. Comment on your findings.

plot(log(auto$weight), auto$mpg)

plot(sqrt(auto$weight), auto$mpg)

hist(log(auto$weight))

hist(sqrt(auto$weight))

hist(sqrt(auto$mpg))

lm.base = lm(mpg ~ ., data = Auto[, 1:8])
summary(lm.base)

## 
## Call:
## lm(formula = mpg ~ ., data = Auto[, 1:8])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5903 -2.1565 -0.1169  1.8690 13.0604 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
## cylinders     -0.493376   0.323282  -1.526  0.12780    
## displacement   0.019896   0.007515   2.647  0.00844 ** 
## horsepower    -0.016951   0.013787  -1.230  0.21963    
## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
## acceleration   0.080576   0.098845   0.815  0.41548    
## year           0.750773   0.050973  14.729  < 2e-16 ***
## origin         1.426141   0.278136   5.127 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182 
## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

lm.fit.trans = lm(mpg ~ . + I(cylinders^2) + log(horsepower) + sqrt(displacement), data = Auto[, 1:8])
summary(lm.fit.trans)

## 
## Call:
## lm(formula = mpg ~ . + I(cylinders^2) + log(horsepower) + sqrt(displacement), 
##     data = Auto[, 1:8])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.8017 -1.5427 -0.0296  1.5157 11.7164 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         7.145e+01  1.235e+01   5.785 1.51e-08 ***
## cylinders           3.345e-01  1.512e+00   0.221  0.82505    
## displacement        7.073e-02  2.860e-02   2.473  0.01383 *  
## horsepower          1.075e-01  3.133e-02   3.430  0.00067 ***
## weight             -3.270e-03  6.561e-04  -4.984 9.45e-07 ***
## acceleration       -2.595e-01  9.838e-02  -2.637  0.00870 ** 
## year                7.578e-01  4.534e-02  16.715  < 2e-16 ***
## origin              5.691e-01  2.729e-01   2.085  0.03772 *  
## I(cylinders^2)     -1.671e-02  1.242e-01  -0.135  0.89300    
## log(horsepower)    -1.927e+01  3.582e+00  -5.379 1.31e-07 ***
## sqrt(displacement) -2.271e+00  8.504e-01  -2.671  0.00789 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.928 on 381 degrees of freedom
## Multiple R-squared:  0.8628, Adjusted R-squared:  0.8592 
## F-statistic: 239.7 on 10 and 381 DF,  p-value: < 2.2e-16

Both the log and the square root transformations of weight show a roughly linear trend against mpg. The histogram of log(weight) appears skewed, while the histograms of the square root transformations look relatively normal. Adding log(horsepower) and sqrt(displacement) to the model made more of the predictors significant: horsepower and acceleration, which were not significant in the untransformed model, now have p-values below 0.01, and the adjusted R-squared rose from 0.8182 to 0.8592. Finally, the squared cylinders term had no significant effect (p = 0.893).

Question 10

This question should be answered using the Carseats data set. a) Fit a multiple regression model to predict Sales using Price, Urban, and US.

data(Carseats)
head(Carseats)

##   Sales CompPrice Income Advertising Population Price ShelveLoc Age Education
## 1  9.50       138     73          11        276   120       Bad  42        17
## 2 11.22       111     48          16        260    83      Good  65        10
## 3 10.06       113     35          10        269    80    Medium  59        12
## 4  7.40       117    100           4        466    97    Medium  55        14
## 5  4.15       141     64           3        340   128       Bad  38        13
## 6 10.81       124    113          13        501    72       Bad  78        16
##   Urban  US
## 1   Yes Yes
## 2   Yes Yes
## 3   Yes Yes
## 4   Yes Yes
## 5   Yes  No
## 6    No Yes

str(Carseats)

## 'data.frame':    400 obs. of  11 variables:
##  $ Sales      : num  9.5 11.22 10.06 7.4 4.15 ...
##  $ CompPrice  : num  138 111 113 117 141 124 115 136 132 132 ...
##  $ Income     : num  73 48 35 100 64 113 105 81 110 113 ...
##  $ Advertising: num  11 16 10 4 3 13 0 15 0 0 ...
##  $ Population : num  276 260 269 466 340 501 45 425 108 131 ...
##  $ Price      : num  120 83 80 97 128 72 108 120 124 124 ...
##  $ ShelveLoc  : Factor w/ 3 levels "Bad","Good","Medium": 1 2 3 3 1 1 3 2 3 3 ...
##  $ Age        : num  42 65 59 55 38 78 71 67 76 76 ...
##  $ Education  : num  17 10 12 14 13 16 15 10 10 17 ...
##  $ Urban      : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 2 2 1 1 ...
##  $ US         : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 2 1 2 ...

s1 = lm(Sales ~ Price+Urban+US, data= Carseats)
summary(s1)

## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

Provide an interpretation of each coefficient in the model. Be careful—some of the variables in the model are qualitative!

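With all other predictors held constant, the coefficient of Price shows that sales decrease by about 54.46 units for each one-dollar increase in price (Sales is measured in thousands of units, so -0.054459 thousand is about 54 units). The coefficient of UrbanYes is small and not statistically significant (p = 0.936), so there is no evidence that sales depend on whether or not the store is in an urban area. The coefficient of USYes shows that a US store sells on average about 1,200 more car seats than a store outside the US, holding the other predictors fixed.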

Write out the model in equation form, being careful to handle the qualitative variables properly.

The model may be written as Sales = 13.043469 - 0.054459 × Price - 0.021916 × Urban + 1.200573 × US + ε, with Urban = 1 if the store is in an urban location and 0 if not, and US = 1 if the store is in the US and 0 if not.
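To see how the dummy variables enter the equation, it can be evaluated by hand for a particular store. This sketch uses the coefficients printed in the summary output; the store itself (price of 120, urban, in the US) and the function name predict_sales are hypothetical choices for illustration.

```r
# Coefficients copied from the summary(s1) output
b0      <- 13.043469
b_price <- -0.054459
b_urban <- -0.021916
b_us    <- 1.200573

# Each dummy contributes its coefficient only when the answer is "Yes"
predict_sales <- function(price, urban, us) {
  b0 + b_price * price + b_urban * (urban == "Yes") + b_us * (us == "Yes")
}

predict_sales(120, "Yes", "Yes")  # about 7.687 (thousand units)
predict_sales(120, "Yes", "No")   # about 6.486 (thousand units)
```

The two predictions differ by exactly the USYes coefficient, 1.200573, which is what the interpretation of that coefficient describes.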

For which of the predictors can you reject the null hypothesis H0 : βj = 0?

We can reject the null hypothesis for the “Price” and “US” variables, since their p-values are far below 0.05. We cannot reject it for the predictor “Urban”, whose p-value of 0.936 is not statistically significant.

On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

s2 = lm(Sales ~ Price+US, data= Carseats)
summary(s2)

## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

How well do the models in (a) and (e) fit the data?

anova(s1,s2)

## Analysis of Variance Table
## 
## Model 1: Sales ~ Price + Urban + US
## Model 2: Sales ~ Price + US
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    396 2420.8                           
## 2    397 2420.9 -1  -0.03979 0.0065 0.9357

The smaller model has a slightly lower residual standard error (2.469 versus 2.472) and a slightly higher adjusted R-squared (0.2354 versus 0.2335). The anova test shows that the difference between the two models is not statistically significant (p = 0.9357), so we do not reject the null hypothesis that the Urban coefficient is zero. Based on the R-squared values, both models explain only about 24% of the variation in the response, and removing the non-significant predictor changed very little.

Using the model from (e), obtain 95 % confidence intervals for the coefficient(s).

confint(s2)

##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## Price       -0.06475984 -0.04419543
## USYes        0.69151957  1.70776632

Is there evidence of outliers or high leverage observations in the model from (e)?

par(mfrow=c(2,2))
plot(s2)

Based on the Residuals vs Leverage and Normal Q-Q plots, there is no evidence of outliers or high leverage observations.

Question 12

This problem involves simple linear regression without an intercept.

Recall that the coefficient estimate βˆ for the linear regression of Y onto X without an intercept is given by (3.38). Under what circumstance is the coefficient estimate for the regression of X onto Y the same as the coefficient estimate for the regression of Y onto X?

From the equation, the two coefficient estimates will be equal if the sum of xi^2 equals the sum of yi^2.
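The condition can be read off by writing the two estimates side by side. The short derivation below assumes that (3.38) is the usual no-intercept least squares estimate, the sum of xi yi divided by the sum of xi^2; the subscripts on the estimates are notation introduced here for illustration.

```latex
\hat{\beta}_{Y \text{ onto } X} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},
\qquad
\hat{\beta}_{X \text{ onto } Y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2}.

\text{The numerators are identical, so }
\hat{\beta}_{Y \text{ onto } X} = \hat{\beta}_{X \text{ onto } Y}
\iff
\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i^2
\quad \text{or} \quad
\sum_{i=1}^{n} x_i y_i = 0 .
```

The second case, where the numerator is zero, makes both estimates equal to zero regardless of the denominators.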

Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is different from the coefficient estimate for the regression of Y onto X.

x=rnorm(100)
y=rbinom(100,2,0.3)
model12<-lm(y~x+0)
summary(model12)

## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.17896  0.02441  0.80486  1.01173  2.12560 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)
## x  0.08405    0.07952   1.057    0.293
## 
## Residual standard error: 0.848 on 99 degrees of freedom
## Multiple R-squared:  0.01116,    Adjusted R-squared:  0.00117 
## F-statistic: 1.117 on 1 and 99 DF,  p-value: 0.2931

model12a<-lm(x~y+0)
summary(model12a)

## 
## Call:
## lm(formula = x ~ y + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4292 -0.8298 -0.1029  0.5355  2.6864 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)
## y   0.1328     0.1256   1.057    0.293
## 
## Residual standard error: 1.066 on 99 degrees of freedom
## Multiple R-squared:  0.01116,    Adjusted R-squared:  0.00117 
## F-statistic: 1.117 on 1 and 99 DF,  p-value: 0.2931

As we can see, the coefficient estimates are different in the two cases (0.08405 for Y onto X versus 0.1328 for X onto Y).

Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is the same as the coefficient estimate for the regression of Y onto X.

x=1:100
y=100:1
model12b<-lm(y~x+0)
summary(model12b)

## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -49.75 -12.44  24.87  62.18  99.49 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   0.5075     0.0866    5.86 6.09e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.37 on 99 degrees of freedom
## Multiple R-squared:  0.2575, Adjusted R-squared:   0.25 
## F-statistic: 34.34 on 1 and 99 DF,  p-value: 6.094e-08

model12c<-lm(x~y+0)
summary(model12c)

## 
## Call:
## lm(formula = x ~ y + 0)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -49.75 -12.44  24.87  62.18  99.49 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y   0.5075     0.0866    5.86 6.09e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.37 on 99 degrees of freedom
## Multiple R-squared:  0.2575, Adjusted R-squared:   0.25 
## F-statistic: 34.34 on 1 and 99 DF,  p-value: 6.094e-08

As shown above, the coefficient estimates are the same in both cases (0.5075), because y is a reordering of x and so the sum of xi^2 equals the sum of yi^2.
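The equality can also be checked without lm(), directly from the sums. The sketch below reuses x and y from this part; the second example (x2, y2, and the seed) is a hypothetical variation showing that any reordering of x should work, since a permutation leaves the sum of squares unchanged.

```r
x <- 1:100
y <- 100:1

sum(x^2) == sum(y^2)    # TRUE: same values in a different order
sum(x * y) / sum(x^2)   # no-intercept slope for y onto x
sum(x * y) / sum(y^2)   # no-intercept slope for x onto y

# Hypothetical second example: a random permutation of random data
set.seed(1)
x2 <- rnorm(100)
y2 <- sample(x2)
all.equal(coef(lm(y2 ~ x2 + 0))[[1]], coef(lm(x2 ~ y2 + 0))[[1]])
```

The comparison uses all.equal() rather than == because the two sums of squares for the random example are accumulated in a different order and may differ by floating point rounding.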

Kelley Williams

2023-04-24
