library(mosaic)
library(Stat2Data)
library(readr)
library(car)
library(corrplot)
UsedCars <- read_csv("Downloads/UsedCars.csv")
## Parsed with column specification:
## cols(
##   Id = col_double(),
##   Price = col_double(),
##   Year = col_double(),
##   Mileage = col_double(),
##   City = col_character(),
##   State = col_character(),
##   Vin = col_character(),
##   Make = col_character(),
##   Model = col_character()
## )
head(UsedCars)
Cars = as.data.frame(table(UsedCars$Model))
head(Cars)
names(Cars)[1] = "Model"
names(Cars)[2] = "Count"
head(Cars)
Cars2 = subset(Cars, Count >= 2500)
Cars2
set.seed(1938575)
MyCars = sample_n(subset(UsedCars, Model  == "Civic"), 200)
range(MyCars$Year)
## [1] 2005 2017
MyCars$Age = 2017 - MyCars$Year

MODEL #1: Use Age as a predictor for Price

1

Calculate the least squares regression line that best fits your data. Interpret (in context) what the slope estimate tells you about prices and ages of your used car model. Explain why the sign (positive/negative) makes sense.

lm(Price ~ Age, data = MyCars)
## 
## Call:
## lm(formula = Price ~ Age, data = MyCars)
## 
## Coefficients:
## (Intercept)          Age  
##       18442        -1360

The equation for the regression line is y = -1360x + 18442, where y is the predicted price in dollars and x is the car's age in years. For each additional year of age, the predicted price of a Civic decreases by about $1,360 on average. The negative sign of the slope makes sense because a car's value depreciates as it gets older.
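The fitted line can be turned into a small prediction function. The sketch below hard-codes the coefficients as printed by summary(mod1) later in this report (18442.35 and -1359.80); the name predicted_price is introduced here only for illustration and is not part of the original analysis.

```r
# Coefficients copied from the summary(mod1) output (values as printed).
intercept <- 18442.35
slope     <- -1359.80

# Hypothetical helper: predicted price (dollars) for a Civic of a given age (years).
predicted_price <- function(age) intercept + slope * age

predicted_price(c(0, 1, 3))     # predictions at ages 0, 1, and 3
diff(predicted_price(c(0, 1)))  # the change per extra year equals the slope
```

With mod1 in the session, predict(mod1, data.frame(Age = 3)) should give the same kind of prediction without relying on rounded coefficients.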

2

Produce a scatterplot of the relationship with the regression line drawn on it.

mod1 = lm(Price ~ Age, data = MyCars)
plot(Price ~ Age, data = MyCars)
abline(mod1)

The plot above shows the scatterplot of price against age with the least squares regression line drawn on it.

3

Produce appropriate residual plots and comment on how well your data appear to fit the conditions for a simple linear model. Don’t worry about doing transformations at this point if there are problems with the conditions.

plot(mod1$residuals ~ mod1$fitted.values)
abline(a = 0, b = 0)

The residuals-versus-fitted plot suggests that the data do not fully meet the conditions for a simple linear model. The residuals do not form a shapeless band: there is a slightly curved pattern, there is one obvious outlier, and the points are clustered together rather than scattered symmetrically around zero. Because of the curvature, the model does not satisfy the second condition of zero mean.

hist(mod1$residuals)

The histogram shows that the residuals are skewed to the right, so the distribution of the errors is not centered symmetrically at zero. There appear to be outliers in the right tail that produce the skew. This plot does not satisfy the fifth condition of normality because the values do not follow a normal distribution.

qqnorm(mod1$residuals)
qqline(mod1$residuals)

In the normal Q-Q plot, most of the points fall very close to the line, so the bulk of the residuals are consistent with a normal distribution. (A Q-Q plot checks normality only; constant variance, or homoscedasticity, is judged from the residuals-versus-fitted plot.) However, there is a very slight curvature at two or three of the points, which indicates the residuals may be skewed, and there is an obvious outlier at the rightmost part of the graph. This agrees with the histogram: the residuals are not completely normally distributed.

4

Find the car in your sample with the largest residual (in magnitude - positive or negative). For that car, find its standardized and studentized residual. Based on these residuals, would this value be considered influential?

mod1 = lm(Price ~ Age, data = MyCars)
mod1$residuals
##            1            2            3            4            5            6 
##  3257.251557   572.051841  2601.650988  3406.451273  -864.948159  -913.948159 
##            7            8            9           10           11           12 
## -1451.347591   957.650988 -2227.748443  1628.051841 -3472.748443 -2121.948159 
##           13           14           15           16           17           18 
##  -762.948159  -372.948159   120.853546   537.051841 -3923.748443 -1922.748443 
##           19           20           21           22           23           24 
## -3928.747023   927.051841  1618.051841 -1778.748443   -83.548727    50.650988 
##           25           26           27           28           29           30 
##  1215.650988 -2490.748443 -1672.948159  -653.347591  1244.652409  -840.948159 
##           31           32           33           34           35           36 
##  1087.852125 -3755.347591  -128.147875  1150.653830  1605.451273  2636.051841 
##           37           38           39           40           41           42 
## -4374.948159 -2368.747023 -1976.948159 -2582.548727  1026.053262 -2594.548727 
##           43           44           45           46           47           48 
##  2276.251557 -2144.347591  3525.051841 -1731.748443  -811.948159  1510.650988 
##           49           50           51           52           53           54 
##   261.251557  -377.948159 -1222.748443   207.051841    73.252977  -655.347591 
##           55           56           57           58           59           60 
## -1567.948159 -1755.347591  2265.251557  -524.747023    14.051841 -2125.548727 
##           61           62           63           64           65           66 
## -2001.748443  -573.946738  1019.251557 -2734.748443   -83.548727  1513.650988 
##           67           68           69           70           71           72 
##   102.650988  -891.548727  3305.451273  -862.948159   -63.946738 -1033.547307 
##           73           74           75           76           77           78 
## 21457.650988   632.051841  4987.852125   267.251557 -1925.747023 -1712.948159 
##           79           80           81           82           83           84 
##  -543.349012 -1229.147875 -2404.147875 -1091.948159 -3769.147875 -1575.946738 
##           85           86           87           88           89           90 
## -1834.748443  1161.251557 -1363.948159 -1866.948159  -783.547307  -438.748443 
##           91           92           93           94           95           96 
## -1374.948159   463.051841  -465.948159 -1808.548727   684.652409 -1283.547307 
##           97           98           99          100          101          102 
## -1748.347591  3763.254398  -924.747023   854.251557  -367.948159  2537.051841 
##          103          104          105          106          107          108 
##  1136.051841 -2004.147875  2043.653830  3623.051841   269.251557  2165.652409 
##          109          110          111          112          113          114 
## -5648.347591 -2004.147875  -983.948159 -1735.748443  5682.451273  1593.852125 
##          115          116          117          118          119          120 
## -2372.948159  1576.252977 -2362.948159 -1653.347591 -1772.748443  2458.454114 
##          121          122          123          124          125          126 
##  -769.347591   902.451273   272.251557  -727.748443 -2781.748443  -362.948159 
##          127          128          129          130          131          132 
##  1425.053262  -732.748443  2657.251557  4144.653830  -675.946738 -1867.948159 
##          133          134          135          136          137          138 
##   272.251557  -564.946738 -1327.548727 -1507.748443  1137.051841  3387.451273 
##          139          140          141          142          143          144 
##  2326.451273 -1422.748443  -365.948159  2917.451273   764.051841  3277.251557 
##          145          146          147          148          149          150 
## -1309.147875   905.451273  4370.254398 -1845.948159  2106.653830 -2362.948159 
##          151          152          153          154          155          156 
##  -533.948159  3244.652409   636.051841 -3222.748443  3777.251557  2416.451273 
##          157          158          159          160          161          162 
## -2332.548727  2306.652409   995.853546 -4648.347591    -3.147875   137.051841 
##          163          164          165          166          167          168 
##  3936.053262   806.451273   343.652409 -3723.748443  4357.650988  3145.653830 
##          169          170          171          172          173          174 
## -1960.948159  1786.653830  1635.051841 -1389.547307 -1012.948159  1254.251557 
##          175          176          177          178          179          180 
## -1422.948159 -1293.547307   856.652409 -3323.147875  2917.451273  -477.948159 
##          181          182          183          184          185          186 
## -3483.547307  2351.652409 -1783.547307  -454.349012  2277.251557 -3777.748443 
##          187          188          189          190          191          192 
##   155.653830  3799.650988  2275.251557 -1004.147875 -2821.748443 -3096.548727 
##          193          194          195          196          197          198 
##  2637.051841   227.251557    74.051841  -616.548727    -2.146454  1137.051841 
##          199          200 
## -2067.347591  1132.051841

The residual with the largest value (in terms of magnitude) is number 73 out of 200 - ID 383124 - a 2017 Honda Civic. It has a residual with the value of 21457.650988.
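Rather than scanning 200 printed residuals, the largest one can be located programmatically. The sketch below uses a small made-up data frame (toy, with one deliberately inflated price) as a stand-in for MyCars, so the numbers it produces are not from the real sample.

```r
# Toy data standing in for MyCars (hypothetical values; row 5 is inflated on purpose).
toy <- data.frame(
  Age   = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
  Price = c(18500, 17000, 15800, 14300, 30000, 11700, 10200, 9000, 7500, 6100)
)
fit <- lm(Price ~ Age, data = toy)

# Index of the observation with the largest residual in magnitude.
worst <- which.max(abs(resid(fit)))

# One-row summary: the data plus the three kinds of residual and the leverage.
cbind(
  toy[worst, ],
  residual     = resid(fit)[worst],
  standardized = rstandard(fit)[worst],
  studentized  = rstudent(fit)[worst],
  leverage     = hatvalues(fit)[worst]
)
```

Replacing toy and fit with MyCars and mod1 would apply the same steps to the real sample.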

rstandard(mod1)
##             1             2             3             4             5 
##  1.2598703689  0.2210853413  1.0102344048  1.3196493555 -0.3342832683 
##             6             7             8             9            10 
## -0.3532206809 -0.5612889908  0.3718607840 -0.8616694794  0.6292059063 
##            11            12            13            14            15 
## -1.3432222799 -0.8200858726 -0.2948625319 -0.1441361869  0.0472355050 
##            16            17            18            19            20 
##  0.2075586181 -1.5176642984 -0.7436987130 -1.5250614669  0.3582849631 
##            21            22            23            24            25 
##  0.6253411282 -0.6880010396 -0.0323665350  0.0196680383  0.4720435055 
##            26            27            28            29            30 
## -0.9633950910 -0.6465573369 -0.2526733170  0.4813524335 -0.3250078009 
##            31            32            33            34            35 
##  0.4204115422 -1.4523297333 -0.0495240525  0.4518221189  0.6219471725 
##            36            37            38            39            40 
##  1.0187755364 -1.6908203733 -0.9195004892 -0.7640465904 -1.0004718961 
##            41            42            43            44            45 
##  0.3995016141 -1.0051206613  0.8804299694 -0.8292973391  1.3623543074 
##            46            47            48            49            50 
## -0.6698219379 -0.3137999445  0.5865935166  0.1010493324 -0.1460685760 
##            51            52            53            54            55 
## -0.4729461343  0.0800209415  0.0284353491 -0.2534467899 -0.6059771671 
##            56            57            58            59            60 
## -0.6788568666  0.8761752860 -0.2036963590  0.0054307247 -0.8234314199 
##            61            62            63            64            65 
## -0.7742550754 -0.2234705127  0.3942356961 -1.0577717042 -0.0323665350 
##            66            67            68            69            70 
##  0.5877584319  0.0398599046 -0.3453833944  1.2805222481 -0.3335103127 
##            71            72            73            74            75 
## -0.0248981473 -0.4003015788  8.3321157877  0.2442740098  1.9276062949 
##            76            77            78            79            80 
##  0.1033700688 -0.7475366987 -0.6620164492 -0.2109852043 -0.4750167250 
##            81            82            83            84            85 
## -0.9291074518 -0.4220137306 -1.4566256152 -0.6136068070 -0.7096612459 
##            86            87            88            89            90 
##  0.4491597906 -0.5271356943 -0.7215340316 -0.3034744727 -0.1697032463 
##            91            92            93            94            95 
## -0.5313869502  0.1789592603 -0.1800786230 -0.7006265381  0.2647800308 
##            96            97            98            99           100 
## -0.4971286849 -0.6761497115  1.4952512229 -0.3589684046  0.3304154454 
##           101           102           103           104           105 
## -0.1422037979  0.9805142334  0.4390588249 -0.7745233746  0.8024724550 
##           106           107           108           109           110 
##  1.4002291326  0.1041436476  0.8375366886 -2.1844218016 -0.7745233746 
##           111           112           113           114           115 
## -0.3802741274 -0.6713690955  2.2013651626  0.6159603999 -0.9170918023 
##           116           117           118           119           120 
##  0.6118700604 -0.9132270242 -0.6394097504 -0.6856803033  0.9706472694 
##           121           122           123           124           125 
## -0.2975347433  0.3496070089  0.1053040157 -0.2814853823 -1.0759508059 
##           126           127           128           129           130 
## -0.1402714088  0.5548552882 -0.2834193293  1.0277967301  1.6274627755 
##           131           132           133           134           135 
## -0.2631849857 -0.7219205094  0.1053040157 -0.2199662945 -0.5142885315 
##           136           137           138           139           140 
## -0.5831811128  0.4394453027  1.3122888105  0.9012604839 -0.5503040139 
##           141           142           143           144           145 
## -0.1414308423  1.1302121720  0.2952890804  1.2676061569 -0.5059335404 
##           146           147           148           149           150 
##  0.3507692002  1.7364301059 -0.7134179976  0.8272103847 -0.9132270242 
##           151           152           153           154           155 
## -0.2063591140  1.2548253002  0.2458199210 -1.2465249304  1.4610008559 
##           156           157           158           159           160 
##  0.9361262232 -0.9036226203  0.8920665257  0.3892285067 -1.7976853682 
##           161           162           163           164           165 
## -0.0012165284  0.0529674950  1.5325321696  0.3124168869  0.1329029070 
##           166           167           168           169           170 
## -1.4403064188  1.6920981993  1.2351898911 -0.7578629455  0.7015574087 
##           171           172           173           174           175 
##  0.6319112509 -0.5381833778 -0.3914819838  0.4851312046 -0.5499378850 
##           176           177           178           179           180 
## -0.5010017691  0.3312986973 -1.2842643691  1.1302121720 -0.1847163567 
##           181           182           183           184           185 
## -1.3492072183  0.9094696652 -0.6907828970 -0.1764260484  0.8808167588 
##           186           187           188           189           190 
## -1.4611930463  0.0611198967  1.4754239411  0.8800431800 -0.3880631816 
##           191           192           193           194           195 
## -1.0914223818 -1.1995940072  1.0191620142  0.0878984928  0.0286193931 
##           196           197           198           199           200 
## -0.2388491910 -0.0008389398  0.4394453027 -0.7995186338  0.4375129136

The standardized residual value, for this car, is 8.3321157877.

rstudent(mod1)
##             1             2             3             4             5 
##  1.2617524670  0.2205535632  1.0102871605  1.3221398557 -0.3335321812 
##             6             7             8             9            10 
## -0.3524386382 -0.5603157475  0.3710501437 -0.8611068278  0.6282433903 
##            11            12            13            14            15 
## -1.3459725343 -0.8194051357 -0.2941815833 -0.1437792892  0.0471163379 
##            16            17            18            19            20 
##  0.2070563434 -1.5227095983 -0.7428565718 -1.5302193575  0.3574949628 
##            21            22            23            24            25 
##  0.6243768643 -0.6870832376 -0.0322847833  0.0196183278  0.4711151338 
##            26            27            28            29            30 
## -0.9632194053 -0.6456044404 -0.2520750887  0.4804165331 -0.3242725441 
##            31            32            33            34            35 
##  0.4195358454 -1.4564359495 -0.0493991394  0.4509122209  0.6209814906 
##            36            37            38            39            40 
##  1.0188735588 -1.6988544863 -0.9191400916 -0.7632407067 -1.0004742932 
##            41            42            43            44            45 
##  0.3986521975 -1.0051468556  0.8799279662 -0.8286408573  1.3653238562 
##            46            47            48            49            50 
## -0.6688865945 -0.3130843794  0.5856194219  0.1007964334 -0.1457070997 
##            51            52            53            54            55 
## -0.4720170052  0.0798199034  0.0283635097 -0.2528469803 -0.6050062636 
##            56            57            58            59            60 
## -0.6779298148  0.8756591174 -0.2032026155  0.0054169938 -0.8227593690 
##            61            62            63            64            65 
## -0.7734691881 -0.2229335945  0.3933933202 -1.0580910084 -0.0322847833 
##            66            67            68            69            70 
##  0.5867844335  0.0397592804 -0.3446139361  1.2826065079 -0.3327605283 
##            71            72            73            74            75 
## -0.0248352324 -0.3994511072 10.3135627072  0.2436930982  1.9410313845 
##            76            77            78            79            80 
##  0.1031114851 -0.7467010338 -0.6610746119 -0.2104754000 -0.4740858807 
##            81            82            83            84            85 
## -0.9287851218 -0.4211361333 -1.4607905725 -0.6126381016 -0.7087688652 
##            86            87            88            89            90 
##  0.4482525360 -0.5261722017 -0.7206577240 -0.3027775774 -0.1692864721 
##            91            92            93            94            95 
## -0.5304217233  0.1785212094 -0.1796380145 -0.6997229473  0.2641573190 
##            96            97            98            99           100 
## -0.4961814755 -0.6752200874  1.4999632406 -0.3581773411  0.3296709072 
##           101           102           103           104           105 
## -0.1418514867  0.9804182073  0.4381620361 -0.7737380293  0.8017482825 
##           106           107           108           109           110 
##  1.4036556755  0.1038831709  0.8369028106 -2.2056380606 -0.7737380293 
##           111           112           113           114           115 
## -0.3794512142 -0.6704351127  2.2231735094  0.6149924816 -0.9167220641 
##           116           117           118           119           120 
##  0.6109008072 -0.9128424686 -0.6384525377 -0.6847600713  0.9705047982 
##           121           122           123           124           125 
## -0.2968488107  0.3488307283  0.1050407013 -0.2808298566 -1.0763816364 
##           126           127           128           129           130 
## -0.1399236923  0.5538831407 -0.2827600802  1.0279437995  1.6343156970 
##           131           132           133           134           135 
## -0.2625654666 -0.7210447508  0.1050407013 -0.2194369344 -0.5133311531 
##           136           137           138           139           140 
## -0.5822068058  0.4385481009  1.3147005347  0.9008313662 -0.5493328523 
##           141           142           143           144           145 
## -0.1410803680  1.1310086928  0.2946073341  1.2695630026 -0.5049808312 
##           146           147           148           149           150 
##  0.3499910589  1.7453801387 -0.7125305369  0.8265483141 -0.9128424686 
##           151           152           153           154           155 
## -0.2058594839  1.2566592768  0.2452358024 -1.2482808059  1.4652260683 
##           156           157           158           159           160 
##  0.9358325326 -0.9032021419  0.8916045107  0.3883929793 -1.8079550531 
##           161           162           163           164           165 
## -0.0012134524  0.0528339437  1.5378050886  0.3117037936  0.1325727822 
##           166           167           168           169           170 
## -1.4442504384  1.7001572152  1.2368412476 -0.7570455398  0.7006549325 
##           171           172           173           174           175 
##  0.6309500434 -0.5372156814 -0.3906433559  0.4841924278 -0.5489668102 
##           176           177           178           179           180 
## -0.5000520679  0.3305526570 -1.2863861584  1.1310086928 -0.1842651882 
##           181           182           183           184           185 
## -1.3520252506  0.9090709037 -0.6898680819 -0.1759937980  0.8803160554 
##           186           187           188           189           190 
## -1.4654209155  0.0609659333  1.4798508689  0.8795398790 -0.3872292692 
##           191           192           193           194           195 
## -1.0919524200 -1.2009329275  1.0192621117  0.0876779566  0.0285470895 
##           196           197           198           199           200 
## -0.2382796028 -0.0008368186  0.4385481009 -0.7987875507  0.4366177868

The studentized residual for this car is 10.3135627072. Its residual, standardized residual, and studentized residual are all far larger than those of every other car, and both scaled residuals are well beyond the usual cutoff of ±3, so this car is an extreme outlier. That makes it a strong candidate for being influential, although influence also depends on the car's leverage.
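As a rough check, both scaled residuals can be rebuilt from quantities reported elsewhere in this report: the raw residual, the leverage from hatvalues(mod1)[73] in the next question, and the residual standard error from summary(mod1). This is a sketch using the rounded printed values, so it should agree with rstandard and rstudent only up to rounding.

```r
# Values copied from the printed output in this report.
e     <- 21457.650988  # raw residual for observation 73
h     <- 0.01457741    # leverage of observation 73
sigma <- 2594          # residual standard error (rounded in the summary)
n     <- 200           # sample size
p     <- 2             # number of coefficients (intercept and slope)

# Standardized (internally studentized) residual.
r_std <- e / (sigma * sqrt(1 - h))

# Studentized (externally studentized) residual, obtained from r_std.
r_stu <- r_std * sqrt((n - p - 1) / (n - p - r_std^2))

c(standardized = r_std, studentized = r_stu)
```

The studentized value is larger because it rescales by the error estimate that would result from leaving this car out of the fit.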

5

Determine the leverage for the car with the largest absolute residual. What does this say about the potential for this car to be influential on your model?

hatvalues(mod1)[73]
##         73 
## 0.01457741
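2*(2/200)
## [1] 0.02
3*(2/200)
## [1] 0.03

The average leverage for a model with one predictor is 2/n = 2/200 = 0.01, where n is the sample size (not the observation number). The leverage of this car (0.0146) is less than both two times (0.02) and three times (0.03) the average leverage, so the car does not have unusually high leverage and does not need to be flagged on that basis. Any influence it has on the fit comes from its very large residual.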
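The usual rule of thumb compares a leverage with the average leverage (k + 1)/n, where k is the number of predictors and n is the sample size (200 here). A minimal sketch of that comparison, using the leverage value printed above:

```r
n <- 200          # cars in the sample
p <- 2            # coefficients in mod1: intercept and slope
h <- 0.01457741   # leverage of observation 73, from hatvalues(mod1)[73]

avg_leverage <- p / n   # leverages always sum to p, so this is their mean

c(average = avg_leverage, twice = 2 * avg_leverage, thrice = 3 * avg_leverage)
h > 2 * avg_leverage    # is the leverage unusually high?
```

With the model object available, mean(hatvalues(mod1)) would give the same average directly.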

6

Compute and interpret in context a 90% confidence interval for the slope of your model.

confint(mod1, level = 0.9)
##                   5 %      95 %
## (Intercept) 17924.718 18959.980
## Age         -1477.657 -1241.944

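We are 90% confident that the true slope is between -1477.657 and -1241.944. In context, each additional year of age is associated with a decrease in the mean price of a Civic of between about $1,242 and $1,478. The 90% describes the method: about 90% of intervals constructed this way would contain the true slope.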
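The interval is the slope estimate plus or minus a t critical value times the standard error. The sketch below rebuilds it from the estimate and standard error printed by summary(mod1), so small differences from confint are due to rounding.

```r
estimate <- -1359.80   # slope estimate from summary(mod1)
se       <- 71.32      # its standard error
df       <- 198        # residual degrees of freedom (n - 2)
level    <- 0.90

t_star <- qt(1 - (1 - level) / 2, df)   # critical value for a 90% interval
ci     <- estimate + c(-1, 1) * t_star * se
ci
```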

7

Test the strength of the linear relationship between your variables using each of the three methods (test for correlation, test for slope, ANOVA for regression). Include hypotheses for each test and your conclusions in the context of the problem.

cor(MyCars$Price, MyCars$Age)
## [1] -0.8046167

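H0: ρ = 0; Ha: ρ ≠ 0, where ρ is the population correlation between Price and Age. The sample correlation is r = -0.805, which means there is a strong negative linear relationship between age and price: as the age of a car goes up, the price of the car tends to go down. Note that cor() reports only r; cor.test() supplies the test statistic and p-value.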
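The test statistic for the correlation can be computed directly from r and n. This sketch assumes the standard t test for a Pearson correlation with n - 2 degrees of freedom, using the values reported above.

```r
r <- -0.8046167   # sample correlation from cor(MyCars$Price, MyCars$Age)
n <- 200          # sample size

t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)   # test statistic, df = n - 2
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)  # two-sided

c(t = t_stat, p = p_value)
```

In a simple linear regression this statistic coincides with the t value for the slope, which is why the two tests lead to the same conclusion.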

summary(mod1)
## 
## Call:
## lm(formula = Price ~ Age, data = MyCars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5648.3 -1717.6  -370.4  1222.9 21457.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 18442.35     313.22   58.88   <2e-16 ***
## Age         -1359.80      71.32  -19.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2594 on 198 degrees of freedom
## Multiple R-squared:  0.6474, Adjusted R-squared:  0.6456 
## F-statistic: 363.6 on 1 and 198 DF,  p-value: < 2.2e-16

H0: βAge = 0; Ha: βAge ≠ 0. The t-statistic for the slope is -19.07 and the p-value is below 2e-16, which is strong evidence to reject the null hypothesis. We can conclude that there is a linear relationship between age and price. Additionally, the R-squared value of 0.647 means that about 65% of the total variability in price is explained by my model.

anova(mod1)

H0: βAge = 0; Ha: βAge ≠ 0. The test shows an F-value of 363.56 and a p-value of approximately 0, so we can reject the null hypothesis and conclude that there is a linear relationship between age and price.
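With a single predictor, the three tests are tied together: R-squared is the square of the correlation, and the ANOVA F statistic is the square of the slope's t statistic. The sketch below recovers both from the reported correlation alone.

```r
r <- -0.8046167   # sample correlation between Price and Age
n <- 200          # sample size

r_squared <- r^2                                    # compare with Multiple R-squared
f_stat    <- r_squared * (n - 2) / (1 - r_squared)  # equals t^2 with one predictor
p_value   <- pf(f_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)

c(R2 = r_squared, F = f_stat, p = p_value)
```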

8

Suppose that you are interested in purchasing a car of this model that is three years old (in 2017). Determine each of the following: 90% confidence interval for the mean price at this age and 90% prediction interval for the price of an individual car at this age. Write sentences that carefully interpret each of the intervals (in terms of car prices).

newdata=data.frame(Age = 3)
predict.lm(mod1, newdata, interval = "confidence", level = 0.9)
##        fit      lwr     upr
## 1 14362.95 14052.69 14673.2
predict.lm(mod1, newdata, interval = "prediction", level = 0.9)
##        fit      lwr      upr
## 1 14362.95 10064.48 18661.42

The predicted price for a car of this model that is three years old is 14,362.95 dollars. We are 90% confident that the mean price of all three-year-old cars of this model falls between 14,052.69 dollars and 14,673.20 dollars. We are 90% confident that the price of an individual three-year-old car of this model falls between 10,064.48 dollars and 18,661.42 dollars.
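The prediction interval is much wider because it adds the car-to-car variability (the residual standard error) to the uncertainty in the estimated mean. The sketch below rebuilds the prediction interval from the confidence interval and the rounded residual standard error, so it matches the predict output only approximately.

```r
fit    <- 14362.95   # predicted price at Age = 3
ci_upr <- 14673.20   # upper limit of the 90% confidence interval
sigma  <- 2594       # residual standard error (rounded in the summary)
df     <- 198        # residual degrees of freedom

t_star <- qt(0.95, df)                  # critical value for a 90% interval
se_fit <- (ci_upr - fit) / t_star       # standard error of the estimated mean
se_new <- sqrt(se_fit^2 + sigma^2)      # standard error for one new car

pred_int <- fit + c(-1, 1) * t_star * se_new  # reconstructed prediction interval
pred_int
```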

9

According to your model, is there an age at which the car should be free? If so, find this age and comment on what the “free car” phenomenon says about the appropriateness of your model.

18442/1360
## [1] 13.56029

I set the price equal to 0 and solved for age (0 = -1360x + 18442). According to my model, the car should be free at approximately 13.6 years. This is older than the oldest car in my sample (12 years), so the answer is an extrapolation. The “free car” phenomenon says that my model is only useful for a limited range of ages: real cars do not lose a constant dollar amount every year, so a straight line eventually predicts a price of zero or less and does not accurately follow the rate of depreciation.
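The same calculation can be written in terms of the coefficients and compared with the range of ages actually observed. This sketch uses the coefficients printed by summary(mod1) and the year range reported at the top of the report.

```r
intercept <- 18442.35    # from summary(mod1)
slope     <- -1359.80
max_age   <- 2017 - 2005 # oldest car in the sample, from range(MyCars$Year)

age_free <- -intercept / slope   # age at which the line crosses Price = 0
c(age_free = age_free, oldest_observed = max_age)
age_free > max_age               # TRUE means the answer is an extrapolation
```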

10

Experiment with some transformations to attempt to find one that seems to do a better job of satisfying the linearity condition. Include the summary output for fitting that model and a scatterplot of the transformed variable(s) with the least squares line. Explain why you think that this transformation does or does not improve satisfying the linear model conditions.

mod3 = lm(log(Price) ~ Age, data = MyCars)
plot(log(Price) ~ Age, data = MyCars)
abline(mod3)

summary(mod3)
## 
## Call:
## lm(formula = log(Price) ~ Age, data = MyCars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59982 -0.09889 -0.01009  0.10830  0.72006 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.874073   0.019935  495.31   <2e-16 ***
## Age         -0.115115   0.004539  -25.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1651 on 198 degrees of freedom
## Multiple R-squared:  0.7646, Adjusted R-squared:  0.7634 
## F-statistic: 643.2 on 1 and 198 DF,  p-value: < 2.2e-16
plot(mod3$residuals ~ mod3$fitted.values)
abline(a = 0, b = 0)

I tried several different transformations (logarithmic, exponential, square root, etc.), and taking the logarithm of the response variable gave the biggest improvement I could get. I think it satisfies the linear model conditions better in some respects, but not all. The scatterplot still shows a negative linear relationship. The R-squared value has increased to 0.765, which suggests a stronger fit, although R-squared values are not directly comparable when the response is on a different scale. Lastly, the residual plot is closer to a shapeless band than the previous one: more of the residuals are centered around 0 and they are spread more evenly. The transformation is not an extreme improvement, but it is an improvement.
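Because the response is log(Price), predictions have to be back-transformed with exp(), and the slope describes a percentage change rather than a dollar change. The sketch below uses the mod3 coefficients printed above; the helper name price_from_log_model is hypothetical.

```r
b0 <- 9.874073    # intercept of mod3, on the log-dollar scale
b1 <- -0.115115   # slope of mod3

# Hypothetical helper: back-transformed prediction in dollars.
price_from_log_model <- function(age) exp(b0 + b1 * age)

yearly_factor <- exp(b1)                    # multiplicative change per year
pct_drop      <- 100 * (1 - yearly_factor)  # percent decrease per year

price_from_log_model(c(0, 3, 12))
pct_drop
```

Under this model the predicted price shrinks by a constant percentage each year and never reaches zero, unlike the straight-line model.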

MODEL #2: Use Age and Miles as predictors for Price

1

Run the model with two predictors (age and miles) for price as the response variable and provide the output (both the summary and the anova for the model).

mod2 = lm(Price ~ Age + Mileage, data = MyCars)
plot(Price ~ Age + Mileage, data = MyCars)

summary(mod2)
## 
## Call:
## lm(formula = Price ~ Age + Mileage, data = MyCars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4208.5 -1471.6  -217.5   979.8 21253.5 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.882e+04  3.017e+02  62.394  < 2e-16 ***
## Age         -9.674e+02  9.891e+01  -9.781  < 2e-16 ***
## Mileage     -3.664e-02  6.815e-03  -5.377 2.13e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2429 on 197 degrees of freedom
## Multiple R-squared:  0.6925, Adjusted R-squared:  0.6894 
## F-statistic: 221.9 on 2 and 197 DF,  p-value: < 2.2e-16
anova(mod2)

2

Find the largest residual for a car in your sample.

mod2$residuals
##           1           2           3           4           5           6 
##  4313.66158   405.31296  2737.80635  3136.49105 -2182.37832 -1288.88434 
##           7           8           9          10          11          12 
## -1629.29669   683.09120 -2517.76887   176.25711 -3104.64997 -2433.82423 
##          13          14          15          16          17          18 
## -1004.21596   -72.36069   736.34478   466.75352 -4208.52913 -1140.16074 
##          19          20          21          22          23          24 
##  -191.76411    97.28409   961.85511 -1793.07784  -350.06465  -215.55453 
##          25          26          27          28          29          30 
##  1110.74158 -1397.84344 -2202.03534 -1165.21115   492.85997  -298.37989 
##          31          32          33          34          35          36 
##   467.10715  -900.91518  -835.58676   975.36647  1878.04250  1913.60719 
##          37          38          39          40          41          42 
## -3737.36835 -2592.67918 -2305.23964 -1674.26322   178.97761 -1061.37821 
##          43          44          45          46          47          48 
##  2140.63863 -1186.22539  4107.55940 -2305.77754 -1194.39585  1189.66636 
##          49          50          51          52          53          54 
##   -53.09886  -796.67099 -1604.33611  -306.75582    98.37345   380.01500 
##          55          56          57          58          59          60 
## -1258.16366 -2199.97391  3124.74988  -421.58005  -684.24603 -1451.65923 
##          61          62          63          64          65          66 
## -1544.94077   107.33336   136.62732 -2687.48339  -309.50245  1139.60939 
##          67          68          69          70          71          72 
##  -225.73524  -834.15333  3796.16965  -918.22344  -209.37332 -1354.56230 
##          73          74          75          76          77          78 
## 21253.51626   350.11192  4417.04958   145.08608 -1578.87709 -2207.15259 
##          79          80          81          82          83          84 
##  -812.08279  -667.03029 -1916.44928 -2424.73113 -2369.95405   566.23202 
##          85          86          87          88          89          90 
## -1317.92191  1361.45845 -1970.78872 -1942.04652 -1352.84545  -703.44956 
##          91          92          93          94          95          96 
## -1901.58036  -630.82822 -1253.35828 -1820.69900   204.70364  -794.67409 
##          97          98          99         100         101         102 
##   -36.47204  1941.77415 -1245.52280   902.65250   174.03385  2160.54008 
##         103         104         105         106         107         108 
##  1546.71053 -2356.05379  1912.70275  3479.94676   331.28315   375.91474 
##         109         110         111         112         113         114 
##  -788.04673 -2177.13309 -1622.48366 -1387.10663  5031.89525  1197.17023 
##         115         116         117         118         119         120 
## -2769.61278  -379.39554 -2595.56855  2372.55186 -2233.62843  1565.01172 
##         121         122         123         124         125         126 
##  -219.52220   390.40032   246.70985  -762.63375 -2666.36675 -1028.37856 
##         127         128         129         130         131         132 
##   524.11453 -1129.39579  3091.85445  3067.54461  -143.76111 -2275.64188 
##         133         134         135         136         137         138 
##   -65.29296   388.98314 -1469.99635 -1476.56903   810.33595  3398.05541 
##         139         140         141         142         143         144 
##  1990.13320 -1545.86661  -679.17996  3287.98537   464.52398  2851.84060 
##         145         146         147         148         149         150 
## -1994.23542   425.31511  3147.93680  -838.43524  1880.69121 -2064.92560 
##         151         152         153         154         155         156 
##  -641.18116  3576.24072  1650.23352 -3100.22164  3786.66589  2894.52832 
##         157         158         159         160         161         162 
## -2298.27415  1655.66088   215.44824 -3128.65697  -632.50362  -205.16343 
##         163         164         165         166         167         168 
##  3639.81005   884.40258  1514.66202 -3614.99887  4225.04057   965.74383 
##         169         170         171         172         173         174 
## -2452.99074  -157.72427   886.81154   480.85897 -1171.22284  1034.98596 
##         175         176         177         178         179         180 
## -1310.99140 -1192.89133   992.90461 -2983.77428  2411.44617 -1302.51281 
##         181         182         183         184         185         186 
## -1739.29789  1040.63661 -1655.88650  -713.48271  3092.01055 -4112.10515 
##         187         188         189         190         191         192 
##  2379.91175  3563.41826  2620.63227 -1700.22788 -1769.22243 -2352.85707 
##         193         194         195         196         197         198 
##  1345.30740   166.68052   114.63086  -548.05094   333.91631   734.92763 
##         199         200 
##  1034.89163   798.74047

The largest residual in my sample belongs to car number 73 out of 200 (ID 383124), a 2017 Honda Civic. Its residual is 21253.516, meaning its listed price is about $21,254 above the price the model predicts.
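Rather than scanning the printed residuals by eye, the largest one can be located programmatically. The sketch below uses a small made-up data frame (`toy`, with one deliberately overpriced row) as a stand-in for `MyCars`, since the original CSV is not bundled here; the same two lines with `which.max()` would apply to `mod2`.

```r
# Toy stand-in for MyCars: all values are invented for illustration only
set.seed(1)
toy <- data.frame(Age = rep(0:9, each = 3))
toy$Mileage <- 12000 * toy$Age + runif(nrow(toy), 0, 10000)
toy$Price <- 18000 - 1000 * toy$Age - 0.03 * toy$Mileage +
  rnorm(nrow(toy), sd = 1000)
toy$Price[5] <- toy$Price[5] + 20000   # plant one overpriced listing in row 5

# Same model form as mod2, fitted to the toy data
toy_mod <- lm(Price ~ Age + Mileage, data = toy)

# Index of the residual that is largest in absolute value
res <- residuals(toy_mod)
worst <- which.max(abs(res))

toy[worst, ]   # the row behind the largest residual
res[worst]     # the residual itself
```

Using `abs()` matters when the most extreme residual might be negative; in my sample the largest residual happens to be positive.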

3

Assess the importance of each of the predictors in the model - be sure to indicate the specific value(s) from the output you are using to make the assessments. Include hypotheses and conclusions in context.

cor.test(MyCars$Price, MyCars$Age)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = -19.067, df = 198, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8486231 -0.7495505
## sample estimates:
##        cor 
## -0.8046167
cor.test(MyCars$Price, MyCars$Mileage)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = -15.345, df = 198, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7945271 -0.6664383
## sample estimates:
##        cor 
## -0.7370317
cor.test(MyCars$Age, MyCars$Mileage)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = 15.38, df = 198, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6673629 0.7951403
## sample estimates:
##       cor 
## 0.7377914

Both age and mileage are important predictors of price. I used a t-test for correlation to check whether there is a linear relationship between age and price and between mileage and price. The null hypothesis states that ρ = 0, meaning there is no true correlation between the variables being tested. The alternative hypothesis states that ρ is NOT equal to 0, meaning there is a true correlation between the variables being tested. For age and price, r = -0.805 with t = -19.067 on 198 degrees of freedom and a p-value less than 2.2e-16. Because this p-value is far below 0.05, we reject the null hypothesis and conclude that age and price are correlated. For mileage and price, r = -0.737 with t = -15.345 on 198 degrees of freedom and a p-value that is also less than 2.2e-16, so we again reject the null hypothesis and conclude that mileage and price are correlated. Age and mileage are also correlated with each other (r = 0.738, p-value less than 2.2e-16). The coefficient t-tests in the summary(mod2) output under question 4 (Age: t = -9.781, p-value < 2e-16; Mileage: t = -5.377, p-value = 2.13e-07) test each predictor after accounting for the other and lead to the same conclusion.
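As a check on the correlation tests, the t statistics can be reproduced by hand from the sample correlations using t = r * sqrt(n - 2) / sqrt(1 - r^2). This is only a sketch: the numbers are copied from the `cor.test()` output above, and `cor_t` is a helper name introduced here for illustration, not a function from any package.

```r
# Sample size and correlations copied from the cor.test() output above
n <- 200
r_age <- -0.8046167       # Price vs. Age
r_mileage <- -0.7370317   # Price vs. Mileage

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
# (cor_t is a hypothetical helper defined only for this example)
cor_t <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

t_age <- cor_t(r_age, n)
t_mileage <- cor_t(r_mileage, n)

# Two-sided p-values from the t distribution
p_age <- 2 * pt(-abs(t_age), df = n - 2)
p_mileage <- 2 * pt(-abs(t_mileage), df = n - 2)

round(c(t_age = t_age, t_mileage = t_mileage), 3)
c(p_age = p_age, p_mileage = p_mileage)
```

The rounded t values should agree with the -19.067 and -15.345 reported by `cor.test()`, and both p-values fall below the 2.2e-16 floor that R prints.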

4

Assess the overall effectiveness of this model (with a formal test). Again, be sure to include hypotheses and the specific value(s) you are using from the output to reach a conclusion.

summary(mod2)
## 
## Call:
## lm(formula = Price ~ Age + Mileage, data = MyCars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4208.5 -1471.6  -217.5   979.8 21253.5 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.882e+04  3.017e+02  62.394  < 2e-16 ***
## Age         -9.674e+02  9.891e+01  -9.781  < 2e-16 ***
## Mileage     -3.664e-02  6.815e-03  -5.377 2.13e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2429 on 197 degrees of freedom
## Multiple R-squared:  0.6925, Adjusted R-squared:  0.6894 
## F-statistic: 221.9 on 2 and 197 DF,  p-value: < 2.2e-16

I performed an overall F-test and used the F-statistic of 221.9 on 2 and 197 degrees of freedom, its p-value of less than 2.2e-16, and the r-squared value of 0.693 to assess the effectiveness of my model. H0: β_Age = β_Mileage = 0. Ha: at least one of β_Age and β_Mileage is not equal to 0. Because the p-value is approximately 0, we reject the null hypothesis and conclude that at least one of the predictors has a linear relationship with price, so the model is effective. The r-squared shows that about 69.3% of the variability in the response variable (Price) is explained by the linear combination of age and mileage. The multiple r value of 0.83 (square root of r-squared) is the correlation between the actual and fitted prices, and it indicates a strong linear relationship between price and the predictors age and mileage.
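The overall F-statistic can be recovered from r-squared alone, which shows how the two quantities are linked: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). The sketch below plugs in the rounded values printed by `summary(mod2)`, so the result matches the reported 221.9 only up to rounding.

```r
# Values copied (rounded) from the summary(mod2) output above
r_squared <- 0.6925
n <- 200   # cars in the sample
k <- 2     # predictors: Age and Mileage

# Overall F statistic for H0: all slope coefficients are 0
f_stat <- (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# Upper-tail p-value on k and n - k - 1 degrees of freedom
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

# Multiple correlation between actual and fitted prices
multiple_r <- sqrt(r_squared)

round(c(F = f_stat, multiple_r = multiple_r), 3)
p_value
```

In practice `summary(mod2)$fstatistic` holds the unrounded statistic and its degrees of freedom, so the hand calculation is mainly a sanity check.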

5 - Compute and interpret the variance inflation factor (VIF) for your predictors.

vif(mod2)
##     Age Mileage 
##  2.1946  2.1946

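The VIF is 2.19 for both age and mileage. Both values are between 1 and 5, which indicates that the two predictors are moderately correlated with each other. This is below the common guideline of 5 for problematic multicollinearity, but it is something to be aware of because correlated predictors inflate the standard errors of the coefficients and can affect their t-tests.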
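With only two predictors, each VIF reduces to 1 / (1 - r^2), where r is the correlation between the two predictors. That is why age and mileage share the same value. A quick check using the age-mileage correlation reported by `cor.test()` earlier:

```r
# Correlation between Age and Mileage, copied from the cor.test() output
r_age_mileage <- 0.7377914

# With two predictors, the R-squared from regressing one on the other is r^2,
# so both predictors share the same variance inflation factor
vif_by_hand <- 1 / (1 - r_age_mileage^2)

round(vif_by_hand, 4)
```

The result should agree with the 2.1946 printed by `vif(mod2)`. With three or more predictors this shortcut no longer applies, and each VIF comes from regressing that predictor on all of the others.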

6 - Suppose that you are interested in purchasing a car of this model that is three years old (in 2017) with 31K miles. Determine each of the following: 90% confidence interval for the mean price at this age and mileage and 90% prediction interval for the price of an individual car at this age and mileage. Write sentences that carefully interpret each of the intervals (in terms of car prices).

newdata=data.frame(Age = 3, Mileage = 31000)
predict.lm(mod2, newdata, interval = "confidence", level = 0.9)
##        fit      lwr      upr
## 1 14785.55 14467.37 15103.74
predict.lm(mod2, newdata, interval = "prediction", level = 0.9)
##        fit      lwr      upr
## 1 14785.55 10759.19 18811.92

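The predicted price for a car of this model that is three years old and has 31K miles is 14,785.55 dollars. We are 90% confident that the mean price of all cars of this model that are three years old and have 31K miles falls between 14,467.37 dollars and 15,103.74 dollars. We are 90% confident that the price of an individual car of this model that is three years old and has 31K miles falls between 10,759.19 dollars and 18,811.92 dollars. The prediction interval is much wider because it accounts for the variability of individual car prices around the mean, not just the uncertainty in estimating the mean.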
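To see why the prediction interval is so much wider, the two half-widths can be related by hand: the confidence interval uses t* x SE(fit), while the prediction interval uses t* x sqrt(s^2 + SE(fit)^2), where s is the residual standard error. The sketch below works only from the printed output, so s = 2429 is a rounded value and the reconstruction is approximate.

```r
# Interval endpoints copied from the predict.lm() output above
ci <- c(lwr = 14467.37, upr = 15103.74)
pi_reported <- c(lwr = 10759.19, upr = 18811.92)

s <- 2429        # residual standard error from summary(mod2), rounded
df_resid <- 197  # residual degrees of freedom
t_star <- qt(0.95, df = df_resid)   # critical value for a 90% interval

# Half-widths of the reported intervals
ci_half <- unname(diff(ci)) / 2
pi_half_reported <- unname(diff(pi_reported)) / 2

# Back out SE(fit) from the confidence interval, then rebuild the
# prediction-interval half-width by adding the individual-car variance
se_fit <- ci_half / t_star
pi_half_rebuilt <- t_star * sqrt(s^2 + se_fit^2)

round(c(ci_half = ci_half,
        pi_half_reported = pi_half_reported,
        pi_half_rebuilt = pi_half_rebuilt), 1)
```

The rebuilt half-width should land within a few dollars of the reported one. Nearly all of the prediction interval's width comes from s, the car-to-car variability, rather than from uncertainty in the fitted mean.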