library(readr)
abalone = read_csv("http://mclean.web.unc.edu/files/2020/02/abalone.csv")
## Parsed with column specification:
## cols(
## sex = col_character(),
## length = col_double(),
## diameter = col_double(),
## height = col_double(),
## weight_whole = col_double(),
## weight_shucked = col_double(),
## weight_viscera = col_double(),
## weight_shell = col_double(),
## rings = col_double()
## )
library(leaps)
#Question 1
lm(log(rings)~weight_whole,data=abalone)
##
## Call:
## lm(formula = log(rings) ~ weight_whole, data = abalone)
##
## Coefficients:
## (Intercept) weight_whole
## 2.2313 0.1372
plot(log(rings)~weight_whole,data=abalone)
#Question 2
mod2=lm(rings~weight_whole,data=abalone)
plot(rings~weight_whole,data=abalone)
#Question 3
plot(mod2$residuals ~ mod2$fitted.values)
abline(a = 0, b = 0)
The plot of residuals versus fitted values suggests that this model does not satisfy the second condition (zero mean). The points are not scattered symmetrically about the zero line: many cluster near the middle of the plot, and most fall toward the left side, so they are not spread evenly across the range of fitted values. There may also be outliers affecting the fit, since most of the points are packed closely together.
hist(mod2$residuals)
The histogram of residuals is skewed to the right, so the distribution of the errors is not symmetric about zero. There may be outliers producing the long right tail. Because the residuals do not follow a normal distribution, the fifth condition (normality) is not satisfied.
# Normal Q-Q plot of the residuals from mod2
qqnorm(mod2$residuals)
qqline(mod2$residuals)
In the normal Q-Q plot, the line fits the points only moderately well overall. There is some curvature at a few of the points, which indicates that the data may be skewed, and the curves may also point to outliers. This agrees with the histogram: the residuals do not appear to be normally distributed. A Q-Q plot assesses normality only; constant variance (homoscedasticity) is judged from the plot of residuals versus fitted values instead.
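The three checks described above can be sketched on one set of residuals. This is a minimal illustration that assumes synthetic data with right-skewed errors (the names `weight`, `rings`, `fit`, and `res` are placeholders, not the abalone values). For a model with an intercept, the least-squares residuals always average to zero, so the zero-mean condition is judged from patterns in the residuals-versus-fitted plot rather than from the overall mean.

```r
# Synthetic stand-in data (assumption: NOT the abalone measurements)
set.seed(1)
weight <- runif(200, 0.2, 2.5)
# Centered exponential noise gives right-skewed errors with mean zero
rings <- 8 + 2 * weight + (rexp(200, rate = 0.5) - 2)

fit <- lm(rings ~ weight)
res <- residuals(fit)

# Linearity, zero mean, constant variance: look for curvature or fanning
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0)

# Shape of the residual distribution
hist(res, main = "Residuals", xlab = "Residual")

# Normality: points should follow the reference line
qqnorm(res)
qqline(res)
```

In this sketch the histogram shows a long right tail and the upper end of the Q-Q plot bends away from the line, which is the pattern right-skewed residuals typically produce.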
#Question 4
mod1 = lm(log(rings)~weight_whole,data=abalone)
summary(mod1)
##
## Call:
## lm(formula = log(rings) ~ weight_whole, data = abalone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.48409 -0.15746 -0.04593 0.11053 0.62263
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.23126 0.03265 68.339 < 2e-16 ***
## weight_whole 0.13721 0.02831 4.846 2.06e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2234 on 288 degrees of freedom
## Multiple R-squared: 0.07539, Adjusted R-squared: 0.07218
## F-statistic: 23.48 on 1 and 288 DF, p-value: 2.063e-06
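The null hypothesis is that there is no linear relationship between log(rings) and weight_whole (the slope is zero). The alternative hypothesis is that there is a linear relationship between log(rings) and weight_whole (the slope is not zero). The p-value for the slope is 2.06e-06, which is statistically significant evidence against the null hypothesis, so we conclude that there is a linear relationship between log(rings) and weight_whole. The relationship is weak, however: weight_whole explains only about 7.5% of the variation in log(rings) (R-squared = 0.07539).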
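As a check on where the test statistic and p-value come from, the slope test can be reproduced by hand from the rounded estimate, standard error, and degrees of freedom printed in the summary above. The variable names here are illustrative, and because the inputs are rounded the results match the printed values only approximately.

```r
# Values copied from the summary(mod1) output (rounded as printed)
est <- 0.13721   # slope estimate for weight_whole
se  <- 0.02831   # standard error of the slope
df  <- 288       # residual degrees of freedom

# t statistic for H0: slope = 0
t_stat <- est / se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df)

# With a single predictor, the overall F statistic equals t squared
f_stat <- t_stat^2

c(t = t_stat, p = p_value, F = f_stat)
```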
#Question 5
lm(log(rings) ~ weight_whole, data = abalone)
##
## Call:
## lm(formula = log(rings) ~ weight_whole, data = abalone)
##
## Coefficients:
## (Intercept) weight_whole
## 2.2313 0.1372
mod1$residuals
## 1 2 3 4 5 6
## -0.391680503 -0.273104624 0.379683155 0.031628452 -0.153469300 -0.287785725
## 7 8 9 10 11 12
## -0.149833327 0.540226707 -0.087349859 -0.087953171 -0.071900004 -0.254650343
## 13 14 15 16 17 18
## -0.127714537 0.078506912 -0.141738140 -0.416446285 -0.048520404 0.099979737
## 19 20 21 22 23 24
## 0.325181387 -0.062495244 -0.287305503 -0.115065582 -0.348529042 -0.182488486
## 25 26 27 28 29 30
## -0.045070012 -0.061555027 0.036453103 -0.131996475 0.124745520 -0.248338841
## 31 32 33 34 35 36
## 0.475521932 0.028726885 -0.029840084 0.021337960 -0.117240658 -0.009553743
## 37 38 39 40 41 42
## 0.095978325 -0.247721412 0.081571637 -0.174667713 0.267554635 -0.149764723
## 43 44 45 46 47 48
## 0.101054967 0.107526098 -0.094004376 0.468867415 -0.223733172 -0.074706627
## 49 50 51 52 53 54
## -0.255359283 0.116833721 0.003412276 0.122189883 0.125843173 -0.070061833
## 55 56 57 58 59 60
## -0.185589751 -0.121651497 -0.114996979 -0.298693646 -0.008161449 0.061356093
## 61 62 63 64 65 66
## -0.020578642 -0.201285784 0.105788593 0.134967408 0.228181945 0.278896200
## 67 68 69 70 71 72
## 0.002129043 0.095223689 -0.465633952 -0.187496525 0.002609266 0.109241180
## 73 74 75 76 77 78
## -0.082807925 -0.161015661 -0.065534017 -0.106476056 -0.037089609 -0.083096456
## 79 80 81 82 83 84
## -0.364993828 -0.037523651 0.474012660 0.426834281 0.268811518 -0.342903573
## 85 86 87 88 89 90
## -0.379880739 0.122824628 0.255005826 0.479452994 0.068628040 -0.228924114
## 91 92 93 94 95 96
## -0.082664607 0.577486125 -0.051058725 -0.187976748 -0.298968060 0.113836717
## 97 98 99 100 101 102
## -0.077930981 -0.345510498 0.467995296 -0.050235486 -0.068278148 -0.018246130
## 103 104 105 106 107 108
## -0.015728038 0.297334048 0.030467302 0.332367569 -0.036082982 0.218045094
## 109 110 111 112 113 114
## -0.230991003 -0.010033966 -0.223298646 -0.101605223 -0.183923043 0.360306264
## 115 116 117 118 119 120
## 0.108143527 0.590452144 -0.165680683 -0.074726856 0.056828276 0.178787586
## 121 122 123 124 125 126
## -0.231896962 -0.196552158 -0.326575993 -0.067523512 -0.199227686 0.265038338
## 127 128 129 130 131 132
## 0.151003259 -0.023411605 -0.189074401 -0.222269596 -0.076441938 -0.118427143
## 133 134 135 136 137 138
## 0.231765749 -0.289097978 -0.068332634 0.014752046 -0.054694699 -0.258286317
## 139 140 141 142 143 144
## 0.500505410 0.241438811 0.024447530 0.081388249 0.109698980 -0.116849268
## 145 146 147 148 149 150
## 0.307229353 0.259344266 -0.084331314 -0.040199179 -0.209586780 -0.144619477
## 151 152 153 154 155 156
## -0.010288150 -0.197918112 -0.112733071 -0.091260245 0.622627081 0.110801739
## 157 158 159 160 161 162
## -0.187304833 0.104095934 0.309265427 -0.075961715 -0.218642413 0.031033444
## 163 164 165 166 167 168
## -0.084331314 -0.050901290 -0.277426631 0.103730495 0.512381616 0.352073871
## 169 170 171 172 173 174
## 0.344343726 -0.131927871 0.306749130 -0.081223938 -0.093524154 0.029021527
## 175 176 177 178 179 180
## -0.306994643 -0.151219509 0.573781548 -0.053734253 -0.186330270 0.047137898
## 181 182 183 184 185 186
## 0.459948989 -0.079597688 0.415171724 0.030964841 -0.254856153 -0.386437237
## 187 188 189 190 191 192
## -0.187556338 -0.253072468 -0.132010592 0.078849928 0.070526509 -0.030957965
## 193 194 195 196 197 198
## -0.276191772 -0.165955096 -0.100864704 -0.021264675 0.084338190 0.057925046
## 199 200 201 202 203 204
## 0.178599092 0.339107852 -0.245594710 -0.067866528 -0.259864192 -0.071639708
## 205 206 207 208 209 210
## 0.013722996 -0.134809209 -0.021284903 -0.140777694 -0.032698999 -0.171786375
## 211 212 213 214 215 216
## 0.006588256 0.047544412 0.065678099 0.464900309 -0.172060788 0.551211070
## 217 218 219 220 221 222
## 0.260579125 -0.307817882 0.014594610 0.304262979 0.363873635 -0.344824465
## 223 224 225 226 227 228
## 0.039449225 0.579887240 0.163369165 -0.054743074 0.133921043 -0.078142903
## 229 230 231 232 233 234
## 0.069360254 -0.038778739 -0.159231975 0.315233912 -0.085085951 -0.112993367
## 235 236 237 238 239 240
## 0.004667364 -0.095362325 -0.251151576 0.472515273 -0.163553982 -0.063887538
## 241 242 243 244 245 246
## 0.366099706 -0.023823225 -0.083645282 0.332818375 -0.160261025 -0.047902975
## 247 248 249 250 251 252
## -0.198061430 0.058863952 0.136133664 -0.088090378 -0.077059367 -0.083576678
## 253 254 255 256 257 258
## -0.041934490 0.027286216 -0.203495206 0.181617636 -0.272487195 -0.173981680
## 259 260 261 262 263 264
## -0.161221470 0.158909952 -0.010562564 0.237391218 0.115946985 -0.015933848
## 265 266 267 268 269 270
## -0.158491456 0.019417068 -0.126919832 -0.081587183 0.006999875 -0.206088013
## 271 272 273 274 275 276
## 0.276357879 0.324037156 -0.279759142 0.324242966 0.328359162 0.222915926
## 277 278 279 280 281 282
## 0.298751963 0.120354910 -0.188662781 -0.310797240 0.160830844 -0.046785094
## 283 284 285 286 287 288
## -0.160604041 -0.088501997 0.069314073 0.340274108 -0.484088234 -0.154361143
## 289 290
## -0.154018126 0.007254060
rstandard(mod1)
## 1 2 3 4 5 6
## -1.757256032 -1.224783248 1.706727990 0.141815187 -0.688285451 -1.290359446
## 7 8 9 10 11 12
## -0.672035747 2.422860322 -0.391673594 -0.395750712 -0.323962646 -1.142614801
## 13 14 15 16 17 18
## -0.586840504 0.352131514 -0.635873141 -1.867329674 -0.217604375 0.448283804
## 19 20 21 22 23 24
## 1.458010991 -0.281058842 -1.288211093 -0.516288825 -1.567147970 -0.818209544
## 25 26 27 28 29 30
## -0.202463219 -0.276005020 0.163808157 -0.592377563 0.559369923 -1.114567781
## 31 32 33 34 35 36
## 2.132359930 0.129380447 -0.133943675 0.095670849 -0.530061816 -0.042910442
## 37 38 39 40 41 42
## 0.431346545 -1.111825358 0.366295828 -0.783145903 1.201739943 -0.671729196
## 43 44 45 46 47 48
## 0.454314331 0.482104149 -0.421553342 2.102338279 -1.012956381 -0.336296653
## 49 50 51 52 53 54
## -1.161479086 0.525874151 0.015338215 0.548894137 0.564300658 -0.314130755
## 55 56 57 58 59 60
## -0.836617021 -0.545977369 -0.515979750 -1.339224615 -0.036606687 0.275361155
## 61 62 63 64 65 66
## -0.092337231 -0.902678014 0.475753684 0.605317506 1.023755639 1.250595177
## 67 68 69 70 71 72
## 0.009547347 0.427934543 -2.100431027 -0.840685614 0.011700735 0.489793615
## 73 74 75 76 77 78
## -0.372754058 -0.722028548 -0.293836294 -0.478471704 -0.167830244 -0.372585653
## 79 80 81 82 83 84
## -1.639474717 -0.168494623 2.125543373 1.914510112 1.205260779 -1.542484770
## 85 86 87 88 89 90
## -1.705083646 0.550742872 1.145494104 2.156041956 0.307913000 -1.028427648
## 91 92 93 94 95 96
## -0.372376952 2.589309319 -0.228976412 -0.842841867 -1.340455647 0.515324118
## 97 98 99 100 101 102
## -0.350907061 -1.553911271 2.107398197 -0.225288176 -0.306135434 -0.081864414
## 103 104 105 106 107 108
## -0.070619349 1.339898013 0.138093209 1.492829070 -0.162013768 0.978470357
## 109 110 111 112 113 114
## -1.036875243 -0.045066168 -1.003498089 -0.456726889 -0.839987619 1.616026327
## 115 116 117 118 119 120
## 0.484872217 2.647906417 -0.742901314 -0.335045402 0.255089902 0.801646196
## 121 122 123 124 125 126
## -1.051074215 -0.881378307 -1.471015917 -0.302753014 -0.893414740 1.188327728
## 127 128 129 130 131 132
## 0.677452589 -0.105081376 -0.847771130 -0.998938917 -0.342736167 -0.531438001
## 133 134 135 136 137 138
## 1.039576482 -1.302988033 -0.307992370 0.066143636 -0.245266398 -1.158784087
## 139 140 141 142 143 144
## 2.244246263 1.082728508 0.109948026 0.365029372 0.493482452 -0.524326060
## 145 146 147 148 149 150
## 1.378652101 1.162801875 -0.378126543 -0.180533843 -0.940081602 -0.648742302
## 151 152 153 154 155 156
## -0.046148223 -0.906303190 -0.505782252 -0.409229209 2.795142968 0.498089615
## 157 158 159 160 161 162
## -0.844489921 0.466728715 1.386926452 -0.340582662 -0.980967729 0.139502794
## 163 164 165 166 167 168
## -0.378126543 -0.228738408 -1.244067319 0.466429361 2.300394395 1.578832541
## 169 170 171 172 173 174
## 1.548299015 -0.592071319 1.376472183 -0.365839984 -0.419396344 0.130123809
## 175 176 177 178 179 180
## -1.376503150 -0.679764931 2.572639054 -0.240963321 -0.835449754 0.213186882
## 181 182 183 184 185 186
## 2.062233796 -0.356889643 1.861742547 0.139195045 -1.143529805 -1.738633035
## 187 188 189 190 191 192
## -0.845247868 -1.135600689 -0.592750134 0.353666938 0.316534951 -0.138911000
## 193 194 195 196 197 198
## -1.238555874 -0.744129475 -0.452381280 -0.095417884 0.378235209 0.264514081
## 199 200 201 202 203 204
## 0.802443212 1.520457089 -1.102380916 -0.304290462 -1.165803802 -0.321204178
## 205 206 207 208 209 210
## 0.061530103 -0.604934220 -0.095544749 -0.631583916 -0.147881451 -0.770237847
## 211 212 213 214 215 216
## 0.029542002 0.213230744 0.294708535 2.086601093 -0.771466965 2.471718045
## 217 218 219 220 221 222
## 1.168334864 -1.380204810 0.065655091 1.372043354 1.632177038 -1.550903839
## 223 224 225 226 227 228
## 0.176898180 2.600130853 0.733418194 -0.246063114 0.601224033 -0.351894252
## 229 230 231 232 233 234
## 0.311285591 -0.173958351 -0.714050616 1.413550550 -0.381512946 -0.507561814
## 235 236 237 238 239 240
## 0.020929201 -0.428849687 -1.127063927 2.120148713 -0.733384415 -0.286457512
## 241 242 243 244 245 246
## 1.647290201 -0.106927016 -0.375048196 1.495354445 -0.718653086 -0.214838205
## 247 248 249 250 251 252
## -0.888167291 0.264078000 0.610564401 -0.396363820 -0.345505067 -0.374740371
## 253 254 255 256 257 258
## -0.188097914 0.122876419 -0.919057318 0.816156589 -1.222029728 -0.780072086
## 259 260 261 262 263 264
## -0.722949176 0.713256368 -0.047379481 1.064663888 0.521045148 -0.071542711
## 265 266 267 268 269 270
## -0.712816678 0.087058539 -0.569717156 -0.365814265 0.031387555 -0.924308077
## 271 272 273 274 275 276
## 1.239173509 1.455180954 -1.254481664 1.456120958 1.474936624 1.000150785
## 277 278 279 280 281 282
## 1.344990193 0.539654082 -0.845922544 -1.402076420 0.721938958 -0.210188669
## 283 284 285 286 287 288
## -0.720187353 -0.398203103 0.310983740 1.525698076 -2.179862277 -0.692272004
## 289 290
## -0.690738677 0.032615576
Five standardized residuals from the Question 1 model are greater than 2.5; none is less than -2.5. They occur at observations 92, 116, 155, 177, and 224.
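Rather than scanning the printed values by eye, the unusual observations can be picked out with `which()`. The sketch below assumes synthetic data with one deliberately planted outlier at index 10 (all names and numbers are placeholders); the same two lines applied to `rstandard(mod1)` would list the indices for the abalone model.

```r
# Synthetic stand-in data (assumption: NOT the abalone measurements)
set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- 2.2 + 0.14 * x + rnorm(n, sd = 0.2)
y[10] <- y[10] + 3          # plant one obvious outlier

fit <- lm(y ~ x)

# Standardized residuals and the indices beyond the +/- 2.5 cutoff
std_res <- rstandard(fit)
flagged <- which(abs(std_res) > 2.5)

data.frame(index = flagged, std_resid = round(std_res[flagged], 2))
```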
#Question 6
newdata=data.frame(weight_whole = 1.23)
predict.lm(mod1, newdata, interval = "confidence", level = 0.95)
## fit lwr upr
## 1 2.40002 2.372435 2.427604
predict.lm(mod1, newdata, interval = "prediction", level = 0.95)
## fit lwr upr
## 1 2.40002 1.959411 2.840628
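Both intervals above are on the log(rings) scale because the response in mod1 is log(rings). A minimal sketch of converting them back to rings, using the printed numbers and illustrative variable names, is to exponentiate each endpoint. After back-transforming, the confidence interval describes the median (geometric mean) number of rings at weight_whole = 1.23 rather than the arithmetic mean.

```r
# Intervals copied from the predict.lm() output (log(rings) scale)
ci_log <- c(fit = 2.40002, lwr = 2.372435, upr = 2.427604)
pi_log <- c(fit = 2.40002, lwr = 1.959411, upr = 2.840628)

# exp() undoes the log transformation endpoint by endpoint
ci_rings <- exp(ci_log)
pi_rings <- exp(pi_log)

round(rbind(confidence = ci_rings, prediction = pi_rings), 2)
```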
#Question 7
Using subsets, the best model uses the variables weight_shell, weight_shucked, weight_whole, and height.
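The selection code is not shown, so the following is only a sketch of how a best-subsets search can be run with `regsubsets()` from the leaps package loaded earlier. It assumes synthetic data in which only `x1` and `x2` matter, and it assumes BIC as the selection criterion; the response and criterion actually used for the answer above are not stated.

```r
library(leaps)

# Synthetic stand-in data (assumption: NOT the abalone measurements)
set.seed(7)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 3 * d$x2 + rnorm(n, sd = 0.5)

# Fit the best model of each size, from 1 up to 4 predictors
subsets <- regsubsets(y ~ ., data = d, nvmax = 4)
s <- summary(subsets)

# Choose the size with the smallest BIC (an assumed criterion)
best_size <- which.min(s$bic)

# Predictors in the chosen model, dropping the intercept
best_vars <- names(coef(subsets, id = best_size))[-1]
best_vars
```

Other criteria stored in the summary object, such as `s$adjr2` or `s$cp`, can be substituted for `s$bic` and may select a different model size.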
#Question 8
library(mosaic)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: lattice
## Loading required package: ggformula
## Loading required package: ggplot2
## Loading required package: ggstance
##
## Attaching package: 'ggstance'
## The following objects are masked from 'package:ggplot2':
##
## geom_errorbarh, GeomErrorbarh
##
## New to ggformula? Try the tutorials:
## learnr::run_tutorial("introduction", package = "ggformula")
## learnr::run_tutorial("refining", package = "ggformula")
## Loading required package: mosaicData
## Loading required package: Matrix
## Registered S3 method overwritten by 'mosaic':
## method from
## fortify.SpatialPolygonsDataFrame ggplot2
##
## The 'mosaic' package masks several functions from core packages in order to add
## additional features. The original behavior of these functions should not be affected by this.
##
## Note: If you use the Matrix package, be sure to load it BEFORE loading mosaic.
##
## Attaching package: 'mosaic'
## The following object is masked from 'package:Matrix':
##
## mean
## The following object is masked from 'package:ggplot2':
##
## stat
## The following objects are masked from 'package:dplyr':
##
## count, do, tally
## The following objects are masked from 'package:stats':
##
## binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
## quantile, sd, t.test, var
## The following objects are masked from 'package:base':
##
## max, mean, min, prod, range, sample, sum
library(Stat2Data)
library(readr)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following objects are masked from 'package:mosaic':
##
## deltaMethod, logit
## The following object is masked from 'package:dplyr':
##
## recode
library(corrplot)
## corrplot 0.84 loaded
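# mod3 and mod4 are random samples (data frames, not fitted models):
# 135 female and 135 male abalone; the rows drawn change on each run
mod3 = sample_n(subset(abalone, sex == "F"), 135)
mod4 = sample_n(subset(abalone, sex == "M"), 135)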
summary(mod3)
## sex length diameter height
## Length:135 Min. :0.3500 Min. :0.2650 Min. :0.0900
## Class :character 1st Qu.:0.5250 1st Qu.:0.4050 1st Qu.:0.1350
## Mode :character Median :0.5850 Median :0.4650 Median :0.1550
## Mean :0.5764 Mean :0.4539 Mean :0.1561
## 3rd Qu.:0.6400 3rd Qu.:0.5050 3rd Qu.:0.1800
## Max. :0.7800 Max. :0.6300 Max. :0.2500
## weight_whole weight_shucked weight_viscera weight_shell
## Min. :0.1855 Min. :0.0745 Min. :0.0415 Min. :0.0600
## 1st Qu.:0.7070 1st Qu.:0.2913 1st Qu.:0.1578 1st Qu.:0.2000
## Median :1.0645 Median :0.4410 Median :0.2295 Median :0.2985
## Mean :1.0449 Mean :0.4489 Mean :0.2277 Mean :0.3004
## 3rd Qu.:1.2797 3rd Qu.:0.5563 3rd Qu.:0.2720 3rd Qu.:0.3750
## Max. :2.6570 Max. :1.4880 Max. :0.5185 Max. :0.6240
## rings
## Min. : 7.00
## 1st Qu.: 9.00
## Median :11.00
## Mean :11.13
## 3rd Qu.:12.00
## Max. :19.00
summary(mod4)
## sex length diameter height
## Length:135 Min. :0.3500 Min. :0.2550 Min. :0.0800
## Class :character 1st Qu.:0.5350 1st Qu.:0.4200 1st Qu.:0.1400
## Mode :character Median :0.5850 Median :0.4650 Median :0.1550
## Mean :0.5853 Mean :0.4603 Mean :0.1611
## 3rd Qu.:0.6350 3rd Qu.:0.5100 3rd Qu.:0.1800
## Max. :0.7750 Max. :0.6300 Max. :0.5150
## weight_whole weight_shucked weight_viscera weight_shell
## Min. :0.2145 Min. :0.1000 Min. :0.0465 Min. :0.0600
## 1st Qu.:0.8145 1st Qu.:0.3197 1st Qu.:0.1740 1st Qu.:0.2300
## Median :1.0060 Median :0.4460 Median :0.2145 Median :0.2900
## Mean :1.0839 Mean :0.4712 Mean :0.2365 Mean :0.3093
## 3rd Qu.:1.3365 3rd Qu.:0.5777 3rd Qu.:0.2908 3rd Qu.:0.3745
## Max. :2.7795 Max. :1.3510 Max. :0.7600 Max. :0.7250
## rings
## Min. : 7.00
## 1st Qu.: 9.00
## Median :10.00
## Mean :11.13
## 3rd Qu.:12.00
## Max. :19.00