Problem 1 a.)
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.4
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
set.seed(1)
x<-rnorm(100)
y<-2*x+rnorm(100)
m1 <- lm(y~x+0)
summary(m1)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9154 -0.6472 -0.1771 0.5056 2.3109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 1.9939 0.1065 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
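As a cross-check on the summary above, the no-intercept slope and its standard error can be recomputed from their closed-form expressions. This is only a sketch: the names `beta_hat`, `se_hat`, and `t_stat` are illustrative and not part of the original work, and it assumes the same seed and data-generating lines as part a.

```r
# Recreate the data from part a
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# Regression through the origin: slope = sum(x*y) / sum(x^2)
beta_hat <- sum(x * y) / sum(x^2)

# Residual variance uses n - 1 degrees of freedom (one parameter, no intercept)
rss <- sum((y - beta_hat * x)^2)
se_hat <- sqrt(rss / (n - 1)) / sqrt(sum(x^2))

# t-statistic for H0: slope = 0
t_stat <- beta_hat / se_hat

# Values reported by lm() for comparison
fit_coefs <- coef(summary(lm(y ~ x + 0)))
print(c(beta_hat = beta_hat, se_hat = se_hat, t_stat = t_stat))
print(fit_coefs)
```

The hand-computed values should agree with the `Estimate`, `Std. Error`, and `t value` columns printed by `summary(m1)`.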
The coefficient estimate (Bhat) is 1.994 and its standard error is 0.1065. The t-statistic is 18.73 and the p-value is less than 2e-16, so the slope is clearly different from zero. Both a and b said “without” an intercept, so I’m guessing one was supposed to be with an intercept. B will be with the intercept. b.)
m2 <- lm(y~x)
summary(m2)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8768 -0.6138 -0.1395 0.5394 2.3462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03769 0.09699 -0.389 0.698
## x 1.99894 0.10773 18.556 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
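The intercept is -0.038 and the coefficient of x is 1.999. The t value for x is 18.556, the standard error for the coefficient is 0.10773, and the p-value is less than 2e-16. c.) The two fits are nearly the same except that one has an intercept. The slope (1.994 vs 1.999), its standard error (0.1065 vs 0.1077), and its t value (18.73 vs 18.56) barely change, and the estimated intercept is not significantly different from zero (p = 0.698). d.)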
m3 <- lm(x~y)
summary(m3)
##
## Call:
## lm(formula = x ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90848 -0.28101 0.06274 0.24570 0.85736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03880 0.04266 0.91 0.365
## y 0.38942 0.02099 18.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
If we look at this summary (for x onto y) we see the t statistic for the slope is 18.56. If we look at the summary in part b (for y onto x) we see the t statistic is also 18.56. Those are the same.
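One way to see why the two t statistics match: in simple regression with an intercept, the slope's t statistic can be written using only the sample correlation r and the sample size n, as t = r * sqrt(n - 2) / sqrt(1 - r^2), and r does not change when x and y swap roles. The sketch below checks this numerically; the variable names are illustrative and not from the original work.

```r
# Recreate the data from Problem 1
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# t statistic written in terms of the correlation only
r <- cor(x, y)
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Slope t statistics from the two regressions
t_y_on_x <- coef(summary(lm(y ~ x)))["x", "t value"]
t_x_on_y <- coef(summary(lm(x ~ y)))["y", "t value"]

print(c(t_from_r = t_from_r, t_y_on_x = t_y_on_x, t_x_on_y = t_x_on_y))
```

All three numbers should agree up to floating-point error.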
Problem 2 a.)b.)
set.seed(1)
x <- rnorm(100, mean = 0, sd = 1)
eps <- rnorm(100, mean = 0, sd = .5)
c.)
y = -1 + .5 * x + eps
y
## [1] -1.62341024 -0.88712040 -1.87327513 -0.12334521 -1.16253844 -0.52659056
## [7] -0.39793174 -0.17575053 -0.52001665 -0.31160615 -0.56197764 -1.03590075
## [13] -0.59447917 -2.43269812 -0.54122491 -1.21887077 -1.16809157 -0.66763855
## [19] -0.34229524 -0.79171458 -0.79349005 0.06258756 -1.07000721 -2.08445411
## [25] -0.74018250 -0.67173122 -1.11467996 -1.75419328 -1.57990527 -0.95316436
## [31] -0.29058000 -1.34584111 -0.54041610 -1.78609956 -1.53525085 -1.97572219
## [37] -1.34763304 -1.29379665 -0.77603470 -0.64686051 -2.03944151 -0.53838918
## [43] -1.48400453 -0.95343360 -1.90233790 -1.72915708 0.22587425 -0.60703573
## [49] -1.69932337 -1.37974890 -0.57585351 -1.31529311 -0.98847434 -2.02936262
## [55] -1.02721830 -0.54739620 -0.68359634 -1.83270066 -1.40735361 -0.13288199
## [61] 0.41335907 -1.13894355 -0.12588879 -0.54278759 -1.68125813 0.19744738
## [67] -2.02999283 -0.97946989 -0.99557313 0.19007500 0.39174396 -1.30207203
## [73] -0.46613742 -1.50562528 -1.79381712 -0.87163990 -0.82782613 0.03817518
## [79] -0.44913312 -0.69080627 -1.89999608 -0.57564152 -0.30099410 -2.49540841
## [85] -0.44251553 -0.91290212 0.26384357 -1.53513296 -1.03009647 -1.32950535
## [91] -1.35981200 -0.19506021 -0.78567278 -0.23470659 -0.81062467 -1.24474899
## [97] -0.91771725 -1.79455644 -1.40631895 -1.42723834
There are 100 observations in y. B0 is -1 and B1 is 0.5 for this model. d.)
plot(x,y)
The scatter plot has an upward sloping trend. x and y are positively, roughly linearly correlated, with visible scatter around the trend. e.)
plot(x,y)
m4 <- lm(y~x)
abline(m4)
summary(m4)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.93842 -0.30688 -0.06975 0.26970 1.17309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.01885 0.04849 -21.010 < 2e-16 ***
## x 0.49947 0.05386 9.273 4.58e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4814 on 98 degrees of freedom
## Multiple R-squared: 0.4674, Adjusted R-squared: 0.4619
## F-statistic: 85.99 on 1 and 98 DF, p-value: 4.583e-15
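To judge how close the estimates above are to the true values, the gap between each estimate and its true coefficient can be expressed in standard-error units. This is a rough sketch under the assumption that the true values are B0 = -1 and B1 = 0.5, as in part c; `truth` and `gap_in_se` are illustrative names.

```r
# Recreate the data from Problem 2
set.seed(1)
x <- rnorm(100, mean = 0, sd = 1)
eps <- rnorm(100, mean = 0, sd = 0.5)
y <- -1 + 0.5 * x + eps

fit <- lm(y ~ x)
est <- coef(summary(fit))

# True coefficients used to generate y: intercept, then slope
truth <- c(-1, 0.5)

# Distance of each estimate from the truth, measured in standard errors
gap_in_se <- (est[, "Estimate"] - truth) / est[, "Std. Error"]
print(round(gap_in_se, 3))
```

Gaps well inside two standard errors are what one would expect when the model is correctly specified.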
The Bhats (slope: 0.499 and intercept: -1.019) are very close to B1 = 0.5 and B0 = -1. f.)
plot(x,y)
m4 <- lm(y~x)
# least squares line (black)
abline(m4)
# population regression line y = -1 + 0.5x (red)
# col (not color) sets the colour; add=TRUE draws on the existing plot instead of starting a new one
curve(x*.5 - 1, from=-3, to=3, col="red", add=TRUE)
legend("topleft", legend=c("Least squares", "Population"), col=c("black", "red"), lty=1)
The scatter plot, the least squares line (black), and the population regression line (red) are all on one graph. Passing add=TRUE to curve() keeps it from opening a second plot, and the colour argument is col, not color. The two lines are nearly on top of each other. g.)
set.seed(1)
x2 <- rnorm(100, mean = 0, sd = 1)
eps2 <- rnorm(100, mean = 0, sd = 100)
y2 = -1 + .5 * x2 + eps2
y2
## [1] -63.349895 3.303409 -92.509979 15.600518 -66.293711 175.318493
## [7] 70.914462 90.386585 37.706426 167.064914 -63.817755 -46.969551
## [13] 141.917604 -67.176985 -21.175609 -40.303260 -33.007382 -28.439412
## [19] 48.829444 -18.436098 -51.136258 133.694951 -22.420658 -19.950329
## [25] -10.709161 70.238566 -8.434338 -5.498793 -69.405123 -33.218056
## [31] 5.695384 -59.940842 52.343455 -152.866311 28.967256 -154.852480
## [37] -31.294758 -53.857647 -65.659465 -6.308090 -192.518204 116.531650
## [43] -167.148762 -47.074709 -112.936388 -76.435648 207.898946 1.123828
## [49] -129.686226 -164.620000 44.217763 -3.161996 -32.636278 -94.500896
## [55] -149.029519 -107.529030 98.819270 -63.648737 -139.157825 185.861535
## [61] 42.710847 -24.884330 105.193175 87.656266 -63.295941 219.704643
## [67] -27.405182 -142.716688 -15.363334 20.840140 230.035595 9.225264
## [73] 45.005244 -9.182342 -35.026901 -4.326880 77.542315 206.525054
## [79] 101.776415 119.496079 -124.416677 97.321968 21.581524 -148.486786
## [85] 51.399247 -16.708985 145.990281 -77.760292 -43.836166 -93.477400
## [91] -18.981656 39.805112 -73.594616 82.387424 -121.014862 -105.519198
## [97] 142.477475 -102.871379 39.585165 -39.344305
plot(x2,y2)
m5 <- lm(y2~x2)
abline(m5)
summary(m5)
##
## Call:
## lm(formula = y2 ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -187.68 -61.38 -13.95 53.94 234.62
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.769 9.699 -0.492 0.624
## x2 0.394 10.773 0.037 0.971
##
## Residual standard error: 96.28 on 98 degrees of freedom
## Multiple R-squared: 1.365e-05, Adjusted R-squared: -0.01019
## F-statistic: 0.001337 on 1 and 98 DF, p-value: 0.9709
The intercept is -4.769 and the coefficient is 0.394, with a p-value of 0.9709 for the slope. With this much noise there is no detectable linear relationship between x and y (R-squared is about 0), and the points are spread much more widely. h.)
set.seed(1)
x3 <- rnorm(100, mean = 0, sd = 1)
eps3 <- rnorm(100, mean = 0, sd = .000001)
y3 = -1 + .5 * x3 + eps3
y3
## [1] -1.313227526 -0.908178296 -1.417815217 -0.202359441 -0.835246769
## [6] -1.410232425 -0.756284757 -0.630836737 -0.712108940 -1.152692511
## [11] -0.244110052 -0.805078843 -1.310618858 -2.107350594 -0.437534748
## [16] -1.022467197 -1.008095452 -0.528082174 -0.589388908 -0.703049517
## [21] -0.540511820 -0.608930507 -0.962717723 -1.994676027 -0.690087226
## [26] -1.028063657 -1.077897827 -1.735376230 -1.239075709 -0.791029544
## [31] -0.320660164 -1.051394453 -0.806163663 -1.026904039 -1.688529472
## [36] -1.207498818 -1.197145278 -1.029657227 -0.449987966 -0.618412183
## [41] -1.082263712 -1.126679663 -0.651519977 -0.721668864 -1.344378963
## [46] -1.353748329 -0.817706932 -0.615733520 -1.056174392 -0.559447777
## [51] -0.800946610 -1.306013215 -0.829440472 -1.564682477 -0.283489637
## [56] -0.009801126 -1.183609738 -1.522067934 -0.715141571 -1.067525433
## [61] 0.200809305 -1.019620240 -0.655129260 -0.985998034 -1.371637224
## [66] -0.905601644 -1.902479569 -0.267223994 -0.923373475 0.086306043
## [71] -0.762242928 -1.354973110 -0.694636366 -1.467048893 -1.626817034
## [76] -0.854276917 -1.221645149 -0.999445249 -0.962828311 -1.294759265
## [81] -1.284335598 -1.067588324 -0.410956282 -1.761784867 -0.703026385
## [86] -0.833524973 -0.468448617 -1.152092728 -0.814991025 -0.866451531
## [91] -1.271260193 -0.396065695 -0.419799424 -0.649892345 -0.206584481
## [96] -0.720757835 -1.638294663 -1.286633723 -1.612305895 -1.236700699
plot(x3,y3)
m6 <- lm(y3~x3)
abline(m6)
summary(m6)
##
## Call:
## lm(formula = y3 ~ x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.877e-06 -6.138e-07 -1.395e-07 5.394e-07 2.346e-06
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.000e+00 9.699e-08 -10310629 <2e-16 ***
## x3 5.000e-01 1.077e-07 4641361 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.628e-07 on 98 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2.154e+13 on 1 and 98 DF, p-value: < 2.2e-16
Here the intercept is -1.000 and the coefficient is 0.500. The data are almost perfectly linear (R-squared is 1). The spread around the line is much smaller, and all points lie between about -2.1 and 0.2 on the y axis. i.)
confint(m4)
## 2.5 % 97.5 %
## (Intercept) -1.1150804 -0.9226122
## x 0.3925794 0.6063602
confint(m6)
## 2.5 % 97.5 %
## (Intercept) -1.0000002 -0.9999998
## x3 0.4999998 0.5000002
confint(m5)
## 2.5 % 97.5 %
## (Intercept) -24.01607 14.47755
## x2 -20.98412 21.77204
The more noise, the wider the confidence interval; the less noise, the narrower the interval. All three sets of intervals contain the true values B0 = -1 and B1 = 0.5.
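The link between noise and interval width can be checked directly by refitting the model at each noise level and comparing the widths of the slope intervals. The sketch below is an illustration rather than the original approach: it reuses a single standard normal draw `z` and rescales it, on the assumption that this matches calling `rnorm()` with different `sd` values after the same seed. The names `noise_sd` and `slope_width` are made up for the example.

```r
set.seed(1)
x <- rnorm(100)
z <- rnorm(100)  # standard normal noise, rescaled for each model below

# Noise standard deviations used in parts h, c, and g
noise_sd <- c(tiny = 1e-6, original = 0.5, large = 100)

# Width of the 95% confidence interval for the slope at each noise level
slope_width <- sapply(noise_sd, function(s) {
  dat <- data.frame(x = x, y = -1 + 0.5 * x + s * z)
  ci <- confint(lm(y ~ x, data = dat))["x", ]
  unname(ci[2] - ci[1])
})

print(slope_width)
# Width divided by the noise sd: roughly constant across the three fits
print(slope_width / noise_sd)
```

Because x is the same in every fit, the interval width scales in proportion to the noise standard deviation, which is why the ratio in the last line stays about the same.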