Hw4_AlexMatteson

Problem 1 a.)

library(tidyverse)

## ── Attaching packages ───────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

set.seed(1)
x<-rnorm(100)
y<-2*x+rnorm(100)
m1 <- lm(y~x+0)
summary(m1)

## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9154 -0.6472 -0.1771  0.5056  2.3109 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   1.9939     0.1065   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16
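
As a check on the summary above, the slope, standard error and t value of a no-intercept fit can be computed by hand from the usual least squares formulas. This is only a sketch: the names beta_hat, rss, se_hat, t_stat and lm_row are made up for illustration and are not part of the assignment.

```r
# Recreate the data from part a (same seed and same calls as above).
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# No-intercept least squares slope: sum(x * y) / sum(x^2).
beta_hat <- sum(x * y) / sum(x^2)

# Residual variance uses n - 1 degrees of freedom (one parameter, no intercept).
rss <- sum((y - beta_hat * x)^2)
se_hat <- sqrt(rss / (length(x) - 1) / sum(x^2))

# The t value is the estimate divided by its standard error.
t_stat <- beta_hat / se_hat

# Compare with the row that lm() reports for x.
lm_row <- coef(summary(lm(y ~ x + 0)))["x", ]
print(c(beta_hat = beta_hat, se_hat = se_hat, t_stat = t_stat))
print(lm_row)
```

The hand-computed numbers should agree with the Estimate, Std. Error and t value columns of the lm() row.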

The coefficient estimate (Bhat) is 1.994, with a standard error of 0.1065. The t-statistic is 18.73 and the p-value is less than 2.2e-16. Both a and b said “without an intercept”, so I’m guessing one was supposed to be with an intercept. B will be with the intercept.

b.)

m2 <- lm(y~x)
summary(m2)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8768 -0.6138 -0.1395  0.5394  2.3462 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.03769    0.09699  -0.389    0.698    
## x            1.99894    0.10773  18.556   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16

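The intercept is -0.038. The coefficient of x is 1.999. The t value is 18.556. The standard error for the coefficient is 0.10773 and the p-value is less than 2.2e-16.

c.) These two seem to be the same except one has an intercept. The intercept is not significant (p-value 0.698), and the other values are nearly the same: the slope is 1.994 vs. 1.999, the standard error is 0.1065 vs. 0.1077, and the t value is 18.73 vs. 18.56.

d.)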

m3 <- lm(x~y)
summary(m3)

## 
## Call:
## lm(formula = x ~ y)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90848 -0.28101  0.06274  0.24570  0.85736 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03880    0.04266    0.91    0.365    
## y            0.38942    0.02099   18.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16

If we look at this summary (for x onto y) we see the t-statistic is 18.56. If we look at the summary in part b (for y onto x) we see the t-statistic is also 18.56. Those are the same.
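
One way to see why the two t-statistics match: in a simple regression with an intercept, the slope's t value can be written using only the sample correlation r and the sample size n, as r * sqrt(n - 2) / sqrt(1 - r^2), and r is the same whichever variable is the response. The sketch below checks this on the data from part a; the names r, t_from_r, t_yx and t_xy are illustrative only.

```r
# Recreate the data from part a (same seed and same calls as above).
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# The sample correlation is symmetric in x and y.
r <- cor(x, y)

# t value for the slope written only in terms of r and n.
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# t values reported by lm() for both directions of the regression.
t_yx <- coef(summary(lm(y ~ x)))["x", "t value"]
t_xy <- coef(summary(lm(x ~ y)))["y", "t value"]

print(c(t_from_r = t_from_r, t_yx = t_yx, t_xy = t_xy))
```

All three printed values should be equal up to rounding.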

Problem 2 a.) b.)

set.seed(1)
x <- rnorm(100, mean = 0, sd = 1)
eps <- rnorm(100, mean = 0, sd = .5)

c.)

y = -1 + .5 * x + eps
y

##   [1] -1.62341024 -0.88712040 -1.87327513 -0.12334521 -1.16253844 -0.52659056
##   [7] -0.39793174 -0.17575053 -0.52001665 -0.31160615 -0.56197764 -1.03590075
##  [13] -0.59447917 -2.43269812 -0.54122491 -1.21887077 -1.16809157 -0.66763855
##  [19] -0.34229524 -0.79171458 -0.79349005  0.06258756 -1.07000721 -2.08445411
##  [25] -0.74018250 -0.67173122 -1.11467996 -1.75419328 -1.57990527 -0.95316436
##  [31] -0.29058000 -1.34584111 -0.54041610 -1.78609956 -1.53525085 -1.97572219
##  [37] -1.34763304 -1.29379665 -0.77603470 -0.64686051 -2.03944151 -0.53838918
##  [43] -1.48400453 -0.95343360 -1.90233790 -1.72915708  0.22587425 -0.60703573
##  [49] -1.69932337 -1.37974890 -0.57585351 -1.31529311 -0.98847434 -2.02936262
##  [55] -1.02721830 -0.54739620 -0.68359634 -1.83270066 -1.40735361 -0.13288199
##  [61]  0.41335907 -1.13894355 -0.12588879 -0.54278759 -1.68125813  0.19744738
##  [67] -2.02999283 -0.97946989 -0.99557313  0.19007500  0.39174396 -1.30207203
##  [73] -0.46613742 -1.50562528 -1.79381712 -0.87163990 -0.82782613  0.03817518
##  [79] -0.44913312 -0.69080627 -1.89999608 -0.57564152 -0.30099410 -2.49540841
##  [85] -0.44251553 -0.91290212  0.26384357 -1.53513296 -1.03009647 -1.32950535
##  [91] -1.35981200 -0.19506021 -0.78567278 -0.23470659 -0.81062467 -1.24474899
##  [97] -0.91771725 -1.79455644 -1.40631895 -1.42723834

There are 100 observations in y. B0 is -1 and B1 is 0.5 for this model.

d.)

plot(x,y)

The scatter plot has a positive, upward-sloping trend. It is somewhat linearly correlated.

e.)

plot(x,y)

m4 <- lm(y~x)
abline(m4)

summary(m4)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.93842 -0.30688 -0.06975  0.26970  1.17309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.01885    0.04849 -21.010  < 2e-16 ***
## x            0.49947    0.05386   9.273 4.58e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4814 on 98 degrees of freedom
## Multiple R-squared:  0.4674, Adjusted R-squared:  0.4619 
## F-statistic: 85.99 on 1 and 98 DF,  p-value: 4.583e-15

The Bhats (slope: 0.499 and intercept: -1.019) are very close to B1 (0.5) and B0 (-1).

f.)

plot(x,y)
m4 <- lm(y~x)
abline(m4)

curve(x*.5 - 1, from=-3, to=3, col="red", add=TRUE)

The scatter plot, the least squares line (black) and the population regression line (red) are all drawn in one graph. curve() takes col rather than color, and it needs add=TRUE to draw on top of the existing plot instead of starting a new one.

g.)

set.seed(1)
x2 <- rnorm(100, mean = 0, sd = 1)
eps2 <- rnorm(100, mean = 0, sd = 100)

y2 = -1 + .5 * x2 + eps2
y2

##   [1]  -63.349895    3.303409  -92.509979   15.600518  -66.293711  175.318493
##   [7]   70.914462   90.386585   37.706426  167.064914  -63.817755  -46.969551
##  [13]  141.917604  -67.176985  -21.175609  -40.303260  -33.007382  -28.439412
##  [19]   48.829444  -18.436098  -51.136258  133.694951  -22.420658  -19.950329
##  [25]  -10.709161   70.238566   -8.434338   -5.498793  -69.405123  -33.218056
##  [31]    5.695384  -59.940842   52.343455 -152.866311   28.967256 -154.852480
##  [37]  -31.294758  -53.857647  -65.659465   -6.308090 -192.518204  116.531650
##  [43] -167.148762  -47.074709 -112.936388  -76.435648  207.898946    1.123828
##  [49] -129.686226 -164.620000   44.217763   -3.161996  -32.636278  -94.500896
##  [55] -149.029519 -107.529030   98.819270  -63.648737 -139.157825  185.861535
##  [61]   42.710847  -24.884330  105.193175   87.656266  -63.295941  219.704643
##  [67]  -27.405182 -142.716688  -15.363334   20.840140  230.035595    9.225264
##  [73]   45.005244   -9.182342  -35.026901   -4.326880   77.542315  206.525054
##  [79]  101.776415  119.496079 -124.416677   97.321968   21.581524 -148.486786
##  [85]   51.399247  -16.708985  145.990281  -77.760292  -43.836166  -93.477400
##  [91]  -18.981656   39.805112  -73.594616   82.387424 -121.014862 -105.519198
##  [97]  142.477475 -102.871379   39.585165  -39.344305

plot(x2,y2)
m5 <- lm(y2~x2)
abline(m5)

summary(m5)

## 
## Call:
## lm(formula = y2 ~ x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -187.68  -61.38  -13.95   53.94  234.62 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -4.769      9.699  -0.492    0.624
## x2             0.394     10.773   0.037    0.971
## 
## Residual standard error: 96.28 on 98 degrees of freedom
## Multiple R-squared:  1.365e-05,  Adjusted R-squared:  -0.01019 
## F-statistic: 0.001337 on 1 and 98 DF,  p-value: 0.9709

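The intercept is -4.769 and the slope coefficient is 0.394. The p-value for the slope is 0.971, so it is not significantly different from zero. With this much noise there is much less of a linear relationship between x and y (R-squared is 1.365e-05). There is also a much larger spread of the points in the set.

h.)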

set.seed(1)
x3 <- rnorm(100, mean = 0, sd = 1)
eps3 <- rnorm(100, mean = 0, sd = .000001)

y3 = -1 + .5 * x3 + eps3
y3

##   [1] -1.313227526 -0.908178296 -1.417815217 -0.202359441 -0.835246769
##   [6] -1.410232425 -0.756284757 -0.630836737 -0.712108940 -1.152692511
##  [11] -0.244110052 -0.805078843 -1.310618858 -2.107350594 -0.437534748
##  [16] -1.022467197 -1.008095452 -0.528082174 -0.589388908 -0.703049517
##  [21] -0.540511820 -0.608930507 -0.962717723 -1.994676027 -0.690087226
##  [26] -1.028063657 -1.077897827 -1.735376230 -1.239075709 -0.791029544
##  [31] -0.320660164 -1.051394453 -0.806163663 -1.026904039 -1.688529472
##  [36] -1.207498818 -1.197145278 -1.029657227 -0.449987966 -0.618412183
##  [41] -1.082263712 -1.126679663 -0.651519977 -0.721668864 -1.344378963
##  [46] -1.353748329 -0.817706932 -0.615733520 -1.056174392 -0.559447777
##  [51] -0.800946610 -1.306013215 -0.829440472 -1.564682477 -0.283489637
##  [56] -0.009801126 -1.183609738 -1.522067934 -0.715141571 -1.067525433
##  [61]  0.200809305 -1.019620240 -0.655129260 -0.985998034 -1.371637224
##  [66] -0.905601644 -1.902479569 -0.267223994 -0.923373475  0.086306043
##  [71] -0.762242928 -1.354973110 -0.694636366 -1.467048893 -1.626817034
##  [76] -0.854276917 -1.221645149 -0.999445249 -0.962828311 -1.294759265
##  [81] -1.284335598 -1.067588324 -0.410956282 -1.761784867 -0.703026385
##  [86] -0.833524973 -0.468448617 -1.152092728 -0.814991025 -0.866451531
##  [91] -1.271260193 -0.396065695 -0.419799424 -0.649892345 -0.206584481
##  [96] -0.720757835 -1.638294663 -1.286633723 -1.612305895 -1.236700699

plot(x3,y3)
m6 <- lm(y3~x3)
abline(m6)

summary(m6)

## 
## Call:
## lm(formula = y3 ~ x3)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.877e-06 -6.138e-07 -1.395e-07  5.394e-07  2.346e-06 
## 
## Coefficients:
##               Estimate Std. Error   t value Pr(>|t|)    
## (Intercept) -1.000e+00  9.699e-08 -10310629   <2e-16 ***
## x3           5.000e-01  1.077e-07   4641361   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.628e-07 on 98 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2.154e+13 on 1 and 98 DF,  p-value: < 2.2e-16

So here the intercept is -1.000 and the coefficient is 0.500. This data is very close to perfectly linear (R-squared is 1). The spread here is also much smaller: the points fall roughly between -2 and 0 on the y axis.

i.)

confint(m4)

##                  2.5 %     97.5 %
## (Intercept) -1.1150804 -0.9226122
## x            0.3925794  0.6063602

confint(m6)

##                  2.5 %     97.5 %
## (Intercept) -1.0000002 -0.9999998
## x3           0.4999998  0.5000002

confint(m5)

##                 2.5 %   97.5 %
## (Intercept) -24.01607 14.47755
## x2          -20.98412 21.77204
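
For lm fits, confint() builds each interval as the estimate plus or minus a t quantile times the standard error, with the residual degrees of freedom. The sketch below rebuilds the first interval table by hand; the names fit, est, se, t_crit and manual_ci are illustrative only.

```r
# Recreate the original Problem 2 data (noise sd = 0.5).
set.seed(1)
x <- rnorm(100, mean = 0, sd = 1)
eps <- rnorm(100, mean = 0, sd = 0.5)
y <- -1 + 0.5 * x + eps
fit <- lm(y ~ x)

# Estimates and standard errors from the coefficient table.
est <- coef(summary(fit))[, "Estimate"]
se <- coef(summary(fit))[, "Std. Error"]

# 97.5% quantile of the t distribution with n - 2 = 98 degrees of freedom.
t_crit <- qt(0.975, df = df.residual(fit))

# 95% interval: estimate +/- t_crit * standard error.
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(manual_ci)
print(confint(fit))
```

Because the t quantile is the same for all three models (each has 98 degrees of freedom), any difference in interval width comes from the standard errors.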

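The more noise, the wider the confidence intervals; the less noise, the narrower they are. The interval for the slope is about 0.39 to 0.61 for the original data, 0.4999998 to 0.5000002 for the low-noise data and -20.98 to 21.77 for the high-noise data. All three contain the true slope of 0.5.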
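
The effect of noise on interval width can be put in one place by scaling a single standard normal noise vector to different standard deviations and measuring the width of the slope interval each time. This is a sketch under that assumption, not the code used above; the helper ci_width and the vectors z and widths are hypothetical names.

```r
# One x vector and one standard normal noise vector, rescaled below.
set.seed(1)
x <- rnorm(100)
z <- rnorm(100)

# Width of the 95% confidence interval for the slope at a given noise sd.
ci_width <- function(noise_sd) {
  y <- -1 + 0.5 * x + noise_sd * z
  ci <- confint(lm(y ~ x))
  ci["x", "97.5 %"] - ci["x", "2.5 %"]
}

# Same three noise levels as in parts h, e and g.
noise_levels <- c(0.000001, 0.5, 100)
widths <- sapply(noise_levels, ci_width)
print(data.frame(noise_sd = noise_levels, slope_ci_width = widths))
```

With the noise built this way, the width grows in proportion to the noise standard deviation, so going from sd 0.5 to sd 100 should make the slope interval about 200 times wider.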

Hw4_AlexMatteson_stats239

Alex Matteson

2/25/2020