Question 1

Part a.

y=25-.5x. If x=7, then:

25-.5*7
## [1] 21.5

y=21.5 if x=7.

Part b.

The point lies above the line. At x=3 the line gives y-hat = 23.5, which is below the observed value of 30, so the residual is 30 - 23.5 = 6.5.
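Here is a quick R check of the residual. It assumes the observed point is (3, 30), as the numbers in part b imply. A residual is the observed value minus the fitted value, so its sign shows which side of the line the point falls on:

```r
# Fitted line from Question 1: y-hat = 25 - 0.5x (helper name is illustrative)
yhat <- function(x) 25 - 0.5 * x

# Observed point assumed to be (3, 30), as in part b
x_obs <- 3
y_obs <- 30

fitted_value <- yhat(x_obs)        # 23.5
residual <- y_obs - fitted_value   # observed minus fitted: 6.5
residual > 0                       # TRUE: a positive residual means the point lies above the line
```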

Part c.

25-.5*3
## [1] 23.5
25-.5*6
## [1] 22
25-.5*9
## [1] 20.5

For every increase by 3 units, y decreases by 1.5.
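The pattern follows from the slope: each 3-unit step in x changes y-hat by 3 * (-0.5) = -1.5. Below is a minimal R sketch. The helper name yhat is only illustrative:

```r
# Fitted line from Question 1: y-hat = 25 - 0.5x
yhat <- function(x) 25 - 0.5 * x

x <- c(3, 6, 9)
preds <- yhat(x)   # 23.5 22.0 20.5
diff(preds)        # -1.5 -1.5: each 3-unit step in x changes y-hat by 3 * (-0.5)
```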

Part d.

No. The test score would not necessarily be 22, because y hat is a predicted value that estimates the mean response at that x. Individual scores scatter around the line because of random error, so any one student's score can differ from 22.

Part e.

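Estimating the error variance given SSE = 7 and n=16. In simple linear regression the estimate is MSE = SSE/(n-2), because two parameters (intercept and slope) are estimated.

7/(16-2)
## [1] 0.5

The estimated variance is .5.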

Question 2

Part a.

Finding mean and variance of 7-day and 28-day treatments.

sevenday <- c(2300, 3390, 2430, 2890, 3330, 2480, 3380, 2660, 2620, 3340)
month <- c(4070, 5220, 4640, 4620, 4850, 4120, 5020, 4890, 4190, 4630)

mean(sevenday)
## [1] 2882
mean(month)
## [1] 4625
var(sevenday)
## [1] 193240
var(month)
## [1] 153716.7

The mean of the 7-day treatment is 2,882 and the mean of the 28-day treatment is 4,625. The variance of the 7-day treatment is 193,240 and the variance of the 28-day treatment is 153,716.67.

Part b.

Finding the correlation coefficient of the two variables.

cor(sevenday, month)
## [1] 0.7584091

From the code, we can see the coefficient of correlation is .7584091.
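As a cross-check, the correlation can be computed directly from the centered sums of squares with r = Sxy / sqrt(Sxx * Syy). This sketch reuses the data vectors from part a. The names sxy, sxx and syy are illustrative:

```r
sevenday <- c(2300, 3390, 2430, 2890, 3330, 2480, 3380, 2660, 2620, 3340)
month <- c(4070, 5220, 4640, 4620, 4850, 4120, 5020, 4890, 4190, 4630)

# Centered sums of squares and cross-products
sxy <- sum((sevenday - mean(sevenday)) * (month - mean(month)))
sxx <- sum((sevenday - mean(sevenday))^2)
syy <- sum((month - mean(month))^2)

# Pearson correlation: r = Sxy / sqrt(Sxx * Syy)
r <- sxy / sqrt(sxx * syy)
all.equal(r, cor(sevenday, month))  # should be TRUE
```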

Part c.

Plotting 7-day strength against 28-day strength via scatterplot.

plot(sevenday, month, main="Plotting 7-day against 28-day strength",
     xlab="7-day strength", ylab="28-day strength", pch=19)

Part d.

Creating the line of best fit
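No code appears for this part. The following is a minimal sketch that assumes the same base-R approach used in Question 3 part c, which calls plot() and then abline() on the fitted model:

```r
sevenday <- c(2300, 3390, 2430, 2890, 3330, 2480, 3380, 2660, 2620, 3340)
month <- c(4070, 5220, 4640, 4620, 4850, 4120, 5020, 4890, 4190, 4630)

lm2 <- lm(month ~ sevenday)

plot(sevenday, month, main = "Plotting 7-day against 28-day strength",
     xlab = "7-day strength", ylab = "28-day strength", pch = 19)
# abline() accepts an lm object and draws its fitted intercept and slope
abline(lm2, col = "red")
```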

Part e.

Finding the intercept and slope.

lm2 <- lm(formula = month~sevenday)

Fitting the model with lm() gives an intercept of 2675.5619 and a slope of .6764 for 7-day strength.
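The same estimates follow from the least-squares formulas b1 = Sxy / Sxx and b0 = y-bar - b1 * x-bar. This sketch computes them by hand and compares them with lm(). The variable names are illustrative:

```r
sevenday <- c(2300, 3390, 2430, 2890, 3330, 2480, 3380, 2660, 2620, 3340)
month <- c(4070, 5220, 4640, 4620, 4850, 4120, 5020, 4890, 4190, 4630)

# Least-squares estimates: b1 = Sxy / Sxx and b0 = y-bar - b1 * x-bar
sxy <- sum((sevenday - mean(sevenday)) * (month - mean(month)))
sxx <- sum((sevenday - mean(sevenday))^2)
b1 <- sxy / sxx
b0 <- mean(month) - b1 * mean(sevenday)

c(intercept = b0, slope = b1)   # about 2675.5619 and 0.6764

# Should agree with the lm() fit
all.equal(unname(coef(lm(month ~ sevenday))), c(b0, b1))
```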

Part f.

For every 1-unit increase in 7-day strength, the predicted 28-day strength increases by .6764 units.
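One way to see the slope interpretation is to predict at two 7-day strengths one unit apart. The difference between the two predictions equals the slope. The values 3000 and 3001 below are arbitrary illustrative inputs within the range of the data:

```r
sevenday <- c(2300, 3390, 2430, 2890, 3330, 2480, 3380, 2660, 2620, 3340)
month <- c(4070, 5220, 4640, 4620, 4850, 4120, 5020, 4890, 4190, 4630)
lm2 <- lm(month ~ sevenday)

# Predicted 28-day strength at two 7-day strengths one unit apart
p <- predict(lm2, newdata = data.frame(sevenday = c(3000, 3001)))
diff(p)   # equals the slope, about 0.6764
```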

Part g.

Finding standard deviation.

sigma(lm2)
## [1] 271.0423

The estimated standard deviation of the error term (the residual standard error) is 271.0423.
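sigma() reports the square root of MSE = SSE / (n - 2). The model uses n - 2 degrees of freedom because the intercept and slope are both estimated. This sketch reproduces the value by hand:

```r
sevenday <- c(2300, 3390, 2430, 2890, 3330, 2480, 3380, 2660, 2620, 3340)
month <- c(4070, 5220, 4640, 4620, 4850, 4120, 5020, 4890, 4190, 4630)
lm2 <- lm(month ~ sevenday)

sse <- sum(resid(lm2)^2)   # error sum of squares
n <- length(month)
mse <- sse / (n - 2)       # two parameters estimated, so n - 2 degrees of freedom
sqrt(mse)                  # same as sigma(lm2), about 271.0423
```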

Part h.

Making a histogram and superimposing a density plot.
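No code appears for this part. The following is a minimal base-R sketch. It assumes, as in Question 3 part g, that the histogram is of the residuals from lm2. The plot title is a placeholder:

```r
sevenday <- c(2300, 3390, 2430, 2890, 3330, 2480, 3380, 2660, 2620, 3340)
month <- c(4070, 5220, 4640, 4620, 4850, 4120, 5020, 4890, 4190, 4630)
lm2 <- lm(month ~ sevenday)
r <- resid(lm2)

# Kernel density estimate of the residuals, and histogram bins (not drawn yet)
d <- density(r)
h <- hist(r, plot = FALSE)

# freq = FALSE draws densities instead of counts, so the curve shares the y-axis;
# the axis limits are widened so the density curve is not clipped
plot(h, freq = FALSE, col = "steelblue", border = "black",
     xlim = range(h$breaks, d$x), ylim = c(0, max(h$density, d$y)),
     main = "Histogram of Residuals (lm2)", xlab = "Residuals")
lines(d)
```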

Question 3

Part a.

Computing the mean and variance.

righthumerus <- c(24.8, 24.59, 24.59, 24.29, 23.81, 24.87, 25.9, 26.11, 26.63, 26.31, 26.84)
righttibia <- c(36.05, 35.57, 35.57, 34.58, 34.2, 34.73, 37.38, 37.96, 37.46, 37.75, 38.5)

mean(righthumerus)
## [1] 25.34
mean(righttibia)
## [1] 36.34091
var(righthumerus)
## [1] 1.08424
var(righttibia)
## [1] 2.315329

The mean of the right humerus is 25.34 and the variance is 1.08424. The mean of the right tibia is 36.34091 and the variance is 2.315329.

Part b.

Computing the correlation coefficient

cor(righthumerus, righttibia)
## [1] 0.9513161

The correlation is .9513161. Because it is very close to 1, it indicates a strong, positive linear relationship.

Part c.

Fitting a simple linear regression on the data.

lm1 <- lm(formula = righttibia~righthumerus)

plot(righthumerus, righttibia, main="Plotting right tibia length against right humerus length",
     xlab="Right Humerus length", ylab="Right Tibia length", pch=19)

# Draw the fitted regression line from lm1
abline(lm1, col="red")

Part d.

The fitted intercept is 1.114 and the slope is 1.390.
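These values are the rounded coefficients of lm1. Here is a short sketch that recovers them:

```r
righthumerus <- c(24.8, 24.59, 24.59, 24.29, 23.81, 24.87, 25.9, 26.11, 26.63, 26.31, 26.84)
righttibia <- c(36.05, 35.57, 35.57, 34.58, 34.2, 34.73, 37.38, 37.96, 37.46, 37.75, 38.5)

lm1 <- lm(righttibia ~ righthumerus)
round(coef(lm1), 3)   # intercept about 1.114, slope about 1.390
```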

Part e.

For every 1-unit increase in right humerus length, the length of the right tibia is expected to increase by 1.390 units.

Part f.

The estimated standard deviation of the errors around the regression line (the residual standard error) is .4943579.

sigma(lm1)
## [1] 0.4943579

Part g.

Histogram of residuals with density curve

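library(ggplot2)
# Store the residuals in a data frame instead of passing the lm object to ggplot()
resid_df <- data.frame(residuals = resid(lm1))
ggplot(data = resid_df, aes(x = residuals)) +
  # Density scale (not counts) so the histogram and density curve share a y-axis;
  # a small bin count suits only 11 residuals
  geom_histogram(aes(y = after_stat(density)), bins = 6, fill = 'steelblue', color = 'black') +
  geom_density() +
  labs(title = 'Tristan Tucker Histogram of Residuals', x = 'Residuals', y = 'Density')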

Question 4

Part a.

Plotting the data with a scatterplot

HW1.ST430.119 <- read.table("~/HW1 ST430/HW1 ST430 119.txt", quote="\"", comment.char="")
plot(HW1.ST430.119, main="Tristan Tucker - Plotting GPA Data",
     xlab="GPA (V1)", ylab="ACT score (V2)", pch=19)

Part b.

Plotting data with linear regression line on plot.
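No code appears for this part. The following is a minimal sketch. It assumes HW1.ST430.119 has columns V1 (GPA) and V2 (ACT score), as part g indicates. plot_with_fit is a hypothetical helper name, and it fits the same V2 ~ V1 model used for lm3:

```r
# Hypothetical helper: scatterplot of V2 against V1 with the least-squares line.
# Assumes a data frame with numeric columns V1 and V2, like the one read in part a.
plot_with_fit <- function(df) {
  fit <- lm(V2 ~ V1, data = df)
  plot(df$V1, df$V2, pch = 19,
       main = "Tristan Tucker - GPA Data with Fitted Line",
       xlab = "GPA (V1)", ylab = "ACT score (V2)")
  abline(fit, col = "red")   # draw the fitted intercept and slope
  invisible(fit)             # return the model without printing it
}

# Usage, after part a has created HW1.ST430.119:
# lm3 <- plot_with_fit(HW1.ST430.119)
```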

Part c.

Running a linear regression model.

lm3 <- lm(V2 ~ V1, data = HW1.ST430.119)

The intercept is 18.98 and the slope is 1.87, so the fitted line is y-hat = 18.98 + 1.87x.

Part d.

When x increases by one unit (from 1 to 2), the point estimate increases by 1.87 units. When x increases by four units (from 1 to 5), the point estimate increases by 4 * 1.87 = 7.48 units.

1.87*1+18.98
## [1] 20.85
1.87*2+18.98
## [1] 22.72
1.87*5+18.98
## [1] 28.33

Part e.

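Solving the fitted line for x at an ACT score of 20 gives x = (20 - 18.98)/1.87 ≈ .545. On that basis, the estimated GPA of a student who scores 20 on the ACT is .545. Because lm3 was fit with ACT score as the response, this inverts the model, so the estimate should be treated with caution.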
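Because lm3 uses ACT score (V2) as the response, the estimate in part e comes from solving the fitted line for x. A more direct approach fits GPA (V1) as the response and predicts at ACT = 20. The sketch below assumes V1 is GPA and V2 is ACT score, as part g indicates. predict_gpa is a hypothetical helper, and no output is shown because the result depends on the data file:

```r
# Hypothetical helper: estimate mean GPA at a given ACT score by fitting
# GPA (V1) as the response and ACT score (V2) as the predictor.
predict_gpa <- function(df, act) {
  fit <- lm(V1 ~ V2, data = df)
  unname(predict(fit, newdata = data.frame(V2 = act)))
}

# Usage, after part a has created HW1.ST430.119:
# predict_gpa(HW1.ST430.119, act = 20)
```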

Part f.

The first three residuals are e1 = -5.26420719, e2 = -12.24176295, and e3 = 1.95836484.

resid(lm3)
##            1            2            3            4            5            6 
##  -5.26420719 -12.24176295   1.95836484  -1.72613786  -3.63887023   4.79564411 
##            7            8            9           10           11           12 
##   7.48457308   0.61609020   9.08938274   1.08057678  -1.16630985   4.40724962 
##           13           14           15           16           17           18 
##  -0.74173966  -0.61081494   7.95526312   2.48270273  -0.56282473   6.38918506 
##           19           20           21           22           23           24 
##   0.51262838  -2.93684879   0.23084402  -4.25421645   2.04627144   0.68342292 
##           25           26           27           28           29           30 
##   2.37358326   1.22459398   4.49830453  -1.79908163  -0.21557801  -3.69872154 
##           31           32           33           34           35           36 
##  -1.32031779  -7.41752957   3.54629474   0.71211719  -2.31903983  -0.72116578 
##           37           38           39           40           41           42 
##  -4.29910492   4.87109722   1.38667573   0.20525147   1.45464743   6.06064188 
##           43           44           45           46           47           48 
##   0.27386214   2.81626458   6.15479852   0.12984494   7.94157825  -7.71614711 
##           49           50           51           52           53           54 
##  -5.49672339  -8.09400505   2.65477522  -9.24486468   1.84304192   1.50327661 
##           55           56           57           58           59           60 
##  -1.60958356   4.12108557  -3.88575686  -3.07899564   4.80937556   2.66284903 
##           61           62           63           64           65           66 
##   0.55938721  -1.05532003  -0.27853104  -2.71993440   6.08308611  -2.56154677 
##           67           68           69           70           71           72 
##   0.05877152   6.07806745  -4.35329858  -2.55160261  -4.65944412  -1.52290833 
##           73           74           75           76           77           78 
##  -5.61765737  -6.02539437   2.69277468  -4.27602171  -2.00171876   1.25638999 
##           79           80           81           82           83           84 
##   1.54314642   8.45774916  -5.19062444  -4.72116578   3.13609498   5.77694058 
##           85           86           87           88           89           90 
##   0.02446618   2.56312792   1.76822779   2.62544197  -7.09774575  -2.66254585 
##           91           92           93           94           95           96 
##  -2.24363331   3.94157825  -0.38888188  -6.63138882   3.01073473  -2.14450459 
##           97           98           99          100          101          102 
##   2.20214974  -3.35270619  -5.11330098  -5.29782697   2.58123905  -3.93620981 
##          103          104          105          106          107          108 
##  -4.36018760   6.30812090  -0.42564997   9.07432674   0.78757030   2.25201030 
##          109          110          111          112          113          114 
##   1.93217990   0.66225664  -3.37017835   4.29128772  -3.45306628  -3.25106814 
##          115          116          117          118          119          120 
##   9.24521445  -6.24176295   2.91721707   1.70399680  -6.45429766   3.51075802
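Each residual is the observed response minus its fitted value, e_i = y_i - y-hat_i. The GPA data file is not reproduced here, so this sketch illustrates the identity on the Question 3 bone-length data instead:

```r
righthumerus <- c(24.8, 24.59, 24.59, 24.29, 23.81, 24.87, 25.9, 26.11, 26.63, 26.31, 26.84)
righttibia <- c(36.05, 35.57, 35.57, 34.58, 34.2, 34.73, 37.38, 37.96, 37.46, 37.75, 38.5)
lm1 <- lm(righttibia ~ righthumerus)

# Each residual is the observed response minus the fitted value
e <- righttibia - fitted(lm1)
all.equal(e, resid(lm1))   # should be TRUE
e[1:3]                     # first three residuals, analogous to e1, e2, e3
```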

Part g.

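Plugging the mean GPA, X bar = 3.07405, into the rounded fitted line gives a predicted ACT score of 1.87(3.07405) + 18.98 = 24.7284735. A least-squares line always passes through (X bar, Y bar), so the prediction from the unrounded coefficients is exactly Y bar = 24.725. The small difference comes from rounding the intercept and slope.

X bar is 3.07405 and Y bar is 24.725: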

mean(HW1.ST430.119$V1)
## [1] 3.07405
mean(HW1.ST430.119$V2)
## [1] 24.725
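The property used in part g holds for any least-squares line with an intercept. This sketch checks it on the Question 2 strength data, where mean(month) is 4625:

```r
sevenday <- c(2300, 3390, 2430, 2890, 3330, 2480, 3380, 2660, 2620, 3340)
month <- c(4070, 5220, 4640, 4620, 4850, 4120, 5020, 4890, 4190, 4630)
lm2 <- lm(month ~ sevenday)

# The least-squares line passes through (x-bar, y-bar):
# predicting at mean(sevenday) returns mean(month), up to floating-point error
p <- predict(lm2, newdata = data.frame(sevenday = mean(sevenday)))
all.equal(unname(p), mean(month))   # should be TRUE
```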
