Assignment summary

Chapter 14: Even-numbered questions 2–18, plus question 19.
Chapter 15: Questions 1–5.

Chapter 14

Question 2:

Q: The formula for a regression equation based on a sample size of 25 observations is Y’ = 2X + 9.

  1. What would be the predicted score for a person scoring 6 on X?
  2. If someone’s predicted score was 14, what was this person’s score on X?

A: (a) With X = 6, we can simply calculate our Y as 2(6) + 9 = 21. (b) Reversing course, if our predicted value of Y = 14, then we know our X = (14 - 9)/2 = 2.5
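A quick sanity check in R (the helper names here are ours, purely for illustration):

predicted_y <- function(x) 2 * x + 9            # Y' = 2X + 9
predicted_y(6)                                  # 21
x_from_y <- function(y_pred) (y_pred - 9) / 2   # invert the equation
x_from_y(14)                                    # 2.5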

Question 4:

Q: What does the standard error of the estimate measure? What is the formula for the standard error of the estimate?

A: The standard error of the estimate measures the accuracy of our predictions: roughly, the typical size of a prediction error. The formula for the standard error of the estimate is sigma_est = sqrt[ sum((Y - Y’)^2) / N ], where Y - Y’ is the difference between each actual and predicted Y, and N is our sample size.
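A minimal R sketch of that formula (our own helper, following the book's definitional version, which divides by N; some texts divide by N - 2 for the inferential version):

# standard error of the estimate: sqrt of the mean squared prediction error
se_est <- function(Y, Y_pred) sqrt(sum((Y - Y_pred)^2) / length(Y))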

Question 6:

Q: For the X,Y data below, compute:

  1. r and determine if it is significantly different from zero.
  2. the slope of the regression line and test if it differs significantly from zero.
  3. the 95% confidence interval for the slope.
X <- c(2,4,4,5,6)
Y <- c(5,6,7,11,12)
XY <- cbind(X,Y)
XY
##      X  Y
## [1,] 2  5
## [2,] 4  6
## [3,] 4  7
## [4,] 5 11
## [5,] 6 12
XY.LM <- lm(Y ~ X)
## 
## Call:
## lm(formula = Y ~ X)
## 
## Coefficients:
## (Intercept)            X  
##      0.1818       1.9091
summary(XY.LM)
## 
## Call:
## lm(formula = Y ~ X)
## 
## Residuals:
##       1       2       3       4       5 
##  1.0000 -1.8182 -0.8182  1.2727  0.3636 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.1818     2.2234   0.082   0.9400  
## X             1.9091     0.5048   3.782   0.0324 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.497 on 3 degrees of freedom
## Multiple R-squared:  0.8266, Adjusted R-squared:  0.7688 
## F-statistic:  14.3 on 1 and 3 DF,  p-value: 0.0324

A: (a) Our “un-adjusted” R^2 value is 0.8266, so r = sqrt(0.8266) = 0.9092, meaning our predictor explains the outcome very well. Testing r with t = r*sqrt(n-2)/sqrt(1-r^2) = 3.78 on n - 2 = 3 degrees of freedom gives p = 0.0324, so r is significantly different from zero at the .05 level. (b) The slope of our linear model is 1.9091 (0.1818 is the intercept); with a p-value of 0.0324 it too differs significantly from zero. (c) The 95% confidence interval for the slope is the estimate plus or minus t(0.975, df = 3) = 3.182 standard errors: 1.9091 +/- 3.182(0.5048), leaving us with [0.303, 3.515].
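R confirms both the correlation test and the interval directly (base-R functions, shown here as a check):

cor.test(X, Y)          # t = 3.78, df = 3, p-value ~0.032 (same test as the slope)
confint(XY.LM, "X")     # 95% CI for the slope: roughly 0.30 to 3.52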

Question 8:

Q: The correlation between years of education and salary in a sample of 20 people from a certain company is .4. Is this correlation statistically significant at the .05 level?

A: To find out whether the correlation is statistically significant I can calculate my t-stat and compare it to the critical value for df = n - 2 = 18 (equivalently, compare its p-value to alpha = 0.05). Here t-stat = [r * sqrt(n-2)] / sqrt(1-r^2) = [0.4 * sqrt(18)] / sqrt(0.84) = 1.8516.

Because my t-stat of 1.8516 is below the two-tailed critical value t(0.975, 18) = 2.101 (p = 0.08 > 0.05), our correlation is not statistically significant at the .05 level.
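The same test in R, with the numbers plugged in (a minimal sketch):

r <- 0.4
n <- 20
t.stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
t.stat                              # 1.8516
2 * pt(-abs(t.stat), df = n - 2)    # p ~0.08, above .05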

Question 10:

Q: Using linear regression, find the predicted post-test score for someone with a score of 43 on the pre-test.

Pre <- c(59,52,44,51,42,42,41,45,27,63,54,44,50,47,55,49,45,57,46,60,65,64,50,74,59)
Post <- c(56,63,55,50,66,48,58,36,13,50,81,56,64,50,63,57,73,63,46,60,47,73,58,85,44)
Pre.Post <- cbind(Pre,Post)
Pre.Post
##       Pre Post
##  [1,]  59   56
##  [2,]  52   63
##  [3,]  44   55
##  [4,]  51   50
##  [5,]  42   66
##  [6,]  42   48
##  [7,]  41   58
##  [8,]  45   36
##  [9,]  27   13
## [10,]  63   50
## [11,]  54   81
## [12,]  44   56
## [13,]  50   64
## [14,]  47   50
## [15,]  55   63
## [16,]  49   57
## [17,]  45   73
## [18,]  57   63
## [19,]  46   46
## [20,]  60   60
## [21,]  65   47
## [22,]  64   73
## [23,]  50   58
## [24,]  74   85
## [25,]  59   44
Pre.Post.LM <- lm(Post ~ Pre)
Pre.Post.LM
## 
## Call:
## lm(formula = Post ~ Pre)
## 
## Coefficients:
## (Intercept)          Pre  
##     16.1552       0.7869
summary(Pre.Post.LM)
## 
## Call:
## lm(formula = Post ~ Pre)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.401  -6.351   2.288   6.486  22.354 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  16.1552    13.5774   1.190  0.24624   
## Pre           0.7869     0.2596   3.032  0.00593 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.61 on 23 degrees of freedom
## Multiple R-squared:  0.2855, Adjusted R-squared:  0.2544 
## F-statistic: 9.191 on 1 and 23 DF,  p-value: 0.005933

A: The results indicate our predictor coefficient is 0.7869 (m) and our y-intercept is 16.1552 (b). Given the equation of a line y = mx + b, if our x is 43, then our predicted post-test score is 0.7869(43) + 16.1552 = 49.9919

Pre.Post.DF <- data.frame(Pre,Post)
library(ggplot2)   # needed for ggplot(); assumed installed
ggplot(Pre.Post.DF, aes(x=Pre, y=Post)) + 
   geom_point(size = 2, shape = 23) + 
   geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

And looking at our plotted data we can see the predictive model seems to fit.
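The same prediction can also be read off the fitted model with predict(), as a check on the hand calculation:

predict(Pre.Post.LM, newdata = data.frame(Pre = 43))   # ~49.99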

Question 12:

Q: Based on the table below, compute the regression line that predicts Y from X.

##      Mx My  sX sY    r
## [1,] 10 12 2.5  3 -0.6

A: For a linear model our equation y = bx + A can be filled in using the means (Mx, My), standard deviations (sX, sY), and correlation (r) above.

# plug in the table values
Mx <- 10; My <- 12; sX <- 2.5; sY <- 3; r <- -0.6
# The slope (b) can be calculated as follows:
b <- r * sY / sX    # -0.6 * 3 / 2.5 = -0.72
# The intercept (A) can be calculated as:
A <- My - b * Mx    # 12 - (-0.72)(10) = 19.2

This yields the regression line y = -0.72*x + 19.2

Question 14:

Q: True/false: If the slope of a simple linear regression line is statistically significant, then the correlation will also always be significant.

A: True. In simple linear regression the test of the slope and the test of the correlation are the same test: t = r*sqrt(n-2)/sqrt(1-r^2) for the correlation is identical to b/SE(b) for the slope, with the same n - 2 degrees of freedom. Question 6 above illustrates this: both tests produced p = 0.0324. So if the slope is significant, the correlation will always be significant as well.

Question 16:

Q: True/false: If the correlation is .8, then 40% of the variance is explained.

A: False. The correlation is r, and the proportion of variance explained is r squared. In this case that would be 0.8^2 = 0.64, so 64% of the variance is explained.

The following questions use data from the Angry Moods (AM) case study.

Question 18:

Q: Find the regression line for predicting Anger-Out from Control-Out.

  1. What is the slope?
  2. What is the intercept?
  3. Is the relationship at least approximately linear?
  4. Test to see if the slope is significantly different from 0.
  5. What is the standard error of the estimate?

A:

q16file <- read.csv(file = "angry_moods.csv", header = TRUE)
AI.AO.LM <- lm(q16file$Anger.Out ~ q16file$Control.Out)
AI.AO.LM
## 
## Call:
## lm(formula = q16file$Anger.Out ~ q16file$Control.Out)
## 
## Coefficients:
##         (Intercept)  q16file$Control.Out  
##             28.4948              -0.5241
summary(AI.AO.LM)
## 
## Call:
## lm(formula = q16file$Anger.Out ~ q16file$Control.Out)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.488 -2.440 -0.295  2.193 10.560 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         28.49482    2.02477   14.07  < 2e-16 ***
## q16file$Control.Out -0.52413    0.08386   -6.25 2.18e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.45 on 76 degrees of freedom
## Multiple R-squared:  0.3395, Adjusted R-squared:  0.3308 
## F-statistic: 39.07 on 1 and 76 DF,  p-value: 2.183e-08
q16file.DF <- data.frame(q16file)
Control.Out <- q16file$Control.Out
Anger.Out <- q16file$Anger.Out
ggplot(q16file.DF, aes(x=Control.Out, y=Anger.Out)) + 
   geom_point(size = 2, shape = 23) + 
   geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

  1. The slope is -0.5241
  2. The intercept is 28.4948
  3. The scatterplot looks at least approximately linear: the points show no obvious curvature around the fitted line
  4. The slope's t value is -6.25 with a p-value of 2.18e-08, well below the 0.05 threshold, so we can be confident it is significantly different from 0
  5. The standard error of the estimate is the residual standard error, 3.45 on 76 degrees of freedom; 0.08386 is the standard error of the slope, a different quantity (see the check below)
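For (5), the residual standard error reported by summary() can be reproduced directly from the residuals:

sqrt(sum(resid(AI.AO.LM)^2) / df.residual(AI.AO.LM))   # ~3.45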

The following question is from the SAT and GPA (SG) case study.

Question 19:

Q: Find the regression line for predicting the overall university GPA from the high school GPA.

  1. What is the slope?
  2. What is the y-intercept?
  3. If someone had a 2.2 GPA in high school, what is the best estimate of his or her college GPA?
  4. If someone had a 4.0 GPA in high school, what is the best estimate of his or her college GPA?

A:

q19file <- read.csv(file = "sat.csv", header = TRUE)
univ_GPA <- q19file$univ_GPA
high_GPA <- q19file$high_GPA
UGP.HGP.LM <- lm(univ_GPA ~ high_GPA)
UGP.HGP.LM
## 
## Call:
## lm(formula = univ_GPA ~ high_GPA)
## 
## Coefficients:
## (Intercept)     high_GPA  
##      1.0968       0.6748
summary(UGP.HGP.LM)
## 
## Call:
## lm(formula = univ_GPA ~ high_GPA)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.69040 -0.11922  0.03274  0.17397  0.91278 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.09682    0.16663   6.583 1.98e-09 ***
## high_GPA     0.67483    0.05342  12.632  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2814 on 103 degrees of freedom
## Multiple R-squared:  0.6077, Adjusted R-squared:  0.6039 
## F-statistic: 159.6 on 1 and 103 DF,  p-value: < 2.2e-16
q19file.DF <- data.frame(q19file)
ggplot(q19file.DF, aes(x=high_GPA, y=univ_GPA)) + 
   geom_point(size = 2, shape = 23) + 
   geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

  1. The slope is 0.6748
  2. The intercept is 1.0968
  3. If someone had a 2.2 GPA in high school, the estimated university GPA would be 1.0968 + 0.6748(2.2) = 2.58136
  4. If someone had a 4.0 GPA in high school, the estimated university GPA would be 1.0968 + 0.6748(4.0) = 3.796 (both estimates are verified with predict() below)
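As a check on those two hand calculations:

predict(UGP.HGP.LM, newdata = data.frame(high_GPA = c(2.2, 4.0)))   # ~2.58 and ~3.80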

Chapter 15

Question 1:

Q: What is the null hypothesis tested by analysis of variance?

A: “Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more means.” The null hypothesis is that all of the population means are equal, i.e. that there are no differences between the means.

Question 2:

Q: What are the assumptions of between-subjects analysis of variance?

A: The assumptions are as follows from the book:
  1. The populations have the same variance. This assumption is called the assumption of homogeneity of variance.
  2. The populations are normally distributed.
  3. Each value is sampled independently from each other value. This assumption requires that each subject provide only one value. If a subject provides two scores, then the values are not independent. The analysis of data with two scores per subject is shown in the section on within-subjects ANOVA later in this chapter.

Question 3:

Q: What is a between-subjects variable?

A: A between-subjects variable is one for which each subject experiences only one condition (level) of the variable. E.g. in our Smiles and Leniency case study there were four conditions, but only one was assigned to each subject.

Question 4:

Q: Why not just compute t-tests among all pairs of means instead of computing an analysis of variance?

A: Computing t-tests among all pairs of means inflates the Type I error rate: each test carries its own chance of a false positive, so across many comparisons the probability of at least one spurious “significant” difference climbs well above the nominal .05. ANOVA tests all the means with a single test, holding the overall Type I error rate at alpha. (Avoiding many pairwise calculations on a large data set is also cheaper, but error control is the main reason.)
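A rough sketch of that inflation, assuming independent tests at alpha = .05 (pairwise t-tests on shared data are not fully independent, so this is an approximation):

alpha <- 0.05
k <- choose(4, 2)      # e.g. 6 pairwise comparisons among 4 group means
1 - (1 - alpha)^k      # ~0.26 chance of at least one false positive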

Question 5:

Q: What is the difference between “N” and “n”?

A: In this chapter's notation, lowercase n is the number of subjects in each group (the per-condition sample size), while capital N is the total number of observations across all groups.