Homework 2

You can complete this homework by filling the rest of the .Rmd document. When you click the Knit button in RStudio a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.

Problem 1

The NAEP music assessment scores for eighth-grade students are approximately \(N(150,35)\). Find z-scores by standardizing the following scores: 150, 140, 100, 180, 230.

mean = 150
sd = 35
zscore1 = (150-150)/35
zscore1

## [1] 0

zscore2 = (140-150)/35
zscore2

## [1] -0.2857143

zscore3= (100-150)/35
zscore3

## [1] -1.428571

zscore4= (180-150)/35
zscore4

## [1] 0.8571429

zscore5= (230-150)/35
zscore5

## [1] 2.285714
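
The five standardizations above can also be done in one vectorized step. This is only a sketch: the names `scores`, `mu`, `sigma`, and `z` are illustrative and not part of the assignment template.

```r
# Illustrative names; mu and sigma avoid masking R's built-in mean() and sd()
mu <- 150
sigma <- 35
scores <- c(150, 140, 100, 180, 230)

# Standardize every score at once: z = (x - mu) / sigma
z <- (scores - mu) / sigma
round(z, 3)
```

Subtracting the mean from the whole vector keeps the same mean in every calculation, which guards against typing a different value in one of the lines.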

Problem 2

Many random number generators allow users to specify the range of the random numbers to be produced. Suppose that you specify that the outcomes are to be distributed uniformly between 0 and 4. Then the density curve of the outcomes has constant height between 0 and 4, and height 0 elsewhere.

What is the height of the density curve between 0 and 4? Draw a graph of the density curve. (Hint: You can use the following r code to generate the graph. But you need to replace the value of \(a\) by the height of the density curve between 0 and 4 (current value is not correct) and remove the argument “eval=FALSE” to show the graph.)

grid = seq(0,4)
height = dunif(grid, min=0, max=4)  # uniform density on [0, 4], not a Normal density
a=.25  # height of the density curve between 0 and 4: 1/(4-0)
plot(a,type="n",xlim=c(-1,5), ylim=c(0,a),xlab="Values",ylab="Density")
segments(-1,0,0,0,col="red")
segments(0,0,0,a,col="red")
segments(0,a,4,a,col="red")
segments(4,a,4,0,col="red")
segments(4,0,5,0,col="red")

Use your graph from (a) and the fact that areas under the curve are proportions of outcomes to find the proportion of outcomes that are less than 1.

25 % of outcomes will be less than 1.

Find the proportion of outcomes that lie between 0.5 and 2.5.

50 % of outcomes will be between .5 and 2.5
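
The two proportions are areas of rectangles (width times height), and they can be checked with R's built-in uniform distribution functions. This is a minimal sketch; the variable names are illustrative.

```r
# Uniform(0, 4): the density height is 1 / (max - min)
height <- dunif(2, min = 0, max = 4)

# P(X < 1): area of a rectangle with width 1
p_less_1 <- punif(1, min = 0, max = 4)

# P(0.5 < X < 2.5): area of a rectangle with width 2
p_between <- punif(2.5, min = 0, max = 4) - punif(0.5, min = 0, max = 4)

c(height = height, p_less_1 = p_less_1, p_between = p_between)
```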

Problem 3

Using either Table A or your calculator or software, find the proportion of observations from a standard Normal distribution for each of the following events.

\(Z>1.6\)

The proportion of observations is 1 - .9452 = .0548, or 5.48 %.

\(-1.6 \leq Z<1.8\)

The proportion of observations is .9641 - .0548 = .9093.
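
As an alternative to Table A, the same proportions can be computed with `pnorm()`. This is a sketch with illustrative variable names; software values may differ from the table in the fourth decimal place.

```r
# P(Z > 1.6): upper-tail area of the standard Normal distribution
p_above <- pnorm(1.6, lower.tail = FALSE)

# P(-1.6 <= Z < 1.8): area below 1.8 minus area below -1.6
p_between <- pnorm(1.8) - pnorm(-1.6)

round(c(p_above = p_above, p_between = p_between), 4)
```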

Find the value \(z\) of a standard Normal variable \(Z\) that satisfies each of the following conditions. (If you use Table A, report the value of \(z\) that comes closest to satisfying the condition.)

30% of the observations fall below \(z\).

-.52

62% of the observations fall above \(z\).

-.31
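
These inverse lookups can also be done with `qnorm()`, which returns the value of \(z\) with a given area below it. This is a sketch with illustrative names; `lower.tail = FALSE` handles the "above" case, where the stated area lies to the right of \(z\).

```r
# 30% of the observations fall below z
z_below_30 <- qnorm(0.30)

# 62% of the observations fall above z (equivalently, 38% fall below z)
z_above_62 <- qnorm(0.62, lower.tail = FALSE)

round(c(z_below_30 = z_below_30, z_above_62 = z_above_62), 2)
```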

Problem 4

Osteoporosis is a condition where bones become weak. Exercise is one way to produce strong bones and to prevent osteoporosis. Since we use our dominant arm (the right arm for most people) more than our nondominant arm, we expect the bone in our dominant arm to be stronger than the bone in our nondominant arm. By comparing the strengths, we can get an idea of the effect that exercise can have on bone strength. Here are some data on the strength of bones, measured in \(cm^4/1000\), for the arms of 15 young men:

bonestrength <- data.frame(Nondominant=c(15.7, 25.2, 17.9, 19.1, 12.0, 20.0, 12.3, 14.4, 15.9, 13.7, 17.7, 15.5, 14.4, 14.1, 12.3),Dominant=c(16.3, 26.9, 18.7, 22.0, 14.8, 19.8, 13.1, 17.5, 20.1, 18.7, 18.7, 15.2, 16.2, 15.0, 12.9))
print(bonestrength)

##    Nondominant Dominant
## 1         15.7     16.3
## 2         25.2     26.9
## 3         17.9     18.7
## 4         19.1     22.0
## 5         12.0     14.8
## 6         20.0     19.8
## 7         12.3     13.1
## 8         14.4     17.5
## 9         15.9     20.1
## 10        13.7     18.7
## 11        17.7     18.7
## 12        15.5     15.2
## 13        14.4     16.2
## 14        14.1     15.0
## 15        12.3     12.9

Make a scatterplot of the data with the nondominant arm strength on the \(x\) axis and the dominant arm strength on the \(y\) axis.

plot(bonestrength, xlab = "Nondominant arm", ylab = "Dominant arm")

Describe the overall pattern in the scatterplot and any striking deviations from the pattern.

As strength in the nondominant arm increases, so does strength in the dominant arm, so the two variables have a positive association. There seems to be a moderate linear relationship between them. There is one clear outlier in the upper right of the graph, far beyond the rest of the data.

Describe the form, direction, and strength of the relationship.

The form is a linear relationship.

The direction is positive.

The strength is moderate.

What is the correlation between the nondominant arm strength and the dominant arm strength? Is correlation a good summary of the relationship between them?

We know from the scatterplot that there is a positive correlation, and I would rate its strength as moderate. I think that correlation alone can be misleading because correlation is not causation: we cannot be certain that strength in the nondominant arm leads to a stronger dominant arm. Other factors can be at play!
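
The question asks for a numerical correlation, which `cor()` can compute. The sketch below rebuilds the data frame from the problem statement so that it runs on its own. The name `r_all` and the outlier check with `r_trimmed` are illustrative additions, not part of the assignment.

```r
# Data copied from the problem statement
bonestrength <- data.frame(
  Nondominant = c(15.7, 25.2, 17.9, 19.1, 12.0, 20.0, 12.3, 14.4,
                  15.9, 13.7, 17.7, 15.5, 14.4, 14.1, 12.3),
  Dominant    = c(16.3, 26.9, 18.7, 22.0, 14.8, 19.8, 13.1, 17.5,
                  20.1, 18.7, 18.7, 15.2, 16.2, 15.0, 12.9)
)

# Pearson correlation between the two arm strengths
r_all <- cor(bonestrength$Nondominant, bonestrength$Dominant)

# Same correlation without the largest observation (row 2), to see how
# much a single extreme point influences the summary
r_trimmed <- cor(bonestrength$Nondominant[-2], bonestrength$Dominant[-2])

c(r_all = r_all, r_trimmed = r_trimmed)
```

Comparing the two values is one way to judge whether correlation is a fair summary here, since a single point far from the rest can inflate or deflate it.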

Problem 5

The following 20 observations on \(Y\) and \(X\) were generated by a computer program.

##        Y     X
## 1  25.66 22.06
## 2  19.53 19.88
## 3  20.59 18.83
## 4  20.50 22.09
## 5  22.65 17.19
## 6  21.88 20.72
## 7  18.25 18.10
## 8  16.96 18.01
## 9  19.52 18.69
## 10 20.52 18.05
## 11 16.80 17.75
## 12 21.35 19.96
## 13 21.04 17.87
## 14 22.73 20.20
## 15 22.02 20.65
## 16 19.12 20.32
## 17 25.19 21.37
## 18 16.72 17.31
## 19 23.59 23.50
## 20 19.76 22.02

Make a scatterplot and describe the relationship between \(Y\) and \(X\).

plot(gendata$X, gendata$Y, xlab = "X", ylab = "Y")

The relationship between X and Y is linear and positive. With a correlation of about 0.60, its strength is moderate rather than strong.

Find the equation of the least-squares regression line and add the line to your plot.

You would use these types of calculations to find the least-squares regression line:

xbar = mean(gendata$X)
sx = sd(gendata$X)
ybar = mean(gendata$Y)
sy = sd(gendata$Y)
r = cor(gendata$X, gendata$Y)
slope = r*sy/sx
intercept = ybar - slope*xbar

plot(gendata$X, gendata$Y, xlab = "X", ylab = "Y")
cor(gendata$X,gendata$Y)

## [1] 0.5967597

line <- lm(gendata$Y ~ gendata$X)  # response Y on the left, predictor X on the right
coef(line)

## (Intercept)   gendata$X
##   4.6696798   0.8135094

The least-squares regression line is \(\hat{y} = 4.67 + 0.8135x\).

abline(line, col = "red", lwd = 3)

What percent of the variability in \(Y\) is explained by \(X\)?

cor(gendata$X,gendata$Y)^2

## [1] 0.3561222

35.6 % of the variability in Y is explained by the linear relationship between X and Y.

If we get a new value \(x^*=21\), what is your predicted value of \(y\) based on the least-squares regression model in part (b)?

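If we were given a new x value of 21, the predicted y value (y-hat) would be about 21.75. This uses the least-squares line of Y on X, \(\hat{y} = 4.67 + 0.8135x\), so \(\hat{y} = 4.67 + 0.8135(21) \approx 21.75\).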
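
The prediction can be checked in R with `predict()`. The sketch below re-enters the 20 observations from the table so that it runs on its own. The names `fit`, `slope_by_hand`, and `pred` are illustrative. The model is written as `Y ~ X` with a `data` argument so that `predict()` can match the new value to the predictor by name.

```r
# Observations copied from the table in Problem 5
gendata <- data.frame(
  Y = c(25.66, 19.53, 20.59, 20.50, 22.65, 21.88, 18.25, 16.96, 19.52, 20.52,
        16.80, 21.35, 21.04, 22.73, 22.02, 19.12, 25.19, 16.72, 23.59, 19.76),
  X = c(22.06, 19.88, 18.83, 22.09, 17.19, 20.72, 18.10, 18.01, 18.69, 18.05,
        17.75, 19.96, 17.87, 20.20, 20.65, 20.32, 21.37, 17.31, 23.50, 22.02)
)

# Least-squares regression of Y (response) on X (predictor)
fit <- lm(Y ~ X, data = gendata)
coef(fit)

# The slope should agree with the hand formula r * sy / sx
slope_by_hand <- cor(gendata$X, gendata$Y) * sd(gendata$Y) / sd(gendata$X)

# Predicted y at the new value x* = 21
pred <- predict(fit, newdata = data.frame(X = 21))
pred
```

The order of the formula matters: `lm(X ~ Y)` fits a different line that predicts X from Y, and its coefficients cannot be used to predict y from a new x.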