1a.) Based on the scatter plot, is there a positive or negative association between the height of an athlete and the length of the jump?
Answer: There is a positive association since the data trend upwards, meaning that as height increases, jump distance also tends to increase.
1b.) Based on the scatter plot, state why using a linear regression equation is justified to assess the relationship between height and length of jump.
Answer: A linear regression equation is justified because the points in the scatter plot follow a roughly straight-line pattern, so a line is a reasonable summary of how the average jump length (Y) changes with height (X). Estimating the average value of Y at a given X is the goal of a simple linear regression.
1c.) Which of the following (pick one only) could be a possible value of the correlation coefficient between distance and height? Explain in a sentence or two. Choices are: -1, -0.7, 0, 0.1, 0.7, 1
Answer: Since the trend is upwards and the relationship is close to linear but not perfectly linear, 0.7 is a possible correlation coefficient.
Now say a simple linear regression model is fit to the data. The estimated regression equation is: \[ \hat{Y} = 6.4285 + 1.0534X \]
1d.) What type of variable is the response (categorical or quantitative)? How about the explanatory variable?
Answer: The response variable is the distance, which is quantitative. The explanatory variable is height, which is also quantitative.
1e.) What is the predicted length of the jump for an athlete who is 72 inches tall?
Answer: Based on the regression equation, the predicted length of jump for an athlete who is 72 inches tall is \(6.4285 + 1.0534(72) = 82.2733\), or about 82.3 inches.
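The prediction in 1e can be checked with a few lines of R. This is only a sketch: the variable names `b0`, `b1`, `height`, and `predicted_jump` are made up here, and the coefficients are copied from the estimated regression equation above.

```r
# Coefficients copied from the estimated regression equation
b0 <- 6.4285   # intercept, in inches
b1 <- 1.0534   # slope, inches of jump per inch of height

height <- 72   # athlete's height in inches

# Plug the height into the fitted line
predicted_jump <- b0 + b1 * height
predicted_jump
```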
1f.) Interpret what the 1.0534 represents.
Answer: 1.0534 is the slope of the regression equation. It is the estimated increase, in inches, in the average jump length (Y) for each 1-inch increase in height (X).
1g.) Does the intercept of 6.4285 inches have any useful interpretation to the coach?
Answer: No. The intercept is the predicted jump length for an athlete whose height is 0 inches, which is not possible.
1h.) Can the coach conclude that being taller causes an athlete to jump farther?
Answer: No. Since this is an observational study, no causal conclusions can be drawn.
1i.) The original units of measurement were Y=distance length in inches and X=height in inches. Now say the response variable is recorded in feet NOT inches (there are 12 inches in one foot). What will happen to the intercept estimate of 6.4285? Will it stay the same, increase, or decrease? Explain in a sentence or two.
Answer: Changing the response variable from inches to feet divides every Y value by 12, so the intercept represents the same quantity expressed in feet. The original estimate of 6.4285 inches becomes \(6.4285/12 \approx 0.5357\) feet, so the intercept estimate decreases.
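A short worked version of part i, writing \(Y_{ft} = Y/12\) for the response in feet (the symbol \(Y_{ft}\) is introduced here only for illustration). Dividing the fitted line by 12 rescales both coefficients:

\[ \hat{Y}_{ft} = \frac{\hat{Y}}{12} = \frac{6.4285}{12} + \frac{1.0534}{12}X \approx 0.5357 + 0.0878X \]

So the slope also shrinks by a factor of 12, not just the intercept.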
1j.) The original units of measurement were Y=distance length in inches and X=height in inches. Now say the explanatory variable is recorded in feet NOT inches (there are 12 inches in one foot). What will happen to the intercept estimate of 6.4285? Will it stay the same, increase, or decrease? Explain in a sentence or two.
Answer: Since we are changing the units of our explanatory variable, our original intercept estimate will not change. This is because the intercept estimate is the value at which we estimate the distance jumped when the athlete’s height is 0. Since 0 is the same in both inches and feet, our original intercept estimate will stay the same.
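A short worked version of part j, writing \(X_{ft} = X/12\) for height in feet (the symbol \(X_{ft}\) is introduced here only for illustration). Substituting \(X = 12X_{ft}\) into the fitted line gives:

\[ \hat{Y} = 6.4285 + 1.0534(12X_{ft}) = 6.4285 + 12.6408X_{ft} \]

The intercept is unchanged, while the slope is multiplied by 12 because a 1-foot change in height equals a 12-inch change.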
1k.) Continue with the situation in part j.). Let \(\rho_1\) be the correlation coefficient between distance in inches and height in inches. Let \(\rho_2\) be the correlation coefficient between distance in inches and height in feet. Is \(\rho_1\) equal to, less than, or greater than \(\rho_2\)? Explain in a sentence.
Answer: \(\rho_1 = \rho_2\). The correlation coefficient has no units and does not change when a variable is multiplied by a positive constant, so converting height from inches to feet leaves the relationship between height and distance, and therefore the correlation, the same.
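The claim in 1k can be illustrated with simulated data. This is a sketch only: the jump data are not available here, so `height_in` and `distance_in` below are made-up values generated for the demonstration, not the coach's data.

```r
# Hypothetical data, simulated only to illustrate the unit change
set.seed(1)
height_in   <- rnorm(30, mean = 70, sd = 3)                      # heights in inches
distance_in <- 6.4285 + 1.0534 * height_in + rnorm(30, sd = 4)   # jumps in inches

# Same heights, re-expressed in feet
height_ft <- height_in / 12

rho1 <- cor(distance_in, height_in)   # distance (in) vs height (in)
rho2 <- cor(distance_in, height_ft)   # distance (in) vs height (ft)

# The two correlations agree up to floating point error
all.equal(rho1, rho2)
```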
2.) Suppose researchers want to investigate if experiencing mother nature more often reduces a person's blood pressure. They define the response variable to be blood pressure (systolic), and the explanatory variable to be how many hours, on average, someone spends outdoors each day.
2a.) Could the researchers conduct a randomized experiment to test this? Explain briefly.
Answer: Yes, researchers could randomly assign individuals a certain number of hours to spend outdoors, then measure their blood pressure before and after the experiment.
2b.) Say an observational study was conducted, and it was found that the people who spend more time outdoors tended to have lower blood pressure. Could this be used to conclude that spending time outdoors causes blood pressure to decrease? Explain briefly.
Answer: No. Since this is an observational study, we cannot conclude a causal relationship.
2c.) In a few sentences, discuss why the number of miles (on average) someone jogs each day could be a possible confounder.
Answer: Miles jogged per day could be related to both variables. People who jog more miles are likely to spend more time outdoors, and the extra exercise may itself lower blood pressure. If so, an observed association between time outdoors and lower blood pressure could be due to jogging rather than to being outdoors.
For questions 3-6 refer to the MedGPA.txt data set on the class website. The MCAT (medical college admission test) is a test that is taken by students who want to attend medical school in the USA. The description of each variable in this data set is as follows:
• Accept Status: A=accepted to med school and D=denied
• Acceptance: 1=accepted and 0=denied
• Sex: F=female and M=male
• BCPM: Bio/Chem/Physics/Math grade point average
• GPA: College grade point average
• VR: Verbal reasoning subscore on the MCAT
• PS: Physical science subscore
• WS: Writing sample subscore
• BS: Biological science subscore
• MCAT: Score on the MCAT (sum of VR, PS, WS, and BS)
• Apps: Number of medical schools applied to
3.) Specify which of the variables listed are quantitative and which are categorical.
Answer: Accept Status: categorical, Acceptance: categorical, Sex: categorical, BCPM: quantitative, GPA: quantitative, VR: quantitative, PS: quantitative, WS: quantitative, BS: quantitative, MCAT: quantitative, Apps: quantitative
4.) In each part below, it will be stated what the goal of the study is. In each study, state what is the response variable and what is the explanatory variable (from the list above).
4a.) Do verbal reasoning scores differ on average for males and females?
Answer: response: VR, explanatory: Sex
4b.) Are equal proportions of males and females accepted into medical school?
Answer: response: Acceptance, explanatory: Sex
4c.) Is GPA a good predictor of MCAT scores?
Answer: response: MCAT, explanatory: GPA
4d.) Is BCPM a good predictor of whether or not someone gets accepted into med school?
Answer: response: Acceptance, explanatory: BCPM
mcat = read.table("/Users/admin/Downloads/Stats 110 Data/MedGPA.txt", fill=TRUE, header=TRUE)
5a.) Compute the mean and five number summary for GPA separately for those who were admitted and for those who were denied admission to med school.
Answer:
summary(mcat[mcat$Acceptance=="1",]$GPA)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.140 3.545 3.715 3.693 3.888 3.970
mean(mcat[mcat$Acceptance=="1",]$GPA)
## [1] 3.693333
summary(mcat[mcat$Acceptance=="0",]$GPA)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.290 3.380 3.385 3.610 3.770
mean(mcat[mcat$Acceptance=="0",]$GPA)
## [1] 3.3852
5b.) In a couple of sentences compare the GPAs for the two groups.
Answer: The mean GPA of those who were admitted is 3.693, while the mean GPA of those who were denied is 3.385. The median (3.715 versus 3.380) and every other value in the five number summary are also higher for the admitted group, so admitted applicants tended to have higher GPAs than denied applicants.
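As an aside, the per-group summaries in 5a can also be produced in a single call with `tapply`, which splits one column by the levels of another. The sketch below uses a small made-up data frame named `toy` so that it runs on its own; with the real data, `mcat$GPA` and `mcat$Acceptance` would take the place of the toy columns.

```r
# Hypothetical toy data standing in for the MedGPA columns
toy <- data.frame(
  Acceptance = c(1, 1, 1, 0, 0, 0),
  GPA        = c(3.9, 3.7, 3.5, 3.4, 3.2, 3.6)
)

# Mean GPA within each acceptance group (names are "0" and "1")
group_means <- tapply(toy$GPA, toy$Acceptance, mean)
group_means

# Five number summary plus the mean within each group
group_summaries <- tapply(toy$GPA, toy$Acceptance, summary)
group_summaries
```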
6a.) Find the estimated regression equation for predicting MCAT scores based on GPA. Write the equation using proper notation and also show the output from R.
Answer: \(\hat{Y} = 3.923 + 9.104X\), where \(\hat{Y}\) is the predicted MCAT score and \(X\) is GPA.
model = lm(MCAT~GPA, data=mcat)
summary(model)
##
## Call:
## lm(formula = MCAT ~ GPA, data = mcat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.4148 -2.5168 -0.1519 2.6653 8.6616
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.923 6.922 0.567 0.573
## GPA 9.104 1.942 4.688 1.97e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.088 on 53 degrees of freedom
## Multiple R-squared: 0.2931, Adjusted R-squared: 0.2798
## F-statistic: 21.98 on 1 and 53 DF, p-value: 1.969e-05
6b.) Write out the sum of squared errors (SSE, the objective function) that is going to be minimized with respect to the \(\beta\) coefficients to obtain our estimated regression equation. That is to say, write out the theoretical form of the SSE (sum of the residuals squared) using the \(\beta\) coefficients that is to be minimized to find the estimated regression line.
Answer: \(SSE = \sum_{i=1}^{n} \left(MCAT_i - (\beta_0 + \beta_1 \cdot GPA_i)\right)^2\), where the sum is over the \(n = 55\) students in the data set.
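For reference, minimizing this sum of squares over \(\beta_0\) and \(\beta_1\) has a standard closed-form solution. Writing \(\overline{GPA}\) and \(\overline{MCAT}\) for the sample means (notation introduced here for illustration), the least squares estimates are:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (GPA_i - \overline{GPA})(MCAT_i - \overline{MCAT})}{\sum_{i=1}^{n} (GPA_i - \overline{GPA})^2}, \qquad \hat{\beta}_0 = \overline{MCAT} - \hat{\beta}_1 \overline{GPA} \]

These are the values that `lm` reports as 9.104 and 3.923 in the output from 6a.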
6c.) Interpret the slope value in context of the problem.
Answer: The slope value of 9.104 estimates that the average MCAT score increases by 9.104 points for each 1-point increase in GPA.
6d.) Interpret the intercept value in context of the problem. Is this a useful interpretation?
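Answer: The intercept value of 3.923 is the estimated MCAT score of a student with a GPA of 0. This is not a useful interpretation because a GPA of 0 is far outside the range of GPAs in the data (2.72 to 3.97), and no medical school applicant would realistically have one.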
6e.) Use the equation from part (a) to predict the MCAT score for someone that has a GPA of 3.0.
Answer: The estimated MCAT score for someone with a 3.0 GPA is 31.235
Now predict the MCAT score for someone with a GPA of 4.0.
Answer: The estimated MCAT score for someone with a 4.0 GPA is 40.339
6f.) What is the predicted difference in MCAT scores for two people who differ in GPA by 2.0?
Answer: The predicted difference in MCAT score for two people whose GPAs differ by 2.0 is \(9.104 \times 2.0 = 18.208\).
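The arithmetic in 6e and 6f can be reproduced from the rounded coefficients in the R output. This is a sketch; `predict_mcat` is a helper defined here for illustration and is not part of the original analysis.

```r
# Rounded coefficients from the lm() output in 6a
b0 <- 3.923   # intercept
b1 <- 9.104   # slope: MCAT points per GPA point

# Hypothetical helper that evaluates the fitted line at a given GPA
predict_mcat <- function(gpa) b0 + b1 * gpa

# 6e: predictions at GPA 3.0 and 4.0
predict_mcat(c(3.0, 4.0))

# 6f: a GPA difference of 2.0 changes the prediction by 2 slopes;
# the intercept cancels out
gpa_gap_effect <- b1 * 2.0
gpa_gap_effect
```

Since the largest GPA in the data is 3.97, the prediction at 4.0 is a slight extrapolation beyond the observed range.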
6g.) Can we conclude that increasing GPA scores will increase MCAT scores? Explain in a sentence or two.
Answer: Since this is an observational study, we cannot conclude that an increase in GPA will cause an increase in MCAT score. We can say only that those with higher GPAs tend to have higher MCAT scores.
Answer: Since an observational study is not randomized, the researchers cannot control for other factors that may be causing the observed results. In a randomized experiment, however, each subject is randomly assigned to a group, so such factors tend to be balanced across the groups.