Does the scatterplot show a positive or negative association? Explain why your answer makes sense for these two variables.
This scatterplot shows a negative association. This makes sense because, as the percentage of residents who eat at least 5 servings of fruits and vegetables increases, we would expect the percentage of residents who are obese to decrease, since healthy eating habits can help prevent obesity.
Which of the following is most likely to be the correlation between these two variables: \(-1, -0.941, -0.605, -0.083, 0.172, 0.445, 0.955, or 1\)? Explain your reasoning.
The most likely correlation between these two variables is \(-0.605\), because the relationship is negative and moderately strong: stronger than \(-0.083\) would suggest, but not as tight as \(-0.941\) or \(-1\).
Would a negative correlation imply that eating more vegetables will cause you to lose weight? Explain.
No. Correlation does not imply causation: a negative correlation only tells us that the two variables tend to move in opposite directions, not that eating more vegetables will cause you to lose weight. Other factors, such as exercise habits, could influence both variables.
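library(mosaic)  # attaches lattice (xyplot(), histogram()) and adds a formula interface to cor()
x_values <- c(2002,2003,2004,2005,2006,2007,2008,2009,2010,2011)  # year
y_values <- c(50,45,54,49,54,66,59,68,54,62)  # number of hot dogs and buns eaten
xyplot(y_values ~ x_values, type = c("p", "r"))  # scatterplot with least-squares line
model <- lm(y_values ~ x_values)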
model
##
## Call:
## lm(formula = y_values ~ x_values)
##
## Coefficients:
## (Intercept) x_values
## -3385.352 1.715
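As a cross-check on the coefficients printed above, here is a minimal base-R sketch (no mosaic needed) that computes the least-squares slope and intercept from their textbook formulas, \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). The names `Sxy`, `Sxx`, `slope`, and `intercept` are illustrative and not part of the original analysis.

```r
# Same data as above
x_values <- c(2002,2003,2004,2005,2006,2007,2008,2009,2010,2011)
y_values <- c(50,45,54,49,54,66,59,68,54,62)

# Sums of cross-products and squares about the means
Sxy <- sum((x_values - mean(x_values)) * (y_values - mean(y_values)))
Sxx <- sum((x_values - mean(x_values))^2)

# Least-squares estimates
slope <- Sxy / Sxx
intercept <- mean(y_values) - slope * mean(x_values)

c(intercept = intercept, slope = slope)
coef(lm(y_values ~ x_values))  # should agree with the hand computation
```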
origCorrelation <- cor(y_values ~ x_values)
# permutation test: shuffle y to break any association with x,
# then record the correlation of each shuffled data set
listOfCorrelations <- numeric(10000)
for (i in 1:10000){
  reordered_y_values <- sample(y_values, size = 10, replace = FALSE)
  listOfCorrelations[i] <- cor(reordered_y_values ~ x_values)
}
histogram(~listOfCorrelations)
# 2-sided p-value: proportion of shuffled correlations at least as extreme as the observed one
pvalue <- mean(abs(listOfCorrelations) >= abs(origCorrelation))
pvalue
## [1] 0.0327
We take the null hypothesis to be that the correlation coefficient equals zero, \(H_0: \rho = 0\), and the alternative hypothesis that it is not equal to zero, \(H_A: \rho \neq 0\). With a p-value of 0.0327, which is below the 0.05 significance level, we reject the null hypothesis and conclude that there is a significant linear relationship between the year and the number of hot dogs and buns eaten.
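For comparison, a minimal sketch of the parametric alternative to the permutation test, using base R's `cor.test()`. This t-test assumes roughly normal errors, which the permutation test does not. In simple linear regression its p-value is identical to the F-test p-value reported by `anova()`.

```r
x_values <- c(2002,2003,2004,2005,2006,2007,2008,2009,2010,2011)
y_values <- c(50,45,54,49,54,66,59,68,54,62)

# Parametric t-test of H0: rho = 0 against a two-sided alternative
ct <- cor.test(x_values, y_values, alternative = "two.sided")
ct$estimate  # sample correlation r
ct$p.value

# The regression F-test gives the same p-value
anova(lm(y_values ~ x_values))[["Pr(>F)"]][1]
```

Since the permutation p-value depends on the random shuffles, it will vary slightly from run to run, while this parametric p-value does not.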
\(\widehat{HotDogs} = 1.715 \times Year - 3385.352\)
The slope tells us that for each additional year, the predicted number of hot dogs and buns eaten increases by about 1.715.
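To see the slope interpretation numerically, a minimal sketch: predict the number eaten for two consecutive years with base R's `predict()` and take the difference. The years 2006 and 2007 are an arbitrary choice for illustration; any two consecutive years give the same difference.

```r
x_values <- c(2002,2003,2004,2005,2006,2007,2008,2009,2010,2011)
y_values <- c(50,45,54,49,54,66,59,68,54,62)
model <- lm(y_values ~ x_values)

# Predicted values one year apart
pred <- predict(model, newdata = data.frame(x_values = c(2006, 2007)))
diff(pred)  # equals the slope, about 1.715
```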
yavg = mean(y_values)
# use the model's fitted values rather than the rounded coefficients,
# so that SST = SSR + SSE holds exactly and matches anova(model)
predicted = fitted(model)
SST = sum((y_values - yavg)^2)
SST
## [1] 506.9
SSE = sum((y_values - predicted)^2)
SSE
## [1] 264.2061
SSR = sum((predicted - yavg)^2)
SSR
## [1] 242.6939
anova(model)
## Analysis of Variance Table
##
## Response: y_values
## Df Sum Sq Mean Sq F value Pr(>F)
## x_values 1 242.69 242.694 7.3486 0.02662 *
## Residuals 8 264.21 33.026
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# R^2 is the ratio SSR/SST (it is not squared)
r2 = SSR/SST
r2
## [1] 0.4787807
The R^2 value tells us that about 47.88% of the variability in the number of hot dogs and buns eaten is explained by the linear model with year as the predictor.
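A minimal sketch of two sanity checks on this computation: the sums of squares should satisfy \(SST = SSR + SSE\), and in simple linear regression \(R^2\) should equal both the squared correlation \(r^2\) and the value reported by `summary(model)$r.squared`.

```r
x_values <- c(2002,2003,2004,2005,2006,2007,2008,2009,2010,2011)
y_values <- c(50,45,54,49,54,66,59,68,54,62)
model <- lm(y_values ~ x_values)

SST <- sum((y_values - mean(y_values))^2)
SSE <- sum(residuals(model)^2)
SSR <- sum((fitted(model) - mean(y_values))^2)

SSR + SSE                  # equals SST
SSR / SST                  # R^2
cor(x_values, y_values)^2  # same value: r squared
summary(model)$r.squared   # same value, as reported by lm
```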