Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
The relationship between number of calories and amount of carbs may be linear, but it is not very strong. In fact, to me it looks like the small cloud of points in the lower left corner (low calorie, low carb) is driving the apparent linear relationship. If a relationship exists, it is positive.
Number of calories is the explanatory variable. Amount of carbs is the response variable.
We would fit a regression line mostly to confirm the impression from (a). A fitted line also makes it easy to predict carbs from calories and to inspect the residuals.
Linearity: As described in part (a), there may be a weak linear relationship. Nearly normal residuals: The histogram of the residuals is not completely symmetric and may not be nearly normal. Constant variability: Based on the residual plot, I believe variability is not constant, as there are noticeably more points, with larger residuals, on the right of the plot than on the left. Independent observations: Observations are independent, since food items and their nutritional information do not depend on each other. I believe the conditions are not met, because of the lack of constant variability and because the distribution of the residuals is not close enough to normal.
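The condition checks above can be sketched in R. The Starbucks data are not reproduced in this write-up, so the data below are simulated and purely hypothetical (the names `calories`, `carbs`, and `starbucks_sim` are illustrative assumptions). With the real data, the same calls would apply to the actual columns.

```r
# Hypothetical stand-in data: NOT the real Starbucks menu values.
set.seed(1)
n <- 50
calories <- runif(n, min = 80, max = 500)
# Noise that grows with calories, to mimic non-constant variability.
carbs <- 8 + 0.1 * calories + rnorm(n, sd = 0.03 * calories)
starbucks_sim <- data.frame(calories, carbs)

# Fit the least-squares line: carbs (response) on calories (explanatory).
fit <- lm(carbs ~ calories, data = starbucks_sim)
res <- residuals(fit)

# Residuals vs. explanatory variable: a fan shape suggests non-constant variability.
plot(starbucks_sim$calories, res, xlab = "Calories", ylab = "Residuals")
abline(h = 0, lty = 2)

# Histogram of residuals: look for rough symmetry and a single peak.
hist(res, main = "Residuals", xlab = "Residual")
```

The residual plot addresses linearity and constant variability, and the histogram addresses the nearly normal condition. Independence has to be argued from how the data were collected.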
Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.
There appears to be a positive linear relationship between shoulder girth and height.
Since shoulder girth is recorded in cm, converting it to inches means dividing each value by 2.54. This compresses the horizontal scale of the plot, but it does not change the form, direction, or strength of the relationship.
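The effect of the unit change can be checked numerically. The sketch below uses simulated, hypothetical measurements (not the study's data), and the variable names are illustrative assumptions.

```r
# Hypothetical measurements in cm: NOT the study's data.
set.seed(2)
shoulder_cm <- rnorm(100, mean = 107.2, sd = 10.37)
height_cm <- 106 + 0.61 * shoulder_cm + rnorm(100, sd = 7)

# Same shoulder girths expressed in inches (1 in = 2.54 cm).
shoulder_in <- shoulder_cm / 2.54

# Correlation is unit-free, so it is unchanged by the conversion.
r_cm <- cor(shoulder_cm, height_cm)
r_in <- cor(shoulder_in, height_cm)
all.equal(r_cm, r_in)

# The slope is rescaled: cm of height per inch instead of per cm.
slope_cm <- coef(lm(height_cm ~ shoulder_cm))[["shoulder_cm"]]
slope_in <- coef(lm(height_cm ~ shoulder_in))[["shoulder_in"]]
slope_in / slope_cm
```

The two correlations agree, while the slope per inch is 2.54 times the slope per cm (up to floating-point error), because one inch spans 2.54 cm of shoulder girth.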
Body measurements, Part III. (8.24, p. 326) The exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
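# Summary statistics given in the exercise (height is the response)
responseMean <- 171.14     # mean height (cm)
responseSD <- 9.41         # SD of height (cm)
Rvar <- 0.67               # correlation between height and shoulder girth
explanatoryMean <- 107.2   # mean shoulder girth (cm)
explanatorySD <- 10.37     # SD of shoulder girth (cm)
# Least-squares slope and intercept from the summary statistics
slope <- (responseSD / explanatorySD) * Rvar
intercept <- responseMean - slope * explanatoryMean
# Predicted height (cm) for a shoulder girth x (cm)
regressionFunction <- function(x, slope, intercept) {
  x * slope + intercept
}
# Draw the fitted line over shoulder girths from 1 to 150 cm
shoulderGirth <- 1:150
predictedHeight <- regressionFunction(shoulderGirth, slope, intercept)
plot(shoulderGirth, predictedHeight, type = 'l', xlab = 'Shoulder girth (cm)', ylab = 'Height (cm)')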
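Predicted height = 105.9650878 + 0.6079749 * shoulder girth
The intercept is positive: a shoulder girth of 0 cm corresponds to a predicted height of about 105.97 cm, which only anchors the line and has no practical meaning. The slope is positive: each additional centimeter of shoulder girth is associated with about 0.61 cm of additional predicted height.
R^2 = 0.67^2 = 0.4489, so about 44.89% of the variability in height is explained by the model.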
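Plugging a shoulder girth of 100 cm into the function (regressionFunction) from part (a) gives a predicted height of 166.7625805 cm.
Residual = observed - predicted, which in this case is 160 - 166.7625805 = -6.7625805. The negative residual means the actual height is that much less than the line predicted, so the model overestimates this person's height.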
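The prediction and residual arithmetic can be reproduced in a few lines. This is a minimal sketch using only the summary statistics from the exercise. The observed height of 160 cm is inferred from the stated residual, and the variable names are illustrative.

```r
# Summary statistics from the exercise.
mean_height <- 171.14
sd_height <- 9.41
mean_girth <- 107.20
sd_girth <- 10.37
r <- 0.67

# Least-squares line from summary statistics.
slope <- r * sd_height / sd_girth
intercept <- mean_height - slope * mean_girth

girth <- 100            # shoulder girth in cm
observed_height <- 160  # cm; assumed here, implied by the stated residual

predicted_height <- intercept + slope * girth
residual <- observed_height - predicted_height
round(c(predicted = predicted_height, residual = residual), 2)
```

A negative residual means the point lies below the fitted line.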
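The original data only cover shoulder girths of roughly 80 to 140 cm, so a prediction outside that range is an extrapolation, which counts against using the model. Within that range, variability seems to be constant and the relationship is linear and reasonably strong. While using the model here would be inappropriate, I'd do it in the absence of any other model.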
Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
b0 <- -0.357
b1 <- 4.034
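$ \widehat{\text{heart weight}} = -0.357 + 4.034 \times \text{body weight} $
If the cat's body weight were 0 kg, we would expect the heart to weigh -0.357 g. This value is not meaningful on its own; it only sets the height of the line.
For each additional kg of body weight, we expect the heart to weigh an additional 4.034 g.
R^2 = 64.66%: about 64.66% of the variability in heart weight is explained by body weight using the linear model.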
r2 <- .6466
corr = sqrt(r2)
corr
## [1] 0.8041144
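Taking the square root of R^2 only recovers the magnitude of the correlation. Its sign comes from the slope. A minimal sketch, reusing the values from the regression output quoted above:

```r
b1 <- 4.034   # slope from the regression output
r2 <- 0.6466  # R-squared from the regression output

# sqrt() gives the magnitude; the sign of the slope supplies the direction.
corr <- sign(b1) * sqrt(r2)
corr
```

Because the slope is positive here, the correlation is the positive root. With a negative slope the same R^2 would correspond to a negative correlation.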
Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.
b0 <- 4.010           # intercept from the regression output
b1 <- 4.13 * 0.0322   # slope recovered as t value * standard error
b1
## [1] 0.132986
Since the slope is positive, the relationship is positive. We can set up a hypothesis test with H0: b1 = 0 and HA: b1 > 0. The p-value for the slope is close to zero, which leads us to reject the null hypothesis. It appears that there is a positive relationship between beauty score and teaching evaluation score.
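The p-value for this test can be approximated from the t statistic. The sketch below assumes that 4.13 is the t value for the slope (as the computation of b1 implies) and uses n - 2 degrees of freedom for simple linear regression. The variable names are illustrative.

```r
t_stat <- 4.13   # assumed t value for the beauty slope
n <- 463         # number of professors in the sample
df <- n - 2      # degrees of freedom for simple linear regression

# One-sided p-value for HA: slope > 0.
p_one_sided <- pt(t_stat, df = df, lower.tail = FALSE)
# Two-sided p-value, as regression tables usually report.
p_two_sided <- 2 * p_one_sided
c(one_sided = p_one_sided, two_sided = p_two_sided)
```

Either version is far below common significance levels such as 0.05, which matches the decision to reject H0.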
Linearity: There is a weak, positive linear relationship. No correlation coefficient or R^2 is given, but we accept that this condition is met.
Nearly normal residuals: Based on the histogram, it appears that the residual distribution is nearly normal.
Constant variability: The variability of the points in the scatterplot does not appear to be constant.
Independent observations: Observations are assumed to be independent.