The scatterplot below shows the relationship between the number of calories and the amount of carbohydrates (in grams) in Starbucks food menu items. Since Starbucks lists only the number of calories on its display items, we are interested in predicting the amount of carbohydrates in a menu item from its calorie content.
-——————————————————————————-
Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.
-——————————————————————————-
The exercise above introduces data on the shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
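For reference, the least-squares line can be written entirely in terms of these summary statistics: the slope rescales the correlation by the ratio of the standard deviations, and the line passes through the point of averages $(\bar{x}, \bar{y})$. With shoulder girth as $x$ and height as $y$:

```latex
\begin{aligned}
b_1 &= R\,\frac{s_y}{s_x} = 0.67 \times \frac{9.41}{10.37} \approx 0.60797 \\
b_0 &= \bar{y} - b_1\,\bar{x} = 171.14 - 0.60797 \times 107.20 \approx 105.97 \\
\widehat{\text{height}} &= 105.97 + 0.608 \times \text{shoulder girth}
\end{aligned}
```

Keeping the unrounded slope when computing $b_0$ matters: rounding $b_1$ to 0.608 first would give an intercept of about 105.96 instead.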
# height as the response variable (y) and shoulder girth as the explanatory variable (x)
mean_x <- 107.20
sd_x <- 10.37
mean_y <- 171.14
sd_y <- 9.41
r <- 0.67
m <- r * sd_y / sd_x  # slope
m
## [1] 0.6079749
b <- mean_y - m * mean_x  # intercept: the line passes through (mean_x, mean_y)
b
## [1] 105.9651
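The slope means that for every additional 1 cm of shoulder girth, height is predicted to increase by about 0.608 cm on average.
The intercept means that an individual with a shoulder girth of 0 cm would have a predicted height of 105.97 cm. A shoulder girth of 0 cm is impossible in the real world, so the intercept has no meaningful interpretation here; it only sets the vertical position of the regression line so that predictions are accurate over the observed range of shoulder girths.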
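r^2  # R^2: proportion of the variability in height explained by the model
## [1] 0.4489
y_hat <- round(mean_y + m * (100 - mean_x), 4)  # predicted height (cm) for a shoulder girth of 100 cm
y_hat
## [1] 166.7626
160 - y_hat  # residual (observed - predicted) for an individual who is 160 cm tall
## [1] -6.7626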
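The same steps can be wrapped in a small R helper. This is a sketch only: `line_from_summary()` is a hypothetical name, not a function from any package, and the example inputs (a shoulder girth of 100 cm and an observed height of 160 cm) are the values implied by the prediction and residual outputs above.

```r
# Hypothetical helper: least-squares line computed from summary statistics alone.
line_from_summary <- function(mean_x, sd_x, mean_y, sd_y, r) {
  slope <- r * sd_y / sd_x               # b1 = R * s_y / s_x
  intercept <- mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
  c(intercept = intercept, slope = slope)
}

# Shoulder girth (x) predicting height (y), using the summary statistics above
fit <- line_from_summary(mean_x = 107.20, sd_x = 10.37,
                         mean_y = 171.14, sd_y = 9.41, r = 0.67)

predicted <- unname(fit["intercept"] + fit["slope"] * 100)  # predicted height for girth = 100 cm
residual  <- 160 - predicted                                # observed minus predicted
r_squared <- 0.67^2                                         # share of variability in height explained

round(c(predicted = predicted, residual = residual, r_squared = r_squared), 4)
```

Returning a named vector keeps the intercept and slope together, so the same helper can be reused for other pairs of variables when only means, standard deviations, and the correlation are reported.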
-——————————————————————————-
The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
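library(MASS)  # the cats data set ships with the MASS package
m_cats_hwt_bwt <- lm(cats$Hwt ~ cats$Bwt)
summary(m_cats_hwt_bwt)
##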
## Call:
## lm(formula = cats$Hwt ~ cats$Bwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5694 -0.9634 -0.0921 1.0426 5.1238
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.3567 0.6923 -0.515 0.607
## cats$Bwt 4.0341 0.2503 16.119 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.452 on 142 degrees of freedom
## Multiple R-squared: 0.6466, Adjusted R-squared: 0.6441
## F-statistic: 259.8 on 1 and 142 DF, p-value: < 2.2e-16
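Reading the coefficients off this output gives the fitted line: predicted heart weight (g) = -0.3567 + 4.0341 × body weight (kg), so each additional kilogram of body weight is associated with about 4.03 g more heart weight on average. A minimal sketch that evaluates this line; `predict_hwt()` is a hypothetical helper with the coefficients hard-coded from the output above rather than refitted from the data.

```r
# Hypothetical helper: the fitted line from the cats regression output above.
predict_hwt <- function(bwt_kg) {
  b0 <- -0.3567  # intercept (g)
  b1 <- 4.0341   # slope (g of heart weight per kg of body weight)
  b0 + b1 * bwt_kg
}

predict_hwt(c(2, 2.5, 3))   # predicted heart weights (g) for three body weights
diff(predict_hwt(c(2, 3)))  # a 1 kg increase changes the prediction by exactly the slope
```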
summary(m_cats_hwt_bwt)$r.squared
## [1] 0.6466209
# sqrt(R^2) recovers only the magnitude of R; its sign matches the sign of the slope (positive here)
sqrt(summary(m_cats_hwt_bwt)$r.squared)
## [1] 0.8041274
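Because `sqrt()` always returns the non-negative root, the sign of R has to be taken from the slope. A minimal sketch, using a hypothetical helper `r_from_r2()` with the R-squared and slope reported above:

```r
# Hypothetical helper: recover the correlation R from R^2 and the fitted slope.
r_from_r2 <- function(r_squared, slope) {
  sign(slope) * sqrt(r_squared)  # R always has the same sign as the slope
}

r_from_r2(0.6466209, 4.0341)   # positive slope: R is about 0.804
r_from_r2(0.6466209, -4.0341)  # the same R^2 with a negative slope would give R of about -0.804
```

With the data loaded, `cor(cats$Bwt, cats$Hwt)` should return the same value, since in simple linear regression R^2 is the square of the correlation between the two variables.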
-——————————————————————————-
Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at the University of Texas at Austin collected data on teaching evaluation score (a higher score means better) and standardized beauty score (a score of 0 means average, a negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and a regression output for predicting teaching evaluation score from beauty score is also provided.
##
## Call:
## lm(formula = eval ~ beauty)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.80015 -0.36304 0.07254 0.40207 1.10373
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.01002 0.02551 157.205 < 2e-16 ***
## beauty 0.13300 0.03218 4.133 4.25e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5455 on 461 degrees of freedom
## Multiple R-squared: 0.03574, Adjusted R-squared: 0.03364
## F-statistic: 17.08 on 1 and 461 DF, p-value: 4.247e-05
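# slope from the point of averages: the line passes through (mean beauty, mean eval),
# using an average beauty score of -0.0883 and an average evaluation score of 3.9983
(3.9983 - 4.010) / (-0.0883)
## [1] 0.1325028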
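The t value and p-value in the `beauty` row can be reproduced from the estimate and its standard error. A sketch using base R's `pt()`, with the estimate, standard error, and 461 residual degrees of freedom copied from the output; because these inputs are rounded, the results agree with the printed values only approximately.

```r
# Slope estimate, standard error, and residual degrees of freedom copied from the output above
b1 <- 0.13300
se_b1 <- 0.03218
resid_df <- 461

t_stat <- b1 / se_b1                                                # t = estimate / SE, testing H0: slope = 0
p_two_sided <- 2 * pt(abs(t_stat), resid_df, lower.tail = FALSE)  # HA: slope != 0 (what lm reports)
p_one_sided <- pt(t_stat, resid_df, lower.tail = FALSE)           # HA: slope > 0

c(t = t_stat, p_two_sided = p_two_sided, p_one_sided = p_one_sided)
```

The p-value printed by `lm` is two-sided; when the question is specifically whether the slope is positive and the estimate is positive, the one-sided p-value is half of it.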
The residuals in the residual plot are scattered randomly around y = 0 without any obvious curvature or other pattern, which suggests that a linear model is appropriate for the relationship between beauty score and teaching evaluation score (even though the relationship itself is weak, with R^2 of about 0.036). Therefore, the linearity condition appears to be met.
The histogram of the residuals is unimodal and nearly normal with no apparent outliers, and the points in the normal probability plot lie close to the line. Therefore, the nearly normal residuals condition appears to be met.
From the residual plot in part (1), the spread of the residuals appears roughly the same across the range of beauty scores: although a few points lie below y = -1.5, most points lie between y = -1.5 and y = 1.5. Therefore, the constant variability condition appears to be met.
In the fourth plot, the residuals are randomly dispersed with no pattern over the order of data collection. This suggests that the errors do not depend on when the data were collected or on the errors of earlier observations, so there is no sign of time-series structure. Therefore, the independent observations condition appears to be met.
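The four diagnostic displays discussed above can be produced in base R. This is a sketch only: the data are simulated stand-ins (hypothetical `beauty_sim` and `eval_sim` vectors generated with coefficients close to the fitted ones), since the original data are not included here, and the last panel assumes the rows are stored in the order they were collected. For the real analysis, replace `fit_sim` with the model fitted to the actual data.

```r
set.seed(1)
n <- 463
beauty_sim <- rnorm(n)                                        # hypothetical standardized beauty scores
eval_sim <- 4.01 + 0.133 * beauty_sim + rnorm(n, sd = 0.55)   # hypothetical evaluation scores
fit_sim <- lm(eval_sim ~ beauty_sim)
res <- resid(fit_sim)

op <- par(mfrow = c(2, 2))                                    # 2 x 2 grid of panels
plot(beauty_sim, res, xlab = "Beauty", ylab = "Residuals",
     main = "Residuals vs. beauty")                           # linearity and constant variability
abline(h = 0, lty = 2)
hist(res, main = "Histogram of residuals", xlab = "Residuals")  # nearly normal residuals
qqnorm(res, main = "Normal probability plot")                 # nearly normal residuals
qqline(res)
plot(res, xlab = "Order of data collection", ylab = "Residuals",
     main = "Residuals in order")                             # independence (assumes rows are in collection order)
abline(h = 0, lty = 2)
par(op)                                                       # restore the previous graphics settings
```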