7.6 Husbands and Wives

The Great Britain Office of Population Census and Surveys once collected data on a random sample of 170 married couples in Britain, recording the age (in years) and heights (converted here to inches) of the husbands and wives. The scatterplot on the left shows the wife’s age plotted against her husband’s age, and the plot on the right shows wife’s height plotted against husband’s height.
a) Describe the relationship between husbands’ and wives’ ages.

Based on the scatterplot, the ages of husbands and wives in this survey have a strong, positive, linear association.

b) Describe the relationship between husbands’ and wives’ heights.

Based on the scatterplot, the heights of husbands and wives in this survey have a very weak positive correlation, if any correlation at all.

c) Which plot shows a stronger correlation? Explain your reasoning.

The age scatterplot shows a much stronger correlation because its points cluster much more tightly around an upward-sloping line.

d) Data on heights were originally collected in centimeters, and then converted to inches. Does this conversion affect the correlation between husbands’ and wives’ heights?

No. This conversion would not affect the correlation between husbands’ and wives’ heights. Correlation has no units, and converting centimeters to inches only rescales every height by the same positive constant, which leaves the correlation unchanged.
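As a quick numerical check of how a unit conversion interacts with correlation, the sketch below computes the correlation of the same heights in centimeters and in inches. The heights are made-up values for illustration, not the survey data, and `pearson_r` is a helper written for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights in centimeters (illustrative only, not the survey data).
husband_cm = [170.0, 182.5, 175.0, 168.0, 190.0, 177.5]
wife_cm = [160.0, 165.0, 171.0, 158.5, 167.0, 162.0]

# Convert every height to inches with the same positive scale factor.
CM_PER_INCH = 2.54
husband_in = [h / CM_PER_INCH for h in husband_cm]
wife_in = [w / CM_PER_INCH for w in wife_cm]

r_cm = pearson_r(husband_cm, wife_cm)
r_in = pearson_r(husband_in, wife_in)
print(round(r_cm, 6), round(r_in, 6))  # the two values agree
```

The two printed values match because the scale factor cancels out of the numerator and denominator of the correlation formula.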

7.12 Trees

The scatterplots below show the relationship between height, diameter, and volume of timber in 31 felled black cherry trees. The diameter of the tree is measured 4.5 feet above the ground.
a) Describe the relationship between volume and height of these trees.

There seems to be a moderate to weak, positive correlation between tree volume and height.

b) Describe the relationship between volume and diameter of these trees.

There is a strong positive correlation between tree volume and diameter.

c) Suppose you have height and diameter measurements for another black cherry tree. Which of these variables would be preferable to use to predict the volume of timber in this tree using a simple linear regression model? Explain your reasoning.

Diameter would be preferable to use to predict the volume of timber in this tree because the correlation between volume and diameter seems to be stronger than that of volume and height.

7.18 Correlation

What would be the correlation between the annual salaries of males and females at a company if for a certain type of position men always made
a) $5,000 more than women?

If men always made $5,000 more than women, every point would fall exactly on the line men = women + 5,000, which has a positive slope. The correlation between men’s and women’s salaries would be exactly 1.

b) 25% more than women?

If men always made 25% more than women, every point would fall exactly on the line men = 1.25 x women, which has a positive slope. The correlation between men’s and women’s salaries would be exactly 1.

c) 15% less than women?

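If men always made 15% less than women, every point would fall exactly on the line men = 0.85 x women. The slope is still positive (higher women’s salaries go with higher men’s salaries), so the correlation would again be exactly 1, not negative.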
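To check the three scenarios numerically, the sketch below builds men’s salaries from a list of women’s salaries under each rule and computes the correlation. The salary figures are hypothetical, and `pearson_r` is a helper written for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical women's salaries (illustrative only).
women = [42000.0, 51000.0, 58500.0, 64000.0, 73000.0]

men_plus_5000 = [w + 5000 for w in women]  # (a) $5,000 more
men_25_more = [1.25 * w for w in women]    # (b) 25% more
men_15_less = [0.85 * w for w in women]    # (c) 15% less

r_a = pearson_r(women, men_plus_5000)
r_b = pearson_r(women, men_25_more)
r_c = pearson_r(women, men_15_less)
print(round(r_a, 6), round(r_b, 6), round(r_c, 6))
```

In all three cases the men’s salaries are an exact increasing linear function of the women’s salaries, so each correlation comes out at 1 up to floating-point rounding.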

7.24 Nutrition at Starbucks

The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
a) Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.

There is a moderate to strong positive correlation between calories and carbs on the Starbucks food menu.

b) In this scenario, what are the explanatory and response variables?

Calories are the explanatory variable and carbs are the response variable.

c) Why might we want to fit a regression line to these data?

We may want to fit a regression line to these data to predict the amount of carbs in a menu item from its calorie count, since Starbucks displays only calories, and to see how well calories predict carbs.

d) Do these data meet the conditions required for fitting a least squares line?

Linearity: The scatterplot shows that the data follow a roughly linear trend.

Nearly Normal Residuals: The histogram of residuals is slightly skewed to the left, but I would say the residuals are nearly normal.

Constant Variability: The residual plot makes it clear that the variability is not constant; the regression seems to work better for lower calorie counts.

Because the constant variability condition is not met, the data do not meet the conditions required for fitting a least squares line.

7.30 Cats

The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
a) Write out the linear model.

HeartWeight = -0.357 + (4.034)BodyWeight

b) Interpret the intercept.

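With an intercept of -0.357, this regression predicts that a cat weighing 0 kg would have a heart weighing -0.357 g. This value has no practical meaning, since a cat cannot weigh 0 kg and a heart cannot have negative weight; the intercept only sets the height of the line.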

c) Interpret the slope.

With a slope of 4.034, for every 1 kg increase in a cat’s body weight, the heart weight is expected to increase by 4.034 g on average.

d) Interpret R^2

R-squared in this regression model shows that about 64.66% of the variation in the heart weights of cats is explained by their body weight.

e) Calculate the correlation coefficient

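The correlation coefficient is the square root of R^2, with the same sign as the slope. The slope is positive, so the correlation coefficient is 0.8041.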
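The correlation coefficient can be recovered from R^2 and the sign of the slope, as the short sketch below shows. It assumes R^2 = 0.6466, a value chosen here because it reproduces the 0.8041 reported above; treat it as an assumption rather than a figure read from the regression output.

```python
import math

r_squared = 0.6466  # assumed value of R^2 (consistent with r = 0.8041)
slope = 4.034       # slope from the fitted model above

# r has the magnitude sqrt(R^2) and takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 4))  # 0.8041
```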

7.36 Beer and blood alcohol content

Many people believe that gender, weight, drinking habits, and many other factors are much more important in predicting blood alcohol content (BAC) than simply considering the number of drinks a person consumed. Here we examine data from sixteen student volunteers at Ohio State University who each drank a randomly assigned number of cans of beer. These students were evenly divided between men and women, and they differed in weight and drinking habits. Thirty minutes later, a police officer measured their blood alcohol content (BAC) in grams of alcohol per deciliter of blood. The scatterplot and regression table summarize the findings.
a) Describe the relationship between the number of cans of beer and BAC.

There is a strong, positive, linear relationship between the number of cans of beer and BAC.

b) Write the equation of the regression line. Interpret the slope and intercept in context.

BAC = -.0127 + (.018)Cans

The intercept shows that, according to this regression model, a person who has consumed 0 cans of beer is predicted to have a BAC of -0.0127. A negative BAC is impossible, so the intercept is not meaningful on its own here.

The slope shows that, according to this regression model, for every additional can of beer a person consumes, their BAC is expected to rise by 0.018 on average.
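The intercept and slope interpretations can be read directly off the fitted line. The sketch below is a minimal illustration: `predicted_bac` is a helper name invented here, and the can counts 4 and 5 are arbitrary examples.

```python
INTERCEPT = -0.0127  # predicted BAC at 0 cans
SLOPE = 0.018        # predicted change in BAC per additional can

def predicted_bac(cans):
    """Predicted BAC from the fitted regression line."""
    return INTERCEPT + SLOPE * cans

at_zero = predicted_bac(0)                       # equals the intercept
per_can = predicted_bac(5) - predicted_bac(4)    # equals the slope
print(round(at_zero, 4), round(per_can, 4))
```

The prediction at 0 cans is the intercept, and the difference between predictions one can apart is the slope, whichever pair of consecutive counts is used.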

c) Do the data provide strong evidence that drinking more cans of beer is associated with an increase in blood alcohol? State the null and alternative hypotheses, report the p-value, and state your conclusion.

H0: Drinking more cans of beer is not associated with an increase in BAC (β1 = 0)

HA: Drinking more cans of beer is associated with an increase in BAC (β1 ≠ 0)

With a p-value near 0, we reject H0. The data provide strong evidence that drinking more cans of beer is associated with an increase in BAC.

d) The correlation coefficient for number of cans of beer and BAC is 0.89. Calculate R^2 and interpret it in context.

R^2 = 0.89^2 = 0.7921

This means that about 79% of the variation in BAC is explained by the number of cans of beer consumed.

e) Suppose we visit a bar, ask people how many drinks they have had, and also take their BAC. Do you think the relationship between number of drinks and BAC would be as strong as the relationship found in the Ohio State study?

The relationship would probably not be as strong, because other factors come into play, such as how quickly each person drank. The original study took place in a controlled setting, with a randomly assigned number of cans and BAC measured thirty minutes later. If we simply walked into a bar and asked people how many drinks they had had, they could lie or misremember, and the time spent drinking would vary widely from person to person. Therefore, I think the relationship would be weaker.

7.42 Babies

Is the gestational age (time between conception and birth) of a low birth-weight baby useful in predicting head circumference at birth? Twenty-five low birth-weight babies were studied at a Harvard teaching hospital; the investigators calculated the regression of head circumference (measured in centimeters) against gestational age (measured in weeks). The estimated regression line is

head_circumference = 3.91 + 0.78 x gestational_age

a) What is the predicted head circumference for a baby whose gestational age is 28 weeks?

head_circumference = 3.91 + 0.78 x (28)

head_circumference = 25.75 cm
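The same prediction can be checked with a few lines of code. The function name below is made up for this sketch; the coefficients are those of the estimated regression line.

```python
def predict_head_circumference(gestational_age_weeks):
    """Predicted head circumference (cm) from gestational age (weeks)."""
    return 3.91 + 0.78 * gestational_age_weeks

prediction = predict_head_circumference(28)
print(round(prediction, 2))  # 25.75
```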

b) The standard error for the coefficient of gestational age is 0.35, which is associated with df = 23. Does the model provide strong evidence that gestational age is significantly associated with head circumference?

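t = (0.78 - 0) / 0.35

t = 2.229

df = 23

p-value ≈ 0.036 (two-sided; the one-sided p-value is 0.0178) < 0.05

Since the p-value is below 0.05, we reject the null hypothesis that the slope is 0. This model provides evidence, at the 5% significance level, that gestational age is significantly associated with head circumference.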
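The test statistic and p-value can be reproduced without a statistics library. The sketch below is one possible approach, not the method used in the exercise: it integrates the Student t density numerically with Simpson's rule, and the function names and step count are choices made for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_area

slope, std_error, df = 0.78, 0.35, 23
t_stat = (slope - 0) / std_error          # null value of the slope is 0
p_two_sided = two_sided_p_value(t_stat, df)
p_one_sided = p_two_sided / 2
print(round(t_stat, 3), round(p_two_sided, 3), round(p_one_sided, 3))
```

Both the one-sided and two-sided p-values fall below 0.05, so the conclusion does not depend on which alternative is used.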