7.6 Husbands and wives, Part I. The Great Britain Office of Population Census and Surveys once collected data on a random sample of 170 married couples in Britain, recording the age (in years) and heights (converted here to inches) of the husbands and wives. The scatterplot on the left shows the wife’s age plotted against her husband’s age, and the plot on the right shows wife’s height plotted against husband’s height.

  1. Describe the relationship between husbands’ and wives’ ages.
    Answer: The scatterplot shows a strong, positive, linear relationship between husbands’ and wives’ ages: older husbands tend to have older wives, which suggests that people tend to marry partners close to their own age.

  2. Describe the relationship between husbands’ and wives’ heights.
    Answer: The points in the height plot are widely scattered, so husbands’ and wives’ heights show at most a weak positive linear relationship and no other clear pattern.

  3. Which plot shows a stronger correlation? Explain your reasoning.
    Answer: The plot of husbands’ versus wives’ ages shows the stronger correlation: its points cluster tightly around an upward-sloping line, so as x increases, y increases as well. Correlation alone does not tell us whether x causes y or y causes x.
  4. Data on heights were originally collected in centimeters, and then converted to inches. Does this conversion affect the correlation between husbands’ and wives’ heights?
    Answer:
    No. Correlation has no units, so a linear change of units such as converting centimeters to inches does not affect it.
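
As a quick check of this invariance, the sketch below simulates heights in centimeters, converts them to inches, and compares the two correlations. The simulated values (means, standard deviations, seed) are made up for illustration and are not the survey data.

```r
# Simulated heights in cm; these are NOT the survey data, only an illustration.
set.seed(42)
n <- 170
husband_cm <- rnorm(n, mean = 175, sd = 7)
wife_cm <- 110 + 0.3 * husband_cm + rnorm(n, mean = 0, sd = 6)

# Convert both variables from centimeters to inches (1 in = 2.54 cm).
husband_in <- husband_cm / 2.54
wife_in <- wife_cm / 2.54

r_cm <- cor(husband_cm, wife_cm)
r_in <- cor(husband_in, wife_in)

# The two correlations agree up to floating-point error.
print(c(cm = r_cm, inches = r_in))
print(isTRUE(all.equal(r_cm, r_in)))
```

Dividing by a positive constant rescales both the covariance and the standard deviations by the same factor, which is why the ratio that defines the correlation is unchanged.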


7.12 Trees The scatterplots below show the relationship between height, diameter, and volume of timber in 31 felled black cherry trees. The diameter of the tree is measured 4.5 ft above the ground.

  1. Describe the relationship between volume and height of these trees.
    Answer:
    There is a weak, positive, roughly linear relationship between the volume and height of the trees.

  2. Describe the relationship between volume and diameter of these trees.
    Answer:
    The relationship between the volume and diameter of the trees is strong, positive, and roughly linear: trees with larger diameters tend to yield more timber.

  3. Suppose you have height and diameter measurements for another black cherry tree. Which of these variables would be preferable to use to predict the volume of timber in this tree using a simple linear regression model? Explain your reasoning.
    Answer:
    Based on the scatterplots discussed above, diameter (dia) is the better predictor of timber volume: its relationship with volume is much stronger than that of height, so a simple linear regression on diameter should give more accurate predictions.
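
The comparison can also be sketched numerically. The code below assumes that R's built-in `trees` data set (31 felled black cherry trees, with diameter stored in the `Girth` column) is the same data shown in the scatterplots; that match is an assumption, not something stated in the exercise.

```r
# Assumption: R's built-in `trees` data set (31 black cherry trees) matches the
# data in this exercise. Its `Girth` column is the diameter in inches.
data(trees)

r_height <- cor(trees$Volume, trees$Height)
r_diameter <- cor(trees$Volume, trees$Girth)
print(round(c(height = r_height, diameter = r_diameter), 3))

# Compare how much of the variability in volume each predictor explains.
fit_height <- lm(Volume ~ Height, data = trees)
fit_diameter <- lm(Volume ~ Girth, data = trees)
print(round(c(height = summary(fit_height)$r.squared,
              diameter = summary(fit_diameter)$r.squared), 3))
```

The predictor with the larger correlation and R2 is the one expected to give the more accurate simple linear regression.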


7.18 Correlation, Part II. What would be the correlation between the annual salaries of males and females at a company if for a certain type of position men always made:

  1. $5,000 more than women?
    Answer:
    \(salMen = salWomen + 5000\)
    Every point lies exactly on an upward-sloping line, so this is a perfect positive linear relationship and the correlation is 1.

  2. 25% more than women?
    Answer:
    \(salMen = 1.25 * salWomen\)
    Again every point lies exactly on an upward-sloping line, so the correlation is 1.

  3. 15% less than women?
    Answer:
    \(salMen = 0.85 * salWomen\)
    The slope is smaller but still positive, and every point lies exactly on the line, so the correlation is 1.
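
The three cases can be checked with a small sketch. The women's salaries below are hypothetical values chosen only for illustration; any set of non-constant salaries gives the same result.

```r
# Hypothetical women's salaries, used only for illustration.
sal_women <- c(42000, 48000, 55000, 61000, 70000, 85000)

sal_men_plus_5000 <- sal_women + 5000    # part 1: $5,000 more
sal_men_plus_25pct <- 1.25 * sal_women   # part 2: 25% more
sal_men_minus_15pct <- 0.85 * sal_women  # part 3: 15% less

# Each rule is an exact linear function with a positive slope,
# so each correlation equals 1 (up to floating-point error).
print(cor(sal_women, sal_men_plus_5000))
print(cor(sal_women, sal_men_plus_25pct))
print(cor(sal_women, sal_men_minus_15pct))
```

Adding a constant or multiplying by a positive constant changes the intercept or slope of the line but not how tightly the points follow it.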


7.24 Nutrition at Starbucks, Part I. The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.

  1. Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.
    Answer:
    The amount of carbohydrates and the number of calories show a weak, positive linear relationship: as calories increase, carbohydrates tend to increase as well, although the points are considerably more dispersed at higher calorie values.

  2. In this scenario, what are the explanatory and response variables?
    Answer:
    Response variable: amount of carbohydrates (in grams)
    Explanatory variable: number of calories in Starbucks food menu items.

  3. Why might we want to fit a regression line to these data?
    Answer:
    A regression line would let us predict the amount of carbohydrates (in grams) in a Starbucks food menu item from its number of calories.

  4. Do these data meet the conditions required for fitting a least squares line?
    Answer:
    The conditions for fitting a least squares line are linearity, nearly normal residuals, constant variability, and independent observations.

    The data show heteroscedasticity (the variability of carbohydrates increases with calories), so the constant variability condition is not met and a simple linear regression is not ideal here. We may need to transform the data or evaluate more advanced techniques.
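
One common way to check the constant variability condition is to look at the residuals of the fitted line. The sketch below uses simulated data whose spread grows with calories; the numbers are invented for illustration and are not the Starbucks data.

```r
# Simulated data with spread that grows with calories; NOT the Starbucks data.
set.seed(7)
calories <- seq(80, 500, length.out = 200)
carbs <- 8 + 0.1 * calories + rnorm(200, mean = 0, sd = 0.03 * calories)

fit <- lm(carbs ~ calories)
res <- resid(fit)

# Compare residual spread for lower- and higher-calorie items.
low <- calories <= median(calories)
sd_low <- sd(res[low])
sd_high <- sd(res[!low])
print(round(c(low = sd_low, high = sd_high), 2))

# A residual plot makes the fan shape visible.
plot(fitted(fit), res, xlab = "Fitted carbohydrates (g)", ylab = "Residual")
abline(h = 0, lty = 2)
```

A residual plot that fans out from left to right, or a clearly larger residual spread in the upper half of the data, signals non-constant variability.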


7.30 Cats, Part I. The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.

                Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)   -0.357       0.692   -0.515     0.607
    body wt        4.034       0.250   16.119     0.000

    s = 1.452    R2 = 64.66%    R2 adj = 64.41%

  1. Write out the linear model.
    Answer:
    Using \(y = B0 + B1 * x\)
    \(heartWeight = -0.357 + 4.034*bodyWeight\)

  2. Interpret the intercept.
    Answer:
    Cats with a body weight of 0 kg are expected to have a heart weight of -0.357 g. This is not a meaningful value; the intercept only serves to adjust the height of the regression line.

  3. Interpret the slope.
    Answer:
    For each additional kilogram of body weight, we expect the heart weight of a cat to be 4.034 grams higher, on average.

  4. Interpret R2.
    Answer:
    Body weight explains 64.66% of the variability in the heart weights of cats.

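  5. Calculate the correlation coefficient.
    Answer:
    The correlation coefficient is the square root of R2, with the same sign as the slope, which is positive here.

# R2 from the regression output; the slope (4.034) is positive, so take the positive root
R2 <- 0.6466
paste("Correlation coefficient: ", round(sqrt(R2), 3))
## [1] "Correlation coefficient:  0.804"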
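
To make the slope interpretation concrete, the sketch below plugs two body weights into the fitted line. The helper `predict_heart_weight` and the example weights of 2 kg and 3 kg are hypothetical, introduced only for illustration.

```r
# Coefficients copied from the regression output above.
b0 <- -0.357  # intercept (g)
b1 <- 4.034   # slope (g of heart weight per kg of body weight)

# Hypothetical helper for illustration.
predict_heart_weight <- function(body_weight_kg) b0 + b1 * body_weight_kg

# Predictions for hypothetical cats weighing 2 kg and 3 kg.
pred <- predict_heart_weight(c(2, 3))
print(pred)

# The difference for a 1 kg increase equals the slope.
print(diff(pred))
```

The two predictions differ by exactly the slope, which is what "4.034 grams per additional kilogram" means.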

7.36 Beer and blood alcohol content. Many people believe that gender, weight, drinking habits, and many other factors are much more important in predicting blood alcohol content (BAC) than simply considering the number of drinks a person consumed. Here we examine data from sixteen student volunteers at Ohio State University who each drank a randomly assigned number of cans of beer. These students were evenly divided between men and women, and they differed in weight and drinking habits. Thirty minutes later, a police officer measured their blood alcohol content (BAC) in grams of alcohol per deciliter of blood. The scatterplot and regression table summarize the findings.

  1. Describe the relationship between the number of cans of beer and BAC.
    Answer:
    BAC tends to increase with the number of cans consumed. The scatterplot and the positive regression coefficient both indicate an upward-sloping trend, that is, a moderately strong, positive linear relationship.

  2. Write the equation of the regression line. Interpret the slope and intercept in context.
    Answer:
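    \(y = b0 + b1*x\)
    \(BAC = Intercept + b1*beers\)
    \(BAC = -0.0127 + 0.0180*beers\)
    Slope: for each additional can of beer, BAC is expected to increase by 0.0180 grams per deciliter, on average.
    Intercept: students who drink no beer are expected to have a BAC of -0.0127. A negative BAC is not possible, so the intercept has no practical meaning here; it only adjusts the height of the line.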

  3. Do the data provide strong evidence that drinking more cans of beer is associated with an increase in blood alcohol? State the null and alternative hypotheses, report the p-value, and state your conclusion.
    Answer:
    Null hypothesis, Ho: b1 = 0 (the number of cans of beer is not associated with BAC).
    Alternative hypothesis, Ha: b1 != 0 (the number of cans of beer is associated with BAC).
    The p-value of the regression coefficient for ‘beers’ is reported as 0.0000, which is below 0.05, so we reject Ho. This is strong evidence that drinking more cans of beer is associated with an increase in blood alcohol.

  4. The correlation coefficient for number of cans of beer and BAC is 0.89. Calculate R2 and interpret it in context.
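    Answer:
    R2 is the square of the correlation coefficient: about 79% of the variability in BAC is explained by the number of cans of beer consumed.

# R2 is the squared correlation coefficient
r <- 0.89
paste("R2: ", round(r^2, 3))
## [1] "R2:  0.792"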
  5. Suppose we visit a bar, ask people how many drinks they have had, and also take their BAC. Do you think the relationship between number of drinks and BAC would be as strong as the relationship found in the Ohio State study?
    Answer:
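    Probably not. In the Ohio State study the number of beers was randomly assigned and BAC was measured thirty minutes later for every student. In a bar, drink counts are self-reported, drinks differ in alcohol content, and people drink over different lengths of time, so the relationship would likely be weaker.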

7.42 Babies. Is the gestational age (time between conception and birth) of a low birth-weight baby useful in predicting head circumference at birth? Twenty-five low birth-weight babies were studied at a Harvard teaching hospital; the investigators calculated the regression of head circumference (measured in centimeters) against gestational age (measured in weeks). The estimated regression line is \(headCircumference = 3.91 + 0.78 * gestationalAge\).

  1. What is the predicted head circumference for a baby whose gestational age is 28 weeks?
    Answer:
    \(headCircumference = 3.91 + 0.78 * gestationalAge\)
    \(headCircumference = 3.91 + 0.78 * 28\)
    \(headCircumference = 25.75\) cm

  2. The standard error for the coefficient of gestational age is 0.35, which is associated with df = 23. Does the model provide strong evidence that gestational age is significantly associated with head circumference?
    Answer:
    The slope estimate, 0.78, is positive, which indicates a positive association between the two variables. Since no p-value is given, we run a hypothesis test on the slope.
    Null hypothesis, Ho: the slope is 0 (no association between gestational age and head circumference).
    Alternative hypothesis, Ha: the slope is not 0 (some association).
    The test statistic is t = (0.78 - 0)/0.35 = 2.229 with df = 23 (n = 25). The one-tail area is about 0.018, so the two-sided p-value is about 0.036. Since this is below 0.05, we reject Ho: the data provide evidence that gestational age is associated with head circumference.
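
The arithmetic for both parts can be reproduced with a short sketch. It uses only the numbers given in the exercise; the variable names are illustrative, and `pt()` is base R's t-distribution function.

```r
# Values taken from the exercise text.
b0 <- 3.91     # intercept (cm)
b1 <- 0.78     # slope (cm per week of gestational age)
se_b1 <- 0.35  # standard error of the slope
df <- 23       # degrees of freedom (n - 2, with n = 25 babies)

# Part 1: predicted head circumference at 28 weeks.
pred_28 <- b0 + b1 * 28
print(pred_28)

# Part 2: t statistic for Ho: slope = 0, with one- and two-sided p-values.
t_stat <- (b1 - 0) / se_b1
p_one_sided <- pt(t_stat, df = df, lower.tail = FALSE)
p_two_sided <- 2 * p_one_sided
print(round(c(t = t_stat, one_sided = p_one_sided, two_sided = p_two_sided), 4))
```

Both p-values are printed because the conclusion at the 0.05 level is the same either way, while the reported number depends on whether the alternative hypothesis is one- or two-sided.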