1 Assessment

This session is assessed using MCQs (the questions are highlighted below). The MCQs themselves are on the BS2004 Blackboard site under Assessments and Feedback/Data analysis MCQs. The deadline is listed there and on the front page of the BS2004 Blackboard site. This assessment contributes about 4% of the module marks. Remember to save your commented script, then copy and paste it where you answer the MCQs (don’t worry about making it neat). You will receive feedback on this assessment after the submission deadline.

2 Investigating obesity using scatterplots

The obesity dataset contains data from 39 men. For each man, the investigators measured forearm skin fold (FOREARM, a proxy for body fat percentage), height (HT) and weight (WT).

Take FOREARM as the response variable and use scatterplots (for example with ggplot) to answer the question below about the explanatory variables. Draw two scatterplots: one with FOREARM on the y-axis and HT on the x-axis, and one with FOREARM on the y-axis and WT on the x-axis.
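As a rough sketch of the two plots, assuming the data are held in a data frame called obesity with columns FOREARM, HT and WT: the data frame name and the simulated values below are placeholders so that the code runs on its own, and they say nothing about the real answers. Swap in the real dataset before answering the MCQ.

```r
library(ggplot2)

# Placeholder data (made-up values, NOT the real obesity dataset).
# Replace this block with the real data before answering the MCQ.
set.seed(1)
HT <- rnorm(39, mean = 175, sd = 7)
WT <- 75 + 0.8 * (HT - 175) + rnorm(39, mean = 0, sd = 8)
obesity <- data.frame(FOREARM = rnorm(39, mean = 5, sd = 1), HT = HT, WT = WT)

# FOREARM (response) on the y-axis, HT (explanatory) on the x-axis
p_ht <- ggplot(obesity, aes(x = HT, y = FOREARM)) +
  geom_point()

# Same response, with WT on the x-axis
p_wt <- ggplot(obesity, aes(x = WT, y = FOREARM)) +
  geom_point()

print(p_ht)
print(p_wt)
```

When comparing the two plots, look at how tightly the points follow a trend rather than at the axis scales, since HT and WT are measured in different units.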

Blackboard MCQ: From your scatterplots, which explanatory variable is the better predictor of obesity? (We’ll define obesity as higher body fat, i.e. a larger forearm skin fold, FOREARM.)

3 Single explanatory variable models

First we are going to explore the data by looking at only one explanatory variable at a time. Set up one linear model with FOREARM ~ HT and another with FOREARM ~ WT. I would then use the summary command on each (anova would also work for what you need).
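A minimal sketch of the two single-variable models is shown here. The data frame name obesity and the simulated values are assumptions made so the example runs by itself; the object names m_ht and m_wt are also just illustrative.

```r
# Placeholder data (made-up values, NOT the real obesity dataset).
set.seed(1)
HT <- rnorm(39, mean = 175, sd = 7)
WT <- 75 + 0.8 * (HT - 175) + rnorm(39, mean = 0, sd = 8)
obesity <- data.frame(FOREARM = rnorm(39, mean = 5, sd = 1), HT = HT, WT = WT)

# One linear model per explanatory variable
m_ht <- lm(FOREARM ~ HT, data = obesity)
m_wt <- lm(FOREARM ~ WT, data = obesity)

# Full summaries: slope estimate, its p-value, and R-squared
print(summary(m_ht))
print(summary(m_wt))

# R-squared side by side: the proportion of variance in FOREARM
# explained by each variable when used alone
r2 <- c(HT = summary(m_ht)$r.squared, WT = summary(m_wt)$r.squared)
print(r2)
```

With a single explanatory variable, the p-value for the slope in summary matches the p-value in the anova table, so either command supports the comparison.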

Blackboard MCQ: Taking FOREARM as the response variable, which of the two explanatory variables, HT or WT, is the better predictor of obesity when used alone in a linear model?

4 Linear model with two explanatory variables

Now create a linear model with two explanatory variables (FOREARM ~ WT + HT).

Blackboard question: Using the Anova command from the car package (adjusted sum of squares), report the ANOVA statistics correctly for WT’s effect on FOREARM.

Blackboard question: Using the anova command (sequential sum of squares), rerun the analysis (FOREARM ~ WT + HT; the order of the explanatory variables is important!). Report the ANOVA statistics correctly for WT’s effect on FOREARM. Why is it different from the Anova command output?
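The two tables can be produced side by side as in the sketch below. It assumes the car package is installed and uses simulated placeholder data in a data frame called obesity, so the numbers it prints are not the ones to report. In the sequential table each term is tested after only the terms listed before it, whereas Anova from car (Type II by default for lm) tests each term after adjusting for the other.

```r
library(car)

# Placeholder data (made-up values, NOT the real obesity dataset).
# HT and WT are simulated as correlated so that term order can matter.
set.seed(1)
HT <- rnorm(39, mean = 175, sd = 7)
WT <- 75 + 0.8 * (HT - 175) + rnorm(39, mean = 0, sd = 8)
obesity <- data.frame(FOREARM = rnorm(39, mean = 5, sd = 1), HT = HT, WT = WT)

m_both <- lm(FOREARM ~ WT + HT, data = obesity)

# Adjusted sums of squares: each term adjusted for the other
adj_tab <- Anova(m_both)
print(adj_tab)

# Sequential sums of squares: WT is entered first, so it is NOT adjusted for HT
seq_tab <- anova(m_both)
print(seq_tab)

# Swapping the order puts WT last, so its sequential row is now adjusted for HT
seq_swapped <- anova(lm(FOREARM ~ HT + WT, data = obesity))
print(seq_swapped)
```

In each table, the statistics to report for WT are the F value, its degrees of freedom together with the residual degrees of freedom, and the p-value from the WT row.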

4.1 How do height and weight affect obesity?

If you used ANOVA tables to answer the previous questions, you now know whether height or weight affects obesity. But does being taller predict more obesity or less? To answer this sort of question you need to look at the slopes of the fitted lines. We can get these from the summary command applied to the lm output. If you aren’t sure, have a look back at the first-year lecture which discussed how to do a regression in R (second-to-last slide). Briefly, if the estimate is positive, it’s a positive relationship, and if the estimate is negative, it’s a negative relationship.
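One way to read off the direction of each relationship is sketched here, again with simulated placeholder data in a data frame called obesity (an assumed name). The signs printed by this example come from made-up numbers and are not the answer to the question.

```r
# Placeholder data (made-up values, NOT the real obesity dataset).
set.seed(1)
HT <- rnorm(39, mean = 175, sd = 7)
WT <- 75 + 0.8 * (HT - 175) + rnorm(39, mean = 0, sd = 8)
obesity <- data.frame(FOREARM = rnorm(39, mean = 5, sd = 1), HT = HT, WT = WT)

m_both <- lm(FOREARM ~ WT + HT, data = obesity)

# The coefficient table holds one row per term; the Estimate column is the slope
coef_tab <- coef(summary(m_both))
print(coef_tab)

# Pull out the two slopes and label the direction of each relationship
est <- coef_tab[c("WT", "HT"), "Estimate"]
direction <- ifelse(est > 0, "positive", "negative")
print(data.frame(estimate = est, direction = direction))
```

Each slope in the two-variable model describes the change in FOREARM per unit change in that variable with the other variable held constant, so its sign can differ from the sign in the single-variable model.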

Blackboard question: From your summary output of lm(FOREARM~WT+HT) how do the two explanatory variables affect obesity?