This session is assessed using MCQs (questions highlighted below). The actual MCQs can be found on the BS2004 Blackboard site under Assessments and Feedback/Data analysis MCQs. The deadline is listed there and on the front page of the BS2004 Blackboard site. This assessment contributes about 4% of module marks. Remember to save your commented script, then copy and paste it into the box provided where you answer the MCQs (don’t worry about making it neat). You will receive feedback on this assessment after the submission deadline.
The obesity dataset contains data from 39 men. For each man, the investigators measured the forearm skin fold (FOREARM, a proxy for bodyfat percentage), height (HT) and weight (WT).
Take FOREARM as the response variable and use scatterplots (ggplot?) to answer the question below about the explanatory variables. Draw two scatterplots: one with FOREARM on the y-axis and HT on the x-axis, and one with FOREARM on the y-axis and WT on the x-axis.
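The two scatterplots could be drawn along the following lines. This is only a sketch: the data frame obesity_sim is a simulated stand-in invented for illustration (the real file name and import step are not given here), and the relationships built into it are arbitrary, so the plots say nothing about the real answer. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Simulated stand-in for the obesity data (NOT the real dataset).
# Column names match the worksheet; the values are invented.
set.seed(1)
n <- 39
obesity_sim <- data.frame(HT = rnorm(n, mean = 175, sd = 7))
obesity_sim$WT <- 80 + 0.8 * (obesity_sim$HT - 175) + rnorm(n, sd = 10)
obesity_sim$FOREARM <- 2 + 0.05 * obesity_sim$WT + rnorm(n, sd = 1)

# FOREARM (response) against each explanatory variable in turn
p_ht <- ggplot(obesity_sim, aes(x = HT, y = FOREARM)) +
  geom_point() +
  labs(x = "HT", y = "FOREARM")

p_wt <- ggplot(obesity_sim, aes(x = WT, y = FOREARM)) +
  geom_point() +
  labs(x = "WT", y = "FOREARM")

print(p_ht)
print(p_wt)
```

With your own data, replace obesity_sim with whatever you named the imported dataset. A tighter cloud of points around a trend suggests a better predictor.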
Blackboard MCQ: From your scatterplots, which explanatory variable is the best predictor of obesity? (We’ll define obesity as higher bodyfat, i.e. larger forearm skin fold, FOREARM.)
First we are going to explore the data looking at only one explanatory variable at a time. Set up one linear model for FOREARM=HT and another for FOREARM=WT. I would then use the summary command on each (anova would also work for what you need).
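As a rough sketch of the two single-variable models, the code below fits each one and prints its summary. The data frame obesity_sim is again a simulated stand-in with invented values (not the real dataset), and the object names m_ht and m_wt are just illustrative choices.

```r
# Simulated stand-in for the obesity data (NOT the real dataset)
set.seed(1)
n <- 39
obesity_sim <- data.frame(HT = rnorm(n, mean = 175, sd = 7))
obesity_sim$WT <- 80 + 0.8 * (obesity_sim$HT - 175) + rnorm(n, sd = 10)
obesity_sim$FOREARM <- 2 + 0.05 * obesity_sim$WT + rnorm(n, sd = 1)

# One linear model per explanatory variable
m_ht <- lm(FOREARM ~ HT, data = obesity_sim)
m_wt <- lm(FOREARM ~ WT, data = obesity_sim)

# summary() gives the slope estimate, its p-value and R-squared
print(summary(m_ht))
print(summary(m_wt))

# R-squared (proportion of variation explained) for a quick comparison
r2 <- c(HT = summary(m_ht)$r.squared, WT = summary(m_wt)$r.squared)
print(r2)
```

Comparing the p-values and R-squared values of the two models is one reasonable way to judge which variable predicts FOREARM better on its own.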
Blackboard MCQ: Taking FOREARM as the response variable, which of the two explanatory variables HT and WT is the best predictor of obesity when used alone in a linear model?
Now create a linear model with two explanatory variables (FOREARM=WT+HT).
Blackboard question: Using the Anova command from the car package (adjusted sum of squares), report the ANOVA statistics correctly for WT’s effect on FOREARM.
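A minimal sketch of the adjusted sum of squares table follows. It assumes the car package is installed and uses the simulated stand-in obesity_sim (invented values, not the real dataset). For a linear model, Anova uses Type II tests by default, so in a model with no interaction each term is tested after adjusting for the other.

```r
library(car)

# Simulated stand-in for the obesity data (NOT the real dataset)
set.seed(1)
n <- 39
obesity_sim <- data.frame(HT = rnorm(n, mean = 175, sd = 7))
obesity_sim$WT <- 80 + 0.8 * (obesity_sim$HT - 175) + rnorm(n, sd = 10)
obesity_sim$FOREARM <- 2 + 0.05 * obesity_sim$WT + rnorm(n, sd = 1)

# Linear model with two explanatory variables
m_both <- lm(FOREARM ~ WT + HT, data = obesity_sim)

# Adjusted (Type II) sums of squares: each term adjusted for the other
tab_adj <- Anova(m_both)
print(tab_adj)

# The WT row holds the statistics to report: Sum Sq, Df, F value, Pr(>F)
print(tab_adj["WT", ])
```

When reporting, the F statistic takes two degrees of freedom: the Df from the WT row and the Df from the Residuals row.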
Blackboard question: Using the anova command (sequential sum of squares), rerun the analysis (FOREARM=WT+HT; the order of the explanatory variables is important!). Report the ANOVA statistics correctly for WT’s effect on FOREARM. Why is it different from the Anova command output?
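To see why order matters for sequential sums of squares, the sketch below fits the same two variables in both orders and compares the anova tables. As before, obesity_sim is a simulated stand-in with invented values (not the real dataset), and the object names are illustrative.

```r
# Simulated stand-in for the obesity data (NOT the real dataset)
set.seed(1)
n <- 39
obesity_sim <- data.frame(HT = rnorm(n, mean = 175, sd = 7))
obesity_sim$WT <- 80 + 0.8 * (obesity_sim$HT - 175) + rnorm(n, sd = 10)
obesity_sim$FOREARM <- 2 + 0.05 * obesity_sim$WT + rnorm(n, sd = 1)

# Same variables, different order in the formula
m_wt_first <- lm(FOREARM ~ WT + HT, data = obesity_sim)
m_ht_first <- lm(FOREARM ~ HT + WT, data = obesity_sim)

# anova() on a single lm gives sequential sums of squares:
# each term is tested after only the terms listed before it
seq_wt_first <- anova(m_wt_first)
seq_ht_first <- anova(m_ht_first)
print(seq_wt_first)
print(seq_ht_first)

# Sum of squares attributed to WT under each ordering
ss_wt <- c(WT_first = seq_wt_first["WT", "Sum Sq"],
           WT_last  = seq_ht_first["WT", "Sum Sq"])
print(ss_wt)
```

When the explanatory variables are correlated, the sum of squares attributed to WT changes with its position in the formula. The term fitted last is adjusted for everything before it, so its sequential sum of squares matches the adjusted one.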
If you used ANOVA tables to answer the previous questions, you now know whether height or weight affects obesity. But does being taller predict more obesity or less? To answer these sorts of questions you need to look at the slopes of the lines. We can get these from the summary command on the lm command output. If you aren’t sure, have a look back at the first year lecture which discussed how to do a regression in R (second last slide here). Briefly, if the estimate is positive, it’s a positive relationship, and if the estimate is negative, it’s a negative relationship.
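The slope estimates can be pulled out of the summary output as sketched below. The data frame obesity_sim is a simulated stand-in with invented values (not the real dataset), so the signs printed here are arbitrary and do not answer the question for the real data.

```r
# Simulated stand-in for the obesity data (NOT the real dataset)
set.seed(1)
n <- 39
obesity_sim <- data.frame(HT = rnorm(n, mean = 175, sd = 7))
obesity_sim$WT <- 80 + 0.8 * (obesity_sim$HT - 175) + rnorm(n, sd = 10)
obesity_sim$FOREARM <- 2 + 0.05 * obesity_sim$WT + rnorm(n, sd = 1)

m_both <- lm(FOREARM ~ WT + HT, data = obesity_sim)

# Coefficient table: one row per term, with Estimate, Std. Error,
# t value and Pr(>|t|)
coefs <- coef(summary(m_both))
print(coefs)

# Direction of each relationship: +1 = positive slope, -1 = negative slope
slope_sign <- sign(coefs[c("WT", "HT"), "Estimate"])
print(slope_sign)
```

Each estimate in a two-variable model is the change in FOREARM per unit change in that variable with the other variable held constant, so check the p-value alongside the sign before interpreting a slope.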
Blackboard question: From your summary output of lm(FOREARM~WT+HT), how do the two explanatory variables affect obesity?