Solutions to Homework from Day 08

Question 0.1:

a. The response variable is time spent sleeping the previous night (quantitative) and the explanatory variable is major (categorical; not binary).

b. The response variable is the score on the final exam (quantitative) and the explanatory variable is the score on the first exam (quantitative).

c. The response variable is the time spent on the final exam (quantitative) and the explanatory variable is gender (categorical; binary).

d. The response variable is handedness (categorical; binary); there are several explanatory variables: major (categorical; not binary), gender (categorical; binary), and time spent on the final exam (quantitative).

Question 0.7:

a. The registrar is interested in all members of the entering class.

b. The descriptive summaries computed by the registrar are parameters since they describe the entire population rather than a sample.

c. The Mathematics department is interested in the population of all students who want to take a mathematics course.

d. The numerical summaries obtained by the Mathematics department are statistics since they refer to a sample of the population of interest.

Question 0.10:

a. Since Wins = 4.6 + 0.5PF - 0.3PA + \( \epsilon \), a team that increases its scoring average by 3 points per game (with points allowed held fixed) increases its expected number of wins by (0.5)*3 = 1.5.

b. A team that decreases its points allowed by an average of 3 points per game (with points scored held fixed) increases its expected number of wins by (-0.3)*(-3) = 0.9.

c. Assuming that it takes the same amount of work to improve the offense by 3 points per game as to improve the defense by 3 points per game, the team should concentrate on improving its offense: the expected gain is 1.5 wins, compared with 0.9 wins for the defense.
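The comparison in parts a through c can be checked with a few lines of R. This is only a sketch of the arithmetic; the variable names (`b_pf`, `b_pa`, `gain_offense`, `gain_defense`) are chosen here for illustration and do not come from the textbook.

```r
# Coefficients taken from the model Wins = 4.6 + 0.5*PF - 0.3*PA
b_pf <- 0.5    # change in expected wins per extra point scored per game
b_pa <- -0.3   # change in expected wins per extra point allowed per game

# Change in expected wins for a 3-point improvement on each side of the ball
gain_offense <- b_pf * 3      # score 3 more points per game
gain_defense <- b_pa * (-3)   # allow 3 fewer points per game

c(offense = gain_offense, defense = gain_defense)
```

The larger of the two values identifies where a 3-point improvement pays off more under this model.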

d. The model predicts that the 2010 Green Bay Packers should have won

4.6 + 0.5(35) - 0.3(22.44) = 15.368

games. Since they won 15 games, their residual was 15-15.368 = -0.368.
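The prediction and residual in part d can be reproduced in R. The helper name `predict_wins` is hypothetical and introduced only for this sketch; the coefficients and the inputs (35 points scored, 22.44 points allowed, 15 actual wins) are the ones used above.

```r
# Fitted part of the model: Wins = 4.6 + 0.5*PF - 0.3*PA
# (predict_wins is an illustrative helper, not a function from any package)
predict_wins <- function(pf, pa) {
  4.6 + 0.5 * pf - 0.3 * pa
}

predicted <- predict_wins(pf = 35, pa = 22.44)  # model prediction
actual    <- 15                                 # games actually won
residual  <- actual - predicted                 # observed minus predicted

c(predicted = predicted, residual = residual)
```

A negative residual means the team won fewer games than the model predicted.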

Question 0.14:

We load the data:

setwd("/Users/traves/Dropbox/SM339/day 08")
WL <- read.csv("WeightLossIncentive7.csv")
summary(WL)

##        Group      Month7Loss    
##  Control  :18   Min.   :-22.00  
##  Incentive:15   1st Qu.: -0.50  
##                 Median :  5.50  
##                 Mean   :  6.08  
##                 3rd Qu.: 16.00  
##                 Max.   : 24.50

attach(WL)

CHOOSE:

The model is

Month7Loss = \( \mu_1 \) + \( \epsilon_1 \)

(with \( \epsilon_1 \sim N(0,\sigma_1) \)) for the control group and

Month7Loss = \( \mu_2 \) + \( \epsilon_2 \)

(with \( \epsilon_2 \sim N(0,\sigma_2) \)) for the incentive group.

FIT:

We estimate the parameters by first subsetting the Month7Loss variable:

WL7C = Month7Loss[which(Group == "Control")]
WL7I = Month7Loss[which(Group == "Incentive")]

Then we compute the fit statistics:

mean(WL7C)  # control mean = 4.64

## [1] 4.639

mean(WL7I)  # incentive mean = 7.8

## [1] 7.8

sd(WL7C)  # control sample standard deviation = 9.84

## [1] 9.835

sd(WL7I)  # incentive sample standard deviation = 12.06

## [1] 12.06
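As an aside, the same group-wise summaries can be obtained without `attach()` or manual subsetting by using `with()` and `tapply()`. The sketch below runs on a small made-up data frame (`toy`) so that it is self-contained; its values are invented for illustration and are not the WeightLossIncentive7 data.

```r
# Made-up toy values for illustration only; NOT the homework data set
toy <- data.frame(
  Group      = factor(c("Control", "Control", "Control",
                        "Incentive", "Incentive", "Incentive")),
  Month7Loss = c(2, 4, 6, 5, 10, 15)
)

# One call per statistic, computed separately within each level of Group
group_means <- with(toy, tapply(Month7Loss, Group, mean))
group_sds   <- with(toy, tapply(Month7Loss, Group, sd))

group_means
group_sds
```

Replacing `toy` with `WL` should give the four fit statistics computed above, assuming the data frame has been loaded as shown earlier.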

ASSESS:

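We need to check whether the residuals are normally distributed. Within each group the residuals are just the observations minus the group mean, so the residuals have the same shape as the data themselves. We can therefore check normality using density plots of each group's data.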

require(lattice)

## Loading required package: lattice

require(latticeExtra)

## Loading required package: latticeExtra

## Loading required package: RColorBrewer

densityplot(WL7C)

[Figure: density plot of WL7C (control group)]

densityplot(WL7I)

[Figure: density plot of WL7I (incentive group)]

Both plots suggest some departure from normality, especially for the incentive group, whose data are left-skewed. This means that we need to be cautious about how strongly we state our statistical conclusions, and we may have to reconsider our methods.

Now we conduct a one-sided two-sample t-test of the hypotheses:

\( H_0 \): \( \mu_1-\mu_2 = 0 \)

\( H_1 \): \( \mu_1 - \mu_2 < 0 \)

t.test(WL7C, WL7I, conf.level = 0.95, alternative = "less")

## 
##  Welch Two Sample t-test
## 
## data:  WL7C and WL7I 
## t = -0.8144, df = 26.99, p-value = 0.2113
## alternative hypothesis: true difference in means is less than 0 
## 95 percent confidence interval:
##  -Inf 3.45 
## sample estimates:
## mean of x mean of y 
##     4.639     7.800
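To see where these numbers come from, the Welch statistic, its degrees of freedom, the one-sided p-value, and the upper confidence bound can be recomputed from the summary statistics alone. This is a sketch that uses the rounded means and standard deviations reported above and assumes the group sizes 18 and 15 shown in the `summary(WL)` output, so the results agree with `t.test` only up to rounding. The variable names are chosen here for illustration.

```r
# Summary statistics copied (rounded) from the output above
n1 <- 18; m1 <- 4.639; s1 <- 9.835   # control group
n2 <- 15; m2 <- 7.800; s2 <- 12.06   # incentive group

# Estimated variance of each sample mean
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch standard error and test statistic for mu_1 - mu_2
se     <- sqrt(v1 + v2)
t_stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# One-sided p-value for H_1: mu_1 - mu_2 < 0 (lower tail)
p_value <- pt(t_stat, df_welch)

# Upper end of the one-sided 95% confidence interval (-Inf, upper)
upper <- (m1 - m2) + qt(0.95, df_welch) * se

round(c(t = t_stat, df = df_welch, p = p_value, upper = upper), 4)
```

Because the Welch test does not pool the two standard deviations, it is consistent with the model above, which allows \( \sigma_1 \) and \( \sigma_2 \) to differ.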

USE:

The p-value of 0.2113 is larger than the 0.05 level of significance, so we fail to reject the null hypothesis. The data do not provide significant evidence that the financial incentives help dieters lose more weight over a 7-month interval.