setwd("/Users/traves/Dropbox/SM339/homework 2")
WL <- read.csv("WeightLoss4.csv")
summary(WL)
## WeightLoss Group
## Min. :-17.00 Control :19
## 1st Qu.: 1.75 Incentive:17
## Median : 7.75
## Mean : 9.47
## 3rd Qu.: 18.62
## Max. : 30.00
attach(WL)
a. There are 17 people in the Incentive group.
b. Find a 90% CI for the mean weight loss of the Incentive group.
WLI <- WL[which(Group == "Incentive"), ]
t.test(WLI$WeightLoss, conf.level = 0.9)
##
## One Sample t-test
##
## data: WLI$WeightLoss
## t = 6.866, df = 16, p-value = 3.794e-06
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
## 11.69 19.66
## sample estimates:
## mean of x
## 15.68
A 90% confidence interval for the mean weight loss of the Incentive group is (11.7 lbs, 19.7 lbs).
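The interval can also be rebuilt by hand from the summary statistics printed above. This is a minimal sketch that assumes the rounded values in the t.test output (mean 15.68, t = 6.866, n = 17), so its endpoints agree with R's only up to rounding. The variable names are illustrative.

```r
# Rebuild the 90% CI from the printed (rounded) summary statistics.
xbar   <- 15.68   # sample mean from the t.test output
t_stat <- 6.866   # t statistic for H0: mu = 0, so t = xbar / se
n      <- 17      # number of people in the Incentive group

se     <- xbar / t_stat                  # standard error of the mean
t_crit <- qt(0.95, df = n - 1)           # a 90% CI leaves 5% in each tail
ci     <- xbar + c(-1, 1) * t_crit * se  # lower and upper endpoints
print(round(ci, 1))
```

The critical value uses the 0.95 quantile because a two-sided 90% interval splits the remaining 10% evenly between the two tails.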
c. We test the following hypotheses:
\( H_0 \): the mean weight loss \( \mu \) for people in the Incentive group is at least 18 lbs (\( \mu \geq 18 \))
\( H_1 \): the mean weight loss \( \mu \) for people in the Incentive group is less than 18 lbs (\( \mu < 18 \))
d. We run the R-command to do the test of hypotheses:
t.test(WLI$WeightLoss, mu = 18, alternative = "less")
##
## One Sample t-test
##
## data: WLI$WeightLoss
## t = -1.018, df = 16, p-value = 0.162
## alternative hypothesis: true mean is less than 18
## 95 percent confidence interval:
## -Inf 19.66
## sample estimates:
## mean of x
## 15.68
The test statistic is -1.02; under the null hypothesis it follows a Student's t distribution with 16 degrees of freedom. The p-value is 0.162.
e. Since the p-value (0.162) is greater than \( \alpha = 0.05 \), we fail to reject the null hypothesis: there is not sufficient evidence that the mean weight loss is less than 18 lbs. The study's claim is consistent with the evidence.
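As a check, the test statistic and p-value can be recomputed from the summary statistics. The sketch below assumes the rounded values printed in the two t.test outputs (mean 15.68, and t = 6.866 from the test against 0, which gives the standard error), so it matches R's results only approximately.

```r
# Recompute the one-sided test from the printed (rounded) summary statistics.
xbar <- 15.68          # sample mean of the Incentive group
mu0  <- 18             # hypothesized mean under H0
n    <- 17             # sample size
se   <- 15.68 / 6.866  # standard error, recovered from the test against mu = 0

t_stat  <- (xbar - mu0) / se        # about -1.02
p_value <- pt(t_stat, df = n - 1)   # lower-tail area, since H1 is mu < 18
print(c(t = t_stat, p = p_value))
```

The lower tail is used because the alternative hypothesis is "less"; a two-sided test would double this area.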
f. A type I error in this situation would be a finding that there is evidence to reject the study's claim, when in fact the study is correct.
g. A type II error in this situation would be a finding that there is not enough evidence to reject the study's claim, when in fact the study is incorrect.
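The two error types can be illustrated with a small simulation. This is only a sketch under stated assumptions: weight losses are taken to be normally distributed, the standard deviation 9.4 is backed out of the standard error above (2.28 times the square root of 17), and the alternative mean of 12 lbs is a hypothetical value chosen for illustration, not something from the study.

```r
# Simulate how often the one-sided t-test commits each type of error.
set.seed(339)
n     <- 17    # sample size of the Incentive group
sigma <- 9.4   # assumed SD, backed out of the standard error in the output
alpha <- 0.05  # significance level

# TRUE when a simulated sample leads us to reject H0: mu >= 18.
rejects <- function(true_mean) {
  x <- rnorm(n, mean = true_mean, sd = sigma)
  t.test(x, mu = 18, alternative = "less")$p.value < alpha
}

# Type I error: the claim is true (mu = 18) but we reject it.
type1_rate <- mean(replicate(2000, rejects(18)))
# Type II error: the claim is false (hypothetical mu = 12) but we fail to reject.
type2_rate <- mean(!replicate(2000, rejects(12)))
print(c(type1 = type1_rate, type2 = type2_rate))
```

The Type I rate should land near the chosen \( \alpha \), while the Type II rate depends entirely on how far the true mean is assumed to sit below 18.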
setwd("/Users/traves/Dropbox/sm339/homework 2")
satGPA <- read.csv("satGPA.csv")
summary(satGPA)
## sex SATV SATM SATSum HSGPA
## Min. :1.00 Min. :24.0 Min. :29.0 Min. : 53 Min. :1.8
## 1st Qu.:1.00 1st Qu.:43.0 1st Qu.:49.0 1st Qu.: 93 1st Qu.:2.8
## Median :1.00 Median :49.0 Median :55.0 Median :103 Median :3.2
## Mean :1.48 Mean :48.9 Mean :54.4 Mean :103 Mean :3.2
## 3rd Qu.:2.00 3rd Qu.:54.0 3rd Qu.:60.0 3rd Qu.:113 3rd Qu.:3.7
## Max. :2.00 Max. :76.0 Max. :77.0 Max. :144 Max. :4.5
## FYGPA
## Min. :0.00
## 1st Qu.:1.98
## Median :2.46
## Mean :2.47
## 3rd Qu.:3.02
## Max. :4.00
attach(satGPA)
drop = FYGPA - HSGPA
a. The largest population is the freshman class at Dartmouth College.
b. Here's the plot:
plot(drop ~ HSGPA, col = "red", pch = 19, main = "GPA drop vs. High School GPA",
ylab = "GPA drop", xlab = "High School GPA")
c. The mathematical equation for the model drop~HSGPA is
drop = \( \beta_0 \) + \( \beta_1 \)*HSGPA + \( \epsilon \),
where \( \epsilon \sim N(0, \sigma) \).
d. We fit the model:
gradedrop = lm(drop ~ HSGPA)
summary(gradedrop)
##
## Call:
## lm(formula = drop ~ HSGPA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3054 -0.3742 0.0394 0.4191 1.7524
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0913 0.1179 0.77 0.44
## HSGPA -0.2569 0.0363 -7.07 3e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.622 on 998 degrees of freedom
## Multiple R-squared: 0.0477, Adjusted R-squared: 0.0467
## F-statistic: 49.9 on 1 and 998 DF, p-value: 2.96e-12
The line of best fit has equation
drop = 0.09132 - 0.25686*HSGPA.
e. The six assumptions:
LINEARITY: There is a linear association between the GPA drop from high school to college and the high school GPA.
ZERO MEAN: The mean error in the GPA drop (compared with the model's prediction) for each fixed value of high school GPA is zero.
CONSTANT VARIANCE: The variance of the errors in the model (the spread of the GPA drop for each fixed value of the high school GPA) does not depend on the high school GPA.
INDEPENDENCE: The GPA drop for any student is independent of the GPA drop for any other student.
RANDOM: The data are obtained from a random process (e.g. the students in the sample are selected randomly).
NORMALITY: The errors (the actual GPA drops minus the predicted GPA drop) are normally distributed.
f. Plot residuals:
plot(gradedrop$fitted, gradedrop$residuals, col = "red", pch = 19, main = "Grade Drop: Residuals vs. Fitted Values",
ylab = "Residual values (in GPA)", xlab = "Fitted values (in GPA)")
abline(0, 0, col = "blue", lwd = 4)
The point at the bottom left looks pretty odd (partly because it is the only point so far to the left).
which(gradedrop$fitted < -1) # student 612
## 612
## 612
satGPA[612, ]
## sex SATV SATM SATSum HSGPA FYGPA
## 612 1 42 37 79 4.5 1.13
Note that this student is unusual in that their high school GPA was 4.5. This may be a score on a nonstandard scale (some high schools have GPAs that go to 5 rather than 4) or could be a data entry error.
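The position of this point on the residual plot can be verified by hand. The sketch below uses the fitted coefficients and the row printed for student 612; the variable names are illustrative.

```r
# Locate student 612 on the residuals-vs-fitted plot by hand.
b0 <- 0.09132    # fitted intercept
b1 <- -0.25686   # fitted slope on HSGPA
hsgpa <- 4.5     # student 612's high school GPA
fygpa <- 1.13    # student 612's first-year GPA

observed_drop <- fygpa - hsgpa               # drop = FYGPA - HSGPA
fitted_drop   <- b0 + b1 * hsgpa             # x-coordinate on the plot
residual      <- observed_drop - fitted_drop # y-coordinate on the plot
print(c(fitted = fitted_drop, residual = residual))
```

The residual agrees with the minimum residual in the model summary (-2.3054), so this student is both the leftmost and the lowest point on the plot.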
g. Density plot:
require(lattice)
## Loading required package: lattice
require(latticeExtra)
## Loading required package: latticeExtra
## Loading required package: RColorBrewer
densityplot(gradedrop$residuals, main = "Density plot of the residuals", xlab = "grade drop (in units of GPA)")
The density plot looks roughly like a normal distribution, though it is hard to tell from a density plot whether the tails are too heavy. Still, the visual evidence suggests that the residuals are approximately normally distributed.
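A normal quantile plot often shows tail behavior more clearly than a density plot. Since the data file is not reproduced here, the sketch below runs on simulated stand-in data generated from the fitted model; the uniform spread of high school GPAs and the object names are assumptions for illustration only. In the session above, the same two plotting calls could be applied to gradedrop$residuals.

```r
# Normal quantile plot of regression residuals, on SIMULATED stand-in data.
set.seed(339)
n_sim     <- 1000
hsgpa_sim <- runif(n_sim, min = 1.8, max = 4.5)  # assumed spread of HS GPAs
drop_sim  <- 0.09132 - 0.25686 * hsgpa_sim +
  rnorm(n_sim, mean = 0, sd = 0.622)             # errors with the fitted RSE

fit_sim <- lm(drop_sim ~ hsgpa_sim)
res_sim <- resid(fit_sim)

# Points close to the reference line indicate approximately normal residuals;
# heavy tails show up as points bending away from the line at both ends.
qqnorm(res_sim, main = "Normal Q-Q plot of the residuals (simulated data)")
qqline(res_sim, col = "blue", lwd = 2)
```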
h. The “typical” error is the residual standard error, 0.62 units of GPA.
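For reference, the residual standard error reported by summary() is computed from the residuals with the standard formula below, where \( y_i \) is the observed GPA drop and \( \hat{y}_i \) is the fitted value. Here n = 1000, which gives the 998 degrees of freedom shown in the output.

```latex
\hat{\sigma}_{\epsilon} = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n - 2}},
\qquad n - 2 = 998
```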
i. The model says that a high school student with a GPA of 0 should expect their GPA to increase by 0.091 in college (it can hardly go down!) and that, for each additional unit of high school GPA, the expected change in GPA decreases by 0.257 units. For example, a high school student with a 4.0 GPA should expect their GPA to drop by about 0.936 in their freshman year of college (0.09132 - 0.25686*4 = -0.936). The advisor was pretty close to being right, since a difference of 1 unit of GPA corresponds to a change of about 10% in a grade (for reasonable grades).
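The model's predictions for a few high school GPAs can be tabulated directly from the fitted line. This sketch uses the rounded coefficients from the model summary; predict_drop is an illustrative helper, not part of the original analysis.

```r
# Predicted change in GPA (FYGPA - HSGPA) from the fitted line.
b0 <- 0.09132    # fitted intercept
b1 <- -0.25686   # fitted slope on HSGPA

predict_drop <- function(hsgpa) b0 + b1 * hsgpa

hs   <- c(0, 2, 3, 4)     # example high school GPAs
pred <- predict_drop(hs)  # negative values mean the GPA goes down
print(data.frame(HSGPA = hs, predicted_drop = round(pred, 3)))
```

Negative values correspond to a decrease from high school to college, so the predicted change of roughly -0.94 for a 4.0 student is a drop of nearly one full grade point.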