Chapter 10 (cont.)

Example 2: Public University Tuition

1

(a)

Load the data:

tuition<-read.file("/home/emesekennedy/Data/Ch10/tuition.txt", header=T, sep="\t")

## Reading data with read.table()

(b)

xyplot(Year.2008~Year.2000, data=tuition)

The relationship between the tuition for the two years is strong, positive, and linear.

(c)

y<-lm(Year.2008~Year.2000, data=tuition)
y

## 
## Call:
## lm(formula = Year.2008 ~ Year.2000, data = tuition)
## 
## Coefficients:
## (Intercept)    Year.2000  
##    1132.750        1.692

The equation of the least-squares regression line is $\hat{y}=1.692x+1132.75$ where $x$ is the tuition for 2000.

Create a function from the line:

f<-makeFun(y)

Add the line to the scatterplot:

plotFun(f(Year.2000)~Year.2000, data=tuition, add=T)

(d)

cor(Year.2008~Year.2000, data=tuition)

## [1] 0.8844314

cor(Year.2008~Year.2000, data=tuition)^2

## [1] 0.7822189

The correlation is $r=.8844$, and $r^2=.7822$. This means that 78.22% of the variation in the 2008 tuition is explained by the least-squares regression line.
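As a quick arithmetic check, the printed values can be tied together using only numbers from the output. The sketch below squares the correlation to recover $r^2$ and, assuming the 31 residual degrees of freedom reported by `summary(y)` in Problem 2, converts $r$ into the $t$ statistic for the slope. The variable names are illustrative and are not part of the original worksheet.

```r
# Values copied from the R output in this example
r  <- 0.8844314   # correlation printed by cor()
df <- 31          # residual degrees of freedom (n - 2), from summary(y)

# Proportion of variation in Year.2008 explained by the line
r_squared <- r^2

# In simple linear regression, the t statistic for the slope
# can be written in terms of r alone
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

round(r_squared, 4)
round(t_stat, 3)
```

Up to rounding, these should agree with the `Multiple R-squared` and the `t value` for `Year.2000` in the regression summary.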

(e)

mplot(y, which=1)

## [[1]]

The residuals look fairly random, but there are a couple of large values.

(f)

mplot(y, which=2)

## [[1]]

The residuals look very close to Normal.

2

(a)

$H_0: \rho=0$ (no correlation between the tuitions for 2000 and 2008)

$H_a: \rho>0$

(b)

summary(y)

## 
## Call:
## lm(formula = Year.2008 ~ Year.2000, data = tuition)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2727.22  -691.07    64.44   750.01  2521.62 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1132.7501   701.4152   1.615    0.116    
## Year.2000      1.6924     0.1604  10.552 8.75e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1134 on 31 degrees of freedom
## Multiple R-squared:  0.7822, Adjusted R-squared:  0.7752 
## F-statistic: 111.3 on 1 and 31 DF,  p-value: 8.746e-12

8.746/2

## [1] 4.373

The test statistic is $t=10.552$. Because $H_a$ is one-sided, the $P$-value is half of the two-sided value reported by R, which gives $4.373\times 10^{-12}$. We can conclude that the data provide very strong evidence that there is a positive correlation between the tuition for 2000 and 2008.
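The one-sided $P$-value can also be computed directly from the $t$ distribution instead of halving the printed two-sided value. This is a minimal sketch using base R's `pt()`, with the test statistic and degrees of freedom copied from `summary(y)`; the variable names are illustrative.

```r
# Test statistic and degrees of freedom from summary(y)
t_stat <- 10.552
df     <- 31

# Two-sided P-value, as reported in the Pr(>|t|) column
p_two_sided <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# One-sided P-value for H_a: rho > 0 (upper tail only)
p_one_sided <- pt(t_stat, df = df, lower.tail = FALSE)

p_two_sided
p_one_sided
```

Because `t_stat` is entered here rounded to three decimals, the results may differ slightly from the summary output in the last digit.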

3

(a)

The formula for the 95% confidence interval for the slope is estimate $\pm t^*$ SE, where $t^*$ comes from the $t$ distribution with $n-2=31$ degrees of freedom. Find $t^*$:

xqt(.975, df=31)

## [1] 2.039513

1.6924-2.04*.1604

## [1] 1.365184

1.6924+2.04*.1604

## [1] 2.019616

The 95% confidence interval for the slope is $(1.3652, 2.0196)$. This means that we are 95% confident that each $1 increase in 2000 tuition is associated with an average increase in 2008 tuition of between $1.37 and $2.02, or equivalently that each $1000 increase in 2000 tuition is associated with an average increase in 2008 tuition of between $1365 and $2020.
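The same interval can be computed in one step without rounding $t^*$ to 2.04. The sketch below uses base R's `qt()` with the estimate and standard error copied from `summary(y)`; the variable names are illustrative.

```r
# Slope estimate and standard error from summary(y)
estimate <- 1.6924
se       <- 0.1604
df       <- 31

# Critical value for 95% confidence (2.5% in each tail)
t_star <- qt(0.975, df = df)

# estimate - t* SE and estimate + t* SE
ci <- estimate + c(-1, 1) * t_star * se
round(ci, 4)
```

The endpoints differ from the hand calculation only in the fourth decimal place, because the unrounded $t^*$ is used. When the fitted model `y` is available, `confint(y, "Year.2000", level = 0.95)` should give the same interval directly.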

(b)

predict(y, data.frame(Year.2000=5100), interval="prediction", level=.95)

##        fit      lwr      upr
## 1 9763.775 7397.608 12129.94

The fitted value is 9764, and the 95% prediction interval is $(7398, 12130)$. This means that we are 95% confident that the 2008 tuition for Stat U will be between $7398 and $12130.

(c)

predict(y, data.frame(Year.2000=8700), interval="prediction", level=.95)

##        fit      lwr      upr
## 1 15856.26 13084.74 18627.79

The fitted value is 15856, and the 95% prediction interval is $(13085, 18628)$. This means that we are 95% confident that the 2008 tuition for Moneypit U will be between $13085 and $18628.
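The fitted values returned by `predict()` are just the regression line evaluated at the new 2000 tuition. As a rough check, the sketch below rebuilds the line from the rounded coefficients in `summary(y)`; the function name `predict_2008` is hypothetical and used only for illustration.

```r
# Coefficients copied (rounded) from summary(y)
intercept <- 1132.7501
slope     <- 1.6924

# Hypothetical helper: predicted 2008 tuition for a given 2000 tuition
predict_2008 <- function(tuition_2000) {
  intercept + slope * tuition_2000
}

# Stat U (5100) and Moneypit U (8700)
fitted_by_hand <- predict_2008(c(5100, 8700))
fitted_by_hand
```

These values agree with the `fit` column from `predict()` to within a dollar. The small gap comes from using the slope rounded to four decimals.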

(d)

The prediction for part (c) might not be as accurate because $8700 is outside of the range of 2000 tuition values for the data. This is an example of extrapolation.
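The two prediction intervals also show this numerically. The standard error for predicting a single new observation is $s\sqrt{1+\frac{1}{n}+\frac{(x^*-\bar{x})^2}{\sum (x_i-\bar{x})^2}}$, so the interval gets wider as $x^*$ moves away from the mean 2000 tuition $\bar{x}$. The sketch below compares the half-widths of the intervals from parts (b) and (c), using only the printed `predict()` output; the data frame and its column names are illustrative.

```r
# Interval endpoints copied from the predict() output in parts (b) and (c)
intervals <- data.frame(
  x_2000 = c(5100, 8700),
  lwr    = c(7397.608, 13084.74),
  upr    = c(12129.94, 18627.79)
)

# Half-width (margin of error) of each 95% prediction interval
intervals$half_width <- (intervals$upr - intervals$lwr) / 2
intervals
```

The interval at 8700 is noticeably wider than the one at 5100, which is consistent with 8700 lying farther from the center of the data. The interval width alone does not account for the added risk that the linear pattern may not hold outside the observed range.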

MTH 119-05 (2:00pm)

Emese Kennedy

December 10
