March 4, 2013 Class Notes

Survey Project

  1. Show the project description on the Moodle site.
  2. Ask for suggestions about project areas and frame some hypotheses.
  3. Have students form groups around common interests. Enter their information on the Google Doc.


\( R^2 \) is always between 0 and 1.

Read in the manipulate software.

## Retrieving from
## [1] TRUE

Run littleR.

The blue vector is constructed from a linear combination of a red vector and a black vector. Move the slider to change the amount of the black vector that goes into the sum.

The vectors are being displayed in both variable space and case space. Notice how the roundness of the case-space cloud reflects the angle. The correlation coefficient, r, corresponds to the roundness and to the angle \( \theta \) between the blue and black vectors.
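
The link between r and the angle can be checked numerically: for mean-centered vectors, r equals \( \cos\theta \). The sketch below is not the littleR code. It uses made-up vectors, and the names `black`, `red`, `blue`, and the weight `w` are assumptions standing in for the slider demo.

```r
# Made-up vectors standing in for the black and red vectors (assumed data).
set.seed(1)
black <- rnorm(50)
red   <- rnorm(50)
w     <- 0.7                  # hypothetical slider setting
blue  <- red + w * black      # linear combination of red and black

# Center each vector, then compute the cosine of the angle between them.
center <- function(v) v - mean(v)
b <- center(blue)
k <- center(black)
cos_theta <- sum(b * k) / sqrt(sum(b^2) * sum(k^2))

r         <- cor(blue, black)
theta_deg <- acos(cos_theta) * 180 / pi

# r and cos_theta agree; theta_deg is the angle in degrees.
print(c(r = r, cos_theta = cos_theta, theta_deg = theta_deg))
```

Raising `w` puts more of the black vector into the blue one, which shrinks the angle and pushes r toward 1.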

Strategies for Including Explanatory Terms

Context to think about: Grades and the GPA.

Suppose that a new measure of academic achievement were proposed. How would you decide if it's better than GPA?

Partial versus total

Suppose I have a used car. I'm going away for a year and thinking of selling it. On the other hand, it would be nice to have a car available when I get back. How much will it cost me to delay selling the car for a year?

Consider several models of used car prices fitted to data on used Hondas:

cars = fetchData("used-hondas.csv")
## Retrieving from
mod1 = lm(Price ~ Age, data = cars)
mod2 = lm(Price ~ Mileage, data = cars)
mod3 = lm(Price ~ Mileage + Age, data = cars)
mod4 = lm(Price ~ Mileage * Age, data = cars)

Each of the first three is nested in the 4th. You can play around with the various models this way:

mLM(Price ~ Age * Mileage, data = cars)

Include and exclude terms to try to answer this question:

Which is the right model to use to inform my car-selling decision?

It is tempting to use mod1, since Age is the only variable that I'm interested in.

xyplot(fitted(mod1) + Price ~ Age, data = cars)

[Plot: Price and the fitted values of mod1 versus Age]

It's a bit hard to see the model. Let's try another way of plotting it.

f1 = makeFun(mod1)
plotPoints(Price ~ Age, data = cars)
plotFun(f1(Age) ~ Age, add = TRUE)

[Plot: Price versus Age with the mod1 function overlaid]

How much the price goes down in a year can depend on how old the car is, but you can get the rate from the derivative of the model function. Let's evaluate that derivative for an 8-year-old car with 50,000 miles (mod1 has no Mileage term, so only Age enters):

f1 = makeFun(mod1)
df1 = D(f1(Age) ~ Age)
df1(Age = 8)
##     1 
## -1559

Or, since I'm really thinking about a 1-year difference:

f1(Age = c(9, 8), Mileage = 50000)
##    1    2 
## 6465 8023

Take the difference. QUESTION: Why do I get the same answer from the finite difference and the derivative?
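
One way to explore the question is to repeat the comparison on a straight-line model fitted to made-up data. This is a sketch only: the data frame `toy` and its numbers are invented for illustration and are not the used-Honda data, and base R's `coef()` and `predict()` stand in for `D()` and `makeFun()`.

```r
# Made-up prices that fall roughly linearly with age (assumed data).
set.seed(2)
age   <- runif(80, 0, 15)
price <- 20000 - 1500 * age + rnorm(80, sd = 1500)
toy   <- data.frame(age, price)

fit <- lm(price ~ age, data = toy)

# Derivative of the fitted line with respect to age: the slope coefficient.
slope <- unname(coef(fit)["age"])

# Finite difference: predicted price at age 9 minus predicted price at age 8.
p <- predict(fit, newdata = data.frame(age = c(8, 9)))
one_year_change <- unname(p[2] - p[1])

print(c(slope = slope, one_year_change = one_year_change))
```

Trying other pairs of ages one year apart, or adding a curved term such as `I(age^2)` to the formula, shows when the two numbers stop agreeing.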

But let's consider mod4:

f4 = makeFun(mod4)
plotFun(f4(Age = a, Mileage = m) ~ a & m, a.lim = c(0, 10), m.lim = c(0, 1e+05), 
    levels = 1000 * (1:20), npts = 200)
plotPoints(Mileage ~ Age, data = cars, add = TRUE, pch = 20, col = "red")

[Plot: contours of the mod4 price function over Age and Mileage, with the cars shown as red points]

Examine the change as age goes up by one year. Should I hold mileage constant or should I let the mileage change with age in the typical way?

Here's the same question another way: Do I want to compare cars with different mileages and different ages, or do I want to compare cars with different ages and the same mileage?

Calculating the partial derivative or partial change:

f4(Age = c(8, 9), Mileage = 50000)
##     1     2 
## 12815 12238
df4da = D(f4(Age = Age, Mileage = Mileage) ~ Age)
df4da(Age = 8, Mileage = 50000)
##    1 
## -577
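
The partial change above holds mileage fixed. For contrast, a total change lets mileage move with age in the typical way. The sketch below computes both on made-up data. Everything in it is an assumption for illustration: the data frame `toy`, the numbers used to generate it, and the helper model `fit_miles` that supplies the "typical" mileage at each age.

```r
# Made-up used-car data in which older cars tend to have more miles (assumed).
set.seed(3)
n       <- 200
age     <- runif(n, 0, 12)
mileage <- 12000 * age * exp(rnorm(n, sd = 0.3))
price   <- 22000 - 500 * age - 0.08 * mileage + rnorm(n, sd = 1000)
toy     <- data.frame(age, mileage, price)

fit_both  <- lm(price ~ age * mileage, data = toy)  # same form as mod4
fit_miles <- lm(mileage ~ age, data = toy)          # typical mileage at each age

# Partial change: age goes from 8 to 9 while mileage stays at 50,000.
hold    <- data.frame(age = c(8, 9), mileage = 50000)
partial <- unname(diff(predict(fit_both, newdata = hold)))

# Total change: mileage takes its typical value at each age.
typical <- unname(predict(fit_miles, newdata = data.frame(age = c(8, 9))))
move    <- data.frame(age = c(8, 9), mileage = typical)
total   <- unname(diff(predict(fit_both, newdata = move)))

print(round(c(partial = partial, total = total)))
```

In this toy setup the total change is the larger drop, because a year of typical driving adds miles on top of the extra year of age.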

Review of Partial Derivatives

Relate to the two-variable polynomial: \( f(x,y) = a_0 + a_1 x + a_2 y + a_3 x y + \cdots \)
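
As a worked step, here are the partial derivatives when the polynomial is truncated after the interaction term. Stopping there is an assumption, chosen because it matches the form of mod4 (Price ~ Mileage * Age):

\[ f(x,y) = a_0 + a_1 x + a_2 y + a_3 x y \]

\[ \frac{\partial f}{\partial x} = a_1 + a_3 y, \qquad \frac{\partial f}{\partial y} = a_2 + a_3 x \]

Reading \( x \) as Age and \( y \) as Mileage, the partial change in price per year of age is \( a_1 + a_3 y \). Because of the interaction term it depends on the mileage at which it is evaluated.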

Relate to the three-variable polynomial.

Real-world examples


Total-vs-partial In-class activity


Kids' feet data.

The question is whether girls' shoes are narrower than boys' because girls' feet are narrower. Address this.