\( R^2 \) is between zero and one.
Read in the manipulate software.
fetchData("M155/littleR.R")
## Retrieving from http://www.mosaic-web.org/go/datasets/M155/littleR.R
## [1] TRUE
Run littleR.
The blue vector is constructed as a linear combination of a red vector and a black vector. Move the slider to change how much of the black vector goes into the sum.
The vectors are displayed in both variable space and case space. Notice how the roundness of the case-space cloud reflects the angle. The correlation coefficient, r, corresponds both to the roundness of the cloud and to the angle \( \theta \) between the blue and black vectors: for centered vectors, \( r = \cos\theta \).
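A minimal base-R sketch of the same idea, assuming two arbitrary illustrative vectors `red` and `black` (they are not the vectors littleR actually draws). The blue vector is built as `red + k * black`, where `k` plays the role of the slider. For each `k`, the correlation between `blue` and `black` equals the cosine of the angle between the two centered vectors, and its square lies between 0 and 1. The helper `cos_angle` is written just for this sketch.

```r
# Illustrative vectors, chosen arbitrarily for this sketch (not littleR's)
red   <- c(1, -2, 0, 3, -1, 2)
black <- c(2, 1, -1, 0, 1, -3)

# Cosine of the angle between two vectors after centering each one
cos_angle <- function(u, v) {
  u <- u - mean(u)
  v <- v - mean(v)
  sum(u * v) / sqrt(sum(u^2) * sum(v^2))
}

# k acts like the slider: how much of the black vector goes into blue
for (k in c(0, 0.5, 1, 2, 5)) {
  blue <- red + k * black
  r <- cor(blue, black)
  # The correlation coefficient is the cosine of the angle
  stopifnot(isTRUE(all.equal(r, cos_angle(blue, black))))
  cat(sprintf("k = %3.1f   r = %6.3f   theta = %5.1f degrees   r^2 = %5.3f\n",
              k, r, acos(r) * 180 / pi, r^2))
}
```

As `k` grows, the blue vector swings toward the black one, so \( \theta \) shrinks and r approaches 1.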
Context to think about: Grades and the GPA.
Suppose that a new measure of academic achievement were proposed. How would you decide if it's better than GPA?
Suppose I have a used car. I'm going away for a year and thinking of selling it. On the other hand, it would be nice to have a car available when I get back. How much will it cost me to delay selling the car for a year?
Consider several models of used car prices fitted to data on used Hondas:
cars = fetchData("used-hondas.csv")
## Retrieving from http://www.mosaic-web.org/go/datasets/used-hondas.csv
mod1 = lm(Price ~ Age, data = cars)
mod2 = lm(Price ~ Mileage, data = cars)
mod3 = lm(Price ~ Mileage + Age, data = cars)
mod4 = lm(Price ~ Mileage * Age, data = cars)
Each of the first three models is nested in the fourth. You can explore the various models this way:
fetchData("mLM.R")
mLM(Price ~ Age * Mileage, data = cars)
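The Honda data have to be fetched over the network, so this self-contained sketch substitutes R's built-in `mtcars` data, with `wt` and `hp` playing the roles of `Age` and `Mileage` (an assumption for illustration only). Because each smaller model is nested in the interaction model, its \( R^2 \) can never exceed the larger model's; `anova()` gives an F-test of whether the extra terms earn their keep.

```r
# Stand-in data: built-in mtcars, not the used-Honda data
m1 <- lm(mpg ~ wt, data = mtcars)        # like Price ~ Age
m2 <- lm(mpg ~ hp, data = mtcars)        # like Price ~ Mileage
m3 <- lm(mpg ~ hp + wt, data = mtcars)   # like Price ~ Mileage + Age
m4 <- lm(mpg ~ hp * wt, data = mtcars)   # like Price ~ Mileage * Age

# R-squared for each model; every value lies between 0 and 1
r2 <- sapply(list(m1 = m1, m2 = m2, m3 = m3, m4 = m4),
             function(m) summary(m)$r.squared)
print(round(r2, 3))

# F-test comparing the nested model m3 with the larger model m4
print(anova(m3, m4))
```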
Include and exclude terms to try to answer this question:
Which is the right model to use to inform my car-selling decision?
It's tempting to use model 1, since Age is the only variable I'm interested in.
xyplot(fitted(mod1) + Price ~ Age, data = cars)
It's a bit hard to see the model. Let's try another way of plotting it.
f1 = makeFun(mod1)
plotPoints(Price ~ Age, data = cars)
plotFun(f1(Age) ~ Age, add = TRUE)
In general, how much the price goes down in a year can depend on how old the car is, but you can get the rate from the derivative of the model function. Let's evaluate that derivative for an 8-year-old car with 50,000 miles (mod1 has no Mileage term, so the mileage plays no role in f1):
f1 = makeFun(mod1)
df1 = D(f1(Age) ~ Age)
df1(Age = 8)
## 1
## -1559
Or, since I'm really thinking about a 1-year difference:
f1(Age = c(9, 8), Mileage = 50000)
## 1 2
## 6465 8023
Take the difference: 8023 - 6465 = 1558, the same as the derivative's 1559 up to rounding of the displayed values. QUESTION: How come I get the same answer for the finite difference and the derivative?
But let's consider mod4:
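A base-R sketch for exploring the question, again using the built-in `mtcars` data as a stand-in for the Honda data, with `wt` in the role of `Age`. It compares a one-unit finite difference with a numerical derivative, first for a straight-line model and then for a model with a quadratic term. The helpers `as_fun`, `num_deriv`, and `compare` are written for this sketch; `num_deriv` is a simple central-difference approximation, not mosaic's `D()`.

```r
# Straight-line model and a curved (quadratic) model of mpg versus wt
lin  <- lm(mpg ~ wt, data = mtcars)
quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# Turn a fitted model into an ordinary function of wt
as_fun <- function(model) {
  force(model)
  function(wt) unname(predict(model, newdata = data.frame(wt = wt)))
}
f_lin  <- as_fun(lin)
f_quad <- as_fun(quad)

# Central-difference approximation to the derivative
num_deriv <- function(f, x, h = 1e-4) (f(x + h) - f(x - h)) / (2 * h)

# Derivative at x versus the finite difference from x to x + 1
compare <- function(f, x) {
  c(derivative = num_deriv(f, x), finite_difference = f(x + 1) - f(x))
}
cmp_lin  <- compare(f_lin, 3)
cmp_quad <- compare(f_quad, 3)
print(rbind(linear = cmp_lin, quadratic = cmp_quad))
```

For the straight line the two numbers agree; once the model curves, they no longer do.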
f4 = makeFun(mod4)
plotFun(f4(Age = a, Mileage = m) ~ a & m, a.lim = c(0, 10), m.lim = c(0, 1e+05),
levels = 1000 * (1:20), npts = 200)
plotPoints(Mileage ~ Age, data = cars, add = TRUE, pch = 20, col = "red")
Examine the change as age goes up by one year. Should I hold mileage constant or should I let the mileage change with age in the typical way?
Here's the same question another way: Do I want to compare cars with different mileages and different ages, or do I want to compare cars with different ages and the same mileage?
Calculating the partial change and the partial derivative, holding mileage at 50,000:
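The two comparisons can be computed side by side. This sketch substitutes the built-in `mtcars` data for the Honda data (an assumption for illustration): `wt` stands in for `Age`, `hp` for `Mileage`, and a straight-line fit of `hp` on `wt` stands in for "the typical way" mileage changes with age. The helpers `f` and `typical_hp` are hypothetical names introduced here.

```r
# Interaction model, analogous in form to Price ~ Mileage * Age
fit <- lm(mpg ~ hp * wt, data = mtcars)
f <- function(wt, hp) unname(predict(fit, newdata = data.frame(wt = wt, hp = hp)))

# Typical hp at a given wt, from a simple straight-line fit
hp_fit <- lm(hp ~ wt, data = mtcars)
typical_hp <- function(wt) unname(predict(hp_fit, newdata = data.frame(wt = wt)))

w0 <- 3               # starting value of wt
h0 <- typical_hp(w0)  # hp of a typical car at that wt

# Partial change: wt goes up by 1 while hp is held fixed at h0
partial <- f(w0 + 1, h0) - f(w0, h0)

# Total change: wt goes up by 1 and hp moves with it in the typical way
total <- f(w0 + 1, typical_hp(w0 + 1)) - f(w0, h0)

print(c(partial = partial, total = total))
```

The partial change answers "same mileage, one year older"; the total change answers "a typical car, one year older".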
f4(Age = c(8, 9), Mileage = 50000)
## 1 2
## 12815 12238
df4da = D(f4(Age = Age, Mileage = Mileage) ~ Age)
df4da(Age = 8, Mileage = 50000)
## 1
## -577
Relate to the two-variable polynomial: \( f(x,y) = a_0 + a_1 x + a_2 y + a_3 x y + \cdots \)
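Reading `mod4` as this polynomial with \( x \) = `Age`, \( y \) = `Mileage`, and no terms beyond \( xy \) (the model `Price ~ Mileage * Age`), the partial and total rates of change work out as below. Here \( g(x) \) is a stand-in for the typical mileage of a car of age \( x \), an assumption introduced only for this derivation.

```latex
f(x,y) = a_0 + a_1 x + a_2 y + a_3 x y

% Partial change: hold y fixed
\frac{\partial f}{\partial x} = a_1 + a_3 y
\qquad
f(x+1, y) - f(x, y) = a_1 + a_3 y

% Total change: let y = g(x) move with x (chain rule)
\frac{d}{dx} f\bigl(x, g(x)\bigr) = a_1 + a_3\, g(x) + \bigl(a_2 + a_3 x\bigr)\, g'(x)
```

The partial derivative does not involve \( x \), which is consistent with the one-year change at 50,000 miles (12238 - 12815 = -577) matching the derivative (-577) exactly.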
Relate to the three-variable polynomial.
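In R's formula language, `*` between three variables generates exactly the terms of a three-variable polynomial with all pairwise and three-way products, \( a_0 + a_1 x + a_2 y + a_3 z + a_4 xy + a_5 xz + a_6 yz + a_7 xyz \). A quick check with `model.matrix()` on a small data frame whose values are arbitrary and only there so the matrix can be built:

```r
# Arbitrary values; only the column names matter here
d <- data.frame(x = c(1, 2, 3, 4), y = c(2, 5, 1, 3), z = c(0, 1, 1, 2))

# Each column of the model matrix is one term of the polynomial
terms_xyz <- colnames(model.matrix(~ x * y * z, data = d))
print(terms_xyz)
```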
Total vs. partial change: in-class activity
Are girls' shoes narrower than boys' because girls' feet are narrower? Address this question.