Tom Detzel
11.5.15
Three variables: State, OwnPct (percent home ownership) and UrbPct (percent urban).
head(urbOwn,5)
State OwnPct UrbPct
1 Alabama 69.7 59.04
2 Alaska 63.1 66.02
3 Arizona 66.0 89.81
4 Arkansas 67.0 56.16
5 California 55.9 94.95
Check for skew and normality.
OwnPct UrbPct
Min. :53.30 Min. :38.66
1st Qu.:65.60 1st Qu.:65.39
Median :67.60 Median :74.20
Mean :67.07 Mean :73.98
3rd Qu.:69.65 3rd Qu.:87.53
Max. :73.40 Max. :94.95
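Check for skew: both skew values are negative, so both distributions are left-skewed. This is consistent with each mean sitting slightly below its median.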
myStats[c(4, 11, 12, 13)]  # columns: sd, skew, kurtosis, se
sd skew kurtosis se
OwnPct 4.23 -1.23 1.51 0.59
UrbPct 14.69 -0.44 -0.57 2.06
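The skew check above reads the sign of the skewness statistic. Below is a minimal sketch of the plain moment-based estimator (g1), plus the standard error \( s/\sqrt{n} \). It assumes \( n = 51 \), inferred from the 49 degrees of freedom in the correlation test (\( n - 2 = 49 \)). The column positions 4, 11, 12 and 13 appear to match the sd, skew, kurtosis and se columns of `psych::describe()`, so `myStats` was probably built with that function. That is an assumption. `describe()` also has a `type` argument that selects among skewness estimators, so its values can differ slightly from g1. The toy vectors are illustrative only and are not the state data.

```r
# Moment-based sample skewness (g1): third central moment divided by the
# (biased) standard deviation cubed. Negative => longer left tail.
skew_g1 <- function(x) {
  x  <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)  # second central moment
  m3 <- mean((x - mean(x))^3)  # third central moment
  m3 / m2^1.5
}

# Standard error of the mean: sd / sqrt(n)
se_mean <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Toy vectors (illustrative only, not the urbOwn data)
right_tail <- c(1, 2, 3, 4, 10)
left_tail  <- c(-10, -4, -3, -2, -1)
skew_g1(right_tail)  # positive, about 1.14
skew_g1(left_tail)   # negative, about -1.14

# Reported se for OwnPct, assuming n = 51
4.23 / sqrt(51)      # about 0.59
```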
[Figure: Percent Home Ownership and Percent Urban]
(a) For these data, \( R^2 = 0.28 \). What is the correlation? How can you tell if it is positive or negative?
$statistic
t
-4.235275
$parameter
df
49
$estimate
cor
-0.5176625
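For part (a), the magnitude of the correlation is \( \sqrt{R^2} \) and its sign is the sign of the slope. Here the estimated correlation, and so the fitted slope, is negative. The short check below uses only the numbers printed above and re-estimates nothing. \( -\sqrt{0.28} \approx -0.53 \) is close to the reported \( r = -0.518 \), whose square (0.268) rounds to the stated \( R^2 \). The same \( r \) reproduces the t statistic through \( t = r\sqrt{df/(1 - r^2)} \).

```r
# Values copied from the cor.test output above
r_hat <- -0.5176625  # estimated correlation
dof   <- 49          # degrees of freedom, n - 2
R2    <- 0.28        # R^2 given in the exercise

# |r| = sqrt(R^2); the sign follows the direction of the trend
# (here the estimate, and hence the slope, is negative)
r_from_R2 <- sign(r_hat) * sqrt(R2)
r_from_R2   # about -0.53

# r^2 reproduces R^2 up to rounding
r_hat^2     # about 0.268

# t statistic for H0: rho = 0
t_stat <- r_hat * sqrt(dof / (1 - r_hat^2))
t_stat      # about -4.235, matching $statistic
```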
(b) Examine the residual plot. What do you observe? Is a simple least squares fit appropriate for these data?
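library(ggplot2)  # provides fortify() for lm objects
ownFit <- lm(OwnPct ~ UrbPct, data = urbOwn)  # home ownership on percent urban
fOwn <- fortify(ownFit)  # adds .fitted, .resid and other diagnostics as columns
head(fOwn[c(1, 2, 6, 7)], 3)  # OwnPct, UrbPct, .fitted, .resid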
OwnPct UrbPct .fitted .resid
1 69.7 59.04 69.29798 0.4020211
2 63.1 66.02 68.25808 -5.1580759
3 66.0 89.81 64.71376 1.2862355
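The fitted values and residuals printed above can be cross-checked by hand. The sketch below copies those three rows, confirms that each residual is observed minus fitted, and recovers the least squares line from two fitted points. The recovered slope agrees with \( b_1 = r\,s_y/s_x \) computed from the correlation and standard deviations reported earlier. Because every value is copied from printed output, agreement holds only up to rounding.

```r
# Three rows copied from the output above
x    <- c(59.04, 66.02, 89.81)           # UrbPct
y    <- c(69.7, 63.1, 66.0)              # OwnPct
yhat <- c(69.29798, 68.25808, 64.71376)  # .fitted

# Residual = observed - fitted
y - yhat              # 0.402, -5.158, 1.286

# Slope and intercept recovered from two points on the fitted line
b1 <- (yhat[2] - yhat[1]) / (x[2] - x[1])
b0 <- yhat[1] - b1 * x[1]
c(b0 = b0, b1 = b1)   # roughly 78.09 and -0.149

# Same slope from summary statistics: b1 = r * s_y / s_x
-0.5176625 * 4.23 / 14.69  # about -0.149

# The recovered line reproduces the third fitted value
b0 + b1 * x[3]        # about 64.714
```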
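library(ggplot2)
library(ggthemes)  # provides theme_fivethirtyeight()
ggplot(fOwn, aes(x = .fitted, y = .resid)) +
  geom_point(shape = 1) +
  geom_hline(yintercept = 0, linetype = "dashed") +  # zero-residual reference line
  theme_fivethirtyeight() +
  labs(x = "Fitted Values", y = "Residuals")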
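The residual plot is judged by eye. As a rough numerical companion, the sketch below computes three values: the mean residual, a Shapiro-Wilk p-value for normality of the residuals, and the ratio of residual spread above versus below the median fitted value. A ratio far from 1 hints at a fan shape. These heuristics are added for illustration and are not part of the original analysis. The helper `resid_checks()` is hypothetical. It is demonstrated on R's built-in `cars` data because `urbOwn` is not available here. With the real model it would be called as `resid_checks(ownFit)`.

```r
# Hypothetical helper: rough numeric checks to accompany a residual plot
resid_checks <- function(fit) {
  res <- residuals(fit)
  fv  <- fitted(fit)
  low <- fv <= median(fv)  # split observations at the median fitted value
  list(
    mean_resid   = mean(res),                    # ~0 for OLS with an intercept
    shapiro_p    = shapiro.test(res)$p.value,    # small p => residuals look non-normal
    spread_ratio = sd(res[!low]) / sd(res[low])  # far from 1 => possible fan shape
  )
}

# Demonstrated on built-in data (a stand-in for ownFit, which needs urbOwn)
fit <- lm(dist ~ speed, data = cars)
chk <- resid_checks(fit)
str(chk)
```

For the usual visual check of residual normality, `qqnorm(residuals(fit))` followed by `qqline(residuals(fit))` draws a normal Q-Q plot.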
The least squares conditions are only partially met, and with an \( R^2 \) of only 0.28 you're not getting much bang for the buck out of this model. The share of urban population explains only 28 percent of the variability in the dependent variable (home ownership), so other, unidentified factors could be more important than the size of a state's urban population.
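The "28 percent of the variability" reading follows from \( R^2 = 1 - SS_{res}/SS_{tot} \). The minimal sketch below uses a hypothetical helper `r_squared()` and R's built-in `cars` data as a stand-in for `urbOwn`. It shows that this ratio matches `summary(fit)$r.squared` and, in simple regression, the squared correlation.

```r
# R^2 = 1 - SS_res / SS_tot: the share of variability in y that the
# fitted line accounts for
r_squared <- function(y, yhat) {
  ss_res <- sum((y - yhat)^2)     # unexplained variation
  ss_tot <- sum((y - mean(y))^2)  # total variation around the mean
  1 - ss_res / ss_tot
}

# Illustrated on built-in data (not the urbOwn data)
fit <- lm(dist ~ speed, data = cars)
R2  <- r_squared(cars$dist, fitted(fit))

all.equal(R2, summary(fit)$r.squared)         # TRUE
all.equal(R2, cor(cars$speed, cars$dist)^2)   # TRUE: R^2 = r^2 in simple regression
```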