1.

From book, Exercise 10.1: These data are the average March temperatures (Fahrenheit) in Kansas Cityfrom 1961 to 1990. Estimate the probability denisty function using the histogram and the kernel density estimate.

#Make sure to pick a good bin width for the histogram.
#Make sure to pick a good bandwith for the kernel smoother. 
kcTemp <- c(43.8,40.1,49.2,41.8,34.0,49.1,47.8,48.1,37.6,42.0,43.7,47.1,47.7,46.9,36.5,45.0,48.0,37.6,42.2,38.7,45.2,42.5,43.1,36.0,47.4,48.5,47.1,43.2,43.8,45.7)
set.seed(1234)
n<-100
x = rnorm(n) 

S<-sd(kcTemp)
n<-length(kcTemp)

hist(kcTemp,main="Delta=1.5",freq=FALSE)  
dens<-density(kcTemp,bw=1.5) 
points(dens$x,dens$y,type="l",col="red",lwd=3)

2.

Download this data: http://bit.ly/HW6stateData (For a description of the data use: help(state.x77)). We are interested in the relationship between HS graduation rate and Income.

df.2 = read.csv("https://www.dropbox.com/s/s4cp6q7xpdli7na/HW6stateData.csv?dl=1")

Plot HS graduation rate (x-axis) vs per capita income (y-axis)

• Fit a simple linear regression model to this data. Plot this line over the points.

lm = lm(df.2$Income~df.2$HS.Grad)
plot(df.2$HS.Grad,df.2$Income)
par(new=T)
abline(lm)

Fit a linear regression model with a quadratic effect for HSgraduation rate to this data. Plot this line over the points.

df.2$HS.Grad2 = df.2$HS.Grad^2
lm2 = lm(df.2$Income~HS.Grad + HS.Grad2, data=df.2)

x <- seq(30, 70, 0.1)
p = predict(lm2, data.frame(HS.Grad = x, HS.Grad2 = x^2))

plot(df.2$HS.Grad,df.2$Income)
par(new=T)
abline(lm, col = "red", lwd = 3, lty = 4)
lines(x, p, col = "dark green", lwd = 3, lty = 1)
legend("topleft", lwd = 3, lty = c(4, 1), col = c("red", "dark green"), legend = c("Linear", "Quadratic"))

Fit a loess model to this data with an appropriate bandwidth. Plot this line over the points.

lw<-loess(Income~HS.Grad,span=1,degree=2,data=df.2)
p2 = predict(lw, data.frame(HS.Grad = x))

plot(df.2$HS.Grad,df.2$Income)
par(new=T)
abline(lm, col = "red", lwd = 3, lty = 4)
lines(x, p, col = "dark green", lwd = 3, lty = 3)
lines(x, p2, col = "blue", lwd = 3, lty = 1)
legend("topleft", lwd = 3, lty = c(4, 3, 1), col = c("red", "dark green", "blue"), legend = c("Linear", "Quadratic", "Loess"))

##Comments on the similarities and differences between these three different fits. All three show that Income increases as Graduation Rate increases. Linear does not appear to be appropriate as the data between 40-50 has a concave down shape.

The loess curve shows this leveling out and also shows an increase in income between 60-65 which the quadratic curve misses.

3.

Consider the data generated based on the code below. Find a range of spans (the smoothing parameter) that produces reasonable (i.e. the fit is not too jagged nor too smooth) loess fits to this data.

set.seed(1234)
x <- runif(100,0,10)
y <- sin(x)+rnorm(100,0,.3)
lw1<-loess(y~x,span=.65,degree=2)
plot(x,y)
par(new=T)
j = order(x)
points(x[j],lw1$fitted[j],col="red",lwd=3,type="l") 

A span in the range of .2 to .65 provides a reasonable fit for these data.

4.

(GRAD ONLY) Repeat the previous problem with values of σ equal to 0.1, 0.5, 1, and 1.5. Discuss the relationship between the random variablity in the data and the range of reasonable spans.

σ equal to 0.1,

#sigma is the amount of noise added.
set.seed(1234)
sigma<-.1
x <- runif(100,0,10)
y <- sin(x)+rnorm(100,0,sigma)
lw2<-loess(y~x,span=.45,degree=2)
plot(x,y)
par(new=T)
j = order(x)
points(x[j],lw2$fitted[j],col="red",lwd=3,type="l") 

A span from .15 to .45 is appropriate

σ equal to 0.5

set.seed(1234)
sigma<-.5
x <- runif(100,0,10)
y <- sin(x)+rnorm(100,0,sigma)
lw2<-loess(y~x,span=.65,degree=2)
plot(x,y)
par(new=T)
j = order(x)
points(x[j],lw2$fitted[j],col="red",lwd=3,type="l") 

A span from .2 to .55 is appropriate

σ equal to 1

set.seed(1234)
sigma<-1
x <- runif(100,0,10)
y <- sin(x)+rnorm(100,0,sigma)
lw2<-loess(y~x,span=.7,degree=2)
plot(x,y)
par(new=T)
j = order(x)
points(x[j],lw2$fitted[j],col="red",lwd=3,type="l") 

A span from .4 to .7 is appropriate

σ equal to 1.5

set.seed(1234)
sigma<-1.5
x <- runif(100,0,10)
y <- sin(x)+rnorm(100,0,sigma)
lw2<-loess(y~x,span=.9,degree=2)
plot(x,y)
par(new=T)
j = order(x)
points(x[j],lw2$fitted[j],col="red",lwd=3,type="l") 

A span from .5 to .9 is appropriate.

It seems like the more noise introduced, the higher the span we need to smooth the curve.