**Load data**
# change the working directory to the folder that contains faithful.csv
setwd("~/..")
# load the Old Faithful geyser data
oldfaithful <- read.csv("faithful.csv", header = TRUE)
# preview the first three rows of oldfaithful
head(oldfaithful, 3)
##   X eruptions waiting
## 1 1     3.600      79
## 2 2     1.800      54
## 3 3     3.333      74
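If faithful.csv is not at hand, the same table can be rebuilt from R's built-in `faithful` data frame. This is a minimal sketch that assumes the CSV was exported from that data frame with `write.csv()`. That function writes the row names as an unnamed first column, and `read.csv()` names that column `X` when reading it back, which would explain the extra column in the preview above.

```r
# Rebuild the oldfaithful table from R's built-in datasets::faithful.
# Assumption: faithful.csv was written by write.csv(faithful), so its first
# column (read back as "X") holds the row numbers 1..n.
oldfaithful <- data.frame(X = seq_len(nrow(datasets::faithful)),
                          datasets::faithful)

# Same preview as above, plus the column types
head(oldfaithful, 3)
str(oldfaithful)
```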
1. Produce density histograms of eruption times and of waiting times.
# density histogram of waiting times (freq = FALSE puts densities, not counts, on the y-axis)
hist(oldfaithful$waiting,
     freq = FALSE,
     xlab = "Waiting time to next eruption (minutes)",
     main = "Density Histogram of Waiting Times",
     col = "#599ad3")
# density histogram of eruption times
hist(oldfaithful$eruptions,
     freq = FALSE,
     xlab = "Eruption time (minutes)",
     main = "Density Histogram of Eruption Times",
     col = "#599ad3")
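With `freq = FALSE`, `hist()` scales each bar to count / (n × bin width), so the total bar area is 1. The sketch below checks that arithmetic on the object returned by `hist(..., plot = FALSE)`. It uses the built-in `faithful` data as a stand-in for `oldfaithful`.

```r
# Built-in copy of the Old Faithful data, used as a stand-in for oldfaithful.
x <- datasets::faithful$waiting

# Compute the histogram without drawing it.
h <- hist(x, plot = FALSE)

widths <- diff(h$breaks)                           # bin widths (minutes)
manual_density <- h$counts / (length(x) * widths)  # count / (n * bin width)

# The density heights match the manual formula ...
print(all.equal(h$density, manual_density))
# ... so the bar areas add up to 1.
print(sum(h$density * widths))
```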
2. Produce a smoothed density histogram from local polynomial regression.
# install the locfit package only if it is not already installed
if (!require("locfit")) install.packages("locfit")
## Loading required package: locfit
## locfit 1.5-9.1 2013-03-22
# attach locfit (needed if it was just installed, because require() could not load it then)
library(locfit)
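# estimate a smoothed density of waiting times by local polynomial (local likelihood)
# density estimation; the formula has no response, so locfit fits a density
lp.waiting <- locfit(~ lp(waiting), data = oldfaithful)
plot(lp.waiting)
# estimate a smoothed density of eruption times the same way
lp.eruptions <- locfit(~ lp(eruptions), data = oldfaithful)
plot(lp.eruptions)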
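A quick visual check is to overlay a smooth density curve on the density-scale histogram. The sketch below uses base R's `density()` in place of locfit. That function computes a kernel density estimate, which is a different technique from locfit's local likelihood fit, and it is used here only because it needs no extra package. The built-in `faithful` data again stands in for `oldfaithful`.

```r
# Base R kernel density estimate, swapped in for locfit's local likelihood fit
# only because it needs no extra package.
x <- datasets::faithful$waiting
d <- density(x)                            # Gaussian kernel, default bandwidth (bw.nrd0)
h <- hist(x, plot = FALSE)

# Overlay the smooth curve on the density-scale histogram.
hist(x, freq = FALSE,
     ylim = c(0, max(h$density, d$y)),     # leave room for whichever is taller
     xlab = "Waiting time (minutes)",
     main = "Density Histogram with Kernel Density Estimate",
     col = "#599ad3")
lines(d, col = "red", lwd = 2)

# Riemann-sum check on the evenly spaced grid: the density integrates to about 1.
step <- diff(d$x[1:2])
print(sum(d$y) * step)
```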
# scatter plot of waiting time against eruption time; col = eruptions colors each point by
# the integer part of its eruption time (palette colors 1 to 5), giving five color groups
plot(waiting ~ eruptions,
     data = oldfaithful,
     col = eruptions,
     main = "Local Polynomial Regression for Waiting ~ Eruptions")
# fit an ordinary least-squares regression line and draw it in black for comparison
reg.regression <- lm(waiting ~ eruptions, data = oldfaithful)
abline(reg.regression, col = "black")
# fit a local polynomial regression of waiting on eruptions and draw it in red
lp.regression <- locfit(waiting ~ eruptions, data = oldfaithful)
lines(lp.regression, col = "red")
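The red curve can be made less of a black box with a small hand-rolled local polynomial regression. At each target point x0, it fits a weighted quadratic by least squares, giving tricube weights to the nearest 70% of observations, and keeps the intercept as the fitted value. The helper name `local_poly_fit` is hypothetical. The settings (`alpha = 0.7`, `degree = 2`, tricube weights) are only meant to resemble locfit's defaults, so the curve may not match `locfit()` exactly.

```r
# Hypothetical helper (not part of locfit): local polynomial regression at one point x0.
# Settings loosely modeled on locfit defaults (assumption): nearest-neighbour fraction
# alpha = 0.7, local quadratic (degree = 2), tricube kernel.
local_poly_fit <- function(x, y, x0, alpha = 0.7, degree = 2) {
  k <- ceiling(alpha * length(x))                 # number of neighbours in the window
  dist <- abs(x - x0)
  h <- sort(dist)[k]                              # bandwidth: distance to the k-th neighbour
  w <- ifelse(dist < h, (1 - (dist / h)^3)^3, 0)  # tricube weights, 0 outside the window
  X <- outer(x - x0, 0:degree, "^")               # columns 1, (x - x0), (x - x0)^2, ...
  fit <- lm.wfit(X, y, w)                         # weighted least squares
  unname(fit$coefficients[1])                     # intercept = fitted value at x0
}

# Evaluate the local fit on a grid of eruption times.
erupt <- datasets::faithful$eruptions
wait  <- datasets::faithful$waiting
grid  <- seq(min(erupt), max(erupt), length.out = 100)
lp_curve <- vapply(grid, function(g) local_poly_fit(erupt, wait, g), numeric(1))

# Same comparison plot as above: OLS line in black, local fit in red.
plot(erupt, wait, xlab = "Eruption time (minutes)", ylab = "Waiting time (minutes)")
abline(lm(wait ~ erupt), col = "black")
lines(grid, lp_curve, col = "red")
```

Lowering `alpha` uses fewer neighbours in each local fit, so the curve follows the data more closely but becomes wigglier.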
3. Discuss each step of the R code to show eruption times and waiting times.
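Each step of the R code for Questions 1 and 2 is explained in the comment above it. In short, the code loads the data, draws density-scale histograms with hist(freq = FALSE), estimates smoothed densities with locfit(~ lp(...)), and then plots waiting time against eruption time with an ordinary regression line from lm() and a local polynomial regression curve from locfit().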
4. Discuss the results of the local polynomial regression.
In Question 1 we plotted density histograms of eruption times and waiting times, both measured in minutes. Both histograms are bimodal: each variable falls into a short group and a long group. In Question 2 we estimated smoothed densities of the two variables with local polynomial methods. We also drew a scatter plot of waiting time against eruption time, with the ordinary regression line in black and the local polynomial regression curve in red.
5. In your discussion, compare local polynomial regression to regular regression.
Azzalini and Bowman (1990) describe the Old Faithful data as the waiting time between eruptions and the duration of each eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. Our analysis checks the proposition that the waiting time depends on the magnitude of an eruption, as measured by its duration. In the scatter plot we colored the points in five groups by eruption time. The plot shows a positive relationship: longer eruptions go with longer waiting times, and shorter eruptions with shorter ones. The ordinary regression line (black) has a positive slope, which is consistent with this.

The two methods differ in how they use the data. Ordinary regression fits a single straight line to all points, so it has one slope everywhere. Local polynomial regression fits a low-degree polynomial around each point and gives more weight to nearby observations, so its curve can bend with the data. The local polynomial curve (red) flattens at both low and high eruption times. This suggests that waiting time changes little within each group of short or long eruptions, a pattern that a single straight line cannot show.
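The difference between the black line and the red curve comes from the two fitting criteria. In the formulas below, x_i is eruption time and y_i is waiting time. Ordinary least squares fits one line to all n points at once. Local polynomial regression solves a separate weighted fit at every target point x_0, where K is a kernel such as the tricube, h(x_0) is a bandwidth, and p is the polynomial degree. This is the textbook formulation and may differ in detail from locfit's implementation.

```latex
(\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

\hat{\boldsymbol\beta}(x_0) = \arg\min_{\beta_0, \dots, \beta_p} \sum_{i=1}^{n}
  K\!\left( \frac{x_i - x_0}{h(x_0)} \right)
  \left( y_i - \sum_{j=0}^{p} \beta_j (x_i - x_0)^j \right)^2,
\qquad \hat m(x_0) = \hat\beta_0(x_0)

K(u) = \begin{cases} (1 - |u|^3)^3, & |u| < 1 \\ 0, & \text{otherwise} \end{cases}
```

Setting every weight to 1 and p = 1 turns the local problem back into ordinary least squares, with \hat m(x_0) equal to the fitted line at x_0. Ordinary regression is therefore a special case of local polynomial regression.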
References
Azzalini, A., & Bowman, A. W. (1990). A look at some data on the Old Faithful geyser. Applied Statistics, 39, 357–365. https://doi.org/10.2307/2347385. Dataset documentation retrieved from https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/faithful.html