The data set give the locations of 1000 seismic events of MB > 4.0. The events occurred in a cube near Fiji since 1964.
library(tidyverse)
data(quakes)
What do you think should be the response and explantory variables?
# space for answer
# SCATTERPLOT
# JITTER ADDS NOISE
# Add a smoothed line to visualize a non-parametric model to your scatterplor
# Add the line for best fit to your scatterplor
Weโre going to go with a simple linear model
This model has the form: \[Y_i=\beta_0+\beta_1 x_i +\varepsilon_i\]
This might look familar to \(Y=mx+b\), which was called slope-intercept form.
# Use the lm function to create a SLR model and store the output
# Look at the summary of that output
We can estimate the parameters:
We can estimate the error associated estimating these parameters: * Standard Error for Y-intercept: \(SE(\hat{\beta}_0) = 4.1899\) * Standard Error for Slope: \(SE(\hat{\beta}_1) = 0.9034\)
We can estimate the \(\sigma^2\) with \(s^2\): \[s^2=\frac{1}{n-2}\sum_{i=1}^n(Y_i-\hat{Y}_i)^2\]
The quantity \(s^2\) is called the mean squared error (MSE).
The denominator \(n-2\) is referred to as the residual regrees of freedom.
Rather than \(s^2\) we often use \(s\): \[s=\sqrt{\frac{1}{n-2}\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}\]
We can find this value in the R output, but we can also check it:
# Write code to varify s
\[t=\frac{\hat{\beta}_1}{SE({\hat{\beta}_1})}\]
where \[SE(\hat{\beta}_1)=\sqrt{\frac{s^2}{\sum_{i=1}^n (x_i-\bar{x})^2}}\]
T-distribution with \(n-2\) degrees of freedom.
#beta1 <- mod$coefficients[2]
#beta1SE <- sqrt((sum(mod$residuals^2))/998/sum((quakes$mag - mean(quakes$mag))^2))
# Use the above components to create the test statistic
# Use the test statistic to find the p-value
Confidence intervals take the form
\[\text{Point Estimate} \pm \text{Critical Value} \times \text{Standard Error}\]
\[\hat{\beta}_1 \pm t_{df=n-2, \alpha/2} \times \sqrt{\frac{s^2}{\sum_{i=1}^n (x_i-\bar{x})^2}}\]
#beta1 <- mod$coefficients[2]
#beta1
We define the significance level \(\alpha\), therefore we need to find the \(1-\alpha/2\) quantile. Typically, the default significance level used is \(\alpha=0.05\), thus the percentile we look for is \(97.5%\).
Recall that the degrees of freedom are \(n-2\).
# Find the critical value
#beta1SE <- sqrt((sum(mod$residuals^2))/998/sum((quakes$mag - mean(quakes$mag))^2))
#beta1SE
# Find the confidence interval for the slope
We can check this with the confint function in R:
# note that we can also use a special function in R instead of making a confidence interval from scratch
# Predict for an earthquake with magnitude 7
# Create a confidence interval
# Create a prediction interval
What is the difference between confidence and prediction intervals?
# space for notes here..
#confBand<-predict(mod, interval="confidence")
#predBand<-predict(mod, interval="predict")
#colnames(predBand)<-c("fit2", "lwr2", "upr2")
#newDF<-cbind(quakes, confBand, predBand)
# Create a scatterplot that include confidence and prediction bands..
# print the anova table
# Look at diagnostic plots