library(graphics)
WidthDepth <- read.table(file = "Data.txt", header = T)
n = which(WidthDepth$W > 0)
Width = WidthDepth$W[n]
Depth = WidthDepth$D[n]
DepthError = WidthDepth$De[n]
WidthError = WidthDepth$We[n]
Slope = WidthDepth$S
SlopeError = WidthDepth$Se
fitLinear.lm = lm(Depth ~ Width)
pred.frame <- data.frame(Width = seq(0.1, 25, by = 0.1))
confidenceInterval <- predict(fitLinear.lm, interval = "confidence", newdata = pred.frame)
predictionInterval <- predict(fitLinear.lm, interval = "prediction", newdata = pred.frame)
par(mfrow = c(1, 1))
plot(Width, Depth, main = "Q Max Width Depth", xlab = "Width (m)", ylab = "Depth (m)",
pch = 20, ylim = c(0, 7))
pred.Size <- pred.frame$Width
# Add curves
matlines(pred.Size, confidenceInterval, lty = c(1, 2, 2), lwd = 1.5, col = 2)
matlines(pred.Size, predictionInterval, lty = c(1, 3, 3), lwd = 1.5, col = 1)
Analyse the quality of the linear fit
par(mfrow = c(2, 2))
plot(fitLinear.lm)
This is not a good fit for several reasons including:
logWidth = log10(Width)
logDepth = log10(Depth)
fitLogLog.lm = lm(logDepth ~ logWidth)
pred.frame <- data.frame(logWidth = seq(-1, 2, by = 0.1))
confidenceInterval <- predict(fitLogLog.lm, interval = "confidence", newdata = pred.frame)
predictionInterval <- predict(fitLogLog.lm, interval = "prediction", newdata = pred.frame)
par(mfrow = c(1, 2))
plot(logWidth, logDepth, main = "Q Max Width Depth", xlab = "log10(Width) (m)",
ylab = "log10(Depth) (m)", pch = 20)
pred.Size <- pred.frame$logWidth
# Add curves
matlines(pred.Size, confidenceInterval, lty = c(1, 2, 2), lwd = 1.5, col = 2)
matlines(pred.Size, predictionInterval, lty = c(1, 3, 3), lwd = 1.5, col = 1)
plot(Width, Depth, main = "Q Max Width Depth", xlab = "Width (m)", ylab = "Depth (m)",
pch = 20, ylim = c(0, 7))
matlines(10^pred.Size, 10^confidenceInterval, lty = c(1, 2, 2), lwd = 1.5, col = 2)
matlines(10^pred.Size, 10^predictionInterval, lty = c(1, 3, 3), lwd = 1.5, col = 1)
Analyse the quality of the log-log model fit
summary(fitLogLog.lm)
##
## Call:
## lm(formula = logDepth ~ logWidth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3979 -0.0782 0.0058 0.0854 0.3258
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5667 0.0303 -18.7 <2e-16 ***
## logWidth 1.0040 0.0450 22.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.158 on 54 degrees of freedom
## Multiple R-squared: 0.902, Adjusted R-squared: 0.9
## F-statistic: 498 on 1 and 54 DF, p-value: <2e-16
par(mfrow = c(2, 2))
plot(fitLogLog.lm)
This is a much better fit.
Notice that the gradient of the power is 1. Therefore, the fit is a straight line in the linear plot. This fit is better because the residuals are power law distributed.
The pairs command plots everything against everything else.
library(psych)
pairs.panels(cbind(logWidth, logDepth, Slope))
fit = lm(log10(Depth) ~ log10(Width) + Slope)
summary(fit)
##
## Call:
## lm(formula = log10(Depth) ~ log10(Width) + Slope)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3960 -0.0980 0.0193 0.0823 0.2983
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.67590 0.10377 -6.51 2.7e-08 ***
## log10(Width) 1.07376 0.07769 13.82 < 2e-16 ***
## Slope 0.00337 0.00306 1.10 0.28
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.157 on 53 degrees of freedom
## Multiple R-squared: 0.904, Adjusted R-squared: 0.901
## F-statistic: 250 on 2 and 53 DF, p-value: <2e-16
The fit command above adds the slope as a co-variate. The summary command produces a table showing which co-variates are important for predicting the depth and the significant level is indicated by the starts on the RSH. As you can see - the Slope does not add predicatbility to the relation. I have tried several other forms and they all suggest the same.