Analysis of debris flow width-depth data for Kirstein

Read in the data and some libraries

library(graphics)

WidthDepth <- read.table(file = "Data.txt", header = T)

n = which(WidthDepth$W > 0)

Width = WidthDepth$W[n]
Depth = WidthDepth$D[n]
DepthError = WidthDepth$De[n]
WidthError = WidthDepth$We[n]

Slope = WidthDepth$S
SlopeError = WidthDepth$Se

Linear data analysis


fitLinear.lm = lm(Depth ~ Width)

pred.frame <- data.frame(Width = seq(0.1, 25, by = 0.1))
confidenceInterval <- predict(fitLinear.lm, interval = "confidence", newdata = pred.frame)
predictionInterval <- predict(fitLinear.lm, interval = "prediction", newdata = pred.frame)

par(mfrow = c(1, 1))
plot(Width, Depth, main = "Q Max Width Depth", xlab = "Width (m)", ylab = "Depth (m)", 
    pch = 20, ylim = c(0, 7))

pred.Size <- pred.frame$Width

# Add curves
matlines(pred.Size, confidenceInterval, lty = c(1, 2, 2), lwd = 1.5, col = 2)
matlines(pred.Size, predictionInterval, lty = c(1, 3, 3), lwd = 1.5, col = 1)

plot of chunk unnamed-chunk-2

Analyse the quality of the linear fit

par(mfrow = c(2, 2))
plot(fitLinear.lm)

plot of chunk unnamed-chunk-3

This is not a good fit for several reasons including:

The variance on the residual increases with increasing width violating an assumption of least squares analysis
The large width outlier lies significantly beyond the CIs
The Q-Q plot is far from linear

Log-log data analysis


logWidth = log10(Width)
logDepth = log10(Depth)

fitLogLog.lm = lm(logDepth ~ logWidth)

pred.frame <- data.frame(logWidth = seq(-1, 2, by = 0.1))
confidenceInterval <- predict(fitLogLog.lm, interval = "confidence", newdata = pred.frame)
predictionInterval <- predict(fitLogLog.lm, interval = "prediction", newdata = pred.frame)

par(mfrow = c(1, 2))
plot(logWidth, logDepth, main = "Q Max Width Depth", xlab = "log10(Width) (m)", 
    ylab = "log10(Depth) (m)", pch = 20)

pred.Size <- pred.frame$logWidth

# Add curves
matlines(pred.Size, confidenceInterval, lty = c(1, 2, 2), lwd = 1.5, col = 2)
matlines(pred.Size, predictionInterval, lty = c(1, 3, 3), lwd = 1.5, col = 1)

plot(Width, Depth, main = "Q Max Width Depth", xlab = "Width (m)", ylab = "Depth (m)", 
    pch = 20, ylim = c(0, 7))
matlines(10^pred.Size, 10^confidenceInterval, lty = c(1, 2, 2), lwd = 1.5, col = 2)
matlines(10^pred.Size, 10^predictionInterval, lty = c(1, 3, 3), lwd = 1.5, col = 1)

plot of chunk unnamed-chunk-4

Analyse the quality of the log-log model fit

summary(fitLogLog.lm)

## 
## Call:
## lm(formula = logDepth ~ logWidth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3979 -0.0782  0.0058  0.0854  0.3258 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.5667     0.0303   -18.7   <2e-16 ***
## logWidth      1.0040     0.0450    22.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.158 on 54 degrees of freedom
## Multiple R-squared:  0.902,  Adjusted R-squared:   0.9 
## F-statistic:  498 on 1 and 54 DF,  p-value: <2e-16


par(mfrow = c(2, 2))
plot(fitLogLog.lm)

plot of chunk unnamed-chunk-5

This is a much better fit.

The residuals are randomly distributed
The variance looks relatively constant
The data looks appropriately distributed within the confidence intervals

Notice that the gradient of the power is 1. Therefore, the fit is a straight line in the linear plot. This fit is better because the residuals are power law distributed.

Slope Analysis

The pairs command plots everything against everything else.

library(psych)
pairs.panels(cbind(logWidth, logDepth, Slope))

plot of chunk unnamed-chunk-6


fit = lm(log10(Depth) ~ log10(Width) + Slope)
summary(fit)

## 
## Call:
## lm(formula = log10(Depth) ~ log10(Width) + Slope)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3960 -0.0980  0.0193  0.0823  0.2983 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.67590    0.10377   -6.51  2.7e-08 ***
## log10(Width)  1.07376    0.07769   13.82  < 2e-16 ***
## Slope         0.00337    0.00306    1.10     0.28    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.157 on 53 degrees of freedom
## Multiple R-squared:  0.904,  Adjusted R-squared:  0.901 
## F-statistic:  250 on 2 and 53 DF,  p-value: <2e-16

The fit command above adds the slope as a co-variate. The summary command produces a table showing which co-variates are important for predicting the depth and the significant level is indicated by the starts on the RSH. As you can see - the Slope does not add predicatbility to the relation. I have tried several other forms and they all suggest the same.