MVAR Midterm

Patrick Oster

Thursday February 28th, 2019

(4) In northern climates, roads must be cleared of snow quickly following a storm. One measure of storm severity is \(X_{1}\) = its duration in hours, while the effectiveness of snow removal can be quantified by \(X_{2}\) = the number of hours crews, men and machine spend to clear the snow. The provided dataset details 25 incidents in Wisconsin.

a. Do the variables seem to be normally distributed? Investigate.

## $multivariateNormality
##      Test        H    p value MVN
## 1 Royston 11.84998 0.00273702  NO
## 
## $univariateNormality
##           Test  Variable Statistic   p value Normality
## 1 Shapiro-Wilk   storm      0.9170    0.0438    NO    
## 2 Shapiro-Wilk   work       0.8825    0.0078    NO    
## 
## $Descriptives
##        n   Mean  Std.Dev Median  Min  Max 25th 75th      Skew  Kurtosis
## storm 25  9.420 3.760208    8.5  3.5 19.5  7.0 10.5 0.9829208 0.5554744
## work  25 19.272 7.889155   17.4 10.4 42.3 13.2 23.6 1.1529966 0.7998056
## 
## $multivariateOutliers
##   Observation Mahalanobis Distance Outlier
## 1           1               20.275    TRUE
## 2           2               18.267    TRUE
## 3           3               13.373    TRUE
## 4           4                9.790    TRUE
## 5           5                8.868    TRUE
## 6           6                8.279    TRUE

The histograms & QQ-plots for both storm and work hours show deviations from normality which may cause concern. The results of the Shapiro-Wilk tests (at \(\alpha = 0.05\)) for both storm & work hours lead to the conclusion that neither variable follows a univariate normal distribution. Also, the Royston test indicates the data do not come from a bivariate normal distribution. Additional plots & metrics are provided to further the case for non-normality; two observations were identified as outliers based on an adjusted, squared Mahalanobis distances as well.

## [1] 0.6333333

b. If the data is not bivariate normal distributed, determine the power transformation for approximate bivariate normality. Denoted the bivariate mean by \(\mu = (\mu_{1}, \mu_{2})'\).

## bcPower Transformations to Multinormality 
##       Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## storm    0.2396           1      -0.5558       1.0350
## work    -0.6417           0      -1.7565       0.4732
## 
## Likelihood ratio test that transformation parameters are equal to 0
##  (all log transformations)
##                              LRT df    pval
## LR test, lambda = (0 0) 1.711958  2 0.42487
## 
## Likelihood ratio test that no transformations are needed
##                              LRT df      pval
## LR test, lambda = (1 1) 12.45482  2 0.0019746
## The following objects are masked from df (pos = 3):
## 
##     storm, work

## $multivariateNormality
##      Test         H  p value MVN
## 1 Royston 0.8760029 0.647228 YES
## 
## $univariateNormality
##           Test  Variable Statistic   p value Normality
## 1 Shapiro-Wilk   storm      0.9779    0.8416    YES   
## 2 Shapiro-Wilk   work       0.9590    0.3945    YES   
## 
## $Descriptives
##        n     Mean   Std.Dev   Median      Min      Max     25th     75th
## storm 25 2.170685 0.3889775 2.140066 1.252763 2.970414 1.945910 2.351375
## work  25 2.888439 0.3726098 2.856470 2.341806 3.744787 2.580217 3.161247
##              Skew   Kurtosis
## storm -0.06361083 -0.0189353
## work   0.47449566 -0.7519325

After comparing a few transformations, I settled on a log transformation for this dataset. With the transformation, storm & work hours appear to follow univariate normal distributions (based on the updated histograms, qq-plots, & Shapiro-Wilk test results) as well as some bivariate normal distribution (according to the results of the Royston test.

d. Construct 95% simultaneous T2 intervals for \(\mu_{1} & \mu_{2}\) and the 95% Bonferroni intervals for \(\mu_{1} & \mu_{2}\). Compare the two sets of intervals.

## $`One-at-a-time Intervals`
##          lower    upper
## Storm 2.010123 2.331247
## Work  2.734633 3.042244
## 
## $`T^2 Intervals`
##          lower    upper
## Storm 1.962783 2.378587
## Work  2.689284 3.087593
## 
## $`Bonferroni Simultaneous Intervals`
##         lower    upper
## Storm 1.98468 2.356690
## Work  2.71026 3.066617
##           lower    upper
## Storm  7.464235 10.29077
## Work  15.404086 20.95221
##           lower    upper
## Storm  7.119109 10.78965
## Work  14.721137 21.92424
##           lower    upper
## Storm  7.276718 10.55595
## Work  15.033188 21.46914

One-at-a-time, simultaneous T2, & Bonferroni simultaneous intervals are provided for both log-transformed and back-transformed means for storm & work hours. It makes intuitive sense that the one-at-a-time CI’s are the most narrow of the three because we can be more certain about a range of mean responses for a single variable if we ignore the presence of another possibly influential variable. That is also the reason that we are interested in simultaneous CI’s; simply analyzing regular 95% CI’s for one variable at a time may lead to naive misinterpretations of the data.
The simultaneous T2 and Bonferroni CI’s provide better insight into the mean responses of these variables in tandem because of their more nuanced interpretations. Although the Bonferroni intervals are slightly more narrow than the simultaneous T2 intervals, they are relatively similar when comparing to regular CI’s. The Bonferroni intervals indicate that we can be 95% confident that the true mean values of storm & work hours are contained in the intervals.


(5) In the first phase of a study of the cost of transporting milk from farms to dairy plants, a survey was taken of firms engaged in milk transportation. Cost data on \(X_{1}\) = fuel, \(X_{2}\) = repair, and \(X_{3}\) = capital, all measured on a per-mile basis, are provided for \(n_{1} = 36\) gasoline trucks and \(n_{2} = 23\) diesel trucks.

a. Test for differences in mean cost vectors at \(\alpha = 0.01\).

##      [,1]
## [1,] TRUE
##      fuel    repair   capital 
## 12.218611  8.112500  9.590278
##     fuel   repair  capital 
## 10.10565 10.76217 18.16783
##              fuel    repair   capital
## fuel    23.013361 12.366395  2.906609
## repair  12.366395 17.544111  4.773082
## capital  2.906609  4.773082 13.963334
##              fuel     repair   capital
## fuel    4.3623166  0.7598872  2.362099
## repair  0.7598872 25.8512360  7.685732
## capital 2.3620992  7.6857322 46.654400
##              fuel    repair   capital
## fuel    15.814712  7.886690  2.696447
## repair   7.886690 20.750370  5.897263
## capital  2.696447  5.897263 26.580938
##          [,1]
## [1,] 50.91279

\(H_{0}: \mu_{1} - \mu_{2} = \delta_{0}\)
\(H_{0}: \mu_{1} - \mu_{2} \neq \delta_{0}\)
Test Statistic: \(T^{2} = 50.9127868\)
Critical Value: 12.93096
Conclusion: Since T2 > the critical value (50.9127868 > 12.93096 at \(\alpha = 0.01\)), we reject the null hypothesis with statistically significant evidence in the direction of the alternate. That is, the data indicates that the mean difference in cost vectors between the groups is not equal to \(\delta_{0}\).

b. If the hypothesis of equal cost vetors is rejected in part a, find the mean components most responsible for the rejection.

##               [,1]
## fuel     0.2547452
## repair  -0.1339036
## capital -0.3188296

The linear combination of the mean components most responsible for rejection of \(H_{0}\) in part a is provided above. In terms of magnitude, capital seems to be most responsible but further analysis is necessary to confirm.

c. Construct 99% simultaneous T2 confidence intervals for the mean differences. Which cost, if any, appear to be quite different?

Lower Upper
Fuel -1.704346 5.930264
Repair -7.022268 1.722920
Capital -13.526479 -3.628618

The cost of capital seems to be most different according to the simultaneous confidence intervals as both of its limits are positive; this backs up the idea that capital is more.

d. Note that the observations 9 and 21 for gasoline trucks have been identified as multivariate outliers. Repeat part a with these observations deleted. Comment on the results.

\(H_{0}: \mu_{1} - \mu_{2} = \delta_{0}\)
\(H_{0}: \mu_{1} - \mu_{2} \neq \delta_{0}\)
Test Statistic: \(T^{2} = 52.3835991\)
Critical Value: 8.6519598
Conclusion: The conclusion after removing the two outliers is in the same direction, but with slightly stronger evidence in the form of a higher test statistic & a lower critical value. Since T2 > the critical value (52.3835991 > 8.6519598 at \(\alpha = 0.01\)), we reject the null hypothesis with statistically significant evidence in the direction of the alternate. That is, the data indicates that the mean difference in cost vectors between the groups is not equal to \(\delta_{0}\).