| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 5.5069838 | 9.6070759 | 0.5732216 | 0.5749883 |
| Missingrate | -260.7794011 | 138.3239661 | -1.8852800 | 0.0789136 |
| Wounded | 0.2248974 | 0.0082179 | 27.3668544 | 0.0000000 |
$$\widehat{\text{Killed}} = 5.5070 - 260.7794\,\text{Missingrate} + 0.2249\,\text{Wounded}$$
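To make the fitted equation concrete, here is a minimal Python sketch (not the R code behind the tables) that evaluates it with the full-precision coefficients from the table above. The helper name `predict_killed` and the input values are hypothetical and only illustrate the arithmetic; the units of Missingrate are not stated, so the example value is arbitrary.

```python
# Full-precision coefficients copied from the final-model table.
INTERCEPT = 5.5069838
B_MISSINGRATE = -260.7794011
B_WOUNDED = 0.2248974


def predict_killed(missingrate: float, wounded: float) -> float:
    """Fitted value of Killed from the linear model (hypothetical helper)."""
    return INTERCEPT + B_MISSINGRATE * missingrate + B_WOUNDED * wounded


if __name__ == "__main__":
    # Hypothetical inputs: Missingrate = 0.01, Wounded = 100.
    # 5.5069838 - 2.607794011 + 22.48974 = 25.388929789
    print(round(predict_killed(0.01, 100), 3))  # prints 25.389
```

Each additional wounded person adds about 0.225 to the fitted number killed, while the Missingrate term dominates only when Missingrate changes by a sizable fraction.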
The bottom-left diagnostic plot shows the points falling into clusters, with gaps where groups of observations are missing. The bottom-right plot indicates that there are some outliers. Moreover, in the top-right (Q-Q) plot the points veer away from the line at both ends, which indicates that the normality assumption has been violated. Finally, the top-left plot shows a weak linear trend.
The explanatory variables are also not normally distributed: both are skewed to the right.
Since the variance is not constant and the normality assumption is violated, we try several transformations of the response variable.
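The transformation step amounts to refitting the same two-predictor model after transforming the response only. Below is a hypothetical Python sketch of that idea, with a small least-squares solver written out so the block has no dependencies. The square root is the transformation this section names; `log1p` (log(1 + y), which tolerates zero counts) is an illustrative assumption, and the data are synthetic.

```python
import math


def ols(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by Gaussian elimination on the normal equations."""
    rows = [(1.0, a, c) for a, c in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    m = [xtx[i] + [xty[i]] for i in range(3)]  # augmented 3x4 system
    for col in range(3):
        # X'X is symmetric positive semidefinite, so no pivoting is needed; a pivot
        # tiny relative to the original diagonal means a (near) singular design.
        if m[col][col] <= 1e-10 * xtx[col][col]:
            raise ValueError("design matrix is (nearly) singular")
        for k in range(col + 1, 3):
            f = m[k][col] / m[col][col]
            for c in range(col, 4):
                m[k][c] -= f * m[col][c]
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    return b


# Transformations applied to the response only; the predictors are left as-is.
TRANSFORMS = {
    "none": lambda v: v,
    "sqrt": math.sqrt,
    "log1p": math.log1p,  # assumption: log(1 + y), defined for y = 0
}


def fit_transformed(x1, x2, y, name):
    """Refit the model with the response replaced by TRANSFORMS[name](y)."""
    f = TRANSFORMS[name]
    return ols(x1, x2, [f(v) for v in y])


if __name__ == "__main__":
    # Synthetic data built so that sqrt(y) is exactly linear in x1 and x2.
    x1 = [0, 1, 2, 3, 4, 5]
    x2 = [3, 1, 4, 1, 5, 9]
    y = [(1 + 0.5 * a + 0.1 * c) ** 2 for a, c in zip(x1, x2)]
    for name in TRANSFORMS:
        print(name, [round(v, 4) for v in fit_transformed(x1, x2, y, name)])
```

After each refit, the residual diagnostics are checked again; the coefficients alone do not tell us whether a transformation fixed the assumptions.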
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 4.4608743 | 0.6358428 | 7.0156876 | 0.0000042 |
| Missingrate | -3.5363246 | 9.1549495 | -0.3862746 | 0.7047172 |
| Wounded | 0.0076354 | 0.0005439 | 14.0381886 | 0.0000000 |
The residual diagnostic plots show some improvement, but problems remain. First, the weak curved pattern is still present in the residual plot. Second, the points in the Q-Q plot are closer to the line, but they still veer off a bit. The residual plot also still shows a pattern rather than a patternless cloud of points, which indicates that the variance is not constant. Unfortunately, the normality assumption is still violated.
The residual diagnostic plots below are similar to those of the previous model, which used a square root transformation: the Q-Q plots of the two models are similar, and so are their residual plots. The residual plot again shows a pattern, so the variance is not constant either. Therefore, neither model satisfies the assumptions of normal residuals and constant variance.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 5.5069838 | 9.6070759 | 0.5732216 | 0.5749883 |
| Missingrate | -260.7794011 | 138.3239661 | -1.8852800 | 0.0789136 |
| Wounded | 0.2248974 | 0.0082179 | 27.3668544 | 0.0000000 |
In this section, we use case bootstrapping (resampling whole observations with replacement) to find confidence intervals for the coefficients of the final regression model.
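Case bootstrapping draws n rows with replacement from the data, refits the model on that resample, and repeats. The Python sketch below illustrates the idea; it is not the R code used for the tables. The function names, `n_boot`, the seed, and the synthetic data are illustrative assumptions, and resamples whose design matrix is singular (possible with small samples) are simply redrawn.

```python
import random


def ols(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by Gaussian elimination on the normal equations."""
    rows = [(1.0, a, c) for a, c in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    m = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        # Tiny pivot relative to the original diagonal: (near) singular design.
        if m[col][col] <= 1e-10 * xtx[col][col]:
            raise ValueError("design matrix is (nearly) singular")
        for k in range(col + 1, 3):
            f = m[k][col] / m[col][col]
            for c in range(col, 4):
                m[k][c] -= f * m[col][c]
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    return b


def case_bootstrap(x1, x2, y, n_boot=1000, seed=1):
    """Return n_boot coefficient vectors, each fitted to rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample whole cases
        try:
            draws.append(ols([x1[i] for i in idx], [x2[i] for i in idx],
                             [y[i] for i in idx]))
        except ValueError:
            continue  # degenerate resample: draw again
    return draws


if __name__ == "__main__":
    # Synthetic stand-in for (Missingrate, Wounded, Killed).
    rng = random.Random(0)
    x1 = [rng.uniform(0, 0.05) for _ in range(30)]
    x2 = [rng.uniform(0, 500) for _ in range(30)]
    y = [5 - 200 * a + 0.2 * c + rng.gauss(0, 3) for a, c in zip(x1, x2)]
    draws = case_bootstrap(x1, x2, y, n_boot=500)
    for j, name in enumerate(["(Intercept)", "x1", "x2"]):
        col = sorted(d[j] for d in draws)
        print(name, round(col[len(col) // 2], 4))  # bootstrap median
```

Because rows are resampled together, this approach does not assume constant error variance, which is one reason it is attractive when the diagnostics look poor.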
We wrote an R function to make histograms of the bootstrap coefficients. This function will also be used to make histograms of the regression coefficients estimated by the residual bootstrap.
These histograms of the bootstrap estimates of the regression coefficients approximate the sampling distributions of the corresponding estimates from our final model.
The histograms above show that the red and blue curves are close in all of them. However, the histogram for Wounded is skewed slightly to the left. The significance test results and the corresponding confidence intervals should be consistent. Next, we calculate the 95% case bootstrap confidence interval of each regression coefficient and combine them with the output of the final model.
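The text does not say which interval method was used; one common choice, assumed here, is the percentile interval, which takes the 2.5% and 97.5% quantiles of the bootstrap replicates of each coefficient. A minimal Python sketch (the helper names are hypothetical) that interpolates between order statistics the way R's `quantile()` does by default (type 7):

```python
def quantile_type7(sorted_vals, p):
    """Empirical quantile with linear interpolation between order statistics."""
    h = (len(sorted_vals) - 1) * p
    lo = int(h)  # floor, since h >= 0
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def percentile_ci(samples, level=0.95):
    """Percentile bootstrap CI: the (1 - level)/2 and (1 + level)/2 quantiles."""
    s = sorted(samples)
    alpha = 1.0 - level
    return quantile_type7(s, alpha / 2), quantile_type7(s, 1 - alpha / 2)


if __name__ == "__main__":
    # Stand-in bootstrap replicates 0, 1, ..., 100 (hypothetical).
    lo, hi = percentile_ci(range(101))
    print(round(lo, 6), round(hi, 6))  # prints 2.5 97.5
```

Applied to the column of bootstrap draws for one coefficient, this returns the lower and upper endpoints that go into an interval column such as btc.ci.95.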
| | Estimate | Std. Error | t value | Pr(>\|t\|) | btc.ci.95 |
|---|---|---|---|---|---|
| (Intercept) | 5.5070 | 9.6071 | 0.5732 | 0.5750 | [ -3.1679 , 14.5639 ] |
| Missingrate | -260.7794 | 138.3240 | -1.8853 | 0.0789 | [ -620.8353 , 103.6062 ] |
| Wounded | 0.2249 | 0.0082 | 27.3669 | 0.0000 | [ 0.1947 , 0.2507 ] |
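The case bootstrap confidence intervals are consistent with the p-values: the intervals for the intercept and Missingrate contain 0 and their p-values are above 0.05, while the interval for Wounded excludes 0 and its p-value is below 0.05.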
Next, we use the residual bootstrap to estimate confidence intervals for the regression coefficients.
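The residual bootstrap keeps the design fixed: fit the model once, resample its residuals with replacement, add them to the fitted values to form a new response, and refit. Below is a hypothetical Python sketch of this scheme, not the R code behind the tables; the function names, `n_boot`, and the seed are illustrative assumptions.

```python
import random


def ols(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by Gaussian elimination on the normal equations."""
    rows = [(1.0, a, c) for a, c in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    m = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        if m[col][col] <= 1e-10 * xtx[col][col]:
            raise ValueError("design matrix is (nearly) singular")
        for k in range(col + 1, 3):
            f = m[k][col] / m[col][col]
            for c in range(col, 4):
                m[k][c] -= f * m[col][c]
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    return b


def residual_bootstrap(x1, x2, y, n_boot=1000, seed=1):
    """Resample residuals, add them to the fitted values, and refit (design fixed)."""
    b = ols(x1, x2, y)
    fitted = [b[0] + b[1] * a + b[2] * c for a, c in zip(x1, x2)]
    resid = [t - f for t, f in zip(y, fitted)]
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        y_star = [f + rng.choice(resid) for f in fitted]  # new synthetic response
        draws.append(ols(x1, x2, y_star))
    return draws


if __name__ == "__main__":
    x1 = list(range(12))
    x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
    y = [2 + 3 * a - c + e for a, c, e in zip(x1, x2, [1, -1] * 6)]
    draws = residual_bootstrap(x1, x2, y, n_boot=500)
    print(len(draws), [round(v, 3) for v in draws[0]])
```

Unlike case resampling, this scheme treats the residuals as exchangeable, so it leans on the model form and on constant error variance; that is worth keeping in mind given the diagnostics above.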
The distribution of the final model's residuals has outliers and is skewed to the left.
Now, we make histograms of the bootstrap residuals.
After resampling the residuals, the distribution looks closer to normal. As the number of bootstrap replications increased, the distribution began to look smoother and narrower. Next, we calculate the 95% residual bootstrap confidence intervals.
| | Estimate | Std. Error | t value | Pr(>\|t\|) | btr.ci.95 |
|---|---|---|---|---|---|
| (Intercept) | 5.5070 | 9.6071 | 0.5732 | 0.5750 | [ -11.1114 , 22.3416 ] |
| Missingrate | -260.7794 | 138.3240 | -1.8853 | 0.0789 | [ -502.2863 , -21.5395 ] |
| Wounded | 0.2249 | 0.0082 | 27.3669 | 0.0000 | [ 0.2095 , 0.2403 ] |
The residual bootstrap confidence intervals do not agree with the p-value for Missingrate: its p-value is greater than 0.05, yet 0 lies outside its interval [-502.2863, -21.5395]. For Wounded the two agree: p < 0.05 and 0 is outside the interval, so Wounded is statistically significant. For the intercept they also agree, since 0 is inside the interval and p > 0.05. The sample size is not very large, so the sampling distributions of the estimated coefficients may not be well approximated by normal distributions.
| | Estimate | Std. Error | Pr(>\|t\|) | btc.ci.95 | btr.ci.95 |
|---|---|---|---|---|---|
| (Intercept) | 5.5070 | 9.6071 | 0.5750 | [ -3.1679 , 14.5639 ] | [ -11.1114 , 22.3416 ] |
| Missingrate | -260.7794 | 138.3240 | 0.0789 | [ -620.8353 , 103.6062 ] | [ -502.2863 , -21.5395 ] |
| Wounded | 0.2249 | 0.0082 | 0.0000 | [ 0.1947 , 0.2507 ] | [ 0.2095 , 0.2403 ] |
The three methods do not all give the same results for the significance of the individual explanatory variables. This may be because our final model seriously violates the model assumptions.
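The comparison can be made mechanical: a term is called significant at the 5% level by the t-test when p < 0.05, and by a 95% interval when the interval excludes 0. The Python sketch below applies both rules to the rounded values in the combined table above; the names are hypothetical, and the Wounded p-value is the displayed 0.0000.

```python
ALPHA = 0.05

# (p-value, case-bootstrap CI, residual-bootstrap CI), copied from the combined table.
TABLE = {
    "(Intercept)": (0.5750, (-3.1679, 14.5639), (-11.1114, 22.3416)),
    "Missingrate": (0.0789, (-620.8353, 103.6062), (-502.2863, -21.5395)),
    "Wounded": (0.0000, (0.1947, 0.2507), (0.2095, 0.2403)),
}


def excludes_zero(ci):
    """A 95% interval that does not contain 0 indicates significance at the 5% level."""
    lo, hi = ci
    return not (lo <= 0.0 <= hi)


def significance_calls(table, alpha=ALPHA):
    """Map each term to (t-test call, case-bootstrap call, residual-bootstrap call)."""
    return {
        term: (p < alpha, excludes_zero(btc), excludes_zero(btr))
        for term, (p, btc, btr) in table.items()
    }


if __name__ == "__main__":
    for term, calls in significance_calls(TABLE).items():
        verdict = "agree" if len(set(calls)) == 1 else "DISAGREE"
        print(term, calls, verdict)
```

Run on these values, only Missingrate gets different calls: the t-test and the case bootstrap find it not significant, while the residual bootstrap interval excludes 0.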
| | btc.wd | btr.wd |
|---|---|---|
| (Intercept) | 17.7317276 | 33.4530050 |
| Missingrate | 724.4415366 | 480.7468021 |
| Wounded | 0.0560342 | 0.0307326 |
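Each width is simply the upper endpoint minus the lower endpoint of the corresponding 95% interval, so the width table can be checked from the interval tables. The Python sketch below (hypothetical names) recomputes the widths from the rounded endpoints, which reproduces btc.wd and btr.wd to about four decimal places, and adds the ratio of the two widths.

```python
# 95% intervals (case bootstrap, residual bootstrap), rounded as in the tables.
CIS = {
    "(Intercept)": ((-3.1679, 14.5639), (-11.1114, 22.3416)),
    "Missingrate": ((-620.8353, 103.6062), (-502.2863, -21.5395)),
    "Wounded": ((0.1947, 0.2507), (0.2095, 0.2403)),
}


def width(ci):
    """Width of an interval: upper endpoint minus lower endpoint."""
    lo, hi = ci
    return hi - lo


if __name__ == "__main__":
    for term, (btc, btr) in CIS.items():
        w_c, w_r = width(btc), width(btr)
        # A ratio above 1 means the residual-bootstrap interval is wider.
        print(term, round(w_c, 4), round(w_r, 4), round(w_r / w_c, 2))
```

The ratios come out near 1.89 for the intercept, 0.66 for Missingrate, and 0.55 for Wounded.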
The widths of the two kinds of interval are of the same order of magnitude, but they are not identical: the residual bootstrap intervals are narrower for Missingrate and Wounded and wider for the intercept. Both bootstrap intervals for the intercept contain 0, so neither method gives evidence that the intercept differs from 0.
However, since the final model violates its assumptions, and the various transformations failed to fix the normality problems in the plots, the bootstrap confidence intervals of the regression coefficients may be more reliable than the parametric p-values, because the bootstrap provides nonparametric inference about the population. Moreover, the fact that the histograms of both explanatory variables are skewed to the right is another reason why the bootstrap method may be more reliable.