In the following analysis, we build upon the previous case study of average life expectancy in 127 countries during the year 2014. We found previously that the most effective model is a log-transformed model with the explicit structure given by \[log(Life.expectancy)=3.858-0.004144\times StatusDeveloping-2.364\times 10^{-8}\times GDP-0.0214\times HIV.AIDS+\]\[ 0.003373\times Total.expenditure+0.03277\times regionAmericas+0.01685\times regionAsia+0.02907\times regionEurope+\]\[ 0.02363\times regionOceania+0.5575\times Income.composition.of.resources\] However, a residual analysis concluded that there were still minor violations of the assumption \(\epsilon \sim N(0,\sigma^2)\). The goal of this report is to address these violations by using bootstrap sampling, both of cases and of residuals, to construct bootstrap confidence intervals for the regression coefficients.
Our data set for this analysis is the same as the one used in the previous assignment. The first data set, Life Expectancy (WHO), tracks life expectancy and other health, social, and economic factors in 193 countries between 2000 and 2015; it comes from the Global Health Observatory (GHO) data repository under the authority of the World Health Organization (WHO). A second data set, Country Mapping - ISO, Continent, Region, was created by Kaggle user andradaolteanu for the explicit purpose of country mapping, and is used solely to attach a region to each country in the Life Expectancy (WHO) data set. Our final data set, named Country.stats, contains the following variables: Country, Status, Life.expectancy, GDP, HIV.AIDS, Total.expenditure, region, and Income.composition.of.resources.
The purpose of the following analysis is to verify the empirical connection between life expectancy and various social, economic, health, and geographic factors in 127 countries for the year 2014 determined in the previous analysis. This will be conducted using bootstrap sampling to construct confidence intervals around the regression coefficients and bootstrap the residuals to correct the violations found in the previous analysis.
There will be three main components to the following analysis:
As in the previous report, the preliminary analysis imports, transforms, and cleans the data. Two plots, a pairwise scatter plot and an exploratory graph, then help determine the relationships between the variables and develop a narrative to be explored.
library(dplyr)   # filter(), inner_join(), select(), %>%
library(ggplot2) # ggplot()
library(GGally)  # ggpairs()
library(pander)  # pander()
library(knitr)   # kable()
Expectancy <- read.csv("https://raw.githubusercontent.com/as927097/STA321/main/Life%20Expectancy%20Data.csv",
                       header = TRUE) # read in the life expectancy data
Region <- read.csv("https://raw.githubusercontent.com/as927097/STA321/main/continents2.csv",
                   header = TRUE) # read in the country-to-region mapping
expectancy <- filter(Expectancy, Year == 2014) %>%
  na.omit() # keep only the year 2014 and omit rows with NAs
# merge expectancy with Region and keep only the variables used for modeling;
# after omitting NAs, the data set has 127 countries
Country.stats <- inner_join(expectancy, Region, by="Country") %>%
  select(Country, Status, Life.expectancy, GDP, HIV.AIDS, Total.expenditure, region, Income.composition.of.resources)
pander(head(Country.stats))
| Country | Status | Life.expectancy | GDP | HIV.AIDS | Total.expenditure | region | Income.composition.of.resources |
|---|---|---|---|---|---|---|---|
| Afghanistan | Developing | 59.9 | 612.7 | 0.1 | 8.18 | Asia | 0.476 |
| Albania | Developing | 77.5 | 4576 | 0.1 | 5.88 | Europe | 0.761 |
| Algeria | Developing | 75.4 | 547.9 | 0.1 | 7.21 | Africa | 0.741 |
| Angola | Developing | 51.7 | 479.3 | 2 | 3.31 | Africa | 0.527 |
| Argentina | Developing | 76.2 | 12245 | 0.1 | 4.79 | Americas | 0.825 |
| Armenia | Developing | 74.6 | 3995 | 0.1 | 4.48 | Asia | 0.739 |
The following pairwise scatter plot visualizes the distribution of each variable and the pairwise relationships between variables. An assessment of the plot reveals that the quantitative variables have the following correlations with the response variable Life.expectancy: GDP = -0.445, HIV.AIDS = -0.611, Total.expenditure = 0.332, and Income.composition.of.resources = 0.891.
ggpairs(Country.stats, columns = 2:8) # pairwise plot of all variables in data set
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The following graph explores some of the variables in more depth and defines a narrative for the response variable and the independent variables. In particular, this graph shows the relationship between life expectancy and resource utilization, with the individual points colored by their respective region, shaped by status of development, and sized by GDP per capita. What is evident at first glance is that the countries furthest toward the upper-right area of the graph are disproportionately developed European and Oceanic countries with high GDP per capita (although not all of them are). Those in the bottom-left area of the graph are disproportionately developing African countries with very low GDP per capita.
ggplot(Country.stats, aes(x=Income.composition.of.resources, y=Life.expectancy, col = region, shape=Status, size=GDP))+
geom_point()+
theme_minimal()+
labs(title="Life Expectancy as a Function of Resource Utilization in 127 Countries",
subtitle = "Shaped by Status of Development, Sized by GDP per capita (in USD), and Colored by Region",
x = "Income Composition of Resources",
y = "Life Expectancy")+
scale_color_manual(values=c("#68aed6","#4292c6","#2171b5","#08519c","#08306b"), name="Region")+
guides(size=guide_legend(
override.aes = list(color = c("azure3","azure3","azure3"))
), color=guide_legend(
override.aes = list(size=3)), shape=guide_legend(override.aes = list(size=2)))+
scale_size_continuous(name = "GDP (per capita)")
This concludes the exploratory section of the analysis. The following section will build upon this analysis by restating the final model generated in the previous analysis and then constructing a bootstrap sample to recreate the final model.
The previous analysis compared three models (a multiple OLS linear regression, a log-transformed regression, and a squared-transformed regression) and found that the regression on the log-transformed response, with the structure \(log(Y)=\beta_0+\beta_1x_1+\cdots+\beta_kx_k\), is the best model based on residual analysis and goodness-of-fit measures. The model is summarized in the following table:
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 3.8580921 | 0.0398693 | 96.7685100 | 0.0000000 |
| StatusDeveloping | -0.0041436 | 0.0196952 | -0.2103883 | 0.8337306 |
| GDP | 0.0000000 | 0.0000004 | -0.0653112 | 0.9480377 |
| HIV.AIDS | -0.0213968 | 0.0038689 | -5.5304367 | 0.0000002 |
| Total.expenditure | 0.0033727 | 0.0019019 | 1.7733084 | 0.0787795 |
| regionAmericas | 0.0327692 | 0.0170540 | 1.9214960 | 0.0571017 |
| regionAsia | 0.0168546 | 0.0154794 | 1.0888439 | 0.2784605 |
| regionEurope | 0.0290678 | 0.0222393 | 1.3070421 | 0.1937614 |
| regionOceania | 0.0236311 | 0.0213172 | 1.1085497 | 0.2698980 |
| Income.composition.of.resources | 0.5574610 | 0.0502895 | 11.0850397 | 0.0000000 |
As previously stated, the explicit structure of the model is \[log(Life.expectancy)=3.858-0.004144\times StatusDeveloping-2.364\times 10^{-8}\times GDP-0.0214\times HIV.AIDS+\]\[ 0.003373\times Total.expenditure+0.03277\times regionAmericas+0.01685\times regionAsia+0.02907\times regionEurope+\]\[ 0.02363\times regionOceania+0.5575\times Income.composition.of.resources\] in which \(log(\)Life.expectancy\()\) is the response variable and Status, GDP, HIV.AIDS, Total.expenditure, region, and Income.composition.of.resources are the explanatory variables.
We then conducted residual analyses on the final model. The residual plots show minor violations of the assumption \(\epsilon \sim N(0,\sigma^2)\).
Residual Analysis Plots
Alongside the residual analysis, goodness of fit was assessed using the following measures:
| | SSE | R.sq | R.adj | Cp | AIC | SBC | PRESS |
|---|---|---|---|---|---|---|---|
| log.model | 0.2923658 | 0.852188 | 0.8408179 | 10 | -751.39 | -722.9481 | 0.3461065 |
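As a rough sketch of how such measures can be computed from a fitted `lm` object, the helper below is hypothetical (it is not the code of the previous report) and is run on the built-in `mtcars` data rather than on Country.stats. The AIC and SBC formulas are an assumption: they are the SSE-based forms \(n\log(SSE/n)+2p\) and \(n\log(SSE/n)+p\log(n)\), chosen because they reproduce the reported values from n = 127, p = 10, and the reported SSE.

```r
# Hypothetical helper (not from the original report): goodness-of-fit measures for an lm fit.
gof.measures <- function(fit, mse.full = NULL) {
  e   <- resid(fit)
  n   <- length(e)                       # sample size
  p   <- length(coef(fit))               # number of estimated coefficients
  sse <- sum(e^2)                        # error sum of squares
  if (is.null(mse.full)) mse.full <- sse / (n - p)  # treat this fit as the full model
  y   <- model.response(model.frame(fit))
  sst <- sum((y - mean(y))^2)            # total sum of squares
  c(SSE   = sse,
    R.sq  = 1 - sse / sst,
    R.adj = 1 - (sse / (n - p)) / (sst / (n - 1)),
    Cp    = sse / mse.full - (n - 2 * p),            # Mallows' Cp
    AIC   = n * log(sse / n) + 2 * p,                # SSE-based form (assumed)
    SBC   = n * log(sse / n) + p * log(n),           # SSE-based form (assumed)
    PRESS = sum((e / (1 - hatvalues(fit)))^2))       # leave-one-out prediction SSE
}

# Toy illustration on built-in data
toy.fit <- lm(log(mpg) ~ wt + hp, data = mtcars)
gof <- gof.measures(toy.fit)
print(round(gof, 4))

# Consistency check of the assumed formulas against the values reported in the table
n <- 127; p <- 10; sse <- 0.2923658
aic.check <- n * log(sse / n) + 2 * p
sbc.check <- n * log(sse / n) + p * log(n)
print(c(AIC = aic.check, SBC = sbc.check))
```

When the fitted model is itself treated as the full model, Cp equals the number of coefficients, which is consistent with the value of 10 in the table.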
Finally, we explained how to interpret the estimated regression coefficients. For this, we elaborated upon the coefficient for the binary dummy variable Status using the following logic:
Suppose that two countries have identical values for all explanatory variables except Status: one country is Developing (StatusDeveloping = 1) and the other is Developed (StatusDeveloping = 0). Then \[log(Developing)-log(Developed)=-0.0041436 \to log(\frac{Developing}{Developed})=-0.0041436 \to Developing=e^{-0.0041436}\times Developed\approx 0.995865\times Developed\] The above equation can be rewritten in the following way: \[Developing-Developed=(e^{-0.0041436}-1)\times Developed \to \frac {Developing-Developed}{Developed}=e^{-0.0041436}-1\approx -0.004135=-0.4135\%\] That is, the life expectancy of a developing country is about 0.4135 percent lower than that of an otherwise identical developed country. More generally, the percent increase (or decrease) in the response for every one-unit increase in an independent variable can be calculated by applying the following expression to the individual coefficients: \((e^{\beta_i}-1)\times 100\).
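The percent-change formula can be applied directly in R. The following is a minimal sketch that uses the coefficient estimates from the summary table; the helper name `pct.change` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: percent change in the response per one-unit increase
# in a predictor, for a model with a log-transformed response.
pct.change <- function(beta) (exp(beta) - 1) * 100

# Coefficient of StatusDeveloping as reported in the model summary table
beta.status <- -0.0041436
status.pct <- pct.change(beta.status)
print(round(status.pct, 4))   # about -0.4135 percent

# The same helper applied to other reported coefficients
other.pct <- pct.change(c(HIV.AIDS = -0.0213968,
                          Income.composition.of.resources = 0.5574610))
print(round(other.pct, 4))
```

Because Income.composition.of.resources is an index between 0 and 1 in the data shown, a full one-unit increase is outside the typical range, so its percent change is easier to read when scaled to a smaller increment.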
In the following section, we use bootstrap sampling to generate a bootstrap regression model of the log-transformed model discussed in the previous section. Following the bootstrap model, we will construct bootstrap confidence intervals for each of the coefficients to assess the stability and significance of each coefficient.
A loop is used to generate the bootstrap samples and to fit the regression to each of the 1,000 replicates.
## redefine the log-transformed model
log.model <- lm(log(Life.expectancy)~.-Country, data = Country.stats)
## define number of bootstrap replicates
B = 1000 # choose 1000 bootstrap replicates to sample
## define number of parameters, sample size, and empty coefficient matrix
num.p = length(coef(log.model)) # number of estimated coefficients, including the intercept and the four region dummies
smpl.n = nrow(model.frame(log.model)) # sample size
## zero matrix to store bootstrap coefficients
coef.mtrx = matrix(rep(0, B*num.p), ncol = num.p)
## case bootstrap: resample countries with replacement and refit the model
for (i in 1:B){
  bootc.id = sample(1:smpl.n, smpl.n, replace = TRUE) # row indices of the bootstrap sample
  log.model.btc = lm(log(Life.expectancy)~.-Country, data = Country.stats[bootc.id,]) # fit final model to the bootstrap sample
  coef.mtrx[i,] = coef(log.model.btc) # extract coefs from bootstrap regression model
}
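One practical caveat with resampling cases when the model contains a categorical predictor such as region: a resample could omit a category entirely, in which case the refitted model would have fewer coefficients and the row assignment would fail. The sketch below shows one possible guard, under the assumption that discarding such replicates is acceptable. It uses the built-in `mtcars` data with `cyl` standing in for region; all object names are illustrative and not part of the original analysis.

```r
set.seed(321)                          # arbitrary seed, only for reproducibility
toy <- mtcars
toy$cyl <- factor(toy$cyl)             # categorical predictor, standing in for region
toy.fit <- lm(log(mpg) ~ wt + hp + cyl, data = toy)
coef.names <- names(coef(toy.fit))

B <- 200                               # fewer replicates than the report, to keep the sketch fast
boot.coefs <- matrix(NA_real_, nrow = B, ncol = length(coef.names),
                     dimnames = list(NULL, coef.names))

for (b in seq_len(B)) {
  idx <- sample(nrow(toy), replace = TRUE)          # resample rows (cases)
  est <- tryCatch(coef(lm(log(mpg) ~ wt + hp + cyl, data = toy[idx, ])),
                  error = function(e) NULL)         # refit may fail on a degenerate resample
  # store the replicate only if it estimated exactly the same set of coefficients
  if (!is.null(est) && identical(names(est), coef.names) && !anyNA(est)) {
    boot.coefs[b, ] <- est
  }
}

kept <- sum(complete.cases(boot.coefs))             # replicates actually used
ci <- apply(boot.coefs, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
print(kept)
print(round(ci, 4))
```

Replicates that are skipped stay as `NA` rows and are ignored by `quantile(..., na.rm = TRUE)`, so the percentile intervals are based only on resamples in which every category was present.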
Then, a function will be defined for histograms that represent each of the individual regression coefficient estimates and their sampling distributions.
boot.hist = function(bt.coef.mtrx, var.id, var.nm){
  ## bt.coef.mtrx = matrix for storing bootstrap estimates of coefficients
  ## var.id = variable ID (1, 2, ..., k+1)
  ## var.nm = variable name on the hist title, must be the string in the double quotes
  ## Bootstrap sampling distribution of the estimated coefficients
  x1.1 <- seq(min(bt.coef.mtrx[,var.id]), max(bt.coef.mtrx[,var.id]), length=300 )
  y1.1 <- dnorm(x1.1, mean(bt.coef.mtrx[,var.id]), sd(bt.coef.mtrx[,var.id]))
  # height of the histogram - use it to make a nice-looking histogram.
  highestbar = max(hist(bt.coef.mtrx[,var.id], plot = FALSE)$density)
  ylimit <- max(c(y1.1,highestbar))
  hist(bt.coef.mtrx[,var.id], probability = TRUE, main = var.nm, xlab="",
       col = "azure1",ylim=c(0,ylimit), border="lightseagreen")
  lines(x = x1.1, y = y1.1, col = "red3") # normal density curve
  lines(density(bt.coef.mtrx[,var.id], adjust=2), col="blue") # kernel density estimate
}
par(mar=c(2,2,2,2))
par(mfrow=c(4,3)) # histograms of bootstrap coefs
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=1, var.nm ="Intercept" )
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=2, var.nm ="StatusDeveloping" )
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=3, var.nm ="GDP" )
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=4, var.nm ="HIV.AIDS" )
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=5, var.nm ="Total.expenditure" )
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=6, var.nm ="regionAmericas" )
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=7, var.nm ="regionAsia" )
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=8, var.nm ="regionEurope" )
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=9, var.nm ="regionOceania" )
boot.hist(bt.coef.mtrx=coef.mtrx, var.id=10, var.nm ="Income.composition.of.resources" )
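Each of the 10 histograms shown above contains two density curves: a red normal density curve based on the mean and standard deviation of the bootstrap estimates, and a blue kernel density estimate of the bootstrap estimates.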
The density curves in each of the 10 histograms are all relatively similar and all distributions are approximately normal with only minor deviations.
Finally, we construct 95% bootstrap confidence intervals for each of the coefficients and then combine them with the output of the final model to draw further observations about the model.
num.p = (dim(coef.mtrx)[2]) # number of parameters
btc.ci = NULL
btc.wd = NULL
for (i in 1:num.p){
lci.025 = round(quantile(coef.mtrx[,i], 0.025, type = 2), 8) #lower bound of 95% CI
uci.975 = round(quantile(coef.mtrx[,i], 0.975, type = 2 ),8) #upper bound of 95% CI
btc.wd[i] = uci.975 - lci.025 #difference between the upper and lower bounds
btc.ci[i] = paste("[", round(lci.025,4),", ", round(uci.975,4),"]")
}
#as.data.frame(btc.ci)
kable(as.data.frame(cbind(formatC(summary(log.model)$coef,4,format="f"), btc.ci.95=btc.ci)),
caption = "Regression Coefficient Matrix") #combine inferential statistics of the model with the bootstrap CI
| | Estimate | Std. Error | t value | Pr(>\|t\|) | btc.ci.95 |
|---|---|---|---|---|---|
| (Intercept) | 3.8581 | 0.0399 | 96.7685 | 0.0000 | [ 3.7794 , 3.9361 ] |
| StatusDeveloping | -0.0041 | 0.0197 | -0.2104 | 0.8337 | [ -0.0433 , 0.0361 ] |
| GDP | -0.0000 | 0.0000 | -0.0653 | 0.9480 | [ 0 , 0 ] |
| HIV.AIDS | -0.0214 | 0.0039 | -5.5304 | 0.0000 | [ -0.0326 , -0.0132 ] |
| Total.expenditure | 0.0034 | 0.0019 | 1.7733 | 0.0788 | [ -0.0027 , 0.0089 ] |
| regionAmericas | 0.0328 | 0.0171 | 1.9215 | 0.0571 | [ 0.003 , 0.0642 ] |
| regionAsia | 0.0169 | 0.0155 | 1.0888 | 0.2785 | [ -0.0116 , 0.0476 ] |
| regionEurope | 0.0291 | 0.0222 | 1.3070 | 0.1938 | [ -0.0114 , 0.0742 ] |
| regionOceania | 0.0236 | 0.0213 | 1.1085 | 0.2699 | [ -0.0117 , 0.0637 ] |
| Income.composition.of.resources | 0.5575 | 0.0503 | 11.0850 | 0.0000 | [ 0.4536 , 0.6537 ] |
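The table shown above summarizes the inferential statistics of the final model together with the case-bootstrap confidence intervals. For nine of the ten coefficients, the significance test based on the p-value agrees with the bootstrap interval: the interval excludes zero when the p-value is below 0.05 and contains zero otherwise. The exception is regionAmericas, whose p-value (0.0571) is marginally above 0.05 while its bootstrap interval [0.003, 0.0642] excludes zero. The GDP interval is displayed as [0, 0] only because its bounds round to zero at four decimal places.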
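One way to check this agreement programmatically is to compare, for each coefficient, whether the p-value is below 0.05 with whether the interval excludes zero. The sketch below hard-codes the rounded values from the table above; the object names are illustrative, and the 0.05 threshold is an assumption matching the 95% intervals.

```r
terms <- c("(Intercept)", "StatusDeveloping", "GDP", "HIV.AIDS", "Total.expenditure",
           "regionAmericas", "regionAsia", "regionEurope", "regionOceania",
           "Income.composition.of.resources")
# p-values and case-bootstrap interval bounds, copied from the table
p.val <- c(0.0000, 0.8337, 0.9480, 0.0000, 0.0788, 0.0571, 0.2785, 0.1938, 0.2699, 0.0000)
lower <- c(3.7794, -0.0433, 0, -0.0326, -0.0027, 0.003, -0.0116, -0.0114, -0.0117, 0.4536)
upper <- c(3.9361, 0.0361, 0, -0.0132, 0.0089, 0.0642, 0.0476, 0.0742, 0.0637, 0.6537)

p.sig  <- p.val < 0.05             # significant according to the t-test
ci.sig <- lower > 0 | upper < 0    # significant according to the bootstrap interval

disagree <- terms[p.sig != ci.sig]
print(disagree)                    # "regionAmericas"
```

With the tabulated values, the only coefficient for which the two criteria differ is regionAmericas, a borderline case under either approach.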
In the following section, we first examine the residuals of the original model and describe their distribution. Next, we take bootstrap samples of the residuals and use them to construct confidence intervals for the regression coefficients.
The distribution of the residuals of the final model is shown in the following histogram.
hist(sort(log.model$residuals),n=40,
xlab = "Residuals",
col = "lightblue",
border = "navy",
main = "Histogram of Residuals")
The histogram shown above reveals the following information about the distribution of the residuals:
This section draws 1,000 bootstrap samples of the residuals from the final model, refits the model to each bootstrap response, and constructs confidence intervals for the regression coefficients. The following code draws the samples and then plots histograms of the resulting coefficient distributions.
## Final model
log.model <- lm(log(Life.expectancy)~.-Country, data = Country.stats)
model.resid = log.model$residuals
##
B=1000
num.p = dim(model.matrix(log.model))[2] # number of parameters
samp.n = dim(model.matrix(log.model))[1] # sample size
btr.mtrx = matrix(rep(0,num.p*B), ncol=num.p) # zero matrix to store boot coefs
for (i in 1:B){
## Bootstrap response values
bt.lg.expectancy = log.model$fitted.values +
sample(log.model$residuals, samp.n, replace = TRUE) # bootstrap residuals
# refit the original model with the bootstrap response; bt.lg.expectancy is not a column
# of Country.stats, so lm() looks it up in the calling environment
btr.model = lm(bt.lg.expectancy ~ .-Country-Life.expectancy, data = Country.stats)
btr.mtrx[i,]=btr.model$coefficients # store coefficients in the zero matrix
}
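The resampling scheme implemented by the loop above can be written compactly as follows. The notation is introduced here for illustration and is an assumption, not the report's own: \(\hat{y}_i\) are the fitted values of the log-transformed model, \(e_1,\dots,e_n\) are its residuals, and the star marks bootstrap quantities.

```latex
y_i^{*} = \hat{y}_i + e_i^{*}, \qquad
e_i^{*} \ \text{drawn with replacement from}\ \{e_1, \dots, e_n\}, \qquad
i = 1, \dots, n
```

Each replicate regresses \(y^{*}\) on the original explanatory variables and stores the resulting coefficients. Unlike the case bootstrap, the design matrix stays fixed across replicates, so every region appears in every refit.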
boot.hist = function(bt.coef.mtrx, var.id, var.nm){
## bt.coef.mtrx = matrix for storing bootstrap estimates of coefficients
## var.id = variable ID (1, 2, ..., k+1)
## var.nm = variable name on the hist title, must be the string in the double quotes
## Bootstrap sampling distribution of the estimated coefficients
x1.1 <- seq(min(bt.coef.mtrx[,var.id]), max(bt.coef.mtrx[,var.id]), length=300 )
y1.1 <- dnorm(x1.1, mean(bt.coef.mtrx[,var.id]), sd(bt.coef.mtrx[,var.id]))
# height of the histogram - use it to make a nice-looking histogram.
highestbar = max(hist(bt.coef.mtrx[,var.id], plot = FALSE)$density)
ylimit <- max(c(y1.1,highestbar))
hist(bt.coef.mtrx[,var.id], probability = TRUE, main = var.nm, xlab="",
col = "azure1",ylim=c(0,ylimit), border="lightseagreen")
lines(x = x1.1, y = y1.1, col = "red3") # normal density curve
lines(density(bt.coef.mtrx[,var.id], adjust=2), col="blue") # kernel density estimate
}
par(mar=c(2,2,2,2))
par(mfrow=c(4,3)) # histograms of bootstrap coefs
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=1, var.nm ="Intercept" )
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=2, var.nm ="StatusDeveloping" )
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=3, var.nm ="GDP" )
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=4, var.nm ="HIV.AIDS" )
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=5, var.nm ="Total.expenditure" )
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=6, var.nm ="regionAmericas" )
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=7, var.nm ="regionAsia" )
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=8, var.nm ="regionEurope" )
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=9, var.nm ="regionOceania" )
boot.hist(bt.coef.mtrx=btr.mtrx, var.id=10, var.nm ="Income.composition.of.resources" )
Above are the residual bootstrap sampling distributions of each estimated regression coefficient. The normal density curve and the kernel density estimate are close to each other, which suggests that inference on the significance of the variables based on p-values and on the residual bootstrap will yield similar results.
The 95% bootstrap residual confidence intervals are combined with the inferential statistics of the final model in the following table.
## number of parameters, taken from the residual bootstrap coefficient matrix
num.p = dim(btr.mtrx)[2]
btr.ci = NULL #define an empty vector to store bootstrap residual CI
btr.wd = NULL #define an empty vector to store the difference between upper and lower bound
for (i in 1:num.p){
lci.025 = round(quantile(btr.mtrx[, i], 0.025, type = 2),8) # lower bound
uci.975 = round(quantile(btr.mtrx[, i],0.975, type = 2 ),8) # upper bound
btr.wd[i] = uci.975 - lci.025
btr.ci[i] = paste("[", round(lci.025,4),", ", round(uci.975,4),"]")
}
#as.data.frame(btc.ci)
kable(as.data.frame(cbind(formatC(summary(log.model)$coef,4,format="f"), btr.ci.95=btr.ci)),
caption = "Regression Coefficient Matrix with 95% Residual Bootstrap CI")
| | Estimate | Std. Error | t value | Pr(>\|t\|) | btr.ci.95 |
|---|---|---|---|---|---|
| (Intercept) | 3.8581 | 0.0399 | 96.7685 | 0.0000 | [ 3.7847 , 3.9316 ] |
| StatusDeveloping | -0.0041 | 0.0197 | -0.2104 | 0.8337 | [ -0.0413 , 0.0328 ] |
| GDP | -0.0000 | 0.0000 | -0.0653 | 0.9480 | [ 0 , 0 ] |
| HIV.AIDS | -0.0214 | 0.0039 | -5.5304 | 0.0000 | [ -0.0292 , -0.0142 ] |
| Total.expenditure | 0.0034 | 0.0019 | 1.7733 | 0.0788 | [ -0.0001 , 0.0069 ] |
| regionAmericas | 0.0328 | 0.0171 | 1.9215 | 0.0571 | [ -0.0012 , 0.0666 ] |
| regionAsia | 0.0169 | 0.0155 | 1.0888 | 0.2785 | [ -0.0122 , 0.0462 ] |
| regionEurope | 0.0291 | 0.0222 | 1.3070 | 0.1938 | [ -0.0128 , 0.0731 ] |
| regionOceania | 0.0236 | 0.0213 | 1.1085 | 0.2699 | [ -0.0217 , 0.0634 ] |
| Income.composition.of.resources | 0.5575 | 0.0503 | 11.0850 | 0.0000 | [ 0.4622 , 0.6475 ] |
Each residual-bootstrap confidence interval contains zero exactly when the corresponding p-value exceeds 0.05, so the intervals agree with the significance tests represented by the p-values.
In the analysis conducted above, we first restated the preliminary analysis of the data and the final model found in the previous report. Since the last report discovered minor violations of the assumption \(\epsilon \sim N(0,\sigma^2)\), we used the case bootstrap and the residual bootstrap, each with 1,000 replicates, to construct confidence intervals for the regression coefficients. The significance tests represented by the p-values were supported by the confidence intervals, with the exception of regionAmericas, for which the case-bootstrap interval excludes zero although the p-value is 0.0571.
We combine all inferential statistics, including the bootstrap CIs, in the following table:
kable(as.data.frame(cbind(formatC(summary(log.model)$coef[,-3],4,format="f"), btc.ci.95=btc.ci,btr.ci.95=btr.ci)),
caption="Final Combined Inferential Statistics: p-values and Bootstrap CIs")
| | Estimate | Std. Error | Pr(>\|t\|) | btc.ci.95 | btr.ci.95 |
|---|---|---|---|---|---|
| (Intercept) | 3.8581 | 0.0399 | 0.0000 | [ 3.7794 , 3.9361 ] | [ 3.7847 , 3.9316 ] |
| StatusDeveloping | -0.0041 | 0.0197 | 0.8337 | [ -0.0433 , 0.0361 ] | [ -0.0413 , 0.0328 ] |
| GDP | -0.0000 | 0.0000 | 0.9480 | [ 0 , 0 ] | [ 0 , 0 ] |
| HIV.AIDS | -0.0214 | 0.0039 | 0.0000 | [ -0.0326 , -0.0132 ] | [ -0.0292 , -0.0142 ] |
| Total.expenditure | 0.0034 | 0.0019 | 0.0788 | [ -0.0027 , 0.0089 ] | [ -0.0001 , 0.0069 ] |
| regionAmericas | 0.0328 | 0.0171 | 0.0571 | [ 0.003 , 0.0642 ] | [ -0.0012 , 0.0666 ] |
| regionAsia | 0.0169 | 0.0155 | 0.2785 | [ -0.0116 , 0.0476 ] | [ -0.0122 , 0.0462 ] |
| regionEurope | 0.0291 | 0.0222 | 0.1938 | [ -0.0114 , 0.0742 ] | [ -0.0128 , 0.0731 ] |
| regionOceania | 0.0236 | 0.0213 | 0.2699 | [ -0.0117 , 0.0637 ] | [ -0.0217 , 0.0634 ] |
| Income.composition.of.resources | 0.5575 | 0.0503 | 0.0000 | [ 0.4536 , 0.6537 ] | [ 0.4622 , 0.6475 ] |
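The two sets of intervals can also be compared by width, which is what the vectors `btc.wd` and `btr.wd` computed earlier would hold. The sketch below is a rough illustration that works from the rounded bounds printed in the table rather than from the stored vectors, so the widths are accurate only to about four decimal places; the data frame and column names are hypothetical.

```r
ci <- data.frame(
  term = c("(Intercept)", "StatusDeveloping", "GDP", "HIV.AIDS", "Total.expenditure",
           "regionAmericas", "regionAsia", "regionEurope", "regionOceania",
           "Income.composition.of.resources"),
  # case-bootstrap bounds (btc.ci.95)
  case.lo  = c(3.7794, -0.0433, 0, -0.0326, -0.0027, 0.003, -0.0116, -0.0114, -0.0117, 0.4536),
  case.hi  = c(3.9361, 0.0361, 0, -0.0132, 0.0089, 0.0642, 0.0476, 0.0742, 0.0637, 0.6537),
  # residual-bootstrap bounds (btr.ci.95)
  resid.lo = c(3.7847, -0.0413, 0, -0.0292, -0.0001, -0.0012, -0.0122, -0.0128, -0.0217, 0.4622),
  resid.hi = c(3.9316, 0.0328, 0, -0.0142, 0.0069, 0.0666, 0.0462, 0.0731, 0.0634, 0.6475)
)

ci$case.width  <- ci$case.hi - ci$case.lo
ci$resid.width <- ci$resid.hi - ci$resid.lo
# which method gives the narrower interval (ties arise when both round to the same width)
ci$narrower <- ifelse(abs(ci$case.width - ci$resid.width) < 1e-9, "tie",
                      ifelse(ci$resid.width < ci$case.width, "residual", "case"))

print(ci[, c("term", "case.width", "resid.width", "narrower")])
```

A narrower interval from one method does not by itself make that method preferable; the residual bootstrap relies on the model's mean structure and a common error variance being adequate, whereas the case bootstrap does not.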
After combining all inferential statistics, we verify the conclusion of the last report in which we found that:
An analysis of the bootstrap confidence intervals for the regression coefficients finds that, despite minor violations of the assumption \(\epsilon \sim N(0,\sigma^2)\), the bootstrap confidence intervals support the significance tests: apart from the borderline regionAmericas coefficient under the case bootstrap, an interval excludes zero exactly when the corresponding p-value is below 0.05.