library(faraway)
library(rcompanion)
library(MASS)


infmort <- get(data("infmort"))
infdf <- na.omit(infmort)
infdf$oilExport = ifelse(infdf$oil=='no oil exports', 0, 1)
par(mfrow=c(1,2))
plotNormalHistogram(infdf$income)
qqnorm(infdf$income, ylab="Distribution of Income")
qqline(infdf$income, col="red")

plotNormalHistogram(infdf$mortality)
qqnorm(infdf$mortality, ylab="Distribution of Mortality")
qqline(infdf$mortality, col="red")

Income and Mortality plots show they have outliers and needs data transformation.

Data transformation plots for income. Log transformation fits well with income data.

#Transform income data
T_sqrt <- sqrt(infdf$income) #Square root
T_cub <- sign(infmort$income) * abs(infmort$income)^(1/3) #Cube root
T_log <- log(infmort$income,10) #Log transformation
#Boxcox transformation
Box <- boxcox(infdf$income ~ 1, lambda = seq(-6,6,0.1), plotit=F)      # Try values -6 to 6 by 0.1
Cox <- data.frame(Box$x, Box$y)            # Create a data frame with the results
Cox2 <- Cox[with(Cox, order(-Cox$Box.y)),] # Order the new data frame by decreasing y
lambda <- Cox2[1, "Box.x"]                 # Extract that lambda
T_box <- (infdf$income ^ lambda - 1)/lambda   # Transform the original data

#Tukey transformation
T_tuk <- transformTukey(infdf$income, plotit=FALSE)
## 
##     lambda      W Shapiro.p.value
## 390 -0.275 0.9557        0.001875
## 
## if (lambda >  0){TRANS = x ^ lambda} 
## if (lambda == 0){TRANS = log(x)} 
## if (lambda <  0){TRANS = -1 * x ^ lambda}
par(mfrow=c(3,2))
plotNormalHistogram(T_sqrt, main="Square Root Transform")
plotNormalHistogram(T_cub, main="Cube Root Transform")
plotNormalHistogram(T_log, main="Log Transform")
plotNormalHistogram(T_box, main="Boxcox Transform")
plotNormalHistogram(T_tuk, main="Tukey Transform")

Data transformation plots for mortality. Log transformation fits well with mortality data.

#Transform mortality data
T_sqrt <- sqrt(infdf$mortality) #Square root
T_cub <- sign(infdf$mortality) * abs(infdf$mortality)^(1/3) #Cube root
T_log <- log(infdf$mortality,10) #Log transformation

#Tukey transformation
T_tuk <- transformTukey(infmort$mortality, plotit=FALSE)
## 
##     lambda      W Shapiro.p.value
## 404  0.075 0.9702         0.02198
## 
## if (lambda >  0){TRANS = x ^ lambda} 
## if (lambda == 0){TRANS = log(x)} 
## if (lambda <  0){TRANS = -1 * x ^ lambda}
par(mfrow=c(2,2))
plotNormalHistogram(T_sqrt, main="Square Root Transform")
plotNormalHistogram(T_cub, main="Cube Root Transform")
plotNormalHistogram(T_log, main="Log Transform")
plotNormalHistogram(T_tuk, main="Tukey Transform")

#Add transformation columns to data frame
infdf$incomeTrans <- log(infdf$income,10)

infdf$mortalityTrans <- log(infdf$mortality,10)

reg <- lm(mortalityTrans ~ incomeTrans + oilExport, data = infdf)
summary(reg)
## 
## Call:
## lm(formula = mortalityTrans ~ incomeTrans + oilExport, data = infdf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.69622 -0.16527 -0.00856  0.15632  1.02877 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.10070    0.13122  23.630   <2e-16 ***
## incomeTrans -0.52114    0.04897 -10.642   <2e-16 ***
## oilExport    0.34311    0.10506   3.266   0.0015 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2847 on 98 degrees of freedom
## Multiple R-squared:  0.551,  Adjusted R-squared:  0.5418 
## F-statistic: 60.13 on 2 and 98 DF,  p-value: < 2.2e-16
plot(mortalityTrans ~ incomeTrans, infdf, pch=as.numeric(oilExport))
abline(3.10070, -0.52114)
abline(3.10070+0.34311, -0.52114, lty=2)

Regression equation

mortalityTrans = 3.10070 - 0.52114*incomeTrans + 0.34311*oilExport

\(\beta_0 = 3.10070, \beta_1 = -0.52114\ and\ \beta_2=0.34311\)

Model Explanation

The dotted line in the above graph explains mortality and income relation in oil exporting countries and thick line explains the relationship between mortality and income in non-exporting nations.

Binary Predictor or Categorical Predictor in the model is oilExport, the value of 1 means country exports oil, 0 means the country does not export oil. Due to the presence of binary predictor, the model yields two distinct response functions.

Mean response function for non oil exporting countries(oilExport=0): \(\mu_Y = 3.10070 - 0.52114\times incomeTrans + 0.34311\times 0 = 3.10070 - 0.52114\times incomeTrans\)

Mean response function for oil exporting countries(oilExport=1): \(\mu_Y = 3.10070 - 0.52114\times incomeTrans + 0.34311\times 1 = 3.44381 - 0.52114\times incomeTrans\)

However both functions have same slope \(\beta_1 =- 0.52114\), but different intercepts. The slope represents a change in the mean response \(\mu_Y\) for each additional unit increase in income. In this case mortality decreases.

Analysis

Is there a significant difference in mean mortality rates in oil exporting and non-exporting countries, after taking into account income?

\(H_0:\) Regardless of income, there is no difference in the mortality mean of oil exporting and non-exporting nations.

\(H_A:\) Regardless of income, there is a difference in the mortality mean of oil exporting and non-exporting nations.

Since p-value for oilExport is 0.0015 at 5% significance level (p-value should be less than 0.001), we reject null hypothesis. That means regardless of income level; there is a difference in mortality rate between oil exporting and non-exporting nations.

Confidence interval of 95%: (0.1346151, 0.5515986). We can be 95% confident that mortality rate in oil exporting nations is between 0.1346151 and 0.5515986 higher than mortality rate in non-exporting nations. This because intercept(\(\beta_0\)) for oil exporting nations is high, starts at 3.44381 and for non-exporting nations, it starts at 3.10070. It tells there are more deaths in oil exporting countries.


References