library(faraway)
library(rcompanion)
library(MASS)
infmort <- get(data("infmort"))
infdf <- na.omit(infmort)
infdf$oilExport = ifelse(infdf$oil=='no oil exports', 0, 1)
par(mfrow=c(1,2))
plotNormalHistogram(infdf$income)
qqnorm(infdf$income, ylab="Distribution of Income")
qqline(infdf$income, col="red")
plotNormalHistogram(infdf$mortality)
qqnorm(infdf$mortality, ylab="Distribution of Mortality")
qqline(infdf$mortality, col="red")
Income and Mortality plots show they have outliers and needs data transformation.
Data transformation plots for income. Log transformation fits well with income data.
#Transform income data
T_sqrt <- sqrt(infdf$income) #Square root
T_cub <- sign(infmort$income) * abs(infmort$income)^(1/3) #Cube root
T_log <- log(infmort$income,10) #Log transformation
#Boxcox transformation
Box <- boxcox(infdf$income ~ 1, lambda = seq(-6,6,0.1), plotit=F) # Try values -6 to 6 by 0.1
Cox <- data.frame(Box$x, Box$y) # Create a data frame with the results
Cox2 <- Cox[with(Cox, order(-Cox$Box.y)),] # Order the new data frame by decreasing y
lambda <- Cox2[1, "Box.x"] # Extract that lambda
T_box <- (infdf$income ^ lambda - 1)/lambda # Transform the original data
#Tukey transformation
T_tuk <- transformTukey(infdf$income, plotit=FALSE)
##
## lambda W Shapiro.p.value
## 390 -0.275 0.9557 0.001875
##
## if (lambda > 0){TRANS = x ^ lambda}
## if (lambda == 0){TRANS = log(x)}
## if (lambda < 0){TRANS = -1 * x ^ lambda}
par(mfrow=c(3,2))
plotNormalHistogram(T_sqrt, main="Square Root Transform")
plotNormalHistogram(T_cub, main="Cube Root Transform")
plotNormalHistogram(T_log, main="Log Transform")
plotNormalHistogram(T_box, main="Boxcox Transform")
plotNormalHistogram(T_tuk, main="Tukey Transform")
Data transformation plots for mortality. Log transformation fits well with mortality data.
#Transform mortality data
T_sqrt <- sqrt(infdf$mortality) #Square root
T_cub <- sign(infdf$mortality) * abs(infdf$mortality)^(1/3) #Cube root
T_log <- log(infdf$mortality,10) #Log transformation
#Tukey transformation
T_tuk <- transformTukey(infmort$mortality, plotit=FALSE)
##
## lambda W Shapiro.p.value
## 404 0.075 0.9702 0.02198
##
## if (lambda > 0){TRANS = x ^ lambda}
## if (lambda == 0){TRANS = log(x)}
## if (lambda < 0){TRANS = -1 * x ^ lambda}
par(mfrow=c(2,2))
plotNormalHistogram(T_sqrt, main="Square Root Transform")
plotNormalHistogram(T_cub, main="Cube Root Transform")
plotNormalHistogram(T_log, main="Log Transform")
plotNormalHistogram(T_tuk, main="Tukey Transform")
#Add transformation columns to data frame
infdf$incomeTrans <- log(infdf$income,10)
infdf$mortalityTrans <- log(infdf$mortality,10)
reg <- lm(mortalityTrans ~ incomeTrans + oilExport, data = infdf)
summary(reg)
##
## Call:
## lm(formula = mortalityTrans ~ incomeTrans + oilExport, data = infdf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.69622 -0.16527 -0.00856 0.15632 1.02877
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.10070 0.13122 23.630 <2e-16 ***
## incomeTrans -0.52114 0.04897 -10.642 <2e-16 ***
## oilExport 0.34311 0.10506 3.266 0.0015 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2847 on 98 degrees of freedom
## Multiple R-squared: 0.551, Adjusted R-squared: 0.5418
## F-statistic: 60.13 on 2 and 98 DF, p-value: < 2.2e-16
plot(mortalityTrans ~ incomeTrans, infdf, pch=as.numeric(oilExport))
abline(3.10070, -0.52114)
abline(3.10070+0.34311, -0.52114, lty=2)
mortalityTrans = 3.10070 - 0.52114*incomeTrans + 0.34311*oilExport
\(\beta_0 = 3.10070, \beta_1 = -0.52114\ and\ \beta_2=0.34311\)
The dotted line in the above graph explains mortality and income relation in oil exporting countries and thick line explains the relationship between mortality and income in non-exporting nations.
mortality is explained by income and oil exports.P-value for the t-tests suggest that the slope parameters for incomeTrans (P < 0.001) and oilExport (P < 0.001) are significantly different from Zero.Binary Predictor or Categorical Predictor in the model is oilExport, the value of 1 means country exports oil, 0 means the country does not export oil. Due to the presence of binary predictor, the model yields two distinct response functions.
Mean response function for non oil exporting countries(oilExport=0): \(\mu_Y = 3.10070 - 0.52114\times incomeTrans + 0.34311\times 0 = 3.10070 - 0.52114\times incomeTrans\)
Mean response function for oil exporting countries(oilExport=1): \(\mu_Y = 3.10070 - 0.52114\times incomeTrans + 0.34311\times 1 = 3.44381 - 0.52114\times incomeTrans\)
However both functions have same slope \(\beta_1 =- 0.52114\), but different intercepts. The slope represents a change in the mean response \(\mu_Y\) for each additional unit increase in income. In this case mortality decreases.
Is there a significant difference in mean mortality rates in oil exporting and non-exporting countries, after taking into account income?
\(H_0:\) Regardless of income, there is no difference in the mortality mean of oil exporting and non-exporting nations.
\(H_A:\) Regardless of income, there is a difference in the mortality mean of oil exporting and non-exporting nations.
Since p-value for oilExport is 0.0015 at 5% significance level (p-value should be less than 0.001), we reject null hypothesis. That means regardless of income level; there is a difference in mortality rate between oil exporting and non-exporting nations.
Confidence interval of 95%: (0.1346151, 0.5515986). We can be 95% confident that mortality rate in oil exporting nations is between 0.1346151 and 0.5515986 higher than mortality rate in non-exporting nations. This because intercept(\(\beta_0\)) for oil exporting nations is high, starts at 3.44381 and for non-exporting nations, it starts at 3.10070. It tells there are more deaths in oil exporting countries.