library(tidyverse)
sum(is.na(WDI_2310))
summary(WDI_2310)
rawdata <- WDI_2310
### Data cleaning (replace NA values with column means)
rawdata <- rawdata %>%
mutate(`GDP per capita, PPP (current international $) [NY.GDP.PCAP.PP.CD]` = replace_na(`GDP per capita, PPP (current international $) [NY.GDP.PCAP.PP.CD]`, mean(`GDP per capita, PPP (current international $) [NY.GDP.PCAP.PP.CD]`, na.rm = TRUE)),
`Households and NPISHs Final consumption expenditure per capita (constant 2015 US$) [NE.CON.PRVT.PC.KD]` = replace_na(`Households and NPISHs Final consumption expenditure per capita (constant 2015 US$) [NE.CON.PRVT.PC.KD]`, mean(`Households and NPISHs Final consumption expenditure per capita (constant 2015 US$) [NE.CON.PRVT.PC.KD]`, na.rm = TRUE)),
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` = replace_na(`Trade (% of GDP) [NE.TRD.GNFS.ZS]`, mean(`Trade (% of GDP) [NE.TRD.GNFS.ZS]`, na.rm = TRUE)),
`Human capital index (HCI) (scale 0-1) [HD.HCI.OVRL]` = replace_na(`Human capital index (HCI) (scale 0-1) [HD.HCI.OVRL]`, mean(`Human capital index (HCI) (scale 0-1) [HD.HCI.OVRL]`, na.rm = TRUE)),
`Medium and high-tech manufacturing value added (% manufacturing value added) [NV.MNF.TECH.ZS.UN]` = replace_na(`Medium and high-tech manufacturing value added (% manufacturing value added) [NV.MNF.TECH.ZS.UN]`, mean(`Medium and high-tech manufacturing value added (% manufacturing value added) [NV.MNF.TECH.ZS.UN]`, na.rm = TRUE)),
`Population growth (% annual) [SP.POP.GROW]` = replace_na(`Population growth (% annual) [SP.POP.GROW]`, mean(`Population growth (% annual) [SP.POP.GROW]`, na.rm = TRUE)),
`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` = replace_na(`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]`, mean(`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]`, na.rm = TRUE))
)
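As a self-contained illustration of mean imputation, the sketch below applies `across()` to a made-up tibble. The data frame `toy` and its column names are illustrative assumptions and are not part of WDI_2310.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the real data set
toy <- tibble(
  gdp   = c(10, NA, 30),
  trade = c(NA, 50, 70)
)

# Replace every NA in a numeric column with that column's mean
toy_imputed <- toy %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, mean(.x, na.rm = TRUE))))

toy_imputed
```

Mean imputation leaves each column mean unchanged but reduces the column's variance, which is worth keeping in mind when reading the standard errors later on.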
new_rawdata <- rawdata %>%
mutate(GDPpc_log = log(`GDP per capita, PPP (current international $) [NY.GDP.PCAP.PP.CD]`),
Consumption_pc_log = log(`Households and NPISHs Final consumption expenditure per capita (constant 2015 US$) [NE.CON.PRVT.PC.KD]`)
)
model <- lm(GDPpc_log ~ Consumption_pc_log + `Trade (% of GDP) [NE.TRD.GNFS.ZS]`
+ new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]`
+ new_rawdata$`Population growth (% annual) [SP.POP.GROW]`,
data = new_rawdata)
summary(model)
Call:
lm(formula = GDPpc_log ~ Consumption_pc_log + `Trade (% of GDP) [NE.TRD.GNFS.ZS]` +
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` +
new_rawdata$`Population growth (% annual) [SP.POP.GROW]`,
data = new_rawdata)
Residuals:
Min 1Q Median 3Q Max
-2.50707 -0.15988 0.05753 0.24594 1.77915
Coefficients:
Estimate
(Intercept) 3.4709134
Consumption_pc_log 0.6927779
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` 0.0017377
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` 0.0247491
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` -0.1168584
Std. Error
(Intercept) 0.3677975
Consumption_pc_log 0.0427912
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` 0.0008182
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` 0.0117542
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` 0.0405114
t value
(Intercept) 9.437
Consumption_pc_log 16.190
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` 2.124
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` 2.106
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` -2.885
Pr(>|t|)
(Intercept) < 2e-16
Consumption_pc_log < 2e-16
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` 0.03484
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` 0.03642
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` 0.00432
(Intercept) ***
Consumption_pc_log ***
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` *
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` *
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5973 on 212 degrees of freedom
Multiple R-squared: 0.718, Adjusted R-squared: 0.7127
F-statistic: 135 on 4 and 212 DF, p-value: < 2.2e-16
library(stargazer)
stargazer(model, type = "text")
The constant has an estimated value of 3.471. This is the expected value of the dependent variable, ln(GDP per capita), when all independent variables are equal to zero. Its p-value is below 2e-16, so it is statistically significant at any conventional level. The p-value is the probability of observing a t-statistic at least as extreme as the one obtained if the true constant were zero. Because this probability is very small, we reject the null hypothesis that the constant is zero and conclude that there is strong evidence that the expected value of ln(GDP per capita) differs from zero when all independent variables are zero.
The coefficient on ln(household and NPISH consumption per capita) is 0.693. Because both the dependent variable and this regressor are in logs, the coefficient is an elasticity: a 1% increase in household and NPISH consumption per capita is associated with an increase of about 0.693% in GDP per capita, holding the other variables constant. The p-value is below 0.001, so the relationship is statistically significant at the 0.1% level.
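The elasticity reading of a log-log coefficient can be checked with a little arithmetic. The sketch below uses the Consumption_pc_log estimate printed by summary(model); the variable names are illustrative.

```r
# Coefficient on Consumption_pc_log, copied from the summary(model) output
beta_consumption <- 0.6927779

# Exact percentage change in GDP per capita for a 1% rise in consumption
pct_change_1 <- (1.01^beta_consumption - 1) * 100

# Same calculation for a 10% rise, where the "beta percent" shortcut is less exact
pct_change_10 <- (1.10^beta_consumption - 1) * 100

round(c(one_percent = pct_change_1, ten_percent = pct_change_10), 3)
```

For a 1% change the exact figure is very close to the coefficient itself; for larger changes the shortcut slightly overstates the effect.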
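Trade has a coefficient of 0.0017377. Because the dependent variable is in logs and trade is measured as a percentage of GDP, a 1 percentage point increase in trade is associated with an increase of about 0.0017 in ln(GDP per capita), that is, roughly 0.17% higher GDP per capita, holding the other variables constant. Its p-value is 0.03484, which is less than 0.05, so the coefficient is statistically significant at the 5% level and we reject the null hypothesis that there is no relationship between trade and GDP per capita.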
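#Alcohol consumption has a coefficient of 0.0247491, which means that a 1 litre increase in alcohol consumption per capita is associated with an increase of 0.0247491 in ln(GDP per capita), roughly 2.5% higher GDP per capita, holding the other variables constant.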
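When the dependent variable is in logs and a regressor enters in levels, the exact percentage effect of a one-unit increase is 100 * (exp(b) - 1). A minimal sketch with the trade and alcohol estimates from summary(model); the helper `pct_effect` is a hypothetical name introduced here for illustration.

```r
# Coefficients copied from the summary(model) output
b_trade   <- 0.0017377  # Trade (% of GDP)
b_alcohol <- 0.0247491  # Total alcohol consumption per capita (litres)

# Exact percentage change in GDP per capita for a one-unit increase in the regressor
pct_effect <- function(b) 100 * (exp(b) - 1)

round(c(trade = pct_effect(b_trade), alcohol = pct_effect(b_alcohol)), 3)
```

For coefficients this small, the exact figure and the simple approximation 100 * b are nearly identical.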
estimate <- 0.0247491
hypothesized_value <- 0
standard_error <- 0.0117542
t_value<-(estimate-hypothesized_value)/standard_error
t_value
[1] 2.105554
n = 217
#degrees of freedom: n minus the 4 slope coefficients minus the intercept
n-4-1
[1] 212
Looking at the t table, the two-sided 5% critical value for 212 degrees of freedom is about 1.97, which is less than the calculated t_value of 2.105554, so we can reject the null hypothesis that the coefficient on alcohol consumption is zero at the 5% significance level. We conclude that the coefficient on alcohol consumption is statistically significant and that alcohol consumption per capita is positively associated with GDP per capita, holding the other regressors constant.
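Instead of a printed t table, the critical value and the p-value can be computed directly with `qt()` and `pt()`. The sketch below assumes the 212 residual degrees of freedom reported in the summary(model) output; `df_resid` is an illustrative name.

```r
# Values copied from the summary(model) output
estimate       <- 0.0247491
standard_error <- 0.0117542
df_resid       <- 212  # residual degrees of freedom reported by summary(model)

# Test statistic for H0: coefficient = 0
t_value <- estimate / standard_error

# Two-sided critical value at the 5% level and the matching p-value
critical_value <- qt(0.975, df = df_resid)
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

c(t_value = t_value, critical_value = critical_value, p_value = p_value)
```

The p-value obtained this way should agree with the Pr(>|t|) column of the regression output up to rounding.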
The R-squared of the regression is 0.718, which means that 71.8% of the variation in the dependent variable, ln(GDP per capita), is explained by the independent variables in the model (ln household and NPISH consumption per capita, trade, alcohol consumption per capita, and population growth). The remaining 28.2% of the variation is not explained by the model. Overall, an R-squared of 0.718 indicates that the model is a moderately good fit for the data.
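The adjusted R-squared in the output can be reproduced from the R-squared, the sample size, and the number of regressors. A small sketch using the rounded figures printed above (n = 217 observations and k = 4 regressors):

```r
# Figures taken from the summary(model) output (R-squared is rounded there)
r_squared <- 0.718
n <- 217  # observations
k <- 4    # regressors, excluding the intercept

# Adjusted R-squared penalises the fit for the number of regressors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

round(adj_r_squared, 4)
```

Because the penalty grows with k, adding a regressor that explains almost nothing can lower the adjusted R-squared even though the plain R-squared never falls.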
new_rawdata$Popgr2 <- new_rawdata$`Population growth (% annual) [SP.POP.GROW]`^2
model2 <- lm(GDPpc_log ~ Consumption_pc_log + `Trade (% of GDP) [NE.TRD.GNFS.ZS]`
+ new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]`
+ new_rawdata$`Population growth (% annual) [SP.POP.GROW]` + Popgr2,
data = new_rawdata)
summary(model2)
Call:
lm(formula = GDPpc_log ~ Consumption_pc_log + `Trade (% of GDP) [NE.TRD.GNFS.ZS]` +
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` +
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` +
Popgr2, data = new_rawdata)
Residuals:
Min 1Q Median 3Q Max
-2.49078 -0.15364 0.06055 0.23889 1.77792
Coefficients:
Estimate
(Intercept) 3.4962164
Consumption_pc_log 0.6898277
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` 0.0017579
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` 0.0247977
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` -0.0994262
Popgr2 -0.0083005
Std. Error
(Intercept) 0.3718279
Consumption_pc_log 0.0432618
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` 0.0008206
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` 0.0117753
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` 0.0532504
Popgr2 0.0164165
t value
(Intercept) 9.403
Consumption_pc_log 15.945
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` 2.142
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` 2.106
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` -1.867
Popgr2 -0.506
Pr(>|t|)
(Intercept) <2e-16
Consumption_pc_log <2e-16
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` 0.0333
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` 0.0364
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` 0.0633
Popgr2 0.6136
(Intercept) ***
Consumption_pc_log ***
`Trade (% of GDP) [NE.TRD.GNFS.ZS]` *
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]` *
new_rawdata$`Population growth (% annual) [SP.POP.GROW]` .
Popgr2
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5983 on 211 degrees of freedom
Multiple R-squared: 0.7184, Adjusted R-squared: 0.7117
F-statistic: 107.7 on 5 and 211 DF, p-value: < 2.2e-16
Adding Popgr2 as an independent variable does not improve the model. R-squared rises only from 0.718 to 0.7184, while adjusted R-squared falls from 0.7127 to 0.7117. The p-value of Popgr2 is 0.6136, so we cannot reject the null hypothesis that its coefficient is zero at any conventional significance level.
Popgr2 has a negative coefficient of -0.0083005, and the coefficient on Popgr is also negative (-0.0994262). Taken at face value, this describes a concave (inverted-U) relationship between ln(GDPpc) and Popgr whose turning point lies at a negative growth rate of about -6%, so for positive population growth rates a further increase in population growth is associated with a progressively larger decrease in ln(GDP per capita). Since the squared term is not statistically significant, there is no real evidence of a non-linear relationship.
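With a squared term in the model, the effect of population growth depends on its level. The sketch below works this out from the Popgr and Popgr2 estimates printed by summary(model2); `marginal_effect` and `turning_point` are illustrative names, not objects from the original script.

```r
# Coefficients copied from the summary(model2) output
b_popgr  <- -0.0994262  # Population growth (% annual)
b_popgr2 <- -0.0083005  # Popgr2 (population growth squared)

# Derivative of ln(GDPpc) with respect to Popgr: b1 + 2 * b2 * Popgr
marginal_effect <- function(popgr) b_popgr + 2 * b_popgr2 * popgr

# The fitted parabola peaks where the marginal effect is zero
turning_point <- -b_popgr / (2 * b_popgr2)

marginal_effect(c(0, 1, 3))
turning_point
```

A negative coefficient on the squared term gives a concave curve, and a positive one gives a U shape. The turning point is only informative if it falls inside the range of observed growth rates.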
Change_in_GDP_per_capita = -0.0994262 * 3 + (-0.0083005) * 3^2
Change_in_GDP_per_capita
[1] -0.3729831
The coefficient on “Population growth (% annual)” is negative (-0.0994262), and so is the coefficient on Popgr2 (-0.0083005). Because the model includes the squared term, both coefficients are needed to evaluate the effect of population growth. At a population growth rate of 3%, predicted ln(GDP per capita) is about 0.373 lower than at zero population growth, holding all other variables constant, which corresponds to GDP per capita being roughly 31% lower.
lm(formula = GDPpc_log ~ new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]`, data = new_rawdata)
Call:
lm(formula = GDPpc_log ~ new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]`,
data = new_rawdata)
Coefficients:
(Intercept)
8.8121
new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]`
0.1167
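#The difference in the coefficient on “Alco” between the two models is due to the other variables included in the multiple regression of Q1. Alcohol consumption is positively correlated with those variables, in particular with ln(household consumption per capita) (correlation 0.356 in the correlation matrix), which is itself strongly related to ln(GDP per capita). When they are left out, the coefficient on “Alco” picks up part of their effect, which is omitted variable bias rather than multicollinearity. Including the other variables in the regression reduces this bias.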
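The gap between a simple-regression and a multiple-regression coefficient can be reproduced in a small simulation, often described as omitted variable bias. Everything below is made up for illustration: the simulated variables `consumption`, `alcohol`, and `gdp` and their coefficients are assumptions and are not estimates from WDI_2310.

```r
set.seed(1)
n <- 5000

# Simulated regressors that are positively correlated with each other
consumption <- rnorm(n)
alcohol     <- 0.5 * consumption + rnorm(n)

# Outcome: alcohol has a small true effect (0.02), consumption a large one (0.7)
gdp <- 1 + 0.7 * consumption + 0.02 * alcohol + rnorm(n, sd = 0.5)

short_fit <- lm(gdp ~ alcohol)                # consumption omitted
long_fit  <- lm(gdp ~ alcohol + consumption)  # consumption included

coef(short_fit)["alcohol"]
coef(long_fit)["alcohol"]
```

In the short regression the alcohol coefficient absorbs part of the consumption effect, so it comes out much larger than the true 0.02. Once consumption is included, the estimate moves back towards the true value.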
Linearity: the relationship between the dependent variable (GDPpc_log) and the independent variables (Consumption_pc_log, Trade (% of GDP), Total alcohol consumption per capita, Population growth (% annual)) is assumed to be linear.
Independence of errors: the errors are assumed to be independent and identically distributed.
Homoscedasticity: it is assumed that the variance of the errors is constant across all values of the independent variables.
No perfect multicollinearity: it is assumed that no independent variable is an exact linear combination of the others.
Zero conditional mean: it is assumed that the errors have a mean of zero conditional on the independent variables.
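# Residuals vs fitted values: checks linearity and the zero conditional mean assumption
plot(model$fitted.values, model$residuals, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0)
# Residuals vs alcohol consumption: checks for a pattern against a single regressor
plot(model$residuals ~ new_rawdata$`Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]`,
xlab = "Alcohol Consumption", ylab = "Residuals")
# Absolute residuals vs fitted values: checks homoscedasticity
plot(model$fitted.values, abs(model$residuals), xlab = "Fitted values", ylab = "Absolute residuals")
abline(h = mean(abs(model$residuals)), lty = 2)
# Normal Q-Q plot: checks whether the residuals are approximately normal
qqnorm(model$residuals)
qqline(model$residuals)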
cor(new_rawdata[c("GDPpc_log", "Consumption_pc_log", "Trade (% of GDP) [NE.TRD.GNFS.ZS]", "Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]", "Population growth (% annual) [SP.POP.GROW]")])
GDPpc_log
GDPpc_log 1.0000000
Consumption_pc_log 0.8294386
Trade (% of GDP) [NE.TRD.GNFS.ZS] 0.3634273
Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI] 0.4033287
Population growth (% annual) [SP.POP.GROW] -0.5031621
Consumption_pc_log
GDPpc_log 0.8294386
Consumption_pc_log 1.0000000
Trade (% of GDP) [NE.TRD.GNFS.ZS] 0.3417965
Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI] 0.3563292
Population growth (% annual) [SP.POP.GROW] -0.4674113
Trade (% of GDP) [NE.TRD.GNFS.ZS]
GDPpc_log 0.3634273
Consumption_pc_log 0.3417965
Trade (% of GDP) [NE.TRD.GNFS.ZS] 1.0000000
Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI] 0.1960705
Population growth (% annual) [SP.POP.GROW] -0.1639005
Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI]
GDPpc_log 0.4033287
Consumption_pc_log 0.3563292
Trade (% of GDP) [NE.TRD.GNFS.ZS] 0.1960705
Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI] 1.0000000
Population growth (% annual) [SP.POP.GROW] -0.3849895
Population growth (% annual) [SP.POP.GROW]
GDPpc_log -0.5031621
Consumption_pc_log -0.4674113
Trade (% of GDP) [NE.TRD.GNFS.ZS] -0.1639005
Total alcohol consumption per capita, (liters of pure alcohol, +15 years of age) [SH.ALC.PCAP.LI] -0.3849895
Population growth (% annual) [SP.POP.GROW] 1.0000000
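The pairwise correlations among the regressors can be turned into variance inflation factors (VIFs), a common summary of multicollinearity. The sketch below rebuilds the regressor block of the correlation matrix from the values printed above. The short labels "Trade", "Alcohol", and "Popgr" are shorthand introduced here, not column names in the data.

```r
# Pairwise correlations among the four regressors, copied from the cor() output
regressors <- c("Consumption_pc_log", "Trade", "Alcohol", "Popgr")
R <- matrix(
  c( 1.0000000,  0.3417965,  0.3563292, -0.4674113,
     0.3417965,  1.0000000,  0.1960705, -0.1639005,
     0.3563292,  0.1960705,  1.0000000, -0.3849895,
    -0.4674113, -0.1639005, -0.3849895,  1.0000000),
  nrow = 4, byrow = TRUE, dimnames = list(regressors, regressors)
)

# The VIF of regressor j is the j-th diagonal element of the inverse correlation matrix
vif <- diag(solve(R))

round(vif, 2)
```

A VIF of 1 means a regressor is uncorrelated with the others, and values above roughly 5 to 10 are often taken as a warning sign. Low values here would suggest that multicollinearity among these four regressors is not a serious concern.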
8.1) Comment on how the coefficient on “Alco” differs from that of Q1! (1 mark)
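In Q1 the coefficient on “Alco” was 0.0247, while in this question the coefficient on “Alco” is 0.1167. The estimated association between alcohol consumption per capita and ln(GDP per capita) is therefore almost five times larger when the other variables are left out of the model, because the simple regression attributes to alcohol part of the effect of the omitted variables it is correlated with.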