In this project, we analyze the relationship between foreign direct investment inflows and the inflation rate, gold reserves, and trade openness of 13 countries in 2020, and we examine how strong that relationship is where it exists.
library(readxl)
data <- read_excel("~/Desktop/ENS /Macroeconomics/econometrics_Independent_Work/data.xlsx")
## New names:
## • `` -> `...1`
head(data)
## # A tibble: 6 × 13
## ...1 GDP fdi inflation hci trade unemployment freedom reserves tax
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austr… 1.33e12 1.41 0.847 0.778 44.0 6.46 78 4.25e10 47.4
## 2 Argen… 3.90e11 1.21 42 0.612 30.1 11.5 50 3.94e10 106.
## 3 Azerb… 4.27e10 1.19 2.76 0.591 72.0 6.46 62 7.63e 9 40.7
## 4 Germa… 3.85e12 3.71 0.507 0.762 81.1 3.81 76 2.68e11 48.8
## 5 Spain 1.28e12 2.63 -0.323 0.734 59.8 15.5 68 8.13e10 47
## 6 France 2.63e12 0.560 0.476 0.771 57.8 8.01 66 2.24e11 60.7
## # ℹ 3 more variables: `FDI in summ` <dbl>, wage <dbl>, export <dbl>
y <- data$fdi
x1 <- data$reserves
x2 <- data$trade
x3 <- data$inflation
dat <- data.frame(y,x1,x2,x3)
head(dat)
## y x1 x2 x3
## 1 1.4083576 42544629265 44.03953 0.8469055
## 2 1.2122067 39403734630 30.14814 42.0000000
## 3 1.1879039 7633754110 72.01787 2.7598095
## 4 3.7119907 268408603349 81.10855 0.5066899
## 5 2.6325211 81287702461 59.76872 -0.3227530
## 6 0.5598317 224236417868 57.76743 0.4764989
The data were obtained from the World Bank website and refer to the year 2020.
\(y\) - foreign direct investment inflows, measured as a percentage of GDP.
\(x_1\) - gold reserves of the country (in US dollars).
\(x_2\) - trade openness of the country: the share of GDP accounted for by foreign trade, measured as a percentage of GDP.
\(x_3\) - inflation rate of the country, measured in percent.
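As an optional first look (a sketch, not part of the original output), basic summary statistics of the four working variables can be printed before plotting:
# Optional: five-number summaries and means of y, x1, x2, x3
summary(dat)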
pairs(dat, lower.panel = NULL)
Preliminary conclusions based on the scatterplot matrix:
There is a weak positive linear relationship between \(y\) and \(x_1\), a clear positive linear relationship between \(y\) and \(x_2\), and a weak negative linear relationship between \(y\) and \(x_3\).
Relationships between the predictors:
There is almost no noticeable linear relationship between \(x_1\) and \(x_2\), a negative relationship between \(x_1\) and \(x_3\), and a weak negative linear relationship between \(x_2\) and \(x_3\).
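As an optional visual aid (a sketch, not in the original workflow), the same scatterplot matrix can be drawn with lowess smoothing curves, which makes the trends described above easier to see:
# Optional: scatterplot matrix with lowess curves in the lower panels
pairs(dat, lower.panel = panel.smooth, upper.panel = NULL)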
r <- cor(dat)
round(r,2)
## y x1 x2 x3
## y 1.00 0.10 0.98 -0.16
## x1 0.10 1.00 0.02 -0.35
## x2 0.98 0.02 1.00 -0.20
## x3 -0.16 -0.35 -0.20 1.00
Conclusions based on the correlation analysis:
Between \(y\) and \(x_1\), \(r = 0.10\): a weak positive linear relationship between foreign direct investment and gold reserves. Between \(y\) and \(x_2\), \(r = 0.98\): a strong positive linear relationship between foreign direct investment and trade openness. Between \(y\) and \(x_3\), \(r = -0.16\): a weak negative linear relationship between foreign direct investment and the inflation rate.
Relationships between the predictors:
Between \(x_1\) and \(x_2\), \(r = 0.02\): almost no linear relationship. Between \(x_1\) and \(x_3\), \(r = -0.35\): a moderate negative linear relationship. Between \(x_2\) and \(x_3\), \(r = -0.20\): a weak negative linear relationship.
Among the predictors, \(x_2\) is highly positively correlated with the dependent variable \(y\), indicating that trade openness explains the volume of foreign direct investment inflows very well.
initModel <- lm(y~x1+x2+x3)
summary(initModel)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8873 -1.1105 0.1750 0.8436 2.8185
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.694e+00 8.623e-01 -5.443 0.000409 ***
## x1 2.357e-12 1.272e-12 1.853 0.096816 .
## x2 1.055e-01 5.537e-03 19.061 1.39e-08 ***
## x3 5.419e-02 4.289e-02 1.263 0.238224
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.6 on 9 degrees of freedom
## Multiple R-squared: 0.9765, Adjusted R-squared: 0.9687
## F-statistic: 124.7 on 3 and 9 DF, p-value: 1.196e-07
It is necessary to check the predictors for multicollinearity, i.e. to determine whether there is a strong relationship among them. If such a relationship exists, it may reduce the quality of the model, because correlated predictors explain overlapping variation and make the individual coefficient estimates unstable.
To check for multicollinearity, we use the Variance Inflation Factor (VIF). If a predictor's VIF is below 5, there is no problem; if it exceeds 5, multicollinearity is present. In that case, among the highly correlated predictors, the one more weakly correlated with the dependent variable \(y\) should be removed and the model re-estimated.
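For intuition, the VIF of a single predictor can also be reproduced by hand from its definition, \(VIF_j = 1/(1-R_j^2)\), where \(R_j^2\) comes from regressing that predictor on the remaining ones. The sketch below does this for \(x_1\); the result should match the value returned by car::vif() further down.
# Sketch: VIF of x1 computed from its definition
r2_x1 <- summary(lm(x1 ~ x2 + x3, data = dat))$r.squared
1 / (1 - r2_x1)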
library(car)
## Loading required package: carData
vif(initModel)
## x1 x2 x3
## 1.139016 1.042864 1.184368
Conclusion:
According to the VIF analysis, all VIF values are below 5, which means there is no multicollinearity problem among the predictors. Therefore, all three predictors can be used in the model.
model <- lm(y~x1+x2+x3); model
##
## Call:
## lm(formula = y ~ x1 + x2 + x3)
##
## Coefficients:
## (Intercept) x1 x2 x3
## -4.694e+00 2.357e-12 1.055e-01 5.419e-02
summary(model)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8873 -1.1105 0.1750 0.8436 2.8185
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.694e+00 8.623e-01 -5.443 0.000409 ***
## x1 2.357e-12 1.272e-12 1.853 0.096816 .
## x2 1.055e-01 5.537e-03 19.061 1.39e-08 ***
## x3 5.419e-02 4.289e-02 1.263 0.238224
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.6 on 9 degrees of freedom
## Multiple R-squared: 0.9765, Adjusted R-squared: 0.9687
## F-statistic: 124.7 on 3 and 9 DF, p-value: 1.196e-07
The regression model equation is as follows:
\(y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3\)
\(y = -4.694 + (2.357\times10^{-12})x_1 + 0.1055 x_2 + 0.05419 x_3\)
The economic meaning of the regression equation:
\(b_0 = -4.694\) - the average FDI level when all predictors are zero is negative.
\(b_1 = 2.357\times10^{-12}\) - holding the other factors constant, a one-dollar increase in gold reserves (\(x_1\)) raises FDI inflows by only \(2.357\times10^{-12}\) percentage points, an almost negligible positive effect.
\(b_2 = 0.1055\) - holding the other factors constant, a one-percentage-point increase in trade openness (\(x_2\)) raises FDI inflows by about 0.106 percentage points.
\(b_3 = 0.05419\) - holding the other factors constant, a one-percentage-point increase in inflation (\(x_3\)) raises FDI inflows by about 0.054 percentage points, so the partial relationship is positive (even though the simple correlation between \(y\) and \(x_3\) was slightly negative).
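To illustrate how the fitted equation is used, the sketch below plugs purely hypothetical values (50 billion dollars of reserves, trade openness of 60% of GDP, 5% inflation; these numbers are not from the data set) into the model:
# Sketch: evaluating the fitted equation at hypothetical predictor values
b <- coef(model)
b["(Intercept)"] + b["x1"] * 5e10 + b["x2"] * 60 + b["x3"] * 5
# the same fitted value via predict()
predict(model, newdata = data.frame(x1 = 5e10, x2 = 60, x3 = 5))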
To evaluate the goodness of fit of the regression model, we use the decomposition of the total variability of the dependent variable \(y\) into explained and unexplained components.
SSR, SSE, SST
Regression Sum of Squares (SSR) represents the part of the total variation explained by the model: \[SSR=\sum_{i=1}^{n} (\widehat y_i - \overline y)^2\]
Error (Residual) Sum of Squares (SSE) represents the unexplained part of the total variation: \[SSE=\sum_{i=1}^{n} (y_i - \widehat y_i)^2\]
Total Sum of Squares (SST) represents the total variation in the dependent variable: \[SST=\sum_{i=1}^{n} (y_i - \overline y)^2\]
SSE <- sum(residuals(model)^2);SSE
## [1] 23.04161
SSR <- sum((predict(model)-mean(y))^2);SSR
## [1] 957.7148
SST <- sum((mean(y)-y)^2);SST
## [1] 980.7564
#R-squared:
Rsq <- (SSR/SST);Rsq
## [1] 0.9765063
# We check whether the calculated **R-squared** value matches the value given in the **summary** output.
As we can see from the calculations above, the computed \(R^2 = 0.9765\) matches the Multiple R-squared value reported in the summary output.
Degrees of freedom: df1 and df2
n <- 13 # Sample size
k <- 3 # Number of predictors
df1 <- 3
df2 <- (n-k-1);df2
## [1] 9
Objective:
To test whether the overall regression model is statistically
significant — that is, whether at least one of the predictors has a
significant effect on the dependent variable, based on the sample
coefficient of determination.
Formulation of hypotheses:
\[H_0: \rho^2 = 0\]
\[H_1: \rho^2 \ne 0\]
Equivalently:
\[H_0:\beta_1=\beta_2=\beta_3=0\] \[H_1: \text{at least one } \beta_i \ne 0\]
We take the significance level as \(\alpha = 0.1\), i.e., a 90% confidence level.
alpha <- 0.1
\[F = \frac{\frac {SSR}{k}}{\frac {SSE}{n-k-1}} = \frac {MSR}{MSE}\]
F <- (SSR/k)/(SSE/(n-k-1)); F
## [1] 124.6938
Critical F-value
qf(1-alpha,k, n-k-1)
## [1] 2.812863
qt(1-alpha/2, n-k-1) # two-sided t critical value at alpha = 0.1 for the coefficient t-tests
## [1] 1.833113
\[F=124.6938 > F_{cr} = 2.813\]
Conclusion:
Since \(F = 124.6938 > F_{crit} =
2.813\), we reject the null hypothesis at the 10% significance
level. This indicates that the model is statistically significant, and
foreign direct investment inflows can be explained by changes in gold
reserves, trade openness, and inflation rate. With 90% confidence, the
model can be considered suitable for prediction.
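As an optional cross-check (not in the original workflow), the p-value implied by the computed F statistic can be obtained directly with pf(); it should agree with the p-value reported in the summary output (1.196e-07).
# Sketch: p-value of the observed F statistic under H0
pf(F, df1 = k, df2 = n - k - 1, lower.tail = FALSE)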
summary(model)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8873 -1.1105 0.1750 0.8436 2.8185
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.694e+00 8.623e-01 -5.443 0.000409 ***
## x1 2.357e-12 1.272e-12 1.853 0.096816 .
## x2 1.055e-01 5.537e-03 19.061 1.39e-08 ***
## x3 5.419e-02 4.289e-02 1.263 0.238224
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.6 on 9 degrees of freedom
## Multiple R-squared: 0.9765, Adjusted R-squared: 0.9687
## F-statistic: 124.7 on 3 and 9 DF, p-value: 1.196e-07
Formulation of hypotheses (for each coefficient individually):
\[H_0:\beta_i=0\] \[H_1: \beta_i \ne 0\]
From the summary we can see that for the predictors \(x_1\) and \(x_2\) the \(p\text{-value} < \alpha = 0.1\), so each of them has a statistically significant relationship with \(y\). For \(x_3\), however, the \(p\text{-value} = 0.24 > \alpha = 0.1\), indicating that the inflation rate is not significantly related to foreign direct investment at the 90% confidence level (it could only be retained at a significance level above 0.24), and including it does not meaningfully improve the model. Therefore, we exclude the variable \(x_3\) and rebuild the model.
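An equivalent way to justify dropping \(x_3\) (a sketch, not part of the original analysis) is a partial F-test comparing the models with and without it; when a single term is dropped, its p-value coincides with the t-test p-value for \(x_3\) in the full model (about 0.238).
# Sketch: partial F-test for excluding x3 from the full model
anova(lm(y ~ x1 + x2), model)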
model2 <- lm(y~x1+x2); model2
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Coefficients:
## (Intercept) x1 x2
## -4.070e+00 1.797e-12 1.041e-01
summary(model2)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2558 -1.1520 0.3321 0.8156 2.6143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.070e+00 7.277e-01 -5.593 0.00023 ***
## x1 1.797e-12 1.227e-12 1.465 0.17375
## x2 1.041e-01 5.583e-03 18.653 4.24e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.647 on 10 degrees of freedom
## Multiple R-squared: 0.9723, Adjusted R-squared: 0.9668
## F-statistic: 175.8 on 2 and 10 DF, p-value: 1.619e-08
New regression model equation:
\(y = -4.070 + (1.797\times10^{-12})x_1 + 0.1041 x_2\)
The F-test checks the overall linear relationship; the t-test checks each predictor individually.
t-test
The variable \(x_2\) passes the t-test, since its \(p\text{-value} = 4.24 \times 10^{-9} < 0.1\), meaning that \(x_2\) has a statistically significant effect on \(y\). For \(x_1\), the \(p\text{-value} = 0.174\) exceeds 0.1 but is below 0.2, so \(x_1\) is retained only at the more lenient significance level \(\alpha = 0.2\) used below.
F-test
\[H_0: \rho^2 = 0\]
\[H_1: \rho^2 \ne 0\]
n2 <- 13 # Sample size
k2 <- 2 # Number of predictors
alpha2 <- 0.2
SSE2 <- sum(residuals(model2)^2); SSE # note: this echoes SSE of the full model; SSE2 is stored for use below
## [1] 23.04161
SSR2 <- sum((predict(model2)-mean(y))^2); SSR # likewise echoes SSR of the full model
## [1] 957.7148
SST2 <- sum((mean(y)-y)^2); SST # SST is unchanged, since it depends only on y
## [1] 980.7564
#R-squared:
Rsq2 <- (SSR2/SST2);Rsq2
## [1] 0.9723403
Fst2 <- ((SSR2/k2)/(SSE2/(n2-k2-1)));Fst2
## [1] 175.7681
#F critical
Fcr2 <- qf(1-alpha2,k2,n2-k2-1);Fcr2
## [1] 1.898648
\(F_{st2} = 175.77 > F_{cr2} = 1.899\), so we reject the null hypothesis \(H_0\) in favour of \(H_1\): the model is statistically significant overall. The Multiple R-squared is 97.2% and the Adjusted R-squared is 96.7%, which indicates that the model explains a large proportion of the variation in \(y\), although there may still be room to improve the model slightly by adding further explanatory variables.
Conclusion:
To keep \(x_1\) in the model and avoid reducing it to a simple regression on \(x_2\) alone, we set the significance level to 0.2, since the p-value of \(x_1\) (0.174) exceeds 0.1. Under these conditions, we proceed at the 80% confidence level.
\(S_e = 1.647\) — the residual standard error, which measures the dispersion of the observed values around the fitted regression. It shows, on average, how far the observed values deviate from the values predicted by the model. This statistic is also used when comparing models: a smaller \(S_e\) indicates a better fit.
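As a sketch of where this number comes from, \(S_e\) can be reproduced from its definition, \(S_e = \sqrt{SSE/(n-k-1)}\), using the quantities already computed for the second model:
# Sketch: residual standard error from its definition; should match the 1.647
# reported by summary(model2)
sqrt(SSE2 / (n2 - k2 - 1))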
confint(model2, level=0.8)
## 10 % 90 %
## (Intercept) -5.068539e+00 -3.071411e+00
## x1 1.133745e-13 3.480490e-12
## x2 9.647903e-02 1.118012e-01
summary(model2)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2558 -1.1520 0.3321 0.8156 2.6143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.070e+00 7.277e-01 -5.593 0.00023 ***
## x1 1.797e-12 1.227e-12 1.465 0.17375
## x2 1.041e-01 5.583e-03 18.653 4.24e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.647 on 10 degrees of freedom
## Multiple R-squared: 0.9723, Adjusted R-squared: 0.9668
## F-statistic: 175.8 on 2 and 10 DF, p-value: 1.619e-08
Conclusion:
With 80% confidence, \(b_0\) lies between −5.068539 and −3.071411, \(b_1\) between 1.133745 × 10⁻¹³ and 3.480490 × 10⁻¹², and \(b_2\) between 0.09647903 and 0.1118012.
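For illustration, one of these intervals can be reproduced by hand from the usual formula, estimate \(\pm\ t_{0.90,\,10}\) × standard error (a sketch; it should match the confint() row for \(x_2\)):
# Sketch: 80% confidence interval for b2 computed manually
est <- summary(model2)$coefficients["x2", "Estimate"]
se <- summary(model2)$coefficients["x2", "Std. Error"]
est + c(-1, 1) * qt(0.90, df = 10) * se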
Histogram
If the histogram of the residuals has a roughly bell-shaped form, the residuals can be considered approximately normally distributed.
hist(residuals(model2)) # histogram of the residuals of model2
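An optional variant of this plot (a sketch, not in the original output) rescales the histogram to densities and overlays a normal curve with the residuals' own mean and standard deviation, which makes the comparison with normality more direct:
# Sketch: density-scale histogram of the residuals with a normal curve overlaid
res2 <- residuals(model2)
hist(res2, freq = FALSE, main = "Histogram of residuals", xlab = "Residuals")
curve(dnorm(x, mean = mean(res2), sd = sd(res2)), add = TRUE)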
Conclusion:
From the histogram, the residuals do not form a perfectly bell-shaped
distribution, but they are centered around zero with few large
deviations. Therefore, the residuals can be considered approximately
normally distributed, though not perfectly normal.
Q-Q Plot
The histogram and QQ plot are visual methods for assessing whether the residuals follow a normal distribution. If most of the points on the QQ plot lie on or very close to the reference line, the residuals can be considered approximately normally distributed.
Bmodel <- lm(y~x1+x2)
fit<-predict(Bmodel); fit
## 1 2 3 4 5 6
## 0.592756842 -0.859537751 3.443692531 4.858991830 2.300416215 2.348870196
## 7 8 9 10 11 12
## 33.339077564 1.694914631 2.465249419 -0.499536676 1.788863014 -0.005744327
## 13
## 2.452479720
res<-residuals(Bmodel); res
## 1 2 3 4 5 6 7
## 0.8156007 2.0717445 -2.2557887 -1.1470011 0.3321049 -1.7890385 0.7167702
## 8 9 10 11 12 13
## -0.4744065 -1.3775423 1.2122476 -1.1519840 2.6142750 0.4330180
qqnorm(res)
qqline(res)
Conclusion:
The residuals are distributed approximately along the straight line,
indicating that they can be considered approximately normally
distributed.
All plots
plot(Bmodel)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
Conclusion:
The residuals mostly lie within acceptable limits, but observation 7
shows high leverage and could influence the model. Overall, the model
seems stable, but influential points should be checked.
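To check this numerically rather than only visually (a sketch, not part of the original analysis), the leverage (hat) values and Cook's distances of the observations can be inspected; a clearly larger value for observation 7 would confirm the impression from the plots.
# Sketch: influence measures for each observation
round(hatvalues(Bmodel), 3)
round(cooks.distance(Bmodel), 3)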
The main purpose of conducting the Durbin–Watson test is to determine whether there is autocorrelation among the residuals.
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
#dwtest
dwtest(Bmodel) #Test for independence of residuals
##
## Durbin-Watson test
##
## data: Bmodel
## DW = 2.5058, p-value = 0.8378
## alternative hypothesis: true autocorrelation is greater than 0
Conclusion:
Based on the Durbin–Watson test results, DW = 2.5058 and \(p\text{-value} = 0.8378 > \alpha = 0.2\), so there is no evidence of autocorrelation among the residuals. A DW value near 2 indicates independence of the residuals, and values in the range of 1.5 to 2.5 are typically acceptable in practice.
Purpose of the test
The Jarque–Bera test is performed using the fBasics package and checks whether the skewness and kurtosis of the residuals conform to those of a normal distribution. The null hypothesis (\(H_0\)) of the Jarque–Bera test states that the residuals are normally distributed, i.e. their skewness is zero and their excess kurtosis is zero. If the \(p\text{-value} > \alpha\), we fail to reject \(H_0\), and we can therefore conclude that the residuals are normally distributed.
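For intuition, the quantities behind the test can be reproduced by hand (a sketch): sample skewness, excess kurtosis, and the Jarque–Bera statistic \(JB = \frac{n}{6}\left(S^2 + \frac{K^2}{4}\right)\), which should come out close to the 0.64 reported below.
# Sketch: skewness, excess kurtosis and the JB statistic from their definitions
n_res <- length(res)
m2 <- mean((res - mean(res))^2)
skew <- mean((res - mean(res))^3) / m2^(3/2)
exkurt <- mean((res - mean(res))^4) / m2^2 - 3
c(skewness = skew, excess_kurtosis = exkurt, JB = n_res/6 * (skew^2 + exkurt^2/4))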
library(fBasics)
##
## Attaching package: 'fBasics'
## The following object is masked from 'package:car':
##
## densityPlot
jarqueberaTest(res)
##
## Title:
## Jarque-Bera Normality Test
##
## Test Results:
## STATISTIC:
## X-squared: 0.64
## P VALUE:
## Asymptotic p Value: 0.7262
Conclusion:
\[p\text{-value} = 0.7262 > \alpha = 0.2\]
This means we fail to reject the null hypothesis \(H_0\): the residuals can be considered normally distributed.
Based on the collected data, the effects of gold reserves, trade openness, and the inflation rate on foreign direct investment (FDI) inflows (as a share of GDP) were analyzed. Initially, three predictors were included in the model; according to the t-test results, the inflation rate (\(x_3\)) was statistically insignificant and was therefore excluded from the final model. Of the remaining variables, trade openness is highly significant, while gold reserves are significant only at the 80% confidence level. The analysis shows that trade openness is the strongest determinant of FDI inflows, and the final model was found to be valid and suitable for forecasting purposes.
Economic implication: To enhance the inflow of foreign direct investment, countries should focus primarily on increasing their level of trade openness. Additionally, expanding national gold reserves can further strengthen investor confidence and contribute to greater investment inflows.