In this project we built a panel data set of Azerbaijan's fuel exports to its partner countries over the years 2015 to 2020.
Understanding how different independent variables interact with growth helps to explain why international trade plays a crucial role in driving economic development.
A country’s economy becomes more productive as the share of imports and exports increases, since trade allows for specialization, access to diverse markets, and the transfer of knowledge, resources, and technology, leading to overall economic growth and efficiency gains. That’s why we decided to analyze this case.
Our analysis can help countries focus on their import and export volumes, which can support the development of their economies.
Our project has a main and a secondary hypothesis.
Our main hypothesis is:
H0: The independent variables have a positive impact on fuel exports.
Our secondary hypothesis is:
Ha: The independent variables have no impact or a negative impact on fuel exports.
Our data come from the World Integrated Trade Solution (WITS) website. The table below describes the variables used in our analysis.
| Variable | Description |
|---|---|
| Export.US.Thousand | Value of exports, in thousands of US dollars |
| Import.US.Thousand | Value of imports, in thousands of US dollars |
| AHS.Total.Tariff.Lines | Total number of tariff lines (AHS) |
| World.Growth | Growth at the world level, used as a proxy for the popularity of fuels |
| Country.Growth | Growth at the country level, used as a proxy for the popularity of fuels |
First, we need to install all the required packages that we will use in this model.
Our dataset has one dependent and four independent variables. The dependent variable is Export.US.Thousand, and the independent variables are Import.US.Thousand, AHS.Total.Tariff.Lines, World.Growth and Country.Growth. Our estimation equation is:
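$$\text{Export.US.Thousand}_{it} = \beta_0 + \beta_1\,\text{Import.US.Thousand}_{it} + \beta_2\,\text{AHS.Total.Tariff.Lines}_{it} + \beta_3\,\text{World.Growth}_{it} + \beta_4\,\text{Country.Growth}_{it} + \varepsilon_{it}$$
where $i$ indexes partner countries and $t$ years.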
Before the main econometric analysis, we look at the descriptive statistics of the dependent and independent variables. They can be interpreted as follows:
Independent variables
Import.US.Thousand: The minimum value is 0, so some observations have no imports. The maximum is 549,762.4 thousand US dollars. The mean of 12,442.7 is far above the median of 371.8, so the distribution is strongly right-skewed. There are no missing values for this variable.
AHS.Total.Tariff.Lines: The minimum value is 1, so every recorded observation has at least one tariff line. The maximum value is 787 tariff lines, and the mean is 25.2. There are 236 missing values for this variable.
World.Growth: The minimum value is -29.14, indicating a large decrease in growth at the global level. The maximum value is 90.11, representing a substantial increase. The mean is -3.157 and the median is -7.010, suggesting an overall decline in growth. There are 191 missing values for this variable.
Country.Growth: The minimum value is -99.42, indicating a large decrease in growth at the country level. The maximum value, 152,881.96, is an extreme outlier. The mean (1,405.57) is pulled up by such extreme values and lies far above the median (-9.81), which better describes a typical observation. There are 191 missing values for this variable.
summary(cbind(Import.US.Thousand, AHS.Total.Tariff.Lines, World.Growth, Country.Growth))
## Import.US.Thousand AHS.Total.Tariff.Lines World.Growth
## Min. : 0.0 Min. : 1.0 Min. :-29.140
## 1st Qu.: 18.0 1st Qu.: 2.0 1st Qu.:-16.023
## Median : 371.8 Median : 13.0 Median : -7.010
## Mean : 12442.7 Mean : 25.2 Mean : -3.157
## 3rd Qu.: 2128.7 3rd Qu.: 26.0 3rd Qu.: 11.273
## Max. :549762.4 Max. :787.0 Max. : 90.110
## NA's :236 NA's :191
## Country.Growth
## Min. : -99.42
## 1st Qu.: -27.51
## Median : -9.81
## Mean : 1405.57
## 3rd Qu.: 17.57
## Max. :152881.96
## NA's :191
Dependent variable
Export.US.Thousand has no missing values and ranges from 0 to 12,681.7 thousand US dollars. At least a quarter of the observations are zero, the median is 7.8 and the mean is 378.7, so the distribution is strongly right-skewed, with a few very large export values.
summary(Export.US.Thousand)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 7.8 378.7 310.0 12681.7
Earlier, we interpreted the dependent and independent variables. Now we estimate our model with the fixed effects (within) estimator. The first lines of the output show whether the panel is balanced: here the output reports an unbalanced panel with n = 22 partner countries, T = 1-3 years per partner and N = 51 observations, so we do not have data for every partner in every year. We focus on the coefficients and the lower block of the output, and we use a 5% significance level.
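To make the fixed effects estimator concrete, here is a minimal sketch in R using only base R and simulated data rather than the project's panel_data; the names sim and demean are hypothetical. It subtracts each partner's own mean from every variable, regresses the demeaned outcome on the demeaned regressor, and checks that this gives the same slope as OLS with one dummy per partner (LSDV). This is the idea behind plm(..., model = "within"), but the sketch is not plm's code.

```r
# Minimal sketch of the within (fixed effects) estimator in base R.
# Simulated data; `sim` and `demean` are hypothetical names.
set.seed(1)
n_partner <- 5
n_period  <- 3
sim <- data.frame(partner = factor(rep(paste0("P", seq_len(n_partner)), each = n_period)))
alpha <- rep(rnorm(n_partner, sd = 10), each = n_period)  # partner-specific effects
sim$x <- rnorm(nrow(sim)) + alpha / 10                      # regressor correlated with the effects
sim$y <- 2 * sim$x + alpha + rnorm(nrow(sim))

# Within transformation: subtract each partner's own mean
demean <- function(v, g) v - ave(v, g)
y_w <- demean(sim$y, sim$partner)
x_w <- demean(sim$x, sim$partner)

# Slope from the demeaned data (no intercept is needed after demeaning)
beta_within <- unname(coef(lm(y_w ~ x_w - 1)))

# Same slope from LSDV: OLS with one dummy per partner
beta_lsdv <- unname(coef(lm(y ~ x + partner, data = sim))["x"])

# Pooled OLS ignores the partner effects and can be biased here,
# because x is correlated with them
beta_pooled <- unname(coef(lm(y ~ x, data = sim))["x"])
print(c(within = beta_within, lsdv = beta_lsdv, pooled = beta_pooled))
```

The within and LSDV slopes agree to numerical precision; only the pooled slope is affected by the omitted partner effects.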
According to the fixed effects model, when imports increase by 1 million US dollars (1,000 thousand), exports decrease by about 0.45 thousand US dollars (about $445).
When the number of tariff lines increases by 1, exports decrease by about 3.14 thousand US dollars.
When the popularity of fuel in the world (World.Growth) increases by 1 unit, exports increase by about 6.99 thousand US dollars.
When the popularity of fuel in the country (Country.Growth) increases by 1 unit, exports increase by about 0.53 thousand US dollars.
None of these coefficients is statistically significant at the 5% level (all p-values are above 0.40), and the F-statistic (p-value 0.79067) shows that the regressors are not jointly significant.
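Because the trade values are measured in thousands of US dollars, a coefficient has to be scaled before it can be read in dollars. The short R sketch below copies the fixed effects estimates from the output and performs that conversion; it assumes that a "1 unit" change in the growth variables is one point on their own scale, which the data description does not spell out.

```r
# Fixed effects coefficients copied from the output
# (dependent variable in thousands of US dollars).
b_fe <- c(Import.US.Thousand     = -0.00044543,
          AHS.Total.Tariff.Lines = -3.13942582,
          World.Growth           =  6.98630450,
          Country.Growth         =  0.53347243)

# Change in each regressor to interpret:
# imports +1,000 thousand USD (= +1 million USD), +1 tariff line,
# +1 unit of each growth variable (assumed scale).
delta <- c(1000, 1, 1, 1)

effect_thousand_usd <- b_fe * delta          # effect on exports, thousand USD
effect_usd <- effect_thousand_usd * 1000     # same effect in US dollars
print(round(effect_usd))
```

For example, the import coefficient implies that 1 million US dollars of additional imports goes with about 445 US dollars less in exports.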
The first graph shows the partners with the highest total exports from Azerbaijan, and the second graph shows the opposite. The third graph shows the correlations between the main numeric variables: no pair of them is strongly correlated, so any combination of these variables can be used as regressors without multicollinearity concerns, as long as missing values are not an issue.
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +
## World.Growth + Country.Growth, data = panel_data, model = "within")
##
## Unbalanced Panel: n = 22, T = 1-3, N = 51
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1712.875 -93.744 0.000 130.780 1549.279
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## Import.US.Thousand -0.00044543 0.00229100 -0.1944 0.8474
## AHS.Total.Tariff.Lines -3.13942582 8.18135942 -0.3837 0.7044
## World.Growth 6.98630450 8.20477105 0.8515 0.4026
## Country.Growth 0.53347243 0.69954788 0.7626 0.4528
##
## Total Sum of Squares: 7992600
## Residual Sum of Squares: 7486200
## R-Squared: 0.06336
## Adj. R-Squared: -0.87328
## F-statistic: 0.422785 on 4 and 25 DF, p-value: 0.79067
The fixef() function of plm estimates the individual effects, here one for each of Azerbaijan's partner countries. With the default type = "level", these are partner-specific intercepts; only when they are expressed as deviations from their mean (type = "dmean") does a negative value mean that exports to that partner are below the average and a positive value that they are above it. In our output, all effects are positive except those of Belarus (-47.8) and the United Arab Emirates (-328.8); the largest are Italy (4,144.6) and Turkey (2,273.4).
## Austria Belarus Belgium
## 447.965 -47.828 198.824
## Canada China France
## 331.256 508.039 656.976
## Georgia Germany Greece
## 539.822 984.844 364.776
## India Italy Kazakhstan
## 867.348 4144.648 223.739
## Netherlands Romania Russian Federation
## 209.719 228.267 295.909
## Spain Switzerland Turkey
## 660.322 235.032 2273.361
## Ukraine United Arab Emirates United Kingdom
## 345.896 -328.771 419.754
## United States
## 555.228
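The level effects are the partner-specific intercepts: after the within regression, each equals the partner's mean of the dependent variable minus the slope times the partner's mean of the regressor. The base-R sketch below uses simulated data and hypothetical names; it checks this against an LSDV regression without a common intercept and also builds a centred version, similar in spirit to fixef(type = "dmean") (plm's exact centring for unbalanced panels may differ from this simple average).

```r
# Sketch: recovering partner-specific effects after the within regression.
# Simulated data, not the project's data.
set.seed(2)
sim <- data.frame(partner = factor(rep(c("A", "B", "C", "D"), each = 3)))
alpha_true <- c(A = 50, B = -20, C = 5, D = 0)
sim$x <- rnorm(nrow(sim))
sim$y <- 1.5 * sim$x + unname(alpha_true[as.character(sim$partner)]) +
  rnorm(nrow(sim), sd = 0.5)

# Within (demeaned) slope estimate
x_w <- sim$x - ave(sim$x, sim$partner)
y_w <- sim$y - ave(sim$y, sim$partner)
beta <- sum(x_w * y_w) / sum(x_w^2)

# Level effects: alpha_i = mean(y_i) - beta * mean(x_i), one per partner
alpha_level <- tapply(sim$y, sim$partner, mean) - beta * tapply(sim$x, sim$partner, mean)

# The same numbers from LSDV with one dummy per partner and no common intercept
alpha_lsdv <- coef(lm(y ~ x + partner - 1, data = sim))[paste0("partner", levels(sim$partner))]

# Centred effects: deviations from the simple average of the effects
alpha_centred <- alpha_level - mean(alpha_level)
print(round(alpha_level, 2))
print(round(alpha_centred, 2))
```

Only the centred version has the "above or below average" reading; the level version can be positive for every partner.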
We may also estimate a simple regression model for the panel data, often called pooled OLS (POLS). Estimating POLS is usually not recommended. First, its estimates may be biased and inconsistent if the individual effects are correlated with the regressors. Second, its errors may suffer from autocorrelation and heteroskedasticity. The simple regression model is appropriate only when the panel model reduces to it, that is, when the individual effects disappear from the model. To check this, we test whether the individual effects are jointly significant.
Here, H0: all individual effects are statistically insignificant. In the output, the p-value (2.319e-06) is far below 5%, so we reject the null hypothesis. Our model therefore does not reduce to the simple regression model.
In this case it is better to use fixed effects estimator for the panel data model instead of the simple regression model.
##
## F test for individual effects
##
## data: Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + ...
## F = 7.5883, df1 = 21, df2 = 25, p-value = 2.319e-06
## alternative hypothesis: significant effects
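The F test for individual effects compares pooled OLS (all partner intercepts equal) with the fixed effects model (one intercept per partner). A minimal base-R sketch of that nested comparison on simulated data is below; it illustrates the test and is not the code behind the reported output. The degrees of freedom follow df1 = n - 1 and df2 = N - n - K, which for our panel gives 22 - 1 = 21 and 51 - 22 - 4 = 25, matching the reported test.

```r
# Sketch: F test for individual effects as a nested-model comparison.
# Simulated data; names are hypothetical.
set.seed(3)
n_partner <- 6
n_period  <- 3
sim <- data.frame(partner = factor(rep(seq_len(n_partner), each = n_period)))
sim$x <- rnorm(nrow(sim))
sim$y <- 0.5 * sim$x + rep(rnorm(n_partner, sd = 3), each = n_period) + rnorm(nrow(sim))

pooled <- lm(y ~ x, data = sim)            # restricted: no individual effects
lsdv   <- lm(y ~ x + partner, data = sim)  # unrestricted: one dummy per partner
ftest  <- anova(pooled, lsdv)

df1 <- ftest$Df[2]      # n - 1 = 5 restrictions
df2 <- ftest$Res.Df[2]  # N - n - K = 18 - 6 - 1 = 11
F_manual <- ((deviance(pooled) - deviance(lsdv)) / df1) / (deviance(lsdv) / df2)
p_value <- ftest[["Pr(>F)"]][2]
print(ftest)
```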
We can check for serial correlation in the residuals with the Breusch-Godfrey/Wooldridge test for panel models. Here, H0: there is no serial correlation in the idiosyncratic errors. The p-value (0.01175) is below 5%, so we reject H0: there is autocorrelation in our residuals, which is a problem for inference in our panel data model.
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + ...
## chisq = 6.3477, df = 1, p-value = 0.01175
## alternative hypothesis: serial correlation in idiosyncratic errors
Next, we check for heteroskedasticity in the residuals with the studentized Breusch-Pagan test. H0: the residuals are homoskedastic. The p-value (0.9787) is far above 5%, so we fail to reject H0: there is no evidence of heteroskedasticity.
##
## studentized Breusch-Pagan test
##
## data: Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + World.Growth + Country.Growth
## BP = 0.44404, df = 4, p-value = 0.9787
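The studentized (Koenker) Breusch-Pagan statistic is N times the R-squared of an auxiliary regression of the squared OLS residuals on the regressors, compared with a chi-squared distribution with as many degrees of freedom as regressors; lmtest's bptest() uses this form by default (studentize = TRUE). The output's data line suggests bptest() was called on the model formula, in which case it works with pooled OLS residuals. The base-R sketch below uses simulated data and also reproduces the reported p-value from BP = 0.44404 on 4 df.

```r
# Sketch: studentized Breusch-Pagan statistic, LM = N * R^2 ~ chi^2(k).
# Simulated data; names are hypothetical.
set.seed(4)
N <- 51
dat <- data.frame(x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N), x4 = rnorm(N))
dat$y <- 1 + 0.5 * dat$x1 - 0.2 * dat$x2 + rnorm(N)

fit <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
dat$u2 <- residuals(fit)^2                       # squared OLS residuals
aux <- lm(u2 ~ x1 + x2 + x3 + x4, data = dat)    # auxiliary regression
bp_stat <- N * summary(aux)$r.squared            # studentized LM statistic
bp_p <- pchisq(bp_stat, df = 4, lower.tail = FALSE)

# The reported statistic BP = 0.44404 on 4 df gives the reported p-value
p_reported <- pchisq(0.44404, df = 4, lower.tail = FALSE)
print(c(statistic = bp_stat, p.value = bp_p, reported_p = p_reported))
```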
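How can we cope with this problem?
We apply a robust variance-covariance matrix estimator and report the coefficients with the coeftest() function.
In the output below, the standard errors, t-statistics and p-values are robust.
Earlier, because of the autocorrelation in the residuals (the Breusch-Pagan test did not indicate heteroskedasticity), we could not trust the standard errors, t-statistics and p-values. The robust standard errors address this problem, so inference is more reliable. Even so, none of the coefficients is significant at the 5% level (the smallest p-value is 0.1290).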
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## Import.US.Thousand -0.00044543 0.00032479 -1.3714 0.1824
## AHS.Total.Tariff.Lines -3.13942582 5.42225431 -0.5790 0.5678
## World.Growth 6.98630450 4.52336281 1.5445 0.1350
## Country.Growth 0.53347243 0.33980244 1.5699 0.1290
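A robust covariance estimator replaces the usual variance formula with one that allows the errors of each partner to be correlated over time. The base-R sketch below computes an Arellano-type (cluster-by-partner) covariance matrix for the within estimator on simulated data, V = (X'X)^-1 [sum over partners of X_i' u_i u_i' X_i] (X'X)^-1, without any small-sample adjustment. It illustrates the idea only; we assume, but do not know, that the reported table used a comparable estimator such as plm's vcovHC().

```r
# Sketch: Arellano (cluster-by-partner) robust covariance for the within estimator.
# Simulated data; names are hypothetical.
set.seed(5)
n_partner <- 20
n_period  <- 3
sim <- data.frame(partner = factor(rep(seq_len(n_partner), each = n_period)))
sim$x1 <- rnorm(nrow(sim))
sim$x2 <- rnorm(nrow(sim))
sim$y  <- sim$x1 - 0.5 * sim$x2 + rep(rnorm(n_partner), each = n_period) + rnorm(nrow(sim))

# Within transformation by partner
demean <- function(v) v - ave(v, sim$partner)
X <- cbind(x1 = demean(sim$x1), x2 = demean(sim$x2))
y <- demean(sim$y)

XtX_inv <- solve(crossprod(X))
beta <- XtX_inv %*% crossprod(X, y)
u <- as.numeric(y - X %*% beta)             # within residuals

# "Meat": sum over partners of (X_i' u_i)(X_i' u_i)'
meat <- matrix(0, ncol(X), ncol(X))
for (g in levels(sim$partner)) {
  idx <- sim$partner == g
  s <- crossprod(X[idx, , drop = FALSE], u[idx])
  meat <- meat + s %*% t(s)
}

V_robust  <- XtX_inv %*% meat %*% XtX_inv   # sandwich covariance
se_robust <- sqrt(diag(V_robust))
t_robust  <- as.numeric(beta) / se_robust
print(cbind(estimate = as.numeric(beta), robust_se = se_robust, t = t_robust))
```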
The random effects output looks a bit different. Because it makes little sense to interpret the individual effects of a random effects model one by one, the output reports aggregate characteristics of these effects instead. The Effects block shows the variance of the idiosyncratic error (the residuals) and of the individual effects, together with each component's share of the total error variance; here the individual effects account for 76.9% of it. Theta is the quasi-demeaning parameter of the random effects transformation: a value close to 0 means the estimator behaves like pooled OLS, and a value close to 1 means it behaves like the fixed effects estimator. Because the panel is unbalanced, theta differs across partners (from 0.5195 to 0.6984 here). We do not analyze these quantities in depth and focus instead on the coefficients, the R-squared and the test of joint insignificance of the model.
How do we interpret the results?
The dependent variable is Export.US.Thousand. When imports increase by 1 million US dollars (1,000 thousand), exports decrease by about 0.59 thousand US dollars.
When the number of tariff lines increases by 1, exports decrease by about 2.79 thousand US dollars.
When the popularity of fuel in the world increases by 1 unit, exports increase by about 4.64 thousand US dollars.
When the popularity of fuel in the country increases by 1 unit, exports increase by about 0.30 thousand US dollars.
None of these coefficients is significant at the 5% level (all p-values are above 0.49). The test of joint significance has a p-value of 0.88967, above 5%, so we fail to reject the hypothesis that all slope coefficients are jointly zero: the variables in our model are not jointly statistically significant.
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +
## World.Growth + Country.Growth, data = panel_data, model = "random")
##
## Unbalanced Panel: n = 22, T = 1-3, N = 51
##
## Effects:
## var std.dev share
## idiosyncratic 299446.5 547.2 0.231
## individual 997626.5 998.8 0.769
## theta:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5195 0.6388 0.6984 0.6680 0.6984 0.6984
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -674.0 -186.5 -103.6 14.0 24.5 2619.3
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 6.2332e+02 2.5624e+02 2.4325 0.01499 *
## Import.US.Thousand -5.8786e-04 2.0477e-03 -0.2871 0.77405
## AHS.Total.Tariff.Lines -2.7869e+00 6.9273e+00 -0.4023 0.68746
## World.Growth 4.6405e+00 6.7802e+00 0.6844 0.49371
## Country.Growth 2.9578e-01 5.8803e-01 0.5030 0.61497
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 12347000
## Residual Sum of Squares: 12265000
## R-Squared: 0.010919
## Adj. R-Squared: -0.075088
## Chisq: 1.12885 on 4 DF, p-value: 0.88967
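Assuming the standard quasi-demeaning formula theta_i = 1 - sqrt(sigma2_eps / (sigma2_eps + T_i * sigma2_alpha)), where T_i is the number of years observed for partner i, the variance components in the output reproduce the reported theta values: T_i = 1, 2 and 3 give the minimum (0.5195), the first quartile (0.6388) and the maximum (0.6984). The short R sketch below performs this arithmetic.

```r
# Variance components copied from the random effects output
sigma2_eps   <- 299446.5   # idiosyncratic error variance
sigma2_alpha <- 997626.5   # individual effect variance

# Share of the individual component in the total error variance
share_individual <- sigma2_alpha / (sigma2_alpha + sigma2_eps)

# Quasi-demeaning parameter for a partner observed T_i years
theta <- function(T_i) 1 - sqrt(sigma2_eps / (sigma2_eps + T_i * sigma2_alpha))

print(round(share_individual, 3))
print(round(theta(1:3), 4))
```

Partners observed for more years get a theta closer to 1, so their data are demeaned more heavily, closer to the fixed effects transformation.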
Next, we can estimate a two-way panel data model. In the model fixed.time, we add a dummy variable for each year (Year enters the formula) on top of the individual fixed effects. This extends our approach with time-specific effects: we assume that something happened in specific years that affected all partner countries.
The results report the estimates of the slope parameters and of the year dummies (Year2019 and Year2020, relative to the base year). The 2019 dummy is significant only at the 10% level (p = 0.0523).
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +
## World.Growth + Country.Growth + Year, data = panel_data,
## model = "within", index = c("Partner.Name", "Year"))
##
## Unbalanced Panel: n = 22, T = 1-3, N = 51
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1469.96 -114.57 0.00 118.78 1185.71
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## Import.US.Thousand -1.6862e-03 2.2350e-03 -0.7545 0.4582
## AHS.Total.Tariff.Lines 2.5435e+01 2.0640e+01 1.2323 0.2303
## World.Growth -4.7702e+00 9.5763e+00 -0.4981 0.6231
## Country.Growth 7.6717e-02 6.8597e-01 0.1118 0.9119
## Year2019 1.0738e+03 5.2468e+02 2.0466 0.0523 .
## Year2020 6.9673e+02 5.2837e+02 1.3186 0.2003
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 7992600
## Residual Sum of Squares: 6072000
## R-Squared: 0.2403
## Adj. R-Squared: -0.65153
## F-statistic: 1.2125 on 6 and 23 DF, p-value: 0.33557
We can also estimate the pooled regression model with the plm() function by setting model = "pooling".
## Pooling Model
##
## Call:
## plm(formula = Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +
## World.Growth + Country.Growth, data = panel_data, model = "pooling",
## index = c("Partner.Name", "Year"))
##
## Unbalanced Panel: n = 22, T = 1-3, N = 51
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -747.416 -518.091 -368.470 47.916 5007.423
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 528.6155333 265.1134265 1.9939 0.05211 .
## Import.US.Thousand -0.0023175 0.0032216 -0.7193 0.47557
## AHS.Total.Tariff.Lines 4.1548176 9.8475287 0.4219 0.67505
## World.Growth -0.3268138 9.7717648 -0.0334 0.97346
## Country.Growth -0.2954591 0.8818187 -0.3351 0.73911
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 56101000
## Residual Sum of Squares: 55205000
## R-Squared: 0.015976
## Adj. R-Squared: -0.069591
## F-statistic: 0.186705 on 4 and 46 DF, p-value: 0.94415
Since we estimated both fixed and random effects models, we need to decide which one is more appropriate for our case. To decide, we use the Hausman test. H0: the individual effects are not correlated with the regressors (both estimators are consistent and random effects is efficient, so we should use the random effects estimator). H1: the individual effects are correlated with the regressors (only fixed effects is consistent, so we should use fixed effects).
The p-value (0.942) is far above 5%, so we fail to reject the null hypothesis: we prefer random effects over fixed effects.
##
## Hausman Test
##
## data: Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + ...
## chisq = 0.7732, df = 4, p-value = 0.942
## alternative hypothesis: one model is inconsistent
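The Hausman statistic compares the two sets of slope estimates, H = (b_FE - b_RE)' [V_FE - V_RE]^-1 (b_FE - b_RE), and refers it to a chi-squared distribution with as many degrees of freedom as compared coefficients. Below is a minimal base-R sketch of this textbook form; the function name hausman_stat and its two-coefficient inputs are made up for illustration, and it is not plm's phtest() code. It also reproduces the reported p-value from chisq = 0.7732 on 4 df.

```r
# Sketch of the Hausman test; `hausman_stat` is a hypothetical helper.
hausman_stat <- function(b_fe, b_re, V_fe, V_re) {
  d <- b_fe - b_re                                     # difference in estimates
  H <- as.numeric(t(d) %*% solve(V_fe - V_re) %*% d)   # quadratic form
  list(statistic = H, df = length(d),
       p.value = pchisq(H, df = length(d), lower.tail = FALSE))
}

# Made-up two-coefficient example
res <- hausman_stat(b_fe = c(1.0, 0.5), b_re = c(0.8, 0.6),
                    V_fe = diag(c(0.05, 0.02)), V_re = diag(c(0.03, 0.01)))
print(res)

# The reported test: chisq = 0.7732 on 4 df
p_reported <- pchisq(0.7732, df = 4, lower.tail = FALSE)
```

If V_FE - V_RE is not positive definite in a given sample, the statistic can be negative, which is a known practical limitation of this form.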
Now we check whether the individual effects of the random effects model are statistically significant, using plmtest(), which here reports the Lagrange Multiplier test (Breusch-Pagan). H0: there are no individual effects (their variance is zero). The p-value (8.366e-09) is far below 5%, so we reject H0: the individual effects are significant and should be included in the model. Therefore, the random effects model is more appropriate than the simple regression model.
##
## Lagrange Multiplier Test - (Breusch-Pagan)
##
## data: Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + ...
## chisq = 33.188, df = 1, p-value = 8.366e-09
## alternative hypothesis: significant effects
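For a balanced panel, the Breusch-Pagan LM statistic for random effects is LM = nT / (2(T - 1)) * [sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1]^2, where e_it are pooled OLS residuals, compared with a chi-squared distribution with 1 df. The base-R sketch below applies this balanced-panel formula to simulated data; our panel is unbalanced, and plm uses an adjusted version in that case, so the sketch shows the logic rather than the reported computation. It also reproduces the reported p-value from chisq = 33.188.

```r
# Sketch: Breusch-Pagan LM test for random effects (balanced-panel formula).
# Simulated data; names are hypothetical.
set.seed(6)
n_partner <- 10
n_period  <- 3
sim <- data.frame(partner = factor(rep(seq_len(n_partner), each = n_period)))
sim$x <- rnorm(nrow(sim))
sim$y <- 2 + sim$x + rep(rnorm(n_partner, sd = 2), each = n_period) + rnorm(nrow(sim))

e   <- residuals(lm(y ~ x, data = sim))   # pooled OLS residuals
s_i <- tapply(e, sim$partner, sum)        # residual sum for each partner
LM  <- (n_partner * n_period / (2 * (n_period - 1))) * (sum(s_i^2) / sum(e^2) - 1)^2
p_value <- pchisq(LM, df = 1, lower.tail = FALSE)

# Reported statistic: chisq = 33.188 on 1 df
p_reported <- pchisq(33.188, df = 1, lower.tail = FALSE)
print(c(LM = LM, p.value = p_value, reported_p = p_reported))
```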
We can also check the same hypothesis for the fixed effects model. The p-value (2.319e-06) is below 5%, so the individual effects are not jointly equal to zero and should be kept in the panel data model: the fixed effects model does not reduce to the simple regression model.
Finally, according to the Hausman test, the random effects model is preferred over the fixed effects model.
##
## F test for individual effects
##
## data: Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + ...
## F = 7.5883, df1 = 21, df2 = 25, p-value = 2.319e-06
## alternative hypothesis: significant effects
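Next, we check for time effects by testing whether they are statistically significant with pFtest(). The degrees of freedom (df1 = 2 for the two year dummies, df2 = 23 as in the two-way model) show that the test compares the model with year dummies against the model without them, even though the heading reads "F test for individual effects". The p-value (0.09001) is above 5%, so we fail to reject H0: the time effects are jointly insignificant at the 5% level, although they are significant at the 10% level.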
##
## F test for individual effects
##
## data: Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + ...
## F = 2.6784, df1 = 2, df2 = 23, p-value = 0.09001
## alternative hypothesis: significant effects
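The time-effects test has the same nested structure as the earlier F test: the restricted model has individual effects only, and the unrestricted one adds one dummy per year except the base year. A base-R sketch on simulated data (hypothetical names) is below. With 3 years, df1 = 3 - 1 = 2, and df2 is the residual df of the two-way model, which for our panel is 25 - 2 = 23, matching the reported test.

```r
# Sketch: F test for time effects, one-way vs two-way fixed effects (LSDV form).
# Simulated data; names are hypothetical.
set.seed(7)
n_partner <- 8
n_period  <- 3
sim <- data.frame(partner = factor(rep(seq_len(n_partner), each = n_period)),
                  period  = factor(rep(seq_len(n_period), times = n_partner)))
sim$x <- rnorm(nrow(sim))
period_shock <- c(0, 1, 0.5)[as.integer(sim$period)]   # common shock hitting all partners
sim$y <- sim$x + rep(rnorm(n_partner, sd = 2), each = n_period) + period_shock + rnorm(nrow(sim))

oneway <- lm(y ~ x + partner, data = sim)           # individual effects only
twoway <- lm(y ~ x + partner + period, data = sim)  # plus one dummy per period
time_test <- anova(oneway, twoway)
print(time_test)
```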
We then check for serial correlation with the Breusch-Godfrey/Wooldridge test. The p-value (0.2599) is above 5%, so we fail to reject H0: there is no evidence of autocorrelation in our residuals.
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + ...
## chisq = 1.2691, df = 1, p-value = 0.2599
## alternative hypothesis: serial correlation in idiosyncratic errors
For heteroskedasticity, the p-value (0.9787) is also above 5%, so we fail to reject H0: there is no evidence of heteroskedasticity in the residuals.
##
## studentized Breusch-Pagan test
##
## data: Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + World.Growth + Country.Growth
## BP = 0.44404, df = 4, p-value = 0.9787
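Our main hypothesis was that the independent variables have a positive impact on fuel exports. Our analysis does not support it: in the pooled, fixed effects and random effects models, none of the coefficients is statistically significant at the 5% level, and the tests of joint significance fail to reject that the coefficients are jointly zero. With 51 observations across 22 partner countries, the data do not allow us to confirm a positive impact of these variables on Azerbaijan's fuel exports.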