Abstract

In this project we built a panel data set on Azerbaijan's fuel exports over the years 2015 to 2020. We:

- estimated the model with the fixed effects and random effects estimators,
- checked whether the individual effects are significant,
- performed the Hausman specification test to decide which estimator is preferable.

Understanding how different independent variables interact with growth can help to explain why international trade plays a crucial role in driving economic development.

A country’s economy becomes more productive as the share of imports and exports increases, since trade allows for specialization, access to diverse markets, and the transfer of knowledge, resources, and technology, leading to overall economic growth and efficiency gains. That’s why we decided to analyze this case.

Our analysis can help countries focus on their import and export volumes, which in turn may support their economic development.

Introduction and data

In our project we have a main and a secondary hypothesis.

Our main hypothesis is:

H0: The explanatory variables have a positive impact on fuel exports.

And our secondary hypothesis is:

Ha: The explanatory variables have no impact or a negative impact on fuel exports.

Our data comes from the World Integrated Trade Solution (WITS) website. Below is the description of the variables we used in our analysis.

Variable	Description
Export.US.Thousand	The amount of exports, in millions of US dollars in the model output (raw values in thousands divided by 1,000)
Import.US.Thousand	The amount of imports, in thousands of US dollars
AHS.Total.Tariff.Lines	The total number of tariff lines
World.Growth	Growth in the popularity of fuels at the world level
Country.Growth	Growth in the popularity of fuels at the country level

First, we install and load all the packages required for this analysis.
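The original code chunk is not shown, so the following is only a minimal sketch of this step. It assumes the plm and lmtest packages (their output appears later in this report) and uses the Grunfeld panel shipped with plm as a stand-in for the WITS data; `panel_demo` is a hypothetical name. For the data in this report the index would be `c("Partner.Name", "Year")`, as in the plm calls shown further below.

```r
# install.packages(c("plm", "lmtest"))  # run once if the packages are missing
library(plm)     # panel estimators and tests
library(lmtest)  # coeftest() and bptest()

# Stand-in data set: 10 firms observed over 20 years.
data("Grunfeld", package = "plm")

# Declare the panel structure: individual index first, time index second.
panel_demo <- pdata.frame(Grunfeld, index = c("firm", "year"))

# Report the panel dimensions and whether the panel is balanced.
pdim(panel_demo)
```

Declaring the index once in `pdata.frame()` means later `plm()` calls do not need an `index` argument.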

In our dataset we have one dependent and four independent variables. The dependent variable is Export.US.Thousand and the independent variables are Import.US.Thousand, AHS.Total.Tariff.Lines, World.Growth and Country.Growth. Our estimation equation is:

Export.US.Thousand = β0 + β1·Import.US.Thousand + β2·AHS.Total.Tariff.Lines + β3·World.Growth + β4·Country.Growth + ε

Before starting the main econometric analysis, we look at the descriptive statistics of the dependent and the independent variables. The results can be interpreted as follows:

Independent variables
- Import.US.Thousand: The minimum value is 0, so some observations have no imports. The maximum value is 549,762.4 thousand US dollars. The mean is 12,442.7, far above the median of 371.8, so the distribution is strongly right-skewed. There are no missing values for this variable.
- AHS.Total.Tariff.Lines: The minimum value is 1, so at least one tariff line is always recorded. The maximum value is 787. The mean is 25.2 and the median is 13. There are 236 missing values for this variable.
- World.Growth: The minimum value is -29.14, a significant decrease in growth at the global level. The maximum value is 90.11, a substantial increase. The mean is -3.157 and the median is -7.010, suggesting a slight overall decline. There are 191 missing values for this variable.
- Country.Growth: The minimum value is -99.42, a significant decrease in growth at the country level. The maximum value is 152,881.96. The mean is 1,405.57, but it is driven by a few extreme values: the median is only -9.81. There are 191 missing values for this variable.
Dependent variable
- Export.US.Thousand: In thousands of US dollars, the minimum value is 0, so some observations have no exports. The first quartile (25th percentile) is 11, the median is 7,808, the mean is 378,655, the third quartile (75th percentile) is 309,983, and the maximum is 12,681,732. The mean lies above the third quartile, so the distribution is strongly right-skewed. The summary output below appears to report the same statistics divided by 1,000, that is, in millions of US dollars.

summary(cbind(Import.US.Thousand, AHS.Total.Tariff.Lines, World.Growth, Country.Growth))

##  Import.US.Thousand AHS.Total.Tariff.Lines  World.Growth    
##  Min.   :     0.0   Min.   :  1.0          Min.   :-29.140  
##  1st Qu.:    18.0   1st Qu.:  2.0          1st Qu.:-16.023  
##  Median :   371.8   Median : 13.0          Median : -7.010  
##  Mean   : 12442.7   Mean   : 25.2          Mean   : -3.157  
##  3rd Qu.:  2128.7   3rd Qu.: 26.0          3rd Qu.: 11.273  
##  Max.   :549762.4   Max.   :787.0          Max.   : 90.110  
##                     NA's   :236            NA's   :191      
##  Country.Growth     
##  Min.   :   -99.42  
##  1st Qu.:   -27.51  
##  Median :    -9.81  
##  Mean   :  1405.57  
##  3rd Qu.:    17.57  
##  Max.   :152881.96  
##  NA's   :191

summary(Export.US.Thousand)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0     7.8   378.7   310.0 12681.7

Having described the dependent and independent variables, we now estimate the model with the fixed effects estimator. The output also tells us whether the panel is balanced. Here it reports an unbalanced panel (n = 22, T = 1-3, N = 51), which means that we do not observe every partner country in every year. We focus on the coefficients and the lower block of the output, and we take 5% as the significance level.

According to the fixed effects estimates, whenever imports increase by 1 million dollars, exports decrease by about 0.45 million dollars.

When the number of tariff lines increases by 1 unit, exports decrease by about 3.14 units (millions of US dollars on the scale of the output).

When the popularity of fuel in the world increases by 1 unit, exports increase by about 6.99 units.

When the popularity of fuel in the country increases by 1 unit, exports increase by about 0.53 units.

None of these coefficients is statistically significant at the 5% level.

Some informative plots

The first graph shows the highest sums of exports and the corresponding partners for Azerbaijan, and the second graph shows the reverse. The third graph shows the correlations between the main numeric variables. None of them is strongly correlated with another, so we can use any combination of variables as long as missing (NA) values are not an issue.

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + 
##     World.Growth + Country.Growth, data = panel_data, model = "within")
## 
## Unbalanced Panel: n = 22, T = 1-3, N = 51
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -1712.875   -93.744     0.000   130.780  1549.279 
## 
## Coefficients:
##                           Estimate  Std. Error t-value Pr(>|t|)
## Import.US.Thousand     -0.00044543  0.00229100 -0.1944   0.8474
## AHS.Total.Tariff.Lines -3.13942582  8.18135942 -0.3837   0.7044
## World.Growth            6.98630450  8.20477105  0.8515   0.4026
## Country.Growth          0.53347243  0.69954788  0.7626   0.4528
## 
## Total Sum of Squares:    7992600
## Residual Sum of Squares: 7486200
## R-Squared:      0.06336
## Adj. R-Squared: -0.87328
## F-statistic: 0.422785 on 4 and 25 DF, p-value: 0.79067
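A within (fixed effects) fit of this kind can be sketched as follows. This is an illustration on the Grunfeld stand-in data, not the authors' code: `fixed_demo` is a hypothetical name, and the formula would be replaced by the export equation with `index = c("Partner.Name", "Year")`.

```r
library(plm)

data("Grunfeld", package = "plm")

# Fixed effects (within) estimator with individual effects.
fixed_demo <- plm(inv ~ value + capital, data = Grunfeld,
                  index = c("firm", "year"), model = "within")

# Coefficients, R-squared and the joint F test are in the summary.
summary(fixed_demo)

# TRUE for a balanced panel, FALSE for an unbalanced one.
pdim(fixed_demo)$balanced
```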

The fixef function returns estimates of the individual effects. Here we look at the individual effects for Azerbaijan's partner countries. If an effect is negative, exports to that partner are lower than the average; if it is positive, they are higher than the average. In our output almost all of the individual effects are positive. The exceptions are Belarus (-47.828) and the United Arab Emirates (-328.771).

##              Austria              Belarus              Belgium 
##              447.965              -47.828              198.824 
##               Canada                China               France 
##              331.256              508.039              656.976 
##              Georgia              Germany               Greece 
##              539.822              984.844              364.776 
##                India                Italy           Kazakhstan 
##              867.348             4144.648              223.739 
##          Netherlands              Romania   Russian Federation 
##              209.719              228.267              295.909 
##                Spain          Switzerland               Turkey 
##              660.322              235.032             2273.361 
##              Ukraine United Arab Emirates       United Kingdom 
##              345.896             -328.771              419.754 
##        United States 
##              555.228
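The comparison with the average is easiest to read when the effects are expressed as deviations from the overall mean. A minimal sketch on the Grunfeld stand-in data, with hypothetical object names: by default `fixef()` returns the individual intercepts in levels, while `type = "dmean"` returns deviations from the mean.

```r
library(plm)

data("Grunfeld", package = "plm")
fixed_demo <- plm(inv ~ value + capital, data = Grunfeld,
                  index = c("firm", "year"), model = "within")

# Individual intercepts in levels (the default).
fe_level <- fixef(fixed_demo)

# The same effects as deviations from the overall mean:
# negative = below average, positive = above average.
fe_dmean <- fixef(fixed_demo, type = "dmean")

fe_level
fe_dmean
```

With `type = "dmean"` the signs can be read directly as below or above average, which is not guaranteed for the level estimates.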

We could also estimate a simple regression model on the panel data, sometimes called pooled OLS (POLS). Estimating the POLS model is generally not recommended, because it can cause problems. First, the estimates may be biased and inconsistent. Second, the residuals may suffer from autocorrelation and heteroskedasticity. The panel model reduces to the simple regression model only when the individual effects disappear from the model; only in that case can we use the simple regression model. To check this, we run the F test for individual effects.

Here, H0: All individual effects are jointly statistically insignificant. In the output the p-value is far below 5%, so we reject the null hypothesis. It means our model does not reduce to the simple regression model.

In this case it is better to use the fixed effects estimator for the panel data model instead of the simple regression model.

## 
##  F test for individual effects
## 
## data:  Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +  ...
## F = 7.5883, df1 = 21, df2 = 25, p-value = 2.319e-06
## alternative hypothesis: significant effects
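The F test above compares the within model against the pooled model. A hedged sketch with `pFtest()` on the Grunfeld stand-in data (object names are hypothetical):

```r
library(plm)

data("Grunfeld", package = "plm")
fixed_demo  <- plm(inv ~ value + capital, data = Grunfeld,
                   index = c("firm", "year"), model = "within")
pooled_demo <- plm(inv ~ value + capital, data = Grunfeld,
                   index = c("firm", "year"), model = "pooling")

# H0: all individual effects are jointly zero (pooled OLS is adequate).
f_individual <- pFtest(fixed_demo, pooled_demo)
f_individual

# Decision at the 5% level: TRUE means reject H0, keep the individual effects.
f_individual$p.value < 0.05
```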

We can check for serial correlation in the residuals with the Breusch-Godfrey/Wooldridge test. Here H0: There is no serial correlation in the residuals. The p-value (0.01175) is less than 5%, so we reject H0. It means there is autocorrelation in our residuals, which is a problem for our panel data model.

## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models
## 
## data:  Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +  ...
## chisq = 6.3477, df = 1, p-value = 0.01175
## alternative hypothesis: serial correlation in idiosyncratic errors
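A sketch of this test with `pbgtest()` from plm, again on the Grunfeld stand-in data. Setting `order = 1` is an assumption made here to mirror the single degree of freedom in the output above.

```r
library(plm)

data("Grunfeld", package = "plm")
fixed_demo <- plm(inv ~ value + capital, data = Grunfeld,
                  index = c("firm", "year"), model = "within")

# H0: no serial correlation in the idiosyncratic errors.
bg_demo <- pbgtest(fixed_demo, order = 1)
bg_demo
```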

Here, we check for heteroskedasticity in our residuals with the studentized Breusch-Pagan test. H0: Residuals are homoskedastic (they are OK). The p-value (0.9787) is far above 5%, so we fail to reject H0. It means we find no evidence of heteroskedasticity, so the residuals are OK in this respect.

## 
##  studentized Breusch-Pagan test
## 
## data:  Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +     World.Growth + Country.Growth
## BP = 0.44404, df = 4, p-value = 0.9787
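The studentized Breusch-Pagan test is available as `bptest()` in lmtest. A minimal sketch on the Grunfeld stand-in data, passing the plain regression formula as in the output above (`bp_demo` is a hypothetical name):

```r
library(lmtest)

data("Grunfeld", package = "plm")

# H0: the residuals are homoskedastic.
bp_demo <- bptest(inv ~ value + capital, data = Grunfeld, studentize = TRUE)
bp_demo
```

The degrees of freedom equal the number of regressors, which is why the report's test has df = 4.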

How can we cope with these problems?

We apply a robust variance-covariance matrix estimator, using the coeftest function.

In the output below, we can trust the t-statistics and p-values.

Earlier, because of the autocorrelation detected in the residuals, we could not trust the standard errors, t-statistics and p-values. With robust standard errors this problem is addressed, and we can trust them. Even so, none of the coefficients is significant at the 5% level.

## 
## t test of coefficients:
## 
##                           Estimate  Std. Error t value Pr(>|t|)
## Import.US.Thousand     -0.00044543  0.00032479 -1.3714   0.1824
## AHS.Total.Tariff.Lines -3.13942582  5.42225431 -0.5790   0.5678
## World.Growth            6.98630450  4.52336281  1.5445   0.1350
## Country.Growth          0.53347243  0.33980244  1.5699   0.1290
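The report does not say which robust estimator was used. The sketch below assumes the Arellano estimator from `vcovHC()` in plm, which is robust to both heteroskedasticity and serial correlation within individuals; the data are the Grunfeld stand-in and the object names are hypothetical.

```r
library(plm)
library(lmtest)

data("Grunfeld", package = "plm")
fixed_demo <- plm(inv ~ value + capital, data = Grunfeld,
                  index = c("firm", "year"), model = "within")

# Assumed choice: Arellano robust covariance matrix.
robust_vcov <- vcovHC(fixed_demo, method = "arellano", type = "HC0")

# Same coefficients as the original fit, with robust standard errors.
robust_table <- coeftest(fixed_demo, vcov. = robust_vcov)
robust_table
```

Only the standard errors, t-statistics and p-values change; the point estimates stay the same, as in the two coefficient tables of this report.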

The random effects output is a bit different. Since there is no sense in interpreting individual effects in a random effects model, the Effects block reports aggregate characteristics instead: the variance of the idiosyncratic error, the variance of the individual effects, and theta. The share column gives the share of each variance in their sum; here the individual effects account for 0.769 of it. Theta is the quasi-demeaning weight that the random effects estimator applies to the individual means: a theta close to zero makes the estimator close to pooled OLS, and a theta close to one makes it close to fixed effects. Because the panel is unbalanced, theta varies with the number of years observed for each partner (from 0.5195 to 0.6984). However, at the level of this course we do not dwell on this part of the output. We focus on the coefficients, the R-squared statistics and the test of joint insignificance for the model.

How to interpret?

Here, our dependent variable is Export.US.Thousand. Whenever imports increase by 1 million dollars, exports decrease by about 0.59 million dollars.

When the number of tariff lines increases by 1 unit, exports decrease by about 2.8 units.

When the popularity of fuel in the world increases by 1 unit, exports increase by about 4.6 units.

When the popularity of fuel in the country increases by 1 unit, exports increase by about 0.3 units.

The hypothesis of joint insignificance of all parameters cannot be rejected, because the p-value (0.88967) is more than 5%. This means that the variables in our model are not jointly statistically significant.

## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + 
##     World.Growth + Country.Growth, data = panel_data, model = "random")
## 
## Unbalanced Panel: n = 22, T = 1-3, N = 51
## 
## Effects:
##                    var  std.dev share
## idiosyncratic 299446.5    547.2 0.231
## individual    997626.5    998.8 0.769
## theta:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5195  0.6388  0.6984  0.6680  0.6984  0.6984 
## 
## Residuals:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -674.0  -186.5  -103.6    14.0    24.5  2619.3 
## 
## Coefficients:
##                           Estimate  Std. Error z-value Pr(>|z|)  
## (Intercept)             6.2332e+02  2.5624e+02  2.4325  0.01499 *
## Import.US.Thousand     -5.8786e-04  2.0477e-03 -0.2871  0.77405  
## AHS.Total.Tariff.Lines -2.7869e+00  6.9273e+00 -0.4023  0.68746  
## World.Growth            4.6405e+00  6.7802e+00  0.6844  0.49371  
## Country.Growth          2.9578e-01  5.8803e-01  0.5030  0.61497  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    12347000
## Residual Sum of Squares: 12265000
## R-Squared:      0.010919
## Adj. R-Squared: -0.075088
## Chisq: 1.12885 on 4 DF, p-value: 0.88967
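The share and theta values in the Effects block can be reproduced by hand from the two variances. The sketch below uses the standard formula theta = 1 - sqrt(σ²_idiosyncratic / (σ²_idiosyncratic + T·σ²_individual)), with T running over the 1 to 3 years observed per partner; the variable names are illustrative.

```r
# Variances copied from the Effects block of the random effects output.
sigma2_idios <- 299446.5
sigma2_indiv <- 997626.5

# Share of the individual variance in the total variance.
share_indiv <- sigma2_indiv / (sigma2_idios + sigma2_indiv)

# Theta depends on the number of periods observed for each individual.
T_i <- 1:3
theta <- 1 - sqrt(sigma2_idios / (sigma2_idios + T_i * sigma2_indiv))

round(share_indiv, 3)  # 0.769
round(theta, 4)        # 0.5195 0.6388 0.6984
```

These values match the minimum, first quartile and maximum of theta in the output, which confirms that theta differs across partners only because the panel is unbalanced.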

Next, we can estimate a two-way panel data model. In the model named fixed.time we add a dummy variable for each year to the fixed effects model. This extends our approach to a two-way error component model with time-specific effects. We assume that something happened in specific years that affected all partner countries.

In the results we have the estimates of the parameters and of the dummies for each year.

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + 
##     World.Growth + Country.Growth + Year, data = panel_data, 
##     model = "within", index = c("Partner.Name", "Year"))
## 
## Unbalanced Panel: n = 22, T = 1-3, N = 51
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -1469.96  -114.57     0.00   118.78  1185.71 
## 
## Coefficients:
##                           Estimate  Std. Error t-value Pr(>|t|)  
## Import.US.Thousand     -1.6862e-03  2.2350e-03 -0.7545   0.4582  
## AHS.Total.Tariff.Lines  2.5435e+01  2.0640e+01  1.2323   0.2303  
## World.Growth           -4.7702e+00  9.5763e+00 -0.4981   0.6231  
## Country.Growth          7.6717e-02  6.8597e-01  0.1118   0.9119  
## Year2019                1.0738e+03  5.2468e+02  2.0466   0.0523 .
## Year2020                6.9673e+02  5.2837e+02  1.3186   0.2003  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    7992600
## Residual Sum of Squares: 6072000
## R-Squared:      0.2403
## Adj. R-Squared: -0.65153
## F-statistic: 1.2125 on 6 and 23 DF, p-value: 0.33557
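Adding year dummies to the individual within model gives the same slope estimates as a two-way within model. A sketch on the Grunfeld stand-in data illustrates this equivalence; the object names and the use of `effect = "twoways"` are illustrative, not taken from the authors' code.

```r
library(plm)

data("Grunfeld", package = "plm")

# Individual fixed effects plus explicit year dummies.
fixed_time_demo <- plm(inv ~ value + capital + factor(year), data = Grunfeld,
                       index = c("firm", "year"), model = "within")

# Two-way within model: individual and time effects are both swept out.
twoways_demo <- plm(inv ~ value + capital, data = Grunfeld,
                    index = c("firm", "year"), model = "within",
                    effect = "twoways")

# The slopes on the regressors of interest coincide.
all.equal(coef(fixed_time_demo)[c("value", "capital")], coef(twoways_demo))
```

The dummy version has the advantage of reporting an estimate for each year, as in the output above.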

We can estimate the pooled regression model using the plm function with model = "pooling".

## Pooling Model
## 
## Call:
## plm(formula = Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines + 
##     World.Growth + Country.Growth, data = panel_data, model = "pooling", 
##     index = c("Partner.Name", "Year"))
## 
## Unbalanced Panel: n = 22, T = 1-3, N = 51
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -747.416 -518.091 -368.470   47.916 5007.423 
## 
## Coefficients:
##                           Estimate  Std. Error t-value Pr(>|t|)  
## (Intercept)            528.6155333 265.1134265  1.9939  0.05211 .
## Import.US.Thousand      -0.0023175   0.0032216 -0.7193  0.47557  
## AHS.Total.Tariff.Lines   4.1548176   9.8475287  0.4219  0.67505  
## World.Growth            -0.3268138   9.7717648 -0.0334  0.97346  
## Country.Growth          -0.2954591   0.8818187 -0.3351  0.73911  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    56101000
## Residual Sum of Squares: 55205000
## R-Squared:      0.015976
## Adj. R-Squared: -0.069591
## F-statistic: 0.186705 on 4 and 46 DF, p-value: 0.94415
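The pooling model ignores the panel structure, so its coefficients coincide with ordinary least squares. A short check on the Grunfeld stand-in data (object names are hypothetical):

```r
library(plm)

data("Grunfeld", package = "plm")

pooled_demo <- plm(inv ~ value + capital, data = Grunfeld,
                   index = c("firm", "year"), model = "pooling")
ols_demo <- lm(inv ~ value + capital, data = Grunfeld)

# Both estimators give the same intercept and slopes.
all.equal(coef(pooled_demo), coef(ols_demo))
```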

Since we estimated both fixed and random effects models, we would like to decide which one is more appropriate for our case. To decide this, we use the Hausman test. Here, H0: The individual effects are not correlated with the explanatory variables (this implies we should use the random effects estimator). H1: The individual effects are correlated with the explanatory variables (we should apply fixed effects).

In the output the p-value (0.942) is far above 5%. So, we fail to reject the null hypothesis. It means we prefer random effects over fixed effects.

## 
##  Hausman Test
## 
## data:  Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +  ...
## chisq = 0.7732, df = 4, p-value = 0.942
## alternative hypothesis: one model is inconsistent
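A sketch of the Hausman comparison with `phtest()` on the Grunfeld stand-in data (object names are hypothetical):

```r
library(plm)

data("Grunfeld", package = "plm")
fixed_demo  <- plm(inv ~ value + capital, data = Grunfeld,
                   index = c("firm", "year"), model = "within")
random_demo <- plm(inv ~ value + capital, data = Grunfeld,
                   index = c("firm", "year"), model = "random")

# H0: individual effects are uncorrelated with the regressors
# (random effects is consistent and efficient).
hausman_demo <- phtest(fixed_demo, random_demo)
hausman_demo

# Decision rule at the 5% level.
if (hausman_demo$p.value < 0.05) "use fixed effects" else "use random effects"
```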

Now we check whether the individual effects in the random effects model are statistically significant. We use the plmtest function, which gives the Lagrange Multiplier Test (Breusch-Pagan). Here H0: The individual effects are jointly insignificant. The p-value is far below 5%, so we reject H0. It means the individual effects are jointly significant and should be included in the model. So, the random effects model is more appropriate than the simple regression model.

## 
##  Lagrange Multiplier Test - (Breusch-Pagan)
## 
## data:  Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +  ...
## chisq = 33.188, df = 1, p-value = 8.366e-09
## alternative hypothesis: significant effects
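The Lagrange Multiplier test is run on the pooled model. A sketch with `plmtest()` on the Grunfeld stand-in data; `type = "bp"` selects the Breusch-Pagan variant named in the output above, and the object names are hypothetical.

```r
library(plm)

data("Grunfeld", package = "plm")
pooled_demo <- plm(inv ~ value + capital, data = Grunfeld,
                   index = c("firm", "year"), model = "pooling")

# H0: the variance of the individual effects is zero (pooled OLS is adequate).
lm_demo <- plmtest(pooled_demo, effect = "individual", type = "bp")
lm_demo

# TRUE means reject H0: random effects is preferred over pooled OLS.
lm_demo$p.value < 0.05
```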

We can also check the very same hypothesis for the fixed effects model. The p-value is less than 5%. So, the individual effects are not jointly equal to zero and should be kept in the panel data model. It means the fixed effects model does not reduce to the simple regression model.

And finally, given the Hausman test results above, the random effects model is preferred over the fixed effects model.

## 
##  F test for individual effects
## 
## data:  Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +  ...
## F = 7.5883, df1 = 21, df2 = 25, p-value = 2.319e-06
## alternative hypothesis: significant effects

Now we can check for time effects. Here H0: The time effects are jointly insignificant. We use pFtest (plmtest can also be used). The p-value (0.09) is more than 5%, so we fail to reject H0. It means the time effects are not jointly significant at the 5% level.

## 
##  F test for individual effects
## 
## data:  Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +  ...
## F = 2.6784, df1 = 2, df2 = 23, p-value = 0.09001
## alternative hypothesis: significant effects
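This test compares the within model with year dummies against the within model without them. A sketch on the Grunfeld stand-in data (object names are hypothetical):

```r
library(plm)

data("Grunfeld", package = "plm")
fixed_demo <- plm(inv ~ value + capital, data = Grunfeld,
                  index = c("firm", "year"), model = "within")
fixed_time_demo <- plm(inv ~ value + capital + factor(year), data = Grunfeld,
                       index = c("firm", "year"), model = "within")

# H0: all year dummies are jointly zero (no time effects).
time_test_demo <- pFtest(fixed_time_demo, fixed_demo)
time_test_demo
```

The first degree of freedom equals the number of year dummies: 19 for the 20 years in the stand-in data, and 2 in the report's output.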

We may check for serial correlation again with the Breusch-Godfrey/Wooldridge test. The p-value (0.2599) is more than 5%, so we fail to reject H0. It means we find no evidence of autocorrelation in the residuals.

## 
##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models
## 
## data:  Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +  ...
## chisq = 1.2691, df = 1, p-value = 0.2599
## alternative hypothesis: serial correlation in idiosyncratic errors

For heteroskedasticity, the p-value is also more than 5%, so we fail to reject H0. We find no evidence of heteroskedasticity in the residuals.

## 
##  studentized Breusch-Pagan test
## 
## data:  Export.US.Thousand ~ Import.US.Thousand + AHS.Total.Tariff.Lines +     World.Growth + Country.Growth
## BP = 0.44404, df = 4, p-value = 0.9787

Results

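Our main hypothesis was that the explanatory variables have a positive impact on fuel exports. As a result of our analysis, we fail to reject it. However, none of the coefficients is statistically significant at the 5% level in any of the estimated models, so failing to reject the hypothesis is not strong evidence that it is correct.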

Panel Data Model for Export and Import of Fuels in Azerbaijan from 2015 to 2020

Leyla Ellazova, Narmina Abdullayeva, Islam Islamov

2023-06-12

Thank you!