1 Introduction

We are analyzing the relationship between the debt to GDP ratio and the imports to exports ratio for five countries. Our outcome variable is debt to GDP ratio. Our numerical explanatory variables are year and trade ratio, and our categorical explanatory variable is country.

In general, the debt to GDP ratio tends to go up during times of economic hardship when countries borrow to boost a depressed economy, or during wars when governments also tend to borrow heavily. The imports to exports ratio is dependent on a mostly different set of factors, such as interest rates and the value of the domestic currency. A more valuable domestic currency tends to result in a greater imports to exports ratio. It is possible that there is little or no relationship between these two economic ratios.

We obtained our data from the International Monetary Fund. There are several possible limitations to this data. First, we only selected countries with available data, so our selection of countries may not represent the relationship worldwide. Moreover, to obtain the imports to exports data, we divided the imports data by the exports data. Economists normally take the difference (known as net exports) rather than the ratio, but we used the ratio so that the variable would never be negative, which made it easier to fit our regressions. Additionally, the IMF data set we used for debt to GDP ratio (Historical Public Debt Database) draws its data from sources that differ for each country (more detail on methodology can be found in this working paper). The way this information is collected may vary by source, which limits the accuracy of our conclusions. The data set from which we sourced import and export data has similar limitations (details on which can be found here).
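To make the difference between the two trade measures concrete, here is a minimal Python sketch. The import and export figures are invented for illustration and do not come from the IMF data, and the function names are hypothetical.

```python
def net_exports(exports, imports):
    """Difference form: negative whenever imports exceed exports."""
    return exports - imports


def trade_ratio(imports, exports):
    """Ratio form used in this report: imports divided by exports."""
    return imports / exports


# Hypothetical (imports, exports) pairs in arbitrary currency units.
examples = [(120.0, 100.0), (100.0, 100.0), (80.0, 100.0)]

for imports, exports in examples:
    diff = net_exports(exports, imports)
    ratio = trade_ratio(imports, exports)
    # The difference changes sign; the ratio stays positive and
    # crosses 1 where the difference crosses 0.
    print(f"net exports = {diff:7.1f}   trade ratio = {ratio:.2f}")
```

A trade ratio above 1 corresponds to negative net exports, and a ratio below 1 corresponds to positive net exports.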


2 Exploratory data analysis

Correlation coefficient and data visualization

Data preview
country year debt_to_gdp_log10 trade_ratio
Australia 1966 1.615231 1.0225588
Australia 1967 1.593794 1.0032633
Australia 1968 1.582193 1.1108776
Australia 1969 1.552996 0.9611279
Australia 1970 1.530264 0.9494408
Australia 1971 1.495912 0.8942754

Correlation coefficient

## [1] -0.3280534
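As a reminder of what this number measures, the following Python sketch computes a Pearson correlation coefficient from scratch. It uses only the six Australia rows shown in the data preview, so it will not reproduce the full-data value above; the function name `pearson_r` is our own, and pairing trade ratio with the log10 debt column is an assumption about how the reported coefficient was computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# The six Australia rows from the data preview (1966 to 1971).
trade_ratio = [1.0225588, 1.0032633, 1.1108776, 0.9611279, 0.9494408, 0.8942754]
debt_to_gdp_log10 = [1.615231, 1.593794, 1.582193, 1.552996, 1.530264, 1.495912]

r_preview = pearson_r(trade_ratio, debt_to_gdp_log10)
print(f"r for the six preview rows only: {r_preview:.3f}")
```

The result always lies between -1 and 1. A value near 0 indicates little linear association, which is why the full-data coefficient is described as weak.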

Data visualization

In our data set, each row represents one observation of trade ratio and debt to GDP ratio for one country in a given year. Each point in the scatter plot is a visual representation of this observation, although the graph does not show year, and in the case of the uncolored plot does not show country.

The uncolored scatter plot shows a roughly linear relationship between our numerical variables. The correlation coefficient of about \(-0.33\) indicates a weak negative correlation between trade ratio and debt to GDP ratio. However, when the plot is faceted by country, it is apparent that the slope differs noticeably from country to country. This suggests that if a relationship between trade ratio and debt to GDP ratio exists, it may be complicated by unidentified country-specific factors.

When adjusted through the application of a log 10 scale, the outcome variable debt to GDP ratio displays a roughly normal distribution. The numerical explanatory variable trade ratio also displays a roughly normal distribution. While debt to GDP ratio and trade ratio do vary over time, there is not a consistent pattern to this variation.


3 Multiple regression

term estimate std_error statistic p_value conf_low conf_high
intercept 1.916 0.200 9.600 0.000 1.523 2.310
trade_ratio -0.562 0.199 -2.826 0.005 -0.953 -0.170
countryCanada -0.221 0.389 -0.567 0.571 -0.988 0.546
countryMexico 0.158 0.214 0.742 0.459 -0.262 0.579
countryParaguay -0.470 0.207 -2.277 0.024 -0.877 -0.063
countryPeru -0.299 0.215 -1.391 0.166 -0.722 0.125
trade_ratio:countryCanada 0.711 0.412 1.726 0.086 -0.101 1.523
trade_ratio:countryMexico 0.102 0.209 0.489 0.625 -0.310 0.515
trade_ratio:countryParaguay 0.535 0.201 2.654 0.009 0.138 0.932
trade_ratio:countryPeru 0.479 0.215 2.226 0.027 0.055 0.903

This regression models the debt to GDP ratio as a function of 2 variables – the country and the trade ratio – using an interaction model. The trade ratio is a numerical explanatory variable, while country is a categorical explanatory variable.

3.1 Statistical interpretation

The baseline country for this interaction model is Australia. Each value of the format “countryCanada” represents the difference in intercept for that country relative to Australia. Each value of the format trade_ratio:countryCanada represents the difference in slope relative to Australia. Thus the table provides 5 different linear models for the log10 of the debt to GDP ratio as a function of trade ratio, one for each country. For every increase of one unit of trade ratio, there is an associated average change of \(m\) in the log10 of the debt to GDP ratio. Adding each country’s offsets to the baseline values in the table gives the following linear functions, in the form \(y = mx + b\) (log10 of debt to GDP ratio = \(m\) (trade ratio) + \(b\)):

  1. Australia: \(\widehat{\log_{10}(\text{debt/GDP})} = -0.562\,(\text{trade ratio}) + 1.916\)
  2. Canada: \(\widehat{\log_{10}(\text{debt/GDP})} = 0.149\,(\text{trade ratio}) + 1.695\)
  3. Mexico: \(\widehat{\log_{10}(\text{debt/GDP})} = -0.460\,(\text{trade ratio}) + 2.074\)
  4. Paraguay: \(\widehat{\log_{10}(\text{debt/GDP})} = -0.027\,(\text{trade ratio}) + 1.446\)
  5. Peru: \(\widehat{\log_{10}(\text{debt/GDP})} = -0.083\,(\text{trade ratio}) + 1.617\)

These linear functions reflect the colored scatter plot from our EDA. Each country has a different value for slope and intercept, as shown in the regression lines.
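The way an interaction model encodes one line per country can be sketched in a few lines of Python. The coefficients are copied from the regression table, so the results are on the log10 scale of that table; the dictionary layout and the function name `country_line` are illustrative choices of ours.

```python
# Baseline (Australia) coefficients from the regression table.
BASELINE_INTERCEPT = 1.916
BASELINE_SLOPE = -0.562

# (intercept offset, slope offset) relative to Australia, from the
# countryX and trade_ratio:countryX rows of the table.
OFFSETS = {
    "Australia": (0.0, 0.0),
    "Canada": (-0.221, 0.711),
    "Mexico": (0.158, 0.102),
    "Paraguay": (-0.470, 0.535),
    "Peru": (-0.299, 0.479),
}


def country_line(country):
    """Return (slope, intercept) for one country's fitted line."""
    d_intercept, d_slope = OFFSETS[country]
    slope = round(BASELINE_SLOPE + d_slope, 3)
    intercept = round(BASELINE_INTERCEPT + d_intercept, 3)
    return slope, intercept


for name in OFFSETS:
    slope, intercept = country_line(name)
    print(f"{name:9s} log10(debt/GDP) = {slope:+.3f} * trade_ratio + {intercept:.3f}")
```

Because the table values are rounded to three decimals, the derived slopes and intercepts carry the same rounding.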

Our model is limited. For many terms, the standard error is large relative to the corresponding regression coefficient. One possible reason is that the relationship between the debt to GDP ratio and the trade ratio is weak or absent for some countries. It is also possible that this model is a poor fit for the data.
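One way to see how large the standard errors are is to divide each estimate by its standard error, which is how the "statistic" column of the regression table is defined. The Python sketch below does this with the rounded table values; the cutoff of 2 is only a rough rule of thumb that we assume here, not the exact critical value for this model.

```python
# (term, estimate, std_error, reported statistic) from the regression table.
ROWS = [
    ("intercept", 1.916, 0.200, 9.600),
    ("trade_ratio", -0.562, 0.199, -2.826),
    ("countryCanada", -0.221, 0.389, -0.567),
    ("countryMexico", 0.158, 0.214, 0.742),
    ("countryParaguay", -0.470, 0.207, -2.277),
    ("countryPeru", -0.299, 0.215, -1.391),
    ("trade_ratio:countryCanada", 0.711, 0.412, 1.726),
    ("trade_ratio:countryMexico", 0.102, 0.209, 0.489),
    ("trade_ratio:countryParaguay", 0.535, 0.201, 2.654),
    ("trade_ratio:countryPeru", 0.479, 0.215, 2.226),
]


def t_ratio(estimate, std_error):
    """Estimate measured in units of its own standard error."""
    return estimate / std_error


weak_terms = []
for term, estimate, std_error, reported in ROWS:
    t = t_ratio(estimate, std_error)
    if abs(t) < 2:  # rough rule of thumb, assumed for illustration
        weak_terms.append(term)
    print(f"{term:28s} estimate/SE = {t:+.3f}   (table: {reported:+.3f})")

print("Terms within about 2 standard errors of zero:", weak_terms)
```

Small differences from the table's statistic column come from the table's rounding to three decimals.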

3.2 Non-statistical interpretation

The multiple regression model shows the relationship between the debt to GDP ratio and the trade ratio for 5 countries. Overall, for most countries, an increase in the trade ratio (imports/exports) is associated with a decrease in debt to GDP ratio. However, the size of this decrease varies by country. This variation suggests that the nature of this relationship might not be straightforward.
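Because the model's outcome is the log10 of the debt to GDP ratio, a slope translates into a multiplicative change in the ratio itself rather than an additive one. The Python sketch below shows the conversion for the baseline (Australia) slope from the regression table; the function name and the 0.1 step size are our own illustrative choices.

```python
def multiplicative_change(slope_log10, delta=1.0):
    """Factor by which the fitted debt to GDP ratio is multiplied
    when trade ratio increases by `delta`, given a log10-scale slope."""
    return 10 ** (slope_log10 * delta)


AUSTRALIA_SLOPE = -0.562  # trade_ratio row of the regression table

per_unit = multiplicative_change(AUSTRALIA_SLOPE)        # full one-unit increase
per_tenth = multiplicative_change(AUSTRALIA_SLOPE, 0.1)  # increase of 0.1

print(f"factor per +1.0 trade ratio: {per_unit:.3f}")
print(f"factor per +0.1 trade ratio: {per_tenth:.3f}")
```

The trade ratios in the data preview span roughly 0.89 to 1.11, so a step of 0.1 is probably closer to the changes actually observed than a full unit.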


4 Inference for multiple regression

All confidence intervals in the multiple regression table above are 95% confidence intervals. The “baseline” for this interaction model is Australia. Since the confidence intervals for Australia’s intercept and slope do not contain zero, there is evidence of a relationship between trade ratio and the log10 of the debt to GDP ratio for Australia. Specifically, we are 95% confident that the true slope lies between \(-0.953\) and \(-0.170\), and that the true intercept lies between \(1.523\) and \(2.310\). The other “estimate” values in the table represent the differences in intercepts and slopes relative to Australia’s. In 5 of 8 cases, the confidence interval contains zero, so for these we cannot conclude that the slope or intercept differs from Australia’s. For the other 3 cases (slope for Peru, both slope and intercept for Paraguay), the confidence interval does not contain zero, so we can conclude that the linear models for Peru and Paraguay differ from Australia’s model. Practically speaking, we have no evidence that the models for Canada and Mexico differ from Australia’s, while the models for Peru and Paraguay do appear to differ.

Using the p-values from the regression table (regression table, p_value column), we can conduct hypothesis tests on our multiple regression model. For all tests, we will use \(\alpha = 0.05\). First, let’s look at the slope and intercept for our baseline (country = Australia).

  1. For the slope: Null hypothesis: no relationship exists between the trade ratio and debt to GDP ratio.
    Alternative hypothesis: a relationship exists.
  2. For the intercept: Null hypothesis: the intercept is equal to zero.
    Alternative hypothesis: the intercept does not equal zero.

From the regression table, we see that the p-values for the baseline slope and intercept are \(0.005\) and \(0\) (to three decimal places) respectively. Both are below \(\alpha\), so we reject both null hypotheses and conclude that, for Australia, there is a relationship between trade ratio and log 10 debt to GDP ratio, with a nonzero intercept.

Next, let’s look at the relative slopes and intercepts for our other 4 countries. For each, our null hypothesis is that there is no change from the baseline, while our alternative hypothesis is that there is a change.

For the intercepts, we see that only one country, Paraguay, has a p-value (\(0.024\)) less than our \(\alpha\). Thus, for Paraguay we can reject the null hypothesis and conclude that there is a statistically significant difference in intercept between Paraguay and Australia. In all other intercept cases, we fail to reject the null hypotheses. We therefore have no evidence that these countries’ intercepts differ from Australia’s.

For the slopes, we see that only two countries, Paraguay and Peru, have p-values less than \(\alpha\) (\(0.009\) and \(0.027\) respectively). Thus, for these two countries, we can conclude that their slopes differ significantly from Australia’s. For all other countries, we fail to reject the null hypotheses, meaning that we have no evidence that their slopes differ from Australia’s.

The conclusion of these hypothesis tests is the same as the conclusion reached through the examination of confidence intervals. We cannot distinguish the models for Canada and Mexico from Australia’s, while the linear models of Peru (slope) and Paraguay (slope and intercept) differ from the baseline. We did not test whether Peru and Paraguay differ from each other.
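The agreement between the two approaches can be checked mechanically. The Python sketch below applies both decision rules to every row of the regression table: reject when the p-value is below \(\alpha = 0.05\), and reject when the 95% confidence interval excludes zero. The helper names are hypothetical, and the values are the rounded figures from the table.

```python
ALPHA = 0.05

# (term, p_value, conf_low, conf_high) copied from the regression table.
TABLE = [
    ("intercept", 0.000, 1.523, 2.310),
    ("trade_ratio", 0.005, -0.953, -0.170),
    ("countryCanada", 0.571, -0.988, 0.546),
    ("countryMexico", 0.459, -0.262, 0.579),
    ("countryParaguay", 0.024, -0.877, -0.063),
    ("countryPeru", 0.166, -0.722, 0.125),
    ("trade_ratio:countryCanada", 0.086, -0.101, 1.523),
    ("trade_ratio:countryMexico", 0.625, -0.310, 0.515),
    ("trade_ratio:countryParaguay", 0.009, 0.138, 0.932),
    ("trade_ratio:countryPeru", 0.027, 0.055, 0.903),
]


def significant_by_p(p_value, alpha=ALPHA):
    """Hypothesis-test rule: reject the null when p < alpha."""
    return p_value < alpha


def excludes_zero(conf_low, conf_high):
    """Confidence-interval rule: reject when zero lies outside the interval."""
    return conf_low > 0 or conf_high < 0


decisions = {}
for term, p_value, low, high in TABLE:
    by_p = significant_by_p(p_value)
    by_ci = excludes_zero(low, high)
    decisions[term] = (by_p, by_ci)
    print(f"{term:28s} p < alpha: {by_p!s:5s}  CI excludes 0: {by_ci}")

agree = all(by_p == by_ci for by_p, by_ci in decisions.values())
print("Both rules agree on every term:", agree)
```

The two rules are expected to agree here because a 95% interval and a two-sided test at \(\alpha = 0.05\) are built from the same estimate and standard error.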

4.1 Residual Analysis

While the histogram of residuals does display a tail, the left skew is not dramatic and the general distribution is still roughly normal. This roughly normal distribution is centered at approximately \(0\).

Looking at the faceted scatter plot of residuals plotted against the explanatory variable trade ratio, there appears to be a roughly constant spread about the line y = \(0\) for most countries. While the spread for Mexico and Paraguay appears less even, this pattern does not appear to be dramatic and there is still a roughly even spread.

In order to perform inference for regression, four conditions must be met:

  1. A linear relationship between the explanatory and outcome variable
  2. A roughly normal distribution of residuals, centered at \(0\)
  3. A constant spread of residuals, and
  4. Independence of the residuals.

We established a roughly linear relationship between our explanatory variable and our outcome variable during our exploratory data analysis. The residual plots described above indicate that the second and third conditions are met. Finally, for our purposes we will assume that the independence condition has been met, although observations for the same country in consecutive years may be related.
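For readers unfamiliar with residuals, the Python sketch below computes them for the six Australia rows shown in the data preview, using the rounded baseline coefficients from the regression table. It is only an illustration of the calculation: six rows are far too few to assess any of the four conditions, and the variable names are our own.

```python
# Rounded baseline (Australia) coefficients from the regression table.
AUS_INTERCEPT = 1.916
AUS_SLOPE = -0.562

# (year, debt_to_gdp_log10, trade_ratio) rows from the data preview.
PREVIEW = [
    (1966, 1.615231, 1.0225588),
    (1967, 1.593794, 1.0032633),
    (1968, 1.582193, 1.1108776),
    (1969, 1.552996, 0.9611279),
    (1970, 1.530264, 0.9494408),
    (1971, 1.495912, 0.8942754),
]


def fitted(trade_ratio):
    """Fitted log10 debt to GDP ratio on Australia's line."""
    return AUS_INTERCEPT + AUS_SLOPE * trade_ratio


# Residual = observed value minus fitted value.
residuals = [(year, observed - fitted(ratio)) for year, observed, ratio in PREVIEW]

for year, resid in residuals:
    print(f"{year}: residual = {resid:+.4f}")
```

In these six early rows every residual has the same sign. A run of same-signed residuals over consecutive years is the kind of pattern an independence check would look for, which is one reason the independence condition is treated here as an assumption rather than a verified fact.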

Thus, we can conclude that our interpretation of the p-values and confidence intervals of our multiple regression model is valid.


5 Conclusion

From our analysis, we were unable to draw a strong conclusion about the relationship between debt to GDP ratio and trade ratio. For the general situation in which country and year are not taken into account, we found a correlation coefficient of \(-0.328\). While this is a weak correlation, it may not be insignificant.

While analyzing the data without taking country into account yielded a weak relationship, looking at individual countries revealed a different story. We created a multiple regression interaction model based on the categorical explanatory variable of country and the numerical explanatory variable of trade ratio. We chose an interaction model over a parallel slopes model because there was at least 1 significant interaction term (a significant slope difference). Our model produced 5 specific linear functions - one for each country. Most of these had negative slopes that reflected the initial correlation coefficient. However, this model showed the data to lack consistency within and across countries. For each country, model error was high with relatively large standard error values. This suggests that our model may not be an accurate representation of the data.

Unfortunately, there were significant limitations to our analysis. First, we found that the linear models for two countries differed significantly from the baseline. This implies that other factors, which vary by country, influence the relationship between our two main variables. These confounding variables make it even more difficult to understand the true relationship between our main variables. We are certainly unable to conclude anything regarding causation, as this was not a randomized experiment. Additionally, the methods by which the data was obtained were not consistent, so combining these data sets may introduce additional country-specific collection errors. Finally, we only chose five countries for our analysis, none of which were randomly selected. We selected these countries because the IMF had the appropriate data on them for the time range we were interested in.

Any future analysis of this relationship could solve this last problem by finding appropriate data for countries actually selected at random. In addition, a future study could add more countries to the analysis, which would contribute to more accurate and precise conclusions. Any future study should also make sure to select data from one source with a consistent collection methodology, so as to prevent differences between countries based on how the data was collected. It would also be beneficial to choose a more informative baseline than Australia, such as an economic superpower like the United States or China. Doing so would allow us to more accurately compare differences between countries.

While at this time we do not have the ability to come to a conclusion about the relationship between debt to GDP ratio and trade ratio, the results of our analysis suggest that such a relationship may exist. However, in order to confirm this intuition, we would need more data and more consistently obtained data, as well as perhaps a more refined model.


6 Citations and References

Historical Public Debt Database (HPDD). (2016, November 30). Retrieved March 29,
2018, from IMF Data website: http://data.imf.org/?sk=806ED027-520D-497F-9052-63EC199F5E63

Direction of Trade Statistics (DOTS). (2018, March 23). Retrieved March 29,
2018, from IMF Data website: http://data.imf.org/?sk=9D6028D4-F14A-464C-A2F2-59B2CD424B85


Supplementary Materials

Here is an interactive world map from the IMF displaying debt to GDP ratios over time. Clicking the play button on the bottom left-hand corner of the screen will start an animation showing how this economic index has changed over time. We recommend starting the animation at 1970, as fewer countries have data before this point.

Debt to GDP Ratio Worldwide, 2015
