Exploratory Analysis

To get familiar with the dataset, it is useful to plot its different dimensions, which makes the underlying relationships easier to understand. To do so, we can use the ExPanDaR library, which is designed specifically for exploring panel data. This is just a first “play around” to get familiar with the data.

# Load the libraries (plyr goes before dplyr so that it does not mask dplyr functions)
library(ExPanDaR)
library(plyr)
library(dplyr)
library(ggplot2)
library(kableExtra)
# Load the data
dta_wide <- readRDS(url("https://github.com/AndresMtnezGlez/heptaomicron/raw/main/dta_wide.rds"))

Variables’ units

The first thing we can do is produce a table of descriptive statistics, which helps us understand the variables’ units and which transformations we should apply.

# Store the result under a name that does not mask base R's t() function
desc_tab <- prepare_descriptive_table(dta_wide)
desc_tab$kable_ret %>% kable_styling("condensed", full_width = FALSE, position = "center")
Descriptive Statistics

Variable     N        Mean   Std. dev.        Min.        25 %      Median        75 %        Max.
year       732   1,990.000      17.619   1,960.000   1,975.000   1,990.000   2,005.000   2,020.000
CPIH       355      85.059      14.739      40.903      72.905      86.132      99.462     110.152
DGDP       702      61.616      34.658       6.949      25.111      66.341      94.222     119.537
DMGT       701     137.596     218.756       0.159       8.456      48.412     168.434   1,454.179
DXGT       701     145.801     243.550       0.126       8.052      45.089     177.303   1,668.459
LTIR       594       7.257       4.178       0.090       4.310       6.565       9.278      27.740
PDEB       367      74.778      33.684       6.098      54.434      66.225      99.494     181.130
PGDP       613     574.182     677.564      12.798     136.478     240.323     774.272   3,040.828
POPU       732  24,618.448  25,591.056     497.800   6,670.152  10,360.400  47,010.279  83,124.069
REER       732       0.932       0.371       0.090       0.794       1.000       1.000       2.268
STIR       597       6.179       5.214      -0.330       2.330       4.630       9.350      24.560
TFFP       697      84.662      18.839      35.186      72.432      89.531     100.000     153.084
UVGD       701     403.214     647.889       0.271      25.454     140.612     391.462   3,602.780
id         732       6.500       3.454       1.000       3.750       6.500       9.250      12.000

Share of missing values

To decide which transformations we should or should not apply, we take a look at the share of missing values in our data. As we can see, several variables are seriously incomplete. The first and most important is CPIH, which lacks about 25 years of information. This means we should deflate using the GDP deflator whenever possible. PDEB is also incomplete, so we should check carefully which countries lack data for our dependent variable. Finally, there are missing observations for LTIR and STIR in 2020 and maybe 2019, so we might have to think about shortening the sample.

prepare_missing_values_graph(dta_wide, ts_id = "year")
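To put numbers on what the graph shows, a minimal sketch in base R follows. It uses a small made-up data frame (`toy`) in place of `dta_wide`, so the values are purely illustrative; only the column names are borrowed from our data.

```r
# Toy panel standing in for dta_wide (all values are made up for illustration)
toy <- data.frame(
  year  = c(1960, 1960, 1961, 1961),
  cntry = c("A", "B", "A", "B"),
  CPIH  = c(NA, NA, 80, NA),
  PDEB  = c(50, NA, 55, 60)
)

# Share of missing values per variable over the whole sample
na_share <- colMeans(is.na(toy))
print(na_share)

# Share of missing values per variable and year
vars <- c("CPIH", "PDEB")
na_flags <- as.data.frame(is.na(toy[vars]))
na_by_year <- aggregate(na_flags, by = list(year = toy$year), FUN = mean)
print(na_by_year)
```

Applied to the real data, the same two steps would tell us which variables are incomplete and in which years the gaps are concentrated.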

Peeking at some variables

Let’s check our variable of interest, public debt, and see how it interacts with the interest rate.

ret <- prepare_by_group_bar_graph(dta_wide, by_var = "cntry", var = "PDEB",
                                  stat_fun = mean, order_by_stat = TRUE) 
ret$plot + ggtitle("Mean Public Debt") 

The good thing about the graph above is that it gives us a sense of how diverse mean public debt levels are across the EZ. The bad thing is that it tells us little about the dispersion of these values. To address this, we can make a violin plot to check the distributions.

ret <- prepare_by_group_violin_graph(dta_wide, by_var = "cntry", var = "PDEB",
                                     order_by_mean = TRUE)
ret
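If we also want the dispersion as numbers, a rough sketch in base R could look like the following. The data frame `toy` is hypothetical and only mimics the `cntry` and `PDEB` columns; the standard deviation is just one possible dispersion measure.

```r
# Toy data standing in for dta_wide (values are made up for illustration)
toy <- data.frame(
  cntry = c("A", "A", "A", "B", "B", "B"),
  PDEB  = c(50, 60, NA, 100, 100, 100)
)

# Mean and standard deviation of public debt by country, ignoring missing values
pdeb_mean <- tapply(toy$PDEB, toy$cntry, mean, na.rm = TRUE)
pdeb_sd   <- tapply(toy$PDEB, toy$cntry, sd, na.rm = TRUE)

disp <- data.frame(
  cntry = names(pdeb_mean),
  mean  = as.numeric(pdeb_mean),
  sd    = as.numeric(pdeb_sd)
)
print(disp)
```

Two countries with similar means can have very different standard deviations, which is exactly what the bar graph hides and the violin plot reveals.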

It might also be interesting to check the long-term interest rate.

ret <- prepare_by_group_bar_graph(dta_wide, by_var = "cntry", var = "LTIR",
                                  stat_fun = mean, order_by_stat = TRUE) 
ret$plot + ggtitle("Mean Long Term Nominal Interest Rate") 

We can also look at the time dimension of these variables, together with a sense of their dispersion.

graph <- prepare_trend_graph(dta_wide[c("year", "LTIR", "PDEB")], "year")
graph$plot
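For intuition about what a trend with dispersion summarises, here is a small base R sketch that computes a yearly mean and its standard error. The data frame `toy` and the helper `se()` are hypothetical, and taking the standard error as the dispersion measure is our assumption here, not a statement about how ExPanDaR computes its graph.

```r
# Toy panel standing in for dta_wide (values are made up for illustration)
toy <- data.frame(
  year = c(2000, 2000, 2000, 2001, 2001, 2001),
  LTIR = c(4, 5, 6, 3, 3, NA)
)

# Hypothetical helper: standard error of the mean, ignoring missing values
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Cross-sectional mean and standard error for each year
trend <- data.frame(
  year = sort(unique(toy$year)),
  mean = as.numeric(tapply(toy$LTIR, toy$year, mean, na.rm = TRUE)),
  se   = as.numeric(tapply(toy$LTIR, toy$year, se))
)
print(trend)
```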

Let’s talk correlation

ret <- prepare_correlation_graph(dta_wide)
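Since several of our variables have gaps, it is worth keeping in mind how missing values enter a correlation matrix. The sketch below uses base R's `cor()` with pairwise complete observations on a made-up data frame (`toy`); it is meant as an illustration of the idea, not as a description of what ExPanDaR does internally.

```r
# Toy data standing in for dta_wide (values are made up for illustration)
toy <- data.frame(
  LTIR = c(2, 4, 6, 8, NA),
  STIR = c(1, 2, 3, 4, 5),
  PDEB = c(90, 80, NA, 60, 50)
)

# Pearson correlations, using for each pair all rows where both variables are observed
corr <- cor(toy, use = "pairwise.complete.obs")
print(round(corr, 3))

# Number of observations behind each pairwise correlation
obs <- ifelse(is.na(as.matrix(toy)), 0, 1)
n_pairs <- crossprod(obs)
print(n_pairs)
```

With pairwise deletion each coefficient can rest on a different sample, so it helps to look at the number of observations next to the correlations before reading too much into them.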