TFM Dataset

Exploratory Analysis

To get familiar with the dataset, it helps to plot its different dimensions so that the underlying relationships are easier to see. For this we can use the ExPanDaR library, which is designed specifically for exploring panel data. This is just a first “play around” to get a feel for the data.

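# Load the libraries
library(ExPanDaR)
library(plyr)   # attach plyr before dplyr so it does not mask dplyr's verbs
library(dplyr)
library(ggplot2)
library(kableExtra)
# Load the data (a wide panel: one row per country and year)
dta_wide <- readRDS(url("https://github.com/AndresMtnezGlez/heptaomicron/raw/main/dta_wide.rds"))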

Variables’ units

The first thing we can do is produce a descriptive statistics table, which helps us understand the variables’ units and decide which transformations to apply.

# Build the descriptive table and style the kable it returns
t <- prepare_descriptive_table(dta_wide)
t$kable_ret %>% kable_styling("condensed", full_width = FALSE, position = "center")

Descriptive Statistics
	N	Mean	Std. dev.	Min.	25 %	Median	75 %	Max.
year	732	1,990.000	17.619	1,960.000	1,975.000	1,990.000	2,005.000	2,020.000
CPIH	355	85.059	14.739	40.903	72.905	86.132	99.462	110.152
DGDP	702	61.616	34.658	6.949	25.111	66.341	94.222	119.537
DMGT	701	137.596	218.756	0.159	8.456	48.412	168.434	1,454.179
DXGT	701	145.801	243.550	0.126	8.052	45.089	177.303	1,668.459
LTIR	594	7.257	4.178	0.090	4.310	6.565	9.278	27.740
PDEB	367	74.778	33.684	6.098	54.434	66.225	99.494	181.130
PGDP	613	574.182	677.564	12.798	136.478	240.323	774.272	3,040.828
POPU	732	24,618.448	25,591.056	497.800	6,670.152	10,360.400	47,010.279	83,124.069
REER	732	0.932	0.371	0.090	0.794	1.000	1.000	2.268
STIR	597	6.179	5.214	-0.330	2.330	4.630	9.350	24.560
TFFP	697	84.662	18.839	35.186	72.432	89.531	100.000	153.084
UVGD	701	403.214	647.889	0.271	25.454	140.612	391.462	3,602.780
id	732	6.500	3.454	1.000	3.750	6.500	9.250	12.000

Share of missing values

To decide which transformations we should or should not apply, we take a look at the share of missing values in our data. As we can see, several variables are seriously incomplete. The first and most important is CPIH, which lacks about 25 years of information. This means we should deflate using the GDP deflator whenever possible. PDEB is also incomplete; since it is our dependent variable, we should check carefully which countries lack data. Finally, there are missing observations for LTIR and STIR in 2020 and maybe 2019, so we might have to think about shortening the sample.

prepare_missing_values_graph(dta_wide, ts_id = "year")
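The graph shows the missing shares visually; to read off exact numbers, the same quantity can be tabulated by hand. The following is a minimal sketch in base R, not ExPanDaR's own implementation. The helper `missing_share()` and the `toy` data frame are hypothetical and made up for illustration; they are not part of the TFM data.

```r
# Made-up toy panel for illustration only (not the TFM data)
toy <- data.frame(
  year  = rep(2018:2020, times = 2),
  cntry = rep(c("A", "B"), each = 3),
  LTIR  = c(1.2, 0.9, NA, 2.1, 1.8, NA),
  PDEB  = c(60, NA, 65, 98, 97, 101)
)

# Hypothetical helper: fraction of missing cells per variable within each period
missing_share <- function(df, ts_id) {
  vars <- setdiff(names(df), ts_id)
  # is.na() on a data frame gives a logical matrix; the mean of a logical
  # column within a period is the share of missing values in that period
  na_flags <- as.data.frame(is.na(df[vars]))
  out <- aggregate(na_flags, by = list(df[[ts_id]]), FUN = mean)
  names(out)[1] <- ts_id
  out
}

miss <- missing_share(toy, "year")
print(miss)
```

Assuming the data are loaded as above, calling `missing_share(dta_wide, "year")` should give one row per year, which makes it easier to see in which years LTIR and STIR start to drop out.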

Peeking at some variables

Let’s look at our variable of interest, public debt, and see how it relates to the interest rate.

ret <- prepare_by_group_bar_graph(dta_wide, by_var = "cntry", var = "PDEB",
                                  stat_fun = mean, order_by_stat = TRUE) 
ret$plot + ggtitle("Mean Public Debt")

The good thing about the graph above is that it gives us a sense of how diverse mean public debt levels are across the EZ. The bad thing is that it tells us little about the dispersion of this value within each country. To address this, we can draw a violin plot and check the distributions.

ret <- prepare_by_group_violin_graph(dta_wide, by_var = "cntry", var = "PDEB",
                                     order_by_mean = TRUE)
ret
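The dispersion that the violin plot shows can also be summarised numerically. Below is a rough sketch in base R that computes the mean and standard deviation of a variable within each group. The helper `group_dispersion()` and the `toy` data frame are hypothetical and invented for this example.

```r
# Made-up toy data for illustration only (not the TFM data)
toy <- data.frame(
  cntry = rep(c("A", "B"), each = 4),
  PDEB  = c(50, 60, NA, 70, 100, 100, 100, 100)
)

# Hypothetical helper: mean and standard deviation of `var` within each group,
# ignoring missing values and ordered by the group mean
group_dispersion <- function(df, by_var, var) {
  groups <- split(df[[var]], df[[by_var]])
  out <- data.frame(
    group = names(groups),
    mean  = sapply(groups, mean, na.rm = TRUE),
    sd    = sapply(groups, sd, na.rm = TRUE),
    row.names = NULL
  )
  out[order(out$mean), ]
}

disp <- group_dispersion(toy, "cntry", "PDEB")
print(disp)
```

With the real data, `group_dispersion(dta_wide, "cntry", "PDEB")` would be the analogous call, assuming `cntry` and `PDEB` are the column names used in the plots above.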

It may also be interesting to look at the long-term interest rate.

ret <- prepare_by_group_bar_graph(dta_wide, by_var = "cntry", var = "LTIR",
                                  stat_fun = mean, order_by_stat = TRUE) 
ret$plot + ggtitle("Mean Long Term Nominal Interest Rate")

We can also look at the time dimension of these variables, together with a sense of their dispersion.

graph <- prepare_trend_graph(dta_wide[c("year", "LTIR", "PDEB")], "year")
graph$plot

Let’s talk correlation

ret <- prepare_correlation_graph(dta_wide)
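Because several variables have missing observations, it matters how incomplete rows are handled when computing correlations. As a hedged sketch, a plain Pearson correlation matrix can be computed in base R with pairwise deletion, so each pair of variables uses every row where both are observed. The `toy` data frame is made up for illustration, and this is not necessarily how ExPanDaR computes its graph.

```r
# Made-up toy data for illustration only (not the TFM data)
toy <- data.frame(
  cntry = c("A", "A", "A", "B", "B", "B"),
  LTIR  = c(1, 2, 3, 4, 5, NA),
  PDEB  = c(2, 4, 6, 8, NA, 12),
  STIR  = c(6, 5, 4, 3, 2, 1)
)

# Keep only numeric columns; identifiers such as the country code are dropped
num <- toy[sapply(toy, is.numeric)]

# Pairwise deletion: each correlation uses the rows where both variables exist
corr <- cor(num, use = "pairwise.complete.obs", method = "pearson")
print(round(corr, 2))
```

Pairwise deletion keeps more observations than dropping every incomplete row, at the cost that each coefficient may be based on a different subsample.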