Descriptive Statistics and Multilevel Linear Model Analysis

In this chapter, I evaluate the hypotheses developed in the theory chapter using descriptive statistics and multilevel modeling. Primarily, this analysis seeks to inform academic perspectives on the conditional relationship between international capital flows and domestic perceptions of political corruption in the MENA region.

The first set of analyses provides descriptive information on the variables of interest. In the second section, I estimate inferential statistics, starting with a basic (naive) model and building up to multilevel models.

I present descriptive statistics to clear the theoretical brush, and I estimate multilevel models to isolate the effect of time-invariant country characteristics that condition the relationship between finance capital and corruption.

Although political scientists frequently work with hierarchical or multilevel data structures, the specific models they employ vary widely. For the purposes of this analysis, consider the following model:

\[y_i = \beta_0 + \beta_1 x_i + \beta_2 z_j + \epsilon_i\]

This model combines factors from two levels of analysis: level-1 (\(x_i\)) and level-2 (\(z_j\)).
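One way to make the nesting explicit is sketched below. It is offered as an illustration, not as the specification estimated later in the chapter. It assumes country-years are indexed by \(i\) and countries by \(j\), and it adds a country-level error term \(u_j\); the double subscripts and \(u_j\) are notation introduced here, not taken from the model above.

\[y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 z_j + u_j + \epsilon_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \quad \epsilon_{ij} \sim N(0, \sigma_\epsilon^2)\]

In this form, \(x_{ij}\) varies within a country over time (for example, ka_open), while \(z_j\) is constant within a country (for example, regime).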

Dependent variable: Perceptions of Corruption

  • Variable name is wbgi_cce

    • It’s approximately normally distributed with a slight left skew (the mean is lower than the median)
    • Based on World Bank Index values ranging from -2 to 2.
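The mean-versus-median comparison in the bullet above can be checked numerically. The sketch below is illustrative only: it uses simulated values in place of data$wbgi_cce, and skewness() is a hypothetical helper defined here, not a base R function.

```r
# Illustrative only: simulated values stand in for data$wbgi_cce.
set.seed(42)
x <- 2 - rgamma(500, shape = 4, rate = 2)  # left-skewed by construction

# Moment-based sample skewness; negative values indicate a left skew.
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

# In a left-skewed sample the mean sits below the median.
c(mean = mean(x), median = median(x), skewness = skewness(x))
```

Applied to the real variable, the same three numbers would show how far the distribution departs from symmetry.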

The following code summarizes the dependent variable:

hist(data$wbgi_cce)
summary(data$wbgi_cce)

hist(data$wbgi_cce, main = NULL, xlab = "Perception of Corruption", ylab = "Frequency")

library(ggplot2)  # plotting functions used throughout this chapter

dv_summary <- ggplot(data, aes(x = wbgi_cce, y = country)) +
  geom_segment(aes(yend = country), xend = 0, colour = "grey50") +
  geom_point(size = 1) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank())  # no horizontal grid lines
dv_summary

# Corruption over time, one panel per country
corruption_year_panel <- ggplot(data = data) +
  geom_point(mapping = aes(year, wbgi_cce)) +
  facet_wrap(~country)

corruption_year_panel

Independent variable of interest: Capital Account Openness

  • Variable name is ka_open

    • Its distribution is irregular and striated rather than smooth
    • The variable is an index based on principal components of IMF AREAER data values

This code summarizes the independent variable of interest:

hist_IV <- hist(data$ka_open, xlab = "Capital Account Openness", ylab = "Frequency", main = NULL)

iv_summary <- ggplot(data, aes(x = ka_open, y = country)) +
  geom_segment(aes(yend = country), xend = 0, colour = "grey50") +
  geom_point(size = 1) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank())

iv_summary

Variation in confounding (or leveled) variables

This section introduces confounding variables to the analysis. Here’s what I’m working with now:

Continuous-scale Variables:

  1. Contract-intensive money, a proxy for property rights: cim and cim_sq
  2. Total flows of finance capital per year: finance_flows (used below as ln_finance_flows)

Factor variables:

  1. Political economy grouping (Cammett and Diwan): pol_econ_index
  2. Regime type (Henry and Springborg): regime
  3. Region from which most trade originates: tradelib_region
  4. Historical development trajectory: devtraj
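The two transformed variables that appear in the code below, cim_sq and ln_finance_flows, could be derived along the following lines. This is a minimal sketch under stated assumptions: the chapter does not show how they were built, so the natural-log transform and the handling of non-positive flows are guesses, and the toy data frame holds made-up values.

```r
# Toy data frame; column names follow the chapter, values are made up.
toy <- data.frame(
  country       = c("A", "A", "B", "B"),
  cim           = c(0.70, 0.75, 0.90, 0.92),
  finance_flows = c(120, 150, 900, 40)
)

# Squared term, allowing for a curvilinear effect of CIM.
toy$cim_sq <- toy$cim^2

# Natural log of flows (assumed transform); missing or non-positive
# flows are left as NA instead of producing -Inf or NaN.
pos <- !is.na(toy$finance_flows) & toy$finance_flows > 0
toy$ln_finance_flows <- NA_real_
toy$ln_finance_flows[pos] <- log(toy$finance_flows[pos])

toy
```

If net flows can be negative in the real data, a plain log discards those observations, so the choice of transform deserves a note wherever the variable is defined.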

1. Contract Intensive Money

This code summarizes the Contract Intensive Money variable, cim, and its squared variation, cim_sq.

cim_panel <- ggplot(data, aes(x = cim, y = country)) +
  geom_segment(aes(yend = country), xend = 0) +
  geom_point(size = 1) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank())

cim_panel

cim_sq_panel <- ggplot(data, aes(x = cim_sq, y = country)) +
  geom_segment(aes(yend = country), xend = 0) +  # restored: without these layers the plot is empty
  geom_point(size = 1) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank())

cim_sq_panel

2. Total Finance Flows

This code summarizes the ln_finance_flows variable.

finance_flows_panel <- ggplot(data, aes(x = ln_finance_flows, y = country)) +
  geom_segment(aes(yend = country), xend = 0) +
  geom_point(size = 1) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank())

finance_flows_panel

3. Political Economy grouping (Cammett and Diwan)

This section adds a factor variable, pol_econ_index, which indicates whether a country belongs to the RRLA (resource-rich, labor-abundant), RPLA (resource-poor, labor-abundant), or RRLP (resource-rich, labor-poor) political economy grouping.

The following table shows the relative distribution of the variable across countries in the region.

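| Country | Group |
|---|---|
| Algeria | RRLA |
| Bahrain | RRLP |
| Cyprus | |
| Egypt | RPLA |
| Iran | RRLA |
| Iraq | RRLA |
| Israel | |
| Jordan | RPLA |
| Kuwait | RRLP |
| Lebanon | RPLA |
| Libya | RRLP |
| Morocco | RPLA |
| Oman | RRLP |
| Qatar | RRLP |
| Saudi Arabia | RRLP |
| Sudan | |
| Syria | RRLA |
| Tunisia | RPLA |
| Turkey | RPLA |
| United Arab Emirates | RRLP |
| Yemen | RRLA |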
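The grouping above could be attached to the panel as a factor using a lookup table. The sketch below is one way to do it, not necessarily how the data set was actually built. The name pol_econ_lookup is hypothetical, and the reference level is set to RPLA on the assumption that it is the baseline, since the Model 1.3 output later in the chapter reports coefficients only for RRLA and RRLP.

```r
# Lookup transcribed from the table above; Cyprus, Israel, and Sudan have
# no grouping listed there, so they are coded NA.
pol_econ_lookup <- data.frame(
  country = c("Algeria", "Bahrain", "Cyprus", "Egypt", "Iran", "Iraq",
              "Israel", "Jordan", "Kuwait", "Lebanon", "Libya", "Morocco",
              "Oman", "Qatar", "Saudi Arabia", "Sudan", "Syria", "Tunisia",
              "Turkey", "United Arab Emirates", "Yemen"),
  pol_econ_index = c("RRLA", "RRLP", NA, "RPLA", "RRLA", "RRLA",
                     NA, "RPLA", "RRLP", "RPLA", "RRLP", "RPLA",
                     "RRLP", "RRLP", "RRLP", NA, "RRLA", "RPLA",
                     "RPLA", "RRLP", "RRLA"),
  stringsAsFactors = FALSE
)

# First level (RPLA) becomes the reference category in regressions.
pol_econ_lookup$pol_econ_index <- factor(pol_econ_lookup$pol_econ_index,
                                         levels = c("RPLA", "RRLA", "RRLP"))

# Count of countries per grouping, including unclassified ones.
group_counts <- table(pol_econ_lookup$pol_econ_index, useNA = "ifany")
group_counts
```

A call such as merge(data, pol_econ_lookup, by = "country") would then add the factor to every country-year row.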

4. Regime Type (Henry and Springborg)

This section adds a factor variable, regime, which indicates whether a country belongs to the bunker, bully, monarchy, GCC, or semi-democracy category. These designations are from [@Henry_and_Springborg2010].

The following table shows the relative distribution of the variable across countries in the region.

| Country | Group |
|---|---|
| Algeria | Bunker |
| Bahrain | GCC |
| Cyprus | |
| Egypt | Bully |
| Iran | Bully |
| Iraq | Bunker |
| Israel | Semi-democracy |
| Jordan | Monarchy |
| Kuwait | GCC |
| Lebanon | Semi-democracy |
| Libya | Bunker |
| Morocco | Monarchy |
| Oman | GCC |
| Qatar | GCC |
| Saudi Arabia | GCC |
| Sudan | Bunker |
| Syria | Bunker |
| Tunisia | Bully |
| Turkey | Semi-democracy |
| United Arab Emirates | GCC |
| Yemen | Bunker |

“The six bunker states – Algeria, Iraq, Libya, Sudan, Syria and Yemen – display the least institutional capacity of any of the MENA states to manage their economies…” [@Henry_and_Springborg2010, p 114].

“Egypt, Tunisia, and the area controlled by the Palestinian Authority are not ruled from bunkers by elites beholden to clans, tribes, or other traditional social formations. In the case of Egypt and Tunisia…the ruling elites are at once both more narrowly and broadly based. Their rule rests almost exclusively on the institutional power of the military/security/party apparatus, but because these elites are not drawn from a clearly identified social formation, they are at least not unrepresentative of their relatively homogenous political communities.” [@Henry_and_Springborg2010, p 162].

“The monarchies in the region are better positioned than praetorian republics to take advantage of the opportunities of globalization. They have more active private sectors, some of which have joint ventures and other constructive relationships with multinational companies, in petroleum-related industries for the most part. Many of them also have concentrated financial systems…that enable them to engage in a controlled liberalization consonant with the Washington Consensus.” [@Henry_and_Springborg2010, p 212].

5. Region from which most trade originates

The table below lists, for each country in the MENA region, the countries or regions that serve as its largest trade partners, measured by gross imports and exports.

I’ve included this information in the data set as two codings, one for the largest trade partner and one for the second-largest, each identified by region.

| Country | Primary Trade Partner | Secondary Trade Partner |
|---|---|---|
| Algeria | USA | EU |
| Bahrain | MENA | USA |
| Cyprus | EU | EU |
| Egypt | USA | EU |
| Iran | China | MENA |
| Iraq | USA | China |
| Israel | USA | EU |
| Jordan | MENA | USA |
| Kuwait | Japan | Korea |
| Lebanon | EU | USA |
| Libya | EU | EU |
| Morocco | EU | EU |
| Oman | MENA | China |
| Qatar | Japan | Korea |
| Saudi Arabia | USA | Japan |
| Sudan | China | MENA |
| Syria | | |
| Tunisia | EU | EU |
| Turkey | Russia | EU |
| United Arab Emirates | Japan | India |
| Yemen | China | MENA |

Goods, Value of Imports and Exports, USD

Source: IMF Direction of Trade Statistics (DOTS)

Note: The Direction of Trade Statistics (DOTS) present current figures on the value of merchandise exports and imports disaggregated according to a country’s primary trading partners.

Note: Imports are reported on a c.i.f. basis and exports are reported on a f.o.b. basis, with the exception of a few countries for which imports are also available free on board (f.o.b.).

6. Historical Development Trajectory Model

This factor variable, which I created using longitudinal clustering analysis, takes four values, as follows:

  • Consistently Open
  • Initially Open, then closing
  • Initially Closed and Quickly Opening
  • Initially Closed and Gradually Opening

Two countries are outliers from these categories - Syria and Oman.

Here’s a graphic of the development trajectories and a corresponding table:

| (1) Consistently Open | (2) Initially Open, then Closing | (3) Initially Closed and Quickly Opening | (4) Initially Closed and Gradually Opening | (5) Outliers |
|---|---|---|---|---|
| Bahrain | Lebanon | Jordan | Iraq | Oman |
| Qatar | Kuwait | Cyprus | Libya | Syria |
| UAE | Saudi Arabia | Egypt | Tunisia | |
| | | Israel | Turkey | |
| | | | Sudan | |
| | | | Iran | |
| | | | Algeria | |
| | | | Morocco | |

Relating Capital Openness to Corruption

This section contains code that looks at the co-variation of capital-account openness and corruption across countries.

corruption_ka_open_panel <- ggplot(data = data) +
  geom_point(mapping = aes(ka_open, wbgi_cce)) + 
  facet_wrap(~country)

corruption_ka_open_panel

NOTE - It might make sense to rearrange this material so that what’s shown below here moves to a different place (i.e., descriptives and inferentials)

Notes on Panel data estimation (from the plm documentation material)

plm is a general function for the estimation of linear panel models. It supports the following estimation methods:

  • Fixed effects (within)
  • Pooled OLS (pooling)
  • First-differences (fd)
  • Between effects model (between)
  • Random effects (AKA the error components model) (random)

It also supports unbalanced panels and two–way effects (although not with all methods).

For random effects models, four estimators of the transformation parameter are available by setting random.method to one of “swar” (Swamy and Arora (1972)) (default), “amemiya” (Amemiya (1971)), “walhus” (Wallace and Hussain (1969)), or “nerlove” (Nerlove (1971)).

For first–difference models, the intercept is maintained (which from a specification viewpoint amounts to allowing for a trend in the levels model). The user can exclude it from the estimated specification the usual way by adding “-1” to the model formula.

Instrumental variables estimation is obtained using two–part formulas, the second part indicating the instrumental variables used. This can be a complete list of instrumental variables or an update of the first part. If, for example, the model is y ~ x1 + x2 + x3, with x1 and x2 endogenous and z1 and z2 external instruments, the model can be estimated with:

  • formula = y ~ x1 + x2 + x3 | x3 + z1 + z2
  • formula = y ~ x1 + x2 + x3 | . - x1 - x2 + z1 + z2

Balestra and Varadharajan-Krishnakumar’s or Baltagi’s method is used if inst.method="bvk" or if inst.method="baltagi", respectively.

The Hausman–Taylor estimator is computed if model = "ht".
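To make the "within" estimator concrete, the sketch below reproduces it by hand in base R on a simulated panel. It is an illustration under stated assumptions, not the plm implementation: the countries, the true slope of -0.5, and all variable names are made up. It shows that subtracting each country's mean from the outcome and the predictor gives the same slope as a regression with a dummy variable for each country.

```r
# Simulated balanced panel: 5 hypothetical countries, 10 years each.
set.seed(1)
n_country <- 5
n_year <- 10
panel <- data.frame(
  country = rep(paste0("C", 1:n_country), each = n_year),
  year    = rep(2001:2010, times = n_country)
)
alpha   <- rep(rnorm(n_country, sd = 2), each = n_year)  # country effects
panel$x <- alpha + rnorm(nrow(panel))                    # x correlated with the effects
panel$y <- alpha - 0.5 * panel$x + rnorm(nrow(panel), sd = 0.1)

# Within transformation: subtract each country's own mean.
panel$x_w <- panel$x - ave(panel$x, panel$country)
panel$y_w <- panel$y - ave(panel$y, panel$country)

# Slope from the demeaned data (no intercept needed after demeaning).
b_within <- coef(lm(y_w ~ x_w - 1, data = panel))[["x_w"]]

# Equivalent least-squares dummy-variable (LSDV) estimate.
b_lsdv <- coef(lm(y ~ x + factor(country), data = panel))[["x"]]

c(within = b_within, lsdv = b_lsdv)
```

Because the transformation removes everything that is constant within a country, any time-invariant regressor is removed along with the country effects, which is why such variables cannot be estimated in a within model.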

Model 1: Basic FE model

Here’s the fixed-effects model looking at the naive relationship between the DV and IV:

est_wbgi_cce <- plm(wbgi_cce ~ ka_open,
                    data=data, 
                    index = c("country","year"),
                    model="within")

summary(est_wbgi_cce)$coef
##           Estimate Std. Error  t-value  Pr(>|t|)
## ka_open -0.1817211  0.1463699 -1.24152 0.2155963

Model 1.2: FE model adding the CIM intervening variable

This model simply adds the first confounding variable to the estimates.

est_wbgi_cce <- plm(wbgi_cce ~ ka_open + cim,
                    data=data,
                    index = c("country","year"),
                    model="within")

summary(est_wbgi_cce)$coef
##           Estimate Std. Error    t-value   Pr(>|t|)
## ka_open -0.3000633  0.1551125 -1.9344886 0.05443623
## cim      0.5155554  0.5649812  0.9125178 0.36257333

  • With cim_sq included, the coefficient for CIM is positive on the quadratic term and negative on the linear term; the combined effect is negative at the median and negative at the maximum.

  • At high levels of CIM (close to 1), the estimated effect is about -5. CIM predicts corruption over and above the variation that we would otherwise expect in wbgi_cce.

  • Presentation thought: try using the manipulate() function to talk about CIM
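The sign pattern described in these notes can be read off a marginal-effect expression. The sketch below assumes a specification that enters both cim and cim_sq, which Model 1.2 as printed above does not; the symbols \(y\) (for wbgi_cce), \(c\) (for cim), \(\gamma_1\), and \(\gamma_2\) are introduced here for illustration only.

\[y = \dots + \gamma_1 c + \gamma_2 c^2 + \dots \quad\Longrightarrow\quad \frac{\partial y}{\partial c} = \gamma_1 + 2\gamma_2 c\]

With \(\gamma_1 < 0\) and \(\gamma_2 > 0\), the marginal effect is negative whenever \(c < -\gamma_1 / (2\gamma_2)\). A negative effect at both the median and the maximum of CIM would therefore mean that this turning point lies above the largest observed value of CIM.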

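Model 1.3: FE Model with CIM and Political Economy Grouping

This model adds the first two confounding variables to the estimates. Note that the code below uses the between estimator (model="between"): pol_econ_index does not vary within a country over time, so a within model could not estimate its coefficients.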

est_wbgi_cce <- plm(wbgi_cce ~ ka_open + cim + factor(pol_econ_index),
                    data=data,
                    index = c("country","year"),
                    model="between")

summary(est_wbgi_cce)$coef
##                              Estimate Std. Error    t-value  Pr(>|t|)
## (Intercept)                -2.2677664  1.3586213 -1.6691674 0.1209432
## ka_open                     0.3320296  0.3424033  0.9697031 0.3513342
## cim                         2.1280893  1.5689897  1.3563437 0.1999609
## factor(pol_econ_index)RRLA -0.1536806  0.3600329 -0.4268514 0.6770457
## factor(pol_econ_index)RRLP  0.3480631  0.2631282  1.3227890 0.2105602

Model 1.4: FE model with all the confounders

Here I estimate the following model:

Governance = Banking confidence + Capital Account Openness + FDI flows + Trade partners + Domestic Contestation

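I need to troubleshoot this further, however. The original call did not run: it passed model = within unquoted, whereas plm expects the quoted string "within". The version below quotes it and adds the panel index used in the earlier models. Time-invariant factors such as regime are collinear with the country fixed effects, so a within model cannot estimate their coefficients.

est_wbgi_cce <- plm(wbgi_cce ~ ka_open + cim_sq + regime + ln_finance_flows +
                      tradelib_region_primary + contest_ln_ttl,
                    data = data,
                    index = c("country", "year"),
                    model = "within")  # the model name must be a quoted string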

summary(est_wbgi_cce)$coef

Model 2: Multi-level model

MLM Step 1: Obtain group means

Collecting group means (i.e., calculating the mean separately for each country, which corresponds to a no-pooling estimate and not a completely pooled one) constitutes the first step in constructing an MLM. Group-mean centering then expresses each observation as a deviation from its own country's mean. With group-mean-centered predictors, the intercept gives the expected value of the outcome at the average level of each predictor for each unit of analysis (the country level in this case).

groupmean_wbgi_cce <- aggregate(data$wbgi_cce, list(data$country), FUN = mean, na.rm = TRUE)

names(groupmean_wbgi_cce)<- c('country','groupmean_wbgi_cce')

groupmean_wbgi_cce
##                 country groupmean_wbgi_cce
## 1               Algeria        -0.62459236
## 2               Bahrain         0.33695715
## 3                Cyprus                NaN
## 4                 Egypt        -0.48311674
## 5                  Iran        -0.61493159
## 6                  Iraq        -1.40291764
## 7                Israel         0.96396438
## 8                Jordan         0.16050359
## 9                Kuwait         0.60497866
## 10              Lebanon        -0.66427816
## 11                Libya        -0.98591404
## 12              Morocco        -0.17406009
## 13                 Oman         0.36900749
## 14                Qatar         0.89276696
## 15         Saudi Arabia        -0.22797188
## 16                Sudan        -1.23452989
## 17                Syria        -0.88039590
## 18              Tunisia        -0.01988032
## 19               Turkey        -0.12692156
## 20 United Arab Emirates         0.83558780
## 21                Yemen        -0.88545023

MLM Step 2: Group-mean centering

Group-mean centering the measure of corruption, wbgi_cce, provides an estimate of the expected outcome for each country relative to the average level of corruption in the pooled model. Adding an additional level to the model, e.g., political economy grouping, allows one to calculate country-level averages within their respective group along with deviations from a grand mean.

The aggregate function aggregates the variable of interest (in the above case, perceptions of corruption) by group. Since the subsequent analyses focus on country as the unit of analysis, I aggregate by country, the variable that holds the country name in the dataset. When computing group means, I aggregate across whatever group or variable I’ve assigned as the unit of analysis. The names function then labels the columns of the result, so that the group mean is stored as groupmean_wbgi_cce alongside the country identifier.

The following code repeats this process for the two continuous-scaled explanatory variables, cim (through its squared term, cim_sq) and ln_finance_flows.

# Note: groupmean_cim holds the country mean of the squared term, cim_sq
groupmean_cim <- aggregate(data$cim_sq, list(data$country), FUN = mean, na.rm = TRUE)
names(groupmean_cim) <- c('country', 'groupmean_cim')

groupmean_ln_finance_flows <- aggregate(data$ln_finance_flows, list(data$country), FUN = mean, na.rm = TRUE)
# Rename the columns of the aggregated result, not of the original data column
names(groupmean_ln_finance_flows) <- c('country', 'groupmean_ln_finance_flows')

The next step is to merge the group-mean data with the overall dataset in order to use group-mean-centered variables in subsequent analyses.
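The merge and centering step could look like the following. This is a minimal sketch on a toy panel with made-up values; the centered column name wbgi_cce_c is hypothetical, and the same pattern would apply to the other group-mean tables.

```r
# Toy panel; values are made up for illustration.
toy <- data.frame(
  country  = c("A", "A", "A", "B", "B", "B"),
  year     = rep(2001:2003, times = 2),
  wbgi_cce = c(-0.5, -0.4, NA, 0.8, 0.9, 1.0)
)

# Step 1: country means, computed the same way as above.
gm <- aggregate(toy$wbgi_cce, list(toy$country), FUN = mean, na.rm = TRUE)
names(gm) <- c("country", "groupmean_wbgi_cce")

# Step 2: merge the means back onto every country-year row.
toy <- merge(toy, gm, by = "country")

# Step 3: centre each observation on its own country's mean.
toy$wbgi_cce_c <- toy$wbgi_cce - toy$groupmean_wbgi_cce

toy
```

Keeping both the group mean and the centered value in the data makes it possible to enter them as separate predictors, so that between-country and within-country variation can be estimated separately.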

MLM Step 3: