Chapter 2: Quantitative Data Description and Analysis

This chapter introduces the descriptive and inferential statistical models I employ to evaluate the hypotheses developed in the previous chapter. In doing so, I build on the theoretical framework introduced there to draw links between the variables I use to quantify phenomena of interest and the underlying concepts for which they serve as observable implications.

Given the occasionally opaque and often proprietary nature of the predominantly financial data I analyze, this chapter also includes a short discussion of the integrity of these data as they relate to the concepts they ostensibly reify.

Indicators of Interest and their Operationalization:

1. Financial Integration

I examine international financial integration as a key variable in this dissertation. This variable consists of two distinct components: First, it captures government constraints on cross-border economic behavior and, particularly, on trade and international capital markets and investment. Second, it implies a quantification of cross-border economic activity. While the first aspect of the variable takes domestic policy choices regarding international marketization into consideration, the second aspect of the variable focuses on measures of trade and capital flows. This section describes trends in these variables in the Middle East.

To my knowledge, no single measure encompasses these different aspects of economic integration. Instead, I draw on several data sources to measure the concept.

De jure integration

First, I measure de jure international economic liberalization using the Chinn and Ito (2009) index of capital account openness. This index aggregates information on the extent of state capital controls pertaining to FDI, portfolio investment, reserve accounts, and foreign investment in the domestic economy for the countries of the world from 1970 onward.

An examination of trends in this index shows that in the 1970s and 1980s the MENA region was divided between highly liberalized and mostly closed markets. Countries along the Persian Gulf, along with Lebanon, comprised the former category, while the rest of the region occupied the latter. By the 1990s, however, many previously closed countries had opened their markets to foreign investment. In some cases, such as Egypt, Jordan, Israel, and Yemen, capital account liberalization was dramatic. In others, such as Cyprus, Iraq, Iran, and Sudan, liberalization occurred during the 2000s and to a lesser extent. The following figures show these trends as average Chinn-Ito index scores of capital account openness by decade, 1970-2007. Higher values of the index indicate greater capital account openness.

[Note: Insert development trajectory model information here]

De facto integration

As shown in the following figure, during the 1970s there was relatively little variation in capital flows among the countries of the MENA region. By the 1980s, however, countries exhibited wide diversity in capital flows relative to the size of their economies. Several entrepôt countries (notably Qatar, Bahrain, and Kuwait) experienced a massive expansion in capital flows. For example, while throughout the 1970s Bahrain’s gross capital flows were the equivalent of 90 percent of its GDP, by 1989 they had grown to 3,400 percent of GDP. Most of the other countries in the region experienced comparatively modest growth in foreign assets and liabilities. By the 2000s, however, only three countries averaged gross capital flows smaller than their GDP; the other 18 countries of the MENA region averaged gross capital flows ranging from 115 percent to over 800 percent of GDP. Figure 2 shows these trends. Most recently, the MENA region has experienced a surge of foreign capital, both in absolute terms and relative to other regions of the world. As described by (???), “global FDI inflows have increased rapidly during the last decade by a rate of 141 percent, from $700 billion in 1998 to $1.7 trillion in 2008. FDI inflows in the Middle East, however, have increased at the phenomenal rate of 630 percent, twelve times faster than the global FDI.”

While the Chinn-Ito index evaluates the extent to which governments are open to international investment, it says nothing about the actual, de facto movement of capital. To measure these capital flows, I turn to data from the External Wealth of Nations project (Lane and Milesi-Ferretti 2007). This data set includes information on gross external liabilities and assets for 145 countries covering the period 1970–2004. Additionally, it contains information about “the composition of international financial positions, including FDI, portfolio equity investment, external debt, and official reserves” (26). The data set also accounts for problems common to country-level data, “such as valuation effects, and also corrects for some differences across countries in data definitions and variable construction” (26-27).

These figures show average gross financial flows as a percentage of GDP by decade, 1970-2007. The top figure excludes Qatar, and the bottom figure excludes Qatar, Bahrain, and Cyprus; in these countries, average gross capital flows exceeded 500 percent of GDP.
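The decade averages behind these figures can be sketched in a few lines of base R. This is a minimal sketch under stated assumptions. It assumes the measure is the sum of gross foreign assets and liabilities divided by GDP, a common way of summarizing the External Wealth of Nations data. The column names (assets, liabilities, gdp) and all values are hypothetical. The last step mirrors the exclusion of countries whose decade average exceeds 500 percent of GDP.

```r
# Hypothetical country-year data (values invented for illustration)
ewn <- data.frame(
  country     = c("A", "A", "B", "B"),
  year        = c(1995, 1999, 1995, 1999),
  assets      = c(50, 70, 400, 600),   # gross foreign assets
  liabilities = c(40, 60, 300, 500),   # gross foreign liabilities
  gdp         = c(100, 120, 100, 110)  # nominal GDP, same units
)

# Gross external position as a percentage of GDP
ewn$gross_pct_gdp <- 100 * (ewn$assets + ewn$liabilities) / ewn$gdp

# Map each year to its decade (e.g., 1999 -> 1990) and average by country
ewn$decade <- 10 * (ewn$year %/% 10)
avg <- aggregate(gross_pct_gdp ~ country + decade, data = ewn, FUN = mean)

# Set aside countries whose decade average exceeds 500 percent of GDP
outliers <- unique(avg$country[avg$gross_pct_gdp > 500])
plotted  <- avg[!avg$country %in% outliers, ]
plotted
```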

De facto Integration Conditional on Regime Type

[NOTE: this section probably won’t stay here]

To assist with meaningful cross-national comparisons, I examine gross financial flows as a percentage of country GDP. These data are presented in Figure 3 by country and decade and in Figure 4 by regime type over time.

Adjudicating measures of financial integration

Measurements of de jure international financial liberalization, operationalized through the Chinn-Ito measure of capital account openness, motivate much of the research for this dissertation. There are three primary ways one could reasonably criticize this choice.

First, de jure measures of capital account openness reflect legal restrictions (or a lack thereof) on capital movements. The collateral risks or benefits of these restrictions are likely to be captured more accurately by measures of de facto integration, which can differ substantially. Additionally, the Chinn-Ito measure does not capture the degree to which capital controls are enforced; many countries have capital controls that are strict in statute but not in practice. Similarly, capital account liberalization is neither a necessary nor a sufficient condition for changes in capital flows. No de jure measure of capital account openness fully captures the actual degree to which an economy is integrated into international capital markets.

The choice between de jure and de facto measures of financial integration is a serious one. It should be justified on the basis of how much leverage each measure provides for addressing the theoretically substantive questions raised in my thesis. As the project moves forward, there will be situations where it is more appropriate to use one measure rather than the other. At the same time, much of what drives de facto financial capital flows is exogenous to a given country: the international economy as a whole governs the direction and magnitude of these capital streams. For this reason, I feel relatively confident about the choice to focus initially on de jure capital account openness, since doing so gives greater insight into the internal, political dynamics of a country. Additionally, with respect to case selection, capital account policies and financial capital flows in Kuwait and Jordan move in a roughly parallel manner.

Another potential criticism is that, despite the extensive coverage of the Chinn-Ito index, other regulations could effectively act as capital controls without being counted as such. For example, “regulations that limit the foreign exchange exposure of domestic banks could, in some circumstances, have the same effect as capital controls.” There is little I can do to remedy this shortcoming of the Chinn-Ito index, or of de jure measures in general. However, as I learn more about individual cases, I should be able to gauge the extent to which capital controls, as opposed to other forms of regulation, govern the movement of foreign investment.

Third, the Chinn-Ito index largely reflects the ease with which private capital flows enter and leave a country. It does not encompass official flows, including foreign aid, or other flows such as remittances (which are recorded in the current account of the balance of payments rather than in the capital and financial accounts). For some purposes, it will be preferable to examine only financial investments, consisting of FDI, portfolio equity inflows, debt inflows, and foreign borrowing by domestic banks. For other purposes, however, I will want to include other flows as well.

The real difficulty, I believe, lies in understanding situations in which private capital and official capital substitute for one another. For example, Saudi Arabia provides development aid to Jordan. Some of this occurs through official development assistance programs, but the Saudi government also invests in state-owned Jordanian companies as another means of providing support (and maintaining leverage). Additionally, remittances and cross-border migrant employment play a major role in the economies of the Middle East. In the poorer countries of the region, such as Jordan, Oman, and Yemen, remittances have historically accounted for up to 50 percent of a country’s GDP.

It’s often possible to describe, qualitatively, cases in which private capital and official capital flows substitute for one another, but in many circumstances the details of how much money is involved and where exactly it comes from and goes to are unclear. I don’t know if scholars have attempted to quantify the degree to which different types of capital flows substitute for one another in a systematic way, but it may be worth investigating.

Descriptive Statistics and Multilevel Linear Model Analysis

In this section, I evaluate the hypotheses developed in the theory chapter by using descriptive statistics, a fixed-effects specification of panel data relationships, and a set of multilevel modeling techniques to adjudicate among between-country differences. Primarily, this analysis seeks to inform academic perspectives on the conditional relationship that exists between international capital flows and domestic perceptions of political corruption in the MENA region.

The first set of analyses provides descriptive information on the variables of interest. In the second, I estimate inferential statistical models, starting with a basic (naive) model and building up to multilevel models.

I present descriptive statistics to clear the theoretical brush, and I estimate multilevel models to isolate the effect of time-invariant country characteristics that condition the relationship between finance capital and corruption.

Although political scientists frequently work with hierarchical or multilevel data structures, the specific models they employ vary widely. For the purposes of this analysis, consider the following model:

\[y_i = \beta_0 + \beta_1 x_i + \beta_2 z_j + \epsilon_i\]

This model combines two levels of analysis: level-1 factors (\(x_i\)), which vary across observations (country-years), and level-2 factors (\(z_j\)), which vary only across groups (countries).
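One way to make the two levels explicit is to write the model in varying-intercept form. The notation \(j[i]\) for the country to which observation \(i\) belongs is a notational assumption following common multilevel-modeling convention; it is not taken from the text. Substituting the level-2 equation into the level-1 equation recovers the single-level model above plus a country-level error term \(\eta_{j[i]}\). The single-level specification omits that term.

\[
\begin{aligned}
y_i &= \alpha_{j[i]} + \beta_1 x_i + \epsilon_i, & \epsilon_i &\sim N(0, \sigma_y^2)\\
\alpha_j &= \beta_0 + \beta_2 z_j + \eta_j, & \eta_j &\sim N(0, \sigma_\alpha^2)\\
\Rightarrow\quad y_i &= \beta_0 + \beta_1 x_i + \beta_2 z_{j[i]} + \eta_{j[i]} + \epsilon_i
\end{aligned}
\]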

Dependent variable: Perceptions of Corruption

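  • Variable name is wbgi_cce

    • It is roughly bell-shaped but skewed left (the mean is lower than the median)
    • It is the World Bank Worldwide Governance Indicators control of corruption estimate, which ranges from approximately -2.5 to 2.5; higher values indicate greater control of corruption (less perceived corruption)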

The following code summarizes the dependent variable:

library(ggplot2)

summary(data$wbgi_cce)

hist(data$wbgi_cce, main = NULL, xlab = "Control of Corruption (WGI estimate)", ylab = "Frequency")

dv_summary <- ggplot(data, aes(x = wbgi_cce, y = country)) +
  geom_segment(aes(yend = country), xend = 0, colour = "grey50") +
  geom_point(size = 1) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank())  # no horizontal grid lines
dv_summary

corruption_panel <- ggplot(data = data) +
  geom_point(mapping = aes(year, wbgi_cce)) + 
  facet_wrap(~country)
corruption_panel
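The left skew noted above can be checked numerically by comparing the mean with the median and computing a simple moment-based skewness coefficient, where negative values indicate a longer left tail. This is a minimal base-R sketch. The vector below is invented; with the real data one would pass data$wbgi_cce instead.

```r
# Moment-based sample skewness: mean cubed deviation over the cubed (population) SD
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

x <- c(-2, 0.5, 0.6, 0.7, 0.8)  # invented values with a long left tail
c(mean = mean(x), median = median(x), skewness = skewness(x))
```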

Independent variable of interest: Capital Account Openness

  • Variable name is ka_open

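    • Its distribution is irregular and striated, with observations clustered at a limited number of values
    • The variable is an index based on the first principal component of indicators from the IMF’s Annual Report on Exchange Arrangements and Exchange Restrictions (AREAER)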

This code summarizes the independent variable of interest:

hist_IV <- hist(data$ka_open, main = NULL, xlab = "Capital Account Openness (Chinn-Ito index)", ylab = "Frequency")

iv_summary <- ggplot(data, aes(x = ka_open, y = country)) +
  geom_segment(aes(yend = country), xend = 0, colour = "grey50") +
  geom_point(size = 1) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank())  # no horizontal grid lines

iv_summary

ka_open_panel <- ggplot(data = data) +
  geom_point(mapping = aes(year, ka_open)) +  # plot the IV, not the DV
  facet_wrap(~country)

ka_open_panel

Variation in confounding (or leveled) variables

This section introduces confounding variables to the analysis. Here’s what I’m working with now:

Continuous-scale Variables:

  1. Contract intensive money; proxy for property rights: cim and cim_sq
  2. Total flows of finance capital per year (logged as ln_finance_flows): finance_flows

Factor variables:

[NOTE: I would like to present trends in factor variables in a better way - I’ll probably use color-coded regional maps]

  1. Political economy grouping: pol_econ_index (Cammett and Diwan)
  2. Regime type (Henry and Springborg): regime
  3. Region from which most trade originates: tradelib_region
  4. Historical development trajectory - devtraj.

1. Contract Intensive Money

This code summarizes the contract-intensive money variable, cim. The data set also includes its square, cim_sq.

cim_panel <- ggplot(data, aes(x = cim, y = country)) +
  geom_segment(aes(yend = country), xend = 0) +
  geom_point(size = 1) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank())  # no horizontal grid lines
  
cim_panel

cim_year <- ggplot(data = data) +
  geom_point(mapping = aes(year, cim)) + 
  facet_wrap(~country)

cim_year
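Contract-intensive money is conventionally defined as the share of broad money (M2) not held as currency outside banks, that is, (M2 - currency outside banks) / M2. Higher values indicate greater willingness to hold assets in contract-dependent forms. The sketch below assumes that definition. The column names m2 and currency and their values are hypothetical. It also shows one way the squared term cim_sq can be constructed.

```r
# Hypothetical monetary aggregates (local currency units, invented values)
money <- data.frame(
  country  = c("A", "B"),
  m2       = c(1000, 500),  # broad money
  currency = c(150, 250)    # currency held outside banks
)

# CIM: share of M2 held in bank-based (contract-dependent) forms
money$cim <- (money$m2 - money$currency) / money$m2

# Squared term, allowing a non-linear relationship in the models
money$cim_sq <- money$cim^2
money
```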

2. Total Finance Flows

This code summarizes the ln_finance_flows variable.

finance_flows_panel <- ggplot(data, aes(x = ln_finance_flows, y = country)) +
  geom_segment(aes(yend = country), xend = 0) +
  geom_point(size = 1) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank())  # no horizontal grid lines
  
finance_flows_panel

finance_flows_year <- ggplot(data = data) +
  geom_point(mapping = aes(year, ln_finance_flows)) + 
  facet_wrap(~country)

finance_flows_year

3. Political Economy grouping (Cammett and Diwan)

This section adds a factor variable, pol_econ_index, which indicates whether each country belongs to the resource-rich, labor-abundant (RRLA); resource-poor, labor-abundant (RPLA); or resource-rich, labor-poor (RRLP) political economy grouping. See (???).

The following table shows each country’s assignment; Cyprus, Israel, and Sudan have no assignment.

Country Group
Algeria RRLA
Bahrain RRLP
Cyprus
Egypt RPLA
Iran RRLA
Iraq RRLA
Israel
Jordan RPLA
Kuwait RRLP
Lebanon RPLA
Libya RRLP
Morocco RPLA
Oman RRLP
Qatar RRLP
Saudi Arabia RRLP
Sudan
Syria RRLA
Tunisia RPLA
Turkey RPLA
United Arab Emirates RRLP
Yemen RRLA
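Because pol_econ_index varies only across countries, it enters the analysis as a level-2 variable attached to every country-year. The following is a minimal sketch of that step. It assumes a lookup table keyed on country; the three groupings are copied from the table above, and the toy panel is hypothetical. Countries without a grouping, such as Cyprus, receive NA.

```r
# Country-level lookup (three rows copied from the table above)
pol_econ <- data.frame(
  country        = c("Algeria", "Bahrain", "Egypt"),
  pol_econ_index = c("RRLA", "RRLP", "RPLA")
)

# Hypothetical country-year panel; Cyprus has no grouping in the table
panel <- data.frame(
  country = c("Algeria", "Algeria", "Cyprus", "Egypt"),
  year    = c(2000, 2001, 2000, 2000)
)

# Left join keeps every country-year, assigning NA where no grouping exists
panel <- merge(panel, pol_econ, by = "country", all.x = TRUE)

# Store as a factor with an explicit level order
panel$pol_econ_index <- factor(panel$pol_econ_index,
                               levels = c("RRLA", "RPLA", "RRLP"))
table(panel$pol_econ_index, useNA = "ifany")
```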

4. Regime Type (Henry and Springborg)

This section adds a factor variable, regime, which indicates whether each country belongs to the bunker, bully, monarchy, GCC, or semi-democracy category. These designations are from Henry and Springborg (2010).

The following table shows each country’s assignment.

Country Group
Algeria Bunker
Bahrain GCC
Cyprus
Egypt Bully
Iran Bully
Iraq Bunker
Israel Semi-democracy
Jordan Monarchy
Kuwait GCC
Lebanon Semi-democracy
Libya Bunker
Morocco Monarchy
Oman GCC
Qatar GCC
Saudi Arabia GCC
Sudan Bunker
Syria Bunker
Tunisia Bully
Turkey Semi-democracy
United Arab Emirates GCC
Yemen Bunker

“The six bunker states – Algeria, Iraq, Libya, Sudan, Syria and Yemen – display the least institutional capacity of any of the MENA states to manage their economies…” (Henry and Springborg 2010, p 114).

“Egypt, Tunisia, and the area controlled by the Palestinian Authority are not ruled from bunkers by elites beholden to clans, tribes, or other traditional social formations. In the case of Egypt and Tunisia…the ruling elites are at once both more narrowly and broadly based. Their rule rests almost exclusively on the institutional power of the military/security/party apparatus, but because these elites are not drawn from a clearly identified social formation, they are at least not unrepresentative of their relatively homogenous political communities.” (Henry and Springborg 2010, p 162).

“The monarchies in the region are better positioned than praetorian republics to take advantage of the opportunities of globalization. They have more active private sectors, some of which have joint ventures and other constructive relationships with multinational companies, in petroleum-related industries for the most part. Many of them also have concentrated financial systems…that enable them to engage in a controlled liberalization consonant with the Washington Consensus.” (Henry and Springborg 2010, p 212).

5. Region from which most trade originates

The following table lists the country or region that serves as the largest and second-largest trade partner (measured by gross imports and exports) for each country in the MENA region:

I include this information in the data set coded in two ways, identifying each country’s two largest trade partners by region based on the gross value of merchandise imports and exports.

Country Primary Trade Partner Secondary Trade Partner
Algeria USA EU
Bahrain MENA USA
Cyprus EU EU
Egypt USA EU
Iran China MENA
Iraq USA China
Israel USA EU
Jordan MENA USA
Kuwait Japan Korea
Lebanon EU USA
Libya EU EU
Morocco EU EU
Oman MENA China
Qatar Japan Korea
Saudi Arabia USA Japan
Sudan China MENA
Syria
Tunisia EU EU
Turkey Russia EU
United Arab Emirates Japan India
Yemen China MENA

Source: IMF Direction of Trade Statistics (DOTS)

Note: DOTS present current figures on the value of merchandise exports and imports disaggregated according to a country’s primary trading partners.

Note: Imports are reported on a c.i.f. basis and exports are reported on a f.o.b. basis, with the exception of a few countries for which imports are also available free on board (f.o.b.).
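The primary and secondary partner codes can be derived from bilateral trade totals such as those in DOTS. The sketch below is one possible approach, not the coding procedure used for the table. It assumes a table of one reporter's merchandise trade by partner and a hypothetical lookup that labels partner countries by region (EU, MENA) where applicable. It ranks partners by gross trade (exports plus imports) and reports the labels of the top two. All values are invented.

```r
# Hypothetical bilateral merchandise trade for one reporting country (USD millions)
trade <- data.frame(
  partner = c("Greece", "Italy", "China", "Israel"),
  exports = c(400, 100, 50, 80),
  imports = c(900, 500, 300, 150)
)

# Hypothetical lookup collapsing partner countries into the labels used in the table
partner_label <- c(Greece = "EU", Italy = "EU", China = "China", Israel = "MENA")

top_two_partners <- function(df, lookup) {
  df$gross <- df$exports + df$imports            # gross trade with each partner
  df <- df[order(df$gross, decreasing = TRUE), ] # largest partner first
  labels <- unname(lookup[df$partner])
  c(primary = labels[1], secondary = labels[2])
}

top_two_partners(trade, partner_label)
```

Ranking partner countries before labeling them means two members of the same region can fill both slots, as in the "EU EU" entries above. Ranking aggregated regions instead would be an alternative design.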

6. Historical Development Trajectory Model

This factor variable, which I created using longitudinal clustering analysis, takes four values, described as follows:

  • Consistently Open
  • Initially Open, then closing
  • Initially Closed and Quickly Opening
  • Initially Closed and Gradually Opening

Two countries, Syria and Oman, are outliers that fit none of these categories.

The following graphic and table present the development trajectories. The final section of this draft describes how I constructed this model.

  • (1) Consistently Open: Bahrain, Qatar, UAE
  • (2) Initially Open, then Closing: Lebanon, Kuwait, Saudi Arabia
  • (3) Initially Closed and Quickly Opening: Jordan, Cyprus, Egypt, Israel
  • (4) Initially Closed and Gradually Opening: Iraq, Libya, Tunisia, Turkey, Sudan, Iran, Algeria, Morocco
  • (5) Outliers: Oman, Syria

Relating Capital Openness to Corruption

This section contains code that examines the co-variation of capital account openness and corruption across countries.

NOTE - It might make sense to rearrange this material so that what’s shown below here moves to a different place (i.e., descriptives and inferentials)

Note: clean up section with actual panel model write up (once I find it)


Model 1: Basic FE model

The following fixed-effects (within) model estimates the naive relationship between the DV and the IV:

[NOTE that I need to add a Hausman test here to justify using an illustrative FE model]

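library(plm)

# Within (fixed-effects) estimator: country effects absorb time-invariant differences
est_wbgi_cce <- plm(wbgi_cce ~ ka_open,
                    data = data,
                    index = c("country", "year"),
                    model = "within")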

summary(est_wbgi_cce)$coef
##           Estimate Std. Error  t-value  Pr(>|t|)
## ka_open -0.1817211  0.1463699 -1.24152 0.2155963

Without controls for potential confounders and using only within-country variation, the relationship between perceived corruption and capital account openness appears indeterminate over the period for which corruption data are available. This indeterminacy may disappear, however, when conditioning on other factors.
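What the within estimator does can be illustrated in base R. It subtracts each country's mean from both the outcome and the predictor, then regresses the demeaned outcome on the demeaned predictor. The resulting slope equals the one from a least-squares regression with a dummy variable for each country. The simulated panel below is hypothetical and only illustrates the mechanics; it is not the MENA data.

```r
set.seed(1)
# Simulated panel: 5 countries x 10 years with country-specific intercepts
n_c <- 5; n_t <- 10
sim <- data.frame(
  country = rep(paste0("C", 1:n_c), each = n_t),
  year    = rep(1:n_t, times = n_c)
)
alpha <- rep(rnorm(n_c, sd = 2), each = n_t)        # country effects
sim$ka_open  <- alpha + rnorm(n_c * n_t)            # predictor correlated with them
sim$wbgi_cce <- alpha - 0.3 * sim$ka_open + rnorm(n_c * n_t, sd = 0.5)

# Within transformation: deviations from each country's mean
demean <- function(v, g) v - ave(v, g)
y_w <- demean(sim$wbgi_cce, sim$country)
x_w <- demean(sim$ka_open, sim$country)
beta_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

# Equivalent least-squares dummy-variable (LSDV) regression
beta_lsdv <- coef(lm(wbgi_cce ~ ka_open + factor(country), data = sim))[["ka_open"]]

c(within = beta_within, lsdv = beta_lsdv)
```

The point estimates coincide. Naive standard errors from the demeaned regression, however, ignore the degrees of freedom used by the country means, which plm's within estimator accounts for.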

Model 1.2: A FE model adding the CIM intervening variable

As hypothesized in the previous chapter, adding a potentially confounding variable (cim in this case) to the model changes the estimated relationship between capital account openness and governance outcomes. The coefficient on cim itself is positive but imprecisely estimated (p ≈ 0.36).

With the inclusion of cim in the model (i.e., by conditioning on contract-intensive money, a proxy for the general credibility of a country’s economic institutions and property rights), the estimated coefficient on capital account openness grows in magnitude from about -0.18 to -0.30 and approaches conventional statistical significance (p ≈ 0.054).

est_wbgi_cce <- plm(wbgi_cce ~ ka_open + cim,
                    data=data,
                    index = c("country","year"),
                    model="within")

summary(est_wbgi_cce)$coef
##           Estimate Std. Error    t-value   Pr(>|t|)
## ka_open -0.3000633  0.1551125 -1.9344886 0.05443623
## cim      0.5155554  0.5649812  0.9125178 0.36257333

Ceteris paribus, these estimates suggest that greater capital account openness is associated with higher perceived corruption (lower wbgi_cce scores), although the effect is modest and significant only at the 10 percent level. Other things, however, are rarely equal in the study of political phenomena. The above fixed-effects models suggest the existence of a sorting mechanism, the presence of which would condition the effects of globalization on governance.

Given the large number of variables that could divide the effects of globalization between positive and negative outcomes, how is it possible to adjudicate among different factors? As explored in the previous chapter, there are six prime suspects.

Model 2: Multi-level model

MLM Step 1: Obtain group means

Collecting group means (i.e., calculating a separate mean for each country, as in a no-pooling model) constitutes the first step in constructing an MLM. Group-mean centering, which subtracts each country’s mean from that country’s observations, is an important step in MLM. When predictors are group-mean centered, each country’s intercept estimates the expected outcome at that country’s own average level of the predictor, and the level-1 coefficients reflect only within-country variation.

groupmean_wbgi_cce <- aggregate(data$wbgi_cce, list(data$country), FUN = mean, na.rm = TRUE)

names(groupmean_wbgi_cce)<- c('country','groupmean_wbgi_cce')

groupmean_wbgi_cce
##                 country groupmean_wbgi_cce
## 1               Algeria        -0.62459236
## 2               Bahrain         0.33695715
## 3                Cyprus                NaN
## 4                 Egypt        -0.48311674
## 5                  Iran        -0.61493159
## 6                  Iraq        -1.40291764
## 7                Israel         0.96396438
## 8                Jordan         0.16050359
## 9                Kuwait         0.60497866
## 10              Lebanon        -0.66427816
## 11                Libya        -0.98591404
## 12              Morocco        -0.17406009
## 13                 Oman         0.36900749
## 14                Qatar         0.89276696
## 15         Saudi Arabia        -0.22797188
## 16                Sudan        -1.23452989
## 17                Syria        -0.88039590
## 18              Tunisia        -0.01988032
## 19               Turkey        -0.12692156
## 20 United Arab Emirates         0.83558780
## 21                Yemen        -0.88545023

MLM Step 2: Group-mean centering

Comparing each country’s mean of wbgi_cce with the grand mean shows how the expected outcome for each country deviates from the regional average level of perceived corruption, while subtracting the country mean from each annual observation isolates within-country variation. Adding a further level to the model (e.g., political economy grouping) allows one to calculate country-level averages within their respective groups along with deviations from a grand mean.

The aggregate function computes the mean of a variable (in the case above, perceptions of corruption) within each group. Since the subsequent analyses treat the country as the unit of analysis, I aggregate by country. When calculating group means for other variables, I likewise aggregate by whatever variable defines the unit of analysis. The names function then renames the columns of the resulting data frame (here, country and groupmean_wbgi_cce).

groupmean_cim <- aggregate(data$cim, list(data$country), FUN = mean, na.rm = TRUE)
names(groupmean_cim) <- c('country', 'groupmean_cim')

groupmean_ln_finance_flows <- aggregate(data$ln_finance_flows, list(data$country), FUN = mean, na.rm = TRUE)
names(groupmean_ln_finance_flows) <- c('country', 'groupmean_ln_finance_flows')

Next, merge the group-mean data with the overall data set in order to use group-mean-centered variables in subsequent analyses.
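The merge-and-center step can be sketched as follows. This sketch assumes a data frame with columns country and cim and mirrors the aggregate()/names() pattern used above. The toy values are hypothetical, and the column name cim_gmc is an illustrative choice, not a variable in the data set.

```r
# Hypothetical country-year data (country B has one missing value)
data <- data.frame(
  country = c("A", "A", "B", "B"),
  cim     = c(0.80, 0.90, 0.50, NA)
)

# Country means, as in the aggregate() calls above
groupmean_cim <- aggregate(data$cim, list(data$country), FUN = mean, na.rm = TRUE)
names(groupmean_cim) <- c("country", "groupmean_cim")

# Attach each country's mean to its country-year rows (left join)
data <- merge(data, groupmean_cim, by = "country", all.x = TRUE)

# Group-mean-centered predictor: deviation from the country's own mean
data$cim_gmc <- data$cim - data$groupmean_cim
data
```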

MLM Step 3:

[Finish model here]

A Group-Based Trajectory Model of Financial Openness in the Middle East, 1970-2013

[*Note - this section describes how I created the development trajectory model used above. I’ll probably move it somewhere else]

Trajectory Model Overview

The conceptual aim of this section is to describe and justify a typology of the policies that countries of the Middle East chose regarding cross-border financial regulations from 1970-2013. To this end, it presents a quantitative, group-based trajectory model identifying clusters of countries that enacted similar policies at similar times.[^1] The model distinguishes four groups, which I describe as follows: 1) consistently open; 2) initially open and then closing; 3) initially closed and quickly opening; and 4) initially closed and gradually opening. Figure 1 summarizes these group trajectories over time and by density, while the table above lists the countries of the region by group assignment.

[^1]: Group-based trajectory models, also known as development trajectory models, are a specialized application of finite mixture models. In finite mixture models, analysts combine multiple distinct distributions to model data from populations known or suspected to contain a finite number of separate subpopulations. Analysts use finite mixture models to help identify the structure of subgroups (e.g., cluster analysis) or to examine unknown or potential distributional shapes (e.g., latent class analysis). On this point, see (???).

While the four groups describe almost all countries in the region, two outliers, Syria and Oman, fail to fit any of these trajectories. Unique in the Middle East and North Africa (MENA) region, Syria maintained almost entirely closed capital account policies with brief exceptions in the 1970s. Oman also followed a distinctive trajectory, maintaining fully liberalized financial policies from the 1970s through the mid-1990s and again from the early 2000s onward. Between these two periods, however, Oman restricted capital investment flows. Last, note that due to missing data this analysis does not include Yemen.
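The core idea of a group-based trajectory model can be illustrated with a compact expectation-maximization (EM) sketch. The model posits a small number of latent groups, each following its own polynomial trajectory over time, and assigns units to groups by posterior probability. This is a simplified, hypothetical variant and not the procedure that produced the groups reported above. It assumes normally distributed outcomes with a common variance, quadratic trajectories, complete data, and a k-means starting partition. The Jones and Nagin implementation also supports censored-normal and other outcome distributions and uses its own estimation routines. The simulated data are invented.

```r
# Simplified group-based trajectory model fitted by EM (illustrative sketch).
# Y: N x n_t matrix (units x years, no missing values); K: number of groups.
# Each group k follows a polynomial in time plus Gaussian noise with common SD.
fit_gbtm <- function(Y, K, degree = 2, n_iter = 200, seed = 1) {
  set.seed(seed)
  N <- nrow(Y); n_t <- ncol(Y)
  X <- outer(seq_len(n_t) / n_t, 0:degree, `^`)   # n_t x (degree + 1) time design
  # Initialize posterior group probabilities from a k-means partition
  init <- kmeans(Y, centers = K, nstart = 10)$cluster
  post <- outer(init, seq_len(K), `==`) * 1
  for (iter in seq_len(n_iter)) {
    # M-step: group shares, trajectory coefficients, common residual SD
    share <- colMeans(post)
    beta  <- matrix(NA_real_, degree + 1, K)
    rss   <- 0
    for (k in seq_len(K)) {
      w    <- post[, k]
      ybar <- colSums(w * Y) / sum(w)               # posterior-weighted mean trajectory
      beta[, k] <- solve(crossprod(X), crossprod(X, ybar))
      mu   <- as.vector(X %*% beta[, k])
      rss  <- rss + sum(w * sweep(Y, 2, mu)^2)
    }
    sigma <- sqrt(rss / (N * n_t))
    # E-step: log P(unit i's whole trajectory | group k) + log share_k
    loglik <- sapply(seq_len(K), function(k) {
      mu <- as.vector(X %*% beta[, k])
      -0.5 * n_t * log(2 * pi * sigma^2) -
        rowSums(sweep(Y, 2, mu)^2) / (2 * sigma^2) + log(share[k])
    })
    post <- exp(loglik - apply(loglik, 1, max))     # log-sum-exp for stability
    post[post < 1e-12] <- 1e-12                     # keep groups from collapsing
    post <- post / rowSums(post)
  }
  list(share = share, beta = beta, sigma = sigma,
       posterior = post, group = max.col(post))
}

# Usage on simulated data (invented, not the Chinn-Ito series)
set.seed(42)
n_t <- 20
tt  <- seq_len(n_t) / n_t
open    <- t(replicate(6, 2 + rnorm(n_t, sd = 0.2)))                 # consistently open
opening <- t(replicate(8, -1.5 + 3.5 * tt^2 + rnorm(n_t, sd = 0.2)))  # closed, then opening
Y   <- rbind(open, opening)
fit <- fit_gbtm(Y, K = 2)
table(truth = rep(1:2, c(6, 8)), estimated = fit$group)
```

Group labels are arbitrary, so recovery should be judged by whether units sharing a true trajectory share an estimated group, not by the label numbers.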

The Group-Based Development Trajectory Model: Background and Rationale

Researchers often develop classifications using a combination of analysis and subjective insight. The use of subjective classification rules, however, risks creating groups that reflect random variation or that fail to identify meaningful but counter-intuitive patterns of progression. Uncertainty about the reliability of group assignment can also invalidate conventional statistical tests employed in general linear models that assume certainty of group membership assignment.

In a series of publications, Daniel Nagin, Bobby Jones, and several co-authors developed a method that overcomes such difficulties: group-based trajectory modeling, which sorts populations into groups based on shared actions and traits over time (B. L. Jones and Nagin 2007; B. L. Jones, Nagin, and Roeder 2001; Nagin 1999; Nagin and Tremblay 2001; D. Nagin 2005). Rather than assuming the existence of one or more shared progressions, the group-based development trajectory model they recommend tests whether and to what extent distinct trajectories emerge from longitudinal data series.

For such applications, this group-based model is preferable to more general “growth analysis” approaches such as repeated measures multivariate analysis of variance (MANOVA) or structural equation modeling (SEM) (Andruff et al. 2009). Such techniques are limited insofar as they estimate a single trajectory based on the average of individual trajectories in a sample. They capture individual differences with random coefficients representing the variability surrounding the averaged intercept and slope values. In this sense, standard growth models are useful for research questions in which one expects units in a sample to change in the same direction over time, albeit with variation among units. Group-based development trajectory modeling, on the other hand, identifies distinct subgroups of individuals following a similar pattern of change over time on a given variable. Similarly, the model improves on two common methods for modeling and analyzing developmental trajectories, hierarchical modeling and latent curve analysis. While such methods describe population variability with multivariate continuous distribution functions, “the group-based approach utilizes a multinomial modeling strategy and is designed to identify relatively homogenous clusters of developmental trajectories” (Loughran, Stephens, and Haviland 2007).

Scholars first developed the group-based development trajectory model employed here for applications in developmental psychology and criminology, where it has been used to construct typologies of human behavior such as criminal recidivism and childhood delinquency (???). Nevertheless, there are no apparent discipline-specific constraints on the method, and several applications of it already exist in political science.

Mustillo (2009), for example, explicitly uses a group-based development trajectory model to construct typologies of new political party volatility in Latin America. Plutzer (2002) draws on the technique to identify divergent trajectories in voter turnout in the U.S., while Johnston, Jones, and Jen (2009) use the model to categorize voting patterns in British electorates. Last, several studies incorporate the method in analyzing trends in terrorist group activity (LaFree and Dugan 2004; Dugan and Yang 2012). As Mustillo notes, the technique “has potential applications in political science but has not been widely used” (???). These applications address small- to large-N populations and examine a similarly diverse range of units of analysis.

One should, however, consider the relative paucity of applications in political science in a broader methodological context. Development trajectory analysis belongs to the more general family of latent class analysis (LCA) tools, which are employed widely in political science. Bakk, Oberski, and Vermunt (2014), for example, cite over twenty notable applications of latent class analysis in political science, published from 1985 to 2013 and ranging across all empirical subfields of the discipline.

The Group-Based Development Trajectory Model: Form and Application

B. L. Jones and Nagin (2007) describe this statistical model as follows. Suppose \(Y_i=\{y_{i1},y_{i2},y_{i3},\ldots,y_{iT}\}\) represents a sequence of measurements for one unit, \(i\), over \(T\) time periods indexed by \(t\), and that \(P(Y_i)\) represents the probability of \(Y_i\). In the context of this project, \(y_{it}\) measures the capital account openness of country \(i\) in year \(t\). The group-based trajectory model assumes that the population contains a mixture of underlying trajectory groups, indexed by \(j\), such that

\[P(Y_i)=\sum_j \pi_j P^j(Y_i),\] where \(P^j(Y_i)\) is the probability of \(Y_i\) given membership in group \(j\), and \(\pi_j\) is the probability of group \(j\). For a given value of \(j\), one assumes conditional independence “for the sequential realizations of the elements of \(Y_i\), \(y_{it}\), over the \(T\) periods of measurement” (D. Nagin 2005, 26–27). Thus,

\[P^j(Y_i)=\prod_{t=1}^{T} p^{jt}(y_{it})\]
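To make the two equations concrete, the following Python sketch computes \(P^j(Y_i)\) as a product over years and \(P(Y_i)\) as a \(\pi\)-weighted sum over groups. For simplicity it assumes an uncensored normal density with a common \(\sigma\) for each \(p^{jt}(y_{it})\); the group means, weights, and outcome values are hypothetical illustrations, not estimates from this chapter.

```python
import math


def normal_pdf(y: float, mu: float, sigma: float) -> float:
    """Normal density; a simplifying stand-in for p^{jt}(y_it)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def group_likelihood(y_seq, group_means, sigma):
    """P^j(Y_i): under conditional independence, the product over t of p^{jt}(y_it)."""
    prob = 1.0
    for y, mu in zip(y_seq, group_means):
        prob *= normal_pdf(y, mu, sigma)
    return prob


def mixture_likelihood(y_seq, means_by_group, pis, sigma):
    """P(Y_i): the sum over groups j of pi_j * P^j(Y_i)."""
    return sum(pi * group_likelihood(y_seq, means, sigma)
               for pi, means in zip(pis, means_by_group))


# Hypothetical example: one country observed in three years, two candidate groups.
y_country = [0.1, 0.5, 0.9]
means_by_group = [[0.0, 0.5, 1.0],   # group 1: rising openness
                  [1.0, 1.0, 1.0]]   # group 2: flat, fully open
pis = [0.6, 0.4]
sigma = 0.3

per_group = [group_likelihood(y_country, m, sigma) for m in means_by_group]
total = mixture_likelihood(y_country, means_by_group, pis, sigma)
print(per_group, total)
```

With many years of data, one would normally sum log densities rather than multiply densities, to avoid numerical underflow.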

This conditional independence assumption deserves some discussion, as it may seem implausible. In the context of this research project, the assumption means that, conditional upon a country, \(i\), being a member of a given trajectory group, \(j\), the country’s financial openness outcomes over successive years, \(t=1,\ldots,T\), are independent of one another. This seems to imply that outcomes at a given time are uncorrelated with past outcomes. As Nagin notes, “at the level of the group, which is not observed, this is indeed the case.” For units within a given group, however, the assumption implies only that outcomes are not “serially correlated in the sense that individual-level deviations from the group trend are uncorrelated” (???, 27).

The conditional independence assumption is thus made at the group rather than the individual level. Investigating the consequences of violating it in a group-based development trajectory model, Thomas Loughran employs a Monte Carlo simulation to observe how the general model behaves when applied to serially correlated data (i.e., when successive outcomes are not independent conditional on trajectory group). He finds that “even in the presence of severely serially correlated error terms, traditional model estimation yields trajectory parameters which are unbiased, yet underestimates the associated standard errors of some parameters. Furthermore, we find that serial dependence may also exacerbate the level of separation between trajectory groups” (???).

Within this model, a multinomial logit function models the group membership probabilities, \(\pi_j\), for \(j=1,\ldots,J\):

\[\pi_j=\frac{e^{\theta_j}}{\sum_{k=1}^{J} e^{\theta_k}},\]

where \(\theta_1\) is normalized to zero to identify the model. The multinomial logit form ensures that each probability falls between 0 and 1 and that the probabilities sum to one across groups.
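A minimal Python sketch of the multinomial logit mapping from \(\theta_2,\ldots,\theta_J\) (with \(\theta_1\) fixed at zero) to group probabilities follows. The parameter values are hypothetical; subtracting the largest \(\theta\) before exponentiating is a standard numerical safeguard that leaves the ratios unchanged.

```python
import math


def group_probabilities(free_thetas):
    """Map theta_2..theta_J to pi_1..pi_J, with theta_1 normalized to zero."""
    thetas = [0.0] + list(free_thetas)
    largest = max(thetas)
    weights = [math.exp(theta - largest) for theta in thetas]  # stable exponentiation
    total = sum(weights)
    return [w / total for w in weights]


print(group_probabilities([0.0, 0.0]))       # three equally likely groups
print(group_probabilities([math.log(2.0)]))  # group 2 twice as likely as group 1
```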

Given the nature of the capital account openness data, which are bounded above and below, it is reasonable to assume that, conditional on membership in group \(j\), \(p^{jt}(y_{it})\) follows a censored normal distribution. In this case, a latent variable, \(y_{it}^{*j}\), links the observed outcome to time: the observed score equals \(y_{it}^{*j}\) when the latent value lies within the index’s bounds and equals the nearest bound otherwise. This latent variable can be thought of as measuring a country’s potential for undertaking a given set of capital account policies.

Though a higher-order polynomial relationship could exist between \(y_{it}^{*j}\) and time, I model the relationship as quadratic based on visual trends in the development trajectories:

\[y_{it}^{*j}=\beta_0^j + \beta_1^j Year_{it} + \beta_2^j Year_{it}^2+ \epsilon_{it}\]

where \(\epsilon_{it}\) is a pooled error term assumed to be normally distributed with a mean of zero and a constant standard deviation, \(\sigma\).
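The censored normal density for one group can be sketched as follows. The quadratic gives the latent mean; an observation at the lower or upper bound receives the probability mass that the latent normal places beyond that bound, and an interior observation receives the ordinary normal density. The bounds `lower` and `upper`, the coefficients, and the coding of `year` (for example, counted from the first year of the panel) are hypothetical choices for illustration, not the values used in this chapter.

```python
import math


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def latent_mean(year: float, b0: float, b1: float, b2: float) -> float:
    """Group-specific quadratic trajectory: b0 + b1*year + b2*year^2."""
    return b0 + b1 * year + b2 * year ** 2


def censored_normal_density(y, mu, sigma, lower, upper):
    """p^{jt}(y_it) when the observed score is the latent score censored to [lower, upper]."""
    if y <= lower:
        return norm_cdf((lower - mu) / sigma)        # mass piled up at the lower bound
    if y >= upper:
        return 1.0 - norm_cdf((upper - mu) / sigma)  # mass piled up at the upper bound
    return norm_pdf((y - mu) / sigma) / sigma        # ordinary density inside the bounds


# Hypothetical group: year coded 0, 1, 2, ...; index assumed bounded to [0, 1].
mu_2 = latent_mean(2.0, b0=0.1, b1=0.3, b2=-0.02)
print(mu_2, censored_normal_density(0.6, mu_2, 0.25, 0.0, 1.0))
```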

The objective of this modeling exercise is to identify groups of units that share distinctive trajectories. Formal statistical criteria do not settle the matter on their own, however; it is still necessary to use subjective judgment to make a well-founded decision about the number of groups to include in the model. Selecting a reasonable number of groups involves a two-stage process. In the first stage, the analyst sets an upper limit on the number of groups and chooses the polynomial order of each group’s trajectory, based on contextual knowledge and general trends in the data. In the second stage, the analyst estimates a model for each number of groups up to the preset maximum, using the Bayesian Information Criterion (BIC) to guide model choice (D. Nagin 2005). The BIC is a general model selection criterion based on the likelihood function; for a discussion of the BIC as a model selection criterion, see Gelman and Rubin (1999), Raftery (1995), and Raftery (1999). For a given model, one can calculate the BIC as follows:

\[\mathrm{BIC}=\log(L)-0.5\,k\,\log(N)\]

where \(L\) is the value of the model’s maximized likelihood, \(N\) is the sample size, and \(k\) is the number of parameters in the model, which is determined by the order of the polynomial used to model each trajectory and by the number of groups. Under this criterion, adding another trajectory group is desirable only if the resulting increase in fit, as measured by the change in \(\log(L)\), is larger than the added penalty for the extra parameters, \(0.5\,\Delta k\,\log(N)\) (D. Nagin 2005, 64–66).
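The criterion and the trade-off it encodes can be sketched in Python. The log-likelihoods and parameter counts below are hypothetical, chosen to show that the same gain in fit (ten log-likelihood units for four extra parameters) can justify an additional group when \(N\) is small but not when \(N\) is large, because the penalty grows with \(\log(N)\).

```python
import math


def bic(log_lik: float, k: int, n: int) -> float:
    """BIC in the larger-is-better form: log(L) - 0.5 * k * log(N)."""
    return log_lik - 0.5 * k * math.log(n)


def extra_group_justified(ll_small, k_small, ll_large, k_large, n):
    """True if the fit gain exceeds the added penalty 0.5 * (k_large - k_small) * log(N)."""
    return bic(ll_large, k_large, n) > bic(ll_small, k_small, n)


# Hypothetical fits: the larger model gains 10 log-likelihood units for 4 extra parameters.
ll_small, k_small = -250.0, 10
ll_large, k_large = -240.0, 14
for n in (834, 20):
    print(n, extra_group_justified(ll_small, k_small, ll_large, k_large, n))
```

At \(N=834\) the extra penalty is about 13.5 units, so the larger model is rejected; at \(N=20\) it is about 6.0 units, so the larger model is preferred.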

Selecting the Number of Trajectory Groups

For this project, following these stages yields six distinct groups, four of which include more than one country. First, I set the upper limit on the number of groups to seven, based on patterns of model convergence across different numbers of groups and on the relatively small size of the population. In the context of capital account openness in the Middle East, a small but nontrivial number of countries maintained fully liberalized financial policies. To accommodate these countries, every candidate model includes one group specified to follow a zero-order (constant) polynomial trajectory. I model the trajectories for the remaining groups using quadratic polynomials to reflect non-linear development paths.1

For example, the hypothetical three-group model includes one group following a zero-order trajectory and two groups following a quadratic trajectory. The four-group model includes one zero-order group and three quadratic groups, and so forth.
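One way to count the parameters \(k\) in these specifications, stated here as an assumption rather than taken from the estimation software, is: \(r+1\) trajectory coefficients for each group with polynomial order \(r\), \(J-1\) free \(\theta\) parameters, and one pooled \(\sigma\). The Python sketch below applies that count to the two- through seven-group specifications and checks it against Table 2. Because the two BIC columns share the same \(\log(L)\) and differ only in \(N\), their gap equals \(0.5\,k\,(\log 834-\log 20)\), so \(k\) can be backed out of the reported scores.

```python
import math

# BIC scores reported in Table 2, keyed by number of groups.
BIC_N834 = {2: -429.35, 3: -300.61, 4: -275.67, 5: -239.43, 6: -175.40, 7: -208.84}
BIC_N20 = {2: -418.16, 3: -281.95, 4: -249.55, 5: -205.86, 6: -134.36, 7: -160.34}


def polynomial_orders(n_groups):
    """One zero-order (flat) group plus n_groups - 1 quadratic groups."""
    return [0] + [2] * (n_groups - 1)


def parameter_count(n_groups):
    """Assumed accounting: trajectory coefficients + free thetas + one pooled sigma."""
    trajectory = sum(order + 1 for order in polynomial_orders(n_groups))
    mixing = n_groups - 1  # theta_2..theta_J, with theta_1 fixed at zero
    return trajectory + mixing + 1


def k_implied_by_table(n_groups):
    """Solve BIC_20 - BIC_834 = 0.5 * k * (log 834 - log 20) for k."""
    gap = BIC_N20[n_groups] - BIC_N834[n_groups]
    return 2.0 * gap / (math.log(834) - math.log(20))


for j in range(2, 8):
    print(j, polynomial_orders(j), parameter_count(j), round(k_implied_by_table(j), 2))
```

Under this count the six-group model has \(k=22\), and the implied values from Table 2 round to the same counts for every specification.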

\begin{table}
\begin{center}
\caption{Using BIC to Select the Number of Groups to Include in the Model}
\vspace{3 mm}
\label{tab2}
\begin{tabular}{c|c|c|c}
No. of Groups & BIC (\textit{N} = 834) & BIC (\textit{N} = 20) & Prob. Correct Model\\\hline
2 & -429.35 & -418.16 & 0.00\\
3 & -300.61 & -281.95 & 0.00\\
4 & -275.67 & -249.55 & 0.00\\
5 & -239.43 & -205.86 & 0.00\\
6 & -175.40 & -134.36 & 0.99\\
7 & -208.84 & -160.34 & 0.01\\
\end{tabular}
\end{center}
\end{table}

Table 2 reports two BIC scores for each model, from two to seven groups. The first uses the full sample of country-years (\(N=834\)) as the sample size, while the second uses the number of countries (\(N=20\)), ignoring the time dimension. Explaining this choice, Nagin writes:

“In theory \(N\) is meant to measure the number of independent observations that make up the sample. Because the intra-individual observations are not totally independent, the \(N\) across individual and time overstates the theoretically correct \(N\). On the other hand, \(N\) measured by the number of individuals understates the true \(N\), because intra-individual variation across assessments is to some degree independent” (D. Nagin 2005, 68). As a result of this ambiguity, the two values bracket a theoretically “correct” BIC score.

For either value of \(N\), the BIC scores in Table 2 rise steadily (i.e., become less negative) as the number of groups increases to six and then decline at seven. The BIC therefore selects the six-group model as best. Without a concrete standard for calibrating the magnitude of a change in BIC, however, it is difficult to judge how much better the six-group model is than the alternatives.

Bayes factors are the typical statistical construct used for calibrating the substantive importance of a change in BIC scores and for testing hypotheses that compare two or more models (???; see also Kass and Raftery 1995).

Because the Bayes factor is often difficult or impossible to calculate, however, Kass and Raftery (1995), Schwarz (1978), and Kass and Wasserman (1995) argue for the use of a related metric for comparing models. Following their work, “let \(p_j\) denote the probability that a model with \(j\) groups is the correct model from a set of \(J\) different models. They show that \(p_j\) is reasonably approximated by:

\[p_j \approx \frac{e^{\mathrm{BIC}_j-\mathrm{BIC}_{max}}}{\sum_{k=1}^{J} e^{\mathrm{BIC}_k-\mathrm{BIC}_{max}}}\]

where \(\mathrm{BIC}_{max}\) is the maximum BIC score of the \(J\) different models under consideration” (???). The fourth column of Table 2 reports, for each number of groups, the probability computed by this equation that the model is the true model. By this BIC-based approximation, the probability that the six-group model is correct is \(0.99\).
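The approximation can be computed directly from a set of BIC scores. The Python sketch below uses hypothetical BIC values for three candidate models rather than the Table 2 scores. Subtracting \(\mathrm{BIC}_{max}\) before exponentiating cancels in the ratio but keeps the exponentials from underflowing when BIC values are large in magnitude.

```python
import math


def model_probabilities(bic_by_groups):
    """Approximate p_j = exp(BIC_j - BIC_max) / sum_k exp(BIC_k - BIC_max)."""
    bic_max = max(bic_by_groups.values())
    weights = {j: math.exp(b - bic_max) for j, b in bic_by_groups.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Hypothetical BIC scores (larger is better), keyed by number of groups.
example_bics = {3: -212.0, 4: -208.5, 5: -210.0}
probs = model_probabilities(example_bics)
for j, p in sorted(probs.items()):
    print(j, round(p, 3))
```

Here the four-group model receives roughly 0.80 of the probability, the five-group model about 0.18, and the three-group model about 0.02, so a gap of a few BIC units is already enough to concentrate most of the probability on one model.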

References

Andruff, Heather, Natasha Carraro, Amanda Thompson, Patrick Gaudreau, and Benoît Louvet. 2009. “Latent Class Growth Modelling: A Tutorial.” Tutorials in Quantitative Methods for Psychology 5 (1): 11–24.

Bakk, Zsuzsa, Daniel L. Oberski, and Jeroen K. Vermunt. 2014. “Relating Latent Class Assignments to External Variables: Standard Errors for Correct Inference.” Political Analysis, mpu003.

Chinn, Menzie, and Hiro Ito. 2009. “The Chinn-Ito Index: A de Jure Measure of Financial Openness.” Available from Chinn-Ito_website.htm.

Dugan, Laura, and Sue-Ming Yang. 2012. “Introducing Group-Based Trajectory Analysis and Series Hazard Modeling: Two Innovative Methods to Systematically Examine Terrorism over Time.” In Evidence-Based Counterterrorism Policy, 113–49. Springer.

Gelman, Andrew, and Donald B. Rubin. 1999. “Evaluating and Using Statistical Methods in the Social Sciences: A Discussion of ‘A Critique of the Bayesian Information Criterion for Model Selection.’” Sociological Methods & Research 27 (3): 403–10.

Henry, Clement Moore, and Robert Springborg. 2010. Globalization and the Politics of Development in the Middle East. Vol. 1. Cambridge University Press.

Johnston, Ron, Kelvyn Jones, and Min-Hua Jen. 2009. “Regional Variations in Voting at British General Elections, 1950–2001: Group-Based Latent Trajectory Analysis.” Environment and Planning A 41 (3): 598–616.

Jones, Bobby L., and Daniel S. Nagin. 2007. “Advances in Group-Based Trajectory Modeling and an SAS Procedure for Estimating Them.” Sociological Methods & Research 35 (4): 542–71.

Jones, Bobby L., Daniel S. Nagin, and Kathryn Roeder. 2001. “A SAS Procedure Based on Mixture Models for Estimating Developmental Trajectories.” Sociological Methods & Research 29 (3): 374–93.

Kass, Robert E., and Adrian E. Raftery. 1995. “Bayes Factors.” Journal of the American Statistical Association 90 (430): 773–95.

Kass, Robert E., and Larry Wasserman. 1995. “A Reference Bayesian Test for Nested Hypotheses and Its Relationship to the Schwarz Criterion.” Journal of the American Statistical Association 90 (431): 928–34.

LaFree, Gary, and Laura Dugan. 2004. “How Does Studying Terrorism Compare to Studying Crime?” In Terrorism and Counter-Terrorism, 53–74. Emerald Group Publishing Limited.

Lane, Philip R., and Gian Maria Milesi-Ferretti. 2007. “The External Wealth of Nations Mark II: Revised and Extended Estimates of Foreign Assets and Liabilities, 1970–2004.” Journal of International Economics 73 (2): 223–50.

Loughran, Tom, Mel Stephens, and Amelia Haviland. 2007. “Three Essays on the Modeling of Development.” Citeseer.

Mustillo, Thomas J. 2009. “Modeling New Party Performance: A Conceptual and Methodological Approach for Volatile Party Systems.” Political Analysis, mpp007.

Nagin, Daniel. 2005. Group-Based Modeling of Development. Harvard University Press.

Nagin, Daniel S. 1999. “Analyzing Developmental Trajectories: A Semiparametric, Group-Based Approach.” Psychological Methods 4 (2): 139.

Nagin, Daniel S., and Richard E. Tremblay. 2001. “Analyzing Developmental Trajectories of Distinct but Related Behaviors: A Group-Based Method.” Psychological Methods 6 (1): 18.

Plutzer, Eric. 2002. “Becoming a Habitual Voter: Inertia, Resources, and Growth in Young Adulthood.” American Political Science Review 96 (1): 41–56.

Raftery, Adrian E. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology, 111–63.

———. 1999. “Bayes Factors and BIC: Comment on ‘A Critique of the Bayesian Information Criterion for Model Selection.’” Sociological Methods & Research 27 (3): 411–27.

Schwarz, Gideon. 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6 (2): 461–64.


  1. The results are unchanged when cubic or quartic polynomials are used instead.