Consider four examples. Each is a subset of one of the causal loop diagrams (CLDs) constructed by group model builders in RYK. To keep the example tractably small, I’ve extracted just the nodes `child health`, `child attendance`, `child focus on education`, `child interest in education`, `child inclusion in learning` and `child results`, and the edges that connect these nodes.
It’s easy to see that there are some similarities among these, as well as some differences. Our goal here is to work with a large number of CLDs in order to see which elements are widely agreed upon across CLDs, and to see which differences among the model builders (geographic, economic, etc.) systematically drive differences in the CLDs.
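To assess this quantitatively, we will borrow commonly used techniques from ecology to calculate a ‘distance’ measure between any two CLDs. We may use the resulting pairwise distances to visualize how similar the CLDs are to one another, and to test whether characteristics of the model builders are associated with differences among the CLDs.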
These methods begin with constructing a presence-absence matrix or a weighted matrix, in which the things being compared are in the rows and the characteristics used for measurement are in the columns. In ecology, the rows might be plots in which species were sampled, and the columns species names. Each cell would hold an indicator of whether that species is present or absent in that plot.
In this case, the first step is to decompose the CLDs into their characteristics. One strategy for this decomposition is to make an ‘edgelist’, where each edge consists of a ‘cause’ node, an ‘effect’ node, and the polarity of the connection between them. For instance, `Child health increases child attendance` is an edge found in three of our four examples above.
Complete edgelists for each of the examples above look like this.
You’ll see that `Child health increases child attendance` appears in the edgelist of each model that contains it: as Edge 6 in GGPS Bahudi Pur Machian, Edge 3 in GGPS 74 NP and Edge 3 in GGCMS Qaiser Chohan.
If we treat each of the edges as we would a species in ecological analysis, and each of the schools as a plot, the resulting presence-absence matrix looks like this, with 1 indicating presence and 0 indicating absence.
School | C attendance increases C focus ed. | C attendance increases C inclusion L | C focus ed. increases C inclusion L | C focus ed. increases C results | C health increases C attendance | C inclusion L increases C attendance | C inclusion L increases C results | C interest ed. increases C attendance | C results increases C inclusion L |
---|---|---|---|---|---|---|---|---|---|
GGCMS QAISER CHOHAN | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
GGPS 74 NP | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
GGPS BAHUDI PUR MACHIAN | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
GPS Chack 76/NP | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
You can now see the edge `Child health increases child attendance` in column 5, where each CLD (row) in which that edge occurs is marked with a 1.
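The step from edgelists to a presence-absence matrix can be sketched as follows. This is a minimal illustration in plain Python, not the code behind this analysis; the edge sets are read off the table above, and the function name `presence_absence` and the tuple layout `(cause, polarity, effect)` are assumptions made for the example.

```python
# Edge sets read off the presence-absence table above (1 = edge drawn in that CLD).
# Each edge is a (cause, polarity, effect) tuple; +1 stands for "increases".
edgelists = {
    "GGCMS QAISER CHOHAN": {
        ("C attendance", 1, "C inclusion L"),
        ("C health", 1, "C attendance"),
        ("C interest ed.", 1, "C attendance"),
    },
    "GGPS 74 NP": {
        ("C attendance", 1, "C focus ed."),
        ("C focus ed.", 1, "C inclusion L"),
        ("C health", 1, "C attendance"),
        ("C inclusion L", 1, "C attendance"),
        ("C interest ed.", 1, "C attendance"),
    },
    "GGPS BAHUDI PUR MACHIAN": {
        ("C attendance", 1, "C focus ed."),
        ("C focus ed.", 1, "C inclusion L"),
        ("C focus ed.", 1, "C results"),
        ("C health", 1, "C attendance"),
        ("C inclusion L", 1, "C results"),
        ("C results", 1, "C inclusion L"),
    },
    "GPS Chack 76/NP": {
        ("C attendance", 1, "C focus ed."),
        ("C focus ed.", 1, "C inclusion L"),
        ("C focus ed.", 1, "C results"),
        ("C inclusion L", 1, "C attendance"),
    },
}


def presence_absence(edgelists):
    """Return (columns, matrix): one column per distinct edge, one row per CLD."""
    # The union of all edges across CLDs plays the role of the species list.
    columns = sorted(set().union(*edgelists.values()))
    matrix = {
        name: [1 if edge in edges else 0 for edge in columns]
        for name, edges in edgelists.items()
    }
    return columns, matrix


columns, matrix = presence_absence(edgelists)
for name, row in matrix.items():
    print(name, row)
```

Sorting the edges alphabetically happens to reproduce the column order of the table, so `columns[4]` is the `Child health increases child attendance` edge discussed above.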
We may turn this presence-absence matrix into distances with a number of different distance formulae. Here, we’ll use Euclidean distance, a very simple distance metric defined by $d_{jk} = \sqrt{\sum_i (x_{ij} - x_{ik})^2}$, where $x_{ij}$ and $x_{ik}$ refer to the values for edge (column) $i$ in CLDs (rows) $j$ and $k$. The symmetric distance matrix below gives the pairwise Euclidean distances among CLDs (in both columns and rows).
School | GGCMS QAISER CHOHAN | GGPS 74 NP | GGPS BAHUDI PUR MACHIAN | GPS Chack 76/NP |
---|---|---|---|---|
GGCMS QAISER CHOHAN | 0.000000 | 2.000000 | 2.645751 | 2.645751 |
GGPS 74 NP | 2.000000 | 0.000000 | 2.236068 | 1.732051 |
GGPS BAHUDI PUR MACHIAN | 2.645751 | 2.236068 | 0.000000 | 2.000000 |
GPS Chack 76/NP | 2.645751 | 1.732051 | 2.000000 | 0.000000 |
Here you’ll see that the distance between GPS Chack 76/NP and GGPS 74 NP is the smallest (about 1.7), because these two CLDs differ in only three edges (`Child focus on education increases child results`, `Child health increases child attendance` and `Child interest in education increases child attendance`) out of the nine possible.
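The distance calculation can be checked by hand. The sketch below, in plain Python rather than the tooling used for this analysis, applies the Euclidean formula to the rows of the presence-absence matrix above; the variable names are illustrative only.

```python
import math
from itertools import combinations

# Rows copied from the presence-absence matrix above (nine edge columns each).
rows = {
    "GGCMS QAISER CHOHAN":     [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "GGPS 74 NP":              [1, 0, 1, 0, 1, 1, 0, 1, 0],
    "GGPS BAHUDI PUR MACHIAN": [1, 0, 1, 1, 1, 0, 1, 0, 1],
    "GPS Chack 76/NP":         [1, 0, 1, 1, 0, 1, 0, 0, 0],
}


def euclidean(a, b):
    """Square root of the summed squared differences across columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# One entry per unordered pair of CLDs.
distances = {
    (j, k): euclidean(rows[j], rows[k]) for j, k in combinations(rows, 2)
}

for pair, d in distances.items():
    print(pair, round(d, 6))

closest = min(distances, key=distances.get)
print("closest pair:", closest)
```

Because every cell is 0 or 1, each squared difference is 1 exactly when the two CLDs disagree on an edge, so the distance is the square root of the number of edges on which they differ: three differing edges give $\sqrt{3} \approx 1.73$.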
Then, if we want to visualize this pairwise distance matrix, we may do it with nonmetric multidimensional scaling (NMDS). Here we use the version implemented in the vegan function metaMDS, which tries several different random starts of the NMDS and keeps the low-dimensional configuration with the least stress relative to the original multidimensional distance matrix. In this case, our stress is essentially zero, as the model is very simple.
Distances between CLD points in the ordination space below reflect the rank order of their Euclidean distances, based on the columns (edges) we provided. You’ll see that the distance between GPS Chack 76/NP and GGPS 74 NP is again the smallest.
However, these distances are only as good as the presence-absence matrix we provide. Consider the edge `Child attendance increases child inclusion in learning`. Currently, this edge is considered “present” only in GGCMS Qaiser Chohan.
School | C attendance increases C focus ed. | C attendance increases C inclusion L | C focus ed. increases C inclusion L |
---|---|---|---|
GGCMS QAISER CHOHAN | 0 | 1 | 0 |
GGPS 74 NP | 1 | 0 | 1 |
GGPS BAHUDI PUR MACHIAN | 1 | 0 | 1 |
GPS Chack 76/NP | 1 | 0 | 1 |
However, in each of the other three, we have both the edges `Child attendance increases child focus on education` and `Child focus on education increases child inclusion in learning`. We may consider that in a CLD, these two edges entail the causal chain `Child attendance increases child inclusion in learning` (via child focus on education).
Put in terms of the matrix, we would need to add new edges (columns) that make each inferred link explicit.
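One way to make inferred links explicit is to follow every causal chain from each cause and record its ultimate effects. The sketch below is an illustration in plain Python, not the procedure actually used here. It assumes that the polarity of a chain is the product of the polarities along it (so two “decreases” links compose to “increases”), and the function name `expand_causal_chains` is hypothetical.

```python
def expand_causal_chains(edges):
    """Return the direct edges plus every (cause, polarity, effect) implied by a chain.

    Assumption: the polarity of a chain is the product of its link polarities.
    """
    # Index outgoing links by cause node for quick traversal.
    outgoing = {}
    for cause, polarity, effect in edges:
        outgoing.setdefault(cause, []).append((polarity, effect))

    expanded = set()
    for source in outgoing:
        frontier = list(outgoing[source])
        seen = set(frontier)  # (polarity, node) states reached from this source
        while frontier:
            polarity, node = frontier.pop()
            for next_polarity, next_node in outgoing.get(node, []):
                state = (polarity * next_polarity, next_node)
                if state not in seen:  # guarantees termination on feedback loops
                    seen.add(state)
                    frontier.append(state)
        expanded.update((source, p, n) for p, n in seen)
    return expanded


# Direct edges of GGPS 74 NP, read off the presence-absence matrix above.
ggps_74_np = {
    ("C attendance", 1, "C focus ed."),
    ("C focus ed.", 1, "C inclusion L"),
    ("C health", 1, "C attendance"),
    ("C inclusion L", 1, "C attendance"),
    ("C interest ed.", 1, "C attendance"),
}

expanded = expand_causal_chains(ggps_74_np)
print(("C attendance", 1, "C inclusion L") in expanded)  # inferred via C focus ed.
```

Because CLDs contain feedback loops, this kind of expansion also yields links from a node back to itself, and a long enough loop makes almost every node an ultimate effect of almost every other. Whether such links should count is a modelling choice.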
We’ll now use numerical codes for our nodes, or things will get very crowded. We will code “increases” as 1 and “decreases” as -1, and give the nodes the following numerical codes:
node | code |
---|---|
Child focus on education | 1 |
Child interest in education | 2 |
Child inclusion in learning | 3 |
Child results | 26 |
Child health | 29 |
Child attendance | 30 |
When we expand the edgelist to include causal chains, the resulting edgelists are:
And then, adding these inferred links of a cause and its ultimate effect to our presence-absence matrix, the result is:
School | 1,-1,1 | 1,-1,2 | 1,-1,26 | 1,-1,3 | 1,-1,30 | 1,1,1 | 1,1,2 | 1,1,26 | 1,1,3 | 1,1,30 | 2,-1,2 | 2,-1,26 | 2,-1,30 | 2,1,1 | 2,1,2 | 2,1,26 | 2,1,3 | 2,1,30 | 26,-1,26 | 26,-1,30 | 26,1,1 | 26,1,2 | 26,1,26 | 26,1,3 | 26,1,30 | 29,1,1 | 29,1,2 | 29,1,26 | 29,1,3 | 29,1,30 | 3,-1,1 | 3,-1,2 | 3,-1,26 | 3,-1,3 | 3,-1,30 | 3,1,1 | 3,1,2 | 3,1,26 | 3,1,3 | 3,1,30 | 30,-1,1 | 30,-1,2 | 30,-1,3 | 30,-1,30 | 30,1,1 | 30,1,2 | 30,1,26 | 30,1,3 | 30,1,30 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GGCMS QAISER CHOHAN | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
GGPS 74 NP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
GGPS BAHUDI PUR MACHIAN | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
GPS Chack 76/NP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
The pairwise distances resulting from this presence-absence matrix based on ultimate effects are slightly different from those in our original analysis, which only included direct links.
In the future, we can run each of these methods to see how our choice of method affects our conclusions.
Here we use complete edgelists (all nodes) for all RYK and Ghazni CLDs.
This function tests a number of environmental variables against the ordination to see how well they fit. As an initial pass, I’ve run it for each school’s “DISTRICT” and the average summed child responses to sections B, C, D, E, and F. You can see that, for these data, there is a significant difference by district (r-squared 0.308, p=0.018). The D and E variables look like they could possibly become significant once we include more data (that is, they have a moderate r-squared, but p>0.05).
You can see in the cluster plot that, for the most part, each district occupies a different space in the ordination. This is why we get a reasonably high r-squared for this variable.
## [1] 23 7
##
## ***VECTORS
##
## NMDS1 NMDS2 r2 Pr(>r)
## B 0.22978 0.97324 0.1869 0.106
## C 0.39708 0.91779 0.1276 0.217
## D 0.13753 0.99050 0.2274 0.072 .
## E -0.34184 -0.93976 0.2150 0.079 .
## F 0.60921 0.79301 0.1016 0.325
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
##
## ***FACTORS:
##
## Centroids:
## NMDS1 NMDS2
## DISTRICT_NAMEGhazni -2.2696 -1.0259
## DISTRICT_NAMEMoqur 0.7155 -1.3023
## DISTRICT_NAMEQarabagh -0.3490 -1.2626
## DISTRICT_NAMERahim Yar Khan 0.3567 0.8625
## SCHOOL_NAMEAbo Ali Sina -0.4461 -2.3216
## SCHOOL_NAMEBahawodin 0.9377 -1.4399
## SCHOOL_NAMECMS Chah Khaji Wala -0.8269 3.2084
## SCHOOL_NAMEDelaram 0.3853 -0.3175
## SCHOOL_NAMEGBPS 108/P -2.1488 0.7691
## SCHOOL_NAMEGBPS Allah Dina (JAM ABDULLAH) 0.2617 1.0931
## SCHOOL_NAMEGBPS CHAK 222 P -0.7438 -0.1573
## SCHOOL_NAMEGGCMS QAISER CHOHAN -0.5959 0.8781
## SCHOOL_NAMEGGPS 225P 0.0994 0.2378
## SCHOOL_NAMEGGPS 74 NP 0.5152 1.7916
## SCHOOL_NAMEGGPS 91/P (Basti Kot Doctor ) 0.9701 0.6491
## SCHOOL_NAMEGGPS BAHUDI PUR MACHIAN 1.5545 2.9100
## SCHOOL_NAMEGGPS Manzoor Khan Gola (NFS Gulshan Arrain) 3.8526 0.3357
## SCHOOL_NAMEGPS 48/P -1.7055 -1.0137
## SCHOOL_NAMEGPS Chack 76/NP 2.7420 1.4926
## SCHOOL_NAMEGPS Chak 82/NP -0.4966 0.3390
## SCHOOL_NAMEKhalid Bin Walid 0.9252 -1.8589
## SCHOOL_NAMEMirza Khil 0.2895 -1.1148
## SCHOOL_NAMENawabad -0.4902 -1.0846
## SCHOOL_NAMENFS Sawan Awan 1.5155 -0.4586
## SCHOOL_NAMEnone_found -6.8025 1.0196
## SCHOOL_NAMEShahid Faiz Mohammd -0.1106 -0.3817
## SCHOOL_NAMEShahr E Kohna Girl High School -0.3915 -3.7799
## SCHOOL_NAMEZarkashan 0.7096 -0.7956
##
## Goodness of fit:
## r2 Pr(>r)
## DISTRICT_NAME 0.3081 0.018 *
## SCHOOL_NAME 1.0000 1.000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
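The output above has the layout printed by vegan’s `envfit`. Assuming that is the function used, its documentation describes the r-squared for a factor as 1 minus the ratio of within-group to total sums of squares in the ordination, with the p-value obtained by shuffling the group labels. The sketch below illustrates that idea in plain Python. The coordinates and group labels are hypothetical and are not project data, and the function names are invented for the example.

```python
import random


def sum_of_squares(points):
    """Summed squared distance of each point from the centroid of `points`."""
    dims = range(len(points[0]))
    centre = [sum(p[d] for p in points) / len(points) for d in dims]
    return sum((p[d] - centre[d]) ** 2 for p in points for d in dims)


def factor_r2(points, groups):
    """r2 = 1 - SS_within / SS_total for a grouping of ordination points."""
    within = 0.0
    for g in set(groups):
        members = [p for p, label in zip(points, groups) if label == g]
        within += sum_of_squares(members)
    return 1.0 - within / sum_of_squares(points)


def permutation_test(points, groups, permutations=999, seed=1):
    """Observed r2 and the share of label shufflings that fit at least as well."""
    rng = random.Random(seed)
    observed = factor_r2(points, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if factor_r2(points, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical NMDS scores for six CLDs in two made-up districts.
scores = [(-2.0, -1.0), (-1.5, -1.2), (-1.8, -0.7),
          (1.0, 1.0), (1.4, 0.8), (0.9, 1.3)]
district = ["A", "A", "A", "B", "B", "B"]

r2, p_value = permutation_test(scores, district)
print(round(r2, 3), p_value)

# A factor with one level per point leaves no within-group variation.
print(factor_r2(scores, list(range(len(scores)))))
```

The last line shows why a factor such as SCHOOL_NAME can reach an r-squared of exactly 1 without being informative: when each level contains a single point, every point sits on its own centroid.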
What are the most-prevalent causal links?
cause_label | polarity | effect_label | models_with_direct | models_with_indirect |
---|---|---|---|---|
Child focus on education | 1 | Child inclusion in learning | 18 | 2 |
Child interest in education | 1 | Child inclusion in learning | 16 | 5 |
Child attendance | 1 | Child inclusion in learning | 14 | 7 |
Child inclusion in learning | 1 | Child results | 13 | 4 |
Child health | 1 | Child attendance | 12 | 1 |
Child inclusion in learning | 1 | Child interest in education | 11 | 10 |
Child inclusion in learning | 1 | Child focus on education | 8 | 10 |
Lesson learning | 1 | Child inclusion in learning | 8 | 1 |
Learning material | 1 | Child inclusion in learning | 8 | 5 |
Child interest in education | 1 | Child focus on education | 7 | 8 |
Child attendance | 1 | Child focus on education | 7 | 9 |
Child attendance | 1 | Child interest in education | 7 | 11 |
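A table like this can be produced by counting, for each link, the models that draw it directly and the models in which it is only implied by a causal chain. The sketch below shows one way to do that in plain Python. It assumes that “indirect” means “implied by a chain but not drawn directly” and that chain polarity is the product of link polarities. The three-node edge sets come from the smaller presence-absence table earlier in this document, and the `closure` helper is hypothetical.

```python
from collections import Counter


def closure(edges):
    """Direct edges plus all links implied by chaining them (polarities multiply)."""
    result = set(edges)
    while True:
        implied = {
            (a, p * q, d)
            for a, p, b in result
            for c, q, d in edges
            if b == c
        } - result
        if not implied:
            return result
        result |= implied


# Edges among attendance, focus and inclusion, per school.
chain = {("C attendance", 1, "C focus ed."), ("C focus ed.", 1, "C inclusion L")}
models = {
    "GGCMS QAISER CHOHAN": {("C attendance", 1, "C inclusion L")},
    "GGPS 74 NP": set(chain),
    "GGPS BAHUDI PUR MACHIAN": set(chain),
    "GPS Chack 76/NP": set(chain),
}

direct, indirect = Counter(), Counter()
for edges in models.values():
    direct.update(edges)                    # links drawn in the CLD
    indirect.update(closure(edges) - edges)  # links only implied by a chain

# Rank links by how many models draw them directly.
for link in sorted(direct | indirect, key=lambda e: -direct[e]):
    print(link, direct[link], indirect[link])
```

For `Child attendance increases child inclusion in learning` this gives one model with the direct link and three with the indirect link, matching the four-school discussion above.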