This is an RMarkdown version of the tutorial “Performing fuzzy- and crisp set QCA with R: A user-oriented beginner’s guide”, version 07.03.2018. Authors: Dr. Eva Thomann, Ioana-Elena Oana, Stefan Wittwer. Bamberg, March 18.

I use most of the examples and the text from this source, often verbatim. Unlike the tutorial, I use real data so that the examples can actually be computed.

Data

We use the data sets LR, LC, LM and LF from the QCA package; the same data are often used by Ragin and others.

library(QCA)
data(LR)
data(LC)
data(LM)
data(LF)

LR contains the raw data; LC is the crisp-set version, LM the multi-value version, and LF the fuzzy-set version of LR.

Each of them is a data frame with 18 rows (one per country) and 6 columns: the conditions DEV, URB, LIT, IND and STB, and the outcome SURV.

Switching to the naming style in the tutorial:

mydata <- LR

Chapter 4 Calibration and operation with sets

(Note. Chapters 1-3 are mainly on generic R and are skipped here.)

4.1 Calibration

4.1.1 Fuzzy sets

Direct method of calibration

We want to practice calibration on the DEV variable. First we copy it into the variable name used in the tutorial, rawvar:

mydata$rawvar <- mydata$DEV

(To remove DEV we could run mydata$DEV <- NULL, but let’s not.)

The values in this variable are:

 [1]  720 1098  586  468  590  983  795  390  424  662  517 1008  350  320  331  367  897 1038

To calibrate DEV, we should in principle rely on theory. Here, we simply pick values that look reasonable given the 18 observed values:

  • e - threshold for nonmembership: 300
  • c - crossover point: 600
  • i - full membership: 800
mydata$MYFUZZYSET <- round(calibrate(mydata$rawvar, type = "fuzzy", thresholds = "e=300, c=600, i=800", logistic=TRUE), digits=2)
mydata$MYFUZZYSET
 [1] 0.85 1.00 0.47 0.21 0.48 1.00 0.95 0.11 0.15 0.71 0.31 1.00 0.08 0.06 0.07 0.09 0.99 1.00

(NOTE. The naming convention is to use uppercase names for sets and lowercase names for negated sets; see below.)

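The data in the fuzzy set vary between zero and one. logistic = TRUE is the default; to use a linear method, specify logistic = FALSE.

With the logistic method, membership follows an S-shaped curve through the three anchors: e is mapped to 0.05, c to 0.5 and i to 0.95, and scores further out approach, but never reach, 0 and 1 (after rounding they may still be displayed as 0.00 or 1.00). With the linear method, membership rises along straight lines from 0 at e to 0.5 at c and on to 1 at i, and is exactly 0 below e and exactly 1 above i.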
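To make the difference between the two shapes concrete, here is a base-R sketch of both. The helpers calib_logistic() and calib_linear() are hypothetical (they are not QCA functions), and the logistic version assumes the scaling that QCA’s calibrate() appears to use by default, with e at 0.05 and i at 0.95 (log-odds of ±log(19)). With the thresholds used above, it reproduces the rounded MYFUZZYSET values printed earlier.

```r
# Sketch only: hand-written versions of the two calibration shapes.
# excl, cross, incl correspond to the thresholds e, c, i used above.
calib_logistic <- function(x, excl, cross, incl) {
  # distance from the crossover, rescaled separately above and below it
  scaled <- ifelse(x >= cross, (x - cross) / (incl - cross),
                               (x - cross) / (cross - excl))
  plogis(log(19) * scaled)  # inverse logit: e -> 0.05, c -> 0.5, i -> 0.95
}

calib_linear <- function(x, excl, cross, incl) {
  m <- ifelse(x < cross,
              0.5 * (x - excl) / (cross - excl),          # 0 at e, 0.5 at c
              0.5 + 0.5 * (x - cross) / (incl - cross))   # 0.5 at c, 1 at i
  pmin(pmax(m, 0), 1)  # clamp: exactly 0 below e and 1 above i
}

# Raw DEV scores as listed above
rawvar <- c(720, 1098, 586, 468, 590, 983, 795, 390, 424,
            662, 517, 1008, 350, 320, 331, 367, 897, 1038)

# Logistic: matches the MYFUZZYSET values printed above
round(calib_logistic(rawvar, excl = 300, cross = 600, incl = 800), 2)
# Linear alternative with the same anchors
round(calib_linear(rawvar, excl = 300, cross = 600, incl = 800), 2)
```

Comparing the two printed vectors shows the main design difference: beyond e or i, the linear version assigns exactly 0 or 1, whereas the logistic version only approaches these values.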

Values can be overwritten by hand, if needed, like so:

mydata$MYFUZZYSET[mydata$rawvar == 3] <- 0.05

Doing so requires a good reason, of course, and it reduces reproducibility.

Multi-value fuzzy sets

This is a ‘manual’ method, essentially a recoding of a variable. For instance:

mydata$MYFUZZYSET <- NA
mydata$MYFUZZYSET[(mydata$rawvar >= 1)&(mydata$rawvar <= 400)] <- 0
mydata$MYFUZZYSET[(mydata$rawvar >= 401)&(mydata$rawvar <= 800)] <- 0.3
mydata$MYFUZZYSET[(mydata$rawvar >= 801)] <- 1
mydata$MYFUZZYSET
 [1] 0.3 1.0 0.3 0.3 0.3 1.0 0.3 0.0 0.3 0.3 0.3 1.0 0.0 0.0 0.0 0.0 1.0 1.0
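The same three-level recoding can be sketched with base R’s cut(), which assigns each raw score to an interval and uses the labels as membership scores. This is an alternative, not the tutorial’s code; note that with these breaks cut() also covers raw scores below 1 or between 400 and 401, which the manual version above leaves as NA. For the whole-number scores here, the result is identical.

```r
# Raw DEV scores as listed above
rawvar <- c(720, 1098, 586, 468, 590, 983, 795, 390, 424,
            662, 517, 1008, 350, 320, 331, 367, 897, 1038)

levels_mv <- cut(rawvar,
                 breaks = c(-Inf, 400, 800, Inf),  # (-Inf,400], (400,800], (800,Inf)
                 labels = c(0, 0.3, 1))            # membership score per interval
MYFUZZYSET <- as.numeric(as.character(levels_mv))  # factor labels -> numbers
MYFUZZYSET
```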

Avoid 0.5 as a value! A case with membership 0.5 is neither more in nor more out of the set.

Crisp sets

To calibrate a crisp set, we only need to specify the threshold that acts as the crossover point, e.g. 600. Of course, this can also be done directly on the data by recoding.

mydata$CRISPSET <- calibrate(mydata$rawvar, type = "crisp", thresholds = 600, include = TRUE)
mydata$CRISPSET
 [1] 1 1 0 0 0 1 1 0 0 1 0 1 0 0 0 0 1 1

If you set include = TRUE, cases whose raw value is exactly equal to the threshold (here: 600) are assigned set membership 1. If it is set to FALSE, they are assigned set membership 0.
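In base R, crisp calibration at a single threshold boils down to a comparison. The sketch below assumes that include = TRUE corresponds to “at or above the threshold” (>=) and include = FALSE to “strictly above” (>), as described in the paragraph above; the raw scores are the DEV values listed earlier.

```r
# Raw DEV scores as listed above
rawvar <- c(720, 1098, 586, 468, 590, 983, 795, 390, 424,
            662, 517, 1008, 350, 320, 331, 367, 897, 1038)

crisp_incl <- as.numeric(rawvar >= 600)  # assumed equivalent of include = TRUE
crisp_excl <- as.numeric(rawvar > 600)   # assumed equivalent of include = FALSE
crisp_incl                               # same 0/1 pattern as CRISPSET above

# The two rules differ only for a case exactly on the threshold
c(include_true = as.numeric(600 >= 600), include_false = as.numeric(600 > 600))
```

Since no case here has a raw score of exactly 600, both rules give the same result for these data.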

Avoid 0.5 as value!

Tips for calibration

Labelling sets

QCA lets you negate sets using lowercase notation in many of its commands (see 4.3 and Chapter 6), so whether you use upper- or lowercase letters really matters. To avoid problems and confusion, always use uppercase letters for labelling your sets, and lowercase letters for their negation.

Keep labels for sets short.

Finding calibration thresholds

The QCA package does offer data-driven ways to find calibration thresholds. They are not included in this manual because the resulting sets are very hard to interpret. The most important analytic choice is the threshold that establishes the difference in kind. Whenever possible, use conceptual and theoretical criteria for the crossover point and avoid purely empirical criteria such as descriptive statistics. Avoid using the median as crossover point: the resulting set can usually only be interpreted as the set of “cases with values equal to or higher than those of 50% of the other cases”. Similarly, if you use the sample mean (e.g., for unemployment) as crossover point, the conceptual meaning of the set is “unemployment above average in the cases observed”. All this does not mean that empirical criteria are unimportant for determining calibration thresholds. In particular, you should avoid overly skewed sets (4.2.2 and 8.1) and empirical cases on the crossover point (4.2.3).

Technically, it is easy to calibrate sets. However, calibration is essentially a process of concept formation and definition that interacts decisively with your theory, research design and results. Therefore, calibration is one of the most demanding analytic phases of a QCA analysis. Take a lot of time for calibration, and try out different options. If you find that several different crossover points are equally plausible, try them all and see how that affects the analysis. This is called a robustness test and a very good thing to do (see also Maggetti and Levi-Faur 2013). See 4.2.1 and the online appendix of Hinterleitner, Sager and Thomann (2016) for an illustration.

After calibrating your raw data, save the calibrated sets in a new dataset. Some commands do not work unless all variables in the dataset range from 0 to 1.

myfuzzydata <- subset(mydata, select = c("MYFUZZYSET", "CRISPSET"))
write.csv2(myfuzzydata, "myfuzzydata.csv")

4.2 Calibration diagnostics

You can visualize the calibration with an XY plot, which shows not only whether there are cases at the 0.5 threshold but also how the cases are distributed in the set (see also Chapter 5 on graphs).

A basic way of doing this is to plot a set against its raw scores and add a horizontal line at membership 0.5, as well as a vertical line at the raw value that marks the crossover point (here: 600).

mydata$MYFUZZYSET <- round(calibrate(mydata$rawvar, type = "fuzzy", thresholds = "e=300, c=600, i=800", logistic=TRUE), digits=2)
plot(mydata$rawvar, mydata$MYFUZZYSET, pch=18, col="black", main='MYFUZZYSET',
xlab=' Raw score ',
ylab=' Fuzzy score ')
abline(h=0.5, col="black") 
abline(v= 600, col="black")

In the example below, we plot the calibration of “MYFUZZYSET”. The crossover point is set at 600 and indicated by a vertical black line (v = 600); we have also added a horizontal black line at set membership 0.5 (h = 0.5, optional). In addition, the last two command lines (optional) add two dotted vertical lines that mark two alternative plausible crossover points (500 and 700), which we decided could be tested for robustness. The plot shows whether moving the crossover point from 600 to 500 or to 700 would change the qualitative set membership of an empirical case: this happens for every dot that lies between the solid vertical line and one of the dotted lines.

plot(mydata$rawvar, mydata$MYFUZZYSET, pch=18, col="black", main='MYFUZZYSET',
   xlab=' Raw score ',
   ylab=' Fuzzy score ')
abline(h=0.5, col="black")
abline(v= 600, col="black")
abline(v= 500, col="black", lty="dotted") 
abline(v= 700, col="black", lty="dotted")
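Instead of reading such changes off the plot, you can also list them. A minimal sketch, assuming the logistic calibration used above: because the logistic curve passes 0.5 exactly at the crossover, a case is more in than out of the set exactly when its raw score exceeds the crossover, so comparing the raw scores with each candidate crossover is enough. The object names are illustrative.

```r
# Raw DEV scores as listed above
rawvar <- c(720, 1098, 586, 468, 590, 983, 795, 390, 424,
            662, 517, 1008, 350, 320, 331, 367, 897, 1038)
crossovers <- c(500, 600, 700)   # regular crossover plus two alternatives

# One column per crossover: is the case (qualitatively) in the set?
in_set <- sapply(crossovers, function(cp) rawvar > cp)
colnames(in_set) <- paste0("c=", crossovers)

# Cases whose qualitative membership depends on the crossover chosen
changes <- which(apply(in_set, 1, function(r) length(unique(r)) > 1))
changes
rawvar[changes]
```

Here, the four cases with raw scores between 500 and 700 (517, 586, 590 and 662) change sides depending on the crossover chosen, which is exactly what a robustness test would need to follow up on.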

In the following plot, we test only one alternative crossover point (600 = regular crossover point, 500 = alternative crossover point), shown as a dotted line. We also add, using points(), the membership scores from an alternative calibration, drawn in blue; this alternative uses the linear calibration option (logistic = FALSE) with the original thresholds.

mydata$MYALTERNATIVESET <- round(calibrate(mydata$rawvar, type = "fuzzy", thresholds = "e=300, c=600, i=800", logistic=FALSE), digits=2)
plot(mydata$rawvar, mydata$MYFUZZYSET, pch=18, col="black", main=' MYFUZZYSET ',
xlab=' Raw score ',
ylab=' Fuzzy score ')
points(mydata$rawvar, mydata$MYALTERNATIVESET, pch=18, col="blue")
abline(h=0.5, col="black")
abline(v= 600, col="black")
abline(v= 500, col="black", lty="dotted")

4.2.2 Skewness

To check the skewness of your set, count the cases with set membership above 0.5 and check whether they make up a disproportionately large or small share of your data.

skewMYSET <- as.numeric(mydata$MYFUZZYSET > 0.5) 
skewMYSET
 [1] 1 1 0 0 0 1 1 0 0 1 0 1 0 0 0 0 1 1
sum(skewMYSET)
[1] 8

as.numeric() turns the logical test result into zeros and ones, so their sum is the number of cases with membership greater than 0.5. Eight cases out of 18 do not seem too imbalanced.

Obtain the proportion of cases with set membership above 0.5 (the value in the column labelled 1):

prop.table(table(skewMYSET))
skewMYSET
        0         1 
0.5555556 0.4444444 
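Because the mean of a logical vector is the share of TRUE values, the same proportion can be obtained in one step with mean(). The sketch below re-enters the rounded MYFUZZYSET values printed earlier so that it runs on its own.

```r
# Rounded MYFUZZYSET values as printed above
MYFUZZYSET <- c(0.85, 1.00, 0.47, 0.21, 0.48, 1.00, 0.95, 0.11, 0.15,
                0.71, 0.31, 1.00, 0.08, 0.06, 0.07, 0.09, 0.99, 1.00)
sum(MYFUZZYSET > 0.5)    # number of cases more in than out of the set
mean(MYFUZZYSET > 0.5)   # their share: the value under 1 in prop.table()
```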

Identify the names of the cases with set membership above 0.5:

rownames(subset(mydata, MYFUZZYSET > 0.5))
[1] "AU" "BE" "FR" "DE" "IE" "NL" "SE" "UK"

4.2.3 Cases on crossover point

You can check the number of cases that have membership of 0.5 in a set:

checkMYSET <- as.numeric(mydata$MYFUZZYSET == 0.5) 
sum(checkMYSET)
[1] 0

It should be zero.

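If the value is not zero for a set, you can identify the offending cases. Here is a way of doing this for multiple sets with one command, where checkMYSET1, checkMYSET2 and checkMYSET3 are indicator vectors created as above for three different sets: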

rownames(subset(mydata, checkMYSET1==1 | checkMYSET2==1 | checkMYSET3==1))

To exclude these cases, run something like this (note that the trailing comma is needed):

mydata = mydata[mydata$MYFUZZYSET != 0.5,]
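When several calibrated sets are stored as columns of one data frame, you can flag and drop all cases sitting on 0.5 in any set without creating one indicator vector per set. This is a base-R sketch on a small hypothetical data frame (the values and the names SET1 to SET3 are invented for illustration). Testing with == 0.5 is safe here because the scores are rounded to two digits and 0.5 is exactly representable as a floating-point number.

```r
# Hypothetical calibrated data: four cases, three sets (values invented)
sets <- data.frame(SET1 = c(0.2, 0.5, 0.9, 0.7),
                   SET2 = c(0.6, 0.3, 0.5, 0.8),
                   SET3 = c(0.1, 0.4, 0.6, 0.9),
                   row.names = c("A", "B", "C", "D"))

# TRUE for every case with a score of exactly 0.5 in at least one set
on_crossover <- rowSums(sets == 0.5) > 0
rownames(sets)[on_crossover]            # offending cases

# Keep only the other cases (note the trailing comma)
sets_clean <- sets[!on_crossover, ]
rownames(sets_clean)
```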

In summary, here is template code you could use as a standard procedure when calibrating a fuzzy set with the direct calibration method (var stands for your raw variable; adapt the thresholds and the crossover line to your data):

# descriptive statistics (describe() from the psych package gives more detail)
summary(mydata$var)

# check for missings (% of cases with missings)
varmiss <- as.numeric(is.na(mydata$var))
prop.table(table(varmiss))

# calibration (fuzzy set, direct calibration method, rounded to 2 digits)
mydata$MYFUZZYSET <- round(calibrate(mydata$var, type = "fuzzy", thresholds = "e=1, c=2.5, i=4", logistic = TRUE), digits=2)

# number of cases on the crossover point
checkMYSET <- as.numeric(mydata$MYFUZZYSET == 0.5)
sum(checkMYSET)

# check for skewness (% of cases with membership > 0.5)
skewMYSET <- as.numeric(mydata$MYFUZZYSET > 0.5)
prop.table(table(skewMYSET))

# visualize calibration
plot(mydata$var, mydata$MYFUZZYSET, pch=18, col="black",
  main='MYFUZZYSET',
  xlab=' Raw score ',
  ylab=' Fuzzy score ')
abline(h=0.5, col="black")
abline(v=2.5, col="black")   # vertical line at the crossover (c=2.5)

4.3 Calculating membership in operations on sets

If you create a new set (a negated set, a disjunction or a conjunction, or a combination of these), you may want to store it as an object (NEWSET <- operation), so that you can use it for further analysis. You can also (but don’t have to) tie it to the dataset by using the dollar sign (mydata$NEWSET <- operation), so that it appears as a new variable in the dataset. You have several options to calculate the cases’ membership in combined sets.

A. Automatically, using compute()

The compute() command allows you to directly calculate membership in any expression you specify, e.g. in truth table rows or in the solution term. You can use either the tilde (~) or lowercase notation to negate sets, but be consistent. You can also skip the * sign if you prefer. Below we calculate the cases’ membership in the set myset1*MYSET2 + myset3*MYSET4*myset5 + MYSET6 (placeholder names) and store the values as an object “sol” for further use.

# use the fuzzy-set data for these operations
mydata <- LF

This is directly from the help text:

compute("DEV*ind + URB*STB", data = LF)
 [1] 0.27 0.89 0.91 0.16 0.58 0.19 0.31 0.09 0.13 0.72 0.34 0.99 0.02 0.01 0.03 0.20 0.33 0.98
sol <- compute("myset1*MYSET2 + myset3*MYSET4*myset5 + MYSET6", data=mydata)  # template: replace the placeholder set names with sets from your data
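Expressions like these are evaluated with the standard fuzzy-set operators: negation is 1 − x, conjunction (*) is the case-wise minimum, and disjunction (+) is the case-wise maximum. The base-R sketch below evaluates A*b + C by hand; the helpers f_not(), f_and() and f_or() and the membership values are hypothetical, not part of QCA.

```r
# Hypothetical fuzzy memberships of three cases in sets A, B and C
A <- c(0.8, 0.3, 0.6)
B <- c(0.2, 0.9, 0.4)
C <- c(0.1, 0.4, 0.7)

f_not <- function(x) 1 - x        # negation (~B, or b in lowercase notation)
f_and <- function(...) pmin(...)  # conjunction (*): case-wise minimum
f_or  <- function(...) pmax(...)  # disjunction (+): case-wise maximum

# Membership in A*b + C
sol_sketch <- f_or(f_and(A, f_not(B)), C)
sol_sketch
```

For the first case, A*b = min(0.8, 1 − 0.2) = 0.8, and taking the maximum with C = 0.1 leaves 0.8.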

B. Via logical operators

To calculate the cases’ membership in a negated set, you can simply subtract the set from 1:

mydata$dev <- 1-mydata$DEV
mydata$DEV
 [1] 0.81 0.99 0.58 0.16 0.58 0.98 0.89 0.04 0.07 0.72 0.34 0.98 0.02 0.01 0.01 0.03 0.95 0.98
mydata$dev
 [1] 0.19 0.01 0.42 0.84 0.42 0.02 0.11 0.96 0.93 0.28 0.66 0.02 0.98 0.99 0.99 0.97 0.05 0.02

Note the use of uppercase and lowercase letters. Subtracting from 1 also lets you negate a set directly inside another command, without first creating the negated set as a separate object. We do this here:

path1 <- fuzzyand(mydata$DEV, mydata$URB, 1-mydata$LIT)
path1
 [1] 0.01 0.02 0.02 0.02 0.01 0.01 0.01 0.04 0.07 0.02 0.10 0.01 0.02 0.01 0.01 0.03 0.01 0.01

The tutorial does not seem to show the computation of path2, but the logical union (fuzzy OR) can also be illustrated with other sets:

myunion <- fuzzyor(mydata$DEV, mydata$URB)
myunion
 [1] 0.81 0.99 0.98 0.16 0.58 0.98 0.89 0.09 0.16 0.72 0.34 1.00 0.17 0.02 0.03 0.30 0.95 0.99

This also works on crisp sets, which should be easier to follow:

# use the crisp set for these operations
mydata <- LC
# calculate negated set
mydata$dev <- 1-mydata$DEV
mydata$DEV
 [1] 1 1 1 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1
mydata$dev
 [1] 0 0 0 1 0 0 0 1 1 0 1 0 1 1 1 1 0 0

Fuzzy AND:

mydata$DEV
 [1] 1 1 1 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1
mydata$URB
 [1] 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1
fuzzyand(mydata$DEV, mydata$URB)
 [1] 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1

A more complex AND, including a negated set on the fly:

path1 <- fuzzyand(mydata$DEV, mydata$URB, 1-mydata$STB)
path1
 [1] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

Logical OR:

myunion <- fuzzyor(mydata$DEV, mydata$URB)
mydata$DEV
 [1] 1 1 1 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1
mydata$URB
 [1] 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1
myunion
 [1] 1 1 1 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1
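On crisp (0/1) data, the minimum and maximum used by fuzzyand() and fuzzyor() reduce to the Boolean AND and OR. The base-R sketch below re-enters the crisp DEV and URB values printed above and checks this. It also shows why the union equals DEV: every case that is in URB is also in DEV.

```r
# Crisp DEV and URB values as printed above
DEV <- c(1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1)
URB <- c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)

# Minimum/maximum give the same result as Boolean AND/OR on 0/1 data
identical(pmin(DEV, URB), as.numeric(DEV & URB))  # TRUE
identical(pmax(DEV, URB), as.numeric(DEV | URB))  # TRUE

# URB is a subset of DEV, so DEV OR URB is simply DEV
all(URB <= DEV)                                   # TRUE
identical(pmax(DEV, URB), DEV)                    # TRUE
```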

(Some content that requires more understanding of the method is skipped here.)

Chapter 5. XY-Plots

These are plots of a condition against the outcome. You have four commands at your disposal to produce XY plots in R. For QCA, we will usually work with either xy.plot() (for plotting single conditions or customized predefined sets; XYplot() from the QCA package also works) or pimplot() (for plotting the solution term).

1. xy.plot (package SetMethods)

library(SetMethods)
mydata <- LF

First plot:

xy.plot("DEV", "SURV", data = mydata, labs = rownames(mydata), necessity=TRUE, 
        jitter = TRUE, main = "DEV as necessary for SURV", xlab = "DEV", ylab = "SURV")

xy.plot() also automatically adds lines for the quadrants and a black diagonal, and it reports the parameters of fit (consistency, coverage, PRI or RoN), computed for necessity (necessity=TRUE) or for sufficiency (skip the necessity option or set necessity=FALSE). Use “1-” for negating sets, for example 1-mydata$myy. The plot also shows Haesebrouck’s consistency value in addition to the other parameters of fit (see Haesebrouck 2015). What these values mean is explained in Chapter 6.

The relation above does not suggest that DEV is necessary for SURV. Let’s try another relation:

xy.plot("LIT", "SURV", data = mydata, labs = rownames(mydata), necessity=TRUE, 
        jitter = TRUE, main = "LIT as necessary for SURV", xlab = "LIT", ylab = "SURV")

Here we see a much more …

More graphic functions follow after we have covered sufficiency and necessity.

Chapter 6. Analysis of necessity

See file Dusa2019-chp5-necessity.Rmd for more conceptual explanations.

If you want to test “deductively” whether single conditions, or theoretically defined disjunctions representing higher-order constructs, are necessary, you can use the pof() command (package: QCA) with the option relation = “nec” (with relation = “suf”, the command tests the sufficiency of the listed conditions instead). To test several conditions, first create an object containing the conditions you want to test. To negate conditions or the outcome, use either a tilde or 1-. The command reports the consistency (inclN), relevance (RoN) and coverage (covN) of necessity for each listed condition. In the example below, we test the necessity of the conditions DEV, URB and LIT for the negated outcome.

mydata <- LF

Subsetting on three conditions:

conds <- subset(mydata, select = c("DEV", "URB", "LIT"))

Testing necessity for outcome ~SURV (which, remember, is not symmetrical to the outcome SURV):

pof(conds, '~SURV', mydata, relation = "nec")

        inclN   RoN   covN  
--------------------------- 
1  DEV  0.322  0.593  0.334 
2  URB  0.239  0.766  0.382 
3  LIT  0.573  0.387  0.414 
--------------------------- 

For some reason this function returns the data first, which is annoying. It can be changed by using $ notation to get data frames for the conds, see next example. I think that the example above is not fully correct for the current implementation of pof().

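What does this mean? All three consistency scores of necessity (inclN) lie far below the usual threshold of 0.9 (see 6.2), so none of DEV, URB or LIT qualifies as a necessary condition for ~SURV.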
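For reference, the three parameters reported by pof() can be written as short formulas (following Schneider and Wagemann 2012 for RoN). The base-R sketch below computes them for a hypothetical condition X and outcome Y; the numbers are invented for illustration. To test necessity for a negated outcome such as ~SURV, replace Y by 1 - Y.

```r
# Hypothetical memberships of four cases (illustration only)
X <- c(0.9, 0.7, 0.4, 0.2)   # condition
Y <- c(0.8, 0.6, 0.5, 0.1)   # outcome

overlap <- sum(pmin(X, Y))                 # membership in X*Y

inclN <- overlap / sum(Y)                  # consistency of necessity
covN  <- overlap / sum(X)                  # coverage of necessity
RoN   <- sum(1 - X) / sum(1 - pmin(X, Y))  # relevance of necessity

round(c(inclN = inclN, RoN = RoN, covN = covN), 3)
```

A condition with inclN close to 1 is a candidate necessary condition; a low RoN then warns that it may be trivial, because almost all cases are full members of it anyway.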

You can also use this command to test the necessity of a single condition (here DEV, passed as mydata$DEV). (Note that the expression given on p. 37 of the tutorial, which would suggest simply putting DEV in position 1, throws an error.)

pof(mydata$DEV, '~SURV', mydata, relation = "nec")

        inclN   RoN   covN  
--------------------------- 
1  DEV  0.322  0.593  0.334 
--------------------------- 

You can also use pof() for complex condition sets, using either lowercase notation or tildes to negate conditions. For example, to test whether an expression of the form “COND1 + cond2*COND3”, here DEV + lit*IND, is necessary (<=) for the outcome SURV:

pof("DEV + lit*IND <= SURV", data = mydata)

               inclN   RoN   covN  
---------------------------------- 
1  DEV         0.831  0.811  0.775 
2  lit*IND     0.049  0.947  0.311 
3  expression  0.842  0.751  0.726 
---------------------------------- 

By using =>, you can test the same for sufficiency.

You can also make an XY plot that integrates the parameters of fit, for necessity (necessity=TRUE) but also for sufficiency (necessity=FALSE). This lets you diagnose visually both deviant cases (for necessity, those in the upper-left quadrant) and trivialness. We have done this before, but here it is again for ease of reference:

xy.plot("DEV", "SURV", data = mydata, necessity=TRUE)

6.2 SuperSubset procedure

The superSubset() command (package: QCA) “inductively” identifies all supersets of the outcome: single conditions as well as conjunctions and disjunctions (unions) of sets. Using this command, you can basically skip the steps described in 6.1. The option incl.cut specifies a minimum consistency threshold that the conditions need to pass (usually 0.9 for fuzzy sets and 1.0 for crisp sets). With cov.cut, you can also specify a coverage cut-off (here: 0.6); necessary conditions with lower coverage are deemed trivial and do not appear in the list. Use a tilde ~ to negate the outcome, e.g. outcome = “~OUTCOME”.

superSubset(mydata, outcome = "SURV",
  conditions = "DEV, URB, LIT, IND, STB",
  incl.cut = 0.9, cov.cut = 0.6)

                inclN   RoN   covN  
----------------------------------- 
1  LIT          0.991  0.509  0.643 
2  STB          0.920  0.680  0.707 
3  LIT*STB      0.915  0.800  0.793 
4  DEV+URB+IND  0.903  0.704  0.716 
----------------------------------- 

This command will give you a lot of supersets, but to decide whether they can be deemed necessary, you will still have to 1) check for deviant cases consistency in kind (plot the result), 2) check for empirical relevance / trivialness (Goertz 2006), and 3) identify whether the superset makes theoretical sense as a necessary condition, that is, whether the sets combined with the logical OR represent some higher-order concept (see Schneider and Wagemann 2012). Note also that the different supersets produced by this command are alternatives to each other: for example, if A*B is necessary, in the output you will find A*B, but also A, and also B. In summary, this command is useful because it gives you all potential necessary conditions in one go; but do not use it for mindless data-mining.

If you store the results of superSubset() as an object (here: nec), you can plot all compound necessary conditions by using pimplot():

nec <- superSubset(mydata, outcome = "SURV",
  conditions = "DEV, URB, LIT, IND, STB",
  incl.cut = 0.9, cov.cut = 0.6)

pimplot(data=mydata, results=nec, outcome="SURV", necessity=TRUE, 
        all_labels=TRUE, jitter=TRUE)

Use the back arrow above the plot window to browse all plots. Each plot also reports all parameters of fit (including RoN), and the cases with membership > 0.5 in the outcome are labelled.

Chapter 7 Analysis of sufficiency

Testing for single sufficient conditions

While the QCA technique always uses the truth table procedure for the analysis of sufficiency, in principle the sufficiency (consistency, coverage and PRI) of single conditions can also be (“deductively”) tested using the pof() command of the QCA package (just specify relation = “suf”), and visualized using, for instance, the xy.plot() command (again, specify necessity=FALSE) (see section 6.1). Alternatively, you can use QCAfit() of the SetMethods package (which additionally gives you the Haesebrouck consistency and also works for necessity, specify necessity=TRUE). You can negate the outcome by specifying neg.out=TRUE. If you don’t want to label your condition, skip cond.lab= “COND”.

QCAfit(LF$STB, LF$SURV, cond.lab= "STB", necessity=FALSE, neg.out=FALSE)
    Cons.Suf Cov.Suf   PRI Cons.Suf(H)
STB    0.707    0.92 0.664        0.65

You can also do this for several conditions at once. Just build an object (here: conds) first that contains your conditions.

conds <- subset(LF, select = c("DEV", "URB", "LIT"))
QCAfit(conds, LF$SURV, cond.lab= c("DEV", "URB", "LIT"), necessity=FALSE, neg.out=FALSE)
     Cons.Suf Cov.Suf    PRI Cons.Suf(H)
DEV    0.7746  0.8310 0.7431      0.7198
URB    0.7714  0.5387 0.7202      0.6903
LIT    0.6428  0.9906 0.5868      0.5643
~DEV   0.2743  0.2852 0.0982      0.2584
~URB   0.4017  0.5681 0.3047      0.3697
~LIT   0.1684  0.0962 0.0000      0.1598
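The sufficiency parameters mirror those of necessity: consistency divides the overlap of X and Y by the condition, coverage divides it by the outcome, and PRI discounts the part of the overlap that is also a member of ~Y. This is why STB’s Cons.Suf (0.707) and Cov.Suf (0.92) above reappear, swapped, as its covN and inclN in the superSubset() output of 6.2. The base-R sketch below uses hypothetical memberships invented for illustration.

```r
# Hypothetical memberships of four cases (illustration only)
X <- c(0.9, 0.7, 0.4, 0.2)   # condition
Y <- c(0.8, 0.6, 0.5, 0.1)   # outcome

overlap <- sum(pmin(X, Y))          # membership in X*Y
both    <- sum(pmin(X, Y, 1 - Y))   # membership in X*Y*~Y

inclS <- overlap / sum(X)                    # consistency of sufficiency
covS  <- overlap / sum(Y)                    # coverage of sufficiency
PRI   <- (overlap - both) / (sum(X) - both)  # proportional reduction in inconsistency

round(c(inclS = inclS, covS = covS, PRI = PRI), 3)
```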

For combinations of conditions, we need truth tables.

7.2 Building the truth table and logical minimization

Very important: the zeros and ones in truth tables are not to be confused with crisp set values! They indicate whether a condition is present (1) or absent (0) in a logical configuration, not the cases’ set membership.

First you want to figure out where to set the raw consistency threshold. For this, you can produce a basic truth table that shows raw consistencies (sorted in descending order) and lists the individual cases (package: QCA).

ttSURV <- truthTable(data=LF, outcome = "SURV", conditions = "DEV, URB, LIT, IND, STB",
              incl.cut=1.00, sort.by="incl, n", complete=FALSE, show.cases=TRUE) 
ttSURV 

  OUT: output value
    n: number of cases in configuration
 incl: sufficiency inclusion score
  PRI: proportional reduction in inconsistency

     DEV URB LIT IND STB   OUT    n  incl  PRI   cases      
32    1   1   1   1   1     0     4  0.904 0.886 BE,CZ,NL,UK
22    1   0   1   0   1     0     2  0.804 0.719 FI,IE      
24    1   0   1   1   1     0     2  0.709 0.634 FR,SE      
 6    0   0   1   0   1     0     1  0.529 0.228 EE         
 5    0   0   1   0   0     0     2  0.521 0.113 HU,PL      
31    1   1   1   1   0     0     1  0.445 0.050 DE         
23    1   0   1   1   0     0     1  0.378 0.040 AU         
 2    0   0   0   0   1     0     2  0.278 0.000 IT,RO      
 1    0   0   0   0   0     0     3  0.216 0.000 GR,PT,ES   

It seems that all outcome values have been coded to zero.
Suggestion: lower the inclusion score for the presence of the outcome,
the relevant argument is "incl.cut" which now has a value of 1.

Following this advice and lowering incl.cut to the generally recommended minimum inclusion score of 0.8, we get:

ttSURV <- truthTable(data=LF, outcome = "SURV", conditions = "DEV, URB, LIT, IND, STB",
              incl.cut=.80, sort.by="incl, n", complete=FALSE, show.cases=TRUE) 
ttSURV

  OUT: output value
    n: number of cases in configuration
 incl: sufficiency inclusion score
  PRI: proportional reduction in inconsistency

     DEV URB LIT IND STB   OUT    n  incl  PRI   cases      
32    1   1   1   1   1     1     4  0.904 0.886 BE,CZ,NL,UK
22    1   0   1   0   1     1     2  0.804 0.719 FI,IE      
24    1   0   1   1   1     0     2  0.709 0.634 FR,SE      
 6    0   0   1   0   1     0     1  0.529 0.228 EE         
 5    0   0   1   0   0     0     2  0.521 0.113 HU,PL      
31    1   1   1   1   0     0     1  0.445 0.050 DE         
23    1   0   1   1   0     0     1  0.378 0.040 AU         
 2    0   0   0   0   1     0     2  0.278 0.000 IT,RO      
 1    0   0   0   0   0     0     3  0.216 0.000 GR,PT,ES   

The truth table command provides several options:

  • We set the raw consistency threshold using incl.cut. If you skip this option, then all outcomes are coded 0 except for rows with consistency 1.
  • n.cut is used to specify a frequency threshold (by default: 1).
  • The sort.by option can be set such that the rows are ordered by raw consistency (“incl”), by the N (“n”), or by both (as done here). With decreasing = FALSE, the truth table rows are ordered in ascending (instead of descending) order, e.g., by ascending raw consistency.
  • With the show.cases option, we can choose whether to list the single cases contained in each truth table row (TRUE) or not (FALSE).
  • If the complete option is set to TRUE, then all logical remainders are also displayed; if FALSE, then only empirically observed truth table rows are displayed.
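The coding of the OUT column can be sketched in a few lines of base R. This is a simplification, not truthTable()’s own code: rows with fewer cases than n.cut become remainders (“?”), and the remaining rows get 1 if their raw consistency reaches incl.cut and 0 otherwise; further options of the real function are ignored. The n and incl values are copied from the sorted truth table for SURV above, and code_out() is a hypothetical helper.

```r
# Rows, n and raw consistency (incl) from the sorted truth table for SURV above
tt <- data.frame(row  = c(32, 22, 24, 6, 5, 31, 23, 2, 1),
                 n    = c(4, 2, 2, 1, 2, 1, 1, 2, 3),
                 incl = c(0.904, 0.804, 0.709, 0.529, 0.521,
                          0.445, 0.378, 0.278, 0.216))

# Hypothetical helper: simplified OUT coding
code_out <- function(tt, incl.cut = 1, n.cut = 1) {
  ifelse(tt$n < n.cut, "?",                      # too few cases: remainder
         ifelse(tt$incl >= incl.cut, "1", "0"))  # consistent enough: 1
}

code_out(tt, incl.cut = 1)               # all "0", as in the first table
code_out(tt, incl.cut = 0.8)             # rows 32 and 22 get "1"
code_out(tt, incl.cut = 0.8, n.cut = 3)  # only rows 32 and 1 remain observed
```

The last call illustrates what a frequency threshold does: rows with fewer than three cases are no longer treated as observed configurations.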

The next command produces the truth table from c145a in the ‘standard’ form, with all 32 possible combinations of the five conditions:

ttSURV <- truthTable(data=LF, outcome = "SURV", conditions = "DEV, URB, LIT, IND, STB",
              incl.cut=.80, complete=TRUE, show.cases=TRUE) 
ttSURV

  OUT: output value
    n: number of cases in configuration
 incl: sufficiency inclusion score
  PRI: proportional reduction in inconsistency

     DEV URB LIT IND STB   OUT    n  incl  PRI   cases      
 1    0   0   0   0   0     0     3  0.216 0.000 GR,PT,ES   
 2    0   0   0   0   1     0     2  0.278 0.000 IT,RO      
 3    0   0   0   1   0     ?     0    -     -              
 4    0   0   0   1   1     ?     0    -     -              
 5    0   0   1   0   0     0     2  0.521 0.113 HU,PL      
 6    0   0   1   0   1     0     1  0.529 0.228 EE         
 7    0   0   1   1   0     ?     0    -     -              
 8    0   0   1   1   1     ?     0    -     -              
 9    0   1   0   0   0     ?     0    -     -              
10    0   1   0   0   1     ?     0    -     -              
11    0   1   0   1   0     ?     0    -     -              
12    0   1   0   1   1     ?     0    -     -              
13    0   1   1   0   0     ?     0    -     -              
14    0   1   1   0   1     ?     0    -     -              
15    0   1   1   1   0     ?     0    -     -              
16    0   1   1   1   1     ?     0    -     -              
17    1   0   0   0   0     ?     0    -     -              
18    1   0   0   0   1     ?     0    -     -              
19    1   0   0   1   0     ?     0    -     -              
20    1   0   0   1   1     ?     0    -     -              
21    1   0   1   0   0     ?     0    -     -              
22    1   0   1   0   1     1     2  0.804 0.719 FI,IE      
23    1   0   1   1   0     0     1  0.378 0.040 AU         
24    1   0   1   1   1     0     2  0.709 0.634 FR,SE      
25    1   1   0   0   0     ?     0    -     -              
26    1   1   0   0   1     ?     0    -     -              
27    1   1   0   1   0     ?     0    -     -              
28    1   1   0   1   1     ?     0    -     -              
29    1   1   1   0   0     ?     0    -     -              
30    1   1   1   0   1     ?     0    -     -              
31    1   1   1   1   0     0     1  0.445 0.050 DE         
32    1   1   1   1   1     1     4  0.904 0.886 BE,CZ,NL,UK
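The row numbers in this table follow a simple rule: read the configuration as a binary number in the order DEV, URB, LIT, IND, STB and add 1. A case’s membership in a row is the minimum of its memberships in the conditions, using 1 − x for conditions that are absent (0) in that row; provided none of its scores is exactly 0.5, a case has membership above 0.5 in exactly one row, and that is where it is listed. A base-R sketch; row_number(), row_membership() and the case’s values are hypothetical.

```r
# Row number: configuration read as a binary number (DEV = most significant bit) + 1
row_number <- function(config) {
  sum(config * 2^(rev(seq_along(config)) - 1)) + 1
}
row_number(c(1, 0, 1, 0, 1))  # row 22 (FI, IE)
row_number(c(1, 1, 1, 1, 1))  # row 32 (BE, CZ, NL, UK)

# Membership of a case in a row: minimum over the conditions,
# taking 1 - x where the condition is absent (0) in the row
row_membership <- function(x, config) min(ifelse(config == 1, x, 1 - x))

# Hypothetical case memberships in DEV, URB, LIT, IND, STB
case <- c(0.8, 0.3, 0.9, 0.2, 0.7)
row_membership(case, c(1, 0, 1, 0, 1))  # 0.7: the case belongs to row 22
row_membership(case, c(1, 1, 1, 1, 1))  # 0.2
```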

The command below produces a truth table for the negated outcome, with raw consistency threshold 0.828 and frequency threshold 1, sorted by descending raw consistency (and, for equal raw consistency, by N), showing only empirically observed rows and the individual cases. Note that we use lowercase letters for labelling this truth table object because it analyzes the negated outcome. This way you know that ttOUTCOME is the truth table for the positive outcome, and ttoutcome the one for the negated outcome.

ttsurv <- truthTable(LF, outcome="~SURV", conditions = "DEV, URB, LIT, IND, STB",
            incl.cut=0.828, n.cut=1, sort.by="incl, n", decreasing=TRUE, complete=FALSE, show.cases=TRUE)
ttsurv

  OUT: output value
    n: number of cases in configuration
 incl: sufficiency inclusion score
  PRI: proportional reduction in inconsistency

     DEV URB LIT IND STB   OUT    n  incl  PRI   cases      
 1    0   0   0   0   0     1     3  1.000 1.000 GR,PT,ES   
 2    0   0   0   0   1     1     2  0.982 0.975 IT,RO      
23    1   0   1   1   0     1     1  0.974 0.960 AU         
31    1   1   1   1   0     1     1  0.971 0.950 DE         
 6    0   0   1   0   1     1     1  0.861 0.772 EE         
 5    0   0   1   0   0     1     2  0.855 0.732 HU,PL      
22    1   0   1   0   1     0     2  0.498 0.281 FI,IE      
24    1   0   1   1   1     0     2  0.495 0.366 FR,SE      
32    1   1   1   1   1     0     4  0.250 0.106 BE,CZ,NL,UK

The minimize() command (package: QCA) performs logical minimization of a truth table (i.e., the object created with the truth table command; here: ttSURV). First we calculate the conservative solution and request all details (consistency, raw and unique coverage). We also display the cases (show.cases) covered by each prime implicant (optional). Set use.tilde to FALSE if you prefer upper- and lowercase notation for sets and their negation, and to TRUE if you prefer to denote negated sets with a tilde. If you skip this option, upper-/lowercase notation is used.

csSURV <- minimize(ttSURV, details=TRUE, show.cases=TRUE, row.dom=TRUE, all.sol=FALSE, use.tilde=FALSE)
csSURV

n OUT = 1/0/C: 6/12/0 
  Total      : 18 

Number of multiple-covered cases: 0 

M1: DEV*URB*LIT*IND*STB + DEV*urb*LIT*ind*STB => SURV

                        inclS   PRI   covS   covU   cases 
--------------------------------------------------------------- 
1  DEV*URB*LIT*IND*STB  0.904  0.886  0.454  0.393  BE,CZ,NL,UK 
2  DEV*urb*LIT*ind*STB  0.804  0.719  0.265  0.204  FI,IE 
--------------------------------------------------------------- 
   M1                   0.870  0.843  0.658 

To calculate the parsimonious solution, we tell the software to include the logical remainders (rows with outcome “?”) so that they can serve as simplifying assumptions.

psSURV <- minimize(ttSURV, include="?", details=TRUE, show.cases=TRUE, row.dom=TRUE, all.sol=FALSE)
psSURV

n OUT = 1/0/C: 6/12/0 
  Total      : 18 

Number of multiple-covered cases: 0 

M1: DEV*ind + URB*STB => SURV

            inclS   PRI   covS   covU   cases 
--------------------------------------------------- 
1  DEV*ind  0.815  0.721  0.284  0.194  FI,IE 
2  URB*STB  0.874  0.845  0.520  0.430  BE,CZ,NL,UK 
--------------------------------------------------- 
   M1       0.850  0.819  0.714 
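You can check by hand which of the 32 truth table rows the parsimonious solution DEV*ind + URB*STB covers. The base-R sketch below (a check, not a reimplementation of minimize()) enumerates all rows in the standard order, confirms that both rows with OUT = 1 are covered and none with OUT = 0, and lists the logical remainders that were used as simplifying assumptions. The row numbers are taken from the complete truth table above.

```r
# All 32 configurations in the standard order (DEV varies slowest, STB fastest)
rows <- expand.grid(STB = 0:1, IND = 0:1, LIT = 0:1, URB = 0:1, DEV = 0:1)
rows <- rows[, c("DEV", "URB", "LIT", "IND", "STB")]

# Rows covered by the parsimonious solution DEV*ind + URB*STB
covered <- which(with(rows, (DEV == 1 & IND == 0) | (URB == 1 & STB == 1)))

positive <- c(22, 32)                  # OUT = 1 in the truth table
negative <- c(1, 2, 5, 6, 23, 24, 31)  # OUT = 0 in the truth table

all(positive %in% covered)   # TRUE: both positive rows are explained
any(negative %in% covered)   # FALSE: no negative row is covered
setdiff(covered, positive)   # remainders used as simplifying assumptions
```

Twelve remainders enter the parsimonious solution here, which is why it is worth checking whether these simplifying assumptions are plausible.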

Box 7: Useful tools for truth table analysis

psOUTCOME <- psSURV
mydata <- LF

Once you see the truth table, you may want to plot different truth table rows in order to find an appropriate raw consistency threshold. You can plot truth table rows easily using pimplot(), including the names of cases with membership > 0.5 in the row and all parameters of fit (consistency, coverage, PRI, Haesebrouck’s consistency). Note, however, that this requires a solution that you have already calculated and stored as an object (here: psOUTCOME; you can use any raw consistency threshold to begin with, just to get such a provisional solution object to work with). Here we plot truth table rows 1, 5 and 32. Use the back arrow above the plot window to inspect all the different plots:

pimplot(data=mydata, results=psOUTCOME, ttrows=c("1", "5", "32"), outcome= "SURV")

Note that the rows must have at least one case to get plotted.

You can also simply plot all truth table rows above a certain raw consistency level (here: 0.8) at once. The resulting plots have the row number as label of the X axis:

pimplot(data=mydata, results=psOUTCOME, incl.tt=0.8, outcome= "SURV")

Assessing the consequences of setting a frequency threshold:

If you set a frequency threshold of more than 1 and you want to see which rows are now treated as logical remainders by imposing such a threshold, type

ttoutcome$excluded

Exporting the truth table

You can save the truth table as a csv file. This makes it very easy to copy and paste the table into a Word document (e.g., the online appendix).

write.csv(ttSURV$tt, "mytt.csv")

Box 8: Default settings for logical minimization

The row.dom option for logical minimization, if set to TRUE, is used to further eliminate redundant prime implicants when solving the PI chart by applying the principle of row dominance: if a prime implicant X covers all the configurations that another prime implicant Y covers and at the same time covers other configurations that Y does not cover, then Y is redundant and eliminated. By setting all.sol=TRUE, you can derive all possible solutions, irrespective of the number of prime implicants.

To obtain a subset of the solution space, set row.dom=TRUE and all.sol=FALSE. This is the default setting that the QCA package implements if you do not specify these two options (as done here in this manual). Conversely, to reveal the full extent of model ambiguity, set row.dom=FALSE and all.sol=TRUE (see Baumgartner 2015 and Baumgartner and Thiem 2015). Using all.sol = TRUE does not reflect the view of the QCA package author, who set the default to FALSE.

By presenting the templates with the default options, we do not intend to make a recommendation. It is good to be aware that there are often many possible solutions, and that there are several ways of dealing with this.

7.3 Standard Analysis: Specifying directional expectations

Note. This section needs explanations. What does “standard analysis” mean? What are directional expectations?

To derive an intermediate solution using Standard Analysis (Ragin 2008), we may want to specify directional expectations. As with the parsimonious solution, we tell the software to include logical remainders as simplifying assumptions; but we then specify the directional expectations (dir.exp) for each condition, in the same order in which the conditions were listed when creating the truth table (see section 7.2). In the example, we assume that DEV contributes to the outcome when present (1), URB when absent (0), IND when absent (0) and STB when present (1); we have no directional expectation ("-") for LIT.

7.4 Enhanced Standard Analysis: Excluding truth table rows

When deriving an intermediate solution through Enhanced Standard Analysis, we may or may not specify directional expectations; in addition, we exclude from the analysis of sufficiency those truth table rows that a) display a negated necessary condition for the same outcome, b) display a sufficient condition for the negated outcome, or c) are logically implausible (the “pregnant man”). There are several ways to do this. In the running text, we discuss two of them; see Box 10 for more options.

7.4.1 Excluding truth table rows before logical minimization

We can build a new truth table in which we tell the software to code with outcome 0 those rows that display a certain configuration of conditions. We can do this either only for logical remainders or for all truth table rows, whether empirically observed or logical remainders. This is an easy and transparent option if you think that neither your observations nor your simplifying assumptions should contradict prior findings and/or logic, and you want to see what you did to the truth table before turning to logical minimization.

In a first step, we build the truth table (here: for the negated outcome and for 5 conditions) that also displays the logical remainders (complete=TRUE):

As we have seen above (see c147), there are two sufficient conditions for the outcome SURV: M1: DEV*URB*LIT*IND*STB + DEV*urb*LIT*ind*STB => SURV. They are hence not tenable for the negated outcome, surv. Let’s find the rows with these two patterns:

I cannot use either of M1’s conditions: Whenever I add the last condition element, I get an error, even though there are matching rows in the TT.

Having found rows, we can recode these to OUT=0 in the TT:

It is worth unpacking this command, as it makes compact use of the object.

With the recoded TT, one can then calculate enhanced solutions, such as the enhanced parsimonious solution:

---
title: "QCA Tutorial Thomann et al"
author: "Peter Reimann"
output:
  html_notebook: default
---

This is an R Markdown version of the tutorial "Performing fuzzy- and crisp set QCA with R: A user-oriented beginner’s guide", version 07.03.2018, by Dr. Eva Thomann, Ioana-Elena Oana and Stefan Wittwer (Bamberg, March 18).

I use most of the examples and the text from this source, often verbatim. Unlike the tutorial, I use real data so as to have computable examples.

```{r message=FALSE, include=FALSE}
library(tidyverse)
library(QCA)
library(SetMethods)
library(VennDiagram)
```

### Data

We use the data set from the QCA package, also often used by Ragin and others.

```{r}
data(LR)
data(LC)
data(LM)
data(LF)
```

LR contains the raw data, LC is a crisp-set version, LM a multi-value crisp-set version, and LF a fuzzy-set version of LR.

Each is a data frame with 18 rows (countries) and the following 6 columns:

* DEV
    + Level of development: GDP per capita (USD) in the raw data; calibrated in the binary crisp version to 0 if below 550 USD and 1 otherwise. For the multi-value crisp version, two thresholds were used: 550 and 850 USD.
* URB
    + Level of urbanization: percent of the population living in towns with 20,000 or more inhabitants; calibrated in the crisp versions to 0 if below 50% and 1 if above.
* LIT
    + Level of literacy: percent of the population that is literate; calibrated in the crisp versions to 0 if below 75% and 1 if above.
* IND
    + Level of industrialization: percent of the labor force in industry; calibrated in the crisp versions to 0 if below 30% and 1 if above.
* STB
    + Government stability: a “political-institutional” condition added to the four “socioeconomic” ones above. The raw data give the number of cabinets that governed in the period under study; calibrated in the crisp versions to 0 if 10 or more and to 1 if below 10.
* SURV
    + Outcome: survival of democracy during the inter-war period; calibrated to 0 if the raw value is negative and to 1 if it is positive.

Switching to the naming style in the tutorial:

```{r}
mydata <- LR
```

# Chapter 4 Calibration and operation with sets

(*Note. Chapters 1-3 are mainly on generic R and are skipped here.*)

## 4.1 Calibration

### 4.1.1 Fuzzy sets

#### Direct method of calibration

We want to practice calibration on the DEV variable. First, we copy it into the name used in the tutorial, `rawvar`:

```{r echo=FALSE}
mydata$rawvar <- mydata$DEV
```
(To remove DEV we could do `mydata$DEV <- NULL`, but let's not.)

The values in this variable are:
```{r echo=FALSE} 
mydata$rawvar
```

In principle, the thresholds for calibrating DEV should be derived from theory. Here, we simply use values that look reasonable given the 18 observed values:

* e - threshold for full non-membership: 300
* c - crossover point: 600
* i - threshold for full membership: 800

```{r}
mydata$MYFUZZYSET <- round(calibrate(mydata$rawvar, type = "fuzzy", thresholds = "e=300, c=600, i=800", logistic=TRUE), digits=2)
mydata$MYFUZZYSET
```
(NOTE. The naming convention is to use capital var names for sets, and small letters for negated sets, see below.)

The data in the fuzzy set vary between zero and one. `logistic = TRUE` is the default; to use a linear method, specify `logistic = FALSE`. 

> Insert here what the meaning and difference between the two is. 
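
To make the difference between the two options concrete, here is a minimal base-R sketch of both functions (the names `calib_logistic()` and `calib_linear()` are ours, not part of the QCA package). The logistic version turns the distance from the crossover into log-odds, scaled so that the exclusion threshold maps to membership 0.05, the crossover to 0.5 and the inclusion threshold to 0.95, and then back-transforms them: an S-shaped curve that approaches, but never reaches, 0 and 1. The linear version, assuming the piecewise-linear form sketched here, draws straight lines from 0 at the exclusion threshold to 0.5 at the crossover and on to 1 at the inclusion threshold, and is exactly 0 below and exactly 1 above these anchors. With the thresholds used above, the logistic sketch reproduces the rounded MYFUZZYSET scores produced by the previous chunk, which suggests (but does not prove) that calibrate() uses the same scaling.

```{r}
# Logistic (S-shaped) direct calibration, sketched in base R.
# Log-odds are scaled so that exc -> 0.05, cross -> 0.5, inc -> 0.95.
calib_logistic <- function(x, exc, cross, inc, idm = 0.95) {
  k <- log(idm / (1 - idm))                        # log-odds at the anchors (about 2.94)
  z <- ifelse(x >= cross,
              k * (x - cross) / (inc - cross),     # above the crossover
              k * (x - cross) / (cross - exc))     # below the crossover
  1 / (1 + exp(-z))                                # back-transform log-odds to membership
}

# Piecewise-linear calibration (assumed form): 0 at or below exc,
# 0.5 at cross, 1 at or above inc, straight lines in between.
calib_linear <- function(x, exc, cross, inc) {
  m <- ifelse(x < cross,
              0.5 * (x - exc) / (cross - exc),
              0.5 + 0.5 * (x - cross) / (inc - cross))
  pmin(pmax(m, 0), 1)                              # clamp to [0, 1]
}

# DEV raw values as printed above
dev_raw <- c(720, 1098, 586, 468, 590, 983, 795, 390, 424,
             662, 517, 1008, 350, 320, 331, 367, 897, 1038)
round(calib_logistic(dev_raw, exc = 300, cross = 600, inc = 800), 2)
round(calib_linear(dev_raw, exc = 300, cross = 600, inc = 800), 2)
```

Because the logistic curve only approaches 1, the printed 1.00 values are rounded scores such as 0.999; the linear version, in contrast, assigns exactly 1 to every raw value of 800 or more.
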

Values can be overwritten by hand, if needed, like so:
```
mydata$MYFUZZYSET[mydata$rawvar == 3] <- 0.05
```
Doing so requires a good reason, of course, and it reduces reproducibility.

#### Multi-value fuzzy sets

This is a 'manual' method, essentially a recoding of a variable. For instance:

```{r}
mydata$MYFUZZYSET <- NA
mydata$MYFUZZYSET[(mydata$rawvar >= 1)&(mydata$rawvar <= 400)] <- 0
mydata$MYFUZZYSET[(mydata$rawvar >= 401)&(mydata$rawvar <= 800)] <- 0.3
mydata$MYFUZZYSET[(mydata$rawvar >= 801)] <- 1
mydata$MYFUZZYSET
```
Avoid 0.5 as value! 
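
The same recoding can be written more compactly with base R's cut(), which sorts each raw value into an interval and returns the interval number; indexing a vector of membership scores with that number gives the recoded set. A minimal sketch using the DEV values listed above (the variable names are ours). Unlike the three assignments, these intervals also cover non-integer values between 400 and 401 and values below 1, which the original code would leave as NA.

```{r}
dev_raw <- c(720, 1098, 586, 468, 590, 983, 795, 390, 424,
             662, 517, 1008, 350, 320, 331, 367, 897, 1038)

# interval numbers: 1 = (-Inf, 400], 2 = (400, 800], 3 = (800, Inf)
bins <- cut(dev_raw, breaks = c(-Inf, 400, 800, Inf), right = TRUE, labels = FALSE)

# map each interval to its membership score
membership <- c(0, 0.3, 1)[bins]
membership
```
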

#### Crisp sets

To calibrate a crisp set, we only need to specify the threshold, e.g. 600. Of course, this can also be done by recoding the data directly.

```{r}
mydata$CRISPSET <- calibrate(mydata$rawvar, type = "crisp", thresholds = 600, include = TRUE)
mydata$CRISPSET
```
If you set include = TRUE, a raw value exactly equal to the threshold (here: 600) is calibrated as set membership 1. If it is set to FALSE, such a value is calibrated as set membership 0.
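
The rule just described can be sketched in base R as follows. `crisp_calibrate()` is a hypothetical helper, not the QCA function; it only mirrors the behaviour of include for a single threshold as described above. Applied to `mydata$rawvar` with threshold 600, the include = TRUE version should match the CRISPSET column.

```{r}
# include = TRUE puts a value exactly at the threshold into the set (>=);
# include = FALSE leaves it out (>)
crisp_calibrate <- function(x, threshold, include = TRUE) {
  if (include) as.numeric(x >= threshold) else as.numeric(x > threshold)
}

x <- c(590, 600, 720)
crisp_calibrate(x, 600, include = TRUE)    # 0 1 1
crisp_calibrate(x, 600, include = FALSE)   # 0 0 1
```
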

Avoid 0.5 as value! 

### Tips for calibration

#### Labelling sets

Future versions of QCA will have a lot of commands that allow you to negate sets using lowercase notation. In that case, whether you use upper- or lowercase letters really matters. To avoid problems and confusion, always use uppercase letters for labelling your sets, and lowercase letters for their negation.

Keep labels for sets short. 

#### Finding calibration thresholds

The QCA package does offer data-driven ways to find calibration thresholds. They are not included in this manual because the resulting sets are very hard to interpret. The most important analytic choice is that of the threshold that establishes the difference in kind. Whenever possible, use conceptual and theoretical criteria for the crossover point and avoid using purely empirical criteria such as descriptive statistics. Avoid using the median as crossover point: it can usually not be interpreted other than as the set of “cases with values equal to or higher than 50% of the other cases”. Similarly, if you use the sample mean (e.g., for unemployment) as crossover point, the conceptual meaning of the set is “unemployment above average in the cases observed”. All this does not mean that empirical criteria are unimportant for determining calibration thresholds. In particular, you should avoid overly skewed sets (4.2.2 and 8.1) and empirical cases on the crossover point (4.2.3).

Technically, it is easy to calibrate sets. However, calibration is essentially a process of concept formation and definition that interacts decisively with your theory, research design and results. Therefore, calibration is one of the most demanding analytic phases of a QCA analysis. Take a lot of time for calibration, and try out different options. If you find that several different crossover points are equally plausible, try them all and see how that affects the analysis. This is called a robustness test and a very good thing to do (see also Maggetti and Levi-Faur 2013). See 4.2.1 and the online appendix of Hinterleitner, Sager and Thomann (2016) for an illustration.

After calibrating your raw data, save the calibrated sets in a new dataset. Some commands only work if all variables in the dataset range from 0 to 1.

```
myfuzzydata <- subset(mydata, select = c("MYFUZZYSET", "CRISPSET"))
write.csv2(myfuzzydata, "myfuzzydata.csv")
```

## 4.2 Calibration diagnostics

You can visualize the calibration with an XY plot, which shows not only whether there are cases on the 0.5 threshold but also how the cases are distributed in the set (see also section 5 on graphs).

A basic way of doing this is to plot the set against its raw scores, with a horizontal line at membership 0.5 and a vertical line at the raw value of the crossover point (here: 600).

```{r}
mydata$MYFUZZYSET <- round(calibrate(mydata$rawvar, type = "fuzzy", thresholds = "e=300, c=600, i=800", logistic=TRUE), digits=2)
plot(mydata$rawvar, mydata$MYFUZZYSET, pch=18, col="black", main='MYFUZZYSET',
xlab=' Raw score ',
ylab=' Fuzzy score ')
abline(h=0.5, col="black") 
abline(v= 600, col="black")
```

In the example below, we plot the calibration of “MYFUZZYSET”. The crossover point is set at 600 and indicated by a vertical black line (v= 600); we have also added a horizontal black line at set membership 0.5 (h=0.5, optional). In addition, the last two command lines (optional) add two dotted vertical lines that indicate two alternative plausible crossover points (500 and 700) that we decided could be tested for robustness. The plot shows whether, for example, moving the crossover point from 600 to 500 or 700 would change the qualitative set membership of an empirical case (indicated by a dot between the solid line and one of the dotted lines).

```{r}
plot(mydata$rawvar, mydata$MYFUZZYSET, pch=18, col="black", main='MYFUZZYSET',
   xlab=' Raw score ',
   ylab=' Fuzzy score ')
abline(h=0.5, col="black")
abline(v= 600, col="black")
abline(v= 500, col="black", lty="dotted") 
abline(v= 700, col="black", lty="dotted")
```
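
The question the dotted lines answer visually can also be answered numerically. For a calibration that places membership 0.5 exactly at the crossover and increases with the raw score (as both the logistic and the linear option above do), a case is more in than out of the set whenever its raw value lies above the crossover. The base-R sketch below lists the DEV values whose qualitative membership changes when the crossover is moved from 600 to 500 or 700; the variable names are ours.

```{r}
dev_raw <- c(720, 1098, 586, 468, 590, 983, 795, 390, 424,
             662, 517, 1008, 350, 320, 331, 367, 897, 1038)
crossovers <- c(500, 600, 700)     # baseline 600 plus two alternatives

# TRUE = more in than out of the set under that crossover
above <- sapply(crossovers, function(cc) dev_raw > cc)
colnames(above) <- paste0("c=", crossovers)

# a case is sensitive if it falls on different sides under different crossovers
sensitive <- apply(above, 1, function(r) length(unique(r)) > 1)
dev_raw[sensitive]
```

With `mydata$rawvar` in place of `dev_raw`, `rownames(mydata)[sensitive]` gives the names of these cases.
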

In the following plot, we test only one alternative crossover point (600 = regular crossover point, 500 = alternative crossover point), indicated by the dotted line. We also add the membership scores of an alternative calibration as blue points, using the points() command. Note that this alternative uses the same thresholds as before but the linear calibration option (logistic = FALSE), not the alternative crossover point.

```{r}
mydata$MYALTERNATIVESET <- round(calibrate(mydata$rawvar, type = "fuzzy", thresholds = "e=300, c=600, i=800", logistic=FALSE), digits=2)
plot(mydata$rawvar, mydata$MYFUZZYSET, pch=18, col="black", main=' MYFUZZYSET ',
xlab=' Raw score ',
ylab=' Fuzzy score ')
points(mydata$rawvar, mydata$MYALTERNATIVESET, pch=18, col="blue")
abline(h=0.5, col="black")
abline(v= 600, col="black")
abline(v= 500, col="black", lty="dotted")
```

### 4.2.2 Skewness
To check the skewness of your set, count the cases with set membership above 0.5 and check whether they make up a disproportionately large or small share of your data.

```{r}
skewMYSET <- as.numeric(mydata$MYFUZZYSET > 0.5) 
skewMYSET
sum(skewMYSET)
```
`as.numeric` turns the logical test result into 0s and 1s, so their sum is the number of cases with membership greater than 0.5. `r sum(skewMYSET)` cases out of 18 does not seem too imbalanced.

Obtain the proportion of cases with set membership above 0.5 (the value listed under 1, on the right-hand side):

```{r}
prop.table(table(skewMYSET))
```

Identify the names of the cases with set membership above 0.5:

```{r}
rownames(subset(mydata, MYFUZZYSET > 0.5))
```

### 4.2.3 Cases on crossover point

You can check the number of cases that have membership of 0.5 in a set:

```{r}
checkMYSET <- as.numeric(mydata$MYFUZZYSET == 0.5) 
sum(checkMYSET)
```
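It should be zero: a case with membership of exactly 0.5 in a set is neither more in nor more out of it and cannot be assigned unambiguously to a truth table row.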

If the value is not zero for a set, you can identify which cases are offending. Here's a way of doing this for multiple sets with one command:

```
rownames(subset(mydata, checkMYSET1==1 | checkMYSET2==1 | checkMYSET3==1))
```

To exclude these cases, run something like this (note that the trailing comma inside the brackets is needed to select rows):

```{r}
mydata <- mydata[mydata$MYFUZZYSET != 0.5, ]
```

In summary, here is a template you could use as a standard procedure when calibrating a fuzzy set with the direct calibration method:

```
# descriptive statistics (describe() comes from an add-on package such as psych;
# summary(mydata$var) works in base R)
describe(mydata$var)

# check for missings (share of cases with missings)
varmiss <- as.numeric(is.na(mydata$var))
prop.table(table(varmiss))

# calibration (fuzzy set, direct calibration method, rounded to 2 digits)
mydata$MYFUZZYSET <- round(calibrate(mydata$var, type = "fuzzy", thresholds = "e=1, c=2.5, i=4", logistic = TRUE), digits=2)

# number of cases on the crossover point (should be 0)
checkMYSET <- as.numeric(mydata$MYFUZZYSET == 0.5)
sum(checkMYSET)

# check for skewness (share of cases with membership > 0.5)
skewMYSET <- as.numeric(mydata$MYFUZZYSET > 0.5)
prop.table(table(skewMYSET))

# visualize calibration; the vertical line marks the crossover point c = 2.5
plot(mydata$var, mydata$MYFUZZYSET, pch=18, col="black",
  main='MYFUZZYSET',
  xlab=' Raw score ',
  ylab=' Fuzzy score ')
abline(h=0.5, col="black")
abline(v= 2.5, col="black")
```

## 4.3 Calculating membership in operations on sets

If you create a new set (a negated set, a disjunction or a conjunction, or a combination of these), you may want to store it as an object (NEWSET <- operation), so that you can use it for further analysis. You can also (but don’t have to) tie it to the dataset by using the dollar sign (mydata$NEWSET <- operation), so that it appears as a new variable in the dataset.
You have several options to calculate the cases’ membership in combined sets.

#### A. Automatically, using compute()

The compute() command directly calculates the membership in any expression you specify, e.g. in truth table rows or the solution term. You can use either the tilde ~ sign or lowercase notation to negate sets, but be consistent. You can also skip the '*' sign if you prefer. In the last template below, we calculate the cases’ membership in the set `myset1*MYSET2 + myset3*MYSET4*myset5 + MYSET6` and store the values as an object “sol” for further use.

```{r}
# use the fuzzy-set data for these operations
mydata <- LF
```

This is directly from the help text:

```{r}
compute("DEV*ind + URB*STB", data = LF)
```

```
sol <- compute("myset1*MYSET2 + myset3*MYSET4*myset5 + MYSET6", data=mydata)
```
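
Such expressions are evaluated with the standard fuzzy-set operations: negation is 1 - x, logical AND (*) takes the minimum and logical OR (+) the maximum of the memberships. A minimal base-R sketch for the expression `DEV*ind + URB*STB`, using three hypothetical cases rather than the LF data. Applied to the real data, `with(LF, pmax(pmin(DEV, 1 - IND), pmin(URB, STB)))` should return the same values as the compute() call above.

```{r}
# Three hypothetical cases with fuzzy memberships in four sets
toy <- list(DEV = c(0.8, 0.3, 0.9),
            IND = c(0.2, 0.7, 0.6),
            URB = c(0.6, 0.1, 0.4),
            STB = c(0.9, 0.4, 0.7))

# DEV*ind + URB*STB: negation = 1 - x, AND = pmin(), OR = pmax()
sol_toy <- with(toy, pmax(pmin(DEV, 1 - IND), pmin(URB, STB)))
sol_toy   # 0.8 0.3 0.4
```
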

#### B. Via logical operators

To calculate the cases’ membership in a negated set, you can simply subtract the set from 1:

```{r}
mydata$dev <- 1-mydata$DEV
mydata$DEV
mydata$dev
```
Note the use of capital and small letters. You can also negate a set directly within another command, without first creating the negated set as a separate object, as done here:

```{r}
path1 <- fuzzyand(mydata$DEV, mydata$URB, 1-mydata$LIT)
path1
```

The tutorial does not show how path2 is computed, but the logical union (OR) can also be illustrated with other sets:

```{r}
myunion <- fuzzyor(mydata$DEV, mydata$URB)
myunion
```

This also works on crisp sets, which should be easier to follow:

```{r}
# use the crisp set for these operations
mydata <- LC
# calculate negated set
mydata$dev <- 1-mydata$DEV
mydata$DEV
mydata$dev
```

Fuzzy AND:
```{r}
mydata$DEV
mydata$URB
fuzzyand(mydata$DEV, mydata$URB)
```

A more complex AND, including a negated set on the fly:

```{r}
path1 <- fuzzyand(mydata$DEV, mydata$URB, 1-mydata$STB)
path1
```

Logical OR: 
```{r}
myunion <- fuzzyor(mydata$DEV, mydata$URB)
mydata$DEV
mydata$URB
myunion
```
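
With crisp (0/1) sets, the fuzzy operations reduce to ordinary Boolean logic: the minimum of two 0/1 values is 1 only if both are 1, and the maximum is 1 if at least one of them is. The base-R check below confirms this for all four combinations; pmin() and pmax() stand in for fuzzyand() and fuzzyor(), which we assume behave the same way on such inputs.

```{r}
a <- c(1, 1, 0, 0)
b <- c(1, 0, 1, 0)
and_min <- pmin(a, b)   # fuzzy AND = minimum
or_max  <- pmax(a, b)   # fuzzy OR = maximum
cbind(a, b, and_min, or_max)

# On 0/1 data, minimum and maximum coincide with Boolean AND and OR
all(and_min == (a & b)) && all(or_max == (a | b))   # TRUE
```
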

(Some content is skipped here because it requires a deeper understanding of the method.)

# Chapter 5 XY-Plots

These are plots of a condition against the outcome. Four commands are available to produce XY plots in R. For QCA, we will usually work with either xy.plot() (for plotting single conditions or customized predefined sets; XYplot() from the QCA package also works) or pimplot() (for plotting the solution term).

#### 1. xy.plot (package SetMethods)

```{r}
 mydata <- LF
```
First plot:

```{r}
xy.plot("DEV", "SURV", data = mydata, labs = rownames(mydata), necessity=TRUE, 
        jitter = TRUE, main = "DEV as necessary for SURV", xlab = "DEV", ylab = "SURV")
```

It also automatically adds lines for the quadrants and a black diagonal, and it reports the parameters of fit (consistency, coverage, PRI or RoN), which can be set to necessity (necessity=TRUE) or sufficiency (skip the necessity option or set necessity=FALSE). Use "1-" to negate sets, for example 1-mydata$myy. This plot also reports Haesebrouck’s consistency value in addition to the other parameters of fit (see Haesebrouck 2015). What these values mean is explained in Chapter 6.

The relation above is not of a kind that suggests an influence of DEV on SURV. Let's try another relation:

```{r}
xy.plot("LIT", "SURV", data = mydata, labs = rownames(mydata), necessity=TRUE, 
        jitter = TRUE, main = "LIT as necessary for SURV", xlab = "LIT", ylab = "SURV")
```
Here we see a much more

> More graphic functions after we covered sufficiency and necessity.

# Chapter 6 Analysis of necessity


See file Dusa2019-chp5-necessity.Rmd for more conceptual explanations. 


If you want to “deductively” test the necessity of single conditions or of theoretically defined disjunctions representing higher-order constructs, you can use the pof command (package: QCA) with the option relation = "nec" (with relation = "suf", the command tests the sufficiency of the listed conditions instead). If you want to test several conditions, first create a list of the conditions you want to test. To negate conditions or the outcome, use either a tilde or 1-. The command gives you the consistency, coverage and relevance (RoN) of necessity for each listed condition. In the example below, we test the necessity of the conditions DEV, URB and LIT for the negated outcome.

```{r}
mydata <- LF
```

Subsetting on three conditions:

```{r echo=TRUE}
conds <- subset(mydata, select = c("DEV", "URB", "LIT"))
```

Testing necessity for outcome ~SURV (which, remember, is not symmetrical to the outcome SURV):
```{r}
pof(conds, '~SURV', mydata, relation = "nec")
```
For some reason, this function returns the data first, which is annoying. This can be changed by using $ notation to get data frames for the conds; see the next example. I think that the example above is not fully correct for the current implementation of pof().

> Add an interpretation of what this means.  
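
As a starting point for interpreting these numbers, the following base-R sketch computes the parameters of fit for necessity with the standard formulas (Schneider and Wagemann 2012): consistency = sum(min(X, Y)) / sum(Y), i.e. how far the outcome Y is a subset of the condition X; coverage = sum(min(X, Y)) / sum(X); and relevance of necessity RoN = sum(1 - X) / sum(1 - min(X, Y)). The five membership pairs are hypothetical, and we assume that pof() reports these same quantities.

```{r}
# Hypothetical memberships of five cases in a condition X and an outcome Y
X <- c(0.9, 0.8, 0.7, 0.6, 0.3)
Y <- c(0.8, 0.7, 0.4, 0.3, 0.1)
overlap <- pmin(X, Y)                       # membership in X AND Y

cons_nec <- sum(overlap) / sum(Y)           # consistency of necessity
cov_nec  <- sum(overlap) / sum(X)           # coverage of necessity
ron      <- sum(1 - X) / sum(1 - overlap)   # relevance of necessity (RoN)
round(c(consistency = cons_nec, coverage = cov_nec, RoN = ron), 3)
```

Here every case's membership in Y is at most its membership in X, so consistency is 1; coverage and RoN below 1 indicate that X is considerably larger than Y and thus somewhat trivial as a necessary condition.
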

You can also use this command to test the necessity of a single condition (here DEV, passed as mydata$DEV). (Note that the expression given on p. 37 of the tutorial, which only puts DEV in the first position, throws an error.)

```{r}
pof(mydata$DEV, '~SURV', mydata, relation = "nec")
```
You can also use pof() for complex condition sets, using either lowercase notation or tildes to negate conditions. For example, to find out whether the expression “DEV + lit*IND” is necessary (<=) for the outcome:

```{r}
pof("DEV + lit*IND <= SURV", data = mydata)
```

By using =>, you can test the same for sufficiency.

You can also make an XY plot that integrates the parameters of fit. This can be done for necessity (necessity=TRUE), but also for sufficiency (necessity=FALSE). This has the advantage that you can visually diagnose contradictory cases (in the upper left quadrant) and trivialness. We have done this before, but here it is again for ease of reference:

```{r}
xy.plot("DEV", "SURV", data = mydata, necessity=TRUE)
```

## 6.2 SuperSubset procedure

The `superSubset()` command (package: QCA) “inductively” identifies all supersets of the outcome: single conditions as well as conjunctions and disjunctions (unions) of sets. Using this command, you can therefore basically skip the steps described in 6.1. The option `incl.cut` specifies a minimal consistency threshold that the conditions need to pass (usually 0.9 for fuzzy sets and 1.0 for crisp sets). Using `cov.cut`, you can also specify a coverage cutoff (here: 0.6) below which necessary conditions are deemed trivial and do not appear in the list. Use a tilde ~ to negate the outcome, e.g. outcome = "~OUTCOME".

```{r}
superSubset(mydata, outcome = "SURV",
  conditions = "DEV, URB, LIT, IND, STB",
  incl.cut = 0.9, cov.cut = 0.6)
```
This command will give you a lot of supersets, but to decide whether they can be deemed necessary, you will still have to 1) check for deviant cases consistency in kind (plot the result), 2) check for empirical relevance / trivialness (Goertz 2006), and 3) identify whether the superset makes theoretical sense as a necessary condition, that is, whether the sets combined with the logical OR represent some higher-order concept (see Schneider and Wagemann 2012). Note also that the different supersets produced by this command are alternatives to each other: for example, if `A*B` is necessary, in the output you will find `A*B`, but also A, and also B. In summary, this command is useful because it gives you all potential necessary conditions in one go; but do not use it for mindless data-mining.

If you store the results of superSubset() as an object (here: nec), you can plot all compound necessary conditions by using pimplot():

```{r c141}
nec <- superSubset(mydata, outcome = "SURV",
  conditions = "DEV, URB, LIT, IND, STB",
  incl.cut = 0.9, cov.cut = 0.6)

pimplot(data=mydata, results=nec, outcome="SURV", necessity=TRUE, 
        all_labels=TRUE, jitter=TRUE)
```

Use the backward arrow above the plot window to view all plots. The plots also report all parameters of fit (including RoN), and the cases with membership > 0.5 in the outcome are labelled.

# Chapter 7 Analysis of sufficiency

## 7.1 Testing for single sufficient conditions

While the QCA technique always uses the truth table procedure for the analysis of sufficiency, in principle the sufficiency (consistency, coverage and PRI) of single conditions can also be (“deductively”) tested using the pof() command of the QCA package (just specify relation = "suf"), and visualized using, for instance, the xy.plot() command (again, specify necessity=FALSE) (see section 6.1). Alternatively, you can use QCAfit() of the SetMethods package (which additionally gives you the Haesebrouck consistency and also works for necessity, specify necessity=TRUE). You can negate the outcome by specifying neg.out=TRUE. If you don’t want to label your condition, skip cond.lab= "COND".

```{r c142}
QCAfit(LF$STB, LF$SURV, cond.lab= "STB", necessity=FALSE, neg.out=FALSE)
```
You can also do this for several conditions at once. Just build an object (here: conds) first that contains your conditions.

```{r c143}
conds <- subset(LF, select = c("DEV", "URB", "LIT"))
QCAfit(conds, LF$SURV, cond.lab= c("DEV", "URB", "LIT"), necessity=FALSE, neg.out=FALSE)
```
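
The parameters of fit for sufficiency mirror those for necessity with the roles of condition and outcome swapped: consistency = sum(min(X, Y)) / sum(X) and coverage = sum(min(X, Y)) / sum(Y). PRI (proportional reduction in inconsistency) additionally discounts the membership that X shares with both Y and its negation: PRI = (sum(min(X, Y)) - sum(min(X, Y, 1 - Y))) / (sum(X) - sum(min(X, Y, 1 - Y))). A minimal base-R sketch with hypothetical memberships; we assume QCAfit() and pof() report the same quantities (QCAfit() additionally reports Haesebrouck's consistency, which is not reproduced here).

```{r}
# Hypothetical memberships of five cases in a condition X and an outcome Y
X <- c(0.9, 0.8, 0.6, 0.3, 0.2)
Y <- c(0.9, 0.6, 0.7, 0.6, 0.1)
overlap <- pmin(X, Y)                    # membership in X AND Y

cons_suf <- sum(overlap) / sum(X)        # consistency of sufficiency
cov_suf  <- sum(overlap) / sum(Y)        # coverage of sufficiency

both <- pmin(X, Y, 1 - Y)                # membership in X, Y and not-Y at once
pri  <- (sum(overlap) - sum(both)) / (sum(X) - sum(both))
round(c(consistency = cons_suf, coverage = cov_suf, PRI = pri), 3)
```
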

For combinations of conditions, we need truth tables.

## 7.2 Building the truth table and logical minimization

Very important: the zeros and ones in TTs are not to be confused with crisp set values! The zeros and ones refer to logical conditions, not set membership. 
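
How fuzzy cases end up in truth table rows can be sketched in base R with two hypothetical conditions A and B. Each case belongs to the row whose 0/1 pattern matches where its memberships lie above 0.5, so a case with 0.8 in A and 0.6 in B sits in row "11" even though neither score is 1. The row's raw consistency is then the sufficiency consistency of the corresponding conjunction for the outcome. This is our own illustration of the logic, not code from the QCA package.

```{r}
# Hypothetical memberships of four cases in conditions A, B and outcome Y
A <- c(0.8, 0.7, 0.2, 0.9)
B <- c(0.6, 0.3, 0.9, 0.8)
Y <- c(0.9, 0.4, 0.3, 0.7)

# Row pattern of each case: 1 where membership is above 0.5, else 0
# (with a score of exactly 0.5, a case would belong equally to two rows)
row_pattern <- paste0(as.integer(A > 0.5), as.integer(B > 0.5))
row_pattern

# Membership of every case in row "11" (A*B) is the fuzzy AND of A and B;
# the row's raw consistency is sum(min(A*B, Y)) / sum(A*B)
m11 <- pmin(A, B)
incl_11 <- sum(pmin(m11, Y)) / sum(m11)
round(incl_11, 3)
```
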

First, you want to figure out where to set the raw consistency threshold. For this, you can produce a basic truth table (package: QCA) showing the raw consistencies (sorted in descending order) and the individual cases.

```{r c145}
ttSURV <- truthTable(data=LF, outcome = "SURV", conditions = "DEV, URB, LIT, IND, STB",
              incl.cut=1.00, sort.by="incl, n", complete=FALSE, show.cases=TRUE) 
ttSURV 
```
Following the generally recommended minimum inclusion score of 0.8, we get:

```{r c145a}
ttSURV <- truthTable(data=LF, outcome = "SURV", conditions = "DEV, URB, LIT, IND, STB",
              incl.cut=.80, sort.by="incl, n", complete=FALSE, show.cases=TRUE) 
ttSURV
```

The truth table command provides several options:

* We set the raw consistency threshold using incl.cut. If you skip this option, all outcomes are coded 0 except for rows with consistency 1.
* n.cut is used to specify a frequency threshold (by default: 1).
* The sort.by option can be set such that the rows are ordered by raw consistency ("incl"), by the N ("n"), or by both (as done here). With decreasing = FALSE, the truth table rows are ordered in ascending instead of descending order, e.g., by ascending raw consistencies.
* With the show.cases option, we can choose whether we want to indicate the single cases contained in a truth table row (TRUE) or not (FALSE).
* If the complete option is set to TRUE, all logical remainders are also displayed; if FALSE, only empirically observed truth table rows are displayed.
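
How incl.cut and n.cut interact can be sketched in a few lines of base R. This is a simplified version of the coding rule described above, with hypothetical row consistencies and frequencies: rows with fewer cases than n.cut become logical remainders ("?"), and the remaining rows get OUT = 1 if their raw consistency reaches incl.cut and 0 otherwise. The helper `code_out()` is ours; it ignores contradictory rows ("C") and any other refinements of the real truthTable().

```{r}
# Hypothetical raw consistencies and case counts for four truth table rows
incl <- c(0.95, 0.86, 0.79, NA)    # the last row has no cases, hence no consistency
n    <- c(3, 1, 2, 0)

code_out <- function(incl, n, incl_cut = 0.8, n_cut = 1) {
  ifelse(n < n_cut, "?",                        # too few cases: logical remainder
         ifelse(incl >= incl_cut, "1", "0"))    # consistent enough: OUT = 1
}

code_out(incl, n)              # n.cut = 1
code_out(incl, n, n_cut = 2)   # the row with a single case becomes a remainder
```
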

The next command produces the truth table from c145a in the 'standard' form, with all 32 possible combinations of the five conditions:

```{r c145b}
ttSURV <- truthTable(data=LF, outcome = "SURV", conditions = "DEV, URB, LIT, IND, STB",
              incl.cut=.80, complete=TRUE, show.cases=TRUE) 
ttSURV
```


The command below produces a truth table for the negated outcome, with raw consistency threshold 0.828, frequency threshold 1, sorted by descending raw consistency (and, for equal raw consistency, by N), showing only empirically observed rows and the individual cases. Note that we use lowercase letters for the name of the truth table object because it analyzes the negated outcome: ttSURV is the truth table for the positive outcome, and ttsurv the one for the negated outcome.

```{r c146}
ttsurv <- truthTable(LF, outcome="~SURV", conditions = "DEV, URB, LIT, IND, STB",
            incl.cut=0.828, n.cut=1, sort.by="incl, n", decreasing=TRUE, complete=FALSE, show.cases=TRUE)
ttsurv
```

The minimize() command (package: QCA) performs logical minimization of the truth table (i.e., the object we created with the truth table command, here: ttSURV). First, we calculate the **conservative** solution and obtain all details (consistency, raw and unique coverage). We also display the cases (show.cases) contained in the prime implicants (optional). The use.tilde option can be set to FALSE if you prefer upper- and lowercase notation for sets and their negations, and to TRUE if you prefer to denote negated sets with a tilde. If it is skipped, upper-/lowercase notation is used.

```{r C147}
csSURV <- minimize(ttSURV, details=TRUE, show.cases=TRUE, row.dom=TRUE, all.sol=FALSE, use.tilde=FALSE)
csSURV
```
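
Logical minimization rests on one Boolean rule: two configurations that lead to the outcome and differ in exactly one condition can be merged, and that condition drops out. The base-R sketch below implements only this pairwise step (the function `merge_terms()` is ours; minimize() uses considerably more elaborate algorithms and then solves the prime implicant chart). It also shows why the two five-condition terms of the conservative solution, DEV*URB*LIT*IND*STB and DEV*urb*LIT*ind*STB, cannot be merged with each other: they differ in two conditions.

```{r}
# Terms are strings over DEV, URB, LIT, IND, STB:
# "1" = present, "0" = absent, "-" = already eliminated
merge_terms <- function(t1, t2) {
  a <- strsplit(t1, "")[[1]]
  b <- strsplit(t2, "")[[1]]
  if (any((a == "-") != (b == "-"))) return(NA_character_)  # dashes must line up
  pos <- which(a != b)
  if (length(pos) != 1) return(NA_character_)  # exactly one condition may differ
  a[pos] <- "-"                                 # that condition is redundant
  paste(a, collapse = "")
}

merge_terms("11111", "11101")   # "111-1": IND drops out
merge_terms("11111", "10101")   # NA: differ in URB and IND
```
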
To calculate the **parsimonious** solution, we tell the software to include logical remainders (those with outcome “?”) representing simplifying assumptions.

```{r}
psSURV <- minimize(ttSURV, include="?", details=TRUE, show.cases=TRUE, row.dom=TRUE, all.sol=FALSE)
psSURV
```

### Box 7: Useful tools for truth table analysis

```{r}
psOUTCOME <- psSURV
mydata <- LF
```


Once you see the truth table, you may want to plot different truth table rows to find the appropriate raw consistency threshold. You can plot truth table rows easily using pimplot(), which includes the names of cases with membership > 0.5 in the row and all parameters of fit (consistency, coverage, PRI, Haesebrouck’s consistency). However, this requires that you have already calculated a solution and stored it as an object (here: psOUTCOME; you can use any raw consistency threshold to begin with, just to get a provisional solution object to work with). Here we plot truth table rows 1, 5 and 32. Use the backward arrow above the plot window to inspect all the different plots:

```{r c149}
pimplot(data=mydata, results=psOUTCOME, ttrows=c("1", "5", "32"), outcome= "SURV")
```
Note that the rows must have at least one case to get plotted.

You can also simply plot all truth table rows above a certain raw consistency level (here: 0.8) at once. The resulting plots have the row number as label of the X axis:

```{r c150}
pimplot(data=mydata, results=psOUTCOME, incl.tt=0.8, outcome= "SURV")
```

Assessing the consequences of setting a frequency threshold: 

If you set a frequency threshold of more than 1 and you want to see which rows are now treated as logical remainders by imposing such a threshold, type

```
ttoutcome$excluded
```
Exporting the truth table

You can save the truth table as a csv file. This makes it very easy to copy and paste the table into a Word document (e.g., the online appendix).

```
write.csv(ttSURV$tt, "mytt.csv")
```

### Box 8: Default settings for logical minimization

The `row.dom` option for logical minimization, if set to TRUE, is used to further eliminate redundant prime implicants when solving the PI chart by applying the principle of row dominance: if a prime implicant X covers all the configurations that another prime implicant Y covers and at the same time covers other configurations that Y does not cover, then Y is redundant and eliminated. By setting `all.sol=TRUE`, you can derive all possible solutions, irrespective of the number of prime implicants.
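
The row dominance principle can be illustrated on a small, hypothetical prime implicant chart in base R: rows are prime implicants, columns are the truth table rows they cover. The helper `is_dominated()` is ours and only implements the strict dominance rule described above; it is not how minimize() solves the chart internally.

```{r}
# Hypothetical PI chart: rows = prime implicants, columns = covered configurations
chart <- rbind(X = c(TRUE, TRUE, TRUE),
               Y = c(TRUE, TRUE, FALSE),
               Z = c(FALSE, FALSE, TRUE))

# p is dominated if some other implicant q covers everything p covers and more
is_dominated <- function(p, chart) {
  others <- setdiff(rownames(chart), p)
  any(vapply(others, function(q)
    all(chart[q, ] >= chart[p, ]) && any(chart[q, ] > chart[p, ]),
    logical(1)))
}

dominated <- vapply(rownames(chart), is_dominated, logical(1), chart = chart)
rownames(chart)[!dominated]   # "X"
```

Both Y and Z are eliminated because X covers every configuration they cover plus at least one more.
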

To obtain a subset of the solution space, set `row.dom=TRUE` and `all.sol=FALSE`. This is the default setting that the QCA package implements if you do not specify these two options (as done here in this manual). Conversely, to reveal the full extent of model ambiguity, set `row.dom=FALSE` and `all.sol=TRUE` (see Baumgartner 2015 and Baumgartner and Thiem 2015). Using `all.sol = TRUE` does not reflect the view of the QCA package author, who set the default to FALSE.

By presenting the templates with the default options, we do not intend to make a recommendation. It is good to be aware that there are often many possible solutions and that there are several ways of dealing with this.

## 7.3 Standard Analysis: Specifying directional expectations

> Note. This section needs explanations. What does "standard analysis" mean? What are directional expectations?

To derive an intermediate solution using Standard Analysis (Ragin 2008), we may want to specify directional expectations. Just as with the parsimonious solution, we tell the software to include simplifying assumptions; but we then specify the directional expectations (`dir.exp`) for each condition, in the same order as the conditions were listed when creating the truth table (see section 7.2). In the example below, which has five conditions, we assume that condition 1 contributes to the outcome when present (1); condition 2 contributes to the outcome when absent (0); we have no directional expectation (“-”) for condition 3; condition 4 contributes when absent (0); and condition 5 contributes when present (1).

```{r}
isSURV  <- minimize(ttSURV, include = "?", details=TRUE, show.cases=TRUE, row.dom=TRUE, 
          all.sol=FALSE, dir.exp = "1, 0, -, 0, 1") 
# Note that you need as many expectations as there are conditions
isSURV
```
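
To make the positional logic of `dir.exp` explicit, the following base R sketch pairs each code in the string with a condition name. The condition names and their order (DEV, URB, LIT, IND, STB, as in the Lipset data) are an assumption about how `ttSURV` was built, and the sketch does not call the QCA package.

```r
# Assumed order of the conditions in ttSURV (hypothetical; check your truth table)
toy_conditions <- c("DEV", "URB", "LIT", "IND", "STB")
toy_dir_exp <- "1, 0, -, 0, 1"

# Split the string at the commas and strip the surrounding white space
toy_expectations <- trimws(strsplit(toy_dir_exp, ",")[[1]])

# One expectation per condition is required
stopifnot(length(toy_expectations) == length(toy_conditions))
names(toy_expectations) <- toy_conditions

# Translate the codes into readable statements
toy_meaning <- c("1" = "contributes when present",
                 "0" = "contributes when absent",
                 "-" = "no expectation")
toy_dir_table <- data.frame(condition = toy_conditions,
                            expectation = unname(toy_meaning[toy_expectations]))
toy_dir_table
```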

## 7.4 Enhanced Standard Analysis: Excluding truth table rows

When deriving an intermediate solution and performing an Enhanced Standard Analysis, we may or may not specify directional expectations; in addition, however, we exclude from the analysis of sufficiency those truth table rows that a) display a negated necessary condition for the same outcome, b) display a sufficient condition for the negated outcome, or c) are logically implausible (the “pregnant man”). There are several ways to do this. In the running text, we discuss two of them; see Box 10 for more options.

### 7.4.1 Excluding truth table rows before logical minimization

We can build a new truth table in which we simply tell the software to code with outcome 0 those rows that display a certain configuration of conditions. We can do this only for logical remainders, or for all truth table rows, whether they are empirically observed or logical remainders. This is an easy and transparent option if you think that neither your observations nor your simplifying assumptions should contradict prior findings and/or logic, and if you want to see what you did to the truth table before turning to logical minimization.
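
The recoding idea can be illustrated on a toy example. The following base R sketch is hypothetical (the data frame `toy_tt`, the function `exclude_pattern()` and the “impossible” combination A = 1, B = 1 are made up for illustration) and is not how the QCA package does it internally: every row that displays the given configuration gets OUT = "0", either only among the logical remainders or among all rows.

```r
# Hypothetical crisp truth table; OUT = "?" marks logical remainders
toy_tt <- expand.grid(A = 0:1, B = 0:1, C = 0:1)
toy_tt$OUT <- c("0", "?", "1", "1", "0", "?", "1", "?")

# Recode to OUT = "0" all rows that display `pattern` (a named vector of
# condition values); with remainders_only = TRUE, only rows with OUT == "?"
exclude_pattern <- function(tt, pattern, remainders_only = TRUE) {
  matches <- Reduce(`&`, Map(function(cond, value) tt[[cond]] == value,
                             names(pattern), pattern))
  if (remainders_only) matches <- matches & tt$OUT == "?"
  tt$OUT[matches] <- "0"
  tt
}

# Treat the made-up combination A = 1, B = 1 as logically impossible
exclude_pattern(toy_tt, c(A = 1, B = 1))                           # row 8 only
exclude_pattern(toy_tt, c(A = 1, B = 1), remainders_only = FALSE)  # rows 4 and 8
```

Restricting the recoding to remainders only changes the simplifying assumptions; recoding all rows also overrides observed evidence (here the observed row 4), which is why the choice should be stated explicitly.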

In a first step, we build the truth table (here: for the negated outcome and for 5 conditions) that also displays the logical remainders (complete=TRUE):

```{r c155}
ettoutcome <- truthTable(LF, outcome="~SURV", conditions = "DEV, URB, LIT, IND, STB",
              incl.cut=0.8, n.cut=1, sort.by="incl, n", decreasing=TRUE, complete=TRUE, show.cases=TRUE)
ettoutcome
```

As we have seen above (see chunk c147), there are two sufficient configurations for the outcome SURV:
`M1: DEV*URB*LIT*IND*STB + DEV*urb*LIT*ind*STB => SURV`. 
Truth table rows displaying these configurations should therefore not be treated as sufficient for the negated outcome, `surv`, since this would contradict the solution for SURV. Let's find the rows with these patterns: 

```{r c156}
rows <- findRows("DEV*urb*LIT*ind", ettoutcome, remainders = FALSE)
rows
```
Note: I could not use either of M1's full terms here. Whenever I add the last condition (STB), `findRows()` throws an error, even though there are matching rows in the truth table.
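
What `findRows()` does with an expression like the one above can be approximated in base R. This sketch is a simplification under stated assumptions (crisp notation only, the hypothetical helpers `parse_conjunction()` and `match_rows()`, and a made-up truth table with four conditions), not the QCA implementation: upper-case condition names require presence (1), lower-case names require absence (0), conditions missing from the expression are unconstrained, and a row matches if it satisfies every requirement. The toy rows are numbered in the order produced by `expand.grid()`, which need not match the QCA package's row numbering.

```r
# Convert a conjunction such as "DEV*urb*LIT*ind" into the required values:
# upper case = condition present (1), lower case = condition absent (0)
parse_conjunction <- function(expr) {
  terms <- trimws(strsplit(expr, "*", fixed = TRUE)[[1]])
  values <- ifelse(terms == toupper(terms), 1, 0)
  setNames(values, toupper(terms))
}

# Row numbers of a crisp truth table whose configuration satisfies `expr`
match_rows <- function(tt, expr) {
  required <- parse_conjunction(expr)
  hit <- rep(TRUE, nrow(tt))
  for (cond in names(required)) {
    hit <- hit & tt[[cond]] == required[[cond]]
  }
  as.numeric(rownames(tt)[hit])
}

# Hypothetical truth table: all 16 configurations of four crisp conditions
toy_tt4 <- expand.grid(DEV = 0:1, URB = 0:1, LIT = 0:1, IND = 0:1)

match_rows(toy_tt4, "DEV*urb*LIT*ind")  # one fully specified row
match_rows(toy_tt4, "DEV*LIT")          # URB and IND left free: four rows
```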
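
Alternatively, `findRows()` with `type = 2` returns the contradictory simplifying assumptions, i.e. logical remainders that would be used as simplifying assumptions both in the solution for the outcome and in the solution for its negation. Note that this overwrites the `rows` object created in chunk c156:
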
```{r}
rows <- findRows(obj = ettoutcome, type = 2) # contradictory simplifying assumptions
rows
```

Having found these rows, we can recode their outcome to OUT = 0 in the truth table: 

```{r c159}
ettoutcome$tt[as.character(rows), "OUT"] <- 0 
ettoutcome
```
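This command makes compact use of the truth table object. `ettoutcome$tt` is the data frame that holds the truth table, with the row numbers stored as row names. `as.character(rows)` turns the row numbers returned by `findRows()` into row names, so that `[as.character(rows), "OUT"]` selects the OUT column of exactly those rows, and `<- 0` sets their outcome to 0. Indexing by name rather than by position matters because the rows of the truth table need not be stored in row-number order (for example after sorting with `sort.by`).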
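
A toy example makes the difference between positional and name-based indexing visible. The data frame `toy_sorted` below is hypothetical: a small truth table whose rows are stored in a different order than their row numbers, with OUT kept as character values.

```r
# Hypothetical truth table whose rows are stored out of row-number order
# (e.g. after sorting); the row names carry the row numbers
toy_sorted <- data.frame(A = c(1, 0, 1, 0), B = c(1, 1, 0, 0),
                         OUT = c("1", "1", "?", "0"),
                         row.names = c("4", "3", "2", "1"),
                         stringsAsFactors = FALSE)
toy_rows <- c(1, 3)  # row numbers, as findRows() would return them

toy_by_position <- toy_sorted[toy_rows, "OUT"]              # rows named "4" and "2"
toy_by_name     <- toy_sorted[as.character(toy_rows), "OUT"]  # rows named "1" and "3"
toy_by_position
toy_by_name

# Recode the outcome of the rows named "1" and "3"
toy_sorted[as.character(toy_rows), "OUT"] <- "0"
toy_sorted
```

Numeric indexing would have picked rows 4 and 2 instead of rows 1 and 3; converting to character makes R look the rows up by name.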

With the recoded truth table, we can then calculate enhanced solutions, such as the enhanced parsimonious solution:

```{r c161}
epsoutcome <- minimize(ettoutcome, include="?", details=TRUE, show.cases=TRUE, 
              row.dom=TRUE, all.sol=FALSE)
epsoutcome
```

