2020-05-24

This walkthrough explores two aspects of conservation optimism (CO) among the 2161 people that took part in the Life in Conservation online survey:


Let’s start by looking at the distribution of responses in each CO question.

We can also look at which countries people are thinking about when asked “Which country’s conservation context are you most familiar with?” - national CO.

It appears we have a reasonable spread - although results are clustered in the USA, UK, and India.

Differences in CO

Let’s look at differences in national and general CO.

Let’s compare general and national CO. For now, let’s assume the data are numeric (rather than ordinal). The mean of the mean across the ten general CO questions is 2.42. The mean national CO is 2.23. Although there a statistically significant difference between the two, the actual difference is very small (paired two-sided t-test).

We can extend this, to look at the difference in optimism within countries (only showing those countries with >30 responses).

Associations between general and national CO

So we’ve looked at differences in the average level of general and national CO, but what about the correlation between the two.

All the CO questions are Likert-scaled, so are ordinal. Furthermore, some previous exploratory factor analysis (see here) suggested the following relationship between the items and general CO.

# The model
model_SO_simple <- '
SO =~ SO_1 + SO_2 + SO_3 +SO_4 + SO_5+ SO_6 + SO_7 + SO_8 + SO_10

# Correlated error terms 
SO_1 ~~ SO_2
SO_3 ~~ SO_4 

SO_5 ~~ SO_6 
SO_5 ~~ SO_8 
SO_6 ~~ SO_8
'

# Now lets run the SEM with the same
fit_SO_simple <- lavaan::cfa(model_SO_simple,  estimator = "WLSMVS", DF2[,SO_row_name], ordered = c("SO_1" , "SO_2" , "SO_3" , "SO_4", "SO_5", "SO_6", "SO_7", "SO_8",  "SO_10")  )

Let’s explore the correlation between national CO (ordinal) and general CO (latent variable). We can’t do this in a single step (as far as I’m aware), so we’ll split it into two.

First, we perform confirmatory factor analysis (with the custom correlated error structure described above). We then extract the factor scores (which represents where a respondent sits (estimated) along the range of a latent variable) - which we think represents the latent variable general CO (the model is well fit, with a RMSEA of 0.067 (95% CI 0.06 - 0.08)).

# The model
model_SO_simple <- '
SO =~ SO_1 + SO_2 + SO_3 +SO_4 + SO_5+ SO_6 + SO_7 + SO_8 + SO_10

# Correlated error terms 
SO_1 ~~ SO_2
SO_3 ~~ SO_4 

SO_5 ~~ SO_6 
SO_5 ~~ SO_8 
SO_6 ~~ SO_8
'

# Now lets run the SEM with the same
fit_SO_simple <- lavaan::cfa(model_SO_simple,  estimator = "WLSMVS", DF2[,SO_row_name], ordered = c("SO_1" , "SO_2" , "SO_3" , "SO_4", "SO_5", "SO_6", "SO_7", "SO_8",  "SO_10")  )

# Extract the factor scores 
Factor_main <-lavPredict(fit_SO_simple, newdata = DF2)
colnames(Factor_main) <- c("General.CO")

# And combine it with the original data 
DF2 <- cbind(DF2,Factor_main )

Second, we regress the factor scores (representing general CO) against national CO, within an ordinal regression. First, we run the ordinal regression.

## Waiting for profiling to be done...
OR 2.5% CI 97.5% CI
General.CO 7.51 6.24 9.06

Then we want to check that the equal interval assumption is met (this is a really useful walkthrough. There are more diagnostics that we can do, but the most informative is the plot shown below. Simply, we’re looking to see that for each predictor variable (in our case, only general CO), the distance between the levels of the dependent variable (national CO) remain similar. This doesn’t mean that the symbols of the same type have to be vertically aligned. It simply means that the distance between the two symbols looking at each horizontal line are similarly spaced, which in our case they are (nicely).

OK, now we’ve done the diagnostics, we can visualise the relationship between general and national CO.

# Create a dummy data frame which we use to predict levels SO_11 that we can plot 
newdat <- data.frame(General.CO = seq(min(DF2$General.CO), max(DF2$General.CO), length.out = 100))

# Predict outcomes 
newdat <- cbind(newdat, predict(ordinal_reg, newdat, type = "probs"))

# Put into a new DF
lnewdat <- reshape::melt(newdat, id.vars = c("General.CO" ),
  variable.name = "variable", value.name="value")

# Rename columns
colnames(lnewdat) <- c("General CO", "Level", "Probability" )

The latent variable general optimism is presented in standardized units. I.e. 1 on the x-axis of the below figure means 1 standard deviation of the latent unmeasured construct. So if we look at 2 on the x-axis, this is someone who has very low general CO. If we look up from there, we can see that they have a high probability of saying “probably won’t” when asked if the conservation goals for their country will be met. Similarly, you those have high general CO are likely to answer “probably will” or “definitely will” for the same national CO question. Neat!

Sub-factors of general CO and national CO

So we’ve explored the relationship between general and national CO. However, when specifying the structure of the relationship between each general CO question and the latent construct, we accounted for the dependencies between the paired questions (each pair supposedly corresponded to on Aichi strategic target). A (less well fit) alternative is to model two-orders of latent variable, shown below:

# The model
model_SO_all <- '
# Target 1
SO.AT1  =~ SO_1 + SO_2

# Target 2
SO.AT2 =~ SO_3 +SO_4 

# Target 3 and 4
SO.AT3.4 =~ SO_5+SO_6 + SO_8 

# Conservation optimism 
SO =~ SO.AT1 + SO.AT2 + SO.AT3.4 + SO_7 + SO_10 
'

# Now lets run the SEM with the same
fit_SO_all <- lavaan::sem(model_SO_all,  estimator = "WLSMVS", DF2[,SO_row_name], ordered = c("SO_1" , "SO_2" , "SO_3" , "SO_4", "SO_5", "SO_6", "SO_7", "SO_8", "SO_10")  )

Here, we assume that:

  • SO.AT1 = General optimism towards the goal to “Address the underlying causes of biodiversity loss by mainstreaming biodiversity across government and society”
  • SO.AT2 = General optimism towards the goal to “Reduce the direct pressures on biodiversity and promote sustainable use”
  • SO.AT3.4 = General optimism towards the goal to “improve the status of biodiversity by safeguarding ecosystems, species and genetic diversity”
  • The explanation for the other two variables is provided elsewhere - I won’t bother going into it now.

The RMSEA of this model is 0.068 (95% CI 0.06 - 0.08).

We might then choose to look at the relationship between each of these three ‘sub-factors’ of general CO and national CO. We’ll do that by just extracting three factors, corresponding to the seven constituent questions.

I’m going to proceed with the analysis. However, it is not appropriate, because we only have two items on which to estimate the two latent variables. I’m just do it for ‘fun’.

General-nature CO and national CO

options(knitr.kable.NA = '') # NA are blank
fa_out <-fa(DF2[, SO_row_name[c(1:6,8)]], nfactors = 3, cor = "poly",  fm = "wls",  rotate =  "oblimin") # run the model with two factors
## Loading required namespace: GPArotation
# loadings as df
fact_ma <- as.data.frame((fa_out$loadings)[1:7,1:3])
fact_ma[,1] <- ifelse(fact_ma[,1] <=.3, NA, fact_ma[,1])
fact_ma[,2] <- ifelse(fact_ma[,2] <=.3, NA, fact_ma[,2])
fact_ma[,3] <- ifelse(fact_ma[,3] <=.3, NA, fact_ma[,3])

# Add row names
lab_1 <- subset(variables_df, Code %in% SO_row_name[c(1:6,8)], select=c("Question"))
rownames(fact_ma) <- paste0(lab_1$Question[c(1:7)], " (", rownames(fact_ma), ")")
fact_ma <- round(fact_ma, 2)

# Present as a nice table
fact_ma %>%
    kable(format = "html", table.attr = "style='width:100%;'",
           col.names = c("Factor 1", "Factor 2",  "Factor 3")) %>%
  kable_styling()%>%
  row_spec(1:2,  background = "#b3e2cd")%>%
  row_spec(3:4,  background = "#fdcdac")%>%
  row_spec(5:7,  background = "#cbd5e8")
Factor 1 Factor 2 Factor 3
Public support for conservation will grow over the next ten years (SO_1) 0.47
Government spending on conservation will grow over the next ten years (SO_2) 1.01
The harmful impact of people on nature will be less in ten years’ time than it is now (SO_3) 0.6
Human society will be more environmentally sustainable in ten years’ time than it is now (SO_4) 0.9
There will be more wildlife in ten years’ time than there is today (SO_5) 0.93
There will be more natural areas and habitats in ten years’ time than there are today (SO_6) 0.75
Nature will be able to provide the same benefits to people in ten years’ time as now (SO_8) 0.47
# Extract the main factor scores 
Factor_main <- fa_out$scores[,1:3]
colnames(Factor_main) <- c("CO.Nature","CO.Sustainability", "CO.Support" )

# And combine it with the other data 
DF2 <- cbind(DF2, Factor_main )
# Ordinal regression 
ordinal_reg <- polr(SO_11_ord ~ CO.Nature + CO.Sustainability + CO.Support, data = DF2, Hess=TRUE)

# Create dummy DF
newdat <- data.frame(CO.Nature = seq(min(DF2$CO.Nature), max(DF2$CO.Nature), length.out = 100),
            CO.Sustainability = rep(mean(DF2$CO.Sustainability),100),
            CO.Support = rep(mean(DF2$CO.Support),100))

# Predict outcomes 
newdat <- cbind(newdat, predict(ordinal_reg, newdat, type = "probs"))

# Put into a new DF
lnewdat <- reshape::melt(newdat, id.vars = c("CO.Nature","CO.Sustainability", "CO.Support" ),
  variable.name = "variable", value.name="value")

# Rename columns
colnames(lnewdat) <- c("CO Nature", "CO Sustainability", "CO Support", "Level", "Probability" )
ggplot(lnewdat, aes(x = `CO Nature`, y = Probability, colour = Level)) +  geom_line()

General-sustainability CO and national CO

# Create dummy DF
newdat <- data.frame(CO.Nature = rep(mean(DF2$CO.Nature),100),
            CO.Sustainability = seq(min(DF2$CO.Sustainability), max(DF2$CO.Sustainability), length.out = 100), CO.Support = rep(mean(DF2$CO.Support),100))

# Predict outcomes 
newdat <- cbind(newdat, predict(ordinal_reg, newdat, type = "probs"))

# Put into a new DF
lnewdat <- reshape::melt(newdat, id.vars = c("CO.Nature","CO.Sustainability", "CO.Support" ),
  variable.name = "variable", value.name="value")

# Rename columns
colnames(lnewdat) <- c("CO Nature", "CO Sustainability", "CO Support", "Level", "Probability" )
ggplot(lnewdat, aes(x = `CO Sustainability`, y = Probability, colour = Level)) +  geom_line()

General-support CO and national CO

# Create dummy DF
newdat <- data.frame(CO.Nature = rep(mean(DF2$CO.Nature),100),
            CO.Sustainability= rep(mean(DF2$CO.Sustainability),100), CO.Support  = seq(min(DF2$CO.Support), max(DF2$CO.Support), length.out = 100) ) 

# Predict outcomes 
newdat <- cbind(newdat, predict(ordinal_reg, newdat, type = "probs"))

# Put into a new DF
lnewdat <- reshape::melt(newdat, id.vars = c("CO.Nature","CO.Sustainability", "CO.Support" ),
  variable.name = "variable", value.name="value")

# Rename columns
colnames(lnewdat) <- c("CO Nature", "CO Sustainability", "CO Support", "Level", "Probability" )

There is a weak association between the sub-factors of general CO and national CO. I suspect this is because most of the sub-factors are estimated from only two items.

Predictors of general CO

Here, I’m just going to look at the associations between general CO and respondents characteristics, using structural equation models (more details on decisions in the SEM process can be found here. Significance codes: 0.001 = ***, 0.01 = **, 0.05 = *,0.1 = ..

Age

Variable Estimate p-value
age_year -0.5489 ***

Gender

There does not appear to be a significant difference by gender.

Variable Estimate p-value
gender_Male 0.0500 .
gender_Other -0.0585

Position

The reference level is ‘researcher’.

Variable Estimate p-value
position_Administration 0.1032
position_Bachelorsstudent 0.1409
position_Consultantself_employed -0.0771
position_Fieldworker -0.0056
position_Graduatestudent 0.0736
position_Intern 0.8533 ***
position_Manager 0.0105
position_Other 0.1251 **
position_Policymaker 0.1759
position_Ranger 0.6103 ***
position_Unknown 0.0615

Desk/non-desk

The reference level is ‘desk’.

Variable Estimate p-value
position_simple 0.0182

Years in conservation

Variable Estimate p-value
years_cons -0.1951 ***

Education

The reference level is ‘college’.

Variable Estimate p-value
education_University -0.0350
education_Primary -0.0218
education_Secondary -0.2069 .

University/non-university

The reference level is non-university.

Variable Estimate p-value
education_simple 0.0448

Predictors of national CO

OK, lets do the same but looking at predictors of national CO, using ordinal regression (but not bothering to check the model assumptions, and doing it as one multi-variate analysis. The continuous variables, like age, have been scaled. Reference levels are:

Clearly something going a bit awry with ‘educationUnknown’ - probably because there are so few observation, so I need to look into that.

OR 2.5% CI 97.5% CI
age_year 0.82 0.47 1.44
genderMale 1.16 0.96 1.40
genderOther 0.34 0.07 1.67
genderPrefer not to say 1.17 0.41 3.31
positionBachelors student 1.68 0.70 4.06
positionConsultant/self-employed 0.76 0.40 1.42
positionFieldworker 0.96 0.52 1.79
positionGraduate student 0.63 0.35 1.13
positionIntern 1.71 0.65 4.48
positionManager 0.71 0.40 1.25
positionOther 0.78 0.43 1.41
positionPolicymaker 0.69 0.28 1.68
positionRanger 1.06 0.40 2.76
positionResearcher 0.67 0.39 1.14
years_cons 0.66 0.51 0.86
educationUnknown 3888603.64 3888599.08 3888608.20
educationPrimary 0.76 0.13 4.28
educationSecondary 1.44 0.51 4.05
educationUniversity 1.13 0.70 1.84