Hypothesis
Below are some possible hypotheses that we could test:
Overview of sections
The following sections illustrate:
The raw dataset contains 915 observations. The next plot shows the pattern of missing data.
##
##             Agree          Disagree    Not applicable    Strongly agree Strongly disagree
##               213               295               136               106                61
##
##             Agree          Disagree    Strongly agree Strongly disagree
##               213               295               106                61
The dataset including missing observations (and responses where respondents indicated a question was ‘not applicable’) contains 915 observations. Removing these missing observations leaves 493 complete observations. We could use multiple imputation methods to impute synthetic values for the genuinely ‘missing’ data, but this is not suitable for imputing the ‘not applicable’ observations. Alternatively, instead of using a single SEM, we could repeat the analysis for each explanatory variable in turn (using the complete observations for that variable), which might yield slightly larger datasets for each analysis. However, we would then lose the information that exists across patterns of responses (discussed below).
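As a minimal sketch of the complete-case count (assuming a hypothetical raw data frame, here called ranger_raw, in which both genuine non-response and ‘not applicable’ answers have already been recoded to NA; the actual object names and recoding are not shown in this section):
# Sketch only: 'ranger_raw' is a hypothetical name for the raw 915-row dataset,
# with 'not applicable' answers assumed already recoded to NA (recoding not shown).
nrow(ranger_raw)                # 915 observations in the raw data
sum(complete.cases(ranger_raw)) # fully observed rows (493 reported above)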
Now, let’s look at some summaries of each variable.
Observations by site
The table below shows the number of sites with a given number of observations. For instance, there are 360 sites with only one observation, but one site with 11 observations. This suggests there is probably no need for a random effect for site.
# Number of sites with a given number of observations
table(table(ranger_3$Q1PA_Name))
##
## 1 2 3 4 5 7 9 10 11
## 360 24 10 1 1 1 2 1 1
Observations by country
First, which countries are represented in the sample? There are 53 unique countries represented in the dataset.
levels(factor(ranger_3$Q2Country_ENG))
## [1] "Argentina" "Australia"
## [3] "Bhutan" "Brazil"
## [5] "Canada" "Chile"
## [7] "China" "Colombia"
## [9] "Costa Rica" "Côte d'Ivoire"
## [11] "Democratic Republic of the Congo" "Ecuador"
## [13] "France" "Germany"
## [15] "Ghana" "Guinea"
## [17] "Iceland" "India"
## [19] "Indonesia" "Israel"
## [21] "Kenya" "Liberia"
## [23] "Malaysia" "Mexico"
## [25] "Mozambique" "Namibia"
## [27] "Nepal" "Netherlands"
## [29] "New Zealand" "Nigeria"
## [31] "Pakistan" "Paraguay"
## [33] "Peru" "Philippines"
## [35] "Portugal" "Romania"
## [37] "Solomon Islands" "South Africa"
## [39] "Spain" "Sri Lanka"
## [41] "Switzerland" "Tanzania"
## [43] "Thailand" "The Bahamas"
## [45] "UAE" "Uganda"
## [47] "United Kingdom" "Uruguay"
## [49] "USA" "Venezuela"
## [51] "Vietnam" "Zambia"
## [53] "Zimbabwe"
We can also look at the distribution of observations between countries. There is one country with 97 observations, but otherwise a reasonable spread of observations across the other countries.
# Number of countries with a given number of observations
table(table(ranger_3$Q2Country_ENG))
##
## 1 2 3 4 5 6 9 10 11 12 13 15 16 18 19 22 24 25 69 97
## 15 5 7 2 4 2 2 1 1 2 3 1 1 1 1 1 1 1 1 1
Observations by region
We can also look at the observations in each region, which show a reasonably good spread between regions.
# Plot as table
table(ranger_3$RegionCode)
##
## South America North America
## 74 70
## Asia Africa
## 201 60
## Europe Australia and Oceania
## 56 19
## Central America and Caribbean
## 13
Age
summary(ranger_3$Q3Age_Number)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.00 31.00 37.00 39.06 45.00 68.00
Gender
# Inspect
summary(as.factor(ranger_3$Q4Sex_ENG))
## Female Male
## 78 415
Years experience
summary(ranger_3$Q5Experince_Number)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 4.00 8.00 10.85 15.00 40.00
Living and working in remote locations without healthcare facility
# Recode to factor
ranger_3$Q6__Living_Working_Remotelocation_without_healthcareFacility <- as.factor(ranger_3$Q6__Living_Working_Remotelocation_without_healthcareFacility)
# Table
table(ranger_3$Q6__Living_Working_Remotelocation_without_healthcareFacility)
##
## No Yes
## 339 154
Red List Index score
I’ve included this additional variable, which is the Red List Index (RLI) score. The Red List Index “shows trends in the status of groups of species based only on genuine improvements or deteriorations in status of sufficient magnitude to qualify species for listing in more threatened or less threatened Red List Categories.” I’ve used the RLI aggregated across five taxonomic groups: birds, mammals, amphibians, cycads and warm-water reef-forming corals. This score is disaggregated by country, weighted by the fraction of each species’ distribution occurring within a particular country. The RLI ranges between 1 (no threatened species) and 0 (all highly threatened species).
summary(ranger_3$rli)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5706 0.7736 0.8210 0.8034 0.8489 0.9836
Perceived negative impacts of COVID-19 on conservation
This can probably be better displayed as a divergent barplot, but it’ll do for now. We can see that respondents overwhelmingly agree or strongly agree with the statements. We can’t read too much into this in absolute terms, because all items are positively worded and so there is likely to be acquiescence bias.
Note that we’ve removed the ‘not applicable’ category here. When we present the results, it might be worth including the ‘not applicable’ responses in the descriptive statistics, since this information is still useful, and then subsetting the data to exclude the ‘not applicable’ responses in the statistical analysis.
# Subset the labels and data for impacts
labels.imp <- labels_list[9:14]
data.imp<- ranger_3[9:14]
# Recode variable name
names(data.imp)<- labels.imp
# Melt into DF
out <- melt(apply(data.imp, 2, table))
# Add paragraph breaks
out$X2 <- as.factor(sapply( strwrap(out$X2, 55, simplify=FALSE), paste, collapse="\n" ))
labels.imp <- as.factor(sapply( strwrap(labels.imp, 55, simplify=FALSE), paste, collapse="\n" ))
# Remove Question numbers
out$X2 <- trimws(gsub("Q7 |Q8|Q9 |Q10|Q11|Q12", "", out$X2))
labels.imp <- trimws(gsub("Q7 |Q8|Q9 |Q10|Q11|Q12", "", labels.imp))
# Response order
level_order2 <- c("Strongly agree", "Agree", "Disagree","Strongly disagree")
# The plot
ggplot(data = out, aes(x = factor(X2, level = rev(labels.imp)), y = value,
                       fill = factor(X1, level = level_order2))) +
  geom_bar(stat = "identity", position = "fill", width = 0.85) +
  coord_flip() +
  theme(legend.position = c(.1, -.18), legend.direction = "horizontal",
        legend.title = element_blank(), axis.text.y = element_text(size = 11),
        plot.margin = margin(t = 0.05, r = 0.05, b = 1.5, l = 0.05, "cm")) +
  guides(fill = guide_legend(reverse = TRUE)) +
  ylab("Percent") + xlab("Statement") +
  scale_y_continuous(breaks = c(0, .25, 0.5, .75, 1),
                     labels = c("0%", "25%", "50%", "75%", "100%"))
Perceived threat changes since COVID-19
Again, this is probably better displayed divergently. However, it is interesting to see that respondents are more ambivalent about changes in threats. Again, see the comments above about the ‘not applicable’ responses.
# Subset the labels and data for threat changes
labels.imp <- labels_list[15:19]
data.imp<- ranger_3[15:19]
# Recode variable name
names(data.imp)<- labels.imp
# Melt into DF
out <- melt(apply(data.imp, 2, table))
# Add paragraph breaks
out$X2 <- as.factor(sapply( strwrap(out$X2, 55, simplify=FALSE), paste, collapse="\n" ))
labels.imp <- as.factor(sapply( strwrap(labels.imp, 55, simplify=FALSE), paste, collapse="\n" ))
# Remove Question numbers
out$X2 <- trimws(gsub("Q44|Q45|Q46|Q47|Q48", "", out$X2))
labels.imp <- trimws(gsub("Q44|Q45|Q46|Q47|Q48", "", labels.imp))
# Response order
level_order2 <- c("Strongly agree", "Agree", "Disagree","Strongly disagree")
# The plot
ggplot(data = out, aes(x = factor(X2, level = rev(labels.imp)), y = value,
                       fill = factor(X1, level = level_order2))) +
  geom_bar(stat = "identity", position = "fill", width = 0.85) +
  coord_flip() +
  theme(legend.position = c(.1, -.18), legend.direction = "horizontal",
        legend.title = element_blank(), axis.text.y = element_text(size = 11),
        plot.margin = margin(t = 0.05, r = 0.05, b = 1.5, l = 0.05, "cm")) +
  guides(fill = guide_legend(reverse = TRUE)) +
  ylab("Percent") + xlab("Statement") +
  scale_y_continuous(breaks = c(0, .25, 0.5, .75, 1),
                     labels = c("0%", "25%", "50%", "75%", "100%"))
My main point of interest here is how impacts of COVID-19 on conservation activities and resultant changes to threats to biodiversity vary between regions and remote vs non-remote areas (and by experience). For starters, I’m just going to graphically display the predicted probabilities of saying agree or strongly agree to each set of items, within each region.
Impacts of COVID-19 and regions
I’ve not bothered trying to visualise the uncertainty around the estimates, since this is a bit of a headache. Nevertheless, you can see the predicted probabilities of saying agree or strongly agree to each of the impact statements (numbered 1-6, corresponding to the order presented above and in the survey). I’ve added the lines so we can see trends for the same impact across regions.
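The model behind these predicted probabilities is not shown in this section. As a sketch of one way such probabilities could be generated (an assumption: a proportional-odds model from MASS fitted to a single illustrative item, with the item name inferred from the ‘_num’ variable names used later; the actual model may differ):
# Sketch only: proportional-odds (ordinal logistic) model for one illustrative
# impact item by region; not necessarily the model used to produce the plots.
library(MASS)
d <- data.frame(
  item = factor(ranger_3$Q7_NegativeImpacts_LawEnforcementActivity,
                levels = c("Strongly disagree", "Disagree", "Agree", "Strongly agree"),
                ordered = TRUE),          # 'Not applicable' becomes NA and is dropped
  RegionCode = factor(ranger_3$RegionCode))
mod_region <- polr(item ~ RegionCode, data = d, Hess = TRUE)
# Predicted probability of each response level within each region
newdat <- data.frame(RegionCode = levels(d$RegionCode))
cbind(newdat, round(predict(mod_region, newdata = newdat, type = "probs"), 2))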
A couple of take-homes:
In summary, there appear to be important and interesting regional differences. Trying to understand the source of these differences might suggest interesting further analysis. For instance, factors that might influence the perceived impact could include:
Perceived threat changes by regions
Similarly, we can also look at perceived threat changes between regions. Once again, there appears to be a couple of interesting trends:
Once again, asking why these differences exist might reveal some interesting ways to use this data.
Impacts of COVID-19 and remote vs. non-remote
To interpret these, look to see whether there appear to be differences between pairs of response levels between “yes” and “no”; i.e., does “agree” in the “yes” column appear significantly different from “agree” in the “no” column? For the most part, there do not seem to be significant differences in agreement based on remoteness.
Perceived threat changes and remote vs. non-remote
As above, to interpret these, look to see whether there appear to be differences between pairs of response levels between “yes” and “no”. Here, there do appear to be significant differences in perceived threat change between remote and non-remote locations. Those in remote locations are significantly more likely to agree with the statements, and significantly less likely to disagree with them. In other words, those in remote locations perceive greater threat changes than those not in remote locations.
Impacts of COVID-19 and experience
Here we can also examine the relationship between experience and agreement with each impact statement. These plots (accompanied by an inspection of p-values) suggest that the likelihood of agreement doesn’t really change with experience.
Impacts of COVID-19 and RLI
We can also examine the association between the impacts of COVID-19 and the Red List Index score - i.e., are the impacts on management perceived to be worse in places that are highly threatened? For the most part, there do not appear to be clear associations.
Perceived threat changes and RLI
Similarly, we can look at the associations between perceived threat change and RLI. Here, for the most part, it appears that those in countries with a high RLI (meaning few threatened species) are more likely to disagree with the statements saying threats have increased. Conversely, those in countries with a low RLI (meaning a high proportion of threatened species) are more likely to say the threats have increased. In other words, perceived threat change appears to be greatest in areas that are already most threatened.
Hypothesis 1 simply involved looking at the patterns of agreement and disagreement with each statement. However, hypotheses 2 and 3 explore sources of variation in agreement among respondents and between locations, explored through multivariate analysis. Within this multivariate analysis, the response variable could be perceived threat to biodiversity. The explanatory variables could include Red List Index scores, region, remote vs non-remote location, and perceived impact of COVID-19 on management activities. Before fitting the multivariate analysis, we need to decide how to model the response variable.
Perceived threat to biodiversity
We have a number of options to model the perceived threat to biodiversity.
Among these three options, the second is perhaps the easiest to rule in or out as an approach. Doing so involves:
Before proceeding through these steps, let’s create numeric duplicates of the ordinal data.
# Function to convert from ordinal to numeric
ord_num_threat_imp <- function(x){
ifelse(x == "Strongly disagree", 1,
ifelse(x == "Disagree", 2,
ifelse(x == "Agree", 3,
ifelse(x == "Strongly agree", 4, 999))))
}
# Recode threats as numeric
threat_recode <- data.frame(apply(ranger_3[,threat_resp_list ],2,ord_num_threat_imp ))
impact_recode <- data.frame(apply(ranger_3[,impact_resp_list ],2,ord_num_threat_imp ))
# Numeric classifier
names(threat_recode) <- paste0(names(threat_recode), "_num")
names(impact_recode) <- paste0(names(impact_recode), "_num")
# Bind to original dataset
ranger_3 <- cbind(ranger_3, threat_recode)
ranger_3 <- cbind(ranger_3, impact_recode)
# List of variables
threat_resp_list_num <- names(threat_recode)
impact_resp_list_num <- names(impact_recode)
1. Breaking the data into a training and test dataset
We’re splitting the data into training and test data, with a randomly selected 70% of observations used as the training dataset and the remainder used as the test data.
# Choosing the sample size (the seed has been set above, so is reproducible)
smp_size <- floor(.70*nrow(ranger_3))
## Training observations
train_obs <- sample(seq_len(nrow(ranger_3)), size = smp_size)
# Subset to test and training
ranger_train <- ranger_3[train_obs, ]
ranger_test <- ranger_3[-train_obs, ]
2. Exploring the covariance between the threat statements
The figure below describes the polychoric correlation between the threat statements using the training data. There seems to be strong positive correlation between all threats, which is promising.
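The correlation figure and the poly_cor object used below are produced in a chunk not shown here; a minimal sketch, assuming the psych package:
# Sketch: polychoric correlations between the numerically coded threat items in the
# training data; poly_cor$rho is the correlation matrix used for the ordinal alpha below.
library(psych)
poly_cor <- polychoric(ranger_train[, threat_resp_list_num])
round(poly_cor$rho, 2)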
The Ordinal Alpha shows good internal consistency between items.
# Ordinal Alpha
round(alpha(poly_cor$rho)$total,2)
## raw_alpha std.alpha G6(smc) average_r S/N median_r
## 0.92 0.92 0.92 0.7 11.84 0.7
3. Conducting a parallel analysis to assess the number of factors to extract
Parallel analysis using polychoric correlation and a WLS estimator suggests the extraction of 1 factor.
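A sketch of the call that would produce this result (assuming psych::fa.parallel on the training data, with polychoric correlations and weighted least squares factoring):
# Sketch: parallel analysis with polychoric correlations and WLS factoring.
fa.parallel(ranger_train[, threat_resp_list_num], cor = "poly", fm = "wls", fa = "fa")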
## Parallel analysis suggests that the number of factors = 1 and the number of components = NA
4. Conducting an exploratory factor analysis (with the training data)
Next, we conduct the factor analysis, extracting 1 factor and using a WLSMVS estimator with polychoric correlation. We can see all items are heavily and significantly loaded onto a single factor.
# Factor analysis
fa_mod1 <- efaUnrotate(ranger_train[,threat_resp_list_num], nf = 1, estimator = "WLSMVS", ordered = c("Q44S_IllegalHunting_ForSubsistance_Increased_num", "Q45_IllegalHunting_ForProfit_Increased_num" , "Q46_Illegal_Logging_Increased_num","Q47_IllegalEnchroachment_Increased_num", "Q48_Increase_OtherPressures_NTFP_OnPA_num"))
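# 'fa_mod1_out' is created in a chunk not shown above. A plausible reconstruction
# (an assumption; the column positions used in the indexing below may differ) is a
# data frame of parameter estimates from the unrotated EFA, with columns prefixed 'PE.':
fa_mod1_out <- data.frame(PE = parameterEstimates(fa_mod1, standardized = TRUE))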
fa_mod1_out[1:5,c(3,6,9)]
## PE.rhs PE.est PE.pvalue
## 1 Q44S_IllegalHunting_ForSubsistance_Increased_num 0.9537914 0
## 2 Q45_IllegalHunting_ForProfit_Increased_num 0.9397049 0
## 3 Q46_Illegal_Logging_Increased_num 0.8651787 0
## 4 Q47_IllegalEnchroachment_Increased_num 0.8186173 0
## 5 Q48_Increase_OtherPressures_NTFP_OnPA_num 0.7061899 0
The root mean square error of approximation (RMSEA) of this model is 0.128 (95% CI 0.09 - 0.17), the Comparative Fit Index (CFI) is 0.998, and the Tucker–Lewis index (TLI) is 0.996. Unfortunately, the RMSEA indicates poor model fit (despite the high CFI and TLI), which is somewhat unexpected given the above appeared to be going well.
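These statistics come from the model’s fit indices (the extraction call is not shown); for reference, a sketch using lavaan::fitMeasures(), noting that the scaled/robust versions of these indices may be the ones reported for this estimator:
# Sketch: extract the fit indices reported above (the original may use the
# scaled versions, e.g. "rmsea.scaled", for a WLSMV-type estimator).
round(fitMeasures(fa_mod1, c("rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "cfi", "tli")), 3)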
5. Conducting a confirmatory analysis with the test data
Now, let’s do the final step and perform the CFA with the test data (using a WLSMVS estimator with polychoric correlation), using the structure identified above, and inspect the fit statistics.
# The model
model_threat_1 <- '
threat =~ Q44S_IllegalHunting_ForSubsistance_Increased_num + Q45_IllegalHunting_ForProfit_Increased_num + Q46_Illegal_Logging_Increased_num + Q47_IllegalEnchroachment_Increased_num + Q48_Increase_OtherPressures_NTFP_OnPA_num
'
# Now lets run the SEM with the same
fit_model_threat_1 <- lavaan::cfa(model = model_threat_1, estimator = "WLSMVS", data=ranger_test[,threat_resp_list_num], ordered = c("Q44S_IllegalHunting_ForSubsistance_Increased_num", "Q45_IllegalHunting_ForProfit_Increased_num" , "Q46_Illegal_Logging_Increased_num","Q47_IllegalEnchroachment_Increased_num", "Q48_Increase_OtherPressures_NTFP_OnPA_num"))
As above, we can check the fit statistics. The RMSEA of this model is 0.073 (95% CI 0 - 0.15), the CFI is 0.998, and the TLI is 0.996. Again, the RMSEA suggests relatively poor model fit. There might be any number of reasons for this. However, we have seen that region is an important variable associated with threat, so we can see whether the model fits better with its inclusion.
6. Fitting a SEM with the core explanatory variables
# The model
model_threat_2 <- '
# Estimate threat
threat =~ Q44S_IllegalHunting_ForSubsistance_Increased_num + Q45_IllegalHunting_ForProfit_Increased_num + Q46_Illegal_Logging_Increased_num + Q47_IllegalEnchroachment_Increased_num + Q48_Increase_OtherPressures_NTFP_OnPA_num
### Regression part
threat ~
# Region
RegionCode_South_America + RegionCode_Asia + RegionCode_Africa + RegionCode_Europe + RegionCode_Australia_and_Oceania + RegionCode_Central_America_and_Caribbean
'
# Now lets run the SEM with the same
fit_model_threat_2 <- lavaan::sem(model = model_threat_2, estimator = "WLSMVS", data=ranger_train, ordered = c("Q44S_IllegalHunting_ForSubsistance_Increased_num", "Q45_IllegalHunting_ForProfit_Increased_num" , "Q46_Illegal_Logging_Increased_num","Q47_IllegalEnchroachment_Increased_num", "Q48_Increase_OtherPressures_NTFP_OnPA_num"))
This is a massive improvement, and comes well within the range of acceptable fit measures. We can also see if the fit improves through the inclusion of the other variables we believe to be important correlates.
# The model
model_threat_3 <- '
# Estimate threat
threat =~ Q44S_IllegalHunting_ForSubsistance_Increased_num + Q45_IllegalHunting_ForProfit_Increased_num + Q46_Illegal_Logging_Increased_num + Q47_IllegalEnchroachment_Increased_num + Q48_Increase_OtherPressures_NTFP_OnPA_num
### Regression part
threat ~
# Region
RegionCode_South_America + RegionCode_Asia + RegionCode_Africa + RegionCode_Europe + RegionCode_Australia_and_Oceania + RegionCode_Central_America_and_Caribbean +
# Experience
Q5Experince_Number +
# Remote location
Q6__Living_Working_Remotelocation_without_healthcareFacility_Yes +
# RLI
rli
'
# Now lets run the SEM with the same
fit_model_threat_3 <- lavaan::sem(model = model_threat_3, estimator = "WLSMVS", data=ranger_train, ordered = c("Q44S_IllegalHunting_ForSubsistance_Increased_num", "Q45_IllegalHunting_ForProfit_Increased_num" , "Q46_Illegal_Logging_Increased_num","Q47_IllegalEnchroachment_Increased_num", "Q48_Increase_OtherPressures_NTFP_OnPA_num"))
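# 'P_table' is created in a chunk not shown above. A plausible reconstruction (an
# assumption; the column positions used in the c(3,11,8) indexing below may differ)
# is a data frame of parameter estimates with standardised values, columns prefixed
# 'PE.'. The same construction presumably applies to the later models
# (fit_model_threat_4, fit_model_full_1) as well.
P_table <- data.frame(PE = parameterEstimates(fit_model_threat_3, standardized = TRUE))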
# return the coefficients derived from regression
P_table_regre <- P_table[ which(P_table$PE.op == '~'), ]
P_table_regre[,c(3,11,8)]
## PE.rhs
## 6 RegionCode_South_America
## 7 RegionCode_Asia
## 8 RegionCode_Africa
## 9 RegionCode_Europe
## 10 RegionCode_Australia_and_Oceania
## 11 RegionCode_Central_America_and_Caribbean
## 12 Q5Experince_Number
## 13 Q6__Living_Working_Remotelocation_without_healthcareFacility_Yes
## 14 rli
## PE.std.lv PE.pvalue
## 6 1.091619066 0.0000012879604
## 7 0.317924393 0.0626130062680
## 8 0.974085927 0.0000006147344
## 9 -0.462300101 0.0207343759743
## 10 0.280401927 0.2923386771809
## 11 0.321644256 0.3445781633430
## 12 -0.008627366 0.1609070534137
## 13 0.271672667 0.0234664214321
## 14 0.004593891 0.9957724203832
The fit measures are even better. Although we do expect them to change as the model becomes more complex, this is still a relatively simple model. Inspecting the results, it is somewhat surprising that RLI is not associated with the latent threat variable, as was suggested by the bivariate analysis. This might indicate a degree of masking. Let’s repeat the analysis, but without region.
# The model
model_threat_4 <- '
# Estimate threat
threat =~ Q44S_IllegalHunting_ForSubsistance_Increased_num + Q45_IllegalHunting_ForProfit_Increased_num + Q46_Illegal_Logging_Increased_num + Q47_IllegalEnchroachment_Increased_num + Q48_Increase_OtherPressures_NTFP_OnPA_num
### Regression part
threat ~
# Experience
Q5Experince_Number +
# Remote location
Q6__Living_Working_Remotelocation_without_healthcareFacility_Yes +
# RLI
rli
'
# Now lets run the SEM with the same
fit_model_threat_4 <- lavaan::sem(model = model_threat_4, estimator = "WLSMVS", data=ranger_train, ordered = c("Q44S_IllegalHunting_ForSubsistance_Increased_num", "Q45_IllegalHunting_ForProfit_Increased_num" , "Q46_Illegal_Logging_Increased_num","Q47_IllegalEnchroachment_Increased_num", "Q48_Increase_OtherPressures_NTFP_OnPA_num"))
# Return the coefficients derived from regression
P_table_regre <- P_table[ which(P_table$PE.op == '~'), ]
P_table_regre[,c(3,11,8)]
## PE.rhs PE.std.lv
## 6 Q5Experince_Number -0.009440558
## 7 Q6__Living_Working_Remotelocation_without_healthcareFacility_Yes 0.531696919
## 8 rli -2.046765435
## PE.pvalue
## 6 0.14596460600
## 7 0.00001191381
## 8 0.00895001886
This suggests that the inclusion of the region variable did mask the association with RLI. I think we should consider other relevant variables that vary between locations, rather than including region itself (since region is a blunt variable that is likely to obscure the more specific location-level effects, such as RLI, that we might be interested in).
Perceived impacts of COVID-19 on conservation effort
Another key explanatory variable or set of variables is the perceived impact of COVID-19 on conservation efforts. We could include each statement as a variable in its own right. This might offer the advantage of showing which impacts of COVID-19 appear to be more closely associated with changes in threat status. However, these variables might be highly correlated with each other, potentially suggesting they have a common driving latent variable - disruption from COVID-19. Let’s repeat some of the steps above, exploring covariance between the impact variables.
1. Exploring the covariance between the impact statements
This suggests relatively strong covariance between the impact statements. As a result, I’d be inclined not to include each as a separate variable in the full SEM.
2. Conducting a parallel analysis to assess the number of factors to extract
Parallel analysis using polychoric correlation and a WLS estimator again suggests the extraction of 1 factor.
## Parallel analysis suggests that the number of factors = 1 and the number of components = NA
3. Conducting an exploratory factor analysis (with the training data)
Now, let’s do the factor analysis, extracting 1 factor and using a WLSMVS estimator with polychoric correlation. All items are heavily and significantly loaded onto a single factor.
# Factor analysis
fa_mod1 <- efaUnrotate(ranger_train[,impact_resp_list_num], nf = 1, estimator = "WLSMVS", ordered = c("Q7_NegativeImpacts_LawEnforcementActivity_num","Q8_NegativelyImpacted_HabitatManagement_num", "Q9_NegativelyImpacted_WildlifeMonitoringCensus_num","Q10_Negativelyimpacted_CommunityEducationConservation_num","Q11_NegativelyImpacted_HumanWildlifeConflict_num","Q12_NegativeImpacted_Training_CapacityBuilding_num"))
fa_mod1_out[1:5,c(3,6,9)]
## PE.rhs PE.est PE.pvalue
## 1 Q7_NegativeImpacts_LawEnforcementActivity_num 0.8264260 0
## 2 Q8_NegativelyImpacted_HabitatManagement_num 0.8261414 0
## 3 Q9_NegativelyImpacted_WildlifeMonitoringCensus_num 0.8141798 0
## 4 Q10_Negativelyimpacted_CommunityEducationConservation_num 0.7437867 0
## 5 Q11_NegativelyImpacted_HumanWildlifeConflict_num 0.7129135 0
The root mean square error of approximation is not great - it’s higher than we’d ideally like, but not outside the realms of what would be considered an adequate model.
4. Conducting a confirmatory analysis with the test data
Now, let’s perform the CFA with the test data (using a WLSMVS estimator with polychoric correlation).
# The model
model_impact_1 <- '
impact =~ Q7_NegativeImpacts_LawEnforcementActivity_num + Q8_NegativelyImpacted_HabitatManagement_num + Q9_NegativelyImpacted_WildlifeMonitoringCensus_num + Q10_Negativelyimpacted_CommunityEducationConservation_num + Q11_NegativelyImpacted_HumanWildlifeConflict_num + Q12_NegativeImpacted_Training_CapacityBuilding_num
'
# Now lets run the SEM with the same
fit_model_impact_1 <- lavaan::cfa(model = model_impact_1, estimator = "WLSMVS", data=ranger_test[,impact_resp_list_num], ordered = c("Q7_NegativeImpacts_LawEnforcementActivity_num","Q8_NegativelyImpacted_HabitatManagement_num", "Q9_NegativelyImpacted_WildlifeMonitoringCensus_num","Q10_Negativelyimpacted_CommunityEducationConservation_num","Q11_NegativelyImpacted_HumanWildlifeConflict_num","Q12_NegativeImpacted_Training_CapacityBuilding_num"))
The fit measures appear adequate using the test data.
Full SEM
Let’s combine what we’ve done so far, with the latent perceived change in threat as the response variable, and experience, remote vs non-remote, RLI and latent impact of COVID-19 on conservation as explanatory variables. Additionally, I’ve scaled and centred the continuous variables (experience and RLI) so they’re more easily comparable.
# Scale and centre continuous variables
ranger_3$Q5Experince_Number <- scale(ranger_3$Q5Experince_Number, scale = T, center = T)
ranger_3$rli <- scale(ranger_3$rli, scale = T, center = T)
# The model
model_threat_full_1 <- '
### Latent part ###
# Estimate threat
threat =~ Q44S_IllegalHunting_ForSubsistance_Increased_num + Q45_IllegalHunting_ForProfit_Increased_num + Q46_Illegal_Logging_Increased_num + Q47_IllegalEnchroachment_Increased_num + Q48_Increase_OtherPressures_NTFP_OnPA_num
# Estimate impact
impact =~ Q7_NegativeImpacts_LawEnforcementActivity_num + Q8_NegativelyImpacted_HabitatManagement_num + Q9_NegativelyImpacted_WildlifeMonitoringCensus_num + Q10_Negativelyimpacted_CommunityEducationConservation_num + Q11_NegativelyImpacted_HumanWildlifeConflict_num + Q12_NegativeImpacted_Training_CapacityBuilding_num
### Regression part
threat ~
# Experience
Q5Experince_Number +
# Remote location
Q6__Living_Working_Remotelocation_without_healthcareFacility_Yes +
# RLI
rli +
# Impact
impact
'
# Now lets run the SEM with the same
fit_model_full_1 <- lavaan::sem(model = model_threat_full_1, estimator = "WLSMVS", data=ranger_3, ordered = c("Q44S_IllegalHunting_ForSubsistance_Increased_num", "Q45_IllegalHunting_ForProfit_Increased_num" , "Q46_Illegal_Logging_Increased_num","Q47_IllegalEnchroachment_Increased_num", "Q48_Increase_OtherPressures_NTFP_OnPA_num" ))
The fit measures indicate good model fit. Let’s look at the results, focusing just on the regression part. These results conform to our expectations, as discussed below.
# return the coefficients derived from regression
P_table_regre <- P_table[ which(P_table$PE.op == '~'), ]
P_table_regre[,c(3,11,8)]
## PE.rhs PE.std.lv
## 12 Q5Experince_Number -0.1184627
## 13 Q6__Living_Working_Remotelocation_without_healthcareFacility_Yes 0.4713914
## 14 rli -0.1698734
## 15 impact 0.3490911
## PE.pvalue
## 12 0.0129160850631611090
## 13 0.0000021477376701551
## 14 0.0004542515939762737
## 15 0.0000000000003670397
We can also visualise the SEM, as shown below (this can be neatened up for publication, with more informative labels etc.).
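One way to draw such a path diagram (a sketch, assuming the semPlot package; the actual plotting code is not shown here):
# Sketch: path diagram of the full SEM with standardised estimates.
library(semPlot)
semPaths(fit_model_full_1, what = "std", layout = "tree", edge.label.cex = 0.8)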
Based on this provisional analysis, we can explore the results in relation to our original hypotheses.
1. Threats to biodiversity are perceived to have increased since the start of COVID-19
The figure under the section ‘Perceived threat changes since COVID-19’ suggests a rough balance between those who agreed and those who disagreed with statements about increased risk of illegal activities. However, it appears a higher proportion believed there was an increase in other pressures. Furthermore, there is likely to be variation in these perceptions within the conservation community, as discussed next.
2. The perceived change in threats to biodiversity are not homogeneous, and vary between individuals and locations
Those with more experience and those living in countries with higher RLI scores appear more likely to report lower perceived changes in threat. Conversely, those in remote areas and those who report higher perceived impacts of COVID-19 on management activities are more likely to report greater perceived threat changes. Region was excluded in a post hoc model adjustment, because it appeared to covary with RLI score.
3. Threats to biodiversity are perceived to have increased in places where COVID-19 is perceived to have disrupted management activities
Those who report higher perceived impacts of COVID-19 on management activities also appear more likely to report greater perceived threat changes.