2020-05-24
This walkthrough explores two aspects of conservation optimism (CO) among the 2,161 people who took part in the Life in Conservation online survey: the relationship between general and national CO, and how both relate to respondents’ characteristics.
Let’s start by looking at the distribution of responses in each CO question.
We can also look at which countries people are thinking about when answering the national CO question: “Which country’s conservation context are you most familiar with?”
It appears we have a reasonable spread - although results are clustered in the USA, UK, and India.
Let’s look at differences in national and general CO.
For now, let’s assume the data are numeric (rather than ordinal). Averaging each respondent’s answers across the ten general CO questions gives a mean of 2.42, compared with a mean national CO of 2.23. Although there is a statistically significant difference between the two (paired two-sided t-test), the actual difference is very small.
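For reference, a minimal sketch of this comparison, assuming the ten general CO items are stored as SO_1 to SO_10 and national CO as SO_11 (the column names used elsewhere in this post), treating everything as numeric:
# Per-respondent mean across the ten general CO items, treated as numeric
general_mean <- rowMeans(DF2[, paste0("SO_", 1:10)], na.rm = TRUE)
# Compare with national CO (SO_11) using a paired two-sided t-test
t.test(general_mean, as.numeric(DF2$SO_11), paired = TRUE)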
We can extend this to look at the difference in optimism within countries (only showing those countries with >30 responses).
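A sketch of how that per-country comparison could be computed (the ‘country’ column name is an assumption; the CO columns are as above):
library(dplyr)
# Mean general and national CO by country, for countries with more than 30 responses
# ('country' is an assumed column name holding the "most familiar country" answer)
DF2 %>%
  mutate(general_CO = rowMeans(across(all_of(paste0("SO_", 1:10))), na.rm = TRUE)) %>%
  group_by(country) %>%
  filter(n() > 30) %>%
  summarise(mean_general = mean(general_CO, na.rm = TRUE),
            mean_national = mean(as.numeric(SO_11), na.rm = TRUE),
            difference = mean_general - mean_national)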
So we’ve looked at differences in the average level of general and national CO, but what about the correlation between the two?
All the CO questions are Likert-scaled, so they are ordinal. Furthermore, previous exploratory factor analysis (see here) suggested the following relationship between the items and general CO.
# The model
model_SO_simple <- '
SO =~ SO_1 + SO_2 + SO_3 + SO_4 + SO_5 + SO_6 + SO_7 + SO_8 + SO_10
# Correlated error terms
SO_1 ~~ SO_2
SO_3 ~~ SO_4
SO_5 ~~ SO_6
SO_5 ~~ SO_8
SO_6 ~~ SO_8
'
# Fit the model as a CFA (WLSMV estimation, items treated as ordered)
fit_SO_simple <- lavaan::cfa(model_SO_simple, estimator = "WLSMVS", data = DF2[, SO_row_name],
                             ordered = c("SO_1", "SO_2", "SO_3", "SO_4", "SO_5",
                                         "SO_6", "SO_7", "SO_8", "SO_10"))
Let’s explore the correlation between national CO (ordinal) and general CO (latent variable). We can’t do this in a single step (as far as I’m aware), so we’ll split it into two.
First, we perform a confirmatory factor analysis (with the custom correlated error structure described above) and extract the factor scores, which estimate where each respondent sits along the latent variable we think represents general CO. The model fits well, with an RMSEA of 0.067 (95% CI 0.06 - 0.08).
# The model
model_SO_simple <- '
SO =~ SO_1 + SO_2 + SO_3 + SO_4 + SO_5 + SO_6 + SO_7 + SO_8 + SO_10
# Correlated error terms
SO_1 ~~ SO_2
SO_3 ~~ SO_4
SO_5 ~~ SO_6
SO_5 ~~ SO_8
SO_6 ~~ SO_8
'
# Fit the model as a CFA (WLSMV estimation, items treated as ordered)
fit_SO_simple <- lavaan::cfa(model_SO_simple, estimator = "WLSMVS", data = DF2[, SO_row_name],
                             ordered = c("SO_1", "SO_2", "SO_3", "SO_4", "SO_5",
                                         "SO_6", "SO_7", "SO_8", "SO_10"))
# Extract the factor scores (one per respondent)
Factor_main <- lavaan::lavPredict(fit_SO_simple, newdata = DF2)
colnames(Factor_main) <- c("General.CO")
# And combine them with the original data
DF2 <- cbind(DF2, Factor_main)
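The RMSEA quoted above can be pulled directly from the fitted object; a quick sketch (note that lavaan reports a 90% confidence interval for the RMSEA by default):
# Model fit: RMSEA and its confidence interval
lavaan::fitMeasures(fit_SO_simple, c("rmsea", "rmsea.ci.lower", "rmsea.ci.upper"))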
Second, we regress national CO on the factor scores (representing general CO) using an ordinal regression.
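The fitting call itself isn’t shown; a minimal sketch, reusing the SO_11_ord outcome and the ordinal_reg object name that appear later in the post:
library(MASS)  # for polr()
# Proportional odds regression of national CO on the general CO factor scores
ordinal_reg <- polr(SO_11_ord ~ General.CO, data = DF2, Hess = TRUE)
# Odds ratio and profile-likelihood 95% CI
# (confint() prints the "Waiting for profiling to be done..." message shown below)
exp(coef(ordinal_reg))
exp(confint(ordinal_reg))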
## Waiting for profiling to be done...
Variable | OR | 2.5% CI | 97.5% CI |
---|---|---|---|
General.CO | 7.51 | 6.24 | 9.06 |
Then we want to check that the proportional odds (parallel regression) assumption is met (this is a really useful walkthrough). There are more diagnostics we could do, but the most informative is the plot shown below. Simply put, we’re looking to see that, for each predictor variable (in our case, only general CO), the spacing between the levels of the dependent variable (national CO) stays roughly constant. This doesn’t mean that symbols of the same type have to be vertically aligned; it just means that, along each horizontal line, the symbols are similarly spaced, which in our case they are (nicely).
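As a rough, non-graphical counterpart to that plot, one can fit a separate binary logistic regression at each cut-point of national CO and compare the General.CO slopes; under proportional odds they should be roughly equal (a sketch, not the exact diagnostic used above):
# Slope of General.CO for each cumulative split of national CO
cuts <- 2:length(levels(DF2$SO_11_ord))
slopes <- sapply(cuts, function(k) {
  coef(glm(I(as.integer(SO_11_ord) >= k) ~ General.CO,
           family = binomial, data = DF2))["General.CO"]
})
names(slopes) <- paste(">=", levels(DF2$SO_11_ord)[cuts])
slopes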
OK, now we’ve done the diagnostics, we can visualise the relationship between general and national CO.
# Create a dummy data frame over the range of general CO, which we use to
# predict the probability of each level of national CO (SO_11) for plotting
newdat <- data.frame(General.CO = seq(min(DF2$General.CO), max(DF2$General.CO), length.out = 100))
# Predict outcome probabilities
newdat <- cbind(newdat, predict(ordinal_reg, newdat, type = "probs"))
# Reshape into long format
lnewdat <- reshape2::melt(newdat, id.vars = c("General.CO"),
                          variable.name = "variable", value.name = "value")
# Rename columns
colnames(lnewdat) <- c("General CO", "Level", "Probability")
The latent variable general CO is presented in standardised units; that is, 1 on the x-axis of the figure below means one standard deviation of the latent (unmeasured) construct. So if we look at 2 on the x-axis, this is someone who has very low general CO. Looking up from there, we can see that they have a high probability of saying “probably won’t” when asked whether the conservation goals for their country will be met. Similarly, those with high general CO are likely to answer “probably will” or “definitely will” to the same national CO question. Neat!
So we’ve explored the relationship between general and national CO. However, when specifying the structure of the relationship between each general CO question and the latent construct, we accounted for the dependencies between the paired questions (each pair supposedly corresponded to one Aichi strategic target). A (less well fitting) alternative is to model two orders of latent variables, shown below:
# The model
model_SO_all <- '
# Target 1
SO.AT1 =~ SO_1 + SO_2
# Target 2
SO.AT2 =~ SO_3 + SO_4
# Targets 3 and 4
SO.AT3.4 =~ SO_5 + SO_6 + SO_8
# Conservation optimism (second-order factor)
SO =~ SO.AT1 + SO.AT2 + SO.AT3.4 + SO_7 + SO_10
'
# Fit the SEM (WLSMV estimation, items treated as ordered)
fit_SO_all <- lavaan::sem(model_SO_all, estimator = "WLSMVS", data = DF2[, SO_row_name],
                          ordered = c("SO_1", "SO_2", "SO_3", "SO_4", "SO_5",
                                      "SO_6", "SO_7", "SO_8", "SO_10"))
Here, we assume that:
- SO_1 and SO_2 load onto a first-order factor for Aichi strategic target 1 (SO.AT1);
- SO_3 and SO_4 load onto a factor for target 2 (SO.AT2);
- SO_5, SO_6 and SO_8 load onto a factor for targets 3 and 4 (SO.AT3.4);
- these three first-order factors, together with SO_7 and SO_10, load onto a second-order general CO factor (SO).
The RMSEA of this model is 0.068 (95% CI 0.06 - 0.08).
We might then choose to look at the relationship between each of these three ‘sub-factors’ of general CO and national CO. We’ll do that by just extracting three factors, corresponding to the seven constituent questions.
I’m going to proceed with the analysis, even though it isn’t really appropriate: two of the three latent variables are estimated from only two items each. I’m just doing it for ‘fun’.
options(knitr.kable.NA = '') # show NAs as blanks in tables
fa_out <- psych::fa(DF2[, SO_row_name[c(1:6, 8)]], nfactors = 3, cor = "poly",
                    fm = "wls", rotate = "oblimin") # run the EFA with three factors
## Loading required namespace: GPArotation
# Loadings as a data frame; suppress loadings of 0.3 or below for readability
fact_ma <- as.data.frame((fa_out$loadings)[1:7,1:3])
fact_ma[,1] <- ifelse(fact_ma[,1] <=.3, NA, fact_ma[,1])
fact_ma[,2] <- ifelse(fact_ma[,2] <=.3, NA, fact_ma[,2])
fact_ma[,3] <- ifelse(fact_ma[,3] <=.3, NA, fact_ma[,3])
# Add row names
lab_1 <- subset(variables_df, Code %in% SO_row_name[c(1:6,8)], select=c("Question"))
rownames(fact_ma) <- paste0(lab_1$Question[c(1:7)], " (", rownames(fact_ma), ")")
fact_ma <- round(fact_ma, 2)
# Present as a nice table
fact_ma %>%
kable(format = "html", table.attr = "style='width:100%;'",
col.names = c("Factor 1", "Factor 2", "Factor 3")) %>%
kable_styling()%>%
row_spec(1:2, background = "#b3e2cd")%>%
row_spec(3:4, background = "#fdcdac")%>%
row_spec(5:7, background = "#cbd5e8")
Question | Factor 1 | Factor 2 | Factor 3 |
---|---|---|---|
Public support for conservation will grow over the next ten years (SO_1) | | | 0.47 |
Government spending on conservation will grow over the next ten years (SO_2) | | | 1.01 |
The harmful impact of people on nature will be less in ten years’ time than it is now (SO_3) | | 0.6 | |
Human society will be more environmentally sustainable in ten years’ time than it is now (SO_4) | | 0.9 | |
There will be more wildlife in ten years’ time than there is today (SO_5) | 0.93 | | |
There will be more natural areas and habitats in ten years’ time than there are today (SO_6) | 0.75 | | |
Nature will be able to provide the same benefits to people in ten years’ time as now (SO_8) | 0.47 | | |
# Extract the scores for the three sub-factors
Factor_main <- fa_out$scores[,1:3]
colnames(Factor_main) <- c("CO.Nature","CO.Sustainability", "CO.Support" )
# And combine it with the other data
DF2 <- cbind(DF2, Factor_main )
# Ordinal regression
ordinal_reg <- polr(SO_11_ord ~ CO.Nature + CO.Sustainability + CO.Support, data = DF2, Hess=TRUE)
# Create a dummy data frame varying CO.Nature over its range,
# holding the other two sub-factors at their means
newdat <- data.frame(CO.Nature = seq(min(DF2$CO.Nature), max(DF2$CO.Nature), length.out = 100),
                     CO.Sustainability = rep(mean(DF2$CO.Sustainability), 100),
                     CO.Support = rep(mean(DF2$CO.Support), 100))
# Predict outcomes
newdat <- cbind(newdat, predict(ordinal_reg, newdat, type = "probs"))
# Put into a new DF
lnewdat <- reshape2::melt(newdat, id.vars = c("CO.Nature", "CO.Sustainability", "CO.Support"),
                          variable.name = "variable", value.name = "value")
# Rename columns
colnames(lnewdat) <- c("CO Nature", "CO Sustainability", "CO Support", "Level", "Probability" )
ggplot(lnewdat, aes(x = `CO Nature`, y = Probability, colour = Level)) + geom_line()
# Create a dummy data frame varying CO.Sustainability over its range,
# holding the other two sub-factors at their means
newdat <- data.frame(CO.Nature = rep(mean(DF2$CO.Nature), 100),
                     CO.Sustainability = seq(min(DF2$CO.Sustainability), max(DF2$CO.Sustainability), length.out = 100),
                     CO.Support = rep(mean(DF2$CO.Support), 100))
# Predict outcomes
newdat <- cbind(newdat, predict(ordinal_reg, newdat, type = "probs"))
# Put into a new DF
lnewdat <- reshape2::melt(newdat, id.vars = c("CO.Nature", "CO.Sustainability", "CO.Support"),
                          variable.name = "variable", value.name = "value")
# Rename columns
colnames(lnewdat) <- c("CO Nature", "CO Sustainability", "CO Support", "Level", "Probability" )
ggplot(lnewdat, aes(x = `CO Sustainability`, y = Probability, colour = Level)) + geom_line()
# Create a dummy data frame varying CO.Support over its range,
# holding the other two sub-factors at their means
newdat <- data.frame(CO.Nature = rep(mean(DF2$CO.Nature), 100),
                     CO.Sustainability = rep(mean(DF2$CO.Sustainability), 100),
                     CO.Support = seq(min(DF2$CO.Support), max(DF2$CO.Support), length.out = 100))
# Predict outcomes
newdat <- cbind(newdat, predict(ordinal_reg, newdat, type = "probs"))
# Put into a new DF
lnewdat <- reshape2::melt(newdat, id.vars = c("CO.Nature", "CO.Sustainability", "CO.Support"),
                          variable.name = "variable", value.name = "value")
# Rename columns
colnames(lnewdat) <- c("CO Nature", "CO Sustainability", "CO Support", "Level", "Probability" )
There is only a weak association between the sub-factors of general CO and national CO. I suspect this is partly because two of the three sub-factors are estimated from only two items each.
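To put a number on ‘weak’, the odds ratios from the three-predictor model can be inspected directly (a quick sketch using the ordinal_reg object fitted above):
# Odds ratios for the three sub-factors; values near 1 indicate weak associations
exp(coef(ordinal_reg))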
Here, I’m just going to look at the associations between general CO and respondents’ characteristics, using structural equation models (more details on the decisions in the SEM process can be found here). Significance codes: *** = 0.001, ** = 0.01, * = 0.05, . = 0.1.
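The SEMs themselves aren’t shown; a minimal sketch of one of them (for age), extending the measurement model from above with a regression of the latent variable on the covariate (the exact specification behind the results below may differ):
# Add a structural regression of general CO on age (age_year is assumed to be a column in DF2)
model_SO_age <- paste(model_SO_simple, "SO ~ age_year", sep = "\n")
fit_SO_age <- lavaan::sem(model_SO_age, estimator = "WLSMVS", data = DF2,
                          ordered = c("SO_1", "SO_2", "SO_3", "SO_4", "SO_5",
                                      "SO_6", "SO_7", "SO_8", "SO_10"))
# Standardized estimates and p-values, including the SO ~ age_year regression
lavaan::parameterEstimates(fit_SO_age, standardized = TRUE)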
Variable | Estimate | p-value |
---|---|---|
age_year | -0.5489 | *** |
There does not appear to be a significant difference by gender.
Variable | Estimate | p-value |
---|---|---|
gender_Male | 0.0500 | . |
gender_Other | -0.0585 |
The reference level is ‘researcher’.
Variable | Estimate | p-value |
---|---|---|
position_Administration | 0.1032 | |
position_Bachelorsstudent | 0.1409 | |
position_Consultantself_employed | -0.0771 | |
position_Fieldworker | -0.0056 | |
position_Graduatestudent | 0.0736 | |
position_Intern | 0.8533 | *** |
position_Manager | 0.0105 | |
position_Other | 0.1251 | ** |
position_Policymaker | 0.1759 | |
position_Ranger | 0.6103 | *** |
position_Unknown | 0.0615 |
The reference level is ‘desk’.
Variable | Estimate | p-value |
---|---|---|
position_simple | 0.0182 |
Variable | Estimate | p-value |
---|---|---|
years_cons | -0.1951 | *** |
The reference level is ‘college’.
Variable | Estimate | p-value |
---|---|---|
education_University | -0.0350 | |
education_Primary | -0.0218 | |
education_Secondary | -0.2069 | . |
The reference level is non-university.
Variable | Estimate | p-value |
---|---|---|
education_simple | 0.0448 |
OK, let’s do the same but looking at predictors of national CO, using ordinal regression (but not bothering to check the model assumptions, and doing it as one multivariable analysis). The continuous variables, like age, have been scaled. Reference levels are: gender = Female, position = Administration, and education = College.
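The fitting call isn’t shown; a minimal sketch, assuming the predictors sit in DF2 as factor/numeric columns named gender, position, education, age_year and years_cons (names inferred from the coefficient labels below), and using a hypothetical object name ordinal_reg_nat:
library(MASS)  # for polr()
# Scale the continuous predictors beforehand so coefficient names match the table below
DF2$age_year <- as.numeric(scale(DF2$age_year))
DF2$years_cons <- as.numeric(scale(DF2$years_cons))
# Multivariable ordinal regression of national CO on respondent characteristics
ordinal_reg_nat <- polr(SO_11_ord ~ age_year + gender + position + years_cons + education,
                        data = DF2, Hess = TRUE)
# Odds ratios with profile-likelihood 95% confidence intervals
exp(cbind(OR = coef(ordinal_reg_nat), confint(ordinal_reg_nat)))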
Clearly something is going a bit awry with ‘educationUnknown’ - probably because there are so few observations, so I need to look into that.
Variable | OR | 2.5% CI | 97.5% CI |
---|---|---|---|
age_year | 0.82 | 0.47 | 1.44 |
genderMale | 1.16 | 0.96 | 1.40 |
genderOther | 0.34 | 0.07 | 1.67 |
genderPrefer not to say | 1.17 | 0.41 | 3.31 |
positionBachelors student | 1.68 | 0.70 | 4.06 |
positionConsultant/self-employed | 0.76 | 0.40 | 1.42 |
positionFieldworker | 0.96 | 0.52 | 1.79 |
positionGraduate student | 0.63 | 0.35 | 1.13 |
positionIntern | 1.71 | 0.65 | 4.48 |
positionManager | 0.71 | 0.40 | 1.25 |
positionOther | 0.78 | 0.43 | 1.41 |
positionPolicymaker | 0.69 | 0.28 | 1.68 |
positionRanger | 1.06 | 0.40 | 2.76 |
positionResearcher | 0.67 | 0.39 | 1.14 |
years_cons | 0.66 | 0.51 | 0.86 |
educationUnknown | 3888603.64 | 3888599.08 | 3888608.20 |
educationPrimary | 0.76 | 0.13 | 4.28 |
educationSecondary | 1.44 | 0.51 | 4.05 |
educationUniversity | 1.13 | 0.70 | 1.84 |