“Obviously the best way to treat missing data is not to have them.” – Orchard and Woodbury (1972, 697)
In an effort to add additional theoretical and practical clarity to the class, a number of days will include brief simulations demonstrating important concepts or tools. Today we will cover missing data and selection. As we will see, there are only particular conditions under which missingness matters for your estimates. We will also see that multilevel structure matters for common solutions to selection problems. As with just about everything else, techniques for dealing with missing data and selection issues are model dependent and hence sensitive to specification (cf. Thoemmes and Rose (2014)).
Of particular note is that, just as with model specification as discussed last time, there is no one-size-fits-all methodological guidance to be had. Ways of dealing with missing data and selection are valid under particular conditions which depend on the underlying structural model in mind. See, for example, the previous citation, which demonstrates the conditional independence assumptions invoked for various missing data mechanisms and their implications for which variables to include in, for example, an imputation procedure. Just as with the older literature which led to “garbage can” regressions, the advice for multiple imputation has largely been the same – throw everything in. This can, of course, lead to amplified bias when the conditional-independence-through-control approach fails.
For additional guidance on implementations of multiple imputation, I suggest taking a look at this free book or this helpful review. For a nice overview of the underlying theory I’d take a look at this slightly outdated article, the classic book by Rubin, his initial formalization (Rubin 1977), or the review with discussions (Rubin 1996).
Worth noting is that, despite decades of research, the appropriate way of dealing with missing data remains an active area of research. For example, it was only recently that multiple amputation – imposing a desired complex multivariate missingness structure on simulated complete data – was developed (see Schouten and Vink 2018) and only about a decade ago that political scientists started using more complex imputation techniques for structured data (e.g. Honaker and King 2010). There is still much to be learned and done in this area! If you’re interested in checking out the bleeding edge I suggest taking a look at Mohan and Pearl (forthcoming), Cham et al. (2017), and Gutman and Rubin (2015), the latter of which treats causal effect estimation as the missing data problem that it is.
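To make amputation concrete: Schouten and Vink’s procedure is implemented in the mice package as ampute(). A minimal sketch, with made-up data and settings:

library(mice)
set.seed(1)
complete_data <- data.frame(x = rnorm(200), y = rnorm(200))
# impose a MAR missingness pattern on roughly 30% of the rows
amputed <- ampute(complete_data, prop = 0.3, mech = "MAR")
summary(amputed$amp)  # the now-incomplete data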
First, let’s define terms.
Missing Completely at Random: Abbreviated to MCAR. Means exactly what it says. Whether or not an observation is missing is unrelated to any values in the data set, missing or observed. In other words, the probability of an observation being observed is the same for all observations. It can be helpful to recognize that if we take a simple random sample from a population, the un-sampled observations are MCAR. This should hint to you that with messy observational data such an assumption is generally unrealistically strong.
Missing at Random: Abbreviated to MAR. Does not mean exactly what it says, which can generate confusion. MAR means that the missingness is not related to the missing values themselves, but that it is related to the values of some other variable. Sometimes people say that this should be called “Missing Conditionally at Random,” since conditional on the observable generating the missingness it would be completely random, but MCAR is already taken. The critical idea is that the missingness mechanism can be accounted for with observables. Much like a simple random sample is related to MCAR, a block design with block indicators always observed would be MAR. When not induced by the researcher, as in an experiment, MAR is an assumption which cannot be verified. Some have argued that when two or more variables have missing data the MAR assumption becomes dubious, largely because of how surprisingly difficult it is to simulate! Note that MAR is closely related to an ignorable missing data mechanism (see Little and Rubin (2002, 119)). Importantly, ignorable does not mean that we can simply ignore the missing data. Rather, we need to condition on those factors that influence the missing data rate.
Missing Not at Random: Abbreviated to MNAR. This means that there is a relationship between the probability of a value being missing and the missing value itself. Also known as nonignorable nonresponse.
Here is a simple example to distinguish the above missingness mechanisms. Suppose that we are collecting information on bodyweight. If everyone is equally likely to disclose their weight then any missingness would be MCAR. If political methodologists are systematically less likely to disclose their weight (perhaps because political methodologists enjoy inducing missingness in epidemiological data), then the missingness would be MAR. If, on the other hand, all individuals are less likely to disclose their weight because they are overweight then the missingness would be MNAR. It can be helpful to think of missingness in terms of sample design, as noted above, since there is a direct relationship between missingness mechanisms and sampling mechanisms. See Westreich (2012) for examples and a discussion of the relationship between Berkson’s bias, selection bias, and missing data.
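A quick way to internalize these distinctions is to simulate the bodyweight example directly. Below is a toy sketch (all names and parameter values invented). Because weight is independent of being a methodologist in this setup, only the MNAR mechanism distorts the observed mean:

set.seed(1)
n <- 5000
methodologist <- rbinom(n, 1, 0.1)
weight <- rnorm(n, 170, 25)
disclose_mcar <- rbinom(n, 1, 0.8)                                  # same rate for everyone
disclose_mar <- rbinom(n, 1, ifelse(methodologist == 1, 0.4, 0.8))  # rate set by an observable
disclose_mnar <- rbinom(n, 1, plogis(10 - 0.05*weight))             # rate set by weight itself
mean(weight[disclose_mnar == 1]) - mean(weight)  # only this contrast is far from zero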
Of critical importance – MAR cannot be distinguished from MNAR with observables. As recently put by Ogburn et al. (2019) on a related topic, “no fact about the observed data alone can be informative about ignorability, since ignorability is compatible with any observed data distribution … [there is no] checkable approach to causal inference.”
For precise mathematical definitions and additional discussion see Mealli and Rubin (2015) or Seaman et al. (2013).
Much like omitted variable bias, missing data and selection have generated contradictory advice over the years. Some recent research, like Lall (2016), argues that “political scientists increasingly recognize that multiple imputation represents a superior strategy for analyzing missing data to the widely used method of listwise deletion” and that “multiple imputation is always more efficient than listwise deletion, even in this worst-case scenario it is still the preferable strategy.”
Perhaps it is correct that this has been the attitude of political scientists as they learn about multiple imputation, but there is hardly a case to be made for blind or universal application of canned imputation methods – there is no panacea. As demonstrated by Arel-Bundock and Pelc (2018), the complete case estimator (listwise deletion) can be unbiased even if data are not missing completely at random. Bartlett et al. (2014) demonstrate virtually the same thing; namely, that analyzing complete cases gives valid inferences provided that the probability of being a complete case is independent of the outcome in the model of interest. Pepinsky (2018) similarly points out that bias occurs only when there is MAR/MNAR in the outcome and that multiple imputation might be biased when missingness in the right-hand side is MNAR and is generally biased when the outcome is MNAR. White and Carlin (2010) describe missing data mechanisms under which multiple imputation is biased and analyzing complete cases avoids bias; “these mechanisms are perhaps overlooked because they do not correspond to the MCAR, MAR, or MNAR categories, but cut across this classification.” As noted above, this is an area in which there is much work yet to be done.
For our purposes we will focus on relatively simple cases to clearly illustrate a number of key points. If you’re interested in applying multiple imputation to multilevel data I suggest reading this book chapter and Grund et al. (2017). For a bit more theory you can take a look at these slides.
Rather than thinking about missingness within a particular variable, we will instead focus on whether or not the row is removed in listwise deletion. That is, rather than thinking about missingness as having something to do with the columns of your data set, we will think of the problem in terms of the rows. We can do this because with listwise deletion it doesn’t matter where the missingness is (on the x or the y variable) so much as why there is missingness within the observation at all. From this view, the distinction between MAR and MNAR more or less reduces to an omitted variable problem – it is MAR when you have observables to render it such, whereas it is MNAR when you don’t have such data available.
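In base R this row-wise view corresponds to complete.cases(), and listwise deletion – which lm() performs by default via na.omit – keeps exactly those rows. A toy illustration:

df <- data.frame(x = c(1, NA, 3), y = c(2, 5, NA))
complete.cases(df)        # FALSE for any row with missingness in any column
df[complete.cases(df), ]  # the rows listwise deletion retains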
Let’s start by generating some data.
set.seed(8675309)
nobs <- 1000
x <- rnorm(nobs)
y <- x + rnorm(nobs)
d <- data.frame(y,x)
# Missing Completely at Random
d$mcar <- rbinom(nobs,1,0.7)
# Missing Induced by X
d$mx <- rbinom(nobs,1,1/(1+exp(-1-2*d$x)))
# Missing Induced by Y
d$my <- rbinom(nobs,1,1/(1+exp(-1-2*d$y)))
head(d)
## y x mcar mx my
## 1 -1.2639956 -0.9965824 0 0 0
## 2 -0.5413684 0.7218241 0 1 1
## 3 -1.7872832 -0.6172088 0 0 0
## 4 2.7307071 2.0293916 1 1 1
## 5 2.6972383 1.0654161 0 0 1
## 6 0.7528768 0.9872197 1 1 1
What we have produced is an outcome and a covariate, as well as indicators – call them complete case dummies – for whether or not we will retain that case in the analysis. Let’s run the models:
library(texreg)
full <- lm(y~x, data=d)
mcar <- lm(y~x, data=d[d$mcar == 1,])
misx <- lm(y~x, data=d[d$mx == 1,])
misy <- lm(y~x, data=d[d$my == 1,])
htmlreg(list(full,mcar,misx,misy),doctype = F,caption = "")
|             | Model 1 | Model 2 | Model 3 | Model 4 |
|-------------|---------|---------|---------|---------|
| (Intercept) | 0.02    | 0.00    | -0.01   | 0.49*** |
|             | (0.03)  | (0.04)  | (0.05)  | (0.04)  |
| x           | 1.03*** | 1.01*** | 1.05*** | 0.76*** |
|             | (0.03)  | (0.04)  | (0.05)  | (0.04)  |
| R2          | 0.51    | 0.50    | 0.41    | 0.34    |
| Adj. R2     | 0.51    | 0.50    | 0.41    | 0.34    |
| Num. obs.   | 1000    | 697     | 659     | 612     |
| RMSE        | 1.03    | 1.04    | 1.03    | 0.92    |

***p < 0.001, **p < 0.01, *p < 0.05
Good enough for government work. The first three models are all on target while the last model is biased. This is because missingness causes bias when it is driven by the outcome, not merely because values happen to be missing. If the data are MAR, this means that you have observables available with which to achieve a conditional independence condition. To know whether you have done this with any particular specification you need knowledge of the DAG, à la Pearl. If, instead, you lack either the knowledge or the data, then the data are MNAR.
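To see the conditioning-on-observables point in action, here is a minimal sketch (names and coefficients invented) in which missingness is driven by an always-observed z that is correlated with both x and y. The complete-case regression of y on x alone no longer matches its full-data counterpart, but once z enters the model the complete-case estimates are on target:

set.seed(42)
n <- 10000
x <- rnorm(n)
z <- 0.7*x + rnorm(n)               # z predicts y and is correlated with x
y <- x + z + rnorm(n)
s <- rbinom(n, 1, plogis(1 + 2*z))  # observability depends on z alone
cc <- data.frame(y, x, z)[s == 1, ]
coef(lm(y ~ x, data = cc))          # drifts from the full-data fit of y on x
coef(lm(y ~ x + z, data = cc))      # near (0, 1, 1): missingness ignorable given z

This leads us to the closely related question of selection, which both clarifies the cause of the bias and points to potential solutions and pitfalls.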
To understand why this is the pattern of bias, let’s think about selection problems in detail and remind ourselves how regression works. We will simulate a smaller number of points this time. First, let’s remember what we are doing.
nobs <- 250
x <- rnorm(nobs)
y <- rnorm(nobs,x)
d <- data.frame(y,x)
x_on_y <- lm(x ~ y, data = d)
y_on_x <- lm(y ~ x, data = d)
htmlreg(list(y_on_x,x_on_y), doctype = F, caption = "")
|             | Model 1 | Model 2 |
|-------------|---------|---------|
| (Intercept) | 0.04    | -0.04   |
|             | (0.06)  | (0.04)  |
| x           | 1.05*** |         |
|             | (0.06)  |         |
| y           |         | 0.50*** |
|             |         | (0.03)  |
| R2          | 0.52    | 0.52    |
| Adj. R2     | 0.52    | 0.52    |
| Num. obs.   | 250     | 250     |
| RMSE        | 0.97    | 0.67    |

***p < 0.001, **p < 0.01, *p < 0.05
The regression of y on x is different from the regression of x on y because of which residual is minimized: the former minimizes vertical deviations from the fitted line, the latter horizontal ones. For a glance at mathematical details check out the answers to this post. For our purposes it is important to think about what this does to the fitted regression line.
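As a quick check using the d simulated above: the slope from regressing y on x is cov(x,y)/var(x) while the slope from regressing x on y is cov(x,y)/var(y), so their product is the squared correlation.

b_yx <- cov(d$x, d$y)/var(d$x)  # slope of y on x
b_xy <- cov(d$x, d$y)/var(d$y)  # slope of x on y
c(b_yx*b_xy, cor(d$x, d$y)^2)   # the two entries match

With that picture in mind, let’s again induce a bit of missingness in our data and try to visualize what happens, this time by imposing MNAR proper.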
d$mnarx <- ifelse(d$x > -1, 1, 0)
d$mnary <- ifelse(d$y > -1, 1, 0)
unbiased <- lm(y ~ x, data = d[d$mnarx == 1,])
biased <- lm(y ~ x, data = d[d$mnary == 1,])
rev1 <- lm(x ~ y, data = d[d$mnarx == 1,])
rev2 <- lm(x ~ y, data = d[d$mnary == 1,])
htmlreg(list(biased,unbiased,rev1,rev2),doctype = F,caption="")
|             | Model 1 | Model 2 | Model 3 | Model 4 |
|-------------|---------|---------|---------|---------|
| (Intercept) | 0.38*** | 0.05    | 0.15*** | -0.03   |
|             | (0.06)  | (0.07)  | (0.04)  | (0.05)  |
| x           | 0.74*** | 1.03*** |         |         |
|             | (0.07)  | (0.09)  |         |         |
| y           |         |         | 0.37*** | 0.49*** |
|             |         |         | (0.03)  | (0.05)  |
| R2          | 0.36    | 0.38    | 0.38    | 0.36    |
| Adj. R2     | 0.36    | 0.37    | 0.37    | 0.36    |
| Num. obs.   | 193     | 207     | 207     | 193     |
| RMSE        | 0.81    | 0.98    | 0.59    | 0.66    |

***p < 0.001, **p < 0.01, *p < 0.05
Again, we have one instance of bias and one instance of no bias, followed by the “reversed” regressions. Compare to the regressions in the previous table, which use the full data. We can see that in the regression of y on x, missingness induced by the y variable produces bias in the estimates whereas missingness induced by x does not. Likewise, in the “reversed” regressions we have the same phenomenon, but swapping “y” and “x” in the previous sentence.
Let’s take a closer look at what is going on.
library(ggplot2)
library(gridExtra)
ggplot(d,aes(x=x,y=y, color= as.factor(mnary))) +
geom_point() +
geom_smooth(method="lm") +
ggtitle("Regression of y on x, Missing via y") +
labs(color="Observed?")-> g1
ggplot(d,aes(x=x,y=y, color= as.factor(mnarx))) +
geom_point() +
geom_smooth(method="lm") +
ggtitle("Regression of y on x, Missing via x") +
labs(color="Observed?") -> g2
grid.arrange(g1,g2,nrow=2)
In the above, all of the points would be included in the “full” specification. We can think of there existing subpopulations defined by observability. In each plot, blue indicates the observed subpopulation while red indicates the unobserved subpopulation. In the first plot we can see where the bias is coming from – there are different estimated relationships between the observed and unobserved subpopulations. Now imagine how including the red observations in the blue regression would change the slope of the line. We can see that they are generally below and to the left of that regression line and so would “pull” the regression line in that direction, increasing the slope.
By contrast, when missingness is induced by x, the unobserved subpopulation is merely further to the left; because the estimated relationship within the unobserved subpopulation connects smoothly to the estimated relationship in the observed subpopulation, we can see why missingness of this sort doesn’t lead to bias. Conditioning on the complete cases is fine because, conditional on x, observability is independent of y.
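We can put numbers on this intuition by fitting the regression within each unobserved stratum (using the d simulated above):

coef(lm(y ~ x, data = d[d$mnary == 0, ]))["x"]  # dropped via y: slope far from 1
coef(lm(y ~ x, data = d[d$mnarx == 0, ]))["x"]  # dropped via x: slope near 1, just noisier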
This sort of bias is importantly related to selection models.
Let’s start by defining a function to generate data.
# cor: selection parameter
# n_opg: number of observations per group
# n_groups: number of groups
# group_intercept: group intercepts
# group_means: group means of X
# group_slopes: group slopes
# NOTE: All multilevel structure on selection equation
heck_data <- function(cor,n_opg,n_groups = 1,
group_intercept=rep(0,n_groups),
group_means=rep(0,n_groups),
group_slopes=rep(1,n_groups)){
require(MASS)
error_cor <- matrix(c(1,cor,cor,1),ncol=2)
errors <- mvrnorm(n_opg*n_groups,
mu=rep(0,2),Sigma=error_cor,empirical=T)
tmp <- list()
for(i in 1:n_groups){
  # each group draws on its own block of the error matrix so that the
  # selection and outcome errors stay paired row by row
  idx <- ((i-1)*n_opg+1):(i*n_opg)
  x1 <- rnorm(n_opg,group_means[i],1)
  s <- (group_intercept[i] + group_slopes[i]*x1 + errors[idx,1])>0
  group <- rep(i,n_opg)
  tmp[[i]] <- cbind(x1,s,group)
}
data <- data.frame(do.call(rbind,tmp))
data$y_full <- data$x1 + errors[,2]
data$y_sele <- with(data,ifelse(s==TRUE,y_full,NA))
data
}
Let’s first look at some traditional self-selection data.
set.seed(8675309)
dat <- heck_data(cor=0.6,
n_opg=2000)
summary(dat)
## x1 s group y_full
## Min. :-3.45888 Min. :0.0000 Min. :1 Min. :-4.525356
## 1st Qu.:-0.65074 1st Qu.:0.0000 1st Qu.:1 1st Qu.:-0.963561
## Median : 0.02812 Median :1.0000 Median :1 Median :-0.007173
## Mean : 0.02990 Mean :0.5125 Mean :1 Mean : 0.029900
## 3rd Qu.: 0.69139 3rd Qu.:1.0000 3rd Qu.:1 3rd Qu.: 1.026509
## Max. : 2.84298 Max. :1.0000 Max. :1 Max. : 4.749384
##
## y_sele
## Min. :-2.1431
## 1st Qu.: 0.1613
## Median : 0.9458
## Mean : 0.9558
## 3rd Qu.: 1.6613
## Max. : 4.7494
## NA's :975
Let’s estimate a few models:
# Full model, as if we observed everything
m1 <- lm(y_full ~ x1,data=dat)
# Model with selection bias
m2 <-lm(y_sele ~ x1,data=dat)
# Heckman first stage
first_stage <- glm(s~x1,data=dat,family=binomial(link="probit"))
first_stage_lp <- first_stage$linear.predictors
dat$imr <- dnorm(first_stage_lp)/pnorm(first_stage_lp)
# Heckman second stage
second_stage <- lm(y_sele ~ x1 + imr, data=dat)
htmlreg(list(m1,m2,first_stage,second_stage),doctype = F, caption = "",
custom.model.names = c("Full","Selection","First","Second"))
|                | Full    | Selection | First   | Second  |
|----------------|---------|-----------|---------|---------|
| (Intercept)    | -0.00   | 0.53***   | 0.02    | 0.02    |
|                | (0.02)  | (0.04)    | (0.03)  | (0.19)  |
| x1             | 1.03*** | 0.70***   | 1.09*** | 1.01*** |
|                | (0.02)  | (0.04)    | (0.05)  | (0.12)  |
| imr            |         |           |         | 0.63**  |
|                |         |           |         | (0.23)  |
| R2             | 0.51    | 0.28      |         | 0.29    |
| Adj. R2        | 0.51    | 0.28      |         | 0.28    |
| Num. obs.      | 2000    | 1025      | 2000    | 1025    |
| RMSE           | 1.00    | 0.92      |         | 0.92    |
| AIC            |         |           | 1927.33 |         |
| BIC            |         |           | 1938.53 |         |
| Log Likelihood |         |           | -961.66 |         |
| Deviance       |         |           | 1923.33 |         |

***p < 0.001, **p < 0.01, *p < 0.05
The Heckman selection model is a control function estimator: a new variable – here the inverse Mills ratio – is calculated from the first stage and included in the second stage to purge the dependence between the regressor and the error.
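If you want to convince yourself where the inverse Mills ratio comes from, recall that for a standard normal error e, the mean of e truncated below at -c is dnorm(c)/pnorm(c). A two-line simulation check:

c0 <- 0.5
e <- rnorm(1e6)
mean(e[e > -c0])     # simulated truncated mean, roughly 0.51
dnorm(c0)/pnorm(c0)  # the inverse Mills ratio at c0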
Unsurprisingly, the first model produced an estimate close to the truth. When selection is not taken into account, OLS produces biased estimates. The first stage equation is perhaps not of much interest on its own, although it shows we were able to estimate the effect of x1 on selection. More importantly, when the inverse Mills ratio is included in the second stage regression the effect of x1 on y is close to the truth. Furthermore, we find evidence for selection in the statistical significance of the control function – this is effectively a regression-based Hausman test.
Let’s take a look at what happened:
with(dat,plot(x1,y_full,col="grey"))
abline(a=coef(m1)[1],b=coef(m1)[2],lwd=4,col="grey")
with(dat,points(x1,y_sele,col="black"))
abline(a=coef(m2)[1],b=coef(m2)[2],lwd=4,col="black")
abline(a=coef(second_stage)[1],b=coef(second_stage)[2],lwd=4,col="blue")
legend("topleft",c("Full Sample","Selected Sample","Heckman Adjustment"),fill=c("grey","black","blue"))
Just like above, it is the pattern of the missingness in its relationship with the outcome which produces the bias.
Let’s write a little simulation function that estimates a bunch of models so we can look at the consequences of multilevel structure for the estimates. We will estimate the first stage with both fixed and random effects to compare.
heck_sim <- function(cor,n_opg,nsims,n_groups=1,
group_intercept=rep(0,n_groups),
group_means=rep(0,n_groups),
group_slopes=rep(1,n_groups),
re=F,fe=F){
vanilla_sims <- list()
correct_sims <- list()
re_sims <- list()
fe_sims <- list()
for(i in 1:nsims){
d <- heck_data(cor,n_opg,n_groups,group_intercept,group_means,group_slopes)
#Without Correction
vanilla <- lm(y_sele~x1,data=d)
#Traditional Heckman Selection
first_stage <- glm(s~x1,data=d,family=binomial(link="probit"))
first_stage_lp <- first_stage$linear.predictors
d$IMR <- dnorm(first_stage_lp)/pnorm(first_stage_lp)
second_stage <- lm(y_sele~x1+IMR,data=d)
vanilla_sims[[i]] <- coefficients(vanilla)
correct_sims[[i]] <- coefficients(second_stage)
if (re == T){
require(lme4)
#Random Effect Heckman. NOTE: Fit with PIRLS for speed
first_st_re <- glmer(s~(x1|group),data=d,family=binomial(link="probit"),nAGQ=0)
first_st_re_lp <- predict(first_st_re,type="link")
d$IMR_re <- dnorm(first_st_re_lp)/pnorm(first_st_re_lp)
second_stage_re <- lm(y_sele~x1+IMR_re,data=d)
re_sims[[i]] <- coefficients(second_stage_re)
}
if (fe == T){
#Fixed Effect Heckman
first_st_fe <- glm(s~x1*as.factor(group),data=d,family=binomial(link="probit"))
first_st_fe_lp <- predict(first_st_fe,type="link")
d$IMR_fe <- dnorm(first_st_fe_lp)/pnorm(first_st_fe_lp)
second_stage_fe <- lm(y_sele~x1+IMR_fe,data=d)
fe_sims[[i]] <- coefficients(second_stage_fe)
}
print(paste0(i/nsims," done"))
flush.console()
}
vanilla_out <- data.frame(cor,do.call(rbind,vanilla_sims))
correct_out <- data.frame(cor,do.call(rbind,correct_sims))
if(re==T){
re_out <- data.frame(cor,do.call(rbind,re_sims))
}
if(fe==T){
fe_out <- data.frame(cor,do.call(rbind,fe_sims))
}
out <- list(vanilla = vanilla_out,
            heckman = correct_out)
# attach the optional estimators only when they were run
if(re==T){
  out$random <- re_out
}
if(fe==T){
  out$fixed <- fe_out
}
out
}
Let’s run a ton of models in a proper Monte Carlo experiment to look at the behavior of the basic Heckman selection model.
set.seed(123)
library(pbapply)
library(parallel)
clust <- makeCluster(detectCores())
clusterExport(clust,c("heck_sim","heck_data"))
cors <- seq(-.99,.99,length=50)
sims <- pblapply(cors,function(x)heck_sim(x,2000,50),cl=clust)
all_vanilla <- do.call(rbind,lapply(sims,"[[","vanilla"))
all_heckman <- do.call(rbind,lapply(sims,"[[","heckman"))
with(all_heckman,plot(cor,x1,ylim=c(0,2),col="blue",pch = 19,
xlab="Selection Parameter",ylab="Coefficient Estimate"))
with(all_vanilla,points(cor,x1,col="red",pch = 19))
legend("bottomleft",c("Heckman","OLS"),fill=c("blue","red"))
There are two features particularly worth noting. First, OLS is unbiased only when there is no selection. Second, the Heckman model, under these conditions, is unbiased but has a higher variance. This follows from its being a two-stage procedure and is why the standard errors in the table above are incorrect – my suggestion is to bootstrap everything for inference!
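As a rough sketch of what bootstrapping the two-stage procedure might look like (the function and its names are mine; both stages are re-estimated inside every resample so that first stage uncertainty propagates):

boot_heckman <- function(data, B = 200){
  reps <- replicate(B, {
    b <- data[sample(nrow(data), replace = TRUE), ]
    fs <- glm(s ~ x1, data = b, family = binomial(link = "probit"))
    b$imr <- dnorm(fs$linear.predictors)/pnorm(fs$linear.predictors)
    coef(lm(y_sele ~ x1 + imr, data = b))["x1"]
  })
  sd(reps)  # bootstrap standard error for the coefficient on x1
}
boot_heckman(dat)  # compare to the naive second stage standard error above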
But what happens when there is multilevel structure? Let’s turn on group-specific slopes, intercepts, and means in the first stage equation. For simplicity we will estimate fixed rather than random effects.
clust <- makeCluster(detectCores())
clusterExport(clust,c("heck_sim","heck_data"))
cors <- seq(-.99,.99,length=25)
big_sims <- pblapply(cors,function(x)heck_sim(x,n_opg=100,
n_groups=10,
group_intercept = seq(-1,1,length=10),
group_means = seq(-1,1,length=10),
group_slopes = seq(-1,1,length=10),
nsims=25,
re=F,fe=T),cl=clust)
all_vanilla <- do.call(rbind,lapply(big_sims,"[[","vanilla"))
all_heckman <- do.call(rbind,lapply(big_sims,"[[","heckman"))
all_fe <- do.call(rbind,lapply(big_sims,"[[","fixed"))
#Traditional Heckman, fixed effects, and vanilla
with(all_heckman,plot(cor,x1,col="blue",pch = 19))
with(all_fe,points(cor,x1,col="green",pch = 19))
with(all_vanilla,points(cor,x1,col="red",pch = 19))
legend("bottomleft",c("Heckman","Fixed Effects Heck","OLS"),fill=c("blue","green","red"))
Woah! In the blue we have the estimates from the traditional Heckman model, which ignores the multilevel structure. Not only is it wildly wrong, it is wildly wrong in a way which is worse than OLS! This is, generally speaking, a common problem with models offering de-biased estimates: if the assumptions underlying the method are violated then the “cure” can be far worse than the disease!
Let’s zoom in to compare the fixed effects estimates to the OLS estimates:
with(all_vanilla,plot(cor,x1,col="red",pch = 19))
with(all_fe,points(cor,x1,col="green",pch = 19))
legend("bottomleft",c("Fixed Effects Heck","OLS"),fill=c("green","red"))
Beautiful. When we capture the multilevel structure of the data we are able to arrive at unbiased estimates of the effect of interest. OLS is still biased, but compared to an inappropriately specified Heckman selection model it is great.
Moral of the story – if you see a paper with results that are wildly different between OLS and Heckman, instrumental variables, or what have you, don’t automatically believe that the authors have de-biased the estimate! If the analysis was performed in a setting with structured data and that structure wasn’t respected, the results could easily be wildly far from the truth.
As mentioned at the get-go, this is an area of active research and there remains much to be done. One growing approach is the use of copula-based methods for accounting for non-random sample selection, the classic Heckman selection model being a special case (cf. Wojtys et al. 2018). There are also hybrid approaches, like Heckman imputation models (Galimard et al. 2018), some of which are designed for use in multilevel frameworks (Hammon and Zinn 2020). Paired with the theoretical research mentioned above, this is likely to continue to be a vibrant field of research.
The best way to learn R is to use R, and the best way to really understand methods is to poke around at them either with the mathematical theory or simulations. I suggest considering the following questions: