Whenever socio-economic researchers conduct quantitative studies using primary data collected through surveys, they face the challenge of how to present the data and the results of the analyses.
One of these challenges is caused by the use of multiple-item scales. It is often not easy to measure a respondent’s attitudes or opinions with one single question (item).
Multiple-item scales are used to increase the accuracy, reliability, and validity of measurements for complex, abstract concepts (like attitudes, emotions, or satisfaction) that a single question cannot fully capture. By averaging multiple items, they minimize random error and provide a more comprehensive, stable, and nuanced representation of the construct.
Key reasons to use multiple-item scales:
Increased Reliability: Multiple items help cancel out random measurement errors (e.g., a participant misinterpreting one question), leading to more consistent results.
Higher Construct Validity: They ensure the full scope of a complex concept is captured, rather than just one narrow aspect.
Reduced Bias: They minimize the risk of unknown, specific biases in wording or interpretation that can heavily influence a single-item measure.
Internal Consistency: Researchers can calculate statistics (e.g., Cronbach’s alpha) to verify that all items are measuring the same underlying concept.
Nuanced Measurement: They allow for measuring multidimensional, complex, or abstract constructs that cannot be summarized in a single query.
You will often read about (abstract) concepts, constructs, variables, and items. These terms may confuse you.
Concepts are broad, abstract ideas or generalizations, such as intelligence or satisfaction, that help us understand phenomena.
Constructs are measurable versions of concepts designed for a particular research study (e.g., measuring intelligence through an IQ test).
Variables or items are the measurable indicators of a construct.
Depending on the field of research, the above terms are interpreted differently. Especially in psychology, researchers tend to speak about variables where economists mean constructs. And it’s easy to see that concepts and constructs are not that different, both being abstract in nature.
Luckily, (applied) researchers often do not have to reinvent the wheel. For many concepts and constructs, measurement instruments have already been developed by fellow researchers. It is good practice that, in publications, researchers explain the methodology and the logic of the way they have operationalized the concepts used in their research. In doing so, they often use the terms validity and reliability.
Validity measures how well a tool measures what it is intended to measure. Reliability measures consistency: whether a tool produces the same results over repeated trials. Reliability is a precondition for validity, in the sense that an unreliable instrument cannot be valid. That said, reliability does not imply validity.
Developing, adapting, and even using existing questionnaires unaltered is a bit of an art. A summary of things to keep in mind and think about:
There’s an interesting analogy here. Suppose, in a high-jump competition, the contenders are asked to jump over bars at 150, 155, and 160 centimeters. The results are bound to be correlated: persons with personal records around 120 centimeters will score three zeroes, while the high-end contenders will easily succeed three times. The reliability of your competition (a test) will be high in the 150 to 160 centimeter range, but at the low and high ends of the scale, you have little information to distinguish between the contenders. Conversely, highly correlated items may be meaningless. And, in consequence, the same holds true for traditional reliability measures like Cronbach’s alpha, which are based on correlations between items!
After taking into account the above issues and pitfalls, you have collected a lot of data, and now you can proceed to analyze the data and answer your research questions.
There are a couple of steps to follow:
Start with a descriptive analysis of your data. What are the minimum, maximum, and mean scores for all of the items in your questionnaire? Are there differences between relevant groups? Visualizations are helpful. (A short code sketch follows after these steps.)
The items are not that interesting by themselves. They derive their relevance from being part of a set of items that jointly measure a construct. But is each set of items truly measuring that construct (and not some other construct)? That is, the set of items has to be uni-dimensional, and unrelated to other constructs.
Once we have determined which items are indicators of our constructs, we can proceed to use the average of these items as the score on our construct.
Once we have scores for all constructs, we can use a regression-like technique to test our cause-and-effect model.
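As a minimal illustration of the first step, here is a sketch in R, assuming the items sit in a data frame df with columns s1 to s6 and a grouping variable group (all of these names are hypothetical):

# Minimum, maximum, and mean per item (column names are assumptions)
summary(df[, c("s1", "s2", "s3", "s4", "s5", "s6")])

# Compare item means across relevant groups (hypothetical grouping variable)
aggregate(df[, c("s1", "s2", "s3", "s4", "s5", "s6")], by = list(group = df$group), FUN = mean)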
Let’s use the conceptual model of one of our DBA-candidates as an example.
The (draft) questionnaire has no fewer than 48 items. From the diagram, I would conclude that the (12) items related to social factors can be grouped into various dimensions, like “balancing work and family duties” and “cultural and societal attitudes”.
If these items are derived from existing models and questionnaires, then the expectation would be that (i) items 6-8 are relatively strongly correlated, and items 10-12 too, and (ii) the correlations between items in the former group and items in the latter group are lower. That is, these two groups of items are measuring different things.
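Once data have been collected, one quick way to inspect this pattern in R is a plain correlation matrix; the column names below are hypothetical placeholders for items 6-8 and 10-12:

# Correlations within and between the two hypothesized groups of social items
round(cor(df[, c("item6", "item7", "item8", "item10", "item11", "item12")],
          use = "pairwise.complete.obs"), 2)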
If these items have been developed from scratch, that is, this research is novel, then we would use Exploratory Factor Analysis (EFA) to detect the dimensions in the data. And since the candidate has pre-grouped the items, we would hope and expect that these dimensions coincide with these pre-determined dimensions (or groups, or factors).
Quantitative research is normally deductive, and (inductive) theory building using large-scale standardized surveys is uncommon. Whenever a candidate is using a quantitative approach, we expect that theories and models exist, and that these theories have been tested in empirical studies that have made use of structured questionnaires. We would expect the candidate to have searched for such questionnaires in the literature.
Given the existence of theories and related data collection instruments, EFA makes little sense: the exploration has been done, and we can limit ourselves to Confirmatory Factor Analysis (CFA), which is an entirely different technique!
CFA takes the concepts and constructs of the theory as a starting point, and checks whether the data collected are in accordance with the theory. Practically speaking, the candidate could apply a CFA to the blocks of items (items 6-9; items 10-12; and all other blocks), and test whether (i) items load on the blocks they belong to, and (ii) they do not load on the blocks they do not belong to.
The candidate has not collected data yet. But for the sake of the example, let’s generate some data. We have generated data on 6 items belonging to the social factors; 4 items on economic factors; and 3 items for the dependent variable (competitiveness). For simplicity, we assume that there’s only one social factor and one economic factor. The items are part of uni-dimensional scales. We have generated data for 300 respondents.
As we often tell our students: don’t plagiarize, because we will catch you! If you cannot resist the temptation, then the safest way is to fake the data! It is easy, and there are some powerful tools to create a data set with a pre-specified correlation structure. That’s what we have done here. You can use these tools to replicate analyses if you only have summary statistics (means; standard deviations; correlations) at your disposal. Or, as we do here, simulate data to check whether techniques are able to reveal the structures you have put in. Don’t use these tools for faking data; you may get caught!
OK, what did we do? We generated data for social and economic items. We did it in such a way that two social items are strongly correlated with one another, but only moderately with the other social items. In addition, one economic item is only weakly correlated with any other social or economic item.
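We will not reproduce the exact code we used, but as a sketch: correlated item scores can be simulated in R with MASS::mvrnorm() from a pre-specified correlation matrix. The numbers below are illustrative only, not the ones underlying our data set:

library(MASS)

set.seed(123)
n <- 300

# Illustrative correlation matrix for the 6 social items:
# s5 and s6 correlate strongly with each other, only moderately with s1-s4
R <- matrix(0.5, nrow = 6, ncol = 6)
diag(R) <- 1
R[5, 6] <- R[6, 5] <- 0.8

soc <- mvrnorm(n, mu = rep(4, 6), Sigma = R)
soc <- pmin(pmax(round(soc), 1), 7)   # round and truncate to a 1-7 answering scale
colnames(soc) <- paste0("s", 1:6)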
Let’s go through the motions!
Descriptive information on individual items can be quite telling by itself, even though at the end of the day it will be absorbed into the scores on the constructs. But getting an idea of how many respondents agree or disagree with the questions asked in the questionnaire is valuable in its own right.
For that reason, avoid presenting it in dull tables or even annexes. Especially the ggplot2 package in R, and the many extensions of ggplot2, offer a wide range of well-documented, publication-quality visualizations that make your thesis look better.
First, you can browse through the data. Below, we have read the full data set into R, before selecting and presenting the 6 social items.
library(DT)
library(readstata13)
df <- read.dta13("20260510_TESTDATA.dta")
df_s <- df[,14:19]
datatable(df_s)
After some data preparations, like labeling the data, we have used the gglikert() function of the ggstats package to show the distribution of answers in our faked data set (note that we have assumed a 7-point answering scale, instead of the 5-point scale suggested by the candidate).
The code is self-explanatory. The items are sorted in ascending order of endorsement. The gglikert() function computes the proportion of positive and negative answers. You can choose from a range of color palettes.
library(ggstats)
gglikert(df_s, sort = "ascending") +
  scale_fill_likert(pal = scales::brewer_pal(palette = "PRGn"))
As explained, it does not make sense to explore the dimensions in the data using EFA. The theoretical dimensions are derived from the literature, and we can directly move on to CFA.
Still, some researchers do use EFA, especially if there are a lot of items, many of which have been developed for the study at hand.
We have generated the data in such a way that the 6 social items have 2 dimensions, and that the 4 economic items form a uni-dimensional scale with the exception of one item that does not correlate with any other items.
Note that we have excluded the competitiveness items. We usually treat the dependent variable separately, since our model suggests a causal relation (and hence, a more or less strong correlation) between the independent variables and the dependent variable!
We have done the EFA in STATA, but it can be done in R, or JASP, or any other software.
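For readers who prefer R, a minimal EFA sketch using the psych package (our choice here; the item names are the ones assumed for the simulated data) could look like this:

library(psych)

items <- df[, c(paste0("s", 1:6), paste0("e", 1:4))]   # the 10 simulated items

fa.parallel(items, fa = "fa")                          # suggests the number of factors to retain
efa <- fa(items, nfactors = 3, rotate = "oblimin", fm = "ml")
print(efa$loadings, cutoff = 0.4)                      # hide loadings below .4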
EFA is not an easy technique. Once you have the data, you’re only a few clicks away from getting results, but even though there are a lot of guidelines on how to perform an EFA and how to interpret the results, it is seldom the case that two analysts working on the same data set will come to the same result. The reason is that a lot of decisions have to be taken in the process. Among these decisions:
The type of factor or principal component analysis to use.
The number of factors to retrieve.
The minimum value of factor loadings to keep an item.
The type of rotation of factors.
It is strongly recommended to follow the workflows recommended in journal articles, and to discuss the results with an expert. Still, when using the commonly recommended workflows that can be found all over the Internet and on YouTube, even inexperienced analysts will come to pretty robust results.
Reporting on an EFA should follow the standards of, for example, the APA.
We will not dwell on the issue too long here, and stick to showing the final output which indeed reveals the structure of the data that we put in:
We have extracted three factors. Items s1 to s4 make up a social factor, and items s5 and s6 make up a second social factor. Items e2 to e4 form our economic factor, while item e1 does not load on any of the three extracted factors. Actually, the blank cells do contain numbers, but for clarity, loadings \(< .4\) are not displayed.
After EFA, it is almost redundant to do a reliability analysis. It is usually the case that items with high loadings on one and the same factor form a reliable scale. For reliability, or better, internal consistency, a commonly used measure is Cronbach’s α (alpha).
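For reference, for a scale of \(k\) items, Cronbach’s α is computed as

\(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{i}}{\sigma^{2}_{T}}\right)\)

where \(\sigma^{2}_{i}\) is the variance of item \(i\) and \(\sigma^{2}_{T}\) the variance of the total (sum) score.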
Some researchers would actually start out with a reliability analysis, and use it to decide which items to retain. Since we have argued that EFA is kind of redundant for typical quantitative studies anyway, we would consider this an acceptable approach. Nevertheless, EFA and CFA add value since they also take validity into consideration. To be more specific: a valid item is assumed not to be related to other constructs. In short, our approach would be to perform CFA, followed, if needed, by a reliability check (mainly because it is so common to report Cronbach’s α; reliability can be deduced directly from the CFA results).
The reliability analysis for the social factors shows that Cronbach’s α, as a measure of internal consistency, is .82. A value of .70 is considered acceptable, and .80 good. Based on these results, we could continue with one scale, consisting of 6 items, for the social factors. The last column shows what would happen to Cronbach’s α if an item were dropped. It turns out that Cronbach’s α would hardly change, and from that perspective, there’s no reason to drop any of the items. The item-rest correlation, as a rule of thumb, has to exceed .30, so again there’s no reason to get worried.
For the economic factor, the picture is different. The Cronbach’s α of .74 is acceptable, but dropping e1 would increase the value of α substantially, to .84. Moreover, the item-rest correlation of the first item is unacceptably low. Therefore, we will drop the item from the scale.
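The same statistics (α, α if an item is dropped, and item-rest correlations) can be obtained in R with the psych package; a minimal sketch, using the assumed item names:

library(psych)

# Social scale: overall alpha, alpha-if-item-dropped, item-rest correlations
psych::alpha(df[, paste0("s", 1:6)])

# Economic scale: the output should reflect the weak contribution of item e1
psych::alpha(df[, paste0("e", 1:4)])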
Confirmatory Factor Analysis (CFA) is very different from EFA. CFA is one of the types of analyses that fall under the umbrella of Structural Equation Modeling (SEM). Over the years, SEM has become the standard for publishing the results of quantitative studies in leading academic journals. SEM captures a variety of commonly used models, including (complex) regression models that contain mediating variables and/or models that contain causal paths and measurement components. Although SEM is conceptually and mathematically complex, using SEM is surprisingly easy. In STATA, for example, the user only has to draw the model via an easy-to-use Graphical User Interface, and press a button to estimate the model. In R, SEM can be performed using the lavaan package, which is very well documented.
For our data, we have used STATA. After drawing the model, we have estimated the coefficients. After estimating a base model, we can ask which modifications would significantly improve the model fit.
In the figure above, we have added a correlation between the error terms of items s5 and s6. This was done after inspecting ways to improve the model. The goodness-of-fit statistics specific to SEM models were very good. Adding such terms to the model indicates that the related variables are different from the other items related to the factor (or latent variable, in SEM terms) labeled SOC.
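The same CFA can be specified in R with the lavaan package. A minimal sketch, with assumed item names, leaving out e1 and including the correlated error terms for s5 and s6:

library(lavaan)

model <- '
  SOC =~ s1 + s2 + s3 + s4 + s5 + s6
  ECO =~ e2 + e3 + e4
  s5 ~~ s6          # correlated error terms, as suggested by the modification indices
'
fit <- cfa(model, data = df)
summary(fit, fit.measures = TRUE, standardized = TRUE)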
In our usual approach, we would start with a CFA. We would then add a reliability analysis, and use the (related) outcomes of both analyses to justify the items to keep as measurements of our constructs.
After the above preparations, we are ready for our final analysis!
But: a lot of what was done in the preparations is of (academic) value. Your research, the items - newly developed or adapted - in your questionnaire, the correlations between the items, the validity and reliability of the scales, and so on and so forth, are invaluable input for future researchers on the same topic.
Of course, especially if your research is applied and policy oriented, your focus will be on the estimated model and the test of hypotheses.
In the last step, model estimation, our candidate is probably interested in the effects of social and economic factors on competitiveness, and in the policy implications. In this case, we can take a simple, traditional approach, or a more advanced approach.
In the simple approach, we simply compute a score on the two (or three!?) factors. If we do the same for competitiveness, we can run a (simple, linear) regression model.
\(COMP = \beta_{0} + \beta_{1} \cdot SOC + \beta_{2} \cdot ECO + \varepsilon\)
In this formula, SOC is the mean of, in this example, the 6 items measuring the social factors. Depending on your interpretation of the preparatory analyses, you can split SOC into two factors, measured by 4 and 2 items respectively. Whichever decision you take, a regression with 2 or 3 independent variables is easy to do.
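A minimal sketch of this traditional approach in R, with assumed item names (e1 dropped after the reliability check; c1 to c3 as hypothetical names for the competitiveness items):

# Construct scores as simple item averages
df$SOC  <- rowMeans(df[, paste0("s", 1:6)])
df$ECO  <- rowMeans(df[, c("e2", "e3", "e4")])
df$COMP <- rowMeans(df[, c("c1", "c2", "c3")])

# Linear regression of competitiveness on the social and economic scores
fit <- lm(COMP ~ SOC + ECO, data = df)
summary(fit)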
A SEM model is a hybrid of latent variables (factors) and observed variables (measurements; our items). To the measurement part, we can add a structural part, with relations between the latent variables.
In the diagram below, we have deleted the first item on economic factors; retained all six social items as measurements of a uni-dimensional social scale; and added a latent variable for the dependent variable (competitiveness), together with its measurement items.
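In lavaan syntax, this full model (a measurement part plus a structural part) could be sketched as follows, again with assumed item names:

library(lavaan)

model <- '
  # measurement part
  SOC  =~ s1 + s2 + s3 + s4 + s5 + s6
  ECO  =~ e2 + e3 + e4
  COMP =~ c1 + c2 + c3
  # structural part
  COMP ~ SOC + ECO
'
fit <- sem(model, data = df)
summary(fit, fit.measures = TRUE, standardized = TRUE)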
This approach is considered superior to the traditional approach, because the latent variables are considered to be free of measurement error. This is because the measurement errors are modeled explicitly, in contrast to the traditional approach where measurement errors are fully absorbed in the sum score (the simple average of all items measuring the construct). A disadvantage is that estimation of complex models with a structural and measurement component might fail.