DATA ANALYSIS: EVALUATION OF THE FEASIBILITY OF ADOPTING THE METHODOLOGY
INTRODUCTION
This study belongs to the field of aeronautical innovation and technological development and focuses on validating the proposed innovation management methodology. It is applied to six research groups that generate technological development products within the Colombian National Army. Because the participating researchers are located in different parts of the country, digital tools were needed for data collection.
This report provides a holistic analysis of the survey data collected from the six research groups in order to identify relevant patterns, relationships, and segmentations and, from them, to evaluate satisfaction with and usability of the technological innovation management methodology proposed for the defense sector.
Objective
This report presents a comprehensive descriptive analysis of the data collected to validate the adoption of the proposed methodology.
Data dictionary:
- Satisfaction: 1 = Very dissatisfied, 2 = Dissatisfied, 3 = Neutral, 4 = Satisfied, 5 = Very satisfied
- Usability: 1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree
1. EXPLORATORY DATA ANALYSIS
1.1 Data review
This section reviews the variables contained in the database. A total of 17 variables and 279 records were identified, including categorical and numerical variables. For the analysis, the survey questions are divided into two groups: the first covers satisfaction with the methodology and the second covers its usability.
Age: respondent age.
Gender:
- Male
- Female
- Non-binary/other
- I prefer not to answer
Type of affiliation:
- Emeritus researcher
- Senior Researcher
- Associate Researcher
- Junior Researcher
- Researcher in training (graduate student)
- Undergraduate Student
- Affiliated Member
Research group:
- BRIAV32
- ESCAB
- ESMIC
- BAIDI
- ESCOM
- ESAVE
Satisfaction:
- Satisfaction with methodology
- Willingness to implement
- Alignment with the Army
- Clarity and structure
- Improvement in innovation management
Usability:
- Frequent use
- Ease of understanding
- Quick learning
- Confidence in guidelines
- Organized structure
## Rows: 279
## Columns: 17
## $ Age <dbl> 34, 31, 33, 26, 32, 28, 30, …
## $ Gender <chr> "Male", "Male", "Male", "I p…
## $ `Type of affiliation` <chr> "Researcher in training (gra…
## $ `Research group` <chr> "ESAVE", "ESAVE", "ESAVE", "…
## $ `1. Satisfaction with methodology` <dbl> 4, 5, 3, 5, 5, 4, 4, 5, 4, 4…
## $ `2. Willingness to implement` <dbl> 4, 5, 4, 4, 5, 4, 4, 5, 4, 4…
## $ `3. Alignment with the Army` <dbl> 4, 5, 4, 4, 5, 5, 3, 5, 4, 5…
## $ `4. Clarity and structure` <dbl> 3, 5, 4, 4, 5, 4, 4, 4, 4, 4…
## $ `5. Improvement in innovation management` <dbl> 3, 4, 4, 4, 5, 5, 4, 4, 3, 5…
## $ `1. Frequent use` <dbl> 4, 4, 5, 4, 5, 4, 4, 4, 4, 4…
## $ `2. Ease of understanding` <dbl> 3, 4, 4, 4, 4, 3, 5, 4, 5, 4…
## $ `3. Quick learning` <dbl> 5, 2, 4, 5, 4, 4, 4, 3, 4, 3…
## $ `4. Confidence in guidelines` <dbl> 4, 4, 5, 3, 3, 4, 4, 3, 5, 3…
## $ `5. Organized structure` <dbl> 4, 4, 5, 4, 4, 4, 5, 3, 3, 3…
## $ Index_Satisfaction <chr> "NO", "SI", "NO", "SI", "SI"…
## $ Index_Usability <chr> "SI", "NO", "SI", "SI", "SI"…
## $ Global_Score <chr> "NO", "SI", "SI", "SI", "SI"…
| | |
|---|---|
| Name | datos_encuesta |
| Number of rows | 279 |
| Number of columns | 17 |
| _______________________ | |
| Column type frequency: | |
| character | 6 |
| numeric | 11 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Gender | 0 | 1 | 4 | 22 | 0 | 3 | 0 |
| Type of affiliation | 0 | 1 | 17 | 41 | 0 | 7 | 0 |
| Research group | 0 | 1 | 5 | 7 | 0 | 6 | 0 |
| Index_Satisfaction | 0 | 1 | 2 | 2 | 0 | 2 | 0 |
| Index_Usability | 0 | 1 | 2 | 2 | 0 | 2 | 0 |
| Global_Score | 0 | 1 | 2 | 2 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | 0 | 1 | 30.71 | 6.30 | 0 | 29 | 31 | 34 | 42 | ▁▁▁▇▅ |
| 1. Satisfaction with methodology | 0 | 1 | 4.15 | 0.64 | 2 | 4 | 4 | 5 | 5 | ▁▂▁▇▃ |
| 2. Willingness to implement | 0 | 1 | 4.13 | 0.64 | 2 | 4 | 4 | 5 | 5 | ▁▂▁▇▃ |
| 3. Alignment with the Army | 0 | 1 | 4.16 | 0.65 | 3 | 4 | 4 | 5 | 5 | ▂▁▇▁▅ |
| 4. Clarity and structure | 0 | 1 | 4.15 | 0.62 | 3 | 4 | 4 | 5 | 5 | ▂▁▇▁▃ |
| 5. Improvement in innovation management | 0 | 1 | 4.15 | 0.63 | 3 | 4 | 4 | 5 | 5 | ▂▁▇▁▃ |
| 1. Frequent use | 0 | 1 | 3.95 | 0.78 | 2 | 4 | 4 | 4 | 5 | ▁▃▁▇▃ |
| 2. Ease of understanding | 0 | 1 | 4.08 | 0.75 | 2 | 4 | 4 | 5 | 5 | ▁▃▁▇▆ |
| 3. Quick learning | 0 | 1 | 3.94 | 0.81 | 2 | 3 | 4 | 5 | 5 | ▁▅▁▇▅ |
| 4. Confidence in guidelines | 0 | 1 | 3.95 | 0.76 | 2 | 3 | 4 | 4 | 5 | ▁▃▁▇▃ |
| 5. Organized structure | 0 | 1 | 3.98 | 0.75 | 2 | 3 | 4 | 5 | 5 | ▁▅▁▇▅ |
The output above gives an initial overview of the data: 279 records, corresponding to the total number of surveys completed, each with 17 columns of information provided by the respondents. Of these columns, 6 are categorical variables and 11 are numerical variables. According to the population census, the total population is 354 people, so up to 354 responses were expected.
The 279 responses obtained (sample n) cover 78.8% of the total population N = 354, with a margin of error of ±3.2% (better than the usual standard of ±5%). A sample of this size has the statistical power to detect subtle differences and effects between variables and to support multivariate analyses, and its coverage substantially reduces sampling bias, which supports generalizing the results to the entire population N.
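The coverage and margin of error can be checked with a short calculation. The following is a minimal sketch, assuming the standard margin-of-error formula for a proportion with finite population correction, a 95% confidence level (z = 1.96), and the conservative proportion p = 0.5. The report does not state which confidence level or formula produced the ±3.2% figure, so the value computed here need not coincide with it. The function name `margin_of_error` is illustrative.

```python
import math

def margin_of_error(n, population, z=1.96, p=0.5):
    """Margin of error for a proportion with finite population correction.

    z and p are assumptions: 1.96 corresponds to 95% confidence and
    p = 0.5 is the most conservative choice.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    # The correction shrinks the error when the sample covers
    # a large share of a small population.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

n, population = 279, 354            # figures reported in section 1.1
coverage = n / population
moe_95 = margin_of_error(n, population)

print(f"coverage: {coverage:.1%}")
print(f"margin of error (95% confidence): {moe_95:.1%}")
```

Changing `z` shows how strongly the reported margin depends on the chosen confidence level.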
1.2 Missing data
## Zero values in age: 7
As shown in section 1.1, the survey data contain no missing values. However, the summary statistics for the numerical variables show a minimum age of 0, which means that some age entries were filled in with zero. These records must be cleaned so that they do not distort the statistical measures.
Because age is the only variable affected by incorrect completion, the ages of the seven identified respondents will be imputed with the median, a measure of central tendency that is more resistant to extreme values than the mean and better represents the actual central value of the data.
## Median Age Calculated: 32
## Imputed Values: 7
## Imputation completed: ZEROS → NA → MEDIAN(32)
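The imputation step (zeros treated as missing, then replaced by the median of the valid ages) can be sketched as follows. This is an illustration in Python, not the code that produced the output above. The age values are hypothetical and the function name `impute_zero_ages` is an assumption.

```python
from statistics import median

def impute_zero_ages(ages):
    """Replace zero ages with the median of the non-zero ages."""
    # Zeros are treated as missing, so they are excluded from the median.
    valid = [age for age in ages if age != 0]
    fill_value = median(valid)
    cleaned = [fill_value if age == 0 else age for age in ages]
    return cleaned, fill_value

# Hypothetical ages, two of them recorded as zero.
ages = [34, 31, 0, 26, 32, 28, 0, 30, 33]
cleaned, fill_value = impute_zero_ages(ages)

print("median of valid ages:", fill_value)
print("imputed values:", ages.count(0))
print("cleaned ages:", cleaned)
```

Computing the median after excluding the zeros matters: including them would pull the imputed value downwards.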
Once the data has been cleaned, we proceed to analyze it, proposing hypotheses that allow us to confirm or refute the respondents’ behavior toward accepting or rejecting the proposed methodology.
2. HYPOTHESIS FORMULATION
2.1 Quantitative Variable Hypotheses
Age: Satisfaction with and usability of the proposed methodology are expected to be higher among younger respondents, on the assumption that younger people are more readily influenced to follow a research methodology within their work area. Older respondents are expected to be less receptive to adopting a new innovation methodology because of their experience, which may translate into resistance to change.
2.2 Categorical Variable Hypotheses
Gender: Men are expected to be more accepting of the methodology than women, considering that men are the majority and women a minority within the military institution.
Type of affiliation: It is expected that undergraduate students will be more receptive to the methodology, considering that they are new to the research groups, while senior staff may not be as receptive to adopting the innovation methodology.
Research group: It is expected that research groups at the central level will be more accepting of the methodology, while decentralized groups will not.
3. UNIVARIATE ANALYSIS
Next, a univariate analysis will be performed on the variables selected for the different hypotheses.
3.1 Numerical variable graphs
The graph above shows the age distribution: 75% of respondents are younger than 34 (third quartile, Q3) and 25% are younger than 29 (first quartile, Q1). A few outliers fall beyond the limits of the distribution, above the upper limit (41) and below the lower limit (22).
The distribution is moderately symmetric, with a slight skew to the left. Half of the respondents are between 29 and 34 years old (adults), so this range describes the core of the research groups' workforce. Participation is low among people under 23 and over 36, and there are no obvious extreme outliers.
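The limits of 22 and 41 are consistent with the conventional 1.5 × IQR rule applied to the reported quartiles. The sketch below assumes that rule; the report does not state how the graph's limits were computed. The function names and the sample ages are illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences of the conventional k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Values falling outside the fences."""
    lower, upper = tukey_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Quartiles reported for age: Q1 = 29, Q3 = 34.
lower, upper = tukey_fences(29, 34)
print("fences:", lower, upper)

# Hypothetical ages, only to show which values would be flagged.
print("outliers:", find_outliers([20, 22, 29, 31, 34, 41, 42], 29, 34))
```

With whole-number ages, fences of 21.5 and 41.5 place the last non-outlying values at 22 and 41, matching the limits described above.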
3.2 Categorical variable graphs
The gender distribution shows 77.4% male staff versus 19.7% female staff. This reflects the composition of the current workforce, in which male staff are in the majority. The remaining 2.9% preferred not to respond.
The graph above shows the distribution by type of affiliation, where undergraduate students have the highest participation (33%), while emeritus researchers account for only 0.7% of the sample n.
Finally, the distribution by research group shows that ESAVE (31.2%) and ESCOM (30.8%) have the highest participation rates, well above BRIAV32 (4.37%).
4. ANALYSIS OF SURVEY RESULTS
In this section, we will detail the results obtained from the questions asked in the survey:
4.1 Satisfaction category
## === MAIN CHART: DISTRIBUTION OF RESPONSES ===
The graphs above show that, in general, respondents rated most questions as satisfactory to very satisfactory, with values of 4 and 5 representing more than 80% of all responses. This indicates that, overall, there is a high level of satisfaction with the methodology proposed by the surveyed staff.
The boxplot confirms this pattern: the responses are concentrated at values 4 and 5, which are clearly predominant.
4.2 Usability category
## === MAIN CHART: DISTRIBUTION OF RESPONSES ===
In the case of usability, most respondents chose the value 4, that is, they agree that the methodology is usable, and between 20% and 25% strongly agree (5). However, a sizeable portion (between 20% and 26%) took a neutral position, which suggests that they are undecided about whether to implement the methodology.
The boxplot shows how the data are grouped. Unlike the satisfaction category, where responses were concentrated between 4 and 5, the usability responses are concentrated between 3 and 5, and some responses even score 2 (disagree), suggesting that there is a group of people who disagree with the use of the methodology.
4.3 Category correlations (Satisfaction/Usability)
The correlation matrix indicates that the questions were correctly structured and divided into two categories, since correlations appear only between questions within the same category. In the upper right part of the graph, where questions from different categories are crossed, the values are close to zero, indicating no correlation between them. Within the satisfaction category, in contrast, most variables are moderately correlated (0.40-0.59), and there is a high correlation (0.60-0.79) between the variables “Willingness to implement” and “Clarity and structure.”
Within the usability category, the variables are weakly correlated (0.20-0.39), with moderate correlations between “Ease of understanding” and “Quick learning” and between these and “Frequent use.” These values indicate that the variables are associated, which does not imply a causal relationship.
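The strength bands used in this section can be made explicit with a small helper. The sketch below computes a Pearson correlation and maps it to the bands quoted above (0.20-0.39 weak, 0.40-0.59 moderate, 0.60-0.79 high). The labels "very weak" and "very high" for values outside those bands are assumptions, as is the choice of Pearson's coefficient, since the report does not name the coefficient used. The example data are only the first ten values of two satisfaction questions shown in the data preview, so the result is illustrative and is not the coefficient of the full sample.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def strength(r):
    """Map a coefficient to the bands used in the report."""
    size = abs(r)
    if size < 0.20:
        return "very weak"   # label assumed, not from the report
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "high"
    return "very high"       # label assumed, not from the report

# First ten previewed answers only (not the full 279 records).
willingness = [4, 5, 4, 4, 5, 4, 4, 5, 4, 4]
clarity = [3, 5, 4, 4, 5, 4, 4, 4, 4, 4]

r = pearson(willingness, clarity)
print(f"r = {r:.3f} ({strength(r)})")
```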
4.4 Dendrogram
The graph above shows how the study variables are grouped hierarchically. The dendrogram presents a cluster analysis of the variables based on their correlation patterns.
In summary, there are two distinct dimensions or clusters, and within each cluster the variables show internal cohesion. Strategies to improve each group should therefore be proposed separately. For example, in the usability group, frequent use correlates with confidence in guidelines and organized structure, which suggests that variable 1 could be influenced by variables 4 and 5, although the correlation alone does not establish this.
4.5 Cronbach’s coefficient
To determine the reliability of the proposed questionnaire, we use Cronbach’s alpha coefficient, a statistical measure of the internal consistency of a set of items. If the questions are related to each other and measure the same construct, we expect a high coefficient, indicating high reliability.
Cronbach’s alpha thus provides a way to check that the questions in each group of our questionnaire measure the same thing, which makes the results more reliable.
## Cronbach's Alpha coefficient for the Satisfaction group: 0.857
## Cronbach's Alpha coefficient for the Usability group: 0.755
Both coefficients are high: by common rules of thumb, 0.857 indicates good internal consistency and 0.755 acceptable internal consistency. This indicates that the questions within each group measure the same construct, which supports the reliability of the results for our research and is in line with the correlations observed within each group. We conclude that the satisfaction and usability constructs are consistent and dependable.
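For reference, Cronbach’s alpha for k items is k / (k - 1) × (1 - sum of item variances / variance of the total score). The following is a minimal sketch of that formula. The scores are hypothetical and do not reproduce the coefficients reported above, and the function name `cronbach_alpha` is illustrative.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per question, all of the same length
    (one position per respondent).
    """
    k = len(items)
    # Total score of each respondent across all questions.
    totals = [sum(answers) for answers in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))

# Hypothetical answers of five respondents to three questions.
items = [
    [4, 5, 3, 5, 5],
    [4, 5, 4, 4, 5],
    [4, 5, 4, 4, 5],
]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.3f}")
```

When all questions move together the item variances are small relative to the variance of the totals and alpha approaches 1; when the questions are unrelated it approaches 0.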
5. BIVARIATE ANALYSIS
A bivariate analysis is performed to verify the proposed hypotheses. The response variable is “acceptance of the methodology,” coded as y = 1 if adopted and y = 0 if not adopted. As mentioned above, the questions fall into two primary categories: satisfaction and usability. For this analysis, a person is considered to adopt the methodology only with a score of 4 or higher in both categories; a lower score means that the person does not agree with the proposed innovation methodology.
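The report does not give the exact formula behind the Index_Satisfaction, Index_Usability, and Global_Score columns. As a sketch, the code below assumes that each index is "SI" when the mean of its questions is at least 4, and that Global_Score applies the same threshold to the mean of all ten questions. These assumptions reproduce the labels of the first five respondents in the data preview of section 1.1 (in the second row, the global label is "SI" even though the usability index is "NO"), but they have not been checked against the full data set.

```python
def meets_threshold(scores, threshold=4):
    """True when the mean of the scores is at least `threshold`."""
    # Compare sums instead of means to avoid floating-point rounding.
    return sum(scores) >= threshold * len(scores)

def label(flag):
    # The data set stores the indices as "SI"/"NO" (yes/no).
    return "SI" if flag else "NO"

# First five respondents from the data preview:
# (satisfaction answers, usability answers,
#  recorded Index_Satisfaction, Index_Usability, Global_Score)
preview = [
    ([4, 4, 4, 3, 3], [4, 3, 5, 4, 4], ("NO", "SI", "NO")),
    ([5, 5, 5, 5, 4], [4, 4, 2, 4, 4], ("SI", "NO", "SI")),
    ([3, 4, 4, 4, 4], [5, 4, 4, 5, 5], ("NO", "SI", "SI")),
    ([5, 4, 4, 4, 4], [4, 4, 5, 3, 4], ("SI", "SI", "SI")),
    ([5, 5, 5, 5, 5], [5, 4, 4, 3, 4], ("SI", "SI", "SI")),
]

computed = [
    (
        label(meets_threshold(sat)),
        label(meets_threshold(usa)),
        label(meets_threshold(sat + usa)),  # all ten questions together
    )
    for sat, usa, _ in preview
]
matches = [c == recorded for c, (_, _, recorded) in zip(computed, preview)]
print("assumed rule matches the previewed labels:", all(matches))
```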
According to the hypothesis, younger respondents were expected to show greater acceptance of the proposal. The data support this hypothesis: people under the age of 20 show high acceptance in both usability (60%) and satisfaction (80%), which suggests that young people tend to adopt the methodology more easily.
Contrary to the hypothesis, people between 41 and 50 years old also show a high probability of adopting the methodology, although the sample in this range is small.
In the 21 to 40 age range, where most of the data are concentrated, respondents lean more clearly towards satisfaction, with almost 70% meeting the adoption criterion, while the usability results are more evenly split, although still tilted towards use of the innovation methodology. Strategies should therefore focus usability efforts on people in this age range.
5.1 Bivariate categorical variables
The initial hypothesis was that men would be more accepting of the methodology than women. The results do not fully support it: in the usability category, female staff showed higher acceptance (69.1%) than male staff, suggesting that women tend to be more receptive to using the methodology. In the satisfaction category, men showed higher acceptance (73.6%) than women (60%), indicating that a higher percentage of men are willing to adopt it.
Under the affiliation hypothesis, undergraduate students were expected to show greater adoption of the methodology. This is not supported in the usability category, where student opinions are evenly divided (50%), which could indicate that they are unclear about the impact of the methodology and undecided about using it. In the satisfaction category, on the other hand, a high percentage of students show a willingness to implement the methodology in terms of clarity, structure, alignment, and opportunities for improvement in innovation management.
Emeritus researchers, in contrast, are 100% aligned with implementing the methodology, although the sample is very small (only two respondents).
Under the research group hypothesis, the central-level groups (ESCAB, ESAVE, BAIDI, ESMIC) were expected to show greater acceptance. However, ESCAB shows only 25% acceptance in the usability category, while external groups such as ESCOM (59.3%) and BRIAV32 (69.2%) received the innovation methodology better.
In the satisfaction category, BAIDI had the lowest score (57.1%), while the other groups appear more satisfied with the proposed methodology, with percentages ranging from 60% to 81.6%.
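The percentages in this section are acceptance rates per category of a grouping variable: the share of respondents in each group whose index is "SI". A minimal sketch of that calculation follows. The records are hypothetical and are not the survey data, and the function name `acceptance_rate_by_group` is illustrative.

```python
from collections import defaultdict

def acceptance_rate_by_group(records):
    """Share of "SI" labels per group.

    records: iterable of (group, label) pairs, label being "SI" or "NO".
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [accepted, total]
    for group, label in records:
        counts[group][1] += 1
        if label == "SI":
            counts[group][0] += 1
    return {group: accepted / total for group, (accepted, total) in counts.items()}

# Hypothetical (research group, Index_Usability) pairs.
records = [
    ("ESCAB", "SI"), ("ESCAB", "NO"), ("ESCAB", "NO"), ("ESCAB", "NO"),
    ("ESCOM", "SI"), ("ESCOM", "SI"), ("ESCOM", "NO"),
]
rates = acceptance_rate_by_group(records)
for group, rate in rates.items():
    print(f"{group}: {rate:.1%}")
```

Reporting the group size next to each rate helps the reader judge cases with very few respondents, such as the emeritus researchers mentioned above.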
5.2 Correspondence analysis
Multiple correspondence analysis (MCA) allows us to analyze the categorical variables and determine how they are associated. It is important to define the objective of the MCA first.
Objective: To understand how perceptions of satisfaction and usability (10 questions) vary according to gender, type of affiliation, and research group; that is, to answer first “What do people think?” and then “Who are the people who think that way?” For this purpose, the responses to the 10 questions (five on satisfaction and five on usability) were treated as active variables, while gender, type of affiliation, and research group were treated as complementary (supplementary) variables. Setting the objective matters because MCA analyzes the relationships between all categories. If we instead added up the question scores to create a single index, we would lose key information about which specific aspect of satisfaction or usability drives the relationship between variables.
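The loss of information from summing can be seen with two hypothetical respondents who reach the same total with different answer profiles. The sketch below also shows the kind of question-by-answer categories that MCA works with. The answers are invented for illustration and the helper `to_categories` is an assumption, not part of any MCA library.

```python
QUESTIONS = [
    "Satisfaction with methodology",
    "Willingness to implement",
    "Alignment with the Army",
    "Clarity and structure",
    "Improvement in innovation management",
]

def to_categories(answers):
    """Turn each answer into a 'question=score' category, as MCA treats it."""
    return {f"{question}={score}" for question, score in zip(QUESTIONS, answers)}

# Two hypothetical respondents with the same total score.
respondent_a = [5, 5, 3, 3, 4]
respondent_b = [4, 4, 4, 4, 4]

same_total = sum(respondent_a) == sum(respondent_b)
shared = to_categories(respondent_a) & to_categories(respondent_b)

print("same total:", same_total)
print("categories in common:", sorted(shared))
```

A single summed index treats both respondents as identical, whereas the category view keeps the fact that the first one is neutral about alignment and clarity.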
MCA graph of satisfaction questions
The graph above shows how the active variables are consistent at the top, while the complementary variables are at the bottom, where the respondents’ answers are more closely associated with the research group to which they belong. The first two dimensions explain 22% of the variance in the data.
Graph showing distribution of active vs. complementary variables (Satisfaction)
The graph above shows how the response categories are positioned relative to the complementary variables. The categories furthest from the center are those that contribute most to the interpretation. For example, the responses with a rating of 4 (satisfied) lie closer to the categories of gender, research group, and type of affiliation, suggesting that this rating was the most common across the complementary variables.
Neutral responses (3), in turn, lie further from the center, which makes them more distinctive than the categories near the center. However, no complementary variables appear near them.
Closeness between categories indicates a strong association between them. For example, the male category lies very close to undergraduate students, which suggests that undergraduate students are predominantly male.
MCA graph of usability questions
The above graph shows that the questions appear to be more diverse and less consistent, meaning that more people answered with levels 2 (disagree) and 3 (neutral). The variables of research group and type of affiliation seem to have some association with the question of organized structure.
Graph showing distribution of active vs. complementary variables (Usability)
The graph above shows that there are different responses between 2 and 5. There is a strong relationship between associate researchers and the BRIAV32 group, suggesting that this type of researcher is often found within the BRIAV32 group.
Gender is another easy example: the male and female categories are negatively associated and lie opposite each other, which reflects the fact that they are mutually exclusive categories of the same variable.
It can also be observed that the questions on rapid learning and structure are associated with the ESAVE group.