Eugenia Theodosopoulos
Statistics Final Project
5/12/2019
The data I analyzed was the Resilience Data. It was collected through a survey sent out by the Action Research Project, which is led by members of the administration at the Nueva School. The aim of the project is to research resilience in teachers and administrators in the independent school landscape. The project describes its mission as aiming to “better understand what factors allow educators to both adapt to change and weather the storm”. This data allows us to look for trends that could possibly “generate traction for prioritizing and developing the emotional intelligence skills of those working in educational environments.”
The data represents responses from faculty at schools across the United States with connections to the Nueva School. Each respondent reported details about their demographics and role at their school, and then answered a series of questions on a Likert scale of 1-5. Each row of the data set is a respondent and each column is a survey question. The sample contains 298 respondents; the raw file has 300 rows, but the last two hold column means and were removed before analysis. Variable types varied: some questions recorded demographics such as gender and age range, while most were ratings on a 1-5 scale with no units. One limitation of the data set is the sample pool, since the only schools surveyed have connections to the Nueva School. There is also geographical bias, as certain schools, particularly in the northeast region of the United States, chose to opt out of the survey. Additionally, there is no way to control for personal bias in the responses, since individuals may be inclined to portray themselves in a positive light despite the survey's anonymity. This is suggested by the fact that almost none of the responses were in the 1-2 range: most respondents rated themselves highly on variables that may reflect positively on them.
This project was partially based on the 2018 book Onward: Cultivating Emotional Resilience in Educators by Elena Aguilar. The topics discussed in this book were used as the groundwork for the survey questions. In the book, Aguilar describes the importance of resilience for educators and outlines the habits, and the corresponding dispositions, that one must possess to have emotional resilience. The variables I chose to examine were “Acceptance” and “Understand Emotions.” Each of these variables had its own definition in the survey. “Understand Emotions” was defined as “I recognize and understand my emotions, and I have strategies to respond to them. I recognize and understand other people’s emotions and have strategies to respond to them.” “Acceptance” was defined as “I am not resigned, but I’m able to recognize what I can and can’t change in a situation, and I can accept that.” These definitions align with how Aguilar describes the variables in the book. She categorizes Understand Emotions as a habit (a behavior) of an emotionally resilient educator, and Acceptance as the corresponding disposition (an attitude, character, or way of being) of an emotionally resilient educator. She claims that “Understanding emotions—accepting them and having strategies to respond to them—is essential to cultivate resilience. With an understanding of emotions, you can accept their existence, recognize where you can influence a situation, and let go of what is outside your control” (45, Onward). I chose these variables because the book links them, and I wanted to test whether that link appears in the data.
This data is important to analyze because it provides insight into the concept of resilience as a whole. It allows comparisons between variables to reveal possible associations between personality traits, as well as trends across geographic locations, race, and gender. Since the data specifically targets educators, it can help show how one’s role as a teacher may shape these personality traits, or whether preexisting traits incline people toward this field. The relationship between personality and time spent in education could be explored using the variable “How many years have you been working in education?” I would also be curious to see how this data could be combined with more detailed information about each school’s curriculum and systems to look for further correlations.
Below is R’s summary of the data set, produced with str(). Each line lists one variable, its type, and its first few values. “Understand Emotions” and “Acceptance” appear among the labels on the left; to the right of each label are the first ten responses. For “Understand Emotions,” these are “5 4 4 4 5 4 4 3 5 4,” and for “Acceptance,” they are “4 3 3 4 5 3 4 5 5 4.”
str(data_resilience)
## Classes 'tbl_df', 'tbl' and 'data.frame': 300 obs. of 34 variables:
## $ Timestamp : POSIXct, format: "2019-08-24 09:56:13" "2019-08-24 14:09:08" ...
## $ Please indicate your age range. : chr "45 - 49" "35 - 39" "50 - 54" "50 - 54" ...
## $ Please indicate your gender. : chr "Female" "Female" "Female" "Female" ...
## $ Please indicate your race. : chr "White" "White" "White" "White" ...
## $ How many years have you been working in education? : chr "25 - 29 years" "5 - 9 years" "25 - 29 years" "20 - 24 years" ...
## $ Which option best describes your role in your school?: chr "Administrator" "Administrator" "Administrator" "Administrator" ...
## $ In what division do you work? : chr "Upper School" "Upper School" "Upper School" "Upper School" ...
## $ In what state is your school located? : chr "DE" "AZ" "MS" "NY" ...
## $ How would you describe your school? : num 2 3 3 3 4 4 3 4 4 3 ...
## $ How much autonomy do you have in your job? : num 3 4 4 4 4 4 4 5 4 3 ...
## $ Know Yourself : num 5 4 4 4 5 5 4 4 5 5 ...
## $ Understand Emotions : num 5 4 4 4 5 4 4 3 5 4 ...
## $ Tell Empowering Stories : num 4 3 3 3 4 4 4 4 5 4 ...
## $ Build Community : num 4 3 5 4 3 5 3 4 5 3 ...
## $ Be Here Now : num 3 2 4 3 3 5 4 3 5 4 ...
## $ Take Care of Yourself : num 3 3 2 5 4 2 3 3 4 5 ...
## $ Focus on the Bright Spots : num 3 3 4 3 2 3 4 4 4 5 ...
## $ Cultivate Compassion : num 3 4 4 4 2 5 3 4 5 4 ...
## $ Be a Learner : num 3 4 4 3 3 5 4 4 4 4 ...
## $ Play and Create : num 3 3 5 3 3 5 4 4 5 5 ...
## $ Ride the Waves of Change : num 4 3 4 4 4 5 4 5 4 4 ...
## $ Celebrate and Appreciate : num 4 3 5 5 4 4 4 5 5 4 ...
## $ Purposefulness : num 5 4 4 4 4 4 4 4 5 4 ...
## $ Acceptance : num 4 3 3 4 5 3 4 5 5 4 ...
## $ Optimism : num 4 3 4 4 3 4 4 4 5 4 ...
## $ Empathy : num 3 5 4 4 5 5 4 4 5 5 ...
## $ Humor : num 4 5 4 2 4 5 4 5 4 5 ...
## $ Positive Self-Perception : num 2 3 4 4 5 3 3 4 4 3 ...
## $ Empowerment : num 3 4 5 4 5 5 4 4 4 4 ...
## $ Perspective : num 3 5 4 5 5 5 4 4 4 4 ...
## $ Curiosity : num 4 4 4 3 4 5 4 5 4 4 ...
## $ Courage : num 4 4 4 4 5 4 4 5 4 3 ...
## $ Perseverance : num 4 5 4 4 5 5 3 5 4 4 ...
## $ Trust : num 3 3 4 4 5 5 3 4 5 3 ...
Because the survey responses are categorical (the 1-5 ratings are ordinal categories rather than measurements), I converted all of the variables to factors. I also deleted the last two rows of the data, which contained the means of each column and disrupted the graphs I was producing.
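#convert every column to a factor
data_resilience[]<-lapply(data_resilience, factor)
#keep the 298 respondent rows, dropping the two rows of column means
data_resilience<-data_resilience[1:298,]
#remove factor levels that only appeared in the dropped rows
data_resilience<-droplevels(data_resilience)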
I decided to run a chi-squared test of independence to determine whether there is a significant relationship between the variables “Acceptance” and “Understand Emotions” or whether they are independent of each other. I chose a test of independence over goodness-of-fit and homogeneity tests because I specifically wanted to know whether the two variables are associated; a goodness-of-fit test instead compares a single variable to a hypothesized distribution. Homogeneity and independence tests are computed the same way, but they answer different questions: a homogeneity test compares the distribution of one variable across several populations, whereas an independence test examines two variables within a single population. In an independence test, the null hypothesis is that the two variables are independent of each other in that population. In a homogeneity test, the null hypothesis is that one variable is distributed the same way in every population, so that its distribution does not depend on which population an observation comes from.
My null hypothesis is that the two variables are independent, and my alternative hypothesis is that they are associated (not independent). The link the book proposes seemed believable, and I wanted to see whether Aguilar’s claim would hold up against a statistical test.
To satisfy one of the main assumptions of the chi-squared test, that every cell has an expected count of at least 5, I had to combine certain rows and columns. Since few respondents rated themselves at the low end of the scale, there were very few entries for 1, 2, and 3. I therefore combined these three ratings into a single “1/2/3” category, which meets the chi-squared assumption and also makes my bar graph neater. Because the chi-squared table and the bar graph use the same categories, they are also easier to compare. Another assumption is that the data are counts of categorical responses, which holds here. Finally, the test assumes independent observations: each respondent is counted in exactly one cell, and one respondent’s answers are assumed not to influence another’s.
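One way to check the expected-count assumption is to ask chisq.test() for the expected counts it would use. The sketch below rebuilds the combined contingency table by hand from the counts reported later in this project, so it runs without the survey file; the object names are illustrative. The smallest expected count is about 9.94 (41 × 72 / 297), comfortably above 5.

```r
# Observed counts from the combined contingency table reported later in this
# project (rows: Understand Emotions, columns: Acceptance), entered by hand
observed <- matrix(c(13, 18, 10,
                     42, 86, 32,
                     17, 44, 35),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(emotions = c("1/2/3", "4", "5"),
                                   acceptance = c("1/2/3", "4", "5")))

# Expected counts under independence, taken from chisq.test()
expected <- chisq.test(observed)$expected
print(round(expected, 2))

# Rule of thumb: every expected count should be at least 5
print(all(expected >= 5))
print(min(expected))
```

Running the same check on the uncombined 1-5 ratings would show whether combining was necessary; that requires the original survey file, which is not reproduced here.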
#collapse ratings 1, 2, and 3 into a single "1/2/3" category;
#as.character() keeps the rating labels (ifelse() on a factor would return its internal codes)
data_resilience$acceptance_comb<-ifelse(data_resilience$Acceptance %in% c("1","2","3"), "1/2/3", as.character(data_resilience$Acceptance))
data_resilience$emotions_comb<-ifelse(data_resilience$`Understand Emotions` %in% c("1","2","3"), "1/2/3", as.character(data_resilience$`Understand Emotions`))
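The same collapsing step can be wrapped in a small reusable function. This is a sketch, and `collapse_low` is a hypothetical helper rather than part of the original analysis; it fixes the level order so that "1/2/3" always comes first, even if a category happens to be empty. As a check, it is applied to the first ten “Understand Emotions” responses visible in the str() output, which should give one "1/2/3", six "4", and three "5".

```r
# Hypothetical helper: collapse Likert ratings 1-3 into one "1/2/3" level.
# Converting to character first avoids ifelse() returning factor codes.
collapse_low <- function(x) {
  x <- as.character(x)
  out <- ifelse(x %in% c("1", "2", "3"), "1/2/3", x)
  # Fixed level order; anything other than 1-5 (e.g. a leftover mean) becomes NA
  factor(out, levels = c("1/2/3", "4", "5"))
}

# First ten Understand Emotions responses shown by str()
first_ten <- c(5, 4, 4, 4, 5, 4, 4, 3, 5, 4)
print(table(collapse_low(first_ten)))
```

Missing responses stay missing: `NA %in% c("1", "2", "3")` is FALSE, so an NA rating passes through ifelse() unchanged and is excluded by table() by default.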
I started by looking at how each variable is distributed across its categories, using tables. For both variables, I made a frequency table, a relative frequency table, and a marginal distribution. The frequency table lists each score alongside the number of times it appeared in the data. The relative frequency table is similar, but it shows each score’s count as a proportion of all responses, essentially the “popularity” of each score. The marginal distribution shows how one variable is distributed on its own, ignoring the other; here, addmargins() appends the total number of responses, which is the denominator for the relative frequencies.
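The relative frequencies are just each count divided by the total. The sketch below reproduces them from the “Understand Emotions” counts reported in the tables that follow (41, 160, and 97); the vector names are illustrative.

```r
# Counts for the combined Understand Emotions variable, entered by hand
emotion_counts <- c("1/2/3" = 41, "4" = 160, "5" = 97)

# Relative frequency: each count divided by the total number of responses
relative <- emotion_counts / sum(emotion_counts)
print(round(relative, 4))

# prop.table() on a plain vector performs the same division
print(all.equal(relative, prop.table(emotion_counts)))

# Appending the total mirrors the "Sum" column that addmargins() adds
print(c(emotion_counts, Sum = sum(emotion_counts)))
```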
#frequency table (counts)
table_emotions<-table(data_resilience$emotions_comb)
table_emotions
##
## 1/2/3 4 5
## 41 160 97
#relative frequency table (proportions)
table_emotions_rel<-prop.table(table_emotions)
table_emotions_rel
##
## 1/2/3 4 5
## 0.1375839 0.5369128 0.3255034
#marginal distribution
addmargins(table_emotions)
##
## 1/2/3 4 5 Sum
## 41 160 97 298
#frequency table (counts)
table_acceptance<-table(data_resilience$acceptance_comb)
table_acceptance
##
## 1/2/3 4 5
## 72 148 77
#relative frequency table (proportions)
table_acceptance_rel<-prop.table(table_acceptance)
table_acceptance_rel
##
## 1/2/3 4 5
## 0.2424242 0.4983165 0.2592593
#marginal distribution
addmargins(table_acceptance)
##
## 1/2/3 4 5 Sum
## 72 148 77 297
Next, to run a chi-squared test, I needed a contingency table. A contingency table displays the joint frequency distribution of two variables: each cell counts the respondents who gave a particular combination of answers. (A frequency distribution is a list of a variable’s possible values and how often each occurs.) Unlike the frequency and relative frequency tables, a contingency table shows both variables at once. Contingency tables are especially helpful for chi-squared tests because they make it easy to compare the observed count in each cell with the count expected under independence, which are the inputs to the test statistic. I used my combined categories to build the table, with “Understand Emotions” as the rows and “Acceptance” as the columns, and then ran the chi-squared test. One respondent has no usable “Acceptance” response, so the table contains 297 responses rather than 298.
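For reference, the standard Pearson chi-squared statistic compares the observed count $O_{ij}$ in each cell with the expected count $E_{ij}$ under independence. Here $R_i$ and $C_j$ are the row and column totals, $n$ is the total number of responses, and $r$ and $c$ are the numbers of rows and columns; these symbols are introduced for illustration. With the table’s row total 41, column total 72, and $n = 297$, the first expected count and the degrees of freedom work out as shown.

```latex
$$E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)$$

$$E_{11} = \frac{41 \times 72}{297} \approx 9.94, \qquad df = (3 - 1)(3 - 1) = 4$$
```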
#contingency table
table_emotions_acceptance<-table(data_resilience$emotions_comb, data_resilience$acceptance_comb)
addmargins(table_emotions_acceptance)
##
## 1/2/3 4 5 Sum
## 1/2/3 13 18 10 41
## 4 42 86 32 160
## 5 17 44 35 96
## Sum 72 148 77 297
#chisquared
chisq.test(table_emotions_acceptance)
##
## Pearson's Chi-squared test
##
## data: table_emotions_acceptance
## X-squared = 10.301, df = 4, p-value = 0.03564
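To show what chisq.test() does with this table, the sketch below recomputes the Pearson statistic by hand from the observed counts, entered directly so it runs on its own. It should reproduce X-squared ≈ 10.301 with df = 4 and the same p-value; for tables larger than 2×2, chisq.test() applies no continuity correction, so the two calculations agree exactly.

```r
# Observed counts from the contingency table above
# (rows: Understand Emotions, columns: Acceptance)
observed <- matrix(c(13, 18, 10,
                     42, 86, 32,
                     17, 44, 35), nrow = 3, byrow = TRUE)

n <- sum(observed)                                           # 297 responses
expected <- outer(rowSums(observed), colSums(observed)) / n  # E_ij = R_i * C_j / n

x2  <- sum((observed - expected)^2 / expected)        # Pearson chi-squared statistic
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)    # (3 - 1) * (3 - 1) = 4
p   <- pchisq(x2, df = dof, lower.tail = FALSE)       # upper-tail probability

print(c(X_squared = x2, df = dof, p_value = p))

# Compare with R's built-in test
print(chisq.test(observed))
```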
The chi-squared test produced a p-value of 0.03564. At the conventional significance level of 0.05, a p-value below 0.05 is treated as sufficient evidence against the null hypothesis. Since 0.03564 is less than 0.05, I reject the null hypothesis of independence and conclude that the association between the two variables is statistically significant. A major limitation of this test is that association does not imply causation. In the book, acceptance is treated as part of understanding emotions, so that one cannot fully possess the ability to understand emotions without acceptance. However, this test only detects an association; it does not reveal which variable, if either, causes the other, nor how strong the relationship is. The scope of inference is also limited, since the results describe only the individuals who were permitted to respond to the survey, all of whom work at schools with connections to Nueva. Although the sample is reasonably large, it is difficult to generalize these findings to all teachers, or to people across the United States.
In this bar graph, I looked at respondents who gave the same response (1/2/3, 4, or 5) to both variables, using “Understand Emotions” as the base. For each Understand Emotions response, I calculated the percentage of those respondents who gave the same response for Acceptance; these are the diagonal entries of the row proportions below. The x-axis is the response, and the y-axis is the percentage of respondents with that Understand Emotions response who gave the same Acceptance response.
tbl2<-table(data_resilience$emotions_comb, data_resilience$acceptance_comb)
prop.table(tbl2, margin=1)
##
## 1/2/3 4 5
## 1/2/3 0.3170732 0.4390244 0.2439024
## 4 0.2625000 0.5375000 0.2000000
## 5 0.1770833 0.4583333 0.3645833
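#column proportions (each Acceptance group split by Understand Emotions); the row Sums are not meaningful
addmargins(prop.table(tbl2, margin=2))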
##
## 1/2/3 4 5 Sum
## 1/2/3 0.1805556 0.1216216 0.1298701 0.4320473
## 4 0.5833333 0.5810811 0.4155844 1.5799988
## 5 0.2361111 0.2972973 0.4545455 0.9879539
## Sum 1.0000000 1.0000000 1.0000000 3.0000000
library(ggplot2)
#percent of each Understand Emotions group that gave the same Acceptance response
#(the diagonal of the row proportions above), computed rather than typed in
scores<-data.frame(Response=colnames(tbl2), Percent=100*diag(prop.table(tbl2, margin=1)))
p4 <- ggplot(scores) + geom_col(aes(x = Response, y = Percent),
fill="skyblue") + theme_classic()
p4 + labs(x="Responses For Both Variables", y="Percent Of Respondents")
This bar plot shows that a substantial share of respondents gave matching responses on both variables. Of the respondents who answered 1/2/3 for Understand Emotions, 32% also answered 1/2/3 for Acceptance (13 of 41); of those who answered 4, 54% also answered 4 (86 of 160); and of those who answered 5, 36% also answered 5 (35 of 96). Although most respondents in the 1/2/3 and 5 groups did not give matching answers, each of these percentages is higher than the overall share of respondents giving that Acceptance rating (24%, 50%, and 26%), which is the pattern an association between the variables would produce. Just like the chi-squared test, the scope of this chart is limited: it describes only the individuals who responded to this survey, and it is difficult to generalize the findings to all teachers or to people across the United States. This is because the survey went only to schools with connections to Nueva, and some schools did not permit their faculty to take it.
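The bar heights are conditional (row) proportions, so a natural reference is the overall share of each Acceptance rating: if the two variables were independent, each bar would be expected to sit close to that overall share. The sketch below computes both from the contingency-table counts, entered by hand; the object names are illustrative.

```r
observed <- matrix(c(13, 18, 10,
                     42, 86, 32,
                     17, 44, 35), nrow = 3, byrow = TRUE,
                   dimnames = list(c("1/2/3", "4", "5"), c("1/2/3", "4", "5")))

# Row proportions: within each Understand Emotions group, the share giving
# each Acceptance rating
row_props <- prop.table(observed, margin = 1)

# Bar heights: same response on both variables, as a percentage of the row
same_response <- diag(row_props) * 100

# Reference: overall percentage of each Acceptance rating
overall_acceptance <- colSums(observed) / sum(observed) * 100

print(round(rbind(same_response, overall_acceptance), 1))
```

Every bar exceeds its reference value, which is consistent with the significant chi-squared result, though this comparison alone is descriptive rather than a test.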
The results of the chi-squared test and the bar plot suggest that the variables “Acceptance” and “Understand Emotions” are associated. Since the p-value of 0.03564 is less than 0.05, the association is statistically significant. However, even though there is an association, we cannot infer which variable, if either, causes the other. In the book on which I based this investigation, the author states that one must possess acceptance in order to achieve the ability to understand emotions. This may be true, but the statistical tests performed in this analysis neither confirm nor deny it.
Regarding resilience more broadly, most respondents rated themselves 4 or 5 on the 1-5 scale (86% for Understand Emotions and 76% for Acceptance), which implies that they believe they possess these traits connected to resilience. This is promising, as it suggests that many of the educators surveyed possess resilience, at least as the book and survey define it. That said, several sources of error could undermine this conclusion. Since the survey is entirely self-reported, only one safeguard encourages honesty: anonymity. One might expect an anonymous survey to encourage truthful answers, since no one can connect the responses to the respondent and judge them. However, individuals may still rate themselves highly on these traits because they hold themselves to high standards and want to believe they possess them, whether or not that is true.
Despite these sources of error, I believe this survey is a good start toward understanding the mindset of educators and the importance of these habits and personality traits in their work. Further analysis of the available variables could lead to more concrete conclusions and move toward the question of causation. I would be interested in learning the methods needed to investigate causation. Beyond causation, possible next steps include adding more variables to this analysis to see how interconnected they are, or testing the other associations Aguilar describes to see whether the association I found reflects a broader pattern supporting her framework or was simply a chance result.