Final Project

Author

E. Nguyen

Load libraries

library(tidyverse)
library(openintro)
library(tidymodels)
library(ggridges)
library(GGally)
library(ggfortify)
library(ggplot2)
library(dunn.test)
library(wesanderson)

Load dataset

ultrarunning <- read_csv("ultrarunning.csv")
Rows: 288 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl  (9): age, sex, pb_surface, pb_elev, pb100k_dec, avg_km, teique_sf, steu...
time (1): pb100k_time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Filter out NA values

ultrarunning_1 <- ultrarunning |>
  filter(!is.na(age) & !is.na(sex) & !is.na(pb_elev) & !is.na(teique_sf) & !is.na(steu_b) & !is.na(stem_b))
summary(ultrarunning_1)
      age            sex          pb_surface       pb_elev    
 Min.   :19.0   Min.   :1.000   Min.   :1.000   Min.   :   0  
 1st Qu.:33.0   1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 800  
 Median :38.0   Median :2.000   Median :1.000   Median :2200  
 Mean   :40.1   Mean   :1.664   Mean   :1.624   Mean   :2523  
 3rd Qu.:45.0   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:3658  
 Max.   :87.0   Max.   :2.000   Max.   :4.000   Max.   :9000  
 pb100k_time         pb100k_dec        avg_km        teique_sf   
 Length:125        Min.   : 6.50   Min.   :  5.0   Min.   :3.13  
 Class1:hms        1st Qu.:12.16   1st Qu.: 56.0   1st Qu.:4.77  
 Class2:difftime   Median :14.38   Median : 70.0   Median :5.23  
 Mode  :numeric    Mean   :14.69   Mean   : 72.1   Mean   :5.20  
                   3rd Qu.:17.00   3rd Qu.: 80.0   3rd Qu.:5.70  
                   Max.   :23.50   Max.   :160.0   Max.   :6.50  
     steu_b         stem_b     
 Min.   : 7.0   Min.   : 5.83  
 1st Qu.:12.0   1st Qu.:10.33  
 Median :13.0   Median :11.58  
 Mean   :13.1   Mean   :11.46  
 3rd Qu.:15.0   3rd Qu.:12.92  
 Max.   :18.0   Max.   :15.08  
ur1 <- ultrarunning_1 |>
  mutate(sex2 = ifelse(sex == 1, "male", "female")) |>
  mutate(surface = ifelse(pb_surface == 1, "trail",
                          ifelse(pb_surface == 2, "track",
                                 ifelse(pb_surface == 3, "road", "all"))))
head(ur1)
# A tibble: 6 × 12
    age   sex pb_surface pb_elev pb100k_time pb100k_dec avg_km teique_sf steu_b
  <dbl> <dbl>      <dbl>   <dbl> <time>           <dbl>  <dbl>     <dbl>  <dbl>
1    36     2          3     631 07:36:12           7.6    110      5.73     14
2    29     2          1    1524 14:12:00          14.2     80      5.33     15
3    26     2          1    3657 14:20:00          14.3    110      5.33     14
4    40     1          1    4420 17:00:00          17       70      5.33     13
5    27     2          1    1372 12:00:00          12      105      5.23     15
6    29     1          3    2286 16:00:00          16      125      5.97     15
# ℹ 3 more variables: stem_b <dbl>, sex2 <chr>, surface <chr>
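
#A minimal sketch of an alternative to the nested ifelse() calls: factor() with explicit levels and labels does the same recoding in one step. The vector below is toy data, not the study data, and the names surface_codes and surface_labels are hypothetical.

```r
# Toy surface codes for illustration only (not the study data)
surface_codes <- c(1, 3, 1, 4, 2)

# Map each numeric code to its label in one step
surface_labels <- factor(surface_codes,
                         levels = 1:4,
                         labels = c("trail", "track", "road", "all"))

# Convert back to character to match the type produced by ifelse()
as.character(surface_labels)
```

#One difference to keep in mind: the nested ifelse() sends every code other than 1, 2, or 3 to "all", while factor() turns an unexpected code into NA.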

Introduction

1. You will research your topic (dataset) and provide background information regarding this topic. Define any technical terms, provide a history of how the topic has evolved over time, etc.

#Emotional intelligence is defined as intelligence used to process emotional information for reasoning and other cognitive activities. Emotional intelligence models have been used in providing insights on the role of emotional intelligence in professional spaces, leadership, and interpersonal relationships. 

#The Tripartite-EI model evaluates emotional intelligence on three levels: knowledge, abilities, and dispositions. Knowledge refers to combining both semantic knowledge (what people say is the correct thing to do in emotional situations) and episodic knowledge (recalling past experiences in emotional situations). Ability is defined as being able to implement strategies in emotional situations (positive reappraisal in stressful situations, temporary distraction in angry situations). Disposition is defined as the frequency of emotionally intelligent behavior in emotional situations (how often does one act 'emotionally intelligent'). 

#Beyond the strong emotions invoked by sports competition, endurance sport athletes often face psychological obstacles due to the extreme periods of time that push their physical limit. Within the field of endurance sport, this sample uses data of 100-km ultramarathon times. It is reasonable to assume that 100-km ultra marathons would elicit strong emotions and a desire to concede.

2. Describe the source for your data and how the data was collected (randomized experiment or observations). Discuss any potential bias, if any.

#This dataset is from Samtleben (2021), who investigated the role of emotional intelligence (EI) on participants’ 100km ultra-marathon personal best times adjusted for physical preparedness, age, and sex at birth. The data are observational and were collected through an online survey. There is potential bias in the sample as only those who consented to sharing their data are included, which introduces volunteer bias. There was also an incentive of a $50 Amazon gift card reward, which is another source of potential bias.

3. Provide details about the statistics you will use. What type of information does it contain?

#The full sample size is n = 288 and there are p = 10 variables. Measures of EI in this data include scores from three scales measuring EI: 1) the Trait Emotional Intelligence Questionnaire Short Form; 2) the Situational Test for Emotional Management Brief; and 3) the Situational Test for Emotional Understanding Brief. There are some missing values; after removing rows with missing values in the variables used here, 125 participants remain.

4. Define the variables included in your dataset.

#age: participant age in years

#sex: participant sex; 1 indicates male and 2 indicates female

#pb_surface: surface of participant's pb100k time; 1 indicates trail, 2 indicates track, 3 indicates road, 4 indicates mix of all three

#pb_elev: sum of positive change in elevation during race in meters

#pb100k_time: fastest time to complete a 100k ultra-marathon in hour:min:sec

#pb100k_dec: fastest time to complete a 100k ultra-marathon in hours

#avg_km: average distance run per week in km

#teique_sf: TEIQue-SF score, the emotional intelligence score, average of the 30 items of the Trait Emotional Intelligence Questionnaire Short Form (Salovey & Mayer, 1990)

#steu_b: STEU-B score, Situational Test of Emotional Understanding Brief, sum of 19 items in this assessment (Allen et al, 2014)

#stem_b: STEM-B score, Situational Test for Emotional Management Brief, sum of 18 items in this assessment (Allen et al, 2014)
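
#The relationship between pb100k_time (hour:min:sec) and pb100k_dec (decimal hours) can be sketched as below. The helper name to_decimal_hours is hypothetical and is not part of the original dataset or code; the input is the first row shown by head(ur1).

```r
# Convert an "hh:mm:ss" string to decimal hours
to_decimal_hours <- function(hms_string) {
  parts <- as.numeric(strsplit(hms_string, ":", fixed = TRUE)[[1]])
  # hours + minutes / 60 + seconds / 3600
  sum(parts / c(1, 60, 3600))
}

# First row of the data: 07:36:12 is listed with pb100k_dec = 7.6
round(to_decimal_hours("07:36:12"), 1)
```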

5. Incorporate background research about this topic using at least 3 sources and 5 facts (4 sources and 10 facts for HM Students). Discuss research that has already been done on this topic.

#From a study by Samtleben (2021), results provide preliminary evidence supporting the notion that at longer distances EI does not directly impact performance; rather it exerts its effects by enhancing an athlete’s training.

6. Define the overarching question you would like to ask about your dataset.

#Is there a correlation between faster 100 km PB times and emotional intelligence scores?
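
#One direct way to put a number on this question is a Pearson correlation test. This is a minimal sketch on simulated stand-ins for pb100k_dec and teique_sf, not the study data; the means come from the summary output, while the standard deviations and the sim_ variable names are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-ins for pb100k_dec and teique_sf (not the study data)
n <- 125
sim_pb100k_dec <- rnorm(n, mean = 14.7, sd = 3.5)
sim_teique_sf  <- rnorm(n, mean = 5.2, sd = 0.7)

# Pearson correlation with its test of H0: rho = 0
ct <- cor.test(sim_pb100k_dec, sim_teique_sf, method = "pearson")
ct$estimate   # sample correlation r
ct$conf.int   # 95% confidence interval for rho
ct$p.value
```

#With the real data, the same call would take ur1$pb100k_dec and one of the three EI score columns.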

Part 2 – Your work with the data

7. Create initial summary graphs (boxplots, histograms, scatterplots, qqplots, etc.) of the data.

Scatter plot of Personal Best 100k Times to Emotional Intelligence Test

ggplot(ur1, aes(x=pb100k_dec, y=teique_sf))+
  geom_point(aes(color = age)) +
  scale_color_gradient(low = "green", high = "red") +
  geom_smooth(method = "lm") +
  theme_bw()+
  labs(x="PB 100k (hr)", 
       y="TEIQUE_SF assessment of emotional intelligence",
       title = "Scatterplot of Personal Best 100k Times to TEIQue_SF Scores")
`geom_smooth()` using formula = 'y ~ x'

#This initial scatter plot doesn't suggest any correlation. The gray area around the fitted line is the 95% confidence band that geom_smooth() draws by default.

Scatter plot of Personal Best 100k Times to Emotional Understanding Test

ggplot(ur1, aes(x=pb100k_dec, y=steu_b))+
  geom_point(aes(color = age)) +
  scale_color_gradient(low = "green", high = "red") +
  geom_smooth(method = "lm") +
  theme_bw()+
  labs(x="Personal Best 100k (hr)", 
       y= "STEU_B assessment of emotional intelligence",
       title = "Scatterplot of Personal Best 100k to STEU_B Score")
`geom_smooth()` using formula = 'y ~ x'

#This initial scatter plot doesn't suggest any correlation. The gray area around the fitted line is the 95% confidence band that geom_smooth() draws by default.

Scatter plot of Personal Best 100k Times to Emotional Management Test

ggplot(ur1, aes(x=pb100k_dec, y=stem_b))+
  geom_point(aes(color = age)) +
  scale_color_gradient(low = "green", high = "red") +
  geom_smooth(method = "lm") +
  theme_bw()+
  labs(x="Personal Best 100k (hr)", 
       y= "STEM_B assessment of emotional intelligence",
       title = "Scatterplot of Personal Best 100k to STEM_B Score")
`geom_smooth()` using formula = 'y ~ x'

#This initial scatter plot doesn't suggest any correlation. The gray area around the fitted line is the 95% confidence band that geom_smooth() draws by default.

Scatter plot of Average Distance Run per Week to Personal Best 100k

ggplot(ur1, aes(x=avg_km, y=pb100k_dec))+
  geom_point(aes(color = age)) +
  scale_color_gradient(low = "green", high = "red") +
  geom_smooth(method = "lm") +
  theme_bw()+
  labs(x="Average distance run per week (km)", 
       y= "Personal Best 100k (hr)",
       title = "Scatterplot of Average distance run per week (km) to Personal Best 100k")
`geom_smooth()` using formula = 'y ~ x'

#The plot shows a negative correlation between average distance run per week and personal best 100k times. This makes sense since more training and distance run per week should lead to faster 100k times.

Scatter plot of PB Elevation to Personal Best 100k

ggplot(ur1, aes(x=pb_elev, y=pb100k_dec))+
  geom_point(aes(color = age)) +
  scale_color_gradient(low = "green", high = "red") +
  geom_smooth(method = "lm") +
  theme_bw()+
  labs(x="Elevation Change (m)", 
       y= "Personal Best 100k (hr)",
       title = "Scatterplot of PB Elevation (m) to Personal Best 100k")
`geom_smooth()` using formula = 'y ~ x'

#The plot shows a weak correlation between elevation gain and personal best 100k times. This makes sense since more climbing during the race should lead to slower 100k times.

Boxplot of Average Distance Run per Week

boxplot(ur1$avg_km, main = "Boxplot of Average Distance Run per Week", ylab = "distance (km)")

#From the box plot, most participants run around 70 km per week, with some distinct outliers running up to 160 km per week.
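
#A quick check of which weekly distances count as outliers under the 1.5 x IQR rule, using the avg_km quartiles printed by summary(ultrarunning_1). This is only a sketch: boxplot() computes its hinges slightly differently from summary(), so the whiskers it draws may not match these fences exactly.

```r
# Quartiles of avg_km as printed by summary(ultrarunning_1)
q1 <- 56
q3 <- 80
iqr <- q3 - q1

# Points beyond these fences are flagged as outliers by the 1.5 * IQR rule
fences <- c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)
fences
```

#The fences come out at 20 and 116 km, so both the minimum (5 km) and the maximum (160 km) in the summary fall outside them.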

ggplot(ur1, aes(x=avg_km)) +
    geom_histogram(alpha=0.5, binwidth = 15, aes(y=after_stat(density)), colour="black", fill="red")+
    geom_density(alpha=.5, color = "black", fill = "lightblue")

Histogram of age

hist(ur1$age, xlab = "Age (yrs)", main= "Distribution of Participant Age")

#The histogram is right-skewed, which makes sense since older people are less likely to be doing ultra-marathons. The minimum age is 19. There's also one 87-year-old participant, which is remarkable.

ggplot(ur1, aes(x = age, fill = sex2)) +
  geom_histogram(binwidth = 5, color = "black") +
  labs(
    title = "Distribution of Participant Age by Sex"
  )

8. Calculate any summary statistics

Barplot of Sex

ggplot(ur1, aes(x = sex2, fill = sex2)) +
  geom_bar(show.legend=FALSE, color="black") +
    labs(
      title = "Frequency of Participant Sex")

summary(ur1)
      age            sex          pb_surface       pb_elev    
 Min.   :19.0   Min.   :1.000   Min.   :1.000   Min.   :   0  
 1st Qu.:33.0   1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 800  
 Median :38.0   Median :2.000   Median :1.000   Median :2200  
 Mean   :40.1   Mean   :1.664   Mean   :1.624   Mean   :2523  
 3rd Qu.:45.0   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:3658  
 Max.   :87.0   Max.   :2.000   Max.   :4.000   Max.   :9000  
 pb100k_time         pb100k_dec        avg_km        teique_sf   
 Length:125        Min.   : 6.50   Min.   :  5.0   Min.   :3.13  
 Class1:hms        1st Qu.:12.16   1st Qu.: 56.0   1st Qu.:4.77  
 Class2:difftime   Median :14.38   Median : 70.0   Median :5.23  
 Mode  :numeric    Mean   :14.69   Mean   : 72.1   Mean   :5.20  
                   3rd Qu.:17.00   3rd Qu.: 80.0   3rd Qu.:5.70  
                   Max.   :23.50   Max.   :160.0   Max.   :6.50  
     steu_b         stem_b          sex2             surface         
 Min.   : 7.0   Min.   : 5.83   Length:125         Length:125        
 1st Qu.:12.0   1st Qu.:10.33   Class :character   Class :character  
 Median :13.0   Median :11.58   Mode  :character   Mode  :character  
 Mean   :13.1   Mean   :11.46                                        
 3rd Qu.:15.0   3rd Qu.:12.92                                        
 Max.   :18.0   Max.   :15.08                                        

9. Define the parameter or parameters you are trying to estimate.

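#The parameters of interest are the relationships between PB 100k time and the three emotional intelligence scores and age, and the differences in typical PB 100k time by sex and by surface type. I will test whether each of these is meaningfully different from zero.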

10. Use at least 3 DIFFERENT statistical techniques (4 for HM students) you have learned throughout this course to attempt to answer your question. Techniques may include sampling, randomization testing, Chi-Square tests, ANOVA, bootstrapping confidence intervals, linear, multiple linear, and logistic regressions, non-parametric tests, etc. Be sure you check all basic assumptions for that technique before performing any HT or CI.

Chi-Square test

# Null hypothesis (H0): There is no meaningful difference between female and male participants in relation to which surface they ran their personal best 100k.

# Alternative hypothesis (Ha): There is a meaningful difference between female and male participants in relation to which surface they ran their personal best 100k.

perm1 <- ur1 |>
  specify(surface ~ sex) |>
  hypothesize(null = "independence") |>
  generate(reps = 1, type = "permute")
chisq.test(perm1$sex, perm1$surface)
Warning in chisq.test(perm1$sex, perm1$surface): Chi-squared approximation may
be incorrect

    Pearson's Chi-squared test

data:  perm1$sex and perm1$surface
X-squared = 5.3516, df = 3, p-value = 0.1478
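#Since the p-value is greater than 0.05, we fail to reject the null hypothesis. There is no evidence of an association between sex and surface of personal best 100k time. Note that perm1 is a single random permutation of the data generated under the null, so these values change from run to run; testing the observed data means calling chisq.test() on ur1 directly. There's a warning because some expected counts are small, so use Fisher's exact test to confirm results.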
ggplot(perm1, aes(x = sex, color = surface, fill = surface)) +
  geom_bar(alpha = 0.75 ) +
  labs(
    title = "Frequency of Surface Types"
  )

fisher.test(perm1$sex, perm1$surface)

    Fisher's Exact Test for Count Data

data:  perm1$sex and perm1$surface
p-value = 0.08934
alternative hypothesis: two.sided
#The p-value is still greater than 0.05. Again, we fail to reject the null hypothesis. There is no evidence of an association between sex and surface of personal best 100k time.
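
#When the chi-squared approximation is in doubt, another option besides Fisher's exact test is a Monte Carlo p-value from chisq.test() itself. This is a minimal sketch on a hypothetical sex-by-surface table; the counts and the name toy_counts are invented for illustration and are not the study data.

```r
set.seed(1)

# Hypothetical sex-by-surface counts for illustration (not the study data)
toy_counts <- matrix(c(20, 2, 5, 3,
                       40, 1, 6, 8),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(sex = c("male", "female"),
                                     surface = c("trail", "track", "road", "all")))

# A simulated p-value avoids relying on the chi-squared approximation
# when some expected counts are small
res <- chisq.test(toy_counts, simulate.p.value = TRUE, B = 2000)
res$expected   # several expected counts are below 5
res$p.value
```

#With the observed data, the table would come from table(ur1$sex2, ur1$surface) rather than from a permuted copy.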

Linear Model Regression

full_model <- lm(data = ultrarunning_1, pb100k_dec ~ teique_sf + pb_elev + age + avg_km + steu_b + stem_b)
summary(full_model)

Call:
lm(formula = pb100k_dec ~ teique_sf + pb_elev + age + avg_km + 
    steu_b + stem_b, data = ultrarunning_1)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.1653 -2.2229 -0.4163  1.6415  8.4899 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.6236139  3.4379694   3.963 0.000127 ***
teique_sf    0.5449281  0.4670050   1.167 0.245621    
pb_elev      0.0005030  0.0001494   3.367 0.001025 ** 
age         -0.0112172  0.0286799  -0.391 0.696415    
avg_km      -0.0535740  0.0132386  -4.047 9.31e-05 ***
steu_b       0.1285510  0.1367711   0.940 0.349189    
stem_b      -0.0356949  0.1642514  -0.217 0.828335    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.249 on 118 degrees of freedom
Multiple R-squared:  0.2034,    Adjusted R-squared:  0.1629 
F-statistic: 5.022 on 6 and 118 DF,  p-value: 0.0001283
#None of the emotional intelligence test scores is significant: teique_sf (p = 0.246), steu_b (p = 0.349), and stem_b (p = 0.828) all have p-values well above 0.05.
#diagnostic plots
autoplot(full_model, nrow=2, ncol=2)

#Let's try taking out the emotional intelligence scores and see how it turns out
lm1 <- lm(data = ur1, pb100k_dec ~ pb_elev + age + avg_km)
summary(lm1)

Call:
lm(formula = pb100k_dec ~ pb_elev + age + avg_km, data = ur1)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.0214 -2.1663 -0.2964  1.7456  8.2566 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 17.7250543  1.4243477  12.444  < 2e-16 ***
pb_elev      0.0004738  0.0001470   3.222 0.001634 ** 
age         -0.0135019  0.0285708  -0.473 0.637367    
avg_km      -0.0511790  0.0129516  -3.952 0.000131 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.244 on 121 degrees of freedom
Multiple R-squared:  0.1854,    Adjusted R-squared:  0.1652 
F-statistic: 9.181 on 3 and 121 DF,  p-value: 1.608e-05
#Taking out age since it had the largest p-value 
lm2 <- lm(data = ur1, pb100k_dec ~ pb_elev + avg_km)
summary(lm2)

Call:
lm(formula = pb100k_dec ~ pb_elev + avg_km, data = ur1)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.0696 -2.2339 -0.3661  1.7965  8.0233 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 17.2479105  1.0014568  17.223  < 2e-16 ***
pb_elev      0.0004874  0.0001437   3.391 0.000938 ***
avg_km      -0.0525468  0.0125838  -4.176  5.6e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.234 on 122 degrees of freedom
Multiple R-squared:  0.1839,    Adjusted R-squared:  0.1705 
F-statistic: 13.75 on 2 and 122 DF,  p-value: 4.129e-06
#Looking at the diagnostic plots for the full model, the line in the residuals vs fitted plot is generally horizontal, which indicates that a linear model is appropriate. Most points on the Normal Q-Q plot are near the reference line. There are some outliers, but the plot indicates a relatively normal distribution of residuals. The scale-location and residuals vs leverage plots also indicate that point 40 is a consistent outlier across all plots. Along with some other points, it is skewing the variance distribution and has high leverage.

#None of the emotional intelligence test scores was significant in the full model (p = 0.246, 0.349, and 0.828), so these were taken out of the model. Age was not significant either (p = 0.637 in the second model), so it was also removed. The adjusted R-squared value of the final model is 0.1705, which means that about 17% of the variation in PB 100k times is explained by elevation gain and weekly distance, after adjusting for the number of predictors.
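
#Instead of judging the three EI scores one p-value at a time, a partial F-test can check whether they improve the model jointly. This is a minimal sketch on simulated data; the data frame sim and its generating values are assumptions made for illustration, with column names chosen to mirror the real ones.

```r
set.seed(1)

# Simulated stand-in for the study data; column names mirror the real ones
n <- 125
sim <- data.frame(
  pb_elev   = runif(n, 0, 9000),
  avg_km    = rnorm(n, 72, 25),
  teique_sf = rnorm(n, 5.2, 0.7),
  steu_b    = rnorm(n, 13, 2),
  stem_b    = rnorm(n, 11.5, 2)
)
sim$pb100k_dec <- 17 + 0.0005 * sim$pb_elev - 0.05 * sim$avg_km + rnorm(n, sd = 3)

reduced <- lm(pb100k_dec ~ pb_elev + avg_km, data = sim)
full    <- lm(pb100k_dec ~ pb_elev + avg_km + teique_sf + steu_b + stem_b, data = sim)

# Partial F-test: do the three EI scores jointly improve the fit?
cmp <- anova(reduced, full)
cmp
```

#A large p-value in the second row would support dropping all three EI scores at once. Both models must be fit to the same rows for the comparison to be valid.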

T-Test

#I will use a T test to see if there is a meaningful difference in PB 100k times between male and female participants. I am using a quantitative variable and seeing if there's a meaningful difference in the means in the 2 groups of my binary categorical variable.

t.test(ur1$pb100k_dec ~ ur1$sex2)

    Welch Two Sample t-test

data:  ur1$pb100k_dec by ur1$sex2
t = -0.39789, df = 91.615, p-value = 0.6916
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
 -1.547003  1.030629
sample estimates:
mean in group female   mean in group male 
            14.60229             14.86048 
#The p-value is larger than 0.05, so we fail to reject the null hypothesis. There is no evidence of a meaningful difference in PB 100k times between male and female participants.
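
#A bootstrap percentile interval is another way to look at the same difference in means without relying on the t distribution. This is a minimal sketch on simulated PB times; the group sizes, means, and standard deviations below are assumptions made for illustration, not the study data.

```r
set.seed(1)

# Simulated PB times for two groups (not the study data; sizes are arbitrary)
female <- rnorm(80, mean = 14.6, sd = 3.5)
male   <- rnorm(45, mean = 14.9, sd = 3.5)

# Resample each group with replacement and record the difference in means
boot_diff <- replicate(2000, {
  mean(sample(female, replace = TRUE)) - mean(sample(male, replace = TRUE))
})

# 95% percentile interval for mean(female) - mean(male)
ci <- quantile(boot_diff, probs = c(0.025, 0.975))
ci
```

#If the interval contains 0, the bootstrap agrees with the t-test that there is no clear difference between the groups.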

ANOVA Test

ur1|>
ggplot(aes(surface, pb100k_dec, color=surface))+
  geom_boxplot(show.legend = TRUE)+
  geom_jitter(show.legend = FALSE, alpha = .4)+
  labs( 
    x = "Surface Type",
    y = "PB 100k Times",
    color = "Surface",
    title = "Distribution of PB 100K Times by Surface")+
  theme_bw()

ggplot(ur1, aes(pb100k_dec, fill = surface))+
  geom_density(alpha=0.3)+
  theme_bw()

surface_table <- ur1 |>
  group_by(pb_surface) |>
  count()
surface_table
# A tibble: 4 × 2
# Groups:   pb_surface [4]
  pb_surface     n
       <dbl> <int>
1          1    93
2          2     2
3          3    14
4          4    16
#There are more than 2 groups in the surfaces variable, so must use ANOVA test. 
#Check if dataset violates basic assumptions for ANOVA test
summarytable <- ur1 |>
  group_by(surface) |>
  summarise(mean_pb100k = mean(pb100k_dec), sd_pb100k = sd(pb100k_dec), median_pb100k = median(pb100k_dec), count = n())
summarytable
# A tibble: 4 × 5
  surface mean_pb100k sd_pb100k median_pb100k count
  <chr>         <dbl>     <dbl>         <dbl> <int>
1 all            13.9      3.41          13.0    16
2 road           11.6      3.28          11.1    14
3 track          12.8      1.66          12.8     2
4 trail          15.3      3.39          14.9    93
#The SD for track is very low compared to the other sd_pb100k values, and the track group has only 2 observations. That sample size is too small for ANOVA, so the non-parametric version, the Kruskal Wallis test, must be used.
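
#The assumption check can be made explicit with the values from summarytable. This is a sketch of one common rule of thumb (largest SD no more than about twice the smallest); the cutoff of 5 observations per group is an assumption chosen for illustration.

```r
# Group standard deviations and counts copied from summarytable
sd_pb100k <- c(all = 3.41, road = 3.28, track = 1.66, trail = 3.39)
count     <- c(all = 16, road = 14, track = 2, trail = 93)

# Rule of thumb: the largest SD should be no more than about twice the smallest
sd_ratio <- max(sd_pb100k) / min(sd_pb100k)
sd_ratio

# Groups too small to assess normality or spread on their own
names(count)[count < 5]
```

#The ratio is just above 2 and the track group is flagged as too small, which supports switching to the Kruskal Wallis test.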
ktest_ur1 <- kruskal.test(pb100k_dec ~ surface, data = ur1)
ktest_ur1

    Kruskal-Wallis rank sum test

data:  pb100k_dec by surface
Kruskal-Wallis chi-squared = 12.786, df = 3, p-value = 0.005122
#Because the p-value is less than 0.05, the null is rejected and a post-hoc test must be performed. Use Dunn test as the appropriate post hoc test for Kruskal Wallis.
dunn.test(ur1$pb100k_dec, ur1$pb_surface)
  Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 12.7863, df = 3, p-value = 0.01

                           Comparison of x by group                            
                                (No adjustment)                                
Col Mean-|
Row Mean |          1          2          3
---------+---------------------------------
       2 |   1.076464
         |     0.1409
         |
       3 |   3.330165   0.245196
         |    0.0004*     0.4032
         |
       4 |   1.433633  -0.508400  -1.548386
         |     0.0758     0.3056     0.0608

alpha = 0.05
Reject Ho if p <= alpha/2
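#The Dunn test compares mean ranks between pairs of groups. The only pair that meets the rejection rule (p <= alpha/2 = 0.025) is groups 1 and 3, with p = 0.0004. Since 1 = trail and 3 = road, there is a meaningful difference in the PB 100k times of those who ran on trail versus those who ran on road. No adjustment for multiple comparisons was applied.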
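
#The output above uses no adjustment for the six pairwise comparisons. A minimal sketch of a Bonferroni adjustment with base R's p.adjust(), applied to the unadjusted p-values copied from the dunn.test output; the comparison labels are added here for readability.

```r
# Unadjusted p-values copied from the dunn.test output
# (surface codes: 1 = trail, 2 = track, 3 = road, 4 = all)
p_raw <- c("track vs trail" = 0.1409,
           "road vs trail"  = 0.0004,
           "road vs track"  = 0.4032,
           "all vs trail"   = 0.0758,
           "all vs track"   = 0.3056,
           "all vs road"    = 0.0608)

# Bonferroni multiplies each p-value by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bonf

# dunn.test reports one-sided p-values, so its rule is: reject if p <= alpha / 2
names(p_bonf)[p_bonf <= 0.05 / 2]
```

#Even after the adjustment, the road versus trail comparison (0.0004 x 6 = 0.0024) stays below 0.025, and it is still the only pair that does.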

Part 3 – The conclusion

11. Write a general conclusion based on the statistical analysis you performed.

#There is no meaningful difference found between PB 100k times in relation to emotional intelligence scores and in relation to sex. There is a meaningful difference found in PB 100k times between participants who ran on trail versus those who ran on road. For the other pairs of surfaces, there is no meaningful difference in PB 100k times.
12. Restate p-values, confidence intervals, and any other important results from your findings.
#The Chi-squared test was used to test for an association between sex and the surface of the personal best 100k. This resulted in a p-value of 0.1478, with a warning that the approximation may be incorrect. To confirm the result, a Fisher test was used, resulting in a p-value of 0.08934. Because both p-values are above 0.05, we fail to reject the null hypothesis. There is no evidence of an association between sex and PB surface.

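#From the T-test of PB 100k times between male and female participants, the p-value was 0.69, which is larger than 0.05, so we fail to reject the null hypothesis. The 95% confidence interval for the difference in means (female minus male) is -1.55 to 1.03 hours, which includes 0. There is no evidence of a meaningful difference in PB 100k times between male and female participants.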

#The ANOVA test was intended to compare PB 100k times across surface types. Because the dataset violates the basic assumptions, the non-parametric Kruskal Wallis test was used. With a p-value of 0.0051, the null is rejected and a post-hoc test must be performed. Using the Dunn test, there is a p-value of 0.0004 between groups 1 and 3, meaning there is a meaningful difference in the PB 100k times of those who ran on trail versus those who ran on road.

#From the initial scatter plot using linear regression lines, it seemed unlikely that there was a correlation between PB 100k times and emotional intelligence.  

#This is further shown with the multiple linear regression model, where none of the three emotional intelligence scores is a significant predictor of PB 100k times (all p-values above 0.2).
13. Write specific conclusions regarding implications of your results (useful to the general public). You can include your own personal opinions here.
#This one is split between 12 and 14.
14. Write your opinion about how the overall statistical analysis went. Was it thorough? Were there pieces you wished to include if you had had that data? Were there questions left unanswered? Were there deficiencies in the original data?
#Personally disappointed in the lack of interesting results, especially on the emotional intelligence end. The difference in PB 100k times between those who ran on road versus trail was also an expected result, since trail paths have a lot more variation, making them more physically and mentally fatiguing. I wish there were more data entries, preferably with fewer NA values, as the sample size was cut down by more than half (from 288 to 125) when removing NA values. I wish that the dataset included other types of athletes, not just ultramarathon runners. I'd like to see if there was a difference between endurance athletes, sprint athletes, and ordinary people.
15. Include a bibliography of all sources
#“APA Dictionary of Psychology.” American Psychological Association, American Psychological Association, dictionary.apa.org/emotional-intelligence. Accessed 4 Apr. 2024.

#Mikolajczak, Moïra. (2010). Going Beyond The Ability-Trait Debate: The Three-Level Model of Emotional Intelligence. E-Journal of Applied Psychology. 5. 10.7790/ejap.v5i2.175.

#Kellmann, Michael, et al. "Recovery and performance in sport: consensus statement." International journal of sports physiology and performance 13.2 (2018): 240-245.