Project 2

2024-05-01

Introduction

Engineering is a complex and demanding field that requires strong interpersonal skills, effective stress management, and a positive mood to ensure success. In this project, we aim to explore the relationship between emotional intelligence and the performance of teams in an engineering project.

Understanding the Study

The following article introduces the data and helps to understand the reason for the study.

EPOJ Study

The study utilizes data collected to gain a better understanding of how the emotional intelligence of individual team members relates to the performance of their team during an engineering project (Leicht). Prior to the start of the course, all students completed an emotional intelligence test. This test yielded three results: an interpersonal score, a stress management score, and a mood score. The students were split into 23 groups for a group project, and the average grades within a group were recorded.

Variables Included

Variables

data <- read.csv("TEAMPERF.csv", header=TRUE)
head(data)

##   IntraPers StressMan Mood Rating Project
## 1        14        12   17   High    88.0
## 2        21        13   45   High    86.0
## 3        26        18    6   High    83.5
## 4        30        20   36   High    85.5
## 5        28        23   22   High    90.0
## 6        27        24   28   High    90.5

The data includes 23 observations and five variables: three independent variables and two dependent variables. The three independent variables are all continuous variables, while the two dependent variables are categorical and continuous, respectively. The categorical variable is the rating of the project, which could be “High,” “Overrated,” or “HardWork.” The continuous variable is the mean project score.

Data Plots

library(s20x)
library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.3.3

data <- read.csv("TEAMPERF.csv", header=TRUE)
stress <- data[1:23, c(2,5)]
pairs20x(stress)

g = ggplot(stress, aes(x=StressMan, y=Project)) + geom_point()
g = g + geom_smooth(method = "loess")
g

## `geom_smooth()` using formula = 'y ~ x'

These plots examine the relationship between the stress management score and the average project grade. No clear pattern stands out immediately, so more analysis is needed to determine whether the two are correlated.

How was the data collected?

To achieve this goal, we will analyze data collected from undergraduate students enrolled in the course, Introduction to the Building Industry, who participated in the study. Each student completed an emotional intelligence test, which included measures of interpersonal skills, stress management, and mood.

The students were then divided into 23 teams, with each team assigned a group project. The individual project scores were averaged to obtain the dependent variable in the analysis, the mean project score. Three independent variables were determined for each team: the range of interpersonal scores, the range of stress management scores, and the range of mood scores.

Why was it gathered

The data was gathered to determine if there are correlations between the emotional intelligence of a group and the score the group ends up receiving. Groups with different characteristics tend to accomplish tasks in different ways, so the data was gathered to find if there is an ideal group emotional intelligence.

My interest in the data?

I am interested in this data specifically because it relates directly to me. I am an engineering student who works in different teams and am constantly analyzing the performance of my team. It leaves me wondering whether certain characteristics of the individuals within a group correlate directly with team performance.

Problem I wish to solve

The problem we aim to solve is to determine how emotional intelligence, specifically stress management, affects the performance of teams in an engineering project. By understanding the relationship between stress management and team performance, we can identify the factors that contribute to success in engineering projects and develop strategies to enhance team performance.

Theory of SLR

I want to build a model relating stress management levels to the average project grade. I will define project grade as my dependent variable and stress management as my independent variable. A Simple Linear Regression (SLR) model suits this analysis because it explicitly accounts for random variation around the trend. SLR assumes that the mean value of y at each value of x falls on a straight line, and that the deviation of any point from the line (above or below) equals the random error \(\epsilon\). This statement is represented as: \[ y_i = \beta_0 + \beta_1x_i+\epsilon_i\] where \(\beta_0\) and \(\beta_1\) are unknown. \(\beta_0+\beta_1x_i\) is the mean value of y for a given \(x_i\), and \(\epsilon_i\) is the random error. Because some points deviate from the line, some will lie above it (positive deviation) and some below it (negative deviation), with \(E(\epsilon_i)=0\). This makes the mean value of y: \[ \begin{align} E(y_i) &= E(\beta_0+\beta_1x_i+\epsilon_i) \\ &= \beta_0+\beta_1x_i+E(\epsilon_i) \\ &= \beta_0+\beta_1x_i \end{align} \]

Thus, the mean value of y for any given value of x will be represented by \(E(Y|x)\) and will graph as a straight line, with a y-intercept of \(\beta_0\) and a slope of \(\beta_1\).
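To make the model concrete, the following sketch simulates data from a straight-line model with normal errors and checks that `lm()` recovers the line. The parameter values (`beta0 = 84`, `beta1 = 0.12`, `sigma = 3`), the range of `x`, and the sample size are illustrative assumptions, not estimates taken from TEAMPERF.csv.

```r
# Illustrative simulation; every numeric value here is an assumption.
set.seed(1)
n     <- 500
beta0 <- 84
beta1 <- 0.12
sigma <- 3

x   <- runif(n, min = 10, max = 50)      # hypothetical predictor values
eps <- rnorm(n, mean = 0, sd = sigma)    # random error with E(eps) = 0
y   <- beta0 + beta1 * x + eps           # y_i = beta0 + beta1 * x_i + eps_i

sim.lm <- lm(y ~ x)
coef(sim.lm)                             # estimates should land near 84 and 0.12
```

Individual points scatter around the line, but the fitted intercept and slope approximate the values used to generate the data, which is what \(E(Y|x) = \beta_0 + \beta_1x\) describes.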

To fit an SLR model to my data set, I first need to estimate the parameters \(\beta_0\) and \(\beta_1\). The validity of these estimates depends on the sampling distributions of the estimators, which in turn depend on the probability distribution of \(\epsilon\). I therefore make the following assumptions about \(\epsilon\):

the mean of the probability distribution of \(\epsilon\) = 0;
the variance of the probability distribution of \(\epsilon\) is constant for all values of the independent variable - for a straight line model this means \(V(\epsilon)\) is a constant for all values of x;
the probability distribution of \(\epsilon\) is normal;
the errors associated with different observations are independent of one another.
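As a sketch of how \(\beta_0\) and \(\beta_1\) are estimated, the code below applies the usual least squares formulas by hand and compares them with `lm()`. It uses only the six rows printed by `head(data)` earlier, so the numbers will differ from the full 23-team fit. The names `Sxy`, `Sxx`, `b0`, `b1`, and `fit6` are introduced here for illustration.

```r
# The six rows shown by head(data); this is not the full data set.
StressMan <- c(12, 13, 18, 20, 23, 24)
Project   <- c(88.0, 86.0, 83.5, 85.5, 90.0, 90.5)

# Least squares formulas: slope = Sxy / Sxx, intercept = ybar - slope * xbar
Sxy <- sum((StressMan - mean(StressMan)) * (Project - mean(Project)))
Sxx <- sum((StressMan - mean(StressMan))^2)
b1  <- Sxy / Sxx
b0  <- mean(Project) - b1 * mean(StressMan)

# lm() should reproduce the same two numbers
fit6 <- lm(Project ~ StressMan)
c(b0 = b0, b1 = b1)
coef(fit6)
```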

Main Result 1

Is there a correlation between stress management and the average grade the group ends up receiving?

Main Result 2

If so, how much of an impact does stress management have on the grades?

Validity with mathematical expressions

Straight Line Trend

stress.lm = lm(Project~StressMan, data=stress)
plot(Project~StressMan,bg="Blue",pch=21,cex=1.2,
     ylim=c(70,1.1*max(Project)),xlim=c(0,1.1*max(StressMan)), 
     main="Trend of Project Average vs Stress Management Levels", data=stress)
abline(stress.lm)

Errors Distributed Normally - Shapiro Wilk

normcheck(stress.lm, shapiro.wilk = TRUE)

A W statistic of 0.96 is close to 1, which indicates that the residuals are closely aligned with a normal distribution. The p-value of 0.595 is greater than 0.05, so the null hypothesis that the errors are normally distributed cannot be rejected.
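The same kind of check can be sketched with base R's `shapiro.test()` applied to the residuals of a fitted model. I am assuming this corresponds to the statistic that `normcheck()` reports. The data below are simulated stand-ins, not TEAMPERF.csv, and the names `fit` and `sw` are illustrative.

```r
# Simulated stand-in data; the values are assumptions for illustration only.
set.seed(2)
x   <- runif(23, min = 10, max = 50)
y   <- 84 + 0.12 * x + rnorm(23, sd = 3)
fit <- lm(y ~ x)

sw <- shapiro.test(resid(fit))   # H0: the residuals come from a normal distribution
sw$statistic                     # W; values near 1 are consistent with normality
sw$p.value                       # fail to reject H0 when this exceeds 0.05
```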

Constant Variance

Residual vs fitted values

model <- lm(Project ~ StressMan, data=data)
fitted_values <- fitted(model)
residuals <- resid(model)
plot(fitted_values, residuals, main="Residuals vs Fitted Values",
     xlab="Fitted Values", ylab="Residuals", pch=20, col="blue")
abline(h=0, col="red")

trendscatter on Residual Vs Fitted

# Scatter plot with the fitted line and a vertical segment for each residual
plot(Project~StressMan,bg="Blue",pch=21,cex=1.2,
     ylim=c(70,1.1*max(Project)),xlim=c(0,1.1*max(StressMan)),
     main="Residuals of Project Average vs Stress Management Levels", data=stress)
abline(stress.lm)
# Fitted value for each team (yhat is reused in a later plot)
yhat=with(stress,predict(stress.lm,data.frame(StressMan)))
# Segment from each observed point to its fitted value on the line
with(stress,segments(StressMan,Project,StressMan,yhat))

Zero mean value of \(\epsilon\)

model <- lm(Project ~ StressMan, data=data)
residuals <- resid(model)
mean_residuals <- mean(residuals)
print(mean_residuals)

## [1] 1.424288e-16

The mean of the residuals is about 1.4e-16, which is 0 up to floating-point rounding error, consistent with the assumption that \(E(\epsilon)=0\).
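The reason this value is numerically zero can be seen from the least squares criterion itself. In this short derivation I write \(\hat{\epsilon}_i\) for the residuals, a symbol introduced here and not used elsewhere in this report. The estimates minimise the sum of squared errors, and setting the derivative with respect to \(\hat{\beta}_0\) to zero forces the residuals to sum to zero:

\[ \begin{align} SSE &= \sum_{i=1}^{n}(y_i-\hat{\beta}_0-\hat{\beta}_1x_i)^2 \\ \frac{\partial SSE}{\partial \hat{\beta}_0} &= -2\sum_{i=1}^{n}(y_i-\hat{\beta}_0-\hat{\beta}_1x_i) = 0 \\ \Rightarrow \sum_{i=1}^{n}\hat{\epsilon}_i &= 0 \end{align} \]

So for any least squares fit that includes an intercept, the residual mean is zero by construction. This check confirms that the fit was computed correctly rather than testing the assumption itself.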

Independence of data

By randomly assigning students to teams, the study design ensures that the team compositions are not influenced by the individual characteristics of the students. Although students received individual scores, these were aggregated to compute a mean project score for each team. This aggregation reduces the impact that any single student’s score could have on the overall analysis, promoting independence between the observations.

Analysis of the data

Trends of the data

# Scatter plot with the fitted line and a horizontal line at the mean of Project
plot(Project~StressMan,bg="Blue",pch=21,cex=1.2,
             ylim=c(70,1.1*max(Project)),xlim=c(0,1.1*max(StressMan)),
             main="Mean of Project Average vs Stress Management Levels", data=stress)
abline(stress.lm)
with(stress, abline(h=mean(Project)))
# Red segments: deviation of each fitted value from the overall mean
with(stress, segments(StressMan,mean(Project),StressMan,yhat,col="Red"))

library(s20x)
trendscatter(Project~StressMan, f = 0.5, data = stress, main="Project Average vs Stress Management Level")

Summary lm object

stress.lm = lm(Project~StressMan, data=stress)
summary(stress.lm)

## 
## Call:
## lm(formula = Project ~ StressMan, data = stress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9475 -2.5962 -0.1625  2.6349  6.8943 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 84.21943    2.17094  38.794   <2e-16 ***
## StressMan    0.12026    0.08054   1.493     0.15    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.197 on 21 degrees of freedom
## Multiple R-squared:  0.09597,    Adjusted R-squared:  0.05293 
## F-statistic: 2.229 on 1 and 21 DF,  p-value: 0.1503

This yields \[ \begin{align} \hat{\beta}_0 &= 84.21943 \\ \hat{\beta}_1 &= 0.12026 \end{align} \]

Interpretation of Tests

The t-value is 1.493 with a corresponding p-value of 0.15. A p-value greater than 0.05 typically indicates that the results are not statistically significant at the 5% significance level, suggesting that the evidence is insufficient to conclude that stress management has a significant impact on project scores.
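The t-value and p-value can be reproduced by hand from the coefficient table. This is a small sketch using the rounded numbers printed by `summary(stress.lm)`, so the last digits may differ slightly. The variable names are illustrative.

```r
# Values copied from the summary(stress.lm) output above.
estimate <- 0.12026
std.err  <- 0.08054
df       <- 21                          # residual degrees of freedom (23 - 2)

t.value <- estimate / std.err           # test statistic for H0: beta1 = 0
p.value <- 2 * pt(-abs(t.value), df)    # two-sided p-value
round(c(t = t.value, p = p.value), 4)
```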

Interpretation of Multiple R Squared

The value of multiple R-squared is about 0.096. This indicates that about 9.6% of the variance in the project scores can be explained by the stress management scores alone. Because this value is so low, stress management on its own is not a strong predictor of project score.
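With a single predictor, \(R^2\) is tied directly to the test statistics in the summary output: \(F = t^2\) and \(R^2 = F/(F + df)\), where df is the residual degrees of freedom. The sketch below checks this with the rounded values printed above. The variable names are illustrative.

```r
# Values copied from the summary(stress.lm) output above.
t.value <- 1.493
df      <- 21

f.stat    <- t.value^2                 # with one predictor, F = t^2
r.squared <- f.stat / (f.stat + df)    # R^2 = F / (F + residual df)
round(c(F = f.stat, R2 = r.squared), 4)
```

This is why a non-significant slope and a low \(R^2\) go together here: both are computed from the same ratio of explained to unexplained variation.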

Interpretation of point estimates

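The intercept estimate is 84.21943, which is the expected mean project score for a team whose stress management value is 0. The slope estimate is about 0.12, so each one-point increase in the stress management value is associated with an increase of about 0.12 points in the expected mean project score. The sign is positive, but as the t-test above shows, the slope is not significantly different from 0.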

Calculate CIs for Beta

Predict

Model 1:

amount = predict(stress.lm, data.frame(StressMan=c(50,35,20)))
amount

##        1        2        3 
## 90.23240 88.42851 86.62462

Model 2:

quad.lm=lm(Project~StressMan + I(StressMan^2),data=stress)
amount2 = predict(quad.lm, data.frame(StressMan=c(50,35,20)))
amount2

##        1        2        3 
## 92.77714 88.56188 86.42782

The predictions made using model 2 (quadratic) are larger than those from model 1 (linear) at StressMan = 50 (92.78 vs 90.23) and slightly larger at 35 (88.56 vs 88.43), but slightly smaller at 20 (86.43 vs 86.62).

ciReg

ciReg(stress.lm, conf.level = 0.95, print.out=TRUE)

##             95 % C.I.lower    95 % C.I.upper
## (Intercept)       79.70472          88.73414
## StressMan         -0.04724           0.28776
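These intervals follow the usual form estimate ± t critical value × standard error. The sketch below rebuilds them from the rounded values in the summary output, assuming that is how `ciReg()` computes them. The variable names are illustrative.

```r
# Values copied from the summary(stress.lm) output above.
est <- c(Intercept = 84.21943, StressMan = 0.12026)
se  <- c(Intercept = 2.17094,  StressMan = 0.08054)
df  <- 21

t.crit <- qt(0.975, df)                 # critical value for a 95% interval
ci <- cbind(lower = est - t.crit * se,
            upper = est + t.crit * se)
round(ci, 5)
```

The interval for the StressMan slope runs from a negative to a positive value, so it contains 0. This agrees with the p-value of 0.15: the data are compatible with no linear effect.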

Outliers

cooks20x(stress.lm)

Cook's distance for this linear model indicates that observations 7, 12, and 16 have values large enough that they could be influential points or outliers.
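One way to follow up on such points is to compute the distances with base R's `cooks.distance()` and refit the model without the flagged observations. This is a sketch on simulated stand-in data with one planted unusual value. The 4/n cutoff is a common rule of thumb that I am assuming here; it is not necessarily the criterion used by `cooks20x()`.

```r
# Simulated stand-in data; the values and the 4/n cutoff are assumptions.
set.seed(3)
x <- runif(23, min = 10, max = 50)
y <- 84 + 0.12 * x + rnorm(23, sd = 3)
y[7] <- y[7] + 30                      # plant one unusual observation

fit <- lm(y ~ x)
cd  <- cooks.distance(fit)             # one distance per observation
flagged <- which(cd > 4 / length(cd))  # rule-of-thumb cutoff
flagged

# Refit without the flagged rows to see how much the coefficients move
keep  <- setdiff(seq_along(cd), flagged)
refit <- lm(y ~ x, subset = keep)
rbind(all = coef(fit), dropped = coef(refit))
```

If the coefficients change noticeably after dropping the flagged rows, those observations are influential and deserve a closer look before they are kept or removed.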

Conclusion

As shown above, our goal was to determine whether stress management levels have a direct correlation with the project scores recorded in the “TEAMPERF.csv” data set. While we analyzed only this one study, there is much more published research on the topic. This is an important topic because stress is something everyone encounters. It is important to recognize techniques that help individuals manage their stress, leading to healthier habits.

The Research Question and Results

Using the data in the “TEAMPERF.csv” file, our goal was to determine whether the data show a direct relationship between stress management levels and average project scores. The stress management value in the file is the range of stress management scores within a particular engineering team, and the score is the average project score of that team. The statistical analysis of this data was not conclusive regarding a correlation between stress management levels and average project scores. One of the main pieces of information that reinforces my conclusion is the \(R^2\) value: the linear model had an \(R^2\) of only about 0.096, which is very far from 1. The slope's p-value of 0.15 and its 95% confidence interval of (-0.047, 0.288), which contains 0, point the same way. Therefore, the data do not support the conclusion that the range of stress management levels within a team has a direct correlation with overall project grades.

Potential Improvements

I believe that this analysis could be improved in the future by increasing the sample size, since 23 teams give little power to detect a weak trend. Another improvement would be to investigate the observations flagged by Cook's distance and check whether the conclusions change when they are excluded, which may help separate the signal from the noise. Many questions remain about whether a wider range in stress management levels leads to lower or higher grade averages.