Because we are interested in how much each individual improves, the same group of people should appear in both conditions. A paired (within-subjects) design directly measures the change in percentage for each subject. With the same sample size, paired designs generally have higher statistical power than between-subjects designs because the error term is smaller.
It also controls for individual differences. Subjects differ in many ways, such as prior knowledge, reading ability, and general test-taking ability. In a between-subjects design, those individual differences add noise to the comparison between groups. In the paired design, however, each student serves as their own control: stable subject-level differences cancel out when we look at the difference scores. It is also simpler than comparing two separate groups, neither of which has experienced both conditions.
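As a rough illustration of this power advantage, the pwr package can be used to compare the sample sizes the two designs would need for the same power. This is only a sketch: the effect size d = 0.6 below is an arbitrary value chosen for comparison, not an estimate from this study, and pwr defines d for the paired case on the difference scores, so the comparison is indicative rather than exact.
library(pwr)

d_assumed <- 0.6   # hypothetical effect size, chosen only for illustration

# Pairs needed for 80% power in a paired (within-subjects) design
pwr.t.test(d = d_assumed, power = 0.80, sig.level = 0.05,
           type = "paired", alternative = "two.sided")

# Subjects needed per group for the same power in a between-subjects design
pwr.t.test(d = d_assumed, power = 0.80, sig.level = 0.05,
           type = "two.sample", alternative = "two.sided")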
Differences in report difficulty between the two phases may affect accuracy. The "pre" and "post" phases use two different sets of safety reports, which may not be perfectly matched in difficulty.
library(pwr)
power_values <- seq(0.10, 0.99, by = 0.01) # power values array
effect_sizes <- array(data = 0, dim = length(power_values)) # all output d values
# For-loop
for (i in 1:length(power_values)) {
  p <- power_values[i]            # extract the i-th power value
  result <- pwr.t.test(
    n = 23,
    power = p,
    sig.level = 0.05,
    type = "paired",
    alternative = "two.sided"
  )
  effect_sizes[i] <- result$d     # store the detectable effect size for this power value
}
# Plot the power curve
plot(power_values, effect_sizes,
     type = "l",
     xlab = "Power",
     ylab = "Effect Size",
     main = "Power Curve")
# Effect size needed for 80% power
pwr.t.test(
  n = 23,
  power = 0.80,
  sig.level = 0.05,
  type = "paired",
  alternative = "two.sided"
)
##
## Paired t test power calculation
##
## n = 23
## d = 0.6112775
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number of *pairs*
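The same function can also be run in the other direction: fixing the sample at the 23 available pairs and a hypothesized effect size, then reading off the achievable power. The d = 0.5 below is a generic medium-effect benchmark, not a value estimated from the study data.
# Achievable power with 23 pairs for a hypothetical medium effect
# (d = 0.5 is an assumed benchmark, not estimated from the study)
pwr.t.test(n = 23, d = 0.5, sig.level = 0.05,
           type = "paired", alternative = "two.sided")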
Difficulty level of the reports
The "pre" and "post" phases use two different sets of safety reports. Even though the assignment randomizes the order of the sets across participants, the two sets may not be perfectly matched in difficulty. If the "Post-Rubric" report set contains simpler or more obvious cases, subjects may appear more accurate after learning the rubric than the rubric alone would warrant. Conversely, if the second set is more difficult, the results may underestimate the effect of the rubric.
The impact of fatigue/time-on-task on results
Participants may experience fatigue or distraction depending on the time of day at which Phase 2 is performed, which could impair their performance and mask the benefit of the rubric (a negative bias). Conversely, if Phase 1 is rushed and performed too quickly, the Phase 1 averages could suffer and any improvement seen in Phase 2 could be exaggerated.
Mitigations: reduce the length of sessions, allow breaks in between, and randomize the order in which participants enter the study. Collect the time spent on each report and include it in the data set as a covariate to help understand time-on-task effects, as sketched below. Conduct sessions at a consistent time each day, if possible.
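As a sketch of how a time-on-task covariate could be analyzed, the simulated example below regresses each participant's improvement on the time they spent. Everything in it is hypothetical, since no timing data were collected in this study; it only illustrates the model that a recorded covariate would feed into.
set.seed(5342)                                          # reproducible simulated example
n_subjects <- 23
time_on_task <- runif(n_subjects, min = 5, max = 20)    # hypothetical minutes per report set
improvement  <- rnorm(n_subjects, mean = 25, sd = 27)   # hypothetical post - pre gains

# If time-on-task were associated with the size of the gain, the slope of this
# regression would be distinguishable from zero
summary(lm(improvement ~ time_on_task))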
Report viewing order not enforced
A potential confound is that the study has no mechanism to ensure that the assignment is completed in the intended order. The final video could be viewed out of order, which would contaminate the pre-rubric numbers. A mechanism is needed to enforce the viewing order.
pre <- c(30,70,40,70,30,30,30,30,70,50,50,50,50,20,70,30,30,60,30,80,40,30,70)
post <- c(100,90,80,50,100,70,40,70,80,70,50,50,50,40,50,60,50,100,100,90,50,100,90)
diff_response<-post - pre
summary(diff_response)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -20.00 10.00 20.00 24.78 40.00 70.00
Above is a statistical summary of the differences between the pre-responses and post-responses in our data. The standard deviation of the differences is shown below.
sd(diff_response)
## [1] 26.946
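From the mean and standard deviation of the differences we can also sketch a standardized effect size for the paired data (Cohen's d for paired differences, the mean difference divided by the standard deviation of the differences), which comes out to roughly 0.9 here.
# Standardized effect size for the paired differences (mean / SD of differences)
d_obs <- mean(diff_response) / sd(diff_response)
d_obs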
The summaries of the individual data groups, pre-response and post-response, are shown below.
summary(pre)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 30.00 40.00 46.09 65.00 80.00
sd(pre)
## [1] 18.27545
The summary for the post-response is shown below:
summary(post)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 40.00 50.00 70.00 70.87 90.00 100.00
sd(post)
## [1] 21.93234
To check the normality of the difference data, we use the Shapiro-Wilk test. Since the obtained p-value (0.074) is greater than 0.05, we fail to reject the null hypothesis of normality and can treat the differences as approximately normal.
shapiro.test(diff_response)
##
## Shapiro-Wilk normality test
##
## data: diff_response
## W = 0.92204, p-value = 0.07369
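Because this p-value is not far above 0.05, a distribution-free check is a reasonable precaution. The Wilcoxon signed-rank test below is an optional robustness sketch rather than part of the original analysis plan; it does not assume normality of the differences.
# Nonparametric robustness check: Wilcoxon signed-rank test on the paired data
# (a warning about ties is expected because the scores are coarse multiples of 10)
wilcox.test(post, pre, paired = TRUE, alternative = "two.sided")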
Also, shown below is a histogram of the difference data
# Create a histogram
hist(diff_response,
     main = "Histogram of Response Differences",
     xlab = "Values",
     col = "darkgreen",
     border = "white")
Additionally, we include NPP (normal probability) plots of our data to check whether the data are approximately normal.
# Q-Q plot of the differences
qqnorm(diff_response, ylab = "Differences of Responses", main = "Normal Probability Dist. of Differences")
qqline(diff_response, col = "darkgreen", lwd = 2)
# Q-Q plot of the pre-responses
qqnorm(pre, ylab = "Pre-Response", main = "Normal Probability Dist. of Pre-Responses")
qqline(pre, col = "blue", lwd = 2)
# Q-Q plot of the post-responses
qqnorm(post, ylab = "Post-Response", main = "Normal Probability Dist. of Post-Responses")
qqline(post, col = "red", lwd = 2)
In all of the NPP plots the points fall approximately along a straight line, so the data appear approximately normal.
Our box plots show a clear difference between the pre- and post-responses, with a noticeably higher center for the post-responses.
# Check for variance graphically with side-by-side box plots
boxplot(pre, post,
        names = c("Pre-Responses", "Post-Responses"),    # labels under each box
        main = "Comparison of Pre and Post Responses",   # main title
        col = c("darkblue", "red"),                      # color for each box
        ylab = "Score")                                  # label for y-axis
H0: μ1 = μ2. There is no difference in the mean percentage of correct classifications before and after using the rubric.
Ha: μ1 ≠ μ2. There is a difference in the mean percentage of correct classifications before and after using the rubric.
t.test(pre, post, alternative = "two.sided", paired = TRUE, conf.level = 0.95)
##
## Paired t-test
##
## data: pre and post
## t = -4.4108, df = 22, p-value = 0.0002212
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -36.43493 -13.13028
## sample estimates:
## mean difference
## -24.78261
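Equivalently, the paired t-test is just a one-sample t-test on the difference scores, so the same result can be reproduced directly from diff_response as a cross-check; the sign flips because diff_response is post minus pre.
# Same test expressed as a one-sample t-test on the differences (post - pre),
# so t = +4.41 and the confidence interval lies on the positive side
t.test(diff_response, mu = 0, alternative = "two.sided", conf.level = 0.95)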
Since the p-value is less than 0.05, we reject the null hypothesis that there is no difference in mean performance. The following interpretation expounds on this conclusion.
Based on the results of the paired t-test, the hypothesis that introducing the rubric improved the average accuracy of classifying proactive safety reports according to HRO hallmarks was supported (p < .05). This provides evidence that implementing a rubric positively influences participants' ability to correctly classify proactive safety reports into the HRO hallmark categories.
The consistent increase in post-rubric performance across the majority of participants suggests that the rubric clarified ambiguities and allowed participants to make better decisions when classifying proactive safety reports.
The assumption prior to analysis of the data was that structured guidance would improve performance but not eradicate all classification errors, and thus a moderate effect size was anticipated.
The observed effect size indicates that the rubric had a moderate to large impact on mean performance, as reflected in the average difference between pre-rubric and post-rubric scores. The actual effect was larger than anticipated, likely because the rubric provided more structured guidance and clarity than originally expected.
According to the initial power analysis, roughly 40 subjects would give the study sufficient power (>80%) to detect a moderate effect size. Given the significant result and the magnitude of the observed effect, the achieved power is likely higher than the power initially estimated. The study design and sample size therefore appear to have been sufficient to reliably demonstrate the rubric's effect on classification accuracy.
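A quick sketch supports that statement: plugging the observed standardized difference (about 0.9, from the mean and SD of the difference scores) back into the same pwr function gives the approximate power achieved with 23 pairs. This is an informal check, not a formal post hoc power analysis.
# Approximate achieved power for 23 pairs at the observed effect size
d_obs <- mean(post - pre) / sd(post - pre)   # same value computed earlier (about 0.92)
pwr.t.test(n = 23, d = d_obs, sig.level = 0.05,
           type = "paired", alternative = "two.sided")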
Statistical significance indicates that the observed improvement is unlikely to be due to chance, while practical significance means the improvement in classification accuracy is meaningful, with potential implications for safety management and organizational learning.
For example, the improved consistency in classifications may lead to:
More accurate identification of systemic safety issues
More targeted corrective actions
Better alignment with HRO principles
In conclusion, the results can be considered both statistically significant and practically significant. Thus, the adoption of the rubric for use in opportunity reporting systems is supported both statistically and operationally.
# IE5342 Project
# Power Analysis
library(pwr)
power_values <- seq(0.10, 0.99, by = 0.01) # power values array
effect_sizes <- array(data = 0, dim = length(power_values)) # all output d values
# For-loop
for (i in 1:length(power_values)) {
  p <- power_values[i]            # extract the i-th power value
  result <- pwr.t.test(
    n = 23,
    power = p,
    sig.level = 0.05,
    type = "paired",
    alternative = "two.sided"
  )
  effect_sizes[i] <- result$d     # store the detectable effect size for this power value
}
# Plot the power curve
plot(power_values, effect_sizes,
     type = "l",
     xlab = "Power",
     ylab = "Effect Size",
     main = "Power Curve")
# Effect size needed for 80% power
pwr.t.test(
  n = 23,
  power = 0.80,
  sig.level = 0.05,
  type = "paired",
  alternative = "two.sided"
)
# Here is our data
pre <- c(30,70,40,70,30,30,30,30,70,50,50,50,50,20,70,30,30,60,30,80,40,30,70)
post <- c(100,90,80,50,100,70,40,70,80,70,50,50,50,40,50,60,50,100,100,90,50,100,90)
# Descriptive Statistics
#Calculate the differences and do the summary statistics
diff_response<-post - pre
summary(diff_response)
sd(diff_response)
#Summary of Pre Data
summary(pre)
sd(pre)
#Summary of Post Data
summary(post)
sd(post)
# Use the Shapiro test to check for Normality of Differences
shapiro.test(diff_response)
# Create a histogram
hist(diff_response,
     main = "Histogram of Response Differences",
     xlab = "Values",
     col = "darkgreen",
     border = "white")
## NPP plots for Pre and Post Data
# Q-Q plot of the differences
qqnorm(diff_response, ylab = "Differences of Responses", main = "Normal Probability Dist. of Differences")
qqline(diff_response, col = "darkgreen", lwd = 2)
# Q-Q plot of the pre-responses
qqnorm(pre, ylab = "Pre-Response", main = "Normal Probability Dist. of Pre-Responses")
qqline(pre, col = "blue", lwd = 2)
# Q-Q plot of the post-responses
qqnorm(post, ylab = "Post-Response", main = "Normal Probability Dist. of Post-Responses")
qqline(post, col = "red", lwd = 2)
# Check for variance graphically with side-by-side box plots
boxplot(pre, post,
        names = c("Pre-Responses", "Post-Responses"),    # labels under each box
        main = "Comparison of Pre and Post Responses",   # main title
        col = c("darkblue", "red"),                      # color for each box
        ylab = "Score")                                  # label for y-axis
# Paired t-test
## Testing the hypothesis with a paired t-test
t.test(pre, post, alternative = "two.sided", paired = TRUE, conf.level = 0.95)