Replace “Your Name” with your actual name.
This lab assignment aims to reinforce your understanding of descriptive statistics, calculating probabilities, and identifying sample spaces. You will apply these concepts to practical problems using R.
Task: Given a dataset of reaction times, calculate the mean, median, mode, variance, standard deviation, and identify any outliers.
Dataset: Use the following reaction times (in milliseconds) for the analysis: c(250, 340, 295, 305, 285, 330, 365, 300, 310, 290, 295, 285, 360, 370, 275, 325, 335, 350, 280, 290).
Instructions:
Calculate the mean, median, and mode.
Calculate the variance and standard deviation.
Identify any outliers using the IQR method.
Write the R code to perform these calculations and interpret the results.
# Sample data vector
reaction_times <- c(250, 340, 295, 305, 285, 330, 365, 300, 310, 290, 295, 285, 360, 370, 275, 325, 335, 350, 280, 290)
# Calculate mean
mean(reaction_times)
## [1] 311.75
# Calculate median
median(reaction_times)
## [1] 302.5
# Calculate mode (on ties, returns the first most frequent value in data order)
get_mode <- function(x) {
  uniqv <- unique(x)
  uniqv[which.max(tabulate(match(x, uniqv)))]
}
get_mode(reaction_times)
## [1] 295
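The helper above reports only the first value that reaches the highest count, so ties are hidden. A minimal sketch of a variant that returns every tied value follows; the name `get_all_modes` is hypothetical and not part of the original lab.

```r
# Same data as the lab's reaction_times vector
reaction_times <- c(250, 340, 295, 305, 285, 330, 365, 300, 310, 290,
                    295, 285, 360, 370, 275, 325, 335, 350, 280, 290)

# Hypothetical helper: return every value that shares the highest frequency
get_all_modes <- function(x) {
  counts <- table(x)                                # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])  # keep all values tied for the top count
}

modes <- get_all_modes(reaction_times)
print(modes)
```

On this dataset 285, 290, and 295 each occur twice, so the data are multimodal and 295 is simply the first tied value in the order the data were entered.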
# Calculate variance
var(reaction_times)
## [1] 1113.882
# Calculate standard deviation
sd(reaction_times)
## [1] 33.37486
The variance is 1113.88. This value is in squared units (ms²) and is difficult to interpret on its own. Taking its square root gives the standard deviation, 33.37 ms, which is in the same units as the data. Reaction times typically fall within about 33 ms of the mean (312 ms): 14 of the 20 observations lie between roughly 278 ms and 345 ms.
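The one-standard-deviation interval described above can be checked directly. This is a small sketch; the variable names (`m`, `s`, `within_one_sd`) are illustrative and not taken from the lab.

```r
reaction_times <- c(250, 340, 295, 305, 285, 330, 365, 300, 310, 290,
                    295, 285, 360, 370, 275, 325, 335, 350, 280, 290)

m <- mean(reaction_times)   # sample mean
s <- sd(reaction_times)     # sample standard deviation (n - 1 denominator)

lower <- m - s
upper <- m + s

# Share of observations within one standard deviation of the mean
within_one_sd <- mean(reaction_times >= lower & reaction_times <= upper)

print(round(c(lower = lower, upper = upper), 2))
print(within_one_sd)
```

For roughly bell-shaped data this share is usually near 68%, which is a rule of thumb rather than a guarantee for a sample of 20.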
# Adding an outlier
reaction_times_outlier <- c(250, 340, 295, 305, 285, 330, 365, 300, 310, 290, 295, 285, 360, 370, 275, 325, 335, 350, 280, 290, 900)
# Calculate quartiles and other statistics
Q1 <- quantile(reaction_times_outlier, 0.25)
print(paste("Quantile 1:", Q1))
## [1] "Quantile 1: 290"
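Q3 <- quantile(reaction_times_outlier, 0.75)
print(paste("Quantile 3:", Q3))
## [1] "Quantile 3: 340"
IQR_value <- Q3 - Q1
print(paste("Quantile 3 - Quantile 1:", IQR_value))
## [1] "Quantile 3 - Quantile 1: 50"
median_val <- median(reaction_times_outlier)
print(paste("Median:", median_val))
## [1] "Median: 305"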
The formulas Q1 - 1.5 × IQR and Q3 + 1.5 × IQR establish the boundaries for identifying outliers in a dataset using what statisticians call the “1.5 × IQR rule.” Any data points that fall below the lower bound or above the upper bound are considered outliers.
Starting with the Box: The box in a boxplot represents the middle 50% of your data (from Q1 to Q3).
Extending Beyond the Box: We want to determine how far beyond the box a value can go before we consider it unusual.
The Multiplier (1.5): The factor 1.5 is a statistical convention that sets the maximum length of the “whiskers” on the boxplot.
Each whisker extends to the most extreme data point that lies within 1.5 times the IQR of its quartile. This creates a reasonable range for “normal” data.
The choice of 1.5 as the multiplier is based on statistical theory and practical experience:
Statistical Properties: Under a normal distribution, approximately 99.3% of the data falls within these bounds. This means only about 0.7% of values would be flagged as outliers in normally distributed data.
Balance: The value 1.5 provides a good balance between:
Being too sensitive (flagging too many points as outliers)
Being too lenient (missing actual outliers)
Historical Precedent: John Tukey, who developed the boxplot, adopted 1.5 as a rule of thumb that works well across many different types of data.
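The 99.3% figure quoted above can be reproduced from the standard normal distribution. This is a sketch for checking the claim, with illustrative variable names; it is not part of the original lab code.

```r
# Quartiles of the standard normal distribution (mean 0, sd 1)
q1 <- qnorm(0.25)
q3 <- qnorm(0.75)
iqr <- q3 - q1

# Fences from the 1.5 x IQR rule, expressed in standard deviations
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Probability that a normally distributed value lands between the fences
inside <- pnorm(upper_fence) - pnorm(lower_fence)

print(round(c(lower_fence, upper_fence), 3))
print(round(inside, 4))
```

The fences sit at about ±2.7 standard deviations, so roughly 0.7% of normally distributed values fall outside them.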
lower_bound <- Q1 - 1.5 * IQR_value
upper_bound <- Q3 + 1.5 * IQR_value
print(paste("Lower Bound:", lower_bound))
## [1] "Lower Bound: 215"
print(paste("Upper Bound:", upper_bound))
## [1] "Upper Bound: 415"
# Create a boxplot
boxplot(reaction_times_outlier,
        main = "Reaction Times with Labeled Quantiles",
        ylab = "Time (ms)",
        ylim = c(min(reaction_times_outlier, lower_bound) - 50,
                 max(reaction_times_outlier, upper_bound) + 50),
        outline = TRUE,
        col = "lightblue")
# Add horizontal dashed lines for Q1 and Q3 to connect these lines across the box
abline(h = Q1, col = "darkblue", lty = 2)
abline(h = Q3, col = "darkblue", lty = 2)
# Add annotations for quantiles
text(x = 1.3, y = Q1, labels = paste("Q1 =", Q1), pos = 4, col = "darkblue")
text(x = 1.2, y = median_val, labels = paste("Median =", median_val), pos = 4, col = "darkgreen")
text(x = 1.3, y = Q3, labels = paste("Q3 =", Q3), pos = 4, col = "darkblue")
text(x = 1.2, y = lower_bound, labels = paste("Lower bound =", round(lower_bound, 1)), pos = 4, col = "red")
text(x = 1.2, y = upper_bound, labels = paste("Upper bound =", round(upper_bound, 1)), pos = 4, col = "red")
# Add a line for the IQR within the box
segments(0.8, Q1, 0.8, Q3, col = "purple", lwd = 3)
text(x = 0.6, y = (Q1 + Q3) / 2, labels = paste("IQR =", IQR_value), pos = 2, col = "purple")
# Highlight outliers
outliers <- reaction_times_outlier[reaction_times_outlier > upper_bound | reaction_times_outlier < lower_bound]
if (length(outliers) > 0) {
  text(x = 1, y = outliers, labels = "Outlier", pos = 4, col = "red", cex = 0.9)
}
# Connect the annotation texts for Q1 and Q3 with the boxplot using line segments
# For Q1
segments(x0 = 1.05, y0 = Q1, x1 = 1, y1 = Q1, col = "darkblue", lwd = 2, lty = 2)
# For Q3
segments(x0 = 1.05, y0 = Q3, x1 = 1, y1 = Q3, col = "darkblue", lwd = 2, lty = 2)
# Print the flagged outliers
outliers
## [1] 900
Quartiles are useful for identifying outliers because they split the sorted data into four equal-sized “chunks.” From Q1, Q3, and the IQR we can calculate the lower and upper bounds (215 ms and 415 ms here), which let us flag values that are extremely low or extremely high. In this example, the only outlier is 900 ms.
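As a cross-check on the manual bounds, base R's `boxplot.stats()` applies the same 1.5 × IQR convention through its `coef` argument, which defaults to 1.5. It uses Tukey's hinges rather than `quantile()`, so the two approaches can differ slightly on other datasets. A short sketch:

```r
reaction_times_outlier <- c(250, 340, 295, 305, 285, 330, 365, 300, 310, 290,
                            295, 285, 360, 370, 275, 325, 335, 350, 280, 290, 900)

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker
# out:   points beyond coef * IQR from the hinges
bp <- boxplot.stats(reaction_times_outlier, coef = 1.5)

print(bp$stats)
print(bp$out)
```

Here the hinges coincide with the quartiles computed earlier (290 and 340), and the single flagged value matches the manual result.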
Task: Assume reaction times in a cognitive task follow a normal distribution with a mean of 300 ms and a standard deviation of 50 ms. Calculate the probability that a randomly selected individual has a reaction time:
Less than 250 ms.
Between 250 ms and 350 ms.
More than 400 ms.
Instructions:
Use the pnorm function in R to calculate these probabilities.
Write the R code for these calculations and explain what each probability means.
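# Distribution parameters
mean <- 300
sd <- 50
# Probability of a reaction time less than 250 ms
pnorm(250, mean, sd)
## [1] 0.1586553
# Probability of a reaction time between 250 ms and 350 ms
pnorm(350, mean, sd) - pnorm(250, mean, sd)
## [1] 0.6826895
# Probability of a reaction time more than 400 ms
1 - pnorm(400, mean, sd)
## [1] 0.02275013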
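The probability of a reaction time less than 250 ms is about 16%, so roughly 1 in 6 individuals would respond that quickly. The probability of a reaction time between 250 ms and 350 ms is about 68%, the familiar share of a normal distribution within one standard deviation of the mean. The probability of a reaction time greater than 400 ms is about 2%, so responses that slow (more than two standard deviations above the mean) are rare.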
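The same probabilities can be obtained by first converting each cutoff to a z-score, z = (x - mean) / sd, and then using the standard normal distribution. The sketch below uses illustrative names (`mu`, `sigma`, `z_250`, and so on) that are not part of the original lab.

```r
mu <- 300     # assumed population mean (ms), from the task statement
sigma <- 50   # assumed population standard deviation (ms)

# Standardize each cutoff
z_250 <- (250 - mu) / sigma   # -1
z_350 <- (350 - mu) / sigma   #  1
z_400 <- (400 - mu) / sigma   #  2

p_less_250 <- pnorm(z_250)
p_between  <- pnorm(z_350) - pnorm(z_250)
p_more_400 <- pnorm(z_400, lower.tail = FALSE)  # upper-tail probability

print(round(c(p_less_250, p_between, p_more_400), 4))
```

Using `lower.tail = FALSE` is equivalent to `1 - pnorm(...)` and tends to be more accurate numerically for probabilities far out in the tail.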
Task: You conducted a small-scale study with 8 participants measuring their anxiety levels on a scale of 1 to 10. Calculate the probability of a t-score being less than 2 and between -1 and 1 using the t-distribution.
Instructions:
Define the degrees of freedom for your study.
Use the pt function in R to calculate these probabilities.
Write the R code for these calculations and discuss how the results might differ if a normal distribution were assumed.
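# Degrees of freedom: n - 1 = 8 - 1 = 7
# Probability of a t-score less than 2
pt(2, df = 7)
## [1] 0.9571903
# Probability of a t-score between -1 and 1
pt(1, df = 7) - pt(-1, df = 7)
## [1] 0.6493833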
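With 8 participants there are 7 degrees of freedom (n - 1). The probability of a t-score less than 2 is about 96%, and the probability of a t-score between -1 and 1 is about 65%. Under a standard normal distribution the same probabilities would be about 98% and 68%, because the t-distribution has heavier tails when the sample is small.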
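To see how the results differ under a normal assumption, the two distributions can be compared side by side. This is a sketch with illustrative names (`df_study`, `comparison`), assuming 7 degrees of freedom from the 8 participants.

```r
df_study <- 7  # n - 1 for 8 participants

t_probs <- c(less_than_2 = pt(2, df_study),
             between_pm1 = pt(1, df_study) - pt(-1, df_study))

z_probs <- c(less_than_2 = pnorm(2),
             between_pm1 = pnorm(1) - pnorm(-1))

# One row per distribution, one column per probability
comparison <- rbind(t_dist = t_probs, normal = z_probs)
print(round(comparison, 4))
```

Both normal probabilities are larger because the t-distribution places more of its mass in the tails; the gap shrinks as the degrees of freedom increase.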
Be sure to knit your document to PDF and check that all content displays correctly before submitting. Submit the PDF to Canvas Assignments.