Replace “Your Name” with your actual name.
In this lab, you will apply data transformation techniques, including mean-centering, calculating Z-scores, and performing non-linear transformations on various datasets. Please complete the exercises by filling in the code chunks and answering the interpretation questions. Once completed, knit this document to HTML and submit it as instructed.
Dataset: Simulated data on the number of hours spent studying per week:
Tasks:
1. Calculate the mean of the study hours.
2. Mean-center the dataset by subtracting the mean from each
value.
3. Plot the original and mean-centered study hours on the same
graph.
4. Interpretation: Explain what the mean-centered
values tell you about the amount of time each student spent studying
compared to the average.
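# Calculate the mean of the study hours
mean_study_hours <- mean(study_hours)
mean_study_hours
## [1] 22
# Mean-center the study hours by subtracting the mean from each value
mean_centered <- study_hours - mean_study_hours
mean_centered
## [1] -7 0 -4 3 -2 6 2 -3 1 4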
# Plot the original study hours
# use plot()
# use abline(h = mean(study_hours))
plot(study_hours, type="b", col="blue", main="Original Study Hours", xlab="Index", ylab="Hours")
abline(h = mean_study_hours, col="red", lty=2)
# Plot the mean-centered study hours
# use plot()
# use abline(h = 0)
plot(mean_centered, type="b", col="green", main="Mean-Centered Study Hours", xlab="Index", ylab="Mean-Centered Hours")
abline(h = 0, col="red", lty=2)
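Task 3 asks for both series on the same graph, while the chunks above draw them in separate plots. One possible way to overlay them is sketched below. The `study_hours` values here are an assumption: they are reconstructed by adding the printed mean (22) back to the printed mean-centered values, not copied from the dataset chunk.

```r
# Assumed data: reconstructed from the printed output (mean 22 + centered values)
study_hours <- c(15, 22, 18, 25, 20, 28, 24, 19, 23, 26)
mean_study_hours <- mean(study_hours)
mean_centered <- study_hours - mean_study_hours

# Shared y-axis range so both series fit in one panel
y_range <- range(c(study_hours, mean_centered))

plot(study_hours, type = "b", col = "blue", ylim = y_range,
     main = "Original vs. Mean-Centered Study Hours",
     xlab = "Index", ylab = "Hours")
lines(mean_centered, type = "b", col = "green")

# Dashed reference lines: the mean for the original series, 0 for the centered one
abline(h = mean_study_hours, col = "blue", lty = 2)
abline(h = 0, col = "green", lty = 2)

legend("right", legend = c("Original", "Mean-centered"),
       col = c("blue", "green"), lty = 1, pch = 1, bty = "n")
```

Drawn this way, the two lines have the same shape, and the vertical gap between them equals the mean at every index. Centering shifts the data without changing how the values differ from one another.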
Interpretation:
Dataset: Simulated data on students’ reaction times (in milliseconds):
Tasks:
1. Calculate the mean and standard deviation of the reaction
times.
2. Compute the Z-scores for each reaction time.
3. Plot the Z-scores on a line graph.
4. Interpretation: Discuss what a Z-score greater than
0 or less than 0 indicates about a reaction time relative to the
average.
# Calculate the mean and standard deviation of the reaction times
mean_reaction <- mean(reaction_times)
sd_reaction <- sd(reaction_times)
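mean_reaction
## [1] 377
sd_reaction
## [1] 40.56545
# Compute the Z-scores for each reaction time
z_scores <- (reaction_times - mean_reaction) / sd_reaction
z_scores
## [1] -0.66559107 1.06001542 -1.65165193 0.32046978 -0.17256065 1.79956105
## [7] 0.07395456 -0.91210629 0.56698499 -0.41907586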
# Plot the Z-scores
# abline(h= 0)
plot(z_scores, type="b", col="purple", main="Z-Scores of Reaction Times", xlab="Index", ylab="Z-Score")
abline(h = 0, col="red", lty=2)
Interpretation: A Z-score greater than 0 means the reaction time is above average (slower), while a Z-score less than 0 means the reaction time is below average (faster). The size of the Z-score gives the distance from the mean in standard deviations.
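As a cross-check on the Z-score calculation, the sketch below compares the manual formula with base R's `scale()`. The `reaction_times` values are an assumption: they are reconstructed from the printed mean, standard deviation, and Z-scores rather than copied from the dataset chunk.

```r
# Assumed data: reconstructed from the printed mean, SD, and Z-scores
reaction_times <- c(350, 420, 310, 390, 370, 450, 380, 340, 400, 360)

# Z-scores by hand: distance from the mean in units of standard deviations
z_manual <- (reaction_times - mean(reaction_times)) / sd(reaction_times)

# scale() centers and divides by the sample SD by default; it returns a matrix,
# so convert it back to a plain numeric vector
z_scale <- as.numeric(scale(reaction_times))

# The two approaches should agree
all.equal(z_manual, z_scale)

# Standardized values have mean 0 and SD 1 (up to floating-point error)
mean(z_manual)
sd(z_manual)

# Label each observation relative to the average
data.frame(
  reaction_time = reaction_times,
  z = round(z_manual, 2),
  relative = ifelse(z_manual > 0, "slower than average", "faster than average")
)
```

Because larger reaction times mean slower responses, a positive Z-score here marks a slower-than-average student, which is the reverse of what a positive Z-score would mean for a test score.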
Dataset: Simulated data on annual sales figures (in thousands of dollars):
Tasks:
1. Apply a logarithmic transformation to the sales data.
2. Apply a square root transformation to the sales data.
3. Plot histograms of the original and transformed sales data.
4. Interpretation: Compare the distributions of the
original and transformed data. Explain how each transformation affects
the spread and shape of the data.
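# Apply a logarithmic transformation to the sales data
log_sales <- log(sales)
log_sales
## [1] 5.298317 6.109248 6.551080 7.090077 5.703782 6.684612 7.003065 6.802395
## [9] 5.991465 7.313220
# Apply a square root transformation to the sales data
sqrt_sales <- sqrt(sales)
sqrt_sales
## [1] 14.14214 21.21320 26.45751 34.64102 17.32051 28.28427 33.16625 30.00000
## [9] 20.00000 38.72983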
# Plot histograms of the original and transformed sales data
#use hist()
par(mfrow=c(1,3))
hist(sales, main="Original Sales Data", col="blue", xlab="Sales", breaks=10)
hist(log_sales, main="Log-Transformed Sales", col="green", xlab="Log(Sales)", breaks=10)
hist(sqrt_sales, main="Square Root-Transformed Sales", col="purple", xlab="Sqrt(Sales)", breaks=10)
Interpretation:
Logarithmic Transformation: Reduces the impact of large values and compresses the scale, making highly right-skewed data more normal-like.
Square Root Transformation: Also reduces skewness but to a lesser extent than the logarithmic transformation. It is useful for stabilizing variance while retaining more of the original distribution’s structure.
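The histograms can be backed up with numbers. The sketch below compares skewness and the max-to-min ratio on each scale. The `sales` values are an assumption, reconstructed by squaring the printed square-root values. `skewness()` and `max_min_ratio()` are hypothetical helpers written for this illustration, since base R has no built-in skewness function.

```r
# Assumed data: reconstructed by squaring the printed square-root values
sales <- c(200, 450, 700, 1200, 300, 800, 1100, 900, 400, 1500)
log_sales <- log(sales)     # natural logarithm
sqrt_sales <- sqrt(sales)

# Hypothetical helper: moment-based sample skewness
# (positive = longer right tail, negative = longer left tail, 0 = symmetric)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical helper: how many times larger the maximum is than the minimum
max_min_ratio <- function(x) max(x) / min(x)

data.frame(
  scale = c("original", "sqrt", "log"),
  skewness = c(skewness(sales), skewness(sqrt_sales), skewness(log_sales)),
  max_min_ratio = c(max_min_ratio(sales), max_min_ratio(sqrt_sales),
                    max_min_ratio(log_sales))
)
```

Both transformations pull in the right tail, the log more strongly than the square root. On a sample that is only mildly right-skewed, the log can overshoot and leave the data left-skewed, so it is worth checking the skewness values rather than assuming the log-transformed data are always closer to normal.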
Dataset: Simulated data on daily step counts:
Tasks:
1. Mean-center the step counts.
2. Calculate the Z-scores for the step counts.
3. Plot the original, mean-centered, and Z-scores on separate
graphs.
4. Interpretation: Explain how the combination of
mean-centering and Z-scores helps in understanding the step count data
compared to looking at the original data alone.
# Mean-center the step counts
mean_step <- mean(step_counts, na.rm = TRUE)
step_counts_centered <- step_counts - mean_step
# Calculate the Z-scores for the step counts
sd_step <- sd(step_counts, na.rm = TRUE)
step_counts_z <- (step_counts - mean_step) / sd_step
# Plot the original step counts
hist(step_counts, main = "Original Step Counts", xlab = "Steps", col = "blue")
# Plot the mean-centered step counts
hist(step_counts_centered, main = "Mean-Centered Step Counts", xlab = "Steps (Centered)", col = "green")
# Plot the Z-scores
hist(step_counts_z, main = "Z-Scores of Step Counts", xlab = "Z-Score", col = "red")
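The sketch below is a numeric check of what the three histograms show: centering shifts the data, and Z-scoring also rescales it, but neither changes the shape. The step counts are hypothetical values made up for this illustration (they are not the lab's dataset), with one `NA` included to show why the chunk above passes `na.rm = TRUE`.

```r
# Hypothetical step counts (NOT the lab's dataset), with one missing day
step_counts <- c(4200, 7500, 10300, 5600, NA, 8900, 12100, 6400, 9700, 7300)

mean_step <- mean(step_counts, na.rm = TRUE)
sd_step <- sd(step_counts, na.rm = TRUE)
step_counts_centered <- step_counts - mean_step
step_counts_z <- (step_counts - mean_step) / sd_step

# Centering moves the mean to 0 but leaves the spread (in steps) unchanged
mean(step_counts_centered, na.rm = TRUE)
all.equal(sd(step_counts_centered, na.rm = TRUE), sd_step)

# Z-scores have mean 0 and SD 1
mean(step_counts_z, na.rm = TRUE)
sd(step_counts_z, na.rm = TRUE)

# Z-scores are a linear rescaling of the original values, so the shape of the
# distribution is unchanged: the correlation with the raw data is 1
cor(step_counts, step_counts_z, use = "complete.obs")

# The missing day stays missing after both transformations
step_counts_z[5]
```

Without `na.rm = TRUE`, `mean()` and `sd()` would return `NA` for this vector, and every centered value and Z-score would be `NA` as well.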
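Interpretation: The original plot shows the raw step counts.
The mean-centered data shift the distribution so that its mean is 0 but retain its shape and spread.
The Z-score transformation rescales the data to mean = 0 and SD = 1, expressing each day's count in standard deviations from the mean. It does not change the shape of the distribution, so the Z-scores are normally distributed only if the original step counts are.
Submission Instructions: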
Knit your document to HTML and check that all content displays correctly before submission. Submit the RPubs link to Canvas Assignments.