Replace “Your Name” with your actual name.

Objective:

In this lab, you will apply data transformation techniques, including mean-centering, calculating Z-scores, and performing non-linear transformations on various datasets. Please complete the exercises by filling in the code chunks and answering the interpretation questions. Once completed, knit this document to HTML and submit it as instructed.

Exercise 1: Mean-Centering

Dataset: Simulated data on the number of hours spent studying per week:

Tasks:
1. Calculate the mean of the study hours.
2. Mean-center the dataset by subtracting the mean from each value.
3. Plot the original and mean-centered study hours on the same graph.
4. Interpretation: Explain what the mean-centered values tell you about the amount of time each student spent studying compared to the average.

study_hours <- c(15, 22, 18, 25, 20, 28, 24, 19, 23, 26)
# Calculate the mean of the study hours
mean_study_hours <- mean(study_hours)
mean_study_hours
## [1] 22
# Mean-center the study hours
mean_centered <- study_hours - mean_study_hours
mean_centered
##  [1] -7  0 -4  3 -2  6  2 -3  1  4
# Plot the original study hours
# use plot() 
# use abline(h = mean(study_hours))
plot(study_hours, type="b", col="blue", main="Original Study Hours", xlab="Index", ylab="Hours")
abline(h = mean_study_hours, col="red", lty=2)

# Plot the mean-centered study hours
# use plot() 
# use abline(h = 0)
plot(mean_centered, type="b", col="green", main="Mean-Centered Study Hours", xlab="Index", ylab="Mean-Centered Hours")
abline(h = 0, col="red", lty=2)
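
Task 3 asks for both series on the same graph, while the two plots above are drawn separately. One possible way to overlay them is sketched below. The shared y-axis limits, the combined title, and the legend placement are illustrative choices, not requirements of the lab.

```r
# Overlay the original and mean-centered study hours on one graph.
study_hours <- c(15, 22, 18, 25, 20, 28, 24, 19, 23, 26)
mean_centered <- study_hours - mean(study_hours)

# The y-axis must span both series, so derive the limits from their combined range.
y_limits <- range(c(study_hours, mean_centered))

plot(study_hours, type = "b", col = "blue", ylim = y_limits,
     main = "Original vs. Mean-Centered Study Hours",
     xlab = "Index", ylab = "Hours")
lines(mean_centered, type = "b", col = "green")

# Dashed reference lines: the original mean and zero (the mean after centering).
abline(h = mean(study_hours), col = "blue", lty = 2)
abline(h = 0, col = "green", lty = 2)

# Legend placement is an arbitrary choice for this sketch.
legend("right", legend = c("Original", "Mean-centered"),
       col = c("blue", "green"), lty = 1, pch = 1)
```

Drawn this way, the two lines have the same shape. The mean-centered series is the original shifted down by the mean, so each dashed reference line sits in the same position relative to its own series.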

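Interpretation: Each mean-centered value is a student's distance from the class average of 22 hours per week. Positive values mark students who studied more than average (for example, 6 for the student with 28 hours), negative values mark students who studied less (-7 for the student with 15 hours), and 0 marks a student who studied exactly the average amount.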

Exercise 2: Calculating Z-Scores

Dataset: Simulated data on students’ reaction times (in milliseconds):

Tasks:
1. Calculate the mean and standard deviation of the reaction times.
2. Compute the Z-scores for each reaction time.
3. Plot the Z-scores on a line graph.
4. Interpretation: Discuss what a Z-score greater than 0 or less than 0 indicates about a reaction time relative to the average.

reaction_times <- c(350, 420, 310, 390, 370, 450, 380, 340, 400, 360)
# Calculate the mean and standard deviation of the reaction times
mean_reaction <- mean(reaction_times)
sd_reaction <- sd(reaction_times)
mean_reaction
## [1] 377
sd_reaction
## [1] 40.56545
# Compute the Z-scores
z_scores <- (reaction_times - mean_reaction) / sd_reaction
z_scores
##  [1] -0.66559107  1.06001542 -1.65165193  0.32046978 -0.17256065  1.79956105
##  [7]  0.07395456 -0.91210629  0.56698499 -0.41907586
# Plot the Z-scores
# abline(h= 0)
plot(z_scores, type="b", col="purple", main="Z-Scores of Reaction Times", xlab="Index", ylab="Z-Score")
abline(h = 0, col="red", lty=2)

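Interpretation: A Z-score greater than 0 means the reaction time is above the mean of 377 ms (slower than average), while a Z-score less than 0 means it is below the mean (faster than average). The magnitude gives the distance from the mean in standard deviations: the slowest time, 450 ms, has a Z-score of about 1.80.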
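
As a cross-check on the manual calculation, base R's `scale()` can compute the same standardization. The sketch below compares the two approaches. The variable names `z_manual` and `z_scaled` are introduced here for illustration only.

```r
reaction_times <- c(350, 420, 310, 390, 370, 450, 380, 340, 400, 360)

# Manual Z-scores, as in the exercise (sd() uses the n - 1 denominator).
z_manual <- (reaction_times - mean(reaction_times)) / sd(reaction_times)

# scale() centers and divides by the sample SD by default; it returns a
# one-column matrix, so as.numeric() flattens it back to a plain vector.
z_scaled <- as.numeric(scale(reaction_times))

# TRUE when the two methods agree up to floating-point tolerance.
all.equal(z_manual, z_scaled)

# Indices of the reaction times that are slower than average (Z > 0).
which(z_manual > 0)
```

Using `scale()` saves a line of arithmetic, but writing the formula out makes the centering and scaling steps explicit.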

Exercise 3: Non-Linear Transformations

Dataset: Simulated data on annual sales figures (in thousands of dollars):

Tasks:
1. Apply a logarithmic transformation to the sales data.
2. Apply a square root transformation to the sales data.
3. Plot histograms of the original and transformed sales data.
4. Interpretation: Compare the distributions of the original and transformed data. Explain how each transformation affects the spread and shape of the data.

sales <- c(200, 450, 700, 1200, 300, 800, 1100, 900, 400, 1500)
# Apply a logarithmic transformation
log_sales <- log(sales)
log_sales
##  [1] 5.298317 6.109248 6.551080 7.090077 5.703782 6.684612 7.003065 6.802395
##  [9] 5.991465 7.313220
# Apply a square root transformation
sqrt_sales <- sqrt(sales)
sqrt_sales
##  [1] 14.14214 21.21320 26.45751 34.64102 17.32051 28.28427 33.16625 30.00000
##  [9] 20.00000 38.72983
# Plot histograms of the original and transformed sales data
#use hist()
par(mfrow=c(1,3))
hist(sales, main="Original Sales Data", col="blue", xlab="Sales", breaks=10)
hist(log_sales, main="Log-Transformed Sales", col="green", xlab="Log(Sales)", breaks=10)
hist(sqrt_sales, main="Square Root-Transformed Sales", col="purple", xlab="Sqrt(Sales)", breaks=10)

Interpretation:

  • Logarithmic Transformation: Compresses large values much more than small ones, which reduces the impact of large values and makes right-skewed data more symmetric and normal-like.

  • Square Root Transformation: Also reduces skewness but to a lesser extent than the logarithmic transformation. It is useful for stabilizing variance while retaining more of the original distribution’s structure.
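
With only ten observations, histograms are a coarse guide to shape, so it may help to put a number on the skewness at each scale. The sketch below does this with a moment-based sample skewness. `sample_skewness()` is a hypothetical helper written for this example, not a base R function, and other skewness estimators would give slightly different values.

```r
sales <- c(200, 450, 700, 1200, 300, 800, 1100, 900, 400, 1500)

# Hypothetical helper (not part of base R): moment-based sample skewness.
# Positive values indicate a longer right tail, negative a longer left tail.
sample_skewness <- function(x) {
  deviations <- x - mean(x)
  mean(deviations^3) / mean(deviations^2)^1.5
}

# Skewness on the original scale and after each transformation.
skew_by_scale <- c(
  original = sample_skewness(sales),
  sqrt     = sample_skewness(sqrt(sales)),
  log      = sample_skewness(log(sales))
)
round(skew_by_scale, 3)
```

A value near 0 suggests a roughly symmetric sample. If a transformed series comes out with negative skewness, the transformation has overcorrected a mild right skew into a left skew, which can happen when the original data are only slightly skewed.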

Exercise 4: Combining Transformations

Dataset: Simulated data on daily step counts:

Tasks:
1. Mean-center the step counts.
2. Calculate the Z-scores for the step counts.
3. Plot the original, mean-centered, and Z-scores on separate graphs.
4. Interpretation: Explain how the combination of mean-centering and Z-scores helps in understanding the step count data compared to looking at the original data alone.

step_counts <- c(8000, 10500, 9200, 11500, 10000, 12500, 11000, 9500, 10200, 12000)
# Mean-center the step counts
mean_step <- mean(step_counts, na.rm = TRUE)
step_counts_centered <- step_counts - mean_step
# Calculate the Z-scores for the step counts
sd_step <- sd(step_counts, na.rm = TRUE)
step_counts_z <- (step_counts - mean_step) / sd_step
# Plot the original
hist(step_counts, main = "Original Step Counts", xlab = "Steps", col = "blue")

# Plot the mean-centered step counts
hist(step_counts_centered, main = "Mean-Centered Step Counts", xlab = "Steps (Centered)", col = "green")

# Plot the Z-scores
hist(step_counts_z, main = "Z-Scores of Step Counts", xlab = "Z-Score", col = "red")
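
The histograms can be complemented with a quick numerical check of what each transformation changes. The sketch below recomputes the transformed values so that it runs on its own. It is one possible set of checks, not a required part of the lab.

```r
step_counts <- c(8000, 10500, 9200, 11500, 10000, 12500, 11000, 9500, 10200, 12000)
step_counts_centered <- step_counts - mean(step_counts)
step_counts_z <- step_counts_centered / sd(step_counts)

# Centering moves the mean to 0 but leaves the spread untouched.
c(mean = mean(step_counts_centered), sd = sd(step_counts_centered))
isTRUE(all.equal(sd(step_counts_centered), sd(step_counts)))

# Z-scoring additionally rescales the spread to an SD of 1.
c(mean = mean(step_counts_z), sd = sd(step_counts_z))

# Both are linear transformations, so the ordering and relative spacing
# of the observations (the shape of the distribution) are unchanged.
cor(step_counts, step_counts_z)
```

Because the correlation with the original data is perfect, the three histograms differ only in their x-axis scale, apart from any differences in how `hist()` chooses bin boundaries.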

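Interpretation: The original plot shows the raw step counts, which range from 8,000 to 12,500 around a mean of 10,440 steps.

The mean-centered data shift the distribution so that its mean is 0 but retain its shape and spread; each value is the number of steps above or below the average.

The Z-score transformation also rescales the data so that the mean is 0 and the SD is 1, expressing each day's count as a number of standard deviations from the average. It does not change the shape of the distribution, so the Z-scores follow a standard normal distribution only if the original step counts were normally distributed.

Submission Instructions: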

Be sure to knit your document to HTML and check that all content displays correctly before submitting. Submit the RPubs link to Canvas Assignments.