Replace “Your Name” with your actual name.
In this lab, you will apply data transformation techniques, including mean-centering, calculating Z-scores, and performing non-linear transformations on various datasets. Please complete the exercises by filling in the code chunks and answering the interpretation questions. Once completed, knit this document to HTML and submit it as instructed.
Dataset: Simulated data on the number of hours spent studying per week:
Tasks:
1. Calculate the mean of the study hours.
2. Mean-center the dataset by subtracting the mean from each
value.
3. Plot the original and mean-centered study hours on the same
graph.
4. Interpretation: Explain what the mean-centered
values tell you about the amount of time each student spent studying
compared to the average.
## [1] 22
## [1] -7 0 -4 3 -2 6 2 -3 1 4
## [1] 15 22 18 25 20 28 24 19 23 26
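The output above shows only the results, not the chunk that produced them. A minimal sketch that would reproduce these values, assuming the data vector is the one printed above and reusing the names `study_hours`, `mean_study_hours`, and `centered_study_hours` that appear in the plotting code:

```r
# Study hours per week, copied from the printed output above
study_hours <- c(15, 22, 18, 25, 20, 28, 24, 19, 23, 26)

# Task 1: mean of the study hours
mean_study_hours <- mean(study_hours)

# Task 2: mean-center by subtracting the mean from each value
centered_study_hours <- study_hours - mean_study_hours

mean_study_hours
centered_study_hours
study_hours
```

After centering, the values sum to zero, so their mean is 0 by construction.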
# Plot the original study hours
# use plot()
# use abline(h = mean(study_hours))
plot(study_hours, col = "blue", pch = 2)
abline(h = mean_study_hours)
# Plot the mean-centered study hours
# use plot()
# use abline(h = 0)
plot(centered_study_hours, col = "red", pch = 3)
abline(h = 0)
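Task 3 asks for both series on the same graph, whereas the two `plot()` calls above draw separate figures. One possible way to overlay them, sketched here with a shared y-axis range (the axis labels, line types, and legend position are illustrative choices, not part of the original lab):

```r
study_hours <- c(15, 22, 18, 25, 20, 28, 24, 19, 23, 26)
centered_study_hours <- study_hours - mean(study_hours)

# Shared y-axis range so both series fit in one panel
y_range <- range(c(study_hours, centered_study_hours))

# Original values first, then the centered values added with points()
plot(study_hours, col = "blue", pch = 2, ylim = y_range,
     xlab = "Student", ylab = "Hours per week")
points(centered_study_hours, col = "red", pch = 3)

# Reference lines: the original mean, and 0 (the mean after centering)
abline(h = mean(study_hours), col = "blue", lty = 2)
abline(h = 0, col = "red", lty = 2)

legend("right", legend = c("Original", "Mean-centered"),
       col = c("blue", "red"), pch = c(2, 3))
```

Both series have the same shape; centering only shifts every point down by the mean.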
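Interpretation: Each mean-centered value shows how far a student's weekly study time falls from the average of 22 hours. Positive values mean the student studied more than average, negative values mean less, and 0 means exactly average. For example, the first student (15 hours) has a centered value of -7, so they studied 7 hours less than the mean. The advantage of mean-centering is that individual study hours can be compared to the mean at a glance.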
Dataset: Simulated data on students’ reaction times (in milliseconds):
Tasks:
1. Calculate the mean and standard deviation of the reaction
times.
2. Compute the Z-scores for each reaction time.
3. Plot the Z-scores on a line graph.
4. Interpretation: Discuss what a Z-score greater than
0 or less than 0 indicates about a reaction time relative to the
average.
## [1] 377
## [1] 40.56545
## [1] 350 420 310 390 370 450 380 340 400 360
## [,1]
## [1,] -0.66559107
## [2,] 1.06001542
## [3,] -1.65165193
## [4,] 0.32046978
## [5,] -0.17256065
## [6,] 1.79956105
## [7,] 0.07395456
## [8,] -0.91210629
## [9,] 0.56698499
## [10,] -0.41907586
## attr(,"scaled:center")
## [1] 377
## attr(,"scaled:scale")
## [1] 40.56545
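The `scaled:center` and `scaled:scale` attributes in the output suggest the Z-scores were computed with `scale()`. A minimal sketch that computes them both by hand and with `scale()`, assuming the data vector printed above; the names `reaction_times`, `mean_rt`, `sd_rt`, `z_manual`, and `z_scaled` are illustrative and may differ from the original chunk:

```r
# Reaction times in milliseconds, copied from the printed output above
reaction_times <- c(350, 420, 310, 390, 370, 450, 380, 340, 400, 360)

mean_rt <- mean(reaction_times)
sd_rt <- sd(reaction_times)   # sample standard deviation (n - 1 denominator)

# Manual Z-scores: z = (x - mean) / sd
z_manual <- (reaction_times - mean_rt) / sd_rt

# scale() returns the same values as a one-column matrix with attributes
z_scaled <- scale(reaction_times)

# Check that the two approaches agree
all.equal(z_manual, as.numeric(z_scaled))

# Task 3: line graph of the Z-scores, with a reference line at the mean (z = 0)
plot(z_manual, type = "b", xlab = "Participant", ylab = "Z-score")
abline(h = 0, lty = 2)
```

Wrapping the result in `as.numeric()` drops the matrix shape and attributes when a plain vector is easier to work with.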
Interpretation: For reaction time, a positive Z-score indicates a slower-than-average reaction time and a negative Z-score indicates a faster-than-average reaction time. For example, given Z-scores of 1.5 and -0.8, the participant with the Z-score of -0.8 responded faster: their response time was 0.8 standard deviations below the mean. The participant with a Z-score of 1.5 responded slower, with a response time 1.5 standard deviations above the mean.
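To make the hypothetical Z-scores of 1.5 and -0.8 concrete, they can be converted back to milliseconds using the mean and standard deviation reported in the output above. This is only an illustrative sketch; the names `z_examples` and `raw_ms` are not from the original lab:

```r
mean_rt <- 377       # mean reaction time from the output above
sd_rt <- 40.56545    # standard deviation from the output above

# Invert the Z-score formula: x = mean + z * sd
z_examples <- c(1.5, -0.8)
raw_ms <- mean_rt + z_examples * sd_rt

round(raw_ms, 1)
# → 437.8 344.5
```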
Dataset: Simulated data on annual sales figures (in thousands of dollars):
Tasks:
1. Apply a logarithmic transformation to the sales data.
2. Apply a square root transformation to the sales data.
3. Plot histograms of the original and transformed sales data.
4. Interpretation: Compare the distributions of the
original and transformed data. Explain how each transformation affects
the spread and shape of the data.
## [1] 200 450 700 1200 300 800 1100 900 400 1500
## [1] 5.298317 6.109248 6.551080 7.090077 5.703782 6.684612 7.003065 6.802395
## [9] 5.991465 7.313220
## [1] 200 450 700 1200 300 800 1100 900 400 1500
## [1] 14.14214 21.21320 26.45751 34.64102 17.32051 28.28427 33.16625 30.00000
## [9] 20.00000 38.72983
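The printed values above are consistent with the natural log and the square root of the sales vector. A minimal sketch of the three tasks, assuming that data vector; the names `sales`, `log_sales`, and `sqrt_sales` and the side-by-side layout are illustrative choices:

```r
# Annual sales in thousands of dollars, copied from the printed output above
sales <- c(200, 450, 700, 1200, 300, 800, 1100, 900, 400, 1500)

# Tasks 1 and 2: non-linear transformations
log_sales <- log(sales)     # natural logarithm
sqrt_sales <- sqrt(sales)

# Task 3: three histograms side by side
old_par <- par(mfrow = c(1, 3))
hist(sales, main = "Original", xlab = "Sales (thousands of dollars)")
hist(log_sales, main = "Log", xlab = "log(sales)")
hist(sqrt_sales, main = "Square root", xlab = "sqrt(sales)")
par(old_par)   # restore the previous plotting layout
```

With only ten observations the histogram shapes depend heavily on the bin breaks, so they are best read as a rough guide.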
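Interpretation: The original data had a mild positive (right) skew.
Logarithmic Transformation: The log transformation is the more severe of the two. It reduced the right skew but over-corrected, leaving the data skewed in the opposite direction. It also compresses the spread the most, because large values are pulled in far more than small ones.
Square Root Transformation: The square root transformation is less extreme than the log transformation, and it did a good job of making the distribution more symmetric.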
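The comparison of skew can also be checked numerically rather than by eye. Base R has no built-in skewness function, so the sketch below defines a hypothetical helper, `skewness()`, using the simple moment-based formula (third central moment divided by the second central moment raised to the power 1.5). Other definitions with small-sample corrections exist and would give slightly different numbers:

```r
# Hypothetical helper (not part of base R): moment-based sample skewness
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

sales <- c(200, 450, 700, 1200, 300, 800, 1100, 900, 400, 1500)

# Skewness of the original, log-transformed, and square-root-transformed data
skew_values <- sapply(
  list(original = sales, log = log(sales), sqrt = sqrt(sales)),
  skewness
)
skew_values
```

A positive value indicates a right skew, a negative value a left skew, and a value near 0 a roughly symmetric distribution.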
Dataset: Simulated data on daily step counts:
Tasks:
1. Mean-center the step counts.
2. Calculate the Z-scores for the step counts.
3. Plot the original, mean-centered, and Z-scores on separate
graphs.
4. Interpretation: Explain how the combination of
mean-centering and Z-scores helps in understanding the step count data
compared to looking at the original data alone.
## [1] 8000 10500 9200 11500 10000 12500 11000 9500 10200 12000
## [1] -2440 60 -1240 1060 -440 2060 560 -940 -240 1560
## [1] 8000 10500 9200 11500 10000 12500 11000 9500 10200 12000
## [,1]
## [1,] -1.78888109
## [2,] 0.04398888
## [3,] -0.90910351
## [4,] 0.77713687
## [5,] -0.32258511
## [6,] 1.51028486
## [7,] 0.41056287
## [8,] -0.68915911
## [9,] -0.17595552
## [10,] 1.14371086
## attr(,"scaled:center")
## [1] 10440
## attr(,"scaled:scale")
## [1] 1363.981
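A minimal sketch that would reproduce the values above and draw the three separate graphs, assuming the data vector printed in the output; the names `step_counts`, `centered_steps`, and `z_steps` are illustrative and may differ from the original chunk:

```r
# Daily step counts, copied from the printed output above
step_counts <- c(8000, 10500, 9200, 11500, 10000,
                 12500, 11000, 9500, 10200, 12000)

# Task 1: mean-center
centered_steps <- step_counts - mean(step_counts)

# Task 2: Z-scores (centered values divided by the sample standard deviation)
z_steps <- centered_steps / sd(step_counts)

# Task 3: original, mean-centered, and Z-scores on separate graphs
old_par <- par(mfrow = c(1, 3))
plot(step_counts, main = "Original", ylab = "Steps")
abline(h = mean(step_counts), lty = 2)
plot(centered_steps, main = "Mean-centered", ylab = "Steps from mean")
abline(h = 0, lty = 2)
plot(z_steps, main = "Z-scores", ylab = "Standard deviations from mean")
abline(h = 0, lty = 2)
par(old_par)   # restore the previous plotting layout
```

The three panels show the same pattern of points; only the y-axis changes, from steps, to steps from the mean, to standard deviations from the mean.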
Interpretation: Mean-centering is useful because we can quickly see who takes more or fewer daily steps than the average of 10,440, and the values stay in the original units (steps). Z-scores are useful because they express each value in standard deviation units, which helps identify outliers and allows comparisons with variables measured on other scales.
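As a rough illustration of using Z-scores to screen for outliers, the sketch below flags any step count more than 2 standard deviations from the mean. The cutoff of 2 is an assumed convention, not something specified in this lab (3 is also commonly used), and the names `threshold` and `outlier_idx` are illustrative:

```r
step_counts <- c(8000, 10500, 9200, 11500, 10000,
                 12500, 11000, 9500, 10200, 12000)
z_steps <- (step_counts - mean(step_counts)) / sd(step_counts)

# Assumed cutoff: flag values more than 2 standard deviations from the mean
threshold <- 2
outlier_idx <- which(abs(z_steps) > threshold)

# Step counts flagged as potential outliers (empty if none exceed the cutoff)
step_counts[outlier_idx]
```

In this dataset the most extreme Z-score is about -1.79, so nothing is flagged at this cutoff.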
Submission Instructions:
Knit your document to HTML and check that all content displays correctly before submitting. Submit the RPubs link to Canvas Assignments.