1. Read in “HW1_Data.dat”, from the course webpage.
Name the data frame “d”. Use one of the functions we’ve learned to inspect the data.
d <- read.csv("C:/Users/wvillano/downloads/HW1_Data.dat", sep = "\t", stringsAsFactors = FALSE)
head(d)
## SubId Condition PreTest PostCon PostPerc PostProc
## 1 1 1 57 83 76 87
## 2 2 1 65 86 76 90
## 3 3 1 60 77 85 85
## 4 4 1 65 82 81 92
## 5 5 1 64 80 77 91
## 6 6 1 59 89 76 83
b. Check that your new variable is indeed mean-centered (include in your answer the code you could use).
# if this variable is properly mean-centered, ...
# the mean of this variable should approximate 0 (in this case, it is very close to 0)
if (round(mean(d$PreTest_meanCentered),2) == 0) {
print("Variable is mean-centered")
} else {
print("Variable is not mean-centered")
}
## [1] "Variable is mean-centered"
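The step that creates PreTest_meanCentered is not shown above. As a minimal sketch of how such a variable could be built and checked, the example below uses a small made-up data frame (the name `toy` and its values are assumptions for illustration, not the homework data). It uses `all.equal()`, which compares with a numeric tolerance, as an alternative to rounding the mean by hand.

```r
# toy data frame standing in for d (values are made up for illustration)
toy <- data.frame(PreTest = c(57, 65, 60, 65, 64, 59))

# mean-centering: subtract the sample mean from every score
toy$PreTest_meanCentered <- toy$PreTest - mean(toy$PreTest)

# the mean of a centered variable is 0 up to floating-point error;
# all.equal() tolerates that error instead of requiring exact equality
is_centered <- isTRUE(all.equal(mean(toy$PreTest_meanCentered), 0))

# centering shifts the scores but leaves their spread unchanged
same_spread <- isTRUE(all.equal(sd(toy$PreTest_meanCentered), sd(toy$PreTest)))

print(c(is_centered = is_centered, same_spread = same_spread))
```

The second check is a useful companion to the first: a centered variable should keep the standard deviation of the raw variable, which matches the identical sd (4.06) reported for PreTest and PreTest_meanCentered in the describe() output.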
4. Average the posttest measures to create a single global measure of overall learning
a. Create a variable that is the average of the three posttest variables (ignore missing data). Name this variable “d$MeanPostTest”.
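# select the three post-test columns by name; grep("Post", ...) would also match
# "MeanPostTest" and "PostCon_standardized" if this line were re-run later
d$MeanPostTest <- rowMeans(d[, c("PostCon", "PostPerc", "PostProc")], na.rm = TRUE)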
b. What are the mean and standard deviation of this average post-test score?
mean(d$MeanPostTest) # 73.92778
## [1] 73.92778
sd(d$MeanPostTest) # 9.73705
## [1] 9.73705
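To illustrate what "ignore missing data" means for the row average in 4a, here is a small sketch on made-up scores (the data frame `toy` and its values are hypothetical). With `na.rm = TRUE`, `rowMeans()` averages only the scores that are present in each row; without it, any row containing an `NA` yields `NA`.

```r
# made-up scores; the second participant is missing PostCon
toy <- data.frame(
  PostCon  = c(80, NA),
  PostPerc = c(70, 60),
  PostProc = c(90, 80)
)

post_cols <- c("PostCon", "PostPerc", "PostProc")

# na.rm = TRUE: average over the available scores in each row
avg_ignoring_na <- rowMeans(toy[, post_cols], na.rm = TRUE)

# default (na.rm = FALSE): a single missing score makes the row mean NA
avg_default <- rowMeans(toy[, post_cols])

print(avg_ignoring_na)
print(avg_default)
```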
c. Generate descriptive statistics of this average post-test score for each of the experimental conditions.
d$Condition[which(d$Condition == 0)] <- "Teach"
d$Condition[which(d$Condition == 1)] <- "Explore"
for (i in unique(d$Condition)) {
print(paste("Mean post-test score descriptive statistics for experimental condition:",i))
print(describe(d$MeanPostTest[which(d$Condition == i)]))
}
## [1] "Mean post-test score descriptive statistics for experimental condition: Explore"
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 30 83.36 1.93 83 83.35 2.22 80 87 7 0.07 -1.07 0.35
## [1] "Mean post-test score descriptive statistics for experimental condition: Teach"
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 30 64.5 2.29 64.83 64.46 2.47 60.67 71 10.33 0.41 0.25
## se
## X1 0.42
5. Standardizing PostCon score
a. Create a standardized variable for the conceptual understanding measure.
d$PostCon_standardized <- (d$PostCon - mean(d$PostCon)) / sd(d$PostCon)
b. How can you check your work to make sure this is the standardized score? Do it.
# if this variable is properly standardized, the mean of this variable should approximate 0, ...
# and the standard deviation should approximate 1.
if (round(mean(d$PostCon_standardized),2) == 0 && round(sd(d$PostCon_standardized),2) == 1) {
print("Variable is standardized")
} else {
print("Variable is not standardized")
}
## [1] "Variable is standardized"
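Another way to check the standardized score is to compare the manual formula against base R's `scale()`, which centers a variable and divides by its sample standard deviation by default. The sketch below does this on made-up scores (the data frame `toy` and its values are assumptions for illustration).

```r
# made-up conceptual understanding scores
toy <- data.frame(PostCon = c(83, 86, 77, 82, 80, 89))

# manual z-score, same formula as in 5a
z_manual <- (toy$PostCon - mean(toy$PostCon)) / sd(toy$PostCon)

# scale() returns a one-column matrix; as.numeric() drops it to a plain vector
z_scale <- as.numeric(scale(toy$PostCon))

# both approaches should agree up to floating-point error
same_scores <- isTRUE(all.equal(z_manual, z_scale))
print(same_scores)
```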
c. You have now created several new variables. Write out a command that shows all the variables.
print(colnames(d))
## [1] "SubId" "Condition" "PreTest"
## [4] "PostCon" "PostPerc" "PostProc"
## [7] "PreTest_meanCentered" "MeanPostTest" "PostCon_standardized"
d. Generate descriptive stats for all variables in the d data frame.
describe(d)
## Warning in describe(d): NAs introduced by coercion
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## vars n mean sd median trimmed mad min max
## SubId 1 60 30.50 17.46 30.50 30.50 22.24 1.00 60.00
## Condition* 2 60 NaN NA NA NaN NA Inf -Inf
## PreTest 3 60 60.87 4.06 61.00 61.02 4.45 51.00 68.00
## PostCon 4 60 70.07 10.89 72.00 70.00 14.83 52.00 89.00
## PostPerc 5 60 64.52 16.03 64.00 64.48 22.98 40.00 86.00
## PostProc 6 60 87.20 4.89 87.00 87.17 4.45 75.00 99.00
## PreTest_meanCentered 7 60 0.00 4.06 0.13 0.15 4.45 -9.87 7.13
## MeanPostTest 8 60 73.93 9.74 75.50 73.97 14.08 60.67 87.00
## PostCon_standardized 9 60 0.00 1.00 0.18 -0.01 1.36 -1.66 1.74
## range skew kurtosis se
## SubId 59.00 0.00 -1.26 2.25
## Condition* -Inf NA NA NA
## PreTest 17.00 -0.29 -0.63 0.52
## PostCon 37.00 0.03 -1.56 1.41
## PostPerc 46.00 0.01 -1.89 2.07
## PostProc 24.00 0.00 -0.22 0.63
## PreTest_meanCentered 17.00 -0.29 -0.63 0.52
## MeanPostTest 26.33 -0.02 -1.87 1.26
## PostCon_standardized 3.40 0.03 -1.56 0.13
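The warnings and the NaN/Inf row for Condition appear to arise because Condition was recoded to character labels in 4c, so numeric summaries cannot be computed for it. One way to avoid this is to summarize only the numeric columns. The sketch below shows the column selection with base R on a made-up data frame (`toy` and its values are hypothetical); base `summary()` stands in for `describe()` so the example runs without extra packages.

```r
# made-up data frame with one character column, mimicking the recoded Condition
toy <- data.frame(
  SubId = 1:4,
  Condition = c("Explore", "Explore", "Teach", "Teach"),
  PreTest = c(57, 65, 60, 65),
  stringsAsFactors = FALSE
)

# logical vector flagging which columns hold numeric data
numeric_cols <- sapply(toy, is.numeric)

# keep only the numeric columns; drop = FALSE preserves the data frame structure
toy_numeric <- toy[, numeric_cols, drop = FALSE]

print(summary(toy_numeric))
```

Under the same assumption, the equivalent call on the homework data would be `describe(d[, sapply(d, is.numeric)])`.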
6. Create a table of descriptive stats
DescriptiveTable <- as.data.frame(
rbind(
describe(d$PreTest),
describe(d$PostCon),
describe(d$PostPerc),
describe(d$PostProc),
describe(d$MeanPostTest)
))
rownames(DescriptiveTable) <- NULL
DescriptiveTable$vars <- NULL
DescriptiveTable$Variable <- c("Prior Knowledge Score (Pre-Test)",
"Conceptual Understanding Score (Post-Test)",
"Perceptual Problem Encoding Score (Post-Test)",
"Procedural Competence Score (Post-Test)",
"Global Score (Post-Test)")
DescriptiveTable.truncated <- DescriptiveTable[,c("Variable","mean","sd","min","max","skew")]
DescriptiveTable.truncated[,2:6] <- round(DescriptiveTable.truncated[,2:6],2)
# write.csv(DescriptiveTable.truncated, "C:/Users/User/Documents/HW1_descriptiveTable.csv")
print(DescriptiveTable.truncated)
## Variable mean sd min max
## 1 Prior Knowledge Score (Pre-Test) 60.87 4.06 51.00 68
## 2 Conceptual Understanding Score (Post-Test) 70.07 10.89 52.00 89
## 3 Perceptual Problem Encoding Score (Post-Test) 64.52 16.03 40.00 86
## 4 Procedural Competence Score (Post-Test) 87.20 4.89 75.00 99
## 5 Global Score (Post-Test) 73.93 9.74 60.67 87
## skew
## 1 -0.29
## 2 0.03
## 3 0.01
## 4 0.00
## 5 -0.02
7. Make a scatter plot with raw pretest score as x-axis, mean posttest score as y-axis.
pre_post_ggScatter <- ggplot(d, aes(x = PreTest, y = MeanPostTest)) +
geom_point() +
geom_abline() + # reference line y = x (defaults: intercept = 0, slope = 1)
xlab("Pre-Test Score") +
ylab("Mean Post-Test Score") +
ggtitle("Pre-Test Score vs Mean Post-Test Score")
print(pre_post_ggScatter)

# ggsave("C:/Users/User/Documents/pre_post_ggScatter.png", plot = last_plot())
8. Generate the average PostTest score for each Condition. What do you observe (answer in one or two sentences)?
expConditions <- c()
expConditionMeans <- c()
for (i in unique(d$Condition)) {
expConditions <- c(expConditions, i)
expConditionMeans <- c(expConditionMeans,mean(d$MeanPostTest[which(d$Condition == i)]))
}
# data.frame() keeps the means numeric; cbind() would coerce both columns to character
expCondition_postTest_means.df <- data.frame(expConditions, expConditionMeans)
print(expCondition_postTest_means.df)
## expConditions expConditionMeans
## 1 Explore 83.35556
## 2 Teach 64.50000
Answer: Average post-test scores are higher for participants in the “Explore” condition (M = 83.36) than for those in the “Teach” condition (M = 64.50). This suggests that participants who engage in active problem-solving for this particular physics concept perform better on post-tests than students who are taught lessons about the concept.
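The per-condition means in step 8 can also be computed without an explicit loop. As a hedged sketch, base R's `aggregate()` with a formula groups one variable by another and applies a summary function. The data frame `toy` and its values below are made up for illustration.

```r
# made-up scores for two participants per condition
toy <- data.frame(
  Condition = c("Explore", "Explore", "Teach", "Teach"),
  MeanPostTest = c(84, 82, 65, 63)
)

# one row per condition, with the mean of MeanPostTest in each group
group_means <- aggregate(MeanPostTest ~ Condition, data = toy, FUN = mean)
print(group_means)
```

The result is a data frame with a numeric mean column, so it can be rounded or plotted directly.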