1. Read in “HW1_Data.dat” from the course webpage.

Name the data frame “d”. Use one of the functions we’ve learned to inspect the data.

library(psych)   # provides describe(), used below
library(ggplot2) # provides ggplot(), used below

d <- read.csv("C:/Users/wvillano/downloads/HW1_Data.dat", sep = "\t", stringsAsFactors = FALSE)

head(d)
##   SubId Condition PreTest PostCon PostPerc PostProc
## 1     1         1      57      83       76       87
## 2     2         1      65      86       76       90
## 3     3         1      60      77       85       85
## 4     4         1      65      82       81       92
## 5     5         1      64      80       77       91
## 6     6         1      59      89       76       83

2. Obtain the following basic information about your data…

a. Get a summary of the data.

d.summary <- summary(d)

print(d.summary)
##      SubId         Condition      PreTest         PostCon     
##  Min.   : 1.00   Min.   :0.0   Min.   :51.00   Min.   :52.00  
##  1st Qu.:15.75   1st Qu.:0.0   1st Qu.:58.00   1st Qu.:60.00  
##  Median :30.50   Median :0.5   Median :61.00   Median :72.00  
##  Mean   :30.50   Mean   :0.5   Mean   :60.87   Mean   :70.07  
##  3rd Qu.:45.25   3rd Qu.:1.0   3rd Qu.:64.00   3rd Qu.:80.00  
##  Max.   :60.00   Max.   :1.0   Max.   :68.00   Max.   :89.00  
##     PostPerc        PostProc    
##  Min.   :40.00   Min.   :75.00  
##  1st Qu.:49.00   1st Qu.:84.00  
##  Median :64.00   Median :87.00  
##  Mean   :64.52   Mean   :87.20  
##  3rd Qu.:81.00   3rd Qu.:90.25  
##  Max.   :86.00   Max.   :99.00

b. Get descriptive statistics for the data (mean, median, standard deviation, etc.).

d.descriptives <- describe(d)

print(d.descriptives)
##           vars  n  mean    sd median trimmed   mad min max range  skew
## SubId        1 60 30.50 17.46   30.5   30.50 22.24   1  60    59  0.00
## Condition    2 60  0.50  0.50    0.5    0.50  0.74   0   1     1  0.00
## PreTest      3 60 60.87  4.06   61.0   61.02  4.45  51  68    17 -0.29
## PostCon      4 60 70.07 10.89   72.0   70.00 14.83  52  89    37  0.03
## PostPerc     5 60 64.52 16.03   64.0   64.48 22.98  40  86    46  0.01
## PostProc     6 60 87.20  4.89   87.0   87.17  4.45  75  99    24  0.00
##           kurtosis   se
## SubId        -1.26 2.25
## Condition    -2.03 0.07
## PreTest      -0.63 0.52
## PostCon      -1.56 1.41
## PostPerc     -1.89 2.07
## PostProc     -0.22 0.63

c. Get descriptive statistics for just the Pretest variable.

# # method 1
# describe(d$PreTest)

# method 2
d.descriptives["PreTest", ]
##         vars  n  mean   sd median trimmed  mad min max range  skew
## PreTest    3 60 60.87 4.06     61   61.02 4.45  51  68    17 -0.29
##         kurtosis   se
## PreTest    -0.63 0.52

d. Generate a histogram for the Pretest variable and give the graph a good main title (don’t export the graph, just generate it).

# # method 1
# hist(d$PreTest)

# method 2
ggplot(d, aes(x = PreTest)) + 
  geom_histogram(binwidth = 2, color = "black", fill = "white") + 
  ylab("Frequency") + 
  ggtitle("Distribution of PreTest Variable") + 
  theme_classic()

3. Mean centering PreTest

a. Create a new variable that is the mean-centered score for pretest knowledge. Give this variable a good name.

d$PreTest_meanCentered <- d$PreTest - mean(d$PreTest)

b. Check that your new variable is indeed mean-centered (include in your answer the code you could use).

# if this variable is properly mean-centered, ...
# the mean of this variable should approximate 0 (in this case, it is very close to 0)

if (round(mean(d$PreTest_meanCentered),2) == 0) {
  print("Variable is mean-centered")
} else {
  print("Variable is not mean-centered")
}
## [1] "Variable is mean-centered"
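
The rounding check above works, but a tolerance-based comparison avoids having to choose a number of decimal places. A minimal sketch in base R, assuming a small made-up vector `scores` (hypothetical, not the homework data):

```r
# Hypothetical scores, used only for illustration (not the homework data)
scores <- c(57, 65, 60, 65, 64, 59)

# Mean-center by subtracting the sample mean
scores_centered <- scores - mean(scores)

# Floating-point arithmetic rarely yields an exact 0, so compare with a tolerance
is_centered <- isTRUE(all.equal(mean(scores_centered), 0))
print(is_centered)

# Centering shifts the values but leaves the spread unchanged
same_spread <- isTRUE(all.equal(sd(scores_centered), sd(scores)))
print(same_spread)
```

Checking that the standard deviation is unchanged is a useful second test, since centering should only move the distribution, not rescale it.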

4. Average the posttest measures to create a single global measure of overall learning

a. Create a variable that is the average of the three posttest variables (ignore missing data). Name this variable “d$MeanPostTest”.

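# name the three post-test columns explicitly: grep("Post", ...) would also match
# "MeanPostTest" itself if this line were re-run after the column exists
d$MeanPostTest <- rowMeans(d[, c("PostCon", "PostPerc", "PostProc")], na.rm = TRUE)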
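
To see what `na.rm = TRUE` does in `rowMeans()`, here is a small sketch using a hypothetical two-row data frame `toy` (made up for illustration, not the homework data) in which one post-test score is missing:

```r
# Hypothetical toy data with one missing score (not the homework data)
toy <- data.frame(
  PostCon  = c(80, 70),
  PostPerc = c(90, NA),
  PostProc = c(100, 90)
)

# With na.rm = TRUE, each row mean is taken over the non-missing values only
means_ignoring_na <- rowMeans(toy, na.rm = TRUE)
print(means_ignoring_na)

# Without it, any row that contains an NA gets an NA mean
means_default <- rowMeans(toy)
print(means_default)
```

Note that with `na.rm = TRUE` the second row is averaged over two scores rather than three, so rows with missing data are based on less information.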

b. What are the mean and standard deviation of this average post-test score?

mean(d$MeanPostTest) # 73.92778
## [1] 73.92778
sd(d$MeanPostTest) # 9.73705
## [1] 9.73705

c. Generate descriptive statistics of this average post-test score for each of the experimental conditions.

d$Condition[which(d$Condition == 0)] <- "Teach"
d$Condition[which(d$Condition == 1)] <- "Explore"

for (i in unique(d$Condition)) {
  print(paste("Mean post-test score descriptive statistics for experimental condition:",i))
  print(describe(d$MeanPostTest[which(d$Condition == i)]))
}
## [1] "Mean post-test score descriptive statistics for experimental condition: Explore"
##    vars  n  mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 30 83.36 1.93     83   83.35 2.22  80  87     7 0.07    -1.07 0.35
## [1] "Mean post-test score descriptive statistics for experimental condition: Teach"
##    vars  n mean   sd median trimmed  mad   min max range skew kurtosis
## X1    1 30 64.5 2.29  64.83   64.46 2.47 60.67  71 10.33 0.41     0.25
##      se
## X1 0.42
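
The loop above could also be written with base R's `aggregate()`, which computes one statistic per group in a single call. A sketch under the assumption of a hypothetical toy data frame `toy` (values made up, not the homework data):

```r
# Hypothetical toy data, for illustration only (not the homework data)
toy <- data.frame(
  Condition    = c("Explore", "Explore", "Explore", "Teach", "Teach", "Teach"),
  MeanPostTest = c(82, 84, 86, 63, 65, 67)
)

# Mean and standard deviation of MeanPostTest within each condition
group_means <- aggregate(MeanPostTest ~ Condition, data = toy, FUN = mean)
group_sds   <- aggregate(MeanPostTest ~ Condition, data = toy, FUN = sd)

print(group_means)
print(group_sds)
```

The psych package also offers `describeBy()`, which returns grouped output in the same layout as `describe()`; the base R version is shown here only because it needs no extra packages.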

5. Standardizing PostCon score

a. Create a standardized variable for the conceptual understanding measure.

d$PostCon_standardized <- (d$PostCon - mean(d$PostCon)) / sd(d$PostCon)

b. How can you check your work to make sure this is the standardized score? Do it.

# if this variable is properly standardized, the mean of this variable should approximate 0, ...
# and the standard deviation should approximate 1. 

if (round(mean(d$PostCon_standardized), 2) == 0 && round(sd(d$PostCon_standardized), 2) == 1) {
  print("Variable is standardized")
} else {
  print("Variable is not standardized")
}
## [1] "Variable is standardized"
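
Another way to check the work is to compare the manual z-score against base R's `scale()`, which centers and scales by the sample standard deviation by default. A minimal sketch with a made-up vector `scores` (hypothetical, not the homework data):

```r
# Hypothetical scores, for illustration only (not the homework data)
scores <- c(83, 86, 77, 82, 80, 89)

# Manual z-score: subtract the mean, divide by the sample standard deviation
z_manual <- (scores - mean(scores)) / sd(scores)

# scale() returns a one-column matrix with attributes,
# so as.numeric() converts it back to a plain vector
z_scale <- as.numeric(scale(scores))

# Both approaches should agree, with mean ~0 and standard deviation ~1
print(isTRUE(all.equal(z_manual, z_scale)))
print(isTRUE(all.equal(mean(z_scale), 0)))
print(isTRUE(all.equal(sd(z_scale), 1)))
```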

c. You have now created several new variables. Write out a command that shows all the variables.

print(colnames(d))
## [1] "SubId"                "Condition"            "PreTest"             
## [4] "PostCon"              "PostPerc"             "PostProc"            
## [7] "PreTest_meanCentered" "MeanPostTest"         "PostCon_standardized"

d. Generate descriptive stats for all variables in the d data frame.

describe(d)
## Warning in describe(d): NAs introduced by coercion
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
##                      vars  n  mean    sd median trimmed   mad   min   max
## SubId                   1 60 30.50 17.46  30.50   30.50 22.24  1.00 60.00
## Condition*              2 60   NaN    NA     NA     NaN    NA   Inf  -Inf
## PreTest                 3 60 60.87  4.06  61.00   61.02  4.45 51.00 68.00
## PostCon                 4 60 70.07 10.89  72.00   70.00 14.83 52.00 89.00
## PostPerc                5 60 64.52 16.03  64.00   64.48 22.98 40.00 86.00
## PostProc                6 60 87.20  4.89  87.00   87.17  4.45 75.00 99.00
## PreTest_meanCentered    7 60  0.00  4.06   0.13    0.15  4.45 -9.87  7.13
## MeanPostTest            8 60 73.93  9.74  75.50   73.97 14.08 60.67 87.00
## PostCon_standardized    9 60  0.00  1.00   0.18   -0.01  1.36 -1.66  1.74
##                      range  skew kurtosis   se
## SubId                59.00  0.00    -1.26 2.25
## Condition*            -Inf    NA       NA   NA
## PreTest              17.00 -0.29    -0.63 0.52
## PostCon              37.00  0.03    -1.56 1.41
## PostPerc             46.00  0.01    -1.89 2.07
## PostProc             24.00  0.00    -0.22 0.63
## PreTest_meanCentered 17.00 -0.29    -0.63 0.52
## MeanPostTest         26.33 -0.02    -1.87 1.26
## PostCon_standardized  3.40  0.03    -1.56 0.13
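
The warnings and the `NaN`/`Inf` row for Condition appear because Condition was recoded to text labels in 4c, so numeric summaries are undefined for that column. One possible way to avoid this, sketched here with a hypothetical toy data frame `toy` (made up, not the homework data), is to summarize only the numeric columns and tabulate the categorical one separately:

```r
# Hypothetical toy data, for illustration only (not the homework data)
toy <- data.frame(
  SubId     = 1:4,
  Condition = c("Explore", "Explore", "Teach", "Teach"),
  PreTest   = c(57, 65, 60, 62),
  stringsAsFactors = FALSE
)

# Identify which columns hold numbers; Condition is character, so it is excluded
numeric_cols <- sapply(toy, is.numeric)
print(names(toy)[numeric_cols])

# Column means for the numeric columns only, which raises no coercion warnings
numeric_means <- colMeans(toy[, numeric_cols])
print(numeric_means)

# Counts per label are the natural summary for a categorical column
condition_counts <- table(toy$Condition)
print(condition_counts)
```

The same logical index could be passed to `describe()`, for example `describe(d[, sapply(d, is.numeric)])`, assuming the psych package is loaded.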

6. Create a table of descriptive stats

DescriptiveTable <- as.data.frame(
  rbind(
  describe(d$PreTest),
  describe(d$PostCon),
  describe(d$PostPerc),
  describe(d$PostProc),
  describe(d$MeanPostTest)
))

rownames(DescriptiveTable) <- NULL
DescriptiveTable$vars <- NULL
DescriptiveTable$Variable <- c("Prior Knowledge Score (Pre-Test)",
                               "Conceptual Understanding Score (Post-Test)",
                               "Perceptual Problem Encoding Score (Post-Test)",
                               "Procedural Competence Score (Post-Test)",
                               "Global Score (Post-Test)")

DescriptiveTable.truncated <- DescriptiveTable[,c("Variable","mean","sd","min","max","skew")]

DescriptiveTable.truncated[,2:6] <- round(DescriptiveTable.truncated[,2:6],2)

# write.csv(DescriptiveTable.truncated, "C:/Users/User/Documents/HW1_descriptiveTable.csv")

print(DescriptiveTable.truncated)
##                                        Variable  mean    sd   min max
## 1              Prior Knowledge Score (Pre-Test) 60.87  4.06 51.00  68
## 2    Conceptual Understanding Score (Post-Test) 70.07 10.89 52.00  89
## 3 Perceptual Problem Encoding Score (Post-Test) 64.52 16.03 40.00  86
## 4       Procedural Competence Score (Post-Test) 87.20  4.89 75.00  99
## 5                      Global Score (Post-Test) 73.93  9.74 60.67  87
##    skew
## 1 -0.29
## 2  0.03
## 3  0.01
## 4  0.00
## 5 -0.02

7. Make a scatter plot with raw pretest score as x-axis, mean posttest score as y-axis.

pre_post_ggScatter <- ggplot(d, aes(x = PreTest, y = MeanPostTest)) + 
  geom_point() + 
  geom_abline() + # with no arguments this draws the identity line y = x, not a fitted regression line
  xlab("Pre-Test Score") + 
  ylab("Mean Post-Test Score") + 
  ggtitle("Pre-Test Score vs Mean Post-Test Score")

print(pre_post_ggScatter)

# ggsave("C:/Users/User/Documents/pre_post_ggScatter.png", plot = last_plot())

8. Generate the average PostTest score for each Condition. What do you observe (answer in one or two sentences)?

expConditions <- c()
expConditionMeans <- c()

for (i in unique(d$Condition)) {
  expConditions <- c(expConditions, i)
  expConditionMeans <- c(expConditionMeans,mean(d$MeanPostTest[which(d$Condition == i)]))
} 

# data.frame() keeps the means numeric; as.data.frame(cbind(...)) would coerce them to character
expCondition_postTest_means.df <- data.frame(expConditions, expConditionMeans)

print(expCondition_postTest_means.df)
##   expConditions expConditionMeans
## 1       Explore          83.35556
## 2         Teach          64.50000

Answer: The average post-test score is higher for participants in the “Explore” condition (83.36) than for those in the “Teach” condition (64.50). This suggests that participants who engage in active problem-solving for this particular physics concept perform better on post-tests than participants who are taught lessons about the concept.