1. Read in “HW1_Data.dat”, from the course webpage.

Name the data frame “d”. Use one of the functions we’ve learned to inspect the data.

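# describe() comes from the psych package; ggplot() comes from ggplot2
library(psych)
library(ggplot2)

# the file is tab-delimited, so the separator is set to "\t"
d <- read.csv("C:/Users/wvillano/downloads/HW1_Data.dat", sep = "\t", stringsAsFactors = FALSE)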

head(d)

##   SubId Condition PreTest PostCon PostPerc PostProc
## 1     1         1      57      83       76       87
## 2     2         1      65      86       76       90
## 3     3         1      60      77       85       85
## 4     4         1      65      82       81       92
## 5     5         1      64      80       77       91
## 6     6         1      59      89       76       83
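As an aside, base R's read.delim() is a tab-delimited shortcut for the same kind of read, and str() gives a compact view of column types. The sketch below writes a small made-up file to a temporary path (the values and the name `toy` are illustrative, not the homework data) so that it runs on its own.

```r
# Toy tab-delimited file standing in for HW1_Data.dat (values are illustrative)
tmp <- tempfile(fileext = ".dat")
writeLines(c("SubId\tCondition\tPreTest",
             "1\t1\t57",
             "2\t0\t65"), tmp)

# read.delim() defaults to sep = "\t" and header = TRUE
toy <- read.delim(tmp, stringsAsFactors = FALSE)

str(toy)   # column names, types, and first values
dim(toy)   # number of rows and columns
```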

2. Obtain the following basic information about your data…

a. Get a summary of the data.

d.summary <- summary(d)

print(d.summary)

##      SubId         Condition      PreTest         PostCon     
##  Min.   : 1.00   Min.   :0.0   Min.   :51.00   Min.   :52.00  
##  1st Qu.:15.75   1st Qu.:0.0   1st Qu.:58.00   1st Qu.:60.00  
##  Median :30.50   Median :0.5   Median :61.00   Median :72.00  
##  Mean   :30.50   Mean   :0.5   Mean   :60.87   Mean   :70.07  
##  3rd Qu.:45.25   3rd Qu.:1.0   3rd Qu.:64.00   3rd Qu.:80.00  
##  Max.   :60.00   Max.   :1.0   Max.   :68.00   Max.   :89.00  
##     PostPerc        PostProc    
##  Min.   :40.00   Min.   :75.00  
##  1st Qu.:49.00   1st Qu.:84.00  
##  Median :64.00   Median :87.00  
##  Mean   :64.52   Mean   :87.20  
##  3rd Qu.:81.00   3rd Qu.:90.25  
##  Max.   :86.00   Max.   :99.00

b. Get descriptive statistics for the data (mean, median, standard deviation, etc.).

d.descriptives <- describe(d)

print(d.descriptives)

##           vars  n  mean    sd median trimmed   mad min max range  skew
## SubId        1 60 30.50 17.46   30.5   30.50 22.24   1  60    59  0.00
## Condition    2 60  0.50  0.50    0.5    0.50  0.74   0   1     1  0.00
## PreTest      3 60 60.87  4.06   61.0   61.02  4.45  51  68    17 -0.29
## PostCon      4 60 70.07 10.89   72.0   70.00 14.83  52  89    37  0.03
## PostPerc     5 60 64.52 16.03   64.0   64.48 22.98  40  86    46  0.01
## PostProc     6 60 87.20  4.89   87.0   87.17  4.45  75  99    24  0.00
##           kurtosis   se
## SubId        -1.26 2.25
## Condition    -2.03 0.07
## PreTest      -0.63 0.52
## PostCon      -1.56 1.41
## PostPerc     -1.89 2.07
## PostProc     -0.22 0.63

c. Get descriptive statistics for just the Pretest variable.

# method 1 (commented out)
# describe(d$PreTest)

# method 2: pull the PreTest row out of the table computed above
d.descriptives["PreTest", ]

##         vars  n  mean   sd median trimmed  mad min max range  skew
## PreTest    3 60 60.87 4.06     61   61.02 4.45  51  68    17 -0.29
##         kurtosis   se
## PreTest    -0.63 0.52

d. Generate a histogram for the Pretest variable and give the graph a good main title (don’t export the graph, just generate it)

# # method 1
# hist(d$PreTest)

# method 2
ggplot(d, aes(x = PreTest)) + 
  geom_histogram(binwidth = 2, color = "black", fill = "white") + 
  ylab("Frequency") + 
  ggtitle("Distribution of PreTest Variable") + 
  theme_classic()

3. Mean centering PreTest

a. Create a new variable that is the mean-centered score for pretest knowledge. Give this variable a good name.

d$PreTest_meanCentered <- d$PreTest - mean(d$PreTest)

b. Check that your new variable is indeed mean-centered (include in your answer the code you could use).

# if this variable is properly mean-centered, ...
# the mean of this variable should approximate 0 (in this case, it is very close to 0)

if (round(mean(d$PreTest_meanCentered),2) == 0) {
  print("Variable is mean-centered")
} else {
  print("Variable is not mean-centered")
}

## [1] "Variable is mean-centered"
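Rounding to two decimals works here, but a tolerance-based comparison is another way to run the same check. The sketch below uses all.equal() on the first six PreTest values shown in the head(d) output; the names `x` and `x_centered` are introduced only for this illustration.

```r
# First six PreTest values from the head(d) output above
x <- c(57, 65, 60, 65, 64, 59)
x_centered <- x - mean(x)

# all.equal() compares within a small numeric tolerance,
# so floating-point noise around 0 does not cause a false alarm
is_centered <- isTRUE(all.equal(mean(x_centered), 0))

# centering shifts the values but should leave the spread unchanged
same_spread <- isTRUE(all.equal(sd(x_centered), sd(x)))

print(c(is_centered = is_centered, same_spread = same_spread))
```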

4. Average the posttest measures to create a single global measure of overall learning

a. Create a variable that is the average of the three posttest variables (ignore missing data). Name this variable “d$MeanPostTest”.

d$MeanPostTest <- rowMeans(d[,grep("Post",colnames(d))], na.rm = TRUE)
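One caveat with selecting columns by grep("Post", ...) is that, if the line is re-run later, it would also match any new column whose name contains "Post" (for example MeanPostTest itself). A minimal sketch of the same averaging with the columns named explicitly, using made-up values with missing data to show what na.rm = TRUE does:

```r
# Toy data with some missing post-test scores (values are illustrative)
toy <- data.frame(PostCon  = c(80, NA, 70),
                  PostPerc = c(70, 60, NA),
                  PostProc = c(90, 90, 80))

# Naming the columns explicitly keeps the selection stable on re-runs
post_cols <- c("PostCon", "PostPerc", "PostProc")

# na.rm = TRUE averages over whichever scores are present in each row
toy$MeanPostTest <- rowMeans(toy[, post_cols], na.rm = TRUE)

print(toy)
```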

b. What are the mean and standard deviation of this average post-test score?

mean(d$MeanPostTest) # 73.92778

## [1] 73.92778

sd(d$MeanPostTest) # 9.73705

## [1] 9.73705

c. Generate descriptive statistics of this average post-test score for each of the experimental conditions.

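# recode the numeric condition codes as labels
# note: after the first assignment, Condition becomes a character column
d$Condition[which(d$Condition == 0)] <- "Teach"
d$Condition[which(d$Condition == 1)] <- "Explore"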

for (i in unique(d$Condition)) {
  print(paste("Mean post-test score descriptive statistics for experimental condition:",i))
  print(describe(d$MeanPostTest[which(d$Condition == i)]))
}

## [1] "Mean post-test score descriptive statistics for experimental condition: Explore"
##    vars  n  mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 30 83.36 1.93     83   83.35 2.22  80  87     7 0.07    -1.07 0.35
## [1] "Mean post-test score descriptive statistics for experimental condition: Teach"
##    vars  n mean   sd median trimmed  mad   min max range skew kurtosis
## X1    1 30 64.5 2.29  64.83   64.46 2.47 60.67  71 10.33 0.41     0.25
##      se
## X1 0.42
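The same per-condition breakdown can be sketched in base R without a loop. The example below assumes the coding used above (0 = Teach, 1 = Explore), stores the labels in a separate factor column so the numeric codes survive, and uses tapply() for group means and standard deviations. The data and the column name `ConditionLabel` are made up for illustration.

```r
# Toy data: three participants per condition (values are illustrative)
toy <- data.frame(Condition    = c(0, 0, 0, 1, 1, 1),
                  MeanPostTest = c(60, 65, 70, 80, 85, 90))

# Keep the numeric codes and add a labelled factor alongside them
toy$ConditionLabel <- factor(toy$Condition,
                             levels = c(0, 1),
                             labels = c("Teach", "Explore"))

# tapply() applies a function to MeanPostTest within each condition
group_means <- tapply(toy$MeanPostTest, toy$ConditionLabel, mean)
group_sds   <- tapply(toy$MeanPostTest, toy$ConditionLabel, sd)

print(group_means)
print(group_sds)
```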

5. Standardizing PostCon score

a. Create a standardized variable for the conceptual understanding measure.

d$PostCon_standardized <- (d$PostCon - mean(d$PostCon)) / sd(d$PostCon)

b. How can you check your work to make sure this is the standardized score? Do it.

# if this variable is properly standardized, the mean of this variable should approximate 0, ...
# and the standard deviation should approximate 1. 

if (round(mean(d$PostCon_standardized),2) == 0 && round(sd(d$PostCon_standardized),2) == 1) {
  print("Variable is standardized")
} else {
  print("Variable is not standardized")
}

## [1] "Variable is standardized"
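Another way to check the hand-computed z-scores is to compare them with base R's scale(), which centers and divides by the standard deviation by default. A short sketch using the first six PostCon values from the head(d) output (the variable names are introduced here only for illustration):

```r
# First six PostCon values from the head(d) output above
x <- c(83, 86, 77, 82, 80, 89)

# Manual standardization, as in the answer above
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so convert it back to a plain vector
z_scale <- as.numeric(scale(x))

# The two versions should agree within numeric tolerance
methods_agree <- isTRUE(all.equal(z_manual, z_scale))
print(methods_agree)
```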

c. You have now created several new variables. Write out a command that shows all the variables.

print(colnames(d))

## [1] "SubId"                "Condition"            "PreTest"             
## [4] "PostCon"              "PostPerc"             "PostProc"            
## [7] "PreTest_meanCentered" "MeanPostTest"         "PostCon_standardized"

d. Generate descriptive stats for all variables in the d data frame.

describe(d)

## Warning in describe(d): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

##                      vars  n  mean    sd median trimmed   mad   min   max
## SubId                   1 60 30.50 17.46  30.50   30.50 22.24  1.00 60.00
## Condition*              2 60   NaN    NA     NA     NaN    NA   Inf  -Inf
## PreTest                 3 60 60.87  4.06  61.00   61.02  4.45 51.00 68.00
## PostCon                 4 60 70.07 10.89  72.00   70.00 14.83 52.00 89.00
## PostPerc                5 60 64.52 16.03  64.00   64.48 22.98 40.00 86.00
## PostProc                6 60 87.20  4.89  87.00   87.17  4.45 75.00 99.00
## PreTest_meanCentered    7 60  0.00  4.06   0.13    0.15  4.45 -9.87  7.13
## MeanPostTest            8 60 73.93  9.74  75.50   73.97 14.08 60.67 87.00
## PostCon_standardized    9 60  0.00  1.00   0.18   -0.01  1.36 -1.66  1.74
##                      range  skew kurtosis   se
## SubId                59.00  0.00    -1.26 2.25
## Condition*            -Inf    NA       NA   NA
## PreTest              17.00 -0.29    -0.63 0.52
## PostCon              37.00  0.03    -1.56 1.41
## PostPerc             46.00  0.01    -1.89 2.07
## PostProc             24.00  0.00    -0.22 0.63
## PreTest_meanCentered 17.00 -0.29    -0.63 0.52
## MeanPostTest         26.33 -0.02    -1.87 1.26
## PostCon_standardized  3.40  0.03    -1.56 0.13
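The warnings and the NaN/Inf row for Condition appear because Condition was recoded to text labels in question 4c, so numeric summaries are not defined for it. One way to sidestep this, sketched below in base R on made-up data, is to summarize only the numeric columns. The helper name `numeric_summary` is hypothetical.

```r
# Toy data frame with one character column (values are illustrative)
toy <- data.frame(SubId     = 1:4,
                  Condition = c("Teach", "Teach", "Explore", "Explore"),
                  PreTest   = c(57, 65, 60, 65),
                  stringsAsFactors = FALSE)

# Flag the columns that hold numbers
numeric_cols <- sapply(toy, is.numeric)

# Summarize only those columns; drop = FALSE keeps a data frame
# even if a single column is selected
numeric_summary <- sapply(toy[, numeric_cols, drop = FALSE],
                          function(x) c(mean = mean(x), sd = sd(x),
                                        min = min(x), max = max(x)))

print(round(numeric_summary, 2))
```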

6. Create a table of descriptive stats

DescriptiveTable <- as.data.frame(
  rbind(
  describe(d$PreTest),
  describe(d$PostCon),
  describe(d$PostPerc),
  describe(d$PostProc),
  describe(d$MeanPostTest)
))

rownames(DescriptiveTable) <- NULL
DescriptiveTable$vars <- NULL
DescriptiveTable$Variable <- c("Prior Knowledge Score (Pre-Test)",
                               "Conceptual Understanding Score (Post-Test)",
                               "Perceptual Problem Encoding Score (Post-Test)",
                               "Procedural Competence Score (Post-Test)",
                               "Global Score (Post-Test)")

DescriptiveTable.truncated <- DescriptiveTable[,c("Variable","mean","sd","min","max","skew")]

DescriptiveTable.truncated[,2:6] <- round(DescriptiveTable.truncated[,2:6],2)

# write.csv(DescriptiveTable.truncated, "C:/Users/User/Documents/HW1_descriptiveTable.csv")

print(DescriptiveTable.truncated)

##                                        Variable  mean    sd   min max
## 1              Prior Knowledge Score (Pre-Test) 60.87  4.06 51.00  68
## 2    Conceptual Understanding Score (Post-Test) 70.07 10.89 52.00  89
## 3 Perceptual Problem Encoding Score (Post-Test) 64.52 16.03 40.00  86
## 4       Procedural Competence Score (Post-Test) 87.20  4.89 75.00  99
## 5                      Global Score (Post-Test) 73.93  9.74 60.67  87
##    skew
## 1 -0.29
## 2  0.03
## 3  0.01
## 4  0.00
## 5 -0.02

7. Make a scatter plot with raw pretest score as x-axis, mean posttest score as y-axis.

pre_post_ggScatter <- ggplot(d, aes(x = PreTest, y = MeanPostTest)) + 
  geom_point() + 
  # with no arguments, geom_abline() draws the identity line y = x (intercept 0, slope 1)
  geom_abline() + 
  xlab("Pre-Test Score") + 
  ylab("Mean Post-Test Score") + 
  ggtitle("Pre-Test Score vs Mean Post-Test Score")

print(pre_post_ggScatter)

# ggsave("C:/Users/User/Documents/pre_post_ggScatter.png", plot = last_plot())
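If a fitted regression line were wanted on this plot (an assumption; the assignment does not ask for one), the intercept and slope could be estimated with lm() and then handed to geom_abline(). The sketch below uses made-up data with an exact linear relationship so the result is easy to verify.

```r
# Toy data with an exact linear relationship (illustrative, not the homework data)
toy <- data.frame(PreTest = c(1, 2, 3, 4, 5))
toy$MeanPostTest <- 2 * toy$PreTest + 1

# Fit a simple linear regression of post-test on pre-test
fit <- lm(MeanPostTest ~ PreTest, data = toy)
intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])

# These values could then be passed to ggplot2, e.g.
# geom_abline(intercept = intercept, slope = slope)
print(c(intercept = intercept, slope = slope))
```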

8. Generate the average PostTest score for each Condition. What do you observe (answer in one or two sentences)?

expConditions <- c()
expConditionMeans <- c()

for (i in unique(d$Condition)) {
  expConditions <- c(expConditions, i)
  expConditionMeans <- c(expConditionMeans,mean(d$MeanPostTest[which(d$Condition == i)]))
} 

expCondition_postTest_means.df <- as.data.frame(cbind(expConditions,expConditionMeans))

print(expCondition_postTest_means.df)

##   expConditions expConditionMeans
## 1       Explore  83.3555555555556
## 2         Teach              64.5
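The long decimal in the output above is a side effect of cbind(): combining a character vector with a numeric one produces a character matrix, so the means are stored as text. A minimal sketch of the difference, plus a loop-free alternative with aggregate(), on made-up data:

```r
conds <- c("Explore", "Teach")
means <- c(85, 65)   # illustrative values

# cbind() coerces everything to character before the data frame is built
coerced <- as.data.frame(cbind(conds, means), stringsAsFactors = FALSE)
print(class(coerced$means))

# data.frame() keeps each column's own type
kept <- data.frame(Condition = conds, MeanPostTest = means)
print(class(kept$MeanPostTest))

# aggregate() computes the group means directly and returns numeric values
toy <- data.frame(Condition    = c("Teach", "Teach", "Explore", "Explore"),
                  MeanPostTest = c(60, 70, 80, 90))
agg <- aggregate(MeanPostTest ~ Condition, data = toy, FUN = mean)
print(agg)
```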

Answer: The average post-test score is higher in the “Explore” condition (M = 83.36) than in the “Teach” condition (M = 64.50). This suggests that participants who engage in active problem-solving for this particular physics concept perform better on post-tests than participants who are taught lessons about the concept.

PSY631_HW1

William Villano

9/2/2019
