Conducting Simulation Studies in Psychometrics

class: center, middle, inverse, title-slide

# Conducting Simulation Studies in Psychometrics
## Educational Measurement: Issues and Practice
### 王慧心 顏立平
### 2018.6.4

---

#Outline

- Introduction

- Uses of, and Common Terms in, Simulation

- Simulation used in educational and psychological measurement

- Principles and recommendations

---

#Introduction

-Disavetage of real data:

- Time consuming

- Costly to collect

- May be inadequate to support the analyses, inferences, and conclusions
 
- Research questions simply cannot be answered using empirical data

---

#Introduction

-History

- The quality of beer (Student, 1908)

- Nuclear warfare (von Neumann & Ulam, 1949)

- Evaluating model – data fit (Hambleton, Swaminathan, & Rogers, 1991)

- Item response theory (Harwell, Stone, Hsu, & Kirisci, 1996)

- 2012 National Council on Measurement in Education (NCME)

---

#Uses of, and Common Terms in, Simulation

-Simulation methodologies

- Monte Carlo (MC) simulation

-Resampling methodologies

- Bootstrap

- Jackknife

---
#Uses of, and Common Terms in, Simulation

-Term

- Factor

- Condition

- Replication

- Bias

- Standard error

- Mean squared error

---
#Simulation used in educational and psychological measurement

-Types of Data Simulated
 
- IRT Parameters

- Raw(True) Scores

- Response Times

-Type of Research Utilizing Simulation

- IRT Model Fit

- Reliability
---
#Simulation used in educational and psychological measurement

-Number of Replications

- 499 or Fewer (33%)

- 1,000–4,999 (19%)

-Results Present

- Bias/RMSE

- Type I/Power

- ANOVA

- Model Fit

---
#Principles and recommendations

-Steps taken in an IRT simulation study(Harwell et al.,1996)

- Specifying a research question

- Delineating conditions

- Choosing an experimental design

- Generating data

- Estimating parameters

- Replicating the procedure

- Analyzing results

---
#Principles and recommendations

-Information for reader

- Type and version of software

- Type of hardware

- Random number of seeds set

- Distribution

- Number of replications

---
#Principles and recommendations

-Analyzing results

- Correlation

- Bias

- Mean absolute difference (MAD)

- SE

- Mean square error (MSE)

- RMSE
---
##Set-Up
####設定設定讀取和寫入檔案的位置

```r
setwd('D:/dataM/Tutorial')
```
---
##Set Simulation Conditions
#### 1.設定模擬人數和題數

```r
Num_Sim <- 1500 
Num_Items <- 300 
```
#### 2.設定題數的區間(長短和間距)

```r
min_length <- 210 
max_length <- 300 
interval <- 30    
```
#### 3.Set theta mean/sd 設定能力參數

```r
Theta_mean <- 1.6
Theta_sd <- 0.5
```
---
#### 4.設定試題難度參數

```r
Rasch_mean <- 0.4
Rasch_sd <- 0.8
```

#### 5.設定分數 (41通過)

```r
scale_mean <- 50
scale_sd <- 10
scale_cut <- 41
```
#### 6.設定重複次數

```r
reps <- 250
```
---
#### 7.載package並設定亂數種子

```r
### Load IRT Toys for IRT estimation
library(irtoys)
### Load reshape package for restructuring data
library(reshape)
set.seed(5)
```
#### 8.產生真實能力參數的模擬數據

```r
Simulees <- data.frame("Theta" = rnorm(n=Num_Sim, mean=Theta_mean, sd=Theta_sd))
```
#### 9.產生每筆模擬數據的ID

```r
Simulees$ID <- seq (1 : Num_Sim)
```
--

###Setup completed

###Begin Simulation

---

```r
ptm <- proc.time()
REP <- vector("list", reps)
for (r in 1:reps) {
Item_Parameters <- data.frame("Difficulty" = rnorm(n=Num_Items, mean=Rasch_mean, sd=Rasch_sd))
Item_Parameters$Item_ID <- seq (1 : Num_Items)
data <- merge(Item_Parameters, Simulees)
data$Model <- round(1/(1 + exp(data$Difficulty - data$Theta)), 5)
data$Uniform <- runif(nrow(data), min=0, max=1)
data$Score <- as.numeric(ifelse(data$Model > data$Uniform, yes="1", no="0"))
flip_data <- cast(data, ID ~ Item_ID, value='Score')
resp_data <- data.frame(flip_data[c(-1)])
Diffs <- aggregate(data$Difficulty, list(data$Item_ID), max)
a_parm <- rep.int(1, nrow(Item_Parameters))
c_parm <- rep.int(0, nrow(Item_Parameters))
DIFF <- as.matrix(data.frame(a_parm, Diffs$x, c_parm))
for (j in seq(min_length,max_length,interval)) {
est <- data.frame(mlebme(resp_data[ ,1:j], DIFF[1:j, ], method = "ML"))
est$ID <- seq (1 : Num_Sim)
infile <- paste("rep",r,"n",j,".csv",sep="")
write.csv( est, file=infile) 
}
}
proc.time() - ptm
```

```
#    user  system elapsed 
# 8535.64   26.96 8602.27
```
---
- 示意圖
#####生成的檔案會出現在設定的資料夾

```r
knitr::include_graphics("D:/dataM/Tutorial/02.png")
```

![](D:/dataM/Tutorial/02.png)

---
###END SIMULATION And Then......
- 合併檔案

```r
#生成容器儲存數據
REP <- vector("list", reps)
Length <- vector("list", length(seq(min_length,max_length,interval)))
#讀取所有結果檔案
for (r in 1:reps) {
for (j in seq(min_length,max_length,interval)) {
infile <- paste("rep",r,"n",j,".csv",sep="")
Length[[j]] <- data.frame(read.csv(infile),"Rep"=r) 
}
REP[[r]] <- data.frame(do.call("rbind",Length))
}
#合併所有數據
Reps_Combined <- data.frame(do.call("rbind",REP))
#將估計的能力參數和真實能力參數配對
merge_data <- merge(Simulees, Reps_Combined, by.x="ID", by.y="ID")
#將能力參數轉換成量尺分數
merge_data$scale_true <- round((((merge_data$Theta - Theta_mean)/Theta_sd)*scale_sd)+scale_mean)
merge_data$scale_est <- round((((merge_data$est - Theta_mean)/Theta_sd)*scale_sd)+scale_mean)
```
---

```r
#計算每個能力參數的估計結果
#TRUE PASSER
merge_data$outcome[((merge_data$scale_true >= scale_cut) & (merge_data$scale_est >= scale_cut))] = "TP"
#TRUE FAILER
merge_data$outcome[((merge_data$scale_true < scale_cut) & (merge_data$scale_est < scale_cut))] = "TF"
#FALSE PASSER 
merge_data$outcome[((merge_data$scale_true < scale_cut) & (merge_data$scale_est >= scale_cut))] = "FP"
#FALSE FAILER
merge_data$outcome[((merge_data$scale_true >= scale_cut) & (merge_data$scale_est < scale_cut))] = "FF"
merge_data <- merge_data[order(merge_data$ID, merge_data$n, merge_data$Rep),]#合併所有data
head(merge_data,8)
```

```
#     ID    Theta X       est       sem   n Rep scale_true scale_est outcome
# 1    1 1.179572 1 1.3862908 0.1584188 210   1         42        46      TP
# 535  1 1.179572 1 1.0035421 0.1531658 210   2         42        38      FF
# 583  1 1.179572 1 0.9962992 0.1511332 210   3         42        38      FF
# 144  1 1.179572 1 0.9109683 0.1523238 210   4         42        36      FF
# 676  1 1.179572 1 1.1239626 0.1545554 210   5         42        40      FF
# 477  1 1.179572 1 1.1948493 0.1576788 210   6         42        42      TP
# 890  1 1.179572 1 1.1703000 0.1551567 210   7         42        41      TP
# 450  1 1.179572 1 0.9261008 0.1511580 210   8         42        37      FF
```

```r
write.csv( merge_data, file= "Sim_Results.csv") 
```
---
- 計算及整理分類結果

```r
#讀取模擬結果的檔案
merge_data <- data.frame(read.csv("Sim_Results.csv"))
#用測驗長度情境和重複計算分類比率
class <- data.frame(xtabs(~outcome+Rep+n, merge_data))
#把測驗長度變項轉成字串 
class$n <- as.character(class$n)
#重組分類數據
flip_class <- cast(class, Rep+n ~ outcome, value='Freq')
head(flip_class)
```

```
#   Rep   n FF FP  TF   TP
# 1   1 210 70 29 211 1190
# 2   1 240 52 29 211 1208
# 3   1 270 54 32 208 1206
# 4   1 300 50 28 212 1210
# 5   2 210 58 29 211 1202
# 6   2 240 50 34 206 1210
```
---

```r
#計算所有測驗長度情境中FP和FF的平均數
FP_m <- aggregate(FP~n,flip_class, mean)
names(FP_m)[2] = 'Mean'
FP_m$Type <- "FP"
FF_m <- aggregate(FF~n,flip_class, mean)
names(FF_m)[2] = 'Mean'
FF_m$Type <- "FF"
misclass_m <- rbind(FP_m, FF_m)
#計算FP和FF的比例
misclass_m$Rate <- (misclass_m$Mean/Num_Sim)*100
#計算錯誤分類標準差和平均值的標準差
FP_s <- aggregate(FP~n,flip_class, sd)
names(FP_s)[2] = 'SD'
FP_s$SE <- FP_s$SD/(reps**.5)
FP_s$Type <- "FP"
FF_s <- aggregate(FF~n,flip_class, sd)
names(FF_s)[2] = 'SD'
FF_s$SE <- FF_s$SD/(reps**.5)
FF_s$Type <- "FF"
misclass_s <- rbind(FP_s, FF_s)
###合併錯誤分類的結果
misclass_results <- merge(misclass_m, misclass_s, by.x=c("Type","n"), by.y=c("Type","n"))
misclass_results
```
---
####分類結果作圖

```r
knitr::include_graphics("D:/dataM/Tutorial/clsf.png")
```

![](D:/dataM/Tutorial/clsf.png)
---
- 計算模擬數據的誤差

```r
#計算
merge_data$deviance <- merge_data$scale_est - merge_data$scale_true
Bias <- aggregate(deviance~n+ID+scale_true, merge_data, sum)
Bias$bias <- Bias$deviance/(reps)
summary(Bias$bias)
```

```
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -0.9160 -0.1560  0.0960  0.1073  0.3600  1.3520
```
- 計算模擬數據的標準誤

```r
#計算
SE <- aggregate(scale_est~ID+n+scale_true, merge_data, sd)
names(SE)[4] = 'SE'
summary(SE$SE)
```

```
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   2.257   2.851   3.099   3.165   3.403   5.704
```
---
- 計算模擬數據的均方根誤差

```r
merge_data$deviance_sq <- merge_data$deviance**2
RMSE <- aggregate(deviance_sq~n+ID+scale_true, merge_data, sum)
RMSE$RMSE<- sqrt(RMSE$deviance_sq/(reps - 1))
summary(RMSE$RMSE)
```

```
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   2.270   2.870   3.117   3.186   3.425   5.728
```
---
- 計算測驗長度的主要效果

```r
data_ANOVA = data.frame(bias = Bias$bias, 
	group = factor(Bias$n))
ANOVA <- aov(bias ~ group, data=data_ANOVA)
summary(ANOVA)
```

```
#               Df Sum Sq Mean Sq F value Pr(>F)  
# group          3    1.1  0.3735   3.017 0.0287 *
# Residuals   5996  742.4  0.1238                 
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
data_ANOVA$group
data_ANOVA = data.frame(se = SE$SE, 
	group = factor(SE$n))
ANOVA <- aov(se ~ group, data=data_ANOVA)
data_ANOVA = data.frame(rmse = RMSE$RMSE, 
	group = factor(RMSE$n))
ANOVA <- aov(rmse ~ group, data=data_ANOVA)
```

---
####標準差圖

```r
knitr::include_graphics("D:/dataM/Tutorial/se.png")
```

![](D:/dataM/Tutorial/se.png)
---
####誤差圖

```r
knitr::include_graphics("D:/dataM/Tutorial/bias.png")
```

![](D:/dataM/Tutorial/bias.png)
---

#END