class: center, middle, inverse, title-slide # Conducting Simulation Studies in Psychometrics ## Educational Measurement: Issues and Practice ### 王慧心 顏立平 ### 2018.6.4 --- #Outline - Introduction - Uses of, and Common Terms in, Simulation - Simulation used in educational and psychological measurement - Principles and recommendations --- #Introduction -Disavetage of real data: - Time consuming - Costly to collect - May be inadequate to support the analyses, inferences, and conclusions - Research questions simply cannot be answered using empirical data --- #Introduction -History - The quality of beer (Student, 1908) - Nuclear warfare (von Neumann & Ulam, 1949) - Evaluating model – data fit (Hambleton, Swaminathan, & Rogers, 1991) - Item response theory (Harwell, Stone, Hsu, & Kirisci, 1996) - 2012 National Council on Measurement in Education (NCME) --- #Uses of, and Common Terms in, Simulation -Simulation methodologies - Monte Carlo (MC) simulation -Resampling methodologies - Bootstrap - Jackknife --- #Uses of, and Common Terms in, Simulation -Term - Factor - Condition - Replication - Bias - Standard error - Mean squared error --- #Simulation used in educational and psychological measurement -Types of Data Simulated - IRT Parameters - Raw(True) Scores - Response Times -Type of Research Utilizing Simulation - IRT Model Fit - Reliability --- #Simulation used in educational and psychological measurement -Number of Replications - 499 or Fewer (33%) - 1,000–4,999 (19%) -Results Present - Bias/RMSE - Type I/Power - ANOVA - Model Fit --- #Principles and recommendations -Steps taken in an IRT simulation study(Harwell et al.,1996) - Specifying a research question - Delineating conditions - Choosing an experimental design - Generating data - Estimating parameters - Replicating the procedure - Analyzing results --- #Principles and recommendations -Information for reader - Type and version of software - Type of hardware - Random number of seeds set - Distribution - Number of replications --- #Principles and recommendations -Analyzing results - Correlation - Bias - Mean absolute difference (MAD) - SE - Mean square error (MSE) - RMSE --- ##Set-Up ####設定設定讀取和寫入檔案的位置 ```r setwd('D:/dataM/Tutorial') ``` --- ##Set Simulation Conditions #### 1.設定模擬人數和題數 ```r Num_Sim <- 1500 Num_Items <- 300 ``` #### 2.設定題數的區間(長短和間距) ```r min_length <- 210 max_length <- 300 interval <- 30 ``` #### 3.Set theta mean/sd 設定能力參數 ```r Theta_mean <- 1.6 Theta_sd <- 0.5 ``` --- #### 4.設定試題難度參數 ```r Rasch_mean <- 0.4 Rasch_sd <- 0.8 ``` #### 5.設定分數 (41通過) ```r scale_mean <- 50 scale_sd <- 10 scale_cut <- 41 ``` #### 6.設定重複次數 ```r reps <- 250 ``` --- #### 7.載package並設定亂數種子 ```r ### Load IRT Toys for IRT estimation library(irtoys) ### Load reshape package for restructuring data library(reshape) set.seed(5) ``` #### 8.產生真實能力參數的模擬數據 ```r Simulees <- data.frame("Theta" = rnorm(n=Num_Sim, mean=Theta_mean, sd=Theta_sd)) ``` #### 9.產生每筆模擬數據的ID ```r Simulees$ID <- seq (1 : Num_Sim) ``` -- ###Setup completed -- ###Begin Simulation --- ```r ptm <- proc.time() REP <- vector("list", reps) for (r in 1:reps) { Item_Parameters <- data.frame("Difficulty" = rnorm(n=Num_Items, mean=Rasch_mean, sd=Rasch_sd)) Item_Parameters$Item_ID <- seq (1 : Num_Items) data <- merge(Item_Parameters, Simulees) data$Model <- round(1/(1 + exp(data$Difficulty - data$Theta)), 5) data$Uniform <- runif(nrow(data), min=0, max=1) data$Score <- as.numeric(ifelse(data$Model > data$Uniform, yes="1", no="0")) flip_data <- cast(data, ID ~ Item_ID, value='Score') resp_data <- data.frame(flip_data[c(-1)]) Diffs <- aggregate(data$Difficulty, list(data$Item_ID), max) a_parm <- rep.int(1, nrow(Item_Parameters)) c_parm <- rep.int(0, nrow(Item_Parameters)) DIFF <- as.matrix(data.frame(a_parm, Diffs$x, c_parm)) for (j in seq(min_length,max_length,interval)) { est <- data.frame(mlebme(resp_data[ ,1:j], DIFF[1:j, ], method = "ML")) est$ID <- seq (1 : Num_Sim) infile <- paste("rep",r,"n",j,".csv",sep="") write.csv( est, file=infile) } } proc.time() - ptm ``` ``` # user system elapsed # 8535.64 26.96 8602.27 ``` --- - 示意圖 #####生成的檔案會出現在設定的資料夾 ```r knitr::include_graphics("D:/dataM/Tutorial/02.png") ``` <!-- --> --- ###END SIMULATION And Then...... - 合併檔案 ```r #生成容器儲存數據 REP <- vector("list", reps) Length <- vector("list", length(seq(min_length,max_length,interval))) #讀取所有結果檔案 for (r in 1:reps) { for (j in seq(min_length,max_length,interval)) { infile <- paste("rep",r,"n",j,".csv",sep="") Length[[j]] <- data.frame(read.csv(infile),"Rep"=r) } REP[[r]] <- data.frame(do.call("rbind",Length)) } #合併所有數據 Reps_Combined <- data.frame(do.call("rbind",REP)) #將估計的能力參數和真實能力參數配對 merge_data <- merge(Simulees, Reps_Combined, by.x="ID", by.y="ID") #將能力參數轉換成量尺分數 merge_data$scale_true <- round((((merge_data$Theta - Theta_mean)/Theta_sd)*scale_sd)+scale_mean) merge_data$scale_est <- round((((merge_data$est - Theta_mean)/Theta_sd)*scale_sd)+scale_mean) ``` --- ```r #計算每個能力參數的估計結果 #TRUE PASSER merge_data$outcome[((merge_data$scale_true >= scale_cut) & (merge_data$scale_est >= scale_cut))] = "TP" #TRUE FAILER merge_data$outcome[((merge_data$scale_true < scale_cut) & (merge_data$scale_est < scale_cut))] = "TF" #FALSE PASSER merge_data$outcome[((merge_data$scale_true < scale_cut) & (merge_data$scale_est >= scale_cut))] = "FP" #FALSE FAILER merge_data$outcome[((merge_data$scale_true >= scale_cut) & (merge_data$scale_est < scale_cut))] = "FF" merge_data <- merge_data[order(merge_data$ID, merge_data$n, merge_data$Rep),]#合併所有data head(merge_data,8) ``` ``` # ID Theta X est sem n Rep scale_true scale_est outcome # 1 1 1.179572 1 1.3862908 0.1584188 210 1 42 46 TP # 535 1 1.179572 1 1.0035421 0.1531658 210 2 42 38 FF # 583 1 1.179572 1 0.9962992 0.1511332 210 3 42 38 FF # 144 1 1.179572 1 0.9109683 0.1523238 210 4 42 36 FF # 676 1 1.179572 1 1.1239626 0.1545554 210 5 42 40 FF # 477 1 1.179572 1 1.1948493 0.1576788 210 6 42 42 TP # 890 1 1.179572 1 1.1703000 0.1551567 210 7 42 41 TP # 450 1 1.179572 1 0.9261008 0.1511580 210 8 42 37 FF ``` ```r write.csv( merge_data, file= "Sim_Results.csv") ``` --- - 計算及整理分類結果 ```r #讀取模擬結果的檔案 merge_data <- data.frame(read.csv("Sim_Results.csv")) #用測驗長度情境和重複計算分類比率 class <- data.frame(xtabs(~outcome+Rep+n, merge_data)) #把測驗長度變項轉成字串 class$n <- as.character(class$n) #重組分類數據 flip_class <- cast(class, Rep+n ~ outcome, value='Freq') head(flip_class) ``` ``` # Rep n FF FP TF TP # 1 1 210 70 29 211 1190 # 2 1 240 52 29 211 1208 # 3 1 270 54 32 208 1206 # 4 1 300 50 28 212 1210 # 5 2 210 58 29 211 1202 # 6 2 240 50 34 206 1210 ``` --- ```r #計算所有測驗長度情境中FP和FF的平均數 FP_m <- aggregate(FP~n,flip_class, mean) names(FP_m)[2] = 'Mean' FP_m$Type <- "FP" FF_m <- aggregate(FF~n,flip_class, mean) names(FF_m)[2] = 'Mean' FF_m$Type <- "FF" misclass_m <- rbind(FP_m, FF_m) #計算FP和FF的比例 misclass_m$Rate <- (misclass_m$Mean/Num_Sim)*100 #計算錯誤分類標準差和平均值的標準差 FP_s <- aggregate(FP~n,flip_class, sd) names(FP_s)[2] = 'SD' FP_s$SE <- FP_s$SD/(reps**.5) FP_s$Type <- "FP" FF_s <- aggregate(FF~n,flip_class, sd) names(FF_s)[2] = 'SD' FF_s$SE <- FF_s$SD/(reps**.5) FF_s$Type <- "FF" misclass_s <- rbind(FP_s, FF_s) ###合併錯誤分類的結果 misclass_results <- merge(misclass_m, misclass_s, by.x=c("Type","n"), by.y=c("Type","n")) misclass_results ``` --- ####分類結果作圖 ```r knitr::include_graphics("D:/dataM/Tutorial/clsf.png") ``` <!-- --> --- - 計算模擬數據的誤差 ```r #計算 merge_data$deviance <- merge_data$scale_est - merge_data$scale_true Bias <- aggregate(deviance~n+ID+scale_true, merge_data, sum) Bias$bias <- Bias$deviance/(reps) summary(Bias$bias) ``` ``` # Min. 1st Qu. Median Mean 3rd Qu. Max. # -0.9160 -0.1560 0.0960 0.1073 0.3600 1.3520 ``` - 計算模擬數據的標準誤 ```r #計算 SE <- aggregate(scale_est~ID+n+scale_true, merge_data, sd) names(SE)[4] = 'SE' summary(SE$SE) ``` ``` # Min. 1st Qu. Median Mean 3rd Qu. Max. # 2.257 2.851 3.099 3.165 3.403 5.704 ``` --- - 計算模擬數據的均方根誤差 ```r merge_data$deviance_sq <- merge_data$deviance**2 RMSE <- aggregate(deviance_sq~n+ID+scale_true, merge_data, sum) RMSE$RMSE<- sqrt(RMSE$deviance_sq/(reps - 1)) summary(RMSE$RMSE) ``` ``` # Min. 1st Qu. Median Mean 3rd Qu. Max. # 2.270 2.870 3.117 3.186 3.425 5.728 ``` --- - 計算測驗長度的主要效果 ```r data_ANOVA = data.frame(bias = Bias$bias, group = factor(Bias$n)) ANOVA <- aov(bias ~ group, data=data_ANOVA) summary(ANOVA) ``` ``` # Df Sum Sq Mean Sq F value Pr(>F) # group 3 1.1 0.3735 3.017 0.0287 * # Residuals 5996 742.4 0.1238 # --- # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r data_ANOVA$group data_ANOVA = data.frame(se = SE$SE, group = factor(SE$n)) ANOVA <- aov(se ~ group, data=data_ANOVA) data_ANOVA = data.frame(rmse = RMSE$RMSE, group = factor(RMSE$n)) ANOVA <- aov(rmse ~ group, data=data_ANOVA) ``` --- ####標準差圖 ```r knitr::include_graphics("D:/dataM/Tutorial/se.png") ``` <!-- --> --- ####誤差圖 ```r knitr::include_graphics("D:/dataM/Tutorial/bias.png") ``` <!-- --> --- #END