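The simulation and evaluation steps are the same as before, but the categorizer function is different. Categorization is based on the formula \[ CrV(\mu, \sigma, P_i) = e^{\mu + \sigma \cdot \mathrm{NORMINV}(P_i)} \] where \(\mu\) and \(\sigma\) are the mean and standard deviation of the log-transformed observed variable, \(P_i\) is the cumulative proportion of observations up to category \(i\), and NORMINV is the inverse of the standard normal CDF (qnorm in R).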
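To make the formula concrete, here is a small worked example. The values of `mu`, `sigma` and `p` are illustrative assumptions, not estimates from the data used in this document. Because the formula is the quantile function of a lognormal distribution, the result can be cross-checked against `qlnorm()`.

```r
# Illustrative values only (not taken from the data in this document)
mu    <- 0.5                 # mean of the log-transformed variable
sigma <- 0.4                 # sd of the log-transformed variable
p     <- c(0.2, 0.5, 0.9)    # cumulative proportions P_i of the lower categories

# NORMINV corresponds to qnorm() in R
crv <- exp(mu + sigma * qnorm(p))
print(crv)

# Cross-check: the thresholds are the matching lognormal quantiles
print(all.equal(crv, qlnorm(p, meanlog = mu, sdlog = sigma)))
```

The thresholds increase with the cumulative proportions, and the middle one equals `exp(mu)` because `qnorm(0.5)` is zero.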
kaf2=kaf
categorizer2=function(var.s,var.o) {
  le=length(table(var.o)) #number of categories in the observed variable
  mu=mean(log(var.o)) #mean of the log-transformed observed variable
  sdlog=sd(log(var.o)) #sd of the log-transformed observed variable
  prob=as.vector(cumsum(table(var.o))/length(var.o)) #cumulative proportions of the categories
  if (le==2){
    crv=exp(mu+(sdlog*qnorm(prob[1]))) #single threshold for a binary variable
    return(ifelse(var.s>crv,2,1))}
  if (le>2){
    crv=exp(mu+(sdlog*qnorm(prob[-le]))) #thresholds from the formula; the last cumulative proportion (always 1) is dropped
    breaks=c(min(var.s),crv,max(var.s))
    return(cut(var.s,right=TRUE,include.lowest=TRUE,breaks=breaks,
               labels=names(table(var.o))))
  } #close if
} #close function
simtan=simdata
for (i in 1:6){
simtan[,i]=categorizer2(simdata[,i],kaf2[,i])}
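The thresholding idea can be tried on toy data. This is a minimal sketch, not the author's pipeline: `obs`, `sim`, `thresholds` and `sim_cat` are hypothetical names, and it assumes the simulated values are lognormal with the same log-mean and log-sd as the observed codes. It also uses `-Inf` and `Inf` as outer breaks instead of the simulated minimum and maximum, which keeps the breaks in order even if a threshold falls outside the simulated range.

```r
set.seed(1)
# Hypothetical observed ordinal variable with 3 categories coded 1, 2, 3
obs <- sample(1:3, size = 500, replace = TRUE, prob = c(0.2, 0.3, 0.5))

mu    <- mean(log(obs))
sigma <- sd(log(obs))
cum_p <- as.vector(cumsum(table(obs)) / length(obs))
cum_p <- cum_p[-length(cum_p)]          # drop the last value (always 1)

# Assumption: the simulated draw is lognormal with the same log-mean and log-sd
sim <- rlnorm(5000, meanlog = mu, sdlog = sigma)

thresholds <- exp(mu + sigma * qnorm(cum_p))
sim_cat <- cut(sim, breaks = c(-Inf, thresholds, Inf),
               labels = names(table(obs)))

# Compare category proportions in the observed and simulated variables
obs_prop <- as.vector(prop.table(table(obs)))
sim_prop <- as.vector(prop.table(table(sim_cat)))
print(round(rbind(observed = obs_prop, simulated = sim_prop), 3))
```

Under the stated assumption the two rows should be close, because each threshold is the lognormal quantile at the observed cumulative proportion.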
The simulation is complete; let’s evaluate it.
2-a) Bias of the simulated means with Tannenbaum approach
| Variable | Bias |
|---|---|
| bin1 | 0.0580000 |
| bin2 | 0.0680000 |
| bin3 | 0.0450000 |
| cat1 | 0.1190000 |
| cat2 | 0.1270000 |
| cat3 | 0.1190000 |
| con1 | -0.0252327 |
| con2 | -0.0256821 |
| con3 | -0.0267577 |
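The code that produced this table is not shown here. One plausible way to compute such a bias is sketched below, assuming bias means the simulated column mean minus the observed column mean. The data frames `observed` and `simulated` are hypothetical stand-ins generated on the spot, not the data sets used above.

```r
set.seed(42)
# Hypothetical stand-ins for the observed and simulated data sets
observed  <- data.frame(bin1 = rbinom(200, 1, 0.4),
                        con1 = rnorm(200, mean = 10))
simulated <- data.frame(bin1 = rbinom(1000, 1, 0.4),
                        con1 = rnorm(1000, mean = 10))

# Assumed definition: bias = simulated mean - observed mean, column by column
bias_means <- colMeans(simulated) - colMeans(observed)
print(round(bias_means, 3))
```

With this sign convention a positive value means the simulation overshoots the observed mean. Categorical columns stored as factors would need to be converted to numeric codes first.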
Bias of the simulated covariance matrix with Tannenbaum approach
| | bin1 | bin2 | bin3 | cat1 | cat2 | cat3 | con1 | con2 | con3 |
|---|---|---|---|---|---|---|---|---|---|
| bin1 | -0.01 | | | | | | | | |
| bin2 | -0.028 | 0.015 | | | | | | | |
| bin3 | -0.046 | -0.023 | -0.006 | | | | | | |
| cat1 | -0.035 | -0.031 | -0.042 | -0.069 | | | | | |
| cat2 | -0.014 | 0.021 | 0.007 | 0.023 | -0.144 | | | | |
| cat3 | -0.074 | -0.041 | -0.061 | -0.027 | -0.049 | -0.09 | | | |
| con1 | -0.079 | -0.052 | -0.052 | -0.023 | -0.038 | -0.026 | 0.029 | | |
| con2 | -0.012 | -0.02 | -0.005 | -0.105 | -0.058 | -0.002 | 0.024 | -0.027 | |
| con3 | -0.02 | -0.013 | -0.01 | -0.009 | -0.171 | 0.016 | 0.053 | 0.016 | 0.002 |
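A lower-triangular bias table like the one above could be produced along the following lines. This is a sketch under an assumed definition (simulated covariance minus observed covariance, element by element); the data frames `observed` and `simulated` are hypothetical toy data.

```r
set.seed(7)
# Hypothetical stand-ins for the observed and simulated data sets
observed  <- data.frame(a = rnorm(300),  b = rnorm(300),  c = rnorm(300))
simulated <- data.frame(a = rnorm(1000), b = rnorm(1000), c = rnorm(1000))

# Assumed definition: element-wise difference of the covariance matrices
bias_cov <- cov(simulated) - cov(observed)

# The matrix is symmetric, so keep only the lower triangle and the diagonal
bias_cov[upper.tri(bias_cov)] <- NA
print(round(bias_cov, 3), na.print = "")
```

The diagonal holds the bias of the variances and the off-diagonal cells hold the bias of the covariances.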
Frequencies of the simulated categoricals with Tannenbaum approach
Proportions of the simulated categoricals with Tannenbaum approach
apply(simtan[,4:6],2,function(x){table(x)/nrow(simtan)})
## $cat1
## x
## 1 2 3 4 5
## 0.035 0.085 0.229 0.222 0.429
##
## $cat2
## x
## 1 2 3
## 0.042 0.093 0.865
##
## $cat3
## x
## 1 2 3 4
## 0.238 0.261 0.372 0.129
In terms of replicating the means, the ordinary approach seems to outperform Tannenbaum’s approach.
In terms of replicating the covariance matrix, the two methods perform roughly equally, with one exception: the 3rd categorical variable seems to be closer to normally distributed than the other categorical variables, and for it Tannenbaum’s approach slightly outperforms the ordinary approach. Further evaluation is needed.
When the data contain different types of variables, the Pearson correlation matrix might be misleading. Let’s use the polycor package, whose hetcor function computes Pearson correlations between numeric variables, polyserial correlations between numeric and ordinal variables, and polychoric correlations between ordinal variables.
library(polycor)
## Loading required package: sfsmisc
round(hetcor(kaf)$correlations,2)
## bin1 bin2 bin3 cat1 cat2 cat3 con1 con2 con3
## bin1 1.00 0.82 0.99 0.43 0.06 0.69 0.93 0.29 0.38
## bin2 0.82 1.00 0.83 0.50 -0.05 0.62 0.72 0.38 0.29
## bin3 0.99 0.83 1.00 0.46 -0.03 0.68 0.88 0.31 0.37
## cat1 0.43 0.50 0.46 1.00 -0.07 0.31 0.52 0.67 0.34
## cat2 0.06 -0.05 -0.03 -0.07 1.00 0.31 0.26 0.26 0.72
## cat3 0.69 0.62 0.68 0.31 0.31 1.00 0.74 0.33 0.35
## con1 0.93 0.72 0.88 0.52 0.26 0.74 1.00 0.48 0.41
## con2 0.29 0.38 0.31 0.67 0.26 0.33 0.48 1.00 0.34
## con3 0.38 0.29 0.37 0.34 0.72 0.35 0.41 0.34 1.00
round(hetcor(simkaf)$correlations,2)
## bin1 bin2 bin3 cat1 cat2 cat3 con1 con2 con3
## bin1 1.00 0.64 0.88 0.32 -0.03 0.57 0.67 0.26 0.32
## bin2 0.64 1.00 0.64 0.45 0.05 0.50 0.61 0.36 0.24
## bin3 0.88 0.64 1.00 0.35 -0.03 0.57 0.66 0.30 0.33
## cat1 0.32 0.45 0.35 1.00 0.02 0.27 0.44 0.57 0.33
## cat2 -0.03 0.05 -0.03 0.02 1.00 0.22 0.17 0.22 0.50
## cat3 0.57 0.50 0.57 0.27 0.22 1.00 0.73 0.35 0.39
## con1 0.67 0.61 0.66 0.44 0.17 0.73 1.00 0.50 0.44
## con2 0.26 0.36 0.30 0.57 0.22 0.35 0.50 1.00 0.36
## con3 0.32 0.24 0.33 0.33 0.50 0.39 0.44 0.36 1.00
round(hetcor(simtan)$correlations,2)
## bin1 bin2 bin3 cat1 cat2 cat3 con1 con2 con3
## bin1 1.00 0.67 0.89 0.35 -0.01 0.56 0.67 0.27 0.33
## bin2 0.67 1.00 0.69 0.41 0.10 0.51 0.60 0.34 0.24
## bin3 0.89 0.69 1.00 0.37 0.04 0.58 0.66 0.31 0.34
## cat1 0.35 0.41 0.37 1.00 0.02 0.27 0.44 0.59 0.33
## cat2 -0.01 0.10 0.04 0.02 1.00 0.28 0.20 0.22 0.48
## cat3 0.56 0.51 0.58 0.27 0.28 1.00 0.74 0.34 0.39
## con1 0.67 0.60 0.66 0.44 0.20 0.74 1.00 0.50 0.44
## con2 0.27 0.34 0.31 0.59 0.22 0.34 0.50 1.00 0.36
## con3 0.33 0.24 0.34 0.33 0.48 0.39 0.44 0.36 1.00
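The reason a Pearson matrix can mislead for categorized variables can be seen in a small base R experiment. It is a toy illustration with assumed values (`rho = 0.8`, cut points at zero), not an analysis of the data above: two correlated normal variables are dichotomized, and the Pearson correlation of the resulting binary variables is compared with the correlation of the underlying continuous pair.

```r
set.seed(123)
n   <- 10000
rho <- 0.8                                   # assumed latent correlation
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)    # latent pair with correlation near rho

# Dichotomize both variables at zero
xb <- as.integer(x > 0)
yb <- as.integer(y > 0)

r_latent <- cor(x, y)
r_binary <- cor(xb, yb)
print(round(c(latent = r_latent, dichotomized = r_binary), 2))
```

For cuts at the median, the expected Pearson correlation of the binary pair is (2/π)·asin(ρ), so it falls clearly below the latent value. Polychoric and polyserial correlations aim to estimate the latent correlation instead, which is why the hetcor matrices are used for the comparison here.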
Reference: http://link.springer.com/article/10.1007%2Fs10928-006-9033-1