The sample mean (a statistic) of a random variable (RV) X is: \[\bar{X}=\frac{1}{n}\sum_{i=1}^{n} X_i\]
People usually take the mean of each column (as in a spreadsheet), sum those means, and divide by the number of means. This is an unbiased estimator of the grand mean AS LONG AS ALL COLUMNS HAVE EQUAL LENGTH!
When the group counts are unequal (unbalanced data), this procedure gives a biased estimator of the grand mean, and any statistic built on it inherits that bias.
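A short derivation shows why. The symbols are introduced here for illustration and are not from the original text: n_j is the size of group j, \bar{X}_j its mean, X_{ij} the i-th value in group j, J the number of groups, and N the total count (the sum of the n_j). The grand mean of the pooled values is a size-weighted mean of the group means:
\[\bar{X}_G=\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{n_j}X_{ij}=\frac{1}{N}\sum_{j=1}^{J}n_j\,\bar{X}_j\]
Only when every group has the same size m (so N = Jm) does this reduce to the plain mean of the means:
\[n_1=\dots=n_J=m\;\Rightarrow\;\bar{X}_G=\frac{m}{Jm}\sum_{j=1}^{J}\bar{X}_j=\frac{1}{J}\sum_{j=1}^{J}\bar{X}_j\]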
– Let i index the values X_i within a group and j index the groups (1, 2, 3), whose vectors are gathered into one BIG random vector. Here j=1:3, since three group vectors are constructed, each representing a sample of an RV (random variable).
Usually the mean estimator is the sum of the x's (or X's, as RVs) divided by the total count of the x's, each value counting once (n1=1, n2=1, …). Let's create three vectors xi, xj, xk as samples of the RV:
xi=c(2,3,5,6,5,5) # 6 values
xj=c(2,3,3,4,7,5) # 6 values
xk=c(4,4,5,2,8,4) # 6 values
xi
## [1] 2 3 5 6 5 5
##create a mega vector of all x's
xijk=c(xi,xj,xk)
xijk
## [1] 2 3 5 6 5 5 2 3 3 4 7 5 4 4 5 2 8 4
summary(xijk)#GRAND MEAN is 4.278
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 4.000 4.278 5.000 8.000
length(xijk)#3x6=18 values counted
## [1] 18
# proof from the mathematical definition of the mean
GRANDMEAN=sum(xi,xj,xk)/18
GRANDMEAN
## [1] 4.277778
mean(xi)
## [1] 4.333333
mean(xj)
## [1] 4
mean(xk)
## [1] 4.5
## what we all do: sum the 3 means and divide by 3
sum(mean(xi),mean(xj),mean(xk))/3
## [1] 4.277778
# The mean of means is an unbiased estimator of the GRAND MEAN here because n_i is equal in each of the j groups: cool, the maths is correct
xi2=c(2,3,5,6) # 4 values
xj2=c(2,3,3,4,7,5) # 6 values
xk2=c(4,2) # 2 values
# 12 values in j=3 groups
##create a mega vector of all x's
xi2j2k2=c(xi2,xj2,xk2)
xi2j2k2
## [1] 2 3 5 6 2 3 3 4 7 5 4 2
summary(xi2j2k2) # GRAND MEAN is 3.833
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.750 3.500 3.833 5.000 7.000
GRANDMEAN=sum(mean(xi2),mean(xj2),mean(xk2))/3 # unweighted mean of the 3 group means: no longer equal to the true grand mean 3.833
GRANDMEAN
## [1] 3.666667
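The unbiased alternative is the weighted mean of the group means over the J = 3 groups, with the group sizes as weights (w_j = n_j): \[\tilde{X}_G=\frac{1}{\sum_{j=1}^{J} w_j}\sum_{j=1}^{J} w_j\,\bar{X}_j\]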
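As a minimal sketch of this weighted estimator, the unbalanced groups can be run through the formula directly. It assumes the weights are the group sizes (w_j = n_j), which is my reading of the formula; the names `groups`, `group_means`, `w` and `weighted_gmean` are made up for the illustration.

```r
# Unbalanced groups from the example (4, 6 and 2 values)
xi2 <- c(2, 3, 5, 6)
xj2 <- c(2, 3, 3, 4, 7, 5)
xk2 <- c(4, 2)
groups <- list(xi2, xj2, xk2)

group_means <- sapply(groups, mean)  # per-group means: 4, 4, 3
w <- lengths(groups)                 # assumed weights w_j = n_j: 4, 6, 2

# Weighted mean of the group means: sum(w_j * mean_j) / sum(w_j)
weighted_gmean <- sum(w * group_means) / sum(w)

# Cross-checks: base R's weighted.mean() and the mean of the pooled values
all.equal(weighted_gmean, weighted.mean(group_means, w))
all.equal(weighted_gmean, mean(unlist(groups)))

weighted_gmean     # 3.833333, matches the grand mean
mean(group_means)  # 3.666667, the unweighted mean of means
```

With equal group sizes the weights cancel, so both estimators coincide, as in the balanced 3x6 case.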
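library(lattice) # assumption: dotplot() is the lattice function; the original does not name the package
df1=data.frame(estimator=factor(c("weighted","unweighted"),levels=c("weighted","unweighted")),value=c(3.833,3.667))
dotplot(value~estimator,data=df1,cex=3,col=c(3,2),main="UNBIASED VS BIASED G_MEAN (RED)",xlab="",ylab="value of G-mean estimators")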