Moments

The r\(^{th}\) moment of a variable x about any point x = A, denoted by \(\mu_r^\prime\), is given by

\[\begin{equation} \mu_r^\prime = \frac{1}{N} \sum_i f_i\, (x_i-A)^r \end{equation}\]
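The definition above translates almost directly into R. The following is a minimal sketch: the helper name `raw_moment` is hypothetical (it is not part of these notes), and the data are the x and f values of Example 1, with A = 3 chosen only for illustration.

```r
# r-th moment of a frequency distribution about an arbitrary point A
# (raw_moment is an illustrative helper name, not a base R function)
raw_moment <- function(x, f, r, A = 0) {
  sum(f * (x - A)^r) / sum(f)
}

x <- 0:8
f <- c(1, 8, 28, 56, 70, 56, 28, 8, 1)

m1 <- raw_moment(x, f, 1, A = 3)  # first moment about A = 3
m2 <- raw_moment(x, f, 2, A = 3)  # second moment about A = 3
c(m1 = m1, m2 = m2)
```

Setting `A` to the mean of the distribution gives the central moments.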

Central Moments

Putting A = \(\bar{x}\), the rth central moment is \[\begin{equation} \mu_r = \frac{1}{N} \sum_i f_i\, ({x_i-\bar{x}})^r \end{equation}\]

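Writing \(d_i = x_i - A\), so that \(\bar{x} - A = \mu_1^\prime\), the central moments can be expressed in terms of the moments about A:

\[\begin{equation} \begin{array}{rcl} \mu_r & = & \frac{1}{N}\sum_i f_i\, (d_i-\mu_1^\prime)^r\\ &=& \frac{1}{N}\sum_i f_i \left[ d_i^r-C_1^r d_i^{r-1} \mu_1^\prime +C_2^r d_i^{r-2} \mu_1^{\prime \, 2}-\cdots +(-1)^r\mu_1^{\prime \, r} \right] \end{array} \end{equation}\]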
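Summing the binomial expansion term by term gives \(\mu_r = \sum_k (-1)^k C_k^r \mu_{r-k}^\prime \mu_1^{\prime\,k}\) with \(\mu_0^\prime = 1\). The sketch below applies this to the data of Example 1, taking A = 3 as an arbitrary origin; the function name `central_from_raw` is an assumption made for illustration.

```r
# Convert moments about an arbitrary point A into central moments using
# mu_r = sum_k (-1)^k * C(r, k) * mu'_{r-k} * mu'_1^k, with mu'_0 = 1.
# (central_from_raw is an illustrative name, not a library function.)
central_from_raw <- function(raw) {
  raw0 <- c(1, raw)  # raw0[j + 1] holds mu'_j, so raw0[1] is mu'_0 = 1
  sapply(seq_along(raw), function(r) {
    k <- 0:r
    sum((-1)^k * choose(r, k) * raw0[r - k + 1] * raw[1]^k)
  })
}

x <- 0:8
f <- c(1, 8, 28, 56, 70, 56, 28, 8, 1)
A <- 3  # arbitrary origin, chosen only for the illustration

raw <- sapply(1:4, function(r) sum(f * (x - A)^r) / sum(f))
mu  <- central_from_raw(raw)
mu
```

The result can be compared with the central moments obtained directly in Example 1.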

The effect of change of origin and scale on the moments

Let \[ U=\frac{X-A}{h} \]

then

\[ {\mu_r}^{X} = h^r{\mu_r}^{U} \] Thus the rth moment of the variable X about its mean is \(h^r\) times the rth moment of the variable U about its mean.
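A quick numerical check of this relation, as a sketch only: the mid-points 5, 15, 25, 35 and frequencies 1, 3, 4, 2 are those of Example 2, while A = 25, h = 10 and the helper `central_moment` are assumptions made for the illustration.

```r
# r-th central moment of a frequency distribution (illustrative helper)
central_moment <- function(x, f, r) {
  xbar <- sum(f * x) / sum(f)
  sum(f * (x - xbar)^r) / sum(f)
}

x <- seq(5, 35, 10)  # class mid-points
f <- c(1, 3, 4, 2)   # frequencies
A <- 25              # new origin (assumed)
h <- 10              # scale, here the class width (assumed)
u <- (x - A) / h

lhs <- sapply(1:4, function(r) central_moment(x, f, r))        # moments of X
rhs <- sapply(1:4, function(r) h^r * central_moment(u, f, r))  # h^r * moments of U
all.equal(lhs, rhs)
```

Because the central moments do not depend on A, any choice of origin gives the same agreement.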

Sheppard’s Corrections for Moments

The error introduced by grouping the observations at the mid-points of the class intervals can be corrected by the following formulae, known as Sheppard’s corrections: \[ \begin{array}{rcl} \mu_2(corrected) &=& \mu_2-\frac{h^2}{12}\\ \mu_3(corrected) &=& \mu_3\\ \mu_4(corrected) &=& \mu_4-\frac{h^2}{2}\mu_2+\frac{7}{240}h^4 \end{array} \]

where h is the width of the class interval.
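The corrections can be wrapped in a small function. This is a sketch under stated assumptions: the name `sheppard` is hypothetical, and the inputs are the uncorrected central moments of Example 2 (81, -144, 14817) with class width h = 10.

```r
# Sheppard's corrections for the second, third and fourth central moments
# (sheppard is an illustrative name; h is the class width)
sheppard <- function(mu2, mu3, mu4, h) {
  c(mu2 = mu2 - h^2 / 12,
    mu3 = mu3,                                  # third moment needs no correction
    mu4 = mu4 - h^2 / 2 * mu2 + 7 / 240 * h^4)
}

corrected <- sheppard(mu2 = 81, mu3 = -144, mu4 = 14817, h = 10)
round(corrected, 2)
```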

Charlier’s Checks

The following identities \[ \begin{array}{rcl} \sum f(x+1) &=& \sum fx +N\\ \sum f(x+1)^2 &=& \sum f x^2+ 2 \sum f x+ N\\ \sum f(x+1)^3 &=& \sum f x^3+ 3 \sum f x^2 + 3 \sum f x+ N\\ \sum f(x+1)^4 &=& \sum f x^4+ 4 \sum f x^3 + 6 \sum f x^2+ 4 \sum f x + N \end{array} \] are often used to check the accuracy of the calculation of the first four moments and are known as Charlier’s Checks.
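As an illustration, the checks can be run on the deviations used in Example 1 (x measured from A = 4). The sketch includes the analogous identity for the square, \(\sum f(x+1)^2 = \sum fx^2 + 2\sum fx + N\); the variable names are chosen here and are not part of the notes.

```r
x <- -4:4                              # deviations from A = 4, as in Example 1
f <- c(1, 8, 28, 56, 70, 56, 28, 8, 1)
N <- sum(f)

# s[r] holds the column total sum(f * x^r) for r = 1, ..., 4
s <- sapply(1:4, function(r) sum(f * x^r))

# Each entry is TRUE when the corresponding Charlier check is satisfied
check <- c(
  sum(f * (x + 1))   == s[1] + N,
  sum(f * (x + 1)^2) == s[2] + 2 * s[1] + N,
  sum(f * (x + 1)^3) == s[3] + 3 * s[2] + 3 * s[1] + N,
  sum(f * (x + 1)^4) == s[4] + 4 * s[3] + 6 * s[2] + 4 * s[1] + N
)
check
```

A `FALSE` entry would point to an arithmetic slip in one of the column totals.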

Conversion formulae from non-central to central moments

The non-central moment (the moment about A) is \[ \mu^\prime_r = \frac{1}{N}\sum_i f_i (x_i-A)^r \]

Add and subtract \(\bar{x}\) inside the bracket, and write \(z_i = x_i-\bar{x}\) and \(\mu_1^\prime = \bar{x}-A\):

\[ \begin{array}{rcl} \mu^\prime_r &=& \frac{1}{N}\sum_i f_i (x_i-\bar{x}+\bar{x}-A)^r\\ &=& \frac{1}{N}\sum_i f_i (z_i+\mu_1^\prime)^r\\ &=& \frac{1}{N}\sum_i f_i \left[ \sum_k C^r_k z_i^{r-k} {\mu_1^\prime}^k \right] \\ &=& \sum_k C^r_k \mu_{r-k}\, {\mu_1^\prime}^k\\ &=& \mu_r+C^r_1\mu_{r-1}{\mu_1^\prime}^1+\cdots+{\mu_1^\prime}^r \end{array} \]
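The last line of the derivation can be evaluated numerically. The sketch below is only illustrative: `raw_from_central` is a hypothetical helper, the central moments (0, 2, 0, 11) are those found in Example 1, and \(\mu_1^\prime = 1\) corresponds to taking A = 3 for that distribution.

```r
# Moments about A from central moments:
# mu'_r = sum_k C(r, k) * mu_{r-k} * mu'_1^k, with mu_0 = 1 and mu_1 = 0.
# (raw_from_central is an illustrative name, not a library function.)
raw_from_central <- function(mu, m1) {
  mu0 <- c(1, mu)  # mu0[j + 1] holds mu_j, so mu0[1] is mu_0 = 1
  sapply(seq_along(mu), function(r) {
    k <- 0:r
    sum(choose(r, k) * mu0[r - k + 1] * m1^k)
  })
}

raw <- raw_from_central(mu = c(0, 2, 0, 11), m1 = 1)
raw
```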

Alpha Coefficients

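The alpha coefficients are the central moments expressed in units of the standard deviation \(\sigma = \sqrt{\mu_2}\), so that \(\alpha_1 = 0\) and \(\alpha_2 = 1\) always: \[ \begin{array}{rcl} \alpha_1 &=& \frac{\mu_1}{\sigma}\\ \alpha_2 &=& \frac{\mu_2}{\sigma^2}\\ \alpha_3 &=& \frac{\mu_3}{\sigma^3}\\ \alpha_4 &=& \frac{\mu_4}{\sigma^4} \end{array} \]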

Pearson’s \(\beta\) and \(\gamma\) Coefficients

Karl Pearson defined the following four coefficients, based upon the first four moments about the mean:

\[ \begin{array}{rcl} \beta_1 &=&\frac{\mu_3^2}{\mu_2^3}\\ \gamma_1 &=& + \sqrt{\beta_1}\\ \beta_2 &=& \frac{\mu_4}{\mu_2^2}\\ \gamma_2 &=& \beta_2 - 3 \end{array} \]
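The four coefficients follow directly from \(\mu_2\), \(\mu_3\) and \(\mu_4\). A minimal sketch, with `pearson_coef` as a hypothetical helper name and the central moments of Example 1 as input:

```r
# Pearson's beta and gamma coefficients from the central moments
# (pearson_coef is an illustrative name; gamma1 is taken as the positive
# root, following the definition given in these notes)
pearson_coef <- function(mu2, mu3, mu4) {
  beta1 <- mu3^2 / mu2^3
  beta2 <- mu4 / mu2^2
  c(beta1 = beta1, gamma1 = sqrt(beta1), beta2 = beta2, gamma2 = beta2 - 3)
}

coefs <- pearson_coef(mu2 = 2, mu3 = 0, mu4 = 11)
coefs
```

Since \(\beta_1\) is a square, it carries no sign; the direction of the skewness has to be read from the sign of \(\mu_3\).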

Kurtosis

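The frequency distribution is classified by the value of \(\beta_2\) (equivalently \(\gamma_2\)): it is leptokurtic if \(\beta_2 > 3\), mesokurtic if \(\beta_2 = 3\) (as for the normal curve), and platykurtic if \(\beta_2 < 3\).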

Measures of Skewness

Karl Pearson’s coefficient of skewness is \[ s_k = \frac{\bar{x}-Mo}{\sigma} \] where \(\bar{x}\) is the mean, Mo is the mode and \(\sigma\) is the standard deviation.

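The frequency distribution is asymmetric (skewed) when \(s_k \neq 0\): positively skewed if \(s_k > 0\) and negatively skewed if \(s_k < 0\).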

Based upon moments

The coefficient of skewness is \[ Sk = \frac{\sqrt{\beta_1}(\beta_2 + 3 )}{2 (5\beta_2-6\beta_1-9)} \]
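The moment-based coefficient is easy to evaluate once \(\beta_1\) and \(\beta_2\) are known. The sketch below uses a hypothetical helper `sk_moment` and takes its inputs from the central moments of Example 1 (symmetric data) and Example 2.

```r
# Coefficient of skewness based on moments
# (sk_moment is an illustrative name, not a library function)
sk_moment <- function(beta1, beta2) {
  sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))
}

# Example 1: mu2 = 2, mu3 = 0, mu4 = 11
sk1 <- sk_moment(beta1 = 0^2 / 2^3, beta2 = 11 / 2^2)

# Example 2: mu2 = 81, mu3 = -144, mu4 = 14817
sk2 <- sk_moment(beta1 = (-144)^2 / 81^3, beta2 = 14817 / 81^2)

c(sk1 = sk1, sk2 = sk2)
```

The formula returns a non-negative value whenever the denominator is positive, so the sign of \(\mu_3\) still has to be consulted for the direction of the skewness.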

Coefficient of Variation

It is given by \[ CV=\frac{\sigma}{\bar{x}}\times 100 \] A smaller CV indicates a more consistent (more uniform) distribution.
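For grouped data the CV can be computed from the mid-points and frequencies. A minimal sketch, assuming a hypothetical helper `cv` and using the refrigerator data of Example 3:

```r
# Coefficient of variation (in percent) of a frequency distribution
# (cv is an illustrative name, not a base R function)
cv <- function(x, f) {
  xbar  <- sum(f * x) / sum(f)
  sigma <- sqrt(sum(f * (x - xbar)^2) / sum(f))
  sigma / xbar * 100
}

midx <- seq(1, 11, 2)                      # class mid-points
cv_A <- cv(midx, c(5, 16, 13, 7, 5, 4))    # model A frequencies
cv_B <- cv(midx, c(2, 7, 12, 19, 9, 1))    # model B frequencies
round(c(cv_A = cv_A, cv_B = cv_B), 2)
```

The series with the smaller value is the more uniform of the two.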

Example 1.

Calculate the first four moments of the following distribution about the mean and hence find \(\beta_1\) and \(\beta_2\).

x: 0 1 2 3 4 5 6 7 8
f: 1 8 28 56 70 56 28 8 1

Ans: The mean is 4; hence choose A = 4, so that the x column below holds the deviations x - 4 and the last row holds the column totals.

##     x   f  fx fx2  fx3  fx4
## 1  -4   1  -4  16  -64  256
## 2  -3   8 -24  72 -216  648
## 3  -2  28 -56 112 -224  448
## 4  -1  56 -56  56  -56   56
## 5   0  70   0   0    0    0
## 6   1  56  56  56   56   56
## 7   2  28  56 112  224  448
## 8   3   8  24  72  216  648
## 9   4   1   4  16   64  256
## 10  0 256   0 512    0 2816
##    mu1 mu2 mu3 mu4
## 10   0   2   0  11
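The code that produced the two tables above is not shown. One way to reproduce them, following the pattern used in Example 2, might look like the sketch below; it is a reconstruction under that assumption, not necessarily the original code.

```r
df <- data.frame(x = 0:8, f = c(1, 8, 28, 56, 70, 56, 28, 8, 1))
N <- sum(df$f)

df$x   <- df$x - 4         # deviations from A = 4 (the mean)
df$fx  <- df$f * df$x
df$fx2 <- df$fx * df$x     # f * x^2
df$fx3 <- df$fx2 * df$x    # f * x^3
df$fx4 <- df$fx3 * df$x    # f * x^4

df <- rbind(df, colSums(df))  # row 10 holds the column totals

# Divide the totals of fx, fx2, fx3, fx4 by N to get the moments
result <- round(df[10, 3:6] / N, 2)
names(result) <- c('mu1', 'mu2', 'mu3', 'mu4')
result
```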

The \(\beta_1\) and \(\beta_2\) coefficients are calculated below

beta1 = result$mu3^2/result$mu2^3
beta2 = result$mu4/result$mu2^2
result2=c(beta1,beta2)
names(result2)=c('beta1','beta2')
result2
## beta1 beta2 
##  0.00  2.75
rm(list=ls())

Example 2

Find out the kurtosis of the data given below:

Class Interval : 0-10 10-20 20-30 30-40

Frequency : 1 3 4 2

df=data.frame(x=seq(5,35,10),f=c(1,3,4,2))
N=sum(df$f)
df$x=df$x-22
df$fx=df$f*df$x
df$fx2=df$fx*df$x
df$fx3=df$fx2*df$x
df$fx4=df$fx3*df$x
sum_row=apply(df,2,sum)
df=rbind(df,sum_row)
df
##     x  f  fx fx2   fx3    fx4
## 1 -17  1 -17 289 -4913  83521
## 2  -7  3 -21 147 -1029   7203
## 3   3  4  12  36   108    324
## 4  13  2  26 338  4394  57122
## 5  -8 10   0 810 -1440 148170
result=round(df[5,3:6]/N,2)
names(result)=c('mu1','mu2','mu3','mu4')
result
##   mu1 mu2  mu3   mu4
## 5   0  81 -144 14817
beta1 = result$mu3^2/result$mu2^3
beta2 = result$mu4/result$mu2^2
result2=c(beta1,beta2)
names(result2)=c('beta1','beta2')
result2
##      beta1      beta2 
## 0.03901844 2.25834476
rm(list=ls())
# binomial coefficients C^r_k for r = 3, as used in the conversion formulae
r=3
choose(r,0:r)
## [1] 1 3 3 1

Example 3

The lives of two models of refrigerators recorded in a recent survey are given below. What is the average life of each model? Which model shows more uniformity?

life=c('0-2','2-4','4-6','6-8','8-10','10-12')
modA=c(5,16,13,7,5,4)
modB=c(2,7,12,19,9,1)
data.frame(life,modA,modB)
##    life modA modB
## 1   0-2    5    2
## 2   2-4   16    7
## 3   4-6   13   12
## 4   6-8    7   19
## 5  8-10    5    9
## 6 10-12    4    1
midx=seq(1,11,2)
df=data.frame(midx,modA,modB,xf=midx*modA,yf=midx*modB)
res=apply(df,2,sum)/50
xbar=res['xf']
ybar=res['yf']
df=data.frame(midx,modA,modB,xf=midx*modA,yf=midx*modB,
              z2f=(midx-xbar)^2*modA,
              w2f=(midx-ybar)^2*modB
              )
df
##   midx modA modB xf  yf      z2f     w2f
## 1    1    5    2  5   2  84.8720 53.2512
## 2    3   16    7 48  21  71.9104 69.8992
## 3    5   13   12 65  60   0.1872 16.1472
## 4    7    7   19 49 133  24.7408 13.4064
## 5    9    5    9 45  81  75.2720 72.5904
## 6   11    4    1 44  11 138.2976 23.4256
res=apply(df,2,sum)/50

cv_A=sqrt(res['z2f'])/res['xf']*100
cv_B=sqrt(res['w2f'])/res['yf']*100

data.frame(cv_A=round(cv_A,2),cv_B=round(cv_B,2))
##      cv_A  cv_B
## z2f 54.92 36.21
data.frame(meanA=round(res['xf'],2),MeanB=round(res['yf'],2))
##    meanA MeanB
## xf  5.12  6.16

Ans: The average life is 5.12 for model A and 6.16 for model B. Model B has the smaller coefficient of variation (36.21 against 54.92), so model B shows more uniformity.

Example 4

Calculate the mean, mode, median, standard deviation, quartile deviation, moments up to order 4, beta coefficients and gamma coefficients of the following frequency distribution.

x=seq(175,245,10)
f=c(52,68,85,92,100,95,70,28) 
df=data.frame(x,f)

To calculate the mean

#calculate mean
mean=sum(x*f)/sum(f)
data.frame(mean=mean)
##       mean
## 1 208.9831
N=sum(f)

To calculate median

#calculate median (q2)
#data.frame(x,f,cumsum(f))
L_md=200
f_md=92
cf_md=205
Q2=L_md+10*(N/2-cf_md)/f_md
data.frame(median=Q2)
##     median
## 1 209.7826
#calculate q1
L_q1=190
f_q1=85
cf_q1=120

Q1=L_q1+10*(N/4-cf_q1)/f_q1
Q1
## [1] 193.2353
#calculate q3
L_q3=220
f_q3=95
cf_q3=397

Q3=L_q3+10*(3*N/4-cf_q3)/f_q3
Q3
## [1] 224.7895
#calculate stdev 
varx=sum(x^2*f)/sum(f)-(sum(x*f)/sum(f))^2

stdx=round(sqrt(varx),2)

#print table
data.frame(x,f,cumsum(f))
##     x   f cumsum.f.
## 1 175  52        52
## 2 185  68       120
## 3 195  85       205
## 4 205  92       297
## 5 215 100       397
## 6 225  95       492
## 7 235  70       562
## 8 245  28       590
#print the result
data.frame(mean,Q1,Q2,Q3,QD=(Q3-Q1)/2,stdx)
##       mean       Q1       Q2       Q3       QD stdx
## 1 208.9831 193.2353 209.7826 224.7895 15.77709 19.7
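The three quartiles above were obtained by reading the class boundary, class frequency and preceding cumulative frequency off the table by hand. The same interpolation can be sketched as one function; `grouped_quantile` is a hypothetical helper, and the lower class boundaries 170, 180, ..., 240 are an assumption inferred from the mid-points and the class width of 10.

```r
# Quantile of grouped data by linear interpolation within the class
# that contains it: L + h * (p * N - cf) / f
# (grouped_quantile is an illustrative name, not a base R function)
grouped_quantile <- function(lower, f, p, h) {
  N  <- sum(f)
  cf <- cumsum(f)
  i  <- which(cf >= p * N)[1]             # first class reaching p * N
  cf_prev <- if (i > 1) cf[i - 1] else 0  # cumulative frequency before it
  lower[i] + h * (p * N - cf_prev) / f[i]
}

lower <- seq(170, 240, 10)                # assumed lower class boundaries
f <- c(52, 68, 85, 92, 100, 95, 70, 28)

q <- sapply(c(0.25, 0.5, 0.75),
            function(p) grouped_quantile(lower, f, p, h = 10))
names(q) <- c('Q1', 'Q2', 'Q3')
q
```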
#calculate coefficient of variation
cv=stdx/mean*100
data.frame(cv=round(cv,2))
##     cv
## 1 9.43
z=x-mean
mu2=sum(z^2*f)/sum(f)
mu3=sum(z^3*f)/sum(f)
mu4=sum(z^4*f)/sum(f)
beta1 = mu3^2/mu2^3
beta2 = mu4/mu2^2
result2=c(beta1,beta2)
names(result2)=c('beta1','beta2')
result2
##       beta1       beta2 
## 0.003424753 2.034009213
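The gamma coefficients asked for in the question follow from the beta coefficients by the definitions \(\gamma_1 = +\sqrt{\beta_1}\) and \(\gamma_2 = \beta_2 - 3\). A self-contained sketch that recomputes the moments from the same x and f:

```r
x <- seq(175, 245, 10)
f <- c(52, 68, 85, 92, 100, 95, 70, 28)

z   <- x - sum(x * f) / sum(f)   # deviations from the mean
mu2 <- sum(z^2 * f) / sum(f)
mu3 <- sum(z^3 * f) / sum(f)
mu4 <- sum(z^4 * f) / sum(f)

beta1 <- mu3^2 / mu2^3
beta2 <- mu4 / mu2^2

gamma1 <- sqrt(beta1)            # positive root, as defined in these notes
gamma2 <- beta2 - 3
round(c(gamma1 = gamma1, gamma2 = gamma2), 4)
```

Because \(\gamma_1\) is taken as the positive root here, the direction of the skewness should be read from the sign of `mu3`.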

Example 5

Find the mean, mode and median of the following frequency distribution

l=seq(40,160,10)
u=seq(49,169,10)
L=l-0.5
U=u+0.5
CI=paste(L,U,sep='-')
f=c(2,3,7,19,37,79,69,65,17,5,3,2,1)
N=sum(f)
#df=data.frame(CI,x=L+U,f)
A=94.5
x=(L+U)/2
u=(x-A)/10
df=data.frame(CI,f)
df
##             CI  f
## 1    39.5-49.5  2
## 2    49.5-59.5  3
## 3    59.5-69.5  7
## 4    69.5-79.5 19
## 5    79.5-89.5 37
## 6    89.5-99.5 79
## 7   99.5-109.5 69
## 8  109.5-119.5 65
## 9  119.5-129.5 17
## 10 129.5-139.5  5
## 11 139.5-149.5  3
## 12 149.5-159.5  2
## 13 159.5-169.5  1
df=data.frame(CI,x,f,u,fu=f*u,cf=cumsum(f))
#calculate mean
res=apply(df[3:5],2,sum,na.rm=TRUE)/N
ubar=res[3]
xbar=A+10*ubar
df
##             CI     x  f  u  fu  cf
## 1    39.5-49.5  44.5  2 -5 -10   2
## 2    49.5-59.5  54.5  3 -4 -12   5
## 3    59.5-69.5  64.5  7 -3 -21  12
## 4    69.5-79.5  74.5 19 -2 -38  31
## 5    79.5-89.5  84.5 37 -1 -37  68
## 6    89.5-99.5  94.5 79  0   0 147
## 7   99.5-109.5 104.5 69  1  69 216
## 8  109.5-119.5 114.5 65  2 130 281
## 9  119.5-129.5 124.5 17  3  51 298
## 10 129.5-139.5 134.5  5  4  20 303
## 11 139.5-149.5 144.5  3  5  15 306
## 12 149.5-159.5 154.5  2  6  12 308
## 13 159.5-169.5 164.5  1  7   7 309
meanx=xbar 
data.frame(meanx)
##       meanx
## fu 100.5194
#calculate median q2
l_q2=99.5
f_q2=69
cf_q2=147
q2=l_q2+10*(154.5-cf_q2)/f_q2
medianx=q2

#calculation of mode
#mean - mode = 3 (mean - median)
modex = 3*medianx - 2*meanx
modex
##      fu 
## 100.722
data.frame(meanx,medianx,modex)
##       meanx medianx   modex
## fu 100.5194 100.587 100.722