The \(r^{th}\) moment of a variable \(x\) about any point \(x=A\), denoted by \(\mu_r^\prime\), is given by
\[\begin{equation} \mu_r^\prime = \frac{1}{N} \sum_i f_i\, (x_i-A)^r \end{equation}\]
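The definition translates directly into a one-line R function. The following is a minimal sketch; the function name `raw_moment` and the sample data are illustrative choices, not part of the original notes.

```r
# Raw (non-central) moment of order r about an arbitrary point A
# for a frequency distribution with values x and frequencies f.
raw_moment <- function(x, f, r, A = 0) {
  sum(f * (x - A)^r) / sum(f)
}

# Illustrative data (assumed for this sketch)
x <- 0:8
f <- c(1, 8, 28, 56, 70, 56, 28, 8, 1)

raw_moment(x, f, r = 1, A = 0)   # first moment about 0, i.e. the mean
raw_moment(x, f, r = 2, A = 4)   # second moment about A = 4
```

With `A = 0` and `r = 1` the function returns the arithmetic mean; choosing `A` equal to the mean gives the central moments.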
Central Moments
Putting \(A = \bar{x}\), the \(r^{th}\) central moment is \[\begin{equation} \mu_r = \frac{1}{N} \sum_i f_i\, (x_i-\bar{x})^r \end{equation}\]
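Writing \(d_i = x_i - A\), so that \(\bar{x} = A + \mu_1^\prime\) and \(x_i-\bar{x} = d_i-\mu_1^\prime\), the central moments can be expressed in terms of the moments about \(A\):
\[\begin{equation} \begin{array}{rcl} \mu_r & = & \frac{1}{N}\sum_i f_i\, (d_i-\mu_1^\prime)^r\\ &=& \frac{1}{N}\sum_i f_i \left[ d_i^r-C_1^r d_i^{r-1} \mu_1^\prime +C_2^r d_i^{r-2} \mu_1^{\prime \, 2}-\cdots +(-1)^r\mu_1^{\prime \, r} \right]\\ &=& \mu_r^\prime - C_1^r \mu_{r-1}^\prime \mu_1^\prime + C_2^r \mu_{r-2}^\prime \mu_1^{\prime \, 2} - \cdots + (-1)^r \mu_1^{\prime \, r} \end{array} \end{equation}\]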
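The binomial expansion can be checked numerically. The sketch below assumes the moments about \(A\) are stored in a vector `raw` with `raw[j]` holding \(\mu_j^\prime\); the helper name `central_from_raw`, the data and the choice \(A = 20\) are illustrative assumptions.

```r
# Central moment of order r from the raw moments about A:
# mu_r = sum_k (-1)^k * choose(r, k) * mu'_{r-k} * (mu'_1)^k, with mu'_0 = 1.
central_from_raw <- function(raw, r) {
  raw0 <- c(1, raw)          # prepend mu'_0 = 1, so raw0[j + 1] is mu'_j
  k <- 0:r
  sum((-1)^k * choose(r, k) * raw0[r - k + 1] * raw[1]^k)
}

# Illustrative grouped data: class mid-points and frequencies
x <- c(5, 15, 25, 35)
f <- c(1, 3, 4, 2)
A <- 20                      # arbitrary origin (assumed)

raw <- sapply(1:4, function(j) sum(f * (x - A)^j) / sum(f))
central <- sapply(1:4, function(r) central_from_raw(raw, r))
central
```

The first entry is zero, as it must be for any distribution, and the remaining entries agree with the moments computed directly about the mean.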
The effect of change of origin and scale on the moments: let \[ U=\frac{X-A}{h} \]
then
\[ {\mu_r}^{X} = h^r{\mu_r}^{U} \] Thus the \(r^{th}\) moment of the variable \(x\) about its mean is \(h^r\) times the \(r^{th}\) moment of the variable \(u\) about its mean.
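A quick numerical check of this relation is sketched below. The data, the origin `A = 205` and the scale `h = 10` are illustrative assumptions, and `central_moment` is a hypothetical helper defined only for this example.

```r
# Illustrative grouped data: mid-points and frequencies
x <- seq(175, 245, 10)
f <- c(52, 68, 85, 92, 100, 95, 70, 28)
A <- 205; h <- 10            # assumed origin and scale
u <- (x - A) / h             # transformed variable

# r-th central moment of a frequency distribution
central_moment <- function(v, f, r) {
  vbar <- sum(f * v) / sum(f)
  sum(f * (v - vbar)^r) / sum(f)
}

r <- 2:4
mu_x <- sapply(r, function(k) central_moment(x, f, k))
mu_u <- sapply(r, function(k) central_moment(u, f, k))
all.equal(mu_x, h^r * mu_u)  # TRUE when the relation holds
```

The change of origin `A` drops out entirely; only the scale `h` affects the central moments.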
The effect of grouping at the mid-points of the intervals can be corrected by the following formulae, known as Sheppard’s corrections: \[ \begin{array}{rcl} \mu_2\,(\text{corrected}) &=& \mu_2-\frac{h^2}{12}\\ \mu_3\,(\text{corrected}) &=& \mu_3\\ \mu_4\,(\text{corrected}) &=& \mu_4-\frac{h^2}{2}\mu_2+\frac{7}{240}h^4 \end{array} \]
where h is the width of the class interval.
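The corrections are simple arithmetic, sketched below as a small R function. The name `sheppard` and the input values are assumptions made for illustration; the sketch only shows the mechanics and does not judge whether the corrections are appropriate for a given data set.

```r
# Sheppard's corrections for the second, third and fourth central moments
# of grouped data with class width h.
sheppard <- function(mu2, mu3, mu4, h) {
  c(mu2 = mu2 - h^2 / 12,
    mu3 = mu3,
    mu4 = mu4 - h^2 * mu2 / 2 + 7 * h^4 / 240)
}

# Illustrative uncorrected moments (assumed) with class width 10
corrected <- sheppard(mu2 = 81, mu3 = -144, mu4 = 14817, h = 10)
round(corrected, 4)
```

Note that the correction to \(\mu_4\) uses the uncorrected \(\mu_2\), exactly as in the formula above.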
The following identities \[ \begin{array}{rcl} \sum f(x+1) &=& \sum fx +N\\ \sum f(x+1)^2 &=& \sum f x^2+ 2 \sum f x+ N\\ \sum f(x+1)^3 &=& \sum f x^3+ 3 \sum f x^2 + 3 \sum f x+ N\\ \sum f(x+1)^4 &=& \sum f x^4+ 4 \sum f x^3 + 6 \sum f x^2+ 4 \sum f x + N \end{array} \] are often used to check the accuracy of the calculation of the first four moments and are known as Charlier’s checks.
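In practice the check consists of computing both sides from the working table and confirming that they agree. A minimal sketch, with illustrative data and with the analogous identity for the square included for completeness:

```r
# Illustrative deviations and frequencies (assumed for this sketch)
x <- -4:4
f <- c(1, 8, 28, 56, 70, 56, 28, 8, 1)
N <- sum(f)

# Left-hand sides: sum of f * (x + 1)^r for r = 1..4
lhs <- sapply(1:4, function(r) sum(f * (x + 1)^r))

# Right-hand sides built from the column totals of the working table
rhs <- c(
  sum(f * x) + N,
  sum(f * x^2) + 2 * sum(f * x) + N,
  sum(f * x^3) + 3 * sum(f * x^2) + 3 * sum(f * x) + N,
  sum(f * x^4) + 4 * sum(f * x^3) + 6 * sum(f * x^2) + 4 * sum(f * x) + N
)

all(lhs == rhs)   # TRUE when the column totals are consistent
```

A mismatch in any position points to an arithmetic slip in the corresponding column total.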
The non-central moment is \[ \mu^\prime_r = \frac{1}{N}\sum_i f_i (x_i-A)^r \]
Add and subtract \(\bar{x}\), and write \(z_i = x_i-\bar{x}\) and \(\mu_1^\prime = \bar{x}-A\):
\[ \begin{array}{rcl} \mu^\prime_r &=& \frac{1}{N}\sum_i f_i (x_i-\bar{x}+\bar{x}-A)^r\\ &=& \frac{1}{N}\sum_i f_i (z_i+\mu_1^\prime)^r\\ &=& \frac{1}{N}\sum_i f_i \left[ \sum_k C^r_k z_i^{r-k} {\mu_1^\prime}^k \right] \\ &=& \sum_k C^r_k \mu_{r-k}\, {\mu_1^\prime}^k\\ &=& \mu_r+C^r_1\mu_{r-1}{\mu_1^\prime}^1+\cdots+{\mu_1^\prime}^r \end{array} \]
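The reverse conversion, from central moments to moments about \(A\), can be verified in the same way. In the sketch below the helper `raw_from_central`, the data and the choice \(A = 0\) are assumptions made for illustration.

```r
# Raw moment of order r about A from the central moments:
# mu'_r = sum_k choose(r, k) * mu_{r-k} * (mu'_1)^k, with mu_0 = 1 and mu_1 = 0.
raw_from_central <- function(central, m1, r) {
  central0 <- c(1, central)  # prepend mu_0 = 1, so central0[j + 1] is mu_j
  k <- 0:r
  sum(choose(r, k) * central0[r - k + 1] * m1^k)
}

# Illustrative data (assumed)
x <- 0:8
f <- c(1, 8, 28, 56, 70, 56, 28, 8, 1)
A <- 0
xbar <- sum(f * x) / sum(f)
m1 <- xbar - A               # mu'_1

central <- sapply(1:4, function(r) sum(f * (x - xbar)^r) / sum(f))
from_central <- sapply(1:4, function(r) raw_from_central(central, m1, r))
direct <- sapply(1:4, function(r) sum(f * (x - A)^r) / sum(f))
all.equal(from_central, direct)   # TRUE when the expansion holds
```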
Based on the moments about the mean, the alpha coefficients are \[ \begin{array}{rcl} \alpha_1 &=& \frac{\mu_1}{\sigma}\\ \alpha_2 &=& \frac{\mu_2}{\sigma^2}\\ \alpha_3 &=& \frac{\mu_3}{\sigma^3}\\ \alpha_4 &=& \frac{\mu_4}{\sigma^4} \end{array} \]
Karl Pearson defined the following four coefficients, based upon the first four moments about the mean:
\[ \begin{array}{rcl} \beta_1 &=&\frac{\mu_3^2}{\mu_2^3}\\ \gamma_1 &=& + \sqrt{\beta_1}\\ \beta_2 &=& \frac{\mu_4}{\mu_2^2}\\ \gamma_2 &=& \beta_2 - 3 \end{array} \]
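These four coefficients can be computed in one pass from a frequency distribution. The function name `pearson_coefficients` and the sample data below are illustrative assumptions; \(\gamma_1\) is taken as the positive root, following the definition above.

```r
# Pearson's beta and gamma coefficients for a frequency distribution
pearson_coefficients <- function(x, f) {
  N <- sum(f)
  z <- x - sum(f * x) / N                              # deviations from the mean
  mu <- sapply(2:4, function(r) sum(f * z^r) / N)      # mu2, mu3, mu4
  beta1 <- mu[2]^2 / mu[1]^3
  beta2 <- mu[3] / mu[1]^2
  c(beta1 = beta1, gamma1 = sqrt(beta1),
    beta2 = beta2, gamma2 = beta2 - 3)
}

# Illustrative grouped data: class mid-points and frequencies
coefs <- pearson_coefficients(x = c(5, 15, 25, 35), f = c(1, 3, 4, 2))
round(coefs, 4)
```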
The frequency distribution is
Karl Pearson's coefficient of skewness is \[ S_k = \frac{\bar{x}-Mo}{\sigma} \] where \(\bar{x}\) is the mean, \(Mo\) the mode and \(\sigma\) the standard deviation.
The frequency distribution is asymmetric
### Based upon moments
The coefficient of skewness is \[ S_k = \frac{\sqrt{\beta_1}(\beta_2 + 3 )}{2 (5\beta_2-6\beta_1-9)} \]
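A minimal sketch of this formula in R follows. The function name `moment_skewness` is hypothetical. Because \(\sqrt{\beta_1}\) is non-negative, the formula alone does not carry a sign; the optional `mu3` argument, which attaches the sign of the third central moment, is an assumption added for this sketch and is not part of the formula above.

```r
# Moment-based coefficient of skewness.
# mu3 is optional and is used only to attach a sign to the result.
moment_skewness <- function(beta1, beta2, mu3 = 1) {
  sk <- sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))
  if (mu3 < 0) -sk else sk
}

moment_skewness(beta1 = 0, beta2 = 2.75)               # symmetric case gives 0
moment_skewness(beta1 = 0.04, beta2 = 3)               # positive by default
moment_skewness(beta1 = 0.04, beta2 = 3, mu3 = -1)     # sign taken from mu3
```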
The coefficient of variation is given by \[ CV=\frac{\sigma}{\bar{x}}\times 100 \] A smaller CV indicates a more consistent (more uniform) series.
Calculate the first four moments of the following distribution about the mean and hence find \(\beta_1\) and \(\beta_2\)
x: 0 1 2 3 4 5 6 7 8
f: 1 8 28 56 70 56 28 8 1
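Ans: The mean is 4, hence choose A = 4. In the table below the x column holds the deviations x - 4 and the last row holds the column totals.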
## x f fx fx2 fx3 fx4
## 1 -4 1 -4 16 -64 256
## 2 -3 8 -24 72 -216 648
## 3 -2 28 -56 112 -224 448
## 4 -1 56 -56 56 -56 56
## 5 0 70 0 0 0 0
## 6 1 56 56 56 56 56
## 7 2 28 56 112 224 448
## 8 3 8 24 72 216 648
## 9 4 1 4 16 64 256
## 10 0 256 0 512 0 2816
## mu1 mu2 mu3 mu4
## 10 0 2 0 11
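The code that produced the two tables above is not shown. A sketch that reproduces them, modelled on the pattern used in the next exercise, might look like the following; it is a reconstruction, not necessarily the original code.

```r
# Frequency distribution of the exercise
df <- data.frame(x = 0:8, f = c(1, 8, 28, 56, 70, 56, 28, 8, 1))
N <- sum(df$f)
A <- 4                        # the mean, used as the origin

df$x   <- df$x - A            # deviations from the mean
df$fx  <- df$f * df$x         # f * d
df$fx2 <- df$fx * df$x        # f * d^2
df$fx3 <- df$fx2 * df$x       # f * d^3
df$fx4 <- df$fx3 * df$x       # f * d^4

df <- rbind(df, colSums(df))  # append the column totals as row 10
df

# Central moments: column totals divided by N
result <- round(df[10, 3:6] / N, 2)
names(result) <- c("mu1", "mu2", "mu3", "mu4")
result
```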
The \(\beta_1\) and \(\beta_2\) coefficients are calculated below
beta1 = result$mu3^2/result$mu2^3
beta2 = result$mu4/result$mu2^2
result2=c(beta1,beta2)
names(result2)=c('beta1','beta2')
result2
## beta1 beta2
## 0.00 2.75
rm(list=ls())
Find out the kurtosis of the data given below:
Class Interval : 0-10 10-20 20-30 30-40
Frequency : 1 3 4 2
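df=data.frame(x=seq(5,35,10),f=c(1,3,4,2))   # class mid-points and frequencies
N=sum(df$f)
df$x=df$x-22              # deviations from the mean (22)
df$fx=df$f*df$x           # f*d
df$fx2=df$fx*df$x         # f*d^2
df$fx3=df$fx2*df$x        # f*d^3
df$fx4=df$fx3*df$x        # f*d^4
sum_row=apply(df,2,sum)   # column totals
df=rbind(df,sum_row)      # append the totals as row 5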
df
## x f fx fx2 fx3 fx4
## 1 -17 1 -17 289 -4913 83521
## 2 -7 3 -21 147 -1029 7203
## 3 3 4 12 36 108 324
## 4 13 2 26 338 4394 57122
## 5 -8 10 0 810 -1440 148170
result=round(df[5,3:6]/N,2)
names(result)=c('mu1','mu2','mu3','mu4')
result
## mu1 mu2 mu3 mu4
## 5 0 81 -144 14817
beta1 = result$mu3^2/result$mu2^3
beta2 = result$mu4/result$mu2^2
result2=c(beta1,beta2)
names(result2)=c('beta1','beta2')
result2
## beta1 beta2
## 0.03901844 2.25834476
rm(list=ls())
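The kurtosis is read off from \(\beta_2\), or equivalently from \(\gamma_2 = \beta_2 - 3\): a value of \(\beta_2\) below 3 indicates a platykurtic curve, a value of 3 a mesokurtic curve, and a value above 3 a leptokurtic curve. A small sketch of that classification follows; the function name `classify_kurtosis` and the tolerance are illustrative assumptions.

```r
# Classify a distribution by its kurtosis coefficient beta2
classify_kurtosis <- function(beta2, tol = 1e-8) {
  gamma2 <- beta2 - 3
  if (abs(gamma2) < tol) {
    "mesokurtic"
  } else if (gamma2 > 0) {
    "leptokurtic"
  } else {
    "platykurtic"
  }
}

classify_kurtosis(2.25834476)   # beta2 obtained in the exercise above
```

Since \(\beta_2 \approx 2.26 < 3\), the distribution in this exercise is platykurtic.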
r=3
choose(r,0:r)
## [1] 1 3 3 1
The lives of two models of refrigerators, as recorded in a recent survey, are given below. What is the average life of each model? Which model shows more uniformity?
life=c('0-2','2-4','4-6','6-8','8-10','10-12')
modA=c(5,16,13,7,5,4)
modB=c(2,7,12,19,9,1)
data.frame(life,modA,modB)
## life modA modB
## 1 0-2 5 2
## 2 2-4 16 7
## 3 4-6 13 12
## 4 6-8 7 19
## 5 8-10 5 9
## 6 10-12 4 1
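midx=seq(1,11,2)          # class mid-points
df=data.frame(midx,modA,modB,xf=midx*modA,yf=midx*modB)
res=apply(df,2,sum)/50    # both models have a total frequency of 50
xbar=res['xf']            # mean life of model A
ybar=res['yf']            # mean life of model B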
df=data.frame(midx,modA,modB,xf=midx*modA,yf=midx*modB,
z2f=(midx-xbar)^2*modA,
w2f=(midx-ybar)^2*modB
)
df
## midx modA modB xf yf z2f w2f
## 1 1 5 2 5 2 84.8720 53.2512
## 2 3 16 7 48 21 71.9104 69.8992
## 3 5 13 12 65 60 0.1872 16.1472
## 4 7 7 19 49 133 24.7408 13.4064
## 5 9 5 9 45 81 75.2720 72.5904
## 6 11 4 1 44 11 138.2976 23.4256
res=apply(df,2,sum)/50
cv_A=sqrt(res['z2f'])/res['xf']*100
cv_B=sqrt(res['w2f'])/res['yf']*100
data.frame(cv_A=round(cv_A,2),cv_B=round(cv_B,2))
## cv_A cv_B
## z2f 54.92 36.21
data.frame(meanA=round(res['xf'],2),MeanB=round(res['yf'],2))
## meanA MeanB
## xf 5.12 6.16
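The question of uniformity is settled by comparing the two coefficients of variation: the model with the smaller CV is the more uniform. The comparison can be sketched compactly as below; `cv_grouped` is a hypothetical helper written for this illustration.

```r
# Coefficient of variation (in percent) for grouped data
cv_grouped <- function(mid, f) {
  N <- sum(f)
  m <- sum(f * mid) / N                   # mean
  s <- sqrt(sum(f * (mid - m)^2) / N)     # standard deviation
  100 * s / m
}

midx <- seq(1, 11, 2)                     # class mid-points
cv <- c(A = cv_grouped(midx, c(5, 16, 13, 7, 5, 4)),
        B = cv_grouped(midx, c(2, 7, 12, 19, 9, 1)))
round(cv, 2)
names(which.min(cv))                      # model with the smaller CV
```

Model B has the smaller coefficient of variation (36.21 against 54.92), so model B shows more uniformity.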
Calculate the mean, mode, median, standard deviation, quartile deviation, moments up to order 4, beta coefficients and gamma coefficients for the following frequency distribution
x=seq(175,245,10)
f=c(52,68,85,92,100,95,70,28)
df=data.frame(x,f)
To calculate the mean
#calculate mean
mean=sum(x*f)/sum(f)
data.frame(mean=mean)
## mean
## 1 208.9831
N=sum(f)
To calculate median
#calculate median (q2)
#data.frame(x,f,cumsum(f))
L_md=200
f_md=92
cf_md=205
Q2=L_md+10*(N/2-cf_md)/f_md
data.frame(median=Q2)
## median
## 1 209.7826
#calculate q1
L_q1=190
f_q1=85
cf_q1=120
Q1=L_q1+10*(N/4-cf_q1)/f_q1
Q1
## [1] 193.2353
#calculate q3
L_q3=220
f_q3=95
cf_q3=397
Q3=L_q3+10*(3*N/4-cf_q3)/f_q3
Q3
## [1] 224.7895
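# The three quartile calculations above share one interpolation formula,
# L + h * (p*N - cf) / f, where the class is the first one whose cumulative
# frequency reaches p*N. The helper below is a sketch that locates the class
# automatically; the name grouped_quantile and the vector of lower class
# boundaries are assumptions made for this illustration.

```r
# Quantile of grouped data by linear interpolation within the quantile class
grouped_quantile <- function(lower, f, h, p) {
  N <- sum(f)
  cf <- cumsum(f)
  i <- which(cf >= p * N)[1]               # first class reaching p * N
  cf_prev <- if (i > 1) cf[i - 1] else 0   # cumulative frequency before it
  lower[i] + h * (p * N - cf_prev) / f[i]
}

lower <- seq(170, 240, 10)                 # lower class boundaries
f <- c(52, 68, 85, 92, 100, 95, 70, 28)
q <- sapply(c(Q1 = 0.25, Q2 = 0.50, Q3 = 0.75),
            function(p) grouped_quantile(lower, f, h = 10, p = p))
round(q, 4)
```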
#calculate stdev
varx=sum(x^2*f)/sum(f)-(sum(x*f)/sum(f))^2
stdx=round(sqrt(varx),2)
#print table
data.frame(x,f,cumsum(f))
## x f cumsum.f.
## 1 175 52 52
## 2 185 68 120
## 3 195 85 205
## 4 205 92 297
## 5 215 100 397
## 6 225 95 492
## 7 235 70 562
## 8 245 28 590
#print the result
data.frame(mean,Q1,Q2,Q3,QD=(Q3-Q1)/2,stdx)
## mean Q1 Q2 Q3 QD stdx
## 1 208.9831 193.2353 209.7826 224.7895 15.77709 19.7
#calculate coefficient of variation
cv=stdx/mean*100
data.frame(cv=round(cv,2))
## cv
## 1 9.43
z=x-mean
mu2=sum(z^2*f)/sum(f)
mu3=sum(z^3*f)/sum(f)
mu4=sum(z^4*f)/sum(f)
beta1 = mu3^2/mu2^3
beta2 = mu4/mu2^2
result2=c(beta1,beta2)
names(result2)=c('beta1','beta2')
result2
## beta1 beta2
## 0.003424753 2.034009213
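The exercise also asks for the mode and the gamma coefficients, which are not computed above. A sketch follows, assuming the usual interpolation formula for the mode of grouped data, \(L + h\,(f_1-f_0)/(2f_1-f_0-f_2)\), and assuming the modal class is neither the first nor the last class; \(\gamma_1\) is taken as the positive root, following the definition given earlier.

```r
x <- seq(175, 245, 10)                     # class mid-points
f <- c(52, 68, 85, 92, 100, 95, 70, 28)
h <- 10                                    # class width

# Mode by interpolation within the modal class
i <- which.max(f)                          # index of the modal class
L_mo <- x[i] - h / 2                       # its lower boundary
mode_x <- L_mo + h * (f[i] - f[i - 1]) / (2 * f[i] - f[i - 1] - f[i + 1])

# Gamma coefficients from the central moments
z <- x - sum(x * f) / sum(f)
mu <- sapply(2:4, function(r) sum(z^r * f) / sum(f))   # mu2, mu3, mu4
beta1 <- mu[2]^2 / mu[1]^3
beta2 <- mu[3] / mu[1]^2
gamma1 <- sqrt(beta1)
gamma2 <- beta2 - 3

data.frame(mode = mode_x, gamma1, gamma2)
```

A negative \(\gamma_2\) indicates a platykurtic distribution.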
Find the mean, mode and median of the following frequency distribution
l=seq(40,160,10)
u=seq(49,169,10)
L=l-0.5
U=u+0.5
CI=paste(L,U,sep='-')
f=c(2,3,7,19,37,79,69,65,17,5,3,2,1)
N=sum(f)
#df=data.frame(CI,x=L+U,f)
A=94.5
x=(L+U)/2
u=(x-A)/10
df=data.frame(CI,f)
df
## CI f
## 1 39.5-49.5 2
## 2 49.5-59.5 3
## 3 59.5-69.5 7
## 4 69.5-79.5 19
## 5 79.5-89.5 37
## 6 89.5-99.5 79
## 7 99.5-109.5 69
## 8 109.5-119.5 65
## 9 119.5-129.5 17
## 10 129.5-139.5 5
## 11 139.5-149.5 3
## 12 149.5-159.5 2
## 13 159.5-169.5 1
df=data.frame(CI,x,f,u,fu=f*u,cf=cumsum(f))
#calculate mean
res=apply(df[3:5],2,sum,na.rm=TRUE)/N
ubar=res[3]
xbar=A+10*ubar
df
## CI x f u fu cf
## 1 39.5-49.5 44.5 2 -5 -10 2
## 2 49.5-59.5 54.5 3 -4 -12 5
## 3 59.5-69.5 64.5 7 -3 -21 12
## 4 69.5-79.5 74.5 19 -2 -38 31
## 5 79.5-89.5 84.5 37 -1 -37 68
## 6 89.5-99.5 94.5 79 0 0 147
## 7 99.5-109.5 104.5 69 1 69 216
## 8 109.5-119.5 114.5 65 2 130 281
## 9 119.5-129.5 124.5 17 3 51 298
## 10 129.5-139.5 134.5 5 4 20 303
## 11 139.5-149.5 144.5 3 5 15 306
## 12 149.5-159.5 154.5 2 6 12 308
## 13 159.5-169.5 164.5 1 7 7 309
meanx=xbar
data.frame(meanx)
## meanx
## fu 100.5194
#calculate median q2
l_q2=99.5
f_q2=69
cf_q2=147
q2=l_q2+10*(N/2-cf_q2)/f_q2   # N/2 = 154.5 falls in the class 99.5-109.5
medianx=q2
#calculation of mode
#mean - mode = 3 (mean - median)
modex = 3*medianx - 2*meanx
modex
## fu
## 100.722
data.frame(meanx,medianx,modex)
## meanx medianx modex
## fu 100.5194 100.587 100.722