project8.utf8

Last time, we tried to estimate Gini index given Lorenz curve using Integration. There are other ways to calculate Gini Index, which you can look for it elsewhere, but our concern is to estimate Gini index using a polynomial function.

Now, the question is why our estimators from the last time are not exact?
If you remember, Lorenz curve intersects with the line of equality at \((0,0)\) and \((1,1)\). When we estimated the polynomial functions, we calculated the intercept is equal to \(0\), so that: \(L(p)=0\) if \(p=0\). But we didn’t calculate Lorenz curve so that it will be equal to \(1\) when the line of equality equal to \(1\).
Today we try to satisfy the condition that: \(L(p)=1\) if \(p=1\). Since we try to estimate Lorenz curve using polynomials, if \(p=1\) then \(L(p)=1\) only if the summation of coefficients equals \(1\), and the closer the summation of coefficients to \(1\), the more accurate we can estimate Lorenz curve, which in turn yields more accurate Gini index estimator.
Simply: \[\begin{equation} L(p)=ap^n + bp^{n-1} + cp^{n-2}+ .... \end{equation}\] Where \(n\) is a positive integer, and \(a,b,c,...\) are constant coefficients. As you can see, if \(p=1\), then \(L(p)=1\) if \(a+b+c+...=1\).

We used quadratic, cubic, and quartic functions last time. Now, let’s see if the summation of coefficients equal to \(1\) using our income data from Ilocos in the Philippines, using higher polynomials.

#load the package.
library(ineq)

#load the data.
data("Ilocos")

head(Ilocos)

  income    sex family.size urbanity     province AP.income AP.family.size
1 133582   male         5.5    urban Ilocos Norte  153924.0              5
2 312800   male         1.0    urban Ilocos Norte  164531.5              1
3 395434   male         6.0    urban Ilocos Norte  195892.4              5
4 519235 female        10.0    urban Ilocos Norte  358701.0             10
5 321952   male         5.0    urban Ilocos Norte   55330.0              3
6 663557   male         4.0    urban Ilocos Norte  403462.5              6
  AP.weight
1      3844
2      3844
3      3844
4      3844
5      3844
6      3844

#subset the data to contain only the income values.
Ilocos<-Ilocos[,1]

head(Ilocos)

[1] 133582 312800 395434 519235 321952 663557

#sort the income values in an ascending order.
Ilocos<-sort(Ilocos)

#calculate cumulative fraction of the no. of households surveyed.
percentile_income<-ecdf(Ilocos)(Ilocos)

#merge it with our data.
Ilocos<-data.frame(Ilocos,percentile_income)

#calculate the empirical ordinary Lorenz curve, which is the cumulative proportion of income.
Ilocos$value<-cumsum(Ilocos$Ilocos)/sum(Ilocos$Ilocos)

#create a variable for p-squared.
Ilocos$percentile_income_sqr<-Ilocos$percentile_income^2

lm(value~0+percentile_income+percentile_income_sqr,data=Ilocos)

#create a new variable for p-cubed.
Ilocos$percentile_income_cub<-Ilocos$percentile_income^3

lm(value~0+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)

#create p^4
Ilocos$percentile_income_four<-Ilocos$percentile_income^4

#estimate our function.
lm(value~0+percentile_income_four+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)

Ilocos$percentile_income_five<-Ilocos$percentile_income^5

#estimate our function.
lm(value~0+percentile_income_five+percentile_income_four+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)

Ilocos$percentile_income_six<-Ilocos$percentile_income^6

#estimate our function.
lm(value~0+percentile_income_six+percentile_income_five+percentile_income_four+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)

Ilocos$percentile_income_seven<-Ilocos$percentile_income^7

#estimate our function.
lm(value~0+percentile_income_seven+percentile_income_six+percentile_income_five+percentile_income_four+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)

Ilocos$percentile_income_eight<-Ilocos$percentile_income^8

#estimate our function.
lm(value~0+percentile_income_eight+percentile_income_seven+percentile_income_six+percentile_income_five+percentile_income_four+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)

Ilocos$percentile_income_nine<-Ilocos$percentile_income^9

#estimate our function.
lm(value~0+percentile_income_nine+percentile_income_eight+percentile_income_seven+percentile_income_six+percentile_income_five+percentile_income_four+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)

Ilocos$percentile_income_ten<-Ilocos$percentile_income^10

#estimate our function.
lm(value~0+percentile_income_ten+percentile_income_nine+percentile_income_eight+percentile_income_seven+percentile_income_six+percentile_income_five+percentile_income_four+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)

As you can see from the code, we used polynomial functions up until the \(10^{th}\) degree.

I calculated the summation of the coefficients in each of them. I submitted the code so that you can review the results yourself easily.

#create a vector for the degree of the polynomial function
x<-c(2:10)

#create a vector for the summation of coefficients for each polynomial function based on my manual calculations.
y<-c(0.845723,0.9106,0.94237,0.9582,0.97167,0.9811,0.9861,0.9875,1.68409)

#merge the two vectors in one data frame.
poly<-data.frame(x,y)

#change columns name.
colnames(poly)<-c("The degree of polynomial function","Summation of coefficients")

#present our results in a table.
knitr::kable(poly)

The degree of polynomial function	Summation of coefficients
2	0.845723
3	0.910600
4	0.942370
5	0.958200
6	0.971670
7	0.981100
8	0.986100
9	0.987500
10	1.684090

#visualize it!
plot(poly[,2]~poly[,1],type="o", xlab="The Degree of Polynomial Function",ylab="Summation of Coefficients",axes=F,col=ifelse(poly[,1]==4,"red","black"))
axis(side=1,at=c(2:10))
axis(side=2,at=c(0.85,1,1.7))
abline(h=1)
box()

As you can see, our functions converge to \(1\) up until the \(9^{th}\) degree. Now, we can have a closer look for the polynomials up until the \(9^{th}\) degree.

#subset the data to only contains the polynomials up until the 9th degree.
polysub<-poly[-9,]

#let's see it!
plot(polysub[,2]~polysub[,1],type="o", xlab="The Degree of Polynomial Function",ylab="Summation of Coefficients",ylim=c(0.84,1),col=ifelse(polysub[,2]==0.9875,"red","black"))
abline(h=0.9875,lty=2,col="red")
axis(4,at=c(0.9875),col="red")

The most accurate function is the the \(9^{th}\) degree polynomial function, in which the sum of its coefficients equals \(0.9875\). It’s not equal to \(1\) but it is more accurate than the quartic function that we used the last time.
Therefore, let’s estimate Lorenz curve using the \(9^{th}\) degree polynomial function:

#estimate our function.
lm(value~0+percentile_income_nine+percentile_income_eight+percentile_income_seven+percentile_income_six+percentile_income_five+percentile_income_four+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)


Call:
lm(formula = value ~ 0 + percentile_income_nine + percentile_income_eight + 
    percentile_income_seven + percentile_income_six + percentile_income_five + 
    percentile_income_four + percentile_income_cub + percentile_income_sqr + 
    percentile_income, data = Ilocos)

Coefficients:
 percentile_income_nine  percentile_income_eight  percentile_income_seven  
                65.6382                -230.3131                 327.5442  
  percentile_income_six   percentile_income_five   percentile_income_four  
              -241.5667                  97.5466                 -19.6303  
  percentile_income_cub    percentile_income_sqr        percentile_income  
                 0.8377                   0.7621                   0.1688

\[\begin{equation} L(p)=65.6382 \ p^9 -230.3131 \ p^8 +327.5442 \ p^7 -241.5667 \ p^6 +97.5466 \ p^5 - 19.6303 \ p^4 +0.8377 \ p^3+0.7621 \ p^2+0.1688 \ p \\ \int_{0}^{1} L(p) \ dp = \int_{0}^{1} 65.6382 \ p^9 -230.3131 \ p^8 +327.5442 \ p^7 -241.5667 \ p^6 +97.5466 \ p^5 - 19.6303 \ p^4 +0.8377 \ p^3+0.7621 \ p^2+0.1688 \ p \ dp \\ =6.56382 \ p^{10}-25.59034444 \ p^9+40.943025 \ p^8-34.50952857 \ p^7+16.25776667 \ p^6-3.92606 \ p^5+0.209425 \ p^4+\frac{7621}{30000} \ p^3+0.0844 \ p^2 \Big|_0^1 \\ = 6.56382-25.59034444+40.943025-34.50952857+16.25776667-3.92606+0.209425+\frac{7621}{30000}+0.0844 = \mathbf{0.2865369933} \end{equation}\]

Therefore, let’s estimate Gini index:

\[\begin{equation} G=1-(2*0.2865369933) \\ =\mathbf{0.4269260134} \approx \mathbf{{0.426926}} \end{equation}\]

Then let’s graph our function:

#create our function in R.
our_function<-function(p){
     return(65.6382*p^9-230.3131*p^8+327.5442*p^7-241.5667*p^6+97.5466*p^5-19.6303*p^4+0.8377*p^3+0.7621*p^2+0.1688*p)
}

#now plot it!
plot(our_function,xlab="Cumulative fraction of households",ylab="Cumulative fraction of income",main="Lorenz curve")
points(x=1,y=0.9875,col="red",cex=2)
lines(Ilocos$percentile_income,Ilocos$percentile_income)

Now, calculate Gini index directly without any calculations:

#calculate Gini index.
ineq(Ilocos$Ilocos)

[1] 0.4269508

As you can see, our estimator is biased by \(-0.0000247866\). You can see in the graph, highlighted by the red circle, that Lorenz curve doesn’t reach the line of absolute equality at \(1\).
We assume that the bias exists because the summation of coefficients of Lorenz function is less than \(1\), as we mentioned. Nevertheless, it is now more accurate than before.
Finally, I hope you had fun!

About Income Distribution-part 2

About Income Distribution-part 2

Ahmed Elhefnawy

Ahmed Elhefnawy