Income inequality is a challenge for all countries that governments try to solve or minimize to ensure the welfare of the society’s members.
We try to have a look at two indicators for it. First, the annual percentage average growth rate in per capita real survey mean consumption or income for the bottom \(\mathbf{40\%}\) of the population, whose data is provided by the World Bank. According to the World Bank, “The growth rate in the welfare aggregate of the bottom \(40\%\) is computed as the annualized average growth rate in per capita real consumption or income of the bottom \(40\%\) of the population in the income distribution in a country from household surveys over a roughly \(5\)-year period. Mean per capita real consumption or income is measured at \(2011\) Purchasing Power Parity (PPP) using the PovcalNet. For some countries means are not reported due to grouped and/or confidential data. The annualized growth rate is computed as: \[\begin{equation} [\frac{\text{Mean in final year}}{\text{Mean in initial year}}]^{{1}/{(\text{Final year}) - (\text{Initial year})}} -1 \end{equation}\] The reference year is the year in which the underlying household survey data was collected. In cases for which the data collection period bridged two calendar years, the first year in which data were collected is reported. The initial year refers to the nearest survey collected \(5\) years before the most recent survey available, only surveys collected between \(3\) and \(7\) years before the most recent survey are considered. The final year refers to the most recent survey available between \(2011\) and \(2015\). Growth rates for Iraq are based on survey means of \(2005\) PPP$. The coverage and quality of the \(2011\) PPP price data for Iraq and most other North African and Middle Eastern countries were hindered by the exceptional period of instability they faced at the time of the \(2011\) exercise of the International Comparison Program”.
The second indicator is Gini Index, estimated by the World Bank. According to the World Bank, “Gini index measures the extent to which the distribution of income (or, in some cases, consumption expenditure) among individuals or households within an economy deviates from a perfectly equal distribution. A Lorenz curve plots the cumulative percentages of total income received against the cumulative number of recipients, starting with the poorest individual or household. The Gini index measures the area between the Lorenz curve and a hypothetical line of absolute equality, expressed as a percentage of the maximum area under the line. Thus a Gini index of \(0\) represents perfect equality, while an index of \(100\) implies perfect inequality”.
The basic idea is to see the income inequality/distribution alongside the growth rate of real consumption/income per capita for the bottom \(40\%\) of the population. We will see that the data doesn’t cover all countries neither all years, yet it still can give us a glimpse about income inequality.
To plot Lorenz curve in \(R\), you can use lorenz.curve() function in lawstat package, or Lc function in ineq package.
#load the packages needed.
library(readxl)
library(tidyr)
library(plotly)
library(ggiraph)
library(ggrepel)
library(DT)
library(lawstat)
#import the data. I renamed the file as you can see.
cons<-read_xls("percapitacons.xls")
head(cons)
# A tibble: 6 x 65
`Data Source` `World Developm… ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Last Updated… 44274 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 Country Name Country Code Indi… Indi… 1960 1961 1962 1963 1964 1965
4 Aruba ABW Annu… SI.S… <NA> <NA> <NA> <NA> <NA> <NA>
5 Afghanistan AFG Annu… SI.S… <NA> <NA> <NA> <NA> <NA> <NA>
6 Angola AGO Annu… SI.S… <NA> <NA> <NA> <NA> <NA> <NA>
# … with 55 more variables: ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>,
# ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
# ...20 <chr>, ...21 <chr>, ...22 <chr>, ...23 <chr>, ...24 <chr>,
# ...25 <chr>, ...26 <chr>, ...27 <chr>, ...28 <chr>, ...29 <chr>,
# ...30 <chr>, ...31 <chr>, ...32 <chr>, ...33 <chr>, ...34 <chr>,
# ...35 <chr>, ...36 <chr>, ...37 <chr>, ...38 <chr>, ...39 <chr>,
# ...40 <chr>, ...41 <chr>, ...42 <chr>, ...43 <chr>, ...44 <chr>,
# ...45 <chr>, ...46 <chr>, ...47 <chr>, ...48 <chr>, ...49 <chr>,
# ...50 <chr>, ...51 <chr>, ...52 <chr>, ...53 <chr>, ...54 <chr>,
# ...55 <chr>, ...56 <chr>, ...57 <chr>, ...58 <chr>, ...59 <chr>,
# ...60 <chr>, ...61 <chr>, ...62 <chr>, ...63 <chr>, ...64 <chr>,
# ...65 <chr>
#remove the first two rows.
cons<-cons[-c(1:2),]
#rename the columns as the values from the first row.
colnames(cons)<-cons[1,]
#we don't need the first row anymore, so we can remove it.
cons<-cons[-c(1),]
#remove the third and fourth columns as we don't need them.
cons<-cons[,-c(3,4)]
head(cons)
# A tibble: 6 x 63
`Country Name` `Country Code` `1960` `1961` `1962` `1963` `1964` `1965` `1966`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Aruba ABW <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Afghanistan AFG <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 Angola AGO <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Albania ALB <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Andorra AND <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Arab World ARB <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# … with 54 more variables: 1967 <chr>, 1968 <chr>, 1969 <chr>, 1970 <chr>,
# 1971 <chr>, 1972 <chr>, 1973 <chr>, 1974 <chr>, 1975 <chr>, 1976 <chr>,
# 1977 <chr>, 1978 <chr>, 1979 <chr>, 1980 <chr>, 1981 <chr>, 1982 <chr>,
# 1983 <chr>, 1984 <chr>, 1985 <chr>, 1986 <chr>, 1987 <chr>, 1988 <chr>,
# 1989 <chr>, 1990 <chr>, 1991 <chr>, 1992 <chr>, 1993 <chr>, 1994 <chr>,
# 1995 <chr>, 1996 <chr>, 1997 <chr>, 1998 <chr>, 1999 <chr>, 2000 <chr>,
# 2001 <chr>, 2002 <chr>, 2003 <chr>, 2004 <chr>, 2005 <chr>, 2006 <chr>,
# 2007 <chr>, 2008 <chr>, 2009 <chr>, 2010 <chr>, 2011 <chr>, 2012 <chr>,
# 2013 <chr>, 2014 <chr>, 2015 <chr>, 2016 <chr>, 2017 <chr>, 2018 <chr>,
# 2019 <chr>, 2020 <chr>
#transform our data from wide format to long format. We called our variable "growth".
cons<-gather(cons,year,growth,-c(1:2))
#check if "year" is recognized as numeric.
class(cons$year)
[1] "character"
#declare them as numeric.
cons$year<-as.numeric(cons$year)
#check if "growth" is recognized as numeric.
class(cons$growth)
[1] "character"
#declare it as numeric.
cons$growth<-as.numeric(cons$growth)
#rename the first two columns.
colnames(cons)[1:2]<-c("country","code")
head(cons)
# A tibble: 6 x 4
country code year growth
<chr> <chr> <dbl> <dbl>
1 Aruba ABW 1960 NA
2 Afghanistan AFG 1960 NA
3 Angola AGO 1960 NA
4 Albania ALB 1960 NA
5 Andorra AND 1960 NA
6 Arab World ARB 1960 NA
Let’s have a look at the annual growth of real consumption per capita across countries in a specific year, for example \(2018\).
#subset our data to only include 2018.
conssub<-dplyr::filter(cons,year==2018)
#remove the missing values.
conssub<-na.omit(conssub)
head(conssub)
# A tibble: 6 x 4
country code year growth
<chr> <chr> <dbl> <dbl>
1 United Arab Emirates ARE 2018 7.08
2 Armenia ARM 2018 1.26
3 Austria AUT 2018 0.14
4 Belgium BEL 2018 1.41
5 Bulgaria BGR 2018 10.4
6 Switzerland CHE 2018 -0.34
#I create a vector manually for the classification of countries according to the World Bank.
level<-c("High income","Upper middle income","High income","High income","Upper middle income","High income","High income","High income","High income","High income","High income","High income","High income","High income","High income","High income","Upper middle income","Upper middle income","Lower middle income","High income","High income","High income","Lower middle income","Upper middle income","High income","Lower middle income","High income","High income","Lower middle income","Lower middle income","High income","High income","High income","Upper middle income","Low income","High income","High income","High income","High income","Lower middle income","High income","Lower middle income")
#merge the vector with our ourdata for 2018.
conssub$level<-level
#plot our data for 2018.
ggplotly(ggplot(data=conssub,aes(x=country,y=growth,color=level))+geom_bar(aes(x=country,y=growth),stat="identity")+geom_text(aes(label=code),size=2.5,nudge_y = 0.5)+theme(axis.title.x=element_text(family="Sans-serifs",size=9),axis.text.x = element_blank(),legend.title = element_blank(),axis.title.y = element_text(size=7,family="Sans-serifs"))+scale_y_continuous(breaks = seq(-2,14,by=2))+labs(x="Country", y="Annualized average growth rate of per capita real consumption, 2018, bottom 40% of population (%)"))
The plot is interactive, so you can see the rate for each available country in \(2018\).
Let’s start with our second indicator: Gini index.
#before we start with gini index data, let's remove the missing values from our original dataset.
cons<-na.omit(cons)
#import the data. I renamed the file as you can see.
gini<-read_xls("gini_index.xls")
head(gini)
# A tibble: 6 x 65
`Data Source` `World Developm… ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Last Updated… 44274 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 Country Name Country Code Indi… Indi… 1960 1961 1962 1963 1964 1965
4 Aruba ABW Gini… SI.P… <NA> <NA> <NA> <NA> <NA> <NA>
5 Afghanistan AFG Gini… SI.P… <NA> <NA> <NA> <NA> <NA> <NA>
6 Angola AGO Gini… SI.P… <NA> <NA> <NA> <NA> <NA> <NA>
# … with 55 more variables: ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>,
# ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
# ...20 <chr>, ...21 <chr>, ...22 <chr>, ...23 <chr>, ...24 <chr>,
# ...25 <chr>, ...26 <chr>, ...27 <chr>, ...28 <chr>, ...29 <chr>,
# ...30 <chr>, ...31 <chr>, ...32 <chr>, ...33 <chr>, ...34 <chr>,
# ...35 <chr>, ...36 <chr>, ...37 <chr>, ...38 <chr>, ...39 <chr>,
# ...40 <chr>, ...41 <chr>, ...42 <chr>, ...43 <chr>, ...44 <chr>,
# ...45 <chr>, ...46 <chr>, ...47 <chr>, ...48 <chr>, ...49 <chr>,
# ...50 <chr>, ...51 <chr>, ...52 <chr>, ...53 <chr>, ...54 <chr>,
# ...55 <chr>, ...56 <chr>, ...57 <chr>, ...58 <chr>, ...59 <chr>,
# ...60 <chr>, ...61 <chr>, ...62 <chr>, ...63 <chr>, ...64 <chr>,
# ...65 <chr>
#remove the first two rows since both are empty.
gini<-gini[-c(1:2),]
#rename the columns as the first row values.
colnames(gini)<-gini[1,]
#we don't need the first row anymore, so remove it.
gini<-gini[-c(1),]
#remove the third and the fourth columns as we don't need them.
gini<-gini[,-c(3:4)]
#let's see which columns have only missing values. We will remove them later.
names(which(colSums(is.na(gini))==nrow(gini)))
[1] "1960" "1961" "1962" "1963" "1964" "1965" "1966" "1968" "1970" "1972"
[11] "1973" "1976" "1977" "2020"
head(gini)
# A tibble: 6 x 63
`Country Name` `Country Code` `1960` `1961` `1962` `1963` `1964` `1965` `1966`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Aruba ABW <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Afghanistan AFG <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 Angola AGO <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Albania ALB <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Andorra AND <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Arab World ARB <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# … with 54 more variables: 1967 <chr>, 1968 <chr>, 1969 <chr>, 1970 <chr>,
# 1971 <chr>, 1972 <chr>, 1973 <chr>, 1974 <chr>, 1975 <chr>, 1976 <chr>,
# 1977 <chr>, 1978 <chr>, 1979 <chr>, 1980 <chr>, 1981 <chr>, 1982 <chr>,
# 1983 <chr>, 1984 <chr>, 1985 <chr>, 1986 <chr>, 1987 <chr>, 1988 <chr>,
# 1989 <chr>, 1990 <chr>, 1991 <chr>, 1992 <chr>, 1993 <chr>, 1994 <chr>,
# 1995 <chr>, 1996 <chr>, 1997 <chr>, 1998 <chr>, 1999 <chr>, 2000 <chr>,
# 2001 <chr>, 2002 <chr>, 2003 <chr>, 2004 <chr>, 2005 <chr>, 2006 <chr>,
# 2007 <chr>, 2008 <chr>, 2009 <chr>, 2010 <chr>, 2011 <chr>, 2012 <chr>,
# 2013 <chr>, 2014 <chr>, 2015 <chr>, 2016 <chr>, 2017 <chr>, 2018 <chr>,
# 2019 <chr>, 2020 <chr>
#transform our data from wide to long format. We named gini index values: "gini".
gini<-gather(gini,year,gini,-c(1:2))
head(gini)
# A tibble: 6 x 4
`Country Name` `Country Code` year gini
<chr> <chr> <chr> <chr>
1 Aruba ABW 1960 <NA>
2 Afghanistan AFG 1960 <NA>
3 Angola AGO 1960 <NA>
4 Albania ALB 1960 <NA>
5 Andorra AND 1960 <NA>
6 Arab World ARB 1960 <NA>
#check if "year" and "gini" are recognized as numeric.
class(gini$year)
[1] "character"
class(gini$gini)
[1] "character"
#declare both as numeric.
gini$year<-as.numeric(gini$year)
gini$gini<-as.numeric(gini$gini)
#now remove the missing values.
gini<-na.omit(gini)
#rename the columns with shorter names.
colnames(gini)[1:2]<-c("country","code")
head(gini)
# A tibble: 6 x 4
country code year gini
<chr> <chr> <dbl> <dbl>
1 Sweden SWE 1967 34
2 United Kingdom GBR 1969 33.7
3 Canada CAN 1971 37.3
4 United Kingdom GBR 1974 30
5 United States USA 1974 35.3
6 Canada CAN 1975 33.3
Now, let’s merge the two indicators together in one dataset.
#merge the two indicators.
ourdata<-merge(cons,gini)
head(ourdata)
country code year growth gini
1 Albania ALB 2017 2.47 33.2
2 Argentina ARG 2019 -1.46 42.9
3 Armenia ARM 2018 1.26 34.4
4 Austria AUT 2018 0.14 30.8
5 Belarus BLR 2019 1.11 25.3
6 Belgium BEL 2018 1.41 27.2
#let's plot both of them.
girafe(ggobj = ggplot(data=ourdata,mapping = aes(x=gini,y=growth,color=factor(year)))+geom_point_interactive(aes(x=gini,y=growth,color=factor(year),tooltip=country,data_id=country))+geom_text_repel(aes(label=code),size=2)+scale_x_continuous(breaks = seq(20,60,by=5))+scale_y_continuous(breaks = seq(-4,14,by=2))+labs(x="Gini Index",y="Annualized average growth rate of per capita real consumption, bottom 40% of population (%)")+theme_minimal(base_size = 7,base_family = "Sans-serifs")+theme(legend.title = element_blank()))
To see the whole data with the specific values, we may present them in a table.
#first reorder the columns.
ourdata<-ourdata[,c(2,1,3,4,5)]
#rename the rows properly.
row.names(ourdata)<-1:nrow(ourdata)
#rename the columns properly.
colnames(ourdata)[4:5]<-c("Annual percentage average growth of real consumption per capita<br>(bottom 40% of population)","Gini Index")
#present our table.
datatable(ourdata,options = list(
columnDefs = list(list(className = 'dt-center', targets = "_all"))
),escape=F)
Now the data for both indicators is easily accessible and visualized as well, you can explore it at your comfort.
We didn’t infer from the data because of many reasons, one of them is that the data is highly unbalanced, and we removed all the missing values. We don’t know why are the missing values are missing, we can’t assume randomness because the quality of the surveyed data and the ease of access and collection differ significantly across countries.
We try to look how can we derive Gini index given Lorenz curve. We will first plot Lorenz curve using income data from surveys conducted in Ilocos, a region in the Philippines, by the Philippines’ National Statistics Office. The data is availabe in \(R\) under the name Ilocos in ineq package.
#load the package.
library(ineq)
#load the data.
data("Ilocos")
head(Ilocos)
income sex family.size urbanity province AP.income AP.family.size
1 133582 male 5.5 urban Ilocos Norte 153924.0 5
2 312800 male 1.0 urban Ilocos Norte 164531.5 1
3 395434 male 6.0 urban Ilocos Norte 195892.4 5
4 519235 female 10.0 urban Ilocos Norte 358701.0 10
5 321952 male 5.0 urban Ilocos Norte 55330.0 3
6 663557 male 4.0 urban Ilocos Norte 403462.5 6
AP.weight
1 3844
2 3844
3 3844
4 3844
5 3844
6 3844
#subset the data to contain only the income values.
Ilocos<-Ilocos[,1]
head(Ilocos)
[1] 133582 312800 395434 519235 321952 663557
#sort the income values in an ascending order.
Ilocos<-sort(Ilocos)
#calculate cumulative fraction of the no. of households surveyed.
percentile_income<-ecdf(Ilocos)(Ilocos)
#merge it with our data.
Ilocos<-data.frame(Ilocos,percentile_income)
#calculate the empirical ordinary Lorenz curve, which is the cumulative proportion of income.
Ilocos$value<-cumsum(Ilocos$Ilocos)/sum(Ilocos$Ilocos)
head(Ilocos)
Ilocos percentile_income value
1 6067 0.001582278 8.548833e-05
2 13999 0.003164557 2.827442e-04
3 17525 0.004746835 5.296838e-04
4 19451 0.006329114 8.037622e-04
5 19781 0.007911392 1.082491e-03
6 19863 0.009493671 1.362374e-03
#let's plot our data.
plot(Ilocos$value~Ilocos$percentile_income,xlab="Cumulative fraction of households",ylab="Cumulative fraction of income",main="Lorenz curve")
lines(Ilocos$percentile_income,Ilocos$percentile_income)
The plot here is based on our calculation, so let’s plot Lorenz curve directly without calculation.
#plot Lorenz curve directly without calculation.
plot(Lc(Ilocos$Ilocos))
Gini coefficient, or index, captures the area between the line of equality and Lorenz curve.
Let \(p\) denotes the cumulative fraction of households, and \(L(p)\) denotes the cumulative fraction of income earned by \(p\).
To get the area between the two curves, Gini index \(G\), we can calculate the difference between their integrals, such that: \[\begin{equation} G= 2 \int_{0}^{1} p-L(p)\ dp \end{equation}\] We multiply by \(2\) as a factor to scale the area, so that Gini index lives in the interval \([0,1]\). We integrate over the interval \([0,1]\) because \((p,L(p))\) strictly live in that interval. You can multiply by \(200\) if you want the index to live in the interval \([0,100]\) and integrate over \([0,100]\). Since the line of equality has to satisfy the condition that \(p=L(p)\ \forall \ (p,L(p))\), such that \((p,L(p)) \in [0,1]\), therefore the area under it is \(0.5\). So, \[\begin{equation} \int_{0}^{1} p\ dp = 0.5\\ \end{equation}\] ∴\[\begin{equation} G= 1- 2 \int_{0}^{1} L(p)\ dp \end{equation}\]
Now, the idea is to estimate \(L(P)\) to calculate Gini index. Keep in mind that Lorenz curve is not linear, so we might estimate it with a quadratic function. For simplicity, assume that: \(L(p)=ap^2 +bp\).
#create a variable for p-squared.
Ilocos$percentile_income_sqr<-Ilocos$percentile_income^2
#estimate our function. Note that we assume the intercept to be zero because if the cumulative proportion of households doesn't exist (=0), then they can't earn any income! because they are not there :)
lm(value~0+percentile_income+percentile_income_sqr,data=Ilocos)
Call:
lm(formula = value ~ 0 + percentile_income + percentile_income_sqr,
data = Ilocos)
Coefficients:
percentile_income percentile_income_sqr
0.003426 0.842297
Therefore, our estimation is: \[\begin{equation} L(p)=0.842297 \ p^2 +0.003426 \ p \end{equation}\] Let’s calculate the definite integral over the interval \([0,1]\): \[\begin{equation} \int_{0}^{1} 0.842297 \ p^2 + 0.003426 \ p \ dp \\ = 0.2807657 \ p^3 + 0.001713 \ p^2 \Big|_0^1 \\ \\ = 0.2807657+0.001713 = \mathbf{0.2824787}. \end{equation}\] Therefore, \[\begin{equation} G= 1-(2*0.2824787) \\ = \mathbf{0.4350426} \end{equation}\]
Let’s go further with a cubic function such that: \[\begin{equation} L(p)=ap^3+bp^2+cp \end{equation}\]
#create a new variable for p-cubed.
Ilocos$percentile_income_cub<-Ilocos$percentile_income^3
#estimate our function.
lm(value~0+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)
Call:
lm(formula = value ~ 0 + percentile_income_cub + percentile_income_sqr +
percentile_income, data = Ilocos)
Coefficients:
percentile_income_cub percentile_income_sqr percentile_income
0.9795 -0.4647 0.3958
Therefore, \[\begin{equation} L(p)=0.9795 \ p^3-0.4647 \ p^2+0.3958 \ p \end{equation}\]
Then, \[\begin{equation} \int_{0}^{1} 0.9795 \ p^3 - 0.4647 \ p^2 + 0.3958 \ p \ dp \\ = 0.244875 \ p^4 -0.1549 \ p^3+0.1979 \ p^2 \Big|_0^1 \\ \\ =0.244875-0.1549+0.1979=\mathbf{0.287875} \end{equation}\] Therefore, \[\begin{equation} G=1-(2*0.287875) \\ =\mathbf{0.42425} \end{equation}\]
Let’s do it with a quartic function: \(L(p)=ap^4+bp^3+cp^2+dp\).
#create p^4
Ilocos$percentile_income_four<-Ilocos$percentile_income^4
#estimate our function.
lm(value~0+percentile_income_four+percentile_income_cub+percentile_income_sqr+percentile_income,data=Ilocos)
Call:
lm(formula = value ~ 0 + percentile_income_four + percentile_income_cub +
percentile_income_sqr + percentile_income, data = Ilocos)
Coefficients:
percentile_income_four percentile_income_cub percentile_income_sqr
1.80008 -2.39830 1.46695
percentile_income
0.07364
\[\begin{equation} \int_{0}^{1} 1.80008 \ p^4 -2.39830 \ p^3+1.46695 \ p^2+0.07364 \ p \ dp \\ = 0.360016 \ p^5-0.599575 \ p^4 +0.4889833 \ p^3+0.03682 \ p^2 \Big|_0^1 \\ =0.360016-0.599575+0.4889833+0.03682= \mathbf{0.2862443}. \end{equation}\]
\[\begin{equation} G=1-(2*0.2862443) \\ =\mathbf{0.4275114} \end{equation}\]
Now let’s calculate Gini Index directly from \(R\) without calculation:
Gini(Ilocos$Ilocos)
[1] 0.4269508
gini.index(Ilocos$Ilocos)
Measures of Relative Variability - Gini Index
data: Ilocos$Ilocos
Gini Index = 0.42763, delta = 96039
As you can see, our estimators are very close, with the quartic function as the closest.
And finally, I hope you had fun!