Marginal Propensity to Consume in USA by data ordered to percentiles

Francisco Javier Parra Rodríguez

Universidad de Cantabria (UNICAN), España

Abstract

The objective of this article is to estimate a marginal propensity to consume ( MPC) for the American economy through a classic Keynesian consumption function with microdata from household surveys . The MPC value are influenced by alternative definitions of spending units or consumption expenditure and income, at real or a per capita level, usually the data are grouped by arbitrary but seemingly reasonable income intervals. The form to data are collected to describe expenditure patterns, organized by income interval, and highly aggregated, determine the different MPC in consumption function. We propose an estimation method that uses income percentiles, expenditure percentiles and the income-expenditure percentiles of grouping survey sample. These estimates usually have normal problems in error term due to outlier, dependence, heteroscedasticity or the existence of nonlinear relationships in the variables. By Regression Band Spectrum (RBS) , and Generalized Linear Models (GLM), we can solve this problems.

Classification JEL (Journal of Economic Literature): E12, E21,R21

The consumption functions

The consumption functions show the relationship between real disposable income and consumer spending:

\(C_i=a+bY_i+e_i\) (1)

Where, \(C_i\) is the household \(i\) Consumer spending, \(Y_i\) , the household \(i\) income, and \(e_i\) is a vector of disturbance terms each with zero mean and constant variance, \(\sigma^2\).

The ratio of total consumption to total income, \(\frac{\sum C_i}{\sum Y_i}\), is known as the average propensity to consume; the ratio of consumption changes to income changes,\(b\), is MPC, and, \(a\), the autonomous consumption, or the level of consumption that would still exist even if household income was \(0\).

To time \(t\), the total consumer expenditure in an area is:

\(C_t= \sum_i C_i = \sum_i(a_t + b_t Y_i + e_i) = n_t a_t + b_t \sum_i Y_i\) (2)

Where, \(n_t\) are the total household at time \(t\) in this area, if \(e_i\) is a disturbance term with zero mean then: \(\sum_i e_i = 0\).

Now,\(Y_t\) and \(n_t\) are not fixed variables, and assuming some random variability in \(a_t\) and \(b_t\) around the expected value :\(a\) and \(b\), (2) could be express:

\(C_t= a n_t + b Y_t + u_t\), \(t=1,...,T\) (3)

Where \(u_t\) is a vector of disturbance terms each with zero mean and constant variance, \(\sigma^2\).

Dividing (3) to \(n_t\), the formulation to consumption functions in Keynes is obtained :

\(c_t = a + b y_t + u_t\)(4)

Where \(c_t\) is the Consumer spending per capita in time \(t\), \(y_t\) is the income per capita in \(t\) time.

The Keynes consumption theory, was soon questioned both its theoretical simplicity as by empirical evidence. Kutnets, Feber, Goldsmith and others of a time series or long-run estimated PMC around 0.90 in USA. On the other hand, studies utilizing expenditure survey data found cross-sectional or household MPC’s mostly in the range of 0.60 to 0.80. These empirical differences, originated the consumption function paradox, or that somehow individual or household behaviour was different than aggregate behaviour, it implied that The Keynes consumption theory was incomplete , possibly even incorrect. The paradox, stimulated efforts to devise a more complete theory of consumption, the best known are Dusenberry’s (1949), Friedman’s (1957) permanent income hypothesis, and Modigliani and Brumberg’s (1945) life-cycle hypothesis.

According to Friedman’s “permanent income hypothesis,” the consumption of a household is proportional to its permanent income, that is, the average income it expects to earn over its planning horizon. Friedman is not definite about either the factor of proportionality-which might vary with the household’s stage in the life cycle, its wealth, the interest rate, and other variables-or about the length of the planning horizon. On these matters the lifetime income hypothesis of Modigliani is much more explicit. In any case, Friedman employs his hypothesis to explain both the evidence of cross-section budget surveys and the ratchet effect observed in aggregate time series.

Both Modigliani and Brumberg and Friedman used the income elasticity of consumption, defined at the mean as: : \(N_{cy}=\frac {\bar C}{\bar Y}\).

to indicate the existence of transitory income. According to Modigliani and Brumberg (1954) when expected equaled actual income for every household, “the elasticity of consumption with respect to income is unity.” However, “in the presence of short-term fluctuations in income, the proportion of income consumed will tend to fall with income and the elasticity of consumption with respect to income will be less than one.” In Friedman (1957), the elasticity of consumption with respect to income “measures the fraction of the variance of measured income attributable to variation in the permanent component: the higher the elasticity, the smaller the importance of transitory factors relative to permanent factors” and that when the income elasticity equals one, “transitory components are all zero.”

Bunting, D (2003) think that the assumption that all consumer units have identical permanent incomes is implausible. Repeated Consumer Expenditure Surveys conducted by the Bureau of Labor Statistics have shown that individual incomes differ on the basis of individual human capital qualities, demographic characteristics such as age, sex and race, and wealth . If random or transitory influences actually determine the range of cross sectional incomes, then the income and the educational or demographic characteristics of households should not be correlated.

On the other hand, the MPC’s in consumption functions are influenced by alternative definitions of spending units or consumption expenditure, at real or a per capita level (Bunting, 1989), usually the data were grouped by arbitrary but seemingly reasonable income intervals. To show “typical” behaviour, average income and average consumption were calculated by dividing each group by its number of households. The form to data are collected to describe expenditure patterns, organized by income interval, and highly aggregated, determine the different MPC in consumption function.

In Bounting (1989) the estimate of consumption functions to cross section data of BLS to 1960 by income level, such as “under \(1000\)”, “\(1000\) to \(2000\)”, “\(2000\) to \(3000\)”, and so on, produces a classic “paradoxical” cross-sectional consumption: the PPC (0.78) is lower than that for the long run. But the percentage distribution of households clearly shows that some intervals represent far more households than others, the income interval used is not a linear transformation of ungrouped data. If corrected the problem of unequal group size using a weighted regression, or exclude the lowest and highest income groups, for example were estimated MPC upcoming to \(N_{yc}\): 0.83 and 0.80, respectively.

We can formulate the consumption function in household by:

\(C_i= b_i Y_i\) (5)

Were \(C_i\) is the Consumer spending in \(i\) household,and \(Y_i\) is the income in \(i\) household..

Grouped the household by income intervals, the consumption expenditure of each group are:

\(\sum_{i=1}^m C_i = \sum_{i=1}^m b_i Y_i\) (6)

Where \(m\) are the household include in the group.

Dividing (6) by household number and thinking that all household of group are identical behaviour in MPC, \(b_j=\frac {\sum_{i=1}^{m} b_i}{m}\), so:

\(c_j = b_j y_j + e_j\) (7)

were \(c_j\) and \(y_j\) are the consumer spending and income per cápita of each group, and \(e_j\) was the difference to use \(b_j\) in place of the average propensity to consume in the group \(j\), \(\frac {\bar C}{\bar Y}\).

SA criterium to grouped the household whit identical household in each group are the percentile of income:

\(c_j = a + b_j y_j + e_j\)(8)

Where \(e_j\) are a random error term with zero mean, constant variance, \(\sigma^2\),and independent (\(cov(e_j,e_s)=0\)).

In that case,\(b_1\), \(b_2\),…,\(b_100\) are the MPC to household of class 1, class 2 ….. , and we expect null difference into MPC and elasticity consumption within class .

Estimation methodology

The function (5) define a no linear relationship into consumer and income household. We assume that \(b_i\) oscillates randomly around this center of gravity, (5) is specified as (1), a lineal relationship, and we can be use Ordinary Least Squares (OLS). However, it seems reasonable to think that \(b_i\) change depending on household social class. In this case, the relationship into consumer and income household oscillates around this center of gravity with cyclical patterns, periodic and no-periodic . The equation (5) could be estimated by Generalized Linear Model, or no parametric techniques: kernel or spline. The relationship (8) could be estimated using dummys for group (\(D_j\)),would be specified:

\(c_i = a + b_1 Y_i D_j + \epsilon_i\) (9)

Another form to be a relationship into consumer and income household, discriminating by social group, are by a Fourier development:

\(c_j = a + b y_j + \sum_{s=1}^S [a_s\cos(\omega_s)+b_j\sin(\omega_s) + \upsilon_j\) (10)

were, \(S=50\), \(\omega_s = 2 \pi \frac {j}{S}\),and \(\upsilon_j\) a random variable with zero mean, constant variance, \(\sigma^2\), and independent (\(cov(e_j,e_s)=0\)).

The function (10) contain cycles of different frequencies and amplitudes and such combinations of frequencies and amplitudes may yield cyclical patterns which appear non-periodic with irregular amplitude. Given the error term uncorrelated, the more frequent cyclical variations on the tendency line are the differences in PMC between close social groups, and the least frequent cyclical variations on the tendency line are the differences in PMC between distant groups.

To estimate (10) is presented below a series of functions based R (Parra, 2015), the target is make a regression band spectrum (RBS) (Engle, 1974), using the Durbin test (Durbin, 1969) to select the oscillations band.

Test based on residuals from frequency domain regresion

Durbin (1967 and 1969) desing a technique for studying the general nature of the serial dependence in a satacionary time series.

Suppose \(\hat \beta\) is an estimator of \(\beta\). The n x 1 vector of residuals is then defined by

\(\hat u=y-X\hat \beta\)

\(p_j\) denotes the ordinate of the periodogram de \(\hat u\) at frequency \(\lambda_j=2\pi j/n\), and \(\hat v_j\) denote the \(j-th\) element of \(\hat v\), then

\(p_j=\hat v_{2j}^{2}+\hat v_{2j+1}^{2}\)

\(j = 1,...\frac{n}{2}-1\) to \(n\) even, and \(j=1,...\frac{n-1}{2}\) to \(n\) odd,

\(p_j=\hat v_{2j}^{2}\) \(j = \frac{n}{2}-1\) \(n\) even

\(p_0=\hat v_{1}^{2}\)

As regards test statistics, if \(\hat \beta\) is the OLS estimator of \(\beta\) then the elements of \({\hat v}\) may be used directly in Durbin’s cumulative periodogram test. This test is based on the quantities

\(s_j=\frac{\sum_{r=1} ^j p_r}{\sum_{r=1}^m p_r}\)

where \(m=\frac{1}{2}n\) for n even and \(\frac{1}{2}(n-1)\) for n odd. The procedure is a bounds test and upper and lower critical values may be constructed using the tables provided in Durbin (1969). Note that po does not enter into the test statistic as \(p_o\) does not enter in to test statitstic as \(p_o=\hat v_1=0\).

X0.1 <- c(0.4  ,0.35044  ,0.35477   ,0.33435    ,0.31556    ,0.30244    ,0.28991    ,0.27828    ,0.26794    ,0.25884    ,0.25071    ,0.24325    ,0.23639    ,0.2301 ,0.2243 ,0.21895    ,0.21397    ,0.20933    ,0.20498    ,0.20089    ,0.19705    ,0.19343    ,0.19001    ,0.18677    ,0.1837 ,0.18077    ,0.17799    ,0.17037    ,0.1728 ,0.17037    ,0.16805    ,0.16582    ,0.16368    ,0.16162    ,0.15964    ,0.15774    ,0.1559 ,0.15413    ,0.15242    ,0.15076    ,0.14916    ,0.14761    ,0.14011    ,0.14466    ,0.14325    ,0.14188    ,0.14055    ,0.13926    ,0.138  ,0.13678    ,0.13559    ,0.13443    ,0.133  ,0.13221    ,0.13113    ,0.13009    ,0.12907    ,0.12807    ,0.1271 ,0.12615    ,0.12615    ,0.12431    ,0.12431    ,0.12255    ,0.12255    ,0.12087    ,0.12087    ,0.11926    ,0.11926    ,0.11771    ,0.11771    ,0.11622    ,0.11622    ,0.11479    ,0.11479    ,0.11341    ,0.11341    ,0.11208    ,0.11208    ,0.11079    ,0.11079    ,0.10955    ,0.10955    ,0.10835    ,0.10835    ,0.10719    ,0.10719    ,0.10607    ,0.10607    ,0.10499    ,0.10499    ,0.10393    ,0.10393    ,0.10291    ,0.10291    ,0.10192    ,0.10192    ,0.10096    ,0.10096    ,0.10002)

X0.05 <- c(0.45,0.44306,0.41811,0.39075 ,0.37359    ,0.35522    ,0.33905    ,0.32538    ,0.31325    ,0.30221    ,0.29227    ,0.2833 ,0.27515    ,0.26767    ,0.26077    ,0.25439    ,0.24847    ,0.24296    ,0.23781    ,0.23298    ,0.22844    ,0.22416    ,0.22012    ,0.2163 ,0.21268    ,0.20924    ,0.20596    ,0.20283    ,0.19985    ,0.197  ,0.19427    ,0.19166    ,0.18915    ,0.18674    ,0.18442    ,0.18218    ,0.18003    ,0.17796    ,0.17595    ,0.17402    ,0.17215    ,0.17034    ,0.16858    ,0.16688    ,0.16524    ,0.16364    ,0.16208    ,0.16058    ,0.15911    ,0.15769    ,0.1563 ,0.15495    ,0.15363    ,0.15235    ,0.1511 ,0.14989    ,0.1487 ,0.14754    ,0.14641    ,0.1453 ,0.1453 ,0.14361    ,0.14361    ,0.14112    ,0.14112    ,0.13916    ,0.13916    ,0.13728    ,0.13728    ,0.13548    ,0.13548    ,0.13375    ,0.13375    ,0.13208    ,0.13208    ,0.13048    ,0.13048    ,0.12894    ,0.12894    ,0.12745    ,0.12745    ,0.12601    ,0.12601    ,0.12464    ,0.12464    ,0.12327    ,0.12327    ,0.12197    ,0.12197    ,0.12071    ,0.12071    ,0.11949    ,0.11949    ,0.11831    ,0.11831    ,0.11716    ,0.11716    ,0.11604    ,0.11604    ,0.11496)

X0.025 <- c(0.475   ,0.50855    ,0.46702    ,0.44641    ,0.42174    ,0.40045    ,0.38294    ,0.3697 ,0.35277    ,0.34022    ,0.32894    ,0.31869    ,
0.30935 ,0.30081    ,0.29296    ,0.2857 ,0.27897    ,0.2727 ,0.26685    ,0.26137    ,0.25622    ,0.25136    ,0.24679    ,0.24245    ,0.23835    ,0.23445    ,0.23074    ,0.22721    ,0.22383    ,0.22061    ,0.21752    ,0.21457    ,0.21173    ,0.20901    ,0.20639    ,0.20337    ,0.20144    ,0.1991 ,0.19684    ,0.19465    ,0.19254    ,0.1905 ,0.18852    ,0.18661    ,0.18475    ,0.18205    ,0.1812 ,0.1795 ,0.17785    ,0.17624    ,0.17468    ,0.17361    ,0.17168    ,0.17024    ,0.16884    ,0.16748    ,0.16613    ,0.16482    ,0.16355    ,0.1623 ,0.1623 ,0.1599 ,0.1599 ,0.1576 ,0.1576 ,0.1554 ,0.1554 ,0.15329    ,0.15329    ,0.15127    ,0.15127    ,0.14932    ,0.14932    ,0.14745    ,0.14745    ,0.14565    ,0.14565    ,0.14392    ,0.14392    ,0.14224    ,0.14224    ,0.14063    ,0.14063    ,0.13907    ,0.13907    ,0.13756    ,0.13756    ,0.1361 ,0.1361 ,0.13468    ,0.13468    ,0.13331    ,0.13331    ,0.13198    ,0.13198    ,0.1307 ,0.1307 ,0.12944    ,0.12944    ,0.12823)

X0.01 <- c(  0.49   ,0.56667    ,0.53456    ,0.50495    ,0.47629    ,0.4544 ,0.43337    ,0.41522    ,0.39922    ,0.38481    ,0.37187    ,0.36019    ,0.34954    ,0.3398 ,0.33083    ,0.32256    ,0.31489    ,0.30775    ,0.30108    ,0.29484    ,0.28898    ,0.28346    ,0.27825    ,0.27333    ,0.26866    ,0.26423    ,0.26001    ,0.256  ,0.25217    ,0.24851    ,0.24501    ,0.24165    ,0.23843    ,0.23534    ,0.23237    ,0.22951    ,0.22676    ,0.2241 ,0.22154    ,0.21906    ,0.21667    ,0.21436    ,0.21212    ,0.20995    ,0.20785    ,0.20581    ,0.20383    ,0.2119 ,0.20003    ,0.19822    ,0.19645    ,0.19473    ,0.19305    ,0.19142    ,0.18983    ,0.18828    ,0.18677    ,0.18529    ,0.18385    ,0.18245    ,0.18245    ,0.17973    ,0.17973    ,0.17713    ,0.17713    ,0.17464    ,0.17464    ,0.17226    ,0.17226    ,0.16997    ,0.16997    ,0.16777    ,0.16777    ,0.16566    ,0.16566    ,0.16363    ,0.16363    ,0.16167    ,0.16167    ,0.15978    ,0.15978    ,0.15795    ,0.15795    ,0.15619    ,0.15619    ,0.15449    ,0.15449    ,0.15284    ,0.15284    ,0.15124    ,0.15124    ,0.1497 ,0.1497 ,0.1482 ,0.1482 ,0.14674    ,0.14674    ,0.14533    ,0.14533    ,0.14396)

X0.005 <- c(0.495   ,0.59596    ,0.579  ,0.5421 ,0.51576    ,0.48988    ,0.4671 ,0.44819    ,0.43071    ,0.41517    ,0.40122    ,0.38856    ,0.37703    ,0.36649    ,0.35679    ,0.34784    ,0.33953    ,0.33181    ,0.32459    ,0.31784    ,0.31149    ,0.30552    ,0.29989    ,0.29456    ,0.28951    ,0.28472    ,0.28016    ,0.27582    ,0.27168    ,0.26772    ,0.26393    ,0.2603 ,0.25348    ,0.25348    ,0.25027    ,0.24718    ,0.24421    ,0.24134    ,0.23857    ,0.23589    ,0.2331 ,0.23081    ,0.22839    ,0.22605    ,0.22377    ,0.22377    ,0.21943    ,0.21753    ,0.21534    ,0.21337    ,0.21146    ,0.20961    ,0.2078 ,0.20604    ,0.20432    ,0.20265    ,0.20101    ,0.19942    ,0.19786    ,0.19635    ,0.19635    ,0.19341    ,0.19341    ,0.19061    ,0.19061    ,0.18792    ,0.18792    ,0.18534    ,0.18534    ,0.18288    ,0.18288    ,0.18051    ,0.18051    ,0.17823    ,0.17823    ,0.17188    ,0.17188    ,0.17392    ,0.17392    ,0.17188    ,0.17188    ,0.16992    ,0.16992    ,0.16802    ,0.16802    ,0.16618    ,0.16618    ,0.1644 ,0.1644 ,0.16268    ,0.16268    ,0.16101    ,0.16101    ,0.1594 ,0.1594 ,0.15783    ,0.15783    ,0.15631    ,0.15631    ,0.15483)
 

TestD <- data.frame(X0.1,X0.05,X0.025,X0.01,X0.005)

Fuction td (a,b)

Calculates and shows the results of testing Durbin (Durbin, 1969), applied to the variable \(a\) and the significance level \(b\) to \(b=0.1\)(significance=1); \(b=0.05\) (significance=2); \(b=0.025\) (significance=3); \(b=0.01\) (significance=4) and \(b=0.005\) (significance=5) (Durbin; 1969)

td <- function(y,significance) {
# Author: Francisco Parra Rodríguez
# Some ideas from:
#Harvey, A.C. (1978), Linear Regression in the Frequency Domain, International Economic Review, 19, 507-512.
# DURBIN, J., "Tests for Serial Correlation in Regression Analysis based on the Periodogram ofLeast-Squares Residuals," Biometrika, 56, (No. 1, 1969), 1-15.
# http://econometria.wordpress.com/2013/08/21/estimation-of-time-varying-regression-coefficients/ 
per <- periodograma(y)
p <- as.numeric(per$densidad)
n <- length(p)
s <- p[1]
t <- 1:n
for(i in 2:n) {s1 <-p[i]+s[(i-1)]
s <- c(s,s1)
s2 <- s/s[n]
}
while (n > 100) n <- 100
if (significance==1) c<- c(TestD[n,1]) else {if (significance==2) c <- c(TestD[n,2]) else {if (significance==3) c <- c(TestD[n,3]) else {if (significance==4) c <- c(TestD[n,4]) 
c <- c(TestD[n,5])}}}
min <- -c+(t/length(p))
max <- c+(t/length(p))
data.frame(s2,min,max)
}

Fuction gtd (a,b)

Plot to the Durbin test (Durbin, 1969), applied to the variable \(a\) and the significance level \(b\).

gtd <- function (y,significance) {
S <- td(y,significance)
plot(ts(S), plot.type="single", lty=1:3,main = "Test Durbin", 
ylab = "densidad acumulada",
xlab="frecuencia")
}

Alternatively you can use the cpgram function from MASS package (src/library/stats/R/cpgram.R).

Function rdf (y,x, significance)

Consider now the linear regression model

\(y_t=\beta_tx_t+u_t\)

where \(x_t\) is an n x 1 vector of fixed observations on the independent variable, \(\beta_t\) is a n x 1 vector of parameters,\(y\) is an n x 1 vector of observations on the dependent variable, and \(u_t\) is an n x 1 vector de errores distribuidos con media cero y varianza constante.

Whit the assumption that any series, \(y_t\),\(x_t\),\(\beta_t\) and \(ut\), can be transformed into a set of sine and cosine waves such as:

\[y_t=\eta^y+\sum_{j=1}^N[a^y_j\cos(\omega_j)+b^y_j\sin(\omega_j)\]

\[x_t=\eta^x+\sum_{j=1}^N[a^y_j\cos(\omega_j)+b^y_j\sin(\omega_j)]\]

\[ \beta_t=\eta^\beta+\sum_{j=1}^N[a^\beta_j\cos(\omega_j)+b^\beta_j\sin(\omega_j)]\]

Pre-multiplying (6) by \(Z\):

\[ \dot y=\dot x\dot\beta+\dot u\]

donde \(\dot y = Zy\),\(\dot x = Zx\), \(\dot \beta = Z\beta\) y \(\dot u = Zu\)

The system (8) can be rewritten as (see appendix):

\[ \dot y=Zx_tI_nZ^T\dot \beta + ZI_nZ^T\dot u\]

If we call \(\dot e = ZI_nZ ^ T \dot u\), It can be found the \(\dot \beta\) that minimize the sum of squared errors \(E_T = Z ^ T \dot e\).

Once you have found the solution to this optimization, the series would be transformed into the time domain for the system (8).

In the function , \(y\) is the dependent variable, \(x\) is the independent variables,and \(significance\) the significance for the Durbin test.

The algorithm calculation is performed in phases:

Gets the cross-periodogram to \(x\) and \(y\).

Let \(x\) a vector n x 1, in frequency domain \(\hat x= Wx\)

Let \(y\) a vector n x 1, in frequency domain \(\hat y= Wy\)

\(p_j\) denotes the ordinate of the cross-periodogram to \(\hat x\) and \(\hat y\) at frequency \(\lambda_j=2\pi j/n\), and \(\hat x_j\) the j-th element to \(\hat x\) and \(\hat y_j\) the j-th element to \(\hat y\), then

\[ \left\lbrace \begin{array}{ll} p_j=\hat x_{2j}\hat y_{2j}+\hat x_{2j+1}\hat y_{2j+1} & \forall j = 1,...\frac{n-1}{2}\\ p_j=\hat x_{2j}\hat y_{2j}& \forall j = \frac{n}{2}-1 \end{array} \right.\]

\[p_0=\hat x_{1}\hat y_{1}\]

Order the co - spectrum by the absolute value of \(p_j\) and make a index.
Calculate the matrix \(Wx_tI_nW^T\), the matrix rows are ordered by index.
Calculate \(\dot e=WI_nW^T\dot u\), add a vector by constant term, \((1,0,...0)^n\), then calculate the model by constant term and the de two first regressors to ordered matrix \(Wx_tI_nW^T\), cthen calculate the model by the constant and the de fourt first regressors, then for six, to complete the \(n\) regressors to ordered matrix.
Testing for serial correlation all model to \(\alpha=0.1;0.05;0.025;0.01;0.005\).
Select the lowest degree of freedom models with random term uncorrelated. If any is uncorrelated returns the OLS.

rdf <- function (y,x,significance) {
  # Author: Francisco Parra Rodríguez
  # http://rpubs.com/PacoParra/24432
  # Leemos datos en forma matriz
  a <- matrix(y, nrow=1)
  b <- matrix(x, nrow=1)
  n <- length(a)
  # calculamos el cros espectro mediante la funcion cperiodograma
  cperiodograma <- function(y,x) {
# Author: Francisco Parra Rodríguez
# http://econometria.wordpress.com/2013/08/21/estimation-of-time-varying-regression-coefficients/ 
cfx <- gdf(y)
n <- length(y)
cfy <- gdf(x)
if (n%%2==0) {
m1x <- c(0)
m2x <- c()
for(i in 1:n){
if(i%%2==0) m1x <-c(m1x,cfx[i]) else m2x <-c(m2x,cfx[i])}
m2x <- c(m2x,0)
m1y <- c(0)
m2y <- c()
for(i in 1:n){
if(i%%2==0) m1y <-c(m1y,cfy[i]) else m2y <-c(m2y,cfy[i])}
m2y <-c(m2y,0) 
frecuencia <- seq(0:(n/2)) 
frecuencia <- frecuencia-1
omega <- pi*frecuencia/(n/2)
periodos <- n/frecuencia
densidad <- (m1x*m1y+m2x*m2y)/(4*pi)
tabla <- data.frame(omega,frecuencia, periodos,densidad)
tabla$densidad[(n/2+1)] <- 4*tabla$densidad[(n/2+1)]
data.frame(tabla[2:(n/2+1),])}
else {m1x <- c(0)
m2x <- c()
for(i in 1:(n-1)){
if(i%%2==0) m1x <-c(m1x,cfx[i]) else m2x <-c(m2x,cfx[i])}
m2x <-c(m2x,cfx[n])
m1y <- c(0)
m2y <- c()  
for(i in 1:(n-1)){
if(i%%2==0) m1y <-c(m1y,cfy[i]) else m2y <-c(m2y,cfy[i])}
m2y <-c(m2y,cfy[n])
frecuencia <- seq(0:((n-1)/2)) 
frecuencia <- frecuencia-1
omega <- pi*frecuencia/(n/2)
periodos <- n/frecuencia
densidad <- (m1x*m1y+m2x*m2y)/(4*pi)
tabla <- data.frame(omega,frecuencia, periodos,densidad)
data.frame(tabla[2:((n+1)/2),])}
}
  cper <- cperiodograma(a,b)
# Ordenamos de mayor a menor las densidades absolutas del periodograma, utilizando la funcion "sort.data.frame" function, Kevin Wright. Package taRifx
  S1 <- data.frame(f1=cper$frecuencia,p=abs(cper$densidad))
  S <- S1[order(-S1$p),] 
  id <- seq(2,n)
  m1 <- cbind(S$f1*2,evens(id))
  if (n%%2==0) {m2 <- cbind(S$f1[1:(n/2-1)]*2+1,odds(id))} else 
   {m2 <- cbind(S$f1*2+1,odds(id))}
  m <- rbind(m1,m2)
  colnames(m) <- c("f1","id")
  M <- sort.data.frame (m,formula=~id)
  M <- rbind(c(1,1),M)
  # Obtenemos la funcion auxiliar (cdf) del predictor y se ordena segun el indice de las mayores densidades absolutas del co-espectro.
  cx <- cdf(b)
  id <- seq(1,n)
  S1 <- data.frame(cx,c=id)
  S2 <- merge(M,S1,by.x="id",by.y="c")
  S3 <- sort.data.frame (S2,formula=~f1)
  m <- n+2
  X1 <- S3[,3:m]
  X1 <- rbind(C=c(1,rep(0,(n-1))),S3[,3:m])
  # Se realizan las regresiones en el dominio de la frecuencia utilizando un modelo con constante, pendiente y los arm?nicos correspondientes a las frecuencias mas altas de la densidad del coespectro. Se realiza un test de durbin para el residuo y se seleccionan aquellas que son significativas. 
  par <- evens(id)
  i <- 1
  D <- 1
  resultado <- cbind(i,D)
  for (i in par) {
  X <- as.matrix(X1[1:i,])
  cy <- gdf(a)
  B1 <- solve(X%*%t(X))%*%(X%*%cy)
  Y <- t(X)%*%B1 
  F <- gdt(Y)
  res <- (t(a) - F)
  T <- td(res,significance)
  L <- as.numeric(c(T$min<T$s2,T$s2<T$max))
  LT <- sum(L)
  if (n%%2==0) {D=LT-n} else {D=LT-(n-1)} 
  resultado1 <- cbind(i,D)
  resultado <- rbind(resultado,resultado1)
  resultado}
resultado2 <-data.frame(resultado)
criterio <- resultado2[which(resultado2$D==0),]
sol <- as.numeric(is.na(criterio$i[1]))
if (sol==1) {X <- as.matrix(X1[1:2,])
cy <- gdf(a)
  B1 <- solve(X%*%t(X))%*%(X%*%cy)
  Y <- t(X)%*%B1 
  F <- gdt(Y)
  res <- (t(a) - F)
  datos <- data.frame(cbind(t(a),t(b),F,res))
  colnames(datos) <- c("Y","X","F","res")
list(datos=datos,Fregresores=t(X),Tregresores= t(MW(n))%*%t(X),Nregresores=criterio$i[1],Betas=B1)} else {
 X <- as.matrix(X1[1:criterio$i[1],])
cy <- gdf(a)
  B1 <- solve(X%*%t(X))%*%(X%*%cy)
  Y <- t(X)%*%B1 
  F <- gdt(Y)
  res <- (t(a) - F)
  datos <- data.frame(cbind(t(a),t(b),F,res))
  colnames(datos) <- c("Y","X","F","res")
list(datos=datos,Fregresores=t(X),Tregresores= t(MW(n))%*%t(X),Nregresores=criterio$i[1],Betas=B1)}}

Data ordended by income or expediture

Income refers to the ongoing flow of economic resources that a household receives over time. It includes wages and salaries and money earned through self-employment as well as private pensions, investments and other non-government sources and cash benefits/social transfers. Income is important in this context as it allows people to satisfy their needs and pursue many other goals that they deem important to their lives. Those with low incomes typically have a restricted capacity to consume the goods and services they need to participate fully in the society in which they live (U.N.E.C.E. 2016).

Consumption is the use of goods and services to directly satisfy a person’s needs and wants, whilst consumption expenditure is the value of consumption goods and services paid for by a household. Considered simply, and everything else being equal, people with lower levels of consumption or consumption expenditure can be regarded as having a lower level of current economic well-being. Many economists would argue consumption is a more important determinant of economic well-being than income alone (U.N.E.C.E., 2016). Indeed, Brewer and O’Dea (2012) and others (see Noll, 2007 for a review) argue that it is preferable to consider the distribution of consumption rather than income on both theoretical and pragmatic grounds.

In U.N.E.C.E.(2016) are discussed the pros and cons of each approach. We propose, use the income distribution, the expenditure distribution, and the income-expenditure coincidences to estimated the PMC.

Consumer Expenditure Survey

The Consumer Expenditure Survey (CE) program provides data on the buying habits of American consumers. CE also provides the data to the public for research in the Public-Use Microdata (PUMD). This survey collects 95 percent of the total expenditures and income by households. The current data set covers 2014 and the first quarter of 2015. The microdata files are in the public domain and, with appropriate credit, may be reproduced without permission. A suggested citation is: “U.S. Department of Labor, Bureau of Labor Statistics, Consumer Expenditure Survey, Interview Survey, 2014.” The Interview Survey microdata are provided as SAS, STATA, SPSS. or ASCII comma-delimited files. The 2014 Interview release contains three groups of files: . 8 major data files (FMLI, MEMI, MTBI, ITBI, ITII, NTAXI, FPAR, and MCHI) . 4 types of processing files . 43 detailed expenditure data files (EXPN files)

Six of the eight major data files (FMLI, MEMI, MTBI, ITBI, ITII, and NTAXI) are organized by the calendar quarter of the year in which the data were collected. There are five1 quarterly data sets for each of these files, running from the first quarter of 2014 through the first quarter of 2015. The FMLI file contains CU characteristics, income, and summary level expenditures; the MEMI file contains member characteristics and income data; the MTBI file contains expenditures organized on a monthly basis at the UCC level; the ITBI file contains income data converted to a monthly time frame and assigned to UCCs; and the ITII file contains the five imputation variants of the income data converted to amonthly time frame and assigned to UCCs. The NTAXI file contains federal and state tax information for each tax unit within the CU. Monthly Expenditure File (MTBI)

In the MTBI file, each expenditure reported by a CU is identified by UCC, gift/nongift status, and month in which the expenditure occurred. UCCs are six digit codes that identify items or groups of items. The expenditure data record purchases that were made during the three month period prior to the month of the interview. Income File (ITBI) The “ITBI” file, also referred to as the “Income” file, contains CU characteristics and income data. This file is created directly from the FMLI file and contains the same annual and point-of-interview data in a monthly format. It was created to facilitate linking CU income and characteristics data with MTBI expenditure data. As such, the file structure is similar to MTBI. Each characteristic and income item is identified by UCC (For a list of the UCCs, see Istub), gift/nongift status, and month. Imputed Income File (ITII) As a result of the introduction of multiply imputed income data in the Consumer Expenditure Survey. It is very similar to the ITBI file, except that the variable IMPNUM. will indicate the number (1-5) of the imputation variant of the income variable and it only contains UCCs from variables subject to income imputation.

Data from the first quarter of 2014 are processed.

setwd("~/Word Press/Econometria aplicada/funcion de consumo USA")
#setwd("Y:/Paco/funciones de consumo/funcion de consumo usa")
#setwd("E:/funcion de consumo USA")
library(foreign) 
ingreso1<- data.frame(read.spss("itbi141x.sav"))
ingreso1_i<- data.frame(read.spss("itii141x.sav"))
gasto1 <- data.frame(read.spss("mtbi141x.sav"))
ingreso2_i<- data.frame(read.spss("itii142.sav"))
gasto2 <- data.frame(read.spss("mtbi142.sav"))
ingreso3_i<- data.frame(read.spss("itii143.sav"))
gasto3 <- data.frame(read.spss("mtbi143.sav"))
ingreso4_i<- data.frame(read.spss("itii144.sav"))
gasto4 <- data.frame(read.spss("mtbi144.sav"))
ingresoT_i <- rbind(ingreso1_i,ingreso2_i,ingreso3_i,ingreso4_i)
gastoT <- rbind(gasto1,gasto2,gasto3,gasto4)
# obtenemos las sumas anuales
ING <- subset(ingresoT_i,ingresoT_i$IMPNUM==5)
#ING <- tapply(datos$VALUE,datos$NEWID,mean)
ING1 <- tapply(ING$VALUE,ING$NEWID,sum)
ING1 <- data.frame(NEWID=names(ING1),ING=as.numeric(ING1))
str(ING1)

## 'data.frame':    25894 obs. of  2 variables:
##  $ NEWID: Factor w/ 25894 levels "2647065","2647085",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ ING  : num  11158 11088 37344 28447 64130 ...

GAST <- tapply(gastoT$COST,gastoT$NEWID,sum)
GAST <- data.frame(NEWID=names(GAST),GAST=as.numeric(GAST))
GAST$NEWID=as.character(GAST$NEWID)
ING1$NEWID=as.character(ING1$NEWID)
str(GAST)

## 'data.frame':    25907 obs. of  2 variables:
##  $ NEWID: chr  "2647065" "2647085" "2647105" "2647115" ...
##  $ GAST : num  6004 53015 9853 29936 72290 ...

str(ING1)

## 'data.frame':    25894 obs. of  2 variables:
##  $ NEWID: chr  "2647065" "2647085" "2647105" "2647115" ...
##  $ ING  : num  11158 11088 37344 28447 64130 ...

datos <- merge(GAST,ING1,by.x="NEWID",by.y="NEWID")
summary(datos)

##     NEWID                GAST               ING         
##  Length:25893       Min.   :-1340261   Min.   :-593355  
##  Class :character   1st Qu.:    9900   1st Qu.:  22267  
##  Mode  :character   Median :   37967   Median :  46067  
##                     Mean   :   62100   Mean   :  65348  
##                     3rd Qu.:   80116   3rd Qu.:  85839  
##                     Max.   : 2050207   Max.   :1993323

An estimation of the Consumption Function for USA

Using OLS, the estimated coefficient for MPC is:

fit <-lm(datos$GAST~datos$ING)
# Global test of model assumptions
library(gvlma)
gvmodel <- gvlma(fit) 
summary(gvmodel)

## 
## Call:
## lm(formula = datos$GAST ~ datos$ING)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1467242   -33261   -18768    13226  1953851 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.282e+04  6.878e+02   33.18   <2e-16 ***
## datos$ING   6.011e-01  7.296e-03   82.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 79770 on 25891 degrees of freedom
## Multiple R-squared:  0.2077, Adjusted R-squared:  0.2076 
## F-statistic:  6786 on 1 and 25891 DF,  p-value: < 2.2e-16
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit) 
## 
##                        Value p-value                   Decision
## Global Stat        3205865.6       0 Assumptions NOT satisfied!
## Skewness             63321.8       0 Assumptions NOT satisfied!
## Kurtosis           3142206.1       0 Assumptions NOT satisfied!
## Link Function          156.3       0 Assumptions NOT satisfied!
## Heteroscedasticity     181.4       0 Assumptions NOT satisfied!

plot(gvmodel)

# Representación gráfica de los datos
plot(datos$ING,datos$GAST)
lines(datos$ING,fit$fitted,col=2)

Detect and remove outlier

library(car)
out <- outlierTest(fit)
quitar <- c(-as.numeric(names(out$p)))
for (n in quitar)
datos <- datos[n,]

Computing percentile rank in R. The coincidence into income and expenditure percentiles are only 537 of the 25.893 household, differences to less of 50 leveles are in 24045 household, the differences haven’t normality distribution.

## 
## Attaching package: 'gtools'

## The following object is masked from 'package:car':
## 
##     logit

## 
##  Jarque Bera Test
## 
## data:  datos$distancia
## X-squared = 800.46, df = 2, p-value < 2.2e-16

## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  datos$distancia
## D = 0.066002, p-value < 2.2e-16

## Warning in cvm.test(datos$distancia): p-value is smaller than 7.37e-10,
## cannot be computed more accurately

## 
##  Cramer-von Mises normality test
## 
## data:  datos$distancia
## W = 21.099, p-value = 7.37e-10

## 
##  Anderson-Darling normality test
## 
## data:  datos$distancia
## A = 109.79, p-value < 2.2e-16

By OLS, is estimated a consumption function, by data ordered in random percentiles.

## 
## Call:
## lm(formula = gaperc.perc ~ ingpch.perc)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10112.9  -3646.6    276.2   3792.2  10158.2 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.484e+04  7.095e+03   2.092    0.039 *  
## ingpch.perc 7.219e-01  1.083e-01   6.663 1.58e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4922 on 98 degrees of freedom
## Multiple R-squared:  0.3118, Adjusted R-squared:  0.3048 
## F-statistic: 44.39 on 1 and 98 DF,  p-value: 1.578e-09
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit.m) 
## 
##                      Value p-value                Decision
## Global Stat        2.86738  0.5803 Assumptions acceptable.
## Skewness           0.21218  0.6451 Assumptions acceptable.
## Kurtosis           2.62528  0.1052 Assumptions acceptable.
## Link Function      0.02175  0.8827 Assumptions acceptable.
## Heteroscedasticity 0.00816  0.9280 Assumptions acceptable.

## function (x, y, ...) 
## UseMethod("plot")
## <bytecode: 0x00000000136a33b0>
## <environment: namespace:graphics>

## Loading required package: taRifx

## 
##  Jarque Bera Test
## 
## data:  fit.m$residuals
## X-squared = 2.8375, df = 2, p-value = 0.242

## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  fit.m$residuals
## D = 0.054217, p-value = 0.6664

## 
##  Cramer-von Mises normality test
## 
## data:  fit.m$residuals
## W = 0.057777, p-value = 0.4012

## 
##  Anderson-Darling normality test
## 
## data:  fit.m$residuals
## A = 0.39815, p-value = 0.3601

## 
##  Shapiro-Francia normality test
## 
## data:  fit.m$residuals
## W = 0.9877, p-value = 0.4118

By OLS, is estimated a consumption function, by data ordered in income percentiles.

## 
## Call:
## lm(formula = gaperc.perc ~ ingpch.perc)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -20873  -4022   -945   2873  33059 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.101e+04  1.076e+03   19.53   <2e-16 ***
## ingpch.perc 6.274e-01  1.162e-02   54.00   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7621 on 98 degrees of freedom
## Multiple R-squared:  0.9675, Adjusted R-squared:  0.9672 
## F-statistic:  2916 on 1 and 98 DF,  p-value: < 2.2e-16
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit.i) 
## 
##                      Value   p-value                   Decision
## Global Stat        189.242 0.000e+00 Assumptions NOT satisfied!
## Skewness            33.521 7.050e-09 Assumptions NOT satisfied!
## Kurtosis           120.419 0.000e+00 Assumptions NOT satisfied!
## Link Function        9.364 2.213e-03 Assumptions NOT satisfied!
## Heteroscedasticity  25.938 3.526e-07 Assumptions NOT satisfied!

## 
##  Jarque Bera Test
## 
## data:  fit.i$residuals
## X-squared = 153.94, df = 2, p-value < 2.2e-16

## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  fit.i$residuals
## D = 0.12614, p-value = 0.0004788

## 
##  Cramer-von Mises normality test
## 
## data:  fit.i$residuals
## W = 0.52574, p-value = 1.517e-06

## 
##  Anderson-Darling normality test
## 
## data:  fit.i$residuals
## A = 3.1478, p-value = 6e-08

## 
##  Shapiro-Francia normality test
## 
## data:  fit.i$residuals
## W = 0.8649, p-value = 3.537e-07

By OLS, is estimated a consumption function, by data ordered in expenditure percentiles.

## 
## Call:
## lm(formula = gaperc.perc ~ ingpch.perc)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -48028 -21967  -7074   8603 172835 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -73171.158   7515.937  -9.735 4.54e-16 ***
## ingpch.perc      2.069      0.100  20.681  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37120 on 98 degrees of freedom
## Multiple R-squared:  0.8136, Adjusted R-squared:  0.8117 
## F-statistic: 427.7 on 1 and 98 DF,  p-value: < 2.2e-16
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit.g) 
## 
##                     Value   p-value                   Decision
## Global Stat        544.23 0.000e+00 Assumptions NOT satisfied!
## Skewness           110.74 0.000e+00 Assumptions NOT satisfied!
## Kurtosis           337.25 0.000e+00 Assumptions NOT satisfied!
## Link Function       57.93 2.709e-14 Assumptions NOT satisfied!
## Heteroscedasticity  38.32 6.019e-10 Assumptions NOT satisfied!

## 
##  Jarque Bera Test
## 
## data:  fit.g$residuals
## X-squared = 447.98, df = 2, p-value < 2.2e-16

## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  fit.g$residuals
## D = 0.16272, p-value = 7.165e-07

## 
##  Cramer-von Mises normality test
## 
## data:  fit.g$residuals
## W = 0.76911, p-value = 2.077e-08

## 
##  Anderson-Darling normality test
## 
## data:  fit.g$residuals
## A = 4.8411, p-value = 4.555e-12

## 
##  Shapiro-Francia normality test
## 
## data:  fit.g$residuals
## W = 0.7581, p-value = 4.965e-10

By OLS, is estimated a consumption function, bay data ordered expenditure percentiles 2 to 94.

## 
## Call:
## lm(formula = gaperc.perc ~ ingpch.perc)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -32502  -7215   2130   7221  27507 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.752e+04  3.351e+03  -11.20   <2e-16 ***
## ingpch.perc  1.394e+00  5.056e-02   27.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13890 on 91 degrees of freedom
## Multiple R-squared:  0.8931, Adjusted R-squared:  0.8919 
## F-statistic: 760.4 on 1 and 91 DF,  p-value: < 2.2e-16
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit.g) 
## 
##                      Value   p-value                   Decision
## Global Stat        41.9677 1.694e-08 Assumptions NOT satisfied!
## Skewness            1.3974 2.372e-01    Assumptions acceptable.
## Kurtosis            0.0921 7.615e-01    Assumptions acceptable.
## Link Function      34.2883 4.752e-09 Assumptions NOT satisfied!
## Heteroscedasticity  6.1899 1.285e-02 Assumptions NOT satisfied!

## function (x, y, ...) 
## UseMethod("plot")
## <bytecode: 0x00000000136a33b0>
## <environment: namespace:graphics>

## 
##  Jarque Bera Test
## 
## data:  fit.g$residuals
## X-squared = 1.4895, df = 2, p-value = 0.4748

## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  fit.g$residuals
## D = 0.1137, p-value = 0.004734

## 
##  Cramer-von Mises normality test
## 
## data:  fit.g$residuals
## W = 0.21984, p-value = 0.002937

## 
##  Anderson-Darling normality test
## 
## data:  fit.g$residuals
## A = 1.1256, p-value = 0.005731

## 
##  Shapiro-Francia normality test
## 
## data:  fit.g$residuals
## W = 0.97239, p-value = 0.04506

The Durbin test (1969) shows error term is correlated. In the normality test the Assumptions of linear regression are not satisfied.

Using RBS, is estimated a consumption function, with average household expenditure and income percentiles. The algorithm select the OLS model. Generalized linear models are implemented using “glm” function in R.

library(descomponer)
# Estimación de la regresión por bandas de frecuencia
y <- as.numeric(gaperc.perc)
x <- as.numeric(ingpch.perc)
res <- rdf(y,x,3)
# grafica de los residuos en el dominio frecuencial
gtd(res$datos$res,3)

# Representación gráfica de los datos
plot(ingpch.perc,gaperc.perc)
lines(ingpch.perc,res$datos$F,col=2)

# gráfica de normalidad de los residuos
hist(res$datos$res, freq=FALSE, 
   main="Distribución de los errores")

boxplot(res$datos$res)

# Estimación del modelo en MCO
fit.rbs <- lm(gaperc.perc ~ 0 + res$Tregresores)
summary(fit.rbs)

## 
## Call:
## lm(formula = gaperc.perc ~ 0 + res$Tregresores)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11148.5  -1924.8    -66.2   1787.6  14203.8 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC -1.261e+05  1.839e+04  -6.854 1.17e-09 ***
## res$Tregresores1  8.961e+00  3.584e-01  25.001  < 2e-16 ***
## res$Tregresores2  1.065e+00  9.540e-02  11.167  < 2e-16 ***
## res$Tregresores3 -2.080e+00  1.302e-01 -15.975  < 2e-16 ***
## res$Tregresores4  1.207e+00  8.045e-02  15.000  < 2e-16 ***
## res$Tregresores5  2.616e-01  1.283e-01   2.039 0.044667 *  
## res$Tregresores6  1.863e-01  8.745e-02   2.130 0.036131 *  
## res$Tregresores7 -9.375e-02  9.568e-02  -0.980 0.330016    
## res$Tregresores8  3.059e-01  7.939e-02   3.853 0.000229 ***
## res$Tregresores9  1.474e-01  7.937e-02   1.857 0.066851 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4164 on 83 degrees of freedom
## Multiple R-squared:  0.996,  Adjusted R-squared:  0.9955 
## F-statistic:  2070 on 10 and 83 DF,  p-value: < 2.2e-16

library(car)
outlierTest(fit.rbs)

##                     rstudent unadjusted p-value Bonferonni p
## (9.58e+04,9.94e+04] 3.853161         0.00023057     0.021443

gvmodel <- gvlma(fit.rbs) 
summary(gvmodel)

## 
## Call:
## lm(formula = gaperc.perc ~ 0 + res$Tregresores)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11148.5  -1924.8    -66.2   1787.6  14203.8 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC -1.261e+05  1.839e+04  -6.854 1.17e-09 ***
## res$Tregresores1  8.961e+00  3.584e-01  25.001  < 2e-16 ***
## res$Tregresores2  1.065e+00  9.540e-02  11.167  < 2e-16 ***
## res$Tregresores3 -2.080e+00  1.302e-01 -15.975  < 2e-16 ***
## res$Tregresores4  1.207e+00  8.045e-02  15.000  < 2e-16 ***
## res$Tregresores5  2.616e-01  1.283e-01   2.039 0.044667 *  
## res$Tregresores6  1.863e-01  8.745e-02   2.130 0.036131 *  
## res$Tregresores7 -9.375e-02  9.568e-02  -0.980 0.330016    
## res$Tregresores8  3.059e-01  7.939e-02   3.853 0.000229 ***
## res$Tregresores9  1.474e-01  7.937e-02   1.857 0.066851 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4164 on 83 degrees of freedom
## Multiple R-squared:  0.996,  Adjusted R-squared:  0.9955 
## F-statistic:  2070 on 10 and 83 DF,  p-value: < 2.2e-16
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit.rbs) 
## 
##                     Value   p-value                   Decision
## Global Stat        95.631 0.000e+00 Assumptions NOT satisfied!
## Skewness            5.974 1.452e-02 Assumptions NOT satisfied!
## Kurtosis           32.790 1.027e-08 Assumptions NOT satisfied!
## Link Function      13.071 2.999e-04 Assumptions NOT satisfied!
## Heteroscedasticity 43.796 3.645e-11 Assumptions NOT satisfied!

#plot(gvmodel)
# test normalidad de los errores
library(tseries)
jarque.bera.test(res$datos$res)

## 
##  Jarque Bera Test
## 
## data:  res$datos$res
## X-squared = 38.764, df = 2, p-value = 3.823e-09

library(nortest)
lillie.test(res$datos$res)

## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  res$datos$res
## D = 0.12572, p-value = 0.0009752

cvm.test(res$datos$res)

## 
##  Cramer-von Mises normality test
## 
## data:  res$datos$res
## W = 0.30116, p-value = 0.0003096

ad.test(res$datos$res)

## 
##  Anderson-Darling normality test
## 
## data:  res$datos$res
## A = 2.0386, p-value = 3.158e-05

sf.test(res$datos$res)

## 
##  Shapiro-Francia normality test
## 
## data:  res$datos$res
## W = 0.91555, p-value = 4.924e-05

#Estimacion modelo glm (Gaussian)
gfit1 <- glm(gaperc.perc ~ 0 + res$Tregresores,family=gaussian)
summary(gfit1)

## 
## Call:
## glm(formula = gaperc.perc ~ 0 + res$Tregresores, family = gaussian)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -11148.5   -1924.8     -66.2    1787.6   14203.8  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC -1.261e+05  1.839e+04  -6.854 1.17e-09 ***
## res$Tregresores1  8.961e+00  3.584e-01  25.001  < 2e-16 ***
## res$Tregresores2  1.065e+00  9.540e-02  11.167  < 2e-16 ***
## res$Tregresores3 -2.080e+00  1.302e-01 -15.975  < 2e-16 ***
## res$Tregresores4  1.207e+00  8.045e-02  15.000  < 2e-16 ***
## res$Tregresores5  2.616e-01  1.283e-01   2.039 0.044667 *  
## res$Tregresores6  1.863e-01  8.745e-02   2.130 0.036131 *  
## res$Tregresores7 -9.375e-02  9.568e-02  -0.980 0.330016    
## res$Tregresores8  3.059e-01  7.939e-02   3.853 0.000229 ***
## res$Tregresores9  1.474e-01  7.937e-02   1.857 0.066851 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 17340838)
## 
##     Null deviance: 3.6032e+11  on 93  degrees of freedom
## Residual deviance: 1.4393e+09  on 83  degrees of freedom
## AIC: 1825.5
## 
## Number of Fisher Scoring iterations: 2

par(mfcol = c(2, 2))
plot(gfit1)

#Estimacion modelo glm (Gaussian)
gfit2 <- glm(gaperc.perc ~ 0 + res$Tregresores,family=gaussian(link="log"))
summary(gfit2)

## 
## Call:
## glm(formula = gaperc.perc ~ 0 + res$Tregresores, family = gaussian(link = "log"))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -17576.7   -3565.0    -117.4    2475.0   28909.2  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC  9.055e+01  1.576e+00  57.462  < 2e-16 ***
## res$Tregresores1  1.214e-04  3.049e-05   3.981 0.000147 ***
## res$Tregresores2 -1.748e-05  8.597e-06  -2.034 0.045196 *  
## res$Tregresores3 -1.094e-04  1.709e-05  -6.402 8.66e-09 ***
## res$Tregresores4  5.175e-05  1.083e-05   4.777 7.57e-06 ***
## res$Tregresores5 -1.294e-05  1.180e-05  -1.097 0.275816    
## res$Tregresores6  1.561e-05  9.563e-06   1.632 0.106442    
## res$Tregresores7  1.127e-05  7.457e-06   1.511 0.134541    
## res$Tregresores8 -1.579e-06  4.123e-06  -0.383 0.702774    
## res$Tregresores9  1.040e-05  4.625e-06   2.249 0.027147 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 58198410)
## 
##     Null deviance: 3.6032e+11  on 93  degrees of freedom
## Residual deviance: 4.8302e+09  on 83  degrees of freedom
## AIC: 1938.1
## 
## Number of Fisher Scoring iterations: 13

par(mfcol = c(2, 2))
plot(gfit2)

#Estimacion modelo glm (Gamma)
#gfit3 <- glm(gaperc.perc ~ 0 + res$Tregresores,family=Gamma)
#summary(gfit3)
#par(mfcol = c(2, 2))
#plot(gfit3)
#Estimacion modelo glm (Gamma)
#gfit4 <- glm(gaperc.perc ~ 0 + res$Tregresores,family=Gamma(link="identity"))
#summary(gfit4)
#par(mfcol = c(2, 2))
#plot(gfit4)
# Durbin de los errores de Gamma
#par(mfcol = c(1, 1))
#gtd(gfit4$residuals,3)

We make a data set with average propensity to consume (PMeC) and marginal propensity to consume (PMgC) calculated for Group, the PMec are obtained by \(\frac{\hat c_s}{y_s}\), the PMgC in OLS model is \(\hat b\), the PMgC in RBS model is \(\frac {\hat c_s - \hat a x_0}{y_s}\) and the tendency lines is obtained by \(\frac {\hat b x_1}{y_s}\). Being \(x_0\) and \(x_1\) the two first regressor of RBS.

# Obtención de las propensiones medias al consumo por percentiles
PMeC <- data.frame(percentil=seq(2,94,by=1),observado=gaperc.perc/ingpch.perc,estimado_MCO=lm(gaperc.perc~ingpch.perc)$fitted/ingpch.perc,estimado_RBS=lm(gaperc.perc~res$Tregresores)$fitted/ingpch.perc)
#PMeC
# Obtención de las propensiones marginales al consumo por percentiles
PMgC <- data.frame(percentil=seq(2,94,by=1),estimado_MCO=rep(lm(gaperc.perc~ingpch.perc)$coefficients[2],93), estimado_RBS=(res$datos$F-(res$Betas[1]*res$Tregresores[1]))/ingpch.perc,estimado_RBS_T=(res$Betas[2]*res$Tregresores[,2])/ingpch.perc)
#PMgC
# Obtención de las propensiones marginales al consumo por percentiles con glm
PMgC_glm <- data.frame(percentil=seq(2,94,by=1),estimado_MCO=rep(lm(gaperc.perc~ingpch.perc)$coefficients[2],93), estimado_RBS=(gfit1$fitted.values-(gfit1$coefficients[1]*res$Tregresores[1]))/ingpch.perc,estimado_RBS_T=(gfit1$coefficients[2]*res$Tregresores[,2])/ingpch.perc)
#PMgC_glm

# gráficos PMeC
plot(PMeC$percentil,PMeC$observado,ylim=c(-0.5,1.5))
lines(PMeC$percentil,PMeC$estimado_MCO,col=2)
lines(PMeC$percentil,PMeC$estimado_RBS,col=3)
legend("top", ncol=2,c("MCO","RBS"),cex=0.6,bty="n",fill=c(2,3))

# gráficos PMgC
plot(PMgC$percentil,PMgC$estimado_RBS,type="l",ylim=c(0,1.5),col=1)
lines(PMgC$percentil,PMgC$estimado_RBS_T,type="l",col=2)
lines(PMgC$percentil,PMgC$estimado_MCO,type="l",col=3)
legend("top", ncol=3,c("RBS","RBS_T","MCO"),cex=0.6,bty="n",fill=c(1,2,3))

# gráficos PMgC_glm
plot(PMgC_glm$percentil,PMgC_glm$estimado_RBS,type="l",ylim=c(0,1.5),col=1)
lines(PMgC_glm$percentil,PMgC_glm$estimado_RBS_T,col=2)
lines(PMgC_glm$percentil,PMgC_glm$estimado_MCO,col=3)
legend("top", ncol=3,c("RBS","RBS_T","MCO"),cex=0.6,bty="n",fill=c(1,2,3))

By OLS, is estimated a consumption function, by data ordered by income percentiles in households with differences into income and expenditure percentiles less 50.

## 
## Call:
## lm(formula = datos1$GAST ~ datos1$ING)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -929942  -28855  -13408   11883 1188448 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.327e+04  6.381e+02    20.8   <2e-16 ***
## datos1$ING  7.058e-01  6.736e-03   104.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 70910 on 24043 degrees of freedom
## Multiple R-squared:  0.3135, Adjusted R-squared:  0.3135 
## F-statistic: 1.098e+04 on 1 and 24043 DF,  p-value: < 2.2e-16
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit) 
## 
##                        Value p-value                   Decision
## Global Stat        1978787.9       0 Assumptions NOT satisfied!
## Skewness             83263.2       0 Assumptions NOT satisfied!
## Kurtosis           1894860.5       0 Assumptions NOT satisfied!
## Link Function          546.7       0 Assumptions NOT satisfied!
## Heteroscedasticity     117.5       0 Assumptions NOT satisfied!

Detect and remove outlier

library(car)
out <- outlierTest(fit)
quitar <- c(-as.numeric(names(out$p)))
for (n in quitar)
datos1 <- datos1[n,]

ingpch.perc <- tapply(datos1$ING,datos1$perc.i,mean)
gaperc.perc <- tapply(datos1$GAST,datos1$perc.i,mean)
fit.i.g <-lm(gaperc.perc~ingpch.perc)
# Global test of model assumptions
library(gvlma)
gvmodel <- gvlma(fit.i.g) 
summary(gvmodel)

## 
## Call:
## lm(formula = gaperc.perc ~ ingpch.perc)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -57783  -4702  -1282   4804  47288 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.090e+04  1.449e+03    7.52 2.64e-11 ***
## ingpch.perc 7.404e-01  1.564e-02   47.33  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10270 on 98 degrees of freedom
## Multiple R-squared:  0.9581, Adjusted R-squared:  0.9576 
## F-statistic:  2240 on 1 and 98 DF,  p-value: < 2.2e-16
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit.i.g) 
## 
##                      Value   p-value                   Decision
## Global Stat        878.285 0.000e+00 Assumptions NOT satisfied!
## Skewness             6.587 1.027e-02 Assumptions NOT satisfied!
## Kurtosis           738.757 0.000e+00 Assumptions NOT satisfied!
## Link Function       58.064 2.542e-14 Assumptions NOT satisfied!
## Heteroscedasticity  74.878 0.000e+00 Assumptions NOT satisfied!

plot(gvmodel)

# test Durbin sobre los residuos
library(descomponer)
gtd(fit.i.g$residuals,3)

# gráfica de normalidad de los residuos
hist(fit.g$residuals, freq=FALSE, 
   main="Distribución de los errores")

# test normalidad de los errores
library(tseries)
jarque.bera.test(fit.i.g$residuals)

## 
##  Jarque Bera Test
## 
## data:  fit.i.g$residuals
## X-squared = 745.34, df = 2, p-value < 2.2e-16

library(nortest)
lillie.test(fit.i.g$residuals)

## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  fit.i.g$residuals
## D = 0.18471, p-value = 5.666e-09

cvm.test(fit.i.g$residuals)

## 
##  Cramer-von Mises normality test
## 
## data:  fit.i.g$residuals
## W = 0.74499, p-value = 2.967e-08

ad.test(fit.i.g$residuals)

## 
##  Anderson-Darling normality test
## 
## data:  fit.i.g$residuals
## A = 4.7787, p-value = 6.449e-12

sf.test(fit.i.g$residuals)

## 
##  Shapiro-Francia normality test
## 
## data:  fit.i.g$residuals
## W = 0.74828, p-value = 3.023e-10

# Representación gráfica de los datos
plot(ingpch.perc,gaperc.perc)
lines(ingpch.perc,fit.i.g$fitted,col=2)

The Durbin test (1969) shows error term is uncorrelated. In the normality test the Assumptions of linear regression are not satisfied.

library(descomponer)
# Estimación de la regresión por bandas de frecuencia
y <- as.numeric(gaperc.perc)[1:100]
x <- as.numeric(ingpch.perc)[1:100]
res <- rdf(y,x,3)
# grafica de los residuos en el dominio frecuencial
gtd(res$datos$res,3)

# Representación gráfica de los datos
plot(ingpch.perc[1:100],gaperc.perc[1:100])
lines(ingpch.perc[1:100],res$datos$F,col=2)

# gráfica de normalidad de los residuos
hist(res$datos$res, freq=FALSE, 
   main="Distribución de los errores")

boxplot(res$datos$res)

# Estimación del modelo en MCO
fit.rbs <- lm(gaperc.perc[1:100] ~ 0 + res$Tregresores)
summary(fit.rbs)

## 
## Call:
## lm(formula = gaperc.perc[1:100] ~ 0 + res$Tregresores)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -15804  -3730    -60   2641  32104 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC  1.176e+05  2.018e+04   5.825 8.74e-08 ***
## res$Tregresores1  6.029e+00  5.383e-01  11.200  < 2e-16 ***
## res$Tregresores2 -1.097e+00  1.931e-01  -5.681 1.63e-07 ***
## res$Tregresores3 -1.616e+00  3.656e-01  -4.421 2.74e-05 ***
## res$Tregresores4 -3.106e-02  2.038e-01  -0.152 0.879219    
## res$Tregresores5 -1.156e+00  2.606e-01  -4.436 2.58e-05 ***
## res$Tregresores6  1.568e-01  2.085e-01   0.752 0.453921    
## res$Tregresores7 -9.147e-01  1.813e-01  -5.046 2.34e-06 ***
## res$Tregresores8  5.765e-01  1.493e-01   3.862 0.000212 ***
## res$Tregresores9 -2.198e-01  1.318e-01  -1.668 0.098693 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6320 on 90 degrees of freedom
## Multiple R-squared:  0.994,  Adjusted R-squared:  0.9933 
## F-statistic:  1489 on 10 and 90 DF,  p-value: < 2.2e-16

library(car)
outlierTest(fit.rbs)

##                     rstudent unadjusted p-value Bonferonni p
## (1.85e+05,2.03e+05] 6.746942         1.4760e-09   1.4760e-07
## (2.11e+05,2.31e+05] 4.115547         8.6093e-05   8.6093e-03

gvmodel <- gvlma(fit.rbs) 
summary(gvmodel)

## 
## Call:
## lm(formula = gaperc.perc[1:100] ~ 0 + res$Tregresores)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -15804  -3730    -60   2641  32104 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC  1.176e+05  2.018e+04   5.825 8.74e-08 ***
## res$Tregresores1  6.029e+00  5.383e-01  11.200  < 2e-16 ***
## res$Tregresores2 -1.097e+00  1.931e-01  -5.681 1.63e-07 ***
## res$Tregresores3 -1.616e+00  3.656e-01  -4.421 2.74e-05 ***
## res$Tregresores4 -3.106e-02  2.038e-01  -0.152 0.879219    
## res$Tregresores5 -1.156e+00  2.606e-01  -4.436 2.58e-05 ***
## res$Tregresores6  1.568e-01  2.085e-01   0.752 0.453921    
## res$Tregresores7 -9.147e-01  1.813e-01  -5.046 2.34e-06 ***
## res$Tregresores8  5.765e-01  1.493e-01   3.862 0.000212 ***
## res$Tregresores9 -2.198e-01  1.318e-01  -1.668 0.098693 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6320 on 90 degrees of freedom
## Multiple R-squared:  0.994,  Adjusted R-squared:  0.9933 
## F-statistic:  1489 on 10 and 90 DF,  p-value: < 2.2e-16
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit.rbs) 
## 
##                        Value   p-value                   Decision
## Global Stat        392.63319 0.000e+00 Assumptions NOT satisfied!
## Skewness            48.41986 3.441e-12 Assumptions NOT satisfied!
## Kurtosis           291.34057 0.000e+00 Assumptions NOT satisfied!
## Link Function        0.07359 7.862e-01    Assumptions acceptable.
## Heteroscedasticity  52.79917 3.695e-13 Assumptions NOT satisfied!

#plot(gvmodel)
# test normalidad de los errores
library(tseries)
jarque.bera.test(res$datos$res)

## 
##  Jarque Bera Test
## 
## data:  res$datos$res
## X-squared = 339.76, df = 2, p-value < 2.2e-16

library(nortest)
lillie.test(res$datos$res)

## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  res$datos$res
## D = 0.10421, p-value = 0.009368

cvm.test(res$datos$res)

## 
##  Cramer-von Mises normality test
## 
## data:  res$datos$res
## W = 0.27971, p-value = 0.0005527

ad.test(res$datos$res)

## 
##  Anderson-Darling normality test
## 
## data:  res$datos$res
## A = 2.0698, p-value = 2.664e-05

sf.test(res$datos$res)

## 
##  Shapiro-Francia normality test
## 
## data:  res$datos$res
## W = 0.85376, p-value = 1.561e-07

#Estimacion modelo glm (Gaussian)
gfit1 <- glm(gaperc.perc[1:100] ~ 0 + res$Tregresores,family=gaussian)
summary(gfit1)

## 
## Call:
## glm(formula = gaperc.perc[1:100] ~ 0 + res$Tregresores, family = gaussian)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -15804   -3730     -60    2641   32104  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC  1.176e+05  2.018e+04   5.825 8.74e-08 ***
## res$Tregresores1  6.029e+00  5.383e-01  11.200  < 2e-16 ***
## res$Tregresores2 -1.097e+00  1.931e-01  -5.681 1.63e-07 ***
## res$Tregresores3 -1.616e+00  3.656e-01  -4.421 2.74e-05 ***
## res$Tregresores4 -3.106e-02  2.038e-01  -0.152 0.879219    
## res$Tregresores5 -1.156e+00  2.606e-01  -4.436 2.58e-05 ***
## res$Tregresores6  1.568e-01  2.085e-01   0.752 0.453921    
## res$Tregresores7 -9.147e-01  1.813e-01  -5.046 2.34e-06 ***
## res$Tregresores8  5.765e-01  1.493e-01   3.862 0.000212 ***
## res$Tregresores9 -2.198e-01  1.318e-01  -1.668 0.098693 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 39942248)
## 
##     Null deviance: 5.9831e+11  on 100  degrees of freedom
## Residual deviance: 3.5948e+09  on  90  degrees of freedom
## AIC: 2045.5
## 
## Number of Fisher Scoring iterations: 2

par(mfcol = c(2, 2))
plot(gfit1)

#Estimacion modelo glm (Gaussian)
gfit2 <- glm(gaperc.perc[1:100] ~ 0 + res$Tregresores,family=gaussian(link="log"))
summary(gfit2)

## 
## Call:
## glm(formula = gaperc.perc[1:100] ~ 0 + res$Tregresores, family = gaussian(link = "log"))
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -16316   -3779    -293    3868   20299  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC  9.817e+01  9.058e-01 108.383  < 2e-16 ***
## res$Tregresores1  9.030e-05  1.866e-05   4.839 5.38e-06 ***
## res$Tregresores2 -8.648e-05  8.386e-06 -10.313  < 2e-16 ***
## res$Tregresores3 -7.130e-05  1.293e-05  -5.516 3.29e-07 ***
## res$Tregresores4  6.703e-06  8.667e-06   0.773 0.441289    
## res$Tregresores5 -6.664e-05  8.099e-06  -8.228 1.38e-12 ***
## res$Tregresores6  3.144e-05  6.506e-06   4.833 5.51e-06 ***
## res$Tregresores7 -2.352e-05  5.375e-06  -4.376 3.24e-05 ***
## res$Tregresores8  1.666e-05  2.647e-06   6.294 1.10e-08 ***
## res$Tregresores9  1.131e-05  3.235e-06   3.497 0.000732 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 44588137)
## 
##     Null deviance: 5.983e+11  on 100  degrees of freedom
## Residual deviance: 4.013e+09  on  90  degrees of freedom
## AIC: 2056.6
## 
## Number of Fisher Scoring iterations: 7

par(mfcol = c(2, 2))
plot(gfit2)

#Estimacion modelo glm (Gamma)
gfit3 <- glm(gaperc.perc[1:100] ~ 0 + res$Tregresores,family=Gamma)
summary(gfit3)

## 
## Call:
## glm(formula = gaperc.perc[1:100] ~ 0 + res$Tregresores, family = Gamma)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.53803  -0.09148  -0.00753   0.09204   0.98782  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC  6.544e-04  3.691e-05  17.731  < 2e-16 ***
## res$Tregresores1 -5.622e-09  7.713e-10  -7.288 1.16e-10 ***
## res$Tregresores2  3.570e-09  2.960e-10  12.060  < 2e-16 ***
## res$Tregresores3  1.012e-10  4.978e-10   0.203   0.8393    
## res$Tregresores4  7.052e-10  3.255e-10   2.166   0.0329 *  
## res$Tregresores5  1.755e-09  2.809e-10   6.248 1.36e-08 ***
## res$Tregresores6 -6.010e-10  2.341e-10  -2.567   0.0119 *  
## res$Tregresores7  8.607e-10  1.929e-10   4.462 2.34e-05 ***
## res$Tregresores8 -4.315e-10  9.369e-11  -4.606 1.35e-05 ***
## res$Tregresores9 -1.868e-10  1.180e-10  -1.583   0.1169    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.05556184)
## 
##     Null deviance:    NaN  on 100  degrees of freedom
## Residual deviance: 4.3768  on  90  degrees of freedom
## AIC: 2125.2
## 
## Number of Fisher Scoring iterations: 5

par(mfcol = c(2, 2))
plot(gfit3)

## Warning in sqrt(crit * p * (1 - hh)/hh): Se han producido NaNs

## Warning in sqrt(crit * p * (1 - hh)/hh): Se han producido NaNs

#Estimacion modelo glm (Gamma)
gfit4 <- glm(gaperc.perc[1:100] ~ 0 + res$Tregresores,family=Gamma(link="identity"))
summary(gfit4)

## 
## Call:
## glm(formula = gaperc.perc[1:100] ~ 0 + res$Tregresores, family = Gamma(link = "identity"))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.20855  -0.07311  -0.00298   0.05615   0.32860  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## res$TregresoresC  8.914e+04  4.025e+03  22.146  < 2e-16 ***
## res$Tregresores1  6.995e+00  1.613e-01  43.374  < 2e-16 ***
## res$Tregresores2 -8.208e-01  1.096e-01  -7.489 4.54e-11 ***
## res$Tregresores3 -9.855e-01  1.207e-01  -8.164 1.87e-12 ***
## res$Tregresores4 -1.255e-01  1.016e-01  -1.236  0.21972    
## res$Tregresores5 -6.359e-01  1.130e-01  -5.625 2.07e-07 ***
## res$Tregresores6 -2.124e-01  9.870e-02  -2.152  0.03410 *  
## res$Tregresores7 -5.852e-01  1.090e-01  -5.370 6.11e-07 ***
## res$Tregresores8  9.435e-02  9.794e-02   0.963  0.33795    
## res$Tregresores9 -3.269e-01  1.060e-01  -3.085  0.00271 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.009568845)
## 
##     Null deviance:     NaN  on 100  degrees of freedom
## Residual deviance: 0.84015  on  90  degrees of freedom
## AIC: 1959.6
## 
## Number of Fisher Scoring iterations: 7

par(mfcol = c(2, 2))
plot(gfit4)

# Durbin de los errores de Gamma
par(mfcol = c(1, 1))
gtd(gfit4$residuals,3)

GLM produces poor results than OLS.

# Obtención de las propensiones medias al consumo por percentiles
PMeC <- data.frame(percentil=seq(1,100,by=1),observado=gaperc.perc[1:100]/ingpch.perc[1:100],estimado_MCO=lm(gaperc.perc[1:100]~ingpch.perc[1:100])$fitted/ingpch.perc[1:100],estimado_RBS=lm(gaperc.perc[1:100]~res$Tregresores)$fitted/ingpch.perc[1:100])
#PMeC

# Obtención de las propensiones marginales al consumo por percentiles
PMgC <- data.frame(percentil=seq(1,100,by=1),estimado_MCO=rep(lm(gaperc.perc[1:100]~ingpch.perc[1:100])$coefficients[2],100), estimado_RBS=(res$datos$F-(res$Betas[1]*res$Tregresores[1]))/ingpch.perc[1:100],estimado_RBS_T=(res$Betas[2]*res$Tregresores[,2])/ingpch.perc[1:100])
#PMgC
# Obtención de las propensiones marginales al consumo por percentiles con glm
PMgC_glm <- data.frame(percentil=seq(1,100,by=1),estimado_MCO=rep(lm(gaperc.perc[1:100]~ingpch.perc[1:100])$coefficients[2],100), estimado_RBS=(gfit4$fitted.values-(gfit4$coefficients[1]*res$Tregresores[1]))/ingpch.perc[1:100],estimado_RBS_T=(gfit4$coefficients[2]*res$Tregresores[,2])/ingpch.perc[1:100])
#PMgC_glm

# gráficos PMeC
plot(PMeC$percentil,PMeC$observado,ylim=c(-0.5,1.5))
lines(PMeC$percentil,PMeC$estimado_MCO,col=2)
lines(PMeC$percentil,PMeC$estimado_RBS,col=3)
legend("top", ncol=2,c("MCO","RBS"),cex=0.6,bty="n",fill=c(2,3))

# gráficos PMgC
plot(PMgC$percentil,PMgC$estimado_RBS,type="l",ylim=c(0,1.5),col=1)
lines(PMgC$percentil,PMgC$estimado_RBS_T,type="l",col=2)
lines(PMgC$percentil,PMgC$estimado_MCO,type="l",col=3)
legend("top", ncol=3,c("RBS","RBS_T","MCO"),cex=0.6,bty="n",fill=c(1,2,3))

# gráficos PMgC_glm
plot(PMgC_glm$percentil,PMgC_glm$estimado_RBS,type="l",ylim=c(0,1.5),col=1)
lines(PMgC_glm$percentil,PMgC_glm$estimado_RBS_T,col=2)
lines(PMgC_glm$percentil,PMgC_glm$estimado_MCO,col=3)
legend("top", ncol=3,c("RBS","RBS_T","MCO"),cex=0.6,bty="n",fill=c(1,2,3))

Conclusions

Bunting ( 1989) think that the states that the definition of the variables of income and expenses, including consideration or not in per capita terms , the form to data are organized by income interval, highly aggregated, determine the different MPC in consumption function. Usually , the MPC based on Keynesian consumption functions , the data are grouped at reasonable intervals income , established based on knowledge of the researcher.If I use intervals based in random percentiles, the PMC estimated by OLS is similar to individual data. But if I use intervals based in income percentiles, expenditure percentiles, or households whit income-expenditure percentiles differences less 50 (24.045 househols), the PMC estimated are different. The grouping form is informatión that incorpored to model, and the results should be interpreted based on the grouping. The best significance we find when we use the data grouping in percentiles of income or expenditure determines the social classes relevance on the consumption function.

By OLS and data grouping by percentils, we found normality problems in the error term : dependency between the errors obtained and the class ( percentile ) , unconstant variance, and extreme values , so the approach of grouping households income percentils in OLS , do not give satisfactory results from the point of view of gaussianas. Estimation made with RBS and GLM could be a solution to these problems in error terms.

The MPC estimated by OLS , RSB and GLM by income, expenditure, and less 50 are:

Ordened by	MCO	MCO-p	RBS	RBS-GLM	\(\frac {\bar C}{\bar Y}\)
Income	0.60	0.63	0.63	0.60	0.82
Expenditure	0.60	1.39	0.92	0.92	0.82
Income less 50	0.70	0.74	0.61	0.70	0.91

Mean the a-theoretical behavior of low-income households is the estimate of less than 50. That a-theorical behavior are observate in Argentina, Mexico and Spain estimation of PMC (Parra, F, 2016)

Fuente: Elaboración Propia

Bibliografía

Brewer, M., and O’Dea, C. (2012). Measuring living standards with income and consumption: evidence from the UK, retrieved from Institute for Social and Economic Research (ISER)-University of Essex: https://www.iser.essex.ac.uk/publications/working-papers/iser/2012-05.pdf

Bunting, D., 1989.“The compsumption function paradox” Journal of Post Keynesian Economics. vol 11, nº 3, 1989, pp. 347-359.

Bunting, D., 2001.“Keynes Law and Its Critics” Journal of Post Keynesian Economics vol. 24, nº 1, 2001, pp. 149- 163.

DURBIN, J., “Tests for Serial Correlation in Regression Analysis based on the Periodogram ofLeast-Squares Residuals,” Biometrika, 56, (No. 1, 1969), 1-15.

Engle, Robert F. (1974), Band Spectrum Regression,International Economic Review 15,1-11.

Friedman, M., 1957. A theory of the consumption function (Princeton University Press, Princeton, NJ).

Friedman, M. and Kuznets, S., 1945. Income from independent professional practice (National Bureau of Economic Research, NY).

Harvey, A.C. (1978), Linear Regression in the Frequency Domain, International Economic Review, 19, 507-512.

Keynes, J. M., 1936. The general theory of employment, interest and money (Harcourt, Brace & World, NY).

Kuznets, S., 1942. Uses of national income in peace and war, Occasional paper 6 (National Bureau of Economic Research, NY).

Modigliani, F. and Brumberg, R., 1954. “Utility analysis and the consumption function: an interpretation of cross-sectional data” in Kurihara, K. K. (ed.) Post Keynesian economics (Rutgers University Press, New Brunswick, NJ) 388-436.

Noll, H-H. (2007).Household consumption, household incomes and living standards-a review of related recent research activities. Retrieved from GESIS-Leibniz Institute for the Social Sciences: http://www.gesis.org/fileadmin/upload/institut/wiss_arbeitsbereiche/soz_indikatoren/Publikationen/Household-Expenditures-Research-Report.pdf

Parra F (2015): Seasonal Adjustment by Frequency Analysis. Package R Version 1.2. https://cran.r-project.org/web/packages/descomponer/index.html

Parra F (2016a):Propensión marginal al consumo de Argentina: https://rpubs.com/PacoParra/136937

Parra F (2016b): Propensión marginal al consumo de España: https://rpubs.com/PacoParra/164650

Parra F (2016c): Propensión marginal al consumo de México: https://rpubs.com/PacoParra/139911

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS (2016). Measurement challenges in consumption and income poverty. Working paper 4. Seminar on poverty measurement 12-13 July 2016, Geneva, Switzerland http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.15/2016/Sem/WP4_Chapter_2_ENG.pdf

U.S. Department of Labor, Bureau of Labor Statistics, Consumer Expenditure Survey, Interview Survey, 2014