Correr solo las lineas que sean necesarias, de paquetes que no se tienen instalados
Obtener indicador endógeno(% de poblacion rural) y los factores que la pueden explicar (% de fertilidad,area y PIB per capita) Indicadoeres recuperados del siguiente link (https://data.worldbank.org/indicator)
gdp_data <- wb_data(country = c("IT","PY","SG","UG"), indicator = c("SP.RUR.TOTL.ZS","SP.DYN.TFRT.IN","AG.SRF.TOTL.K2","NY.GDP.PCAP.CD"), start_date = 1961, end_date = 2021)
datatable(gdp_data, style = "bootstrap5", options = list(scrollX = TRUE))panel <- gdp_data %>%
select("country","date","rurp"="SP.RUR.TOTL.ZS","fert_rate"="SP.DYN.TFRT.IN","area"="AG.SRF.TOTL.K2","gdp_pc"="NY.GDP.PCAP.CD")
panel <- subset(panel, date == 1970 | date == 1980 | date == 1990 | date == 2000 | date == 2010 | date == 2020)
panel <- pdata.frame(panel, index = c("country","date"))
datatable(panel, style = "bootstrap5", options = list(scrollX = TRUE))¿Las lineas de une los promedios horizontales, son planas o tienen muchos picos?
No se forma ningun pico, las lineas forman una linea casi reacta con un poco de tendencia de decrecimiento.
¿Los intervalos de confianza miden lo mismo o estan desfasados?
Los intervalos miden a la vista se ven relativamente similares, sin embargo su muestran un pequeño desfase, siendo mas cortos conforme pasa el tiempo.
¿Las lineas de une los promedios horizontales, son rectas o tienen muchos picos?
Las lineas que unen los promedios estan muy desfasadas, con picos muy notorios entre pais y pais.
¿Los intervalos de confianza miden lo mismo o estan desfasados?
Los intervalos de confianza son imperceptibles para 2 de los paises, Sin embargo paro otros 2 de ellos si se ve un rango mayor. No se puede observar que exista sobrepocision entre ninguno ni similitudes notoros entre los intervalos.
Heterogeneidad
La heterogeneidad habla de la variabilidad que puede existir entre los datos.Tenemos ambos casos en nuestro set de datos, una en la que las medias se sobreponen y otro en el que hay mucha variabilidad generando picos.Para el caso del estudio, es mas deseable que exista variabilidad, pues la heterogeneidad permite tener un estudio mas profundo.
## Pooling Model
##
## Call:
## plm(formula = rurp ~ fert_rate + area + gdp_pc, data = panel,
## model = "pooling")
##
## Balanced Panel: n = 4, T = 6, N = 24
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -18.0589 -4.7412 1.4100 6.8509 20.2304
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -2.4210e+01 7.8503e+00 -3.0840 0.005853 **
## fert_rate 1.2726e+01 1.3228e+00 9.6202 6.042e-09 ***
## area 7.8920e-05 1.5942e-05 4.9504 7.704e-05 ***
## gdp_pc 2.8415e-04 1.8302e-04 1.5525 0.136217
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 23788
## Residual Sum of Squares: 2212.4
## R-Squared: 0.90699
## Adj. R-Squared: 0.89304
## F-statistic: 65.012 on 3 and 20 DF, p-value: 1.7148e-10
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = rurp ~ fert_rate + area + gdp_pc, data = panel,
## model = "within")
##
## Balanced Panel: n = 4, T = 6, N = 24
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -5.31851 -1.37320 0.27169 1.55007 3.47691
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## fert_rate 6.8525e+00 6.6347e-01 10.3282 9.624e-09 ***
## area -3.7391e-03 3.0484e-03 -1.2266 0.236700
## gdp_pc 1.5347e-04 4.7794e-05 3.2110 0.005124 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 776.43
## Residual Sum of Squares: 104.13
## R-Squared: 0.86588
## Adj. R-Squared: 0.81855
## F-statistic: 36.5845 on 3 and 17 DF, p-value: 1.2345e-07
##
## F test for individual effects
##
## data: rurp ~ fert_rate + area + gdp_pc
## F = 114.73, df1 = 3, df2 = 17, p-value = 1.755e-11
## alternative hypothesis: significant effects
walhus <- plm(rurp ~ fert_rate + area + gdp_pc, data = panel, model = "random", random.method = "walhus")
summary(walhus)## Oneway (individual) effect Random Effect Model
## (Wallace-Hussain's transformation)
##
## Call:
## plm(formula = rurp ~ fert_rate + area + gdp_pc, data = panel,
## model = "random", random.method = "walhus")
##
## Balanced Panel: n = 4, T = 6, N = 24
##
## Effects:
## var std.dev share
## idiosyncratic 29.888 5.467 0.324
## individual 62.297 7.893 0.676
## theta: 0.7279
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -9.72448 -2.70135 -0.60043 2.55608 9.86784
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) -1.1471e+01 7.4143e+00 -1.5472 0.1218217
## fert_rate 8.6915e+00 1.0461e+00 8.3085 < 2.2e-16 ***
## area 9.0448e-05 2.3256e-05 3.8892 0.0001006 ***
## gdp_pc 1.5669e-04 8.0162e-05 1.9546 0.0506285 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 2480.2
## Residual Sum of Squares: 411.26
## R-Squared: 0.83418
## Adj. R-Squared: 0.80931
## Chisq: 100.615 on 3 DF, p-value: < 2.22e-16
amemiya <- plm(rurp ~ fert_rate + area + gdp_pc, data = panel, model = "random", random.method = "amemiya")
summary(amemiya)## Oneway (individual) effect Random Effect Model
## (Amemiya's transformation)
##
## Call:
## plm(formula = rurp ~ fert_rate + area + gdp_pc, data = panel,
## model = "random", random.method = "amemiya")
##
## Balanced Panel: n = 4, T = 6, N = 24
##
## Effects:
## var std.dev share
## idiosyncratic 5.207e+00 2.282e+00 0
## individual 3.274e+05 5.722e+02 1
## theta: 0.9984
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -6.14005 -1.05730 0.50996 1.77575 2.74267
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 2.8343e+02 4.8769e+02 0.5812 0.561123
## fert_rate 6.8387e+00 6.3073e-01 10.8425 < 2.2e-16 ***
## area -1.1224e-03 1.6344e-03 -0.6867 0.492254
## gdp_pc 1.3732e-04 4.2976e-05 3.1952 0.001397 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 776.49
## Residual Sum of Squares: 110.76
## R-Squared: 0.85735
## Adj. R-Squared: 0.83596
## Chisq: 120.206 on 3 DF, p-value: < 2.22e-16
nerlove <- plm(rurp ~ fert_rate + area + gdp_pc, data = panel, model = "random", random.method = "nerlove")
summary(nerlove)## Oneway (individual) effect Random Effect Model
## (Nerlove's transformation)
##
## Call:
## plm(formula = rurp ~ fert_rate + area + gdp_pc, data = panel,
## model = "random", random.method = "nerlove")
##
## Balanced Panel: n = 4, T = 6, N = 24
##
## Effects:
## var std.dev share
## idiosyncratic 4.339e+00 2.083e+00 0
## individual 4.365e+05 6.607e+02 1
## theta: 0.9987
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -6.11243 -1.03224 0.48394 1.67164 2.80559
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 3.8303e+02 5.8190e+02 0.6582 0.510384
## fert_rate 6.8409e+00 6.2771e-01 10.8981 < 2.2e-16 ***
## area -1.5416e-03 1.8854e-03 -0.8176 0.413563
## gdp_pc 1.3990e-04 4.3172e-05 3.2407 0.001193 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 776.47
## Residual Sum of Squares: 109.7
## R-Squared: 0.85872
## Adj. R-Squared: 0.83753
## Chisq: 121.563 on 3 DF, p-value: < 2.22e-16
##
## Hausman Test
##
## data: rurp ~ fert_rate + area + gdp_pc
## chisq = 1.0341, df = 3, p-value = 0.793
## alternative hypothesis: one model is inconsistent
Por lo tanto, el mejor modelo de los datos de panel es el de REGRESIÓN AGRUPADA.
medias_empresa <- patents %>%
group_by(cusip) %>%
summarise(employ = round(mean(employ, na.rm = TRUE),0),
return = round(mean(return, na.rm = TRUE),0),
stckpr = round(mean(stckpr, na.rm = TRUE),0),
rndstck = round(mean(rndstck, na.rm = TRUE),0),
sales = round(mean(sales, na.rm = TRUE),0))
datatable(medias_empresa, style = "bootstrap5", options = list(scrollX = TRUE))d <- select(patents,"cusip", "employ", "return", "stckpr", "rndstck", "sales")
for (i in 1:nrow(d)) {
for (j in 2:ncol(d)) {
if (is.na(d[i, j])) {
k <- d[i,1]
for (l in 1:nrow(medias_empresa)) {
if (medias_empresa[l,1] == k) {
if (medias_empresa[l, j]=="NaN") {
d[i, j] <- 0
} else {
d[i, j] <- medias_empresa[l, j]
}
}
}
}
}
}
patents <- patents %>%
mutate(employ = d$employ) %>%
mutate(return = d$return) %>%
mutate(stckpr = d$stckpr) %>%
mutate(rndstck = d$rndstck) %>%
mutate(sales = d$sales)Se ven disyintis picos año contra año en los datos.
Hay muchas empresas, por lo cual la gráfica es poco clara, igualmente
hay algunos datos extremos que hacen que parezca que la gran mayoría son
iguales. Con esto podemos decir que si existe variabilidad en la
muestra, apreciado principalmente por los picos notorios de cada
empresa.
## Pooling Model
##
## Call:
## plm(formula = patentsg ~ employ + employ + rnd, data = panel,
## model = "pooling")
##
## Balanced Panel: n = 226, T = 10, N = 2260
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -466.38225 -8.01744 -4.43539 -0.64974 550.82923
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 4.059626 1.200800 3.3808 0.0007351 ***
## employ 1.396484 0.043841 31.8535 < 2.2e-16 ***
## rnd -0.101845 0.017568 -5.7971 7.694e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 14168000
## Residual Sum of Squares: 6290100
## R-Squared: 0.55603
## Adj. R-Squared: 0.55564
## F-statistic: 1413.34 on 2 and 2257 DF, p-value: < 2.22e-16
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = patentsg ~ employ + employ + rnd, data = panel,
## model = "within")
##
## Balanced Panel: n = 226, T = 10, N = 2260
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -220.74801 -1.90511 -0.31859 1.50760 268.54356
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## employ -0.1147219 0.0594787 -1.9288 0.0539 .
## rnd -0.1774642 0.0093599 -18.9601 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 715640
## Residual Sum of Squares: 596130
## R-Squared: 0.16699
## Adj. R-Squared: 0.073934
## F-statistic: 203.676 on 2 and 2032 DF, p-value: < 2.22e-16
##
## F test for individual effects
##
## data: patentsg ~ employ + employ + rnd
## F = 86.262, df1 = 225, df2 = 2032, p-value < 2.2e-16
## alternative hypothesis: significant effects
walhus <- plm(patentsg ~ employ + employ + rnd, data = panel, model = "random", random.method = "walhus")
summary(walhus)## Oneway (individual) effect Random Effect Model
## (Wallace-Hussain's transformation)
##
## Call:
## plm(formula = patentsg ~ employ + employ + rnd, data = panel,
## model = "random", random.method = "walhus")
##
## Balanced Panel: n = 226, T = 10, N = 2260
##
## Effects:
## var std.dev share
## idiosyncratic 418.42 20.46 0.15
## individual 2364.82 48.63 0.85
## theta: 0.8681
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -150.14608 -3.87903 -2.59540 -0.17031 316.47962
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 18.059732 3.159814 5.7154 1.094e-08 ***
## employ 0.740581 0.048523 15.2624 < 2.2e-16 ***
## rnd -0.161776 0.010270 -15.7523 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 949520
## Residual Sum of Squares: 824010
## R-Squared: 0.13218
## Adj. R-Squared: 0.13141
## Chisq: 343.77 on 2 DF, p-value: < 2.22e-16
amemiya <- plm(patentsg ~ employ + employ + rnd, data = panel, model = "random", random.method = "amemiya")
summary(amemiya)## Oneway (individual) effect Random Effect Model
## (Amemiya's transformation)
##
## Call:
## plm(formula = patentsg ~ employ + employ + rnd, data = panel,
## model = "random", random.method = "amemiya")
##
## Balanced Panel: n = 226, T = 10, N = 2260
##
## Effects:
## var std.dev share
## idiosyncratic 293.08 17.12 0.03
## individual 9470.68 97.32 0.97
## theta: 0.9445
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -176.9881 -3.0564 -1.8240 0.3183 291.1447
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 29.3690526 6.5221228 4.5030 6.7e-06 ***
## employ 0.1520859 0.0546938 2.7807 0.005425 **
## rnd -0.1728768 0.0092743 -18.6405 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 757140
## Residual Sum of Squares: 654340
## R-Squared: 0.13577
## Adj. R-Squared: 0.13501
## Chisq: 354.581 on 2 DF, p-value: < 2.22e-16
nerlove <- plm(patentsg ~ employ + employ + rnd, data = panel, model = "random", random.method = "nerlove")
summary(nerlove)## Oneway (individual) effect Random Effect Model
## (Nerlove's transformation)
##
## Call:
## plm(formula = patentsg ~ employ + employ + rnd, data = panel,
## model = "random", random.method = "nerlove")
##
## Balanced Panel: n = 226, T = 10, N = 2260
##
## Effects:
## var std.dev share
## idiosyncratic 263.77 16.24 0.027
## individual 9542.21 97.68 0.973
## theta: 0.9475
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -178.74502 -3.00149 -1.75568 0.37022 289.92470
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 29.832914 6.863371 4.3467 1.382e-05 ***
## employ 0.127897 0.054877 2.3306 0.01977 *
## rnd -0.173299 0.009238 -18.7594 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 752720
## Residual Sum of Squares: 648790
## R-Squared: 0.13807
## Adj. R-Squared: 0.1373
## Chisq: 361.535 on 2 DF, p-value: < 2.22e-16
Por lo tanto, el mejor modelo de los datos de panel es el de
REGRESIÓN AGRUPADA.
El modelo seleccionado tiene una R cuadrada ajustada de
0.55, lo cual nos dice que es un modelo con una certeza aceptable.