The dataset comes from New York University (W. Greene, Stern School of Business) and contains 90 observations covering the costs of 6 U.S. airlines over 15 years, from 1970 to 1984.
The variables are:

* I = Airline
* T = Year
* Q = Passenger-miles flown
* C = Total cost, in $1,000
* PF = Fuel price
* LF = Load factor (average capacity utilization of the fleet)
Source: [Table F7.1](https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm)
#install.packages("plm")
library(plm)
#install.packages("tidyverse")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::between() masks plm::between()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks plm::lag(), stats::lag()
## ✖ dplyr::lead() masks plm::lead()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#install.packages("forecast")
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
#install.packages("lavaan")
library(lavaan)
## This is lavaan 0.6-19
## lavaan is FREE software! Please report any bugs.
#install.packages("lavaanPlot")
library(lavaanPlot)
#install.packages("DataExplorer")
library(DataExplorer)
#install.packages("ggplot2")
library(ggplot2)
#install.packages("gplots")
library(gplots)
##
## Attaching package: 'gplots'
##
## The following object is masked from 'package:stats':
##
##     lowess
df <- read.csv("C:\\Users\\rodri\\Desktop\\Octavo\\Analitica\\Modulo_1\\Cost Data for U.S. Airlines.csv")
summary(df)
## I T C Q
## Min. :1.0 Min. : 1 Min. : 68978 Min. :0.03768
## 1st Qu.:2.0 1st Qu.: 4 1st Qu.: 292046 1st Qu.:0.14213
## Median :3.5 Median : 8 Median : 637001 Median :0.30503
## Mean :3.5 Mean : 8 Mean :1122524 Mean :0.54499
## 3rd Qu.:5.0 3rd Qu.:12 3rd Qu.:1345968 3rd Qu.:0.94528
## Max. :6.0 Max. :15 Max. :4748320 Max. :1.93646
## PF LF
## Min. : 103795 Min. :0.4321
## 1st Qu.: 129848 1st Qu.:0.5288
## Median : 357434 Median :0.5661
## Mean : 471683 Mean :0.5605
## 3rd Qu.: 849840 3rd Qu.:0.5947
## Max. :1015610 Max. :0.6763
str(df)
## 'data.frame': 90 obs. of 6 variables:
## $ I : int 1 1 1 1 1 1 1 1 1 1 ...
## $ T : int 1 2 3 4 5 6 7 8 9 10 ...
## $ C : int 1140640 1215690 1309570 1511530 1676730 1823740 2022890 2314760 2639160 3247620 ...
## $ Q : num 0.953 0.987 1.092 1.176 1.16 ...
## $ PF: int 106650 110307 110574 121974 196606 265609 263451 316411 384110 569251 ...
## $ LF: num 0.534 0.532 0.548 0.541 0.591 ...
head(df)
## I T C Q PF LF
## 1 1 1 1140640 0.952757 106650 0.534487
## 2 1 2 1215690 0.986757 110307 0.532328
## 3 1 3 1309570 1.091980 110574 0.547736
## 4 1 4 1511530 1.175780 121974 0.540846
## 5 1 5 1676730 1.160170 196606 0.591167
## 6 1 6 1823740 1.173760 265609 0.575417
df$I <- as.factor(df$I)
df$Y <- df$T + 1969
summary(df)
## I T C Q PF
## 1:15 Min. : 1 Min. : 68978 Min. :0.03768 Min. : 103795
## 2:15 1st Qu.: 4 1st Qu.: 292046 1st Qu.:0.14213 1st Qu.: 129848
## 3:15 Median : 8 Median : 637001 Median :0.30503 Median : 357434
## 4:15 Mean : 8 Mean :1122524 Mean :0.54499 Mean : 471683
## 5:15 3rd Qu.:12 3rd Qu.:1345968 3rd Qu.:0.94528 3rd Qu.: 849840
## 6:15 Max. :15 Max. :4748320 Max. :1.93646 Max. :1015610
## LF Y
## Min. :0.4321 Min. :1970
## 1st Qu.:0.5288 1st Qu.:1973
## Median :0.5661 Median :1977
## Mean :0.5605 Mean :1977
## 3rd Qu.:0.5947 3rd Qu.:1981
## Max. :0.6763 Max. :1984
str(df)
## 'data.frame': 90 obs. of 7 variables:
## $ I : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ T : int 1 2 3 4 5 6 7 8 9 10 ...
## $ C : int 1140640 1215690 1309570 1511530 1676730 1823740 2022890 2314760 2639160 3247620 ...
## $ Q : num 0.953 0.987 1.092 1.176 1.16 ...
## $ PF: int 106650 110307 110574 121974 196606 265609 263451 316411 384110 569251 ...
## $ LF: num 0.534 0.532 0.548 0.541 0.591 ...
## $ Y : num 1970 1971 1972 1973 1974 ...
head(df)
## I T C Q PF LF Y
## 1 1 1 1140640 0.952757 106650 0.534487 1970
## 2 1 2 1215690 0.986757 110307 0.532328 1971
## 3 1 3 1309570 1.091980 110574 0.547736 1972
## 4 1 4 1511530 1.175780 121974 0.540846 1973
## 5 1 5 1676730 1.160170 196606 0.591167 1974
## 6 1 6 1823740 1.173760 265609 0.575417 1975
#create_report(df)
plot_missing(df)
plot_histogram(df)
plot_correlation(df)
ggplot(df, aes(x=Y, y=C, color=I, group=I)) + geom_line() +
  labs(title="Cost per airline (in thousands)", x="Year", y="Cost (USD)", color="Airline") +
  theme_minimal()
ggplot(df, aes(x=Y, y=Q, color=I, group=I)) + geom_line() +
  labs(title="Passenger-miles flown", x="Year", y="Normalized index", color="Airline") +
  theme_minimal()
ggplot(df, aes(x=Y, y=PF, color=I, group=I)) + geom_line() +
  labs(title="Fuel price", x="Year", y="Cost (USD)", color="Airline") +
  theme_minimal()
ggplot(df, aes(x=Y, y=LF, color=I, group=I)) + geom_line() +
  labs(title="Load factor", x="Year", y="Percentage", color="Airline") +
  theme_minimal()
plotmeans(C~I, main="Heterogeneity across airlines", xlab="Airline",
          ylab="Cost (thousands USD)", data=df)
## Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
Since the mean value (circle) and its confidence interval (blue bars) vary across airlines, there is evidence of individual heterogeneity.
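As a complementary check (a minimal sketch reusing the df and ggplot2 objects already loaded), a boxplot per airline shows the same dispersion pattern:

#Sketch: cost distribution per airline; wide level differences suggest individual effects
ggplot(df, aes(x=I, y=C)) + geom_boxplot() +
  labs(title="Cost per airline", x="Airline", y="Cost (thousands USD)") +
  theme_minimal()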
#Declare the panel structure: individual = I, time = Y
df_panel <- pdata.frame(df, index=c("I", "Y"))
#Drop the index columns from the regressor set (the index is kept as an attribute)
df_panel <- df_panel %>% select(-c("I", "T", "Y"))
#The pooled regression model is a panel-data estimation technique that assumes no individual-specific effects and no variation over time; it ignores heterogeneity.
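In equation form, the pooled specification estimated below fits one common intercept for all airlines:

$$C_{it} = \beta_0 + \beta_1 Q_{it} + \beta_2 PF_{it} + \beta_3 LF_{it} + \varepsilon_{it}$$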
pooled <- plm(C~ Q + PF + LF, data= df_panel, model = "pooling")
summary(pooled)
## Pooling Model
##
## Call:
## plm(formula = C ~ Q + PF + LF, data = df_panel, model = "pooling")
##
## Balanced Panel: n = 6, T = 15, N = 90
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -520654 -250270 37333 208690 849700
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 1.1586e+06 3.6059e+05 3.2129 0.00185 **
## Q 2.0261e+06 6.1807e+04 32.7813 < 2.2e-16 ***
## PF 1.2253e+00 1.0372e-01 11.8138 < 2.2e-16 ***
## LF -3.0658e+06 6.9633e+05 -4.4027 3.058e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1.2647e+14
## Residual Sum of Squares: 6.8177e+12
## R-Squared: 0.94609
## Adj. R-Squared: 0.94421
## F-statistic: 503.118 on 3 and 86 DF, p-value: < 2.22e-16
#Breusch-Pagan (BP) test: checks whether the pooled model is adequate.
#P-value < 0.05: move on to a fixed-effects or random-effects model
#P-value > 0.05: the pooled model can be used
plmtest(pooled,type="bp")
##
## Lagrange Multiplier Test - (Breusch-Pagan)
##
## data: C ~ Q + PF + LF
## chisq = 0.61309, df = 1, p-value = 0.4336
## alternative hypothesis: significant effects
Since the p-value is > 0.05, the pooled model can be used. Fixed-effects and random-effects models are estimated next for comparison.
within <- plm(C~ Q + PF + LF, data= df_panel, model = "within")
summary(within)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = C ~ Q + PF + LF, data = df_panel, model = "within")
##
## Balanced Panel: n = 6, T = 15, N = 90
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -551783 -159259 1796 0 137226 499296
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## Q 3.3190e+06 1.7135e+05 19.3694 < 2.2e-16 ***
## PF 7.7307e-01 9.7319e-02 7.9437 9.698e-12 ***
## LF -3.7974e+06 6.1377e+05 -6.1869 2.375e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 5.0776e+13
## Residual Sum of Squares: 3.5865e+12
## R-Squared: 0.92937
## Adj. R-Squared: 0.92239
## F-statistic: 355.254 on 3 and 81 DF, p-value: < 2.22e-16
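To inspect the individual intercepts that the within model absorbs, plm provides fixef() (a minimal sketch using the fitted object above):

#Sketch: estimated fixed effect (intercept) for each airline
fixef(within)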
walhus <- plm(C~ Q + PF + LF, data= df_panel, model = "random" ,
random.method = "walhus")
summary(walhus)
## Oneway (individual) effect Random Effect Model
## (Wallace-Hussain's transformation)
##
## Call:
## plm(formula = C ~ Q + PF + LF, data = df_panel, model = "random",
## random.method = "walhus")
##
## Balanced Panel: n = 6, T = 15, N = 90
##
## Effects:
## var std.dev share
## idiosyncratic 7.339e+10 2.709e+05 0.969
## individual 2.363e+09 4.861e+04 0.031
## theta: 0.1788
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -524180 -243611 39332 199517 824905
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 1.1267e+06 3.6994e+05 3.0455 0.002323 **
## Q 2.0647e+06 7.1927e+04 28.7051 < 2.2e-16 ***
## PF 1.2075e+00 1.0358e-01 11.6578 < 2.2e-16 ***
## LF -3.0314e+06 7.1431e+05 -4.2438 2.198e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1.0182e+14
## Residual Sum of Squares: 6.5784e+12
## R-Squared: 0.93539
## Adj. R-Squared: 0.93314
## Chisq: 1245.09 on 3 DF, p-value: < 2.22e-16
amemiya <- plm(C~ Q + PF + LF, data= df_panel, model = "random" ,
random.method = "amemiya")
summary(amemiya)
## Oneway (individual) effect Random Effect Model
## (Amemiya's transformation)
##
## Call:
## plm(formula = C ~ Q + PF + LF, data = df_panel, model = "random",
## random.method = "amemiya")
##
## Balanced Panel: n = 6, T = 15, N = 90
##
## Effects:
## var std.dev share
## idiosyncratic 4.270e+10 2.066e+05 0.084
## individual 4.640e+11 6.812e+05 0.916
## theta: 0.9219
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -603585 -144415 22641 158005 485417
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 1.0746e+06 4.2105e+05 2.5522 0.0107 *
## Q 3.2090e+06 1.6482e+05 19.4695 < 2.2e-16 ***
## PF 8.1014e-01 9.6147e-02 8.4260 < 2.2e-16 ***
## LF -3.7168e+06 6.1330e+05 -6.0603 1.359e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 5.1238e+13
## Residual Sum of Squares: 3.8227e+12
## R-Squared: 0.92539
## Adj. R-Squared: 0.92279
## Chisq: 1066.71 on 3 DF, p-value: < 2.22e-16
nerlove <- plm(C~ Q + PF + LF, data= df_panel, model = "random" ,
random.method = "nerlove")
summary(nerlove)
## Oneway (individual) effect Random Effect Model
## (Nerlove's transformation)
##
## Call:
## plm(formula = C ~ Q + PF + LF, data = df_panel, model = "random",
## random.method = "nerlove")
##
## Balanced Panel: n = 6, T = 15, N = 90
##
## Effects:
## var std.dev share
## idiosyncratic 3.985e+10 1.996e+05 0.066
## individual 5.602e+11 7.485e+05 0.934
## theta: 0.9313
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -601947 -145039 18713 154903 483623
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 1.0752e+06 4.4535e+05 2.4142 0.01577 *
## Q 3.2323e+06 1.6521e+05 19.5652 < 2.2e-16 ***
## PF 8.0229e-01 9.5804e-02 8.3743 < 2.2e-16 ***
## LF -3.7338e+06 6.0963e+05 -6.1247 9.084e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 5.1133e+13
## Residual Sum of Squares: 3.7726e+12
## R-Squared: 0.92622
## Adj. R-Squared: 0.92365
## Chisq: 1079.63 on 3 DF, p-value: < 2.22e-16
Comparing their adjusted R-squared values, the best-performing random-effects method is Wallace-Hussain (walhus).
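The comparison can also be made programmatically (a minimal sketch over the three fitted objects above):

#Sketch: tabulate the adjusted R-squared of the three random-effects fits
sapply(list(walhus=walhus, amemiya=amemiya, nerlove=nerlove),
       function(m) summary(m)$r.squared["adjrsq"])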
#Hausman test: compares the fixed-effects and random-effects estimators.
#P-value < 0.05: the fixed-effects (within) model is preferred
#P-value > 0.05: the random-effects model can be used
phtest(within, walhus)
##
## Hausman Test
##
## data: C ~ Q + PF + LF
## chisq = 65.039, df = 3, p-value = 4.919e-14
## alternative hypothesis: one model is inconsistent
Since the p-value is < 0.05, the fixed-effects (within) model is preferred over the random-effects model.
df_a1 <- df[df$I == "1" , ]
ts_a1 <- ts(df_a1$C, start= 1970, frequency=1)
df_a2 <- df[df$I == "2" , ]
ts_a2 <- ts(df_a2$C, start= 1970, frequency=1)
df_a3 <- df[df$I == "3" , ]
ts_a3 <- ts(df_a3$C, start= 1970, frequency=1)
df_a4 <- df[df$I == "4" , ]
ts_a4 <- ts(df_a4$C, start= 1970, frequency=1)
df_a5 <- df[df$I == "5" , ]
ts_a5 <- ts(df_a5$C, start= 1970, frequency=1)
df_a6 <- df[df$I == "6" , ]
ts_a6 <- ts(df_a6$C, start= 1970, frequency=1)
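The six series could also be built in one pass (an equivalent sketch using base R's split(); ts_list is a hypothetical name):

#Sketch: one cost time series per airline, named by factor level
ts_list <- lapply(split(df$C, df$I), ts, start=1970, frequency=1)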
arima_a1 <- auto.arima(ts_a1)
summary(arima_a1)
## Series: ts_a1
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 257691.43
## s.e. 44508.78
##
## sigma^2 = 2.987e+10: log likelihood = -188.19
## AIC=380.37 AICc=381.46 BIC=381.65
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 58.86321 160892 129527.1 -1.742419 5.395122 0.502644 0.4084903
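A hedged reading of this fit: ARIMA(0,1,0) with drift is a random walk with drift,

$$C_t = C_{t-1} + \delta + \varepsilon_t,$$

where the estimated drift $\delta \approx 257{,}691$ is the average yearly increase in airline 1's cost.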
arima_a2 <- auto.arima(ts_a2)
summary(arima_a2)
## Series: ts_a2
## ARIMA(0,2,0)
##
## sigma^2 = 1.392e+10: log likelihood = -170.26
## AIC=342.53 AICc=342.89 BIC=343.09
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 11689.89 109830.2 79466.33 1.387268 3.747652 0.3056315 0.3172172
arima_a3 <- auto.arima(ts_a3)
summary(arima_a3)
## Series: ts_a3
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 63155.14
## s.e. 13344.11
##
## sigma^2 = 2.685e+09: log likelihood = -171.32
## AIC=346.64 AICc=347.74 BIC=347.92
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 14.87618 48235.79 38474.72 -0.9277567 5.324145 0.538349 0.09130379
arima_a4 <- auto.arima(ts_a4)
summary(arima_a4)
## Series: ts_a4
## ARIMA(0,2,0)
##
## sigma^2 = 1.469e+09: log likelihood = -155.65
## AIC=313.3 AICc=313.66 BIC=313.86
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7232.074 35684.5 27472.98 1.761789 5.046326 0.2977402 0.1925091
arima_a5 <- auto.arima(ts_a5)
summary(arima_a5)
## Series: ts_a5
## ARIMA(1,2,0)
##
## Coefficients:
## ar1
## -0.4543
## s.e. 0.2354
##
## sigma^2 = 775697764: log likelihood = -151.09
## AIC=306.18 AICc=307.38 BIC=307.31
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 3061.06 24911.01 14171.99 2.393894 4.771228 0.3823654 0.008627682
arima_a6 <- auto.arima(ts_a6)
summary(arima_a6)
## Series: ts_a6
## ARIMA(1,2,0)
##
## Coefficients:
## ar1
## 0.5824
## s.e. 0.2281
##
## sigma^2 = 386182350: log likelihood = -146.65
## AIC=297.3 AICc=298.5 BIC=298.43
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 6829.403 17576.86 10190.16 2.076518 3.550582 0.1516841 -0.2989742
#5-year-ahead forecast with a 95% prediction interval
pronostico_a6 <- forecast(arima_a6, level=95, h=5)
pronostico_a6
## Point Forecast Lo 95 Hi 95
## 1985 1234478 1195962 1272994
## 1986 1471026 1364365 1577687
## 1987 1714311 1510670 1917953
## 1988 1961521 1635113 2287929
## 1989 2211016 1738872 2683160
plot(pronostico_a6, main="Total cost forecast (in thousands)",
     xlab="Year", ylab="Dollars")
modelo <- '
#Regressions
C ~ Q + PF + LF + I + Y
Q ~ LF + PF + I
PF ~ Y
LF ~ I
#Latent variables
#Variances and covariances
#Intercepts
'
#cfa1 <- sem(modelo, data=df_a1)
#summary(cfa1)
modelo <- '
# Regressions
C ~ Q + PF + LF + I + Y
Q ~ PF + I
PF ~ Y
LF ~ I
# Latent variables
# Variances and covariances
C ~~ C
Q ~~ Q
PF ~~ PF
LF ~~ LF
# Intercepts
'
#Convert the airline factor to numeric and standardize every variable before fitting the SEM
df_escalada <- df
df_escalada$I <- as.numeric(df_escalada$I)
df_escalada <- scale(df_escalada)
cfa <- cfa(modelo, df_escalada)
summary(cfa)
## lavaan 0.6-19 ended normally after 3 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 13
##
## Number of observations 90
##
## Model Test User Model:
##
## Test statistic 63.804
## Degrees of freedom 5
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## C ~
## Q 1.000 0.053 18.826 0.000
## PF 0.194 0.065 3.000 0.003
## LF -0.154 0.025 -6.248 0.000
## I 0.105 0.052 1.999 0.046
## Y 0.140 0.063 2.211 0.027
## Q ~
## PF 0.239 0.046 5.213 0.000
## I -0.871 0.046 -18.985 0.000
## PF ~
## Y 0.931 0.038 24.233 0.000
## LF ~
## I -0.340 0.099 -3.429 0.001
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .C 0.048 0.007 6.708 0.000
## .Q 0.187 0.028 6.708 0.000
## .PF 0.131 0.020 6.708 0.000
## .LF 0.875 0.130 6.708 0.000
lavaanPlot(cfa)
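To annotate the diagram with the estimated path coefficients, lavaanPlot accepts a coefs argument (a minimal sketch):

#Sketch: path diagram with coefficients displayed on the edges
lavaanPlot(model=cfa, coefs=TRUE)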
Professor, I ran into a problem when publishing the Shiny app, so I had to make two deployments: the first with the tabs we built in class and the second with the tabs we built as a team.
Part 1, Shiny App link (in-class part):
Part 2, Shiny App link (team part):