Se desea establecer un modelo de regresión logística que permita calcular la probabilidad de lograr concretar un emprendimiento (Emprendimiento) en función de la edad de la persona (Edad). Los datos se encuentran en la base de datos denominada Data1
# =============================================
# 1. Cargar y preparar los datos
# =============================================
Data1 <- read.csv("Data1.csv")
# Mantener Emprendimiento como FACTOR
Data1$Emprendimiento <- factor(Data1$Emprendimiento,levels = c(0,1),labels = c("No", "Sí"))
# Verificar estructura
str(Data1)## 'data.frame': 300 obs. of 2 variables:
## $ Emprendimiento: Factor w/ 2 levels "No","Sí": 2 2 1 1 2 1 1 1 1 1 ...
## $ Edad : int 54 24 42 44 28 30 62 57 41 56 ...
## Emprendimiento Edad
## No:233 Min. :20.00
## Sí: 67 1st Qu.:36.75
## Median :51.00
## Mean :50.72
## 3rd Qu.:66.00
## Max. :80.00
# 2. Visualización de los datos
# Visualización 1: Boxplot + jitter
ggplot(Data1, aes(x = Emprendimiento, y = Edad, fill = Emprendimiento)) +
geom_boxplot(alpha = 0.7) +
geom_jitter(width = 0.25, alpha = 0.6, size = 2.5) +
scale_fill_manual(values = c("No" = "red", "Sí" = "green")) +
labs(title = "Relación entre Edad y Emprendimiento",
x = "Emprendimiento",
y = "Edad") +
theme_minimal(base_size = 14)# Visualización 2: Proporción de emprendedores por rangos de edad
Data1 %>%
mutate(Edad_grupo = cut(Edad, breaks = seq(18, 70, by = 5))) %>%
group_by(Edad_grupo) %>%
summarise(Proporcion = mean(Emprendimiento == "Sí")) %>%
ggplot(aes(x = Edad_grupo, y = Proporcion, group = 1)) +
geom_line(color = "blue", size = 1.1) +
geom_point(size = 3, color = "blue") +
labs(title = "Proporción de emprendedores por grupo de edad",
x = "Grupo de Edad",
y = "Proporción de Sí") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))# Proporciones
prop1 <- prop.table(table(Data1$Emprendimiento))
# Probabilidades
p1 <- prop1["Sí"]
q1 <- prop1["No"]
# Odds de emprendimiento
odds1 <- p1 / q1
#Resultados
cat("Proporción de emprendedores:", round(p1*100, 2), "%\n")## Proporción de emprendedores: 22.33 %
## Proporción de no emprendedores: 77.67 %
## Odds globales de emprender: 0.288
# Ajustar modelo de regresión logística
mod1 <- glm(Emprendimiento ~ ., data = Data1, family = binomial(link = "logit"))
# Resumen del modelo
summary(mod1)##
## Call:
## glm(formula = Emprendimiento ~ ., family = binomial(link = "logit"),
## data = Data1)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.96604 0.53096 5.586 2.32e-08 ***
## Edad -0.09626 0.01305 -7.375 1.64e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 318.66 on 299 degrees of freedom
## Residual deviance: 229.42 on 298 degrees of freedom
## AIC: 233.42
##
## Number of Fisher Scoring iterations: 5
## (Intercept) Edad
## 19.4149053 0.9082291
## 2.5 % 97.5 %
## (Intercept) 7.1456966 57.8181113
## Edad 0.8836901 0.9302935
# =============================================
# Analisis e interpretacion
# =============================================
# El coeficiente es negativo (-0.09626).
# Esto significa que a mayor edad, menor es la probabilidad de concretar un emprendimiento.
# El coeficiente es altamente significativo (p-value = 1.64e-13).
# Por cada año adicional de edad, la razón de probabilidades de concretar un emprendimiento disminuyen
# aproximadamente un 9.18%, manteniendo constante las demas variables.Se ajustó el siguiente modelo de regresión logística:
##
## Call: glm(formula = Emprendimiento ~ Edad, family = binomial(link = "logit"),
## data = Data1)
##
## Coefficients:
## (Intercept) Edad
## 2.96604 -0.09626
##
## Degrees of Freedom: 299 Total (i.e. Null); 298 Residual
## Null Deviance: 318.7
## Residual Deviance: 229.4 AIC: 233.4
Odds Ratios:
## (Intercept) Edad
## 19.4149053 0.9082291
Se desea plantear un modelo de datos de panel para analizar cuánto afecta la variable Manufacturing al GDP en algunos países de las Antillas Mayores (Puerto Rico, Cuba, Haití y República Dominicana), medido desde 1990. Estos datos fueron recopilados del Banco Mundial, y se encuentran en la base de datos denominada Data2.
#------------------------------------------------------
# 1. Cargar los datos
#------------------------------------------------------
data <- read.csv("Data2.csv")
# 1. Gráfico de dispersión
ggplot(data, aes(x = Year, y = Manufacturing, color = Country)) +
geom_point() +
geom_line() +
labs(
title = "Evolución de Manufacturing por país",
x = "Tiempo (Año)",
y = "Manufacturing"
) +
theme_minimal()# Declarar estructura panel
pdata <- pdata.frame(data, index = c("Country", "Year"))
#------------------------------------------------------
# 2. Visualización exploratoria
#------------------------------------------------------
scatterplot(Manufacturing ~ Year | Country, regLine = FALSE, data = pdata)plotmeans(Manufacturing ~ Year, data = pdata, main = "Heterogeneidad de Manufacturing a través del tiempo")plotmeans(Manufacturing ~ Country, data = pdata, main = "Heterogeneidad de Manufacturing entre países")#------------------------------------------------------
# 3. Estimación de modelos base
#------------------------------------------------------
# Modelo agrupado
modelo_pool <- plm(Manufacturing ~ GDP, data = pdata, model = "pooling")
# Modelo de efectos fijos individuales
modelo_fe <- plm(Manufacturing ~ GDP, data = pdata, model = "within", effect = "individual")
# Modelo de efectos aleatorios individuales
modelo_re <- plm(Manufacturing ~ GDP, data = pdata, model = "random", effect = "individual")
#------------------------------------------------------
# 4. Selección del modelo
#------------------------------------------------------
# Paso 1: Modelo agrupado vs efectos fijos
# H0: no existen efectos individuales -> usar modelo agrupado
# H1: existen efectos individuales -> usar modelo de efectos fijos
# Si p-value < 0.05, se rechaza H0 y se continúa con efectos fijos/aleatorios.
pFtest(modelo_fe, modelo_pool)##
## F test for individual effects
##
## data: Manufacturing ~ GDP
## F = 745.32, df1 = 3, df2 = 127, p-value < 2.2e-16
## alternative hypothesis: significant effects
# Paso 2: Efectos fijos vs efectos aleatorios
# H0: el modelo adecuado es efectos aleatorios
# H1: el modelo adecuado es efectos fijos
# Si p-value < 0.05, se usa efectos fijos.
# Si p-value >= 0.05, se usa efectos aleatorios.
phtest(modelo_fe, modelo_re)##
## Hausman Test
##
## data: Manufacturing ~ GDP
## chisq = 0.0037046, df = 1, p-value = 0.9515
## alternative hypothesis: one model is inconsistent
# Paso 3: Evaluar efectos de tiempo
# Si el modelo seleccionado fue efectos fijos:
modelo_fet <- plm(Manufacturing ~ GDP, data = pdata, model = "within", effect = "twoways")
# H0: no existen efectos de tiempo
# H1: existen efectos de tiempo
# Si p-value < 0.05, usar efectos fijos con tiempo.
pFtest(modelo_fet, modelo_fe)##
## F test for twoways effects
##
## data: Manufacturing ~ GDP
## F = 1.2574, df1 = 32, df2 = 95, p-value = 0.1973
## alternative hypothesis: significant effects
# Si el modelo seleccionado fue efectos aleatorios:
modelo_ret <- plm(Manufacturing ~ GDP, data = pdata, model = "random", effect = "twoways")
summary(modelo_re)## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = Manufacturing ~ GDP, data = pdata, effect = "individual",
## model = "random")
##
## Balanced Panel: n = 4, T = 33, N = 132
##
## Effects:
## var std.dev share
## idiosyncratic 8.528 2.920 0.029
## individual 284.570 16.869 0.971
## theta: 0.9699
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -6.866142 -2.204334 -0.023309 1.875971 9.658746
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 2.3576e+01 8.4098e+00 2.8035 0.005056 **
## GDP -1.3878e-05 3.1169e-06 -4.4526 8.483e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1268.3
## Residual Sum of Squares: 1100.5
## R-Squared: 0.13233
## Adj. R-Squared: 0.12565
## Chisq: 19.8257 on 1 DF, p-value: 8.4834e-06
## Twoways effects Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = Manufacturing ~ GDP, data = pdata, effect = "twoways",
## model = "random")
##
## Balanced Panel: n = 4, T = 33, N = 132
##
## Effects:
## var std.dev share
## idiosyncratic 8.008 2.830 0.027
## individual 284.586 16.870 0.973
## time 0.000 0.000 0.000
## theta: 0.9708 (id) 0 (time) 0 (total)
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -6.856964 -2.201694 -0.015112 1.878369 9.665814
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 2.3576e+01 8.6739e+00 2.7181 0.006566 **
## GDP -1.3877e-05 3.1155e-06 -4.4543 8.415e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1267.2
## Residual Sum of Squares: 1099.4
## R-Squared: 0.13241
## Adj. R-Squared: 0.12574
## Chisq: 19.8412 on 1 DF, p-value: 8.4149e-06
#------------------------------------------------------
# 5. Resumen de todos los modelos
#------------------------------------------------------
summary(modelo_pool)## Pooling Model
##
## Call:
## plm(formula = Manufacturing ~ GDP, data = pdata, model = "pooling")
##
## Balanced Panel: n = 4, T = 33, N = 132
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -17.0937 -9.0507 -4.0508 6.8297 24.9298
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 2.4280e+01 1.2636e+00 19.2138 < 2e-16 ***
## GDP -2.5026e-05 1.0307e-05 -2.4281 0.01654 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 21065
## Residual Sum of Squares: 20151
## R-Squared: 0.043385
## Adj. R-Squared: 0.036027
## F-statistic: 5.89587 on 1 and 130 DF, p-value: 0.016545
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Manufacturing ~ GDP, data = pdata, effect = "individual",
## model = "within")
##
## Balanced Panel: n = 4, T = 33, N = 132
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -6.56955 -2.09051 0.14825 2.03845 9.88957
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## GDP -1.3861e-05 3.1294e-06 -4.4295 2.016e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1250.4
## Residual Sum of Squares: 1083
## R-Squared: 0.13382
## Adj. R-Squared: 0.10654
## F-statistic: 19.6204 on 1 and 127 DF, p-value: 2.0161e-05
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = Manufacturing ~ GDP, data = pdata, effect = "individual",
## model = "random")
##
## Balanced Panel: n = 4, T = 33, N = 132
##
## Effects:
## var std.dev share
## idiosyncratic 8.528 2.920 0.029
## individual 284.570 16.869 0.971
## theta: 0.9699
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -6.866142 -2.204334 -0.023309 1.875971 9.658746
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 2.3576e+01 8.4098e+00 2.8035 0.005056 **
## GDP -1.3878e-05 3.1169e-06 -4.4526 8.483e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1268.3
## Residual Sum of Squares: 1100.5
## R-Squared: 0.13233
## Adj. R-Squared: 0.12565
## Chisq: 19.8257 on 1 DF, p-value: 8.4834e-06
## Twoways effects Within Model
##
## Call:
## plm(formula = Manufacturing ~ GDP, data = pdata, effect = "twoways",
## model = "within")
##
## Balanced Panel: n = 4, T = 33, N = 132
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -5.9248199 -1.7253916 0.0058238 1.3978062 7.5347360
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## GDP -2.8669e-05 4.0719e-06 -7.0408 2.978e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1157.8
## Residual Sum of Squares: 760.8
## R-Squared: 0.34289
## Adj. R-Squared: 0.093879
## F-statistic: 49.5723 on 1 and 95 DF, p-value: 2.9785e-10
## Twoways effects Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = Manufacturing ~ GDP, data = pdata, effect = "twoways",
## model = "random")
##
## Balanced Panel: n = 4, T = 33, N = 132
##
## Effects:
## var std.dev share
## idiosyncratic 8.008 2.830 0.027
## individual 284.586 16.870 0.973
## time 0.000 0.000 0.000
## theta: 0.9708 (id) 0 (time) 0 (total)
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -6.856964 -2.201694 -0.015112 1.878369 9.665814
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 2.3576e+01 8.6739e+00 2.7181 0.006566 **
## GDP -1.3877e-05 3.1155e-06 -4.4543 8.415e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1267.2
## Residual Sum of Squares: 1099.4
## R-Squared: 0.13241
## Adj. R-Squared: 0.12574
## Chisq: 19.8412 on 1 DF, p-value: 8.4149e-06
#------------------------------------------------------
# 6. Modelo elegido
#------------------------------------------------------
summary(modelo_fet)## Twoways effects Within Model
##
## Call:
## plm(formula = Manufacturing ~ GDP, data = pdata, effect = "twoways",
## model = "within")
##
## Balanced Panel: n = 4, T = 33, N = 132
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -5.9248199 -1.7253916 0.0058238 1.3978062 7.5347360
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## GDP -2.8669e-05 4.0719e-06 -7.0408 2.978e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1157.8
## Residual Sum of Squares: 760.8
## R-Squared: 0.34289
## Adj. R-Squared: 0.093879
## F-statistic: 49.5723 on 1 and 95 DF, p-value: 2.9785e-10