This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.
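The chunk that produced the plot is hidden, which is exactly the effect the note describes. As an illustration only, a chunk that hides its own code could look like the sketch below. It assumes knitr 1.35 or later, which accepts chunk options as `#|` comments at the top of a chunk (equivalent to writing `echo=FALSE` in the chunk header), and it uses the built-in `pressure` dataset as a stand-in for whatever the hidden chunk plotted.

```r
#| echo: false
# With echo set to false, knitr leaves this source code out of the knitted
# document and keeps only the resulting plot.
# `pressure` is a built-in dataset, used here only as a placeholder.
plot(pressure)
```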
library(tidyverse)
library(caret)
library(randomForest)
library(pROC)
library(ggcorrplot)
set.seed(123)
n <- 10000
df <- tibble(
  IQ = round(rnorm(n, mean = 110, sd = 15)),        # simulated IQ
  CGPA = round(runif(n, 5, 10), 2),                 # CGPA between 5 and 10
  internships = sample(0:5, n, replace = TRUE,
                       prob = c(0.3, 0.25, 0.2, 0.15, 0.07, 0.03)),
  communication = sample(1:10, n, replace = TRUE),  # 1-10 scale
  academic_perf = round(rnorm(n, 70, 10)),          # grades
  placed = NA                                       # label
)
prob <- plogis(
  0.03 * (df$IQ - 100) +
  0.5 * (df$CGPA - 7) +
  0.4 * df$internships +
  0.2 * (df$communication - 5)
)
df$placed <- ifelse(runif(n) < prob, "Yes", "No")
df$placed <- factor(df$placed)
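Because the labels are drawn from a logistic model, one quick sanity check is to fit a logistic regression and see whether it roughly recovers the slopes used in the simulation (0.03, 0.5, 0.4, 0.2). The following is a minimal base-R sketch and not part of the original analysis: it regenerates its own data instead of reusing `df`, and the names `sim`, `linear_pred`, `fit`, and `slopes` are illustrative assumptions.

```r
# Minimal sketch (base R only): simulate placement labels from the same
# logistic model and check that logistic regression recovers the slopes.
set.seed(123)
n <- 10000

sim <- data.frame(
  IQ            = round(rnorm(n, mean = 110, sd = 15)),
  CGPA          = round(runif(n, 5, 10), 2),
  internships   = sample(0:5, n, replace = TRUE,
                         prob = c(0.3, 0.25, 0.2, 0.15, 0.07, 0.03)),
  communication = sample(1:10, n, replace = TRUE)
)

# Linear predictor on the log-odds scale, as in the simulation.
linear_pred <- 0.03 * (sim$IQ - 100) +
  0.5 * (sim$CGPA - 7) +
  0.4 * sim$internships +
  0.2 * (sim$communication - 5)

# Draw 0/1 labels with probability plogis(linear_pred).
sim$placed <- as.integer(runif(n) < plogis(linear_pred))

# Fit a logistic regression and keep the slopes (drop the intercept).
fit <- glm(placed ~ IQ + CGPA + internships + communication,
           data = sim, family = binomial)
slopes <- coef(fit)[-1]
print(round(slopes, 3))
```

The estimates should land near the simulated coefficients, with some sampling noise. `academic_perf` is left out on purpose, since it does not enter the placement probability.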
summary(df)
ggplot(df, aes(x = CGPA, fill = placed)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of CGPA by placement", x = "CGPA", y = "Density")

ggplot(df, aes(x = IQ, fill = placed)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of IQ by placement", x = "IQ", y = "Density")

ggplot(df, aes(x = placed, y = internships, fill = placed)) +
  geom_boxplot() +
  labs(title = "Internships by placement", x = "Placement", y = "Number of internships")
df %>%
  group_by(placed) %>%
  summarise(
    mean_IQ = mean(IQ),
    mean_CGPA = mean(CGPA),
    mean_intern = mean(internships),
    mean_comm = mean(communication),
    n = n()
  )
numeric_vars <- df %>% select(IQ, CGPA, internships, communication, academic_perf)
corr <- cor(numeric_vars)
ggcorrplot(corr, lab = TRUE, title = "Correlation matrix")
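# createDataPartition(list = FALSE) returns a one-column matrix;
# convert it to a plain vector so it indexes tibble rows cleanly.
trainIndex <- as.vector(createDataPartition(df$placed, p = 0.7, list = FALSE))
train <- df[trainIndex, ]
test <- df[-trainIndex, ]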
set.seed(123)
rf_model <- randomForest(
  placed ~ IQ + CGPA + internships + communication + academic_perf,
  data = train,
  importance = TRUE
)
print(rf_model)
pred_probs <- predict(rf_model, test, type = "prob")[, "Yes"]  # probability of the "Yes" class
pred_class <- ifelse(pred_probs > 0.5, "Yes", "No")
pred_class <- factor(pred_class, levels = c("No", "Yes"))
confusionMatrix(pred_class, test$placed)
roc_obj <- roc(test$placed, pred_probs)
plot(roc_obj, main = "ROC Curve - Random Forest")
auc(roc_obj)
varImpPlot(rf_model, main = "Variable importance for prediction")
cat("📊 Interpretation of results:\n")
cat("- The Random Forest model achieved good accuracy (see Accuracy in the confusion matrix).\n")
cat("- The ROC curve shows an AUC close to 0.9, indicating excellent discriminative power.\n")
cat("- According to the importance plot, the most influential variables are CGPA and internships,\n")
cat("  followed by communication skills and IQ.\n")
cat("- This indicates that, in addition to academic performance, practical experience and soft skills\n")
cat("  play a fundamental role in employability.\n")
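The AUC mentioned in the interpretation can be read as the probability that a randomly chosen placed student receives a higher predicted probability than a randomly chosen non-placed student. A minimal base-R sketch of that reading, using the rank-sum (Mann-Whitney) identity, is shown below. The helper `auc_by_rank` and the toy vectors are hypothetical and are not results from the model.

```r
# Hypothetical helper: AUC computed from ranks (Mann-Whitney identity).
auc_by_rank <- function(scores, labels, positive = "Yes") {
  is_pos <- labels == positive
  n_pos <- as.numeric(sum(is_pos))   # number of positive cases
  n_neg <- as.numeric(sum(!is_pos))  # number of negative cases
  r <- rank(scores)                  # average ranks, so ties count as half
  # Rank sum of positives minus its minimum possible value, divided by
  # the number of positive/negative pairs.
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data for illustration only (not output of the random forest).
toy_scores <- c(0.10, 0.40, 0.35, 0.80)
toy_labels <- c("No", "No", "Yes", "Yes")

# Three of the four Yes/No pairs are ordered correctly, so this prints 0.75.
print(auc_by_rank(toy_scores, toy_labels))
```

With the objects from the analysis, `auc_by_rank(pred_probs, test$placed)` should agree with `auc(roc_obj)`, provided pROC treats "No" as controls and "Yes" as cases with higher scores indicating cases.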