EJERCICIO
El departamento de calidad de una empresa de alimentación se encarga de medir el contenido en grasa de la carne que comercializa. Este estudio se realiza mediante técnicas de analítica química, un proceso relativamente costoso en tiempo y recursos. Una alternativa que permitiría reducir costes y optimizar tiempo es emplear un espectrofotómetro (instrumento capaz de detectar la absorbancia que tiene un material a diferentes tipos de luz en función de sus características) e inferir el contenido en grasa a partir de sus medidas.
Antes de dar por válida esta nueva técnica, la empresa necesita comprobar qué margen de error tiene respecto al análisis químico. Para ello, se mide el espectro de absorbancia a 100 longitudes de onda en 215 muestras de carne, cuyo contenido en grasa se obtiene también por análisis químico, y se entrena un modelo con el objetivo de predecir el contenido en grasa a partir de los valores dados por el espectrofotómetro. Los datos meatspec empleados en este ejemplo se han obtenido del paquete faraway.
# Bibliotecas
#Datos
library(faraway)
#Gráficos y tratamiento de datos
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(skimr)
library(DataExplorer)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(corrr)
##
## Attaching package: 'corrr'
##
## The following object is masked from 'package:skimr':
##
## focus
#Modelado
library(glmnet)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Loaded glmnet 4.1-8
library(pls)
##
## Attaching package: 'pls'
##
## The following object is masked from 'package:stats':
##
## loadings
DATOS
El set de datos contiene 101 columnas. Las 100 primeras, nombradas como V1 , …, V100 recogen el valor de absorbancia para cada una de las 100 longitudes de onda analizadas (predictores), y la columna fat el contenido en grasa medido por técnicas químicas (variable respuesta).
# Datos
#----------------------
data("meatspec")
datos <- meatspec
head(datos, 5)
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1 2.61776 2.61814 2.61859 2.61912 2.61981 2.62071 2.62186 2.62334 2.62511
## 2 2.83454 2.83871 2.84283 2.84705 2.85138 2.85587 2.86060 2.86566 2.87093
## 3 2.58284 2.58458 2.58629 2.58808 2.58996 2.59192 2.59401 2.59627 2.59873
## 4 2.82286 2.82460 2.82630 2.82814 2.83001 2.83192 2.83392 2.83606 2.83842
## 5 2.78813 2.78989 2.79167 2.79350 2.79538 2.79746 2.79984 2.80254 2.80553
## V10 V11 V12 V13 V14 V15 V16 V17 V18
## 1 2.62722 2.62964 2.63245 2.63565 2.63933 2.64353 2.64825 2.65350 2.65937
## 2 2.87661 2.88264 2.88898 2.89577 2.90308 2.91097 2.91953 2.92873 2.93863
## 3 2.60131 2.60414 2.60714 2.61029 2.61361 2.61714 2.62089 2.62486 2.62909
## 4 2.84097 2.84374 2.84664 2.84975 2.85307 2.85661 2.86038 2.86437 2.86860
## 5 2.80890 2.81272 2.81704 2.82184 2.82710 2.83294 2.83945 2.84664 2.85458
## V19 V20 V21 V22 V23 V24 V25 V26 V27
## 1 2.66585 2.67281 2.68008 2.68733 2.69427 2.70073 2.70684 2.71281 2.71914
## 2 2.94929 2.96072 2.97272 2.98493 2.99690 3.00833 3.01920 3.02990 3.04101
## 3 2.63361 2.63835 2.64330 2.64838 2.65354 2.65870 2.66375 2.66880 2.67383
## 4 2.87308 2.87789 2.88301 2.88832 2.89374 2.89917 2.90457 2.90991 2.91521
## 5 2.86331 2.87280 2.88291 2.89335 2.90374 2.91371 2.92305 2.93187 2.94060
## V28 V29 V30 V31 V32 V33 V34 V35 V36
## 1 2.72628 2.73462 2.74416 2.75466 2.76568 2.77679 2.78790 2.79949 2.81225
## 2 3.05345 3.06777 3.08416 3.10221 3.12106 3.13983 3.15810 3.17623 3.19519
## 3 2.67892 2.68411 2.68937 2.69470 2.70012 2.70563 2.71141 2.71775 2.72490
## 4 2.92043 2.92565 2.93082 2.93604 2.94128 2.94658 2.95202 2.95777 2.96419
## 5 2.94986 2.96035 2.97241 2.98606 3.00097 3.01652 3.03220 3.04793 3.06413
## V37 V38 V39 V40 V41 V42 V43 V44 V45
## 1 2.82706 2.84356 2.86106 2.87857 2.89497 2.90924 2.92085 2.93015 2.93846
## 2 3.21584 3.23747 3.25889 3.27835 3.29384 3.30362 3.30681 3.30393 3.29700
## 3 2.73344 2.74327 2.75433 2.76642 2.77931 2.79272 2.80649 2.82064 2.83541
## 4 2.97159 2.98045 2.99090 3.00284 3.01611 3.03048 3.04579 3.06194 3.07889
## 5 3.08153 3.10078 3.12185 3.14371 3.16510 3.18470 3.20140 3.21477 3.22544
## V46 V47 V48 V49 V50 V51 V52 V53 V54
## 1 2.94771 2.96019 2.97831 3.00306 3.03506 3.07428 3.11963 3.16868 3.21771
## 2 3.28925 3.28409 3.28505 3.29326 3.30923 3.33267 3.36251 3.39661 3.43188
## 3 2.85121 2.86872 2.88905 2.91289 2.94088 2.97325 3.00946 3.04780 3.08554
## 4 3.09686 3.11629 3.13775 3.16217 3.19068 3.22376 3.26172 3.30379 3.34793
## 5 3.23505 3.24586 3.26027 3.28063 3.30889 3.34543 3.39019 3.44198 3.49800
## V55 V56 V57 V58 V59 V60 V61 V62 V63
## 1 3.26254 3.29988 3.32847 3.34899 3.36342 3.37379 3.38152 3.38741 3.39164
## 2 3.46492 3.49295 3.51458 3.53004 3.54067 3.54797 3.55306 3.55675 3.55921
## 3 3.11947 3.14696 3.16677 3.17938 3.18631 3.18924 3.18950 3.18801 3.18498
## 4 3.39093 3.42920 3.45998 3.48227 3.49687 3.50558 3.51026 3.51221 3.51215
## 5 3.55407 3.60534 3.64789 3.68011 3.70272 3.71815 3.72863 3.73574 3.74059
## V64 V65 V66 V67 V68 V69 V70 V71 V72
## 1 3.39418 3.39490 3.39366 3.39045 3.38541 3.37869 3.37041 3.36073 3.34979
## 2 3.56045 3.56034 3.55876 3.55571 3.55132 3.54585 3.53950 3.53235 3.52442
## 3 3.18039 3.17411 3.16611 3.15641 3.14512 3.13241 3.11843 3.10329 3.08714
## 4 3.51036 3.50682 3.50140 3.49398 3.48457 3.47333 3.46041 3.44595 3.43005
## 5 3.74357 3.74453 3.74336 3.73991 3.73418 3.72638 3.71676 3.70553 3.69289
## V73 V74 V75 V76 V77 V78 V79 V80 V81
## 1 3.33769 3.32443 3.31013 3.29487 3.27891 3.26232 3.24542 3.22828 3.21080
## 2 3.51583 3.50668 3.49700 3.48683 3.47626 3.46552 3.45501 3.44481 3.43477
## 3 3.07014 3.05237 3.03393 3.01504 2.99569 2.97612 2.95642 2.93660 2.91667
## 4 3.41285 3.39450 3.37511 3.35482 3.33376 3.31204 3.28986 3.26730 3.24442
## 5 3.67900 3.66396 3.64785 3.63085 3.61305 3.59463 3.57582 3.55695 3.53796
## V82 V83 V84 V85 V86 V87 V88 V89 V90
## 1 3.19287 3.17433 3.15503 3.13475 3.11339 3.09116 3.06850 3.04596 3.02393
## 2 3.42465 3.41419 3.40303 3.39082 3.37731 3.36265 3.34745 3.33245 3.31818
## 3 2.89655 2.87622 2.85563 2.83474 2.81361 2.79235 2.77113 2.75015 2.72956
## 4 3.22117 3.19757 3.17357 3.14915 3.12429 3.09908 3.07366 3.04825 3.02308
## 5 3.51880 3.49936 3.47938 3.45869 3.43711 3.41458 3.39129 3.36772 3.34450
## V91 V92 V93 V94 V95 V96 V97 V98 V99
## 1 3.00247 2.98145 2.96072 2.94013 2.91978 2.89966 2.87964 2.85960 2.83940
## 2 3.30473 3.29186 3.27921 3.26655 3.25369 3.24045 3.22659 3.21181 3.19600
## 3 2.70934 2.68951 2.67009 2.65112 2.63262 2.61461 2.59718 2.58034 2.56404
## 4 2.99820 2.97367 2.94951 2.92576 2.90251 2.87988 2.85794 2.83672 2.81617
## 5 3.32201 3.30025 3.27907 3.25831 3.23784 3.21765 3.19766 3.17770 3.15770
## V100 fat
## 1 2.81920 22.5
## 2 3.17942 40.1
## 3 2.54816 8.4
## 4 2.79622 5.9
## 5 3.13753 25.5
Correlaciones
df_correlaciones <- datos %>%
correlate(method = "pearson") %>%
stretch(remove.dups = TRUE)
## Correlation computed with
## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'
df_correlaciones %>% mutate(r_abs = abs(r)) %>%
arrange(desc(r_abs)) %>% head(25)
## # A tibble: 25 × 4
## x y r r_abs
## <chr> <chr> <dbl> <dbl>
## 1 V10 V11 1.00 1.00
## 2 V11 V12 1.00 1.00
## 3 V9 V10 1.00 1.00
## 4 V12 V13 1.00 1.00
## 5 V8 V9 1.00 1.00
## 6 V7 V8 1.00 1.00
## 7 V13 V14 1.00 1.00
## 8 V6 V7 1.00 1.00
## 9 V5 V6 1.00 1.00
## 10 V14 V15 1.00 1.00
## # ℹ 15 more rows
Las correlaciones son en valor absoluto superiores al 0,75 en muchos casos. Esto indica un alto grado de multicolinealidad aproximada y arroja complicaciones a los modelos de estimación econométrica habituales.
Modelos
Se ajustan varios modelos lineales con y sin regularización, con el objetivo de identificar cuál de ellos es capaz de predecir mejor el contenido en grasa de la carne en función de las señales registradas por el espectrofotómetro.
Para poder evaluar la capacidad predictiva de cada modelo, se dividen las observaciones disponibles en dos grupos: uno de entrenamiento (70%) y otro de test (30%).
# Split 70/30
#--------------------------
set.seed(12345)
#---------------------
id_train <- sample(1:nrow(datos), size = 0.7*nrow(datos),
replace = FALSE)
#---------------------------
datos_train <- datos[id_train,]
datos_test <- datos[-id_train,]
skim(datos_train)
| Name | datos_train |
| Number of rows | 150 |
| Number of columns | 101 |
| _______________________ | |
| Column type frequency: | |
| numeric | 101 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| V1 | 0 | 1 | 2.79 | 0.40 | 2.07 | 2.51 | 2.74 | 2.99 | 4.24 | ▃▇▃▁▁ |
| V2 | 0 | 1 | 2.79 | 0.40 | 2.07 | 2.51 | 2.74 | 2.99 | 4.25 | ▃▇▃▁▁ |
| V3 | 0 | 1 | 2.79 | 0.41 | 2.07 | 2.51 | 2.74 | 3.00 | 4.26 | ▃▇▃▁▁ |
| V4 | 0 | 1 | 2.79 | 0.41 | 2.06 | 2.51 | 2.74 | 3.01 | 4.27 | ▃▇▃▁▁ |
| V5 | 0 | 1 | 2.80 | 0.41 | 2.06 | 2.51 | 2.74 | 3.01 | 4.28 | ▃▇▃▁▁ |
| V6 | 0 | 1 | 2.80 | 0.41 | 2.06 | 2.51 | 2.75 | 3.02 | 4.29 | ▃▇▃▁▁ |
| V7 | 0 | 1 | 2.80 | 0.42 | 2.06 | 2.51 | 2.75 | 3.02 | 4.30 | ▃▇▃▁▁ |
| V8 | 0 | 1 | 2.81 | 0.42 | 2.06 | 2.52 | 2.75 | 3.03 | 4.31 | ▃▇▃▁▁ |
| V9 | 0 | 1 | 2.81 | 0.42 | 2.06 | 2.52 | 2.75 | 3.03 | 4.33 | ▅▇▃▁▁ |
| V10 | 0 | 1 | 2.81 | 0.42 | 2.06 | 2.52 | 2.76 | 3.04 | 4.34 | ▅▇▃▁▁ |
| V11 | 0 | 1 | 2.82 | 0.43 | 2.06 | 2.52 | 2.76 | 3.04 | 4.35 | ▅▇▃▁▁ |
| V12 | 0 | 1 | 2.82 | 0.43 | 2.06 | 2.52 | 2.76 | 3.05 | 4.37 | ▅▇▃▁▁ |
| V13 | 0 | 1 | 2.83 | 0.43 | 2.06 | 2.53 | 2.77 | 3.06 | 4.38 | ▅▇▃▁▁ |
| V14 | 0 | 1 | 2.83 | 0.44 | 2.06 | 2.53 | 2.77 | 3.06 | 4.40 | ▅▇▃▁▁ |
| V15 | 0 | 1 | 2.84 | 0.44 | 2.07 | 2.53 | 2.78 | 3.07 | 4.42 | ▅▇▃▁▁ |
| V16 | 0 | 1 | 2.84 | 0.44 | 2.07 | 2.54 | 2.79 | 3.08 | 4.44 | ▅▇▃▁▁ |
| V17 | 0 | 1 | 2.85 | 0.44 | 2.07 | 2.54 | 2.79 | 3.09 | 4.46 | ▅▇▃▁▁ |
| V18 | 0 | 1 | 2.86 | 0.45 | 2.07 | 2.54 | 2.80 | 3.10 | 4.48 | ▅▇▃▁▁ |
| V19 | 0 | 1 | 2.86 | 0.45 | 2.07 | 2.55 | 2.81 | 3.11 | 4.50 | ▅▇▃▁▁ |
| V20 | 0 | 1 | 2.87 | 0.45 | 2.07 | 2.55 | 2.82 | 3.12 | 4.53 | ▅▇▃▁▁ |
| V21 | 0 | 1 | 2.88 | 0.46 | 2.07 | 2.56 | 2.83 | 3.13 | 4.55 | ▅▇▃▁▁ |
| V22 | 0 | 1 | 2.89 | 0.46 | 2.08 | 2.57 | 2.83 | 3.14 | 4.58 | ▅▇▃▁▁ |
| V23 | 0 | 1 | 2.90 | 0.46 | 2.08 | 2.58 | 2.84 | 3.15 | 4.61 | ▅▇▃▁▁ |
| V24 | 0 | 1 | 2.90 | 0.47 | 2.08 | 2.58 | 2.84 | 3.16 | 4.63 | ▅▇▃▁▁ |
| V25 | 0 | 1 | 2.91 | 0.47 | 2.08 | 2.58 | 2.85 | 3.17 | 4.65 | ▅▇▃▁▁ |
| V26 | 0 | 1 | 2.92 | 0.47 | 2.09 | 2.59 | 2.86 | 3.18 | 4.68 | ▅▇▅▁▁ |
| V27 | 0 | 1 | 2.93 | 0.48 | 2.09 | 2.60 | 2.86 | 3.19 | 4.70 | ▅▇▅▁▁ |
| V28 | 0 | 1 | 2.94 | 0.48 | 2.09 | 2.60 | 2.87 | 3.20 | 4.73 | ▆▇▅▁▁ |
| V29 | 0 | 1 | 2.94 | 0.48 | 2.09 | 2.61 | 2.88 | 3.21 | 4.75 | ▅▇▅▁▁ |
| V30 | 0 | 1 | 2.95 | 0.49 | 2.09 | 2.61 | 2.88 | 3.24 | 4.79 | ▅▇▃▁▁ |
| V31 | 0 | 1 | 2.96 | 0.49 | 2.10 | 2.62 | 2.90 | 3.26 | 4.83 | ▅▇▅▁▁ |
| V32 | 0 | 1 | 2.98 | 0.50 | 2.10 | 2.63 | 2.91 | 3.28 | 4.86 | ▅▇▅▁▁ |
| V33 | 0 | 1 | 2.99 | 0.50 | 2.10 | 2.63 | 2.91 | 3.30 | 4.90 | ▅▇▅▁▁ |
| V34 | 0 | 1 | 3.00 | 0.51 | 2.10 | 2.64 | 2.92 | 3.33 | 4.94 | ▅▇▅▁▁ |
| V35 | 0 | 1 | 3.01 | 0.51 | 2.11 | 2.65 | 2.94 | 3.35 | 4.98 | ▅▇▅▁▁ |
| V36 | 0 | 1 | 3.02 | 0.51 | 2.11 | 2.66 | 2.95 | 3.38 | 5.01 | ▅▇▃▁▁ |
| V37 | 0 | 1 | 3.04 | 0.52 | 2.12 | 2.68 | 2.96 | 3.41 | 5.05 | ▅▇▃▁▁ |
| V38 | 0 | 1 | 3.05 | 0.52 | 2.13 | 2.69 | 2.98 | 3.43 | 5.09 | ▅▇▃▁▁ |
| V39 | 0 | 1 | 3.07 | 0.53 | 2.14 | 2.71 | 3.00 | 3.46 | 5.12 | ▅▇▅▁▁ |
| V40 | 0 | 1 | 3.09 | 0.53 | 2.15 | 2.73 | 3.02 | 3.48 | 5.15 | ▅▇▅▁▁ |
| V41 | 0 | 1 | 3.10 | 0.53 | 2.17 | 2.74 | 3.03 | 3.50 | 5.18 | ▅▇▅▁▁ |
| V42 | 0 | 1 | 3.12 | 0.53 | 2.18 | 2.76 | 3.05 | 3.51 | 5.19 | ▆▇▅▁▁ |
| V43 | 0 | 1 | 3.13 | 0.53 | 2.20 | 2.77 | 3.06 | 3.52 | 5.19 | ▆▇▅▁▁ |
| V44 | 0 | 1 | 3.15 | 0.53 | 2.22 | 2.79 | 3.07 | 3.52 | 5.17 | ▆▇▅▁▁ |
| V45 | 0 | 1 | 3.16 | 0.53 | 2.24 | 2.80 | 3.08 | 3.50 | 5.15 | ▅▇▃▁▁ |
| V46 | 0 | 1 | 3.17 | 0.52 | 2.27 | 2.81 | 3.09 | 3.49 | 5.13 | ▅▇▃▁▁ |
| V47 | 0 | 1 | 3.19 | 0.52 | 2.29 | 2.82 | 3.10 | 3.49 | 5.12 | ▆▇▅▁▁ |
| V48 | 0 | 1 | 3.21 | 0.52 | 2.32 | 2.85 | 3.12 | 3.50 | 5.11 | ▆▇▅▁▁ |
| V49 | 0 | 1 | 3.23 | 0.52 | 2.35 | 2.88 | 3.15 | 3.51 | 5.12 | ▆▇▅▁▁ |
| V50 | 0 | 1 | 3.26 | 0.52 | 2.39 | 2.91 | 3.18 | 3.54 | 5.14 | ▅▇▃▁▁ |
| V51 | 0 | 1 | 3.30 | 0.52 | 2.43 | 2.94 | 3.22 | 3.56 | 5.17 | ▆▇▃▁▁ |
| V52 | 0 | 1 | 3.34 | 0.52 | 2.48 | 2.98 | 3.27 | 3.61 | 5.21 | ▆▇▃▁▁ |
| V53 | 0 | 1 | 3.39 | 0.52 | 2.53 | 3.03 | 3.32 | 3.65 | 5.25 | ▆▇▃▁▁ |
| V54 | 0 | 1 | 3.44 | 0.52 | 2.57 | 3.08 | 3.36 | 3.69 | 5.30 | ▆▇▃▁▁ |
| V55 | 0 | 1 | 3.48 | 0.53 | 2.61 | 3.12 | 3.42 | 3.73 | 5.34 | ▆▇▃▁▁ |
| V56 | 0 | 1 | 3.52 | 0.53 | 2.65 | 3.15 | 3.46 | 3.76 | 5.38 | ▅▇▃▁▁ |
| V57 | 0 | 1 | 3.55 | 0.53 | 2.67 | 3.17 | 3.50 | 3.79 | 5.41 | ▆▇▃▁▁ |
| V58 | 0 | 1 | 3.57 | 0.53 | 2.69 | 3.19 | 3.52 | 3.81 | 5.43 | ▆▇▃▁▁ |
| V59 | 0 | 1 | 3.58 | 0.54 | 2.71 | 3.20 | 3.53 | 3.82 | 5.45 | ▆▇▃▁▁ |
| V60 | 0 | 1 | 3.59 | 0.54 | 2.72 | 3.20 | 3.54 | 3.82 | 5.45 | ▆▇▃▁▁ |
| V61 | 0 | 1 | 3.59 | 0.54 | 2.72 | 3.21 | 3.55 | 3.83 | 5.46 | ▆▇▃▁▁ |
| V62 | 0 | 1 | 3.60 | 0.54 | 2.73 | 3.21 | 3.55 | 3.83 | 5.46 | ▆▇▃▁▁ |
| V63 | 0 | 1 | 3.60 | 0.54 | 2.73 | 3.21 | 3.56 | 3.84 | 5.46 | ▆▇▃▁▁ |
| V64 | 0 | 1 | 3.60 | 0.54 | 2.73 | 3.21 | 3.55 | 3.84 | 5.47 | ▆▇▃▁▁ |
| V65 | 0 | 1 | 3.60 | 0.54 | 2.73 | 3.20 | 3.55 | 3.83 | 5.47 | ▆▇▃▁▁ |
| V66 | 0 | 1 | 3.59 | 0.54 | 2.73 | 3.20 | 3.54 | 3.83 | 5.47 | ▆▇▃▁▁ |
| V67 | 0 | 1 | 3.59 | 0.54 | 2.73 | 3.19 | 3.53 | 3.83 | 5.46 | ▆▇▃▁▁ |
| V68 | 0 | 1 | 3.58 | 0.54 | 2.72 | 3.18 | 3.52 | 3.83 | 5.45 | ▆▇▃▁▁ |
| V69 | 0 | 1 | 3.57 | 0.54 | 2.72 | 3.17 | 3.51 | 3.82 | 5.44 | ▆▇▃▁▁ |
| V70 | 0 | 1 | 3.56 | 0.54 | 2.71 | 3.16 | 3.50 | 3.81 | 5.44 | ▆▇▃▁▁ |
| V71 | 0 | 1 | 3.55 | 0.54 | 2.70 | 3.15 | 3.48 | 3.81 | 5.43 | ▆▇▃▁▁ |
| V72 | 0 | 1 | 3.53 | 0.54 | 2.69 | 3.13 | 3.46 | 3.80 | 5.42 | ▆▇▃▁▁ |
| V73 | 0 | 1 | 3.52 | 0.54 | 2.68 | 3.11 | 3.45 | 3.79 | 5.41 | ▆▇▃▁▁ |
| V74 | 0 | 1 | 3.50 | 0.54 | 2.66 | 3.09 | 3.43 | 3.78 | 5.40 | ▇▇▃▁▁ |
| V75 | 0 | 1 | 3.49 | 0.54 | 2.65 | 3.08 | 3.41 | 3.76 | 5.38 | ▇▇▃▁▁ |
| V76 | 0 | 1 | 3.47 | 0.54 | 2.63 | 3.06 | 3.39 | 3.75 | 5.37 | ▇▇▃▁▁ |
| V77 | 0 | 1 | 3.45 | 0.54 | 2.62 | 3.04 | 3.36 | 3.74 | 5.36 | ▇▇▃▁▁ |
| V78 | 0 | 1 | 3.44 | 0.54 | 2.60 | 3.02 | 3.35 | 3.73 | 5.34 | ▇▇▃▁▁ |
| V79 | 0 | 1 | 3.42 | 0.54 | 2.59 | 3.01 | 3.33 | 3.71 | 5.33 | ▇▇▅▁▁ |
| V80 | 0 | 1 | 3.40 | 0.54 | 2.57 | 2.99 | 3.31 | 3.70 | 5.32 | ▇▇▅▁▁ |
| V81 | 0 | 1 | 3.38 | 0.54 | 2.55 | 2.97 | 3.29 | 3.69 | 5.31 | ▇▇▅▁▁ |
| V82 | 0 | 1 | 3.36 | 0.54 | 2.53 | 2.96 | 3.27 | 3.68 | 5.30 | ▇▇▅▁▁ |
| V83 | 0 | 1 | 3.34 | 0.54 | 2.51 | 2.94 | 3.25 | 3.66 | 5.28 | ▇▇▅▁▁ |
| V84 | 0 | 1 | 3.32 | 0.54 | 2.50 | 2.92 | 3.23 | 3.64 | 5.27 | ▇▇▅▁▁ |
| V85 | 0 | 1 | 3.30 | 0.54 | 2.48 | 2.91 | 3.21 | 3.62 | 5.26 | ▇▇▅▁▁ |
| V86 | 0 | 1 | 3.28 | 0.54 | 2.46 | 2.89 | 3.18 | 3.60 | 5.24 | ▇▇▅▁▁ |
| V87 | 0 | 1 | 3.26 | 0.53 | 2.44 | 2.87 | 3.16 | 3.58 | 5.22 | ▇▇▅▁▁ |
| V88 | 0 | 1 | 3.24 | 0.53 | 2.42 | 2.85 | 3.14 | 3.56 | 5.20 | ▇▇▅▁▁ |
| V89 | 0 | 1 | 3.22 | 0.53 | 2.40 | 2.83 | 3.12 | 3.53 | 5.18 | ▇▇▅▁▁ |
| V90 | 0 | 1 | 3.19 | 0.53 | 2.38 | 2.81 | 3.10 | 3.52 | 5.16 | ▇▇▅▁▁ |
| V91 | 0 | 1 | 3.17 | 0.53 | 2.36 | 2.79 | 3.07 | 3.50 | 5.14 | ▇▇▅▁▁ |
| V92 | 0 | 1 | 3.15 | 0.53 | 2.34 | 2.77 | 3.05 | 3.49 | 5.13 | ▇▇▅▁▁ |
| V93 | 0 | 1 | 3.13 | 0.53 | 2.32 | 2.75 | 3.03 | 3.47 | 5.11 | ▇▇▅▁▁ |
| V94 | 0 | 1 | 3.11 | 0.53 | 2.30 | 2.73 | 3.01 | 3.46 | 5.10 | ▇▇▅▁▁ |
| V95 | 0 | 1 | 3.09 | 0.53 | 2.28 | 2.71 | 2.99 | 3.44 | 5.08 | ▇▇▅▁▁ |
| V96 | 0 | 1 | 3.07 | 0.53 | 2.26 | 2.70 | 2.97 | 3.43 | 5.06 | ▇▇▅▁▁ |
| V97 | 0 | 1 | 3.06 | 0.53 | 2.24 | 2.68 | 2.95 | 3.41 | 5.04 | ▇▇▅▁▁ |
| V98 | 0 | 1 | 3.04 | 0.53 | 2.22 | 2.66 | 2.93 | 3.39 | 5.02 | ▇▇▅▁▁ |
| V99 | 0 | 1 | 3.02 | 0.53 | 2.21 | 2.64 | 2.91 | 3.38 | 4.99 | ▇▇▅▁▁ |
| V100 | 0 | 1 | 3.00 | 0.52 | 2.19 | 2.63 | 2.89 | 3.36 | 4.97 | ▇▇▅▁▁ |
| fat | 0 | 1 | 17.54 | 12.52 | 0.90 | 7.20 | 14.30 | 26.65 | 48.50 | ▇▅▃▂▂ |
skim(datos_test)
| Name | datos_test |
| Number of rows | 65 |
| Number of columns | 101 |
| _______________________ | |
| Column type frequency: | |
| numeric | 101 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| V1 | 0 | 1 | 2.86 | 0.43 | 2.24 | 2.57 | 2.79 | 3.10 | 4.22 | ▇▇▃▂▁ |
| V2 | 0 | 1 | 2.86 | 0.43 | 2.24 | 2.57 | 2.79 | 3.11 | 4.23 | ▇▇▃▂▁ |
| V3 | 0 | 1 | 2.86 | 0.43 | 2.23 | 2.57 | 2.79 | 3.11 | 4.24 | ▇▇▃▂▁ |
| V4 | 0 | 1 | 2.87 | 0.44 | 2.23 | 2.57 | 2.79 | 3.11 | 4.25 | ▇▇▃▂▁ |
| V5 | 0 | 1 | 2.87 | 0.44 | 2.23 | 2.57 | 2.79 | 3.12 | 4.26 | ▇▇▃▂▁ |
| V6 | 0 | 1 | 2.87 | 0.44 | 2.23 | 2.57 | 2.79 | 3.12 | 4.27 | ▇▇▃▂▁ |
| V7 | 0 | 1 | 2.87 | 0.45 | 2.23 | 2.57 | 2.79 | 3.13 | 4.28 | ▇▇▃▂▁ |
| V8 | 0 | 1 | 2.88 | 0.45 | 2.23 | 2.58 | 2.79 | 3.13 | 4.29 | ▇▇▃▂▁ |
| V9 | 0 | 1 | 2.88 | 0.45 | 2.23 | 2.58 | 2.79 | 3.14 | 4.30 | ▇▇▃▂▁ |
| V10 | 0 | 1 | 2.88 | 0.45 | 2.23 | 2.58 | 2.79 | 3.14 | 4.31 | ▇▇▃▂▁ |
| V11 | 0 | 1 | 2.89 | 0.46 | 2.23 | 2.58 | 2.79 | 3.15 | 4.33 | ▇▇▃▂▁ |
| V12 | 0 | 1 | 2.89 | 0.46 | 2.23 | 2.59 | 2.79 | 3.15 | 4.34 | ▇▇▃▂▁ |
| V13 | 0 | 1 | 2.90 | 0.46 | 2.24 | 2.59 | 2.79 | 3.16 | 4.35 | ▇▇▃▂▁ |
| V14 | 0 | 1 | 2.90 | 0.47 | 2.24 | 2.59 | 2.80 | 3.17 | 4.37 | ▇▇▃▂▁ |
| V15 | 0 | 1 | 2.91 | 0.47 | 2.24 | 2.60 | 2.80 | 3.18 | 4.38 | ▇▇▃▂▁ |
| V16 | 0 | 1 | 2.92 | 0.47 | 2.24 | 2.60 | 2.81 | 3.18 | 4.40 | ▇▇▃▂▁ |
| V17 | 0 | 1 | 2.92 | 0.48 | 2.24 | 2.60 | 2.82 | 3.19 | 4.42 | ▇▇▃▂▁ |
| V18 | 0 | 1 | 2.93 | 0.48 | 2.25 | 2.61 | 2.83 | 3.20 | 4.43 | ▇▇▃▂▁ |
| V19 | 0 | 1 | 2.94 | 0.48 | 2.25 | 2.61 | 2.84 | 3.21 | 4.45 | ▇▇▃▂▁ |
| V20 | 0 | 1 | 2.95 | 0.49 | 2.25 | 2.62 | 2.85 | 3.23 | 4.47 | ▇▇▃▂▁ |
| V21 | 0 | 1 | 2.95 | 0.49 | 2.26 | 2.62 | 2.86 | 3.24 | 4.49 | ▇▇▃▂▁ |
| V22 | 0 | 1 | 2.96 | 0.50 | 2.26 | 2.63 | 2.87 | 3.25 | 4.51 | ▇▇▃▂▁ |
| V23 | 0 | 1 | 2.97 | 0.50 | 2.26 | 2.63 | 2.88 | 3.26 | 4.53 | ▇▇▃▂▁ |
| V24 | 0 | 1 | 2.98 | 0.50 | 2.27 | 2.64 | 2.89 | 3.27 | 4.55 | ▇▇▃▂▁ |
| V25 | 0 | 1 | 2.99 | 0.51 | 2.27 | 2.64 | 2.90 | 3.29 | 4.57 | ▇▇▃▂▁ |
| V26 | 0 | 1 | 3.00 | 0.51 | 2.27 | 2.65 | 2.90 | 3.30 | 4.59 | ▇▇▃▂▁ |
| V27 | 0 | 1 | 3.00 | 0.51 | 2.28 | 2.65 | 2.91 | 3.31 | 4.61 | ▇▇▃▂▁ |
| V28 | 0 | 1 | 3.01 | 0.52 | 2.28 | 2.66 | 2.91 | 3.32 | 4.63 | ▇▇▃▂▁ |
| V29 | 0 | 1 | 3.02 | 0.52 | 2.28 | 2.66 | 2.92 | 3.33 | 4.65 | ▇▇▃▂▁ |
| V30 | 0 | 1 | 3.03 | 0.53 | 2.29 | 2.67 | 2.92 | 3.35 | 4.68 | ▇▇▃▂▁ |
| V31 | 0 | 1 | 3.04 | 0.53 | 2.30 | 2.67 | 2.93 | 3.36 | 4.70 | ▇▇▃▂▁ |
| V32 | 0 | 1 | 3.06 | 0.54 | 2.30 | 2.68 | 2.94 | 3.38 | 4.73 | ▇▇▃▂▁ |
| V33 | 0 | 1 | 3.07 | 0.54 | 2.31 | 2.68 | 2.95 | 3.40 | 4.76 | ▇▇▃▂▁ |
| V34 | 0 | 1 | 3.08 | 0.55 | 2.31 | 2.69 | 2.95 | 3.41 | 4.78 | ▇▇▃▂▁ |
| V35 | 0 | 1 | 3.09 | 0.55 | 2.32 | 2.69 | 2.96 | 3.42 | 4.81 | ▇▇▃▂▁ |
| V36 | 0 | 1 | 3.11 | 0.56 | 2.33 | 2.70 | 2.96 | 3.43 | 4.84 | ▇▇▃▂▁ |
| V37 | 0 | 1 | 3.12 | 0.56 | 2.34 | 2.71 | 2.98 | 3.44 | 4.87 | ▇▇▃▂▁ |
| V38 | 0 | 1 | 3.14 | 0.57 | 2.35 | 2.72 | 2.99 | 3.46 | 4.90 | ▇▇▃▂▁ |
| V39 | 0 | 1 | 3.16 | 0.57 | 2.36 | 2.73 | 3.00 | 3.49 | 4.93 | ▇▇▅▂▁ |
| V40 | 0 | 1 | 3.18 | 0.58 | 2.37 | 2.74 | 3.02 | 3.52 | 4.96 | ▇▇▅▂▁ |
| V41 | 0 | 1 | 3.19 | 0.58 | 2.39 | 2.76 | 3.04 | 3.54 | 4.98 | ▇▇▅▂▁ |
| V42 | 0 | 1 | 3.21 | 0.58 | 2.40 | 2.77 | 3.06 | 3.55 | 5.00 | ▇▇▅▂▁ |
| V43 | 0 | 1 | 3.22 | 0.58 | 2.41 | 2.79 | 3.08 | 3.54 | 5.01 | ▇▇▃▂▁ |
| V44 | 0 | 1 | 3.23 | 0.57 | 2.42 | 2.81 | 3.10 | 3.55 | 5.02 | ▇▇▃▂▁ |
| V45 | 0 | 1 | 3.24 | 0.57 | 2.44 | 2.83 | 3.11 | 3.57 | 5.02 | ▇▇▃▂▁ |
| V46 | 0 | 1 | 3.25 | 0.56 | 2.45 | 2.85 | 3.13 | 3.58 | 5.02 | ▇▇▃▂▁ |
| V47 | 0 | 1 | 3.27 | 0.56 | 2.47 | 2.87 | 3.15 | 3.60 | 5.03 | ▇▇▃▂▁ |
| V48 | 0 | 1 | 3.28 | 0.55 | 2.49 | 2.90 | 3.17 | 3.63 | 5.04 | ▇▇▃▂▁ |
| V49 | 0 | 1 | 3.31 | 0.55 | 2.51 | 2.93 | 3.19 | 3.65 | 5.06 | ▇▇▃▂▁ |
| V50 | 0 | 1 | 3.34 | 0.55 | 2.54 | 2.96 | 3.22 | 3.68 | 5.09 | ▇▇▃▂▁ |
| V51 | 0 | 1 | 3.38 | 0.55 | 2.57 | 3.00 | 3.26 | 3.71 | 5.13 | ▇▇▃▂▁ |
| V52 | 0 | 1 | 3.42 | 0.55 | 2.61 | 3.05 | 3.30 | 3.76 | 5.17 | ▇▇▃▂▁ |
| V53 | 0 | 1 | 3.47 | 0.55 | 2.66 | 3.10 | 3.35 | 3.81 | 5.23 | ▇▇▃▂▁ |
| V54 | 0 | 1 | 3.52 | 0.55 | 2.70 | 3.15 | 3.38 | 3.86 | 5.28 | ▇▇▃▂▁ |
| V55 | 0 | 1 | 3.56 | 0.55 | 2.74 | 3.19 | 3.43 | 3.91 | 5.33 | ▇▇▃▂▁ |
| V56 | 0 | 1 | 3.60 | 0.55 | 2.77 | 3.23 | 3.47 | 3.94 | 5.37 | ▇▇▃▂▁ |
| V57 | 0 | 1 | 3.63 | 0.56 | 2.79 | 3.25 | 3.50 | 3.97 | 5.41 | ▇▇▃▂▁ |
| V58 | 0 | 1 | 3.65 | 0.56 | 2.81 | 3.27 | 3.53 | 3.98 | 5.43 | ▇▇▃▂▁ |
| V59 | 0 | 1 | 3.66 | 0.56 | 2.82 | 3.28 | 3.54 | 3.99 | 5.44 | ▇▇▃▂▁ |
| V60 | 0 | 1 | 3.67 | 0.56 | 2.83 | 3.28 | 3.55 | 4.00 | 5.45 | ▇▇▃▂▁ |
| V61 | 0 | 1 | 3.68 | 0.56 | 2.84 | 3.28 | 3.56 | 4.00 | 5.46 | ▇▇▃▂▁ |
| V62 | 0 | 1 | 3.68 | 0.56 | 2.84 | 3.28 | 3.56 | 4.00 | 5.47 | ▇▇▃▂▁ |
| V63 | 0 | 1 | 3.68 | 0.56 | 2.85 | 3.27 | 3.56 | 4.00 | 5.47 | ▇▇▃▂▁ |
| V64 | 0 | 1 | 3.68 | 0.56 | 2.85 | 3.26 | 3.56 | 3.99 | 5.47 | ▇▇▃▂▁ |
| V65 | 0 | 1 | 3.68 | 0.56 | 2.85 | 3.26 | 3.56 | 3.99 | 5.47 | ▇▇▃▂▁ |
| V66 | 0 | 1 | 3.68 | 0.56 | 2.85 | 3.26 | 3.56 | 3.98 | 5.47 | ▇▇▃▂▁ |
| V67 | 0 | 1 | 3.67 | 0.56 | 2.84 | 3.26 | 3.55 | 3.97 | 5.47 | ▇▇▃▂▁ |
| V68 | 0 | 1 | 3.66 | 0.56 | 2.84 | 3.25 | 3.55 | 3.95 | 5.46 | ▇▇▃▂▁ |
| V69 | 0 | 1 | 3.65 | 0.56 | 2.83 | 3.24 | 3.54 | 3.94 | 5.45 | ▇▇▃▂▁ |
| V70 | 0 | 1 | 3.64 | 0.56 | 2.82 | 3.24 | 3.53 | 3.92 | 5.44 | ▇▇▃▂▁ |
| V71 | 0 | 1 | 3.63 | 0.56 | 2.81 | 3.22 | 3.51 | 3.91 | 5.43 | ▇▇▃▂▁ |
| V72 | 0 | 1 | 3.62 | 0.56 | 2.80 | 3.21 | 3.50 | 3.89 | 5.42 | ▇▇▃▂▁ |
| V73 | 0 | 1 | 3.60 | 0.56 | 2.79 | 3.20 | 3.49 | 3.87 | 5.40 | ▇▇▃▂▁ |
| V74 | 0 | 1 | 3.59 | 0.56 | 2.77 | 3.19 | 3.48 | 3.85 | 5.39 | ▇▇▃▂▁ |
| V75 | 0 | 1 | 3.57 | 0.56 | 2.76 | 3.17 | 3.47 | 3.83 | 5.37 | ▇▇▃▂▁ |
| V76 | 0 | 1 | 3.56 | 0.56 | 2.74 | 3.16 | 3.45 | 3.80 | 5.36 | ▇▇▂▂▁ |
| V77 | 0 | 1 | 3.54 | 0.56 | 2.73 | 3.14 | 3.43 | 3.78 | 5.34 | ▇▇▂▂▁ |
| V78 | 0 | 1 | 3.52 | 0.56 | 2.71 | 3.12 | 3.42 | 3.76 | 5.32 | ▇▇▂▂▁ |
| V79 | 0 | 1 | 3.50 | 0.56 | 2.70 | 3.10 | 3.40 | 3.73 | 5.30 | ▇▇▂▂▁ |
| V80 | 0 | 1 | 3.48 | 0.56 | 2.68 | 3.08 | 3.38 | 3.71 | 5.29 | ▇▇▂▂▁ |
| V81 | 0 | 1 | 3.47 | 0.56 | 2.67 | 3.06 | 3.36 | 3.69 | 5.27 | ▇▇▂▂▁ |
| V82 | 0 | 1 | 3.45 | 0.56 | 2.65 | 3.04 | 3.34 | 3.67 | 5.25 | ▇▇▂▂▁ |
| V83 | 0 | 1 | 3.43 | 0.56 | 2.63 | 3.02 | 3.32 | 3.65 | 5.23 | ▇▇▂▂▁ |
| V84 | 0 | 1 | 3.41 | 0.56 | 2.62 | 3.00 | 3.30 | 3.63 | 5.21 | ▇▇▂▂▁ |
| V85 | 0 | 1 | 3.39 | 0.56 | 2.60 | 2.97 | 3.28 | 3.62 | 5.19 | ▇▇▂▂▁ |
| V86 | 0 | 1 | 3.37 | 0.56 | 2.58 | 2.95 | 3.26 | 3.60 | 5.17 | ▇▇▂▂▁ |
| V87 | 0 | 1 | 3.34 | 0.56 | 2.56 | 2.93 | 3.24 | 3.58 | 5.15 | ▇▇▂▂▁ |
| V88 | 0 | 1 | 3.32 | 0.56 | 2.54 | 2.91 | 3.21 | 3.56 | 5.13 | ▇▇▂▂▁ |
| V89 | 0 | 1 | 3.30 | 0.56 | 2.52 | 2.88 | 3.19 | 3.54 | 5.10 | ▇▇▂▂▁ |
| V90 | 0 | 1 | 3.28 | 0.56 | 2.50 | 2.86 | 3.17 | 3.53 | 5.08 | ▇▇▂▂▁ |
| V91 | 0 | 1 | 3.26 | 0.56 | 2.48 | 2.84 | 3.15 | 3.51 | 5.05 | ▇▇▃▂▁ |
| V92 | 0 | 1 | 3.24 | 0.56 | 2.45 | 2.82 | 3.13 | 3.49 | 5.03 | ▇▇▃▂▁ |
| V93 | 0 | 1 | 3.22 | 0.56 | 2.43 | 2.79 | 3.11 | 3.48 | 5.01 | ▇▇▃▂▁ |
| V94 | 0 | 1 | 3.20 | 0.57 | 2.41 | 2.78 | 3.09 | 3.46 | 4.99 | ▇▆▂▂▁ |
| V95 | 0 | 1 | 3.18 | 0.56 | 2.39 | 2.76 | 3.07 | 3.45 | 4.97 | ▇▆▂▂▁ |
| V96 | 0 | 1 | 3.16 | 0.56 | 2.37 | 2.74 | 3.05 | 3.43 | 4.95 | ▇▇▃▂▁ |
| V97 | 0 | 1 | 3.14 | 0.56 | 2.35 | 2.72 | 3.03 | 3.41 | 4.92 | ▇▇▂▂▁ |
| V98 | 0 | 1 | 3.12 | 0.56 | 2.33 | 2.71 | 3.01 | 3.39 | 4.90 | ▇▇▂▂▁ |
| V99 | 0 | 1 | 3.10 | 0.56 | 2.31 | 2.69 | 2.99 | 3.37 | 4.88 | ▇▇▂▂▁ |
| V100 | 0 | 1 | 3.08 | 0.56 | 2.30 | 2.67 | 2.98 | 3.35 | 4.85 | ▇▇▂▂▁ |
| fat | 0 | 1 | 19.54 | 13.23 | 4.70 | 8.60 | 13.80 | 29.80 | 49.10 | ▇▂▂▃▁ |
OLS
# Creación y entrenamiento del modelo
# ============================================
modelo <- lm(fat~., data = datos_train)
summary(modelo)
##
## Call:
## lm(formula = fat ~ ., data = datos_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.81096 -0.33544 -0.00993 0.31509 1.30542
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.199 2.327 2.664 0.010423 *
## V1 16776.779 4353.224 3.854 0.000339 ***
## V2 -13326.710 7239.082 -1.841 0.071689 .
## V3 -10526.565 11628.830 -0.905 0.369782
## V4 3785.624 20218.423 0.187 0.852248
## V5 2842.684 25250.315 0.113 0.910823
## V6 -9889.824 21999.445 -0.450 0.655018
## V7 6014.653 13867.225 0.434 0.666387
## V8 -6215.082 8030.689 -0.774 0.442698
## V9 10744.921 6645.318 1.617 0.112317
## V10 -315.483 7934.885 -0.040 0.968447
## V11 -10380.567 12659.142 -0.820 0.416184
## V12 19779.258 22390.683 0.883 0.381350
## V13 -4449.354 29308.849 -0.152 0.879961
## V14 -10638.697 28419.956 -0.374 0.709767
## V15 34255.100 20376.230 1.681 0.099102 .
## V16 -30123.632 10704.020 -2.814 0.007020 **
## V17 -75.368 7743.744 -0.010 0.992274
## V18 99.628 7635.165 0.013 0.989642
## V19 -1384.224 11216.155 -0.123 0.902285
## V20 13785.457 20680.049 0.667 0.508150
## V21 -49353.858 26554.886 -1.859 0.069101 .
## V22 70137.256 28356.540 2.473 0.016898 *
## V23 -37424.808 23323.869 -1.605 0.115015
## V24 5767.856 17500.720 0.330 0.743123
## V25 4936.222 10903.700 0.453 0.652755
## V26 -7103.984 7057.236 -1.007 0.319061
## V27 -4330.953 6876.797 -0.630 0.531756
## V28 22339.430 10401.940 2.148 0.036716 *
## V29 -31637.792 15042.903 -2.103 0.040609 *
## V30 22168.926 20509.902 1.081 0.285038
## V31 -18897.363 21852.706 -0.865 0.391384
## V32 27689.790 19010.258 1.457 0.151613
## V33 -5598.085 14911.598 -0.375 0.708970
## V34 -20683.940 12648.024 -1.635 0.108384
## V35 2743.136 8868.634 0.309 0.758399
## V36 12915.838 7142.232 1.808 0.076687 .
## V37 -6516.866 7819.530 -0.833 0.408658
## V38 -378.097 9784.303 -0.039 0.969332
## V39 14013.549 13955.442 1.004 0.320233
## V40 -32184.905 21462.440 -1.500 0.140136
## V41 49310.281 25469.885 1.936 0.058645 .
## V42 -37458.260 24917.028 -1.503 0.139175
## V43 6127.775 21751.701 0.282 0.779349
## V44 -16729.080 16502.130 -1.014 0.315680
## V45 35314.376 13630.402 2.591 0.012577 *
## V46 -6967.906 8873.826 -0.785 0.436105
## V47 -14172.101 4379.138 -3.236 0.002173 **
## V48 2873.504 6491.564 0.443 0.659965
## V49 10879.491 10090.175 1.078 0.286212
## V50 -23993.762 12587.231 -1.906 0.062499 .
## V51 27653.593 16377.204 1.689 0.097662 .
## V52 -15538.739 19923.577 -0.780 0.439191
## V53 -2736.448 21398.322 -0.128 0.898766
## V54 13796.987 20462.700 0.674 0.503321
## V55 -12763.381 16057.620 -0.795 0.430535
## V56 7358.973 8294.391 0.887 0.379293
## V57 -755.985 5494.619 -0.138 0.891131
## V58 -7868.715 5631.748 -1.397 0.168647
## V59 7856.068 5973.581 1.315 0.194586
## V60 -4710.959 4918.070 -0.958 0.342822
## V61 2352.192 5776.070 0.407 0.685612
## V62 5532.017 6534.671 0.847 0.401354
## V63 7434.491 6686.896 1.112 0.271650
## V64 -17788.055 8784.927 -2.025 0.048351 *
## V65 -2418.594 13504.547 -0.179 0.858602
## V66 18154.802 14513.602 1.251 0.216918
## V67 -16104.110 15174.069 -1.061 0.293761
## V68 6129.342 16085.136 0.381 0.704809
## V69 9051.910 12831.105 0.705 0.483861
## V70 -5345.969 10066.983 -0.531 0.597790
## V71 -13039.080 9220.380 -1.414 0.163638
## V72 3868.698 8343.336 0.464 0.644925
## V73 10511.778 7046.082 1.492 0.142146
## V74 -8254.350 7064.843 -1.168 0.248311
## V75 -5162.865 6359.975 -0.812 0.420847
## V76 8498.050 6550.718 1.297 0.200613
## V77 -588.166 7826.519 -0.075 0.940401
## V78 3605.178 7926.150 0.455 0.651229
## V79 4661.430 8471.082 0.550 0.584630
## V80 -19983.290 9205.431 -2.171 0.034818 *
## V81 17167.721 13003.516 1.320 0.192890
## V82 3919.401 14540.845 0.270 0.788643
## V83 -21457.013 15985.643 -1.342 0.185695
## V84 -4080.855 18074.186 -0.226 0.822309
## V85 39248.222 17910.722 2.191 0.033212 *
## V86 -17134.431 21517.719 -0.796 0.429702
## V87 -13708.692 27608.583 -0.497 0.621736
## V88 16898.807 25885.785 0.653 0.516922
## V89 -18159.340 17714.952 -1.025 0.310357
## V90 15062.858 17358.989 0.868 0.389774
## V91 -11242.618 19692.451 -0.571 0.570670
## V92 27591.350 17074.439 1.616 0.112527
## V93 -43158.595 14154.367 -3.049 0.003695 **
## V94 20219.958 13471.338 1.501 0.139782
## V95 14230.660 11790.418 1.207 0.233239
## V96 -15184.337 10440.314 -1.454 0.152213
## V97 -8479.926 11231.912 -0.755 0.453872
## V98 12283.017 10099.718 1.216 0.229745
## V99 -3296.151 9679.130 -0.341 0.734904
## V100 1331.219 4225.157 0.315 0.754047
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9776 on 49 degrees of freedom
## Multiple R-squared: 0.998, Adjusted R-squared: 0.9939
## F-statistic: 243.9 on 100 and 49 DF, p-value: < 2.2e-16
# Coeficientes del modelo
# ============================
df_coeficientes <- modelo$coefficients %>%
enframe(name="predictor", value = "coeficiente")
df_coeficientes %>%
filter(predictor != "(Intercept)") %>%
ggplot(aes(x = predictor, y = coeficiente)) +
geom_col() +
labs(title = "Coeficientes del modelo OLS") +
theme_bw() +
theme(axis.text.x = element_text(size = 5, angle = 45))
# Predicciones de entrenamiento
# ========================================================
predicciones_train <- predict(modelo, newdata = datos_train)
# MSE de entrenamiento
# ========================================================
training_mse <- mean((predicciones_train - datos_train$fat)^2)
paste("Error (mse) de entrenamiento:", training_mse)
## [1] "Error (mse) de entrenamiento: 0.312182683845749"
# Predicciones de entrenamiento
# ========================================================
predicciones_test <- predict(modelo, newdata = datos_train)
# MSE de entrenamiento
# ========================================================
test_mse <- mean((predicciones_test - datos_test$fat)^2)
## Warning in predicciones_test - datos_test$fat: longitud de objeto mayor no es
## múltiplo de la longitud de uno menor
paste("Error (mse) de test:", test_mse)
## [1] "Error (mse) de test: 351.516221163765"
El modelo falla mucho al predecir la muestra de test, dado que hemos puesto muchas variables irrelevantes (al ser su p-valor superior a 0,05 y 0,10).
Stepwise Selection
# Creación y entrenamiento del modelo
# ========================================
modelo <- step(
object = lm(formula = fat~., data = datos_train),
direction = "backward",
scope = list(upper = ~., lower = ~1),
trace = FALSE
)
summary(modelo)
##
## Call:
## lm(formula = fat ~ V1 + V2 + V3 + V5 + V6 + V7 + V8 + V9 + V11 +
## V12 + V14 + V15 + V16 + V20 + V21 + V22 + V23 + V25 + V26 +
## V27 + V28 + V29 + V30 + V31 + V32 + V34 + V36 + V37 + V39 +
## V40 + V41 + V42 + V44 + V45 + V46 + V47 + V49 + V50 + V51 +
## V52 + V54 + V55 + V56 + V58 + V59 + V60 + V62 + V63 + V64 +
## V66 + V67 + V68 + V71 + V72 + V73 + V75 + V76 + V78 + V80 +
## V81 + V83 + V85 + V86 + V87 + V88 + V89 + V90 + V92 + V93 +
## V94 + V95 + V96 + V98, data = datos_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.73805 -0.31207 -0.02096 0.35422 1.34492
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.744 1.594 3.602 0.000560 ***
## V1 17374.996 2625.452 6.618 4.56e-09 ***
## V2 -13458.894 4315.880 -3.118 0.002568 **
## V3 -10021.321 4483.386 -2.235 0.028340 *
## V5 8873.956 6161.810 1.440 0.153931
## V6 -17707.733 8058.970 -2.197 0.031048 *
## V7 11968.908 6448.652 1.856 0.067325 .
## V8 -9045.644 5238.605 -1.727 0.088279 .
## V9 13218.526 4036.320 3.275 0.001593 **
## V11 -13392.887 5177.135 -2.587 0.011593 *
## V12 21460.063 5982.637 3.587 0.000589 ***
## V14 -16395.542 6436.768 -2.547 0.012881 *
## V15 37710.675 8602.988 4.383 3.70e-05 ***
## V16 -33395.483 5336.808 -6.258 2.12e-08 ***
## V20 14378.846 6111.062 2.353 0.021215 *
## V21 -54030.554 13728.898 -3.936 0.000182 ***
## V22 76362.308 16124.183 4.736 9.92e-06 ***
## V23 -36683.573 9438.069 -3.887 0.000216 ***
## V25 7432.861 3717.562 1.999 0.049141 *
## V26 -6883.179 4612.653 -1.492 0.139776
## V27 -6787.296 5035.998 -1.348 0.181741
## V28 25767.852 6761.136 3.811 0.000279 ***
## V29 -30128.272 9878.792 -3.050 0.003151 **
## V30 18283.043 12318.178 1.484 0.141883
## V31 -14754.572 12528.085 -1.178 0.242583
## V32 21714.999 7880.459 2.756 0.007330 **
## V34 -23365.531 3319.950 -7.038 7.45e-10 ***
## V36 18088.188 3930.022 4.603 1.64e-05 ***
## V37 -9759.285 3964.922 -2.461 0.016107 *
## V39 18511.114 6364.613 2.908 0.004760 **
## V40 -42214.350 11747.923 -3.593 0.000577 ***
## V41 60044.053 12314.618 4.876 5.80e-06 ***
## V42 -39929.205 7954.955 -5.019 3.32e-06 ***
## V44 -13761.794 6392.488 -2.153 0.034507 *
## V45 39383.889 7906.043 4.981 3.85e-06 ***
## V46 -9797.934 5350.685 -1.831 0.070997 .
## V47 -14030.237 2955.085 -4.748 9.48e-06 ***
## V49 14464.098 3666.919 3.944 0.000177 ***
## V50 -23797.812 6782.161 -3.509 0.000760 ***
## V51 27296.874 7250.459 3.765 0.000326 ***
## V52 -18319.420 4433.528 -4.132 9.16e-05 ***
## V54 13504.864 5764.905 2.343 0.021769 *
## V55 -14364.088 7237.832 -1.985 0.050800 .
## V56 8731.679 3366.156 2.594 0.011378 *
## V58 -9473.174 2535.871 -3.736 0.000360 ***
## V59 8603.279 3629.882 2.370 0.020319 *
## V60 -4155.946 2447.494 -1.698 0.093591 .
## V62 7235.310 2740.745 2.640 0.010059 *
## V63 6604.565 3870.619 1.706 0.092030 .
## V64 -16869.742 3522.480 -4.789 8.09e-06 ***
## V66 15430.528 3881.127 3.976 0.000159 ***
## V67 -17037.933 5711.840 -2.983 0.003836 **
## V68 12448.160 3706.595 3.358 0.001228 **
## V71 -18039.513 3206.377 -5.626 2.93e-07 ***
## V72 8476.147 5147.524 1.647 0.103760
## V73 4597.814 3800.470 1.210 0.230105
## V75 -9535.421 2994.783 -3.184 0.002106 **
## V76 9526.560 3034.171 3.140 0.002408 **
## V78 6012.384 2297.848 2.617 0.010712 *
## V80 -16953.204 3785.671 -4.478 2.61e-05 ***
## V81 17019.500 4172.935 4.079 0.000111 ***
## V83 -20558.202 2863.663 -7.179 4.03e-10 ***
## V85 36199.812 6265.898 5.777 1.58e-07 ***
## V86 -13291.846 11326.368 -1.174 0.244248
## V87 -24080.592 14549.097 -1.655 0.102023
## V88 28970.441 13519.529 2.143 0.035325 *
## V89 -23400.860 10184.908 -2.298 0.024337 *
## V90 10544.734 6479.714 1.627 0.107803
## V92 18892.623 5992.571 3.153 0.002316 **
## V93 -42141.740 8525.172 -4.943 4.47e-06 ***
## V94 22141.673 9021.962 2.454 0.016409 *
## V95 14711.915 7341.744 2.004 0.048648 *
## V96 -20272.005 4342.588 -4.668 1.28e-05 ***
## V98 5850.082 1155.296 5.064 2.79e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8167 on 76 degrees of freedom
## Multiple R-squared: 0.9978, Adjusted R-squared: 0.9957
## F-statistic: 478.5 on 73 and 76 DF, p-value: < 2.2e-16
# Coeficientes del modelo
# ==============================================================================
df_coeficientes <- modelo$coefficients %>%
enframe(name = "predictor", value = "coeficiente")
df_coeficientes %>%
filter(predictor != "(Intercept)") %>%
ggplot(aes(x = predictor, y = coeficiente)) +
geom_col() +
labs(title = "Coeficientes del modelo Stepwise") +
theme_bw() +
theme(axis.text.x = element_text(size = 6, angle = 45))
paste("Número de predictores incluidos en el modelo:", length(modelo$coefficients))
## [1] "Número de predictores incluidos en el modelo: 74"
# Predicciones de entrenamiento
# ==============================================================================
predicciones_train <- predict(modelo, newdata = datos_train)
# MSE de entrenamiento
# ==============================================================================
training_mse <- mean((predicciones_train - datos_train$fat)^2)
paste("Error (mse) de entrenamiento:", training_mse)
## [1] "Error (mse) de entrenamiento: 0.337976226617134"
# Predicciones de test
# ==============================================================================
predicciones_test <- predict(modelo, newdata = datos_test)
# MSE de test
# ==============================================================================
test_mse_step <- mean((predicciones_test - datos_test$fat)^2)
paste("Error (mse) de test:", test_mse_step)
## [1] "Error (mse) de test: 26.1301139614897"
El error de test sigue siendo elevado, pero más bajo que la primera regresión OLS.
RIDGE
# Matrices de entrenamiento y test
# ==============================================================================
x_train <- model.matrix(fat~., data = datos_train)[, -1]
y_train <- datos_train$fat
x_test <- model.matrix(fat~., data = datos_test)[, -1]
y_test <- datos_test$fat
# Creación y entrenamiento del modelo
# ==============================================================================
# Para obtener un ajuste con regularización Ridge se indica argumento alpha=0.
# Si no se especifica valor de lambda, se selecciona un rango automático.
modelo <- glmnet(
x = x_train,
y = y_train,
alpha = 0,
nlambda = 100,
standardize = TRUE
)
# Evolución de los coeficientes en función de lambda
# ==============================================================================
regularizacion <- modelo$beta %>%
as.matrix() %>%
t() %>%
as_tibble() %>%
mutate(lambda = modelo$lambda)
regularizacion <- regularizacion %>%
pivot_longer(
cols = !lambda,
names_to = "predictor",
values_to = "coeficientes"
)
regularizacion %>%
ggplot(aes(x = lambda, y = coeficientes, color = predictor)) +
geom_line() +
scale_x_log10(
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x))
) +
labs(title = "Coeficientes del modelo en función de la regularización") +
theme_bw() +
theme(legend.position = "none")
Vemos que al final se mantiene estacionario en un punto, sin apenas variabilidad, que es de lo que se trata.
# Evolución del error en función de lambda
# ==============================================================================
set.seed(123)
cv_error <- cv.glmnet(
x = x_train,
y = y_train,
alpha = 0,
nfolds = 10,
type.measure = "mse",
standardize = TRUE
)
plot(cv_error)
# Mejor valor lambda encontrado
# ==============================================================================
paste("Mejor valor de lambda encontrado:", cv_error$lambda.min)
## [1] "Mejor valor de lambda encontrado: 1.19461351901119"
paste("Mejor valor de lambda encontrado + 1 desviación estándar:", cv_error$lambda.1se)
## [1] "Mejor valor de lambda encontrado + 1 desviación estándar: 1.43891621288998"
# Mejor modelo lambda óptimo + 1sd
# ============================================================
modelo <- glmnet(
x = x_train,
y = y_train,
alpha = 0,
lambda = cv_error$lambda.1se,
standardize = TRUE
)
# Coeficientes del modelo
# ==============================================================================
df_coeficientes <- coef(modelo) %>%
as.matrix() %>%
as_tibble(rownames = "predictor") %>%
rename(coeficiente = s0)
df_coeficientes %>%
filter(predictor != "(Intercept)") %>%
ggplot(aes(x = predictor, y = coeficiente)) +
geom_col() +
labs(title = "Coeficientes del modelo Ridge") +
theme_bw() +
theme(axis.text.x = element_text(size = 6, angle = 45))
# Predicciones de entrenamiento
# ==============================================================================
predicciones_train <- predict(modelo, newx = x_train)
# MSE de entrenamiento
# ==============================================================================
training_mse <- mean((predicciones_train - y_train)^2)
paste("Error (mse) de entrenamiento:", training_mse)
## [1] "Error (mse) de entrenamiento: 35.4876331480252"
# Predicciones de test
# ==============================================================================
predicciones_test <- predict(modelo, newx = x_test)
# MSE de test
# ==============================================================================
test_mse_ridge <- mean((predicciones_test - y_test)^2)
paste("Error (mse) de test:", test_mse_ridge)
## [1] "Error (mse) de test: 23.6310482002904"
Muy similar al modelo anterior.
LASSO
# Matrices de entrenamiento y test
# ==============================================================================
x_train <- model.matrix(fat~., data = datos_train)[, -1]
y_train <- datos_train$fat
x_test <- model.matrix(fat~., data = datos_test)[, -1]
y_test <- datos_test$fat
# Creación y entrenamiento del modelo
# ==============================================================================
# Para obtener un ajuste con regularización Lasso se indica argumento alpha=1.
# Si no se especifica valor de lambda, se selecciona un rango automático.
modelo <- glmnet(
x = x_train,
y = y_train,
alpha = 1, #1 es para Lasso, y 0 es para Ridge
nlambda = 100,
standardize = TRUE
)
# Evolución de los coeficientes en función de lambda
# ==============================================================================
regularizacion <- modelo$beta %>%
as.matrix() %>%
t() %>%
as_tibble() %>%
mutate(lambda = modelo$lambda)
regularizacion <- regularizacion %>%
pivot_longer(
cols = !lambda,
names_to = "predictor",
values_to = "coeficientes"
)
regularizacion %>%
ggplot(aes(x = lambda, y = coeficientes, color = predictor)) +
geom_line() +
scale_x_log10(
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x))
) +
labs(title = "Coeficientes del modelo en función de la regularización") +
theme_bw() +
theme(legend.position = "none")
Vemos que a partir de lambda = 10^0 la mayoría de los coeficientes
converge, excepto el verde, que tarda un poco más.
# Evolución del error en función de lambda
# ==============================================================================
set.seed(123)
cv_error <- cv.glmnet(
x = x_train,
y = y_train,
alpha = 1,
nfolds = 10,
type.measure = "mse",
standardize = TRUE
)
plot(cv_error)
paste("Mejor valor de lambda encontrado:", cv_error$lambda.min)
## [1] "Mejor valor de lambda encontrado: 0.00924945862610264"
paste("Mejor valor de lambda encontrado + 1 desviación estándar:", cv_error$lambda.1se)
## [1] "Mejor valor de lambda encontrado + 1 desviación estándar: 0.0340230215106218"
# Mejor modelo lambda óptimo + 1sd
# ==============================================================================
modelo <- glmnet(
x = x_train,
y = y_train,
alpha = 1,
lambda = cv_error$lambda.1se,
standardize = TRUE
)
# Coeficientes del modelo
# ==============================================================================
df_coeficientes <- coef(modelo) %>%
as.matrix() %>%
as_tibble(rownames = "predictor") %>%
rename(coeficiente = s0)
df_coeficientes %>%
filter(predictor != "(Intercept)") %>%
ggplot(aes(x = predictor, y = coeficiente)) +
geom_col() +
labs(title = "Coeficientes del modelo Lasso") +
theme_bw() +
theme(axis.text.x = element_text(size = 6, angle = 45))
df_coeficientes %>%
filter(
predictor != "(Intercept)",
coeficiente != 0
)
## # A tibble: 8 × 2
## predictor coeficiente
## <chr> <dbl>
## 1 V16 -23.3
## 2 V17 -36.6
## 3 V18 -24.8
## 4 V19 -9.81
## 5 V40 110.
## 6 V41 40.1
## 7 V50 -42.0
## 8 V51 -20.6
# Predicciones de entrenamiento
# ==============================================================================
predicciones_train <- predict(modelo, newx = x_train)
# MSE de entrenamiento
# ==============================================================================
training_mse <- mean((predicciones_train - y_train)^2)
paste("Error (mse) de entrenamiento:", training_mse)
## [1] "Error (mse) de entrenamiento: 11.5872605021096"
# Predicciones de test
# ==============================================================================
predicciones_test <- predict(modelo, newx = x_test)
# MSE de test
# ==============================================================================
test_mse_lasso <- mean((predicciones_test - y_test)^2)
paste("Error (mse) de test:", test_mse_lasso)
## [1] "Error (mse) de test: 9.7099854494933"
Es bastante más bajo que el Ridge; con lo cual, tiene más precisión.
ELASTIC NET
PCR
# Creación y entrenamiento del modelo
# ==============================================================================
set.seed(123)
# Importante estandarizar las variables indicándolo con el argumento scale
# Indicando validation = CV, se emplea 10-fold-cross-validation para
# identificar el número óptimo de componentes.
modelo_pcr <- pcr(fat ~ ., data = datos_train, scale = TRUE, validation = "CV")
# Evolución del error en función del número de componentes
# ==============================================================================
error_cv <- MSEP(modelo_pcr, estimate = "CV")
n_componentes_optimo <- which.min(error_cv$val)
error_cv <- data.frame(
componentes = seq_along(as.vector(error_cv$val)) - 1,
mse = as.vector(error_cv$val)
)
error_cv %>% head()
## componentes mse
## 1 0 157.78339
## 2 1 130.68153
## 3 2 129.25956
## 4 3 74.19071
## 5 4 21.10944
## 6 5 12.47395
ggplot(data = error_cv, aes(x = componentes, y = mse)) +
geom_line() +
geom_vline(xintercept = n_componentes_optimo, color = 'red', linetype = 'dashed') +
labs(title = "Error en función de las componentes incluidas") +
theme_bw()
# Mejor número de componentes encontrados
# ==============================================================================
paste("Número de componentes óptimo:", n_componentes_optimo)
## [1] "Número de componentes óptimo: 37"
modelo <- pcr(fat ~ ., data = datos_train, scale = TRUE, ncomp = n_componentes_optimo)
summary(modelo)
## Data: X dimension: 150 100
## Y dimension: 150 1
## Fit method: svdpc
## Number of components considered: 37
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 98.48 99.60 99.88 99.99 100.00 100.00 100.00 100.00
## fat 17.67 19.82 55.01 87.66 93.27 94.46 94.49 94.98
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 100.00 100.00 100.00 100.00 100.00 100.00 100.00
## fat 95.47 95.58 96.41 96.54 96.56 96.56 97.55
## 16 comps 17 comps 18 comps 19 comps 20 comps 21 comps 22 comps
## X 100.0 100.00 100.00 100.00 100.00 100.00 100.00
## fat 97.8 97.91 97.95 97.95 97.97 97.97 97.98
## 23 comps 24 comps 25 comps 26 comps 27 comps 28 comps 29 comps
## X 100.00 100.00 100.00 100.00 100.00 100.00 100.00
## fat 97.99 98.06 98.18 98.19 98.19 98.32 98.32
## 30 comps 31 comps 32 comps 33 comps 34 comps 35 comps 36 comps
## X 100.00 100.00 100.00 100.00 100.00 100.00 100.00
## fat 98.53 98.62 98.67 98.67 98.68 98.96 98.96
## 37 comps
## X 100.00
## fat 98.96
# Predicciones de entrenamiento
# ==============================================================================
predicciones_train <- predict(modelo, newdata = datos_train)
# MSE de entrenamiento
# ==============================================================================
training_mse <- mean((predicciones_train - datos_train$fat)^2)
paste("Error (mse) de entrenamiento:", training_mse)
## [1] "Error (mse) de entrenamiento: 12.8382292578259"
# Predicciones de test
# ==============================================================================
predicciones_test <- predict(modelo, newdata = datos_test)
# MSE de test
# ==============================================================================
test_mse_pcr <- mean((predicciones_test - datos_test$fat)^2)
paste("Error (mse) de test:", test_mse_pcr)
## [1] "Error (mse) de test: 18.2691375004994"
PLS
# Creación y entrenamiento del modelo
# ==============================================================================
set.seed(123)
# Importante estandarizar las variables indicándolo con el argumento scale
# Indicando validation = CV, se emplea 10-fold-cross-validation para
# identificar el número óptimo de componentes.
modelo_plsr <- plsr(fat ~ ., data = datos_train, scale = TRUE, validation = "CV")
ggplot(data = error_cv, aes(x = componentes, y = mse)) +
geom_line() +
geom_vline(xintercept = n_componentes_optimo, color = 'red', linetype = 'dashed') +
labs(title = "Error en función de las componentes incluidas") +
theme_bw()
paste("Número de componentes óptimo:", n_componentes_optimo)
## [1] "Número de componentes óptimo: 37"
modelo <- plsr(fat ~ ., data = datos_train, scale = TRUE, ncomp = n_componentes_optimo)
# Predicciones de entrenamiento
# ==============================================================================
predicciones_train <- predict(modelo, newdata = datos_train)
# MSE de entrenamiento
# ==============================================================================
training_mse <- mean((predicciones_train - datos_train$fat)^2)
paste("Error (mse) de entrenamiento:", training_mse)
## [1] "Error (mse) de entrenamiento: 8.60265770012269"
# Predicciones de test
# ==============================================================================
predicciones_test <- predict(modelo, newdata = datos_test)
# MSE de test
# ==============================================================================
test_mse_plsr <- mean((predicciones_test - datos_test$fat)^2)
paste("Error (mse) de test:", test_mse_plsr)
## [1] "Error (mse) de test: 20.0927168415668"
COMPARACIÓN
df_comparacion <- data.frame(
modelo = c("ols", "Stepwise", "Ridge", "Lasso", "PCR","PLS"),
mse = c(test_mse, test_mse_step, test_mse_ridge,
test_mse_lasso, test_mse_pcr, test_mse_plsr)
)
ggplot(data = df_comparacion, aes(x = modelo, y = mse)) +
geom_col(width = 0.5) +
geom_text(aes(label = round(mse, 2)), vjust = -0.1) +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
sesion_info <- devtools::session_info()
dplyr::select(
tibble::as_tibble(sesion_info$packages),
c(package, loadedversion, source)
)
## # A tibble: 99 × 3
## package loadedversion source
## <chr> <chr> <chr>
## 1 base64enc 0.1-3 CRAN (R 4.3.1)
## 2 boot 1.3-28.1 CRAN (R 4.3.2)
## 3 bslib 0.5.1 CRAN (R 4.3.2)
## 4 cachem 1.0.8 CRAN (R 4.3.2)
## 5 callr 3.7.3 CRAN (R 4.3.2)
## 6 cli 3.6.1 CRAN (R 4.3.2)
## 7 codetools 0.2-19 CRAN (R 4.3.2)
## 8 colorspace 2.1-0 CRAN (R 4.3.2)
## 9 corrr 0.4.4 CRAN (R 4.3.2)
## 10 crayon 1.5.2 CRAN (R 4.3.2)
## # ℹ 89 more rows