##
##                                  Overall
##   n                              101
##   Sexo = M (%)                   67 (66.3)
##   Edad_calc (mean (SD))          58.00 (11.66)
##   Sitio (%)
##      Colorrectal                 52 (51.5)
##      Esófago                      4 ( 4.0)
##      Gástrico                    18 (17.8)
##      Hepatobiliopancreático      27 (26.7)
##   ECOG (%)
##      ECOG 0                       9 ( 8.9)
##      ECOG 1                      59 (58.4)
##      ECOG 2                      33 (32.7)
##   MuscEsqueléticoKg (mean (SD))  28.40 (6.00)
##   SupCorp (mean (SD))            1.81 (0.22)
##   BMI_calc (mean (SD))           26.14 (4.90)
##   Talla (mean (SD))              165.84 (9.30)
##   Peso (mean (SD))               72.04 (15.65)
##   GrasaCorporalkg (mean (SD))    21.24 (10.94)
##   GrasaVisceral (mean (SD))      9.71 (5.24)
##   RelaciónCC (mean (SD))         0.91 (0.07)
##   LabAlb (mean (SD))             3.89 (0.23)
##   LabCreat (mean (SD))           0.83 (0.19)
##   LabUrea (mean (SD))            35.11 (9.33)
##   LabHb (mean (SD))              12.77 (1.33)
##   LabLinf (mean (SD))            23.70 (8.75)
##                                  Stratified by Sexo
##                                  F               M
##   n                              34              67
##   Sexo = M (%)                    0 ( 0.0)       67 (100.0)
##   Edad_calc (mean (SD))          56.66 (11.73)   58.69 (11.66)
##   Sitio (%)
##      Colorrectal                 15 (44.1)       37 ( 55.2)
##      Esófago                      0 ( 0.0)        4 (  6.0)
##      Gástrico                     2 ( 5.9)       16 ( 23.9)
##      Hepatobiliopancreático      17 (50.0)       10 ( 14.9)
##   ECOG (%)
##      ECOG 0                       1 ( 2.9)        8 ( 11.9)
##      ECOG 1                      19 (55.9)       40 ( 59.7)
##      ECOG 2                      14 (41.2)       19 ( 28.4)
##   MuscEsqueléticoKg (mean (SD))  22.42 (3.73)    31.53 (4.35)
##   SupCorp (mean (SD))            1.67 (0.18)     1.88 (0.21)
##   BMI_calc (mean (SD))           25.95 (4.75)    26.24 (5.01)
##   Talla (mean (SD))              157.53 (6.81)   170.06 (7.36)
##   Peso (mean (SD))               64.47 (13.36)   75.88 (15.40)
##   GrasaCorporalkg (mean (SD))    23.88 (10.99)   19.91 (10.75)
##   GrasaVisceral (mean (SD))      11.25 (5.04)    8.97 (5.21)
##   RelaciónCC (mean (SD))         0.91 (0.07)     0.91 (0.07)
##   LabAlb (mean (SD))             3.81 (0.26)     3.93 (0.21)
##   LabCreat (mean (SD))           0.72 (0.17)     0.89 (0.18)
##   LabUrea (mean (SD))            33.74 (11.32)   35.83 (8.11)
##   LabHb (mean (SD))              12.09 (1.02)    13.12 (1.34)
##   LabLinf (mean (SD))            24.59 (9.05)    23.29 (8.65)
## # A tibble: 15 x 3
##    variable          n_miss pct_miss
##    <chr>              <int>    <dbl>
##  1 linfocitos            15   14.9
##  2 RelaciónCC             3    2.97
##  3 MuscEsqueléticoKg      2    1.98
##  4 LabUrea                2    1.98
##  5 LabAlb                 1    0.990
##  6 LabCreat               1    0.990
##  7 LabHb                  1    0.990
##  8 SupCorp                0    0
##  9 BMI_calc               0    0
## 10 Talla                  0    0
## 11 Peso                   0    0
## 12 Edad_calc              0    0
## 13 log_P_Peso             0    0
## 14 Male                   0    0
## 15 Sitio_sup              0    0
## # A tibble: 101 x 3
##     case n_miss pct_miss
##    <int>  <int>    <dbl>
##  1    45      5    33.3
##  2    18      2    13.3
##  3    41      2    13.3
##  4    66      2    13.3
##  5     1      1     6.67
##  6     2      1     6.67
##  7     7      1     6.67
##  8     9      1     6.67
##  9    12      1     6.67
## 10    19      1     6.67
## # ... with 91 more rows
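The two missingness tables above count `NA` values per variable and per case. A minimal base-R sketch of the same idea follows; the data frame `toy` and its values are invented for illustration, and the source's own tables may well have been produced with a dedicated package instead.

```r
# Toy data frame standing in for the study data (values are made up).
toy <- data.frame(
  Peso       = c(70, 82, NA, 65, 90),
  LabAlb     = c(3.9, NA, NA, 4.1, 3.7),
  linfocitos = c(1600, 1500, NA, NA, 2100)
)

# Per-variable missingness: count and percentage of NA in each column.
n_miss_var <- colSums(is.na(toy))
miss_var <- data.frame(
  variable = names(n_miss_var),
  n_miss   = as.integer(n_miss_var),
  pct_miss = 100 * as.numeric(n_miss_var) / nrow(toy)
)
miss_var <- miss_var[order(-miss_var$n_miss), ]

# Per-case missingness: count and percentage of NA in each row.
n_miss_case <- rowSums(is.na(toy))
miss_case <- data.frame(
  case     = seq_len(nrow(toy)),
  n_miss   = as.integer(n_miss_case),
  pct_miss = 100 * as.numeric(n_miss_case) / ncol(toy)
)
miss_case <- miss_case[order(-miss_case$n_miss), ]

print(miss_var)
print(miss_case)
```

The percentages use the number of rows (per variable) and the number of columns (per case) as denominators, which matches the figures above: 15/101 = 14.9% for `linfocitos` and 5/15 = 33.3% for case 45.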
Forcing linear relationships and smoothing, broken down by sex
QUESTION: Looking at the correlations with muscle mass, at equal body size (weight, BMI, body surface area; height is the exception), women have more muscle mass. Is this reasonable?
Principal component analysis of the quantitative variables
We assess collinearity
##  RelaciónCC      LabAlb    LabCreat     LabUrea       LabHb     SupCorp
##    4.518738    1.353824    1.902131    1.532458    1.520250  714.716376
##    BMI_calc       Talla        Peso   Edad_calc  linfocitos  log_P_Peso
##  118.678541   80.263343  627.687164    1.339684    1.111300    1.683424
##        Male   Sitio_sup
##    2.260774    1.406221
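The values above look like variance inflation factors: SupCorp, Peso, BMI_calc and Talla are far above any usual threshold, which is expected because body surface area and BMI are computed from weight and height. As a hedged illustration on simulated data (not the study data), the VIF of predictor j is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. The helper `vif_manual` is hypothetical and written only for this sketch.

```r
set.seed(1)
n <- 100
Talla <- rnorm(n, 166, 9)                 # height in cm (simulated)
BMI   <- rnorm(n, 26, 5)                  # body mass index (simulated)
Peso  <- BMI * (Talla / 100)^2            # weight is determined by BMI and height
Edad  <- rnorm(n, 58, 12)                 # age, unrelated to the others
X <- data.frame(Talla, BMI, Peso, Edad)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on all others.
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

vif_all     <- vif_manual(X)                                  # includes the derived variable
vif_reduced <- vif_manual(X[, c("Talla", "Peso", "Edad")])    # BMI dropped
print(round(vif_all, 2))
print(round(vif_reduced, 2))
```

Dropping one of the mutually derived size variables brings the remaining VIFs down sharply, which is the same pattern seen when the reduced predictor set is checked further on.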
##                      Male RelaciónCC LabAlb SupCorp linfocitos BMI_calc
## Sum of weights:      1.00 0.90       0.86   0.83    0.69       0.63
## N containing models: 8192 8192       8192   8192    8192       8192
##                      LabHb Peso Sitio_sup Talla Edad_calc log_P_Peso
## Sum of weights:      0.55  0.49 0.49      0.39  0.32      0.24
## N containing models: 8192  8192 8192      8192  8192      8192
##                      LabCreat LabUrea
## Sum of weights:      0.23     0.23
## N containing models: 8192     8192
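The "sum of weights" rows above are the usual multimodel importance measure: each candidate model gets an Akaike weight, and a predictor's importance is the total weight of the models that contain it. Each predictor appearing in 8192 models is consistent with an all-subsets search over 14 predictors (2^14 = 16384 models, half of which contain any given predictor). The sketch below reproduces the computation on simulated data with three predictors; it uses plain AIC for simplicity, whereas the criterion used in the source is not stated.

```r
set.seed(2)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 2 * d$x1 + 0.3 * d$x2 + rnorm(n)   # x3 has no true effect

preds <- c("x1", "x2", "x3")

# All 2^3 = 8 subsets of predictors, one row per model (TRUE = included).
subsets <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), length(preds))))
colnames(subsets) <- preds

# Fit every subset and record its AIC (intercept-only model when nothing is kept).
aic <- apply(subsets, 1, function(keep) {
  rhs <- if (any(keep)) preds[keep] else "1"
  AIC(lm(reformulate(rhs, response = "y"), data = d))
})

# Akaike weights: relative likelihood of each model, normalised to sum to 1.
delta   <- aic - min(aic)
weights <- exp(-delta / 2) / sum(exp(-delta / 2))

# Importance of a predictor = sum of the weights of the models containing it.
sum_of_weights <- colSums(subsets * weights)
n_models       <- colSums(subsets)
print(round(sum_of_weights, 2))
print(n_models)
```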
##       Male       Peso      Talla RelaciónCC     LabAlb linfocitos
##   1.893981   4.735725   2.509589   3.840202   1.143337   1.047990
##      LabHb
##   1.276754
##
## Call:
## lm(formula = MuscEsqueléticoKg ~ Male + Peso + Talla + RelaciónCC +
## LabAlb + linfocitos + LabHb, data = bd_compl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4884 -1.1751 0.1353 1.4211 5.6298
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.157e+01 8.432e+00 -2.558 0.01213 *
## Male1 3.523e+00 6.098e-01 5.777 1.00e-07 ***
## Peso 2.322e-01 2.927e-02 7.935 4.64e-12 ***
## Talla 2.161e-01 3.585e-02 6.028 3.32e-08 ***
## RelaciónCC -1.362e+01 5.844e+00 -2.331 0.02192 *
## LabAlb 2.684e+00 9.699e-01 2.768 0.00681 **
## linfocitos 5.481e-04 3.021e-04 1.814 0.07285 .
## LabHb -3.116e-01 1.778e-01 -1.752 0.08300 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.104 on 93 degrees of freedom
## Multiple R-squared: 0.8841, Adjusted R-squared: 0.8754
## F-statistic: 101.3 on 7 and 93 DF, p-value: < 2.2e-16
##
## Shapiro-Wilk normality test
##
## data: re
## W = 0.98955, p-value = 0.6284
##
## Shapiro-Wilk normality test
##
## data: residuals(m5_lineal)
## W = 0.98834, p-value = 0.5339
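Both tests above fail to reject normality of the residuals (p = 0.63 and p = 0.53). For reference, a small sketch of this check on a simulated linear model; the data and the object names `d`, `fit` and `sw` are illustrative only.

```r
set.seed(3)
d <- data.frame(x = rnorm(80))
d$y <- 1 + 2 * d$x + rnorm(80)

fit <- lm(y ~ x, data = d)

# Shapiro-Wilk test on the model residuals; a large p-value means
# no evidence against normality at the usual levels.
sw <- shapiro.test(residuals(fit))
print(sw)
```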
##
## Call:
## lm(formula = MuscEsqueléticoKg ~ Peso + Male + Talla + LabAlb +
## RelaciónCC + linfocitos + LabHb, data = bd_compl[-48, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0671 -1.3416 -0.0184 1.2543 5.3990
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.285e+01 8.043e+00 -1.598 0.113526
## Peso 2.555e-01 2.753e-02 9.280 7.46e-15 ***
## Male1 3.161e+00 5.684e-01 5.561 2.61e-07 ***
## Talla 1.973e-01 3.333e-02 5.921 5.46e-08 ***
## LabAlb 2.086e+00 9.050e-01 2.305 0.023415 *
## RelaciónCC -2.050e+01 5.630e+00 -3.642 0.000447 ***
## linfocitos 5.258e-04 2.784e-04 1.889 0.062082 .
## LabHb -1.829e-01 1.667e-01 -1.097 0.275530
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.939 on 92 degrees of freedom
## Multiple R-squared: 0.8951, Adjusted R-squared: 0.8871
## F-statistic: 112.1 on 7 and 92 DF, p-value: < 2.2e-16
Comparing the 3 models by AIC
## df AIC
## m5_lineal 9 425.8712
## m6_sinHb 8 425.1707
## m7_Pablo 6 440.2575
Winner: m6_sinHb (lowest AIC, 425.17, about 0.7 below m5_lineal). Predictors: Peso + Male + Talla + LabAlb + RelaciónCC + linfocitos. Fitted without observation 48.
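A sketch of an AIC comparison like the one above, on simulated data with invented model names. One caveat worth keeping in mind: AIC values are only comparable when all models are fitted to exactly the same observations, so the sketch checks `nobs()` first. Whether the three models in the table were all fitted on the same rows is not shown in the output.

```r
set.seed(4)
n <- 100
d <- data.frame(
  Peso  = rnorm(n, 72, 15),
  Talla = rnorm(n, 166, 9),
  LabHb = rnorm(n, 12.8, 1.3)
)
d$Musc <- -20 + 0.25 * d$Peso + 0.2 * d$Talla + rnorm(n, sd = 2)

m_full  <- lm(Musc ~ Peso + Talla + LabHb, data = d)
m_noHb  <- lm(Musc ~ Peso + Talla, data = d)
m_small <- lm(Musc ~ Peso, data = d)

# AIC is only comparable across models fitted to the same rows.
stopifnot(nobs(m_full) == nobs(m_noHb), nobs(m_noHb) == nobs(m_small))

tab <- AIC(m_full, m_noHb, m_small)
tab$delta <- tab$AIC - min(tab$AIC)   # difference from the best model
print(tab)
```

The `df` column counts the regression coefficients plus the residual variance, which is why a model with 7 predictors shows `df = 9` in the table above.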
##
## Call:
## randomForest(formula = MuscEsqueléticoKg ~ ., data = bd_compl, mtry = 5, importance = TRUE)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 5
##
## Mean of squared residuals: 6.013267
## % Var explained: 82.91
## 15 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -2.724338e+01
## RelaciónCC -3.550333e+00
## LabAlb 1.018021e+00
## LabCreat .
## LabUrea .
## LabHb .
## SupCorp 1.362145e+01
## BMI_calc .
## Talla 1.683747e-01
## Peso .
## Edad_calc .
## linfocitos 2.168022e-04
## log_P_Peso .
## Male1 2.986540e+00
## Sitio_sup1 -8.850462e-02
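Lasso keeps 7 nonzero coefficients: Male, Talla, SupCorp, RelaciónCC, LabAlb, linfocitos and Sitio_sup. The Sitio_sup coefficient is close to zero (-0.089), leaving 6 main variables: Male, Talla, SupCorp, RelaciónCC, LabAlb, linfocitos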
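For illustration, a lasso selection of this kind can be sketched with the glmnet package on simulated data. The choice of `lambda.1se` as the penalty is an assumption of this sketch; the output above does not say which penalty value was used. The block is wrapped in a `requireNamespace()` guard so it does nothing if glmnet is not installed.

```r
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(5)
  n <- 100
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
  d$y <- 3 * d$x1 - 2 * d$x2 + rnorm(n)      # only x1 and x2 matter

  # glmnet needs a numeric design matrix; drop the intercept column.
  x <- model.matrix(y ~ ., data = d)[, -1]

  # alpha = 1 is the lasso penalty; lambda is chosen by cross-validation.
  cvfit <- glmnet::cv.glmnet(x, d$y, alpha = 1)

  # Coefficients at the chosen penalty; exact zeros are dropped variables.
  b <- as.matrix(coef(cvfit, s = "lambda.1se"))
  selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
  print(selected)
}
```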
For smoothing-parameter selection we use REML, which is generally preferable to GCV because it penalizes overfitting more strongly. Explanatory variables (VE): Male + Peso + Talla + RelaciónCC + LabAlb + linfocitos + LabHb
## Loading required package: nlme
## This is mgcv 1.8-28. For overview type 'help("mgcv-package")'.
##
## Attaching package: 'mgcv'
## The following objects are masked from 'package:gam':
##
## gam, gam.control, gam.fit, s
## Loading required package: plotfunctions
##
## Attaching package: 'plotfunctions'
## The following object is masked from 'package:ggplot2':
##
## alpha
## Loaded package itsadug 2.3 (see 'help("itsadug")' ).
##
## Family: gaussian
## Link function: identity
##
## Formula:
## MuscEsqueléticoKg ~ s(Peso) + s(Talla) + s(LabAlb) + s(RelaciónCC) +
## s(linfocitos) + Male
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.5207 0.4075 65.089 < 2e-16 ***
## Male1 2.9348 0.5379 5.456 4.13e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Peso) 1.001 1.003 91.345 < 2e-16 ***
## s(Talla) 1.000 1.000 34.595 5.19e-08 ***
## s(LabAlb) 1.000 1.000 4.309 0.040669 *
## s(RelaciónCC) 2.592 3.297 5.830 0.000715 ***
## s(linfocitos) 1.000 1.000 3.256 0.074392 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.891 Deviance explained = 90%
## -REML = 207.27 Scale est. = 3.6149 n = 100
## Summary:
## * Male : factor; set to the value(s): 0, 1.
## * Peso : numeric predictor; with 30 values ranging from 45.700000 to 115.000000.
## * Talla : numeric predictor; set to the value(s): 167.
## * LabAlb : numeric predictor; set to the value(s): 3.9.
## * RelaciónCC : numeric predictor; set to the value(s): 0.91.
## * linfocitos : numeric predictor; set to the value(s): 1620.
## Summary:
## * Male : factor; set to the value(s): 0, 1.
## * Peso : numeric predictor; set to the value(s): 71.5.
## * Talla : numeric predictor; with 30 values ranging from 145.000000 to 188.000000.
## * LabAlb : numeric predictor; set to the value(s): 3.9.
## * RelaciónCC : numeric predictor; set to the value(s): 0.91.
## * linfocitos : numeric predictor; set to the value(s): 1620.
## Summary:
## * Male : factor; set to the value(s): 0, 1.
## * Peso : numeric predictor; set to the value(s): 71.5.
## * Talla : numeric predictor; set to the value(s): 167.
## * LabAlb : numeric predictor; with 30 values ranging from 3.400000 to 4.400000.
## * RelaciónCC : numeric predictor; set to the value(s): 0.91.
## * linfocitos : numeric predictor; set to the value(s): 1620.
## Summary:
## * Male : factor; set to the value(s): 0, 1.
## * Peso : numeric predictor; set to the value(s): 71.5.
## * Talla : numeric predictor; set to the value(s): 167.
## * LabAlb : numeric predictor; set to the value(s): 3.9.
## * RelaciónCC : numeric predictor; with 30 values ranging from 0.720000 to 1.080000.
## * linfocitos : numeric predictor; set to the value(s): 1620.
## Summary:
## * Male : factor; set to the value(s): 0, 1.
## * Peso : numeric predictor; set to the value(s): 71.5.
## * Talla : numeric predictor; set to the value(s): 167.
## * LabAlb : numeric predictor; set to the value(s): 3.9.
## * RelaciónCC : numeric predictor; set to the value(s): 0.91.
## * linfocitos : numeric predictor; with 30 values ranging from 328.000000 to 4600.000000.
##
## Method: REML Optimizer: outer newton
## full convergence after 10 iterations.
## Gradient range [-0.0001123183,0.0005739024]
## (score 207.2742 & scale 3.614904).
## Hessian positive definite, eigenvalue range [9.724347e-06,46.51313].
## Model rank = 47 / 47
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(Peso) 9.00 1.00 0.78 0.01 **
## s(Talla) 9.00 1.00 1.06 0.66
## s(LabAlb) 9.00 1.00 1.03 0.57
## s(RelaciónCC) 9.00 2.59 1.00 0.43
## s(linfocitos) 9.00 1.00 1.05 0.68
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
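In the GAM summary above, four of the five smooths have an effective degrees of freedom (edf) of about 1, meaning they were shrunk to straight lines, and only s(RelaciónCC) is clearly nonlinear (edf 2.59). The low k-index p-value for s(Peso) comes with an edf far below k', so by the check's own note it does not by itself suggest that k is too low. The sketch below, on simulated data, fits the same model with REML and with GCV and compares the edf of a truly linear and a truly nonlinear term.

```r
library(mgcv)

set.seed(6)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
# x1 acts linearly, x2 acts through a sine curve.
d$y <- 2 * d$x1 + sin(2 * pi * d$x2) + rnorm(n, sd = 0.3)

g_reml <- gam(y ~ s(x1) + s(x2), data = d, method = "REML")
g_gcv  <- gam(y ~ s(x1) + s(x2), data = d, method = "GCV.Cp")

# Effective degrees of freedom per smooth under each criterion.
edf_tab <- rbind(REML = summary(g_reml)$edf, GCV = summary(g_gcv)$edf)
colnames(edf_tab) <- c("s(x1)", "s(x2)")
print(round(edf_tab, 2))

# Basis-dimension check, the same table that gam.check() prints.
print(k.check(g_reml))
```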
m_lineal <- lm(MuscEsqueléticoKg ~ Peso + Male + Talla + LabAlb + RelaciónCC + linfocitos, data = bd_compl[-48, ])
m_lasso <- lm(MuscEsqueléticoKg ~ Male + Talla + SupCorp + RelaciónCC + LabAlb + linfocitos, data = bd_compl[-48, ])
m_gam <- gam(MuscEsqueléticoKg ~ s(Peso) + s(Talla) + s(LabAlb) + s(RelaciónCC) + s(linfocitos) + Male, data = bd_compl[-48, ], method = "REML")
##
## Attaching package: 'caret'
## The following objects are masked from 'package:mixOmics':
##
## nearZeroVar, plsda, splsda
## Linear Regression
##
## 101 samples
## 6 predictor
##
## No pre-processing
## Resampling: Leave-One-Out Cross-Validation
## Summary of sample sizes: 100, 100, 100, 100, 100, 100, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 2.240971 0.8574766 1.718937
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
## Linear Regression
##
## 101 samples
## 6 predictor
##
## No pre-processing
## Resampling: Leave-One-Out Cross-Validation
## Summary of sample sizes: 100, 100, 100, 100, 100, 100, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 2.214402 0.8608193 1.705464
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
## Generalized Additive Model using Splines
##
## 101 samples
## 6 predictor
##
## No pre-processing
## Resampling: Leave-One-Out Cross-Validation
## Summary of sample sizes: 100, 100, 100, 100, 100, 100, ...
## Resampling results across tuning parameters:
##
## select RMSE Rsquared MAE
## FALSE 2.450007 0.8313108 1.937566
## TRUE 2.427435 0.8332436 1.882016
##
## Tuning parameter 'method' was held constant at a value of GCV.Cp
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were select = TRUE and method = GCV.Cp.
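The resampling results above come from leave-one-out cross-validation: each observation is predicted by a model fitted to all the others, and RMSE and MAE are computed from those out-of-sample errors. A base-R sketch on simulated data follows. The helper `loocv` is hypothetical, and the sketch also shows the exact shortcut available for linear models, where the leave-one-out residual equals the ordinary residual divided by (1 - leverage).

```r
set.seed(7)
n <- 100
d <- data.frame(Peso = rnorm(n, 72, 15), Talla = rnorm(n, 166, 9))
d$Musc <- -20 + 0.25 * d$Peso + 0.2 * d$Talla + rnorm(n, sd = 2)

# Refit the model n times, each time predicting the held-out row.
loocv <- function(formula, data) {
  response <- all.vars(formula)[1]
  pred <- vapply(seq_len(nrow(data)), function(i) {
    fit_i <- lm(formula, data = data[-i, ])
    predict(fit_i, newdata = data[i, , drop = FALSE])
  }, numeric(1))
  err <- data[[response]] - pred
  c(RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
}

res <- loocv(Musc ~ Peso + Talla, d)
print(res)

# Shortcut for lm: leave-one-out residual = residual / (1 - hat value).
fit <- lm(Musc ~ Peso + Talla, data = d)
rmse_hat <- sqrt(mean((residuals(fit) / (1 - hatvalues(fit)))^2))
print(rmse_hat)
```

The shortcut applies only to ordinary least squares; for a GAM with smoothing parameters re-estimated in each fold, the explicit loop is needed.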