1 Descripción base de datos

Los datos son de “Skin Cancer Prevention Study”, un estudio aleatorizado de ensayos clínicos controlados para prevenir cáncer de piel no melanoma(células basales o células escamosas) en la piel de sujetos de alto riesgo. A los sujetos fueron asignados al azar un tratamiento placebo o 50 mg de betacaroteno por día durante 5 años. Los sujetos fueron examinados una vez al año y se les realizaba una biopsia si se sospechaba cáncer para determinar el número de nuevas células cencerosas en la piel que ocurrieron desde el último examen.

#Cargar paquetes
library(brms)
library(dplyr)
library(purrr)
library(tidyr)
library(tidybayes)
library(ggplot2)
library(cowplot)
library(rstan)
library(brms)
library(gganimate)
library(gifski)
theme_set(theme_tidybayes() + panel_border())
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

2 Objetivo

Modelar el número de nuevas células cancerosas en un paciente, además de observar la incedencia de diferentes variables tales como el género, la edad, el tratamiento, tipo de piel, y la exposición. Debido a que hay variables de agrupación, modelar dicha variable mediante modelos lineales jerárquicos puede generar buenos resultados.

3 Análisis descriptivo

Los datos completos tienen 1683 pacientes y un total de 7081 mediciones, en este caso se seleccionará aleatoriamente un subconjunto de los datos, los cuales tendrán en cuenta las mediciones de 100 pacientes.

skin_data <- read.csv2("skin.data.csv") #Lectura base de datos

set.seed(12345678)
pacientes <- sample(unique(skin_data$id),150) #Seleccionar solo 150 pacientes aleatoriamente
skin_data <- skin_data %>% filter(id %in% pacientes) %>%
   mutate(tratamiento = ifelse(tratamiento == 0,"Placebo","Beta-carotene")) %>%
   mutate(genero = as.factor(genero)) %>%
   mutate(genero = as.factor(piel)) 

head(skin_data)

##       id edad piel genero exposicion y tratamiento year
## 1 100067   53    1      1          3 0     Placebo    1
## 2 100067   53    1      1          3 0     Placebo    2
## 3 100067   53    1      1          3 0     Placebo    3
## 4 100067   53    1      1          3 0     Placebo    4
## 5 100067   53    1      1          3 0     Placebo    5
## 6 100214   64    1      1          2 0     Placebo    1

3.1 Explorar variable respuesta

Es importante observar el comportamiento de la variable respuesta para plantear un modelo adecuado, en este caso la variable es de conteos por lo cual deben utilizarse distribuciones que puedan modelar dicho comportamiento.

skin_data %>%
  ggplot(aes(y)) +
  geom_bar(aes(y = (..count..)/sum(..count..)),fill = "salmon") + 
  ylab("Frecuencia")+
  theme_bw()

Se observa que el número de nuevas células cancerosas en la piel que aparece con más frecuencia es cero células, siendo más del 80% de los datos, además tiene un comportamiento decreciente.

3.2 Exploración de variables

3.2.1 Variable edad

ggplot(skin_data, aes(x=edad, y=y)) +
  geom_point() +
  theme_bw()

Para la variable edad no se presentan diferencias muy significativas en cuanto al número de células cancerosas observadas, parece haber un comportamiento parecido para cada edad, excepto para los mayores números de nuevas células cancerosas que como se vio anteriormente no representan una frecuenta alta y es aproximadamente 0, las obtienen pacientes de 57 y 60 años.

3.2.2 Variable piel

tabla <- with(skin_data, table(piel, y))
model::plot_prof(tabla, Row=T, cex=1, xlab="Tipo de piel")

Para esta variable es conveniente realizar comparaciones mediante un gráfico de perfiles, en este caso se observa un comportamiento parecido para ambos tipos de piel, por lo cual puede que no sea muy significativo en cuánto al número de nuevas células cancerosas en los pacientes. ### Variable genero

tabla <- with(skin_data, table(genero, y))
model::plot_prof(tabla, Row=T, cex=1, xlab='Genero')

Tal y como se procedió a analizar el tipo de piel, se realiza un gráfico de perfiles para analizar la variable género, ambos tienen un comportamiento muy similar.

3.2.3 Variable exposición

ggplot(skin_data, aes(x=exposicion, y=y)) +
  geom_point() +
  theme_bw()

La exposición,es decir, el recuento del número de nuevas células cancerosas en la piel está relacionado en cierta medida con las células nuevas que aparecen en los pacientes, quienes han presentado mayor número de celulas cancerosas presentan de 5 a 12 células cancerosas nuevas cada año.

3.2.4 Variable tratamiento

tabla <- with(skin_data, table(tratamiento, y))
model::plot_prof(tabla, Row=T, cex=1, xlab='Tratamiento')

En este caso el número de células cancerosas nuevas se comporta de igual manera para quienes reciben placebo o beta-caroteno.

3.3 Comportamiento de cada paciente

Como se tiene un total de 150 pacientes únicamente se graficarán el número de células cancerosas observadas en cada año discriminadas por tratamiento para los pacientes con id = 418191,107355,103755,404961,413502,106815

skin_data %>% filter(id %in% c(418191,107355,103755,404961,413502,106815))  %>%
  ggplot(aes(x = year, y = y, color = tratamiento)) +
  geom_point() +
  theme_bw() +
  facet_wrap(~ id)

Se puede observar que el comportamiento en cada paciente es diferente, por lo cual el uso de modelos jerárquicos tomando como variable de agrupación el id del paciente puede dar mejores resultados que un modelo de regresión lineal.

4 Modelos a considerar

Se considerarán modelos lineales generalizados y modelos lineales jerárquicos generalizados en ambos casos utilizando distribuciones que puedan modelar conteos, como en la variable respuesta se observó que el número de nuevas células cancerosas que tuvo más frecuencia fue el cero se considerarán las distribuciones Poisson cero-inflado (ZIP) y Binomital negativa cero-inflado(ZINB)

Del análisis descriptivo las covariables que podrían ser significativas en los modelos es la exposición, por la naturaleza del problema también se incluirá la covariable tratamiento, además para los modelos jerárquicos se tendrá en cuenta como variable de agrupación el id del paciente.

4.1 Con distribución ZIP

4.1.1 Modelo lineal generalizado

\[y_{ij} \sim ZIP(\mu_{ij},\sigma)\] \[log(\mu_{ij}) = \beta_0 + \beta_1Exposición_{ij}+\beta_2 Tratamiento_{placeboi}\]

4.1.2 Modelo lineal generalizado con intercepto aleatorio

\[y_{ij} \sim ZIP(\mu_{ij},\sigma)\] \[log(\mu_{ij}) = \beta_0 + \beta_1Exposición_{ij}+\beta_2 Tratamiento_{placeboi} + b_{0i}\] \[b_{0} \sim N(0,\sigma_{b0}^2)\]

4.1.3 Modelo lineal generalizado con intercepto y pendiente aleatoria

\[y_{ij} \sim ZIP(\mu_{ij},\sigma)\] \[log(\mu_{ij}) = \beta_0 + \beta_1Exposición_{ij}+\beta_2 Tratamiento_{placeboi} + b_{0i}+b_{1i}Exposición_i\] \[\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} \sim N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} \sigma^2_{b0} & \sigma_{b01} \\ \sigma_{b01} & \sigma^2_{b1} \end{pmatrix} \right)\]

4.2 Con distribución ZINB

4.2.1 Modelo lineal generalizado

\[y_{ij} \sim ZINB(\mu_{ij},\sigma,\nu)\] \[log(\mu_{ij}) = \beta_0 + \beta_1Exposición_{ij}+\beta_2 Tratamiento_{placeboi}\]

4.2.2 Modelo lineal generalizado con intercepto aleatorio

\[y_{ij} \sim ZINB(\mu_{ij},\sigma,\nu)\] \[log(\mu_{ij}) = \beta_0 + \beta_1Exposición_i+\beta_2 Tratamiento_{placeboi} + b_{0i}\] \[b_{0} \sim N(0,\sigma_{b0}^2)\]

4.2.3 Modelo lineal generalizado con intercepto y pendiente aleatoria

\[y_{ij} \sim ZINB(\mu_{ij},\sigma,\nu)\] \[log(\mu_{ij}) = \beta_0 + \beta_1Exposición_{ij}+\beta_2 Tratamiento_{placeboi} + b_{0i}+b_{1i}Exposición_i\] \[\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} \sim N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} \sigma^2_{b0} & \sigma_{b01} \\ \sigma_{b01} & \sigma^2_{b1} \end{pmatrix} \right)\]

5 Construcción de los modelos

Para la construcción de los modelos se utilizará la función brm del paquete brms

#Modelo 4.0.1 
mod1 <- brm(y ~ exposicion + tratamiento, data = skin_data, family = zero_inflated_poisson(),cores = 4, seed = 12345678)

## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -D_REENTRANT  -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -include stan/math/prim/mat/fun/Eigen.hpp   -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include  -fPIC  -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1

summary(mod1)

##  Family: zero_inflated_poisson 
##   Links: mu = log; zi = identity 
## Formula: y ~ exposicion + tratamiento 
##    Data: skin_data (Number of observations: 649) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept             -0.98      0.22    -1.39    -0.55 1.00     1306
## exposicion             0.12      0.01     0.09     0.15 1.00     1522
## tratamientoPlacebo    -0.66      0.18    -1.01    -0.32 1.00     2296
##                    Tail_ESS
## Intercept              1908
## exposicion             2075
## tratamientoPlacebo     2071
## 
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## zi     0.55      0.07     0.41     0.68 1.00     1411     2313
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Modelo lineal ZIP

\[y_{ij} \sim ZIP(\mu_{ij},\sigma)\] \[\mu_{ij} = exp(-0.56 + 0.11Exposición_i-0.71Tratamiento_{placeboi})\] \[\sigma = 0.60\]

Modelo lineal ZIP con intercepto aleatorio

#Modelo 4.0.2 
mod2 <- brm(y ~ exposicion + tratamiento + (1|id), data = skin_data, family = zero_inflated_poisson(),cores = 4, seed = 12345678)

## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -D_REENTRANT  -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -include stan/math/prim/mat/fun/Eigen.hpp   -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include  -fPIC  -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1

summary(mod2)

##  Family: zero_inflated_poisson 
##   Links: mu = log; zi = identity 
## Formula: y ~ exposicion + tratamiento + (1 | id) 
##    Data: skin_data (Number of observations: 649) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~id (Number of levels: 150) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.14      0.18     0.81     1.53 1.00      885     1784
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept             -2.28      0.31    -2.91    -1.68 1.00     1629
## exposicion             0.19      0.03     0.13     0.25 1.00     1355
## tratamientoPlacebo    -0.59      0.31    -1.21     0.02 1.00     2406
##                    Tail_ESS
## Intercept              2016
## exposicion             2017
## tratamientoPlacebo     2933
## 
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## zi     0.32      0.07     0.19     0.46 1.00     3166     2964
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

\[y_{ij} \sim ZIP(\mu_{ij},\sigma)\] \[\mu_{ij} = exp(-2.28 + 0.19Exposición_i-0.59 Tratamiento_{placeboi} + b_{0i})\] \[b_{0} \sim N(0,\sigma_{b0}^2=1.14^2 )\] \[\sigma = 0.32\]

Modelo lineal ZIP con intercepto y pendiente aleatoria

#Modelo 4.0.3 
mod3 <- brm(y ~ exposicion + tratamiento + (1+exposicion|id), data = skin_data, family = zero_inflated_poisson(),cores = 4, seed = 12345678)

## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -D_REENTRANT  -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -include stan/math/prim/mat/fun/Eigen.hpp   -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include  -fPIC  -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
## [[1]]
## Stan model 'c810fb673b85bb84cc8e5c1c6c62cb5c' does not contain samples.

summary(mod3)

## Warning: There were 156 divergent transitions after warmup. Increasing
## adapt_delta above 0.8 may help. See http://mc-stan.org/misc/
## warnings.html#divergent-transitions-after-warmup

##  Family: zero_inflated_poisson 
##   Links: mu = log; zi = identity 
## Formula: y ~ exposicion + tratamiento + (1 + exposicion | id) 
##    Data: skin_data (Number of observations: 649) 
## Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 3000
## 
## Group-Level Effects: 
## ~id (Number of levels: 150) 
##                           Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept)                 0.98      0.28     0.45     1.59 1.01
## sd(exposicion)                0.11      0.06     0.01     0.24 1.01
## cor(Intercept,exposicion)    -0.01      0.55    -0.90     0.93 1.00
##                           Bulk_ESS Tail_ESS
## sd(Intercept)                  284      113
## sd(exposicion)                 143      340
## cor(Intercept,exposicion)      202      248
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept             -2.26      0.30    -2.86    -1.70 1.00      745
## exposicion             0.22      0.05     0.14     0.34 1.00      338
## tratamientoPlacebo    -0.62      0.30    -1.24    -0.04 1.00     1219
##                    Tail_ESS
## Intercept              1557
## exposicion              183
## tratamientoPlacebo     1336
## 
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## zi     0.32      0.07     0.19     0.46 1.00     1240     1609
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

\[y_{ij} \sim ZIP(\mu_{ij},\sigma)\] \[\mu_{ij} = exp(-2.26 + 0.22Exposición_{ij}-0.62Tratamiento_{placeboi} + b_{0i}+b_{1i}Exposición_{ij})\]

\[\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} \sim N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} \sigma^2_{b0}=0.98^2 & \sigma_{b01}=0 \\ \sigma_{b01}=0 & \sigma^2_{b1}=0.11^2 \end{pmatrix} \right)\]

Modelo lineal ZINB

#Modelo 4.1.1 
mod4 <- brm(y ~ exposicion + tratamiento, data = skin_data, family = zero_inflated_negbinomial(),cores = 4, seed = 12345678)

## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -D_REENTRANT  -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -include stan/math/prim/mat/fun/Eigen.hpp   -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include  -fPIC  -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
## [[1]]
## Stan model 'f0ccf07018dc47a02f63a4092e5db6fa' does not contain samples.
## 
## [[2]]
## Stan model 'f0ccf07018dc47a02f63a4092e5db6fa' does not contain samples.
## 
## [[3]]
## Stan model 'f0ccf07018dc47a02f63a4092e5db6fa' does not contain samples.

summary(mod4)

##  Family: zero_inflated_negbinomial 
##   Links: mu = log; shape = identity; zi = identity 
## Formula: y ~ exposicion + tratamiento 
##    Data: skin_data (Number of observations: 649) 
## Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 1000
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept             -1.87      0.24    -2.30    -1.38 1.00      315
## exposicion             0.17      0.02     0.13     0.22 1.00      662
## tratamientoPlacebo    -0.58      0.23    -1.05    -0.16 1.00      652
##                    Tail_ESS
## Intercept               462
## exposicion              547
## tratamientoPlacebo      510
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     0.63      0.27     0.32     1.40 1.00      370      274
## zi        0.16      0.11     0.01     0.41 1.00      288      322
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

\[y_{ij} \sim ZINB(\mu_{ij},\sigma,\nu)\] \[\mu_{ij} = exp(-1.87 + 0.17Exposición_{ij}-0.58 Tratamiento_{placeboi})\] \[\sigma = 0.63\] \[\nu = 0.16\]

Modelo lineal ZINB con intercepto aleatorio

#Modelo 4.1.2
mod5 <- brm(y ~ exposicion + tratamiento + (1|id), data = skin_data, family = zero_inflated_negbinomial(),cores = 4, seed = 12345678)

## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -D_REENTRANT  -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -include stan/math/prim/mat/fun/Eigen.hpp   -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include  -fPIC  -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
## [[1]]
## Stan model '062fa29afb279a55cd028f79d403ed29' does not contain samples.

summary(mod5)

##  Family: zero_inflated_negbinomial 
##   Links: mu = log; shape = identity; zi = identity 
## Formula: y ~ exposicion + tratamiento + (1 | id) 
##    Data: skin_data (Number of observations: 649) 
## Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 3000
## 
## Group-Level Effects: 
## ~id (Number of levels: 150) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.12      0.18     0.78     1.48 1.00      877     1374
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept             -2.28      0.32    -2.94    -1.68 1.00     1098
## exposicion             0.19      0.03     0.13     0.25 1.00      944
## tratamientoPlacebo    -0.58      0.31    -1.17     0.03 1.01     1306
##                    Tail_ESS
## Intercept              1575
## exposicion             1280
## tratamientoPlacebo     1956
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape    42.08     52.47     1.65   194.30 1.00      954      598
## zi        0.31      0.08     0.13     0.46 1.00      877      519
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

\[y_{ij} \sim ZINB(\mu_{ij},\sigma,\nu)\] \[\mu_{ij} = exp(-2.28 + 0.19Exposición_i-0.58 Tratamiento_{placeboi} + b_{0i})\] \[b_{0} \sim N(0,\sigma_{b0}^2 = 1.12^2)\] \[\sigma = 42.08^2\] \[\nu = 0.31^2\]

Modelo lineal ZINB con intercepto y pendiente aleatoria

#Modelo 4.1.2 
mod6 <- brm(y ~ exposicion + tratamiento + (1+exposicion|id), data = skin_data, family = zero_inflated_negbinomial(),cores = 4, seed = 1234567)

## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -D_REENTRANT  -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -include stan/math/prim/mat/fun/Eigen.hpp   -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include  -fPIC  -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
## [[1]]
## Stan model 'dde101458a80bc54782575cf7d6ad328' does not contain samples.
## 
## [[2]]
## Stan model 'dde101458a80bc54782575cf7d6ad328' does not contain samples.
## 
## [[3]]
## Stan model 'dde101458a80bc54782575cf7d6ad328' does not contain samples.

summary(mod6)

##  Family: zero_inflated_negbinomial 
##   Links: mu = log; shape = identity; zi = identity 
## Formula: y ~ exposicion + tratamiento + (1 + exposicion | id) 
##    Data: skin_data (Number of observations: 649) 
## Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 1000
## 
## Group-Level Effects: 
## ~id (Number of levels: 150) 
##                           Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept)                 0.95      0.27     0.40     1.50 1.00
## sd(exposicion)                0.11      0.06     0.01     0.25 1.02
## cor(Intercept,exposicion)    -0.01      0.55    -0.87     0.96 1.00
##                           Bulk_ESS Tail_ESS
## sd(Intercept)                  191      313
## sd(exposicion)                  58       84
## cor(Intercept,exposicion)       60      223
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept             -2.28      0.32    -2.98    -1.72 1.00      566
## exposicion             0.22      0.05     0.15     0.33 1.00      253
## tratamientoPlacebo    -0.60      0.31    -1.24     0.01 1.00      764
##                    Tail_ESS
## Intercept               911
## exposicion              438
## tratamientoPlacebo      875
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape    48.11     65.31     2.86   230.63 1.00      763      453
## zi        0.31      0.07     0.17     0.44 1.00      597      454
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

\[y_{ij} \sim ZINB(\mu_{ij},\sigma,\nu)\] \[\mu_{ij} = exp(-2.28+ 0.22Exposición_{ij}-0.60Tratamiento_{placeboi} + b_{0i}+b_{1i}Exposición_{ij})\] \[\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} \sim N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} \sigma^2_{b0}=0.95^2 & \sigma_{b01}=0 \\ \sigma_{b01}=0 & \sigma^2_{b1}=0.11^2 \end{pmatrix} \right)\]

6 Comparación de los modelos

Para la comparación de modelos se utilizarán los criterios que puedan aplicarse a objetos de tipo brmsfit en este caso dos criterios waic que es la versión generalizada del AIC y loo leave-one-out cross-validation.

modelos <- c(mod1,mod2,mod3,mod4,mod5,mod6)
#Añadir criterios a los modelos
waic_c <- c(WAIC(mod1)$estimates[3],
          WAIC(mod2)$estimates[3],
          WAIC(mod3)$estimates[3],
          WAIC(mod4)$estimates[3],
          WAIC(mod5)$estimates[3],
          WAIC(mod6)$estimates[3])
loo_c <- c(loo(mod1)$estimates[3],
          loo(mod2)$estimates[3],
          loo(mod3)$estimates[3],
          loo(mod4)$estimates[3],
          loo(mod5)$estimates[3],
          loo(mod6)$estimates[3])

## Warning: Found 6 observations with a pareto_k > 0.7 in model 'mod2'. It is
## recommended to set 'reloo = TRUE' in order to calculate the ELPD without
## the assumption that these observations are negligible. This will refit
## the model 6 times to compute the ELPDs for the problematic observations
## directly.

## Warning: Found 4 observations with a pareto_k > 0.7 in model 'mod3'. It is
## recommended to set 'reloo = TRUE' in order to calculate the ELPD without
## the assumption that these observations are negligible. This will refit
## the model 4 times to compute the ELPDs for the problematic observations
## directly.

## Warning: Found 1 observations with a pareto_k > 0.7 in model 'mod4'. It is
## recommended to set 'reloo = TRUE' in order to calculate the ELPD without
## the assumption that these observations are negligible. This will refit
## the model 1 times to compute the ELPDs for the problematic observations
## directly.

## Warning: Found 3 observations with a pareto_k > 0.7 in model 'mod5'. It is
## recommended to set 'reloo = TRUE' in order to calculate the ELPD without
## the assumption that these observations are negligible. This will refit
## the model 3 times to compute the ELPDs for the problematic observations
## directly.

## Warning: Found 8 observations with a pareto_k > 0.7 in model 'mod6'. It is
## recommended to set 'reloo = TRUE' in order to calculate the ELPD without
## the assumption that these observations are negligible. This will refit
## the model 8 times to compute the ELPDs for the problematic observations
## directly.

criterios <- data.frame(waic = waic_c,
                        loo = loo_c)
rownames(criterios) <- c("mod1","mod2","mod3","mod4","mod5","mod6")
criterios

##          waic      loo
## mod1 724.0727 724.2066
## mod2 636.3050 645.8142
## mod3 637.6824 643.5194
## mod4 692.1062 692.3049
## mod5 641.7868 649.8512
## mod6 642.4121 648.0960

Como puede observarse los modelos que tuvieron mejor rendimiento fueron los que consideraron efectos aleatorios en este caso la pendiente aleatoria parece no aportar mucho a los modelos, por lo tanto los modelos solo con intercepto aleatorio tuvieron mejor rendimiento, los modelos en los que se tuvo en cuenta la distribución ZIP fueron mejores que los de la ZINB, el mejor modelo entre los seis planteados es el modelo 2, modelo lineal ZIP con intercepto aleatorio.

7 Mejor modelo

El modelo que tuvo mejor desempeño fue le modelo ZIP con covariables exposición y tratamiento, además con intercepto aleatorio.

Predicción de nuevas células cancerosas en la piel con modelos jerárquicos

Valentina García Velásquez

3/21/2020