Ejercico de análisis - Modelo de varianza (desbalanceado)

Estableciendo los datos

set.seed(123)

porc_germ = c(
  rnorm(40, 60,6),
  rnorm(40,70,7),
  rnorm(40,80,8)
)

acido = gl(3,40,120, c('c0','c1','c2'))

datos = data.frame(acido, porc_germ)
head(datos)
##   acido porc_germ
## 1    c0  56.63715
## 2    c0  58.61894
## 3    c0  69.35225
## 4    c0  60.42305
## 5    c0  60.77573
## 6    c0  70.29039
table(datos$acido)
## 
## c0 c1 c2 
## 40 40 40
datos_des = datos[-c(50,111,120), ]

table(datos_des$acido)
## 
## c0 c1 c2 
## 40 39 38

Hipótesis nula

\[H_0:\mu_{c0}=\mu_{c1}=\mu_{c2}=\]

Analisis de varianza (balanceado)

mod1 = aov(porc_germ ~ acido, datos)
summary(mod1)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## acido         2   7835    3918   98.15 <2e-16 ***
## Residuals   117   4670      40                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Graficando el modelo

boxplot(datos$porc_germ ~ datos$acido)

Analisis de varianza (desbalanceado)

mod2 = aov(porc_germ ~ acido, datos_des)
summary(mod2)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## acido         2   7898    3949   98.39 <2e-16 ***
## Residuals   114   4576      40                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estableciendo el modelo con la función “lm”

mod3 = lm(porc_germ ~ acido, datos_des)
library(car)
## Loading required package: carData
mod3_res = Anova(mod3, type = 'II')
mod3_res
## Anova Table (Type II tests)
## 
## Response: porc_germ
##           Sum Sq  Df F value    Pr(>F)    
## acido     7898.3   2  98.392 < 2.2e-16 ***
## Residuals 4575.6 114                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Generando nuevos datos para un diseño con 2 factores

Diseño de Bloques Desbalanceado

  • Se emplea la función aov para detallar en el análisis de varianza la diferencia entre las funciones para el diseño desbalanceado.

  • La funcion aov no funciona para diseños desbalanceados con mas de un factor

set.seed(123)

porc_germ = c(
  rnorm(40, 60,6),
  rnorm(40,70,7),
  rnorm(40,80,8)
)

bloq = gl(3,40,120, c('B0','B1','B2'))
acido = gl(4, 10, 120, c('C0','C1','C2','C3'))

datos = data.frame(acido, bloq, porc_germ)
datos_des = datos[-c(50,111,120), ]
datos_des
##     acido bloq porc_germ
## 1      C0   B0  56.63715
## 2      C0   B0  58.61894
## 3      C0   B0  69.35225
## 4      C0   B0  60.42305
## 5      C0   B0  60.77573
## 6      C0   B0  70.29039
## 7      C0   B0  62.76550
## 8      C0   B0  52.40963
## 9      C0   B0  55.87888
## 10     C0   B0  57.32603
## 11     C1   B0  67.34449
## 12     C1   B0  62.15888
## 13     C1   B0  62.40463
## 14     C1   B0  60.66410
## 15     C1   B0  56.66495
## 16     C1   B0  70.72148
## 17     C1   B0  62.98710
## 18     C1   B0  48.20030
## 19     C1   B0  64.20814
## 20     C1   B0  57.16325
## 21     C2   B0  53.59306
## 22     C2   B0  58.69215
## 23     C2   B0  53.84397
## 24     C2   B0  55.62665
## 25     C2   B0  56.24976
## 26     C2   B0  49.87984
## 27     C2   B0  65.02672
## 28     C2   B0  60.92024
## 29     C2   B0  53.17118
## 30     C2   B0  67.52289
## 31     C3   B0  62.55879
## 32     C3   B0  58.22957
## 33     C3   B0  65.37075
## 34     C3   B0  65.26880
## 35     C3   B0  64.92949
## 36     C3   B0  64.13184
## 37     C3   B0  63.32351
## 38     C3   B0  59.62853
## 39     C3   B0  58.16422
## 40     C3   B0  57.71717
## 41     C0   B1  65.13705
## 42     C0   B1  68.54458
## 43     C0   B1  61.14223
## 44     C0   B1  85.18269
## 45     C0   B1  78.45573
## 46     C0   B1  62.13824
## 47     C0   B1  67.17981
## 48     C0   B1  66.73341
## 49     C0   B1  75.45976
## 51     C1   B1  71.77323
## 52     C1   B1  69.80017
## 53     C1   B1  69.69991
## 54     C1   B1  79.58022
## 55     C1   B1  68.41960
## 56     C1   B1  80.61529
## 57     C1   B1  59.15873
## 58     C1   B1  74.09230
## 59     C1   B1  70.86698
## 60     C1   B1  71.51159
## 61     C2   B1  72.65748
## 62     C2   B1  66.48374
## 63     C2   B1  67.66755
## 64     C2   B1  62.86997
## 65     C2   B1  62.49746
## 66     C2   B1  72.12470
## 67     C2   B1  73.13747
## 68     C2   B1  70.37103
## 69     C2   B1  76.45587
## 70     C2   B1  84.35059
## 71     C3   B1  66.56278
## 72     C3   B1  53.83582
## 73     C3   B1  77.04017
## 74     C3   B1  65.03559
## 75     C3   B1  65.18394
## 76     C3   B1  77.17900
## 77     C3   B1  68.00659
## 78     C3   B1  61.45498
## 79     C3   B1  71.26912
## 80     C3   B1  69.02776
## 81     C0   B2  80.04611
## 82     C0   B2  83.08224
## 83     C0   B2  77.03472
## 84     C0   B2  85.15501
## 85     C0   B2  78.23611
## 86     C0   B2  82.65426
## 87     C0   B2  88.77471
## 88     C0   B2  83.48145
## 89     C0   B2  77.39255
## 90     C0   B2  89.19046
## 91     C1   B2  87.94803
## 92     C1   B2  84.38718
## 93     C1   B2  81.90985
## 94     C1   B2  74.97675
## 95     C1   B2  90.88522
## 96     C1   B2  75.19792
## 97     C1   B2  97.49866
## 98     C1   B2  92.26089
## 99     C1   B2  78.11440
## 100    C1   B2  71.78863
## 101    C2   B2  74.31675
## 102    C2   B2  82.05507
## 103    C2   B2  78.02646
## 104    C2   B2  77.21966
## 105    C2   B2  72.38705
## 106    C2   B2  79.63978
## 107    C2   B2  73.72076
## 108    C2   B2  66.65646
## 109    C2   B2  76.95819
## 110    C2   B2  87.35197
## 112    C3   B2  84.86371
## 113    C3   B2  67.05694
## 114    C3   B2  79.55550
## 115    C3   B2  84.15526
## 116    C3   B2  82.40923
## 117    C3   B2  80.84541
## 118    C3   B2  74.87435
## 119    C3   B2  73.20237
table(datos_des$bloq, datos_des$acido)
##     
##      C0 C1 C2 C3
##   B0 10 10 10 10
##   B1  9 10 10 10
##   B2 10 10 10  8
mod1_A = aov(porc_germ ~ bloq * acido, datos_des)
summary(mod1_A)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## bloq          2   7898    3949 102.076 <2e-16 ***
## acido         3    248      83   2.133  0.100    
## bloq:acido    6    266      44   1.145  0.342    
## Residuals   105   4062      39                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod2_A = lm(porc_germ ~ bloq * acido, datos_des)
mod2_A_res = Anova(mod2_A, type = 'II')
mod2_A_res
## Anova Table (Type II tests)
## 
## Response: porc_germ
##            Sum Sq  Df  F value Pr(>F)    
## bloq       7850.6   2 101.4603 <2e-16 ***
## acido       247.6   3   2.1333 0.1004    
## bloq:acido  265.7   6   1.1447 0.3418    
## Residuals  4062.3 105                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Tipo 2 del modelo teniendo en cuenta el artículo: Langsrud, Ø. ANOVA for unbalanced data: Use Type II instead of Type III sums of squares. Statistics and Computing 13, 163–167 (2003). https://doi.org/10.1023/A:1023260610025

Forma correcta de ejecutar el analisis de varianza con más de un factor y datos desbalanceados

mod3 = lm(porc_germ ~ bloq + acido + bloq:acido, datos_des)
Anova(mod3, type='II')
## Anova Table (Type II tests)
## 
## Response: porc_germ
##            Sum Sq  Df  F value Pr(>F)    
## bloq       7850.6   2 101.4603 <2e-16 ***
## acido       247.6   3   2.1333 0.1004    
## bloq:acido  265.7   6   1.1447 0.3418    
## Residuals  4062.3 105                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod3 = lm(porc_germ ~ acido + bloq + bloq:acido, datos_des)
Anova(mod3, type='II')
## Anova Table (Type II tests)
## 
## Response: porc_germ
##            Sum Sq  Df  F value Pr(>F)    
## acido       247.6   3   2.1333 0.1004    
## bloq       7850.6   2 101.4603 <2e-16 ***
## acido:bloq  265.7   6   1.1447 0.3418    
## Residuals  4062.3 105                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod3 = lm(porc_germ ~ bloq:acido + acido + bloq, datos_des)
Anova(mod3, type='II')
## Anova Table (Type II tests)
## 
## Response: porc_germ
##            Sum Sq  Df  F value Pr(>F)    
## acido       247.6   3   2.1333 0.1004    
## bloq       7850.6   2 101.4603 <2e-16 ***
## bloq:acido  265.7   6   1.1447 0.3418    
## Residuals  4062.3 105                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod3 = lm(porc_germ ~ bloq:acido + bloq + acido, datos_des)
Anova(mod3, type='II')
## Anova Table (Type II tests)
## 
## Response: porc_germ
##            Sum Sq  Df  F value Pr(>F)    
## bloq       7850.6   2 101.4603 <2e-16 ***
## acido       247.6   3   2.1333 0.1004    
## bloq:acido  265.7   6   1.1447 0.3418    
## Residuals  4062.3 105                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mismo diseño (más una covariable)

Estableciendo los datos

set.seed(123)

porc_germ = c(
  rnorm(40, 60,6),
  rnorm(40,70,7),
  rnorm(40,80,8)
)

diam_med = sort(rnorm(120,12,1.3))
bloq = gl(3,40,120, c('B0','B1','B2'))
acido = gl(4, 10, 120, c('C0','C1','C2','C3'))

datos = data.frame(acido, bloq, porc_germ, diam_med)
datos_des = datos[-c(50,111,120), ]
datos_des
##     acido bloq porc_germ  diam_med
## 1      C0   B0  56.63715  9.330779
## 2      C0   B0  58.61894  9.918003
## 3      C0   B0  69.35225  9.956213
## 4      C0   B0  60.42305 10.030932
## 5      C0   B0  60.77573 10.099718
## 6      C0   B0  70.29039 10.101168
## 7      C0   B0  62.76550 10.122939
## 8      C0   B0  52.40963 10.287879
## 9      C0   B0  55.87888 10.295958
## 10     C0   B0  57.32603 10.326860
## 11     C1   B0  67.34449 10.329870
## 12     C1   B0  62.15888 10.361798
## 13     C1   B0  62.40463 10.373347
## 14     C1   B0  60.66410 10.392845
## 15     C1   B0  56.66495 10.458876
## 16     C1   B0  70.72148 10.575205
## 17     C1   B0  62.98710 10.617676
## 18     C1   B0  48.20030 10.636070
## 19     C1   B0  64.20814 10.679730
## 20     C1   B0  57.16325 10.689110
## 21     C2   B0  53.59306 10.709741
## 22     C2   B0  58.69215 10.749586
## 23     C2   B0  53.84397 10.768283
## 24     C2   B0  55.62665 10.836028
## 25     C2   B0  56.24976 10.874833
## 26     C2   B0  49.87984 10.974816
## 27     C2   B0  65.02672 10.984003
## 28     C2   B0  60.92024 11.036263
## 29     C2   B0  53.17118 11.039914
## 30     C2   B0  67.52289 11.060014
## 31     C3   B0  62.55879 11.067616
## 32     C3   B0  58.22957 11.084025
## 33     C3   B0  65.37075 11.152465
## 34     C3   B0  65.26880 11.205484
## 35     C3   B0  64.92949 11.226998
## 36     C3   B0  64.13184 11.253295
## 37     C3   B0  63.32351 11.253834
## 38     C3   B0  59.62853 11.309822
## 39     C3   B0  58.16422 11.329117
## 40     C3   B0  57.71717 11.350920
## 41     C0   B1  65.13705 11.362275
## 42     C0   B1  68.54458 11.371085
## 43     C0   B1  61.14223 11.380879
## 44     C0   B1  85.18269 11.404125
## 45     C0   B1  78.45573 11.426488
## 46     C0   B1  62.13824 11.450754
## 47     C0   B1  67.17981 11.458085
## 48     C0   B1  66.73341 11.461358
## 49     C0   B1  75.45976 11.513045
## 51     C1   B1  71.77323 11.527246
## 52     C1   B1  69.80017 11.545454
## 53     C1   B1  69.69991 11.577908
## 54     C1   B1  79.58022 11.635486
## 55     C1   B1  68.41960 11.655311
## 56     C1   B1  80.61529 11.659143
## 57     C1   B1  59.15873 11.667080
## 58     C1   B1  74.09230 11.692837
## 59     C1   B1  70.86698 11.720005
## 60     C1   B1  71.51159 11.743671
## 61     C2   B1  72.65748 11.762197
## 62     C2   B1  66.48374 11.844712
## 63     C2   B1  67.66755 11.882585
## 64     C2   B1  62.86997 11.907299
## 65     C2   B1  62.49746 11.929763
## 66     C2   B1  72.12470 11.955713
## 67     C2   B1  73.13747 12.049125
## 68     C2   B1  70.37103 12.053603
## 69     C2   B1  76.45587 12.077675
## 70     C2   B1  84.35059 12.084881
## 71     C3   B1  66.56278 12.101349
## 72     C3   B1  53.83582 12.110158
## 73     C3   B1  77.04017 12.122959
## 74     C3   B1  65.03559 12.152941
## 75     C3   B1  65.18394 12.155019
## 76     C3   B1  77.17900 12.278779
## 77     C3   B1  68.00659 12.278900
## 78     C3   B1  61.45498 12.306003
## 79     C3   B1  71.26912 12.316794
## 80     C3   B1  69.02776 12.387696
## 81     C0   B2  80.04611 12.403625
## 82     C0   B2  83.08224 12.421596
## 83     C0   B2  77.03472 12.431863
## 84     C0   B2  85.15501 12.479654
## 85     C0   B2  78.23611 12.544677
## 86     C0   B2  82.65426 12.567481
## 87     C0   B2  88.77471 12.586955
## 88     C0   B2  83.48145 12.671921
## 89     C0   B2  77.39255 12.706152
## 90     C0   B2  89.19046 12.731886
## 91     C1   B2  87.94803 12.780921
## 92     C1   B2  84.38718 12.803382
## 93     C1   B2  81.90985 12.827541
## 94     C1   B2  74.97675 12.894292
## 95     C1   B2  90.88522 12.912320
## 96     C1   B2  75.19792 12.919865
## 97     C1   B2  97.49866 12.961932
## 98     C1   B2  92.26089 12.980270
## 99     C1   B2  78.11440 12.999755
## 100    C1   B2  71.78863 13.024061
## 101    C2   B2  74.31675 13.150046
## 102    C2   B2  82.05507 13.270065
## 103    C2   B2  78.02646 13.368525
## 104    C2   B2  77.21966 13.442803
## 105    C2   B2  72.38705 13.442896
## 106    C2   B2  79.63978 13.470738
## 107    C2   B2  73.72076 13.602219
## 108    C2   B2  66.65646 13.642141
## 109    C2   B2  76.95819 13.706137
## 110    C2   B2  87.35197 13.877916
## 112    C3   B2  84.86371 14.178406
## 113    C3   B2  67.05694 14.397021
## 114    C3   B2  79.55550 14.481835
## 115    C3   B2  84.15526 14.541882
## 116    C3   B2  82.40923 14.596377
## 117    C3   B2  80.84541 14.730142
## 118    C3   B2  74.87435 14.766987
## 119    C3   B2  73.20237 14.858453
table(datos_des$bloq, datos_des$acido)
##     
##      C0 C1 C2 C3
##   B0 10 10 10 10
##   B1  9 10 10 10
##   B2 10 10 10  8

Estableciendo el modelo

mod0 = lm(porc_germ ~ diam_med + bloq + acido + bloq:acido, datos_des)
Anova(mod0, type = 'II')
## Anova Table (Type II tests)
## 
## Response: porc_germ
##            Sum Sq  Df F value   Pr(>F)   
## diam_med      8.3   1  0.2135 0.644977   
## bloq        520.7   2  6.6793 0.001866 **
## acido       139.8   3  1.1956 0.315216   
## bloq:acido  265.9   6  1.1371 0.346172   
## Residuals  4053.9 104                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Corriendo el modelo con datos faltantes “NA”

set.seed(123)

porc_germ = c(
  rnorm(40, 60,6),
  rnorm(40,70,7),
  rnorm(40,80,8)
)

diam_med = sort(rnorm(120,12,1.3))
bloq = gl(3,40,120, c('B0','B1','B2'))
acido = gl(4, 10, 120, c('C0','C1','C2','C3'))

datos = data.frame(acido, bloq, porc_germ, diam_med)
datos_des = datos
datos_des[sample(120, 5), 'porc_germ'] = NA
table(datos_des$bloq, datos_des$acido)
##     
##      C0 C1 C2 C3
##   B0 10 10 10 10
##   B1 10 10 10 10
##   B2 10 10 10 10
tapply(datos_des$porc_germ,
       datos_des$acido,
       mean, na.rm=TRUE)
##       C0       C1       C2       C3 
## 70.96384 72.04659 68.44023 69.07068

Observaciones finales

  1. Los valores faltantes del modelo (NA) pueden suponer faltantes de manera aleatorio o no aleatoria. Esto en consecuencia puede afectar el análisis de datos al disminuir el tamaño y representación de las muestras y por tanto la potencia de las pruebas y la revisión de hipotesis.

  2. Para asegurar que el anova pueda correr el análisis de los datos, se debe asegurar que existan datos verdaderos (aunque sean pocos). Siendo así, es recomendable realizar imputación de datos de manera que permita el análisis de varianza.