1.- INTRODUCCIÓN

library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.2.3
#datos
data(penguins, package = "palmerpenguins")
head(penguins)
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>
str(penguins)
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
library(car)
## Warning: package 'car' was built under R version 4.2.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.3
library(heplots)
## Warning: package 'heplots' was built under R version 4.2.3
## Loading required package: broom
## Warning: package 'broom' was built under R version 4.2.3
library(broom)
library(candisc)
## Warning: package 'candisc' was built under R version 4.2.3
## 
## Attaching package: 'candisc'
## The following object is masked from 'package:stats':
## 
##     cancor

MANOVA

peng.mod0 <- lm(cbind(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) ~ 
                  species, data=penguins)
Anova(peng.mod0)
## 
## Type II MANOVA Tests: Pillai test statistic
##         Df test stat approx F num Df den Df    Pr(>F)    
## species  2    1.6366   379.49      8    674 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

El estadístico F asciende a 379.49 con un p valor de 0.000 < 0.05. Tenemos evidencia empírica suficiente para rechazar H0 al nivel de significación del 5%. Por tanto, para alguna de las variables cuantitativas no se presentan las mismas medias por cada especie.

summary.aov(peng.mod0)
##  Response bill_length_mm :
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## species       2 7194.3  3597.2   410.6 < 2.2e-16 ***
## Residuals   339 2969.9     8.8                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response bill_depth_mm :
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## species       2 903.97  451.98  359.79 < 2.2e-16 ***
## Residuals   339 425.87    1.26                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response flipper_length_mm :
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## species       2  52473 26236.6   594.8 < 2.2e-16 ***
## Residuals   339  14953    44.1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response body_mass_g :
##              Df    Sum Sq  Mean Sq F value    Pr(>F)    
## species       2 146864214 73432107  343.63 < 2.2e-16 ***
## Residuals   339  72443483   213698                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 2 observations deleted due to missingness

Para todas las variables cuantitativas tenemos que las medias de cada especie son diferentes.

heplot(peng.mod0, fill=TRUE, fill.alpha=0.1, 
       size="effect",
       xlim=c(35,52), ylim=c(14,20))

heplot(peng.mod0, fill=TRUE, fill.alpha=0.1)

heplot(peng.mod0, variables=3:4, 
       size="effect",
       fill=TRUE, fill.alpha=0.2)

pairs(peng.mod0, size="effect", fill=c(TRUE, TRUE))

In these pairwise views, we see some new things:

The within-group residuals are positively correlated for all pairs. This makes sense to me.

As we saw above, group means for flipper length and body mass are positively correlated, but there are negative correlations of bill depth with the two size variables, and between species bill depth has a correlation ≈1 with body size. Time to call in the biologists!

peng.mod1 <-lm(cbind(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) ~
                 species + island + sex, data=penguins)
Anova(peng.mod1)
## 
## Type II MANOVA Tests: Pillai test statistic
##         Df test stat approx F num Df den Df Pr(>F)    
## species  2   1.49305  239.297      8    650 <2e-16 ***
## island   2   0.02872    1.184      8    650  0.306    
## sex      1   0.64169  145.060      4    324 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Species: El estadístico F asciende a 239.297 con un p valor nulo. Esto implica que la media de las variables son diferentes en el factor de las especies.

Island: El estadístico F asciende a 1.184 con un p valor de 0.30, siendo mayor al nivel de significación del 5%. Las medias de las variables son similares en cada una de las islas.

Sexo: El estadístico F asciende a 145.060 con un p valor nulo. Esto implica que la media de las variables son diferentes en el factor de sexo.

summary.aov(peng.mod1)
##  Response bill_length_mm :
##              Df Sum Sq Mean Sq  F value Pr(>F)    
## species       2 7015.4  3507.7 648.9339 <2e-16 ***
## island        2    8.2     4.1   0.7626 0.4673    
## sex           1 1137.7  1137.7 210.4843 <2e-16 ***
## Residuals   327 1767.5     5.4                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response bill_depth_mm :
##              Df Sum Sq Mean Sq  F value Pr(>F)    
## species       2 870.79  435.39 628.0988 <2e-16 ***
## island        2   1.16    0.58   0.8388 0.4332    
## sex           1 188.84  188.84 272.4173 <2e-16 ***
## Residuals   327 226.67    0.69                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response flipper_length_mm :
##              Df Sum Sq Mean Sq F value  Pr(>F)    
## species       2  50526 25262.9 779.132 < 2e-16 ***
## island        2    173    86.4   2.666 0.07104 .  
## sex           1   3917  3917.1 120.806 < 2e-16 ***
## Residuals   327  10603    32.4                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response body_mass_g :
##              Df    Sum Sq  Mean Sq  F value Pr(>F)    
## species       2 145190219 72595110 720.0508 <2e-16 ***
## island        2      2064     1032   0.0102 0.9898    
## sex           1  37099430 37099430 367.9790 <2e-16 ***
## Residuals   327  32967953   100819                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 11 observations deleted due to missingness

En todas las variables las medias son significativamente diferentes en el sexo y en la especie.

heplot(peng.mod1, size="effect", 
       fill=TRUE, fill.alpha=0.2)

(peng.can <- candisc(peng.mod0))
## 
## Canonical Discriminant Analysis for species:
## 
##    CanRsq Eigenvalue Difference Percent Cumulative
## 1 0.93757    15.0192     12.696  86.605     86.605
## 2 0.69907     2.3231     12.696  13.395    100.000
## 
## Test of H0: The canonical correlations in the 
## current row and all that follow are zero
## 
##   LR test stat approx F numDF denDF   Pr(> F)    
## 1     0.018785   528.87     8   672 < 2.2e-16 ***
## 2     0.300927   260.96     3   337 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Las funciones discriminantes son significativas y permiten discriminar las observaciones de acuerdo a las variables cuantitativas.

col <- scales::hue_pal()(3)
plot(peng.can, col=col, pch=15:17, 
     ellipse=TRUE, scale=9, 
     var.lwd = 2, var.col = "black")

From this we can see:

The first dimension (Can1) separates Gentoo from the other two species. Can2 largely separates Adelie from Chinstrap.

Canl is positively correlated with flipper length and body mass, our penguin “size” variables. But culmen depth also separates Gentoo from the others, in the opposite direction.

Can2 here largely reflects bill length, with Chintrap having longer beaks than the other two.

INFO DE SESIÓN

sesion_info <- devtools::session_info()
dplyr::select(
  tibble::as_tibble(sesion_info$packages),
  c(package, loadedversion, source)
)
## # A tibble: 74 × 3
##    package   loadedversion source        
##    <chr>     <chr>         <chr>         
##  1 abind     1.4-5         CRAN (R 4.2.0)
##  2 backports 1.4.1         CRAN (R 4.2.0)
##  3 base64enc 0.1-3         CRAN (R 4.2.0)
##  4 broom     1.0.5         CRAN (R 4.2.3)
##  5 bslib     0.4.2         CRAN (R 4.2.2)
##  6 cachem    1.0.6         CRAN (R 4.2.2)
##  7 callr     3.7.3         CRAN (R 4.2.3)
##  8 candisc   0.8-6         CRAN (R 4.2.3)
##  9 car       3.1-2         CRAN (R 4.2.3)
## 10 carData   3.0-5         CRAN (R 4.2.3)
## # ℹ 64 more rows