library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.2.3
#datos
data(penguins, package = "palmerpenguins")
head(penguins)
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## # ℹ 2 more variables: sex <fct>, year <int>
str(penguins)
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
## $ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ bill_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
## $ bill_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
## $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
## $ body_mass_g : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
## $ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
## $ year : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
library(car)
## Warning: package 'car' was built under R version 4.2.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.3
library(heplots)
## Warning: package 'heplots' was built under R version 4.2.3
## Loading required package: broom
## Warning: package 'broom' was built under R version 4.2.3
library(broom)
library(candisc)
## Warning: package 'candisc' was built under R version 4.2.3
##
## Attaching package: 'candisc'
## The following object is masked from 'package:stats':
##
## cancor
peng.mod0 <- lm(cbind(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) ~
species, data=penguins)
Anova(peng.mod0)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## species 2 1.6366 379.49 8 674 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
El estadístico F asciende a 379.49 con un p valor de 0.000 < 0.05. Tenemos evidencia empírica suficiente para rechazar H0 al nivel de significación del 5%. Por tanto, para alguna de las variables cuantitativas no se presentan las mismas medias por cada especie.
summary.aov(peng.mod0)
## Response bill_length_mm :
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 7194.3 3597.2 410.6 < 2.2e-16 ***
## Residuals 339 2969.9 8.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response bill_depth_mm :
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 903.97 451.98 359.79 < 2.2e-16 ***
## Residuals 339 425.87 1.26
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response flipper_length_mm :
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 52473 26236.6 594.8 < 2.2e-16 ***
## Residuals 339 14953 44.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response body_mass_g :
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 146864214 73432107 343.63 < 2.2e-16 ***
## Residuals 339 72443483 213698
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## 2 observations deleted due to missingness
Para todas las variables cuantitativas tenemos que las medias de cada especie son diferentes.
heplot(peng.mod0, fill=TRUE, fill.alpha=0.1,
size="effect",
xlim=c(35,52), ylim=c(14,20))
heplot(peng.mod0, fill=TRUE, fill.alpha=0.1)
heplot(peng.mod0, variables=3:4,
size="effect",
fill=TRUE, fill.alpha=0.2)
pairs(peng.mod0, size="effect", fill=c(TRUE, TRUE))
In these pairwise views, we see some new things:
The within-group residuals are positively correlated for all pairs. This makes sense to me.
As we saw above, group means for flipper length and body mass are positively correlated, but there are negative correlations of bill depth with the two size variables, and between species bill depth has a correlation ≈1 with body size. Time to call in the biologists!
peng.mod1 <-lm(cbind(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) ~
species + island + sex, data=penguins)
Anova(peng.mod1)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## species 2 1.49305 239.297 8 650 <2e-16 ***
## island 2 0.02872 1.184 8 650 0.306
## sex 1 0.64169 145.060 4 324 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Species: El estadístico F asciende a 239.297 con un p valor nulo. Esto implica que la media de las variables son diferentes en el factor de las especies.
Island: El estadístico F asciende a 1.184 con un p valor de 0.30, siendo mayor al nivel de significación del 5%. Las medias de las variables son similares en cada una de las islas.
Sexo: El estadístico F asciende a 145.060 con un p valor nulo. Esto implica que la media de las variables son diferentes en el factor de sexo.
summary.aov(peng.mod1)
## Response bill_length_mm :
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 7015.4 3507.7 648.9339 <2e-16 ***
## island 2 8.2 4.1 0.7626 0.4673
## sex 1 1137.7 1137.7 210.4843 <2e-16 ***
## Residuals 327 1767.5 5.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response bill_depth_mm :
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 870.79 435.39 628.0988 <2e-16 ***
## island 2 1.16 0.58 0.8388 0.4332
## sex 1 188.84 188.84 272.4173 <2e-16 ***
## Residuals 327 226.67 0.69
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response flipper_length_mm :
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 50526 25262.9 779.132 < 2e-16 ***
## island 2 173 86.4 2.666 0.07104 .
## sex 1 3917 3917.1 120.806 < 2e-16 ***
## Residuals 327 10603 32.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response body_mass_g :
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 145190219 72595110 720.0508 <2e-16 ***
## island 2 2064 1032 0.0102 0.9898
## sex 1 37099430 37099430 367.9790 <2e-16 ***
## Residuals 327 32967953 100819
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## 11 observations deleted due to missingness
En todas las variables las medias son significativamente diferentes en el sexo y en la especie.
heplot(peng.mod1, size="effect",
fill=TRUE, fill.alpha=0.2)
(peng.can <- candisc(peng.mod0))
##
## Canonical Discriminant Analysis for species:
##
## CanRsq Eigenvalue Difference Percent Cumulative
## 1 0.93757 15.0192 12.696 86.605 86.605
## 2 0.69907 2.3231 12.696 13.395 100.000
##
## Test of H0: The canonical correlations in the
## current row and all that follow are zero
##
## LR test stat approx F numDF denDF Pr(> F)
## 1 0.018785 528.87 8 672 < 2.2e-16 ***
## 2 0.300927 260.96 3 337 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Las funciones discriminantes son significativas y permiten discriminar las observaciones de acuerdo a las variables cuantitativas.
col <- scales::hue_pal()(3)
plot(peng.can, col=col, pch=15:17,
ellipse=TRUE, scale=9,
var.lwd = 2, var.col = "black")
From this we can see:
The first dimension (Can1) separates Gentoo from the other two species. Can2 largely separates Adelie from Chinstrap.
Canl is positively correlated with flipper length and body mass, our penguin “size” variables. But culmen depth also separates Gentoo from the others, in the opposite direction.
Can2 here largely reflects bill length, with Chintrap having longer beaks than the other two.
sesion_info <- devtools::session_info()
dplyr::select(
tibble::as_tibble(sesion_info$packages),
c(package, loadedversion, source)
)
## # A tibble: 74 × 3
## package loadedversion source
## <chr> <chr> <chr>
## 1 abind 1.4-5 CRAN (R 4.2.0)
## 2 backports 1.4.1 CRAN (R 4.2.0)
## 3 base64enc 0.1-3 CRAN (R 4.2.0)
## 4 broom 1.0.5 CRAN (R 4.2.3)
## 5 bslib 0.4.2 CRAN (R 4.2.2)
## 6 cachem 1.0.6 CRAN (R 4.2.2)
## 7 callr 3.7.3 CRAN (R 4.2.3)
## 8 candisc 0.8-6 CRAN (R 4.2.3)
## 9 car 3.1-2 CRAN (R 4.2.3)
## 10 carData 3.0-5 CRAN (R 4.2.3)
## # ℹ 64 more rows