Today, we’ll expand upon some of the R techniques we began to cover last week, in the context of testing a research question. Then, we’ll talk about why it’s important to plan data analysis ahead of time. At the end of class, we’ll have a brainstorming session for each team to get feedback from the rest of the class. If time allows, you’ll also work in teams to begin formulating a plan for analyzing your data.
Rows: 333
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 41.1, 38.6…
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, 19.3, 20.6, 17.8, 19.6, 17.6, 21.2…
$ flipper_length_mm <int> 181, 186, 195, 193, 190, 181, 195, 182, 191, 198, 18…
$ body_mass_g       <int> 3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800…
$ sex               <fct> male, female, female, female, male, female, male, fe…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
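The overview above has 333 rows, which matches the Palmer penguins data once rows with missing values are removed. A minimal sketch of how it could be produced follows, assuming the palmerpenguins and dplyr packages are installed. The notes do not show how `penguins_noNA` was built, so the use of `na.omit()` here is an assumption.

```r
library(palmerpenguins)  # provides the `penguins` data frame
library(dplyr)           # provides glimpse()

# Drop every row that has a missing value in any column.
# Assumption: this is how penguins_noNA was created; the notes do not show it.
penguins_noNA <- na.omit(penguins)

# Print one line per column: name, type, and the first few values.
glimpse(penguins_noNA)
```

`glimpse()` is handy as a first check because it shows every column's type, so you can confirm that categorical variables such as `species` and `sex` are stored as factors before modeling.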
# A tibble: 6 × 3
  bill_length_mm flipper_length_mm bill_to_flipper_ratio
           <dbl>             <int>                 <dbl>
1           58                 181                 0.320
2           51.5               187                 0.275
3           54.2               201                 0.270
4           55.8               207                 0.270
5           52.7               197                 0.268
6           51.7               194                 0.266
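The ratio column above is bill length divided by flipper length (for example, 58 / 181 is about 0.320), and the rows appear sorted from largest ratio to smallest. One way to get a table like this is sketched below. The exact pipeline is an assumption, as is the name `top_ratios`.

```r
library(palmerpenguins)
library(dplyr)

# Assumption: penguins_noNA is the penguins data with incomplete rows removed.
penguins_noNA <- na.omit(penguins)

# Add the derived variable as a new column.
penguins_noNA <- penguins_noNA %>%
  mutate(bill_to_flipper_ratio = bill_length_mm / flipper_length_mm)

# Keep only the columns of interest, sort by ratio (largest first),
# and show the first six rows. `top_ratios` is a hypothetical name.
top_ratios <- penguins_noNA %>%
  select(bill_length_mm, flipper_length_mm, bill_to_flipper_ratio) %>%
  arrange(desc(bill_to_flipper_ratio)) %>%
  head()

top_ratios
```

Sorting a newly created variable and looking at its extremes is a quick sanity check: implausible values at the top or bottom usually point to a data entry or units problem.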
You can generally obtain a rough assessment of your hypothesis through summary statistics.
# A tibble: 6 × 4
# Groups:   species, sex [6]
  species   sex    mean_ratio sd_ratio
  <fct>     <fct>       <dbl>    <dbl>
1 Adelie    female       19.9    1.27
2 Adelie    male         21.0    1.17
3 Chinstrap female       24.3    1.75
4 Chinstrap male         25.6    0.981
5 Gentoo    female       21.4    0.966
6 Gentoo    male         22.3    1.04
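A grouped summary like the one above can be sketched with `group_by()` and `summarize()`. This is an illustration under assumptions: the original code is not shown, and `ratio_summary` is a hypothetical name. The printed values also look roughly 100 times larger than the raw ratio (compare 19.9 with the model intercept of about 0.199), so the original may have rescaled the ratio. The sketch stays on the raw scale.

```r
library(palmerpenguins)
library(dplyr)

# Assumption: incomplete rows are dropped, then the ratio is computed.
penguins_noNA <- na.omit(penguins) %>%
  mutate(bill_to_flipper_ratio = bill_length_mm / flipper_length_mm)

# One row per species-by-sex combination, with the mean and standard
# deviation of the ratio. .groups = "keep" retains both grouping variables.
ratio_summary <- penguins_noNA %>%
  group_by(species, sex) %>%
  summarize(
    mean_ratio = mean(bill_to_flipper_ratio),
    sd_ratio   = sd(bill_to_flipper_ratio),
    .groups    = "keep"
  )

ratio_summary
```

Reading the means alongside the standard deviations gives a first impression of whether the group differences are large relative to the spread within each group.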
Compared to the work that goes into data wrangling, the culminating analysis usually takes only a line or two of code to run.
Call:
lm(formula = bill_to_flipper_ratio ~ sex + species, data = penguins_noNA)

Residuals:
      Min        1Q    Median        3Q       Max
-0.034988 -0.007340  0.000028  0.006569  0.076452

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.198885   0.001182 168.210  < 2e-16 ***
sexmale          0.010850   0.001306   8.310 2.54e-15 ***
speciesChinstrap 0.045105   0.001749  25.792  < 2e-16 ***
speciesGentoo    0.014440   0.001471   9.815  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01191 on 329 degrees of freedom
Multiple R-squared:  0.6906,	Adjusted R-squared:  0.6878
F-statistic: 244.8 on 3 and 329 DF,  p-value: < 2.2e-16
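The model call is printed at the top of the output above, so the fit itself can be reproduced in a couple of lines. The data preparation in this sketch is an assumption (the notes do not show it), and `fit` is a hypothetical name.

```r
library(palmerpenguins)
library(dplyr)

# Assumption: incomplete rows are dropped, then the ratio is computed.
penguins_noNA <- na.omit(penguins) %>%
  mutate(bill_to_flipper_ratio = bill_length_mm / flipper_length_mm)

# Linear model with additive effects of sex and species,
# using the same formula shown in the Call: line of the output.
fit <- lm(bill_to_flipper_ratio ~ sex + species, data = penguins_noNA)

# Full summary table: coefficients, standard errors, R-squared, F-test.
summary(fit)

# Just the coefficient estimates as a named vector.
coef(fit)
```

The reference levels are the first factor levels, female and Adelie, so the intercept (about 0.199) is the predicted ratio for a female Adelie penguin. Each other coefficient is a difference from that baseline with the other predictor held constant. For example, the predicted ratio for a male Chinstrap is 0.198885 + 0.010850 + 0.045105 = 0.254840.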