This is an illustration of R Notebooks using the example from Computer Lab 1.

1. Load required packages

  • Packages must already be installed before they can be loaded here.
  • Here, we load three packages:
    • ggplot2: advanced graphing
    • RcmdrMisc: contains some helper functions for basic stats.
    • dplyr: data management package (part of tidyverse)
library(ggplot2) 
library(RcmdrMisc)
library(dplyr)
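
Because library() stops with an error when a package is not installed, it can help to check first. The following is a minimal sketch, assuming the three packages listed above are the only ones needed; the variable names `pkgs`, `installed` and `not_installed` are illustrative only.

```r
# Packages used in this notebook
pkgs <- c("ggplot2", "RcmdrMisc", "dplyr")

# requireNamespace() returns TRUE or FALSE instead of raising an error
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs install.packages()
not_installed <- pkgs[!installed]
if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
}
```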

2. Data import

Heliconia <- read.csv("Heliconia.csv")
Heliconia
##    variety length
## 1    bihai  47.12
## 2    bihai  46.75
## 3    bihai     NA
## 4    bihai  47.12
## 5    bihai  46.67
## 6    bihai  47.43
## 7     <NA>  46.44
## 8    bihai  46.64
## 9    bihai  48.07
## 10   bihai  48.34
## 11   bihai  48.15
## 12   bihai  50.26
## 13   bihai  50.12
## 14   bihai  46.34
## 15   bihai  46.94
## 16   bihai  48.36
## 17     red  41.90
## 18     red  42.01
## 19     red  41.93
## 20     red  43.09
## 21     red  41.47
## 22     red  41.69
## 23     red  39.78
## 24     red  40.57
## 25     red  39.63
## 26     red  42.18
## 27     red  40.66
## 28     red  37.87
## 29     red  39.16
## 30     red  37.40
## 31     red  38.20
## 32     red  38.07
## 33     red  38.10
## 34     red  37.97
## 35     red  38.79
## 36     red  38.23
## 37     red  38.87
## 38     red  37.78
## 39     red  38.01
## 40  yellow  36.78
## 41  yellow  37.02
## 42  yellow  36.52
## 43  yellow  36.11
## 44  yellow  36.03
## 45  yellow  35.45
## 46  yellow  38.13
## 47  yellow  37.10
## 48  yellow  35.17
## 49  yellow  36.82
## 50  yellow  36.66
## 51  yellow  35.68
## 52  yellow  36.03
## 53  yellow  34.57
## 54  yellow  34.63
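
The printout shows one missing length (row 3) and one missing variety (row 7). Counting missing values per column is quicker than scanning the output by eye. The sketch below uses a small toy data frame (`toy`, a hypothetical stand-in with only a few rows) so that it runs on its own; in the notebook, `Heliconia` would take its place.

```r
# Toy stand-in for the imported data (illustrative rows, not the full data set)
toy <- data.frame(
  variety = c("bihai", "bihai", NA, "red", "yellow"),
  length  = c(47.12, NA, 46.44, 41.90, 36.78)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Rows with at least one missing value
toy[!complete.cases(toy), ]
```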

3. Summary statistics

a) Base R

This is the default summary function:

summary(Heliconia)
##    variety              length     
##  Length:54          Min.   :34.57  
##  Class :character   1st Qu.:37.10  
##  Mode  :character   Median :39.16  
##                     Mean   :40.96  
##                     3rd Qu.:46.44  
##                     Max.   :50.26  
##                     NA's   :1
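
Note that summary() describes the length column as a whole, not each variety separately. Base R can also summarize by group, for example with tapply(). This is a minimal sketch on a hypothetical toy data frame (`toy`); with the real data, `Heliconia$length` and `Heliconia$variety` would be used instead.

```r
# Toy stand-in for the imported data (illustrative rows only)
toy <- data.frame(
  variety = c("bihai", "bihai", "bihai", "red", "red", "yellow", "yellow"),
  length  = c(47.12, 46.75, NA, 41.90, 42.01, 36.78, 37.02)
)

# Mean and standard deviation per variety, ignoring missing lengths
group_means <- tapply(toy$length, toy$variety, mean, na.rm = TRUE)
group_sds   <- tapply(toy$length, toy$variety, sd, na.rm = TRUE)

group_means
group_sds
```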

b) With helper function numSummary from RcmdrMisc

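Here’s a more useful summary with the mean, standard deviation and sample size (valid and missing) for each group. The observation with unknown variety does not belong to any group, so it does not appear in the table.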

numSummary(Heliconia[,"length", drop=FALSE], groups=Heliconia$variety, 
  statistics=c("mean", "sd"))
##            mean        sd length:n length:NA
## bihai  47.73643 1.2352490       14         1
## red    39.71130 1.7987630       23         0
## yellow 36.18000 0.9753241       15         0

c) With package dplyr

Heliconia %>% group_by(variety) %>% 
  summarize(mean=mean(length, na.rm=TRUE), 
            sd=sd(length, na.rm=TRUE),
            n.total=n(), n.valid=sum(!is.na(length)), n.missing=sum(is.na(length)))
## # A tibble: 4 × 6
##   variety  mean     sd n.total n.valid n.missing
##   <chr>   <dbl>  <dbl>   <int>   <int>     <int>
## 1 bihai    47.7  1.24       15      14         1
## 2 red      39.7  1.80       23      23         0
## 3 yellow   36.2  0.975      15      15         0
## 4 <NA>     46.4 NA           1       1         0
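
Unlike numSummary, dplyr keeps the observation with unknown variety as its own group. The NA in the sd column of that row does not indicate a missing length: the group contains a single observation, and the sample standard deviation is undefined for n = 1. A one-line check in base R (the name `single_sd` is illustrative):

```r
# sd() of a single value is NA because the denominator n - 1 is zero
single_sd <- sd(46.44)
is.na(single_sd)
```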

4. Figures

a) Boxplot by group (Base R)

boxplot(length~variety, data=Heliconia)

b) Boxplot by group (ggplot2)

ggplot(data=Heliconia, aes(x=variety, y=length)) +
  geom_boxplot()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).

Exclude the observation with unknown variety from the plot. The warning remains because one observation still has a missing length:

ggplot(data=Heliconia %>% filter(!is.na(variety)), aes(x=variety, y=length)) +
  geom_boxplot()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
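
To avoid the warning as well, rows with a missing length can be filtered out along with rows with a missing variety. The sketch below assumes that dropping incomplete rows is acceptable for the plot; it uses a hypothetical toy data frame (`toy`) so that it runs on its own, and `Heliconia` would replace it in the notebook.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the imported data (illustrative rows only)
toy <- data.frame(
  variety = c("bihai", "bihai", "bihai", NA, "red", "red", "yellow", "yellow"),
  length  = c(47.12, 46.75, NA, 46.44, 41.90, 42.01, 36.78, 37.02)
)

# Keep only rows where both variety and length are known
complete <- toy %>% filter(!is.na(variety), !is.na(length))

# Boxplot by group on the filtered data
p <- ggplot(data = complete, aes(x = variety, y = length)) +
  geom_boxplot()
p
```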