This is an illustration of R Notebooks using the example from Computer Lab 1.
ggplot2: advanced graphing
RcmdrMisc: contains some helper functions for basic stats
dplyr: data management package (part of tidyverse)
library(ggplot2)
library(RcmdrMisc)
library(dplyr)
Heliconia <- read.csv("Heliconia.csv")
Heliconia
## variety length
## 1 bihai 47.12
## 2 bihai 46.75
## 3 bihai NA
## 4 bihai 47.12
## 5 bihai 46.67
## 6 bihai 47.43
## 7 <NA> 46.44
## 8 bihai 46.64
## 9 bihai 48.07
## 10 bihai 48.34
## 11 bihai 48.15
## 12 bihai 50.26
## 13 bihai 50.12
## 14 bihai 46.34
## 15 bihai 46.94
## 16 bihai 48.36
## 17 red 41.90
## 18 red 42.01
## 19 red 41.93
## 20 red 43.09
## 21 red 41.47
## 22 red 41.69
## 23 red 39.78
## 24 red 40.57
## 25 red 39.63
## 26 red 42.18
## 27 red 40.66
## 28 red 37.87
## 29 red 39.16
## 30 red 37.40
## 31 red 38.20
## 32 red 38.07
## 33 red 38.10
## 34 red 37.97
## 35 red 38.79
## 36 red 38.23
## 37 red 38.87
## 38 red 37.78
## 39 red 38.01
## 40 yellow 36.78
## 41 yellow 37.02
## 42 yellow 36.52
## 43 yellow 36.11
## 44 yellow 36.03
## 45 yellow 35.45
## 46 yellow 38.13
## 47 yellow 37.10
## 48 yellow 35.17
## 49 yellow 36.82
## 50 yellow 36.66
## 51 yellow 35.68
## 52 yellow 36.03
## 53 yellow 34.57
## 54 yellow 34.63
This is the default summary function:
summary(Heliconia)
##    variety              length
##  Length:54          Min.   :34.57
##  Class :character   1st Qu.:37.10
##  Mode  :character   Median :39.16
##                     Mean   :40.96
##                     3rd Qu.:46.44
##                     Max.   :50.26
##                     NA's   :1
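Because `variety` is a character column, the default summary reports only its length, class, and mode. A minimal sketch of how group sizes and missing values could be inspected instead, using a small excerpt of the rows printed above (the data frame name `heli_excerpt` is made up for this illustration and is not part of the lab code):

```r
# Excerpt of eight rows copied from the printout above
# (rows 1-3, 7, 17-18, 40-41); `heli_excerpt` is a hypothetical name.
heli_excerpt <- data.frame(
  variety = c("bihai", "bihai", "bihai", NA, "red", "red", "yellow", "yellow"),
  length  = c(47.12, 46.75, NA, 46.44, 41.90, 42.01, 36.78, 37.02)
)

# Count observations per variety, keeping the missing label as its own category
variety_counts <- table(heli_excerpt$variety, useNA = "ifany")
variety_counts

# Number of missing values in each column
missing_per_column <- colSums(is.na(heli_excerpt))
missing_per_column
```

Applied to the full `Heliconia` data frame, the same two calls would be expected to show the one observation without a variety label and the one missing length.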
numSummary from RcmdrMisc
Here’s a more useful summary with mean, sd and sample size (valid and NA) for each group.
numSummary(Heliconia[,"length", drop=FALSE], groups=Heliconia$variety,
statistics=c("mean", "sd"))
##            mean        sd length:n length:NA
## bihai  47.73643 1.2352490       14         1
## red    39.71130 1.7987630       23         0
## yellow 36.18000 0.9753241       15         0
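As a cross-check on the grouped statistics, the same kind of summary can be sketched in base R with `tapply()`, without loading RcmdrMisc. This sketch uses a small excerpt of rows from the data printed earlier; the name `heli_excerpt` is hypothetical, and the resulting numbers describe the excerpt only, not the full data set:

```r
# Excerpt of eight rows copied from the data printout
# (rows 1-3, 7, 17-18, 40-41); `heli_excerpt` is a hypothetical name.
heli_excerpt <- data.frame(
  variety = c("bihai", "bihai", "bihai", NA, "red", "red", "yellow", "yellow"),
  length  = c(47.12, 46.75, NA, 46.44, 41.90, 42.01, 36.78, 37.02)
)

# Group-wise mean and standard deviation, ignoring missing lengths.
# tapply() drops observations whose grouping value is NA.
group_means <- tapply(heli_excerpt$length, heli_excerpt$variety, mean, na.rm = TRUE)
group_sds   <- tapply(heli_excerpt$length, heli_excerpt$variety, sd, na.rm = TRUE)

# Valid and missing counts per group
n_valid   <- tapply(!is.na(heli_excerpt$length), heli_excerpt$variety, sum)
n_missing <- tapply(is.na(heli_excerpt$length), heli_excerpt$variety, sum)

cbind(mean = group_means, sd = group_sds, n.valid = n_valid, n.missing = n_missing)
```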
dplyr
Heliconia %>% group_by(variety) %>%
  summarize(mean=mean(length, na.rm=TRUE),
            sd=sd(length, na.rm=TRUE),
            n.total=n(), n.valid=sum(!is.na(length)), n.missing=sum(is.na(length)))
## # A tibble: 4 × 6
##   variety  mean     sd n.total n.valid n.missing
##   <chr>   <dbl>  <dbl>   <int>   <int>     <int>
## 1 bihai    47.7  1.24       15      14         1
## 2 red      39.7  1.80       23      23         0
## 3 yellow   36.2  0.975      15      15         0
## 4 <NA>     46.4 NA           1       1         0
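The fourth row of the tibble is the single observation with unknown variety; its standard deviation is `NA` because `sd()` of a single value is undefined. One way to leave that group out of the table is to filter before grouping. The sketch below does this on a small excerpt of the printed data (the names `heli_excerpt` and `group_summary` are hypothetical):

```r
library(dplyr)

# Excerpt of eight rows copied from the data printout
# (rows 1-3, 7, 17-18, 40-41); `heli_excerpt` is a hypothetical name.
heli_excerpt <- data.frame(
  variety = c("bihai", "bihai", "bihai", NA, "red", "red", "yellow", "yellow"),
  length  = c(47.12, 46.75, NA, 46.44, 41.90, 42.01, 36.78, 37.02)
)

# Drop rows with unknown variety, then summarize each remaining group
group_summary <- heli_excerpt %>%
  filter(!is.na(variety)) %>%
  group_by(variety) %>%
  summarize(mean = mean(length, na.rm = TRUE),
            sd = sd(length, na.rm = TRUE),
            n.total = n(),
            n.valid = sum(!is.na(length)),
            n.missing = sum(is.na(length)))
group_summary
```

Whether the unlabeled observation should be dropped or kept as its own row depends on the analysis; the filter only changes what is reported, not the underlying data frame.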
boxplot(length~variety, data=Heliconia)
ggplot(data=Heliconia, aes(x=variety, y=length)) +
geom_boxplot()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
Exclude the observation with unknown variety from the plot:
ggplot(data=Heliconia %>% filter(!is.na(variety)), aes(x=variety, y=length)) +
geom_boxplot()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
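The warning persists after filtering on `variety` because one bihai observation still has a missing `length`. A minimal sketch of two ways this could be handled, shown on a small excerpt of the printed data (the names `heli_excerpt`, `heli_complete`, `p_filtered`, and `p_na_rm` are made up for this illustration): either drop rows with a missing length before plotting, or pass `na.rm = TRUE` to `geom_boxplot()` so the missing value is removed silently.

```r
library(ggplot2)
library(dplyr)

# Excerpt of eight rows copied from the data printout
# (rows 1-3, 7, 17-18, 40-41); `heli_excerpt` is a hypothetical name.
heli_excerpt <- data.frame(
  variety = c("bihai", "bihai", "bihai", NA, "red", "red", "yellow", "yellow"),
  length  = c(47.12, 46.75, NA, 46.44, 41.90, 42.01, 36.78, 37.02)
)

# Option 1: keep only rows where both variety and length are known
heli_complete <- heli_excerpt %>%
  filter(!is.na(variety), !is.na(length))
p_filtered <- ggplot(data = heli_complete, aes(x = variety, y = length)) +
  geom_boxplot()

# Option 2: filter on variety only and let geom_boxplot() drop
# the missing length without a warning
p_na_rm <- ggplot(data = heli_excerpt %>% filter(!is.na(variety)),
                  aes(x = variety, y = length)) +
  geom_boxplot(na.rm = TRUE)

p_filtered
p_na_rm
```

Both options should draw the same boxes; the first makes the exclusion explicit in the data, while the second only silences the message.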