Now we are ready to describe some properties of our data using descriptive statistics, and presenting them in tables.
We are interested in calculating some statistics on our Melocactus data set: number of values, maximum and minimum values, mean, and standard deviation for measurements on the size of the individuals, and frequencies of the status, but we want the calculations for each year.
First, descriptive statistics for values of plant height, but we want also to have the values of the vegetative part of the plant, substracting the longitude of the inflorescence from the total height of the plant:
library(tidyverse)
# descriptive statistics plant_height
descrip.stats.plht <- melo.select %>%
group_by(year) %>%
summarise(n = length(plant_height),
Max = max(plant_height),
min = min(plant_height),
median = median(plant_height),
mean = mean(plant_height),
st.err=sd(plant_height)/sqrt(length(plant_height)))
descrip.stats.plht
## # A tibble: 2 × 7
## year n Max min median mean st.err
## <int> <int> <int> <int> <int> <dbl> <dbl>
## 1 2019 125 66 1 9 14.2 1.24
## 2 2020 153 61 1 4 9.41 0.962
# new column with the height of the vegetative part - veg_height
melo.select.veg <- melo.select %>%
mutate(veg_height = plant_height - long.inflo)
# descriptive statistics veg_height
descrip.stats.vght <- melo.select.veg %>%
group_by(year) %>%
summarise(n = length(veg_height),
Max = max(veg_height, na.rm = T),
min = min(veg_height, na.rm = T),
median = median(veg_height, na.rm = T),
mean = mean(veg_height, na.rm = T),
st.err=sd(veg_height, na.rm = T)/sqrt(n))
descrip.stats.vght
## # A tibble: 2 × 7
## year n Max min median mean st.err
## <int> <int> <int> <int> <dbl> <dbl> <dbl>
## 1 2019 125 43 0 7 10.8 0.856
## 2 2020 153 45 1 4 8.09 0.730
And now, descriptive statistics for the longitude of the inflorescence, eliminating zeroes (when the inflorescence is not present):
# filtering non-zero values for long.inflo
melo.nonzero <- melo.select %>%
filter(long.inflo > 0)
# descriptive statistics long.inflo - without zero values
descrip.stats.infl <- melo.nonzero %>%
group_by(year) %>%
summarise(n = length(long.inflo),
Max = max(long.inflo, na.rm = T),
min = min(long.inflo, na.rm = T),
median = median(long.inflo, na.rm = T),
mean = mean(long.inflo, na.rm = T),
st.err=sd(long.inflo, na.rm = T)/sqrt(n))
descrip.stats.infl
## # A tibble: 2 × 7
## year n Max min median mean st.err
## <int> <int> <int> <int> <dbl> <dbl> <dbl>
## 1 2019 32 32 1 11 13.4 1.49
## 2 2020 17 26 1 11 10.6 1.62
Now we are going to use the above results to build more aesthetic tables for reports and publications. We will use the package gt. First a simple table, just using the previous results and the command gt:
library(gt)
gt_basic <- descrip.stats.vght %>%
gt()
gtsave(gt_basic, "gt_basic.png")
We can make the table a bit more elaborated:
table1 <- descrip.stats.vght %>%
gt() %>%
# title for the table
tab_header(title = md("Table 1. Descriptive statistics for vegetative part height (cm) of individuals of _M. intortus_, in a population of the Guánica Dry Forest, Puerto Rico")) %>%
# subtitle for the columns
tab_spanner(
label = "Descriptive Statistics",
columns = c(2:7)) %>%
# renaming the columns
cols_label(
year = md("__Year__"),
n = md("__n__"),
Max = md("__Maximum__"),
min = md("__Minimum__"),
median = md("__Median__"),
mean = md("__Mean__"),
st.err = md("__Standard Error__")) %>%
# formatting the columns
cols_align(
align = "center") %>%
# formatting the data values
fmt_number(
columns = c(6:7),
decimals = 2
)
gtsave(table1,"table_1.png")
The variable status is a categorical variable and we want to create a table with the frequencies for each category. The package janitor is one alternative to do this.
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
# removing missing values
melo.select.narm <- filter(melo.select, status == "E" | status == "S" | status == "X")
# frequency and % table
tabl_freq <-
tabyl(melo.select.narm, year, status) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(affix_sign = FALSE) %>%
adorn_ns()
tabl_freq
## year E S X
## 2019 25.6 (32) 52.0 (65) 22.4 (28)
## 2020 13.8 (21) 48.0 (73) 38.2 (58)
# a more polished table
table2 <- tabl_freq %>%
gt() %>%
# title for the table
tab_header(title = md("Table 2. Percentage (count) of individuals in each category of status and sampling year")) %>%
# subtitle for the columns
tab_spanner(
label = "Status",
columns = c(2:4)) %>%
# renaming the columns
cols_label(
year = md("__Year__"),
E = md("__Sick__"),
S = md("__Healthy__"),
X = md("__Dead__")) %>%
# formatting the columns
cols_align(
align = "center")
gtsave(table2, "table_2.png")