Graphs

Visualizing Data and Statistics

The descriptive statistics revealed some patterns in our data. We can now extend the analysis with visualization tools that may reveal more.

Spatial distribution and more

A simple graph can show how the individual plants are distributed across the study area. We first use the plot() function from base R.

plot(melo.2019$x_coord,melo.2019$y_coord)

plot(melo.2020$x_coord,melo.2020$y_coord)
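By default, base R scales the two axes independently, so distances on the map can look stretched. A minimal sketch, assuming a small made-up data frame toy in place of melo.2019 (the coordinates are hypothetical, not data from the study), of how asp = 1 and axis labels can make the map easier to read:

```r
# Hypothetical coordinates standing in for melo.2019; not real study data
toy <- data.frame(
  x_coord = c(80, 95, 101, 110, 120),
  y_coord = c(78, 99, 104, 90, 118)
)

# asp = 1 forces one unit on x to equal one unit on y,
# so distances between plants are not distorted
plot(toy$x_coord, toy$y_coord,
     asp = 1,
     xlab = "x coordinate", ylab = "y coordinate",
     pch = 19)
```

The same arguments should work unchanged when toy is replaced by melo.2019 or melo.2020.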

Using the package ggplot2 we can create a more detailed graph, including information from each individual.

library(ggplot2)
# 2019
melo2019 <-  ggplot(melo.2019, aes(x_coord, y_coord, color = status, size = plant_height)) +
  geom_point() +
  xlim(75,125) +
  ylim(75,125) 
  # scale_color_hue(labels = c("sick", "healthy", "dead"))
melo2019

Figure 1. Spatial location of M. intortus individuals in a population in the Guánica Forest in 2019. The size of the circles represents plant height (cm) and the color indicates the status of the individual.

# 2020
melo2020 <-  ggplot(subset(melo.2020, !is.na(long.inflo)), aes(x_coord, y_coord, color = status, size = plant_height)) +
  geom_point() +
  xlim(75,125) +
  ylim(75,125) +
  scale_color_hue(labels = c("sick", "healthy", "dead"))
melo2020

Figure 2. Spatial location of M. intortus individuals in a population in the Guánica Forest in 2020. The size of the circles represents plant height (cm) and the color indicates the status of the individual.
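Note that scale_color_hue(labels = ...) assigns labels by position, following the order of the levels of status. A minimal sketch, assuming hypothetical status codes "h", "s" and "d" (the actual coding of status is not shown in this section), of how to check the level order and relabel explicitly:

```r
# Hypothetical status codes; the real coding in melo.2020 may differ
status <- factor(c("h", "s", "d", "h", "s"))

# Levels are sorted alphabetically by default
levels(status)

# An explicit relabeling pairs each code with its label,
# so the result does not depend on the default level order
status_named <- factor(status,
                       levels = c("s", "h", "d"),
                       labels = c("sick", "healthy", "dead"))
table(status_named)
```

Relabeling the factor in the data, rather than in the scale, keeps the legend and any later tables consistent.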

Frequency distribution of numeric variables

In the previous sections we obtained descriptive statistics for numeric variables. Now we want to visualize the distribution of the values of those variables and see how it relates to the computed statistics.

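# frequency histogram with normal curve
# plant height
library(dplyr)  # provides %>% and summarise()
# mean and standard deviation used to draw the normal curve
msd2019 <- melo.2019 %>%
  summarise(means = mean(plant_height), sd = sd(plant_height))
ggplot(melo.2019, aes(x = plant_height)) +
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  geom_histogram(aes(y = after_stat(density)), position = "identity", binwidth = 4, color = "blue", fill = "gray") +
  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  stat_function(fun = dnorm, color = "red", linewidth = 1, args = list(mean = msd2019$means, sd = msd2019$sd)) +
  labs(x = "Plant Height, cm", y = "Density") +
  geom_vline(xintercept = mean(melo.2019$plant_height), color = "red") +
  geom_vline(xintercept = median(melo.2019$plant_height), color = "green")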

Figure 3. Histogram of plant height (cm) of individuals of M. intortus in the Guánica Forest in 2019. The normal curve for the mean and standard deviation of the data is shown in red, with the red vertical line showing the mean and the green vertical line the median.
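The gap between the two vertical lines is informative: when the distribution has a long right tail, the mean is pulled above the median. A minimal sketch, assuming a made-up vector heights (hypothetical values, not data from the study), of that comparison and of what the red curve represents:

```r
# Hypothetical right-skewed heights (cm); not data from the study
heights <- c(2, 3, 3, 4, 4, 5, 5, 6, 8, 12, 20)

m  <- mean(heights)    # pulled toward the long right tail
md <- median(heights)  # middle value, less sensitive to extremes
s  <- sd(heights)

# In this right-skewed sample the mean lies above the median
m > md

# The red curve is dnorm() evaluated with the sample mean and sd;
# its peak, at the mean, has height 1 / (s * sqrt(2 * pi))
dnorm(m, mean = m, sd = s)
```

A histogram that departs clearly from this curve suggests that the mean and standard deviation alone do not describe the variable well.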

Bar graph for categorical variables

We can present the earlier results on the frequency of plant status (Table 2) as a graph.

# frequency calculation and tidy data
library(janitor)
library(tidyr)  # provides pivot_longer()
# proportion of each status within each year (each year column sums to 1)
tabys1 <- tabyl(melo.select.narm, status, year) %>%
  adorn_percentages("col")
# reshape to long format: one row per status and year
tabys2 <- tabys1 %>%
  pivot_longer(!status, names_to = "year", values_to = "freq")
# bar graph
barfreq <- ggplot(tabys2, aes(x = status, y = freq, fill = year)) +
  geom_col(position = position_dodge()) +
  xlab("Status") + ylab("Relative frequency") +
  scale_x_discrete(labels = c("Sick", "Healthy", "Dead"))
barfreq

Figure 4. Relative frequency of each status (sick, healthy or dead) among individuals of M. intortus in the Guánica Forest in 2019 and 2020.
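The values plotted here are proportions within each year, which is what adorn_percentages("col") computes. A minimal sketch, assuming a made-up data frame toy (hypothetical records, not data from the study), of the same calculation in base R:

```r
# Hypothetical status records; not data from the study
toy <- data.frame(
  status = c("healthy", "healthy", "sick", "dead",
             "healthy", "sick", "sick", "dead"),
  year   = c("2019", "2019", "2019", "2019",
             "2020", "2020", "2020", "2020")
)

counts <- table(toy$status, toy$year)     # status by year counts
props  <- prop.table(counts, margin = 2)  # divide each column by its total

props
colSums(props)  # each year sums to 1
```

Because each year is scaled by its own total, the bars can be compared between years even if the number of individuals recorded differs.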

Graphing the relationship between variables

Now we want to examine whether there is a relationship between the size of the vegetative part of the plants and the size of the inflorescence.

# calculating vegetative part size
melo.2019.veg <- melo.2019 %>% 
  mutate(veg_height = plant_height - long.inflo)
# selecting individuals with inflorescence
melo.2019.reg <- filter(melo.2019.veg, long.inflo > 0)
# graph with regression line
library(ggpmisc)
ggplot(melo.2019.reg, aes(veg_height, long.inflo)) +
  geom_point() + 
  geom_smooth(method="lm", se=TRUE) +
  # after_stat() replaces the deprecated ..var.. notation (ggplot2 >= 3.3.0)
  stat_poly_eq(aes(label = paste(after_stat(eq.label), after_stat(adj.rr.label), sep = "~~~")),
      parse = TRUE) +
  ylab("Inflorescence length, cm") + xlab("Vegetative part height, cm")

Figure 5. Scatter plot of vegetative part height (cm) and inflorescence length (cm) of individuals of M. intortus in the Guánica Forest in 2019. The regression line (blue) is shown with its 95% confidence interval, its equation and \(R^2_{adj}\).
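The equation and \(R^2_{adj}\) printed on the plot come from an ordinary linear model of y on x. A minimal sketch, assuming a made-up data frame toy (hypothetical measurements, not data from the study), of how to obtain the same quantities directly with lm():

```r
# Hypothetical measurements (cm); not data from the study
toy <- data.frame(
  veg_height = c(5, 7, 9, 11, 13, 15),
  long.inflo = c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9)
)

# Same model that geom_smooth(method = "lm") fits by default: y ~ x
fit <- lm(long.inflo ~ veg_height, data = toy)

coef(fit)                   # intercept and slope of the fitted line
summary(fit)$adj.r.squared  # adjusted R^2, as labelled on the plot
```

Running summary(fit) also reports standard errors and p-values for the coefficients, which the plot does not show.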