The code in the chunk above sets the global behaviour for all R code chunks in this document. The “include=FALSE” part of the code will hide the section above when you knit this document.
The code in the chunk above loads the library and data that you need.
#1A
table(trees$forestType) #count trees in each forest type
##
## BF DDF DDFP DEF DMDF LMF MMDF PF
## 116 3087 316 429 907 168 546 756
#1B
totalno.oftrees <- sum(table(trees$forestType))  # total number of trees (6325)
DDFpercentage <- table(trees$forestType)[["DDF"]] / totalno.oftrees  # proportion of trees that are DDF
DDFpercentage
## [1] 0.4880632
1A) Which forest type has the largest number of trees?
DDF (deciduous dipterocarp forest) has the largest number of trees: 3087.
1B) What percentage of the dataset do trees in this forest type represent?
DDF trees make up 48.81% of the total number of trees. The trees are not evenly distributed across forest types: there are 8 forest types in total, yet a single one (DDF) accounts for nearly 50% of all trees. The table in #1A shows the same uneven distribution of the 6325 trees among the forest types.
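As a sketch, `prop.table()` gives the share of every forest type at once, instead of dividing a single hand-typed count by the total. The counts below are copied from the #1A output; on the real data, `prop.table(table(trees$forestType))` should give the same proportions.

```r
# Counts copied from the table(trees$forestType) output in #1A
counts <- c(BF = 116, DDF = 3087, DDFP = 316, DEF = 429,
            DMDF = 907, LMF = 168, MMDF = 546, PF = 756)

total <- sum(counts)                       # total number of trees
pct <- round(100 * prop.table(counts), 2)  # each count / total, as a percentage

total
pct
names(which.max(counts))                   # forest type with the most trees
```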
#2A. mean dbh
mean(trees$dbh)
## [1] 40.18248
#2B. median dbh
median(trees$dbh)
## [1] 34.5
#2C. minimum tree height
min(trees$height)
## [1] 2
#2D. maximum tree height
max(trees$height)
## [1] 52.5
2A) Mean DBH
40.18 cm
2B) Median DBH
34.50 cm
2C) Minimum tree height
2 m
2D) Maximum tree height
52.50 m
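A minimal sketch: `summary()` returns the minimum, quartiles, mean and maximum in one call, so on the real data `summary(trees$dbh)` and `summary(trees$height)` should reproduce the values above. The vector here is hypothetical. Comparing the mean with the median is also informative. In 2A/2B the mean DBH (40.18 cm) is above the median (34.50 cm), which usually points to a longer tail of large trees.

```r
# Hypothetical DBH values (cm); not the real trees data
dbh <- c(10, 20, 30, 40, 100)

s <- summary(dbh)            # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s
s[["Mean"]] > s[["Median"]]  # TRUE here: the single large tree pulls the mean up
```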
p <- ggplot(data = trees, mapping = aes(x = dbh, y = height))
#3A, Version 1: changing colour
p + geom_point(colour = "green")+labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships in Thai forests",
caption = "Source: Thai ForestGEO")
#3A, Version 2: changing variables
p + geom_point(aes(colour = "green"))+labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships in Thai forests",
caption = "Source: Thai ForestGEO")
3A) How do the plots differ?
The first plot (Version 1) displays the points in green. The second plot (Version 2) displays the points in red (ggplot2's default salmon red) and adds a legend with a single entry labelled "green".
3B) Why does this happen?
In Version 1, colour = "green" is given to geom_point() outside aes(), so it sets a fixed property of the geometry: every point is drawn green and no legend is created. In Version 2, colour = "green" is placed inside aes(), so ggplot2 treats "green" as a data value rather than as a colour. It creates a constant categorical variable whose only level is the text "green", maps that variable to the colour aesthetic through the default discrete colour scale (whose first colour is a salmon red), and draws a legend for it. It's important to note the difference in ggplot2 between mapping a variable to an aesthetic (inside aes(): the values pass through a scale and get a legend) and setting an aesthetic to a fixed value (outside aes(): the value is used as-is).
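The difference can be checked directly with `layer_data()`, which returns the values ggplot2 actually draws. This sketch uses a small hypothetical data frame in place of `trees`. The hex code for the mapped version is the first colour of ggplot2's default discrete hue palette.

```r
library(ggplot2)

# Hypothetical stand-in for the trees data
toy <- data.frame(dbh = c(10, 25, 40), height = c(8, 15, 22))
p <- ggplot(toy, aes(x = dbh, y = height))

# Version 1: colour SET outside aes() -> every point is drawn green, no legend
set_colour <- layer_data(p + geom_point(colour = "green"))$colour

# Version 2: colour MAPPED inside aes() -> "green" becomes a one-level category
# that is coloured by the default discrete colour scale
mapped_colour <- layer_data(p + geom_point(aes(colour = "green")))$colour

unique(set_colour)     # "green"
unique(mapped_colour)  # "#F8766D"
```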
#grouping data showing the relationship between DBH and height for each forest type
ggplot(trees, aes(x = dbh, y = height, colour = forestType)) +
geom_point()+labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships in Thai forests",
caption = "Source: Thai ForestGEO")
#4 faceted plot showing the relationship between DBH and height for each forest type
p <- ggplot(data = trees, mapping = aes(x = dbh, y = height))
p + geom_point() + facet_wrap(~forestType)+labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships between Forest Types, in Thai forests",
caption = "Source: Thai ForestGEO")
#4A faceted plot + smooth curve (for easier interpretation)
p <- ggplot(data = trees, mapping = aes(x = dbh, y = height))
p + facet_wrap(~forestType)+ geom_smooth()+labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships between Forest Types, in Thai forests",
caption = "Source: Thai ForestGEO")
4A) Which forest type appears to have the tallest trees for a given DBH?
DDFP (deciduous dipterocarp and pine forest) appears to have the tallest trees for a given DBH: its curve reaches ~45 m at a DBH of ~60 cm, and no other forest type's curve reaches that height at a DBH of around 60 cm.
4B) Do the DBH–height relationships appear similar across forest types?
Not across all forest types; the relationships vary quite dramatically between forest types. A few forest types, such as DEF, DMDF and MMDF, appear to have similar relationships between height and DBH.
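One way to make "tallest trees for a given DBH" concrete is to fit a smooth curve per forest type and compare predicted heights at the same DBH. This sketch uses hypothetical data and `loess()`. Note that `geom_smooth()` uses loess by default only when the largest group has fewer than 1,000 points. With DDF's 3087 trees, the plots above use a GAM instead, so the numbers here are illustrative only.

```r
# Hypothetical data: two forest types, "B" built to be taller than "A" at every DBH
set.seed(1)
toy <- data.frame(
  forestType = rep(c("A", "B"), each = 50),
  dbh = rep(seq(5, 80, length.out = 50), times = 2)
)
slope <- ifelse(toy$forestType == "A", 0.3, 0.5)
toy$height <- slope * toy$dbh + rnorm(nrow(toy), sd = 1)

# Fit one loess curve per forest type and predict height at DBH = 60 cm
height_at_60 <- sapply(split(toy, toy$forestType), function(d) {
  fit <- loess(height ~ dbh, data = d)
  unname(predict(fit, newdata = data.frame(dbh = 60)))
})

height_at_60                    # predicted height (m) per forest type
names(which.max(height_at_60))  # forest type with the tallest trees at 60 cm DBH
```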
#Q5 - Species Comparison
#grouping showing the relationship between DBH and height for each tree species
p <- ggplot(data = trees, mapping = aes(x = dbh, y = height,colour = sppCode))
p + geom_point() +labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships between Tree Species, in Thai forests",
caption = "Source: Thai ForestGEO")
#faceting showing the relationship between DBH and height for each tree species
p <- ggplot(data = trees, mapping = aes(x = dbh, y = height, colour=sppCode))
p + geom_point() + facet_wrap(~sppCode)+labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships between Tree Species, in Thai forests",
caption = "Source: Thai ForestGEO")
#faceting & smooth curve showing the relationship between DBH and height for each tree species
p <- ggplot(data = trees, mapping = aes(x = dbh, y = height))
p + geom_smooth() + facet_wrap(~sppCode)+labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships between Tree Species, in Thai forests",
caption = "Source: Thai ForestGEO")
Q5) Why might faceting by species be harder to interpret than faceting by forest type?
There are many more relationships to compare: 20 tree species compared with only 8 forest types, so each panel is smaller and the patterns are harder to compare. Which grouping is more useful depends on the question we are trying to answer: are we trying to distinguish patterns between forest types or between tree species? If the question is about DBH–height relationships in Thai forests more generally, grouping the data into broader categories, such as forest types, makes the data simpler to interpret.
#Q6 faceting, smooth curve and observed point data showing the relationship between DBH and height for each tree species
p <- ggplot(data = trees, mapping = aes(x = dbh, y = height))
p + geom_point() + geom_smooth() + facet_wrap(~sppCode)+labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships between Tree Species, in Thai forests",
caption = "Source: Thai ForestGEO")
Q6) Why can including observed data improve interpretation of a figure?
The ggplot2 help pages (viewed in RStudio) describe a scatterplot (geom_point) as "most useful for displaying the relationship between two continuous variables", and say that geom_smooth "aids the eye in seeing patterns in the presence of overplotting".
Overlaying a smooth curve on a scatterplot of the observed data shows both the general pattern (particularly useful in dense scatterplots such as these) and the individual observations behind it. Seeing how the points relate to the curve, including how many outliers there are and where they lie, supports a deeper interpretation of the data, such as how variable it is around the overall trend.
#7A. Calculate the bin width used in the figure.
ggplot(trees, aes(x = dbh)) +
geom_histogram()+labs(x = "DBH (cm)", y = "No. of Trees",
title = "No. of Trees to DBH in Thai forests (30 bins)",
caption = "Source: Thai ForestGEO")
#default number of bins is always 30, unless specified otherwise
#7B. Recreate the histogram using 10 bins and 50 bins.
#10 bins
ggplot(trees, aes(x = dbh)) +
geom_histogram(bins = 10)+labs(x = "DBH (cm)", y = "No. of Trees",
title = "No. of Trees to DBH in Thai forests (10 bins)",
caption = "Source: Thai ForestGEO")
#50 bins
ggplot(trees, aes(x = dbh)) +
geom_histogram(bins = 50)+labs(x = "DBH (cm)", y = "No. of Trees",
title = "No. of Trees to DBH in Thai forests (50 bins)",
caption = "Source: Thai ForestGEO")
#7A bin width
#ggplot2 centres the first and last bins on the minimum and maximum DBH,
#so each bin is (range of DBH) / (number of bins - 1) wide
(max(trees$dbh)-min(trees$dbh))/(30-1)
## [1] 6.051724
#7B bin widths
(max(trees$dbh)-min(trees$dbh))/(10-1)
## [1] 19.5
(max(trees$dbh)-min(trees$dbh))/(50-1)
## [1] 3.581633
Q7A/B) How does changing the number of bins influence your interpretation of the distribution?
"You should always override this value, exploring multiple widths to find the best to illustrate the stories in your data." - ggplot2 help page for geom_histogram() (binwidth argument), viewed in RStudio.
You can make the histogram simpler or more detailed depending on the number of bins you choose. In #7B, DBH values are grouped into bins about 19.5 cm wide (10 bins) or about 3.58 cm wide (50 bins). A ~20 cm difference in DBH is a large jump in tree growth, so 10 bins is likely an overly coarse grouping that hides much of the shape of the distribution. Bins of about 3.6 cm seem more appropriate for DBH, because diameter growth is slow and a few centimetres can represent a meaningful change. The finer bins therefore show the distribution in more detail, although bins that are too narrow make a histogram noisy. The appropriate bin width (set here through the number of bins) depends on the data being assessed, and on whether they are more accurately interpreted at finer or broader scales.
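As a sketch, the bin width ggplot2 actually uses can be checked with `layer_data()`, which returns the computed bins (columns `xmin` and `xmax`). The assumption being tested is that, when `bins` is given, ggplot2 centres the first and last bins on the minimum and maximum, so each bin is range / (bins - 1) wide rather than range / bins. The DBH values are hypothetical, chosen to have the same 175.5 cm range implied by the #7A output (5.85 × 30).

```r
library(ggplot2)

# Hypothetical DBH values (cm) with a range of 175.5 cm; not the real data
toy <- data.frame(dbh = c(5, 32, 47, 88, 180.5))

for (n_bins in c(10, 30, 50)) {
  # Assumed rule: bins centred on min and max -> width = range / (bins - 1)
  expected <- diff(range(toy$dbh)) / (n_bins - 1)
  # layer_data() returns one row per computed bin; xmax - xmin is its width
  bins_df <- layer_data(ggplot(toy, aes(x = dbh)) + geom_histogram(bins = n_bins))
  observed <- unique(round(bins_df$xmax - bins_df$xmin, 6))
  cat(n_bins, "bins: expected", round(expected, 4), "observed", observed, "\n")
}
```

On the real figure, `layer_data()` applied to the #7A histogram would give the same check without any hand calculation.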
#Q8 density plot
ggplot(trees, aes(x = dbh)) +
geom_density()+labs(x = "DBH (cm)", y = "Density",
title = "Density of Trees by DBH in Thai forests",
caption = "Source: Thai ForestGEO")
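8A) How does the density plot differ from the histogram?
A histogram shows the count of trees in each DBH bin, whereas the density plot shows a smoothed estimate of the distribution (a kernel density estimate), scaled so that the total area under the curve equals 1. Its y-axis is therefore a density (proportion of trees per cm of DBH), not a count or a percentage.
8B) What additional information does a density plot reveal about the distribution?
Because it does not depend on where bin edges fall (though it does depend on the smoothing bandwidth), the density plot gives a clearer picture of the overall shape of the distribution: where DBH values are concentrated and how quickly they tail off. For example, the curve appears to peak at a DBH of approx. 25 cm, the most common DBH in the data. The proportion of trees within a DBH range is given by the area under the curve over that range, not by the height of the curve.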
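A minimal sketch of why the y-axis of a density plot is not a count or a percentage. The area under the curve is 1, and the share of trees in a DBH range is the area over that range. The sample is hypothetical, and base R's `density()` is used with the same default Gaussian kernel and `"nrd0"` bandwidth rule as `geom_density()`.

```r
set.seed(42)
# Hypothetical right-skewed DBH sample (cm); not the real trees data
dbh <- rlnorm(500, meanlog = log(30), sdlog = 0.5)

d <- density(dbh)        # Gaussian kernel, bandwidth rule "nrd0" (the defaults)
step <- d$x[2] - d$x[1]  # density() evaluates on an evenly spaced grid

total_area <- sum(d$y) * step               # approximately 1
in_range <- d$x >= 20 & d$x <= 30
share_20_30 <- sum(d$y[in_range]) * step    # area between 20 and 30 cm

total_area
share_20_30
mean(dbh >= 20 & dbh <= 30)                 # observed share, for comparison
```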
#Q9 - Investigating a Subset
#Histogram of DDF Forest Type, no. of trees x dbh
ggplot(subset(trees, forestType == "DDF"),aes(x = dbh)) +
geom_histogram(bins = 50)+labs(x = "DBH (cm)", y = "No. of DDF Forest Type Trees",
title = "No. of DDF Forest Type Trees by DBH in Thai forests (50 bins)",
caption = "Source: Thai ForestGEO")
9A) Is it symmetric (balanced on both sides) or skewed (with a tail to the left or right)?
It is skewed to the right (positively skewed): most DDF trees have a low DBH, fewer have a DBH above 50 cm, and a long tail extends to the right of the histogram.
9B) What ecological process might explain this pattern?
Perhaps the DDF stand is relatively young, so fewer trees have had time to reach a DBH of 50 cm or more. Ongoing regeneration could also contribute: if many small trees are recruited and only some survive to grow large, small size classes will outnumber large ones.
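A quick numeric check of the direction of skew, as a sketch: a positive sample skewness, or a mean above the median, indicates a long right tail. The sample below is hypothetical. On the real data the vector would be `subset(trees, forestType == "DDF")$dbh`. The `skewness()` helper is written here for illustration, because base R has no built-in skewness function.

```r
# Hypothetical right-skewed DBH sample (cm); stands in for the DDF subset
set.seed(7)
dbh <- rlnorm(1000, meanlog = log(25), sdlog = 0.6)

# Moment-based sample skewness: positive = long right tail, negative = long left tail
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skewness(dbh)
mean(dbh) > median(dbh)  # a right tail pulls the mean above the median
```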
#Q10 Design your own figure
#ggplot of dbh and height relationship in DDF forest type, faceted into species with smooth curve overlay
p <- ggplot(data = subset(trees, forestType == "DDF"), mapping = aes(x = dbh, y = height))
p + geom_point(colour = 'purple', size = 0.50, shape = 5) + geom_smooth(colour='red') + facet_wrap(~sppCode) +
labs(x = "DBH (cm)", y = "Height (m)",
title = "DBH-Height relationships in Forest Type DDF(deciduous dipterocarp forest) by Species, in Thai forests",
caption = "Source: Thai ForestGEO")
10A) What the figure reveals about the data set.
There is a lot of variability in each species' DBH–height relationship within the DDF forest type. The figure also shows how much the number of data points varies between species, with many more observations for some (DIPTOB, SHOROB, DIPTTU and SHORSI) than for others (LAGECA, PINUKE, TECTGR, etc.).
10B) What does it show that would be harder to see using a different type of visualisation?
The smooth curve depicts the trend in the relationship between two continuous variables, height and DBH. This would be harder to see in a box plot, which summarises one variable at a time. A box plot would be better for comparing species' DBH or height side by side, outlining each species' centre, spread, skewness and potential outliers, but it would not show how height changes with DBH.