Report for Nitrogen % in Fuzzy Box and Grey Box Eucalypt species

ENVX1002 Project 2

Part A

Introduction

Originally, I wanted to find my own data relating meat consumption per country to the average income of that country. However, after spending hours searching for a good data set I could not find anything suitable, so I decided to move on and use the leaf nitrogen data instead. As part of the Air bears project, Mat, Tom, Brad, Gab and Floris conducted research around the Gunnedah area, measuring leaf nitrogen content and spectral reflectance from leaves at various sites. Nitrogen levels in Eucalyptus foliage are important to koalas for survival (School of Biological Science, 2012).

I chose this data as it has practical applications to my course as well as to interest areas like agriculture, land management and conservation. After the data set was downloaded, I looked at the 7 different columns (ID, Sample ID, Time, Site ID, Tree species, Mean leaf nitrogen and Measurements of spectral reflectance) and 246 separate data collections (based on 7 different eucalypt species). I decided to compare Mean Leaf Nitrogen Percentage (numerical variable) between 2 tree species, Grey Box and Fuzzy Box (categorical variable). I am interested to see whether species affects the mean leaf nitrogen percentage within leaves.

Aim: To investigate whether there is a significant difference in the mean leaf nitrogen percentage between two tree species, Grey Box and Fuzzy Box.

Prediction: Fuzzy Box will be more commonly used by koalas in the Gunnedah area, as Grey Box trees grow in close proximity to highways and the noise might deter koalas (North West Ecological Services, 2016).

Exploratory data analysis

Data selection and entry: To address the aim and hypothesis, I decided to compare two of the most frequently occurring eucalypt species in the data set and their leaf nitrogen levels.

Code
# Load the required libraries
library(readxl)
library(dplyr)
library(knitr)
library(kableExtra)
library(ggplot2)
library(moments)

I subset the data using the dplyr package to keep only these two species.

Code
# Load the data
eucalypt <- read_excel("data/proj2data.xlsx")
Code
eucalypt2 <- eucalypt %>%
    filter(`TreeSp` %in% c("Grey Box", "Fuzzy Box")) %>% # keep only the Grey Box and Fuzzy Box rows
    na.omit() # remove rows with missing values
Code
knitr::kable(eucalypt2)
Table 1: Mean Nitrogen Content % in Grey Box and Fuzzy Box Eucalypt trees
TreeSp MeanN%
Grey Box 1.38760
Grey Box 1.38760
Grey Box 1.18190
Grey Box 1.18190
Grey Box 1.18190
Grey Box 1.19385
Grey Box 1.19385
Grey Box 1.19385
Grey Box 1.36545
Grey Box 1.36545
Grey Box 1.36545
Grey Box 1.55205
Grey Box 1.55205
Grey Box 1.55205
Grey Box 1.06485
Grey Box 1.06485
Grey Box 1.06485
Grey Box 1.27440
Grey Box 1.27440
Grey Box 1.27440
Grey Box 1.12940
Grey Box 1.12940
Grey Box 1.12940
Grey Box 0.99785
Grey Box 0.99785
Grey Box 0.99785
Grey Box 1.32385
Grey Box 1.32385
Grey Box 1.32385
Grey Box 1.28415
Grey Box 1.28415
Grey Box 1.28415
Grey Box 1.20030
Grey Box 1.20030
Grey Box 1.20030
Fuzzy Box 1.75075
Fuzzy Box 1.75075
Fuzzy Box 1.75075
Fuzzy Box 1.75300
Fuzzy Box 1.75300
Fuzzy Box 1.75300
Fuzzy Box 1.45265
Fuzzy Box 1.45265
Fuzzy Box 1.45265
Fuzzy Box 1.48875
Fuzzy Box 1.48875
Fuzzy Box 1.48875
Fuzzy Box 1.86090
Fuzzy Box 1.86090
Fuzzy Box 1.86090
Fuzzy Box 1.65560
Fuzzy Box 1.65560
Fuzzy Box 1.65560
Fuzzy Box 1.64225
Fuzzy Box 1.64225
Fuzzy Box 1.64225
Fuzzy Box 1.74150
Fuzzy Box 1.74150
Fuzzy Box 1.74150
Fuzzy Box 1.65315
Fuzzy Box 1.65315
Fuzzy Box 1.65315
Fuzzy Box 1.45140
Fuzzy Box 1.45140
Fuzzy Box 1.45140
Fuzzy Box 1.38760

From Table 1 it can be seen that the subset contains data for only the Fuzzy Box and Grey Box tree species. The table also presents the mean leaf nitrogen percentage for each sample.

Code
# Summary statistics
eucalypt2 %>%
    group_by(`TreeSp`) %>%
    summarise(
        mean = mean(`MeanN%`),
        median = median(`MeanN%`),
        sd = sd(`MeanN%`),
        IQR = IQR(`MeanN%`),
        n = n(),
        skewness = skewness(`MeanN%`)
    ) %>%
    kable()
Table 2: Summary of Nitrogen in Eucalyptus species
TreeSp mean median sd IQR n skewness
Fuzzy Box 1.636692 1.65315 0.1411466 0.2620 31 -0.1986282
Grey Box 1.242267 1.20030 0.1474394 0.1682 35 0.3425129

From the summary statistics in Table 2 and the boxplot below (Figure 1), we can see that the mean nitrogen content of Fuzzy Box is higher than that of Grey Box. The same is true for the median (Table 2). Figure 1 shows a slight right skew for Grey Box and a slight left skew for Fuzzy Box. Both groups have a relatively low standard deviation, meaning that the data points are close to the mean. The IQR shows that the spread of the middle 50% of data points is wider in Fuzzy Box than in Grey Box.

Code
# Boxplot
eucalypt2 %>%
    ggplot(aes(x = `TreeSp`, y = `MeanN%`, fill = `TreeSp`)) +
    geom_boxplot() +
    labs(
        x = "Tree Species",
        y = "Mean Nitrogen Percentage %",
    ) +
    theme_minimal()

Figure 1: Boxplot of Nitrogen in Fuzzy Box and Grey box.

Hypothesis testing (HATPC)

Checking the structure of the data:

Code
str(eucalypt2)
tibble [66 × 2] (S3: tbl_df/tbl/data.frame)
 $ TreeSp: chr [1:66] "Grey Box" "Grey Box" "Grey Box" "Grey Box" ...
 $ MeanN%: num [1:66] 1.39 1.39 1.18 1.18 1.18 ...

Hypothesis (H)

Statistical hypothesis

\(H_0\): The mean nitrogen percentage of fuzzy box trees is equal to the mean nitrogen percentage of grey box trees.

\(H_1\): The mean nitrogen percentage of fuzzy box trees is not equal to the mean nitrogen percentage of grey box trees.

\[H_0: \mu_F = \mu_G\] \[H_1: \mu_F \neq \mu_G\]

Assumptions (A)

Normality

Code
ggplot(eucalypt2, aes(x = `MeanN%`, fill = TreeSp)) +
  geom_histogram(position = "identity", bins = 20, alpha = 0.6, color = "black") +
  labs(x = "Mean Nitrogen Percentage", y = "Frequency", fill = "Tree Species") +
  theme_minimal()

Figure 2: Histogram of Nitrogen in Fuzzy Box and Grey box.
Code
ggplot(eucalypt2, aes(sample = `MeanN%`)) +
    stat_qq() +
    stat_qq_line() +
    facet_wrap(~TreeSp)

Figure 3: QQ Plot of Nitrogen in Fuzzy Box and Grey box.

Figure 3 shows that the data are not normally distributed, so a transformation of the data is required.

Homogeneity of Variance

From Figure 1, there is some indication that the variances may not be equal. We can test this assumption using Bartlett’s test or Levene’s test.

Code
bartlett.test(`MeanN%` ~ TreeSp, data = eucalypt2)

    Bartlett test of homogeneity of variances

data:  MeanN% by TreeSp
Bartlett's K-squared = 0.059578, df = 1, p-value = 0.8072

With a p-value of 0.8072, we would fail to reject the null hypothesis, indicating that there is no significant evidence to suggest that the variances differ across groups.
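Levene’s test is mentioned above as an alternative but is not run in this report. As a rough sketch only, the median-centred (Brown-Forsythe) version can be written in base R as a one-way ANOVA on the absolute deviations from each group median. The values are typed in from Table 1, and the names `grey`, `fuzzy`, `abs_dev` and `levene` are my own for illustration; this is not output from the original analysis.

```r
# Values typed in from Table 1 (35 Grey Box, 31 Fuzzy Box)
grey <- c(rep(1.38760, 2),
          rep(c(1.18190, 1.19385, 1.36545, 1.55205, 1.06485, 1.27440,
                1.12940, 0.99785, 1.32385, 1.28415, 1.20030), each = 3))
fuzzy <- c(rep(c(1.75075, 1.75300, 1.45265, 1.48875, 1.86090, 1.65560,
                 1.64225, 1.74150, 1.65315, 1.45140), each = 3),
           1.38760)

nitrogen <- c(grey, fuzzy)
species <- factor(rep(c("Grey Box", "Fuzzy Box"),
                      times = c(length(grey), length(fuzzy))))

# Absolute deviation of each value from its own group median
abs_dev <- abs(nitrogen - ave(nitrogen, species, FUN = median))

# Levene-type test: one-way ANOVA on the absolute deviations
levene <- anova(lm(abs_dev ~ species))
print(levene)
levene_p <- levene[["Pr(>F)"]][1]
```

A large `levene_p` would agree with the Bartlett result above. This version is less sensitive to non-normality than Bartlett’s test, which is relevant here because the normality assumption is in doubt.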

Distribution of data

Code
eucalypt2 %>%
  group_by(TreeSp) %>%
  summarise(skewness = skewness(`MeanN%`), kurtosis = kurtosis(`MeanN%`))
# A tibble: 2 × 3
  TreeSp    skewness kurtosis
  <chr>        <dbl>    <dbl>
1 Fuzzy Box   -0.199     1.85
2 Grey Box     0.343     2.79

Fuzzy Box has a skewness of -0.199, indicating a slight left skewness. Grey Box has a skewness of 0.343, indicating a slight right skewness.

Both kurtosis values were less than 3, indicating distributions with light tails and a flat centre, and therefore few or no outliers.
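To make the numbers above easier to interpret, here is a minimal base R sketch of moment-based skewness and kurtosis. It assumes the definitions used by the moments package are the population-moment ones (divisor n, with kurtosis not reduced by 3, so a normal distribution scores 3). The helper names `central_moment`, `moment_skewness` and `moment_kurtosis` are hypothetical, and the Grey Box values are typed in from Table 1.

```r
# k-th central moment with divisor n (population moment)
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: third central moment scaled by variance^(3/2)
moment_skewness <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5

# Kurtosis (Pearson, not excess): fourth central moment over variance^2
moment_kurtosis <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

# A symmetric toy sample: skewness is 0 and kurtosis is 1.7 (flatter than normal)
toy <- c(1, 2, 3, 4, 5)
toy_skew <- moment_skewness(toy)
toy_kurt <- moment_kurtosis(toy)
print(c(skewness = toy_skew, kurtosis = toy_kurt))

# Grey Box values typed in from Table 1
grey <- c(rep(1.38760, 2),
          rep(c(1.18190, 1.19385, 1.36545, 1.55205, 1.06485, 1.27440,
                1.12940, 0.99785, 1.32385, 1.28415, 1.20030), each = 3))
print(c(skewness = moment_skewness(grey), kurtosis = moment_kurtosis(grey)))
```

If the assumption about the definitions holds, the Grey Box values printed here should be close to those in the tibble above.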

Transformation of Data

Square root transformation

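A square root transformation is usually suggested for moderate positive skew (skewness between 0.5 and 1). Here the kurtosis is less than 3 and the skewness values (-0.199 and 0.343) are smaller than that, so it was tried first as the mildest transformation. However, the QQ plot below shows that the data are still not normally distributed, so I moved on to the next transformation.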

Code
Meansqr <- sqrt(eucalypt2$`MeanN%`)
Code
ggplot(eucalypt2, aes(sample = Meansqr)) +
    stat_qq() +
    stat_qq_line() +
    facet_wrap(~TreeSp)

Logarithmic transformation

The QQ plot below shows that the log-transformed data are still not normally distributed, so I moved on to the last transformation.

Code
MeanLog10 <- log10(eucalypt2$`MeanN%`)
Code
ggplot(eucalypt2, aes(sample = MeanLog10)) +
    stat_qq() +
    stat_qq_line() +
    facet_wrap(~TreeSp)

Reciprocal transformation

This also did not make the data normally distributed, so I have to perform a non-parametric test (Mann-Whitney U test).

Code
Meanrec <- 1/eucalypt2$`MeanN%`
Code
ggplot(eucalypt2, aes(sample = Meanrec)) +
    stat_qq() +
    stat_qq_line() +
    facet_wrap(~TreeSp)
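The QQ plots above are judged by eye. As a possible numerical complement, the sketch below runs a Shapiro-Wilk test within each species for the raw data and for each of the three transformations. It is only an illustration: the values are typed in from Table 1, the names `transforms` and `shapiro_p` are my own, and no output from it is reported in this document.

```r
# Values typed in from Table 1 (35 Grey Box, 31 Fuzzy Box)
grey <- c(rep(1.38760, 2),
          rep(c(1.18190, 1.19385, 1.36545, 1.55205, 1.06485, 1.27440,
                1.12940, 0.99785, 1.32385, 1.28415, 1.20030), each = 3))
fuzzy <- c(rep(c(1.75075, 1.75300, 1.45265, 1.48875, 1.86090, 1.65560,
                 1.64225, 1.74150, 1.65315, 1.45140), each = 3),
           1.38760)

# The raw scale plus the three transformations tried above
transforms <- list(
  raw        = identity,
  sqrt       = sqrt,
  log10      = log10,
  reciprocal = function(x) 1 / x
)

# Shapiro-Wilk p-value for each species under each transformation
shapiro_p <- sapply(transforms, function(f) {
  c(`Grey Box`  = shapiro.test(f(grey))$p.value,
    `Fuzzy Box` = shapiro.test(f(fuzzy))$p.value)
})
print(round(shapiro_p, 4))
```

Each column is one transformation and each row one species. A p-value below 0.05 in a cell is evidence against normality for that species on that scale.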

Assumption of normality

To test the assumption of normality, we can compare the mean and the median, use the Shapiro-Wilk test on the difference, and check the qqplot.

Code
# Calculate the difference in MeanN% between Grey Box and Fuzzy Box
eucalyptdiff <- tibble(`MeanN%` = filter(eucalypt2, `TreeSp` == "Grey Box") %>% pull(`MeanN%`) - filter(eucalypt2, `TreeSp` == "Fuzzy Box") %>% pull(`MeanN%`))
Warning in filter(eucalypt2, TreeSp == "Grey Box") %>% pull(`MeanN%`) - :
longer object length is not a multiple of shorter object length
Code
# Shapiro-Wilk test
sharp <- shapiro.test(eucalyptdiff$`MeanN%`)
sharp

    Shapiro-Wilk normality test

data:  eucalyptdiff$`MeanN%`
W = 0.95078, p-value = 0.1197

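The Shapiro-Wilk test for normality is not significant (W = 0.95078, p = 0.1197), which suggests that the differences are consistent with a normal distribution. However, the two groups are independent samples of unequal size (35 and 31), so the subtraction above recycles values (see the warning) and the differences do not represent true pairs; this result should be interpreted with caution.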

Code
# QQ plot
ggplot(eucalyptdiff, aes(sample = `MeanN%`)) +
    stat_qq() +
    stat_qq_line() +
    theme_minimal()

Figure 4: QQ plot of the difference in Mean Nitrogen percentage in Fuzzy Box and Grey Box

Looking at the QQ plot, most of the data points lie close to the line; however, some points at the ends sit a little further away and curve, suggesting the distribution could be slightly kurtotic.

Code
eucalyptdiff %>%
  summarise(
    mean = mean(`MeanN%`),
    median = median(`MeanN%`)
  ) %>%
  kable()
Table 3: Summary of the difference in Nitrogen % (Grey Box minus Fuzzy Box)
mean median
-0.4075243 -0.3812

Finally, the mean and median calculated above (Table 3) are similar, so we can assume that the differences are reasonably normally distributed for a relatively small sample size.

Test (T) & P-value (P)

Non Parametric test

As the QQ plots showed that the data within each species were not normally distributed, even after transformation, the non-parametric Mann-Whitney U test (Wilcoxon rank sum test) was used to test the hypothesis. This test does not assume normality or symmetry, so the most appropriate summary statistic for comparing the two groups is the median, together with the difference in the distributions of the two groups.

Code
wilcox.test(`MeanN%` ~ TreeSp, data = eucalypt2)
Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
compute exact p-value with ties

    Wilcoxon rank sum test with continuity correction

data:  MeanN% by TreeSp
W = 1054, p-value = 4.984e-11
alternative hypothesis: true location shift is not equal to 0

The results indicate that the mean nitrogen percentage of Fuzzy Box and Grey Box eucalypt trees is significantly different (W = 1054, p-value = 4.984e-11).
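To show where the W statistic comes from, the sketch below recomputes it by hand from ranks: all 66 values are ranked together (ties get the average rank), the ranks of the first group are summed, and the smallest possible rank sum n(n + 1)/2 is subtracted. Fuzzy Box is treated as the first group because it comes first alphabetically in `TreeSp`. The values are typed in from Table 1 and the names `grey`, `fuzzy`, `ranks` and `builtin` are my own for illustration.

```r
# Values typed in from Table 1 (35 Grey Box, 31 Fuzzy Box)
grey <- c(rep(1.38760, 2),
          rep(c(1.18190, 1.19385, 1.36545, 1.55205, 1.06485, 1.27440,
                1.12940, 0.99785, 1.32385, 1.28415, 1.20030), each = 3))
fuzzy <- c(rep(c(1.75075, 1.75300, 1.45265, 1.48875, 1.86090, 1.65560,
                 1.64225, 1.74150, 1.65315, 1.45140), each = 3),
           1.38760)

n_fuzzy <- length(fuzzy)

# Rank all observations together; tied values share their average rank
ranks <- rank(c(fuzzy, grey))

# W = rank sum of the first group minus its minimum possible value
W <- sum(ranks[seq_len(n_fuzzy)]) - n_fuzzy * (n_fuzzy + 1) / 2
print(W)

# Same test via the built-in function (normal approximation because of ties)
builtin <- wilcox.test(fuzzy, grey, exact = FALSE)
print(builtin$statistic)
```

W counts the Fuzzy Box and Grey Box pairs in which the Fuzzy Box value is larger (a tie counts as half). The hand calculation gives 1054, matching the output above, out of a maximum of 31 × 35 = 1085 pairs, so almost every Fuzzy Box value exceeds almost every Grey Box value.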

Conclusion (C)

Statistical conclusion

Since the p-value (4.984e-11) is much smaller than 0.05, we reject the null hypothesis. Based on the Wilcoxon rank sum test results, we conclude that there is a statistically significant difference in the mean nitrogen percentage between fuzzy box trees and grey box trees.

Scientific conclusion

In conclusion, the results show that there is a significant difference in the mean nitrogen percentage between Fuzzy Box trees and Grey Box trees (p < 0.001). Table 2 and Figure 1 show that Fuzzy Box eucalypts have a higher nitrogen percentage than Grey Box. This suggests that koalas may prefer Fuzzy Box trees.

Part B

A 2014 study conducted in the forests between Bermagui and Tathra looked at the influence of leaf chemistry on the distribution and ecology of a low-density population of koalas, aiming to understand whether leaf chemistry influenced which trees koalas visited. The results and models revealed a preference among koalas for eucalypt trees with higher nitrogen concentration compared to neighbouring conspecifics, indicating selective foraging behaviour based on nutritional quality. The analyses used spot assessment methods, linear mixed models and algorithms, near-infrared reflectance spectroscopy to predict foliar chemistry, and comparisons between trees (Stalenberg et al., 2014).

Words: 92

Part C

References

Stalenberg, E., Wallis, I. R., Cunningham, R. B., Allen, C., & Foley, W. J. (2014). Nutritional Correlates of Koala Persistence in a Low-Density Population. PLOS ONE, 9(12), e113930. https://doi.org/10.1371/journal.pone.0113930

The above paper has been published in a peer-reviewed journal and is a reliable source of information, as it has gone through rigorous evaluation and editing, the authors are reputable, and the methods are consistent and could be repeated.

OpenAI. (n.d.). ChatGPT (Version 3.5) [Computer software]. Retrieved from https://openai.com/chatgpt

I used ChatGPT to help me work out which species I should select, to get code for the different transformations, and for examples of ggplots. ChatGPT draws on a diverse range of text sources, including books, articles and websites, meaning that its answers draw on multiple sources.

For most of the report I used class resources. This included lectures as well as tutorials and labs. These helped with visual examples of what the data should look like, as well as with understanding how to interpret them.

I also used Ed discussion for a number of questions, as at the beginning I was very confused about why none of the transformations would make the data ‘normal’. Someone else had already asked this question, and I then understood that I needed to use a non-parametric test.

Other websites

North West Ecological Services 2016, Gunnedah Koala Conservation Plan, viewed 1 May 2024, https://gunnedah.nsw.gov.au/index.php/listfiles/preview?path=GunnedahShireCouncil%252FENVIRONMENT-WILDLIFE-AND-WASTE%252FENVIRONMENTAL-MANAGEMENT%252FGunnedahKoalaConservationPlan%252FGunnedah%2BKoala%2BConservation%2BLandscape%2BPlan%2BNovember%2B2016.pdf.

Reddit - Dive into anything 2021, Reddit.com, viewed 3 May 2024, https://www.reddit.com/r/rstats/comments/kvl66f/correct_use_of_shapirotest_shapiro_wilk/?rdt=57582.

Assessing the Assumption of Normality · UC Business Analytics R Programming Guide 2024, Github.io, viewed 3 May 2024, https://uc-r.github.io/assumptions_normality#:~:text=Shapiro%2DWilk%20Test%20for%20Normality&text=If%20the%20test%20is%20non,different%20from%20a%20normal%20distribution..

Koala habitat 2023, NSW Environment and Heritage, viewed 3 May 2024, https://www.environment.nsw.gov.au/topics/animals-and-plants/native-animals/native-animal-facts/koala/koala-habitat.

Koalas selective about eucalyptus leaves at mealtime: Koalas selected leaves with more nitrogen, fewer toxins 2014, ScienceDaily, viewed 3 May 2024, https://www.sciencedaily.com/releases/2014/12/141203142544.htm#:~:text=They%20found%20that%20koalas%20visited,tree%20of%20the%20same%20species..

LibGuides: SPSS Tutorials: Independent Samples t Test 2018, Kent.edu, viewed 3 May 2024, https://libguides.library.kent.edu/SPSS/IndependentTTest.