if (!require("tidyverse")) install.packages("tidyverse"); library("tidyverse") #install and load tidyverse

## Loading required package: tidyverse

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

if (!require("neonUtilities")) install.packages("neonUtilities"); library("neonUtilities") #used to download NEON data

## Loading required package: neonUtilities
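The `require()`/`install.packages()` pattern above can be wrapped in a small helper when a project needs several packages. This is a minimal sketch: `ensure_packages()` is a hypothetical name, not part of any package. It uses `requireNamespace()` to check each package, installs only the missing ones, and then attaches all of them with `library()`.

```r
# Hypothetical helper: install any missing packages, then attach all of them.
ensure_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) when a package is not installed
  missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  # Attach each package so its functions can be called directly
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage in this project:
# ensure_packages(c("tidyverse", "neonUtilities"))
```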

Choose NEON Data

Use Explore Data by Location to start with a map-based view of sites.

Or use Explore Data Products to start with the kinds of data that are available. In the Filter panel on the left-hand side, I recommend you stick with the Theme “Organisms, Populations, and Communities”, because you’ve learned the tools to work with that kind of data.

We already explored the mosquito pathogen status for all NEON sites across time, so choose a different data set.

Below, type a few sentences on what data you plan on using. Include which sites and which data sets you will use. It may also be appropriate to include which year(s) you are examining.

1) WHAT DATA DO YOU PLAN ON USING?

I plan on using the NEON data product “Aquatic plant bryophyte chemical properties” (DP1.20063.001), focusing on the Arikaree River (ARIK) and comparing it with other NEON aquatic sites. I will draw on two main sources: the National Institutes of Health (NIH), which explains how these chemical properties work, and the National Ecological Observatory Network (NEON) Data Portal, which provides the total nitrogen and carbon measurements I need for the analysis. I will examine the years 2014 to 2025 so that the data are as current as possible.

Explore the data to understand it

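# The Explore Data Products page is a website for browsing, not a CSV file.
# The same catalog information is available as JSON from the NEON API.
product <- jsonlite::fromJSON("https://data.neonscience.org/api/v0/products/DP1.20063.001")
product$data$productName  # name of the data product
product$data$siteCodes$siteCode  # sites where the product has data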

2) COMPLETE THE CODE CHUNK BELOW

In the code below, fill in details for dpID and site based on the data you decided to examine.

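# Download the data product from NEON (slow; run once, then comment out)
#datalist = loadByProduct(dpID="DP1.20063.001", site="KONZ", check.size = FALSE) 
# Save a local copy so later runs do not need to download again
#saveRDS(datalist,file="NEONdatalist_KONZ")
# Load the saved local copy
datalist <- readRDS(file = "NEONdatalist_KONZ")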

After executing the above code once, add # in front of the line that starts with datalist = loadByProduct and the line that starts with saveRDS. Commenting out those two lines makes the code run much faster each time, because it then loads the data from the local copy instead of downloading it again.
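Instead of commenting lines in and out by hand, the same "download once, then reuse the local copy" idea can be written with `file.exists()`. This is a sketch: `load_cached()` is a hypothetical helper, and its `fetch` argument stands in for whatever download call you use, such as the `loadByProduct()` line above.

```r
# Hypothetical helper: reuse a saved copy if it exists, otherwise fetch and save.
load_cached <- function(path, fetch) {
  if (file.exists(path)) {
    # Fast path: read the local copy saved by an earlier run
    return(readRDS(path))
  }
  # Slow path: run the download once and save the result for next time
  result <- fetch()
  saveRDS(result, file = path)
  result
}

# Usage with neonUtilities (needs an internet connection on the first run):
# datalist <- load_cached("NEONdatalist_KONZ", function() {
#   loadByProduct(dpID = "DP1.20063.001", site = "KONZ", check.size = FALSE)
# })
```

The first call runs `fetch()` and writes the file; every later call skips the download entirely.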

If needed, you can load multiple products and store each one in a separate datalist variable.

names(datalist)

## [1] "categoricalCodes_10041"      "citation_10041_RELEASE-2024"
## [3] "issueLog_10041"              "mos_pathogenpooling"        
## [5] "mos_pathogenresults"         "readme_10041"               
## [7] "validation_10041"            "variables_10041"

From the list of files downloaded, determine which file has the data you want to look at. Replace the xxx in the code chunk below with the name of the file you will be using. If you use multiple files, include a line of code for each file.
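One way to narrow the list is to set aside the metadata tables. In the listing above, the metadata tables share prefixes such as `variables_` and `readme_`, so a regular expression can separate them from the data tables. This sketch assumes those prefixes, taken from the names printed above, cover every metadata table in your download; check the result against `names(datalist)`.

```r
# Table names as printed by names(datalist) above
table_names <- c("categoricalCodes_10041", "citation_10041_RELEASE-2024",
                 "issueLog_10041", "mos_pathogenpooling",
                 "mos_pathogenresults", "readme_10041",
                 "validation_10041", "variables_10041")

# Metadata tables start with one of these prefixes followed by an underscore
meta_pattern <- "^(categoricalCodes|citation|issueLog|readme|validation|variables)_"

# Keep only the names that do not match the metadata pattern
data_tables <- table_names[!grepl(meta_pattern, table_names)]
data_tables
# [1] "mos_pathogenpooling" "mos_pathogenresults"
```

With your own download, replace `table_names` with `names(datalist)`.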

3) COMPLETE THE CODE CHUNK BELOW

datalist$mos_pathogenpooling

## # A tibble: 224 × 12
##    uid     namedLocation domainID siteID startCollectDate    endCollectDate     
##    <chr>   <chr>         <chr>    <chr>  <dttm>              <dttm>             
##  1 4688e6… KONZ          D06      KONZ   2016-05-18 22:12:00 2016-05-20 13:18:00
##  2 3e65a9… KONZ          D06      KONZ   2016-05-18 22:12:00 2016-05-20 13:18:00
##  3 c1a950… KONZ          D06      KONZ   2016-05-18 22:12:00 2016-05-20 13:18:00
##  4 71681a… KONZ          D06      KONZ   2016-05-18 22:12:00 2016-05-20 13:18:00
##  5 ddb1c7… KONZ          D06      KONZ   2016-05-18 22:12:00 2016-05-20 13:18:00
##  6 ac6d14… KONZ          D06      KONZ   2016-05-18 22:12:00 2016-05-20 13:18:00
##  7 f1a6c8… KONZ          D06      KONZ   2016-05-18 22:12:00 2016-05-20 13:18:00
##  8 026a43… KONZ          D06      KONZ   2016-06-01 22:45:00 2016-06-03 13:43:00
##  9 2d419d… KONZ          D06      KONZ   2016-06-01 22:45:00 2016-06-03 13:43:00
## 10 f6cbea… KONZ          D06      KONZ   2016-06-01 22:45:00 2016-06-03 13:43:00
## # ℹ 214 more rows
## # ℹ 6 more variables: testingID <chr>, testingVialID <chr>, poolSize <dbl>,
## #   dataQF <chr>, publicationDate <chr>, release <chr>

Use the select() function to select just those columns from above that appear to contain the interesting data you want to examine.

If you would like some guidance, refer back to the steps we did in BIOS100-NEON for the mos_pathogenpooling file. You are repeating those steps here, but for the data you are interested in.
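As an illustration of `select()`, the sketch below keeps three columns from a small stand-in for `mos_pathogenpooling`. The column names come from the table printed above, but the `uid`, `testingID`, and `poolSize` values are made-up placeholders, not NEON data. With your own download, you would pipe `datalist$` followed by your table name into `select()` instead.

```r
library(dplyr)

# Stand-in rows with placeholder values; column names match the printed table
pooling_demo <- tibble(
  uid = c("a1", "b2", "c3"),
  siteID = c("KONZ", "KONZ", "KONZ"),
  startCollectDate = as.POSIXct(c("2016-05-18 22:12:00", "2016-05-18 22:12:00",
                                  "2016-06-01 22:45:00"), tz = "UTC"),
  testingID = c("t1", "t2", "t3"),
  poolSize = c(10, 25, 7)
)

# Keep only the columns of interest, in the order listed
pooling_small <- pooling_demo |>
  select(siteID, startCollectDate, poolSize)

pooling_small
```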

datalist$variables_10041

## # A tibble: 59 × 9
##    table   fieldName description dataType units downloadPkg pubFormat primaryKey
##    <chr>   <chr>     <chr>       <chr>    <chr> <chr>       <chr>     <chr>     
##  1 mos_pa… uid       Unique ID … string   <NA>  basic       asIs      N         
##  2 mos_pa… namedLoc… Name of th… string   <NA>  basic       asIs      N         
##  3 mos_pa… domainID  Unique ide… string   <NA>  basic       asIs      N         
##  4 mos_pa… siteID    NEON site … string   <NA>  basic       asIs      N         
##  5 mos_pa… startCol… Earliest k… dateTime <NA>  basic       yyyy-MM-… N         
##  6 mos_pa… endColle… Latest kno… dateTime <NA>  basic       yyyy-MM-… N         
##  7 mos_pa… testingID Identifier… string   <NA>  basic       asIs      N         
##  8 mos_pa… testingV… Identifier… string   <NA>  basic       asIs      Y         
##  9 mos_pa… poolSize  Number of … unsigne… numb… basic       integer   N         
## 10 mos_pa… dataQF    Data quali… string   <NA>  basic       asIs      N         
## # ℹ 49 more rows
## # ℹ 1 more variable: categoricalCodeName <chr>

Remember that you can examine the description of variables to help understand which fields contain the data in which you are interested. You will have to replace the 00000 with the number from your file names above, and you will have to fill in the full name of the table inside the quotes.

datalist$variables_10041 |>
  filter(table == "mos_pathogenpooling")

## # A tibble: 12 × 9
##    table   fieldName description dataType units downloadPkg pubFormat primaryKey
##    <chr>   <chr>     <chr>       <chr>    <chr> <chr>       <chr>     <chr>     
##  1 mos_pa… uid       Unique ID … string   <NA>  basic       asIs      N         
##  2 mos_pa… namedLoc… Name of th… string   <NA>  basic       asIs      N         
##  3 mos_pa… domainID  Unique ide… string   <NA>  basic       asIs      N         
##  4 mos_pa… siteID    NEON site … string   <NA>  basic       asIs      N         
##  5 mos_pa… startCol… Earliest k… dateTime <NA>  basic       yyyy-MM-… N         
##  6 mos_pa… endColle… Latest kno… dateTime <NA>  basic       yyyy-MM-… N         
##  7 mos_pa… testingID Identifier… string   <NA>  basic       asIs      N         
##  8 mos_pa… testingV… Identifier… string   <NA>  basic       asIs      Y         
##  9 mos_pa… poolSize  Number of … unsigne… numb… basic       integer   N         
## 10 mos_pa… dataQF    Data quali… string   <NA>  basic       asIs      N         
## 11 mos_pa… publicat… Date of da… dateTime <NA>  appended b… <NA>      N         
## 12 mos_pa… release   Identifier… string   <NA>  appended b… <NA>      N         
## # ℹ 1 more variable: categoricalCodeName <chr>
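The number in names like `variables_10041` is the middle part of the data product ID (these tables came from the mosquito pathogen product, DP1.10041.001). The sketch below builds the table name from a dpID and returns just the field names and descriptions for one table. `describe_fields()` is a hypothetical helper, and it assumes the `variables_NNNNN` naming shown in the output above.

```r
library(dplyr)

# Hypothetical helper: look up field descriptions for one table, building the
# "variables_NNNNN" list element name from the data product ID.
describe_fields <- function(datalist, dpID, table_name) {
  # "DP1.20063.001" -> "20063"
  product_number <- sub("^DP1\\.([0-9]{5})\\.[0-9]{3}$", "\\1", dpID)
  variables <- datalist[[paste0("variables_", product_number)]]
  if (is.null(variables)) {
    stop("No variables table found for ", dpID)
  }
  # Keep the rows for the requested table and the two most useful columns
  variables |>
    filter(table == table_name) |>
    select(fieldName, description)
}

# Usage with the data loaded above:
# describe_fields(datalist, "DP1.10041.001", "mos_pathogenpooling")
```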

Develop a question and a hypothesis

Ask a question that your data has the potential to address. A hypothesis is an informed expectation based on what you know about the system.

4) WHAT IS YOUR QUESTION?

My question is: how does bryophyte percent cover at the Arikaree River (ARIK) compare with other NEON aquatic sites, and how do temperature and moisture influence the habitat conditions that affect bryophyte abundance?

5) WHAT IS YOUR HYPOTHESIS?

I believe bryophyte percent cover at ARIK will be lower than at moister NEON aquatic sites, because the Arikaree River's dry climate and variable, less stable stream flow create drier habitat conditions that limit bryophyte growth.
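One way to start testing this hypothesis is to summarize cover by site group and compare ARIK with the rest. This sketch runs on made-up numbers: the column name `percentCover` and the site codes `SITE_A` and `SITE_B` are placeholders, so check the variables table of the product you download for the actual field name and site codes.

```r
library(dplyr)

# Made-up placeholder data; replace with the bryophyte table from your download
cover_demo <- tibble(
  siteID = c("ARIK", "ARIK", "ARIK", "SITE_A", "SITE_A", "SITE_B", "SITE_B"),
  percentCover = c(2, 5, 3, 20, 30, 15, 25)
)

# Label each row as ARIK or another site, then compare mean cover per group
cover_summary <- cover_demo |>
  mutate(group = if_else(siteID == "ARIK", "ARIK", "other sites")) |>
  group_by(group) |>
  summarise(mean_cover = mean(percentCover), n = n())

cover_summary
```

A lower mean at ARIK would be consistent with the hypothesis; a formal comparison would also need a statistical test and enough samples per site.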

BIOS100-Project

Manar Al-Robaie

Fall 2025