if (!require("tidyverse")) install.packages("tidyverse"); library("tidyverse") #install and load tidyverse
## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
if (!require("neonUtilities")) install.packages("neonUtilities"); library("neonUtilities") #used to download NEON data
## Loading required package: neonUtilities
Use Explore Data by Location to start with a map-based view of sites.
Or use Explore Data Products to start with the kind of data that is available. On the left-hand side Filter, I recommend you stick with the Theme “Organisms, Populations, and Communities”, because you’ve learned the tools to work with that kind of data.
We already explored the mosquito pathogen status for all NEON sites across time, so you should choose a different set of data.
Below, type a few sentences on what data you plan on using. Include information on which sites and which data sets you will use. It may also be appropriate to include information on which year(s) you are examining.
I plan on using the “Aquatic plant bryophyte chemical properties” data product. I will draw on two major sources: the National Institutes of Health (NIH), which explains how this chemistry works, and the National Ecological Observatory Network (NEON) Data Portal, which provides the total nitrogen and carbon sample data needed for the code. I will examine the years 2014-2025 to make sure the data are up to date.
python
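import pandas as pd
# Note: this URL is the Data Portal's browse page, not a CSV file,
# so read_csv() will not return a usable data table from it.
url = "https://data.neonscience.org/data-products/explore"
data = pd.read_csv(url)
data.head()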
In the code below, fill in details for dpID and site based on what data you decided to examine.
#datalist = loadByProduct(dpID="DP1.20063.001", site="KONZ", check.size = F)
#saveRDS(datalist,file="NEONdatalist_KONZ")
datalist <- readRDS(file = "NEONdatalist_KONZ")
After executing the above code once, add # in front of the first and second line, i.e. the line that starts with datalist = loadByProduct and the line that starts with saveRDS. Commenting out those two lines will run the code much faster each time because it will just load the data from the local copy.
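As an alternative to commenting lines in and out by hand, the download-once pattern described above can be sketched with a small helper that checks whether the local copy already exists. This is only an illustration: `load_cached()` and `fake_download()` are hypothetical names that are not part of neonUtilities, and the stand-in download function is used so the sketch runs without network access.

```r
# Hypothetical helper: run `fetch()` only when no cached copy exists at `path`.
load_cached <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))      # fast path: reuse the local copy
  }
  result <- fetch()            # slow path: e.g. a call to loadByProduct()
  saveRDS(result, file = path) # keep a local copy for next time
  result
}

# Stand-in for a slow download (invented data, for illustration only).
# In the assignment, this could instead be something like:
#   function() loadByProduct(dpID = "DP1.20063.001", site = "KONZ", check.size = FALSE)
fake_download <- function() list(example_table = data.frame(x = 1:3))

cache_file <- tempfile(fileext = ".rds")
first  <- load_cached(cache_file, fake_download)  # runs fake_download() and saves
second <- load_cached(cache_file, fake_download)  # reads the saved copy instead
names(second)
```

With this approach the same chunk can be re-run unchanged; deleting the cached file forces a fresh download.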
Optionally, you may need to load multiple products and store them in different datalist variables.
names(datalist)
## [1] "categoricalCodes_10041" "citation_10041_RELEASE-2024"
## [3] "issueLog_10041" "mos_pathogenpooling"
## [5] "mos_pathogenresults" "readme_10041"
## [7] "validation_10041" "variables_10041"
From the list of files downloaded, determine which file has the data you want to look at. Replace the xxx in the code chunk below with the name of the file you will be using. If it is multiple files, include lines of code for each file.
datalist$mos_pathogenpooling
## # A tibble: 224 × 12
## uid namedLocation domainID siteID startCollectDate endCollectDate
## <chr> <chr> <chr> <chr> <dttm> <dttm>
## 1 4688e6… KONZ D06 KONZ 2016-05-18 22:12:00 2016-05-20 13:18:00
## 2 3e65a9… KONZ D06 KONZ 2016-05-18 22:12:00 2016-05-20 13:18:00
## 3 c1a950… KONZ D06 KONZ 2016-05-18 22:12:00 2016-05-20 13:18:00
## 4 71681a… KONZ D06 KONZ 2016-05-18 22:12:00 2016-05-20 13:18:00
## 5 ddb1c7… KONZ D06 KONZ 2016-05-18 22:12:00 2016-05-20 13:18:00
## 6 ac6d14… KONZ D06 KONZ 2016-05-18 22:12:00 2016-05-20 13:18:00
## 7 f1a6c8… KONZ D06 KONZ 2016-05-18 22:12:00 2016-05-20 13:18:00
## 8 026a43… KONZ D06 KONZ 2016-06-01 22:45:00 2016-06-03 13:43:00
## 9 2d419d… KONZ D06 KONZ 2016-06-01 22:45:00 2016-06-03 13:43:00
## 10 f6cbea… KONZ D06 KONZ 2016-06-01 22:45:00 2016-06-03 13:43:00
## # ℹ 214 more rows
## # ℹ 6 more variables: testingID <chr>, testingVialID <chr>, poolSize <dbl>,
## # dataQF <chr>, publicationDate <chr>, release <chr>
Use the select() function to select just those columns from above that appear to contain the interesting data you want to examine.
If you would like some guidance, go back and reference the steps we did in BIOS100-NEON for the mos_pathogenpooling file. You are repeating those steps here, but for the data in which you are interested.
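A minimal sketch of the select() step, assuming a table shaped like the mos_pathogenpooling printout above. The tibble here is a small stand-in with invented values so the example runs on its own; in the assignment, `datalist$mos_pathogenpooling` (or your own table) would take its place, and the columns chosen are just one plausible selection.

```r
library(dplyr)

# Toy stand-in for datalist$mos_pathogenpooling; values are invented for illustration.
pooling <- tibble(
  uid = c("a1", "b2", "c3"),
  siteID = c("KONZ", "KONZ", "KONZ"),
  startCollectDate = as.POSIXct(
    c("2016-05-18 22:12:00", "2016-05-18 22:12:00", "2016-06-01 22:45:00"),
    tz = "UTC"
  ),
  testingVialID = c("vial_1", "vial_2", "vial_3"),
  poolSize = c(50, 12, 30),
  dataQF = NA_character_
)

# Keep only the columns of interest; all rows are retained.
pooling_small <- pooling |>
  select(siteID, startCollectDate, testingVialID, poolSize)

names(pooling_small)
```

Storing the result in a new variable leaves the original table untouched, so other columns can still be selected later.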
datalist$variables_10041
## # A tibble: 59 × 9
## table fieldName description dataType units downloadPkg pubFormat primaryKey
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 mos_pa… uid Unique ID … string <NA> basic asIs N
## 2 mos_pa… namedLoc… Name of th… string <NA> basic asIs N
## 3 mos_pa… domainID Unique ide… string <NA> basic asIs N
## 4 mos_pa… siteID NEON site … string <NA> basic asIs N
## 5 mos_pa… startCol… Earliest k… dateTime <NA> basic yyyy-MM-… N
## 6 mos_pa… endColle… Latest kno… dateTime <NA> basic yyyy-MM-… N
## 7 mos_pa… testingID Identifier… string <NA> basic asIs N
## 8 mos_pa… testingV… Identifier… string <NA> basic asIs Y
## 9 mos_pa… poolSize Number of … unsigne… numb… basic integer N
## 10 mos_pa… dataQF Data quali… string <NA> basic asIs N
## # ℹ 49 more rows
## # ℹ 1 more variable: categoricalCodeName <chr>
Remember that you can examine the description of variables to help understand which fields contain the data in which you are interested. You will have to replace the 00000 with the number from your file names above, and you will have to fill in the full name of the table inside the quotes.
datalist$variables_10041 |>
filter(table == "mos_pathogenpooling")
## # A tibble: 12 × 9
## table fieldName description dataType units downloadPkg pubFormat primaryKey
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 mos_pa… uid Unique ID … string <NA> basic asIs N
## 2 mos_pa… namedLoc… Name of th… string <NA> basic asIs N
## 3 mos_pa… domainID Unique ide… string <NA> basic asIs N
## 4 mos_pa… siteID NEON site … string <NA> basic asIs N
## 5 mos_pa… startCol… Earliest k… dateTime <NA> basic yyyy-MM-… N
## 6 mos_pa… endColle… Latest kno… dateTime <NA> basic yyyy-MM-… N
## 7 mos_pa… testingID Identifier… string <NA> basic asIs N
## 8 mos_pa… testingV… Identifier… string <NA> basic asIs Y
## 9 mos_pa… poolSize Number of … unsigne… numb… basic integer N
## 10 mos_pa… dataQF Data quali… string <NA> basic asIs N
## 11 mos_pa… publicat… Date of da… dateTime <NA> appended b… <NA> N
## 12 mos_pa… release Identifier… string <NA> appended b… <NA> N
## # ℹ 1 more variable: categoricalCodeName <chr>
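The description column is cut off in the printout above. One way to read the descriptions in full is to keep only the columns needed and pull the text out as a plain character vector. The sketch below uses a toy stand-in for the variables table; the field name `exampleField` and all description strings are placeholders, not NEON's actual text.

```r
library(dplyr)

# Toy stand-in for datalist$variables_10041; descriptions are placeholders.
variables <- tibble(
  table = c("mos_pathogenpooling", "mos_pathogenpooling", "mos_pathogenresults"),
  fieldName = c("poolSize", "dataQF", "exampleField"),
  description = c(
    "placeholder description of poolSize",
    "placeholder description of dataQF",
    "placeholder description of exampleField"
  )
)

# Keep the rows for one table and only the two columns needed to read the help text.
field_help <- variables |>
  filter(table == "mos_pathogenpooling") |>
  select(fieldName, description)

# pull() returns the column as a character vector, which prints without truncation.
full_text <- pull(field_help, description)
full_text
```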
Ask a question that your data has the potential to address. A hypothesis is an informed expectation based on what you know about the system.
My question is: how does bryophyte percent cover at the Arikaree River (ARIK) compare with that at other NEON aquatic sites, and how do temperature and moisture influence the habitat conditions that determine bryophyte abundance?
I believe bryophyte percent cover at ARIK will be lower than at wetter NEON aquatic sites, because the Arikaree River's drier climate and variable, less stable stream flow limit bryophyte growth.