BSC 2011L Labs 1 & 2:
Exploring the Ecology and Biodiversity of South Florida and BBC using Systematics, Taxonomy, and the Scientific Process

    Part 2: Reviewing the Scientific Process and an Introduction to Diversity Indices -
                    A Glance into SFL Biodiversity with FIU’s Biscayne Bay and
                    Modesto Maidique Campuses


Intro to Exploring iNaturalist Data with RStudio



Getting Started:

 
Here, we will begin working with iNaturalist data in RStudio. R may seem intimidating, but don’t worry! We do not expect you to write or develop code on your own; most of the code used in this class will be provided for you. However, throughout the course you may need to understand where and how to alter very small portions of the provided code as it applies to your end-of-semester projects. Your TA and LA will be available to help. This is NOT a coding class and it is NOT a statistics class, so we do not expect you to understand the mechanics or theory behind the code or the analyses, only how to apply the code and interpret the results.

Checking the Exported Data:

In order for the code to run correctly, the .csv file exported from iNaturalist MUST be arranged and labeled correctly. Even the smallest typo can keep the script from running. Remember: coding is CasE SeNSiTivE! If the export and download instructions were followed closely, the .csv file should already be arranged correctly. Take a moment to open the .csv files in Excel and double-check that the column names match columns 1-8 below.

Column # | Column Name | Column Notes
1 | time_observed_at | The date/time stamp of the observation in UTC (Coordinated Universal Time), equivalent to GMT (Greenwich Mean Time). There are numerous time zones throughout the world, so having a single, standard time reference is ideal for record keeping and exchanging data. For example, if an event happened on January 1, 2021 at 2:00 PM in Miami, Florida, it would have occurred on January 2, 2021 at 4:00 AM in Tokyo, Japan. In UTC, this event can be described unambiguously as happening on January 1, 2021 at 7:00 PM. For analyses where time of day is important, data must be converted out of UTC and into the local time zone where the data were collected.
2 | time_zone | The time zone in which the observation was made. This information is needed to convert the UTC time back to local time. For example, Eastern Standard Time (EST) is 5 hours behind UTC and Eastern Daylight Time (EDT) is 4 hours behind UTC.
3 | captive_cultivated | Whether the observed organism was captive/cultivated (TRUE) or not (FALSE).
4 | latitude | The latitude of the observation.
5 | longitude | The longitude of the observation.
6 | scientific_name | The scientific name of the observed organism. If the organism was not identified to species, this will NOT be the species name. Instead, it will be the scientific name of the lowest taxonomic group to which the observer identified the specimen.
7 | common_name | The common name of the observed organism. If the organism was not identified to species, this will be the common name of the lowest taxonomic group to which the observer identified the specimen.
8 | iconic_taxon_name | The major taxonomic group assigned by iNaturalist. iNaturalist’s 13 “iconic taxon names” are outlined below.
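
The column check can also be done in R once a file has been read in. The lines below are a minimal sketch and are not part of the provided lab script: example_data is a small hypothetical data frame standing in for an exported file (its values are made up), and expected_cols simply lists the eight column names from the table above.

```r
# The eight column names the lab expects, spelled exactly as in the table
expected_cols <- c("time_observed_at", "time_zone", "captive_cultivated",
                   "latitude", "longitude", "scientific_name",
                   "common_name", "iconic_taxon_name")

# Hypothetical one-row data frame standing in for an iNaturalist export.
# Note the deliberate typo: "Latitude" with a capital L.
example_data <- data.frame(
  time_observed_at   = "2021-09-01 14:30:00 UTC",
  time_zone          = "America/New_York",
  captive_cultivated = FALSE,
  Latitude           = 25.756,
  longitude          = -80.374,
  scientific_name    = "Cardinalis cardinalis",
  common_name        = "Northern Cardinal",
  iconic_taxon_name  = "Aves"
)

# Expected names that are missing from the data (comparison is case-sensitive)
missing_cols <- setdiff(expected_cols, names(example_data))

# Names found in the data that were not expected
extra_cols <- setdiff(names(example_data), expected_cols)

print(missing_cols)  # "latitude"
print(extra_cols)    # "Latitude"
```

If both results are empty, every expected column name is present and spelled correctly.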

iNaturalist time zones in our data set and their equivalent names in R:

iNaturalist | R
America/New_York | America/New_York
Eastern Time (US & Canada) | US/Eastern
Central Time (US & Canada) | US/Central
Pacific Time (US & Canada) | US/Pacific
Arizona | US/Arizona
Hawaii | US/Hawaii
Atlantic Time (Canada) | Canada/Atlantic
UTC | UTC
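
To illustrate why the time_zone column matters, here is a minimal sketch of converting a UTC time stamp to local time in base R. The two time stamps are made up for illustration, and this is not necessarily how the provided lab script handles the conversion. The time zone name comes from the R column of the table above.

```r
# Two hypothetical UTC time stamps: one during standard time, one during daylight time
winter_utc <- as.POSIXct("2021-03-01 15:30:00", tz = "UTC")
summer_utc <- as.POSIXct("2021-07-01 15:30:00", tz = "UTC")

# format() shows the same instant as it would read on a clock in another time zone
winter_local <- format(winter_utc, format = "%Y-%m-%d %H:%M", tz = "America/New_York")
summer_local <- format(summer_utc, format = "%Y-%m-%d %H:%M", tz = "America/New_York")

print(winter_local)  # "2021-03-01 10:30" (EST, 5 hours behind UTC)
print(summer_local)  # "2021-07-01 11:30" (EDT, 4 hours behind UTC)
```

The same UTC clock time maps to two different local times depending on the date, which is why the conversion is done with a named time zone rather than by subtracting a fixed number of hours.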


iNaturalist’s 13 Iconic Taxonomic Names:

Name | Taxonomic Level | Taxon Common Name
Protozoa | Kingdom | Protozoans
Chromista | Kingdom | Kelp, Diatoms, Allies
Plantae | Kingdom | Plants
Fungi | Kingdom | Fungi, Lichens
Animalia | Kingdom | Animals+
Mollusca | Phylum | Mollusks
Mammalia | Class | Mammals
Aves | Class | Birds
Actinopterygii | Class | Ray-finned Fishes
Reptilia | Class | Reptiles
Amphibia | Class | Amphibians
Insecta | Class | Insects
Arachnida | Class | Arachnids

+The organisms in the rows below Animalia also belong to this Kingdom, but iNaturalist uses the “Animalia” iconic taxon as a “catch-all” for animals that don’t fall under any of the more specific iconic taxon groups.


Disclaimer:  For the purpose of this lab, calculations are done per habitat, per iNaturalist’s
                           iconic taxon. In practice, these would be done for the entire system as a
                           whole.

NOTE: This code was designed to run MMC and BBC Data interchangeably
  1. After you have run ALL code for “MMC_getting-started”, go to File > Save
  2. Go to File > New File > R Script
    1. An empty script file should open up in another tab
  3. Come back to this script (MMC_getting-started.R)
  4. Press Ctrl+A (select/highlight all) then Ctrl+C (copy)
  5. Go back to the blank script and press Ctrl+V (paste)
  6. In your new script, press Ctrl+F (find)
  7. A search box should appear above your script window
  8. Press Ctrl+A to select ALL text in your script window
  9. Click on the white check box that says In selection
  10. In the search bar that says Find, type “MMC_”
  11. Click the box next to the Find bar that says ALL
    1. This will highlight all instances of “MMC_” in the selected text
    2. Quickly scroll through and check that only “MMC_” is highlighted
  12. In the bar that says Replace, type “BBC_”
  13. Once you are sure the steps above have been followed correctly, after the bar that says Replace, click All.
    1. This will replace all instances of “MMC_” with “BBC_”
    2. By changing these prefixes of our object and value names, we can use the same code without “saving over” the values and objects already saved in our Global Environment.
  14. Give another quick scroll to make sure changes have been made correctly
  15. Save a copy of the updated script:
    1. Go to: File > Save As and rename the script “BBC_getting-started.R”
    2. Make sure to Save As and not Save. We don’t want to overwrite our MMC script by mistake!
    This message will be repeated below the MMC tutorial as a reminder :)
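
For the curious, the find-and-replace in the steps above has a code equivalent. This is only a sketch to show the idea, and you should still follow the RStudio steps for the lab. Here script_lines is a hypothetical stand-in for a few lines of the MMC script.

```r
# A few lines of text standing in for the contents of the MMC script
script_lines <- c(
  'MMC_data <- read.csv("MMC_data.csv")',
  'unique(MMC_data$iconic_taxon_name)'
)

# Replace every "MMC_" with "BBC_". fixed = TRUE matches the text literally,
# and the match is case-sensitive, so "mmc_" would be left alone.
bbc_lines <- gsub("MMC_", "BBC_", script_lines, fixed = TRUE)

print(bbc_lines)
```

Because the prefix changes in both the object name and the file name, the new script reads BBC_data.csv into a separate BBC_data object and leaves MMC_data untouched.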


——— Filtering and Organizing the Data ————


Exploring MMC iNaturalist Data:



If you haven’t already, open the R script file “MMC_getting-started.R” with RStudio.

When starting work on a new script, it’s always a good idea to clear R’s memory (the objects saved in your Global Environment):

rm(list = ls())

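Before we load our data, we want to make sure all the packages we need are installed with install.packages() and loaded with library(). Packages only need to be installed once, but they must be loaded in every new R session. If a package is not yet installed on your computer, remove the # in front of its install.packages() line below and run that line once. For this section of the lab, we will be using 3 packages: pivottabler, openxlsx, and dplyr.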

#install.packages("pivottabler")
#install.packages("openxlsx")
#install.packages("dplyr")

library("pivottabler")
library("openxlsx")
library("dplyr")
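
If you are not sure whether a package is already installed, the sketch below is one way to check. It is an optional extra and not part of the provided script; the names needed_pkgs, is_installed, and missing_pkgs are made up for this example. It only reports what is missing and does not install anything.

```r
# Packages used in this section of the lab
needed_pkgs <- c("pivottabler", "openxlsx", "dplyr")

# requireNamespace() returns TRUE if a package is installed and can be loaded,
# and FALSE if it is not; quietly = TRUE suppresses start-up messages
is_installed <- vapply(needed_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need install.packages()
missing_pkgs <- needed_pkgs[!is_installed]
print(missing_pkgs)
```

An empty result, character(0), means all three packages are installed and only the library() lines need to be run.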



Setting the Working Directory:

Unlike most common computer programs, RStudio does not let you simply “open” a data file. You must code for it by telling R WHERE the file is located and WHAT the file is called. We do this by setting the working directory, which lets R know where it should look for files. This can be done in code with the function setwd() or by using the keyboard shortcut Ctrl+Shift+H and selecting the appropriate folder. We will use the keyboard shortcut this time. Select the folder location designated by your TA.

Now that you have selected your working directory, let’s check how R records our working directory location:

getwd()
## [1] "C:/Users/Kelle/OneDrive/PhD_Files/Academic Year 2021-2022/Head TA/Fall 2021/BSC2011L/Updated Labs/Lab2/Scripts"



| Your working directory will be different than what is printed here since this script was run on a different computer. Notice how R prints the name of your working directory. By following that same format, you could now write it into your code with the function setwd() instead by typing the name of your working directory inside of the function parentheses using quotation marks (“…”).
| Example: setwd("C:/Users/Kelle/OneDrive/Desktop/Lab2")
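
Before reading in the data, it can help to confirm that R can actually see the file in the working directory. This is a minimal sketch using base R functions. It assumes the data file is named MMC_data.csv, and the names csv_files and mmc_found are made up for this example.

```r
# Print the current working directory
print(getwd())

# List the .csv files R can see in the working directory
csv_files <- list.files(pattern = "\\.csv$")
print(csv_files)

# TRUE if MMC_data.csv is in the working directory, FALSE otherwise
mmc_found <- file.exists("MMC_data.csv")
print(mmc_found)
```

If the last line prints FALSE, the working directory is probably set to the wrong folder or the file name is spelled differently (remember that capitalization matters).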



Reading in the data:

“Reading in” is how we open files in R, here using the function read.csv(). Remember, we are working with our MMC data first. The code below tells R to read in a .csv file called MMC_data.csv and assign it (<-) to an object named “MMC_data”. After running the code, you should see MMC_data appear under Data in your Global Environment.

MMC_data <- read.csv("MMC_data.csv")
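
After reading in a file, it is worth a quick look to confirm it loaded as expected. The sketch below writes a tiny made-up .csv to a temporary file so that it runs without the lab data; example_data and its four rows are hypothetical. With the real data, the same functions would be applied to MMC_data.

```r
# Write a tiny, made-up .csv to a temporary file so the example is self-contained
tmp_file <- tempfile(fileext = ".csv")
writeLines(c(
  "scientific_name,common_name,iconic_taxon_name",
  "Cardinalis cardinalis,Northern Cardinal,Aves",
  "Anolis sagrei,Brown Anole,Reptilia",
  "Danaus plexippus,Monarch,Insecta",
  "Coccoloba uvifera,sea grape,Plantae"
), tmp_file)

# Read the file in with read.csv(), the same function used for MMC_data.csv
example_data <- read.csv(tmp_file)

print(names(example_data))  # the column names
print(nrow(example_data))   # number of rows (observations): 4
print(ncol(example_data))   # number of columns: 3
print(head(example_data))   # the first few rows
```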



Quick Checks:

Let’s take some quick first glances at the data. The function unique() returns each distinct entry in a column of a data frame, with duplicates removed. We can combine that with the function length(), which counts the number of elements in a vector, to count how many distinct entries there are.

We can use the unique() function to tell us a complete list of iconic taxon names that are present in our data frame. If we stack the length function on top of that, length(unique()), we can quickly count how many different iconic taxon names are present in our data frame.

unique(MMC_data$iconic_taxon_name) #list of all iconic taxon names in the data
##  [1] "Actinopterygii" "Aves"           "Plantae"        "Insecta"       
##  [5] "Reptilia"       "Fungi"          "Amphibia"       "Mammalia"      
##  [9] "Arachnida"      "Animalia"       "Mollusca"       ""


length(unique(MMC_data$iconic_taxon_name)) #How many different iconic taxon names are present
## [1] 12
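
Notice that the list above includes an empty entry (""), so the blank is counted as one of the 12. The sketch below shows one way to leave blanks out of the count. It uses a short hypothetical vector named taxa rather than the lab data, and it is not part of the provided script.

```r
# Hypothetical column of iconic taxon names, including two blank entries
taxa <- c("Aves", "Plantae", "", "Aves", "Insecta", "")

print(unique(taxa))          # "Aves" "Plantae" "" "Insecta"
print(length(unique(taxa)))  # 4, because the blank counts as an entry

# Keep only the non-blank entries, then count the distinct names again
named_taxa <- taxa[taxa != ""]
print(length(unique(named_taxa)))  # 3
```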


We can do the same for the common names to get a comprehensive list of the common names in the data and a count of how many different ones are present.

unique(MMC_data$common_name) #list of all common names in the data
##   [1] "Warmouth"                                                 
##   [2] "Letourneuxi's Jewel Cichlid"                              
##   [3] "Spotted Tilapia"                                          
##   [4] "Bluegill"                                                 
##   [5] "Black-and-white Warbler"                                  
##   [6] "Prairie Warbler"                                          
##   [7] "Blue-headed Vireo"                                        
##   [8] ""                                                         
##   [9] "sea hibiscus"                                             
##  [10] "Northern Cardinal"                                        
##  [11] "Zebra Longwing"                                           
##  [12] "Satinleaf"                                                
##  [13] "slash pine"                                               
##  [14] "shoebutton Ardisia"                                       
##  [15] "Julia Heliconian"                                         
##  [16] "Pigeon Plum"                                              
##  [17] "cabbage palmetto"                                         
##  [18] "Madagascar Periwinkle"                                    
##  [19] "Brown Anole"                                              
##  [20] "Shiny-leaved Wild Coffee"                                 
##  [21] "bracket fungi"                                            
##  [22] "Siam weed"                                                
##  [23] "sea grape"                                                
##  [24] "colicwood"                                                
##  [25] "Firebush"                                                 
##  [26] "Ray-finned Fishes"                                        
##  [27] "Green Heron"                                              
##  [28] "Yellow-bellied Slider"                                    
##  [29] "Cooters"                                                  
##  [30] "Fungi Including Lichens"                                  
##  [31] "Three-spotted Skipper"                                    
##  [32] "Cane Toad"                                                
##  [33] "Green Iguana"                                             
##  [34] "Northern Mockingbird"                                     
##  [35] "Brown Basilisk"                                           
##  [36] "White-crowned Sparrow"                                    
##  [37] "Clay-colored Sparrow"                                     
##  [38] "Ruby-throated Hummingbird"                                
##  [39] "Common Yellowthroat"                                      
##  [40] "Empidonax Flycatchers"                                    
##  [41] "Gray Fox"                                                 
##  [42] "Black-throated Blue Warbler"                              
##  [43] "Ovenbird"                                                 
##  [44] "Wilson's Warbler"                                         
##  [45] "Ruby-crowned Kinglet"                                     
##  [46] "American Coot"                                            
##  [47] "Pied-billed Grebe"                                        
##  [48] "Yellow-rumped Warbler"                                    
##  [49] "Boat-tailed Grackle"                                      
##  [50] "Osprey"                                                   
##  [51] "Southern Ringneck Snake"                                  
##  [52] "Bark Anole"                                               
##  [53] "Dorantes Longtail"                                        
##  [54] "North American Racer"                                     
##  [55] "Band-winged Dragonlet"                                    
##  [56] "Belted Kingfisher"                                        
##  [57] "Northern Parula"                                          
##  [58] "Tersa Sphinx"                                             
##  [59] "Vinegar and Fruit Flies"                                  
##  [60] "Cotton Stainer Bugs"                                      
##  [61] "Common Cotton Stainer Bug"                                
##  [62] "Cassius Blue"                                             
##  [63] "Mabel Orchard Orbweaver"                                  
##  [64] "Great Pondhawk"                                           
##  [65] "Florida Tussock Moth"                                     
##  [66] "Monarch"                                                  
##  [67] "Blue Dasher"                                              
##  [68] "Needham's Skimmer"                                        
##  [69] "Atala"                                                    
##  [70] "Everglades Racer"                                         
##  [71] "Sleepy Orange"                                            
##  [72] "Horace's Duskywing"                                       
##  [73] "Diaprepes Root Weevil"                                    
##  [74] "Fiery Skipper"                                            
##  [75] "Asian Lady Beetle"                                        
##  [76] "Halloween Pennant"                                        
##  [77] "Gulf Fritillary"                                          
##  [78] "Tropical Orbweaver"                                       
##  [79] "Butterflies and Moths"                                    
##  [80] "Knight Anole"                                             
##  [81] "Hammock Skipper"                                          
##  [82] "Eastern Phoebe"                                           
##  [83] "Common Green Jewel Fly"                                   
##  [84] "Large Milkweed Bug"                                       
##  [85] "Northern Plushback"                                       
##  [86] "Chimney Swift"                                            
##  [87] "Calcareous Morning-glory"                                 
##  [88] "Killdeer"                                                 
##  [89] "trailing daisy"                                           
##  [90] "Tropical Checkered-Skipper"                               
##  [91] "Umbrella Paper Wasps"                                     
##  [92] "Baracoa Skipper"                                          
##  [93] "Crambid Snout Moths"                                      
##  [94] "Blackpoll Warbler"                                        
##  [95] "Cape May Warbler"                                         
##  [96] "Orange-crowned Warbler"                                   
##  [97] "Cooper's Hawk"                                            
##  [98] "painted leaf"                                             
##  [99] "stately maiden fern"                                      
## [100] "Bishop wood"                                              
## [101] "pineland heliotrope"                                      
## [102] "castor bean"                                              
## [103] "Bitter Melon"                                             
## [104] "Green Anole"                                              
## [105] "Blue Jay"                                                 
## [106] "Gray Catbird"                                             
## [107] "European Starling"                                        
## [108] "Great Blue Heron"                                         
## [109] "Northern Waterthrush"                                     
## [110] "Clouded Skipper"                                          
## [111] "Rice Stink Bug"                                           
## [112] "Variegated Fritillary"                                    
## [113] "Arabesque Orbweaver"                                      
## [114] "Calligrapher Flies"                                       
## [115] "Basidiomycete Fungi"                                      
## [116] "Indigo Bunting"                                           
## [117] "Monk Skipper"                                             
## [118] "Spotted Sandpiper"                                        
## [119] "Solitary Sandpiper"                                       
## [120] "Red-shouldered Hawk"                                      
## [121] "Dusky Herpetogramma Moth"                                 
## [122] "shrubby false buttonweed"                                 
## [123] "Brahminy Blindsnake"                                      
## [124] "Rusty Millipede"                                          
## [125] "Barn Swallow"                                             
## [126] "Eastern Kingbird"                                         
## [127] "Wandering Glider"                                         
## [128] "Orange-spotted Flower Moth"                               
## [129] "Roseate Skimmer"                                          
## [130] "Cerulean Warbler"                                         
## [131] "Eastern Giant Swallowtail"                                
## [132] "Woodlice and Pillbugs"                                    
## [133] "rustgills and gyms"                                       
## [134] "Eastern Wood-Pewee"                                       
## [135] "Common Raccoon"                                           
## [136] "Warrior Beetles"                                          
## [137] "Oleander Aphid"                                           
## [138] "Blue-gray Gnatcatcher"                                    
## [139] "Soil Centipedes"                                          
## [140] "Common Gilled Mushrooms and Allies"                       
## [141] "bluemink"                                                 
## [142] "morning-glories"                                          
## [143] "Assembly Moth"                                            
## [144] "Melonworm Moth"                                           
## [145] "Crab Spiders"                                             
## [146] "Summer Tanager"                                           
## [147] "Worm-eating Warbler"                                      
## [148] "Large Orange Sulphur"                                     
## [149] "Ornate Bella Moth"                                        
## [150] "Cuban Grassquit"                                          
## [151] "Red-eyed Vireo"                                           
## [152] "shelf fungi"                                              
## [153] "fringed sawgill"                                          
## [154] "Black Saddlebags"                                         
## [155] "Hooded Warbler"                                           
## [156] "Yellow-throated Warbler"                                  
## [157] "Baltimore Oriole"                                         
## [158] "Blue Grosbeak"                                            
## [159] "Merlin"                                                   
## [160] "Scarlet Tanager"                                          
## [161] "Twilight Darner"                                          
## [162] "Golden-winged Warbler"                                    
## [163] "Swainson's Thrush"                                        
## [164] "Milkweed Assassin Bug"                                    
## [165] "Stink Bugs, Shield Bugs, and Allies"                      
## [166] "Hawaiian Beet Webworm Moth"                               
## [167] "Common Green Leafhopper"                                  
## [168] "Milky Argyria Moth"                                       
## [169] "House Flies and Allies"                                   
## [170] "Calyptrate Flies"                                         
## [171] "Carpet-grass Webworm Moth"                                
## [172] "Black-throated Green Warbler"                             
## [173] "Blackburnian Warbler"                                     
## [174] "Tennessee Warbler"                                        
## [175] "Speckled Duns"                                            
## [176] "Chuck-will's-widow"                                       
## [177] "Sharp-shinned Hawk"                                       
## [178] "Brown Thrasher"                                           
## [179] "Black Rat"                                                
## [180] "White-lined Sphinx"                                       
## [181] "Surinam Cockroach"                                        
## [182] "dayflowers"                                               
## [183] "amaranths"                                                
## [184] "false daisy"                                              
## [185] "Red Saddlebags"                                           
## [186] "blue mistflower"                                          
## [187] "oceanblue morning glory"                                  
## [188] "Common Purslane"                                          
## [189] "Insects"                                                  
## [190] "American Redstart"                                        
## [191] "Chamberbitter"                                            
## [192] "Bumblebee Millipede"                                      
## [193] "Myrtle Warbler"                                           
## [194] "Largeflower pink-sorrel"                                  
## [195] "Coffee-loving Pyrausta Moth"                              
## [196] "Sawflies, Horntails, and Wood Wasps"                      
## [197] "Florida Carpenter Ant"                                    
## [198] "Flatworms"                                                
## [199] "Anoles"                                                   
## [200] "mustard family"                                           
## [201] "straggler daisy"                                          
## [202] "Cuban Tree Frog"                                          
## [203] "Plant Bugs"                                               
## [204] "Click Beetles"                                            
## [205] "Armyworm Moths"                                           
## [206] "Brazilian Leafhopper"                                     
## [207] "Dark Flower Scarab"                                       
## [208] "Banded Garden Spider"                                     
## [209] "Yellow alder"                                             
## [210] "Red Bay Psyllid"                                          
## [211] "redbay"                                                   
## [212] "whitetop sedge"                                           
## [213] "Field Copperleaf"                                         
## [214] "Turkey Vulture"                                           
## [215] "Cedar Waxwing"                                            
## [216] "Beetles"                                                  
## [217] "largeflower Mexican clover"                               
## [218] "turkey tangle frogfruit"                                  
## [219] "lanceleaf arrowhead"                                      
## [220] "Bladder Snails"                                           
## [221] "Animals"                                                  
## [222] "Mexican Primrose-willow"                                  
## [223] "pickerelweed"                                             
## [224] "stoneworts"                                               
## [225] "dicots"                                                   
## [226] "Gumbo Limbo"                                              
## [227] "plants"                                                   
## [228] "Long-legged Flies"                                        
## [229] "maiden ferns"                                             
## [230] "Eastern Mosquitofish"                                     
## [231] "Blolly"                                                   
## [232] "common lantana"                                           
## [233] "ladder fern"                                              
## [234] "muscadine"                                                
## [235] "Cocoplum"                                                 
## [236] "southern live oak"                                        
## [237] "flowering plants"                                         
## [238] "Brazilian pepper"                                         
## [239] "heathers, balsams, primroses, and allies"                 
## [240] "Red Tasselflower"                                         
## [241] "Florida Strangler Fig"                                    
## [242] "palms, bullanocks, and allies"                            
## [243] "yellowtops"                                               
## [244] "oaks"                                                     
## [245] "Nettletree"                                               
## [246] "Coontie"                                                  
## [247] "Wild Tamarind"                                            
## [248] "True Toads"                                               
## [249] "water-lilies"                                             
## [250] "Frogs and Toads"                                          
## [251] "alligator flag"                                           
## [252] "Locustberry"                                              
## [253] "graceful spurge"                                          
## [254] "Ageratums"                                                
## [255] "blue-eyed grasses"                                        
## [256] "dogfennel"                                                
## [257] "golden polypody"                                          
## [258] "Figs"                                                     
## [259] "common ragweed"                                           
## [260] "bonesets, blazingstars, and allies"                       
## [261] "fourspike heliotrope"                                     
## [262] "lilac tasselflower"                                       
## [263] "False Mastic"                                             
## [264] "Myrtle-of-the-River"                                      
## [265] "Airplants"                                                
## [266] "mahogany family"                                          
## [267] "Mammals"                                                  
## [268] "pale passionflower"                                       
## [269] "Royal Palm"                                               
## [270] "dahoon holly"                                             
## [271] "Sleepy Morning"                                           
## [272] "Pine fern"                                                
## [273] "Monk Orchid"                                              
## [274] "marsh fern"                                               
## [275] "Necklace pod"                                             
## [276] "wild leadwort"                                            
## [277] "Virginia creeper"                                         
## [278] "bindweed family"                                          
## [279] "ticktrefoils"                                             
## [280] "Higher Ascomycetes"                                       
## [281] "Western Honey Bee"                                        
## [282] "palms"                                                    
## [283] "wild potato vine"                                         
## [284] "Hairy Hexagonia"                                          
## [285] "Velvet bean"                                              
## [286] "Romerillo"                                                
## [287] "ferns"                                                    
## [288] "saw palmetto"                                             
## [289] "Narrowleaf Yellowtops"                                    
## [290] "rockweed"                                                 
## [291] "Moses-in-the-cradle"                                      
## [292] "saw greenbrier"                                           
## [293] "Bayberries"                                               
## [294] "Scarabs"                                                  
## [295] "Spinybacked Orbweaver"                                    
## [296] "Florida swampprivet"                                      
## [297] "Monk Parakeet"                                            
## [298] "Plushback Flies"                                          
## [299] "Leptosporangiate Ferns"                                   
## [300] "Bladderworts"                                             
## [301] "Papaya"                                                   
## [302] "Tropical sage"                                            
## [303] "Carpenter Ants"                                           
## [304] "Sea Almond"                                               
## [305] "Spotless Lady Beetle"                                     
## [306] "grasses"                                                  
## [307] "Scorpion's-tail"                                          
## [308] "orchard spider and allies"                                
## [309] "high-latitude oaks"                                       
## [310] "nightshades and allies"                                   
## [311] "American hard pines"                                      
## [312] "Grass Skippers"                                           
## [313] "South Florida slash pine"                                 
## [314] "Bahama Brake"                                             
## [315] "grasses, sedges, cattails, and allies"                    
## [316] "Birds"                                                    
## [317] "White Peacock"                                            
## [318] "Spiders"                                                  
## [319] "Skimmers"                                                 
## [320] "Eastern Gray Squirrel"                                    
## [321] "Red Imported Fire Ant"                                    
## [322] "True Bugs"                                                
## [323] "Cannabis, Hackberries, Hops, and Allies"                  
## [324] "Giant Sweet Potato Bug"                                   
## [325] "Indian banyan"                                            
## [326] "Water, Rove, Scarab, Long-horned, Leaf, and Snout Beetles"
## [327] "Cichlids"                                                 
## [328] "Spiny Fiddlewood"                                         
## [329] "Oyster Mushroom"                                          
## [330] "Long-tailed Skipper"                                      
## [331] "Tricolored Heron"                                         
## [332] "Pin-tailed Pondhawk"                                      
## [333] "White-eyed Vireo"                                         
## [334] "Porterweed"                                               
## [335] "Minor Blueleg Centipede"                                  
## [336] "Cinnabar Bracket"                                         
## [337] "Stubby Hover Fly"                                         
## [338] "Bagworm Moths"                                            
## [339] "Io Moth"                                                  
## [340] "Abbot's Bagworm Moth"                                     
## [341] "Black-olive Caterpillar Moth"                             
## [342] "Common Tan Wave"                                          
## [343] "Twice-stabbed Lady Beetles"                               
## [344] "Genista Broom Moth"                                       
## [345] "Ceraunus Blue"                                            
## [346] "greater plantain"                                         
## [347] "Mallow Scrub-Hairstreak"                                  
## [348] "Caribbean scoliid wasp"                                   
## [349] "Cabbage Webworm Moth"                                     
## [350] "Southern Green Stink Bug"                                 
## [351] "Caribbean Fruit Fly"                                      
## [352] "Soldier Flies"                                            
## [353] "Palm Flatid Planthopper"                                  
## [354] "Yellow Mocis Moth"                                        
## [355] "Small-spotted Fairy Lady Beetle"                          
## [356] "Planarians"                                               
## [357] "New Guinea Flatworm"                                      
## [358] "Yellow-legged Hover Fly"                                  
## [359] "white octoblepharum moss"                                 
## [360] "Moderately Smooth Warrior Beetle"                         
## [361] "Euphorbia Bug"                                            
## [362] "Shoestring Fern"                                          
## [363] "Lauxaniid Flies"                                          
## [364] "Six-Spotted Carpenter Ant"                                
## [365] "Acalyptrate Flies"                                        
## [366] "Red-banded Stink Bug"                                     
## [367] "Phaon Crescent"                                           
## [368] "Citrine Forktail"                                         
## [369] "Short-horned Grasshoppers"                                
## [370] "Geometer Moths"                                           
## [371] "Spider Wasps"                                             
## [372] "Fragile Forktail"                                         
## [373] "Three-cornered Alfalfa Hopper"                            
## [374] "Red Caustic-creeper"                                      
## [375] "hyssop spurge"                                            
## [376] "Eastern Amberwing"                                        
## [377] "West Indian Flatid Planthopper"                           
## [378] "Paradise Tree"                                            
## [379] "Chinese Crown Orchid"                                     
## [380] "Florida Fast Woodlouse"                                   
## [381] "Darkling Beetles"                                         
## [382] "sword ferns"                                              
## [383] "Goose Grass"                                              
## [384] "painted spurge"                                           
## [385] "Rattail"                                                  
## [386] "Rattlepods"                                               
## [387] "legumes"                                                  
## [388] "Phasey Bean"                                              
## [389] "Carolina ruellia"                                         
## [390] "pale bitter bolete"                                       
## [391] "Greenbottle Flies"                                        
## [392] "orange bladder"                                           
## [393] "Brazilian Skipper"                                        
## [394] "Candlesnuff Fungus"                                       
## [395] "White-jawed Jumping Spider"                               
## [396] "Metallic Flea Beetles"                                    
## [397] "Common Milkcaps"                                          
## [398] "indigo milk cap"                                          
## [399] "Bushbeans"                                                
## [400] "Primrose-willows"                                         
## [401] "hairy indigo"                                             
## [402] "sunshine mimosa"                                          
## [403] "mosses"                                                   
## [404] "Florida firebush"                                         
## [405] "broomsedge bluestem"                                      
## [406] "Cochineal Nopal Cactus"                                   
## [407] "Ocola Skipper"                                            
## [408] "Bay-breasted Warbler"                                     
## [409] "Yellows and Sulphurs"                                     
## [410] "Gray-cheeked Thrush"                                      
## [411] "toothpetal false reinorchid"                              
## [412] "Perching Birds"                                           
## [413] "Bur Marigolds"                                            
## [414] "Southern Swamp Crinum"                                    
## [415] "Regal Darner"                                             
## [416] "day jessamine"                                            
## [417] "Cure-for-all"                                             
## [418] "Willow Bustic"                                            
## [419] "pines"                                                    
## [420] "bluestems"                                                
## [421] "air potato"                                               
## [422] "Mosquitoes"                                               
## [423] "Button Sage"                                              
## [424] "Leavenworth's Tickseed"                                   
## [425] "Eastern Leaf-footed Bug"                                  
## [426] "Mexican Alvaradoa"                                        
## [427] "Lichens"                                                  
## [428] "Loquat"                                                   
## [429] "napier grass"                                             
## [430] "leadtrees"                                                
## [431] "American beautyberry"                                     
## [432] "Wood ear fungi"                                           
## [433] "Blue-striped Spreadwing"                                  
## [434] "soapberries, cashews, mahoganies, and allies"             
## [435] "Coffee Senna"                                             
## [436] "Blue Porterweed"                                          
## [437] "Cloudless Sulphur"                                        
## [438] "Lentinus sect. Lentinus"                                  
## [439] "manyflower marshpennywort"                                
## [440] "Magnolia Green Jumping Spider"                            
## [441] "Scoliid Wasps"                                            
## [442] "Cluster-leafs"                                            
## [443] "tropical milkweed"                                        
## [444] "Indian blanket"                                           
## [445] "splitgill mushroom"                                       
## [446] "woodcaps and sawgills"                                    
## [447] "Great Southern White"                                     
## [448] "Black-sided Calligrapher"                                 
## [449] "Florida Calligrapher"                                     
## [450] "Neotropical Red-shouldered Stink Bug"                     
## [451] "Bermudagrass Leafhopper"                                  
## [452] "Centipede Tongavine"                                      
## [453] "Bushy Bluestem"                                           
## [454] "Dragonflies"                                              
## [455] "Vasey Grass"                                              
## [456] "Java Glorybower"                                          
## [457] "nerved witchgrass"                                        
## [458] "Twining Soldierbush"                                      
## [459] "West Indian Lilac"                                        
## [460] "limewater brookweed"                                      
## [461] "Swamp Bay"                                                
## [462] "black calabash"                                           
## [463] "Leavenworth's goldenrod"                                  
## [464] "creeping beggarweed"                                      
## [465] "Wire Bluestem"                                            
## [466] "Violet Crabgrass"                                         
## [467] "monocots"                                                 
## [468] "sea torchwood"                                            
## [469] "Pepper Cinnamon"                                          
## [470] "Jumping Spiders"                                          
## [471] "Hydrilla"                                                 
## [472] "Candy Cap"                                                
## [473] "Red-bellied Woodpecker"                                   
## [474] "pinwheels and parachute mushrooms"                        
## [475] "rosy navel"                                               
## [476] "Swamplilies"                                              
## [477] "Mustard Yellow Polypore"                                  
## [478] "Conifer-base Polypore"                                    
## [479] "Sensitive Plant"                                          
## [480] "Orange-barred Sulphur"                                    
## [481] "Evening Skimmer"                                          
## [482] "Sunshine Tree"                                            
## [483] "Slippery Jacks"                                           
## [484] "snow fungus"                                              
## [485] "Burma Reed"


length(unique(MMC_data$common_name)) #How many different common names are present
## [1] 485



Creating Pivot Tables:

To browse our observation data quickly, let’s build a pivot table. As a reminder, anything that follows a # on a line is just a note, not executable code.

MMC_pivot<- PivotTable$new() #tell R you want to make a new pivot table called "MMC_pivot"
MMC_pivot$addData(MMC_data) #tell R the data frame you want to use
MMC_pivot$addRowDataGroups("iconic_taxon_name") #tell R which data columns you want to appear on the rows
MMC_pivot$addRowDataGroups("common_name", addTotal = TRUE) #tell R which data you want to appear as a subset within each row group
MMC_pivot$defineCalculation(calculationName="Total Observations", summariseExpression="n()") #counts the number of times each common name within each taxon group was observed
MMC_pivot$renderPivot() #tell R that you are ready for it to create your pivot table


After you run the code, you should see a pivot table in the Viewer tab.


If everything looks OK, let’s go ahead and save a copy of your table to an Excel workbook.

MMC_pivot_excel <- createWorkbook(creator = Sys.getenv("USERNAME"))#tell R what you would like the name of the workbook to be. Don't change "USERNAME"
addWorksheet(MMC_pivot_excel, "MMC_taxon_data")#tell R what you would like the tab/worksheet of the workbook to be called
MMC_pivot$writeToExcelWorksheet(wb=MMC_pivot_excel, wsName="MMC_taxon_data", 
                         topRowNumber=1, leftMostColumnNumber=1, applyStyles=FALSE)#tell R which workbook you would like to use, and the name of the tab/worksheet of that workbook, and where you would like the table to be placed on the page
saveWorkbook(MMC_pivot_excel, file="MMC_pivot.xlsx", overwrite = TRUE)#Saves the file in your working directory under the name that you have chosen for "file="


Converting and Extracting Time:

Before we get started with calculations, let’s convert the “time_observed_at” column out of UTC and into a more meaningful time zone. Several different time zones are listed in the data frame (run unique(MMC_data$time_zone) to see them), but we know that all observations in the MMC_data.csv file were made at MMC, which is located in the “US/Eastern” time zone. It is therefore reasonable to treat every observation as having been made in “US/Eastern” time. Here, we first make a new column, local_time, in the “MMC_data” data frame to store the local time once it is converted out of UTC.

Then, we tell R:
                i. that the data in column time_observed_at are in fact time data formatted as "YYYY-MM-DD HH:MM:SS" in
                     the UTC time zone: format="%Y-%m-%d %H:%M:%S", tz='UTC',
                ii. to convert the time from UTC to US/Eastern, and
                iii. that the converted text is still time data, now in the US/Eastern time zone:
                     format="%Y-%m-%d %H:%M", tz='US/Eastern'. Because this format stops at the minutes, the seconds are
                     dropped and appear as :00 in local_time.


MMC_data$local_time<-MMC_data$time_observed_at #Create new column for local date time
MMC_data$local_time <- as.POSIXct(MMC_data$local_time, format="%Y-%m-%d %H:%M:%S", tz='UTC') #put in posixct
MMC_data$local_time<-format(MMC_data$local_time,tz='US/Eastern') #format to local time zone (this step will change the format back to character)
MMC_data$local_time<-as.POSIXct(MMC_data$local_time,format="%Y-%m-%d %H:%M",tz='US/Eastern') #put back in posixct
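
As a quick check of the three conversion steps, the sketch below runs them on a single timestamp copied from the first row of the example data. The object names utc_text, utc_time, local_text, and local_time_check are made up for this illustration and are not part of the lab script.

```r
# One UTC timestamp, typed as text the way it appears in time_observed_at
utc_text <- "2014-10-10 13:19:25 UTC"

# Step i: read the text as a date-time in the UTC time zone
# (the trailing " UTC" is ignored because the format stops at the seconds)
utc_time <- as.POSIXct(utc_text, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Step ii: show the same moment on a US/Eastern clock (this returns text)
local_text <- format(utc_time, format = "%Y-%m-%d %H:%M:%S", tz = "US/Eastern")

# Step iii: read that text back in as a date-time in US/Eastern,
# keeping only the hours and minutes (the seconds become 00)
local_time_check <- as.POSIXct(local_text, format = "%Y-%m-%d %H:%M", tz = "US/Eastern")

print(local_text)
print(local_time_check)
```

October falls in daylight saving time, so the local clock is four hours behind UTC and 13:19 UTC becomes 09:19 in Miami.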



Let’s practice extracting time by isolating the year, month, and hour of each observation and copy those data into their own respective columns titled “year”, “month”, and “hour”.

MMC_data$year<-format(MMC_data$local_time, format = "%Y") #add column for year
MMC_data$month<-format(MMC_data$local_time, format = "%m") #add column for month
MMC_data$hour<-format(MMC_data$local_time, format = "%H") #add column for time of day
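
One detail worth knowing: format() returns text, so the new year, month, and hour columns hold characters such as "09" rather than numbers. Below is a minimal sketch, using a made-up object called obs_time, of how such a value could be turned into a number if a later calculation needs one. The lab script itself does not require this step.

```r
# A single local date-time, made up for this example
obs_time <- as.POSIXct("2014-10-10 09:19:00", tz = "US/Eastern")

# Extract the hour the same way the lab script does
hour_text <- format(obs_time, format = "%H")
print(class(hour_text))    # the extracted hour is stored as character text

# Convert the text to a whole number only if you need to do math with it
hour_number <- as.integer(hour_text)
print(hour_number)
```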



Notice how the bottom two rows of column 1, time_observed_at, have no data. This is because the observer failed to record the information when they submitted their upload.

time_observed_at time_zone captive_cultivated latitude longitude scientific_name common_name iconic_taxon_name local_time year month hour
2014-10-10 13:19:25 UTC Eastern Time (US & Canada) false 25.75595 -80.37972 Lepomis gulosus Warmouth Actinopterygii 2014-10-10 09:19:00 2014 10 09
2015-05-20 11:47:58 UTC Eastern Time (US & Canada) false 25.75583 -80.37973 Hemichromis letourneuxi Letourneuxi’s Jewel Cichlid Actinopterygii 2015-05-20 07:47:00 2015 05 07
2014-12-18 15:08:26 UTC Eastern Time (US & Canada) false 25.75572 -80.37967 Pelmatolapia mariae Spotted Tilapia Actinopterygii 2014-12-18 10:08:00 2014 12 10
2014-12-18 14:47:43 UTC Eastern Time (US & Canada) false 25.75585 -80.37971 Lepomis macrochirus Bluegill Actinopterygii 2014-12-18 09:47:00 2014 12 09
Eastern Time (US & Canada) false 25.75432 -80.37931 Mniotilta varia Black-and-white Warbler Aves NA NA NA NA
Eastern Time (US & Canada) false 25.75447 -80.37908 Setophaga discolor Prairie Warbler Aves NA NA NA NA

Let’s save a copy of this data frame and call it MMC_timedata, in case we want to reference these data later.

Since this data frame is centered around time, we will also remove any rows where there was no time of observation, and therefore local time, recorded.

Finally, we will save MMC_timedata to our working directory as a .csv named MMC_timedata.csv.

MMC_timedata<-select(MMC_data,"local_time" ,"year", "month", "hour", "iconic_taxon_name", "common_name")#select the columns we want to keep
MMC_timedata<-MMC_timedata[!is.na(MMC_timedata$local_time),] #remove all rows where there is no local time
write.csv(MMC_timedata,"MMC_timedata.csv", row.names = FALSE) #Save a copy of MMC_timedata data frame as a .csv

Notice how rows 5-7 are now excluded from the data frame:

local_time year month hour iconic_taxon_name common_name
1 2014-10-10 09:19:00 2014 10 09 Actinopterygii Warmouth
2 2015-05-20 07:47:00 2015 05 07 Actinopterygii Letourneuxi’s Jewel Cichlid
3 2014-12-18 10:08:00 2014 12 10 Actinopterygii Spotted Tilapia
4 2014-12-18 09:47:00 2014 12 09 Actinopterygii Bluegill
8 2015-09-11 14:24:00 2015 09 14 Plantae (blank)
9 2015-09-11 14:29:00 2015 09 14 Plantae sea hibiscus
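
To see how the !is.na() filter works, here is a minimal sketch on a made-up three-row data frame called toy (the names and times are placeholders, not real lab output). It also shows why the row labels in the table above skip from 4 to 8: R keeps the original row labels of the rows that survive the filter.

```r
# Toy data frame: the second row has no recorded time
toy <- data.frame(
  common_name = c("Warmouth", "Prairie Warbler", "Bluegill"),
  local_time  = c("2014-10-10 09:19", NA, "2014-12-18 09:47")
)

# is.na() marks the missing values; the ! in front flips TRUE and FALSE
print(is.na(toy$local_time))

# Keep only the rows where local_time is NOT missing
toy_kept <- toy[!is.na(toy$local_time), ]
print(toy_kept)    # rows labeled 1 and 3 remain; row 2 is gone
```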



Biodiversity Measurements and Indices:

Summary table of biodiversity measurements and indices:

Measurement/Index Formula
Species Richness (S) \(S =\) number of species
Shannon-Wiener Diversity Index (H) \(H = -\sum{(p_i \times \ln(p_i))}\)
Species Evenness (E) \(E = \frac{H}{H_{MAX}}\)
Effective Number of Species (ENS) \(ENS = e^H\)


Other parameters and definitions needed to calculate the items above:

Metric Description Formula
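Total Abundance (\(N_{TOTAL}\)) Total number of individuals across all species
\(n_i\) Total number of individuals of species \(i\)
\(p_i\) Proportion of individuals of species \(i\), compared to \(N_{TOTAL}\) \(p_i = \frac{n_i}{N_{TOTAL}}\)
Maximum Diversity (\(H_{MAX}\)) Value of \(H\) if all \(S\) species were equally abundant \(H_{MAX} = \ln(S)\)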

The chunk of code below might look intimidating, but it’s OK! Remember, we don’t expect you to know the complete ins and outs of all the “behind the scenes” work that is happening in the code. You can follow along with what is happening in each step by reading the notes that follow the #.

MMC_spp_rich <- MMC_data %>%
  group_by(iconic_taxon_name) %>%
  summarise(S = length(unique(common_name)))#Creates column with each unique taxon ID with the corresponding species richness of each taxon

MMC_sum_taxoN <- MMC_data %>%
  group_by(iconic_taxon_name) %>%
  summarise(Total_Abundance_per_Taxon = n()) #Creates a column with each unique taxon ID with the corresponding total abundance (total observations) for each taxon

MMC_var<-MMC_spp_rich %>%
 inner_join(., MMC_sum_taxoN) #merges the previous two data frames by the list of unique taxon
#"." after inner_join is a place holder that denotes it is the same data frame that is listed above (short hand)

MMC_spp_abundance <- MMC_data %>%
  group_by(iconic_taxon_name, common_name) %>%
  summarise(Species_Abundance = n()) #Creates a data frame with the complete list of common names and their summed abundances

MMC_biod<-inner_join(MMC_spp_abundance,MMC_var)#Join MMC_spp_abundance with MMC_var
MMC_biod$Pi<-(MMC_biod$Species_Abundance/MMC_biod$Total_Abundance_per_Taxon)#Calculate Pi
MMC_biod$lnPi<-(log(MMC_biod$Species_Abundance/MMC_biod$Total_Abundance_per_Taxon))#Calculate ln(pi)
MMC_biod$PixlnPi<-(MMC_biod$Species_Abundance/MMC_biod$Total_Abundance_per_Taxon*log(MMC_biod$Species_Abundance/MMC_biod$Total_Abundance_per_Taxon))#Calculate Pi*ln(Pi)

MMC_diversity<-as.data.frame(MMC_var)#Make new data frame for final numbers

MMC_H <- MMC_biod %>%
  group_by(iconic_taxon_name) %>%
  summarise(H=(sum(PixlnPi))*(-1))#Calculate H
MMC_diversity<-inner_join(MMC_diversity,MMC_H)#Append H to MMC_diversity

MMC_Hmax <- MMC_biod %>%
  group_by(iconic_taxon_name) %>%
  summarise(Hmax=log(first(S)))#Calculate Hmax = ln(S); S is the same on every row of a taxon, so we use the first value
MMC_diversity<-inner_join(MMC_diversity,MMC_Hmax)#Append Hmax to MMC_diversity

MMC_diversity$E<-(MMC_diversity$H)/(MMC_diversity$Hmax)#Calculate E and append it to MMC_diversity
MMC_diversity$ENS<-exp(MMC_diversity$H)#Calculate ENS and append it to MMC_diversity


MMC_diversity data frame:

iconic_taxon_name S Total_Abundance_per_Taxon H Hmax E ENS
(blank) 1 6 0.0000000 0.0000000 NaN 1.000000
Actinopterygii 7 12 1.8200760 1.9459101 0.9353340 6.172327
Amphibia 4 11 1.2882523 1.3862944 0.9292776 3.626443
Animalia 11 26 1.9203009 2.3978953 0.8008277 6.823011
Arachnida 12 27 2.2585197 2.4849066 0.9088952 9.568914
Aves 73 152 4.0463972 4.2904594 0.9431151 57.191039
Fungi 29 75 3.0200424 3.3672958 0.8968747 20.492160
Insecta 143 431 4.0213775 4.9628446 0.8102969 55.777887
Mammalia 5 18 1.2260761 1.6094379 0.7618039 3.407831
Mollusca 2 4 0.5623351 0.6931472 0.8112781 1.754765
Plantae 191 667 4.6527333 5.2522734 0.8858513 104.871243
Reptilia 13 104 2.0426519 2.5649494 0.7963712 7.711031
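
To see where one row of these numbers comes from, the sketch below works through the formulas by hand for a small group with two species. The counts of 3 and 1 are an assumption chosen for illustration (two species, four observations in total); with those counts, the results match the Mollusca row above. The second half shows why a group with only one species, like the blank first row, gets NaN for E.

```r
# Assumed counts: two species observed 3 times and 1 time (4 observations total)
counts  <- c(3, 1)
N_total <- sum(counts)          # total abundance
p       <- counts / N_total     # proportion of each species (p_i)

S     <- length(counts)         # species richness
H     <- -sum(p * log(p))       # Shannon-Wiener index; log() is the natural log in R
H_max <- log(S)                 # largest possible H for S species
E     <- H / H_max              # evenness
ENS   <- exp(H)                 # effective number of species

print(round(c(S = S, H = H, Hmax = H_max, E = E, ENS = ENS), 7))

# A group with only one species has H = 0 and Hmax = log(1) = 0,
# so evenness is 0 divided by 0, which R reports as NaN
single   <- c(6)
p_single <- single / sum(single)
E_single <- -sum(p_single * log(p_single)) / log(length(single))
print(E_single)
```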

Great! Now let’s save a copy of data frame MMC_diversity as a .csv.

write.csv(MMC_diversity,"MMC_diversity.csv", row.names = FALSE)



During that big chunk of code, we created another data frame: MMC_spp_abundance. This data frame is similar to the pivot table we created earlier and was created by the code:

MMC_spp_abundance <- MMC_data %>%
  group_by(iconic_taxon_name, common_name) %>%
  summarise(Species_Abundance = n())



First 6 rows of data frame MMC_spp_abundance:

iconic_taxon_name common_name Species_Abundance
(blank) (blank) 6
Actinopterygii Bluegill 3
Actinopterygii Cichlids 1
Actinopterygii Eastern Mosquitofish 2
Actinopterygii Letourneuxi’s Jewel Cichlid 1
Actinopterygii Ray-finned Fishes 1

Let’s save a copy of data frame MMC_spp_abundance as a .csv, also!

write.csv(MMC_spp_abundance,"MMC_spp_abundance.csv", row.names = FALSE)
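
If you ever want to double-check counts like these without dplyr, base R's table() function tallies the same combinations. This is only a cross-check sketch on a made-up data frame called toy, not a replacement for the lab code.

```r
# Toy observations: Bluegill was seen twice, the others once
toy <- data.frame(
  iconic_taxon_name = c("Actinopterygii", "Actinopterygii", "Actinopterygii", "Aves"),
  common_name       = c("Bluegill", "Bluegill", "Warmouth", "Osprey")
)

# Tally every taxon/common name combination and turn the result into a data frame
toy_counts <- as.data.frame(table(toy$iconic_taxon_name, toy$common_name),
                            stringsAsFactors = FALSE)
names(toy_counts) <- c("iconic_taxon_name", "common_name", "Species_Abundance")

# table() also lists combinations that never occurred (count of 0), so drop those
toy_counts <- toy_counts[toy_counts$Species_Abundance > 0, ]
print(toy_counts)
```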



Before we wrap up with MMC_data.csv, write a .csv with specific columns from the MMC_data data frame for us to use in Part 3: Mapping and Visualizing Data in Google Earth. Call the data frame MMC_latlon_year and save it as "MMC_latlon_year.csv" in our working directory. The data frame will include the following columns:

  • latitude
  • longitude
  • year
  • iconic_taxon_name

# Make a data frame with only latitude, longitude, year, and iconic_taxon_name
MMC_latlon_year <- MMC_data %>% select(latitude,longitude, year, iconic_taxon_name)
#Save a copy of MMC_latlon_year as a .csv
write.csv(MMC_latlon_year,"MMC_latlon_year.csv", row.names = FALSE)



First 6 rows of data frame MMC_latlon_year:

latitude longitude year iconic_taxon_name
25.75595 -80.37972 2014 Actinopterygii
25.75583 -80.37973 2015 Actinopterygii
25.75572 -80.37967 2014 Actinopterygii
25.75585 -80.37971 2014 Actinopterygii
25.75432 -80.37931 NA Aves
25.75447 -80.37908 NA Aves

At this point, you should have the following files in the folder designated as your working directory:

  • MMC_pivot.xlsx
  • MMC_timedata.csv
  • MMC_diversity.csv
  • MMC_spp_abundance.csv
  • MMC_latlon_year.csv

NOTICE: We have now COMPLETED all code for MMC_data!
  1. Save your R script now: Go to File > Save
  2. Go to File > New File > R Script
    1. An empty script file should open up in another tab
  3. Come back to this script (MMC_getting-started.R)
  4. Press Ctrl+A (select/highlight all) then Ctrl+C (copy)
  5. Go back to the new, blank script and press Ctrl+V (paste)
  6. In your new script, press Ctrl+F (find)
  7. A search box should appear above your script window
  8. Press Ctrl+A to select ALL text in your script window
  9. Click on the white check box that says In selection
  10. In the search bar that says Find, type “MMC_”
  11. Click the box next to the Find bar that says ALL
    1. This will highlight all instances of “MMC_” in the selected text
    2. Quickly scroll through and check that only “MMC_” is highlighted
  12. In the bar that says Replace, type “BBC_”
  13. Once you are sure the steps above have been followed correctly, after the bar that says Replace, click All.
    1. This will replace all instances of “MMC_” with “BBC_”
    2. By changing these prefixes of our object and value names, we can use the same code without “saving over” the values and objects already saved in our Global Environment.
  14. Give another quick scroll to make sure changes have been made correctly
  15. Save a copy of the updated script:
    1. Go to: File > Save As and rename the script “BBC_getting-started.R”
    2. Make sure to Save As and not Save. We don’t want to overwrite our MMC script by mistake!
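
The Find and Replace steps above are a plain text substitution: every “MMC_” becomes “BBC_”, whether it appears in an object name or in a file name. The sketch below shows the same idea with R's gsub() function on two example lines of code stored as text. The vector name mmc_lines is made up for this illustration; you do not need to run this to complete the lab.

```r
# Two lines from the MMC script, stored as text
mmc_lines <- c(
  'MMC_data <- read.csv("MMC_data.csv")',
  'write.csv(MMC_diversity, "MMC_diversity.csv", row.names = FALSE)'
)

# Replace every "MMC_" with "BBC_"; fixed = TRUE means "match these exact characters"
bbc_lines <- gsub("MMC_", "BBC_", mmc_lines, fixed = TRUE)
print(bbc_lines)
```

Notice that the file names inside the quotation marks change as well, which is why the new script reads BBC_data.csv and saves files that start with BBC_.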


Exploring BBC iNaturalist Data:



If you haven’t already, open the R script file “BBC_getting-started.R” with RStudio.

Remember, packages only need to be installed once, but they must be loaded during each session. Since we have already installed these packages, put a # in front of each of the three lines of install.packages code:

install.packages("pivottabler")
install.packages("openxlsx")
install.packages("dplyr")

should become:

#install.packages("pivottabler")
#install.packages("openxlsx")
#install.packages("dplyr")



This will retain the lines of code in case we want to reference them again, but will prevent them from being executed by R.
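
If you are not sure whether a package is already installed on the computer you are using, the minimal sketch below checks without installing anything. The object names needed, installed, and not_installed are made up for this example.

```r
# The three packages used in this lab
needed <- c("pivottabler", "openxlsx", "dplyr")

# requireNamespace() returns TRUE if a package is installed and FALSE if it is not
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# List any packages that still need install.packages()
not_installed <- needed[!installed]
print(not_installed)
```

An empty result, character(0), means all three packages are already installed and the install.packages lines can safely stay commented out.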

However, we DO still want to clear the memory from our Global Environment Pane. Since we are now working with the new script, “BBC_getting-started.R”, we want to make sure that the code in the current script isn’t accidentally dependent on, or otherwise influenced by, the code in the previous script, “MMC_getting-started.R”.


Therefore, we will leave rm(list = ls()) as is.

One of the many perks of working with RStudio and other programming software, compared with programs such as Google Sheets or Microsoft Excel, is that once the code is written, you can execute the entire script with just two keyboard shortcuts instead of clicking through every step again and again.

We will still have the step-by-step instructions and explanations for each line of code below as with the previous script, but this time, we are going to run the entire script all at once!

Press Ctrl+A
Press Ctrl+Enter

  • It might take a few moments, but you should now have the BBC version of the outputs we created with the last script.

At this point, you should have the following files in the folder designated as your working directory:

  • BBC_pivot.xlsx
  • BBC_timedata.csv
  • BBC_diversity.csv
  • BBC_spp_abundance.csv
  • BBC_latlon_year.csv




Setting the Working Directory:

Unlike with most common computer programs, you cannot simply “open” a data file within RStudio. You must code for it by telling R WHERE the file is located and WHAT the file is called. We do this by setting the working directory, which lets R know where it should look for files. This can be done in code with the function setwd(), or by using the keyboard shortcut Ctrl+Shift+H and selecting the appropriate folder. We will use the keyboard shortcut this time. Select the folder location designated by your TA.

Now that you have selected your working directory, let’s check how R records our working directory location:

getwd()
## [1] "C:/Users/Kelle/OneDrive/PhD_Files/Academic Year 2021-2022/Head TA/Fall 2021/BSC2011L/Updated Labs/Lab2/Scripts"



Your working directory will be different than what is printed here since this script was run on a different computer. Notice how R prints the name of your working directory. By following that same format, you could now write it into your code with the function setwd() instead by typing the name of your working directory inside of the function parentheses using quotation marks (“…”).
Example: setwd("C:/Users/Kelle/OneDrive/Desktop/Lab2")



Reading in the Data:

Remember, we are now working with our BBC data. The code below tells R to load a .csv file called BBC_data.csv as an object and to name it (using <-) “BBC_data”. After running the code, you should see it appear under Data in your Global Environment.

BBC_data<-read.csv("BBC_data.csv")
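
Because column names are case sensitive, it can help to confirm them right after reading in the data. The sketch below compares a list of expected names against a made-up data frame called toy, which has one column capitalized incorrectly on purpose. The eight expected names are taken from the example data table earlier in this lab; to check your real data, you would use names(BBC_data) in place of names(toy).

```r
# The eight column names the script expects
expected_cols <- c("time_observed_at", "time_zone", "captive_cultivated", "latitude",
                   "longitude", "scientific_name", "common_name", "iconic_taxon_name")

# Toy data frame with the right columns, except one with the wrong capitalization
toy <- data.frame(matrix(NA, nrow = 1, ncol = 8))
names(toy) <- expected_cols
names(toy)[7] <- "Common_Name"

# setdiff() lists the expected names that are NOT found in the data frame
missing_cols <- setdiff(expected_cols, names(toy))
print(missing_cols)
```

Any name printed here is a column the script will not be able to find, usually because of a typo or a difference in capitalization.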



Quick Checks:

Let’s take some quick first glances at the data. With the function unique(), we can list the unique entries in a particular column of a data frame. We can combine that with the function length(), which counts the number of entries in a list of values (its “length”).

We can use the unique() function to tell us a complete list of iconic taxon names that are present in our data frame. If we stack the length function on top of that, length(unique()), we can quickly count how many different iconic taxon names are present in our data frame.

unique(BBC_data$iconic_taxon_name) #list of all iconic taxon names in the data
##  [1] "Plantae"        "Aves"           "Animalia"       "Mollusca"      
##  [5] "Insecta"        "Arachnida"      "Reptilia"       "Mammalia"      
##  [9] "Actinopterygii" "Amphibia"       "Chromista"      "Fungi"         
## [13] ""


length(unique(BBC_data$iconic_taxon_name)) #How many different iconic taxon names are present
## [1] 13
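
Notice that entry [13] in the list above is an empty name (""). R treats a blank as its own unique value, so it is included in the count. The minimal sketch below shows this on a short made-up vector called toy_taxa, along with one way to count only the named groups.

```r
# Toy column of taxon names, including two blank entries
toy_taxa <- c("Plantae", "Aves", "Plantae", "", "Aves", "")

print(unique(toy_taxa))            # the blank "" is listed as its own entry
print(length(unique(toy_taxa)))    # so it is included in the count

# Leave out the blanks before counting
named_taxa <- unique(toy_taxa[toy_taxa != ""])
print(length(named_taxa))
```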


Similarly, we can do the same for the common names to get a count of how many different common names are present in the data, and a comprehensive list of those common names.

unique(BBC_data$common_name) #list of all common names in the data
##   [1] "sea grape"                              
##   [2] "Egyptian Goose"                         
##   [3] ""                                       
##   [4] "Variegated Sea Urchin"                  
##   [5] "Atlantic Rock-boring Urchin"            
##   [6] "Bristle Worms"                          
##   [7] "Caribbean Reef Squid"                   
##   [8] "Yellow Stingray"                        
##   [9] "Florida Tussock Moth"                   
##  [10] "Green Anole"                            
##  [11] "Wood Stork"                             
##  [12] "Silver Garden Orbweaver"                
##  [13] "Gulf Fritillary"                        
##  [14] "Polychaete Worms"                       
##  [15] "Insects"                                
##  [16] "Mangrove Periwinkle"                    
##  [17] "Spinybacked Orbweaver"                  
##  [18] "Spiders"                                
##  [19] "Arthropods"                             
##  [20] "black mangrove"                         
##  [21] "Button Sage"                            
##  [22] "ladder fern"                            
##  [23] "Turkey Vulture"                         
##  [24] "Halloween Pennant"                      
##  [25] "Atala"                                  
##  [26] "Eastern Lubber Grasshopper"             
##  [27] "Beach Sunflower"                        
##  [28] "Monk Skipper"                           
##  [29] "Blue-Green Citrus Root Weevil"          
##  [30] "Great Blue Heron"                       
##  [31] "Florida Softshell Turtle"               
##  [32] "Domestic Mallard"                       
##  [33] "Mangrove Skipper"                       
##  [34] "Firebush"                               
##  [35] "Shiny-leaved Wild Coffee"               
##  [36] "American beautyberry"                   
##  [37] "red mangrove"                           
##  [38] "Romerillo"                              
##  [39] "White Ibis"                             
##  [40] "Zebra Longwing"                         
##  [41] "Indian blanket"                         
##  [42] "Thoracotrematan Crabs"                  
##  [43] "Tersa Sphinx"                           
##  [44] "Thinstripe Hermit Crab"                 
##  [45] "Tricolored Heron"                       
##  [46] "Osprey"                                 
##  [47] "Tropical Orbweaver"                     
##  [48] "Cure-for-all"                           
##  [49] "Scorpion's-tail"                        
##  [50] "Burma Reed"                             
##  [51] "Monk Orchid"                            
##  [52] "common ragweed"                         
##  [53] "Rust Weed"                              
##  [54] "Caesar weed"                            
##  [55] "Cocoplum"                               
##  [56] "Herb-of-Grace"                          
##  [57] "Streaked Rattlepod"                     
##  [58] "Common Purslane"                        
##  [59] "Narrowleaf Yellowtops"                  
##  [60] "Madagascar Periwinkle"                  
##  [61] "Chinese Crown Orchid"                   
##  [62] "Caribbean scoliid wasp"                 
##  [63] "Asian Lady Beetle"                      
##  [64] "Margined Leatherwing Beetle"            
##  [65] "Green Iguana"                           
##  [66] "Diaprepes Root Weevil"                  
##  [67] "coinvine"                               
##  [68] "castor bean"                            
##  [69] "Spanish moss"                           
##  [70] "Domestic Muscovy Duck"                  
##  [71] "Boat-tailed Grackle"                    
##  [72] "Florida Strangler Fig"                  
##  [73] "Florida Carpenter Ant"                  
##  [74] "legumes"                                
##  [75] "Eastern Gray Squirrel"                  
##  [76] "Regal Jumping Spider"                   
##  [77] "Bark Anole"                             
##  [78] "West Indian Bulimulus"                  
##  [79] "Dark Flower Scarab"                     
##  [80] "Little Blue Heron"                      
##  [81] "Curacao Bush"                           
##  [82] "Portia tree"                            
##  [83] "Blue Land Crab"                         
##  [84] "Checkered Puffer"                       
##  [85] "Orbweavers"                             
##  [86] "flowering plants"                       
##  [87] "largeflower Mexican clover"             
##  [88] "Common Comb Jelly"                      
##  [89] "Polka-Dot Wasp Moth"                    
##  [90] "Atlantic Flyingfish"                    
##  [91] "Brazilian pepper"                       
##  [92] "Anhinga"                                
##  [93] "dicots"                                 
##  [94] "oakleaf fleabane"                       
##  [95] "graceful spurge"                        
##  [96] "Grassleaf Spurge"                       
##  [97] "beach naupaka"                          
##  [98] "Snow Squarestem"                        
##  [99] "trailing daisy"                         
## [100] "True Toads"                             
## [101] "White Mangrove"                         
## [102] "sea ox-eye"                             
## [103] "Cane Toad"                              
## [104] "Knight Anole"                           
## [105] "Brown Anole"                            
## [106] "Northern Mockingbird"                   
## [107] "Whiteflies"                             
## [108] "Double-crested Cormorant"               
## [109] "Green Heron"                            
## [110] "Beaded Periwinkle"                      
## [111] "Yellow-throated Warbler"                
## [112] "Blolly"                                 
## [113] "Great Egret"                            
## [114] "Raspberry Wave"                         
## [115] "Scraped Pilocrocis Moth"                
## [116] "Watermilfoil Leafcutter Moth"           
## [117] "Common Green Darner"                    
## [118] "Three-spotted Skipper"                  
## [119] "Common Lovebug"                         
## [120] "Phaon Crescent"                         
## [121] "Poey's Furrow Bee"                      
## [122] "Northern Plushback"                     
## [123] "Mountain Fig"                           
## [124] "Bright Beefly"                          
## [125] "clustered yellowtops"                   
## [126] "Dilemma Orchid Bee"                     
## [127] "Keyhole Wasp"                           
## [128] "Fourleaf Vetch"                         
## [129] "rougeplant"                             
## [130] "Ghost Ant"                              
## [131] "American Broad-front Fiddler Crabs"     
## [132] "Brown-winged Striped Sweat Bee"         
## [133] "Painted Lady"                           
## [134] "Red-marked Pachodynerus Wasp"           
## [135] "Velvetbean Caterpillar Moth"            
## [136] "Bay-breasted Warbler"                   
## [137] "Blue-gray Gnatcatcher"                  
## [138] "Yellow-bellied Sapsucker"               
## [139] "Short-tailed Hawk"                      
## [140] "Swainson's Warbler"                     
## [141] "White leadtree"                         
## [142] "American Coot"                          
## [143] "Loggerhead Shrike"                      
## [144] "Streaked Sphinx"                        
## [145] "Green Feather Alga"                     
## [146] "West Indian False Cerith"               
## [147] "Sponges"                                
## [148] "Short-legged Springtails"               
## [149] "Brown Pelican"                          
## [150] "mangroves"                              
## [151] "Vetches"                                
## [152] "Bur Marigolds"                          
## [153] "Chamberbitter"                          
## [154] "carrots, ivies, and allies"             
## [155] "woodsorrels"                            
## [156] "creeping cucumber"                      
## [157] "shrubby false buttonweed"               
## [158] "Blue-legged Hermit Crab"                
## [159] "Mulsant's Water Treader"                
## [160] "Everglades Racer"                       
## [161] "Carolina ponysfoot"                     
## [162] "limewater brookweed"                    
## [163] "Seven-year Apple"                       
## [164] "Sea Almond"                             
## [165] "Hog plum"                               
## [166] "Figs"                                   
## [167] "Lined Treesnail"                        
## [168] "American Bird Grasshopper"              
## [169] "West Indian milkberry"                  
## [170] "Beefwoods"                              
## [171] "Anoles"                                 
## [172] "Moses-in-the-cradle"                    
## [173] "oaks"                                   
## [174] "Jamaican feverplant"                    
## [175] "Yellow-crowned Night-Heron"             
## [176] "Sri Lanka Weevil"                       
## [177] "Brachyceran Flies"                      
## [178] "turkey tangle frogfruit"                
## [179] "manyflower marshpennywort"              
## [180] "Domestic Cat"                           
## [181] "Virginia Opossum"                       
## [182] "Eurasian Collared-Dove"                 
## [183] "St. Andrew's Cotton Stainer"            
## [184] "cabbage palmetto"                       
## [185] "Carpenter Ants"                         
## [186] "Common Raccoon"                         
## [187] "Rattail"                                
## [188] "Fishbone Fern"                          
## [189] "coconut palm"                           
## [190] "Western Honey Bee"                      
## [191] "Cobweb Spiders"                         
## [192] "Silversides"                            
## [193] "Ray-finned Fishes"                      
## [194] "Tidewater Mojarra"                      
## [195] "Shads, Sardines, and Menhadens"         
## [196] "Long-legged Flies"                      
## [197] "Mexican ruellia"                        
## [198] "leadtrees"                              
## [199] "Southern Flannel Moth"                  
## [200] "Cattle Egret"                           
## [201] "Black Vulture"                          
## [202] "Common Gallinule"                       
## [203] "Gumbo Limbo"                            
## [204] "Beetles"                                
## [205] "creeping beggarweed"                    
## [206] "ticktrefoils"                           
## [207] "fanflowers"                             
## [208] "Virginia creeper"                       
## [209] "Chinese banyan"                         
## [210] "Saltwort"                               
## [211] "Cotton Stainer Bugs"                    
## [212] "Sargassum"                              
## [213] "Queen Conch"                            
## [214] "Barracudas"                             
## [215] "Atlantic Needlefish"                    
## [216] "Great Barracuda"                        
## [217] "Tinted Cantharus"                       
## [218] "Yellow Jack"                            
## [219] "Timucu Needlefish"                      
## [220] "Sargassumfish"                          
## [221] "Mojarras"                               
## [222] "Flat Needlefish"                        
## [223] "plants"                                 
## [224] "Needlefishes"                           
## [225] "Animals"                                
## [226] "Slender Mojarra"                        
## [227] "gentians, dogbanes, madders, and allies"
## [228] "Hermit Crabs"                           
## [229] "Bay Anchovy"                            
## [230] "Banded Coral Shrimp"                    
## [231] "Swimming Crabs"                         
## [232] "sea purslane"                           
## [233] "White Peacock"                          
## [234] "Bandtail Puffer"                        
## [235] "Puffers and Filefishes"                 
## [236] "Malacostracans"                         
## [237] "Grunts"                                 
## [238] "Perch-like Fishes"                      
## [239] "Pinfish"                                
## [240] "woodcaps and sawgills"                  
## [241] "green algae"                            
## [242] "American eelgrass"                      
## [243] "Frogs and Toads"                        
## [244] "Northern Barracuda"                     
## [245] "Royal Palm"                             
## [246] "Varunid crabs"                          
## [247] "grasses"                                
## [248] "Artist's Brackets, Reishi, and Allies"  
## [249] "Three-striped Dasher"                   
## [250] "stately maiden fern"                    
## [251] "Bagworms, Clothes Moths, and Allies"    
## [252] "bluestems"                              
## [253] "shelf fungi"                            
## [254] "Tridax daisy"                           
## [255] "Fungi Including Lichens"                
## [256] "Higher Ascomycetes"                     
## [257] "Scarabs"                                
## [258] "rockweed"                               
## [259] "Ants"                                   
## [260] "June Beetles"                           
## [261] "Broad-leaved gulfweed"                  
## [262] "Golden Silk Spider"                     
## [263] "Indian banyan"                          
## [264] "Winged and Once-winged Insects"         
## [265] "True Bugs"                              
## [266] "Southern Green-striped Grasshopper"     
## [267] "monocots"                               
## [268] "Sargassum and Allies"                   
## [269] "Birds"                                  
## [270] "Leaf-footed Bugs and Allies"            
## [271] "Hydrilla"                               
## [272] "water pennyworts"                       
## [273] "Creeping Woodsorrel"                    
## [274] "Northern Curly-tailed Lizard"           
## [275] "baldcypresses"                          
## [276] "Brown Bullhead"                         
## [277] "Musk Fern"                              
## [278] "yellowtops"                             
## [279] "Phasey Bean"                            
## [280] "North American Freshwater Catfishes"    
## [281] "Eastern Mosquitofish"                   
## [282] "Schoolmaster Snapper"                   
## [283] "Great White Heron"                      
## [284] "Crabs, Lobsters, and Allies"            
## [285] "House Geckos"                           
## [286] "Greenhouse Frog"                        
## [287] "Orange-spotted Flower Moth"             
## [288] "Louisiana Waterthrush"                  
## [289] "Milkweed Assassin Bug"                  
## [290] "Pantropical Jumping Spider"             
## [291] "Jaguar Guapote"                         
## [292] "Flamingo Chanterelle"                   
## [293] "Prothonotary Warbler"                   
## [294] "Sesarmid Marsh Crabs"                   
## [295] "Squareback Marsh Crab"                  
## [296] "sweet potato"                           
## [297] "Coffee-loving Pyrausta Moth"            
## [298] "Caribbean Land Hermit Crab"             
## [299] "Gregorywood"                            
## [300] "White-flowered black mangrove"          
## [301] "Rustic Sphinx"                          
## [302] "Typical Herons and Egrets"              
## [303] "Rusty Millipede"                        
## [304] "Black Saddlebags"                       
## [305] "ballmoss"                               
## [306] "Bivalves"                               
## [307] "Checkered Nerite"                       
## [308] "Gotu Cola"                              
## [309] "Snowy Egret"                            
## [310] "Red-bellied Woodpecker"                 
## [311] "Southern Tussock Moth"                  
## [312] "Beach Sheoak"                           
## [313] "Florida Stone Crab"                     
## [314] "deer mushroom"                          
## [315] "Camphor Tree"                           
## [316] "Red Imported Fire Ant"                  
## [317] "Skullcap Dapperling"                    
## [318] "southern live oak"                      
## [319] "Bluestriped Grunt"                      
## [320] "Mangrove Tree Crab"                     
## [321] "Mexican Paper Wasp"                     
## [322] "Garden Orbweavers"                      
## [323] "Purple-edged Lute"                      
## [324] "Gastropods"                             
## [325] "Mangrove Cupped Oyster"                 
## [326] "Assassin Bugs"                          
## [327] "Common Grackle"                         
## [328] "Flat Tree Oyster"                       
## [329] "Fish Crow"                              
## [330] "Blue Jay"                               
## [331] "Flies"                                  
## [332] "Band-winged Dragonlet"                  
## [333] "Indian fig opuntia"                     
## [334] "Butterflies and Moths"                  
## [335] "Longspined Porcupinefish"               
## [336] "Cochineal Nopal Cactus"                 
## [337] "Inshore Lizardfish"                     
## [338] "Scorpionfishes"                         
## [339] "Planehead filefish"                     
## [340] "Longhorn Crazy Ant"                     
## [341] "shoebutton Ardisia"                     
## [342] "Knotted Spikerush"                      
## [343] "vascular plants"                        
## [344] "Giant leather fern"                     
## [345] "dayflowers"                             
## [346] "Southern Moon Jelly"                    
## [347] "White Mullet"                           
## [348] "Hairy Hexagonia"                        
## [349] "marsh elder"                            
## [350] "Rambur's Forktail"                      
## [351] "redbay"                                 
## [352] "Scarlet Skimmer"                        
## [353] "Herrings"                               
## [354] "True Whelks and Allies"                 
## [355] "Silver Jenny"                           
## [356] "Umbrella Paper Wasps"                   
## [357] "Dragonflies"                            
## [358] "Pigeon Plum"                            
## [359] "Four-toothed Nerite"                    
## [360] "Gray Wall Jumping Spider"               
## [361] "Twig Ants"                              
## [362] "Fig Sphinx"                             
## [363] "Broad-tipped Conehead"


length(unique(BBC_data$common_name)) #How many different common names are present
## [1] 363
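
One thing to keep in mind about this count: unique() treats a blank common name ("") as its own value, so entry [3] in the list above is included in the 363. Below is a minimal sketch of how blanks could be left out of the count. It uses a small made-up vector (toy_names is hypothetical and not part of the lab data):

```r
# Hypothetical vector standing in for BBC_data$common_name
toy_names <- c("sea grape", "Egyptian Goose", "", "Egyptian Goose", "", "Green Anole")

n_with_blank <- length(unique(toy_names))                     # counts "" as a name
n_without_blank <- length(unique(toy_names[toy_names != ""])) # ignores blank names

print(n_with_blank)
print(n_without_blank)
```

Whether the blank should count depends on the question being asked, so treat this as an option rather than a required step.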



Pivot Tables:

To browse our observation data, let’s build a quick pivot table. As a reminder, anything that follows a # is a note, not executable code.

BBC_pivot<- PivotTable$new() #tell R you want to make a new pivot table called "BBC_pivot"
BBC_pivot$addData(BBC_data) #tell R the data frame you want to use
BBC_pivot$addRowDataGroups("iconic_taxon_name") #tell R which data column you want to appear on the rows
BBC_pivot$addRowDataGroups("common_name", addTotal = TRUE) #tell R which data column you want to appear as a subgroup within each row
BBC_pivot$defineCalculation(calculationName="Total Observations", summariseExpression="n()") #counts the number of times each common name was observed within each taxon group
BBC_pivot$renderPivot() #tell R that you are ready for it to create your pivot table

After you run the code, you should see a pivot table in the Viewer tab.



If everything looks OK, let’s go ahead and save a copy of your table into an Excel Workbook.

BBC_pivot_excel <- createWorkbook(creator = Sys.getenv("USERNAME"))#create a new workbook object named BBC_pivot_excel; the creator is read from your computer's USERNAME setting, so don't change "USERNAME"
addWorksheet(BBC_pivot_excel, "BBC_taxon_data")#tell R what you would like the tab/worksheet of the workbook to be called
BBC_pivot$writeToExcelWorksheet(wb=BBC_pivot_excel, wsName="BBC_taxon_data", 
                         topRowNumber=1, leftMostColumnNumber=1, applyStyles=FALSE)#tell R which workbook you would like to use, and the name of the tab/worksheet of that workbook, and where you would like the table to be placed on the page
saveWorkbook(BBC_pivot_excel, file="BBC_pivot.xlsx", overwrite = TRUE)#Saves the file in your working directory under the name that you have chosen for "file="
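
If you would like to sanity-check the counts in the pivot table, the same totals can be sketched in base R without pivottabler. The example below uses a made-up mini data set (toy_data and its values are hypothetical); with the real data you would substitute the two columns from BBC_data:

```r
# Hypothetical mini data set with the same two columns used in the pivot table
toy_data <- data.frame(
  iconic_taxon_name = c("Aves", "Aves", "Aves", "Plantae", "Plantae", "Reptilia"),
  common_name = c("Osprey", "Osprey", "White Ibis", "sea grape", "sea grape", "Green Anole"),
  stringsAsFactors = FALSE
)

# Count observations for each taxon / common name combination
toy_counts <- as.data.frame(table(toy_data$iconic_taxon_name, toy_data$common_name),
                            stringsAsFactors = FALSE)
names(toy_counts) <- c("iconic_taxon_name", "common_name", "Total_Observations")

# table() also lists combinations that never occur (count of 0), so drop those
toy_counts <- toy_counts[toy_counts$Total_Observations > 0, ]

print(toy_counts)
```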



Converting and Extracting Time:

Before we get started with calculations, let’s convert the “time_observed_at” column out of UTC and into a more meaningful time zone. Although the time_zone column lists several different time zones (you can check with unique(BBC_data$time_zone)), we know that all observations in the BBC_data.csv file were made at BBC. It is therefore safe to assume that every observation was made in the “US/Eastern” time zone, where BBC is located. Here, we first make a new column, local_time, in the “BBC_data” data frame to store the local time once it is converted out of UTC.

Then, we tell R:

  1. That the data in column time_observed_at are in fact time data formatted as “YYYY-MM-DD HH:MM:SS” in the UTC time zone: format="%Y-%m-%d %H:%M:%S", tz='UTC',
  2. To convert the time from UTC to US/Eastern, and
  3. That, after the conversion, the data are still time data, now formatted as “YYYY-MM-DD HH:MM” in the US/Eastern time zone: format="%Y-%m-%d %H:%M", tz='US/Eastern'.

BBC_data$local_time<-BBC_data$time_observed_at #Create new column for local date time
BBC_data$local_time <- as.POSIXct(BBC_data$local_time, format="%Y-%m-%d %H:%M:%S", tz='UTC') #put in posixct
BBC_data$local_time<-format(BBC_data$local_time,tz='US/Eastern') #format to local time zone (this step will change the format back to character)
BBC_data$local_time<-as.POSIXct(BBC_data$local_time,format="%Y-%m-%d %H:%M",tz='US/Eastern') #put back in posixct (this format stops at minutes, so the seconds are reset to 00)
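
To see the conversion on its own, here is a minimal sketch using two timestamps that appear in the time_observed_at column of this lab, plus one blank entry. The names utc_text, utc_time, and local_text are hypothetical, and "America/New_York" is used as an assumed equivalent of the "US/Eastern" name in the lab script:

```r
# Two timestamps as they appear in time_observed_at, plus one blank entry
utc_text <- c("2015-05-19 19:00:00 UTC", "2015-02-27 16:01:13 UTC", "")

# Parse the text as UTC; the trailing " UTC" is ignored and the blank becomes NA
utc_time <- as.POSIXct(utc_text, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Display the same instants in Eastern time (assumed equivalent of "US/Eastern")
local_text <- format(utc_time, format = "%Y-%m-%d %H:%M:%S", tz = "America/New_York")

print(local_text)
```

The May observation shifts by four hours (daylight saving time) and the February observation by five hours (standard time). This sketch keeps the seconds (11:01:13), whereas the lab script reads the local time back only to the minute, which is why its local_time column shows 11:01:00.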



Let’s practice extracting time by isolating the year, month, and hour of each observation and copy those data into their own respective columns titled “year”, “month”, and “hour”.

BBC_data$year<-format(BBC_data$local_time, format = "%Y") #add column for year
BBC_data$month<-format(BBC_data$local_time, format = "%m") #add column for month
BBC_data$hour<-format(BBC_data$local_time, format = "%H") #add column for hour of day
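
Note that format() returns text, so the new year, month, and hour columns hold characters such as "07" rather than numbers. If a later analysis needs numbers, one possible approach (a sketch with hypothetical names toy_time, hour_text, and hour_number) is to convert with as.integer():

```r
# Hypothetical local times standing in for BBC_data$local_time
toy_time <- as.POSIXct(c("2015-05-19 15:00:00", "2015-09-02 07:59:00"),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

hour_text <- format(toy_time, format = "%H") # character values, with leading zeros
hour_number <- as.integer(hour_text)         # whole numbers, leading zeros dropped

print(class(hour_text))
print(hour_number)
```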



Notice how only two of the first six rows have data in column 1, time_observed_at. The other rows are blank because no observation time was recorded when the observation was uploaded, so local_time, year, month, and hour are NA for those rows.

time_observed_at         time_zone                   captive_cultivated  latitude  longitude  scientific_name        common_name                  iconic_taxon_name  local_time           year  month  hour
                         Eastern Time (US & Canada)  false               25.90932  -80.13774  Coccoloba uvifera      sea grape                    Plantae            NA                   NA    NA     NA
2015-05-19 19:00:00 UTC  Eastern Time (US & Canada)  false               25.90927  -80.13847  Alopochen aegyptiaca   Egyptian Goose               Aves               2015-05-19 15:00:00  2015  05     15
2015-02-27 16:01:13 UTC  Eastern Time (US & Canada)  false               25.90980  -80.13726  Chloeia viridis                                     Animalia           2015-02-27 11:01:00  2015  02     11
                         Eastern Time (US & Canada)  false               25.90982  -80.13721  Lytechinus variegatus  Variegated Sea Urchin        Animalia           NA                   NA    NA     NA
                         Eastern Time (US & Canada)  false               25.90981  -80.13720  Echinometra lucunter   Atlantic Rock-boring Urchin  Animalia           NA                   NA    NA     NA
                         Eastern Time (US & Canada)  false               25.90981  -80.13721  Amphinomidae           Bristle Worms                Animalia           NA                   NA    NA     NA



Let’s save a copy of this data frame and call it BBC_timedata — in case we want to reference these data later.

Since this data frame is centered around time, we will also remove any rows where there was no time of observation, and therefore local time, recorded.

Finally, we will save BBC_timedata to our working directory as a .csv named BBC_timedata.csv.

BBC_timedata<-select(BBC_data,"local_time" ,"year", "month", "hour", "iconic_taxon_name", "common_name")#select the columns we want to keep
BBC_timedata<-BBC_timedata[!is.na(BBC_timedata$local_time),] #remove all rows where there is no local time
write.csv(BBC_timedata,"BBC_timedata.csv", row.names = FALSE) #Save a copy of BBC_timedata data frame as a .csv
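
The row-removal step can be sketched on a made-up data frame (toy_df and toy_kept are hypothetical names). Rows without a local time are dropped, and the rows that remain keep their original row numbers:

```r
# Hypothetical data frame: rows 1 and 4 have no local time
toy_df <- data.frame(
  local_time = c(NA, "2015-05-19 15:00:00", "2015-02-27 11:01:00", NA),
  common_name = c("sea grape", "Egyptian Goose", "", "Variegated Sea Urchin"),
  stringsAsFactors = FALSE
)

toy_kept <- toy_df[!is.na(toy_df$local_time), ] # keep rows where local_time is not NA

print(toy_kept)
print(rownames(toy_kept))
```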


Notice how rows 1 and 4-9 are now excluded from the data frame (the numbers on the left are the original row numbers):

    local_time           year  month  hour  iconic_taxon_name  common_name
2   2015-05-19 15:00:00  2015  05     15    Aves               Egyptian Goose
3   2015-02-27 11:01:00  2015  02     11    Animalia
10  2015-06-01 17:30:00  2015  06     17    Insecta            Florida Tussock Moth
11  2015-09-02 07:59:00  2015  09     07    Arachnida
12  2015-09-01 14:50:00  2015  09     14    Reptilia           Green Anole
13  2015-09-22 09:16:00  2015  09     09    Aves               Egyptian Goose



Biodiversity Measurements and Indices:

Summary table of biodiversity measurements and indices:

Measurement/Index                     Formula
Species Richness (S)                  \(S =\) number of species
Shannon-Wiener Diversity Index (H)    \(H = -\sum{(p_i \cdot \ln(p_i))}\)
Species Evenness (E)                  \(E = \frac{H}{H_{MAX}}\)
Effective Number of Species (ENS)     \(ENS = e^H\)


Other parameters and definitions needed to calculate the items above:

Metric                            Description                                                              Formula
Total Abundance (\(N_{TOTAL}\))   Total number of individuals across all species
\(n_i\)                           Total number of individuals of species \(i\)
\(p_i\)                           Proportion of individuals of species \(i\), compared to \(N_{TOTAL}\)    \(p_i = \frac{n_i}{N_{TOTAL}}\)
\(H_{MAX}\)                       Maximum possible value of \(H\) for \(S\) species                        \(H_{MAX} = \ln(S)\)
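
As a worked example of these formulas, consider a hypothetical taxon with four species observed 4, 2, 1, and 1 times. The counts are chosen for easy arithmetic and are not taken from the lab data. \(H_{MAX}\) is taken as \(\ln(S)\), which is how the lab code computes it (log(S)):

```latex
\begin{aligned}
S &= 4, \qquad N_{TOTAL} = 4 + 2 + 1 + 1 = 8 \\
p_i &= \tfrac{4}{8},\ \tfrac{2}{8},\ \tfrac{1}{8},\ \tfrac{1}{8} \\
H &= -\left[\tfrac{1}{2}\ln\tfrac{1}{2} + \tfrac{1}{4}\ln\tfrac{1}{4} + 2\cdot\tfrac{1}{8}\ln\tfrac{1}{8}\right]
   = 1.75\,\ln 2 \approx 1.213 \\
H_{MAX} &= \ln(S) = \ln 4 \approx 1.386 \\
E &= \frac{H}{H_{MAX}} = \frac{1.75\,\ln 2}{2\,\ln 2} = 0.875 \\
ENS &= e^{H} = 2^{1.75} \approx 3.364
\end{aligned}
```

In words: although four species are present, the uneven counts make this community about as diverse as one with roughly 3.4 equally common species.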



The chunk of code below might look intimidating, but it’s OK! Remember, we don’t expect you to know the complete ins and outs of all the “behind the scenes” work that is happening in the code. You can follow along with what is happening in each step by reading the notes that follow the #.

BBC_spp_rich <- BBC_data %>%
  group_by(iconic_taxon_name) %>%
  summarise(S = length(unique(common_name)))#Creates column with each unique taxon ID with the corresponding species richness of each taxon

BBC_sum_taxoN <- BBC_data %>%
  group_by(iconic_taxon_name) %>%
  summarise(Total_Abundance_per_Taxon = n()) #Creates a column with each unique taxon ID with the corresponding total abundance (total observations) for each taxon

BBC_var<-BBC_spp_rich %>%
 inner_join(., BBC_sum_taxoN) #merges the previous two data frames by the list of unique taxon
#"." after inner_join is a place holder that denotes it is the same data frame that is listed above (short hand)

BBC_spp_abundance <- BBC_data %>%
  group_by(iconic_taxon_name, common_name) %>%
  summarise(Species_Abundance = n()) #Creates a data frame with the complete list of common names and their summed abundances

BBC_biod<-inner_join(BBC_spp_abundance,BBC_var)#Join BBC_spp_abundance with BBC_var
BBC_biod$Pi<-(BBC_biod$Species_Abundance/BBC_biod$Total_Abundance_per_Taxon)#Calculate Pi
BBC_biod$lnPi<-(log(BBC_biod$Species_Abundance/BBC_biod$Total_Abundance_per_Taxon))#Calculate ln(pi)
BBC_biod$PixlnPi<-(BBC_biod$Species_Abundance/BBC_biod$Total_Abundance_per_Taxon*log(BBC_biod$Species_Abundance/BBC_biod$Total_Abundance_per_Taxon))#Calculate Pi*ln(Pi)

BBC_diversity<-as.data.frame(BBC_var)#Make new data frame for final numbers

BBC_H <- BBC_biod %>%
  group_by(iconic_taxon_name) %>%
  summarise(H=(sum(PixlnPi))*(-1))#Calculate H
BBC_diversity<-inner_join(BBC_diversity,BBC_H)#Append H to BBC_diversity

BBC_Hmax <- BBC_biod %>%
  group_by(iconic_taxon_name) %>%
  summarise(Hmax=log(first(S)))#Calculate Hmax = ln(S); S is the same on every row of a taxon, so use the first value
BBC_diversity<-inner_join(BBC_diversity,BBC_Hmax)#Append Hmax to BBC_diversity

BBC_diversity$E<-(BBC_diversity$H)/(BBC_diversity$Hmax)#Calculate E and append it to BBC_diversity
BBC_diversity$ENS<-exp(BBC_diversity$H)#Calculate ENS and append it to BBC_diversity

BBC_diversity data frame:

iconic_taxon_name  S    Total_Abundance_per_Taxon  H          Hmax      E          ENS
                   1    9                          0.0000000  0.000000  NaN        1.000000
Actinopterygii     37   152                        2.8862478  3.610918  0.7993114  17.925921
Amphibia           4    8                          1.2130076  1.386294  0.8750000  3.363586
Animalia           29   68                         2.9665579  3.367296  0.8809912  19.424943
Arachnida          12   28                         2.1121612  2.484907  0.8499962  8.266087
Aves               40   175                        2.9357252  3.688880  0.7958312  18.835158
Chromista          3    15                         0.9701158  1.098612  0.8830374  2.638250
Fungi              11   19                         2.3056573  2.397895  0.9615338  10.030770
Insecta            79   189                        3.8015479  4.369448  0.8700293  44.770430
Mammalia           5    113                        0.7541642  1.609438  0.4685886  2.125834
Mollusca           16   48                         2.2323441  2.772589  0.8051479  9.321692
Plantae            124  617                        3.9307578  4.820282  0.8154623  50.945569
Reptilia           10   200                        1.5391800  2.302585  0.6684574  4.660767
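
To check any row of this table by hand, the formulas can be wrapped in a small base R function. This is only a sketch: diversity_indices is a hypothetical helper that is not part of the lab script, and the counts passed to it are made up. It also shows why the first row of the table, which has a blank taxon name and S = 1, has E = NaN: with a single species, H and Hmax are both 0, and 0/0 is not a number.

```r
# Compute the indices for one taxon from a vector of per-species counts
diversity_indices <- function(counts) {
  p <- counts / sum(counts)   # p_i = n_i / N_TOTAL
  H <- -sum(p * log(p))       # Shannon-Wiener index (log() is the natural log)
  Hmax <- log(length(counts)) # H_MAX = ln(S)
  c(S = length(counts), H = H, Hmax = Hmax, E = H / Hmax, ENS = exp(H))
}

four_species <- diversity_indices(c(4, 2, 1, 1)) # hypothetical counts for four species
one_species <- diversity_indices(9)              # a single "species" seen 9 times

print(round(four_species, 4))
print(one_species)
```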

Great! Now let’s save a copy of data frame BBC_diversity as a .csv.

write.csv(BBC_diversity,"BBC_diversity.csv", row.names = FALSE)



During that big chunk of code, we created another data frame: BBC_spp_abundance. This data frame is similar to the pivot table we created earlier and was created by the code:

BBC_spp_abundance <- BBC_data %>%
  group_by(iconic_taxon_name, common_name) %>%
  summarise(Species_Abundance = n())


First 6 rows of data frame BBC_spp_abundance:

iconic_taxon_name  common_name          Species_Abundance
                                        9
Actinopterygii                          20
Actinopterygii     Atlantic Flyingfish  1
Actinopterygii     Atlantic Needlefish  6
Actinopterygii     Bandtail Puffer      9
Actinopterygii     Barracudas           2
Let’s save a copy of data frame BBC_spp_abundance as a .csv, also!

write.csv(BBC_spp_abundance,"BBC_spp_abundance.csv", row.names = FALSE)



If you would like to make a data frame and .csv file for BBC similar to the ones we made for MMC, for use in Part 3: Mapping and Visualizing Data in Google Earth, you may run the two lines of code below. They create a data frame called BBC_latlon_year and save it as "BBC_latlon_year.csv" in your working directory.

# Make a data frame with only latitude, longitude, year, and iconic_taxon_name
BBC_latlon_year <- BBC_data %>% select(latitude,longitude, year, iconic_taxon_name)
#Save a copy of BBC_latlon_year as a .csv
write.csv(BBC_latlon_year,"BBC_latlon_year.csv", row.names = FALSE)



First 6 rows of data frame BBC_latlon_year:

latitude longitude year iconic_taxon_name
25.90932 -80.13774 NA Plantae
25.90927 -80.13847 2015 Aves
25.90980 -80.13726 2015 Animalia
25.90982 -80.13721 NA Animalia
25.90981 -80.13720 NA Animalia
25.90981 -80.13721 NA Animalia

If you don’t want to worry about this, place a # in front of the two lines of code as displayed below:

# Make a data frame with only latitude, longitude, year, and iconic_taxon_name
#BBC_latlon_year <- BBC_data %>% select(latitude,longitude, year, iconic_taxon_name)
#Save a copy of BBC_latlon_year as a .csv
#write.csv(BBC_latlon_year,"BBC_latlon_year.csv", row.names = FALSE)


[END of Breakdown of “BBC_getting-started” Code]



Filtering and Organizing Data




We will now be working from our statistics script.
If you haven’t already, open the R script file “lab1-stats.R” with RStudio.

Remember, we are now working with a new script, so we want to clear the memory from our Global Environment Pane. Also, we have some new packages to install: ggplot2, ggpubr, ggsci, and devtools.

# Clear memory
rm(list = ls())

# Install packages (only if needed-- once a package is installed, you don't need to re-install it)
#install.packages("pivottabler")
#install.packages("openxlsx")
#install.packages("dplyr")
install.packages("ggplot2")
install.packages("ggpubr")
install.packages("ggsci")
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

# Load your needed libraries (this DOES need to be done each session)
library("pivottabler")
library("openxlsx")
library("dplyr")
library("ggplot2")
library("ggpubr")
library("ggsci")

#Setting the working directory
#First, set your directory using the keyboard shortcut: [Ctrl+Shift+H], select your designated folder
#Check your working directory
getwd()
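
Because packages only need to be installed once, one optional pattern is to check which packages are missing before installing anything. The helper below, find_missing_packages, is a hypothetical name and not part of the lab script. It relies on base R's requireNamespace(), which returns FALSE when a package is not installed:

```r
# Return the names of packages that are not yet installed
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("pivottabler", "openxlsx", "dplyr", "ggplot2", "ggpubr", "ggsci")
missing_pkgs <- find_missing_packages(needed)
print(missing_pkgs)

# Uncomment the next line to install only what is missing
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

If nothing is missing, the printed result is empty and you can go straight to the library() calls.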


Now let’s combine the MMC and BBC data frames so that they will be easier to plot and analyze. Again, this chunk of code might look a bit daunting, but that’s ok! We’re going to break it down:


  1. We want to add a “Habitat” column to each data frame, so we don’t forget where each observation was made once they’re combined. Then, we are going to tell R to populate our Habitat column with each data frame’s respective habitat, “BBC” or “MMC”.

    #Read in data#
    #MMC#
    MMC_diversity<-read.csv("MMC_diversity.csv")
    MMC_spp_abundance<-read.csv("MMC_spp_abundance.csv")
    MMC_timedata<-read.csv("MMC_timedata.csv")
    #BBC#
    BBC_diversity<-read.csv("BBC_diversity.csv")
    BBC_spp_abundance<-read.csv("BBC_spp_abundance.csv")
    BBC_timedata<-read.csv("BBC_timedata.csv")
    
    ##### Add Habitat to each data frame ####
    MMC_diversity$Habitat<-"MMC"
    MMC_spp_abundance$Habitat<-"MMC"
    MMC_timedata$Habitat<-"MMC"
    
    BBC_diversity$Habitat<-"BBC"
    BBC_spp_abundance$Habitat<-"BBC"
    BBC_timedata$Habitat<-"BBC"


  2. We want to join all MMC data frames with their corresponding BBC data frames. Our data frames are going to be combined vertically, as opposed to horizontally. This means that the name of each column and the number of total columns will remain the same— R will stack one data frame on top of the other.

  3. For simplicity’s sake, we are going to remove all incomplete data entries. We will do this by changing any empty cells to read as “NA”. In R, “NA” means “not available”. Then, we instruct R to remove all rows that contain an NA.

  4. Finally, we will double-check that each column is in the proper format (factor, numeric, date, etc.), and then tell R to sort and filter each of the combined data frames. For simplicity’s sake, and given the type of analyses that we will be running today, we will also tell R to remove rows whose taxon appears in only one habitat, meaning that we will only be comparing apples to apples and not apples to zebras. Note: BOTH_timedata will be filtered later, as that filtering depends on the time resolution selected. Each final data frame is then saved as a .csv to your working directory.

# Save a copy of BOTH_diversity data frame as a .csv
write.csv(BOTH_diversity,"BOTH_diversity.csv", row.names = FALSE)
# Save a copy of BOTH_spp_abundance data frame as a .csv
write.csv(BOTH_spp_abundance,"BOTH_spp_abundance.csv", row.names = FALSE)
# Save a copy of BOTH_timedata data frame as a .csv
write.csv(BOTH_timedata,"BOTH_timedata.csv", row.names = FALSE)
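
Steps 2 and 3 above can be sketched on two tiny made-up data frames. The names toy_bbc, toy_mmc, and toy_both and all of the values are hypothetical. The sketch stacks the two tables, turns a blank taxon name into NA, and then drops the row that contains it:

```r
# Hypothetical per-campus tables with identical column names
toy_bbc <- data.frame(iconic_taxon_name = c("", "Aves", "Fungi"),
                      S = c(1, 40, 11), Habitat = "BBC", stringsAsFactors = FALSE)
toy_mmc <- data.frame(iconic_taxon_name = c("Aves", "Plantae"),
                      S = c(35, 90), Habitat = "MMC", stringsAsFactors = FALSE)

toy_both <- rbind(toy_bbc, toy_mmc) # stack vertically: 3 rows + 2 rows = 5 rows
toy_both$iconic_taxon_name[toy_both$iconic_taxon_name == ""] <- NA # blank -> NA
toy_both <- na.omit(toy_both)       # drop every row that contains an NA

print(toy_both)
```

rbind() only works here because both data frames have the same column names, which is why the column layout stays unchanged after stacking.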


View the step breakdown for each finalized data frame:

BOTH_diversity
#### BBC_diversity & MMC_diversity --> BOTH_diversity ####
BOTH_diversity <- rbind(BBC_diversity, MMC_diversity) #vertically combine df's
#if there are any blank cells under taxon column, change them to NAs
BOTH_diversity$iconic_taxon_name[BOTH_diversity$iconic_taxon_name == ""] <- NA
BOTH_diversity <- na.omit(BOTH_diversity) #remove any NAs
summary(BOTH_diversity) #check summary --> any NAs left?
str(BOTH_diversity) #check structure
BOTH_diversity$iconic_taxon_name <- as.factor(BOTH_diversity$iconic_taxon_name) #change to a factor
BOTH_diversity$Habitat <- as.factor(BOTH_diversity$Habitat) #change to a factor
#reorder the columns
BOTH_diversity <- BOTH_diversity[, c("Habitat", "iconic_taxon_name", "Total_Abundance_per_Taxon", "S", "H", "Hmax", "E", "ENS")]
#Filter and sort data (uses dplyr): keep only taxon names that appear more than once
BOTH_diversity <- BOTH_diversity %>%
  filter(duplicated(iconic_taxon_name) |
           duplicated(iconic_taxon_name, fromLast = TRUE)) %>% #removes rows that don't have a paired taxon name
  arrange(Habitat, iconic_taxon_name) #arranges data frame by habitat, then iconic name
Habitat iconic_taxon_name Total_Abundance_per_Taxon S H Hmax E ENS
BBC Actinopterygii 152 37 2.886248 3.610918 0.7993114 17.925921
BBC Amphibia 8 4 1.213008 1.386294 0.8750000 3.363586
BBC Animalia 68 29 2.966558 3.367296 0.8809912 19.424943
BBC Arachnida 28 12 2.112161 2.484907 0.8499962 8.266087
BBC Aves 175 40 2.935725 3.688880 0.7958312 18.835158
BBC Fungi 19 11 2.305657 2.397895 0.9615338 10.030770
BOTH_spp_abundance
#### BBC_spp_abundance & MMC_spp_abundance --> BOTH_spp_abundance ####

BOTH_spp_abundance<- rbind(BBC_spp_abundance,MMC_spp_abundance)#vertically combine df's
#if there are any blank cells under taxon column, change them to NAs
BOTH_spp_abundance$iconic_taxon_name[BOTH_spp_abundance$iconic_taxon_name==""]<-NA
#if there are any blank cells under common_name column, change them to "unspecified"
BOTH_spp_abundance$common_name[BOTH_spp_abundance$common_name==""]<-"unspecified"
BOTH_spp_abundance<- na.omit(BOTH_spp_abundance)#remove any NAs
      summary(BOTH_spp_abundance)#check summary --> any NAs left?
##  iconic_taxon_name  common_name        Species_Abundance   Habitat         
##  Length:860         Length:860         Min.   :  1.000   Length:860        
##  Class :character   Class :character   1st Qu.:  1.000   Class :character  
##  Mode  :character   Mode  :character   Median :  1.000   Mode  :character  
##                                        Mean   :  3.673                     
##                                        3rd Qu.:  3.000                     
##                                        Max.   :105.000
      str(BOTH_spp_abundance)#check structure
## 'data.frame':    860 obs. of  4 variables:
##  $ iconic_taxon_name: chr  "Actinopterygii" "Actinopterygii" "Actinopterygii" "Actinopterygii" ...
##  $ common_name      : chr  "unspecified" "Atlantic Flyingfish" "Atlantic Needlefish" "Bandtail Puffer" ...
##  $ Species_Abundance: int  20 1 6 9 2 1 1 1 34 1 ...
##  $ Habitat          : chr  "BBC" "BBC" "BBC" "BBC" ...
##  - attr(*, "na.action")= 'omit' Named int [1:2] 1 372
##   ..- attr(*, "names")= chr [1:2] "1" "372"
           BOTH_spp_abundance$iconic_taxon_name<-as.factor(BOTH_spp_abundance$iconic_taxon_name)#change to a factor
           BOTH_spp_abundance$Habitat<-as.factor(BOTH_spp_abundance$Habitat)#change to a factor
           BOTH_spp_abundance$common_name<-as.factor(BOTH_spp_abundance$common_name)#change to a factor
BOTH_spp_abundance_filt <-BOTH_spp_abundance %>% #Filter and sort data
           filter(duplicated(common_name)| #filters by duplicated common names and then removes any rows
           duplicated(common_name, fromLast=TRUE))%>%  #where there are no paired common names
           arrange(Habitat,common_name) #data frame is then arranged by habitat and common name
#removes common names listed as "unspecified": because of overlap in unidentified common names
#across iconic taxon names, we have to remove these lines in order to run our paired analyses
#between each campus
BOTH_spp_abundance <- BOTH_spp_abundance_filt[BOTH_spp_abundance_filt$common_name != "unspecified", ]
iconic_taxon_name common_name Species_Abundance Habitat
1 Plantae American beautyberry 2 BBC
142 Plantae American beautyberry 5 MMC
2 Aves American Coot 1 BBC
143 Aves American Coot 1 MMC
3 Animalia Animals 4 BBC
144 Animalia Animals 1 MMC
BOTH_timedata
#### BBC_timedata & MMC_timedata --> BOTH_timedata ####
BOTH_timedata<- rbind(BBC_timedata,MMC_timedata)#vertically combine df's
#if there are any blank cells under taxon column, change them to NAs
BOTH_timedata$iconic_taxon_name[BOTH_timedata$iconic_taxon_name==""]<-NA
#if there are any blank cells under common_name column, change them to "unspecified"
BOTH_timedata$common_name[BOTH_timedata$common_name==""]<-"unspecified"
BOTH_timedata<- na.omit(BOTH_timedata)#remove any NAs
      summary(BOTH_timedata)#check summary --> any NAs left?
##   local_time             year          month             hour      
##  Length:3105        Min.   :2010   Min.   : 1.000   Min.   : 0.00  
##  Class :character   1st Qu.:2018   1st Qu.: 4.000   1st Qu.:11.00  
##  Mode  :character   Median :2021   Median : 5.000   Median :13.00  
##                     Mean   :2020   Mean   : 6.121   Mean   :13.34  
##                     3rd Qu.:2021   3rd Qu.: 9.000   3rd Qu.:16.00  
##                     Max.   :2022   Max.   :12.000   Max.   :23.00  
##  iconic_taxon_name  common_name          Habitat         
##  Length:3105        Length:3105        Length:3105       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 
      str(BOTH_timedata)#check structure
## 'data.frame':    3105 obs. of  7 variables:
##  $ local_time       : chr  "2015-05-19 15:00:00" "2015-02-27 11:01:00" "2015-06-01 17:30:00" "2015-09-02 07:59:00" ...
##  $ year             : int  2015 2015 2015 2015 2015 2015 2015 2016 2016 2017 ...
##  $ month            : int  5 2 6 9 9 9 12 8 10 3 ...
##  $ hour             : int  15 11 17 7 14 9 7 10 17 8 ...
##  $ iconic_taxon_name: chr  "Aves" "Animalia" "Insecta" "Arachnida" ...
##  $ common_name      : chr  "Egyptian Goose" "unspecified" "Florida Tussock Moth" "unspecified" ...
##  $ Habitat          : chr  "BBC" "BBC" "BBC" "BBC" ...
##  - attr(*, "na.action")= 'omit' Named int [1:15] 740 810 832 859 1040 1051 1133 1215 1553 2232 ...
##   ..- attr(*, "names")= chr [1:15] "740" "810" "832" "859" ...
           BOTH_timedata$iconic_taxon_name<-as.factor(BOTH_timedata$iconic_taxon_name)#change to a factor
           BOTH_timedata$Habitat<-as.factor(BOTH_timedata$Habitat)#change to a factor
           BOTH_timedata$common_name<-as.factor(BOTH_timedata$common_name)#change to a factor
           BOTH_timedata$year<-as.factor(BOTH_timedata$year)#change to a factor         
           BOTH_timedata$month<-as.factor(BOTH_timedata$month)#change to a factor   
           BOTH_timedata$hour<-as.factor(BOTH_timedata$hour)#change to a factor
           BOTH_timedata$local_time<-as.POSIXct(BOTH_timedata$local_time,format="%Y-%m-%d %H:%M",tz='US/Eastern') 
                    #put back in posixct
           BOTH_timedata <- BOTH_timedata[, c("Habitat","iconic_taxon_name", "common_name", "year","month",  "hour","local_time")]
#timedata will be sorted later and is dependent on selection of variables
Habitat iconic_taxon_name common_name year month hour local_time
BBC Aves Egyptian Goose 2015 5 15 2015-05-19 15:00:00
BBC Animalia unspecified 2015 2 11 2015-02-27 11:01:00
BBC Insecta Florida Tussock Moth 2015 6 17 2015-06-01 17:30:00
BBC Arachnida unspecified 2015 9 7 2015-09-02 07:59:00
BBC Reptilia Green Anole 2015 9 14 2015-09-01 14:50:00
BBC Aves Egyptian Goose 2015 9 9 2015-09-22 09:16:00


The following .csv files should now be in your working directory:

  • BOTH_diversity.csv
  • BOTH_spp_abundance.csv
  • BOTH_timedata.csv

Now that our data has been cleaned and organized, it’s time to choose the variables that you and your group would like to investigate!

Return to the Lab 2 task sheet and follow the instructions for step



——— Data Analysis ———–



Selecting Variables



Summary table of all variables and finalized data frames

Data frame | Variable Column Name | IV or DV? | Definition
BOTH_diversity | Habitat | IV | The location of the observation (either MMC or BBC)
BOTH_diversity | iconic_taxon_name | IV | iNaturalist’s major taxonomic group name
BOTH_diversity | Total_Abundance_per_Taxon | DV | The number of individuals in each taxon (using the number of observations as a proxy for count)
BOTH_diversity | S | DV | The species richness of each iconic taxon group at each habitat
BOTH_diversity | E | DV | Evenness calculated per iconic taxon group at each habitat
BOTH_diversity | ENS | DV | The effective number of species per iconic taxon group at each habitat
BOTH_spp_abundance | Habitat | IV | The location of the observation (either MMC or BBC)
BOTH_spp_abundance | iconic_taxon_name | IV | iNaturalist’s major taxonomic group name
BOTH_spp_abundance | common_name | IV | The common name of each observed organism at the narrowest level of classification made by the observer
BOTH_spp_abundance | Species_Abundance | DV | The number of individuals observed for each common name, used here as a proxy for the population size of a species
BOTH_timedata | Habitat | IV | The location of the observation (either MMC or BBC)
BOTH_timedata | year | IV | The year that the observation was made
BOTH_timedata | month | IV | The month that the observation was made
BOTH_timedata | hour | IV | The time of day (0-23 hr) that the observation was made
BOTH_timedata | S | DV | The species richness per unit of time across the entire habitat (MMC or BBC)
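
The diversity columns in BOTH_diversity appear to follow the standard Shannon-based definitions. For example, in the BBC Actinopterygii row printed earlier, Hmax (3.610918) equals ln(37), E (0.7993114) equals H / Hmax, and ENS (17.925921) equals exp(H). This is inferred from the printed values rather than taken from the course script, so treat the sketch below as an illustration. The abundance counts in `abund` are hypothetical.

```r
# Hypothetical abundance counts for five species within one taxon group
abund <- c(20, 1, 6, 9, 2)

p    <- abund / sum(abund)   # proportion of observations per species
S    <- length(abund)        # species richness
H    <- -sum(p * log(p))     # Shannon diversity index (natural log)
Hmax <- log(S)               # largest H possible with S species
E    <- H / Hmax             # evenness (1 = perfectly even)
ENS  <- exp(H)               # effective number of species

print(c(S = S, H = H, Hmax = Hmax, E = E, ENS = ENS))

# With perfectly even counts, evenness is 1 and ENS equals S
even   <- c(5, 5, 5, 5)
p_even <- even / sum(even)
H_even <- -sum(p_even * log(p_even))
print(c(E = H_even / log(length(even)), ENS = exp(H_even)))
```

The even-count case is a quick sanity check: when every species is equally common, the effective number of species is simply the number of species.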

Data frames

For the remainder of this document, instructional steps will be broken down by your selected data frame. Each following section will have a tab labeled “BOTH_diversity”, “BOTH_spp_abundance”, or “BOTH_timedata”. There, you will find specific code and instructions for the data frame and variables that you have chosen.

When selecting your variables, they must all be within one data frame. This framework was not designed to make data comparisons between data frames. For example, you cannot compare Effective Number of Species values from BOTH_diversity to variables in data frames BOTH_spp_abundance or BOTH_timedata.

Although the main independent variable used for analysis will be Habitat, you are encouraged to select others in order to build a stronger, more thorough image of the data. These additional variables will be applied in the “Descriptive Statistics and Visualizing Data” section. Due to the level of this course, we will not be looking at the interactions of multiple independent variables on our response variables.

Due to the simplicity of the coding provided, comparisons can only be made where there is overlap in variables between campuses. For example, when comparing Total_Abundance_per_Taxon in the BOTH_diversity data frame, only taxon groups that were present at both habitats are included. Since (at the time of the creation of this script) “Chromista” were only observed at BBC and not MMC, the several observations of Chromista from BBC were removed from the data and were excluded from the example calculations. The R script will automatically update with each newly uploaded .csv file exported from iNaturalist.

Be sure to record your variables on your data menus!


BOTH_diversity

Habitat and iconic_taxon_name are the only independent variable options in this data frame; however, there are several dependent variables to choose from.

Keep in mind that you are only to choose one dependent variable to observe. The dependent variables in this data frame were calculated per taxon group, not as a whole for each habitat.

Remember, only taxon groups that were present in BOTH habitats were included in this data frame.

BOTH_spp_abundance

This data frame only has one dependent variable to choose from, but it is the only data frame where you can make comparisons at the species/common name level.

Something to keep in mind during your investigation: only the species/common names that occurred at BOTH campuses were included in this data frame.

BOTH_timedata

For the time series data, choose only one resolution of time to investigate: year, month, or hour. These data will be broken down into independent data frames with their own series of code and instructions; however, they will continue to be nested under the “BOTH_timedata” tab.

The species richness (S) in this data frame is species richness as it should be measured: the total count of species observed across the habitat. This differs from how species richness is reported in “BOTH_diversity”, where it is reported as the species count per iconic taxon name.

Remember, only time observations that were made at both campuses were included in this data frame. (e.g. For the “hour” time frame, if BBC had a 1:00 AM observation, but MMC did not, that observation was excluded.)
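
The difference between the two ways of reporting species richness can be sketched with a few made-up observations. The data frame `obs` and the helper `count_unique()` are hypothetical names used only for this illustration, and the sketch ignores the time grouping that BOTH_timedata also applies.

```r
# Made-up observations (not real iNaturalist data)
obs <- data.frame(
  Habitat = c("BBC", "BBC", "BBC", "MMC", "MMC"),
  iconic_taxon_name = c("Aves", "Aves", "Reptilia", "Aves", "Plantae"),
  common_name = c("Egyptian Goose", "American Coot", "Green Anole",
                  "Egyptian Goose", "red mangrove")
)

# Helper: number of distinct names in a vector
count_unique <- function(x) length(unique(x))

# Richness across each whole habitat (the BOTH_timedata style)
S_habitat <- tapply(obs$common_name, obs$Habitat, count_unique)
print(S_habitat)

# Richness per iconic taxon within each habitat (the BOTH_diversity style)
S_taxon <- aggregate(common_name ~ Habitat + iconic_taxon_name,
                     data = obs, FUN = count_unique)
print(S_taxon)
```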



Descriptive Statistics and Visualizing Data




We can describe the data both quantitatively and visually using just a couple of lines of code. First, let’s run some quick descriptive statistics on the data frame that your group has chosen.

Descriptive Statistics:

summary()

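The first function we will apply to the data frame is summary(). For each numeric column of the specified data frame, this function will provide the minimum, first quartile (25th percentile), median, mean, third quartile (75th percentile), and maximum. For factor columns, it reports the number of rows in each level.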

Below is an example with the data frame “iris”. This is a test data frame included with the base installation of R, intended for use as practice data to explore how functions of various packages work. “iris” is a famous data set that gives the measurements in cm of the variables (i) sepal length, (ii) sepal width, (iii) petal length, and (iv) petal width for 50 flowers from each of 3 species of iris: Iris setosa, Iris versicolor, and Iris virginica. The first 6 rows of this data frame are printed below.

Try it in your script! Your code is also set up to save the output of the summary as “summary_df”. Running the line of code below will save the output summary as a .csv file in your working directory.

  # Save a copy of your summary output as a .csv
  write.csv(summary_df,"summary_df.csv", row.names = FALSE)


After you have attempted to run the code for your data frame, click on the corresponding data frame tab to see how your output compares. Keep in mind that your numbers may vary since the data used in this walk-through were exported from iNaturalist in July 2021.

First six rows of df “iris”

Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa

summary(iris)

Below is the summary output for data frame “iris”:

  summary(iris)
  ##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
  ##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
  ##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
  ##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
  ##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
  ##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
  ##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
  ##        Species  
  ##  setosa    :50  
  ##  versicolor:50  
  ##  virginica :50  
  ##                 
  ##                 
  ## 

summary(BOTH_diversity)

Below is the summary output for data frame “BOTH_diversity”:

  summary(BOTH_diversity)
  ##  Habitat       iconic_taxon_name Total_Abundance_per_Taxon       S         
  ##  BBC:11   Actinopterygii: 2      Min.   :  4.00            Min.   :  2.00  
  ##  MMC:11   Amphibia      : 2      1st Qu.: 20.75            1st Qu.:  7.75  
  ##           Animalia      : 2      Median : 71.50            Median : 12.50  
  ##           Arachnida     : 2      Mean   :142.91            Mean   : 38.95  
  ##           Aves          : 2      3rd Qu.:169.25            3rd Qu.: 39.25  
  ##           Fungi         : 2      Max.   :667.00            Max.   :191.00  
  ##           (Other)       :10                                                
  ##        H               Hmax              E               ENS         
  ##  Min.   :0.5623   Min.   :0.6931   Min.   :0.4686   Min.   :  1.755  
  ##  1st Qu.:1.6094   1st Qu.:2.0351   1st Qu.:0.7997   1st Qu.:  5.039  
  ##  Median :2.2454   Median :2.5249   Median :0.8327   Median :  9.445  
  ##  Mean   :2.4335   Mean   :2.8985   Mean   :0.8305   Mean   : 21.230  
  ##  3rd Qu.:3.0067   3rd Qu.:3.6694   3rd Qu.:0.8941   3rd Qu.: 20.225  
  ##  Max.   :4.6527   Max.   :5.2523   Max.   :0.9615   Max.   :104.871  
  ## 

summary(BOTH_spp_abundance)

Below is the summary output for data frame “BOTH_spp_abundance”:

  summary(BOTH_spp_abundance)
  ##  iconic_taxon_name               common_name  Species_Abundance Habitat  
  ##  Plantae :124      American beautyberry:  2   Min.   : 1.000    BBC:133  
  ##  Insecta : 66      American Coot       :  2   1st Qu.: 1.000    MMC:133  
  ##  Aves    : 28      Animals             :  2   Median : 2.000             
  ##  Reptilia: 14      Anoles              :  2   Mean   : 5.459             
  ##  Fungi   : 10      Asian Lady Beetle   :  2   3rd Qu.: 5.000             
  ##  Amphibia:  6      Atala               :  2   Max.   :88.000             
  ##  (Other) : 18      (Other)             :254

summary(BOTH_timedata)

Below is the summary output for data frame “BOTH_timedata”:

  summary(BOTH_timedata)
  ##  Habitat         iconic_taxon_name                common_name        year     
  ##  BBC:1605   Plantae       :1268    unspecified          : 244   2021   :1707  
  ##  MMC:1500   Insecta       : 609    Brown Anole          : 103   2018   : 525  
  ##             Aves          : 322    dicots               :  95   2019   : 326  
  ##             Reptilia      : 297    Eastern Gray Squirrel:  91   2017   : 316  
  ##             Actinopterygii: 159    Green Iguana         :  84   2020   : 160  
  ##             Mammalia      : 130    red mangrove         :  57   2016   :  47  
  ##             (Other)       : 320    (Other)              :2431   (Other):  24  
  ##      month          hour        local_time                 
  ##  4      :841   13     : 480   Min.   :2010-05-28 21:06:00  
  ##  5      :503   14     : 396   1st Qu.:2018-07-02 16:50:00  
  ##  10     :484   10     : 366   Median :2021-02-27 14:13:00  
  ##  2      :254   15     : 246   Mean   :2020-03-11 22:34:42  
  ##  9      :222   12     : 235   3rd Qu.:2021-05-26 15:16:00  
  ##  11     :171   11     : 234   Max.   :2022-01-07 11:55:00  
  ##  (Other):630   (Other):1148

sapply()

sapply() is another function that you can use for your data frame to observe the mean, standard deviation (sd), variance (var), min, max, median, range, and quantiles of the data frame.


However, with this function, you must specify which descriptive statistic you want R to calculate. See the example below from the iris data frame. The descriptive statistic functions do not work with non-numeric data, and therefore we have to specifically tell R to exclude those columns from the calculations; otherwise, depending on the function, R will either return NA with a warning or stop with an error message. Additionally, sapply() is meant to work across multiple columns. If you have only one column of numerical data, apply the descriptive statistic function directly to that column, for example, mean(iris$Sepal.Length) or mean(iris[,1]). Selecting a single column with iris[,1] returns a plain vector, so sapply() would apply the function to each individual value rather than to the column as a whole.
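
As an alternative to excluding columns by position, the numeric columns can be picked out automatically. The sketch below uses only base R and the built-in iris data; the object names (`num_cols`, `col_means`, `single_mean`, `elementwise`) are made up for this illustration and are not part of the course script.

```r
# Ask R which columns are numeric (TRUE/FALSE for each column)
num_cols <- sapply(iris, is.numeric)
print(num_cols)

# Apply a descriptive statistic to only the numeric columns
col_means <- sapply(iris[, num_cols], mean, na.rm = TRUE)
print(col_means)

# For a single column, call the function directly
single_mean <- mean(iris$Sepal.Length, na.rm = TRUE)
print(single_mean)

# sapply() on a single column selected with [ , 1] works on a plain vector,
# so the function is applied to every value separately
elementwise <- sapply(iris[, 1], mean)
print(length(elementwise))
```

The last line prints the number of rows in iris rather than 1, which shows why sapply() is not the right tool for a single column.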


[ , ] after the data frame name signifies to R that you are requesting a subset of that data frame. Values entered before the comma represent rows; values entered after the comma represent columns. Numbers are used to indicate which rows or columns you are requesting. The top row of a data frame (excluding column names) is row 1, and the left-most column is column 1. Positive numbers mean that you want those rows or columns to be included, while negative numbers mean those rows or columns should be excluded. If the space before or after the comma is left blank, all rows or all columns are kept. In the example below, we want to exclude column 5, the species name column, since it is not numeric data.


To run this code with your data, you will need to specify which columns R should exclude. To exclude multiple columns, use the format df[,-c(#:#)], where df is the name of your data frame, c() combines the column numbers into one list, “#:#” gives a range of columns from “#” to “#”, and the minus sign tells R to exclude them. For example, if you wanted to exclude columns 1, 3, and 5-7 of a data frame “df”, it would be df[,-c(1,3,5:7)]. You could also write it in terms of which columns you wish to keep. If our make-believe data frame, df, had 8 columns, this could also be written df[,c(2,4,8)].
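
The two ways of writing the subset can be checked against each other. The 8-column data frame `df` below is a made-up stand-in built from the numbers 1 to 16, so its contents have no meaning beyond the illustration.

```r
# Make-believe data frame with 2 rows and 8 columns (named V1 to V8 by default)
df <- as.data.frame(matrix(1:16, nrow = 2, ncol = 8))

# Exclude columns 1, 3, and 5 through 7
dropped <- df[, -c(1, 3, 5:7)]

# Keep columns 2, 4, and 8
kept <- df[, c(2, 4, 8)]

print(names(dropped))
print(identical(dropped, kept))
```

Both versions return the same three columns, so you can use whichever is shorter to type for your data frame.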

After you have attempted to run your script, check the tabs to see how your outputs compare.

sapply(iris)

Below is the summary output for data frame “iris”:

  sapply(iris[,-5], mean, na.rm=TRUE)
  sapply(iris[,-5], sd, na.rm=TRUE)
  sapply(iris[,-5], var, na.rm=TRUE)
  sapply(iris[,-5], min, na.rm=TRUE)
  sapply(iris[,-5], max, na.rm=TRUE)
  sapply(iris[,-5], median, na.rm=TRUE)
  sapply(iris[,-5], range, na.rm=TRUE)
  sapply(iris[,-5], quantile, na.rm=TRUE)
[ Click to Expand Output for iris ]
  sapply(iris[,-5], mean, na.rm=TRUE)
  ## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
  ##     5.843333     3.057333     3.758000     1.199333
  sapply(iris[,-5], sd, na.rm=TRUE)
  ## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
  ##    0.8280661    0.4358663    1.7652982    0.7622377
  sapply(iris[,-5], var, na.rm=TRUE)
  ## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
  ##    0.6856935    0.1899794    3.1162779    0.5810063
  sapply(iris[,-5], min, na.rm=TRUE)
  ## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
  ##          4.3          2.0          1.0          0.1
  sapply(iris[,-5], max, na.rm=TRUE)
  ## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
  ##          7.9          4.4          6.9          2.5
  sapply(iris[,-5], median, na.rm=TRUE)
  ## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
  ##         5.80         3.00         4.35         1.30
  sapply(iris[,-5], range, na.rm=TRUE)
  ##      Sepal.Length Sepal.Width Petal.Length Petal.Width
  ## [1,]          4.3         2.0          1.0         0.1
  ## [2,]          7.9         4.4          6.9         2.5
  sapply(iris[,-5], quantile, na.rm=TRUE)
  ##      Sepal.Length Sepal.Width Petal.Length Petal.Width
  ## 0%            4.3         2.0         1.00         0.1
  ## 25%           5.1         2.8         1.60         0.3
  ## 50%           5.8         3.0         4.35         1.3
  ## 75%           6.4         3.3         5.10         1.8
  ## 100%          7.9         4.4         6.90         2.5

sapply(BOTH_diversity)

Below is the summary output for data frame “BOTH_diversity”:
For this data frame, columns 1 and 2 need to be excluded.

  sapply(BOTH_diversity[,-c(1:2)], mean, na.rm=TRUE)
  sapply(BOTH_diversity[,-c(1:2)], sd, na.rm=TRUE)
  sapply(BOTH_diversity[,-c(1:2)], var, na.rm=TRUE)
  sapply(BOTH_diversity[,-c(1:2)], min, na.rm=TRUE)
  sapply(BOTH_diversity[,-c(1:2)], max, na.rm=TRUE)
  sapply(BOTH_diversity[,-c(1:2)], range, na.rm=TRUE)
  sapply(BOTH_diversity[,-c(1:2)], quantile, na.rm=TRUE)
[ Click to Expand Output for BOTH_diversity ]
  sapply(BOTH_diversity[,-c(1:2)], mean, na.rm=TRUE)
  ## Total_Abundance_per_Taxon                         S                         H 
  ##                142.909091                 38.954545                  2.433460 
  ##                      Hmax                         E                       ENS 
  ##                  2.898452                  0.830467                 21.230337
  sapply(BOTH_diversity[,-c(1:2)], sd, na.rm=TRUE)
  ## Total_Abundance_per_Taxon                         S                         H 
  ##               189.4784594                51.7930656                 1.1466292 
  ##                      Hmax                         E                       ENS 
  ##                 1.2754362                 0.1065721                25.9064995
  sapply(BOTH_diversity[,-c(1:2)], var, na.rm=TRUE)
  ## Total_Abundance_per_Taxon                         S                         H 
  ##              3.590209e+04              2.682522e+03              1.314759e+00 
  ##                      Hmax                         E                       ENS 
  ##              1.626737e+00              1.135761e-02              6.711467e+02
  sapply(BOTH_diversity[,-c(1:2)], min, na.rm=TRUE)
  ## Total_Abundance_per_Taxon                         S                         H 
  ##                 4.0000000                 2.0000000                 0.5623351 
  ##                      Hmax                         E                       ENS 
  ##                 0.6931472                 0.4685886                 1.7547654
  sapply(BOTH_diversity[,-c(1:2)], max, na.rm=TRUE)
  ## Total_Abundance_per_Taxon                         S                         H 
  ##               667.0000000               191.0000000                 4.6527333 
  ##                      Hmax                         E                       ENS 
  ##                 5.2522734                 0.9615338               104.8712430
  sapply(BOTH_diversity[,-c(1:2)], range, na.rm=TRUE)
  ##      Total_Abundance_per_Taxon   S         H      Hmax         E        ENS
  ## [1,]                         4   2 0.5623351 0.6931472 0.4685886   1.754765
  ## [2,]                       667 191 4.6527333 5.2522734 0.9615338 104.871243
  sapply(BOTH_diversity[,-c(1:2)], quantile, na.rm=TRUE)
  ##      Total_Abundance_per_Taxon      S         H      Hmax         E        ENS
  ## 0%                        4.00   2.00 0.5623351 0.6931472 0.4685886   1.754765
  ## 25%                      20.75   7.75 1.6094040 2.0350789 0.7996905   5.038657
  ## 50%                      71.50  12.50 2.2454319 2.5249280 0.8327292   9.445303
  ## 75%                     169.25  39.25 3.0066713 3.6693891 0.8941188  20.225355
  ## 100%                    667.00 191.00 4.6527333 5.2522734 0.9615338 104.871243

sapply(BOTH_spp_abundance)

Below is the summary output for data frame “BOTH_spp_abundance”:

Since BOTH_spp_abundance only has one numeric column, we must adjust how we write the command, since “apply” functions are made to work on multiple columns. Instead, we can simply use the functions mean(), sd(), var(), etc. directly:

  mean(BOTH_spp_abundance[,3])
  sd(BOTH_spp_abundance[,3])
  var(BOTH_spp_abundance[,3])
  min(BOTH_spp_abundance[,3])
  max(BOTH_spp_abundance[,3])
  median(BOTH_spp_abundance[,3])
  range(BOTH_spp_abundance[,3])
  quantile(BOTH_spp_abundance[,3])
[ Click to Expand Output for BOTH_spp_abundance ]
  mean(BOTH_spp_abundance[,3])
  ## [1] 5.458647
  sd(BOTH_spp_abundance[,3])
  ## [1] 10.543
  var(BOTH_spp_abundance[,3])
  ## [1] 111.1549
  min(BOTH_spp_abundance[,3])
  ## [1] 1
  max(BOTH_spp_abundance[,3])
  ## [1] 88
  median(BOTH_spp_abundance[,3])
  ## [1] 2
  range(BOTH_spp_abundance[,3])
  ## [1]  1 88
  quantile(BOTH_spp_abundance[,3])
  ##   0%  25%  50%  75% 100% 
  ##    1    1    2    5   88

sapply(BOTH_timedata)

Below is the summary output for data frame “BOTH_timedata”:

Although BOTH_timedata has numerical data, they are stored in our data frame as “factors” or “POSIXct”(time). You can check the class of a column in a data frame with class(df$), where df is the name of the data frame and you would type the name of the column following the $, for example: class(BOTH_timedata$year).

If we wanted to, we could change the class of the columns “year”, “month”, and “hour” to a numerical class by using as.numeric() or as.integer(). We could use the sapply function to change the class of all 3 of the columns at once with sapply(BOTH_timedata[,c(4:6)], as.integer) or for an individual column with as.integer(BOTH_timedata$) with the name of the selected column entered after the $.

So we don’t mess up our data for later applications, let’s run the sapply function, but we will tell it to store the changes in a new object called “test_timedata”. This will create a matrix with only 3 columns: year, month, and hour. To use sapply() for the descriptive statistics, we will first need to turn “test_timedata” back into a data frame with as.data.frame(). Note: if we wanted to change the class of columns in our data frame and keep them in the same data frame without creating a new matrix, we would have to assign the result back to the same columns, for example: BOTH_timedata[,c(4:6)]<-sapply(BOTH_timedata[,c(4:6)], as.integer).

Caution: calling as.integer() directly on a factor returns each value’s position among the factor levels, not the original number. This is why the output below shows year values such as 3 rather than 2015, and hours that run from 1 to 22 rather than 0 to 23. To recover the original numbers, convert to character first, for example as.integer(as.character(BOTH_timedata$year)).
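
One thing to watch for when converting factors: as.integer() on a factor returns the internal level codes, not the numbers that the labels spell out. The short sketch below shows the difference on a made-up vector of hours; `hour_f`, `toy`, and `toy_int` are hypothetical names used only here.

```r
# A made-up factor of observation hours
hour_f <- as.factor(c(15, 11, 17, 7))
print(levels(hour_f))

# Direct conversion gives the position of each value among the levels
codes <- as.integer(hour_f)
print(codes)

# Converting to character first recovers the original numbers
values <- as.integer(as.character(hour_f))
print(values)

# The same idea applied to several factor columns of a data frame
toy <- data.frame(year = as.factor(c(2015, 2021)),
                  hour = as.factor(c(15, 7)))
toy_int <- as.data.frame(lapply(toy, function(x) as.integer(as.character(x))))
print(toy_int)
```

If the actual years, months, and hours are needed for a calculation such as a mean, the character-first conversion is the safer choice.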

[ Click for Summary of How to Change Columns to Class Integer ]
  class(BOTH_timedata$year)
  ## [1] "factor"
  test_timedata<-sapply(BOTH_timedata[,c(4:6)], as.integer)
  head(test_timedata)
  ##      year month hour
  ## [1,]    3     5   14
  ## [2,]    3     2   10
  ## [3,]    3     6   16
  ## [4,]    3     9    6
  ## [5,]    3     9   13
  ## [6,]    3     9    8
  class(test_timedata)
  ## [1] "matrix" "array"
  test_timedata<-as.data.frame(test_timedata)
  class(test_timedata)
  ## [1] "data.frame"
  class(test_timedata$year)
  ## [1] "integer"
  class(test_timedata$month)
  ## [1] "integer"
  class(test_timedata$hour)
  ## [1] "integer"
  sapply(test_timedata, mean, na.rm=TRUE)
  sapply(test_timedata, sd, na.rm=TRUE)
  sapply(test_timedata, var, na.rm=TRUE)
  sapply(test_timedata, min, na.rm=TRUE)
  sapply(test_timedata, max, na.rm=TRUE)
  sapply(test_timedata, median, na.rm=TRUE)
  sapply(test_timedata, range, na.rm=TRUE)
  sapply(test_timedata, quantile, na.rm=TRUE)
[ Click to Expand Output for test_timedata ]
  sapply(test_timedata, mean, na.rm=TRUE)
  ##      year     month      hour 
  ##  7.720451  6.121095 12.346860
  sapply(test_timedata, sd, na.rm=TRUE)
  ##     year    month     hour 
  ## 1.601923 2.993252 3.275546
  sapply(test_timedata, var, na.rm=TRUE)
  ##      year     month      hour 
  ##  2.566157  8.959558 10.729198
  sapply(test_timedata, min, na.rm=TRUE)
  ##  year month  hour 
  ##     1     1     1
  sapply(test_timedata, max, na.rm=TRUE)
  ##  year month  hour 
  ##    10    12    22
  sapply(test_timedata, median, na.rm=TRUE)
  ##  year month  hour 
  ##     9     5    12
  sapply(test_timedata, range, na.rm=TRUE)
  ##      year month hour
  ## [1,]    1     1    1
  ## [2,]   10    12   22
  sapply(test_timedata, quantile, na.rm=TRUE)
  ##      year month hour
  ## 0%      1     1    1
  ## 25%     6     4   10
  ## 50%     9     5   12
  ## 75%     9     9   15
  ## 100%   10    12   22





Visualizing Data:

Bar Plots

Bar plots are figures that show the relationship between discrete numerical data and categorical data.

ToothGrowth

Below is an example from the data frame “ToothGrowth”, which looks at the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

ToothGrowth Example Code

First six rows of df “ToothGrowth”:

len supp dose
4.2 VC 0.5
11.5 VC 0.5
7.3 VC 0.5
5.8 VC 0.5
6.4 VC 0.5
10.0 VC 0.5



First, we make a copy of the ToothGrowth data frame called “TG” and change “dose” from a numeric value to a factor using as.factor().


Then we plug our variables into the graphic function ggplot().

     ggplot(df, aes(x=IV, y=DV, 
                                   fill=IV))+
       geom_bar(stat= "identity",
               position=position_dodge()) +
                       xlab(" ") +
             ylab(" ") +
             ggtitle(" ")



df = data frame
x = your IV
y = your DV
fill = which column you would like the color of the bars to be based off of.
xlab = x-axis label
ylab = y-axis label
ggtitle = main plot title

    #First make duplicate of ToothGrowth df, then convert dose to factor
      TG<- ToothGrowth
    TG$dose<-as.factor(TG$dose)
      ggplot(TG, aes(x=dose, y=len, 
                                    fill=supp))+
        geom_bar(stat= "identity",
                position=position_dodge()) +
                        xlab("Dose (mg)") +
              ylab("Tooth length") +
              ggtitle("The Effect of Vitamin C on Tooth Growth in Guinea Pigs") 
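One thing to be aware of: ToothGrowth has several animals for every dose and delivery method, so with stat = "identity" the bars for those animals are drawn on top of one another and only the tallest is visible. If you would rather plot one bar per group, one possible approach (an assumption on our part, not a step in the lab code) is to average the data first. The sketch below uses base R's aggregate(); "TG_means" is a name chosen for this illustration.

```r
# Average tooth length for every delivery method x dose combination
TG_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Treat dose as a category, as in the example above
TG_means$dose <- as.factor(TG_means$dose)

TG_means   # six rows: one mean per group
```

TG_means could then be used in place of TG in the same ggplot() call, with y = len now representing the group mean.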
Tooth Growth Example Plot

BOTH_diversity


df = data frame
x = your IV
y = your DV
fill = which column you would like the color of the bars to be based off of.
xlab = x-axis label
ylab = y-axis label
ggtitle = main plot title
facet_wrap(~ ) Breaks the plot into smaller sections based on an additional IV

      bar_df<- ggplot(df, aes(x=IV, y=DV, 
                                    fill=IV))+
        geom_bar(stat= "identity",
                position=position_dodge()) +
                        xlab(" ") +
              ylab(" ") +
              ggtitle(" ")+
              facet_wrap(~ IV)
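To make the role of facet_wrap() concrete, here is a minimal sketch using a small made-up data frame. The values in "toy_diversity" are invented for illustration only and are not real observations; the column names simply mirror the ones used in this lab.

```r
library(ggplot2)

# Hypothetical toy data: species richness (S) for two habitats and two taxon groups
toy_diversity <- data.frame(
  Habitat           = c("BBC", "MMC", "BBC", "MMC"),
  iconic_taxon_name = c("Aves", "Aves", "Plantae", "Plantae"),
  S                 = c(12, 9, 20, 25)
)

# One panel per taxon group; within each panel, one bar per habitat
bar_toy <- ggplot(toy_diversity, aes(x = Habitat, y = S, fill = Habitat)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  xlab("Habitat") +
  ylab("Species richness") +
  ggtitle("Toy example of facet_wrap()") +
  facet_wrap(~ iconic_taxon_name)

bar_toy
```

Removing the facet_wrap() line would instead draw every taxon group in a single panel, which makes the habitats harder to compare group by group.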
BOTH_diversity Code

Example Barplot Code for BOTH_diversity

df = BOTH_diversity
x = Habitat
y = S
fill = Habitat
xlab = Habitat
ylab = Species Richness per Iconic Taxon Group
ggtitle = The Effect of Habitat on Species Richness per Iconic Taxon Group
facet_wrap(~ ) -> facet_wrap(~ iconic_taxon_name)

     bar_BOTH_diversity<-ggplot(BOTH_diversity, aes(x=Habitat, y=S, 
                                   fill=Habitat))+
       geom_bar(stat= "identity",
               position=position_dodge()) +
                       xlab("Habitat") +
             ylab("Species Richness per Iconic Taxon Group") +
             ggtitle("The Effect of Habitat on the Species Richness per Iconic Taxon Group")+
       facet_wrap(~ iconic_taxon_name)
BOTH_diversity Plot

Example Bar plot for BOTH_diversity

df = BOTH_diversity
x = Habitat
y = S
fill = Habitat
xlab = Habitat
ylab = Species Richness per Iconic Taxon Group
ggtitle = The Effect of Habitat on Species Richness per Iconic Taxon Group
facet_wrap(~) -> facet_wrap(~ iconic_taxon_name)

      bar_BOTH_diversity<-ggplot(BOTH_diversity, aes(x=Habitat, y=S, 
                                    fill=Habitat))+
        geom_bar(stat= "identity",
                position=position_dodge()) +
                        xlab("Habitat") +
              ylab("Species Richness per Iconic Taxon Group") +
              ggtitle("The Effect of Habitat on the Species Richness per Iconic Taxon Group")+
  facet_wrap(~ iconic_taxon_name)
  
  bar_BOTH_diversity  

  ggsave("bar_BOTH_diversity.png", 
             plot = bar_BOTH_diversity, 
             width = 30, 
             height = 20, 
             units = "cm")

BOTH_spp_abundance


df = data frame
x = your IV
y = your DV
fill = which column you would like the color of the bars to be based off of.
xlab = x-axis label
ylab = y-axis label
ggtitle = main plot title
facet_wrap(~ ) Breaks the plot into smaller sections based on an additional IV

      bar_df<- ggplot(df, aes(x=IV, y=DV, 
                                    fill=IV))+
        geom_bar(stat= "identity",
                position=position_dodge()) +
                        xlab(" ") +
              ylab(" ") +
              ggtitle(" ")+
              facet_wrap(~ IV)
BOTH_spp_abundance Code

Example Barplot Code for BOTH_spp_abundance

df = BOTH_spp_abundance
x = Habitat
y = Species_Abundance
fill = Habitat
xlab = Habitat
ylab = Number of Observations per Common Name
ggtitle = The Effect of Habitat on Species Richness per Iconic Taxon Group
facet_wrap(~ ) -> facet_wrap(~ iconic_taxon_name)

     bar_BOTH_spp_abundance<-ggplot(BOTH_spp_abundance, aes(x=Habitat, y=Species_Abundance, 
                                   fill=Habitat))+
       geom_bar(stat= "identity",
               position=position_dodge()) +
                       xlab("Habitat") +
             ylab("Number of Observations per Common Name") +
             ggtitle("The Effect of Habitat on the Species Richness per Iconic Taxon Group")+
       facet_wrap(~ iconic_taxon_name)
BOTH_spp_abundance Plot

Example Barplot for BOTH_spp_abundance

df = BOTH_spp_abundance
x = Habitat
y = Species_Abundance
fill = Habitat
xlab = Habitat
ylab = Number of Observations per Common Name
ggtitle = The Effect of Habitat on Species Richness per Iconic Taxon Group
facet_wrap(~) -> facet_wrap(~ iconic_taxon_name)

      bar_BOTH_spp_abundance<-ggplot(BOTH_spp_abundance, aes(x=Habitat, y=Species_Abundance, 
                                    fill=Habitat))+
        geom_bar(stat= "identity",
                position=position_dodge()) +
                        xlab("Habitat") +
              ylab("Number of Observations per Common Name") +
              ggtitle("The Effect of Habitat on the Species Richness per Iconic Taxon Group")+
  facet_wrap(~ iconic_taxon_name)
  
  bar_BOTH_spp_abundance  

  ggsave("bar_BOTH_spp_abundance.png", 
             plot = bar_BOTH_spp_abundance, 
             width = 30, 
             height = 20, 
             units = "cm")

BOTH_timeset



Before running code in this section, you need to first choose which time variable you would like to observe: year, month, hour.

year:
If you want to see how the species richness per habitat varies by year, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_year”

  BOTH_year <- BOTH_timedata %>%
    group_by(Habitat,year) %>%
    summarise(S= length(unique(common_name)))
  BOTH_year<-group_by(BOTH_year, year) %>%
    filter(n() != 1) %>%
    arrange(Habitat,year) %>%
    ungroup() 


month:
If you want to see how the species richness per habitat varies by month, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_month”

  BOTH_month <- BOTH_timedata %>%
    group_by(Habitat,month) %>%
    summarise(S= length(unique(common_name)))
  BOTH_month<-group_by(BOTH_month, month) %>%
    filter(n() != 1) %>%
    arrange(Habitat,month) %>%
    ungroup()



hour:
If you want to see how the species richness per habitat varies by hour, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_hour”

  BOTH_hour <- BOTH_timedata %>%
    group_by(Habitat,hour) %>%
    summarise(S= length(unique(common_name)))
  BOTH_hour<-group_by(BOTH_hour, hour) %>%
    filter(n() != 1) %>%
    arrange(Habitat,hour) %>%
    ungroup()
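To see what these filters do, the sketch below runs the same steps on a tiny made-up data frame. "toy_timedata" and its values are hypothetical stand-ins for BOTH_timedata, and the .groups = "drop" argument is added here only to silence the grouping message from summarise().

```r
library(dplyr)

# Hypothetical toy data standing in for BOTH_timedata
toy_timedata <- data.frame(
  Habitat     = c("BBC", "BBC", "BBC", "MMC", "MMC", "BBC"),
  year        = c(2019, 2019, 2020, 2019, 2019, 2021),
  common_name = c("Osprey", "Osprey", "Green Iguana",
                  "Osprey", "Muscovy Duck", "Brown Anole")
)

toy_year <- toy_timedata %>%
  group_by(Habitat, year) %>%
  # S = number of different common names seen in each habitat in each year
  summarise(S = length(unique(common_name)), .groups = "drop") %>%
  group_by(year) %>%
  # keep only the years that were recorded in more than one habitat
  filter(n() != 1) %>%
  arrange(Habitat, year) %>%
  ungroup()

toy_year
```

In this toy example only 2019 was observed at both habitats, so the years 2020 and 2021 are dropped and the result has one row per habitat for 2019. The same logic applies when year is swapped for month or hour.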





df = data frame
x = your IV
y = your DV
fill = which column you would like the color of the bars to be based off of.
xlab = x-axis label
ylab = y-axis label
ggtitle = main plot title
facet_wrap(~ ) Breaks the plot into smaller sections based on an additional IV

      bar_df<- ggplot(df, aes(x=IV, y=DV, 
                                    fill=IV))+
        geom_bar(stat= "identity",
                position=position_dodge()) +
                        xlab(" ") +
              ylab(" ") +
              ggtitle(" ")+
              facet_wrap(~ IV)


BOTH_timedata Code



Example Plot :

df = the data frame you chose
     replace “bar_df” with bar_“name of your data frame”
     e.g. BOTH_year, BOTH_month, BOTH_hour → bar_BOTH_year, bar_BOTH_month, bar_BOTH_hour
IV = the column of your IV [this will be Habitat]
IV-time = the unit of time you chose (year, month, hour)
DV = the column of your DV [this will be S]
fill = Which column you want to designate the bar colors
ggtitle = The title you assign the main plot
xlab = Text that you want as your x-axis label
ylab = Text that you want as your y-axis label

ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
     you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height, here we chose centimeters, “cm”
NOTE: Make sure that if the code has " " around text, that you keep those there! The code won’t run without them.

    bar_df<-ggplot(df, 
              aes(x=IV, 
                  y=DV, 
                  fill=IV))+
              geom_bar(stat= "identity",
                      position=position_dodge()) + 
                      xlab("IV axis title") + 
                      ylab("DV axis title") +
                    facet_wrap(~ IV-time) +
            ggtitle("Title of your figure")
       bar_df
       ggsave("bar_df.png", 
             plot = bar_df, 
             width = 10, 
             height = 15, 
             units = "cm")
BOTH_timedata Plot

Output for bar_BOTH_year

      BOTH_year <- BOTH_timedata %>%
              group_by(Habitat,year) %>%
              summarise(S= length(unique(common_name)))
      BOTH_year<-group_by(BOTH_year, year) %>%
              filter(n() != 1) %>%
              arrange(Habitat,year) %>%
              ungroup()
  
    #df<- BOTH_year
    #IV<- Habitat
    #IV-time <- year
    #DV<- S
    #ggtitle <- Difference in species richness observed across years
    #xlab = "Habitat"
    #ylab = "Species richness observed each year"
    #fill = "Habitat"
  
    bar_BOTH_year<-ggplot(BOTH_year, 
              aes(x=Habitat, 
                  y=S, 
                  fill=Habitat))+
              geom_bar(stat= "identity",
                      position=position_dodge()) + 
                      xlab("Habitat") + 
                      ylab("Species Richness") +
                    facet_wrap(~ year)+
            #ggtitle("Difference in species richness observed across years")
          labs(title="Difference in species richness\n observed across years")+
      theme(plot.title = element_text(lineheight = 0.9))
       bar_BOTH_year

       ggsave("bar_BOTH_year.png", 
             plot = bar_BOTH_year, 
             width = 10, 
             height = 15, 
             units = "cm")
       
       ## Here, the title was too long, so I split it up with "\n" ,
       ## Used labs(title=" ") instead of our ggtitle line of code,
       ## and adjusted the space between lines by specifying the
       ## argument "lineheight" with the theme function element_text().





Output for bar_BOTH_month

        BOTH_month <- BOTH_timedata %>%
                group_by(Habitat,month) %>%
                summarise(S= length(unique(common_name)))
        BOTH_month<-group_by(BOTH_month, month) %>%
                filter(n() != 1) %>%
                arrange(Habitat,month) %>%
                ungroup()
  
    #df<- BOTH_month
    #IV<- Habitat
    #IV-time <- month
    #DV<- S
    #ggtitle <- Difference in species richness observed across months
    #xlab = "Habitat"
    #ylab = "Species richness observed each month"
    #fill = "Habitat"
  
    bar_BOTH_month<-ggplot(BOTH_month, 
              aes(x=Habitat, 
                  y=S, 
                  fill=Habitat))+
              geom_bar(stat= "identity",
                      position=position_dodge()) + 
                      xlab("Habitat") + 
                      ylab("Species Richness") +
                    facet_wrap(~ month)+
            ggtitle("Difference in species richness observed across months")
     
       bar_BOTH_month

       ggsave("bar_BOTH_month.png", 
             plot = bar_BOTH_month, 
             width = 15, 
             height = 15, 
             units = "cm")





Output for bar_BOTH_hour

        BOTH_hour <- BOTH_timedata %>%
                group_by(Habitat,hour) %>%
                summarise(S= length(unique(common_name)))
        BOTH_hour<-group_by(BOTH_hour, hour) %>%
                filter(n() != 1) %>%
                arrange(Habitat,hour) %>%
                ungroup()
  
    #df<- BOTH_hour
    #IV<- Habitat
    #IV-time <- hour
    #DV<- S
    #ggtitle <- Difference in species richness observed across different times of day
    #xlab = "Habitat"
    #ylab = "Species richness"
    #fill = "Habitat"
  
    bar_BOTH_hour<-ggplot(BOTH_hour, 
              aes(x=Habitat, 
                  y=S, 
                  fill=Habitat))+
              geom_bar(stat= "identity",
                      position=position_dodge()) + 
                      xlab("Habitat") + 
                      ylab("Species Richness") +
                    facet_wrap(~ hour)+
            #ggtitle("Difference in species richness observed across hours")
          labs(title="Difference in species richness observed across \n different times of day")+
      theme(plot.title = element_text(lineheight = 0.9))
       bar_BOTH_hour

       ggsave("bar_BOTH_hour.png", 
             plot = bar_BOTH_hour, 
             width = 15, 
             height = 15, 
             units = "cm")
       
       ## Here, the title was too long, so I split it up with "\n" ,
       ## Used labs(title=" ") instead of our ggtitle line of code,
       ## and adjusted the space between lines by specifying the
       ## argument "lineheight" with the theme function element_text().



Cleveland’s Dot Plot

A Cleveland dot plot, also known as a lollipop plot, is a style of bar plot that shows the relationship between numeric and categorical data. It is useful when a standard bar plot would contain many bars of similar height: replacing each bar with a dot and a thin segment reduces clutter and makes small differences easier to see.
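For a first look at this style of chart, base R includes a dotchart() function that needs no extra packages. The sketch below is only a quick illustration using R's built-in mtcars data (described below); it is not the ggdotchart() code used in the rest of this section, and "cars_sorted" is a name chosen for this example.

```r
# Sort cars from least to most fuel efficient so the dots line up in order
cars_sorted <- mtcars[order(mtcars$mpg), ]

dotchart(cars_sorted$mpg,
         labels = rownames(cars_sorted),   # one label per car
         pch    = 19,                      # solid dots
         cex    = 0.7,                     # shrink text so all labels fit
         xlab   = "Miles per (US) gallon",
         main   = "Fuel efficiency of the cars in mtcars")
```

With one dot per car, small differences in mpg are easier to read than they would be across the same number of bars.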

mtcars Example

Motor Trend Car Road Tests:
This is another example data frame within R. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).


A data frame with 32 observations on 11 (numeric) variables:

[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
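If you would like to check this description yourself, the short base-R sketch below inspects the built-in data frame. Note in particular that the car names are stored as row names rather than as a column, which matters when plotting.

```r
dim(mtcars)                       # number of rows (cars) and columns (variables)
all(sapply(mtcars, is.numeric))   # TRUE if every column is stored as a number
"name" %in% names(mtcars)         # FALSE: car names are row names, not a column
head(rownames(mtcars))            # the first six car names
```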


First six rows of df “mtcars”:

mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars Code
  ggdotchart(df, x = " ", y = " ",
             xlab = " ",
             ylab = " ",
             title= "",
             legend.title = " ",
             color = " ",        
             palette = c(" "),  
             sorting = " ",    # Sort value in de-/ascending order
             add = "segments",         # Add segments from y = 0 to dot
             rotate = FALSE,          # Rotate vertically
             group = " ",       # Order by groups instead of sorting above
             facet.by = " ",
             point.size = 20,
             position= position_dodge(0),
             dot.size = 8,            # Large dot size
             label = round(df$ ),  # Add values as dot labels
             font.label = list(color = "black", # Adjust label parameters
                               size = 10,
                               vjust = 0.5,
                               position = position_dodge(0.9)),
             ggtheme = theme_pubr() )   # ggplot2 theme

df = mtcars
x = name
y = mpg
xlab = Vehicle Make and Model
ylab = mpg
title = Comparison of fuel efficiency across various vehicles
legend.title= Cylinders
color = cyl
rotate = TRUE
group = cyl
facet.by = #not faceted

ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
     you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height, here we chose centimeters, “cm”
NOTE: Make sure that if the code has " " around text, that you keep those there! The code won’t run without them.

        # First make duplicate of the mtcars df, then convert "cyl" to a factor
                      cars<- mtcars
                      cars$cyl<-as.factor(cars$cyl)
        # Add a column with the row names of mtcars (or our new df, cars)
                      cars$name<-rownames(cars)
        # Now we're ready!
    
    dot_cars<- ggdotchart(cars, x = "name", y = "mpg",
             xlab = "Vehicle Make and Model",
             ylab = "mpg",
             #title= "Comparison of fuel efficiency across various vehicles",
             legend.title = "Cylinders",
             color = "cyl",        # Color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette 
             sorting = "descending",    # Sort value in descending order
             add = "segments",         # Add segments from y = 0 to dot
             rotate = TRUE,          # Rotate vertically
             group = "cyl",       # Order by groups
            #facet.by = " ",
             point.size = 20,
             position= position_dodge(0),
             dot.size = 6,            # Large dot size
             label = round(cars$mpg),  # Add values as dot labels
             font.label = list(color = "black", # Adjust label parameters
                               size = 9,
                               vjust = 0.5,
                               position = position_dodge(0.9)),
             ggtheme = theme_pubr() ) +  # ggplot2 theme
             labs(title="Comparison of fuel efficiency \n across various vehicles")+
             theme(plot.title = element_text(lineheight = 0.7))
     dot_cars
       ggsave("dot_cars.png", 
             plot = dot_cars, 
             width = 15, 
             height = 25, 
             units = "cm")
mtcars Plot

Output for ggdotchart(cars):

BOTH_diversity



df = data frame
x = your IV
y = your DV
xlab = x-axis label
ylab = y-axis label
title = main plot title
legend.title= title of the legend
color = which column you would like the color of the dots to be based on (IV)
rotate = if you would like to change the direction, change to “TRUE”
group = How you would like the data sorted (one of your IV columns)
→ If you’d rather results be sorted by ascending or descending values, you can comment out this line with a “#” as shown below, and the dots will be sorted as designated by sorting = " "
facet.by = IV that breaks the overall plot into smaller, grouped sections
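The difference between sorting by value and sorting by group can be seen without drawing a plot. The sketch below uses base R's order() on the built-in mtcars data to mimic the two orderings; it is only an illustration of the idea and is not how ggdotchart() is implemented internally. The object names "overall" and "grouped" are chosen for this example.

```r
cars <- mtcars
cars$name <- rownames(cars)

# Roughly what sorting = "descending" alone asks for: highest mpg first
overall <- cars[order(-cars$mpg), ]
head(overall[, c("name", "cyl", "mpg")])

# Roughly what adding group = "cyl" asks for: keep each cylinder group
# together, with the highest mpg first inside each group
grouped <- cars[order(cars$cyl, -cars$mpg), ]
head(grouped[, c("name", "cyl", "mpg")])
```

In the first ordering, cars with different cylinder counts are interleaved wherever their mpg values overlap; in the second, all cars with the same cylinder count stay next to each other.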

ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
     you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height, here we chose centimeters, “cm”
NOTE: Make sure that if the code has " " around text, that you keep those there! The code won’t run without them.

    dot_df<- ggdotchart(df, x = "IV", y = "DV",
             xlab = "IV axis title",
             ylab = "DV axis title",
             title= "Main figure title",
             legend.title = "legend title (IV)",
             color = "IV",        # Color by groups
             palette = c("3d"),   # Custom color palette
             sorting = "descending",    # Sort value in descending order
             add = "segments",         # Add segments from y = 0 to dot
             rotate = FALSE,          # Rotate vertically
             group = "IV",       # Order by groups [instead of defined by sorting]
             facet.by = "IV",
             point.size = 20,
             position= position_dodge(0),
             dot.size = 8,            # Large dot size
             label = round(df$DV),  # Add values as dot labels
             font.label = list(color = "black", 
                               size = 10,
                               vjust = 0.5,
                               position = position_dodge(0.9)), 
                              # Adjust label parameters
             ggtheme = theme_pubr() )   # ggplot2 theme
BOTH_diversity Code

df = BOTH_diversity
x = iconic_taxon_name
y = S
xlab = iNaturalist Iconic Taxon Groups
ylab = Species Richness per Iconic Taxon Group
title = Comparison of Species Richness per Iconic Taxon Group at BBC vs MMC
legend.title= Iconic Taxon Groups
color = iconic_taxon_name
rotate = FALSE
group = I chose to sort by “descending” instead of by group
facet.by = Habitat

    dot_BOTH_diversity<- ggdotchart(BOTH_diversity, x = "iconic_taxon_name", y = "S",
             xlab = "iNaturalist Iconic Taxon Groups",
             ylab = "Species Richness per Iconic Taxon Group",
             title= "Comparison of Species Richness per Iconic Taxon Group at BBC vs MMC",
             legend.title = "Iconic Taxon Groups",
             color = "iconic_taxon_name",        # Color by groups
             palette = c("3d"),   # Custom color palette #"#00AFBB", "#FC4E07"
             sorting = "descending",    # Sort value in descending order
             add = "segments",         # Add segments from y = 0 to dot
             rotate = FALSE,          # Rotate vertically
            # group = "iconic_taxon_name",       # Order by groups
             facet.by = "Habitat",
             point.size = 20,
             position= position_dodge(0),
             dot.size = 8,            # Large dot size
             label = round(BOTH_diversity$S),  # Add values as dot labels
             font.label = list(color = "black", # Adjust label parameters
                               size = 10,
                               vjust = 0.5,
                               position = position_dodge(0.9)),
             ggtheme = theme_pubr() )   # ggplot2 theme
     dot_BOTH_diversity
       ggsave("dot_BOTH_diversity.png", 
             plot = dot_BOTH_diversity, 
             width = 25, 
             height = 20, 
             units = "cm")
BOTH_diversity Plot

BOTH_spp_abundance



df = data frame
x = your IV
y = your DV
xlab = x-axis label
ylab = y-axis label
title = main plot title
legend.title= title of the legend
color = which column you would like the color of the dots to be based on (IV)
rotate = if you would like to change the direction, change to “TRUE”
group = How you would like the data sorted (one of your IV columns)
→ If you’d rather results be sorted by ascending or descending values, you can comment out this line with a “#” as shown below, and the dots will be sorted as designated by sorting = " "
facet.by = IV that breaks the overall plot into smaller, grouped sections

ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
     you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height, here we chose centimeters, “cm”
NOTE: Make sure that if the code has " " around text, that you keep those there! The code won’t run without them.

    dot_df<- ggdotchart(df, x = "IV", y = "DV",
             xlab = "IV axis title",
             ylab = "DV axis title",
             title= "Main figure title",
             legend.title = "legend title (IV)",
             color = "IV",        # Color by groups
             palette = c("3d"),   # Custom color palette
             sorting = "descending",    # Sort value in descending order
             add = "segments",         # Add segments from y = 0 to dot
             rotate = FALSE,          # Rotate vertically
             group = "IV",       # Order by groups [instead of defined by sorting]
             facet.by = "IV",
             point.size = 20,
             position= position_dodge(0),
             dot.size = 8,            # Large dot size
             label = round(df$DV),  # Add values as dot labels
             font.label = list(color = "black", 
                               size = 10,
                               vjust = 0.5,
                               position = position_dodge(0.9)), 
                              # Adjust label parameters
             ggtheme = theme_pubr() )   # ggplot2 theme
BOTH_spp_abundance dot plot Code

df = BOTH_spp_abundance
x = common_name
y = Species_Abundance
xlab = Common Names of Observed Organisms
ylab = Species Abundance
title = Comparison of Species Abundance at BBC vs MMC
legend.title= Iconic Taxon Groups
color = iconic_taxon_name
rotate = TRUE
group = iconic_taxon_name
facet.by = Habitat

    dot_BOTH_spp_abundance<- ggdotchart(BOTH_spp_abundance, x = "common_name", y = "Species_Abundance",
             xlab = "Common Names of Observed Organisms",
             ylab = "Species Abundance",
             title= "Comparison of Species Abundance at BBC vs MMC",
             legend.title = "Iconic Taxon Groups",
             color = "iconic_taxon_name",        # Color by groups
             palette = c("3d"),   # Custom color palette #"#00AFBB", "#FC4E07"
             sorting = "descending",    # Sort value in descending order
             add = "segments",         # Add segments from y = 0 to dot
             rotate = TRUE,          # Rotate vertically
             group = "iconic_taxon_name",       # Order by groups
             facet.by = "Habitat",
             point.size = 20,
             position= position_dodge(0),
             dot.size = 8,            # Large dot size
             label = round(BOTH_spp_abundance$Species_Abundance),  # Add values as dot labels
             font.label = list(color = "black", # Adjust label parameters
                               size = 10,
                               vjust = 0.5,
                               position = position_dodge(0.9)),
             ggtheme = theme_pubr() )   # ggplot2 theme  
     dot_BOTH_spp_abundance
       ggsave("dot_BOTH_spp_abundance.png", 
             plot = dot_BOTH_spp_abundance, 
             width = 30, 
             height = 120, 
             units = "cm")
BOTH_spp_abundance Plot

BOTH_timeset



Before running code in this section, you need to first choose which time variable you would like to observe: year, month, hour.


year
If you want to see how the species richness per habitat varies by year, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_year”

  BOTH_year <- BOTH_timedata %>%
    group_by(Habitat,year) %>%
    summarise(S= length(unique(common_name)))
  BOTH_year<-group_by(BOTH_year, year) %>%
    filter(n() != 1) %>%
    arrange(Habitat,year) %>%
    ungroup() 


month
If you want to see how the species richness per habitat varies by month, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_month”

  BOTH_month <- BOTH_timedata %>%
    group_by(Habitat,month) %>%
    summarise(S= length(unique(common_name)))
  BOTH_month<-group_by(BOTH_month, month) %>%
    filter(n() != 1) %>%
    arrange(Habitat,month) %>%
    ungroup()



hour
If you want to see how the species richness per habitat varies by hour, this is the filter you will want to run before plotting. Your new “df” for the dot plot will be “BOTH_hour”

  BOTH_hour <- BOTH_timedata %>%
    group_by(Habitat,hour) %>%
    summarise(S= length(unique(common_name)))
  BOTH_hour<-group_by(BOTH_hour, hour) %>%
    filter(n() != 1) %>%
    arrange(Habitat,hour) %>%
    ungroup()
      BOTH_hour$hour <- as.factor(BOTH_hour$hour) #changes hours to factors
    BOTH_hour$hour <- droplevels(BOTH_hour$hour,2) #adjusts level of factors
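The last two lines convert hour to a factor and then tidy up its levels. As a general illustration of why dropping levels can be needed, the base-R sketch below uses a small made-up vector of hours (the values are hypothetical): a factor remembers every level it was created with, even after the matching rows have been filtered out.

```r
# Hypothetical hours of observation, stored as a factor
hours <- factor(c(8, 9, 9, 14))
levels(hours)                 # all three hours are levels

# Remove the 14:00 observation
daytime <- hours[hours != "14"]
levels(daytime)               # "14" is still listed, although no rows use it

# droplevels() discards levels that no longer appear in the data
levels(droplevels(daytime))
```

Unused levels can otherwise show up as empty categories or panels in a plot.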







df = data frame
x = your IV
y = your DV
xlab = x-axis label
ylab = y-axis label
title = main plot title
legend.title= title of the legend
color = which column you would like the color of the dots to be based on (IV)
rotate = if you would like to change the direction, change to “TRUE”
group = How you would like the data sorted (one of your IV columns)
→ If you’d rather results be sorted by ascending or descending values, you can comment out this line with a “#” as shown below, and the dots will be sorted as designated by sorting = " "
facet.by = IV that breaks the overall plot into smaller, grouped sections

ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
     you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height, here we chose centimeters, “cm”
NOTE: Make sure that if the code has " " around text, that you keep those there! The code won’t run without them.

BOTH_timedata Code




    dot_df<- ggdotchart(df, x = "IV", y = "DV",
             xlab = "IV axis title",
             ylab = "DV axis title",
             title= "Main figure title",
             legend.title = "legend title (IV)",
             color = "IV",        # Color by groups
             palette = c("3d"),   # Custom color palette
             sorting = "descending",    # Sort value in descending order
             id = "IV-2",
             add = "segments",         # Add segments from y = 0 to dot
             rotate = FALSE,          # Rotate vertically
             group = "IV",       # Order by groups [instead of defined by sorting]
             facet.by = "IV",
             point.size = 20,
             position= position_dodge(0),
             dot.size = 8,            # Large dot size
             label = round(df$DV),  # Add values as dot labels
             font.label = list(color = "black", 
                               size = 10,
                               vjust = 0.5,
                               position = position_dodge(0.9)), 
                              # Adjust label parameters
             ggtheme = theme_pubr() )   # ggplot2 theme
BOTH_timedata dot Plot

year:

  BOTH_year <- BOTH_timedata %>%
    group_by(Habitat,year) %>%
    summarise(S= length(unique(common_name)))
  BOTH_year<-group_by(BOTH_year, year) %>%
    filter(n() != 1) %>%
    arrange(Habitat,year) %>%
    ungroup() 
  
  # time: year  
  # df = BOTH_year  
  # x = Habitat  
  # y = S  
  # xlab = Habitat  
  # ylab = Species Richness Across Habitat  
  # title = Comparison of BBC and MMC species richness over the years  
  # legend.title= Habitat  
  # color = Habitat  
  # rotate = FALSE  
  # group = Habitat  
  # facet.by = year (applied with facet_wrap below)
  
    dot_BOTH_year<- ggdotchart(BOTH_year, x = "Habitat", y = "S",
             xlab = "Habitat",
             ylab = "Species Richness Across Habitat",
             title= "Comparison of BBC and MMC species richness over the years",
             legend.title = "Habitat",
             color = "Habitat",        # Color by groups
             palette = c("3d"),   # Custom color palette #"#00AFBB", "#FC4E07"
             sorting = "descending",    # Sort value in descending order
             add = "segments",         # Add segments from y = 0 to dot
             rotate = FALSE,          # Rotate vertically
             group = "Habitat",       # Order by groups
             #facet.by = "year", # faceting was specified below
             point.size = 20,
             position= position_dodge(0),
             dot.size = 8,            # Large dot size
             label = round(BOTH_year$S),  # Add values as dot labels
             font.label = list(color = "black", # Adjust label parameters
                               size = 10,
                               vjust = 0.5,
                               position = position_dodge(0.9)),
             ggtheme = theme_pubr() ) +  # ggplot2 theme  
           facet_wrap(~year, nrow=1) # faceting specified here instead to 
                                     # control the number of rows expressed in
                                     # the plot output
     dot_BOTH_year

       ggsave("dot_BOTH_year.png", 
             plot = dot_BOTH_year, 
             width = 20, 
             height = 15, 
             units = "cm")


month:

  BOTH_month <- BOTH_timedata %>%
    group_by(Habitat,month) %>%
    summarise(S= length(unique(common_name)))
  BOTH_month<-group_by(BOTH_month, month) %>%
    filter(n() != 1) %>%
    arrange(Habitat,month) %>%
    ungroup()
  
  # time: month  
  # df = BOTH_month  
  # x = Habitat  
  # y = S  
  # xlab = Habitat  
  # ylab = Species Richness Across Habitat  
  # title = Comparison of BBC and MMC species richness by month  
  # legend.title= Habitat  
  # color = Habitat  
  # rotate = FALSE  
  # group = Habitat  
  # facet.by = month  
  
    dot_BOTH_month<- ggdotchart(BOTH_month, x = "Habitat", y = "S",
             xlab = "Habitat",
             ylab = "Species Richness Across Habitat",
             title= "Comparison of BBC and MMC Species Richness by Month",
             legend.title = "Habitat",
             color = "Habitat",        # Color by groups
             palette = c("3d"),   # Custom color palette #"#00AFBB", "#FC4E07"
             sorting = "descending",    # Sort value in descending order
             add = "segments",         # Add segments from y = 0 to dot
             rotate = FALSE,          # Rotate vertically
             group = "Habitat",       # Order by groups
             facet.by = "month", # Facet by month (one panel per month)
             panel.labs = list(month = c('January', 'February', 'March', 'April', 
                                  'May', 'June', 'July', 'August', 
                                  'September', 'October', 'November', 
                                  'December')), #Adjust panel names
             point.size = 20,
             position= position_dodge(0),
             dot.size = 8,            # Large dot size
             label = round(BOTH_month$S),  # Add values as dot labels
             font.label = list(color = "black", # Adjust label parameters
                               size = 10,
                               vjust = 0.5,
                               position = position_dodge(0.9)),
             ggtheme = theme_pubr() )   # ggplot2 theme  
           #facet_wrap(~month, nrow=1) # alternative to facet.by: uncomment (and add
                                     # "+" after the closing parenthesis above) to
                                     # control the number of rows in the plot output
     dot_BOTH_month

       ggsave("dot_BOTH_month.png", 
             plot = dot_BOTH_month, 
             width = 15, 
             height = 20, 
             units = "cm")

hour:

   BOTH_hour <- BOTH_timedata %>%
  group_by(Habitat,hour) %>%
  summarise(S= length(unique(common_name)))
  BOTH_hour<-group_by(BOTH_hour, hour) %>%
  filter(n() != 1) %>%
  arrange(Habitat,hour) %>%
  ungroup()
  
  BOTH_hour$hour <- as.factor(BOTH_hour$hour)
  
  BOTH_hour$hour <- droplevels(BOTH_hour$hour,2)
  
  dot_BOTH_hour<- ggdotchart(BOTH_hour, x = "Habitat", y = "S",
                           xlab = "Habitat",
                           ylab = "Species Richness Across Habitat",
                           #title= "Comparison of BBC and MMC Observed Species Richness by Time of Day",
                           legend.title = "Habitat",
                           color = "Habitat",        # Color by groups
                           palette = c("3d"),   # Custom color palette #"#00AFBB", "#FC4E07"
                           sorting = "descending",    # Sort value in descending order
                           add = "segments",         # Add segments from y = 0 to dot
                           rotate = FALSE,          # Rotate vertically
                           group = "Habitat",       # Order by groups
                           facet.by = "hour", # Facet by hour (one panel per hour)
                           panel.labs = list(hour = #Adjust panel names
                                               c('12:00 AM', '1:00 AM', '5:00 AM', 
                                                 '7:00 AM', '8:00 AM', 
                                                 '9:00 AM', '10:00 AM',
                                                 '11:00 AM', '12:00 PM', '1:00 PM', 
                                                 '2:00 PM', '3:00 PM', '4:00 PM', '5:00 PM',
                                                 '6:00 PM', '7:00 PM', '8:00 PM', '9:00 PM', 
                                                 '10:00 PM', '11:00 PM')), 
                           point.size = 20,
                           position= position_dodge(0),
                           dot.size = 8,            # Large dot size
                           label = round(BOTH_hour$S),  # Add values as dot labels
                           font.label = list(color = "black", # Adjust label parameters
                                             size = 10,
                                             vjust = 0.5,
                                             position = position_dodge(0.9)),
                           ggtheme = theme_pubr() ) +  # ggplot2 theme  
  labs(title="Comparison of BBC and MMC Species Richness \n 
                    by Time of Day")+
  theme(plot.title = element_text(lineheight = 0.7))
  
     dot_BOTH_hour

       ggsave("dot_BOTH_hour.png", 
             plot = dot_BOTH_hour, 
             width = 15, 
             height = 20, 
             units = "cm")



Boxplots

Boxplots are a standardized way of showing the distribution of data using five summary values: the minimum, first quartile, median, third quartile, and maximum. The box spans the first to third quartiles, the center line of the box marks the median, and the whiskers extend to the smallest and largest values that are not outliers. Points that fall beyond the whiskers are considered outliers. This plot also lets you see how closely grouped the data are and whether or not the distribution is symmetrical.
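
As a rough illustration of what a boxplot draws, the five summary values can be computed directly in base R. This is a minimal sketch using the built-in ToothGrowth data set. The 1.5 x IQR rule for flagging outliers is a common plotting convention and is assumed here rather than taken from this lab, and the object names (len, five, lower_fence, upper_fence, outliers) are made up for the example.

```r
# Tooth lengths from R's built-in ToothGrowth data set
len <- ToothGrowth$len

# Minimum, first quartile, median, third quartile, maximum
five <- quantile(len, probs = c(0, 0.25, 0.5, 0.75, 1))

# Assumed convention: values more than 1.5 x IQR beyond the box are outliers
iqr <- IQR(len)
lower_fence <- five[["25%"]] - 1.5 * iqr
upper_fence <- five[["75%"]] + 1.5 * iqr
outliers <- len[len < lower_fence | len > upper_fence]

print(five)
print(outliers)  # empty (numeric(0)) when no value lies beyond the fences
```

The box in a boxplot runs from the 25% to the 75% value, and the line inside it sits at the 50% value.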

ToothGrowth

Below is an example from the data frame “ToothGrowth”, which looks at the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

ToothGrowth Example Code

First six rows of df “ToothGrowth”

len supp dose
4.2 VC 0.5
11.5 VC 0.5
7.3 VC 0.5
5.8 VC 0.5
6.4 VC 0.5
10.0 VC 0.5
  ggpaired(ToothGrowth, x = "supp", y = "len",
            color = "supp", line.color = "gray", 
            line.size = 0.4, palette = "npg")
Tooth Growth Example Plot
  ggpaired(ToothGrowth, x = "supp", y = "len",
            color = "supp", line.color = "gray", 
            line.size = 0.4, palette = "npg")

BOTH_diversity

Example Plot :

df = the data frame you chose
     replace “box_df” with box_“name of your data frame”
     e.g. box_BOTH_diversity
IV = the column of your IV
DV = the column of your DV
xlab = Text that you want as your x-axis label
ylab = Text that you want as your y-axis label

ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
     you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height, here we chose centimeters, “cm”
NOTE: Make sure that if the code has " " around text, that you keep those there! The code won’t run without them.
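
To make the ggsave( ) arguments concrete, here is a minimal sketch that builds a small plot and saves it. It assumes the ggplot2 package is installed (ggpubr loads it). The object name example_plot, the file name example_plot.png, and the use of tempdir() as the save location are all made up for illustration. In your own script you would normally give just the file name so the plot lands in your working directory.

```r
library(ggplot2)

# A small placeholder plot built from the built-in ToothGrowth data set
example_plot <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_boxplot()

# Save to a temporary folder so the example does not clutter your project
out_file <- file.path(tempdir(), "example_plot.png")

ggsave(out_file,              # file name, including the .png extension
       plot = example_plot,   # the object the plot was stored as
       width = 10,            # width of the saved image
       height = 15,           # height of the saved image
       units = "cm")          # units for width and height

file.exists(out_file)         # TRUE once the file has been written
```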

BOTH_diversity Code


    box_df<- ggpaired(df, 
                x = "IV", 
                y = "DV",
           color = "IV", 
           line.color = "gray", 
           line.size = 1,
           point.size = 3,
           palette = "npg",
           xlab = "x-axis label",
           ylab = "y-axis label")
  box_df
  ggsave("box_df.png", 
             plot = box_df, 
             width = 10, 
             height = 15, 
             units = "cm")


BOTH_diversity Plot

Output for BOTH_diversity Box Plot:

  #df<- BOTH_diversity
  #IV<- Habitat
  #DV<- S
  #xlab = "Habitat"
  #ylab = "Species Richness per Taxon Group"
  
  box_BOTH_diversity<- ggpaired(BOTH_diversity, 
                x = "Habitat", 
                y = "S",
           color = "Habitat", 
           line.color = "gray", 
           line.size = 1,
           point.size = 3,
           palette = "npg",
           xlab = "Habitat",
           ylab = "Species Richness per Taxon Group")
  box_BOTH_diversity  

  ggsave("box_BOTH_diversity.png", 
             plot = box_BOTH_diversity, 
             width = 10, 
             height = 15, 
             units = "cm")

BOTH_spp_abundance

Example Plot :

df = the data frame you chose
     replace “box_df” with box_“name of your data frame”
     e.g. box_BOTH_spp_abundance
IV = the column of your IV
DV = the column of your DV
xlab = Text that you want as your x-axis label
ylab = Text that you want as your y-axis label

ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
     you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height, here we chose centimeters, “cm”
NOTE: Make sure that if the code has " " around text, that you keep those there! The code won’t run without them.

BOTH_spp_abundance Code


    box_df<- ggpaired(df, 
                x = "IV", 
                y = "DV",
           color = "IV", 
           line.color = "gray", 
           line.size = 1,
           point.size = 3,
           palette = "npg",
           xlab = "x-axis label",
           ylab = "y-axis label")
  box_df
  ggsave("box_df.png", 
             plot = box_df, 
             width = 10, 
             height = 15, 
             units = "cm")


BOTH_spp_abundance Plot

Output for BOTH_spp_abundance Box Plot:

  #df<- BOTH_spp_abundance
  #IV<- Habitat
  #DV<- Species_Abundance
  #xlab = "Habitat"
  #ylab = "Abundance per Species"
  
  box_BOTH_spp_abundance<- ggpaired(BOTH_spp_abundance, 
                x = "Habitat", 
                y = "Species_Abundance",
           color = "Habitat", 
           line.color = "gray", 
           line.size = 1,
           point.size = 3,
           palette = "npg",
           xlab = "Habitat",
           ylab = "Abundance per Species")
  box_BOTH_spp_abundance  

  ggsave("box_BOTH_spp_abundance.png", 
             plot = box_BOTH_spp_abundance, 
             width = 10, 
             height = 15, 
             units = "cm")

BOTH_timedata



Before running code in this section, you first need to choose which time variable you would like to examine: year, month, or hour.

year:
If you want to see how the species richness per habitat varies by year, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_year”

  BOTH_year <- BOTH_timedata %>%
    group_by(Habitat,year) %>%
    summarise(S= length(unique(common_name)))
  BOTH_year<-group_by(BOTH_year, year) %>%
    filter(n() != 1) %>%
    arrange(Habitat,year) %>%
    ungroup() 


month:
If you want to see how the species richness per habitat varies by month, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_month”

  BOTH_month <- BOTH_timedata %>%
    group_by(Habitat,month) %>%
    summarise(S= length(unique(common_name)))
  BOTH_month<-group_by(BOTH_month, month) %>%
    filter(n() != 1) %>%
    arrange(Habitat,month) %>%
    ungroup()



hour:
If you want to see how the species richness per habitat varies by hour, this is the filter you will want to run before plotting. Your new “df” for the box plot will be “BOTH_hour”

  BOTH_hour <- BOTH_timedata %>%
    group_by(Habitat,hour) %>%
    summarise(S= length(unique(common_name)))
  BOTH_hour<-group_by(BOTH_hour, hour) %>%
    filter(n() != 1) %>%
    arrange(Habitat,hour) %>%
    ungroup()
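
To see what these filters actually do, the sketch below runs the same steps on a tiny made-up data frame. The object names toy and toy_year and every value in them are invented for illustration; only the column names (Habitat, year, common_name) follow the lab data. The .groups = "drop" argument is an addition that silences dplyr's grouping message, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up observations: one row per sighting
toy <- data.frame(
  Habitat     = c("BBC", "BBC", "BBC", "MMC", "MMC", "BBC"),
  year        = c(2019, 2019, 2020, 2019, 2019, 2021),
  common_name = c("Osprey", "Osprey", "Green Iguana",
                  "Osprey", "Muscovy Duck", "Osprey")
)

toy_year <- toy %>%
  group_by(Habitat, year) %>%
  # S = number of different species seen in each habitat/year combination
  summarise(S = length(unique(common_name)), .groups = "drop") %>%
  group_by(year) %>%
  # keep a year only if it has more than one row (here: both habitats)
  filter(n() != 1) %>%
  arrange(Habitat, year) %>%
  ungroup()

toy_year
```

In this toy data, 2020 and 2021 were recorded only at BBC, so those years are dropped and only the two 2019 rows (one per habitat) remain. That is what makes the remaining rows usable as pairs.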




BOTH_timedata Code



Example Plot:

df = the data frame you chose
     replace “box_df” with box_“name of your data frame”
     e.g. BOTH_year, BOTH_month, BOTH_hour → box_BOTH_year, box_BOTH_month, box_BOTH_hour
IV = the column of your IV [this will be Habitat]
IV-time = the unit of time you chose (year, month, hour)
DV = the column of your DV [this will be S]
color = the column that determines the point colors
title = The title you assign the main plot
xlab = Text that you want as your x-axis label
ylab = Text that you want as your y-axis label
label = df$IV-time (this will label the individual points with the month/year/hour)

ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
     you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height, here we chose centimeters, “cm”
NOTE: Make sure that if the code has " " around text, that you keep those there! The code won’t run without them.

  box_df <- ggpaired(df, 
                     x = "IV", 
                     y = "DV",
                     color = "IV", 
                     line.color = "gray", 
                     line.size = 1,
                     point.size = 3,
                     palette = "rainbow",
                     title = "   ",
                     xlab = "IV",
                     ylab = "DV",
                     label = df$IV-time,   # replace IV-time with year, month, or hour
                     font.label = list(color = "black"))
  box_df  
  ggsave("box_df.png", 
             plot = box_df, 
             width = 20, 
             height = 40, 
             units = "cm")
BOTH_timedata Plot

Output for box_BOTH_year

      BOTH_year <- BOTH_timedata %>%
              group_by(Habitat,year) %>%
              summarise(S= length(unique(common_name)))
      BOTH_year<-group_by(BOTH_year, year) %>%
              filter(n() != 1) %>%
              arrange(Habitat,year) %>%
              ungroup()
  
    #df<- BOTH_year
    #IV<- Habitat
    #IV-time <- year
    #DV<- S
    #title <- Difference in species richness observed across years
    #xlab = "Habitat"
    #ylab = "Species richness observed each year"
    #color = "Habitat"
  
         box_BOTH_year <- ggpaired(BOTH_year, 
                                        x = "Habitat", 
                                        y = "S",
                                        color = "Habitat", 
                                        line.color = "gray", 
                                        line.size = 1,
                                        point.size = 3,
                                        palette = "rainbow",
                                        title = "Difference in species richness observed each year",
                                        xlab = "Habitat",
                                    ylab = "Species richness observed each year",
                                        label = BOTH_year$year,
                                        font.label = list(color = "black"))
     box_BOTH_year  

     ggsave("box_BOTH_year.png", 
             plot = box_BOTH_year, 
             width = 15, 
             height = 30, 
             units = "cm")





Output for box_BOTH_month

        BOTH_month <- BOTH_timedata %>%
                group_by(Habitat,month) %>%
                summarise(S= length(unique(common_name)))
        BOTH_month<-group_by(BOTH_month, month) %>%
                filter(n() != 1) %>%
                arrange(Habitat,month) %>%
                ungroup()
  
    #df<- BOTH_month
    #IV<- Habitat
    #IV-time <- month
    #DV<- S
    #title <- Difference in species richness observed across months
    #xlab = "Habitat"
    #ylab = "Species richness observed each month"
    #color = "Habitat"
  
          box_BOTH_month <- ggpaired(BOTH_month, 
                                                    x = "Habitat", 
                                                    y = "S",
                                                    color = "Habitat", 
                                                    line.color = "gray", 
                                                    line.size = 1,
                                                    point.size = 3,
                                                    palette = "rainbow",
                                                    title = "Difference in species richness observed each month",
                                                    xlab = "Habitat",
                                                    ylab = "Species richness observed each month",
                                                    label = BOTH_month$month,
                                                    font.label = list(color = "black"))
    box_BOTH_month  

    ggsave("box_BOTH_month.png", 
                 plot = box_BOTH_month, 
                 width = 15, 
                 height = 35, 
                 units = "cm")





Output for box_BOTH_hour

        BOTH_hour <- BOTH_timedata %>%
                group_by(Habitat,hour) %>%
                summarise(S= length(unique(common_name)))
        BOTH_hour<-group_by(BOTH_hour, hour) %>%
                filter(n() != 1) %>%
                arrange(Habitat,hour) %>%
                ungroup()
  
    #df<- BOTH_hour
    #IV<- Habitat
    #IV-time <- hour
    #DV<- S
    #title <- Difference in species richness observed across time of day
    #xlab = "Habitat"
    #ylab = "Species richness observed each hour"
    #color = "Habitat"
  
          box_BOTH_hour <- ggpaired(BOTH_hour, 
                                                    x = "Habitat", 
                                                    y = "S",
                                                    color = "Habitat", 
                                                    line.color = "gray", 
                                                    line.size = 1,
                                                    point.size = 3,
                                                    palette = "rainbow",
                                                    title = "Difference in species richness observed across hours",
                                                    xlab = "Habitat",
                                                    ylab = "Species richness observed per hour",
                                                    label = BOTH_hour$hour,
                                                    font.label = list(color = "black"))
    box_BOTH_hour  

    ggsave("box_BOTH_hour.png", 
                 plot = box_BOTH_hour, 
                 width = 20, 
                 height = 40, 
                 units = "cm")



Running the Analysis and Interpreting Results


[ Click to Display “Analyzing Data and Interpreting Results” ]

For today’s analyses, we will be applying a type of t-test. This analysis compares the means of two populations and answers the question: “Is there a significant difference between the two populations?” A t-test cannot be used to compare two different types of data (e.g. air temperature and the number of birds observed). It can only compare two data sets of the same data type (e.g. the number of observations at two different sites). The two data sets must also be in the same units (e.g. you can’t compare air temperatures if one set is recorded in Celsius and the other in Fahrenheit).


There are a few different types of t-tests: one-sample t-test, two-sample t-test, and paired t-tests. The type of t-test used depends on the data available and the question being addressed.

paired t-tests

For our analyses today, we will be using a paired t-test. Paired t-tests are used to compare the means between two related groups, or pairs, of samples. Our data are paired because each value from BBC has a matching counterpart from MMC, and we are testing whether those matched values differ between the two campuses.
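
For the curious, a paired t-test boils down to a one-sample test on the differences within each pair. The sketch below is not part of the lab workflow. It uses R's built-in sleep data set (ten patients, each measured under two drugs) as a stand-in for paired data, and the object names are made up. It computes the t statistic by hand as the mean difference divided by its standard error, then checks it against t.test().

```r
# Built-in paired data: extra hours of sleep for the same 10 patients on two drugs
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

# Step 1: one difference per pair (rows are already in matching patient order)
d <- drug1 - drug2

# Step 2: t = mean difference / (sd of differences / sqrt(number of pairs))
n <- length(d)
t_by_hand <- mean(d) / (sd(d) / sqrt(n))

# Step 3: the same statistic from R's built-in function
res <- t.test(drug1, drug2, paired = TRUE)

t_by_hand
unname(res$statistic)   # matches t_by_hand
res$parameter           # degrees of freedom = number of pairs - 1
```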

We want to make sure that the biodiversity metrics of each taxon group at BBC are compared to their counterpart values at MMC, rather than all BBC values being compared collectively to all MMC values. For example, with our BOTH_diversity data frame, we want to make sure that all biodiversity values are compared across their taxon groupings as shown in the table below:

Habitat Taxon Group Habitat Taxon Group
BBC Actinopterygii MMC Actinopterygii
BBC Amphibia MMC Amphibia
BBC Animalia MMC Animalia
BBC Arachnida MMC Arachnida
BBC Aves MMC Aves
BBC Fungi MMC Fungi
BBC Insecta MMC Insecta
BBC Mammalia MMC Mammalia
BBC Mollusca MMC Mollusca
BBC Plantae MMC Plantae
BBC Reptilia MMC Reptilia
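
Pairing only works if each BBC value lines up with the MMC value for the same taxon group. One way to guarantee that alignment is sketched below in base R. The data frames bbc and mmc, the column name Taxon_Group, and all of the richness values are invented for illustration; they are not results from this lab.

```r
# Made-up richness values; note the taxon groups are listed in different orders
bbc <- data.frame(Taxon_Group = c("Aves", "Insecta", "Plantae"),
                  S = c(40, 25, 60))
mmc <- data.frame(Taxon_Group = c("Plantae", "Aves", "Insecta"),
                  S = c(75, 35, 30))

# merge() matches rows on Taxon_Group, so each row holds one BBC/MMC pair
paired_S <- merge(bbc, mmc, by = "Taxon_Group", suffixes = c("_BBC", "_MMC"))

# Difference within each pair (BBC minus MMC)
paired_S$difference <- paired_S$S_BBC - paired_S$S_MMC
paired_S
```

Subtracting the unmerged columns directly (bbc$S - mmc$S) would compare Aves with Plantae, which is exactly the mismatch that pairing is meant to avoid.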

Assumptions and R Functions

Every statistical test has a series of assumptions that must be true in order for the results of that analysis to be valid. For a paired t-test:
Assumption 1:
The two samples must be paired

Assumption 2:
There are no significant outliers. If there are outliers, they would need to be removed from the data set.

Assumption 3:
Is the sample large (n ≥ 30)? If not (n < 30), the differences between the pairs must be checked for normality. If the differences are normally distributed, the 3rd assumption is met.

To test normality, we use the Shapiro-Wilk normality test. If the p-value resulting from that test is greater than 0.05, the data can be considered normally distributed. If the data are not normally distributed, a similar nonparametric test is used instead. For paired t-tests, the nonparametric alternative is the paired-sample Wilcoxon test: wilcox.test(x, y, paired = TRUE)


To run the Shapiro-Wilk test, you first need to calculate the difference between the pairs of your dependent variable, for example the difference between all MMC and BBC values of S (species richness) from the BOTH_diversity data frame. Those values can be saved as an object and then run through the Shapiro-Wilk function shapiro.test().

d.df <- with (df, y[x == "x1"] - y[x == "x2"])
shapiro.test(d.df)
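
As a concrete version of the two lines above, the sketch below checks normality on R's built-in sleep data set (a stand-in for paired data, not lab data) and then picks between the paired t-test and its Wilcoxon alternative. The 0.05 cutoff comes from the text; the object names d.sleep, normality, and test_to_use are made up.

```r
# Differences between the paired measurements (same 10 patients, two drugs)
d.sleep <- with(sleep, extra[group == "1"] - extra[group == "2"])

# Shapiro-Wilk test on the differences
normality <- shapiro.test(d.sleep)
normality$p.value

# p > 0.05: treat the differences as normal and use the paired t-test;
# otherwise fall back to the paired-sample Wilcoxon test
test_to_use <- if (normality$p.value > 0.05) {
  "paired t-test"
} else {
  "paired Wilcoxon test"
}
test_to_use
```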

For the sake of simplicity, we are going to assume that ALL assumptions are met for our data in this exercise. We will revisit this subject later in the semester as we get closer to the group presentations.

BOTH_diversity


R Code:
t.df <- t.test( y ~ x , data = df, paired = TRUE)


Where:
t.df = the name of your saved t-test output (e.g. t.BOTH_diversity)
t.test( ) = the t.test function in R
y = the column name of your DV
x = the column name of your main IV (this will be Habitat)
df = the name of your data frame (e.g. BOTH_diversity)
paired = TRUE specifies that the data should be treated as paired
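
The sketch below runs a paired t-test on a small made-up data frame shaped like BOTH_diversity. Everything in toy_diversity (including the Taxon_Group column name and the richness values) is invented for illustration. It passes the two habitats to t.test() as separate vectors rather than using the y ~ x formula shown above. As far as we are aware, R 4.4.0 and later no longer accept paired = TRUE together with a formula, so treat the two-vector form as a fallback if the formula version returns an error on your computer.

```r
# Made-up data in the same layout as BOTH_diversity:
# one row per habitat and taxon group
toy_diversity <- data.frame(
  Habitat     = rep(c("BBC", "MMC"), each = 4),
  Taxon_Group = rep(c("Aves", "Insecta", "Plantae", "Reptilia"), times = 2),
  S           = c(40, 25, 60, 12, 35, 30, 75, 10)
)

# Sort so the taxon groups appear in the same order within each habitat
toy_diversity <- toy_diversity[order(toy_diversity$Habitat,
                                     toy_diversity$Taxon_Group), ]

# One vector of S per habitat; position i in each vector is the same taxon group
S_BBC <- toy_diversity$S[toy_diversity$Habitat == "BBC"]
S_MMC <- toy_diversity$S[toy_diversity$Habitat == "MMC"]

# Paired t-test on the matched values
t.toy_diversity <- t.test(S_BBC, S_MMC, paired = TRUE)
t.toy_diversity
```

The printed output lists the t statistic, the degrees of freedom (number of pairs minus one), and the p-value.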


Interpretation:
The output of the t.test will give a p-value (probability value). In short, a p-value describes how likely it is that a difference at least as large as the one observed would occur by chance alone if the null hypothesis were true. If the p-value is below 0.05 (less than a 5% probability of seeing such a result by random chance), then results are considered significant and the null hypothesis is rejected. If the p-value is above 0.05, then results are considered not significant and the null hypothesis cannot be rejected → “We fail to reject our null hypothesis”. Remember, results are always discussed in terms of the null hypothesis, since that is what we are testing against.

When reporting results, state only the outcome of the analysis and how it relates to the hypothesis. The meanings, implications, and potential reasoning or influences behind the results are discussed in the discussion/conclusion.
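
The decision rule above can be written as a small helper. This is a sketch, not part of the provided lab code: the function name report_decision and its wording are made up, and 0.05 is the cutoff stated in the text.

```r
# Turn a p-value into the wording used when reporting results
report_decision <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "Significant: we reject our null hypothesis"
  } else {
    "Not significant: we fail to reject our null hypothesis"
  }
}

report_decision(0.003)  # a p-value below 0.05
report_decision(0.27)   # a p-value above 0.05
```

With a saved test such as t.BOTH_diversity, the p-value is stored in t.BOTH_diversity$p.value, so report_decision(t.BOTH_diversity$p.value) would return the matching statement.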


After you have run your test, make sure to record your results on your data menu and discuss your findings in the discussion/conclusion section.

BOTH_spp_abundance


R Code:
t.df <- t.test( y ~ x , data = df, paired = TRUE)


Where:
t.df = the name of your saved t-test output (e.g. t.BOTH_spp_abundance)
t.test( ) = the t.test function in R
y = the column name of your DV
x = the column name of your main IV (this will be Habitat)
df = the name of your data frame (e.g. BOTH_spp_abundance)
paired = TRUE specifies that the data should be treated as paired


Interpretation:
The output of the t.test will give a p-value (probability value). In short, a p-value describes how likely it is that a difference at least as large as the one observed would occur by chance alone if the null hypothesis were true. If the p-value is below 0.05 (less than a 5% probability of seeing such a result by random chance), then results are considered significant and the null hypothesis is rejected. If the p-value is above 0.05, then results are considered not significant and the null hypothesis cannot be rejected → “We fail to reject our null hypothesis”. Remember, results are always discussed in terms of the null hypothesis, since that is what we are testing against.


When reporting results, state only the outcome of the analysis and how it relates to the hypothesis. The meanings, implications, and potential reasoning or influences behind the results are discussed in the discussion/conclusion.


After you have run your test, make sure to record your results on your data menu and discuss your findings in the discussion/conclusion section.

BOTH_timedata



R Code:
t.df <- t.test( y ~ x , data = df, paired = TRUE)


Where:
t.df = the name of your saved t-test output (e.g. t.BOTH_year)
t.test( ) = the t.test function in R
y = the column name of your DV
x = the column name of your main IV (this will be Habitat)
df = the name of your data frame (e.g. BOTH_year)
paired = TRUE specifies that the data should be treated as paired


Interpretation:
The output of the t.test will give a p-value (probability value). In short, a p-value describes how likely it is that a difference at least as large as the one observed would occur by chance alone if the null hypothesis were true. If the p-value is below 0.05 (less than a 5% probability of seeing such a result by random chance), then results are considered significant and the null hypothesis is rejected. If the p-value is above 0.05, then results are considered not significant and the null hypothesis cannot be rejected → “We fail to reject our null hypothesis”. Remember, results are always discussed in terms of the null hypothesis, since that is what we are testing against.


When reporting results, state only the outcome of the analysis and how it relates to the hypothesis. The meanings, implications, and potential reasoning or influences behind the results are discussed in the discussion/conclusion.


After you have run your test, make sure to record your results on your data menu and discuss your findings in the discussion/conclusion section.