Part 2: Reviewing the Scientific Process and an Introduction to Diversity Indices - A Glance into SFL Biodiversity with FIU’s Biscayne Bay and Modesto Maidique Campuses
Getting Started:
Here, we will begin working with iNaturalist data in RStudio. R may seem intimidating, but not to worry! We do not expect you to be able to write or develop code on your own. The bulk of any coding used in this class will be provided for you. However, throughout the course, you may need to understand where and how to alter very small portions of the provided code as it applies to your end-of-semester projects. Your TA and LA will be available to help. This is NOT a coding class, and this is NOT a statistics class; therefore, we do not expect you to understand the major mechanics or theory behind the code or the analyses, only how to apply them and interpret the results.
Checking the Exported Data:
In order for the code to run correctly, the .csv file exported from iNaturalist MUST be arranged and labeled correctly. Even the smallest typo can keep the script from running. Remember: Coding is CasE SeNSiTivE! If the export and download instructions were followed closely, the .csv file should be arranged correctly. Take a moment to open the .csv files in Excel and double-check that the column names match columns 1-8 in the table below.
| Column # | Column Name | Column Notes |
|---|---|---|
| 1 | time_observed_at | This is the date/time stamp of the observation in UTC (Coordinated Universal Time), equivalent to GMT (Greenwich Mean Time). There are numerous time zones throughout the world, so having a single, standard time reference is ideal for record keeping and exchanging data. For example, suppose an event happened on January 1, 2021 at 2:00 PM in Miami, Florida (Eastern Standard Time, 5 hours behind UTC). In Tokyo, Japan, the same event would have occurred on January 2, 2021 at 4:00 AM. Instead, everyone could discuss this event as happening on January 1, 2021 at 7:00 PM UTC. For analyses where time of day is important, data must be converted out of UTC and into the local time zone where the data were collected. |
| 2 | time_zone | This denotes the timezone in which the observation was made. This information is needed in order for the UTC time to be meaningful. For example, Eastern Standard Time (EST) is 5 hours behind UTC and Eastern Daylight Time (EDT) is 4 hours behind UTC. |
| 3 | captive_cultivated | This column lets you know if the observation was captive/cultivated (TRUE) or not (FALSE). |
| 4 | latitude | This is the latitude of the observation. |
| 5 | longitude | This is the longitude of the observation. |
| 6 | scientific_name | This is the scientific name of the observed organism. If the organism was not identified (ID’d) to species, this will NOT be a species name. Instead, it will be the scientific name for the lowest taxonomic group at which the specimen was ID’d by the observer. |
| 7 | common_name | This is the common name for the observed organism. If the organism wasn’t ID’d to species level, it will be the common name for the lowest taxonomic group at which the specimen was ID’d by the observer. |
| 8 | iconic_taxon_name | This is the broad taxonomic group (“iconic taxon”) that iNaturalist assigns to the organism. iNaturalist’s 13 “iconic taxon names” are outlined below. |
iNaturalist time zones in our data set and their equivalent names in R:
| iNaturalist | R |
|---|---|
| America/New_York | America/New_York |
| Eastern Time (US & Canada) | US/Eastern |
| Central Time (US & Canada) | US/Central |
| Pacific Time (US & Canada) | US/Pacific |
| Arizona | US/Arizona |
| Hawaii | US/Hawaii |
| Atlantic Time (Canada) | Canada/Atlantic |
| UTC | UTC |
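You can let R check the Miami/Tokyo example from the column table. This sketch uses base R only. The timestamp is the illustrative 2:00 PM event, not an observation from our data. "Asia/Tokyo" is a standard time zone name that does not appear in our data set.

```r
# Illustrative event: 2:00 PM on January 1, 2021 in Miami (not real iNaturalist data)
miami_event <- as.POSIXct("2021-01-01 14:00:00", tz = "US/Eastern")

# The same instant displayed in different time zones; format() only changes the display
format(miami_event, "%Y-%m-%d %H:%M %Z", tz = "UTC")         # 7:00 PM UTC, January 1
format(miami_event, "%Y-%m-%d %H:%M %Z", tz = "Asia/Tokyo")  # 4:00 AM JST, January 2
format(miami_event, "%Y-%m-%d %H:%M %Z", tz = "US/Eastern")  # 2:00 PM EST, January 1
```

Because this example is in winter, Miami is on EST (5 hours behind UTC). During Daylight Saving Time, the difference is 4 hours.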
iNaturalist’s 13 Iconic Taxonomic Names:
| Name | Taxonomic Level | Taxon Common Name |
|---|---|---|
| Protozoa | Kingdom | Protozoans |
| Chromista | Kingdom | Kelp, Diatoms, Allies |
| Plantae | Kingdom | Plants |
| Fungi | Kingdom | Fungi, Lichens |
| Animalia | Kingdom | Animals+ |
| Mollusca | Phylum | Mollusks |
| Mammalia | Class | Mammals |
| Aves | Class | Birds |
| Actinopterygii | Class | Ray-finned Fishes |
| Reptilia | Class | Reptiles |
| Amphibia | Class | Amphibians |
| Insecta | Class | Insects |
| Arachnida | Class | Arachnids |
+Organisms listed in the rows below Animalia are also technically in this Kingdom. However, iNaturalist uses the “Animalia” iconic taxon name as a “catch-all” for animals that don’t fall under any of the lower iconic taxon groups (Mollusca through Arachnida).
- After you have run ALL code for “MMC_getting-started.R”, go to File > Save
- Go to File > New File > R Script
  - An empty script file should open up in another tab
- Come back to this script (MMC_getting-started.R)
- Press Ctrl+A (select/highlight all), then Ctrl+C (copy)
- Go back to the blank script and press Ctrl+V (paste)
- In your new script, press Ctrl+F (find)
  - A search box should appear above your script window
- Press Ctrl+A to select ALL text in your script window
- Click on the white check box that says In selection
- In the search bar that says Find, type “MMC_”
- Click the box next to the Find bar that says All
  - This will highlight all instances of “MMC_” in the selected text
  - Quickly scroll through and check that only “MMC_” is highlighted
- In the bar that says Replace, type “BBC_”
- Once you are sure the steps above have been followed correctly, click All next to the bar that says Replace.
  - This will replace all instances of “MMC_” with “BBC_”
  - By changing these prefixes of our object and value names, we can use the same code without “saving over” the values and objects already saved in our Global Environment.
- Give another quick scroll to make sure the changes have been made correctly
- Save a copy of the updated script:
  - Go to File > Save As and rename the script “BBC_getting-started.R”
  - Make sure to use Save As and not Save. We don’t want to overwrite our MMC script by mistake!

This message will be repeated below the MMC tutorial as a reminder :)
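If you prefer code to menus, you can do the same find-and-replace in base R. This is a sketch, not part of the lab’s required steps. `copy_with_prefix()` is a hypothetical helper written for this illustration, not part of RStudio or any package. Save the MMC script first, because the helper reads the file on disk. Like Save As, it refuses to overwrite an existing file.

```r
# copy_with_prefix() is a hypothetical helper (not part of RStudio or any package).
# It reads a saved script, swaps every "MMC_" for "BBC_", and writes the result to a new file.
copy_with_prefix <- function(in_file, out_file, from = "MMC_", to = "BBC_") {
  # Like "Save As" instead of "Save": never overwrite an existing script
  if (file.exists(out_file)) stop("Refusing to overwrite ", out_file)
  script_lines <- readLines(in_file)
  # fixed = TRUE matches "MMC_" as plain, case-sensitive text rather than a regular expression
  new_lines <- gsub(from, to, script_lines, fixed = TRUE)
  writeLines(new_lines, out_file)
  invisible(new_lines)
}

# Usage, from the folder that holds the saved MMC script:
# copy_with_prefix("MMC_getting-started.R", "BBC_getting-started.R")
```

Just like the menu steps, this also changes “MMC_data.csv” to “BBC_data.csv” inside read.csv(). Presumably that is the intended name of the BBC export.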
→ If you haven’t already, open the R script file “MMC_getting-started.R” with RStudio.
When starting work on a new script or project, it’s always a good idea to clear your memory (the objects saved in your Global Environment):
rm(list = ls())
Before we load our data, we want to make sure all the packages we need are installed (install.packages()) and loaded (library()). Packages only need to be installed once, but they must be loaded each session. The install.packages() lines below are commented out with #; remove the # and run a line if that package is not installed yet. For this section of the lab, we will be using 3 packages: pivottabler, openxlsx, and dplyr.
#install.packages("pivottabler")
#install.packages("openxlsx")
#install.packages("dplyr")
library("pivottabler")
library("openxlsx")
library("dplyr")
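An alternative to commenting and uncommenting the install lines is to install only what is missing. This is a sketch. `ensure_packages()` is a hypothetical helper, not part of any package. It relies only on the base functions requireNamespace(), install.packages(), and library().

```r
# ensure_packages() is a hypothetical helper (not part of any package):
# it installs only the packages that are missing, then loads all of them.
ensure_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (instead of stopping) when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (any(!installed)) install.packages(pkgs[!installed])
  # character.only = TRUE lets library() accept package names stored as text
  for (pkg in pkgs) library(pkg, character.only = TRUE)
  invisible(pkgs)
}

# Usage for this part of the lab:
# ensure_packages(c("pivottabler", "openxlsx", "dplyr"))
```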
Setting the Working Directory:
Unlike with most common computer programs, you cannot simply “open” a data file within RStudio. You must write code that tells R WHERE the file is located and WHAT the file is called. We do this by setting the working directory, which tells R which folder to pull files from. You can set it in code with setwd(), or with the keyboard shortcut Ctrl+Shift+H, which lets you select the appropriate folder. We will use the keyboard shortcut this time. Select the folder location designated by your TA.
Now that you have selected your working directory, let’s check how R records its location:
getwd()
## [1] "C:/Users/Kelle/OneDrive/PhD_Files/Academic Year 2021-2022/Head TA/Fall 2021/BSC2011L/Updated Labs/Lab2/Scripts"
| Your working directory will be different from the one printed here, since this script was run on a different computer. Notice how R prints the name of your working directory. By following that same format, you could instead set it in code with the function setwd(), typing the name of your working directory inside the function’s parentheses in quotation marks (“…”).
| Example: setwd("C:/Users/Kelle/OneDrive/Desktop/Lab2")
Reading in the data:
“Reading in” is how we open files with R. For .csv files, we use the function read.csv(). Remember, we are working with our MMC data first. The code below tells R to load a .csv file called MMC_data.csv and assign it (<-) to an object named “MMC_data”. After running the code, you should see it appear under Data in your Global Environment.
MMC_data<-read.csv("MMC_data.csv")
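One typo in a column name can stop the script (see “Checking the Exported Data” above), so you may want R to check the names right after reading in the file. This is a sketch. `check_inat_columns()` is a hypothetical helper, not part of any package. It compares the data frame’s names against the 8 names in the column table.

```r
# The 8 column names listed in "Checking the Exported Data"
expected_columns <- c("time_observed_at", "time_zone", "captive_cultivated",
                      "latitude", "longitude", "scientific_name",
                      "common_name", "iconic_taxon_name")

# Hypothetical helper: stop with a clear message if any expected column is missing
check_inat_columns <- function(df, expected = expected_columns) {
  missing <- setdiff(expected, names(df))   # comparison is case-sensitive
  if (length(missing) > 0) {
    stop("Missing or misspelled column(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage: check_inat_columns(MMC_data)
```

Sometimes the first name shows up garbled, for example starting with “ï..” or “X.U.FEFF.”. That usually means the file was re-saved with a byte-order mark. Reading it with read.csv("MMC_data.csv", fileEncoding = "UTF-8-BOM") usually fixes this.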
Quick Checks:
Let’s take some quick first glances at the data. The function unique() lists the distinct entries in a particular column of a data frame. We can combine it with the function length(), which counts the number of elements in a vector. (Applied to a whole data frame, length() counts columns, not rows; nrow() counts rows.)
We can use the unique() function to give us a complete list of the iconic taxon names present in our data frame. If we wrap the length() function around it, length(unique()), we can quickly count how many different iconic taxon names are present in our data frame.
unique(MMC_data$iconic_taxon_name) #list of all iconic taxon names in the data
## [1] "Actinopterygii" "Aves" "Plantae" "Insecta"
## [5] "Reptilia" "Fungi" "Amphibia" "Mammalia"
## [9] "Arachnida" "Animalia" "Mollusca" ""
length(unique(MMC_data$iconic_taxon_name)) #How many different iconic taxon names are present
## [1] 12
Similarly, we can do the same for the common names to get a count of how many different common names are present in the data, and a comprehensive list of those common names.
unique(MMC_data$common_name) #list of all common names in the data
## [1] "Warmouth"
## [2] "Letourneuxi's Jewel Cichlid"
## [3] "Spotted Tilapia"
## [4] "Bluegill"
## [5] "Black-and-white Warbler"
## [6] "Prairie Warbler"
## [7] "Blue-headed Vireo"
## [8] ""
## [9] "sea hibiscus"
## [10] "Northern Cardinal"
## [11] "Zebra Longwing"
## [12] "Satinleaf"
## [13] "slash pine"
## [14] "shoebutton Ardisia"
## [15] "Julia Heliconian"
## [16] "Pigeon Plum"
## [17] "cabbage palmetto"
## [18] "Madagascar Periwinkle"
## [19] "Brown Anole"
## [20] "Shiny-leaved Wild Coffee"
## [21] "bracket fungi"
## [22] "Siam weed"
## [23] "sea grape"
## [24] "colicwood"
## [25] "Firebush"
## [26] "Ray-finned Fishes"
## [27] "Green Heron"
## [28] "Yellow-bellied Slider"
## [29] "Cooters"
## [30] "Fungi Including Lichens"
## [31] "Three-spotted Skipper"
## [32] "Cane Toad"
## [33] "Green Iguana"
## [34] "Northern Mockingbird"
## [35] "Brown Basilisk"
## [36] "White-crowned Sparrow"
## [37] "Clay-colored Sparrow"
## [38] "Ruby-throated Hummingbird"
## [39] "Common Yellowthroat"
## [40] "Empidonax Flycatchers"
## [41] "Gray Fox"
## [42] "Black-throated Blue Warbler"
## [43] "Ovenbird"
## [44] "Wilson's Warbler"
## [45] "Ruby-crowned Kinglet"
## [46] "American Coot"
## [47] "Pied-billed Grebe"
## [48] "Yellow-rumped Warbler"
## [49] "Boat-tailed Grackle"
## [50] "Osprey"
## [51] "Southern Ringneck Snake"
## [52] "Bark Anole"
## [53] "Dorantes Longtail"
## [54] "North American Racer"
## [55] "Band-winged Dragonlet"
## [56] "Belted Kingfisher"
## [57] "Northern Parula"
## [58] "Tersa Sphinx"
## [59] "Vinegar and Fruit Flies"
## [60] "Cotton Stainer Bugs"
## [61] "Common Cotton Stainer Bug"
## [62] "Cassius Blue"
## [63] "Mabel Orchard Orbweaver"
## [64] "Great Pondhawk"
## [65] "Florida Tussock Moth"
## [66] "Monarch"
## [67] "Blue Dasher"
## [68] "Needham's Skimmer"
## [69] "Atala"
## [70] "Everglades Racer"
## [71] "Sleepy Orange"
## [72] "Horace's Duskywing"
## [73] "Diaprepes Root Weevil"
## [74] "Fiery Skipper"
## [75] "Asian Lady Beetle"
## [76] "Halloween Pennant"
## [77] "Gulf Fritillary"
## [78] "Tropical Orbweaver"
## [79] "Butterflies and Moths"
## [80] "Knight Anole"
## [81] "Hammock Skipper"
## [82] "Eastern Phoebe"
## [83] "Common Green Jewel Fly"
## [84] "Large Milkweed Bug"
## [85] "Northern Plushback"
## [86] "Chimney Swift"
## [87] "Calcareous Morning-glory"
## [88] "Killdeer"
## [89] "trailing daisy"
## [90] "Tropical Checkered-Skipper"
## [91] "Umbrella Paper Wasps"
## [92] "Baracoa Skipper"
## [93] "Crambid Snout Moths"
## [94] "Blackpoll Warbler"
## [95] "Cape May Warbler"
## [96] "Orange-crowned Warbler"
## [97] "Cooper's Hawk"
## [98] "painted leaf"
## [99] "stately maiden fern"
## [100] "Bishop wood"
## [101] "pineland heliotrope"
## [102] "castor bean"
## [103] "Bitter Melon"
## [104] "Green Anole"
## [105] "Blue Jay"
## [106] "Gray Catbird"
## [107] "European Starling"
## [108] "Great Blue Heron"
## [109] "Northern Waterthrush"
## [110] "Clouded Skipper"
## [111] "Rice Stink Bug"
## [112] "Variegated Fritillary"
## [113] "Arabesque Orbweaver"
## [114] "Calligrapher Flies"
## [115] "Basidiomycete Fungi"
## [116] "Indigo Bunting"
## [117] "Monk Skipper"
## [118] "Spotted Sandpiper"
## [119] "Solitary Sandpiper"
## [120] "Red-shouldered Hawk"
## [121] "Dusky Herpetogramma Moth"
## [122] "shrubby false buttonweed"
## [123] "Brahminy Blindsnake"
## [124] "Rusty Millipede"
## [125] "Barn Swallow"
## [126] "Eastern Kingbird"
## [127] "Wandering Glider"
## [128] "Orange-spotted Flower Moth"
## [129] "Roseate Skimmer"
## [130] "Cerulean Warbler"
## [131] "Eastern Giant Swallowtail"
## [132] "Woodlice and Pillbugs"
## [133] "rustgills and gyms"
## [134] "Eastern Wood-Pewee"
## [135] "Common Raccoon"
## [136] "Warrior Beetles"
## [137] "Oleander Aphid"
## [138] "Blue-gray Gnatcatcher"
## [139] "Soil Centipedes"
## [140] "Common Gilled Mushrooms and Allies"
## [141] "bluemink"
## [142] "morning-glories"
## [143] "Assembly Moth"
## [144] "Melonworm Moth"
## [145] "Crab Spiders"
## [146] "Summer Tanager"
## [147] "Worm-eating Warbler"
## [148] "Large Orange Sulphur"
## [149] "Ornate Bella Moth"
## [150] "Cuban Grassquit"
## [151] "Red-eyed Vireo"
## [152] "shelf fungi"
## [153] "fringed sawgill"
## [154] "Black Saddlebags"
## [155] "Hooded Warbler"
## [156] "Yellow-throated Warbler"
## [157] "Baltimore Oriole"
## [158] "Blue Grosbeak"
## [159] "Merlin"
## [160] "Scarlet Tanager"
## [161] "Twilight Darner"
## [162] "Golden-winged Warbler"
## [163] "Swainson's Thrush"
## [164] "Milkweed Assassin Bug"
## [165] "Stink Bugs, Shield Bugs, and Allies"
## [166] "Hawaiian Beet Webworm Moth"
## [167] "Common Green Leafhopper"
## [168] "Milky Argyria Moth"
## [169] "House Flies and Allies"
## [170] "Calyptrate Flies"
## [171] "Carpet-grass Webworm Moth"
## [172] "Black-throated Green Warbler"
## [173] "Blackburnian Warbler"
## [174] "Tennessee Warbler"
## [175] "Speckled Duns"
## [176] "Chuck-will's-widow"
## [177] "Sharp-shinned Hawk"
## [178] "Brown Thrasher"
## [179] "Black Rat"
## [180] "White-lined Sphinx"
## [181] "Surinam Cockroach"
## [182] "dayflowers"
## [183] "amaranths"
## [184] "false daisy"
## [185] "Red Saddlebags"
## [186] "blue mistflower"
## [187] "oceanblue morning glory"
## [188] "Common Purslane"
## [189] "Insects"
## [190] "American Redstart"
## [191] "Chamberbitter"
## [192] "Bumblebee Millipede"
## [193] "Myrtle Warbler"
## [194] "Largeflower pink-sorrel"
## [195] "Coffee-loving Pyrausta Moth"
## [196] "Sawflies, Horntails, and Wood Wasps"
## [197] "Florida Carpenter Ant"
## [198] "Flatworms"
## [199] "Anoles"
## [200] "mustard family"
## [201] "straggler daisy"
## [202] "Cuban Tree Frog"
## [203] "Plant Bugs"
## [204] "Click Beetles"
## [205] "Armyworm Moths"
## [206] "Brazilian Leafhopper"
## [207] "Dark Flower Scarab"
## [208] "Banded Garden Spider"
## [209] "Yellow alder"
## [210] "Red Bay Psyllid"
## [211] "redbay"
## [212] "whitetop sedge"
## [213] "Field Copperleaf"
## [214] "Turkey Vulture"
## [215] "Cedar Waxwing"
## [216] "Beetles"
## [217] "largeflower Mexican clover"
## [218] "turkey tangle frogfruit"
## [219] "lanceleaf arrowhead"
## [220] "Bladder Snails"
## [221] "Animals"
## [222] "Mexican Primrose-willow"
## [223] "pickerelweed"
## [224] "stoneworts"
## [225] "dicots"
## [226] "Gumbo Limbo"
## [227] "plants"
## [228] "Long-legged Flies"
## [229] "maiden ferns"
## [230] "Eastern Mosquitofish"
## [231] "Blolly"
## [232] "common lantana"
## [233] "ladder fern"
## [234] "muscadine"
## [235] "Cocoplum"
## [236] "southern live oak"
## [237] "flowering plants"
## [238] "Brazilian pepper"
## [239] "heathers, balsams, primroses, and allies"
## [240] "Red Tasselflower"
## [241] "Florida Strangler Fig"
## [242] "palms, bullanocks, and allies"
## [243] "yellowtops"
## [244] "oaks"
## [245] "Nettletree"
## [246] "Coontie"
## [247] "Wild Tamarind"
## [248] "True Toads"
## [249] "water-lilies"
## [250] "Frogs and Toads"
## [251] "alligator flag"
## [252] "Locustberry"
## [253] "graceful spurge"
## [254] "Ageratums"
## [255] "blue-eyed grasses"
## [256] "dogfennel"
## [257] "golden polypody"
## [258] "Figs"
## [259] "common ragweed"
## [260] "bonesets, blazingstars, and allies"
## [261] "fourspike heliotrope"
## [262] "lilac tasselflower"
## [263] "False Mastic"
## [264] "Myrtle-of-the-River"
## [265] "Airplants"
## [266] "mahogany family"
## [267] "Mammals"
## [268] "pale passionflower"
## [269] "Royal Palm"
## [270] "dahoon holly"
## [271] "Sleepy Morning"
## [272] "Pine fern"
## [273] "Monk Orchid"
## [274] "marsh fern"
## [275] "Necklace pod"
## [276] "wild leadwort"
## [277] "Virginia creeper"
## [278] "bindweed family"
## [279] "ticktrefoils"
## [280] "Higher Ascomycetes"
## [281] "Western Honey Bee"
## [282] "palms"
## [283] "wild potato vine"
## [284] "Hairy Hexagonia"
## [285] "Velvet bean"
## [286] "Romerillo"
## [287] "ferns"
## [288] "saw palmetto"
## [289] "Narrowleaf Yellowtops"
## [290] "rockweed"
## [291] "Moses-in-the-cradle"
## [292] "saw greenbrier"
## [293] "Bayberries"
## [294] "Scarabs"
## [295] "Spinybacked Orbweaver"
## [296] "Florida swampprivet"
## [297] "Monk Parakeet"
## [298] "Plushback Flies"
## [299] "Leptosporangiate Ferns"
## [300] "Bladderworts"
## [301] "Papaya"
## [302] "Tropical sage"
## [303] "Carpenter Ants"
## [304] "Sea Almond"
## [305] "Spotless Lady Beetle"
## [306] "grasses"
## [307] "Scorpion's-tail"
## [308] "orchard spider and allies"
## [309] "high-latitude oaks"
## [310] "nightshades and allies"
## [311] "American hard pines"
## [312] "Grass Skippers"
## [313] "South Florida slash pine"
## [314] "Bahama Brake"
## [315] "grasses, sedges, cattails, and allies"
## [316] "Birds"
## [317] "White Peacock"
## [318] "Spiders"
## [319] "Skimmers"
## [320] "Eastern Gray Squirrel"
## [321] "Red Imported Fire Ant"
## [322] "True Bugs"
## [323] "Cannabis, Hackberries, Hops, and Allies"
## [324] "Giant Sweet Potato Bug"
## [325] "Indian banyan"
## [326] "Water, Rove, Scarab, Long-horned, Leaf, and Snout Beetles"
## [327] "Cichlids"
## [328] "Spiny Fiddlewood"
## [329] "Oyster Mushroom"
## [330] "Long-tailed Skipper"
## [331] "Tricolored Heron"
## [332] "Pin-tailed Pondhawk"
## [333] "White-eyed Vireo"
## [334] "Porterweed"
## [335] "Minor Blueleg Centipede"
## [336] "Cinnabar Bracket"
## [337] "Stubby Hover Fly"
## [338] "Bagworm Moths"
## [339] "Io Moth"
## [340] "Abbot's Bagworm Moth"
## [341] "Black-olive Caterpillar Moth"
## [342] "Common Tan Wave"
## [343] "Twice-stabbed Lady Beetles"
## [344] "Genista Broom Moth"
## [345] "Ceraunus Blue"
## [346] "greater plantain"
## [347] "Mallow Scrub-Hairstreak"
## [348] "Caribbean scoliid wasp"
## [349] "Cabbage Webworm Moth"
## [350] "Southern Green Stink Bug"
## [351] "Caribbean Fruit Fly"
## [352] "Soldier Flies"
## [353] "Palm Flatid Planthopper"
## [354] "Yellow Mocis Moth"
## [355] "Small-spotted Fairy Lady Beetle"
## [356] "Planarians"
## [357] "New Guinea Flatworm"
## [358] "Yellow-legged Hover Fly"
## [359] "white octoblepharum moss"
## [360] "Moderately Smooth Warrior Beetle"
## [361] "Euphorbia Bug"
## [362] "Shoestring Fern"
## [363] "Lauxaniid Flies"
## [364] "Six-Spotted Carpenter Ant"
## [365] "Acalyptrate Flies"
## [366] "Red-banded Stink Bug"
## [367] "Phaon Crescent"
## [368] "Citrine Forktail"
## [369] "Short-horned Grasshoppers"
## [370] "Geometer Moths"
## [371] "Spider Wasps"
## [372] "Fragile Forktail"
## [373] "Three-cornered Alfalfa Hopper"
## [374] "Red Caustic-creeper"
## [375] "hyssop spurge"
## [376] "Eastern Amberwing"
## [377] "West Indian Flatid Planthopper"
## [378] "Paradise Tree"
## [379] "Chinese Crown Orchid"
## [380] "Florida Fast Woodlouse"
## [381] "Darkling Beetles"
## [382] "sword ferns"
## [383] "Goose Grass"
## [384] "painted spurge"
## [385] "Rattail"
## [386] "Rattlepods"
## [387] "legumes"
## [388] "Phasey Bean"
## [389] "Carolina ruellia"
## [390] "pale bitter bolete"
## [391] "Greenbottle Flies"
## [392] "orange bladder"
## [393] "Brazilian Skipper"
## [394] "Candlesnuff Fungus"
## [395] "White-jawed Jumping Spider"
## [396] "Metallic Flea Beetles"
## [397] "Common Milkcaps"
## [398] "indigo milk cap"
## [399] "Bushbeans"
## [400] "Primrose-willows"
## [401] "hairy indigo"
## [402] "sunshine mimosa"
## [403] "mosses"
## [404] "Florida firebush"
## [405] "broomsedge bluestem"
## [406] "Cochineal Nopal Cactus"
## [407] "Ocola Skipper"
## [408] "Bay-breasted Warbler"
## [409] "Yellows and Sulphurs"
## [410] "Gray-cheeked Thrush"
## [411] "toothpetal false reinorchid"
## [412] "Perching Birds"
## [413] "Bur Marigolds"
## [414] "Southern Swamp Crinum"
## [415] "Regal Darner"
## [416] "day jessamine"
## [417] "Cure-for-all"
## [418] "Willow Bustic"
## [419] "pines"
## [420] "bluestems"
## [421] "air potato"
## [422] "Mosquitoes"
## [423] "Button Sage"
## [424] "Leavenworth's Tickseed"
## [425] "Eastern Leaf-footed Bug"
## [426] "Mexican Alvaradoa"
## [427] "Lichens"
## [428] "Loquat"
## [429] "napier grass"
## [430] "leadtrees"
## [431] "American beautyberry"
## [432] "Wood ear fungi"
## [433] "Blue-striped Spreadwing"
## [434] "soapberries, cashews, mahoganies, and allies"
## [435] "Coffee Senna"
## [436] "Blue Porterweed"
## [437] "Cloudless Sulphur"
## [438] "Lentinus sect. Lentinus"
## [439] "manyflower marshpennywort"
## [440] "Magnolia Green Jumping Spider"
## [441] "Scoliid Wasps"
## [442] "Cluster-leafs"
## [443] "tropical milkweed"
## [444] "Indian blanket"
## [445] "splitgill mushroom"
## [446] "woodcaps and sawgills"
## [447] "Great Southern White"
## [448] "Black-sided Calligrapher"
## [449] "Florida Calligrapher"
## [450] "Neotropical Red-shouldered Stink Bug"
## [451] "Bermudagrass Leafhopper"
## [452] "Centipede Tongavine"
## [453] "Bushy Bluestem"
## [454] "Dragonflies"
## [455] "Vasey Grass"
## [456] "Java Glorybower"
## [457] "nerved witchgrass"
## [458] "Twining Soldierbush"
## [459] "West Indian Lilac"
## [460] "limewater brookweed"
## [461] "Swamp Bay"
## [462] "black calabash"
## [463] "Leavenworth's goldenrod"
## [464] "creeping beggarweed"
## [465] "Wire Bluestem"
## [466] "Violet Crabgrass"
## [467] "monocots"
## [468] "sea torchwood"
## [469] "Pepper Cinnamon"
## [470] "Jumping Spiders"
## [471] "Hydrilla"
## [472] "Candy Cap"
## [473] "Red-bellied Woodpecker"
## [474] "pinwheels and parachute mushrooms"
## [475] "rosy navel"
## [476] "Swamplilies"
## [477] "Mustard Yellow Polypore"
## [478] "Conifer-base Polypore"
## [479] "Sensitive Plant"
## [480] "Orange-barred Sulphur"
## [481] "Evening Skimmer"
## [482] "Sunshine Tree"
## [483] "Slippery Jacks"
## [484] "snow fungus"
## [485] "Burma Reed"
length(unique(MMC_data$common_name)) #How many different common names are present
## [1] 485
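Both lists above include an empty string (""), which is how read.csv() stores a blank cell from the export. That blank counts toward the totals: it is one of the 12 iconic taxon names and one of the 485 common names. To count only named entries, you can drop the blanks first. The sketch below uses a small made-up vector rather than the lab data.

```r
# Made-up example vector; in the lab you would use MMC_data$iconic_taxon_name
taxa <- c("Aves", "Plantae", "", "Aves", "Insecta", "")

length(unique(taxa))              # 4: "Aves", "Plantae", "", "Insecta"
named_taxa <- taxa[taxa != ""]    # drop blank entries
length(unique(named_taxa))        # 3
```

With the lab data, the same idea reads length(unique(MMC_data$iconic_taxon_name[MMC_data$iconic_taxon_name != ""])).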
Creating Pivot Tables:
To quickly browse our observation data, let’s build a pivot table. As a reminder, anything you see after a # is just a note (a comment), not executable code.
MMC_pivot<- PivotTable$new() #tell R you want to make a new pivot table called "MMC_pivot"
MMC_pivot$addData(MMC_data) #tell R the data frame you want to use
MMC_pivot$addRowDataGroups("iconic_taxon_name") #tell R which data columns you want to appear on the rows
MMC_pivot$addRowDataGroups("common_name", addTotal = TRUE) #tell R which data you want to appear as a subset on the rows (nested under each taxon, with total rows)
MMC_pivot$defineCalculation(calculationName="Total Observations", summariseExpression="n()") #counts the number of times each common name per taxon group was observed
MMC_pivot$renderPivot() #tells R that you are ready for it to create and display your pivot table
After you run the code, you should see a pivot table in the Viewer tab that looks similar to the one below:
If everything looks OK, let’s go ahead and save a copy of your table into an Excel Workbook. First, we create an empty workbook with createWorkbook(creator=Sys.getenv("USERNAME")).
MMC_pivot_excel <- createWorkbook(creator = Sys.getenv("USERNAME"))#create an empty workbook object named MMC_pivot_excel; creator records your computer user name as the author. Don't change "USERNAME"
addWorksheet(MMC_pivot_excel, "MMC_taxon_data")#tell R what you would like the tab/worksheet of the workbook to be called
MMC_pivot$writeToExcelWorksheet(wb=MMC_pivot_excel, wsName="MMC_taxon_data",
topRowNumber=1, leftMostColumnNumber=1, applyStyles=FALSE)#tell R which workbook you would like to use, and the name of the tab/worksheet of that workbook, and where you would like the table to be placed on the page
saveWorkbook(MMC_pivot_excel, file="MMC_pivot.xlsx", overwrite = TRUE)#Saves the file in your working directory under the name that you have chosen for "file="
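You can cross-check the pivot table’s per-taxon totals with dplyr’s count(), which is already loaded. The sketch uses a small made-up data frame. Swapping in MMC_data should give one row per iconic taxon, with totals that should match the pivot table’s total rows. Adding common_name inside count() gives per-name counts instead.

```r
library(dplyr)

# Made-up observations standing in for MMC_data
obs <- data.frame(
  iconic_taxon_name = c("Aves", "Aves", "Aves", "Plantae"),
  common_name       = c("Osprey", "Osprey", "Blue Jay", "Coontie")
)

# Number of observations per iconic taxon
taxon_totals <- count(obs, iconic_taxon_name, name = "Total_Observations")
taxon_totals
```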
Converting and Extracting Time:
Before we get started with calculations, let’s convert the “time_observed_at” column out of UTC and into a more meaningful time zone. Several different time zones are listed in the time_zone column (you can list them with unique(MMC_data$time_zone)). However, we know that all observations in the MMC_data.csv file were made at MMC. Since MMC is located in the “US/Eastern” time zone, it is OK to assume that all observations were originally made in that time zone. Here, we first make a new column, local_time, in the “MMC_data” data frame to store the “local time” once it is converted out of UTC.
To do this, we read the UTC stamps with format="%Y-%m-%d %H:%M:%S", tz='UTC' and then re-express them with tz='US/Eastern'.
MMC_data$local_time<-MMC_data$time_observed_at #Create new column for local date time
MMC_data$local_time <- as.POSIXct(MMC_data$local_time, format="%Y-%m-%d %H:%M:%S", tz='UTC') #put in posixct
MMC_data$local_time<-format(MMC_data$local_time,tz='US/Eastern') #format to local time zone (this step will change the format back to character)
MMC_data$local_time<-as.POSIXct(MMC_data$local_time,format="%Y-%m-%d %H:%M",tz='US/Eastern') #put back in posixct
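To see what the three conversion lines do, you can run the same steps on one value. This sketch uses the first row of the export (the Warmouth observation), so the result can be compared with its local_time in the table further down. The last part is an alternative, not the lab’s method. It relabels the time zone attribute instead, which keeps the seconds that the lab’s "%H:%M" step drops.

```r
# First row of the export (Warmouth), as it appears in time_observed_at
utc_text <- "2014-10-10 13:19:25 UTC"

# Step 1: read the text as a UTC date-time (R ignores the trailing " UTC")
t_utc <- as.POSIXct(utc_text, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Step 2: write it out as Eastern clock time (this is character text)
t_text <- format(t_utc, tz = "US/Eastern")

# Step 3: read it back as an Eastern date-time; "%H:%M" keeps only hours and minutes
t_local <- as.POSIXct(t_text, format = "%Y-%m-%d %H:%M", tz = "US/Eastern")
format(t_local, "%Y-%m-%d %H:%M:%S")   # "2014-10-10 09:19:00"

# Alternative: same instant, displayed in Eastern time, seconds kept
t_alt <- t_utc
attr(t_alt, "tzone") <- "US/Eastern"
format(t_alt, "%Y-%m-%d %H:%M:%S")     # "2014-10-10 09:19:25"
```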
Let’s practice extracting time by isolating the year, month, and hour of each observation and copy those data into their own respective columns titled “year”, “month”, and “hour”.
MMC_data$year<-format(MMC_data$local_time, format = "%Y") #add column for year
MMC_data$month<-format(MMC_data$local_time, format = "%m") #add column for month
MMC_data$hour<-format(MMC_data$local_time, format = "%H") #add column for time of day
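format() returns text, so the new year, month, and hour columns hold values like "09" rather than numbers. The leading zeros make them sort correctly as text. For arithmetic or numeric plots, though, you may want to convert them with as.integer(). This sketch uses made-up hour values.

```r
# Made-up values like those in MMC_data$hour; format() returns text, not numbers
hours_text <- c("09", "14", "07", NA)
hours_num <- as.integer(hours_text)   # 9, 14, 7, NA (missing times stay missing)
mean(hours_num, na.rm = TRUE)         # 10
```

For example, MMC_data$hour_num <- as.integer(MMC_data$hour) would add a numeric copy of the hour column. hour_num is a hypothetical column name, not one used elsewhere in the lab.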
Notice how the bottom two rows of column 1, time_observed_at, have no data. This is because the observer failed to record the information when they submitted their upload.
| time_observed_at | time_zone | captive_cultivated | latitude | longitude | scientific_name | common_name | iconic_taxon_name | local_time | year | month | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014-10-10 13:19:25 UTC | Eastern Time (US & Canada) | false | 25.75595 | -80.37972 | Lepomis gulosus | Warmouth | Actinopterygii | 2014-10-10 09:19:00 | 2014 | 10 | 09 |
| 2015-05-20 11:47:58 UTC | Eastern Time (US & Canada) | false | 25.75583 | -80.37973 | Hemichromis letourneuxi | Letourneuxi’s Jewel Cichlid | Actinopterygii | 2015-05-20 07:47:00 | 2015 | 05 | 07 |
| 2014-12-18 15:08:26 UTC | Eastern Time (US & Canada) | false | 25.75572 | -80.37967 | Pelmatolapia mariae | Spotted Tilapia | Actinopterygii | 2014-12-18 10:08:00 | 2014 | 12 | 10 |
| 2014-12-18 14:47:43 UTC | Eastern Time (US & Canada) | false | 25.75585 | -80.37971 | Lepomis macrochirus | Bluegill | Actinopterygii | 2014-12-18 09:47:00 | 2014 | 12 | 09 |
|  | Eastern Time (US & Canada) | false | 25.75432 | -80.37931 | Mniotilta varia | Black-and-white Warbler | Aves | NA | NA | NA | NA |
|  | Eastern Time (US & Canada) | false | 25.75447 | -80.37908 | Setophaga discolor | Prairie Warbler | Aves | NA | NA | NA | NA |
Let’s save a copy of the time-related columns of this data frame as MMC_timedata, in case we want to reference these data later.
Since this data frame is centered around time, we will also remove any rows where there was no time of observation, and therefore local time, recorded.
Finally, we will save MMC_timedata to our working directory as a .csv named MMC_timedata.csv.
MMC_timedata<-select(MMC_data,"local_time" ,"year", "month", "hour", "iconic_taxon_name", "common_name")#select the columns we want to keep
MMC_timedata<-MMC_timedata[!is.na(MMC_timedata$local_time),] #remove all rows where there is no local time
write.csv(MMC_timedata,"MMC_timedata.csv", row.names = FALSE) #Save a copy of MMC_timedata data frame as a .csv
Notice how rows 5-7 are now excluded from the data frame:
|  | local_time | year | month | hour | iconic_taxon_name | common_name |
|---|---|---|---|---|---|---|
| 1 | 2014-10-10 09:19:00 | 2014 | 10 | 09 | Actinopterygii | Warmouth |
| 2 | 2015-05-20 07:47:00 | 2015 | 05 | 07 | Actinopterygii | Letourneuxi’s Jewel Cichlid |
| 3 | 2014-12-18 10:08:00 | 2014 | 12 | 10 | Actinopterygii | Spotted Tilapia |
| 4 | 2014-12-18 09:47:00 | 2014 | 12 | 09 | Actinopterygii | Bluegill |
| 8 | 2015-09-11 14:24:00 | 2015 | 09 | 14 | Plantae | |
| 9 | 2015-09-11 14:29:00 | 2015 | 09 | 14 | Plantae | sea hibiscus |
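To see how many observations the NA filter dropped, compare the row counts before and after. The sketch below uses a small made-up data frame.

```r
# Made-up data standing in for MMC_data: rows 2 and 4 have no recorded time
toy <- data.frame(
  local_time = as.POSIXct(c("2014-10-10 09:19", NA, "2015-09-11 14:24", NA),
                          format = "%Y-%m-%d %H:%M", tz = "US/Eastern"),
  common_name = c("species A", "species B", "species C", "species D")
)

sum(is.na(toy$local_time))               # rows with no local time: 2
kept <- toy[!is.na(toy$local_time), ]    # same filter used to build MMC_timedata
nrow(toy) - nrow(kept)                   # rows removed: 2
```

With the lab data, sum(is.na(MMC_data$local_time)) should equal nrow(MMC_data) - nrow(MMC_timedata).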
Biodiversity Measurements and Indices:
Summary table of biodiversity measurements and indices:
| Measurement/Index | Formula |
|---|---|
| Species Richness (S) | \(S =\) number of species |
| Shannon-Wiener Diversity Index (H) | \(H = -\sum_{i=1}^{S}{p_i \ln(p_i)}\) |
| Species Evenness (E) | \(E = \frac{H}{H_{MAX}}\) |
| Effective Number of Species (ENS) | \(ENS = e^H\) |
Other parameters and definitions needed to calculate the items above:
| Metric | Description | Formula |
|---|---|---|
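| Total Abundance (\(N_{TOTAL}\)) | Total number of individuals across all species in the group (in this lab, the number of observations in an iconic taxon group) | \(N_{TOTAL} = \sum{n_i}\) |
| \(n_i\) | Total number of individuals of species \(i\) | |
| \(p_i\) | Proportion of individuals of species \(i\), compared to \(N_{TOTAL}\) | \(p_i = \frac{n_i}{N_{TOTAL}}\) |
| \(H_{MAX}\) | Maximum possible value of \(H\), reached when all \(S\) species are equally abundant | \(H_{MAX} = \ln(S)\) |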
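As a worked example of the formulas, take a group with two species observed 3 times and 1 time. These counts are inferred, not read from the data. They are the only way to split 4 observations between 2 species that gives the H ≈ 0.562 reported for Mollusca in the MMC_diversity table later in this section. Note that R’s log() is the natural log, ln.

```r
# Two species observed 3 times and 1 time (inferred to match the Mollusca row)
n_i <- c(3, 1)                  # observations per species
N_total <- sum(n_i)             # 4
p_i <- n_i / N_total            # 0.75, 0.25
H <- -sum(p_i * log(p_i))       # Shannon-Wiener H
S <- length(n_i)                # species richness: 2
H_max <- log(S)                 # H if both species were equally common
E <- H / H_max                  # evenness
ENS <- exp(H)                   # effective number of species
round(c(H = H, Hmax = H_max, E = E, ENS = ENS), 4)
# H = 0.5623, Hmax = 0.6931, E = 0.8113, ENS = 1.7548
```

An ENS of about 1.75 means this group is as diverse as a community of roughly 1.75 equally common species. The uneven 3:1 split pulls it below the richness of 2.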
The chunk of code below might look intimidating, but it’s OK! Remember, we don’t expect you to know the complete ins and outs of the “behind the scenes” work happening in the code. You can follow along with each step by reading the notes that follow the #. Note that every index is calculated separately for each iconic taxon group, and each common name is treated as a “species”.
MMC_spp_rich <- MMC_data %>%
group_by(iconic_taxon_name) %>%
summarise(S = length(unique(common_name)))#Creates column with each unique taxon ID with the corresponding species richness of each taxon
MMC_sum_taxoN <- MMC_data %>%
group_by(iconic_taxon_name) %>%
summarise(Total_Abundance_per_Taxon = n()) #Creates a column with each unique taxon ID with the corresponding total abundance (total observations) for each taxon
MMC_var<-MMC_spp_rich %>%
inner_join(., MMC_sum_taxoN) #merges the previous two data frames by the list of unique taxon
#"." after inner_join is a place holder that denotes it is the same data frame that is listed above (short hand)
MMC_spp_abundance <- MMC_data %>%
group_by(iconic_taxon_name, common_name) %>%
summarise(Species_Abundance = n()) #Creates a data frame with the complete list of common names and their summed abundances
MMC_biod<-inner_join(MMC_spp_abundance,MMC_var)#Join MMC_spp_abundance with MMC_var
MMC_biod$Pi<-(MMC_biod$Species_Abundance/MMC_biod$Total_Abundance_per_Taxon)#Calculate Pi
MMC_biod$lnPi<-log(MMC_biod$Pi)#Calculate ln(Pi)
MMC_biod$PixlnPi<-MMC_biod$Pi*MMC_biod$lnPi#Calculate Pi*ln(Pi)
MMC_diversity<-as.data.frame(MMC_var)#Make new data frame for final numbers
MMC_H <- MMC_biod %>%
group_by(iconic_taxon_name) %>%
summarise(H=(sum(PixlnPi))*(-1))#Calculate H
MMC_diversity<-inner_join(MMC_diversity,MMC_H)#Append H to MMC_diversity
MMC_Hmax <- MMC_biod %>%
group_by(iconic_taxon_name) %>%
summarise(Hmax=log(first(S)))#Calculate Hmax; S repeats on every row of a taxon, so first(S) gives one row per taxon
MMC_diversity<-inner_join(MMC_diversity,MMC_Hmax)#Append Hmax to MMC_diversity
MMC_diversity$E<-(MMC_diversity$H)/(MMC_diversity$Hmax)#Calculate E and append it to MMC_diversity
MMC_diversity$ENS<-exp(MMC_diversity$H)#Calculate ENS and append it to MMC_diversity
MMC_diversity data frame:
| iconic_taxon_name | S | Total_Abundance_per_Taxon | H | Hmax | E | ENS |
|---|---|---|---|---|---|---|
|  | 1 | 6 | 0.0000000 | 0.0000000 | NaN | 1.000000 |
| Actinopterygii | 7 | 12 | 1.8200760 | 1.9459101 | 0.9353340 | 6.172327 |
| Amphibia | 4 | 11 | 1.2882523 | 1.3862944 | 0.9292776 | 3.626443 |
| Animalia | 11 | 26 | 1.9203009 | 2.3978953 | 0.8008277 | 6.823011 |
| Arachnida | 12 | 27 | 2.2585197 | 2.4849066 | 0.9088952 | 9.568914 |
| Aves | 73 | 152 | 4.0463972 | 4.2904594 | 0.9431151 | 57.191039 |
| Fungi | 29 | 75 | 3.0200424 | 3.3672958 | 0.8968747 | 20.492160 |
| Insecta | 143 | 431 | 4.0213775 | 4.9628446 | 0.8102969 | 55.777887 |
| Mammalia | 5 | 18 | 1.2260761 | 1.6094379 | 0.7618039 | 3.407831 |
| Mollusca | 2 | 4 | 0.5623351 | 0.6931472 | 0.8112781 | 1.754765 |
| Plantae | 191 | 667 | 4.6527333 | 5.2522734 | 0.8858513 | 104.871243 |
| Reptilia | 13 | 104 | 2.0426519 | 2.5649494 | 0.7963712 | 7.711031 |
Great! Now let's save a copy of the MMC_diversity data frame as a .csv.
write.csv(MMC_diversity,"MMC_diversity.csv", row.names = FALSE)
During that big chunk of code, we created another data frame: MMC_spp_abundance. This data frame is similar to the pivot table we created earlier and was created by the code:
MMC_spp_abundance <- MMC_data %>%
group_by(iconic_taxon_name, common_name) %>%
summarise(Species_Abundance = n())
First 6 rows of data frame MMC_spp_abundance:
| iconic_taxon_name | common_name | Species_Abundance |
|---|---|---|
|  |  | 6 |
| Actinopterygii | Bluegill | 3 |
| Actinopterygii | Cichlids | 1 |
| Actinopterygii | Eastern Mosquitofish | 2 |
| Actinopterygii | Letourneuxi’s Jewel Cichlid | 1 |
| Actinopterygii | Ray-finned Fishes | 1 |
Let’s save a copy of data frame MMC_spp_abundance as a .csv, also!
write.csv(MMC_spp_abundance,"MMC_spp_abundance.csv", row.names = FALSE)
Before we wrap up with MMC_data.csv, let's write a .csv with specific columns from the MMC_data data frame to use in Part 3: Mapping and Visualizing Data in Google Earth. We will call the data frame MMC_latlon_year and save it as "MMC_latlon_year.csv" in our working directory. The data frame will include only the latitude, longitude, year, and iconic_taxon_name columns:
# Make a data frame with only latitude, longitude, year, and iconic_taxon_name
MMC_latlon_year <- MMC_data %>% select(latitude,longitude, year, iconic_taxon_name)
#Save a copy of MMC_latlon_year as a .csv
write.csv(MMC_latlon_year,"MMC_latlon_year.csv", row.names = FALSE)
First 6 rows of data frame MMC_latlon_year:
| latitude | longitude | year | iconic_taxon_name |
|---|---|---|---|
| 25.75595 | -80.37972 | 2014 | Actinopterygii |
| 25.75583 | -80.37973 | 2015 | Actinopterygii |
| 25.75572 | -80.37967 | 2014 | Actinopterygii |
| 25.75585 | -80.37971 | 2014 | Actinopterygii |
| 25.75432 | -80.37931 | NA | Aves |
| 25.75447 | -80.37908 | NA | Aves |
NOTICE: We have now COMPLETED all code for MMC_data! At this point, you should have the following files in the folder designated as your working directory:
- MMC_pivot.xlsx
- MMC_timedata.csv
- MMC_diversity.csv
- MMC_spp_abundance.csv
- MMC_latlon_year.csv
Save your R script now: go to File > Save.
- Go to File > New File > R Script.
  - An empty script file should open up in another tab.
- Come back to this script (MMC_getting-started.R).
- Press Ctrl+A (select/highlight all), then Ctrl+C (copy).
- Go back to the new, blank script and press Ctrl+V (paste).
- In your new script, press Ctrl+F (find).
  - A search box should appear above your script window.
- Press Ctrl+A to select ALL text in your script window.
- Click on the white check box that says In selection.
- In the search bar that says Find, type “MMC_”.
- Click the box next to the Find bar that says All.
  - This will highlight all instances of “MMC_” in the selected text.
  - Quickly scroll through and check that only “MMC_” is highlighted.
- In the bar that says Replace, type “BBC_”.
- Once you are sure the steps above have been followed correctly, click All after the bar that says Replace.
  - This will replace all instances of “MMC_” with “BBC_”.
  - By changing these prefixes of our object and value names, we can use the same code without “saving over” the values and objects already saved in our Global Environment.
- Give another quick scroll to make sure the changes have been made correctly.
- Save a copy of the updated script:
  - Go to File > Save As and rename the script “BBC_getting-started.R”.
  - Make sure to use Save As and not Save. We don’t want to overwrite our MMC script by mistake!
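If you prefer typing to clicking, you can also do the copy-and-rename steps above in base R. This is an optional alternative and is not part of the lab. copy_with_prefix() is a hypothetical helper, and the usage line assumes MMC_getting-started.R is in your working directory. Like Replace All, it changes every "MMC_", including file names such as "MMC_data.csv", which is what we want here.

```r
# Optional sketch: copy a script, replacing every "MMC_" with "BBC_".
# copy_with_prefix() is a hypothetical helper, not part of the lab script.
copy_with_prefix <- function(in_file, out_file, old = "MMC_", new = "BBC_") {
  if (file.exists(out_file)) {
    stop("Output file already exists: ", out_file)  # never overwrite an existing script
  }
  src <- readLines(in_file, warn = FALSE)
  out <- gsub(old, new, src, fixed = TRUE)          # fixed = TRUE: match the literal text, not a pattern
  writeLines(out, out_file)
  # Count the replacements: characters removed when every `old` is deleted, divided by its length
  n_replaced <- (sum(nchar(src)) - sum(nchar(gsub(old, "", src, fixed = TRUE)))) / nchar(old)
  message(n_replaced, " occurrence(s) of \"", old, "\" replaced")
  invisible(n_replaced)
}

# Usage (from the console, with your working directory set):
# copy_with_prefix("MMC_getting-started.R", "BBC_getting-started.R")
```

The helper refuses to overwrite an existing file, which matches the advice to use Save As rather than Save.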
→ If you haven’t already, open the R script file “BBC_getting-started.R” with RStudio.
Remember, packages only need to be installed once, but they must be loaded during each session. Since we have already installed these packages, put a # in front of each of the three lines of install.packages code. These lines:
install.packages("pivottabler")
install.packages("openxlsx")
install.packages("dplyr")
should now read:
#install.packages("pivottabler")
#install.packages("openxlsx")
#install.packages("dplyr")
This will retain the lines of code in case we want to reference them again, but will prevent them from being executed by R.
However, we DO still want to clear the memory from our Global Environment pane. Since we are now working with the new script, “BBC_getting-started.R”, we want to make sure that the code in this script isn’t accidentally dependent on, or otherwise influenced by, the code from the previous script, “MMC_getting-started.R”.
Therefore, we will leave rm(list = ls()) as is.
One of the many perks of working with RStudio and other programming software, rather than programs such as Google Sheets or Microsoft Excel, is that once the code is written, you can execute the entire script at the press of a button (OK, two buttons) instead of clicking through everything again and again.
We will still have the step-by-step instructions and explanations for each line of code below, as with the previous script, but this time we are going to run the entire script all at once!
- Press Ctrl+A (select all).
- Press Ctrl+Enter (run the selected lines).
  - It might take a few moments, but you should now have the BBC version of the outputs we created with the last script.
At this point, you should have the following files in the folder designated as your working directory:
- BBC_pivot.xlsx
- BBC_timedata.csv
- BBC_diversity.csv
- BBC_spp_abundance.csv
- BBC_latlon_year.csv
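You can also confirm that the outputs exist without opening the folder by asking R directly. The sketch below uses file.exists(). check_outputs() is a hypothetical helper name, and the usage line checks the BBC files that this script writes.

```r
# Sketch: report which expected output files are present in a folder.
# check_outputs() is a hypothetical helper, not part of the lab script.
check_outputs <- function(files, dir = getwd()) {
  found <- file.exists(file.path(dir, files))  # TRUE/FALSE for each file
  names(found) <- files
  if (!all(found)) {
    warning("Not found: ", paste(files[!found], collapse = ", "))
  }
  found
}

# Usage:
# check_outputs(c("BBC_pivot.xlsx", "BBC_timedata.csv", "BBC_diversity.csv",
#                 "BBC_spp_abundance.csv", "BBC_latlon_year.csv"))
```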
Setting the Working Directory:
Unlike with most common computer programs, you cannot simply “open” a data file within RStudio. You must code for it by telling R WHERE the file is located and WHAT the file is called. We do this by setting the working directory, which tells R which folder to read files from (and save files to). You can do this in code with setwd() or with the keyboard shortcut Ctrl+Shift+H, then selecting the appropriate folder. We will use the keyboard shortcut this time. Select the folder location designated by your TA.
Now that you have selected your working directory, let's check how R records our working directory location:
getwd()
## [1] "C:/Users/Kelle/OneDrive/PhD_Files/Academic Year 2021-2022/Head TA/Fall 2021/BSC2011L/Updated Labs/Lab2/Scripts"
Your working directory will be different from what is printed here, since this script was run on a different computer. Notice how R prints the name of your working directory. By following that same format, you could instead set it in your code with the function setwd(), typing the name of your working directory inside the function's parentheses in quotation marks (“…”).
Example: setwd("C:/Users/Kelle/OneDrive/Desktop/Lab2")
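The lines below show how setwd() behaves. They use a temporary folder in place of the folder your TA designated (lab_dir is hypothetical). setwd() returns the previous working directory, so you can switch back afterward. If you copy a path from Windows File Explorer, change its backslashes to forward slashes, or double them ("\\"), before pasting it between the quotation marks. R treats a single backslash as the start of an escape sequence.

```r
# Sketch: a temporary folder stands in for your TA-designated folder (hypothetical).
lab_dir <- file.path(tempdir(), "Lab2")
dir.create(lab_dir, showWarnings = FALSE)

old_wd <- setwd(lab_dir)  # R now reads from and saves to lab_dir; the old folder is returned
print(getwd())            # R prints the path with forward slashes
setwd(old_wd)             # switch back to the previous working directory
```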
Reading in the Data:
Remember, we are working with our BBC data first. The code below tells R to load a .csv file called BBC_data.csv and store it (<-) as an object named “BBC_data”. After running the code, you should see it appear under Data in your Global Environment.
BBC_data<-read.csv("BBC_data.csv")
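If read.csv() cannot find the file, the usual cause is the working directory. A misspelled column name will also break the code later on. The sketch below adds both checks before reading. read_inat_csv() is a hypothetical helper. The eight expected names are the first eight columns of BBC_data as shown later in this lab.

```r
# Sketch: read an iNaturalist export, stopping with a clear message if the file
# is not in the working directory and warning if an expected column is missing.
# read_inat_csv() is a hypothetical helper, not part of the lab script.
expected_cols <- c("time_observed_at", "time_zone", "captive_cultivated", "latitude",
                   "longitude", "scientific_name", "common_name", "iconic_taxon_name")

read_inat_csv <- function(file, expected = expected_cols) {
  if (!file.exists(file)) {
    stop("Cannot find ", file, " in ", getwd(), ". Check your working directory.")
  }
  dat <- read.csv(file)
  missing_cols <- setdiff(expected, names(dat))  # names are compared case-sensitively
  if (length(missing_cols) > 0) {
    warning("Missing or misspelled columns: ", paste(missing_cols, collapse = ", "))
  }
  dat
}

# Usage: BBC_data <- read_inat_csv("BBC_data.csv")
```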
Quick Checks:
Let’s take some quick first glances at the data. The function unique() lists each distinct entry in a particular column of a data frame. We can combine that with the function length(), which counts the number of elements in a vector.
We can use the unique() function to get a complete list of the iconic taxon names present in our data frame. If we wrap the length() function around that, length(unique()), we can quickly count how many different iconic taxon names are present in our data frame.
unique(BBC_data$iconic_taxon_name) #list of all iconic taxon names in the data
## [1] "Plantae" "Aves" "Animalia" "Mollusca"
## [5] "Insecta" "Arachnida" "Reptilia" "Mammalia"
## [9] "Actinopterygii" "Amphibia" "Chromista" "Fungi"
## [13] ""
length(unique(BBC_data$iconic_taxon_name)) #How many different iconic taxon names are present
## [1] 13
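The 13 reported above counts the empty string "" (observations with no iconic taxon name) as one of the names. The same applies to the list of common names below. The toy vector in this sketch is made up for illustration. It shows how to count only the non-blank names.

```r
# Made-up values standing in for BBC_data$iconic_taxon_name
taxa <- c("Plantae", "Aves", "Plantae", "", "Insecta", "Aves")

unique(taxa)                                 # "Plantae" "Aves" "" "Insecta"
length(unique(taxa))                         # 4: the blank "" counts as a name

named <- taxa[!is.na(taxa) & taxa != ""]     # drop blank (and missing) entries
length(unique(named))                        # 3
```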
Similarly, we can do the same for the common names to get a count of how many different common names are present in the data, and a comprehensive list of those common names.
unique(BBC_data$common_name) #list of all common names in the data
## [1] "sea grape"
## [2] "Egyptian Goose"
## [3] ""
## [4] "Variegated Sea Urchin"
## [5] "Atlantic Rock-boring Urchin"
## [6] "Bristle Worms"
## [7] "Caribbean Reef Squid"
## [8] "Yellow Stingray"
## [9] "Florida Tussock Moth"
## [10] "Green Anole"
## [11] "Wood Stork"
## [12] "Silver Garden Orbweaver"
## [13] "Gulf Fritillary"
## [14] "Polychaete Worms"
## [15] "Insects"
## [16] "Mangrove Periwinkle"
## [17] "Spinybacked Orbweaver"
## [18] "Spiders"
## [19] "Arthropods"
## [20] "black mangrove"
## [21] "Button Sage"
## [22] "ladder fern"
## [23] "Turkey Vulture"
## [24] "Halloween Pennant"
## [25] "Atala"
## [26] "Eastern Lubber Grasshopper"
## [27] "Beach Sunflower"
## [28] "Monk Skipper"
## [29] "Blue-Green Citrus Root Weevil"
## [30] "Great Blue Heron"
## [31] "Florida Softshell Turtle"
## [32] "Domestic Mallard"
## [33] "Mangrove Skipper"
## [34] "Firebush"
## [35] "Shiny-leaved Wild Coffee"
## [36] "American beautyberry"
## [37] "red mangrove"
## [38] "Romerillo"
## [39] "White Ibis"
## [40] "Zebra Longwing"
## [41] "Indian blanket"
## [42] "Thoracotrematan Crabs"
## [43] "Tersa Sphinx"
## [44] "Thinstripe Hermit Crab"
## [45] "Tricolored Heron"
## [46] "Osprey"
## [47] "Tropical Orbweaver"
## [48] "Cure-for-all"
## [49] "Scorpion's-tail"
## [50] "Burma Reed"
## [51] "Monk Orchid"
## [52] "common ragweed"
## [53] "Rust Weed"
## [54] "Caesar weed"
## [55] "Cocoplum"
## [56] "Herb-of-Grace"
## [57] "Streaked Rattlepod"
## [58] "Common Purslane"
## [59] "Narrowleaf Yellowtops"
## [60] "Madagascar Periwinkle"
## [61] "Chinese Crown Orchid"
## [62] "Caribbean scoliid wasp"
## [63] "Asian Lady Beetle"
## [64] "Margined Leatherwing Beetle"
## [65] "Green Iguana"
## [66] "Diaprepes Root Weevil"
## [67] "coinvine"
## [68] "castor bean"
## [69] "Spanish moss"
## [70] "Domestic Muscovy Duck"
## [71] "Boat-tailed Grackle"
## [72] "Florida Strangler Fig"
## [73] "Florida Carpenter Ant"
## [74] "legumes"
## [75] "Eastern Gray Squirrel"
## [76] "Regal Jumping Spider"
## [77] "Bark Anole"
## [78] "West Indian Bulimulus"
## [79] "Dark Flower Scarab"
## [80] "Little Blue Heron"
## [81] "Curacao Bush"
## [82] "Portia tree"
## [83] "Blue Land Crab"
## [84] "Checkered Puffer"
## [85] "Orbweavers"
## [86] "flowering plants"
## [87] "largeflower Mexican clover"
## [88] "Common Comb Jelly"
## [89] "Polka-Dot Wasp Moth"
## [90] "Atlantic Flyingfish"
## [91] "Brazilian pepper"
## [92] "Anhinga"
## [93] "dicots"
## [94] "oakleaf fleabane"
## [95] "graceful spurge"
## [96] "Grassleaf Spurge"
## [97] "beach naupaka"
## [98] "Snow Squarestem"
## [99] "trailing daisy"
## [100] "True Toads"
## [101] "White Mangrove"
## [102] "sea ox-eye"
## [103] "Cane Toad"
## [104] "Knight Anole"
## [105] "Brown Anole"
## [106] "Northern Mockingbird"
## [107] "Whiteflies"
## [108] "Double-crested Cormorant"
## [109] "Green Heron"
## [110] "Beaded Periwinkle"
## [111] "Yellow-throated Warbler"
## [112] "Blolly"
## [113] "Great Egret"
## [114] "Raspberry Wave"
## [115] "Scraped Pilocrocis Moth"
## [116] "Watermilfoil Leafcutter Moth"
## [117] "Common Green Darner"
## [118] "Three-spotted Skipper"
## [119] "Common Lovebug"
## [120] "Phaon Crescent"
## [121] "Poey's Furrow Bee"
## [122] "Northern Plushback"
## [123] "Mountain Fig"
## [124] "Bright Beefly"
## [125] "clustered yellowtops"
## [126] "Dilemma Orchid Bee"
## [127] "Keyhole Wasp"
## [128] "Fourleaf Vetch"
## [129] "rougeplant"
## [130] "Ghost Ant"
## [131] "American Broad-front Fiddler Crabs"
## [132] "Brown-winged Striped Sweat Bee"
## [133] "Painted Lady"
## [134] "Red-marked Pachodynerus Wasp"
## [135] "Velvetbean Caterpillar Moth"
## [136] "Bay-breasted Warbler"
## [137] "Blue-gray Gnatcatcher"
## [138] "Yellow-bellied Sapsucker"
## [139] "Short-tailed Hawk"
## [140] "Swainson's Warbler"
## [141] "White leadtree"
## [142] "American Coot"
## [143] "Loggerhead Shrike"
## [144] "Streaked Sphinx"
## [145] "Green Feather Alga"
## [146] "West Indian False Cerith"
## [147] "Sponges"
## [148] "Short-legged Springtails"
## [149] "Brown Pelican"
## [150] "mangroves"
## [151] "Vetches"
## [152] "Bur Marigolds"
## [153] "Chamberbitter"
## [154] "carrots, ivies, and allies"
## [155] "woodsorrels"
## [156] "creeping cucumber"
## [157] "shrubby false buttonweed"
## [158] "Blue-legged Hermit Crab"
## [159] "Mulsant's Water Treader"
## [160] "Everglades Racer"
## [161] "Carolina ponysfoot"
## [162] "limewater brookweed"
## [163] "Seven-year Apple"
## [164] "Sea Almond"
## [165] "Hog plum"
## [166] "Figs"
## [167] "Lined Treesnail"
## [168] "American Bird Grasshopper"
## [169] "West Indian milkberry"
## [170] "Beefwoods"
## [171] "Anoles"
## [172] "Moses-in-the-cradle"
## [173] "oaks"
## [174] "Jamaican feverplant"
## [175] "Yellow-crowned Night-Heron"
## [176] "Sri Lanka Weevil"
## [177] "Brachyceran Flies"
## [178] "turkey tangle frogfruit"
## [179] "manyflower marshpennywort"
## [180] "Domestic Cat"
## [181] "Virginia Opossum"
## [182] "Eurasian Collared-Dove"
## [183] "St. Andrew's Cotton Stainer"
## [184] "cabbage palmetto"
## [185] "Carpenter Ants"
## [186] "Common Raccoon"
## [187] "Rattail"
## [188] "Fishbone Fern"
## [189] "coconut palm"
## [190] "Western Honey Bee"
## [191] "Cobweb Spiders"
## [192] "Silversides"
## [193] "Ray-finned Fishes"
## [194] "Tidewater Mojarra"
## [195] "Shads, Sardines, and Menhadens"
## [196] "Long-legged Flies"
## [197] "Mexican ruellia"
## [198] "leadtrees"
## [199] "Southern Flannel Moth"
## [200] "Cattle Egret"
## [201] "Black Vulture"
## [202] "Common Gallinule"
## [203] "Gumbo Limbo"
## [204] "Beetles"
## [205] "creeping beggarweed"
## [206] "ticktrefoils"
## [207] "fanflowers"
## [208] "Virginia creeper"
## [209] "Chinese banyan"
## [210] "Saltwort"
## [211] "Cotton Stainer Bugs"
## [212] "Sargassum"
## [213] "Queen Conch"
## [214] "Barracudas"
## [215] "Atlantic Needlefish"
## [216] "Great Barracuda"
## [217] "Tinted Cantharus"
## [218] "Yellow Jack"
## [219] "Timucu Needlefish"
## [220] "Sargassumfish"
## [221] "Mojarras"
## [222] "Flat Needlefish"
## [223] "plants"
## [224] "Needlefishes"
## [225] "Animals"
## [226] "Slender Mojarra"
## [227] "gentians, dogbanes, madders, and allies"
## [228] "Hermit Crabs"
## [229] "Bay Anchovy"
## [230] "Banded Coral Shrimp"
## [231] "Swimming Crabs"
## [232] "sea purslane"
## [233] "White Peacock"
## [234] "Bandtail Puffer"
## [235] "Puffers and Filefishes"
## [236] "Malacostracans"
## [237] "Grunts"
## [238] "Perch-like Fishes"
## [239] "Pinfish"
## [240] "woodcaps and sawgills"
## [241] "green algae"
## [242] "American eelgrass"
## [243] "Frogs and Toads"
## [244] "Northern Barracuda"
## [245] "Royal Palm"
## [246] "Varunid crabs"
## [247] "grasses"
## [248] "Artist's Brackets, Reishi, and Allies"
## [249] "Three-striped Dasher"
## [250] "stately maiden fern"
## [251] "Bagworms, Clothes Moths, and Allies"
## [252] "bluestems"
## [253] "shelf fungi"
## [254] "Tridax daisy"
## [255] "Fungi Including Lichens"
## [256] "Higher Ascomycetes"
## [257] "Scarabs"
## [258] "rockweed"
## [259] "Ants"
## [260] "June Beetles"
## [261] "Broad-leaved gulfweed"
## [262] "Golden Silk Spider"
## [263] "Indian banyan"
## [264] "Winged and Once-winged Insects"
## [265] "True Bugs"
## [266] "Southern Green-striped Grasshopper"
## [267] "monocots"
## [268] "Sargassum and Allies"
## [269] "Birds"
## [270] "Leaf-footed Bugs and Allies"
## [271] "Hydrilla"
## [272] "water pennyworts"
## [273] "Creeping Woodsorrel"
## [274] "Northern Curly-tailed Lizard"
## [275] "baldcypresses"
## [276] "Brown Bullhead"
## [277] "Musk Fern"
## [278] "yellowtops"
## [279] "Phasey Bean"
## [280] "North American Freshwater Catfishes"
## [281] "Eastern Mosquitofish"
## [282] "Schoolmaster Snapper"
## [283] "Great White Heron"
## [284] "Crabs, Lobsters, and Allies"
## [285] "House Geckos"
## [286] "Greenhouse Frog"
## [287] "Orange-spotted Flower Moth"
## [288] "Louisiana Waterthrush"
## [289] "Milkweed Assassin Bug"
## [290] "Pantropical Jumping Spider"
## [291] "Jaguar Guapote"
## [292] "Flamingo Chanterelle"
## [293] "Prothonotary Warbler"
## [294] "Sesarmid Marsh Crabs"
## [295] "Squareback Marsh Crab"
## [296] "sweet potato"
## [297] "Coffee-loving Pyrausta Moth"
## [298] "Caribbean Land Hermit Crab"
## [299] "Gregorywood"
## [300] "White-flowered black mangrove"
## [301] "Rustic Sphinx"
## [302] "Typical Herons and Egrets"
## [303] "Rusty Millipede"
## [304] "Black Saddlebags"
## [305] "ballmoss"
## [306] "Bivalves"
## [307] "Checkered Nerite"
## [308] "Gotu Cola"
## [309] "Snowy Egret"
## [310] "Red-bellied Woodpecker"
## [311] "Southern Tussock Moth"
## [312] "Beach Sheoak"
## [313] "Florida Stone Crab"
## [314] "deer mushroom"
## [315] "Camphor Tree"
## [316] "Red Imported Fire Ant"
## [317] "Skullcap Dapperling"
## [318] "southern live oak"
## [319] "Bluestriped Grunt"
## [320] "Mangrove Tree Crab"
## [321] "Mexican Paper Wasp"
## [322] "Garden Orbweavers"
## [323] "Purple-edged Lute"
## [324] "Gastropods"
## [325] "Mangrove Cupped Oyster"
## [326] "Assassin Bugs"
## [327] "Common Grackle"
## [328] "Flat Tree Oyster"
## [329] "Fish Crow"
## [330] "Blue Jay"
## [331] "Flies"
## [332] "Band-winged Dragonlet"
## [333] "Indian fig opuntia"
## [334] "Butterflies and Moths"
## [335] "Longspined Porcupinefish"
## [336] "Cochineal Nopal Cactus"
## [337] "Inshore Lizardfish"
## [338] "Scorpionfishes"
## [339] "Planehead filefish"
## [340] "Longhorn Crazy Ant"
## [341] "shoebutton Ardisia"
## [342] "Knotted Spikerush"
## [343] "vascular plants"
## [344] "Giant leather fern"
## [345] "dayflowers"
## [346] "Southern Moon Jelly"
## [347] "White Mullet"
## [348] "Hairy Hexagonia"
## [349] "marsh elder"
## [350] "Rambur's Forktail"
## [351] "redbay"
## [352] "Scarlet Skimmer"
## [353] "Herrings"
## [354] "True Whelks and Allies"
## [355] "Silver Jenny"
## [356] "Umbrella Paper Wasps"
## [357] "Dragonflies"
## [358] "Pigeon Plum"
## [359] "Four-toothed Nerite"
## [360] "Gray Wall Jumping Spider"
## [361] "Twig Ants"
## [362] "Fig Sphinx"
## [363] "Broad-tipped Conehead"
length(unique(BBC_data$common_name)) #How many different common names are present
## [1] 363
Pivot Tables:
To quickly browse our observation data, let’s build a pivot table. As a reminder, anything you see after a # is just a note (a comment), not executable code.
BBC_pivot<- PivotTable$new() #tell R you want to make a new pivot table called "BBC_pivot"
BBC_pivot$addData(BBC_data) #tell R the data frame you want to use
BBC_pivot$addRowDataGroups("iconic_taxon_name") #tell R which data columns you want to appear on the rows
BBC_pivot$addRowDataGroups("common_name", addTotal = TRUE) #tell R which data you want to appear as a subset on the rows
BBC_pivot$defineCalculation(calculationName="Total Observations", summariseExpression="n()") #counts the number of times each common name per taxon group was observed
BBC_pivot$renderPivot()#tells R that you are ready for it to create your pivot table
After you run the code, you should see a pivot table in the Viewer tab that looks similar to the one below:
If everything looks OK, let’s go ahead and save a copy of your table into an Excel Workbook.
BBC_pivot_excel <- createWorkbook(creator = Sys.getenv("USERNAME"))#create a new, empty Excel workbook object called BBC_pivot_excel (creator records your computer username as the author). Don't change "USERNAME"
addWorksheet(BBC_pivot_excel, "BBC_taxon_data")#tell R what you would like the tab/worksheet of the workbook to be called
BBC_pivot$writeToExcelWorksheet(wb=BBC_pivot_excel, wsName="BBC_taxon_data",
topRowNumber=1, leftMostColumnNumber=1, applyStyles=FALSE)#tell R which workbook you would like to use, and the name of the tab/worksheet of that workbook, and where you would like the table to be placed on the page
saveWorkbook(BBC_pivot_excel, file="BBC_pivot.xlsx", overwrite = TRUE)#Saves the file in your working directory under the name that you have chosen for "file="
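To see what the "Total Observations" calculation counts, here is a base R sketch of the same tally (observations per common name within each iconic taxon) on a small made-up data frame. count_by_taxon() is a hypothetical helper and is not part of pivottabler.

```r
# Tally observations per common name within each iconic taxon,
# the same counts the pivot table shows with summariseExpression = "n()".
count_by_taxon <- function(dat) {
  counts <- aggregate(
    list(Total_Observations = rep(1L, nrow(dat))),  # one "1" per observation (row)
    by = list(iconic_taxon_name = dat$iconic_taxon_name,
              common_name = dat$common_name),
    FUN = sum                                       # add up the 1s in each group
  )
  counts[order(counts$iconic_taxon_name, counts$common_name), ]
}

toy <- data.frame(  # hypothetical observations
  iconic_taxon_name = c("Aves", "Aves", "Aves", "Plantae"),
  common_name = c("Egyptian Goose", "Egyptian Goose", "White Ibis", "sea grape")
)
count_by_taxon(toy)          # Egyptian Goose 2, White Ibis 1, sea grape 1
table(toy$iconic_taxon_name) # per-taxon totals: Aves 3, Plantae 1
```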
Converting and Extracting Time:
Before we get started with calculations, let’s convert the “time_observed_at” column out of UTC and into a more meaningful time zone. Although several different time zones are listed in the time_zone column of the data frame (see unique(BBC_data$time_zone)), we know that all observations in the BBC_data.csv file were made at BBC. Therefore, it is OK to assume that all observations were originally made in the “US/Eastern” time zone, since BBC is located in this time zone. First, we make a new column, local_time, in the “BBC_data” data frame to store the local time once it is converted out of UTC.
Then, we tell R:
- how the original date/time values are written and which time zone they are in: format="%Y-%m-%d %H:%M:%S", tz='UTC'
- the format and time zone to convert them to: format="%Y-%m-%d %H:%M:%S", tz='US/Eastern'
BBC_data$local_time<-BBC_data$time_observed_at #Create new column for local date time
BBC_data$local_time <- as.POSIXct(BBC_data$local_time, format="%Y-%m-%d %H:%M:%S", tz='UTC') #put in posixct
BBC_data$local_time<-format(BBC_data$local_time,tz='US/Eastern') #format to local time zone (this step will change the format back to character)
BBC_data$local_time<-as.POSIXct(BBC_data$local_time,format="%Y-%m-%d %H:%M",tz='US/Eastern') #put back in posixct (this format keeps hours and minutes, so any seconds are dropped)
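As a cross-check on the four steps above, here is a sketch of the same conversion on two time stamps from BBC_data plus one blank entry. It uses "America/New_York", the standard tz database name that "US/Eastern" is an alias for. utc_to_local() is a hypothetical helper. Instead of formatting to text and parsing again, it parses the times as UTC and then only changes the time zone used for display. R applies daylight saving time automatically: May is UTC-4 and February is UTC-5. Unlike the lab's last step, which parses with "%Y-%m-%d %H:%M", this keeps the seconds, so 16:01:13 UTC becomes 11:01:13 rather than 11:01:00.

```r
# Sketch: convert UTC time stamps to Miami local time.
# utc_to_local() is a hypothetical helper, not part of the lab script.
utc_to_local <- function(x, tz = "America/New_York") {
  t <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")  # trailing " UTC" text is ignored
  attr(t, "tzone") <- tz  # same moment in time, now displayed in local time
  t
}

obs <- c("2015-05-19 19:00:00 UTC", "2015-02-27 16:01:13 UTC", "")  # "" = no time recorded
local_times <- utc_to_local(obs)
format(local_times, "%Y-%m-%d %H:%M:%S")  # "2015-05-19 15:00:00" "2015-02-27 11:01:13" NA
format(local_times, "%H")                 # "15" "11" NA  (hour of day, like the "hour" column)
```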
Let’s practice extracting time by isolating the year, month, and hour of each observation and copy those data into their own respective columns titled “year”, “month”, and “hour”.
BBC_data$year<-format(BBC_data$local_time, format = "%Y") #add column for year
BBC_data$month<-format(BBC_data$local_time, format = "%m") #add column for month
BBC_data$hour<-format(BBC_data$local_time, format = "%H") #add column for time of day
In the first six rows shown below, only two rows of column 1, time_observed_at, have data. The others are blank because the observer did not record a time when they submitted their upload, so local_time, year, month, and hour are NA for those rows.
| time_observed_at | time_zone | captive_cultivated | latitude | longitude | scientific_name | common_name | iconic_taxon_name | local_time | year | month | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|
|  | Eastern Time (US & Canada) | false | 25.90932 | -80.13774 | Coccoloba uvifera | sea grape | Plantae | NA | NA | NA | NA |
| 2015-05-19 19:00:00 UTC | Eastern Time (US & Canada) | false | 25.90927 | -80.13847 | Alopochen aegyptiaca | Egyptian Goose | Aves | 2015-05-19 15:00:00 | 2015 | 05 | 15 |
| 2015-02-27 16:01:13 UTC | Eastern Time (US & Canada) | false | 25.90980 | -80.13726 | Chloeia viridis |  | Animalia | 2015-02-27 11:01:00 | 2015 | 02 | 11 |
|  | Eastern Time (US & Canada) | false | 25.90982 | -80.13721 | Lytechinus variegatus | Variegated Sea Urchin | Animalia | NA | NA | NA | NA |
|  | Eastern Time (US & Canada) | false | 25.90981 | -80.13720 | Echinometra lucunter | Atlantic Rock-boring Urchin | Animalia | NA | NA | NA | NA |
|  | Eastern Time (US & Canada) | false | 25.90981 | -80.13721 | Amphinomidae | Bristle Worms | Animalia | NA | NA | NA | NA |
Let’s save a copy of this data frame, keeping only the columns we need, and call it BBC_timedata, in case we want to reference these data later.
Since this data frame is centered around time, we will also remove any rows where no time of observation (and therefore no local time) was recorded.
Finally, we will save BBC_timedata to our working directory as a .csv named BBC_timedata.csv.
BBC_timedata<-select(BBC_data,"local_time" ,"year", "month", "hour", "iconic_taxon_name", "common_name")#select the columns we want to keep
BBC_timedata<-BBC_timedata[!is.na(BBC_timedata$local_time),] #remove all rows where there is no local time
write.csv(BBC_timedata,"BBC_timedata.csv", row.names = FALSE) #Save a copy of BBC_timedata data frame as a .csv
Notice that rows 1 and 4-9 are now excluded from the data frame (the first column shows the original row numbers):
|  | local_time | year | month | hour | iconic_taxon_name | common_name |
|---|---|---|---|---|---|---|
| 2 | 2015-05-19 15:00:00 | 2015 | 05 | 15 | Aves | Egyptian Goose |
| 3 | 2015-02-27 11:01:00 | 2015 | 02 | 11 | Animalia | |
| 10 | 2015-06-01 17:30:00 | 2015 | 06 | 17 | Insecta | Florida Tussock Moth |
| 11 | 2015-09-02 07:59:00 | 2015 | 09 | 07 | Arachnida | |
| 12 | 2015-09-01 14:50:00 | 2015 | 09 | 14 | Reptilia | Green Anole |
| 13 | 2015-09-22 09:16:00 | 2015 | 09 | 09 | Aves | Egyptian Goose |
Biodiversity Measurements and Indices:
Summary table of biodiversity measurements and indices:
| Measurement/Index | Formula |
|---|---|
| Species Richness (S) | \(S =\) number of species |
| Shannon-Wiener Diversity Index (H) | \(H = -\sum_{i=1}^{S}{p_i \ln(p_i)}\) |
| Maximum Diversity (\(H_{MAX}\)) | \(H_{MAX} = \ln(S)\) |
| Species Evenness (E) | \(E = \frac{H}{H_{MAX}}\) |
| Effective Number of Species (ENS) | \(ENS = e^H\) |
Other parameters and definitions needed to calculate the items above:
| Metric | Description | Formula |
|---|---|---|
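| Total Abundance (\(N_{TOTAL}\)) | Total number of individuals across all species | \(N_{TOTAL} = \sum_{i=1}^{S}{n_i}\) |
| \(n_i\) | Total number of individuals of species \(i\) | |
| \(p_i\) | Proportion of individuals of species \(i\) compared to \(N_{TOTAL}\) | \(p_i = \frac{n_i}{N_{TOTAL}}\) |
In the code below, each iNaturalist observation counts as one individual. \(n_i\) is Species_Abundance (the number of observations of one common name) and \(N_{TOTAL}\) is Total_Abundance_per_Taxon, so every measure is calculated separately for each iconic taxon.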
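To connect the formulas to the code, here is a sketch that calculates every measure for a single group from its species counts (the \(n_i\) values). diversity_indices() is a hypothetical helper, not the lab's code. The lab's dplyr code below does the same arithmetic for each iconic taxon. Counts of 3 and 1 reproduce the MMC Mollusca row of MMC_diversity: S = 2, 4 observations, H = 0.5623351, E = 0.8112781, ENS = 1.754765.

```r
# Sketch: the measures in the tables above for ONE group (e.g., one iconic taxon).
# `counts` holds n_i, the number of observations of each common name in that group.
diversity_indices <- function(counts) {
  counts <- counts[counts > 0]   # a name with zero observations contributes nothing
  N_total <- sum(counts)         # N_TOTAL
  p <- counts / N_total          # p_i = n_i / N_TOTAL
  S <- length(counts)            # species richness
  H <- -sum(p * log(p))          # Shannon-Wiener index; log() is the natural log (ln)
  Hmax <- log(S)                 # largest possible H for S species (all equally common)
  c(S = S, N_total = N_total, H = H, Hmax = Hmax, E = H / Hmax, ENS = exp(H))
}

# Two species, observed 3 times and once
round(diversity_indices(c(3, 1)), 6)
```

With a single species, \(H_{MAX} = \ln(1) = 0\), so E is 0/0, which R reports as NaN. This is why the row with a blank iconic taxon name (S = 1) shows NaN for E.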
The chunk of code below might look intimidating, but it's OK! Remember, we don’t expect you to know the complete ins and outs of all the “behind the scenes” work that is happening in the code. You can follow along with what is happening in each step by reading the notes that follow the #.
BBC_spp_rich <- BBC_data %>%
group_by(iconic_taxon_name) %>%
summarise(S = length(unique(common_name)))#Creates column with each unique taxon ID with the corresponding species richness of each taxon
BBC_sum_taxoN <- BBC_data %>%
group_by(iconic_taxon_name) %>%
summarise(Total_Abundance_per_Taxon = n()) #Creates a column with each unique taxon ID with the corresponding total abundance (total observations) for each taxon
BBC_var<-BBC_spp_rich %>%
inner_join(., BBC_sum_taxoN) #merges the previous two data frames by the list of unique taxon
#"." after inner_join is a place holder that denotes it is the same data frame that is listed above (short hand)
BBC_spp_abundance <- BBC_data %>%
group_by(iconic_taxon_name, common_name) %>%
summarise(Species_Abundance = n()) #Creates a data frame with the complete list of common names and their summed abundances
BBC_biod<-inner_join(BBC_spp_abundance,BBC_var)#Join BBC_spp_abundance with BBC_var
BBC_biod$Pi<-(BBC_biod$Species_Abundance/BBC_biod$Total_Abundance_per_Taxon)#Calculate Pi
BBC_biod$lnPi<-(log(BBC_biod$Species_Abundance/BBC_biod$Total_Abundance_per_Taxon))#Calculate ln(pi)
BBC_biod$PixlnPi<-(BBC_biod$Species_Abundance/BBC_biod$Total_Abundance_per_Taxon*log(BBC_biod$Species_Abundance/BBC_biod$Total_Abundance_per_Taxon))#Calculate Pi*ln(Pi)
BBC_diversity<-as.data.frame(BBC_var)#Make new data frame for final numbers
BBC_H <- BBC_biod %>%
group_by(iconic_taxon_name) %>%
summarise(H=(sum(PixlnPi))*(-1))#Calculate H
BBC_diversity<-inner_join(BBC_diversity,BBC_H)#Append H to BBC_diversity
BBC_Hmax <- BBC_biod %>%
group_by(iconic_taxon_name) %>%
summarise(Hmax=log(first(S)))#Calculate Hmax (S is repeated on every row of a taxon, so use its first value)
BBC_diversity<-inner_join(BBC_diversity,BBC_Hmax)#Append Hmax to BBC_diversity
BBC_diversity$E<-(BBC_diversity$H)/(BBC_diversity$Hmax)#Calculate E and append it to BBC_diversity
BBC_diversity$ENS<-exp(BBC_diversity$H)#Calculate ENS and append it to BBC_diversity
BBC_diversity data frame:
| iconic_taxon_name | S | Total_Abundance_per_Taxon | H | Hmax | E | ENS |
|---|---|---|---|---|---|---|
|  | 1 | 9 | 0.0000000 | 0.000000 | NaN | 1.000000 |
| Actinopterygii | 37 | 152 | 2.8862478 | 3.610918 | 0.7993114 | 17.925921 |
| Amphibia | 4 | 8 | 1.2130076 | 1.386294 | 0.8750000 | 3.363586 |
| Animalia | 29 | 68 | 2.9665579 | 3.367296 | 0.8809912 | 19.424943 |
| Arachnida | 12 | 28 | 2.1121612 | 2.484907 | 0.8499962 | 8.266087 |
| Aves | 40 | 175 | 2.9357252 | 3.688880 | 0.7958312 | 18.835158 |
| Chromista | 3 | 15 | 0.9701158 | 1.098612 | 0.8830374 | 2.638250 |
| Fungi | 11 | 19 | 2.3056573 | 2.397895 | 0.9615338 | 10.030770 |
| Insecta | 79 | 189 | 3.8015479 | 4.369448 | 0.8700293 | 44.770430 |
| Mammalia | 5 | 113 | 0.7541642 | 1.609438 | 0.4685886 | 2.125834 |
| Mollusca | 16 | 48 | 2.2323441 | 2.772589 | 0.8051479 | 9.321692 |
| Plantae | 124 | 617 | 3.9307578 | 4.820282 | 0.8154623 | 50.945569 |
| Reptilia | 10 | 200 | 1.5391800 | 2.302585 | 0.6684574 | 4.660767 |
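H can never exceed \(H_{MAX} = \ln(S)\). It follows that E should always fall between 0 and 1, and ENS = \(e^H\) should never exceed S. The sketch below checks a diversity table for those bounds. check_diversity() is a hypothetical helper. NaN values of E (single-species groups) are allowed. The BBC_diversity values shown above pass; for example, Aves has S = 40 and ENS = 18.8.

```r
# Sketch: return any rows of a diversity table that break the expected bounds.
# check_diversity() is a hypothetical helper, not part of the lab script.
check_diversity <- function(div, tol = 1e-9) {
  ok_E <- is.na(div$E) | (div$E >= 0 & div$E <= 1 + tol)  # NaN E (S = 1) is allowed
  ok_ENS <- div$ENS <= div$S + tol                          # e^H <= e^ln(S) = S
  div[!(ok_E & ok_ENS), , drop = FALSE]                     # only the suspicious rows
}

# Usage: check_diversity(BBC_diversity)   # zero rows means every value is in range
```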
Great! Now let's save a copy of the BBC_diversity data frame as a .csv.
write.csv(BBC_diversity,"BBC_diversity.csv", row.names = FALSE)
During that big chunk of code, we created another data frame: BBC_spp_abundance. This data frame is similar to the pivot table we created earlier and was created by the code:
BBC_spp_abundance <- BBC_data %>%
group_by(iconic_taxon_name, common_name) %>%
summarise(Species_Abundance = n())
First 6 rows of data frame BBC_spp_abundance:
| iconic_taxon_name | common_name | Species_Abundance |
|---|---|---|
|  |  | 9 |
| Actinopterygii |  | 20 |
| Actinopterygii | Atlantic Flyingfish | 1 |
| Actinopterygii | Atlantic Needlefish | 6 |
| Actinopterygii | Bandtail Puffer | 9 |
| Actinopterygii | Barracudas | 2 |
Let’s save a copy of data frame BBC_spp_abundance as a .csv, also!
write.csv(BBC_spp_abundance,"BBC_spp_abundance.csv", row.names = FALSE)
If you would like to make a similar data frame and .csv file for BBC as we did for MMC to use in Part 3: Mapping and Visualizing Data in Google Earth you may run the two lines of code below. It will create a data frame called BBC_latlon_year and save it as "BBC_latlon_year.csv" in our working directory.
# Make a data frame with only latitude, longitude, year, and iconic_taxon_name
BBC_latlon_year <- BBC_data %>% select(latitude,longitude, year, iconic_taxon_name)
#Save a copy of BBC_latlon_year as a .csv
write.csv(BBC_latlon_year,"BBC_latlon_year.csv", row.names = FALSE)
First 6 rows of data frame BBC_latlon_year:
| latitude | longitude | year | iconic_taxon_name |
|---|---|---|---|
| 25.90932 | -80.13774 | NA | Plantae |
| 25.90927 | -80.13847 | 2015 | Aves |
| 25.90980 | -80.13726 | 2015 | Animalia |
| 25.90982 | -80.13721 | NA | Animalia |
| 25.90981 | -80.13720 | NA | Animalia |
| 25.90981 | -80.13721 | NA | Animalia |
If you don’t want to worry about this, place a # in front of the two lines of code as displayed below:
# Make a data frame with only latitude, longitude, year, and iconic_taxon_name
#BBC_latlon_year <- BBC_data %>% select(latitude,longitude, year, iconic_taxon_name)
#Save a copy of BBC_latlon_year as a .csv
#write.csv(BBC_latlon_year,"BBC_latlon_year.csv", row.names = FALSE)
We will now be working from our statistics script.
→ If you haven’t already, open the R script file “lab1-stats.R” with RStudio.
Remember, we are now working with a new script, so we want to clear the memory from our Global Environment pane. We also have some new packages to install: ggplot2, ggpubr, ggsci, and devtools.
# Clear memory
rm(list = ls())
# Install packages (only if needed-- once a package is installed, you don't need to re-install it)
#install.packages("pivottabler")
#install.packages("openxlsx")
#install.packages("dplyr")
install.packages("ggplot2")
install.packages("ggpubr")
install.packages ("ggsci")
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
# Load your needed libraries (this DOES need to be done each session)
library("pivottabler")
library("openxlsx")
library("dplyr")
library("ggplot2")
library("ggpubr")
library("ggsci")
#Setting the working directory
#First, set your directory using the keyboard shortcut: [Ctrl+Shift+H], select your designated folder
#Check your working directory
getwd()
We want to add a “Habitat” column to each data frame, so we don’t forget where each observation was made once they’re combined. Then, we are going to tell R to populate our Habitat column with each data frame’s respective habitat, “BBC” or “MMC”.
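As a small illustration of these two steps, the sketch below uses two made-up data frames (`site_a` and `site_b` are hypothetical stand-ins for the BBC and MMC files, not the real exports). Assigning a single value to a new column with `$` repeats that value on every row, and `rbind()` then stacks the data frames vertically. `rbind()` matches columns by name, so both data frames need identical column names for the combination to work.

```r
# Two small, made-up data frames standing in for the BBC and MMC exports
site_a <- data.frame(iconic_taxon_name = c("Aves", "Plantae"),
                     count = c(3, 5))
site_b <- data.frame(iconic_taxon_name = c("Aves", "Insecta"),
                     count = c(2, 7))

# Assigning one value to a new column repeats it down every row
site_a$Habitat <- "BBC"
site_b$Habitat <- "MMC"

# Stack the two data frames vertically; the Habitat column keeps track
# of where each row came from after they are combined
both <- rbind(site_a, site_b)
print(both)
```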
#Read in data#
#MMC#
MMC_diversity<-read.csv("MMC_diversity.csv")
MMC_spp_abundance<-read.csv("MMC_spp_abundance.csv")
MMC_timedata<-read.csv("MMC_timedata.csv")
#BBC#
BBC_diversity<-read.csv("BBC_diversity.csv")
BBC_spp_abundance<-read.csv("BBC_spp_abundance.csv")
BBC_timedata<-read.csv("BBC_timedata.csv")
##### Add Habitat to each data frame ####
MMC_diversity$Habitat<-"MMC"
MMC_spp_abundance$Habitat<-"MMC"
MMC_timedata$Habitat<-"MMC"
BBC_diversity$Habitat<-"BBC"
BBC_spp_abundance$Habitat<-"BBC"
BBC_timedata$Habitat<-"BBC"
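# NOTE: BOTH_diversity, BOTH_spp_abundance, and BOTH_timedata are created in the sections below.
# Run these three save commands only after those data frames exist; otherwise R will stop with an "object not found" error.
# Save a copy of BOTH_diversity data frame as a .csv
write.csv(BOTH_diversity,"BOTH_diversity.csv", row.names = FALSE)
# Save a copy of BOTH_spp_abundance data frame as a .csv
write.csv(BOTH_spp_abundance,"BOTH_spp_abundance.csv", row.names = FALSE)
# Save a copy of BOTH_timedata data frame as a .csv
write.csv(BOTH_timedata,"BOTH_timedata.csv", row.names = FALSE)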
#### BBC_diversity & MMC_diversity --> BOTH_diversity ####
BOTH_diversity<- rbind(BBC_diversity,MMC_diversity)#vertically combine df's
#if there are any blank cells under taxon column, change them to NAs
BOTH_diversity$iconic_taxon_name[BOTH_diversity$iconic_taxon_name==""]<-NA
BOTH_diversity<- na.omit(BOTH_diversity)#remove any NAs
summary(BOTH_diversity)#check summary --> any NAs left?
str(BOTH_diversity)#check structure
BOTH_diversity$iconic_taxon_name <- as.factor(BOTH_diversity$iconic_taxon_name) #change to a factor
BOTH_diversity$Habitat <- as.factor(BOTH_diversity$Habitat) #change to factor
BOTH_diversity <- BOTH_diversity[, c("Habitat","iconic_taxon_name", "Total_Abundance_per_Taxon", "S","H", "Hmax","E", "ENS")]
BOTH_diversity <- BOTH_diversity %>% #Keep only taxon groups observed at BOTH campuses, then sort
filter(duplicated(iconic_taxon_name)|
duplicated(iconic_taxon_name, fromLast=TRUE))%>% #removes rows that don't have a paired taxon name
arrange(Habitat,iconic_taxon_name) #arranges data frame by habitat, then iconic taxon name
| Habitat | iconic_taxon_name | Total_Abundance_per_Taxon | S | H | Hmax | E | ENS |
|---|---|---|---|---|---|---|---|
| BBC | Actinopterygii | 152 | 37 | 2.886248 | 3.610918 | 0.7993114 | 17.925921 |
| BBC | Amphibia | 8 | 4 | 1.213008 | 1.386294 | 0.8750000 | 3.363586 |
| BBC | Animalia | 68 | 29 | 2.966558 | 3.367296 | 0.8809912 | 19.424943 |
| BBC | Arachnida | 28 | 12 | 2.112161 | 2.484907 | 0.8499962 | 8.266087 |
| BBC | Aves | 175 | 40 | 2.935725 | 3.688880 | 0.7958312 | 18.835158 |
| BBC | Fungi | 19 | 11 | 2.305657 | 2.397895 | 0.9615338 | 10.030770 |
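The diversity columns above can be reproduced by hand. The sketch below assumes the standard Shannon formulas with natural logarithms, which is consistent with the table (for example, Hmax = ln(4) = 1.386294 when S = 4): H = -Σ p ln(p) summed over species, where p is the proportion of observations belonging to each species; Hmax = ln(S); E = H / Hmax; and ENS = e^H. The counts (4, 2, 1, 1) are made up for illustration. They happen to reproduce the S, H, Hmax, E, and ENS values in the Amphibia row above, but they are not taken from the actual data.

```r
# Hypothetical observation counts for four species in one taxon group
# (all counts must be greater than 0, since log(0) is undefined)
counts <- c(4, 2, 1, 1)

S    <- length(counts)          # species richness: number of species
p    <- counts / sum(counts)    # proportion of observations for each species
H    <- -sum(p * log(p))        # Shannon diversity index (natural log)
Hmax <- log(S)                  # maximum possible H for S species
E    <- H / Hmax                # evenness: how close H is to its maximum
ENS  <- exp(H)                  # effective number of species

round(c(S = S, H = H, Hmax = Hmax, E = E, ENS = ENS), 6)
```

Because E divides H by its largest possible value, it always falls between 0 and 1, which makes evenness comparable across taxon groups with different numbers of species.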
#### BBC_spp_abundance & MMC_spp_abundance --> BOTH_spp_abundance ####
BOTH_spp_abundance<- rbind(BBC_spp_abundance,MMC_spp_abundance)#vertically combine df's
#if there are any blank cells under taxon column, change them to NAs
BOTH_spp_abundance$iconic_taxon_name[BOTH_spp_abundance$iconic_taxon_name==""]<-NA
#if there are any blank cells under common_name column, change them to "unspecified"
BOTH_spp_abundance$common_name[BOTH_spp_abundance$common_name==""]<-"unspecified"
BOTH_spp_abundance<- na.omit(BOTH_spp_abundance)#remove any NAs
summary(BOTH_spp_abundance)#check summary --> any NAs left?
## iconic_taxon_name common_name Species_Abundance Habitat
## Length:860 Length:860 Min. : 1.000 Length:860
## Class :character Class :character 1st Qu.: 1.000 Class :character
## Mode :character Mode :character Median : 1.000 Mode :character
## Mean : 3.673
## 3rd Qu.: 3.000
## Max. :105.000
str(BOTH_spp_abundance)#check structure
## 'data.frame': 860 obs. of 4 variables:
## $ iconic_taxon_name: chr "Actinopterygii" "Actinopterygii" "Actinopterygii" "Actinopterygii" ...
## $ common_name : chr "unspecified" "Atlantic Flyingfish" "Atlantic Needlefish" "Bandtail Puffer" ...
## $ Species_Abundance: int 20 1 6 9 2 1 1 1 34 1 ...
## $ Habitat : chr "BBC" "BBC" "BBC" "BBC" ...
## - attr(*, "na.action")= 'omit' Named int [1:2] 1 372
## ..- attr(*, "names")= chr [1:2] "1" "372"
BOTH_spp_abundance$iconic_taxon_name<-as.factor(BOTH_spp_abundance$iconic_taxon_name)#change to a factor
BOTH_spp_abundance$Habitat<-as.factor(BOTH_spp_abundance$Habitat)#change to a factor
BOTH_spp_abundance$common_name<-as.factor(BOTH_spp_abundance$common_name)#change to a factor
BOTH_spp_abundance_filt <-BOTH_spp_abundance %>% #Filter and sort data
filter(duplicated(common_name)| #filters by duplicated common names and then removes any rows
duplicated(common_name, fromLast=TRUE))%>% #where there are no paired common names
arrange(Habitat,common_name) #data frame is then arranged by habitat and common name
#Remove common names listed as "unspecified". Unidentified observations share the
#"unspecified" label across different iconic taxon names, so they would be falsely
#paired between campuses; we remove these rows in order to run our paired analyses
BOTH_spp_abundance<-BOTH_spp_abundance_filt[BOTH_spp_abundance_filt$common_name !="unspecified",]
| | iconic_taxon_name | common_name | Species_Abundance | Habitat |
|---|---|---|---|---|
| 1 | Plantae | American beautyberry | 2 | BBC |
| 142 | Plantae | American beautyberry | 5 | MMC |
| 2 | Aves | American Coot | 1 | BBC |
| 143 | Aves | American Coot | 1 | MMC |
| 3 | Animalia | Animals | 4 | BBC |
| 144 | Animalia | Animals | 1 | MMC |
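The filtering above relies on a common base-R idiom. `duplicated(x)` marks the second and later copies of a value, and `duplicated(x, fromLast = TRUE)` marks the earlier copies, so combining them with `|` keeps every row whose value appears more than once. Assuming that, apart from "unspecified", each common name appears at most once per campus (one summed row per species), "appears more than once" amounts to "observed at both campuses". The sketch below runs the whole cleaning sequence on a small made-up table (not real observations), written with base-R brackets instead of dplyr's `filter()`, and shows why the "unspecified" rows have to be dropped.

```r
# Made-up combined species table (BBC rows first, then MMC rows)
spp <- data.frame(
  iconic_taxon_name = c("Aves", "Insecta", "Plantae", "Aves", "Plantae", ""),
  common_name       = c("American Coot", "unspecified", "red mangrove",
                        "American Coot", "unspecified", "Green Anole"),
  Species_Abundance = c(1, 4, 6, 2, 3, 1),
  Habitat           = c("BBC", "BBC", "BBC", "MMC", "MMC", "MMC")
)

# Blank taxon names become NA, then rows containing any NA are dropped
spp$iconic_taxon_name[spp$iconic_taxon_name == ""] <- NA
spp <- na.omit(spp)

# Keep a row only if its common name appears more than once.
# duplicated() flags later copies; fromLast = TRUE flags earlier copies,
# so OR-ing the two keeps every member of a repeated name.
paired <- spp[duplicated(spp$common_name) |
              duplicated(spp$common_name, fromLast = TRUE), ]

# "unspecified" is shared by unrelated taxa (Insecta at BBC, Plantae at MMC
# here), so it looks like a pair even though it is not the same organism
paired <- paired[paired$common_name != "unspecified", ]
print(paired)
```

Only the two American Coot rows survive: "red mangrove" has no partner at the other campus, and the two "unspecified" rows are removed even though they formed an apparent pair.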
#### BBC_timedata & MMC_timedata --> BOTH_timedata ####
BOTH_timedata<- rbind(BBC_timedata,MMC_timedata)#vertically combine df's
#if there are any blank cells under taxon column, change them to NAs
BOTH_timedata$iconic_taxon_name[BOTH_timedata$iconic_taxon_name==""]<-NA
#if there are any blank cells under common_name column, change them to "unspecified"
BOTH_timedata$common_name[BOTH_timedata$common_name==""]<-"unspecified"
BOTH_timedata<- na.omit(BOTH_timedata)#remove any NAs
summary(BOTH_timedata)#check summary --> any NAs left?
## local_time year month hour
## Length:3105 Min. :2010 Min. : 1.000 Min. : 0.00
## Class :character 1st Qu.:2018 1st Qu.: 4.000 1st Qu.:11.00
## Mode :character Median :2021 Median : 5.000 Median :13.00
## Mean :2020 Mean : 6.121 Mean :13.34
## 3rd Qu.:2021 3rd Qu.: 9.000 3rd Qu.:16.00
## Max. :2022 Max. :12.000 Max. :23.00
## iconic_taxon_name common_name Habitat
## Length:3105 Length:3105 Length:3105
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
str(BOTH_timedata)#check structure
## 'data.frame': 3105 obs. of 7 variables:
## $ local_time : chr "2015-05-19 15:00:00" "2015-02-27 11:01:00" "2015-06-01 17:30:00" "2015-09-02 07:59:00" ...
## $ year : int 2015 2015 2015 2015 2015 2015 2015 2016 2016 2017 ...
## $ month : int 5 2 6 9 9 9 12 8 10 3 ...
## $ hour : int 15 11 17 7 14 9 7 10 17 8 ...
## $ iconic_taxon_name: chr "Aves" "Animalia" "Insecta" "Arachnida" ...
## $ common_name : chr "Egyptian Goose" "unspecified" "Florida Tussock Moth" "unspecified" ...
## $ Habitat : chr "BBC" "BBC" "BBC" "BBC" ...
## - attr(*, "na.action")= 'omit' Named int [1:15] 740 810 832 859 1040 1051 1133 1215 1553 2232 ...
## ..- attr(*, "names")= chr [1:15] "740" "810" "832" "859" ...
BOTH_timedata$iconic_taxon_name<-as.factor(BOTH_timedata$iconic_taxon_name)#change to a factor
BOTH_timedata$Habitat<-as.factor(BOTH_timedata$Habitat)#change to a factor
BOTH_timedata$common_name<-as.factor(BOTH_timedata$common_name)#change to a factor
BOTH_timedata$year<-as.factor(BOTH_timedata$year)#change to a factor
BOTH_timedata$month<-as.factor(BOTH_timedata$month)#change to a factor
BOTH_timedata$hour<-as.factor(BOTH_timedata$hour)#change to a factor
#convert local_time back to a date-time (POSIXct) class in Eastern time
BOTH_timedata$local_time<-as.POSIXct(BOTH_timedata$local_time,format="%Y-%m-%d %H:%M",tz='US/Eastern')
BOTH_timedata <- BOTH_timedata[, c("Habitat","iconic_taxon_name", "common_name", "year","month", "hour","local_time")]
#timedata will be sorted later and is dependent on selection of variables
| Habitat | iconic_taxon_name | common_name | year | month | hour | local_time |
|---|---|---|---|---|---|---|
| BBC | Aves | Egyptian Goose | 2015 | 5 | 15 | 2015-05-19 15:00:00 |
| BBC | Animalia | unspecified | 2015 | 2 | 11 | 2015-02-27 11:01:00 |
| BBC | Insecta | Florida Tussock Moth | 2015 | 6 | 17 | 2015-06-01 17:30:00 |
| BBC | Arachnida | unspecified | 2015 | 9 | 7 | 2015-09-02 07:59:00 |
| BBC | Reptilia | Green Anole | 2015 | 9 | 14 | 2015-09-01 14:50:00 |
| BBC | Aves | Egyptian Goose | 2015 | 9 | 9 | 2015-09-22 09:16:00 |
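For reference, here is a small sketch of how R parses a timestamp like those in local_time and pulls out the year, month, and hour, using a single made-up value. It uses the time zone name "America/New_York"; "US/Eastern", used in the script above, is an older alias for the same zone on most systems. Both depend on the time zone database installed on your computer. The last line converts the same instant back to UTC, the time reference iNaturalist uses for time_observed_at.

```r
# One made-up local timestamp in the same text format as the local_time column
stamp <- as.POSIXct("2015-05-19 15:00:00", format = "%Y-%m-%d %H:%M:%S",
                    tz = "America/New_York")

# format() extracts individual parts of the date/time as text;
# as.integer() turns that text into a number
year  <- as.integer(format(stamp, "%Y"))
month <- as.integer(format(stamp, "%m"))
hour  <- as.integer(format(stamp, "%H"))
c(year = year, month = month, hour = hour)

# The same instant expressed in UTC (Eastern Daylight Time is UTC-4 in May)
format(stamp, tz = "UTC", usetz = TRUE)
```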
The following .csv files should now be in your working directory:
- BOTH_diversity.csv
- BOTH_spp_abundance.csv
- BOTH_timedata.csv
Now that our data has been cleaned and organized, it’s time to choose the variables that you and your group would like to investigate!
Return to the Lab 2 task sheet and follow the instructions for step
Summary table of all variables and finalized data frames
For the remainder of this document, instructional steps will be broken down by your selected data frame. Each following section will have a tab labeled “BOTH_diversity”, “BOTH_spp_abundance”, or “BOTH_timedata”. There, you will find specific code and instructions for the data frame and variables that you have chosen.
When selecting your variables, they must all be within one data frame. This framework was not designed to make data comparisons between data frames. For example, you cannot compare Effective Number of Species values from BOTH_diversity to variables in data frames BOTH_spp_abundance or BOTH_timedata.
Although the main independent variable used for analysis will be Habitat, you are encouraged to select others in order to build a stronger, more thorough image of the data. These additional variables will be applied in the “Descriptive Statistics and Visualizing Data” section. Due to the level of this course, we will not be looking at the interactions of multiple independent variables on our response variables.
Because the provided code is deliberately simple, comparisons can only be made where variables overlap between campuses. For example, Total_Abundance_per_Taxon in the BOTH_diversity data frame includes only taxon groups that were present at both habitats. At the time this script was created, “Chromista” had been observed only at BBC and not at MMC, so the several Chromista observations from BBC were removed from the data and excluded from the example calculations. Because this filtering is part of the R script, it is re-applied automatically each time you run the script on a newly exported iNaturalist .csv file.
Be sure to record your variables on your data menus!
BOTH_diversity:
Habitat and iconic_taxon_name are the only independent variable options in this data frame; however, there are several dependent variables to choose from.
Keep in mind that you should choose only one dependent variable to observe. The dependent variables in this data frame were calculated per taxon group, not for each habitat as a whole.
Remember, only taxon groups that were present in BOTH habitats were included in this data frame.
BOTH_spp_abundance:
This data frame has only one dependent variable to choose from (Species_Abundance), but it is the only data frame where you can make comparisons at the species/common name level.
Something to keep in mind during your investigation: only the species/common names that occurred at BOTH campuses were included in this data frame.
BOTH_timedata:
For the time series data, choose only one resolution of time to investigate: year, month, or hour. These data will be broken down into independent data frames with their own series of code and instructions; however, they will continue to be nested under the “BOTH_timedata” tab.
The species richness (S) in this data frame is species richness as it is conventionally defined: the total number of species observed across a habitat (here, for each habitat in each time period). This differs from how species richness is reported in “BOTH_diversity”, where it is the species count per iconic taxon name.
Remember, only time periods with observations at both campuses were included in this data frame. (e.g., for the “hour” time frame, if BBC had a 1:00 AM observation but MMC did not, that observation was excluded.)
We can quickly describe the data, both quantitatively and visually, using just a couple of lines of code. First, let’s run some quick descriptive statistics on the data frame that your group has chosen.
Descriptive Statistics:
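The first function we will apply to the data frame is summary(). For each numeric column, this function reports the minimum, 1st quartile (25th percentile), median, mean, 3rd quartile (75th percentile), and maximum; for factor columns, it reports how many rows fall into each level.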
Below is an example with the data frame “iris”. This is a practice data set that comes with R (in the datasets package, which is loaded automatically), intended for exploring how functions from various packages work. “iris” is a famous data set that gives the measurements in cm of (i) sepal length, (ii) sepal width, (iii) petal length, and (iv) petal width for 50 flowers from each of 3 species of iris: Iris setosa, Iris versicolor, and Iris virginica (150 flowers in total). The first 6 rows of this data frame are printed below.
Try it in your script! Your code is also set up to save the output of the summary as “summary_df”. Running the line of code below will save that summary as a .csv file in your working directory:
# Save a copy of your summary output as a .csv
write.csv(summary_df,"summary_df.csv", row.names = FALSE)
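We do not show how the lab script builds summary_df, so the sketch below is only one possible approach, using iris. summary() on each numeric column returns six values, and sapply() collects them into a matrix with one column per variable. Converting that matrix to a data frame with an explicit statistic column keeps the row labels (Min., Mean, and so on), which write.csv(..., row.names = FALSE) would otherwise discard. The file is written to a temporary folder here; in the lab you would write to "summary_df.csv" in your working directory instead.

```r
# Numeric columns of iris (column 5, Species, is text and is left out)
num_cols <- iris[, -5]

# summary() on each column gives Min., 1st Qu., Median, Mean, 3rd Qu., Max.;
# sapply() collects them into a 6-row matrix with one column per variable
summary_mat <- sapply(num_cols, summary)

# Turn the matrix into a data frame and keep the statistic names as a column,
# because write.csv(..., row.names = FALSE) would otherwise drop them
summary_df <- data.frame(statistic = rownames(summary_mat), summary_mat,
                         row.names = NULL)

# Write to a temporary file for this example, then read it back to check it
out_file <- file.path(tempdir(), "summary_df.csv")
write.csv(summary_df, out_file, row.names = FALSE)
read.csv(out_file)
```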
After you have attempted to run the code for your data frame, click on the corresponding data frame tab to see how your output compares. Keep in mind that your numbers may vary, since the data used in this walk-through were exported from iNaturalist in July 2021.
First six rows of df “iris”
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
Below is the summary output for data frame “iris”:
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
Below is the summary output for data frame “BOTH_diversity”:
summary(BOTH_diversity)
## Habitat iconic_taxon_name Total_Abundance_per_Taxon S
## BBC:11 Actinopterygii: 2 Min. : 4.00 Min. : 2.00
## MMC:11 Amphibia : 2 1st Qu.: 20.75 1st Qu.: 7.75
## Animalia : 2 Median : 71.50 Median : 12.50
## Arachnida : 2 Mean :142.91 Mean : 38.95
## Aves : 2 3rd Qu.:169.25 3rd Qu.: 39.25
## Fungi : 2 Max. :667.00 Max. :191.00
## (Other) :10
## H Hmax E ENS
## Min. :0.5623 Min. :0.6931 Min. :0.4686 Min. : 1.755
## 1st Qu.:1.6094 1st Qu.:2.0351 1st Qu.:0.7997 1st Qu.: 5.039
## Median :2.2454 Median :2.5249 Median :0.8327 Median : 9.445
## Mean :2.4335 Mean :2.8985 Mean :0.8305 Mean : 21.230
## 3rd Qu.:3.0067 3rd Qu.:3.6694 3rd Qu.:0.8941 3rd Qu.: 20.225
## Max. :4.6527 Max. :5.2523 Max. :0.9615 Max. :104.871
##
Below is the summary output for data frame “BOTH_spp_abundance”:
summary(BOTH_spp_abundance)
## iconic_taxon_name common_name Species_Abundance Habitat
## Plantae :124 American beautyberry: 2 Min. : 1.000 BBC:133
## Insecta : 66 American Coot : 2 1st Qu.: 1.000 MMC:133
## Aves : 28 Animals : 2 Median : 2.000
## Reptilia: 14 Anoles : 2 Mean : 5.459
## Fungi : 10 Asian Lady Beetle : 2 3rd Qu.: 5.000
## Amphibia: 6 Atala : 2 Max. :88.000
## (Other) : 18 (Other) :254
Below is the summary output for data frame “BOTH_timedata”:
summary(BOTH_timedata)
## Habitat iconic_taxon_name common_name year
## BBC:1605 Plantae :1268 unspecified : 244 2021 :1707
## MMC:1500 Insecta : 609 Brown Anole : 103 2018 : 525
## Aves : 322 dicots : 95 2019 : 326
## Reptilia : 297 Eastern Gray Squirrel: 91 2017 : 316
## Actinopterygii: 159 Green Iguana : 84 2020 : 160
## Mammalia : 130 red mangrove : 57 2016 : 47
## (Other) : 320 (Other) :2431 (Other): 24
## month hour local_time
## 4 :841 13 : 480 Min. :2010-05-28 21:06:00
## 5 :503 14 : 396 1st Qu.:2018-07-02 16:50:00
## 10 :484 10 : 366 Median :2021-02-27 14:13:00
## 2 :254 15 : 246 Mean :2020-03-11 22:34:42
## 9 :222 12 : 235 3rd Qu.:2021-05-26 15:16:00
## 11 :171 11 : 234 Max. :2022-01-07 11:55:00
## (Other):630 (Other):1148
sapply() is another function you can use to describe your data frame. It applies a function to each column in turn, so you can calculate the mean, standard deviation (sd), variance (var), min, max, median, range, and quantiles of every column.
However, with sapply() you must specify which descriptive statistic you want R to calculate; see the example below from the iris data frame. Descriptive statistic functions such as mean() and var() do not work on non-numeric columns (they either return NA with a warning or stop with an error), so we have to explicitly tell R to exclude those columns from the calculations. Additionally, sapply() is designed to work across multiple columns. If you have only one column of numerical data, you can apply the descriptive statistic function directly to that column, for example mean(iris$Sepal.Length) or mean(iris[,1]). Selecting a single column this way returns a plain vector rather than a data frame, so sapply() would apply the function to each individual value instead of to the column as a whole.
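Here is a small sketch, using iris, of what happens with a single column. Selecting one column with single brackets drops the data frame structure and returns a plain numeric vector, so sapply() runs the function on each of the 150 values instead of on the column as a whole. Adding drop = FALSE inside the brackets keeps the single column as a data frame, so sapply() behaves as it does with several columns.

```r
# Single brackets with one column index return a plain numeric vector
one_col <- iris[, 1]
is.data.frame(one_col)            # FALSE

# sapply() on a vector applies the function to every value separately
length(sapply(one_col, mean))     # 150 results, not one mean

# Applying the function directly gives the column mean
mean(iris[, 1])

# drop = FALSE keeps the single column as a data frame, so sapply() works as usual
sapply(iris[, 1, drop = FALSE], mean)
```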
Square brackets [ , ] after a data frame name tell R that you are requesting a subset of that data frame. Values entered before the comma refer to rows; values entered after the comma refer to columns. Numbers indicate which rows or columns you are requesting: the top row of a data frame (excluding column names) is row 1, and the left-most column is column 1. Positive numbers mean that you want those rows or columns included, while negative numbers mean that you want those rows or columns excluded. If a position is left blank, all rows (or all columns) are kept. In the example below, we want to exclude column 5, the species name column, since it is not numeric data.
To run this code with your data, you will need to specify which columns R should exclude. To exclude multiple columns, use the format df[,-c(#:#)], where df is the name of your data frame, c() combines the column numbers into one vector, the minus sign tells R to exclude them, and #:# is a range of columns from one number to another. For example, to exclude columns 1, 3, and 5-7 of a data frame “df”, you would write df[,-c(1,3,5:7)]. You could also write this in terms of which columns you wish to keep: if our make-believe data frame df had 8 columns, the same subset could be written df[,c(2,4,8)].
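To see these indexing rules in action, the sketch below builds a made-up data frame with 8 columns (named a through h) and checks that dropping columns 1, 3, and 5-7 gives the same result as keeping columns 2, 4, and 8.

```r
# A made-up data frame with 3 rows and 8 columns named a to h
df <- as.data.frame(matrix(1:24, nrow = 3,
                           dimnames = list(NULL, letters[1:8])))

# Negative indices drop columns: drop 1, 3, and 5 through 7
dropped <- df[, -c(1, 3, 5:7)]
names(dropped)                    # "b" "d" "h"

# Positive indices keep columns: the same subset written as columns to keep
kept <- df[, c(2, 4, 8)]
identical(dropped, kept)          # TRUE

# Rows go before the comma: rows 1 and 2, all columns (blank after the comma)
df[1:2, ]
```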
After you have attempted to run your script, check the tabs to see how your outputs compare.
Below is the summary output for data frame “iris”:
sapply(iris[,-5], mean, na.rm=TRUE)
sapply(iris[,-5], sd, na.rm=TRUE)
sapply(iris[,-5], var, na.rm=TRUE)
sapply(iris[,-5], min, na.rm=TRUE)
sapply(iris[,-5], max, na.rm=TRUE)
sapply(iris[,-5], median, na.rm=TRUE)
sapply(iris[,-5], range, na.rm=TRUE)
sapply(iris[,-5], quantile, na.rm=TRUE)
sapply(iris[,-5], mean, na.rm=TRUE)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 5.843333 3.057333 3.758000 1.199333
sapply(iris[,-5], sd, na.rm=TRUE)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 0.8280661 0.4358663 1.7652982 0.7622377
sapply(iris[,-5], var, na.rm=TRUE)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 0.6856935 0.1899794 3.1162779 0.5810063
sapply(iris[,-5], min, na.rm=TRUE)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 4.3 2.0 1.0 0.1
sapply(iris[,-5], max, na.rm=TRUE)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 7.9 4.4 6.9 2.5
sapply(iris[,-5], median, na.rm=TRUE)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 5.80 3.00 4.35 1.30
sapply(iris[,-5], range, na.rm=TRUE)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## [1,] 4.3 2.0 1.0 0.1
## [2,] 7.9 4.4 6.9 2.5
sapply(iris[,-5], quantile, na.rm=TRUE)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 0% 4.3 2.0 1.00 0.1
## 25% 5.1 2.8 1.60 0.3
## 50% 5.8 3.0 4.35 1.3
## 75% 6.4 3.3 5.10 1.8
## 100% 7.9 4.4 6.90 2.5
Below is the summary output for data frame “BOTH_diversity”:
For this data frame, columns 1 and 2 need to be excluded.
sapply(BOTH_diversity[,-c(1:2)], mean, na.rm=TRUE)
sapply(BOTH_diversity[,-c(1:2)], sd, na.rm=TRUE)
sapply(BOTH_diversity[,-c(1:2)], var, na.rm=TRUE)
sapply(BOTH_diversity[,-c(1:2)], min, na.rm=TRUE)
sapply(BOTH_diversity[,-c(1:2)], max, na.rm=TRUE)
sapply(BOTH_diversity[,-c(1:2)], median, na.rm=TRUE)
sapply(BOTH_diversity[,-c(1:2)], range, na.rm=TRUE)
sapply(BOTH_diversity[,-c(1:2)], quantile, na.rm=TRUE)
sapply(BOTH_diversity[,-c(1:2)], mean, na.rm=TRUE)
## Total_Abundance_per_Taxon S H
## 142.909091 38.954545 2.433460
## Hmax E ENS
## 2.898452 0.830467 21.230337
sapply(BOTH_diversity[,-c(1:2)], sd, na.rm=TRUE)
## Total_Abundance_per_Taxon S H
## 189.4784594 51.7930656 1.1466292
## Hmax E ENS
## 1.2754362 0.1065721 25.9064995
sapply(BOTH_diversity[,-c(1:2)], var, na.rm=TRUE)
## Total_Abundance_per_Taxon S H
## 3.590209e+04 2.682522e+03 1.314759e+00
## Hmax E ENS
## 1.626737e+00 1.135761e-02 6.711467e+02
sapply(BOTH_diversity[,-c(1:2)], min, na.rm=TRUE)
## Total_Abundance_per_Taxon S H
## 4.0000000 2.0000000 0.5623351
## Hmax E ENS
## 0.6931472 0.4685886 1.7547654
sapply(BOTH_diversity[,-c(1:2)], max, na.rm=TRUE)
## Total_Abundance_per_Taxon S H
## 667.0000000 191.0000000 4.6527333
## Hmax E ENS
## 5.2522734 0.9615338 104.8712430
sapply(BOTH_diversity[,-c(1:2)], range, na.rm=TRUE)
## Total_Abundance_per_Taxon S H Hmax E ENS
## [1,] 4 2 0.5623351 0.6931472 0.4685886 1.754765
## [2,] 667 191 4.6527333 5.2522734 0.9615338 104.871243
sapply(BOTH_diversity[,-c(1:2)], quantile, na.rm=TRUE)
## Total_Abundance_per_Taxon S H Hmax E ENS
## 0% 4.00 2.00 0.5623351 0.6931472 0.4685886 1.754765
## 25% 20.75 7.75 1.6094040 2.0350789 0.7996905 5.038657
## 50% 71.50 12.50 2.2454319 2.5249280 0.8327292 9.445303
## 75% 169.25 39.25 3.0066713 3.6693891 0.8941188 20.225355
## 100% 667.00 191.00 4.6527333 5.2522734 0.9615338 104.871243
Below is the summary output for data frame “BOTH_spp_abundance”:
Since BOTH_spp_abundance has only one numeric column, we must adjust how we write the command, because “apply” functions are designed to work across multiple columns. Instead, we can simply use the functions mean(), sd(), var(), etc. directly on column 3 (Species_Abundance):
mean(BOTH_spp_abundance[,3])
sd(BOTH_spp_abundance[,3])
var(BOTH_spp_abundance[,3])
min(BOTH_spp_abundance[,3])
max(BOTH_spp_abundance[,3])
median(BOTH_spp_abundance[,3])
range(BOTH_spp_abundance[,3])
quantile(BOTH_spp_abundance[,3])
mean(BOTH_spp_abundance[,3])
## [1] 5.458647
sd(BOTH_spp_abundance[,3])
## [1] 10.543
var(BOTH_spp_abundance[,3])
## [1] 111.1549
min(BOTH_spp_abundance[,3])
## [1] 1
max(BOTH_spp_abundance[,3])
## [1] 88
median(BOTH_spp_abundance[,3])
## [1] 2
range(BOTH_spp_abundance[,3])
## [1] 1 88
quantile(BOTH_spp_abundance[,3])
## 0% 25% 50% 75% 100%
## 1 1 2 5 88
Below is the summary output for data frame “BOTH_timedata”:
Although BOTH_timedata has numerical data, they are stored in our data frame as “factors” or as “POSIXct” (date-time). You can check the class of a column in a data frame with class(df$), where df is the name of the data frame and you type the name of the column after the $, for example: class(BOTH_timedata$year).
If we wanted to, we could change the class of the columns “year”, “month”, and “hour” to a numeric class using as.numeric() or as.integer(). We could use sapply() to change the class of all 3 columns at once with sapply(BOTH_timedata[,c(4:6)], as.integer), or change an individual column with as.integer(BOTH_timedata$), with the name of the selected column entered after the $. Be aware that when as.integer() is applied to a factor, it returns the factor’s internal level codes (1, 2, 3, ...) rather than the original values; this is why the year 2015 appears as 3 in the output below. To recover the actual years, months, or hours, convert to character first, for example as.integer(as.character(BOTH_timedata$year)).
So we don’t mess up our data for later applications, let’s run sapply() but store the result in a new object called “test_timedata”. This creates a matrix with only 3 columns: year, month, and hour. Because sapply() treats a matrix as a set of individual values rather than as columns, we first need to turn “test_timedata” back into a data frame with as.data.frame(). Note: if we wanted to change the class of these columns within BOTH_timedata itself, without creating a new matrix, we would assign the result back to the same columns, for example: BOTH_timedata[,c(4:6)]<-sapply(BOTH_timedata[,c(4:6)], as.integer).
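The level-code behavior of factors is easy to check on a small made-up factor. When R creates a factor, it stores each value as an integer code that points into a sorted list of levels, and as.integer() returns those codes. Converting to character first recovers the original numbers.

```r
# A made-up factor of years, like the year column after as.factor()
yr <- factor(c(2015, 2018, 2015, 2021))
levels(yr)                         # "2015" "2018" "2021"

# as.integer() returns the internal level codes, not the years
as.integer(yr)                     # 1 2 1 3

# Convert to character first to get the original numbers back
as.integer(as.character(yr))       # 2015 2018 2015 2021
```

Applied to the lab data, the same idea would look something like sapply(BOTH_timedata[,c(4:6)], function(x) as.integer(as.character(x))), which gives actual years, months, and hours rather than level codes.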
class(BOTH_timedata$year)
## [1] "factor"
test_timedata<-sapply(BOTH_timedata[,c(4:6)], as.integer)
head(test_timedata)
## year month hour
## [1,] 3 5 14
## [2,] 3 2 10
## [3,] 3 6 16
## [4,] 3 9 6
## [5,] 3 9 13
## [6,] 3 9 8
class(test_timedata)
## [1] "matrix" "array"
test_timedata<-as.data.frame(test_timedata)
class(test_timedata)
## [1] "data.frame"
class(test_timedata$year)
## [1] "integer"
class(test_timedata$month)
## [1] "integer"
class(test_timedata$hour)
## [1] "integer"
sapply(test_timedata, mean, na.rm=TRUE)
sapply(test_timedata, sd, na.rm=TRUE)
sapply(test_timedata, var, na.rm=TRUE)
sapply(test_timedata, min, na.rm=TRUE)
sapply(test_timedata, max, na.rm=TRUE)
sapply(test_timedata, median, na.rm=TRUE)
sapply(test_timedata, range, na.rm=TRUE)
sapply(test_timedata, quantile, na.rm=TRUE)
sapply(test_timedata, mean, na.rm=TRUE)
## year month hour
## 7.720451 6.121095 12.346860
sapply(test_timedata, sd, na.rm=TRUE)
## year month hour
## 1.601923 2.993252 3.275546
sapply(test_timedata, var, na.rm=TRUE)
## year month hour
## 2.566157 8.959558 10.729198
sapply(test_timedata, min, na.rm=TRUE)
## year month hour
## 1 1 1
sapply(test_timedata, max, na.rm=TRUE)
## year month hour
## 10 12 22
sapply(test_timedata, median, na.rm=TRUE)
## year month hour
## 9 5 12
sapply(test_timedata, range, na.rm=TRUE)
## year month hour
## [1,] 1 1 1
## [2,] 10 12 22
sapply(test_timedata, quantile, na.rm=TRUE)
## year month hour
## 0% 1 1 1
## 25% 6 4 10
## 50% 9 5 12
## 75% 9 9 15
## 100% 10 12 22
Visualizing Data:
Bar plots are figures that show the relationship between a categorical variable (the groups along the x-axis) and a numerical value (the height of each bar).
Below is an example from the data frame “ToothGrowth”, which looks at the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
First six rows of df “ToothGrowth”:
| len | supp | dose |
|---|---|---|
| 4.2 | VC | 0.5 |
| 11.5 | VC | 0.5 |
| 7.3 | VC | 0.5 |
| 5.8 | VC | 0.5 |
| 6.4 | VC | 0.5 |
| 10.0 | VC | 0.5 |
First, we make a copy of ToothGrowth called “TG” and change “dose” from a numeric value to a factor using as.factor().
Then we plug our variables into the graphic function ggplot().
ggplot(df, aes(x=IV, y=DV,
fill=IV))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab(" ") +
ylab(" ") +
ggtitle(" ")
df = data frame
x = your IV
y = your DV
fill = which column you would like the color of the bars to be based off of.
xlab = x-axis label
ylab = y-axis label
ggtitle = main plot title
#First make a duplicate of the ToothGrowth df, then convert dose to a factor
TG<- ToothGrowth
TG$dose<-as.factor(TG$dose)
ggplot(TG, aes(x=dose, y=len,
fill=supp))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab("Dose (mg)") +
ylab("Tooth length") +
ggtitle("The Effect of Vitamin C on Tooth Growth in Guinea Pigs")
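One caution about geom_bar(stat = "identity"): ggplot2 draws one bar per row of data. ToothGrowth has 10 guinea pigs for each supplement and dose combination, so with position_dodge() those 10 bars are, as far as we understand ggplot2’s behavior, drawn on top of one another, and the visible bar height reflects the largest value in each group rather than an average. The same applies to any plot in this section where more than one row shares the same x value and fill color. If you want the bars to show means, one option is to summarise first; the base-R sketch below uses aggregate() to compute the mean tooth length for each group.

```r
# Copy ToothGrowth and treat dose as a category, as in the example above
TG <- ToothGrowth
TG$dose <- as.factor(TG$dose)

# Mean tooth length for each supplement x dose combination (6 groups of 10 animals)
TG_means <- aggregate(len ~ supp + dose, data = TG, FUN = mean)
TG_means
```

TG_means has one row per bar, so it could be used in place of TG in the ggplot() call above. Another option, in recent versions of ggplot2, is to let the plot compute the means with stat_summary(fun = mean, geom = "bar", position = position_dodge()) in place of geom_bar().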
df = data frame
x = your IV
y = your DV
fill = which column you would like the color of the bars to be based off of.
xlab = x-axis label
ylab = y-axis label
ggtitle = main plot title
facet_wrap(~ ) Breaks the plot into smaller sections based on an additional IV
bar_df<- ggplot(df, aes(x=IV, y=DV,
fill=IV))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab(" ") +
ylab(" ") +
ggtitle(" ")+
facet_wrap(~ IV)
Example Bar plot for BOTH_diversity
df = BOTH_diversity
x = Habitat
y = S
fill = Habitat
xlab = Habitat
ylab = Species Richness per Iconic Taxon Group
ggtitle = The Effect of Habitat on Species Richness per Iconic Taxon Group
facet_wrap(~) -> facet_wrap(~ iconic_taxon_name)
bar_BOTH_diversity<-ggplot(BOTH_diversity, aes(x=Habitat, y=S,
fill=Habitat))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab("Habitat") +
ylab("Species Richness per Iconic Taxon Group") +
ggtitle("The Effect of Habitat on the Species Richness per Iconic Taxon Group")+
facet_wrap(~ iconic_taxon_name)
bar_BOTH_diversity
ggsave("bar_BOTH_diversity.png",
plot = bar_BOTH_diversity,
width = 30,
height = 20,
units = "cm")
Example Barplot for BOTH_spp_abundance
df = BOTH_spp_abundance
x = Habitat
y = Species_Abundance
fill = Habitat
xlab = Habitat
ylab = Number of Observations per Common Name
ggtitle = The Effect of Habitat on Species Abundance per Iconic Taxon Group
facet_wrap(~) -> facet_wrap(~ iconic_taxon_name)
bar_BOTH_spp_abundance<-ggplot(BOTH_spp_abundance, aes(x=Habitat, y=Species_Abundance,
fill=Habitat))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab("Habitat") +
ylab("Number of Observations per Common Name") +
ggtitle("The Effect of Habitat on Species Abundance per Iconic Taxon Group")+
facet_wrap(~ iconic_taxon_name)
bar_BOTH_spp_abundance
ggsave("bar_BOTH_spp_abundance.png",
plot = bar_BOTH_spp_abundance,
width = 30,
height = 20,
units = "cm")
Before running code in this section, you need to first choose which time variable you would like to observe: year, month, or hour.
year:
If you want to see how the species richness per habitat varies by year, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_year”
BOTH_year <- BOTH_timedata %>%
group_by(Habitat,year) %>%
summarise(S= length(unique(common_name)))
BOTH_year<-group_by(BOTH_year, year) %>%
filter(n() != 1) %>%
arrange(Habitat,year) %>%
ungroup()
month:
If you want to see how the species richness per habitat varies by month, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_month”
BOTH_month <- BOTH_timedata %>%
group_by(Habitat,month) %>%
summarise(S= length(unique(common_name)))
BOTH_month<-group_by(BOTH_month, month) %>%
filter(n() != 1) %>%
arrange(Habitat,month) %>%
ungroup()
hour:
If you want to see how the species richness per habitat varies by hour, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_hour”
BOTH_hour <- BOTH_timedata %>%
group_by(Habitat,hour) %>%
summarise(S= length(unique(common_name)))
BOTH_hour<-group_by(BOTH_hour, hour) %>%
filter(n() != 1) %>%
arrange(Habitat,hour) %>%
ungroup()
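Here is a small, made-up example of what the grouping and filtering steps above do. The data frame `toy_timedata` and its values are hypothetical (not real iNaturalist records); the sketch only assumes the same three columns used above (`Habitat`, `year`, and `common_name`) and that the dplyr package is installed. Note how years observed at only one campus are dropped by `filter(n() != 1)`.

```r
library(dplyr)

# Hypothetical observations (not real data): two habitats, three years
toy_timedata <- data.frame(
  Habitat     = c("BBC", "BBC", "BBC", "MMC", "MMC", "BBC"),
  year        = c(2020, 2020, 2021, 2021, 2021, 2022),
  common_name = c("Brown Pelican", "Brown Pelican", "Mangrove Crab",
                  "Brown Anole", "Monk Parakeet", "Great Egret")
)

# Species richness (S) = number of distinct common names per Habitat and year;
# repeat sightings of the same species are only counted once
toy_year <- toy_timedata %>%
  group_by(Habitat, year) %>%
  summarise(S = length(unique(common_name)), .groups = "drop")

# Keep only years that appear in more than one habitat,
# so that BBC and MMC can be compared side by side
toy_year <- group_by(toy_year, year) %>%
  filter(n() != 1) %>%
  arrange(Habitat, year) %>%
  ungroup()

toy_year
```

Only 2021 survives here, because 2020 and 2022 each have observations from just one campus.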
df = data frame
x = your IV
y = your DV
fill = which column you would like the color of the bars to be based on.
xlab = x-axis label
ylab = y-axis label
ggtitle = main plot title
facet_wrap(~ ) breaks the plot into smaller sections (panels) based on an additional IV
bar_df<- ggplot(df, aes(x=IV, y=DV,
fill=IV))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab(" ") +
ylab(" ") +
ggtitle(" ")+
facet_wrap(~ IV)
Example Plot:
df = the data frame you chose
→ replace “bar_df” with bar_“name of your data frame”
→ e.g. BOTH_year, BOTH_month, BOTH_hour → bar_BOTH_year, bar_BOTH_month, bar_BOTH_hour
IV = the column of your IV [this will be Habitat]
IV-time = the unit of time you chose (year, month, hour)
DV = the column of your DV [this will be S]
fill = Which column you want to designate the bar colors
ggtitle = The title you assign the main plot
xlab = Text that you want as your x-axis label
ylab = Text that you want as your y-axis label
ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
→ you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height; here we chose centimeters, “cm”
NOTE: If the code has " " around text, make sure you keep them there! The code won’t run without them.
bar_df<-ggplot(df,
aes(x=IV,
y=DV,
fill=IV))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab("IV axis title") +
ylab("DV axis title") +
facet_wrap(~ IV-time) +
ggtitle("Title of your figure")
bar_df
ggsave("bar_df.png",
plot = bar_df,
width = 10,
height = 15,
units = "cm")
Output for bar_BOTH_year
BOTH_year <- BOTH_timedata %>%
group_by(Habitat,year) %>%
summarise(S= length(unique(common_name)))
BOTH_year<-group_by(BOTH_year, year) %>%
filter(n() != 1) %>%
arrange(Habitat,year) %>%
ungroup()
#df<- BOTH_year
#IV<- Habitat
#IV-time <- year
#DV<- S
#ggtitle <- Difference in species richness observed across years
#xlab = "Habitat"
#ylab = "Species richness observed each year"
#fill = "Habitat"
bar_BOTH_year<-ggplot(BOTH_year,
aes(x=Habitat,
y=S,
fill=Habitat))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab("Habitat") +
ylab("Species Richness") +
facet_wrap(~ year)+
#ggtitle("Difference in species richness observed across years")
labs(title="Difference in species richness\n observed across years")+
theme(plot.title = element_text(lineheight = 0.9))
bar_BOTH_year
ggsave("bar_BOTH_year.png",
plot = bar_BOTH_year,
width = 10,
height = 15,
units = "cm")
## Here, the title was too long, so I split it up with "\n" ,
## Used labs(title=" ") instead of our ggtitle line of code,
## and adjusted the space between lines by specifying the
## argument "lineheight" with the theme function element_text().
Output for bar_BOTH_month
BOTH_month <- BOTH_timedata %>%
group_by(Habitat,month) %>%
summarise(S= length(unique(common_name)))
BOTH_month<-group_by(BOTH_month, month) %>%
filter(n() != 1) %>%
arrange(Habitat,month) %>%
ungroup()
#df<- BOTH_month
#IV<- Habitat
#IV-time <- month
#DV<- S
#ggtitle <- Difference in species richness observed across months
#xlab = "Habitat"
#ylab = "Species richness observed each month"
#fill = "Habitat"
bar_BOTH_month<-ggplot(BOTH_month,
aes(x=Habitat,
y=S,
fill=Habitat))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab("Habitat") +
ylab("Species Richness") +
facet_wrap(~ month)+
ggtitle("Difference in species richness observed across months")
bar_BOTH_month
ggsave("bar_BOTH_month.png",
plot = bar_BOTH_month,
width = 15,
height = 15,
units = "cm")
Output for bar_BOTH_hour
BOTH_hour <- BOTH_timedata %>%
group_by(Habitat,hour) %>%
summarise(S= length(unique(common_name)))
BOTH_hour<-group_by(BOTH_hour, hour) %>%
filter(n() != 1) %>%
arrange(Habitat,hour) %>%
ungroup()
#df<- BOTH_hour
#IV<- Habitat
#IV-time <- hour
#DV<- S
#ggtitle <- Difference in species richness observed across different times of day
#xlab = "Habitat"
#ylab = "Species richness"
#fill = "Habitat"
bar_BOTH_hour<-ggplot(BOTH_hour,
aes(x=Habitat,
y=S,
fill=Habitat))+
geom_bar(stat= "identity",
position=position_dodge()) +
xlab("Habitat") +
ylab("Species Richness") +
facet_wrap(~ hour)+
#ggtitle("Difference in species richness observed across hours")
labs(title="Difference in species richness observed across \n different times of day")+
theme(plot.title = element_text(lineheight = 0.9))
bar_BOTH_hour
ggsave("bar_BOTH_hour.png",
plot = bar_BOTH_hour,
width = 15,
height = 15,
units = "cm")
## Here, the title was too long, so I split it up with "\n" ,
## Used labs(title=" ") instead of our ggtitle line of code,
## and adjusted the space between lines by specifying the
## argument "lineheight" with the theme function element_text().
A dot chart, also known as a lollipop plot, is a style of bar plot that shows the relationship between numeric and categorical data. This style of chart can be useful when a standard bar plot would have several bars of similar height: it relieves clutter and makes differences easier to discern.
Motor Trend Car Road Tests:
This is another example data frame built into R. The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
A data frame with 32 observations on 11 (numeric) variables:
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
First six rows of df “mtcars”:
| Model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
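For comparison, the same kind of lollipop chart can be drawn with plain ggplot2, using `geom_segment()` for the sticks and `geom_point()` for the dots. This is only a rough sketch of a similar picture using `mtcars`, not a description of how `ggdotchart()` is implemented; the object names `cars_lollipop` and `lollipop` are made up for this example.

```r
library(ggplot2)

# Copy mtcars, keep the car names as a column, and treat cylinders as categories
cars_lollipop      <- mtcars
cars_lollipop$name <- rownames(cars_lollipop)
cars_lollipop$cyl  <- as.factor(cars_lollipop$cyl)

# Order the car names by mpg so the lollipops appear sorted
cars_lollipop$name <- reorder(cars_lollipop$name, cars_lollipop$mpg)

lollipop <- ggplot(cars_lollipop, aes(x = name, y = mpg, color = cyl)) +
  geom_segment(aes(xend = name, y = 0, yend = mpg)) + # stick from 0 to the value
  geom_point(size = 3) +                              # dot at the value
  coord_flip() +                                      # horizontal layout
  labs(x = "Vehicle Make and Model", y = "mpg", color = "Cylinders")

lollipop
```

The extra options in `ggdotchart()` (sorting, grouping, labels, themes) save you from building these layers by hand.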
ggdotchart(df, x = " ", y = " ",
xlab = " ",
ylab = " ",
title= "",
legend.title = " ",
color = " ",
palette = c(" "),
sorting = " ", # Sort value in de-/ascending order
add = "segments", # Add segments from y = 0 to dot
rotate = FALSE, # Rotate vertically
group = " ", # Order by groups instead of sorting above
facet.by = " ",
point.size = 20,
position= position_dodge(0),
dot.size = 8, # Large dot size
label = round(df$ ), # Add values as dot labels
font.label = list(color = "black", # Adjust label parameters
size = 10,
vjust = 0.5,
position = position_dodge(0.9)),
ggtheme = theme_pubr() ) # ggplot2 theme
df = cars (a copy of mtcars with a “name” column added)
x = name
y = mpg
xlab = Vehicle Make and Model
ylab = mpg
title = Comparison of fuel efficiency across various vehicles
legend.title= Cylinders
color = cyl
rotate = TRUE
group = cyl
facet.by = #not faceted
ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
→ you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height; here we chose centimeters, “cm”
NOTE: If the code has " " around text, make sure you keep them there! The code won’t run without them.
# First make duplicate of the mtcars df, then convert "cyl" to a factor
cars<- mtcars
cars$cyl<-as.factor(cars$cyl)
# Add a column with the row names of mtcars (or our new df, cars)
cars$name<-rownames(cars)
# Now we're ready!
dot_cars<- ggdotchart(cars, x = "name", y = "mpg",
xlab = "Vehicle Make and Model",
ylab = "mpg",
#title= "Comparison of fuel efficiency across various vehicles",
legend.title = "Cylinders",
color = "cyl", # Color by groups
palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette
sorting = "descending", # Sort value in descending order
add = "segments", # Add segments from y = 0 to dot
rotate = TRUE, # Rotate vertically
group = "cyl", # Order by groups
#facet.by = " ",
point.size = 20,
position= position_dodge(0),
dot.size = 6, # Large dot size
label = round(cars$mpg), # Add values as dot labels
font.label = list(color = "black", # Adjust label parameters
size = 9,
vjust = 0.5,
position = position_dodge(0.9)),
ggtheme = theme_pubr() ) + # ggplot2 theme
labs(title="Comparison of fuel efficiency \n across various vehicles")+
theme(plot.title = element_text(lineheight = 0.7))
dot_cars
ggsave("dot_cars.png",
plot = dot_cars,
width = 15,
height = 25,
units = "cm")
Output for ggdotchart(cars):
df = data frame
x = your IV
y = your DV
xlab = x-axis label
ylab = y-axis label
title = main plot title
legend.title= title of the legend
color = which column you would like the color of the dots to be based on (IV)
rotate = if you would like to change the direction, change to “TRUE”
group = How you would like the data sorted (one of your IV columns)
→ If you’d rather the results be sorted by ascending or descending values, you can comment out this line with a “#”, as shown below, and the dots will be sorted as designated by sorting = " "
facet.by = IV that breaks the overall plot into smaller, grouped sections
ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
→ you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height; here we chose centimeters, “cm”
NOTE: If the code has " " around text, make sure you keep them there! The code won’t run without them.
dot_df<- ggdotchart(df, x = "IV", y = "DV",
xlab = "IV axis title",
ylab = "DV axis title",
title= "Main figure title",
legend.title = "legend title (IV)",
color = "IV", # Color by groups
palette = c("3d"), # Custom color palette
sorting = "descending", # Sort value in descending order
add = "segments", # Add segments from y = 0 to dot
rotate = FALSE, # Rotate vertically
group = "IV", # Order by groups [instead of defined by sorting]
facet.by = "IV",
point.size = 20,
position= position_dodge(0),
dot.size = 8, # Large dot size
label = round(df$DV), # Add values as dot labels
font.label = list(color = "black",
size = 10,
vjust = 0.5,
position = position_dodge(0.9)),
# Adjust label parameters
ggtheme = theme_pubr() ) # ggplot2 theme
df = BOTH_diversity
x = iconic_taxon_name
y = S
xlab = iNaturalist Iconic Taxon Groups
ylab = Species Richness per Iconic Taxon Group
title = Comparison of Species Richness per Iconic Taxon Group at BBC vs MMC
legend.title= Iconic Taxon Groups
color = iconic_taxon_name
rotate = FALSE
group = I chose to sort by “descending” instead of by group
facet.by = Habitat
dot_BOTH_diversity<- ggdotchart(BOTH_diversity, x = "iconic_taxon_name", y = "S",
xlab = "iNaturalist Iconic Taxon Groups",
ylab = "Species Richness per Iconic Taxon Group",
title= "Comparison of Species Richness per Iconic Taxon Group at BBC vs MMC",
legend.title = "Iconic Taxon Groups",
color = "iconic_taxon_name", # Color by groups
palette = c("3d"), # Custom color palette #"#00AFBB", "#FC4E07"
sorting = "descending", # Sort value in descending order
add = "segments", # Add segments from y = 0 to dot
rotate = FALSE, # Rotate vertically
# group = "iconic_taxon_name", # Order by groups
facet.by = "Habitat",
point.size = 20,
position= position_dodge(0),
dot.size = 8, # Large dot size
label = round(BOTH_diversity$S), # Add values as dot labels
font.label = list(color = "black", # Adjust label parameters
size = 10,
vjust = 0.5,
position = position_dodge(0.9)),
ggtheme = theme_pubr() ) # ggplot2 theme
dot_BOTH_diversity
ggsave("dot_BOTH_diversity.png",
plot = dot_BOTH_diversity,
width = 25,
height = 20,
units = "cm")
df = data frame
x = your IV
y = your DV
xlab = x-axis label
ylab = y-axis label
title = main plot title
legend.title= title of the legend
color = which column you would like the color of the dots to be based on (IV)
rotate = if you would like to change the direction, change to “TRUE”
group = How you would like the data sorted (one of your IV columns)
→ If you’d rather the results be sorted by ascending or descending values, you can comment out this line with a “#”, as shown below, and the dots will be sorted as designated by sorting = " "
facet.by = IV that breaks the overall plot into smaller, grouped sections
ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
→ you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height; here we chose centimeters, “cm”
NOTE: If the code has " " around text, make sure you keep them there! The code won’t run without them.
dot_df<- ggdotchart(df, x = "IV", y = "DV",
xlab = "IV axis title",
ylab = "DV axis title",
title= "Main figure title",
legend.title = "legend title (IV)",
color = "IV", # Color by groups
palette = c("3d"), # Custom color palette
sorting = "descending", # Sort value in descending order
add = "segments", # Add segments from y = 0 to dot
rotate = FALSE, # Rotate vertically
group = "IV", # Order by groups [instead of defined by sorting]
facet.by = "IV",
point.size = 20,
position= position_dodge(0),
dot.size = 8, # Large dot size
label = round(df$DV), # Add values as dot labels
font.label = list(color = "black",
size = 10,
vjust = 0.5,
position = position_dodge(0.9)),
# Adjust label parameters
ggtheme = theme_pubr() ) # ggplot2 theme
df = BOTH_spp_abundance
x = common_name
y = Species_Abundance
xlab = Common Names of Observed Organisms
ylab = Species Abundance
title = Comparison of Species Abundance at BBC vs MMC
legend.title= Iconic Taxon Groups
color = iconic_taxon_name
rotate = TRUE
group = iconic_taxon_name
facet.by = Habitat
dot_BOTH_spp_abundance<- ggdotchart(BOTH_spp_abundance, x = "common_name", y = "Species_Abundance",
xlab = "Common Names of Observed Organisms",
ylab = "Species Abundance",
title= "Comparison of Species Abundance at BBC vs MMC",
legend.title = "Iconic Taxon Groups",
color = "iconic_taxon_name", # Color by groups
palette = c("3d"), # Custom color palette #"#00AFBB", "#FC4E07"
sorting = "descending", # Sort value in descending order
add = "segments", # Add segments from y = 0 to dot
rotate = TRUE, # Rotate vertically
group = "iconic_taxon_name", # Order by groups
facet.by = "Habitat",
point.size = 20,
position= position_dodge(0),
dot.size = 8, # Large dot size
label = round(BOTH_spp_abundance$Species_Abundance), # Add values as dot labels
font.label = list(color = "black", # Adjust label parameters
size = 10,
vjust = 0.5,
position = position_dodge(0.9)),
ggtheme = theme_pubr() ) # ggplot2 theme
dot_BOTH_spp_abundance
ggsave("dot_BOTH_spp_abundance.png",
plot = dot_BOTH_spp_abundance,
width = 30,
height = 120,
units = "cm")
Before running the code in this section, first choose which time variable you would like to examine: year, month, or hour.
year
If you want to see how the species richness per habitat varies by year, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_year”
BOTH_year <- BOTH_timedata %>%
group_by(Habitat,year) %>%
summarise(S= length(unique(common_name)))
BOTH_year<-group_by(BOTH_year, year) %>%
filter(n() != 1) %>%
arrange(Habitat,year) %>%
ungroup()
month
If you want to see how the species richness per habitat varies by month, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_month”
BOTH_month <- BOTH_timedata %>%
group_by(Habitat,month) %>%
summarise(S= length(unique(common_name)))
BOTH_month<-group_by(BOTH_month, month) %>%
filter(n() != 1) %>%
arrange(Habitat,month) %>%
ungroup()
hour
If you want to see how the species richness per habitat varies by hour, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_hour”
BOTH_hour <- BOTH_timedata %>%
group_by(Habitat,hour) %>%
summarise(S= length(unique(common_name)))
BOTH_hour<-group_by(BOTH_hour, hour) %>%
filter(n() != 1) %>%
arrange(Habitat,hour) %>%
ungroup()
BOTH_hour$hour <- as.factor(BOTH_hour$hour) #converts hours to factors (categories)
BOTH_hour$hour <- droplevels(BOTH_hour$hour,2) #removes the factor level "2"; rows from hour 2 become NA
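The second argument of `droplevels()` is `exclude`, so `droplevels(BOTH_hour$hour, 2)` removes the factor level "2" and turns any observations from that hour into `NA`. The short sketch below shows the effect on a hypothetical vector of hours (`hour` and `hour_dropped` are made-up names); check whether this is what you want for your own data before reusing that line.

```r
# Hypothetical hours of observation (0 = midnight, 2 = 2:00 AM, 5 = 5:00 AM)
hour <- as.factor(c(0, 1, 2, 2, 5))
levels(hour)          # "0" "1" "2" "5"

# droplevels(x, 2) passes 2 as `exclude`, so the level "2" is removed
hour_dropped <- droplevels(hour, 2)
levels(hour_dropped)  # "0" "1" "5"
is.na(hour_dropped)   # FALSE FALSE TRUE TRUE FALSE
```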
df = data frame
x = your IV
y = your DV
xlab = x-axis label
ylab = y-axis label
title = main plot title
legend.title= title of the legend
color = which column you would like the color of the dots to be based on (IV)
rotate = if you would like to change the direction, change to “TRUE”
group = How you would like the data sorted (one of your IV columns)
→ If you’d rather the results be sorted by ascending or descending values, you can comment out this line with a “#”, as shown below, and the dots will be sorted as designated by sorting = " "
facet.by = IV that breaks the overall plot into smaller, grouped sections
ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
→ you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height; here we chose centimeters, “cm”
NOTE: If the code has " " around text, make sure you keep them there! The code won’t run without them.
dot_df<- ggdotchart(df, x = "IV", y = "DV",
xlab = "IV axis title",
ylab = "DV axis title",
title= "Main figure title",
legend.title = "legend title (IV)",
color = "IV", # Color by groups
palette = c("3d"), # Custom color palette
sorting = "descending", # Sort value in descending order
id = "IV-2",
add = "segments", # Add segments from y = 0 to dot
rotate = FALSE, # Rotate vertically
group = "IV", # Order by groups [instead of defined by sorting]
facet.by = "IV",
point.size = 20,
position= position_dodge(0),
dot.size = 8, # Large dot size
label = round(df$DV), # Add values as dot labels
font.label = list(color = "black",
size = 10,
vjust = 0.5,
position = position_dodge(0.9)),
# Adjust label parameters
ggtheme = theme_pubr() ) # ggplot2 theme
year:
BOTH_year <- BOTH_timedata %>%
group_by(Habitat,year) %>%
summarise(S= length(unique(common_name)))
BOTH_year<-group_by(BOTH_year, year) %>%
filter(n() != 1) %>%
arrange(Habitat,year) %>%
ungroup()
# time: year
# df = BOTH_year
# x = Habitat
# y = S
# xlab = Habitat
# ylab = Species Richness Across Habitat
# title = Comparison of BBC and MMC species richness over the years
# legend.title= Habitat
# color = Habitat
# rotate = FALSE
# group = Habitat
# facet.by = year (set with facet_wrap() below)
dot_BOTH_year<- ggdotchart(BOTH_year, x = "Habitat", y = "S",
xlab = "Habitat",
ylab = "Species Richness Across Habitat",
title= "Comparison of BBC and MMC species richness over the years",
legend.title = "Habitat",
color = "Habitat", # Color by groups
palette = c("3d"), # Custom color palette #"#00AFBB", "#FC4E07"
sorting = "descending", # Sort value in descending order
add = "segments", # Add segments from y = 0 to dot
rotate = FALSE, # Rotate vertically
group = "Habitat", # Order by groups
#facet.by = "year", # faceting was specified below
point.size = 20,
position= position_dodge(0),
dot.size = 8, # Large dot size
label = round(BOTH_year$S), # Add values as dot labels
font.label = list(color = "black", # Adjust label parameters
size = 10,
vjust = 0.5,
position = position_dodge(0.9)),
ggtheme = theme_pubr() ) + # ggplot2 theme
facet_wrap(~year, nrow=1) # faceting specified here instead to
# control the number of rows expressed in
# the plot output
dot_BOTH_year
ggsave("dot_BOTH_year.png",
plot = dot_BOTH_year,
width = 20,
height = 15,
units = "cm")
month:
BOTH_month <- BOTH_timedata %>%
group_by(Habitat,month) %>%
summarise(S= length(unique(common_name)))
BOTH_month<-group_by(BOTH_month, month) %>%
filter(n() != 1) %>%
arrange(Habitat,month) %>%
ungroup()
# time: month
# df = BOTH_month
# x = Habitat
# y = S
# xlab = Habitat
# ylab = Species Richness Across Habitat
# title = Comparison of BBC and MMC species richness by month
# legend.title= Habitat
# color = Habitat
# rotate = FALSE
# group = Habitat
# facet.by = month
dot_BOTH_month<- ggdotchart(BOTH_month, x = "Habitat", y = "S",
xlab = "Habitat",
ylab = "Species Richness Across Habitat",
title= "Comparison of BBC and MMC Species Richness by Month",
legend.title = "Habitat",
color = "Habitat", # Color by groups
palette = c("3d"), # Custom color palette #"#00AFBB", "#FC4E07"
sorting = "descending", # Sort value in descending order
add = "segments", # Add segments from y = 0 to dot
rotate = FALSE, # Rotate vertically
group = "Habitat", # Order by groups
facet.by = "month", # facet the plot by month
panel.labs = list(month = c('January', 'February', 'March', 'April',
'May', 'June', 'July', 'August',
'September', 'October', 'November',
'December')), #Adjust panel names
point.size = 20,
position= position_dodge(0),
dot.size = 8, # Large dot size
label = round(BOTH_month$S), # Add values as dot labels
font.label = list(color = "black", # Adjust label parameters
size = 10,
vjust = 0.5,
position = position_dodge(0.9)),
ggtheme = theme_pubr() ) # ggplot2 theme
#facet_wrap(~month, nrow=1) # faceting specified here instead to
# control the number of rows expressed in
# the plot output
dot_BOTH_month
ggsave("dot_BOTH_month.png",
plot = dot_BOTH_month,
width = 15,
height = 20,
units = "cm")
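Typing twelve month names by hand is easy to get wrong, and any typo ends up printed on the figure. One alternative, sketched below, builds the labels from R's built-in `month.name` constant. This assumes the `month` column stores month numbers 1 to 12 and that `panel.labs` labels are applied in sorted order of the month values; the `toy_month` data frame is hypothetical.

```r
# Hypothetical summary table with numeric months (1 = January ... 12 = December)
toy_month <- data.frame(
  Habitat = c("BBC", "MMC", "BBC", "MMC"),
  month   = c(3, 3, 11, 11),
  S       = c(12, 20, 8, 15)
)

# month.name is built into R: "January", "February", ..., "December".
# Indexing it with the sorted month numbers gives one label per month present.
month_labels <- month.name[sort(unique(toy_month$month))]
month_labels   # "March" "November"
```

With your own data, the same idea would be `month.name[sort(unique(BOTH_month$month))]`, passed as `panel.labs = list(month = month_labels)`.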
hour:
BOTH_hour <- BOTH_timedata %>%
group_by(Habitat,hour) %>%
summarise(S= length(unique(common_name)))
BOTH_hour<-group_by(BOTH_hour, hour) %>%
filter(n() != 1) %>%
arrange(Habitat,hour) %>%
ungroup()
BOTH_hour$hour <- as.factor(BOTH_hour$hour)
BOTH_hour$hour <- droplevels(BOTH_hour$hour,2)
dot_BOTH_hour<- ggdotchart(BOTH_hour, x = "Habitat", y = "S",
xlab = "Habitat",
ylab = "Species Richness Across Habitat",
#title= "Comparison of BBC and MMC Observed Species Richness by Time of Day",
legend.title = "Habitat",
color = "Habitat", # Color by groups
palette = c("3d"), # Custom color palette #"#00AFBB", "#FC4E07"
sorting = "descending", # Sort value in descending order
add = "segments", # Add segments from y = 0 to dot
rotate = FALSE, # Rotate vertically
group = "Habitat", # Order by groups
facet.by = "hour", # facet the plot by hour
panel.labs = list(hour = #Adjust panel names
c('12:00 AM', '1:00 AM', '5:00 AM',
'7:00 AM', '8:00 AM',
'9:00 AM', '10:00 AM',
'11:00 AM', '12:00 PM', '1:00 PM',
'2:00 PM', '3:00 PM', '4:00 PM', '5:00 PM',
'6:00 PM', '7:00 PM', '8:00 PM', '9:00 PM',
'10:00 PM', '11:00 PM')),
point.size = 20,
position= position_dodge(0),
dot.size = 8, # Large dot size
label = round(BOTH_hour$S), # Add values as dot labels
font.label = list(color = "black", # Adjust label parameters
size = 10,
vjust = 0.5,
position = position_dodge(0.9)),
ggtheme = theme_pubr() ) + # ggplot2 theme
labs(title="Comparison of BBC and MMC Species Richness \n by Time of Day")+
theme(plot.title = element_text(lineheight = 0.7))
dot_BOTH_hour
ggsave("dot_BOTH_hour.png",
plot = dot_BOTH_hour,
width = 15,
height = 20,
units = "cm")
Boxplots are a standardized way of showing the distribution of data using five values: the minimum, first quartile, median, third quartile, and maximum. The box spans the first to the third quartile (the interquartile range, IQR), and the line through the middle of the box marks the median. In ggplot2-based plots such as these, the whiskers extend to the smallest and largest data points that lie within 1.5 × IQR of the box; any points beyond the whiskers are plotted individually and considered potential outliers. This plot also lets you see how closely grouped the data are and whether or not they are symmetrical.
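To connect the description above to actual numbers, the sketch below computes box plot statistics for tooth length in the built-in `ToothGrowth` data frame (described next). It follows the default 1.5 × IQR rule of ggplot2's `geom_boxplot()`, on which ggpubr's box plots are based. It pools both supplements together, whereas a grouped box plot repeats the same calculation within each group; the variable names are made up for this example.

```r
# Tooth length from the built-in ToothGrowth data (both supplements pooled)
len <- ToothGrowth$len

q   <- quantile(len, probs = c(0.25, 0.50, 0.75))  # Q1, median, Q3
iqr <- q[[3]] - q[[1]]                               # height of the box (IQR)

# Fences 1.5 x IQR beyond each end of the box
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Whiskers stop at the most extreme data points still inside the fences
inside       <- len[len >= lower_fence & len <= upper_fence]
whisker_low  <- min(inside)
whisker_high <- max(inside)

# Points outside the fences would be drawn individually as outliers
outliers <- len[len < lower_fence | len > upper_fence]

c(Q1 = q[[1]], median = q[[2]], Q3 = q[[3]],
  whisker_low = whisker_low, whisker_high = whisker_high)
outliers
```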
Below is an example using the built-in data frame “ToothGrowth”, which looks at the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
First six rows of df “ToothGrowth”
| len | supp | dose |
|---|---|---|
| 4.2 | VC | 0.5 |
| 11.5 | VC | 0.5 |
| 7.3 | VC | 0.5 |
| 5.8 | VC | 0.5 |
| 6.4 | VC | 0.5 |
| 10.0 | VC | 0.5 |
ggpaired(ToothGrowth, x = "supp", y = "len",
color = "supp", line.color = "gray",
line.size = 0.4, palette = "npg")
Example Plot:
df = the data frame you chose
→ replace “box_df” with box_“name of your data frame”
→ e.g. box_BOTH_diversity
IV = the column of your IV
DV = the column of your DV
xlab = Text that you want as your x-axis label
ylab = Text that you want as your y-axis label
ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
→ you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height; here we chose centimeters, “cm”
NOTE: If the code has " " around text, make sure you keep them there! The code won’t run without them.
box_df<- ggpaired(df,
x = "IV",
y = "DV",
color = "IV",
line.color = "gray",
line.size = 1,
point.size = 3,
palette = "npg",
xlab = "x-axis label",
ylab = "y-axis label")
box_df
ggsave("box_df.png",
plot = box_df,
width = 10,
height = 15,
units = "cm")
Output for BOTH_diversity Box Plot:
#df<- BOTH_diversity
#IV<- Habitat
#DV<- S
#xlab = "Habitat"
#ylab = "Species Richness per Taxon Group"
box_BOTH_diversity<- ggpaired(BOTH_diversity,
x = "Habitat",
y = "S",
color = "Habitat",
line.color = "gray",
line.size = 1,
point.size = 3,
palette = "npg",
xlab = "Habitat",
ylab = "Species Richness per Taxon Group")
box_BOTH_diversity
ggsave("box_BOTH_diversity.png",
plot = box_BOTH_diversity,
width = 10,
height = 15,
units = "cm")
Example Plot:
df = the data frame you chose
→ replace “box_df” with box_“name of your data frame”
→ e.g. box_BOTH_spp_abundance
IV = the column of your IV
DV = the column of your DV
xlab = Text that you want as your x-axis label
ylab = Text that you want as your y-axis label
ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
→ you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height; here we chose centimeters, “cm”
NOTE: If the code has " " around text, make sure you keep them there! The code won’t run without them.
box_df<- ggpaired(df,
x = "IV",
y = "DV",
color = "IV",
line.color = "gray",
line.size = 1,
point.size = 3,
palette = "npg",
xlab = "x-axis label",
ylab = "y-axis label")
box_df
ggsave("box_df.png",
plot = box_df,
width = 10,
height = 15,
units = "cm")
Output for BOTH_spp_abundance Box Plot:
#df<- BOTH_spp_abundance
#IV<- Habitat
#DV<- Species_Abundance
#xlab = "Habitat"
#ylab = "Abundance per Species"
box_BOTH_spp_abundance<- ggpaired(BOTH_spp_abundance,
x = "Habitat",
y = "Species_Abundance",
color = "Habitat",
line.color = "gray",
line.size = 1,
point.size = 3,
palette = "npg",
xlab = "Habitat",
ylab = "Abundance per Species")
box_BOTH_spp_abundance
ggsave("box_BOTH_spp_abundance.png",
plot = box_BOTH_spp_abundance,
width = 10,
height = 15,
units = "cm")
Before running the code in this section, first choose which time variable you would like to examine: year, month, or hour.
year:
If you want to see how the species richness per habitat varies by year, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_year”
BOTH_year <- BOTH_timedata %>%
group_by(Habitat,year) %>%
summarise(S= length(unique(common_name)))
BOTH_year<-group_by(BOTH_year, year) %>%
filter(n() != 1) %>%
arrange(Habitat,year) %>%
ungroup()
month:
If you want to see how the species richness per habitat varies by month, this is the filter you will want to run before plotting. Your new “df” will be “BOTH_month”
BOTH_month <- BOTH_timedata %>%
group_by(Habitat,month) %>%
summarise(S= length(unique(common_name)))
BOTH_month<-group_by(BOTH_month, month) %>%
filter(n() != 1) %>%
arrange(Habitat,month) %>%
ungroup()
hour:
If you want to see how the species richness per habitat varies by hour, this is the filter you will want to run before plotting. Your new “df” for the box plot will be “BOTH_hour”
BOTH_hour <- BOTH_timedata %>%
group_by(Habitat,hour) %>%
summarise(S= length(unique(common_name)))
BOTH_hour<-group_by(BOTH_hour, hour) %>%
filter(n() != 1) %>%
arrange(Habitat,hour) %>%
ungroup()
Example Plot:
df = the data frame you chose
→ replace “box_df” with box_“name of your data frame”
→ e.g. BOTH_year, BOTH_month, BOTH_hour → box_BOTH_year, box_BOTH_month, box_BOTH_hour
IV = the column of your IV [this will be Habitat]
IV-time = the unit of time you chose (year, month, hour)
DV = the column of your DV [this will be S]
color = Which column you want to designate the point colors
title = The title you assign the main plot
xlab = Text that you want as your x-axis label
ylab = Text that you want as your y-axis label
label = df$IV-time (this will label the individual points with the month/year/hour)
ggsave( ) will save your plot to your working directory
" .png" = title you want the plot to be saved as
→ you have to specify the file type, here we chose .png
plot = " " is the name that you stored your plot as
width = is how wide you want your plot to be
height = is how tall you want your plot to be
units = " " is the unit of the width and height; here we chose centimeters, “cm”
NOTE: If the code has " " around text, make sure you keep them there! The code won’t run without them.
box_df <- ggpaired(df,
x = "IV",
y = "DV",
color = "IV",
line.color = "gray",
line.size = 1,
point.size = 3,
palette = "rainbow",
title = " ",
xlab = "IV",
ylab = "DV",
label = df$IV-time,
font.label = list(color = "black"))
box_df
ggsave("box_df.png",
plot = box_df,
width = 20,
height = 40,
units = "cm")
Output for box_BOTH_year
BOTH_year <- BOTH_timedata %>%
group_by(Habitat,year) %>%
summarise(S= length(unique(common_name)))
BOTH_year<-group_by(BOTH_year, year) %>%
filter(n() != 1) %>%
arrange(Habitat,year) %>%
ungroup()
#df<- BOTH_year
#IV<- Habitat
#IV-time <- year
#DV<- S
#title <- Difference in species richness observed each year
#xlab = "Habitat"
#ylab = "Species richness observed each year"
#color = "Habitat"
box_BOTH_year <- ggpaired(BOTH_year,
x = "Habitat",
y = "S",
color = "Habitat",
line.color = "gray",
line.size = 1,
point.size = 3,
palette = "rainbow",
title = "Difference in species richness observed each year",
xlab = "Habitat",
ylab = "Species richness observed each year",
label = BOTH_year$year,
font.label = list(color = "black"))
box_BOTH_year
ggsave("box_BOTH_year.png",
plot = box_BOTH_year,
width = 15,
height = 30,
units = "cm")
Output for box_BOTH_month
BOTH_month <- BOTH_timedata %>%
group_by(Habitat,month) %>%
summarise(S= length(unique(common_name)))
BOTH_month<-group_by(BOTH_month, month) %>%
filter(n() != 1) %>%
arrange(Habitat,month) %>%
ungroup()
#df<- BOTH_month
#IV<- Habitat
#IV-time <- month
#DV<- S
#title <- Difference in species richness observed each month
#xlab = "Habitat"
#ylab = "Species richness observed each month"
#color = "Habitat"
box_BOTH_month <- ggpaired(BOTH_month,
x = "Habitat",
y = "S",
color = "Habitat",
line.color = "gray",
line.size = 1,
point.size = 3,
palette = "rainbow",
title = "Difference in species richness observed each month",
xlab = "Habitat",
ylab = "Species richness observed each month",
label = BOTH_month$month,
font.label = list(color = "black"))
box_BOTH_month
ggsave("box_BOTH_month.png",
plot = box_BOTH_month,
width = 15,
height = 35,
units = "cm")
Output for box_BOTH_hour
BOTH_hour <- BOTH_timedata %>%
group_by(Habitat,hour) %>%
summarise(S= length(unique(common_name)))
BOTH_hour<-group_by(BOTH_hour, hour) %>%
filter(n() != 1) %>%
arrange(Habitat,hour) %>%
ungroup()
#df<- BOTH_hour
#IV<- Habitat
#IV-time <- hour
#DV<- S
#title <- Difference in species richness observed across hours
#xlab = "Habitat"
#ylab = "Species richness observed each hour"
#color = "Habitat"
box_BOTH_hour <- ggpaired(BOTH_hour,
x = "Habitat",
y = "S",
color = "Habitat",
line.color = "gray",
line.size = 1,
point.size = 3,
palette = "rainbow",
title = "Difference in species richness observed across hours",
xlab = "Habitat",
ylab = "Species richness observed per hour",
label = BOTH_hour$hour,
font.label = list(color = "black"))
box_BOTH_hour
ggsave("box_BOTH_hour.png",
plot = box_BOTH_hour,
width = 20,
height = 40,
units = "cm")
For today’s analyses, we will be applying a type of t-test. This analysis is used to compare the means of two populations and answer the question: “Is there a significant difference between the two populations?” A t-test cannot be used to compare two different types of data (e.g., the temperature of the air and the number of birds observed). It can only be used to compare two data sets of the same data type (e.g., the number of observations at two different sites). The two data sets must also be in the same units (e.g., you can’t compare air temperatures if one set is recorded in Celsius and the other in Fahrenheit).
There are a few different types of t-tests: one-sample, two-sample, and paired t-tests. The type of t-test used depends on the data available and the question being addressed.
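As a rough illustration of how these three types map onto R's t.test() function, the sketch below runs each one on small made-up vectors (bbc and mmc are hypothetical values, not course data). It also shows that a paired t-test gives the same result as a one-sample t-test on the differences between pairs.

```r
# Hypothetical species richness values for five matched groups at each campus
bbc <- c(12, 8, 15, 6, 10)
mmc <- c(9, 7, 11, 6, 7)

# One-sample t-test: is the mean of one sample different from a set value (mu)?
one_sample <- t.test(bbc, mu = 10)

# Two-sample t-test (Welch's version by default): two independent groups, no pairing
two_sample <- t.test(bbc, mmc)

# Paired t-test: compares matched values (BBC vs MMC for the same group)
paired <- t.test(bbc, mmc, paired = TRUE)

# A paired t-test is a one-sample t-test on the pairwise differences
on_differences <- t.test(bbc - mmc, mu = 0)

print(c(one_sample = one_sample$p.value,
        two_sample = two_sample$p.value,
        paired = paired$p.value,
        on_differences = on_differences$p.value))
```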
For our analyses today, we will be using a paired t-test. Paired t-tests are used to compare the means of two related groups, or pairs, of samples. Our data are paired because each value at BBC has a specific counterpart at MMC (for example, the same taxon group or the same year), and we are testing for differences between those matched values.
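For reference, the paired t statistic is computed only from the differences within each pair. With d_i the difference between the two values in pair i (e.g., MMC minus BBC for one taxon group) and n pairs:

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
t = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad \mathrm{df} = n - 1
```

The null hypothesis is that the mean difference is zero; the p-value comes from comparing t to a t distribution with n - 1 degrees of freedom.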
We want to make sure that the biodiversity metrics of each taxon group at BBC are compared to their counterpart values at MMC, rather than all BBC values being compared to all MMC values collectively. For example, with our BOTH_diversity data frame, we want to make sure that each biodiversity value is compared within its taxon grouping, as shown in the table below:
Every statistical test has a series of assumptions that must be true in order for the results of that analysis to be valid. For a paired t-test:
Assumption 1:
The two samples must be paired
Assumption 2:
There are no significant outliers. If there are outliers, they would need to be removed from the data set.
Assumption 3:
Is the sample large (n ≥ 30)? If not (n < 30), the data must be checked for normality. If the data are normally distributed, the 3rd assumption is met.
To test normality, we use the Shapiro-Wilk normality test. If the p-value resulting from that test is greater than 0.05, then the data can be considered normally distributed. If the data are not normally distributed, then a similar, nonparametric test would be used instead. For paired t-tests, the nonparametric alternative is the paired-sample Wilcoxon (signed-rank) test: wilcox.test(x, y, paired = TRUE), where x and y are the two sets of paired values (e.g., the BBC values and the MMC values).
To use the Shapiro-Wilk test on paired data, you first need the differences between the pairs of your dependent variable: for example, the differences between the MMC and BBC values of S (species richness) in the BOTH_diversity data frame. Those values can be saved as an object and then run through the Shapiro-Wilk function shapiro.test().
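# replace df, y, and x with your data frame, DV column, and IV column; replace "x1" and "x2" with your two habitat names (e.g. "MMC" and "BBC")
d.df <- with(df, y[x == "x1"] - y[x == "x2"])
shapiro.test(d.df)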
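A minimal sketch of these assumption checks in R, assuming a long-format data frame shaped like BOTH_diversity with a Habitat column and an S column. The name toy_diversity, its taxon column, and all values are hypothetical. It sorts the rows so pairs line up, computes the differences, flags outliers with boxplot.stats() (the same rule boxplot() uses, swapped in here as one common convention), runs shapiro.test(), and falls back to the paired Wilcoxon test if normality is rejected.

```r
# Hypothetical long-format data: one row per habitat per taxon group
toy_diversity <- data.frame(
  Habitat = rep(c("BBC", "MMC"), each = 6),
  taxon = rep(c("Aves", "Insecta", "Plantae", "Reptilia", "Mollusca", "Fungi"), 2),
  S = c(25, 40, 60, 8, 5, 4,
        18, 52, 71, 6, 2, 5)
)

# Sort so that row i of BBC and row i of MMC are the same taxon group
toy_diversity <- toy_diversity[order(toy_diversity$Habitat, toy_diversity$taxon), ]

# Differences between pairs (MMC minus BBC), as in d.df above
d.toy <- with(toy_diversity, S[Habitat == "MMC"] - S[Habitat == "BBC"])

# Assumption 2: boxplot.stats() flags values beyond 1.5 times the box length
outliers <- boxplot.stats(d.toy)$out

# Assumption 3: Shapiro-Wilk test on the differences (needs at least 3 values)
sw <- shapiro.test(d.toy)

# Choose the test based on the Shapiro-Wilk p-value
bbc <- toy_diversity$S[toy_diversity$Habitat == "BBC"]
mmc <- toy_diversity$S[toy_diversity$Habitat == "MMC"]
if (sw$p.value > 0.05) {
  result <- t.test(mmc, bbc, paired = TRUE)
} else {
  result <- wilcox.test(mmc, bbc, paired = TRUE)
}

print(outliers)      # numeric(0) means no outliers were flagged
print(sw$p.value)
print(result$method) # which test was actually run
```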
For the sake of simplicity, we are going to assume that ALL assumptions are met for our data in this exercise. We will revisit this subject later in the semester as we get closer to the group presentations.
R Code:
t.df <- t.test(y ~ x, data = df, paired = TRUE)
Where:
t.df = the name of your saved t-test output (e.g. t.BOTH_diversity)
t.test( ) = the t.test function in R
y = the column name of your DV
x = the column name of your main IV (this will be Habitat)
df = the name of your data frame (e.g. BOTH_diversity)
paired = TRUE specifies that the data should be treated as paired
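Two hedged notes on running this test. First, a paired test matches values by their position, so the rows for each habitat must be sorted in the same order (for example, by taxon group). Second, in recent versions of R (4.4.0 and later, as far as we know), t.test() stops with an error when paired = TRUE is combined with the y ~ x formula; if you see that error, passing the two habitats as separate vectors runs the same paired test. The sketch below uses a hypothetical toy_df with made-up values.

```r
# Hypothetical long-format data frame: S for five taxon groups at each campus
toy_df <- data.frame(
  Habitat = rep(c("BBC", "MMC"), each = 5),
  taxon = rep(c("Aves", "Insecta", "Mammalia", "Plantae", "Reptilia"), 2),
  S = c(20, 35, 4, 50, 9,
        15, 41, 2, 44, 7)
)

# Sort by Habitat, then by the pairing column, so rows line up pair by pair
toy_df <- toy_df[order(toy_df$Habitat, toy_df$taxon), ]

# Paired t-test with the two habitats passed as separate vectors
t.toy_df <- t.test(toy_df$S[toy_df$Habitat == "BBC"],
                   toy_df$S[toy_df$Habitat == "MMC"],
                   paired = TRUE)

t.toy_df          # prints t, df, p-value, and the mean difference
t.toy_df$p.value  # just the p-value
```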
Interpretation:
The output of the t.test will give a p-value (probability value). In short, the p-value is the probability of getting results at least as extreme as yours if the null hypothesis were true (that is, if any difference were due to chance alone). If the p-value is below 0.05, the results are considered statistically significant and the null hypothesis is rejected. If the p-value is above 0.05, the results are not statistically significant and the null hypothesis cannot be rejected → “We fail to reject our null hypothesis.” Remember, results are always discussed in terms of the null hypothesis, since that is what we are testing against.
When reporting results, describe only the outcome of the analysis and how it relates to the hypothesis. The meaning, implications, and possible reasons or influences behind the results are discussed in the discussion/conclusion.
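The decision rule above can be written as a small helper function. The name report_null() is hypothetical and used only for illustration; 0.05 is the cutoff used in this course.

```r
# Hypothetical helper: turn a p-value into the wording used for the null hypothesis
report_null <- function(p, alpha = 0.05) {
  if (p < alpha) {
    "Reject the null hypothesis (statistically significant)"
  } else {
    "Fail to reject the null hypothesis (not statistically significant)"
  }
}

report_null(0.012)  # below 0.05
report_null(0.37)   # above 0.05

# With a saved t-test output (e.g. t.BOTH_diversity), you would call:
# report_null(t.BOTH_diversity$p.value)
```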
After you have run your test, make sure to record your results on your data menu and discuss your findings in the discussion/conclusion section.
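When recording results, the values usually reported for a t-test (t statistic, degrees of freedom, p-value, mean difference) can be pulled out of the saved output. The sketch below builds a one-line summary from a paired test on made-up vectors; which values your data menu actually asks for may differ.

```r
# Hypothetical paired values (same position = same pair)
bbc <- c(20, 35, 4, 50, 9)
mmc <- c(15, 41, 2, 44, 7)
t.example <- t.test(bbc, mmc, paired = TRUE)

# Pull out the numbers typically reported for a t-test
t_value   <- unname(t.example$statistic)  # t statistic
df_value  <- unname(t.example$parameter)  # degrees of freedom (pairs - 1)
p_value   <- t.example$p.value            # p-value
mean_diff <- unname(t.example$estimate)   # mean of the differences (bbc - mmc)

# One-line summary in the form "t(df) = t value, p = p-value"
summary_line <- sprintf("t(%d) = %.2f, p = %.3f",
                        as.integer(df_value), t_value, p_value)
print(summary_line)
```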
R Code:
t.df <- t.test(y ~ x, data = df, paired = TRUE)
Where:
t.df = the name of your saved t-test output (e.g. t.BOTH_spp_abundance)
t.test( ) = the t.test function in R
y = the column name of your DV
x = the column name of your main IV (this will be Habitat)
df = the name of your data frame (e.g. BOTH_spp_abundance)
paired = TRUE specifies that the data should be treated as paired
Interpretation:
The output of the t.test will give a p-value (probability value). In short, the p-value is the probability of getting results at least as extreme as yours if the null hypothesis were true (that is, if any difference were due to chance alone). If the p-value is below 0.05, the results are considered statistically significant and the null hypothesis is rejected. If the p-value is above 0.05, the results are not statistically significant and the null hypothesis cannot be rejected → “We fail to reject our null hypothesis.” Remember, results are always discussed in terms of the null hypothesis, since that is what we are testing against.
When reporting results, describe only the outcome of the analysis and how it relates to the hypothesis. The meaning, implications, and possible reasons or influences behind the results are discussed in the discussion/conclusion.
After you have run your test, make sure to record your results on your data menu and discuss your findings in the discussion/conclusion section.
R Code:
t.df <- t.test(y ~ x, data = df, paired = TRUE)
Where:
t.df = the name of your saved t-test output (e.g. t.BOTH_year)
t.test( ) = the t.test function in R
y = the column name of your DV
x = the column name of your main IV (this will be Habitat)
df = the name of your data frame (e.g. BOTH_year)
paired = TRUE specifies that the data should be treated as paired
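For the time-based data frames, each pair is one year (or month, or hour) that has a value at both campuses. A quick check before testing, sketched below with a hypothetical toy_year data frame shaped like BOTH_year (values made up), confirms that the BBC and MMC rows line up year by year; stopifnot() halts with an error if they do not.

```r
# Hypothetical data shaped like BOTH_year (already filtered and arranged)
toy_year <- data.frame(
  Habitat = rep(c("BBC", "MMC"), each = 4),
  year = rep(2018:2021, 2),
  S = c(30, 42, 38, 51,
        25, 40, 33, 47)
)

bbc_rows <- toy_year[toy_year$Habitat == "BBC", ]
mmc_rows <- toy_year[toy_year$Habitat == "MMC", ]

# Stop with an error if the years are not in the same order for both campuses
stopifnot(identical(bbc_rows$year, mmc_rows$year))

# Paired t-test: each BBC year is compared with the same MMC year
t.toy_year <- t.test(bbc_rows$S, mmc_rows$S, paired = TRUE)
t.toy_year$p.value
```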
Interpretation:
The output of the t.test will give a p-value (probability value). In short, the p-value is the probability of getting results at least as extreme as yours if the null hypothesis were true (that is, if any difference were due to chance alone). If the p-value is below 0.05, the results are considered statistically significant and the null hypothesis is rejected. If the p-value is above 0.05, the results are not statistically significant and the null hypothesis cannot be rejected → “We fail to reject our null hypothesis.” Remember, results are always discussed in terms of the null hypothesis, since that is what we are testing against.
When reporting results, describe only the outcome of the analysis and how it relates to the hypothesis. The meaning, implications, and possible reasons or influences behind the results are discussed in the discussion/conclusion.
After you have run your test, make sure to record your results on your data menu and discuss your findings in the discussion/conclusion section.