Background

Statistical computing is becoming increasingly important for analyzing biological datasets. R is the de facto language of statistical computing in many fields and thus an integral part of this course.

R supports reproducibility because an analysis saved as code can be rerun exactly. It also supports modern, computationally intensive methods such as likelihood-based inference and optimization.

Objectives

Organizational Instructions

You should be viewing this R Markdown document in RStudio. You can edit the document in this window and save it as a file on your computer, much as you would a Word or Excel file.

Create a folder somewhere on your computer called ENVR2501. Save this R Markdown file in that folder, and name it “IntroLab_ENVR2501_YourLastName”. Repeat this process for each lab this semester.

Introduction to R: Functions and Arguments

R has thousands of functions that perform specific calculations or manipulations. A function is called by typing its name followed by parentheses. Inside the parentheses, the user provides arguments to the function. Most functions have required arguments, which must be supplied for the function to run. In addition, many functions have optional arguments, which can be left out but give you additional control over how the function works.
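
To make the difference between required and optional arguments concrete, here is a small sketch using two base R functions, round() and mean(). The vector name "readings" and its values are made up for this illustration.

```r
# round() has one required argument: the number(s) to round
round(3.14159)
# The optional "digits" argument controls how many decimal places are kept
round(3.14159, digits = 2)

# mean() needs a vector of values; the optional "na.rm" argument
# tells it to drop missing values (NA) before averaging
readings <- c(4, 8, NA, 12)   # hypothetical example data
mean(readings)                # NA, because one value is missing
mean(readings, na.rm = TRUE)  # average of 4, 8, and 12
```

If you are unsure which arguments a function accepts, typing a question mark before its name in the Console (for example, ?round) opens its help page.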

Examples of functions include:

Importantly (and sometimes confusingly), you do not always need to name the arguments inside a function call. Every function expects its arguments in a default order, so if you supply them in that order, you do not need to label them! But if you’re not sure of the order, you should label each argument by name.

For example:

# Create variable called "temperature" that contains numeric values
# c() = concatenate function
temperature <- c(10, 15, 18, 21, 22, 28, 31)
# Create variable called "activity" that also contains numeric values
activity <- c(54, 60, 79, 71, 99, 92, 101)

# Plot the two vectors against one another
plot(temperature, activity) # this does not label arguments, and assumes a specific order

plot(y = activity, x = temperature) # this one does label arguments, which can now be in any order, since they are labeled!  
plot(activity, temperature) # this switches the plot axes

# Plot the two vectors against one another with optional arguments to customize the plot
plot(temperature, activity, 
     col = "blue", 
     pch = 17)

Variables and Objects

You may have noticed that when we created the objects called “temperature” and “activity”, corresponding objects popped up in your Environment pane on the right. These are examples of vectors, which are objects that contain a series of values of the same type (here, numbers). Objects have different types depending on how the data inside them are organized. For example, a matrix contains values of a single type arranged in rows and columns, while a dataframe also has rows and columns, but each column can hold a different type of data, such as characters or numbers.
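
As a rough illustration of these object types, the sketch below builds one of each and asks R what it is with class(). The objects "m" and "sites" and their values are invented for this example.

```r
# A vector: a one-dimensional series of values of a single type
temperature <- c(10, 15, 18, 21, 22, 28, 31)
class(temperature)   # "numeric"

# A matrix: values of a single type arranged in rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)               # number of rows, then number of columns
is.matrix(m)         # TRUE

# A dataframe: rows and columns, where each column can hold a different type
sites <- data.frame(site = c("A", "B", "C"),     # hypothetical site labels
                    sst  = c(12.0, 10.8, 14.3))  # hypothetical temperatures
class(sites)         # "data.frame"
str(sites)           # shows the type of each column
```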

A variable name is the name of a specific object. Please note that R is case-sensitive! In general, variable names should be short, memorable, and contain no spaces or special characters.

The convenience of saving data as objects is that you can manipulate the entire dataset in a single function, rather than needing to do individual calculations! For example, you can directly add, multiply, or perform custom calculations on entire objects.
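
For instance, the following sketch applies arithmetic to whole vectors at once. It reuses the “temperature” and “activity” vectors from earlier; treating the temperatures as degrees Celsius is an assumption made only for this illustration, and “temperature_f” is a name chosen for the example.

```r
temperature <- c(10, 15, 18, 21, 22, 28, 31)
activity <- c(54, 60, 79, 71, 99, 92, 101)

# Convert every value from Celsius to Fahrenheit in one step
# (assumes, for illustration only, that the values are in degrees Celsius)
temperature_f <- temperature * 9 / 5 + 32
temperature_f

# Operate on two vectors of equal length, element by element
activity / temperature

# Summarize an entire vector with a single function
mean(activity)
```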

Saving output of code

If you simply run a calculation, such as “45 + 15” or “activity * 2”, the output will show up in your Console, but it will not be saved anywhere in R! In order to save the output (which is generally necessary to perform further calculations), you MUST assign the output to a new variable using the assignment operator, <-.

Keyboard shortcuts for <- (the assignment operator) in RStudio: Alt + - on Windows or Linux, and Option + - on a Mac.

# This will not save 
activity * 2
## [1] 108 120 158 142 198 184 202
# This will create a new object 
activity2 <- activity * 2

# You can further print the value of an object by either:
(activity2)
## [1] 108 120 158 142 198 184 202
print(activity2)
## [1] 108 120 158 142 198 184 202

Another option is to type activity2 into the Console, rather than into your document, and that will also print the contents of the object.

This brings up the important distinction that CODE TYPED INTO THE CONSOLE IS NOT SAVED IN YOUR DOCUMENT. For that reason, most people prefer to type code into the document and run it from there.

Note that code MUST be typed into the grey code “chunks”. Most exercises have chunks already formatted for you, but you can add a new chunk by clicking the green C button at the top right of the script.

Since activity2 is a numeric variable, you can apply mathematical functions directly to it. For instance, to compute its square root or log base 10:

sqrt(activity2) 
## [1] 10.39230 10.95445 12.56981 11.91638 14.07125 13.56466 14.21267
log10(activity2) 
## [1] 2.033424 2.079181 2.198657 2.152288 2.296665 2.264818 2.305351

Note that these calculations were not stored or saved anywhere, and the original vector has not changed.
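
To keep a result like this, assign it to a new object. The short sketch below does so and then confirms that the original vector is untouched; “activity2_sqrt” is simply a name chosen for this example.

```r
activity <- c(54, 60, 79, 71, 99, 92, 101)
activity2 <- activity * 2

# Assign the result to a new object so it is kept
activity2_sqrt <- sqrt(activity2)   # "activity2_sqrt" is a name chosen for this example

# The original vector has not changed
activity2

# The stored result can be reused in later calculations
round(activity2_sqrt, digits = 1)
```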

More complex objects: working with data in rows and columns

As mentioned, objects can also be used to store entire datasets. For the remainder of this lab, you will be working with the PISCO dataset, which contains the mean sea surface temperature (sst), chlorophyll a concentration (chla, a measure of algal density), and upwelling currents at 48 different sites along the West Coast of the US from 1999 to 2004. Download the dataset and store it in a variable called “pisco” by running the following code chunk:

pisco <- read.csv(file="https://raw.githubusercontent.com/tgouhier/biostats/main/pisco_env.csv")

To see what a dataset contains, you can use the str() function, which prints the structure of the dataset. The head() function shows you the first 6 rows of a dataset by default, though the optional argument n lets you display a different number of rows!

You can also visually inspect the data by clicking on the object in the Environment tab. Take a moment to do this.

# Print the structure of PISCO
str(pisco)
## 'data.frame':    239 obs. of  7 variables:
##  $ sitenum  : int  18 17 16 19 15 14 13 20 12 11 ...
##  $ latitude : num  43.3 43.3 43.3 41.1 44.2 ...
##  $ longitude: num  -124 -124 -124 -124 -124 ...
##  $ year     : int  1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
##  $ chla     : num  4.4 4.4 3.03 3.94 5.48 ...
##  $ sst      : num  12 12 12.1 12 12.1 ...
##  $ upwelling: num  -35.1 -35.1 -35.1 11 -39.9 ...
# Print the first 6 rows of PISCO
head(pisco)
##   sitenum latitude longitude year     chla      sst upwelling
## 1      18 43.30000 -124.4000 1999 4.404954 12.03498 -35.13245
## 2      17 43.31000 -124.3933 1999 4.404954 12.03498 -35.13245
## 3      16 43.33000 -124.3740 1999 3.032030 12.06844 -35.13245
## 4      19 41.06000 -124.1500 1999 3.938939 12.00153  11.02540
## 5      15 44.20000 -124.1100 1999 5.482347 12.14849 -39.88423
## 6      14 44.21833 -124.1100 1999 5.482347 12.12739 -39.88423
# Print the first 8 rows of PISCO
head(pisco, n = 8)
##   sitenum latitude longitude year     chla      sst upwelling
## 1      18 43.30000 -124.4000 1999 4.404954 12.03498 -35.13245
## 2      17 43.31000 -124.3933 1999 4.404954 12.03498 -35.13245
## 3      16 43.33000 -124.3740 1999 3.032030 12.06844 -35.13245
## 4      19 41.06000 -124.1500 1999 3.938939 12.00153  11.02540
## 5      15 44.20000 -124.1100 1999 5.482347 12.14849 -39.88423
## 6      14 44.21833 -124.1100 1999 5.482347 12.12739 -39.88423
## 7      13 44.24250 -124.1100 1999 5.632734 12.12739 -39.88423
## 8      20 40.02000 -124.0733 1999 3.098366 12.14947  15.09708
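
A few other base R functions are also handy for a first look at a dataframe. The sketch below is only an illustration: so that it runs without a download, it uses a small stand-in called “pisco_small” (a name made up here), typed in from the first four rows printed above.

```r
# Small stand-in for pisco, typed from the first four rows printed above
pisco_small <- data.frame(
  sitenum   = c(18, 17, 16, 19),
  latitude  = c(43.30, 43.31, 43.33, 41.06),
  longitude = c(-124.4000, -124.3933, -124.3740, -124.1500),
  year      = c(1999, 1999, 1999, 1999),
  chla      = c(4.404954, 4.404954, 3.032030, 3.938939),
  sst       = c(12.03498, 12.03498, 12.06844, 12.00153),
  upwelling = c(-35.13245, -35.13245, -35.13245, 11.02540)
)

dim(pisco_small)           # number of rows, then number of columns
nrow(pisco_small)          # number of rows only
names(pisco_small)         # column names
tail(pisco_small, n = 2)   # last 2 rows (the counterpart of head())
summary(pisco_small$sst)   # minimum, quartiles, mean, and maximum of one column
```

The same calls work on the full pisco object once it has been loaded.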

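Remember that you can run code by either clicking the green arrow at the top right of a code chunk, or placing your cursor on a line of code and pressing Ctrl + Enter (Cmd + Return on a Mac).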

Extracting information from dataframes

There are multiple options for extracting subsets of data from an object. There are some manual options, where you select specific rows and columns by position, and then there are functions that can be more customizable. First, let’s look at the manual options, which use square brackets in the form object[rows, columns]:

# Print all values
pisco[]
##     sitenum latitude longitude year       chla       sst  upwelling
## 1        18 43.30000 -124.4000 1999  4.4049543 12.034980 -35.132450
## 2        17 43.31000 -124.3933 1999  4.4049543 12.034980 -35.132450
## 3        16 43.33000 -124.3740 1999  3.0320300 12.068437 -35.132450
## 4        19 41.06000 -124.1500 1999  3.9389394 12.001528  11.025395
## 5        15 44.20000 -124.1100 1999  5.4823468 12.148486 -39.884226
## 6        14 44.21833 -124.1100 1999  5.4823468 12.127386 -39.884226
## 7        13 44.24250 -124.1100 1999  5.6327341 12.127386 -39.884226
## 8        20 40.02000 -124.0733 1999  3.0983662 12.149466  15.097083
## 9        12 44.74667 -124.0600 1999  4.8879040 11.973498 -40.180534
## 10       11 44.80667 -124.0600 1999  4.7631959 12.109478 -40.180534
## 11       10 44.82667 -124.0567 1999  4.3641512 12.063868 -40.180534
## 12        8 45.93000 -123.9867 1999  5.2478979 12.682416 -35.725067
## 13       23 39.34000 -123.8200 1999  2.5137408 12.512527  28.449875
## 14       22 39.41500 -123.8100 1999  2.8919887 12.504663  28.449875
## 15       24 39.27250 -123.7975 1999  2.4258334 12.486477  28.449875
## 16       21 39.59750 -123.7800 1999  3.8231853 12.386262  15.097083
## 17        2 48.33750 -124.6875 2000  4.3845333  9.673738 -57.686607
## 18        3 48.31333 -124.6600 2000  4.1588333  9.575097 -57.686607
## 19        4 47.94000 -124.6583 2000  9.5587500 10.610722 -57.686607
## 20        1 48.38917 -124.6500 2000  3.4601111  9.555175 -57.686607
## 21        5 47.87000 -124.6000 2000 11.3688125 10.555787 -57.686607
## 22        6 47.86000 -124.5700 2000 11.9475455 10.597739 -57.686607
## 23       18 43.30000 -124.4000 2000  2.4302143 10.834296 -32.744499
## 24       17 43.31000 -124.3933 2000  2.4302143 10.834296 -32.744499
## 25       16 43.33000 -124.3740 2000  2.4445833 10.880724 -32.744499
## 26       19 41.06000 -124.1500 2000  3.1838334 10.765276  74.612120
## 27       15 44.20000 -124.1100 2000  5.9704545 10.838888 -43.020206
## 28       14 44.21833 -124.1100 2000  5.9704545 10.861467 -43.020206
## 29       13 44.24250 -124.1100 2000  5.8974443 10.861467 -43.020206
## 30       20 40.02000 -124.0733 2000  1.3153333 11.474341  93.681786
## 31       12 44.74667 -124.0600 2000  4.7380000 10.769907 -44.420166
## 32       11 44.80667 -124.0600 2000  4.6942000 10.833976 -44.420166
## 33       10 44.82667 -124.0567 2000  4.7765001 10.685871 -44.420166
## 34       50 45.91333 -123.9767 2000  9.4478750 10.938100 -35.544418
## 35        9 45.77000 -123.9700 2000  8.2754444 10.698111 -35.544418
## 36       51 45.75500 -123.9650 2000  8.1022499 10.698111 -35.544418
## 37       23 39.34000 -123.8200 2000  1.2892222 11.452128  94.701778
## 38       22 39.41500 -123.8100 2000  1.3951250 11.399696  94.701778
## 39       24 39.27250 -123.7975 2000  1.2143333 11.509117  94.701778
## 40       21 39.59750 -123.7800 2000  1.4593636 11.130998  93.681786
## 41       25 38.38333 -123.0867 2000  3.3864286 10.512050  92.148140
## 42       27 38.31000 -123.0700 2000  3.2207000 10.626089  92.148140
## 43       26 38.32000 -123.0700 2000  3.2207000 10.642555  92.148140
## 44       28 36.51000 -121.9467 2000  2.4538000 12.167320 120.816498
## 45       29 36.48000 -121.9400 2000  1.7062143 12.167320 118.223621
## 46       30 36.44667 -121.9233 2000  1.5873333 12.176420 118.223621
## 47       31 35.66000 -121.2800 2000  1.2146666 12.477096 118.223621
## 48       32 35.63000 -121.1900 2000  1.2835384 12.506956 118.223621
## 49       33 35.61000 -121.1500 2000  1.3248750 12.515458 118.223621
## 50       34 34.46000 -120.2700 2000  1.9698667 14.287216 104.393696
## 51       35 34.46000 -120.2682 2000  1.9698667 14.287216 104.393696
## 52       36 34.45400 -120.0560 2000  1.6439231 14.438620 104.393696
## 53       37 34.06000 -119.8500 2000  1.2848261 14.415033 104.393696
## 54       39 34.04333 -119.7800 2000  1.1827391 14.545897 104.393696
## 55       38 34.03667 -119.6633 2000  1.1071579 14.741820 104.393696
## 56       43 33.46000 -118.5200 2000  0.5343750 16.498135  56.261532
## 57       44 33.45000 -118.4800 2000  0.5720000 16.476903  70.719320
## 58       45 33.44000 -118.4767 2000  0.5720000 16.538595  70.719320
## 59       40 33.76000 -118.4200 2000  1.4357857 16.025387  56.261532
## 60       41 33.71000 -118.3100 2000  1.4068824 16.099260  56.261532
## 61       42 33.70000 -118.2867 2000  1.4068824 16.214894  56.261532
## 62       46 32.84000 -117.2800 2000  0.8490909 17.352343  85.177109
## 63       47 32.82000 -117.2767 2000  0.9014000 17.286834  85.177109
## 64       48 32.71167 -117.2500 2000  1.0771667 17.053478  85.177109
## 65        2 48.33750 -124.6875 2001  4.2448572 11.015978 -42.955811
## 66        3 48.31333 -124.6600 2001  4.3077273 10.983317 -42.955811
## 67        4 47.94000 -124.6583 2001  9.1652501 11.958649 -42.955811
## 68        1 48.38917 -124.6500 2001  3.7044706 10.814936 -42.955811
## 69        5 47.87000 -124.6000 2001  9.1279376 12.047073 -42.955811
## 70        6 47.86000 -124.5700 2001  9.3374547 12.216199 -42.955811
## 71       18 43.30000 -124.4000 2001  3.1430000 12.080857   1.496550
## 72       17 43.31000 -124.3933 2001  3.1430000 12.080857   1.496550
## 73       16 43.33000 -124.3740 2001  3.0755000 12.030069   1.496550
## 74       19 41.06000 -124.1500 2001  8.0648334 11.600239  66.746107
## 75       15 44.20000 -124.1100 2001  5.1966153 11.709444 -10.830112
## 76       14 44.21833 -124.1100 2001  5.1966153 11.841968 -10.830112
## 77       13 44.24250 -124.1100 2001  5.1407272 11.841968 -10.830112
## 78       20 40.02000 -124.0733 2001  2.4937273 11.689526  72.723942
## 79       12 44.74667 -124.0600 2001  6.0310001 11.766674 -26.068162
## 80       11 44.80667 -124.0600 2001  6.0298182 11.812106 -26.068162
## 81       10 44.82667 -124.0567 2001  6.0911111 11.825172 -26.068162
## 82       50 45.91333 -123.9767 2001  7.3410001 12.439812 -28.979549
## 83        9 45.77000 -123.9700 2001  6.5397273 12.237367 -28.979549
## 84       51 45.75500 -123.9650 2001  6.3372501 12.237367 -28.979549
## 85       23 39.34000 -123.8200 2001  3.6092500 12.062417  61.816126
## 86       22 39.41500 -123.8100 2001  3.8275715 12.036823  61.816126
## 87       24 39.27250 -123.7975 2001  3.6403636 12.077283  61.816126
## 88       21 39.59750 -123.7800 2001  4.6046000 11.990326  72.723942
## 89       25 38.38333 -123.0867 2001  7.8575001 11.598323  58.586666
## 90       27 38.31000 -123.0700 2001  7.3953637 11.819632  58.586666
## 91       26 38.32000 -123.0700 2001  7.3953637 11.770542  58.586666
## 92       28 36.51000 -121.9467 2001  3.5182000 12.483049  84.535072
## 93       29 36.48000 -121.9400 2001  2.3596429 12.483049  84.364826
## 94       30 36.44667 -121.9233 2001  2.1477778 12.442520  84.364826
## 95       31 35.66000 -121.2800 2001  2.2548462 12.879475  84.364826
## 96       32 35.63000 -121.1900 2001  2.1928571 12.963262  84.364826
## 97       33 35.61000 -121.1500 2001  2.2707778 12.968475  84.364826
## 98       34 34.46000 -120.2700 2001  2.5301333 15.039318  77.186127
## 99       35 34.46000 -120.2682 2001  2.5301333 15.039318  77.186127
## 100      36 34.45400 -120.0560 2001  2.3338461 15.215952  77.186127
## 101      37 34.06000 -119.8500 2001  1.8685652 15.153027  77.186127
## 102      39 34.04333 -119.7800 2001  1.5652609 15.361396  77.186127
## 103      38 34.03667 -119.6633 2001  1.2406316 15.577495  77.186127
## 104      43 33.46000 -118.5200 2001  0.4345417 17.221867  45.404415
## 105      44 33.45000 -118.4800 2001  0.4454167 17.222195  57.317041
## 106      45 33.44000 -118.4767 2001  0.4454167 17.307498  57.317041
## 107      40 33.76000 -118.4200 2001  1.1493571 16.887446  45.404415
## 108      41 33.71000 -118.3100 2001  1.1420000 17.012415  45.404415
## 109      42 33.70000 -118.2867 2001  1.1420000 17.110456  45.404415
## 110      46 32.84000 -117.2800 2001  0.9445000 17.153362  69.229668
## 111      47 32.82000 -117.2767 2001  1.0917059 17.162627  69.229668
## 112      48 32.71167 -117.2500 2001  1.3320000 17.124957  69.229668
## 113       2 48.33750 -124.6875 2002  6.1877142  9.161873 -37.136116
## 114       3 48.31333 -124.6600 2002  6.5135453  9.086350 -37.136116
## 115       4 47.94000 -124.6583 2002  7.9688332 10.082057 -37.136116
## 116       1 48.38917 -124.6500 2002  5.8014705  8.975316 -37.136116
## 117       5 47.87000 -124.6000 2002  7.5032499 10.218884 -37.136116
## 118       6 47.86000 -124.5700 2002  7.8503635 10.181114 -37.136116
## 119      18 43.30000 -124.4000 2002  6.9812309 10.603102  12.459001
## 120      17 43.31000 -124.3933 2002  6.9812309 10.603102  12.459001
## 121      16 43.33000 -124.3740 2002  6.5536365 10.685145  12.459001
## 122      19 41.06000 -124.1500 2002  9.8098333 10.609383  94.800919
## 123      15 44.20000 -124.1100 2002 10.0552001 10.772405  -1.200875
## 124      14 44.21833 -124.1100 2002 10.0552001 10.840752  -1.200875
## 125      13 44.24250 -124.1100 2002 10.2010001 10.840752  -1.200875
## 126      20 40.02000 -124.0733 2002  4.0116666 11.094081 100.205376
## 127      12 44.74667 -124.0600 2002  7.3575555 11.438861 -14.996813
## 128      11 44.80667 -124.0600 2002  7.3521818 11.362642 -14.996813
## 129      10 44.82667 -124.0567 2002  7.2411110 10.926095 -14.996813
## 130      50 45.91333 -123.9767 2002 15.0063000 11.891072 -15.132875
## 131       9 45.77000 -123.9700 2002 12.7730769 11.427969 -15.132875
## 132      51 45.75500 -123.9650 2002  7.9132000 11.427969 -15.132875
## 133      23 39.34000 -123.8200 2002  5.9913334 11.343178  80.826229
## 134      22 39.41500 -123.8100 2002  4.9612499 11.356493  80.826229
## 135      24 39.27250 -123.7975 2002  5.6156667 11.370781  80.826229
## 136      21 39.59750 -123.7800 2002  4.9231818 11.504496 100.205376
## 137      25 38.38333 -123.0867 2002 11.5030001 11.179432  72.198959
## 138      27 38.31000 -123.0700 2002 10.1696364 11.282457  72.198959
## 139      26 38.32000 -123.0700 2002 10.1696364 11.282457  72.198959
## 140      28 36.51000 -121.9467 2002  5.6074001 12.426344  94.482033
## 141      29 36.48000 -121.9400 2002  4.5521429 12.426344  94.225182
## 142      30 36.44667 -121.9233 2002  4.6677778 12.416300  94.225182
## 143      31 35.66000 -121.2800 2002  3.0204615 13.019675  94.225182
## 144      32 35.63000 -121.1900 2002  3.1978571 12.998018  94.225182
## 145      33 35.61000 -121.1500 2002  3.2343334 12.980354  94.225182
## 146      34 34.46000 -120.2700 2002  2.3976667 14.373093  82.587753
## 147      35 34.46000 -120.2682 2002  2.3976667 14.373093  82.587753
## 148      36 34.45400 -120.0560 2002  2.4329231 14.894326  82.587753
## 149      37 34.06000 -119.8500 2002  2.0805652 14.632147  82.587753
## 150      39 34.04333 -119.7800 2002  1.7327391 14.581233  82.587753
## 151      38 34.03667 -119.6633 2002  1.4370526 14.708295  82.587753
## 152      43 33.46000 -118.5200 2002  0.5846250 16.347279  42.931252
## 153      44 33.45000 -118.4800 2002  0.5920833 16.389616  57.165653
## 154      45 33.44000 -118.4767 2002  0.5920833 16.457952  57.165653
## 155      40 33.76000 -118.4200 2002  1.2854286 16.144470  42.931252
## 156      41 33.71000 -118.3100 2002  1.3825882 16.166532  42.931252
## 157      42 33.70000 -118.2867 2002  1.3825882 16.207427  42.931252
## 158      46 32.84000 -117.2800 2002  0.9767273 16.135002  71.400055
## 159      47 32.82000 -117.2767 2002  1.1364667 16.023130  71.400055
## 160      48 32.71167 -117.2500 2002  1.4037500 16.020145  71.400055
## 161       2 48.33750 -124.6875 2003  3.3308667  9.142811 -47.043646
## 162       3 48.31333 -124.6600 2003  3.2892501  8.959091 -47.043646
## 163       4 47.94000 -124.6583 2003  8.4810000  9.108361 -47.043646
## 164       1 48.38917 -124.6500 2003  2.7219445  8.762535 -47.043646
## 165       5 47.87000 -124.6000 2003  9.5728126  9.017313 -47.043646
## 166       6 47.86000 -124.5700 2003 10.3878182  8.900959 -47.043646
## 167      18 43.30000 -124.4000 2003  8.3046924 10.882115 -11.535832
## 168      17 43.31000 -124.3933 2003  8.3046924 10.882115 -11.535832
## 169      16 43.33000 -124.3740 2003  8.4822728 10.952167 -11.535832
## 170      19 41.06000 -124.1500 2003  7.6963848 10.816442  48.549543
## 171      15 44.20000 -124.1100 2003 12.0539998 10.831846 -22.000290
## 172      14 44.21833 -124.1100 2003 12.0539998 10.873231 -22.000290
## 173      13 44.24250 -124.1100 2003 12.1561998 10.873231 -22.000290
## 174      20 40.02000 -124.0733 2003  3.6103637 11.116463  59.931833
## 175      12 44.74667 -124.0600 2003  9.3874999 10.684355 -39.725155
## 176      11 44.80667 -124.0600 2003  9.5099999 10.753206 -39.725155
## 177      10 44.82667 -124.0567 2003 10.5450000 10.540385 -39.725155
## 178      50 45.91333 -123.9767 2003  8.5299998 11.082901 -46.985561
## 179       9 45.77000 -123.9700 2003  8.0589089 10.638696 -46.985561
## 180      51 45.75500 -123.9650 2003  9.2151248 10.638696 -46.985561
## 181      23 39.34000 -123.8200 2003  1.7694445 11.381402  62.207167
## 182      22 39.41500 -123.8100 2003  1.9801250 11.414616  62.207167
## 183      24 39.27250 -123.7975 2003  2.0216667 11.258881  62.207167
## 184      21 39.59750 -123.7800 2003  4.6528182 11.244035  59.931833
## 185      25 38.38333 -123.0867 2003  7.0821251 10.967979  66.307251
## 186      27 38.31000 -123.0700 2003  6.9166364 11.108239  66.307251
## 187      26 38.32000 -123.0700 2003  6.9166364 11.091972  66.307251
## 188      28 36.51000 -121.9467 2003  3.0062000 12.070711 100.724251
## 189      29 36.48000 -121.9400 2003  1.9590666 12.070711  99.260765
## 190      30 36.44667 -121.9233 2003  2.1387000 12.016375  99.260765
## 191      31 35.66000 -121.2800 2003  1.2787500 12.558400  99.260765
## 192      32 35.63000 -121.1900 2003  1.4478462 12.534754  99.260765
## 193      33 35.61000 -121.1500 2003  1.5071250 12.455743  99.260765
## 194      34 34.46000 -120.2700 2003  1.7644667 14.270845  96.177868
## 195      35 34.46000 -120.2682 2003  1.7644667 14.270845  96.177868
## 196      36 34.45400 -120.0560 2003  2.0494615 14.376707  96.177868
## 197      37 34.06000 -119.8500 2003  1.9372174 14.459003  96.177868
## 198      39 34.04333 -119.7800 2003  1.7786522 14.553424  96.177868
## 199      38 34.03667 -119.6633 2003  1.7212105 14.741746  96.177868
## 200      43 33.46000 -118.5200 2003  0.5367200 16.484493  51.534904
## 201      44 33.45000 -118.4800 2003  0.5488400 16.474932  65.404392
## 202      45 33.44000 -118.4767 2003  0.5488400 16.465265  65.404392
## 203      40 33.76000 -118.4200 2003  1.4679286 15.953458  51.534904
## 204      41 33.71000 -118.3100 2003  1.4188235 16.094422  51.534904
## 205      42 33.70000 -118.2867 2003  1.4188235 16.240593  51.534904
## 206      46 32.84000 -117.2800 2003  0.7663333 15.307574  79.273880
## 207      47 32.82000 -117.2767 2003  0.9967647 15.457413  79.273880
## 208      48 32.71167 -117.2500 2003  1.1250714 15.679302  79.273880
## 209       2 48.33750 -124.6875 2004  4.8459333 10.236027 -57.072613
## 210       3 48.31333 -124.6600 2004  4.4521666 10.116756 -57.072613
## 211       4 47.94000 -124.6583 2004  9.3408333 11.254329 -57.072613
## 212       1 48.38917 -124.6500 2004  4.3155556 10.081898 -57.072613
## 213       5 47.87000 -124.6000 2004  8.9538750 11.324538 -57.072613
## 214       6 47.86000 -124.5700 2004  9.2177273 11.371373 -57.072613
## 215      18 43.30000 -124.4000 2004  4.9034286 12.026998  -7.702940
## 216      17 43.31000 -124.3933 2004  4.9034286 12.026998  -7.702940
## 217      16 43.33000 -124.3740 2004  4.6930000 12.072333  -7.702940
## 218      19 41.06000 -124.1500 2004  7.8518461 11.656117  78.904888
## 219      15 44.20000 -124.1100 2004  6.8915000 12.174138 -23.245136
## 220      14 44.21833 -124.1100 2004  6.8915000 12.187472 -23.245136
## 221      13 44.24250 -124.1100 2004  6.9893000 12.187472 -23.245136
## 222      20 40.02000 -124.0733 2004  5.5013333 11.715408  73.585711
## 223      12 44.74667 -124.0600 2004  4.3575000 11.403199 -43.838291
## 224      11 44.80667 -124.0600 2004  4.6368001 11.506754 -43.838291
## 225      10 44.82667 -124.0567 2004  4.9682501 11.510463 -43.838291
## 226      50 45.91333 -123.9767 2004  6.1703750 12.093767 -48.889252
## 227       9 45.77000 -123.9700 2004  5.5031112 12.001981 -48.889252
## 228      51 45.75500 -123.9650 2004  5.3061250 12.001981 -48.889252
## 229      23 39.34000 -123.8200 2004  3.9160000 11.938137  60.089834
## 230      22 39.41500 -123.8100 2004  3.9317500 11.941461  60.089834
## 231      24 39.27250 -123.7975 2004  4.0500833 11.933761  60.089834
## 232      21 39.59750 -123.7800 2004  3.8032727 11.847662  73.585711
## 233      25 38.38333 -123.0867 2004  4.3427500 12.124054  61.060165
## 234      27 38.31000 -123.0700 2004  4.4498182 12.094118  61.060165
## 235      26 38.32000 -123.0700 2004  4.4498182 12.081184  61.060165
## 236      28 36.51000 -121.9467 2004  2.6808000 13.282690  88.587502
## 237      29 36.48000 -121.9400 2004  2.0256429 13.282690  89.688965
## 238      30 36.44667 -121.9233 2004  2.1056667 13.271455  89.688965
## 239      31 35.66000 -121.2800 2004  1.4803333 13.270370  89.688965
# Select the value in the 4th row and 2nd column
pisco[4, 2]
## [1] 41.06
# Select the 3rd row of pisco
pisco[3, ]
##   sitenum latitude longitude year    chla      sst upwelling
## 3      16    43.33  -124.374 1999 3.03203 12.06844 -35.13245
# Select the 3rd column of pisco
pisco[ , 3]
##   [1] -124.4000 -124.3933 -124.3740 -124.1500 -124.1100 -124.1100 -124.1100
##   [8] -124.0733 -124.0600 -124.0600 -124.0567 -123.9867 -123.8200 -123.8100
##  [15] -123.7975 -123.7800 -124.6875 -124.6600 -124.6583 -124.6500 -124.6000
##  [22] -124.5700 -124.4000 -124.3933 -124.3740 -124.1500 -124.1100 -124.1100
##  [29] -124.1100 -124.0733 -124.0600 -124.0600 -124.0567 -123.9767 -123.9700
##  [36] -123.9650 -123.8200 -123.8100 -123.7975 -123.7800 -123.0867 -123.0700
##  [43] -123.0700 -121.9467 -121.9400 -121.9233 -121.2800 -121.1900 -121.1500
##  [50] -120.2700 -120.2682 -120.0560 -119.8500 -119.7800 -119.6633 -118.5200
##  [57] -118.4800 -118.4767 -118.4200 -118.3100 -118.2867 -117.2800 -117.2767
##  [64] -117.2500 -124.6875 -124.6600 -124.6583 -124.6500 -124.6000 -124.5700
##  [71] -124.4000 -124.3933 -124.3740 -124.1500 -124.1100 -124.1100 -124.1100
##  [78] -124.0733 -124.0600 -124.0600 -124.0567 -123.9767 -123.9700 -123.9650
##  [85] -123.8200 -123.8100 -123.7975 -123.7800 -123.0867 -123.0700 -123.0700
##  [92] -121.9467 -121.9400 -121.9233 -121.2800 -121.1900 -121.1500 -120.2700
##  [99] -120.2682 -120.0560 -119.8500 -119.7800 -119.6633 -118.5200 -118.4800
## [106] -118.4767 -118.4200 -118.3100 -118.2867 -117.2800 -117.2767 -117.2500
## [113] -124.6875 -124.6600 -124.6583 -124.6500 -124.6000 -124.5700 -124.4000
## [120] -124.3933 -124.3740 -124.1500 -124.1100 -124.1100 -124.1100 -124.0733
## [127] -124.0600 -124.0600 -124.0567 -123.9767 -123.9700 -123.9650 -123.8200
## [134] -123.8100 -123.7975 -123.7800 -123.0867 -123.0700 -123.0700 -121.9467
## [141] -121.9400 -121.9233 -121.2800 -121.1900 -121.1500 -120.2700 -120.2682
## [148] -120.0560 -119.8500 -119.7800 -119.6633 -118.5200 -118.4800 -118.4767
## [155] -118.4200 -118.3100 -118.2867 -117.2800 -117.2767 -117.2500 -124.6875
## [162] -124.6600 -124.6583 -124.6500 -124.6000 -124.5700 -124.4000 -124.3933
## [169] -124.3740 -124.1500 -124.1100 -124.1100 -124.1100 -124.0733 -124.0600
## [176] -124.0600 -124.0567 -123.9767 -123.9700 -123.9650 -123.8200 -123.8100
## [183] -123.7975 -123.7800 -123.0867 -123.0700 -123.0700 -121.9467 -121.9400
## [190] -121.9233 -121.2800 -121.1900 -121.1500 -120.2700 -120.2682 -120.0560
## [197] -119.8500 -119.7800 -119.6633 -118.5200 -118.4800 -118.4767 -118.4200
## [204] -118.3100 -118.2867 -117.2800 -117.2767 -117.2500 -124.6875 -124.6600
## [211] -124.6583 -124.6500 -124.6000 -124.5700 -124.4000 -124.3933 -124.3740
## [218] -124.1500 -124.1100 -124.1100 -124.1100 -124.0733 -124.0600 -124.0600
## [225] -124.0567 -123.9767 -123.9700 -123.9650 -123.8200 -123.8100 -123.7975
## [232] -123.7800 -123.0867 -123.0700 -123.0700 -121.9467 -121.9400 -121.9233
## [239] -121.2800
# Select the values in rows 3 to 5 and columns 4 and 7
pisco[3:5, c(4, 7)]
##   year upwelling
## 3 1999 -35.13245
## 4 1999  11.02540
## 5 1999 -39.88423

On your own:

# Select the values in the 100th and 200th rows, and the 5th through 7th columns:
pisco[c(100,200), 5:7]
##         chla      sst upwelling
## 100 2.333846 15.21595  77.18613
## 200 0.536720 16.48449  51.53490

Subsetting using logical values

It’s generally poor form to select values manually by position, because the selection is only correct as long as the rows and columns of the data stay in exactly the same order. Although your data files in this course will not change, it is very common for a real analysis to span several months and several different versions of the data. Thus, this class will emphasize methods that are robust to foreseeable data changes.

We will begin by introducing logical operations. A logical statement returns TRUE if the condition is met, and FALSE if the condition is not met. You can also do mathematical operations on logical values, because TRUE is counted as a 1, while FALSE is counted as a 0:
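
One simple way to make a selection less fragile is to refer to columns by name instead of by position, using either quoted names inside the square brackets or the $ operator. The sketch below illustrates the idea on a small stand-in dataframe; “pisco_small” and “reordered” are names invented for this example, and the values are copied from the first rows of pisco printed earlier.

```r
# Small stand-in with four of the pisco columns (values from the printed rows)
pisco_small <- data.frame(
  year      = c(1999, 1999, 1999, 1999),
  chla      = c(4.404954, 4.404954, 3.032030, 3.938939),
  sst       = c(12.03498, 12.03498, 12.06844, 12.00153),
  upwelling = c(-35.13245, -35.13245, -35.13245, 11.02540)
)

# Three ways to extract the sst column
pisco_small[ , 3]        # by position: depends on sst being the 3rd column
pisco_small[ , "sst"]    # by name
pisco_small$sst          # by name, using the $ operator

# Several columns by name
pisco_small[ , c("year", "upwelling")]

# If a new version of the data arrives with the columns in another order,
# position-based code quietly returns a different column
reordered <- pisco_small[ , c("sst", "year", "chla", "upwelling")]
reordered[ , 3]          # now chla, not sst
reordered$sst            # still sst
```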

# Determine whether sea surface temperature is above 11 degrees
pisco$sst > 11
##   [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [13]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [25] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [49]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [61]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
##  [73]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [85]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [97]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [109]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
## [133]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [145]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [157]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
## [181]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [193]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [205]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
## [217]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [229]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# Using the sum() function on this output will count the number of TRUE values
sum(pisco$sst > 11)
## [1] 179
# Determine whether upwelling is greater than or equal to zero
pisco$upwelling >= 0
##   [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [13]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [25] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [49]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [61]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
##  [73]  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [85]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [97]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [109]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
## [121]  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [133]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [145]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [157]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [181]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [193]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [205]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [229]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# Use sum() to count the number of cases where this condition is met
sum(pisco$upwelling >= 0)
## [1] 145
# Impose both constraints by combining with "&"
pisco$upwelling >= 0 & pisco$sst > 11 # now only TRUE if both constraints are met
##   [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [13]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [25] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [49]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [61]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
##  [73]  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [85]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [97]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [109]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [133]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [145]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [157]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [181]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [193]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [205]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [229]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
sum(pisco$upwelling >= 0 & pisco$sst > 11) # how many meet both criteria?
## [1] 135

These are some examples of implementing logical operators:
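As a minimal sketch on a small made-up vector (the name `x` is hypothetical and not part of the pisco data), the comparison and logical operators used in this lab behave roughly like this:

```r
# Toy vector for illustration only (not part of the pisco data)
x <- c(2, 5, 8, 11)

x == 5          # equal to:                  FALSE  TRUE FALSE FALSE
x != 5          # not equal to:               TRUE FALSE  TRUE  TRUE
x > 5           # greater than:              FALSE FALSE  TRUE  TRUE
x <= 5          # less than or equal to:      TRUE  TRUE FALSE FALSE
x > 3 & x < 10  # AND, both must hold:       FALSE  TRUE  TRUE FALSE
x < 3 | x > 10  # OR, either may hold:        TRUE FALSE FALSE  TRUE
!(x > 5)        # NOT, flips TRUE and FALSE:  TRUE  TRUE FALSE FALSE
```

Because TRUE counts as 1 and FALSE as 0, wrapping any of these lines in sum() counts how many elements meet the condition.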

On your own:

# Count how many observations come from the year 1999:
sum(pisco$year == 1999)
## [1] 16
# Count how many observations have a sst greater than 13 and less than 15:
sum(pisco$sst > 13 & pisco$sst < 15)
## [1] 23
# Count how many observations have a sst greater than 13 and less than 15 and are from the year 1999:
sum(pisco$sst > 13 & pisco$sst < 15 & pisco$year == 1999)
## [1] 0

Logical values are also helpful in subsetting data. A TRUE value will keep an observation, whereas a FALSE value will get rid of that observation:

# Create a vector with only values of upwelling greater than or equal to zero 
pos.upwelling <- pisco$upwelling[pisco$upwelling >= 0]

# Create a new dataframe with only observations of upwelling greater than or equal to zero
pisco.sub1 <- pisco[pisco$upwelling >= 0, ]

# Create a new dataframe with only observations of upwelling greater than or equal to zero and sst greater than 15 
pisco.sub2 <- pisco[pisco$upwelling >= 0 & pisco$sst > 15, ]
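
To see exactly which observations are kept, it can help to try the same pattern on a tiny dataset. This is only a sketch: the dataframe `toy` and its columns `site` and `temp` are made up for illustration and are not part of pisco.

```r
# Hypothetical toy dataframe, not the pisco data
toy <- data.frame(site = c("A", "B", "C", "D"),
                  temp = c(9, 12, 15, 11))

# Logical vector: one TRUE or FALSE per row
keep <- toy$temp > 10   # FALSE TRUE TRUE TRUE

# Subset a single column (a vector): only the TRUE positions are kept
toy$temp[keep]          # 12 15 11

# Subset the whole dataframe: rows where keep is TRUE, all columns
toy.sub <- toy[keep, ]
nrow(toy.sub)           # 3
```

The comma inside the square brackets matters when subsetting a dataframe: the condition before the comma selects rows, and leaving the space after the comma empty keeps every column.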

On your own:

# Create a dataframe called "pisco.sub3" that keeps only observations from the year 1999:
pisco.sub3 <- pisco[pisco$year == 1999, ]

Finally, you can place constraints on both rows and columns simultaneously. Columns have character names in dataframes, and you can use these names to select columns to keep in a data subset. Here, we place constraints on the rows using a logical operator, and columns by creating a vector of column names to keep:

# Create a new dataframe that keeps positive values of upwelling and the 4 columns for upwelling, year, chla, and sst 
pisco.sub4 <- pisco[pisco$upwelling >= 0, c("upwelling", "year", "chla", "sst")]

On your own:

# Create a dataframe called "pisco.sub5" that keeps only observations from the year 1999 and the columns for year, sitenum, sst, and chla:
pisco.sub5 <- pisco[pisco$year == 1999, c("year", "sitenum", "sst", "chla")]

Complex calculations and summary statistics using aggregate()

What if you wanted to calculate the mean chlorophyll value for every location that you sampled? It wouldn’t be efficient to manually select the rows corresponding to each study site. The aggregate() function applies a function of your choice to the values within each category of a dataframe. For example, to calculate the mean sea surface temperature for each year, you would call aggregate() like this:

# Compute annual mean of sst across all sites
aggregate(sst ~ year, FUN=mean, data=pisco)
##   year      sst
## 1 1999 12.21324
## 2 2000 12.50183
## 3 2001 13.36238
## 4 2002 12.52644
## 5 2003 12.25960
## 6 2004 11.87153

This function is more complex than others you have seen, as it has multiple required arguments with different formats. The first argument is a formula stating that we want sst split out by year: the variable to summarize, sst, goes on the left of the ~, and the categorizing variable, year, goes on the right. Next, FUN specifies the function to apply to each category. Finally, data indicates the dataset in which these variables can be found.
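
A small made-up example can make the grouping easier to follow. The sketch below assumes a hypothetical dataframe `toy` with two years of invented sst values, and checks the result of aggregate() against a manual calculation for one year.

```r
# Hypothetical toy data: two years with made-up sst values
toy <- data.frame(year = c(1999, 1999, 2000, 2000, 2000),
                  sst  = c(10, 12, 11, 13, 15))

# Mean sst within each year
annual <- aggregate(sst ~ year, FUN = mean, data = toy)
annual
##   year sst
## 1 1999  11
## 2 2000  13

# The same value for 1999, computed by subsetting manually
mean(toy$sst[toy$year == 1999])
## [1] 11
```

aggregate() saves you from repeating the manual subsetting step once per category.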

It is possible to give more complicated instructions! This code calculates the mean chlorophyll concentration for every combination of latitude and longitude. This is helpful if, for example, some sites share the same latitude.

# Compute mean chla for each combination of latitude and longitude
head(aggregate(chla ~ latitude + longitude, data = pisco, FUN = mean))
##   latitude longitude     chla
## 1 48.33750 -124.6875 4.598781
## 2 48.31333 -124.6600 4.544305
## 3 47.94000 -124.6583 8.902933
## 4 48.38917 -124.6500 4.000710
## 5 47.87000 -124.6000 9.305338
## 6 47.86000 -124.5700 9.748182

On your own:

# Use aggregate() to calculate the average chla for each year:
aggregate(chla ~ year, FUN=mean, data = pisco)
##   year     chla
## 1 1999 4.149661
## 2 2000 3.275534
## 3 2001 3.895443
## 4 2002 5.490546
## 5 2003 4.754450
## 6 2004 5.094501

Creating custom functions

What if you want to perform a calculation that is not a function that R already contains? R will allow you to make your own function!

To create your own custom function, you must specify its name, the arguments it takes, the calculation it performs, and the value it returns.

Let’s look at an example of creating a custom function to calculate coefficient of variation:

# Create custom cv function
cv <- function(vec) {
  cv.calc <- sd(vec) / mean(vec)
  return(cv.calc)
}

# Find the cv for temperature 
cv(temperature)
## [1] 0.3500297

In the above function, we first give it the name cv.

Then, we specify that the function will take one argument called “vec”. But, we could have named this argument anything! As long as it stays consistent within the function, any name will work.

Afterwards, we create a new variable called “cv.calc” which calculates the cv of the argument. Then, we indicate that the value we want returned is the number stored in “cv.calc”.

Once the custom function is created, you can use it just like any other R function!

# Compute annual CV of sst across all sites using the custom cv function
aggregate(sst ~ year, FUN=cv, data=pisco)
##   year        sst
## 1 1999 0.01821214
## 2 2000 0.19116718
## 3 2001 0.16151562
## 4 2002 0.18214963
## 5 2003 0.19164300
## 6 2004 0.06699092

Let’s look at more examples of custom functions to get the hang of how to create them. These are just toy examples to illustrate how to pass arguments and set up the custom functions.

This function rescales data to have a standard deviation of 1:

# The argument is named "vec" to avoid shadowing R's built-in vector() function
sd1 <- function(vec) {
  new.nums <- vec / sd(vec)
  return(new.nums)
}

# Implement this function on temperature
sd1(temperature)
## [1] 1.379193 2.068790 2.482548 2.896306 3.034225 3.861741 4.275499
# Check that it works
sd(sd1(temperature))
## [1] 1

Note that any intermediary objects or variables created within the function ARE NOT SAVED in your environment. They only exist temporarily to execute the function!
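
You can check this yourself with exists(), which reports whether an object with a given name can be found. In this sketch, the function `scale.demo` and the object names `tmp.result` and `scaled` are hypothetical and chosen only for illustration.

```r
# Hypothetical function with an intermediate object called tmp.result
scale.demo <- function(vec) {
  tmp.result <- vec / sd(vec)
  return(tmp.result)
}

# Save the returned value under a new name
scaled <- scale.demo(c(10, 15, 18, 21, 22, 28, 31))

# The intermediate object only lived inside the function call
exists("tmp.result")  # FALSE, assuming you never created tmp.result yourself
exists("scaled")      # TRUE, because the result was assigned with <-
```

If you want to keep the result of a function, assign it to a variable when you call the function.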

On your own:

# Create a function named calc.diff that subtracts the minimum value in a vector from the mean value of a vector
# Use min() to find the minimum and mean() to find the mean
calc.diff <- function(vec){
  min.val <- min(vec)
  avg.val <- mean(vec)
  return(avg.val - min.val)
}


# Check to see if it works by implementing on a column within pisco:
calc.diff(pisco$sst)
## [1] 3.767345

Plotting

It’s possible to make a plot in R using only 2 arguments (the x and y axis vectors), as seen at the beginning of this document. Yet, R is extremely flexible and powerful for making custom graphics. Here, we will take a look at some of the options for how to pass data to the plot function.

Let’s plot the sea surface temperature across latitudes for the year 2000 and the year 2001 in side by side panels.

# This command specifies that the output figure should have 1 row and 2 columns 
# In order for it to render correctly, you must run the entire chunk! Otherwise, it will keep showing a 1 panel figure. 
par(mfrow = c(1, 2))

# Subset data for year 2000
pisco.2000 <- pisco[pisco$year == 2000, ]

# Plot data for year 2000. Note that arguments can span multiple lines! Sometimes it's visually easier to see and edit long functions this way
plot(pisco.2000$latitude, pisco.2000$sst, 
     pch = 20, 
     col="red",
     ylab="SST", 
     xlab="Latitude (degrees N)")

# Subset data for year 2001
pisco.2001 <- pisco[pisco$year == 2001, ]

# Plot data for year 2001. 
plot(pisco.2001$latitude, pisco.2001$sst, 
     pch = 20, 
     col="blue",
     ylab="SST", 
     xlab="Latitude (degrees N)")

Here are some plotting arguments you may find useful:
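
As a minimal sketch, the call below uses several common arguments of base R's plot() on made-up data (the vectors `x` and `y` are hypothetical). Each argument is annotated with what it controls.

```r
# Made-up data for illustration only
x <- 1:10
y <- x^2

plot(x, y,
     type = "b",            # "p" = points, "l" = lines, "b" = both
     pch  = 17,             # plotting symbol (17 = filled triangle)
     col  = "darkgreen",    # color of the symbols and lines
     cex  = 1.5,            # symbol size relative to the default of 1
     lwd  = 2,              # line width
     xlim = c(0, 12),       # x-axis limits
     ylim = c(0, 110),      # y-axis limits
     main = "Toy example",  # plot title
     xlab = "x value",      # x-axis label
     ylab = "y value")      # y-axis label
```

Running ?par and ?plot.default in the console lists many more options.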

On your own:

# Modify this code to make the size of datapoints larger, and to use squares as the plotting symbol:

# Plot data for year 2001. 
plot(pisco.2001$latitude, pisco.2001$sst, 
     pch = 15,
     col="blue",
     cex=2,
     ylab="SST", 
     xlab="Latitude (degrees N)")

Task 1

Compute the mean of the variable chla for each latitude between 36 and 50 degrees North. First, create a subsetted dataframe containing only the relevant observations. Then, use aggregate() on the subsetted data to perform the calculation.

pisco.lat3650 <- pisco[pisco$latitude < 50 & pisco$latitude > 36, c("latitude", "chla")]

aggregate(chla ~ latitude, FUN = mean, data = pisco.lat3650)
##    latitude     chla
## 1  36.44667 2.529451
## 2  36.48000 2.520542
## 3  36.51000 3.453280
## 4  38.31000 6.430431
## 5  38.32000 6.430431
## 6  38.38333 6.834361
## 7  39.27250 3.161325
## 8  39.34000 3.181498
## 9  39.41500 3.164635
## 10 39.59750 3.877737
## 11 40.02000 3.338465
## 12 41.06000 6.757612
## 13 43.30000 5.027920
## 14 43.31000 5.027920
## 15 43.33000 4.713504
## 16 44.20000 7.608353
## 17 44.21833 7.608353
## 18 44.24250 7.669568
## 19 44.74667 6.126577
## 20 44.80667 6.164366
## 21 44.82667 6.331021
## 22 45.75500 7.374790
## 23 45.77000 8.230054
## 24 45.91333 9.299110
## 25 45.93000 5.247898
## 26 47.86000 9.748182
## 27 47.87000 9.305338
## 28 47.94000 8.902933
## 29 48.31333 4.544305
## 30 48.33750 4.598781
## 31 48.38917 4.000710

Task 2

Create a custom function to calculate the standard error of a vector. (Remember, standard error is simply the standard deviation divided by the square root of the number of observations!) The functions you will need are sd() for standard deviation, sqrt() for square root, and length() for the number of observations in a vector.

Using your custom function, use aggregate() to calculate the standard error of chlorophyll values at each latitude between 36 and 50 degrees. Reuse the subsetted dataset you created in Task 1!

se <- function(vec) {
  sd.vec <- sd(vec)
  return(sd.vec / sqrt(length(vec)))
}

aggregate(chla ~ latitude, FUN = se, data = pisco.lat3650)
##    latitude      chla
## 1  36.44667 0.5448844
## 2  36.48000 0.5184882
## 3  36.51000 0.5674511
## 4  38.31000 1.2119065
## 5  38.32000 1.2119065
## 6  38.38333 1.4317953
## 7  39.27250 0.6499669
## 8  39.34000 0.6985256
## 9  39.41500 0.5435789
## 10 39.59750 0.5189440
## 11 40.02000 0.5793508
## 12 41.06000 1.0617648
## 13 43.30000 0.9175555
## 14 43.31000 0.9175555
## 15 43.33000 0.9703700
## 16 44.20000 1.1443165
## 17 44.21833 1.1443165
## 18 44.24250 1.1645447
## 19 44.74667 0.7905726
## 20 44.80667 0.7975044
## 21 44.82667 0.9449109
## 22 45.75500 0.6914398
## 23 45.77000 1.2442634
## 24 45.91333 1.5298506
## 25 45.93000        NA
## 26 47.86000 0.6818405
## 27 47.87000 0.6217712
## 28 47.94000 0.2950621
## 29 48.31333 0.5322648
## 30 48.33750 0.4671490
## 31 48.38917 0.5175477
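
Note the NA at latitude 45.93 in the output above. One likely explanation, assuming that latitude contributes only a single chla value, is that sd() returns NA for a vector of length 1, so the standard error cannot be computed there. A minimal sketch with made-up numbers (the se() function is restated so the block runs on its own):

```r
# Standard error: standard deviation divided by the square root of n
se <- function(vec) {
  return(sd(vec) / sqrt(length(vec)))
}

se(c(4, 6, 8))  # sd is 2 and n is 3, so this is 2 / sqrt(3), about 1.1547
se(5.25)        # NA: the standard deviation of a single value is undefined
```

An NA in a summary table is often a hint to check how many observations fall in that category.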

Task 3

Create a 2-panel figure of the following: In the first panel, plot the mean chla values vs. latitude. In the second panel, plot the standard error of the chla values vs. latitude.

Fill in the code chunk below to create the plot:

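# Set up a figure with 1 row and 2 columns so the two plots appear side by side
par(mfrow = c(1, 2))

# Mean chla at each latitude
pisco.plot <- aggregate(chla ~ latitude, FUN = mean, data = pisco)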

plot(pisco.plot$latitude, pisco.plot$chla,
     main = "Average Chlorophyll vs Latitude",
     ylab = "Average Chlorophyll",
     xlab = "Latitude",
     col = "red",
     pch = 15)

pisco.plot2 <- aggregate(chla ~ latitude, FUN = se, data = pisco)

plot(pisco.plot2$latitude, pisco.plot2$chla,
     main = "Standard Error of Chlorophyll vs Latitude",
     ylab = "Standard Error of Chlorophyll",
     xlab = "Latitude",
     col = "blue",
     cex = 1.5,
     pch = 20)

Knitting your document

If you knit this template, it will create a nice report that pops up as an HTML file. Fill in your answers to Tasks 1, 2, and 3, and then knit your document. The document won’t knit if there are any errors!