If you’re reading this, you’ve successfully opened this R script in R!
To prepare for our first workshop, please read the text and run the code “chunks” that are surrounded by the three backticks (`).
This is an R Markdown script, a type of R script which uses a simple formatting syntax to create publishable, reproducible documents from your code. When you execute code within the document, the results appear beneath the code.
When you click the Knit button, a document will be generated that includes both the text and the output of any embedded R code chunks.
You can add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
Below is the “setup” chunk, where you may want to load packages into your library or read in data. For now, we’ll just set an option that says we want to see the code and the outputs.
Try executing this chunk by clicking the Run button (green arrow) within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
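The setup chunk’s code isn’t reproduced in this copy. Assuming it follows RStudio’s default R Markdown template, it contains one line that uses knitr’s `opts_chunk` to set `echo = TRUE`, which tells knitr to show each chunk’s code next to its output:

```r
# Assumed setup-chunk contents (RStudio's default R Markdown template).
# In the .Rmd file this line sits inside a chunk whose header is {r setup, include=FALSE}.
# echo = TRUE: show each chunk's code alongside its output when knitting.
knitr::opts_chunk$set(echo = TRUE)
```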
The first thing we’ll do is install the packages we’ll be using often. In RStudio, you can install packages with the “Install” button in the “Packages” tab, but you can also install them programmatically (like below). *Note: you’ll need an internet connection to download packages initially, but not to run R/RStudio generally.
Installing the tidyverse package also installs a group of packages, such as dplyr and ggplot2, that will help us wrangle and plot our data later on.
Packages are saved in a library folder that is created when you install R. The path should look something like this: C:-library\4.0 (where 4.0 is your R version number).
Run this chunk to install the tidyverse package. (The datasets package ships with base R, so it is already installed.)
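The chunk’s code isn’t reproduced in this copy; the direct call is `install.packages("tidyverse")`. Below is a slightly more defensive sketch, assuming you want to skip packages that are already installed. `install_if_missing()` is a hypothetical helper written for this example, not a function from any package. The sketch also prints `.libPaths()`, which shows the library folder(s) described above on your own machine.

```r
# Show the folder(s) where R installs and looks for packages;
# the exact path depends on your operating system and R version
.libPaths()

# Hypothetical helper: install only the packages that are not installed yet
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE if a package is installed and can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing)  # downloads from CRAN, so needs internet
  }
  invisible(missing)  # return the names of any packages that were installed
}

# "datasets" and "stats" ship with base R, so nothing is downloaded here
install_if_missing(c("datasets", "stats"))
```

With this helper, `install_if_missing("tidyverse")` downloads tidyverse the first time and does nothing on later runs.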
R comes with a “base” language that doesn’t require any packages. However, we’ll be using “tidyverse” commands in addition to “base” commands. To use functions from a specific package, we need to load that package from our library with library().
*Note: You only need to install a package once, but you need to load it every time you start a new R session or script.
Run this chunk to load the tidyverse and datasets packages.
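A minimal base R sketch of the two ways to reach a package’s functions: attach it with `library()`, or call a single function with the `package::function` notation (the same notation used in `class(datasets::iris)`). It uses the stats package, which ships with R, so it runs without tidyverse installed; with tidyverse the pattern is the same, e.g. `library(tidyverse)` or `dplyr::mutate()`.

```r
# Packages attached to the current session; R attaches "package:datasets"
# (along with stats, graphics and a few others) automatically
search()
"package:datasets" %in% search()

# library() attaches a package so its functions can be called by name
library(stats)
median(c(4, 1, 7))   # 4

# package::function calls one function without attaching the whole package
stats::median(c(4, 1, 7))   # 4
```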
Let’s look at some different types of data from datasets stored in the package appropriately named “datasets.” Because “datasets” is loaded (R attaches it by default), we can call these datasets by name even though you don’t see them in your environment.
Dataset: Edgar Anderson’s Iris Data
Description: This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
We’ll use three functions to explore the data initially:
class() - What is the format or “class” of this dataset? For example, it can be a data frame (or tibble), a time series, or a matrix.
str() - What is the “structure” of this dataset? This function lists the variable names along with each variable’s type and its first few values.
head() - Give me a preview of the dataset. By default, this function shows the first six rows of the dataset.
Let’s check the class for the “iris” dataset from the datasets package.
class(datasets::iris)
## [1] "data.frame"
The data frame is probably the most common format for data in R, and the most similar to a spreadsheet you’re familiar with from Excel. A single data frame can store numeric, character, and factor columns side by side.
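A small sketch of that mix of types, built from the first three rows of iris so no new data is involved: one numeric, one character, and one factor column in the same data frame. It assumes R 4.0 or later, where `data.frame()` no longer converts text to factors by default.

```r
# Three columns of three different types, taken from the first three iris rows
iris_small <- data.frame(
  length = iris$Sepal.Length[1:3],           # numeric
  label  = as.character(iris$Species[1:3]),  # character
  group  = iris$Species[1:3]                 # factor
)

# Each column keeps its own class
sapply(iris_small, class)
str(iris_small)
```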
Now, let’s check the structure of the iris dataset. In addition to telling us that this is a data frame, the str() function tells us how many observations and variables there are, what type each variable is, and a “preview” of some of the values.
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Another way to look at your data is by using the head() function.
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
Perhaps you’re interested in just one column from this dataset. You can select it by using the dollar sign notation. First, we’ll inquire about the class of that column, then we’ll look at a preview of it.
class(iris$Sepal.Length)
## [1] "numeric"
head(iris$Sepal.Length)
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
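The dollar sign is not the only way to pull out a column. A short base R sketch of two equivalent alternatives (in tidyverse code, `dplyr::pull(iris, Sepal.Length)` does the same), plus the one form that keeps a data frame instead of returning a vector:

```r
# Three base R ways to extract Sepal.Length from iris as a vector
a <- iris$Sepal.Length        # dollar-sign notation
b <- iris[["Sepal.Length"]]   # double brackets with the column name as a string
d <- iris[, "Sepal.Length"]   # [row, column] indexing; an empty row index keeps all rows

identical(a, b)   # TRUE
identical(a, d)   # TRUE

# Single brackets with only a name return a one-column data frame, not a vector
class(iris["Sepal.Length"])   # "data.frame"
```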
We can add columns to our data frame and change their data types using mutate() from the dplyr package (tidyverse).
For example, you can change a numeric variable to a character variable (Petal.Width).
A factor can store categories as numeric codes, such as Species.Numeric, which labels the three species 1, 2, and 3.
There can also be logical variables such as Narrow.Petal, where the only options are “TRUE” and “FALSE”. Narrow.Petal is calculated before Petal.Width is converted to character, so the comparison with 1 is numeric.
iris %>%
  mutate(Species.Numeric = as.factor(as.numeric(Species)),  # species coded as factor levels 1, 2, 3
         Narrow.Petal = Petal.Width < 1,                    # TRUE/FALSE, compared while Petal.Width is numeric
         Petal.Width = as.character(Petal.Width))           # numeric -> character
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
## 51 7.0 3.2 4.7 1.4 versicolor
## 52 6.4 3.2 4.5 1.5 versicolor
## 53 6.9 3.1 4.9 1.5 versicolor
## 54 5.5 2.3 4.0 1.3 versicolor
## 55 6.5 2.8 4.6 1.5 versicolor
## 56 5.7 2.8 4.5 1.3 versicolor
## 57 6.3 3.3 4.7 1.6 versicolor
## 58 4.9 2.4 3.3 1 versicolor
## 59 6.6 2.9 4.6 1.3 versicolor
## 60 5.2 2.7 3.9 1.4 versicolor
## 61 5.0 2.0 3.5 1 versicolor
## 62 5.9 3.0 4.2 1.5 versicolor
## 63 6.0 2.2 4.0 1 versicolor
## 64 6.1 2.9 4.7 1.4 versicolor
## 65 5.6 2.9 3.6 1.3 versicolor
## 66 6.7 3.1 4.4 1.4 versicolor
## 67 5.6 3.0 4.5 1.5 versicolor
## 68 5.8 2.7 4.1 1 versicolor
## 69 6.2 2.2 4.5 1.5 versicolor
## 70 5.6 2.5 3.9 1.1 versicolor
## 71 5.9 3.2 4.8 1.8 versicolor
## 72 6.1 2.8 4.0 1.3 versicolor
## 73 6.3 2.5 4.9 1.5 versicolor
## 74 6.1 2.8 4.7 1.2 versicolor
## 75 6.4 2.9 4.3 1.3 versicolor
## 76 6.6 3.0 4.4 1.4 versicolor
## 77 6.8 2.8 4.8 1.4 versicolor
## 78 6.7 3.0 5.0 1.7 versicolor
## 79 6.0 2.9 4.5 1.5 versicolor
## 80 5.7 2.6 3.5 1 versicolor
## 81 5.5 2.4 3.8 1.1 versicolor
## 82 5.5 2.4 3.7 1 versicolor
## 83 5.8 2.7 3.9 1.2 versicolor
## 84 6.0 2.7 5.1 1.6 versicolor
## 85 5.4 3.0 4.5 1.5 versicolor
## 86 6.0 3.4 4.5 1.6 versicolor
## 87 6.7 3.1 4.7 1.5 versicolor
## 88 6.3 2.3 4.4 1.3 versicolor
## 89 5.6 3.0 4.1 1.3 versicolor
## 90 5.5 2.5 4.0 1.3 versicolor
## 91 5.5 2.6 4.4 1.2 versicolor
## 92 6.1 3.0 4.6 1.4 versicolor
## 93 5.8 2.6 4.0 1.2 versicolor
## 94 5.0 2.3 3.3 1 versicolor
## 95 5.6 2.7 4.2 1.3 versicolor
## 96 5.7 3.0 4.2 1.2 versicolor
## 97 5.7 2.9 4.2 1.3 versicolor
## 98 6.2 2.9 4.3 1.3 versicolor
## 99 5.1 2.5 3.0 1.1 versicolor
## 100 5.7 2.8 4.1 1.3 versicolor
## 101 6.3 3.3 6.0 2.5 virginica
## 102 5.8 2.7 5.1 1.9 virginica
## 103 7.1 3.0 5.9 2.1 virginica
## 104 6.3 2.9 5.6 1.8 virginica
## 105 6.5 3.0 5.8 2.2 virginica
## 106 7.6 3.0 6.6 2.1 virginica
## 107 4.9 2.5 4.5 1.7 virginica
## 108 7.3 2.9 6.3 1.8 virginica
## 109 6.7 2.5 5.8 1.8 virginica
## 110 7.2 3.6 6.1 2.5 virginica
## 111 6.5 3.2 5.1 2 virginica
## 112 6.4 2.7 5.3 1.9 virginica
## 113 6.8 3.0 5.5 2.1 virginica
## 114 5.7 2.5 5.0 2 virginica
## 115 5.8 2.8 5.1 2.4 virginica
## 116 6.4 3.2 5.3 2.3 virginica
## 117 6.5 3.0 5.5 1.8 virginica
## 118 7.7 3.8 6.7 2.2 virginica
## 119 7.7 2.6 6.9 2.3 virginica
## 120 6.0 2.2 5.0 1.5 virginica
## 121 6.9 3.2 5.7 2.3 virginica
## 122 5.6 2.8 4.9 2 virginica
## 123 7.7 2.8 6.7 2 virginica
## 124 6.3 2.7 4.9 1.8 virginica
## 125 6.7 3.3 5.7 2.1 virginica
## 126 7.2 3.2 6.0 1.8 virginica
## 127 6.2 2.8 4.8 1.8 virginica
## 128 6.1 3.0 4.9 1.8 virginica
## 129 6.4 2.8 5.6 2.1 virginica
## 130 7.2 3.0 5.8 1.6 virginica
## 131 7.4 2.8 6.1 1.9 virginica
## 132 7.9 3.8 6.4 2 virginica
## 133 6.4 2.8 5.6 2.2 virginica
## 134 6.3 2.8 5.1 1.5 virginica
## 135 6.1 2.6 5.6 1.4 virginica
## 136 7.7 3.0 6.1 2.3 virginica
## 137 6.3 3.4 5.6 2.4 virginica
## 138 6.4 3.1 5.5 1.8 virginica
## 139 6.0 3.0 4.8 1.8 virginica
## 140 6.9 3.1 5.4 2.1 virginica
## 141 6.7 3.1 5.6 2.4 virginica
## 142 6.9 3.1 5.1 2.3 virginica
## 143 5.8 2.7 5.1 1.9 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
## Species.Numeric Narrow.Petal
## 1 1 TRUE
## 2 1 TRUE
## 3 1 TRUE
## 4 1 TRUE
## 5 1 TRUE
## 6 1 TRUE
## 7 1 TRUE
## 8 1 TRUE
## 9 1 TRUE
## 10 1 TRUE
## 11 1 TRUE
## 12 1 TRUE
## 13 1 TRUE
## 14 1 TRUE
## 15 1 TRUE
## 16 1 TRUE
## 17 1 TRUE
## 18 1 TRUE
## 19 1 TRUE
## 20 1 TRUE
## 21 1 TRUE
## 22 1 TRUE
## 23 1 TRUE
## 24 1 TRUE
## 25 1 TRUE
## 26 1 TRUE
## 27 1 TRUE
## 28 1 TRUE
## 29 1 TRUE
## 30 1 TRUE
## 31 1 TRUE
## 32 1 TRUE
## 33 1 TRUE
## 34 1 TRUE
## 35 1 TRUE
## 36 1 TRUE
## 37 1 TRUE
## 38 1 TRUE
## 39 1 TRUE
## 40 1 TRUE
## 41 1 TRUE
## 42 1 TRUE
## 43 1 TRUE
## 44 1 TRUE
## 45 1 TRUE
## 46 1 TRUE
## 47 1 TRUE
## 48 1 TRUE
## 49 1 TRUE
## 50 1 TRUE
## 51 2 FALSE
## 52 2 FALSE
## 53 2 FALSE
## 54 2 FALSE
## 55 2 FALSE
## 56 2 FALSE
## 57 2 FALSE
## 58 2 FALSE
## 59 2 FALSE
## 60 2 FALSE
## 61 2 FALSE
## 62 2 FALSE
## 63 2 FALSE
## 64 2 FALSE
## 65 2 FALSE
## 66 2 FALSE
## 67 2 FALSE
## 68 2 FALSE
## 69 2 FALSE
## 70 2 FALSE
## 71 2 FALSE
## 72 2 FALSE
## 73 2 FALSE
## 74 2 FALSE
## 75 2 FALSE
## 76 2 FALSE
## 77 2 FALSE
## 78 2 FALSE
## 79 2 FALSE
## 80 2 FALSE
## 81 2 FALSE
## 82 2 FALSE
## 83 2 FALSE
## 84 2 FALSE
## 85 2 FALSE
## 86 2 FALSE
## 87 2 FALSE
## 88 2 FALSE
## 89 2 FALSE
## 90 2 FALSE
## 91 2 FALSE
## 92 2 FALSE
## 93 2 FALSE
## 94 2 FALSE
## 95 2 FALSE
## 96 2 FALSE
## 97 2 FALSE
## 98 2 FALSE
## 99 2 FALSE
## 100 2 FALSE
## 101 3 FALSE
## 102 3 FALSE
## 103 3 FALSE
## 104 3 FALSE
## 105 3 FALSE
## 106 3 FALSE
## 107 3 FALSE
## 108 3 FALSE
## 109 3 FALSE
## 110 3 FALSE
## 111 3 FALSE
## 112 3 FALSE
## 113 3 FALSE
## 114 3 FALSE
## 115 3 FALSE
## 116 3 FALSE
## 117 3 FALSE
## 118 3 FALSE
## 119 3 FALSE
## 120 3 FALSE
## 121 3 FALSE
## 122 3 FALSE
## 123 3 FALSE
## 124 3 FALSE
## 125 3 FALSE
## 126 3 FALSE
## 127 3 FALSE
## 128 3 FALSE
## 129 3 FALSE
## 130 3 FALSE
## 131 3 FALSE
## 132 3 FALSE
## 133 3 FALSE
## 134 3 FALSE
## 135 3 FALSE
## 136 3 FALSE
## 137 3 FALSE
## 138 3 FALSE
## 139 3 FALSE
## 140 3 FALSE
## 141 3 FALSE
## 142 3 FALSE
## 143 3 FALSE
## 144 3 FALSE
## 145 3 FALSE
## 146 3 FALSE
## 147 3 FALSE
## 148 3 FALSE
## 149 3 FALSE
## 150 3 FALSE
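One detail worth checking in a chunk like the one above is the order of operations. If Narrow.Petal were computed after Petal.Width had been converted to character, the comparison with 1 would be done on text rather than numbers. A small base R sketch of why, plus a quick way to confirm each column’s type after converting (base `transform()` stands in for `mutate()` so the example runs without tidyverse):

```r
# Comparing text with a number converts the number to text,
# so R compares the two as strings, character by character
"10" < 9    # TRUE: as text, "1" sorts before "9"
10 < 9      # FALSE: numeric comparison

# After changing types, check the class of every column
iris_types <- transform(iris, Petal.Width = as.character(Petal.Width))
sapply(iris_types, class)
```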
Sometimes it’s also really helpful to visualize the distribution of the data, though of course this only works for numeric variables. You can make a histogram in base R graphics or in ggplot:
hist(iris$Sepal.Length)
ggplot(iris, aes(Sepal.Length)) + geom_histogram(bins=8)
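Base R’s `hist()` takes arguments similar to `geom_histogram(bins = 8)`, and with `plot = FALSE` it returns the bin edges and counts instead of drawing. A short sketch (note that `breaks` is treated as a suggestion, so R may pick slightly different “pretty” cut points):

```r
# Base R histogram with a suggested number of bins, a title and an axis label
hist(iris$Sepal.Length, breaks = 8,
     main = "Sepal length of iris flowers", xlab = "Sepal length (cm)")

# plot = FALSE returns the bins instead of drawing them
h <- hist(iris$Sepal.Length, breaks = 8, plot = FALSE)
h$breaks        # bin edges
h$counts        # number of flowers in each bin
sum(h$counts)   # 150: every flower falls in exactly one bin
```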
I definitely don’t expect anyone to be able to replicate anything in the “Where we’re headed” sections, much less interpret the code at this point. I just want to give you an idea of what R can do and what you’ll hopefully be able to do by the end of this workshop.
So, looking at the iris data, perhaps you want to know whether petal length and width are correlated. A base R scatterplot gives a quick visual check.
plot(iris$Petal.Length, iris$Petal.Width)
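The scatterplot shows the relationship; `cor()` puts a number on it, and `cor.test()` adds a confidence interval and p-value for the same Pearson coefficient. A minimal sketch:

```r
# Pearson correlation between petal length and petal width
r <- cor(iris$Petal.Length, iris$Petal.Width)
r

# Same coefficient, with a 95% confidence interval and a test of r = 0
ct <- cor.test(iris$Petal.Length, iris$Petal.Width)
ct$estimate
ct$conf.int
```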
Perhaps you want to know whether that relationship differs by species. This can also be done in base R graphics, but it is much faster and easier in ggplot. In our first week of workshops, we’ll talk mostly about data visualization and wrangling.
ggplot(iris, aes(x=Petal.Length, y=Petal.Width, color=Species)) + geom_point() + geom_smooth(method = "lm")
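To put numbers on what the coloured fits suggest, one base R sketch is to split the data by species and compute the correlation within each group (in tidyverse code, `group_by()` followed by `summarise()` is the usual equivalent):

```r
# One data frame per species
by_species <- split(iris, iris$Species)

# Petal length-width correlation within each species
species_cor <- sapply(by_species, function(d) cor(d$Petal.Length, d$Petal.Width))
species_cor
```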
The next step could be modeling that relationship. In our second week of R Workshops, we’ll talk more about statistics and modeling.
petal.model<-glm(iris$Petal.Length ~ iris$Petal.Width + iris$Species)
summary(petal.model)
##
## Call:
## glm(formula = iris$Petal.Length ~ iris$Petal.Width + iris$Species)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.02977 -0.22241 -0.01514 0.18180 1.17449
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.21140 0.06524 18.568 < 2e-16 ***
## iris$Petal.Width 1.01871 0.15224 6.691 4.41e-10 ***
## iris$Speciesversicolor 1.69779 0.18095 9.383 < 2e-16 ***
## iris$Speciesvirginica 2.27669 0.28132 8.093 2.08e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1426948)
##
## Null deviance: 464.325 on 149 degrees of freedom
## Residual deviance: 20.833 on 146 degrees of freedom
## AIC: 139.57
##
## Number of Fisher Scoring iterations: 2
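The same model can also be written with a `data` argument, which keeps coefficient names shorter (`Petal.Width` instead of `iris$Petal.Width`) and lets the formula be reused with other data. For a gaussian `glm()` like this one, `lm()` fits the same coefficients. A sketch:

```r
# Same model as above, naming the data set once instead of repeating iris$
petal_glm <- glm(Petal.Length ~ Petal.Width + Species, data = iris)
petal_lm  <- lm(Petal.Length ~ Petal.Width + Species, data = iris)

# Coefficients: (Intercept), Petal.Width, Speciesversicolor, Speciesvirginica
coef(petal_glm)

# A gaussian glm and ordinary least squares give the same estimates
all.equal(coef(petal_glm), coef(petal_lm))
```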
You might also see data in the form of a time series (class = “ts”). You may see some of your fisheries or climate data in time series formats.
Dataset: Mauna Loa Atmospheric CO2 Concentration
Description: Atmospheric concentrations of CO2 are expressed in parts per million (ppm) and reported in the preliminary 1997 SIO manometric mole fraction scale. This time series contains monthly observations from 1959 to 1997.
str(datasets::co2)
## Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
In the time series data format, the “year” or “date” is embedded, so when we look at the data using head(), it just looks like a vector.
head(co2)
## [1] 315.42 316.31 316.50 317.56 318.13 318.00
But if we plot it, R knows how to handle the x axis.
plot(co2)
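The time information that `plot()` uses is stored with the series and can be queried directly. A short sketch of the base R helpers for `ts` objects, including `window()`, which subsets by date rather than by position:

```r
# Time attributes stored with the series
start(co2)       # 1959 1: year and month of the first observation
end(co2)         # 1997 12
frequency(co2)   # 12 observations per year

# Decimal year of each observation, and its position within the year (month)
head(time(co2))
head(cycle(co2))

# The 12 monthly values for 1990
co2_1990 <- window(co2, start = c(1990, 1), end = c(1990, 12))
length(co2_1990)   # 12
```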
We could convert the time series to a data frame using the as.data.frame() function, but now we’ve lost the “time” component, as we can see if we use head() to look at the data frame.
co2_df<-as.data.frame(co2)
head(co2_df)
## x
## 1 315.42
## 2 316.31
## 3 316.50
## 4 317.56
## 5 318.13
## 6 318.00
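One way to keep the time component is to build the data frame yourself from `time()` and `cycle()`. A minimal sketch, assuming calendar year and month columns are what you want (the column names are chosen for this example):

```r
# Convert co2 to a data frame while keeping explicit year and month columns
co2_dated <- data.frame(
  year    = floor(as.numeric(time(co2))),  # decimal year -> calendar year
  month   = as.numeric(cycle(co2)),        # position within the year: 1 to 12
  co2_ppm = as.numeric(co2)                # the measured values
)
head(co2_dated)
```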
But say we wanted to combine this time series with another covariate we might have, like the monthly global temperature anomaly.
We’ll read in a dataset from GitHub using the read_csv() function from the readr package (tidyverse), then add the CO2 values as a new column using mutate() from the dplyr package (tidyverse). I’ve already wrangled this data so that it covers the same months as the CO2 data and has the same number of rows, so the values line up row by row.
co2_covs<-read_csv("https://raw.githubusercontent.com/LGCarlson/Intro-to-Tidyverse/master/monthly_anomaly.csv") %>%
mutate(co2_ppm = co2_df$x)
## Parsed with column specification:
## cols(
## year = col_double(),
## month = col_double(),
## time = col_double(),
## anomaly = col_double()
## )
head(co2_covs)
## # A tibble: 6 x 5
## year month time anomaly co2_ppm
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1959 1 1959 0.16 315.
## 2 1959 2 1959. 0.09 316.
## 3 1959 3 1959. 0.21 316.
## 4 1959 4 1959. 0.12 318.
## 5 1959 5 1959. 0.06 318.
## 6 1959 6 1959. 0.11 318
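Adding `co2_df$x` with `mutate()` matches rows purely by position, so it works only because the two tables were prepared to line up exactly. A more defensive alternative is to join on shared key columns such as year and month. Because the anomaly file is only available online, the sketch below uses a stand-in second table (annual mean CO2 computed from `co2` itself) so it runs offline. With the real data, the equivalent call would be `merge(co2_dated, anomaly_data, by = c("year", "month"))`, where `anomaly_data` is a hypothetical name for the table read from GitHub; in tidyverse code, `dplyr::left_join()` does the same job.

```r
# Monthly CO2 with explicit year/month keys
co2_dated <- data.frame(
  year    = floor(as.numeric(time(co2))),
  month   = as.numeric(cycle(co2)),
  co2_ppm = as.numeric(co2)
)

# Stand-in covariate table keyed by year (annual mean CO2), for illustration only
annual <- aggregate(co2_ppm ~ year, data = co2_dated, FUN = mean)
names(annual)[names(annual) == "co2_ppm"] <- "annual_mean_ppm"

# Join on the shared key instead of relying on row order
co2_joined <- merge(co2_dated, annual, by = "year")
head(co2_joined)
```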
Now, we might want to see how the CO2 is related to the global anomaly.
ggplot(co2_covs, aes(x = co2_ppm, y = anomaly)) + geom_point() + geom_smooth(method = "lm")
Dataset: Topographic Information on Auckland’s Maunga Whau Volcano
Description: Maunga Whau (Mt Eden) is one of about 50 volcanoes in the Auckland volcanic field. This data set gives topographic information for Maunga Whau on a 10m by 10m grid.
You might also see data in a matrix, and for matrices it is useful to check the dimensions using the dim() function. For example, sea surface temperature summarized by latitude/longitude block is often stored as a matrix. Here is an example dataset from the datasets package:
class(datasets::volcano)
## [1] "matrix"
dim(volcano)
## [1] 87 61
Now we know this data is a matrix with 87 rows and 61 columns. (In R 4.0 and later, class() reports both “matrix” and “array”.) Let’s look at how data stored as a matrix is different from data stored in a data frame.
head(volcano)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 100 100 101 101 101 101 101 100 100 100 101 101 102 102
## [2,] 101 101 102 102 102 102 102 101 101 101 102 102 103 103
## [3,] 102 102 103 103 103 103 103 102 102 102 103 103 104 104
## [4,] 103 103 104 104 104 104 104 103 103 103 103 104 104 104
## [5,] 104 104 105 105 105 105 105 104 104 103 104 104 105 105
## [6,] 105 105 105 106 106 106 106 105 105 104 104 105 105 106
## [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
## [1,] 102 102 103 104 103 102 101 101 102 103 104 104
## [2,] 103 103 104 105 104 103 102 102 103 105 106 106
## [3,] 104 104 105 106 105 104 104 105 106 107 108 110
## [4,] 105 105 106 107 106 106 106 107 108 110 111 114
## [5,] 105 106 107 108 108 108 109 110 112 114 115 118
## [6,] 106 107 109 110 110 112 113 115 116 118 119 121
## [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
## [1,] 105 107 107 107 108 108 110 110 110 110 110 110
## [2,] 107 109 110 110 110 110 111 112 113 114 116 115
## [3,] 111 113 114 115 114 115 116 118 119 119 121 121
## [4,] 117 118 117 119 120 121 122 124 125 126 127 127
## [5,] 121 122 121 123 128 131 129 130 131 131 132 132
## [6,] 124 126 126 129 134 137 137 136 136 135 136 136
## [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
## [1,] 110 110 108 108 108 107 107 108 108 108 108 108
## [2,] 114 112 110 110 110 109 108 109 109 109 109 108
## [3,] 120 118 116 114 112 111 110 110 110 110 109 109
## [4,] 126 124 122 120 117 116 113 111 110 110 110 109
## [5,] 131 130 128 126 122 119 115 114 112 110 110 110
## [6,] 136 135 133 129 126 122 118 116 115 113 111 110
## [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61]
## [1,] 107 107 107 107 106 106 105 105 104 104 103
## [2,] 108 108 108 107 107 106 106 105 105 104 104
## [3,] 109 109 108 108 107 107 106 106 105 105 104
## [4,] 109 109 109 108 108 107 107 106 106 105 105
## [5,] 110 110 109 109 108 107 107 107 106 106 105
## [6,] 110 110 110 109 108 108 108 107 107 106 106
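The key difference is that a matrix holds a single type, while a data frame can hold a different type in each column. A short sketch of what that means in practice, and of what happens when `volcano` is converted to a data frame:

```r
# Mixing numbers and text in a matrix turns every cell into text
mixed <- matrix(c(1, 2, "a", "b"), nrow = 2)
mixed
typeof(mixed)        # "character"

# volcano contains only numbers, so it is a numeric matrix
is.numeric(volcano)  # TRUE

# as.data.frame() keeps the values but auto-names the columns V1, V2, ..., V61
volcano_df <- as.data.frame(volcano)
dim(volcano_df)          # 87 61
names(volcano_df)[1:3]   # "V1" "V2" "V3"
```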
Instead of the dollar-sign notation we used to select columns by name in a data frame, we’ll use square brackets, [row, column], to select a specific part of the matrix.
Here’s the first column of the data:
volcano[,1]
## [1] 100 101 102 103 104 105 105 106 107 108 109 110 110 111 114 116 118 120 120
## [20] 121 122 122 123 124 123 123 120 118 117 115 114 115 113 111 110 109 108 108
## [39] 107 107 107 108 109 110 111 111 112 113 113 114 115 115 114 113 112 111 111
## [58] 112 112 112 113 114 114 115 115 116 116 117 117 116 114 112 109 106 104 102
## [77] 101 100 100 99 99 99 99 98 98 97 97
Here’s the first row:
volcano[1,]
## [1] 100 100 101 101 101 101 101 100 100 100 101 101 102 102 102 102 103 104 103
## [20] 102 101 101 102 103 104 104 105 107 107 107 108 108 110 110 110 110 110 110
## [39] 110 110 108 108 108 107 107 108 108 108 108 108 107 107 107 107 106 106 105
## [58] 105 104 104 103
And here’s an individual cell (second row, third column)
volcano[2,3]
## [1] 102
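Brackets also accept ranges and logical conditions, and summary functions work on the whole matrix at once. A brief sketch:

```r
# Rows 1-3 of columns 1-4
volcano[1:3, 1:4]

# Highest value in the whole matrix
max(volcano)

# Row and column position(s) of that highest value
which(volcano == max(volcano), arr.ind = TRUE)

# Count cells meeting a condition, e.g. values above 150
sum(volcano > 150)
```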
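We can visualize the data using the plot.matrix package, which lets plot() draw a matrix as a heatmap (the package needs to be installed and loaded first). Now we’re seeing the volcano’s topography as a heatmap.
library(plot.matrix)
plot(volcano)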
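The heatmap from `plot(volcano)` relies on plot.matrix being loaded; without it, base `plot()` on a matrix draws a scatterplot of the first two columns instead. Base R also has functions designed for gridded data like this that need no extra package. A sketch:

```r
# Colour grid (heatmap) of elevation values
image(volcano, col = terrain.colors(50))

# Contour lines of equal elevation
contour(volcano)

# Filled contours with a colour key
filled.contour(volcano, color.palette = terrain.colors)
```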
Matrices can be handy, because you can apply a variety of simple arithmetic directly to them. Let’s create some matrices to test this with:
mat_A <- matrix(c(2,3,-2,1,2,2),3,2)
mat_B <- matrix(c(1,4,-2,1,2,1),3,2)
mat_A;mat_B
## [,1] [,2]
## [1,] 2 1
## [2,] 3 2
## [3,] -2 2
## [,1] [,2]
## [1,] 1 1
## [2,] 4 2
## [3,] -2 1
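Note that `matrix()` fills its values column by column, which is why `c(2,3,-2,1,2,2)` puts 2, 3, -2 in the first column of mat_A. A short sketch of the default, the `byrow = TRUE` alternative, and the argument names behind `matrix(..., 3, 2)`:

```r
# Default: values fill down the columns
m_col <- matrix(1:6, nrow = 3)
m_col

# byrow = TRUE fills across the rows instead
m_row <- matrix(1:6, nrow = 3, byrow = TRUE)
m_row

# matrix(c(2,3,-2,1,2,2),3,2) is shorthand for data, nrow and ncol
identical(matrix(data = c(2, 3, -2, 1, 2, 2), nrow = 3, ncol = 2),
          matrix(c(2, 3, -2, 1, 2, 2), 3, 2))   # TRUE
```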
Multiply a matrix by a scalar.
mat_A*3
## [,1] [,2]
## [1,] 6 3
## [2,] 9 6
## [3,] -6 6
Note: If you do arithmetic on (or otherwise alter) a matrix or any other object but don’t assign the result to an object, the change won’t be saved. You can test this by running “mat_A” again in the console to check its values. To save a modified matrix in your environment, assign it to a new object with <-. Here, we’ll take the log of the absolute value of mat_A*10 and save that as mat_C. As you can see, matrices can hold doubles (decimal values) as well as integers.
mat_C<-log(abs(mat_A*10))
mat_C
## [,1] [,2]
## [1,] 2.995732 2.302585
## [2,] 3.401197 2.995732
## [3,] 2.995732 2.995732
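A minimal sketch of the note above: printing a result does not change the original object, while assigning it with `<-` stores a new one. The last line checks the first cell of mat_C by hand.

```r
mat_A <- matrix(c(2, 3, -2, 1, 2, 2), 3, 2)

mat_A * 3     # the result is printed but not stored
mat_A[1, 1]   # still 2: mat_A itself is unchanged

mat_A3 <- mat_A * 3   # assignment stores the result as a new object
mat_A3[1, 1]          # 6

# First cell of mat_C: log(|2 * 10|) = log(20)
log(abs(mat_A[1, 1] * 10))   # 2.995732
```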
You can also do arithmetic on (such as add together) matrices of the same size.
mat_C + mat_B
## [,1] [,2]
## [1,] 3.9957323 3.302585
## [2,] 7.4011974 4.995732
## [3,] 0.9957323 3.995732
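Addition works cell by cell, and so does `*`. True matrix multiplication uses `%*%` and has a different size rule: the number of columns of the first matrix must equal the number of rows of the second. A sketch using mat_A and mat_B:

```r
mat_A <- matrix(c(2, 3, -2, 1, 2, 2), 3, 2)
mat_B <- matrix(c(1, 4, -2, 1, 2, 1), 3, 2)

# Element-wise product: same-size matrices, cell by cell
mat_A * mat_B

# Matrix product: mat_A %*% mat_B would fail (3x2 times 3x2 is non-conformable),
# but transposing mat_A with t() gives 2x3 times 3x2, a 2x2 result
t(mat_A) %*% mat_B
```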
We’ll continue with data wrangling, visualization, and exploration in our first R workshop!