Getting started in R

Here is some code to assist you in completing Assignment 2, “Getting started in R.”

Note: it’s probably better practice to type out the code rather than copy/paste. The hand-brain connection/muscle memory will make it easier for you to recall the various commands (unless, perhaps, you have a highly visual memory).

Let’s get started!

What version of R are you running?

R.Version ()
## $platform
## [1] "x86_64-apple-darwin17.0"
## 
## $arch
## [1] "x86_64"
## 
## $os
## [1] "darwin17.0"
## 
## $system
## [1] "x86_64, darwin17.0"
## 
## $status
## [1] ""
## 
## $major
## [1] "4"
## 
## $minor
## [1] "2.2"
## 
## $year
## [1] "2022"
## 
## $month
## [1] "10"
## 
## $day
## [1] "31"
## 
## $`svn rev`
## [1] "83211"
## 
## $language
## [1] "R"
## 
## $version.string
## [1] "R version 4.2.2 (2022-10-31)"
## 
## $nickname
## [1] "Innocent and Trusting"

As of this writing, the most recent version is 4.2.2, “Innocent and Trusting”; please upgrade if necessary so we’re all using the same version in this course. THE EXCEPTION would be if you are already using packages for your research that might be incompatible with the most recent version of R. If you are worried about that, let’s look into it before you make changes.
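If you only need the version number rather than the full list, base R offers shorter forms. This is a minimal sketch; the 4.2.2 threshold is simply the version named above.

```r
# One-line version string, the same text as the $version.string element
R.version.string

# getRversion() returns a version object that compares component by
# component, so "4.10.0" counts as newer than "4.2.2"
current <- getRversion()
is_current <- current >= "4.2.2"

if (is_current) {
  message("R ", as.character(current), " is at least 4.2.2")
} else {
  message("R ", as.character(current), " is older than 4.2.2; consider upgrading")
}
```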

What version of RStudio are you running?

Note: code that is specific to RStudio (rather than R) can’t be included directly in an R Notebook file (for more information, see here). To check your RStudio version, you can point/click on RStudio menu/About RStudio, or you can type “RStudio.Version()” directly into your R console.

If it’s earlier than 2022.12.0.353, “Elsbeth Geranium”, it would be good to upgrade RStudio as well.

What is the file path to the current working directory?

The “get working directory” command, getwd(), tells you where R is right now:

getwd()
## [1] "/Users/mvm16/Dropbox/Teaching/fishdataviz/Assignments/assignment2"

If you want to change it, you can use the RStudio menus (point/click) Session -> Set Working Directory, or use the code below (you’ll have to customize it for your own file path).

# setwd ("/Users/megan/Dropbox/Teaching/fishdataviz/assignment2/")
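setwd() stops with an error if the folder does not exist, so it can help to check first. A minimal sketch: tempdir() is only a stand-in for your own assignment folder, and the variable names are made up for illustration.

```r
# Hypothetical target folder; tempdir() stands in for your own folder path
target <- tempdir()

old_wd <- getwd()          # remember where R was

if (dir.exists(target)) {
  setwd(target)            # only move if the folder really exists
} else {
  message("Folder not found: ", target)
}

new_wd <- getwd()          # confirm where R is now
setwd(old_wd)              # go back, so this demo leaves nothing changed
```

A path that starts with "/" (or a drive letter on Windows) is absolute; a path without one is read relative to the current working directory.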

Bring in your data

Import the data you collected for Assignment 1. Note that you will have to customize this code for your data file name.

mus <- read.csv ("mcphee_mussel_data.csv", header = TRUE)

If you’re getting error messages, there is probably something wrong with your data format in the .csv file. Refer to the bottom of the Assignment 1 description for help with data formatting. If you’re stuck, ask for help in class or on the Slack #a_helpme channel.
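Another common cause of import errors is that R cannot find the file because it is not in the working directory. The sketch below writes a tiny made-up .csv to a temporary folder so that it runs anywhere; the file name and column names are invented for illustration, not taken from the assignment.

```r
# Made-up stand-in file so the example runs anywhere; use your own file name
csv_path <- file.path(tempdir(), "example_mussel_data.csv")
writeLines(c("Site,Length,Width",
             "A,52.1,24.3",
             "A,47.8,22.0",
             "B,60.4,28.9"),
           csv_path)

# Check that R can see the file before trying to read it
if (!file.exists(csv_path)) {
  stop("File not found; check the file name and getwd()")
}

example <- read.csv(csv_path, header = TRUE)

head(example)   # first few rows: do the columns look right?
```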

Now, check out the structure of your dataframe. Go to the “Environment” pane of RStudio and click on the blue tab next to [your_dataframe]; in my case, ‘mus’. This shows you the size of your dataframe (in my case, 40 observations of 7 variables = 40 rows and 7 columns). Take a screenshot of this window showing the identity of your variables (their names) and their attributes (i.e., are they integers, factors (and number of levels), characters, or numbers?).
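The same information can also be printed in the console, which is handy for double-checking what the Environment pane shows. A small sketch using a made-up dataframe in place of your own:

```r
# Made-up dataframe standing in for your own (e.g. 'mus')
example <- data.frame(
  Site   = factor(c("A", "A", "B")),
  Length = c(52.1, 47.8, 60.4),
  Width  = c(24.3, 22.0, 28.9)
)

dim(example)     # number of rows, then number of columns
names(example)   # variable (column) names
str(example)     # type of each variable, plus the first few values
```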

Copy the screenshot into your assignment doc. It should look something like this

Now, make some plots!

Here, we’ll use simple plotting commands in base R. We’ll move on to ggplot soon!

Let’s look at the distribution of your first quantitative (continuous) variable; mine is ‘Length’. More precisely, my variable is mus$Length, specified by the dataframe (mus) and column (Length).
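To see what the $ does, here is a small sketch with a made-up dataframe (the values are invented). It pulls one column out as a plain vector, and it shows that R is case-sensitive about column names.

```r
# Made-up dataframe standing in for your own
example <- data.frame(Length = c(52.1, 47.8, 60.4),
                      Width  = c(24.3, 22.0, 28.9))

# dataframe_name$column_name extracts one column as a vector
lengths_mm <- example$Length

# Double square brackets with the name in quotes do the same job
same <- identical(lengths_mm, example[["Length"]])

# Column names are case-sensitive: there is no column called 'length',
# so this returns NULL rather than the data
wrong_case <- example$length
```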

Boxplots (aka ‘box and whisker’ plots) are a nice way to summarize the distribution of your data. Here, we tell R to make a boxplot of the Length variable by specifying dataframe_name$column_name

boxplot (mus$Length)

This plot shows the median value (dark horizontal line), NOT the mean, and the box spans the 25th to 75th percentiles of the data (that is, the middle 50% of your data). Each whisker extends to the most extreme data point that lies within 1.5 x the interquartile range of the edge of the box (for normally distributed data, that limit sits roughly 2.7 standard deviations from the mean). Any point beyond the whiskers shows up as a single point, an ‘outlier’.
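The numbers behind a boxplot can be inspected with boxplot.stats(), which is the function boxplot() relies on. The values below are made up for illustration, with one deliberately large value.

```r
# Made-up lengths (mm); the last value is deliberately extreme
x <- c(41, 45, 47, 48, 50, 51, 52, 53, 55, 58, 80)

s <- boxplot.stats(x)   # default coef = 1.5, the same rule boxplot() uses

# Five numbers, in order: lower whisker, lower edge of box, median,
# upper edge of box, upper whisker
s$stats

# Points beyond the whiskers, drawn individually as 'outliers'
s$out

# The box edges are 'hinges', which can differ slightly from quantile()
box_height  <- s$stats[4] - s$stats[2]
upper_limit <- s$stats[4] + 1.5 * box_height   # whisker cannot pass this
```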

What’s missing from this plot? Axis labels! We have no idea what your boxplot is showing until you tell us. Here, we use ‘ylab’ to assign a label to the Y axis (spaces are allowed, because the label is within quotation marks).

boxplot (mus$Length, ylab = "Mussel length (mm)")

Repeat for your second quantitative variable:

boxplot (mus$Width, ylab = "Mussel width (mm)")

Is there any relationship between your two quantitative variables? XY scatterplots are good for that:

plot (x = mus$Length, y = mus$Width)

While you’re getting started, writing out “x=” and “y=” is not a bad idea, to help you remember. But you can produce the same graph with less typing:

plot (mus$Length, mus$Width)

Let’s tidy up the axis labels:

plot (mus$Length, mus$Width, xlab = "Mussel length (mm)", ylab = "Mussel width (mm)")
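Putting the pieces together: a self-contained sketch that draws the labelled scatterplot from made-up data and writes it to a PNG file, which can then be inserted into a document. The data values, file name, and image size are all invented for illustration.

```r
# Made-up data standing in for your own dataframe
example <- data.frame(Length = c(41, 45, 47, 50, 52, 55, 58),
                      Width  = c(19, 21, 22, 23, 25, 26, 28))

# Hypothetical output file in a temporary folder
png_path <- file.path(tempdir(), "length_vs_width.png")

png(png_path, width = 600, height = 450)   # open a PNG file as the plot device
plot(example$Length, example$Width,
     xlab = "Mussel length (mm)",
     ylab = "Mussel width (mm)")
dev.off()                                  # close the file so it is written out
```

Forgetting dev.off() is a common reason for an empty or unreadable image file.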

If you’re getting error messages, pay close attention to the location of your quotation marks and parentheses. Computer programs are militantly logical - they can’t “see what you mean” unless the code is correct and unambiguous.
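To see how one missing character changes things, the sketch below asks R to check a line of code that lacks its closing parenthesis. It uses parse(), which only checks whether text is valid R code without running it; the variable names are made up for illustration.

```r
# The closing parenthesis is missing at the end of this line of code
broken <- 'plot(mus$Length, mus$Width, xlab = "Mussel length (mm)"'

# Capture the error message instead of stopping
msg <- tryCatch(
  {
    parse(text = broken)
    "no syntax error"
  },
  error = function(e) conditionMessage(e)
)
msg   # R's complaint about the incomplete code

# Adding the missing ")" makes the same line valid
fixed <- paste0(broken, ")")
ok <- tryCatch(
  {
    parse(text = fixed)
    TRUE
  },
  error = function(e) FALSE
)
```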

That’s all for now! Use the code shown here to guide you in producing your own figures for Assignment 2. Submit via Canvas (do not email it to me). You can copy/paste your figures into a Word (or similar) doc, or, if you want to use R Markdown or an R Notebook, go for it.