Topics


Introductions

  • Nathan Byers and Kali Frost
  • Bios

What is R

R is many things…

  • R is a free, open-source language and environment for statistical analysis
  • It’s been very popular in academia for more than a decade

  • A statistical software package that can do simple or complex analyses, similar to SAS or S-PLUS, and that makes really great-looking graphs

Really cool graph made in R

  • R is not just a stats software, it’s a programming language too
    • A language allows the user to do a limitless number of tasks
  • R is open source, so users can share what they build with others
    • This sharing of information is centered around the concept of open source development
    • Open source means that everything is freely available (kind of a big deal if you are working under state budget restrictions)
  • R is becoming widely used in academia and government, and is making headway into industry, especially the biotech and finance sectors

R vs. Excel

Advantages of Excel
  • Easy to use
  • Familiar
  • Easy to sort and scroll through data
Disadvantages of Excel
  • Easy to make a mistake
  • Limits on data size are less of an issue than they used to be, but large datasets can still really bog down your computer
  • Difficult to perform a series of steps
  • Without very good documentation, it is difficult to describe the steps of an analysis
Disadvantages of R
  • Difficult to use at first
  • Figuring out errors can be difficult
  • Relies on the user finding answers via Google searches; there is no structured support
  • Your colleagues may not be familiar with it, so sharing data analyses with them can be difficult
Advantages of R
  • Because you can see all of your steps, it is much harder to make an error
  • If you know the right tricks, there is almost no limit on data size
  • Easily reproducible and repeatable
  • Active user community means lots of internet sources and rapid releases of new technology
When to Use Excel:
  • Doing a one-time analysis, small dataset, basic graphics
When to Use R:
  • Doing repeated analyses, lots of variables, advanced graphics
  • You will most likely begin as a hybrid user and migrate to R over time
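
To illustrate what "repeated analyses, lots of variables" can look like in R, here is a minimal sketch. It assumes the airquality example dataset that ships with R (it is not part of these slides), and the names columns and column_means are made up for illustration.

```r
# airquality is an example dataset included with R (an assumption for this sketch)
# Pick the columns we want to summarize
columns <- c("Ozone", "Solar.R", "Wind", "Temp")

# Compute the mean of each column in one step, ignoring missing values (NA)
column_means <- sapply(airquality[, columns], mean, na.rm = TRUE)
column_means
```

Because the steps are saved in a script, repeating the analysis on updated data means running the same lines again.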

R and RStudio

  • This section covers the two pieces of software you need to download
  • R is the core piece
  • RStudio is a nice integrated development environment (IDE) that makes it much easier to use R

R

  • To download R for Windows, see this page
  • If you open R itself, it will look very plain

RStudio

  • RStudio makes R a little more user friendly
  • It’s free and can be downloaded at rstudio.com
  • It’s not necessary to open RStudio to use R, but in these slides we will assume that RStudio is your interface to R

When you first open RStudio, this is what you see


  • The left panel is the console for R
  • Type 1 + 1 then hit “Enter” and R will return the answer

  • It’s a good idea to use a script so you can save your code
  • Open a new script by selecting “File” -> “New File” -> “R Script” and it will appear in the top left panel of RStudio

  • This is basically a text document that can be saved (go to “File” -> “Save As”)
  • You can type and run more than one line at a time by highlighting the lines and clicking the “Run” button on the script toolbar

  • The bottom right panel can be used to find and open files, view plots, load packages, and look at help pages
  • The top right panel gives you information about what variables you’re working with during your R session
  • We’ll explain more about what to look for in those panels later

R basics

Doing math
  • Open up a script if you haven’t already (“File” -> “New File” -> “R Script”)
  • Try some math by either typing the lines below or copying and pasting the lines into your script
10 + 5   # addition
10 - 5   # subtraction
10 * 5   # multiplication
10 / 5   # division
10 ^ 5   # exponent (10 to the power of 5)
  • Remember, to run the lines, highlight your code and click the “Run” button on the toolbar of the script panel
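
R follows the usual order of operations, and parentheses can be used to change it. The lines below are a small sketch building on the math examples above; sqrt() is a built-in R function that these slides have not introduced yet.

```r
# Multiplication happens before addition
10 + 5 * 2
## [1] 20

# Parentheses change the order of the calculation
(10 + 5) * 2
## [1] 30

# sqrt() is a built-in function that returns the square root
sqrt(25)
## [1] 5
```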
Creating objects
  • An object is used to store information in R
  • To create an object (also called a variable) in R, we use the assignment arrow, which points left: <-
  • Below, we create the variables x and y by assigning numbers to them
x <- 10
y <- 5
x + y
## [1] 15
(Above, the first three lines are what you run in your script; the line starting with ## is the output)
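
The result of a calculation can itself be stored in a new object. A small sketch, using the hypothetical name total (not from the original slides):

```r
x <- 10
y <- 5

# Store the sum of x and y in a new object
total <- x + y

# Typing the name of an object prints its value
total
## [1] 15
```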

In RStudio, you will see the variables we created in the top right panel

  • If you’ve already created a variable, you can replace the value with another value
x
## [1] 10
x <- 20
x
## [1] 20

Creating a variable

In the top right panel you can see that the number stored in the variable x has changed
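
One thing to keep in mind when replacing a value: objects that were calculated from the old value do not update on their own. A minimal sketch, with total as a hypothetical name:

```r
x <- 10
total <- x + 5   # total is calculated using the current value of x
x <- 20          # replace the value stored in x

total            # total still holds the old result
## [1] 15

total <- x + 5   # rerun the calculation to update total
total
## [1] 25
```

This is one reason to keep your code in a script: after changing a value, you can rerun the lines that depend on it.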