my_function <- function(x) {
  return(x * 2)
}
my_function(5) # Returns 10
Week 1: Getting Started with R
This week, we’ll lay the foundation for your journey with R by setting up the software, exploring the interface, and running basic commands.
For a detailed introduction to R, see the official manual: https://cran.r-project.org/doc/manuals/r-release/R-intro.html
Objectives
Understand what R is and its significance in biology.
Install R and RStudio on your computer.
Navigate the RStudio interface.
Write and execute basic R commands and scripts.
Perform simple biological calculations using R.
1. Introduction to R
What is R?
R is a programming language and software environment for statistical computing and graphics.
It is widely used in biology for data analysis, statistical modelling, and data visualisation.
Benefits of R in biology
Free and open-source.
Extensive packages for various biological analyses (genomics, ecology, epidemiology).
Supports reproducible research through scripting and automation.
Why Use R?
Handle large datasets efficiently.
Perform complex statistical analyses.
Create high-quality graphs and figures.
Foster collaboration by sharing code and methodologies.
How R works
R works by applying functions to objects.
Objects are pieces of information or data.
- There are many types of objects, e.g. vectors, matrices, arrays, lists and data.frames. We’ll learn more about them through the course.
Variables are names we give to objects.
Functions do something to the data.
R is very good at producing plots of data. We’ll learn about this as we progress.
There are thousands of packages for R which contain functions for particular jobs.
- Install packages using the install.packages() function.
The way to learn R is to think about what data (variables) you have, and what you want to do to it (functions).
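As a rough illustration of the object types listed above, the sketch below builds one of each and applies a function to it. The variable names and values (heights, m, sample_info, df) are made up for this example and are not part of the course data.

```r
# A vector: an ordered set of values of the same type
heights <- c(1.62, 1.75, 1.80)

# A matrix: values arranged in rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3)

# A list: a collection that can mix different types
sample_info <- list(species = "E. coli", replicates = 3)

# A data.frame: a table whose columns are vectors of equal length
df <- data.frame(id = 1:3, height = heights)

# Functions are applied to objects
mean(heights)       # average of the vector
dim(m)              # number of rows and columns of the matrix
names(sample_info)  # names of the list elements
nrow(df)            # number of rows in the data.frame
```

In each case the pattern is the same: an object is stored under a variable name, and a function is then applied to it.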
Data Types in R
R has several fundamental data types:
Numeric: Numbers with decimals (e.g., 3.14).
Integer: Whole numbers (e.g., 42L, where L forces an integer type).
Logical: Boolean values TRUE and FALSE.
Character: Text data (e.g., "Hello World").
Factor: Categorical data with levels.
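The types above can be checked with class(). This is a minimal sketch; the variable names and the factor levels ("control", "treated") are invented for illustration.

```r
num  <- 3.14           # numeric
int  <- 42L            # integer (the L suffix forces an integer)
flag <- TRUE           # logical
txt  <- "Hello World"  # character
grp  <- factor(c("control", "treated", "control"))  # factor with two levels

class(num)
class(int)
class(flag)
class(txt)
class(grp)
levels(grp)  # the distinct categories stored in the factor
```

Calling class() on an unfamiliar object is a quick way to find out which kind of data you are working with.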
Functions in R
Functions are reusable blocks of code. You can define a function using:
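The general pattern is a name, the function keyword, the arguments in parentheses, and a body in curly brackets. The sketch below uses a hypothetical function, celsius_to_kelvin, chosen only to show the pattern.

```r
# Hypothetical example: convert a temperature from Celsius to Kelvin
celsius_to_kelvin <- function(temp_c) {
  temp_k <- temp_c + 273.15  # the calculation happens in the body
  return(temp_k)             # the value handed back to the caller
}

celsius_to_kelvin(25)              # a single value
celsius_to_kelvin(c(0, 37, 100))   # works on a whole vector too
```

Because arithmetic in R is vectorised, the same function works on one number or on many without any changes.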
Packages in R
Packages extend R’s functionality. To install and load a package:
install.packages("ggplot2") # Install a package
library(ggplot2) # Load the package
Base R vs Tidyverse
Base R uses built-in functions like apply, lapply, tapply for data manipulation.
Tidyverse provides a consistent, easier-to-read approach (e.g., dplyr for data manipulation).
Example of filtering data in Base R vs Tidyverse:
# Base R
subset(mtcars, mpg > 20)
# Tidyverse
library(dplyr)
mtcars %>% filter(mpg > 20)
Base R Graphics vs ggplot2
Base R graphics: Uses functions like plot() and hist().
ggplot2: More flexible and layered approach. Plots are built up by setting aesthetics (what variables to plot) and then how to plot them.
Example:
# Base R
plot(mtcars$mpg, mtcars$hp, main="MPG vs HP")
# ggplot2
ggplot(mtcars, aes(x=mpg, y=hp)) + geom_point()
# BASIC R FUNCTIONALITY
# The set of numbers 11 to 20 is an object
11:20 # the : function generates a sequence
# Can also be generated from individual numbers
c(11,12,13,14,15,16,17,18,19,20)
# Ask what the function c does
help("c")
?c # You can also ask for help by typing ?
# Assign a variable name to the object using <-
x <- 11:20
# Ask what class of variable x is
class(x)
# Get basic information about x using the summary() function
summary(x)
# Ask how long x is
length(x)
# Extract the 5th value of x using square brackets
x[5]
# Extract the 5th and 9th values
x[c(5,9)]
# Manipulate x using a function
mean(x) # Calculate the mean
# Write your own function
mymean <- function(x){
  S <- sum(x) # add all values in x together
  L <- length(x) # calculate the length of x
  M <- S/L # divide the sum by the length
  return(M) # give the answer
}
# Apply the function to the variable
mymean(x)
# Generate a plot using the plot() function
plot(x) # plots x against the index
# Install a new package (you may get additional prompts about this)
install.packages("MASS", repos = "https://www.stats.bris.ac.uk/R/")
# Load the package so you can use its functions
library(MASS)
2. Setting Up R and RStudio
Installation Guide
- Install R:
Download from the Comprehensive R Archive Network (CRAN) https://www.cran.r-project.org
Choose the version suitable for your operating system (Windows, macOS, Linux).
If you previously installed R, check that you have the newest version by typing:
# Check your R version
R.Version()
- Install RStudio:
- Download the free RStudio Desktop from the Posit (formerly RStudio) website: https://posit.co/products/open-source/rstudio/
Launching RStudio
Open RStudio after installation.
RStudio provides a user-friendly interface for working with R.
Google Colab
Colab (“Colaboratory”) is a Google facility for online programming in Python and R.
https://colab.research.google.com/
You’ll need a Google account to use Colab.
Colab allows you to create documents with both text and code sections.
Click on File then select New notebook in Drive.
Colab uses Python by default, but you can switch to R by clicking on Connect (top right of screen) > Change runtime type > Runtime type and selecting R.
You can then start writing your code and run it by clicking the arrow next to each code section, or by clicking on Runtime and selecting the option you need.
If you’d prefer to use Colab rather than RStudio, please ask the lecturer and they will get you started.
4. Writing Your First R Commands
Basic Arithmetic Operations
Addition: 2 + 3 ➔ 5
Subtraction: 7 - 2 ➔ 5
Multiplication: 4 * 3 ➔ 12
Division: 10 / 2 ➔ 5
Exponentiation: 2^3 ➔ 8
Modulus (Remainder): 7 %% 3 ➔ 1
Example: Simple Calculations
Here is some R code. Comments within the code are written using the hash symbol (#). Anything after a hash on the same line is ignored:
# Try these simple calculations
2+3
7-2
4*3
10/2
2^3
7%%3
5. Working with Variables
Variables are names we give (assign) to particular objects.
Assignment Operators:
<- (preferred in R)
= (can be used, but not recommended)
Creating Variables
# Calculate the area of a circle with radius 5 units
radius <- 5
area <- pi * radius^2
area # Output: 78.53982, or
print(area)
Variable Naming Rules
Names can include letters, numbers, periods, and underscores.
Must start with a letter or a period (not followed by a number).
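Case-sensitive (Height and height are different).
Some names are reserved and cannot be reassigned, e.g. TRUE and FALSE. T and F are predefined shorthands for them and are best avoided as variable names.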
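The naming rules above can be tried out directly. This is a small sketch with made-up variable names. It also uses the base R function make.names(), which shows how R would repair a name that breaks the rules.

```r
# Names are case-sensitive: these are two separate variables
Height <- 180
height <- 1.8
Height == height   # compares the two values

# Valid names: letters, numbers, periods and underscores
body_mass_g   <- 25.4
sample2       <- "liver"
.hidden_value <- 0

# A name starting with a number is not valid;
# make.names() returns a corrected version of it
make.names("1st_sample")
```

If R reports an "unexpected symbol" error when you create a variable, an invalid name is one of the first things worth checking.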
6. Writing and Saving Scripts
Creating a New Script
Click on File > New File > R Script.
A new script editor window appears in the Source Editor panel.
Writing Code in Scripts
Type your R code in the script editor.
Save your script with a meaningful name, e.g.,
week1_intro.R.
Running Code from Scripts
Run Line: Place the cursor on a line and click Run or press Ctrl + Enter (Cmd + Enter on macOS).
Run Selection: Highlight code and click Run.
Run Entire Script: Click Source or use Ctrl + Shift + S.
Saving and Loading
R looks for data and scripts in its current working directory, which by default is usually your home directory.
You can set a different working directory using setwd("/some/directory") (the form of the directory address will depend on whether you’re using Mac, Windows or Linux).
You can check which directory R is looking at by running getwd().
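A minimal sketch of checking and changing the working directory is shown below. It switches to R's temporary directory, tempdir(), purely so the example runs on any computer. In practice you would use the folder holding your own scripts and data.

```r
old_dir <- getwd()   # remember the current working directory
old_dir

setwd(tempdir())     # switch to a temporary directory (illustration only)
getwd()              # confirm where R is now looking
list.files()         # list the files R can see in this directory

setwd(old_dir)       # switch back to where we started
```

Storing the original directory first makes it easy to return to it afterwards.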
7. Simple Biological Calculations in R
Example 1: Exponential Growth
Calculate the number of bacteria after a certain time, given an initial count and a growth rate.
\[ x_t = x_0 e^{kt} \]
where \(x_t\) is the population size at time \(t\), \(x_0\) is the starting population size, \(e\) is Euler’s number (approximately 2.71828), and \(k\) is the growth constant.
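As a sketch of how this formula translates into R, the hypothetical helper exp_growth() below evaluates it for several time points at once. The function name and the illustrative values (a starting size of 1000 and a growth constant of 0.25) are assumptions made for this example. The doubling time follows from setting \(x_t = 2x_0\), which gives \(t_d = \ln(2)/k\).

```r
# Hypothetical helper implementing x_t = x_0 * exp(k * t)
exp_growth <- function(x0, k, t) {
  x0 * exp(k * t)
}

# Evaluate the population size at several time points at once
times <- c(0, 2, 4, 8)
sizes <- exp_growth(x0 = 1000, k = 0.25, t = times)
sizes

# Doubling time: solve 2 * x0 = x0 * exp(k * t_d) for t_d
doubling_time <- log(2) / 0.25
doubling_time
```

At time 0 the function returns the starting population, which is a quick check that the formula has been typed correctly.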
# Parameters
initial_count <- 1000 # Initial number of bacteria
growth_rate <- 0.25 # Growth constant
time_hours <- 8 # Total time in hours
# Calculation
final_count <- initial_count * exp(growth_rate * time_hours)
final_count # View result
Example 2: Molar Concentration Calculation
Calculate the molarity of a solution.
# Parameters
mass <- 5 # grams of solute
molecular_weight <- 58.44 # g/mol (e.g., NaCl)
volume <- 0.5 # litres
# Calculations
moles <- mass / molecular_weight
molarity <- moles / volume
molarity # Output: 0.1711 mol/L
8. Commenting and Best Practices
Comments start with # and are not executed.
# This is a comment
Why Comment?
Explain the purpose of code blocks.
Make code easier to understand and maintain.
Best Practices:
Use clear and descriptive variable names.
Write code in small, testable chunks.
Regularly save your work and back it up online.
9. Errors and Debugging
All computing languages have strict rules. If your code can’t be interpreted by the computer, you’ll likely get an error or strange behaviour.
Try the following:
# Some incorrect code generates an error
print(V1)
# Some incorrect code causes unwanted behaviour (here the closing quotation mark is missing, so R waits for more input)
print("V1)
Be very careful and meticulous when writing code, checking lines as you go and evaluating outputs to ensure they make sense.
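The two snippets above go wrong for different reasons: the first refers to an object that has not been created, and the second has an unclosed quotation mark. A corrected version might look like the sketch below, assuming V1 is meant to hold a number (the value 3 is chosen arbitrarily).

```r
V1 <- 3        # create the object before using it
print(V1)      # prints the value stored in V1
print("V1")    # prints the text V1; both quotation marks are present
exists("V1")   # checks whether an object with this name exists
```

If the console shows a + prompt instead of >, R is waiting for you to finish an incomplete command. Press Esc to cancel and start again.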
10. Getting Help
There are many ways to get help with R coding!
Every R function has a help file. Try the following:
Type help(sum) into the console.
Type ?sum into the console.
Type help.search("sum") into the console.
In each case, help files will appear in the bottom right-hand panel.
Read the online documentation at https://cran.r-project.org/
Ask an AI (e.g. ChatGPT).
Ask your lecturer!
Search online (e.g. Google) for specific queries (best if you have exhausted other methods of getting help).
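Alongside the help calls above, a few other base R functions can be useful when you are not sure how a function works or what it is called. The sketch below is only a sample of what is available.

```r
example(sum)      # run the examples from the help file for sum()
args(sum)         # show the arguments that sum() accepts
apropos("mean")   # list objects whose names contain "mean"
```

apropos() is handy when you remember only part of a function's name.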
Practical Activities
Activity 1: Install R and RStudio
Follow the installation guide provided.
Verify the installation by opening RStudio.
Activity 2: Explore the RStudio Interface
Identify each panel and its purpose.
Adjust the layout if desired via View > Panes > Pane Layout.
Activity 3: Run Basic Commands
Try some of the code in the examples above.
Experiment with code, changing some of the values for example.
Activity 4: Write and Save a Script
Create a new script file.
Use commenting (#) to include a title.
Include the calculations from Activity 3.
Save the script as biological_calculations.R. Choose a convenient online location for saving your script, so that you can access it easily when needed.
Activity 5: Take the Introductory Quiz
- Try to answer the quiz questions below.
Quiz
Test your understanding of today’s material. Answer the following questions.
- Which of the following is the preferred way to assign the value 42 to a variable named x in R?
42 -> x
x <- 42
x = 42
Any of these.
- What will be the output of the following code?
a <- 10
b <- 3
result <- a ^ b
result
1000
0.3333
30
- In RStudio, where do you type commands to be executed immediately without saving them?
Source Editor
Console
Environment Pane
Plot Pane
- Which symbol is used to comment out a line of code in R?
//
#
/* ... */
--
- Write R code to calculate the kinetic energy (KE) of an object with mass m = 12 kg and velocity v = 3 m/s. Use the formula \(E = \frac{1}{2}mv^2\).
- True or False: In R, variable names are case-sensitive, i.e. v1 <- 3 is not the same as V1 <- 3.
- What function would you use to display a summary of an object, such as a variable or dataset, in R?
str()
summary()
print()
structure()
- What does the function rm() do in R?
Generates a random number.
Removes objects from the R environment.
Reads the memory.
Calculates the ratio of two numbers.
- Which of the following is NOT a valid variable name in R?
2ndVar
first_var
.temp
data3
- Write R code to generate a sequence (vector) of 100 random numbers drawn from a Normal distribution with mean 0 and standard deviation 1, then plot a histogram of the values in the vector.
Resources for Further Learning
R for Data Science by Hadley Wickham & Garrett Grolemund: https://r4ds.had.co.nz/
Swirl Package: Learn R programming interactively at your own pace.
- Install and run Swirl:
# Install and run interactive tutorials
install.packages("swirl", repos = "https://www.stats.bris.ac.uk/R/")
library(swirl)
swirl()