1 Workshop Objectives

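Welcome to the Mastering R for Research Workshop. This workshop is designed to help you unlock your research potential by mastering R. You will learn how to install R and RStudio, write basic R syntax, work with R's core data structures, read and write common file formats, manipulate data, and create visualizations.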

2 What is R?

R is a language and environment for statistical computing and data visualization (see http://www.r-project.org/about.html). It is free and open source, meaning anyone can use, modify, and distribute it. Add-on packages extend R with functions for reading input, running analyses, visualizing output, and transforming results for further use.

3 Chapter 1: How to Install R

R can be installed on various operating systems. This chapter provides instructions for installing R on Windows, macOS, and Linux. It also includes brief guidance on installing RStudio.

3.1 Installing R on Windows and macOS

3.1.1 Visit CRAN:

Go to the Comprehensive R Archive Network (CRAN) website: https://cran.r-project.org/

3.1.2 Download R for Your Operating System:

  • On Windows, click “Download R for Windows” (make sure the file you download is an “.exe” file).
  • On macOS, click “Download R for macOS” (make sure the file you download is a “.pkg” file).

3.1.3 Run the Installer:

Double-click the downloaded file and follow the installation instructions. You can usually accept the default settings.

3.1.4 Verify Installation:

Once installed, open the R GUI (or RStudio if you install that too) and type:

version

3.2 Installing R on Linux

For Ubuntu/Debian-based systems, follow these steps in your terminal:

# Update your package list and install prerequisites
sudo apt update
sudo apt install --no-install-recommends software-properties-common dirmngr

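# Add the CRAN signing key so apt can verify the packages
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc

# Add the CRAN repository (replace 'focal' with your Ubuntu release if necessary)
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'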

# Update package lists again and install R
sudo apt update
sudo apt install r-base

# Verify the installation by checking the version
R --version

4 Introduction to RStudio

RStudio is an integrated development environment (IDE) for R. It provides a user-friendly interface that makes writing and running R scripts easier.

4.1 Key Features of RStudio

RStudio has several key components that enhance the user experience:

  • Source Pane: This is where you write and edit R scripts.
  • Console: The console allows you to execute R commands interactively.
  • Environment/History Pane: Displays variables and functions created during the session.
  • Plots/Files/Packages/Help Pane: Allows access to plots, file management, installed packages, and help documentation.

4.2 Installing RStudio

Download the RStudio Desktop installer from RStudio’s website and run it. Install R first, because RStudio relies on an existing R installation.

4.3 Basic Usage

4.3.1 Running Code

You can run R code in RStudio using the following methods:

  • Type code in the Console and press Enter.
  • Write code in the Source Pane and run it by pressing Ctrl + Enter (Windows) or Cmd + Enter (Mac).

4.3.2 Example: Running a Simple Calculation

# Adding two numbers
2 + 3

4.3.3 RStudio Workspace

  • Use ls() to list all objects in your environment.
  • Use rm(object_name) to remove an object.
  • Use rm(list = ls()) to clear the entire environment.

# Listing objects in the environment
ls()
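
The workspace commands above can be sketched as follows. The object names tmp_scores and tmp_label are hypothetical and exist only for this demonstration:

```r
# Hypothetical objects, created only for this demonstration
tmp_scores <- c(90, 85, 88)
tmp_label <- "demo"

# ls() returns the object names as a character vector
c("tmp_scores", "tmp_label") %in% ls()

# Remove one object by name, then confirm it is gone
rm(tmp_scores)
scores_still_there <- "tmp_scores" %in% ls()
scores_still_there

# tmp_label is untouched until it is removed as well
label_still_there <- "tmp_label" %in% ls()
label_still_there
```

rm(list = ls()) is left out of the sketch on purpose, because it would also delete every other object in your session.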

4.4 Customizing RStudio

You can customize RStudio to suit your preferences:

  • Change the theme: Go to Tools > Global Options > Appearance.
  • Modify code editing settings: Tools > Global Options > Code.
  • Set a default working directory: Tools > Global Options > General.

4.5 Summary

RStudio is a powerful tool for writing and executing R code efficiently. Understanding its features and functionalities will help you work effectively with R for research and data analysis.

5 R Syntax

Understanding the basic syntax of R is essential for writing effective scripts. R is case-sensitive and follows a simple, readable syntax.

5.1 Assigning Values

In R, values can be assigned to variables using <- or =.

x <- 10  # Assigning 10 to x
y = 20    # Assigning 20 to y
x + y     # Summing x and y
## [1] 30

5.2 Data Types in R

R has several basic data types:

  • Numeric: Real numbers, stored as doubles by default (e.g., 10.5, 2.3, 5)
  • Integer: Whole numbers written with an L suffix (e.g., 1L, 5L)
  • Character: Text strings (e.g., "Hello")
  • Logical: Boolean values (TRUE, FALSE)

a <- 5       # Numeric
b <- 2L      # Integer
c <- "R"     # Character
d <- TRUE    # Logical
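
To check which type a value has, or to convert between types, base R offers class(), the is.*() checks, and the as.*() converters. A short sketch using the same four variables:

```r
a <- 5
b <- 2L
c <- "R"
d <- TRUE

# class() reports the data type of each value
class(a)   # "numeric"
class(b)   # "integer"
class(c)   # "character"
class(d)   # "logical"

# Type checks return TRUE or FALSE
is.numeric(b)     # TRUE: integers also count as numeric
is.character(a)   # FALSE

# Explicit conversion between types
as.integer("42")  # 42
as.character(a)   # "5"
```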

5.3 Conditional Statements

R supports conditional statements with if and else. It also provides ifelse(), a function that applies a condition to every element of a vector.

x <- 15
if (x > 10) {
  print("x is greater than 10")
} else {
  print("x is 10 or less")
}
## [1] "x is greater than 10"
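
An if statement tests a single condition. To test every element of a vector at once, ifelse() can be used instead. A minimal sketch, where scores and size_labels are hypothetical names chosen for this example:

```r
# Hypothetical input vector
scores <- c(8, 15, 10, 22)

# ifelse(test, yes, no) returns one result per element of the test
size_labels <- ifelse(scores > 10, "greater than 10", "10 or less")

# 8 and 10 map to "10 or less"; 15 and 22 map to "greater than 10"
size_labels
```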

5.4 Loops in R

Loops help execute repetitive tasks efficiently.

5.4.1 For Loop

for (i in 1:5) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

5.4.2 While Loop

count <- 1
while (count <= 5) {
  print(count)
  count <- count + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
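
Loops often walk over the elements of a vector rather than a fixed range. The sketch below uses seq_along() to generate the index positions and keeps a running total; values and running_total are hypothetical names:

```r
# Hypothetical data to iterate over
values <- c(4, 8, 15)
running_total <- 0

# seq_along(values) yields 1, 2, 3: one index per element
for (i in seq_along(values)) {
  running_total <- running_total + values[i]
  cat("After element", i, "the total is", running_total, "\n")
}

running_total  # 27
```

For a plain sum, the built-in sum(values) gives the same result without a loop.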

5.5 Functions in R

Functions allow modular programming and reuse of code.

add_numbers <- function(a, b) {
  return(a + b)
}

add_numbers(5, 3)
## [1] 8
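
Function arguments can have default values, and callers can pass arguments by name. The following is a small sketch; power_of() is a hypothetical helper written for this example:

```r
# power_of() is a hypothetical helper, not part of base R
power_of <- function(base, exponent = 2) {
  # The value of the last expression is returned automatically
  base^exponent
}

power_of(3)                        # 9: exponent falls back to its default
power_of(3, exponent = 3)          # 27
power_of(exponent = 2, base = 5)   # 25: named arguments can come in any order
```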

5.6 Interactive Exercise

Try the following exercises to practice R syntax:

5.6.1 Exercise 1: Assign and Print Variables

5.6.1.1 Question

Modify the code below to multiply a and b instead of adding them.

# Assign values to variables
a <- 7
b <- 3

# Print the sum of a and b
print(a + b)

Answer:

# Assign values to variables
a <- 7
b <- 3

# Multiply a and b
print(a * b)
## [1] 21

5.6.2 Exercise 2: Conditional Statements

5.6.2.1 Question

Write an if-else statement to check if x is greater than 20.

# Define a variable x

# Write an if-else statement to check if x is greater than 20

Answer:

# Define a variable x
x <- 25

# Write an if-else statement to check if x is greater than 20
if (x > 20) {
  print("x is greater than 20")
} else {
  print("x is 20 or less")
}
## [1] "x is greater than 20"

5.6.3 Exercise 3: Loops

5.6.3.1 Question

Write a for loop to print numbers from 1 to 10.

# Write a for loop to print numbers from 1 to 10

Modify the loop to print only even numbers between 1 and 10.

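Answer:

# Print numbers from 1 to 10
for (i in 1:10) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

# Print only even numbers between 1 and 10
for (i in seq(2, 10, by = 2)) {
  print(i)
}
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10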

5.7 Summary

Mastering R syntax is the foundation for effective programming in R. Understanding variable assignment, data types, conditional statements, loops, and functions is key to becoming proficient in R programming.

6 Data Structures in R

R provides several fundamental data structures for handling data. Understanding these structures is essential for efficient data manipulation.

6.1 Vectors

Vectors are the simplest data structure in R. All elements of a vector must have the same type.

numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE)
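
A short sketch of common vector operations: indexing, element-wise arithmetic, and what happens when types are mixed. The name mixed is hypothetical:

```r
numeric_vector <- c(1, 2, 3, 4, 5)

# Indexing starts at 1, not 0
numeric_vector[2]                    # 2
numeric_vector[c(1, 3)]              # 1 3
numeric_vector[numeric_vector > 3]   # 4 5

# Arithmetic is applied to every element
numeric_vector * 10                  # 10 20 30 40 50
length(numeric_vector)               # 5

# Mixing types forces everything to one common type
mixed <- c(1, "apple", TRUE)
class(mixed)                         # "character"
```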

6.2 Matrices

Matrices are two-dimensional arrays that contain elements of the same type.

matrix_example <- matrix(1:9, nrow=3, ncol=3)
matrix_example
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
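
Matrix elements are addressed as [row, column], and matrix() fills column by column unless byrow = TRUE is set. A brief sketch, with by_row as a hypothetical name:

```r
matrix_example <- matrix(1:9, nrow = 3, ncol = 3)

dim(matrix_example)     # 3 3 (rows, columns)
matrix_example[2, 3]    # 8: row 2, column 3
matrix_example[1, ]     # 1 4 7: the whole first row
matrix_example[, 2]     # 4 5 6: the whole second column

# Fill row by row instead of column by column
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
by_row[1, ]             # 1 2 3

# Transposing the column-filled matrix gives the row-filled one
all(t(matrix_example) == by_row)   # TRUE
```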

6.3 Lists

Lists can hold elements of different types, including vectors, matrices, and even other lists.

list_example <- list(name = "John", age = 25, scores = c(90, 85, 88))
list_example
## $name
## [1] "John"
## 
## $age
## [1] 25
## 
## $scores
## [1] 90 85 88
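
List elements can be reached by name with $ or with double brackets. The sketch below reuses list_example; the passed element is a hypothetical addition:

```r
list_example <- list(name = "John", age = 25, scores = c(90, 85, 88))

# $ and [[ ]] both return the element itself
list_example$name          # "John"
list_example[["age"]]      # 25
list_example$scores[2]     # 85

# Single brackets return a smaller list, not the element
class(list_example["age"]) # "list"

# Add a new element by assigning to a new name
list_example$passed <- TRUE
names(list_example)        # "name" "age" "scores" "passed"
```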

6.4 Data Frames

Data frames are table-like structures where each column can hold a different type of data, while all values within one column share the same type.

df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(90, 85))
df
##    Name Age Score
## 1 Alice  25    90
## 2   Bob  30    85
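
A minimal sketch of inspecting and subsetting the data frame above. The Passed column and the threshold of 88 are hypothetical, added only to show how a new column is created:

```r
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(90, 85))

nrow(df)          # 2 rows
ncol(df)          # 3 columns
df$Age            # 25 30: one column as a vector

# Keep only the rows where Score is above 85
high_scores <- df[df$Score > 85, ]
high_scores$Name  # "Alice"

# Add a column computed from an existing one
df$Passed <- df$Score >= 88
df$Passed         # TRUE FALSE
```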

6.5 Interactive Exercise

6.5.1 Exercise 1: Create a Vector

6.5.1.1 Question

Create a numeric vector containing the numbers 10, 20, 30, 40, and 50.

Answer:

numeric_vector <- c(10, 20, 30, 40, 50)
numeric_vector

6.5.2 Exercise 2: Create a Data Frame

6.5.2.1 Question

Create a data frame with two columns: “Student” (names: “Alice”, “Bob”, “Charlie”) and “Score” (values: 85, 90, 78).

Answer:

df <- data.frame(Student = c("Alice", "Bob", "Charlie"), Score = c(85, 90, 78))
df

6.6 Summary

Understanding R’s data structures is crucial for effective data analysis. Vectors, matrices, lists, and data frames each serve different purposes in handling and organizing data.

7 Data Structures in R - 2

This chapter moves from the data structures themselves to the files they are stored in: how to read data into R and how to write it back out.

7.1 Data Types and File Formats in R

R supports multiple data file formats. Below are some common formats and how to handle them.

7.1.1 CSV Files

CSV (Comma-Separated Values) files are one of the most commonly used data formats.

Reading CSV Files:

Base R includes the built-in read.csv() function. The example here uses read_csv() from the readr package instead.

library(readr)
data <- read_csv("/data/bakeoff.csv")
head(data)

Writing CSV Files:

write_csv(data, "/results/output.csv")
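
The same round trip also works with base R alone, using write.csv() and read.csv(). The sketch below writes to a temporary file so that it runs anywhere; scores and csv_path are hypothetical names:

```r
# Hypothetical data, used only for this round trip
scores <- data.frame(Student = c("Alice", "Bob"), Score = c(85, 90))

# tempfile() gives a throwaway path, so no real file is overwritten
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row numbers out of the file
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back into a new data frame
scores_back <- read.csv(csv_path)
scores_back
```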

7.1.2 Excel Files (.xlsx)

To handle Excel files, use the readxl package for reading and the writexl package for writing.

Reading Excel Files:

library(readxl)
data <- read_excel("/latitude.xlsx", sheet = 1)
head(data)

Writing Excel Files:

library(writexl)
write_xlsx(data, "/results/output.xlsx")

7.1.3 RDS Files

An RDS file stores a single R object, including its type and attributes, in a compact binary format.

Loading an R Object:

data <- readRDS("/data/inventory_parts.rds")

Saving an R Object:

saveRDS(data, "/results/output.rds")
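
A self-contained sketch of an RDS round trip. It uses a temporary file and a hypothetical inventory object, and checks that the restored copy matches the original:

```r
# Hypothetical object to store
inventory <- list(part = "bolt", counts = c(10L, 25L))

# Write to a throwaway path
rds_path <- tempfile(fileext = ".rds")
saveRDS(inventory, rds_path)

# readRDS() returns the object, so it can be assigned to any name
restored <- readRDS(rds_path)
identical(inventory, restored)   # TRUE
```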

7.1.4 RData Files

RData files need no extra package. The format is native to R: one file can hold several objects, and it preserves their names and types.

Reading RData Files:

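# load() restores the saved objects under their original names
# and returns those names as a character vector
loaded_names <- load("/data/wine.RData")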

Writing RData Files:

save(data, file = "/results/output.RData")  # the file argument must be named
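
Unlike readRDS(), load() puts the objects back under the names they were saved with. A sketch with a temporary file and two hypothetical objects:

```r
# Hypothetical objects to store together in one file
wine_scores <- c(88, 92)
wine_label <- "demo"

rdata_path <- tempfile(fileext = ".RData")
save(wine_scores, wine_label, file = rdata_path)

# Remove the originals so the restore is visible
rm(wine_scores, wine_label)

# load() recreates both objects and returns their names
loaded_names <- load(rdata_path)
loaded_names
wine_scores   # 88 92
```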

7.2 Summary

R provides multiple ways to handle data formats, including CSV, Excel, RDS, and RData files. Using the appropriate function or package for each format ensures efficient data handling and manipulation.

8 Data Manipulation in R

Data manipulation is a crucial part of data analysis. R provides various packages to facilitate efficient data handling, including dplyr, tidyr, and data.table.

8.1 dplyr: Data Manipulation

The dplyr package provides functions for filtering, selecting, mutating, and summarizing data.

8.1.1 Filtering Data

library(dplyr)
data <- mtcars
filtered_data <- data %>% filter(mpg > 20)
head(filtered_data)
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2

8.1.2 Selecting Columns

selected_data <- data %>% select(mpg, hp, wt)
head(selected_data)
##                    mpg  hp    wt
## Mazda RX4         21.0 110 2.620
## Mazda RX4 Wag     21.0 110 2.875
## Datsun 710        22.8  93 2.320
## Hornet 4 Drive    21.4 110 3.215
## Hornet Sportabout 18.7 175 3.440
## Valiant           18.1 105 3.460

8.1.3 Creating New Columns (Mutate)

data <- data %>% mutate(power_to_weight = hp / wt,
                        abc = mpg / wt)
head(data)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
##                   power_to_weight      abc
## Mazda RX4                41.98473 8.015267
## Mazda RX4 Wag            38.26087 7.304348
## Datsun 710               40.08621 9.827586
## Hornet 4 Drive           34.21462 6.656299
## Hornet Sportabout        50.87209 5.436047
## Valiant                  30.34682 5.231214

8.1.4 Summarizing Data

data_summary <- data %>% summarize(avg_mpg = mean(mpg))
data_summary
##    avg_mpg
## 1 20.09062
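
summarize() becomes more useful when combined with group_by(), which produces one summary row per group. A minimal sketch, assuming dplyr is installed; mpg_by_cyl and n_cars are hypothetical names:

```r
library(dplyr)

# One row per cylinder count, with the mean mpg and the number of cars
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg), n_cars = n())

mpg_by_cyl
```

The result has three rows, one each for 4, 6, and 8 cylinders.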

8.1.5 tidyr: Data Tidying

library(tidyr)
data_long <- data %>% gather(key = "attribute", value = "value", mpg:hp)
head(data_long)
##   drat    wt  qsec vs am gear carb power_to_weight      abc attribute value
## 1 3.90 2.620 16.46  0  1    4    4        41.98473 8.015267       mpg  21.0
## 2 3.90 2.875 17.02  0  1    4    4        38.26087 7.304348       mpg  21.0
## 3 3.85 2.320 18.61  1  1    4    1        40.08621 9.827586       mpg  22.8
## 4 3.08 3.215 19.44  1  0    3    1        34.21462 6.656299       mpg  21.4
## 5 3.15 3.440 17.02  0  0    3    2        50.87209 5.436047       mpg  18.7
## 6 2.76 3.460 20.22  1  0    3    1        30.34682 5.231214       mpg  18.1

# None of the attribute values contain "_", so Detail is filled with NA
data_separated <- data_long %>% separate(col = attribute, into = c("Type", "Detail"), sep = "_")
head(data_separated)
##   drat    wt  qsec vs am gear carb power_to_weight      abc Type Detail value
## 1 3.90 2.620 16.46  0  1    4    4        41.98473 8.015267  mpg   <NA>  21.0
## 2 3.90 2.875 17.02  0  1    4    4        38.26087 7.304348  mpg   <NA>  21.0
## 3 3.85 2.320 18.61  1  1    4    1        40.08621 9.827586  mpg   <NA>  22.8
## 4 3.08 3.215 19.44  1  0    3    1        34.21462 6.656299  mpg   <NA>  21.4
## 5 3.15 3.440 17.02  0  0    3    2        50.87209 5.436047  mpg   <NA>  18.7
## 6 2.76 3.460 20.22  1  0    3    1        30.34682 5.231214  mpg   <NA>  18.1
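
Recent versions of tidyr mark gather() as superseded and recommend pivot_longer() for the same reshaping. The sketch below assumes dplyr and tidyr are installed. It also shows separate() on a column that really does contain underscores; the readings data frame is hypothetical:

```r
library(dplyr)
library(tidyr)

# Reshape four columns of mtcars into attribute/value pairs
cars_long <- mtcars %>%
  select(mpg, cyl, disp, hp) %>%
  pivot_longer(cols = mpg:hp, names_to = "attribute", values_to = "value")

dim(cars_long)   # 128 2: 32 cars times 4 attributes

# Hypothetical data whose id column has the form Type_Detail
readings <- data.frame(id = c("engine_hp", "engine_disp"), value = c(110, 160))
parts <- separate(readings, col = id, into = c("Type", "Detail"), sep = "_")
parts
```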

8.1.6 data.table: Fast Data Processing

library(data.table)
dt <- as.data.table(mtcars)
dt[, .(avg_mpg = mean(mpg)), by = cyl]
##      cyl  avg_mpg
##    <num>    <num>
## 1:     6 19.74286
## 2:     4 26.66364
## 3:     8 15.10000

8.2 Summary

Data manipulation is a key aspect of data analysis in R. The dplyr, tidyr, and data.table packages provide powerful tools for transforming and processing data efficiently.

9 Interactive Exercise: Data Manipulation

Question: Select only the mpg, cyl, and hp columns from the mtcars dataset.

# Write your code here

Answer:

selected_data <- mtcars %>% select(mpg, cyl, hp)
head(selected_data)
##                    mpg cyl  hp
## Mazda RX4         21.0   6 110
## Mazda RX4 Wag     21.0   6 110
## Datsun 710        22.8   4  93
## Hornet 4 Drive    21.4   6 110
## Hornet Sportabout 18.7   8 175
## Valiant           18.1   6 105

10 Data Visualization

10.1 Scatter plot

x <- rnorm(100)
y <- rnorm(100)
plot(x, y)

10.2 Scatter plot using ggplot2

For vectors

library(ggplot2)
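# qplot() is deprecated as of ggplot2 3.4.0, so build the plot with ggplot()
ggplot(data.frame(x = x, y = y), aes(x = x, y = y)) + geom_point()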

10.3 Basic Plot

For datasets

library(ggplot2)
ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()

10.4 Adding Aesthetics

ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) + geom_point()

10.5 Bar Chart

ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

10.6 Facet Grid for Multiple Plots

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(. ~ cyl)

11 Interactive Exercise: Data Visualization

Question: Create a scatter plot of wt vs. mpg with points colored by gear.

# Write your code here

Answer:

ggplot(mtcars, aes(x = wt, y = mpg, color = factor(gear))) + geom_point()