1 Workshop Objectives

Welcome to the Mastering R for Research Workshop. This comprehensive workshop is designed to help you unlock your research potential by mastering the R software. In this workshop, you will learn how to:

Navigate the RStudio interface.
Create and manipulate various data structures (vectors, matrices, lists, data frames).
Import and export datasets in formats like CSV and Excel.
Handle missing values and manipulate data using the dplyr package.
Create publication-ready visualizations with ggplot2.
Generate various plots, histograms, and charts for data exploration.
Perform descriptive and inferential statistical analyses (t-tests, chi-square tests, regression analysis).
Build dynamic, reproducible reports with RMarkdown.
Each chapter below covers one or more of these objectives, along with examples and interactive code chunks you can run and modify.

2 What is R?

R is a language and environment for statistical computing and graphics (see http://www.r-project.org/about.html). It is free and open source, so anyone can use, modify, and distribute it. Many add-on packages extend R with functions for reading input, running analyses, visualizing output, and transforming results for further use.

3 Chapter 1: How to Install R

R can be installed on various operating systems. This chapter provides instructions for installing R on Windows, macOS, and Linux. It also includes brief guidance on installing RStudio.

3.1 Installing R on Windows and macOS

3.1.1 Visit CRAN:

Go to the Comprehensive R Archive Network (CRAN) website: https://cran.r-project.org/

3.1.2 Download R for Windows or macOS:

On Windows, click “Download R for Windows” and download the installer (a “.exe” file).
On macOS, click “Download R for macOS” and download the installer (a “.pkg” file).

3.1.3 Run the Installer:

Double-click the downloaded file and follow the installation instructions. You can usually accept the default settings.

3.1.4 Verify Installation:

Once installed, open the R GUI (or RStudio if you install that too) and type:

version

3.2 Installing R on Linux

For Ubuntu systems, run these commands in your terminal. Other Debian-based distributions need a different CRAN repository line.

# Update your package list and install prerequisites
sudo apt update
sudo apt install --no-install-recommends software-properties-common dirmngr

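# Add the CRAN signing key, then the CRAN repository for your Ubuntu release
# ($(lsb_release -cs) expands to the release codename, e.g. 'focal' or 'jammy')
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"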

# Update package lists again and install R
sudo apt update
sudo apt install r-base

# Verify the installation by checking the version
R --version

4 Introduction to RStudio

RStudio is an integrated development environment (IDE) for R. It provides a user-friendly interface that makes writing and running R scripts easier.

4.1 Key Features of RStudio

RStudio has several key components that enhance the user experience:

Source Pane: This is where you write and edit R scripts.
Console: The console allows you to execute R commands interactively.
Environment/History Pane: Shows the objects (variables, data, functions) created during the session and the history of commands you have run.
Plots/Files/Packages/Help Pane: Allows access to plots, file management, installed packages, and help documentation.

4.2 Installing RStudio

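Install R first, because RStudio needs an existing R installation. Then download the free RStudio Desktop installer for your operating system from the RStudio (now Posit) website, https://posit.co/download/rstudio-desktop/, and run it.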

4.3 Basic Usage

4.3.1 Running Code

You can run R code in RStudio using the following methods:

Type code in the Console and press Enter.
Write code in the Source Pane and run it by pressing Ctrl + Enter (Windows) or Cmd + Enter (Mac).

4.3.2 Example: Running a Simple Calculation

# Adding two numbers
2 + 3

4.3.3 RStudio Workspace

Use ls() to list all objects in your environment.
Use rm(object_name) to remove an object.
Use rm(list = ls()) to clear the entire environment.

# Listing objects in the environment
ls()
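The three commands above can be tried together in one short session. This is only a minimal sketch: the object names study_ids and greeting are made up for illustration. Note that the last step clears your whole environment, so run it in a fresh session.

```r
# Create two example objects (hypothetical names, for illustration only)
study_ids <- 1:3
greeting <- "hello"

# List the objects currently in the environment
ls()

# Remove a single object, then check that it is gone
rm(study_ids)
"study_ids" %in% ls()

# Clear everything that is left
rm(list = ls())
ls()
```

After the last call, ls() returns character(0), which means the environment is empty.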

4.4 Customizing RStudio

You can customize RStudio to suit your preferences:

Change the theme: Go to Tools > Global Options > Appearance.
Modify code editing settings: Tools > Global Options > Code.
Set a default working directory: Tools > Global Options > General.

4.5 Summary

RStudio is a powerful tool for writing and executing R code efficiently. Understanding its features and functionalities will help you work effectively with R for research and data analysis.

5 R Syntax

Understanding the basic syntax of R is essential for writing effective scripts. R is case-sensitive and follows a simple, readable syntax.

5.1 Assigning Values

In R, values can be assigned to variables using <- or =; <- is the conventional choice in R code.

x <- 10  # Assigning 10 to x
y = 20    # Assigning 20 to y
x + y     # Summing x and y

## [1] 30

5.2 Data Types in R

R has several basic data types:

Numeric: Decimal values (e.g., 10.5, 2.3)
Integer: Whole numbers (e.g., 1L, 5L)
Character: Text strings (e.g., "Hello")
Logical: Boolean values (TRUE, FALSE)

a <- 5       # Numeric
b <- 2L      # Integer
c <- "R"     # Character
d <- TRUE    # Logical
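To confirm which type a value has, you can ask R directly. The sketch below reuses the four variables from the example above and adds a few conversions; the strings "42" and "3.14" are arbitrary illustration values.

```r
a <- 5       # Numeric (stored as a double)
b <- 2L      # Integer
c <- "R"     # Character
d <- TRUE    # Logical

# class() reports the type of each value
class(a)
class(b)
class(c)
class(d)

# is.*() functions test for a type; as.*() functions convert between types
is.numeric(b)        # integers also count as numeric
as.integer("42")     # text to integer
as.numeric("3.14")   # text to decimal number
```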

5.3 Conditional Statements

R supports conditional execution with if and else, and provides the vectorized function ifelse() for testing every element of a vector at once.

x <- 15
if (x > 10) {
  print("x is greater than 10")
} else {
  print("x is 10 or less")
}

## [1] "x is greater than 10"
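An if statement tests a single condition, whereas ifelse() checks every element of a vector and returns one result per element. A minimal sketch, assuming a made-up vector of scores and an arbitrary pass mark of 50:

```r
# Hypothetical exam scores, for illustration only
scores <- c(45, 72, 88, 30)

# ifelse(test, yes, no) returns one result per element of the test
result <- ifelse(scores >= 50, "pass", "fail")
result
```

This form is convenient when creating a new column in a data frame from an existing one.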

5.4 Loops in R

Loops help execute repetitive tasks efficiently.

5.4.1 For Loop

for (i in 1:5) {
  print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

5.4.2 While Loop

count <- 1
while (count <= 5) {
  print(count)
  count <- count + 1
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

5.5 Functions in R

Functions allow modular programming and reuse of code.

add_numbers <- function(a, b) {
  return(a + b)
}

add_numbers(5, 3)

## [1] 8
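Function arguments can also have default values, which the caller may override. The following sketch uses a hypothetical helper, celsius_to_fahrenheit, that is not part of the workshop material; it also shows that a function built from vectorized arithmetic works on whole vectors.

```r
# Hypothetical helper: convert Celsius to Fahrenheit,
# rounding to `digits` decimal places (default: 1)
celsius_to_fahrenheit <- function(celsius, digits = 1) {
  fahrenheit <- celsius * 9 / 5 + 32
  round(fahrenheit, digits)
}

celsius_to_fahrenheit(37)                 # uses the default digits = 1
celsius_to_fahrenheit(36.6, digits = 0)   # overrides the default
celsius_to_fahrenheit(c(0, 100))          # works on whole vectors too
```

The last expression in a function body is returned automatically, so an explicit return() is optional.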

5.6 Interactive Exercise

Try the following exercises to practice R syntax:

5.6.1 Exercise 1: Assign and Print Variables

5.6.1.1 Question

Modify the code below to multiply a and b instead of adding them.

# Assign values to variables
a <- 7
b <- 3

# Print the sum of a and b
print(a + b)

Answer

# Assign values to variables
a <- 7
b <- 3

# Multiply a and b
print(a * b)

## [1] 21

5.6.2 Exercise 2: Conditional Statements

5.6.2.1 Question

Write an if-else statement to check if x is greater than 20.

# Define a variable x


# Write an if-else statement to check if x is greater than 20

Answer

# Define a variable x
x <- 25

# Write an if-else statement to check if x is greater than 20
if (x > 20) {
  print("x is greater than 20")
} else {
  print("x is 20 or less")
}

## [1] "x is greater than 20"

5.6.3 Exercise 3: Loops

5.6.3.1 Question

Write a for loop to print numbers from 1 to 10.

# Write a for loop to print numbers from 1 to 10

Modify the loop to print only even numbers between 1 and 10.

Answer

# Print even numbers from 1 to 10
for (i in seq(2, 10, by = 2)) {
  print(i)
}

## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10

5.7 Summary

Mastering R syntax is the foundation for effective programming in R. Understanding variable assignment, data types, conditional statements, loops, and functions is key to becoming proficient in R programming.

6 Data Structures in R

R provides several fundamental data structures for handling data. Understanding these structures is essential for efficient data manipulation.

6.1 Vectors

Vectors are the simplest data structure in R. All elements of a vector must be of the same type.

numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE)
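Once a vector exists, its elements can be selected by position or by condition, and arithmetic applies to every element. The sketch below also shows what happens when types are mixed; the vector named mixed is a made-up example.

```r
numeric_vector <- c(1, 2, 3, 4, 5)

numeric_vector[2]                    # second element (R indexes from 1)
numeric_vector[c(1, 5)]              # first and fifth elements
numeric_vector[numeric_vector > 3]   # elements that satisfy a condition
numeric_vector * 2                   # arithmetic is applied element by element

# Mixing types forces coercion to a single common type
mixed <- c(1, "apple", TRUE)
class(mixed)
mixed
```

Because a character string is present, every element of mixed is converted to text.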

6.2 Matrices

Matrices are two-dimensional arrays that contain elements of the same type.

matrix_example <- matrix(1:9, nrow=3, ncol=3)
matrix_example

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
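As the output shows, matrix() fills column by column by default. A short sketch of how elements, rows, and columns can be selected, and how to fill by row instead (the name by_row is chosen here for illustration):

```r
matrix_example <- matrix(1:9, nrow = 3, ncol = 3)

dim(matrix_example)      # number of rows and columns
matrix_example[2, 3]     # element in row 2, column 3
matrix_example[1, ]      # entire first row
matrix_example[, 2]      # entire second column

# byrow = TRUE fills the matrix row by row instead of column by column
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
by_row

# Transposing the original matrix gives the same layout
t(matrix_example)
```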

6.3 Lists

Lists can hold elements of different types, including vectors, matrices, and even other lists.

list_example <- list(name = "John", age = 25, scores = c(90, 85, 88))
list_example

## $name
## [1] "John"
## 
## $age
## [1] 25
## 
## $scores
## [1] 90 85 88
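List elements can be retrieved by name or position. The sketch below reuses list_example; the added element passed is a hypothetical field used only to show how a list grows.

```r
list_example <- list(name = "John", age = 25, scores = c(90, 85, 88))

list_example$name           # access an element by name
list_example[["scores"]]    # [[ ]] also returns the element itself
list_example["age"]         # single [ ] returns a list of length one
mean(list_example$scores)   # use an element like any other vector

# Add a new element (hypothetical field, for illustration)
list_example$passed <- TRUE
names(list_example)
```

The difference between [ ] and [[ ]] matters in practice: the first keeps the list wrapper, the second extracts the content.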

6.4 Data Frames

Data frames are table-like structures where each column can contain different types of data.

df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(90, 85))
df

##    Name Age Score
## 1 Alice  25    90
## 2   Bob  30    85
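A few base R functions are usually enough to inspect a data frame and pull out parts of it. This is a minimal sketch using the df object from above; the Passed column and its threshold of 90 are assumptions made for illustration.

```r
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(90, 85))

str(df)             # compact overview: column names, types, first values
nrow(df)            # number of rows
ncol(df)            # number of columns
df$Age              # one column as a vector
df[df$Age > 26, ]   # rows that satisfy a condition

# Add a derived column (the threshold of 90 is arbitrary)
df$Passed <- df$Score >= 90
df
```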

6.5 Interactive Exercise

6.5.1 Exercise 1: Create a Vector

6.5.1.1 Question

Create a numeric vector containing the numbers 10, 20, 30, 40, and 50.

Answer

numeric_vector <- c(10, 20, 30, 40, 50)
numeric_vector

6.5.2 Exercise 2: Create a Data Frame

6.5.2.1 Question

Create a data frame with two columns: “Student” (names: “Alice”, “Bob”, “Charlie”) and “Score” (values: 85, 90, 78).

Answer

df <- data.frame(Student = c("Alice", "Bob", "Charlie"), Score = c(85, 90, 78))
df

6.6 Summary

Understanding R’s data structures is crucial for effective data analysis. Vectors, matrices, lists, and data frames each serve different purposes in handling and organizing data.

7 Data Structures in R - 2

This chapter shows how to read data from files into the structures introduced in the previous chapter, and how to write results back to disk.

7.1 Data Types and File Formats in R

R supports multiple data file formats. Below are some common formats and how to handle them.

7.1.1 CSV Files

CSV (Comma-Separated Values) files are one of the most commonly used data formats.

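Reading CSV Files:

Base R has the built-in read.csv() function. The example here uses read_csv() from the readr package, which is typically faster on large files and returns a tibble.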

library(readr)
data <- read_csv("/data/bakeoff.csv")
head(data)

Writing CSV Files:

write_csv(data, "/results/output.csv")
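The same round trip can be done with the built-in read.csv() and write.csv() functions, with no extra package. The sketch below writes to a temporary file so that it runs without the workshop data files; the scores data frame and its values are made up.

```r
# Small example data frame (made-up values)
scores <- data.frame(Student = c("Alice", "Bob"), Score = c(85, 90))

# Write to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back in
scores_back <- read.csv(csv_path)
scores_back
```

Setting row.names = FALSE keeps write.csv() from adding an extra column of row numbers to the file.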

7.1.2 Excel Files (.xlsx)

To handle Excel files, use the readxl package for reading and the writexl package for writing.

Reading Excel Files:

library(readxl)
data <- read_excel("/latitude.xlsx", sheet = 1)
head(data)

Writing Excel Files:

library(writexl)
write_xlsx(data, "/results/output.xlsx")

7.1.3 RDS Files

RDS files store R objects efficiently.

Loading an R Object:

data <- readRDS("/data/inventory_parts.rds")

Saving an R Object:

saveRDS(data, "/results/output.rds")

7.1.4 RData Files

RData files need no extra package. They use R’s native format, can hold several objects in one file, and preserve each object’s name and type.

Reading RData Files:

# load() restores the saved objects under their original names and returns those names
loaded_names <- load("/data/wine.RData")

Writing RData Files:

save(data, file = "/results/output.RData")
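The practical difference between the two native formats is easiest to see side by side. The following sketch uses temporary files and made-up objects (inventory and note), so it does not depend on the workshop data.

```r
# Made-up example objects
inventory <- data.frame(part = c("bolt", "nut"), count = c(10L, 25L))
note <- "counted in March"

# RDS: one object per file; you choose the name when reading it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(inventory, rds_path)
inventory_copy <- readRDS(rds_path)
identical(inventory, inventory_copy)

# RData: several objects per file; load() restores them under their saved names
rdata_path <- tempfile(fileext = ".RData")
save(inventory, note, file = rdata_path)
rm(inventory, note)
restored <- load(rdata_path)
restored   # names of the objects that were restored
```

Because load() silently overwrites existing objects that have the same names, RDS is often the safer choice for a single dataset.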

7.2 Summary

R provides multiple ways to handle data formats, including CSV, Excel, RDS, and RData files. Using the appropriate package ensures efficient data handling and manipulation.

8 Data Manipulation in R

Data manipulation is a crucial part of data analysis. R provides various packages to facilitate efficient data handling, including:

dplyr: For data manipulation
tidyr: For data tidying
data.table: For high-performance data processing
readr: For reading/writing tabular data
stringr: For string operations

8.1 dplyr: Data Manipulation

The dplyr package provides functions for filtering, selecting, mutating, and summarizing data.

8.1.1 Filtering Data

library(dplyr)
data <- mtcars
filtered_data <- data %>% filter(mpg > 20)
head(filtered_data)

##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2

8.1.2 Selecting Columns

selected_data <- data %>% select(mpg, hp, wt)
head(selected_data)

##                    mpg  hp    wt
## Mazda RX4         21.0 110 2.620
## Mazda RX4 Wag     21.0 110 2.875
## Datsun 710        22.8  93 2.320
## Hornet 4 Drive    21.4 110 3.215
## Hornet Sportabout 18.7 175 3.440
## Valiant           18.1 105 3.460

8.1.3 Creating New Columns (Mutate)

data <- data %>% mutate(power_to_weight = hp / wt,
                        abc= mpg/wt)
head(data)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
##                   power_to_weight      abc
## Mazda RX4                41.98473 8.015267
## Mazda RX4 Wag            38.26087 7.304348
## Datsun 710               40.08621 9.827586
## Hornet 4 Drive           34.21462 6.656299
## Hornet Sportabout        50.87209 5.436047
## Valiant                  30.34682 5.231214

8.1.4 Summarizing Data

data_summary <- data %>% summarize(avg_mpg = mean(mpg))
data_summary

##    avg_mpg
## 1 20.09062
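Summaries are often computed per group, and real datasets usually contain missing values. The sketch below combines both: it assumes a copy of mtcars, here called cars_na, in which one mpg value has been set to NA purely to simulate missing data.

```r
library(dplyr)

# Copy mtcars and blank out one value to simulate missing data
cars_na <- mtcars
cars_na$mpg[1] <- NA

# Count the missing values in the column
sum(is.na(cars_na$mpg))

# Without na.rm = TRUE, the mean of a column containing NA is NA
mean(cars_na$mpg)

# Group by cylinder count and summarize, ignoring missing values
by_cyl <- cars_na %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg, na.rm = TRUE), n_cars = n())
by_cyl

# Alternatively, drop the incomplete rows first
complete_cars <- cars_na %>% filter(!is.na(mpg))
nrow(complete_cars)
```

Whether to ignore missing values or drop incomplete rows depends on the analysis, so treat both options as starting points.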

8.1.5 tidyr: Data Tidying

library(tidyr)
data_long <- data %>% gather(key = "attribute", value = "value", mpg:hp)
head(data_long)

##   drat    wt  qsec vs am gear carb power_to_weight      abc attribute value
## 1 3.90 2.620 16.46  0  1    4    4        41.98473 8.015267       mpg  21.0
## 2 3.90 2.875 17.02  0  1    4    4        38.26087 7.304348       mpg  21.0
## 3 3.85 2.320 18.61  1  1    4    1        40.08621 9.827586       mpg  22.8
## 4 3.08 3.215 19.44  1  0    3    1        34.21462 6.656299       mpg  21.4
## 5 3.15 3.440 17.02  0  0    3    2        50.87209 5.436047       mpg  18.7
## 6 2.76 3.460 20.22  1  0    3    1        30.34682 5.231214       mpg  18.1

data_separated <- data_long %>% separate(col = attribute, into = c("Type", "Detail"), sep = "_")
head(data_separated)

##   drat    wt  qsec vs am gear carb power_to_weight      abc Type Detail value
## 1 3.90 2.620 16.46  0  1    4    4        41.98473 8.015267  mpg   <NA>  21.0
## 2 3.90 2.875 17.02  0  1    4    4        38.26087 7.304348  mpg   <NA>  21.0
## 3 3.85 2.320 18.61  1  1    4    1        40.08621 9.827586  mpg   <NA>  22.8
## 4 3.08 3.215 19.44  1  0    3    1        34.21462 6.656299  mpg   <NA>  21.4
## 5 3.15 3.440 17.02  0  0    3    2        50.87209 5.436047  mpg   <NA>  18.7
## 6 2.76 3.460 20.22  1  0    3    1        30.34682 5.231214  mpg   <NA>  18.1
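In the output above, Detail is NA in every row because none of the gathered column names (mpg, cyl, disp, hp) contains an underscore to split on. In current tidyr releases gather() is also superseded by pivot_longer(). A minimal sketch of the same two steps, assuming a made-up wide table whose column names do contain "_":

```r
library(dplyr)
library(tidyr)

# Made-up wide data: one column per measurement and time point
wide <- data.frame(
  id = 1:2,
  weight_before = c(70, 82),
  weight_after = c(68, 80)
)

# pivot_longer() is the current replacement for gather()
long <- wide %>%
  pivot_longer(cols = -id, names_to = "attribute", values_to = "value")

# The names contain "_", so separate() fills both new columns
long_separated <- long %>%
  separate(col = attribute, into = c("Type", "Detail"), sep = "_")
long_separated
```

Each row of the result now holds one measurement, identified by id, Type, and Detail.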

8.1.6 data.table: Fast Data Processing

library(data.table)
dt <- as.data.table(mtcars)
dt[, .(avg_mpg = mean(mpg)), by = cyl]

##      cyl  avg_mpg
##    <num>    <num>
## 1:     6 19.74286
## 2:     4 26.66364
## 3:     8 15.10000

8.2 Summary

Data manipulation is a key aspect of data analysis in R. The dplyr, tidyr, and data.table packages provide powerful tools for transforming and processing data efficiently.

9 Interactive Exercise: Data Manipulation

Question: Select only the mpg, cyl, and hp columns from the mtcars dataset.

# Write your code here

Answer:

selected_data <- mtcars %>% select(mpg, cyl, hp)
head(selected_data)

##                    mpg cyl  hp
## Mazda RX4         21.0   6 110
## Mazda RX4 Wag     21.0   6 110
## Datsun 710        22.8   4  93
## Hornet 4 Drive    21.4   6 110
## Hornet Sportabout 18.7   8 175
## Valiant           18.1   6 105

10 Data Visualization

10.1 Scatter plot

x <- rnorm(100)
y <- rnorm(100)
plot(x, y)

10.2 Scatter plot using ggplot2

For vectors

library(ggplot2)
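# qplot() is deprecated as of ggplot2 3.4.0, so put the vectors in a data frame and use ggplot()
ggplot(data.frame(x, y), aes(x = x, y = y)) + geom_point()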

10.3 Basic Plot

For datasets

library(ggplot2)
ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()

10.4 Adding Aesthetics

ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) + geom_point()

10.5 Bar Chart

ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

10.6 Facet Grid for Multiple Plots

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(. ~ cyl)
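The plots above rely on default axis labels. For a figure intended for a report, you would usually add a title and readable labels and save the result to a file. The sketch below does this for a histogram of mpg; the bin count, colors, label text, and output path are all arbitrary choices made for illustration.

```r
library(ggplot2)

# Histogram of fuel efficiency; bins = 10 is an arbitrary choice
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10, fill = "steelblue", color = "white") +
  labs(
    title = "Distribution of fuel efficiency",
    x = "Miles per gallon",
    y = "Number of cars"
  ) +
  theme_minimal()
p

# Save the figure to a file (a temporary path is used here)
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Storing the plot in a variable first makes it easy to display it and save it with the same settings.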

11 Interactive Exercise: Data Visualization

Question: Create a scatter plot of wt vs. mpg with points colored by gear.

# Write your code here

Answer:

ggplot(mtcars, aes(x = wt, y = mpg, color = factor(gear))) + geom_point()

R Workshop: Mastering R for Research

Mulazim Ali KHOKHAR

2025-02-20
