First steps in R programming: Session 2

Authors Rasouli. Hassan, Moein, Fatemeh
Date December 20, 2025, 20:00-21:45
Affiliation ABSA, Tarbiat Modares University, Tehran, Iran
Workshop R club for researchers

Changing Working Directory

getwd()
# Output: [1] "C:/Users/BiaDigi.Com/Documents"

# Change directory (use a folder that exists on your own computer):
setwd("C:/Users/BiaDigi.Com/Downloads/codes/dataset")
# Or, for a folder on another drive:
setwd("F:/book/workfolder")

getwd()
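
A hard-coded path only works on the computer where it was written. As a minimal sketch (the folder name `r_club_demo` is hypothetical and is created inside R's temporary directory), the following checks that a folder exists before switching to it and then returns to the original working directory:

```r
# Remember where we started so the change can be undone
old_dir <- getwd()

# Build the path in a portable way; "r_club_demo" is a made-up folder name
target <- file.path(tempdir(), "r_club_demo")

# Create the folder only if it is missing, then switch to it
if (!dir.exists(target)) dir.create(target, recursive = TRUE)
setwd(target)
new_dir <- getwd()
print(basename(new_dir))

# Go back to the original working directory
setwd(old_dir)
```

Using `file.path()` avoids typing the path separator by hand, which differs between Windows and Linux/macOS.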

Listing Files in Directory

# List files in path directory
list.files()

# When the directory is empty
# Output: character(0)

# When the directory contains files
# Output: [1] "code pca.R" "code tables and bar plots.R"

# Number of files in the directory
length(list.files())
# Output: [1] 2
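
`list.files()` can also filter by file type. The sketch below assumes a temporary demo folder with three made-up file names, and uses the `pattern` and `full.names` arguments:

```r
# Work in a fresh temporary folder so the result is predictable
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Create three empty example files (hypothetical names)
file.create(file.path(demo_dir, c("pca.R", "plots.R", "set1csv.csv")))

# All files in the folder
all_files <- list.files(demo_dir)

# Only R scripts: pattern is a regular expression, "\\.R$" means "ends in .R"
r_scripts <- list.files(demo_dir, pattern = "\\.R$")

# full.names = TRUE returns complete paths that can be passed to read functions
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

print(all_files)
print(r_scripts)
print(csv_paths)
```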

Data Import in RStudio

R supports over 50 different data formats through its core packages and additional packages from CRAN:

  1. Text Files: CSV, TSV, TXT, Fixed-width, etc.
  2. Spreadsheets: Excel (.xls, .xlsx), LibreOffice, Google Sheets, etc.
  3. Statistical Software: SPSS (.sav, .por), SAS (.sas7bdat, .xpt), Stata (.dta), Minitab, etc.
  4. Databases: SQLite, MySQL, PostgreSQL, Oracle, SQL Server, ODBC, etc.
  5. R Formats: RDS, RData, RDA, etc.
  6. Web Formats: JSON, XML, HTML tables, API data, etc.
  7. Modern Formats: Parquet, Feather, Arrow
  8. Other: Clipboard, Serialized objects, etc.
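
As one illustration of the native R formats in item 5, here is a minimal sketch of saving a single object to an RDS file and reading it back. The small data frame and the temporary file path are assumptions made for the example:

```r
# A small example data frame (first three rows of the workshop table)
df <- data.frame(Species = c("S1", "S2", "S3"),
                 L1 = c(242, 255, 245),
                 L2 = c(254, 271, 270))

# Save a single R object to an .rds file in a temporary location
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)

# Read it back; column names and types are kept
df_back <- readRDS(rds_path)
print(all.equal(df, df_back))
```

Unlike CSV, an RDS file stores the R object itself, so factors, dates, and row names do not need to be re-specified on import.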

Key Packages for Reading Data

# Install packages (needed only once per R installation)
install.packages("readr")
install.packages("readxl")
install.packages("haven")
install.packages("arrow")

  • readr, data.table - Text files
  • readxl, openxlsx - Excel files
  • haven - SPSS, SAS, Stata
  • arrow - Parquet, Feather
  • DBI + drivers - Databases
  • jsonlite, xml2 - JSON/XML
  • foreign - Legacy statistical formats
  • ncdf4, hdf5r - Scientific formats: NetCDF and HDF5 (e.g. NGS and GIS data, large datasets)
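
Re-installing a package that is already present wastes time on a slow connection. One possible approach, sketched here with the package names used above, is to check which packages are missing and install only those:

```r
# Packages used in this session
wanted <- c("readr", "readxl", "haven", "arrow")

# requireNamespace() returns TRUE when a package can be loaded, without attaching it
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- wanted[!is_installed]
print(missing_pkgs)

# Install only what is missing (uncomment to run; needs an internet connection)
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```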

Methods for Data Import

Method 1: Using RStudio File Menu

  • File > Import Dataset > Choose data type
  • Useful for common formats (text, Excel) and for files from statistical software (SPSS, SAS, Stata)

Method 2: Manual Assignment of Data

data <- read.table(text = "
Species L1 L2 L3 L4 L5
S1 242 254 253 240 228
S2 255 271 253 235 236
S3 245 270 240 241 233
S4 238 270 251 234 221
S5 245 251 256 240 228
S6 248 242 246 246 234
S7 251 270 252 244 232
S8 259 271 255 235 224
S9 246 244 249 242 239
S10 247 285 254 242 232
S11 233 267 255 232 225
S12 251 281 247 239 239
S13 232 273 251 250 236
S14 236 270 249 237 233
S15 231 260 246 246 224
S16 254 290 246 242 227
S17 230 288 247 231 226
S18 253 289 240 247 225
S19 236 285 242 234 236
S20 255 278 253 232 236
", header = TRUE)

# Use the Species column as row names, then drop it
rownames(data) <- data$Species
data$Species <- NULL
# Convert to a numeric matrix
data <- as.matrix(data)
print(data)

Method 3: Using Vectors and Dataframes

Species <- paste0("S", 1:20)
L1 <- c(242,255,245,238,245,248,251,259,246,247,
        233,251,232,236,231,254,230,253,236,255)
L2 <- c(254,271,270,270,251,242,270,271,244,285,
        267,281,273,270,260,290,288,289,285,278)
L3 <- c(253,253,240,251,256,246,252,255,249,254,
        255,247,251,249,246,246,247,240,242,253)
L4 <- c(240,235,241,234,240,246,244,235,242,242,
        232,239,250,237,246,242,231,247,234,232)
L5 <- c(228,236,233,221,228,234,232,224,239,232,
        225,239,236,233,224,227,226,225,236,236)

data <- data.frame(Species, L1, L2, L3, L4, L5)
rownames(data) <- data$Species
data$Species <- NULL
print(data)

Method 4: Direct Copy from Excel Sheet

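# Copy the cells in Excel first, then run (reading "clipboard" this way works on Windows):
data <- read.table("clipboard", header = TRUE, sep = "\t", row.names = 1)
# On macOS, use: read.table(pipe("pbpaste"), header = TRUE, sep = "\t", row.names = 1)
head(data, n = 5)
print(data)

# Visualize with heatmap (install once with install.packages("heatmaply"))
library(heatmaply)
heatmaply(data)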

Method 5: Using Data Editor

# Opens a spreadsheet-like editor; the edited table is stored in df when the editor is closed
df <- edit(data)
print(df)

Method 6: Using scan() for Input

# For numerical values (type the numbers, then press Enter on an empty line to finish)
values <- scan()

# For text data
text_data <- scan(what = "character")

# Example:
cat("Enter Species names separated by space (S1 S2 ... S20):\n")
Species <- scan(what = "character")

cat("Enter values for L1 (20 numbers separated by space):\n")
L1 <- scan()

# Continue for L2, L3, L4, L5...
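
Interactive `scan()` waits for keyboard input, which is awkward inside a script. As a non-interactive sketch, the `text` argument lets `scan()` read from a string instead (the four values are the first entries of the workshop table):

```r
# scan() can read from a string instead of the keyboard via the text argument
Species <- scan(text = "S1 S2 S3 S4", what = "character", quiet = TRUE)
L1 <- scan(text = "242 255 245 238", quiet = TRUE)

# Combine the two vectors into a data frame
small <- data.frame(Species, L1)
print(small)
```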

Method 7: Importing Text/CSV Files

# Reading text files
data <- read.table("set1txt.txt", header = TRUE, sep = "\t", row.names = 1)
head(data, n=4)

# Reading CSV files
data <- read.csv("set1csv.csv", header = TRUE, row.names = 1)
head(data, n=3)

# Using readr (tidyverse)
library(readr)
library(tibble)
df <- read_csv("set1csv.csv")
head(df, n=5)
data <- column_to_rownames(df, var = "Species")
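
To try the import step without the workshop files, the following sketch writes a small table to a temporary CSV file and reads it back with base R. The three rows and the temporary path are assumptions made for the example:

```r
# Build a small table (first three rows of the workshop data)
df <- data.frame(Species = c("S1", "S2", "S3"),
                 L1 = c(242, 255, 245),
                 L2 = c(254, 271, 270))

# Write it to a temporary CSV file; row.names = FALSE avoids an extra unnamed column
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Read it back, using the first column (Species) as row names
data_back <- read.csv(csv_path, header = TRUE, row.names = 1)
print(data_back)
str(data_back)
```

With `row.names = 1`, the Species column becomes the row labels, so only the numeric columns L1 and L2 remain as variables.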

Method 8: Reading Excel Files

library(readxl)
data <- read_excel("set1xlx.xlsx", sheet = 1)
head(data, n = 3)

# Using openxlsx
library(openxlsx)
library(tibble)  # provides column_to_rownames()
data <- read.xlsx("set1xlx.xlsx")
data <- column_to_rownames(data, var = "Species")
head(data)

Method 9: Using Built-in Datasets

data()  # List available datasets

h1 <- volcano
head(h1)

h2 <- mtcars
head(h2, n=5)

h3 <- CO2
head(h3, n=7)
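
Before analysing a built-in dataset, it helps to check what kind of object it is. A short sketch using the three datasets above:

```r
# is.matrix() and is.data.frame() tell the two table types apart
print(is.matrix(volcano))
print(is.data.frame(mtcars))

# dim() gives the number of rows and columns
print(dim(mtcars))
print(dim(volcano))

# str() summarises column types; CO2 mixes factors and numbers
str(CO2)
```

The help page of each dataset (for example `?mtcars`) describes its variables and source.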

Section 2: Basic Statistics

Descriptive Statistics

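# These examples assume data is a data frame with numeric columns L1 to L5
# (if data is a matrix, convert it first with as.data.frame(data))

# Using summary() function
summary(data)
summary(data$L1)

# Calculating mean
mean(data$L1)
mean(data$L4)

# Using sapply for multiple columns
sapply(data, mean)

# Finding range (over all values in the table)
range(data)
min(data)
max(data)

# Variance
var(data)      # covariance matrix of all columns
var(data$L1)   # variance of a single column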

# Standard deviation
sd(data$L1)
sd(data$L5)
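
The functions above behave differently for data frames and matrices. As a small sketch (using only the first four rows of L1 and L2 from the workshop table), column statistics can be computed for both object types:

```r
# Small data frame (first four rows of the workshop data)
df <- data.frame(L1 = c(242, 255, 245, 238),
                 L2 = c(254, 271, 270, 270),
                 row.names = c("S1", "S2", "S3", "S4"))

# For a data frame, sapply() works column by column
col_means <- sapply(df, mean)
col_sds   <- sapply(df, sd)

# For a matrix, use colMeans() or apply(); the $ operator does not work on matrices
m <- as.matrix(df)
col_means_m <- colMeans(m)
col_sds_m   <- apply(m, 2, sd)

print(col_means)
print(col_sds)
```

Both routes give the same numbers; the choice depends only on whether the data were imported as a data frame or converted to a matrix.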

Section 3: Package Installation in R

Before installation, run this code in your R console:

options(timeout = 3000)  # Increase the download timeout to 3000 seconds (default: 60)
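
The setting lasts only for the current R session. If you prefer to return to the previous value after installing, one possible pattern is to store the old setting and restore it:

```r
# options() returns the previous values, so they can be restored later
old_opts <- options(timeout = 3000)
current_timeout <- getOption("timeout")
print(current_timeout)

# ... install packages here ...

# Restore the earlier setting when finished
options(old_opts)
```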

Method 1: Using RStudio GUI

  • Tools > Install Packages > Insert package name
  • Useful for installing CRAN packages without typing code

Method 2: Installation from CRAN

# Install only one package
install.packages("dplyr")

# Install multiple packages
install.packages(c("ggplot2", "tidyr", "readr"))

# Install from a specific repository (useful for Iranian users)
install.packages("ggplot2", repos = "https://cloud.r-project.org")

# Install with all dependencies, including suggested packages
install.packages("caret", dependencies = TRUE)

# Uninstall packages
remove.packages("lava")

# List installed packages that have a newer version available
old.packages()

# Update a specific package without asking for confirmation
update.packages(oldPkgs = "ggplot2", ask = FALSE)

# Update R itself (the installr package works on Windows only)
R.version.string  # show the currently installed R version
install.packages("installr")
library(installr)
updateR()

Method 3: Install from GitHub

GitHub is a platform for version control and collaboration, enabling developers to host, share, and manage code repositories. ➡️ GitHub

install.packages("devtools")
library(devtools)
install_github("username/packagename")

# Example
install_github("tidyverse/ggplot2")

# Using remotes (a lighter alternative to devtools)
install.packages("remotes")
remotes::install_github("username/packagename")

Method 4: Install from Bioconductor

Bioconductor is a repository of R packages for bioinformatics and genomic data analysis. ➡️ Bioconductor

# First install BiocManager
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(version = "3.22")

# Check available packages
BiocManager::available()

# Update Bioconductor packages
BiocManager::install()

# Install specific package
BiocManager::install("FastqCleaner")

# Browse documentation
browseVignettes("FastqCleaner")

Method 5: Install from Local Sources

# Install from tar.gz file (Ubuntu)
install.packages("~/Downloads/package.tar.gz", repos = NULL, type = "source")

# Install from zip file (Windows)
install.packages("C:/Downloads/package.zip", repos = NULL)

Method 6: Advanced Installation

# Install specific version
remotes::install_version("ggplot2", version = "3.3.0")

# Install development version
remotes::install_dev("dplyr")

# Install without compilation
install.packages("data.table", type = "binary")

Useful Tricks

✅ Using these tips, you can enjoy programming in the R environment.

# Find package installation path
.libPaths()

# Change CRAN mirror
chooseCRANmirror()

# Set the CRAN mirror directly (choose one; a later call overrides an earlier one)
options(repos = c(CRAN = "https://mirror.sharif.edu/cran/"))
options(repos = c(CRAN = "https://cran.um.ac.ir/"))

Best CRAN Mirrors for Iranian R users:

✅ These mirrors are closer to users in Iran, so package downloads are usually faster and more reliable. ➡️ CRAN mirrors

  1. https://cran.um.ac.ir/ - Ferdowsi University of Mashhad
  2. https://mirror.sharif.edu/cran/ - Sharif University of Technology
  3. https://cran.ir/ - Iranian R community
  4. https://cloud.r-project.org/ - Main CRAN server (only sometimes accessible from Iran)
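
As a minimal sketch, a mirror from the list above can be set for the current session and checked as follows (the choice of the first mirror is only an example):

```r
# Keep the current setting so it can be restored
old_repos <- getOption("repos")

# Point CRAN at one mirror for this session (URL taken from the list above)
options(repos = c(CRAN = "https://cran.um.ac.ir/"))
active_mirror <- getOption("repos")[["CRAN"]]
print(active_mirror)

# Restore the previous setting
options(repos = old_repos)
```

To keep a mirror across sessions, the `options(repos = ...)` line can be placed in your `.Rprofile` file, which R runs at startup.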

📚 Session 2 Materials

✅ In this section, you can download the high-quality recorded video files from the second session.

🎥 Recorded Session Videos

Part   | Description        | Download Link
Part 1 | Session 2 - Part 1 | 📥 Download
Part 2 | Session 2 - Part 2 | 📥 Download
Part 3 | Session 2 - Part 3 | 📥 Download
Part 4 | Session 2 - Part 4 | 📥 Download
Part 5 | Session 2 - Part 5 | 📥 Download

📊 Sample Datasets

✅ In this section, you can download the sample data needed for working with R packages.

Dataset       | Format           | Description             | Download Link
Sample Data 1 | 📊 Excel (.xlsx) | Dataset in Excel format | 📥 Download
Sample Data 2 | 📄 CSV (.csv)    | Dataset in CSV format   | 📥 Download
Sample Data 3 | 📝 Text (.txt)   | Dataset in Text format  | 📥 Download

💡 Tip: Click on the 📥 icon next to each file to download the materials used in Session 2 of the R training course.

💡 Note: This guide covers essential first steps in R programming including working directory management, data import methods, basic statistics, and package installation techniques.