——————————————————————————————–

Facilitator: CDAM Experts

——————————————————————————————–

Course Outline: Beginner Level R for Data Science

Session 1: Introduction to Data Science, R, RStudio, and Basic Data Types

Session 2: Data Import, Cleaning, and Exploratory Data Analysis (EDA)

Session 3: Data Manipulation with dplyr

Session 4: Data Visualization with ggplot2

Session 5: Probability Distributions and Random Variables

Session 6: Hypothesis Testing

Session 7: Regression Analysis, Correlation and Time Series

Session 8: Analysis of Variance (ANOVA) and Non-Parametric Tests

Session 9: Reporting with RMarkdown

Session 10: Capstone Project

Session 1: Introduction to Data Science, R, RStudio, and Basic Data Types

Learning Objectives:

By the end of this session, you will be able to:

Understand what Data Science is, along with its aims and applications.
Understand what R and RStudio are and how they differ.
Install and configure R and RStudio on your system.
Navigate the RStudio interface (Console, Script Editor, Environment, Plots).
Use basic R syntax and understand core data types: vectors, matrices, lists, and data frames.
Perform basic operations and write simple R scripts.
Prepare for working with real-world datasets in upcoming sessions.

1. What is Data Science?

Definition

Data Science is an interdisciplinary field that uses scientific methods, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

Aims of Data Science

Data Exploration & Analysis – Discover patterns and trends.
Predictive Modeling – Forecast future outcomes using machine learning.
Decision-Making – Support business and scientific decisions with data.
Automation – Build intelligent systems (e.g., recommendation engines).
Visualization – Communicate insights effectively.

Applications of Data Science

• Business: Customer segmentation, fraud detection, sales forecasting.

• Healthcare: Disease prediction, drug discovery, medical imaging.

• Finance: Risk assessment, algorithmic trading, credit scoring.

• Marketing: Sentiment analysis, personalized recommendations.

• Social Media: Trend analysis, user behavior modeling.

2. Overview of R and RStudio

What is R?

R is a programming language and software environment for statistical computing, data analysis, and graphics.
It was originally developed by Ross Ihaka and Robert Gentleman at the University of Auckland.
It is open-source, community-driven, and widely used in academia, research, and industry.
Key strengths:
- Rich ecosystem of packages (e.g., dplyr, ggplot2, caret)
- Built-in support for statistical models
- Strong community and active development

What is RStudio?

RStudio is an Integrated Development Environment (IDE) for working with R.
Provides a more user-friendly interface with tools for writing code, visualizing data, managing files, and debugging.
Available as Desktop (installed locally) or Server (accessed through a web browser) versions.

NB: Think of R as the engine and RStudio as the dashboard: together they make driving (coding) easier and more efficient.

3. Installing R and RStudio

Step-by-step Installation:

Step 1. Install R:

Go to https://cran.r-project.org/
Download and install the version appropriate for your OS (Windows, macOS, Linux).

Step 2. Install RStudio:

Go to https://posit.co/download/rstudio-desktop
Download and install the free desktop version.

Step 3. Launch RStudio:

After installation, open RStudio.
You’ll see multiple panes: Console, Script Editor, Environment, etc.

Step 4. RStudio Interface Overview

Console: Where you type commands and see immediate output.
Script Editor: Write and save R scripts here (File > New File > R Script)
Environment/History: Lists objects (variables) created and command history
Files/Plots/Packages/Help: File browser, plot viewer, package manager, help documentation

Getting Started

Before you begin, you might want to create a new project in RStudio. A project is a self-contained working environment that helps you manage your work efficiently. It keeps your R scripts, datasets, outputs, and settings in one place. You can name the project and choose a directory to save it in.

Set a working directory: the default location where R looks for files and saves outputs

setwd("~/2025_R_TRAINING") # It tells R where to look for files and where to save files
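Before changing the working directory, it can help to check where R is currently pointing. The following is a minimal sketch; the folder name is taken from the example above and may not exist on your machine, so the sketch only switches to it when the folder is present.

```r
# Show the current working directory
current_dir <- getwd()
print(current_dir)

# Folder name from the example above; adjust it to your own setup
training_dir <- "~/2025_R_TRAINING"

# Switch only if the folder exists, so the script does not stop with an error
if (dir.exists(training_dir)) {
  setwd(training_dir)
} else {
  message("Folder not found: ", training_dir)
}
```

If you work inside an RStudio project, the working directory is normally set to the project folder when the project is opened, so an explicit setwd() call is often unnecessary.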

4. Basic Operations and Functions

Arithmetic Operators

x <- 10
y <- 3

print(x + y)   # Addition

## [1] 13

print(x - y)   # Subtraction

## [1] 7

print(x * y)   # Multiplication

## [1] 30

print(x / y)   # Division

## [1] 3.333333
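Beyond the four operators above, R also provides exponentiation, modulo, and integer division. A short sketch using the same values of x and y (the result variable names are chosen here for illustration):

```r
x <- 10
y <- 3

power     <- x ^ y    # Exponentiation: 10 raised to the power 3
remainder <- x %% y   # Modulo: remainder after dividing 10 by 3
quotient  <- x %/% y  # Integer division: whole-number part of 10 / 3

print(power)      # 1000
print(remainder)  # 1
print(quotient)   # 3
```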

Logical Operators

a <- 5
b <- 10

print(a > b)           # FALSE

## [1] FALSE

print(a == 5 & b > 5)  # AND

## [1] TRUE

print(a == 5 | b < 5)  # OR

## [1] TRUE

print(a!=5)            # a is not equal to 5

## [1] FALSE

print(a==5)            # a is equal to 5

## [1] TRUE
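Two further logical tools are worth knowing: the NOT operator (!) and the xor() function. The sketch below reuses a and b from above; the result variable names are illustrative only.

```r
a <- 5
b <- 10

not_greater <- !(a > b)            # NOT: a > b is FALSE, so the result is TRUE
in_range    <- a >= 1 & a <= 10    # Both conditions must hold
either_only <- xor(a == 5, b < 5)  # TRUE when exactly one condition is TRUE

print(not_greater)  # TRUE
print(in_range)     # TRUE
print(either_only)  # TRUE
```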

Built-in Functions

sum(c(1, 2, 3))         # Sum

## [1] 6

mean(c(2, 4, 6))        # Mean

## [1] 4

sd(c(2, 4, 6))          # Standard deviation

## [1] 2

min(c(10, 20, 5))       # Minimum

## [1] 5

max(c(10, 20, 5))       # Maximum

## [1] 20

length(c(1, 2, 3))      # Length

## [1] 3

seq(1, 10, by = 2)      # Sequence

## [1] 1 3 5 7 9

rep("hello", times = 3) # Repeat

## [1] "hello" "hello" "hello"
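Real datasets often contain missing values, which R represents as NA. Many summary functions return NA when any input is missing unless you pass na.rm = TRUE. A minimal sketch with made-up values:

```r
# Hypothetical data with one missing value
values <- c(2, 4, NA, 6)

mean_with_na    <- mean(values)                # NA, because one value is missing
mean_without_na <- mean(values, na.rm = TRUE)  # Missing value dropped first
total           <- sum(values, na.rm = TRUE)

print(mean_with_na)     # NA
print(mean_without_na)  # 4
print(total)            # 12
```

To read the documentation for any built-in function, type a question mark followed by its name in the Console, for example ?mean.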

5. Basic R Syntax and Data Types:

R has several basic data structures.

1. Vectors

A vector is the simplest data structure in R. It contains elements of the same type.

# Create a numeric vector
numeric_vector <- c(1, 2, 3, 4, 5)

# Create a character vector
char_vector <- c("apple", "banana", "cherry")

# Create a logical vector
logical_vector <- c(TRUE, FALSE, TRUE)

# Print vector
print(numeric_vector)

## [1] 1 2 3 4 5

print(char_vector)

## [1] "apple"  "banana" "cherry"

print(logical_vector)

## [1]  TRUE FALSE  TRUE

Attention: All elements must be of the same type; if types are mixed, R silently converts (coerces) them to a common type.

mixed_vector <- c(1, "two", TRUE)
print(mixed_vector)  # All converted to characters

## [1] "1"    "two"  "TRUE"
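Coercion follows a fixed order: logical values become numbers, and numbers become characters. You can also convert explicitly with functions such as as.numeric(). The sketch below illustrates both cases; the variable names are chosen here for illustration.

```r
# Logical mixed with numeric: TRUE becomes 1, FALSE becomes 0
num_logical <- c(1, TRUE, FALSE)
print(num_logical)          # 1 1 0
print(typeof(num_logical))  # "double"

# Explicit conversion from character to numeric
from_text <- as.numeric(c("10", "20"))
print(from_text)            # 10 20

# Text that is not a number becomes NA (R also issues a warning,
# which is suppressed here to keep the output clean)
not_a_number <- suppressWarnings(as.numeric("two"))
print(not_a_number)         # NA
```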

Useful Functions for Vectors

# Create a numeric vector
numeric_vector <- c(1, 2, 3, 4, 5)

# Create a character vector
char_vector <- c("apple", "banana", "cherry")

# Create a logical vector
logical_vector <- c(TRUE, FALSE, TRUE)

length(numeric_vector)       # Length of the vector

## [1] 5

typeof(char_vector)       # Type of elements

## [1] "character"

is.vector(logical_vector)    # Check if object is a vector

## [1] TRUE

vec_num <- c(1, 2, 3)
vec_char <- c("apple", "banana")
vec_logical <- c(TRUE, FALSE, TRUE)

length(vec_num)       # Length of the vector

## [1] 3

typeof(vec_num)       # Type of elements

## [1] "double"

is.vector(vec_num)    # Check if object is a vector

## [1] TRUE
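Elements of a vector are accessed with square brackets, and arithmetic applies to every element at once. Note that R counts positions from 1, not 0. A short sketch reusing numeric_vector from above:

```r
numeric_vector <- c(1, 2, 3, 4, 5)

second    <- numeric_vector[2]                   # Single element by position
first_3rd <- numeric_vector[c(1, 3)]             # Several positions at once
no_first  <- numeric_vector[-1]                  # Negative index drops an element
above_3   <- numeric_vector[numeric_vector > 3]  # Logical condition as index
doubled   <- numeric_vector * 2                  # Arithmetic is element-wise

print(second)     # 2
print(first_3rd)  # 1 3
print(no_first)   # 2 3 4 5
print(above_3)    # 4 5
print(doubled)    # 2 4 6 8 10
```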

2. Matrices

A matrix is a 2D vector with rows and columns. All elements must be of the same type.

# Create a matrix from a vector (values fill column by column by default)
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(mat)

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

# Access elements
print(mat[1, 2])  # First row, second column

## [1] 3

Naming Rows and Columns

rownames(mat) <- c("Row1", "Row2")
colnames(mat) <- c("Col1", "Col2", "Col3")
print(mat)

##      Col1 Col2 Col3
## Row1    1    3    5
## Row2    2    4    6

Indexing in Matrices

mat[1, 2]     # First row, second column

## [1] 3

mat[2, ]      # Entire second row

## Col1 Col2 Col3 
##    2    4    6

mat[, 3]      # Entire third column

## Row1 Row2 
##    5    6
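A few helper functions make it easier to work with matrices. The sketch below rebuilds the same 2 x 3 matrix (without row and column names) and shows its dimensions, row and column totals, and the byrow argument for filling row by row instead of column by column.

```r
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

print(dim(mat))      # 2 3  (rows, columns)
print(rowSums(mat))  # 9 12
print(colSums(mat))  # 3 7 11

# Fill row by row instead of the default column by column
mat_byrow <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
print(mat_byrow[1, ])  # 1 2 3
print(mat_byrow[2, ])  # 4 5 6
```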

3. Lists

A list can contain elements of different types and even other lists.

my_list <- list(
  name = "John", 
  age = 30, 
  grades = c(85, 90, 78))


print(my_list)

## $name
## [1] "John"
## 
## $age
## [1] 30
## 
## $grades
## [1] 85 90 78

# Accessing List Elements
my_list$name           # By name

## [1] "John"

my_list[[3]]           # By index

## [1] 85 90 78

my_list$grades[2]      # Second score

## [1] 90
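Lists can be changed after they are created, and there is an important difference between single and double square brackets. The sketch below reuses my_list from above; the passed element is a hypothetical addition for illustration.

```r
my_list <- list(
  name = "John",
  age = 30,
  grades = c(85, 90, 78))

# Modify an existing element and add a new one
my_list$age <- 31
my_list$passed <- TRUE   # hypothetical new element

print(names(my_list))    # "name" "age" "grades" "passed"

# Single brackets return a smaller list; double brackets return the element itself
print(class(my_list["age"]))    # "list"
print(class(my_list[["age"]]))  # "numeric"

# Elements can be passed straight to functions
average_grade <- mean(my_list$grades)
print(average_grade)
```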

4. Data Frames – Tabular Data Structure

A data frame is like a spreadsheet or SQL table — rows represent observations, columns represent variables.

Creating a Data Frame

# Create a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Ann"),
  Age = c(25, 30, 35, 26),
  Salary = c(50000, 60000, 70000, 40000))

print(df)

##      Name Age Salary
## 1   Alice  25  50000
## 2     Bob  30  60000
## 3 Charlie  35  70000
## 4     Ann  26  40000

Inspecting a Data Frame

str(df)         # Structure of the data frame

## 'data.frame':    4 obs. of  3 variables:
##  $ Name  : chr  "Alice" "Bob" "Charlie" "Ann"
##  $ Age   : num  25 30 35 26
##  $ Salary: num  50000 60000 70000 40000

summary(df)     # Summary statistics

##      Name                Age            Salary     
##  Length:4           Min.   :25.00   Min.   :40000  
##  Class :character   1st Qu.:25.75   1st Qu.:47500  
##  Mode  :character   Median :28.00   Median :55000  
##                     Mean   :29.00   Mean   :55000  
##                     3rd Qu.:31.25   3rd Qu.:62500  
##                     Max.   :35.00   Max.   :70000

head(df)        # First few rows

##      Name Age Salary
## 1   Alice  25  50000
## 2     Bob  30  60000
## 3 Charlie  35  70000
## 4     Ann  26  40000

dim(df)         # Dimensions (rows x columns)

## [1] 4 3
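Individual columns and cells of a data frame can be pulled out much like list elements and matrix entries. The sketch below recreates the same df so it can be run on its own; the result variable names are illustrative.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Ann"),
  Age = c(25, 30, 35, 26),
  Salary = c(50000, 60000, 70000, 40000))

ages       <- df$Age           # A column as a vector
avg_salary <- mean(df$Salary)  # Summary of a single column
second_row <- df[2, "Name"]    # Row 2, column "Name"

print(ages)        # 25 30 35 26
print(avg_salary)  # 55000
print(second_row)  # "Bob"
print(nrow(df))    # 4
print(ncol(df))    # 3
print(names(df))   # "Name" "Age" "Salary"
```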

Adding and Removing Columns

# Add a new column
df$Department <- c("HR", "Finance", "IT", "Audit")

# Remove a column (R is case-sensitive: the column is Salary, not salary)
df$Salary <- NULL
print(df)

##      Name Age Department
## 1   Alice  25         HR
## 2     Bob  30    Finance
## 3 Charlie  35         IT
## 4     Ann  26      Audit

Filtering Rows

# Filter rows where Age > 30
filtered_df <- subset(df, Age > 30)
print(filtered_df)

##      Name Age Department
## 3 Charlie  35         IT
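The subset() call is one of several ways to filter. The sketch below shows the equivalent square-bracket form, a filter that combines two conditions, and the select argument for keeping only some columns. It uses its own small data frame (staff, a name chosen here for illustration) so it can be run independently.

```r
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Ann"),
  Age = c(25, 30, 35, 26),
  Department = c("HR", "Finance", "IT", "Audit"))

# Same filter as subset(staff, Age > 30), written with logical indexing
older <- staff[staff$Age > 30, ]
print(older$Name)        # "Charlie"

# Combine conditions with | (OR) or & (AND)
young_or_it <- subset(staff, Age < 30 | Department == "IT")
print(young_or_it$Name)  # "Alice" "Charlie" "Ann"

# Keep only selected columns while filtering
names_ages <- subset(staff, Age > 25, select = c(Name, Age))
print(names(names_ages)) # "Name" "Age"
```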

6. Hands-On Practice

Task 1: Create and Manipulate Vectors

# Create two vectors
vec1 <- c(10, 20, 30)
vec2 <- c("red", "green", "blue")

# Concatenate them
combined_vec <- c(vec1, vec2)
print(combined_vec)

## [1] "10"    "20"    "30"    "red"   "green" "blue"

# Find the length
print(length(combined_vec))

## [1] 6

# Coerce numeric to character
as.character(vec1)

## [1] "10" "20" "30"

Exercise 1: Working with Vectors

# Create two numeric vectors
vec1 <- c(10, 20, 30)
vec2 <- c(40, 50, 60)

# Concatenate them
combined_vec <- c(vec1, vec2)
print(combined_vec)

## [1] 10 20 30 40 50 60

# Convert to character and print
as.character(combined_vec)

## [1] "10" "20" "30" "40" "50" "60"

Task 2: Matrix Creation and Indexing

# Create a 3x3 matrix
mat <- matrix(seq(1, 9), nrow = 3, ncol = 3)
print(mat)

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

# Extract element at position (2,3)
print(mat[2, 3])

## [1] 8

Exercise 2: Matrix Practice

# Create a 3x3 matrix with values from 1 to 9
mat <- matrix(seq(1, 9), nrow = 3, ncol = 3)
print(mat)

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

# Extract diagonal elements
diag(mat)

## [1] 1 5 9

# Transpose the matrix
t(mat)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

Task 3: Working with a Data Frame

# Create a small dataset
students <- data.frame(
  ID = c(101, 102, 103),
  Name = c("Emma", "Liam", "Olivia"),
  Score = c(88, 92, 85)
)

# Add a new column
students$Grade <- c("B", "A", "B")
print(students)

##    ID   Name Score Grade
## 1 101   Emma    88     B
## 2 102   Liam    92     A
## 3 103 Olivia    85     B

# Filter students with score > 90
high_scores <- subset(students, Score > 90)
print(high_scores)

##    ID Name Score Grade
## 2 102 Liam    92     A

Exercise 3: Exploring Data Frames

# Create a sample employee dataset
employees <- data.frame(
  ID = c(101, 102, 103),
  Name = c("Emma", "Liam", "Olivia"),
  Department = c("HR", "IT", "Marketing"),
  Salary = c(55000, 65000, 60000))

# Add a new column indicating whether salary is above $60,000
employees$HighEarner <- employees$Salary > 60000
print(employees)

##    ID   Name Department Salary HighEarner
## 1 101   Emma         HR  55000      FALSE
## 2 102   Liam         IT  65000       TRUE
## 3 103 Olivia  Marketing  60000      FALSE

# Filter employees who earn more than $60,000
high_earners <- subset(employees, Salary > 60000)
print(high_earners)

##    ID Name Department Salary HighEarner
## 2 102 Liam         IT  65000       TRUE

7. Homework Assignment One

Exercise 1: Vector Practice

Create a numeric vector with values from 1 to 20.
Calculate the sum and mean.
Convert it to a character vector and print the result.

Exercise 2: Matrix Challenge

Create a 4x4 matrix filled with numbers from 1 to 16.
Extract the diagonal elements.
Transpose the matrix.

Exercise 3: Data Frame Exploration

Create a data frame representing 5 employees with fields: Name, Department, Salary.
Compute the average salary.
Add a column indicating whether salary is above $60,000 (TRUE/FALSE).

8. Homework Assignment Two

Exercise A: Vector Mastery

Create a numeric vector containing numbers from 1 to 20.
Compute the sum and average.
Convert it to a character vector and print the result.

Exercise B: Matrix Challenge

Create a 4x4 matrix with numbers from 1 to 16.
Extract the diagonal, transpose, and last row.
Replace the last row with zeros.
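Hint: extracting and replacing a row was not shown earlier, so here is a minimal sketch on a smaller, made-up matrix. The same pattern should carry over to the 4x4 case.

```r
# A small 2 x 3 example matrix (not the homework matrix)
m <- matrix(1:6, nrow = 2, ncol = 3)

# nrow(m) gives the index of the last row
last_row <- m[nrow(m), ]
print(last_row)   # 2 4 6

# Assigning a single value to a whole row fills every column of that row
m[nrow(m), ] <- 0
print(m[2, ])     # 0 0 0
print(m[1, ])     # 1 3 5
```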

Exercise C: Data Frame Analysis

Create a data frame representing 5 students with fields: Name, mathematics, Score.
Compute the average Score.
Add a column indicating whether Score is above 50 (use TRUE/FALSE).
Sort the students by Score in descending order.
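Hint: sorting a data frame was not covered above. One common base-R approach uses order(), which returns the row positions in sorted order. The sketch below uses a made-up data frame (items) rather than the homework data.

```r
# Hypothetical example data
items <- data.frame(
  Item = c("A", "B", "C"),
  Value = c(20, 50, 35))

# order() returns row positions; decreasing = TRUE sorts from largest to smallest
sorted_items <- items[order(items$Value, decreasing = TRUE), ]
print(sorted_items$Item)   # "B" "C" "A"
print(sorted_items$Value)  # 50 35 20
```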

Additional Resources

Official R Documentation: https://cran.r-project.org/manuals.html
RStudio Cheatsheets: https://posit.co/download/rstudio-cheatsheets
R for Data Science Book: https://r4ds.hadley.nz/
Try R Playground: https://try.rbind.io/
