Objective

This document serves as an introduction to R programming, providing a foundation for data analysis, visualization, and machine learning.
The goal is to demonstrate proficiency in R for portfolio development.

Why Use R?

R is widely used for:
- Data Analysis: Robust statistical computing capabilities.
- Visualization: Powerful graphing tools (ggplot2).
- Machine Learning: Libraries like caret and tidymodels.
- Business Intelligence: Ideal for decision-making and reporting.

Installing R & RStudio

Step 1: Install R

Download the latest version of R from CRAN.

Step 2: Install RStudio

Download the RStudio IDE (RStudio Desktop) from the Posit website.

Step 3: Verify Installation

Open RStudio and verify the installation by running:

sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_Germany.utf8  LC_CTYPE=English_Germany.utf8   
## [3] LC_MONETARY=English_Germany.utf8 LC_NUMERIC=C                    
## [5] LC_TIME=English_Germany.utf8    
## 
## time zone: Europe/Berlin
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.47        
##  [5] cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1 rmarkdown_2.28   
##  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.4.1    rstudioapi_0.16.0 tools_4.4.1       evaluate_1.0.0   
## [17] bslib_0.8.0       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9

If RStudio runs successfully, you’re set!

Basic Syntax in R

Using R as a Calculator

# variable assignment in R
a <- 35 
b <- 5
c <- 35
d <- 8
# basic arithmetic in R
print(paste("a =", a))
## [1] "a = 35"
print(paste("b =", b))
## [1] "b = 5"
print(paste("c =", c))
## [1] "c = 35"
print(paste("d =", d))
## [1] "d = 8"
print(paste("a + b =", a + b))
## [1] "a + b = 40"
print(paste("a / b =", a / b))
## [1] "a / b = 7"
print(paste("a * b =", a * b))
## [1] "a * b = 175"
print(paste("a - b =", a - b))
## [1] "a - b = 30"
print(paste("a %% b =", a %% b))
## [1] "a %% b = 0"
print(paste("a %/% b =", a %/% b))
## [1] "a %/% b = 7"
print(paste("round(c/d) =", round(c/d, 3)))
## [1] "round(c/d) = 4.375"
print(paste("floor(c/d) =", floor(c/d)))
## [1] "floor(c/d) = 4"
print(paste("ceiling(c/d) =", ceiling(c/d)))
## [1] "ceiling(c/d) = 5"
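
The integer-division (`%/%`) and modulo (`%%`) operators used above are two halves of the same calculation: the quotient times the divisor, plus the remainder, gives back the original value. The sketch below reuses the values 35 and 8 from the example; the names `c_val`, `d_val`, `quotient`, `remainder`, and `rebuilt` are illustrative and not part of the original script.

```r
# Values taken from the calculator example (c = 35, d = 8)
c_val <- 35
d_val <- 8

quotient  <- c_val %/% d_val   # whole number of times d_val fits into c_val
remainder <- c_val %% d_val    # what is left over after integer division

# Rebuild the original value from the quotient and the remainder
rebuilt <- quotient * d_val + remainder
print(c(quotient = quotient, remainder = remainder, rebuilt = rebuilt))
```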

Comparing Variables and Combining Conditions

print(paste("a > b =", a > b))
## [1] "a > b = TRUE"
print(paste("a < b =", a < b))
## [1] "a < b = FALSE"
print(paste("a > b or a = b =", a >= b))
## [1] "a > b or a = b = TRUE"
print(paste("a < b or a = b =", a <= b))
## [1] "a < b or a = b = FALSE"
print(paste("a is not equal to b =", a != b))
## [1] "a is not equal to b = TRUE"
print(paste("b is not equal to a =", b != a))
## [1] "b is not equal to a = TRUE"
print(paste("a is not equal to c =", a != c))
## [1] "a is not equal to c = FALSE"
print(paste("a > c and c > b =", a > c & c > b))
## [1] "a > c and c > b = FALSE"
print(paste("a = c and c > b =", a == c & c > b))
## [1] "a = c and c > b = TRUE"
print(paste("a != b or c > b =", a != b | c > b))
## [1] "a != b or c > b = TRUE"
print(paste("a != c or c > c =", a != c | c > c))
## [1] "a != c or c > c = FALSE"
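
One caveat when checking equality: `==` compares exactly, which can surprise with decimal values because they are stored in binary floating point. A minimal sketch, assuming a tolerance-based comparison is acceptable; the variable names are illustrative.

```r
x <- 0.1 + 0.2

exact_match <- x == 0.3                    # exact comparison of two doubles
near_match  <- isTRUE(all.equal(x, 0.3))   # comparison within a small tolerance

print(exact_match)
print(near_match)
```

For whole numbers such as the values of a, b, and c above, `==` behaves as expected; `all.equal()` is mainly useful for computed decimal results.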

Understanding Data Types in R

Data types define how values are stored and manipulated in R, ensuring the proper functioning of computations and preventing errors in data processing. Without a solid grasp of data types, operations may yield incorrect results or inefficiencies, leading to unreliable analysis.

Why Are Data Types Important?

Data types are crucial for:
- Preventing unexpected errors in calculations.
- Optimizing memory usage for efficient data storage.
- Ensuring structured processing for different types of information.

For instance, numerical data allows for precise calculations, textual data is best used for classification or labeling, and logical values play a significant role in programmatic decision-making.
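
To make the points about memory and errors concrete, the sketch below stores the same 1000 zeros as integers and as doubles, then shows that arithmetic on text fails until the value is converted. The variable names and the replacement error message are illustrative assumptions.

```r
# The same 1000 zeros stored as two different types
ints    <- integer(1000)   # integer storage
doubles <- numeric(1000)   # double-precision storage
print(object.size(ints))
print(object.size(doubles))

# Arithmetic on a character value raises an error; tryCatch() captures it
result <- tryCatch("10" + 5, error = function(e) "error: non-numeric argument")
print(result)

# After conversion the calculation works
print(as.numeric("10") + 5)
```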


Types of Data in R

R provides several core data types, each serving distinct purposes in programming and analysis.

1. Numeric Data

Numeric values represent numbers, including whole numbers and decimals. These are essential for statistical calculations, financial modeling, and scientific computations.

2. Integer Data

Integers specifically store whole numbers without decimal points. They are useful in scenarios where decimal precision isn’t required, such as indexing, counts, and categorical grouping.
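
A whole number typed without the `L` suffix is still stored as a double in R. The short sketch below shows the difference; the variable names are illustrative.

```r
whole_double <- 20     # no L suffix: stored as double
whole_int    <- 20L    # L suffix: stored as integer

print(typeof(whole_double))
print(typeof(whole_int))
print(is.integer(whole_double))
print(whole_double == whole_int)   # the values still compare as equal
```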

3. Character Data (Strings)

Character data consists of text values that serve as labels, identifiers, and descriptions. String manipulation plays a vital role in report generation, communication, and data categorization.

4. Logical Data (Boolean)

Logical values hold TRUE or FALSE states, allowing condition-based operations. They are often used in comparisons, decision structures, and filtering large datasets.

5. Factor Data

Factors represent categorical variables, such as product ratings or survey responses. Unlike raw character data, factors enable statistical modeling and improve storage efficiency when dealing with grouped classifications.
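
By default R sorts factor levels alphabetically, which is rarely the intended order for ratings. A minimal sketch, assuming the order Low < Medium < High is wanted; the `ratings` vector and variable names are illustrative.

```r
ratings <- c("High", "Medium", "Low", "Medium")

# Default: levels are sorted alphabetically
default_factor <- factor(ratings)
print(levels(default_factor))

# Explicit level order, marked as ordered so comparisons are meaningful
ordered_factor <- factor(ratings, levels = c("Low", "Medium", "High"), ordered = TRUE)
print(ordered_factor[1] > ordered_factor[3])   # is "High" greater than "Low"?
print(table(ordered_factor))                   # counts per level
```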

6. Complex Numbers

Complex numbers store both real and imaginary components, used primarily in advanced mathematical applications like signal processing and physics.


The Role of Data Types in Data Manipulation

Choosing the correct data type is critical for efficient and accurate computations. Numeric data enables precise calculations, logical values assist in filtering and decision-making, and character-based variables allow for structured text analysis.

A strong understanding of data types ensures smooth operations and prevents errors when working with datasets. The next step is exploring how these types function within vectors, matrices, lists, and data frames, the key structures behind real-world applications in data analytics, business intelligence, and machine learning.

#Numeric
num <- 10.5
print(paste("num =",class(num)))
## [1] "num = numeric"
#Integer
int_val <- 20L
print(paste("int_val =",class(int_val)))
## [1] "int_val = integer"
#Character (String)
text <- "Hello World"
print(paste("text =",class(text)))
## [1] "text = character"
#logical Data (Boolean)
is_valid <- TRUE
print(paste("is_valid =",class(is_valid)))
## [1] "is_valid = logical"
#Factor Data
categories <- factor(c("High", "Medium", "Low"))
print(paste("categories =",class(categories)))
## [1] "categories = factor"
#Complex Numbers
complex_numb <- 2 + 3i
print(paste("complex_numb =",class(complex_numb)))
## [1] "complex_numb = complex"

Type Casting in R

Type casting, also known as type conversion, refers to transforming a variable from one data type to another. This is especially useful when handling datasets where values may be improperly classified.

Why Is Type Casting Important?

  • Ensures Proper Operations: Some functions require specific data types to work correctly.
  • Prevents Errors: Incorrect data types can cause computation issues.
  • Optimizes Storage Efficiency: Converting to appropriate formats can improve memory handling.
  • Enhances Data Analysis: Factors enable categorical processing, and numeric conversions facilitate mathematical operations.

Common Type Casting Scenarios in R

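R provides dedicated functions for type conversion, such as as.numeric(), as.factor(), as.character(), and as.integer(). Common scenarios include:
1. Character to Numeric: Text-based numbers ("45", "100.5") must be converted to numeric for mathematical operations.
2. Character to Factor: When categorical data is stored as text, converting it to a factor allows efficient grouping in models.
3. Logical to Numeric: TRUE and FALSE can be treated as 1 and 0, useful in binary classification.
4. Numeric to Character: Numbers can be turned into text for labels and report output.
5. Numeric to Integer: as.integer() drops the fractional part of a number.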

# 1. Character to Numeric
char_num <- "45"
print(paste("char_num =", char_num)) 
## [1] "char_num = 45"
print(paste("char_num data type =", class(char_num)))
## [1] "char_num data type = character"
num_value <- as.numeric(char_num)
print(paste("num_value data type =", class(num_value)))
## [1] "num_value data type = numeric"
# 2. Character to Factor
char_category <- c("High", "Medium", "Low", "Medium")
factor_category <- as.factor(char_category)
print(factor_category)  # Levels are sorted alphabetically: High Low Medium
## [1] High   Medium Low    Medium
## Levels: High Low Medium
print(paste("class of factor_category =", class(factor_category)))
## [1] "class of factor_category = factor"
# 3. Logical to Numeric
logical_value <- TRUE
numeric_logical <- as.numeric(logical_value)
print(numeric_logical)  # Output: 1
## [1] 1
print(paste("numeric_logical =", numeric_logical))
## [1] "numeric_logical = 1"
print(paste("class numeric_logical =", class(numeric_logical)))
## [1] "class numeric_logical = numeric"
# 4. Numeric to Character
num_to_char <- as.character(100)
print(paste("num_to_char class", class(num_to_char)))
## [1] "num_to_char class character"
# 5. Numeric to Integer
double_value <- 10.8
integer_value <- as.integer(double_value)
print(integer_value)  # Output: 10 (fractional part dropped, not rounded)
## [1] 10
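
Two conversion pitfalls are worth a quick check: text that is not a valid number becomes NA, and as.integer() truncates toward zero rather than rounding. The sketch below uses a hypothetical `raw_values` vector for illustration.

```r
raw_values <- c("45", "100.5", "n/a")

# Entries that are not valid numbers become NA (R also emits a warning,
# suppressed here to keep the output short)
converted <- suppressWarnings(as.numeric(raw_values))
print(converted)
print(is.na(converted))   # locate the failed conversions

# as.integer() drops the fractional part; round() rounds to the nearest value
print(as.integer(10.8))
print(as.integer(-10.8))
print(round(10.8))
```

Checking `is.na()` right after a conversion is a simple way to catch improperly formatted values before they reach a calculation.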

Understanding Data Structures in R

Data structures define how data is stored, organized, and accessed in R. They allow efficient data manipulation and analysis, ensuring seamless operations across different types of datasets.

Why Are Data Structures Important?

  • Facilitate structured data representation for analysis.
  • Optimize performance when handling large datasets.
  • Enable efficient data manipulation with built-in functions.

Types of Data Structures in R

R provides several fundamental data structures:

1. Vectors

Vectors are the simplest data structures, storing elements of the same type. They are useful for numerical calculations and categorical grouping.
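
Because a vector holds a single type, mixing types forces R to coerce every element to a common one. The sketch below shows that coercion along with element-wise arithmetic and logical indexing; the `mixed` and `prices` vectors are illustrative.

```r
mixed <- c(1, "a", TRUE)   # the number and the logical are coerced to character
print(mixed)
print(class(mixed))

prices <- c(10, 20, 30)
print(prices * 1.5)          # arithmetic is applied to every element
print(prices[prices > 15])   # keep only the elements that satisfy a condition
```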

2. Matrices

Matrices extend vectors into a two-dimensional format, often used for mathematical operations and simulations.
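
R fills a matrix column by column unless told otherwise, which affects where each value lands. A minimal sketch with illustrative names `by_col` and `by_row`:

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                 # filled column by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # filled row by row

print(by_col)
print(by_row)
print(t(by_col))    # transpose: 3 rows, 2 columns
print(by_col * 2)   # element-wise arithmetic
```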

3. Lists

Lists allow storing elements of different types, making them ideal for handling diverse data points within a single structure.
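
When reading from a list, single brackets return a smaller list while double brackets (or `$`) return the element itself. The sketch below uses an illustrative `person` list to show the difference.

```r
person <- list(name = "jamie", age = 30, score = c(50, 70, 95))

sub_list <- person["score"]     # single brackets: a list of length 1
scores   <- person[["score"]]   # double brackets: the numeric vector itself

print(class(sub_list))
print(class(scores))
print(mean(scores))   # works on the vector, not on the one-element list
```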

4. Data Frames

Data frames resemble spreadsheets or SQL tables, storing multiple columns with different data types. They are widely used in data analysis.
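
Two everyday data frame operations are filtering rows by a condition and adding a derived column. A minimal sketch in base R; the `students` data frame, the `Passed` column, and the threshold of 75 are illustrative assumptions.

```r
students <- data.frame(
  ID    = 1:3,
  Name  = c("Charlie", "Fox", "Greg"),
  Score = c(80, 90, 70)
)

high_scores <- students[students$Score >= 80, ]   # keep rows matching a condition
students$Passed <- students$Score >= 75           # add a derived logical column

print(high_scores)
print(students)
print(mean(students$Score))
```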

5. Factors

Factors help categorize data efficiently, ensuring structured grouping in analysis and modeling.


# 1. Vectors
vec <- c(1, 2, 3, 4, 5) #creating a vector
print(vec)  # Output: Numeric vector
## [1] 1 2 3 4 5
length(vec)
## [1] 5
sum(vec)
## [1] 15
min(vec)
## [1] 1
max(vec)
## [1] 5
vec[1:4]  # elements 1 through 4 of vec
## [1] 1 2 3 4
vec[1] <- 10  # replace the first element

#2.Matrices
mat <- matrix(1:6, ncol = 2, nrow = 3)  # 6 values fill a 3 x 2 matrix exactly
mart <- matrix(c(3,5,1,6,9,10), nrow = 2, ncol = 3)
dim(mat)
## [1] 3 2
mat[2,2]  # Accessing elements
## [1] 5
mat[2,2] <- 20  # Modifying matrix elements
mat[2,2]
## [1] 20
length(mat)
## [1] 6
# Looping over a matrix (elements are visited column by column)
for (i in seq_along(mart)) {
  cat("index =", i, ",", "value =", mart[i], "\n")
}
## index = 1 , value = 3 
## index = 2 , value = 5 
## index = 3 , value = 1 
## index = 4 , value = 6 
## index = 5 , value = 9 
## index = 6 , value = 10

#3.List
lst <-  list(name = 'jamie', age = 30, score = c(50,70, 95))
lst$name # Accessing elements
## [1] "jamie"
lst["age"] <-  40 # Modifying list elements
lst$name <- "fox" # Modifying list elements
lst$city <- "Berlin"  # Adding new elements
print(lst)
## $name
## [1] "fox"
## 
## $age
## [1] 40
## 
## $score
## [1] 50 70 95
## 
## $city
## [1] "Berlin"

#4. Dataframes
df <-  data.frame(ID = 1:3, Name = c("Charlie", "Fox", "Greg"), Score = c(80, 90, 70))
str(df)  # Structure of data frame
## 'data.frame':    3 obs. of  3 variables:
##  $ ID   : int  1 2 3
##  $ Name : chr  "Charlie" "Fox" "Greg"
##  $ Score: num  80 90 70
nrow(df)  # Number of rows
## [1] 3
ncol(df)  # Number of columns
## [1] 3
summary(df) # Summary statistics
##        ID          Name               Score   
##  Min.   :1.0   Length:3           Min.   :70  
##  1st Qu.:1.5   Class :character   1st Qu.:75  
##  Median :2.0   Mode  :character   Median :80  
##  Mean   :2.0                      Mean   :80  
##  3rd Qu.:2.5                      3rd Qu.:85  
##  Max.   :3.0                      Max.   :90
df$Name   # Access 'Name' column
## [1] "Charlie" "Fox"     "Greg"
df[1, ]   # First row
##   ID    Name Score
## 1  1 Charlie    80
df[, 2]   # Second column
## [1] "Charlie" "Fox"     "Greg"
df$Score[2] <- 95  # Update Fox's score
print(df)
##   ID    Name Score
## 1  1 Charlie    80
## 2  2     Fox    95
## 3  3    Greg    70

Understanding Branching and Loops in R

Branching in R allows programs to execute different paths based on conditions. Loops enable repetitive execution, automating tasks efficiently. Together, they create structured and dynamic workflows.

Why Are Branching and Loops Important?

  • Enables Decision-Making: Ensures correct operations based on conditions.
  • Automates Repetitive Tasks: Helps process datasets efficiently.
  • Optimizes Program Flow: Reduces redundant operations, improving performance.

Types of Branching and Loops in R

1. if Statement

Executes a block of code only if a condition is met.

2. if...else Statement

Executes one block if a condition is TRUE, otherwise another block runs.
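
When more than two outcomes are possible, conditions can be chained with `else if`. The sketch below also shows `ifelse()`, which applies a test to every element of a vector; the grade thresholds and variable names are illustrative assumptions.

```r
score <- 72

# Conditions are checked top to bottom; the first TRUE branch runs
if (score >= 90) {
  grade <- "A"
} else if (score >= 70) {
  grade <- "B"
} else {
  grade <- "C"
}
print(grade)

# ifelse() evaluates the condition element by element
scores  <- c(95, 72, 40)
results <- ifelse(scores >= 70, "pass", "fail")
print(results)
```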

3. while Loop

Repeats execution as long as a condition remains TRUE.

4. for Loop

Iterates over elements in a sequence, useful for processing vectors and lists.
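
A common variation loops over positions with `seq_along()` and uses `next` to skip elements. This is a sketch under illustrative assumptions (the value 6 is skipped arbitrarily).

```r
values <- c(2, 4, 6, 8)
running_total <- 0

for (i in seq_along(values)) {
  if (values[i] == 6) next   # skip this element and continue with the next one
  running_total <- running_total + values[i]
}
print(running_total)

# seq_along() yields an empty sequence for an empty vector,
# whereas 1:length(x) would yield c(1, 0) and run the loop body twice
empty <- numeric(0)
print(length(seq_along(empty)))
```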


# 1. Basic if Statement
x <- 10
if (x > 5) {
  print("x is greater than 5")
}
## [1] "x is greater than 5"
# 2. if...else Statement
y <- 3
if (y > 5) {
  print("y is greater than 5")
} else {
  print("y is less than or equal to 5")
}
## [1] "y is less than or equal to 5"
# 3. while Loop (Repeating While Condition is True)
count <- 1
while (count <= 3) {
  print(paste("Iteration:", count))
  count <- count + 1
}
## [1] "Iteration: 1"
## [1] "Iteration: 2"
## [1] "Iteration: 3"
# 4. for Loop (Iterate Over a Sequence)
values <- c(2, 4, 6, 8)
for (num in values) {
  print(paste("Processing value:", num))
}
## [1] "Processing value: 2"
## [1] "Processing value: 4"
## [1] "Processing value: 6"
## [1] "Processing value: 8"

Understanding Functions in R

Functions in R allow for code reusability, improving efficiency and readability. They help break down complex operations into manageable steps, making scripts more structured and logical.

Why Are Functions Important?

  • Reduces Code Redundancy: Avoid repetition by defining reusable logic.
  • Improves Debugging: Easier to identify errors in well-structured functions.
  • Enhances Readability: Well-organized functions clarify program execution.
  • Optimizes Performance: Efficiently process inputs and return meaningful results.

Types of User-Defined Functions in R

1. Normal Function

A basic function that performs an operation when called.

2. Function with Default Parameter

Allows predefined values for arguments, reducing the need to specify them explicitly.

3. Function with Input Parameters

Takes arguments to dynamically perform operations based on provided values.

4. Function that Returns a Value

Returns computed results, making them reusable for further calculations.
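
In R the value of the last evaluated expression is returned even without an explicit `return()`, and several results can be bundled in a named list. A minimal sketch; `add_implicit` and `describe` are hypothetical helper names.

```r
# Explicit return()
add_values <- function(a, b) {
  return(a + b)
}

# Implicit return: the last evaluated expression is the result
add_implicit <- function(a, b) {
  a + b
}

# Several results can be returned together as a named list
describe <- function(x) {
  list(total = sum(x), average = mean(x))
}

stats <- describe(c(20, 50))
print(add_values(20, 50))
print(add_implicit(20, 50))
print(stats$average)
```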

5. Anonymous (Inline) Function

Functions that don’t require a name, mainly used for short, inline operations.
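
Functions without a name are most often passed directly to another function. The sketch below applies one to each element of a vector with `sapply()`; the backslash shorthand assumes R 4.1.0 or later, and the variable names are illustrative.

```r
numbers <- c(1, 2, 3)

# An unnamed function passed straight to sapply()
doubled <- sapply(numbers, function(x) x * 2)
print(doubled)

# Shorthand syntax, available from R 4.1.0 onward
squared <- sapply(numbers, \(x) x^2)
print(squared)
```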


#1. Normal function
greet <- function(){
  #creating the function
  print("Hello and Welcome to my script!!!")
}
 greet() #calling the function
## [1] "Hello and Welcome to my script!!!"
 #2. calling function with default parameter
 give_info <- function(state = "Munich"){
   print(paste("the man lives in", state))
 }
 give_info() #using default value
## [1] "the man lives in Munich"
 give_info("London") #using new value
## [1] "the man lives in London"
 #3. Function with input Parameter
 multiply <- function(x, y){
   product <- x * y
   return(product)
 }
 
 multiply(4,6)
## [1] 24
 #4. Function that returns a Value
 add_values <- function(a,b){
   additions <-a + b
   return(additions)
 }
 add_values(20, 50)
## [1] 70
#5. Anonymous (inline) function
lambda_result <- (function(x) x * 2)(8)
print(lambda_result)
## [1] 16
str_val <- (function(x, y) paste(x,y))('Lagos', "Nigeria") 
print(str_val)
## [1] "Lagos Nigeria"
welcome_msg <- (function(x) print(paste("Hello and welcome to", x)))("Germany")  # new name so the greet() function above is not overwritten
## [1] "Hello and welcome to Germany"
mult <- (function(x,y) x*y)(10,5)
print(mult)
## [1] 50