This document serves as an introduction to R programming, providing a
foundation for data analysis, visualization, and machine learning.
The goal is to demonstrate proficiency in R for portfolio development.
R is widely used for:
- Data Analysis: Robust statistical computing capabilities.
- Visualization: Powerful graphing tools (ggplot2).
- Machine Learning: Libraries like caret and tidymodels.
- Business Intelligence: Ideal for decision-making and reporting.
Download the latest version of R from CRAN.
Then install the RStudio IDE.
Open RStudio and verify the installation by running:
sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_Germany.utf8 LC_CTYPE=English_Germany.utf8
## [3] LC_MONETARY=English_Germany.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Germany.utf8
##
## time zone: Europe/Berlin
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.47
## [5] cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1 rmarkdown_2.28
## [9] lifecycle_1.0.4 cli_3.6.3 sass_0.4.9 jquerylib_0.1.4
## [13] compiler_4.4.1 rstudioapi_0.16.0 tools_4.4.1 evaluate_1.0.0
## [17] bslib_0.8.0 yaml_2.3.10 rlang_1.1.4 jsonlite_1.8.9
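Beyond reading the full `sessionInfo()` output, the installed version can also be checked programmatically. The sketch below uses base R's `R.version.string` and `getRversion()`; the `4.0.0` threshold is a hypothetical value chosen only for illustration, not a requirement of this guide.

```r
# Print a one-line description of the running R version
print(R.version.string)

# getRversion() returns a version object that can be compared directly
min_version <- "4.0.0"  # hypothetical minimum, chosen only for illustration
if (getRversion() >= min_version) {
  print(paste("R is at least version", min_version))
} else {
  print(paste("R is older than version", min_version))
}
```

The printed version string depends on the local installation, so no fixed output is shown here.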
If RStudio runs successfully, you’re set!
## Basic Syntax in R
#### Using R as a calculator
# variable assignment in R
a <- 35
b <- 5
c <- 35
d <- 8
# basic arithmetic, comparison, and logical operations in R
print(paste("a =", a))
## [1] "a = 35"
print(paste("b =", b))
## [1] "b = 5"
print(paste("c =", c))
## [1] "c = 35"
print(paste("d =", d))
## [1] "d = 8"
print(paste("a + b =", a + b))
## [1] "a + b = 40"
print(paste("a / b =", a / b))
## [1] "a / b = 7"
print(paste("a * b =", a * b))
## [1] "a * b = 175"
print(paste("a - b =", a - b))
## [1] "a - b = 30"
print(paste("a %% b =", a %% b))
## [1] "a %% b = 0"
print(paste("a %/% b =", a %/% b))
## [1] "a %/% b = 7"
print(paste("round(c/d) =", round(c/d, 3)))
## [1] "round(c/d) = 4.375"
print(paste("floor(c/d) =", floor(c/d)))
## [1] "floor(c/d) = 4"
print(paste("ceiling(c/d) =", ceiling(c/d)))
## [1] "ceiling(c/d) = 5"
print(paste("a > b =", a > b))
## [1] "a > b = TRUE"
print(paste("a < b =", a < b))
## [1] "a < b = FALSE"
print(paste("a > b or a = b =", a >= b))
## [1] "a > b or a = b = TRUE"
print(paste("a < b or a = b =", a <= b))
## [1] "a < b or a = b = FALSE"
print(paste("a is not equal to b =", a != b))
## [1] "a is not equal to b = TRUE"
print(paste("b is not equal to a =", b != a))
## [1] "b is not equal to a = TRUE"
print(paste("a is not equal to c =", a != c))
## [1] "a is not equal to c = FALSE"
print(paste("a > c and c > b =", a > c & c > b))
## [1] "a > c and c > b = FALSE"
print(paste("a = c and c > b =", a == c & c > b))
## [1] "a = c and c > b = TRUE"
print(paste("a != b or c > b =", a != b | c > b))
## [1] "a != b or c > b = TRUE"
print(paste("a != c or c > c =", a != c | c > c))
## [1] "a != c or c > c = FALSE"
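The integer division (`%/%`) and modulo (`%%`) operators used above are linked by a simple identity, which the following minimal sketch checks. The values of `x` and `y` are hypothetical and chosen only for illustration.

```r
# Hypothetical values chosen for illustration
x <- 17
y <- 5

quotient  <- x %/% y   # integer division
remainder <- x %% y    # modulo (remainder)
print(quotient)
## [1] 3
print(remainder)
## [1] 2

# The two operators satisfy: x == quotient * y + remainder
print(quotient * y + remainder == x)
## [1] TRUE

# With a negative dividend, %/% rounds toward negative infinity,
# so the remainder takes the sign of the divisor
neg_quotient  <- (-17) %/% 5
neg_remainder <- (-17) %% 5
print(neg_quotient)
## [1] -4
print(neg_remainder)
## [1] 3
```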
Data types define how values are stored and manipulated in R, ensuring the proper functioning of computations and preventing errors in data processing. Without a solid grasp of data types, operations may yield incorrect results or inefficiencies, leading to unreliable analysis.
Data types are crucial for:
- Preventing unexpected errors in calculations.
- Optimizing memory usage for efficient data storage.
- Ensuring structured processing for different types of information.
For instance, numerical data allows for precise calculations, textual data is best used for classification or labeling, and logical values play a significant role in programmatic decision-making.
R provides several core data types, each serving distinct purposes in programming and analysis.
Numeric values represent numbers, including whole numbers and decimals. These are essential for statistical calculations, financial modeling, and scientific computations.
Integers specifically store whole numbers without decimal points. They are useful in scenarios where decimal precision isn’t required, such as indexing, counts, and categorical grouping.
Character data consists of text values that serve as labels, identifiers, and descriptions. String manipulation plays a vital role in report generation, communication, and data categorization.
Logical values hold TRUE or FALSE states,
allowing condition-based operations. They are often used in comparisons,
decision structures, and filtering large datasets.
Factors represent categorical variables, such as product ratings or survey responses. Unlike raw character data, factors enable statistical modeling and improve storage efficiency when dealing with grouped classifications.
Complex numbers store both real and imaginary components, used primarily in advanced mathematical applications like signal processing and physics.
Choosing the correct data type is critical for efficient and accurate computations. Numeric data enables precise calculations, logical values assist in filtering and decision-making, and character-based variables allow for structured text analysis.
A strong understanding of data types ensures smooth operations and
prevents errors when working with datasets. The next step is exploring
how these types function within vectors, matrices, lists, and data
frames, the key structures behind real-world applications in
data analytics, business intelligence, and machine learning.
#Numeric
num <- 10.5
print(paste("num =",class(num)))
## [1] "num = numeric"
#Integer
int_val <- 20L
print(paste("int_val =",class(int_val)))
## [1] "int_val = integer"
#Character (String)
text <- "Hello World"
print(paste("text =",class(text)))
## [1] "text = character"
#logical Data (Boolean)
is_valid <- TRUE
print(paste("is_valid =",class(is_valid)))
## [1] "is_valid = logical"
#Factor Data
categories <- factor(c("High", "Medium", "Low"))
print(paste("categories =",class(categories)))
## [1] "categories = factor"
#Complex Numbers
complex_numb <- 2 + 3i
print(paste("complex_numb =",class(complex_numb)))
## [1] "complex_numb = complex"
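`class()` reports how R treats an object, while `typeof()` reports how it is stored internally. As a small sketch reusing the variable names from the example above, the two can differ:

```r
num <- 10.5
int_val <- 20L
categories <- factor(c("High", "Medium", "Low"))

# typeof() shows the internal storage mode
print(typeof(num))         # numeric values are stored as doubles
## [1] "double"
print(typeof(int_val))
## [1] "integer"
print(typeof(categories))  # factors are stored as integer codes plus labels
## [1] "integer"

# is.numeric() is TRUE for doubles and integers, but not for factors
print(is.numeric(int_val))
## [1] TRUE
print(is.numeric(categories))
## [1] FALSE
```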
Type casting, also known as type conversion, refers to transforming a variable from one data type to another. This is especially useful when handling datasets where values may be improperly classified.
R provides several functions for type conversion:
1. Character to Numeric: Text-based numbers ("45", "100.5") must be converted to numeric for mathematical operations.
2. Character to Factor: When categorical data is stored as text, converting it to a factor allows efficient grouping in models.
3. Logical to Numeric: TRUE and FALSE can be treated as 1 and 0, useful in binary classification.
# 1. Character to Numeric
char_num <- "45"
print(paste("char_num =", char_num))
## [1] "char_num = 45"
print(paste("char_num data type =", class(char_num)))
## [1] "char_num data type = character"
num_value <- as.numeric(char_num)
print(paste("num_value data type =", class(num_value)))
## [1] "num_value data type = numeric"
# 2. Character to Factor
char_category <- c("High", "Medium", "Low", "Medium")
factor_category <- as.factor(char_category)
print(factor_category) # Levels are sorted alphabetically: High Low Medium
## [1] High Medium Low Medium
## Levels: High Low Medium
print(paste("class of factor_category =", class(factor_category)))
## [1] "class of factor_category = factor"
# 3. Logical to Numeric
logical_value <- TRUE
numeric_logical <- as.numeric(logical_value)
print(numeric_logical) # Output: 1
## [1] 1
print(paste("numeric_logic =", numeric_logical))
## [1] "numeric_logic = 1"
print(paste("class numeric_logic =", class(numeric_logical)))
## [1] "class numeric_logic = numeric"
# 4. Numeric to Character
num_to_char <- as.character(100)
print(paste("num_to_char class", class(num_to_char)))
## [1] "num_to_char class character"
# 5. Numeric to Integer
double_value <- 10.8
integer_value <- as.integer(double_value)
print(integer_value) # Output: 10 (fractional part is truncated, not rounded)
## [1] 10
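Conversions can fail or behave unexpectedly, so it helps to know a few common pitfalls. The sketch below uses only base R functions; the input values are hypothetical.

```r
# Text that is not a number converts to NA (R also emits a warning,
# silenced here with suppressWarnings() to keep the output clean)
bad_input <- suppressWarnings(as.numeric("abc"))
print(is.na(bad_input))
## [1] TRUE

# as.integer() drops the fractional part, while round() rounds to nearest
print(as.integer(-10.8))
## [1] -10
print(round(10.8))
## [1] 11

# Converting a factor straight to numeric returns its internal level codes
f <- factor(c("10", "20", "10"))
print(as.numeric(f))
## [1] 1 2 1

# Going through character first recovers the original numbers
print(as.numeric(as.character(f)))
## [1] 10 20 10
```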
Data structures define how data is stored, organized, and accessed in R. They allow efficient data manipulation and analysis, ensuring seamless operations across different types of datasets.
R provides several fundamental data structures:
Vectors are the simplest data structures, storing elements of the same type. They are useful for numerical calculations and categorical grouping.
Matrices extend vectors into a two-dimensional format, often used for mathematical operations and simulations.
Lists allow storing elements of different types, making them ideal for handling diverse data points within a single structure.
Data frames resemble spreadsheets or SQL tables, storing multiple columns with different data types. They are widely used in data analysis.
Factors help categorize data efficiently, ensuring structured grouping in analysis and modeling.
# 1. Vectors
vec <- c(1, 2, 3, 4, 5) #creating a vector
print(vec) # Output: Numeric vector
## [1] 1 2 3 4 5
length(vec)
## [1] 5
sum(vec)
## [1] 15
min(vec)
## [1] 1
max(vec)
## [1] 5
vec[1:4] # elements 1 through 4 of vec
## [1] 1 2 3 4
vec[1] <- 10 #vector modification
#2. Matrices
mat <- matrix(1:6, nrow = 3, ncol = 2) # 6 values fill a 3 x 2 matrix column by column
mart <- matrix(c(3,5,1,6,9,10), nrow = 2, ncol = 3)
dim(mat)
## [1] 3 2
mat[2,2] # Accessing elements
## [1] 5
mat[2,2] <- 20 # Modifying a matrix element
mat[2,2]
## [1] 20
length(mat)
## [1] 6
#looping over matrix elements (column-major order)
for (i in seq_along(mart)) {
  cat("index =", i, ",", "value =", mart[i], "\n")
}
## index = 1 , value = 3
## index = 2 , value = 5
## index = 3 , value = 1
## index = 4 , value = 6
## index = 5 , value = 9
## index = 6 , value = 10
#3.List
lst <- list(name = 'jamie', age = 30, score = c(50,70, 95))
lst$name # Accessing elements
## [1] "jamie"
lst["age"] <- 40 # Modifying list elements
lst$name <- "fox" # Modifying list elements
lst$city <- "Berlin" # Adding new elements
print(lst)
## $name
## [1] "fox"
##
## $age
## [1] 40
##
## $score
## [1] 50 70 95
##
## $city
## [1] "Berlin"
#4. Dataframes
df <- data.frame(ID = 1:3, Name = c("Charlie", "Fox", "Greg"), Score = c(80, 90, 70))
str(df) # Structure of data frame
## 'data.frame': 3 obs. of 3 variables:
## $ ID : int 1 2 3
## $ Name : chr "Charlie" "Fox" "Greg"
## $ Score: num 80 90 70
nrow(df) # Number of rows
## [1] 3
ncol(df) # Number of columns
## [1] 3
summary(df) # Summary statistics
## ID Name Score
## Min. :1.0 Length:3 Min. :70
## 1st Qu.:1.5 Class :character 1st Qu.:75
## Median :2.0 Mode :character Median :80
## Mean :2.0 Mean :80
## 3rd Qu.:2.5 3rd Qu.:85
## Max. :3.0 Max. :90
df$Name # Access 'Name' column
## [1] "Charlie" "Fox" "Greg"
df[1, ] # First row
## ID Name Score
## 1 1 Charlie 80
df[, 2] # Second column
## [1] "Charlie" "Fox" "Greg"
df$Score[2] <- 95 # Update Fox's score
print(df)
## ID Name Score
## 1 1 Charlie 80
## 2 2 Fox 95
## 3 3 Greg 70
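Since data frames are widely used in analysis, a common next step is selecting rows by condition and deriving new columns. The following is a minimal sketch that recreates the data frame above; the pass mark of 75 and the `Passed` column are hypothetical additions for illustration.

```r
# Recreate the example data frame (with the updated score for Fox)
df <- data.frame(ID = 1:3, Name = c("Charlie", "Fox", "Greg"), Score = c(80, 95, 70))

# Logical indexing on a vector: keep only scores above 75
high_scores <- df$Score[df$Score > 75]
print(high_scores)
## [1] 80 95

# The same condition selects whole rows of the data frame
top_rows <- df[df$Score > 75, ]
print(top_rows$Name)
## [1] "Charlie" "Fox"

# Add a new column derived from an existing one (hypothetical pass mark of 75)
df$Passed <- df$Score >= 75
print(df$Passed)
## [1]  TRUE  TRUE FALSE
```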
Branching in R allows programs to execute different paths based on conditions. Loops enable repetitive execution, automating tasks efficiently. Together, they create structured and dynamic workflows.
- if statement: Executes a block of code only if a condition is met.
- if...else statement: Executes one block if a condition is TRUE; otherwise, another block runs.
- while loop: Repeats execution as long as a condition remains TRUE.
- for loop: Iterates over elements in a sequence, useful for processing vectors and lists.
# 1. Basic if Statement
x <- 10
if (x > 5) {
  print("x is greater than 5")
}
## [1] "x is greater than 5"
# 2. if...else Statement
y <- 3
if (y > 5) {
  print("y is greater than 5")
} else {
  print("y is less than or equal to 5")
}
## [1] "y is less than or equal to 5"
# 3. while Loop (Repeating While Condition is True)
count <- 1
while (count <= 3) {
  print(paste("Iteration:", count))
  count <- count + 1
}
## [1] "Iteration: 1"
## [1] "Iteration: 2"
## [1] "Iteration: 3"
# 4. for Loop (Iterate Over a Sequence)
values <- c(2, 4, 6, 8)
for (num in values) {
  print(paste("Processing value:", num))
}
## [1] "Processing value: 2"
## [1] "Processing value: 4"
## [1] "Processing value: 6"
## [1] "Processing value: 8"
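The constructs above can be combined and extended. As a hedged sketch, the example below shows an `else if` chain, the vectorized `ifelse()` function, and the `next` and `break` keywords for controlling a loop. All thresholds and variable names are hypothetical.

```r
# Chain several conditions with else if (thresholds are hypothetical)
score <- 80
if (score >= 90) {
  grade <- "High"
} else if (score >= 75) {
  grade <- "Medium"
} else {
  grade <- "Low"
}
print(grade)
## [1] "Medium"

# ifelse() applies a condition to every element of a vector at once
scores <- c(80, 95, 70)
labels <- ifelse(scores >= 75, "pass", "fail")
print(labels)
## [1] "pass" "pass" "fail"

# next skips the current iteration; break leaves the loop entirely
visited <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # stop once i exceeds 7
  visited <- c(visited, i)
}
print(visited)
## [1] 1 3 5 7
```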
## Note: Understanding Functions in R
Functions in R allow for code reuse, improving efficiency and readability. They help break down complex operations into manageable steps, making scripts more structured and logical.
- Basic function: Performs an operation when called.
- Function with default parameters: Allows predefined values for arguments, reducing the need to specify them explicitly.
- Function with input parameters: Takes arguments to dynamically perform operations based on provided values.
- Function that returns a value: Returns computed results, making them reusable for further calculations.
- Anonymous function: A function that doesn’t require a name, mainly used for short, inline operations.
#1. Normal function
greet <- function() {
  #creating the function
  print("Hello and Welcome to my script!!!")
}
greet() #calling the function
## [1] "Hello and Welcome to my script!!!"
#2. Function with a default parameter
give_info <- function(state = "Munich") {
  print(paste("the man lives in", state))
}
give_info() #using default value
## [1] "the man lives in Munich"
give_info("London") #using new value
## [1] "the man lives in London"
#3. Function with input Parameter
multiply <- function(x, y) {
  product <- x * y
  return(product)
}
multiply(4,6)
## [1] 24
#4. Function that returns a Value
add_values <- function(a, b) {
  additions <- a + b
  return(additions)
}
add_values(20, 50)
## [1] 70
#5. Anonymous (inline) functions
lambda_result <- (function(x) x * 2)(8)
print(lambda_result)
## [1] 16
str_val <- (function(x, y) paste(x,y))('Lagos', "Nigeria")
print(str_val)
## [1] "Lagos Nigeria"
greeting <- (function(x) print(paste("Hello and welcome to", x)))("Germany") # new name, so greet() is not overwritten
## [1] "Hello and welcome to Germany"
mult <- (function(x,y) x*y)(10,5)
print(mult)
## [1] 50
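Anonymous functions are most useful when passed to another function, and named functions often combine defaults, validation, and a return value. The following is a minimal sketch under those assumptions; `safe_divide` is a hypothetical helper written for illustration and is not part of base R.

```r
# Pass an anonymous function to sapply() to square each element
squares <- sapply(c(1, 2, 3), function(x) x^2)
print(squares)
## [1] 1 4 9

# A named function combining a default argument, input validation,
# and an explicit return value (safe_divide is a hypothetical helper)
safe_divide <- function(x, y = 1) {
  stopifnot(is.numeric(x), is.numeric(y))  # reject non-numeric input early
  if (y == 0) {
    return(NA)  # return NA instead of Inf when dividing by zero
  }
  return(x / y)
}
print(safe_divide(10, 4))
## [1] 2.5
print(safe_divide(10))     # uses the default y = 1
## [1] 10
print(safe_divide(10, 0))
## [1] NA
```

Returning `NA` for a zero divisor is one design choice among several; raising an error with `stop()` would be an equally reasonable alternative.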