The apply function in R applies
a function to the rows or columns of a matrix or array. It’s a way
to run the same operation over every row or column of a data structure
without writing an explicit loop.
Why use the apply function?
Imagine you have a large table of numbers (a matrix), and you want to
perform the same calculation on each row or each column of that table.
Instead of writing a loop to do this, the apply
function can do it in one line. It is not necessarily faster than a loop, but it is more compact and easier to
write.
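To make the comparison concrete, here is a minimal sketch that computes row sums both ways. The matrix `m` and the variable names are hypothetical and used only for illustration.

```r
# Hypothetical 3 x 4 matrix used only for this illustration
m <- matrix(1:12, nrow = 3)

# Loop version: preallocate the result, then fill it row by row
row_sums_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  row_sums_loop[i] <- sum(m[i, ])
}

# apply version: the same calculation in a single call
row_sums_apply <- apply(m, 1, sum)

# Both approaches give the same values
all.equal(row_sums_loop, row_sums_apply)
```

The loop needs a preallocated result and an index variable; the apply call states only which margin to walk over and which function to use.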
Syntax of the apply function:
apply(X, MARGIN, FUN, ...)
X: This is the input data. It could be a matrix or an array.
MARGIN: This tells R whether to apply the function to the rows or the columns.
1 - means you are applying the function to the rows.
2 - means you are applying the function to the columns.
FUN: This is the function that you want to apply to the rows or columns. It could be any function, like sum, mean, or even a custom function that you create.
...: You can pass additional arguments to FUN here.
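As a short sketch of how the extra arguments work, the example below forwards na.rm = TRUE to mean. The matrix `scores_na` is hypothetical and exists only to show a missing value.

```r
# Hypothetical scores with one missing value (NA)
scores_na <- matrix(c(80, NA, 85,
                      75, 60, 95),
                    nrow = 2, byrow = TRUE)

# Without na.rm, the mean of any row containing NA is NA
plain_means <- apply(scores_na, 1, mean)

# na.rm = TRUE is passed through `...` to mean(), which then skips the NA
clean_means <- apply(scores_na, 1, mean, na.rm = TRUE)

print(plain_means)
print(clean_means)
```

Anything written after FUN in the call is handed to FUN on every row or column, so the argument has to be one that FUN itself accepts.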
Let’s say you have a matrix that represents the test scores of students. Each row is a student, and each column is a subject.
scores <- matrix(c(80, 90, 85,
                   75, 60, 95,
                   88, 77, 92),
                 nrow = 3, byrow = TRUE)
colnames(scores) <- c("Math", "Science", "English")
rownames(scores) <- c("Student1", "Student2", "Student3")
print(scores)
##          Math Science English
## Student1   80      90      85
## Student2   75      60      95
## Student3   88      77      92
If you want to calculate the average score for each
student (i.e., for each row), you can use
the apply function like this:
student_averages <- apply(scores, 1, mean)
print(student_averages)
## Student1 Student2 Student3
## 85.00000 76.66667 85.66667
Here’s what’s happening:
scores is the matrix of test scores.
1 tells R to apply the function on the rows.
mean is the function that’s calculating the average score for each row (i.e., for each student).
Now, if you want to calculate the average score for each
subject (i.e., for each column), you can use the
apply function like this:
subject_averages <- apply(scores, 2, mean)
print(subject_averages)
##     Math  Science  English
## 81.00000 75.66667 90.66667
Here’s what’s happening:
scores is the matrix of test scores.
2 tells R to apply the function on the columns.
mean is the function that’s calculating the average score for each column (i.e., for each subject).
The apply function doesn’t just work
with basic functions like mean, sum, or
max. You can also create your own custom function and use
it.
Let’s say you want to subtract the minimum value from the
maximum value for each row. You can create a custom function
inside apply:
range_difference <- apply(scores, 1, function(x) max(x) - min(x))
print(range_difference)
## Student1 Student2 Student3
##       10       35       15
Here’s what’s happening:
The custom function function(x) max(x) - min(x)
calculates the difference between the maximum and minimum values for
each row.
1 tells R to apply this function to the rows.
So, for Student1, the difference between the highest score (90) and the lowest score (80) is 10.
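A custom function does not have to be anonymous. The sketch below defines a named helper and passes it an extra argument. The helper `count_at_least` and the cutoff of 85 are hypothetical, chosen only for illustration; the matrix is rebuilt so the block runs on its own.

```r
# Same scores as in the example above, rebuilt so this block is self-contained
scores <- matrix(c(80, 90, 85,
                   75, 60, 95,
                   88, 77, 92),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Student1", "Student2", "Student3"),
                                 c("Math", "Science", "English")))

# Hypothetical helper: how many scores are at or above a cutoff
count_at_least <- function(x, cutoff) sum(x >= cutoff)

# cutoff = 85 is forwarded to count_at_least() for every row
high_scores <- apply(scores, 1, count_at_least, cutoff = 85)
print(high_scores)
```

Naming the function keeps the apply call short and lets the same helper be reused for columns by changing the margin to 2.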
apply works on
matrices and arrays.
You can apply a function to either the rows (MARGIN = 1) or columns (MARGIN = 2).
You can use built-in functions like sum,
mean, min, max, or create your
own custom functions.
apply makes your code shorter
and often easier to read than an explicit loop; it is not necessarily faster.
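Limitations of apply:
Designed for matrices and arrays: apply converts a data frame to a matrix first, so a single text column turns every value into text. If you’re
working with data frames or lists, functions like
lapply, sapply, and tapply
are usually better suited.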
Names depend on the input: apply
carries over the row or column names of X along the chosen margin, as in the examples above. If X has no names, the result
has none either, and you may have to add them manually.
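The following sketch illustrates both points under stated assumptions: the data frame `df` and the matrix `unnamed` are hypothetical, made up only for this example.

```r
# Hypothetical data frame with one text column and two numeric columns
df <- data.frame(name = c("Ann", "Ben"),
                 math = c(80, 75),
                 science = c(90, 60))

# Converting mixed columns to a matrix yields text, which is what apply() would work on
becomes_text <- is.character(as.matrix(df))

# sapply() goes column by column, so the numeric columns stay numeric
col_means <- sapply(df[c("math", "science")], mean)
print(col_means)

# A matrix without row names gives an apply() result without names
unnamed <- matrix(1:4, nrow = 2)
result_names <- names(apply(unnamed, 1, sum))
print(result_names)
```

Selecting only the numeric columns before calling sapply avoids the conversion to text altogether.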
When to use apply?
When you have repetitive calculations on rows or columns of matrices or arrays.
When you want shorter and cleaner code than a
for loop.