APPLY FUNCTION IN R

The apply function in base R applies a function to the rows or columns of a matrix (or, more generally, to the dimensions of an array). It lets you express a repetitive row-by-row or column-by-column calculation in a single call.

Why use the apply function?

Imagine you have a large table of numbers (a matrix), and you want to perform the same calculation on each row or each column of that table. Instead of writing a loop to do this, you can use the apply function to do it in one line. It’s shorter and easier to write, although it is not automatically faster: apply still loops over the rows or columns internally.

Syntax of the apply function:

apply(X, MARGIN, FUN, ...)

  • X: This is the input data. It could be a matrix or an array.

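  • MARGIN: This tells R whether to apply the function to the rows or to the columns.

    • 1 means the function is applied to each row.

    • 2 means the function is applied to each column.

    • For arrays with more than two dimensions, MARGIN can also be 3 (or higher), or a vector of dimensions such as c(1, 3).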

  • FUN: This is the function that you want to apply to the rows or columns. It could be any function, like sum, mean, or even a custom function that you create.

  • ...: Any additional arguments are passed on to FUN (for example, na.rm = TRUE for mean).
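To illustrate how the extra arguments in ... reach FUN, here is a minimal sketch using a small made-up matrix m with one missing value. Anything written after FUN in the apply call (here na.rm = TRUE) is handed on to every call of mean.

```r
# A small made-up matrix; the first row contains a missing value (NA)
m <- matrix(c(1, 2, NA,
              4, 5, 6),
            nrow = 2, byrow = TRUE)

# Without extra arguments, mean() returns NA for the row containing NA
plain_means <- apply(m, 1, mean)
print(plain_means)    # NA for row 1, 5 for row 2

# Arguments after FUN are passed on to it: each row is evaluated as
# mean(row, na.rm = TRUE), so the NA is dropped before averaging
row_means <- apply(m, 1, mean, na.rm = TRUE)
print(row_means)      # 1.5 for row 1, 5 for row 2
```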

An example to make this concrete:

Let’s say you have a matrix that represents the test scores of students. Each row is a student, and each column is a subject.

scores <- matrix(c(80, 90, 85,
                   75, 60, 95,
                   88, 77, 92), 
                 nrow = 3, byrow = TRUE)

colnames(scores) <- c("Math", "Science", "English")
rownames(scores) <- c("Student1", "Student2", "Student3")
print(scores)
##          Math Science English
## Student1   80      90      85
## Student2   75      60      95
## Student3   88      77      92

Case 1: Apply function to rows (MARGIN = 1)

If you want to calculate the average score for each student (i.e., for each row), you can use the apply function like this:

student_averages <- apply(scores, 1, mean)
print(student_averages)
## Student1 Student2 Student3 
## 85.00000 76.66667 85.66667

Here’s what’s happening:

  • scores is the matrix of test scores.

  • 1 tells R to apply the function on the rows.

  • mean is the function that’s calculating the average score for each row (i.e., for each student).
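To see what apply is doing under the hood, the sketch below computes the same student averages with an explicit for loop and checks that both agree. It also compares against base R’s rowMeans, a dedicated function for this particular calculation. The matrix is rebuilt so the snippet runs on its own; loop_averages and apply_averages are just illustrative names.

```r
scores <- matrix(c(80, 90, 85,
                   75, 60, 95,
                   88, 77, 92),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Student1", "Student2", "Student3"),
                                 c("Math", "Science", "English")))

# What apply(scores, 1, mean) does, written as an explicit loop
loop_averages <- numeric(nrow(scores))
names(loop_averages) <- rownames(scores)
for (i in seq_len(nrow(scores))) {
  loop_averages[i] <- mean(scores[i, ])   # scores[i, ] is row i as a vector
}

apply_averages <- apply(scores, 1, mean)
print(all.equal(loop_averages, apply_averages))       # TRUE

# For row means specifically, base R also provides rowMeans()
print(all.equal(rowMeans(scores), apply_averages))    # TRUE
```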

Case 2: Apply function to columns (MARGIN = 2)

Now, if you want to calculate the average score for each subject (i.e., for each column), you can use the apply function like this:

subject_averages <- apply(scores, 2, mean)
print(subject_averages)
##     Math  Science  English 
## 81.00000 75.66667 90.66667

Here’s what’s happening:

  • scores is the matrix of test scores.

  • 2 tells R to apply the function on the columns.

  • mean is the function that’s calculating the average score for each column (i.e., for each subject).
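Inside apply, each column arrives as a vector that keeps the row names, which is handy when you want to know which row a result came from. As a small sketch (top_student is an illustrative name), this finds the highest-scoring student in each subject using which.max:

```r
scores <- matrix(c(80, 90, 85,
                   75, 60, 95,
                   88, 77, 92),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Student1", "Student2", "Student3"),
                                 c("Math", "Science", "English")))

# With MARGIN = 2, col is one subject's scores, named by student;
# which.max() returns the position of the maximum, and names() its label
top_student <- apply(scores, 2, function(col) names(which.max(col)))
print(top_student)
# Math: Student3, Science: Student1, English: Student2
```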

What can the function (FUN) be?

The apply function doesn’t just work with basic functions like mean, sum, or max. You can also create your own custom function and use it.

Example of using a custom function:

Let’s say you want to subtract the minimum value from the maximum value for each row. You can create a custom function inside apply:

range_difference <- apply(scores, 1, function(x) max(x) - min(x))
print(range_difference)
## Student1 Student2 Student3 
##       10       35       15

Here’s what’s happening:

  • The custom function function(x) max(x) - min(x) calculates the difference between the maximum and minimum values for each row.

  • 1 tells R to apply this function to the rows.

  • So, for Student1, the difference between the highest score (90) and the lowest score (80) is 10.
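A custom function does not have to be written inline. As a sketch with two hypothetical helpers, score_range and count_above, you can define the function first and pass it to apply by name; any extra parameter it needs (here threshold) is supplied after FUN through the ... argument.

```r
scores <- matrix(c(80, 90, 85,
                   75, 60, 95,
                   88, 77, 92),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Student1", "Student2", "Student3"),
                                 c("Math", "Science", "English")))

# Hypothetical helper: same calculation as the inline version above
score_range <- function(x) max(x) - min(x)

# Hypothetical helper with an extra parameter: how many scores exceed threshold
count_above <- function(x, threshold) sum(x > threshold)

ranges <- apply(scores, 1, score_range)
print(ranges)        # 10, 35, 15

# threshold = 85 is passed through apply's ... argument to count_above
high_counts <- apply(scores, 1, count_above, threshold = 85)
print(high_counts)   # 1, 1, 2
```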

Key Points to Remember:

  • apply works on matrices and arrays.

  • You can apply a function to either the rows (MARGIN = 1) or columns (MARGIN = 2).

  • You can use built-in functions like sum, mean, min, max, or create your own custom functions.

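  • apply makes your code shorter and easier to read than an explicit loop, but it is not automatically faster, because it still loops internally. For row or column sums and means, the dedicated rowSums, colSums, rowMeans, and colMeans functions are faster.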
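Since apply also works on arrays, here is a sketch with made-up numbers: the score table recorded for two terms, stored as a 3 by 3 by 2 array (students, subjects, terms). MARGIN can then name the third dimension, or a combination of dimensions such as c(1, 3). The names terms, term_means, and student_term_means are illustrative.

```r
# Made-up data: 3 students x 3 subjects x 2 terms.
# Arrays fill column by column, so each group of three numbers is one
# subject's scores for Student1, Student2, Student3.
terms <- array(c(80, 75, 88,  90, 60, 77,  85, 95, 92,   # Term1 (same as scores)
                 82, 70, 90,  85, 65, 80,  88, 91, 94),  # Term2
               dim = c(3, 3, 2),
               dimnames = list(c("Student1", "Student2", "Student3"),
                               c("Math", "Science", "English"),
                               c("Term1", "Term2")))

# MARGIN = 3: one mean per term, over all students and subjects
term_means <- apply(terms, 3, mean)
print(term_means)

# MARGIN = c(1, 3): one mean per student per term (a 3 x 2 matrix)
student_term_means <- apply(terms, c(1, 3), mean)
print(student_term_means)
```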

A Few Limitations of apply:

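  1. Designed for matrices and arrays: a data frame is first converted to a matrix with as.matrix, so if its columns have different types (for example, a character column next to numeric ones), every value becomes character. For lists, or for working through a data frame column by column, lapply, sapply, and vapply are usually better suited; tapply is for applying a function to groups of values.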

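  2. The output shape can be surprising: apply keeps the names of the dimension it loops over (as the examples above show), but when FUN returns more than one value per row or column, each result becomes a column of the output. With MARGIN = 1, the output is therefore transposed relative to the input, and you may need t() to turn it back.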
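Both limitations are easy to see in practice. The sketch below uses a small made-up data frame df to show the type coercion, and the scores matrix to show the transposed output when FUN returns two values per row (range gives the minimum and maximum).

```r
# Made-up data frame mixing a character column with numeric columns
df <- data.frame(name = c("Ann", "Bob"),
                 math = c(80, 75),
                 science = c(90, 60))

# apply() converts df with as.matrix() first; the character column makes
# the whole matrix character, so every row arrives as a character vector
row_types <- apply(df, 1, class)
print(row_types)     # "character" for both rows

# sapply() works column by column and keeps each column's own type
col_types <- sapply(df, class)
print(col_types)     # name: character, math and science: numeric

# Selecting only the numeric columns avoids the coercion
num_means <- apply(df[, c("math", "science")], 1, mean)
print(num_means)     # 85 and 67.5

scores <- matrix(c(80, 90, 85,
                   75, 60, 95,
                   88, 77, 92),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Student1", "Student2", "Student3"),
                                 c("Math", "Science", "English")))

# range() returns two values per row, so each student becomes a column
row_ranges <- apply(scores, 1, range)
print(dim(row_ranges))   # 2 rows (min, max) by 3 columns (students)

# t() transposes the result back to one row per student
print(t(row_ranges))
```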

When to use apply?

  • When you need to repeat the same calculation on every row or column of a matrix or array.

  • When you want shorter, more readable code than an explicit for loop.