2024-11-19

Introduction

  • R offers powerful features that set it apart in the programming world.
  • Let’s explore some unique aspects that make R interesting for programmers.

Vectorized Operations

  • Perform operations on entire vectors without explicit loops.
  • Leads to concise and efficient code.
# Vectorized addition
a <- 1:5
b <- 6:10
result <- a + b
print(result)  # Outputs: 7 9 11 13 15
## [1]  7  9 11 13 15
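
For comparison, here is a minimal sketch of the same addition written as an explicit loop. The names `looped` and `vectorized` are illustrative only. The last line shows that a single number is recycled across every element of a vector.

```r
a <- 1:5
b <- 6:10

# Explicit loop: preallocate the result, then fill it element by element
looped <- integer(length(a))
for (i in seq_along(a)) {
  looped[i] <- a[i] + b[i]
}

# Vectorized form: one expression, no loop
vectorized <- a + b

identical(looped, vectorized)  # TRUE

# A length-one value is recycled across the whole vector
a * 2
```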

Functional Programming Paradigms

  • Functions are first-class citizens.
  • Supports higher-order functions, closures, and function factories.
# Function returning another function
power_factory <- function(n) {
  function(x) x ^ n
}
square <- power_factory(2)
cube <- power_factory(3)
square(4)  # Outputs: 16
## [1] 16
cube(2)    # Outputs: 8
## [1] 8
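
The higher-order functions and closures mentioned above can be sketched with base R alone. The helper `make_counter` is a hypothetical name chosen for illustration.

```r
# Higher-order functions: functions that take other functions as arguments
squares <- sapply(1:5, function(x) x^2)           # apply a function to each element
evens   <- Filter(function(x) x %% 2 == 0, 1:10)  # keep elements that pass a test
total   <- Reduce(`+`, 1:5)                       # fold a vector into one value

# Closure: the returned function remembers `count` between calls
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # update the variable in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()
counter()  # the second call returns 2
```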

Lazy Evaluation

  • Arguments are evaluated only when needed.
  • Enables flexible and efficient function definitions.
# Lazy evaluation example
lazy_function <- function(x, y) {
  if (x > 0) x else y
}
result <- lazy_function(5, {print("y is evaluated"); 0})
print(result)  # Outputs: 5
## [1] 5
# "y is evaluated" is not printed
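
Two further consequences of lazy evaluation, shown as a rough sketch. Default arguments may refer to other arguments, and function factories sometimes need `force()` to evaluate an argument straight away. The names `scale_by` and `make_adder` are illustrative assumptions.

```r
# A default argument can refer to another argument, because it is only
# evaluated when it is first used inside the function body
scale_by <- function(x, factor = length(x)) {
  x / factor
}
scale_by(c(2, 4, 6))  # factor defaults to 3

# force() evaluates the argument immediately, so later changes to the
# caller's variable cannot affect the function that is returned
make_adder <- function(n) {
  force(n)
  function(x) x + n
}
n_value <- 1
add_one <- make_adder(n_value)
n_value <- 100
add_one(10)  # still uses n = 1
```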

Metaprogramming Capabilities

  • Modify and generate code at runtime.
  • Use expressions and environments to manipulate code objects.
# Creating a function from an expression
expr <- expression(function(x) x + 1)
increment <- eval(expr)
increment(5)  # Outputs: 6
## [1] 6
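
To illustrate how expressions and environments work together, here is a minimal sketch using `quote()` and `eval()`. The variable names are illustrative only.

```r
# quote() captures code without running it
call_obj <- quote(a + b * 2)
class(call_obj)  # "call"

# Evaluate the captured code in an environment that supplies a and b
env <- new.env()
assign("a", 1, envir = env)
assign("b", 3, envir = env)
value <- eval(call_obj, envir = env)  # 1 + 3 * 2

# Code objects can be modified like lists: swap `+` for `-`
call_obj[[1]] <- as.name("-")
deparse(call_obj)  # "a - b * 2"
modified <- eval(call_obj, envir = env)
```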

Non-Standard Evaluation (NSE)

  • Functions can capture expressions instead of values.
  • Simplifies syntax for data manipulation and modeling.

Let’s first load dplyr:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Then we can perform data manipulation simply using “verbs”:

# Using NSE in dplyr
result <- mtcars %>%
  select(mpg, cyl) %>%
  filter(cyl == 6)
print(result)
##                 mpg cyl
## Mazda RX4      21.0   6
## Mazda RX4 Wag  21.0   6
## Hornet 4 Drive 21.4   6
## Valiant        18.1   6
## Merc 280       19.2   6
## Merc 280C      17.8   6
## Ferrari Dino   19.7   6
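
The capturing of expressions can be sketched in base R with `substitute()`. The helpers `show_expr` and `keep_rows` are hypothetical and written only for illustration. They are not part of dplyr, and this is not how dplyr implements `filter()`.

```r
# substitute() returns the unevaluated expression supplied for an argument
show_expr <- function(x) {
  deparse(substitute(x))
}
show_expr(cyl == 6)  # "cyl == 6", even though no variable cyl exists here

# A simplified row filter: evaluate the captured condition inside the data frame
keep_rows <- function(data, condition) {
  cond <- substitute(condition)
  rows <- eval(cond, envir = data, enclos = parent.frame())
  data[rows, , drop = FALSE]
}
six_cyl <- keep_rows(mtcars[, c("mpg", "cyl")], cyl == 6)
nrow(six_cyl)  # 7 rows, the same cars as in the dplyr result
```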

Advanced Graphics with ggplot2

  • Create complex and customizable visualizations.
  • Based on the Grammar of Graphics.

library(ggplot2)

# Create the plot and store it in a variable
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(gear))) +
     geom_point(size = 3) +
     labs(
       title = "Fuel Efficiency by Car Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon",
       color = "Gears"
     ) +
     theme_minimal()

# Display the plot
p

Interfacing with Other Languages

  • Integrate C/C++ code for performance-critical sections using Rcpp.
  • Interface with Python via reticulate.

library(Rcpp)
cppFunction('
// Naive recursive Fibonacci: exponential time, for illustration only
int fibonacci(int n) {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2);
}')
fibonacci(10)  # Outputs: 55
## [1] 55

Statistical Modeling and Machine Learning

  • Extensive support for statistical analyses.
  • Machine learning packages such as caret, xgboost, and randomForest.

library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
## The following object is masked from 'package:dplyr':
## 
##     combine

# Random forests are stochastic; call set.seed() first for reproducible results
model <- randomForest(Species ~ ., data = iris)
print(model)
## 
## Call:
##  randomForest(formula = Species ~ ., data = iris) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 4.67%
## Confusion matrix:
##            setosa versicolor virginica class.error
## setosa         50          0         0        0.00
## versicolor      0         47         3        0.06
## virginica       0          4        46        0.08
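
Classical statistical models are available in base R without extra packages. Here is a minimal sketch of a linear regression on the built-in mtcars data. The 3,000 lb car used for prediction is a made-up input.

```r
# Fit a linear regression of fuel efficiency on weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)               # intercept and slope
summary(fit)$r.squared  # proportion of variance explained

# Predict mpg for a hypothetical 3,000 lb car (wt is in units of 1000 lbs)
predict(fit, newdata = data.frame(wt = 3))
```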

Parallel Computing

  • Utilize multiple cores with packages like parallel and foreach.
  • mclapply() forks the R process, so mc.cores > 1 is not supported on Windows.

library(parallel)
results <- mclapply(1:5, function(x) x^2, mc.cores = 2)
print(results)  # Outputs: List of squared numbers
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 9
## 
## [[4]]
## [1] 16
## 
## [[5]]
## [1] 25
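
For code that must also run on Windows, a socket cluster is one alternative. This is a minimal sketch, assuming two worker processes can be started on the machine. `makeCluster()`, `parLapply()`, and `stopCluster()` all come from the parallel package.

```r
library(parallel)

# Start two worker processes (works on Windows, macOS, and Linux)
cl <- makeCluster(2)

# Distribute the work; parLapply() returns a list, like lapply()
results <- parLapply(cl, 1:5, function(x) x^2)

# Release the workers when finished
stopCluster(cl)

unlist(results)
```

Starting a cluster has a fixed cost, so this pays off only when each task does substantially more work than squaring a number.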

Rich Ecosystem and Community

  • Over 16,000 packages available on CRAN.
  • Active community contributing to diverse domains.
# parallel ships with base R and does not need to be installed
install.packages(c("dplyr", "ggplot2", "Rcpp", "randomForest"))