Pipelining & Lambda Expressions

Spring 2026

Function Composition

Composition in Math

In math we sometimes have situations in which the co-domain of one function becomes the domain of another: \(y = f(g(x))\)
When there are a bunch of these, it becomes tedious: \(y = a(b(c(d(x))))\)
Sometimes it’s more convenient to write: \(y = (a \circ b \circ c \circ d)(x)\)
Basically, we are saying apply \(d()\), that result goes into \(c()\), that result goes into \(b()\), that result goes into \(a()\)
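
The right-to-left order of composition can be sketched in Python. This is only an illustration: `compose` is a hypothetical helper (not a standard-library function), and `a`, `b`, `c`, `d` are made-up stand-ins for the functions above.

```python
from functools import reduce

def compose(*funcs):
    """Return the composition of funcs; the rightmost function runs first."""
    def composed(x):
        # Walk the functions right to left, feeding each result into the next.
        return reduce(lambda acc, f: f(acc), reversed(funcs), x)
    return composed

# Hypothetical stand-ins for a, b, c, d
a = lambda v: v + 1
b = lambda v: v * 2
c = lambda v: v - 3
d = lambda v: v ** 2

y_nested = a(b(c(d(4))))              # explicit nesting
y_composed = compose(a, b, c, d)(4)   # same thing, written as a composition
print(y_nested, y_composed)
```

Both expressions apply `d` first and `a` last, so they produce the same value.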

Composition in Programming

The same tedium exists in programming
We can nest the functions:

y = third_function( second_function( first_function(x) ) )

Or use temporary variables:

y1 = first_function(x)
y2 = second_function(y1)
y  = third_function(y2)

But really all we want is to plug the result of one into another
This is basically a pipeline of operations

Data Processing

Stringing operations is pretty typical in data processing
Get this table, then filter some rows, to that add a column, now group that by such-and-such
Most data processing tasks involve such steps
Traditional imperative approaches are difficult to read and tedious to implement
Also, they read in the reverse order of the operations

suppressWarnings(library(dplyr))
df <- summarise(group_by(mutate(filter(mtcars, cyl==8), efficiency=mpg/wt), carb), AvgEff=mean(efficiency))
head(df)

Pipelining

Instead, we’d rather be able to say:
- Do this first operation
- Then this second operation
- Etc.
- Put the result here
Unsurprisingly, we call this pipelining
We need a new operator for this
- R has had a native pipe operator, |>, since version 4.1; the tidyverse also gives us %>%
- The %>% operator comes from the magrittr library, which the tidyverse loads for you
- Python doesn’t have one, but some libraries give you something like it
- Julia has a pipe operator built-in for any type of data
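
One way a Python library can offer something pipe-like is by overloading the `|` operator. The sketch below is an assumption about the general technique, not the API of any particular library; the `Pipe` class and the `double` and `total` names are hypothetical.

```python
class Pipe:
    """Wrap a one-argument function so it can sit on the right side of `|`."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Python falls back to Pipe.__ror__ for `value | Pipe(f)` when the
        # type of `value` does not handle `|` with a Pipe on its own.
        return self.func(value)

# Hypothetical pipeline stages
double = Pipe(lambda xs: [2 * v for v in xs])
total = Pipe(sum)

result = [1, 2, 3] | double | total   # reads left to right: double, then sum
print(result)
```

This only works when the left-hand value does not already define its own meaning for `|` with arbitrary objects, which is one reason such operators stay library-specific in Python.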

Parameters And State Information

Pipelining requires a key assumption: we know which parameter is the domain argument
With R/tidyverse, we assume it is the first parameter unless otherwise specified
With Julia, you can only have one parameter (sort of)
In general, when a function has multiple parameters, there’s got to be a way to either map functions to arguments, or to retain state inside the pipeline
That typically involves lambda expressions in Julia
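
A minimal Python sketch of the multiple-parameter problem: each pipeline stage must accept exactly one argument, so the extra parameters have to be fixed ahead of time. The functions `scale` and `offset` are hypothetical; `functools.partial` and `lambda` are two standard ways to do the fixing.

```python
from functools import partial

def scale(values, factor):
    """Two-parameter function: the piped data is the first parameter."""
    return [factor * v for v in values]

def offset(values, amount):
    return [v + amount for v in values]

# A lambda states explicitly which parameter receives the piped value.
stage1 = lambda vals: scale(vals, 3)
# partial fixes the other parameter by keyword, leaving one free parameter.
stage2 = partial(offset, amount=2)

result = [1, 2]
for stage in (stage1, stage2, sum):
    result = stage(result)   # every stage now takes exactly one argument
print(result)
```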

Lambda Expressions

Functions as Parameters

Traditionally, we can pass a data argument (e.g., an integer) to use in a function as an integer parameter
Once it’s in the function, that code can use the integer whenever and however it likes
But what if I have some functionality that I’d like to pass in?
That is: “Run this code when you get to that point”
We want the code we send to be used by the function that receives it whenever and however it likes
You could even store that function in a structure and use it later
That is, treat functions as data
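
As a small Python sketch of storing a function in a structure and using it later: the `Button` class and the `shout` function are hypothetical names chosen only for illustration.

```python
class Button:
    """Hypothetical structure that stores a function now and runs it later."""

    def __init__(self, on_click):
        self.on_click = on_click      # the function is stored like any other data

    def click(self, payload):
        return self.on_click(payload)  # the stored code runs only at this point

def shout(text):
    return text.upper() + "!"

button = Button(shout)           # nothing runs yet; we only hand over the function
result = button.click("hello")   # now the stored function is called
print(result)
```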

Passing a Function

polynomial1(x) = x.^3 .+ 2*x .- 4
polynomial2(x) = 3*x.^2 .+ 6

function CoolFunction(f)
  x = rand(Float64, 10)
  y = f(x)
  return y
end

CoolFunction(polynomial1)
CoolFunction(polynomial2)

What is a Lambda Expression?

Sometimes these deferred “functions as data” are very concise one-offs
We know when we send the argument what it will look like, but defining a whole separate function for these every time gets tedious
We’d like to be able to just say, “Here’s some code to run when you get to that point”
When we do this, we need a way to map future arguments to what we will use in that code
That’s all a lambda expression is: Future code we will run with information for how to map future arguments
We also sometimes call these anonymous functions because often there is no function name at all
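
To make the contrast concrete, here is a short Python sketch that sorts the same list twice: once with a named one-off function and once with an anonymous one. The `words` list and the `word_length` name are made up for illustration.

```python
words = ["kiwi", "banana", "fig", "cherry"]

# Named version: a whole separate definition for a one-off rule
def word_length(w):
    return len(w)

by_length_named = sorted(words, key=word_length)

# Anonymous version: "w" is how the future argument maps into the code
by_length_lambda = sorted(words, key=lambda w: len(w))

print(by_length_named)
print(by_length_lambda)
```

In both calls `sorted` decides when to run the code; the lambda simply skips the step of giving it a name.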

Example Lambda Expression in Julia

Julia uses the -> operator to perform this argument mapping
It means: “Hey, when you eventually run this, the argument you send to me will be mapped to this parameter that my code uses”

# findfirst gives you the range of the first occurrence of a pattern in a longer string

findfirst("Turtle", "Teenage Mutant Ninja Turtles")

d = rand(10)  # How can I find the index of the first value > 0.75?

# Each time an element is tested, the x>0.75 function is run,
# where "x" is whatever element from d it selected
findfirst(x -> x>0.75,  d)

That x>0.75 code doesn’t have a name, hence anonymous
Julia uses lambda expressions a lot; we’ve already seen some uses

List Comprehension in Julia

A common place we use lambda expressions is when populating a list
The “traditional” way to do something like this would be:

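mylist = Bool[]              # typed empty list, so it matches the map result below
for val in d
  push!(mylist, val > 0.5)   # store the true/false result of the comparison
end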

The map function in Julia says “apply this code to every element in list”

d = rand(10);
mylist = map(x ->  x>0.5, d)   # BTW, this is equivalent to d .> 0.5

Shorthand:

mylist = [x>0.5  for x in d]

The `do` Block in Julia

This all makes sense when the lambda expression is very succinct
What if you have several things to do, and so need a whole code block?

# A, B, C are placeholders for any integer values
map([A, B, C]) do x
    if x < 0 && iseven(x)
        return 0
    elseif x == 0
        return 1
    else
        return x
    end
end

For each item in that first list, apply that block of code and treat the argument coming in as x in the block
Look back at the Flux neural network example from a few weeks ago

More Info: https://docs.julialang.org/en/v1/manual/functions/

Example Lambda Expression in Python

Python has lambda expressions, as well
We use the keyword lambda

import numpy as np
d = np.random.uniform(size=10)
list(filter(lambda x: x>0.75, d))

Tell filter to execute x>0.75 for each item in d, treating that item as x, and keep the items for which it is true

List Comprehension in Python

A list comprehension in Python does the same job as map and filter with lambda expressions:

fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = [x   for x in fruits if "a" in x]

This is the same as:

fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []
for x in fruits:
  if "a" in x:           # the filter condition from the comprehension
    newlist.append(x)
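
The link between comprehensions and lambda expressions can be sketched by writing the same computation both ways. This is only an illustration; the `.upper()` transformation is an added assumption so that both the mapping part and the filtering part are visible.

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# Comprehension: the expression and the condition are written inline
with_comprehension = [x.upper() for x in fruits if "a" in x]

# Same computation with explicit lambda expressions handed to filter and map
with_lambdas = list(map(lambda x: x.upper(),
                        filter(lambda x: "a" in x, fruits)))

print(with_comprehension)
print(with_lambdas)
```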

Tidyverse in R

Tidyverse Library

There is an external library in R that gives us this functionality
It must be installed, and the structures it works with are slightly different from data frames
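Then you get access to a pipeline operator, %>%, which the tidyverse takes from the magrittr library
The operator itself works on any R value, but the tidyverse verbs (filter, mutate, etc.) expect that data frame like structure
FYI: Since R 4.1 there is also a native pipeline operator, |>, like in Julia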

More Info: https://malooflab.ucdavis.edu/apps/tidyverse-tutorial/

Tibbles

In R a tibble is like a simplified data frame
It has certain pretty-printing capabilities
But it also adds some syntactic features for subselection:

library(tidyverse)
mtcars %>% .$mpg

To do this, it restricts what you can name columns
Other than these ideas, it is more or less a data frame

Example:

suppressWarnings(library(dplyr))
suppressWarnings(library(tidyverse))
df <- mtcars %>% 
        filter(cyl==8) %>% 
        mutate(efficiency=mpg/wt) %>% 
        group_by(carb) %>% 
        summarize(AvgEff=mean(efficiency))
head(df)

Now the operations flow in order
No ugly nesting within nesting
Much more readable
Note: %>% must end a line rather than start the next one; otherwise R treats the preceding line as a complete expression

Another Example:

suppressWarnings(library(dplyr))
suppressWarnings(library(tidyverse))
df <- starwars %>% 
        filter(species == "Human") %>% 
        mutate(bmi=mass/height^2) %>% 
        group_by(homeworld) %>% 
        summarise(AvgBMI=mean(bmi)) %>% 
        arrange(AvgBMI)
head(df)

Handy Tidyverse Cheatsheet: https://github.com/rstudio/cheatsheets/blob/main/data-transformation.pdf

Pipeline Operations in Julia

Pros and Cons Over Tidyverse

Julia’s pipeline operator is a native operator
It works on any kind of data
But dealing with functions that have multiple parameters is not always straightforward
It requires the use of lambda expressions

More Info: https://mollahsabbir.medium.com/julia-pipeline-operator-write-methods-that-read-like-proses-c6800cafb73a

Simple Example

f(x) = x.^2
g(x) = 3*x .+ 2
h(x) = sum(x)

# This:
h(f(g([1,2])))

# Is the same as:
[1,2] |> g |> f |> h

Multiple Arguments

What if your function has multiple arguments?
How does the pipeline operator know which parameter of the next function should receive the co-domain result?
Answer: lambda expressions

f(x,y) = x.^2 .+ y
g(x) = 3*x .+ 2
h(x) = sum(x)

h( f( g([1,2]), 5) )
[1,2] |>  g   |> a->f(a,5) |>  h

Example With DataFrames

using DataFrames, RDatasets, Statistics

df = dataset("datasets", "mtcars") |> 
    filter(row -> row.MPG>25) |> 
    x->transform(x, [:MPG, :WT] => (a,b) -> a./b)  |>
    x->groupby(x, :Carb) |> 
    x->combine(x,:MPG_WT_function=>mean)
first(df, 6)   # DataFrames.jl uses first(df, n) where R uses head(df)

[:Col1, :Col2] => (a,b) -> a./b means: Use Col1 of the data frame as argument a, use Col2 as arg b, then perform vectorized division between those two to make a new column
x->func(x, args) means map the co-domain as the x argument to func
