Spring 2026

Function Composition

Composition in Math

  • In math we sometimes have situations in which the co-domain of one function becomes the domain of another: \(y = f(g(x))\)
  • When there are a bunch of these, it becomes tedious: \(y = a(b(c(d(x))))\)
  • Sometimes it’s more convenient to write: \(y = (a \circ b \circ c \circ d)(x)\)
  • Basically, we are saying apply \(d()\), that result goes into \(c()\), that result goes into \(b()\), that result goes into \(a()\)
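The right-to-left reading of \(\circ\) can be sketched in a few lines of Python. This is only an illustration: compose is a hypothetical helper written for this note, and a, b, c, d are made-up stand-ins for the functions above.

```python
def compose(*funcs):
    """Return the right-to-left composition: compose(a, b)(x) == a(b(x))."""
    def composed(x):
        # The last function listed is applied first, as in (a ∘ b)(x) = a(b(x))
        for f in reversed(funcs):
            x = f(x)
        return x
    return composed

# Hypothetical stand-ins for a, b, c, d
def a(v): return v + 1
def b(v): return v * 2
def c(v): return v - 3
def d(v): return v ** 2

nested = a(b(c(d(4))))            # explicit nesting
chained = compose(a, b, c, d)(4)  # same order of application
print(nested, chained)
```

Both expressions apply d first and a last, so they give the same value.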

Composition in Programming

  • The same tedium exists in programming
  • We can nest the functions:
y = third_function( second_function( first_function(x) ) )
  • Or use temporary variables:
y1 = first_function(x)
y2 = second_function(y1)
y  = third_function(y2)
  • But really all we want is to plug the result of one into another
  • This is basically a pipeline of operations
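As a rough sketch of the pipeline idea, the helper below threads a value through a list of functions in the order they are written. The name pipe and the three function bodies are assumptions made up for illustration; only the function names come from the snippet above.

```python
from functools import reduce

def pipe(value, *funcs):
    """Feed value through funcs from left to right."""
    return reduce(lambda acc, f: f(acc), funcs, value)

# Hypothetical bodies for the three functions named above
def first_function(x): return x + 1
def second_function(x): return x * 10
def third_function(x): return x - 3

y_nested = third_function(second_function(first_function(2)))
y_piped = pipe(2, first_function, second_function, third_function)
print(y_nested, y_piped)
```

The piped form lists the steps in the order they run, with no temporary variables.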

Data Processing

  • Stringing operations together is pretty typical in data processing
  • Get this table, then filter some rows, to that add a column, now group that by such-and-such
  • Most data processing tasks involve such steps
  • Traditional imperative approaches are difficult to read and tedious to implement
  • Also, they read in the reverse order of the operations
suppressWarnings(library(dplyr))
df <- summarise(group_by(mutate(filter(mtcars, cyl==8), efficiency=mpg/wt), carb), AvgEff=mean(efficiency))
head(df)

Pipelining

  • Instead, we’d rather be able to say:
    • Do this first operation
    • Then this second operation
    • Etc.
    • Put the result here
  • Unsurprisingly, we call this pipelining
  • We need a new operator for this
    • Base R has a native one, |>, as of version 4.1, and tidyverse gives us %>%, which is mostly used with data frames
    • The %>% operator comes from the magrittr library, which tidyverse loads for us
    • Python doesn’t have one, but some libraries give you something like it
    • Julia has a pipe operator built-in for any type of data
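To give a feel for how a Python library might fake a pipe operator, here is a minimal sketch that overloads |. The Pipe class is hypothetical and written only for this note; it is not the API of any particular package.

```python
class Pipe:
    """Wrap a one-argument function so that `value | Pipe(f)` calls f(value)."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Python calls this for `value | self` when the left operand
        # does not know how to combine itself with a Pipe
        return self.func(value)

# Sort the list, then add it up, reading left to right
result = [3, 1, 2] | Pipe(sorted) | Pipe(sum)
print(result)
```

This relies on the left operand not defining | for Pipe objects, which holds for built-in lists and numbers but is an assumption for other types.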

Parameters And State Information

  • Pipelining requires a key assumption: we know which parameter is the domain argument, that is, the one that receives the piped value
  • With R/tidyverse, we assume it is the first parameter unless otherwise specified
  • With Julia, you can only have one parameter (sort of)
  • In general, when a function has multiple parameters, there’s got to be a way to either map functions to arguments, or to retain state inside the pipeline
  • That typically involves lambda expressions in Julia
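The same problem shows up in any language. As a hedged Python sketch, the extra parameters can be fixed ahead of time with a lambda or with functools.partial, so that each step takes only the piped value. The names pipe, scale, and shift are hypothetical helpers invented for this example.

```python
from functools import partial, reduce

def pipe(value, *funcs):
    """Feed value through funcs from left to right."""
    return reduce(lambda acc, f: f(acc), funcs, value)

def scale(values, factor):
    return [v * factor for v in values]

def shift(values, offset=0):
    return [v + offset for v in values]

result = pipe(
    [1, 2],
    lambda vals: scale(vals, 3),  # lambda says where the piped value goes
    partial(shift, offset=2),     # partial fixes offset, leaving one parameter
    sum,
)
print(result)
```

Either way, every stage ends up as a one-parameter function, which is what the pipeline needs.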

Lambda Expressions

Functions as Parameters

  • Traditionally, we can pass a data argument (e.g., an integer) to use in a function as an integer parameter
  • Once it’s in the function, that code can use the integer whenever and however it likes
  • But what if I have some functionality that I’d like to pass in
  • That is: “Run this code when you get to that point”
  • We want the code we send to be used by the function that receives it whenever and however it likes
  • You could even store that function in a structure and use it later
  • That is, treat functions as data
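A small Python sketch of treating functions as data: the functions are stored in a dictionary and called later by whoever holds the dictionary. The names double, square, operations, and apply_named are all made up for illustration.

```python
def double(x):
    return 2 * x

def square(x):
    return x * x

# Functions stored in a structure, just like any other value
operations = {"double": double, "square": square}

def apply_named(name, value):
    """Look up a stored function and run it when we get to this point."""
    return operations[name](value)

results = [op(5) for op in operations.values()]
print(results, apply_named("square", 4))
```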

Passing a Function

polynomial1(x) = x.^3 .+ 2*x .- 4
polynomial2(x) = 3*x.^2 .+ 6

function CoolFunction(f)
  x = rand(Float64, 10)
  y = f(x)
  return y
end

CoolFunction(polynomial1)
CoolFunction(polynomial2)

What is a Lambda Expression?

  • Sometimes these deferred “functions as data” are very concise one-offs
  • We know when we send the argument what it will look like, but defining a whole separate function for these every time gets tedious
  • We’d like to be able to just say, “Here’s some code to run when you get to that point”
  • When we do this, we need a way to map future arguments to what we will use in that code
  • That’s all a lambda expression is: Future code we will run with information for how to map future arguments
  • We also sometimes call these anonymous functions because often there is no function name at all
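To make the "named function versus one-off" contrast concrete, here is a short Python sketch. The word list and the word_length helper are hypothetical; sorted and its key parameter are standard Python.

```python
words = ["kiwi", "banana", "fig"]

# Option 1: define a whole separate function just to pass it once
def word_length(w):
    return len(w)

by_named = sorted(words, key=word_length)

# Option 2: say it inline; w is how the future argument gets mapped
by_lambda = sorted(words, key=lambda w: len(w))

print(by_named, by_lambda)
```

Both calls hand sorted some code to run on each element later; the second one simply never gives that code a name.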

Example Lambda Expression in Julia

  • Julia uses the -> operator to perform this argument mapping
  • It means: “Hey, when you eventually run this, the argument you send to me will be mapped to this parameter that my code uses”
# findfirst gives you the range of the first instance of a pattern in a string

findfirst("Turtle", "Teenage Mutant Ninja Turtles")

d = rand(10)  # How can I find the index of the first value > 0.75?

# The x>0.75 function is run on each element of d in turn,
# where "x" is whichever element is being tested;
# the result is the index of the first element for which it is true
findfirst(x -> x>0.75,  d)
  • That x>0.75 code doesn’t have a name, hence anonymous
  • Julia uses lambda expressions a lot; we’ve already seen some uses

List Comprehension in Julia

  • A common place we use lambda expressions is when populating a list
  • The “traditional” way to do something like this would be:
mylist = [];
for val in d
  push!(mylist, (val>0.5) + 0);
end
  • The map function in Julia says “apply this code to every element in the list”
d = rand(10);
mylist = map(x ->  x>0.5, d)   # BTW, this is equivalent to d .> 0.5
  • Shorthand:
mylist = [x>0.5  for x in d]

The do Block in Julia

  • This all makes sense when the lambda expression is very succinct
  • What if you have several things to do and need a whole code block?
map([A, B, C]) do x
    if x < 0 && iseven(x)
        return 0
    elseif x == 0
        return 1
    else
        return x
    end
end
  • For each item in that first list, apply that block of code and treat the argument coming in as x in the block
  • Look back at the Flux neural network example from a few weeks ago

More Info: https://docs.julialang.org/en/v1/manual/functions/

Example Lambda Expression in Python

  • Python has lambda expressions, as well
  • We use the keyword lambda
import numpy as np
d = np.random.uniform(size=10)
list(filter(lambda x: x>0.75, d))
  • Tell filter to evaluate x>0.75 for each item in d, treating that item as x, and keep the items for which it is true
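filter keeps every match. To get only the position of the first match, roughly what findfirst did in the Julia example, one possible sketch is below. find_first is a hypothetical helper, not a built-in; it returns a 0-based index, or None when nothing matches, and the data values are made up.

```python
def find_first(predicate, items):
    """Return the 0-based index of the first item satisfying predicate, else None."""
    return next((i for i, item in enumerate(items) if predicate(item)), None)

d = [0.10, 0.80, 0.30, 0.95]

first_index = find_first(lambda x: x > 0.75, d)  # position of the first match
all_matches = list(filter(lambda x: x > 0.75, d))  # every match
print(first_index, all_matches)
```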

List Comprehension in Python

  • List comprehension in Python works much like an inline lambda expression: the leading expression and the condition are evaluated for each item
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = [x   for x in fruits if "a" in x]
  • This is the same as:
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []
for x in fruits:
  if "a" in x:
    newlist.append(x)
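To show the connection to lambda expressions more directly, the sketch below writes one comprehension and then the same computation with map and filter. The upper-casing step is an addition made up for this illustration so that both a transformation and a condition appear.

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# Comprehension: expression first, then the loop, then the condition
with_comprehension = [x.upper() for x in fruits if "a" in x]

# Same thing spelled out with lambda expressions
with_lambdas = list(map(lambda x: x.upper(),
                        filter(lambda x: "a" in x, fruits)))

print(with_comprehension)
print(with_lambdas)
```

In the comprehension, x plays the role of the lambda parameter in both the expression and the condition.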

Tidyverse in R

Tidyverse Library

  • There is an external library in R that gives us this functionality
  • It must be installed, and its preferred structure, the tibble, is slightly different from a data frame
  • Then you get access to a pipeline operator, %>%
  • The operator itself comes from magrittr and works with any object, but the tidyverse functions are built around data frames and tibbles
  • FYI: Base R (4.1 and later) also has a native pipeline operator, |>, like in Julia

More Info: https://malooflab.ucdavis.edu/apps/tidyverse-tutorial/

Tibbles

  • In R a tibble is like a simplified data frame
  • It has certain pretty-printing capabilities
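  • With the pipeline operator we also get a . placeholder for the piped value, which is handy for subselection:
library(tidyverse)
mtcars %>% .$mpg
  • Tibbles are stricter than data frames: they never rename columns, never create row names, and never partially match column names with $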
  • Other than these ideas, it is more or less a data frame

Example:

suppressWarnings(library(dplyr))
suppressWarnings(library(tidyverse))
df <- mtcars %>% 
        filter(cyl==8) %>% 
        mutate(efficiency=mpg/wt) %>% 
        group_by(carb) %>% 
        summarize(AvgEff=mean(efficiency))
head(df)
  • Now the operations flow in order
  • No ugly nesting within nesting
  • Much more readable
  • Note: %>% must end a line rather than start the next one; otherwise R treats the statement as complete before it reaches the pipe

Another Example:

Pipeline Operations in Julia

Pros and Cons Over Tidyverse

Simple Example

f(x) = x.^2
g(x) = 3*x .+ 2
h(x) = sum(x)

# This:
h(f(g([1,2])))

# Is the same as:
[1,2] |> g |> f |> h

Multiple Arguments

  • What if your function has multiple arguments?
  • How does the pipeline operator know which argument the co-domain result should be plugged into?
  • Answer: lambda expressions
f(x,y) = x.^2 .+ y
g(x) = 3*x .+ 2
h(x) = sum(x)

h( f( g([1,2]), 5) )
# Parentheses keep the lambda body from swallowing the rest of the pipeline
[1,2] |> g |> (a -> f(a, 5)) |> h

Example With DataFrames

using DataFrames, RDatasets, Statistics

df = dataset("datasets", "mtcars") |>
    filter(row -> row.MPG > 25) |>
    (x -> transform(x, [:MPG, :WT] => (a, b) -> a ./ b)) |>
    (x -> groupby(x, :Carb)) |>
    (x -> combine(x, :MPG_WT_function => mean))
first(df, 6)   # show the first rows
  • [:Col1, :Col2] => (a,b) -> a./b means: Use Col1 of the data frame as argument a, use Col2 as arg b, then perform vectorized division between those two to make a new column
  • x->func(x, args) means map the co-domain as the x argument to func