Spring 2025

Julia Vectorized “Dot” Operations

Devectorized Operations

  • Consider multiplying two lists of numbers together, pairwise:
x = [1,2,3];
y = [4,5,6];
z = Int[]                # typed empty vector (a plain [] would be a Vector{Any})
for idx in eachindex(x)
   push!(z, x[idx]*y[idx]);
end
println("z = $z");
  • This is cumbersome and sequential
  • But the multiplications don’t depend on one another, so they could run concurrently

Vectorized Operations
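
The pairwise product from the loop version can be written with Julia's dot syntax, which applies an operator or function elementwise. A minimal sketch reusing the same x and y; the helper name double is made up for illustration:

```julia
x = [1, 2, 3]
y = [4, 5, 6]

# .* multiplies elementwise, replacing the explicit loop
z = x .* y
println("z = $z")

# Any function can be broadcast by putting a dot before the parenthesis
double(v) = 2v          # hypothetical helper
w = double.(x)

# Several dotted operations in one expression are fused into a single loop
u = x .* y .+ 1
```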

Timing Things

function unsquare(x)
  result = similar(x, Float64)      # √ returns floats, so allocate a Float64 vector
  for idx in eachindex(x)
    result[idx] = √( x[idx]^2 )
  end
  return result
end

x = 1:100000
@time unsquare(x);
@time .√( x.^2 );
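
The first @time of any function also counts JIT compilation, so a fairer comparison runs each version once before measuring. A minimal sketch using @elapsed from Base; the names unsquare_loop and unsquare_dot are made up here, and the timings will vary by machine:

```julia
# Loop version with an explicit Float64 result vector
function unsquare_loop(x)
    result = similar(x, Float64)
    for idx in eachindex(x)
        result[idx] = √(x[idx]^2)
    end
    return result
end

# Dot version of the same computation
unsquare_dot(x) = sqrt.(x .^ 2)

x = 1:100000

# Warm-up calls trigger compilation so it is not counted below
unsquare_loop(x); unsquare_dot(x);

t_loop = @elapsed unsquare_loop(x)
t_dot  = @elapsed unsquare_dot(x)
println("loop: $t_loop s, dot: $t_dot s")
```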

Estimating Pi

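# Monte Carlo: the fraction of random points in the unit square that land inside the quarter circle approaches π/4
estimate(n) = 4*sum( (rand(n).^2 .+ rand(n).^2) .< 1 ) / n
estimate(100)
est = estimate(10_000_000)
abs(est - π)     # absolute error of the estimate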
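
The one-liner allocates several n-element temporary arrays. A sketch of an equivalent loop that draws one point at a time; the name estimate_loop is made up here:

```julia
# Same estimator, without temporary arrays
function estimate_loop(n)
    hits = 0
    for _ in 1:n
        # rand() is uniform on [0, 1); count the point if x² + y² < 1
        hits += rand()^2 + rand()^2 < 1
    end
    return 4 * hits / n
end

est = estimate_loop(1_000_000)
println("estimate = $est, error = $(abs(est - π))")
```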

Vectorization With Strings

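string([1,2,3])                       # one string for the whole array: "[1, 2, 3]"
string.([1,2,3])                      # one string per element: ["1", "2", "3"]
string([1,2,3], ["a", "b", "c"])      # both arrays printed and joined into a single string
string.([1,2,3], ["a", "b", "c"])     # pairwise concatenation: ["1a", "2b", "3c"]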
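
Broadcasting also accepts scalars, which are reused for every element. A small sketch; the label text is made up for illustration:

```julia
# The scalar "item" is paired with each element of the range
labels = string.("item", 1:3)

# Any string function can be dotted the same way
shout = uppercase.(["a", "b", "c"])

# length is applied to each string rather than to the array
lens = length.(["one", "three"])
println(labels, " ", shout, " ", lens)
```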

Cool One-Liners in Julia

fib(n) = n ≤ 2 ? one(n) : fib(n-1) + fib(n-2)     # Fibonacci number
dist(X,Y) = .√sum((X-Y).^2, dims=2)               # Euclidean distance between corresponding rows of X and Y
norm(X) = X./sum(X, dims=2)                       # Normalize each row vector in a matrix
avglag(X) = sum(X[2:end] - X[1:end-1], dims=1) / (size(X)[1]-1)  # Average lag between numbers in a list

# Here's a cooler estimator for pi (Bailey-Borwein-Plouffe)
# 16.0^k keeps the power in floating point; the integer 16^k overflows for k ≥ 16
bbpEstimate(n) = sum([(1/16.0^k) * sum([4,-2,-1,-1] ./ (8*k .+ [1,4,5,6]))  for k in 0:n])
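
To see what the first two one-liners return, here is a small usage sketch on made-up inputs. The definitions are repeated so the block runs on its own, and dist is written with sqrt. instead of .√, which computes the same thing:

```julia
fib(n) = n ≤ 2 ? one(n) : fib(n-1) + fib(n-2)
dist(X, Y) = sqrt.(sum((X - Y).^2, dims=2))

# Broadcasting fib over a range gives the first ten Fibonacci numbers
fibs = fib.(1:10)

# Two 2×2 matrices; each row is treated as a point
X = [0 0; 1 1]
Y = [3 4; 1 1]
d = dist(X, Y)        # 2×1 matrix of row-wise distances

println(fibs)
println(vec(d))
```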

The DataFrames Package

Install DataFrames and CSV

  • There is an external package, DataFrames, that lets you manipulate data frames as in R and pandas
  • There is also a package for reading and writing CSV files (CSV), and a generic one for delimited files (DelimitedFiles)
  • Also, for remote calls we’ll need the HTTP package
  • Install in the usual way
using Pkg;
Pkg.add(["DataFrames", "CSV", "DelimitedFiles", "HTTP"]);
using DataFrames, CSV, DelimitedFiles, HTTP;

Reading a CSV File

  • Assume there’s a file called foo.csv:
foo = DataFrame(CSV.File("foo.csv"))

Reading a CSV from the web

# Make a function to simplify this:
read_remote_csv(url) = DataFrame(CSV.File(HTTP.get(url).body));
df = read_remote_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/refs/heads/master/iris.csv")

Accessing Parts of DataFrames

show(first(df, 5))
names(df)
first(df.sepal_length, 7)
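
Beyond first and names, rows and columns can be selected by indexing. A minimal sketch on a small made-up table standing in for the iris data (the values are hypothetical), assuming DataFrames is installed:

```julia
using DataFrames

# Small hypothetical table with two of the iris columns
df = DataFrame(sepal_length = [5.1, 4.9, 6.3],
               species      = ["setosa", "setosa", "virginica"])

dims  = size(df)                  # (rows, columns)
top   = df[1:2, :]                # first two rows, all columns
col   = df[:, :sepal_length]      # a copy of one column
colv  = df[!, :sepal_length]      # the same column without copying

# Rows selected with a dotted comparison
setosa = df[df.species .== "setosa", :]
println(setosa)
```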

Calling R and Python from Julia

The PyCall Package

  • Julia has an external package called PyCall that lets you call Python directly
using Pkg
Pkg.add("PyCall")    # Install, just once
  • Here’s some documentation for it:

https://github.com/JuliaPy/PyCall.jl

Using the Built-In Python

  • By default Julia maintains its own version of Python, not your system Python
  • So you won’t necessarily have access to Python packages you’ve already installed
  • You can install Python packages with Conda
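ENV["PYTHON"] = ""      # force the private Conda Python (on Linux, PyCall otherwise defaults to the system python3)
Pkg.build("PyCall")
using Conda             # needs Pkg.add("Conda") first
Conda.add(["numpy", "pandas", "scipy", "matplotlib"])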

Using Your System Python

  • But you can also use your system Python:
ENV["PYTHON"] = "/opt/local/bin/python"
Pkg.build("PyCall")

Using PyCall

  • Once you have PyCall loaded, you can import any installed Python package you like with pyimport
  • Afterward, a lot of Python code can be directly called
using PyCall
np = pyimport("numpy");
x = np.random.normal(size=10, scale=2.5);
println("x = $x");
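
PyCall also lets you define Python code inline with py"..." strings. A minimal sketch that only needs the Python standard library; the function name add_one is made up for illustration:

```julia
using PyCall

# Import a module from the Python standard library
math = pyimport("math")
root = math.sqrt(16.0)          # the Python float comes back as a Julia Float64

# Define a Python function inline, then call it from Julia
py"""
def add_one(v):
    return v + 1
"""
four = py"add_one"(3)

println("root = $root, four = $four")
```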

Calling Pandas from Inside Julia

pd = pyimport("pandas");
foo = pd.read_csv("foo.csv")
get(foo, "Name")     # item lookup, i.e. Python's foo["Name"]; assumes foo.csv has a Name column

The RCall Package

The R Environment

  • After loading RCall, you can drop into an R console from inside Julia by typing $ at the Julia prompt
  • You get back to the Julia prompt by pressing backspace (Windows) or delete (macOS)
  • You can copy variables into the R environment from Julia using @rput
using RCall
x = 3.1415;
@rput x;
  • You can copy variables out of the R environment into Julia using @rget
@rget y;     # assumes y already exists in the R session
println("y = $y");

Using R Evaluation Strings

  • You can make a direct call using R strings:
x = R"rnorm(10)"
  • But results from R calls are special RObjects
  • If you want to convert R structures into Julia structures, use rcopy
z = rcopy(R"rnorm(10)")
  • Or convert in the other direction with robject
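
The two conversions can be combined into a round trip. A minimal sketch, assuming RCall and a working R installation; the vector values are made up:

```julia
using RCall

v = [1.0, 2.0, 3.0]
rv = robject(v)                 # Julia vector -> R numeric vector (an RObject)
back = rcopy(rv)                # RObject -> Julia vector again

# $v interpolates a Julia value into the R string
total = rcopy(R"sum($v)")
println("back = $back, total = $total")
```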

Multi-Line R Calls

rdf = robject(df);
@rput rdf;
R"""
  library(ggplot2)
  ggplot(rdf, aes(x=sepal_length, y=sepal_width, fill=species)) +
    geom_point(size=4, shape=21, color="black") +
    xlab("Sepal Length of Iris") + ylab("Sepal Width of Iris")
  """

Importing and Calling Directly

@rimport base as rbase
rbase.sum(df.sepal_length)
rcall(:summary, rdf)