Why Julia?

A gentle pitch

Jose Storopoli, PhD

Agenda

1. speed
2. ease-of-use
3. composability

What Do I Assume?

• Python background
• scientific computing background

Speed

Julia is fast!

Two examples:

• Data Wrangling: pandas versus DataFrames.jl
• ODE solving: scipy versus DifferentialEquations.jl

Benchmarking — Data Wrangling

A common data-wrangling scenario: “split-apply-combine” operations.

• 10,000 observations
• 1 categorical variable x $\in \{\mathrm{A}, \mathrm{B}, \mathrm{C}, \mathrm{D}\}$
• 2 continuous variables:
• y $\in [0, 1]$
• z $\sim \text{Normal}(0, 1)$

Benchmarking — Data Wrangling (Python)

using BenchmarkTools
using PyCall

py"""
import pandas as pd
import numpy as np

n = 10000

df = pd.DataFrame({
    'x': np.random.choice(['A', 'B', 'C', 'D'], n, replace=True),
    'y': np.random.rand(n),
    'z': np.random.randn(n),
})
"""

@btime py"df.groupby('x').agg({'y': 'median', 'z': 'mean'})";
  755.016 μs (3 allocations: 144 bytes)

Benchmarking — Data Wrangling (Julia)

using Random
using Statistics: mean, median
using DataFrames
using BenchmarkTools
using Chain
Random.seed!(123)

n = 10_000

df = DataFrame(
    x=rand('A':'D', n),
    y=rand(n),
    z=randn(n),
)

@btime @chain $df begin
    groupby(:x)
    combine(:y => median, :z => mean)
end;
  170.326 μs (500 allocations: 466.34 KiB)

Benchmarking — ODE Solver

A second-order non-linear ODE example: the simple pendulum.

\begin{align*} \dot{\theta} &= d\theta \\ \dot{d\theta} &= -\frac{g}{L} \sin(\theta) \end{align*}

Benchmarking — ODE Solver (Julia)

using DifferentialEquations

# Constants
const g = 9.81
const L = 1.0

# Initial Conditions
u₀ = [0, π/2]
tspan = (0.0, 6.3)

# Define the problem
function simplependulum(du, u, p, t)
    θ, dθ = u
    du[1] = dθ
    du[2] = -(g / L) * sin(θ)
end

# Pass to solvers
prob = ODEProblem(simplependulum, u₀, tspan)
# Tsit5: Tsitouras 5/4 Runge-Kutta method
@btime solve(prob, Tsit5(); saveat=range(tspan...; length=1_000));
  86.747 μs (1838 allocations: 139.62 KiB)

Benchmarking — ODE Solver (Python)

py"""
import numpy as np
from scipy.integrate import odeint

# Constants
g = 9.81
L = 1.0

# Initial Conditions
u0 = [0, np.pi/2]
tspan = np.linspace(0.0, 6.3, 1000)

def simplependulum(u, t, g, L):
    theta, dtheta = u
    return [dtheta, -(g/L)*np.sin(theta)]
"""

# odeint wraps ODEPACK's LSODA solver (Adams/BDF, with automatic stiffness switching)
@btime py"odeint(simplependulum, u0, tspan, args=(g, L))";
  513.592 μs (29 allocations: 17.09 KiB)

Why is Julia so Fast?

LLVM

• just-in-time compilation via the LLVM compiler framework
• exposes everything as LLVM intermediate representation (IR) code
• then LLVM does what it does best: OPTIMIZE
• including for-loops
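The for-loop claim is easy to check yourself: a hand-written loop compiles to tight native code and matches the built-in `sum`. A minimal sketch (`mysum` is a hypothetical name, not a standard function):

```julia
# A plain for-loop: Julia's JIT compiles this to tight native code,
# so it runs at the speed of the built-in `sum`.
function mysum(xs)
    s = zero(eltype(xs))
    for x in xs
        s += x
    end
    return s
end

mysum(1:10)  # 55, same as sum(1:10)
```

Benchmarking both with `@btime` shows comparable timings; there is no penalty for writing the loop out by hand.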

Why is Julia so Fast? — LLVM code

using Statistics: mean
@code_llvm mean(1:10)

Why is Julia so Fast? — LLVM code

;  @ /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/Statistics/src/Statistics.jl:185 within mean
define double @julia_mean_5524([2 x i64]* nocapture nonnull readonly align 8 dereferenceable(16) %0) #0 {
top:
;  @ /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/Statistics/src/Statistics.jl:186 within mean
; ┌ @ range.jl:654 within isempty
; │┌ @ range.jl:816 within first
; ││┌ @ Base.jl:38 within getproperty
%1 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 0
; │└└
; │┌ @ range.jl:821 within last
; ││┌ @ Base.jl:38 within getproperty
%2 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 1
; │└└
; │┌ @ operators.jl:382 within >
; ││┌ @ int.jl:83 within <
%3 = load i64, i64* %2, align 8
%4 = load i64, i64* %1, align 8
%.not = icmp slt i64 %3, %4
; └└└
%5 = add i64 %3, %4
%6 = sitofp i64 %5 to double
%7 = fmul double %6, 5.000000e-01
%common.ret.op = select i1 %.not, double 0x7FF8000000000000, double %7
;  @ /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/Statistics/src/Statistics.jl within mean
ret double %common.ret.op
}

Ease of Use

The syntax is quite similar to Python’s.

But indentation is not syntactically significant, and every block-opening keyword needs a matching end.

Julia:

for i in 1:10
    println(i)
end

Python:

for i in range(1, 11):
    print(i)

It’s Julia all the way down

If you need to find where something is defined, just use the @which macro on a type or a function call.

@which DataFrame # type
DataFrames

@which mean(1:10) # function signature.

Composability

• It is very easy to create new packages with their own types and functions.

• You can extend other packages’ functions, including Julia’s Base standard library, to work with your new types.

• And you can also define new functions for other packages’ types.

Composability — Example with Point

struct Point
    x::Float64
    y::Float64
end

function Base.:+(x::Point, y::Point)
    return Point(x.x + y.x, x.y + y.y)
end

function distance(x::Point, y::Point)
    return sqrt((x.x - y.x)^2 + (x.y - y.y)^2)
end

p1 = Point(1, 1); p2 = Point(2, 2)
p1 + p2
Point(3.0, 3.0)
distance(p1, p2)
1.4142135623730951
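Because `+` now has a method for `Point`, generic code composes with it automatically. A minimal sketch (repeating the definitions from this slide so the snippet runs standalone): the built-in `sum` reduces with `+`, so it works on a vector of points with no Point-specific code.

```julia
# Point and + as defined above (repeated so this snippet is self-contained)
struct Point
    x::Float64
    y::Float64
end

Base.:+(x::Point, y::Point) = Point(x.x + y.x, x.y + y.y)

# `sum` reduces with `+`, so multiple dispatch picks up our method:
ps = [Point(1, 1), Point(2, 2), Point(3, 3)]
sum(ps)  # Point(6.0, 6.0)
```

This is the essence of composability: one method definition, and every generic function that reduces with `+` now understands your type.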

Composability — Example with Autodiff

Suppose you are creating a new sort of graph structure that allows for differentiation and integration, i.e. you can take gradients, Jacobians, Hessians, and so on.

Imagine having to code the whole API in libtorch (PyTorch C++ backend). Including:

• types
• constructors
• linear algebra functions
• autodiff rules

And in the end you can only use PyTorch. You would have to do the whole thing again for JAX or any other autodiff backend.

Composability — Example with Autodiff (Julia)

Now let’s see how we would do this in Julia:

• We can create a package DifferentialGraph.jl.
• Add ChainRulesCore.jl as a dependency.
• Create forward- and reverse-mode derivative rules: frules (forward) or rrules (reverse).

Now we can use our differential graphs with any autodiff backend that consumes ChainRules rules, such as Zygote.jl.
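To see what a forward-mode rule actually involves, here is a toy sketch with a hand-rolled dual number in plain Julia. This is illustrative only: the real ChainRulesCore.jl API defines `frule`/`rrule` methods instead, and `Dual` here is our own hypothetical type, not the package’s.

```julia
# Minimal forward-mode sketch: a dual number carries a value and its derivative.
struct Dual
    val::Float64   # primal value
    der::Float64   # derivative (tangent)
end

# Forward-mode rules: each operation also propagates the derivative.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# d/dx [x * sin(x)] at x = 2.0, seeded with der = 1.0:
x = Dual(2.0, 1.0)
y = x * sin(x)
y.der  # sin(2) + 2cos(2) ≈ 0.0770
```

Each rule is one line of Julia; once written, every backend that knows how to consume such rules can differentiate through your type, with no per-framework reimplementation.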

My Pitch

• It is fast.
• It is easy to use.
• Learning the basics of Julia will make your life so much easier with every other package: you don’t need to learn package-specific syntax to be effective.
• A bliss to install on Windows, macOS, and Linux (even on clusters).
• Very good community; check the Discourse forum.
• Very “nerdy”, “mathy”, and “geeky” userbase.
• If you are creating new stuff, like research or algorithms, you don’t want to stumble into Fortran or C code (scipy, numpy, pytorch, etc.). In Julia, everything is Julia.
• You can easily mix-and-match types and functions from different packages, as you saw in the previous slides.
• Good language interop: C and Fortran (ccall), Python (PyCall.jl), R (RCall.jl).

It’s not all rainbows

• Hard to onboard people. Sometimes they don’t want to learn new stuff (I mean, we still have Fortran around …).
• Not widely used in industry (but tons of academic usage).
• Some package ecosystems are not mature enough, e.g. survival analysis. But the differential equations ecosystem is far more mature than those of other scientific computing languages.
• In my view, Julia’s strength is scientific computing. For everything else, it may not offer additional benefits.

Conclusions

• Julia is pretty darn awesome.
• Easy to get going, and you can always make it faster by just optimizing your Julia code.
• No need to drop down to C++.
• Buuuut it can’t beat Python at deep learning.
• Otherwise, it’s worth a try.
• Godspeed to you.

Packages Used

Julia: 1.8.5
BenchmarkTools 1.3.2
Chain 0.5.0
DataFrames 1.5.0
DifferentialEquations 7.6.0
Pkg 1.8.0
PyCall 1.95.1
Python: 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0]
numpy: 1.24.2
scipy: 1.10.0
pandas: 1.5.3

System Information

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × 12th Gen Intel(R) Core(TM) i5-1240P
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
Threads: 16 on 16 virtual cores
Environment:
JULIA_NUM_THREADS = auto