Why Julia?

A gentle pitch

Jose Storopoli, PhD

Agenda



  1. speed
  2. ease-of-use
  3. composability

What Do I Assume?



  • Python background
  • scientific computing background

So let’s dive in!

Julia is well past the “experimental” stage

Speed


Julia is fast!


Two examples:

  • Data Wrangling: pandas versus DataFrames.jl
  • ODE solving: scipy versus DifferentialEquations.jl

Benchmarking — Data Wrangling

A common data-wrangling scenario: “split-apply-combine” operations.

  • 10,000 observations
  • 1 categorical variable x \(\in \{\mathrm{A}, \mathrm{B}, \mathrm{C}, \mathrm{D}\}\)
  • 2 continuous variables:
    • y \(\in [0, 1]\)
    • z \(\sim \text{Normal}(0, 1)\)

Benchmarking — Data Wrangling (Python)

using BenchmarkTools
using PyCall
py"""
import pandas as pd
import numpy as np

n = 10000

df = pd.DataFrame({'x': np.random.choice(['A', 'B', 'C', 'D'], n, replace=True),
                   'y': np.random.randn(n),
                   'z': np.random.rand(n)})
"""
@btime py"df.groupby('x').agg({'y': 'median', 'z': 'mean'})";
  1.435 ms (3 allocations: 144 bytes)

Benchmarking — Data Wrangling (Julia)

using Random
using DataFrames
using BenchmarkTools
using Chain
Random.seed!(123)

n = 10_000

df = DataFrame(
    x=rand('A':'D', n),
    y=rand(n),
    z=randn(n),
)

@btime @chain $df begin
    groupby(:x)
    combine(:y => median, :z => mean)
end;
  277.088 μs (433 allocations: 540.63 KiB)

Benchmarking — ODE Solver

A second-order nonlinear ODE example: the simple pendulum.

\[ \begin{align*} \dot{\theta} &= d\theta \\ \dot{d\theta} &= -\frac{g}{L}\sin(\theta) \end{align*} \]

Benchmarking — ODE Solver (Julia)

using DifferentialEquations

# Constants
const g = 9.81
const L = 1.0

# Initial Conditions
u₀ = [0, π/2]
tspan = (0.0, 6.3)

# Define the problem
function simplependulum(du, u, p, t)
    θ, dθ = u
    du[1] = dθ
    du[2] = -(g / L) * sin(θ)
end

# Pass to solvers
prob = ODEProblem(simplependulum, u₀, tspan)
# Tsitouras 5/4 Runge-Kutta solver
@btime solve(prob, Tsit5(); saveat=range(tspan...; length=1_000));
  165.100 μs (1840 allocations: 139.64 KiB)

Benchmarking — ODE Solver (Python)

py"""
import numpy as np
from scipy.integrate import odeint

# Constants
g = 9.81
L = 1.0

# Initial Conditions
u0 = [0, np.pi/2]
tspan = np.linspace(0.0, 6.3, 1000)

def simplependulum(u, t, g, L):
    theta, dtheta = u
    dydt = [dtheta, -(g/L)*np.sin(theta)]
    return dydt
"""

# odeint wraps ODEPACK's LSODA solver (adaptive Adams/BDF)
@btime py"odeint(simplependulum, u0, tspan, args=(g, L))";
  1.506 ms (29 allocations: 17.05 KiB)

Why Is Julia so Fast?

LLVM

  • just-in-time (JIT) compilation targeting the LLVM compiler framework
  • exposes everything as LLVM intermediate representation (IR) code
  • then LLVM does what it does best: OPTIMIZE
  • including for-loops
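As a minimal illustration (the function name mysum is just for this example), a plain for-loop sum compiles down to tight, often SIMD-vectorized, machine code; you can inspect the result with @code_llvm:

```julia
# A naive for-loop sum: no vectorization tricks needed on the user's side
function mysum(v::Vector{Float64})
    s = 0.0
    for x in v
        s += x
    end
    return s
end

mysum(collect(1.0:100.0))  # 5050.0

# Inspect the optimized LLVM IR that the JIT generates:
# @code_llvm mysum(collect(1.0:100.0))
```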

Why Is Julia so Fast? — LLVM code

using Statistics: mean
@code_llvm mean(1:10)

Why Is Julia so Fast? — LLVM code

;  @ /opt/hostedtoolcache/julia/1.10.3/x64/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:195 within `mean`
define double @julia_mean_2080([2 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %0) #0 {
top:
;  @ /opt/hostedtoolcache/julia/1.10.3/x64/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:196 within `mean`
; ┌ @ range.jl:672 within `isempty`
; │┌ @ range.jl:834 within `first`
; ││┌ @ Base.jl:37 within `getproperty`
     %1 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 0
; │└└
; │┌ @ range.jl:839 within `last`
; ││┌ @ Base.jl:37 within `getproperty`
     %2 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 1
; │└└
; │┌ @ operators.jl:378 within `>`
; ││┌ @ int.jl:83 within `<`
     %unbox = load i64, i64* %2, align 8
     %unbox1 = load i64, i64* %1, align 8
     %.not = icmp slt i64 %unbox, %unbox1
; └└└
  %3 = add i64 %unbox, %unbox1
  %4 = sitofp i64 %3 to double
  %5 = fmul double %4, 5.000000e-01
  %common.ret.op = select i1 %.not, double 0x7FF8000000000000, double %5
;  @ /opt/hostedtoolcache/julia/1.10.3/x64/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl within `mean`
  ret double %common.ret.op
}

Ease of Use

The syntax is quite similar to Python’s.

But indentation is not significant, and every block-opening keyword is closed with an end.

Julia:

for i in 1:10
    println(i)
end

Python:

for i in range(10):
    print(i)
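The same pattern holds for every block keyword. A small hypothetical sketch (classify is just an example name):

```julia
# `function`, `if`/`elseif`/`else`, and `while` all close with `end`
function classify(n)
    if n > 0
        return "positive"
    elseif n < 0
        return "negative"
    else
        return "zero"
    end
end

i = 1
while i < 3
    global i += 1
end

classify(-3)  # "negative"
```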

It’s Julia all the way down

If you need to find where something is defined, just use the @which macro on a type or a function call.


@which DataFrame # type
DataFrames


@which mean(1:10) # function signature.
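For example, with multiple dispatch you can check which method a particular call resolves to (the function f here is hypothetical):

```julia
# Two methods of the same hypothetical function
f(x::Int) = x + 1
f(x::Float64) = 2x

@which f(1)    # points at the f(x::Int64) method
@which f(1.0)  # points at the f(x::Float64) method
```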

Composability

  • It is very easy to create new packages with their own types and functions.

  • You can extend other packages’ functions, including Julia’s Base standard library, to work with your new types.

  • And you can also create new functions for other packages’ types.

Composability — Example with Point

struct Point
  x::Float64
  y::Float64
end

function Base.:+(x::Point, y::Point)
  return Point(x.x + y.x, x.y + y.y)
end

function distance(x::Point, y::Point)
  return sqrt( (x.x - y.x)^2 + (x.y - y.y)^2 )
end

p1 = Point(1, 1); p2 = Point(2, 2)
p1 + p2
Point(3.0, 3.0)
distance(p1, p2)
1.4142135623730951
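Because we extended Base.:+ for Point, generic standard-library functions such as sum now work on Point with no extra glue code (Point and + are repeated here so the snippet is self-contained):

```julia
struct Point
    x::Float64
    y::Float64
end

Base.:+(a::Point, b::Point) = Point(a.x + b.x, a.y + b.y)

# `sum` reduces with `+`, so it composes with our new method for free
points = [Point(1, 1), Point(2, 2), Point(3, 3)]
sum(points)  # Point(6.0, 6.0)
```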

Composability — Example with Autodiff

Suppose you are creating a new kind of graph structure that supports differentiation and integration, i.e., you can take gradients, Jacobians, Hessians, and so on.


Imagine having to code the whole API in libtorch (PyTorch’s C++ backend), including:

  • types
  • constructors
  • linear algebra functions
  • autodiff rules

And in the end it would only work with PyTorch. You would have to redo the whole thing for JAX or any other autodiff backend.

Composability — Example with Autodiff (Julia)

Now let’s see how we’d do this in Julia:

  • We can create a package DifferentialGraph.jl.
  • Add ChainRulesCore.jl as a dependency.
  • Create forward- and reverse-mode derivative rules: frules and rrules
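A minimal sketch of what such a rule could look like. Here mysquare is a stand-in for one of our hypothetical graph operations, and ChainRulesCore.jl must be installed:

```julia
using ChainRulesCore

# A stand-in for an operation in our imagined DifferentialGraph.jl
mysquare(x) = x^2

# Reverse-mode rule: return the primal value and a pullback closure
function ChainRulesCore.rrule(::typeof(mysquare), x)
    y = mysquare(x)
    mysquare_pullback(ȳ) = (NoTangent(), 2x * ȳ)
    return y, mysquare_pullback
end
```

Any ChainRules-aware backend (e.g. Zygote.jl) can now differentiate through mysquare without us writing backend-specific code.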

Now we can use our differential graphs with all of these backends:

Composability — Examples from the Julia Ecosystem

My Pitch

  • It is fast.
  • It is easy to use.
  • Learning the basics of Julia makes every other package easier to use: you don’t need to learn package-specific syntax to be effective.
  • A bliss to install on Windows, Mac, and Linux (even on clusters).
  • Very good community; check the Discourse forum.
  • Very “nerdy”, “mathy”, and “geeky” userbase.
  • If you are creating new stuff, like research or algorithms, you don’t want to stumble upon FORTRAN or C code (scipy, numpy, pytorch, etc.). In Julia, everything is written in Julia.
  • You can easily mix and match types and functions from different packages, as you saw in the previous slides.
  • Good language interop:

It’s not all rainbows

  • Hard to onboard people; sometimes they don’t want to learn new stuff (I mean, we still have FORTRAN around …).
  • Not widely used in industry (but tons of academic usage).
  • Some package ecosystems are not mature enough, e.g. survival analysis. But differential equations support is far more mature than in other scientific computing languages.
  • In my view, Julia’s strength is scientific computing. For everything else, you might not see additional benefits.

Some nice packages

Conclusions

  • Julia is pretty darn awesome.
  • Easy to get going, and you can always make it faster by just optimizing your Julia code.
  • No need to drop down to C++.
  • Buuuut it can’t beat Python at deep learning.
  • Otherwise, it’s worth a try.
  • Godspeed to you.

Packages Used

Julia: 1.10.3
BenchmarkTools 1.5.0
Chain 0.6.0
DataFrames 1.6.1
DifferentialEquations 7.13.0
IJulia 1.24.2
Pkg 1.10.0
PyCall 1.96.4
Statistics 1.10.0
Python: 3.12.3 (main, Apr 10 2024, 03:36:41) [GCC 11.4.0]
numpy: 1.26.4
scipy: 1.13.0
pandas: 2.2.2

System Information

Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 4 default, 0 interactive, 2 GC (on 4 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
  LD_LIBRARY_PATH = /opt/hostedtoolcache/Python/3.12.3/x64/lib