Why Julia?

A gentle pitch

Jose Storopoli, PhD
#| echo: false
#| output: false
ENV["PYCALL_JL_RUNTIME_PYTHON"] = Sys.which("python")
using PyCall

Agenda



  1. speed
  2. ease-of-use
  3. composability

What I Assume?



  • Python background
  • scientific computing background

So let’s dive in?

Julia is past beyond “experimental”

Speed


Julia is fast!


Two examples:

  • Data Wrangling: pandas versus DataFrames.jl
  • ODE solving: scipy versus DifferentialEquations.jl

Benchmarking — Data Wrangling

Common data wrangling scenario doing “split-apply-combine” operations.

  • 10,000 observations
  • 1 categorical variable x \(\in \{\mathrm{A}, \mathrm{B}, \mathrm{C}, \mathrm{D}\}\)
  • 2 continuous variables:
    • y \(\in [0, 1]\)
    • z \(\text{Normal}(0, 1)\)

Benchmarking — Data Wrangling (Python)

using BenchmarkTools
py"""
import pandas as pd
import numpy as np

n = 10000

df = pd.DataFrame({'x': np.random.choice(['A', 'B', 'C', 'D'], n, replace=True),
                   'y': np.random.randn(n),
                   'z': np.random.rand(n)})
"""
@btime py"df.groupby('x').agg({'y': 'median', 'z': 'mean'})";

Benchmarking — Data Wrangling (Julia)

using Random
using DataFrames
using BenchmarkTools
using Chain
Random.seed!(123)

n = 10_000

df = DataFrame(
    x=rand('A':'D', n),
    y=rand(n),
    z=randn(n),
)

@btime @chain $df begin
    groupby(:x)
    combine(:y => median, :z => mean)
end;

Benchmarking — ODE Solver

Second order non-linear ODE example with a simple pendulum

\[ \begin{align*} &\dot{\theta} = d{\theta} \\ &\dot{d\theta} = - \frac{g}{L}{\sin(\theta)} \end{align*} \]

Benchmarking — ODE Solver (Julia)

using DifferentialEquations

# Constants
const g = 9.81
L = 1.0

# Initial Conditions
u₀ = [0, π/2]
tspan = (0.0, 6.3)

# Define the problem
function simplependulum(du, u, p, t)
    θ, dθ = u
    du[1] = dθ
    du[2] = -(g/L)*sin(θ)
end

# Pass to solvers
prob = ODEProblem(simplependulum, u₀, tspan)
# RK 4/5th order solver (Tsitouras)
@btime solve(prob, Tsit5(); saveat=range(tspan...; length=1_000));

Benchmarking — ODE Solver (Python)

py"""
import numpy as np
from scipy.integrate import odeint

# Constants
g = 9.81
L = 1.0

# Initial Conditions
u0 = [0, np.pi/2]
tspan = np.linspace(0.0, 6.3, 1000)

def simplependulum(u, t, g, L):
    theta, dtheta = u
    dydt = [dtheta, -(g/L)*np.sin(theta)]
    return dydt
"""

# RK 4/5th order solver (Dormand-Prince)
@btime py"odeint(simplependulum, u0, tspan, args=(g, L))";

Why Julia is so Fast?

LLVM

  • just-in-time compilation for the LLVM compiler
  • exposes everything in intermediate representation code
  • then LLVM does what does best: OPTIMIZE
  • including for-loops

Why Julia is so Fast? — LLVM code

#| output-location: slide
using Statistics: mean
@code_llvm mean(1:10)

Ease of Use

The syntax is quite similar to Python.

But, no indentation and every keyword needs an end.

Julia:

for i in 1:10
    println(i)
end

Python:

for i in range(10):
    print(i)

It’s Julia all the way down

If you need to find something just use the @which macro on a type or a function signature.


@which DataFrame # type


@which mean(1:10) # function signature.

Composability

  • It is very easy to create new packages that have types and functions.

  • You can extend other package’s functions, including Julia’s Base standard library to you new types.

  • And you can also create new functions for other Package’s types.

Composability — Example with Point

#| output: false
struct Point
  x::Float64
  y::Float64
end

function Base.:+(x::Point, y::Point)
  return Point(x.x + y.x, x.y + y.y)
end

function distance(x::Point, y::Point)
  return sqrt( (x.x - y.x)^2 + (x.y - y.y)^2 )
end

p1 = Point(1, 1); p2 = Point(2, 2)
p1 + p2
distance(p1, p2)

Composability — Example with Autodiff

Suppose you are creating a new sort of graph structure that allows for differentiation and integration, i.e you can take gradients, Jacobians, Hessians and so on.


Imagine having to code the whole API in libtorch (PyTorch C++ backend). Including:

  • types
  • constructors
  • linear algebra functions
  • autodiff rules

And in the end you can only use PyTorch. You would have to do the whole thing again for JAX or any other autodiff backend.

Composability — Example with Autodiff (Julia)

Now let’s see how we do this in Julia?

  • We can create a package DifferentialGraph.jl.
  • Add ChainRulesCore.jl as a dependency.
  • Create forward- and reverse-mode derivative rules: rrules or frules

Now we can use you differential graphs with all of these backends:

Composability — Examples from the Julia Ecosystem

My Pitch

  • It is fast.
  • It is easy to use.
  • Learning the basics of Julia will make your life so much easier in all other packages. You don’t need to learn specific package syntax to be effective in using a certain package.
  • A bliss to install in Windows, Mac, and Linux (even in clusters).
  • Very good community, check the discourse.
  • Very “nerdy”, “mathy”, and “geeky” userbase.
  • If you are creating new stuff, like research or algorithms, you don’t want to have to stumble upon FORTRAN or C code (scipy, numpy, pytorch etc.). In Julia everything is in Julia.
  • You can easily mix-and-match types and functions from different packages, as you saw in the previous slide.
  • Good language interop:

It’s not all rainbows

  • Hard to onboard people. Sometimes they don’t want to learn new stuff (I mean we still have FORTRAN around …).
  • Not widely used in the marketplace (but tons of academic usage).
  • Some package ecosystems are not mature enough, e.g. survival analysis. But, differential equations is way more mature than other scientific computing languages.
  • In my point of view, Julia’s strength is in scientific computing. For all other things, you might not have additional benefits.

Some nice packages

Conclusions

  • Julia is pretty darn awesome.
  • Easy to get going, and you can always make it faster by just optimizing your Julia code.
  • No need to drop down to C++.
  • Buuuut it can’t beat Python at deep learning.
  • Otherwise, it’s worth a try.
  • Godspeed to you.

Packages Used

#| echo: false
using Pkg
deps = [pair.second for pair in Pkg.dependencies()]
deps = filter(p -> p.is_direct_dep, deps)
deps = filter(p -> !isnothing(p.version), deps)
list = ["$(p.name) $(p.version)" for p in deps]
sort!(list)
println("Julia: $(VERSION)")
println(join(list, '\n'))
#| echo: false
py"""
import sys
import numpy as np
import scipy
import pandas as pd

print(f"Python: {sys.version}")
print(f"numpy: {np.__version__}")
print(f"scipy: {scipy.__version__}")
print(f"pandas: {pd.__version__}")
"""

System Information

#| echo: false
versioninfo()