# Why Julia?

A gentle pitch

Jose Storopoli, PhD
```{julia}
#| echo: false
#| output: false
ENV["PYCALL_JL_RUNTIME_PYTHON"] = Sys.which("python")
using PyCall
```

## Agenda

1. Speed
2. Ease of use
3. Composability

## What Do I Assume?

- A Python background
- A scientific computing background

## Speed

Julia is fast!

Two examples:

- Data wrangling: pandas versus DataFrames.jl
- ODE solving: SciPy versus DifferentialEquations.jl
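All of the timings below come from BenchmarkTools.jl's `@btime` macro. As a minimal sketch of how it is used (assuming BenchmarkTools.jl is installed; the vector `v` here is just a toy example, not part of the benchmarks):

```julia
using BenchmarkTools

v = rand(1_000)

# @btime runs the expression many times and reports the fastest run;
# interpolating globals with $ keeps their lookup out of the timing.
@btime sum($v);
```

Without the `$` interpolation, `v` would be treated as an untyped global and the measurement would include dynamic-dispatch overhead that has nothing to do with `sum` itself.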

## Benchmarking — Data Wrangling

A common data wrangling scenario: “split-apply-combine” operations.

- 10,000 observations
- 1 categorical variable $x \in \{\mathrm{A}, \mathrm{B}, \mathrm{C}, \mathrm{D}\}$
- 2 continuous variables:
  - $y \in [0, 1]$
  - $z \sim \text{Normal}(0, 1)$
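As a quick refresher, the “split-apply-combine” pattern this scenario exercises can be sketched in plain Julia (a minimal illustration of the idea only, not the benchmarked code):

```julia
using Statistics

# split-apply-combine by hand: split rows by the categorical x,
# apply a summary function to each group, combine into one result.
n = 10_000
x = rand('A':'D', n)
z = randn(n)

groups = Dict{Char,Vector{Float64}}()          # split
for (xi, zi) in zip(x, z)
    push!(get!(groups, xi, Float64[]), zi)
end

summaries = Dict(k => mean(v) for (k, v) in groups)  # apply + combine
```

DataFrames.jl and pandas implement exactly this pattern, but with optimized grouping machinery instead of a hand-rolled `Dict`.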

## Benchmarking — Data Wrangling (Python)

```{julia}
using BenchmarkTools

py"""
import pandas as pd
import numpy as np

n = 10000

df = pd.DataFrame({'x': np.random.choice(['A', 'B', 'C', 'D'], n, replace=True),
                   'y': np.random.rand(n),
                   'z': np.random.randn(n)})
"""

@btime py"df.groupby('x').agg({'y': 'median', 'z': 'mean'})";
```

## Benchmarking — Data Wrangling (Julia)

```{julia}
using Random
using Statistics
using DataFrames
using BenchmarkTools
using Chain
Random.seed!(123)

n = 10_000

df = DataFrame(
    x=rand('A':'D', n),
    y=rand(n),
    z=randn(n),
)

@btime @chain df begin
    groupby(:x)
    combine(:y => median, :z => mean)
end;
```

## Benchmarking — ODE Solver

A second-order non-linear ODE example: a simple pendulum.

$$
\begin{aligned}
\dot{\theta} &= d\theta \\
\dot{d\theta} &= -\frac{g}{L} \sin(\theta)
\end{aligned}
$$

## Benchmarking — ODE Solver (Julia)

```{julia}
using DifferentialEquations

# Constants
const g = 9.81
L = 1.0

# Initial Conditions
u₀ = [0, π/2]
tspan = (0.0, 6.3)

# Define the problem
function simplependulum(du, u, p, t)
    θ, dθ = u
    du[1] = dθ
    du[2] = -(g/L)*sin(θ)
end

# Pass to solvers
prob = ODEProblem(simplependulum, u₀, tspan)

# RK 4/5th order solver (Tsitouras)
@btime solve(prob, Tsit5(); saveat=range(tspan...; length=1_000));
```

## Benchmarking — ODE Solver (Python)

```{julia}
py"""
import numpy as np
from scipy.integrate import odeint

# Constants
g = 9.81
L = 1.0

# Initial Conditions
u0 = [0, np.pi/2]
tspan = np.linspace(0.0, 6.3, 1000)

def simplependulum(u, t, g, L):
    theta, dtheta = u
    dydt = [dtheta, -(g/L)*np.sin(theta)]
    return dydt
"""

# LSODA solver (odeint's default)
@btime py"odeint(simplependulum, u0, tspan, args=(g, L))";
```

## Why is Julia so Fast? — LLVM

- Just-in-time (JIT) compilation for the LLVM compiler framework
- Exposes everything as LLVM intermediate representation code
- Then LLVM does what it does best: **optimize**
- Including for-loops

## Why is Julia so Fast? — LLVM code

```{julia}
#| output-location: slide
using Statistics: mean
@code_llvm mean(1:10)
```

## Ease of Use

The syntax is quite similar to Python's. But there is no significant indentation, and every block keyword needs a closing `end`.

Julia:

```julia
for i in 1:10
    println(i)
end
```

Python:

```python
for i in range(10):
    print(i)
```

## It’s Julia all the way down

If you need to find where something is defined, just use the `@which` macro on a type or a function signature.

```{julia}
@which DataFrame  # type
```

```{julia}
@which mean(1:10)  # function signature
```

## Composability

- It is very easy to create new packages that have types and functions.
- You can extend other packages’ functions, including Julia’s `Base` standard library, to your new types.
- And you can also create new functions for other packages’ types.

## Composability — Example with `Point`

```{julia}
#| output: false
struct Point
    x::Float64
    y::Float64
end

function Base.:+(x::Point, y::Point)
    return Point(x.x + y.x, x.y + y.y)
end

function distance(x::Point, y::Point)
    return sqrt((x.x - y.x)^2 + (x.y - y.y)^2)
end

p1 = Point(1, 1)
p2 = Point(2, 2)

p1 + p2
distance(p1, p2)
```

## Composability — Example with Autodiff

Suppose you are creating a new sort of graph structure that allows for differentiation and integration, i.e. you can take gradients, Jacobians, Hessians, and so on.

Imagine having to code the whole API in libtorch (PyTorch’s C++ backend), including:

- types
- constructors
- linear algebra functions
- autodiff rules

And in the end, it could only be used from PyTorch. You would have to do the whole thing again for JAX or any other autodiff backend.

## Composability — Example with Autodiff (Julia)

Now, how would we do this in Julia?

- Create a package, DifferentialGraph.jl.
- Add ChainRulesCore.jl as a dependency.
- Create forward- and reverse-mode derivative rules: `frule`s or `rrule`s.

Now we can use our differential graphs with all of these backends:

## Composability — Examples from the Julia Ecosystem

## My Pitch

- It is fast.
- It is easy to use.
- Learning the basics of Julia will make your life much easier in all other packages: you don’t need to learn package-specific syntax to be effective with a certain package.
- A bliss to install on Windows, Mac, and Linux (even on clusters).
- Very good community; check the Discourse forum.
- Very “nerdy”, “mathy”, and “geeky” user base.
- If you are creating new stuff, like research or algorithms, you don’t want to stumble upon FORTRAN or C code (SciPy, NumPy, PyTorch, etc.). In Julia, everything is in Julia.
- You can easily mix and match types and functions from different packages, as you saw in the previous slides.
- Good language interop:

## It’s not all rainbows

- Hard to onboard people.
  Sometimes they don’t want to learn new stuff (I mean, we still have FORTRAN around…).
- Not widely used in the marketplace (but tons of academic usage).
- Some package ecosystems are not mature enough, e.g. survival analysis. But differential equations support is far more mature than in other scientific computing languages.
- In my view, Julia’s strength is scientific computing. For everything else, you might not see additional benefits.

## Some nice packages

## Conclusions

- Julia is pretty darn awesome.
- Easy to get going, and you can always make it faster by just optimizing your Julia code.
- No need to drop down to C++.
- Buuuut it can’t beat Python at deep learning.
- Otherwise, it’s worth a try.
- Godspeed to you.

## Packages Used

```{julia}
#| echo: false
using Pkg
deps = [pair.second for pair in Pkg.dependencies()]
deps = filter(p -> p.is_direct_dep, deps)
deps = filter(p -> !isnothing(p.version), deps)
list = ["$(p.name) $(p.version)" for p in deps]
sort!(list)
println("Julia: $(VERSION)")
println(join(list, '\n'))
```
```{julia}
#| echo: false
py"""
import sys
import numpy as np
import scipy
import pandas as pd

print(f"Python: {sys.version}")
print(f"numpy: {np.__version__}")
print(f"scipy: {scipy.__version__}")
print(f"pandas: {pd.__version__}")
"""
```

## System Information

```{julia}
#| echo: false
versioninfo()
```