Let me demonstrate how fast Julia is. Here is a simple "groupby" operation using random data to emulate a common "split-apply-combine" data analysis operation in three languages[1]:
- Julia, using DataFrames.jl: 0.4 ms
- Python, using Pandas and NumPy: 1.76 ms
- R, using {dplyr}: 3.22 ms
Here is Julia:
using Random
using StatsBase
using DataFrames
using BenchmarkTools
using Chain
Random.seed!(123)
n = 10_000
df = DataFrame(
x=sample(["A", "B", "C", "D"], n, replace=true),
y=rand(n),
z=randn(n),
)
@btime @chain $df begin # passing `df` as reference so the compiler cannot optimize
groupby(:x)
combine(:y => median, :z => mean)
end
Here is Python:
import pandas as pd
import numpy as np
n = 10000
df = pd.DataFrame({'x': np.random.choice(['A', 'B', 'C', 'D'], n, replace=True),
                   'y': np.random.rand(n),    # uniform, to match the Julia and R versions
                   'z': np.random.randn(n)})
%timeit df.groupby('x').agg({'y': 'median', 'z': 'mean'})
Here is R:
library(dplyr)
n <- 10e3
df <- tibble(
x = sample(c("A", "B", "C", "D"), n, replace = TRUE),
y = runif(n),
z = rnorm(n)
)
bench::mark(
df %>%
group_by(x) %>%
summarize(
median(y),
mean(z)
)
)
So clearly Julia is the winner here, being more than 4x faster than Python and about 8x faster than R. Also note that Pandas (along with NumPy) and {dplyr} are all written in C or C++. Additionally, I didn't let Julia cheat by allowing the compiler to optimize for df: that is why I passed df as a reference with $df. So, I guess this is a fair comparison.
What is most striking is that Julia can be as fast as C (and faster than Java in some applications) while having a very simple and intelligible syntax. This combination of speed and ease of use is how Julia's creators address what they call the "two language problem". The "two language problem" is a very typical situation in scientific computing: a researcher or computer scientist devises an algorithm or a solution that they prototype in an easy-to-code language (like Python) and, if it works, re-implement in a fast but hard-to-code language (C or FORTRAN). Thus, we have two languages involved in the process of developing a new solution: one that is easy to prototype in but not suited for implementation (mostly because it is slow), and another that is not so easy to code in (and, consequently, not easy to prototype in) but suited for implementation (mostly because it is fast). Julia eliminates such situations by being the same language in which you prototype (ease of use) and implement the solution (speed).
Also, Julia lets you use Unicode characters as variables or parameters. This means no more typing sigma or sigma_i; instead, just use σ or σᵢ as you would in mathematical notation. When you see code for an algorithm or a mathematical equation, you see a one-to-one relation between code and math. This is a powerful feature.
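As a small illustration of my own (hypothetical values), in the Julia REPL or most editors you type \sigma and press Tab to get σ, then use it like any other name:

σ = 2.5                  # typed as \sigma<TAB>
σᵢ = [2.3, 2.7, 2.4]     # typed as \sigma<TAB>\_i<TAB>
z = (σᵢ .- σ) ./ σ       # the code reads like the math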
I think that the "two language problem" and the one-to-one code and math relation are best described by one of the creators of Julia, Alan Edelman, in a TED Talk (see the video below):
I will try to exemplify the "two language problem" by showing you how I would code a simple Metropolis algorithm for a bivariate normal distribution. I would mostly prototype it in a dynamically-typed language such as R or Python, and then deploy the algorithm using a fast but hard-to-code language such as C++. This is exactly what I'll do now. The algorithm will be coded in Julia, R, C++ and Stan. There are two caveats. First, I am coding the original 1950s Metropolis version, not the 1970s Metropolis-Hastings, which implies symmetrical proposal distributions, just for the sake of the example. Second, the proposals are drawn from a uniform distribution centered on the current values ± a certain width.
Let's start with Julia, which uses the Distributions.jl package for probability distributions, along with logpdf() methods defined for all of them.
using Distributions
function metropolis(S::Int64, width::Float64, ρ::Float64;
μ_x::Float64=0.0, μ_y::Float64=0.0,
σ_x::Float64=1.0, σ_y::Float64=1.0)
binormal = MvNormal([μ_x; μ_y], [σ_x ρ; ρ σ_y]);
draws = Matrix{Float64}(undef, S, 2);
x = randn(); y = randn();
accepted = 0;
for s in 1:S
x_ = rand(Uniform(x - width, x + width));
y_ = rand(Uniform(y - width, y + width));
r = exp(logpdf(binormal, [x_, y_]) - logpdf(binormal, [x, y]));
if r > rand(Uniform())
x = x_;
y = y_;
accepted += 1;
end
@inbounds draws[s, :] = [x y];
end
println("Acceptance rate is $(accepted / S)")
return draws
end
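Just to show how this function would be called (using the same settings benchmarked later on: S = 10_000, width = 2.75 and ρ = 0.8; the variable name draws is only illustrative):

using Random
Random.seed!(123)                       # for reproducibility
draws = metropolis(10_000, 2.75, 0.8)   # S draws, proposal width, correlation ρ
size(draws)                             # (10000, 2): columns hold x and y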
Now let's go to the R version (from now on, no more fancy names like μ or σ 😭). Since this is a bivariate normal, I am using the package {mnormt}, which allows for very fast (FORTRAN code) computation of multivariate normal distributions' pdf and logpdf.
metropolis <- function(S, width,
mu_X = 0, mu_Y = 0,
sigma_X = 1, sigma_Y = 1,
rho) {
Sigma <- diag(2)
Sigma[1, 2] <- rho
Sigma[2, 1] <- rho
draws <- matrix(nrow = S, ncol = 2)
x <- rnorm(1)
y <- rnorm(1)
accepted <- 0
for (s in 1:S) {
x_ <- runif(1, x - width, x + width)
y_ <- runif(1, y - width, y + width)
r <- exp(mnormt::dmnorm(c(x_, y_), mean = c(mu_X, mu_Y), varcov = Sigma, log = TRUE) -
mnormt::dmnorm(c(x, y), mean = c(mu_X, mu_Y), varcov = Sigma, log = TRUE))
if (r > runif(1, 0, 1)) {
x <- x_
y <- y_
accepted <- accepted + 1
}
draws[s, 1] <- x
draws[s, 2] <- y
}
print(paste0("Acceptance rate is ", accepted / S))
return(draws)
}
Now, C++. Here I am using the Eigen library. Note that, since C++ is a very powerful language meant to be used as "close to the metal" as possible, I don't have any convenient predefined multivariate normal to use. So I will have to create one from scratch[2]. Ok, be ready! This is a mouthful:
#include <Eigen/Eigen>
#include <cmath>
#include <iostream>
#include <random>
using std::cout;
std::random_device rd{};
std::mt19937 gen{rd()};
// Random Number Generator Stuff
double random_normal(double mean = 0, double std = 1) {
std::normal_distribution<double> d{mean, std};
return d(gen);
};
double random_unif(double min = 0, double max = 1) {
std::uniform_real_distribution<double> d{min, max};
return d(gen);
};
// Multivariate Normal
struct Mvn {
Mvn(const Eigen::VectorXd &mu, const Eigen::MatrixXd &s)
: mean(mu), sigma(s) {}
double pdf(const Eigen::VectorXd &x) const;
double lpdf(const Eigen::VectorXd &x) const;
Eigen::VectorXd mean;
Eigen::MatrixXd sigma;
};
double Mvn::pdf(const Eigen::VectorXd &x) const {
double n = x.rows();
double sqrt2pi = std::sqrt(2 * M_PI);
double quadform = (x - mean).transpose() * sigma.inverse() * (x - mean);
double norm = std::pow(sqrt2pi, -n) * std::pow(sigma.determinant(), -0.5);
return norm * exp(-0.5 * quadform);
}
double Mvn::lpdf(const Eigen::VectorXd &x) const {
double n = x.rows();
double sqrt2pi = std::sqrt(2 * M_PI);
double quadform = (x - mean).transpose() * sigma.inverse() * (x - mean);
double norm = std::pow(sqrt2pi, -n) * std::pow(sigma.determinant(), -0.5);
return log(norm) + (-0.5 * quadform);
}
Eigen::MatrixXd metropolis(int S, double width, double mu_X = 0,
double mu_Y = 0, double sigma_X = 1,
double sigma_Y = 1, double rho = 0.8) {
Eigen::MatrixXd sigma(2, 2);
sigma << sigma_X, rho, rho, sigma_Y;
Eigen::VectorXd mean(2);
mean << mu_X, mu_Y;
Mvn binormal(mean, sigma);
Eigen::MatrixXd out(S, 2);
double x = random_normal();
double y = random_normal();
double accepted = 0;
for (int i = 0; i < S; i++) {  // fill all S rows of `out`
double xmw = x - width;
double xpw = x + width;
double ymw = y - width;
double ypw = y + width;
double x_ = random_unif(xmw, xpw);
double y_ = random_unif(ymw, ypw);
double r = std::exp(binormal.lpdf(Eigen::Vector2d(x_, y_)) -
binormal.lpdf(Eigen::Vector2d(x, y)));
if (r > random_unif()) {
x = x_;
y = y_;
accepted++;
}
out(i, 0) = x;
out(i, 1) = y;
}
cout << "Acceptance rate is " << accepted / S << '\n';
return out;
}
Note that the PDF for a multivariate normal is:

$$\text{PDF}(\boldsymbol{x}) = \frac{1}{(2\pi)^{k/2} |\boldsymbol{\Sigma}|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})\right) \tag{1}$$

where $\boldsymbol{\mu}$ is a vector of means, $k$ is the number of dimensions, $\boldsymbol{\Sigma}$ is a covariance matrix, $|\boldsymbol{\Sigma}|$ is its determinant, and $\boldsymbol{x}$ is a vector of values that the PDF is evaluated for.
SPOILER ALERT: Julia will beat this C++ Eigen implementation by being almost 100x faster. So I will try to help C++ beat Julia (😂) by making a bivariate normal class BiNormal, in order to avoid the expensive operations of inverting a covariance matrix and computing determinants in every logpdf evaluation of a proposal. Also, since we are not doing linear algebra computations, I've removed Eigen and used C++ STL's <vector>:
#include <cmath>
#include <vector>

#ifndef M_PI
#define M_PI 3.14159265358979323846 /* pi */
#endif
// Bivariate Normal
struct BiNormal {
BiNormal(const std::vector<double> &mu, const double &rho)
: mean(mu), rho(rho) {}
double pdf(const std::vector<double> &x) const;
double lpdf(const std::vector<double> &x) const;
std::vector<double> mean;
double rho;
};
double BiNormal::pdf(const std::vector<double> &x) const {
double x_ = x[0];
double y_ = x[1];
return std::exp(-((std::pow(x_, 2) - (2 * rho * x_ * y_) + std::pow(y_, 2)) /
(2 * (1 - std::pow(rho, 2))))) /
(2 * M_PI * std::sqrt(1 - std::pow(rho, 2)));
}
double BiNormal::lpdf(const std::vector<double> &x) const {
double x_ = x[0];
double y_ = x[1];
return (-((std::pow(x_, 2) - (2 * rho * x_ * y_) + std::pow(y_, 2))) /
(2 * (1 - std::pow(rho, 2)))) -
std::log(2) - std::log(M_PI) - log(std::sqrt(1 - std::pow(rho, 2)));
}
This means that I've simplified the PDF[3] from equation (1) into:

$$\text{PDF}(x, y) = \frac{1}{2 \pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right) \tag{2}$$

Since $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$, equation (2) boils down to:

$$\text{PDF}(x, y) = \frac{1}{2 \pi \sqrt{1 - \rho^2}} \exp\left(-\frac{x^2 - 2 \rho x y + y^2}{2(1 - \rho^2)}\right) \tag{3}$$

No more determinants or matrix inversions. Easy-peasy for C++.
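As a quick sanity check of my own (not part of the original benchmarks), the simplified log-PDF from equation (3) can be verified against Distributions.jl's MvNormal in Julia:

using Distributions
ρ = 0.8
# log-PDF from equation (3): standard bivariate normal with correlation ρ
lpdf_simple(x, y) = -(x^2 - 2ρ*x*y + y^2) / (2 * (1 - ρ^2)) - log(2π * sqrt(1 - ρ^2))
binormal = MvNormal([0.0, 0.0], [1.0 ρ; ρ 1.0])
lpdf_simple(0.3, -1.2) ≈ logpdf(binormal, [0.3, -1.2])   # true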
Now let's go to the last but not least: Stan. Stan is a probabilistic language for specifying probabilistic models (it does the same as Turing.jl) and also comes with a very fast C++-based MCMC sampler. Stan is a personal favorite of mine, and I have a whole graduate course on Bayesian statistics using Stan. Here's the Stan implementation:
functions {
real binormal_lpdf(real [] xy, real mu_X, real mu_Y, real sigma_X, real sigma_Y, real rho) {
real beta = rho * sigma_Y / sigma_X; real sigma = sigma_Y * sqrt(1 - square(rho));
return normal_lpdf(xy[1] | mu_X, sigma_X) +
normal_lpdf(xy[2] | mu_Y + beta * (xy[1] - mu_X), sigma);
}
matrix metropolis_rng(int S, real width,
real mu_X, real mu_Y,
real sigma_X, real sigma_Y,
real rho) {
matrix[S, 2] out; real x = normal_rng(0, 1); real y = normal_rng(0, 1); real accepted = 0;
for (s in 1:S) {
real xmw = x - width; real xpw = x + width; real ymw = y - width; real ypw = y + width;
real x_ = uniform_rng(xmw, xpw); real y_ = uniform_rng(ymw, ypw);
real r = exp(binormal_lpdf({x_, y_} | mu_X, mu_Y, sigma_X, sigma_Y, rho) -
binormal_lpdf({x , y } | mu_X, mu_Y, sigma_X, sigma_Y, rho));
if (r > uniform_rng(0, 1)) {
x = x_; y = y_; accepted += 1;
}
out[s, 1] = x; out[s, 2] = y;
}
print("Acceptance rate is ", accepted / S);
return out;
}
}
Wow, that was a lot... Now let's go to the results. I've benchmarked the R and Stan code using the {bench} and {rstan} packages, C++ using catch2, and Julia using BenchmarkTools.jl. For all benchmarks the parameters were: S = 10_000 simulations, width = 2.75 and ρ = 0.8. From fastest to slowest:
- Stan: 3.6 ms
- Julia: 6.3 ms
- C++ BiNormal: 17 ms
- C++ MvNormal: 592 ms
- R: 1.35 s, which means 1,350 ms
Conclusion: a naïve Julia implementation beats C++ (while also beating a faster, math-helped C++ implementation that uses the simplified bivariate normal PDF) and gets very close to Stan, a highly specialized probabilistic language that compiles down to and runs on C++, with lots of contributors, funding and development time invested.
Despite being blazing fast, Julia is also very easy to code in.
You can write and read code without much effort.
I think that this is the real game changer of the Julia language: the ability to define function behavior across many combinations of argument types via multiple dispatch. Multiple dispatch is a feature that allows a function or method to be dynamically dispatched based on the run-time (dynamic) type or, in the more general case, some other attribute of more than one of its arguments. This is a generalization of single-dispatch polymorphism, where a function or method call is dynamically dispatched based on the derived type of the object on which the method has been called. Multiple dispatch routes the dynamic dispatch to the implementing function or method using the combined characteristics of one or more arguments.
Most languages have single-dispatch polymorphism, which relies on the first parameter of a method (the object) to determine which method should be called. Where Julia differs is that multiple parameters are taken into account. This enables multiple definitions of similar functions that share the same initial parameter. I think that this is best explained by one of the creators of Julia, Stefan Karpinski, at JuliaCon 2019 (see the video below):
I will reproduce Karpinski's example. In the talk, Karpinski designs a structure of classes which is very common in object-oriented programming (OOP). In Julia, we don't have classes, but we have structures (struct) that are meant to be "structured data": they define the kind of information that is embedded in the structure, that is, a set of fields (aka "properties" or "attributes" in other languages); then individual instances (or "objects") can be produced, each with its own specific values for the fields defined by the structure.
We create an abstract type called Pet. Then, we proceed by creating two structs derived from Pet, each with one field name (a String). These derived structs are Dog and Cat. We also define what happens in an "encounter" with a generic function encounter() and several specific methods of meets() that will be multiple-dispatched by Julia at runtime to define the action that one Pet takes when it meets another Pet:
abstract type Pet end
struct Dog <: Pet
name::String
end
struct Cat <: Pet
name::String
end
function encounter(a::Pet, b::Pet)
verb = meets(a, b)
return println("$(a.name) meets $(b.name) and $verb")
end
meets(a::Dog, b::Dog) = "sniffs";
meets(a::Dog, b::Cat) = "chases";
meets(a::Cat, b::Dog) = "hisses";
meets(a::Cat, b::Cat) = "slinks";
Let's see what happens when we instantiate objects from Dog and Cat and call encounter on them in Julia:
fido = Dog("Fido");
rex = Dog("Rex");
whiskers = Cat("Whiskers");
spots = Cat("Spots");
encounter(fido, rex)
encounter(rex, whiskers)
encounter(spots, fido)
encounter(whiskers, spots)
Fido meets Rex and sniffs
Rex meets Whiskers and chases
Spots meets Fido and hisses
Whiskers meets Spots and slinks
It works as expected. Now let's translate this to modern C++ as literally as possible. Let's define a class Pet with a member variable name – in C++ we can do this. Then we define a base function meets(), a function encounter() for two objects of type Pet, and, finally, derived classes Dog and Cat that overload meets():
#include <iostream>
#include <string>
using std::string;
using std::cout;
class Pet {
public:
string name;
};
string meets(Pet a, Pet b) { return "FALLBACK"; } // returning `meets(a, b)` here would just call this same overload again
void encounter(Pet a, Pet b) {
string verb = meets(a, b);
cout << a.name << " meets "
<< b. name << " and " << verb << '\n';
}
class Cat : public Pet {};
class Dog : public Pet {};
string meets(Dog a, Dog b) { return "sniffs"; }
string meets(Dog a, Cat b) { return "chases"; }
string meets(Cat a, Dog b) { return "hisses"; }
string meets(Cat a, Cat b) { return "slinks"; }
Now we add a main() function to the C++ script:
int main() {
Dog fido; fido.name = "Fido";
Dog rex; rex.name = "Rex";
Cat whiskers; whiskers.name = "Whiskers";
Cat spots; spots.name = "Spots";
encounter(fido, rex);
encounter(rex, whiskers);
encounter(spots, fido);
encounter(whiskers, spots);
return 0;
}
And this is what we get:
g++ main.cpp && ./a.out
Fido meets Rex and FALLBACK
Rex meets Whiskers and FALLBACK
Spots meets Fido and FALLBACK
Whiskers meets Spots and FALLBACK
Doesn't work... 🤷🏼 C++ resolves which meets() overload to call at compile time, based on the declared (static) types of encounter()'s parameters — both are Pet — so the fallback is always chosen.
Now let's change to another nice example: creating a one-hot vector. A one-hot vector is a vector of integers in which all entries are zero (0) except for a single entry that is one (1). In machine learning, one-hot encoding is a frequently used method to deal with categorical data. Because many machine learning models need their input variables to be numeric, categorical variables need to be transformed during the preprocessing step. The example below is heavily inspired by a post from Vasily Pisarev[4].
How would we represent one-hot vectors in Julia? Simple: we create a new type OneHotVector using the struct keyword and define two fields, len and ind, which represent the OneHotVector's length and which index holds the entry 1 (i.e. which index is "hot"). Then, we define new methods of the Base functions size() and getindex() for our newly defined OneHotVector.
import Base: size, getindex
struct OneHotVector <: AbstractVector{Int}
len::Int
ind::Int
end
size(v::OneHotVector) = (v.len,)
getindex(v::OneHotVector, i::Integer) = Int(i == v.ind)
getindex (generic function with 928 methods)
Since OneHotVector is a struct derived from AbstractVector, we can use all of the methods previously defined for AbstractVector and it simply works right off the bat. Here we are constructing an Array with a list comprehension:
onehot = [OneHotVector(3, rand(1:3)) for _ in 1:4]
4-element Vector{OneHotVector}:
[1, 0, 0]
[1, 0, 0]
[1, 0, 0]
[0, 0, 1]
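As a small illustration of my own of what "works right off the bat" means, generic Base functions already do the right thing on a single OneHotVector:

v = OneHotVector(3, 2)   # length 3, "hot" index 2
sum(v)                   # 1
argmax(v)                # 2
v == [0, 1, 0]           # true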
Now I define a new function inner_sum(), which is basically a recursive dot product with a summation. Here A is something matrix-like (I did not specify any types, so you can only guess from the name), and vs is a vector of some vector-like elements. The function takes the dot product of the "matrix" with every vector-like element of vs and accumulates the results. Everything is given a generic definition without specifying any types. The generic programming here consists of the inner() call inside the loop.
using LinearAlgebra
function inner_sum(A, vs)
t = zero(eltype(A))
for v in vs
t += inner(v, A, v) # multiple dispatch!
end
return t
end
inner(v, A, w) = dot(v, A * w) # very general definition
inner (generic function with 1 method)
So, "look mom, it works":
A = rand(3, 3)
vs = [rand(3) for _ in 1:4]
inner_sum(A, vs)
2.4426500326493357
Since OneHotVector is a subtype of AbstractVector:
supertype(OneHotVector)
AbstractVector{Int64} (alias for AbstractArray{Int64, 1})
We can use inner_sum and it will do what it is supposed to do:
inner_sum(A, onehot)
1.604358084381523
But this default implementation is slow:
using BenchmarkTools
@btime inner_sum($A, $onehot);
189.279 ns (4 allocations: 320 bytes)
We can greatly optimize this procedure. Now let's redefine matrix multiplication by a OneHotVector as a simple column selection. We do this by defining a new method of the * function (the multiplication function) from Base Julia. Additionally, we also create a new optimized method of inner() for dealing with OneHotVector:
import Base: *
*(A::AbstractMatrix, v::OneHotVector) = A[:, v.ind]
inner(v::OneHotVector, A, w::OneHotVector) = A[v.ind, w.ind]
inner (generic function with 2 methods)
That's it! Simple, huh? Now let's benchmark:
@btime inner_sum($A, $onehot);
4.648 ns (0 allocations: 0 bytes)
Huge gains of speed! 🚀 With the specialized methods, inner(v, A, w) for two OneHotVectors reduces to a single element lookup A[v.ind, w.ind], with no intermediate vectors allocated.
Here are some more thoughts on why I believe Julia is the right approach to scientific computation.
Below is a very opinionated image that divides the scientific computing languages that we've discussed so far into a 2x2 diagram with two axes: Slow-Fast and Easy-Hard. I've put C++ and FORTRAN in the hard and fast quadrant. R and Python go into the easy and slow quadrant. Julia is the only language in the easy and fast quadrant. I don't know any language that would want to be hard and slow, so this quadrant is empty.
What I want to say with this image is that if you want to code fast and easy use Julia.
One other thing to note, which I find quite astonishing, is that Julia packages are all written in Julia. This does not happen in other scientific computing languages. For example, the whole {tidyverse} ecosystem of R packages is based on C++. NumPy and SciPy are a mix of FORTRAN and C. Scikit-Learn is also coded in C.
See the figure below, where I compare GitHub's "Languages" stack bar of PyTorch, TensorFlow and Flux.jl (Julia's deep learning package). This figure is what I would call "Python my a**!" 😂:
That said, Julia also has rich interoperability with other languages, ranging from the built-in ccall function (to call C and Fortran libraries) to the JuliaInterop[5] packages that connect Julia to Python, R, MATLAB, C++, and more.
Another example comes from a Julia podcast that, unfortunately, I can recall neither the name of nor who was being interviewed on it. When asked how he joined the Julia bandwagon, the guest replied something along these lines:
"Well, I was doing some crazy calculation using a library that was a wrapper to an algorithm coded in FORTRAN and I was trying to get help with a bug. I opened an issue and after 2 weeks of no reply I've dived into the FORTRAN code (despite having no knowledge of FORTRAN). There I saw a comment from the original author describing exactly the same bug that I was experiencing and saying that he would fix this in the future. The comment was dated from 1992. At the same time a colleague of mine suggested that I could try to code the algorithm in some new language called Julia. I thought 'me?! code an algorithm?!'. So, I coded the algorithm in Julia and it was faster than the FORTRAN implementation and also without the evil bug. One thing to note that it was really easy to code the algorithm in Julia."
Having stuff in different languages and wrappers can hinder further research, as you can see from this example.
As you saw from Karpinski's talk above, multiple dispatch empowers users to define their own types (if necessary) and also allows them to extend functions and types from other users to their own special uses. This results in an ecosystem that stimulates code sharing and code reuse in scientific computing that is unmatched. For instance, if I plug a differential equation from DifferentialEquations.jl into a Turing.jl model, I get a Bayesian stochastic differential equation model, e.g. a Bayesian SIR model for infectious disease. If I plug a Flux.jl neural network into a Turing.jl model, I get a Bayesian neural network! When I saw this type of code sharing I was blown away (and I still am).
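As a tiny sketch of that kind of extension (my own, reusing the Pet example from earlier; the Bird type and its behaviors are hypothetical), a user who never touched the original code can add a new type plus a couple of meets() methods, and the existing encounter() function immediately works with it:

struct Bird <: Pet        # a new Pet added by a third party
    name::String
end
meets(a::Bird, b::Cat) = "flies away";
meets(a::Cat, b::Bird) = "pounces";
tweety = Bird("Tweety");
encounter(tweety, whiskers)   # Tweety meets Whiskers and flies away
encounter(whiskers, tweety)   # Whiskers meets Tweety and pounces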
[1] Please note that I've used updated versions for all languages and packages as of April 2021: DataFrames.jl version 1.0.1, Pandas version 1.2.4, NumPy version 1.20.2, {dplyr} version 1.0.5. We did not cover R's {data.table} here. Further benchmarking information is available, for example, here: Tabular data benchmarking.
[2] Which of course I did not. The Mvn class is inspired by Iason Sarantopoulos' implementation.
[3] You can find all the math here.
[4] The post is in Russian; I've "Google Translated" it to English.
[5] Julia has a lot of interoperability between languages. Check out: PyCall.jl and JuliaPy for Python; RCall.jl for R; Cxx.jl and CxxWrap.jl for C++; Clang.jl for libclang and C; ObjectiveC.jl for Objective-C; JavaCall.jl for Java; MATLAB.jl for MATLAB; MathLink.jl for Mathematica/Wolfram Engine; OctCall.jl for GNU Octave; and ZMQ.jl for ZeroMQ.