I have an open access and open source^{1} graduate-level course on Bayesian statistics.
It is available on GitHub through the repo storopoli/Bayesian-Statistics.
I’ve taught it many times and every time was such a joy.
It is composed of:
Now and then I receive emails from someone saying that the materials helped them to understand Bayesian statistics. These kind messages really make my day, and that’s why I strive to keep the content up-to-date and relevant.
I decided to make the repository fully reproducible and testable in CI^{5} using Nix and GitHub Actions.
Here’s what I am testing on every new change to the main repository and every new pull request (PR):
All of these tests demand a highly reproducible and intricate development environment. That’s where Nix comes in. Nix can be viewed as a package manager, operating system, build tool, immutable system, and many other things.
Nix is purely functional. Everything is described as an expression/function, taking some inputs and producing deterministic outputs. This guarantees reproducible results and makes caching everything easy. Nix expressions are lazy. Anything described in Nix code will only be executed if some other expression needs its results. This is very powerful but somewhat unnatural for developers not familiar with functional programming.
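To make the ideas of determinism, caching, and laziness a bit more concrete, here is a toy sketch in Python. It is not how Nix is implemented: the `derivation` helper, the `/toy-store/` paths, and the `store` dictionary are all hypothetical names made up for illustration. The only point is that an output identified by a hash of its inputs can be cached safely, and that a build which nobody asks for never runs.

```python
import hashlib
import json

# Toy stand-in for a build store: maps an output path to a built artifact.
store = {}
# Records which builders actually ran, so laziness and caching are observable.
build_log = []


def derivation(name, inputs, builder):
    """Describe a build; nothing runs until `realise` is called."""
    # The output path depends only on the name and the inputs.
    digest = hashlib.sha256(
        json.dumps([name, inputs], sort_keys=True).encode()
    ).hexdigest()[:12]
    path = f"/toy-store/{digest}-{name}"

    def realise():
        # Build at most once per distinct set of inputs.
        if path not in store:
            build_log.append(name)
            store[path] = builder(inputs)
        return store[path]

    return path, realise


slides_path, build_slides = derivation(
    "slides",
    {"tex": "scheme-small", "src": "slides.tex"},
    lambda inputs: f"PDF from {inputs['src']}",
)
# Declared but never realised, so its builder never runs.
unused_path, build_unused = derivation(
    "unused", {"src": "other.tex"}, lambda inputs: "never built"
)

first = build_slides()
second = build_slides()  # served from the toy store, no rebuild
print(build_log)
```

Calling `build_slides()` twice runs the builder only once, and the `unused` build never appears in `build_log` because nothing requested its result.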
I enjoy Nix so much that I use it as the operating system and package manager on all of my computers. Feel free to check my setup at storopoli/flakes.
The main essence of the repository setup is the flake.nix file. A Flake is a collection of recipes (Nix derivations) that the repository provides.
From the NixOS Wiki article on Flakes:
Flakes is a feature of managing Nix packages to simplify usability and improve reproducibility of Nix installations. Flakes manages dependencies between Nix expressions, which are the primary protocols for specifying packages. Flakes implements these protocols in a consistent schema with a common set of policies for managing packages.
I use Nix Flakes not only to set up the main repository package, defined in the Flake as packages.default, which is the PDF build of the LaTeX slides, but also to set up the development environment, defined in the Flake as devShells.default, to run the latest versions of Stan and Julia/Turing.jl.
We’ll go over the Flake file in detail. First, let me show the full Flake file:
{
  description = "A basic flake with a shell";
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  inputs.flake-utils.url = "github:numtide/flake-utils";
  inputs.pre-commit-hooks.url = "github:cachix/pre-commit-hooks.nix";
  outputs = { self, nixpkgs, flake-utils, pre-commit-hooks }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
        tex = pkgs.texlive.combine {
          inherit (pkgs.texlive) scheme-small;
          inherit (pkgs.texlive) latexmk pgf pgfplots tikzsymbols biblatex beamer;
          inherit (pkgs.texlive) silence appendixnumberbeamer fira fontaxes mwe;
          inherit (pkgs.texlive) noto csquotes babel helvetic transparent;
          inherit (pkgs.texlive) xpatch hyphenat wasysym algorithm2e listings;
          inherit (pkgs.texlive) lstbayes ulem subfigure ifoddpage relsize;
          inherit (pkgs.texlive) adjustbox media9 ocgx2 biblatex-apa wasy;
        };
        julia = pkgs.julia-bin.overrideDerivation (oldAttrs: { doInstallCheck = false; });
      in
      {
        checks = {
          pre-commit-check = pre-commit-hooks.lib.${system}.run {
            src = ./.;
            hooks = {
              typos.enable = true;
            };
          };
        };
        devShells.default = pkgs.mkShell {
          packages = with pkgs;[
            bashInteractive
            # pdfpc # FIXME: broken on darwin
            typos
            cmdstan
            julia
          ];
          shellHook = ''
            export JULIA_NUM_THREADS="auto"
            export JULIA_PROJECT="turing"
            export CMDSTAN_HOME="${pkgs.cmdstan}/opt/cmdstan"
            ${self.checks.${system}.pre-commit-check.shellHook}
          '';
        };
        packages.default = pkgs.stdenvNoCC.mkDerivation rec {
          name = "slides";
          src = self;
          buildInputs = with pkgs; [
            coreutils
            tex
            gnuplot
            biber
          ];
          phases = [ "unpackPhase" "buildPhase" "installPhase" ];
          buildPhase = ''
            export PATH="${pkgs.lib.makeBinPath buildInputs}";
            cd slides
            export HOME=$(pwd)
            latexmk -pdflatex -shell-escape slides.tex
          '';
          installPhase = ''
            mkdir -p $out
            cp slides.pdf $out/
          '';
        };
      });
}
A flake is composed primarily of inputs and outputs. As inputs I have:
inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
inputs.flake-utils.url = "github:numtide/flake-utils";
inputs.pre-commit-hooks.url = "github:cachix/pre-commit-hooks.nix";
nixpkgs is responsible for providing all of the packages necessary for both packages.default and devShells.default: cmdstan, julia-bin, typos, and a bunch of small texlive LaTeX packages.
flake-utils is a set of Nix utility functions that provide syntactic sugar to make the Flake easily accessible on all platforms, such as macOS and Linux.
pre-commit-hooks is a nice Nix utility to create easy git hooks that do some checking at several steps of the git workflow. The only hook that I am using is the typos pre-commit hook, which checks all the changes in a commit for common typos and won’t let you commit successfully if you have typos: either correct them or whitelist them in the _typos.toml file.
The outputs are the bulk of the Flake file. It is a Nix function that takes all of the above as inputs and outputs a couple of things:
outputs = { self, nixpkgs, flake-utils, pre-commit-hooks }:
  flake-utils.lib.eachDefaultSystem (system: {
    checks = ...
    devShells = ...
    packages = ...
  });
checks: things that are executed/built when you run nix flake check
devShells: things that are executed/built when you run nix develop
packages: things that are executed/built when you run nix build
Let’s go over each one of the outputs that the repository Flake has.
packages – LaTeX slides
We all know that LaTeX is a pain to get working. If it builds on my machine, it definitely won’t build on yours. This is solved effortlessly in Nix. Take a look at the tex variable definition in the let ... in block:
let
  # ...
  tex = pkgs.texlive.combine {
    inherit (pkgs.texlive) scheme-small;
    inherit (pkgs.texlive) latexmk pgf pgfplots tikzsymbols biblatex beamer;
    inherit (pkgs.texlive) silence appendixnumberbeamer fira fontaxes mwe;
    inherit (pkgs.texlive) noto csquotes babel helvetic transparent;
    inherit (pkgs.texlive) xpatch hyphenat wasysym algorithm2e listings;
    inherit (pkgs.texlive) lstbayes ulem subfigure ifoddpage relsize;
    inherit (pkgs.texlive) adjustbox media9 ocgx2 biblatex-apa wasy;
  };
  # ...
in
tex is a custom instantiation of the texlive.combine derivation with some overrides to specify which CTAN packages you need to build the slides. We use tex in the packages.default Flake output:
packages.default = pkgs.stdenvNoCC.mkDerivation rec {
  name = "slides";
  src = self;
  buildInputs = with pkgs; [
    coreutils
    tex
    gnuplot
    biber
  ];
  phases = [ "unpackPhase" "buildPhase" "installPhase" ];
  buildPhase = ''
    export PATH="${pkgs.lib.makeBinPath buildInputs}";
    cd slides
    export HOME=$(pwd)
    latexmk -pdflatex -shell-escape slides.tex
  '';
  installPhase = ''
    mkdir -p $out
    cp slides.pdf $out/
  '';
};
Here we are declaring a Nix derivation with stdenvNoCC.mkDerivation; the NoCC part means that we don’t need C/C++ build tools. The src is the Flake repository itself, and I also specify the dependencies in buildInputs: I still need some fancy stuff to build my slides. Finally, I specify the several phases of the derivation. The most important part is that I cd into the slides/ directory, run latexmk in it, and copy the resulting PDF to $out, the special Nix directory that serves as the output directory for the derivation.
This is really nice because anyone with Nix installed can run:
$ nix build github:storopoli/Bayesian-Statistics
and bingo! You have my slides as a PDF built from the LaTeX files without having to clone or download the repository. Fully reproducible on any machine or architecture.
The next step is to configure GitHub Actions to run Nix and build the slides' PDF file in CI. I have two workflows for that, and they are almost identical except for the last step. The first one is build-slides.yml, which, of course, builds the slides. These are the relevant parts:
name: Build Slides
runs-on: ubuntu-latest
steps:
  - name: Checkout repository
    uses: actions/checkout@v4
  - name: Install Nix
    uses: DeterminateSystems/nix-installer-action@v8
  - name: Build Slides
    run: nix build -L
  - name: Copy result out of nix store
    run: cp -v result/slides.pdf slides.pdf
  - name: Upload Artifacts
    uses: actions/upload-artifact@v3
    with:
      name: output
      path: ./slides.pdf
      if-no-files-found: error
Here we use a set of actions to check out the repository, install Nix, build the slides with nix build (the -L flag is to have more verbose logs), copy the resulting PDF out of the Nix store, and upload it as an artifact.
The last one is release-slides.yml, which releases the slides when I publish a new tag. It is almost the same as build-slides.yml, thus I will only highlight the relevant bits:
on:
  push:
    tags:
      - "*"
# ...
  - name: Release
    uses: ncipollo/release-action@v1
    id: release
    with:
      artifacts: ./slides.pdf
The only change is the final step, where we now use a release-action that automatically publishes a release with the slides’ PDF file as one of the release artifacts. This is good since, once I achieve a milestone in the slides, I can easily tag a new version and have GitHub automatically publish a new release with the resulting PDF file attached to the release.
This is a very good workflow, both on GitHub and locally. I don’t need to install gigabytes of texlive stuff to build my slides locally. I just run nix build. Also, if someone contributes to the slides, I don’t need to check the correctness of the LaTeX code, only the content and the output PDF artifact in the resulting CI run from the PR. If it’s all good, just thank the blessed soul and merge it!
The repository has a directory called turing/, which is a Julia project with .jl files and a Project.toml that lists the Julia dependencies and appropriate compat bounds. In order to test the Turing.jl models in the Julia files, I have the following things in the Nix Flake devShell:
let
  # ...
  julia = pkgs.julia-bin.overrideDerivation (oldAttrs: { doInstallCheck = false; });
  # ...
in
# ...
devShells.default = pkgs.mkShell {
  packages = with pkgs;[
    # ...
    julia
    # ...
  ];
  shellHook = ''
    # ...
    export JULIA_NUM_THREADS="auto"
    export JULIA_PROJECT="turing"
    # ...
  '';
};
Nix devShell lets you create a development environment by adding a transparent layer on top of your standard shell environment with additional packages, hooks, and environment variables. First, in the let ... in block, I am defining a variable called julia that is the julia-bin package with the attribute doInstallCheck overridden to false. I don’t want the Nix derivation of the mkShell to run all Julia standard tests. Next, I define some environment variables in the shellHook, which, as the name implies, runs every time that I instantiate the default devShell with nix develop.
With the Nix Flake part covered, let’s check how we wrap everything in a GitHub Actions workflow file named models.yml. Again, I will only highlight the relevant parts for the Turing.jl model testing CI job:
jobs:
  test-turing:
    name: Test Turing Models
    runs-on: ubuntu-latest
    strategy:
      matrix:
        jl-file: [
            "01-predictive_checks.jl",
            # ...
            "13-model_comparison-roaches.jl",
          ]
    steps:
      # ...
      - name: Test ${{ matrix.jl-file }}
        run: |
          nix develop -L . --command bash -c "julia -e 'using Pkg; Pkg.instantiate()'"
          nix develop -L . --command bash -c "julia turing/${{ matrix.jl-file }}"
I list all the Turing.jl model Julia files in a matrix.jl-file list to define variations for each job. Next, we install the latest Julia version. Finally, we run everything in parallel using the GitHub Actions expression ${{ matrix.jl-file }}. This expands the job into N parallel jobs, where N is the length of the jl-file list. If any of these parallel jobs error out, then the whole workflow will error. Hence, we are always certain that the models are up-to-date with the latest Julia version in nixpkgs, and the latest Turing.jl dependencies.
The repository has a directory called stan/ that holds a bunch of Stan models in .stan files. These models can be used with any Stan interface, such as RStan/CmdStanR, PyStan/CmdStanPy, or Stan.jl. However, I am using CmdStan, which only needs a shell environment and Stan, with no additional dependencies like Python, R, or Julia. Additionally, nixpkgs has a cmdstan package that is well-maintained and up-to-date with the latest Stan release.
In order to test the Stan models, I have the following setup in the Nix Flake devShell:
devShells.default = pkgs.mkShell {
  packages = with pkgs;[
    # ...
    cmdstan
    # ...
  ];
  shellHook = ''
    # ...
    export CMDSTAN_HOME="${pkgs.cmdstan}/opt/cmdstan"
    # ...
  '';
};
Here I am also defining an environment variable in the shellHook, CMDSTAN_HOME, because that is useful for local development.
The Stan model testing CI job is defined in the same GitHub Actions workflow file, models.yml:
jobs:
  test-stan:
    name: Test Stan Models
    runs-on: ubuntu-latest
    strategy:
      matrix:
        stan:
          [
            { model: "01-predictive_checks-posterior", data: "coin_flip.data.json" },
            # ...
            { model: "13-model_comparison-zero_inflated-poisson", data: "roaches.data.json" },
          ]
    steps:
      # ...
      - name: Test ${{ matrix.stan.model }}
        run: |
          echo "Compiling: ${{ matrix.stan.model }}"
          nix develop -L . --command bash -c "stan stan/${{ matrix.stan.model }}"
          nix develop -L . --command bash -c "stan/${{ matrix.stan.model }} sample data file=stan/${{ matrix.stan.data }}"
Now I am using a YAML dictionary as the entry for every element in the stan YAML list, with two keys: model and data. model lists the Stan model file without the .stan extension, and data lists the JSON data file that the model needs to run. We’ll use both to run parallel jobs to test all the Stan models listed in the stan list. For that we use the following commands:
nix develop -L . --command bash -c "stan stan/${{ matrix.stan.model }}"
nix develop -L . --command bash -c "stan/${{ matrix.stan.model }} sample data file=stan/${{ matrix.stan.data }}"
This instantiates the devShells.default shell environment, and uses the stan binary provided by the cmdstan Nix package to compile the model into an executable binary. Next, we run this model executable binary in sample mode while also providing the corresponding data file with data file=.
As before, if any of these parallel jobs error out, then the whole workflow will error. Hence, we are always certain that the models are up-to-date with the latest Stan/CmdStan version in nixpkgs.
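As a rough illustration of what the matrix expansion amounts to, the sketch below builds the two shell commands for each matrix entry in plain Python. It only mimics the substitution that GitHub Actions performs; the `commands_for` helper and the `jobs` dictionary are hypothetical names, and only the two matrix entries shown in the workflow excerpt are included.

```python
# Two entries copied from the workflow's matrix; the elided ones are omitted.
stan_matrix = [
    {"model": "01-predictive_checks-posterior", "data": "coin_flip.data.json"},
    {"model": "13-model_comparison-zero_inflated-poisson", "data": "roaches.data.json"},
]


def commands_for(entry):
    """Return the compile and sample commands for one matrix entry."""
    model = f"stan/{entry['model']}"
    data = f"stan/{entry['data']}"
    prefix = "nix develop -L . --command bash -c"
    return [
        f'{prefix} "stan {model}"',
        f'{prefix} "{model} sample data file={data}"',
    ]


# One independent job per entry; in CI these jobs run in parallel.
jobs = {entry["model"]: commands_for(entry) for entry in stan_matrix}

for name, cmds in jobs.items():
    print(name)
    for cmd in cmds:
        print("  " + cmd)
```

Each key of `jobs` corresponds to one parallel CI job, and a failure in any of them fails the whole workflow.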
I am quite happy with this setup. It makes it easy to run tests in CI with GitHub Actions, while also being effortless to instantiate a development environment with Nix. If I want to get a new computer up and running, I don’t need to install a bunch of packages and go over “getting started” instructions to have all the necessary dependencies.
This setup also helps onboard new contributors since it is:
Speaking of “contributors”, if you are interested in Bayesian modeling, feel free to go over the contents of the repository storopoli/Bayesian-Statistics. Contributions are most welcome. Don’t hesitate to open an issue or pull request.
This post is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
the code is MIT-licensed and the content is Creative Commons Non-Commercial 4.0 ↩︎
I am also planning to go over the slides for every lecture in a YouTube playlist in the near future. This would make the experience complete: slides, lectures, and code. ↩︎
a probabilistic programming language and suite of MCMC samplers written in C++. It is today’s gold standard in Bayesian stats. ↩︎
is an ecosystem of Julia packages for Bayesian inference using probabilistic programming. ↩︎
CI stands for continuous integration, sometimes also known as CI/CD, continuous integration and continuous delivery. CI/CD is a wide “umbrella” term for “everything that is tested in all parts of the development cycle”, and these tests commonly take place in a cloud machine. ↩︎
Zero-cost abstractions allow you to write performant code without having to give up a single drop of convenience and expressiveness:
You want for-loops? You can have it. Generics? Yeah, why not? Data structures? Sure, keep’em coming. Async operations? You bet ya! Multi-threading? Hell yes!
To put it more formally, I like this definition from StackOverflow:
Zero Cost Abstractions means adding higher-level programming concepts, like generics, collections and so on do not come with a run-time cost, only compile time cost (the code will be slower to compile). Any operation on zero-cost abstractions is as fast as you would write out matching functionality by hand using lower-level programming concepts like for loops, counters, ifs and using raw pointers.
Here’s an analogy:
Imagine that you are going to buy a car. The salesperson offers you a fancy car, praising how easy it is to drive: you don’t need to think about RPM, clutch and stick shift, parking maneuvers, fuel type, and other shenanigans. You just turn it on and drive. However, once you take a look at the car’s data sheet, you are horrified. The car is bad in every aspect except ease of use. It has dreadful fuel consumption, atrocious safety ratings, disastrous handling, and so on…
Believe me, you wouldn’t want to own that car.
Metaphors aside, that’s exactly what professional developers^{1} and whole teams choose to use every day: unacceptably inferior tools. Tools that not only lack zero-cost abstractions, but don’t even allow you to have non-zero-cost anything!
Let’s do some Python bashing in the meantime. I know that it’s easy to bash Python, but that’s not the point. If Python wasn’t used so widely in production, I would definitely leave it alone. Don’t get me wrong, Python is the second-best language for everything^{2}.
I wish this meme was a joke, but it isn’t. A boolean is one of the simplest data types, taking only two possible values: true or false. Just grab your nearest Python REPL:
>>> from sys import getsizeof
>>> getsizeof(True)
28
The function sys.getsizeof returns the size of an object in bytes. How the hell does Python need 28 bytes to represent something that needs at most 1 byte^{3}? Imagine incurring a 28x penalty in memory size requirements for every boolean that you use. Now multiply this by every operation that your code is going to run in production over time. Again: unacceptable.
That’s because every object in Python, in the sense of everything that you can instantiate, i.e. everything that you can put on the left-hand side of the = assignment, is a PyObject:
All Python objects ultimately share a small number of fields at the beginning of the object’s representation in memory. These are represented by the PyObject and PyVarObject types.
Python is dynamically-typed, which means that you don’t have primitives like 8-, 16-, 32-bit (un)signed integers and so on. Everything is a huge mess allocated in the heap that must carry not only its value, but also information about its type.
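To see the gap between a boxed Python object and a raw value, here is a small sketch using only the standard library. The exact number reported by sys.getsizeof depends on the interpreter version and platform, so treat the 28 bytes above as one observation rather than a constant; the variable names are just for illustration.

```python
import struct
import sys
from array import array

# Every Python object carries a header (reference count, type pointer, ...).
size_of_true = sys.getsizeof(True)

# A C-style boolean, as packed by the struct module, takes a single byte.
packed_bool_size = struct.calcsize("?")

# A typed array stores raw one-byte values instead of full Python objects.
flags = array("B", [1, 0, 1, 1])
bytes_per_flag = flags.itemsize

print(f"bool object: {size_of_true} bytes")
print(f"packed bool: {packed_bool_size} byte")
print(f"array('B') item: {bytes_per_flag} byte")
```

The overhead comes from the object header, not from the value itself, which is why typed containers such as array (or NumPy arrays) can store one byte per element.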
Most importantly, everything that is fast in Python is not Python-based. Take a look at the image below: I grabbed some popular Python libraries from GitHub, namely NumPy (linear algebra package) and PyTorch (deep learning package), and checked the language codebase percentage.
Surprise, they are not Python libraries. They are C/C++ codebases. Even if Python is the main language used in these codebases^{4}, I still think that this is not the case due to the nature of the Python code: all docstrings are written in Python. If you have a very fast C function in your codebase that takes 50 lines of code, followed by a Python wrapper function that calls it using 10 lines of code, but with a docstring that is 50 lines of code; you have a “Python”-majority codebase.
In a sense the most efficient Python programmer is a C/C++ programmer…
Here’s Julia, which is also dynamically-typed:
julia> Base.summarysize(true)
1
And to your surprise, Julia is coded in… Julia! Check the image below for the language codebase percentage of Julia and Lux.jl^{5} (deep learning package).
Finally, here’s Rust, which is not dynamically typed but statically typed:
// main.rs
use std::mem;

fn main() {
    println!("Size of bool: {} byte", mem::size_of::<bool>());
}
$ cargo run --release
Compiling size_of_bool v0.1.0
Finished release [optimized] target(s) in 0.00s
Running `target/release/size_of_bool`
Size of bool: 1 byte
Let’s cover two more zero-cost abstractions, both in Julia and in Rust: for-loops and enums.
A friend and a Julia advocate once told me that Julia’s master plan is to secretly “make everyone aware about compilers”. A compiler is a program that translates source code from a high-level programming language to a low-level programming language (e.g. assembly language, object code, or machine code) to create an executable program.
Python’s reference implementation is CPython, which compiles source code to bytecode and then interprets it. If you search around on why CPython/Python is so slow and inefficient, you’ll find that the culprits are:
I completely disagree with almost all the above reasons, except the GIL. Python is slow because of its design decisions, more specifically the way CPython works under the hood. It is not built with performance in mind. Actually, the main objective of Python was to be a “language that would be easy to read, write, and maintain”. I salute that: Python has remained true to its main objective.
Now let’s switch to Julia:
I’ve copy-pasted all Python’s arguments for inefficiency, except the GIL. And, contrary to Python, Julia is fast! Sometimes even faster than C^{6}. Actually, that was the goal all along since Julia’s inception. If you check the notorious Julia announcement blog post from 2012:
We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.
(Did we mention it should be as fast as C?)
It mentions “speed” twice. Not only that, but also specifically says that it should match C’s speed.
Julia is fast because of its design decisions. One of the major reasons why Julia is fast is because of the choice of compiler that it uses: LLVM.
LLVM originally stood for low level virtual machine. Despite its name, LLVM has little to do with traditional virtual machines. LLVM can take intermediate representation (IR) code and compile it into machine-dependent instructions. It has support and sponsorship from a lot of big-tech corporations, such as Apple, Google, IBM, Meta, Arm, Intel, AMD, Nvidia, and so on. It is a pretty fast compiler that can do wonders in optimizing IR code to a plethora of computer architectures.
In a sense, Julia is a front-end for LLVM. It turns your easy-to-read and easy-to-write Julia code into LLVM IR code. Take this for-loop example inside a function:
function sum_10()
    acc = 0
    for i in 1:10
        acc += i
    end
    return acc
end
Let’s check what Julia generates as LLVM IR code for this function. We can do that with the @code_llvm macro.
julia> @code_llvm debuginfo=:none sum_10()
define i64 @julia_sum_10_172() #0 {
top:
ret i64 55
}
You can’t easily fool the compiler. Julia understands that the answer is 55, and the LLVM IR generated code is pretty much just “return 55 as a 64-bit integer”.
Let’s also check the machine-dependent instructions with the @code_native macro. I am using an Apple Silicon machine, so these instructions might differ from yours:
julia> @code_native debuginfo=:none sum_10()
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 14, 0
.globl _julia_sum_10_214 ; -- Begin function julia_sum_10_214
.p2align 2
_julia_sum_10_214: ; @julia_sum_10_214
.cfi_startproc
; %bb.0: ; %top
mov w0, #55
ret
.cfi_endproc
; -- End function
.subsections_via_symbols
The only important instruction for our argument here is mov w0, #55. This means “move the value 55 into the w0 register”, where w0 is one of the registers available in ARM-based architectures (which Apple Silicon chips are).
This is a zero-cost abstraction! I don’t need to give up for-loops because they might be slow and inefficient, as some Python users suggest to newcomers. I can have the full convenience and expressiveness of for-loops without paying performance costs. Pretty much the definition of a zero-cost abstraction from above.
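For contrast, here is a rough sketch of the same function in Python, inspected with the standard dis module. It assumes CPython, and the exact bytecode differs between versions, so the sketch only checks for the FOR_ITER instruction, which is the one that advances a for-loop at run time.

```python
import dis


def sum_10():
    acc = 0
    for i in range(1, 11):
        acc += i
    return acc


# Collect the names of the bytecode instructions CPython generated.
opnames = [ins.opname for ins in dis.get_instructions(sum_10)]

# FOR_ITER is the instruction that advances the loop at run time.
loop_survives = "FOR_ITER" in opnames

print(sum_10())
print("loop still present in bytecode:", loop_survives)
```

If FOR_ITER shows up in the instruction list, the loop has not been folded into a constant: the interpreter still walks through the range on every call, unlike the single constant return that LLVM produced above.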
Using LLVM as a compiler backend is not something unique to Julia. Rust also uses LLVM under the hood. Take for example this simple Rust code:
// main.rs
pub fn sum_10() -> i32 {
    let mut acc = 0;
    for i in 1..=10 {
        acc += i
    }
    acc
}

fn main() {
    println!("sum_10: {}", sum_10());
}
We can inspect both LLVM IR code and machine instructions with the cargo-show-asm crate:
$ cargo asm --llvm "sum_10::main" | grep 55
Finished release [optimized] target(s) in 0.00s
store i32 55, ptr %_9, align 4
$ cargo asm "sum_10::main" | grep 55
Finished release [optimized] target(s) in 0.00s
mov w8, #55
It is no coincidence that the LLVM IR code is very similar, with the difference that integers, by default, are 64 bits in Julia and 32 bits in Rust. However, the machine code is identical: “move the value 55 into a w something register”.
Another zero-cost abstraction, in Julia and Rust, is enums. In Julia all enums, by default, have a BaseType of Int32: a signed 32-bit integer. However, we can override this with type annotations:
julia> @enum Thing::Bool One Two
julia> Base.summarysize(Thing(false))
1
Here we have an enum Thing with two variants: One and Two. Since we can safely represent all the possible variant space of Thing with a boolean type, we override the BaseType of Thing to be the Bool type. Unsurprisingly, any object of Thing occupies 1 byte in memory.
We can achieve the same with Rust:
// main.rs
use std::mem;

#[allow(dead_code)]
enum Thing {
    One,
    Two,
}

fn main() {
    println!("Size of Thing: {} byte", mem::size_of::<Thing>());
}
$ cargo run --release
Compiling enum_size v0.1.0
Finished release [optimized] target(s) in 0.09s
Running `target/release/enum_size`
Size of Thing: 1 byte
However, contrary to Julia, the Rust compiler automatically detects the enum’s variant space size and adjusts accordingly. So, no need for overrides.
Zero-cost abstractions are a joy to have in a programming language. They enable you, as a programmer, to just focus on what’s important: writing expressive code that is easy to read, maintain, debug, and build upon.
It is no wonder that zero-cost abstractions are a pervasive feature of two of my top-favorite languages: Julia and Rust.
This post is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
this post is somehow connected to my soydev rant. ↩︎
and that’s not a compliment. ↩︎
technically, we can represent a boolean with just one bit. However, the short answer is still one byte, because that’s the smallest addressable unit of memory. ↩︎
and modifying .gitattributes is cheating. Yes, I am talking to you NumPy! ↩︎
if you compare runtime execution. ↩︎
Warning: This post has KaTeX enabled, so if you want to view the rendered math formulas, you’ll have to unfortunately enable JavaScript.
Dennis Lindley, one of my many heroes, was an English statistician, decision theorist and leading advocate of Bayesian statistics. He published a pivotal book, Understanding Uncertainty, that changed my view on what uncertainty is and how to handle it in a coherent^{1} way. He is responsible for one of my favorite quotes: “Inside every non-Bayesian there is a Bayesian struggling to get out”; and one of my favorite heuristics around prior probabilities: Cromwell’s Rule^{2}. Lindley predicted in 1975 that “Bayesian methods will indeed become pervasive, enabled by the development of powerful computing facilities” (Lindley, 1975). You can find more about all of Lindley’s achievements in his obituary.
Lindley’s paradox^{3} is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give different results for certain choices of the prior distribution.
More formally, the paradox is as follows. We have some parameter $\theta$ that we are interested in. Then, we proceed with an experiment to test two competing hypotheses:
The paradox occurs when two conditions are met:
These results can occur at the same time when $H_0$ is very specific, $H_a$ more diffuse, and the prior distribution does not strongly favor one or the other. These conditions are pervasive across science and common in traditional null-hypothesis significance testing approaches.
This is a duel of frequentist versus Bayesian approaches, and one of the many in which the Bayesian approach emerges as the most coherent. Let’s give an example and go over the analytical result with a ton of math, but also a computational result with Julia.
Here’s the setup for the example. In a certain city 49,581 boys and 48,870 girls have been born over a certain time period. The observed proportion of male births is thus $\frac{49,581}{98,451} \approx 0.5036$.
We assume that each birth is independent, with a fixed probability $\theta$ of the child being a boy. Our data is a sequence of $n$ independent Bernoulli trials, i.e., $n$ independent random experiments with exactly two possible outcomes, “success” and “failure”, in which the probability of success is the same every time the experiment is conducted. We can therefore safely assume that the number of male births follows a binomial distribution with parameters:
We then set up our two competing hypotheses:
This is a toy-problem and, like most toy problems, we can solve it analytically^{5} for both the frequentist and the Bayesian approaches.
The frequentist approach to testing $H_0$ is to compute a $p$-value^{4}, the probability of observing a number of male births at least as large as 49,581 assuming $H_0$ is true. Because the number of births is very large, we can use a normal approximation^{6} for the binomial-distributed number of male births. Let’s define $X$ as the total number of male births; then $X$ approximately follows a normal distribution:
$$X \sim \text{Normal}(\mu, \sigma)$$
where $\mu$ is the mean parameter, $n \theta$ in our case, and $\sigma$ is the standard deviation parameter, $\sqrt{n \theta (1 - \theta)}$. We need to calculate the conditional probability of $X \geq 49,581$ (an observed proportion of $\frac{49,581}{98,451} \approx 0.5036$) given $\mu = n \theta = 98,451 \cdot \frac{1}{2} = 49,225.5$ and $\sigma = \sqrt{n \theta (1 - \theta)} = \sqrt{98,451 \cdot \frac{1}{2} \cdot (1 - \frac{1}{2})} = \sqrt{24,612.75}$:
$$P(X \ge 49,581 \mid \mu = 49,225.5, \sigma = \sqrt{24,612.75})$$
This is the upper tail of the distribution of $X$, i.e. one minus its cumulative distribution function (CDF) at $49,581$, which we obtain by integrating the density over the interval $[49,581, 98,451]$:
$$\int_{49,581}^{98,451} \frac{1}{\sqrt{2 \pi \sigma^2}} e^{- \frac{\left( \frac{x - \mu}{\sigma} \right)^2}{2}} dx$$
After inserting the values and doing some arithmetic, our answer is approximately $0.0117$. Note that this is a one-sided test; since the normal distribution is symmetrical, the two-sided $p$-value would be $0.0117 \cdot 2 \approx 0.0235$. Since we don’t deviate from Fisher’s canon, this is well below the 5% threshold. Hooray! We rejected the null hypothesis! Quick! Grab a frequentist celebratory cigar! But, wait. Let’s check the Bayesian approach.
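Before moving on, the tail probability above can be double-checked numerically. The sketch below uses Python’s standard library and the complementary error function to evaluate the upper tail of the normal approximation. It integrates to infinity instead of stopping at 98,451, on the assumption that the probability mass beyond that point is negligible; the variable names are only illustrative.

```python
import math

boys, girls = 49_581, 48_870
n = boys + girls
theta_0 = 0.5  # value of theta under the null hypothesis

# Parameters of the normal approximation to the binomial.
mu = n * theta_0
sigma = math.sqrt(n * theta_0 * (1 - theta_0))

# Standardised distance between the observed count and the mean under H0.
z = (boys - mu) / sigma

# Upper-tail probability of a standard normal: P(Z >= z) = erfc(z / sqrt(2)) / 2.
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
p_two_sided = 2 * p_one_sided

print(f"z = {z:.3f}")
print(f"one-sided p-value = {p_one_sided:.4f}")
print(f"two-sided p-value = {p_two_sided:.4f}")
```

The z-score comes out a little above 2.26, which is what puts the one-sided p-value near 0.0117.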
For the Bayesian approach, we need to set prior probabilities on both hypotheses. Since we do not favor one from another, let’s set equal prior probabilities:
$$P(H_0) = P(H_a) = \frac{1}{2}$$
Additionally, all parameters of interest need a prior distribution. So, let’s put a prior distribution on $\theta$. We could be fancy here, but let’s not. We’ll use a uniform distribution on $[0, 1]$.
We have everything we need to compute the posterior probability of $H_0$ given $\theta$. For this, we’ll use Bayes theorem^{7}:
$$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$$
Now again let’s plug in all the values:
$$P(H_0 \mid \theta) = \frac{P(\theta \mid H_0) P(H_0)}{P(\theta)}$$
Note that by the axioms of probability and by the product rule of probability we can decompose $P(\theta)$ into:
$$P(\theta) = P(\theta \mid H_0) P(H_0) + P(\theta \mid H_a) P(H_a)$$
Again, we’ll use the normal approximation:
$$\begin{aligned} &P \left( \theta = 0.5 \mid \mu = 49,225.5, \sigma = \sqrt{24,612.75} \right) \\ &= \frac{ \frac{1}{\sqrt{2 \pi \sigma^2}} e^{- \frac{\left( \frac{49,581 - \mu}{\sigma} \right)^2}{2}} \cdot 0.5 } { \frac{1}{\sqrt{2 \pi \sigma^2}} e^{- \frac{\left( \frac{49,581 - \mu}{\sigma} \right)^2}{2}} \cdot 0.5 + \int_0^1 \frac{1}{\sqrt{2 \pi n \theta (1 - \theta)}} e^{- \frac{(49,581 - n \theta)^2}{2 n \theta (1 - \theta)}} d \theta \cdot 0.5 } \\ &= 0.9505 \end{aligned}$$
The likelihood of the alternative hypothesis, $P(\theta \mid H_a)$, is the normal likelihood averaged over all possible values of $\theta \ne 0.5$ under the uniform prior, which is the integral in the denominator.
$$P(H_0 \mid \text{data}) = P \left( \theta = 0.5 \mid \mu = 49,225.5, \sigma = \sqrt{24,612.75} \right) > 0.95$$
And we fail to reject the null hypothesis, in frequentist terms. However, we can also say in Bayesian terms, that we strongly favor $H_0$ over $H_a$.
Quick! Grab the Bayesian celebratory cigar! The null is back in the game!
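As a quick sanity check on the posterior probability, here is a minimal sketch in plain Julia (no packages). It assumes the normal approximation described above, evaluates the likelihood at the observed 49,581 male births, and averages it over the uniform prior with a simple midpoint rule. The names normal_lik and marginal_lik are illustrative helpers, not part of any library.

```julia
# Numerical check of the posterior probability of H0 under the normal approximation.
# Plain Julia, no packages; helper names are illustrative.

n = 98_451        # total births
x = 49_581        # observed male births

# Normal density with mean n*θ and variance n*θ*(1-θ), evaluated at x
normal_lik(θ) = exp(-(x - n * θ)^2 / (2 * n * θ * (1 - θ))) / sqrt(2π * n * θ * (1 - θ))

# Likelihood under H0: θ = 0.5
lik_h0 = normal_lik(0.5)

# Marginal likelihood under Ha: average over θ ~ Uniform(0, 1) with a midpoint rule
function marginal_lik(f; steps = 1_000_000)
    h = 1 / steps
    return h * sum(f((i - 0.5) * h) for i in 1:steps)
end
lik_ha = marginal_lik(normal_lik)

# Equal prior probabilities on both hypotheses
prior_h0 = prior_ha = 0.5
posterior_h0 = lik_h0 * prior_h0 / (lik_h0 * prior_h0 + lik_ha * prior_ha)

println(round(posterior_h0; digits = 4))
```

The printed value should land close to the $0.9505$ obtained analytically. The midpoint rule never evaluates the density at exactly $\theta = 0$ or $\theta = 1$, where the variance would be zero.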
For the computational solution, we’ll use Julia and the following packages:
We can perform a BinomialTest with HypothesisTests.jl:
julia> using HypothesisTests
julia> BinomialTest(49_225, 98_451, 0.5036)
Binomial test
-------------
Population details:
parameter of interest: Probability of success
value under h_0: 0.5036
point estimate: 0.499995
95% confidence interval: (0.4969, 0.5031)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: 0.0239
Details:
number of observations: 98451
number of successes: 49225
This is the two-sided test, and I had to round $49,225.5$ to $49,225$ since BinomialTest does not support real numbers. But the results match the analytical solution: we still reject the null.
Now, for the Bayesian computational approach, I’m going to use a generative modeling approach and one of my favorite probabilistic programming languages, Turing.jl:
julia> using Turing
julia> @model function birth_rate()
           θ ~ Uniform(0, 1)
           total_births = 98_451
           male_births ~ Binomial(total_births, θ)
       end;
julia> model = birth_rate() | (; male_births = 49_225);
julia> chain = sample(model, NUTS(1_000, 0.8), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×13×4 Array{Float64, 3}):
Iterations = 1001:1:2000
Number of chains = 4
Samples per chain = 1000
Wall duration = 0.2 seconds
Compute duration = 0.19 seconds
parameters = θ
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64
           θ    0.4999    0.0016    0.0000   1422.2028   2198.1987    1.0057     7368.9267
Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64
           θ    0.4969    0.4988    0.4999    0.5011    0.5031
We can see from the output of the quantiles that the 95% quantile interval for $\theta$ is $(0.4969, 0.5031)$. Although it contains $0.5$, that is not the equivalent of a hypothesis test. For that, we’ll use the highest posterior density interval (HPDI), which is defined as “choosing the narrowest interval” that captures a certain posterior density threshold value. In this case, we’ll use a threshold of 95%, i.e. an $\alpha = 0.05$:
julia> hpd(chain; alpha=0.05)
HPD
  parameters     lower     upper
      Symbol   Float64   Float64
           θ    0.4970    0.5031
We see that we fail to reject the null, $\theta = 0.5$, at $\alpha = 0.05$, which is in accordance with the analytical solution.
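To illustrate what “choosing the narrowest interval” means, here is a minimal sketch in plain Julia. It sorts the posterior draws, slides a window that contains the required share of draws, and keeps the narrowest one. The helper hpd_interval and the toy draws are made up for illustration; this is not how the hpd function used above is implemented.

```julia
# Narrowest-interval HPD estimate from a vector of posterior draws.
# `hpd_interval` is an illustrative helper, not the function called above.
function hpd_interval(draws::AbstractVector{<:Real}; alpha = 0.05)
    sorted = sort(draws)
    n = length(sorted)
    # number of draws that must fall inside the interval
    m = ceil(Int, (1 - alpha) * n)
    # width of every window of m consecutive sorted draws
    widths = [sorted[i + m - 1] - sorted[i] for i in 1:(n - m + 1)]
    best = argmin(widths)
    return (lower = sorted[best], upper = sorted[best + m - 1])
end

# Toy draws (made up), unsorted and skewed to the right
draws = [0.5, 0.1, 2.0, 0.35, 0.3, 0.45, 0.2, 1.0, 0.4, 0.7]

# 50% interval: the narrowest window holding 5 of the 10 draws
interval = hpd_interval(draws; alpha = 0.5)
println(interval)
```

For a skewed sample like this one, the narrowest window sits where the draws are most concentrated, which is why the HPDI can differ from an equal-tailed quantile interval.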
Why do the approaches disagree? What is going on under the hood?
The answer is disappointing^{8}. The main problem is that the frequentist approach only allows significance levels that are fixed with respect to sample size, whereas the Bayesian approach is consistent and robust to sample size variations.
Taken to the extreme, in some cases, due to huge sample sizes, the $p$-value is pretty much a proxy for sample size and has little to no utility for hypothesis testing. This is known as $p$-hacking^{9}.
This post is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
Lindley, Dennis V. “The future of statistics: A Bayesian 21st century”. Advances in Applied Probability 7 (1975): 106-115.
as far as I know there’s only one coherent approach to uncertainty, and it is the Bayesian approach. Otherwise, as de Finetti and Ramsey proposed, you are susceptible to a Dutch book. This is a topic for another blog post… ↩︎
Cromwell’s rule states that the use of prior probabilities of 1 (“the event will definitely occur”) or 0 (“the event will definitely not occur”) should be avoided, except when applied to statements that are logically true or false. Hence, anything that is not a math theorem should have priors in $(0,1)$. The reference comes from Oliver Cromwell, asking, very politely, for the Church of Scotland to consider that their prior probability might be wrong. This footnote also deserves a whole blog post… ↩︎
Stigler’s law of eponymy states that no scientific discovery is named after its original discoverer. The paradox was already discussed in Harold Jeffreys' 1939 textbook. Also, fun fact, Stigler is not the original creator of such law… Now that’s a self-referential paradox, and a broad version of the Halting problem, which should earn its own footnote. Nevertheless, we are getting into a self-referential danger zone here with footnotes of footnotes of footnotes… ↩︎
this is called $p$-value and can be easily defined as “the probability of sampling data from a target population given that $H_0$ is true as the number of sampling procedures $\to \infty$”. Yes, it is not that intuitive, and it deserves not a blog post, but a full curriculum to hammer it home. ↩︎ ↩︎
that is not true for most real-world problems. For Bayesian approaches, we need to run computational, asymptotically exact approximations using a class of methods called Markov chain Monte Carlo (MCMC). Furthermore, for some nasty problems we need to use a different set of methods called variational inference (VI) or approximate Bayesian computation (ABC). ↩︎
if you are curious about how this approximation works, check the backup slides of my open access and open source graduate course on Bayesian statistics. ↩︎
Bayes’ theorem is officially called Bayes-Price-Laplace theorem. Bayes was trying to disprove David Hume’s argument that miracles did not exist (How dare he?). He used the probabilistic approach of trying to quantify the probability of a parameter (god exists) given data (miracles happened). He died without publishing any of his ideas. His wife probably freaked out when she saw the huge pile of notes that he had and called his buddy Richard Price to figure out what to do with it. Price struck gold and immediately noticed the relevance of Bayes’ findings. He read it aloud at the Royal Society. Later, Pierre-Simon Laplace, unaware of the work of Bayes, used the same probabilistic approach to perform statistical inference using France’s first census data in the early-Napoleonic era. Somehow we had the answer to statistical inference back then, and we had to rediscover everything again in the late-20th century… ↩︎
disappointing because most published scientific studies suffer from this flaw. ↩︎
and, like all footnotes here, it deserves its own blog post… ↩︎
Warning: This post has KaTeX enabled, so if you want to view the rendered math formulas, you’ll have to unfortunately enable JavaScript.
I wish I could go back in time and tell my younger self that you can make a machine understand human language with trigonometry. That would definitely have made me more aware and interested in the subject during my school years. I would have looked at triangles, circles, sines, cosines, and tangents in a whole different way. Alas, better late than never.
In this post, we’ll learn how to represent words using word embeddings, and how to use basic trigonometry to play around with them. Of course, we’ll use Julia.
Word embeddings are a way to represent words as real-valued vectors that encode the meaning of each word in such a way that words that are closer in the vector space are expected to be similar in meaning.
Ok, let’s unwrap the above definition. First, a real-valued vector is any vector whose elements belong to the real numbers. Generally we denote vectors with a bold lower-case letter, and we write their elements (also called components) inside square brackets. Hence, a vector $\bold{v}$ that has 3 elements, $1$, $2$, and $3$, can be written as
$$\bold{v} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$
Next, what does “close” mean for vectors? We can use distance functions to get a measurable value. The most famous and commonly used distance function is the Euclidean distance, named in honor of Euclid, the “father of geometry”, and the guy pictured in the image at the top of this post. The Euclidean distance is defined in trigonometry for 2-D and 3-D spaces. However, it can be generalized to any dimension $n > 1$ by using vectors.
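As a small illustration of that generalization, the Euclidean distance between two $n$-dimensional vectors is the square root of the sum of squared element-wise differences. Here is a minimal sketch in plain Julia; euclidean_distance is an illustrative helper name, and the vectors are made up.

```julia
# Euclidean distance between two vectors of equal length (any dimension n):
# square root of the sum of squared element-wise differences.
euclidean_distance(a, b) = sqrt(sum(abs2, a .- b))

v = [1.0, 2.0, 3.0]
w = [4.0, 6.0, 3.0]

# The element-wise differences are (3, 4, 0), so this is a 3-4-5 triangle
println(euclidean_distance(v, w))
```

The same one-liner works unchanged for 50-dimensional vectors, since nothing in it depends on the number of elements.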
Since every word is represented by an $n$-dimensional vector, we can use distances to compute a metric that represents similarity between vectors. And, more interestingly, we can add and subtract words (or any other linear combination of one or more words) to generate new words.
Before we jump to code and examples, a quick note about how word embeddings are constructed. They are trained like a regular machine learning algorithm, where the cost function measures the difference between some vector distance between the vectors and a “semantic distance”. The goal is to iteratively find good vector values that minimize the cost. So, if a vector is close to another vector measured by a distance function, but far apart measured by some semantic distance on the words that these vectors represent, then the cost function will be higher. The algorithm cannot change the semantic distance; it is treated as a fixed value. However, it can change the vector elements’ values so that the vector distance function closely resembles the semantic distance function. Lastly, the dimensionality of the vectors used in word embeddings is generally high, $n > 50$, since a proper number of dimensions is needed to represent all the semantic information of words with vectors.
Generally we don’t train our own word embeddings from scratch; we use pre-trained ones. Here is a list of some of the most popular ones:
We will use the Embeddings.jl package to easily load word embeddings as vectors, and the Distances.jl package for the convenience of several distance functions. This is a nice example of the Julia package ecosystem composability, where one package can define types, another can define functions, and another can define custom behavior of these functions on types that are defined in other packages.
julia> using Embeddings
julia> using Distances
Let’s load the GloVe word embeddings. First, let’s check what we have in store to choose from GloVe’s English language embeddings:
julia> language_files(GloVe{:en})
20-element Vector{String}:
"glove.6B/glove.6B.50d.txt"
"glove.6B/glove.6B.100d.txt"
"glove.6B/glove.6B.200d.txt"
"glove.6B/glove.6B.300d.txt"
"glove.42B.300d/glove.42B.300d.txt"
"glove.840B.300d/glove.840B.300d.txt"
"glove.twitter.27B/glove.twitter.27B.25d.txt"
"glove.twitter.27B/glove.twitter.27B.50d.txt"
"glove.twitter.27B/glove.twitter.27B.100d.txt"
"glove.twitter.27B/glove.twitter.27B.200d.txt"
"glove.6B/glove.6B.50d.txt"
"glove.6B/glove.6B.100d.txt"
"glove.6B/glove.6B.200d.txt"
"glove.6B/glove.6B.300d.txt"
"glove.42B.300d/glove.42B.300d.txt"
"glove.840B.300d/glove.840B.300d.txt"
"glove.twitter.27B/glove.twitter.27B.25d.txt"
"glove.twitter.27B/glove.twitter.27B.50d.txt"
"glove.twitter.27B/glove.twitter.27B.100d.txt"
"glove.twitter.27B/glove.twitter.27B.200d.txt"
I’ll use the "glove.6B/glove.6B.50d.txt" file. This means that it was trained with 6 billion tokens, and it provides embeddings with 50-dimensional vectors. The load_embeddings function takes an optional second positional argument as an Int to choose which index of the language_files to use.
Finally, I just want the words “king”, “queen”, “man”, “woman”; so I am passing these words as a Set to the keep_words keyword argument:
julia> const glove = load_embeddings(GloVe{:en}, 1; keep_words=Set(["king", "queen", "man", "woman"]));
Embeddings.EmbeddingTable{Matrix{Float32}, Vector{String}}(Float32[-0.094386 0.50451 -0.18153 0.37854; 0.43007 0.68607 0.64827 1.8233; … ; 0.53135 -0.64426 0.48764 0.0092753; -0.11725 -0.51042 -0.10467 -0.60284], ["man", "king", "woman", "queen"])
Watch out for the order that we get back. If you look at the output of load_embeddings, the order is ["man", "king", "woman", "queen"].
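One way to avoid depending on that order is to look words up by name. The sketch below uses a toy stand-in table with made-up values so that it runs without downloading anything. It assumes the loaded table exposes the word list as a vocab field next to the embeddings matrix; word_index and word_vector are hypothetical helpers.

```julia
# Toy stand-in for the loaded table (values made up); it only mirrors the
# `embeddings` matrix (one column per word) and an assumed `vocab` list.
toy = (
    embeddings = Float32[1 2 3 4; 5 6 7 8],
    vocab = ["man", "king", "woman", "queen"],
)

# Map each word to its column so we never rely on the returned order
word_index = Dict(word => i for (i, word) in enumerate(toy.vocab))

# Hypothetical helper: fetch the vector of a word by name
word_vector(table, index, word) = table.embeddings[:, index[word]]

queen_vec = word_vector(toy, word_index, "queen")
println(queen_vec)
```

With a lookup like this, the column for “queen” is found by name, whatever position it ends up in.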
Let’s see how a word is represented:
julia> queen = glove.embeddings[:, 4]
50-element Vector{Float32}:
0.37854
1.8233
-1.2648
⋮
-2.2839
0.0092753
-0.60284
They are 50-dimensional vectors of Float32.
Now, here’s the fun part: let’s add words and check the similarity between the result and some other word. A classical example is to start with the word “king”, subtract the word “man”, add the word “woman”, and check the distance of the result to the word “queen”:
julia> man = glove.embeddings[:, 1];
julia> king = glove.embeddings[:, 2];
julia> woman = glove.embeddings[:, 3];
julia> cosine_dist(king - man + woman, queen)
0.13904202f0
This is less than 1/4 of the distance of “woman” to “king”:
julia> cosine_dist(woman, king)
0.58866215f0
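This is where the trigonometry comes in: the cosine distance is one minus the cosine of the angle between two vectors, so it depends on direction and not on length. Here is a minimal sketch using only Julia’s LinearAlgebra standard library; cosine_distance is an illustrative re-implementation for intuition, and the vectors are made up.

```julia
using LinearAlgebra  # standard library: dot and norm

# Cosine distance: 1 minus the cosine of the angle between two vectors
cosine_distance(a, b) = 1 - dot(a, b) / (norm(a) * norm(b))

same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])  # angle of 0 degrees
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])      # angle of 90 degrees

println(same_direction)
println(orthogonal)
```

Vectors pointing the same way have a distance of zero regardless of their lengths, while orthogonal vectors have a distance of one.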
Feel free to play around with other words. If you want suggestions, another classical example is:
cosine_dist(Madrid - Spain + France, Paris)
I think that allying interesting applications to abstract math topics like trigonometry is the vital missing piece in STEM education. I wish every new kid that is learning math could have the opportunity to contemplate how new and exciting technologies have some amazingly simple math under the hood. If you liked this post, you would probably like linear algebra. I would highly recommend Gilbert Strang’s books and 3Blue1Brown’s series on linear algebra.
Let’s dive into the concept of “soydev”, a term often used pejoratively to describe developers with a superficial understanding of technology. I provide my definition of what soydev is, why it is bad, and how it came to be. To counteract soydev inclinations, I propose an abstract approach centered on timeless concepts, protocols, and first principles, fostering a mindset of exploration, resilience in the face of failure, and an insatiable hunger for knowledge.
While we’ll start with a look at the soydev stereotype, our journey will lead us to a wider reflection on the importance of depth in technological understanding.
First, let’s tackle the definition of soydev. Urban Dictionary provides two interesting definitions:
Urban Dictionary definition 1:
Soydev is a “programmer” that works at a bigh tech company and only knows JavaScript and HTML. They love IDEs like Visual Studio Code and inefficient frameworks that slow their code down. They represent the majority of “programmers” today and if their numbers continue growing, not one person on earth will know how a computer works by the year 2050 when all the gigachad 1980s C and Unix programmers are gone.
Urban Dictionary definition 2:
Soydev is a type of most abundant Software Developer. The Software he/she makes is always inefficient and uses more CPU and RAM than it should. This person always prefers hard work to smart work, Has little or no knowledge of existing solutions of a problem, Comes up with very complex solution for a simple problem and has fear of native and fast programming languages like C, C++ and Rust
These definitions give a glimpse of what a soydev is. However, they are loaded with pejorative language, and are also based on non-timeless technologies and tools. I much prefer to rely on concepts and principles that are timeless. Hence, I will provide my own definition of soydev:
Soydev is someone who only has a superficial conception of technology and computers that is restricted to repeating patterns learned from popular workflows on the internet; but who doesn’t dedicate time or effort to learning concepts in a deeper way.
Although soydev is a term with specific connotations, it opens the door to a larger conversation about the depth of our engagement with technology. This superficiality is not unique to soydevs but is a symptom of a broader trend in our relationship with technology.
Most of us start our journey in a skill by having the superficial conception of it. However, some are not satisfied with this superficial conception, and strive to understand what lies beyond the surface.
Understanding concepts from first principles allows us to achieve a deep, graceful kind of mastery that seems almost effortless to others. Deep down lies a lot of effort and time spent in learning and practicing: innumerable hours of deep thinking and reflecting on why things are the way they are, and how they could be different if you tried to implement them from scratch yourself.
There is also an inherently rare mixture of curiosity and creativity in the process of profoundly learning and understanding concepts in this way. You start not only to ask the “Why?” questions but also the “What if?” questions. I feel that this posture on understanding concepts paves the way for joyful mastery.
Richard Feynman once said “What I cannot create, I do not understand”. You cannot create anything whose underlying concepts you don’t know. Therefore, by allying creativity and discovery with deep knowledge, Feynman’s argument was that for you to truly master something, you’ll need to be able to recreate it from scratch.
If you are struggling with my abstractions, I can provide some concrete examples. A soydev might be someone who:
First, let’s understand that being a soydev is not necessarily bad, but a soydev is highly limited in ability and curiosity. A soydev will never be able to achieve the same level of mastery as someone who is willing to go deep and learn concepts from first principles.
Now, on the other hand, soydev is bad because it perpetuates a mindset of superficiality. The path of technological innovation is guided by curiosity and creativity, and paved with hard work and deep understanding. Imagine if all the great minds in technology had taken the easy path of mindless tooling and problem-solving. We would be in a stagnant and infertile scenario, where everyone would use the same technology and tools without questioning or thinking about the problems that they are trying to solve.
Hence, the culture of soydev is bad for the future of technology, where most new developers will be highly limited in their ability to innovate.
I think that soydev culture is highly correlated with the increase of technology and the decrease of barriers to access such technology. We live in an age in which not only is technology everywhere, but interacting with it is also quite effortless.
My computational statistician mind is always aware of cognitive and statistical bias. Whenever I see a correlation across time, I always take a step back and try to think about the assumptions and conceptual models behind it.
Does the increase in technology usage and importance in daily life result in more people using technology from a professional point of view? Yes. Does the increase in people professionally using technology result in an increase of tooling and conceptual abstractions that allow superficial interactions without the need to deeply understand the concepts behind such technology? I do think that this is true as well.
These assumptions cover the constituents of the rise of soydev from a “demand” viewpoint. Nevertheless, there is also the analogous “supply” viewpoint. If these trends in demand are not met by trends in supply, we would not see the establishment of the soydev phenomenon. There is an emerging trend to standardize all the available tech into commodities.
While commoditization of technological solutions has inherent advantages, such as scalability and lower opportunity costs, it has some disadvantages. The main disadvantage is the abrupt decrease of technological innovations. If we have strong standards that are enforced by market and social forces, then why care to innovate? Why bring new solutions or new ways to solve problems if they will not be adopted and are doomed to oblivion? Why decide to try to do things differently if there is such a high maintenance cost, especially when training and expanding human resources capable of dealing with such non-standard solutions?
In this context, technological innovation can only be undertaken by big corporations that not only have big budgets, but also big influence to push their innovations as industry standards.
Don’t get me wrong: I do think that industry standards are important. However, I much prefer protocol standards to product standards. First, protocol standards are generally not tied to a single company or brand. Second, protocol standards have a higher propensity to expose their underlying concepts to developers. Think about TCP/IP versus your favorite front-end framework: Which one would result in deeper understanding of the underlying concepts?
The rise of soydevs mirrors a societal shift towards immediate gratification and away from the pursuit of deep knowledge.
Despite these unstoppable trends, I do think that it is possible to use tools and shallow abstractions without being a soydev. Or, to stop being a soydev and advance towards deep understanding of what constitutes your craft. Moving beyond the ‘soydev’ mindset is about embracing the richness that comes from a deep understanding of technology. Here is a short, by no means exhaustive, list of things that you can start doing:
Here’s a list of some resources that I’ve made or contributed to:
Stan and Turing.jl code examples.
Turing.jl
TuringGLM.jl
R
Rcpp
tutorials (Portuguese)
I don’t have social media, since I think they are overrated and “they sell your data”. If you want to contact me, please send an email.