Computational Tricks with Turing (Non-Centered Parametrization and QR Decomposition)

There are some computational tricks that we can employ with Turing. Here I will cover two of them:

  1. QR Decomposition

  2. Non-Centered Parametrization

QR Decomposition

Back in "Linear Algebra 101" we've learned that any matrix (even rectangular ones) can be factored into the product of two matrices:

  • \(\mathbf{Q}\): an orthogonal matrix (its columns are orthogonal unit vectors, meaning \(\mathbf{Q}^T = \mathbf{Q}^{-1}\)).

  • \(\mathbf{R}\): an upper triangular matrix.

This is commonly known as the QR Decomposition:

\[ \mathbf{A} = \mathbf{Q} \cdot \mathbf{R} \]

Let me show you an example with a random matrix \(\mathbf{A} \in \mathbb{R}^{3 \times 2}\):

A = rand(3, 2)
3×2 Matrix{Float64}:
 0.615669  0.182872
 0.602221  0.805418
 0.936869  0.645826

Now let's factor A using LinearAlgebra's qr() function:

using LinearAlgebra: qr, I
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}}
Q factor:
3×3 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}}:
 -0.483798   0.604397  -0.632964
 -0.473231  -0.78905   -0.39173
 -0.736201   0.11002    0.667761
R factor:
2×2 Matrix{Float64}:
 -1.27257  -0.94508
  0.0      -0.453934

Notice that qr() produced a factorization object from which we destructured the two matrices Q and R. Q is a 3x3 orthogonal matrix and R is a 2x2 upper triangular matrix, so \(\mathbf{Q}^T = \mathbf{Q}^{-1}\) (the transpose equals the inverse):

Matrix(Q') ≈ Matrix(Q^-1)
true

Also note that \(\mathbf{Q}^T \cdot \mathbf{Q} = \mathbf{I}\) (identity matrix):

Q' * Q ≈ I(3)
true
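
We can also confirm that multiplying the factors back together reproduces \(\mathbf{A}\) (Matrix(Q) materializes the first two columns of Q, matching R's dimensions); this quick check should return true:

Matrix(Q) * R ≈ A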

This is nice. But what can we do with QR decomposition? It can speed up Turing's sampling by a huge factor while also decorrelating the columns of \(\mathbf{X}\), i.e. the independent variables. The orthogonal nature of QR decomposition alters the posterior's topology and makes it easier for HMC or other MCMC samplers to explore it. Let's see how fast we can get with QR decomposition. First, let's go back to the kidiq example in 6. Bayesian Linear Regression:

using Turing
using Statistics: mean, std
using Random: seed!
seed!(123)

@model linreg(X, y; predictors=size(X, 2)) = begin
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))
    β ~ filldist(TDist(3), predictors)
    σ ~ Exponential(1)

    #likelihood
    y ~ MvNormal(α .+ X * β, σ)
end;

using DataFrames, CSV, HTTP

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/kidiq.csv"
kidiq = CSV.read(HTTP.get(url).body, DataFrame)
X = Matrix(select(kidiq, Not(:kid_score)))
y = kidiq[:, :kid_score]
model = linreg(X, y)
chain = sample(model, NUTS(), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×17×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 26.33 seconds
Compute duration  = 49.41 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α   21.7294    8.5853     0.0960    0.1390   3485.3416    1.0004       70.5449
        β[1]    2.0249    1.7880     0.0200    0.0291   3122.9675    1.0010       63.2103
        β[2]    0.5799    0.0575     0.0006    0.0009   4521.7685    1.0004       91.5227
        β[3]    0.2411    0.3036     0.0034    0.0043   4183.4396    1.0000       84.6747
           σ   17.8769    0.5978     0.0067    0.0075   6158.8479    1.0008      124.6579

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    4.9794   15.9707   21.6723   27.4182   38.6449
        β[1]   -0.6166    0.7170    1.7000    3.0248    6.3278
        β[2]    0.4667    0.5418    0.5809    0.6177    0.6928
        β[3]   -0.3610    0.0384    0.2422    0.4429    0.8434
           σ   16.7538   17.4746   17.8553   18.2619   19.1156

See the wall duration in Turing's chain: for me it took around 26 seconds.

Now let us incorporate the QR decomposition into the linear regression model. Here, I will use the "thin" instead of the "fat" QR, which scales the \(\mathbf{Q}\) and \(\mathbf{R}\) matrices by a factor of \(\sqrt{n-1}\), where \(n\) is the number of rows of \(\mathbf{X}\). In practice the thin QR decomposition is preferred over the fat one because it is numerically more stable. Mathematically, the thin QR decomposition is:

\[ \begin{aligned} \mathbf{X} &= \mathbf{Q}^* \mathbf{R}^* \\ \mathbf{Q}^* &= \mathbf{Q} \cdot \sqrt{n - 1} \\ \mathbf{R}^* &= \frac{1}{\sqrt{n - 1}} \cdot \mathbf{R} \\ \boldsymbol{\mu} &= \alpha + \mathbf{X} \cdot \boldsymbol{\beta} \\ &= \alpha + \mathbf{Q}^* \cdot \mathbf{R}^* \cdot \boldsymbol{\beta} \\ &= \alpha + \mathbf{Q}^* \cdot (\mathbf{R}^* \cdot \boldsymbol{\beta}) \\ &= \alpha + \mathbf{Q}^* \cdot \widetilde{\boldsymbol{\beta}} \end{aligned} \]

Then we can recover the original \(\boldsymbol{\beta}\) with:

\[ \boldsymbol{\beta} = \mathbf{R}^{*-1} \cdot \widetilde{\boldsymbol{\beta}} \]

In Turing, a model with QR decomposition would be the same linreg but with a different X matrix supplied, since it is a data transformation. First, we decompose the model data X into Q and R:

Q, R = qr(X)
Q_ast = Matrix(Q) * sqrt(size(X, 1) - 1)
R_ast = R / sqrt(size(X, 1) - 1);
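
Before plugging Q_ast into the model, we can sanity-check that the \(\sqrt{n-1}\) scaling cancels out in the product and that the \(\boldsymbol{\beta}\) recovery formula round-trips. This is just a minimal check; β_test is an arbitrary vector made up for illustration:

Q_ast * R_ast ≈ X                                   # the √(n - 1) factors cancel, recovering X
Q_ast' * Q_ast ≈ (size(X, 1) - 1) * I(size(X, 2))   # columns stay orthogonal after scaling
β_test = ones(size(X, 2))
R_ast \ (R_ast * β_test) ≈ β_test                   # β = R*⁻¹ ⋅ β̃ recovers the original β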

Then, we instantiate a model with Q_ast instead of X and sample as usual:

model_qr = linreg(Q_ast, y)
chain_qr = sample(model_qr, NUTS(1_000, 0.65), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×17×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 10.66 seconds
Compute duration  = 21.2 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    33.2749    7.9295     0.0887    0.1678   2113.0783    1.0003       99.6970
        β[1]   -49.6645    7.0809     0.0792    0.1498   2117.7218    1.0003       99.9161
        β[2]    21.8955    3.6100     0.0404    0.0757   2147.1293    1.0002      101.3036
        β[3]     0.2928    0.9071     0.0101    0.0167   2942.9498    1.0001      138.8511
           σ    17.8676    0.5988     0.0067    0.0090   4805.0471    0.9999      226.7066

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

           α    18.4666    27.8142    32.9991    38.4474    49.3556
        β[1]   -63.0827   -54.5222   -49.8503   -45.0637   -35.2238
        β[2]    14.4705    19.5789    21.9879    24.3914    28.6957
        β[3]    -1.3998    -0.3152     0.2446     0.8594     2.2182
           σ    16.7584    17.4485    17.8652    18.2675    19.0575

See the wall duration in Turing's chain_qr: for me it took around 11 seconds, much faster than the regular linreg. Now we have to reconstruct our \(\boldsymbol{\beta}\)s:

# invert the data transformation draw by draw: β = R*⁻¹ ⋅ β̃
betas = mapslices(x -> R_ast^-1 * x, chain_qr[:, namesingroup(chain_qr, :β), :].value.data, dims=[2])
chain_beta = setrange(Chains(betas, ["real_β[$i]" for i in 1:size(Q_ast, 2)]), 1_001:1:3_000)
chain_qr_reconstructed = hcat(chain_beta, chain_qr)
Chains MCMC chain (2000×20×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
parameters        = real_β[1], real_β[2], real_β[3], α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64

   real_β[1]     6.2755    2.2081     0.0247    0.0265   7885.8353    0.9996
   real_β[2]     0.5023    0.0633     0.0007    0.0011   3402.7897    1.0001
   real_β[3]    -0.0715    0.2215     0.0025    0.0041   2942.9498    1.0001
           α    33.2749    7.9295     0.0887    0.1678   2113.0783    1.0003
        β[1]   -49.6645    7.0809     0.0792    0.1498   2117.7218    1.0003
        β[2]    21.8955    3.6100     0.0404    0.0757   2147.1293    1.0002
        β[3]     0.2928    0.9071     0.0101    0.0167   2942.9498    1.0001
           σ    17.8676    0.5988     0.0067    0.0090   4805.0471    0.9999

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

   real_β[1]     1.9200     4.7657     6.3056     7.7812    10.5096
   real_β[2]     0.3778     0.4602     0.5030     0.5456     0.6249
   real_β[3]    -0.5417    -0.2099    -0.0597     0.0770     0.3418
           α    18.4666    27.8142    32.9991    38.4474    49.3556
        β[1]   -63.0827   -54.5222   -49.8503   -45.0637   -35.2238
        β[2]    14.4705    19.5789    21.9879    24.3914    28.6957
        β[3]    -1.3998    -0.3152     0.2446     0.8594     2.2182
           σ    16.7584    17.4485    17.8652    18.2675    19.0575

Non-Centered Parametrization

Now let us explore Non-Centered Parametrization (NCP). This is useful when the posterior's topology is very difficult to explore, with regions where the HMC sampler has to change the number of leapfrog steps \(L\) and the step size \(\epsilon\). I showed one of the most infamous cases of this in 5. Markov Chain Monte Carlo (MCMC): Neal's Funnel (Neal, 2003):

using StatsPlots, Distributions, LaTeXStrings
funnel_y = rand(Normal(0, 3), 10_000)
funnel_x = rand(Normal(), 10_000) .* exp.(funnel_y / 2)

scatter((funnel_x, funnel_y),
        label=false, mc=:steelblue, ma=0.3,
        xlabel=L"X", ylabel=L"Y",
        xlims=(-100, 100))

Neal's Funnel

Here we see that in the upper part of the funnel HMC has to take few steps \(L\) and be more liberal with the \(\epsilon\) factor. But the opposite holds in the lower part of the funnel: it needs way more steps \(L\) and a more conservative \(\epsilon\) factor.

To see the devil's funnel (as it is known in some Bayesian circles) in action, let's code it in Turing and then sample from it:

@model funnel() = begin
    y ~ Normal(0, 3)
    x ~ Normal(0, exp(y / 2))
end

chain_funnel = sample(funnel(), NUTS(), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×14×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 7.55 seconds
Compute duration  = 13.65 seconds
parameters        = y, x
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64       Float64

           y    1.3884    2.3810     0.0266    0.1494   160.3485    1.0422       11.7480
           x    0.1935    8.7429     0.0977    0.2901   607.2345    1.0093       44.4893

Quantiles
  parameters       2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol    Float64   Float64   Float64   Float64   Float64

           y    -2.3844   -0.4325    1.2142    2.9793    6.5042
           x   -13.3863   -1.0163   -0.0697    1.1303   16.0996

Wow, take a look at those rhat values... That sucks: y's rhat is way above 1.01, even with 4 parallel chains of 2,000 iterations each!

How do we deal with that? We reparametrize! Note that we can express a normal distribution as a scaled and shifted standard normal:

\[ \text{Normal}(\mu, \sigma) = \text{Standard Normal} \cdot \sigma + \mu \]

where the standard normal is the normal with mean \(\mu = 0\) and standard deviation \(\sigma = 1\). This is why it is called Non-Centered Parametrization: we "decouple" the parameters and reconstruct them afterwards.
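
We can quickly check this location-scale identity with a simulation. This is a minimal sketch; μ = 10 and σ = 2 are arbitrary values picked just for illustration:

using Distributions, Statistics
z = rand(Normal(), 10_000)    # standard normal draws
samples = 10 .+ 2 .* z        # shift by μ = 10 and scale by σ = 2
mean(samples), std(samples)   # ≈ (10, 2): the draws behave like Normal(10, 2) samples

Applying the same idea to the funnel, we sample x̃ and ỹ from standard normals and rebuild x and y deterministically: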

@model ncp_funnel() = begin
    x̃ ~ Normal()
    ỹ ~ Normal()
    y = 3.0 * ỹ         # implies y ~ Normal(0, 3)
    x = exp(y / 2) * x̃  # implies x ~ Normal(0, exp(y / 2))
end

chain_ncp_funnel = sample(ncp_funnel(), NUTS(), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×14×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 7.38 seconds
Compute duration  = 13.68 seconds
parameters        = x̃, ỹ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

           x̃   -0.0069    0.9970     0.0111    0.0109   8178.0937    0.9999      597.8139
           ỹ   -0.0047    1.0121     0.0113    0.0100   8018.0031    0.9996      586.1113

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           x̃   -1.9425   -0.6828    0.0018    0.6641    1.9180
           ỹ   -1.9909   -0.6893    0.0031    0.6811    1.9658

Much better now: all rhat are well below 1.01 (in fact, hovering around 1.0).

How would we implement this in a real-world model in Turing? Let's go back to the cheese random-intercept model in 10. Multilevel Models (a.k.a. Hierarchical Models). Here was the approach that we took, also known as Centered Parametrization (CP):

@model varying_intercept(X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)) = begin
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(1 / std(y))             # residual SD
    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2), 0, Inf)     # group-level SDs intercepts
    αⱼ ~ filldist(Normal(0, τ), n_gr)       # CP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ αⱼ[idx]
    y ~ MvNormal(ŷ, σ)
end;

To perform a Non-Centered Parametrization (NCP) in this model we do the following:

@model varying_intercept_ncp(X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)) = begin
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(1 / std(y))             # residual SD

    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2), 0, Inf)    # group-level SDs intercepts
    zⱼ ~ filldist(Normal(0, 1), n_gr)      # NCP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ zⱼ[idx] .* τ
    y ~ MvNormal(ŷ, σ)
end;

Here we are using NCP: the zⱼs follow a standard normal and we reconstruct the group-level intercepts by multiplying them by τ. Since the original αⱼs had a prior centered on 0 with standard deviation τ, multiplying by τ is all we need to recover them.

Now let's see how NCP compares to the CP. First, let's redo our CP hierarchical model:

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv"
cheese = CSV.read(HTTP.get(url).body, DataFrame)

for c in unique(cheese[:, :cheese])
    cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end

cheese[:, :background_int] = map(cheese[:, :background]) do b
    b == "rural" ? 1 :
    b == "urban" ? 2 : missing
end

X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];

model_cp = varying_intercept(X, idx, y)
chain_cp = sample(model_cp, NUTS(), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×21×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 30.24 seconds
Compute duration  = 58.6 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    70.7190    5.2616     0.0588    0.1773    906.7976    1.0058       15.4749
        β[1]     3.2351    1.2543     0.0140    0.0226   3335.1223    0.9998       56.9153
        β[2]   -11.6323    1.2699     0.0142    0.0218   3482.0497    0.9999       59.4227
        β[3]     7.1701    1.2307     0.0138    0.0207   3306.7974    0.9997       56.4319
        β[4]     1.1979    1.2502     0.0140    0.0206   3513.0851    0.9998       59.9523
           σ     5.9930    0.2745     0.0031    0.0037   5527.6283    1.0004       94.3313
           τ     6.4146    6.6373     0.0742    0.1804   1478.5048    1.0042       25.2313
       αⱼ[1]    -3.4511    5.1655     0.0578    0.1722    952.7034    1.0062       16.2583
       αⱼ[2]     3.7110    5.1772     0.0579    0.1717    960.0203    1.0062       16.3832

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%     97.5%
      Symbol    Float64    Float64    Float64    Float64   Float64

           α    60.2781    68.3590    70.7779    73.2704   80.9692
        β[1]     0.7285     2.3946     3.2539     4.0834    5.6589
        β[2]   -14.1365   -12.4906   -11.6232   -10.7748   -9.1705
        β[3]     4.7085     6.3426     7.1733     7.9974    9.6151
        β[4]    -1.3057     0.3672     1.2095     2.0532    3.6275
           σ     5.4874     5.8044     5.9858     6.1718    6.5540
           τ     1.9390     3.3318     4.7163     7.2482   20.9105
       αⱼ[1]   -13.4753    -5.8040    -3.4689    -1.2218    6.9789
       αⱼ[2]    -6.1605     1.3716     3.5932     5.8833   14.4133

Now let's do the NCP hierarchical model:

model_ncp = varying_intercept_ncp(X, idx, y)
chain_ncp = sample(model_ncp, NUTS(), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×21×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 32.55 seconds
Compute duration  = 62.67 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    70.8868    3.6980     0.0413    0.1201    944.4853    1.0052       15.0713
        β[1]     3.1683    1.2649     0.0141    0.0277   1599.3589    1.0034       25.5211
        β[2]   -11.6644    1.2595     0.0141    0.0234   2311.3091    1.0028       36.8818
        β[3]     7.0896    1.2638     0.0141    0.0289   1824.2196    1.0026       29.1093
        β[4]     1.1516    1.2785     0.0143    0.0262   1210.2260    1.0041       19.3117
           σ     6.0035    0.2758     0.0031    0.0049   2900.7301    1.0017       46.2873
           τ     5.3971    3.0013     0.0336    0.1287    339.9252    1.0099        5.4242
       zⱼ[1]    -0.8373    0.7752     0.0087    0.0165   1754.7860    1.0000       28.0013
       zⱼ[2]     0.8637    0.7866     0.0088    0.0226   1167.2946    1.0059       18.6266

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%     97.5%
      Symbol    Float64    Float64    Float64    Float64   Float64

           α    63.1206    68.8006    70.8514    73.0677   78.9521
        β[1]     0.7887     2.2889     3.1632     4.0150    5.6652
        β[2]   -14.0890   -12.5424   -11.6937   -10.7988   -9.1833
        β[3]     4.6131     6.2141     7.1102     7.9595    9.5283
        β[4]    -1.3032     0.2904     1.1481     2.0148    3.6623
           σ     5.4889     5.8138     5.9893     6.1812    6.5681
           τ     1.9159     3.2325     4.5453     6.6889   13.4013
       zⱼ[1]    -2.4647    -1.3441    -0.7969    -0.3060    0.6232
       zⱼ[2]    -0.5411     0.2982     0.8235     1.3713    2.5196

Notice that some models are better off with a standard Centered Parametrization (as is our cheese case here), while others are better off with a Non-Centered Parametrization. But now you know how to apply both parametrizations in Turing. Before we conclude, we need to recover our original αⱼs. We can do this by multiplying the zⱼ draws by τ:

τ = summarystats(chain_ncp)[:τ, :mean]  # posterior mean of τ
αⱼ = mapslices(x -> x * τ, chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data, dims=[2])  # αⱼ = zⱼ ⋅ τ
chain_ncp_reconstructed = hcat(MCMCChains.resetrange(chain_ncp), Chains(αⱼ, ["αⱼ[$i]" for i in 1:length(unique(idx))]))
Chains MCMC chain (2000×23×4 Array{Float64, 3}):

Iterations        = 1:2000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 32.55 seconds
Compute duration  = 62.67 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2], αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    70.8868    3.6980     0.0413    0.1201    944.4853    1.0052       15.0713
        β[1]     3.1683    1.2649     0.0141    0.0277   1599.3589    1.0034       25.5211
        β[2]   -11.6644    1.2595     0.0141    0.0234   2311.3091    1.0028       36.8818
        β[3]     7.0896    1.2638     0.0141    0.0289   1824.2196    1.0026       29.1093
        β[4]     1.1516    1.2785     0.0143    0.0262   1210.2260    1.0041       19.3117
           σ     6.0035    0.2758     0.0031    0.0049   2900.7301    1.0017       46.2873
           τ     5.3971    3.0013     0.0336    0.1287    339.9252    1.0099        5.4242
       zⱼ[1]    -0.8373    0.7752     0.0087    0.0165   1754.7860    1.0000       28.0013
       zⱼ[2]     0.8637    0.7866     0.0088    0.0226   1167.2946    1.0059       18.6266
       αⱼ[1]    -4.5192    4.1838     0.0468    0.0889   1754.7860    1.0000       28.0013
       αⱼ[2]     4.6613    4.2455     0.0475    0.1220   1167.2946    1.0059       18.6266

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%     97.5%
      Symbol    Float64    Float64    Float64    Float64   Float64

           α    63.1206    68.8006    70.8514    73.0677   78.9521
        β[1]     0.7887     2.2889     3.1632     4.0150    5.6652
        β[2]   -14.0890   -12.5424   -11.6937   -10.7988   -9.1833
        β[3]     4.6131     6.2141     7.1102     7.9595    9.5283
        β[4]    -1.3032     0.2904     1.1481     2.0148    3.6623
           σ     5.4889     5.8138     5.9893     6.1812    6.5681
           τ     1.9159     3.2325     4.5453     6.6889   13.4013
       zⱼ[1]    -2.4647    -1.3441    -0.7969    -0.3060    0.6232
       zⱼ[2]    -0.5411     0.2982     0.8235     1.3713    2.5196
       αⱼ[1]   -13.3021    -7.2541    -4.3009    -1.6514    3.3637
       αⱼ[2]    -2.9201     1.6096     4.4446     7.4010   13.5986

References

Neal, Radford M. (2003). Slice Sampling. The Annals of Statistics, 31(3), 705–741. Retrieved from https://www.jstor.org/stable/3448413