# Computational Tricks with Turing (Non-Centered Parametrization and QR Decomposition)

There are some computational tricks that we can employ with Turing. I will cover here two computational tricks:

1. QR Decomposition

2. Non-Centered Parametrization

## QR Decomposition

Back in "Linear Algebra 101" we've learned that any matrix (even rectangular ones) can be factored into the product of two matrices:

• $\mathbf{Q}$: an orthogonal matrix (its columns are orthogonal unit vectors meaning $\mathbf{Q}^T = \mathbf{Q}^{-1})$.

• $\mathbf{R}$: an upper triangular matrix.

This is commonly known as the QR Decomposition:

$\mathbf{A} = \mathbf{Q} \cdot \mathbf{R}$

Let me show you an example with a random matrix $\mathbf{A} \in \mathbb{R}^{3 \times 2}$:

A = rand(3, 2)
3×2 Matrix{Float64}:
0.720103  0.295367
0.573619  0.276597
0.664468  0.983436

Now let's factor A using LinearAlgebra's qr() function:

using LinearAlgebra: qr, I
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor: 3×3 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}
R factor:
2×2 Matrix{Float64}:
-1.13539  -0.902615
0.0       0.562299

Notice that qr() produced a tuple containing two matrices Q and R. Q is a 3x3 orthogonal matrix. And R is a 2x2 upper triangular matrix. So that $\mathbf{Q}^T = \mathbf{Q}^{-1}$ (the transpose is equal the inverse):

Matrix(Q') ≈ Matrix(Q^-1)
MethodError: no method matching ^(::LinearAlgebra.AdjointQ{Float64, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}}, ::Int64)

Closest candidates are:
^(!Matched::Float16, ::Integer)
@ Base math.jl:1283
^(!Matched::Regex, ::Integer)
@ Base regex.jl:863
^(!Matched::Float32, ::Integer)
@ Base math.jl:1277
...



Also note that $\mathbf{Q}^T \cdot \mathbf{Q}^{-1} = \mathbf{I}$ (identity matrix):

Q' * Q ≈ I(3)
true

This is nice. But what can we do with QR decomposition? It can speed up Turing's sampling by a huge factor while also decorrelating the columns of $\mathbf{X}$, i.e. the independent variables. The orthogonal nature of QR decomposition alters the posterior's topology and makes it easier for HMC or other MCMC samplers to explore it. Let's see how fast we can get with QR decomposition. First, let's go back to the kidiq example in 6. Bayesian Linear Regression:

using Turing
using LinearAlgebra: I
using Statistics: mean, std
using Random: seed!
seed!(123)

@model function linreg(X, y; predictors=size(X, 2))
#priors
α ~ Normal(mean(y), 2.5 * std(y))
β ~ filldist(TDist(3), predictors)
σ ~ Exponential(1)

#likelihood
return y ~ MvNormal(α .+ X * β, σ^2 * I)
end;

using DataFrames
using CSV
using HTTP

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/kidiq.csv"
kidiq = CSV.read(HTTP.get(url).body, DataFrame)
X = Matrix(select(kidiq, Not(:kid_score)))
y = kidiq[:, :kid_score]
model = linreg(X, y)
chain = sample(model, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 6.24 seconds
Compute duration  = 23.56 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64

α   21.5126    8.6720    0.2217   1528.8886   1868.2612    1.0009       64.8962
β[1]    2.0014    1.7943    0.0419   2281.1940   1734.6512    1.0016       96.8290
β[2]    0.5788    0.0584    0.0013   2163.9754   2292.8814    1.0006       91.8534
β[3]    0.2566    0.3092    0.0074   1762.0214   2135.6795    1.0010       74.7919
σ   17.8859    0.6033    0.0106   3271.1669   2347.2435    1.0008      138.8500

Quantiles
parameters      2.5%     25.0%     50.0%     75.0%     97.5%
Symbol   Float64   Float64   Float64   Float64   Float64

α    4.7278   15.7633   21.2942   27.4322   38.4426
β[1]   -0.5876    0.7324    1.6761    2.9919    6.3388
β[2]    0.4662    0.5392    0.5793    0.6184    0.6924
β[3]   -0.3477    0.0440    0.2588    0.4733    0.8490
σ   16.7525   17.4685   17.8796   18.2703   19.1238


See the wall duration in Turing's chain: for me it took around 24 seconds.

Now let's us incorporate QR decomposition in the linear regression model. Here, I will use the "thin" instead of the "fat" QR, which scales the $\mathbf{Q}$ and $\mathbf{R}$ matrices by a factor of $\sqrt{n-1}$ where $n$ is the number of rows of $\mathbf{X}$. In practice it is better to implement the thin QR decomposition, which is to be preferred to the fat QR decomposition. It is numerically more stable. Mathematically, the thin QR decomposition is:

\begin{aligned} x &= \mathbf{Q}^* \mathbf{R}^* \\ \mathbf{Q}^* &= \mathbf{Q} \cdot \sqrt{n - 1} \\ \mathbf{R}^* &= \frac{1}{\sqrt{n - 1}} \cdot \mathbf{R}\\ \boldsymbol{\mu} &= \alpha + \mathbf{X} \cdot \boldsymbol{\beta} + \sigma \\ &= \alpha + \mathbf{Q}^* \cdot \mathbf{R}^* \cdot \boldsymbol{\beta} + \sigma \\ &= \alpha + \mathbf{Q}^* \cdot (\mathbf{R}^* \cdot \boldsymbol{\beta}) + \sigma \\ &= \alpha + \mathbf{Q}^* \cdot \widetilde{\boldsymbol{\beta}} + \sigma \\ \end{aligned}

Then we can recover the original $\boldsymbol{\beta}$ with:

$\boldsymbol{\beta} = \mathbf{R}^{*-1} \cdot \widetilde{\boldsymbol{\beta}}$

In Turing, a model with QR decomposition would be the same linreg but with a different X matrix supplied, since it is a data transformation. First, we decompose your model data X into Q and R:

Q, R = qr(X)
Q_ast = Matrix(Q) * sqrt(size(X, 1) - 1)
R_ast = R / sqrt(size(X, 1) - 1);

Then, we instantiate a model with Q instead of X and sample as you would:

model_qr = linreg(Q_ast, y)
chain_qr = sample(model_qr, NUTS(1_000, 0.65), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 1001:1:2000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 2.65 seconds
Compute duration  = 7.82 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
Symbol    Float64   Float64   Float64     Float64     Float64   Float64       Float64

α    32.9057    7.8274    0.2470   1006.3265   1242.3293    1.0026      128.7521
β[1]   -49.9922    7.0245    0.2210   1009.4898   1401.5358    1.0022      129.1568
β[2]    22.0858    3.5735    0.1101   1054.2580   1418.8568    1.0028      134.8846
β[3]     0.2869    0.8775    0.0238   1370.8734   1800.6496    1.0010      175.3932
σ    17.8703    0.5859    0.0113   2699.9100   2464.9195    1.0019      345.4337

Quantiles
parameters       2.5%      25.0%      50.0%      75.0%      97.5%
Symbol    Float64    Float64    Float64    Float64    Float64

α    17.6798    27.6922    32.7076    38.1740    47.9793
β[1]   -63.6746   -54.6970   -50.1683   -45.2700   -36.2948
β[2]    15.1066    19.7019    22.1804    24.4485    29.1569
β[3]    -1.3554    -0.2981     0.2697     0.8374     2.1505
σ    16.7293    17.4690    17.8683    18.2608    19.0084


See the wall duration in Turing's chain_qr: for me it took around 5 seconds. Much faster than the regular linreg. Now we have to reconstruct our $\boldsymbol{\beta}$s:

betas = mapslices(
x -> R_ast^-1 * x, chain_qr[:, namesingroup(chain_qr, :β), :].value.data; dims=[2]
)
chain_beta = setrange(
Chains(betas, ["real_β[$i]" for i in 1:size(Q_ast, 2)]), 1_001:1:2_000 ) chain_qr_reconstructed = hcat(chain_beta, chain_qr) Chains MCMC chain (1000×20×4 Array{Float64, 3}): Iterations = 1001:1:2000 Number of chains = 4 Samples per chain = 1000 parameters = real_β[1], real_β[2], real_β[3], α, β[1], β[2], β[3], σ internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size Summary Statistics parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec Symbol Float64 Float64 Float64 Float64 Float64 Float64 Missing real_β[1] 6.2103 2.1904 0.0339 4190.5377 2075.4888 1.0015 missing real_β[2] 0.5062 0.0616 0.0015 1657.4420 2135.3528 1.0021 missing real_β[3] -0.0701 0.2143 0.0058 1370.8734 1800.6496 1.0010 missing α 32.9057 7.8274 0.2470 1006.3265 1242.3293 1.0026 missing β[1] -49.9922 7.0245 0.2210 1009.4898 1401.5358 1.0022 missing β[2] 22.0858 3.5735 0.1101 1054.2580 1418.8568 1.0028 missing β[3] 0.2869 0.8775 0.0238 1370.8734 1800.6496 1.0010 missing σ 17.8703 0.5859 0.0113 2699.9100 2464.9195 1.0019 missing Quantiles parameters 2.5% 25.0% 50.0% 75.0% 97.5% Symbol Float64 Float64 Float64 Float64 Float64 real_β[1] 1.8485 4.7441 6.2339 7.7050 10.3766 real_β[2] 0.3815 0.4656 0.5071 0.5480 0.6252 real_β[3] -0.5252 -0.2045 -0.0659 0.0728 0.3310 α 17.6798 27.6922 32.7076 38.1740 47.9793 β[1] -63.6746 -54.6970 -50.1683 -45.2700 -36.2948 β[2] 15.1066 19.7019 22.1804 24.4485 29.1569 β[3] -1.3554 -0.2981 0.2697 0.8374 2.1505 σ 16.7293 17.4690 17.8683 18.2608 19.0084  ## Non-Centered Parametrization Now let's us explore Non-Centered Parametrization (NCP). This is useful when the posterior's topology is very difficult to explore as has regions where HMC sampler has to change the step size $L$ and the $\epsilon$ factor. This is I've showed one of the most infamous case in 5. Markov Chain Monte Carlo (MCMC): Neal's Funnel (Neal, 2003): using CairoMakie using Distributions funnel_y = rand(Normal(0, 3), 10_000) funnel_x = rand(Normal(), 10_000) .* exp.(funnel_y / 2) f, ax, s = scatter( funnel_x, funnel_y; color=(:steelblue, 0.3), axis=(; xlabel=L"X", ylabel=L"Y", limits=(-100, 100, nothing, nothing)), ) Neal's Funnel Here we see that in upper part of the funnel HMC has to take few steps $L$ and be more liberal with the $\epsilon$ factor. But, the opposite is in the lower part of the funnel: way more steps $L$ and be more conservative with the $\epsilon$ factor. To see the devil's funnel (how it is known in some Bayesian circles) in action, let's code it in Turing and then sample: @model function funnel() y ~ Normal(0, 3) return x ~ Normal(0, exp(y / 2)) end chain_funnel = sample(funnel(), NUTS(), MCMCThreads(), 1_000, 4) Chains MCMC chain (1000×14×4 Array{Float64, 3}): Iterations = 501:1:1500 Number of chains = 4 Samples per chain = 1000 Wall duration = 5.91 seconds Compute duration = 23.42 seconds parameters = y, x internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size Summary Statistics parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64 y 0.3304 2.5005 0.4266 34.1757 64.0933 1.0981 1.4589 x -0.7113 8.8403 0.6150 467.3926 188.6939 1.0944 19.9527 Quantiles parameters 2.5% 25.0% 50.0% 75.0% 97.5% Symbol Float64 Float64 Float64 Float64 Float64 y -3.2368 -2.1271 0.0819 1.9243 5.8793 x -13.3185 -0.6603 -0.2764 0.4686 6.4055  Wow, take a look at those rhat values... That sucks: all are above 1.01 even with 4 parallel chains with 1,000 iterations! How do we deal with that? We reparametrize! Note that we can add two normal distributions in the following manner: $\text{Normal}(\mu, \sigma) = \text{Standard Normal} \cdot \sigma + \mu$ where the standard normal is the normal with mean $\mu = 0$ and standard deviation $\sigma = 1$. This is why is called Non-Centered Parametrization because we "decouple" the parameters and reconstruct them before. @model function ncp_funnel() x̃ ~ Normal() ỹ ~ Normal() y = 3.0 * ỹ # implies y ~ Normal(0, 3) return x = exp(y / 2) * x̃ # implies x ~ Normal(0, exp(y / 2)) end chain_ncp_funnel = sample(ncp_funnel(), NUTS(), MCMCThreads(), 1_000, 4) Chains MCMC chain (1000×14×4 Array{Float64, 3}): Iterations = 501:1:1500 Number of chains = 4 Samples per chain = 1000 Wall duration = 5.38 seconds Compute duration = 21.32 seconds parameters = x̃, ỹ internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size Summary Statistics parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64 x̃ -0.0006 1.0002 0.0161 3829.5014 2972.2855 0.9999 179.6454 ỹ -0.0444 1.0054 0.0160 3917.0636 2987.0567 0.9999 183.7530 Quantiles parameters 2.5% 25.0% 50.0% 75.0% 97.5% Symbol Float64 Float64 Float64 Float64 Float64 x̃ -1.8916 -0.6865 -0.0231 0.6704 2.0065 ỹ -2.0393 -0.7107 -0.0524 0.6362 1.9541  Much better now: all rhat are well below 1.01 (or below 0.99). How we would implement this a real-world model in Turing? Let's go back to the cheese random-intercept model in 11. Multilevel Models (a.k.a. Hierarchical Models). Here was the approach that we took, also known as Centered Parametrization (CP): @model function varying_intercept( X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2) ) #priors α ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept β ~ filldist(Normal(0, 2), predictors) # population-level coefficients σ ~ Exponential(std(y)) # residual SD #prior for variance of random intercepts #usually requires thoughtful specification τ ~ truncated(Cauchy(0, 2); lower=0) # group-level SDs intercepts αⱼ ~ filldist(Normal(0, τ), n_gr) # CP group-level intercepts #likelihood ŷ = α .+ X * β .+ αⱼ[idx] return y ~ MvNormal(ŷ, σ^2 * I) end; To perform a Non-Centered Parametrization (NCP) in this model we do as following: @model function varying_intercept_ncp( X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2) ) #priors α ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept β ~ filldist(Normal(0, 2), predictors) # population-level coefficients σ ~ Exponential(std(y)) # residual SD #prior for variance of random intercepts #usually requires thoughtful specification τ ~ truncated(Cauchy(0, 2); lower=0) # group-level SDs intercepts zⱼ ~ filldist(Normal(0, 1), n_gr) # NCP group-level intercepts #likelihood ŷ = α .+ X * β .+ zⱼ[idx] .* τ return y ~ MvNormal(ŷ, σ^2 * I) end; Here we are using a NCP with the zⱼs following a standard normal and we reconstruct the group-level intercepts by multiplying the zⱼs by τ. Since the original αⱼs had a prior centered on 0 with standard deviation τ, we only have to use the multiplication by τ to get back the αⱼs. Now let's see how NCP compares to the CP. First, let's redo our CP hierarchical model: url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv" cheese = CSV.read(HTTP.get(url).body, DataFrame) for c in unique(cheese[:, :cheese]) cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end

cheese[:, :background_int] = map(cheese[:, :background]) do b
if b == "rural"
1
elseif b == "urban"
2
else
missing
end
end

X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];

model_cp = varying_intercept(X, idx, y)
chain_cp = sample(model_cp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 7.72 seconds
Compute duration  = 26.82 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
Symbol    Float64   Float64   Float64     Float64     Float64   Float64       Float64

α    70.6388    5.4554    0.2162    915.6164    759.2940    1.0028       34.1457
β[1]     2.9639    1.3435    0.0265   2565.3867   2773.4856    1.0012       95.6698
β[2]   -10.5915    1.3904    0.0277   2522.8976   2504.4545    1.0016       94.0853
β[3]     6.5485    1.3619    0.0264   2668.6532   2413.9865    0.9997       99.5209
β[4]     1.0905    1.3750    0.0285   2338.0486   2471.5094    1.0010       87.1918
σ     7.4087    0.4507    0.0088   2643.0429   2581.2624    1.0001       98.5658
τ     6.2641    5.7017    0.1952   1222.7006   1203.5556    1.0038       45.5976
αⱼ[1]    -3.3228    5.3516    0.2167    865.3478    780.3551    1.0027       32.2710
αⱼ[2]     3.7386    5.3657    0.2173    894.2647    746.3891    1.0034       33.3494

Quantiles
parameters       2.5%      25.0%      50.0%     75.0%     97.5%
Symbol    Float64    Float64    Float64   Float64   Float64

α    59.8857    68.3215    70.7533   73.2141   81.3633
β[1]     0.3781     2.0754     2.9757    3.8536    5.5841
β[2]   -13.2884   -11.5392   -10.6060   -9.6587   -7.8122
β[3]     3.9443     5.6304     6.5351    7.4435    9.2425
β[4]    -1.6999     0.1515     1.1072    2.0425    3.7349
σ     6.5891     7.0826     7.3872    7.7026    8.3437
τ     1.8181     3.2859     4.6436    7.1806   20.4141
αⱼ[1]   -14.0825    -5.6883    -3.3988   -1.1424    7.5331
αⱼ[2]    -6.5136     1.3252     3.5179    5.9215   14.7231


Now let's do the NCP hierarchical model:

model_ncp = varying_intercept_ncp(X, idx, y)
chain_ncp = sample(model_ncp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 11.84 seconds
Compute duration  = 45.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
Symbol    Float64   Float64   Float64     Float64     Float64   Float64       Float64

α    71.1250    3.5397    0.1218    878.2014    784.3499    1.0054       19.4365
β[1]     2.9093    1.2858    0.0297   1876.0233   2481.1601    1.0026       41.5206
β[2]   -10.6841    1.3789    0.0608    571.3700    192.4652    1.0074       12.6457
β[3]     6.5318    1.3379    0.0326   1713.4983   1455.9135    1.0030       37.9235
β[4]     1.0746    1.3279    0.0316   1767.8948   2009.5112    1.0071       39.1274
σ     7.3765    0.4561    0.0158    777.6440    188.6146    1.0139       17.2110
τ     5.0157    2.6654    0.1354    604.8463    193.9385    1.0075       13.3866
zⱼ[1]    -0.9156    0.7846    0.0225   1184.6917   1334.6682    1.0041       26.2199
zⱼ[2]     0.8326    0.8046    0.0223   1253.2053    947.4690    1.0023       27.7362

Quantiles
parameters       2.5%      25.0%      50.0%     75.0%     97.5%
Symbol    Float64    Float64    Float64   Float64   Float64

α    63.6608    68.9974    71.0956   73.1433   78.5094
β[1]     0.3584     2.0549     2.9066    3.7865    5.4293
β[2]   -13.5656   -11.5643   -10.6594   -9.7752   -7.9711
β[3]     3.9994     5.5582     6.4857    7.4393    9.0957
β[4]    -1.4433     0.2116     1.0321    1.9658    3.6415
σ     6.5332     7.0626     7.3592    7.6827    8.3100
τ     1.8047     3.1023     4.2758    6.3272   11.7515
zⱼ[1]    -2.5756    -1.4160    -0.8862   -0.3486    0.5239
zⱼ[2]    -0.6131     0.2661     0.7795    1.3822    2.4805


Notice that some models are better off with a standard Centered Parametrization (as is our cheese case here). While others are better off with a Non-Centered Parametrization. But now you know how to apply both parametrizations in Turing. Before we conclude, we need to recover our original αⱼs. We can do this by multiplying zⱼ[idx] .* τ:

τ = summarystats(chain_ncp)[:τ, :mean]
αⱼ = mapslices(
x -> x * τ, chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data; dims=[2]
)
chain_ncp_reconstructed = hcat(
MCMCChains.resetrange(chain_ncp), Chains(αⱼ, ["αⱼ[\$i]" for i in 1:length(unique(idx))])
)
Chains MCMC chain (1000×23×4 Array{Float64, 3}):

Iterations        = 1:1000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 11.84 seconds
Compute duration  = 45.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2], αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
Symbol    Float64   Float64   Float64     Float64     Float64   Float64       Float64

α    71.1250    3.5397    0.1218    878.2014    784.3499    1.0054       19.4365
β[1]     2.9093    1.2858    0.0297   1876.0233   2481.1601    1.0026       41.5206
β[2]   -10.6841    1.3789    0.0608    571.3700    192.4652    1.0074       12.6457
β[3]     6.5318    1.3379    0.0326   1713.4983   1455.9135    1.0030       37.9235
β[4]     1.0746    1.3279    0.0316   1767.8948   2009.5112    1.0071       39.1274
σ     7.3765    0.4561    0.0158    777.6440    188.6146    1.0139       17.2110
τ     5.0157    2.6654    0.1354    604.8463    193.9385    1.0075       13.3866
zⱼ[1]    -0.9156    0.7846    0.0225   1184.6917   1334.6682    1.0041       26.2199
zⱼ[2]     0.8326    0.8046    0.0223   1253.2053    947.4690    1.0023       27.7362
αⱼ[1]    -4.5922    3.9351    0.1128   1184.6917   1334.6682    1.0041       26.2199
αⱼ[2]     4.1762    4.0358    0.1118   1253.2053    947.4690    1.0023       27.7362

Quantiles
parameters       2.5%      25.0%      50.0%     75.0%     97.5%
Symbol    Float64    Float64    Float64   Float64   Float64

α    63.6608    68.9974    71.0956   73.1433   78.5094
β[1]     0.3584     2.0549     2.9066    3.7865    5.4293
β[2]   -13.5656   -11.5643   -10.6594   -9.7752   -7.9711
β[3]     3.9994     5.5582     6.4857    7.4393    9.0957
β[4]    -1.4433     0.2116     1.0321    1.9658    3.6415
σ     6.5332     7.0626     7.3592    7.6827    8.3100
τ     1.8047     3.1023     4.2758    6.3272   11.7515
zⱼ[1]    -2.5756    -1.4160    -0.8862   -0.3486    0.5239
zⱼ[2]    -0.6131     0.2661     0.7795    1.3822    2.4805
αⱼ[1]   -12.9183    -7.1023    -4.4447   -1.7483    2.6277
αⱼ[2]    -3.0749     1.3348     3.9098    6.9328   12.4414


## References

Neal, Radford M. (2003). Slice Sampling. The Annals of Statistics, 31(3), 705–741. Retrieved from https://www.jstor.org/stable/3448413