Computational Tricks with Turing (Non-Centered Parametrization and QR Decomposition)

There are some computational tricks that we can employ with Turing. I will cover here two computational tricks:

1. QR Decomposition

2. Non-Centered Parametrization

QR Decomposition

Back in "Linear Algebra 101" we've learned that any matrix (even rectangular ones) can be factored into the product of two matrices:

• $$\mathbf{Q}$$: an orthogonal matrix (its columns are orthogonal unit vectors meaning $$\mathbf{Q}^T = \mathbf{Q}^{-1})$$.

• $$\mathbf{R}$$: an upper triangular matrix.

This is commonly known as the QR Decomposition:

$\mathbf{A} = \mathbf{Q} \cdot \mathbf{R}$

Let me show you an example with a random matrix $$\mathbf{A} \in \mathbb{R}^{3 \times 2}$$:

A = rand(3, 2)
3×2 Matrix{Float64}:
0.615669  0.182872
0.602221  0.805418
0.936869  0.645826

Now let's factor A using LinearAlgebra's qr() function:

using LinearAlgebra:qr, I
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}}
Q factor:
3×3 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}}:
-0.483798   0.604397  -0.632964
-0.473231  -0.78905   -0.39173
-0.736201   0.11002    0.667761
R factor:
2×2 Matrix{Float64}:
-1.27257  -0.94508
0.0      -0.453934

Notice that qr() produced a tuple containing two matrices Q and R. Q is a 3x3 orthogonal matrix. And R is a 2x2 upper triangular matrix. So that $$\mathbf{Q}^T = \mathbf{Q}^{-1}$$ (the transpose is equal the inverse):

Matrix(Q') ≈ Matrix(Q^-1)
true

Also note that $$\mathbf{Q}^T \cdot \mathbf{Q}^{-1} = \mathbf{I}$$ (identity matrix):

Q' * Q ≈ I(3)
true

This is nice. But what can we do with QR decomposition? It can speed up Turing's sampling by a huge factor while also decorrelating the columns of $$\mathbf{X}$$, i.e. the independent variables. The orthogonal nature of QR decomposition alters the posterior's topology and makes it easier for HMC or other MCMC samplers to explore it. Let's see how fast we can get with QR decomposition. First, let's go back to the kidiq example in 6. Bayesian Linear Regression:

using Turing
using Statistics: mean, std
using Random:seed!
seed!(123)

@model linreg(X, y; predictors=size(X, 2)) = begin
#priors
α ~ Normal(mean(y), 2.5 * std(y))
β ~ filldist(TDist(3), predictors)
σ ~ Exponential(1)

#likelihood
y ~ MvNormal(α .+ X * β, σ)
end;

using DataFrames, CSV, HTTP

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/kidiq.csv"
X = Matrix(select(kidiq, Not(:kid_score)))
y = kidiq[:, :kid_score]
model = linreg(X, y)
chain = sample(model, NUTS(), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×17×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 26.33 seconds
Compute duration  = 49.41 seconds
parameters        = α, β, β, β, σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

α   21.7294    8.5853     0.0960    0.1390   3485.3416    1.0004       70.5449
β    2.0249    1.7880     0.0200    0.0291   3122.9675    1.0010       63.2103
β    0.5799    0.0575     0.0006    0.0009   4521.7685    1.0004       91.5227
β    0.2411    0.3036     0.0034    0.0043   4183.4396    1.0000       84.6747
σ   17.8769    0.5978     0.0067    0.0075   6158.8479    1.0008      124.6579

Quantiles
parameters      2.5%     25.0%     50.0%     75.0%     97.5%
Symbol   Float64   Float64   Float64   Float64   Float64

α    4.9794   15.9707   21.6723   27.4182   38.6449
β   -0.6166    0.7170    1.7000    3.0248    6.3278
β    0.4667    0.5418    0.5809    0.6177    0.6928
β   -0.3610    0.0384    0.2422    0.4429    0.8434
σ   16.7538   17.4746   17.8553   18.2619   19.1156

See the wall duration in Turing's chain: for me it took around 24 seconds.

Now let's us incorporate QR decomposition in the linear regression model. Here, I will use the "thin" instead of the "fat" QR, which scales the $$\mathbf{Q}$$ and $$\mathbf{R}$$ matrices by a factor of $$\sqrt{n-1}$$ where $$n$$ is the number of rows of $$\mathbf{X}$$. In practice it is better to implement the thin QR decomposition, which is to be preferred to the fat QR decomposition. It is numerically more stable. Mathematically, the thin QR decomposition is:

\begin{aligned} x &= \mathbf{Q}^* \mathbf{R}^* \\ \mathbf{Q}^* &= \mathbf{Q} \cdot \sqrt{n - 1} \\ \mathbf{R}^* &= \frac{1}{\sqrt{n - 1}} \cdot \mathbf{R}\\ \boldsymbol{\mu} &= \alpha + \mathbf{X} \cdot \boldsymbol{\beta} + \sigma \\ &= \alpha + \mathbf{Q}^* \cdot \mathbf{R}^* \cdot \boldsymbol{\beta} + \sigma \\ &= \alpha + \mathbf{Q}^* \cdot (\mathbf{R}^* \cdot \boldsymbol{\beta}) + \sigma \\ &= \alpha + \mathbf{Q}^* \cdot \widetilde{\boldsymbol{\beta}} + \sigma \\ \end{aligned}

Then we can recover the original $$\boldsymbol{\beta}$$ with:

$\boldsymbol{\beta} = \mathbf{R}^{*-1} \cdot \widetilde{\boldsymbol{\beta}}$

In Turing, a model with QR decomposition would be the same linreg but with a different X matrix supplied, since it is a data transformation. First, we decompose your model data X into Q and R:

Q, R = qr(X)
Q_ast = Matrix(Q) * sqrt(size(X, 1) - 1)
R_ast = R / sqrt(size(X, 1) - 1);

Then, we instantiate a model with Q instead of X and sample as you would:

model_qr = linreg(Q_ast, y)
chain_qr = sample(model_qr, NUTS(1_000, 0.65), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×17×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 10.66 seconds
Compute duration  = 21.2 seconds
parameters        = α, β, β, β, σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

α    33.2749    7.9295     0.0887    0.1678   2113.0783    1.0003       99.6970
β   -49.6645    7.0809     0.0792    0.1498   2117.7218    1.0003       99.9161
β    21.8955    3.6100     0.0404    0.0757   2147.1293    1.0002      101.3036
β     0.2928    0.9071     0.0101    0.0167   2942.9498    1.0001      138.8511
σ    17.8676    0.5988     0.0067    0.0090   4805.0471    0.9999      226.7066

Quantiles
parameters       2.5%      25.0%      50.0%      75.0%      97.5%
Symbol    Float64    Float64    Float64    Float64    Float64

α    18.4666    27.8142    32.9991    38.4474    49.3556
β   -63.0827   -54.5222   -49.8503   -45.0637   -35.2238
β    14.4705    19.5789    21.9879    24.3914    28.6957
β    -1.3998    -0.3152     0.2446     0.8594     2.2182
σ    16.7584    17.4485    17.8652    18.2675    19.0575

See the wall duration in Turing's chain_qr: for me it took around 5 seconds. Much faster than the regular linreg. Now we have to reconstruct our $$\boldsymbol{\beta}$$s:

betas = mapslices(x -> R_ast^-1 * x, chain_qr[:, namesingroup(chain_qr, :β),:].value.data, dims=)
chain_beta = setrange(Chains(betas, ["real_β[$i]" for i in 1:size(Q_ast, 2)]), 1_001:1:3_000) chain_qr_reconstructed = hcat(chain_beta, chain_qr) Chains MCMC chain (2000×20×4 Array{Float64, 3}): Iterations = 1001:1:3000 Number of chains = 4 Samples per chain = 2000 parameters = real_β, real_β, real_β, α, β, β, β, σ internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size Summary Statistics parameters mean std naive_se mcse ess rhat Symbol Float64 Float64 Float64 Float64 Float64 Float64 real_β 6.2755 2.2081 0.0247 0.0265 7885.8353 0.9996 real_β 0.5023 0.0633 0.0007 0.0011 3402.7897 1.0001 real_β -0.0715 0.2215 0.0025 0.0041 2942.9498 1.0001 α 33.2749 7.9295 0.0887 0.1678 2113.0783 1.0003 β -49.6645 7.0809 0.0792 0.1498 2117.7218 1.0003 β 21.8955 3.6100 0.0404 0.0757 2147.1293 1.0002 β 0.2928 0.9071 0.0101 0.0167 2942.9498 1.0001 σ 17.8676 0.5988 0.0067 0.0090 4805.0471 0.9999 Quantiles parameters 2.5% 25.0% 50.0% 75.0% 97.5% Symbol Float64 Float64 Float64 Float64 Float64 real_β 1.9200 4.7657 6.3056 7.7812 10.5096 real_β 0.3778 0.4602 0.5030 0.5456 0.6249 real_β -0.5417 -0.2099 -0.0597 0.0770 0.3418 α 18.4666 27.8142 32.9991 38.4474 49.3556 β -63.0827 -54.5222 -49.8503 -45.0637 -35.2238 β 14.4705 19.5789 21.9879 24.3914 28.6957 β -1.3998 -0.3152 0.2446 0.8594 2.2182 σ 16.7584 17.4485 17.8652 18.2675 19.0575 Non-Centered Parametrization Now let's us explore Non-Centered Parametrization (NCP). This is useful when the posterior's topology is very difficult to explore as has regions where HMC sampler has to change the step size $$L$$ and the $$\epsilon$$ factor. This is I've showed one of the most infamous case in 5. Markov Chain Monte Carlo (MCMC): Neal's Funnel (Neal, 2003): using StatsPlots, Distributions, LaTeXStrings funnel_y = rand(Normal(0, 3), 10_000) funnel_x = rand(Normal(), 10_000) .* exp.(funnel_y / 2) scatter((funnel_x, funnel_y), label=false, mc=:steelblue, ma=0.3, xlabel=L"X", ylabel=L"Y", xlims=(-100, 100)) Neal's Funnel Here we see that in upper part of the funnel HMC has to take few steps $$L$$ and be more liberal with the $$\epsilon$$ factor. But, the opposite is in the lower part of the funnel: way more steps $$L$$ and be more conservative with the $$\epsilon$$ factor. To see the devil's funnel (how it is known in some Bayesian circles) in action, let's code it in Turing and then sample: @model funnel() = begin y ~ Normal(0, 3) x ~ Normal(0, exp(y / 2)) end chain_funnel = sample(funnel(), NUTS(), MCMCThreads(), 2_000, 4) Chains MCMC chain (2000×14×4 Array{Float64, 3}): Iterations = 1001:1:3000 Number of chains = 4 Samples per chain = 2000 Wall duration = 7.55 seconds Compute duration = 13.65 seconds parameters = y, x internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size Summary Statistics parameters mean std naive_se mcse ess rhat ess_per_sec Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64 y 1.3884 2.3810 0.0266 0.1494 160.3485 1.0422 11.7480 x 0.1935 8.7429 0.0977 0.2901 607.2345 1.0093 44.4893 Quantiles parameters 2.5% 25.0% 50.0% 75.0% 97.5% Symbol Float64 Float64 Float64 Float64 Float64 y -2.3844 -0.4325 1.2142 2.9793 6.5042 x -13.3863 -1.0163 -0.0697 1.1303 16.0996 Wow, take a look at those rhat values... That sucks: all are above 1.01 even with 4 parallel chains with 2,000 iterations! How do we deal with that? We reparametrize! Note that we can add two normal distributions in the following manner: $\text{Normal}(\mu, \sigma) = \text{Standard Normal} \cdot \sigma + \mu$ where the standard normal is the normal with mean $$\mu = 0$$ and standard deviation $$\sigma = 1$$. This is why is called Non-Centered Parametrization because we "decouple" the parameters and reconstruct them before. @model ncp_funnel() = begin x̃ ~ Normal() ỹ ~ Normal() y = 3.0 * ỹ # implies y ~ Normal(0, 3) x = exp(y / 2) * x̃ # implies x ~ Normal(0, exp(y / 2)) end chain_ncp_funnel = sample(ncp_funnel(), NUTS(), MCMCThreads(), 2_000, 4) Chains MCMC chain (2000×14×4 Array{Float64, 3}): Iterations = 1001:1:3000 Number of chains = 4 Samples per chain = 2000 Wall duration = 7.38 seconds Compute duration = 13.68 seconds parameters = x̃, ỹ internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size Summary Statistics parameters mean std naive_se mcse ess rhat ess_per_sec Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64 x̃ -0.0069 0.9970 0.0111 0.0109 8178.0937 0.9999 597.8139 ỹ -0.0047 1.0121 0.0113 0.0100 8018.0031 0.9996 586.1113 Quantiles parameters 2.5% 25.0% 50.0% 75.0% 97.5% Symbol Float64 Float64 Float64 Float64 Float64 x̃ -1.9425 -0.6828 0.0018 0.6641 1.9180 ỹ -1.9909 -0.6893 0.0031 0.6811 1.9658 Much better now: all rhat are well below 1.01 (or below 0.99). How we would implement this a real-world model in Turing? Let's go back to the cheese random-intercept model in 10. Multilevel Models (a.k.a. Hierarchical Models). Here was the approach that we took, also known as Centered Parametrization (CP): @model varying_intercept(X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)) = begin #priors α ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept β ~ filldist(Normal(0, 2), predictors) # population-level coefficients σ ~ Exponential(1 / std(y)) # residual SD #prior for variance of random intercepts #usually requires thoughtful specification τ ~ truncated(Cauchy(0, 2), 0, Inf) # group-level SDs intercepts αⱼ ~ filldist(Normal(0, τ), n_gr) # CP group-level intercepts #likelihood ŷ = α .+ X * β .+ αⱼ[idx] y ~ MvNormal(ŷ, σ) end; To perform a Non-Centered Parametrization (NCP) in this model we do as following: @model varying_intercept_ncp(X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)) = begin #priors α ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept β ~ filldist(Normal(0, 2), predictors) # population-level coefficients σ ~ Exponential(1 / std(y)) # residual SD #prior for variance of random intercepts #usually requires thoughtful specification τ ~ truncated(Cauchy(0, 2), 0, Inf) # group-level SDs intercepts zⱼ ~ filldist(Normal(0, 1), n_gr) # NCP group-level intercepts #likelihood ŷ = α .+ X * β .+ zⱼ[idx] .* τ y ~ MvNormal(ŷ, σ) end; Here we are using a NCP with the zⱼs following a standard normal and we reconstruct the group-level intercepts by multiplying the zⱼs by τ. Since the original αⱼs had a prior centered on 0 with standard deviation τ, we only have to use the multiplication by τ to get back the αⱼs. Now let's see how NCP compares to the CP. First, let's redo our CP hierarchical model: url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv" cheese = CSV.read(HTTP.get(url).body, DataFrame) for c in unique(cheese[:, :cheese]) cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end

cheese[:, :background_int] = map(cheese[:, :background]) do b
b == "rural" ? 1 :
b == "urban" ? 2 : missing
end

X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];

model_cp = varying_intercept(X, idx, y)
chain_cp = sample(model_cp, NUTS(), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×21×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 30.24 seconds
Compute duration  = 58.6 seconds
parameters        = α, β, β, β, β, σ, τ, αⱼ, αⱼ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

α    70.7190    5.2616     0.0588    0.1773    906.7976    1.0058       15.4749
β     3.2351    1.2543     0.0140    0.0226   3335.1223    0.9998       56.9153
β   -11.6323    1.2699     0.0142    0.0218   3482.0497    0.9999       59.4227
β     7.1701    1.2307     0.0138    0.0207   3306.7974    0.9997       56.4319
β     1.1979    1.2502     0.0140    0.0206   3513.0851    0.9998       59.9523
σ     5.9930    0.2745     0.0031    0.0037   5527.6283    1.0004       94.3313
τ     6.4146    6.6373     0.0742    0.1804   1478.5048    1.0042       25.2313
αⱼ    -3.4511    5.1655     0.0578    0.1722    952.7034    1.0062       16.2583
αⱼ     3.7110    5.1772     0.0579    0.1717    960.0203    1.0062       16.3832

Quantiles
parameters       2.5%      25.0%      50.0%      75.0%     97.5%
Symbol    Float64    Float64    Float64    Float64   Float64

α    60.2781    68.3590    70.7779    73.2704   80.9692
β     0.7285     2.3946     3.2539     4.0834    5.6589
β   -14.1365   -12.4906   -11.6232   -10.7748   -9.1705
β     4.7085     6.3426     7.1733     7.9974    9.6151
β    -1.3057     0.3672     1.2095     2.0532    3.6275
σ     5.4874     5.8044     5.9858     6.1718    6.5540
τ     1.9390     3.3318     4.7163     7.2482   20.9105
αⱼ   -13.4753    -5.8040    -3.4689    -1.2218    6.9789
αⱼ    -6.1605     1.3716     3.5932     5.8833   14.4133

Now let's do the NCP hierarchical model:

model_ncp = varying_intercept_ncp(X, idx, y)
chain_ncp = sample(model_ncp, NUTS(), MCMCThreads(), 2_000, 4)
Chains MCMC chain (2000×21×4 Array{Float64, 3}):

Iterations        = 1001:1:3000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 32.55 seconds
Compute duration  = 62.67 seconds
parameters        = α, β, β, β, β, σ, τ, zⱼ, zⱼ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

α    70.8868    3.6980     0.0413    0.1201    944.4853    1.0052       15.0713
β     3.1683    1.2649     0.0141    0.0277   1599.3589    1.0034       25.5211
β   -11.6644    1.2595     0.0141    0.0234   2311.3091    1.0028       36.8818
β     7.0896    1.2638     0.0141    0.0289   1824.2196    1.0026       29.1093
β     1.1516    1.2785     0.0143    0.0262   1210.2260    1.0041       19.3117
σ     6.0035    0.2758     0.0031    0.0049   2900.7301    1.0017       46.2873
τ     5.3971    3.0013     0.0336    0.1287    339.9252    1.0099        5.4242
zⱼ    -0.8373    0.7752     0.0087    0.0165   1754.7860    1.0000       28.0013
zⱼ     0.8637    0.7866     0.0088    0.0226   1167.2946    1.0059       18.6266

Quantiles
parameters       2.5%      25.0%      50.0%      75.0%     97.5%
Symbol    Float64    Float64    Float64    Float64   Float64

α    63.1206    68.8006    70.8514    73.0677   78.9521
β     0.7887     2.2889     3.1632     4.0150    5.6652
β   -14.0890   -12.5424   -11.6937   -10.7988   -9.1833
β     4.6131     6.2141     7.1102     7.9595    9.5283
β    -1.3032     0.2904     1.1481     2.0148    3.6623
σ     5.4889     5.8138     5.9893     6.1812    6.5681
τ     1.9159     3.2325     4.5453     6.6889   13.4013
zⱼ    -2.4647    -1.3441    -0.7969    -0.3060    0.6232
zⱼ    -0.5411     0.2982     0.8235     1.3713    2.5196

Notice that some models are better off with a standard Centered Parametrization (as is our cheese case here). While others are better off with a Non-Centered Parametrization. But now you know how to apply both parametrizations in Turing. Before we conclude, we need to recover our original αⱼs. We can do this by multiplying zⱼ[idx] .* τ:

τ = summarystats(chain_ncp)[:τ, :mean]
αⱼ = mapslices(x -> x * τ, chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data, dims=)
chain_ncp_reconstructed = hcat(MCMCChains.resetrange(chain_ncp), Chains(αⱼ, ["αⱼ[\$i]" for i in 1:length(unique(idx))]))
Chains MCMC chain (2000×23×4 Array{Float64, 3}):

Iterations        = 1:2000
Number of chains  = 4
Samples per chain = 2000
Wall duration     = 32.55 seconds
Compute duration  = 62.67 seconds
parameters        = α, β, β, β, β, σ, τ, zⱼ, zⱼ, αⱼ, αⱼ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

α    70.8868    3.6980     0.0413    0.1201    944.4853    1.0052       15.0713
β     3.1683    1.2649     0.0141    0.0277   1599.3589    1.0034       25.5211
β   -11.6644    1.2595     0.0141    0.0234   2311.3091    1.0028       36.8818
β     7.0896    1.2638     0.0141    0.0289   1824.2196    1.0026       29.1093
β     1.1516    1.2785     0.0143    0.0262   1210.2260    1.0041       19.3117
σ     6.0035    0.2758     0.0031    0.0049   2900.7301    1.0017       46.2873
τ     5.3971    3.0013     0.0336    0.1287    339.9252    1.0099        5.4242
zⱼ    -0.8373    0.7752     0.0087    0.0165   1754.7860    1.0000       28.0013
zⱼ     0.8637    0.7866     0.0088    0.0226   1167.2946    1.0059       18.6266
αⱼ    -4.5192    4.1838     0.0468    0.0889   1754.7860    1.0000       28.0013
αⱼ     4.6613    4.2455     0.0475    0.1220   1167.2946    1.0059       18.6266

Quantiles
parameters       2.5%      25.0%      50.0%      75.0%     97.5%
Symbol    Float64    Float64    Float64    Float64   Float64

α    63.1206    68.8006    70.8514    73.0677   78.9521
β     0.7887     2.2889     3.1632     4.0150    5.6652
β   -14.0890   -12.5424   -11.6937   -10.7988   -9.1833
β     4.6131     6.2141     7.1102     7.9595    9.5283
β    -1.3032     0.2904     1.1481     2.0148    3.6623
σ     5.4889     5.8138     5.9893     6.1812    6.5681
τ     1.9159     3.2325     4.5453     6.6889   13.4013
zⱼ    -2.4647    -1.3441    -0.7969    -0.3060    0.6232
zⱼ    -0.5411     0.2982     0.8235     1.3713    2.5196
αⱼ   -13.3021    -7.2541    -4.3009    -1.6514    3.3637
αⱼ    -2.9201     1.6096     4.4446     7.4010   13.5986