Computational Tricks with Turing (Non-Centered Parametrization and QR Decomposition)
There are some computational tricks that we can employ with Turing. Here I will cover two of them:
QR Decomposition
Non-Centered Parametrization
QR Decomposition
Back in "Linear Algebra 101" we've learned that any matrix (even rectangular ones) can be factored into the product of two matrices:
: an orthogonal matrix (its columns are orthogonal unit vectors meaning .
: an upper triangular matrix.
This is commonly known as the QR Decomposition:
Let me show you an example with a random matrix :
A = rand(3, 2)
3×2 Matrix{Float64}:
0.720103 0.295367
0.573619 0.276597
0.664468 0.983436
Now let's factor A using LinearAlgebra's qr() function:
using LinearAlgebra: qr, I
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor: 3×3 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}
R factor:
2×2 Matrix{Float64}:
-1.13539 -0.902615
0.0 0.562299
Notice that qr() produced a tuple containing two matrices Q and R. Q is a 3x3 orthogonal matrix and R is a 2x2 upper triangular matrix, so that $\mathbf{Q}^T = \mathbf{Q}^{-1}$ (the transpose is equal to the inverse):
Matrix(Q') ≈ Matrix(Q^-1)
MethodError: no method matching ^(::LinearAlgebra.AdjointQ{Float64, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}}, ::Int64)
Closest candidates are:
^(!Matched::Float16, ::Integer)
@ Base math.jl:1283
^(!Matched::Regex, ::Integer)
@ Base regex.jl:863
^(!Matched::Float32, ::Integer)
@ Base math.jl:1277
...
The MethodError above happens because ^ is not defined for the compact Q factor object that qr() returns. We can still verify orthogonality directly, since $\mathbf{Q}^T \mathbf{Q} = \mathbf{I}$ (the identity matrix):
Q' * Q ≈ I(3)
true
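As an extra sanity check (this snippet is my addition to the original example), the two factors should multiply back to the original matrix:
Q * R ≈ A  # should return true, since by construction A = Q R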
This is nice. But what can we do with QR decomposition? It can speed up Turing's sampling by a huge factor while also decorrelating the columns of $\mathbf{X}$, i.e. the independent variables. The orthogonal nature of QR decomposition alters the posterior's topology and makes it easier for HMC or other MCMC samplers to explore it. Let's see how fast we can get with QR decomposition. First, let's go back to the kidiq example in 6. Bayesian Linear Regression:
using Turing
using LinearAlgebra: I
using Statistics: mean, std
using Random: seed!
seed!(123)
@model function linreg(X, y; predictors=size(X, 2))
#priors
α ~ Normal(mean(y), 2.5 * std(y))
β ~ filldist(TDist(3), predictors)
σ ~ Exponential(1)
#likelihood
return y ~ MvNormal(α .+ X * β, σ^2 * I)
end;
using DataFrames
using CSV
using HTTP
url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/kidiq.csv"
kidiq = CSV.read(HTTP.get(url).body, DataFrame)
X = Matrix(select(kidiq, Not(:kid_score)))
y = kidiq[:, :kid_score]
model = linreg(X, y)
chain = sample(model, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 4
Samples per chain = 1000
Wall duration = 6.24 seconds
Compute duration = 23.56 seconds
parameters = α, β[1], β[2], β[3], σ
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
α 21.5126 8.6720 0.2217 1528.8886 1868.2612 1.0009 64.8962
β[1] 2.0014 1.7943 0.0419 2281.1940 1734.6512 1.0016 96.8290
β[2] 0.5788 0.0584 0.0013 2163.9754 2292.8814 1.0006 91.8534
β[3] 0.2566 0.3092 0.0074 1762.0214 2135.6795 1.0010 74.7919
σ 17.8859 0.6033 0.0106 3271.1669 2347.2435 1.0008 138.8500
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 4.7278 15.7633 21.2942 27.4322 38.4426
β[1] -0.5876 0.7324 1.6761 2.9919 6.3388
β[2] 0.4662 0.5392 0.5793 0.6184 0.6924
β[3] -0.3477 0.0440 0.2588 0.4733 0.8490
σ 16.7525 17.4685 17.8796 18.2703 19.1238
See the compute duration in Turing's chain: for me it took around 24 seconds.
Now let us incorporate QR decomposition in the linear regression model. Here, I will use the "thin" instead of the "fat" QR, which scales the $\mathbf{Q}$ and $\mathbf{R}$ matrices by a factor of $\sqrt{n-1}$, where $n$ is the number of rows of $\mathbf{X}$. In practice it is better to implement the thin QR decomposition, which is more numerically stable than the fat QR decomposition. Mathematically, the thin QR decomposition is:
$$
\begin{aligned}
\mathbf{X} &= \mathbf{Q}^* \mathbf{R}^* \\
\mathbf{Q}^* &= \mathbf{Q} \cdot \sqrt{n-1} \\
\mathbf{R}^* &= \frac{1}{\sqrt{n-1}} \cdot \mathbf{R}
\end{aligned}
$$
Then the linear predictor becomes $\mathbf{X} \boldsymbol{\beta} = \mathbf{Q}^* \mathbf{R}^* \boldsymbol{\beta} = \mathbf{Q}^* \boldsymbol{\beta}^*$ with $\boldsymbol{\beta}^* = \mathbf{R}^* \boldsymbol{\beta}$, and we can recover the original $\boldsymbol{\beta}$ with:
$$\boldsymbol{\beta} = \mathbf{R}^{*-1} \boldsymbol{\beta}^*$$
In Turing, a model with QR decomposition would be the same linreg but with a different X matrix supplied, since it is a data transformation. First, we decompose our model data X into Q and R:
Q, R = qr(X)
Q_ast = Matrix(Q) * sqrt(size(X, 1) - 1)
R_ast = R / sqrt(size(X, 1) - 1);
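Before plugging Q_ast into the model, here is a quick sanity check of the transformation (my addition, not part of the original tutorial): the scaled factors should still reproduce X, and the columns of Q_ast should be orthogonal up to the n - 1 scaling. Both expressions should return true:
Q_ast * R_ast ≈ X                                  # the scaling factors cancel, so Q* R* = Q R = X
Q_ast' * Q_ast ≈ (size(X, 1) - 1) * I(size(X, 2))  # Q'Q = I implies Q*' Q* = (n - 1) I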
Then, we instantiate a model with Q_ast instead of X and sample as you would:
model_qr = linreg(Q_ast, y)
chain_qr = sample(model_qr, NUTS(1_000, 0.65), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):
Iterations = 1001:1:2000
Number of chains = 4
Samples per chain = 1000
Wall duration = 2.65 seconds
Compute duration = 7.82 seconds
parameters = α, β[1], β[2], β[3], σ
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
α 32.9057 7.8274 0.2470 1006.3265 1242.3293 1.0026 128.7521
β[1] -49.9922 7.0245 0.2210 1009.4898 1401.5358 1.0022 129.1568
β[2] 22.0858 3.5735 0.1101 1054.2580 1418.8568 1.0028 134.8846
β[3] 0.2869 0.8775 0.0238 1370.8734 1800.6496 1.0010 175.3932
σ 17.8703 0.5859 0.0113 2699.9100 2464.9195 1.0019 345.4337
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 17.6798 27.6922 32.7076 38.1740 47.9793
β[1] -63.6746 -54.6970 -50.1683 -45.2700 -36.2948
β[2] 15.1066 19.7019 22.1804 24.4485 29.1569
β[3] -1.3554 -0.2981 0.2697 0.8374 2.1505
σ 16.7293 17.4690 17.8683 18.2608 19.0084
See the compute duration in Turing's chain_qr: for me it took around 8 seconds. Much faster than the regular linreg. Now we have to reconstruct our βs:
betas = mapslices(
x -> R_ast^-1 * x, chain_qr[:, namesingroup(chain_qr, :β), :].value.data; dims=[2]
)
chain_beta = setrange(
Chains(betas, ["real_β[$i]" for i in 1:size(Q_ast, 2)]), 1_001:1:2_000
)
chain_qr_reconstructed = hcat(chain_beta, chain_qr)
Chains MCMC chain (1000×20×4 Array{Float64, 3}):
Iterations = 1001:1:2000
Number of chains = 4
Samples per chain = 1000
parameters = real_β[1], real_β[2], real_β[3], α, β[1], β[2], β[3], σ
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Missing
real_β[1] 6.2103 2.1904 0.0339 4190.5377 2075.4888 1.0015 missing
real_β[2] 0.5062 0.0616 0.0015 1657.4420 2135.3528 1.0021 missing
real_β[3] -0.0701 0.2143 0.0058 1370.8734 1800.6496 1.0010 missing
α 32.9057 7.8274 0.2470 1006.3265 1242.3293 1.0026 missing
β[1] -49.9922 7.0245 0.2210 1009.4898 1401.5358 1.0022 missing
β[2] 22.0858 3.5735 0.1101 1054.2580 1418.8568 1.0028 missing
β[3] 0.2869 0.8775 0.0238 1370.8734 1800.6496 1.0010 missing
σ 17.8703 0.5859 0.0113 2699.9100 2464.9195 1.0019 missing
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
real_β[1] 1.8485 4.7441 6.2339 7.7050 10.3766
real_β[2] 0.3815 0.4656 0.5071 0.5480 0.6252
real_β[3] -0.5252 -0.2045 -0.0659 0.0728 0.3310
α 17.6798 27.6922 32.7076 38.1740 47.9793
β[1] -63.6746 -54.6970 -50.1683 -45.2700 -36.2948
β[2] 15.1066 19.7019 22.1804 24.4485 29.1569
β[3] -1.3554 -0.2981 0.2697 0.8374 2.1505
σ 16.7293 17.4690 17.8683 18.2608 19.0084
Non-Centered Parametrization
Now let's explore Non-Centered Parametrization (NCP). This is useful when the posterior's topology is very difficult to explore, as it has regions where the HMC sampler has to drastically change its step size $\epsilon$ and number of leapfrog steps $L$. I showed one of the most infamous cases of this in 5. Markov Chain Monte Carlo (MCMC): Neal's Funnel (Neal, 2003):
using CairoMakie
using Distributions
funnel_y = rand(Normal(0, 3), 10_000)
funnel_x = rand(Normal(), 10_000) .* exp.(funnel_y / 2)
f, ax, s = scatter(
funnel_x,
funnel_y;
color=(:steelblue, 0.3),
axis=(; xlabel=L"X", ylabel=L"Y", limits=(-100, 100, nothing, nothing)),
)
Here we see that in the upper part of the funnel HMC has to take only a few steps $L$ and can be liberal with the step size $\epsilon$. But the opposite happens in the lower part of the funnel: way more steps $L$ and a much more conservative (smaller) $\epsilon$.
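To get a sense of how much the scale changes across the funnel (this arithmetic is my addition), compare the conditional standard deviation of x, which is exp(y / 2), at the top and at the bottom:
exp(5 / 2)   # sd of x when y = 5: ≈ 12.18
exp(-5 / 2)  # sd of x when y = -5: ≈ 0.08
No single step size $\epsilon$ works well for both regions, which is exactly what makes this posterior hard for HMC.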
To see the devil's funnel (as it is known in some Bayesian circles) in action, let's code it in Turing and then sample:
@model function funnel()
y ~ Normal(0, 3)
return x ~ Normal(0, exp(y / 2))
end
chain_funnel = sample(funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 4
Samples per chain = 1000
Wall duration = 5.91 seconds
Compute duration = 23.42 seconds
parameters = y, x
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
y 0.3304 2.5005 0.4266 34.1757 64.0933 1.0981 1.4589
x -0.7113 8.8403 0.6150 467.3926 188.6939 1.0944 19.9527
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
y -3.2368 -2.1271 0.0819 1.9243 5.8793
x -13.3185 -0.6603 -0.2764 0.4686 6.4055
Wow, take a look at those rhat values... That sucks: all are above 1.01 even with 4 parallel chains of 1,000 iterations!
How do we deal with that? We reparametrize! Note that we can write any normal distribution as a scaled and shifted standard normal:
$$\text{Normal}(\mu, \sigma) = \mu + \sigma \cdot \text{Normal}(0, 1)$$
where the standard normal is the normal with mean $\mu = 0$ and standard deviation $\sigma = 1$. This is why it is called Non-Centered Parametrization: we "decouple" the parameters and reconstruct them afterwards.
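Here is a quick numerical illustration of this identity (my addition; the values 1.5 and 2.0 are arbitrary): scaling and shifting standard normal draws reproduces draws from Normal(1.5, 2.0):
z = rand(Normal(), 10_000)  # standard normal draws
x = 1.5 .+ 2.0 .* z         # implies x ~ Normal(1.5, 2.0)
mean(x), std(x)             # should be close to (1.5, 2.0)
Applying the same idea to Neal's funnel: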
@model function ncp_funnel()
x̃ ~ Normal()
ỹ ~ Normal()
y = 3.0 * ỹ # implies y ~ Normal(0, 3)
return x = exp(y / 2) * x̃ # implies x ~ Normal(0, exp(y / 2))
end
chain_ncp_funnel = sample(ncp_funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 4
Samples per chain = 1000
Wall duration = 5.38 seconds
Compute duration = 21.32 seconds
parameters = x̃, ỹ
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
x̃ -0.0006 1.0002 0.0161 3829.5014 2972.2855 0.9999 179.6454
ỹ -0.0444 1.0054 0.0160 3917.0636 2987.0567 0.9999 183.7530
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
x̃ -1.8916 -0.6865 -0.0231 0.6704 2.0065
ỹ -2.0393 -0.7107 -0.0524 0.6362 1.9541
Much better now: all rhat are well below 1.01 (and above 0.99).
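If you prefer to check this programmatically instead of eyeballing the summary tables, here is a minimal sketch (my addition; it assumes the summary returned by summarystats can be converted to a DataFrame via the Tables.jl interface):
rhats_cp = DataFrame(summarystats(chain_funnel)).rhat       # rhat column of the centered funnel
rhats_ncp = DataFrame(summarystats(chain_ncp_funnel)).rhat  # rhat column of the NCP funnel
all(rhats_cp .< 1.01), all(rhats_ncp .< 1.01)               # expected: (false, true)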
How would we implement this in a real-world model in Turing? Let's go back to the cheese random-intercept model in 11. Multilevel Models (a.k.a. Hierarchical Models). Here was the approach that we took, also known as Centered Parametrization (CP):
@model function varying_intercept(
X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
#priors
α ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept
β ~ filldist(Normal(0, 2), predictors) # population-level coefficients
σ ~ Exponential(std(y)) # residual SD
#prior for variance of random intercepts
#usually requires thoughtful specification
τ ~ truncated(Cauchy(0, 2); lower=0) # group-level SDs intercepts
αⱼ ~ filldist(Normal(0, τ), n_gr) # CP group-level intercepts
#likelihood
ŷ = α .+ X * β .+ αⱼ[idx]
return y ~ MvNormal(ŷ, σ^2 * I)
end;
To perform a Non-Centered Parametrization (NCP) in this model we do the following:
@model function varying_intercept_ncp(
X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
#priors
α ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept
β ~ filldist(Normal(0, 2), predictors) # population-level coefficients
σ ~ Exponential(std(y)) # residual SD
#prior for variance of random intercepts
#usually requires thoughtful specification
τ ~ truncated(Cauchy(0, 2); lower=0) # group-level SDs intercepts
zⱼ ~ filldist(Normal(0, 1), n_gr) # NCP group-level intercepts
#likelihood
ŷ = α .+ X * β .+ zⱼ[idx] .* τ
return y ~ MvNormal(ŷ, σ^2 * I)
end;
Here we are using a NCP with the zⱼs following a standard normal, and we reconstruct the group-level intercepts by multiplying the zⱼs by τ. Since the original αⱼs had a prior centered on 0 with standard deviation τ, we only have to multiply by τ to get back the αⱼs.
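As a tiny sketch of the identity we rely on here (my addition; τ, the draws, and the index vector are made-up values), multiplying standard normal draws by τ yields draws distributed as Normal(0, τ), and the scaling commutes with the group indexing used in the likelihood:
τ_example = 2.5                       # hypothetical group-level SD
zⱼ_example = rand(Normal(), 2)        # one standard normal draw per group
αⱼ_example = τ_example .* zⱼ_example  # implies αⱼ ~ Normal(0, τ)
idx_example = [1, 2, 2, 1]            # hypothetical group membership of four observations
αⱼ_example[idx_example] == zⱼ_example[idx_example] .* τ_example  # true: both give the same per-observation intercepts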
Now let's see how NCP compares to the CP. First, let's redo our CP hierarchical model:
url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv"
cheese = CSV.read(HTTP.get(url).body, DataFrame)
for c in unique(cheese[:, :cheese])
cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end
cheese[:, :background_int] = map(cheese[:, :background]) do b
if b == "rural"
1
elseif b == "urban"
2
else
missing
end
end
X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];
model_cp = varying_intercept(X, idx, y)
chain_cp = sample(model_cp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 4
Samples per chain = 1000
Wall duration = 7.72 seconds
Compute duration = 26.82 seconds
parameters = α, β[1], β[2], β[3], β[4], σ, τ, αⱼ[1], αⱼ[2]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
α 70.6388 5.4554 0.2162 915.6164 759.2940 1.0028 34.1457
β[1] 2.9639 1.3435 0.0265 2565.3867 2773.4856 1.0012 95.6698
β[2] -10.5915 1.3904 0.0277 2522.8976 2504.4545 1.0016 94.0853
β[3] 6.5485 1.3619 0.0264 2668.6532 2413.9865 0.9997 99.5209
β[4] 1.0905 1.3750 0.0285 2338.0486 2471.5094 1.0010 87.1918
σ 7.4087 0.4507 0.0088 2643.0429 2581.2624 1.0001 98.5658
τ 6.2641 5.7017 0.1952 1222.7006 1203.5556 1.0038 45.5976
αⱼ[1] -3.3228 5.3516 0.2167 865.3478 780.3551 1.0027 32.2710
αⱼ[2] 3.7386 5.3657 0.2173 894.2647 746.3891 1.0034 33.3494
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 59.8857 68.3215 70.7533 73.2141 81.3633
β[1] 0.3781 2.0754 2.9757 3.8536 5.5841
β[2] -13.2884 -11.5392 -10.6060 -9.6587 -7.8122
β[3] 3.9443 5.6304 6.5351 7.4435 9.2425
β[4] -1.6999 0.1515 1.1072 2.0425 3.7349
σ 6.5891 7.0826 7.3872 7.7026 8.3437
τ 1.8181 3.2859 4.6436 7.1806 20.4141
αⱼ[1] -14.0825 -5.6883 -3.3988 -1.1424 7.5331
αⱼ[2] -6.5136 1.3252 3.5179 5.9215 14.7231
Now let's do the NCP hierarchical model:
model_ncp = varying_intercept_ncp(X, idx, y)
chain_ncp = sample(model_ncp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 4
Samples per chain = 1000
Wall duration = 11.84 seconds
Compute duration = 45.18 seconds
parameters = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
α 71.1250 3.5397 0.1218 878.2014 784.3499 1.0054 19.4365
β[1] 2.9093 1.2858 0.0297 1876.0233 2481.1601 1.0026 41.5206
β[2] -10.6841 1.3789 0.0608 571.3700 192.4652 1.0074 12.6457
β[3] 6.5318 1.3379 0.0326 1713.4983 1455.9135 1.0030 37.9235
β[4] 1.0746 1.3279 0.0316 1767.8948 2009.5112 1.0071 39.1274
σ 7.3765 0.4561 0.0158 777.6440 188.6146 1.0139 17.2110
τ 5.0157 2.6654 0.1354 604.8463 193.9385 1.0075 13.3866
zⱼ[1] -0.9156 0.7846 0.0225 1184.6917 1334.6682 1.0041 26.2199
zⱼ[2] 0.8326 0.8046 0.0223 1253.2053 947.4690 1.0023 27.7362
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 63.6608 68.9974 71.0956 73.1433 78.5094
β[1] 0.3584 2.0549 2.9066 3.7865 5.4293
β[2] -13.5656 -11.5643 -10.6594 -9.7752 -7.9711
β[3] 3.9994 5.5582 6.4857 7.4393 9.0957
β[4] -1.4433 0.2116 1.0321 1.9658 3.6415
σ 6.5332 7.0626 7.3592 7.6827 8.3100
τ 1.8047 3.1023 4.2758 6.3272 11.7515
zⱼ[1] -2.5756 -1.4160 -0.8862 -0.3486 0.5239
zⱼ[2] -0.6131 0.2661 0.7795 1.3822 2.4805
Notice that some models are better off with a standard Centered Parametrization (as is our cheese case here), while others are better off with a Non-Centered Parametrization. But now you know how to apply both parametrizations in Turing. Before we conclude, we need to recover our original αⱼs. We can do this by multiplying the zⱼs by τ:
τ = summarystats(chain_ncp)[:τ, :mean]
αⱼ = mapslices(
x -> x * τ, chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data; dims=[2]
)
chain_ncp_reconstructed = hcat(
MCMCChains.resetrange(chain_ncp), Chains(αⱼ, ["αⱼ[$i]" for i in 1:length(unique(idx))])
)
Chains MCMC chain (1000×23×4 Array{Float64, 3}):
Iterations = 1:1000
Number of chains = 4
Samples per chain = 1000
Wall duration = 11.84 seconds
Compute duration = 45.18 seconds
parameters = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2], αⱼ[1], αⱼ[2]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
α 71.1250 3.5397 0.1218 878.2014 784.3499 1.0054 19.4365
β[1] 2.9093 1.2858 0.0297 1876.0233 2481.1601 1.0026 41.5206
β[2] -10.6841 1.3789 0.0608 571.3700 192.4652 1.0074 12.6457
β[3] 6.5318 1.3379 0.0326 1713.4983 1455.9135 1.0030 37.9235
β[4] 1.0746 1.3279 0.0316 1767.8948 2009.5112 1.0071 39.1274
σ 7.3765 0.4561 0.0158 777.6440 188.6146 1.0139 17.2110
τ 5.0157 2.6654 0.1354 604.8463 193.9385 1.0075 13.3866
zⱼ[1] -0.9156 0.7846 0.0225 1184.6917 1334.6682 1.0041 26.2199
zⱼ[2] 0.8326 0.8046 0.0223 1253.2053 947.4690 1.0023 27.7362
αⱼ[1] -4.5922 3.9351 0.1128 1184.6917 1334.6682 1.0041 26.2199
αⱼ[2] 4.1762 4.0358 0.1118 1253.2053 947.4690 1.0023 27.7362
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 63.6608 68.9974 71.0956 73.1433 78.5094
β[1] 0.3584 2.0549 2.9066 3.7865 5.4293
β[2] -13.5656 -11.5643 -10.6594 -9.7752 -7.9711
β[3] 3.9994 5.5582 6.4857 7.4393 9.0957
β[4] -1.4433 0.2116 1.0321 1.9658 3.6415
σ 6.5332 7.0626 7.3592 7.6827 8.3100
τ 1.8047 3.1023 4.2758 6.3272 11.7515
zⱼ[1] -2.5756 -1.4160 -0.8862 -0.3486 0.5239
zⱼ[2] -0.6131 0.2661 0.7795 1.3822 2.4805
αⱼ[1] -12.9183 -7.1023 -4.4447 -1.7483 2.6277
αⱼ[2] -3.0749 1.3348 3.9098 6.9328 12.4414
References
Neal, Radford M. (2003). Slice Sampling. The Annals of Statistics, 31(3), 705–741. Retrieved from https://www.jstor.org/stable/3448413