BoundsError in Turing test suite on Julia v1.11.1 #315
There is something going on with the underlying
Figuring out how this happened should be possible using debug mode, but I might need to update it with some checks for |
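For context, here is a minimal sketch of how debug mode can be switched on via the config passed to the AD type. The keyword names are taken from the Mooncake docs rather than from this thread, so treat them as assumptions:

```julia
import Mooncake
using Turing

# Sketch: enable Mooncake's debug mode, which inserts extra runtime checks
# that can surface problems (e.g. inconsistent sizes) earlier and with
# clearer messages than a raw BoundsError.
adbackend = Turing.AutoMooncake(; config=Mooncake.Config(; debug_mode=true))
```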
Now managing to reproduce locally by checking out the latest commit (6f5b273dfa1d21548758d986e5710c3e72524195) from TuringLang/Turing.jl#2328 and running

```julia
module MWE
using Distributions: Normal, sample
import Random
using StableRNGs: StableRNG
import Mooncake
using Turing
@model function gdemo_d()
s ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s))
1.5 ~ Normal(m, sqrt(s))
2.0 ~ Normal(m, sqrt(s))
return s, m
end
gdemo_default = gdemo_d()
adbackend = Turing.AutoMooncake(; config=nothing)
rng = StableRNG(123)
Random.seed!(12345) # particle samplers do not support user-provided `rng` yet
alg3 = Gibbs(; s=PG(20), m=HMCDA(500, 0.8, 0.25; init_ϵ=0.05, adtype=adbackend))
res3 = sample(rng, gdemo_default, alg3, 3000, discard_initial=1000)
end
```

with Julia v1.11.1 |
This will make debugging vastly easier. Thank you! |
I don't seem to be able to replicate. My Project status is:

```
(jl_7S59v2) pkg> st
Status `/private/var/folders/8p/znj24dc50hq6dbkbskbfg8680000gs/T/jl_7S59v2/Project.toml`
[31c24e10] Distributions v0.25.112
[da2b9cff] Mooncake v0.4.24
[860ef19b] StableRNGs v1.0.2
[fce5fe82] Turing v0.36.0 `../../../../../../../Users/wtebbutt/.julia/dev/Turing`
[9a3f8284] Random v1.11.0
```

Could you share yours so that we can find the discrepancy? |
I suspect this is related to Libtask’s mechanism for deep copying arrays. |
Could you elaborate a bit on this @yebai / could you point me towards the bit of Libtask that you think is a likely candidate for causing these problems? |
Mine is a bit messier, it's the one I run Turing tests in:

```
(jl_qMot4L) pkg> st
Status `/private/var/folders/wk/zmsrlr9s2cgdpdnqj5d522sw0000gr/T/jl_qMot4L/Project.toml`
[80f14c24] AbstractMCMC v5.5.0
[5b7e9947] AdvancedMH v0.8.3
[576499cb] AdvancedPS v0.6.0
⌃ [b5ca4192] AdvancedVI v0.2.8
[4c88cf16] Aqua v0.8.9
[aaaa29a8] Clustering v0.15.7
[31c24e10] Distributions v0.25.112
[ced4e74d] DistributionsAD v0.6.57
[bbc10e6e] DynamicHMC v3.4.7
[366bfd00] DynamicPPL v0.30.1
[26cc04aa] FiniteDifferences v0.12.32
[f6369f11] ForwardDiff v0.10.36
[09f84164] HypothesisTests v0.11.3
[6fdf6af0] LogDensityProblems v2.1.2
[996a588d] LogDensityProblemsAD v1.11.0
[c7f686f2] MCMCChains v6.0.6
[da2b9cff] Mooncake v0.4.24
[86f7a689] NamedArrays v0.10.3
[429524aa] Optim v1.9.4
[7f7a1694] Optimization v4.0.3
[3e6eede4] OptimizationBBO v0.4.0
[4e6fcdb7] OptimizationNLopt v0.3.1
[36348300] OptimizationOptimJL v0.4.1
[90014a1f] PDMats v0.11.31
[37e2e3b7] ReverseDiff v1.15.3
[276daf66] SpecialFunctions v2.4.0
[860ef19b] StableRNGs v1.0.2
[2913bbd2] StatsBase v0.34.3
[4c63d2b9] StatsFuns v1.3.2
[a759f4b9] TimerOutputs v0.5.25
[fce5fe82] Turing v0.36.0 `~/projects/Turing.jl`
[e88e6eb3] Zygote v0.6.72
[37e2e46d] LinearAlgebra v1.11.0
[44cfe95a] Pkg v1.11.0
[9a3f8284] Random v1.11.0
[8dfed614] Test v1.11.0
```

That all seems in line with yours though. Any help from this?

```
julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 10 × Apple M1 Pro
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
```

I'm running
Some of the flags seem to be significant for this, not sure which.
|
I'm referring to the following code. I don't have a lot of evidence here, so the bug could well lie somewhere other than Libtask. |
Here's a partial minimisation, need to attend to other things now, can continue tomorrow:

```julia
module MWE
using Distributions: Normal
using StableRNGs: StableRNG
import Mooncake
import LogDensityProblemsAD
import LogDensityProblems
import Turing
import DynamicPPL
function main()
function gdemo_d(__model__, __varinfo__, __context__)
_, __varinfo__ = DynamicPPL.tilde_assume!!(
__context__, Normal(0, 1), DynamicPPL.@varname(m), __varinfo__
)
return nothing, __varinfo__
end
model = DynamicPPL.Model(gdemo_d, (;))
adbackend = Turing.AutoMooncake(; config=nothing)
rng = StableRNG(123)
alg_local = Turing.HMCDA(500, 0.8, 0.25; init_ϵ=0.05, adtype=adbackend)
sampler_local = DynamicPPL.Sampler(alg_local)
vi_local = DynamicPPL.default_varinfo(rng, model, sampler_local)
vi_local = DynamicPPL.link(vi_local, sampler_local, model)
theta = vi_local[sampler_local]
ℓ = LogDensityProblemsAD.ADgradient(
Turing.LogDensityFunction(
vi_local,
model,
DynamicPPL.SamplingContext(rng, sampler_local, DynamicPPL.DefaultContext()),
),
)
f(x) = LogDensityProblems.logdensity_and_gradient(ℓ, x)
DynamicPPL.evaluate!!(
model,
vi_local,
DynamicPPL.SamplingContext(rng, sampler_local),
)
f(theta)
return nothing
end
main()
end
```
|
Thanks @mhauru -- this is helpful! |
@mhauru could you confirm whether you're generating this error on Mooncake 0.4.32? Sadly, I'm struggling to reliably reproduce your example. It appears to be true that if I fix some specific bits of Mooncake, the problem goes away, but I'm also finding that if I switch between versions 0.4.24 and 0.4.32 (neither of which contains said fix), the problem also disappears. I basically just want to confirm that you're also seeing this somewhat mysterious behaviour. |
Still seeing this on Mooncake v0.4.32. Or more precisely on

```
(mooncake_mwes) pkg> st
Status `~/projects/mooncake_mwes/Project.toml`
[80f14c24] AbstractMCMC v5.6.0
[7a57a42e] AbstractPPL v0.9.0
[7d9f7c33] Accessors v0.1.38
[0bf59076] AdvancedHMC v0.6.3
[31c24e10] Distributions v0.25.112
[366bfd00] DynamicPPL v0.30.2
[6fdf6af0] LogDensityProblems v2.1.2
[996a588d] LogDensityProblemsAD v1.12.0
[da2b9cff] Mooncake v0.4.32
[efcf1570] Setfield v1.1.1
[ce78b400] SimpleUnPack v1.1.0
[860ef19b] StableRNGs v1.0.2
[fce5fe82] Turing v0.36.0 `https://github.com/TuringLang/Turing.jl.git#6f5b273`
```

Let me also get to the bottom of which Julia cmd line arguments are necessary. |
It's the `--check-bounds` flag. |
Thanks for this -- I hadn't considered that the check-bounds flag might be necessary. |
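For anyone trying to reproduce this later: `Pkg.test` forces bounds checking on, which is why the error shows up in the test suite but not in an ordinary session. A sketch of an equivalent invocation (the script name is illustrative, not from this thread):

```julia
# Run the MWE with bounds checking forced on, mirroring what Pkg.test does.
# "mwe.jl" is a hypothetical file containing the module shown above.
run(`julia --check-bounds=yes --project=. mwe.jl`)
```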
Okay, so I think I now understand what's going on here -- the order in which you execute stuff above really matters quite a bit. If you instead run:

```julia
module MWE
using Distributions: Normal
using StableRNGs: StableRNG
import Mooncake
import LogDensityProblemsAD
import LogDensityProblems
import Turing
import DynamicPPL
function main()
function gdemo_d(__model__, __varinfo__, __context__)
_, __varinfo__ = DynamicPPL.tilde_assume!!(
__context__, Normal(0, 1), DynamicPPL.@varname(m), __varinfo__
)
return nothing, __varinfo__
end
model = DynamicPPL.Model(gdemo_d, (;))
adbackend = Turing.AutoMooncake(; config=nothing)
rng = StableRNG(123)
alg_local = Turing.HMCDA(500, 0.8, 0.25; init_ϵ=0.05, adtype=adbackend)
sampler_local = DynamicPPL.Sampler(alg_local)
vi_local = DynamicPPL.default_varinfo(rng, model, sampler_local)
vi_local = DynamicPPL.link(vi_local, sampler_local, model)
theta = vi_local[sampler_local]
# Happens before ADgradient is constructed.
DynamicPPL.evaluate!!(
model,
vi_local,
DynamicPPL.SamplingContext(rng, sampler_local),
)
# Happens after evaluate!! is called.
ℓ = LogDensityProblemsAD.ADgradient(
Turing.LogDensityFunction(
vi_local,
model,
DynamicPPL.SamplingContext(rng, sampler_local, DynamicPPL.DefaultContext()),
),
)
f(x) = LogDensityProblems.logdensity_and_gradient(ℓ, x)
f(theta)
return nothing
end
main()
end
```

you'll find that it is error free.

**Problem**

What's going on in the original example is:
So the problem is that:
When we change the ordering, the pre-allocation operations happen after the call to `evaluate!!`.

**Solution**

There are a couple of possible solutions:
Any thoughts on this @mhauru ? |
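To make the ordering constraint concrete, here is a minimal sketch. It simply restates the reordered example above rather than any specific proposed fix, and the function and argument names are illustrative: mutate the VarInfo first, build the gradient object last.

```julia
import DynamicPPL, LogDensityProblems, LogDensityProblemsAD, Turing

# Sketch: the ADgradient pre-allocates storage against the state of `vi` at
# construction time, so any call that may resize that storage must come first.
function gradient_after_mutation(model, vi, context, theta)
    _, vi = DynamicPPL.evaluate!!(model, vi, context)  # may resize vi's storage
    ℓ = LogDensityProblemsAD.ADgradient(
        Turing.LogDensityFunction(vi, model, context)   # pre-allocation happens here
    )
    return LogDensityProblems.logdensity_and_gradient(ℓ, theta)
end
```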
Just this morning I was hunting down a case in the same PR where this originated from, one in which ForwardDiff was giving strange zero-division errors. The problem was that I was inadvertently first creating a

So the original PR where this came up is now error free as of a few hours ago. What I still don't fully understand is whether the whole thing was just a very confusing outcome of a mistake I made in the Gibbs sampler, and there's nothing to fix in Mooncake (in which case I'm very sorry to have wasted huge amounts of your time), or whether there's some deeper lesson here about caches, a more helpful error message that could be raised, or some other useful outcome. |
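On the "more helpful error message" point: purely as a hypothetical sketch (this is not Mooncake's actual code), the kind of check that could turn the BoundsError into something actionable is to record the input size when the cached gradient machinery is built and complain if it later changes:

```julia
# Hypothetical illustration only -- not Mooncake internals. The wrapper
# remembers the input length seen at construction time and raises a clear
# error if a later call uses a differently-sized input.
struct SizeCheckedGradient{F}
    grad::F
    expected_length::Int
end

SizeCheckedGradient(grad, x::AbstractVector) = SizeCheckedGradient(grad, length(x))

function (g::SizeCheckedGradient)(x::AbstractVector)
    length(x) == g.expected_length || error(
        "input has length $(length(x)), but the gradient was built for length " *
        "$(g.expected_length); rebuild it after mutating the model's state.",
    )
    return g.grad(x)
end
```

Something along these lines would fail with a message pointing at the stale cache rather than with a bounds error deep inside a rule.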
Aha! This is excellent news. No problem re my time -- none of it has been wasted, and it's very helpful to know about this problem. The conclusion as I understand it is that there doesn't appear to be anything wrong in the internals of Mooncake.jl, but that the way we're currently interfacing with `ADgradient` is potentially somewhat unsafe.

Moving forwards, now that the ADTypes.jl stuff has been merged into LogDensityProblemsAD.jl, I'm keen to remove the LogDensityProblemsAD.jl extension from Mooncake.jl entirely. I propose the following sequence of actions:
Sounds reasonable? |
Sure, happy to follow your lead on that. For the ForwardDiff case as well, I was wondering whether something better could be done in |
Good question. I'd be happy to help with this -- could you point me to some code? |
With ForwardDiff, what happened was that this line set the |
A Turing CI run fails with the following:
The test passes for me locally, also on v1.11.1. To save time, I skipped running the preceding tests though, so if this is random seed dependent then that would explain why it worked for me locally.