Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors and wrong results when calculating hessian #2072

Open
ernestds opened this issue Nov 7, 2024 · 0 comments
Open

Errors and wrong results when calculating hessian #2072

ernestds opened this issue Nov 7, 2024 · 0 comments

Comments

@ernestds
Copy link

ernestds commented Nov 7, 2024

EDIT:
Julia 1.11.1
Enzyme v0.13.13
StaticArrays v1.9.8
6.11.6-2-cachyos
AMD Ryzen 7 8845hs

Newbie in Julia and Enzyme here so forgive me if it's a skill issue.
First here's the gist with the problem documented: https://gist.github.com/ernestds/8dd881253ad7b49617da57aa22e61548

All I am trying to do is to calculate the hessian of some function.

I've identified three problems:
1)

Current scope: 
; Function Attrs: alwaysinline mustprogress willreturn

error when trying to run F1, which goes away by replacing the loop with a nested one (F2)
F1:

for i = 1:400
:(
...

F2 (nested loop):

    for i = 1:200
        for j=1:2
:)
...

The problem also doesn't happen when doing fewer iterations (~ < 40) (F1_less).
I've also tested the outer loop with more than 10000 iterations and it also worked; however, if I increase the inner loop above ~20 I get the same error.

  1. The F3 definition gives the wrong result. This one uses the function cos.(x) in a for loop. Note that this doesn't happen if I manually nest the function call (F4) or manually (F5)*.
    Additionally, I've found out that if I run F2 (correct result, not the same as the previous F2 of problem (1) because I messed up naming, I think it is clear in the code tho) after F3 (wrong) WITHOUT zeroing dx, dy and hess, F2 also gets the wrong result (numerically the same as F3). This doesn't happen if I zero the derivative arrays after F3 or if I run F2 two consecutive times (without zeroing the arrays).
    *F5 has its own problem.

  2. F5 (loop unrolled) doesn't compile if I include z[1] and z[2] in the output. (note that for F52, doesn't compile, y[1] = z[1]*z[1] + z[2]*z[2] and for F5, which does compile, y[1] = z[1]*z[1].
    Also F5 takes much longer to compile than anything else.

I'll also just throw the question here: Note the amount of allocations when using the cos.(x) function (F3 and F5) - and to a lesser extent F4. Is this expected, a bug or user error? Would the correct way to do this be like F1 SVector{N,T}(cos(zz) for zz in z) or something?

To verify the values I've used Casadi in Python:


x = ca.SX.sym("x",2,1)

def f(x):
    y = ca.cos(x)
    for i in range(20):
        y = ca.cos(y)
    return y[0]**2 + y[1]**2
   

f = ca.Function("f",[x],ca.cse(ca.hessian(f(x),x)),["x"],["y","y2"],{"jit":True,"jit_options":{"flags":["-O3", "-march=native"]}})
#time_function(f,s=3)
f(1)
#hess = [[0.000270065, 00], 
#  [00, 0.000270065]]

#grad = [-0.000366693, -0.000366693]

Just for fun, the above Casadi function (but with 400 iterations loop) benchmarked:
Mean: 21.562 us; standard deviation: 1.006 us; min: 21.219 us; max: 54.193 us
vs ~9.3us with Enzyme. Pretty cool.

For completeness here's the code again:

using Enzyme
using StaticArrays
using BenchmarkTools
function cos_vec(z::SVector{N,T}) where {N,T}
    return SVector{N,T}(cos(zz) for zz in z)
end
function cos_vec(z::NTuple{N,T}) where {N,T}
    return NTuple{N,T}(cos(zz) for zz in z)
end
function cos_vec(z::AbstractArray{T}) where {T}
    return T[cos(zz) for zz in z]
end

function f1!(x, y) 
    z1,z2 = cos_vec((x[1],x[2]))
    for i = 1:400
        z1,z2 = cos_vec((z1,z2))    
    end
    y[1] = z1*z1 + z2*z2
    return nothing
end
function f1_less!(x, y) 
    z1,z2 = cos_vec((x[1],x[2]))
    for i = 1:20
        z1,z2 = cos_vec((z1,z2))    
    end
    y[1] = z1*z1 + z2*z2
    return nothing
end
function f2!(x, y) 
    z1,z2 = cos_vec((x[1],x[2]))
    for i = 1:200
        for j=1:2
            z1,z2 = cos_vec((z1,z2))    
        end
    end
    y[1] = z1*z1 + z2*z2
    return nothing
end
function f2_less!(x, y) 
    z1,z2 = cos_vec((x[1],x[2]))
    for i = 1:10
        for j=1:2
            z1,z2 = cos_vec((z1,z2))    
        end
    end
    y[1] = z1*z1 + z2*z2
    return nothing
end

function grad!(x, dx, y, dy,f)
    Enzyme.autodiff(Reverse, f, Duplicated(x, dx), DuplicatedNoNeed(y, dy),)
    nothing
  end
x =  [1.0, 1.0]
y = Vector{Float64}(undef, 1)
dx = [0.0, 0.0]
dy = [1.0]
hess = ([0.0, 0.0], [0.0, 0.0])

F1:

Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f1!))
Current scope: 
; Function Attrs: alwaysinline mustprogress willreturn
define void @preprocess_diffejulia_f1__99290wrap({} addrspace(10)* %0, {} addrspace(10)* nocapture readonly %1, {} addrspace(10)* %2, {} addrspace(10)* nocapture readonly %3) local_unnamed_addr #16 !dbg !910 {
entry:
  %4 = call {}*** @julia.get_pgcstack() #18

F1 with less iterations: works with correct result

x =  [1.0, 1.0]
y = Vector{Float64}(undef, 1)
dx = [0.0, 0.0]
dy = [1.0]
hess = ([0.0, 0.0], [0.0, 0.0])

Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f1_less!))
@show hess;
@show dx;
@benchmark Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f1_less!))
hess = ([0.000270064862399231, 0.0], [0.0, 0.000270064862399231])
dx = [-0.00036669251092078, -0.00036669251092078]
BenchmarkTools.Trial: 10000 samples with 104 evaluations.
 Range (min … max):  786.481 ns …  1.147 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     802.385 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   810.563 ns ± 24.679 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▂███▇▅▃▁   ▁ ▁▂▂▂▃▃▃▃▂▂▁                                 ▂
  ▃▆█▇████████▇▇██████████████▇▇▇▇▇▇▆▆▇▅▆▇▅▇▆▆▅▂▅▅▅▆▃▄▅▄▅▆▆▆▅▅ █
  786 ns        Histogram: log(frequency) by time       918 ns <

 Memory estimate: 96 bytes, allocs estimate: 4.
x =  [1.0, 1.0]
y = Vector{Float64}(undef, 1)
dx = [0.0, 0.0]
dy = [1.0]
hess = ([0.0, 0.0], [0.0, 0.0])

### F2
# works (result is also correct)
Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f2!))
@show hess;
@show dx;
@benchmark Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f2!))
@benchmark Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f2_less!))
hess = ([1.687031797366739e-69, 0.0], [0.0, 1.687031797366739e-69])
dx = [-2.2911801638083815e-69, -2.2911801638083815e-69]

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  9.307 μs …  22.642 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.618 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.809 μs ± 980.584 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▃▆█▆▃▂▁                                                    ▁
  ████████▇▆▆▆▅▅▄▅▅▃▄▄▅▃▃▃▄▅▄▃▅▄▅▅▄▅▄▅▅▅▄▅▅▆▄▄▃▃▅▄▄▃▄▅▅▆▅▆▆▆▄ █
  9.31 μs      Histogram: log(frequency) by time      14.7 μs <

 Memory estimate: 96 bytes, allocs estimate: 4.

BenchmarkTools.Trial: 10000 samples with 101 evaluations.
 Range (min … max):  795.634 ns …  2.424 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     809.812 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   837.734 ns ± 86.157 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▇▃▂▃▄▃▂▁▁▂▁▁▁▁▂▁▁▁▁▁▁                                      ▁
  ████████████████████████▇▆▆▅▅▅▅▅▅▅▅▄▅▅▃▄▄▃▄▂▂▅▅▃▃▃▄▄▃▃▄▃▄▃▂▅ █
  796 ns        Histogram: log(frequency) by time      1.21 μs <

 Memory estimate: 96 bytes, allocs estimate: 4.

ground truth from Casadi:


x = ca.SX.sym("x",2,1)

def f(x):
    y = ca.cos(x)
    for i in range(20):
        y = ca.cos(y)
    return y[0]**2 + y[1]**2
   

f = ca.Function("f",[x],ca.cse(ca.hessian(f(x),x)),["x"],["y","y2"],{"jit":True,"jit_options":{"flags":["-O3", "-march=native"]}})
f(1)
#hess = [[0.000270065, 00], 
#  [00, 0.000270065]]

#grad = [-0.000366693, -0.000366693]

Problem 2:
F2a (different than the last F2 because I messed up the naming along the way, result is correct)

function f2!(x, y) 
    z1,z2 = cos_vec((x[1],x[2]))
    # z1,z2 = cos_vec(SVector{2,Float64}(x))
    for i = 1:20
        z1,z2 = cos_vec((z1,z2))    
    end
    # y[1] = z1*z1 + z2*z2
    y[1] = z1*z1 + z2*z2
    return nothing
end
dx = [0.0, 0.0]
dy = [1.0]
vx = ([1.0, 0.0], [0.0, 1.0])
hess = ([0.0, 0.0], [0.0, 0.0])


Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f2!));
@show hess;
@show dx;
Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f2!));
@show hess;
@show dx;

@benchmark Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f2!))
hess = ([0.000270064862399231, 0.0], [0.0, 0.000270064862399231])
dx = [-0.00036669251092078, -0.00036669251092078]
hess = ([0.000270064862399231, 0.0], [0.0, 0.000270064862399231])
dx = [-0.00036669251092078, -0.00036669251092078]
BenchmarkTools.Trial: 10000 samples with 103 evaluations.
 Range (min … max):  781.369 ns …  2.937 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     802.864 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   831.833 ns ± 97.917 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▄█▆▂▃▄▅▄▄▃▂▂▂▁                                              ▁
  ▇█████████████████▇▇▇█▇▆▆▆▆▅▆▆▅▅▅▅▅▆▆▆▅▆▆▆▅▅▅▅▄▄▄▂▄▄▆▅▅▅▃▅▅▅ █
  781 ns        Histogram: log(frequency) by time      1.19 μs <

 Memory estimate: 96 bytes, allocs estimate: 4.

F3 (result is wrong)


function f3!(x, y) 
    z = cos.(x)
    for i = 1:20
        z = cos.(z)
    end
    y[1] = z[1]*z[1] + z[2]*z[2]
    return nothing
end

dx = [0.0, 0.0]
dy = [1.0]
vx = ([1.0, 0.0], [0.0, 1.0])
hess = ([0.0, 0.0], [0.0, 0.0])

Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f3!));
sleep(1)
@show hess;
@show dx;
@benchmark Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f3!))
hess = ([-0.0002354505535806518, 0.0], [0.0, -0.0002354505535806518])
dx = [-0.00036669251092078, -0.00036669251092078]

BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.249 μs …  2.134 ms  ┊ GC (min … max):  0.00% … 99.25%
 Time  (median):     5.194 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   5.926 μs ± 34.013 μs  ┊ GC (mean ± σ):  11.10% ±  1.98%

    ██                ▁                                       
  ▂▇██▆▄▃▃▂▂▂▂▃▄▄▄▃▄▅▇█▇▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂ ▃
  3.25 μs        Histogram: frequency by time        12.1 μs <

 Memory estimate: 9.94 KiB, allocs estimate: 256.

F2 after F3
interestingly the result of F2 also goes wrong (coinciding with F3) if I run it directly after F3 without zeroing dx, dy and hess:

dx = [0.0, 0.0]
dy = [1.0]
vx = ([1.0, 0.0], [0.0, 1.0])
hess = ([0.0, 0.0], [0.0, 0.0])

Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f3!));
println("f2 after f3")
Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f2!));
@show hess;
@show dx;


dx = [0.0, 0.0]
dy = [1.0]
hess = ([0.0, 0.0], [0.0, 0.0])

println("reseting dx/hess/dy")
Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f2!));
@show hess;
@show dx;
f2 after f3
hess = ([-0.0002354505535806518, 0.0], [0.0, -0.0002354505535806518])
dx = [-0.00036669251092078, -0.00036669251092078]
reseting dx/hess/dy
hess = ([0.000270064862399231, 0.0], [0.0, 0.000270064862399231])
dx = [-0.00036669251092078, -0.00036669251092078]

Just some other tries:
F4 (works fine):

function f4!(x, y) 
    z = cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(cos.(x)))))))))))))))))))))
    y[1] = z[1]*z[1] + z[2]*z[2]
    return nothing
end
dx = [0.0, 0.0]
dy = [1.0]
vx = ([1.0, 0.0], [0.0, 1.0])
hess = ([0.0, 0.0], [0.0, 0.0])


Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f4!));
@show hess;
@show dx;
hess = ([0.00027006486239923106, 0.0], [0.0, 0.00027006486239923106])
dx = [-0.00036669251092077997, -0.00036669251092077997]
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.184 μs …   6.682 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.214 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.308 μs ± 238.067 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▆▅▄▂                 ▁▁▁▁▁                                ▁
  ███████▇▇▆▆▇▇▇▇▇▇▇▇███████████████▇█▇▆▆▆▆▇▇▇█▇▇▇▇█▇▇▇▆▇▆▅▅▃ █
  1.18 μs      Histogram: log(frequency) by time      2.07 μs <

 Memory estimate: 416 bytes, allocs estimate: 11.

F5 (note I changed the output to be only z[1]^2, works):

function f5!(x, y) 
    z = cos.(x)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    y[1] = z[1]*z[1]# + z[2]*z[2]
    return nothing
end
dx = [0.0, 0.0]
dy = [1.0]
vx = ([1.0, 0.0], [0.0, 1.0])
hess = ([0.0, 0.0], [0.0, 0.0])
Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f5!));
@show hess;
@show dx;
hess = ([0.000270064862399231, 0.0], [0.0, 0.0])
dx = [-0.00036669251092078, 0.0]
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.996 μs …  2.585 ms  ┊ GC (min … max):  0.00% … 99.46%
 Time  (median):     4.877 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   6.813 μs ± 38.121 μs  ┊ GC (mean ± σ):  10.71% ±  1.98%

   ▆    █▃                                                    
  ▄█▄▃▄███▅▂▂▂▂▁▁▁▁▁▁▂▂▁▂▂▂▂▃▃▃▃▃▃▃▂▂▃▄▅▅▅▅▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  4 μs           Histogram: frequency by time        10.3 μs <

 Memory estimate: 9.75 KiB, allocs estimate: 250.

F5_2 (I changed back the output to z[1]^2 + z[2]^2, compilation fails):

function f52!(x, y) 
    z = cos.(x)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    z = cos.(z)
    y[1] = z[1]*z[1] + z[2]*z[2]
    return nothing
end
dx = [0.0, 0.0]
dy = [1.0]
vx = ([1.0, 0.0], [0.0, 1.0])
hess = ([0.0, 0.0], [0.0, 0.0])
Enzyme.autodiff(Enzyme.Forward, grad!, Enzyme.BatchDuplicated(x, vx), Enzyme.BatchDuplicated(dx, hess), Const(y), Const(dy),Const(f52!));
@show hess;
@show dx;
Enzyme compilation failed.
Current scope: 
define void @julia_f52__102846({} addrspace(10)* nocapture noundef nonnull readonly align 8 dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@double, [-1,8]:Pointer, [-1,8,0]:Integer, [-1,8,1]:Integer, [-1,8,2]:Integer, [-1,8,3]:Integer, [-1,8,4]:Integer, [-1,8,5]:Integer, [-1,8,6]:Integer, [-1,8,7]:Integer, [-1,8,8]:Pointer, [-1,8,8,-1]:Float@double, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer}" "enzymejl_parmtype"="127076369513104" "enzymejl_parmtype_ref"="2" %0, {} addrspace(10)* noundef nonnull align 8 dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@double, [-1,8]:Pointer, [-1,8,0]:Integer, [-1,8,1]:Integer, [-1,8,2]:Integer, [-1,8,3]:Integer, [-1,8,4]:Integer, [-1,8,5]:Integer, [-1,8,6]:Integer, [-1,8,7]:Integer, [-1,8,8]:Pointer, [-1,8,8,-1]:Float@double, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer}" "enzymejl_parmtype"="127076369513104" "enzymejl_parmtype_ref"="2" %1) local_unnamed_addr #0 !dbg !10 {
top:
@ernestds ernestds changed the title Erros and wrong results when calculating hessian Errors and wrong results when calculating hessian Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant