DimensionMismatch error - Symbolic Regression failing for sparse hyperparameters #514

Open
brian-hartley opened this issue Jul 9, 2024 · 0 comments · May be fixed by #521
Labels
bug Something isn't working

brian-hartley commented Jul 9, 2024

Hello,

I posted this on the forums and was told to file it here as well (CR suggested it could be because convex optimization still needs to be implemented, but I'm not sure):

Question about DataDrivenDiffEq / UDE / Symbolic Regression failing for very sparse settings

Hello,

I’m playing around with symbolic regression and UDEs for a larger problem and have run into some difficulties. For several of the sparse regression methods, the symbolic regression simply fails, either with a method error involving zeros or with array dimension errors (both rather inscrutable to me). This seems to happen when the sparsity-related parameters are set too “high”, but that defeats the purpose of the exercise, since the only solutions I can successfully obtain are very large expressions.

Below is a simple example where it fails with random matrices. I could also post my actual MWE / application, which trains a UDE on an ODE that is scalable by dimension, in order to check how well interaction terms are recovered for various system sizes. I naively assumed that you could sparsify the regression as much as you wanted, but unless I am making a silly mistake, this doesn’t seem to be the case.
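To illustrate why "as sparse as you want" can break down, here is a minimal sketch using plain hard thresholding. It is a stand-in for intuition only, not the SR3 algorithm from DataDrivenSparse; `hard_threshold` and the coefficient values are hypothetical.

```julia
# Hypothetical helper: zero out every coefficient whose magnitude is below λ.
hard_threshold(ξ, λ) = [abs(x) < λ ? zero(x) : x for x in ξ]

# Made-up coefficients of one candidate equation.
ξ = [0.8, -0.05, 0.02, 0.3]

n_moderate   = count(!iszero, hard_threshold(ξ, 0.1))  # two terms survive
n_aggressive = count(!iszero, hard_threshold(ξ, 1.0))  # no terms survive

println((n_moderate, n_aggressive))
```

Once the threshold exceeds every coefficient, nothing is left in the equation, so there may be an upper limit on the sparsity parameter beyond which no non-trivial model exists.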

Any guidance or resources to check on hyperparameter tuning would be helpful here.

Thanks!

using DataDrivenDiffEq, DataDrivenSparse, ModelingToolkit
using StableRNGs  # provides StableRNG
using StatsBase   # provides ZScoreTransform

n = 10 
T = 15
@variables u[1:n]
b = polynomial_basis(u, 2)
basis = Basis(b, u);

# random X
X̂ = rand(n, T)

# simple relation to Y (the left operand was lost in the post; X̂ is assumed here)
Ŷ = X̂ .+ 0.15 .* rand(n, T)


nn_problem = DirectDataDrivenProblem(X̂, Ŷ)

#λ_sparse = exp10.(-1:1:3)
#opt = STLSQ(λ_sparse)
opt = SR3(1e-1, 100.0) # <- doesn't work 
#opt = SR3(1e-3,100.0)  # <- works 

options = DataDrivenCommonOptions(maxiters = 10_000,
                                  normalize = DataNormalization(ZScoreTransform),
                                  selector = bic, digits = 2,
                                  data_processing = DataProcessing(split = 0.9,
                                                                   batchsize = 10,
                                                                   shuffle = true,
                                                                   rng = StableRNG(1111)))

nn_res = solve(nn_problem, basis, opt, options = options)
nn_eqs = get_basis(nn_res)
equations(nn_eqs)
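As a possible stopgap while the error is unresolved, the threshold can be swept and the failing values recorded instead of aborting the run. This is only a sketch: `sweep_thresholds` and `fit` are hypothetical names, and `fit` is any user-supplied function that maps a threshold to a result.

```julia
# Hypothetical helper: call `fit(λ)` for each threshold and keep either the
# result or the DimensionMismatch it raised. Other errors are rethrown.
function sweep_thresholds(fit, λs)
    outcomes = Dict{Float64, Any}()
    for λ in λs
        try
            outcomes[λ] = fit(λ)
        catch err
            err isa DimensionMismatch || rethrow()
            outcomes[λ] = err
        end
    end
    return outcomes
end

# Thresholds that produced a usable result, smallest first.
successful(outcomes) = sort([λ for (λ, r) in outcomes if !(r isa DimensionMismatch)])

# With the MWE above, `fit` could be (not run here):
#   λ -> solve(nn_problem, basis, SR3(λ, 100.0), options = options)
```

Catching only `DimensionMismatch` keeps unrelated failures visible rather than silently hiding them.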

Error & Stacktrace ⚠️

ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 10 and 0
Stacktrace:
  [1] _bcs1
    @ .\broadcast.jl:555 [inlined]
  [2] _bcs
    @ .\broadcast.jl:549 [inlined]
  [3] broadcast_shape
    @ .\broadcast.jl:543 [inlined]
  [4] combine_axes
    @ .\broadcast.jl:524 [inlined]
  [5] instantiate
    @ .\broadcast.jl:306 [inlined]
  [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{…}, Nothing, typeof(-), Tuple{…}})
    @ Base.Broadcast .\broadcast.jl:903
  [7] DataDrivenSolution(b::Basis{…}, p::DataDrivenProblem{…}, alg::SR3{…}, result::Vector{…}, internal_problem::DataDrivenDiffEq.InternalDataDrivenProblem{…}, retcode::DDReturnCode)
    @ DataDrivenDiffEq C:\Users\brian\.julia\packages\DataDrivenDiffEq\ZMUkZ\src\solution.jl:37
  [8] solve!(ps::DataDrivenDiffEq.InternalDataDrivenProblem{…})
    @ DataDrivenSparse C:\Users\brian\.julia\packages\DataDrivenSparse\0c4Fb\src\commonsolve.jl:21
  [9] solve(::DataDrivenProblem{…}, ::Vararg{…}; kwargs::@Kwargs{})
    @ CommonSolve C:\Users\brian\.julia\packages\CommonSolve\JfpfI\src\CommonSolve.jl:23
 [10] top-level scope
    @ c:\Users\brian\OneDrive\Documents\Julia\UDE\test_folder\DataDrivenDiffEq_issue.jl:33
Some type information was truncated. Use `show(err)` to see complete types.
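The same kind of error can be reproduced in plain Julia by subtracting an array with zero rows from one with ten rows. The sketch below assumes, based on the broadcast of `-` in frame [6], that the failure comes from a subtraction involving an empty result; the variable names are hypothetical and are not taken from solution.jl.

```julia
# Hypothetical stand-ins: 10 target rows versus a prediction with 0 rows.
targets    = rand(10, 15)
prediction = zeros(0, 15)

err = try
    targets .- prediction   # row counts 10 and 0 cannot be broadcast together
    nothing
catch e
    e
end

println(typeof(err))
```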

Environment:

Status `C:\Users\brian\OneDrive\Documents\Julia\UDE\Project.toml`
  [2445eb08] DataDrivenDiffEq v1.4.1
  [5b588203] DataDrivenSparse v0.1.2
⌃ [961ee093] ModelingToolkit v9.15.0
Info Packages marked with ⌃ have new versions available and may be upgradable.
  • Output of versioninfo()
Julia Version 1.10.4
Commit 48d4fd4843 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 4 default, 0 interactive, 2 GC (on 16 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4
@brian-hartley brian-hartley added the bug Something isn't working label Jul 9, 2024
girochat added a commit to girochat/DataDrivenDiffEq.jl that referenced this issue Sep 12, 2024
Proposed solution for bug in solution.jl (Fixes SciML#514). The solution handles the case where the basis of the problem is empty which would throw the DimensionMismatch error.
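A guard of the kind described could look roughly like the sketch below. It is not the code from the referenced commit: `safe_residual` is a hypothetical function, and returning `nothing` for the empty case is an assumption made for illustration.

```julia
# Hypothetical guard: skip the subtraction when the prediction has no rows,
# which is what an empty basis would produce.
function safe_residual(targets::AbstractMatrix, prediction::AbstractMatrix)
    size(prediction, 1) == 0 && return nothing
    return targets .- prediction
end

r_empty = safe_residual(rand(10, 15), zeros(0, 15))  # nothing, no error thrown
r_full  = safe_residual(ones(2, 3), zeros(2, 3))     # ordinary 2×3 residual
```

A caller would then need to treat `nothing` as "no model found" rather than as a valid fit.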