I1083 l bfgs optimiser #1149
base: main
Conversation
Codecov Report
@@             Coverage Diff             @@
##            master     #1149      +/-  ##
===========================================
- Coverage   100.00%    96.86%   -3.14%
===========================================
  Files           71        84      +13
  Lines         7543      9050    +1507
===========================================
+ Hits          7543      8766    +1223
- Misses           0       284     +284
Continue to review full report at Codecov.
I've made a rookie mistake and used the Python 3 feature of annotating the type of a function argument; Python 2 doesn't support it, so I'll fix that before I next commit. And the docs are failing as I haven't included the added files and functions in the rst files yet (not sure which implementation we will eventually keep).
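For anyone hitting the same thing, the Python-3-only syntax in question is argument annotations, which are a SyntaxError on Python 2.7. The method shown below is just an illustration, not necessarily the offending line:

```python
# Python 3 only: annotating an argument's type is a SyntaxError on Python 2.7
def set_hyper_parameters(self, x: list):
    ...


# Python 2/3 compatible: drop the annotation and document the type instead
def set_hyper_parameters(self, x):
    """Sets hyper-parameters; ``x`` should be a sequence of values."""
    ...
```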
Thanks @alisterde ! Looks like a huge amount of work has already gone into this. One thing that struck me when you described the hard-to-find bug: could this be an issue of scale or tuning? Many methods don't seem to work at all unless you tune them right, e.g. by picking the correct step sizes. If you were testing on the logistic problem you may also have noticed that it has massive differences in the parameter magnitudes, which can throw some methods off; so it's best to start on e.g. an almost symmetrical parabola in 2-d parameter space.
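To make that suggestion concrete, a minimal sketch of an almost symmetrical 2-d parabola written as a pints error measure could look like the following (the class name and usage are mine, assuming the usual `__call__`/`n_parameters` interface):

```python
import pints


class AlmostSymmetricParabola(pints.ErrorMeasure):
    """Almost symmetrical parabola in 2-d parameter space, minimum at (0, 0)."""

    def n_parameters(self):
        return 2

    def __call__(self, x):
        # Comparable curvature (and parameter magnitude) in both directions,
        # unlike the toy logistic problem
        return x[0] ** 2 + 1.1 * x[1] ** 2


# Hypothetical usage with an existing pints optimiser:
# opt = pints.OptimisationController(AlmostSymmetricParabola(), [1, 1], method=pints.XNES)
# x_best, f_best = opt.run()
```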
Wondering if we should split the line search parts off into a separate PR?
@@ -279,6 +279,29 @@ def set_hyper_parameters(self, x):
        self.set_population_size(x[0])


class LineSearchBasedOptimiser(Optimiser):
I'm not 100% sure this is the best way to do it yet, but let's leave all that till after we have a working prototype!
This looks like it could be useful: http://www.caam.rice.edu/~yzhang/caam554/pdf/cgsurvey.pdf
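For orientation, the rough shape such a base class would take in pints' ask/tell style is sketched below; the method bodies are placeholders, not the code from this PR:

```python
import pints


class LineSearchBasedOptimiser(pints.Optimiser):
    """Base for optimisers that pair a search direction (e.g. a quasi-Newton
    step) with a 1-d line search along that direction."""

    def ask(self):
        # Propose point(s) to evaluate: either a trial step along the
        # current search direction, or the newly accepted position.
        raise NotImplementedError

    def tell(self, fx):
        # Receive function (and gradient) information for the proposed
        # points, advance the line search, and, once a step is accepted,
        # update the inverse Hessian approximation.
        raise NotImplementedError
```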
Thanks @MichaelClerx and @ben18785,
Hi @alisterde! I'd like to pick this up again at some point, as I think it's a really good basis for further optimisation work :-) Is this PR up to date with the latest stuff you did, or are there still some uncommitted things you've worked on?
Hi Michael,
I think there might be some uncommitted changes, I'll check and get back to you.
I am/was intending to pick this up again as well!
Best,
Alister
…m/pints into i1083_l-bfgs_optimiser
Hi @MichaelClerx, I've taken another look at this and done the following:
The line search algorithm needs improvement to better fit into the ask/tell framework, and I have not added tests yet. Tests are all passing in my pre-commit checks but appear to fail on committing due to an import problem with the 'curve_fit' functions in Python 2.7; I haven't been able to work out how to fix this. Let me know how you want to proceed.
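For context, the ask/tell framework drives an optimiser roughly as in the sketch below, which is what makes a sequential line search awkward to slot in: it has to spread its dependent trial evaluations over several ask/tell rounds. The error measure and the choice of XNES are just stand-ins here:

```python
import numpy as np
import pints

f = lambda x: x[0] ** 2 + 1.1 * x[1] ** 2     # hypothetical error measure
x0 = np.array([1.0, 1.0])

optimiser = pints.XNES(x0)                    # stand-in for the new optimiser
for iteration in range(100):
    xs = optimiser.ask()                      # points to evaluate this round
    fs = [f(x) for x in xs]                   # caller evaluates them (maybe in parallel)
    optimiser.tell(fs)                        # optimiser updates its internal state
```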
Hi @ben18785 and @MichaelClerx,
I’ve spent the last 2 and a bit weeks looking at the BFGS and L-BFGS/LM-BFGS algorithms for issue #1083. The L-BFGS part is implemented; if you use a very large value of 'm' (the number of correction vectors stored for the inverse Hessian) it's the same as BFGS. I would appreciate it if you could check whether I've got the Hessian update correct. There are two implementations: one using the scipy line search algorithms (BFGS_scipy) and another using the Hager-Zhang algorithm.
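For reference when checking the update, the textbook BFGS inverse-Hessian update and L-BFGS two-loop recursion referred to above are, in a standalone numpy sketch (not the code in this PR):

```python
import numpy as np


def bfgs_inverse_hessian_update(H, s, y):
    """H_new = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, with
    rho = 1 / (y^T s), s = x_new - x_old and y = grad_new - grad_old."""
    rho = 1.0 / np.dot(y, s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)


def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: approximates -H @ grad from the stored (s, y)
    pairs, ordered oldest to newest (at most m of them)."""
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_hist, y_hist)]
    q = grad.copy()
    alphas = []
    for s, y, rho in zip(reversed(s_hist), reversed(y_hist), reversed(rhos)):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q = q - a * y
    # Common choice of initial inverse Hessian: gamma * identity
    gamma = np.dot(s_hist[-1], y_hist[-1]) / np.dot(y_hist[-1], y_hist[-1])
    r = gamma * q
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        b = rho * np.dot(y, r)
        r = r + (a - b) * s
    return -r  # quasi-Newton search direction
```

With the same initial inverse Hessian and a very large 'm', the recursion should reproduce the full BFGS step, which can serve as a consistency check between the two.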
The optimiser isn't working yet, and I'm fairly sure the problem is with the line search. I'm having a lot of trouble with the line search component: the scipy line search fails to find an acceptable step size from the very first step on the toy logistic model, and the Hager-Zhang search doesn't find an acceptable step size either. Though as both of them have this issue, it may be the initialisation of the first inverse Hessian and Newton direction.
I've tried writing up the Hager-Zhang algorithm slightly adjusted for ask/tell, and I can't quite see where I'm going wrong. I've checked it against an implementation in TensorFlow (https://github.com/tensorflow/probability/blob/v0.10.0/tensorflow_probability/python/optimizer/linesearch/hager_zhang.py) and it seems to be performing the same operations, at the same time, in the same order as mine. Stan and scipy don't use this algorithm; they use an older method called the Moré-Thuente line search. The Hager-Zhang algorithm has been shown to perform better.
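One cheap sanity check on any of these line searches is to test the strong Wolfe conditions directly on the step they return; a small sketch (c1 and c2 are the usual textbook defaults, not values from this PR):

```python
import numpy as np


def satisfies_strong_wolfe(f0, g0, f_a, g_a, alpha, direction, c1=1e-4, c2=0.9):
    """f0, g0: value and gradient at the current point;
    f_a, g_a: value and gradient at x + alpha * direction."""
    d0 = np.dot(g0, direction)   # directional derivative at alpha = 0 (should be < 0)
    da = np.dot(g_a, direction)  # directional derivative at the trial point
    sufficient_decrease = f_a <= f0 + c1 * alpha * d0   # Armijo condition
    curvature = abs(da) <= c2 * abs(d0)                 # strong curvature condition
    return sufficient_decrease and curvature
```

In particular, if the very first Newton direction is not a descent direction (d0 >= 0), no step size can satisfy these conditions, which would point at the initialisation of the inverse Hessian rather than at the line search itself.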
I've also tried using the line search algorithms in scipy, but I'm having issues. Scipy doesn't use them for its own L-BFGS; it wraps FORTRAN code. In fact very few people actually seem to have implemented the BFGS algorithms themselves; most wrap FORTRAN libraries.
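For what it's worth, the pure-Python scipy routine relevant here is scipy.optimize.line_search (a Wolfe line search); a minimal usage sketch on a toy quadratic:

```python
import numpy as np
from scipy.optimize import line_search

f = lambda x: x[0] ** 2 + 1.1 * x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], 2.2 * x[1]])

xk = np.array([1.0, 1.0])
pk = -grad(xk)   # descent direction (stand-in for the quasi-Newton direction)

alpha, fc, gc, new_f, old_f, new_slope = line_search(f, grad, xk, pk)
# alpha is None when no step satisfying the Wolfe conditions was found,
# which is the failure mode described above.
```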
Anyway, I'd appreciate it if you could check the BFGS part of what I've written (it's in the tell() method of either implementation and updates the Hessian; BFGS_scipy is the clearer of the two). Also please let me know if you have any ideas about the line search.
When this is done I was thinking the majority of the code would go in the abstract class LineSearchBasedOptimiser, as the only difference between BFGS, DFP, and conjugate gradient optimisation is the approximation of the inverse Hessian (a rough sketch of one possible split follows below). I'm not going to be working on this for the next few weeks as I need to write up my project report (I have been working too much on pints and have neglected it).
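As a purely hypothetical illustration of that split, the base class could own the line search and the ask/tell plumbing, while each subclass supplies only its inverse-Hessian (or direction) update:

```python
import numpy as np
import pints


class LineSearchBasedOptimiser(pints.Optimiser):
    """Owns the line search and the ask/tell plumbing shared by all variants."""

    def _inverse_hessian_update(self, H, s, y):
        # Hook overridden by the BFGS, DFP, conjugate-gradient, ... subclasses
        raise NotImplementedError


class DFP(LineSearchBasedOptimiser):
    def _inverse_hessian_update(self, H, s, y):
        # DFP: H + s s^T / (y^T s) - (H y)(H y)^T / (y^T H y)
        Hy = H @ y
        return (H + np.outer(s, s) / np.dot(y, s)
                - np.outer(Hy, Hy) / np.dot(y, Hy))
```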
Best,
Alister