
I1083 l bfgs optimiser #1149

Draft · wants to merge 17 commits into main
Conversation

@alisterde (Member) commented May 24, 2020

Hi @ben18785 and @MichaelClerx ,

I’ve spent the last two and a bit weeks looking at the BFGS and L-BFGS/LM-BFGS algorithms for issue #1083. The L-BFGS part is implemented: if you use a very large value of 'm' (the number of correction vectors stored for the inverse Hessian), it is equivalent to full BFGS. I would appreciate it if you could check whether I've got the Hessian update correct. There are two implementations, one using scipy's line search algorithms (BFGS_scipy) and another using the Hager-Zhang algorithm.
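For reviewers' reference, here is a minimal textbook sketch of the L-BFGS two-loop recursion (Nocedal & Wright style) that the update should be equivalent to. This is not the PR's code; the function name and the H_0 scaling are just the standard conventions:

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    # Standard two-loop recursion: returns -H_k @ grad using the m most
    # recent pairs s_i = x_{i+1} - x_i and y_i = grad_{i+1} - grad_i.
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        q = q - alpha * y
        alphas.append(alpha)
    # Common initial scaling H_0 = gamma * I, with gamma = s'y / y'y
    gamma = 1.0
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    r = gamma * q
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * np.dot(y, r)
        r = r + s * (alpha - beta)
    return -r
```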

The optimiser isn't working yet, and I'm fairly sure the problem is with the line search component: the scipy line search fails to find an acceptable step size from the very first step on the toy logistic model, and the Hager-Zhang search doesn't find an acceptable step either. Since both fail in the same way, though, it may instead be the initialisation of the first inverse Hessian and Newton direction.
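For comparison, a common convention for the very first iteration (before any curvature pairs exist) is H_0 = I, so the first "Newton" direction is plain steepest descent, paired with a conservative first step-size guess. This is a sketch of that convention, not necessarily what either implementation here does:

```python
import numpy as np

def first_iteration(gradient):
    # With H_0 = I the Newton direction reduces to steepest descent;
    # the step-size guess below is a common heuristic, not a fixed rule.
    direction = -gradient
    alpha0 = 1.0 / max(1.0, np.linalg.norm(gradient))
    return direction, alpha0
```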

I’ve tried writing up the Hager-Zhang algorithm, slightly adjusted for ask/tell, but I can’t quite see where I’m going wrong. I’ve checked it against an implementation in TensorFlow Probability (https://github.com/tensorflow/probability/blob/v0.10.0/tensorflow_probability/python/optimizer/linesearch/hager_zhang.py) and it seems to perform the same steps, in the same order, as mine. Stan and scipy don’t use this algorithm; they use an older method called the Moré-Thuente line search. The Hager-Zhang algorithm has been shown to perform better.
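Both line searches are ultimately trying to return a step length satisfying (a variant of) the Wolfe conditions; Hager-Zhang additionally uses an "approximate Wolfe" test. A minimal check of the strong Wolfe conditions, for orientation only (c1 and c2 are the usual default constants):

```python
def strong_wolfe(f0, df0, fa, dfa, a, c1=1e-4, c2=0.9):
    # For phi(a) = f(x + a * p): f0 = phi(0), df0 = phi'(0) < 0,
    # fa = phi(a), dfa = phi'(a).
    sufficient_decrease = fa <= f0 + c1 * a * df0
    curvature = abs(dfa) <= c2 * abs(df0)
    return sufficient_decrease and curvature
```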

I’ve also tried using the line search algorithms in scipy, but I'm having issues there too. Scipy doesn’t use them for its own L-BFGS; it wraps FORTRAN code instead. In fact, very few people seem to have implemented the BFGS algorithms directly; most wrap FORTRAN libraries.
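For anyone trying to reproduce the scipy failure, this is roughly the call in question; scipy.optimize.line_search returns None for the step size when it cannot find a point satisfying the Wolfe conditions. The quadratic objective here is just a stand-in:

```python
import numpy as np
from scipy.optimize import line_search

def f(x):
    return float(np.dot(x, x))

def grad(x):
    return 2.0 * x

x0 = np.array([1.0, 1.0])
p = -grad(x0)  # search direction, here steepest descent
alpha = line_search(f, grad, x0, p)[0]
print(alpha)  # None would mean no acceptable step was found
```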

Anyway, I’d appreciate it if you could check the BFGS part of what I’ve written (it’s in the tell() function of either implementation, though BFGS_scipy is clearer, and it updates the Hessian). Also please let me know if you have any ideas about the line search.

When this is done I was thinking the majority of the code would go into the abstract class LineSearchBasedOptimiser, as the only difference between BFGS, DFP, and conjugate gradient optimisation is the approximation of the inverse Hessian.
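Roughly the structure I have in mind (a sketch only; names other than LineSearchBasedOptimiser are hypothetical):

```python
import numpy as np

class LineSearchBasedOptimiser:
    # Would own the line search and the ask/tell bookkeeping;
    # subclasses differ only in how they build a search direction
    # from their (inverse) Hessian approximation.
    def search_direction(self, gradient):
        raise NotImplementedError

class BFGSLike(LineSearchBasedOptimiser):
    def __init__(self, n_parameters):
        self._H = np.eye(n_parameters)  # inverse Hessian approximation

    def search_direction(self, gradient):
        return -self._H.dot(gradient)
```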

I’m not going to be working on this for the next few weeks as I need to write up my project report (I have been working too much on PINTS and have neglected it).

Best,
Alister

@codecov (bot) commented May 24, 2020

Codecov Report

Merging #1149 (4369b39) into master (a06ab88) will decrease coverage by 3.13%.
The diff coverage is 11.28%.


@@             Coverage Diff             @@
##            master    #1149      +/-   ##
===========================================
- Coverage   100.00%   96.86%   -3.14%     
===========================================
  Files           71       84      +13     
  Lines         7543     9050    +1507     
===========================================
+ Hits          7543     8766    +1223     
- Misses           0      284     +284     
Impacted Files Coverage Δ
pints/_optimisers/__init__.py 60.12% <8.79%> (-39.88%) ⬇️
pints/_optimisers/_lbfgs.py 24.44% <24.44%> (ø)
pints/__init__.py 100.00% <100.00%> (ø)
pints/_util.py 100.00% <0.00%> (ø)
pints/noise.py 100.00% <0.00%> (ø)
pints/_logger.py 100.00% <0.00%> (ø)
pints/_log_pdfs.py 100.00% <0.00%> (ø)
pints/_log_priors.py 100.00% <0.00%> (ø)
pints/_diagnostics.py 100.00% <0.00%> (ø)
... and 31 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@alisterde (Member, Author):

I've made a rookie mistake and used Python 3's type annotations in function signatures, which Python 2 doesn't support; I'll fix this before I next commit. And the docs build is failing because I haven't added the new files and functions to the rst files yet (I'm not sure which implementation we will eventually keep).
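Concretely, the problem is the Python-3-only annotation syntax (the method name here is just an example):

```python
# Python 3 only: a SyntaxError under Python 2
def set_max_correction_vectors(self, m: int) -> None:
    self._m = m

# Python 2 compatible
def set_max_correction_vectors(self, m):
    self._m = m
```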

@MichaelClerx (Member):

Thanks @alisterde ! Looks like a huge amount of work has already gone into this.
I'll try to carve out some time to look through this in detail, within the next few weeks. So it's quite ideal that you'll be busy with something else, from my point of view 😅

One thing that struck me when you described the hard-to-find bug: could this be an issue of scale or tuning? Many methods don't seem to work at all unless you tune them right, e.g. by picking the correct step sizes. If you were testing on the logistic problem you may also have noticed that it's got massive differences in the parameter magnitudes, which can throw some methods off; so it's best to start on e.g. an almost symmetrical parabola in 2d parameter space.
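For example, a well-scaled 2d test problem can be written directly as a pints.ErrorMeasure; something like this sketch (the class name is made up):

```python
import pints

class Parabola2D(pints.ErrorMeasure):
    # Almost symmetrical bowl with its minimum at the origin
    def n_parameters(self):
        return 2

    def __call__(self, x):
        return float(x[0] ** 2 + 1.1 * x[1] ** 2)
```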

@MichaelClerx (Member):

Wondering if we should split the line search parts off into a separate PR?

@@ -279,6 +279,29 @@ def set_hyper_parameters(self, x):
self.set_population_size(x[0])


class LineSearchBasedOptimiser(Optimiser):

I'm not 100% sure this is the best way to do it yet, but let's leave all that till after we have a working prototype!

@MichaelClerx (Member):

This looks like it could be useful: http://www.caam.rice.edu/~yzhang/caam554/pdf/cgsurvey.pdf

@alisterde (Member, Author):

Thanks @MichaelClerx and @ben18785,
It could definitely be an issue of scale or tuning; I hadn't thought of that. I'll start by trying this!

@MichaelClerx (Member):

Hi @alisterde ! I'd like to pick this up again at some point, as I think it's a really good basis for further optimisation work :-)
Is this PR up to date with the latest stuff you did, or are there still some uncommitted things you've worked on?

@alisterde (Member, Author) commented Nov 3, 2020 via email

@alisterde (Member, Author):

Hi @MichaelClerx,

I've taken another look at this and done the following:

  • Split the line search and L-BFGS optimiser apart, so that the line search can be used with other gradient-based optimisers.
  • Addressed the test failures caused by using some Python 3-only features.
  • Added a basic example notebook.
  • Added a stopping criterion so that any line-search-based algorithm terminates when the gradient at the best-fit point is effectively zero (<1e-6); see the sketch below.
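The stopping check itself is essentially this (a sketch; the tolerance matches the 1e-6 mentioned above):

```python
import numpy as np

def gradient_converged(gradient, tol=1e-6):
    # Stop when every component of the gradient at the current
    # best point is effectively zero.
    return np.linalg.norm(gradient, ord=np.inf) < tol
```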

The line search algorithm needs improvement to better fit into the ask/tell framework.
I've left in a print statement that tells the user when the Hessian has been updated, because otherwise it looks very much like the algorithm is stuck when it isn't; I'd like to know your thoughts on this.
There's a large commented-out function at the bottom of the line search class called __secant2. This function was moved to sit within ask() and be passed to tell() at key points, but I've left it in as I think the algorithm is easier to understand written as this one function; I don't mind getting rid of it, though.

I have not added tests yet.
The implementation seems quite slow.
It will probably benefit from the transformation features in the most recent PINTS.

Tests all pass in my pre-commit checks but appear to fail on committing, due to an import problem with the 'curve_fit' functions in Python 2.7; I haven't been able to work out how to fix this.

Let me know how you want to proceed,
Alister
