Process for inclusion of newer algorithms #156

ghost · 2021-08-29T12:51:44Z

ghost
Aug 29, 2021

Where I work, our data scientist uses Python for data exploration, in particular the scikit-learn library. For one specific task we started with DBSCAN but found that it used too much memory, so we switched to scikit-learn's OPTICS-derived algorithm.
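For readers who have not made this kind of switch, here is a minimal sketch of what it can look like in scikit-learn. The synthetic data, `eps=0.5`, and `min_samples=5` are assumptions chosen purely for illustration; they are not the dataset or settings described in this post.

```python
from sklearn.cluster import DBSCAN, OPTICS, cluster_optics_dbscan
from sklearn.datasets import make_blobs

# Synthetic stand-in data: three well-separated blobs (illustrative only).
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 0), (5, 5)],
    cluster_std=0.3,
    random_state=0,
)

# DBSCAN requires a fixed neighbourhood radius `eps` up front.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# OPTICS does not need `eps`; it computes a reachability ordering instead
# (`max_eps` defaults to infinity).
optics = OPTICS(min_samples=5).fit(X)

# DBSCAN-like labels can be extracted from the fitted OPTICS model
# for a chosen `eps` without refitting.
optics_labels = cluster_optics_dbscan(
    reachability=optics.reachability_,
    core_distances=optics.core_distances_,
    ordering=optics.ordering_,
    eps=0.5,
)


def n_clusters(labels):
    """Count clusters, ignoring the noise label -1."""
    return len(set(labels) - {-1})


print("DBSCAN clusters:", n_clusters(dbscan_labels))
print("OPTICS clusters at eps=0.5:", n_clusters(optics_labels))
```

The point of the sketch is only that both estimators share the same `fit` / labels workflow, so swapping one for the other is mostly a matter of parameters.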

We currently also use Python in production. Everything is working fine so far, but we expect our dataset to grow substantially. Having heard many times that Python isn't performant enough for production, I wanted to compare our current solution with one I would implement using FSharp.Stats. However, I discovered that the library doesn't offer OPTICS.

I would like to suggest that OPTICS be implemented in the library and, more generally, that more clustering algorithms be offered. However, when I look at Wikipedia's page on OPTICS, there are at least half a dozen variations on the algorithm (and the same can be said for DBSCAN), none of which appears to be the same as the one used in scikit-learn... I'm just a simple user of ML, and I don't really have the qualifications to decide which variation(s), if any, should be implemented in FSharp.Stats.

So I was wondering whether there should be some kind of process for evaluating the inclusion of newer algorithms in FSharp.Stats in general. We could then record their status in a document (e.g. rejected, PR accepted...), and for rejected ones we could provide an explanation. In a way, the document could serve as part of the project roadmap.
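To make the idea a bit more concrete, here is a rough sketch of what one entry in such a status document might hold, rendered as a table. Everything in it is hypothetical: the `Status` values, the `Proposal` fields, and the sample entries are placeholders for illustration, not an existing FSharp.Stats process.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status values, loosely following the examples above.
    PROPOSED = "proposed"
    PR_ACCEPTED = "PR accepted"
    REJECTED = "rejected"


@dataclass
class Proposal:
    algorithm: str
    status: Status
    reason: str = ""  # explanation, required when the proposal is rejected

    def __post_init__(self):
        # Enforce the rule that a rejection always comes with an explanation.
        if self.status is Status.REJECTED and not self.reason:
            raise ValueError(f"{self.algorithm}: a rejection needs an explanation")


def to_markdown(proposals):
    """Render the proposals as a simple table for a roadmap document."""
    lines = ["| Algorithm | Status | Reason |", "| --- | --- | --- |"]
    for p in proposals:
        lines.append(f"| {p.algorithm} | {p.status.value} | {p.reason} |")
    return "\n".join(lines)


# Placeholder entries; the second algorithm and its reason are made up.
proposals = [
    Proposal("OPTICS", Status.PROPOSED),
    Proposal("SomeAlgorithm", Status.REJECTED, "placeholder reason"),
]
table = to_markdown(proposals)
print(table)
```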

Another approach, since Python is the de facto tool for data science and ML, could be to offer at least the same algorithms as its most popular libraries, to make F#/FSharp.Stats an easy pick for turning Python exploration code into production code.

What are your thoughts on the matter?

muehlhaus · 2021-08-30T10:05:38Z

muehlhaus
Aug 30, 2021
Maintainer

This is a great idea. We would love to have a version of OPTICS available in FSharp.Stats. We are happy to receive any contribution, so feel free to contribute according to your needs. I personally like the idea of taking the scikit-learn library as a guide.

bvenn · 2021-09-22T12:39:24Z

bvenn
Sep 22, 2021
Maintainer

OPTICS would be a great extension of our clustering collection and would serve as an alternative when DBSCAN efficiency is a limiting factor. I've opened an issue to list it as a feature to be implemented in the future and to monitor its implementation status.

#158
