Replies: 2 comments
-
This is a great idea. We would love to have a version of OPTICS available in FSharp.Stats. We welcome any contribution, so feel free to contribute according to your needs. I personally like the idea of taking the scikit-learn library as a guide.
-
OPTICS would be a great extension of our clustering collection and would serve as an alternative when DBSCAN's efficiency is a limiting factor. I've opened an issue to list it as a feature to be implemented in the future and to monitor its implementation status.
-
Where I work, our data scientist uses Python for data exploration, and in particular the scikit-learn library. For one specific task we started with DBSCAN but found that it was using too much memory, so we switched to their OPTICS-derived algorithm.
Currently, we are also using Python in production. Everything is working fine so far, but we expect our dataset to grow substantially and, having heard many times that Python isn't performant enough for production, I wanted to compare our current solution with one implemented with FSharp.Stats. However, I discovered that the library doesn't offer OPTICS.
I would like to suggest that OPTICS be implemented in the library and, more generally, that more clustering algorithms be offered. However, when I look at Wikipedia's page on OPTICS, there are at least half a dozen variations of the algorithm (and the same can be said for DBSCAN), none of which appears to be the same as the one used in scikit-learn... I'm just a simple user of ML, and I don't really have the qualifications to decide which variation(s), if any, should be implemented in FSharp.Stats.
So I was wondering whether there should be some kind of process for evaluating the inclusion of new algorithms in FSharp.Stats in general. We could then record their status in a document (e.g. rejected, PR accepted...) and, if rejected, provide an explanation. In a way, the document could serve as part of the project roadmap.
Another approach could be, since Python is the de facto tool for data science and ML, to at least offer the same algorithms as its most popular libraries, to make F#/FSharp.Stats an easy pick for turning Python exploration code into production code.
What are your thoughts on the matter?