Support Batch and Cached Predictions #732
Comments
Could you clarify what you meant by "batch models" and "cached predictions"?
@terrytangyuan I added some definitions; let me know if you'd like additional context or info! Happy to try to make this clearer. 👍
One idea from @thesuperzapper is that we could make a blog post instead about this, which I like as an option before doing an implementation. @thesuperzapper if you're aligned with this, I can take this on as a starting point. I'd limit the scope to just serving batch predictions as a feature. What do you think?
@franciscojavierarceo I'm still not entirely sure what you would be needing from the Kubeflow community here. What would your blog post be explaining or demonstrating specifically? In general, I am very on board with having a good case study for batch feature serving with Kubeflow. Although I am not quite sure I understand how KServe comes into it, as KServe is about serving REST endpoints, which are not typically associated with batch inference.
I'm asking if folks would be supportive of me creating a demo or documentation outlining how users can serve batch-computed predictions* using Feast; if not, then I won't spend the time building the demo, as it will require meaningful effort. I would want to add it to the website to highlight the behavior.
*Concretely, if an MLE wanted to serve online predictions created by an ML model as the output of some scheduled job (e.g., a KFP pipeline), they could do so using Feast (in this case the prediction is a feature).
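For illustration, here's a rough sketch of that batch-only flow with a recent Feast release (the entity, feature names, and parquet path are hypothetical, not from any existing repo): the scheduled job writes scores to a file, the file is registered as a feature view, and the prediction is then read back like any other online feature.

```python
# Minimal sketch, assuming a recent Feast release and an initialized repo;
# "user", "risk_score", and the parquet path are illustrative placeholders.
from datetime import timedelta

from feast import Entity, FeatureStore, FeatureView, Field, FileSource
from feast.types import Float32

user = Entity(name="user", join_keys=["user_id"])

# The scheduled job (e.g., a KFP pipeline) writes its batch scores here.
predictions_source = FileSource(
    path="data/risk_scores.parquet",
    timestamp_field="event_timestamp",
)

risk_score_view = FeatureView(
    name="model_risk_scores",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[Field(name="risk_score", dtype=Float32)],
    source=predictions_source,
)

# At request time, the precomputed prediction is just another online feature.
store = FeatureStore(repo_path=".")
response = store.get_online_features(
    features=["model_risk_scores:risk_score"],
    entity_rows=[{"user_id": 1234}],
).to_dict()
print(response["risk_score"])
```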
Awesome, glad we're aligned. Yeah, KServe does not come into the batch-only use case. KServe does come into it when you want to calculate a score in real-time, cache it, update it only when the data changes, and initialize the cache from a batch set of data. This use case is a little complicated but will offer much lower latency for serving ML Models, which is why I think it's useful. Also, thanks for reviewing! Let me know if you have additional feedback. 👍
@andreyvelich let me know if you have any thoughts here.
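To make that flow concrete, here's a hedged sketch of the read-through pattern (the KServe URL, feature view name, payload shape, and the `build_model_inputs` helper are all placeholders I'm assuming for illustration, not a confirmed API contract): try the online store first, fall back to a real-time KServe call on a miss, and write the fresh score back so later requests hit the cache.

```python
# Rough sketch: cached predictions backed by Feast, recomputed via KServe on a miss.
from datetime import datetime

import pandas as pd
import requests
from feast import FeatureStore

store = FeatureStore(repo_path=".")
KSERVE_URL = "http://risk-model.default.svc.cluster.local/v1/models/risk-model:predict"


def get_risk_score(user_id: int) -> float:
    # 1) Check the online store for a cached score.
    cached = store.get_online_features(
        features=["model_risk_scores:risk_score"],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()
    score = cached["risk_score"][0]
    if score is not None:
        return score

    # 2) Cache miss: compute the score in real time via the KServe V1 protocol.
    features = build_model_inputs(user_id)  # hypothetical feature assembly
    resp = requests.post(KSERVE_URL, json={"instances": [features]})
    score = resp.json()["predictions"][0]

    # 3) Persist the fresh score; an async job can refresh it when data changes.
    store.write_to_online_store(
        feature_view_name="model_risk_scores",
        df=pd.DataFrame(
            {"user_id": [user_id], "risk_score": [score],
             "event_timestamp": [datetime.utcnow()]}
        ),
    )
    return score
```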
Thank you for this @franciscojavierarceo! If we feel that we should just explain the case study of how to achieve it, we should create a Kubeflow blog post about it, as @thesuperzapper suggested: https://blog.kubeflow.org/.
@andreyvelich @yuzisun @sivanantha321 @johnugeorge @thesuperzapper and @terrytangyuan I updated the Feast documentation to outline how this can work in Feast here: https://docs.feast.dev/v/master/getting-started/architecture/model-inference. This doesn't mention KServe explicitly, as it's meant to be from the Feast perspective (i.e., inference-approach agnostic). It would be ideal to incorporate similar documentation into Kubeflow to outline the tradeoffs of the different structures (e.g., a KServe-centric client or a completely separate client orchestrating both KServe and Feast).
Issue
Kubeflow should provide some guidance on serving the following two types of model predictions online:
(1) requires retrieving a precomputed score from an online database.
(2) requires recomputing the score dynamically (e.g., calling a KServe endpoint), retrieving the precomputed score from an online database, and updating the score in some way (e.g., when data changes from the upstream data producer).
Definitions
Precomputed Predictions: We define precomputed predictions as predictions for a sample of n observations (e.g., n users) computed in a batch process. Example: a risk-score prediction computed for all n users at a point in time and stored in some file (e.g., parquet/csv).
Cached Predictions: We define cached predictions as predictions computed online (i.e., via a request), cached in a database, and updated and persisted on some asynchronous schedule independent of the usage of the prediction. Example: a risk-score prediction computed for a single user, stored in an online database, and refreshed as features in the model change, independently of the usage of the risk score (i.e., a client's API call).
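For the precomputed case, loading the batch output into the online store is just a materialization step. A small sketch, assuming the hypothetical `model_risk_scores` feature view from above and a Feast repo in the working directory:

```python
# Illustrative only: after the batch job writes new scores to the offline
# source, materialize the last day's worth into the online store for serving.
from datetime import datetime, timedelta

from feast import FeatureStore

store = FeatureStore(repo_path=".")
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=1),
    end_date=datetime.utcnow(),
    feature_views=["model_risk_scores"],
)
```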
Options Available
I believe there are at least 3 ways this can be done:
There are pros and cons to each, and it would be good to work through them with the Kubeflow community to come to a consensus.
Feedback from the Community
I think the recommended solution may end up depending on the needs of the users. Having KServe call Feast requires an additional network call, but it is a more intuitive architecture. Getting feedback from the community here to discuss the tradeoffs and make an informed, collaborative choice would be great. An ideal outcome would be to incorporate this feedback into the Kubeflow documentation.
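To ground the "KServe calls Feast" option, here's a hedged sketch of a custom KServe predictor that looks up the score (or features) from Feast inside `predict()`, so clients only ever talk to the KServe endpoint; the model name, feature references, and entity key are placeholders, not a proposed final design.

```python
# Sketch of a KServe-centric client, assuming the kserve Python SDK and a
# local Feast repo; "model_risk_scores" and "user_id" are illustrative.
from feast import FeatureStore
from kserve import Model, ModelServer


class RiskModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.store = FeatureStore(repo_path=".")
        self.ready = True

    def predict(self, payload: dict, headers: dict = None) -> dict:
        user_ids = [row["user_id"] for row in payload["instances"]]
        feats = self.store.get_online_features(
            features=["model_risk_scores:risk_score"],
            entity_rows=[{"user_id": uid} for uid in user_ids],
        ).to_dict()
        # Here the "prediction" is the precomputed score; a real predictor
        # could instead pass the fetched features through a loaded model.
        return {"predictions": feats["risk_score"]}


if __name__ == "__main__":
    ModelServer().start([RiskModel("risk-model")])
```

The alternative structure (a separate client orchestrating both Feast and KServe) avoids coupling the predictor to the feature store at the cost of the client making two network calls.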
Additional Context
There are several discussions that led me to open this issue: see this issue in Feast, this issue in kubeflow/kubeflow, and this blog post by Databricks.