Async, non-blocking jobs execution #114

Open
coutoPL opened this issue Jun 4, 2020 · 13 comments

@coutoPL

coutoPL commented Jun 4, 2020

Hi,

Your lib almost perfectly fits my use case. But after inspecting the code, I suppose there is no support for asynchronous, non-blocking jobs.

What exactly do I mean? I have plenty of jobs to start (let's say 10k/min). Each of them makes one or more HTTP requests, which I call using a non-blocking HTTP client. So I have e.g. a CompletableFuture<Response>, and based on the response I have to decide whether to reschedule the task instance or not (a custom task and ExecutionHandler seem great for this). Currently I have to block the thread, but this is not the way to go: there are a lot of tasks to start (requirement: immediately or as soon as possible) and the mean time waiting for a response can be ~30s. The thread is blocked and just waiting for IO.

It seems this could be solved if db-scheduler could define and use something like an async execution handler:

public interface ExecutionHandler<T> {
   CompletableFuture<CompletionHandler<T>> execute(TaskInstance<T> taskInstance, ExecutionContext executionContext);
}

Or maybe I missed something and there is already a way to achieve what I described above?
If not, WDYT about introducing an async interface and adapting db-scheduler to work with it? Do you see any obstacles?

Thanks for your effort. This project and your care for it look very impressive.
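
To make the proposal concrete, here is a minimal, self-contained sketch of how such an async handler could be written against a non-blocking client. Everything in it is a hypothetical stand-in rather than db-scheduler API: `TaskInstance` is reduced to two fields, `Outcome` stands in for a `CompletionHandler`, `AsyncExecutionHandler` is the proposed interface without the `ExecutionContext` argument, and `fakeHttpCall` simulates an HTTP client that completes on another thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AsyncHandlerSketch {
    // Hypothetical stand-ins, NOT db-scheduler types: just enough to show the shape.
    static final class TaskInstance<T> {
        final String id;
        final T data;
        TaskInstance(String id, T data) { this.id = id; this.data = data; }
    }

    // Stand-in for the decision a CompletionHandler would carry.
    enum Outcome { REMOVE, RESCHEDULE }

    // The proposed idea: return a future instead of blocking until the work is done.
    interface AsyncExecutionHandler<T> {
        CompletableFuture<Outcome> execute(TaskInstance<T> taskInstance);
    }

    // Simulated non-blocking HTTP call: the status arrives ~50 ms later on another thread.
    static CompletableFuture<Integer> fakeHttpCall(String url) {
        int status = url.endsWith("/busy") ? 503 : 200;
        return CompletableFuture.supplyAsync(
                () -> status,
                CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS));
    }

    // The handler only wires up the pipeline; no thread waits for the response.
    static final AsyncExecutionHandler<String> HANDLER = task ->
            fakeHttpCall(task.data)
                    .thenApply(status -> status == 200 ? Outcome.REMOVE : Outcome.RESCHEDULE);

    // join() is used here only so the demo can print the result.
    static Outcome run(String url) {
        return HANDLER.execute(new TaskInstance<>("task-1", url)).join();
    }

    public static void main(String[] args) {
        System.out.println(run("https://example.invalid/ok"));
        System.out.println(run("https://example.invalid/busy"));
    }
}
```

The point of the shape is that the decision to remove or reschedule is made inside the future's pipeline, once the response is known, while the thread that started the execution is free again as soon as `execute` returns.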

@kagkarlsson
Owner

Hi! What level of parallelism per scheduler-instance do you need?

I haven't really thought about it before, but I agree it makes sense if you have many but slow executions. Trying to think about what the obstacles are...

@kagkarlsson
Owner

I don't see any obvious obstacles, but I'd probably need to get a feel for how it would affect the code base to be able to evaluate it properly.

@coutoPL
Author

coutoPL commented Jun 4, 2020

I'd like to control parallelism via the thread pool set for a scheduler instance.

Let's say I give the scheduler a fixed pool of 5 threads, and the scheduler should run as many executions as it can. If a job/execution releases its thread (e.g. because it is waiting for IO), the next job/execution can run. The IO result of the previous execution will be (can be) handled, once the async response arrives, by a thread other than the one that initiated it.

Ok, if we decide to use db-scheduler, I'll try to add this functionality to the lib, if you don't mind :)

@kagkarlsson
Owner

kagkarlsson commented Jun 4, 2020

Ok 👍

I am a bit uncertain about how the algorithm controlling how many executions the scheduler would be allowed to pick would look... 🤔
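
One possible shape for such an algorithm, offered purely as an assumption and not as anything db-scheduler does: decouple the limit from the thread count and cap the number of in-flight executions with a semaphore. The sketch below is self-contained; `runAll`, `maxInFlight`, and the simulated 20 ms execution are hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class InFlightLimitSketch {
    // Starts totalTasks simulated async executions, never more than maxInFlight at once.
    // Returns the highest number of executions observed in flight at the same time.
    static int runAll(int totalTasks, int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<CompletableFuture<Void>> started = new ArrayList<>();

        for (int i = 0; i < totalTasks; i++) {
            // The "poller" waits here only when the cap is reached, not per execution.
            permits.acquireUninterruptibly();
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);

            // Simulated async execution: completes ~20 ms later on another thread.
            CompletableFuture<Void> execution = CompletableFuture
                    .runAsync(() -> { }, CompletableFuture.delayedExecutor(20, TimeUnit.MILLISECONDS))
                    .whenComplete((result, error) -> {
                        inFlight.decrementAndGet();
                        permits.release(); // frees a slot for the next pick
                    });
            started.add(execution);
        }

        CompletableFuture.allOf(started.toArray(new CompletableFuture[0])).join();
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in flight: " + runAll(50, 10));
    }
}
```

A real scheduler would more likely read `permits.availablePermits()` before each poll and fetch at most that many due executions, rather than block the polling thread. The cap itself would then be a configuration value independent of the pool size.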

@coutoPL
Author

coutoPL commented Jun 4, 2020

Me neither. I haven't got there yet. But I'm going to consult you here on everything I'm uncertain about. Stay tuned :)

@muradm

muradm commented Nov 28, 2020

@coutoPL, just curious, I'm doing a similar thing here with Axon: why not just reschedule in CompletableFuture's handle or thenApply?

@dmoidl
Contributor

dmoidl commented Feb 5, 2021

Hi guys, I'd like to revive this issue 🙃

We have a nearly identical use case to the OP's, just not with that many tasks in need of scheduling (yet?) and with a lower expected task duration, so even blocking the thread works for us for now. But that doesn't mean I wouldn't like to see support for this in the lib directly 🙂

I'd be willing to participate in building this feature, but I think it would be best to first think properly about how exactly such a feature would be incorporated into the current code base, no? Maybe, @kagkarlsson, if you could find some time to think about this some day and write down your thoughts on the topic, that would be something to start from.

@muradm I thought about doing just that, simply registering a callback and doing your stuff in there, but there are a number of reasons why that doesn't work. The main one is that when you return from a task's execution handler, you need to somehow specify what should happen with that task instance:

  1. If you return CompletionHandler.OnCompleteRemove(), you've just marked the task as finished even though it has not finished yet. That's a weird state on its own, and it is also a real problem. Imagine your application suddenly crashes: the task instance has not finished, but it will never get picked up again, because it appears completed to the scheduler.
  2. You can't really return OnCompleteReschedule, since you don't know the outcome of your task at that moment.
  3. You could implement a no-op CompletionHandler, but then again you don't use the Scheduler properly. This time you'd make the task look "picked" even after it has finished.

Long story short: you may be able to make it (almost) work, but you'd need to duplicate parts of the logic currently handled by the Scheduler for you, such as failure tracking. And you can't really work around the issue that once your node goes down unexpectedly, you simply lose the state of the task instances currently being executed by that node, since that state is now kept only in your app's memory 🤷‍♂️
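
The timing problem behind the first point can be shown with a small self-contained sketch. It does not use db-scheduler; `Decision` is a hypothetical stand-in for the `CompletionHandler` a synchronous handler has to return, and `realOutcome` stands for state that exists only in the application's memory.

```java
import java.util.concurrent.CompletableFuture;

public class CallbackPitfallSketch {
    // Hypothetical stand-in for the CompletionHandler a synchronous handler must return.
    enum Decision { REMOVE, RESCHEDULE }

    // Lives only in memory: lost if the node crashes before the callback runs.
    static volatile String realOutcome = "unknown";

    // Synchronous contract: a decision must be returned before the response exists.
    static Decision execute(CompletableFuture<Integer> response) {
        response.thenAccept(status ->
                realOutcome = (status == 200) ? "succeeded" : "needs retry");
        // The response has not arrived, so this choice cannot depend on it.
        return Decision.REMOVE;
    }

    static String demo() {
        realOutcome = "unknown";
        CompletableFuture<Integer> response = new CompletableFuture<>();
        Decision decision = execute(response);
        String atReturn = realOutcome; // what was known when the handler returned
        response.complete(503);        // the response arrives later and signals failure
        return "decision=" + decision
                + ", outcome at return=" + atReturn
                + ", real outcome=" + realOutcome;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

By the time the 503 arrives, the handler has already reported that the task instance should be removed, so the need for a retry is known only to the callback and is never recorded in persistent state.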

@coutoPL
Author

coutoPL commented Feb 5, 2021

I would just like to say that we have postponed this problem for later, so I'm not working on it atm. I didn't even start.

@kagkarlsson
Owner

kagkarlsson commented Feb 5, 2021

I will think some on this use-case when I get some time :)

In the meantime, if you are considering ways to implement this, it might be good to know that I am working on a refactor of db-scheduler to support select-for-update polling. In doing that, I have extracted some of the execution logic into a separate class, possibly making this feature easier to implement (haven't checked). You can find the new code here: https://github.com/kagkarlsson/db-scheduler/pull/175/files#diff-d015624ee0d7dbc8378459e38bfffc0581736254a71f6d14b953343eff50089dR1

@coutoPL
Author

coutoPL commented Feb 5, 2021

I identified the interface that would need to change to express that a job can be done asynchronously.

public interface ExecutionHandler<T> {
   CompletableFuture<CompletionHandler<T>> execute(TaskInstance<T> taskInstance, ExecutionContext executionContext);
}

I was going to start there.

@kagkarlsson
Owner

@amit-handda

amit-handda commented Mar 15, 2022

Thanks folks, this is nice. I am looking to adopt db-scheduler and pair it with Kotlin coroutines to execute short tasks that mainly interact with 2-3 other HTTP services (async HTTP client). Hence, I'm looking to execute tasks asynchronously.

IIUC, if I implement ExecutorService using coroutines and initialize the Scheduler with it, then I am done?
Appreciate the implementation and work so far. love it.

UPDATE: never mind, I understand it needs more changes.

@amit-handda

@kagkarlsson we created a PR to address this issue, would be awesome if you could review it sometime. TYSM
#304
