Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/interruptible execution clean2 #796

Merged
merged 12 commits into from
May 28, 2024

Conversation

detule
Copy link
Collaborator

@detule detule commented Apr 27, 2024

Hi All:

Please consider adding this feature - the ability to "Ctrl-C" past a long running query. For example in SQL Server

> dbGetQuery(conn, "WAITFOR DELAY '00:10'", immediate = TRUE);
^C
Caught user interrupt, attempting a clean exit...
Error: nanodbc/nanodbc.cpp:1711: 00000
<SQL> 'WAITFOR DELAY '00:10''
>

Some notes - i feel like I am being wordy below but I am sure I still left quite a bit out:

  • Design: Created a run_interruptible wrapper that takes two arguments: a function to be executed away from the main thread, and a cleanup function to be invoked once the signal is caught on the main thread. Perhaps this is an overkill but I did this with an eye that maybe some day we may need to be able to interrupt other calls from the odbc API, not just SQLExecute. We can then just re-use the wrapper.

  • Design: Used std::async rather than std::thread: I personally am a little biased towards std::async but mostly based on semantics. I know there's quite a bit left up to the implementation but at this point it's been around for long enough that I think there should be little in the way of surprises. I've witnessed some passionate thread-vs-async discussions before but I think it's mostly a matter of style. If folks feel strongly one way or another, happy to change it.

  • Design: Signal handling ( away from Windows ) - used the pthread API to make sure the SIGINT signal is delivered to the main thread. This is pretty idiosyncratic, but i've seen issues where depending on the implementation, a signal can end up with the wrong thread.

  • Opt-out: Struggled with this for a bit. I am not a fan of dbConnect accruing yet more arguments. At the same time also not in love with introducing some sort of a secret option --- getOption(.odbc.interruptible.exec) --- or somesuch. There is an attributes arguments to dbConnect but when we introduced it, I think we envisioned ODBC/API connection arguments. So - hated myself a little bit for it - but ended up writing in a new argument to dbConnect.

  • Testing: Pipelines are (hopefully) passing on both windows and linux and that's a good sign. Beyond that, on my linux box, I tested DB2, Oracle, Terradata, Databricks and did not see any issues. Note, the WAITFOR style queries are not always interruptible. For example on snowflake, interrupting the equivalent SELECT system$wait(10) won't save you any time since the SQLCancel call blocks on the main thread after catching the interrupt. However, testing an actual long running execution I saw better behavior.

    • MySQL: Saw some issues with the MariaDB odbc driver 3.1.15 on Linux. I went back to installing the driver from their github tree as the behavior is much better with that version ( 3.1.17). In particular, when using the canned v3.1.15 driver, executing the following will hang execution:
      • dbSend(Get)Query(conn, NA_character_) ( part of DBItest::send(get)_query_non_string )
      • dbSendStatement(conn, NA_character_) ( part of DBItest::send_statement_non_string )
      • dbExecute(conn, NA_character_) ( part of DBItest::execute_non_string )

    Note, all remaining (~3.5K) DBItest tests pass with both drivers just fine. Spent quite a bit of time comparing odbc traces to no avail, and thought it's something so idiosyncratic ( and fixed in their newer drivers to boot ) that it should probably not hold this feature back. Options to ignore were to use newer driver, or to add the tests to the "skip" list for driver-mysql.

Appreciate any feedback - and thanks for taking a look!

@detule
Copy link
Collaborator Author

detule commented Apr 30, 2024

  • The business of the PR is in feature: interruptible execution; remaining commits are auxiliary / supporting cast.
  • I think snowflake tests are failing perhaps because I don't have access to the github credentials / secret/

Copy link
Collaborator

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so great. Stoked about this one.

Leaving a "Comment" rather than "Approve" review as there are quite a few Cpp changes here that I don't feel quite qualified enough to approve. Mostly feedback on the interface to cli conditions from me.

tests/testthat/_snaps/utils.md Outdated Show resolved Hide resolved
R/dbi-driver.R Outdated Show resolved Hide resolved
R/dbi-driver.R Show resolved Hide resolved
R/dbi-driver.R Outdated Show resolved Hide resolved
R/dbi-driver.R Outdated Show resolved Hide resolved
src/odbc_result.cpp Outdated Show resolved Hide resolved
src/utils.h Outdated Show resolved Hide resolved
src/utils.cpp Outdated Show resolved Hide resolved
src/odbc_result.h Outdated Show resolved Hide resolved
Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a news bullet too

Looks good from my end and I know users will really be excited about this issue. But I have no ability to review the threaded C++ code, so please wait on @gaborcsardi's review before merging.

.github/odbc/install-mariadb-driver.sh Show resolved Hide resolved
R/dbi-driver.R Outdated
#' prior to the connection being established. See \link{ConnectionAttributes}.
#' @param interruptibleExecution Logical. If `TRUE` calls to SQLExecute and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mark this as experimental, and to file an issue on GitHub if it's used? I assume we'll eventually take this away once we're confident that it's ok.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the note ( exprimental, please bring issues to github ) in the NEWS bullet. Let me know if I should mention it elsewhere as well.

R/dbi-driver.R Outdated Show resolved Hide resolved
src/odbc_result.h Outdated Show resolved Hide resolved
@simonpcouch simonpcouch mentioned this pull request May 14, 2024
@gaborcsardi
Copy link
Collaborator

gaborcsardi commented May 14, 2024

@detule @simonpcouch Some quick comments wrt the new argument and having a (not so) secret option instead. I think an option or an env var is much friendlier for the users than the new argument.

  • If interruptible execution causes you issues (which it might, especially on older systems that barely support C++11), it is much easier to opt out with a global option than updating your code at multiple places, in multiple projects. You might not even own the code for which you want to opt out.
  • An option makes the transition much smoother, e.g. you default first to FALSE (if you are conservative), and then in the next release to TRUE, and ideally at some point drop the option altogether. You can do all this without changing APIs or code that uses odbc.
  • You can document the option, e.g. you should mention it in the NEWS, and mention it in the manual page(s), and/or in the README. I agree that options tend to be hidden, because of our lack of conventions for documenting them, but we can try to do better in this regard. In any case I think the benefits of a global option outweigh its clumsiness.

@gaborcsardi
Copy link
Collaborator

gaborcsardi commented May 14, 2024

@detule A quick question about concurrency and resources. If there is an interrupt, then cleanup_fn() runs concurrently with the async thread, right?

And after an interrupt, the main thread invalidates the DB handle that the async thread is using? If that's the case, are the ODBC drivers prepared for this?

@simonpcouch
Copy link
Collaborator

simonpcouch commented May 14, 2024

Not doing this myself in case @detule has local changes, but this PR should pass CI with changes merged from main!

@detule
Copy link
Collaborator Author

detule commented May 17, 2024

HI @gaborcsardi

Appreciate you taking a look.

  • Will make the interruptible argument default to a global option. Good call.
  • Right - if interrupted cleanup_fn would run concurrently with the async/execution thread. Per the ODBC spec this should be fine, since before invalidating the statement as part of the clean up, we first call SQLCancel. In the async thread that should cause the statement to terminate processing.
    • This mode of operation is covered in the ODBC documentation here.
    • In testing locally (albeit Linux only) almost everything I tested was well behaved modulo an older vintage of the mysql driver.
    • Note there is a notion of "async" execution in the ODBC spec. I would expect that feature to be poorly supported across various back-ends and drivers, but this is not that.

@detule detule force-pushed the feature/interruptible_execution_clean2 branch from 5f8bac1 to c340595 Compare May 18, 2024 22:41
@detule
Copy link
Collaborator Author

detule commented May 18, 2024

Not doing this myself in case @detule has local changes, but this PR should pass CI with changes merged from main!

Thanks @simonpcouch - rebased on top of main. Hopefully it's all green.

@gaborcsardi
Copy link
Collaborator

gaborcsardi commented May 22, 2024

Per the ODBC spec this should be fine, since before invalidating the statement as part of the clean up, we first call SQLCancel

I don't see where that happens, probably in close()?

In any case, this looks pretty good then. The only concern I have is that you need to install the dev MariaDB drivers to make ODBC run reliably. I would think that many ODBC users are stuck on older Linux systems, so we could potentially break their projects.

So maybe you could make this feature opt-in for now? And make it the default in the next release?

@detule
Copy link
Collaborator Author

detule commented May 23, 2024

Per the ODBC spec this should be fine, since before invalidating the statement as part of the clean up, we first call SQLCancel

I don't see where that happens, probably in close()?

Yes, both set_current_result, and close in the cleanup function will cancel the current result, as needed.

In any case, this looks pretty good then. The only concern I have is that you need to install the dev MariaDB drivers to make ODBC run reliably. I would think that many ODBC users are stuck on older Linux systems, so we could potentially break their projects.

So maybe you could make this feature opt-in for now? And make it the default in the next release?

Thanks @gaborcsardi . Appreciate you taking a look. On mariadb - do you still feel the same way knowing that the only test that the driver seems to be misfiring on is the relatively contrived dbGetQuery(conn, NA_character_) ? All other tests, more than three thousand of them, seem to be passing on both Linux and Windows ( admittedly MySQL, not MariaDB ) without issues. At any rate, I don't have a problem making it opt-in and happy to do it, just a bit concerned that it won't get enough mileage.

@gaborcsardi
Copy link
Collaborator

At any rate, I don't have a problem making it opt-in and happy to do it, just a bit concerned that it won't get enough mileage.

You can make a much more informed decision than me. In general I tend to be conservative in these kind of issues, especially with packages that may be used on older systems, like odbc.

You can also make the default depend on whether the session is interactive or not. I would think that interruption is much less important in non-interactive sessions, where an interruption just kills the process, because there are already ways to kill the process. So you could turn it on in interactive sessions only.

@detule
Copy link
Collaborator Author

detule commented May 24, 2024

You can also make the default depend on whether the session is interactive or not. I would think that interruption is much less important in non-interactive sessions, where an interruption just kills the process, because there are already ways to kill the process. So you could turn it on in interactive sessions only.

Done - thanks, that's a great suggestion.

detule and others added 2 commits May 24, 2024 18:10
Merge branch 'main' into detule-feature/interruptible_execution_clean2

# Conflicts:
#	tests/testthat/_snaps/utils.md
@simonpcouch simonpcouch merged commit e45cc17 into r-dbi:main May 28, 2024
16 checks passed
@r2evans
Copy link

r2evans commented Aug 9, 2024

(I cannot express sufficiently how much I've wanted this and never knew to ask for it.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants