Skip to content

Commit

Permalink
channel: Increase grace period for subprocesses to stop
Browse files Browse the repository at this point in the history
There were complaints that tbot would raise the "some subprocess(es) did
not stop" error too quickly.  The previous 1.27s timeout was too short
for some programs to exit on their own.

To make tbot more graceful, increase the maximum termination time to 10.2
seconds.  However, a warning will be emitted at 1.27s to let the user
know why tbot is waiting.

Signed-off-by: Rahix <[email protected]>
  • Loading branch information
Rahix committed Jul 14, 2024
1 parent d2c043f commit 1e14da5
Showing 1 changed file with 36 additions and 7 deletions.
43 changes: 36 additions & 7 deletions tbot/machine/channel/subprocess.py
Original file line number Diff line number Diff line change
Expand Up @@ -110,12 +110,14 @@ def close(self) -> None:
self.p.kill()
self.p.communicate()

# Wait for all processes in the session to end. Most of the time
# this will return immediately, but in some cases (eg. a serial session
# with picocom) we have to wait a bit until we can continue. To be as
# quick as possible this is implemented as exponential backoff and will
# give up after 1 second (1.27 to be exact) and emit a warning.
for t in range(7):
# Wait for all processes in the session to end. Most of the time this
# will return immediately, but in some cases (eg. a serial session with
# picocom) we have to wait a bit until we can continue. To be as quick
# as possible this is implemented as exponential backoff. When a
# subprocess takes longer than 1.27s to terminate, a warning is
# emitted. After 10.2s, we give up and error out.
wait_total = 0.0
for t in range(10):
if (
subprocess.call(
["ps", "-s", str(sid)],
Expand All @@ -125,7 +127,34 @@ def close(self) -> None:
!= 0
):
break
time.sleep(2**t / 100)

wait_time = 2**t / 100

Check warning on line 131 in tbot/machine/channel/subprocess.py

View check run for this annotation

Codecov / codecov/patch

tbot/machine/channel/subprocess.py#L131

Added line #L131 was not covered by tests

if t >= 7:
offending = (

Check warning on line 134 in tbot/machine/channel/subprocess.py

View check run for this annotation

Codecov / codecov/patch

tbot/machine/channel/subprocess.py#L133-L134

Added lines #L133 - L134 were not covered by tests
subprocess.run(
["ps", "-s", str(sid), "ho", "args"],
stdout=subprocess.PIPE,
encoding="utf-8",
)
.stdout.strip()
.split("\n")
)
offending_str = "\n".join(f" - {args!r}" for args in offending)
tbot.log.warning(

Check warning on line 144 in tbot/machine/channel/subprocess.py

View check run for this annotation

Codecov / codecov/patch

tbot/machine/channel/subprocess.py#L143-L144

Added lines #L143 - L144 were not covered by tests
f"""\
Some subprocesses have not stopped after {wait_total:.1f} s:
{offending_str}
You should probably consider to explicitly terminate the offending processes in
your connector configuration.
Waiting for {wait_time:.1f} more seconds..."""
)

time.sleep(wait_time)
wait_total += wait_time

Check warning on line 157 in tbot/machine/channel/subprocess.py

View check run for this annotation

Codecov / codecov/patch

tbot/machine/channel/subprocess.py#L156-L157

Added lines #L156 - L157 were not covered by tests
else:
raise tbot.error.TbotException("some subprocess(es) did not stop")

Expand Down

0 comments on commit 1e14da5

Please sign in to comment.