Bug fix for switch_controller when using controller chaining #1591
This PR is separated out to address only the changes related to bug fix #1568. It fixes a bug that occurs under specific conditions when handling chainable controllers in the controller_manager's switch_controller.
Note: To focus on meaningful changes, it is recommended to compare the code using "hide whitespace" mode as follows.
https://github.com/ros-controls/ros2_control/compare/master...TakashiSato:ros2_control:feature/fix_switch_controller_bug?diff=split&w=1
Explanation of the Bugs
Using the typical model for controller chaining, which is also used in this test, the following explanations introduce three situations where bugs occur.
Note that these bugs occur only when the strictness value is set to BEST_EFFORT. When the strictness value is set to STRICT, switch_controller fails, and these bugs do not arise.
Additionally, the test cases to detect these bugs are implemented in this commit.
BUG1
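Initial state: active = pid_left (not_chained), pid_right (not_chained); inactive = position_tracking, diff_drive
Switch request: activate = diff_drive; deactivate = pid_right
Actual result: active = pid_left (in_chained); inactive = position_tracking, diff_drive, pid_right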
In this situation, the start request for diff_drive should be rejected, and only the stop command for pid_right should be accepted. While this result is achieved, there is an issue where pid_left incorrectly enters in_chained mode.
The cause of this bug lies in this section. When checking the activate request for diff_drive, the first following controller, pid_left, is checked for any inconsistencies with the request. Since no issues are detected, pid_left is added to the to_chained_mode_request.
However, when the check is performed with the next controller, pid_right, inconsistencies are detected because pid_right is targeted for deactivation, leading to the rejection of the diff_drive activate request.
Since the to_chained_mode_request is not re-evaluated in subsequent processes, this request remains and results in the actual transition to chained mode during manage_switch.
BUG2
Initial state: inactive = position_tracking, diff_drive, pid_left, pid_right
Switch request: activate = diff_drive, pid_left
Actual result: active = pid_left (in_chained); inactive = position_tracking, diff_drive, pid_right
This situation is similar to BUG1: the start request for diff_drive is rejected, and only pid_left is activated. However, pid_left still incorrectly enters in_chained mode.
The cause of this bug is almost the same as BUG1. During the activate check for diff_drive, since the following controller pid_left is also a target for activation, pid_left is added to the to_chained_mode_request.
Then, during the subsequent check with pid_right, since pid_right is not a target for activation, the activate request for diff_drive is rejected, leaving the to_chained_mode_request for pid_left in place.
Incidentally, in this situation, the checks are performed in the order of pid_left followed by pid_right. Therefore, if pid_right and pid_left in the switch requests for BUG1 and BUG2 are swapped, the checks will function correctly, and these issues will not occur.
BUG3
Initial state: active = position_tracking (not_chained), diff_drive (in_chained), pid_left (in_chained), pid_right (in_chained)
Switch request: deactivate = diff_drive
In this situation, since it is impossible to deactivate only diff_drive, the request should be rejected and no changes should occur.
However, the following switch requests are internally generated, and since these switches cannot be handled properly, a deadlock occurs.
Internally generated switch request: activate = pid_left, pid_right; deactivate = pid_left, pid_right
The sequence in which such switch requests are generated is as follows:
1. Since diff_drive is a deactivate target, in propagate_deactivation_of_chained_mode, its following controllers, pid_left and pid_right, are added to the from_chained_mode_request.
2. check_following_controllers_for_activate does nothing. (Note: The process to erase from from_chained_mode_request exists only within this function.)
3. In check_preceding_controllers_for_deactivate, the preceding controller of diff_drive, position_tracking, is active and not present in the deactivate_request, resulting in the rejection of the diff_drive deactivate request.
4. Since pid_left and pid_right are included in the from_chained_mode_request and meet the conditions, they are added to the (de)activate_request as restart targets due to this process.
Bug Fix Proposal
In this part of the check process, propagate_deactivation_of_chained_mode, check_following_controllers_for_activate, and check_preceding_controllers_for_deactivate perform judgments based on the content of the (de)activate_request. Therefore, if the content of the (de)activate_request changes, the process should be retried from the beginning. However, during this process, it is also possible that the content of from/to_chained_mode_request has changed, so it is necessary to properly clear these as well.
This commit demonstrates the simplest change to fix the bug based on this idea.
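To make the stale-request problem and the effect of clearing concrete, here is a minimal, self-contained model of BUG1. It is only a sketch and is not the controller_manager code or the code of the linked commit: ChainModel, SwitchRequest, check_following_for_activate, resolve, bug1_scenario, and the clear_on_retry flag are all hypothetical names introduced for illustration, and the check is reduced to the single rule that matters here.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Names = std::vector<std::string>;

// Hypothetical, simplified request lists (not the real controller_manager types).
struct SwitchRequest
{
  Names activate;
  Names deactivate;
  Names to_chained_mode;
};

// Hypothetical, simplified view of the controller chain.
struct ChainModel
{
  std::map<std::string, Names> following;  // controller -> controllers it feeds into
  Names active;                            // currently active controllers
};

bool contains(const Names & v, const std::string & name)
{
  return std::find(v.begin(), v.end(), name) != v.end();
}

// Validates one activate candidate against its following controllers.
// Side effect: each following controller that passes is queued for chained mode
// before the remaining following controllers have been checked.
bool check_following_for_activate(
  const ChainModel & model, const std::string & ctrl, SwitchRequest & req)
{
  const auto it = model.following.find(ctrl);
  if (it == model.following.end()) {
    return true;
  }
  for (const auto & f : it->second) {
    const bool is_active = contains(model.active, f);
    const bool will_be_active =
      is_active ? !contains(req.deactivate, f) : contains(req.activate, f);
    if (!will_be_active) {
      return false;
    }
    if (!contains(req.to_chained_mode, f)) {
      req.to_chained_mode.push_back(f);
    }
  }
  return true;
}

// BEST_EFFORT-style resolution: drop an invalid activate request and re-check from the start.
// clear_on_retry == false keeps whatever was queued earlier, which models the stale request.
SwitchRequest resolve(const ChainModel & model, SwitchRequest req, bool clear_on_retry)
{
  bool retry = true;
  while (retry) {
    retry = false;
    if (clear_on_retry) {
      req.to_chained_mode.clear();
    }
    const Names candidates = req.activate;  // copy, because req.activate is edited below
    for (const auto & ctrl : candidates) {
      if (!check_following_for_activate(model, ctrl, req)) {
        req.activate.erase(
          std::remove(req.activate.begin(), req.activate.end(), ctrl), req.activate.end());
        retry = true;
        break;
      }
    }
  }
  return req;
}

// BUG1: pid_left and pid_right are active; activate diff_drive, deactivate pid_right.
SwitchRequest bug1_scenario(bool clear_on_retry)
{
  ChainModel model;
  model.following["diff_drive"] = {"pid_left", "pid_right"};
  model.active = {"pid_left", "pid_right"};

  SwitchRequest req;
  req.activate = {"diff_drive"};
  req.deactivate = {"pid_right"};
  return resolve(model, req, clear_on_retry);
}
```

In this model, bug1_scenario(false) leaves pid_left in to_chained_mode even though the diff_drive activation was dropped, while bug1_scenario(true) ends with an empty to_chained_mode and only the pid_right deactivation remaining.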
However, since the above commit uses goto, which is generally not recommended, the final implementation in this PR has been rewritten using lambda functions and a while loop for the retry process (the corresponding commit).
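As a rough illustration of the lambda-plus-while-loop shape, the sketch below replays the BUG3 scenario with hard-coded stand-ins for the checks. It is an assumption-laden toy, not the code in this PR: CheckResult, Requests, resolve_bug3, and the two lambdas are hypothetical, and the chain topology is baked into the lambdas rather than looked up.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Names = std::vector<std::string>;

// Hypothetical status returned by each check step.
enum class CheckResult { Ok, Retry };

// Hypothetical, reduced request lists for the BUG3 scenario.
struct Requests
{
  Names deactivate;
  Names from_chained_mode;
};

// BUG3: all four controllers are active and only diff_drive is requested for deactivation.
Requests resolve_bug3()
{
  Requests req;
  req.deactivate = {"diff_drive"};

  const auto has = [](const Names & v, const std::string & name) {
    return std::find(v.begin(), v.end(), name) != v.end();
  };

  // Stand-in for the propagation step: if diff_drive is deactivated,
  // its following controllers are asked to leave chained mode.
  const auto propagate_deactivation = [&]() {
    if (has(req.deactivate, "diff_drive")) {
      req.from_chained_mode = {"pid_left", "pid_right"};
    }
    return CheckResult::Ok;
  };

  // Stand-in for the preceding-controller check: diff_drive cannot be deactivated
  // while position_tracking stays active, so the request is dropped and a retry is asked for.
  const auto check_preceding = [&]() {
    if (has(req.deactivate, "diff_drive") && !has(req.deactivate, "position_tracking")) {
      req.deactivate.erase(
        std::remove(req.deactivate.begin(), req.deactivate.end(), std::string("diff_drive")),
        req.deactivate.end());
      return CheckResult::Retry;
    }
    return CheckResult::Ok;
  };

  // Retry loop: every pass starts from a clean from_chained_mode list, so entries
  // derived from a request that was later rejected cannot survive.
  while (true) {
    req.from_chained_mode.clear();
    if (propagate_deactivation() == CheckResult::Retry) {
      continue;
    }
    if (check_preceding() == CheckResult::Retry) {
      continue;
    }
    break;
  }
  return req;
}
```

Each retry removes one entry from the request, so the loop terminates. In this toy, the second pass finds nothing left to deactivate, and pid_left and pid_right are no longer scheduled to leave chained mode.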