Consume job_explanation we sometimes set in ansible-runner #12089

Closed
wants to merge 2 commits into from
12 changes: 6 additions & 6 deletions awx/main/tasks/callback.py
@@ -203,12 +203,12 @@ def status_handler(self, status_data, runner_config):
                 if os.path.exists(key_data_file) and stat.S_ISFIFO(os.stat(key_data_file).st_mode):
                     os.remove(key_data_file)
         elif status_data['status'] == 'error':
-            result_traceback = status_data.get('result_traceback', None)
-            if result_traceback:
-                from awx.main.signals import disable_activity_stream  # Circular import
-
-                with disable_activity_stream():
-                    self.instance = self.update_model(self.instance.pk, result_traceback=result_traceback)
+            updates = {}
+            for potential_field in ('result_traceback', 'job_explanation'):
+                if status_data.get(potential_field, None):
+                    updates[potential_field] = status_data[potential_field]
+            if updates:
+                self.instance = self.update_model(self.instance.pk, **updates)


 class RunnerCallbackForProjectUpdate(RunnerCallback):
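
For anyone reading along, the filtering loop added to status_handler above can be tried in isolation. This is only a rough sketch: `collect_updates` is a hypothetical helper name that does not exist in AWX, and the sample payloads are made up to resemble an ansible-runner status message.

```python
def collect_updates(status_data, fields=('result_traceback', 'job_explanation')):
    """Return only the fields that are present and truthy in status_data.

    Hypothetical helper mirroring the loop in the diff; not part of AWX.
    """
    updates = {}
    for potential_field in fields:
        if status_data.get(potential_field, None):
            updates[potential_field] = status_data[potential_field]
    return updates


# Made-up payloads for illustration only.
both = collect_updates(
    {'status': 'error', 'result_traceback': 'Traceback ...', 'job_explanation': 'Pod failed'}
)
only_traceback = collect_updates(
    {'status': 'error', 'result_traceback': 'Traceback ...', 'job_explanation': ''}
)
neither = collect_updates({'status': 'error'})
```

Empty or missing values are skipped, so when neither field is set the dict is empty and, in the diff, update_model is not called at all.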
6 changes: 5 additions & 1 deletion awx/main/tasks/jobs.py
@@ -546,6 +546,11 @@ def run(self, pk, **kwargs):
             status = res.status
             rc = res.rc

+            # We call this to get the current values from the database, in case update_model was called
+            # within the threadpools inside of AWXReceptorJob. We use update_model instead of
+            # refresh_from_db here because it contains retry logic that is resilient to database failures.
+            self.instance = self.update_model(self.instance.pk)
shanemcd marked this conversation as resolved.
Ping @sarabrajsingh, as he and I discussed error logs that came from post_run_hook due to not doing this.

Summarizing here...

Each thread has its own database connection object. When connections are dropped, that object goes stale. Unless it hits one particular corner case, it corrects itself after one failed query. So post_run_hook errors, but the code after it works again, which makes the problem hard to observe unless you check the logs.
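
A toy model of the behavior summarized above, using only the standard library. `FakeConnection`, `get_connection`, and `query_with_retry` are hypothetical stand-ins, not Django's connection handling or AWX's update_model; the sketch assumes a stale connection fails exactly once and then recovers.

```python
import threading

_local = threading.local()  # attributes set here are visible only to the current thread


class FakeConnection:
    """Hypothetical stand-in for a per-thread database connection."""

    def __init__(self):
        self.stale = False

    def query(self, sql):
        if self.stale:
            # Assumption: the first query on a dropped connection fails,
            # and the connection is usable again afterwards.
            self.stale = False
            raise ConnectionError("connection was dropped")
        return f"ok: {sql}"


def get_connection():
    """Return this thread's connection, creating it on first use."""
    if not hasattr(_local, "conn"):
        _local.conn = FakeConnection()
    return _local.conn


def query_with_retry(sql, attempts=2):
    """Retry so that a single stale-connection failure is absorbed."""
    for attempt in range(attempts):
        try:
            return get_connection().query(sql)
        except ConnectionError:
            if attempt == attempts - 1:
                raise


results = {}
main_conn = get_connection()


def worker():
    conn = get_connection()  # a different object from main_conn
    results["worker_is_main"] = conn is main_conn
    conn.stale = True  # simulate a dropped connection in this thread only
    results["worker"] = query_with_retry("SELECT 1")


t = threading.Thread(target=worker)
t.start()
t.join()
results["main"] = main_conn.query("SELECT 1")
```

Without the retry, the first query in the worker thread would raise and the next one would succeed, which matches the "errors once, then works again" pattern that is easy to miss outside the logs.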


             if status in ('timeout', 'error'):
                 job_explanation = f"Job terminated due to {status}"
                 self.instance.job_explanation = self.instance.job_explanation or job_explanation
@@ -580,7 +585,6 @@ def run(self, pk, **kwargs):
             if 'got an unexpected keyword argument' in extra_update_fields.get('result_traceback', ''):
                 extra_update_fields['result_traceback'] = "{}\n\n{}".format(extra_update_fields['result_traceback'], ANSIBLE_RUNNER_NEEDS_UPDATE_MESSAGE)

-        self.instance = self.update_model(pk)
         self.instance = self.update_model(pk, status=status, emitted_events=self.runner_callback.event_ct, **extra_update_fields)

         try: