Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow claim actions update query causing lock wait timeouts #1104

Open
prettyboymp opened this issue Sep 23, 2024 · 0 comments
Open

Slow claim actions update query causing lock wait timeouts #1104

prettyboymp opened this issue Sep 23, 2024 · 0 comments
Labels
priority: high The issue/PR is high priority—it affects lots of customers substantially, but not critically.

Comments

@prettyboymp
Copy link
Contributor

prettyboymp commented Sep 23, 2024

Overview

When there are a large number of pending actions, the query built by ActionScheduler_DBStore::claim_actions() to claim actions can become slow and lock the entire table. While this lock is active, other requests to update actions from Queue_Runner are stalled or may fail due to transaction lock timeouts.

This causes actions to fail due to errors in transitioning their status. In high-usage environments, the queue of pending actions grows as successful queries are delayed while waiting for locks to clear.

Environment

  • WooCommerce and WooCommerce Subscriptions active
  • Webhooks are set up for order and subscription creation/update events, running as scheduled actions
  • Approximately 26,000 active subscriptions, each renewing weekly, leading to a regular accumulation of 26,000+ pending actions

A script was run to update a decent sized portion of the subscriptions. This triggered the queueing of a few thousand more pending actions for the webhooks to be processed, resulting in a total of ~31,000 pending actions.

Issue Detail

Upon reaching ~31000 pending actions and ~2,500,000+ actions total, we started encountering the following fatal error:

Uncaught RuntimeException: Unable to claim actions. Database error: Lock wait timeout exceeded; try restarting transaction.

There were other Lock wait timeout errors being generated by the database for other queries to update and delete actions, however, those didn't trigger PHP errors because the handling code doesn't check for errors, only the modified count.

The query causing the long running locks was:

UPDATE wp_actionscheduler_actions 
SET claim_id=2074272, last_attempt_gmt='2024-09-18 17:23:58', last_attempt_local='2024-09-18 19:23:58' 
WHERE claim_id = 0 
AND scheduled_date_gmt <= '2024-09-18 17:23:58' 
AND status='pending' 
ORDER BY priority ASC, attempts ASC, scheduled_date_gmt ASC, action_id ASC
LIMIT 10

Explain:

+------+-------------+----------------------------+-------------+-----------------------------------------------------------------------+-----------------+---------+------+-------+---------------------------------------------------------------+
| id   | select_type | table                      | type        | possible_keys                                                         | key             | key_len | ref  | rows  | Extra                                                         |
+------+-------------+----------------------------+-------------+-----------------------------------------------------------------------+-----------------+---------+------+-------+---------------------------------------------------------------+
|    1 | SIMPLE      | wp_actionscheduler_actions | index_merge | status,scheduled_date_gmt,claim_id,claim_id_status_scheduled_date_gmt | status,claim_id | 22,8    | NULL | 30987 | Using intersect(status,claim_id); Using where; Using filesort |
+------+-------------+----------------------------+-------------+-----------------------------------------------------------------------+-----------------+---------+------+-------+---------------------------------------------------------------+

The query is forced to use a filesort to order and update just 10 records because there are no usable indexes to efficiently handle the ORDER BY clause. The combination of priority ASC and attempts ASC introduces additional sorting, which is not supported by an existing index.

@Konamiman Konamiman added priority: normal The issue/PR is normal priority—not many people are affected or there’s a workaround, etc. priority: high The issue/PR is high priority—it affects lots of customers substantially, but not critically. and removed priority: normal The issue/PR is normal priority—not many people are affected or there’s a workaround, etc. labels Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: high The issue/PR is high priority—it affects lots of customers substantially, but not critically.
Projects
None yet
Development

No branches or pull requests

2 participants