Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue: Job Processing Loop Fails Due to Database Connection Timeout #30668

Open
fabrizzio-dotCMS opened this issue Nov 15, 2024 · 0 comments
Open

Comments

@fabrizzio-dotCMS
Copy link
Contributor

Parent Issue

#29498

Problem Statement

The job processing loop in JobQueueManagerAPIImpl is failing intermittently with a DotDataException caused by a database connection timeout. This results in an inability to fetch the next job for processing and potentially disrupts job queue management. The error logs indicate the following stack trace:

23:43:06.004  ERROR api.JobQueueManagerAPIImpl - Unexpected error in job processing loop: Error fetching next job
com.dotmarketing.exception.DotDataException: Error fetching next job
	at com.dotcms.jobs.business.api.JobQueueManagerAPIImpl.processNextJob(JobQueueManagerAPIImpl.java:613) ~[?:?]
	at com.dotcms.jobs.business.api.JobQueueManagerAPIImpl.processJobs(JobQueueManagerAPIImpl.java:573) ~[?:?]
	at com.dotcms.jobs.business.api.JobQueueManagerAPIImpl.lambda$start$0(JobQueueManagerAPIImpl.java:217) ~[?:?]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
	at java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:317) ~[?:?]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: com.dotcms.jobs.business.queue.error.JobQueueDataException: Database error while fetching next job
	at com.dotcms.jobs.business.queue.PostgresJobQueue.nextJob(PostgresJobQueue.java:573) ~[?:?]
	at com.dotcms.jobs.business.api.JobQueueManagerAPIImpl.processNextJob(JobQueueManagerAPIImpl.java:604) ~[?:?]
	... 8 more
Caused by: com.dotmarketing.exception.DotDataException: jdbc/dotCMSPool - Connection is not available, request timed out after 5991ms.{
  "SQL": ["UPDATE job_queue SET state = ? WHERE id = (SELECT id FROM job_queue WHERE state = ? ORDER BY priority DESC, created_at ASC LIMIT 1 FOR UPDATE SKIP LOCKED) RETURNING *"],
  "maxRows": [-1],
  "offest": [0],
  "params": [
    "RUNNING",
    "PENDING"
  ]
}
	at com.dotmarketing.common.db.DotConnect.loadResult(DotConnect.java:310) ~[?:?]
	at com.dotmarketing.common.db.DotConnect.loadObjectResults(DotConnect.java:997) ~[?:?]
	at com.dotcms.jobs.business.queue.PostgresJobQueue.nextJob(PostgresJobQueue.java:562) ~[?:?]
	at com.dotcms.jobs.business.api.JobQueueManagerAPIImpl.processNextJob(JobQueueManagerAPIImpl.java:604) ~[?:?]
	... 8 more
Caused by: com.dotmarketing.exception.DotRuntimeException: jdbc/dotCMSPool - Connection is not available, request timed out after 5991ms.
	at com.dotmarketing.db.DbConnectionFactory.getConnection(DbConnectionFactory.java:236) ~[?:?]
	at com.dotmarketing.common.db.DotConnect.executeQuery(DotConnect.java:599) ~[?:?]
	at com.dotmarketing.common.db.DotConnect.loadResult(DotConnect.java:308) ~[?:?]
	at com.dotmarketing.common.db.DotConnect.loadObjectResults(DotConnect.java:997) ~[?:?]
	at com.dotcms.jobs.business.queue.PostgresJobQueue.nextJob(PostgresJobQueue.java:562) ~[?:?]
	at com.dotcms.jobs.business.api.JobQueueManagerAPIImpl.processNextJob(JobQueueManagerAPIImpl.java:604) ~[?:?]
	... 8 more
Caused by: java.sql.SQLTransientConnectionException: jdbc/dotCMSPool - Connection is not available, request timed out after 5991ms.
	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:689) ~[?:?]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:196) ~[?:?]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:161) ~[?:?]
	at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100) ~[?:?]
	at com.dotmarketing.db.DbConnectionFactory.getConnection(DbConnectionFactory.java:223) ~[?:?]
	at com.dotmarketing.common.db.DotConnect.executeQuery(DotConnect.java:599) ~[?:?]
	at com.dotmarketing.common.db.DotConnect.loadResult(DotConnect.java:308) ~[?:?]
	at com.dotmarketing.common.db.DotConnect.loadObjectResults(DotConnect.java:997) ~[?:?]
	at com.dotcms.jobs.business.queue.PostgresJobQueue.nextJob(PostgresJobQueue.java:562) ~[?:?]
	at com.dotcms.jobs.business.api.JobQueueManagerAPIImpl.processNextJob(JobQueueManagerAPIImpl.java:604) ~[?:?]
	... 8 more
23:43:06.004  ERROR license.LicenseManager - Could not detect if server 5b7f00e4-49c5-48a0-8bd5-712cbfd0153c is duplicated
com.dotmarketing.exception.DotDataException: jdbc/dotCMSPool - Connection is not available, request timed out after 5991ms.{
  "SQL": ["SELECT id FROM sitelic WHERE (serverid = ? OR license = ?) AND startup_time != ?"],
  "maxRows": [-1],
  "offest": [0],
  "params": [
    "5b7f00e4-49c5-48a0-8bd5-712cbfd0153c",
    "0toaiuQe4vxQiX/5ewQQj9N6JGBqmb3WUdbMBmCUC17vYK86AX3HJ6PZoNI70BfRKzMBhRurLeICbPxZIOYhrwJKk7lgT39564Rt56FDOZAhvEidREV3YKZZrdffGGgF1sLGilVWKlU69i0wEoCD+SbtmT5DEdFamklkfvUtyKMAAAAIRGV2IFRlc3QAAAAIAAABWbfTJtkAAAAIAAAAAAAAAAAAAAAEAAAB9AAAACRjNmVhYzkzZS0wOGNiLTRjOTItYmFlMi05NmNmYjY4Y2ZjMDMAAAAEcHJvZAAAAAQAAAABAAAABAAAAAAAAAAEAAABLAAAACQxNzUwOTlkMS1iMzM1LTRmYzYtOTYzYS0wMGM5YTdhYTcyYzQ=",
    1731626407189
  ]
}

This problem arose with very little load

Steps to Reproduce

Unfortunately, I do not have a pattern to reproduce this
It started happening on a long-running instance. Like the next day

Acceptance Criteria

We need to revise the Database connection pool configuration (dotCMSPool) and determine if any different configuration can resolve the problem or ensure recovery. Perhaps a dedicated pool or something.

dotCMS Version

main

Proposed Objective

Technical User Experience

Proposed Priority

Priority 2 - Important

External Links... Slack Conversations, Support Tickets, Figma Designs, etc.

No response

Assumptions & Initiation Needs

No response

Quality Assurance Notes & Workarounds

No response

Sub-Tasks & Estimates

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Next 1-3 Sprints
Development

No branches or pull requests

2 participants