Tested for concurrency? #11

Open
dtoso-skymesh opened this issue Sep 19, 2015 · 5 comments

Comments

@dtoso-skymesh

In the past, running multiple copies of wkhtmltopdf concurrently caused problems: there were threading issues, and some named pipes used the same filesystem path across multiple wkhtmltopdf processes. (I found this out after an ugly incident in which customers received other customers' PDFs during a parallelized batch run.)

wkhtmltopdf seems to have had many version bumps since then, but nothing I've read from the commits screams out that this issue has been fixed.

In openlabs/docker-wkhtmltopdf-aas the gunicorn WSGI daemon seems to fork on request, so if the concurrency issue still exists in wkhtmltopdf then this service exports the problem to the service's users.

In my use case, I needed to replace the version of wkhtmltopdf shipped with openlabs/docker-wkhtmltopdf-aas with a statically linked copy of wkhtmltopdf 0.10.0rc2, because the PDF output from identical HTML had changed over the years due to WebKit HTML rendering fixes. (I have legacy HTML that would be a massive PITA to change.)

As I know that at least my version of wkhtmltopdf (0.10.0rc2) has concurrency issues, I'm treating Docker as an isolation mechanism rather than simply a deployment helper. I have 20 identical containers running with a home-made HTTP load-balancing proxy sitting in front of them. It hands off (unmodified) requests to available containers and makes subsequent requests wait until workers become available (by simply blocking on the HTTP response).

@sharoonthomas

Testing the returned content of a PDF is a PITA. Any ideas on how a concurrency test could be done?

@dtoso-skymesh
Author

I wrote a Perl script (call it 'single.pl') that:

  • generates a random ID and MD5s it,
  • prints the <MD5> without a newline,
  • takes known, simple HTML and substitutes the <MD5> into that HTML,
  • makes a JSON-mode HTTP POST request to the service,
  • checks the output for the expected <MD5> using pdftotext from poppler-utils.

The comparison is done through this pipeline:

pdftotext - - | grep <MD5>
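
The steps above could be sketched roughly as follows. This is a Python stand-in, not the original single.pl (which was Perl and is not shown in this thread). `SERVICE_URL` and `HTML_TEMPLATE` are hypothetical, and the `{"contents": <base64 HTML>}` payload shape is an assumption about the service's JSON mode that should be checked against its README.

```python
import base64
import hashlib
import json
import subprocess
import urllib.request
import uuid

# Hypothetical endpoint; replace with wherever the service is listening.
SERVICE_URL = "http://localhost:8080/"

# Known, simple HTML with a placeholder for the token.
HTML_TEMPLATE = "<html><body><p>{token}</p></body></html>"


def make_token() -> str:
    """Return the MD5 hex digest of a random ID."""
    return hashlib.md5(uuid.uuid4().bytes).hexdigest()


def build_request(token: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Build a JSON POST whose HTML body has the token embedded."""
    html = HTML_TEMPLATE.format(token=token)
    # Assumption: JSON mode expects the HTML base64-encoded under "contents".
    body = json.dumps({"contents": base64.b64encode(html.encode()).decode()})
    return urllib.request.Request(
        url, data=body.encode(), headers={"Content-Type": "application/json"}
    )


def pdf_contains(pdf_bytes: bytes, token: str) -> bool:
    """Same check as `pdftotext - - | grep <MD5>`; needs poppler-utils."""
    result = subprocess.run(
        ["pdftotext", "-", "-"], input=pdf_bytes, capture_output=True, check=True
    )
    return token in result.stdout.decode(errors="replace")


def run_once() -> bool:
    """One request: True if the returned PDF contains the token we sent."""
    token = make_token()
    with urllib.request.urlopen(build_request(token), timeout=60) as resp:
        return pdf_contains(resp.read(), token)
```

Because every request carries a unique token, a PDF that comes back with someone else's token (or none) shows up as a failed check.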

Then I wrote another Perl script (call it 'bench.pl') that forks 5 children, where each child executes single.pl 20 times with a randomised Time::HiRes::usleep between requests. I log the command line and the result of the grep to a file and then grep that file for mismatches.
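
A minimal sketch of that kind of driver, again in Python and not the original bench.pl. It uses threads where the original forked child processes, which should be adequate for a client that mostly waits on HTTP. `run_once` is a hypothetical callable that performs one request and returns True when the PDF matched; all names here are illustrative.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Row = Tuple[int, int, bool]  # (worker id, run index, matched?)


def worker(worker_id: int, runs: int, run_once: Callable[[], bool],
           max_sleep: float) -> List[Row]:
    """Call run_once `runs` times with a random pause before each call."""
    rows = []
    for i in range(runs):
        time.sleep(random.uniform(0, max_sleep))
        try:
            ok = run_once()
        except Exception:
            ok = False  # count transport errors as failures too
        rows.append((worker_id, i, ok))
    return rows


def bench(run_once: Callable[[], bool], workers: int = 5, runs: int = 20,
          max_sleep: float = 0.05) -> List[Row]:
    """Run `workers` concurrent workers and collect every result row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, w, runs, run_once, max_sleep)
                   for w in range(workers)]
        return [row for f in futures for row in f.result()]


def mismatches(rows: List[Row]) -> List[Row]:
    """Rows where the returned PDF did not contain the expected token."""
    return [row for row in rows if not row[2]]
```

Any non-empty result from `mismatches(bench(run_once))` means at least one request got back a PDF that did not contain its own token.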

@sharoonthomas

@dtoso-skymesh 👍 thank you

@alicpr

alicpr commented Jan 15, 2021

We are going to use this at enterprise scale, handling 100 req/s on each server. Does the issue still exist? Is there an alternative solution available?

@dtoso-skymesh
Author

@alicpr not sure if @sharoonthomas has fixed this issue, but I worked around it by running many docker containers each running wkhtmltopdf-aas. The solution was to only send one request at a time to each container.

If you've got a fast enough machine (or machines), you could just launch them on demand (docker run) from, say, a CGI script.
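
As a rough illustration of the on-demand approach, here is a sketch that starts a throwaway container per job and stops it afterwards. The container port (80) and the helper names are assumptions, not something confirmed in this thread. The caller would also need to wait until the service inside the container accepts connections before sending the request.

```python
import subprocess
from typing import List

IMAGE = "openlabs/docker-wkhtmltopdf-aas"


def docker_run_argv(host_port: int, image: str = IMAGE,
                    container_port: int = 80) -> List[str]:
    """Command line for a detached, self-removing container.

    Assumption: the service listens on `container_port` inside the container.
    """
    return ["docker", "run", "--rm", "-d",
            "-p", f"127.0.0.1:{host_port}:{container_port}", image]


def start_container(host_port: int) -> str:
    """Start the container and return its ID (docker prints it on stdout)."""
    out = subprocess.run(docker_run_argv(host_port),
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


def stop_container(container_id: str) -> None:
    """Stop the container; --rm then removes it."""
    subprocess.run(["docker", "stop", container_id], check=True)
```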

On our hardware that wasn't fast enough so I came up with an HTTP-proxy based solution.
Basically it does:

  • at startup & periodically: uses the docker socket protocol to ask for a list of available docker containers running wkhtmltopdf-aas, along with their NAT'd IP address and port.
  • uses select(2) to respond to large numbers of requesting HTTP clients -- they see the proxy as a blocking server
  • maintains a mapping of active docker containers to client sockets, blocking an HTTP client if none is available
  • when a container becomes available (new container or previously completed/aborted request) the request is forwarded in a non-blocking manner to the mapped docker container's IP address and port.
  • responses to container HTTP requests are forwarded back to the clients in a non-blocking fashion
  • when a client-response is completed, the container mapping is removed to service requests for other clients
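
The core rule in the list above (at most one in-flight request per container, everyone else waits) can be sketched without the select(2) machinery. The version below is a simplified stand-in that uses a thread-safe queue of idle backends in place of non-blocking sockets; `BackendPool` and `lease` are hypothetical names, and container discovery via the docker socket is left out.

```python
import queue
from contextlib import contextmanager
from typing import Iterable, Iterator, Optional, Tuple

Backend = Tuple[str, int]  # (ip, port) of one wkhtmltopdf-aas container


class BackendPool:
    """Hands each idle container to at most one client at a time."""

    def __init__(self, backends: Iterable[Backend]) -> None:
        self._idle: "queue.Queue[Backend]" = queue.Queue()
        for backend in backends:
            self._idle.put(backend)

    def add(self, backend: Backend) -> None:
        """Register a newly discovered container."""
        self._idle.put(backend)

    @contextmanager
    def lease(self, timeout: Optional[float] = None) -> Iterator[Backend]:
        """Block until a container is idle, then hold it for one request."""
        backend = self._idle.get(timeout=timeout)  # waits while all are busy
        try:
            yield backend
        finally:
            # Completed or aborted, the container goes back to the idle set.
            self._idle.put(backend)
```

A request handler would wrap its forwarding step in `with pool.lease() as (ip, port):`, so a client simply blocks until some container is free.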

I've found the limiting factor to be the server hardware.
