This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Reduce concurrency to 2 Gunicorn workers #178

Merged: andersy005 merged 1 commit into main from reduce-concurrency on Nov 2, 2022

Conversation

@andersy005 (Member) commented Nov 2, 2022:

This is an attempt at addressing recent memory issues

[Screenshot 2022-11-02 at 11.07.33 AM]

[Screenshot 2022-11-02 at 11.06.56 AM]
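
For reference, a rough sketch of what a two-worker Gunicorn invocation could look like. The app module path, worker class, and bind address below are assumptions (the uvicorn-style startup messages in the logs later in this thread suggest uvicorn workers), not this repo's actual entrypoint:

```sh
# Sketch only: run the API with two Gunicorn workers instead of a higher default.
# app:app and the worker class are placeholders, not this repo's actual values.
gunicorn app:app \
  --workers 2 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:$PORT
```

Fewer workers means fewer copies of the application (and of any subprocess overhead) competing for the dyno's 512 MB quota.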

@cisaacstern temporarily deployed this to pforge-pr-178 on November 2, 2022 at 17:57 (now inactive).
@yuvipanda commented:

This could also be because of memory use when running the recipe files?

In addition, I'd suggest increasing the dyno size on Heroku too!

@andersy005 (Member, Author) commented:

> This could also be because of memory use when running the recipe files?

I hadn't thought about this. I presume by "recipe runs" you mean when we execute the recipe modules via pangeo-forge-runner expand-meta ... for registration, right?

@andersy005 (Member, Author) commented:

> In addition, I'd suggest increasing the dyno size on Heroku too!

Do you have any recommendations? We are currently using a hobby dyno, and it seems the next dyno type, standard-1x, isn't that different memory-wise.

[Screenshot 2022-11-02 at 12.08.46 PM]
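
If we do go the resize route, a hedged sketch of the CLI steps (the app name is a placeholder, and standard-2x is, as far as I know, the first dyno type with a larger memory quota than hobby):

```sh
# Sketch: bump the dyno type, then confirm the change (app name is a placeholder)
heroku ps:type web=standard-2x --app <heroku-app-name>
heroku ps --app <heroku-app-name>
```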

@andersy005 (Member, Author) commented:

> In addition, I'd suggest increasing the dyno size on Heroku too!

As a first pass, I'm going to enable log-runtime-metrics to track load and memory usage for our current dyno: https://devcenter.heroku.com/articles/log-runtime-metrics.
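
For reference, the commands from that page are roughly (app name is a placeholder):

```sh
# Enable per-dyno runtime metrics and restart so they take effect
heroku labs:enable log-runtime-metrics --app <heroku-app-name>
heroku restart --app <heroku-app-name>
# Load/memory samples then show up as sample#load_avg_* / sample#memory_* log lines
heroku logs --tail --app <heroku-app-name>
```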

@andersy005 andersy005 merged commit cb206da into main Nov 2, 2022
@andersy005 andersy005 deleted the reduce-concurrency branch November 2, 2022 18:22
@yuvipanda commented:

@andersy005 measuring seems like the right next step!

@andersy005 (Member, Author) commented:

@yuvipanda, something is going on during the pangeo-forge-runner expand-meta ... call.

Here's the memory profile after a reboot:

2022-11-02T18:30:10.845831+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.54df4cd5-f10c-4baa-b412-32d8fa56c24d sample#memory_total=149.32MB sample#memory_rss=148.88MB sample#memory_cache=0.45MB sample#memory_swap=0.00MB sample#memory_pgpgin=69641pages sample#memory_pgpgout=31414pages sample#memory_quota=512.00MB

I then launched a test run for this recipe: pangeo-forge/staged-recipes#215

After calling pangeo-forge-runner expand-meta ..., I started noticing memory spikes:

2022-11-02T18:32:02.363030+00:00 app[web.1]: 2022-11-02 18:32:02,362 DEBUG - orchestrator - Running command: ['pangeo-forge-runner', 'bake', '--repo=https://github.com/norlandrhagen/staged-recipes', '--ref=8308f82cbdede7d8039a72e4137e5d16c800eb89', '--json', '--prune', '--Bake.recipe_id=NWM', '-f=/tmp/tmp985ps8od.json', '--feedstock-subdir=recipes/NWM']
2022-11-02T18:32:14.054996+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.54df4cd5-f10c-4baa-b412-32d8fa56c24d sample#load_avg_1m=0.63
2022-11-02T18:32:14.188714+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.54df4cd5-f10c-4baa-b412-32d8fa56c24d sample#memory_total=329.25MB sample#memory_rss=326.84MB sample#memory_cache=2.41MB sample#memory_swap=0.00MB sample#memory_pgpgin=122482pages sample#memory_pgpgout=38195pages sample#memory_quota=512.00MB

Notice how memory increased from 149 MB to 326 MB. Memory eventually blew past the quota, and Heroku restarted the workers:

2022-11-02T18:34:53.563144+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.54df4cd5-f10c-4baa-b412-32d8fa56c24d sample#memory_total=826.02MB sample#memory_rss=511.88MB sample#memory_cache=0.00MB sample#memory_swap=314.14MB sample#memory_pgpgin=255319pages sample#memory_pgpgout=124278pages sample#memory_quota=512.00MB
2022-11-02T18:34:53.720844+00:00 heroku[web.1]: Process running mem=826M(161.3%)
2022-11-02T18:34:53.926451+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)
2022-11-02T18:34:54.931260+00:00 app[web.1]: [2022-11-02 18:34:54 +0000] [57] [CRITICAL] WORKER TIMEOUT (pid:58)
2022-11-02T18:34:54.964405+00:00 app[web.1]: [2022-11-02 18:34:54 +0000] [57] [WARNING] Worker with pid 58 was terminated due to signal 6
2022-11-02T18:34:55.311602+00:00 app[web.1]: [2022-11-02 18:34:55 +0000] [122] [INFO] Booting worker with pid: 122
2022-11-02T18:34:57.219544+00:00 app[web.1]: [2022-11-02 18:34:57 +0000] [122] [INFO] Started server process [122]
2022-11-02T18:34:57.219620+00:00 app[web.1]: [2022-11-02 18:34:57 +0000] [122] [INFO] Waiting for application startup.
2022-11-02T18:34:57.220136+00:00 app[web.1]: [2022-11-02 18:34:57 +0000] [122] [INFO] Application startup complete.

My suspicion is that pangeo-forge-runner's expansion of the meta information is the cause of this spike. I'm not sure whether the S3 crawling in pangeo-forge/staged-recipes#215 is also a reason this recipe in particular is running into these memory issues.
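
One way to confirm the suspicion, locally or in a one-off dyno, would be to run the expand-meta step by itself under GNU time and compare its peak RSS against the 512 MB quota. A rough sketch, with the repo, ref, and subdir mirrored from the bake command logged above (expand-meta's actual flag set may differ, and the config file path is a placeholder):

```sh
# Sketch: measure peak memory of the expand-meta step in isolation
/usr/bin/time -v pangeo-forge-runner expand-meta \
  --repo=https://github.com/norlandrhagen/staged-recipes \
  --ref=8308f82cbdede7d8039a72e4137e5d16c800eb89 \
  --json \
  --feedstock-subdir=recipes/NWM \
  -f=<config.json>
# "Maximum resident set size" in the output is the number to compare
# against the dyno's memory quota.
```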
