Doc 24 many tasks per gpu #228

notoraptor · 2023-10-20T13:41:11Z

Hi @lebrice! Here is some starter code for DOC-24 (Example: using a large GPU efficiently by launching several jobs per GPU, with SLURM "tasks").

I'm not sure the code is correct: when I test it, the tasks always seem to run sequentially. I don't understand why yet.

docs/examples/good_practices/many_tasks_per_gpu/job.sh

lebrice

Thanks!
@obilaniu if you happen to have better wording to suggest for index.rst, that would be useful. Otherwise no worries.

obilaniu · 2023-10-20T21:37:26Z

Are you aware of

       --ntasks-per-gpu=<ntasks>
              Request that there are ntasks tasks invoked for every GPU. This
              option can work in two ways: 1) either specify --ntasks in
              addition, in which case a type-less GPU specification will be
              automatically determined to satisfy --ntasks-per-gpu, or 2)
              specify the GPUs wanted (e.g. via --gpus or --gres) without
              specifying --ntasks, and the total task count will be
              automatically determined. The number of CPUs needed will be
              automatically increased if necessary to allow for any calculated
              task count. This option will implicitly set
              --gpu-bind=single:<ntasks>, but that can be overridden with an
              explicit --gpu-bind specification. This option is not compatible
              with a node range (i.e. -N<minnodes-maxnodes>). This option is
              not compatible with --gpus-per-task, --gpus-per-socket, or
              --ntasks-per-node. This option is not supported unless
              SelectType=cons_tres is configured (either directly or
              indirectly on Cray systems).

?

notoraptor · 2023-10-23T18:43:58Z

Hi @obilaniu! Indeed, I didn't know about --ntasks-per-gpu. I tested it, and it works! See the latest commit. Rendered here: https://mila-docs--228.org.readthedocs.build/en/228/examples/good_practices/many_tasks_per_gpu/index.html

I do note, however, that the combination --gpus=1 --ntasks=1 seems equivalent to the combination --gpus=1 --ntasks-per-gpu=2, at least in this example!
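For readers following along, here is a minimal sketch of how each task launched by a single `srun` could identify itself and seed its RNG from `SLURM_PROCID`, as the commit history of this PR describes. It is not the PR's actual `main.py`: the function name `task_info` and the fallback defaults are assumptions made for illustration, while `SLURM_PROCID` and `SLURM_NTASKS` are the environment variables SLURM sets for each task.

```python
import os
import random


def task_info(env=None):
    """Return (rank, world_size, seed) for the current SLURM task.

    SLURM_PROCID and SLURM_NTASKS are set by SLURM for each task started
    by srun. The defaults (0 and 1) are an assumption so that this sketch
    also runs outside of a SLURM job.
    """
    env = os.environ if env is None else env
    rank = int(env.get("SLURM_PROCID", 0))
    world_size = int(env.get("SLURM_NTASKS", 1))
    # Assumption: use the task rank directly as the seed, so that tasks
    # sharing one GPU do not all draw the same random numbers.
    seed = rank
    return rank, world_size, seed


if __name__ == "__main__":
    rank, world_size, seed = task_info()
    random.seed(seed)
    print(f"task {rank} of {world_size}, seeded with {seed}")
```

With `--ntasks-per-gpu=2` and one GPU, one would expect two copies of this script to start, one seeing rank 0 and the other rank 1.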

obilaniu · 2023-10-26T04:43:31Z

docs/examples/good_practices/many_tasks_per_gpu/README.rst

+ -#SBATCH --gpus-per-task=rtx8000:1
+ -#SBATCH --cpus-per-task=4
+ -#SBATCH --ntasks-per-node=1
+ +#SBATCH --gpus=1
+ +#SBATCH --ntasks-per-gpu=2


Why not keep --gres=gpu:rtx8000:1 --ntasks-per-gpu=2 --cpus-per-task=4? Each of them serves a purpose.

I have a slight aversion to --gpus/-G because that flag gives the total number of GPUs in the job, and does not constrain SLURM to allocate them all on one node, or even evenly across several nodes.
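As one illustration of why `--cpus-per-task` is worth keeping, a task can read the value back and size its worker pool accordingly. The sketch below is an assumption-laden example rather than code from this PR: `cpus_for_this_task` and `dataloader_workers` are hypothetical helper names, and the "keep one CPU for the main process" policy is only one possible choice. `SLURM_CPUS_PER_TASK` is the variable SLURM sets when `--cpus-per-task` is requested.

```python
import os


def cpus_for_this_task(env=None, default=1):
    """Number of CPUs SLURM reserved for each task.

    SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested,
    so fall back to `default` (an assumption) when it is missing.
    """
    env = os.environ if env is None else env
    return int(env.get("SLURM_CPUS_PER_TASK", default))


def dataloader_workers(cpus):
    """Hypothetical policy: keep one CPU for the main process."""
    return max(cpus - 1, 0)


if __name__ == "__main__":
    cpus = cpus_for_this_task()
    print(f"{cpus} CPU(s) for this task, {dataloader_workers(cpus)} worker(s)")
```

Without `--cpus-per-task`, the two tasks sharing the GPU would have no explicit per-task CPU budget to read back here.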

lebrice

A small detail: it's important that in this example people request a specific GPU type from sbatch, because we want them to have an idea of how much VRAM a "task" needs, and therefore to know how many tasks they could comfortably fit on the chosen GPU type.

BTW, unrelated, but I'll soon open another PR that changes the narrative of the examples a bit, to make it more of a "walkthrough", where users start by learning to monitor their resource usage and identify bottlenecks, then how to use the GPU efficiently (this example), and then how to launch several jobs with a job array.
This would be one of the "narrative threads" present in the examples.
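The VRAM reasoning above can be sketched as a back-of-the-envelope calculation. Everything here is illustrative: `tasks_that_fit` is a hypothetical helper, the 10% headroom is an arbitrary safety margin, the 48 GiB figure is the nominal memory of an RTX 8000, and the per-task usage would have to be measured on a real run.

```python
def tasks_that_fit(gpu_vram_gib, task_vram_gib, headroom=0.9):
    """Estimate how many identical tasks fit comfortably on one GPU.

    gpu_vram_gib:  total memory of the chosen GPU type.
    task_vram_gib: peak memory measured for a single task.
    headroom:      fraction of the GPU memory we allow ourselves to use
                   (an assumed safety margin, not a SLURM setting).
    """
    if task_vram_gib <= 0:
        raise ValueError("task_vram_gib must be positive")
    usable = gpu_vram_gib * headroom
    return int(usable // task_vram_gib)


if __name__ == "__main__":
    # Assumed numbers: a 48 GiB GPU and a task peaking at 20 GiB.
    print(tasks_that_fit(48, 20))
```

The result is a candidate value for `--ntasks-per-gpu`, which is why the GPU type needs to be pinned in the job script for the estimate to hold.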

notoraptor · 2023-10-31T14:09:29Z

@obilaniu @lebrice Updated!

Signed-off-by: Fabrice Normandin <[email protected]>

notoraptor added 4 commits October 19, 2023 15:12

Prepare code for DOC-24 (Exemple: utiliser un gros GPU de manière eff…

47d9e84

…icace en lancant plusieurs jobs par GPU, avec des "task" SLURM)

Insert diffs into index.rst

34f1b4b

Use ntasks

605efcd

add comments

0284aad

notoraptor requested review from btravouillon and satyaog as code owners October 20, 2023 13:41

lebrice requested changes Oct 20, 2023

docs/examples/good_practices/many_tasks_per_gpu/job.sh (outdated, resolved)

notoraptor added 3 commits October 20, 2023 13:46

Call srun once and parameterize main.py using environment variable

6bd4e06

Add context to doc.

61d63fe

Initialize RNGs using SLURM_PROCID as random seed.

7e47732

lebrice approved these changes Oct 20, 2023

use --ntasks-per-gpu instead of --ntasks

2bee9fb

obilaniu reviewed Oct 26, 2023

lebrice requested changes Oct 27, 2023

Still use --cpus-per-task, and --gres instead of --gpus

dfbfd60

obilaniu approved these changes Oct 31, 2023

lebrice changed the title from "(WIP) Doc 24 many tasks per gpu" to "Doc 24 many tasks per gpu" Nov 6, 2023

lebrice added 4 commits November 6, 2023 13:53

Merge branch 'master' into doc-24-many-tasks-per-GPU

dd64855

Add argparse block to example + regen diffs

210e483

Signed-off-by: Fabrice Normandin <[email protected]>

Merge branch 'master' into doc-24-many-tasks-per-GPU

668a086

Fix some unwanted differences in other examples

970cb63

Signed-off-by: Fabrice Normandin <[email protected]>

lebrice self-requested a review November 6, 2023 22:13

lebrice approved these changes Nov 6, 2023

lebrice merged commit 6e9f6eb into mila-iqia:master Nov 6, 2023
4 checks passed

notoraptor deleted the doc-24-many-tasks-per-GPU branch November 7, 2023 15:35
