Running Experiments on Multiple GPUs on Kaya
Provided by Kai Niu, supported by Chris
Multiple GPU Usage - UWA KAYA:
(/group/pmc015/kniu/kai_phd/conda_env/champ) bash-4.4$ salloc -p pophealth --mem=80G -N 2 -n 8 --gres=gpu:a100:2
salloc: Job allocation 550543 has been revoked.
salloc: error: Job submit/allocate failed: Requested node configuration is not available
(/group/pmc015/kniu/kai_phd/conda_env/champ) bash-4.4$ hostname
n006.hpc.uwa.edu.au
(/group/pmc015/kniu/kai_phd/conda_env/champ) bash-4.4$ exit
srun: error: n006: task 0: Exited with exit code 130
salloc: Relinquishing job allocation 550408
salloc: Job allocation 550408 has been revoked.
(/group/pmc015/kniu/kai_phd/conda_env/champ) bash-4.4$ sinfo
PARTITION     AVAIL  TIMELIMIT   NODES  STATE  NODELIST
work          up     3-00:00:00  2      down*  n[023,027]
work          up     3-00:00:00  3      drain  n[026,029,032]
work          up     3-00:00:00  5      mix    n[010,015-016,024,028]
work          up     3-00:00:00  12     idle   n[011-013,017-019,022,025,030-031,033-034]
long          up     7-00:00:00  1      mix    n021
long          up     7-00:00:00  1      alloc  n020
gpu           up     3-00:00:00  13     mix    n[001,003-005,037-044,046]
pophealth     up     15-00:00:0  2      idle   n[002,006]
ondemand      up     12:00:00    1      down*  n027
ondemand      up     12:00:00    1      drain  n026
ondemand      up     12:00:00    2      mix    n[024,028]
ondemand      up     12:00:00    1      idle   n025
ondemand-gpu  up     12:00:00    8      mix    n[036-043]
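
The salloc error at the top of the session ("Requested node configuration is not available") usually means that no set of nodes in the partition can satisfy the combined request (node count, memory, GRES). The session does not show which part of the request was the problem, so the following is only a sketch of how one might check what each pophealth node offers before retrying. The helper names node_resources and idle_rows are hypothetical; the sinfo options and format fields are standard Slurm, and the partition name is taken from the output above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Slurm or Kaya.

# List node name, CPUs, memory (MB), GRES and state for one partition.
# -N prints one line per node; %N %c %m %G %T are standard sinfo format fields.
node_resources() {
    sinfo -p "$1" -N -o '%N %c %m %G %T'
}

# Read default-format sinfo output on stdin and keep only rows whose
# STATE column is exactly "idle", printing: partition, node count, node list.
idle_rows() {
    awk '$5 == "idle" { print $1, $4, $6 }'
}

# Only call Slurm where it is actually installed (e.g. on a Kaya login node).
if command -v sinfo >/dev/null 2>&1; then
    node_resources pophealth
    sinfo | idle_rows
fi
```

If the GRES column reports fewer A100s per node than the `--gres=gpu:a100:2` in the request, or the memory column is below the requested 80G, that mismatch would be a plausible explanation for the rejection.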