General GPU Tips
Background
Having access to these amazing machines is truly a privilege. Please consider the tips and best practices listed here to make sure you are not hogging all the computing time for yourself.
Understanding AI2ES Nodes
Current Setup
As of 18 Dec 2022, AI2ES owns a total of 20 GPUs: 2 NVIDIA V100s
(previous generation) and 18 NVIDIA A100s (current generation). In the nodes with more than one GPU, “NVLink” connects the GPUs
at high bandwidth, so they can share information quickly. Each node has a local disk for fast computations. This local disk is called $lscratch, and its size varies per node. The
amount of $lscratch you get also scales with the number of CPU threads you request with #SBATCH -n X, where X is the number of threads.
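As a rough illustration of how a job might use the local disk, the sketch below copies a dataset onto local scratch before running anything against it. It assumes the scratch path is exported inside a job as the environment variable `LSCRATCH` (check the exact name on your node), and the function name `stage_in`, the thread count, and the usage paths are hypothetical.

```shell
#!/bin/bash
#SBATCH -p ai2es
#SBATCH -n 8   # illustrative value; more threads means a larger share of local scratch

# ASSUMPTION: the node-local scratch path is exported as $LSCRATCH inside a job.
# Fall back to a fresh temporary directory so the sketch also runs elsewhere.
scratch_dir="${LSCRATCH:-$(mktemp -d)}"

# Copy a dataset directory onto local scratch and print its new location.
stage_in() {
    local src="$1"
    local dest="${scratch_dir}/$(basename "$src")"
    mkdir -p "$dest"
    cp -r "$src"/. "$dest"/
    echo "$dest"
}

# Hypothetical usage inside a job:
#   data_dir="$(stage_in "$HOME/my_dataset")"
#   python train.py --data "$data_dir"
```

Reading from the local copy avoids repeated trips to the shared file system; remember to copy any results you need back before the job ends.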
Here is a summary table of the nodes and their attributes:
| Node | $lscratch | # of CPU threads | # of GPUs | GPU type | GPU RAM per card |
|---|---|---|---|---|---|
| c314 | 892GB | 24 | 1 | V100 | 32GB |
| c315 | 892GB | 24 | 1 | V100 | 32GB |
| c731 | 852GB | 112 | 2 | A100 | 40GB |
| c732 | 384GB | 128 | 4 | A100 | 40GB |
| c733 | 384GB | 128 | 4 | A100 | 40GB |
| c829 | 384GB | 128 | 4 | A100 | 40GB |
| c830 | 384GB | 128 | 4 | A100 | 40GB |
| c980 | 384GB | 128 | 2 | H100 | 80GB |
| c981 | 384GB | 128 | 2 | H100 | 80GB |
New Schooner Resources
In addition to the new AI2ES resources, AI2ES project members will also have
access to OSCER-owned GPUs. At this time I don’t have specifics, but I know
there are partitions named debug_gpu, gpu, and gpu_a100. Please reach out to OSCER help
to figure out the specifics of these nodes.
Note
Know that the OSCER-owned GPUs are for general use by all Schooner users (more than 1,300), so queue times will likely be much longer than on our ai2es nodes.
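While waiting on details from OSCER, one way to get a first look at these partitions is Slurm's own `sinfo` command. The sketch below is only a starting point: the function name `show_gpu_partitions` is hypothetical, and the output depends entirely on how the cluster is configured.

```shell
#!/bin/bash
# Print a one-line summary per OSCER GPU partition named in the text.
show_gpu_partitions() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this on a Schooner login node"
        return 0
    fi
    # %P = partition, %D = node count, %G = generic resources (GPUs), %l = time limit
    sinfo -p debug_gpu,gpu,gpu_a100 -o "%P %D %G %l"
}

show_gpu_partitions
```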
Requesting AI2ES Nodes
To request ANY of the available nodes, add the following line to your sbatch file:
#SBATCH -p ai2es
If you specifically want a V100, add the following:
#SBATCH -p ai2es_v100
Alternatively, if you specifically want an A100, add the following:
#SBATCH -p ai2es_a100
If you specifically want the A100 node with 2 GPUs, use the following:
#SBATCH -p ai2es_a100_2
Or, if you specifically want an A100 node with 4 GPUs, use the following:
#SBATCH -p ai2es_a100_4
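The partition choices above can be summarized in a small helper. This is a minimal sketch, not an official tool: the function name `pick_partition` and the script name in the usage comment are hypothetical, and only the partition names come from this page.

```shell
#!/bin/bash
# Map a GPU type (and optional GPUs-per-node count) to one of the partitions named above.
pick_partition() {
    local gpu_type="${1:-any}"
    local gpus_per_node="${2:-}"
    case "${gpu_type}:${gpus_per_node}" in
        any:*)   echo "ai2es" ;;          # any available AI2ES node
        v100:*)  echo "ai2es_v100" ;;     # V100 nodes
        a100:)   echo "ai2es_a100" ;;     # any A100 node
        a100:2)  echo "ai2es_a100_2" ;;   # A100 node with 2 GPUs
        a100:4)  echo "ai2es_a100_4" ;;   # A100 nodes with 4 GPUs
        *)       echo "unknown request: ${gpu_type} ${gpus_per_node}" >&2; return 1 ;;
    esac
}

# Hypothetical usage: pass the partition on the command line instead of in the script.
#   sbatch -p "$(pick_partition a100 4)" my_job.sh
```

Options given on the `sbatch` command line take precedence over `#SBATCH` lines inside the script, so the same job file can be pointed at different partitions without editing it.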