User-paid GPU pool available

2019-12-11

The UPPMAX board has decided to offer users an opportunity to
co-finance a pool of Nvidia Tesla T4 cards with 16 GB RAM each, at a price of
roughly 20 000 SEK/card. Read on if your research group may be interested in buying access to GPU capability at UPPMAX.

These cards are slower than the fastest gaming cards,
which we are not allowed to use in our datacenter due to licensing
restrictions. However, we believe that building a shared pool of moderately
powered GPUs is more beneficial than each interested group investing in its
own resource. You will not need to spend time administering the system,
and you will get a priority boost for monthly GPU hours equivalent to your
investment. That is, if we reach a pool of 12 cards, a researcher who has
paid for one of those can burst and run on all 12 with high priority
(assuming no other high-priority jobs are present in the queue), even if the
researcher's contribution to the pool only consisted of a single card. This
will speed up exploratory, iterative workflows.
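
As a rough illustration of the arithmetic behind the burst example, the sketch below assumes that "monthly GPU hours equivalent to your investment" means the hours your funded cards would deliver if they ran nonstop for a 30-day month. That interpretation, the 30-day month, and the function names are assumptions made for illustration, not the actual UPPMAX accounting rules.

```python
# Illustrative sketch only: the 30-day month and the "nonstop card" reading
# of the priority budget are assumptions, not UPPMAX policy.
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumed accounting period


def monthly_priority_gpu_hours(cards_funded: int) -> int:
    """GPU hours per month matching the funded cards running nonstop."""
    return cards_funded * HOURS_PER_DAY * DAYS_PER_MONTH


def burst_wallclock_hours(cards_funded: int, pool_size: int) -> float:
    """Wall-clock hours the priority budget lasts when bursting on the whole pool."""
    return monthly_priority_gpu_hours(cards_funded) / pool_size


if __name__ == "__main__":
    # One funded card in a pool of 12 cards, as in the example above.
    print(monthly_priority_gpu_hours(1))  # 720
    print(burst_wallclock_hours(1, 12))   # 60.0
```

Under these assumptions, a group that funds one card could run on all 12 cards with high priority for about 60 wall-clock hours per month, instead of occupying a single card for the whole month.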

When no high-priority job requires access to the GPUs, all UPPMAX users
will also be allowed to use them, giving any user the ability to try out e.g.
new AI algorithms on a limited scale.

Primarily, we want to place these cards in existing Snowy nodes, but we can
also consider placing them within the UPPMAX region of the SNIC Science Cloud.
The main reason for preferring Snowy is that its HPC scheduler, Slurm, allows
more efficient sharing of GPU resources.

We commit to maintaining the availability of the GPUs for their full warranty
period of 3 years.

If you are interested, contact support@uppmax.uu.se, UPPMAX Technical
Coordinator carl.nettelblad@uppmax.uu.se, or UPPMAX Director
elisabeth.larsson@uppmax.uu.se.

FREQUENTLY ASKED QUESTIONS
--------------------------
Q: Why do I have to pay for GPUs?
A: SNIC already offers GPU resources on e.g. Kebnekaise and Tegner. There is
   also the private Uppsala GPU resource Davinci. However, we get inquiries about
   GPU availability on our general-purpose resources, and hence we have decided
   to offer this opportunity for co-funding.

Q: I heard that there will be a Swedish national AI resource with GPUs?
A: That resource is still being designed. An early phase might be installed
   during the first half of 2020. It will be a SNIC resource for research, and
   allocation might become competitive. If you co-fund one or several
   GPUs in our clusters, you are guaranteed access. We might have a few cards
   installed even before the end of 2019.

Q: Can I buy GPUs for teaching?
A: Yes! If you have educational funding that you are allowed to spend, we are
   permitted to sell you GPU capacity in the same way. If you have specific
   scheduled labs, we can make sure to "spend" your high-priority time
   to match the course needs with appropriate reservations.

Q: What about sensitive data (Bianca)?
A: The current Bianca nodes do not accept this type of GPU straight away. If
   you are interested in co-funding a Bianca GPU capability, get in touch with
   us nevertheless, since we are actively exploring such opportunities as well.  
   An active decision has been taken that the national AI resource will not be
   designed to handle sensitive personal data. SNIC might fund a GPU capability
   for sensitive data in the future, but not in the near term.

Q: How many GPUs will be placed in each node?
A: A single one, due to the design of the Snowy nodes. We know that several
   users want powerful multi-GPU nodes, but one such node costs closer to
   800 000 SEK than 20 000 SEK. If there is enough interest, we are open to
   co-funding such nodes as well.

Q: Gaming GPUs are so much cheaper. Why don't you buy those?
A: The Nvidia license prohibits "datacenter use" for their GeForce and Titan
   GPUs. In addition, a single RTX 2080 Ti card puts out 250 W of heat. A
   single T4 card, based on the same Turing architecture but tuned differently,
   puts out a maximum of 70 W, with far more FLOPS/W (but lower net performance). By
   pooling resources in the datacenter, you gain access to the overall UPPMAX
   infrastructure, support, and the idle GPU time of other members in the pool.

