User-paid offload storage available at UPPMAX

2019-10-10

Since this spring, we have been offering the Lutra offload storage system at
UPPMAX. This solution is intended for "cold" data which is not accessed or
changed frequently, but which still has to be kept available on our clusters.
This is in contrast to the active project storage provided by SNIC, which
is only intended for current analyses.

Due to the popularity of this service, the UPPMAX board wants to offer current
PIs at UPPMAX the possibility to sign up for additional storage. This time,
it will also be possible to buy storage suitable for sensitive personal data.

For data mounted on Rackham, the price will be 500 SEK/TB/year, and users
will have to commit in units of 50 TB for a period of 4 years. That is, the
minimum cost is 100 000 SEK for storing 50 TB for 4 years (paid in multiple
installments per year).

The Rackham cluster, and hence its offload storage, is unsuitable for
sensitive data of the sort processed on Bianca. The processing of sensitive
data is more time-consuming for our staff. It also requires different service
contracts with our vendors, and includes a more expensive encrypted and
physically secured solution for tape backup. Thus, the price will be
800 SEK/TB/year for the sensitive system, i.e. a minimum of 160 000 SEK for
50 TB over 4 years.
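
As a rough illustration of the pricing described above, the following Python
sketch computes the total and per-year cost of a commitment. Only the prices,
the 50 TB unit, and the 4-year period come from this announcement; the
function name contract_cost_sek and the labels "rackham" and "sensitive" are
made up for the example, and the even per-year split is a simplification, not
a statement about the actual invoicing schedule.

```python
# Sketch of the pricing arithmetic in this announcement.
# Prices, unit size, and contract length are taken from the text.
# The names below are hypothetical and exist only for this illustration.

PRICE_SEK_PER_TB_YEAR = {"rackham": 500, "sensitive": 800}
UNIT_TB = 50          # storage is committed in units of 50 TB
CONTRACT_YEARS = 4    # commitment period in years


def contract_cost_sek(system: str, units: int = 1) -> int:
    """Total cost in SEK over the full period for `units` blocks of 50 TB."""
    if units < 1:
        raise ValueError("at least one 50 TB unit is required")
    return PRICE_SEK_PER_TB_YEAR[system] * UNIT_TB * units * CONTRACT_YEARS


for system in PRICE_SEK_PER_TB_YEAR:
    total = contract_cost_sek(system)
    # Even split per year, shown only to give a sense of the yearly amount.
    print(f"{system}: {total} SEK over {CONTRACT_YEARS} years, "
          f"{total // CONTRACT_YEARS} SEK per year")
```

For one unit this gives 100 000 SEK on the Rackham-mounted system and
160 000 SEK on the sensitive system, matching the minimum costs stated above.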

The intent is to order the hardware in late December with the systems online in
February. A signed agreement with approval from your head of department will
be needed before we order the hardware.

If you are interested, contact support@uppmax.uu.se, the UPPMAX Technical
Coordinator (carl.nettelblad@uppmax.uu.se), or the UPPMAX Director
(elisabeth.larsson@uppmax.uu.se).

FREQUENTLY ASKED QUESTIONS
--------------------------
Q: How is this storage different from existing storage, e.g. Crex and Castor?
A: The normal UPPMAX storage systems are intended for active project data,
   i.e. the data which is needed during the course of a project. You have to
   justify your storage needs in your project applications and storage can be
   rationed when we run out. The storage itself is paid for through SNIC in
   that case. When we run out of space on these resources, we have to be more
   aggressive in urging users to limit their storage needs.

   This storage solution is provided by us, but paid for by its users. We will
   not question your need to store data up to your quota. However, since it is
   not intended for active project data, the solution is tuned for large
   capacity, not for a high rate of write operations. If you need high write
   performance, you should still apply for project storage.

Q: What kind of data can I put there?
A: The use cases we see involve storing various large data sets from old
   projects. This can include the primary results from specific experiments.
   If you ever need to re-analyze the data, you'll have it readily available
   on our clusters. On the resource mounted on Rackham, you are not allowed to
   store sensitive data, with the same interpretation of that concept as is
   currently used for computation and storage project allocations. Typical
   examples of sensitive data we encounter are personally identifiable data
   from population registries, health information systems, and biomolecular
   assays (including genomic data).

Q: I don't need 50 TB. Why don't you offer a smaller volume?
A: We have chosen this limit to keep both the technical and financial
   administration cost-efficient. Even at this price point, a substantial part
   of our costs consists of staff costs for maintaining the solution and
   providing user support.

Q: What will the availability be like? My data is super-critical.
A: We will maintain the same level of availability as for other UPPMAX
   resources, that is, a best-effort intent to maintain continuous operations,
   with monthly service windows. In general, we will not start addressing an
   outage that occurs outside office hours until the next working day. If you
   need to ensure immediate access to the data under all circumstances, we
   recommend that you choose another solution.

Q: What happens if the hardware breaks down?
A: We will have redundancy within the solution, so failure of individual disks
   will not affect user data. In addition, data will be backed up on tape at an
   off-site location.

Q: My budget is already set, I can't pay for this now, but I want to join. What
   do I do?
A: This is our second call of this sort. For now, we plan to continue making
   such calls once a year. Come back at the end of 2020.

Q: How does this relate to other storage offerings and future rules and
   solutions for long-term research data storage?
A: We are currently trying to serve a very concrete need for users who have
   data that cannot easily be considered active project data, but where the
   natural place to access the data, if it is ever needed again, would be
   UPPMAX. In those cases, we think it is better to provide a common solution,
   rather than individual groups buying and maintaining smaller storage systems.
   In addition, our solution will be directly connected to our core network.
   Even though it's a high-capacity solution, rather than a high-performance
   solution, it will give higher bandwidth to our clusters than any solution
   placed outside of our computer room.

   The technical and organizational frameworks for true long-term storage of
   research data will hopefully be clarified in the coming years. Even so, we
   believe there will still be some need for keeping data close to
   computational resources, yet outside of truly active project storage.
   The offload storage should not be considered a replacement for permanent
   archival and metadata tagging of data.

Q: When will this be available?
A: We intend to get agreements signed with all users and order the hardware
   during December. The solution should be fully online during February, if
   there are no significant delays from our vendors. Since this is the second
   time we are doing this, the process is expected to be smooth.

Q: Can we be sure that the storage is going to be installed?
A: We assess the total interest in the two classes of data (sensitive and
   non-sensitive) separately, since the hardware and software setups are
   different. We will need to collect allocations in the range of 1 PB
   for each system in order to be able to go ahead with it. This ensures that
   we achieve a sufficient economy of scale to be able to offer our pricing
   model.
   Judging from previous interest, we believe it is likely that we will
   reach that point.

Q: What happens after four years?
A: The technological and organizational landscape is always fluid, but after
   four years the hardware will have reached the end of its lifetime. At that
   point, you will be free to retrieve the data over, for example, SSH. If
   there is no obvious replacement solution provided by another entity, it is
   likely that we will offer another 4-year contract at a similar price point.
