
Storing and protecting data during the project

About this chapter

This chapter collects information about your planned data storage and backup throughout the research project. Consider all digital data that you expect to handle in the project, including working documents.

If specific storage solutions are chosen later in the project, for data processing and analysis or for preservation, they may be specified in the respective chapters of this plan.

Question-specific guidance

Storage guidelines

Institutions commonly have a storage guide that explains available infrastructure and which data can be stored where.

Institutional storage guides:

In collaborative projects, the guidelines of several institutions may be relevant to your project.

Expected total volume of data in the project

You are advised to estimate the total data volume as early as possible in order to budget for the associated costs. Also take into account that you might produce intermediate data that require space, even if some of these data are discarded at the end of the project.
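As a rough illustration, such an estimate can be a short calculation: the raw data volume (number of samples times size per sample) plus an allowance for intermediate data. The Python sketch below is minimal; the function name, the 200 samples, the 30 GB per sample and the intermediate-data factor of 1.5 are hypothetical placeholders, not recommended values.

```python
def estimate_total_volume_gb(n_samples: int, gb_per_sample: float,
                             intermediate_factor: float = 1.5) -> float:
    """Estimate total storage in GB: raw data plus intermediate data.

    intermediate_factor is the space needed for intermediate files,
    expressed as a multiple of the raw volume (hypothetical default).
    Backup copies are not included and must be budgeted separately.
    """
    raw_gb = n_samples * gb_per_sample
    intermediate_gb = raw_gb * intermediate_factor
    return raw_gb + intermediate_gb


if __name__ == "__main__":
    # Hypothetical project: 200 samples of about 30 GB each.
    total = estimate_total_volume_gb(n_samples=200, gb_per_sample=30)
    print(f"Estimated total volume: {total:,.0f} GB")
```

Backup copies multiply this figure, so they are best added on top once the backup strategy is known.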

For some domains, there are guides for estimating the data volume generated by instruments (e.g. for life science data: Illumina, "Approximate sizes of sequencing run output folders"; publication: "Numerical Compression Schemes for Proteomics Mass Spectrometry Data").

Will storage need change over time?

A project's storage needs often grow over the course of the project. Taking this into account will help you with budgeting and with reducing storage expenses. In some cases, there may also be a need for temporary storage of intermediate results that do not have to be kept for longer periods.
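A simple year-by-year projection can show when the storage need peaks. The Python sketch below assumes, purely for illustration, that raw data are kept for the whole project while temporary intermediate results are deleted at the end of each year; the function name and all yearly figures are invented.

```python
from typing import List, Sequence


def project_storage_by_year(raw_gb_per_year: Sequence[float],
                            temp_gb_per_year: Sequence[float]) -> List[float]:
    """Return the storage needed (GB) in each project year.

    Raw data accumulate over the project (they are kept), while temporary
    intermediate results are assumed to be deleted at the end of each year.
    """
    needs = []
    cumulative_raw = 0.0
    for raw, temp in zip(raw_gb_per_year, temp_gb_per_year):
        cumulative_raw += raw
        needs.append(cumulative_raw + temp)
    return needs


if __name__ == "__main__":
    # Invented figures for a four-year project.
    needs = project_storage_by_year([2000, 4000, 4000, 1000],
                                    [500, 3000, 3000, 500])
    peak_year = needs.index(max(needs)) + 1
    print(f"Peak need: year {peak_year}, {max(needs):,.0f} GB")
```

In this invented example the need peaks in year 3, not in the final year, which is the figure the budget would have to cover.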

Temporarily storing data sets (e.g. to tape)

Temporary storage can be advisable for your project if you produce very large data volumes, or if you collect data over a long period but analyse them later. An issue to consider here is whether data in such storage can be accessed quickly and easily, or whether access needs to be planned in advance.

Storage solutions

It is important to keep track of the storage solutions used in the project. Usually, your storage solution(s) will be selected from those recommended in your institution’s storage guide. This is also a good opportunity to reflect on the backup solution, its potential costs, and whether the information security level and certification of the solution match the requirements that apply to your stored data. How easily project members can access the data, and how access is controlled, can be crucial. The overall capacity and transfer speed of a storage solution can also be critical parameters if you handle large amounts of data. Finally, consider what best serves your needs and whether a more advanced way of organising your data, such as object storage or databases, might help your project.

Information security level(s) your storage solution will be used for

While the general information security levels used at Norwegian higher education institutions are similar (open/green, limited/yellow, confidential/red, strictly confidential/black), the actual classification of which data fall into which category, and which solution is permitted for which security level, may vary between institutions and from case to case. Be aware that dual-use research and restricted information may need to follow separate provisions.

Please consult the institutional storage guides for details:

Is the storage solution associated with costs?

If your project requires storage beyond the basic level supplied by your institution, there may be associated costs. It is good practice to familiarise yourself with storage costs up front so that they can be factored into your budget. In many cases, the costs will vary depending on the needs and nature of your project, and potentially on requirements from your funding source. If you are using commercial cloud storage, also consider that traffic in and out of the system may be charged.
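A back-of-the-envelope cost estimate multiplies the average stored volume by a yearly price and the number of years, and adds charged traffic where a commercial cloud bills for it. The Python sketch below is a simplification: the function name and all prices are hypothetical, in arbitrary currency units, and real pricing models (tiers, minimum terms, request fees) differ between providers and institutions.

```python
def estimate_storage_cost(avg_volume_tb: float, price_per_tb_year: float,
                          years: float, egress_tb: float = 0.0,
                          price_per_tb_egress: float = 0.0) -> float:
    """Rough storage cost: average stored volume x yearly price x years,
    plus charged outbound traffic (relevant for some commercial clouds)."""
    storage_cost = avg_volume_tb * price_per_tb_year * years
    traffic_cost = egress_tb * price_per_tb_egress
    return storage_cost + traffic_cost


if __name__ == "__main__":
    # All figures are hypothetical placeholders, in arbitrary currency units.
    cost = estimate_storage_cost(avg_volume_tb=15, price_per_tb_year=1000,
                                 years=4, egress_tb=5, price_per_tb_egress=800)
    print(f"Estimated cost: {cost:,.0f}")
```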

Backups

If you are using institutional solutions, the institution commonly provides backup and snapshot functionality that allows you to restore data. Your IT department should be able to provide the relevant information. It is good to be aware of the backup routines, their frequency and their possible limitations before your project needs to recover data.

If you are using other solutions, check their backup routines; you may have to implement your own. If you do not rely on managed storage, a commonly recommended backup rule is the 3-2-1 rule.

The following explanation is taken from NFDI4Chem (license: CC BY-SA 4.0).

The 3-2-1 Backup Rule in Detail

  • Three Copies of the Data: In research, this means creating at least two copies in addition to the original data. This redundancy ensures that backup versions remain available even in the event of hardware failure or data corruption.
  • Two Different Storage Media: It is advisable to back up the data on two different storage media. This could be, for example, a local server and an external hard drive. Using different media minimizes the risk that both copies could be lost simultaneously due to the same event (such as hardware failure). Do not rely on hard drives only!
  • One Copy at an External Location: Store this third copy at a different physical location to protect it from local disasters such as fires or floods. While cloud storage solutions are often chosen, we recommend central storage, for example at the central IT or library services of the university/institution, because these services take care of server housing, hosting and backups.

Example of a Backup Plan

A research laboratory could secure its data as follows:

  • Primary Copy: Store the original data on the laboratory server with regular maintenance and backup procedures.
  • Secondary Copy: Store a second copy on an external hard drive or a NAS (Network Attached Storage) located in another room of the building.
  • External Copy: Store a third copy in a cloud solution or a central storage.
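The three conditions of the rule can be written down as a simple check. The Python sketch below encodes the laboratory example above as data; the `Copy` class, the function name, and the medium and location labels are hypothetical, and whether two labels really count as different media is a judgement made when filling in the plan.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Copy:
    name: str
    medium: str    # storage medium type, e.g. "server disk", "NAS"
    location: str  # physical site, e.g. "lab building", "central IT"


def satisfies_3_2_1(copies: List[Copy], primary_location: str) -> bool:
    """Check a backup plan against the 3-2-1 rule:
    at least 3 copies (original included), on at least 2 different media,
    with at least 1 copy at a location other than the primary site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and offsite


lab_plan = [
    Copy("primary", "server disk", "lab building"),
    Copy("secondary", "NAS", "lab building"),  # another room, same building
    Copy("external", "central storage", "central IT"),
]
print(satisfies_3_2_1(lab_plan, primary_location="lab building"))  # True
```

Note that the secondary copy in another room of the same building does not count as the external copy; only the central storage copy satisfies the "1" in this encoding.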

Advantages of the 3-2-1 Backup Rule

  • Security: Data is protected from loss through physical separation and redundancy.
  • Access: Researchers can access their data even if one backup medium fails.
  • Traceability: Regular backups and the ability to access older versions of the data enable quick detection and correction of errors and data corruption.
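One common way to detect corruption is to compare checksums of the original files with those of a backup copy. The Python sketch below uses only the standard library; the function names are hypothetical. It computes a SHA-256 digest for every file under two folders and reports files that are missing from the backup or whose contents differ.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def checksum_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_copies(original: Path, backup: Path) -> Dict[str, List[str]]:
    """Report files missing from the backup and files whose content differs."""
    a, b = checksum_manifest(original), checksum_manifest(backup)
    return {
        "missing_in_backup": sorted(set(a) - set(b)),
        "differing": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Storing the manifest alongside each backup also makes it possible to check later whether a copy has silently changed.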

What the 3-2-1 Rule Does Not Include

The 3-2-1 rule does not specify a backup routine. A routine is just as important, however, because even the best backup is useless if it is out of date. Automation on your servers, e.g. a cron job that copies your data to the secondary storage every night, is best suited for this. Alternatively, use central services, which already have backup plans in place.
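As an illustration of such automation, the Python script below copies a data folder into a dated snapshot folder on secondary storage and can be scheduled nightly with cron. The paths, the 02:00 schedule and the snapshot layout are hypothetical; the script needs Python 3.8 or later (for `dirs_exist_ok`) and does not handle retention, encryption or verification, which a real routine would also need.

```python
"""Minimal nightly backup sketch: copy a data folder into a dated snapshot.

Example crontab entry running it every night at 02:00 (paths hypothetical):
    0 2 * * * /usr/bin/python3 /opt/scripts/nightly_backup.py
"""
import shutil
import sys
from datetime import date
from pathlib import Path

SOURCE = Path("/data/project")     # hypothetical primary data folder
TARGET_ROOT = Path("/mnt/backup")  # hypothetical secondary storage mount


def backup(source: Path, target_root: Path, today: date) -> Path:
    """Copy source to target_root/<YYYY-MM-DD>/ and return the snapshot path."""
    snapshot = target_root / today.isoformat()
    shutil.copytree(source, snapshot, dirs_exist_ok=True)
    return snapshot


if __name__ == "__main__":
    if SOURCE.is_dir():
        print(f"Backup written to {backup(SOURCE, TARGET_ROOT, date.today())}")
    else:
        print(f"Source folder {SOURCE} not found; backup skipped.", file=sys.stderr)
```

Full dated copies are simple to restore from but use a lot of space, so for large data volumes a tool that copies only changed files may be a better fit.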

File naming and organisation

Consistent file naming and folder structures are an important part of organising data in your research project. It is critical to be consistent and to make sure that all project partners follow the same convention.

There are different strategies for file naming and folder structure. For inspiration, consult the relevant chapters in, for example, the RDMkit for life sciences, the CESSDA Data Management Guide, or The Turing Way handbook.
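To make a convention easy for all partners to follow, it can help to generate and check file names with a small script. The Python sketch below assumes a hypothetical convention of the form `YYYY-MM-DD_project_description_vNN.ext`; the pattern and function names are illustrative, not a recommendation from the guides mentioned above.

```python
import re
from datetime import date

# Hypothetical convention: YYYY-MM-DD_project_description_vNN.ext
NAME_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}_[a-z0-9]+_[a-z0-9-]+_v\d{2}\.[a-z0-9]+")


def make_file_name(day: date, project: str, description: str,
                   version: int, extension: str) -> str:
    """Build a file name following the (hypothetical) project convention."""
    # Lower-case the description and replace spaces/punctuation with hyphens.
    desc = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{day.isoformat()}_{project.lower()}_{desc}_v{version:02d}.{extension.lower()}"


def is_valid_file_name(name: str) -> bool:
    """Return True if the whole name matches the convention."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    name = make_file_name(date(2024, 3, 5), "SoilCarbon", "Plot A samples", 1, "CSV")
    print(name, is_valid_file_name(name))
```

Putting the ISO 8601 date first makes files sort chronologically by name, and zero-padded version numbers keep `v02` before `v10`.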

Risk evaluation

Depending on the nature of your project, you may want to carry out different risk assessments. Some practices, such as carrying physical storage media, can increase risks or violate regulations and laws. Project members should therefore be informed about the risks, and the risks should be mitigated where possible.

Further resources

Contributors