» Data storage is a central part of your project and loss of data might endanger it.
» The chosen data storage solution has to comply with legal requirements and local policies.
» Optimised data storage and organisation can speed up your project and save costs.
About this chapter
This chapter collects information about your planned data storage and backup throughout the research project, including information security aspects. Consider all digital data that you are expecting to handle in the project, including working documents.
If specific storage solutions are chosen as the project proceeds with data processing and analysis or preservation, this may be specified in the respective chapters of the DMP.
Question-specific guidance
Which data storage guidelines will you be following?
Institutions commonly have a storage guide that explains available infrastructure and which data can be stored where.
Institutional storage guides:
- NTNU Data storage guide
- UiB Storage guide
- UiO Data storage guide
- UiT Research data portal - Processing and storage
Alternatively, you can add storage guidelines manually. In collaboration projects there might be multiple guidelines of relevance for your project.
What is the expected total volume of data in the project?
You are advised to roughly estimate the total data volume as early as possible in order to budget for the associated costs. Take into account that you might produce intermediate data that also requires space, even if some of these data are discarded at the end of the project. It may be worth considering whether all data needs to be accessible all the time (‘hot’ storage) or whether ‘cold’ storage could be used for parts of the data. Deciding early which data needs to be preserved long-term, and archiving it in a timely manner, also helps to save storage resources.
For some domains there might exist guides to estimate the data volume generated by instruments.
- Image file size and video storage calculators
- Life Science data, genomics: Illumina - Approximate sizes of sequencing run output folders
- Life Science data, proteomics: Numerical Compression Schemes for Proteomics Mass Spectrometry Data
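A rough estimate of the kind described above can be sketched in a few lines of Python. This is only an illustration: the data sources, file counts, file sizes and the `intermediate_factor` parameter are hypothetical values chosen for the example and should be replaced with figures from your own instruments and workflows.

```python
from dataclasses import dataclass

TB = 1000 ** 4  # bytes per terabyte (decimal convention)


@dataclass
class DataSource:
    """One kind of data produced in the project (all values are illustrative)."""
    name: str
    files: int                        # expected number of files
    bytes_per_file: float             # average file size in bytes
    intermediate_factor: float = 0.0  # extra space for intermediate data, as a fraction of raw size
    keep_long_term: bool = True       # whether the raw data will be preserved


def estimate_volume(sources):
    """Return (peak_bytes, long_term_bytes) for a list of DataSource objects."""
    peak = 0.0
    long_term = 0.0
    for s in sources:
        raw = s.files * s.bytes_per_file
        # Peak usage includes intermediate data that may be discarded later.
        peak += raw * (1 + s.intermediate_factor)
        if s.keep_long_term:
            long_term += raw
    return peak, long_term


# Hypothetical project: numbers are assumptions, not recommendations.
sources = [
    DataSource("microscopy images", files=20_000, bytes_per_file=50e6, intermediate_factor=0.5),
    DataSource("sequencing runs", files=10, bytes_per_file=400e9, intermediate_factor=1.0),
    DataSource("working documents", files=500, bytes_per_file=2e6),
]
peak, long_term = estimate_volume(sources)
print(f"Peak working volume: {peak / TB:.1f} TB")
print(f"Long-term volume:    {long_term / TB:.1f} TB")
```

Separating the peak working volume from the long-term volume makes it easier to see how much space could be freed by discarding intermediate data or moving parts of the data to ‘cold’ storage.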
For data volumes >10TB
If you will be using Sigma2 computational resources, familiarise yourself with the user contribution model and application deadlines.
In case you are planning computationally demanding data analysis, see also the compute power planning in the chapter Processing and analysing data.
Will storage need change over time?
Specific to data/compute heavy projects
Taking changes in storage needs over time into consideration will help you with budgeting and with reducing expenses for storage space. Often, the storage quota needed for a project will increase with project length. In some projects, storage needs decrease during the project or peak in the middle of the project, because intermediate results have to be stored temporarily but not long-term.
Use the ‘Explore storage needs’ dialogue if you are unsure what applies to your project.
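To see how storage needs might develop, the expected additions and clean-ups per project phase can be tabulated. The following Python sketch is a minimal illustration; the phase names and volumes are hypothetical, and the function `storage_timeline` is introduced here only for the example.

```python
def storage_timeline(phases):
    """Return (label, peak_tb, end_tb) for each project phase.

    Each phase is a tuple (label, added_tb, removed_tb):
    added_tb is the data produced during the phase, removed_tb is the data
    deleted or moved to an archive at the end of the phase.
    """
    timeline = []
    current = 0.0
    for label, added_tb, removed_tb in phases:
        current += added_tb
        peak_tb = current  # highest usage in the phase, before clean-up
        current -= removed_tb
        if current < 0:
            raise ValueError(f"Phase {label!r} removes more data than is stored")
        timeline.append((label, peak_tb, current))
    return timeline


# Hypothetical three-year project (all volumes in TB are assumptions).
phases = [
    ("Year 1: data collection", 4.0, 0.0),
    ("Year 2: analysis", 6.0, 5.0),    # intermediate results discarded afterwards
    ("Year 3: writing up", 0.5, 3.0),  # raw data moved to an archive
]
timeline = storage_timeline(phases)
for label, peak_tb, end_tb in timeline:
    print(f"{label}: peak {peak_tb:.1f} TB, end of phase {end_tb:.1f} TB")

# The quota has to cover the largest peak, not the final volume.
required_quota_tb = max(peak_tb for _, peak_tb, _ in timeline)
print(f"Required quota: {required_quota_tb:.1f} TB")
```

In this example the storage need is largest in the middle of the project, which is the pattern described above for projects with temporary intermediate results.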
Where will you store data and documents?
It is important to keep track of the storage solutions used in the project. Usually, the IT department at your institution will provide information about the institutionally managed storage solutions that are available.
Consider all digital data that you are expecting to handle in the course of the project, including working documents such as manuscripts or electronic lab notebooks (ELN). This is also a good opportunity to reflect on the data backup solution, its possible associated costs and whether the information security level and certification of the solution match the requirements that apply to your stored data. You should also consider how data will be organised in the chosen storage solution.
How easily the members of your project may access the data and how access is controlled can be crucial. The overall capacity and transfer speed of a storage solution can also be critical parameters in case you have larger amounts of data.
Where will you store data and documents?
Usually, your storage solution(s) will be selected from the solutions recommended in your institution’s storage guide. If applicable, provide a link to the description of the storage solution for later reference.
Consult institutional storage guides and information about institutionally managed storage provided by the respective IT departments:
- NTNU Data storage guide
- UiB Storage guide
- UiO Data storage guide
- UiT Research data portal - Processing and storage
Storage solution examples (non-exhaustive):
- Local hardware
- Institutionally managed cloud storage
- E.g. OneDrive
- Institutionally managed data servers or network drives
- Institutionally managed ‘cold’ storage (e.g. tape storage)
- Temporarily storing data in ‘cold’ storage with longer access times can be a recommended solution for your project if you produce very large data volumes, or if the project collects data over a long time period and analyses it later. An issue to consider here is whether data in such storage can be accessed quickly and easily, or whether access needs to be planned in advance.
- Trusted research environments (TREs)
- TSD provided by UiO
- HUNT Cloud provided by NTNU
- SAFE provided by UiB
- National storage infrastructures
- E.g. Sigma2 services such as NIRD
- Other cloud storage
- E.g. Google Drive, Dropbox
- Online version control services
- E.g. GitHub, GitLab
- Online document services
- E.g. Overleaf, Google Docs
- Online platforms offering software-as-a-service
Information security level(s) your storage solution will be used for
While the general information security levels used at Norwegian higher education institutions are similar to each other (open/green, limited/yellow, confidential/red, strictly confidential/black), the actual classification of which data falls into which category, and which solution is allowed for which security level, might vary between institutions and from case to case. Be aware that dual-use research and restricted information may need to follow separate provisions.
Please consult the institutional storage guides for details:
- NTNU Data storage guide
- UiB Storage guide
- UiO Data storage guide
- UiT Research data portal - Processing and storage
Storage solution structure
Indicate which data structure best describes the respective storage solution.
- A file system with files and folders
- Remember to specify file naming and folder structure conventions in the respective question below.
- An “object store” or a “document store” system
- Some “file” storage systems do not have a tree structure like the one we know from a file system, but rather have direct pointers to any file in the system. Such systems are called “object stores” or “document stores”. Examples: Amazon S3, Ceph, MongoDB
- A database system
- Database systems can be relational (examples: MySQL, Oracle Database) or non-relational (often referred to as NoSQL databases).
- Application-specific data storage
- Some applications may have their own data structure that can only be accessed through the application. Examples: some Electronic Lab Notebook (ELN) or Electronic Data Capture (EDC) applications
Backups
If you use institutional solutions, the institution commonly provides backup and snapshot functionality that allows you to restore data. Information should be provided by your IT department. It is good to be aware of the backup routines, their frequency and their possible limitations before the need to recover data arises in your project.
If you use other solutions, you should check the backup routines of these solutions, or you may have to implement your own routines.
Relying on manual backup is not recommended but may be unavoidable in certain situations, for example during field work. If you are not relying on managed storage, an often suggested backup rule is 3-2-1.
The following explanation is taken from NFDI4Chem:
The 3-2-1 Backup Rule in Detail
- Three Copies of the Data: In research, this means that in addition to the original data, create at least two additional copies. This redundancy ensures that even in the event of hardware failure or data corruption, backup versions are always available for access.
- Two Different Storage Media: It is advisable to back up the data on two different storage media. This could be, for example, a local server and an external hard drive. Using different media minimizes the risk that both copies could be lost simultaneously due to the same event (such as hardware failure). Do not rely on hard drives only!
- One Copy at an External Location: Store this third copy at a different physical location to protect it from local disasters such as fires or floods. While cloud storage solutions are often chosen, we recommend central storage, for example, at central IT or library services of the university/institution, as they take care of the server housing, hosting and backups.
Example of a Backup Plan - A research laboratory could secure its data as follows:
- Primary Copy: Store the original data on the laboratory server with regular maintenance and backup procedures.
- Secondary Copy: Store a second copy on an external hard drive or a NAS (Network Attached Storage) located in another room of the building.
- External Copy: Store a third copy in a cloud solution or a central storage.
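A backup plan such as the example above can be checked against the three conditions of the rule with a short script. The Python sketch below is illustrative only: the `Copy` class, the function `check_3_2_1` and the medium and location labels are assumptions made for this example, not part of the NFDI4Chem explanation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (labels are free text chosen by the project)."""
    name: str
    medium: str    # e.g. "server disk", "external hard drive", "tape"
    location: str  # physical location of the copy


def check_3_2_1(copies, primary_location):
    """Check a list of copies against the three conditions of the 3-2-1 rule."""
    media = {c.medium for c in copies}
    offsite = [c for c in copies if c.location != primary_location]
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len(media) >= 2,
        "one_offsite": len(offsite) >= 1,
    }


# The example plan from the text, with hypothetical medium and location labels.
plan = [
    Copy("primary copy on laboratory server", "server disk", "lab building"),
    Copy("secondary copy in another room", "external hard drive", "lab building"),
    Copy("external copy in central storage", "central storage", "central IT data centre"),
]
result = check_3_2_1(plan, primary_location="lab building")
for condition, fulfilled in result.items():
    print(f"{condition}: {'ok' if fulfilled else 'NOT fulfilled'}")
```

Note that a copy in another room of the same building counts as a second medium here but not as an external location; only the third copy protects against events that affect the whole building.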
Advantages of the 3-2-1 Backup Rule
- Security: Data is protected from loss through physical separation and redundancy.
- Access: Researchers can access their data even if one backup medium fails.
- Traceability: Regular backups and the ability to access older versions of the data enable quick detection and correction of errors and data corruption.
What the 3-2-1 Rule Does Not Include
The 3-2-1 rule does not include a backup routine. However, this is just as important, because the best backup is useless if it is outdated. Automated jobs on your servers (e.g., a cron job) that copy your data to the secondary storage every night are best suited for this. Alternatively, use central services, which already have backup plans.
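An automated backup routine of this kind could look like the following Python sketch. It is a simplified illustration under several assumptions: the function names and directory names are hypothetical, files deleted from the source are not removed from the backup, and no older versions are kept. For real projects, an institutionally managed service or an established backup tool is usually the better choice.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(source_dir, target_dir):
    """Copy new or changed files from source_dir to target_dir.

    target_dir must not be located inside source_dir.
    Returns the list of files written to the backup.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = target_dir / src.relative_to(source_dir)
        if dst.exists() and sha256(dst) == sha256(src):
            continue  # identical copy already in the backup
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256(dst) != sha256(src):
            raise IOError(f"Checksum mismatch after copying {src}")
        copied.append(dst)
    return copied


# Small demonstration in a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "project_data"
    dst_dir = Path(tmp) / "backup"
    (src_dir / "raw").mkdir(parents=True)
    (src_dir / "raw" / "sample.csv").write_text("id,value\n1,0.5\n")
    first_run = backup(src_dir, dst_dir)   # copies the new file
    second_run = backup(src_dir, dst_dir)  # nothing has changed
    print(f"first run: {len(first_run)} file(s), second run: {len(second_run)} file(s)")
```

Saved as a script that calls `backup` with the project's own directories, such a routine could be scheduled with cron; the schedule field `0 2 * * *`, for example, runs a job at 02:00 every night. Verifying checksums after each copy helps to detect corrupted backups before they are needed.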
Is the storage solution associated with costs?
If your project requires storage beyond the basic level supplied by your institution, there might be associated costs. It is often best practice to familiarise yourself with storage costs at the outset, so that they can be factored into your budget. In many cases the costs will vary depending on the needs and nature of your project, and perhaps on requirements from your funding source. If you are using commercial cloud storage, also consider that traffic in and out of the system might be charged.
At the end of the project, what will happen with data in this storage solution?
It is important to consider what will happen with data beyond the project period. This is particularly important when working with large data volumes. If applicable, indicate the termination date for the storage solution.
If applicable: How will access to the storage solution be controlled?
If the chosen storage solution has access restrictions, describe who will have access and how access rights will be controlled.
How will you name files and organise folders?
Consistent file naming and folder structures are important elements in organising data in your research project. It is critical to be consistent and to ensure that all of your project partners follow the same convention.
It is advisable to document the chosen file naming and folder structure strategy or principles. This will make it easier for you and for others to understand the data at a later point in time. The conventions can, for example, be described in README files.
Different strategies for file naming and folder structure exist. Please consult the chapters in e.g. the RDMkit for life sciences, the CESSDA Data Management Guide, or The Turing Way handbook for inspiration.
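Once a naming convention has been agreed on, it can be checked automatically. The Python sketch below assumes a hypothetical convention of the form `<YYYYMMDD>_<project>_<description>_v<NN>.<ext>` in lower case; this pattern is only an example and is not prescribed by any of the guides mentioned above.

```python
import re
from datetime import datetime

# Hypothetical convention: date, project short name, description, two-digit version.
PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[a-z0-9]+)_(?P<description>[a-z0-9-]+)"
    r"_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def check_name(filename):
    """Return a list of problems with a file name; an empty list means it conforms."""
    match = PATTERN.match(filename)
    if match is None:
        return ["does not match <YYYYMMDD>_<project>_<description>_v<NN>.<ext>"]
    try:
        # Reject strings that look like dates but are not valid calendar dates.
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        return [f"invalid date: {match['date']}"]
    return []


# Hypothetical file names for illustration.
for name in [
    "20240315_fjordsurvey_ctd-profiles_v02.csv",
    "Final data (new).xlsx",
    "20241340_fjordsurvey_ctd-profiles_v01.csv",
]:
    problems = check_name(name)
    print(f"{name}: {'ok' if not problems else '; '.join(problems)}")
```

Putting the date first in `YYYYMMDD` form means that files sort chronologically by name, and an explicit version suffix avoids ambiguous names such as "final" or "new".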
Do you need to work together on documents? (e.g. manuscripts)
If applicable, indicate how you will approach the need to collaborate on editing documents by several contributors.
Exchanging files can pose version control and convergence challenges. Be aware that information security considerations also apply when sending files by email or via services such as Sikt FileSender. Consult your institutional storage guidance.
If shared documents are used, will the chosen solution allow contributors to edit the document simultaneously? How file access is handled needs to be clarified. It is recommended to be aware of version control or snapshot frequency, and of the recovery options should a mishap occur. Be aware that information security considerations also apply when using cloud storage solutions. Make sure to include this information under the question ‘Where will you store data and documents?’.
IT security: If data is leaked, what will be the consequences?
You may want to carry out different risk assessments depending on the nature of your project. Some practices, like carrying around physical storage media, can be associated with increased risks or may violate regulations and laws. Project members should therefore be informed about the risks, which should be mitigated where possible.