6 Archive, Preserve, and Curate

6.1 INGEST DATA & METADATA

In some ways, the activity of archiving data, whether or not to a formal archive, is central to a longitudinal study (see Figures 1 and 6). Throughout the course of the whole study, data and associated metadata will need to be preserved, if only for use in later phases of the study.

As publications are generated, it is good practice to be able to reproduce the exact set of data used in the analyses for each publication. If the data are to be preserved at an archive (organization), something like an OAIS Submission Information Package (SIP) (http://public.ccsds.org/publications/archive/650x0b1.pdf) will need to be produced for submission. These activities may involve converting data and metadata from an in-house format to a more generally accessible format for the long term.
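One way to make "the exact set of data" verifiable is to record a checksum for every file used in an analysis. The following is a minimal sketch, not a prescribed method: the function names (`sha256_of`, `build_manifest`) and the idea of a per-publication manifest are assumptions introduced here for illustration, and the manifest is not an OAIS SIP.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its SHA-256 digest.

    Hypothetical helper: storing this mapping alongside a publication lets
    a later reader confirm that the files are byte-for-byte the same.
    """
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
```

Saving the returned mapping (for example as JSON) at the time of each publication gives a fixed reference point, even if the working data continue to change in later phases of the study.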

The Producer-Archive Interface Methodology Abstract Standard (PAIMAS) (ISO 20652:2006) is relevant here. It defines four phases, described on page 11 of Principles and Good Practice for Preserving Data: “These phases make explicit the steps an organization must take when archiving data.”

  • 6.11 Preliminary – Define the information to be archived
  • 6.12 Formal Definition – Develop agreement
  • 6.13 Transfer – Actual transfer of the objects
  • 6.14 Validation – Validate the transferred objects
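The validation phase can be illustrated with a small check that the objects received by the archive match what the producer agreed to send. This is a sketch under stated assumptions: PAIMAS does not prescribe this procedure, and the function `validate_transfer` and the manifest format (relative path mapped to SHA-256 digest) are hypothetical.

```python
import hashlib
from pathlib import Path


def validate_transfer(received_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare received files against a producer-supplied manifest.

    The manifest maps relative file paths to expected SHA-256 hex digests
    (an assumed format, not part of PAIMAS). Returns three sorted lists of
    relative paths: missing, corrupted, and unexpected.
    """
    missing, corrupted = [], []
    for rel_path, expected in manifest.items():
        target = received_dir / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            corrupted.append(rel_path)

    # Files that arrived but were never listed in the agreement.
    present = {
        p.relative_to(received_dir).as_posix()
        for p in received_dir.rglob("*")
        if p.is_file()
    }
    unexpected = sorted(present - set(manifest))

    return {
        "missing": sorted(missing),
        "corrupted": sorted(corrupted),
        "unexpected": unexpected,
    }
```

A transfer would be accepted only when all three lists are empty; otherwise the report tells the producer which objects to resend or explain.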

6.2 ENHANCE METADATA

Metadata from the data collection and analysis phases of the study may be enhanced in multiple ways on an ongoing basis. They may be integrated into some widely searchable system (see, for example, the Wikipedia article “Linked data”). Metadata may also accumulate as data are cited and reused. Having links from the data to reuses and from those reuses to the data will enhance the value of the data. An important aspect of enhancement is the connection between scholarly publications and data. These connections may be made by establishing persistent identifiers (e.g., DOIs) for datasets that can be published later (see 7.5). Another enhancement may be to translate metadata and documentation into other languages to make them understandable for other communities.
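The two-way links between data and reuses can be pictured as a pair of lookup tables keyed by persistent identifier. The sketch below is illustrative only: the class `CitationIndex` and its method names are hypothetical, and real systems would typically rely on a registry or linked-data service rather than an in-memory structure.

```python
from collections import defaultdict


class CitationIndex:
    """Keep links from datasets to reuses and from reuses back to datasets."""

    def __init__(self) -> None:
        self._reuses_of: dict[str, set[str]] = defaultdict(set)
        self._data_used_by: dict[str, set[str]] = defaultdict(set)

    def link(self, dataset_id: str, publication_id: str) -> None:
        """Record that a publication cites or reuses a dataset."""
        self._reuses_of[dataset_id].add(publication_id)
        self._data_used_by[publication_id].add(dataset_id)

    def reuses_of(self, dataset_id: str) -> list[str]:
        """Publications that reuse the given dataset, sorted."""
        return sorted(self._reuses_of.get(dataset_id, ()))

    def data_used_by(self, publication_id: str) -> list[str]:
        """Datasets used by the given publication, sorted."""
        return sorted(self._data_used_by.get(publication_id, ()))
```

Because each call to `link` updates both tables, a reader starting from a dataset can find its reuses, and a reader starting from a publication can find the underlying data.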

6.3 PRESERVE DATA & METADATA

Data preservation requires ongoing activity. Storage media decay or become obsolete, new formats become necessary, metadata accrue with ongoing access and use, and desirable access methods evolve. Most projects have funding for a limited period. In many cases the need for preservation will outlive the original funding. Arrangements for ongoing preservation may need to be made with local institutional repositories or more global archives. These arrangements, made toward the beginning of a project, may generate requirements for the ingestion activities described in step 6.1 above. Access control policies may also need to be established.

6.4 UNDERTAKE ONGOING CURATION

Once the data are in an archive, additional curation activities may generate metadata which can be recorded as DDI LifeCycleEvents. Data may be migrated to new formats, or replicated to multiple sites. Legal contracts for access to confidential data may be drawn up. Assessment of disclosure risk will be an ongoing activity for the life of the data, as other external data and procedures with the potential for allowing disclosure emerge.
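To illustrate how curation activities accumulate as metadata, the sketch below keeps a simple dated event log. It is not the DDI schema: the class names (`CurationEvent`, `CurationLog`) and field names are assumptions chosen for illustration, and a real archive would map such records onto DDI LifeCycleEvent elements.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CurationEvent:
    """One curation activity (illustrative fields, not the DDI schema)."""
    event_type: str      # e.g. "format migration", "replication"
    event_date: date
    description: str


@dataclass
class CurationLog:
    """Append-only record of curation events for one dataset."""
    events: list[CurationEvent] = field(default_factory=list)

    def record(self, event_type: str, event_date: date, description: str) -> CurationEvent:
        """Add an event to the log and return it."""
        event = CurationEvent(event_type, event_date, description)
        self.events.append(event)
        return event

    def history(self, event_type: Optional[str] = None) -> list[CurationEvent]:
        """Return events in date order, optionally filtered by type."""
        selected = [
            e for e in self.events
            if event_type is None or e.event_type == event_type
        ]
        return sorted(selected, key=lambda e: e.event_date)
```

Keeping the log append-only means that the history of migrations, replications, and disclosure reviews remains available for as long as the data themselves.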