Approaches to digital records preservation
Bitstream Preservation
Bitstream preservation can be used as a foundation for other preservation strategies, but it is not adequate on its own to ensure long-term accessibility and authenticity. It involves simply storing the binary code (1s and 0s) that makes up a digital object, bearing in mind that the object will not be reproducible without the original combination of hardware and software that created it. The advantages and disadvantages of bitstream preservation are:
| Advantages | Disadvantages |
|---|---|
| Retains the 'original' record in this form, so that different preservation techniques can be applied to it in the future. | Not suitable as a preservation strategy on its own. |
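The text does not say how a stored bitstream is checked over time. A common supporting practice is to record a checksum when the object is stored and recompute it later to confirm that the 1s and 0s are unchanged. The sketch below assumes SHA-256 as the checksum algorithm; the names `sha256_of` and `verify_fixity` are illustrative and not part of any archive's system.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """True if the stored bitstream still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        record = Path(tmp) / "record.bin"
        record.write_bytes(b"original bitstream")
        recorded = sha256_of(record)            # captured once, at ingest
        print(verify_fixity(record, recorded))  # True
        record.write_bytes(b"altered bitstream")
        print(verify_fixity(record, recorded))  # False
```

A checksum only shows that the bits are intact. It does nothing to make them renderable, which is why bitstream preservation is not enough on its own.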
Encapsulation
In the encapsulation approach, records are packaged as a bitstream together with metadata that will enable a future user to display them. The leading example of this approach is the Victorian Electronic Records Strategy (VERS), the digital preservation program of ADRI member Public Record Office Victoria.
In the VERS approach, record content is accepted in formats including plain text, PDF, PDF/A, JPEG, TIFF and MPEG. The content is encapsulated in an XML 'wrapper' containing a standard set of metadata elements and is authenticated using a digital signature. Each encapsulated record can contain multiple documents that together form the record.
Encapsulation is similar to emulation, but without the need to include specifications for exactly rebuilding the original hardware and software to 'play' the record. Instead, the metadata provides a hardware- and software-independent means of understanding the record over time. In this sense it resembles other approaches such as the National Archives of Australia's Xena 'normalisation' technique (discussed below); the difference lies in the manner in which the metadata is captured, linked to the record and stored.
| Advantages | Disadvantages |
|---|---|
| Content and contextual information are kept together to minimise the risk of loss. | Can be 'records-centric', and so less effective for recording contextual information about people, organisations and functions. |
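A rough sketch of the encapsulation idea follows. Several documents and a set of metadata elements are packed into one XML object, so content and context travel together. The element names (`RecordObject`, `Metadata`, `Document`) are hypothetical and do not follow the actual VERS Encapsulated Object (VEO) format. A plain SHA-256 digest also stands in for the digital signature that VERS uses: a digest detects change but, unlike a signature, cannot show who created the object.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

def _digest(meta: ET.Element, content: ET.Element) -> str:
    # Stand-in for a digital signature: hash of the serialised metadata and content.
    return hashlib.sha256(ET.tostring(meta) + ET.tostring(content)).hexdigest()

def encapsulate(documents: dict[str, bytes], metadata: dict[str, str]) -> bytes:
    """Wrap documents and metadata in one XML object (hypothetical layout, not VEO)."""
    root = ET.Element("RecordObject")
    meta = ET.SubElement(root, "Metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, "Element", name=name).text = value
    content = ET.SubElement(root, "Content")
    for filename, data in documents.items():
        # Binary content is stored as base64 text inside the XML wrapper.
        doc = ET.SubElement(content, "Document", filename=filename)
        doc.text = base64.b64encode(data).decode("ascii")
    ET.SubElement(root, "Digest", algorithm="SHA-256").text = _digest(meta, content)
    return ET.tostring(root, encoding="utf-8")

def verify(blob: bytes) -> bool:
    """Recompute the digest and compare it with the one stored in the wrapper."""
    root = ET.fromstring(blob)
    return _digest(root.find("Metadata"), root.find("Content")) == root.find("Digest").text

if __name__ == "__main__":
    veo = encapsulate(
        {"minutes.txt": b"Meeting opened at 10am", "agenda.txt": b"Item 1: budget"},
        {"Title": "Board minutes", "Date": "2007-05-07"},
    )
    print(verify(veo))                                               # True
    print(verify(veo.replace(b"Board minutes", b"Board minutez")))   # False
```

Because the metadata and every document sit in one object under one digest, a change to either one is detected.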
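Emulation
Emulation involves recreating the original hardware and software environment, using specifications kept with the record, so that the record can be 'played' as it was originally.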
| Advantages | Disadvantages |
|---|---|
| Has the potential to be more effective for preservation of databases and multimedia. | Still relatively untested in digital records preservation. |
Migration
Format migration
Using archival data formats is an approach which is usually implemented in conjunction with other approaches such as encapsulation or migration.
A common format used in preserving digital information is XML (eXtensible Markup Language). XML provides a standard syntax for identifying parts of a document, known as elements, and a standard way (known as a schema) of describing the rules for how those elements can be combined in a document. It is a widely accepted and fully documented way of structuring documents and is supported by many different open source software applications.[1]
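To illustrate elements and rules, the sketch below parses a small XML record with Python's standard `xml.etree.ElementTree` module and checks it against two hand-coded rules. Those rules are a hypothetical stand-in for a schema. A real schema would normally be written in a schema language such as XML Schema (XSD) and checked with a validating parser, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

RECORD = """<record>
  <title>Annual report</title>
  <creator>Records Branch</creator>
  <date>2007-05-07</date>
</record>"""

# Hypothetical rules standing in for a schema: the child elements a <record> must contain.
REQUIRED_CHILDREN = ("title", "creator", "date")

def check_record(xml_text: str) -> list[str]:
    """Return a list of rule violations (empty if the document conforms)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "record":
        problems.append(f"root element is <{root.tag}>, expected <record>")
    for name in REQUIRED_CHILDREN:
        if root.find(name) is None:
            problems.append(f"missing <{name}> element")
    return problems

if __name__ == "__main__":
    print(check_record(RECORD))                              # []
    print(check_record("<record><title>x</title></record>"))
    # ['missing <creator> element', 'missing <date> element']
```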
A standard increasingly being adopted by governments and others to ensure interoperability, ease of access and longevity of digital information is the OpenDocument Format (ODF). ODF is "…an open, XML based document file format for office applications that create and edit documents containing text, spreadsheets, charts and graphical elements."[2] ODF is designed to be vendor- and implementation-neutral, so that people can access, use and share documents regardless of the applications or operating systems they use, without being bound by the software licences they may or may not hold or by their hardware. ODF can be used with open source applications such as OpenOffice.org, which provides the same kinds of desktop applications as Microsoft Office.
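Part of what makes ODF vendor-neutral is that a document is an ordinary ZIP package containing XML files, so it can be read without the application that created it. The sketch assumes a text document whose body is in `content.xml`, with paragraphs in the ODF `text:p` element. The helper `tiny_odt` is hypothetical and builds a deliberately minimal package for demonstration only. Real documents also carry a manifest, styles and metadata.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ODT_MIMETYPE = b"application/vnd.oasis.opendocument.text"

# Use the conventional ODF prefixes when serialising.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)

def tiny_odt(lines: list[str]) -> bytes:
    """Build a minimal ODF-style text package (demo only: no manifest or styles)."""
    doc = ET.Element(f"{{{OFFICE_NS}}}document-content")
    body = ET.SubElement(doc, f"{{{OFFICE_NS}}}body")
    text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    for line in lines:
        ET.SubElement(text, f"{{{TEXT_NS}}}p").text = line
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        # ODF packages put an uncompressed "mimetype" entry first.
        pkg.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        pkg.writestr("content.xml", ET.tostring(doc, encoding="utf-8"),
                     compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()

def is_odf_text(odf_bytes: bytes) -> bool:
    """Check that the package's first entry declares an ODF text document."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as pkg:
        first = pkg.infolist()[0]
        return (first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED
                and pkg.read("mimetype") == ODT_MIMETYPE)

def paragraphs(odf_bytes: bytes) -> list[str]:
    """Return the text of each text:p paragraph in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as pkg:
        root = ET.fromstring(pkg.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]

if __name__ == "__main__":
    odt = tiny_odt(["Minutes of meeting", "Item 1: budget"])
    print(is_odf_text(odt))   # True
    print(paragraphs(odt))    # ['Minutes of meeting', 'Item 1: budget']
```

Only the standard library is used here, which illustrates the point: an open, documented package format can be read without the original office suite.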
Normalisation
The National Archives of Australia has adopted a particular form of migration called 'normalisation', in which digital records are migrated to a limited number of standard formats when they arrive at the Archives. At the heart of this approach is the software application Xena (XML Electronic Normalising for Archives), which detects the file formats of digital objects and converts them into open formats for preservation.
Native formats that Xena can convert include:
- Microsoft Word, Excel, PowerPoint and Project
- OpenOffice.org Writer, Calc, and Impress
- RTF
- PST email format
- TRIM email format
- MBOX email format
- Comma-separated values (CSV) files
- JPG, GIF, TIFF, PNG, BMP, PCX
- HTML
- Plaintext (various encodings)
- PDF documents, and
- XML.
The National Archives of Australia is conducting ongoing research and development work to expand the list of formats that Xena can recognise and convert.[3]
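Xena's own plugin architecture is not described here, so the following is only a hypothetical sketch of the general normalisation pattern. It identifies a file's format from its leading 'magic' bytes, falls back to the file extension, and then passes the file to a converter that produces XML. Only CSV and plain-text converters are included. Other formats raise an error, mirroring the point that conversion methods for some formats are still to be developed. The signature table and the XML element names are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical signature table: leading "magic" bytes -> format label.
SIGNATURES = [
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]

def detect_format(data: bytes, filename: str = "") -> str:
    """Guess a format from magic bytes, then from the extension, then by trying UTF-8."""
    for signature, label in SIGNATURES:
        if data.startswith(signature):
            return label
    if filename.lower().endswith(".csv"):
        return "csv"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "text"

def csv_to_xml(data: bytes) -> bytes:
    """Normalise a CSV file (first row = column names) to a simple XML table."""
    rows = list(csv.reader(io.StringIO(data.decode("utf-8"), newline="")))
    table = ET.Element("table")
    if rows:
        header, *body = rows
        for values in body:
            row = ET.SubElement(table, "row")
            for name, value in zip(header, values):
                ET.SubElement(row, "field", name=name).text = value
    return ET.tostring(table, encoding="utf-8")

def text_to_xml(data: bytes) -> bytes:
    """Normalise plain UTF-8 text by wrapping it in a single XML element."""
    document = ET.Element("document")
    document.text = data.decode("utf-8")
    return ET.tostring(document, encoding="utf-8")

CONVERTERS = {"csv": csv_to_xml, "text": text_to_xml}

def normalise(data: bytes, filename: str = "") -> bytes:
    """Detect the format and convert it to XML, or fail if no converter exists yet."""
    fmt = detect_format(data, filename)
    converter = CONVERTERS.get(fmt)
    if converter is None:
        raise NotImplementedError(f"no normaliser yet for format {fmt!r}")
    return converter(data)

if __name__ == "__main__":
    print(normalise(b"id,title\n1,Minutes\n", "register.csv"))
    # b'<table><row><field name="id">1</field><field name="title">Minutes</field></row></table>'
```

Keeping detection separate from conversion means a new format can be supported by adding one entry to `CONVERTERS`, loosely in the spirit of a plugin architecture.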
| Advantages | Disadvantages |
|---|---|
| Data formats that are open standards, or that have published specifications, allow records to be reconstructed if the applications that created them are lost. | Converting to a different format may cause the record to lose authenticity if essential characteristics are affected. |
| XML-based formats are popular worldwide for information sharing, ease of access and longevity. | Methods of converting some digital records to archival formats are still to be developed. |
| Tools for converting records to XML formats are now available as open source software. | |
Software migration
This type of migration is a valid approach for maintaining accessibility and authenticity of records over time while those records are required for current business or while they are being retained for short to medium periods of time.
| Advantages | Disadvantages |
|---|---|
| Migrating records forward as systems change can be made a routine part of a public office's normal ICT upgrades. | Migration poses a risk of loss or alteration of records if not properly managed. |
| Records are available in current formats with up to date interoperability with other systems. | Can be costly if performed many times over a record's life. |
| Can be used to maintain records in complex database or case management systems. | |
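Because migration can lose or alter records if it is not properly managed, it is usually checked against its source. The sketch below assumes hypothetical field names for the old and new systems (`RecNo`, `Subject`, `Date` mapped to `id`, `title`, `created`). It verifies that every record arrived and that each mapped value is unchanged. A real migration check would likely also need to cover attachments, audit trails and other metadata.

```python
# Hypothetical mapping from the old system's field names to the new system's.
FIELD_MAP = {"RecNo": "id", "Subject": "title", "Date": "created"}

def migrate(old_records: list[dict]) -> list[dict]:
    """Copy each record into the new system's layout, renaming fields."""
    return [{new: rec[old] for old, new in FIELD_MAP.items()} for rec in old_records]

def verify_migration(old_records: list[dict], new_records: list[dict]) -> list[str]:
    """List records that were lost, or whose mapped values changed, during migration."""
    problems = []
    if len(old_records) != len(new_records):
        problems.append(f"record count changed: {len(old_records)} -> {len(new_records)}")
    new_by_id = {rec["id"]: rec for rec in new_records}
    for rec in old_records:
        migrated = new_by_id.get(rec["RecNo"])
        if migrated is None:
            problems.append(f"record {rec['RecNo']} missing after migration")
            continue
        for old, new in FIELD_MAP.items():
            if rec[old] != migrated.get(new):
                problems.append(f"record {rec['RecNo']}: {old} altered")
    return problems

if __name__ == "__main__":
    old = [
        {"RecNo": 1, "Subject": "Lease renewal", "Date": "2006-11-02"},
        {"RecNo": 2, "Subject": "Tender", "Date": "2007-01-15"},
    ]
    new = migrate(old)
    print(verify_migration(old, new))   # []
    new[1]["title"] = "Tender (draft)"
    print(verify_migration(old, new))   # ['record 2: Subject altered']
```

Running a check like this after each routine upgrade is one way to make "properly managed" concrete. It also shows why repeated migrations add cost.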
Migration on request
This approach involves preserving the bitstream of the record and developing a tool capable of reproducing the intellectual content of the record in a different format. The tool must be developed before the record's format becomes obsolete. Migration is then performed only when a record is requested.
| Advantages | Disadvantages |
|---|---|
| Limits the possibility for data loss or alteration from multiple migrations. | Extra effort is required to keep migration tools up to date. |
| | Risk that the migration tools may themselves become obsolete. |
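Migration on request can be sketched as a registry of conversion tools keyed by source and target format. The stored bitstream is never modified, and a conversion runs only when someone asks for the record. The format labels and the Latin-1 to UTF-8 converter are hypothetical examples. `has_tool` reflects the requirement that a tool exist before the original format becomes obsolete.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class StoredRecord:
    record_id: str
    source_format: str
    bitstream: bytes  # preserved as received; never overwritten by a migration

Converter = Callable[[bytes], bytes]

# Hypothetical registry of migration tools: (source format, target format) -> converter.
REGISTRY: dict[tuple[str, str], Converter] = {}

def register(source: str, target: str) -> Callable[[Converter], Converter]:
    """Decorator that records a converter as the tool for one source/target pair."""
    def decorator(fn: Converter) -> Converter:
        REGISTRY[(source, target)] = fn
        return fn
    return decorator

@register("legacy-text-latin1", "text-utf8")
def latin1_to_utf8(data: bytes) -> bytes:
    """Example tool: re-encode Latin-1 text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")

def has_tool(record: StoredRecord, target: str) -> bool:
    """Check (ideally at ingest) that a tool exists before the source format becomes obsolete."""
    return (record.source_format, target) in REGISTRY

def deliver(record: StoredRecord, target: str) -> bytes:
    """Migrate on request: return a converted copy, leaving the stored bitstream untouched."""
    convert = REGISTRY.get((record.source_format, target))
    if convert is None:
        raise LookupError(f"no migration tool from {record.source_format} to {target}")
    return convert(record.bitstream)

if __name__ == "__main__":
    rec = StoredRecord("R-001", "legacy-text-latin1", "Café licence".encode("latin-1"))
    print(deliver(rec, "text-utf8"))   # b'Caf\xc3\xa9 licence'
    print(rec.bitstream)               # b'Caf\xe9 licence'
```

Each request converts from the original bitstream, so errors do not accumulate across successive migrations. The registry itself still has to be maintained, which is the extra effort noted in the table above.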
Footnotes
[1] National Archives of Australia website, 'Why does the archives use XML?', accessed 7 May 2007, http://www.naa.gov.au/recordkeeping/preservation/digital/xml_data_formats.html
[2] OASIS ODF Adoption Technical Committee, 'Open by Design: The advantages of the OpenDocument Format (ODF)', An OASIS White Paper, last updated 10 December 2006, http://www.oasis-open.org/committees/download.php/21450/oasis_odf_advantages_10dec2006.pdf
[3] Xena and the plugin architecture being developed for it are available as open source software for download from SourceForge, http://xena.sourceforge.net/index.html