- Why collect metadata?
- Where will metadata come from?
- Consider metadata early in your project
- Determine what metadata you need
- Capturing metadata
- Maintaining metadata over time
Metadata is descriptive information that helps people to understand, use and manage records.
In digitisation projects, metadata can be used to:
- find and use digital images
- link images to the business process they document
- demonstrate that images are accurate and reliable renditions of the original paper records
- document the digitisation process itself
- document formats and dependencies to help manage images over time.
Images without appropriate metadata will quickly become useless. They will be impossible to find or view, or to migrate to new technology when this inevitably becomes necessary.
Failing to identify and collect suitable metadata may prevent your organisation from reaping the business benefits of a digitisation project.
It is likely that your organisation will already have good metadata that can be automatically applied to all of your digital images.
Optical Character Recognition (OCR) technology offers greater possibilities for automatic metadata capture. Automatic capture of key fields might be possible by writing scripts, especially when the original paper records use a standard format or template. 
The Department of Education and Communities used document definition forms to automate a large amount of their data collection. See Case Study: Department of Education and Communities pilot digitisation of HR records.
Housing NSW did not capture image level metadata, but still managed to automate much of their metadata capture. See Case Study: Housing NSW – Outsourcing the digitisation of client files.
Note: OCR may not be suitable for some types of back-capture digitisation projects, e.g. if the records to be digitised contain handwriting.
If your organisation has an electronic document and records management system (EDRMS), you may be able to integrate your digitisation software with it to automatically capture most of the metadata needed for access and management purposes.
Digital images may also be able to inherit some metadata from business systems they are linked to.
You may use OCR technology to extract metadata from the digital images and import it (usually through an XML schema) for use in a business system. If you do this you will need to map the fields in the current system to the metadata to be collected. 
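The field mapping described above can be sketched in a few lines of Python. This is a minimal illustration only: the XML element names (`DocDate`, `Author`, `FileNo`, `PageCount`), the target field names and the sample values are all hypothetical, and your OCR software and business system will define their own.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: OCR output element name -> business system field name.
FIELD_MAP = {
    "DocDate": "date_created",
    "Author": "creator",
    "FileNo": "file_identifier",
}

def map_ocr_fields(xml_text):
    """Return (mapped record, list of OCR element names with no mapping)."""
    root = ET.fromstring(xml_text)
    record = {}
    unmapped = []
    for element in root:
        target = FIELD_MAP.get(element.tag)
        if target is None:
            # Report fields with no agreed mapping instead of silently dropping them.
            unmapped.append(element.tag)
        else:
            record[target] = (element.text or "").strip()
    return record, unmapped

# Illustrative OCR export for a single scanned document.
sample = """<document>
  <DocDate>2008-02-24</DocDate>
  <Author>J. Citizen</Author>
  <FileNo>10/0252</FileNo>
  <PageCount>3</PageCount>
</document>"""

record, unmapped = map_ocr_fields(sample)
```

Listing the unmapped fields makes gaps in the mapping visible early, before images are imported in bulk.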
Your organisation should try to automate the capture of metadata wherever possible. Manual collection of metadata should be a last resort as it is costly and can lead to a lack of attention to detail and poor quality collection.
With back-capture digitisation projects some manual data entry may be unavoidable. This is costly in terms of time and resources and should not be underestimated. If records are required as State archives, State Records may require specific and more detailed metadata to be captured as part of the digitisation and transfer process. Contact State Records and be very careful in defining exactly what metadata is essential.
Your organisation should determine all of the individual pieces of metadata (properties and values) that need to be captured as early as possible.
It helps to know your metadata needs prior to liaising with vendors over digitisation equipment purchases. Then you can determine whether the equipment can facilitate automatic metadata capture and get specific technical advice on how to achieve this.
The metadata generated during digitisation will also usually need to be imported into your corporate EDRMS or a specific business system, along with the digital images. An early understanding of your digitisation metadata needs will help with this import. It will also help you to define what metadata can be inherited from or automatically generated by your EDRMS or business systems, and what will need to be applied during the digitisation process.
If you are intending to transfer original paper records to State Records as State archives after digitisation, it is essential that you contact State Records to discuss what metadata they will require.
An organisation conducted a back-capture digitisation project with the intention of transferring the original paper records to State Records. They created a database where metadata was recorded. However, they did not discuss their metadata requirements with State Records first. When the time came to transfer, they found that they could not extract the required metadata from the database to generate a consignment list. In addition, there was some metadata, e.g. end date, which was not collected as part of the digitisation project, but was necessary for transfer.
Each back-capture digitisation project has different aims and may require different metadata. Consider the aims and drivers for your project to determine what you need.
Good metadata is a requirement of digitisation and all other recordkeeping projects because it is essential to the ongoing use and management of digital data.
A range of metadata is automatically generated by digitisation software. This usually consists of automatically generated numerical title strings (such as ‘doc20101115155012.pdf’), often based on digitisation sequencing and date data. In determining what metadata you need, you should look at any auto-generated metadata provided by your system and assess whether it actually meets your business needs.
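As a rough way to assess what an auto-generated title actually tells you, the sketch below tries to read a scan timestamp out of a name such as 'doc20101115155012.pdf'. It assumes the 14 digits are a date and time in YYYYMMDDHHMMSS order, which is a guess about this example rather than a documented convention; check how your own software builds its names.

```python
import re
from datetime import datetime

# Assumed pattern: "doc" + 14-digit timestamp (YYYYMMDDHHMMSS) + ".pdf".
AUTO_NAME = re.compile(r"doc(\d{14})\.pdf")

def scan_timestamp(name):
    """Return the timestamp embedded in an auto-generated name, or None."""
    match = AUTO_NAME.fullmatch(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        # Fourteen digits that do not form a valid date and time.
        return None
```

Under this assumption the name yields only a scan date and time. It says nothing about the content, creator or business context of the record, which is why auto-generated titles rarely meet business needs on their own.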
A unique identifier helps to distinguish a record from other records. This identifier can be at various levels of aggregation or all levels, depending on what suits your organisation.
Your organisation may decide to have a unique identifier for every digital image within a file, and also a unique identifier for a file.
A digital image may have the identifier ‘D10/2009’ while the file it is attached to has the identifier ‘10/0252’.
If the digital image is saved into a recordkeeping system, the system will usually automatically assign a unique identifier at the image level. The image will also inherit a file identifier when it is attached to a file.
Some business systems may also be able to automatically generate identifiers.
Title is one of the most significant metadata elements to facilitate retrieval so you should consider carefully what metadata is required here.
Again the title field can be at various levels of aggregation.
You can have file titles and also titles for digital images within files.
Your file could be called ‘Occupational Health and Safety - Committees’ and the image name may be ‘Minutes 2008-02-24’.
You should consider the use of naming conventions at either or both the file and the image level. These should work together to facilitate retrieval rather than contain duplicate information.
Standard, well devised and rigorously applied naming conventions can facilitate sharing of information. Conversely, inconsistent naming of files and images can make locating files and images problematic, leading to frustrating searches and wasted time, and may result in information being unavailable when it is needed.
File and image names should be meaningful as metadata is self-referencing. They may reflect the existing names of the equivalent original paper files and documents or you can design other conventions to meet your needs.
If non-descriptive file or image names are to be used, e.g. a sequential numbering string, the files and images must be associated with metadata stored elsewhere which will identify the file or image. 
Large scale digitisation projects may be able to use machine-generated names and rely on a database for sophisticated searching and retrieval of associated metadata. 
Note: This approach relies on very robust connections between the imaged records and the controlling database and depends on these connections being maintained over time. This can be costly and complex.
As part of your metadata design process, you should determine whether it is more cost and business effective to apply meaningful title metadata to an image when it is created, rather than rely on separately stored data.
Existing classification tools and metadata automation tools may assist in automatically generating titles or components of titles.
The following recommendations for file and image names may be considered to help to promote searching and interoperability: 
In general, file and image names should:
- be unique
- be consistently structured
- include the use of leading zeros to facilitate sorting in numerical order (applies when a numerical scheme is used)
- avoid special characters (e.g. tabs or symbols), including spaces (use underscores as an alternative) as they can cause problems across operating platforms.
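The recommendations above could be applied automatically when images are named. The following is a minimal sketch, not a prescribed convention: the function names, the four-digit sequence width and the 'tif' extension are assumptions chosen for illustration.

```python
import re

def make_image_name(title, sequence, width=4, extension="tif"):
    """Build a consistently structured name with a zero-padded sequence number."""
    # Replace each run of spaces or special characters with a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "_", title).strip("_")
    return f"{cleaned}_{sequence:0{width}d}.{extension}"

def duplicate_names(names):
    """Return names that occur more than once, ignoring case."""
    seen = set()
    duplicates = set()
    for name in names:
        key = name.lower()
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)
```

For example, `make_image_name("Minutes 2008-02-24", 7)` gives `Minutes_2008-02-24_0007.tif`. Because the sequence number is padded with leading zeros, a plain alphabetical sort places image 2 before image 10.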
Metadata embedded in file names (such as scan date, page number etc.) should be recorded in another location in addition to the image name. This provides a safety net for moving images across systems in the future, in the event they have to be renamed. In particular, sequencing information and major structural divisions of multi-part objects should be explicitly recorded in the structural metadata and not only embedded in image names. 
Multiple pages within an image
In some cases, multiple pages may be captured within one digital image. Metadata added to the image title (or in another metadata field) can help to identify where this has happened.
An 80 page document may be captured in four digital images, each with 20 pages. In this case your organisation will need to add metadata about what pages are included within each image and how the four images relate to each other.
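The page ranges in this example can be worked out mechanically. The sketch below is illustrative only; the function and field names are hypothetical and not drawn from any particular system.

```python
def page_ranges(total_pages, pages_per_image):
    """List which pages each image holds and where it sits in the set."""
    if total_pages < 1 or pages_per_image < 1:
        raise ValueError("page counts must be positive")
    parts = -(-total_pages // pages_per_image)  # ceiling division
    ranges = []
    for part in range(1, parts + 1):
        first = (part - 1) * pages_per_image + 1
        last = min(part * pages_per_image, total_pages)
        ranges.append(
            {"part": part, "of": parts, "first_page": first, "last_page": last}
        )
    return ranges
```

Calling `page_ranges(80, 20)` returns four entries covering pages 1 to 20, 21 to 40, 41 to 60 and 61 to 80, each recording its position (part 1 of 4, and so on) so the relationship between the images is explicit.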
Versions and derivatives
You may create multiple versions of a digital image before arriving at a satisfactory output. These may need to be temporarily distinguished from each other with metadata (e.g. with version numbers). Once a final version has been reached it should be saved as the official record and its version number need not be retained. The other versions can be deleted using Normal Administrative Practice.
If you create derivatives of a digital image (versions at lesser quality for different uses) and intend to keep these for future use, you will need to consider how to distinguish these through metadata.
You may choose to retain the same title, but add a qualifier at the end of the title to show its intended use. A typical example is adding 'p' for published version or 't' for thumbnail after the image title to clarify how the images differ. Qualifiers have an advantage over entirely new names as they keep all associated versions linked. 
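A qualifier scheme like this is simple to apply and reverse in code. The sketch below uses the 'p' and 't' qualifiers from the example above; joining the qualifier to the title with an underscore is an assumption made for illustration.

```python
# Qualifiers from the example above; extend to suit your project.
QUALIFIERS = {"p": "published version", "t": "thumbnail"}

def derivative_name(title, qualifier):
    """Append a qualifier to an image title to name a derivative."""
    if qualifier not in QUALIFIERS:
        raise ValueError(f"unknown qualifier: {qualifier!r}")
    return f"{title}_{qualifier}"

def split_derivative(name):
    """Return (base title, qualifier); qualifier is None if there is none."""
    base, separator, suffix = name.rpartition("_")
    if separator and suffix in QUALIFIERS:
        return base, suffix
    return name, None
```

Because every derivative keeps the base title, all versions of an image can be found again by stripping the qualifier.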
Specific needs of back-capture digitisation projects
Some back-capture digitisation projects may include a range of records where it is not as straightforward to set standard naming conventions for titles. You will need to consider how to manage these.
Some older photographs may require more descriptive information included in the title.
Where records are required as State archives, State Records may stipulate the capture of specific metadata in a specific way. Therefore it is vital that you contact State Records to discuss your digitisation project prior to setting standard titling for digital images.
Date of creation
Date of creation refers to the date that an original paper record was created, not the date that a record was digitised.
It is important to capture the date of creation of an original record as this provides key accountability, use and management information.
If you are creating digital images of incoming correspondence and capturing these straight into the organisation’s EDRMS, then the date of registration may well be the same date as the date of creation.
This metadata can be applied at an aggregate or file level and/or image level as well. Therefore this metadata could record the date an original paper file was created or the date the individual record was created.
Who/what created the record
This metadata refers to the person who created an original paper record, not the person who created the digital image. For the purpose of doing business, it is important to document an original record's creator so that this data can be searched for or reported on as required.
This metadata can be applied at an aggregate or file level and/or at the image level as well. Therefore this metadata could record who created the original paper file or who created the individual record.
Business function/process it relates to
This metadata records the business an original paper record relates to. It is important to connect a record to the business it documents. This is usually done by linking it to a file.
It is important to record this metadata for each individual digital image. This requirement refers to the specific data format that provides the structure for the image.
For digital images created using PDF, this could be 'PDFCreator Version 2.5' or 'Adobe Professional Version 3' etc.
It is important that the data format is captured (see Technical metadata) and, where appropriate, that the creating application and its version are captured for all digital images. Capture of the creating application can often be automated.
With digital images your organisation will also need to capture some technical metadata about each image and the imaging process. This type of metadata helps to support image quality assessment, ensures an image can be rendered accurately, and demonstrates the provenance of the production of an image. 
Technical metadata can include elements like the following:
- Extent (file size in bytes)
- File bit depth
- Image manipulation (if relevant), i.e. any information about manipulation of the image including de-speckling, de-skewing and enhancement
- Manipulation package (if relevant)
See Technical specifications for more information about the nature of these elements.
Ideally this technical metadata should be linked to each digital image.
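Some technical elements can be read directly from the image file and stored against it. As a minimal sketch using only the Python standard library, the function below records the extent in bytes for one image. The field names are hypothetical, and the file extension is only a hint of the data format, not proof of it.

```python
from pathlib import Path

def basic_technical_metadata(image_path):
    """Collect technical elements that can be read without opening the image."""
    path = Path(image_path)
    return {
        "file_name": path.name,
        "extent_bytes": path.stat().st_size,  # extent (file size in bytes)
        # The extension only suggests the format; confirm it with a
        # format identification tool where accuracy matters.
        "format_extension": path.suffix.lstrip(".").lower(),
    }
```

Elements such as bit depth or image manipulation cannot be read this way and would need to come from the scanning software or an imaging library.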
Note: As part of digitisation projects your organisation should retain documentation about all these technical metadata elements as part of planning, reporting and procedures. These may be called on to verify standard procedures used for digitisation if digital images are ever questioned in court.
If the technical metadata cannot be linked to each image, your organisation should be able to determine what technical specifications were used by referring to this documentation. For example, you should be able to know what technical specifications were used for particular digital images created on particular dates.
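One way to make this documentation usable is to record the date each technical specification took effect, so the specification for any image can be looked up from its creation date. The sketch below assumes such a history exists; the dates and settings shown are invented purely for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical specification history, sorted by the date each one took effect.
SPEC_HISTORY = [
    (date(2010, 1, 1), {"format": "TIFF", "resolution_dpi": 300}),
    (date(2012, 7, 1), {"format": "PDF/A", "resolution_dpi": 300}),
]

def specification_on(created, history=SPEC_HISTORY):
    """Return the specification in force on the given date, or None."""
    starts = [start for start, _ in history]
    index = bisect_right(starts, created)
    if index == 0:
        return None  # image predates the first documented specification
    return history[index - 1][1]
```

With this history, an image created on 3 May 2011 resolves to the first specification, and an image created before 1 January 2010 returns `None`, flagging a gap in the documentation.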
Note: The most important technical element for the ongoing use and accessibility of digital images is file format. Without knowledge of this metadata an image may not be able to be read in the future.
Process metadata captures information about specific processes that are performed on records (also known as event metadata).
The key process in relation to digitisation is registration. Metadata can be used to document the date a digital image was registered in a system and who registered it.
Other process metadata, such as disposal or migration metadata, will usually be applied and maintained at the file level.
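A registration event can be represented as a small, unchangeable record of what happened, when and by whom. This is a sketch only: the class and field names are hypothetical, and an EDRMS will normally create and store this metadata for you.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: an event cannot be altered after creation
class ProcessEvent:
    image_id: str
    event_type: str
    performed_by: str
    occurred_at: datetime

def registration_event(image_id, registered_by, when=None):
    """Record who registered an image and when (defaults to the current time)."""
    occurred_at = when or datetime.now(timezone.utc)
    return ProcessEvent(image_id, "registration", registered_by, occurred_at)
```

Making the record immutable reflects the fact that process metadata documents something that has already happened and should not be edited afterwards.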
Your organisation will also need to consider if any other metadata is required to meet your business needs and the aims of your digitisation project.
With back-capture digitisation projects where access is the primary concern, it may be relevant to put greater emphasis on indexing and item level metadata collection. Your organisation may also consider using resource discovery metadata for images that are to be published on the web. See http://www.agls.gov.au
If you are transferring original paper records to State Records as State archives, you will need to capture the metadata required for transfer.
Policies and procedures for metadata capture and management
You will need to develop internal policies and procedures for metadata capture and management. These may form part of general digitisation procedures and should address:
- capturing metadata (including the elements to capture; conventions for recording names, places and dates; controlled vocabularies and encoding schemes for manually entered metadata; who captures which elements and when; and what tools are used)
- accommodating images with incomplete metadata
- checking the relevance and accuracy of metadata
- checking grammar, spelling and punctuation, especially for manually entered metadata
- ensuring consistency in the creation of and interpretation of metadata
- ensuring the completeness of metadata
- the metadata required when registering digital images into recordkeeping systems
- the documentation required to be kept regarding metadata capture.
Full quality checking of metadata must be completed before any original paper records are destroyed and the results of checks must be documented. See Benchmarks and quality assurance for more information.
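Completeness checks of this kind are straightforward to automate. The sketch below flags required elements that are missing or blank in a metadata record. The list of required elements is an illustrative selection from the elements discussed in this guidance, not a mandated set, and the field names are hypothetical.

```python
# Illustrative selection of required elements; define your own for each project.
REQUIRED_ELEMENTS = (
    "identifier",
    "title",
    "date_created",
    "creator",
    "business_function",
    "data_format",
)

def missing_elements(record, required=REQUIRED_ELEMENTS):
    """Return the required elements that are absent or blank in a record."""
    return [
        name
        for name in required
        if not str(record.get(name) or "").strip()
    ]
```

A check like this catches gaps only. It cannot confirm that the values entered are accurate, so it supplements rather than replaces manual quality checking.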
If outsourcing digitisation, you will need to communicate the documented requirements for metadata to service providers.
Training for staff
Training for staff involved in the creation and maintenance of metadata is critical to its successful collection. Procedures for metadata creation and maintenance should be easy to follow and appropriate support should be provided. Tools such as templates and data entry forms that make metadata entry user-friendly may also help staff.
See Staffing digitisation projects for information on skill sets for digitisation.
For fields that cannot be automated, you may also benefit from developing encoding schemes where relevant for your project. Encoding schemes or ‘pick lists’ enable you to provide users with choices from which to select values when populating metadata fields.
In an organisation ‘electoral district’ was a changeable field that users needed to specify. For this field, users were provided with a drop-down list of all the electoral districts in the State. They simply needed to choose the relevant one.
The advantages of encoding schemes are that you can determine consistent ways of displaying information and are not reliant on idiosyncratic data entry by staff. Fields can remain consistent with no spelling errors.
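In software terms, an encoding scheme is a controlled list that every entry is checked against. The sketch below shows the idea. The three district names are a short illustrative sample rather than a complete or current list, and the function name is hypothetical.

```python
# Illustrative sample only; a real pick list would hold every permitted value.
ELECTORAL_DISTRICTS = ("Albury", "Ballina", "Bathurst")

def validate_pick(value, allowed=ELECTORAL_DISTRICTS):
    """Return the approved form of a value, or raise if it is not on the list."""
    lookup = {item.casefold(): item for item in allowed}
    key = value.strip().casefold()
    if key not in lookup:
        raise ValueError(f"{value!r} is not in the pick list")
    # Always store the approved spelling and capitalisation.
    return lookup[key]
```

An entry of ' albury ' is stored as 'Albury', while a misspelling such as 'Albery' is rejected rather than saved, which keeps the field consistent.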
Metadata needs to remain persistently linked with digital images and aggregations of digital records, including through migrations.
It is also important to remember that metadata itself is a record. Metadata needs to be retained in accordance with the State Records Act 1998 and the relevant disposal classes within approved retention and disposal authorities.
Primary control records (including metadata) for most records need to be kept for a minimum of 20 years after the records to which they relate are destroyed or finally disposed of. Some are kept longer. See General retention and disposal authority: administrative records for more information.
- Has the organisation identified and documented technical and other metadata requirements for the project?
- Is metadata captured automatically, e.g. inherited from existing systems, where possible?
- Are procedures for metadata capture and quality control documented, e.g. as part of digitisation procedures?
- Are relevant staff trained in metadata capture and management?
- Is metadata managed as a record and retained for as long as it is required?
Published 2014 / Revised February 2015