Frequently asked questions: digitisation
These frequently asked questions address some of the major questions about digitisation received by State Records’ staff. The FAQs will be added to as new questions arise. To submit potential questions and/or answers, please email firstname.lastname@example.org. Individual sections of the guidelines may also answer your questions.
Note: These FAQs apply to both business process and back-capture digitisation. Hyperlinks to individual sections of the guidelines have not been provided, as you will need to navigate to the relevant guideline (Managing business process digitisation programs or Managing back-capture digitisation projects) first.
1.1 Can I destroy original paper records after I have digitised them?
You may be able to if you can meet all the requirements of General retention and disposal authority: imaged records. See Disposal of original paper records in the relevant guideline for more information.
1.2 Can I destroy original paper records that were digitised before General retention and disposal authority: imaged records was issued?
If the originals are eligible for destruction in accordance with General retention and disposal authority: imaged records, and they meet all its conditions, they may be destroyed.
Check documentation about the digitisation process and quality control measures taken, so that if required, you will be able to demonstrate that these images are authentic, complete and accessible, and that the other conditions for destruction were met. If you do not have adequate documentation, it may be prudent to retain the originals.
Remember, originals that are ‘required as State archives’ or to be retained in agency that were created or received prior to January 1, 2000 are not eligible for destruction after digitising under General retention and disposal authority: imaged records.
1.3 If I am authorised to destroy the original paper records, how long should I keep them after digitisation before destroying them?
The original paper records should be kept for a period of time for quality control purposes. This is to allow for quality checks of the images and provides an extra safeguard in case of loss of images in the copying or registration process.
Your organisation should determine an appropriate period for retaining originals for quality control purposes. This period should be based on an assessment of the:
- level of assurance that a full and accurate record has been created
- level of assurance that the digitised image is being well managed in a recordkeeping system
- robustness of digitisation processes, including quality assurance processes
- level of assurance that the authenticity is being maintained (determined through results of quality assurance processes)
- need for access to the original paper records for other purposes such as legal proceedings.
This assessment should be:
- based on an understanding of your organisation's own digitisation and recordkeeping processes
- suitable for the types of business to which the records relate
- determined in consultation with relevant business units.
State Records recommends a minimum retention period of six months. For low risk records, this may be reduced to one or three months.
Note: It may be best to be initially cautious – as you become more experienced with digitisation you will be able to more easily determine what time period is appropriate.
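As a rough illustration of the arithmetic only, the sketch below works out the earliest date on which originals could be considered for destruction once a quality control period has been chosen. The function names and the risk labels are hypothetical. The six-month value follows the recommended minimum above, and three months is used here as one of the two reduced options for low risk records; your organisation would substitute the periods it has determined through its own assessment.

```python
import calendar
from datetime import date

# Assumed periods for illustration: six months is the recommended minimum,
# three months is one of the reduced options mentioned for low risk records.
QC_RETENTION_MONTHS = {"standard": 6, "low_risk": 3}


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter (e.g. 31 August + 6 months), the day is
    clamped to the last day of that month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_destruction_date(digitised_on: date, risk: str = "standard") -> date:
    """End of the quality control period for originals digitised on a given day."""
    return add_months(digitised_on, QC_RETENTION_MONTHS[risk])


if __name__ == "__main__":
    print(earliest_destruction_date(date(2024, 1, 15)))
    print(earliest_destruction_date(date(2024, 1, 15), risk="low_risk"))
```

Reaching this date does not by itself authorise destruction; the other conditions of the Authority still need to be met.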
Where original paper records are destroyed, the digitised copies of the records must be retained for the records' full retention periods, as required in the relevant retention and disposal authority.
1.4 My organisation keeps day boxes. Can these be destroyed?
Day boxing is a common practice for business process digitisation. It involves scanning records as they are received (i.e. as part of business process digitisation) and placing the originals in a ‘day box’ or batch. Records should be covered by retention and disposal authorities if they are to be day-boxed.
In back-capture digitisation projects, if you are destroying the original paper records after digitisation you may box records after scanning, but in all likelihood you will box them according to scanning batches. You should not use day boxing where originals are to be retained (they will usually need to be reconstructed).
It is a condition for destruction in the General retention and disposal authority: imaged records that original records awaiting destruction are kept for a certain period of time for ‘quality control’ purposes after digital imaging has occurred. Your organisation can choose this period (e.g. six months) and should apply it consistently.
Operators doing the digitisation also need to be aware of the types of records excluded from General retention and disposal authority: imaged records, and of any records required for current legal proceedings, so they do not place these records in boxes awaiting destruction. Such records will need to be kept and managed separately.
You will also need to ensure that digital images are registered into a recordkeeping system and given retention periods from the appropriate retention and disposal authority.
1.5 I do not have a recordkeeping system, so am I able to destroy the originals once digitised?
Two of the conditions of General retention and disposal authority: imaged records require copies to be authentic, complete and accessible and kept for the authorised retention period. It is important that you meet these requirements if the digital image of the record will replace the original.
While you may not have a sophisticated recordkeeping system, like TRIM or Objective, if you can meet these conditions of the Authority you do effectively have a recordkeeping system as you can demonstrate that the integrity of the records can be safeguarded over time.
Providing you can meet the other requirements of the Authority that:
- all requirements for retaining originals have been assessed and fulfilled
- originals are kept for quality control purposes for an appropriate length of time after digitisation
you have authorisation to destroy certain original records after digitisation.
If you do not meet these requirements, you are not authorised to destroy any original records after digitisation.
1.6 How do I identify records associated with current legal proceedings or applications under the GIPA Act, HRIP Act or PPIP Act?
General retention and disposal authority: imaged records recommends that these records are considered for retention after digitisation. Your organisation will need to perform a risk assessment to determine whether the original paper records can be destroyed.
If you determine that the original paper records cannot be destroyed, you will need to have measures in place to ensure that these records are identified and managed appropriately.
- Your organisation may need to brief operators and provide them with a list of proceedings or applications which are underway and the records relating to these so they can be kept separately from day boxes.
- Your organisation may establish a system to flag certain files that may be, in future, required for litigation or access applications. For example, if your organisation receives a letter threatening legal action, it may be appropriate to retain the original paper records relating to the matter until the threat of legal action has passed. Likewise, if you are dealing with a client who has brought legal action against your organisation previously, you may flag records about the business conducted.
- A check should be included at the end of the quality control retention period, before originals are destroyed, to determine if they have become the subject of legal proceedings or applications for access.
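The final check described in the last point above could be sketched along the following lines, assuming originals are tracked in some kind of register. The `OriginalRecord` class and its field names are hypothetical, and the check covers only the flags discussed in this section; it is not a substitute for your organisation's risk assessment or for the full conditions of the Authority.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OriginalRecord:
    """Hypothetical register entry for an original awaiting destruction."""
    identifier: str
    qc_hold_until: date                    # end of the quality control period
    excluded_from_authority: bool = False  # e.g. required as State archives
    legal_or_access_flag: bool = False     # proceedings or access applications


def eligible_for_destruction(record: OriginalRecord, today: date) -> bool:
    """True only if the QC period has ended and no flag has been raised."""
    return (
        today >= record.qc_hold_until
        and not record.excluded_from_authority
        and not record.legal_or_access_flag
    )


def split_box(records, today):
    """Separate a box into originals that may proceed and those to keep."""
    proceed = [r for r in records if eligible_for_destruction(r, today)]
    keep = [r for r in records if not eligible_for_destruction(r, today)]
    return proceed, keep


if __name__ == "__main__":
    box = [
        OriginalRecord("A-001", date(2024, 7, 1)),
        OriginalRecord("A-002", date(2024, 7, 1), legal_or_access_flag=True),
    ]
    proceed, keep = split_box(box, today=date(2024, 8, 1))
    print([r.identifier for r in proceed], [r.identifier for r in keep])
```

Running the check on the day of destruction, rather than when the box is packed, allows for flags raised during the quality control period.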
2.1 Can digital images be admissible in court?
In NSW there is no barrier to organisations tendering digital images of records as evidence. They can be considered suitable to submit in legal proceedings, in response to Government Information (Public Access) Act (GIPA) applications, and for other evidentiary purposes.
However, the value or credibility of the digital image as evidence can still be questioned.
The authenticity of a presented record may be challenged or a judge may be given some other reason to doubt the reliability of a digital image.
In these cases, your documentation regarding how the digitisation was conducted and the digital images created and kept may help to demonstrate that the digital image is an authentic and credible representation of the original. See Legal admissibility and credibility of digital images in the relevant guideline for more information.
3.1 What technical standards (resolution, file format, compression etc.) should I use?
The primary goal with technical specifications is to create a legible digital image of sufficient quality for its purpose that can remain legible and useable for as long as required. For long term digital images, this may mean that they need to withstand time and a number of migrations.
In the Technical specifications section of the relevant guideline, you will find an explanation of file formats, resolution, compression etc., a table listing the recommended technical specifications for records, and considerations if departing from these.
3.2 What is the difference between PPI and DPI with resolution?
Resolution is a measure of the ability to capture detail in the original work. It is frequently quantified in pixels per inch (PPI) which is a measurement of resolution for computer display. The higher the PPI the better the resolution and the clearer the image.
Dots per inch (DPI) is often used interchangeably with PPI, but actually refers specifically to measurement of the resolution for computer printers.
Generally the PPI will equate roughly to the DPI.
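To make the relationship between PPI and image size concrete, the following sketch converts a page size and a resolution into pixel dimensions and an approximate uncompressed file size. The A4 page at 300 PPI is an assumed example, not a recommended setting; see the Technical specifications section of the relevant guideline for recommended values.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, ppi: int) -> tuple:
    """Pixel width and height of a page scanned at the given PPI."""
    width_px = round(width_mm / MM_PER_INCH * ppi)
    height_px = round(height_mm / MM_PER_INCH * ppi)
    return width_px, height_px


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Approximate size of the raw image data, before any compression."""
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed example: an A4 page (210 mm x 297 mm) at 300 PPI in 24-bit colour.
    w, h = pixel_dimensions(210, 297, 300)
    print(w, h)                      # 2480 3508
    print(uncompressed_bytes(w, h))  # 26099520 bytes, roughly 25 MB
```

Doubling the PPI quadruples the number of pixels, which is why resolution choices have a large effect on storage requirements.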
3.3 Do I need to digitise in colour?
Not necessarily. You need to consider the particular records in question and determine what the ‘essential characteristics’ of the records are that need to be maintained and present in the digital image.
If it is essential to reproduce the colour in order to understand the records or preserve their evidential value, you will need to digitise in colour. If colour is not essential, you can choose not to digitise in colour.
For example, one organisation had a particular group of records often required in court where colours were vital to understanding annotations on the records, e.g. green pen meant something different to red pen. In this case colour was an essential characteristic of the records that needed to be reproduced. This affected decisions regarding hardware, software and technical (colour management) requirements.
If the colour in a document is just within the letterhead, colour is not an essential characteristic – it is not essential to understanding the record.
3.4 Can I use enhancement techniques for the digital images?
Image enhancement techniques may be employed to make an image more exactly resemble the original. However, benchmarks should document acceptable changes and these must be routinely employed.
If enhancements change the evidential value of a record or if acceptable changes are not documented and routinely employed, the organisation may be subject to challenges that the digital images are not authentic representations of the original paper records.
3.5 Should I use watermarking or fingerprinting for records?
Some digitisation processing and management software may have the ability to modify the appearance of a digital image by adding information such as the date or organisation name. Two such techniques are watermarking and fingerprinting:
- Watermarking is the inclusion of static information on an image at time of storage, perhaps the name of the organisation and date of capture.
- Fingerprinting typically includes information generated when the image is accessed, such as login name of the end user and date / time information.
While this information may be useful, and the inclusion of it as part of the image convenient, these modified images are no longer a true and accurate copy of the original paper records. This is especially relevant where added information, such as a large watermark through the text, makes the content of a record difficult to read.
Organisations should instead retain a digital record as an unmodified representation of the original paper record and capture any additional information as metadata rather than as part of the image.
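One way to picture this approach is sketched below: the image bytes are left untouched, the information a watermark would have carried is recorded as metadata alongside the image, and access details that fingerprinting would have stamped onto the image go into a separate log. The field names are hypothetical and are not drawn from any metadata standard, and the checksum is an added assumption to show how an unmodified image might later be verified.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_capture_metadata(image_bytes: bytes, organisation: str,
                           captured_at: datetime) -> dict:
    """Describe the image without altering it (replaces a visible watermark)."""
    return {
        "organisation": organisation,
        "date_captured": captured_at.isoformat(),
        # Checksum of the unmodified image, so later changes can be detected.
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }


def record_access_event(access_log: list, user: str, accessed_at: datetime) -> None:
    """Log who viewed the image and when (replaces on-image fingerprinting)."""
    access_log.append({"user": user, "accessed_at": accessed_at.isoformat()})


if __name__ == "__main__":
    image = b"placeholder bytes standing in for a scanned image"
    metadata = build_capture_metadata(
        image, "Example Agency", datetime(2024, 1, 2, tzinfo=timezone.utc)
    )
    log = []
    record_access_event(log, "jsmith", datetime.now(timezone.utc))
    print(json.dumps(metadata, indent=2))
```

In practice this metadata would be held in the recordkeeping system rather than in a standalone file.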
3.6 What is the difference between PDF and PDF/A?
PDF/A is an ISO standardised version of the Portable Document Format (PDF) specialised for the digital preservation of digital documents. In PDF/A the proprietary fonts are removed. All the information for displaying a document is embedded rather than linked, so the document can be displayed in the same way in years to come. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts and colour information. A PDF/A document is not permitted to be reliant on information from external sources (e.g. font programs and hyperlinks). Use of standards-based metadata is mandated.
Note: PDF/A can be edited. If your requirements include security or authenticity you should capture the PDF/A records to a recordkeeping system.
4.1 Should I use Optical Character Recognition (OCR) software?
Optical Character Recognition (OCR) allows the text depicted in a scanned document to be extracted as a text file or word processor document. OCR software is required to recognise the text contained in the image and usually provides search and export capabilities. Another advantage of OCR is that it allows you to automate more metadata capture through document definitions.
You will need to consider the aims of your digitisation project or program and the records in question to determine whether OCR will help you meet your organisation's needs. Your budget and the cost of the OCR in comparison to other software will be a factor. In addition, some documents are not suitable for OCR (see 5.3 for examples).
5.1 What quality checks should I perform?
You will need to determine the extent of quality control required based on the risks of your particular digitisation program or project. See Benchmarks and quality assurance in the relevant guideline for more information.
5.2 If I use enhancements such as sharpening, blurring or de-speckling to make the digital images more accurately resemble the original, do I need to check them?
Yes. Techniques such as ‘sharpening’ and/or ‘clipping’ of highlights or shadows, ‘blurring’ to eliminate scratches, ‘spotting’ or ‘de-speckling’ may be used to touch up specific areas of a digital image. Some software may automatically correct imperfections. The extent of these processes can be set through tolerance levels.
Your quality assurance checks should confirm that these processes have not caused information to be lost (for example, if the tolerances are set too high the dots above the letter ‘i’ may be removed). Processes employed should be documented to help ensure that the authenticity and completeness of the images are not at risk of being challenged.
5.3 If I use Optical Character Recognition (OCR) do I need to check for the accuracy of text?
OCR is rarely a fully automated process and may require operator intervention to assist in obtaining an accurate transcription of a scanned record’s text. Documents containing handwriting, serif fonts, halftones and background text or images or those that are damaged or dirty may not be suited to the OCR process.
5.4 What are some common quality faults in digitisation we should plan to prevent?
Quality faults can be categorised as implementation faults, process faults or operator faults.
- Implementation faults can be avoided, providing appropriate procedural controls are in place to guide the digitisation.
- Process faults are normally out of the control of the operator and need to be addressed by a supervisor of the process.
- Operator faults are the day-to-day faults that are made by the operator as they work.
A number of implementation faults can be avoided with appropriate specification of procedures to guide an implementation. These include:
- dirty originals
- incorrect file-size and format, where files are made to the wrong size or with the wrong choice of file format
- compression, where files are made with an inappropriate type or level of compression.
There are a wide variety of process faults that can be caused by many problems within the workflow. These problems can include:
- incomplete or inaccurate specifications or process documentation
- faulty capture hardware (incorrectly calibrated and characterised devices)
- faulty software (inaccurate image processing or faulty image links within database)
- incorrectly established colour management systems
- low quality original data (either non-digital surrogates like a photocopy or legacy digital image files)
- inaccurate source metadata.
Operator faults are caused by some form of operator error within the workflow and can include:
- basic capture faults
  - cropping that cuts into the image, is too loose, or is uneven
  - incorrect orientation of the image, i.e. is the wrong way around, or upside down
  - incorrect exposure of the image, i.e. it is too light or too dark
  - incorrect focus, i.e. the image is out of focus
  - daily calibration, where the capture device has not been calibrated
- basic image processing faults
  - file optimisation faults, where incorrect adjustments are made to the colour, contrast and brightness of the image during processing
  - incorrect file-naming, where image files are incorrectly named or use non-unique names
- basic metadata attribution faults
  - placing digital images into incorrect folders, files or classification structures
  - incorrect data entry, where data is incorrectly entered into the management control system
  - incorrect use of controlled vocabulary, e.g. using words not established within scope notes.
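Some of the operator faults listed above lend themselves to simple automated checks. As one hedged example, the sketch below looks for non-unique file names across scanning batches. The batch folder layout and the decision to compare names case-insensitively are assumptions; your own naming convention may call for a different rule.

```python
from collections import Counter
from pathlib import PurePath


def find_duplicate_names(paths):
    """Return file names that occur more than once, ignoring folder and case."""
    counts = Counter(PurePath(p).name.lower() for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    scanned = [
        "batch1/0001.tif",
        "batch2/0001.TIF",  # same name as above once case is ignored
        "batch2/0002.tif",
    ]
    print(find_duplicate_names(scanned))
```

A check like this only detects name collisions; it does not show whether a file has been named correctly for the record it represents.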
5.5 Should pages of a file or volume be paginated prior to digitisation?
Your organisation should consider whether it is necessary to add page numbers to a file or volume prior to digitisation (if not numbered already). It does require additional effort (and potentially cost if digitisation is outsourced) but this should be balanced with your organisation’s need to have evidence of the exact order of the original papers on the file. A risk assessment should be conducted when planning the project. If records are high risk or long term/archival and/or tend to be requested in court, your organisation may well decide that the additional effort is justified. Pagination can assist you with quality assurance checking. Where original paper records are to be retained after digitisation, it will also assist you to reconstruct the original paper records.
If page numbers are to be added, your organisation’s procedures for digitisation should indicate the acceptable ways of doing this. Pagination of archives is generally done in soft pencil so that it can be removed if necessary. Stamping is not generally recommended for archives as the stamps alter the original, and may obscure text or negatively impact on pictures or images. In some cases you may be able to add page numbers to the metadata of digital images.
If your organisation decides not to paginate, you should (at least) determine the number of pages digitised and compare this to the number of pages in the original records (this can be sampled where relevant). If pages are removed to be digitised in separate batches, e.g. non-standard materials that require a different scanner, flags should be added to ensure pages are returned to the correct order.
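The page count comparison described above can be sketched as a simple reconciliation, assuming you have a manual count for each original file and a count of images produced for it. The file identifiers and the dictionary structure are hypothetical.

```python
def reconcile_page_counts(expected: dict, digitised: dict) -> dict:
    """Compare page counts per file and return only the mismatches.

    Both arguments map a file identifier to a page count. The result maps
    each mismatched identifier to (pages in original, pages digitised).
    A file missing from either side is treated as having zero pages there.
    """
    mismatches = {}
    for file_id in sorted(set(expected) | set(digitised)):
        original_pages = expected.get(file_id, 0)
        image_pages = digitised.get(file_id, 0)
        if original_pages != image_pages:
            mismatches[file_id] = (original_pages, image_pages)
    return mismatches


if __name__ == "__main__":
    counted_by_hand = {"F2011/001": 42, "F2011/002": 17, "F2011/003": 8}
    images_created = {"F2011/001": 42, "F2011/002": 16}
    print(reconcile_page_counts(counted_by_hand, images_created))
```

A matching count shows that no pages were skipped or duplicated in total; it does not confirm that the pages are in the original order, which is the evidence pagination provides.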
6.1 What are my organisation's responsibilities if digitisation is outsourced?
Public offices are responsible for meeting the requirements of the State Records Act and standards released under the Act. If your organisation outsources the digitisation of records, it is still responsible for the management of both the source records and the digital images. Therefore, you need to ensure that all relevant requirements are specified in contracts with providers.
6.2 What specifically should I address in my contracts with outsourced providers?
In order to ensure that a digitisation project is managed suitably, it is recommended that contracts contain:
- clear guidance on the range and type of records to be digitised
- clear timeframes, costs and expectations including that records should not be altered in any way
- roles and responsibilities of the organisation and service providers
- special requirements for sensitive or personal records or urgent requests
- benchmarks, e.g. technical and metadata requirements
- quality assurance measures (including early checks of samples and remediation required if benchmarks are not met)
- an agreed monitoring framework
- a statement that all State records and State archives must remain in NSW unless express permission is given by State Records to take them out of the State. See General authority: transferring records out of NSW for storage with or maintenance by service providers based outside of the State.
Service providers must be made aware of relevant standards, including:
- Standard on the physical storage of State records, Principle 6: Careful handling, Minimum compliance requirement 4: ‘Records are handled carefully during conversion and converted according to recognised standards’
- Standard on digital recordkeeping which specifies the minimum requirements for digital recordkeeping systems and metadata
- General retention and disposal authority: imaged records which specifies the conditions for destroying original records after digitisation.
The monitoring framework that forms part of the contract should ensure that all recordkeeping requirements are met throughout the term of the agreement.
If you are intending to digitise records required as State archives, you should consult State Records. This is to ensure that the specific issues concerning the reproduction of archival records can be discussed and suitable parameters for the digitisation process agreed upon.
7.1 What extra things should I consider when conducting large scale back-capture projects?
With large scale back-capture projects it can help to break them down into smaller components. The parameters of these components should be clearly defined and measurable. This way you can learn as you go along and lessons learnt can be applied to later parts of the project. Segmenting your project may also allow you to use your resources more efficiently.
If you have records in different formats or of different ages requiring different equipment or methods of capture they can form different parts of the project.
If records are still being used frequently, remember to consider how long they can be unavailable and whether certain records can be digitised on request, even if this is out of sequence.
If you have multiple parts of the project being performed simultaneously you will need to have very efficient monitoring mechanisms in place.
See the Case study: Housing NSW - Outsourcing digitisation of client files [coming soon] as an example of the considerations required for large scale digitisation.
 Queensland State Archives, Digitisation Disposal Policy Toolkit, Quality Assurance Guideline, May 2010, section 3, available at: http://www.archives.qld.gov.au/government/files/Digitisation%20Disposal%20Policy%20Toolkit%20-%20Quality%20assurance%20guidance.pdf
 Legal advice provided by the NSW Crown Solicitor’s Office to Housing NSW. Reproduced with the kind permission of Housing NSW.
 Archives New Zealand, Digitisation standard, 2007, Appendix 7, available at: http://archives.govt.nz/advice/continuum-resource-kit/continuum-publications-html/s6-digitisation-standard, p.16.
 Queensland State Archives, Digitisation Disposal Policy Toolkit, Quality Assurance Guideline, May 2010, op.cit., section 2.2.1
 Wikipedia, PDF/A, available at: http://en.wikipedia.org/wiki/PDF/A. See also http://digitalpreservation.gov/formats/fdd/fdd000125.shtml and http://www.appligent.com/talkingpdf-how-to-implement-pdfa for more information about PDF/A.
 Queensland State Archives, op.cit., section 2.2.1
 Archives New Zealand, Digitisation standard, 2007, op.cit, Appendix 7
 National Archives of Australia, Digitising accumulated physical records, Commonwealth of Australia, 2011, available at: http://www.naa.gov.au/Images/Digitising%20accumulated%20physical%20records%2C%20April%202011_tcm16-47278.pdf, p13.