A State Archives and Records initiative for the NSW Government

NSW Information Commissioner and Open Data Advocate launches NEW Open Data e-learning module

5 June 2018 - 10:11am

By Cameron Duffy
Communications and Promotion Officer (IPC)

NSW Information Commissioner and Open Data Advocate, Elizabeth Tydd, has launched a new e-learning module on Open Data in collaboration with the Department of Finance, Services and Innovation (DFSI).
‘Launched in May to close out Information Awareness Month (IAM), our new Open Data e-learning module is an opportunity to increase public awareness of information and its place in all aspects of daily life and to promote information practices and policies to support sound information management across organisations,’ said Ms Tydd, NSW Open Data Advocate and CEO of the Information and Privacy Commission NSW (IPC).

‘Transparency of government actions – sound practices for information access and information sharing are central to building trust and achieving an effective democratic system.

‘Our challenge as custodians of government information is to embrace the ‘digital world’ and apply its benefits to promote accountability, deliver better services, engage with the community and, at the same time, ensure our systems protect information privacy and security.
‘Building trust and confidence in our ability to ethically and effectively manage information in the digital age is essential to advancing Open Government. Our new e-learning module is also designed to elevate knowledge of sound information governance,’ Ms Tydd said.

The IPC is promoting good governance through the release of a new, freely available Open Data e-learning resource.

‘DFSI are leading the state’s work in better understanding and ensuring accountability for using and sharing Open Data. This e-learning resource has been developed in line with the NSW Open Data Policy and is being delivered under our commitment to provide education and training to our stakeholder groups across NSW information access and privacy legislation,’ Ms Tydd said.

‘I am pleased to launch the new Open Data module which has been designed to provide an understanding of Open Data along with an explanation of how public sector organisations can embed good information practices to support Open Data release in NSW.

‘Open Data offers great potential value to the community and government. The benefits are diverse, ranging from improved efficiency to greater public participation in the development of government policies and community services.
‘I encourage all public sector employees to complete the Open Data e-learning course, available for free on the IPC website,’ Ms Tydd said.

More information and resources on information access and privacy rights in NSW are available at http://ipc.nsw.gov.au/

Recordkeeping FAQs – Do the European Union’s new privacy laws apply to NSW public offices?

11 May 2018 - 12:11pm

The European Union’s General Data Protection Regulation (GDPR) is new data privacy legislation introduced to protect the personal data of all citizens across the EU. The GDPR comes into effect in a matter of weeks, on May 25th 2018.

Although a European law, the GDPR is designed to have extra-territorial reach and may apply to some NSW public offices, such as universities.

As this month is Information Awareness Month, it is more relevant than ever to ensure that NSW public offices are remaining up to date with new privacy regulations. According to NSW Information Commissioner Elizabeth Tydd, Information Awareness Month is a “timely reminder” of the importance of good governance and best practice around information management. Privacy frameworks such as the GDPR are designed to promote and regulate precisely this.

The Information and Privacy Commission (IPC) has created a fact sheet to inform NSW public sector agencies of their responsibilities in regard to the new legislation. The fact sheet includes answers to a few of the common questions about the GDPR, such as:

Does my public sector agency need to comply with GDPR?

How is the GDPR different to NSW privacy laws?

What are the risks of not complying with the GDPR?

For answers to these questions and more information about how the GDPR could affect the NSW public sector, visit the IPC’s fact sheet.

Photo by Dennis van der Heijden

Welcome to Information Awareness Month 2018

1 May 2018 - 9:24am

May is Information Awareness Month (IAM). The purpose of IAM is to increase public awareness of information and its place in all aspects of daily life.

This year’s theme, Trust in the Digital World, highlights the key role information plays in building trust in digital technologies.

To celebrate IAM, we will be posting information on how we can build trust in the digital world. So keep an eye on the blog during May.

In the meantime, you can read one of our blog posts regarding trust: Trust no one? The truth is out there.

You can also read the NSW Digital Strategy which provides details on how the NSW Government is approaching the design and delivery of its services. It also includes information on cyber security’s role in ensuring that government services provided digitally stay safe, secure and trustworthy.

The presentations and podcasts from the Records Managers Forum held on 28 March 2018 are now available

23 April 2018 - 2:21pm

The Records Managers Forum provides an opportunity for NSW public sector records professionals to share stories, discuss issues of current concern and impart strategies on key records and information management initiatives.

The Forum included presentations from:

  • Nicola Forbes, Principal Manager Information Services and Records, Transport for NSW
  • Lewis Dryburgh, Records and Information Manager, NSW Treasury
  • Peter Donnelly, Information Services Officer and Right to Information Officer, Information & Privacy Commission NSW and Michael King, Principal Records Manager, Department of Family & Community Services
  • Catherine Robinson, Senior Project Officer, Government Recordkeeping, NSW State Archives and Records.

Nicola presented the Information Toolkit for Transport Projects recently developed by her unit, Transport Shared Services. The Toolkit comprises a suite of tools designed to assist Project Delivery Offices, including staff and contractors, in embedding information and records management responsibilities.

Lewis shared the results of his recently completed Masters research project, which investigated information management practices in technology start-ups. The information practices of young professionals in IT start-ups that Lewis described provide interesting insights into the challenges and perceptions we face when introducing recordkeeping practices to new recruits to government.

In their presentation, Peter and Michael introduced the Community of Records Management Professionals – its mission, charter, and benefits for the members, workforce and the sector.

Catherine introduced the Code of Best Practice for Recordkeeping, based on AS ISO 15489.1:2017. Catherine provided a short summary of changes and implications, and also explained the importance of adopting the Code.

You can find the presentations and podcasts here.

As always, please don’t hesitate to contact us for more information on the presentations or if you have something to share with us.

Image credit: Community can be beautiful by Alan Levine 

Using auto-classification to classify unmanaged records

18 April 2018 - 1:49pm

Last week the Digital Implementers Group enjoyed a presentation by one of the members of the Group on auto-classification.

Following the end of a service provider’s contract, a government agency received the property service records that the provider had been creating and managing for ten years. These records consisted of over 400,000 electronic documents contained in 31,000 folders, some up to 14 levels deep. Many of the records did not have consistent titling or match the agency’s own records classification scheme. Due to the impending transition to a new service provider, the records needed to be migrated and classified in a matter of months.

With the scope and timeframe of this migration project rendering manual classification out of the question, it was the perfect opportunity to trial auto-classification.

How the auto-classification system worked

The project team chose to leverage existing investment in TRIM and pilot the use of the auto-classification module. The rationale was that the TRIM auto-classification module was more affordable than procuring a new system as it only required upgrading an existing system.

The auto-classification solution that the team used involved three components. The first stage was an Optical Character Recognition (OCR) program which transformed image files into readable text. The file was then indexed by a content indexing server, and finally forwarded to the auto-classification module to be classified.
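To make the pattern concrete, here is a minimal sketch of the same three-stage pipeline (OCR, index, classify) in Python. It is illustrative only: the agency used TRIM's proprietary module, whereas this sketch assumes the pytesseract OCR library and a couple of invented keyword rules.

```python
# Illustrative sketch only: a three-stage pipeline of OCR -> index -> classify,
# standing in for the TRIM auto-classification module described above.
# pytesseract and the keyword rules below are assumptions for the example.
from pathlib import Path

import pytesseract      # pip install pytesseract (requires the Tesseract engine)
from PIL import Image   # pip install pillow

# Hypothetical classification rules: terms that suggest a classification.
RULES = {
    "Cleaning": {"cleaning", "janitorial", "waste"},
    "Maintenance": {"repair", "maintenance", "inspection"},
}

def ocr_stage(image_path: Path) -> str:
    """Stage 1: transform an image file into readable text."""
    return pytesseract.image_to_string(Image.open(image_path))

def index_stage(text: str) -> set[str]:
    """Stage 2: build a simple term index (lower-cased word set)."""
    return {token.strip(".,;:").lower() for token in text.split()}

def classify_stage(terms: set[str]) -> str:
    """Stage 3: pick the class whose terms overlap most with the document."""
    scores = {label: len(terms & vocab) for label, vocab in RULES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unclassified"

if __name__ == "__main__":
    for path in Path("scanned_records").glob("*.tif"):
        print(path.name, classify_stage(index_stage(ocr_stage(path))))
```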

While the OCR component of the project was slower and resource-heavy, there was still a strong business case to be made, as the OCR component made documents searchable that were not searchable previously.

An agile, continuous process for refining terms

The accuracy of the auto-classification system relied on the definition of a set of terms. When a collection of terms was identified in a record, the system filed it under the corresponding classification.

Initially, the team allowed the auto-classification program to train itself to define the terms for each category. This approach was not successful, as the module identified many unknown or garbage terms. A subject matter expert then manually entered the terms they would expect to see for each classification. This was the most resource-heavy part of the project and the most critical to its success. Refining the terms, feeding new documents through, observing the results and then refining the terms again was an agile, continuous process.
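As a rough illustration of how expert-defined, weighted terms might drive such a classifier, the sketch below uses invented terms, weights and a score threshold; the project's actual term lists and the TRIM module's internal mechanics are not published.

```python
# Hypothetical weighted-term classifier: a subject matter expert supplies terms
# and weights per classification, then refines them after reviewing the results.
TERM_WEIGHTS = {
    "Cleaning":  {"cleaning": 3.0, "graffiti": 2.0, "waste": 1.0},
    "Contracts": {"agreement": 2.0, "tender": 2.0, "invoice": 1.0},
}

def score(document_text: str) -> dict[str, float]:
    words = document_text.lower().split()
    return {
        label: sum(weight for term, weight in terms.items() if term in words)
        for label, terms in TERM_WEIGHTS.items()
    }

def classify(document_text: str, threshold: float = 1.0) -> str:
    scores = score(document_text)
    label, best = max(scores.items(), key=lambda item: item[1])
    return label if best >= threshold else "Needs manual review"

# The refinement loop: run a batch, review misclassifications,
# adjust TERM_WEIGHTS, and repeat.
print(classify("Monthly graffiti removal and cleaning report for depot 12"))
```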

Outcomes

During the testing phase, 5,000-7,000 documents were uploaded into the system and auto-classified in under two hours; however, this rate will change as the team is going to implement a bulk uploader. The OCR component was the most time-consuming part of the process and initially created a bottleneck in processing.

Key learnings: ‘Better but not best’

One of the members of the group asked about the risks of classifying documents which were not in fact ‘records’. Due to the time limitations of the project, the team was unable to triage the documents so proceeded with an ‘over capture’ approach and accepted that ‘non-records’ would be captured.

The outcomes of the auto-classification project were described as ‘better but not best.’ Accepting that the outcomes will always be imperfect was one of the biggest lessons of the project.

Terms are vital

The success of the auto-classification depended on the definition and weighting of the terms involved. For the category ‘Cleaning’, 95% of the records were auto-classified correctly. This was because many terms were specific to that category. Other categories did not work as well, usually due to the duplication of terms across classifications. The team learned that auto-classification systems do not work straight out of the box, and accurate classification only happens when there is good implementation and well-defined terms for the use case.

Importance of a strong business case

One member who had worked on the project explained that their auto-classification system worked best if you were dealing with a ‘mess’ of records. They found that there needed to be a strong business case for spending a large amount of resources on the labour-intensive parts of the project, and this would be hard to justify unless there was a large volume of unorganised records.

Educate stakeholders to manage user expectations

Stakeholders often wanted to know how well the auto-classification system would classify records (e.g. would it correctly classify 9 out of 10 documents?). Due to the variables and unknowns in how the system would work and what the records actually held, that question could not be answered. In the face of these unknowns, it was important to educate stakeholders on the processes being applied and to set expectations low. The team originally estimated the system would correctly classify 50% of records, although system testing is now showing a higher success rate.

What’s next?

The project team can see several other uses for the system. One of the ideas was to integrate the auto-classification system with front end customer service procedures. For example, the system could automatically classify routine forms for business services as soon as they were saved in the system.

Looking to the future, members discussed whether auto-classification could eventually make records managers redundant. Some members thought it could have the opposite effect, as auto-classification could allow records professionals to focus more on aspects of their work such as standards, procedures and programming rather than manual disposal and migration.

Photo by Matthew Paulson

Reducing file share dependencies: the Aboriginal Housing Office’s approach

16 April 2018 - 10:57am

Next to emails, share drives / network drives or file shares are probably the most utilised resource for storing records in any agency. Often they are a nightmare to navigate, let alone manage.

One of the strategies used to solve the problems associated with file shares is to implement an electronic document and records management system (EDRMS). However, having an EDRMS in place doesn’t guarantee a reduction in file share usage.

We are fortunate that the Aboriginal Housing Office shared with us how their Records Management (RM) Program implementation increased adoption of their EDRMS and reduced their file share footprint.

The Aboriginal Housing Office (AHO) is a legislative authority established under the Aboriginal Housing Act 1998. The AHO administers the policies and funding arrangements for Aboriginal community housing in NSW.

Pre-2015 records management in AHO

AHO’s records management was characterised by the following:

  • hard copy records were stored in various locations, and pinpointing where specific records were took a lot of time and effort
  • AHO didn’t maintain its own recordkeeping system
  • staff were not supported or trained in records management procedures and processes
  • file share G:/ was the default repository for most of AHO’s records.

It was increasingly hard for AHO to administer its programs without trustworthy records and robust records management processes.

In 2015, AHO partnered with the Department of Family and Community Services (FACS) to modernise records management through the OneTRIM program.

Records Management Program

The AHO had strong change sponsorship from Shane Hamilton, Chief Executive, which was critical to the success of the RM Program. (Click here to listen to and view Shane’s presentation.) The key elements of the RM Program are outlined below.

File Share Reduction Implementation Timeline

During the RM Program implementation, AHO identified that records were stored in file shares and not in the Records System. A strategy to reduce file share usage was included as part of the RM Program. The strategy consisted of:

  • Policy – all records for comment and approval must go through the Records System, and all records previously saved in file shares must be saved in the Records System. The policy also recommended that file share G:/ become read-only.
  • Stakeholder engagement – business units were consulted, and their business processes and needs were taken into consideration as part of change management. This resulted in the identification of exceptions where the Records System is unable to manage specific records or business processes.
  • Training and communication – staff were given training, and cheat sheets were developed and provided to them. Information relating to the implementation was also communicated. In addition, records management training was included in the induction process.

 

To view this case study as an infographic, click here. To download the infographic in PDF, click here.

Thank you very much to Christine Tran of AHO for sharing their strategy with us!

Case Study – Internal Pilot – Machine Learning and Records Management

20 March 2018 - 10:22am
Motivation

In 2017 State Archives NSW’s Digital Archives team began investigating the application of machine learning to records management. The first deliverable of this project was a research paper published on the Future Proof blog that explored the state of the art (what the technology is capable of) and the state of play (how it is being used in records management). One of the key findings of this report was that, although machine learning has the potential to improve the classification and disposal of digital records, there has been very little adoption of the technology, particularly in New South Wales. In order to promote uptake we committed to a series of internal and external pilots to further explore and showcase machine learning for records management.

This case study documents an internal pilot that the Digital Archives team conducted in November and December 2017. The goal of this pilot was to apply off-the-shelf machine learning software to the problem of classifying a corpus of unstructured data against a retention and disposal authority. The results of this pilot were shared at the December 2017 Records Managers Forum.

Preliminary set-up

One of the constraints of the internal pilot was that we had limited resources: no budget and (very fortunately) an ICT graduate on placement who had recent university experience in machine learning. So in identifying suitable technologies to use in the pilot we looked for low-cost, off-the-shelf solutions. We quickly settled on scikit-learn: a free and open source machine-learning library for the Python programming language. This is a simple and accessible set of tools that includes pre-built classifiers and algorithms. It was fortunate that we had a machine with a big CPU, copious RAM and SSDs to run the model on.

Method

Objective

The goal of the internal pilot was to test machine learning algorithms on a corpus of records that we had previously manually sentenced against a disposal authority. With what level of accuracy could we automatically match the corpus against the same disposal classes?

Corpus

The records that were chosen for the internal pilot had been transferred to the Digital State Archive in 2016 by a central government department. This corpus was unusual in that it contained a complete corporate folder structure extracted from Objective. The full corpus comprises 30 GB of data, in 7,561 folders, containing 42,653 files. At the point of transfer, no disposal rules had been applied to the files (ordinarily we require that only records required as State Archives are transferred to our custody). In a joint effort with the department we manually sentenced the corpus (at a folder level) against the General Retention and Disposal Authority Administrative Records (GA28). The result of this manual appraisal of the folders was a total of 12,369 files required as State archives.

The following options were considered for the internal pilot:

  • to apply all “Required as State archive” classes from GA28 (75 in total). Folders that didn’t fit these classes would be unclassified
  • to apply the subset of “Required as State archive” classes that had been manually identified in the corpus (23 in total). Folders that didn’t fit these classes would be excluded from the corpus
  • to apply all of the GA28 classes (686 in total). To do a complete test of all folders
  • to pre-treat the corpus by removing all folders which would be covered by NAP (Normal Administrative Practice), e.g. duplicates or non-official/private records

The decision was made to pre-treat the corpus, removing all folders covered by NAP (Normal Administrative Practice), and to take the subset of 12,369 files that had been identified as “Required as State archives”, which used only 23 classes of GA28. Further preparation of the subset involved assigning the folder-level classification to the individual files. This was done manually.

Summary table

Break down of the corpus:

 

  • Complete corpus: 42,653 files
  • NAP (Normal Administrative Practice): 25,643 files
  • Corporate file plan: 17,307 files
  • Required as State Archives: 12,369 files
  • Required as State Archives and formats that could be text extracted (i.e. the usable sample set): 8,784 files

 

Text Extraction and Classification steps

 

  1. Text Extraction

To be usable, the documents chosen for analysis needed to be easily text extractable. This was to ensure performance and ease of further text manipulation later in the project. Only 8,784 of the 12,369 files classified as State archives were selected for use, because their file types allowed simple text extraction.

After sorting the sample set, a Python program using various libraries was developed to extract text from the following file types: PDF, DOCX and DOC files.

The text that was extracted from documents was then placed within a single .csv file. The .csv file was divided into three columns: the file name (unique identifier), classification (GA 28 class), and lastly the text extract.
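A minimal sketch of this extraction step follows. The post only says the program used ‘various libraries’, so PyPDF2 and python-docx are assumptions here, and legacy .doc files (which need separate tooling) are omitted; the CSV file and column names are also illustrative.

```python
# Sketch of the text-extraction step: walk a folder, pull text from PDF and DOCX
# files, and write one row per file to a CSV of (file name, GA28 class, text).
# PyPDF2 and python-docx are assumed libraries; legacy .doc files are skipped.
import csv
from pathlib import Path

from PyPDF2 import PdfReader      # pip install PyPDF2
from docx import Document         # pip install python-docx

def extract_pdf(path: Path) -> str:
    return " ".join((page.extract_text() or "") for page in PdfReader(str(path)).pages)

def extract_docx(path: Path) -> str:
    return " ".join(p.text for p in Document(str(path)).paragraphs)

EXTRACTORS = {".pdf": extract_pdf, ".docx": extract_docx}

def build_corpus_csv(source_dir: str, classifications: dict[str, str], out_csv: str) -> None:
    """classifications maps file name -> GA28 class (from the manual folder-level sentencing)."""
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file_name", "ga28_class", "text"])
        for path in Path(source_dir).rglob("*"):
            extractor = EXTRACTORS.get(path.suffix.lower())
            if extractor and path.name in classifications:
                writer.writerow([path.name, classifications[path.name], extractor(path)])
```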

 

  2. Data cleaning

We took a very basic approach to data cleansing. The following concepts were utilised: remove document formatting, remove stop words, remove documents that are not required, and convert all letters to lower case.
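A minimal cleaning function covering those steps might look like the following; the regular expression and the use of scikit-learn's built-in English stop word list are assumptions for illustration.

```python
# Basic data cleansing: strip formatting/punctuation characters, remove stop
# words, and convert everything to lower case.
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

def clean(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop punctuation and formatting characters
    tokens = [t for t in text.split() if t not in ENGLISH_STOP_WORDS]
    return " ".join(tokens)

print(clean("The Quick, Brown Fox jumped over the lazy dog!"))
# -> "quick brown fox jumped lazy dog"
```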

 

  3. Text Vectorisation and Feature Extraction

Text vectorisation is the process of turning text into numerical feature vectors. A feature is a unique quality observed within a dataset; using these qualities we form an n-dimensional vector to represent each document. Text vectorisation is necessary because machine learning and deep learning algorithms can’t work directly with text: it is essential to convert text into numerical values that the machine learning algorithm can understand and work with.

The methodology we used for text vectorisation is termed the bag-of-words approach. This simple model disregards the placement of words within documents and focuses on their frequency instead: each unique word is considered a feature. Each document is then represented as a fixed-length vector over the set of unique words, known as the vocabulary of features. Each position is filled by the count of that word in the document, creating a document-term matrix: a mathematical matrix that describes the frequency of terms occurring in a collection of documents.

Example[1]

Suppose we have the vocabulary that includes the following:

Brown, dog, fox, jumped, lazy, over, quick, the, zebra

Then we are given an input document:

the quick brown fox jumped over the lazy dog

 

Term:        Brown  Dog  Fox  Jumped  Lazy  Over  Quick  The  Zebra
Document 1:    1     1    1      1      1     1      1     2     0

 

The document-term matrix shown above is the numerical representation of the given input document.
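The same document-term matrix can be produced with scikit-learn's CountVectorizer; a small sketch using the example vocabulary above:

```python
# Bag-of-words vectorisation with scikit-learn: each vocabulary word is a feature
# and each document becomes a vector of word counts (a document-term matrix).
from sklearn.feature_extraction.text import CountVectorizer

vocab = ["brown", "dog", "fox", "jumped", "lazy", "over", "quick", "the", "zebra"]
vectoriser = CountVectorizer(vocabulary=vocab)
dtm = vectoriser.fit_transform(["the quick brown fox jumped over the lazy dog"])

print(vectoriser.get_feature_names_out())   # the fixed vocabulary of features
print(dtm.toarray())                        # [[1 1 1 1 1 1 1 2 0]]
```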

  4. Term Frequency Inverse Document Frequency (TF-IDF)

Having a document-term matrix that uses counts is a good but basic representation. One of the biggest issues is that frequently recurring words like “are” will have large count values that are not meaningful to the vector representations of documents. TF-IDF is an alternative method of calculating document feature values for a vector representation. TF-IDF works by calculating the term frequency (the frequency of a particular word within a document) and multiplying it by the inverse document frequency (which decreases the rating of words that appear too frequently across the document set and favours unique or unusual words).

Therefore, once we had created a vocabulary and built the document term matrix (DTM) we applied the TF-IDF approach onto the DTM to increase the weighting of words that are unique to the documents themselves.

Example – Application on a Document-Term Matrix

Let’s say we have a document-term matrix with two documents in it and we want to put TF-IDF weighting on it.

 

Term:   Brown  Dog  Fox  Jumped  Lazy  Over  Quick  The  Zebra
Doc 1:    1     1    1      1      1     1      1     2     0
Doc 2:    0     1    0      1      1     1      0     2     1

 

TF-IDF = Term Frequency * Inverse Document Frequency

Term Frequency = the number of times the word appears within a document.

Inverse Document Frequency = Log (Total number of documents / Number of documents having the particular word)

 

Term     TF (Doc 1)   TF (Doc 2)   Inverse Document Frequency   TF-IDF (Doc 1)   TF-IDF (Doc 2)
Brown        1             0       Log(2/1) = Log(2)               Log(2)              0
Dog          1             1       Log(2/2) = Log(1) = 0              0                0
Fox          1             0       Log(2/1) = Log(2)               Log(2)              0
Jumped       1             1       Log(2/2) = Log(1) = 0              0                0
Lazy         1             1       Log(2/2) = Log(1) = 0              0                0
Over         1             1       Log(2/2) = Log(1) = 0              0                0
Quick        1             0       Log(2/1) = Log(2)               Log(2)              0
The          2             2       Log(2/2) = Log(1) = 0              0                0
Zebra        0             1       Log(2/1) = Log(2)                  0              Log(2)

 

After applying TF-IDF weighting it is clearly visible that words that are unique and provide greater meaning have higher weightings compared to those that don’t.
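In scikit-learn the same weighting step can be applied with TfidfTransformer. Note that its default IDF is smoothed and the resulting vectors are length-normalised, so the exact numbers differ from the plain Log(N/df) worked example above, but the effect is the same: terms shared by every document are down-weighted.

```python
# Applying TF-IDF weighting to the document-term matrix from the worked example.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

vocab = ["brown", "dog", "fox", "jumped", "lazy", "over", "quick", "the", "zebra"]
docs = [
    "the quick brown fox jumped over the lazy dog",   # Doc 1
    "the dog jumped over the lazy zebra",             # Doc 2
]

dtm = CountVectorizer(vocabulary=vocab).fit_transform(docs)
tfidf = TfidfTransformer().fit_transform(dtm)          # smoothed IDF, L2-normalised rows
print(tfidf.toarray().round(2))
```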

 

  5. Training and Test Split

The pilot used a standard split of 75% training data to 25% test data. To begin with, we took 75% of the pre-classified “Required as State archive” content and used this data to train the algorithm and build the model. Once training was complete, the same algorithm and model were used to process the 25% test set. This allowed us to assess how accurately the model performs and to determine a percentage of successful predictions. Our results are shown below.
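A minimal sketch of that split, using scikit-learn's train_test_split on a few invented documents and stand-in classes:

```python
# 75% / 25% train-test split (toy data for illustration only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

texts = ["cleaning contract depot", "lease agreement renewal",
         "cleaning schedule review", "tender for property lease"]
labels = ["Cleaning", "Leasing", "Cleaning", "Leasing"]   # stand-ins for GA28 classes

X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)   # 3 training documents, 1 test document
```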

  6. Machine learning algorithm overview

We used two machine learning algorithms to build our model: multinomial Naïve Bayes and the multi-layer perceptron. These algorithms were chosen as they are widely used for this type of application; a minimal training sketch follows their descriptions below.

  • Multinomial Naïve Bayes

Multinomial Naïve Bayes is part of a family of simple probabilistic classifiers based on Bayes’ theorem, with strong independence assumptions between features.

  • Multi-Layer Perceptron

The multi-layer perceptron is a supervised learning algorithm that learns a non-linear function approximator for either classification or regression.
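Putting the steps together, a hedged end-to-end sketch might look like the following. The CSV file name, column names and hyperparameters are assumptions (the pilot's actual settings are not published), so scikit-learn defaults are used for both classifiers.

```python
# End-to-end sketch: vectorise the prepared corpus, split 75/25, then train and
# compare the two classifiers named above. File/column names are assumptions
# carried over from the earlier extraction sketch.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier

corpus = pd.read_csv("state_archive_corpus.csv")        # file_name, ga28_class, text
X = TfidfVectorizer(max_features=10_000, stop_words="english").fit_transform(
    corpus["text"].fillna("")
)
y = corpus["ga28_class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

for name, model in [
    ("Multinomial Naive Bayes", MultinomialNB()),
    ("Multi-layer Perceptron", MLPClassifier(max_iter=300, random_state=42)),
]:
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print(name,
          "accuracy:", round(accuracy_score(y_test, predictions), 3),
          "weighted F1:", round(f1_score(y_test, predictions, average="weighted"), 3))
```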

 

Statistical Analysis

To demonstrate the results of the internal pilot we have created a confusion matrix and summary result tables to display the comparison of the two algorithms used.

Confusion Matrix

A confusion matrix is a table that summarises how successful a classification model’s predictions were, i.e. the correlation between the actual labels and the model’s classifications. One axis of a confusion matrix is the label that the model predicted, and the other axis is the actual label. The size of the table on both axes represents the number of classes. Note: the confusion matrices presented below are representative but aren’t the exact ones used to determine the results.

Confusion matrices contain sufficient information to calculate a variety of performance metrics, including precision and recall. Precision identifies the frequency with which a model was correct when predicting the positive class and recall answers the following question: out of all the possible positive labels, how many did the model correctly identify?
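A small illustration of how these metrics can be computed with scikit-learn, using invented labels rather than the pilot's data:

```python
# Confusion matrix, precision and recall with scikit-learn (toy labels for illustration).
from sklearn.metrics import confusion_matrix, precision_score, recall_score

actual    = ["Cleaning", "Cleaning", "Leasing", "Leasing", "Cleaning"]
predicted = ["Cleaning", "Leasing",  "Leasing", "Leasing", "Cleaning"]

print(confusion_matrix(actual, predicted, labels=["Cleaning", "Leasing"]))
# rows = actual class, columns = predicted class:
# [[2 1]
#  [0 2]]

# Macro-averaging across classes is an illustrative choice, not the pilot's.
print("precision:", precision_score(actual, predicted, average="macro"))
print("recall:   ", recall_score(actual, predicted, average="macro"))
```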

Multinomial Naïve Bayes

  • 5,000 features, pre data cleaning: best accuracy 65.4%; F1 score 0.624; training time 109 ms
  • 5,000 features, cleaned data: best accuracy 69%; F1 score 0.648; training time 108 ms
  • 10,000 features, pre data cleaning: accuracy 64%; F1 score 0.622; training time 111 ms
  • 10,000 features, cleaned data: accuracy 68%; F1 score 0.638; training time 109 ms

Multi-Layer Perceptron

 

  • 5,000 features, pre data cleaning: accuracy 77%; F1 score 0.767; training time 2 min 23 s
  • 5,000 features, cleaned data: accuracy 82.7%; F1 score 0.812; training time 2 min 43 s
  • 10,000 features, pre data cleaning: accuracy 78%; F1 score 0.777; training time 3 min 28 s
  • 10,000 features, cleaned data: accuracy 84%; F1 score 0.835; training time 4 min 02 s

 

Key Description

F1 Score[2]: a measure of the model’s accuracy. It combines the precision p and the recall r of the test to compute the score as F1 = 2pr / (p + r).

Results

The pilot results gave us some pleasing statistics, with a maximum 84% successful hit rate using the multi-layer perceptron algorithm. The pilot gave us the opportunity to compare two algorithms and to assess how both uncleaned and cleaned data performed with those algorithms. The results demonstrate that this technology is capable of assisting with the classification and disposal of unclassified, unstructured data.

Discussions

 

The following points capture considerations, limitations and possible future directions for the use of machine learning in records management:

  • Any error made in the training data during sentencing will be compounded by the model over time. The same applies to any intentional bias introduced into the training data.
  • A large training set of classified data is needed to achieve good results on the test data.
  • If cloud services are used, understanding all of their terms of service before using them is very important, especially around the personal privacy of individuals and the legal ownership of the data being stored.
  • The corpus was manually sentenced at folder level, with only a sampling of individual documents, whereas the model was able to sentence directly at document level in a much timelier manner.
  • Sufficient computational capacity is needed on local machines to process the model.
  • Exceptional results were achieved from only around 100 lines of code, given enough expertise and the correct algorithm.
  • Could we build a GA28 machine learning ‘black box’ to help agencies manage administrative records?
  • Do we know what the sentencing success rate was in the paper paradigm, with manual human sentencing, and how would that compare with machine learning technologies?

Acknowledgements

I would like to thank and acknowledge the work of Malay Sharma (ICT Graduate) who was on a rotational placement just at the right time.

[1] https://stackoverflow.com/questions/17053459/how-to-transform-a-text-to-vector (Accessed on 1/12/2017)

[2] https://adamyedidia.files.wordpress.com/2014/11/f_score.pdf (Accessed on 5/12/2017)

 

Top digital trends affecting records, information and content management

12 March 2018 - 12:21pm

We recently attended an expo where a “big picture” view of the current digital trends was presented, and we saw how these trends relate to product developments in the records, information and content management space.

Some of the trends discussed were based around users and how they use and demand technology:

  • Users are changing the way information technology (IT) works: they demand technology that is always on, connected and providing real-time information.
  • Users are also changing procurement models: software applications are now offered on the web in a subscription-based model without requiring any assistance from the IT department. From a recordkeeping perspective, agencies need to assess their business needs, the value and sensitivity of their business activities, and the records created before purchasing and implementing any web applications.
  • Providers are investing in the “app-ification” of traditional on premise applications to ensure buy-in from users. “App-ification” ensures that the software will be intuitive and accessible on mobile devices.

The deployment model for records, information or content management systems is shifting from on premise to hybrid to purely cloud environments. This shift could make the integration of recordkeeping systems with software-as-a-service applications simpler. However, before moving your recordkeeping systems into the cloud, ensure that you have satisfied the conditions of the general authority on transferring records out of NSW (GA35).

There is a big focus on data analytics software and how these applications are evolving. These applications are:

  • providing data visualisation functionality wherein graphs, maps and dashboards can be created and manipulated
  • enabling sharing of insights for decision-making and collaboration purposes
  • enabling delivery of insights/reports in any format to any device.

Automation of metadata and records management classification

We are following this trend closely as it has the potential to change recordkeeping as we know it. The premise is that the new technologies can automatically and consistently apply metadata to records and information that can help with search and faceted browsing.

With the use of machine learning technologies, records management classification with trigger-based retention will be consistently applied. NSW State Archives and Records presented the results of an in-house pilot on the application of machine learning in last year’s Records Managers Forum.

What’s in it for me?

Quite simply, we need to learn new skills. With how rapidly the technology is progressing, it seems everyone needs to be a data scientist, business analyst, user experience specialist or software developer and, of course, a legal expert to interpret the terms and conditions attached to purchasing applications.

Image credit: Quantum Computing by Kevin Dooley Attribution (CC BY 2.0)

‘Recordkeeping by design’ – opportunities for local government

26 February 2018 - 4:18pm

As we have noted before, ‘digital transformation’ is a priority in NSW. At the State level, the NSW Digital Government Strategy aims to see a ‘digital by design’ approach embedded across the NSW public sector.

The NSW Government’s goal is for people to transact with government via digital channels wherever possible. Four years ago, 44% of government transactions occurred via digital channels. The Government wants this level to reach 70% by 2019.

Local government is also caught up in the digital transformation, with many councils seeing their data as a major organisational asset that can drive innovation and productivity.  Last week I spoke at a conference for local government information managers on the recordkeeping opportunities and risks associated with digital transformation. Here is some of what I said:

Local councils in NSW must create and keep full and accurate records of their business. They must keep these records for as long as needed to meet regulatory, business and community requirements. And they must keep some records for ever as State archives.

But the transition to digital processes is not always accompanied by good recordkeeping. Will records be fit for purpose? Will they be complete and trustworthy? Will organisations be able to access and read relevant records when required? Will organisations protect records from unauthorised access or deletion? And will organisations destroy those records they no longer need?

An organisation might have had robust processes for creating and keeping records when the process was paper-based: officers may have placed completed forms on a paper file and stored the file in secure conditions until it could be legally destroyed. But how will the organisation maintain records of electronic forms submitted via its website? Does the organisation have a process for capturing the data from submitted forms in a digital system? Does this system have adequate controls to ensure the authenticity and reliability of the data? Can this system maintain the data for as long as needed? Can this system protect the data from unauthorised access and deletion? And what will happen to the data when the organisation moves to a new system?

The transition to digital ways of working provides opportunities to consider ‘recordkeeping by design’. When digitising a process, organisations can identify the records they need to create and keep. Organisations can then design effective ways of creating and keeping these for as long as required.

To highlight how this might work in practice, let’s look at a few examples:

Business systems

In NSW many local councils were early adopters of dedicated systems for functions ranging from HR and financial management to asset management, planning and cemetery management.

When effectively implemented these systems provide users with real-time, relevant information they need to do their jobs. They can also provide opportunities to embed recordkeeping ‘behind the scenes’ – users can simply use the systems to do their jobs, confident in the knowledge that full and accurate records are being created and kept.

To take a ‘recordkeeping by design’ approach, organisations first need to determine what records they need to create and keep. Are there legislative or other requirements for creating particular records, or for keeping particular records for defined periods of time?

Let’s take the provision of childcare services as an example. Many councils operate childcare facilities. In NSW at the moment, councils need to keep certain childcare-related records, including records of complaints, until the child reaches or would have reached 25 years of age.

If we think about a system for managing the resolution of complaints:

  • Can the system keep a fixed and complete version of each record of complaint? In a document-based system, this could be relatively straightforward. In other systems the ‘record’ might be a collection of data representing each complaint.
  • Is the system designed around the principle of ‘non-redundancy’, so users can update information without keeping a record of previous inputs? If so, to keep fixed and complete versions of records you may need to periodically export a report of the data and retain this as the record.
  • Does the system capture and keep core recordkeeping metadata? This includes information about the business context in which records were created and used, such as who created the records and when.
  • Can the system prevent ad hoc deletion of data? If it can’t, the integrity of the records will be compromised.
  • Does the system generate, log and show all actions carried out in the system? This includes information about what changes were made, when and by whom. This kind of information is critical to ensuring that your organisation can account for how complaints were resolved.
  • As the records need to be kept for long periods of time it is likely that they will outlive the system. What export functionality does the system have? Is it capable of exporting data and system logs without compromising their quality and integrity?

NSW State Archives has published a checklist for assessing business systems – organisations can use this checklist to determine if a specific system has adequate recordkeeping functionality and, if not, what mitigation actions are required.

Social media

Social media provides an excellent opportunity for local councils to engage with residents on a variety of subjects. This engagement can occur in real time and be reciprocal, allowing councils to provide timely updates and information and residents to provide immediate feedback.

A recent skim of my local council’s Twitter account revealed discussions between Council and residents about playgroups, instances of illegal parking, a mistakenly issued parking ticket, resolutions from Council meetings, the redevelopment of the aquatic centre and upcoming talks at the library.

Social media channels such as Twitter and Facebook create records by default. They also automatically create contextual metadata, such as information about who created the records and when.

Many records in social media accounts document low risk business activities and have short retention periods (e.g. Tweets advertising events at the library). For these types of records, an appropriate recordkeeping strategy may be to leave them in the social media application and rely on two things:

  • the application continuing to exist in the short to medium term
  • the application keeping the records for you.

However some interactions on social media may require more interventionist strategies when it comes to keeping records (e.g. a discussion about a mistakenly issued parking ticket). NSW councils need to keep records relating to complaints that require investigation or a specific response for 7 years. If the council manages records relating to complaint investigation and resolution in a specific system, it may be a better strategy to capture a record of this engagement on Twitter in their complaints management system.

NSW State Archives has published advice on managing records arising from the use of social media – this includes working out what social media records you need to keep and for how long, and developing strategies for capturing and storing these. Our general advice is: if you need it, manage it; if you don’t need it, leave it.

In local government, the management of social media records can be complicated if a councillor uses their personal social media account to conduct council business. In NSW, councillors must make and keep records of any council business they conduct. If they are doing significant council business via their personal social media account, the council may need to extract records from the account and keep them in council recordkeeping systems.

The use of personal accounts to transact organisational business is an issue that extends beyond the use of social media and applies to all organisations, not just local councils. We have seen recent examples of ministers in Queensland and Canada in trouble for using their personal email accounts to do government business. And we know that many organisations have identified the use of personal email accounts as a risk to their corporate information.

Some organisations have taken the step of prohibiting their employees from using personal email accounts for work purposes. Others have taken a more pragmatic approach, and instead advise employees that they must copy any work-related messages from their personal accounts to corporate systems.

The cloud

Digital transformation often involves moving business to cloud-based technologies. Organisations are implementing cloud-based customer relationship management systems, HR systems and finance systems. Organisations are moving their email to the cloud. And many organisations are moving away from on premise storage of documents to the use of Microsoft Office 365, One Drive or Google Suite for Business.

The uptake of cloud-based services is a key component of ICT strategy in many organisations. The transition to the cloud results in streamlined procurement, more effective pricing, agility and scalability, and greater flexibility for organisations in how they consume services.

For users, it also offers opportunities to access information from multiple devices and locations – users no longer need to be sitting at their desk in the head office to log on to a system. In local government we see, for example, that employees who spend their days out and about (like rangers or maintenance officers) are reaping benefits from using cloud-based systems to log data on the go.

The use of cloud-based services does not diminish or remove the statutory responsibilities of local councils in NSW to make and keep full and accurate records of their activities and ensure the safe custody and proper preservation of these records. And there are information risks associated with the transition to the cloud:

  • organisations may cede control or ownership of their data to cloud providers
  • organisations may be prevented from accessing their data when they need it, or from keeping their data as long as they need it.

NSW State Archives advises that organisations in NSW should think about recordkeeping when procuring cloud based services. In particular, organisations need to put appropriate controls in place to ensure that they will continue to have access to records created and kept in the cloud, and can export records in useable and complete formats at the conclusion of projects or contracts.

We recommend that organisations ask a range of questions before starting to use cloud-based systems, especially if these systems will support high value or high risk areas of business. For example:

  • Can the provider commit to storing and processing your data in specific jurisdictions that are acceptable to your organisation (that have, for example, legal frameworks which are compatible with Australia’s environment)?
  • What form can the data be exported from the system in, and what metadata is exportable?
  • Can the provider assure that no copy of the data is retained by the provider after the termination of the contract?
  • Can your organisation specify data to be destroyed and can the provider give assurance of destruction, such as certificates of destruction?
  • Can the provider assure that your data cannot be used for applications not specified in the contract (e.g. to data match with databases owned by other clients of the provider)?
  • Will your organisation be consulted regarding any third party seeking to have access to your data? And how will third party access to your records be managed, for example if required by a government watchdog organisation in the jurisdiction in which the records are stored?

It is so important that organisations identify information risks BEFORE entering into contracts for cloud-based services, and that organisations develop and implement appropriate mitigation strategies. Mitigation strategies might include:

  • establishing contractual arrangements to manage known risks
  • periodically exporting any data that documents high value and high risk areas of business to an on premise system that has appropriate recordkeeping functionality
  • monitoring contractual arrangements.

The role of risk assessment

Of course, this proactive approach to recordkeeping requires an investment of time, money and people. Organisations need to create appropriate policies and processes, and implement suitable systems, security and storage.

The business value of records must be commensurate with the cost of maintaining them. Prioritising high value and high risk business areas and the records they create provides the clearest opportunity to demonstrate the value of recordkeeping to an organisation.

NSW State Archives encourages organisations to prioritise the work they do to implement effective recordkeeping. Organisations should target records which document and support high risk and high value areas of business, and which are subject to information risks, for appropriate management. Organisations need to:

  • know what their key digital records are
  • keep these records in secure, well-managed systems
  • protect and manage these records for as long as they need them
  • develop migration strategies for information that is needed for the long term.

We have previously posted about the high risk and high value areas of business for local councils. Focussing ‘recordkeeping by design’ efforts in these areas will help to ensure that councils are creating and keeping critical records for as long as needed.

photo by: hehaden