The presentations and podcasts from the Records Managers Forum held 28th March 2018 are now available
The Records Managers Forum provides an opportunity for NSW public sector records professionals to share stories, discuss issues of current concern and impart strategies on key records and information management initiatives.
The Forum included presentations from:
- Nicola Forbes, Principal Manager Information Services and Records, Transport for NSW
- Lewis Dryburgh, Records and Information Manager, NSW Treasury
- Peter Donnelly, Information Services Officer and Right to Information Officer, Information & Privacy Commission NSW and Michael King, Principal Records Manager, Department of Family & Community Services
- Catherine Robinson, Senior Project Officer, Government Recordkeeping, NSW State Archives and Records.
Nicola presented the Information Toolkit for Transport Projects recently developed by her unit, Transport Shared Services. The Toolkit comprises a suite of tools designed to assist Project Delivery Offices, including staff and contractors, in embedding information and records management responsibilities.
Lewis shared the results of his recently completed Masters research project, which investigated information management practices in technology start-ups. The information practices of young professionals in IT start-ups that Lewis described provide interesting insights into the challenges and perceptions we face when introducing recordkeeping practices to new recruits to government.
In their presentation, Peter and Michael introduced the Community of Records Management Professionals – its mission, charter, and benefits for the members, workforce and the sector.
Catherine introduced the Code of Best Practice for Recordkeeping, based on AS ISO 15489.1:2017. Catherine provided a short summary of changes and implications, and also explained the importance of adopting the Code.
You can find the presentations and podcasts here.
As always, please don’t hesitate to contact us for more information on the presentations or if you have something to share with us.
Image credit: Community can be beautiful by Alan Levine
Last week the Digital Implementers Group enjoyed a presentation by one of the members of the Group on auto-classification.
Following the end of a service provider’s contract, a government agency received the property service records that the provider had been creating and managing for ten years. These records consisted of over 400,000 electronic documents contained in 31,000 folders, some up to 14 levels deep. Many of the records did not have consistent titling or match the agency’s own records classification scheme. Due to the impending transition to a new service provider, the records needed to be migrated and classified in a matter of months.
With the scope and timeframe of this migration project rendering manual classification out of the question, it was the perfect opportunity to trial auto-classification.
How the auto-classification system worked
The project team chose to leverage existing investment in TRIM and pilot the use of the auto-classification module. The rationale was that the TRIM auto-classification module was more affordable than procuring a new system as it only required upgrading an existing system.
The auto-classification solution that the team used involved three components. The first stage was an Optical Character Recognition (OCR) program which transformed image files into readable text. The file was then indexed by a content indexing server, and finally forwarded to the auto-classification module to be classified.
While the OCR component of the project was slower and resource-heavy, there was still a strong business case to be made, as the OCR component made searchable documents that previously were not.
An agile, continuous process for refining terms
The accuracy of the auto-classification system relied on the definition of a set of terms. When a collection of terms was identified in a record, the system filed it under the corresponding classification.
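A term-based classifier of this kind can be sketched in a few lines of Python. The category names, terms and threshold below are invented for illustration; they are not the terms used in the project.

```python
# Hypothetical term sets: each classification is defined by the terms
# a subject matter expert would expect to see in its records.
CATEGORY_TERMS = {
    "Cleaning": {"detergent", "mop", "sanitise", "cleaning schedule"},
    "Maintenance": {"repair", "work order", "inspection"},
}

def classify(document_text, threshold=2):
    """Return the category whose terms occur most often, or None."""
    text = document_text.lower()
    best_category, best_score = None, 0
    for category, terms in CATEGORY_TERMS.items():
        # Score a category by how often its terms appear in the text.
        score = sum(text.count(term) for term in terms)
        if score > best_score:
            best_category, best_score = category, score
    return best_category if best_score >= threshold else None
```

In practice each term would also carry a weight, and records that match no category would be routed for manual review.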
Initially, the team allowed the auto-classification program to train itself to define the terms for each category. This approach was not successful, as the module identified many unknown or garbage terms. A subject matter expert then manually input the terms they would expect to see for each classification. This was the most resource-heavy part of the project and the most critical to its success. Refining the terms, feeding new documents through, observing the results and then refining the terms again was an agile, continuous process.
Outcomes
During the testing phase, 5,000 to 7,000 documents were uploaded into the system and auto-classified in under two hours; this rate is expected to change once the team implements a bulk uploader. The OCR component was the most time-consuming part of the process and initially created a bottleneck.
Key learnings
‘Better but not best’
One of the members of the group asked about the risks of classifying documents which were not in fact ‘records’. Due to the time limitations of the project, the team was unable to triage the documents so proceeded with an ‘over capture’ approach and accepted that ‘non-records’ would be captured.
The outcomes of the auto-classification project were described as ‘better but not best’. Accepting that the outcomes will always be imperfect was one of the biggest lessons of the project.
Terms are vital
The success of the auto-classification depended on the definition and weighting of the terms involved. For the category ‘Cleaning’, 95% of the records were auto-classified correctly, because many terms were specific to that category. Other categories did not work as well, usually due to the duplication of terms across classifications. The team learned that auto-classification systems do not work straight out of the box: accurate classification only happens with good implementation and well-defined terms for each use case.
Importance of a strong business case
One member who had worked on the project explained that their auto-classification system worked best when dealing with a ‘mess’ of records. They found there needed to be a strong business case for spending a large amount of resources on the labour-intensive parts of the project, and this would be hard to justify unless there was a large volume of unorganised records.
Educate stakeholders to manage user expectations
Stakeholders often wanted to know how well the auto-classification system would classify records (e.g. would it correctly classify 9 out of 10 documents?). Due to the variables and unknowns in how the system would work and what the records actually held, that question could not be answered. In the face of these unknowns, it was important to educate stakeholders on the processes being applied and to set expectations low. The team originally estimated the system would correctly classify 50% of records, although system testing is now showing a higher success rate.
What’s next?
The project team can see several other uses for the system. One of the ideas was to integrate the auto-classification system with front end customer service procedures. For example, the system could automatically classify routine forms for business services as soon as they were saved in the system.
Looking to the future, members discussed whether auto-classification could eventually make records managers redundant. Some members thought it could have the opposite effect, as auto-classification could allow records professionals to focus more on aspects of their work such as standards, procedures and programming rather than manual disposal and migration.
Photo by Matthew Paulson
Next to emails, share drives / network drives or file shares are probably the most utilised resource for storing records in any agency. Often they are a nightmare to navigate, let alone manage.
One of the strategies used to solve the problems associated with file shares is to implement an electronic document and records management system (EDRMS). However, having an EDRMS in place doesn’t guarantee reduced file share usage.
We are fortunate that the Aboriginal Housing Office shared with us how their Records Management (RM) Program implementation increased EDRMS adoption and reduced their file share footprint.
The Aboriginal Housing Office (AHO) is a legislative authority established under the Aboriginal Housing Act 1998. The AHO administers the policies and funding arrangements for Aboriginal community housing in NSW.
Pre-2015 records management in AHO
AHO’s records management was characterised by the following:
- hard copy records were stored in various locations, and pinpointing where specific records were took a lot of time and effort
- AHO didn’t maintain its own recordkeeping system
- staff were not supported or trained in records management procedures / processes
- file share G:/ was the default repository for most of AHO records.
It was increasingly hard for AHO to administer its programs without trustworthy records and robust records management processes.
In 2015, AHO partnered with the Department of Family and Community Services (FACS) to modernise records management through the OneTRIM program.
Records Management Program
The AHO had strong change sponsorship from Shane Hamilton, Chief Executive, which was critical to the success of the RM Program. (Click here to listen to and view Shane’s presentation.) The RM Program consisted of:
File Share Reduction Implementation Timeline
During the RM Program implementation, AHO identified that records were stored in file shares and not in the records System. A strategy to reduce file share usage was included as part of the RM Program. The strategy consisted of:
- Policy – all records for comment and approval must go through the Records System, and all records previously saved in file shares must be saved into the Records System. The policy also recommended that file share G:/ be made read-only.
- Stakeholder engagement – business units were consulted and their business processes and needs were taken into consideration as part of change management. This resulted in the identification of exceptions where the Records System is unable to manage specific records or business processes.
- Training and communication – staff were given training, and cheat sheets were developed and provided to them. Information relating to the implementation was also communicated. In addition, records management training was included in the induction process.
Thank you very much to Christine Tran of AHO for sharing their strategy with us!
In 2017 State Archives NSW’s Digital Archives team began investigating the application of machine learning to records management. The first deliverable of this project was a research paper published on the FutureProof blog that explored the state of the art (what the technology is capable of) and the state of play (how it is being used in records management). One of the key findings of this report was that, although machine learning has the potential to improve the classification and disposal of digital records, there has been very little adoption of the technology, particularly in New South Wales. In order to promote uptake we committed to a series of internal and external pilots to further explore and showcase machine learning for records management.
This case study documents an internal pilot that the Digital Archives team conducted in November and December 2017. The goal of this pilot was to apply off-the-shelf machine learning software to the problem of classifying a corpus of unstructured data against a retention and disposal authority. The results of this pilot were shared at the December 2017 Records Managers Forum.
One of the constraints of the internal pilot was that we had limited resources: no budget and (very fortunately) an ICT graduate placement who had recent university experience in machine learning. So in identifying suitable technologies to use in the pilot we looked for low cost, off-the-shelf solutions. We quickly settled on scikit-learn: a free and open source machine-learning library for the Python programming language. This is a simple and accessible set of tools that includes pre-built classifiers and algorithms. It was fortunate that we had a machine with a big CPU, copious RAM, and SSDs to run the model on.
Method
The goal of the internal pilot was to test machine learning algorithms on a corpus of records that we had previously manually sentenced against a disposal authority. With what level of accuracy could we automatically match the corpus against the same disposal classes?
The records that were chosen for the internal pilot had been transferred to the Digital State Archive in 2016 by a central government department. This corpus was unusual in that it contained a complete corporate folder structure extracted from Objective. The full corpus comprises 30 GB of data, in 7,561 folders, containing 42,653 files. At the point of transfer, no disposal rules had been applied to the files (ordinarily we require that only records required as State Archives are transferred to our custody). In a joint effort with the department we manually sentenced the corpus (at a folder level) against the General Retention and Disposal Authority Administrative Records (GA28). The result of this manual appraisal of the folders was a total of 12,369 files required as State archives.
The following options were considered for the internal pilot:
- to apply all “Required as State archive” classes from GA28 (75 in total). Folders that didn’t fit these classes would be unclassified
- to apply the subset of “Required as State archive” classes that had been manually identified in the corpus (23 in total). Folders that didn’t fit these classes would be excluded from the corpus
- to apply all of the GA28 classes (686 in total). To do a complete test of all folders
- to pre-treat the corpus by removing all folders which would be covered by NAP (Normal Administrative Practice), e.g. duplicates or non-official/private records.
The decision was made to pre-treat the corpus by removing all folders covered by NAP (Normal Administrative Practice) and to take the subset of 12,369 files identified as “Required as State archives”, which used only 23 classes of GA28. Further preparation of the subset involved manually assigning the classification from the folder level down to the individual files.
Break down of the corpus:
| Data set | Number of files |
| --- | --- |
| Complete corpus | 42,653 |
| NAP (Normal Administrative Practice) | 25,643 |
| Corporate file plan | 17,307 |
| Required as State Archives | 12,369 |
| Required as State Archives, in formats that could be text extracted (the usable sample set) | 8,784 |
Text Extraction and Classification steps
- Text Extraction
To be usable, the documents chosen for analysis needed to be easily text-extractable. This was to ensure performance and ease of further text manipulation later in the project. Only 8,784 of the 12,369 files classified as State archives were selected for use, because their file types allowed simple text extraction.
After sorting the sample set, a Python program using various libraries was developed to extract text from the following file types: PDF, DOCX and DOC files.
The text that was extracted from documents was then placed within a single .csv file. The .csv file was divided into three columns: the file name (unique identifier), classification (GA 28 class), and lastly the text extract.
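As a sketch, the three-column layout can be produced with Python’s standard csv module. The file names, classes and text below are placeholders, not records from the corpus.

```python
import csv

# Placeholder rows: (file name, GA28 class, extracted text) --
# invented values standing in for the real corpus.
rows = [
    ("doc-0001.pdf", "GA28-1.1", "minutes of the committee meeting ..."),
    ("doc-0002.docx", "GA28-2.3", "policy statement on asset management ..."),
]

with open("corpus.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "classification", "text"])  # header row
    writer.writerows(rows)
```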
- Data cleaning
We took a very basic approach to data cleansing. The following concepts were utilised: remove document formatting, remove stop words, remove documents that are not required, and convert all letters to lower case.
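A minimal version of these cleaning steps can be written directly in Python. The stop-word list below is a small illustrative set; the project’s actual list is not documented here.

```python
import re

# Small illustrative stop-word set; a real pipeline would use a fuller list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are"}

def clean_text(text):
    """Lower-case text, strip formatting/punctuation, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())  # keep only alphabetic runs
    return " ".join(t for t in tokens if t not in STOP_WORDS)
```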
- Text Vectorisation and Feature Extraction
Text vectorisation is the process of turning text into numerical feature vectors. A feature is a unique quality observed within a dataset; using these qualities we form an n-dimensional vector to represent each document. Text vectorisation is necessary because machine learning and deep learning algorithms can’t work directly with text: the text must be converted into numerical values that the algorithms can understand and work with.
The methodology we used for text vectorisation is the Bag-of-Words approach. This is a simple model that disregards the placement of words within documents and focuses on their frequency instead. Each unique word is considered a feature, and any document is represented as a fixed-length vector over the set of unique words, known as the vocabulary of features. Each position is filled with the count of the corresponding word in that document, creating a document-term matrix: a mathematical matrix that describes the frequency of terms occurring in a collection of documents.
Suppose we have the vocabulary that includes the following:
Brown, dog, fox, jumped, lazy, over, quick, the, zebra
Then we are given an input document:
the quick brown fox jumped over the lazy dog
| | Brown | Dog | Fox | Jumped | Lazy | Over | Quick | The | Zebra |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Document 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 0 |
This document term matrix shown above is the numerical representation of the given input document.
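The worked example above can be reproduced in a few lines of Python (a library such as scikit-learn does the same job at scale):

```python
# Vocabulary and document from the worked example above.
vocabulary = ["brown", "dog", "fox", "jumped", "lazy", "over", "quick", "the", "zebra"]

def bag_of_words(document, vocab):
    """Count vector: one count per vocabulary word."""
    tokens = document.lower().split()
    return [tokens.count(word) for word in vocab]

vector = bag_of_words("the quick brown fox jumped over the lazy dog", vocabulary)
# vector == [1, 1, 1, 1, 1, 1, 1, 2, 0]
```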
- Term Frequency Inverse Document Frequency (TF-IDF)
Having a document term matrix that uses counts is a good representation but a basic one. One of the biggest issues is that reoccurring words like “are” will have large count values that are not meaningful to the vector representations of documents. TF-IDF is an alternate method of calculating document feature values for a vector representation. TF-IDF works by calculating the term frequency (frequency of a particular word within a document) and then multiplying it by the Inverse document frequency (this helps decrease the rating of words that appear too frequently in the document set and favours unique/unusual words).
Therefore, once we had created a vocabulary and built the document term matrix (DTM) we applied the TF-IDF approach onto the DTM to increase the weighting of words that are unique to the documents themselves.
Example – Application on a Document-Term Matrix
Let’s say we have a document-term matrix with two documents in it and we want to put TF-IDF weighting on it.
| | Brown | Dog | Fox | Jumped | Lazy | Over | Quick | The | Zebra |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Doc 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 0 |
| Doc 2 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 2 | 1 |
TF-IDF = Term Frequency * Inverse Document Frequency
Term Frequency = the number of times the word appears within a document.
Inverse Document Frequency = Log (Total number of documents / Number of documents having the particular word)
| Term | TF (Doc 1) | TF (Doc 2) | IDF | TF-IDF (Doc 1) | TF-IDF (Doc 2) |
| --- | --- | --- | --- | --- | --- |
| Brown | 1 | 0 | Log(2/1) = Log(2) | Log(2) | 0 |
| Dog | 1 | 1 | Log(2/2) = Log(1) = 0 | 0 | 0 |
| Fox | 1 | 0 | Log(2/1) = Log(2) | Log(2) | 0 |
| Jumped | 1 | 1 | Log(2/2) = Log(1) = 0 | 0 | 0 |
| Lazy | 1 | 1 | Log(2/2) = Log(1) = 0 | 0 | 0 |
| Over | 1 | 1 | Log(2/2) = Log(1) = 0 | 0 | 0 |
| Quick | 1 | 0 | Log(2/1) = Log(2) | Log(2) | 0 |
| The | 2 | 2 | Log(2/2) = Log(1) = 0 | 0 | 0 |
| Zebra | 0 | 1 | Log(2/1) = Log(2) | 0 | Log(2) |
After applying TF-IDF weighting, it is clearly visible that words that are unique and carry greater meaning have higher weightings than those that don’t.
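The calculation in the table above can be checked directly in Python. The second document below is invented so that its word counts match Doc 2; only the counts matter for the calculation.

```python
import math

vocabulary = ["brown", "dog", "fox", "jumped", "lazy", "over", "quick", "the", "zebra"]
docs = [
    "the quick brown fox jumped over the lazy dog",  # Doc 1
    "the lazy dog jumped over the zebra",            # stand-in for Doc 2
]

def tf(doc, word):
    """Term frequency: count of the word in the document."""
    return doc.split().count(word)

def idf(word, docs):
    """Inverse document frequency: log(N / number of docs containing the word)."""
    n_containing = sum(1 for d in docs if word in d.split())
    return math.log(len(docs) / n_containing)

tfidf_doc1 = {w: tf(docs[0], w) * idf(w, docs) for w in vocabulary}
# "brown" is unique to Doc 1, so it keeps weight log(2);
# "the" appears in both documents, so its TF-IDF is 0.
```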
- Training and Test Split
The pilot used the standard ratio 75% training data to 25% testing data in its approach. To begin with we took 75% of the pre-classified “Required as State archive” content and used this data to train the algorithm to build the model. Once the training had been completed the same algorithm and model was used to process the 25% test set. This allows us to assess how accurately the model performs and determine a percentage of successful prediction. Our results are shown below.
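With scikit-learn, which the pilot used, the split is a single call. The documents and labels below are dummies for illustration; they are not the pilot’s data.

```python
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the pre-classified "Required as State archive" content.
documents = [f"document {i}" for i in range(8)]
labels = ["class_a", "class_b"] * 4

# 75% training / 25% test, as used in the pilot;
# random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    documents, labels, test_size=0.25, random_state=42)
```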
- Machine learning algorithm overview
We used two machine learning algorithms to build our model: multinomial Naïve Bayes and the multi-layer perceptron. These algorithms were chosen as they are widely used for this type of application.
- Multinomial Naïve Bayes
Multinomial Naïve Bayes belongs to a family of simple probabilistic classifiers based on Bayes’ theorem, with strong independence assumptions between features.
- Multi-Layer Perceptron
Multi-layer perceptron is a supervised learning algorithm that learns a non-linear function approximator for either classification or regression.
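A minimal sketch of training the two scikit-learn classifiers on TF-IDF vectors. The toy corpus and class names below are invented for illustration; the pilot’s real features and GA28 classes are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier

# Toy corpus with invented classes, standing in for the GA28-labelled records.
texts = [
    "invoice payment ledger finance", "budget expenditure invoice finance",
    "employee leave payroll staff", "recruitment position staff employee",
] * 5
labels = ["finance", "finance", "hr", "hr"] * 5

# Build the TF-IDF document-term matrix described earlier.
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(texts)

# Train both algorithms on the same features.
nb = MultinomialNB().fit(X, labels)
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0).fit(X, labels)

new_doc = vectorizer.transform(["payroll and leave for staff"])
```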
To demonstrate the results of the internal pilot we have created a confusion matrix and summary result tables to display the comparison of the two algorithms used.
A confusion matrix is a table that summarises how successful a classification model’s predictions were, i.e. the correlation between the actual label and the model’s classification. One axis of a confusion matrix is the label that the model predicted, and the other axis is the actual label. The size of the table on both axes represents the number of classes. Note: the confusion matrices presented below are representative but aren’t the exact ones used to determine the results.
Confusion matrices contain sufficient information to calculate a variety of performance metrics, including precision and recall. Precision identifies the frequency with which a model was correct when predicting the positive class and recall answers the following question: out of all the possible positive labels, how many did the model correctly identify?
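With scikit-learn, both metrics can be read straight off the predictions. The actual and predicted labels below are invented for a two-class illustration.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Invented actual vs predicted labels for a two-class example.
actual    = ["archive", "archive", "archive", "destroy", "destroy", "destroy"]
predicted = ["archive", "archive", "destroy", "destroy", "destroy", "archive"]

# Rows are actual labels, columns are predicted labels.
cm = confusion_matrix(actual, predicted, labels=["archive", "destroy"])

# Precision: of the 3 'archive' predictions, 2 were correct.
precision = precision_score(actual, predicted, pos_label="archive")
# Recall: of the 3 actual 'archive' records, 2 were found.
recall = recall_score(actual, predicted, pos_label="archive")
```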
Multinomial Naïve Bayes

| | Pre Data Cleaning | Cleaned Data |
| --- | --- | --- |
| Features – 5,000 | Best Accuracy: 65.4%; F1 Score: 0.624; Training Time: 109 ms | Best Accuracy: 69%; F1 Score: 0.648; Training Time: 108 ms |
| Features – 10,000 | Training Time: 111 ms | F1 Score: 0.638; Training Time: 109 ms |

Multi-Layer Perceptron

| | Pre Data Cleaning | Cleaned Data |
| --- | --- | --- |
| Features – 5,000 | F1 Score: 0.767; Training Time: 2 min 23 s | F1 Score: 0.812; Training Time: 2 min 43 s |
| Features – 10,000 | F1 Score: 0.777; Training Time: 3 min 28 s | F1 Score: 0.835; Training Time: 4 min 02 s |
The pilot results gave us some pleasing statistics, with a maximum 84% successful hit rate using the Multi-Layer Perceptron algorithm. The pilot gave us the opportunity to compare two algorithms and to assess how both un-cleaned and cleaned data performed with them. The results demonstrate that this technology is capable of assisting with the classification and disposal of unclassified unstructured data.
Discussions
The following points cover considerations, limitations and possible future directions for the use of machine learning in records management:
- Any error made in the training data during sentencing will be compounded by the model over time. The same applies to any intentional bias in the training data.
- The need for a large training set of classified data to achieve results over the test data.
- Using cloud services and understanding all the terms of services before using them is very important especially around issues of personal privacy of individuals and legal ownership of the data being stored.
- The corpus was manually sentenced at folder level, with only a sampling of individual documents, whereas the model was able to sentence directly at document level in a much timelier manner.
- Having enough available computational volume on local machines to process the model.
- Exceptional results can be achieved from only around 100 lines of code, given enough expertise and the correct algorithm.
- Could we build a GA28 Machine Learning Black Box to help agencies manage administrative records?
- Do we know what the sentencing success rate was in the paper paradigm with manual human sentencing and how would that compare with the machine learning technologies?
I would like to thank and acknowledge the work of Malay Sharma (ICT Graduate) who was on a rotational placement just at the right time.
References:
- https://stackoverflow.com/questions/17053459/how-to-transform-a-text-to-vector (accessed 1/12/2017)
- https://adamyedidia.files.wordpress.com/2014/11/f_score.pdf (accessed 5/12/2017)
We recently attended an expo where a “big picture” view of current digital trends was presented, and we saw how these trends relate to product developments in the records, information and content management space.
Some of the trends discussed were based around users and how they use and demand technology:
- Users are changing the way information technology (IT) works: they demand technology that is always on, connected and providing real-time information.
- Users are also changing procurement models wherein software applications are now offered on the web and in a subscription based model without requiring any assistance from the IT department. From a recordkeeping perspective, agencies need to assess their business needs, value and sensitivity of their business activities, and the records created before purchasing and implementing any web applications.
- Providers are investing in the “app-ification” of traditional on premise applications to ensure buy-in from users. “App-ification” ensures that the software will be intuitive and be accessible with mobile devices.
The deployment model for records, information or content management systems is shifting from on-premise, to hybrid, to purely cloud environments. This shift could make the integration of recordkeeping systems with software-as-a-service applications simpler. However, before moving your recordkeeping systems into the cloud, ensure that you have satisfied the conditions under the general authority on transferring records out of NSW (GA35).
There is a big focus on data analytics software and how these applications are evolving. These applications are:
- providing data visualisation functionality wherein graphs, maps and dashboards can be created and manipulated
- enabling sharing of insights for decision-making and collaboration purposes
- enabling delivery of insights/reports in any format to any device.
Automation of metadata and records management classification
We are following this trend closely as it has the potential to change recordkeeping as we know it. The premise is that the new technologies can automatically and consistently apply metadata to records and information that can help with search and faceted browsing.
With the use of machine learning technologies, records management classification with trigger-based retention will be consistently applied. NSW State Archives and Records presented the results of an in-house pilot on the application of machine learning in last year’s Records Managers Forum.
What’s in it for me?
Quite simply, we need to learn new skills. With how rapidly the technology is progressing, it seems everyone needs to be a data scientist, business analyst, user experience specialist or software developer and, of course, a legal expert to interpret the terms and conditions attached to purchasing applications.
Image credit: Quantum Computing by Kevin Dooley, Attribution (CC BY 2.0)
As we have noted before, ‘digital transformation’ is a priority in NSW. At the State level, the NSW Digital Government Strategy aims to see a ‘digital by design’ approach embedded across the NSW public sector.
The NSW Government’s goal is for people to transact with government via digital channels wherever possible. Four years ago, 44% of government transactions occurred via digital channels. The Government wants this level to reach 70% by 2019.
Local government is also caught up in the digital transformation, with many councils seeing their data as a major organisational asset that can drive innovation and productivity. Last week I spoke at a conference for local government information managers on the recordkeeping opportunities and risks associated with digital transformation. Here is some of what I said:
Local councils in NSW must create and keep full and accurate records of their business. They must keep these records for as long as needed to meet regulatory, business and community requirements. And they must keep some records forever as State archives.
But the transition to digital processes is not always accompanied by good recordkeeping. Will records be fit for purpose? Will they be complete and trustworthy? Will organisations be able to access and read relevant records when required? Will organisations protect records from unauthorised access or deletion? And will organisations destroy those records they no longer need?
An organisation might have had robust processes for creating and keeping records when the process was paper-based: officers may have placed completed forms on a paper file and stored the file in secure conditions until it could be legally destroyed. But how will the organisation maintain records of electronic forms submitted via its website? Does the organisation have a process for capturing the data from submitted forms in a digital system? Does this system have adequate controls to ensure the authenticity and reliability of the data? Can this system maintain the data for as long as needed? Can this system protect the data from unauthorised access and deletion? And what will happen to the data when the organisation moves to a new system?
The transition to digital ways of working provides opportunities to consider ‘recordkeeping by design’. When digitising a process, organisations can identify the records they need to create and keep. Organisations can then design effective ways of creating and keeping these for as long as required.
To highlight how this might work in practice, let’s look at a few examples:
Business systems
In NSW many local councils were early adopters of dedicated systems for functions ranging from HR and financial management to asset management, planning and cemetery management.
When effectively implemented these systems provide users with real-time, relevant information they need to do their jobs. They can also provide opportunities to embed recordkeeping ‘behind the scenes’ – users can simply use the systems to do their jobs, confident in the knowledge that full and accurate records are being created and kept.
To take a ‘recordkeeping by design’ approach, organisations first need to determine what records they need to create and keep. Are there legislative or other requirements for creating particular records, or for keeping particular records for defined periods of time?
Let’s take the provision of childcare services as an example. Many councils operate childcare facilities. In NSW at the moment, councils need to keep certain childcare-related records, including records of complaints, until the child reaches or would have reached 25 years of age.
If we think about a system for managing the resolution of complaints:
- Can the system keep a fixed and complete version of each record of complaint? In a document-based system, this could be relatively straightforward. In other systems the ‘record’ might be a collection of data representing each complaint.
- Is the system designed around the principle of ‘non-redundancy’, so users can update information without keeping a record of previous inputs? If so, to keep fixed and complete versions of records you may need to periodically export a report of the data and retain this as the record.
- Does the system capture and keep core recordkeeping metadata? This includes information about the business context in which records were created and used, such as who created the records and when.
- Can the system prevent ad hoc deletion of data? If it can’t, the integrity of the records will be compromised.
- Does the system generate, log and show all actions carried out in the system? This includes information about what changes were made, when and by whom. This kind of information is critical to ensuring that your organisation can account for how complaints were resolved.
- As the records need to be kept for long periods of time it is likely that they will outlive the system. What export functionality does the system have? Is it capable of exporting data and system logs without compromising their quality and integrity?
NSW State Archives has published a checklist for assessing business systems – organisations can use this checklist to determine if a specific system has adequate recordkeeping functionality and, if not, what mitigation actions are required.

Social media
Social media provides an excellent opportunity for local councils to engage with residents on a variety of subjects. This engagement can occur in real time and be reciprocal, allowing councils to provide timely updates and information and residents to provide immediate feedback.
A recent skim of my local council’s Twitter account revealed discussions between Council and residents about playgroups, instances of illegal parking, a mistakenly issued parking ticket, resolutions from Council meetings, the redevelopment of the aquatic centre and upcoming talks at the library.
Social media channels such as Twitter and Facebook create records by default. They also automatically create contextual metadata, such as information about who created the records and when.
Many records in social media accounts document low risk business activities and have short retention periods (e.g. Tweets advertising events at the library). For these types of records, an appropriate recordkeeping strategy may be to leave them in the social media application and rely on two things:
- the application continuing to exist in the short to medium term
- the application keeping the records for you.
However some interactions on social media may require more interventionist strategies when it comes to keeping records (e.g. a discussion about a mistakenly issued parking ticket). NSW councils need to keep records relating to complaints that require investigation or a specific response for 7 years. If the council manages records relating to complaint investigation and resolution in a specific system, it may be a better strategy to capture a record of this engagement on Twitter in their complaints management system.
NSW State Archives has published advice on managing records arising from the use of social media – this includes working out what social media records you need to keep and for how long, and developing strategies for capturing and storing these. Our general advice is: if you need it, manage it; if you don’t need it, leave it.
In local government, the management of social media records can be complicated if a councillor uses their personal social media account to conduct council business. In NSW, councillors must make and keep records of any council business they conduct. If they are doing significant council business via their personal social media account, the council may need to extract records from the account and keep them in council recordkeeping systems.
The use of personal accounts to transact organisational business is an issue that extends beyond the use of social media and applies to all organisations, not just local councils. We have seen recent examples of ministers in Queensland and Canada getting into trouble for using their personal email accounts to do government business. And we know that many organisations have identified the use of personal email accounts as a risk to their corporate information.
Some organisations have taken the step of prohibiting their employees from using personal email accounts for work purposes. Others have taken a more pragmatic approach, and instead advise employees that they must copy any work-related messages from their personal accounts to corporate systems.

The cloud
Digital transformation often involves moving business to cloud-based technologies. Organisations are implementing cloud-based customer relationship management systems, HR systems and finance systems. Organisations are moving their email to the cloud. And many organisations are moving away from on-premises storage of documents to the use of Microsoft Office 365, OneDrive or G Suite for Business.
The uptake of cloud-based services is a key component of ICT strategy in many organisations. The transition to the cloud results in streamlined procurement, more effective pricing, agility and scalability, and greater flexibility for organisations in how they consume services.
For users, it also offers opportunities to access information from multiple devices and locations – users no longer need to be sitting at their desk in the head office to log on to a system. In local government we see, for example, that employees who spend their days out and about (like rangers or maintenance officers) are reaping benefits from using cloud-based systems to log data on the go.
The use of cloud-based services does not diminish or remove the statutory responsibilities of local councils in NSW to make and keep full and accurate records of their activities and ensure the safe custody and proper preservation of these records. And there are information risks associated with the transition to the cloud:
- organisations may cede control or ownership of their data to cloud providers
- organisations may be prevented from accessing their data when they need it, or from keeping their data as long as they need it.
NSW State Archives advises that organisations in NSW should think about recordkeeping when procuring cloud-based services. In particular, organisations need to put appropriate controls in place to ensure that they will continue to have access to records created and kept in the cloud, and can export records in useable and complete formats at the conclusion of projects or contracts.
We recommend that organisations ask a range of questions before starting to use cloud-based systems, especially if these systems will support high value or high risk areas of business. For example:
- Can the provider commit to storing and processing your data in specific jurisdictions that are acceptable to your organisation (that have, for example, legal frameworks which are compatible with Australia’s environment)?
- In what form can data be exported from the system, and what metadata is exportable?
- Can the provider assure that no copy of the data is retained by the provider after the termination of the contract?
- Can your organisation specify data to be destroyed and can the provider give assurance of destruction, such as certificates of destruction?
- Can the provider assure that your data cannot be used for applications not specified in the contract (e.g. to data match with databases owned by other clients of the provider)?
- Will your organisation be consulted regarding any third party seeking to have access to your data? And how will third party access to your records be managed, for example if required by a government watchdog organisation in the jurisdiction in which the records are stored?
It is so important that organisations identify information risks BEFORE entering into contracts for cloud-based services, and that organisations develop and implement appropriate mitigation strategies. Mitigation strategies might include:
- establishing contractual arrangements to manage known risks
- periodically exporting any data that documents high value and high risk areas of business to an on-premises system that has appropriate recordkeeping functionality
- monitoring contractual arrangements.
Of course, this proactive approach to recordkeeping requires an investment of time, money and people. Organisations need to create appropriate policies and processes, and implement suitable systems, security and storage.
The business value of records must be commensurate with the cost of maintaining them. Prioritising high value and high risk business areas and the records they create provides the clearest opportunity to demonstrate the value of recordkeeping to an organisation.
NSW State Archives encourages organisations to prioritise the work they do to implement effective recordkeeping. Organisations should target records which document and support high risk and high value areas of business, and which are subject to information risks, for appropriate management. Organisations need to:
- know what their key digital records are
- keep these records in secure, well-managed systems
- protect and manage these records for as long as they need them
- develop migration strategies for information that is needed for the long term.
We have previously posted about the high risk and high value areas of business for local councils. Focussing ‘recordkeeping by design’ efforts in these areas will help to ensure that councils are creating and keeping critical records for as long as needed.
photo by: hehaden
The Digital Implementers Group met last week to talk about the ways in which NSW Government agencies have implemented electronic approvals (e-approvals). The group talked about the pros and cons of various applications and how digitising approval processes has benefits both for business AND recordkeeping.
Here is a summary of what the group said:

Paper-based processes are inefficient and don’t work in flexible workplaces
One member talked about how the inefficiency of paper-based approval processes drove the adoption of e-approvals. Employees kept losing paper approval forms, it took a long time to get things approved, it was difficult to track who had approved what and when, and documents related to an approval process were saved in random and multiple places.
In this organisation, the inefficiencies of the paper-based process resulted in organisational support for moving to e-approvals.
Another member commented that paper-based processes are doomed as organisations move to flexible and activity-based working. It won’t be possible to circulate pieces of paper around an organisation for sign-off when users are working at different times and in various locations.

The adoption of e-approvals presents opportunities to redesign and improve processes
One member commented that the digitisation of an approval process presents a good opportunity to re-examine the business purpose of the process and identify possible improvements. When digitising a process you need to consider the business objective, and not necessarily replicate the current process if it is inefficient or contains unnecessary steps.
Another member noted that their organisation has had e-approvals in place for a few years and it is now timely to re-examine the processes with a view to streamlining and improving them.

The use of e-approvals results in better recordkeeping
One member talked about how the implementation of e-approvals has dramatically increased the number of records captured in the corporate recordkeeping system. Because the e-approvals application is integrated with the organisation’s EDRMS, and all documents – as well as details of who approved what and when – are captured, recordkeeping is a by-product of using the application.
As well as increasing record capture rates, the use of e-approvals has also increased the visibility of the organisation’s EDRMS. Users know that if they want to refer to past approvals documentation it will be in the EDRMS.
For recordkeeping professionals in organisations without a strong recordkeeping culture, the use of e-approvals applications presents an opportunity to achieve recordkeeping by default. One member noted that they are aiming to use their e-approvals application for any process involving two or more users making a decision – this will ensure that records documenting organisational decision making are captured and kept in ways to support their ongoing accessibility and value.

E-approvals applications need to extend beyond organisational borders
Some of the members commented that the key disadvantage of their organisations’ existing e-approvals applications is the inability to involve external users in approval processes. In these organisations, approval processes can only get to a certain point electronically and documents must then be printed or otherwise exported to external approvers.

Approval processes can also be managed in business systems
One member noted that their organisation has numerous systems with approval functionality used by different business areas in support of various processes. These systems meet business needs, so there is no need to implement an additional application from the perspective of the business.
The group noted that such systems may have sufficient recordkeeping functionality for the records they capture and keep. Organisations can use the checklist for assessing business systems to determine this.
Another member noted that, when using action-tracking software, care must be taken to ensure users understand what ‘completing an action’ means (e.g. does completing an action equate to approval?), as outcomes for actions may not be completely or clearly defined.

Case studies
We have been talking about the recordkeeping benefits associated with digitising approval processes for some time:
- In March 2017 Denise North shared the Public Service Commission’s experience in implementing an electronic approvals workflow system at the Records Managers Forum. You can listen to Denise’s presentation and see the accompanying slides here.
- At the same event, Ann Turner, Chris Leeming, Jason Covell and Michael King shared their experiences on the completion of the OneTRIM project, including implementing MiniApp, a workflow tool for ministerial and executive records. You can listen to their presentation and see the accompanying slides here. Further information about MiniApp is included in our Q&A with Tim Hume on OneTRIM and our case study of the OneTRIM project.
- In 2016 we published a case study describing the Department of Premier and Cabinet’s electronic approvals project. You can also listen to a presentation by Mitya Antoncic, Nadine Louis and Dave Phillips at the August 2015 Records Managers Forum and see the accompanying slides here.
- We also published an infographic on six steps to e-approvals based on the experiences of the PSC and DPC.
Government uses of social media are many and varied, but in general it is used to communicate with and engage the community regarding policies, new and existing services (including service disruptions), and government projects and initiatives.
In our guidance “Strategies for managing social media records” we have enumerated various strategies depending on business needs and risks.
As social media platforms and technologies mature and offer more services, we recommend that agencies consider leaving social media records in their native platform or application, as most of these records have short-term retention requirements and the associated business risks are low.
We have identified some approaches for preserving social media records with long-term retention periods, but you should only consider these if there is a business need to retain the records on premises or if the records support high risk business processes.
Platform self-archiving service approach
The big social media players such as Twitter, Facebook and YouTube provide ways of letting users download the content they have shared, which is helpful. However, this approach is still a manual process and offers little wriggle room if further information or customisation is needed.
Application Programming Interfaces (APIs) approach
Another approach is to use the Application Programming Interfaces (APIs) provided by social media platforms, which enable access to raw social media data. This approach entails additional processing to reformat the data into a user-friendly format. Raw social media data is, however, very useful for research or information reuse purposes.
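As a minimal sketch of that reformatting step, the snippet below flattens raw, nested post data into a simple CSV report. The data structure and field names are invented for illustration – real platform APIs differ from one another and change over time:

```python
import csv
import json

# A raw post as an API might return it (illustrative structure only --
# this is not the actual schema of any particular platform).
raw = json.loads("""
[{"id": "987", "created_at": "2018-03-28T10:15:00Z",
  "user": {"screen_name": "ExampleCouncil"},
  "text": "The aquatic centre reopens Monday.",
  "entities": {"hashtags": ["pools"]}}]
""")

def flatten(posts):
    """Reduce nested raw post data to flat rows suitable for a CSV report."""
    for p in posts:
        yield {
            "id": p["id"],
            "created_at": p["created_at"],
            "author": p["user"]["screen_name"],
            "text": p["text"],
            "hashtags": " ".join(p["entities"]["hashtags"]),
        }

with open("posts.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["id", "created_at", "author", "text", "hashtags"])
    writer.writeheader()
    writer.writerows(flatten(raw))
```

The same flattened rows could just as easily be written to XML or HTML; the point is that the raw API output usually needs this kind of processing before it is human-readable.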
Social media archiving tool approach
The use of social media archiving tools is another approach used by some agencies. This approach seems promising, as these tools claim to enable agencies to comply with their various recordkeeping obligations, including the State Records Act. However, the extent to which a tool performs archiving depends on the vendor’s definition of what archiving entails.
From a recordkeeping perspective, below are some of the functional requirements we would want a social media archiving tool to have:
- ability to capture all records in any social media platform which includes:
- original social media posts, associated comments, likes/dislikes, hashtags, emojis and, if needed, deleted posts or comments
- embedded images, videos, files and links to other websites
- ability to manage social media records captured, which includes ability to:
- tag and capture user-assigned metadata to classify or group records
- delete records
- authorise deletion of records
- identify which records need to be preserved
- prevent alteration of captured social media records
- manage access and security permissions
- ability to capture metadata regarding the post, including process capture information such as timestamp and system information
- ability to display social media records in human readable formats
- ability to export records out of the system in various formats – JSON, XML, HTML, CSV, PDF.
Most importantly the social media archiving tool should have a robust search capability.
The list above is not exhaustive and may not necessarily be required in every scenario, but it provides enough information to assess social media archiving tools against recordkeeping requirements.
For information on what strategies and tools to use to manage your social media records click here.
Please contact us if you want to share your social media recordkeeping pains or preservation strategies.
State Archives and Records NSW uses the Forum to engage with public offices and to provide an opportunity for other NSW public sector organisations to share information about key initiatives or government programmes.
The Forum included presentations from:
- Christine Boardman, Executive General Adjuster, Crawford Global Technical Services – “Damage – the Journey from Destruction to Recovery”
- Dominique Mossou, Conservator, State Archives NSW – “Practical aspects of records recovery and conservation”
- Richard Lehane, Glen Humphries and Malay Sharma from Digital Archives Team, State Archives NSW – “Machine learning and records management.”
In her presentation Christine Boardman provided a brief overview of the insurance industry, how it responds to losses in the public and private sectors, and what happens when there is a claim for material damage or consequential loss.
Dominique Mossou provided tips on how to prevent disasters from happening to records and what to do when one does.
Lastly, Richard Lehane, Glen Humphries and Malay Sharma from State Archives NSW’s Digital Archives team talked about machine learning and its potential application for records management, including the results of an in-house pilot on the application of machine learning to classification and appraisal of unstructured information.
You can find the presentations and podcasts here.
As always, please don’t hesitate to contact us for more information on the presentations or if you have something to share with us.