Quality Report 2021
The Year 2021 of the National Digital Preservation Services in Finland
General
Digital preservation services (DPS) refer to services produced together for the digital preservation of cultural heritage and research data. The development of DPS is continuous and takes place in close cooperation with the organizations that make use of them. The aim is that the most significant digitized and born-digital cultural heritage content in the Digital Preservation Service for Cultural Heritage will be preserved for future generations and that long-term utilization of the content is possible. Similarly, the Digital Preservation Service for Research Data ensures the availability and preservation of digital research data. Both services use a common digital preservation system for bit-level preservation.
The Digital Preservation Service for Cultural Heritage started preserving content in 2015 and the Digital Preservation Service for Research Data in late 2019. Organizations using the Digital Preservation Service for Research Data for preparing and storing data can also make more extensive use of Fairdata services, including the packaging service and the management interface.
The Main Results of the Year
The amount of data in preservation grew significantly during 2021, as new partner organizations started to use the services and earlier partner organizations continued to transfer their data. During 2021 the amount of preserved data exceeded 1.6 petabytes, and the annual growth of over 583 terabytes was the largest in the history of the DPS. The number of preserved packages exceeded 2,000,000 during the year.
A requirement specification for logical-level preservation was published (in Finnish only) in the spring. The specification presents the joint understanding of what the partner organizations need from the logical-level preservation provided by the DPS. The document was produced as part of the work of the digital preservation collaboration group. Its goal is to identify the requirements for logical preservation and to describe the tasks and responsibilities of the DPS and of the partner organizations.
The exceptional circumstances of the COVID-19 pandemic, which continued in 2021, also affected the operation of the DPS. Remote work was used as extensively as possible, both in the collaboration with the partner organizations and in the production and development tasks of the DPS. As a result, it was not possible to organize all the practical training aimed at the partner organizations, although a few workshops were held remotely. The unusual conditions hardly affected the quality of the services, as shown by the increase in data accumulation and by the partner organizations' satisfaction with the DPS as a whole.
A customer satisfaction survey for the digital preservation collaboration group was carried out in late 2021. Overall satisfaction with the DPS was high (5.5/6). The partner organizations were also particularly satisfied with the user support and consulting as a whole (5.0/6).
In 2021, the storage capacity of the digital preservation system was expanded by 1.8 petabytes, which raised the total capacity to 3.6 petabytes.
Partner Organizations
By the end of 2021, the Ministry of Education and Culture had granted capacity in the digital preservation services to the following organizations:
Organization | Purpose | Capacity (TB) |
---|---|---|
Celia | Master archive and selected new audiobooks for long-term preservation | 110 |
Kansallinen audiovisuaalinen instituutti | Selected part of the domestic film material to be digitized | 1200 |
Kansallisarkisto | Born-digital government records received by Kansallisarkisto | 41 |
Kansallisarkisto | Data transferred to the VAPA system | 1 |
Kansallisarkisto | Materials of Kansallisarkisto's mass digitization project | 114 |
Kansallisarkisto | Materials transferred from Kansallisarkisto's digital archive and retrospective digitization materials | 805 |
Kansallisarkisto | Kansallisarkisto's private archive materials that exist only in digital form | 27 |
Kansallisgalleria | Long-term preservation of Kiasma's media art works | 20 |
Kansalliskirjasto | Cultural heritage materials digitized by Kansalliskirjasto | 175 |
Kansalliskirjasto | Materials collected under the Finnish act on cultural materials (kulttuuriaineistolaki) | 355 |
Kotimaisten kielten keskus Kotus | Long-term preservation of Kotus's linguistic research and cultural heritage materials | 60 |
Museovirasto | Research reports on the cultural environment | 1 |
Musiikkiarkisto | Musiikkiarkisto's materials for long-term preservation | 70 |
Svenska Litteratursällskapet SLS | SLS's materials for long-term preservation | 50 |
Yhteiskuntatieteellinen tietoarkisto, FSD | Long-term preservation of the collection of research data archived by FSD | 1 |
Organization | Purpose | Capacity (TB) |
---|---|---|
Geologian Tutkimuskeskus | Data produced by GTK's tomography device | 12 |
Helsingin yliopisto | A selection of meteorological and air quality measurements from Helsingin yliopisto's SMEAR data | 2 |
Helsingin yliopisto | M. cinxia and C. melitaearum in the Åland metapopulation system | 2 |
Itä-Suomen yliopisto | SENSOTRA | 1 |
Jyväskylän yliopiston kiihdytinlaboratorio | Decay spectroscopy of nobelium-250 | 1 |
Oulun yliopisto, Sodankylän geofysikaalinen observatorio | Observation data | 30 |
Tampereen yliopisto | A-K collection of the Kansanperinteen arkisto (folklife archive) of the Faculty of Social Sciences | 2 |
Turun yliopisto | Materials of the archive of the School of History, Culture and Arts Studies (HKT-arkisto) | 20 |
Data Accumulation in 2021
About 583 terabytes of new data were received for preservation during the year, and the total amount of data in preservation at the end of 2021 was over 1065 terabytes. The data accumulation during 2021 is shown in the figure below.
During 2021 the DPS took responsibility for preserving more than 568,000 content packages, and at the end of 2021 there were more than 2,000,000 content packages in preservation. The accumulation of content packages during 2021 is shown in the figure below.
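The accumulation figures above imply approximate start-of-year totals and relative growth rates. The following is a minimal sketch of that arithmetic, assuming the rounded figures quoted in this section ("about" and "more than") can be treated as exact values; the function names are illustrative and not part of any DPS software.

```python
def start_of_year(total_at_end, received_during_year):
    """Amount already in preservation before the year's new content arrived."""
    return total_at_end - received_during_year


def relative_growth(total_at_end, received_during_year):
    """Growth during the year as a fraction of the start-of-year amount."""
    start = start_of_year(total_at_end, received_during_year)
    if start <= 0:
        raise ValueError("start-of-year amount must be positive")
    return received_during_year / start


# Figures quoted in this section, treated as exact for illustration only.
data_start_tb = start_of_year(1065, 583)              # 482 TB
data_growth = relative_growth(1065, 583)              # about 1.21
package_start = start_of_year(2_000_000, 568_000)     # 1,432,000 packages
package_growth = relative_growth(2_000_000, 568_000)  # about 0.40
```

Under these assumptions, the amount of data roughly doubled during the year (growth of about 121 %), while the number of packages grew by about 40 %, which suggests that the packages received in 2021 were on average larger than those received earlier.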
Maintenance of the Digital Preservation Services
A wide range of activities is required to produce digital preservation services: maintenance tasks, development of methods and models, software development, development of equipment infrastructure, and administrative work. The following section focuses in particular on the maintenance tasks of the digital preservation services. It uses the quality reporting model for the production operations of IT services, which typically focuses on the growth of data, incidents and the recovery from them over a given period of time.
The main objectives of maintaining the Digital Preservation Services are:
- ensure the integrity and availability of archival information packages in preservation;
- monitor the functionality of the service; and
- support organizations in utilizing the DPS services (e.g. fixing invalid or incomplete submission information packages detected during ingest).
Monitoring of the Digital Preservation Services
The following items are automatically monitored in the DPS at the moment:
- device failures (such as broken hard drives),
- broken tape drives,
- server availability,
- disk area fill rate,
- visibility of distributed storage areas on different servers,
- up-to-dateness of virus database for virus checks,
- storage layer integrity,
- availability of tape libraries,
- SSL certificate life cycles, and
- failed login attempts of SFTP port on frontend servers.
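As an illustration only, two of the automated checks listed above (disk area fill rate and SSL certificate life cycle) could be sketched as follows. The thresholds, function names and alert texts are assumptions made for this example; the report does not describe how the DPS actually implements its monitoring.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; the real limits used by the DPS are not stated.
FILL_RATE_LIMIT = 0.90
CERT_WARNING_WINDOW = timedelta(days=30)


def disk_alert(used_bytes, total_bytes, limit=FILL_RATE_LIMIT):
    """Return an alert text if the disk area is fuller than the limit, else None."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    rate = used_bytes / total_bytes
    if rate > limit:
        return f"disk fill rate {rate:.0%} exceeds {limit:.0%}"
    return None


def certificate_alert(not_after, now, window=CERT_WARNING_WINDOW):
    """Return an alert text if the certificate has expired or expires soon."""
    if not_after <= now:
        return "certificate has expired"
    remaining = not_after - now
    if remaining <= window:
        return f"certificate expires in {remaining.days} days"
    return None


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(disk_alert(used_bytes=95, total_bytes=100))
    print(certificate_alert(now + timedelta(days=10), now))
```

Keeping each check a pure function of its inputs makes it easy to test separately from the code that collects the measurements.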
In addition, the following items are manually monitored:
- the progress of the work queue at ingest,
- processing submission information packages stuck in the work queue,
- checking the integrity of archival information packages,
- analysing problems with rejected submission information packages,
- replicating broken media, and
- creating copies for the dark archive.
As part of the development of the DPS, the monitoring of the service will be improved and new processes will be automated. This makes it possible to maintain a cost-effective service even as the amount of content to be preserved increases.
Quality Deviations Related to the Data in Preservation in 2021
Together with the partner organizations, we have considered what quality means in terms of the long-term preservation of data. It has been agreed that the integrity of the data and the reliability of preservation are of particular importance. In this context, quality deviations are situations where the preservation of data is threatened, and not, for example, situations where the service is temporarily unavailable.
Reporting on the quality of the service using these criteria is somewhat challenging, as the usual indicators of IT environments (e.g. service accessibility percentages) do not indicate deviations or actual threats to the preservation of the data. We have defined the preservation of data as threatened when there are fewer than three intact copies of the archival information packages of the data. These situations are typically recovered from by using an intact copy on another media type. The maintenance of the DPS is able to restore these situations to normal as part of its normal operation.
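The three-copy rule described above can be expressed as a simple check. This is a hedged sketch rather than the DPS implementation: the `Copy` record, its fields and the function names are hypothetical, and only the threshold of three intact copies and the preference for another media type come from the text.

```python
from dataclasses import dataclass

REQUIRED_INTACT_COPIES = 3  # threshold stated in the report


@dataclass(frozen=True)
class Copy:
    location: str    # hypothetical identifier, e.g. "tape-1"
    media_type: str  # hypothetical label, e.g. "tape" or "disk"
    intact: bool


def is_quality_deviation(copies):
    """True if fewer than three intact copies of a package exist."""
    intact_count = sum(1 for copy in copies if copy.intact)
    return intact_count < REQUIRED_INTACT_COPIES


def recovery_source(copies, damaged):
    """Pick an intact copy to restore from, preferring another media type."""
    for candidate in copies:
        if candidate.intact and candidate.media_type != damaged.media_type:
            return candidate
    # Fall back to any intact copy on the same media type.
    for candidate in copies:
        if candidate.intact:
            return candidate
    return None
```

In this sketch a package with one damaged copy out of three is flagged as a deviation, and the restore source is chosen from a different media type when one is available.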
During 2021, a single LTO-8 tape was reported broken. All of the 23 archival information packages on the tape were copied to a new tape. Scheduled fixity checks found 10 fixity exceptions in the tape copies of archival information packages. All broken copies were restored to new tapes from an undamaged copy.
15 disk drives failed in the digital preservation system's disk storage during the year. None of the disk failures threatened the data in preservation, because the RAID volumes of the disk storage protected the data from corruption.
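A fixity check of the kind mentioned above compares a freshly computed checksum of each stored file with the checksum recorded for it. The sketch below assumes SHA-256 and a plain mapping of relative file names to recorded checksums; the report does not state which algorithm or bookkeeping the DPS uses, so both are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_exceptions(recorded, root):
    """Return the names of files that are missing or whose checksum differs.

    `recorded` maps relative file names to their recorded hex digests
    (a hypothetical structure chosen for this example).
    """
    failures = []
    for name, expected in sorted(recorded.items()):
        path = Path(root) / name
        if not path.is_file() or file_digest(path) != expected:
            failures.append(name)
    return failures
```

Any name returned by `fixity_exceptions` would correspond to a copy that needs to be restored from an undamaged copy.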
New Features of Software Development
In the Digital Preservation Service for Cultural Heritage and the digital preservation system, the architectural changes for geographical distribution and for extending the disk storage, with particular attention to automatic fixity checking, were completed. Support for multiple back-end systems with the same contract identifier and functionality to delete a dissemination information package were added to the REST interface. The first working version of an ARC/WARC migration tool for web archive content was produced, and work on extending the statistics to varying time series progressed further. In addition, the following were carried out: software development required by changes in the specifications; fixes to file format validation related, among other things, to special characters and the PDF format; and other minor bug fixes and optimizations. Software migration to Python 3 also progressed well.
In the Digital Preservation Service for Research Data, the packaging service was migrated to a new platform and the software layer was migrated to a Python 3 environment. A testing environment was made available for joining partner organizations, so that they can try out data transfer to the digital preservation service before moving to the production environment. An extended integration update from the service to the other Fairdata services was carried out and a common login protocol was taken into use. All file formats supported in the digital preservation system were made available in the packaging service. A classification feature for different file formats and support for the AVI file format were added. The service developed an interface component for the automated usage of the DPS' REST interface. Additionally, the development of graphical environments for sending and downloading content progressed well.
Support for Partner Organizations
The DPS helps the organizations that use the services with questions related to the digital preservation of their data. In particular, this support is provided during the DPS deployment process, but organizations can also submit service requests in other situations. Requests for support are received at the support address of the DPS: pas-support@csc.fi.
In 2021, a total of 62 service requests were received from organizations utilizing the DPS. The requests dealt especially with the pre-ingest and ingest operations on the data. Another significant topic was user support in the deployment of the service. In addition to the service requests, discussions with the partner organizations are held through the digital preservation collaboration group, which meets 3-4 times a year. The group continued its established practices, such as regular monitoring of software development, and continued developing the logical preservation requirement specification, which updates and specifies earlier plans and descriptions of the DPS' operations. Monthly virtual group meetings (#PASKaffet) with potential partner organizations, started in 2020, continued in 2021. Virtual support meetings (#PASKlinikka) were also started in 2021. These meetings are held for partner organizations, as well as for organizations interested in becoming partners, and offer low-threshold discussions in which each organization has its own reserved time slot with the DPS' specialists.
The events and current affairs of the DPS were announced on the digitalpreservation.fi website, the Twitter account (@dpres_fi) and on an email list intended for information purposes.