NRGL Technical Solution


The NRGL software solution has two basic components: Invenio, which runs the NRGL digital repository, and the Elasticsearch indexing and search system, which powers the NRGL central search interface. The same architecture has run successfully at CERN in Switzerland for several years. The figure shows the individual functions of the Invenio digital repository and the Elasticsearch indexing and search system, and how the two cooperate.

NRGL central search interface in Elasticsearch

The NRGL central search interface is intended to serve as an integrated search platform across grey literature repositories. This integrating function was formerly provided by the FAST ESP indexing and search system, which was replaced by Elasticsearch in 2016. Elasticsearch provides secure, relevant and scalable search across the linked repositories. The solution is meant to let users access data from both the digital repository and selected grey literature repositories in a single interactive environment. Search is driven primarily by navigators: document type, author, keyword, linked database and timeline.
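
The navigator-based search described above could be expressed as an Elasticsearch request body with one aggregation per navigator. The following is a minimal sketch only: the function name and every field name (document_type, author, keyword, source_database, date_issued, title, abstract) are assumptions made for illustration and are not taken from the NRGL index. It also assumes the navigator fields are mapped as keyword fields and uses the Elasticsearch 7+ syntax for the yearly timeline.

```python
from typing import Any, Dict


def build_faceted_query(text: str, size: int = 10) -> Dict[str, Any]:
    """Build an Elasticsearch request body with one aggregation per navigator."""
    # Hypothetical navigator name -> hypothetical index field (not the real NRGL mapping).
    term_facets = {
        "document_types": "document_type",
        "authors": "author",
        "keywords": "keyword",
        "linked_databases": "source_database",
    }
    # A terms aggregation returns the most frequent values of a field with counts.
    aggs: Dict[str, Any] = {
        name: {"terms": {"field": field, "size": 20}}
        for name, field in term_facets.items()
    }
    # Timeline navigator: one bucket per calendar year.
    aggs["timeline"] = {
        "date_histogram": {"field": "date_issued", "calendar_interval": "year"}
    }
    return {
        "size": size,
        "query": {
            "multi_match": {"query": text, "fields": ["title", "abstract", "keyword"]}
        },
        "aggs": aggs,
    }


if __name__ == "__main__":
    import json

    # Print the request body; sending it to a cluster is left out of this sketch.
    print(json.dumps(build_faceted_query("grey literature"), indent=2))
```

The function only builds the request body as a plain dictionary, so it can be inspected or adapted without a running Elasticsearch cluster.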

NRGL digital repository in the Invenio system

Invenio is Open Source software. It may be freely installed, used and modified, which makes it possible to configure it for storing grey literature and to distribute it among partner organizations. In 2010 the system was debugged through continuous testing of system operation, data entry and harvesting of data from partner repositories. Every part of the Invenio system was modified, from the format structure and templates through the setup of collections to the search configuration. At the same time, the graphic design of the digital repository was created, the repository was fully localized into Czech, and the record search was adjusted.

Selection process

The software solution for the NRGL project was selected through a public tender held in 2009. The software functionality requirements were defined so as to cover what was needed for the pilot implementation of the system and to favour a modern, well-supported technology with good prospects for further development. The software functionality requirements may be found in the repository.
Preparation for the selection included an analysis of selected Open Source software for digital libraries. The following Open Source software was analyzed: DSpace, Fedora, CDS Invenio, EPrints, and Greenstone. The results of the analysis may be found in the repository.

Metadata

A format for storing metadata is an essential part of building a repository. A custom metadata format was defined for the needs of the National Repository of Grey Literature (NRGL). The NRGL metadata format was designed specifically for processing records of grey digital documents.
The basic requirements for the NRGL format are maximum simplicity and compatibility with the Dublin Core standard. The NRGL metadata format uses elements of Dublin Core, Dublin Core Terms, EVSKP-MS and ETD-MS, together with some custom elements.
The first draft of the custom NRGL metadata format, version 0.1, was defined in 2008, and in 2009 it was tested on in-house data at the NTK and at the University of Economics in Prague. The test results and expert feedback were incorporated into beta version 0.2 of the NRGL metadata format. In 2010 the format was optimized on the basis of practical experience with entering metadata and full texts into the repository, with harvesting metadata and full-text files from partner organizations, and with the requirements for compliance with the OpenGrey system. The result was the verified version 1.0 of the NRGL metadata format, which may be found in the repository (in Czech only).
The implementation of the NRGL metadata format in the selected software solution, Invenio, which uses MARC 21 as its native format, was accompanied by the creation of a conversion table.
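
To illustrate what such a conversion table does, the sketch below maps a few Dublin Core elements to MARC 21 tags and subfields. It is not the NRGL conversion table: the pairs shown loosely follow the publicly available Library of Congress Dublin Core to MARC crosswalk, and the names DC_TO_MARC and convert_record are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt: Dublin Core element -> (MARC 21 tag, subfield code).
# Loosely based on the Library of Congress crosswalk; the real NRGL table may differ.
DC_TO_MARC: Dict[str, Tuple[str, str]] = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "language": ("546", "a"),
}


def convert_record(record: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (tag, subfield, value) triples for every mapped element, sorted by tag."""
    fields: List[Tuple[str, str, str]] = []
    for element, values in record.items():
        if element not in DC_TO_MARC:
            continue  # elements without a mapping are skipped in this sketch
        tag, subfield = DC_TO_MARC[element]
        fields.extend((tag, subfield, value) for value in values)
    return sorted(fields)


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = {"title": ["Annual report"], "creator": ["Doe, Jane"]}
    for tag, subfield, value in convert_record(sample):
        print(f"{tag} ${subfield} {value}")
```

A lookup table keeps the mapping declarative, so adding or correcting an element does not require changing the conversion logic.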

Identifiers

The primary purpose of digital archives is to store digital information and make it accessible. Persistent identifiers ensure permanent access to digital documents. Here, persistence of the identifier means that the identification remains permanent irrespective of the permanence of the identified document. It is therefore important that a source marked by a persistent identifier is never relocated or removed unless the information on its location is updated in the persistent identifier registry. The solution concerning the use of persistent identifiers is described here.
The original intention was to use a persistent identifier such as URN:NBN or Handle. Unfortunately, there is currently no working URN:NBN resolver for grey literature in the Czech Republic. As a workaround, a URI identifier is generated in the Invenio system in the format www.nusl.cz/ntk/nusl-ID, where ID stands for the number assigned to the record by Invenio.
The criteria for selecting the persistent identifier for the NRGL are set out in the repository. The sources used in that work are cited in another document attached to the same record.
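
The URI pattern www.nusl.cz/ntk/nusl-ID given above can be sketched as a pair of helper functions that build an identifier from a record number and recover the number from an identifier. This is only an illustration: the function names are hypothetical, and the sketch assumes that the record number is a positive integer and that an optional http:// or https:// prefix should be tolerated when parsing.

```python
import re

# Prefix taken from the format stated in the text: www.nusl.cz/ntk/nusl-ID
NUSL_PREFIX = "www.nusl.cz/ntk/nusl-"
# Assumption: ID is a decimal number; a leading scheme is accepted but optional.
_NUSL_RE = re.compile(r"^(?:https?://)?www\.nusl\.cz/ntk/nusl-(\d+)$")


def build_nusl_uri(record_id: int) -> str:
    """Build the URI identifier for a record number assigned by Invenio."""
    if record_id <= 0:
        raise ValueError("record_id must be a positive integer")
    return f"{NUSL_PREFIX}{record_id}"


def parse_nusl_uri(uri: str) -> int:
    """Extract the record number from a URI identifier, or raise ValueError."""
    match = _NUSL_RE.match(uri.strip())
    if match is None:
        raise ValueError(f"not an NRGL identifier: {uri!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # 12345 is a made-up record number used only for illustration.
    uri = build_nusl_uri(12345)
    print(uri, parse_nusl_uri(uri))
```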

NRGL provides central access to information on grey literature produced in the Czech Republic in the fields of science, research and education.
NRGL is operated by the National Library of Technology in Prague.