Creating new harvest configurations
Harvester configurations are used to define the settings of recurrent harvesting routines.
From LinkedIn: OAI-PMH is a set of standards and guidelines that define how metadata can be exchanged among different systems. OAI-PMH is based on the concept of data providers and service providers. Data providers are the systems that expose their metadata through OAI-PMH, while service providers are the systems that harvest and use the metadata from data providers. OAI-PMH uses HTTP and XML as the communication and data formats, and supports six basic requests: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord. These requests allow service providers to retrieve information about the data providers, their metadata formats, their collections, and their individual records.
For a harvest configuration to be created, you will need:
A compatible OAI-PMH provider's URL (e.g. https://datacatalogue.cessda.eu/oai-pmh/v0/oai)
A custom transformation service to have been set in the 'Custom Transformation Service List' (see Managing custom transformation services).
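Before entering a provider URL, it can help to see what the OAI-PMH requests look like on the wire. The following is a minimal sketch, not part of the application itself: the helper name `build_oai_url` is hypothetical, the base URL is the example given above, and `oai_dc` is used as the metadata prefix because every OAI-PMH provider is required to support it. The sketch only builds request URLs and does not contact the provider.

```python
from urllib.parse import urlencode

# Example provider URL taken from this page; substitute your own provider.
BASE_URL = "https://datacatalogue.cessda.eu/oai-pmh/v0/oai"

# The six request types (verbs) defined by OAI-PMH.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_oai_url(base_url, verb, **params):
    """Return the GET URL for a single OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Leave out optional parameters that were not supplied.
    query.update({key: value for key, value in params.items() if value is not None})
    return f"{base_url}?{urlencode(query)}"


identify_url = build_oai_url(BASE_URL, "Identify")
records_url = build_oai_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc", set=None)

print(identify_url)
print(records_url)
```

Opening the `Identify` URL in a browser is a quick way to confirm that an address really is an OAI-PMH endpoint before using it in a harvest configuration.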
Select 'Administrator tools' from the dashboard side panel.
Select 'Manage Harvester Tasks'.
Select 'Add new harvest configuration'.
Populate the 'URL' field with an appropriate OAI-PMH provider's URL address.
If you wish to add custom keys or values to the header of the HTTP request, select 'Show advanced options', populate the relevant fields, and select '+' to add as many as you require.
Select 'Validate'.
Select a metadata prefix from the 'Metadata Prefix' dropdown.
Select a custom transformation service from the 'Transformation Service' dropdown.
Note: See Managing custom transformation services for instructions on creating new custom transformation services.
Select whether the configuration will be active or paused upon creation using the 'Status' dropdown.
Populate the following fields as desired:
'Set Spec': A set of items to be harvested from the registry can be specified in this field. Note: This field is non-editable after creation of the harvester configuration
'Frequency': The frequency with which the configuration will harvest the specified items
'Notifying Email': An email address can be entered here for its owner to be informed of the outcome of each of the configuration's harvests
'Identifier': An identifier to harvest can be optionally specified in this field. Note: This field is non-editable after creation of the harvester configuration
'Error Handling': This field is used to specify whether the harvest will be:
'Atomic': Harvested items will only be saved if every item is successfully harvested
'Non atomic': Harvested items will be saved as they are harvested, even if other items in the harvest are not successfully harvested
'Assigned User': The user responsible for this harvest configuration
'Workgroup': The workgroup optionally responsible for this harvest configuration.
Select 'Create'.
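The metadata prefixes offered in the 'Metadata Prefix' dropdown correspond to the formats a provider advertises in its OAI-PMH `ListMetadataFormats` response. How the application reads that response internally is not documented here, so the following is only an illustrative sketch: the function name `list_metadata_prefixes` is hypothetical, the sample response is trimmed, and the second format (`example_format`) is invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by all OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed, illustrative ListMetadataFormats response.
# "example_format" is a made-up format, not one offered by a real provider.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>example_format</metadataPrefix>
      <schema>http://example.org/example_format.xsd</schema>
      <metadataNamespace>http://example.org/example_format/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
""".strip()


def list_metadata_prefixes(xml_text):
    """Return the metadata prefixes found in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    return [
        element.text.strip()
        for element in root.findall(".//oai:metadataPrefix", OAI_NS)
        if element.text
    ]


prefixes = list_metadata_prefixes(SAMPLE_RESPONSE)
print(prefixes)
```

If the dropdown does not list the prefix you expect, requesting `ListMetadataFormats` from the provider directly shows which formats it actually offers.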
The configuration will now appear in the 'Harvester Configurations' list. From here it can be:
Manually run via 'Run Task'
Edited via 'Edit'
Deleted via 'Delete'.
Note: If a task is run manually, it cannot be cancelled or stopped until it has completed.
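The difference between the 'Atomic' and 'Non atomic' error handling options described earlier can be sketched as follows. This is an assumption-laden illustration rather than the application's actual implementation: `run_harvest`, `fake_fetch`, and the in-memory `store` list are all hypothetical, and the sketch assumes a harvest continues past a failed item, which the real harvester may or may not do.

```python
def run_harvest(identifiers, fetch_record, store, atomic):
    """Harvest each identifier into `store` and return a list of failures.

    atomic=True:  records are staged and saved only if every item succeeds.
    atomic=False: each record is saved as soon as it is harvested.
    """
    staged = []
    failures = []
    for identifier in identifiers:
        try:
            record = fetch_record(identifier)
        except Exception as error:
            failures.append((identifier, str(error)))
            continue
        if atomic:
            staged.append(record)
        else:
            store.append(record)
    # In atomic mode, commit the staged records only when nothing failed.
    if atomic and not failures:
        store.extend(staged)
    return failures


def fake_fetch(identifier):
    """Stand-in for a real harvest request; fails for the identifier 'bad'."""
    if identifier == "bad":
        raise ValueError("record could not be harvested")
    return {"id": identifier}


identifiers = ["a", "bad", "b"]

atomic_store = []
atomic_failures = run_harvest(identifiers, fake_fetch, atomic_store, atomic=True)

non_atomic_store = []
non_atomic_failures = run_harvest(identifiers, fake_fetch, non_atomic_store, atomic=False)

print(atomic_store)      # nothing saved, because one item failed
print(non_atomic_store)  # the two successful items are saved
```

In this sketch the atomic run saves nothing because one of the three items fails, while the non-atomic run keeps the two items that were harvested successfully. Both runs report the same failure.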