Over the last few days you have learned how to store data with Catmandu. Storing data is a cool thing, but sharing data is awesome. Interoperability is important, as other people may use your data (and you will profit from other people’s interoperable data).
In the day 13 tutorial we learned the basic principle of metadata harvesting via OAI-PMH.
We will set up our OAI service with the Perl Dancer framework and an easy-to-use plugin called Dancer::Plugin::Catmandu::OAI. To install the required modules run:
$ cpanm Dancer
$ cpanm Dancer::Plugin::Catmandu::OAI
and you also might need
$ cpanm Template
Let’s start by indexing some data with Elasticsearch, as learned in the previous post:
$ catmandu import OAI --url https://lib.ugent.be/oai --metadataPrefix oai_dc --set flandrica --handler oai_dc to Elasticsearch --index_name oai --bag publication
After this, you should have some data in your Elasticsearch index. Run the following command to check this:
$ catmandu export Elasticsearch --index_name oai --bag publication
Everything is fine, so let’s create a simple web service which exposes the collected data via OAI-PMH. Download the code from this gist and create a symbolic link:
$ ln -s catmandu.yml config.yml
This is necessary because the Dancer app reads its settings from config.yml. In this case Catmandu and Dancer use the same configuration file.
catmandu.yml:

    store:
      oai:
        package: Elasticsearch
        options:
          index_name: oai
          bags:
            publication:
              cql_mapping:
                default_index: basic
                indexes:
                  _id:
                    op:
                      'any': true
                      'all': true
                      '=': true
                      'exact': true
                    field: '_id'
                  basic:
                    op:
                      'any': true
                      'all': true
                      '=': true
                      '<>': true
                    field: '_all'
                    description: "index with common fields..."
                  datestamp:
                    op:
                      '=': true
                      '<': true
                      '<=': true
                      '>=': true
                      '>': true
                      'exact': true
                    field: '_datestamp'
          index_mappings:
            publication:
              properties:
                _datestamp: {type: date, format: date_time_no_millis}

    plugins:
      'Catmandu::OAI':
        store: oai
        bag: publication
        datestamp_field: datestamp
        repositoryName: "My OAI DataProvider"
        uri_base: "http://oai.service.com/oai"
        adminEmail: me@example.com
        earliestDatestamp: "1970-01-01T00:00:01Z"
        deletedRecord: persistent
        repositoryIdentifier: oai.service.com
        cql_filter: "datestamp>2014-12-01T00:00:00Z"
        limit: 200
        delimiter: ":"
        sampleIdentifier: "oai:oai.service.com:1585315"
        metadata_formats:
          -
            metadataPrefix: oai_dc
            schema: "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
            metadataNamespace: "http://www.openarchives.org/OAI/2.0/oai_dc/"
            template: oai_dc.tt
            fix:
              - nothing()
        sets:
          -
            setSpec: openaccess
            setName: Open Access
            cql: 'oa=1'
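
oai-app.pl:

    #!/usr/bin/env perl
    use Dancer;
    use Catmandu;
    use Dancer::Plugin::Catmandu::OAI;

    # Load the Catmandu configuration (catmandu.yml) from the current directory
    Catmandu->load;
    Catmandu->config;

    # Mount the OAI-PMH endpoint at /oai
    oai_provider '/oai';

    # Start the Dancer web server
    dance;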

oai_dc.tt:

    <oai_dc:dc xmlns="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
    [%- FOREACH var IN ['title' 'creator' 'subject' 'description' 'publisher' 'contributor' 'date' 'type' 'format' 'identifier' 'source' 'language' 'relation' 'coverage' 'rights'] %]
      [%- FOREACH val IN $var %]
      <dc:[% var %]>[% val | html %]</dc:[% var %]>
      [%- END %]
    [%- END %]
    </oai_dc:dc>

What’s going on here? Well, the script oai-app.pl defines a route /oai via the plugin Dancer::Plugin::Catmandu::OAI.
The template oai_dc.tt defines the XML output of the records. Finally, the configuration file catmandu.yml handles the settings for the Dancer plugin as well as for Elasticsearch indexing and querying.
Run the following command to start a local web server:
$ perl oai-app.pl
and point your browser to http://localhost:3000/oai?verb=Identify. To get some records go to http://localhost:3000/oai?verb=ListRecords&metadataPrefix=oai_dc.
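The same requests can also be made from the command line. The following is only a minimal sketch: the helper names oai_url and oai_get and the OAI_BASE variable are made up for this example and are not part of Catmandu or Dancer. It assumes that curl is installed and that the app is listening on Dancer's default port 3000.

```shell
# Base URL of the local app (assumption: Dancer's default port 3000).
OAI_BASE="${OAI_BASE:-http://localhost:3000/oai}"

# oai_url VERB [key=value ...]
# Hypothetical helper: prints the request URL for an OAI-PMH verb plus
# optional arguments. Values are appended as given, without URL-encoding.
oai_url() {
    verb="$1"
    shift
    url="${OAI_BASE}?verb=${verb}"
    for arg in "$@"; do
        url="${url}&${arg}"
    done
    printf '%s\n' "$url"
}

# oai_get VERB [key=value ...]
# Hypothetical helper: fetches the response with curl.
# Requires the app started with "perl oai-app.pl" to be running.
oai_get() {
    curl -s "$(oai_url "$@")"
}
```

With the app running, `oai_get Identify` should return the repository description, and `oai_get ListRecords metadataPrefix=oai_dc set=openaccess` restricts the harvest to the openaccess set defined in catmandu.yml.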
Yes, it’s that easy. You can extend this simple example by adding fixes to transform the data as you need it.