Monday, August 25, 2014

DSpace OAI profiles

By default, DSpace's OAI-PMH interface shares all of your publicly accessible Items. If you want to restrict or modify the set of results that get shared, you have to customize the output. Luckily, recent versions of DSpace have an easily modifiable configuration that essentially gives you "profiles" in OAI.

The default profile is called "request". It doesn't filter the results, and it allows harvesting in many different metadata formats. Note: only publicly accessible items/objects can be disseminated through OAI.

The other profiles in DSpace are OpenAIRE (Open Access Infrastructure for Research in Europe) and DRIVER (Digital Repository Infrastructure Vision for European Research). By default your repository won't disseminate any objects in OpenAIRE or DRIVER format because the filters in place require some specific metadata to be collected for those profiles/guidelines.

The profiles and their filters are defined in xoai.xml (DSpace 4.x branch):

https://github.com/DSpace/DSpace/blob/dspace-4_x/dspace/config/crosswalks/oai/xoai.xml#L33

The DRIVER profile declares a number of filters, which restrict the items disseminated under that profile to those that match the DRIVER requirements. In this case the filters require that there is a title (dc.title); that there is an author (dc.contributor.author); that the document type (dc.type) is one of article, thesis, book, etc.; that dc.rights is equal to "open access"; and lastly, that there is a publicly accessible bitstream, which hopefully means that the full text is available.
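
To make the filter logic concrete, here is a minimal Python sketch of how the DRIVER conditions combine: each one is a separate check, and an item only appears under the DRIVER profile when all of them pass. This is not DSpace's implementation. The `Item` class, the `passes_driver_filter` function and the `DRIVER_TYPES` set are hypothetical, and the type list is only the illustrative subset named above (the real list in xoai.xml is longer).

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for a DSpace item (not the real DSpace API).
@dataclass
class Item:
    # Metadata field name -> list of values, e.g. {"dc.title": ["..."]}.
    metadata: dict = field(default_factory=dict)
    # Number of publicly readable bitstreams (files) attached to the item.
    public_bitstreams: int = 0

# Illustrative subset only; the actual DRIVER type list in xoai.xml is longer.
DRIVER_TYPES = {"article", "thesis", "book"}

def passes_driver_filter(item: Item) -> bool:
    """Return True only if every DRIVER condition described above holds."""
    md = item.metadata
    has_title = bool(md.get("dc.title"))
    has_author = bool(md.get("dc.contributor.author"))
    type_ok = any(t in DRIVER_TYPES for t in md.get("dc.type", []))
    rights_ok = "open access" in md.get("dc.rights", [])
    has_public_file = item.public_bitstreams > 0
    # The conditions are ANDed: a single failure keeps the item out of the profile.
    return has_title and has_author and type_ok and rights_ok and has_public_file

paper = Item(
    metadata={
        "dc.title": ["An example article"],
        "dc.contributor.author": ["Doe, Jane"],
        "dc.type": ["article"],
        "dc.rights": ["open access"],
    },
    public_bitstreams=1,
)
# Same record, but without the dc.rights value that DRIVER expects.
restricted = Item(
    metadata={k: v for k, v in paper.metadata.items() if k != "dc.rights"},
    public_bitstreams=1,
)

print(passes_driver_filter(paper))       # True
print(passes_driver_filter(restricted))  # False
```

In this model, a repository that never fills in dc.rights fails `rights_ok` for every item, which matches why DRIVER output is empty out of the box.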



So, if you wanted to customize your default "request" profile to restrict the output to only those items in the repository that also have full text available, you would find the context:
 <context baseurl="request">  
and add the following filter to it:
 <filter refid="bitstreamaccessFilter"/>  
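
Conceptually, a context behaves like a list of filters that an item must all pass, so adding a filter can only narrow what the context exposes. The following Python sketch models that idea for the "request" context before and after the change. `harvest`, `is_public` and `has_public_original_bitstream` are hypothetical names for illustration, not XOAI classes, and the items are made-up examples.

```python
from typing import Callable, List

# Hypothetical model: an item is a plain dict, a filter is a predicate on it.
ItemFilter = Callable[[dict], bool]

def is_public(item: dict) -> bool:
    # Only publicly accessible items are ever exposed through OAI.
    return item["public"]

def has_public_original_bitstream(item: dict) -> bool:
    # Rough stand-in for bitstreamaccessFilter: a public file in the ORIGINAL bundle.
    return any(b["bundle"] == "ORIGINAL" and b["public"] for b in item["bitstreams"])

def harvest(items: List[dict], context_filters: List[ItemFilter]) -> List[str]:
    # An item is returned only if it passes every filter in the context.
    return [i["id"] for i in items if all(f(i) for f in context_filters)]

items = [
    {"id": "item-1", "public": True, "bitstreams": [{"bundle": "ORIGINAL", "public": True}]},
    {"id": "item-2", "public": True, "bitstreams": []},  # metadata-only record
    {"id": "item-3", "public": False, "bitstreams": [{"bundle": "ORIGINAL", "public": True}]},
]

request_before = [is_public]
request_after = [is_public, has_public_original_bitstream]

print(harvest(items, request_before))  # ['item-1', 'item-2']
print(harvest(items, request_after))   # ['item-1']
```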

In addition to this information about DSpace OAI profiles, I ran into some bugs, or potential issues, in the DSpace XOAI code base. For one, DSpace XOAI can run in two modes: database mode, where the database answers all OAI queries, or a performance-optimized mode, where Solr indexes your repository. One of the bugs was that Solr mode had a slightly different interpretation of "bitstreamaccessFilter": database mode required that there was a bitstream in the original bundle, while Solr mode only required that the item was public. To correct this, I've patched our code at Longsight, and have contacted the XOAI author to confirm and test the issue.
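
The discrepancy is easiest to see with an item that is public but has no file in its original bundle, for example a metadata-only record that carries only a license file. The Python sketch below models the two interpretations as described above. It is not the actual XOAI or DSpace code: the `Bitstream`/`Item` classes and both function names are hypothetical, and it assumes, per the filter description earlier, that the original-bundle bitstream must also be publicly readable.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Bitstream:
    bundle: str   # DSpace bundle name, e.g. "ORIGINAL" or "LICENSE"
    public: bool  # whether anonymous users can read it

@dataclass
class Item:
    public: bool
    bitstreams: List[Bitstream] = field(default_factory=list)

def database_mode_check(item: Item) -> bool:
    # Database interpretation: needs a public bitstream in the ORIGINAL bundle.
    return any(b.bundle == "ORIGINAL" and b.public for b in item.bitstreams)

def solr_mode_check_unpatched(item: Item) -> bool:
    # Solr interpretation before the patch: only checks that the item is public.
    return item.public

metadata_only = Item(public=True, bitstreams=[Bitstream("LICENSE", True)])
with_fulltext = Item(public=True, bitstreams=[Bitstream("ORIGINAL", True)])

for name, item in [("metadata_only", metadata_only), ("with_fulltext", with_fulltext)]:
    print(name, database_mode_check(item), solr_mode_check_unpatched(item))
# metadata_only False True
# with_fulltext True True
```

In this model, items like `metadata_only` are the ones that show up in Solr mode but not in database mode, so the same xoai.xml configuration can return different result sets depending on which mode is running.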