OAI Document Library

The OAI Document Library (OAI-DL) is a document management application for managing and tracking large numbers of documents in an organization. It helps authors maintain their documents, automatically handling versioning, workflow, and expiration notification, and it provides reviewers (librarians) with updates, fast approval, and overviews of all activity.

Information in the library can be accessed using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), so besides being open source, the OAI-DL is an open data application. This makes the OAI Document Library easy to integrate with other systems, such as the Silva CMS or any other application capable of OAI-PMH harvesting.
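Harvesting comes down to plain HTTP GET requests against a repository's base URL, with an OAI-PMH verb and its arguments in the query string. The Python sketch below builds such request URLs. The base URL is a hypothetical placeholder, and the example assumes the harvester asks for `oai_dc` (simple Dublin Core), the one metadata format every OAI-PMH repository must support; which additional formats an OAI-DL installation exposes is not covered here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real OAI-DL installation publishes its own base URL.
BASE_URL = "https://library.example.org/oai"

# The six request verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(verb, **args):
    """Build an OAI-PMH GET request URL for the given verb and arguments."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # 'from' is a Python keyword, so callers pass 'from_' and we rename it
    # to the argument name the protocol expects.
    if "from_" in args:
        args["from"] = args.pop("from_")
    # The verb comes first; remaining arguments are sorted for stable URLs.
    query = urlencode([("verb", verb)] + sorted(args.items()))
    return f"{BASE_URL}?{query}"
```

For example, `oai_request_url("ListRecords", metadataPrefix="oai_dc", from_="2006-01-01")` asks for every record added or changed since that date, which is how a harvester picks up incremental updates.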

For organizations in the UK, the OAI-DL has specific features that help ensure compliance with the Freedom of Information Act. For example, the application supports archiving via a custom Publication Scheme, following the FOI Act guidelines.

Why is an OAI Document Library needed?

Organizations deal with numerous documents, such as word processor documents and PDFs. These documents often reside on someone’s computer and are not network accessible. Versions of documents are hard to track: the same document may circulate by email in several versions over time. In large organizations it therefore becomes important to structure the flow of documents and present them in a common format. This is typically done using a DMS (Document Management System).

The OAI Document Library is one such DMS. It can help organizations in the following ways:

  • Internal communication about documents improves, because all documents are available in a central location.
  • Organizations, especially public ones, face a growing number of legal requirements concerning the retention and publication of documents. The OAI-DL can help an organization comply with these requirements.
  • Information about documents in an organization can be accessed and published, for instance on a website.

OAI Document Library characteristics

The main focuses of the OAI Document Library are:

  • It’s easy to use. Users are not exposed to many complicated screens when they just want documents to appear in the system. There’s just a single screen that takes care of everything when you put a document in the library. It’s web-based, so no custom client installation is necessary.
  • It integrates with other systems. The OAI-DL is not a monolithic black box in which documents and metadata disappear and can no longer be retrieved; instead, it is easy to integrate with other systems, such as websites that publish its content. Features can often be added outside the library instead of expanding the scope of the OAI-DL beyond document management itself. For instance, web publication of documents is better left to a CMS such as Silva.
  • It can be made to scale. Uploads and downloads of documents can be handled seamlessly using the sophisticated Apache integration technology Tramline.

A document life cycle

This is the typical lifecycle of a document in the OAI Document Library:

  • A user submits a document to the OAI-DL. The OAI-DL sends a notification email to all authors listed for the document, and another to all librarians responsible for the section in which the document was submitted.
  • A librarian receives the email and knows a new document was submitted. A librarian can also see all newly submitted documents in the sections the librarian is responsible for on an overview page.
  • The librarian reviews the document and either rejects it or approves it for publication. An email is sent to all authors listed for the document, and also to all librarians who manage the document’s section.
  • The available date of the document determines when the document is made available. Once the document becomes available, it shows up in the OAI-PMH feed and can be harvested. External systems that publish the metadata get a link to the document.
  • Documents may expire automatically or be retracted. Documents leave information behind even when they are deleted, so that it is always possible to find out what happened to them. This is important, for instance, in the context of freedom of information legislation.
  • Documents may need to be updated on a regular basis. The OAI-DL sends emails to authors with an advance warning that an update or review is required.
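The life cycle above is essentially a small state machine. The Python sketch below models it under clearly hypothetical assumptions: the state names, the `Document` fields, the `librarians_by_section` mapping and the 30-day warning period are illustrations, not the OAI-DL's actual data model. It shows who is notified at each step, when a document becomes harvestable, and when an advance expiry warning would go out.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# Hypothetical workflow states and the transitions allowed between them.
TRANSITIONS = {
    "submitted": {"approved", "rejected"},
    "approved": {"expired", "retracted"},
    "rejected": set(),
    "expired": set(),
    "retracted": set(),
}


@dataclass
class Document:
    title: str
    section: str
    authors: list
    available_date: date
    expiry_date: Optional[date] = None
    state: str = "submitted"
    # Every transition is recorded, so the document's fate stays traceable.
    history: list = field(default_factory=list)

    def recipients(self, librarians_by_section):
        """Authors plus the librarians responsible for this document's section."""
        librarians = librarians_by_section.get(self.section, ())
        return sorted(set(self.authors) | set(librarians))

    def transition(self, new_state, librarians_by_section):
        """Move to a new state and return the addresses to notify by email."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state!r} to {new_state!r}")
        self.history.append((self.state, new_state))
        self.state = new_state
        return self.recipients(librarians_by_section)

    def is_harvestable(self, today):
        """Approved, past its available date, and not yet expired."""
        if self.state != "approved" or today < self.available_date:
            return False
        return self.expiry_date is None or today < self.expiry_date

    def reminder_date(self, warning=timedelta(days=30)):
        """Day on which authors get an advance warning (None if no expiry)."""
        return None if self.expiry_date is None else self.expiry_date - warning
```

On submission the caller would email `doc.recipients(...)`. Keeping the allowed transitions in a table makes illegal moves, such as approving an already approved document, fail loudly instead of silently corrupting the workflow.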

OAI Document Library features

  • Automatic conversion service: using OpenOffice, the OAI-DL can convert Word documents into PDFs and plain text, and PDFs into plain text. The plain text version is important as it allows full-text indexing of document contents, and also makes documents more accessible to people with disabilities.
  • Publication workflow: documents only become available for harvesting and download after a review process.
  • Delegation of control: reviewers (librarians) can be assigned to particular sections.
  • Dynamic access: authors have automatic access to all the documents that list them as an author.
  • Versions: multiple versions of the same document can coexist, one public and one under preparation.
  • Email reminder functionality: users receive emails about the progress of the document through the workflow.
  • Expiration notification: authors will be notified in advance about documents that need updating on a specified date.
  • OAI-PMH data provider: allows other systems to harvest document metadata using a standard protocol.
  • Integration with the Silva CMS (using OAI-PMH).
  • Fast upload and download integration with Apache using Tramline.
  • Easy overview screens for librarians.
  • Smart file upload user interface: files need to be uploaded only once, even if the rest of the form needs to be amended.
  • The OAI Document Library is built on the powerful Zope 3 application server platform.
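Delegation of control and dynamic access can both be expressed as simple predicates over a document's metadata rather than as hand-maintained access lists. In this hypothetical sketch, `SECTION_LIBRARIANS`, `Doc`, `can_edit`, `can_review` and `documents_for` are illustrative names only; the OAI-DL implements its security on top of Zope 3, and that actual mechanism is not shown here.

```python
from dataclasses import dataclass

# Hypothetical assignment of librarians (reviewers) to library sections.
SECTION_LIBRARIANS = {"finance": {"bob"}, "policy": {"carol", "dave"}}


@dataclass(frozen=True)
class Doc:
    identifier: str
    section: str
    authors: frozenset


def can_edit(user, doc):
    # Dynamic access: being listed as an author is all it takes.
    return user in doc.authors


def can_review(user, doc, section_librarians=SECTION_LIBRARIANS):
    # Delegation of control: only librarians assigned to the section may review.
    return user in section_librarians.get(doc.section, set())


def documents_for(user, docs):
    """Identifiers of all documents a user can work on as an author."""
    return [d.identifier for d in docs if can_edit(user, d)]
```

Because access is derived from the author list, adding someone as an author is enough to give them access; no separate permission change is needed.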

The Open Archives Initiative

The Open Archives Initiative Protocol for Metadata Harvesting is a well-established and increasingly important standard in the content management and library science worlds. It provides an application-independent interoperability framework for metadata exchange between online parties. Many academic libraries and other organizations expose OAI-PMH compliant repositories on the web that can be harvested. The OAI-PMH standard defines the following parties and software components:

  • A ‘Data Provider’ such as an academic library runs a Repository that supports OAI-PMH as a means of exposing metadata information about resources, for instance academic publications.
  • A ‘Service Provider’ uses Harvester software to harvest metadata from such Repositories. The harvested metadata can be used to provide value-added services, such as a website that allows browsing and searching through a catalog.
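On the Service Provider side, a harvester sends `ListRecords` requests and parses the XML that comes back. The standard-library sketch below extracts each record's identifier, its Dublin Core titles and whether the repository marked it as deleted, plus the `resumptionToken` that signals another batch is waiting. It uses the namespaces defined by OAI-PMH 2.0 and `oai_dc`; it is not the API of Infrae's OAI-PMH module, and a production harvester would also need to handle OAI-PMH error responses and HTTP failures.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            # Deleted records keep a header but carry no metadata.
            "deleted": header.get("status") == "deleted",
            "titles": [t.text for t in
                       rec.iterfind("oai:metadata/oai_dc:dc/dc:title", NS)],
        })
    # An empty or missing token means this was the last batch.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, token or None
```

A harvester calls this in a loop, passing each returned token back as the `resumptionToken` argument until no token is returned. Headers marked `status="deleted"` let a repository tell harvesters that a record was removed instead of silently dropping it.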

Infrae has extended Silva so it enables users to browse and search harvested metadata, further enriching the extensive feature set of this open source CMS. An organization that uses Silva can easily become an OAI-PMH Service Provider.

In the process, Infrae also developed a module for accessing OAI-PMH compliant repositories in Python, and developed a sophisticated harvesting and indexing system for using harvested metadata in Zope. These reusable components are designed to be building blocks for other Python or Zope-based applications (see the OAI Pack product).

Silva integration features

The OAI Document Library can be integrated with external systems using the OAI-PMH protocol. Following are the features of the OAI-DL’s integration with Silva.

  • Uses OAI-PMH standard to harvest documents from OAI-DL, but is aware of the library’s specific metadata.
  • Ability to add listings of document references, based on metadata selection criteria, in CMS documents.
  • Ability to add references to individual documents in CMS documents.
  • Ability to create search pages for documents in the library in the CMS. Not only metadata is indexed, but also the full-text content of these documents. This means that end users of the website can do full-text searches in document contents.
  • Document download (.doc, .pdf, .txt, etc.) is handled by the OAI-DL, the CMS just handles presentation.
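The listing and search features come down to two operations on harvested records: filtering by metadata and full-text lookup in the plain-text renditions the OAI-DL produces. The sketch below uses a tiny in-memory inverted index as a stand-in for the Zope-based harvesting and indexing system described earlier; `HarvestedDoc`, `Index` and the metadata keys are hypothetical. It stores only metadata and text, since downloads stay with the OAI-DL.

```python
import re
from dataclasses import dataclass


@dataclass
class HarvestedDoc:
    identifier: str
    metadata: dict      # hypothetical keys, e.g. {"section": "finance"}
    fulltext: str = ""  # plain-text rendition harvested from the library


def tokenize(text):
    """Lower-cased word tokens used for both indexing and querying."""
    return re.findall(r"\w+", text.lower())


class Index:
    def __init__(self):
        self.docs = {}      # identifier -> HarvestedDoc
        self.postings = {}  # word -> set of identifiers containing it

    def remove(self, identifier):
        self.docs.pop(identifier, None)
        for ids in self.postings.values():
            ids.discard(identifier)

    def add(self, doc):
        # Re-harvesting a document replaces its old index entries.
        if doc.identifier in self.docs:
            self.remove(doc.identifier)
        self.docs[doc.identifier] = doc
        for word in set(tokenize(doc.fulltext)):
            self.postings.setdefault(word, set()).add(doc.identifier)

    def listing(self, **criteria):
        """Identifiers whose metadata matches every given key/value pair."""
        return sorted(i for i, d in self.docs.items()
                      if all(d.metadata.get(k) == v for k, v in criteria.items()))

    def search(self, query):
        """Identifiers whose full text contains every word of the query."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```

A listing embedded in a CMS page would call `listing(...)` with its selection criteria, while a search page would call `search(...)` with the visitor's query.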

Conclusion

The OAI Document Library is a document management system with a broad feature set that is simple to use and can be introduced into an organization relatively easily. It does not try to take over all document-related activities, such as publishing documents on the web; it focuses on managing the documents themselves in a single repository.

Because integration happens via OAI-PMH, the information in the OAI Document Library can be made available in numerous ways, such as web publication using Silva, as well as to any other system that supports the protocol.