SMILA/Component Requirements/Record Binary Storage Requirements
This page defines requirements posed on SMILA's record binary storage.
Overview
The purpose of record binary storage is to store document binary data. Usually this is the content of a binary document that the record references as one or more attachments. The natural client component of this low-level service is the blackboard service.
Note: The content (record attachments) of XML documents should rather be stored in record XML storage, since this enables the client component to run XQuery queries on the document content itself.
Requirements
- The record binary store has to offer an implementation-agnostic API. In particular, the client component should have no knowledge of the persistence technology actually being used (local file system, DB or distributed file system).
- Selecting a particular implementation of the binary storage service should be purely a matter of framework configuration.
- The essential API should be kept short (at most 10 methods).
- An extended API may contain batch operations.
- The 'get' and 'set' methods should work both with streams (for very large documents, more than 2 GB in size, that need to be stored or processed) and with byte arrays (for convenience).
- The client component must be able to use different instances of the binary store fully transparently.
- Namespaces/Collections: Binary storage shall support the notion of a namespace or collection, which serves as a mechanism for separating data. Within a namespace, no two different files may share the same ID. Backup and restore shall be possible at the namespace level.
- Clustering: Fail-over clustering is currently not the primary use case. More important is storing large amounts of data (e.g. terabytes) in the same namespace, which requires client-transparent storing on and retrieving from different nodes in the cluster.
- Proposal for essential API:
void storeRecordAttachment(String attachmentId, InputStream attachmentStream)
void storeRecordAttachment(String attachmentId, byte[] attachmentStream)
byte[] fetchRecordAttachmentAsByte(String attachmentId)
InputStream fetchRecordAttachmentAsStream(String attachmentId)
void removeRecordAttachment(String attachmentId)
int fetchRecordAttachmentSize(String attachmentId)
Note: Because the size of the stored content can be queried first, the developer of the client component can decide which method (stream-oriented or byte-array-oriented) to use.
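The proposed API and the size-based choice between the two fetch methods could be sketched in Java roughly as follows. This is an illustration only, not SMILA's actual code: the names `BinaryStorage`, `InMemoryBinaryStorage`, `AttachmentReader` and `BYTE_ARRAY_LIMIT` are hypothetical, the in-memory map merely stands in for a real backend, and I/O errors are reported as unchecked exceptions for brevity. The sketch also assumes that the size is returned as `long` rather than the proposed `int`, since an `int` cannot represent the size of attachments larger than 2 GB.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical interface name; the method names follow the proposal above. */
interface BinaryStorage {
    void storeRecordAttachment(String attachmentId, InputStream attachmentStream);
    void storeRecordAttachment(String attachmentId, byte[] attachment);
    byte[] fetchRecordAttachmentAsByte(String attachmentId);
    InputStream fetchRecordAttachmentAsStream(String attachmentId);
    void removeRecordAttachment(String attachmentId);
    /** Assumption: long instead of the proposed int, to cover sizes above 2 GB. */
    long fetchRecordAttachmentSize(String attachmentId);
}

/** In-memory stand-in for a real backend (file system, DB, distributed file system). */
final class InMemoryBinaryStorage implements BinaryStorage {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public void storeRecordAttachment(String attachmentId, InputStream attachmentStream) {
        try {
            // Reads the whole stream into memory; a real backend would copy it in chunks.
            data.put(attachmentId, attachmentStream.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void storeRecordAttachment(String attachmentId, byte[] attachment) {
        data.put(attachmentId, attachment.clone()); // defensive copy
    }

    @Override
    public byte[] fetchRecordAttachmentAsByte(String attachmentId) {
        return require(attachmentId).clone();
    }

    @Override
    public InputStream fetchRecordAttachmentAsStream(String attachmentId) {
        return new ByteArrayInputStream(require(attachmentId));
    }

    @Override
    public void removeRecordAttachment(String attachmentId) {
        data.remove(attachmentId);
    }

    @Override
    public long fetchRecordAttachmentSize(String attachmentId) {
        return require(attachmentId).length;
    }

    private byte[] require(String attachmentId) {
        byte[] bytes = data.get(attachmentId);
        if (bytes == null) {
            throw new NoSuchElementException("No attachment with ID " + attachmentId);
        }
        return bytes;
    }
}

/** Client-side helper that picks the fetch method based on the stored size. */
final class AttachmentReader {
    /** Arbitrary illustrative threshold (16 MiB); not taken from the requirements. */
    static final long BYTE_ARRAY_LIMIT = 16L * 1024 * 1024;

    static InputStream open(BinaryStorage storage, String attachmentId) {
        long size = storage.fetchRecordAttachmentSize(attachmentId);
        if (size <= BYTE_ARRAY_LIMIT) {
            // Small attachment: the byte-array call is the convenient path.
            return new ByteArrayInputStream(storage.fetchRecordAttachmentAsByte(attachmentId));
        }
        // Large attachment: stream it instead of materializing it in memory.
        return storage.fetchRecordAttachmentAsStream(attachmentId);
    }
}

final class BinaryStorageDemo {
    public static void main(String[] args) {
        BinaryStorage storage = new InMemoryBinaryStorage();
        storage.storeRecordAttachment("doc-1", new byte[] {1, 2, 3});
        System.out.println(storage.fetchRecordAttachmentSize("doc-1"));
    }
}
```

Because the client code depends only on the interface, replacing the in-memory class with a file-system or database-backed implementation would not change the caller, which is the point of the implementation-agnostic requirement.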