SMILA/Project Concepts/Core Indexing Process (global view)
Description
This concept describes the general system architecture and references core or general concepts. It also covers good and bad practices for keeping the system maintainable.
Key parts of the architecture can be found in the following concepts:
Discussion
Technical proposal
{info} Note: This section may only be edited by the assigned developer(s). They are also responsible for reflecting any changes or details agreed in the discussion section. {info}
SMILA Core Indexing Process (Global View)
Process overview
Create/Delete Record
- Compounds
- Optimized queue access: can obsolete requests be avoided? An example of an obsolete request is the indexing of a document while a delete operation for it is already present in the queue. It must be guaranteed that the client process does not have to read all messages in the queue. The open question is whether queue implementations exist that can cover such cases.
- Parallel processing could lead to difficulties when concurrently performing create/delete operations
- The update of fields of a document in the storage must be possible (e.g. the user rights field)
- Delete by Query (several objects, XQuery, Source, ...)
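One way to picture the "obsolete request" problem from the list above is a queue that keeps at most one pending operation per record. The following is a minimal sketch, not SMILA API: the class `CoalescingQueue`, the `Op` enum, and the rule that a later request for the same record supersedes the earlier one are all assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not SMILA API): at most one pending operation per record id. */
final class CoalescingQueue {
    enum Op { ADD, DELETE }

    static final class Request {
        final String recordId;
        final Op op;
        Request(String recordId, Op op) { this.recordId = recordId; this.op = op; }
    }

    // Insertion-ordered map; the key is the record id, the value the pending operation.
    private final Map<String, Op> pending = new LinkedHashMap<>();

    /** Queues an operation; an older pending operation for the same record is dropped. */
    synchronized void offer(String recordId, Op op) {
        pending.remove(recordId);   // assumption: the newest request wins
        pending.put(recordId, op);  // re-inserted at the tail of the queue
    }

    /** Returns the oldest pending request, or null if the queue is empty. */
    synchronized Request poll() {
        Iterator<Map.Entry<String, Op>> it = pending.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        Map.Entry<String, Op> head = it.next();
        it.remove();
        return new Request(head.getKey(), head.getValue());
    }

    synchronized int size() { return pending.size(); }

    public static void main(String[] args) {
        CoalescingQueue queue = new CoalescingQueue();
        queue.offer("doc1", Op.ADD);
        queue.offer("doc1", Op.DELETE); // makes the pending ADD obsolete
        Request r = queue.poll();
        System.out.println(r.recordId + " " + r.op);
    }
}
```

Because the lookup is keyed by record id, neither producer nor consumer has to scan all queued messages. Whether an existing message queue product offers comparable behavior remains the open question raised above.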
Delta-Indexing
- Source-based or subset-based (a subset being a part of the information in storage)
- Compounds
- Status storage behind an interface (status for delta discovery at the IRM; probably Lucene or an indexer as storage), e.g. hashes, data, URLs, modifications in user rights
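The status storage idea can be sketched as a small interface plus a hash comparison. This is only an illustration under assumptions: `DeltaStatusStore`, `InMemoryStore`, `Change`, and `check` are hypothetical names, and the in-memory map stands in for whatever backend (for example Lucene) would actually be chosen. The fingerprint covers content and user rights so that a change in rights alone is also detected.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not SMILA API) of hash-based delta detection. */
final class DeltaIndexingSketch {
    /** Assumed storage interface; a real backend could be Lucene or another indexer. */
    interface DeltaStatusStore {
        String getHash(String sourceId, String recordId); // null if the record is unknown
        void putHash(String sourceId, String recordId, String hash);
    }

    /** In-memory stand-in used only for this example. */
    static final class InMemoryStore implements DeltaStatusStore {
        private final Map<String, Map<String, String>> bySource = new HashMap<>();

        public String getHash(String sourceId, String recordId) {
            Map<String, String> records = bySource.get(sourceId);
            return records == null ? null : records.get(recordId);
        }

        public void putHash(String sourceId, String recordId, String hash) {
            bySource.computeIfAbsent(sourceId, k -> new HashMap<>()).put(recordId, hash);
        }
    }

    enum Change { NEW, MODIFIED, UNCHANGED }

    /** SHA-256 over content and user rights, hex encoded. */
    static String fingerprint(String content, String userRights) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(content.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator between the two fields
            md.update(userRights.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Compares the current fingerprint with the stored one and updates the store. */
    static Change check(DeltaStatusStore store, String sourceId, String recordId,
                        String content, String userRights) {
        String current = fingerprint(content, userRights);
        String known = store.getHash(sourceId, recordId);
        if (current.equals(known)) {
            return Change.UNCHANGED;
        }
        store.putHash(sourceId, recordId, current);
        return known == null ? Change.NEW : Change.MODIFIED;
    }

    public static void main(String[] args) {
        DeltaStatusStore store = new InMemoryStore();
        System.out.println(check(store, "fs", "a.txt", "hello", "staff"));
        System.out.println(check(store, "fs", "a.txt", "hello", "admin"));
    }
}
```

Keying the store by source and record id keeps the source-based variant simple: all entries of one source can be inspected or reset together.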
Index creation
- Pre/post actions of an index process (e.g. starting of services, invoke a functionality of another external system)
Because queues are used, there is no real "end of an indexing process"; how do we solve this?
- Initial index creation
- Delta indexing
- Continue indexing (start at point XY)
- Simple append of information (from any origin/source)
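The open question of when a queue-based indexing run has "ended" could be approached by counting outstanding messages. The sketch below is one possible approach, not a decided design: `IndexJobTracker` and its methods are hypothetical, and it assumes that every message is reported once when enqueued and once when processed, and that the producer signals when it will enqueue nothing more.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch (not SMILA API): detects the end of a queue-based indexing run. */
final class IndexJobTracker {
    private final AtomicInteger outstanding = new AtomicInteger();
    private final AtomicBoolean postActionFired = new AtomicBoolean();
    private volatile boolean producerDone;
    private final Runnable postAction;

    IndexJobTracker(Runnable postAction) {
        this.postAction = postAction;
    }

    /** Called by the producer for every message put on the queue. */
    void enqueued() {
        outstanding.incrementAndGet();
    }

    /** Called by a consumer after a message has been fully processed. */
    void processed() {
        outstanding.decrementAndGet();
        maybeFinish();
    }

    /** Called once when the producer will not enqueue further messages. */
    void producerFinished() {
        producerDone = true;
        maybeFinish();
    }

    boolean isFinished() {
        return postActionFired.get();
    }

    // Runs the post action exactly once, when nothing is outstanding any more.
    private void maybeFinish() {
        if (producerDone && outstanding.get() == 0
                && postActionFired.compareAndSet(false, true)) {
            postAction.run();
        }
    }

    public static void main(String[] args) {
        IndexJobTracker tracker = new IndexJobTracker(
                () -> System.out.println("post action: indexing run finished"));
        tracker.enqueued();
        tracker.producerFinished(); // one message is still outstanding
        tracker.processed();        // last message done, post action runs
    }
}
```

The compare-and-set guard keeps the post action from running twice when the last consumer and the producer finish at nearly the same time. In a distributed setup the counters would have to live in shared storage, which this sketch does not cover.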
Compound Management
- Processing via BPEL or via a sole pipelet (which approach is better?)
- How do we cover filters? Is it possible to design a relationship between IRM and filter configuration? [P2 for this remark]
- Warning: Large streams
- Recursion
- Delta indexing
- Extensibility of compound management (e.g. using extension points)
- Ability for debugging
- Project templates for covering best practices
- Inheritance of data to child records (e.g. user rights)
- MIME Type detection
- In the different variants of the installations
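Two items from the list above, recursion and inheritance of data to child records, can be illustrated together. This is a minimal sketch under assumptions: `Rec`, `flatten`, the depth limit, and the attribute name `rights` are hypothetical and do not reflect SMILA's record model. Children only inherit a value if they do not carry their own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not SMILA API): expanding a compound with a recursion limit. */
final class CompoundSketch {
    /** Simplified record: an id, string attributes, and nested child records. */
    static final class Rec {
        final String id;
        final Map<String, String> attributes = new HashMap<>();
        final List<Rec> children = new ArrayList<>();
        Rec(String id) { this.id = id; }
    }

    /** Returns the compound and its elements as a flat list, down to maxDepth levels. */
    static List<Rec> flatten(Rec root, int maxDepth, List<String> inheritedKeys) {
        List<Rec> out = new ArrayList<>();
        visit(root, 0, maxDepth, inheritedKeys, out);
        return out;
    }

    private static void visit(Rec rec, int depth, int maxDepth,
                              List<String> inheritedKeys, List<Rec> out) {
        out.add(rec);
        if (depth >= maxDepth) {
            return; // recursion guard, e.g. against archives nested inside archives
        }
        for (Rec child : rec.children) {
            for (String key : inheritedKeys) {
                String value = rec.attributes.get(key);
                if (value != null) {
                    child.attributes.putIfAbsent(key, value); // child's own value wins
                }
            }
            visit(child, depth + 1, maxDepth, inheritedKeys, out);
        }
    }

    public static void main(String[] args) {
        Rec archive = new Rec("archive.zip");
        archive.attributes.put("rights", "group:staff");
        Rec file = new Rec("archive.zip/a.txt");
        archive.children.add(file);
        for (Rec r : flatten(archive, 5, List.of("rights"))) {
            System.out.println(r.id + " rights=" + r.attributes.get("rights"));
        }
    }
}
```

The sketch holds all child records in memory, which sidesteps the large-streams warning above; a real implementation would likely need to process elements in a streaming fashion.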
Maintenance Operations
- CRUD (e.g. collections, indexes)
- Backup/Restore/Reset (remove all process related data; empty temp storage for delta indexing; empty collection XY)
- Backup/Restore/Reset (? Maintenance concept; probably hosted at eccenca)
- Migration of software versions (including the data)
- Reorganization, save/security points, training (e.g. for search, what's related)
- Adding of nodes (indices, SMILA, ...)
- Creation of reports or statistics