TPTP data persistence layer
Overview
Scalability, scalability and scalability, with appropriate performance, are the main themes for today's TPTP model/data persistence layer, and a solution meeting these stringent requirements is targeted for TPTP 4.4.
Our scenarios should be limited only by the amount of available disk space (in both local and distributed cases). Log analysis, trace analysis, symptom analysis, test execution results analysis and statistical data analysis are cases where the amount of generated data can become extremely large, so a scalable data store with appropriate performance becomes an important requirement.
During our first three TPTP releases we realized that the EMF file-based approach would not give us enough scalability, although it performs very well when the model fits in the available memory (RAM).
In the following sections we discuss the different approaches, from the current EMF-based approach to a service-oriented approach backed by a database data store.
The current TPTP use cases that involve the persistence layer will also drive the intended complexity of the first implementation.
The intention is to produce something that is also directly reusable in other projects. COSMOS is the first candidate, as it will also rely on other parts of TPTP.
We need to cover all the scenarios currently available in TPTP 4.3 (possibly incrementally, with initial emphasis on the main pain points) while keeping the UI changes required to support the new approach to a minimum. At the same time, the user experience should stay close to that of TPTP 4.3, but with much improved performance and scalability.
EMF based approach
This section is under construction. Here are some quick notes.
Today we use EMF for most of our data modeling and manipulation needs. EMF is a very popular modeling framework that provides a powerful and extensible modeling infrastructure, including a high-performance runtime and a well-integrated XML-based persistence layer.
The main problem we have been tackling in TPTP is tweaking our EMF-based implementation so that it scales appropriately. We have worked around some of the problems and will continue to look at how we can still leverage EMF in TPTP. At the same time, we are looking to move toward an infrastructure that gives us more control over memory footprint and simpler data manipulation, using specialized services instead of the complex and flexible approach we have today. Less should be better in this case.
Most of what we learned using (and tweaking) EMF should help us define a better infrastructure for TPTP 4.4 and later versions.
A presentation showing some of the things we did to improve this approach: TPTPModel-EMF-scalability.zip
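The idea of replacing a fully loaded, general-purpose model with specialized services that keep the memory footprint under control can be illustrated with a small sketch. This is a minimal, hypothetical Java example: Event, EventQueryService, InMemoryEventQueryService and PagedQueryDemo are names invented here for illustration and are not part of EMF or of the TPTP DMS API. The caller asks the service for one page of matching records at a time, so the memory it holds is bounded by the page size rather than by the size of the whole data set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PagedQueryDemo {

    // Hypothetical record standing in for one persisted entry (e.g. a log or trace event).
    record Event(long id, String source, int severity) {}

    // Hypothetical specialized query service: returns one page of matches per call
    // instead of exposing the whole model to the caller.
    interface EventQueryService {
        List<Event> query(Predicate<Event> filter, int offset, int pageSize);
    }

    // In-memory stand-in used only to keep the sketch self-contained. A database-backed
    // implementation would push the filter and paging into the query itself.
    static final class InMemoryEventQueryService implements EventQueryService {
        private final List<Event> store;

        InMemoryEventQueryService(List<Event> store) {
            this.store = store;
        }

        @Override
        public List<Event> query(Predicate<Event> filter, int offset, int pageSize) {
            if (offset < 0 || pageSize <= 0) {
                throw new IllegalArgumentException("offset must be >= 0 and pageSize > 0");
            }
            List<Event> page = new ArrayList<>(pageSize);
            int matched = 0;
            for (Event e : store) {
                if (!filter.test(e)) continue;
                if (matched++ < offset) continue;   // skip matches before the requested page
                page.add(e);
                if (page.size() == pageSize) break; // stop as soon as the page is full
            }
            return page;
        }
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        for (long i = 0; i < 10; i++) {
            events.add(new Event(i, i % 2 == 0 ? "agentA" : "agentB", (int) (i % 3)));
        }
        EventQueryService service = new InMemoryEventQueryService(events);

        // Walk all events with severity >= 1, three at a time; the caller never
        // holds more than one page of results.
        int offset = 0;
        List<Event> page;
        while (!(page = service.query(e -> e.severity() >= 1, offset, 3)).isEmpty()) {
            System.out.println("page at offset " + offset + ": " + page.size() + " events");
            offset += page.size();
        }
    }
}
```

The narrow interface is the design point: because callers can only ask specific, bounded questions, the implementation behind it can move from an in-memory model to a database without changing how clients consume the results.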
Service based approach
This section is under construction. Here are some quick notes.
COSMOSTPTPQuereryIntrfaceNotes contains an initial design with some diagrams and feedback. Soon I'll move that content over here and update it with the latest changes.
A demo showing how to use the new DMS API is available here: COSMOSEclipseCon2007Demo#demo_v02
The data management services (DMS) implementation related to feature 169353 is available here: TPTP DMS CVS link
File vs database
This section is under construction. Here are some quick notes.
Simple vs complex queries/results structures
This section is under construction. Here are some quick notes.