SMILA/Documentation/AgentController
Overview
The AgentController is a component that manages and monitors Agents. Whenever a new agent task is triggered (via startAgent()), a new instance of the respective Agent is created, and the hash value of the agent object is used as an id (called the import run id) to identify records created by this agent instance. This import run id is set as the attribute _importRunId on all records and is also visible on the agent instance in the JMX console.
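As a rough illustration of the import run id described above, the following sketch derives an id from an object's hash code and stamps it on a record. The class `ImportRunIdSketch`, the use of a plain `Map` as a record, and the choice of `hashCode()` as the hash function are assumptions made for this example; only the attribute name `_importRunId` comes from this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not SMILA code: tags records with an id derived from the agent instance. */
final class ImportRunIdSketch {
  /** Attribute name as given in the text above. */
  static final String IMPORT_RUN_ID = "_importRunId";

  /** Assumption: the id is the string form of the instance's hashCode(). */
  static String importRunIdOf(Object agentInstance) {
    return String.valueOf(agentInstance.hashCode());
  }

  /** Returns a copy of the record attributes with the import run id added. */
  static Map<String, String> tag(Map<String, String> record, Object agentInstance) {
    Map<String, String> tagged = new HashMap<>(record);
    tagged.put(IMPORT_RUN_ID, importRunIdOf(agentInstance));
    return tagged;
  }
}
```

Because a new agent instance is created for every run, all records of one run would carry the same id under this scheme, while records of a later run would normally carry a different one.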
API
AgentController provides two interfaces: one is used by management clients to start and stop agent instances, the other is used by Agents to execute callback methods on the AgentController itself, which run the common processing logic.
Javadoc:
- org.eclipse.smila.connectivity.framework.AgentController
- org.eclipse.smila.connectivity.framework.util.AgentControllerCallback
Implementations
It is possible to provide different implementations for the AgentController interface. At the moment there is one implementation available.
org.eclipse.smila.connectivity.framework.impl
This bundle contains the default implementation of the AgentController interface.
The AgentController implements the general processing logic common for all types of Agents. Its interface is a pure management interface that can be accessed by its Java interface or its wrapping JMX interface. It has references to the following OSGi services:
- ConnectivityManager
- Agent ComponentFactory
- ConfigurationManagement (t.b.d.)
- CompoundManagement (t.b.d.)
Agent Factories register themselves with the AgentController. Each time an agent is started with a datasource for a specific type of agent, a new instance of that Agent type is created via the Agent ComponentFactory. This allows datasources of the same type (e.g. several RSS feeds) to be watched in parallel. Note that it is not possible to start multiple agents on the same data source concurrently!
This chart shows the current AgentController processing logic for one agent run:
- the Agent is started, initializes DeltaIndexing for the data source by calling DeltaIndexingManager:init(...) and waits for events in a separate thread. One of the following events can occur:
- ADD: a new or updated object on the datasource was detected. A record object is created. It is checked if the record was updated by calling DeltaIndexingManager:checkForUpdate(...)
- YES: the record is added to the Queue by calling ConnectivityManager:add(...) and updated in the DeltaIndexingManager by calling DeltaIndexingManager:visit(...)
- NO: no actions are taken
- DELETE: an object on the datasource was deleted. An Id object is created for the deleted object. This Id is deleted from both ConnectivityManager and DeltaIndexingManager by calling ConnectivityManager:delete(...) and DeltaIndexingManager:delete(...).
- STOP: the agent is stopped, either via an external command or because a fatal error occurred
- it finishes DeltaIndexing by calling DeltaIndexingManager:finish(...) and ends the thread
The processing logic will be enhanced when CompoundManagement is integrated.
- Note
The exact logic depends on the settings of DeltaIndexing in the data source configuration. Depending on the configured value, delta indexing logic is executed fully, partially or not at all.
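The event handling described in this section can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: `AgentRunSketch`, `Event` and the in-memory `DeltaIndex` are hypothetical stand-ins, the method signatures do not match the real DeltaIndexingManager or ConnectivityManager, and full delta indexing is assumed to be enabled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of one agent run; not the SMILA implementation. */
final class AgentRunSketch {
  enum EventType { ADD, DELETE, STOP }

  /** An event reported by the agent: type, object id and a content hash. */
  static final class Event {
    final EventType type;
    final String id;
    final String hash;

    Event(EventType type, String id, String hash) {
      this.type = type;
      this.id = id;
      this.hash = hash;
    }
  }

  /** In-memory stand-in for the DeltaIndexingManager. */
  static final class DeltaIndex {
    private final Map<String, String> hashes = new HashMap<>();

    /** True if the object is new or its hash differs from the stored one. */
    boolean checkForUpdate(String id, String hash) {
      return !hash.equals(hashes.get(id));
    }

    void visit(String id, String hash) {
      hashes.put(id, hash);
    }

    void delete(String id) {
      hashes.remove(id);
    }
  }

  /** Processes events until STOP; the returned list stands in for the ConnectivityManager queue. */
  static List<String> run(List<Event> events, DeltaIndex deltaIndex) {
    List<String> queue = new ArrayList<>();
    // The real controller would call DeltaIndexingManager:init(...) here.
    for (Event e : events) {
      if (e.type == EventType.STOP) {
        break; // external stop command or fatal error
      }
      if (e.type == EventType.ADD) {
        if (deltaIndex.checkForUpdate(e.id, e.hash)) {
          queue.add("add:" + e.id);      // ConnectivityManager:add(...)
          deltaIndex.visit(e.id, e.hash); // DeltaIndexingManager:visit(...)
        } // unchanged object: no action
      } else if (e.type == EventType.DELETE) {
        queue.add("delete:" + e.id);     // ConnectivityManager:delete(...)
        deltaIndex.delete(e.id);         // DeltaIndexingManager:delete(...)
      }
    }
    // The real controller would call DeltaIndexingManager:finish(...) here.
    return queue;
  }
}
```

In this model an unchanged object that is reported twice is queued only once, and any event arriving after STOP is ignored.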
Configuration
There are no configuration options available for this bundle.
JMX interface
Javadoc: org.eclipse.smila.connectivity.framework.AgentControllerAgent
Here is a screenshot of the AgentController in the JMX Console:
HTTP ReST JSON interface
Since version 0.9 the AgentController can also be controlled via the SMILA ReST API. It provides the following endpoints:
endpoint | method | description |
---|---|---|
/smila/agents | GET | list data sources available for agents and the current agent state |
/smila/agents/<datasource-id> | GET | get statistics of current or last agent run, if one exists. |
/smila/agents/<datasource-id> | POST + JSON-Body | start agent |
/smila/agents/<datasource-id>/finish | POST | stop agent |
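The four endpoints in the table can be addressed from Java with the JDK's `java.net.http` API (Java 11 or later). The following is a minimal sketch that only builds the requests. The class name `AgentRequests` is hypothetical, the base URL is supplied by the caller, and sending a `Content-Type: application/json` header is an assumption, since this page does not state which headers the API expects.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds requests for the agent endpoints listed above. */
final class AgentRequests {
  private final String baseUrl;

  /** @param baseUrl server address without trailing slash, e.g. "http://localhost:8080" */
  AgentRequests(String baseUrl) {
    this.baseUrl = baseUrl;
  }

  /** GET /smila/agents/ : list data sources and agent states. */
  HttpRequest listAgents() {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/smila/agents/")).GET().build();
  }

  /** GET /smila/agents/<datasource-id>/ : statistics of the current or last run. */
  HttpRequest statistics(String dataSourceId) {
    return HttpRequest.newBuilder(agentUri(dataSourceId, "")).GET().build();
  }

  /** POST /smila/agents/<datasource-id>/ with a JSON body naming the destination job. */
  HttpRequest start(String dataSourceId, String jobName) {
    // Assumption: jobName needs no JSON escaping in this sketch.
    String body = "{\"jobName\": \"" + jobName + "\"}";
    return HttpRequest.newBuilder(agentUri(dataSourceId, ""))
        .header("Content-Type", "application/json") // assumption, see lead-in
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }

  /** POST /smila/agents/<datasource-id>/finish/ : stop the running agent. */
  HttpRequest finish(String dataSourceId) {
    return HttpRequest.newBuilder(agentUri(dataSourceId, "finish/"))
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();
  }

  /** Assumption: the datasource id contains only URL-safe characters. */
  private URI agentUri(String dataSourceId, String suffix) {
    return URI.create(baseUrl + "/smila/agents/" + dataSourceId + "/" + suffix);
  }
}
```

A request built this way could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, which returns the status code and the JSON body as a string.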
Agent Datasource Listing
The listing contains the available data sources that can be used by agents and the current agent state of each. State "Undefined" means that no agent run has been started for the datasource yet. The other possible states are:
- Initializing: The agent is starting
- Running: An agent is currently working on this datasource.
- Stopped: The agent was stopped by the user.
- Aborted: A fatal error occurred while working on the datasource.
If the state has one of these four values, it is possible to read statistics for the datasource by using the given URL. Example:
GET /smila/agents/
--> 200 OK
{
  "agents": [
    {
      "name": "feeds",
      "state": "Running",
      "url": "http://localhost:8080/smila/agents/feeds/"
    },
    {
      "name": "jobfile",
      "state": "Undefined",
      "url": "http://localhost:8080/smila/agents/jobfile/"
    }
  ]
}
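A client may want to read the state of each datasource from such a listing before deciding whether to start an agent. The sketch below extracts the name and state pairs with a regular expression to stay free of dependencies. It assumes that "name" is directly followed by "state" in each entry, as in the example above; the class `AgentListing` is hypothetical, and production code would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that reads datasource states from the listing response. */
final class AgentListing {
  // Matches: "name": "<name>", "state": "<state>" (field order as in the example response).
  private static final Pattern ENTRY = Pattern.compile(
      "\"name\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*\"state\"\\s*:\\s*\"([^\"]*)\"");

  /** Returns datasource name to agent state, in listing order. */
  static Map<String, String> statesByName(String json) {
    Map<String, String> states = new LinkedHashMap<>();
    Matcher m = ENTRY.matcher(json);
    while (m.find()) {
      states.put(m.group(1), m.group(2));
    }
    return states;
  }

  /** Per this page, a datasource can be started if it is not in state "Running". */
  static boolean canStart(String state) {
    return !"Running".equals(state);
  }
}
```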
Start an Agent
If a datasource is not in agent state "Running" it can be started using the URL given in the datasource listing. The request must contain a JSON body describing the destination job to submit records to. In case of success the response contains the internal import run ID.
POST /smila/agents/feeds/
{
  "jobName": "indexUpdateJob"
}
--> 200 OK
{
  "importRunId": 1231907158
}
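On the client side, the response to a start request could be interpreted along the following lines. This is a sketch only: the class `StartAgentResponse` and the choice of exception types are assumptions, and the import run id is extracted with a regular expression rather than a JSON parser to keep the example self-contained.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that interprets the response of a "start agent" request. */
final class StartAgentResponse {
  // Accepts the id as a JSON number or as a quoted string; hash values may be negative.
  private static final Pattern IMPORT_RUN_ID =
      Pattern.compile("\"importRunId\"\\s*:\\s*\"?(-?\\d+)\"?");

  /** Returns the import run id on 200 OK, otherwise throws with the server's message. */
  static String importRunId(int statusCode, String body) {
    if (statusCode == 200) {
      Matcher m = IMPORT_RUN_ID.matcher(body);
      if (m.find()) {
        return m.group(1);
      }
      throw new IllegalStateException("200 OK without importRunId: " + body);
    }
    if (statusCode == 400) {
      // Unknown datasource, missing or inactive job, or an agent is already running.
      throw new IllegalArgumentException("Agent could not be started: " + body);
    }
    throw new IllegalStateException("Unexpected status " + statusCode + ": " + body);
  }
}
```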
Other response codes:
- 400 Bad Request: datasource ID does not exist, destination job not given or not active, datasource is not an agent source, or an agent is already running for the datasource.
- 500 Internal Server Error: Other errors.
Get Agent Statistics
If an agent has run or is currently running on a datasource, you can read the performance counters using the datasource URL:
GET /smila/agents/feeds/
--> 200 OK
{
  "jobName": "indexUpdateJob",
  "attachmentBytesTransfered": 0,
  "attachmentTransferRate": 0,
  "averageAttachmentTransferRate": 0,
  "averageDeltaIndicesProcessingTime": 0,
  "averageRecordsProcessingTime": 0,
  "deltaIndices": 0,
  "errorBuffer": "[]",
  "exceptions": 0,
  "exceptionsCritical": 0,
  "importRunId": "1231907158",
  "overallAverageDeltaIndicesProcessingTime": 1990.95,
  "overallAverageRecordsProcessingTime": 1990.95,
  "records": 460,
  "startDate": "2011-09-06",
  "dataSourceId": "feeds",
  "state": "Running"
}
Other responses are:
- 400 Bad Request: Invalid datasource ID.
- 404 Not Found: No statistics available for the given datasource.
- 500 Internal Server Error: Other errors.
Stop an Agent
To stop a running agent, use the following HTTP request. The response body is empty; success is indicated by the response code 200 OK.
POST /smila/agents/feeds/finish/ --> 200 OK
Other responses are:
- 400 Bad Request: No agent is running for this datasource.
- 500 Internal Server Error: Other errors.