SMILA/Documentation/HowTo/How to access the REST API with the RestClient
SMILA provides an extensive REST API to control SMILA, check the status, import or search data, attach workers to its job control etc.
This HowTo describes how to utilize the included RestClient to access SMILA's REST API from within a Java application.
Preconditions
For the sake of simplicity, we assume that you have checked out the complete SMILA development environment, although checking out only the relevant bundles would be sufficient to access SMILA. The latter would be the case, for example, for an asynchronous SMILA worker running in a JRE different from SMILA's JRE, or for a testing application that accesses SMILA via the REST API.
- Set up your development environment, see How to set up the development environment.
- Have a look at the REST API Reference to get an overview of SMILA's REST API.
Basics
The following examples and code snippets all apply when you are running SMILA out of the box on localhost.
If you are running SMILA on a different host or with a different port (or an altered root context), please see non-default configuration on how to use the RestClient in these cases.
Interfaces and default implementations
The RestClient interface encapsulates the REST access to SMILA. It provides methods for GET, POST, PUT and DELETE calls to the REST API and represents data using SMILA's Any interface and attachments using the Attachments interface. The latter allows working with binary data in SMILA.
The package org.eclipse.smila.http.client.impl provides a default implementation for the RestClient named DefaultRestClient.
Another implementation, named FailoverRestClient, is provided in package org.eclipse.smila.http.client.impl.failover. It can be created with a list of several SMILA host addresses. Usually, it talks to the first of those hosts. If this node can no longer be reached (because SMILA has crashed or there is a network failure), the client retries the request on the next node until the request has been executed on one node or all nodes have been tried.
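The try-the-next-node behaviour described above can be illustrated with plain Java. The following is only a minimal sketch under stated assumptions: the class FailoverSketch and the interface HostCall are hypothetical names invented for this illustration, and the code is not taken from FailoverRestClient.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of a try-next-host loop; not SMILA's FailoverRestClient code. */
class FailoverSketch {

  /** A request that can be executed against one host (illustrative type, an assumption). */
  interface HostCall<T> {
    T execute(String host) throws IOException;
  }

  /** Tries the hosts in order and returns the result of the first host that answers. */
  static <T> T callWithFailover(List<String> hosts, HostCall<T> call) throws IOException {
    IOException lastFailure = null;
    for (String host : hosts) {
      try {
        return call.execute(host);
      } catch (IOException e) {
        // this node could not be reached, remember why and try the next one
        lastFailure = e;
      }
    }
    throw new IOException("No host could execute the request: " + hosts, lastFailure);
  }

  public static void main(String[] args) throws IOException {
    List<String> hosts = List.of("node1:8080", "node2:8080");
    String answer = callWithFailover(hosts, host -> {
      if (host.startsWith("node1")) {
        throw new IOException("simulated network failure");
      }
      return "handled by " + host;
    });
    System.out.println(answer);
  }
}
```

In this sketch only an IOException triggers the switch to the next host. Which errors the real client treats as "node not reachable" is not stated here, so check the FailoverRestClient source before relying on a particular behaviour.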
There are two helper classes providing the resources as described in REST API Reference:
- ResourceHelper for all resources beginning with /smila, except for those that are marked as deprecated in the REST API Reference.
- TaskManagerClientHelper to provide workers that are not directly driven by the WorkerManager with resources for task handling (internal TaskManager REST API, i.e. the resources beginning with /taskmanager).
Accessing SMILA
To access SMILA via its REST interface, instantiate a RestClient, for example:

    RestClient restClient = new DefaultRestClient();

The following code snippet creates a job definition, sends it to the JobManager and starts it if posting was successful:

    final RestClient restClient = new DefaultRestClient();
    final ResourceHelper resourceHelper = new ResourceHelper();
    final String jobName = "crawlCData";

    // create job description as an AnyMap
    final AnyMap jobDescription = DataFactory.DEFAULT.createAnyMap();
    jobDescription.put("name", jobName);
    jobDescription.put("workflow", "fileCrawling");
    final AnyMap parameters = DataFactory.DEFAULT.createAnyMap();
    parameters.put("tempStore", "temp");
    parameters.put("jobToPushTo", "importJob");
    parameters.put("dataSource", "file_data");
    parameters.put("rootFolder", "c:/data");
    jobDescription.put("parameters", parameters);

    // the resourcehelper provides us with the resource to the jobs API
    // we send the (AnyMap) job description in the POST body
    restClient.post(resourceHelper.getJobsResource(), jobDescription);

    // POST (here without a body) to start the Job,
    // the ResourceHelper provides the resource to the named job
    restClient.post(resourceHelper.getJobResource(jobName));

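The following snippet checks whether the job with the given name already has a current run. If there is none, a new run is started. If there is one that is not in state RUNNING, the example simply gives up with an exception. Finally, a record with an attachment is sent to the job.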

    final RestClient restClient = new DefaultRestClient();
    final ResourceHelper resourceHelper = new ResourceHelper();
    final String jobName = "indexUpdate";

    // check for a current run of this job
    final AnyMap currentJobRun = restClient.get(resourceHelper.getJobResource(jobName)).getMap("runs").getMap("current");
    if (currentJobRun != null && !currentJobRun.isEmpty()) {
      // a current run exists, so we don't need to start one but it may not be running.
      if (!"RUNNING".equalsIgnoreCase(currentJobRun.getStringValue("state"))) {
        // well it's just an example...
        throw new IllegalStateException("Job '" + jobName + "' is not running but has status '"
          + currentJobRun.getStringValue("state") + "'.");
      }
    } else {
      // no current job run, start another one.
      restClient.post(resourceHelper.getJobResource(jobName));
    }

    // create attachment with a file's content
    final File file = new File("c:/data/notice.html");
    final Attachments attachments = new AttachmentWrapper("file", file);

    // put some sample metadata
    final AnyMap metadata = DataFactory.DEFAULT.createAnyMap();
    metadata.put("_recordid", "1");
    metadata.put("fileName", file.getCanonicalPath());

    // now post metadata with an attachment from a file.
    // if we had a Record with attachments, we could POST that one...
    // note: we could add more than one attachment using the AttachmentWrapper.
    restClient.post(resourceHelper.getPushRecordToJobResource(jobName), metadata, attachments);

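If you want to unit test the "start or reuse the current run" decision from the snippet above without a running SMILA, the decision can be pulled out into a small pure function. The following sketch uses java.util.Map in place of AnyMap; the class JobRunCheckSketch and the enum Action are hypothetical names chosen for this illustration and are not part of SMILA.

```java
import java.util.Map;

/** Hypothetical helper isolating the run check; uses plain Maps instead of SMILA's AnyMap. */
class JobRunCheckSketch {

  enum Action { START_NEW_RUN, USE_CURRENT_RUN, FAIL }

  /** Decides what to do based on the "current" run description of a job (may be null). */
  static Action decide(Map<String, ?> currentRun) {
    if (currentRun == null || currentRun.isEmpty()) {
      // no current job run, so one has to be started
      return Action.START_NEW_RUN;
    }
    // a current run exists, but it may be in a state other than RUNNING
    String state = String.valueOf(currentRun.get("state"));
    return "RUNNING".equalsIgnoreCase(state) ? Action.USE_CURRENT_RUN : Action.FAIL;
  }

  public static void main(String[] args) {
    System.out.println(decide(null));
    System.out.println(decide(Map.of("state", "RUNNING")));
    System.out.println(decide(Map.of("state", "FINISHING")));
  }
}
```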
Issues with Authentication
The RestClient implementations throw an exception if the first request sent to an HTTP server that requires authentication passes its body as an InputStream, e.g.:

    final RestClient restClient = new DefaultRestClient();
    final ResourceHelper resourceHelper = new ResourceHelper();
    InputStream jsonStream = new FileInputStream("jobdefinition.json");
    Any jsonResult = restClient.invoke(HttpMethod.POST, resourceHelper.getJobsResource(), jsonStream, httpParams);

    --> org.apache.http.client.ClientProtocolException
      at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
      ...
    Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
      at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:693)
      ...

This happens because the HTTP client needs to repeat the request with credentials after receiving the authentication challenge for the initial request, and this is not possible because the argument stream cannot be reset. You can work around this by ensuring that the first request made with the RestClient instance does not take a stream argument, e.g. a simple GET. The RestClient then caches the authentication information for the site and does not need to repeat the request:

    final RestClient restClient = new DefaultRestClient();
    final ResourceHelper resourceHelper = new ResourceHelper();
    InputStream jsonStream = new FileInputStream("jobdefinition.json");
    restClient.get(resourceHelper.getJobManagerResource());
    Any jsonResult = restClient.invoke(HttpMethod.POST, resourceHelper.getJobsResource(), jsonStream, httpParams);

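The underlying reason why a stream body cannot simply be sent a second time can be reproduced without any HTTP code. The following sketch uses only the JDK (Java 9 or later for readAllBytes); the class RepeatableBodySketch is a hypothetical name, and "sending" is simulated by just counting the bytes that could be read for each attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Illustration only: shows that a consumed stream has nothing left for a retry. */
class RepeatableBodySketch {

  /** Simulates several send attempts with the same stream and reports the bytes available each time. */
  static int[] bytesSentPerAttempt(InputStream body, int attempts) throws IOException {
    int[] sent = new int[attempts];
    for (int i = 0; i < attempts; i++) {
      // every attempt reads whatever is still left in the stream
      sent[i] = body.readAllBytes().length;
    }
    return sent;
  }

  public static void main(String[] args) throws IOException {
    byte[] json = "{\"name\":\"crawlCData\"}".getBytes(StandardCharsets.UTF_8);
    int[] sent = bytesSentPerAttempt(new ByteArrayInputStream(json), 2);
    System.out.println("first attempt: " + sent[0] + " bytes");
    System.out.println("retry after challenge: " + sent[1] + " bytes");
  }
}
```

The second attempt finds the stream already at its end, which is the situation the HTTP client refuses to get into when it reports a non-repeatable request entity.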
Note that when the server uses DIGEST authentication, the client may regularly need to repeat requests because the cached authentication information will go stale when the so-called "nonce"-value expires. You can prevent this by setting the "MaxNonceAge" of the DigestAuthenticator to a very high value:

    <Set name="Authenticator">
      <New class="org.eclipse.jetty.security.authentication.DigestAuthenticator">
        <Set name="MaxNonceAge">9223372036854775807</Set> <!-- time in milliseconds, this is Long.MAX_VALUE -->
      </New>
    </Set>

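To get a feeling for what "a very high value" means here, the configured number can be converted into days and years with the JDK's Duration class. This is just a small arithmetic check; the class name NonceAgeSketch is made up for the illustration and a year is approximated as 365 days.

```java
import java.time.Duration;

/** Arithmetic check for the MaxNonceAge value used in the Jetty configuration above. */
class NonceAgeSketch {

  /** The value from the configuration snippet, in milliseconds. */
  static final long MAX_NONCE_AGE_MS = 9223372036854775807L;

  /** Whole days covered by the given number of milliseconds. */
  static long days(long millis) {
    return Duration.ofMillis(millis).toDays();
  }

  /** Rough number of years, assuming 365 days per year. */
  static long approxYears(long millis) {
    return days(millis) / 365;
  }

  public static void main(String[] args) {
    System.out.println("equals Long.MAX_VALUE: " + (MAX_NONCE_AGE_MS == Long.MAX_VALUE));
    System.out.println("days: " + days(MAX_NONCE_AGE_MS));
    System.out.println("approx. years: " + approxYears(MAX_NONCE_AGE_MS));
  }
}
```

In other words, with this setting the nonce practically never expires.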
Using Attachments with the RestClient
As seen above, the RestClient bundle provides an Attachments interface allowing attachments to be POSTed. An attachment consists of a string key and binary data that will be POSTed as application/octet-stream in a multi-part message.
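To make the "string key plus binary data" idea more concrete, the following sketch builds one binary part of a multi-part body by hand. It uses only the JDK and follows the generic multipart conventions (boundary line, part headers, blank line, data). The class MultipartSketch, the boundary value and the use of a form-data Content-Disposition header are assumptions made for this illustration; the exact part headers that SMILA's RestClient writes may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustration of a single binary part in a multi-part body; header layout is an assumption. */
class MultipartSketch {

  /** Builds one part: boundary line, headers naming the key, blank line, raw bytes, CRLF. */
  static byte[] binaryPart(String boundary, String key, byte[] data) {
    String header = "--" + boundary + "\r\n"
        + "Content-Disposition: form-data; name=\"" + key + "\"\r\n"
        + "Content-Type: application/octet-stream\r\n"
        + "\r\n";
    byte[] headerBytes = header.getBytes(StandardCharsets.US_ASCII);
    byte[] lineEnd = "\r\n".getBytes(StandardCharsets.US_ASCII);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(headerBytes, 0, headerBytes.length);
    out.write(data, 0, data.length); // binary data is written unchanged
    out.write(lineEnd, 0, lineEnd.length);
    return out.toByteArray();
  }

  /** The closing delimiter that ends the whole multi-part body. */
  static String closingDelimiter(String boundary) {
    return "--" + boundary + "--\r\n";
  }

  public static void main(String[] args) {
    byte[] part = binaryPart("example-boundary", "file", new byte[] {0x01, 0x02, 0x03});
    System.out.print(new String(part, StandardCharsets.ISO_8859_1));
    System.out.print(closingDelimiter("example-boundary"));
  }
}
```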
Handling attachments manually
You can use the AttachmentWrapper in order to add attachments from the following sources if you want to handle attachments manually:
- a byte[]
- a String
- a File
- an InputStream
There are convenience constructors that take a first attachment when constructing an AttachmentWrapper, but you can add more than one attachment and mix the types.
Example:

    final RestClient restClient = new DefaultRestClient();

    byte[] byteAttachment = new byte[1000];
    String stringAttachment = "string attachment";
    File fileAttachment = new File("c:/data/notice.html");
    InputStream inputStreamAttachment = new FileInputStream(fileAttachment);

    AttachmentWrapper attachments = new AttachmentWrapper("byte-data", byteAttachment);
    attachments.add("string-data", stringAttachment);
    attachments.add("file-data", fileAttachment);
    attachments.add("stream-data", inputStreamAttachment);

    restClient.post(resource, parameters, attachments);

Handling attachments with records
SMILA records can also include attachments, and since SMILA's target data units are records, it is natural that the RestClient also supports records (with attachments) directly.
That means the record's metadata is sent together with the record's attachments as parts of a multi-part message.
Example:

    final byte[] data1 = ...;
    final byte[] data2 = ...;
    record.setAttachment("data1", data1);
    record.setAttachment("data2", data2);

    // POST the record with the attachments
    restClient.post(resourceHelper.getPushRecordToJobResource(jobName), record);

Using the RestClient without the complete development environment
This section describes the steps to follow when using the RestClient from a Java application outside SMILA's JRE.
- Build or download the SMILA distribution.
- Create a new workspace.
- Create a Java project of your choice.
- Add the following JARs from your downloaded/built SMILA application to the Java Build Path of your new project (exact version numbers are omitted in this list and replaced with *, just use the latest version you'll find in your SMILA application):
  - from the plugins directory:
    - org.apache.commons.collections_*.jar
    - org.apache.commons.io_*.jar
    - org.apache.commons.lang_*.jar
    - org.apache.httpcomponents.httpclient_*.jar (>=4.1)
    - org.apache.httpcomponents.httpcore_*.jar (>=4.1)
    - org.apache.log4j_*.jar
    - org.codehaus.jackson.core_*.jar
    - org.eclipse.smila.datamodel_*.jar
    - org.eclipse.smila.http.client_*.jar
    - org.eclipse.smila.ipc_*.jar
    - org.eclipse.smila.utils_*.jar
  - from the plugins/org.apache.commons.logging_*/lib directory:
    - commons-logging-*.jar
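Because the version numbers in the list above are wildcards, it is easy to overlook a JAR. The following sketch checks a list of JAR file names against a list of required bundle names, treating everything between the underscore and ".jar" as the version. It is plain Java and independent of SMILA; the class BundleCheckSketch and the version numbers in main are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports required bundles for which no matching JAR was found. */
class BundleCheckSketch {

  /** A JAR matches a bundle if it is named bundleName_anything.jar. */
  static List<String> missingBundles(List<String> requiredBundleNames, List<String> jarFileNames) {
    List<String> missing = new ArrayList<>();
    for (String bundle : requiredBundleNames) {
      boolean found = jarFileNames.stream()
          .anyMatch(name -> name.startsWith(bundle + "_") && name.endsWith(".jar"));
      if (!found) {
        missing.add(bundle);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    List<String> required = List.of("org.eclipse.smila.http.client", "org.eclipse.smila.datamodel");
    // example file names with invented version numbers
    List<String> onBuildPath = List.of("org.eclipse.smila.http.client_0.0.0.jar");
    System.out.println("missing: " + missingBundles(required, onBuildPath));
  }
}
```

Note that the commons-logging JAR uses a hyphen instead of an underscore before the version, so it would need its own check.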
Now you have all means to access SMILA's REST API from another Java application.
For example, you could now write a simple program that starts the indexUpdate job and then creates and starts a crawl job:

    import java.io.IOException;
    // imports of the SMILA classes (RestClient, DefaultRestClient, ResourceHelper, AnyMap,
    // DataFactory, RestException) are omitted here; resolve them from the JARs listed above.

    public class CrawlMyData {

      public static void main(String[] args) {
        final RestClient restClient = new DefaultRestClient();
        final ResourceHelper resourceHelper = new ResourceHelper();
        final String jobName = "crawlCData";

        // create job description as an AnyMap
        final AnyMap jobDescription = DataFactory.DEFAULT.createAnyMap();
        jobDescription.put("name", jobName);
        jobDescription.put("workflow", "fileCrawling");
        final AnyMap parameters = DataFactory.DEFAULT.createAnyMap();
        parameters.put("tempStore", "temp");
        parameters.put("jobToPushTo", "indexUpdate");
        parameters.put("dataSource", "file_data");
        parameters.put("rootFolder", "c:/data");
        jobDescription.put("parameters", parameters);

        try {
          // start the referred job "indexUpdate" that indexes our sent data.
          // We should check if it is already running, etc.
          restClient.post(resourceHelper.getJobResource("indexUpdate"));
        } catch (RestException e) {
          // TODO Auto-generated catch block
          e.printStackTrace();
        } catch (IOException e) {
          // TODO Auto-generated catch block
          e.printStackTrace();
        }

        try {
          // create (or update) the job, we should check if it exists or is running, etc.
          restClient.post(resourceHelper.getJobsResource(), jobDescription);
          // POST with no body to start the job in default mode
          restClient.post(resourceHelper.getJobResource(jobName));
        } catch (RestException e) {
          // TODO Auto-generated catch block
          e.printStackTrace();
        } catch (IOException e) {
          // TODO Auto-generated catch block
          e.printStackTrace();
        }
      }
    }

Putting it together
Now start your SMILA application and when it's up, run the application from above and watch the jobs using your preferred REST client (e.g. browser plugin, see Interactive REST tools) at http://localhost:8080/smila/jobmanager/jobs/.
You should see:
- the newly created job "crawlCData",
- the job "indexUpdate" is RUNNING,
- the job "crawlCData" is FINISHING (or has already finished, depending on the amount of data in your crawled directory).
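If you prefer to check the job list from code instead of a browser plugin, a plain GET on the URL mentioned above is enough. The following sketch uses the JDK's own java.net.http client (Java 11 or later) and not the SMILA RestClient. The class JobsOverviewSketch is a made-up name, and requesting JSON through an Accept header is an assumption about what the server honours.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustration: fetch the job overview with the JDK HTTP client instead of a browser plugin. */
class JobsOverviewSketch {

  /** Builds a GET request for the jobs resource on the given host and port. */
  static HttpRequest jobsRequest(String hostAndPort) {
    return HttpRequest.newBuilder(URI.create("http://" + hostAndPort + "/smila/jobmanager/jobs/"))
        .header("Accept", "application/json") // assumption: the server answers with JSON
        .GET()
        .build();
  }

  public static void main(String[] args) throws InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    try {
      HttpResponse<String> response =
          client.send(jobsRequest("localhost:8080"), HttpResponse.BodyHandlers.ofString());
      System.out.println("HTTP status: " + response.statusCode());
      System.out.println(response.body());
    } catch (IOException e) {
      // e.g. SMILA is not started yet
      System.out.println("Could not reach SMILA: " + e);
    }
  }
}
```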
Wait a bit and you can search your crawled data at http://localhost:8080/SMILA/search.
Using non-default configuration
SMILA's RestClient and ResourceHelper have default constructors using the standard values for the SMILA application. These are:
- Host: localhost
- Port: 8080
- Root context: /smila
If your bundle runs under a different root context path, you have to create your ResourceHelper using the actual context path. Also, if your application runs on a different server and/or uses a different port, you will have to supply this information to the constructor of the DefaultRestClient (you can omit the leading http://).
E.g. the following code snippet:

    final RestClient restClient = new DefaultRestClient("host.domain.org:80");
    final ResourceHelper resourceHelper = new ResourceHelper("/context");

creates a RestClient and a ResourceHelper connecting to a SMILA instance running on http://host.domain.org:80/context.
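How the host string and the root context combine into the base URL can be sketched in a few lines of plain Java. The class BaseUrlSketch and its method are hypothetical and only mirror the rule stated above (the leading http:// is optional); they are not the code used inside DefaultRestClient or ResourceHelper.

```java
/** Hypothetical illustration of combining "host:port" and a root context into a base URL. */
class BaseUrlSketch {

  /** Adds the scheme if it was omitted and joins host and context with exactly one slash. */
  static String baseUrl(String hostAndPort, String rootContext) {
    String host = hostAndPort;
    if (!host.startsWith("http://") && !host.startsWith("https://")) {
      host = "http://" + host; // the leading http:// may be omitted by the caller
    }
    while (host.endsWith("/")) {
      host = host.substring(0, host.length() - 1);
    }
    String context = rootContext.startsWith("/") ? rootContext : "/" + rootContext;
    return host + context;
  }

  public static void main(String[] args) {
    System.out.println(baseUrl("localhost:8080", "/smila"));
    System.out.println(baseUrl("host.domain.org:80", "/context"));
  }
}
```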
You can also use your own connection manager or limit the number of total connections and max connections per host by using the respective constructors of DefaultRestClient.
Links
- Using the REST API describes the general usage of the REST API.