SMILA/Documentation/Solr 4.x
Configuration
All of the different ways to run Solr are supported, with SolrCloud providing the most features. A single class named SolrConfig is responsible for all of these modes. The solr-config.json file determines in which way SMILA connects to Solr. Here is an example showing the possible configuration parameters, followed by their explanations.
solr-config.json
{ "mode":"cloud", "idFields":{ "collection1":"id" }, "restUri":"http://localhost:8983/solr/", "ResponseParser.fetchFacetFieldType":"false", "ResponseParser.errors":"THROW", "CloudSolrServer.zkHost":"localhost:9983", "CloudSolrServer.updatesToLeaders":"true", "EmbeddedSolrServer.solrHome":"configuration\\org.eclipse.smila.solr\\solr_home", "HttpSolrServer.baseUrl":"http://localhost:8983/" }
- mode: can be cloud, embedded or http; for further details please see the SolrServerService section below.
- restUri: defines the URI to connect to solr. (for cloud and http only)
- ResponseParser.fetchFacetFieldType: can be either true or false and tells the ResponseParser whether to fetch the FacetFieldType or not.
- ResponseParser.errors: tells the ResponseParser what to do if an error occurs. Possible values: IGNORE, LOG, THROW.
- CloudSolrServer.zkHost: takes host and port of the ZooKeeper-server if solr runs in cloud mode with separate ZooKeeper-instance.
- CloudSolrServer.updatesToLeaders: takes a boolean controlling whether to send updates only to leaders or not. (defaults to true)
- EmbeddedSolrServer.solrHome: states the solr home folder relative to the working directory. (only for embedded)
- HttpSolrServer.baseUrl: takes the base URL for the http solr server.
Since Solr 4 the field name containing the id of a record can be configured for each collection separately. Therefore the map idFields takes collection names as keys and id field names as values.
Parameters
SolrParams
The SolrParams class extends the ParameterAccessor of the smila.processing bundle. Read: SMILA/Documentation/HowTo/How_to_write_a_Pipelet. Therefore it has the same modular behaviour concerning the source of the configuration. The configuration can be passed in one of three ways:
- Via Blackboard (global configuration from the pipeline-configuration)
- Via AnyMap
- Via Record (requires blackboard)
Getting the configuration from the blackboard is the standard case. This global configuration applies to all records alike. However, records can also carry their own configuration, which then overrides the global settings for that particular record. The record configuration does not need to be complete; it is sufficient to state only those parameters that differ for a record in the map _solr.importing.
If the configuration is passed via AnyMap, only this configuration can be used, because the record configurations are accessed via the blackboard.
The SolrParams class itself is extended by the following three classes (a short usage sketch follows after the method lists below):
- UpdateParams
The parameters accessible via UpdateParams are used for importing/indexing.
- public Boolean getAttachments(final boolean defaultIfNull)
- Returns whether the attachments should be processed or not. The boolean parameter states whether the default should be returned if the parameter is not configured.
- public String getServerName()
- Returns the name of the core or collection into which the record should be indexed.
- public Integer getCommitWithinMs(final boolean defaultIfNull)
- Returns the maximum time in milliseconds before the next commit happens. The boolean parameter states whether the default value should be returned if none is configured.
- public Float getDocumentBoost(final boolean defaultIfNull)
- Returns the boost factor for the document. The boolean parameter states whether the default value should be returned if none is configured.
- public AnyMap getMapping(final boolean defaultIfNull)
- Returns the mapping of the fields. The boolean parameter states whether the default value should be returned if none is configured.
- public Operation getOperation()
- Returns the Operation enum (mode). Can be: ADD, DELETE_BY_ID, DELETE_BY_QUERY, NONE.
- public String getQuery()
- Returns the Solr query.
- ResponseParams
There is only one parameter here:
- public Integer getQTime()
- Returns the time in milliseconds needed for the execution of the query.
- SearchParams
- public String getServerName()
- Returns the String representation of the Core or Collection of the SolrServer that should be used for searching.
- public METHOD getMethod()
- Returns the HTTP method to use. Can be POST or GET. Defaults to POST.
- public String getWorkflow()
- Returns the String representation of the workflow name.
- public ErrorHandling getErrorHandling()
- Returns how an error should be handled. Can be: IGNORE, LOG, THROW. Defaults to IGNORE.
- public QueryLogging getQueryLogging()
- Returns how the query should be logged. Can be: SMILA, SOLR, BOTH, NONE. Defaults to NONE.
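Below is a minimal usage sketch for the UpdateParams accessors listed above. Only the accessor signatures are taken from this page; the import paths, the location of the Operation enum and the way the UpdateParams instance is obtained (from the pipelet's parameter handling) are assumptions.

// Sketch only: it assumes an UpdateParams instance obtained from the pipelet's
// parameter handling (constructor/factory details are not shown on this page).
// The import paths below are assumptions and may differ in your SMILA version.
import org.eclipse.smila.solr.update.UpdateParams;           // assumed package
import org.eclipse.smila.solr.update.UpdateParams.Operation; // assumed location of the enum

public final class UpdateParamsUsageSketch {

  /** Reads the typical importing parameters, falling back to defaults where configured. */
  static void describe(final UpdateParams params) {
    final String serverName = params.getServerName();            // core or collection name
    final Boolean attachments = params.getAttachments(true);     // true: return the default if not set
    final Integer commitWithinMs = params.getCommitWithinMs(true);
    final Float documentBoost = params.getDocumentBoost(true);
    final Operation operation = params.getOperation();           // ADD, DELETE_BY_ID, DELETE_BY_QUERY, NONE
    System.out.println(serverName + " / " + operation + " / boost=" + documentBoost
        + " / commitWithin=" + commitWithinMs + "ms / attachments=" + attachments);
  }
}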
ImportingParameters
- DocumentBoost: Each document can have its own boost, defaulting to 1.0. The DocumentBoost is part of the importing parameters that are located as follows:
<?xml version="1.0" encoding="utf-8"?> <Record xmlns="http://www.eclipse.org/smila/record" version="2.0"> <Val key="_recordid">param</Val> <Val key="title">Parameters</Val> <Map key="_solr"> <Map key="importing"> <Val key="documentBoost" type="double">23.0</Val> </Map> </Map> </Record>
Result SolrInputDocument:
SolrInputDocument(fields: [_recordid=param, title=Parameters]) DocumentBoost:23.0
Other importing parameters can be set via Pipelet-Configuration.
- Mapping: The mapping can also be put via the importing parameters:
<Map key="_solr"> <Map key="importing"> <Map key="mapping"> [...] </Map> </Map> </Map>
- Commit Within Ms
<Map key="_solr"> <Map key="importing"> <Val key="commitWithinMS"> [...] </Val> </Map> </Map>
- Servername: Describing the solr core or collection name the record(s) should be stored to or read from.
<Map key="_solr"> <Map key="importing"> <Val key="serverName"> [...] </Val> </Map> </Map>
- Attachments: With this parameter one can set if attachments should be parsed from a record or ignored.
<Map key="_solr"> <Map key="importing"> <Val key="attachments"> [...] </Val> </Map> </Map>
- Operation: Can be one of the following: ADD, DELETE_BY_ID, DELETE_BY_QUERY, NONE
<Map key="_solr"> <Map key="importing"> <Val key="operation"> [...] </Val> </Map> </Map>
- Query: The Solr query.
<Map key="_solr"> <Map key="importing"> <Val key="query"> [...] </Val> </Map> </Map>
Search
SolrQueryBuilder
The SolrQueryBuilder can be used to build a Solr query programmatically. All search parameters that SMILA supports are available in the builder. Existing documentation can be found here: http://wiki.eclipse.org/SMILA/Documentation/Search
Further (Solr-specific) parameters have also been included using native parameters.
All methods return the SolrQueryBuilder instance itself, so chaining the methods makes it quite easy to construct a query (see the sketch below).
For faceting details, please look into the Solr documentation: https://cwiki.apache.org/confluence/display/solr/Faceting
For query details the Solr documentation can be helpful as well: https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing
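The individual SolrQueryBuilder methods are not reproduced on this page. Purely as an illustration of the chained style, here is how the underlying solrj SolrQuery can be built fluently (plain solrj, not the SMILA builder itself):

import org.apache.solr.client.solrj.SolrQuery;

public final class ChainedQuerySketch {
  public static void main(final String[] args) {
    // Plain solrj example of the fluent style; SMILA's SolrQueryBuilder offers a
    // comparable chained API on top of the SMILA record parameters described above.
    final SolrQuery query = new SolrQuery("Oldtimer")
        .setStart(0)                       // offset
        .setRows(10)                       // maxcount
        .setFields("_recordid", "score")   // resultAttributes
        .setFacet(true)
        .addFacetField("Extension");       // facet over the Extension attribute
    System.out.println(query);             // prints the encoded query parameters
  }
}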
ResponseParser
The ResponseParser takes a SolrResponse (which can either be a SolrResponseBase or a QueryResponse) and translates this data into a Record for further processing within smila.
The ResponseParser can be instantiated with either of two constructors:
- public ResponseParser(final String workflowName, final Record result, final String idField)
- public ResponseParser(final String workflowName, final Record result, final String idField, final ErrorHandling errors)
The constructors need the workflow name, a Record for storing the result and the name of the idField. The first constructor sets the ErrorHandling to ErrorHandling.IGNORE. The second one expects the ErrorHandling to be given as a parameter. Possible Values are: IGNORE, LOG, THROW.
The ResponseParser is able to parse results, facets, grouping, terms, spellcheck, more like this, cursor mark, debug and stats from the SolrResponse.
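A minimal construction sketch, assuming the import paths shown below (only the constructor signature itself is taken from this page):

// Sketch: only the constructor signature is taken from this page; the import
// paths and the location of the ErrorHandling enum are assumptions.
import org.eclipse.smila.datamodel.Record;            // assumed package
import org.eclipse.smila.solr.search.ResponseParser;  // assumed package

public final class ResponseParserSketch {

  /** Creates a parser for the "SearchPipeline" workflow that throws on Solr errors. */
  static ResponseParser createParser(final Record result) {
    // "id" matches the idFields entry configured for collection1 in solr-config.json above.
    return new ResponseParser("SearchPipeline", result, "id", ResponseParser.ErrorHandling.THROW);
  }
}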
(TODO: document what the parser can parse and what the result looks like in the SMILA structure)
SolrServerService
The SolrServerService is implemented as an OSGi service that uses the default solr-config.json to determine in which mode Solr is running. Within its activate method, CloudServers, EmbeddedServers or RemoteServers are distinguished and stored as a reference.
Using the getConfig() method returns the configuration from the solr-config.json.
Using the getServer(String name) method will return the SolrServer with the given name; the name in this context is the name of the core or the collection, depending on the Solr mode.
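A minimal usage sketch, assuming the service reference is injected via OSGi and that getServer returns a solrj SolrServer instance (the method names are the ones described above; the import path of the service is an assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.eclipse.smila.solr.server.SolrServerService; // assumed package

public final class SolrServerServiceSketch {

  /** Runs a simple query against the core/collection named "collection1". */
  static long countHits(final SolrServerService service) throws SolrServerException {
    final SolrServer server = service.getServer("collection1");
    final QueryResponse response = server.query(new SolrQuery("Oldtimer"));
    return response.getResults().getNumFound();
  }
}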
SolrServers
SolrServers is the super-class of EmbeddedServers, CloudServers and RemoteServers. It is used for creating and caching Solr servers. The public method getServer(String name) returns the SolrServer with the given name or throws a SolrServerException if no server with the given name is found. This method also includes a caching mechanism that directly returns the SolrServer if it was already created earlier. Servers can also be removed from the cache using the removeServer(String name) method. Finally, the whole cache can be cleared using the clearCache() method. As with the SolrServerService itself, the SolrConfig can also be retrieved using the getConfig() method of SolrServers. The abstract method createServer(String name) is implemented in the three classes stated earlier, which are described as follows:
CloudServers
Using Solr as a cloud is likely going to be the most used Solr mode of all. The SolrServer for SolrCloud is created using the createServer(String name) method. This method is used by SolrServers when the method getServer(String name) is called and the Solr mode is cloud. Take a look at the Configuration section to see which fields are required in solr-config.json.
EmbeddedServers
Using Solr embedded in SMILA is intended for testing and development purposes only and NOT for production. Embedded Solr will not be as high-performance as a stand-alone setup (cloud, remote) and will not have all the features Solr offers. As with CloudServers, the createServer(String name) method is called by the getServer(String name) method of SolrServers. No special configuration is needed, because no connection to external processes is made. Stating a solr home folder in solr-config.json is nevertheless mandatory. EmbeddedServers has a CoreContainer that holds all the SolrServers.
RemoteServers
Using Solr in remote mode is more or less legacy, but it is still the easiest way to set up Solr if no cloud (clustering) is needed. As with CloudServers, the createServer(String name) method is called from the getServer(String name) method of SolrServers. Take a look at the Configuration section to see which fields are required in solr-config.json to make Solr work in remote mode with SMILA.
SolrSearchPipelet
The SolrSearchPipelet uses the same configuration-fallback-logic that is described in Parameters. As with all new Pipelets the SolrSearchPipelet can be configured to either drop or fail if one record induces an error. If _dropOnError is not stated it defaults to false.
Sample Configuration:
<extensionActivity> <proc:invokePipelet name="SolrSearchPipelet"> <proc:pipelet class="org.eclipse.smila.solr.search.SolrSearchPipelet" /> <proc:variables input="request" output="request" /> <proc:configuration> <rec:Map key="search"> <rec:Val key="serverName">collection1</rec:Val> <rec:Val key="method">GET</rec:Val> <rec:Val key="workflow">SearchPipeline</rec:Val> </rec:Map> </proc:configuration> </proc:invokePipelet> </extensionActivity>
The value of the parameter serverName states the core or collection in which the search shall be executed. The value of the parameter method states whether the request to the Solr server(s) should be sent via HTTP GET or HTTP POST. The value of the parameter workflow states the name of the workflow in which the SolrSearchPipelet is being executed.
How to search
The default Solr search handler accepting search records is located at: http://localhost:8080/SMILA/recordsearch
Basic search
The simplest query available could look like this:
<Record> <Val key="query">Oldtimer </Val> </Record>
Solr query: start=0&rows=10&q=Oldtimer
This will search for "Oldtimer" in the solr-field defined as "df"-parameter in solrconfig.xml . The string within the query parameter is passed "as-is" to solr. This gives the possibility to use solr/lucene search-syntax without any additional parameters. Take a look at the fielded search for an example. A result could look like this (truncated):
<SearchResult xmlns="http://www.eclipse.org/smila/search"> <Workflow>SearchPipeline</Workflow> <Record version="2.0" xmlns="http://www.eclipse.org/smila/record"> <Val key="query">Oldtimer</Val> <Val key="_recordid">SearchPipeline-a046ed63-4637-4eab-8b18-c3f0a917e056</Val> <Seq key="records"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Oldtimer-Katalog_Tuev_Sued.pdf</Val> <Val key="id">E:/TestData/dataECE/Auto/Oldtimer-Katalog_Tuev_Sued.pdf</Val> <Val key="_source">file</Val> <Seq key="Filename"><Seq><Val>Oldtimer-Katalog_Tuev_Sued.pdf</Val></Seq></Seq> <Val key="MimeType">application/pdf</Val> <Val key="Size">125433</Val> <Val key="LastModifiedDate" type="datetime">2009-10-29T11:31:16.000+0100</Val> <Seq key="Content"><Seq><Val/></Seq></Seq> <Val key="Extension">pdf</Val> <Val key="_version_" type="long">1483104970887659520</Val> </Map> [...] <Map key="_solr"> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">0</Val> <Map key="params"> <Val key="start">0</Val> <Val key="q">Oldtimer</Val> <Val key="wt">javabin</Val> <Val key="version">2</Val> <Val key="rows">10</Val> </Map> </Map> </Map> </Map> <Val key="count" type="long">11</Val> <Val key="runtime" type="long">31</Val> </Record> </SearchResult>
In the first part of the result, the parameters of the given query are returned, making it easier to debug in case of an unexpected result. Within the records sequence there are maps representing the result records, with all the fields available for each record. The map "_solr" (after [...]) contains the Solr responseHeader and the params (including default params) that were used to create the response. QTime is the time Solr itself needed to calculate the result (transport times between SMILA and Solr are not included), and status "0" means the request was processed successfully by Solr. The last two parameters represent the number of records found in total (count) and the number of milliseconds SMILA and Solr (including transport time between SMILA and Solr) took to create the result (runtime).
Fielded search
It is also possible to search on specified fields. Two ways to do this are available when using Solr with SMILA:
<Record> <Val key="query">Filename:WWF</Val> </Record>
OR
<Record> <Map key="query"> <Val key="Filename">WWF</Val> </Map> </Record>
Solr query: start=0&rows=10&q=Filename%3AWWF
Now, only if the term "WWF" occurs in the field "Filename" are the corresponding records added to the result list:
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Filename:WWF</Val> <Val key="_recordid">SearchPipeline-75c92acb-77e1-4ad9-b00e-b9b90f5d20ef</Val> <Seq key="records"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\WWF\WWF-Stromkonzept.pdf</Val> <Val key="id">E:/TestData/dataECE/WWF/WWF-Stromkonzept.pdf</Val> <Val key="_source">file</Val> <Seq key="Filename"> <Seq> <Val>WWF-Stromkonzept.pdf</Val> </Seq> </Seq> <Val key="MimeType">application/pdf</Val> <Val key="Size">67626</Val> <Val key="LastModifiedDate" type="datetime">2009-10-29T10:13:16.000+0100</Val> <Seq key="Content">[...]</Seq> <Val key="Extension">pdf</Val> <Val key="_version_" type="long">1483105029430706176</Val> </Map> </Seq> <Map key="_solr"> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">1</Val> <Map key="params"> <Val key="start">0</Val> <Val key="q">Filename:WWF</Val> <Val key="wt">javabin</Val> <Val key="version">2</Val> <Val key="rows">10</Val> </Map> </Map> </Map> </Map> <Val key="count" type="long">1</Val> <Val key="runtime" type="long">11</Val> </Record> </SearchResult>
It is also possible to pass multiple terms to one field, e.g.:
<Record> <Map key="query"> <Seq key="Filename"> <Val>Eclipse</Val> <Val>Process</Val> <Val>Framework</Val> </Seq> </Map> </Record>
This will be translated to Filename:(Eclipse Process Framework). By default, the operator for these terms is OR. Setting <solrQueryParser defaultOperator="AND"/> in schema.xml is still possible but deprecated. The better way to change the operator is to add q.op to the _solr.native map, as shown in the next chapter "Native params".
Native params
Query strings can also be passed directly to Solr using the _solr.native map:
<Record> <Map key="_solr"> <Map key="native"> <Val key="q">Oldtimer</Val> </Map> </Map> </Record>
The result of this query is equivalent to the result of the first example. Solr query: start=0&rows=10&q=Oldtimer
Result attributes
To reduce the amount of data travelling over the wire and the amount of computing power needed to produce the result of a query, the result attributes (fields to return) can be filtered:
<Record> <Val key="query">Filename:Presseinformation</Val> <Seq key="resultAttributes"> <Val>Content</Val> </Seq> </Record>
Solr query: start=0&rows=10&fl=Content&q=Filename%3APresseinformation
This will only return the content of the field "Content". As shown, the field being searched on does not have to be among the result attributes for the query to be valid and successful. The result could look like this (truncated):
<SearchResult xmlns="http://www.eclipse.org/smila/search"> <Workflow>SearchPipeline</Workflow> <Record version="2.0" xmlns="http://www.eclipse.org/smila/record"> <Val key="query">Filename:Presseinformation</Val> <Seq key="resultAttributes"> <Val>Content</Val> </Seq> <Val key="_recordid">SearchPipeline-89faff78-6b3c-465d-b44e-3e22bd87e107</Val> <Seq key="records"> <Map> <Seq key="Content"> <Seq> <Val>Oldtimer-Rallye „2000 km durch Deutschland“ im Jahr 2008: [...] </Val> </Seq> </Seq> </Map> </Seq> <Map key="_solr"> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">2</Val> <Map key="params"> <Val key="fl">Content</Val> <Val key="start">0</Val> <Val key="q">Filename:Presseinformation</Val> <Val key="wt">javabin</Val> <Val key="version">2</Val> <Val key="rows">10</Val> </Map> </Map> </Map> </Map> <Val key="count" type="long">1</Val> <Val key="runtime" type="long">21</Val> </Record> </SearchResult>
Scoring
By default, scoring is disabled. Of course, Solr does its scoring internally (to determine the order of the results), but it does not return the calculated values, and therefore disabling scoring by default saves a lot of computing power. Nevertheless, scoring can easily be activated by stating the keyword "score" in the result attributes sequence.
<Record> <Val key="query">Oldtimer</Val> <Seq key="resultAttributes"> <Val>score</Val> </Seq> </Record>
Solr query: start=0&rows=10&fl=score&q=Oldtimer
This query record will return something like this:
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Oldtimer</Val> <Seq key="resultAttributes"> <Val>score</Val> </Seq> <Val key="_recordid">SearchPipeline-6f551b16-a013-489c-867f-f24da3c01d6f</Val> <Seq key="records"> <Map> <Val key="score" type="double">1.5369712114334106</Val> </Map> <Map> <Val key="score" type="double">0.6777405738830566</Val> [...] </Map> </Seq> <Map key="_solr"> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">0</Val> <Map key="params"> <Val key="fl">score</Val> <Val key="start">0</Val> <Val key="q">Oldtimer</Val> <Val key="wt">javabin</Val> <Val key="version">2</Val> <Val key="rows">10</Val> </Map> </Map> <Val key="maxScore" type="double">1.5369712114334106</Val> </Map> </Map> <Val key="count" type="long">11</Val> <Val key="runtime" type="long">17</Val> </Record> </SearchResult>
As shown, only the score values are returned. If the score is needed in addition to the other fields, the resultAttributes sequence of the search record shown above needs the value * added (or just the fields that are needed, e.g. _recordid, score). The response map within the _solr map now also shows the maxScore, representing the maximum score over all the records found (not only those returned in this very request).
Highlighting
Retrieving a highlighted response is as easy as possible. Only the name of the field that should be used for highlighting has to be stated in the highlight sequence.
<Record> <Val key="query">Oldtimer</Val> <Seq key="highlight"> <Val>text</Val> </Seq> </Record>
Solr query: start=0&rows=10&fl=id&q=Oldtimer&hl.fl=text&hl=true
This query record returns a solr result like this:
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Oldtimer</Val> <Seq key="highlight"> <Val>text</Val> </Seq> <Val key="_recordid">SearchPipeline-0017a1ab-6c8f-4f23-b57b-75396c78143f</Val> <Seq key="records"> <Map> <Val key="id">E:/TestData/dataECE/Auto/Oldtimer-Katalog_Tuev_Sued.pdf</Val> <Map key="_highlight"> <Map key="text"> <Val key="text">Oldtimer-Katalog_Tuev_Sued.pdf</Val> </Map> </Map> </Map> [...] </Seq> <Map key="_solr"> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">486</Val> <Map key="params"> <Val key="fl">id</Val> <Val key="start">0</Val> <Val key="q">Oldtimer</Val> <Val key="hl.fl">text</Val> <Val key="wt">javabin</Val> <Val key="hl">true</Val> <Val key="version">2</Val> <Val key="rows">10</Val> </Map> </Map> </Map> </Map> <Val key="count" type="long">11</Val> <Val key="runtime" type="long">505</Val> </Record> </SearchResult>
As can be seen, only the id and the highlighted text field are returned for each result record. If more fields are needed, it is necessary to state those in the resultAttributes sequence. A list of all supported highlighting-related parameters can be found in the Solr documentation: https://cwiki.apache.org/confluence/display/solr/Highlighting Here is a little example for getting two snippets per result record instead of the default (1):
<Record> <Val key="query">Oldtimer</Val> <Seq key="highlight"> <Val>text</Val> </Seq> <Map key="_solr"> <Map key="native"> <Val key="hl.snippets">2</Val> </Map> </Map> </Record>
Solr query: start=0&rows=10&fl=id&q=Oldtimer&hl.fl=text&hl=true&hl.snippets=2
maxcount, offset
The maxcount parameter defaults to 10 and limits the number of result records returned per search request. The offset parameter defines the number of results that should be "skipped" in order to get the further results. These parameters can easily be changed:
<Record> <Val key="query">Oldtimer</Val> <Val key="maxcount" type="long">20</Val> <Val key="offset" type="long">20</Val> </Record>
Solr query: start=20&rows=20&q=Oldtimer
The preceding query record will retrieve 20 results (if there are that many), starting with the 21st record. This is, so to say, the 2nd page.
Faceting
All details on faceting with Solr can be found in the Solr wiki: https://cwiki.apache.org/confluence/display/solr/Faceting Facets are specified by the map facetby. Different types of faceting are available; the first one is faceting over an attribute. Facets are designed to arrange the search results in categories based on terms found in the index.
Attribute
To use an attribute for faceting, the following record with its facetby map is necessary.
<Record> <Val key="query">Oldtimer</Val> <Val key="maxcount">1</Val> <Map key="facetby"> <Val key="attribute">Extension</Val> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> </Seq> </Record>
Solr query: start=0&rows=1&fl=_recordid&q=Oldtimer&facet.field=Extension&facet=true
Again, resultAttributes and maxcount 1 are only stated to make the following search result more readable by restricting it to the values important for this example.
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Oldtimer</Val> <Val key="maxcount">1</Val> <Map key="facetby"> <Val key="attribute">Extension</Val> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> </Seq> <Val key="_recordid">SearchPipeline-dadce3ba-d7f0-4e26-b383-d062f38e620c</Val> <Seq key="records"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Oldtimer-Katalog_Tuev_Sued.pdf</Val> </Map> </Seq> [...] <Val key="count" type="long">11</Val> <Map key="facets"> <Seq key="Extension"> <Map> <Val key="value">doc</Val> <Val key="count" type="long">9</Val> </Map> <Map> <Val key="value">pdf</Val> <Val key="count" type="long">1</Val> </Map> <Map> <Val key="value">ppt</Val> <Val key="count" type="long">1</Val> </Map> <Map> <Val key="value">dwg</Val> <Val key="count" type="long">0</Val> </Map> <Map> <Val key="value">dxf</Val> <Val key="count" type="long">0</Val> </Map> <Map> <Val key="value">htm</Val> <Val key="count" type="long">0</Val> </Map> <Map> <Val key="value">txt</Val> <Val key="count" type="long">0</Val> </Map> </Seq> <Seq key="queries"/> </Map> <Val key="runtime" type="long">2321</Val> </Record> </SearchResult>
The facets map contains all "categories" and the number of occurrences within this search.
Range
Faceting over a range is typically done with date fields.
<Record> <Val key="query">Oldtimer</Val> <Map key="facetby"> <Val key="range">Date</Val> <Val key="start" type="DateTime">2009-10-20T10:50:44.000+0100</Val> <Val key="end" type="DateTime">2009-10-22T11:50:44.000+0100</Val> <Val key="gap">+1DAY</Val> </Map> </Record>
Solr query: facet=true&fl=_recordid&f.last_modified.facet.range.gap=%2B1DAY&start=0&f.last_modified.facet.range.end=2009-10-22T10:50:44.000Z&q=Oldtimer&facet.range=last_modified&wt=javabin&version=2&f.last_modified.facet.range.start=2009-10-20T09:50:44.000Z&rows=10
The value of the range parameter sets the field upon which the faceting should be calculated. If the range field is a date, the DateMathParser can NOT be used for describing the range. Further details on range facets can be found in the Solr wiki: https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-RangeFaceting The returned result might look like this:
<SearchResult xmlns="http://www.eclipse.org/smila/search"> <Workflow>SearchPipeline</Workflow> <Record version="2.0" xmlns="http://www.eclipse.org/smila/record"> <Val key="query">Oldtimer</Val> <Map key="facetby"> <Val key="range">last_modified</Val> <Val key="start" type="datetime">2009-10-20T10:50:44.000+0100</Val> <Val key="end" type="datetime">2009-10-22T11:50:44.000+0100</Val> <Val key="gap">+1DAY</Val> </Map> <Val key="_recordid">SearchPipeline-10ea2bbe-8913-4276-a7ba-2de0a12653f0</Val> <Seq key="records"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Ausnahme für Oldtimer.doc</Val> </Map> [...] </Seq> <Map key="_solr"> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">0</Val> <Map key="params"> <Val key="facet">true</Val> <Val key="fl">_recordid</Val> <Val key="f.last_modified.facet.range.gap">+1DAY</Val> <Val key="start">0</Val> <Val key="f.last_modified.facet.range.end">2009-10-22T10:50:44.000Z</Val> <Val key="q">Oldtimer</Val> <Val key="facet.range">last_modified</Val> <Val key="wt">javabin</Val> <Val key="version">2</Val> <Val key="f.last_modified.facet.range.start">2009-10-20T09:50:44.000Z</Val> <Val key="rows">10</Val> </Map> </Map> </Map> </Map> <Val key="count" type="long">10</Val> <Map key="facets"> <Seq key="last_modified"> <Map> <Val key="value">2009-10-20T09:50:44Z</Val> <Val key="count" type="long">0</Val> </Map> <Map> <Val key="value">2009-10-21T09:50:44Z</Val> <Val key="count" type="long">0</Val> </Map> <Map> <Val key="value">2009-10-22T09:50:44Z</Val> <Val key="count" type="long">0</Val> </Map> </Seq> <Seq key="queries"/> </Map> <Val key="runtime" type="long">1943</Val> </Record> </SearchResult>
Query (not yet complete)
Using a query as a parameter for calculating a facet can be done like this:
<Record> <Val key="query">Oldtimer</Val> <Map key="facetby"> <Val key="query">Date</Val> </Map> </Record>
Pivot (not yet implemented)
Interval (query working, answer from Solr not yet converted)
Fields that should be used for interval faceting need the "docValues" parameter enabled.
<Record> <Val key="query">*</Val> <Map key="facetby"> <Val key="interval">Size</Val> <Seq key="set"> <Val>(0,1000)</Val> <Val>[1000,10000)</Val> <Val>[10000,1000000)</Val> <Val>[1000000,*)</Val> </Seq> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> <Val>Size</Val> </Seq> </Record>
Solr query: facet=true&fl=_recordid,Size&indent=true&start=0&facet.interval=Size&q=*:*&q=*&_=1418047058873&f.Size.facet.interval.set=(0,1000)&f.Size.facet.interval.set=[1000,10000)&f.Size.facet.interval.set=[10000,1000000)&f.Size.facet.interval.set=[1000000,*)
Filtering on Facets
Here is an example for filtering on a facet. The filter can be defined directly within the facetby map, saving the extra definition for the attribute or range. The filtering on the result attributes is just to clean up the result record below.
<Record> <Val key="query">*</Val> <Map key="facetby"> <Val key="attribute">Extension</Val> <Seq key="filter"> <Map> <Val key="oneOf">dwg</Val> </Map> </Seq> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> <Val>Extension</Val> </Seq> </Record>
Solr query: facet=true&fl=_recordid,Extension&start=0&q=*&facet.field=Extension&fq=+(Extension:dwg)&rows=10
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">*</Val> <Map key="facetby"> <Val key="attribute">Extension</Val> <Seq key="filter"> <Map> <Val key="oneOf">dwg</Val> <Val key="tag">tag_Extension</Val> </Map> </Seq> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> <Val>Extension</Val> </Seq> <Val key="_recordid">SearchPipeline-2120380d-d138-4900-ac42-461849640099</Val> <Seq key="records"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\CAD\Antoinette.dwg</Val> <Val key="Extension">dwg</Val> </Map> [...] </Seq> <Map key="_solr"> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">0</Val> <Map key="params"> <Val key="facet">true</Val> <Val key="fl">_recordid,Extension</Val> <Val key="start">0</Val> <Val key="q">*</Val> <Val key="facet.field">Extension</Val> <Val key="wt">javabin</Val> <Val key="fq"> (Extension:dwg)</Val> <Val key="version">2</Val> <Val key="rows">10</Val> </Map> </Map> </Map> </Map> <Val key="count" type="long">11</Val> <Map key="facets"> <Seq key="Extension"> <Map> <Val key="value">dwg</Val> <Val key="count" type="long">11</Val> </Map> <Map> <Val key="value">doc</Val> <Val key="count" type="long">0</Val> </Map> [...] </Seq> <Seq key="queries"/> </Map> <Val key="runtime" type="long">14</Val> </Record> </SearchResult>
Grouping
Solr features grouping by using either a field, a function or a query. In this example, the field Extension is used to group the results. Grouping by function or by query looks similar.
<Record> <Val key="query">Oldtimer</Val> <Map key="groupby"> <Val key="attribute">Extension</Val> </Map> <Map key="_solr"> <Map key="native"> <Val key="group">true</Val> </Map> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> </Seq> </Record>
Solr query: start=0&rows=10&fl=_recordid&q=Oldtimer&group.field=Extension&group=true
It is important that grouping itself is enabled using the _solr.native maps as shown above. Not enabling "group" will cause the groupby map to be ignored by solr. The resultAttributes are only stated to keep the following search result at a reasonable length and thus to focus on the grouping feature.
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Oldtimer</Val> <Map key="groupby"> <Val key="attribute">Extension</Val> <Val key="order">ascending</Val> </Map> <Map key="_solr"> <Map key="native"> <Val key="group">true</Val> </Map> <Map key="response"> <Val key="status" type="long">0</Val> <Val key="qTime" type="long">0</Val> </Map> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> </Seq> <Val key="_recordid">SearchPipeline-1d927085-da0b-4957-a685-21faaa0d4271</Val> <Seq key="records"/> <Map key="groups"> <Seq key="matches"> <Val type="long">11</Val> </Seq> <Seq key="Extension"> <Map> <Val key="value">pdf</Val> <Val key="count" type="long">1</Val> <Seq key="results"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Oldtimer-Katalog_Tuev_Sued.pdf</Val> </Map> </Seq> </Map> <Map> <Val key="value">doc</Val> <Val key="count" type="long">9</Val> <Seq key="results"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Ausnahme für Oldtimer.doc</Val> </Map> </Seq> </Map> [...] <Val key="runtime" type="long">2132</Val> </Record> </SearchResult>
Besides grouping via attribute, grouping via function and via query are also possible. These two parameters can be passed as SMILA parameters like the attribute in the example above. All available grouping parameters can be found in the Solr documentation: https://cwiki.apache.org/confluence/display/solr/Result+Grouping
More Like This
The "More Like This" function of Solr can be used to find similar documents for a given query. Details can be found in the Solr documentation: https://cwiki.apache.org/confluence/display/solr/MoreLikeThis More Like This is activated by adding the mlt parameter in the _solr.native map and also stating a field that should be used to calculate the similarity of documents.
<Record> <Val key="query">id:"E\:/TestData/dataECE/Auto/Oldtimer Rallye Strecke und Fragen.doc"</Val> <Map key="_solr"> <Map key="native"> <Val key="mlt">true</Val> <Val key="mlt.fl">text</Val> </Map> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> <Val>score</Val> </Seq> </Record>
Solr query: start=0&rows=10&fl=_recordid%2Cscore&q=id%3A%22E%5C%3A%2FTestData%2FdataECE%2FAuto%2FOldtimer+Rallye+Strecke+und+Fragen.doc%22&mlt=true&mlt.fl=text
Stating a value of the uniqueKey field id will result in searching for "more like this" with only one record. Stating a search query that returns multiple hits will cause Solr to calculate "more like this" for each result record independently. Again, the resultAttributes parameter is only used to keep this example at a reasonable length.
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">id:"E\:/TestData/dataECE/Auto/Oldtimer Rallye Strecke und Fragen.doc"</Val> <Map key="_solr"> <Map key="native"> <Val key="mlt">true</Val> <Val key="mlt.fl">text</Val> </Map> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">0</Val> <Map key="params"> <Val key="mlt.fl">text</Val> <Val key="fl">_recordid,score</Val> <Val key="start">0</Val> <Val key="q">id:"E\:/TestData/dataECE/Auto/Oldtimer Rallye Strecke und Fragen.doc"</Val> <Val key="mlt">true</Val> <Val key="wt">javabin</Val> <Val key="version">2</Val> <Val key="rows">10</Val> </Map> </Map> <Val key="maxScore" type="double">5.8903489112854</Val> <Map key="moreLikeThis"> <Map key="E:/TestData/dataECE/Auto/Oldtimer Rallye Strecke und Fragen.doc"> <Val key="numFound" type="long">57</Val> <Val key="start" type="long">0</Val> <Val key="maxScore" type="double">0.8097046613693237</Val> <Seq key="related"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Klimaschutz\HANSA_Meer_Klima_Seerecht_1993.doc</Val> <Val key="score" type="double">0.8097046613693237</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\2000 km Presseinformation 06 08.doc</Val> <Val key="score" type="double">0.7938450574874878</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Klimaschutz\Klimagefaehrdung.doc</Val> <Val key="score" type="double">0.7846448421478271</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Klimaschutz\emission_beschaeftigungswirkung.doc</Val> <Val key="score" type="double">0.6580735445022583</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Klimaschutz\medienmitteilung_20070322.doc</Val> <Val key="score" type="double">0.6203610301017761</Val> </Map> </Seq> </Map> </Map> </Map> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> <Val>score</Val> </Seq> <Val key="_recordid">SearchPipeline-fa881498-887e-4459-9583-799972cd5c21</Val> <Seq key="records"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Oldtimer Rallye Strecke und Fragen.doc</Val> <Val key="score" type="double">5.8903489112854</Val> </Map> </Seq> <Val key="count" type="long">1</Val> <Val key="runtime" type="long">19</Val> </Record> </SearchResult>
The default number of "more like this" results for each result record is 5. This and other parameters available for solr's more like this feature can be found at the solr wiki: https://cwiki.apache.org/confluence/display/solr/MoreLikeThis Sorting By default sorting is done by the score in descending order. But the field used for sorting can be altered using the sortby map. This map needs to parameters to set the sorting on a different field. First is attribute which contains the name of the field to sort by and second is the order that specifies the sorting "direction". Both parameters are mandatory and will result in an exception if not set.
<Record> <Val key="query">Oldtimer</Val> <Map key="sortby"> <Val key="attribute">Extension</Val> <Val key="order">ascending</Val> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> </Seq> </Record>
Solr query: start=0&rows=10&fl=_recordid&q=Oldtimer&sort=Extension+asc
Again, the resultAttributes are only set to keep the focus on sorting and not be distracted by all the other fields that could be returned.
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Oldtimer</Val> <Map key="sortby"> <Val key="attribute">Extension</Val> <Val key="order">ascending</Val> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> </Seq> <Val key="_recordid">SearchPipeline-aadeb884-3dde-4f09-ab8b-d63e1e502ea0</Val> <Seq key="records"> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Oldtimer Rallye Strecke und Fragen.doc</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\oldtimer-katalog_tuev_sueddeutschland.doc</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\OldtimerKlassiker.doc</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\oldtimer_versicherung.doc</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\pressetext_havelland_classic_2009.doc</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Reservierung_Chauffeur_PK1-2.doc</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\10_06_2005 Oldtimer-Nachtrallye kommt nach Halle 20_07_.doc</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\2000 km Presseinformation 06 08.doc</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Ausnahme für Oldtimer.doc</Val> </Map> <Map> <Val key="_recordid">file:E:\TestData\dataECE\Auto\Oldtimer-Katalog_Tuev_Sued.pdf</Val> </Map> </Seq> <Map key="_solr"> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">16</Val> <Map key="params"> <Val key="sort">Extension asc</Val> <Val key="fl">_recordid</Val> <Val key="start">0</Val> <Val key="q">Oldtimer</Val> <Val key="wt">javabin</Val> <Val key="version">2</Val> <Val key="rows">10</Val> </Map> </Map> </Map> </Map> <Val key="count" type="long">11</Val> <Val key="runtime" type="long">2535</Val> </Record> </SearchResult>
If the sortby parameters weren't set, the last item would actually be the first one because of its highest score.
Debug
Being able to "see" what Solr is doing in the background is a new feature of the Solr 4 integration. This feature is designed for debugging purposes only and should not be used in production scenarios because of the high processing load. Enabling the debug feature is done like this:
<Record> <Val key="query">Oldtimer</Val> <Val key="maxcount" type="long">1</Val> <Map key="_solr"> <Map key="native"> <Val key="debug">true</Val> </Map> </Map> </Record>
Solr query: start=0&rows=1&q=Oldtimer&debug=true
Setting the debug parameter to true will return all debug information, but it is also possible to return only parts of the debug information. Please refer to the Solr wiki to examine the other options: https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThedebugParameter
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Oldtimer</Val> <Val key="maxcount" type="long">1</Val> <Seq key="highlight"> <Val>text</Val> </Seq> <Map key="_solr"> <Map key="native"> <Val key="debug">true</Val> </Map> <Map key="response"> <Val key="status" type="long">0</Val> <Val key="qTime" type="long">0</Val> <Map key="debug"> <Val key="rawquerystring">Oldtimer</Val> <Val key="querystring">Oldtimer</Val> <Val key="parsedquery">text:oldtimer</Val> <Val key="parsedquery_toString">text:oldtimer</Val> <Map key="explain"> <Val key="E:/TestData/dataECE/Auto/Oldtimer-Katalog_Tuev_Sued.pdf">1.5706561 = (MATCH) weight(text:oldtimer in 2) [DefaultSimilarity], result of: 1.5706561 = score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.99999994 = queryWeight, product of: 4.1884165 = idf(docFreq=11, maxDocs=291) 0.2387537 = queryNorm 1.5706562 = fieldWeight in 2, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 4.1884165 = idf(docFreq=11, maxDocs=291) 0.375 = fieldNorm(doc=2)</Val> </Map> <Val key="QParser">LuceneQParser</Val> <Map key="timing"> <Val key="time" type="double">0.0</Val> <Map key="prepare"> <Val key="time" type="double">0.0</Val> <Map key="query"> <Val key="time" type="double">0.0</Val> </Map> <Map key="facet"> <Val key="time" type="double">0.0</Val> </Map> <Map key="mlt"> <Val key="time" type="double">0.0</Val> </Map> <Map key="highlight"> <Val key="time" type="double">0.0</Val> </Map> <Map key="stats"> <Val key="time" type="double">0.0</Val> </Map> <Map key="debug"> <Val key="time" type="double">0.0</Val> </Map> </Map> <Map key="process"> <Val key="time" type="double">0.0</Val> <Map key="query"> <Val key="time" type="double">0.0</Val> </Map> <Map key="facet"> <Val key="time" type="double">0.0</Val> </Map> <Map key="mlt"> <Val key="time" type="double">0.0</Val> </Map> <Map key="highlight"> <Val key="time" type="double">0.0</Val> </Map> <Map key="stats"> <Val key="time" type="double">0.0</Val> </Map> <Map key="debug"> <Val key="time" type="double">0.0</Val> </Map> </Map> </Map> </Map> </Map> </Map> <Val key="_recordid">SearchPipeline-718f9044-bac3-4cb1-8994-b490dac0bd52</Val> <Seq key="records"> <Map> <Val key="id">E:/TestData/dataECE/Auto/Oldtimer-Katalog_Tuev_Sued.pdf</Val> </Map> </Seq> <Val key="count" type="long">11</Val> <Val key="runtime" type="long">23</Val> </Record> </SearchResult>
Echo Params
For debugging purposes it can also be helpful to see which parameters were used to create the result. This also includes parameters that are not explicitly stated in the query record, but are default values or configured in solrconfig.xml. Setting echoParams to none is the same as not stating the parameter at all. Setting echoParams to explicit means that the parameters defined in the query are returned when debug information is returned. Setting echoParams to all will cause Solr to always return the parameters involved in calculating the result. Example:
<Record> <Val key="query">Olktimer</Val> <Map key="_solr"> <Map key="native"> <Val key="qt">/spell</Val> <Val key="spellcheck">true</Val> <Val key="echoParams">all</Val> </Map> </Map> </Record>
Solr query: start=0&rows=10&q=Olktimer&qt=%2Fspell&spellcheck=true&echoParams=all
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Olktimer</Val> <Map key="_solr"> <Map key="native"> <Val key="qt">/spell</Val> <Val key="spellcheck">true</Val> <Val key="echoParams">all</Val> </Map> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">0</Val> <Map key="params"> <Val key="spellcheck">true</Val> <Val key="df">text</Val> <Val key="spellcheck.maxResultsForSuggest">5</Val> <Val key="spellcheck.collateExtendedResults">true</Val> <Val key="spellcheck.extendedResults">false</Val> <Val key="spellcheck.maxCollations">5</Val> <Val key="spellcheck.maxCollationTries">10</Val> <Seq key="spellcheck.dictionary"> <Val>default</Val> <Val>wordbreak</Val> </Seq> <Val key="spellcheck.count">10</Val> <Val key="spellcheck.collate">true</Val> <Val key="spellcheck.alternativeTermCount">5</Val> <Val key="echoParams">all</Val> <Val key="start">0</Val> <Val key="q">Olktimer</Val> <Val key="wt">javabin</Val> <Val key="qt">/spell</Val> <Val key="version">2</Val> <Val key="rows">10</Val> </Map> </Map> <Map key="spellcheck"> <Map key="suggestions"> <Map key="olktimer"> <Val key="numFound" type="long">2</Val> <Val key="startOffset" type="long">0</Val> <Val key="endOffset" type="long">8</Val> <Val key="origFreq" type="long">0</Val> <Seq key="suggestion"> <Seq> <Val>oldtimer</Val> <Val>oldtimern</Val> </Seq> </Seq> </Map> </Map> <Seq key="collations"/> </Map> </Map> </Map> <Val key="_recordid">SearchPipeline-6723e9bd-20fd-450e-bf43-8ee2acdff594</Val> <Seq key="records"/> <Val key="count" type="long">0</Val> <Val key="runtime" type="long">24</Val> </Record> </SearchResult>
Spellcheck
The spellcheck feature is also known as "did you mean". It tries to "correct" the given query string by looking at the index and calculating the Levenshtein distance to find similar terms. In the following example, the word "Oldtimer" was intentionally misspelled as "Olktimer".
<Record> <Val key="query">Olktimer</Val> <Map key="_solr"> <Map key="native"> <Val key="qt">/spell</Val> <Val key="spellcheck">true</Val> <Val key="spellcheck.extendedResults">false</Val> </Map> </Map> </Record>
Solr query: start=0&rows=10&q=Olktimer&qt=%2Fspell&spellcheck=true&spellcheck.extendedResults=false
Looking at the results, Solr shows the expected correction "oldtimer" and also "oldtimern", both of which have a small Levenshtein distance to the misspelled term (and are therefore valid suggestions in terms of spellcheck). Solr search result:
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Olktimer</Val> <Map key="_solr"> <Map key="native"> <Val key="qt">/spell</Val> <Val key="spellcheck">true</Val> <Val key="spellcheck.extendedResults">false</Val> </Map> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">47</Val> </Map> <Map key="spellcheck"> <Map key="suggestions"> <Map key="olktimer"> <Val key="numFound" type="long">2</Val> <Val key="startOffset" type="long">0</Val> <Val key="endOffset" type="long">8</Val> <Val key="origFreq" type="long">0</Val> <Seq key="suggestion"> <Seq> <Val>oldtimer</Val> <Val>oldtimern</Val> </Seq> </Seq> </Map> </Map> <Seq key="collations"/> </Map> </Map> </Map> <Val key="_recordid">SearchPipeline-cf8075ec-d549-44a5-97a8-a823691f231e</Val> <Seq key="records"/> <Val key="count" type="long">0</Val> <Val key="runtime" type="long">65</Val> </Record> </SearchResult>
Besides the "standard"-result as shown above, extended results can be achieved using the following search record:
<Record> <Val key="query">Olktimer</Val> <Map key="_solr"> <Map key="native"> <Val key="qt">/spell</Val> <Val key="spellcheck">true</Val> <Val key="spellcheck.extendedResults">true</Val> </Map> </Map> </Record>
Solr query: start=0&rows=10&q=Olktimer&qt=%2Fspell&spellcheck=true&spellcheck.extendedResults=true
As can be seen, the suggestions are now within a map, and each suggestion comes with the frequency found within the index.
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Olktimer</Val> <Map key="_solr"> <Map key="native"> <Val key="qt">/spell</Val> <Val key="spellcheck">true</Val> <Val key="spellcheck.extendedResults">true</Val> </Map> <Map key="response"> <Map key="responseHeader"> <Val key="status" type="long">0</Val> <Val key="QTime" type="long">16</Val> </Map> <Map key="spellcheck"> <Map key="suggestions"> <Map key="olktimer"> <Val key="numFound" type="long">2</Val> <Val key="startOffset" type="long">0</Val> <Val key="endOffset" type="long">8</Val> <Val key="origFreq" type="long">0</Val> <Seq key="suggestion"> <Map> <Val key="word">oldtimer</Val> <Val key="freq" type="long">11</Val> </Map> <Map> <Val key="word">oldtimern</Val> <Val key="freq" type="long">4</Val> </Map> </Seq> </Map> </Map> <Seq key="collations"/> </Map> </Map> </Map> <Val key="_recordid">SearchPipeline-b0c2fc68-3264-4c25-9ad4-291c34587c8e</Val> <Seq key="records"/> <Val key="count" type="long">0</Val> <Val key="runtime" type="long">29</Val> </Record> </SearchResult>
Please refer to the Solr documentation on spellchecking for more information on setup and configuration: https://cwiki.apache.org/confluence/display/solr/Spell+Checking
Stats
The Solr stats component returns statistics for numeric, string, and date fields of the search result. Within the _solr.native map, the property stats has to be true and the stats.field parameter must state a field upon which the stats should be calculated.
<Record> <Val key="query">Oldtimer</Val> <Map key="_solr"> <Map key="native"> <Val key="stats">true</Val> <Val key="stats.field">Size</Val> </Map> </Map> <Seq key="resultAttributes"> <Val>_recordid</Val> </Seq> </Record>
Solr query: start=0&rows=10&fl=_recordid&q=Oldtimer&stats=true&stats.field=Size
<SearchResult> <Workflow>SearchPipeline</Workflow> <Record version="2.0"> <Val key="query">Oldtimer</Val> <Map key="_solr"> <Map key="native"> <Val key="stats">true</Val> <Val key="stats.field">Size</Val> </Map> <Map key="response"> <Val key="status" type="long">0</Val> <Val key="qTime" type="long">0</Val> <Map key="stats"> <Map key="Size"> <Val key="min" type="double">20992.0</Val> <Val key="max" type="double">3461120.0</Val> <Val key="count" type="long">11</Val> <Val key="missing" type="long">0</Val> <Val key="sum" type="double">4209145.0</Val> <Val key="mean" type="double">382649.54545454547</Val> <Val key="stddev" type="double">1022562.5018264032</Val> </Map> </Map> </Map> [...] </Record> </SearchResult>
Further details on the stats feature can be found in the Solr wiki at: https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
Logging
There are several options for logging the query that was entered. The SolrSearchPipelet allows deciding whether the SMILA query, the Solr query, both, or neither should be logged. By default, no logging of the query is activated. Here is an example for logging both the SMILA-style and the Solr-style query:
<Record> <Val key="query">Oldtimer</Val> <Map key="_solr"> <Map key="search"> <Val key="serverName">collection1</Val> <Val key="queryLogging">BOTH</Val> </Map> </Map> </Record>
Note: adding the search map to the _solr map overwrites the defaults of that map, so at least the serverName has to be stated in order for the SolrSearchPipelet to know which index to search in.
Solr Administration Handler
The SMILA Solr Administration Handler consists of two parts. The first one uses the collections API of solr and has to be used if solr is running as cloud. The second one uses the cores API of solr and has to be used if solr is running stand-alone or embedded. Both are implemented as REST handlers.
Most of the calls need parameters like ?name=collection1
Using just the base URL with HTTP GET method will return all possible calls.
The actual commands have to be sent as HTTP POST.
The exemplary parameters are only the mandatory ones, if not stated otherwise. For full information, please look into the Solr wiki links. A minimal sketch of such a call follows below.
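Below is a minimal plain-Java sketch of such a call, using the CREATE command of the Collections API as an example. It assumes the default SMILA host and port used on this page; the format of the returned response body is not covered here.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public final class CollectionsAdminSketch {

  public static void main(final String[] args) throws IOException {
    // CREATE command of the Collections API with its mandatory parameter (see the table below).
    final URL url = new URL("http://localhost:8080/solr/admin/collections/CREATE/?name=newCollection");
    final HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("POST"); // the actual commands have to be sent as HTTP POST
    System.out.println("HTTP " + connection.getResponseCode());
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line); // the response format depends on the handler
      }
    }
    connection.disconnect();
  }
}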
Collections API
The base URL is: http://localhost:8080/solr/admin/collections/COMMAND/.
COMMAND | Description | Parameters | Solr Wiki Link |
---|---|---|---|
CREATE | Creates a new collection. | ?name=newCollection | [1] |
DELETE | Deletes an existing collection. | ?name=collectionName | [2] |
RELOAD | Reloads a collection if you have changed a configuration in ZooKeeper. | ?name=collectionName | [3] |
CREATEALIAS | Creates a new alias pointing to one or more collections. | ?name=aliasName&collections=collection1,collection2 | [4] |
DELETEALIAS | Deletes an alias. | ?name=aliasName | [5] |
SPLITSHARD | Splits a shard into two pieces. | ?collection=collectionName&shard=shardName | [6] |
DELETESHARD | Deletes an inactive shard. | ?collection=collectionName&shard=shardName | [7] |
CREATESHARD | Creates a new shard. | ?collection=collectionName&shard=shardName | [8] |
DELETEREPLICA | Deletes a replica from a given collection and shard. | ?collection=collectionName&shard=shardName&replica=replicaName | [9] |
MIGRATE | Migrates all documents from one collection into another. | ?collection=collectionName&target.collection=targetName&split.key=split | [10] |
ADDROLE | Assigns a role to a node in the cluster. | ?role=roleName&node=nodeName | [11] |
REMOVEROLE | Removes an assigned role from a node. | ?role=roleName&node=nodeName | [12] |
CLUSTERPROP | Adds, edits or deletes properties of a cluster. | ?name=propertyName&val=propertyValue | [13] |
REQUESTSTATUS | Requests the status of an already submitted Collections API Call. | ?requestid=1000 | [14] |
ADDREPLICA | Adds a replica to a shard in a collection. | ?collection=collectionName | [15] |
OVERSEERSTATUS | Returns the status of the overseer. | none | [16] |
LIST | Returns a list of all collections in the cluster. | none | [17] |
CLUSTERSTATUS | Returns the status of a cluster. | (all optional) ?collection=collectionName&shard=shardName | [18] |
CoreAdmin API
The base URL is: http://localhost:8080/solr/admin/cores/COMMAND
COMMAND | Description | Parameters | Solr Wiki Link |
---|---|---|---|
STATUS | Returns the status of one/all core(s). | (all optional) ?core=coreName&indexInfo=true | [19] |
UNLOAD | Removes the given core from solr. | ?core=name | [20] |
RELOAD | Reloads the configuration of an existing solr core. | ?core=name | [21] |
CREATE | Creates a new core and loads it into solr. | ?name=coreName&instanceDir=dir/to/solr/core | [22] |
SWAP | Swaps the names of two Solr cores. | ?core=core1&other=core2 | [23] |
RENAME | Changes the name of a solr core. | ?core=coreName&other=newName | [24] |
MERGEINDEXES | Merges the content of an index into another index. | ?core=coreName | [25] |
SPLIT | Splits an index into two or more indexes. | ?core=coreName | [26] |
REQUESTSTATUS | Returns the status of an already submitted CoreAdmin API call. | ?requestid=1000 | [27] |
Update
SolrDocumentConverter
The SolrDocumentConverter takes a Record or a List of Records, optionally together with a mapping provided as an AnyMap, and converts it into a SolrInputDocument for further processing with solrj.
Simple document conversion
This is perhaps the simplest example of a conversion with the SolrDocumentConverter.
Using the following method:
- public SolrInputDocument toSolrDocument(final Record record)
Record:
<?xml version='1.0' encoding='utf-8'?> <Record xmlns="http://www.eclipse.org/smila/record" version="2.0"> <Val key="_recordid">simple</Val> <Val key="title">simple-title</Val> </Record>
Converted SolrInputDocument:
SolrInputDocument(fields: [_recordid=simple, title=simple-title])
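A minimal usage sketch of this conversion; the toSolrDocument(Record) signature is taken from the list above, while the no-argument constructor and the import paths are assumptions:

import org.apache.solr.common.SolrInputDocument;
import org.eclipse.smila.datamodel.Record;                  // assumed package
import org.eclipse.smila.solr.update.SolrDocumentConverter; // assumed package

public final class SimpleConversionSketch {

  /** Converts a SMILA record (like the one above) into a SolrInputDocument. */
  static SolrInputDocument convert(final Record record) {
    final SolrDocumentConverter converter = new SolrDocumentConverter(); // assumed no-arg constructor
    return converter.toSolrDocument(record);
  }
}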
Note: no attachments are converted (of course, the record shown has no attachments, but even if it had, none would be converted). Please look into the next section if you would like to also convert attachments.
Attachment: if the values of attachments should also be converted, the following method must be used, setting the attachments boolean parameter to true:
- public SolrInputDocument toSolrDocument(final Record record, final boolean attachments)
Record:
<?xml version='1.0' encoding='utf-8'?> <Record xmlns="http://www.eclipse.org/smila/record" version="2.0"> <Val key="_recordid">recordid</Val> <Val key="title">Record-with-attachment</Val> <Attachment>Content</Attachment> </Record>
Converted SolrInputDocument:
SolrInputDocument(fields: [_recordid=recordid, title=Record-with-attachment, Content=If ever there is tomorrow when we're not together.. there is something you must always remember. you are braver than you believe, stronger than you seem, and smarter than you think. but the most important thing is, even if we're apart.. i'll always be with you.])
Mapping
The SolrDocumentConverter has the ability to map record-field-names to SolrDocument-field-names using an AnyMap as mapping-map. One can use the following methods with mapping:
- public SolrInputDocument toSolrDocument(final Record record, final AnyMap mapping)
- public SolrInputDocument toSolrDocument(final Record record, final AnyMap mapping, final List<Record> children)
Sample Record (xml):
<?xml version='1.0' encoding='utf-8'?> <Record xmlns="http://www.eclipse.org/smila/record" version="2.0"> <Val key="_recordid">mapping</Val> <Val key="title">mapping-title</Val> <Val key="description">mapping-description</Val> <Val key="other">other-content</Val> <Val key="toAttachment">attachment-value</Val> <Attachment>Content</Attachment> </Record>
Sample Mapping (json):
{ "title" : "Title", "description" : "", "other" : [ "title2", "title3" ], "Content" : { "fieldName" : "Attachment", "fieldBoost" : 23.0, "type" : "ATTACHMENT" }
}
Sample SolrInputDocument-output:
SolrInputDocument(fields: [_recordid=mapping, Title=mapping-title, description=mapping-description, title2=other-content, title3=other-content, Attachment(23.0)=All your base are belong to us!])
As you can see three different kinds of mappings are possible:
- Simple key-value, meaning "source-field-name" : "target-field-name"
- Mapping one source to many targets: "source-field-name": [ "target-field-name1", "target-field-name2" ... ]
- Mapping one source to one target field while adjusting fieldBoost:
"source-field-name": { "fieldName":"target-field-name", "fieldBoost":<FLOAT>, "type": "ATTACHMENT" or "ATTRIBUTE" }
Note: type defaults to ATTRIBUTE, so it only has to be stated if the source field is an ATTACHMENT and can be left out if it is an ATTRIBUTE.
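The mapping can also be built programmatically. The following sketch assumes the usual SMILA DataFactory conveniences for creating AnyMap and AnySeq values; only the nested keys fieldName, fieldBoost and type are taken from the sample above.
// fragment: additionally import org.eclipse.smila.datamodel.AnyMap and AnySeq
final DataFactory factory = DataFactory.DEFAULT;
final AnyMap mapping = factory.createAnyMap();
mapping.put("title", "Title");                        // simple key-value mapping
mapping.put("description", "");                       // empty value: keep the field name
final AnySeq targets = factory.createAnySeq();        // one source, several targets
targets.add(factory.createStringValue("title2"));
targets.add(factory.createStringValue("title3"));
mapping.put("other", targets);
final AnyMap contentMapping = factory.createAnyMap(); // attachment mapping with field boost
contentMapping.put("fieldName", "Attachment");
contentMapping.put("fieldBoost", 23.0);               // numeric convenience overload assumed
contentMapping.put("type", "ATTACHMENT");
mapping.put("Content", contentMapping);
final SolrInputDocument document = converter.toSolrDocument(record, mapping);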
Children
A new feature of Solr 4 is the possibility to express direct relationships between documents. To use this feature, the documents have to be processed together in one call, preserving their parent-children relationship.
Use these methods:
- public SolrInputDocument toSolrDocument(final Record record, final List<Record> children)
- public SolrInputDocument toSolrDocument(final Record record, final boolean attachments, final List<Record> children)
- public SolrInputDocument toSolrDocument(final Record record, final AnyMap mapping, final List<Record> children)
As you can see, children can be combined with mapping and attachments. Note that the mapping is used - if available - for both the parent and the children. The same is true for the attachments.
Example for the combination of attachments, children and mapping:
Parent:
<?xml version='1.0' encoding='utf-8'?> <Record xmlns="http://www.eclipse.org/smila/record" version="2.0"> <Val key="_recordid">parent</Val> <Val key="title">Parent</Val> <Attachment>Content</Attachment> </Record>
Children:
<?xml version='1.0' encoding='utf-8'?> <Record xmlns="http://www.eclipse.org/smila/record" version="2.0"> <Val key="_recordid">child1</Val> <Val key="title">Children1</Val> <Val key="desc">text</Val> <Attachment>Content</Attachment> </Record>
<?xml version='1.0' encoding='utf-8'?> <Record xmlns="http://www.eclipse.org/smila/record" version="2.0"> <Val key="_recordid">child2</Val> <Val key="title">Children2</Val> <Attachment>Content</Attachment> </Record>
<?xml version='1.0' encoding='utf-8'?> <Record xmlns="http://www.eclipse.org/smila/record" version="2.0"> <Val key="_recordid">child3</Val> <Val key="title">Children3</Val> <Attachment>Content</Attachment> </Record>
Mapping:
{ "title" : "Title", "description" : "", "other" : [ "title2", "title3" ], "Content" : { "fieldName" : "Attachment", "fieldBoost" : 23.0, "type" : "ATTACHMENT" } }
Result SolrInputDocument:
SolrInputDocument(fields: [_recordid=parent, Title=Parent, Attachment(23.0)=parentsContent], children: [SolrInputDocument(fields: [_recordid=child1, Title=Children1, Attachment(23.0)=child1sContent]), SolrInputDocument(fields: [_recordid=child2, Title=Children2, Attachment(23.0)=child2sContent]), SolrInputDocument(fields: [_recordid=child3, Title=Children3, Attachment(23.0)=child3sContent])])
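A possible Java fragment for this case; only the toSolrDocument(..., children) signatures are documented above, the records themselves are assumed to be built as in the earlier sketches.
// fragment: additionally import java.util.Arrays and java.util.List
final List<Record> children = Arrays.asList(child1, child2, child3);
// attachments = true: attachments of the parent and of all children are converted as well
final SolrInputDocument document = converter.toSolrDocument(parent, true, children);
// alternatively, with a mapping that is applied to parent and children alike:
// final SolrInputDocument mapped = converter.toSolrDocument(parent, mapping, children);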
Document Boost
Documents that should generally be rated higher than others can be given a boost at document level. To do so, a map with the key "_solr" has to be added to the metadata map of the record. This "_solr" map must contain another map with the key "importing", which in turn holds the key "documentBoost" whose value is the boost value. For example:
Record:
<?xml version='1.0' encoding='utf-8'?> <Record xmlns="http://www.eclipse.org/smila/record" version="2.0"> <Val key="_recordid">param</Val> <Val key="title">Parameters</Val> <Map key="_solr"> <Map key="importing"> <Val key="documentBoost" type="double">23.0</Val> </Map> </Map> </Record>
Result:
SolrInputDocument(fields: [_recordid=param, title=Parameters]) DocumentBoost:23.0
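In Java, the nested maps could be created as follows (a sketch under the same DataFactory assumptions as above):
// fragment: builds the _solr/importing/documentBoost structure in the record metadata
final AnyMap importing = DataFactory.DEFAULT.createAnyMap();
importing.put("documentBoost", 23.0);                 // numeric convenience overload assumed
final AnyMap solrMap = DataFactory.DEFAULT.createAnyMap();
solrMap.put("importing", importing);
record.getMetadata().put("_solr", solrMap);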
SolrUpdatePipelet
The SolrUpdatePipelet uses the same configuration-fallback-logic that is described in Parameters.
As with all newer Pipelets, the SolrUpdatePipelet can be configured to either drop the offending record or fail if a record causes an error. If _dropOnError is not stated, it defaults to false.
The Pipelet also has the ability to process several records at once, reducing Pipelet instantiations and thereby increasing processing speed. Just set the configuration parameter processAsBunch to true in the Pipelet configuration, as in the example below.
<extensionActivity> <proc:invokePipelet name="SolrUpdatePipelet"> <proc:pipelet class="org.eclipse.smila.solr.update.SolrUpdatePipelet" /> <proc:variables input="request" output="request" /> <proc:configuration> <rec:Map key="update"> <rec:Val key="indexName">collection1</rec:Val> <rec:Val key="operation">ADD</rec:Val> <rec:Val key="commitWithinMs">60000</rec:Val> <rec:Val key="processAsBunch">true</rec:Val> <rec:Map key="mapping"> <rec:Val key="_source"></rec:Val> <rec:Val key="Path"></rec:Val> <rec:Val key="Url"></rec:Val> <rec:Val key="Filename"></rec:Val> <rec:Val key="MimeType"></rec:Val> <rec:Val key="Size"></rec:Val> <rec:Val key="LastModifiedDate"></rec:Val> <rec:Val key="Content"></rec:Val> <rec:Val key="Extension"></rec:Val> <rec:Val key="Title"></rec:Val> <rec:Val key="Author"></rec:Val> </rec:Map> </rec:Map> </proc:configuration> </proc:invokePipelet> </extensionActivity>
- indexName: the name of the core or collection of the Solr server into which the records should be indexed.
- operation: whether the records should be added, deleted or updated; possible values are ADD, DELETE and UPDATE.
- commitWithinMs: the maximum time in milliseconds within which added documents are committed.
- processAsBunch: if true, the Pipelet sends all records of a bunch to the Solr server in one request. Defaults to false.
- mapping: assigns record fields to Solr fields. The keys of the map name the record fields, the values name the Solr fields the content should be assigned to. If no value is given (as in the example), record field and Solr field have the same name. Only fields stated in the map will be indexed.
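For example, to index the record field Content into a Solr field named text instead (a hypothetical target field name), the corresponding entry of the mapping above would become:
<rec:Val key="Content">text</rec:Val>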