Get tokens from text analysis (Generally available)
All methods and paths for this operation:

GET /_analyze
POST /_analyze
GET /{index}/_analyze
POST /{index}/_analyze
The analyze API performs analysis on a text string and returns the resulting tokens.
Generating an excessive number of tokens may cause a node to run out of memory. The index.analyze.max_token_count setting enables you to limit the number of tokens that can be produced; if more tokens than this limit are generated, an error occurs. The _analyze endpoint without a specified index always uses 10,000 as its limit.
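As a sketch of adjusting that limit, the snippet below creates an index with a higher index.analyze.max_token_count using the Python client. The index name my-index and the value 20000 are illustrative, not from this page.

from elasticsearch import Elasticsearch

# Connect to a local cluster (adjust the URL and authentication for your deployment).
client = Elasticsearch("http://localhost:9200")

# Create a hypothetical index whose _analyze endpoint may emit up to
# 20,000 tokens per request instead of the default 10,000.
client.indices.create(
    index="my-index",
    settings={"index.analyze.max_token_count": 20000},
)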
Required authorization
- Index privileges: index
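As a sketch of granting that privilege, the role below is created with the security API via the Python client; the role name analyze_caller and the index name my-index are hypothetical.

# "client" is an Elasticsearch client instance (see the first sketch above).
# A hypothetical role whose holders can call the analyze API on my-index.
client.security.put_role(
    name="analyze_caller",
    indices=[{"names": ["my-index"], "privileges": ["index"]}],
)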
Path parameters
- index (string)
  Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
Query parameters
- index (string)
  Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer. (See the sketch below.)
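The sketch below derives the analyzer from a field mapping by passing an index in the request; my-index and its title field are hypothetical and assumed to exist with a mapped analyzer.

# "client" is an Elasticsearch client instance (see the first sketch above).
# Use the analyzer mapped to my-index's title field; if the field defines no
# analyzer, the index default applies, and failing that the standard analyzer.
resp = client.indices.analyze(
    index="my-index",
    field="title",
    text="this is a test",
)
print(resp["tokens"])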
Body Required
- analyzer (string)
  The name of the analyzer that should be applied to the provided text. This can be a built-in analyzer, or an analyzer that has been configured in the index.
- attributes (array[string])
  Array of token attributes used to filter the output of the explain parameter.
- char_filter (array)
  Array of character filters used to preprocess characters before the tokenizer (a combined sketch follows this list).
- explain (boolean)
  If true, the response includes token attributes and additional details. Default value is false.
- field (string)
  Field used to derive the analyzer. To use this parameter, you must specify an index. If specified, the analyzer parameter overrides this value.
- filter (array)
  Array of token filters to apply after the tokenizer.
- normalizer (string)
  Normalizer to use to convert text into a single token.
- text (string | array[string])
  Text to analyze. If an array of strings is provided, it is analyzed as a multi-value field.
- tokenizer (string | object)
  Tokenizer to use to convert text into tokens.
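To show how the tokenizer, filter, and char_filter body parameters combine into a transient analysis chain, here is a sketch using the Python client with built-in components; it mirrors the html_strip request body shown in the examples below.

# "client" is an Elasticsearch client instance (see the first sketch above).
# Strip HTML first, emit the whole string as one token, then lowercase it.
resp = client.indices.analyze(
    tokenizer="keyword",
    filter=["lowercase"],
    char_filter=["html_strip"],
    text="this is a <b>test</b>",
)
print([t["token"] for t in resp["tokens"]])  # expected: ["this is a test"]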
Request example

Run GET /_analyze with the standard analyzer on the provided text.

GET /_analyze
{
  "analyzer": "standard",
  "text": "this is a test"
}

Python:

resp = client.indices.analyze(
    analyzer="standard",
    text="this is a test",
)

JavaScript:

const response = await client.indices.analyze({
  analyzer: "standard",
  text: "this is a test",
});

Ruby:

response = client.indices.analyze(
  body: {
    "analyzer": "standard",
    "text": "this is a test"
  }
)

PHP:

$resp = $client->indices()->analyze([
    "body" => [
        "analyzer" => "standard",
        "text" => "this is a test",
    ],
]);

curl:

curl -X GET \
  -H "Authorization: ApiKey $ELASTIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"analyzer":"standard","text":"this is a test"}' \
  "$ELASTICSEARCH_URL/_analyze"

Additional request body examples

Analyze an array of strings as a multi-value field:

{
  "analyzer": "standard",
  "text": [
    "this is a test",
    "the second text"
  ]
}

Build a transient custom analyzer from a tokenizer, token filters, and character filters:

{
  "tokenizer": "keyword",
  "filter": ["lowercase"],
  "char_filter": ["html_strip"],
  "text": "this is a <b>test</b>"
}

Specify a custom token filter inline in the request:

{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    {
      "type": "stop",
      "stopwords": ["a", "is", "this"]
    }
  ],
  "text": "this is a test"
}

Derive the analyzer from a field mapping (requires an index):

{
  "field": "obj1.field1",
  "text": "this is a test"
}

Use a normalizer to produce a single token:

{
  "normalizer": "my_normalizer",
  "text": "BaR"
}

Get detailed token output with explain, filtered to the keyword attribute:

{
  "tokenizer": "standard",
  "filter": ["snowball"],
  "text": "detailed output",
  "explain": true,
  "attributes": ["keyword"]
}

Response example

The response for the explain request above:

{
  "detail": {
    "custom_analyzer": true,
    "charfilters": [],
    "tokenizer": {
      "name": "standard",
      "tokens": [
        {
          "token": "detailed",
          "start_offset": 0,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "output",
          "start_offset": 9,
          "end_offset": 15,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    },
    "tokenfilters": [
      {
        "name": "snowball",
        "tokens": [
          {
            "token": "detail",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0,
            "keyword": false
          },
          {
            "token": "output",
            "start_offset": 9,
            "end_offset": 15,
            "type": "<ALPHANUM>",
            "position": 1,
            "keyword": false
          }
        ]
      }
    ]
  }
}
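To consume the explain output shown above from code, read the detail object instead of tokens. A minimal Python sketch mirroring the snowball example:

# "client" is an Elasticsearch client instance (see the first sketch above).
resp = client.indices.analyze(
    tokenizer="standard",
    filter=["snowball"],
    text="detailed output",
    explain=True,
    attributes=["keyword"],
)
detail = resp["detail"]
print(detail["tokenizer"]["name"])  # "standard"
for step in detail["tokenfilters"]:
    print(step["name"], [t["token"] for t in step["tokens"]])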