Token filter reference - Limit token count

Limit token count token filter
Configurable parameters
Example
Add to an analyzer
Customize

Limit token count token filter

Limits the number of output tokens. The limit filter is commonly used to restrict the size of document field values based on token count.

By default, the limit filter keeps only the first token in a stream. For example, the filter can change the token stream [ one, two, three ] to [ one ].

This filter uses Lucene's LimitTokenCountFilter.

If you want to limit the size of field values based on character length, use the ignore_above mapping parameter.

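Configurable parameters

max_token_count
(Optional, integer) Maximum number of tokens to keep. Once this limit is reached, any remaining tokens are excluded from the output. Defaults to 1.
consume_all_tokens
(Optional, Boolean) If true, the limit filter exhausts the token stream, even if max_token_count has already been reached. Defaults to false.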
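
The interaction between the two parameters can be modeled in a few lines of plain Python. This is only an illustrative sketch of the behavior described above, not the Lucene implementation; limit_tokens is a hypothetical helper name, and the token stream is represented as an ordinary Python iterable.

```python
from itertools import islice
from typing import Iterable, List


def limit_tokens(
    tokens: Iterable[str],
    max_token_count: int = 1,
    consume_all_tokens: bool = False,
) -> List[str]:
    """Hypothetical model of the limit filter, for illustration only."""
    stream = iter(tokens)
    # Keep at most max_token_count tokens from the front of the stream.
    kept = list(islice(stream, max_token_count))
    if consume_all_tokens:
        # Read and discard whatever is left so the stream is exhausted.
        for _ in stream:
            pass
    return kept


print(limit_tokens(["one", "two", "three"]))  # ['one']
print(limit_tokens("quick fox jumps over lazy dog".split(), max_token_count=2))  # ['quick', 'fox']

# With consume_all_tokens=True the output is the same, but nothing is left
# unread in the upstream token stream.
stream = iter(["one", "two", "three"])
limit_tokens(stream, consume_all_tokens=True)
print(list(stream))  # []
```

In this model, consume_all_tokens never changes which tokens are kept. It only changes whether the rest of the stream is read.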

Example

The following analyze API request uses the limit filter to keep only the first two tokens in quick fox jumps over lazy dog:

Python

resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        {
            "type": "limit",
            "max_token_count": 2
        }
    ],
    text="quick fox jumps over lazy dog",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      {
        type: 'limit',
        max_token_count: 2
      }
    ],
    text: 'quick fox jumps over lazy dog'
  }
)
puts response

Js

const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: [
    {
      type: "limit",
      max_token_count: 2,
    },
  ],
  text: "quick fox jumps over lazy dog",
});
console.log(response);

Console

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "limit",
      "max_token_count": 2
    }
  ],
  "text": "quick fox jumps over lazy dog"
}

The filter produces the following tokens:

Text

[ quick, fox ]
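
The analyze API returns each token as an object under a tokens key, so the short list above has to be extracted from the response. The sketch below shows one way to do that. The sample_response dictionary is hand-written to illustrate the response shape for this request, not captured from a live cluster, and token_texts is a hypothetical helper name.

```python
from typing import Any, List, Mapping

# Hand-written sample in the shape of an analyze API response (illustrative).
sample_response = {
    "tokens": [
        {"token": "quick", "start_offset": 0, "end_offset": 5,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "fox", "start_offset": 6, "end_offset": 9,
         "type": "<ALPHANUM>", "position": 1},
    ]
}


def token_texts(response: Mapping[str, Any]) -> List[str]:
    """Return only the token strings from an analyze-style response."""
    return [entry["token"] for entry in response["tokens"]]


print(token_texts(sample_response))  # ['quick', 'fox']
```

The same helper should work on the resp object from the Python example, assuming the client's response supports dictionary-style access.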

Add to an analyzer

The following create index API request uses the limit filter to configure a new custom analyzer.

Python

resp = client.indices.create(
    index="limit_example",
    settings={
        "analysis": {
            "analyzer": {
                "standard_one_token_limit": {
                    "tokenizer": "standard",
                    "filter": [
                        "limit"
                    ]
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'limit_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          standard_one_token_limit: {
            tokenizer: 'standard',
            filter: [
              'limit'
            ]
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "limit_example",
  settings: {
    analysis: {
      analyzer: {
        standard_one_token_limit: {
          tokenizer: "standard",
          filter: ["limit"],
        },
      },
    },
  },
});
console.log(response);

Console

PUT limit_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_one_token_limit": {
          "tokenizer": "standard",
          "filter": [ "limit" ]
        }
      }
    }
  }
}
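
An analyzer defined in the index settings only takes effect once a field refers to it. The sketch below pairs the settings above with a mapping that assigns the analyzer to a text field. The field name title and the index name limit_mapping_example are hypothetical. The commented-out call assumes a recent Python client in which client.indices.create accepts settings and mappings arguments.

```python
import json

# Same analysis settings as in the request above.
settings = {
    "analysis": {
        "analyzer": {
            "standard_one_token_limit": {
                "tokenizer": "standard",
                "filter": ["limit"],
            }
        }
    }
}

# Hypothetical mapping: "title" is an illustrative field name.
mappings = {
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "standard_one_token_limit",
        }
    }
}

# Assumed usage against a running cluster (index name is hypothetical):
# resp = client.indices.create(
#     index="limit_mapping_example", settings=settings, mappings=mappings
# )

print(json.dumps({"settings": settings, "mappings": mappings}, indent=2))
```

With such a mapping, only the first token of each title value would be indexed, since the default limit filter keeps a single token.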

Customize

To customize the limit filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

For example, the following request creates a custom limit filter that keeps only the first five tokens of a stream:

Python

resp = client.indices.create(
    index="custom_limit_example",
    settings={
        "analysis": {
            "analyzer": {
                "whitespace_five_token_limit": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "five_token_limit"
                    ]
                }
            },
            "filter": {
                "five_token_limit": {
                    "type": "limit",
                    "max_token_count": 5
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'custom_limit_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          whitespace_five_token_limit: {
            tokenizer: 'whitespace',
            filter: [
              'five_token_limit'
            ]
          }
        },
        filter: {
          five_token_limit: {
            type: 'limit',
            max_token_count: 5
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "custom_limit_example",
  settings: {
    analysis: {
      analyzer: {
        whitespace_five_token_limit: {
          tokenizer: "whitespace",
          filter: ["five_token_limit"],
        },
      },
      filter: {
        five_token_limit: {
          type: "limit",
          max_token_count: 5,
        },
      },
    },
  },
});
console.log(response);

Console

PUT custom_limit_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_five_token_limit": {
          "tokenizer": "whitespace",
          "filter": [ "five_token_limit" ]
        }
      },
      "filter": {
        "five_token_limit": {
          "type": "limit",
          "max_token_count": 5
        }
      }
    }
  }
}
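
When several indices need limit filters with different thresholds, the settings body can be generated instead of written by hand. The following is a minimal sketch under that assumption. build_limit_settings is a hypothetical helper, not part of any Elasticsearch client, and it simply reproduces the structure of the request above with both configurable parameters exposed.

```python
import json
from typing import Any, Dict


def build_limit_settings(
    analyzer_name: str,
    filter_name: str,
    tokenizer: str,
    max_token_count: int,
    consume_all_tokens: bool = False,
) -> Dict[str, Any]:
    """Build index settings for an analyzer that uses a custom limit filter."""
    return {
        "analysis": {
            "analyzer": {
                analyzer_name: {
                    "tokenizer": tokenizer,
                    "filter": [filter_name],
                }
            },
            "filter": {
                filter_name: {
                    "type": "limit",
                    "max_token_count": max_token_count,
                    "consume_all_tokens": consume_all_tokens,
                }
            },
        }
    }


settings = build_limit_settings(
    analyzer_name="whitespace_five_token_limit",
    filter_name="five_token_limit",
    tokenizer="whitespace",
    max_token_count=5,
)
print(json.dumps(settings, indent=2))
```

The resulting dictionary could be passed as the settings argument of client.indices.create, as in the Python example above. It differs from that request only in spelling out consume_all_tokens with its default value of false.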