Token filter reference - N-gram - Elasticsearch Guide v8.15

# N-gram token filter

Forms n-grams of specified lengths from a token.

For example, you can use the `ngram` token filter to change `fox` to `[ f, fo, o, ox, x ]`.

This filter uses Lucene's NGramTokenFilter.
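
To make the `fox` example concrete, here is a minimal pure-Python sketch of the sliding-window idea. It is an illustration only, not Lucene's implementation; the function name `ngrams` is hypothetical, and the default lengths of 1 and 2 mirror the filter's documented defaults.

```python
def ngrams(token: str, min_gram: int = 1, max_gram: int = 2) -> list[str]:
    """Return every substring of `token` whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break  # longer windows would run past the end of the token
            grams.append(token[start:end])
    return grams


print(ngrams("fox"))  # ['f', 'fo', 'o', 'ox', 'x']
```

Grams are grouped by start position and then by length, which matches the order of the example above.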


## Example
The following [analyze API](/read/elasticsearch-8-15/1a51b9d359d8a54c.md) request uses the `ngram` filter to convert `Quick fox` to 1-character and 2-character n-grams:
#### Python
```python
# Assumes `client` is an already-configured Elasticsearch Python client.
resp = client.indices.analyze(
    tokenizer="standard",
    filter=["ngram"],
    text="Quick fox",
)
print(resp)
```

#### Ruby
```ruby
# Assumes `client` is an already-configured Elasticsearch Ruby client.
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: ['ngram'],
    text: 'Quick fox'
  }
)
puts response
```

#### Js
```js
// Assumes `client` is an already-configured Elasticsearch JavaScript client.
const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["ngram"],
  text: "Quick fox",
});
console.log(response);
```

#### Console
```console
GET _analyze
{
  "tokenizer": "standard",
  "filter": [ "ngram" ],
  "text": "Quick fox"
}
```

The filter produces the following tokens:

```text
[ Q, Qu, u, ui, i, ic, c, ck, k, f, fo, o, ox, x ]
```
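
The client examples print the whole response rather than a flat token list. As a rough sketch, assuming the response exposes a `tokens` list whose entries each carry a `token` field, the token text can be pulled out as below. The helper name `token_texts` is hypothetical, and `sample` is a hand-written, abridged stand-in rather than real cluster output.

```python
def token_texts(response) -> list[str]:
    """Collect the text of each token from an analyze-style response mapping."""
    return [entry["token"] for entry in response["tokens"]]


# Abridged stand-in for a response; real entries carry more fields than `token`.
sample = {"tokens": [{"token": "Q"}, {"token": "Qu"}, {"token": "u"}]}
print(token_texts(sample))  # ['Q', 'Qu', 'u']
```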

## Add to an analyzer

The following create index API request uses the `ngram` filter to configure a new custom analyzer.

#### Python
```python
# Assumes `client` is an already-configured Elasticsearch Python client.
resp = client.indices.create(
    index="ngram_example",
    settings={
        "analysis": {
            "analyzer": {
                "standard_ngram": {
                    "tokenizer": "standard",
                    "filter": ["ngram"],
                }
            }
        }
    },
)
print(resp)
```

#### Ruby
```ruby
# Assumes `client` is an already-configured Elasticsearch Ruby client.
response = client.indices.create(
  index: 'ngram_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          standard_ngram: {
            tokenizer: 'standard',
            filter: ['ngram']
          }
        }
      }
    }
  }
)
puts response
```

#### Js
```js
// Assumes `client` is an already-configured Elasticsearch JavaScript client.
const response = await client.indices.create({
  index: "ngram_example",
  settings: {
    analysis: {
      analyzer: {
        standard_ngram: {
          tokenizer: "standard",
          filter: ["ngram"],
        },
      },
    },
  },
});
console.log(response);
```

#### Console
```console
PUT ngram_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_ngram": {
          "tokenizer": "standard",
          "filter": [ "ngram" ]
        }
      }
    }
  }
}
```
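
Once the index exists, one way to check the analyzer is to send text through it with the analyze API. The sketch below assumes the `ngram_example` index from the request above has been created and that `client` is an Elasticsearch Python client; the wrapper name `analyze_with_standard_ngram` is hypothetical.

```python
def analyze_with_standard_ngram(client, text: str):
    """Run `text` through the `standard_ngram` analyzer of the `ngram_example` index."""
    return client.indices.analyze(
        index="ngram_example",
        analyzer="standard_ngram",
        text=text,
    )


# Usage, assuming a reachable cluster and an existing client object:
# resp = analyze_with_standard_ngram(client, "Quick fox")
# print([entry["token"] for entry in resp["tokens"]])
```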

## Configurable parameters

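`max_gram`
(Optional, integer) Maximum length of characters in a gram. Defaults to `2`.

`min_gram`
(Optional, integer) Minimum length of characters in a gram. Defaults to `1`.

`preserve_original`
(Optional, Boolean) Emits the original token when set to `true`. Defaults to `false`.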
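
The following sketch shows how the three parameters might interact. It is an approximation, not the filter's actual code: the function name `ngram_filter` is hypothetical, and it assumes the original token is re-emitted only when its length falls outside the `min_gram` to `max_gram` range, since a token inside that range already appears among its own grams.

```python
def ngram_filter(token: str, min_gram: int = 1, max_gram: int = 2,
                 preserve_original: bool = False) -> list[str]:
    """Approximate the ngram filter's output for a single token."""
    grams = [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]
    # Assumption: the original is added only if no gram already equals it.
    if preserve_original and not (min_gram <= len(token) <= max_gram):
        grams.append(token)
    return grams


print(ngram_filter("fox", preserve_original=True))        # ['f', 'fo', 'o', 'ox', 'x', 'fox']
print(ngram_filter("ox", 3, 5))                           # []
print(ngram_filter("ox", 3, 5, preserve_original=True))   # ['ox']
```

Under this assumption, `preserve_original` mainly matters for tokens shorter than `min_gram`, which would otherwise produce no output at all.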

You can use the `index.max_ngram_diff` index-level setting to control the maximum allowed difference between the `max_gram` and `min_gram` values.
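
The constraint can be expressed as a small check. This is a sketch of the rule only, not Elasticsearch's validation code; the function name `check_ngram_diff` is hypothetical, and the default limit of `1` is an assumption about the setting's default value.

```python
def check_ngram_diff(min_gram: int, max_gram: int, max_ngram_diff: int = 1) -> None:
    """Raise ValueError if the gram range is wider than the index allows."""
    if min_gram > max_gram:
        raise ValueError("min_gram must not exceed max_gram")
    diff = max_gram - min_gram
    if diff > max_ngram_diff:
        raise ValueError(
            f"max_gram - min_gram is {diff}, "
            f"but index.max_ngram_diff allows at most {max_ngram_diff}"
        )


check_ngram_diff(1, 2)                    # default gram lengths: accepted
check_ngram_diff(3, 5, max_ngram_diff=2)  # accepted once the limit is raised to 2
```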

## Customize


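For example, the following request creates a custom `ngram` filter that forms n-grams of 3 to 5 characters. The request also increases the `index.max_ngram_diff` setting to `2`, the difference between `max_gram` (5) and `min_gram` (3).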
#### Python
```python
# Assumes `client` is an already-configured Elasticsearch Python client.
resp = client.indices.create(
    index="ngram_custom_example",
    settings={
        "index": {
            "max_ngram_diff": 2
        },
        "analysis": {
            "analyzer": {
                "default": {
                    "tokenizer": "whitespace",
                    "filter": ["3_5_grams"],
                }
            },
            "filter": {
                "3_5_grams": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 5,
                }
            },
        },
    },
)
print(resp)
```

#### Ruby
```ruby
# Assumes `client` is an already-configured Elasticsearch Ruby client.
response = client.indices.create(
  index: 'ngram_custom_example',
  body: {
    settings: {
      index: {
        max_ngram_diff: 2
      },
      analysis: {
        analyzer: {
          default: {
            tokenizer: 'whitespace',
            filter: ['3_5_grams']
          }
        },
        filter: {
          "3_5_grams": {
            type: 'ngram',
            min_gram: 3,
            max_gram: 5
          }
        }
      }
    }
  }
)
puts response
```

#### Js
```js
// Assumes `client` is an already-configured Elasticsearch JavaScript client.
const response = await client.indices.create({
  index: "ngram_custom_example",
  settings: {
    index: {
      max_ngram_diff: 2,
    },
    analysis: {
      analyzer: {
        default: {
          tokenizer: "whitespace",
          filter: ["3_5_grams"],
        },
      },
      filter: {
        "3_5_grams": {
          type: "ngram",
          min_gram: 3,
          max_gram: 5,
        },
      },
    },
  },
});
console.log(response);
```

#### Console
```console
PUT ngram_custom_example
{
  "settings": {
    "index": {
      "max_ngram_diff": 2
    },
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "3_5_grams" ]
        }
      },
      "filter": {
        "3_5_grams": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 5
        }
      }
    }
  }
}
```
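
To get a feel for what this configuration would emit, the sketch below imitates the analyzer in plain Python. It is an approximation under stated assumptions: `str.split()` stands in for the `whitespace` tokenizer, the function name `three_to_five_grams` is hypothetical, and the ordering of grams (by start position, then length) is assumed rather than taken from the source.

```python
def three_to_five_grams(text: str) -> list[str]:
    """Approximate the `3_5_grams` filter applied after whitespace tokenization."""
    min_gram, max_gram = 3, 5
    grams = []
    for token in text.split():  # rough stand-in for the whitespace tokenizer
        for start in range(len(token)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(token):
                    break  # window would run past the end of the token
                grams.append(token[start:start + size])
    return grams


print(three_to_five_grams("Quick fox"))
# ['Qui', 'Quic', 'Quick', 'uic', 'uick', 'ick', 'fox']
```

Tokens shorter than three characters yield nothing in this sketch, because no window of the minimum length fits inside them.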