Token filter reference - Common grams

Common grams token filter

Generates bigrams for a specified set of common words.

For example, you can specify is and the as common words. This filter then converts the tokens [the, quick, fox, is, brown] to [the, the_quick, quick, fox, fox_is, is, is_brown, brown].
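The bigram rule in this example can be sketched in plain Python. Every token is kept, and two adjacent tokens are also joined with an underscore whenever at least one of them is a common word. This is a simplified model for illustration, not Lucene's CommonGramsFilter. The function name common_grams is hypothetical, and the sketch tracks only token text, with no positions, offsets, or token types.

```python
def common_grams(tokens: list[str], common_words: set[str]) -> list[str]:
    """Keep every token and add "left_right" bigrams around common words."""
    out: list[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # the unigram is always kept in this (index) mode
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            # A bigram is formed when either side of the pair is a common word.
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out


print(common_grams(["the", "quick", "fox", "is", "brown"], {"is", "the"}))
# ['the', 'the_quick', 'quick', 'fox', 'fox_is', 'is', 'is_brown', 'brown']
```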

Use the common_grams filter instead of the stop token filter when you want to keep common words rather than discard them entirely.

This filter uses Lucene's CommonGramsFilter.

Example

The following analyze API request creates bigrams for is and the:

Python

resp = client.indices.analyze(
    tokenizer="whitespace",
    filter=[
        {
            "type": "common_grams",
            "common_words": [
                "is",
                "the"
            ]
        }
    ],
    text="the quick fox is brown",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
    tokenizer: 'whitespace',
    filter: [
      {
        type: 'common_grams',
        common_words: [
          'is',
          'the'
        ]
      }
    ],
    text: 'the quick fox is brown'
  }
)
puts response

Js

const response = await client.indices.analyze({
  tokenizer: "whitespace",
  filter: [
    {
      type: "common_grams",
      common_words: ["is", "the"],
    },
  ],
  text: "the quick fox is brown",
});
console.log(response);

Console

GET /_analyze
{
  "tokenizer" : "whitespace",
  "filter" : [
    {
      "type": "common_grams",
      "common_words": ["is", "the"]
    }
  ],
  "text" : "the quick fox is brown"
}

The filter produces the following tokens:

Text

[ the, the_quick, quick, fox, fox_is, is, is_brown, brown ]
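To get a flat list like the one above from the API response, you can pull the token field out of each entry in the response's tokens array. The helper name token_texts is hypothetical. The sample response below is hand-written and abbreviated, because real responses also include offsets, type, and position for each token. The Python client's response object supports dictionary-style access, so the same helper should also accept resp from the request above.

```python
from typing import Any


def token_texts(resp: Any) -> list[str]:
    """Return only the token strings from an _analyze response body."""
    return [entry["token"] for entry in resp["tokens"]]


# Hand-written, abbreviated response for illustration only.
sample = {
    "tokens": [
        {"token": t}
        for t in ["the", "the_quick", "quick", "fox", "fox_is", "is", "is_brown", "brown"]
    ]
}
print(token_texts(sample))
# ['the', 'the_quick', 'quick', 'fox', 'fox_is', 'is', 'is_brown', 'brown']
```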

Add to an analyzer

The following create index API request uses the common_grams filter to configure a new custom analyzer:

Python

resp = client.indices.create(
    index="common_grams_example",
    settings={
        "analysis": {
            "analyzer": {
                "index_grams": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "common_grams"
                    ]
                }
            },
            "filter": {
                "common_grams": {
                    "type": "common_grams",
                    "common_words": [
                        "a",
                        "is",
                        "the"
                    ]
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'common_grams_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          index_grams: {
            tokenizer: 'whitespace',
            filter: [
              'common_grams'
            ]
          }
        },
        filter: {
          common_grams: {
            type: 'common_grams',
            common_words: [
              'a',
              'is',
              'the'
            ]
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "common_grams_example",
  settings: {
    analysis: {
      analyzer: {
        index_grams: {
          tokenizer: "whitespace",
          filter: ["common_grams"],
        },
      },
      filter: {
        common_grams: {
          type: "common_grams",
          common_words: ["a", "is", "the"],
        },
      },
    },
  },
});
console.log(response);

Console

PUT /common_grams_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "index_grams": {
          "tokenizer": "whitespace",
          "filter": [ "common_grams" ]
        }
      },
      "filter": {
        "common_grams": {
          "type": "common_grams",
          "common_words": [ "a", "is", "the" ]
        }
      }
    }
  }
}
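If you create several such indices, building the settings body in code avoids repeating the nested structure. The sketch below uses a hypothetical helper, common_grams_settings, that produces the same settings as the request above. Any extra keyword arguments, such as ignore_case or query_mode, are copied into the filter definition unchanged.

```python
from typing import Any


def common_grams_settings(
    analyzer: str, filter_name: str, common_words: list[str], **options: Any
) -> dict[str, Any]:
    """Hypothetical helper: index settings with one whitespace-tokenized
    custom analyzer that applies a single common_grams filter."""
    filter_def: dict[str, Any] = {"type": "common_grams", "common_words": list(common_words)}
    filter_def.update(options)  # e.g. ignore_case=True, query_mode=True
    return {
        "analysis": {
            "analyzer": {analyzer: {"tokenizer": "whitespace", "filter": [filter_name]}},
            "filter": {filter_name: filter_def},
        }
    }


settings = common_grams_settings("index_grams", "common_grams", ["a", "is", "the"])
# Same request as above with the Python client:
# resp = client.indices.create(index="common_grams_example", settings=settings)
```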

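Configurable parameters

common_words
(Required*, array of strings) A list of tokens. The filter generates bigrams for these tokens.
Either this parameter or the common_words_path parameter is required.
common_words_path
(Required*, string) Path to a file containing a list of tokens. The filter generates bigrams for these tokens.
This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each token in the file must be separated by a line break.
Either this parameter or the common_words parameter is required.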
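This sketch shows one way to prepare a file for common_words_path by writing one token per line in UTF-8. The file name analysis/common_words.txt is an assumption chosen for illustration. On a real node, the file would need to sit at that path relative to the Elasticsearch config directory. The example writes to a temporary directory so it can run anywhere.

```python
import tempfile
from pathlib import Path


def write_common_words(path: Path, words: list[str]) -> None:
    """Write one token per line, UTF-8 encoded."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(words) + "\n", encoding="utf-8")


# Write to a temporary directory so the sketch runs anywhere; on a node the
# file would live under the config directory (assumed path below).
with tempfile.TemporaryDirectory() as tmp:
    words_file = Path(tmp) / "analysis" / "common_words.txt"
    write_common_words(words_file, ["a", "is", "the"])
    print(words_file.read_text(encoding="utf-8").splitlines())  # ['a', 'is', 'the']

# Filter definition referring to the file by a path relative to the config directory.
filter_def = {"type": "common_grams", "common_words_path": "analysis/common_words.txt"}
```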
ignore_case
(Optional, Boolean) If true, matching against common words is case-insensitive. Defaults to false.
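You can sketch the effect of ignore_case by normalizing the lookup. This sketch assumes that only the comparison against the common words is case-insensitive and that emitted tokens keep their original case. That assumption is based on our understanding of Lucene, not on anything stated in this reference. The function name is hypothetical.

```python
def common_grams(tokens: list[str], common_words: set[str], ignore_case: bool = False) -> list[str]:
    """Common-grams sketch with an optional case-insensitive lookup."""
    # Normalize the common-word set once; the token text itself is never changed.
    words = {w.lower() for w in common_words} if ignore_case else set(common_words)

    def is_common(tok: str) -> bool:
        return (tok.lower() if ignore_case else tok) in words

    out: list[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens) and (is_common(tok) or is_common(tokens[i + 1])):
            out.append(f"{tok}_{tokens[i + 1]}")
    return out


tokens = ["The", "quick", "fox", "IS", "brown"]
print(common_grams(tokens, {"is", "the"}))
# ['The', 'quick', 'fox', 'IS', 'brown']
print(common_grams(tokens, {"is", "the"}, ignore_case=True))
# ['The', 'The_quick', 'quick', 'fox', 'fox_IS', 'IS', 'IS_brown', 'brown']
```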
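query_mode
(Optional, Boolean) If true, the filter excludes the following tokens from the output:
- Unigrams for common words
- Unigrams for terms followed by common words
Defaults to false. We recommend enabling this parameter for search analyzers.
For example, you can enable this parameter and specify is and the as common words. This filter then converts the tokens [the, quick, fox, is, brown] to [the_quick, quick, fox_is, is_brown].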
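The following sketch reproduces the query_mode example by post-processing the index-mode output. It drops each unigram that is immediately followed by a bigram. It also drops the final unigram when the token emitted just before it was a bigram, which accounts for brown being absent from the example output. These rules are an assumption chosen to match the example above, not Lucene's implementation, and the function names are hypothetical.

```python
from typing import Optional


def common_grams(tokens: list[str], common_words: set[str]) -> list[tuple[str, bool]]:
    """Index-mode output as (text, is_bigram) pairs."""
    out: list[tuple[str, bool]] = []
    for i, tok in enumerate(tokens):
        out.append((tok, False))
        if i + 1 < len(tokens) and (tok in common_words or tokens[i + 1] in common_words):
            out.append((f"{tok}_{tokens[i + 1]}", True))
    return out


def query_mode(pairs: list[tuple[str, bool]]) -> list[str]:
    """Filter index-mode output the way the query_mode example behaves."""
    out: list[str] = []
    previous: Optional[tuple[str, bool]] = None  # buffered token, not yet emitted
    last_was_bigram = False  # whether the most recently emitted token was a bigram
    for text, is_bigram in pairs:
        if previous is not None and not is_bigram:
            # The current token is a unigram, so the buffered token survives.
            out.append(previous[0])
            last_was_bigram = previous[1]
        # Buffer the current token; if it is a bigram, the unigram it
        # replaces is dropped without being emitted.
        previous = (text, is_bigram)
    # The final unigram survives only if the token emitted before it was not a bigram.
    if previous is not None and not last_was_bigram:
        out.append(previous[0])
    return out


print(query_mode(common_grams(["the", "quick", "fox", "is", "brown"], {"is", "the"})))
# ['the_quick', 'quick', 'fox_is', 'is_brown']
```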

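Customize

To customize the common_grams filter, define a custom token filter of type common_grams and set its configurable parameters.

For example, the following request creates a custom common_grams filter with ignore_case and query_mode set to true: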
Python

resp = client.indices.create(
    index="common_grams_example",
    settings={
        "analysis": {
            "analyzer": {
                "index_grams": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "common_grams_query"
                    ]
                }
            },
            "filter": {
                "common_grams_query": {
                    "type": "common_grams",
                    "common_words": [
                        "a",
                        "is",
                        "the"
                    ],
                    "ignore_case": True,
                    "query_mode": True
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'common_grams_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          index_grams: {
            tokenizer: 'whitespace',
            filter: [
              'common_grams_query'
            ]
          }
        },
        filter: {
          common_grams_query: {
            type: 'common_grams',
            common_words: [
              'a',
              'is',
              'the'
            ],
            ignore_case: true,
            query_mode: true
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "common_grams_example",
  settings: {
    analysis: {
      analyzer: {
        index_grams: {
          tokenizer: "whitespace",
          filter: ["common_grams_query"],
        },
      },
      filter: {
        common_grams_query: {
          type: "common_grams",
          common_words: ["a", "is", "the"],
          ignore_case: true,
          query_mode: true,
        },
      },
    },
  },
});
console.log(response);

Console

PUT /common_grams_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "index_grams": {
          "tokenizer": "whitespace",
          "filter": [ "common_grams_query" ]
        }
      },
      "filter": {
        "common_grams_query": {
          "type": "common_grams",
          "common_words": [ "a", "is", "the" ],
          "ignore_case": true,
          "query_mode": true
        }
      }
    }
  }
}
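Because query_mode is recommended for search analyzers, a common setup uses two analyzers. An index-time analyzer runs without query_mode, a search-time analyzer enables it, and both are assigned to a text field through the analyzer and search_analyzer mapping parameters. The sketch below builds such a request body in Python. The analyzer name search_grams, the field name content, and the index name in the comment are hypothetical.

```python
from typing import Any

COMMON_WORDS = ["a", "is", "the"]

# Index-time analyzer keeps unigrams and bigrams; the search-time analyzer
# enables query_mode. Analyzer, filter, and field names are hypothetical.
request: dict[str, Any] = {
    "settings": {
        "analysis": {
            "analyzer": {
                "index_grams": {"tokenizer": "whitespace", "filter": ["common_grams"]},
                "search_grams": {"tokenizer": "whitespace", "filter": ["common_grams_query"]},
            },
            "filter": {
                "common_grams": {"type": "common_grams", "common_words": COMMON_WORDS},
                "common_grams_query": {
                    "type": "common_grams",
                    "common_words": COMMON_WORDS,
                    "query_mode": True,
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "index_grams",
                "search_analyzer": "search_grams",
            }
        }
    },
}

# With the Python client (index name is hypothetical):
# client.indices.create(index="common_grams_search_example", **request)
```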