Token filter reference - Keep words

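Keep words token filter

Keeps only the tokens contained in a specified word list.

This filter uses Lucene's KeepWordFilter.

To remove a list of words from a token stream, use the stop filter.

Example

The following analyze API request uses the keep filter to keep only the fox and dog tokens from "the quick fox jumps over the lazy dog".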

Python

resp = client.indices.analyze(
    tokenizer="whitespace",
    filter=[
        {
            "type": "keep",
            "keep_words": [
                "dog",
                "elephant",
                "fox"
            ]
        }
    ],
    text="the quick fox jumps over the lazy dog",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
    tokenizer: 'whitespace',
    filter: [
      {
        type: 'keep',
        keep_words: [
          'dog',
          'elephant',
          'fox'
        ]
      }
    ],
    text: 'the quick fox jumps over the lazy dog'
  }
)
puts response

Js

const response = await client.indices.analyze({
  tokenizer: "whitespace",
  filter: [
    {
      type: "keep",
      keep_words: ["dog", "elephant", "fox"],
    },
  ],
  text: "the quick fox jumps over the lazy dog",
});
console.log(response);

Console

GET _analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "keep",
      "keep_words": [ "dog", "elephant", "fox" ]
    }
  ],
  "text": "the quick fox jumps over the lazy dog"
}

The filter produces the following tokens:

[ fox, dog ]
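To make the result above easier to follow, here is a minimal pure-Python sketch of the same idea. It is not how Elasticsearch or Lucene implement the filter: `keep_filter` is a hypothetical helper, and `str.split()` is only a rough stand-in for the whitespace tokenizer.

```python
def keep_filter(tokens, keep_words):
    """Keep only the tokens found in keep_words, preserving stream order."""
    keep = set(keep_words)
    return [token for token in tokens if token in keep]


text = "the quick fox jumps over the lazy dog"
tokens = text.split()  # rough stand-in for the whitespace tokenizer

kept = keep_filter(tokens, ["dog", "elephant", "fox"])
print(kept)  # ['fox', 'dog']
```

Note that elephant is in the keep list but never appears in the output, because the filter only passes tokens that occur in the input. The output order follows the token stream, not the order of the keep list.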

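Configurable parameters

keep_words
(Required*, array of strings) List of words to keep. Only tokens that match words in this list are included in the output.
Either this parameter or keep_words_path must be specified.
keep_words_path
(Required*, string) Path to a file that contains a list of words to keep. Only tokens that match words in this list are included in the output.
This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each word in the file must be separated by a line break.
Either this parameter or keep_words must be specified.
keep_words_case
(Optional, Boolean) If true, lowercase all keep words. Defaults to false.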
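The file format and the "one of the two parameters" rule described above can be sketched in Python as follows. This is an illustration only, not Elasticsearch's loader: `load_keep_words` and `resolve_keep_words` are hypothetical names, and trimming whitespace, skipping blank lines, and rejecting both parameters at once are assumptions made for the sketch.

```python
from pathlib import Path


def load_keep_words(path):
    """Read a UTF-8 word list file that holds one word per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Assumption: surrounding whitespace is trimmed and blank lines are ignored.
    return [line.strip() for line in lines if line.strip()]


def resolve_keep_words(keep_words=None, keep_words_path=None):
    """Return the keep words from exactly one of the two sources."""
    # Assumption: supplying both parameters, or neither, is treated as an error.
    if (keep_words is None) == (keep_words_path is None):
        raise ValueError("specify exactly one of keep_words or keep_words_path")
    if keep_words is not None:
        return list(keep_words)
    return load_keep_words(keep_words_path)
```

For example, a file containing the lines one, two, and three would yield `["one", "two", "three"]` from `load_keep_words`.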

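Customize and add to an analyzer

To customize the keep filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

For example, the following [create index API](/read/elasticsearch-8-15/b5c127aabf881d48.md) request uses custom `keep` filters to configure two new [custom analyzers](/read/elasticsearch-8-15/f8c7123dddb484d0.md):
-  `standard_keep_word_array`, which uses a custom `keep` filter with an inline array of keep words
-  `standard_keep_word_file`, which uses a custom `keep` filter with a keep words file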
#### Python
```python
resp = client.indices.create(
    index="keep_words_example",
    settings={
        "analysis": {
            "analyzer": {
                "standard_keep_word_array": {
                    "tokenizer": "standard",
                    "filter": [
                        "keep_word_array"
                    ]
                },
                "standard_keep_word_file": {
                    "tokenizer": "standard",
                    "filter": [
                        "keep_word_file"
                    ]
                }
            },
            "filter": {
                "keep_word_array": {
                    "type": "keep",
                    "keep_words": [
                        "one",
                        "two",
                        "three"
                    ]
                },
                "keep_word_file": {
                    "type": "keep",
                    "keep_words_path": "analysis/example_word_list.txt"
                }
            }
        }
    },
)
print(resp)
```

Js

const response = await client.indices.create({
  index: "keep_words_example",
  settings: {
    analysis: {
      analyzer: {
        standard_keep_word_array: {
          tokenizer: "standard",
          filter: ["keep_word_array"],
        },
        standard_keep_word_file: {
          tokenizer: "standard",
          filter: ["keep_word_file"],
        },
      },
      filter: {
        keep_word_array: {
          type: "keep",
          keep_words: ["one", "two", "three"],
        },
        keep_word_file: {
          type: "keep",
          keep_words_path: "analysis/example_word_list.txt",
        },
      },
    },
  },
});
console.log(response);

Console

PUT keep_words_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_keep_word_array": {
          "tokenizer": "standard",
          "filter": [ "keep_word_array" ]
        },
        "standard_keep_word_file": {
          "tokenizer": "standard",
          "filter": [ "keep_word_file" ]
        }
      },
      "filter": {
        "keep_word_array": {
          "type": "keep",
          "keep_words": [ "one", "two", "three" ]
        },
        "keep_word_file": {
          "type": "keep",
          "keep_words_path": "analysis/example_word_list.txt"
        }
      }
    }
  }
}
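Once the index exists, one way to check a custom analyzer is to run sample text through it with the analyze API. The sketch below assumes an already connected Python client object is passed in. `analyzer_tokens` is a hypothetical helper name, not part of the client library.

```python
def analyzer_tokens(client, index, analyzer, text):
    """Run text through a named analyzer of an index and return the token strings."""
    # The analyze API response carries a "tokens" list; each entry has a "token" field.
    resp = client.indices.analyze(index=index, analyzer=analyzer, text=text)
    return [entry["token"] for entry in resp["tokens"]]
```

With a client connected to a cluster where `keep_words_example` was created as shown, calling `analyzer_tokens(client, "keep_words_example", "standard_keep_word_array", "one two three four")` would be expected to return `["one", "two", "three"]`, since four is not in the keep list.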