Token filter reference - Stop - Elasticsearch Guide v8.15

# Stop token filter

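Removes stop words from a token stream.

When not customized, the filter removes the following English stop words by default:

`a`, `an`, `and`, `are`, `as`, `at`, `be`, `but`, `by`, `for`, `if`, `in`, `into`, `is`, `it`, `no`, `not`, `of`, `on`, `or`, `such`, `that`, `the`, `their`, `then`, `there`, `these`, `they`, `this`, `to`, `was`, `will`, `with`

In addition to English, the `stop` filter supports predefined stop word lists for several languages. You can also specify your own stop words as an array or a file.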


## Example
The following analyze API request uses the `stop` filter to remove the stop words `a` and `the` from `a quick fox jumps over the lazy dog`:
#### Python
```python
resp = client.indices.analyze(
    tokenizer="standard",
    filter=["stop"],
    text="a quick fox jumps over the lazy dog",
)
print(resp)
```

#### Ruby
```ruby
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: ['stop'],
    text: 'a quick fox jumps over the lazy dog'
  }
)
puts response
```

#### Js
```js
const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["stop"],
  text: "a quick fox jumps over the lazy dog",
});
console.log(response);
```

#### Console
```console
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "stop" ],
  "text": "a quick fox jumps over the lazy dog"
}
```

The filter produces the following tokens:

```text
[ quick, fox, jumps, over, lazy, dog ]
```
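
To make the behaviour concrete without a running cluster, here is a minimal pure-Python sketch of the same filtering step. It assumes that a plain whitespace split is close enough to the `standard` tokenizer for this particular sentence, and `remove_stop_words` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Default English stop words, copied from the list at the top of this page.
DEFAULT_ENGLISH_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def remove_stop_words(tokens, stop_words=DEFAULT_ENGLISH_STOP_WORDS):
    """Return the tokens that are not in stop_words (case-sensitive match)."""
    return [token for token in tokens if token not in stop_words]


# Assumption: str.split() stands in for the `standard` tokenizer, which only
# holds for simple, punctuation-free, lowercase text such as this sentence.
tokens = "a quick fox jumps over the lazy dog".split()
filtered = remove_stop_words(tokens)
print(filtered)
```

The sketch only illustrates the idea; the real filter operates on a Lucene token stream and also tracks positions and offsets, which are ignored here.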

## Add to an analyzer

The following create index API request uses the `stop` filter to configure a new custom analyzer.

#### Python
```python
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["stop"],
                }
            }
        }
    },
)
print(resp)
```

#### Ruby
```ruby
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'whitespace',
            filter: ['stop']
          }
        }
      }
    }
  }
)
puts response
```

#### Js
```js
const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "whitespace",
          filter: ["stop"],
        },
      },
    },
  },
});
console.log(response);
```

#### Console
```console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "stop" ]
        }
      }
    }
  }
}
```
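
Once the index exists, one way to check the analyzer is to run the analyze API against it by index and analyzer name. The sketch below only builds the keyword arguments so that it runs without a cluster; `analyze_kwargs` is a hypothetical helper, and the commented-out call assumes a connected 8.x Python client named `client`.

```python
def analyze_kwargs(index, analyzer, text):
    """Build keyword arguments for an analyze request (hypothetical helper)."""
    return {"index": index, "analyzer": analyzer, "text": text}


kwargs = analyze_kwargs(
    index="my-index-000001",
    analyzer="my_analyzer",
    text="a quick fox jumps over the lazy dog",
)
print(kwargs)

# Assumption: `client` is an already-connected Elasticsearch client.
# resp = client.indices.analyze(**kwargs)
# print(resp)
```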

## Configurable parameters

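`stopwords`
(Optional, string or array of strings) Language value, such as `_arabic_` or `_thai_`. Defaults to `_english_`.
Each language value corresponds to a predefined list of stop words in Lucene. See Stop words by language for the supported language values and their stop words.
Also accepts an array of stop words.
For an empty list of stop words, use `_none_`.

`stopwords_path`
(Optional, string) Path to a file that contains a list of stop words to remove.
This path must be absolute or relative to the `config` location, and the file must be UTF-8 encoded. Each stop word in the file must be separated by a line break.

`ignore_case`
(Optional, Boolean) If `true`, stop word matching is case insensitive. For example, if `true`, a stop word of `the` matches and removes `The`, `THE`, or `the`. Defaults to `false`.

`remove_trailing`
(Optional, Boolean) If `true`, the last token of a stream is removed if it is a stop word. Defaults to `true`.
This parameter should be `false` when using the filter with a completion suggester. This ensures that a query like `green a` matches and suggests `green apple` while still removing other stop words.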
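
The effect of `ignore_case` and `remove_trailing` can be sketched in plain Python. This is a simplified approximation under stated assumptions, not the Lucene implementation: `stop_filter` is a hypothetical helper that works on a list of strings, and it treats "last token" purely by position.

```python
def stop_filter(tokens, stop_words, ignore_case=False, remove_trailing=True):
    """Approximate the `stop` filter on a list of tokens (hypothetical helper)."""
    # With ignore_case, both the stop words and the tokens are lowercased
    # before comparison; the original token text is what gets emitted.
    normalize = str.lower if ignore_case else (lambda token: token)
    stop_set = {normalize(word) for word in stop_words}

    last_index = len(tokens) - 1
    kept = []
    for index, token in enumerate(tokens):
        # With remove_trailing=False the final token is always kept, so a
        # partially typed word such as "a" can still be completed to "apple".
        is_protected_tail = index == last_index and not remove_trailing
        if is_protected_tail or normalize(token) not in stop_set:
            kept.append(token)
    return kept


print(stop_filter(["The", "THE", "the", "fox"], ["the"]))
print(stop_filter(["The", "THE", "the", "fox"], ["the"], ignore_case=True))
print(stop_filter(["green", "a"], ["a"]))
print(stop_filter(["green", "a"], ["a"], remove_trailing=False))
```

The last two calls mirror the `green a` example: with the default `remove_trailing` the trailing `a` is dropped, while with `remove_trailing=False` it survives and remains available to a suggester.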

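## Customize

To customize the `stop` filter, define a custom token filter of type `stop` and set its configurable parameters.

For example, the following request creates a custom case-insensitive `stop` filter that removes stop words from the [`_english_`](352e9dddd26f5c96.md#english-stop-words) stop words list: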
#### Python
```python
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "default": {
                    "tokenizer": "whitespace",
                    "filter": ["my_custom_stop_words_filter"],
                }
            },
            "filter": {
                "my_custom_stop_words_filter": {
                    "type": "stop",
                    "ignore_case": True,
                }
            },
        }
    },
)
print(resp)
```

#### Ruby
```ruby
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          default: {
            tokenizer: 'whitespace',
            filter: ['my_custom_stop_words_filter']
          }
        },
        filter: {
          my_custom_stop_words_filter: {
            type: 'stop',
            ignore_case: true
          }
        }
      }
    }
  }
)
puts response
```

#### Js
```js
const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        default: {
          tokenizer: "whitespace",
          filter: ["my_custom_stop_words_filter"],
        },
      },
      filter: {
        my_custom_stop_words_filter: {
          type: "stop",
          ignore_case: true,
        },
      },
    },
  },
});
console.log(response);
```

#### Console
```console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "my_custom_stop_words_filter" ]
        }
      },
      "filter": {
        "my_custom_stop_words_filter": {
          "type": "stop",
          "ignore_case": true
        }
      }
    }
  }
}
```

You can also specify your own list of stop words. For example, the following request creates a custom case-insensitive `stop` filter that removes only the stop words `and`, `is`, and `the`:

#### Python
```python
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "default": {
                    "tokenizer": "whitespace",
                    "filter": ["my_custom_stop_words_filter"],
                }
            },
            "filter": {
                "my_custom_stop_words_filter": {
                    "type": "stop",
                    "ignore_case": True,
                    "stopwords": ["and", "is", "the"],
                }
            },
        }
    },
)
print(resp)
```

#### Ruby
```ruby
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          default: {
            tokenizer: 'whitespace',
            filter: ['my_custom_stop_words_filter']
          }
        },
        filter: {
          my_custom_stop_words_filter: {
            type: 'stop',
            ignore_case: true,
            stopwords: ['and', 'is', 'the']
          }
        }
      }
    }
  }
)
puts response
```

#### Js
```js
const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        default: {
          tokenizer: "whitespace",
          filter: ["my_custom_stop_words_filter"],
        },
      },
      filter: {
        my_custom_stop_words_filter: {
          type: "stop",
          ignore_case: true,
          stopwords: ["and", "is", "the"],
        },
      },
    },
  },
});
console.log(response);
```

#### Console
```console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "my_custom_stop_words_filter" ]
        }
      },
      "filter": {
        "my_custom_stop_words_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": [ "and", "is", "the" ]
        }
      }
    }
  }
}
```
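
The same three stop words could instead be supplied through `stopwords_path`. The following is a minimal sketch of that variant: it writes a UTF-8 file with one stop word per line and builds matching index settings. The file name `analysis/my_stop_words.txt` and both helper functions are hypothetical, and a temporary directory stands in for the Elasticsearch `config` directory.

```python
import tempfile
from pathlib import Path


def write_stop_words_file(path, stop_words):
    """Write one stop word per line, UTF-8 encoded."""
    path.write_text("\n".join(stop_words) + "\n", encoding="utf-8")


def stop_filter_from_file(stopwords_path, ignore_case=True):
    """Build a `stop` filter definition that reads its list from a file."""
    return {
        "type": "stop",
        "ignore_case": ignore_case,
        "stopwords_path": stopwords_path,
    }


# Assumption: this temporary directory plays the role of the `config` location.
with tempfile.TemporaryDirectory() as config_dir:
    stop_words_file = Path(config_dir) / "analysis" / "my_stop_words.txt"
    stop_words_file.parent.mkdir(parents=True)
    write_stop_words_file(stop_words_file, ["and", "is", "the"])
    file_lines = stop_words_file.read_text(encoding="utf-8").splitlines()

# The path in the settings is written relative to the `config` location.
settings = {
    "analysis": {
        "analyzer": {
            "default": {
                "tokenizer": "whitespace",
                "filter": ["my_custom_stop_words_filter"],
            }
        },
        "filter": {
            "my_custom_stop_words_filter": stop_filter_from_file(
                "analysis/my_stop_words.txt"
            )
        },
    }
}
print(file_lines)
print(settings)
```

The resulting `settings` dictionary has the same shape as the one passed to `client.indices.create` in the Python example above, with `stopwords_path` in place of the inline `stopwords` array.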

## Stop words by language

The following list contains the supported language values for the `stopwords` parameter and links to their predefined stop words.