Built-in analyzer reference - Stop

Stop analyzer
Example output
Configuration
Example configuration
Definition

Stop analyzer

The stop analyzer is the same as the simple analyzer, with added support for removing stop words. By default it uses the _english_ stop word list.

Example output

Python

resp = client.indices.analyze(
    analyzer="stop",
    text="The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
    analyzer: 'stop',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

Js

const response = await client.indices.analyze({
  analyzer: "stop",
  text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response);

Console

POST _analyze
{
  "analyzer": "stop",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The above sentence produces the following terms:

Text

[ quick, brown, foxes, jumped, over, lazy, dog, s, bone ]
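The term list above can be approximated without a cluster. The following is a minimal sketch, not the Lucene implementation: it assumes that splitting on runs of non-letter characters and lowercasing is close enough to the lowercase tokenizer for this sample, and it hard-codes what is believed to be Lucene's default English stop word set (verify against your Elasticsearch version). The function name `approximate_stop_analyzer` is hypothetical.

```python
import re

# Assumed contents of the `_english_` list (Lucene's default English stop word set).
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def approximate_stop_analyzer(text, stopwords=ENGLISH_STOP_WORDS):
    """Split on non-letters, lowercase each token, then drop stop words."""
    # [^\W\d_]+ matches runs of Unicode letters, so digits, hyphens,
    # apostrophes and other punctuation all act as token separators.
    tokens = [token.lower() for token in re.findall(r"[^\W\d_]+", text)]
    return [token for token in tokens if token not in stopwords]


terms = approximate_stop_analyzer(
    "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
)
print(terms)
```

Note that `2` disappears because it contains no letters, and `dog's` becomes the two terms `dog` and `s` because the apostrophe is treated as a separator.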

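Configuration

The stop analyzer accepts the following parameters:

`stopwords`	A predefined stop word list such as `_english_`, or an array containing a list of stop words. Defaults to `_english_`.
`stopwords_path`	The path to a file containing stop words. This path is relative to the Elasticsearch `config` directory.

See the stop token filter for more information about stop word configuration.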
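As an illustration of the two parameters, the sketch below builds the `settings` body for a stop analyzer from either an inline value or a file path. The helper name `stop_analyzer_settings`, the analyzer name `my_file_stop_analyzer`, and the file `analysis/my_stopwords.txt` are hypothetical. The sketch only constructs a dictionary and does not contact a cluster.

```python
def stop_analyzer_settings(name, stopwords="_english_", stopwords_path=None):
    """Return an index `settings` body defining one stop analyzer.

    `stopwords` may be a predefined list name such as "_english_" or a
    list of words. If `stopwords_path` is given, it is used instead.
    """
    analyzer = {"type": "stop"}
    if stopwords_path is not None:
        # Path is resolved relative to the Elasticsearch `config` directory.
        analyzer["stopwords_path"] = stopwords_path
    else:
        analyzer["stopwords"] = stopwords
    return {"analysis": {"analyzer": {name: analyzer}}}


# Inline list of stop words.
inline_settings = stop_analyzer_settings("my_stop_analyzer", stopwords=["the", "over"])

# Stop words read from a (hypothetical) file under the config directory.
file_settings = stop_analyzer_settings(
    "my_file_stop_analyzer", stopwords_path="analysis/my_stopwords.txt"
)

print(inline_settings)
print(file_settings)
```

Either dictionary could then be passed as the `settings` argument of `client.indices.create`, in the same way as the Python examples in this page.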

Example configuration

In this example, the stop analyzer is configured to use a specified list of words as stop words:

Python

resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_stop_analyzer": {
                    "type": "stop",
                    "stopwords": [
                        "the",
                        "over"
                    ]
                }
            }
        }
    },
)
print(resp)

resp1 = client.indices.analyze(
    index="my-index-000001",
    analyzer="my_stop_analyzer",
    text="The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp1)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_stop_analyzer: {
            type: 'stop',
            stopwords: [
              'the',
              'over'
            ]
          }
        }
      }
    }
  }
)
puts response

response = client.indices.analyze(
  index: 'my-index-000001',
  body: {
    analyzer: 'my_stop_analyzer',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

Js

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_stop_analyzer: {
          type: "stop",
          stopwords: ["the", "over"],
        },
      },
    },
  },
});
console.log(response);

const response1 = await client.indices.analyze({
  index: "my-index-000001",
  analyzer: "my_stop_analyzer",
  text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response1);

Console

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stop_analyzer": {
          "type": "stop",
          "stopwords": ["the", "over"]
        }
      }
    }
  }
}

POST my-index-000001/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The above example produces the following terms:

Text

[ quick, brown, foxes, jumped, lazy, dog, s, bone ]

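Definition

The stop analyzer consists of:

Tokenizer
- Lowercase tokenizer
Token filters
- Stop token filter

If you need to customize the stop analyzer beyond its configuration parameters, recreate it as a custom analyzer and modify it, usually by adding token filters. The following request recreates the built-in stop analyzer, which you can use as a starting point for further customization: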

Python

resp = client.indices.create(
    index="stop_example",
    settings={
        "analysis": {
            "filter": {
                "english_stop": {
                    "type": "stop",
                    "stopwords": "_english_"
                }
            },
            "analyzer": {
                "rebuilt_stop": {
                    "tokenizer": "lowercase",
                    "filter": [
                        "english_stop"
                    ]
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'stop_example',
  body: {
    settings: {
      analysis: {
        filter: {
          english_stop: {
            type: 'stop',
            stopwords: '_english_'
          }
        },
        analyzer: {
          rebuilt_stop: {
            tokenizer: 'lowercase',
            filter: [
              'english_stop'
            ]
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "stop_example",
  settings: {
    analysis: {
      filter: {
        english_stop: {
          type: "stop",
          stopwords: "_english_",
        },
      },
      analyzer: {
        rebuilt_stop: {
          tokenizer: "lowercase",
          filter: ["english_stop"],
        },
      },
    },
  },
});
console.log(response);

Console

PUT /stop_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_"
        }
      },
      "analyzer": {
        "rebuilt_stop": {
          "tokenizer": "lowercase",
          "filter": [
            "english_stop"
          ]
        }
      }
    }
  }
}


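The default stop words of the `english_stop` filter can be overridden with the `stopwords` or `stopwords_path` parameters.
To customize further, add any token filters after `english_stop` in the analyzer's `filter` list.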
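To make the last point concrete, the sketch below builds the `rebuilt_stop` settings with extra token filters appended after `english_stop`. The helper name `rebuilt_stop_settings` is hypothetical, and `asciifolding` is used only as an example of a built-in token filter you might add. The sketch constructs a dictionary and does not contact a cluster.

```python
def rebuilt_stop_settings(extra_filters=()):
    """Return `settings` for `rebuilt_stop`, with extra filters after `english_stop`."""
    return {
        "analysis": {
            "filter": {
                "english_stop": {
                    "type": "stop",
                    "stopwords": "_english_",
                }
            },
            "analyzer": {
                "rebuilt_stop": {
                    "tokenizer": "lowercase",
                    # Order matters: stop words are removed first,
                    # then the additional filters run on what remains.
                    "filter": ["english_stop", *extra_filters],
                }
            },
        }
    }


customized = rebuilt_stop_settings(extra_filters=["asciifolding"])
print(customized["analysis"]["analyzer"]["rebuilt_stop"]["filter"])
```

Calling `rebuilt_stop_settings()` with no arguments yields the same settings as the requests shown in this section.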