Mapping parameters - analyzer - Elasticsearch Guide v8.15

analyzer

Only text fields support the analyzer mapping parameter.

The analyzer parameter specifies the analyzer used for text analysis when indexing or searching a text field.

Unless overridden with the search_analyzer mapping parameter, this analyzer is used for both index and search analysis. See Specify an analyzer.
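The fallback between the analyzer and search_analyzer mapping parameters can be sketched in plain Python. This is a simplified illustration, not Elasticsearch code: the function name effective_analyzers is hypothetical, and the sketch assumes that only field-level mapping parameters are in play, ignoring index-level default analyzers and any analyzer passed in the search request itself.

```python
# Simplified sketch: which analyzer applies at index time and at search time,
# looking only at the field mapping. Index-level defaults and query-level
# overrides are deliberately ignored (assumption of this sketch).
DEFAULT_ANALYZER = "standard"


def effective_analyzers(field_mapping):
    """Return the analyzer names used for indexing and for searching a text field."""
    index_analyzer = field_mapping.get("analyzer", DEFAULT_ANALYZER)
    # search_analyzer falls back to the index-time analyzer when it is not set.
    search_analyzer = field_mapping.get("search_analyzer", index_analyzer)
    return {"index": index_analyzer, "search": search_analyzer}


print(effective_analyzers({"type": "text", "analyzer": "my_analyzer"}))
print(effective_analyzers({
    "type": "text",
    "analyzer": "my_analyzer",
    "search_analyzer": "my_stop_analyzer",
}))
```

The first call uses my_analyzer for both phases; the second keeps my_analyzer for indexing but switches to my_stop_analyzer for searching.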

We recommend testing analyzers before using them in production. See Test an analyzer.
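One way to test an analyzer is to run sample text through the analyze API and inspect the tokens it produces. The helper below is a minimal sketch: analyze_tokens is a hypothetical name, and it assumes that client is an already-configured Elasticsearch Python client (8.x) and that the index defining the analyzer already exists.

```python
def analyze_tokens(client, index, analyzer, text):
    """Return the token strings that `analyzer` (defined on `index`) emits for `text`."""
    # indices.analyze calls the _analyze endpoint; the response carries a
    # "tokens" list whose entries each have a "token" string.
    resp = client.indices.analyze(index=index, analyzer=analyzer, text=text)
    return [entry["token"] for entry in resp["tokens"]]
```

For example, analyze_tokens(client, "my-index-000001", "my_stop_analyzer", "The Quick Brown Fox") would list the tokens left after lowercasing and stop word removal, which makes it easy to check an analyzer's behavior before relying on it.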

The analyzer setting cannot be updated on existing fields using the update mapping API.

search_quote_analyzer

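The search_quote_analyzer setting lets you specify an analyzer for phrases. This is particularly useful for disabling stop word removal in phrase queries.

To disable stop word removal for phrases, a field needs the following three analyzer settings:

1. An analyzer setting for indexing all terms, including stop words
2. A search_analyzer setting for non-phrase queries that removes stop words
3. A search_quote_analyzer setting for phrase queries that does not remove stop words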

Python

# Assumes `client` is an already-configured Elasticsearch Python client.
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                # Tokenizes all terms, including stop words.
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase"
                    ]
                },
                # Same as my_analyzer, but also removes stop words.
                "my_stop_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "english_stop"
                    ]
                }
            },
            "filter": {
                "english_stop": {
                    "type": "stop",
                    "stopwords": "_english_"
                }
            }
        }
    },
    mappings={
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "my_analyzer",               # index time
                "search_analyzer": "my_stop_analyzer",   # non-phrase queries
                "search_quote_analyzer": "my_analyzer"   # phrase queries
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="my-index-000001",
    id="1",
    document={
        "title": "The Quick Brown Fox"
    },
)
print(resp1)

resp2 = client.index(
    index="my-index-000001",
    id="2",
    document={
        "title": "A Quick Brown Fox"
    },
)
print(resp2)

# The inner quotes make this a phrase query.
resp3 = client.search(
    index="my-index-000001",
    query={
        "query_string": {
            "query": "\"the quick brown fox\""
        }
    },
)
print(resp3)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
              'lowercase'
            ]
          },
          my_stop_analyzer: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
              'lowercase',
              'english_stop'
            ]
          }
        },
        filter: {
          english_stop: {
            type: 'stop',
            stopwords: '_english_'
          }
        }
      }
    },
    mappings: {
      properties: {
        title: {
          type: 'text',
          analyzer: 'my_analyzer',
          search_analyzer: 'my_stop_analyzer',
          search_quote_analyzer: 'my_analyzer'
        }
      }
    }
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 1,
  body: {
    title: 'The Quick Brown Fox'
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 2,
  body: {
    title: 'A Quick Brown Fox'
  }
)
puts response

response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      query_string: {
        query: '"the quick brown fox"'
      }
    }
  }
)
puts response

JavaScript

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase"],
        },
        my_stop_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "english_stop"],
        },
      },
      filter: {
        english_stop: {
          type: "stop",
          stopwords: "_english_",
        },
      },
    },
  },
  mappings: {
    properties: {
      title: {
        type: "text",
        analyzer: "my_analyzer",
        search_analyzer: "my_stop_analyzer",
        search_quote_analyzer: "my_analyzer",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "my-index-000001",
  id: 1,
  document: {
    title: "The Quick Brown Fox",
  },
});
console.log(response1);

const response2 = await client.index({
  index: "my-index-000001",
  id: 2,
  document: {
    title: "A Quick Brown Fox",
  },
});
console.log(response2);

const response3 = await client.search({
  index: "my-index-000001",
  query: {
    query_string: {
      query: '"the quick brown fox"',
    },
  },
});
console.log(response3);

Console

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        },
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "my_stop_analyzer",
        "search_quote_analyzer": "my_analyzer"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "title": "The Quick Brown Fox"
}

PUT my-index-000001/_doc/2
{
  "title": "A Quick Brown Fox"
}

GET my-index-000001/_search
{
  "query": {
    "query_string": {
      "query": "\"the quick brown fox\""
    }
  }
}

The search_quote_analyzer setting can be updated on existing fields using the update mapping API.
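As a rough illustration of such an update, the sketch below builds the properties payload for an update mapping request that changes only search_quote_analyzer on the title field. The helper name quote_analyzer_update is hypothetical, and restating the field's other parameters unchanged is an assumption made here so the request does not appear to alter settings that cannot be updated, such as analyzer. Switching to the built-in standard analyzer is only an example value.

```python
def quote_analyzer_update(current_field_mapping, new_quote_analyzer):
    """Copy the current field mapping, changing only search_quote_analyzer."""
    updated = dict(current_field_mapping)  # leave the caller's dict untouched
    updated["search_quote_analyzer"] = new_quote_analyzer
    return updated


# Current mapping of the `title` field from the index-creation request.
title_mapping = {
    "type": "text",
    "analyzer": "my_analyzer",
    "search_analyzer": "my_stop_analyzer",
    "search_quote_analyzer": "my_analyzer",
}

properties = {"title": quote_analyzer_update(title_mapping, "standard")}
print(properties)

# With a configured client (assumed, not created here), the update could be sent as:
# client.indices.put_mapping(index="my-index-000001", properties=properties)
```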


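	The `my_analyzer` analyzer tokenizes all terms, including stop words.
	The `my_stop_analyzer` analyzer removes stop words.
	The `analyzer` setting points to the `my_analyzer` analyzer, which is used at index time.
	The `search_analyzer` setting points to the `my_stop_analyzer` analyzer, which removes stop words from non-phrase queries.
	The `search_quote_analyzer` setting points to the `my_analyzer` analyzer, which ensures that stop words are not removed from phrase queries.
	Because the search query is wrapped in quotes, it is detected as a phrase query, so the `search_quote_analyzer` kicks in and the stop words are not removed from the query. The `my_analyzer` analyzer returns the tokens [`the`, `quick`, `brown`, `fox`], which match only one of the documents (`The Quick Brown Fox`). Term queries, by contrast, are analyzed with the `my_stop_analyzer` analyzer, which filters out the stop words. A search for either `The quick brown fox` or `A quick brown fox` therefore returns both documents, because both contain the tokens [`quick`, `brown`, `fox`]. Without the `search_quote_analyzer`, exact matching for phrase queries would not be possible: the stop words would be removed from the phrase query, and both documents would match.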
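The matching behavior described above can be reproduced with a small pure-Python simulation. It is only a sketch and does not call Elasticsearch: the two functions are simplified stand-ins for the custom analyzers (whitespace splitting instead of the standard tokenizer, and a three-word stop list instead of the full `_english_` set), phrase matching is modeled as a contiguous token run, and term matching uses OR semantics, the default operator of `query_string`.

```python
# Simplified stand-ins for the two custom analyzers; not Elasticsearch code.
STOP_WORDS = {"a", "an", "the"}  # tiny illustrative subset of _english_


def my_analyzer(text):
    """Lowercase and split on whitespace, keeping stop words."""
    return text.lower().split()


def my_stop_analyzer(text):
    """Like my_analyzer, but drop stop words."""
    return [tok for tok in my_analyzer(text) if tok not in STOP_WORDS]


DOCS = {"1": "The Quick Brown Fox", "2": "A Quick Brown Fox"}
# Index time always uses my_analyzer, so stop words are kept in the index.
INDEXED = {doc_id: my_analyzer(title) for doc_id, title in DOCS.items()}


def contains_run(tokens, phrase):
    """True if `phrase` occurs in `tokens` as a contiguous run."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def phrase_search(text, quote_analyzer):
    """IDs of documents whose indexed tokens contain the analyzed phrase."""
    phrase = quote_analyzer(text)
    return sorted(d for d, toks in INDEXED.items() if contains_run(toks, phrase))


def term_search(text):
    """IDs of documents sharing at least one term with the analyzed query."""
    terms = set(my_stop_analyzer(text))
    return sorted(d for d, toks in INDEXED.items() if terms & set(toks))


print(phrase_search("the quick brown fox", my_analyzer))       # ['1']
print(phrase_search("the quick brown fox", my_stop_analyzer))  # ['1', '2']
print(term_search("a quick brown fox"))                        # ['1', '2']
```

The second call shows what would happen without `search_quote_analyzer`: the phrase would be analyzed with the stop-word-removing analyzer, and both documents would match.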