Mapping parameters - ignore_above - Elasticsearch Guide v8.15

ignore_above


Strings longer than the ignore_above setting are not indexed or stored. For arrays of strings, ignore_above is applied to each array element separately, and string elements longer than ignore_above are not indexed or stored.

All strings and array elements are still present in the _source field, provided _source is enabled, which is the default in Elasticsearch.
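The per-element rule described above can be imitated locally. This sketch is not an Elasticsearch API. `indexed_terms` is a hypothetical helper. It measures length with Python's len(), which counts Unicode code points, so it may not match Elasticsearch's character counting for every input.

```python
from typing import Iterable, List, Union


def indexed_terms(value: Union[str, Iterable[str]], ignore_above: int) -> List[str]:
    """Hypothetical local simulation of ignore_above on a keyword field.

    Returns the values that would be indexed; longer ones are skipped.
    Length is measured with len(), i.e. in Unicode code points (an approximation).
    """
    # A single string is treated like a one-element array.
    values = [value] if isinstance(value, str) else list(value)
    # The limit is checked per element; only strings strictly longer than it are dropped.
    return [v for v in values if len(v) <= ignore_above]


source = {"message": ["Syntax error", "Syntax error with some long stacktrace"]}
print(indexed_terms(source["message"], 20))  # ['Syntax error']
# The source document itself is not modified: both elements stay in _source.
print(source["message"])
```

Because the check runs per element, one over-long value in an array does not prevent its shorter siblings from being indexed.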

Python

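from elasticsearch import Elasticsearch

# Assumes a cluster reachable at this URL; adjust it (and any auth) for your deployment.
client = Elasticsearch("http://localhost:9200")

# Map "message" as a keyword field that ignores strings longer than 20 characters.
resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "message": {
                "type": "keyword",
                "ignore_above": 20
            }
        }
    },
)
print(resp)

# 12 characters: indexed normally.
resp1 = client.index(
    index="my-index-000001",
    id="1",
    document={
        "message": "Syntax error"
    },
)
print(resp1)

# 38 characters: the document is stored, but "message" is not indexed.
resp2 = client.index(
    index="my-index-000001",
    id="2",
    document={
        "message": "Syntax error with some long stacktrace"
    },
)
print(resp2)

# Terms aggregation over the "message" field.
resp3 = client.search(
    index="my-index-000001",
    aggs={
        "messages": {
            "terms": {
                "field": "message"
            }
        }
    },
)
print(resp3)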

Ruby

require 'elasticsearch'

# Assumes a cluster reachable at this URL; adjust it (and any auth) for your deployment.
client = Elasticsearch::Client.new(url: 'http://localhost:9200')

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        message: {
          type: 'keyword',
          ignore_above: 20
        }
      }
    }
  }
)
puts response
response = client.index(
  index: 'my-index-000001',
  id: 1,
  body: {
    message: 'Syntax error'
  }
)
puts response
response = client.index(
  index: 'my-index-000001',
  id: 2,
  body: {
    message: 'Syntax error with some long stacktrace'
  }
)
puts response
response = client.search(
  index: 'my-index-000001',
  body: {
    aggregations: {
      messages: {
        terms: {
          field: 'message'
        }
      }
    }
  }
)
puts response

Js

const { Client } = require("@elastic/elasticsearch");

// Assumes a cluster reachable at this URL; adjust it (and any auth) for your deployment.
const client = new Client({ node: "http://localhost:9200" });

async function run() {
  const response = await client.indices.create({
    index: "my-index-000001",
    mappings: {
      properties: {
        message: {
          type: "keyword",
          ignore_above: 20,
        },
      },
    },
  });
  console.log(response);
  const response1 = await client.index({
    index: "my-index-000001",
    id: 1,
    document: {
      message: "Syntax error",
    },
  });
  console.log(response1);
  const response2 = await client.index({
    index: "my-index-000001",
    id: 2,
    document: {
      message: "Syntax error with some long stacktrace",
    },
  });
  console.log(response2);
  const response3 = await client.search({
    index: "my-index-000001",
    aggs: {
      messages: {
        terms: {
          field: "message",
        },
      },
    },
  });
  console.log(response3);
}

run().catch(console.error);

Console

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20
      }
    }
  }
}
PUT my-index-000001/_doc/1
{
  "message": "Syntax error"
}
PUT my-index-000001/_doc/2
{
  "message": "Syntax error with some long stacktrace"
}
GET my-index-000001/_search
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}


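	The message field ignores any string longer than 20 characters.
	Document 1 is indexed successfully.
	Document 2 is indexed, but its `message` field is not indexed.
	The search returns both documents, but only the first one appears in the terms aggregation.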
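The outcome described above can be checked from the Python example's resp3 with a small helper. The following is a sketch. `summarize_search` is a hypothetical name. It assumes the default response shape, in which hits.total is an object with a value field. Newly indexed documents become searchable only after a refresh, so a search issued immediately after indexing may not yet see them.

```python
from typing import Any, List, Mapping, Tuple


def summarize_search(resp: Mapping[str, Any], agg_name: str = "messages") -> Tuple[int, List[str]]:
    """Hypothetical helper: return (total hits, terms-aggregation bucket keys).

    Accepts a plain dict or the response object returned by client.search(),
    which supports dict-style access.
    """
    # Assumes hits.total is reported as {"value": N, "relation": ...}.
    total = resp["hits"]["total"]["value"]
    keys = [bucket["key"] for bucket in resp["aggregations"][agg_name]["buckets"]]
    return total, keys


# Usage against resp3 from the Python example (needs a running cluster).
# If the documents were just indexed, refresh first so the search can see them:
# client.indices.refresh(index="my-index-000001")
# total, keys = summarize_search(resp3)
```

For the example documents, the expectation stated above is a total of 2 and a single bucket for the short message.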

The ignore_above setting can be updated on existing fields using the update mapping API.
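This is a minimal sketch of such an update with the Python client. It assumes the 8.x elasticsearch package, in which indices.put_mapping accepts the mapping properties as a keyword argument. `update_ignore_above` is a hypothetical wrapper, and 256 is an arbitrary example value. The field's "type" is repeated alongside the changed parameter. As with other mapping changes, the new limit applies to documents indexed after the update. Documents already indexed keep what was indexed for them until they are reindexed.

```python
from typing import Any


def update_ignore_above(client: Any, index: str, field: str, limit: int) -> Any:
    """Hypothetical wrapper: change ignore_above on an existing keyword field.

    `client` is expected to be an elasticsearch.Elasticsearch (8.x) instance.
    """
    return client.indices.put_mapping(
        index=index,
        # Restate the field type together with the parameter being changed.
        properties={field: {"type": "keyword", "ignore_above": limit}},
    )


# Usage (needs a running cluster):
# resp = update_ignore_above(client, "my-index-000001", "message", 256)
# print(resp)
```

Taking the client as a parameter keeps the helper independent of how the connection is configured.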

This option is also useful for protecting against Lucene's term byte-length limit of 32766.
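The limit is measured in bytes, so a value's UTF-8 encoded length is what matters, not its character count. The following sketch shows that check. The constant mirrors the 32766 figure above. `exceeds_term_limit` is a hypothetical helper, not part of Elasticsearch or Lucene. It assumes a term of exactly 32766 bytes is still accepted.

```python
LUCENE_MAX_TERM_BYTES = 32766  # term byte-length limit quoted above


def utf8_len(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def exceeds_term_limit(text: str) -> bool:
    """Hypothetical check: would this value, as a single term, be over the byte limit?"""
    return utf8_len(text) > LUCENE_MAX_TERM_BYTES


# "a", "é" (U+00E9), "日" (U+65E5) and "😀" (U+1F600) each count as one character.
print(utf8_len("a"), utf8_len("\u00e9"), utf8_len("\u65e5"), utf8_len("\U0001F600"))  # 1 2 3 4
```

The four sample characters have the same length in characters but different sizes in bytes, which is why a character-based ignore_above only indirectly guards a byte-based limit.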

The value of ignore_above is a character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191, since a UTF-8 character can occupy up to 4 bytes.
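The arithmetic can be checked directly. With at most 4 bytes per character, 8191 characters occupy at most 32764 bytes, which fits under 32766. One more character could push the total to 32768 bytes. The sketch builds the worst case from a 4-byte character (U+1F600). `safe_ignore_above` is a hypothetical name. The per-character byte count follows Python's code-point view of a string.

```python
LUCENE_MAX_TERM_BYTES = 32766
MAX_UTF8_BYTES_PER_CHAR = 4  # the widest UTF-8 encoding of a single code point


def safe_ignore_above(max_term_bytes: int = LUCENE_MAX_TERM_BYTES,
                      bytes_per_char: int = MAX_UTF8_BYTES_PER_CHAR) -> int:
    """Largest character count whose worst-case UTF-8 size still fits the byte limit."""
    return max_term_bytes // bytes_per_char


limit = safe_ignore_above()
worst_case = "\U0001F600" * limit        # every character takes 4 bytes
too_long = "\U0001F600" * (limit + 1)
print(limit)                             # 8191
print(len(worst_case.encode("utf-8")))   # 32764, within the limit
print(len(too_long.encode("utf-8")))     # 32768, over the limit
```

Integer division rounds down, which is the safe direction here: rounding up would allow a worst-case value that no longer fits.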