Bucket aggregations - Composite - Elasticsearch Guide v8.15 (Japanese edition)

Composite aggregation

The composite aggregation is expensive. Load test your application before deploying a composite aggregation in production.

A multi-bucket aggregation that creates composite buckets from different sources.

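Unlike the other multi-bucket aggregations, you can use the composite aggregation to paginate all buckets from a multi-level aggregation efficiently. This aggregation provides a way to stream all buckets of a specific aggregation, similar to what scroll does for documents.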

The composite buckets are built from the combinations of the values extracted or created for each document, and each combination is considered a composite bucket.

For example, consider the following document:

Js

{
  "keyword": ["foo", "bar"],
  "number": [23, 65, 76]
}

Using keyword and number as source fields for the aggregation results in the following composite buckets:

Js

{ "keyword": "foo", "number": 23 }
{ "keyword": "foo", "number": 65 }
{ "keyword": "foo", "number": 76 }
{ "keyword": "bar", "number": 23 }
{ "keyword": "bar", "number": 65 }
{ "keyword": "bar", "number": 76 }
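Building composite buckets amounts to taking the Cartesian product of the values each source extracts from a document. The Python sketch below reproduces the six combinations above locally with `itertools.product`. `composite_keys` is a hypothetical helper, not part of any Elasticsearch client. It handles only plain field lookups and does not model Elasticsearch's bucket ordering.

```python
from itertools import product


def composite_keys(doc, sources):
    """Hypothetical helper: expand one document into its composite keys.

    `sources` lists field names in source order; each field may hold a single
    value or a list of values.
    """
    values_per_source = []
    for field in sources:
        value = doc.get(field)
        if value is None:
            # Without missing_bucket, a document lacking a source yields no buckets.
            return []
        values_per_source.append(value if isinstance(value, list) else [value])
    # One composite key per combination, keys in the order the sources were given.
    return [dict(zip(sources, combo)) for combo in product(*values_per_source)]


doc = {"keyword": ["foo", "bar"], "number": [23, 65, 76]}
for key in composite_keys(doc, ["keyword", "number"]):
    print(key)
```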

Value sources

The sources parameter defines the source fields to use when building composite buckets. The order in which the sources are defined controls the order in which the keys are returned.

You must use a unique name when defining sources.

The sources parameter can be any of the following types:

Terms

The terms value source is similar to a simple terms aggregation. The values are extracted from a field exactly like in the terms aggregation.

Example:

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "product": {
   "terms": {
   "field": "product"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   product: {
   terms: {
   field: 'product'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   product: {
   terms: {
   field: "product",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "product": { "terms": { "field": "product" } } }
   ]
   }
   }
  }
}

Like the terms aggregation, it is possible to use a runtime field to create values for the composite buckets:

Python

resp = client.search(
   runtime_mappings={
   "day_of_week": {
   "type": "keyword",
   "script": "\n        emit(doc['timestamp'].value.dayOfWeekEnum\n          .getDisplayName(TextStyle.FULL, Locale.ENGLISH))\n      "
   }
   },
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "dow": {
   "terms": {
   "field": "day_of_week"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Js

const response = await client.search({
  runtime_mappings: {
   day_of_week: {
   type: "keyword",
   script:
   "\n        emit(doc['timestamp'].value.dayOfWeekEnum\n          .getDisplayName(TextStyle.FULL, Locale.ENGLISH))\n      ",
   },
  },
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   dow: {
   terms: {
   field: "day_of_week",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "runtime_mappings": {
   "day_of_week": {
   "type": "keyword",
   "script": """
   emit(doc['timestamp'].value.dayOfWeekEnum
   .getDisplayName(TextStyle.FULL, Locale.ENGLISH))
   """
   }
  },
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "dow": {
   "terms": { "field": "day_of_week" }
   }
   }
   ]
   }
   }
  }
}

Although similar, the terms value source doesn't support the same set of parameters as the terms aggregation. For other supported value source parameters, see Order and Missing bucket.

Histogram

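The histogram value source can be applied to numeric values to build fixed-size intervals over the values. The interval parameter defines how the numeric values should be transformed. For instance, an interval set to 5 translates any numeric value to the interval that contains it: a value of 101 is translated to 100, which is the key for the interval between 100 and 105.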

Example:

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "histo": {
   "histogram": {
   "field": "price",
   "interval": 5
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   histo: {
   histogram: {
   field: 'price',
   interval: 5
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   histo: {
   histogram: {
   field: "price",
   interval: 5,
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "histo": { "histogram": { "field": "price", "interval": 5 } } }
   ]
   }
   }
  }
}
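The interval translation described for the histogram source is equivalent to `floor(value / interval) * interval`. Below is a minimal Python sketch. It assumes floor-based rounding, so negative values also round down. `histogram_key` is a hypothetical name used only for illustration.

```python
import math


def histogram_key(value: float, interval: float) -> float:
    """Round a numeric value down to the start of the interval containing it."""
    return math.floor(value / interval) * interval


print(histogram_key(101, 5))  # 100
print(histogram_key(-1, 5))   # -5
```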

Like the histogram aggregation, it is possible to use a runtime field to create values for the composite buckets:

Python

resp = client.search(
   runtime_mappings={
   "price.discounted": {
   "type": "double",
   "script": "\n        double price = doc['price'].value;\n        if (doc['product'].value == 'mad max') {\n          price *= 0.8;\n        }\n        emit(price);\n      "
   }
   },
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "price": {
   "histogram": {
   "interval": 5,
   "field": "price.discounted"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   runtime_mappings: {
   'price.discounted' => {
   type: 'double',
   script: "\n        double price = doc['price'].value;\n        if (doc['product'].value == 'mad max') {\n          price *= 0.8;\n        }\n        emit(price);\n      "
   }
   },
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   price: {
   histogram: {
   interval: 5,
   field: 'price.discounted'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  runtime_mappings: {
   "price.discounted": {
   type: "double",
   script:
   "\n        double price = doc['price'].value;\n        if (doc['product'].value == 'mad max') {\n          price *= 0.8;\n        }\n        emit(price);\n      ",
   },
  },
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   price: {
   histogram: {
   interval: 5,
   field: "price.discounted",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "runtime_mappings": {
   "price.discounted": {
   "type": "double",
   "script": """
   double price = doc['price'].value;
   if (doc['product'].value == 'mad max') {
   price *= 0.8;
   }
   emit(price);
   """
   }
  },
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "price": {
   "histogram": {
   "interval": 5,
   "field": "price.discounted"
   }
   }
   }
   ]
   }
   }
  }
}

Date histogram

The date_histogram is similar to the histogram value source, except that the interval is specified by a date/time expression:

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d" } } }
   ]
   }
   }
  }
}

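The example above creates an interval per day and translates all timestamp values to the start of the interval that contains them. Available expressions for interval: year, quarter, month, week, day, hour, minute, second.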

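Time values can also be specified via abbreviations supported by time units parsing. Note that fractional time values are not supported, but you can work around this by shifting to another time unit (for example, 1.5h can be specified as 90m instead).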
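For a fixed `1d` interval in UTC, finding the start of a timestamp's interval is integer arithmetic on epoch milliseconds. This Python sketch assumes UTC with no time_zone or offset. Calendar units such as month or quarter vary in length, so they cannot be rounded with a single modulo. `utc_day_key` is a hypothetical helper.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # fixed-length day in milliseconds (UTC, no DST)


def utc_day_key(epoch_ms: int) -> int:
    """Round epoch milliseconds down to 00:00 UTC of the same day."""
    # Python's % is floor-based, so this also works for pre-1970 timestamps.
    return epoch_ms - epoch_ms % MS_PER_DAY


# 1494291600000 is 2017-05-09T01:00:00Z; its day starts at 1494288000000.
print(utc_day_key(1494291600000))  # 1494288000000
```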

Format

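Internally, a date is represented as a 64-bit number: a timestamp in milliseconds since the epoch. These timestamps are returned as the bucket keys. You can return a formatted date string instead by specifying a format with the format parameter: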

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d",
   "format": "yyyy-MM-dd"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d',
   format: 'yyyy-MM-dd'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   format: "yyyy-MM-dd",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d",
   "format": "yyyy-MM-dd"
   }
   }
   }
   ]
   }
   }
  }
}


	Supports expressive date format patterns.
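The format parameter only changes how a bucket key is rendered. Bucketing itself still happens on the epoch-millisecond value. To check which date a numeric key stands for, a Python sketch can convert it in UTC. Python's `%Y-%m-%d` is the `strftime` counterpart of the `yyyy-MM-dd` pattern, not the same pattern syntax. `format_day_key` is a hypothetical helper.

```python
from datetime import datetime, timezone


def format_day_key(epoch_ms: int) -> str:
    """Render an epoch-millisecond bucket key as a UTC yyyy-MM-dd string."""
    # strftime's %Y-%m-%d corresponds to the Java-style pattern yyyy-MM-dd.
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


print(format_day_key(1494288000000))  # 2017-05-09
```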

Time zone

Date-times are stored in Elasticsearch in UTC. By default, all bucketing and rounding is also done in UTC. Use the time_zone parameter to bucket in a different time zone.

Time zones may be specified either as an ISO 8601 UTC offset (for example, +01:00 or -08:00) or as an identifier used in the TZ database, such as America/Los_Angeles.
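The same instant can fall on different calendar days depending on the time zone used for bucketing. The Python sketch below uses fixed UTC offsets. That is an assumption: it ignores the daylight-saving transitions a TZ database identifier such as America/Los_Angeles would apply. `day_start` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def day_start(ts: datetime, tz: timezone) -> datetime:
    """Start of the calendar day containing `ts`, as seen in time zone `tz`."""
    local = ts.astimezone(tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


ts = datetime(2015, 10, 1, 5, 30, tzinfo=timezone.utc)
print(day_start(ts, timezone.utc).isoformat())                   # 2015-10-01T00:00:00+00:00
print(day_start(ts, timezone(timedelta(hours=-8))).isoformat())  # 2015-09-30T00:00:00-08:00
```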

Offset

Use the offset parameter to shift the start value of each bucket by the specified positive (+) or negative (-) duration, such as 1h for an hour or 1d for a day. See Time units for more duration options.

For example, with an interval of day, each bucket runs from midnight to midnight. Setting the offset parameter to +6h changes each bucket to run from 6am to 6am:

Python

resp = client.index(
   index="my-index-000001",
   id="1",
   refresh=True,
   document={
   "date": "2015-10-01T05:30:00Z"
   },
)
print(resp)
resp1 = client.index(
   index="my-index-000001",
   id="2",
   refresh=True,
   document={
   "date": "2015-10-01T06:30:00Z"
   },
)
print(resp1)
resp2 = client.search(
   index="my-index-000001",
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "date": {
   "date_histogram": {
   "field": "date",
   "calendar_interval": "day",
   "offset": "+6h",
   "format": "iso8601"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp2)

Ruby

response = client.index(
  index: 'my-index-000001',
  id: 1,
  refresh: true,
  body: {
   date: '2015-10-01T05:30:00Z'
  }
)
puts response
response = client.index(
  index: 'my-index-000001',
  id: 2,
  refresh: true,
  body: {
   date: '2015-10-01T06:30:00Z'
  }
)
puts response
response = client.search(
  index: 'my-index-000001',
  size: 0,
  body: {
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: 'date',
   calendar_interval: 'day',
   offset: '+6h',
   format: 'iso8601'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.index({
  index: "my-index-000001",
  id: 1,
  refresh: "true",
  document: {
   date: "2015-10-01T05:30:00Z",
  },
});
console.log(response);
const response1 = await client.index({
  index: "my-index-000001",
  id: 2,
  refresh: "true",
  document: {
   date: "2015-10-01T06:30:00Z",
  },
});
console.log(response1);
const response2 = await client.search({
  index: "my-index-000001",
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: "date",
   calendar_interval: "day",
   offset: "+6h",
   format: "iso8601",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response2);

Console

PUT my-index-000001/_doc/1?refresh
{
  "date": "2015-10-01T05:30:00Z"
}
PUT my-index-000001/_doc/2?refresh
{
  "date": "2015-10-01T06:30:00Z"
}
GET my-index-000001/_search?size=0
{
  "aggs": {
   "my_buckets": {
   "composite" : {
   "sources" : [
   {
   "date": {
   "date_histogram" : {
   "field": "date",
   "calendar_interval": "day",
   "offset": "+6h",
   "format": "iso8601"
   }
   }
   }
   ]
   }
   }
  }
}

Instead of a single bucket starting at midnight, the above request groups the documents into buckets starting at 6am:

Console-Result

{
  ...
  "aggregations": {
   "my_buckets": {
   "after_key": { "date": "2015-10-01T06:00:00.000Z" },
   "buckets": [
   {
   "key": { "date": "2015-09-30T06:00:00.000Z" },
   "doc_count": 1
   },
   {
   "key": { "date": "2015-10-01T06:00:00.000Z" },
   "doc_count": 1
   }
   ]
   }
  }
}

The start offset of each bucket is calculated after time_zone adjustments have been made.
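The offset example above can be reproduced in three steps: shift each timestamp back by the offset, truncate it to midnight, then shift it forward again. The time zone conversion is applied first. The Python sketch below works under those assumptions (fixed-offset time zones, 1-day interval). `day_bucket_with_offset` is hypothetical and is not Elasticsearch's rounding code.

```python
from datetime import datetime, timedelta, timezone


def day_bucket_with_offset(ts: datetime, offset: timedelta, tz: timezone = timezone.utc) -> datetime:
    """Start of the 1-day bucket containing `ts`: time zone first, then offset."""
    local = ts.astimezone(tz)
    # Shift back by the offset, truncate to local midnight, then shift forward again.
    shifted = local - offset
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    return (midnight + offset).astimezone(timezone.utc)


six_hours = timedelta(hours=6)
for ts in (datetime(2015, 10, 1, 5, 30, tzinfo=timezone.utc),
           datetime(2015, 10, 1, 6, 30, tzinfo=timezone.utc)):
    print(ts.isoformat(), "->", day_bucket_with_offset(ts, six_hours).isoformat())
```

Both timestamps land in the same buckets as in the response shown above.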

GeoTile grid

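The geotile_grid value source works on geo_point fields and groups points into buckets that represent cells in a grid. The resulting grid is sparse: it only contains cells that have matching data. Each cell corresponds to a map tile as used by many online map sites. Each cell is labeled using a "{zoom}/{x}/{y}" format, where zoom is equal to the user-specified precision.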

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "tile": {
   "geotile_grid": {
   "field": "location",
   "precision": 8
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   tile: {
   geotile_grid: {
   field: 'location',
   precision: 8
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   tile: {
   geotile_grid: {
   field: "location",
   precision: 8,
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "tile": { "geotile_grid": { "field": "location", "precision": 8 } } }
   ]
   }
   }
  }
}
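The `{zoom}/{x}/{y}` labels follow the standard web-map (slippy map) tiling scheme. The Python sketch below applies the well-known formula for that scheme. Treat it as an approximation: Elasticsearch's own implementation may differ at tile edges and near the poles. `geotile_key` is a hypothetical helper.

```python
import math


def geotile_key(lat: float, lon: float, zoom: int) -> str:
    """Compute a "{zoom}/{x}/{y}" web-map tile label for a point."""
    n = 2 ** zoom  # the grid is n x n tiles at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    # Web Mercator projection of the latitude, scaled to the tile grid.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon == 180 and latitudes beyond the Mercator limit stay on the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"


print(geotile_key(52.37, 4.9, 1))  # 1/1/0
```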

Precision

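The highest-precision geotile, of length 29, produces cells that cover less than 10cm by 10cm of land. This precision is uniquely suited to composite aggregations, because the tiles do not have to be generated and loaded in memory.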

See the Zoom level documentation for how precision (zoom) correlates to size on the ground. Precision for this aggregation can be between 0 and 29, inclusive.
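The "less than 10cm" figure for precision 29 can be checked with a quick calculation. At zoom z the world is 2^z tiles wide, so a tile's width is the Earth's circumference divided by 2^z, scaled by cos(latitude) away from the equator. The circumference constant is an assumption: it is the spherical value commonly used for Web Mercator.

```python
import math

EARTH_CIRCUMFERENCE_M = 40_075_016.686  # assumed: equatorial circumference used by Web Mercator


def tile_width_m(zoom: int, lat_deg: float = 0.0) -> float:
    """Approximate ground width of one tile at `zoom` and latitude `lat_deg`."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / 2 ** zoom


print(f"{tile_width_m(29) * 100:.2f} cm")  # 7.46 cm at the equator
```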

Bounding box filtering

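The geotile source can optionally be constrained to a specific geo bounding box, which reduces the range of tiles used. These bounds are useful when only a specific part of a geographical area needs high-precision tiling.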

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "tile": {
   "geotile_grid": {
   "field": "location",
   "precision": 22,
   "bounds": {
   "top_left": "POINT (4.9 52.4)",
   "bottom_right": "POINT (5.0 52.3)"
   }
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   tile: {
   geotile_grid: {
   field: 'location',
   precision: 22,
   bounds: {
   top_left: 'POINT (4.9 52.4)',
   bottom_right: 'POINT (5.0 52.3)'
   }
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   tile: {
   geotile_grid: {
   field: "location",
   precision: 22,
   bounds: {
   top_left: "POINT (4.9 52.4)",
   bottom_right: "POINT (5.0 52.3)",
   },
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "tile": {
   "geotile_grid": {
   "field": "location",
   "precision": 22,
   "bounds": {
   "top_left": "POINT (4.9 52.4)",
   "bottom_right": "POINT (5.0 52.3)"
   }
   }
   }
   }
   ]
   }
   }
  }
}
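The bounds above are WKT points, which list longitude before latitude. The simplified Python sketch below tests whether a single point falls inside the box. It ignores boxes that cross the antimeridian. Elasticsearch restricts which tiles are generated rather than testing points one by one, and this sketch does not model that. `parse_wkt_point` and `in_bounds` are hypothetical helpers.

```python
def parse_wkt_point(wkt: str) -> tuple:
    """Parse 'POINT (lon lat)' into (lon, lat); WKT puts longitude first."""
    inner = wkt.strip()[len("POINT"):].strip().strip("()")
    lon, lat = (float(part) for part in inner.split())
    return lon, lat


def in_bounds(lon: float, lat: float, top_left: str, bottom_right: str) -> bool:
    """True if (lon, lat) lies inside the box given by two WKT corner points."""
    left, top = parse_wkt_point(top_left)
    right, bottom = parse_wkt_point(bottom_right)
    return left <= lon <= right and bottom <= lat <= top


print(in_bounds(4.95, 52.35, "POINT (4.9 52.4)", "POINT (5.0 52.3)"))  # True
```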

Mixing different value sources

The sources parameter accepts an array of value sources. You can mix different value sources to create composite buckets. For example:

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d"
   }
   }
   },
   {
   "product": {
   "terms": {
   "field": "product"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d'
   }
   }
   },
   {
   product: {
   terms: {
   field: 'product'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   },
   },
   },
   {
   product: {
   terms: {
   field: "product",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d" } } },
   { "product": { "terms": { "field": "product" } } }
   ]
   }
   }
  }
}

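This creates composite buckets from the values created by two value sources, a date_histogram and a terms. Each bucket is composed of one value for each value source defined in the aggregation. Any combination of types is allowed, and the order in the array is preserved in the composite buckets.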

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "shop": {
   "terms": {
   "field": "shop"
   }
   }
   },
   {
   "product": {
   "terms": {
   "field": "product"
   }
   }
   },
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   shop: {
   terms: {
   field: 'shop'
   }
   }
   },
   {
   product: {
   terms: {
   field: 'product'
   }
   }
   },
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   shop: {
   terms: {
   field: "shop",
   },
   },
   },
   {
   product: {
   terms: {
   field: "product",
   },
   },
   },
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "shop": { "terms": { "field": "shop" } } },
   { "product": { "terms": { "field": "product" } } },
   { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d" } } }
   ]
   }
   }
  }
}
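Counting documents per composite key for mixed sources can be sketched in Python. Each source is modeled as a hypothetical extractor function that returns the values a document contributes. It returns a list, so multi-valued fields also work. Each key tuple keeps the source order. This is an illustration, not how Elasticsearch computes buckets, and the sample documents are made up.

```python
from collections import Counter
from itertools import product

MS_PER_DAY = 86_400_000


def count_composite_buckets(docs, sources):
    """Count documents per composite key.

    `sources` is a list of (name, extractor) pairs in source order; each
    extractor returns the list of values a document contributes.
    """
    counts = Counter()
    for doc in docs:
        per_source = [extract(doc) for _, extract in sources]
        if any(not values for values in per_source):
            continue  # missing value for some source: ignored by default
        # set() so a document is counted once per distinct combination.
        for combo in set(product(*per_source)):
            counts[combo] += 1
    return counts


# Hypothetical sources mirroring the shop / product / date example above.
SOURCES = [
    ("shop", lambda d: [d["shop"]] if "shop" in d else []),
    ("product", lambda d: [d["product"]] if "product" in d else []),
    ("date", lambda d: [d["timestamp"] - d["timestamp"] % MS_PER_DAY] if "timestamp" in d else []),
]

docs = [  # made-up documents
    {"shop": "s1", "product": "p1", "timestamp": 1494288000000 + 5},
    {"shop": "s1", "product": "p1", "timestamp": 1494288000000 + 9},
    {"product": "p2", "timestamp": 1494201600000},  # no shop: skipped
]
print(count_composite_buckets(docs, SOURCES))
```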

Order

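By default, the composite buckets are sorted by their natural ordering, that is, in ascending order of their values. When multiple value sources are requested, the ordering is done per value source. The first value of a composite bucket is compared to the first value of the other composite bucket, and if they are equal, the next values in the composite bucket are used for tie-breaking. This means that the composite bucket [foo, 100] is considered smaller than [foobar, 0], because foo is considered smaller than foobar. You can define the direction of the sort for each value source by setting order to asc (the default) or desc (descending) directly in the value source definition. For example: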

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d",
   "order": "desc"
   }
   }
   },
   {
   "product": {
   "terms": {
   "field": "product",
   "order": "asc"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d',
   order: 'desc'
   }
   }
   },
   {
   product: {
   terms: {
   field: 'product',
   order: 'asc'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   order: "desc",
   },
   },
   },
   {
   product: {
   terms: {
   field: "product",
   order: "asc",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } },
   { "product": { "terms": { "field": "product", "order": "asc" } } }
   ]
   }
   }
  }
}

… will sort the composite buckets in descending order when comparing values from the date_histogram source, and in ascending order when comparing values from the terms source.
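The per-source ordering rule can be sketched in Python. Plain tuple comparison reproduces the natural order, so `("foo", 100) < ("foobar", 0)`. A sequence of stable sorts then applies one direction per source. `sort_composite_keys` is a hypothetical helper.

```python
from operator import itemgetter


def sort_composite_keys(keys, orders):
    """Sort composite keys (tuples in source order) with one direction per source.

    Python's sort is stable, so sorting by the last source first and the first
    source last reproduces the source-by-source comparison with tie-breaking.
    """
    result = list(keys)
    for i in reversed(range(len(orders))):
        result.sort(key=itemgetter(i), reverse=(orders[i] == "desc"))
    return result


# Natural (ascending) order compares the first values first.
print(("foo", 100) < ("foobar", 0))  # True

keys = [(1494201600000, "rocky"), (1494288000000, "mad max"), (1494288000000, "apocalypse now")]
print(sort_composite_keys(keys, ["desc", "asc"]))
```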

Missing bucket

By default, documents without a value for a given source are ignored. You can include them in the response by setting missing_bucket to true (it defaults to false):

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "product_name": {
   "terms": {
   "field": "product",
   "missing_bucket": True,
   "missing_order": "last"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   product_name: {
   terms: {
   field: 'product',
   missing_bucket: true,
   missing_order: 'last'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   product_name: {
   terms: {
   field: "product",
   missing_bucket: true,
   missing_order: "last",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [{
   "product_name": {
   "terms": {
   "field": "product",
   "missing_bucket": true,
   "missing_order": "last"
   }
   }
   }]
   }
   }
  }
}

In the above example, the product_name source emits an explicit null bucket for documents without a product value. This bucket is placed last.

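You can control the position of the null bucket using the missing_order parameter. If missing_order is first or last, the null bucket is placed in the first or last position, respectively. If missing_order is omitted or default, the source's order determines the bucket's position: if order is asc (ascending), the bucket is in the first position; if order is desc (descending), it is in the last position.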
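The placement rules for the null bucket fit in a small Python function. This sketch covers the rules stated above for a single source, with `None` standing in for the null key. `sort_with_missing` is a hypothetical helper.

```python
def sort_with_missing(values, order="asc", missing_order="default"):
    """Order one source's bucket keys, placing the null (None) bucket per missing_order."""
    present = sorted((v for v in values if v is not None), reverse=(order == "desc"))
    nulls = [None] if any(v is None for v in values) else []
    if missing_order == "default":
        # No explicit position: asc puts the null bucket first, desc puts it last.
        missing_order = "first" if order == "asc" else "last"
    return nulls + present if missing_order == "first" else present + nulls


print(sort_with_missing(["rocky", None, "mad max"]))                        # [None, 'mad max', 'rocky']
print(sort_with_missing(["rocky", None, "mad max"], order="desc"))          # ['rocky', 'mad max', None]
print(sort_with_missing(["rocky", None, "mad max"], missing_order="last"))  # ['mad max', 'rocky', None]
```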

Size

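The size parameter defines how many composite buckets are returned. Each composite bucket is considered a single bucket, so setting a size of 10 returns the first 10 composite buckets created from the value sources. In the response, each composite bucket has a key object that maps each value source name to the value extracted from that source. Defaults to 10.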

Pagination

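If the number of composite buckets is too high (or unknown) to be returned in a single response, you can split the retrieval into multiple requests. Since composite buckets are flat by nature, the requested size is exactly the number of composite buckets returned in the response, assuming there are at least size composite buckets to return. If all composite buckets should be retrieved, it is preferable to use a small size (100 or 1000, for instance) and then use the after parameter to retrieve the next results. For example: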

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "size": 2,
   "sources": [
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d"
   }
   }
   },
   {
   "product": {
   "terms": {
   "field": "product"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   size: 2,
   sources: [
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d'
   }
   }
   },
   {
   product: {
   terms: {
   field: 'product'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   size: 2,
   sources: [
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   },
   },
   },
   {
   product: {
   terms: {
   field: "product",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "size": 2,
   "sources": [
   { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d" } } },
   { "product": { "terms": { "field": "product" } } }
   ]
   }
   }
  }
}

… returns:

Console-Result

{
  ...
  "aggregations": {
   "my_buckets": {
   "after_key": {
   "date": 1494288000000,
   "product": "mad max"
   },
   "buckets": [
   {
   "key": {
   "date": 1494201600000,
   "product": "rocky"
   },
   "doc_count": 1
   },
   {
   "key": {
   "date": 1494288000000,
   "product": "mad max"
   },
   "doc_count": 2
   }
   ]
   }
  }
}

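To get the next set of buckets, resend the same aggregation with the after parameter set to the after_key value returned in the response. For example, this request uses the after_key value provided in the previous response: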

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "size": 2,
   "sources": [
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d"
   }
   }
   },
   {
   "product": {
   "terms": {
   "field": "product"
   }
   }
   }
   ],
   "after": {
   "date": 1494288000000,
   "product": "mad max"
   }
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   size: 2,
   sources: [
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d'
   }
   }
   },
   {
   product: {
   terms: {
   field: 'product'
   }
   }
   }
   ],
   after: {
   date: 1_494_288_000_000,
   product: 'mad max'
   }
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   size: 2,
   sources: [
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   },
   },
   },
   {
   product: {
   terms: {
   field: "product",
   },
   },
   },
   ],
   after: {
   date: 1494288000000,
   product: "mad max",
   },
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "size": 2,
   "sources": [
   { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d" } } },
   { "product": { "terms": { "field": "product" } } }
   ],
   "after": { "date": 1494288000000, "product": "mad max" }
   }
   }
  }
}


	Restricts the aggregation to buckets that sort after the provided values.

The after_key is usually the key of the last bucket returned in the response, but that isn't guaranteed. Always use the returned after_key instead of deriving it from the buckets.
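The pagination loop described above can be sketched with the Python client. `iter_composite_buckets` is a hypothetical helper. It assumes an elasticsearch-py 8.x `Elasticsearch` instance, or any object whose `search` method accepts the same keyword arguments. It always resumes from the returned `after_key` and stops when a page comes back empty. It also sets `track_total_hits=False`, as recommended under early termination.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_composite_buckets(
    client: Any,
    index: str,
    sources: List[Dict[str, Any]],
    page_size: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield every composite bucket, one page at a time."""
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {"size": page_size, "sources": sources}
        if after is not None:
            # Resume strictly after the key returned with the previous page.
            composite["after"] = after
        resp = client.search(
            index=index,
            size=0,                  # no hits, only aggregation results
            track_total_hits=False,  # avoid recounting hits on every page
            aggs={"my_buckets": {"composite": composite}},
        )
        agg = resp["aggregations"]["my_buckets"]
        buckets = agg["buckets"]
        yield from buckets
        # Use the returned after_key; do not derive it from the last bucket.
        after = agg.get("after_key")
        if not buckets or after is None:
            break


# Usage (requires a running cluster):
# from elasticsearch import Elasticsearch
# client = Elasticsearch("http://localhost:9200")
# for bucket in iter_composite_buckets(
#         client, "my-index-000001", [{"product": {"terms": {"field": "product"}}}]):
#     print(bucket["key"], bucket["doc_count"])
```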

Early termination

For optimal performance, set an index sort on the index that fully or partially matches the source order of the composite aggregation. For instance, the following index sort:

Python

resp = client.indices.create(
   index="my-index-000001",
   settings={
   "index": {
   "sort.field": [
   "username",
   "timestamp"
   ],
   "sort.order": [
   "asc",
   "desc"
   ]
   }
   },
   mappings={
   "properties": {
   "username": {
   "type": "keyword",
   "doc_values": True
   },
   "timestamp": {
   "type": "date"
   }
   }
   },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
   settings: {
   index: {
   'sort.field' => [
   'username',
   'timestamp'
   ],
   'sort.order' => [
   'asc',
   'desc'
   ]
   }
   },
   mappings: {
   properties: {
   username: {
   type: 'keyword',
   doc_values: true
   },
   timestamp: {
   type: 'date'
   }
   }
   }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
   index: {
   "sort.field": ["username", "timestamp"],
   "sort.order": ["asc", "desc"],
   },
  },
  mappings: {
   properties: {
   username: {
   type: "keyword",
   doc_values: true,
   },
   timestamp: {
   type: "date",
   },
   },
  },
});
console.log(response);

Console

PUT my-index-000001
{
  "settings": {
   "index": {
   "sort.field": [ "username", "timestamp" ],
   "sort.order": [ "asc", "desc" ]
   }
  },
  "mappings": {
   "properties": {
   "username": {
   "type": "keyword",
   "doc_values": true
   },
   "timestamp": {
   "type": "date"
   }
   }
  }
}


	This index is sorted by `username` first, then by `timestamp`.
	… in ascending order on the `username` field and in descending order on the `timestamp` field.

This index sort could be used to optimize these composite aggregations:

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "user_name": {
   "terms": {
   "field": "user_name"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   user_name: {
   terms: {
   field: 'user_name'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   user_name: {
   terms: {
   field: "user_name",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "user_name": { "terms": { "field": "user_name" } } }
   ]
   }
   }
  }
}


	`user_name` is a prefix of the index sort and the order matches (`asc`).

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "user_name": {
   "terms": {
   "field": "user_name"
   }
   }
   },
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d",
   "order": "desc"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   user_name: {
   terms: {
   field: 'user_name'
   }
   }
   },
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d',
   order: 'desc'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   user_name: {
   terms: {
   field: "user_name",
   },
   },
   },
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   order: "desc",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "user_name": { "terms": { "field": "user_name" } } },
   { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
   ]
   }
   }
  }
}


	`user_name` is a prefix of the index sort and the order matches (`asc`).
	`timestamp` also matches the prefix and the order matches (`desc`).

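To optimize early termination, it is advisable to set track_total_hits to false in the request. The total number of hits that match the request can be retrieved on the first request, and computing this number on every page would be costly: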

Python

resp = client.search(
   size=0,
   track_total_hits=False,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "user_name": {
   "terms": {
   "field": "user_name"
   }
   }
   },
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d",
   "order": "desc"
   }
   }
   }
   ]
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   track_total_hits: false,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   user_name: {
   terms: {
   field: 'user_name'
   }
   }
   },
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d',
   order: 'desc'
   }
   }
   }
   ]
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  track_total_hits: false,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   user_name: {
   terms: {
   field: "user_name",
   },
   },
   },
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   order: "desc",
   },
   },
   },
   ],
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "user_name": { "terms": { "field": "user_name" } } },
   { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
   ]
   }
   }
  }
}

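The order of the sources is important. In the examples above, swapping user_name and timestamp would deactivate the sort optimization, because that configuration would no longer match the index sort specification. If the order of the sources doesn't matter for your use case, you can follow these simple guidelines: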

Put the fields with the highest cardinality first.
Make sure that the order of the fields matches the order of the index sort.
Put multi-valued fields last, since they cannot be used for early termination.
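The prefix rule can be expressed as a small check. The Python sketch below is a simplified, hypothetical model of that rule, not Elasticsearch's actual optimizer. It compares (field, order) pairs position by position against the index sort and reports how many leading sources line up. It ignores value-source types, multi-valued fields, and missing_bucket.

```python
from typing import List, Tuple


def matching_sort_prefix(
    sources: List[Tuple[str, str]],
    sort_fields: List[str],
    sort_orders: List[str],
) -> int:
    """Count how many leading (field, order) sources line up with the index sort.

    0 means the source order does not follow the index sort at all.
    """
    matched = 0
    for (field, order), sort_field, sort_order in zip(sources, sort_fields, sort_orders):
        if field != sort_field or order != sort_order:
            break
        matched += 1
    return matched


INDEX_SORT = (["username", "timestamp"], ["asc", "desc"])
print(matching_sort_prefix([("username", "asc"), ("timestamp", "desc")], *INDEX_SORT))  # 2
print(matching_sort_prefix([("timestamp", "desc"), ("username", "asc")], *INDEX_SORT))  # 0
```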

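Index sorting can slow down indexing, so it is very important to test index sorting with your specific use case and dataset to ensure that it meets your requirements. If it doesn't, note that composite aggregations will also try to terminate early on non-sorted indices if the query matches all documents (a match_all query).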

Sub-aggregations

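Like any multi-bucket aggregation, the composite aggregation can hold sub-aggregations. These sub-aggregations can be used to compute other buckets or statistics on each composite bucket created by this parent aggregation. For instance, the following example computes the average value of a field per composite bucket: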

Python

resp = client.search(
   size=0,
   aggs={
   "my_buckets": {
   "composite": {
   "sources": [
   {
   "date": {
   "date_histogram": {
   "field": "timestamp",
   "calendar_interval": "1d",
   "order": "desc"
   }
   }
   },
   {
   "product": {
   "terms": {
   "field": "product"
   }
   }
   }
   ]
   },
   "aggregations": {
   "the_avg": {
   "avg": {
   "field": "price"
   }
   }
   }
   }
   },
)
print(resp)

Ruby

response = client.search(
  body: {
   size: 0,
   aggregations: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: 'timestamp',
   calendar_interval: '1d',
   order: 'desc'
   }
   }
   },
   {
   product: {
   terms: {
   field: 'product'
   }
   }
   }
   ]
   },
   aggregations: {
   the_avg: {
   avg: {
   field: 'price'
   }
   }
   }
   }
   }
  }
)
puts response

Js

const response = await client.search({
  size: 0,
  aggs: {
   my_buckets: {
   composite: {
   sources: [
   {
   date: {
   date_histogram: {
   field: "timestamp",
   calendar_interval: "1d",
   order: "desc",
   },
   },
   },
   {
   product: {
   terms: {
   field: "product",
   },
   },
   },
   ],
   },
   aggregations: {
   the_avg: {
   avg: {
   field: "price",
   },
   },
   },
   },
  },
});
console.log(response);

Console

GET /_search
{
  "size": 0,
  "aggs": {
   "my_buckets": {
   "composite": {
   "sources": [
   { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } },
   { "product": { "terms": { "field": "product" } } }
   ]
   },
   "aggregations": {
   "the_avg": {
   "avg": { "field": "price" }
   }
   }
   }
  }
}

… returns:

Console-Result

{
  ...
  "aggregations": {
   "my_buckets": {
   "after_key": {
   "date": 1494201600000,
   "product": "rocky"
   },
   "buckets": [
   {
   "key": {
   "date": 1494460800000,
   "product": "apocalypse now"
   },
   "doc_count": 1,
   "the_avg": {
   "value": 10.0
   }
   },
   {
   "key": {
   "date": 1494374400000,
   "product": "mad max"
   },
   "doc_count": 1,
   "the_avg": {
   "value": 27.0
   }
   },
   {
   "key": {
   "date": 1494288000000,
   "product": "mad max"
   },
   "doc_count": 2,
   "the_avg": {
   "value": 22.5
   }
   },
   {
   "key": {
   "date": 1494201600000,
   "product": "rocky"
   },
   "doc_count": 1,
   "the_avg": {
   "value": 10.0
   }
   }
   ]
   }
  }
}
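What the avg sub-aggregation computes per composite bucket can be reproduced locally. The Python sketch below groups documents by a (day, product) key and averages price within each group. The documents and prices are made up for illustration and are not the data behind the response above. `avg_per_composite_bucket` is a hypothetical helper.

```python
from collections import defaultdict

MS_PER_DAY = 86_400_000


def avg_per_composite_bucket(docs):
    """Group docs by (UTC day, product) and average their price in each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        key = (doc["timestamp"] - doc["timestamp"] % MS_PER_DAY, doc["product"])
        totals[key] += doc["price"]
        counts[key] += 1
    return {key: {"doc_count": counts[key], "the_avg": totals[key] / counts[key]} for key in totals}


# Hypothetical documents; the prices are made up for illustration.
docs = [
    {"timestamp": 1494288000000 + 1000, "product": "widget", "price": 20.0},
    {"timestamp": 1494288000000 + 2000, "product": "widget", "price": 25.0},
    {"timestamp": 1494201600000, "product": "gadget", "price": 10.0},
]
print(avg_per_composite_bucket(docs))
```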

Pipeline aggregations

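The composite aggregation is not currently compatible with pipeline aggregations, and combining them would not make sense in most cases. For example, because of the paging nature of composite aggregations, a single logical partition (one day, for example) might be spread over multiple pages. Since pipeline aggregations are purely post-processing on the final list of buckets, running something like a derivative on a composite page could produce inaccurate results, because it only takes into account the "partial" result on that page.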

Pipeline aggregations that are self-contained within a single bucket (such as bucket_selector) might be supported in the future.