Metrics aggregations - Top hits - Elasticsearch Guide v8.15

Top hits aggregation
Options
Supported per-hit features
Example
Field collapsing example
top_hits support in a nested or reverse_nested aggregator
Use in pipeline aggregations

Top hits aggregation

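A top_hits metric aggregator keeps track of the most relevant documents being aggregated. This aggregator is intended to be used as a sub-aggregator, so that the top matching documents can be aggregated per bucket.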

We do not recommend using top_hits as a top-level aggregation. If you want to group search hits, use the collapse parameter instead.
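As a rough illustration of the collapse alternative, the sketch below builds keyword arguments for a search request that collapses hits on a field instead of using a top-level top_hits aggregation. The helper name `collapse_search_kwargs`, the `domain` field and the `elections` query are illustrative assumptions; with the official Python client, the result would be passed as `client.search(**kwargs)`.

```python
def collapse_search_kwargs(index, field, query, size=10):
    """Build search kwargs that return one top hit per distinct `field` value.

    Hypothetical helper: the collapse parameter groups hits by `field`
    and keeps only the top hit of each group (by score unless a sort is given).
    """
    return {
        "index": index,
        "query": query,
        "collapse": {"field": field},
        "size": size,
    }


kwargs = collapse_search_kwargs("sales", "domain", {"match": {"body": "elections"}})
print(kwargs["collapse"])
```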

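The top_hits aggregator can effectively be used to group result sets by certain fields via a bucket aggregator. One or more bucket aggregators determine by which properties a result set gets sliced into groups.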
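To make the bucketing idea concrete, here is a minimal in-memory sketch of what a terms bucket aggregator wrapping top_hits computes: documents are sliced into buckets by one property, and each bucket keeps its best matching documents. This is plain Python over a made-up list of sales documents, not how Elasticsearch executes the aggregation; the function name is hypothetical.

```python
from collections import defaultdict


def terms_with_top_hits(docs, field, sort_key, size=3, reverse=True):
    """Group docs by `field`, then keep the top `size` docs per bucket.

    Buckets are ordered by document count (descending), ties by key
    (ascending), approximating the terms aggregation's default order.
    """
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)

    result = []
    for key, members in sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        top = sorted(members, key=sort_key, reverse=reverse)[:size]
        result.append({"key": key, "doc_count": len(members), "hits": top})
    return result


# Made-up sample data; ISO dates sort correctly as strings.
docs = [
    {"type": "hat", "date": "2015-01-01", "price": 200},
    {"type": "t-shirt", "date": "2015-01-01", "price": 10},
    {"type": "bag", "date": "2015-01-01", "price": 150},
    {"type": "t-shirt", "date": "2015-02-01", "price": 10},
    {"type": "hat", "date": "2015-02-01", "price": 50},
    {"type": "hat", "date": "2015-03-01", "price": 200},
    {"type": "t-shirt", "date": "2015-03-01", "price": 175},
]
# Latest sale per type: sort by date descending, keep one hit per bucket.
buckets = terms_with_top_hits(docs, "type", sort_key=lambda d: d["date"], size=1)
for b in buckets:
    print(b["key"], b["doc_count"], b["hits"][0]["price"])
```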

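Options

from - The offset from the first result you want to fetch.
size - The maximum number of top matching hits to return per bucket. By default, the top three matching hits are returned.
sort - How the top matching hits should be sorted. By default, the hits are sorted by the score of the main query.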
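The three options behave like pagination within a single bucket. The sketch below is a hypothetical helper operating on an in-memory list of hits; it mirrors the defaults described above: without a sort, hits are ordered by `_score` descending, `from` skips hits, and `size` caps the result at three. Because `from` is a Python keyword, the parameter is spelled `from_` here.

```python
from operator import itemgetter


def select_top_hits(hits, from_=0, size=3, sort_key=None, descending=True):
    """Return the slice of `hits` a top_hits aggregator would keep for one bucket.

    Hypothetical helper: without `sort_key` the hits are ordered by their
    `_score`, highest first, mirroring the default described above.
    """
    if sort_key is None:
        sort_key = itemgetter("_score")
    ordered = sorted(hits, key=sort_key, reverse=descending)
    return ordered[from_:from_ + size]


bucket_hits = [
    {"_id": "a", "_score": 0.2},
    {"_id": "b", "_score": 1.4},
    {"_id": "c", "_score": 0.9},
    {"_id": "d", "_score": 0.5},
    {"_id": "e", "_score": 1.1},
]
print([h["_id"] for h in select_top_hits(bucket_hits)])
print([h["_id"] for h in select_top_hits(bucket_hits, from_=1, size=2)])
```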

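Supported per-hit features

The top_hits aggregation returns regular search hits, so many per-hit features are supported, including highlighting, explain, named queries, search fields, source filtering, stored fields, script fields, doc value fields, versions, and sequence numbers and primary terms.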

If you only need docvalue_fields, size, and sort, then top_metrics might be a more efficient choice than the top_hits aggregation.
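For comparison, the sketch below builds two aggregation bodies for "latest price per type": one with top_hits and one with top_metrics. The field names `type`, `date` and `price` come from the sales example on this page; the helper names are hypothetical. top_metrics is more limited than top_hits (it returns metric values, not whole hits), which is what lets it use less memory; the sketch only shows the request shapes.

```python
def latest_price_top_hits():
    """terms + top_hits: returns the whole newest hit, trimmed to `price`."""
    return {
        "terms": {"field": "type"},
        "aggs": {
            "latest": {
                "top_hits": {
                    "size": 1,
                    "sort": [{"date": {"order": "desc"}}],
                    "_source": {"includes": ["price"]},
                }
            }
        },
    }


def latest_price_top_metrics():
    """terms + top_metrics: returns only the `price` value of the newest doc."""
    return {
        "terms": {"field": "type"},
        "aggs": {
            "latest": {
                "top_metrics": {
                    "metrics": {"field": "price"},
                    "sort": {"date": "desc"},
                }
            }
        },
    }


# Either body could be used as aggs={"by_type": ...} in client.search(...).
print(latest_price_top_metrics()["aggs"]["latest"])
```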

top_hits does not support the rescore parameter. Query rescoring applies only to search hits, not to aggregation results. To change the scores used by aggregations, use a function_score or script_score query.
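A minimal sketch of the suggested workaround, assuming a hypothetical numeric field `popularity` present in every document: the score adjustment moves into a script_score query, so the scores that top_hits sees are already adjusted. The boost formula and the helper name are only examples.

```python
def rescored_top_hits_request(text, boost_field):
    """Build search kwargs whose scores are adjusted before aggregation.

    top_hits ignores `rescore`, so the adjustment lives in the query itself
    via script_score. `boost_field` is assumed to be a numeric field.
    """
    script = f"_score * Math.log(2 + doc['{boost_field}'].value)"
    return {
        "query": {
            "script_score": {
                "query": {"match": {"body": text}},
                "script": {"source": script},
            }
        },
        "aggs": {
            "top_sites": {
                "terms": {"field": "domain"},
                "aggs": {"top_site_hits": {"top_hits": {"size": 1}}},
            }
        },
    }


kwargs = rescored_top_hits_request("elections", "popularity")
print(kwargs["query"]["script_score"]["script"]["source"])
```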

Example

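In the following example, we group the sales by type and, for each type, show the most recent sale. For each sale, only the date and price fields are included in the source.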

Python

resp = client.search(
    index="sales",
    size=0,
    aggs={
        "top_tags": {
            "terms": {
                "field": "type",
                "size": 3
            },
            "aggs": {
                "top_sales_hits": {
                    "top_hits": {
                        "sort": [
                            {
                                "date": {
                                    "order": "desc"
                                }
                            }
                        ],
                        "_source": {
                            "includes": [
                                "date",
                                "price"
                            ]
                        },
                        "size": 1
                    }
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.search(
  index: 'sales',
  size: 0,
  body: {
    aggregations: {
      top_tags: {
        terms: {
          field: 'type',
          size: 3
        },
        aggregations: {
          top_sales_hits: {
            top_hits: {
              sort: [
                {
                  date: {
                    order: 'desc'
                  }
                }
              ],
              _source: {
                includes: [
                  'date',
                  'price'
                ]
              },
              size: 1
            }
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.search({
  index: "sales",
  size: 0,
  aggs: {
    top_tags: {
      terms: {
        field: "type",
        size: 3,
      },
      aggs: {
        top_sales_hits: {
          top_hits: {
            sort: [
              {
                date: {
                  order: "desc",
                },
              },
            ],
            _source: {
              includes: ["date", "price"],
            },
            size: 1,
          },
        },
      },
    },
  },
});
console.log(response);

Console

POST /sales/_search?size=0
{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "type",
        "size": 3
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "date", "price" ]
            },
            "size": 1
          }
        }
      }
    }
  }
}

Possible response:

Console-Result

{
  ...
  "aggregations": {
    "top_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "hat",
          "doc_count": 3,
          "top_sales_hits": {
            "hits": {
              "total" : {
                "value": 3,
                "relation": "eq"
              },
              "max_score": null,
              "hits": [
                {
                  "_index": "sales",
                  "_id": "AVnNBmauCQpcRyxw6ChK",
                  "_source": {
                    "date": "2015/03/01 00:00:00",
                    "price": 200
                  },
                  "sort": [
                    1425168000000
                  ],
                  "_score": null
                }
              ]
            }
          }
        },
        {
          "key": "t-shirt",
          "doc_count": 3,
          "top_sales_hits": {
            "hits": {
              "total" : {
                "value": 3,
                "relation": "eq"
              },
              "max_score": null,
              "hits": [
                {
                  "_index": "sales",
                  "_id": "AVnNBmauCQpcRyxw6ChL",
                  "_source": {
                    "date": "2015/03/01 00:00:00",
                    "price": 175
                  },
                  "sort": [
                    1425168000000
                  ],
                  "_score": null
                }
              ]
            }
          }
        },
        {
          "key": "bag",
          "doc_count": 1,
          "top_sales_hits": {
            "hits": {
              "total" : {
                "value": 1,
                "relation": "eq"
              },
              "max_score": null,
              "hits": [
                {
                  "_index": "sales",
                  "_id": "AVnNBmatCQpcRyxw6ChH",
                  "_source": {
                    "date": "2015/01/01 00:00:00",
                    "price": 150
                  },
                  "sort": [
                    1420070400000
                  ],
                  "_score": null
                }
              ]
            }
          }
        }
      ]
    }
  }
}
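Reading the per-bucket result back out of a response like the one above means walking `aggregations.top_tags.buckets[*].top_sales_hits.hits.hits`. The sketch below does that on a plain dict; the `response` literal is an abridged copy of the example response, and the helper names are hypothetical. The Python client's response object also supports dict-style access, so the same walk should apply to it.

```python
def latest_sale_per_type(response):
    """Map each bucket key to the _source of its single top hit."""
    result = {}
    for bucket in response["aggregations"]["top_tags"]["buckets"]:
        hits = bucket["top_sales_hits"]["hits"]["hits"]
        # size is 1 in the request, so there is at most one hit per bucket.
        result[bucket["key"]] = hits[0]["_source"] if hits else None
    return result


def _bucket(key, count, date, price):
    # Abridged bucket shape copied from the example response above.
    return {
        "key": key,
        "doc_count": count,
        "top_sales_hits": {"hits": {"hits": [{"_source": {"date": date, "price": price}}]}},
    }


response = {
    "aggregations": {
        "top_tags": {
            "buckets": [
                _bucket("hat", 3, "2015/03/01 00:00:00", 200),
                _bucket("t-shirt", 3, "2015/03/01 00:00:00", 175),
                _bucket("bag", 1, "2015/01/01 00:00:00", 150),
            ]
        }
    }
}

for key, source in latest_sale_per_type(response).items():
    print(key, source["date"], source["price"])
```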

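Field collapsing example

Field collapsing, or result grouping, is a feature that logically groups a result set into groups and returns the top documents per group. The ordering of the groups is determined by the relevancy of the first document in a group. In Elasticsearch, this can be implemented via a bucket aggregator that wraps a top_hits aggregator as a sub-aggregator.

In the example below, we search across crawled webpages. For each webpage, we store the body and the domain the webpage belongs to. By defining a terms aggregator on the domain field, we group the result set of webpages by domain. The top_hits aggregator is then defined as a sub-aggregator, so that the top matching hits are collected per bucket.

A max aggregator is also defined; the terms aggregator's order feature uses it to return the buckets ordered by the relevancy of the most relevant document in each bucket.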

Python

resp = client.search(
    index="sales",
    query={
        "match": {
            "body": "elections"
        }
    },
    aggs={
        "top_sites": {
            "terms": {
                "field": "domain",
                "order": {
                    "top_hit": "desc"
                }
            },
            "aggs": {
                "top_tags_hits": {
                    "top_hits": {}
                },
                "top_hit": {
                    "max": {
                        "script": {
                            "source": "_score"
                        }
                    }
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.search(
  index: 'sales',
  body: {
    query: {
      match: {
        body: 'elections'
      }
    },
    aggregations: {
      top_sites: {
        terms: {
          field: 'domain',
          order: {
            top_hit: 'desc'
          }
        },
        aggregations: {
          top_tags_hits: {
            top_hits: {}
          },
          top_hit: {
            max: {
              script: {
                source: '_score'
              }
            }
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.search({
  index: "sales",
  query: {
    match: {
      body: "elections",
    },
  },
  aggs: {
    top_sites: {
      terms: {
        field: "domain",
        order: {
          top_hit: "desc",
        },
      },
      aggs: {
        top_tags_hits: {
          top_hits: {},
        },
        top_hit: {
          max: {
            script: {
              source: "_score",
            },
          },
        },
      },
    },
  },
});
console.log(response);

Console

POST /sales/_search
{
  "query": {
    "match": {
      "body": "elections"
    }
  },
  "aggs": {
    "top_sites": {
      "terms": {
        "field": "domain",
        "order": {
          "top_hit": "desc"
        }
      },
      "aggs": {
        "top_tags_hits": {
          "top_hits": {}
        },
        "top_hit" : {
          "max": {
            "script": {
              "source": "_score"
            }
          }
        }
      }
    }
  }
}

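At the moment, the max (or min) aggregator is needed to make sure the buckets from the terms aggregator are ordered according to the score of the most relevant webpage per domain. Unfortunately, the top_hits aggregator cannot yet be used in the order option of the terms aggregator.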
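The effect of ordering buckets by a `max` of `_score` can be modeled in plain Python: hits are grouped by domain, each group's sort key is its highest score, and groups are sorted by that key. The hit list (including the `url` field) is invented for illustration, and the function is a hypothetical model of the request above, not Elasticsearch code.

```python
from collections import defaultdict


def collapse_by_domain(hits, size=3):
    """Group hits by domain and order groups by their best score.

    The group score plays the role of the `top_hit` max sub-aggregation;
    each group's hit list plays the role of `top_tags_hits` (default size 3).
    """
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["domain"]].append(hit)

    ordered = []
    for domain, members in groups.items():
        members.sort(key=lambda h: h["_score"], reverse=True)
        ordered.append((members[0]["_score"], domain, members[:size]))
    # Highest max score first, like "order": {"top_hit": "desc"}.
    ordered.sort(key=lambda entry: entry[0], reverse=True)
    return [(domain, top_score, [h["url"] for h in top]) for top_score, domain, top in ordered]


hits = [
    {"domain": "news.example", "url": "news.example/a", "_score": 2.1},
    {"domain": "blog.example", "url": "blog.example/x", "_score": 3.4},
    {"domain": "news.example", "url": "news.example/b", "_score": 1.2},
    {"domain": "wiki.example", "url": "wiki.example/e", "_score": 0.7},
]
for domain, score, urls in collapse_by_domain(hits):
    print(domain, score, urls)
```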

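top_hits support in a nested or reverse_nested aggregator

If the top_hits aggregator is wrapped in a nested or reverse_nested aggregator, nested hits are returned. Nested hits are, in a sense, hidden mini documents that are part of a regular document for which a nested field type has been configured in the mapping. When wrapped in a nested or reverse_nested aggregator, the top_hits aggregator can un-hide these documents. Read more about nested in the nested type mapping.

If a nested type has been configured, a single document is actually indexed as multiple Lucene documents that share the same id. The id alone is not enough to determine the identity of a nested hit, which is why nested hits also include their nested identity. The nested identity is kept under the _nested field in the search hit and includes the array field and the offset within that array field to which the nested hit belongs. The offset is zero-based.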
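Why `_id` alone is not enough can be shown with the three comments of document `1` used in the example on this page: all three nested hits share `_id` `1`, so only `_id` plus `_nested` (field and zero-based offset) tells them apart. The helper below is a hypothetical sketch that builds such an identity key from a hit dict.

```python
def nested_identity(hit):
    """Return a hashable identity for a search hit, nested or not.

    Regular hits have no `_nested` key, so their identity is just
    (_index, _id). Nested hits append (field, offset) pairs, walking
    the possibly hierarchical `_nested` object outermost first.
    """
    parts = [hit["_index"], hit["_id"]]
    nested = hit.get("_nested")
    while nested is not None:
        parts.append((nested["field"], nested["offset"]))
        nested = nested.get("_nested")
    return tuple(parts)


hits = [
    {"_index": "sales", "_id": "1", "_nested": {"field": "comments", "offset": i}}
    for i in range(3)
]
print(len({hit["_id"] for hit in hits}))
print(len({nested_identity(hit) for hit in hits}))
```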

Let's see how this works with a real sample. Consider the following mapping:

Python

resp = client.indices.create(
    index="sales",
    mappings={
        "properties": {
            "tags": {
                "type": "keyword"
            },
            "comments": {
                "type": "nested",
                "properties": {
                    "username": {
                        "type": "keyword"
                    },
                    "comment": {
                        "type": "text"
                    }
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'sales',
  body: {
    mappings: {
      properties: {
        tags: {
          type: 'keyword'
        },
        comments: {
          type: 'nested',
          properties: {
            username: {
              type: 'keyword'
            },
            comment: {
              type: 'text'
            }
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "sales",
  mappings: {
    properties: {
      tags: {
        type: "keyword",
      },
      comments: {
        type: "nested",
        properties: {
          username: {
            type: "keyword",
          },
          comment: {
            type: "text",
          },
        },
      },
    },
  },
});
console.log(response);

Console

PUT /sales
{
  "mappings": {
    "properties": {
      "tags": { "type": "keyword" },
      "comments": {
        "type": "nested",
        "properties": {
          "username": { "type": "keyword" },
          "comment": { "type": "text" }
        }
      }
    }
  }
}


`comments` is an array that holds nested documents; in this mapping it sits at the top level of the document.

And some documents:

Python

resp = client.index(
    index="sales",
    id="1",
    refresh=True,
    document={
        "tags": [
            "car",
            "auto"
        ],
        "comments": [
            {
                "username": "baddriver007",
                "comment": "This car could have better brakes"
            },
            {
                "username": "dr_who",
                "comment": "Where's the autopilot? Can't find it"
            },
            {
                "username": "ilovemotorbikes",
                "comment": "This car has two extra wheels"
            }
        ]
    },
)
print(resp)

Ruby

response = client.index(
  index: 'sales',
  id: 1,
  refresh: true,
  body: {
    tags: [
      'car',
      'auto'
    ],
    comments: [
      {
        username: 'baddriver007',
        comment: 'This car could have better brakes'
      },
      {
        username: 'dr_who',
        comment: "Where's the autopilot? Can't find it"
      },
      {
        username: 'ilovemotorbikes',
        comment: 'This car has two extra wheels'
      }
    ]
  }
)
puts response

Js

const response = await client.index({
  index: "sales",
  id: 1,
  refresh: "true",
  document: {
    tags: ["car", "auto"],
    comments: [
      {
        username: "baddriver007",
        comment: "This car could have better brakes",
      },
      {
        username: "dr_who",
        comment: "Where's the autopilot? Can't find it",
      },
      {
        username: "ilovemotorbikes",
        comment: "This car has two extra wheels",
      },
    ],
  },
});
console.log(response);

Console

PUT /sales/_doc/1?refresh
{
  "tags": [ "car", "auto" ],
  "comments": [
    { "username": "baddriver007", "comment": "This car could have better brakes" },
    { "username": "dr_who", "comment": "Where's the autopilot? Can't find it" },
    { "username": "ilovemotorbikes", "comment": "This car has two extra wheels" }
  ]
}

It is now possible to execute the following top_hits aggregation (wrapped in a nested aggregation):

Python

resp = client.search(
    index="sales",
    query={
        "term": {
            "tags": "car"
        }
    },
    aggs={
        "by_sale": {
            "nested": {
                "path": "comments"
            },
            "aggs": {
                "by_user": {
                    "terms": {
                        "field": "comments.username",
                        "size": 1
                    },
                    "aggs": {
                        "by_nested": {
                            "top_hits": {}
                        }
                    }
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.search(
  index: 'sales',
  body: {
    query: {
      term: {
        tags: 'car'
      }
    },
    aggregations: {
      by_sale: {
        nested: {
          path: 'comments'
        },
        aggregations: {
          by_user: {
            terms: {
              field: 'comments.username',
              size: 1
            },
            aggregations: {
              by_nested: {
                top_hits: {}
              }
            }
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.search({
  index: "sales",
  query: {
    term: {
      tags: "car",
    },
  },
  aggs: {
    by_sale: {
      nested: {
        path: "comments",
      },
      aggs: {
        by_user: {
          terms: {
            field: "comments.username",
            size: 1,
          },
          aggs: {
            by_nested: {
              top_hits: {},
            },
          },
        },
      },
    },
  },
});
console.log(response);

Console

POST /sales/_search
{
  "query": {
    "term": { "tags": "car" }
  },
  "aggs": {
    "by_sale": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "by_user": {
          "terms": {
            "field": "comments.username",
            "size": 1
          },
          "aggs": {
            "by_nested": {
              "top_hits": {}
            }
          }
        }
      }
    }
  }
}

Top hits response snippet with a nested hit, which resides in the first slot of the array field comments:

Console-Result

{
  ...
  "aggregations": {
    "by_sale": {
      "by_user": {
        "buckets": [
          {
            "key": "baddriver007",
            "doc_count": 1,
            "by_nested": {
              "hits": {
                "total" : {
                  "value": 1,
                  "relation": "eq"
                },
                "max_score": 0.3616575,
                "hits": [
                  {
                    "_index": "sales",
                    "_id": "1",
                    "_nested": {
                      "field": "comments",
                      "offset": 0
                    },
                    "_score": 0.3616575,
                    "_source": {
                      "comment": "This car could have better brakes",
                      "username": "baddriver007"
                    }
                  }
                ]
              }
            }
          },
          ...
        ]
      }
    }
  }
}


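_nested.field - Name of the array field containing the nested hit.
_nested.offset - Position of the nested hit in the containing array.
_source - Source of the nested hit.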

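If _source is requested, only the part of the source belonging to the nested object is returned, not the entire source of the document. Stored fields at the nested inner object level are also accessible via a top_hits aggregator residing in a nested or reverse_nested aggregator.

Only nested hits have a _nested field in the hit; non-nested (regular) hits do not have a _nested field.

The information in _nested can also be used to parse the original source somewhere else if _source is not enabled.

If multiple levels of nested object types are defined in the mapping, the _nested information can also be hierarchical, in order to express the identity of nested hits that are two or more layers deep.

In the example below, a nested hit resides in the first slot of the field nested_grand_child_field, which in turn resides in the second slot of the nested_child_field field: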

Js

...
"hits": {
  "total" : {
    "value": 2565,
    "relation": "eq"
  },
  "max_score": 1,
  "hits": [
    {
      "_index": "a",
      "_id": "1",
      "_score": 1,
      "_nested" : {
        "field" : "nested_child_field",
        "offset" : 1,
        "_nested" : {
          "field" : "nested_grand_child_field",
          "offset" : 0
        }
      },
      "_source": ...
    },
    ...
  ]
}
...
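Since the _nested information can be used to locate a nested hit in the original source, here is a minimal sketch of doing so, assuming you already hold the full top-level document source (for example from a separate get request). It assumes each level's `field` is relative to the object one level up, as in the snippet above, and that nested values are arrays (a lone object is treated as a one-element array). The function name and the two-level sample document are made up.

```python
def resolve_nested(source, nested):
    """Follow a (possibly hierarchical) `_nested` identity into `source`.

    Assumes each level's `field` is a key relative to the object one
    level up, and that nested values are arrays (a lone object counts
    as a one-element array).
    """
    current = source
    while nested is not None:
        value = current[nested["field"]]
        items = value if isinstance(value, list) else [value]
        current = items[nested["offset"]]  # offsets are zero-based
        nested = nested.get("_nested")
    return current


doc_1 = {
    "tags": ["car", "auto"],
    "comments": [
        {"username": "baddriver007", "comment": "This car could have better brakes"},
        {"username": "dr_who", "comment": "Where's the autopilot? Can't find it"},
        {"username": "ilovemotorbikes", "comment": "This car has two extra wheels"},
    ],
}
print(resolve_nested(doc_1, {"field": "comments", "offset": 0})["username"])

# Made-up two-level document matching the hierarchical snippet above.
two_levels = {
    "nested_child_field": [
        {"nested_grand_child_field": [{"v": "a"}]},
        {"nested_grand_child_field": [{"v": "b"}, {"v": "c"}]},
    ]
}
identity = {
    "field": "nested_child_field",
    "offset": 1,
    "_nested": {"field": "nested_grand_child_field", "offset": 0},
}
print(resolve_nested(two_levels, identity))
```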

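Use in pipeline aggregations

top_hits can be used in pipeline aggregations that consume a single value per bucket, such as bucket_selector, which applies per-bucket filtering similar to a HAVING clause in SQL. This requires setting size to 1 and specifying the right path for the value to be passed to the wrapping aggregator. The latter can be a _source, a _sort, or a _score value. For example: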

Python

resp = client.search(
    index="sales",
    size=0,
    aggs={
        "top_tags": {
            "terms": {
                "field": "type",
                "size": 3
            },
            "aggs": {
                "top_sales_hits": {
                    "top_hits": {
                        "sort": [
                            {
                                "date": {
                                    "order": "desc"
                                }
                            }
                        ],
                        "_source": {
                            "includes": [
                                "date",
                                "price"
                            ]
                        },
                        "size": 1
                    }
                },
                "having.top_salary": {
                    "bucket_selector": {
                        "buckets_path": {
                            "tp": "top_sales_hits[_source.price]"
                        },
                        "script": "params.tp < 180"
                    }
                }
            }
        }
    },
)
print(resp)

Js

const response = await client.search({
  index: "sales",
  size: 0,
  aggs: {
    top_tags: {
      terms: {
        field: "type",
        size: 3,
      },
      aggs: {
        top_sales_hits: {
          top_hits: {
            sort: [
              {
                date: {
                  order: "desc",
                },
              },
            ],
            _source: {
              includes: ["date", "price"],
            },
            size: 1,
          },
        },
        "having.top_salary": {
          bucket_selector: {
            buckets_path: {
              tp: "top_sales_hits[_source.price]",
            },
            script: "params.tp < 180",
          },
        },
      },
    },
  },
});
console.log(response);

Console

POST /sales/_search?size=0
{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "type",
        "size": 3
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "date", "price" ]
            },
            "size": 1
          }
        },
        "having.top_salary": {
          "bucket_selector": {
            "buckets_path": {
              "tp": "top_sales_hits[_source.price]"
            },
            "script": "params.tp < 180"
          }
        }
      }
    }
  }
}
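Conceptually, the bucket_selector in the request above keeps a bucket only when its script returns true for the value pulled from the top hit. The sketch models that HAVING-style filter in plain Python; the buckets are flattened to a hypothetical `top_price` key holding the per-bucket prices from the first example response (200, 175 and 150), and the script `params.tp < 180` is rendered as a Python predicate. The function name is hypothetical.

```python
def bucket_selector(buckets, extract, predicate):
    """Keep only buckets whose extracted value satisfies `predicate`.

    Models a bucket_selector pipeline aggregation: `extract` plays the
    role of buckets_path, `predicate` the role of the script.
    """
    return [b for b in buckets if predicate(extract(b))]


buckets = [
    {"key": "hat", "top_price": 200},
    {"key": "t-shirt", "top_price": 175},
    {"key": "bag", "top_price": 150},
]
kept = bucket_selector(buckets, lambda b: b["top_price"], lambda tp: tp < 180)
print([b["key"] for b in kept])
```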

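The buckets_path uses the top_hits name top_sales_hits and a keyword for the field providing the aggregate value, namely the _source field price in the example above. Other options are top_sales_hits[_sort], for filtering on the sort value date above, and top_sales_hits[_score], for filtering on the score of the top hit.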
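The three path forms can be modeled as lookups on the single top hit of a bucket. The parser below is a simplified, hypothetical sketch that handles only `name[_source.field]`, `name[_sort]` and `name[_score]`; the real buckets_path syntax is richer. For `_sort` it simply takes the first sort value, which for the example above is the `date` in epoch milliseconds. It rejects buckets with more than one hit, reflecting the size 1 requirement.

```python
import re

# name[_source.field], name[_sort] or name[_score]
_PATH = re.compile(r"^(?P<agg>[^\[\]]+)\[(?P<key>_source\.[\w.]+|_sort|_score)\]$")


def resolve_top_hits_path(bucket, path):
    """Resolve a simplified top_hits buckets_path against one bucket dict."""
    match = _PATH.match(path)
    if match is None:
        raise ValueError(f"unsupported path: {path}")
    hits = bucket[match["agg"]]["hits"]["hits"]
    if len(hits) != 1:
        # The pipeline needs a single value, hence size: 1 on top_hits.
        raise ValueError("top_hits must return exactly one hit")
    hit, key = hits[0], match["key"]
    if key == "_score":
        return hit["_score"]
    if key == "_sort":
        return hit["sort"][0]  # simplification: first sort value only
    value = hit["_source"]
    for part in key.split(".")[1:]:  # "_source.a.b" -> walk "a", then "b"
        value = value[part]
    return value


bucket = {
    "key": "t-shirt",
    "top_sales_hits": {"hits": {"hits": [{
        "_score": None,
        "_source": {"date": "2015/03/01 00:00:00", "price": 175},
        "sort": [1425168000000],
    }]}},
}
print(resolve_top_hits_path(bucket, "top_sales_hits[_source.price]"))
print(resolve_top_hits_path(bucket, "top_sales_hits[_sort]"))
```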