Vector queries - Text expansion - Elasticsearch Guide v8.15

Text expansion query
Deprecated in 8.15.0.

This query has been replaced by the sparse vector query.

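The text expansion query uses a natural language processing model to convert the query text into a list of token-weight pairs, which are then used in a query against a sparse vector or rank features field.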
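As a simplified mental model (not a description of Elasticsearch internals), the token-weight pairs can be thought of as a sparse vector, and a document scores higher when it shares highly weighted tokens with the query. The sketch below assumes the score is the sum of weight products over shared tokens. The function name and all token weights are hypothetical values chosen for illustration.

```python
def sparse_dot_product(query_weights, doc_weights):
    """Sum the products of weights for tokens present in both sparse vectors."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )


# Hypothetical expansion of the query text into token-weight pairs.
query_weights = {"weather": 1.5, "jamaica": 2.0, "climate": 0.5}

# Hypothetical token-weight pairs stored in a document's sparse vector field.
doc_weights = {"jamaica": 1.0, "climate": 2.0, "beach": 0.75}

# Only "jamaica" and "climate" overlap: 2.0 * 1.0 + 0.5 * 2.0
score = sparse_dot_product(query_weights, doc_weights)
print(score)
```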

Example request

Python

resp = client.search(
    query={
        "text_expansion": {
            "<sparse_vector_field>": {
                "model_id": "the model to produce the token weights",
                "model_text": "the query string"
            }
        }
    },
)
print(resp)

Ruby

response = client.search(
  body: {
    query: {
      text_expansion: {
        "<sparse_vector_field>": {
          model_id: 'the model to produce the token weights',
          model_text: 'the query string'
        }
      }
    }
  }
)
puts response

Js

const response = await client.search({
  query: {
    text_expansion: {
      "<sparse_vector_field>": {
        model_id: "the model to produce the token weights",
        model_text: "the query string",
      },
    },
  },
});
console.log(response);

Console

GET _search
{
  "query": {
    "text_expansion": {
      "<sparse_vector_field>": {
        "model_id": "the model to produce the token weights",
        "model_text": "the query string"
      }
    }
  }
}

Top-level parameters for text_expansion

<sparse_vector_field>
(Required, object) The name of the field that contains the token-weight pairs the NLP model created based on the input text.

Top-level parameters for <sparse_vector_field>

model_id
(Required, string) The ID of the model to use to convert the query text into token-weight pairs. It must be the same model ID that was used to create the tokens from the input text.
model_text
(Required, string) The query text you want to use for search.
pruning_config
(Optional, object) [preview] This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. Optional pruning configuration. If enabled, this omits non-significant tokens from the query in order to improve query performance. Default: disabled.
- tokens_freq_ratio_threshold
  (Optional, integer) [preview] Subject to the same technical preview notice as pruning_config. Tokens whose frequency is more than tokens_freq_ratio_threshold times the average frequency of all tokens in the specified field are considered outliers and pruned. This value must be between 1 and 100. Default: 5.
- tokens_weight_threshold
  (Optional, float) [preview] Subject to the same technical preview notice as pruning_config. Tokens whose weight is less than tokens_weight_threshold are considered insignificant and pruned. This value must be between 0 and 1. Default: 0.4.
- only_score_pruned_tokens
  (Optional, boolean) [preview] Subject to the same technical preview notice as pruning_config. If true, only the pruned tokens are used for scoring and the non-pruned tokens are discarded. It is strongly recommended to set this to false for the main query, but it can be set to true for a rescore query to get more relevant results. Default: false.
  The default values for tokens_freq_ratio_threshold and tokens_weight_threshold were chosen based on tests using ELSER that provided the best results.
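The parameter ranges and defaults above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of any Elasticsearch client: build_pruning_config and build_text_expansion_query are hypothetical helper names, and treating the documented ranges as inclusive is an assumption.

```python
def build_pruning_config(
    tokens_freq_ratio_threshold=5,
    tokens_weight_threshold=0.4,
    only_score_pruned_tokens=False,
):
    """Return a pruning_config dict, using the documented defaults."""
    # Assumption: the documented ranges are inclusive at both ends.
    if not 1 <= tokens_freq_ratio_threshold <= 100:
        raise ValueError("tokens_freq_ratio_threshold must be between 1 and 100")
    if not 0 <= tokens_weight_threshold <= 1:
        raise ValueError("tokens_weight_threshold must be between 0 and 1")
    return {
        "tokens_freq_ratio_threshold": tokens_freq_ratio_threshold,
        "tokens_weight_threshold": tokens_weight_threshold,
        "only_score_pruned_tokens": only_score_pruned_tokens,
    }


def build_text_expansion_query(field, model_id, model_text, pruning_config=None):
    """Return a text_expansion query body for a single sparse vector field."""
    params = {"model_id": model_id, "model_text": model_text}
    if pruning_config is not None:
        params["pruning_config"] = pruning_config
    return {"text_expansion": {field: params}}


query = build_text_expansion_query(
    "ml.tokens",
    ".elser_model_2",
    "How is the weather in Jamaica?",
    pruning_config=build_pruning_config(),
)
print(query)
```

The resulting dict has the same shape as the query bodies used in the examples on this page, so it could be passed as the query argument of a search request.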

Example ELSER query

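The following is an example of a text_expansion query that references the ELSER model to perform semantic search. For a more detailed description of how to perform semantic search by using ELSER, refer to the semantic search with ELSER tutorial.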

Python

resp = client.search(
    index="my-index",
    query={
        "text_expansion": {
            "ml.tokens": {
                "model_id": ".elser_model_2",
                "model_text": "How is the weather in Jamaica?"
            }
        }
    },
)
print(resp)

Ruby

response = client.search(
  index: 'my-index',
  body: {
    query: {
      text_expansion: {
        'ml.tokens' => {
          model_id: '.elser_model_2',
          model_text: 'How is the weather in Jamaica?'
        }
      }
    }
  }
)
puts response

Js

const response = await client.search({
  index: "my-index",
  query: {
    text_expansion: {
      "ml.tokens": {
        model_id: ".elser_model_2",
        model_text: "How is the weather in Jamaica?",
      },
    },
  },
});
console.log(response);

Console

GET my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_2",
        "model_text": "How is the weather in Jamaica?"
      }
    }
  }
}

You can combine multiple text_expansion queries with each other or with other query types. To do so, wrap them in boolean query clauses and use linear boosting:

Python

resp = client.search(
    index="my-index",
    query={
        "bool": {
            "should": [
                {
                    "text_expansion": {
                        "ml.inference.title_expanded.predicted_value": {
                            "model_id": ".elser_model_2",
                            "model_text": "How is the weather in Jamaica?",
                            "boost": 1
                        }
                    }
                },
                {
                    "text_expansion": {
                        "ml.inference.description_expanded.predicted_value": {
                            "model_id": ".elser_model_2",
                            "model_text": "How is the weather in Jamaica?",
                            "boost": 1
                        }
                    }
                },
                {
                    "multi_match": {
                        "query": "How is the weather in Jamaica?",
                        "fields": [
                            "title",
                            "description"
                        ],
                        "boost": 4
                    }
                }
            ]
        }
    },
)
print(resp)

Ruby

response = client.search(
  index: 'my-index',
  body: {
    query: {
      bool: {
        should: [
          {
            text_expansion: {
              'ml.inference.title_expanded.predicted_value' => {
                model_id: '.elser_model_2',
                model_text: 'How is the weather in Jamaica?',
                boost: 1
              }
            }
          },
          {
            text_expansion: {
              'ml.inference.description_expanded.predicted_value' => {
                model_id: '.elser_model_2',
                model_text: 'How is the weather in Jamaica?',
                boost: 1
              }
            }
          },
          {
            multi_match: {
              query: 'How is the weather in Jamaica?',
              fields: [
                'title',
                'description'
              ],
              boost: 4
            }
          }
        ]
      }
    }
  }
)
puts response

Js

const response = await client.search({
  index: "my-index",
  query: {
    bool: {
      should: [
        {
          text_expansion: {
            "ml.inference.title_expanded.predicted_value": {
              model_id: ".elser_model_2",
              model_text: "How is the weather in Jamaica?",
              boost: 1,
            },
          },
        },
        {
          text_expansion: {
            "ml.inference.description_expanded.predicted_value": {
              model_id: ".elser_model_2",
              model_text: "How is the weather in Jamaica?",
              boost: 1,
            },
          },
        },
        {
          multi_match: {
            query: "How is the weather in Jamaica?",
            fields: ["title", "description"],
            boost: 4,
          },
        },
      ],
    },
  },
});
console.log(response);

Console

GET my-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "text_expansion": {
            "ml.inference.title_expanded.predicted_value": {
              "model_id": ".elser_model_2",
              "model_text": "How is the weather in Jamaica?",
              "boost": 1
            }
          }
        },
        {
          "text_expansion": {
            "ml.inference.description_expanded.predicted_value": {
              "model_id": ".elser_model_2",
              "model_text": "How is the weather in Jamaica?",
              "boost": 1
            }
          }
        },
        {
          "multi_match": {
            "query": "How is the weather in Jamaica?",
            "fields": [
              "title",
              "description"
            ],
            "boost": 4
          }
        }
      ]
    }
  }
}
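When the same query text is searched across several expanded fields, the bool query can be assembled programmatically instead of being written out by hand. The following is a minimal sketch under that assumption. build_hybrid_query is a hypothetical helper, and its default boosts simply mirror the values used in the example above.

```python
def build_hybrid_query(
    query_text,
    expanded_fields,
    lexical_fields,
    model_id=".elser_model_2",
    expansion_boost=1,
    lexical_boost=4,
):
    """Combine one text_expansion clause per expanded field with a multi_match clause."""
    should = [
        {
            "text_expansion": {
                field: {
                    "model_id": model_id,
                    "model_text": query_text,
                    "boost": expansion_boost,
                }
            }
        }
        for field in expanded_fields
    ]
    should.append(
        {
            "multi_match": {
                "query": query_text,
                "fields": list(lexical_fields),
                "boost": lexical_boost,
            }
        }
    )
    return {"bool": {"should": should}}


hybrid_query = build_hybrid_query(
    "How is the weather in Jamaica?",
    expanded_fields=[
        "ml.inference.title_expanded.predicted_value",
        "ml.inference.description_expanded.predicted_value",
    ],
    lexical_fields=["title", "description"],
)
print(hybrid_query)
```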

This can also be achieved using reciprocal rank fusion (RRF), through an rrf retriever with multiple standard retrievers.

Python

resp = client.search(
    index="my-index",
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "multi_match": {
                                "query": "How is the weather in Jamaica?",
                                "fields": [
                                    "title",
                                    "description"
                                ]
                            }
                        }
                    }
                },
                {
                    "standard": {
                        "query": {
                            "text_expansion": {
                                "ml.inference.title_expanded.predicted_value": {
                                    "model_id": ".elser_model_2",
                                    "model_text": "How is the weather in Jamaica?"
                                }
                            }
                        }
                    }
                },
                {
                    "standard": {
                        "query": {
                            "text_expansion": {
                                "ml.inference.description_expanded.predicted_value": {
                                    "model_id": ".elser_model_2",
                                    "model_text": "How is the weather in Jamaica?"
                                }
                            }
                        }
                    }
                }
            ],
            "window_size": 10,
            "rank_constant": 20
        }
    },
)
print(resp)

Js

const response = await client.search({
  index: "my-index",
  retriever: {
    rrf: {
      retrievers: [
        {
          standard: {
            query: {
              multi_match: {
                query: "How is the weather in Jamaica?",
                fields: ["title", "description"],
              },
            },
          },
        },
        {
          standard: {
            query: {
              text_expansion: {
                "ml.inference.title_expanded.predicted_value": {
                  model_id: ".elser_model_2",
                  model_text: "How is the weather in Jamaica?",
                },
              },
            },
          },
        },
        {
          standard: {
            query: {
              text_expansion: {
                "ml.inference.description_expanded.predicted_value": {
                  model_id: ".elser_model_2",
                  model_text: "How is the weather in Jamaica?",
                },
              },
            },
          },
        },
      ],
      window_size: 10,
      rank_constant: 20,
    },
  },
});
console.log(response);

Console

GET my-index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "multi_match": {
                "query": "How is the weather in Jamaica?",
                "fields": [
                  "title",
                  "description"
                ]
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "text_expansion": {
                "ml.inference.title_expanded.predicted_value": {
                  "model_id": ".elser_model_2",
                  "model_text": "How is the weather in Jamaica?"
                }
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "text_expansion": {
                "ml.inference.description_expanded.predicted_value": {
                  "model_id": ".elser_model_2",
                  "model_text": "How is the weather in Jamaica?"
                }
              }
            }
          }
        }
      ],
      "window_size": 10,
      "rank_constant": 20
    }
  }
}
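To make the roles of rank_constant and window_size in the request above more concrete, the sketch below computes reciprocal rank fusion scores by hand, assuming the standard RRF formula in which each retriever contributes 1 / (rank_constant + rank) for a document at 1-based position rank. It also assumes window_size caps how many results each retriever contributes. The rrf_scores function and the document IDs are hypothetical, and this illustrates the technique rather than reproducing Elasticsearch's implementation.

```python
def rrf_scores(rankings, rank_constant=20, window_size=10):
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        # Only the top window_size results of each retriever are considered.
        for rank, doc_id in enumerate(ranking[:window_size], start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return scores


# Hypothetical result lists, one per standard retriever, best match first.
rankings = [
    ["a", "b", "c"],  # multi_match on title and description
    ["b", "a"],       # text_expansion on the expanded title field
    ["b", "d"],       # text_expansion on the expanded description field
]

scores = rrf_scores(rankings)
fused = sorted(scores, key=scores.get, reverse=True)
print(fused)
```

Document b ranks first because it appears near the top of all three lists, even though it is not the top lexical match.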

Example ELSER query with pruning configuration and rescore

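The following is an extension of the example above that adds a pruning configuration to the text_expansion query. The pruning configuration identifies non-significant tokens to prune from the query in order to improve query performance.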

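Token pruning happens at the shard level. While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed, because it depends on the composition of each shard. Therefore, if you run text_expansion with a pruning_config on a multi-shard index, we strongly recommend adding a rescore of the filtered search results that uses the tokens originally pruned from the query. This helps mitigate any shard-level inconsistency in pruned tokens and provides better relevance overall.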

Python

resp = client.search(
    index="my-index",
    query={
        "text_expansion": {
            "ml.tokens": {
                "model_id": ".elser_model_2",
                "model_text": "How is the weather in Jamaica?",
                "pruning_config": {
                    "tokens_freq_ratio_threshold": 5,
                    "tokens_weight_threshold": 0.4,
                    "only_score_pruned_tokens": False
                }
            }
        }
    },
    rescore={
        "window_size": 100,
        "query": {
            "rescore_query": {
                "text_expansion": {
                    "ml.tokens": {
                        "model_id": ".elser_model_2",
                        "model_text": "How is the weather in Jamaica?",
                        "pruning_config": {
                            "tokens_freq_ratio_threshold": 5,
                            "tokens_weight_threshold": 0.4,
                            "only_score_pruned_tokens": True
                        }
                    }
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.search(
  index: 'my-index',
  body: {
    query: {
      text_expansion: {
        'ml.tokens' => {
          model_id: '.elser_model_2',
          model_text: 'How is the weather in Jamaica?',
          pruning_config: {
            tokens_freq_ratio_threshold: 5,
            tokens_weight_threshold: 0.4,
            only_score_pruned_tokens: false
          }
        }
      }
    },
    rescore: {
      window_size: 100,
      query: {
        rescore_query: {
          text_expansion: {
            'ml.tokens' => {
              model_id: '.elser_model_2',
              model_text: 'How is the weather in Jamaica?',
              pruning_config: {
                tokens_freq_ratio_threshold: 5,
                tokens_weight_threshold: 0.4,
                only_score_pruned_tokens: true
              }
            }
          }
        }
      }
    }
  }
)
puts response

Js

const response = await client.search({
  index: "my-index",
  query: {
    text_expansion: {
      "ml.tokens": {
        model_id: ".elser_model_2",
        model_text: "How is the weather in Jamaica?",
        pruning_config: {
          tokens_freq_ratio_threshold: 5,
          tokens_weight_threshold: 0.4,
          only_score_pruned_tokens: false,
        },
      },
    },
  },
  rescore: {
    window_size: 100,
    query: {
      rescore_query: {
        text_expansion: {
          "ml.tokens": {
            model_id: ".elser_model_2",
            model_text: "How is the weather in Jamaica?",
            pruning_config: {
              tokens_freq_ratio_threshold: 5,
              tokens_weight_threshold: 0.4,
              only_score_pruned_tokens: true,
            },
          },
        },
      },
    },
  },
});
console.log(response);

Console

GET my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_2",
        "model_text": "How is the weather in Jamaica?",
        "pruning_config": {
          "tokens_freq_ratio_threshold": 5,
          "tokens_weight_threshold": 0.4,
          "only_score_pruned_tokens": false
        }
      }
    }
  },
  "rescore": {
    "window_size": 100,
    "query": {
      "rescore_query": {
        "text_expansion": {
          "ml.tokens": {
            "model_id": ".elser_model_2",
            "model_text": "How is the weather in Jamaica?",
            "pruning_config": {
              "tokens_freq_ratio_threshold": 5,
              "tokens_weight_threshold": 0.4,
              "only_score_pruned_tokens": true
            }
          }
        }
      }
    }
  }
}
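In the request above, the main query and the rescore query differ only in only_score_pruned_tokens, so it is easy for the two copies to drift apart when thresholds are tuned. The following is a minimal sketch that generates both parts from one set of values. build_pruned_search is a hypothetical helper, not a client API.

```python
def build_pruned_search(
    field,
    model_id,
    model_text,
    tokens_freq_ratio_threshold=5,
    tokens_weight_threshold=0.4,
    rescore_window_size=100,
):
    """Build matching query and rescore sections for a pruned text_expansion search."""

    def expansion(only_score_pruned_tokens):
        # Both clauses share every setting except only_score_pruned_tokens.
        return {
            "text_expansion": {
                field: {
                    "model_id": model_id,
                    "model_text": model_text,
                    "pruning_config": {
                        "tokens_freq_ratio_threshold": tokens_freq_ratio_threshold,
                        "tokens_weight_threshold": tokens_weight_threshold,
                        "only_score_pruned_tokens": only_score_pruned_tokens,
                    },
                }
            }
        }

    return {
        # Main query: score with the tokens that survive pruning.
        "query": expansion(False),
        # Rescore: score the top hits again with only the pruned tokens.
        "rescore": {
            "window_size": rescore_window_size,
            "query": {"rescore_query": expansion(True)},
        },
    }


search_body = build_pruned_search(
    "ml.tokens", ".elser_model_2", "How is the weather in Jamaica?"
)
print(search_body)
```

With the Python client used in the examples on this page, the result could be passed as client.search(index="my-index", **search_body), since query and rescore are both keyword arguments there.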

Depending on your data, the text expansion query may be faster with track_total_hits: false.
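As an illustration, the setting can be added alongside the query in the search request. The sketch below only assembles the keyword arguments. The commented-out call assumes a configured client object like the one used in the Python examples on this page, and whether the setting helps depends on your data.

```python
# Keyword arguments for a search that skips exact total hit counting.
search_kwargs = {
    "index": "my-index",
    "query": {
        "text_expansion": {
            "ml.tokens": {
                "model_id": ".elser_model_2",
                "model_text": "How is the weather in Jamaica?",
            }
        }
    },
    "track_total_hits": False,
}

# Assumption: `client` is an already configured Elasticsearch Python client.
# resp = client.search(**search_kwargs)
print(search_kwargs["track_total_hits"])
```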