Field data types - Text - Elasticsearch Guide v8.15

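Text type family

The text family includes the following field types:

text, the traditional field type for full-text content such as the body of an email or the description of a product.
match_only_text, a space-optimized variant of text that disables scoring and performs slower on queries that need positions. It is best suited for indexing log messages.

Text field type

A field to index full-text values, such as the body of an email or the description of a product. These fields are analyzed: they are passed through an analyzer that converts the string into a list of individual terms before indexing. The analysis process allows Elasticsearch to search for individual words within each full-text field. Text fields are not used for sorting and are seldom used for aggregations (although the significant text aggregation is a notable exception).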
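
To make the analysis step concrete, here is a minimal sketch in plain Python. It is not the Elasticsearch standard analyzer: `toy_analyze` and `build_inverted_index` are hypothetical helpers that only lowercase the text and split it on non-alphanumeric characters, which is enough to show why individual words become searchable.

```python
import re
from collections import defaultdict


def toy_analyze(text):
    """Lowercase the text and split it into alphanumeric terms (toy analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in toy_analyze(text):
            index[term].add(doc_id)
    return index


# Hypothetical documents, keyed by id.
docs = {
    1: "The quick brown fox",
    2: "Quick delivery of the product",
}
index = build_inverted_index(docs)

print(toy_analyze(docs[1]))    # ['the', 'quick', 'brown', 'fox']
print(sorted(index["quick"]))  # [1, 2]
print(sorted(index["fox"]))    # [1]
```

Because each term points back to the documents that contain it, a search for a single word such as quick finds both documents even though neither field value equals that word.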

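text fields are best suited for unstructured but human-readable content. If you need to index unstructured machine-generated content, see Mapping unstructured content.

If you need to index structured content such as email addresses, hostnames, status codes, or tags, you should probably use a keyword field instead.

Below is an example of a mapping for a text field: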

Python

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "full_name": {
                "type": "text"
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
   mappings: {
   properties: {
   full_name: {
   type: 'text'
   }
   }
   }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
   properties: {
   full_name: {
   type: "text",
   },
   },
  },
});
console.log(response);

Console

PUT my-index-000001
{
  "mappings": {
   "properties": {
   "full_name": {
   "type":  "text"
   }
   }
  }
}

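Use a field as both text and keyword

Sometimes it is useful to have both a full-text (text) and a keyword (keyword) version of the same field: one for full-text search and the other for aggregations and sorting. This can be achieved with multi-fields.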
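
As a rough illustration, the request bodies for such a setup might look like the following. This is a sketch built from plain Python dictionaries; the sub-field name `keyword`, the aggregation name `names`, and the query text are illustrative choices, not values required by Elasticsearch.

```python
import json

# Mapping body: "full_name" is analyzed text, "full_name.keyword" keeps the exact value.
mappings = {
    "properties": {
        "full_name": {
            "type": "text",
            "fields": {
                "keyword": {"type": "keyword"}  # sub-field name is a free choice
            },
        }
    }
}

# Search body: full-text match on the text field,
# sorting and aggregating on the keyword sub-field.
search_body = {
    "query": {"match": {"full_name": "john smith"}},
    "sort": [{"full_name.keyword": "asc"}],
    "aggs": {"names": {"terms": {"field": "full_name.keyword"}}},
}

print(json.dumps(mappings, indent=2))
print(json.dumps(search_body, indent=2))
```

The `mappings` dictionary has the same shape as the one passed to `client.indices.create` in the mapping example above; the search body would be sent with a search request against the same index.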

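Parameters for text fields

The following parameters are accepted by text fields:

`analyzer`	The analyzer that should be used for the `text` field, both at index time and at search time (unless overridden by the `search_analyzer`). Defaults to the default index analyzer, or the `standard` analyzer.
`eager_global_ordinals`	Should global ordinals be loaded eagerly on refresh? Accepts `true` or `false` (default). Enabling this is a good idea on fields that are frequently used for (significant) terms aggregations.
`fielddata`	Can the field use in-memory fielddata for sorting, aggregations, or scripting? Accepts `true` or `false` (default).
`fielddata_frequency_filter`	Expert settings that decide which values to load into memory when `fielddata` is enabled. By default, all values are loaded.
`fields`	Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers.
`index`	Should the field be searchable? Accepts `true` (default) or `false`.
`index_options`	What information should be stored in the index for search and highlighting purposes. Defaults to `positions`.
`index_prefixes`	If enabled, term prefixes of between 2 and 5 characters are indexed into a separate field. This allows prefix searches to run more efficiently, at the expense of a larger index.
`index_phrases`	If enabled, two-term word combinations (shingles) are indexed into a separate field. This allows exact phrase queries (no slop) to run more efficiently, at the expense of a larger index. This works best when stopwords are not removed, because phrases containing stopwords do not use the subsidiary field and fall back to a standard phrase query. Accepts `true` or `false` (default).
`norms`	Whether field length should be taken into account when scoring queries. Accepts `true` (default) or `false`.
`position_increment_gap`	The number of fake term positions that should be inserted between each element of an array of strings. Defaults to the `position_increment_gap` configured on the analyzer, which defaults to `100`. `100` was chosen because it prevents phrase queries with reasonably large slops (less than 100) from matching terms across field values.
`store`	Whether the field value should be stored and retrievable separately from the `_source` field. Accepts `true` or `false` (default). This parameter is automatically set to `true` for TSDB indices (indices that have `index.mode` set to `time_series`) if there is no `keyword` sub-field that supports synthetic `_source`.
`search_analyzer`	The `analyzer` that should be used at search time on the `text` field. Defaults to the `analyzer` setting.
`search_quote_analyzer`	The `analyzer` that should be used at search time when a phrase is encountered. Defaults to the `search_analyzer` setting.
`similarity`	Which scoring algorithm or similarity should be used. Defaults to `BM25`.
`term_vector`	Whether term vectors should be stored for the field. Defaults to `no`.
`meta`	Metadata about the field.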
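
The effect of `position_increment_gap` is easier to see with numbers. The sketch below is a simplified model and not Lucene's actual implementation: it assumes whitespace tokenization, one position per term, and that the gap is added after the last term of each array element. `assign_positions` and `min_slop_for_pair` are hypothetical helper names.

```python
def assign_positions(values, position_increment_gap=100):
    """Return (term, position) pairs for an array of strings.

    Simplified model: terms are lowercased and split on whitespace, each term
    advances the position by 1, and the gap is added between array elements.
    """
    positions = []
    position = -1
    for value in values:
        for term in value.lower().split():
            position += 1
            positions.append((term, position))
        position += position_increment_gap  # fake positions between elements
    return positions


def min_slop_for_pair(positions, first, second):
    """Smallest slop at which the in-order two-term phrase "first second" matches."""
    lookup = dict(positions)
    return lookup[second] - lookup[first] - 1


names = ["John Abraham", "Lincoln Smith"]
positions = assign_positions(names)

print(positions)
# [('john', 0), ('abraham', 1), ('lincoln', 102), ('smith', 103)]
print(min_slop_for_pair(positions, "john", "abraham"))     # 0: same array element
print(min_slop_for_pair(positions, "abraham", "lincoln"))  # 100: spans two elements
```

Under this model, a phrase query for abraham lincoln matches across the two array elements only once its slop reaches 100, which is why slops below 100 do not match terms across field values.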

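Synthetic _source

Synthetic _source is generally available (GA) only for TSDB indices (indices that have index.mode set to time_series). For other indices, synthetic _source is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.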


text fields support synthetic _source if they have a keyword sub-field that supports synthetic _source or if the text field sets store to true.

If a keyword sub-field is used, the values are sorted the same way a keyword field's values are sorted. By default, that means sorted with duplicates removed. So:

Python

resp = client.indices.create(
    index="idx",
    mappings={
        "_source": {
            "mode": "synthetic"
        },
        "properties": {
            "text": {
                "type": "text",
                "fields": {
                    "raw": {
                        "type": "keyword"
                    }
                }
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="idx",
    id="1",
    document={
        "text": [
            "the quick brown fox",
            "the quick brown fox",
            "jumped over the lazy dog"
        ]
    },
)
print(resp1)

Ruby

response = client.indices.create(
  index: 'idx',
  body: {
   mappings: {
   _source: {
   mode: 'synthetic'
   },
   properties: {
   text: {
   type: 'text',
   fields: {
   raw: {
   type: 'keyword'
   }
   }
   }
   }
   }
  }
)
puts response
response = client.index(
  index: 'idx',
  id: 1,
  body: {
   text: [
   'the quick brown fox',
   'the quick brown fox',
   'jumped over the lazy dog'
   ]
  }
)
puts response

Js

const response = await client.indices.create({
  index: "idx",
  mappings: {
   _source: {
   mode: "synthetic",
   },
   properties: {
   text: {
   type: "text",
   fields: {
   raw: {
   type: "keyword",
   },
   },
   },
   },
  },
});
console.log(response);
const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
   text: [
   "the quick brown fox",
   "the quick brown fox",
   "jumped over the lazy dog",
   ],
  },
});
console.log(response1);

Console

PUT idx
{
  "mappings": {
   "_source": { "mode": "synthetic" },
   "properties": {
   "text": {
   "type": "text",
   "fields": {
   "raw": {
   "type": "keyword"
   }
   }
   }
   }
  }
}
PUT idx/_doc/1
{
  "text": [
   "the quick brown fox",
   "the quick brown fox",
   "jumped over the lazy dog"
  ]
}

This becomes:

Console-Result

{
  "text": [
   "jumped over the lazy dog",
   "the quick brown fox"
  ]
}

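Reordering text field values can affect phrase and span queries. See the discussion of position_increment_gap for details. You can avoid this by making sure the slop parameter on phrase queries is lower than the position_increment_gap. This is the default.

If the text field sets store to true, order and duplicates are preserved: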


Python

resp = client.indices.create(
    index="idx",
    mappings={
        "_source": {
            "mode": "synthetic"
        },
        "properties": {
            "text": {
                "type": "text",
                "store": True
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="idx",
    id="1",
    document={
        "text": [
            "the quick brown fox",
            "the quick brown fox",
            "jumped over the lazy dog"
        ]
    },
)
print(resp1)

Ruby

response = client.indices.create(
  index: 'idx',
  body: {
   mappings: {
   _source: {
   mode: 'synthetic'
   },
   properties: {
   text: {
   type: 'text',
   store: true
   }
   }
   }
  }
)
puts response
response = client.index(
  index: 'idx',
  id: 1,
  body: {
   text: [
   'the quick brown fox',
   'the quick brown fox',
   'jumped over the lazy dog'
   ]
  }
)
puts response

Js

const response = await client.indices.create({
  index: "idx",
  mappings: {
   _source: {
   mode: "synthetic",
   },
   properties: {
   text: {
   type: "text",
   store: true,
   },
   },
  },
});
console.log(response);
const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
   text: [
   "the quick brown fox",
   "the quick brown fox",
   "jumped over the lazy dog",
   ],
  },
});
console.log(response1);

Console

PUT idx
{
  "mappings": {
   "_source": { "mode": "synthetic" },
   "properties": {
   "text": { "type": "text", "store": true }
   }
  }
}
PUT idx/_doc/1
{
  "text": [
   "the quick brown fox",
   "the quick brown fox",
   "jumped over the lazy dog"
  ]
}

This becomes:

Console-Result

{
  "text": [
   "the quick brown fox",
   "the quick brown fox",
   "jumped over the lazy dog"
  ]
}
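
The difference between the two results above can be modeled in a few lines. This is only a sketch of the observable behavior, not of how Elasticsearch rebuilds _source internally; both function names are hypothetical, and Python's default string ordering stands in for keyword ordering.

```python
def synthetic_text_from_keyword(values):
    """Model: values rebuilt from a keyword sub-field are de-duplicated and sorted."""
    return sorted(set(values))


def synthetic_text_from_store(values):
    """Model: values rebuilt from a stored text field keep order and duplicates."""
    return list(values)


indexed = [
    "the quick brown fox",
    "the quick brown fox",
    "jumped over the lazy dog",
]

print(synthetic_text_from_keyword(indexed))
# ['jumped over the lazy dog', 'the quick brown fox']
print(synthetic_text_from_store(indexed))
# ['the quick brown fox', 'the quick brown fox', 'jumped over the lazy dog']
```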

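fielddata mapping parameter

By default, text fields are searchable but are not available for aggregations, sorting, or scripting, because fielddata is disabled by default (see the fielddata parameter above).

Loading field data into memory can consume significant memory.

Field data is the only way to access the analyzed tokens from a full-text field in aggregations, sorting, or scripting. For example, a full-text field value like New York is analyzed as new and york. Aggregating on these tokens requires field data.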
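
The following sketch contrasts counting over analyzed tokens with counting over whole values. It is a toy model in plain Python: `toy_tokens` merely lowercases and splits on whitespace, and the two counting functions are hypothetical stand-ins for a terms aggregation on a text field with fielddata and on a keyword field.

```python
from collections import Counter


def toy_tokens(value):
    """Toy stand-in for analysis: lowercase and split on whitespace."""
    return value.lower().split()


def terms_over_tokens(values):
    """Count documents per analyzed token (what fielddata on a text field exposes)."""
    counts = Counter()
    for value in values:
        counts.update(set(toy_tokens(value)))  # count each token once per document
    return counts


def terms_over_whole_values(values):
    """Count documents per exact value (what a keyword field exposes)."""
    return Counter(values)


# One hypothetical field value per document.
cities = ["New York", "New Orleans", "York"]

print(terms_over_tokens(cities))
print(terms_over_whole_values(cities))
```

In the token-based counts, new and york each appear in two documents and New York no longer exists as a bucket, which is rarely what an aggregation over city names is meant to report.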

Before enabling fielddata

It usually doesn't make sense to enable fielddata on text fields. Field data is stored in the heap with the field data cache because it is expensive to calculate. Calculating the field data can cause latency spikes, and increased heap usage is a cause of cluster performance issues.

Most users who want to do more with text fields use multi-field mappings, with a text field for full-text search and an unanalyzed keyword field for aggregations, as follows:

Python

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "my_field": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
   mappings: {
   properties: {
   my_field: {
   type: 'text',
   fields: {
   keyword: {
   type: 'keyword'
   }
   }
   }
   }
   }
  }
)
puts response

Go

res, err := es.Indices.Create(
    "my-index-000001",
    es.Indices.Create.WithBody(strings.NewReader(`{
      "mappings": {
      "properties": {
      "my_field": {
      "type": "text",
      "fields": {
      "keyword": {
      "type": "keyword"
      }
      }
      }
      }
      }
    }`)),
)
fmt.Println(res, err)

Js

const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
   properties: {
   my_field: {
   type: "text",
   fields: {
   keyword: {
   type: "keyword",
   },
   },
   },
   },
  },
});
console.log(response);

Console

PUT my-index-000001
{
  "mappings": {
   "properties": {
   "my_field": {
   "type": "text",
   "fields": {
   "keyword": {
   "type": "keyword"
   }
   }
   }
   }
  }
}


	Use the `my_field` field for searches.
	Use the `my_field.keyword` field for aggregations, sorting, or in scripts.

Enabling fielddata on text fields

You can enable fielddata on an existing text field using the update mapping API as follows:

Python

resp = client.indices.put_mapping(
    index="my-index-000001",
    properties={
        "my_field": {
            "type": "text",
            "fielddata": True
        }
    },
)
print(resp)

Ruby

response = client.indices.put_mapping(
  index: 'my-index-000001',
  body: {
   properties: {
   my_field: {
   type: 'text',
   fielddata: true
   }
   }
  }
)
puts response

Go

res, err := es.Indices.PutMapping(
    []string{"my-index-000001"},
    strings.NewReader(`{
      "properties": {
      "my_field": {
      "type": "text",
      "fielddata": true
      }
      }
    }`),
)
fmt.Println(res, err)

Js

const response = await client.indices.putMapping({
  index: "my-index-000001",
  properties: {
   my_field: {
   type: "text",
   fielddata: true,
   },
  },
});
console.log(response);

Console

PUT my-index-000001/_mapping
{
  "properties": {
   "my_field": {
   "type":     "text",
   "fielddata": true
   }
  }
}


	The mapping that you specify for `my_field` should consist of the existing mapping for that field, plus the `fielddata` parameter.
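
One way to follow that advice is to build the update from the existing field mapping rather than from scratch. The sketch below uses a hypothetical helper, `with_fielddata`, and a hypothetical existing mapping; it only prepares the `properties` dictionary and does not call Elasticsearch.

```python
import copy


def with_fielddata(existing_field_mapping):
    """Return a copy of a text field mapping with fielddata switched on."""
    if existing_field_mapping.get("type") != "text":
        raise ValueError("fielddata as described here applies to text fields")
    updated = copy.deepcopy(existing_field_mapping)  # leave the input untouched
    updated["fielddata"] = True
    return updated


# Hypothetical existing mapping for my_field, for example as read back from the index.
existing = {"type": "text", "fields": {"keyword": {"type": "keyword"}}}

properties = {"my_field": with_fielddata(existing)}
print(properties)
```

The resulting `properties` dictionary has the shape expected by the `put_mapping` call shown above, with the existing sub-field kept alongside the new `fielddata` setting.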

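fielddata_frequency_filter mapping parameter

Fielddata filtering can be used to reduce the number of terms loaded into memory, and thus reduce memory usage. Terms can be filtered by frequency:

The frequency filter loads only terms whose document frequency falls between a min and a max value. Each value can be expressed as an absolute number (when the number is greater than 1.0) or as a percentage (for example, 0.01 is 1% and 1.0 is 100%). Frequency is calculated per segment. Percentages are based on the number of documents that have a value for the field, as opposed to all documents in the segment.

Small segments can be excluded completely by specifying the minimum number of documents that a segment should contain with min_segment_size: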

Python

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "tag": {
                "type": "text",
                "fielddata": True,
                "fielddata_frequency_filter": {
                    "min": 0.001,
                    "max": 0.1,
                    "min_segment_size": 500
                }
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
   mappings: {
   properties: {
   tag: {
   type: 'text',
   fielddata: true,
   fielddata_frequency_filter: {
   min: 0.001,
   max: 0.1,
   min_segment_size: 500
   }
   }
   }
   }
  }
)
puts response

Go

res, err := es.Indices.Create(
    "my-index-000001",
    es.Indices.Create.WithBody(strings.NewReader(`{
      "mappings": {
      "properties": {
      "tag": {
      "type": "text",
      "fielddata": true,
      "fielddata_frequency_filter": {
      "min": 0.001,
      "max": 0.1,
      "min_segment_size": 500
      }
      }
      }
      }
    }`)),
)
fmt.Println(res, err)

Js

const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
   properties: {
   tag: {
   type: "text",
   fielddata: true,
   fielddata_frequency_filter: {
   min: 0.001,
   max: 0.1,
   min_segment_size: 500,
   },
   },
   },
  },
});
console.log(response);

Console

PUT my-index-000001
{
  "mappings": {
   "properties": {
   "tag": {
   "type": "text",
   "fielddata": true,
   "fielddata_frequency_filter": {
   "min": 0.001,
   "max": 0.1,
   "min_segment_size": 500
   }
   }
   }
  }
}
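
To see what the min and max values in this example mean for a single segment, the following sketch applies the rule described above to made-up document frequencies. It is a simplified model with stated assumptions: the bounds are treated as inclusive, the handling of min_segment_size is not modeled, and `frequency_bounds` and `loaded_terms` are hypothetical helper names.

```python
def frequency_bounds(docs_with_value, min_freq, max_freq):
    """Turn min/max settings into document-count bounds for one segment.

    Values greater than 1.0 are absolute counts; values up to 1.0 are
    fractions of the documents that have a value for the field.
    """
    def to_count(setting):
        return setting if setting > 1.0 else setting * docs_with_value

    return to_count(min_freq), to_count(max_freq)


def loaded_terms(doc_freqs, docs_with_value, min_freq, max_freq):
    """Return the terms whose document frequency lies within the bounds."""
    low, high = frequency_bounds(docs_with_value, min_freq, max_freq)
    return sorted(term for term, df in doc_freqs.items() if low <= df <= high)


# Hypothetical segment: 10,000 documents have a value for the "tag" field.
segment_doc_freqs = {"typo": 3, "python": 200, "the": 5000}

print(loaded_terms(segment_doc_freqs, 10_000, min_freq=0.001, max_freq=0.1))
# ['python']
```

With 10,000 documents, min 0.001 and max 0.1 correspond to roughly 10 and 1,000 documents, so the very rare term and the very common term are both left out of memory.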

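Match-only text field type

A variant of text that trades scoring and the efficiency of positional queries for space efficiency. This field effectively stores data the same way as a text field that only indexes documents (index_options: docs) and disables norms (norms: false). Term queries perform as fast as, if not faster than, on text fields. However, queries that need positions, such as the match_phrase query, perform slower because they need to look at the _source document to verify whether a phrase matches. All queries return constant scores equal to 1.0.

Analysis is not configurable: text is always analyzed with the default analyzer (standard by default).

span queries are not supported with this field. Use interval queries instead, or the text field type if you absolutely need span queries.

Other than that, match_only_text supports the same queries as text. And like text, it does not support sorting and has only limited support for aggregations.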

Python

resp = client.indices.create(
    index="logs",
    mappings={
        "properties": {
            "@timestamp": {
                "type": "date"
            },
            "message": {
                "type": "match_only_text"
            }
        }
    },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'logs',
  body: {
   mappings: {
   properties: {
   "@timestamp": {
   type: 'date'
   },
   message: {
   type: 'match_only_text'
   }
   }
   }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "logs",
  mappings: {
   properties: {
   "@timestamp": {
   type: "date",
   },
   message: {
   type: "match_only_text",
   },
   },
  },
});
console.log(response);

Console

PUT logs
{
  "mappings": {
   "properties": {
   "@timestamp": {
   "type": "date"
   },
   "message": {
   "type": "match_only_text"
   }
   }
  }
}
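
The reason positional queries are slower on this field type can be sketched as a two-step search: find candidate documents from an index that stores no positions, then re-read each candidate's source to check the phrase. This is a conceptual model only, with hypothetical helper names and a whitespace tokenizer, not the actual Elasticsearch implementation.

```python
def toy_tokens(text):
    """Toy stand-in for analysis: lowercase and split on whitespace."""
    return text.lower().split()


def build_docs_only_index(sources):
    """Index terms to document ids only, with no positions (like index_options: docs)."""
    index = {}
    for doc_id, text in sources.items():
        for term in toy_tokens(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def phrase_search(phrase, index, sources):
    """Find candidates from the index, then verify the phrase against each source."""
    terms = toy_tokens(phrase)
    candidate_sets = [index.get(term, set()) for term in terms]
    candidates = set.intersection(*candidate_sets) if candidate_sets else set()
    hits = []
    for doc_id in sorted(candidates):
        tokens = toy_tokens(sources[doc_id])  # re-read and re-analyze the source
        windows = (tokens[i:i + len(terms)] for i in range(len(tokens)))
        if any(window == terms for window in windows):
            hits.append(doc_id)
    return hits


# Hypothetical log messages, keyed by document id.
logs = {
    1: "connection reset by peer",
    2: "peer reset the connection",
}
index = build_docs_only_index(logs)

print(phrase_search("connection reset", index, logs))  # [1]
```

Both documents contain the two terms, so both are candidates; only the extra pass over the source text tells them apart, and that pass is the additional cost a positional query pays on this field type.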

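Parameters for match-only text fields

The following mapping parameters are accepted:

`fields`	Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers.
`meta`	Metadata about the field.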