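Field data types - Keyword - Elasticsearch Guide v8.15

Keyword type family
Keyword field type
Mapping numeric identifiers
Parameters for basic keyword fields
Synthetic _source
Constant keyword field type
Parameters for constant keyword fields
Wildcard field type
Parameters for wildcard fields
Limitations
Synthetic _source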

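Keyword type family

The keyword family includes the following field types:

keyword, which is used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
constant_keyword, for keyword fields that always contain the same value.
wildcard, for unstructured machine-generated content. The wildcard type is optimized for fields with large values or high cardinality.

Keyword fields are often used in sorting, aggregations, and term-level queries, such as term.

Avoid using keyword fields for full-text search. Use the text field type instead.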
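
As a rough illustration of the typical uses named above (term-level query, sorting, aggregation), the sketch below builds a single search body that does all three on one keyword field. The helper name `term_search_body`, the field `tags`, and the value `production` are assumptions made for this example only.

```python
def term_search_body(field, value, bucket_count=10):
    """Build a search body that filters, sorts, and aggregates on one keyword field."""
    return {
        # Exact-value match: keyword fields are not analyzed, so the value must match as-is.
        "query": {"term": {field: {"value": value}}},
        # Sort on the same keyword field.
        "sort": [{field: {"order": "asc"}}],
        # Bucket documents by the distinct values of the field.
        "aggs": {f"by_{field}": {"terms": {"field": field, "size": bucket_count}}},
    }


body = term_search_body("tags", "production")
print(body)
```

With the Python client used in the examples on this page, such a body could presumably be sent as `client.search(index="my-index-000001", **body)`.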

Keyword field type

Below is an example of a mapping for a basic keyword field:

Python

resp = client.indices.create(
   index="my-index-000001",
   mappings={
   "properties": {
   "tags": {
   "type": "keyword"
   }
   }
   },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
   mappings: {
   properties: {
   tags: {
   type: 'keyword'
   }
   }
   }
  }
)
puts response

Go

res, err := es.Indices.Create(
    "my-index-000001",
    es.Indices.Create.WithBody(strings.NewReader(`{
      "mappings": {
      "properties": {
      "tags": {
      "type": "keyword"
      }
      }
      }
    }`)),
)
fmt.Println(res, err)

Js

const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
   properties: {
   tags: {
   type: "keyword",
   },
   },
  },
});
console.log(response);

Console

PUT my-index-000001
{
  "mappings": {
   "properties": {
   "tags": {
   "type":  "keyword"
   }
   }
  }
}

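Mapping numeric identifiers

Not all numeric data should be mapped as a numeric field data type. Elasticsearch optimizes numeric fields, such as integer or long, for range queries. However, keyword fields are better for term and other term-level queries.

Identifiers, such as an ISBN or a product ID, are rarely used in range queries. However, they are often retrieved using term-level queries.

Consider mapping a numeric identifier as a keyword if:

You don't plan to search for the identifier data using range queries.
Fast retrieval is important. term query searches on keyword fields are often faster than term searches on numeric fields.

If you're unsure which to use, you can use a multi-field to map the data as both a keyword and a numeric data type.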
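
The multi-field approach could look roughly like the sketch below, which maps an identifier as a keyword with a numeric sub-field. The helper name `identifier_mapping`, the field name `product_id`, and the sub-field name `numeric` are hypothetical choices for this example.

```python
def identifier_mapping(field_name, numeric_type="long"):
    """Map an identifier as keyword, with a numeric multi-field for range queries."""
    return {
        "properties": {
            field_name: {
                # Main field: fast term-level lookups.
                "type": "keyword",
                "fields": {
                    # Sub-field (hypothetical name): same value indexed as a number.
                    "numeric": {"type": numeric_type}
                },
            }
        }
    }


mappings = identifier_mapping("product_id")
print(mappings)
```

Under this mapping, term queries would target `product_id` and range queries would target `product_id.numeric`.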

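Parameters for basic keyword fields

The following parameters are accepted by keyword fields:

doc_values
Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false.
eager_global_ordinals
Should global ordinals be loaded eagerly on refresh? Accepts true or false (default). Enabling this is a good idea on fields that are frequently used for terms aggregations.
fields
Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations.
ignore_above
Do not index any string longer than this value. Defaults to 2147483647 so that all values are accepted. Note, however, that the default dynamic mapping rules create a keyword sub-field that overrides this default by setting ignore_above: 256.
index
Should the field be quickly searchable? Accepts true (default) and false. keyword fields that only have doc_values enabled can still be queried, albeit slower.
index_options
What information should be stored in the index for scoring purposes. Defaults to docs but can also be set to freqs to take term frequency into account when computing scores.
meta
Metadata about the field.
norms
Whether field length should be taken into account when scoring queries. Accepts true or false (default).
null_value
Accepts a string value which is substituted for any explicit null values. Defaults to null, which means the field is treated as missing. Note that this cannot be set if the script value is used.
on_script_error
Defines what to do if the script defined by the script parameter throws an error at indexing time. Accepts fail (default), which causes the entire document to be rejected, and continue, which registers the field in the document's _ignored metadata field and continues indexing. This parameter can only be set if the script field is also set.
script
If this parameter is set, the field indexes values generated by this script rather than reading the values directly from the source. If a value is set for this field on the input document, the document is rejected with an error. Scripts are in the same format as their runtime equivalent. Values emitted by the script are normalized as usual, and are ignored if they are longer than the value set on ignore_above.
store
Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default).
similarity
Which scoring algorithm or similarity should be used. Defaults to BM25.
normalizer
How to pre-process the keyword prior to indexing. Defaults to null, meaning the keyword is kept as-is.
split_queries_on_whitespace
Whether full text queries should split the input on whitespace when building a query for this field. Accepts true or false (default).
time_series_dimension
(Optional, Boolean)
Marks the field as a time series dimension. Defaults to false.
The index.mapping.dimension_fields.limit index setting limits the number of dimensions in an index.
Dimension fields have the following constraints:
- The doc_values and index mapping parameters must be true.
- Field values cannot be an array or multi-value.
- Dimension values are used to identify a document's time series. If dimension values are altered in any way during indexing, the document is stored as belonging to a different time series than intended. As a result, there are additional constraints:
  - The field cannot use a normalizer.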
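
The dimension constraints listed above can be checked against a field mapping before it is sent to the cluster. The following is a minimal sketch of such a check; `dimension_violations` is a hypothetical helper written for this example and is not part of Elasticsearch or its clients. It only covers the mapping-level constraints, since the "no arrays" rule applies to document values.

```python
def dimension_violations(field_mapping):
    """Return the mapping-level dimension constraints that a keyword mapping breaks."""
    violations = []
    if not field_mapping.get("time_series_dimension", False):
        return violations  # not a dimension, nothing to check
    # doc_values and index both default to true, so only explicit overrides can fail.
    if field_mapping.get("doc_values", True) is not True:
        violations.append("doc_values must be true")
    if field_mapping.get("index", True) is not True:
        violations.append("index must be true")
    # normalizer defaults to null; any configured normalizer is rejected.
    if field_mapping.get("normalizer") is not None:
        violations.append("normalizer is not allowed")
    return violations


print(dimension_violations({"type": "keyword", "time_series_dimension": True}))
print(dimension_violations({
    "type": "keyword",
    "time_series_dimension": True,
    "doc_values": False,
    "normalizer": "lowercase",
}))
```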

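Synthetic _source

Synthetic _source is Generally Available only for TSDB indices (indices that have index.mode set to time_series). For other indices, synthetic _source is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.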


By default, synthetic source sorts keyword fields and removes duplicates. For example:

Python

resp = client.indices.create(
    index="idx",
    mappings={
        "_source": {
            "mode": "synthetic"
        },
        "properties": {
            "kwd": {
                "type": "keyword"
            }
        }
    },
)
print(resp)
resp1 = client.index(
    index="idx",
    id="1",
    document={
        "kwd": [
            "foo",
            "foo",
            "bar",
            "baz"
        ]
    },
)
print(resp1)

Ruby

response = client.indices.create(
  index: 'idx',
  body: {
   mappings: {
   _source: {
   mode: 'synthetic'
   },
   properties: {
   kwd: {
   type: 'keyword'
   }
   }
   }
  }
)
puts response
response = client.index(
  index: 'idx',
  id: 1,
  body: {
   kwd: [
   'foo',
   'foo',
   'bar',
   'baz'
   ]
  }
)
puts response

Js

const response = await client.indices.create({
  index: "idx",
  mappings: {
   _source: {
   mode: "synthetic",
   },
   properties: {
   kwd: {
   type: "keyword",
   },
   },
  },
});
console.log(response);
const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
   kwd: ["foo", "foo", "bar", "baz"],
  },
});
console.log(response1);

Console

PUT idx
{
  "mappings": {
   "_source": { "mode": "synthetic" },
   "properties": {
   "kwd": { "type": "keyword" }
   }
  }
}
PUT idx/_doc/1
{
  "kwd": ["foo", "foo", "bar", "baz"]
}

Will become:

Console-Result

{
  "kwd": ["bar", "baz", "foo"]
}

If a keyword field sets store to true, then order and duplicates are preserved. For example:

Python

resp = client.indices.create(
   index="idx",
   mappings={
   "_source": {
   "mode": "synthetic"
   },
   "properties": {
   "kwd": {
   "type": "keyword",
   "store": True
   }
   }
   },
)
print(resp)
resp1 = client.index(
   index="idx",
   id="1",
   document={
   "kwd": [
   "foo",
   "foo",
   "bar",
   "baz"
   ]
   },
)
print(resp1)

Ruby

response = client.indices.create(
  index: 'idx',
  body: {
   mappings: {
   _source: {
   mode: 'synthetic'
   },
   properties: {
   kwd: {
   type: 'keyword',
   store: true
   }
   }
   }
  }
)
puts response
response = client.index(
  index: 'idx',
  id: 1,
  body: {
   kwd: [
   'foo',
   'foo',
   'bar',
   'baz'
   ]
  }
)
puts response

Js

const response = await client.indices.create({
  index: "idx",
  mappings: {
   _source: {
   mode: "synthetic",
   },
   properties: {
   kwd: {
   type: "keyword",
   store: true,
   },
   },
  },
});
console.log(response);
const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
   kwd: ["foo", "foo", "bar", "baz"],
  },
});
console.log(response1);

Console

PUT idx
{
  "mappings": {
   "_source": { "mode": "synthetic" },
   "properties": {
   "kwd": { "type": "keyword", "store": true }
   }
  }
}
PUT idx/_doc/1
{
  "kwd": ["foo", "foo", "bar", "baz"]
}

Will become:

Console-Result

{
  "kwd": ["foo", "foo", "bar", "baz"]
}


Values longer than ignore_above are preserved but sorted to the end. For example:

Python

resp = client.indices.create(
    index="idx",
    mappings={
        "_source": {
            "mode": "synthetic"
        },
        "properties": {
            "kwd": {
                "type": "keyword",
                "ignore_above": 3
            }
        }
    },
)
print(resp)
resp1 = client.index(
    index="idx",
    id="1",
    document={
        "kwd": [
            "foo",
            "foo",
            "bang",
            "bar",
            "baz"
        ]
    },
)
print(resp1)

Ruby

response = client.indices.create(
  index: 'idx',
  body: {
   mappings: {
   _source: {
   mode: 'synthetic'
   },
   properties: {
   kwd: {
   type: 'keyword',
   ignore_above: 3
   }
   }
   }
  }
)
puts response
response = client.index(
  index: 'idx',
  id: 1,
  body: {
   kwd: [
   'foo',
   'foo',
   'bang',
   'bar',
   'baz'
   ]
  }
)
puts response

Js

const response = await client.indices.create({
  index: "idx",
  mappings: {
   _source: {
   mode: "synthetic",
   },
   properties: {
   kwd: {
   type: "keyword",
   ignore_above: 3,
   },
   },
  },
});
console.log(response);
const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
   kwd: ["foo", "foo", "bang", "bar", "baz"],
  },
});
console.log(response1);

Console

PUT idx
{
  "mappings": {
   "_source": { "mode": "synthetic" },
   "properties": {
   "kwd": { "type": "keyword", "ignore_above": 3 }
   }
  }
}
PUT idx/_doc/1
{
  "kwd": ["foo", "foo", "bang", "bar", "baz"]
}

Will become:

Console-Result

{
  "kwd": ["bar", "baz", "foo", "bang"]
}
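
The three results shown in this section can be reproduced with a small pure-Python model. This is only a toy sketch of the documented outcomes, not how Elasticsearch rebuilds _source internally. The function name is hypothetical, and two details are assumptions: over-long values are appended in their original order, and length is counted with Python's `len`.

```python
def synthetic_keyword_values(values, store=False, ignore_above=None):
    """Model the array that synthetic _source returns for a keyword field."""
    if store:
        # Stored fields keep order and duplicates. The interplay between
        # store and ignore_above is not modeled here.
        return list(values)
    if ignore_above is None:
        kept, ignored = list(values), []
    else:
        kept = [v for v in values if len(v) <= ignore_above]
        # Assumption: over-long values keep their original relative order.
        ignored = [v for v in values if len(v) > ignore_above]
    # Values within the limit are deduplicated and sorted; the rest go last.
    return sorted(set(kept)) + ignored


print(synthetic_keyword_values(["foo", "foo", "bar", "baz"]))
print(synthetic_keyword_values(["foo", "foo", "bar", "baz"], store=True))
print(synthetic_keyword_values(["foo", "foo", "bang", "bar", "baz"], ignore_above=3))
```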

Constant keyword field type

Constant keyword is a specialization of the keyword field for the case where all documents in the index have the same value.

Python

resp = client.indices.create(
   index="logs-debug",
   mappings={
   "properties": {
   "@timestamp": {
   "type": "date"
   },
   "message": {
   "type": "text"
   },
   "level": {
   "type": "constant_keyword",
   "value": "debug"
   }
   }
   },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'logs-debug',
  body: {
   mappings: {
   properties: {
   "@timestamp": {
   type: 'date'
   },
   message: {
   type: 'text'
   },
   level: {
   type: 'constant_keyword',
   value: 'debug'
   }
   }
   }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "logs-debug",
  mappings: {
   properties: {
   "@timestamp": {
   type: "date",
   },
   message: {
   type: "text",
   },
   level: {
   type: "constant_keyword",
   value: "debug",
   },
   },
  },
});
console.log(response);

Console

PUT logs-debug
{
  "mappings": {
   "properties": {
   "@timestamp": {
   "type": "date"
   },
   "message": {
   "type": "text"
   },
   "level": {
   "type": "constant_keyword",
   "value": "debug"
   }
   }
  }
}


You may submit documents that either have no value for the field or have a value equal to the one configured in the mappings. The following two indexing requests are equivalent:

Python

resp = client.index(
    index="logs-debug",
    document={
        "date": "2019-12-12",
        "message": "Starting up Elasticsearch",
        "level": "debug"
    },
)
print(resp)
resp1 = client.index(
    index="logs-debug",
    document={
        "date": "2019-12-12",
        "message": "Starting up Elasticsearch"
    },
)
print(resp1)

Ruby

response = client.index(
  index: 'logs-debug',
  body: {
   date: '2019-12-12',
   message: 'Starting up Elasticsearch',
   level: 'debug'
  }
)
puts response
response = client.index(
  index: 'logs-debug',
  body: {
   date: '2019-12-12',
   message: 'Starting up Elasticsearch'
  }
)
puts response

Js

const response = await client.index({
  index: "logs-debug",
  document: {
   date: "2019-12-12",
   message: "Starting up Elasticsearch",
   level: "debug",
  },
});
console.log(response);
const response1 = await client.index({
  index: "logs-debug",
  document: {
   date: "2019-12-12",
   message: "Starting up Elasticsearch",
  },
});
console.log(response1);

Console

POST logs-debug/_doc
{
  "date": "2019-12-12",
  "message": "Starting up Elasticsearch",
  "level": "debug"
}
POST logs-debug/_doc
{
  "date": "2019-12-12",
  "message": "Starting up Elasticsearch"
}

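However, providing a value that differs from the one configured in the mapping is not allowed.

If no value is provided in the mappings, the field automatically configures itself based on the value contained in the first indexed document. While this behavior can be convenient, note that a single poisonous document with a wrong value can cause all other documents to be rejected.

Before a value has been provided (either through the mappings or from a document), queries on the field will not match any documents. This includes exists queries.

The value of the field cannot be changed after it has been set.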
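
The acceptance rules described above can be summarized in a small toy model. This is a sketch of the documented behavior only, not Elasticsearch code. The class name `ConstantKeywordField` and the error message are invented for the example.

```python
class ConstantKeywordField:
    """Toy model of how a constant_keyword field accepts or rejects documents."""

    def __init__(self, value=None):
        # None means no value was configured in the mapping yet.
        self.value = value

    def index(self, document, field="level"):
        if field not in document:
            return  # documents without the field are always accepted
        incoming = document[field]
        if self.value is None:
            # The first indexed value configures the field for good.
            self.value = incoming
        elif incoming != self.value:
            raise ValueError(
                f"field [{field}] only accepts [{self.value}], got [{incoming}]"
            )


level = ConstantKeywordField()  # mapping without a value
level.index({"message": "Starting up Elasticsearch"})
level.index({"message": "Starting up Elasticsearch", "level": "debug"})
try:
    level.index({"message": "Something else", "level": "info"})
except ValueError as error:
    print("rejected:", error)
```

The last call illustrates the "poisonous document" risk in reverse: had the `info` document been indexed first, every later `debug` document would have been the one rejected.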

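Parameters for constant keyword fields

The following mapping parameters are accepted:


`meta`	Metadata about the field.
`value`	The value to associate with all documents in the index. If this parameter is not provided, it is set based on the first document that gets indexed.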

Wildcard field type

**Mapping unstructured content**

You can map a field containing unstructured content to either a `text` or a keyword family field. The best field type depends on the nature of the content and how you plan to search the field.

Use the `text` field type if:

-  The content is human-readable, such as an email body or product description.
-  You plan to search the field for individual words or phrases, such as `the brown fox jumped`, using [full text queries](/read/elasticsearch-8-15/7be69470b0c5db38.md). Elasticsearch analyzes `text` fields to return the most relevant results for these queries.

Use a keyword family field type if:

-  The content is machine-generated, such as a log message or HTTP request information.
-  You plan to search the field for exact full values, such as `org.foo.bar`, or partial character sequences, such as `org.foo.*`, using [term-level queries](/read/elasticsearch-8-15/db690ef474a813e8.md).

**Choosing a keyword family field type**

If you choose a keyword family field type, you can map the field as a `keyword` or `wildcard` field depending on the cardinality and size of the field's values. Use the `wildcard` type if you plan to regularly search the field using a [`wildcard`](/read/elasticsearch-8-15/883ff0514b763f15.md) or [`regexp`](/read/elasticsearch-8-15/18c0a8d633e2b8f9.md) query and meet one of the following criteria:

-  The field contains more than a million unique values, AND you plan to regularly search the field using a pattern with leading wildcards, such as `*foo` or `*baz`.
-  The field contains values larger than 32KB, AND you plan to regularly search the field using any wildcard pattern.

Otherwise, use the `keyword` field type for faster searches, faster indexing, and lower storage costs. For an in-depth comparison and decision flowchart, see the [related blog post](https://www.elastic.co/blog/find-strings-within-strings-faster-with-the-new-elasticsearch-wildcard-field).
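
The selection criteria above can be written down as a small decision function. This is a simplified sketch. The function and parameter names are hypothetical, and a single flag stands in for "regularly searches with wildcard or regexp queries", which is also taken to satisfy the "any wildcard pattern" condition.

```python
ONE_MILLION = 1_000_000
LARGE_VALUE_BYTES = 32 * 1024  # 32KB


def choose_keyword_family_type(unique_values, max_value_bytes,
                               regular_wildcard_search, leading_wildcards):
    """Return "wildcard" or "keyword" following the criteria listed above."""
    if not regular_wildcard_search:
        return "keyword"
    # Criterion 1: high cardinality AND patterns with leading wildcards.
    high_cardinality = unique_values > ONE_MILLION and leading_wildcards
    # Criterion 2: very large values AND any regular wildcard searching.
    large_values = max_value_bytes > LARGE_VALUE_BYTES
    return "wildcard" if (high_cardinality or large_values) else "keyword"


print(choose_keyword_family_type(5_000_000, 200, True, True))
print(choose_keyword_family_type(5_000_000, 200, True, False))
print(choose_keyword_family_type(1_000, 64 * 1024, True, False))
```
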

**Switching from a `text` field to a keyword field**

If you previously used a `text` field to index unstructured machine-generated content, you can [reindex to update the mapping](e0af58ddfb4e9b0b.md#update-mapping) to a `keyword` or `wildcard` field. We also recommend you update your application or workflow to replace any word-based [full text queries](/read/elasticsearch-8-15/7be69470b0c5db38.md) on the field with equivalent [term-level queries](/read/elasticsearch-8-15/db690ef474a813e8.md).

Internally, the `wildcard` field indexes the whole field value using ngrams and stores the full string. The index is used as a rough filter to cut down the number of values, which are then checked by retrieving and verifying the full values. This field is especially well suited to running grep-like queries on log lines. Storage costs are typically lower than those of `keyword` fields, but search speeds for exact matches on full terms are slower. If the field values share many prefixes, such as URLs for the same website, storage costs for a `wildcard` field may be higher than for an equivalent `keyword` field.
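
To make the two-phase idea concrete (ngram index as a rough filter, then verification against the full string), here is a toy sketch in pure Python. It is not Elasticsearch's implementation. The ngram size of 3, the way literal fragments are extracted from the pattern, and the use of `fnmatch` for verification are all assumptions of this example.

```python
import fnmatch
import re

NGRAM_SIZE = 3  # assumed for illustration


def ngrams(text, n=NGRAM_SIZE):
    """Return the set of all n-character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_ngram_index(values):
    """Map each ngram to the positions of the values that contain it."""
    index = {}
    for position, value in enumerate(values):
        for gram in ngrams(value):
            index.setdefault(gram, set()).add(position)
    return index


def wildcard_search(values, index, pattern):
    """Phase 1: rough ngram filter. Phase 2: verify against the full strings."""
    candidates = set(range(len(values)))
    # Literal fragments between wildcards; fragments shorter than the
    # ngram size yield no ngrams and therefore do not narrow the candidates.
    for fragment in re.split(r"[*?]", pattern):
        for gram in ngrams(fragment):
            candidates &= index.get(gram, set())
    # fnmatchcase treats * and ? as wildcards; patterns containing '['
    # are not handled by this sketch.
    return sorted(p for p in candidates if fnmatch.fnmatchcase(values[p], pattern))


values = [
    "This string can be quite lengthy",
    "quite lengthy, but it goes on",
    "A short one",
]
index = build_ngram_index(values)
matches = wildcard_search(values, index, "*quite*lengthy")
print([values[p] for p in matches])  # ['This string can be quite lengthy']
```

The second value survives the ngram filter because it contains every ngram of `quite` and `lengthy`, and is only discarded in the verification step, which is why the full string has to be stored alongside the ngram index.
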

You index and search a wildcard field as follows:

Python

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "my_wildcard": {
                "type": "wildcard"
            }
        }
    },
)
print(resp)
resp1 = client.index(
    index="my-index-000001",
    id="1",
    document={
        "my_wildcard": "This string can be quite lengthy"
    },
)
print(resp1)
resp2 = client.search(
    index="my-index-000001",
    query={
        "wildcard": {
            "my_wildcard": {
                "value": "*quite*lengthy"
            }
        }
    },
)
print(resp2)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
   mappings: {
   properties: {
   my_wildcard: {
   type: 'wildcard'
   }
   }
   }
  }
)
puts response
response = client.index(
  index: 'my-index-000001',
  id: 1,
  body: {
   my_wildcard: 'This string can be quite lengthy'
  }
)
puts response
response = client.search(
  index: 'my-index-000001',
  body: {
   query: {
   wildcard: {
   my_wildcard: {
   value: '*quite*lengthy'
   }
   }
   }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
   properties: {
   my_wildcard: {
   type: "wildcard",
   },
   },
  },
});
console.log(response);
const response1 = await client.index({
  index: "my-index-000001",
  id: 1,
  document: {
   my_wildcard: "This string can be quite lengthy",
  },
});
console.log(response1);
const response2 = await client.search({
  index: "my-index-000001",
  query: {
   wildcard: {
   my_wildcard: {
   value: "*quite*lengthy",
   },
   },
  },
});
console.log(response2);

Console

PUT my-index-000001
{
  "mappings": {
   "properties": {
   "my_wildcard": {
   "type": "wildcard"
   }
   }
  }
}
PUT my-index-000001/_doc/1
{
  "my_wildcard" : "This string can be quite lengthy"
}
GET my-index-000001/_search
{
  "query": {
   "wildcard": {
   "my_wildcard": {
   "value": "*quite*lengthy"
   }
   }
  }
}

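Parameters for wildcard fields

The following parameters are accepted by wildcard fields:


`null_value`	Accepts a string value which is substituted for any explicit `null` values. Defaults to `null`, which means the field is treated as missing.
`ignore_above`	Do not index any string longer than this value. Defaults to `2147483647` so that all values are accepted.

Limitations

wildcard fields are untokenized like keyword fields, so they do not support queries that rely on word positions, such as phrase queries.
When running wildcard queries, any rewrite parameter is ignored. The scoring is always a constant score.

Synthetic _source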


Synthetic source always sorts wildcard fields. For example:

Python

resp = client.indices.create(
    index="idx",
    mappings={
        "_source": {
            "mode": "synthetic"
        },
        "properties": {
            "card": {
                "type": "wildcard"
            }
        }
    },
)
print(resp)
resp1 = client.index(
    index="idx",
    id="1",
    document={
        "card": [
            "king",
            "ace",
            "ace",
            "jack"
        ]
    },
)
print(resp1)

Ruby

response = client.indices.create(
  index: 'idx',
  body: {
   mappings: {
   _source: {
   mode: 'synthetic'
   },
   properties: {
   card: {
   type: 'wildcard'
   }
   }
   }
  }
)
puts response
response = client.index(
  index: 'idx',
  id: 1,
  body: {
   card: [
   'king',
   'ace',
   'ace',
   'jack'
   ]
  }
)
puts response

Js

const response = await client.indices.create({
  index: "idx",
  mappings: {
   _source: {
   mode: "synthetic",
   },
   properties: {
   card: {
   type: "wildcard",
   },
   },
  },
});
console.log(response);
const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
   card: ["king", "ace", "ace", "jack"],
  },
});
console.log(response1);

Console

PUT idx
{
  "mappings": {
   "_source": { "mode": "synthetic" },
   "properties": {
   "card": { "type": "wildcard" }
   }
  }
}
PUT idx/_doc/1
{
  "card": ["king", "ace", "ace", "jack"]
}

Will become:

Console-Result

{
  "card": ["ace", "jack", "king"]
}