Fix common cluster issues - Red or yellow cluster health status

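Red or yellow cluster health status

A red or yellow cluster health status indicates that one or more shards are not assigned to a node.

Red health status: The cluster has some unassigned primary shards, which means that some operations, such as searches and indexing, may fail.
Yellow health status: The cluster has no unassigned primary shards but some unassigned replica shards. This increases the risk of data loss and can degrade cluster performance.

When the cluster has a red or yellow health status, it continues to process searches and indexing where possible, but it may delay certain management and cleanup activities until the cluster returns to green health status. For example, some ILM actions require the index they operate on to have a green health status.

In many cases, the cluster recovers to green health status automatically. If it does not, you must address the remaining problems manually so that management and cleanup activities can proceed.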

Diagnose your cluster status

Check your cluster status

Use the cluster health API.

Python

resp = client.cluster.health(
   filter_path="status,*_shards",
)
print(resp)

Ruby

response = client.cluster.health(
  filter_path: 'status,*_shards'
)
puts response

Js

const response = await client.cluster.health({
  filter_path: "status,*_shards",
});
console.log(response);

Console

GET _cluster/health?filter_path=status,*_shards

A healthy cluster has a green status and zero unassigned_shards. A yellow status means only replicas are unassigned. A red status means one or more primary shards are unassigned.
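As a rough illustration of how these fields could be read in a script, the sketch below interprets a health response that has already been fetched. The function name `summarize_health` and the sample dictionary are hypothetical; only the `status` and `unassigned_shards` keys come from the request above.

```python
def summarize_health(health):
    """Turn a cluster health response (a dict) into a one-line summary."""
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    if status == "green" and unassigned == 0:
        return "green: all shards are assigned"
    if status == "yellow":
        return f"yellow: {unassigned} replica shard(s) unassigned"
    if status == "red":
        return f"red: {unassigned} shard(s) unassigned, including at least one primary"
    return f"{status}: {unassigned} shard(s) unassigned"


# Hypothetical sample shaped like the filtered health response.
sample = {"status": "yellow", "active_shards": 10, "unassigned_shards": 2}
print(summarize_health(sample))
```

In practice the dictionary would come from `client.cluster.health(...)` as shown in the Python example above.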

View unassigned shards

To view unassigned shards, use the cat shards API.

Python

resp = client.cat.shards(
   v=True,
   h="index,shard,prirep,state,node,unassigned.reason",
   s="state",
)
print(resp)

Ruby

response = client.cat.shards(
  v: true,
  h: 'index,shard,prirep,state,node,unassigned.reason',
  s: 'state'
)
puts response

Js

const response = await client.cat.shards({
  v: true,
  h: "index,shard,prirep,state,node,unassigned.reason",
  s: "state",
});
console.log(response);

Console

GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state

Unassigned shards have a state of UNASSIGNED. The prirep value is p for primary shards and r for replicas.
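If you want to process this output in a script, one option is to request the cat output as JSON (the cat APIs accept `format=json`) and filter the rows. The sketch below assumes the rows have already been fetched as a list of dictionaries with the same column names as above; the helper names and the sample data are hypothetical.

```python
def find_unassigned(rows):
    """Return only the rows whose state is UNASSIGNED."""
    return [row for row in rows if row.get("state") == "UNASSIGNED"]


def split_by_type(rows):
    """Split shard rows into (primaries, replicas) using the prirep column."""
    primaries = [row for row in rows if row.get("prirep") == "p"]
    replicas = [row for row in rows if row.get("prirep") == "r"]
    return primaries, replicas


# Hypothetical rows shaped like `GET _cat/shards?format=json&h=...` output.
rows = [
    {"index": "my-index", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "my-index", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
]
primaries, replicas = split_by_type(find_unassigned(rows))
print(len(primaries), "unassigned primaries;", len(replicas), "unassigned replicas")
```

Any unassigned primary in the result corresponds to a red status; unassigned replicas alone correspond to yellow.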

To understand why an unassigned shard is not being assigned and what action you must take so that Elasticsearch can assign it, use the cluster allocation explain API.

Python

resp = client.cluster.allocation_explain(
   filter_path="index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*",
   index="my-index",
   shard=0,
   primary=False,
)
print(resp)

Ruby

response = client.cluster.allocation_explain(
  filter_path: 'index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*',
  body: {
   index: 'my-index',
   shard: 0,
   primary: false
  }
)
puts response

Js

const response = await client.cluster.allocationExplain({
  filter_path:
   "index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*",
  index: "my-index",
  shard: 0,
  primary: false,
});
console.log(response);

Console

GET _cluster/allocation/explain?filter_path=index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}
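The filtered response lists, for each node, the deciders that were consulted. A minimal sketch of how it might be condensed is shown below, assuming each entry in `node_allocation_decisions` carries a `node_name` and a list of `deciders` with `decider` and `decision` fields. The helper name `blocking_deciders` and the sample response, including its explanation text, are illustrative only.

```python
def blocking_deciders(explain):
    """Map each node name to the deciders that returned a NO decision."""
    blocked = {}
    for node in explain.get("node_allocation_decisions", []):
        reasons = [
            decider["decider"]
            for decider in node.get("deciders", [])
            if decider.get("decision") == "NO"
        ]
        if reasons:
            blocked[node.get("node_name", "unknown")] = reasons
    return blocked


# Hypothetical sample; the explanation strings are placeholders.
sample = {
    "index": "my-index",
    "node_allocation_decisions": [
        {
            "node_name": "node-1",
            "deciders": [
                {"decider": "same_shard", "decision": "NO", "explanation": "(placeholder)"},
            ],
        },
    ],
}
print(blocking_deciders(sample))
```

The decider names that appear here usually point to one of the fixes described in the following sections.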

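Fix a red or yellow cluster status

A shard can become unassigned for several reasons. The following tips outline the most common causes and their solutions.

Re-enable shard allocation

You typically disable allocation during a restart or other cluster maintenance. If you forgot to re-enable allocation afterward, Elasticsearch cannot assign shards. To re-enable allocation, reset the cluster.routing.allocation.enable cluster setting.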

Python

resp = client.cluster.put_settings(
    persistent={
        "cluster.routing.allocation.enable": None
    },
)
print(resp)

Ruby

response = client.cluster.put_settings(
  body: {
    persistent: {
      'cluster.routing.allocation.enable' => nil
    }
  }
)
puts response

Js

const response = await client.cluster.putSettings({
  persistent: {
    "cluster.routing.allocation.enable": null,
  },
});
console.log(response);

Console

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.enable" : null
  }
}

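Recover lost nodes

Shards often become unassigned when a data node leaves the cluster. This can happen for several reasons, ranging from connectivity issues to hardware failure. After you resolve the issue and recover the node, it rejoins the cluster. Elasticsearch then automatically allocates any unassigned shards.

To avoid wasting resources on temporary issues, Elasticsearch delays allocation by one minute by default. If you have recovered a node and do not want to wait for the delay period, you can call the cluster reroute API with no arguments to start the allocation process. The process runs asynchronously in the background.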

Python

resp = client.cluster.reroute(
   metric="none",
)
print(resp)

Ruby

response = client.cluster.reroute(
  metric: 'none'
)
puts response

Js

const response = await client.cluster.reroute({
  metric: "none",
});
console.log(response);

Console

POST _cluster/reroute?metric=none

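Fix allocation settings

Misconfigured allocation settings can result in an unassigned primary shard. These settings include:

Shard allocation index settings
Allocation filtering cluster settings
Allocation awareness cluster settings

To review your allocation settings, use the get index settings and get cluster settings APIs.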

Python

resp = client.indices.get_settings(
   index="my-index",
   flat_settings=True,
   include_defaults=True,
)
print(resp)
resp1 = client.cluster.get_settings(
   flat_settings=True,
   include_defaults=True,
)
print(resp1)

Ruby

response = client.indices.get_settings(
  index: 'my-index',
  flat_settings: true,
  include_defaults: true
)
puts response
response = client.cluster.get_settings(
  flat_settings: true,
  include_defaults: true
)
puts response

Js

const response = await client.indices.getSettings({
  index: "my-index",
  flat_settings: true,
  include_defaults: true,
});
console.log(response);
const response1 = await client.cluster.getSettings({
  flat_settings: true,
  include_defaults: true,
});
console.log(response1);

Console

GET my-index/_settings?flat_settings=true&include_defaults=true
GET _cluster/settings?flat_settings=true&include_defaults=true
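With include_defaults enabled the responses can be long. The sketch below shows one way to narrow a flat cluster settings response down to the allocation-related keys, assuming the response is a dictionary with `persistent`, `transient`, and `defaults` sections. The helper name `allocation_settings` and the sample values are hypothetical.

```python
ALLOCATION_PREFIX = "cluster.routing.allocation."


def allocation_settings(cluster_settings):
    """Keep only allocation-related keys from a flat cluster settings response."""
    found = {}
    for section in ("persistent", "transient", "defaults"):
        matches = {
            key: value
            for key, value in cluster_settings.get(section, {}).items()
            if key.startswith(ALLOCATION_PREFIX)
        }
        if matches:
            found[section] = matches
    return found


# Hypothetical sample shaped like a flat_settings=true response.
sample = {
    "persistent": {"cluster.routing.allocation.enable": "primaries"},
    "transient": {},
    "defaults": {"cluster.name": "my-cluster"},
}
print(allocation_settings(sample))
```

The same idea applies to index settings by filtering on the `index.routing.allocation.` prefix instead.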

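To change the settings, use the update index settings and update cluster settings APIs.

Allocate or reduce replicas

To protect against hardware failure, Elasticsearch does not assign a replica to the same node as its primary shard. If no other data nodes are available to host the replica, it remains unassigned. To fix this, you can:

Add a data node to the same tier to host the replica.
Change the index.number_of_replicas index setting to reduce the number of replicas for each primary shard. We recommend keeping at least one replica per primary.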

Python

resp = client.indices.put_settings(
    settings={
        "index.number_of_replicas": 1
    },
)
print(resp)

Ruby

response = client.indices.put_settings(
  body: {
    'index.number_of_replicas' => 1
  }
)
puts response

Js

const response = await client.indices.putSettings({
  settings: {
    "index.number_of_replicas": 1,
  },
});
console.log(response);

Console

PUT _settings
{
  "index.number_of_replicas": 1
}
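Because a replica is never placed on the same node as its primary, and no two copies of a shard share a node, the number of replicas that can actually be assigned is bounded by the number of eligible data nodes minus one. The sketch below works through that arithmetic. It deliberately ignores tiers, allocation filtering, and disk watermarks, and the function names are hypothetical.

```python
def max_assignable_replicas(eligible_data_nodes):
    """Upper bound on replicas per primary that can find a node of their own."""
    return max(0, eligible_data_nodes - 1)


def unassignable_replicas(number_of_replicas, eligible_data_nodes):
    """Replicas per primary that will stay unassigned under the simple bound."""
    return max(0, number_of_replicas - max_assignable_replicas(eligible_data_nodes))


# Two replicas configured but only two data nodes:
# one replica per primary cannot be placed.
print(unassignable_replicas(number_of_replicas=2, eligible_data_nodes=2))
```

If the result is greater than zero, either add that many data nodes or lower index.number_of_replicas by that amount.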

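Free up or increase disk space

Elasticsearch uses a low disk watermark to ensure that data nodes have enough disk space for incoming shards. By default, Elasticsearch does not allocate shards to nodes that use more than 85% of their disk space.

To check the current disk space of your nodes, use the cat allocation API.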

Python

resp = client.cat.allocation(
   v=True,
   h="node,shards,disk.*",
)
print(resp)

Ruby

response = client.cat.allocation(
  v: true,
  h: 'node,shards,disk.*'
)
puts response

Js

const response = await client.cat.allocation({
  v: true,
  h: "node,shards,disk.*",
});
console.log(response);

Console

GET _cat/allocation?v=true&h=node,shards,disk.*
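To spot nodes that are already past the default low watermark, you could request the same output as JSON (`format=json`) and compare the `disk.percent` column with 85. The sketch below assumes the rows are dictionaries whose values are strings, as cat JSON output typically is, and that rows without disk figures should be skipped. The helper name and the sample rows are hypothetical.

```python
def nodes_over_low_watermark(rows, low_watermark_percent=85.0):
    """Return the names of nodes whose disk usage exceeds the given percentage."""
    flagged = []
    for row in rows:
        percent = row.get("disk.percent")
        if percent is None:
            continue  # rows without disk figures cannot be compared
        if float(percent) > low_watermark_percent:
            flagged.append(row.get("node"))
    return flagged


# Hypothetical rows shaped like `GET _cat/allocation?format=json&h=node,shards,disk.*`.
rows = [
    {"node": "node-1", "shards": "12", "disk.percent": "91"},
    {"node": "node-2", "shards": "10", "disk.percent": "40"},
    {"node": "UNASSIGNED", "shards": "3", "disk.percent": None},
]
print(nodes_over_low_watermark(rows))
```

The threshold is a parameter because the low watermark may have been changed from its default.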

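If your nodes are running low on disk space, you have a few options:

Upgrade your nodes to increase disk space.
Delete unneeded indices to free up space. If you use ILM, you can update your lifecycle policy to use searchable snapshots or add a delete phase. If you no longer need to search the data, you can use a snapshot to store it off-cluster.
If you no longer write to an index, use the force merge API or the ILM force merge action to merge its segments into larger ones.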

Python

resp = client.indices.forcemerge(
    index="my-index",
)
print(resp)

Ruby

response = client.indices.forcemerge(
  index: 'my-index'
)
puts response

Js

const response = await client.indices.forcemerge({
  index: "my-index",
});
console.log(response);

Console

POST my-index/_forcemerge

If an index is read-only, use the shrink index API or the ILM shrink action to reduce its primary shard count.

Python

resp = client.indices.shrink(
    index="my-index",
    target="my-shrunken-index",
)
print(resp)

Ruby

response = client.indices.shrink(
  index: 'my-index',
  target: 'my-shrunken-index'
)
puts response

Js

const response = await client.indices.shrink({
  index: "my-index",
  target: "my-shrunken-index",
});
console.log(response);

Console

POST my-index/_shrink/my-shrunken-index

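If your nodes have a large disk capacity, you can increase the low disk watermark or set it to an explicit byte value. A byte value specifies the amount of free space to keep, whereas a percentage specifies the amount of used space.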

Python

resp = client.cluster.put_settings(
    persistent={
        "cluster.routing.allocation.disk.watermark.low": "30gb"
    },
)
print(resp)

Js

const response = await client.cluster.putSettings({
  persistent: {
    "cluster.routing.allocation.disk.watermark.low": "30gb",
  },
});
console.log(response);

Console

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "30gb"
  }
}

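Reduce JVM memory pressure

Shard allocation requires JVM heap memory. High JVM memory pressure can trigger circuit breakers that stop allocation and leave shards unassigned. See High JVM memory pressure.

Recover data for a lost primary shard

If a node containing a primary shard is lost, Elasticsearch can typically replace it using a replica on another node. If you cannot recover the node and replicas do not exist or are irrecoverable, allocation explain reports no_valid_shard_copy and you must do one of the following:

Restore the missing data from a snapshot
Index the missing data from its original data source
Accept data loss on the index level by running delete index
Accept data loss on the shard level by running the cluster reroute allocate_stale_primary or allocate_empty_primary command with accept_data_loss: true

Only use this last option if node recovery is no longer possible. This process allocates an empty primary shard. If the node later rejoins the cluster, Elasticsearch overwrites its primary shard with data from this newer empty shard, resulting in data loss.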

Python

resp = client.cluster.reroute(
    metric="none",
    commands=[
        {
            "allocate_empty_primary": {
                "index": "my-index",
                "shard": 0,
                "node": "my-node",
                "accept_data_loss": True
            }
        }
    ],
)
print(resp)

Js

const response = await client.cluster.reroute({
  metric: "none",
  commands: [
    {
      allocate_empty_primary: {
        index: "my-index",
        shard: 0,
        node: "my-node",
        accept_data_loss: true,
      },
    },
  ],
});
console.log(response);

Console

POST _cluster/reroute?metric=none
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "my-node",
        "accept_data_loss": true
      }
    }
  ]
}
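Because this command discards data, a script that issues it may benefit from an explicit guard. The sketch below is one possible approach, not part of any Elasticsearch client: the hypothetical helper `empty_primary_command` only builds the command dictionary, and refuses to do so unless the caller passes `accept_data_loss=True`.

```python
def empty_primary_command(index, shard, node, accept_data_loss=False):
    """Build an allocate_empty_primary command, refusing without explicit consent."""
    if accept_data_loss is not True:
        raise ValueError(
            "allocate_empty_primary discards the shard's data; "
            "pass accept_data_loss=True to confirm"
        )
    return {
        "allocate_empty_primary": {
            "index": index,
            "shard": shard,
            "node": node,
            "accept_data_loss": True,
        }
    }


# Build the same command as in the example above.
command = empty_primary_command("my-index", 0, "my-node", accept_data_loss=True)
print(command)
```

The resulting dictionary could then be passed in the `commands` list of `client.cluster.reroute(...)`, as in the Python example above.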