Fix common cluster issues - Task queue backlog

Task queue backlog

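A backlogged task queue can prevent tasks from completing and put the cluster into an unhealthy state. Resource constraints, a large number of tasks being triggered at once, and long-running tasks can all contribute to a backlogged task queue.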

Diagnose a task queue backlog

Check the thread pool status

A depleted thread pool can result in rejected requests.

Thread pool depletion might be restricted to a specific data tier. If hot spotting is occurring, one node might be depleted sooner than the others, leading to performance issues and a growing task backlog.

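You can use the cat thread pool API to see the number of active threads in each thread pool, along with how many tasks are queued, how many have been rejected, and how many have completed.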

Python

resp = client.cat.thread_pool(
   v=True,
   s="t,n",
   h="type,name,node_name,active,queue,rejected,completed",
)
print(resp)

Ruby

response = client.cat.thread_pool(
  v: true,
  s: 't,n',
  h: 'type,name,node_name,active,queue,rejected,completed'
)
puts response

Js

const response = await client.cat.threadPool({
  v: true,
  s: "t,n",
  h: "type,name,node_name,active,queue,rejected,completed",
});
console.log(response);

Console

GET /_cat/thread_pool?v&s=t,n&h=type,name,node_name,active,queue,rejected,completed
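
As a rough illustration of how the output of this request might be triaged, the following sketch flags thread pools that have queued or rejected tasks. It assumes the rows were fetched with `format="json"`, in which case each row is a dict whose values are strings. The helper name `find_backlogged_pools`, the `queue_threshold` parameter, and the sample values are hypothetical and not part of the client.

```python
def find_backlogged_pools(rows, queue_threshold=0):
    """Return rows whose queue exceeds the threshold or that report rejections.

    `rows` is assumed to be the list of dicts returned by the cat thread pool
    API when called with format="json"; numeric columns arrive as strings.
    """
    flagged = []
    for row in rows:
        queued = int(row.get("queue", 0))
        rejected = int(row.get("rejected", 0))
        if queued > queue_threshold or rejected > 0:
            flagged.append(row)
    # Worst offenders first: most rejections, then longest queue.
    flagged.sort(
        key=lambda r: (int(r.get("rejected", 0)), int(r.get("queue", 0))),
        reverse=True,
    )
    return flagged


# Sample rows shaped like the JSON output; the values are made up.
sample_rows = [
    {"name": "write", "node_name": "node-1", "active": "8", "queue": "120", "rejected": "35"},
    {"name": "search", "node_name": "node-1", "active": "2", "queue": "0", "rejected": "0"},
    {"name": "write", "node_name": "node-2", "active": "1", "queue": "3", "rejected": "0"},
]

for row in find_backlogged_pools(sample_rows):
    print(row["node_name"], row["name"], "queue=" + row["queue"], "rejected=" + row["rejected"])
```

With a real cluster, the rows could come from the same `client.cat.thread_pool(...)` call shown above with `format="json"` added. A pool that appears on only some nodes of a tier may point to hot spotting rather than a cluster-wide shortage.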


Inspect the hot threads on each node

If a particular thread pool queue is backed up, you can periodically poll the [Nodes hot threads](/read/elasticsearch-8-15/7640099fe2267892.md) API to determine whether the threads have sufficient resources to progress and to gauge how quickly they are progressing.

Python

resp = client.nodes.hot_threads()
print(resp)

Ruby

response = client.nodes.hot_threads
puts response

Js

const response = await client.nodes.hotThreads();
console.log(response);

Console

GET /_nodes/hot_threads
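
Periodic polling of the hot threads API could be sketched as follows. This is a minimal example under stated assumptions: `poll_hot_threads` and its parameters are hypothetical names, and `fetch` stands for any zero-argument callable that returns one hot threads report (for instance `client.nodes.hot_threads`).

```python
import time


def poll_hot_threads(fetch, samples=3, interval_seconds=5.0, sleep=time.sleep):
    """Collect several hot threads reports, pausing between requests.

    `fetch` is a zero-argument callable returning one report. `sleep` is
    injectable so the pause can be replaced in tests or dry runs.
    """
    snapshots = []
    for i in range(samples):
        snapshots.append(fetch())
        # No need to wait after the final sample.
        if i < samples - 1:
            sleep(interval_seconds)
    return snapshots


# Example usage against a cluster (requires a configured client):
# reports = poll_hot_threads(client.nodes.hot_threads, samples=3, interval_seconds=5.0)
```

Reading the collected reports side by side helps judge progress: a thread that shows the same stack in every sample is probably not advancing, while one whose stack changes is still doing work.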

Look for long-running node tasks

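Long-running tasks can also cause a backlog. You can use the task management API to get information about the node tasks that are running. Check running_time_in_nanos to identify tasks that are taking an excessive amount of time to complete.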

Python

resp = client.tasks.list(
   pretty=True,
   human=True,
   detailed=True,
)
print(resp)

Js

const response = await client.tasks.list({
  pretty: true,
  human: true,
  detailed: true,
});
console.log(response);

Console

GET /_tasks?pretty=true&human=true&detailed=true
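
To illustrate the running_time_in_nanos check, here is a minimal sketch that scans a task list response for tasks above a time threshold. It assumes the response groups tasks per node under `nodes` and then `tasks`, as the task management API does. The helper name `find_long_running_tasks`, the threshold, and the sample data are hypothetical.

```python
def find_long_running_tasks(tasks_response, threshold_seconds=60.0):
    """Return (task_id, action, seconds) tuples for tasks above the threshold.

    `tasks_response` is assumed to have the shape
    {"nodes": {node_id: {"tasks": {task_id: {...}}}}}.
    """
    threshold_nanos = threshold_seconds * 1_000_000_000
    slow = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            nanos = task.get("running_time_in_nanos", 0)
            if nanos > threshold_nanos:
                slow.append((task_id, task.get("action", ""), nanos / 1_000_000_000))
    # Longest-running tasks first.
    slow.sort(key=lambda item: item[2], reverse=True)
    return slow


# Made-up response fragment for illustration only.
sample_response = {
    "nodes": {
        "nodeA": {
            "tasks": {
                "nodeA:101": {"action": "indices:data/write/bulk", "running_time_in_nanos": 95_000_000_000},
                "nodeA:102": {"action": "cluster:monitor/tasks/lists", "running_time_in_nanos": 4_000_000},
            }
        },
        "nodeB": {
            "tasks": {
                "nodeB:7": {"action": "indices:data/read/search", "running_time_in_nanos": 310_000_000_000},
            }
        },
    }
}

for task_id, action, seconds in find_long_running_tasks(sample_response):
    print(task_id, action, round(seconds, 1))
```

What counts as excessive depends on the workload, so the 60-second default here is only a placeholder.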

If you suspect a particular action, you can filter the tasks further. The most common long-running tasks are related to bulk indexing or search.

Filter for bulk index actions:

Python

resp = client.tasks.list(
   human=True,
   detailed=True,
   actions="indices:data/write/bulk",
)
print(resp)

Js

const response = await client.tasks.list({
  human: true,
  detailed: true,
  actions: "indices:data/write/bulk",
});
console.log(response);

Console

GET /_tasks?human&detailed&actions=indices:data/write/bulk

Filter for search actions:

Python

resp = client.tasks.list(
   human=True,
   detailed=True,
   actions="indices:data/read/search",
)
print(resp)

Js

const response = await client.tasks.list({
  human: true,
  detailed: true,
  actions: "indices:data/read/search",
});
console.log(response);

Console

GET /_tasks?human&detailed&actions=indices:data/read/search

The API response may contain additional task fields, such as description and headers, which provide the task parameters, target, and requestor. You can use this information for further diagnosis.

Look for long-running cluster tasks

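A task backlog can also appear as a delay in synchronizing the cluster state. You can use the cluster pending tasks API to get information about the cluster state sync tasks that are still pending.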

Python

resp = client.cluster.pending_tasks()
print(resp)

Js

const response = await client.cluster.pendingTasks();
console.log(response);

Console

GET /_cluster/pending_tasks

Check time_in_queue to identify tasks that are taking an excessive amount of time to complete.
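
The queue-time check could look roughly like the sketch below. It assumes each entry under `tasks` in the pending tasks response carries a `time_in_queue_millis` field next to `source` and `priority`. The helper name `find_stale_pending_tasks`, the 30-second default, and the sample data are hypothetical.

```python
def find_stale_pending_tasks(pending_response, threshold_millis=30_000):
    """Return pending cluster tasks that have waited longer than the threshold.

    `pending_response` is assumed to look like {"tasks": [{...}, ...]}, with
    each task carrying a `time_in_queue_millis` field.
    """
    tasks = pending_response.get("tasks", [])
    stale = [t for t in tasks if t.get("time_in_queue_millis", 0) > threshold_millis]
    # Longest-waiting tasks first.
    return sorted(stale, key=lambda t: t["time_in_queue_millis"], reverse=True)


# Made-up response fragment for illustration only.
sample_pending = {
    "tasks": [
        {"insert_order": 101, "priority": "URGENT", "source": "create-index [foo_9], cause [api]",
         "time_in_queue_millis": 86},
        {"insert_order": 46, "priority": "HIGH", "source": "shard-started",
         "time_in_queue_millis": 842_000},
        {"insert_order": 45, "priority": "HIGH", "source": "shard-started",
         "time_in_queue_millis": 45_000},
    ]
}

for task in find_stale_pending_tasks(sample_pending):
    print(task["insert_order"], task["priority"], task["source"], task["time_in_queue_millis"])
```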

Resolve a task queue backlog

Increase available resources

If tasks are progressing slowly and the queue is backing up, you might need to take steps to reduce CPU usage.

In some cases, increasing the thread pool size might help. For example, the force_merge thread pool defaults to a single thread. Increasing its size to 2 might help reduce a backlog of force merge requests.

Cancel stuck tasks

If you find that the hot threads of an active task are not progressing and there is a backlog, consider canceling the task.
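
One way to script the cancellation is sketched below. It assumes a task list response shaped like the output of the task management API and a client that exposes `tasks.cancel(task_id=...)`, as the Python client does. The helper `cancel_stuck_tasks`, the 10-minute default, and the recording stub used for the dry run are hypothetical. Only tasks marked `cancellable` are touched.

```python
def cancel_stuck_tasks(client, tasks_response, threshold_seconds=600.0):
    """Request cancellation of cancellable tasks running longer than the threshold.

    Returns the list of task IDs for which a cancel request was sent.
    """
    threshold_nanos = threshold_seconds * 1_000_000_000
    cancelled = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            too_long = task.get("running_time_in_nanos", 0) > threshold_nanos
            if too_long and task.get("cancellable", False):
                client.tasks.cancel(task_id=task_id)
                cancelled.append(task_id)
    return cancelled


class RecordingTasks:
    """Stand-in for client.tasks that only records cancel requests (dry run)."""

    def __init__(self):
        self.calls = []

    def cancel(self, task_id):
        self.calls.append(task_id)


class RecordingClient:
    """Stand-in for a real client, used here so the sketch runs offline."""

    def __init__(self):
        self.tasks = RecordingTasks()


# Made-up response fragment for illustration only.
sample_tasks = {
    "nodes": {
        "nodeA": {
            "tasks": {
                "nodeA:11": {"running_time_in_nanos": 900_000_000_000, "cancellable": True},
                "nodeA:12": {"running_time_in_nanos": 900_000_000_000, "cancellable": False},
                "nodeA:13": {"running_time_in_nanos": 1_000_000, "cancellable": True},
            }
        }
    }
}

dry_run_client = RecordingClient()
print(cancel_stuck_tasks(dry_run_client, sample_tasks))
```

Running against the recording stub first shows which tasks would be cancelled before a real client is passed in. Cancellation is a request: a task may take some time to stop after the call returns.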