Inference API - Google Vertex AI inference service

Google Vertex AI inference service

Creates an inference endpoint to perform an inference task with the googlevertexai service.

Request

PUT /_inference/<task_type>/<inference_id>

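Path parameters

<inference_id>
(Required, string) The unique identifier of the inference endpoint.
<task_type>
(Required, string) The type of inference task that the model performs.
Available task types:
- rerank
- text_embedding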
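
As an illustration of how the two path parameters combine into the request path, here is a minimal sketch. The helper name inference_endpoint_path and the percent-encoding of the identifier are assumptions of this example, not part of the API.

```python
from urllib.parse import quote

# Task types listed above for the googlevertexai service.
SUPPORTED_TASK_TYPES = {"rerank", "text_embedding"}


def inference_endpoint_path(task_type: str, inference_id: str) -> str:
    """Return the PUT path for a new inference endpoint (hypothetical helper)."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(
            f"unsupported task type {task_type!r}; "
            f"expected one of {sorted(SUPPORTED_TASK_TYPES)}"
        )
    if not inference_id:
        raise ValueError("inference_id must be a non-empty string")
    # Percent-encode the identifier so it is safe to place in a URL path.
    return f"/_inference/{task_type}/{quote(inference_id, safe='')}"


print(inference_endpoint_path("text_embedding", "google_vertex_ai_embeddings"))
```

Rejecting unknown task types on the client side gives an earlier, clearer error than waiting for the server to refuse the request.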

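Request body

service
(Required, string) The type of service supported for the specified task type. In this case, googlevertexai.
service_settings
(Required, object) Settings used to install the inference model.
These settings are specific to the googlevertexai service.
- service_account_json
- (Required, string) A valid service account in JSON format for the Google Vertex AI API.
- model_id
- (Required, string) The name of the model to use for the inference task. Supported models are listed in the Text embeddings API documentation.
- location
- (Required, string) The name of the location to use for the inference task. Supported locations are listed in Generative AI on Vertex AI locations.
- project_id
- (Required, string) The name of the project to use for the inference task.
- rate_limit
- (Optional, object) By default, the googlevertexai service sets the number of requests allowed per minute to 30,000. This helps minimize the number of rate limit errors returned from Google Vertex AI. To change this, set the requests_per_minute setting of this object in your service settings: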

Text

"rate_limit": {
  "requests_per_minute": <number_of_requests>
}

For more information about Google Vertex AI rate limits, see the Google Vertex AI Quotas docs.
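
The rate_limit override can be sketched as a small helper that adds the object to an existing service_settings dictionary. The function name with_rate_limit, the integer and positivity checks, and the value 10000 are assumptions made for this illustration only.

```python
import copy


def with_rate_limit(service_settings: dict, requests_per_minute: int) -> dict:
    """Return a copy of service_settings with a rate_limit override (hypothetical helper)."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(requests_per_minute, bool) or not isinstance(requests_per_minute, int):
        raise TypeError("requests_per_minute must be an integer")
    # Assumption of this sketch: only positive limits make sense.
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    updated = copy.deepcopy(service_settings)  # leave the caller's dict untouched
    updated["rate_limit"] = {"requests_per_minute": requests_per_minute}
    return updated


settings = {
    "service_account_json": "<service_account_json>",
    "model_id": "<model_id>",
    "location": "<location>",
    "project_id": "<project_id>",
}
# 10000 is an arbitrary illustrative value.
print(with_rate_limit(settings, 10000))
```

Returning a copy keeps the original settings reusable for endpoints that should stay on the default limit.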

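task_settings
(Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.
task_settings for the rerank task type
- top_n
- (Optional, integer) Specifies the number of top n documents to return.
task_settings for the text_embedding task type
- auto_truncate
- (Optional, boolean) Specifies whether the API automatically truncates inputs longer than the maximum token length.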
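
Because task_settings depends on the task type, a client-side check can catch a mismatched option before the request is sent. The sketch below assumes that top_n belongs to the rerank task type and auto_truncate to text_embedding; the helper build_task_settings is hypothetical.

```python
# Settings named in this section for each task type.
ALLOWED_TASK_SETTINGS = {
    "rerank": {"top_n"},
    "text_embedding": {"auto_truncate"},
}


def build_task_settings(task_type: str, **options) -> dict:
    """Return a task_settings object limited to keys valid for task_type (hypothetical helper)."""
    if task_type not in ALLOWED_TASK_SETTINGS:
        raise ValueError(f"unsupported task type: {task_type!r}")
    unknown = set(options) - ALLOWED_TASK_SETTINGS[task_type]
    if unknown:
        raise ValueError(
            f"{sorted(unknown)} not valid for task type {task_type!r}"
        )
    # Drop options left as None so the service default applies.
    return {key: value for key, value in options.items() if value is not None}


print(build_task_settings("rerank", top_n=3))
print(build_task_settings("text_embedding", auto_truncate=True))
```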

Google Vertex AI service example

The following example shows how to create an inference endpoint called google_vertex_ai_embeddings to perform a text_embedding task type.

Python

# Assumes `client` is an already configured Elasticsearch Python client instance.
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="google_vertex_ai_embeddings",
    inference_config={
        "service": "googlevertexai",
        "service_settings": {
            "service_account_json": "<service_account_json>",
            "model_id": "<model_id>",
            "location": "<location>",
            "project_id": "<project_id>",
        },
    },
)
print(resp)

Js

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "google_vertex_ai_embeddings",
  inference_config: {
    service: "googlevertexai",
    service_settings: {
      service_account_json: "<service_account_json>",
      model_id: "<model_id>",
      location: "<location>",
      project_id: "<project_id>",
    },
  },
});
console.log(response);

Console

PUT _inference/text_embedding/google_vertex_ai_embeddings
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "<service_account_json>",
    "model_id": "<model_id>",
    "location": "<location>",
    "project_id": "<project_id>"
  }
}
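
Since service_account_json is documented as a string, a service account key held as a Python dictionary has to be serialized before it goes into the request. The following is a minimal sketch under that reading; the helper embedding_endpoint_config and the placeholder key contents are assumptions of this example.

```python
import json


def embedding_endpoint_config(service_account: dict, model_id: str,
                              location: str, project_id: str) -> dict:
    """Build the inference_config for a text_embedding endpoint (hypothetical helper)."""
    return {
        "service": "googlevertexai",
        "service_settings": {
            # The setting is a string, so the key object is serialized to JSON text.
            "service_account_json": json.dumps(service_account),
            "model_id": model_id,
            "location": location,
            "project_id": project_id,
        },
    }


# Placeholder key contents; a real key file has more fields and real values.
service_account = {"type": "service_account", "project_id": "<project_id>"}
config = embedding_endpoint_config(
    service_account, "<model_id>", "<location>", "<project_id>"
)
print(json.dumps(config, indent=2))
```

The resulting dictionary has the same shape as the inference_config argument in the Python example above.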

The next example shows how to create an inference endpoint called google_vertex_ai_rerank to perform a rerank task type.

Python

# Assumes `client` is an already configured Elasticsearch Python client instance.
resp = client.inference.put(
    task_type="rerank",
    inference_id="google_vertex_ai_rerank",
    inference_config={
        "service": "googlevertexai",
        "service_settings": {
            "service_account_json": "<service_account_json>",
            "project_id": "<project_id>",
        },
    },
)
print(resp)

Js

const response = await client.inference.put({
  task_type: "rerank",
  inference_id: "google_vertex_ai_rerank",
  inference_config: {
    service: "googlevertexai",
    service_settings: {
      service_account_json: "<service_account_json>",
      project_id: "<project_id>",
    },
  },
});
console.log(response);

Console

PUT _inference/rerank/google_vertex_ai_rerank
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "<service_account_json>",
    "project_id": "<project_id>"
  }
}
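
The rerank example can be combined with the optional top_n task setting. The sketch below wraps the same client.inference.put call used in the Python example above. The wrapper create_rerank_endpoint is hypothetical, and it assumes that top_n is passed inside a task_settings object in the request body.

```python
from typing import Any, Optional


def create_rerank_endpoint(client: Any, inference_id: str,
                           service_account_json: str, project_id: str,
                           top_n: Optional[int] = None) -> Any:
    """Create a rerank endpoint, optionally setting top_n (hypothetical helper)."""
    inference_config = {
        "service": "googlevertexai",
        "service_settings": {
            "service_account_json": service_account_json,
            "project_id": project_id,
        },
    }
    if top_n is not None:
        # task_settings is optional, so it is only sent when top_n is given.
        inference_config["task_settings"] = {"top_n": top_n}
    # Same call shape as the Python example above.
    return client.inference.put(
        task_type="rerank",
        inference_id=inference_id,
        inference_config=inference_config,
    )
```

With a configured client, a call such as create_rerank_endpoint(client, "google_vertex_ai_rerank", "<service_account_json>", "<project_id>", top_n=3) would request an endpoint that returns the top three documents.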