Index APIs - Analyze - Elasticsearch Guide v8.15

Analyze API

Performs analysis on a text string and returns the resulting tokens.

Python

resp = client.indices.analyze(
   analyzer="standard",
   text="Quick Brown Foxes!",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
   analyzer: 'standard',
   text: 'Quick Brown Foxes!'
  }
)
puts response

Js

const response = await client.indices.analyze({
  analyzer: "standard",
  text: "Quick Brown Foxes!",
});
console.log(response);

Console

GET /_analyze
{
  "analyzer" : "standard",
  "text" : "Quick Brown Foxes!"
}
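
The tokens come back in a tokens array, one object per token. As a minimal sketch of reading that array, the helper below pulls out just the token strings. It assumes the response can be read like a plain dict; token_texts and sample_response are hypothetical names for this illustration, and the sample values are written by hand rather than captured from a cluster.

```python
from typing import Any


def token_texts(response: dict[str, Any]) -> list[str]:
    """Return the token strings from an analyze response, in order."""
    return [entry["token"] for entry in response.get("tokens", [])]


# Hand-written illustration of a standard-analyzer response for
# "Quick Brown Foxes!"; not captured from a running cluster.
sample_response = {
    "tokens": [
        {"token": "quick", "start_offset": 0, "end_offset": 5,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "brown", "start_offset": 6, "end_offset": 11,
         "type": "<ALPHANUM>", "position": 1},
        {"token": "foxes", "start_offset": 12, "end_offset": 17,
         "type": "<ALPHANUM>", "position": 2},
    ]
}

print(token_texts(sample_response))  # ['quick', 'brown', 'foxes']
```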

Request

GET /_analyze

POST /_analyze

GET /<index>/_analyze

POST /<index>/_analyze
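
The two path forms differ only in the optional index segment. A small sketch of that rule follows; analyze_path is a hypothetical helper for illustration, not part of any client library.

```python
from typing import Optional


def analyze_path(index: Optional[str] = None) -> str:
    """Return /_analyze, or /<index>/_analyze when an index is given."""
    return f"/{index}/_analyze" if index else "/_analyze"


print(analyze_path())                  # /_analyze
print(analyze_path("analyze_sample"))  # /analyze_sample/_analyze
```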

Prerequisites

If the Elasticsearch security features are enabled, you must have the manage index privilege for the specified index.

Path parameters

<index>
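(Optional, string) Index used to derive the analyzer.
If specified, the analyzer or <field> parameter overrides this value.
If no analyzer or field is specified, the analyze API uses the default analyzer for the index.
If no index is specified, or the index does not have a default analyzer, the analyze API uses the standard analyzer.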

Query parameters

analyzer
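(Optional, string) The name of the analyzer to apply to the provided text. This can be a built-in analyzer or an analyzer configured in the index.
If this parameter is not specified, the analyze API uses the analyzer defined in the field's mapping.
If no field is specified, the analyze API uses the default analyzer for the index.
If no index is specified, or the index does not have a default analyzer, the analyze API uses the standard analyzer.
attributes
(Optional, array of strings) Array of token attributes used to filter the output of the explain parameter.
char_filter
(Optional, array of strings) Array of character filters used to preprocess characters before the tokenizer. See the character filters reference for a list of character filters.
explain
(Optional, Boolean) If true, the response includes token attributes and additional details. Defaults to false. [preview] The format of the additional detail information is labelled as experimental in Lucene and may change in the future.
field
(Optional, string) Field used to derive the analyzer. To use this parameter, you must specify an index.
If specified, the analyzer parameter overrides this value.
If no field is specified, the analyze API uses the default analyzer for the index.
If no index is specified, or the index does not have a default analyzer, the analyze API uses the standard analyzer.
filter
(Optional, array of strings) Array of token filters to apply after the tokenizer. See the token filter reference for a list of token filters.
normalizer
(Optional, string) Normalizer used to convert the text into a single token. See Normalizers for a list of normalizers.
text
(Required, string or array of strings) Text to analyze. If an array of strings is provided, it is analyzed as a multi-value field.
tokenizer
(Optional, string) Tokenizer used to convert the text into tokens. See the tokenizer reference for a list of tokenizers.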
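
The analyzer, field, and index descriptions above imply a fallback order: an explicit analyzer first, then the analyzer from the field mapping, then the index default, then the standard analyzer. The sketch below restates that order as a plain function. It illustrates the documented precedence only and is not how Elasticsearch implements it; resolve_analyzer, field_analyzers, and index_default are hypothetical names.

```python
from typing import Mapping, Optional


def resolve_analyzer(
    analyzer: Optional[str] = None,
    field: Optional[str] = None,
    field_analyzers: Optional[Mapping[str, str]] = None,
    index_default: Optional[str] = None,
) -> str:
    """Pick an analyzer name using the documented precedence.

    field_analyzers and index_default are hypothetical stand-ins for the
    index mapping and the index default analyzer. Pass neither to model
    a request that names no index.
    """
    if analyzer:
        # An explicit analyzer overrides the field and the index default.
        return analyzer
    if field and field_analyzers and field in field_analyzers:
        # Analyzer defined in the field's mapping.
        return field_analyzers[field]
    if index_default:
        # Default analyzer of the index.
        return index_default
    # No index, or the index has no default analyzer.
    return "standard"


mapping = {"obj1.field1": "whitespace"}
print(resolve_analyzer(field="obj1.field1", field_analyzers=mapping))  # whitespace
print(resolve_analyzer())                                              # standard
```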

Examples

No index specified

You can apply any of the built-in analyzers to a text string without specifying an index.

Python

resp = client.indices.analyze(
   analyzer="standard",
   text="this is a test",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
   analyzer: 'standard',
   text: 'this is a test'
  }
)
puts response

Js

const response = await client.indices.analyze({
  analyzer: "standard",
  text: "this is a test",
});
console.log(response);

Console

GET /_analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}

Array of text strings


If the text parameter is an array of strings, it is analyzed as a multi-value field.

Python

resp = client.indices.analyze(
   analyzer="standard",
   text=[
      "this is a test",
      "the second text"
   ],
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
   analyzer: 'standard',
   text: [
   'this is a test',
   'the second text'
   ]
  }
)
puts response

Js

const response = await client.indices.analyze({
  analyzer: "standard",
  text: ["this is a test", "the second text"],
});
console.log(response);

Console

GET /_analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}

Custom analyzer

You can use the analyze API to test a custom transient analyzer built from tokenizers, token filters, and character filters. Token filters use the filter parameter:

Python

resp = client.indices.analyze(
   tokenizer="keyword",
   filter=[
   "lowercase"
   ],
   text="this is a test",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
   tokenizer: 'keyword',
   filter: [
   'lowercase'
   ],
   text: 'this is a test'
  }
)
puts response

Js

const response = await client.indices.analyze({
  tokenizer: "keyword",
  filter: ["lowercase"],
  text: "this is a test",
});
console.log(response);

Console

GET /_analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}

Character filters use the char_filter parameter:

Python

resp = client.indices.analyze(
   tokenizer="keyword",
   filter=[
   "lowercase"
   ],
   char_filter=[
   "html_strip"
   ],
   text="this is a <b>test</b>",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
   tokenizer: 'keyword',
   filter: [
   'lowercase'
   ],
   char_filter: [
   'html_strip'
   ],
   text: 'this is a <b>test</b>'
  }
)
puts response

Js

const response = await client.indices.analyze({
  tokenizer: "keyword",
  filter: ["lowercase"],
  char_filter: ["html_strip"],
  text: "this is a <b>test</b>",
});
console.log(response);

Console

GET /_analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}
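
The parameter descriptions fix the order of the stages: character filters run before the tokenizer, and token filters run after it. The sketch below emulates that order for the html_strip, keyword, and lowercase combination in plain Python. It is a rough approximation for illustration only: the regular expression is a simplification of html_strip, and analyze_sketch is a hypothetical name.

```python
import re


def analyze_sketch(text: str) -> list[str]:
    """Emulate html_strip -> keyword tokenizer -> lowercase, in that order."""
    # Character filter: drop anything that looks like an HTML tag.
    # This regex is a simplification of what html_strip really does.
    stripped = re.sub(r"<[^>]+>", "", text)
    # Keyword tokenizer: the whole input becomes a single token.
    tokens = [stripped]
    # Lowercase token filter: applied to each token after tokenization.
    return [token.lower() for token in tokens]


print(analyze_sketch("This is a <b>TEST</b>"))  # ['this is a test']
```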

You can also specify custom tokenizers, token filters, and character filters in the request body, as follows:

Python

resp = client.indices.analyze(
   tokenizer="whitespace",
   filter=[
   "lowercase",
   {
   "type": "stop",
   "stopwords": [
   "a",
   "is",
   "this"
   ]
   }
   ],
   text="this is a test",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
   tokenizer: 'whitespace',
   filter: [
   'lowercase',
   {
   type: 'stop',
   stopwords: [
   'a',
   'is',
   'this'
   ]
   }
   ],
   text: 'this is a test'
  }
)
puts response

Js

const response = await client.indices.analyze({
  tokenizer: "whitespace",
  filter: [
   "lowercase",
   {
   type: "stop",
   stopwords: ["a", "is", "this"],
   },
  ],
  text: "this is a test",
});
console.log(response);

Console

GET /_analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}
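
In the request above, the filter array mixes a filter name (a string) with an inline definition (an object with a type). The sketch below shows one way to read such a mixed list, covering only the two filters used here. It is a hand-written emulation rather than Elasticsearch code; apply_filters is a hypothetical name, and the sketch does not model the default stopword list that a bare stop filter would use.

```python
from typing import Union

FilterSpec = Union[str, dict]


def apply_filters(tokens: list[str], filters: list[FilterSpec]) -> list[str]:
    """Apply a small, hand-written subset of token filters in order."""
    for spec in filters:
        # A string names a built-in filter; a dict defines one inline.
        kind = spec if isinstance(spec, str) else spec["type"]
        if kind == "lowercase":
            tokens = [t.lower() for t in tokens]
        elif kind == "stop":
            # Only inline stopwords are modelled in this sketch.
            stopwords = set(spec["stopwords"]) if isinstance(spec, dict) else set()
            tokens = [t for t in tokens if t not in stopwords]
        else:
            raise ValueError(f"filter not covered by this sketch: {kind}")
    return tokens


filters = ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}]
# str.split() stands in for the whitespace tokenizer.
print(apply_filters("this is a test".split(), filters))  # ['test']
```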

Specific index

You can also run the analyze API against a specific index:

Python

resp = client.indices.analyze(
   index="analyze_sample",
   text="this is a test",
)
print(resp)

Ruby

response = client.indices.analyze(
  index: 'analyze_sample',
  body: {
   text: 'this is a test'
  }
)
puts response

Js

const response = await client.indices.analyze({
  index: "analyze_sample",
  text: "this is a test",
});
console.log(response);

Console

GET /analyze_sample/_analyze
{
  "text" : "this is a test"
}

The above runs an analysis on the text "this is a test", using the default index analyzer associated with the analyze_sample index. You can also provide an analyzer to use a different one:

Python

resp = client.indices.analyze(
   index="analyze_sample",
   analyzer="whitespace",
   text="this is a test",
)
print(resp)

Ruby

response = client.indices.analyze(
  index: 'analyze_sample',
  body: {
   analyzer: 'whitespace',
   text: 'this is a test'
  }
)
puts response

Js

const response = await client.indices.analyze({
  index: "analyze_sample",
  analyzer: "whitespace",
  text: "this is a test",
});
console.log(response);

Console

GET /analyze_sample/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}

Derive analyzer from a field mapping

The analyzer can be derived from a field mapping. For example:

Python

resp = client.indices.analyze(
   index="analyze_sample",
   field="obj1.field1",
   text="this is a test",
)
print(resp)

Ruby

response = client.indices.analyze(
  index: 'analyze_sample',
  body: {
   field: 'obj1.field1',
   text: 'this is a test'
  }
)
puts response

Js

const response = await client.indices.analyze({
  index: "analyze_sample",
  field: "obj1.field1",
  text: "this is a test",
});
console.log(response);

Console

GET /analyze_sample/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}


Normalizer

A normalizer can be provided for a keyword field, using a normalizer associated with the analyze_sample index.

Python

resp = client.indices.analyze(
   index="analyze_sample",
   normalizer="my_normalizer",
   text="BaR",
)
print(resp)

Ruby

response = client.indices.analyze(
  index: 'analyze_sample',
  body: {
   normalizer: 'my_normalizer',
   text: 'BaR'
  }
)
puts response

Js

const response = await client.indices.analyze({
  index: "analyze_sample",
  normalizer: "my_normalizer",
  text: "BaR",
});
console.log(response);

Console

GET /analyze_sample/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "BaR"
}

Alternatively, you can build a custom transient normalizer from token filters and character filters:

Python

resp = client.indices.analyze(
   filter=[
   "lowercase"
   ],
   text="BaR",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
   filter: [
   'lowercase'
   ],
   text: 'BaR'
  }
)
puts response

Js

const response = await client.indices.analyze({
  filter: ["lowercase"],
  text: "BaR",
});
console.log(response);

Console

GET /_analyze
{
  "filter" : ["lowercase"],
  "text" : "BaR"
}

Explain analyze

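To get more detailed information, set explain to true (it defaults to false). This outputs all token attributes for each token. You can filter which token attributes are output by setting the attributes option.

The format of the additional detail information is labelled as experimental in Lucene and may change in the future.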

Python

resp = client.indices.analyze(
   tokenizer="standard",
   filter=[
   "snowball"
   ],
   text="detailed output",
   explain=True,
   attributes=[
   "keyword"
   ],
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
   tokenizer: 'standard',
   filter: [
   'snowball'
   ],
   text: 'detailed output',
   explain: true,
   attributes: [
   'keyword'
   ]
  }
)
puts response

Js

const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["snowball"],
  text: "detailed output",
  explain: true,
  attributes: ["keyword"],
});
console.log(response);

Console

GET /_analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"]
}


Set "keyword" to output only the "keyword" attribute.

The request returns the following result:

Console-Result

{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [ {
        "token" : "detailed",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1
      } ]
    },
    "tokenfilters" : [ {
      "name" : "snowball",
      "tokens" : [ {
        "token" : "detail",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0,
        "keyword" : false
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1,
        "keyword" : false
      } ]
    } ]
  }
}


Only the "keyword" attribute is output, because "attributes" was specified in the request.
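
To compare what each stage produced, the detail object can be walked stage by stage. The sketch below assumes a custom-analyzer response shaped like the one shown above and readable as a plain dict; tokens_by_stage is a hypothetical helper, and the sample is a trimmed copy of that response with offsets and types omitted.

```python
from typing import Any


def tokens_by_stage(response: dict[str, Any]) -> dict[str, list[str]]:
    """Map each analysis stage name to the token strings it produced."""
    detail = response["detail"]
    tokenizer = detail["tokenizer"]
    stages = {tokenizer["name"]: [t["token"] for t in tokenizer["tokens"]]}
    for token_filter in detail.get("tokenfilters", []):
        stages[token_filter["name"]] = [t["token"] for t in token_filter["tokens"]]
    return stages


# Trimmed copy of the explain response (offsets and types omitted).
explain_response = {
    "detail": {
        "custom_analyzer": True,
        "charfilters": [],
        "tokenizer": {
            "name": "standard",
            "tokens": [{"token": "detailed"}, {"token": "output"}],
        },
        "tokenfilters": [
            {
                "name": "snowball",
                "tokens": [
                    {"token": "detail", "keyword": False},
                    {"token": "output", "keyword": False},
                ],
            }
        ],
    }
}

print(tokens_by_stage(explain_response))
# {'standard': ['detailed', 'output'], 'snowball': ['detail', 'output']}
```

Reading the stages side by side shows where a token changed: here the snowball filter stemmed "detailed" to "detail" and left "output" untouched.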

Setting a token limit

Generating an excessive number of tokens can cause a node to run out of memory. The following setting lets you limit the number of tokens that can be produced:

index.analyze.max_token_count
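The maximum number of tokens that can be produced using the _analyze API. The default value is 10000. If more tokens than this limit are generated, an error is returned. The _analyze endpoint without a specified index always uses 10000 as its limit. This setting lets you control the limit for a specific index: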

Python

resp = client.indices.create(
   index="analyze_sample",
   settings={
   "index.analyze.max_token_count": 20000
   },
)
print(resp)

Ruby

response = client.indices.create(
  index: 'analyze_sample',
  body: {
   settings: {
   'index.analyze.max_token_count' => 20_000
   }
  }
)
puts response

Js

const response = await client.indices.create({
  index: "analyze_sample",
  settings: {
   "index.analyze.max_token_count": 20000,
  },
});
console.log(response);

Console

PUT /analyze_sample
{
  "settings" : {
   "index.analyze.max_token_count" : 20000
  }
}

Python

resp = client.indices.analyze(
   index="analyze_sample",
   text="this is a test",
)
print(resp)

Ruby

response = client.indices.analyze(
  index: 'analyze_sample',
  body: {
   text: 'this is a test'
  }
)
puts response

Js

const response = await client.indices.analyze({
  index: "analyze_sample",
  text: "this is a test",
});
console.log(response);

Console

GET /analyze_sample/_analyze
{
  "text" : "this is a test"
}
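
The rule for which limit applies can be restated as a small function: a request without an index always gets 10000, and a request against an index gets that index's index.analyze.max_token_count, or the default when the setting is absent. This is only a sketch of the documented rule; max_token_count and index_settings are hypothetical names, and index_settings is assumed to be a flat mapping of setting names to values.

```python
from typing import Mapping, Optional

DEFAULT_MAX_TOKEN_COUNT = 10000  # default value stated in the setting description


def max_token_count(index_settings: Optional[Mapping[str, int]] = None) -> int:
    """Return the token limit that applies to an _analyze request.

    index_settings is a hypothetical flat view of the index settings.
    Pass None for a request that names no index.
    """
    if index_settings is None:
        return DEFAULT_MAX_TOKEN_COUNT
    return index_settings.get("index.analyze.max_token_count", DEFAULT_MAX_TOKEN_COUNT)


print(max_token_count())                                          # 10000
print(max_token_count({"index.analyze.max_token_count": 20000}))  # 20000
```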