Configure text analysis - Test an analyzer

The analyze API is an invaluable tool for viewing the terms produced by an analyzer. A built-in analyzer can be specified inline in the request:

Python

resp = client.indices.analyze(
    analyzer="whitespace",
    text="The quick brown fox.",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
    analyzer: 'whitespace',
    text: 'The quick brown fox.'
  }
)
puts response

Js

const response = await client.indices.analyze({
  analyzer: "whitespace",
  text: "The quick brown fox.",
});
console.log(response);

Console

POST _analyze
{
  "analyzer": "whitespace",
  "text":     "The quick brown fox."
}

The API returns the following response:

Console result

{
  "tokens": [
    {
      "token": "The",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "quick",
      "start_offset": 4,
      "end_offset": 9,
      "type": "word",
      "position": 1
    },
    {
      "token": "brown",
      "start_offset": 10,
      "end_offset": 15,
      "type": "word",
      "position": 2
    },
    {
      "token": "fox.",
      "start_offset": 16,
      "end_offset": 20,
      "type": "word",
      "position": 3
    }
  ]
}
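
To make the response fields concrete, here is a minimal pure-Python sketch that mimics what the whitespace analyzer reports for this input: it splits on whitespace and records each token's character offsets and position. The function name `whitespace_tokens` is hypothetical, and the code only illustrates the output format. It is not how Elasticsearch implements the analyzer.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace and record offsets and positions (illustrative only)."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "type": "word",
            "position": position,           # order of the token in the stream
        })
    return tokens


for tok in whitespace_tokens("The quick brown fox."):
    print(tok)
```

Note that the trailing period stays attached to `fox.` because only whitespace separates tokens here.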

You can also test combinations of:

A tokenizer
Zero or more token filters
Zero or more character filters

Python

resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        "lowercase",
        "asciifolding"
    ],
    text="Is this déja vu?",
)
print(resp)

Ruby

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      'lowercase',
      'asciifolding'
    ],
    text: 'Is this déja vu?'
  }
)
puts response

Js

const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["lowercase", "asciifolding"],
  text: "Is this déja vu?",
});
console.log(response);

Console

POST _analyze
{
  "tokenizer": "standard",
  "filter":  [ "lowercase", "asciifolding" ],
  "text":      "Is this déja vu?"
}

The API returns the following response:

Console result

{
  "tokens": [
    {
      "token": "is",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "this",
      "start_offset": 3,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "deja",
      "start_offset": 8,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "vu",
      "start_offset": 13,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
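
The effect of chaining a tokenizer with the `lowercase` and `asciifolding` filters can be approximated in plain Python. The sketch below is only a rough stand-in: it assumes that a `\w+` regular expression is close enough to the `standard` tokenizer for this sentence, and that Unicode decomposition is close enough to `asciifolding` for accented Latin letters. The names `fold_to_ascii` and `analyze_sketch` are hypothetical.

```python
import re
import unicodedata


def fold_to_ascii(token):
    """Strip diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze_sketch(text):
    """Tokenize on word characters, then lowercase and fold each token."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        term = fold_to_ascii(match.group().lower())
        tokens.append({
            "token": term,
            # Offsets refer to the original text, not to the filtered term.
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


# "\u00e9" is a precomposed "é", so it counts as one character in the offsets.
for tok in analyze_sketch("Is this d\u00e9ja vu?"):
    print(tok)
```

The filters change the term text (`Is` becomes `is`, `déja` becomes `deja`) while the offsets still point at the untouched input.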

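Positions and character offsets

As the output of the analyze API shows, analyzers not only convert words into terms, they also record the order, or relative position, of each term (used for phrase queries and word proximity queries) and the start and end character offsets of each term in the original text (used for highlighting search snippets).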
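
As a rough illustration of why both pieces of information are recorded, the sketch below uses offsets to highlight a term in the original text and positions to check whether two terms are adjacent. The helper names `highlight` and `is_phrase` are hypothetical, the token list is trimmed from the response shown earlier, and the logic is far simpler than what a search engine does in practice.

```python
def highlight(text, tokens, term, pre="<em>", post="</em>"):
    """Wrap every occurrence of `term` using the recorded character offsets."""
    pieces = []
    cursor = 0
    for tok in tokens:  # tokens are assumed to be ordered by offset
        if tok["token"] == term:
            pieces.append(text[cursor:tok["start_offset"]])
            pieces.append(pre + text[tok["start_offset"]:tok["end_offset"]] + post)
            cursor = tok["end_offset"]
    pieces.append(text[cursor:])
    return "".join(pieces)


def is_phrase(tokens, first, second):
    """Return True if `second` occurs at the position right after `first`."""
    positions = {}
    for tok in tokens:
        positions.setdefault(tok["token"], set()).add(tok["position"])
    return any(p + 1 in positions.get(second, set())
               for p in positions.get(first, set()))


text = "Is this d\u00e9ja vu?"
tokens = [
    {"token": "is", "start_offset": 0, "end_offset": 2, "position": 0},
    {"token": "this", "start_offset": 3, "end_offset": 7, "position": 1},
    {"token": "deja", "start_offset": 8, "end_offset": 12, "position": 2},
    {"token": "vu", "start_offset": 13, "end_offset": 15, "position": 3},
]

print(highlight(text, tokens, "deja"))
print(is_phrase(tokens, "deja", "vu"))
```

The highlighted snippet keeps the accented spelling from the input, since the offsets index into the original text rather than into the folded term.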

Alternatively, a custom analyzer can be referred to when running the analyze API on a specific index:

Python

resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "std_folded": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                }
            }
        }
    },
    mappings={
        "properties": {
            "my_text": {
                "type": "text",
                "analyzer": "std_folded"
            }
        }
    },
)
print(resp)

resp1 = client.indices.analyze(
    index="my-index-000001",
    analyzer="std_folded",
    text="Is this déjà vu?",
)
print(resp1)

resp2 = client.indices.analyze(
    index="my-index-000001",
    field="my_text",
    text="Is this déjà vu?",
)
print(resp2)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          std_folded: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
              'lowercase',
              'asciifolding'
            ]
          }
        }
      }
    },
    mappings: {
      properties: {
        my_text: {
          type: 'text',
          analyzer: 'std_folded'
        }
      }
    }
  }
)
puts response

response = client.indices.analyze(
  index: 'my-index-000001',
  body: {
    analyzer: 'std_folded',
    text: 'Is this déjà vu?'
  }
)
puts response

response = client.indices.analyze(
  index: 'my-index-000001',
  body: {
    field: 'my_text',
    text: 'Is this déjà vu?'
  }
)
puts response

Js

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        std_folded: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "asciifolding"],
        },
      },
    },
  },
  mappings: {
    properties: {
      my_text: {
        type: "text",
        analyzer: "std_folded",
      },
    },
  },
});
console.log(response);

const response1 = await client.indices.analyze({
  index: "my-index-000001",
  analyzer: "std_folded",
  text: "Is this déjà vu?",
});
console.log(response1);

const response2 = await client.indices.analyze({
  index: "my-index-000001",
  field: "my_text",
  text: "Is this déjà vu?",
});
console.log(response2);

Console

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "analyzer": "std_folded"
      }
    }
  }
}

GET my-index-000001/_analyze
{
  "analyzer": "std_folded",
  "text":     "Is this déjà vu?"
}

GET my-index-000001/_analyze
{
  "field": "my_text",
  "text":  "Is this déjà vu?"
}

The API returns the following response:

Console result

{
  "tokens": [
    {
      "token": "is",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "this",
      "start_offset": 3,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "deja",
      "start_offset": 8,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "vu",
      "start_offset": 13,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}


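Notes on the custom analyzer example:

	The `settings` section defines a `custom` analyzer called `std_folded`.
	In `mappings`, the field `my_text` uses the `std_folded` analyzer.
	To refer to this analyzer, the `analyze` API request must specify the index name.
	The first `analyze` request refers to the analyzer by name.
	The second `analyze` request refers to the analyzer used by the field `my_text`.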
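
Because the two `analyze` requests should yield the same terms, one way to sanity-check a mapping is to compare them. The sketch below assumes an already configured Elasticsearch Python client is passed in as `client`. The helper names `terms` and `field_matches_analyzer` are hypothetical and not part of the client library.

```python
def terms(response):
    """Extract just the term strings from an analyze API response."""
    return [tok["token"] for tok in response["tokens"]]


def field_matches_analyzer(client, index, field, analyzer, text):
    """Check that a field and a named analyzer produce the same terms."""
    by_name = client.indices.analyze(index=index, analyzer=analyzer, text=text)
    by_field = client.indices.analyze(index=index, field=field, text=text)
    return terms(by_name) == terms(by_field)


# Example call, assuming the index from this page exists on the cluster:
# field_matches_analyzer(client, "my-index-000001", "my_text",
#                        "std_folded", "Is this déjà vu?")
```

A `False` result would suggest that the field is mapped to a different analyzer than the one expected.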