Tokenizer reference - Letter tokenizer - Elasticsearch Guide v8.15

Letter tokenizer

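The letter tokenizer breaks text into terms whenever it encounters a character that is not a letter. It does a reasonable job for most European languages, but a poor job for some Asian languages in which words are not separated by spaces.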
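The splitting rule can be approximated locally to see why unsegmented scripts are a problem. The following is a minimal sketch, not the tokenizer's actual implementation: `letter_tokenize` is a hypothetical helper, Python's `str.isalpha()` is assumed to be a close enough stand-in for the tokenizer's definition of a letter, and the Japanese sample sentence is purely illustrative.

```python
from itertools import groupby


def letter_tokenize(text: str) -> list[str]:
    """Approximate the letter tokenizer: emit each maximal run of letters."""
    # str.isalpha() stands in for the tokenizer's own notion of a "letter";
    # this is an assumption, not the actual Lucene implementation.
    return [
        "".join(run)
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]


# Spaces, digits, and punctuation act as split points in English text.
english = letter_tokenize("The 2 QUICK Brown-Foxes")
print(english)

# Japanese text has no spaces, so the whole clause comes back as one term.
japanese = letter_tokenize("東京都に住んでいます。")
print(japanese)
```

The English phrase splits into four terms, while the Japanese clause stays a single term because every character before the final full stop counts as a letter.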

Example output

Python

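from elasticsearch import Elasticsearch

# Assumption: a local, unsecured cluster; adjust the URL and auth as needed.
client = Elasticsearch("http://localhost:9200")
resp = client.indices.analyze(
    tokenizer="letter",
    text="The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp)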

Ruby

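require 'elasticsearch'

# Assumption: a local, unsecured cluster; adjust the URL and credentials as needed.
client = Elasticsearch::Client.new(url: 'http://localhost:9200')
response = client.indices.analyze(
  body: {
    tokenizer: 'letter',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response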

JavaScript

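import { Client } from "@elastic/elasticsearch";

// Assumption: a local, unsecured cluster; adjust the node URL and auth as needed.
// Top-level await requires running this file as an ES module.
const client = new Client({ node: "http://localhost:9200" });
const response = await client.indices.analyze({
  tokenizer: "letter",
  text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response);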

Console

POST _analyze
{
  "tokenizer": "letter",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The above sentence would produce the following terms:

Text

[ The, QUICK, Brown, Foxes, jumped, over, the, lazy, dog, s, bone ]
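The analyze API returns a list of token objects rather than the bare list of terms shown above. As a rough sketch, assuming the usual response shape (a `tokens` array whose entries carry a `token` field alongside offsets, type, and position), the terms can be pulled out as follows. `terms_from_analysis` is a hypothetical helper, and `sample_response` is a hand-written stand-in covering only the first two tokens, not captured output.

```python
from typing import Any, Mapping


def terms_from_analysis(response: Mapping[str, Any]) -> list[str]:
    """Return only the term strings from an analyze API response."""
    return [entry["token"] for entry in response["tokens"]]


# Hand-written stand-in for the first two entries of a real response.
sample_response = {
    "tokens": [
        {"token": "The", "start_offset": 0, "end_offset": 3,
         "type": "word", "position": 0},
        {"token": "QUICK", "start_offset": 6, "end_offset": 11,
         "type": "word", "position": 1},
    ]
}

terms = terms_from_analysis(sample_response)
print(terms)
```

With the Python client, passing the `resp` object from the Python example to this helper should work the same way, provided the response supports dictionary-style access.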

Configuration

The letter tokenizer is not configurable.