Character filters reference - Pattern replace

Pattern replace character filter

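The `pattern_replace` character filter uses a regular expression to match characters that should be replaced with the specified replacement string. The replacement string can refer to capture groups in the regular expression.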
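To make the capture-group reference concrete, here is a rough local emulation in Python. It is a sketch, not how Elasticsearch evaluates the filter: it assumes Python's `re` module behaves like Java regular expressions for simple patterns, and the helper names `java_to_python_replacement` and `pattern_replace` are hypothetical. Java writes a group reference as `$1`, while Python writes it as `\g<1>`, so the helper rewrites one into the other before calling `re.sub`.

```python
import re


def java_to_python_replacement(replacement: str) -> str:
    """Rewrite Java-style $1..$9 group references as Python \\g<1>..\\g<9>."""
    # Only single-digit references are handled; escaped "\$" and other
    # backslash sequences are passed through to Python unchanged.
    return re.sub(r"\$([1-9])", r"\\g<\1>", replacement)


def pattern_replace(pattern: str, replacement: str, text: str) -> str:
    """Replace every match of `pattern` in `text` (hypothetical emulation)."""
    return re.sub(pattern, java_to_python_replacement(replacement), text)


print(pattern_replace(r"(\d+)-(?=\d)", "$1_", "My credit card is 123-456-789"))
# My credit card is 123_456_789
```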

Configuration

The `pattern_replace` character filter accepts the following parameters:


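`pattern`	A Java regular expression. Required.
`replacement`	The replacement string, which can reference capture groups using the `$1`..`$9` syntax, as explained in the Java `java.util.regex.Matcher` documentation.
`flags`	Java regular expression flags. Flags should be pipe-separated, e.g. "CASE_INSENSITIVE|COMMENTS".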
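If you generate index settings from code, a small helper can assemble the filter definition and join the flags with pipes. This is a sketch: `pattern_replace_char_filter` is a hypothetical helper, not part of any client library, and the set of accepted flag names is an assumption (it lists the constant names of `java.util.regex.Pattern`).

```python
from typing import Optional, Sequence

# Constant names from java.util.regex.Pattern; assumed to be the accepted values.
JAVA_REGEX_FLAGS = {
    "CANON_EQ", "CASE_INSENSITIVE", "COMMENTS", "DOTALL", "LITERAL",
    "MULTILINE", "UNICODE_CASE", "UNICODE_CHARACTER_CLASS", "UNIX_LINES",
}


def pattern_replace_char_filter(
    pattern: str, replacement: str, flags: Optional[Sequence[str]] = None
) -> dict:
    """Build a pattern_replace char_filter definition as a plain dict."""
    definition = {"type": "pattern_replace", "pattern": pattern, "replacement": replacement}
    if flags:
        unknown = [f for f in flags if f not in JAVA_REGEX_FLAGS]
        if unknown:
            raise ValueError(f"unknown flag(s): {unknown}")
        # The filter expects the flag names joined with a pipe character.
        definition["flags"] = "|".join(flags)
    return definition


print(pattern_replace_char_filter(r"(\d+)-(?=\d)", "$1_", ["CASE_INSENSITIVE", "COMMENTS"]))
```

The returned dict can be placed under `settings["analysis"]["char_filter"]["my_char_filter"]` in the requests that follow.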

Example configuration

In this example, we configure the `pattern_replace` character filter to replace any embedded dashes in numbers with underscores, i.e. `123-456-789` → `123_456_789`:

Python

resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": [
                        "my_char_filter"
                    ]
                }
            },
            "char_filter": {
                "my_char_filter": {
                    "type": "pattern_replace",
                    "pattern": "(\\d+)-(?=\\d)",
                    "replacement": "$1_"
                }
            }
        }
    },
)
print(resp)
resp1 = client.indices.analyze(
    index="my-index-000001",
    analyzer="my_analyzer",
    text="My credit card is 123-456-789",
)
print(resp1)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'standard',
            char_filter: [
              'my_char_filter'
            ]
          }
        },
        char_filter: {
          my_char_filter: {
            type: 'pattern_replace',
            pattern: '(\\d+)-(?=\\d)',
            replacement: '$1_'
          }
        }
      }
    }
  }
)
puts response
response = client.indices.analyze(
  index: 'my-index-000001',
  body: {
    analyzer: 'my_analyzer',
    text: 'My credit card is 123-456-789'
  }
)
puts response

Js

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "standard",
          char_filter: ["my_char_filter"],
        },
      },
      char_filter: {
        my_char_filter: {
          type: "pattern_replace",
          pattern: "(\\d+)-(?=\\d)",
          replacement: "$1_",
        },
      },
    },
  },
});
console.log(response);
const response1 = await client.indices.analyze({
  index: "my-index-000001",
  analyzer: "my_analyzer",
  text: "My credit card is 123-456-789",
});
console.log(response1);

Console

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1_"
        }
      }
    }
  }
}

POST my-index-000001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My credit card is 123-456-789"
}

The above example produces the following terms:

Text

[ My, credit, card, is, 123_456_789 ]
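These terms can be approximated locally by applying the same substitution and then splitting on word characters. This is only a sketch: `\w+` stands in for the `standard` tokenizer, which follows Unicode word-break rules and may differ on other input, and `approximate_terms` is a hypothetical name. Note that Java's `$1_` is written `\1_` in a Python replacement string.

```python
import re


def approximate_terms(text: str) -> list:
    # \w+ roughly mimics the standard tokenizer for this ASCII input;
    # \w includes "_", so "123_456_789" stays a single term.
    return re.findall(r"\w+", text)


text = "My credit card is 123-456-789"
filtered = re.sub(r"(\d+)-(?=\d)", r"\1_", text)

print(approximate_terms(filtered))  # ['My', 'credit', 'card', 'is', '123_456_789']
print(approximate_terms(text))      # ['My', 'credit', 'card', 'is', '123', '456', '789']
```

Without the character filter, this approximation splits the number at each dash, which is why the filter runs before tokenization.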

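Using a replacement string that changes the length of the original text will work for search purposes, but will result in incorrect highlighting, as can be seen in the following example.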

This example inserts a space whenever it encounters a lowercase letter followed by an uppercase letter (i.e. `fooBarBaz` → `foo Bar Baz`), allowing camelCase words to be queried individually:
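A rough Python equivalent of this character filter shows the inserted spaces and the resulting change in text length, which is what later throws off highlighting. Python's built-in `re` module does not support `\p{Lower}`/`\p{Upper}`, so this sketch assumes ASCII `[a-z]`/`[A-Z]` as a stand-in; non-ASCII letters will not be split.

```python
import re

# ASCII stand-in for the Java pattern (?<=\p{Lower})(?=\p{Upper}).
# Both lookarounds are zero-width, so each match is an empty string
# between a lowercase and an uppercase letter.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

original = "The fooBarBaz method"
filtered = CAMEL_BOUNDARY.sub(" ", original)

print(filtered)                      # The foo Bar Baz method
print(len(original), len(filtered))  # 20 22
```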

Python

resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": [
                        "my_char_filter"
                    ],
                    "filter": [
                        "lowercase"
                    ]
                }
            },
            "char_filter": {
                "my_char_filter": {
                    "type": "pattern_replace",
                    "pattern": "(?<=\\p{Lower})(?=\\p{Upper})",
                    "replacement": " "
                }
            }
        }
    },
    mappings={
        "properties": {
            "text": {
                "type": "text",
                "analyzer": "my_analyzer"
            }
        }
    },
)
print(resp)
resp1 = client.indices.analyze(
    index="my-index-000001",
    analyzer="my_analyzer",
    text="The fooBarBaz method",
)
print(resp1)

Ruby

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'standard',
            char_filter: [
              'my_char_filter'
            ],
            filter: [
              'lowercase'
            ]
          }
        },
        char_filter: {
          my_char_filter: {
            type: 'pattern_replace',
            pattern: '(?<=\\p{Lower})(?=\\p{Upper})',
            replacement: ' '
          }
        }
      }
    },
    mappings: {
      properties: {
        text: {
          type: 'text',
          analyzer: 'my_analyzer'
        }
      }
    }
  }
)
puts response
response = client.indices.analyze(
  index: 'my-index-000001',
  body: {
    analyzer: 'my_analyzer',
    text: 'The fooBarBaz method'
  }
)
puts response

Js

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "standard",
          char_filter: ["my_char_filter"],
          filter: ["lowercase"],
        },
      },
      char_filter: {
        my_char_filter: {
          type: "pattern_replace",
          pattern: "(?<=\\p{Lower})(?=\\p{Upper})",
          replacement: " ",
        },
      },
    },
  },
  mappings: {
    properties: {
      text: {
        type: "text",
        analyzer: "my_analyzer",
      },
    },
  },
});
console.log(response);
const response1 = await client.indices.analyze({
  index: "my-index-000001",
  analyzer: "my_analyzer",
  text: "The fooBarBaz method",
});
console.log(response1);

Console

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(?<=\\p{Lower})(?=\\p{Upper})",
          "replacement": " "
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

POST my-index-000001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The fooBarBaz method"
}

The above returns the following terms:

Text

[ the, foo, bar, baz, method ]
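Extending the same local approximation with tokenization and lowercasing reproduces these terms and exposes their offsets. In this sketch the offsets point into the *filtered* text, which is two characters longer than the original; `\w+` and `.lower()` are assumed stand-ins for the `standard` tokenizer and the `lowercase` token filter.

```python
import re

# ASCII approximation of the Java pattern (?<=\p{Lower})(?=\p{Upper}).
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

filtered = CAMEL_BOUNDARY.sub(" ", "The fooBarBaz method")

# (term, start, end), with offsets measured in the filtered text.
tokens = [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", filtered)]
print(tokens)
# [('the', 0, 3), ('foo', 4, 7), ('bar', 8, 11), ('baz', 12, 15), ('method', 16, 22)]
```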


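Querying for `bar` will find the document correctly, but highlighting on the result will produce incorrect highlights, because the character filter changed the length of the original text:

Python

resp = client.index(
    index="my-index-000001",
    id="1",
    refresh=True,
    document={
        "text": "The fooBarBaz method"
    },
)
print(resp)
resp1 = client.search(
    index="my-index-000001",
    query={
        "match": {
            "text": "bar"
        }
    },
    highlight={
        "fields": {
            "text": {}
        }
    },
)
print(resp1)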

Ruby

response = client.index(
  index: 'my-index-000001',
  id: 1,
  refresh: true,
  body: {
    text: 'The fooBarBaz method'
  }
)
puts response
response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      match: {
        text: 'bar'
      }
    },
    highlight: {
      fields: {
        text: {}
      }
    }
  }
)
puts response

Js

const response = await client.index({
  index: "my-index-000001",
  id: 1,
  refresh: "true",
  document: {
    text: "The fooBarBaz method",
  },
});
console.log(response);
const response1 = await client.search({
  index: "my-index-000001",
  query: {
    match: {
      text: "bar",
    },
  },
  highlight: {
    fields: {
      text: {},
    },
  },
});
console.log(response1);

Console

PUT my-index-000001/_doc/1?refresh
{
  "text": "The fooBarBaz method"
}

GET my-index-000001/_search
{
  "query": {
    "match": {
      "text": "bar"
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}

The output from the above is:

Console-Result

{
  "timed_out": false,
  "took": $body.took,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total" : {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my-index-000001",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "text": "The fooBarBaz method"
        },
        "highlight": {
          "text": [
            "The foo<em>Ba</em>rBaz method"
          ]
        }
      }
    ]
  }
}


Note the incorrect highlight.
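The mismatch comes from offsets: a token's position in the filtered text has to be mapped back onto the original text before it can be highlighted. The sketch below shows what happens if that mapping is skipped, and one simple correction that only works for filters that purely insert characters, as this one does. The helpers `inserted_positions` and `to_original` are hypothetical, and this is not how Elasticsearch or Lucene implements offset correction; the highlight above shows that the offsets Elasticsearch recovered for this replacement do not line up with `Bar`.

```python
import re

# ASCII stand-in for the Java pattern (?<=\p{Lower})(?=\p{Upper}).
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def inserted_positions(original: str) -> list:
    """Indices, in the filtered text, of the spaces the filter inserts."""
    # The k-th insertion is shifted right by the k spaces inserted before it.
    return [m.start() + k for k, m in enumerate(CAMEL_BOUNDARY.finditer(original))]


def to_original(offset: int, inserted: list) -> int:
    """Map an offset in the filtered text back onto the original text."""
    return offset - sum(1 for pos in inserted if pos < offset)


original = "The fooBarBaz method"
inserted = inserted_positions(original)
start, end = 8, 11  # offsets of "Bar" in the filtered text "The foo Bar Baz method"

print(original[start:end])  # arB  (filtered offsets applied to the original)
print(original[to_original(start, inserted):to_original(end, inserted)])  # Bar
```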