feat(nori): add metadata support to Korean tokenizer #14969

Open
twosom wants to merge 4 commits into apache:main from twosom:add_nori_metadata

twosom commented Jul 20, 2025

Description

Summary

Adds metadata support to the Nori Korean analyzer, allowing users to attach additional information to words in the user dictionary.

Changes

  • Adds a MetadataAttribute interface and implementation
  • Extends the user dictionary format to support the word >> metadata syntax
  • Preserves metadata during compound word decomposition
  • Maintains backward compatibility with existing dictionaries

Example

Dictionary:

자바 >> computer language
엘라스틱서치 엘라스틱 서치 >> search engine

Result:

  • 자바 → Term: "자바", Metadata: "computer language"
  • 엘라스틱서치 → All decomposed terms ("엘라스틱서치", "엘라스틱", "서치") carry "search engine" metadata
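
To make the extended dictionary format concrete, here is a minimal Java sketch of how one user dictionary line with the word >> metadata syntax could be split into a surface form, its decomposition segments, and the metadata. This is not the parser from this PR: the class UserDictLineSketch, the nested Entry type, and the methods parse and allTerms are hypothetical names chosen for illustration, and the sketch assumes the existing Nori convention that the first whitespace-separated token is the surface form and the remaining tokens are its segments. In the PR itself the metadata would presumably be exposed through the new MetadataAttribute on each token rather than through a plain field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the parser from this PR.
final class UserDictLineSketch {

  // Separator between the dictionary entry and its metadata, as in the description.
  static final String SEPARATOR = ">>";

  /** One parsed line: surface form, optional decomposition, optional metadata. */
  static final class Entry {
    final String surface;
    final List<String> segments; // empty when the line has no decomposition
    final String metadata; // null when the line has no ">>" part

    Entry(String surface, List<String> segments, String metadata) {
      this.surface = surface;
      this.segments = Collections.unmodifiableList(segments);
      this.metadata = metadata;
    }

    /** Surface form followed by its segments; each term would carry the same metadata. */
    List<String> allTerms() {
      List<String> terms = new ArrayList<>();
      terms.add(surface);
      terms.addAll(segments);
      return terms;
    }
  }

  /** Parses one dictionary line of the form "word [segment ...] [>> metadata]". */
  static Entry parse(String line) {
    String body = line;
    String metadata = null;
    int idx = line.indexOf(SEPARATOR);
    if (idx >= 0) {
      // Everything after the first ">>" is metadata; everything before is the entry.
      body = line.substring(0, idx);
      String tail = line.substring(idx + SEPARATOR.length()).trim();
      metadata = tail.isEmpty() ? null : tail;
    }
    body = body.trim();
    if (body.isEmpty()) {
      throw new IllegalArgumentException("missing word in line: " + line);
    }
    // First token is the surface form, the rest are decomposition segments.
    String[] tokens = body.split("\\s+");
    List<String> segments = new ArrayList<>();
    for (int i = 1; i < tokens.length; i++) {
      segments.add(tokens[i]);
    }
    return new Entry(tokens[0], segments, metadata);
  }
}
```

Under these assumptions, parsing the second example line gives the surface form 엘라스틱서치, the segments 엘라스틱 and 서치, and the metadata "search engine", so all three terms share one metadata value as in the Result list. A line without >> parses exactly as before and simply has null metadata, which is one way the backward-compatible case could fall out.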

Fixes #14940

@github-actions

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

github-actions bot added this to the 11.0.0 milestone Jul 20, 2025
twosom force-pushed the add_nori_metadata branch from 7be08b0 to 81ce2c8 on July 20, 2025 06:46
github-actions bot commented Aug 4, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

github-actions bot added the Stale label Aug 4, 2025
github-actions bot removed the Stale label Oct 1, 2025
@github-actions

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

github-actions bot added the Stale label Oct 16, 2025