$\begingroup$

Version 12 has basic NER support for some entities, but how does one recognize a custom entity?

For example, I want to parse text describing products and extract three custom entities: price, size, and color. TextCases and TextContents work nicely for price and color:

TextCases["A red tshirt costs $5 and is medium", {"Color", "CurrencyAmount"}] 


But there is no way to parse entities that are not listed in guide/TextContentTypes, like sizes:

TextCases["large women's leather jacket", {"Size"}] 
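The closest workaround I can think of is a plain dictionary lookup with StringCases. This is only a rough sketch: `sizeWords` and `findSizes` are hypothetical names I made up, the word list is hand-written, and nothing here is learned from data or integrated with TextCases, which is why it does not really answer the question.

```wolfram
(* sizeWords is a hypothetical, hand-written vocabulary; it is not a built-in content type. *)
sizeWords = {"small", "medium", "large"};

(* findSizes is a hypothetical helper: it tags whole-word matches from sizeWords as "Size". *)
findSizes[text_String] :=
  StringCases[
    text,
    (WordBoundary ~~ (s : (Alternatives @@ sizeWords)) ~~ WordBoundary) :> (s -> "Size"),
    IgnoreCase -> True
  ];

findSizes["large women's leather jacket"]
```

Every new spelling or synonym has to be added to the list by hand, and the lookup has no notion of context, so it cannot tell a size from any other use of the same word.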


And there is no way to add additional synonyms or spellings to an entity:

TextCases[# <> " feather", "Color"] & /@ {"violet", "lilac", "lavender", "royal", "purpled", "plum", "grape", "maroon", "magenta", "purplish"} 


I want to extend the built-in NER model with custom training data, i.e. substring labels:

newTrainingData = <|"Who is Nishanth?" -> {8, 16, "Name"}, "Who is Kamal Khumar?" -> {8, 20, "Name"}, "I like London and Berlin." -> {{8, 14, "City"}, {19, 25, "City"}}|>
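To make the intended label format concrete, here is a small sketch that recovers the labeled substrings from such spans. It assumes each label is {start, end, label} with a 1-based start and an exclusive end, which is the reading that matches the numbers above. `labeledSpans` is a hypothetical helper, and single spans are wrapped in a list so that every value has the same shape.

```wolfram
(* Assumed convention: {start, end, label}, 1-based start, exclusive end. *)
trainingData = <|
  "Who is Nishanth?" -> {{8, 16, "Name"}},
  "Who is Kamal Khumar?" -> {{8, 20, "Name"}},
  "I like London and Berlin." -> {{8, 14, "City"}, {19, 25, "City"}}
|>;

(* labeledSpans is a hypothetical helper, not a built-in function. *)
(* It turns each span into a rule: substring -> label. *)
labeledSpans[text_String, spans_List] :=
  Map[StringTake[text, {#[[1]], #[[2]] - 1}] -> #[[3]] &, spans];

KeyValueMap[labeledSpans, trainingData]
```

Under that convention the spans pick out "Nishanth", "Kamal Khumar", "London", and "Berlin" with their respective labels.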

If this is not supported in 12.2, is there perhaps a third-party paclet, a resource function, or a Neural Net Repository entry that could be extended? Or is it maybe coming in 12.3?

$\endgroup$
  • $\begingroup$ There was some discussion of upcoming improvements to TextCases in a recent "Live CEOing" discussion (Ep 438 youtube.com/watch?v=Beccqss0nGw around 1:16) and its entity recognition performance. Those in the system entity framework should be more reliably identified by TextCases but I don't think there are immediate plans to expose the machinery behind it to users for custom entity type creation. $\endgroup$ Commented Apr 9, 2021 at 9:44
  • $\begingroup$ This seems like it would be a good application for LLMExampleFunction $\endgroup$ Commented May 22, 2024 at 11:31
