Stanza bad performance on Named Entity Identification

Question

I'm using both spaCy and Stanza to identify named entities in very short strings (brand names and business names):

    import spacy
    import stanza

    # BUILDING THE MODELS
    # ----- stanza
    sen = stanza.Pipeline("en")
    smlp = stanza.MultilingualPipeline()
    # ----- spacy (installed packages: en_core_web_lg 3.7.1, en_core_web_trf 3.7.3)
    spl = spacy.load("en_core_web_lg")
    spt = spacy.load("en_core_web_trf")

    # TESTING THE MODELS
    name = 'The Port of Peri Peri'
    print(name)

    # spacy
    print('\n SPACY------------------------------------------------------------------')
    print('spacy spl')
    doc = spl(name)
    for token in doc:
        print(token.text, token.is_oov, token.shape_, token.tag_, token.pos_,
              token.dep_, token.ent_type_, token.ent_iob_)
    print('-----------------')
    print('spacy trf')
    doc = spt(name)
    for token in doc:
        print(token.text, token.is_oov, token.shape_, token.tag_, token.pos_,
              token.dep_, token.ent_type_, token.ent_iob_)

    # stanza
    print('\n STANZA----------------------------------------------------------------')
    print('stanza sen')
    doc = sen(name)
    for sent in doc.sentences:
        for token in sent.tokens:
            for word in token.words:
                print(word.text, word.xpos, word.upos, word.deprel, token.ner)
    print('-----------------')
    print('stanza smlp')
    doc = smlp(name)
    for sent in doc.sentences:
        for token in sent.tokens:
            for word in token.words:
                print(word.text, word.xpos, word.upos, word.deprel, token.ner)
    print('-----------------')

Output:

    The Port of Peri Peri

     SPACY------------------------------------------------------------------
    spacy spl
    The False Xxx DT DET det ORG B
    Port False Xxxx NNP PROPN ROOT ORG I
    of False xx IN ADP prep ORG I
    Peri False Xxxx NNP PROPN compound ORG I
    Peri False Xxxx NNP PROPN pobj ORG I
    -----------------
    spacy trf
    The True Xxx DT DET det FAC B
    Port True Xxxx NNP PROPN ROOT FAC I
    of True xx IN ADP prep FAC I
    Peri True Xxxx NNP PROPN compound FAC I
    Peri True Xxxx NNP PROPN pobj FAC I

     STANZA----------------------------------------------------------------
    stanza sen
    The DT DET det B-PERSON
    Port NNP PROPN root I-PERSON
    of IN ADP case I-PERSON
    Peri NNP PROPN nmod I-PERSON
    Peri NNP PROPN nmod E-PERSON
    -----------------
    stanza smlp
    The DT DET det B-PERSON
    Port NNP PROPN root I-PERSON
    of IN ADP case I-PERSON
    Peri NNP PROPN nmod I-PERSON
    Peri NNP PROPN nmod E-PERSON
    -----------------

So, if you look at the ends of the output lines, spaCy identifies the place as an ORGanization or a FACility, while Stanza identifies it as a PERSON, which is ridiculous: person names do not usually start with "The" or have "of" in them. My question is: are there any improvements I can make to the Stanza models, or is this as good as they get?
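To make the comparison easier to read than the per-token columns, the tags can be collapsed into whole entity spans. The following is only a minimal sketch: `tags_to_spans` is a hypothetical helper written for this post, not part of either library, and it assumes tags in the `B-`/`I-`/`E-`/`S-`/`O` style that Stanza prints above (spaCy's separate `ent_iob_` and `ent_type_` columns are joined into the same form by hand).

```python
def tags_to_spans(tokens, tags):
    """Collapse per-token BIOES/IOB tags (e.g. 'B-PERSON') into (text, label) spans.

    Hypothetical helper for illustration; not part of spaCy or Stanza.
    """
    spans = []
    current, label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, ent_type = tag.partition("-")
        if prefix in ("B", "S") or (prefix in ("I", "E") and ent_type != label):
            # A new entity starts here: flush any open span first.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], ent_type
        elif prefix in ("I", "E"):
            # Continuation of the open span.
            current.append(token)
        else:
            # "O" (or an empty tag): outside any entity.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
            continue
        if prefix in ("E", "S"):
            # Explicit end marker (BIOES scheme): close the span now.
            spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        # IOB-style tags have no end marker, so flush at the end of input.
        spans.append((" ".join(current), label))
    return spans


# Tags copied from the output above.
tokens = ["The", "Port", "of", "Peri", "Peri"]
stanza_tags = ["B-PERSON", "I-PERSON", "I-PERSON", "I-PERSON", "E-PERSON"]
spacy_lg_tags = ["B-ORG", "I-ORG", "I-ORG", "I-ORG", "I-ORG"]
spacy_trf_tags = ["B-FAC", "I-FAC", "I-FAC", "I-FAC", "I-FAC"]

print("stanza   :", tags_to_spans(tokens, stanza_tags))
print("spacy lg :", tags_to_spans(tokens, spacy_lg_tags))
print("spacy trf:", tags_to_spans(tokens, spacy_trf_tags))
```

All three models agree on the span boundaries (the whole string is one entity); they differ only in the label assigned to it.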
