Splits for the Ben-Mordecai and Elhadad Hebrew NER Corpus (BMC)

To allow evaluation consistent with the original Ben-Mordecai and Elhadad (2005) work, we provide three random 75%-25% splits.

Only the 7 entity categories viable for evaluation were kept (DATE, LOC, MONEY, ORG, PER, PERCENT, TIME); all MISC entities were filtered out.
The sequence labeling scheme was changed from IOB to BIOES.
The dev sets are 10% taken out of the 75% portion of each split.
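
As an illustration of the two label changes listed above, here is a minimal sketch in Python. It is not the script used to produce these splits. It assumes the input tags follow the IOB2 convention (every entity starts with `B-`), and it assumes that filtering an entity means relabeling its tokens as `O`; the function names `drop_unkept` and `iob2_to_bioes` are hypothetical.

```python
from typing import List

# The 7 categories kept in these splits, as listed in this README.
KEPT_TYPES = {"DATE", "LOC", "MONEY", "ORG", "PER", "PERCENT", "TIME"}


def drop_unkept(tags: List[str]) -> List[str]:
    """Relabel tokens of any entity type outside KEPT_TYPES (e.g. MISC) as O.

    Assumption: filtering means relabeling, not removing tokens or sentences.
    """
    return [
        t if t == "O" or t.split("-", 1)[1] in KEPT_TYPES else "O"
        for t in tags
    ]


def iob2_to_bioes(tags: List[str]) -> List[str]:
    """Convert one sentence of IOB2 tags to BIOES tags."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, etype = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- of the same type.
        continues = nxt == f"I-{etype}"
        if prefix == "B":
            out.append(f"B-{etype}" if continues else f"S-{etype}")
        else:  # prefix == "I"
            out.append(f"I-{etype}" if continues else f"E-{etype}")
    return out


tags = ["B-PER", "I-PER", "O", "B-MISC", "I-MISC", "B-LOC"]
print(iob2_to_bioes(drop_unkept(tags)))
# ['B-PER', 'E-PER', 'O', 'O', 'O', 'S-LOC']
```

Filtering runs before the conversion so that a removed MISC entity cannot affect the boundary tags of a neighboring entity.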

Citation

If you use the BMC corpus, please cite the original paper as well as our paper, which describes the splits:

Ben-Mordecai and Elhadad (2005):

@mastersthesis{naama,
  title={Hebrew Named Entity Recognition},
  author={Ben-Mordecai, Naama},
  advisor={Elhadad, Michael},
  year={2005},
  url="https://www.cs.bgu.ac.il/~elhadad/nlpproj/naama/",
  institution={Department of Computer Science, Ben-Gurion University},
  school={Department of Computer Science, Ben-Gurion University},
}

Bareket and Tsarfaty (2020):

@misc{bareket2020neural,
  title={Neural Modeling for Named Entities and Morphology (NEMO^2)},
  author={Dan Bareket and Reut Tsarfaty},
  year={2020},
  eprint={2007.15620},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
