How the models have been obtained is described in our paper.
Set up a virtual environment:

```
virtualenv --python=python3.6 venv
```

Activate the virtual environment:

```
source venv/bin/activate
```

Upgrade pip:

```
pip install -U pip
```

Install the package together with its dependencies in development mode:

```
pip install -e ./
```

Download the required models: https://qurator-data.de/sbb_ner/models.tar.gz
Extract the model archive:

```
tar -xzf models.tar.gz
```

Run the web app directly:

```
env FLASK_APP=qurator/sbb_ner/webapp/app.py env FLASK_ENV=development env USE_CUDA=True flask run --host=0.0.0.0
```

Set USE_CUDA=False if you do not have a GPU available.
For production use, run the app with gunicorn instead:

```
env USE_CUDA=True/False gunicorn --bind 0.0.0.0:5000 qurator.sbb_ner.webapp.wsgi:app
```

If you want to use a different model configuration file:

```
env USE_CUDA=True/False env CONFIG=`realpath ./my-config.json` gunicorn --bind 0.0.0.0:5000 qurator.sbb_ner.webapp.wsgi:app
```

Build and run the CPU-only Docker image:

```
docker build --build-arg http_proxy=$http_proxy -t qurator/webapp-ner-cpu -f Dockerfile.cpu .

docker run -ti --rm=true --mount type=bind,source=data/konvens2019,target=/usr/src/qurator-sbb-ner/data/konvens2019 -p 5000:5000 qurator/webapp-ner-cpu
```

For the GPU image, make sure that your GPU is correctly set up and that nvidia-docker has been installed.
Build and run the GPU Docker image:

```
docker build --build-arg http_proxy=$http_proxy -t qurator/webapp-ner-gpu -f Dockerfile .

docker run -ti --rm=true --mount type=bind,source=data/konvens2019,target=/usr/src/qurator-sbb-ner/data/konvens2019 -p 5000:5000 qurator/webapp-ner-gpu
```

The NER web interface is available at http://localhost:5000.
Get available models:
```
curl http://localhost:5000/models
```

Output:
```
[
  {
    "default": true,
    "id": 1,
    "model_dir": "data/konvens2019/build-wd_0.03/bert-all-german-de-finetuned",
    "name": "DC-SBB + CONLL + GERMEVAL"
  },
  {
    "default": false,
    "id": 2,
    "model_dir": "data/konvens2019/build-on-all-german-de-finetuned/bert-sbb-de-finetuned",
    "name": "DC-SBB + CONLL + GERMEVAL + SBB"
  },
  {
    "default": false,
    "id": 3,
    "model_dir": "data/konvens2019/build-wd_0.03/bert-sbb-de-finetuned",
    "name": "DC-SBB + SBB"
  },
  {
    "default": false,
    "id": 4,
    "model_dir": "data/konvens2019/build-wd_0.03/bert-all-german-baseline",
    "name": "CONLL + GERMEVAL"
  }
]
```
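If you are scripting against the service, the same endpoint can be queried programmatically. A minimal sketch using the third-party `requests` library (an assumption, not a dependency of this package), with the web app assumed to be running on localhost:5000:

```python
import requests

# Query the running web app for its available NER models.
models = requests.get("http://localhost:5000/models").json()

# Select the id of the model marked as default.
default_id = next(m["id"] for m in models if m["default"])
print("default model id:", default_id)
```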
Perform NER using model 1:

```
curl -d '{ "text": "Paris Hilton wohnt im Hilton Paris in Paris." }' -H "Content-Type: application/json" http://localhost:5000/ner/1
```

Output:
```
[
  [
    { "prediction": "B-PER", "word": "Paris" },
    { "prediction": "I-PER", "word": "Hilton" },
    { "prediction": "O", "word": "wohnt" },
    { "prediction": "O", "word": "im" },
    { "prediction": "B-ORG", "word": "Hilton" },
    { "prediction": "I-ORG", "word": "Paris" },
    { "prediction": "O", "word": "in" },
    { "prediction": "B-LOC", "word": "Paris" },
    { "prediction": "O", "word": "." }
  ]
]
```

The JSON above is the expected input format of the SBB named entity disambiguation system.
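To consume this output in code, the per-token BIO predictions can be grouped back into entity spans. A minimal client sketch, again assuming `requests` and a server on localhost:5000; the `extract_entities` helper is illustrative and not part of this package:

```python
import requests

def extract_entities(sentences):
    """Group per-token B-/I- predictions into (text, type) entity spans."""
    entities = []
    for sentence in sentences:
        words, ent_type = [], None
        for token in sentence:
            tag = token["prediction"]
            if tag.startswith("B-"):  # a new entity begins
                if words:
                    entities.append((" ".join(words), ent_type))
                words, ent_type = [token["word"]], tag[2:]
            elif tag.startswith("I-") and words:  # current entity continues
                words.append(token["word"])
            else:  # outside any entity; flush what we have
                if words:
                    entities.append((" ".join(words), ent_type))
                words, ent_type = [], None
        if words:
            entities.append((" ".join(words), ent_type))
    return entities

resp = requests.post(
    "http://localhost:5000/ner/1",
    json={"text": "Paris Hilton wohnt im Hilton Paris in Paris."},
)
print(extract_entities(resp.json()))
# e.g. [('Paris Hilton', 'PER'), ('Hilton Paris', 'ORG'), ('Paris', 'LOC')]
```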
Read CoNLL 2003 NER ground-truth files from a directory and write the parsed data to a pandas DataFrame stored as a pickle:

```
compile_conll --help
```

Read GermEval .tsv files from a directory and write the parsed data to a pandas DataFrame stored as a pickle:

```
compile_germ_eval --help
```

Read Europeana historic NER ground-truth .bio files from a directory and write the parsed data to a pandas DataFrame stored as a pickle:

```
compile_europeana_historic --help
```

Read WikiNER files from a directory and write the parsed data to a pandas DataFrame stored as a pickle:

```
compile_wikiner --help
```

Perform supervised training and test/cross-validation of BERT for NER:
```
bert-ner --help
```
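The DataFrames written by the compile_* tools above can be inspected with pandas. A short sketch; `gt_conll.pkl` is a hypothetical output path, substitute whatever you passed to the tool:

```python
import pandas as pd

# Load a ground-truth DataFrame produced by one of the compile_* tools.
# "gt_conll.pkl" is a placeholder for the pickle file you wrote above.
df = pd.read_pickle("gt_conll.pkl")

print(df.head())
```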