Implementation of Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition.
(This work was tested with PyTorch 2.0.1, CUDA 11.7, Python 3.8, and Ubuntu 20.04.)
$ git clone https://github.com/Amiannn/Dancer.git

Run entity correction on the Aishell test set:

$ python3 -m entity_correction \
    --asr_transcription_path "./datas/aishell_test_set/asr_transcription/conformer/hyp" \
    --asr_nbest_transcription_path "./datas/aishell_test_set/asr_transcription/conformer/nbest" \
    --asr_manuscript_path "./datas/aishell_test_set/ref" \
    --entity_path "./datas/entities/aishell/test_1_entities.txt" \
    --entity_content_path "./datas/entities/aishell/descriptions" \
    --entity_vectors_path "./datas/entities/aishell/descriptions/embeds.npy" \
    --detection_model_type "bert_detector" \
    --detection_model_path "./ckpts/ner/best_model" \
    --retrieval_model_type "prsr_retriever" \
    --retrieval_model_path "./ckpts/ranker/dpr_biencoder.39" \
    --use_rejection "True"

For example, we train the CED model on the Aishell dataset as follows:
$ python3 -m train_ced \
    --train_path "./datas/ced/aishell_trainset_plus_conformer_nbest10_trainset_decode_result.json" \
    --model_type "bert-base-chinese" \
    --wandb "DANCER_CED_EXP" \
    --epoch 10 \
    --batch 256

We train our EDA-MLM model on the Aishell dataset (datas/eda_mlm/aishell) using a custom DPR project.
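The correction command takes a DPR bi-encoder checkpoint (`--retrieval_model_path`), precomputed entity description vectors (`--entity_vectors_path`), and a rejection switch (`--use_rejection`). As a rough illustration of what a DPR-style retrieval step does at inference time, the sketch below scores a detected span embedding against each entity's description vector by dot product and, optionally, rejects the top match when its score is below a threshold. This is a hypothetical sketch, not the repository's `prsr_retriever`: the function name `retrieve_entity`, the plain Python lists standing in for `embeds.npy`, and the fixed `reject_threshold` rule are all assumptions, and the project's actual rejection mechanism may work differently.

```python
from typing import List, Optional, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product, used here as the bi-encoder similarity score."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_entity(
    span_vec: Sequence[float],
    entity_names: List[str],
    entity_vecs: List[Sequence[float]],
    reject_threshold: Optional[float] = None,
) -> Tuple[Optional[str], float]:
    """Return (best entity, score); the entity is None if the match is rejected.

    Hypothetical sketch: entity_vecs[i] is assumed to be the precomputed
    description embedding of entity_names[i] (cf. embeds.npy).
    """
    if not entity_vecs or len(entity_names) != len(entity_vecs):
        raise ValueError("need exactly one vector per entity name")
    # Score the span against every entity description vector.
    scores = [dot(span_vec, vec) for vec in entity_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    # Assumed rejection rule: keep the original ASR span if the best score is too low.
    if reject_threshold is not None and scores[best] < reject_threshold:
        return None, scores[best]
    return entity_names[best], scores[best]


if __name__ == "__main__":
    names = ["北京", "上海"]
    vecs = [[1.0, 0.0], [0.0, 1.0]]
    print(retrieve_entity([0.9, 0.1], names, vecs))  # ('北京', 0.9)
    print(retrieve_entity([0.9, 0.1], names, vecs, reject_threshold=0.95))  # (None, 0.9)
```

Dot-product scoring is what lets a bi-encoder precompute all entity vectors once, so each query span only needs one encoder pass plus a cheap similarity search over the stored matrix.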
$ python3 -m error_analysis.score \
    --entity_path "./datas/entities/aishell/test/test_1_entities.txt" \
    --ref_path "./datas/aishell_test_set/ref" \
    --hyp_path "./datas/aishell_test_set/asr_transcription/conformer/hyp"
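The scoring command takes an entity list, reference transcripts, and hypothesis transcripts, but which metrics it reports is not described here. As one hypothetical example of an entity-level metric over those same three inputs, the sketch below computes entity recall: the share of entity occurrences in each reference that also occur in the matching hypothesis. The `<utt_id> <text>` file layout, the removal of spaces (assuming character-level Mandarin), and the names `load_transcripts` and `entity_recall` are assumptions, not the behavior of `error_analysis.score`.

```python
from typing import Dict, Iterable


def load_transcripts(path: str) -> Dict[str, str]:
    """Read lines of the assumed form '<utt_id> <text>' into {utt_id: text}.

    Spaces inside the text are removed, assuming character-level Mandarin.
    """
    transcripts: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if parts:
                text = parts[1] if len(parts) > 1 else ""
                transcripts[parts[0]] = text.replace(" ", "")
    return transcripts


def entity_recall(
    entities: Iterable[str], refs: Dict[str, str], hyps: Dict[str, str]
) -> float:
    """Share of entity occurrences in the references that the hypotheses also contain."""
    entity_list = [e.strip() for e in entities if e.strip()]
    total = matched = 0
    for utt_id, ref in refs.items():
        hyp = hyps.get(utt_id, "")  # a missing hypothesis matches nothing
        for ent in entity_list:
            n_ref = ref.count(ent)
            if n_ref:
                total += n_ref
                # Credit a hit at most as often as the entity occurs in the reference.
                matched += min(n_ref, hyp.count(ent))
    return matched / total if total else 0.0
```

For instance, with references `{"u1": "我去北京", "u2": "上海很大"}` and hypotheses `{"u1": "我去北京", "u2": "尚海很大"}`, `entity_recall(["北京", "上海"], refs, hyps)` returns `0.5`, since 上海 was misrecognized as 尚海. Because matching uses plain substring counts, nested entities (e.g. 北京 inside 北京大学) are each counted separately.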
- Download Google-Drive