DANCER

DANCER💃: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition

Implementation of Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition.

Getting Started

Dependency / Install

(This work was tested with PyTorch 2.0.1, CUDA 11.7, Python 3.8, and Ubuntu 20.04.)

  • Install PyTorch
  • Install Faiss
  • $ pip install -r requirements
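After installing, a quick check that the dependencies are visible to the interpreter can save a failed run later. The sketch below is not part of the repository; it assumes PyTorch is importable as `torch` and Faiss as `faiss`, and the helper name `check_environment` is hypothetical.

```python
import importlib.util
import sys

# Assumed import names: PyTorch as "torch", Faiss as "faiss".
REQUIRED_MODULES = ("torch", "faiss")


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each top-level module name to True if it can be found."""
    status = {}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

This only confirms that the modules can be located; it does not verify versions or that CUDA is usable.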

Scripts

$ git clone https://github.com/Amiannn/Dancer.git 

Prediction

$ python3 -m entity_correction \
    --asr_transcription_path "./datas/aishell_test_set/asr_transcription/conformer/hyp" \
    --asr_nbest_transcription_path "./datas/aishell_test_set/asr_transcription/conformer/nbest" \
    --asr_manuscript_path "./datas/aishell_test_set/ref" \
    --entity_path "./datas/entities/aishell/test_1_entities.txt" \
    --entity_content_path "./datas/entities/aishell/descriptions" \
    --entity_vectors_path "./datas/entities/aishell/descriptions/embeds.npy" \
    --detection_model_type "bert_detector" \
    --detection_model_path "./ckpts/ner/best_model" \
    --retrieval_model_type "prsr_retriever" \
    --retrieval_model_path "./ckpts/ranker/dpr_biencoder.39" \
    --use_rejection "True"
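The prediction command takes many paths, and a missing file is easy to overlook. As a rough sketch (not part of the repository), the snippet below assembles the same invocation from a dictionary and reports which path arguments do not exist. The flags and values are copied from the command above; the helper names `build_command` and `missing_paths` are hypothetical, and the existence check assumes each path argument names a file or directory exactly as written.

```python
from pathlib import Path

# Flags and values copied from the prediction command above.
PREDICTION_ARGS = {
    "asr_transcription_path": "./datas/aishell_test_set/asr_transcription/conformer/hyp",
    "asr_nbest_transcription_path": "./datas/aishell_test_set/asr_transcription/conformer/nbest",
    "asr_manuscript_path": "./datas/aishell_test_set/ref",
    "entity_path": "./datas/entities/aishell/test_1_entities.txt",
    "entity_content_path": "./datas/entities/aishell/descriptions",
    "entity_vectors_path": "./datas/entities/aishell/descriptions/embeds.npy",
    "detection_model_type": "bert_detector",
    "detection_model_path": "./ckpts/ner/best_model",
    "retrieval_model_type": "prsr_retriever",
    "retrieval_model_path": "./ckpts/ranker/dpr_biencoder.39",
    "use_rejection": "True",
}

# Flags whose values are not filesystem paths.
NON_PATH_FLAGS = {"detection_model_type", "retrieval_model_type", "use_rejection"}


def build_command(args=PREDICTION_ARGS):
    """Return the prediction command as an argv list suitable for subprocess.run."""
    command = ["python3", "-m", "entity_correction"]
    for flag, value in args.items():
        command += [f"--{flag}", value]
    return command


def missing_paths(args=PREDICTION_ARGS):
    """Return the flags whose path values do not exist on disk."""
    return [
        flag
        for flag, value in args.items()
        if flag not in NON_PATH_FLAGS and not Path(value).exists()
    ]
```

If `missing_paths()` returns an empty list, the command can be launched from the repository root with `subprocess.run(build_command(), check=True)`.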

Train CED Model

For example, to train the CED model on the Aishell dataset:

$ python3 -m train_ced \
    --train_path "./datas/ced/aishell_trainset_plus_conformer_nbest10_trainset_decode_result.json" \
    --model_type "bert-base-chinese" \
    --wandb "DANCER_CED_EXP" \
    --epoch 10 \
    --batch 256

Train Semantic Ranking Model

We train our EDA-MLM model on the Aishell dataset (datas/eda_mlm/aishell) using a custom DPR project.

Evaluation

$ python3 -m error_analysis.score \
    --entity_path "./datas/entities/aishell/test/test_1_entities.txt" \
    --ref_path "./datas/aishell_test_set/ref" \
    --hyp_path "./datas/aishell_test_set/asr_transcription/conformer/hyp"
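This README does not describe which metrics `error_analysis.score` reports. Purely as an illustration of comparing reference and hypothesis transcripts, the sketch below computes character error rate (CER), a common metric for Mandarin ASR. It is an assumption that CER is relevant here; the code is not taken from the repository, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(refs, hyps):
    """Total character edits divided by total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return edits / total if total else 0.0
```

The functions take in-memory lists of strings, so no assumption is made about the layout of the `ref` and `hyp` files.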

Data

Checkpoints
