Tags: modelscope/dash-infer
update sampling, prefix cache, json mode impl (#55)
- engine: stop and release the model when the engine is released, and remove deprecated lock
- sampling: generate_op heavily modified, remove dependency on global tensors
- prefix cache: bug fixes, improve eviction performance
- json mode: update lmfe-cpp patch, add process_logits, sampling with top_k and top_p
- span-attention: move span_attn decoderReshape to init
- lora: add docs, fix typo
- ubuntu: add ubuntu dockerfile, fix install directory error
- bugfix: fix multi-batch rep_penalty bug