HF modeling code forcing FA2 #1

@Qubitium

Description

In the current HF modeling code, flash_attention_2 is forced on even when eager is selected by the user. It also does not check whether the user's environment has FA2 installed before force-enabling it.

But I see fa_enabled checks throughout the inference code, so it appears FA2 is optional; yet with the following code block, flash attention cannot be turned off.

Please clarify whether the HF modeling code has a hard dependency on flash_attention_2, or whether this is a toggle bug, which I think is more likely.

```python
if getattr(config, "_attn_implementation", None) is not None:
    if config._attn_implementation != "flash_attention_2":
        logger.warning_once(
            f"Ignoring the provided attention implementation {config._attn_implementation}"
        )
        logger.warning_once("Using flash_attention_2 backend instead.")
        config._attn_implementation = "flash_attention_2"
else:
    config._attn_implementation = "flash_attention_2"
self._use_flash_attention_2 = config._attn_implementation == "flash_attention_2"
```
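For reference, here is a sketch of how the toggle could behave instead. This is not the repo's actual code; the helper name `resolve_attn_implementation` and the fake `logger` interface are assumptions for illustration. It honors an explicit user choice and only picks flash_attention_2 when the `flash_attn` package is actually importable:

```python
import importlib.util


def resolve_attn_implementation(config, logger):
    """Sketch: pick an attention backend without overriding the user's choice."""
    fa2_available = importlib.util.find_spec("flash_attn") is not None
    requested = getattr(config, "_attn_implementation", None)

    if requested is not None and requested != "flash_attention_2":
        # Honor an explicit selection such as "eager" instead of forcing FA2.
        return requested
    if fa2_available:
        # FA2 was requested (or nothing was requested) and flash-attn is installed.
        return "flash_attention_2"
    # FA2 is unavailable, so fall back rather than crash at runtime.
    logger.warning_once(
        "flash_attention_2 is not installed; falling back to eager attention."
    )
    return "eager"
```

With this shape, selecting eager attention actually disables FA2, and environments without flash-attn degrade gracefully instead of force-enabling a missing backend.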
