In the current hf modeling code, flash_attention_2 is force-enabled even when the user selects eager. The code also does not check whether the user's environment has fa2 installed before force-enabling it.
However, I see `if fa_enabled` checks throughout the inference code, so fa2 appears to be optional; yet with the following code block, flash attention cannot be turned off.
Please clarify whether the hf modeling code has a hard dependency on flash_attention_2, or whether this is a toggle bug, which I think is more likely.
```python
if getattr(config, "_attn_implementation", None) is not None:
    if config._attn_implementation != "flash_attention_2":
        logger.warning_once(
            f"Ignoring the provided attention implementation {config._attn_implementation}")
        logger.warning_once("Using flash_attention_2 backend instead.")
        config._attn_implementation = "flash_attention_2"
else:
    config._attn_implementation = "flash_attention_2"
self._use_flash_attention_2 = config._attn_implementation == "flash_attention_2"
```
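For illustration, here is a minimal sketch of how the block could honor the user's choice instead of overriding it, falling back to eager when flash-attn is not installed. The helper name `resolve_attn_implementation` and the use of `SimpleNamespace` as a stand-in config are my own assumptions, not part of the actual modeling code:

```python
import importlib.util
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

def resolve_attn_implementation(config):
    """Hypothetical helper: honor the user's _attn_implementation choice,
    and only keep flash_attention_2 if the flash-attn package is present."""
    requested = getattr(config, "_attn_implementation", None) or "eager"
    if requested == "flash_attention_2":
        # Check the environment before committing to fa2, as the issue suggests.
        if importlib.util.find_spec("flash_attn") is None:
            logger.warning(
                "flash_attention_2 requested but flash-attn is not installed; "
                "falling back to eager attention.")
            requested = "eager"
    config._attn_implementation = requested
    return requested == "flash_attention_2"

# Usage: an explicit "eager" request is no longer silently overridden.
cfg = SimpleNamespace(_attn_implementation="eager")
use_fa2 = resolve_attn_implementation(cfg)
```

With this shape, `_use_flash_attention_2` stays `False` when the user asked for eager, and fa2 is only enabled when both requested and available.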