In the current hf modeling code, flash_attention_2 is force-enabled even when the user selects eager. The code also does not check whether the user's environment has fa2 installed before force-enabling it.
However, I see `if fa_enabled` checks throughout the inference code, so fa2 appears to be optional; yet with the following code block, flash attention cannot be turned off.
Please clarify whether the hf modeling code has a hard dependency on flash_attention_2, or whether this is a toggle bug, which I think is more likely.
```python
if getattr(config, "_attn_implementation", None) is not None:
    if config._attn_implementation != "flash_attention_2":
        logger.warning_once(
            f"Ignoring the provided attention implementation {config._attn_implementation}")
        logger.warning_once("Using flash_attention_2 backend instead.")
        config._attn_implementation = "flash_attention_2"
else:
    config._attn_implementation = "flash_attention_2"
self._use_flash_attention_2 = config._attn_implementation == "flash_attention_2"
```
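For illustration, here is a minimal sketch of how the block could honor the user's choice instead of overriding it, falling back to eager when flash-attn is not installed. The helper name `resolve_attn_implementation` and the use of `SimpleNamespace` as a stand-in config are my own assumptions, not part of the actual modeling code:

```python
import importlib.util
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

def resolve_attn_implementation(config):
    """Hypothetical helper: honor the user's _attn_implementation choice,
    and only keep flash_attention_2 if the flash-attn package is present."""
    requested = getattr(config, "_attn_implementation", None) or "eager"
    if requested == "flash_attention_2":
        # Check the environment before committing to fa2, as the issue suggests.
        if importlib.util.find_spec("flash_attn") is None:
            logger.warning(
                "flash_attention_2 requested but flash-attn is not installed; "
                "falling back to eager attention.")
            requested = "eager"
    config._attn_implementation = requested
    return requested == "flash_attention_2"

# Usage: an explicit "eager" request is no longer silently overridden.
cfg = SimpleNamespace(_attn_implementation="eager")
use_fa2 = resolve_attn_implementation(cfg)
```

With this shape, `_use_flash_attention_2` stays `False` when the user asked for eager, and fa2 is only enabled when both requested and available.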