
In Google DeepMind's Gemma technical paper (https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf), the 7B Gemma model specs are given as d_model = 3072, num_heads = 16 and head_size = 256. These don't seem consistent: 16 * 256 = 4096, not 3072. Since the model dimension is distributed across the heads, I think the following should hold:

num_heads * head_size = d_model

This is also how it is described in the original Transformer paper, "Attention Is All You Need", where each head has dimension d_k = d_model / h (64 = 512 / 8 for the base model).
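To make the mismatch concrete, here is a small sanity check in Python using the figures from the Gemma report and the base model from "Attention Is All You Need" (just illustrative arithmetic, not code from either paper):

```python
# Compare num_heads * head_size against d_model for two configurations
configs = {
    # "Attention Is All You Need" base model: d_model = 512, h = 8, d_k = 64
    "transformer-base": {"d_model": 512, "num_heads": 8, "head_size": 64},
    # Gemma 7B as listed in the technical report
    "gemma-7b": {"d_model": 3072, "num_heads": 16, "head_size": 256},
}

for name, c in configs.items():
    attn_dim = c["num_heads"] * c["head_size"]
    status = "OK" if attn_dim == c["d_model"] else "MISMATCH"
    print(f"{name}: {c['num_heads']} * {c['head_size']} = {attn_dim}, "
          f"d_model = {c['d_model']} -> {status}")
```

Running this prints OK for the original Transformer base model and MISMATCH (4096 vs. 3072) for the 7B Gemma specs, which is exactly the inconsistency described above.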

The equation does hold for the Gemma 2B specs given in the same paper. Am I missing something with the 7B Gemma specs, or does the paper have an error?


1 Answer


From the Annotated Transformer (https://nlp.seas.harvard.edu/annotated-transformer/#full-model): in the position-wise feed-forward network, d_model = 512 and the inner layer has dimensionality d_ff = 2048.
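For reference, that feed-forward block is just two linear layers, so d_ff is an inner width that is independent of d_model. A minimal PyTorch sketch following the structure used in the Annotated Transformer:

```python
import torch
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    """Two-layer FFN: d_model -> d_ff -> d_model."""
    def __init__(self, d_model: int = 512, d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.w_1 = nn.Linear(d_model, d_ff)   # expand to the inner dimension d_ff
        self.w_2 = nn.Linear(d_ff, d_model)   # project back down to d_model
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_2(self.dropout(torch.relu(self.w_1(x))))

# Input and output keep shape (batch, seq_len, d_model); d_ff only appears inside the block
x = torch.randn(2, 10, 512)
print(PositionwiseFeedForward()(x).shape)  # torch.Size([2, 10, 512])
```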

It's hard to be certain, but perhaps the Gemma paper is using d_ff (or some other hidden dimension) for the model size. I've always had the same understanding as you: num_heads * head_size = d_model, regardless of the feed-forward hidden dimension. This also appears to agree with the hidden_size parameter on Hugging Face for Gemma 7B:

• hidden_size (int, optional, defaults to 3072) — Dimension of the hidden representations.

https://huggingface.co/docs/transformers/en/model_doc/gemma
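If you want to check those defaults yourself, something like the following should work (assuming a transformers version with Gemma support; hidden_size, num_attention_heads and head_dim are the attribute names I believe GemmaConfig uses, so treat this as a sketch):

```python
from transformers import GemmaConfig

# Default GemmaConfig values should match the 7B numbers from the docs page above
config = GemmaConfig()
print(config.hidden_size)            # expected: 3072
print(config.num_attention_heads)    # expected: 16
print(config.head_dim)               # expected: 256
print(config.num_attention_heads * config.head_dim)  # 4096, not equal to hidden_size
```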

