Lately, I've been working on generating alt text for images using the BLIP model, specifically "blip-image-captioning-base" from Hugging Face. To generate alt text in Turkish with this model, I swap in a BERT tokenizer trained for Turkish, "dbmdz/bert-base-turkish-cased" from Hugging Face. I manually add the [BOS] and [EOS] tokens, configure the model accordingly (e.g., model.config.decoder_start_token_id), resize the token embeddings with resize_token_embeddings, and then tie the weights with tie_weights. When I check the output after fine-tuning, the model generates Turkish captions successfully, but it never emits the [BOS] token and instead starts the sentence with a completely different word. There are no ID conflicts in the vocabulary, and the model does recognize the [BOS] token.

The words the model produces at the start of its sentences are almost non-existent in the dataset, and where they do occur, they never appear at the beginning of a sentence. I haven't been able to solve this problem.

Have you done/are you doing this kind of integration? Do you have any suggestions?

To work around this, I manually added [BOS] and [EOS] to the reference captions and retrained. I also reset the [BOS] token's embedding and let it be learned again, but neither attempt fixed the problem.
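For clarity, the caption preprocessing I mean is just wrapping each reference caption in the special tokens before tokenization, roughly like this sketch (the helper name and example caption are mine, not from my actual code). One thing I'm unsure about: a BERT tokenizer like dbmdz/bert-base-turkish-cased also inserts [CLS]/[SEP] by default, so the encoded labels may end up as [CLS] [BOS] ... [EOS] [SEP] unless add_special_tokens=False is passed.

```python
# Illustrative sketch only: wrap a reference caption in [BOS]/[EOS]
# before tokenization. The function name is hypothetical.
# Caveat (assumption worth checking): BERT tokenizers add [CLS]/[SEP]
# themselves by default, which can double up the special tokens.

def wrap_caption(caption: str, bos: str = "[BOS]", eos: str = "[EOS]") -> str:
    # Strip stray whitespace so the special tokens sit flush against the text.
    return f"{bos} {caption.strip()} {eos}"

print(wrap_caption("bir kedi halının üzerinde uyuyor"))
# [BOS] bir kedi halının üzerinde uyuyor [EOS]
```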

Expected result: [BOS] <alt_text> [EOS]

Actual result (without forced_bos_token_id): 'geçiyoruz' <alt_text> [EOS]

Actual result (with forced_bos_token_id): 'geçiyoruz' [BOS] <slightly_trimmed_alt_text> [EOS]
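To make the behavior I'm seeing concrete, here is a toy greedy decoder in pure Python that mimics (my understanding of) how HF-style encoder-decoder generation treats decoder_start_token_id versus forced_bos_token_id: generation always begins from decoder_start_token_id, and forced_bos_token_id only forces the token at position 1. All token IDs and the next-token table below are made up for illustration; this is not the real BLIP model.

```python
# Toy greedy decoder illustrating decoder_start_token_id vs. forced_bos_token_id.
# IDs and the "model" transition table are invented for illustration only.

DECODER_START = 0   # stands in for model.config.decoder_start_token_id
BOS = 1             # the manually added [BOS]
EOS = 2             # the manually added [EOS]

# Pretend model: maps the last token to its most likely successor.
# 7 stands in for 'geçiyoruz', 8 for the rest of the caption.
NEXT = {DECODER_START: 7, BOS: 7, 7: 8, 8: EOS}

def generate(forced_bos_token_id=None, max_len=6):
    seq = [DECODER_START]                      # generation always starts here
    while len(seq) < max_len and seq[-1] != EOS:
        if forced_bos_token_id is not None and len(seq) == 1:
            seq.append(forced_bos_token_id)    # forced only at position 1
        else:
            seq.append(NEXT[seq[-1]])          # greedy step
    return seq

print(generate())                          # [0, 7, 8, 2] -> [BOS] never appears
print(generate(forced_bos_token_id=BOS))   # [0, 1, 7, 8, 2]
```

If this mental model is right, [BOS] will only ever show up when it is either the decoder start token itself or forced in, which matches the outputs above; the fine-tuning signal alone apparently wasn't enough to make the model emit it.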
