Questions tagged [pretraining]

1 vote
0 answers
20 views

Following this tutorial, I trained YOLOv8 on a custom dataset to detect holes in the ground. I obtained acceptable results considering how small my training set is (about 50 pictures). When preparing ...
Sheldon's user avatar
  • 205
2 votes
1 answer
73 views

Consider the training of a BERT model. Initial pre-training is performed on a large dataset A. Subsequently, fine-tuning is performed on a dataset B, which is part of A, but now with labels ...
Álvaro Loza's user avatar
2 votes
1 answer
1k views

Are the window size and the context length of a language model one and the same thing? I am trying to understand how GPT ...
Vinay Sharma's user avatar
8 votes
3 answers
6k views

My goal is to use the general knowledge and language understanding of a pre-trained LLM and to continue training on a smaller domain-specific corpus to improve the model's knowledge of the domain. ...
Arthuro's user avatar
  • 111
1 vote
1 answer
332 views

I've trained a transformer model based on the PyTorch tutorial: https://pytorch.org/tutorials/beginner/transformer_tutorial.html, but I have difficulty understanding this model's input and ...
Clock ZHONG's user avatar
0 votes
1 answer
32 views

I am reading up on SeqGAN and I am trying to understand the pretraining step better. The authors claim they want to maximize the likelihood of the dataset S by pretraining the ...
postnubilaphoebus's user avatar
1 vote
0 answers
16 views

Language-to-code transformation/generation requires multiple skills: language and reasoning skills to digest the core problem from the natural language specification, and programming language ...
TomR's user avatar
  • 141
1 vote
0 answers
347 views

Fine-tuning is a concept commonly used in deep learning. We may have a pre-trained model and then fine-tune it for our specific task. Does that apply to simple models, such as logistic regression? For ...
eduardokapp's user avatar
