Questions tagged [pretrained-models]
For questions related to pre-trained models. A pre-trained model is a model that was trained on a large benchmark dataset to solve a problem similar to the one that we want to solve. Because training such models is computationally expensive, it is common practice to import and use models from published literature (e.g. VGG, Inception, MobileNet).
31 questions
2 votes
0 answers
69 views
Fine-tuning ResNet101 stuck at ~50% accuracy while MobileNetV2 reaches ~90% (same data, head, training setup)
I'm fine-tuning two different CNNs for an image classification task: The first CNN uses a ResNet101 backbone, and the second uses a MobileNetV2 backbone. Both are pre-trained on ImageNet. I use the ...
0 votes
1 answer
97 views
Reference request: data efficiency of LLM pre-training
I've seen it stated multiple times that LLMs have much worse data efficiency than humans (i.e. they require more data to reach the same or worse performance), e.g. this Tweet by Yann LeCun, or 19:30 in this talk ...
0 votes
3 answers
127 views
Is there a model for facial detection based on an infrared camera?
I need an AI model for facial detection based on an infrared camera. Is there an existing model for this with pre-trained weights? Does this model work well when the lighting conditions may change ...
1 vote
1 answer
112 views
Multi-task objectives sometimes improve single-task performance, but is this true when fine-tuning?
It is known that multitask objectives in neural networks sometimes have the effect of improving the performance of the neural network for each of the tasks individually (versus training the same ...
0 votes
1 answer
94 views
Is the size of a trained model on disk a good measure of model complexity?
I am writing a research paper on my own custom CNN model for image classification. I am comparing my model architecture with pre-trained architectures, like DenseNet121 and InceptionV3. I want to ...
2 votes
1 answer
535 views
Should I use a pretrained model for image classification or not?
I have thousands of images similar to this. I can classify them into different folders using existing metadata, according to the gravel product type loaded on the truck. What would be the optimal way to train a ...
0 votes
1 answer
64 views
Do different n-grams share an embedding in Fasttext?
As per Section 3.2 in the original paper on Fasttext, the authors state: In order to bound the memory requirements of our model, we use a hashing function that maps n-grams to integers in 1 to K ...
0 votes
1 answer
513 views
Fine Tuning Transformer Model for Machine Translation
I am working on the Transformer example demonstrated on TensorFlow's website: https://www.tensorflow.org/text/tutorials/transformer In this example, a machine translation model is trained to translate ...
3 votes
1 answer
886 views
What is the difference between prompt tuning and prefix tuning?
I read that prompt tuning and prefix tuning are two effective mechanisms for leveraging frozen language models to perform downstream tasks. What is the difference between the two, and how do they actually work? ...
3 votes
1 answer
681 views
Using a pre-trained model to generate labels for data to then train a model on
I'm trying to set up a pipeline for my ML models to automatically re-train themselves whenever concept drift occurs to recalibrate to the new output distributions. However, I can't get ground-truth ...
1 vote
1 answer
823 views
How to Train a Decoder for a Pre-trained BERT Transformer-Encoder?
Context: I am currently working on an encoder-decoder sequence to sequence model that uses a sequence of word embeddings as input and output, and then reduces the dimensionality of the word embeddings....
4 votes
1 answer
4k views
What is the difference between fine tuning and variants of few shot learning? [duplicate]
I am trying to understand the concept of fine-tuning and few-shot learning. I understand the need for fine-tuning. It is essentially tuning a pre-trained model to a specific downstream task. However, ...
0 votes
1 answer
2k views
Is it possible that the fine-tuned pre-trained model performs worse than the original pre-trained model?
I have downloaded a pre-trained EfficientDet D2 model (Tensorflow 2.0) and trained it on some data (about 20000 images with 20 classes). I set the number of steps to 25000 and batch size to 3 (...
2 votes
1 answer
3k views
Does BERT freeze the entire model body when it does fine-tuning?
Recently, I came across the BERT model. I did some research and tried some implementations. I wanted to tackle a NER task, so I chose the BertForSequenceClassification provided by HuggingFace. ...
0 votes
2 answers
1k views
How to design a neural network with arbitrary input and output length?
I am trying to build a neural network that has an input of $n$ pairs of integer values (where $n$ is random) and a corresponding output of a binary array with length $n$. The input will be a set of ...