
Questions tagged [gpt]

For questions related to GPT (which stands for Generative Pre-Training), a combination of the transformer (proposed in "Attention is All You Need") and unsupervised pre-training for solving language tasks, such as machine translation. GPT was proposed in "Improving Language Understanding by Generative Pre-Training" (2018) by OpenAI. There's also GPT-2, proposed in "Language Models are Unsupervised Multitask Learners" (2019) by OpenAI.

2 votes · 2 answers · 116 views

I’ve seen that GPT-5 has been released for public use recently. It seems that a key difference between the two models, GPT-4 and GPT-5, is that GPT-5 uses a real-time router, which I guess is ...
asked by Mr. AI Cool
1 vote · 1 answer · 379 views

OpenAI seems to be avoiding branding their reasoning models as "GPTs". See, for example, this page from their API docs, which has one column for "GPT models" and another for "...
asked by kuzzooroo • 121
0 votes · 1 answer · 97 views

I just discovered that LLMs themselves can be used to generate system prompts for LLMs. Does using them degrade the output quality compared to human-written system prompts?
0 votes · 1 answer · 84 views

Let's say I have a company X. I have a lot of documents in it: financial, HR, strategies, procedures, customer information and many more. These documents are in different formats: PDF (textual, ...
asked by bpiec • 109
2 votes · 1 answer · 155 views

What is the difference between an encoder-decoder transformer and a decoder-only transformer with regard to the loss calculation? Specifically, how does the loss signal differ? And how does this relate ...
asked by Green 绿色
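One way to make this question concrete: both model families are trained with the same shifted next-token cross-entropy, and what differs is which stack produces the logits. A minimal NumPy sketch of that loss, with toy shapes and no real model assumed:

```python
import numpy as np

def next_token_loss(logits, targets):
    """Mean cross-entropy of each position's logits against the *next* token.

    logits:  (T, V) unnormalized scores, one row per position
    targets: (T,)   token ids; targets[t] is the token at position t
    The loss at position t uses logits[t] to predict targets[t + 1],
    so the last row of logits is dropped.
    """
    # numerically stable log-softmax
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # predict token t+1 from position t
    return -log_probs[np.arange(len(targets) - 1), targets[1:]].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 50))    # toy sequence: 8 positions, vocab of 50
targets = rng.integers(0, 50, size=8)
print(next_token_loss(logits, targets))
```

With uniform (all-zero) logits the loss is exactly log(vocab_size), a handy sanity check.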
0 votes · 1 answer · 377 views

This seems illogical according to the interpretation we use for QKV (q - query (what we are looking for), k - key (what do I have), v - value (what are my values), softmax(k(X) @ q(x).T) - in what ...
asked by Тима
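For context, the conventional ordering in scaled dot-product attention is softmax(Q Kᵀ / √d) V, with the softmax taken row-wise, i.e. over all keys for each query. A toy NumPy sketch with random weights, illustrative only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Scaled dot-product attention in the usual Q-first convention:
    scores[i, j] = how well query i (what position i is looking for)
    matches key j (what position j has to offer)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)   # each query's weights over keys sum to 1
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))             # 5 tokens, model dim 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = attention(X, Wq, Wk, Wv)
print(out.shape)                          # (5, 16): one mixed value per token
```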
0 votes · 0 answers · 717 views

I wonder why GPTs use a decoder-only architecture instead of a full encoder-decoder architecture. In the full encoder-decoder transformer architecture, we convert the input sequence to a contextual ...
asked by Parag Londhe
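One detail often raised in answers to this kind of question: a decoder-only model still sees the entire prefix through its causal mask, which simply zeroes attention to future positions before the softmax. A small NumPy illustration with toy scores, no trained model assumed:

```python
import numpy as np

T = 5
# Causal mask: position i may attend to positions j <= i only.
mask = np.tril(np.ones((T, T), dtype=bool))
scores = np.random.default_rng(0).normal(size=(T, T))
masked = np.where(mask, scores, -np.inf)   # future positions get -inf
e = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights = e / e.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))   # rows are lower-triangular and sum to 1
```

Because every position's prediction depends only on earlier tokens, one forward pass yields a valid training signal at every position, which is part of why the decoder-only recipe scales so well.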
3 votes · 0 answers · 109 views

According to AI and Memory Wall, serving GPT models "involves repeated matrix-vector multiplications", but I don't understand why. Let's suppose I am the sole user of an LLM server, so we ...
asked by nalzok • 411
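A toy illustration of the claim: once the prompt has been processed and the KV cache filled, each decoding step pushes a single token vector through every weight matrix, so the per-step work is matrix-vector rather than matrix-matrix (shapes below are made up for the sketch):

```python
import numpy as np

d, T = 16, 6
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))          # stand-in for any weight matrix in the model

# Prompt processing ("prefill"): one matrix-matrix product over all T tokens.
X = rng.normal(size=(T, d))
prefill = X @ W.T                    # (T, d)

# Decoding: each new token is a single vector, so every layer's work
# collapses to a matrix-vector product -- memory-bandwidth bound,
# since W must be streamed from memory for each generated token.
x_new = rng.normal(size=(d,))
step = W @ x_new                     # (d,)
print(prefill.shape, step.shape)
```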
1 vote · 0 answers · 46 views

I'm currently deeply invested in the Transformer Circuits thread in parallel with 3blue1brown's videos (chapter 7 on the MLP layer was released a day or two ago) to gain a better theoretical ...
asked by Jonas Zaugg
0 votes · 0 answers · 118 views

I have an idea for a GPT. While ideally I can use something like ChatGPT or ClaudeGPT or something else, I want my GPT to have a specific tone when providing responses. This part is very ...
asked by confused • 143
0 votes · 0 answers · 177 views

I was looking into the loss function in t5x here and see there is a z-loss added to the typical log loss definition. The only paper I could surface on this was https://arxiv.org/abs/1604.08859, but I ...
asked by Jacob B • 279
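For context: the z-loss penalizes the squared log of the softmax normalizer Z, nudging log Z toward 0 so the logits don't drift upward during training. A NumPy sketch of the idea; the 1e-4 coefficient is the value reported for PaLM-style training and is an assumption here, not read from the t5x source:

```python
import numpy as np

def ce_with_z_loss(logits, target, z_coef=1e-4):
    """Cross-entropy plus an auxiliary z-loss term z_coef * log(Z)^2,
    where Z = sum(exp(logits)) is the softmax normalizer."""
    m = logits.max()
    log_Z = m + np.log(np.exp(logits - m).sum())  # stable log-sum-exp
    ce = log_Z - logits[target]                    # -log softmax(logits)[target]
    return ce + z_coef * log_Z ** 2

logits = np.array([2.0, -1.0, 0.5])
print(ce_with_z_loss(logits, 0))
```

Because the penalty only depends on log Z, it leaves the softmax probabilities themselves unchanged at the optimum; it just discourages large normalizer magnitudes.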
0 votes · 1 answer · 185 views

I'm trying to understand how transformer models, such as BERT or GPT, handle negation in sentiment analysis. Specifically, I'm curious about how these models manage to correctly interpret sentences ...
asked by John Smith
0 votes · 1 answer · 156 views

I have read the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Jacob Devlin et al. (2018) and "Improving Language Understanding by ...
asked by XYJ • 3
0 votes · 1 answer · 49 views

https://huggingface.co/spaces/optimum/llm-perf-leaderboard is great to compare inference times between LLMs but it misses closed-source LLMs such as GPT-3.5/4 and Claude.
asked by Franck Dernoncourt
3 votes · 2 answers · 2k views

I'm learning about GenAI, such as GPT (Generative Pretrained Transformer), and I'm particularly interested in understanding the training techniques used for these models. Deep learning, generally, can ...
asked by Exploring • 381
