
Questions tagged [gpt]

For questions related to GPT (which stands for Generative Pre-Training), a combination of the transformer (proposed in "Attention is All You Need") and unsupervised pre-training for solving language tasks, such as machine translation. GPT was proposed in "Improving Language Understanding by Generative Pre-Training" (2018) by OpenAI. There's also GPT-2, proposed in "Language Models are Unsupervised Multitask Learners" (2019) by OpenAI.

2 votes · 2 answers · 116 views

I’ve seen that GPT-5 has been released for public use recently. It seems that a key difference between the two models, GPT-4 and GPT-5, is that GPT-5 uses a real-time router, which I guess is ...
asked by Mr. AI Cool
1 vote · 1 answer · 379 views

OpenAI seems to be avoiding branding their reasoning models as "GPTs". See, for example, this page from their API docs, which has one column for "GPT models" and another for "...
asked by kuzzooroo • 121
0 votes · 1 answer · 97 views

I just discovered that LLMs themselves can be used to generate system prompts for LLMs. Does using them degrade the output quality compared to human-written system prompts?
0 votes · 1 answer · 84 views

Let's say I have a company X. I have a lot of documents in it: financial, HR, strategies, procedures, customer information and many more. These documents are in different formats: PDF (textual, ...
asked by bpiec • 109
2 votes · 1 answer · 155 views

What is the difference between an encoder-decoder transformer and a decoder-only transformer with regard to the loss calculation? Specifically, how does the loss signal differ? And how does this relate ...
asked by Green 绿色
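One way to make this question concrete: both model families are trained with the same shifted next-token cross-entropy, and what differs is which stack produces the logits. A minimal NumPy sketch of that loss, with toy shapes and no real model assumed:

```python
import numpy as np

def next_token_loss(logits, targets):
    """Mean cross-entropy of each position's logits against the *next* token.

    logits:  (T, V) unnormalized scores, one row per position
    targets: (T,)   token ids; targets[t] is the token at position t
    The loss at position t uses logits[t] to predict targets[t + 1],
    so the last row of logits is dropped.
    """
    # numerically stable log-softmax
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # predict token t+1 from position t
    return -log_probs[np.arange(len(targets) - 1), targets[1:]].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 50))    # toy sequence: 8 positions, vocab of 50
targets = rng.integers(0, 50, size=8)
print(next_token_loss(logits, targets))
```

With uniform (all-zero) logits the loss is exactly log(vocab_size), a handy sanity check.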
0 votes · 1 answer · 377 views

This seems illogical according to the interpretation we use for QKV (q - query (what we are looking for), k - key (what do I have), v - value (what are my values), softmax(k(X) @ q(x).T) - in what ...
asked by Тима
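For context, the conventional ordering in scaled dot-product attention is softmax(Q Kᵀ / √d) V, with the softmax taken row-wise, i.e. over all keys for each query. A toy NumPy sketch with random weights, illustrative only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Scaled dot-product attention in the usual Q-first convention:
    scores[i, j] = how well query i (what position i is looking for)
    matches key j (what position j has to offer)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)   # each query's weights over keys sum to 1
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))             # 5 tokens, model dim 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = attention(X, Wq, Wk, Wv)
print(out.shape)                          # (5, 16): one mixed value per token
```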
0 votes · 0 answers · 717 views

I wonder why GPTs use a decoder-only architecture instead of a full encoder-decoder architecture. In the full encoder-decoder transformer architecture, we convert the input sequence to a contextual ...
asked by Parag Londhe
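One detail often raised in answers to this kind of question: a decoder-only model still sees the entire prefix through its causal mask, which simply zeroes attention to future positions before the softmax. A small NumPy illustration with toy scores, no trained model assumed:

```python
import numpy as np

T = 5
# Causal mask: position i may attend to positions j <= i only.
mask = np.tril(np.ones((T, T), dtype=bool))
scores = np.random.default_rng(0).normal(size=(T, T))
masked = np.where(mask, scores, -np.inf)   # future positions get -inf
e = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights = e / e.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))   # rows are lower-triangular and sum to 1
```

Because every position's prediction depends only on earlier tokens, one forward pass yields a valid training signal at every position, which is part of why the decoder-only recipe scales so well.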
3 votes · 0 answers · 109 views

According to AI and Memory Wall, serving GPT models "involves repeated matrix-vector multiplications", but I don't understand why. Let's suppose I am the sole user of an LLM server, so we ...
asked by nalzok • 411
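A toy illustration of the claim: once the prompt has been processed and the KV cache filled, each decoding step pushes a single token vector through every weight matrix, so the per-step work is matrix-vector rather than matrix-matrix (shapes below are made up for the sketch):

```python
import numpy as np

d, T = 16, 6
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))          # stand-in for any weight matrix in the model

# Prompt processing ("prefill"): one matrix-matrix product over all T tokens.
X = rng.normal(size=(T, d))
prefill = X @ W.T                    # (T, d)

# Decoding: each new token is a single vector, so every layer's work
# collapses to a matrix-vector product -- memory-bandwidth bound,
# since W must be streamed from memory for each generated token.
x_new = rng.normal(size=(d,))
step = W @ x_new                     # (d,)
print(prefill.shape, step.shape)
```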
1 vote · 0 answers · 46 views

I'm currently deeply invested in the Transformer Circuits thread in parallel with 3blue1brown's videos (chapter 7 on the MLP layer was released a day or two ago) to gain a better theoretical ...
asked by Jonas Zaugg
0 votes · 0 answers · 118 views

I have an idea for a GPT. While ideally I can use something like ChatGPT or ClaudeGPT or something else, I want my GPT to have a specific tone when providing responses. This part is very ...
asked by confused • 143
0 votes · 0 answers · 177 views

I was looking into the loss function in t5x here and see there is a z-loss added to the typical log loss definition. The only paper I could surface on this was https://arxiv.org/abs/1604.08859, but I ...
asked by Jacob B • 279
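For context: the z-loss penalizes the squared log of the softmax normalizer Z, nudging log Z toward 0 so the logits don't drift upward during training. A NumPy sketch of the idea; the 1e-4 coefficient is the value reported for PaLM-style training and is an assumption here, not read from the t5x source:

```python
import numpy as np

def ce_with_z_loss(logits, target, z_coef=1e-4):
    """Cross-entropy plus an auxiliary z-loss term z_coef * log(Z)^2,
    where Z = sum(exp(logits)) is the softmax normalizer."""
    m = logits.max()
    log_Z = m + np.log(np.exp(logits - m).sum())  # stable log-sum-exp
    ce = log_Z - logits[target]                    # -log softmax(logits)[target]
    return ce + z_coef * log_Z ** 2

logits = np.array([2.0, -1.0, 0.5])
print(ce_with_z_loss(logits, 0))
```

Because the penalty only depends on log Z, it leaves the softmax probabilities themselves unchanged at the optimum; it just discourages large normalizer magnitudes.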
0 votes · 1 answer · 185 views

I'm trying to understand how transformer models, such as BERT or GPT, handle negation in sentiment analysis. Specifically, I'm curious about how these models manage to correctly interpret sentences ...
asked by John Smith
0 votes · 1 answer · 156 views

I have read the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Jacob Devlin et al. (2018) and "Improving Language Understanding by ...
asked by XYJ • 3
0 votes · 1 answer · 49 views

https://huggingface.co/spaces/optimum/llm-perf-leaderboard is great to compare inference times between LLMs but it misses closed-source LLMs such as GPT-3.5/4 and Claude.
asked by Franck Dernoncourt
3 votes · 2 answers · 2k views

I'm learning about GenAI, such as GPT (Generative Pretrained Transformer), and I'm particularly interested in understanding the training techniques used for these models. Deep learning, generally, can ...
asked by Exploring • 381
