14 events
Jun 9, 2023 at 16:22 comment added Joel Aelwyn @NotTheDr01ds Fair enough, just wanted to make sure it wasn't being overlooked.
Jun 9, 2023 at 10:43 comment added NotTheDr01ds @JoelAelwyn Absolutely, the chances of us finding GPT usage are even higher when we have more data to go on. The point here is simply that some of us do feel that we can identify at a very high rate just based on the unedited GPT output.
Jun 9, 2023 at 9:45 comment added Passer By ^That. I for one rely heavily on user history to find GPT plagiarists. It starts with the post smelling fishy, and ends with confirmation from the user history.
Jun 9, 2023 at 1:22 comment added Joel Aelwyn Unfortunately, most of these scenarios appear to overlook several of the input metrics that various mods have said they take into account (at least in anything but the most glaringly obvious cases) — things like prior posting history and style, just to pick one example (but by no means the only one).
Jun 8, 2023 at 18:20 comment added NotTheDr01ds All good scenarios as well. Just wondering why nothing like this seems to have been done, given that SE claims this is an "evidence-backed" assumption.
Jun 8, 2023 at 16:16 comment added Wrzlprmft I proposed a similar test (in moderator-only space), though using pre-GPT answers that score highly on Huggingface (or whatever) vs. GPT-generated answers. This makes the test considerably more difficult, of course, but otherwise you have the problem that the vast majority of human answers do not look GPT-generated at all, and thus the GPT answers stand out. That wouldn't reflect a collection of answers flagged as machine-generated in practical usage.
Jun 8, 2023 at 15:19 comment added Cris Luengo A better test would have a small fraction of the 100 posts be ChatGPT-generated, and the mod should not know how many there actually are.
Jun 8, 2023 at 4:27 comment added Ryan M A more realistic scenario would be to have groups of three-ish answers from the same user, some groups from real users and some from ChatGPT, then ask the tester to determine which are which.
Jun 8, 2023 at 0:14 comment added Proud anti-zionist Even if we may misidentify some posts as AI-generated, the damage of allowing them is much larger than the damage of removing them. It's better to lower the punishment: start with just a warning and put the user on a watchlist for future offenses that expires after a while.
Jun 7, 2023 at 22:26 comment added NotTheDr01ds @ErikA To be honest, I'm not an SME in most of the answers I'm identifying as coming from ChatGPT. I've just seen thousands of them personally, and I feel I'm fairly good at pattern recognition (obviously a biased statement, but I'm willing to put it to the test). And when I say thousands: today I just added my 2,000th answer to my "ChatGPT Output" "Saves" list on SO. I have another 144 on a similar list on Ask Ubuntu. But I absolutely agree that for code answers, you must be an SME to identify the issues.
Jun 7, 2023 at 22:12 comment added Esther I second the non-moderator test. I'm pretty sure there are many users who have seen enough GPT answers to have a fairly low error rate in detecting them. I would be quite curious to see how well I do.
Jun 7, 2023 at 22:02 comment added Erik A If you want to add an additional level of realism: first, lay them in front of a subject-matter expert (to substitute for the flagger) to identify possible use of ChatGPT. Then, give those results to a mod for a second check. I'd imagine your approach would still lead to vastly more incorrect suspensions than what's actually happening; when I flag for the tags I follow, I'm 100% confident it's not a human.
Jun 7, 2023 at 21:45 history edited NotTheDr01ds CC BY-SA 4.0
added 609 characters in body
Jun 7, 2023 at 21:33 history answered NotTheDr01ds CC BY-SA 4.0