30
$\begingroup$

A decent fraction of our newest power users are partly or entirely powered by copy-pasting LLM output. If you've seen a bit of this stuff, it's pretty easy to tell.

This is an interesting situation. For highly technical topics, these users can provide decent answers very quickly, which is perhaps an improvement over the status quo, where often there wouldn't be an answer provided at all. On the other hand, in cases where users are asking about something conceptually tricky that's not directly laid out in standard references, the LLM answer often dodges the question, saying a bunch of true, jargon-loaded things but not directly addressing the issue.

This whole system also seems a bit pointless, both for the answerers (isn't it boring to paste a question, wait, and paste an answer you don't understand?) and for the askers (why not ask the LLM directly, instead of waiting hours for somebody else to do it for you?).

Again, I'm not sure what to make of this, but it's another sign that times are changing.

$\endgroup$
7
  • 24
    $\begingroup$ The first thing to do is to flag undisclosed AI usage. $\endgroup$ Commented Oct 15 at 10:17
  • 2
    $\begingroup$ So what's your question? Or is this just a complaint? $\endgroup$ Commented Oct 15 at 14:34
  • 1
    $\begingroup$ Also: we have multiple pages available on the matter of GenAI/LLM: physics.meta.stackexchange.com/q/14281/25301, physics.meta.stackexchange.com/q/14438/25301, physics.meta.stackexchange.com/q/14801/25301 and even a Help Topic page $\endgroup$ Commented Oct 15 at 14:38
  • 8
    $\begingroup$ @KyleKanos My question is just to ask what other people think about it. I personally don't have a strong opinion against this. Current LLMs can already do a decent job answering many (though certainly not all) questions on this site. It's not clear to me what this site will look like 5 years from now. $\endgroup$ Commented Oct 15 at 17:53
  • 1
    $\begingroup$ Whether you have a strong opinion for or against it is wholly irrelevant as we have an existing site policy opposing its use. Had you looked at any of the prior discussions over the last 18-ish months (the most relevant linked previously), you'd probably have been aware of any of this. $\endgroup$ Commented Oct 15 at 18:04
  • 2
    $\begingroup$ @TobiasFünke The policy that rob’s answer links to doesn’t distinguish between disclosed and undisclosed usage as far as I can tell. $\endgroup$ Commented Oct 15 at 23:09
  • 1
    $\begingroup$ @Ghoster: Undisclosed AI usage is considered plagiarism and against network-wide policy. Beyond that, Physics SE has a site-specific policy (as agreed-upon by the community) against AI-generated content, even with disclosure. $\endgroup$ Commented Oct 27 at 15:56

6 Answers

41
$\begingroup$

rob's answer has the policy viewpoint.

But I think sharing opinions or rants can be productive in this case, since it strikes at the very heart of why LLM content is so subtly harmful. Words matter, not for their objective content, but because they're the way one human mind connects to another.

To me, the web was always meant to be a web of humans, a tapestry of minds all connecting to each other, free of physical boundaries.

And now, this open web as a place where humans interact as humans is dying. Traffic to sites like ours is falling, and the gamification aspects that were meant to incentivize people to participate now incentivize them to post generated content and watch a number go up with no effort of their own.

Actually defending against generated content is difficult, time-consuming, and frankly depressing: where once posts with well-constructed sentences meant that at least someone had cared enough about the question they were responding to to write them, now I find myself constantly wondering whether I'm wasting my time reading something no one could be bothered to write.

As much as SE has always intentionally de-emphasized the social aspects of the sites, the part where it's people with genuine questions asking and people with genuine knowledge answering always mattered to me. So much individual effort invested in all the wonderful answers on this site, each with the particular, peculiar voice of the person writing it, each offered without tangible reward.

And now we get answers that have no voice, that play at being knowledge shared but there is no mind behind them sharing it, no one caring whether they are right or wrong. Just pure bullshit in Frankfurt's sense, words whose only purpose is to exist as words, to provoke someone into clicking the upvote button, words that play at being communication but that communicate nothing.

In a comment, knzhou says:

My question is just to ask what other people think about it. I personally don't have a strong opinion against this.

I have a strong opinion. I'd rather read a dozen answers that are wrong in the interesting ways in which humans can be wrong than this slop where no one cared whether it was right. I want to talk to people, not algorithms.

I don't really know what to do about this.

There cannot be automated defenses against the output of LLMs (or at least I have not seen any plausible strategy for this), and every time I suspend someone for using generated text there's this little voice in the back of my head that goes "Maybe this person just talks like that?" or "Maybe I just don't know this topic well enough to judge". I really don't want to turn away genuine people whose only sin was writing a little weirdly. (Of course there are clear-cut cases, but they're effectively no more troublesome than traditional spam)

But I also don't want to stop trying. If the web dies, if it all turns into machines talking to machines, I want to be able to say I did my best to at least preserve this little part of it as long as it was possible.

$\endgroup$
2
  • 6
    $\begingroup$ This is beautifully written. Thank you for sharing it. $\endgroup$ Commented Oct 16 at 16:31
  • 4
    $\begingroup$ Nice words especially the last para. +1 from a user of another community advocating the same spirit articulated here. $\endgroup$ Commented Oct 18 at 11:02
20
$\begingroup$

The policy is

Generative artificial intelligence (a.k.a. GPT, LLM, generative AI, genAI) tools may not be used to generate content for Physics Stack Exchange.

If you see it, flag it.

I would like to strongly encourage flaggers to use the "standard flags," which put flagged posts into the review queues, before resorting to custom flags which are seen only by diamond moderators. The appropriate flag is probably "very low quality" (for "severe content problems"), which puts the post into a review queue which is accessible to users with more than 2000 reputation points. A comment which (politely) reminds the user that chatbot output is not allowed here might help to guide other users working through the VLQ queue.

The flag "rude or abusive" (for wasting everyone's time and effort), may also be appropriate. Posts which have only a "rude" flag don't seem to reach the non-moderator review queues, so this route isn't effective in crowdsourcing our chatbot detection. However, "rude" flags have other consequences. More about flags.

I really don't want for the diamond moderators to find ourselves in the position of suspending human users for using em-dashes — such superficial "AI giveaways" have particularly low reliability. The more we can crowdsource the detection, the better our outcome will be.

I had a long paragraph here about how much I dislike failing the Turing Test, but I can't quite make it a productive thing to say in public rather than just a rant.

$\endgroup$
17
  • 3
    $\begingroup$ If crowdsourcing is the way to go, we should be able to collaborate on who we think is violating this policy. (To me, it isn’t always obvious.) I’m going to go out on a limb and propose that Physics Meta posts be able to accuse a user of LLM usage; up and down votes can then confirm or refute the suspicion and be the community’s judgement. But I expect that you’re going to say this would be completely inappropriate. $\endgroup$ Commented Oct 15 at 23:18
  • 6
    $\begingroup$ @Ghoster I have participated in online communities where people engage in public callouts of other users, and I have participated in communities where public callouts are not tolerated. Since I joined, Physics has always been a non-callout community. I like it a LOT better that way; it's one of the reasons I'm still around. $\endgroup$ Commented Oct 15 at 23:46
  • 8
    $\begingroup$ Is it possible to add custom text to the "rude or abusive" flag option? Currently only frequent meta visitors would know it's an appropriate flag for AI. $\endgroup$ Commented Oct 16 at 3:50
  • 1
    $\begingroup$ Oh I am laughing at the em-dashes because I use a lot of en-dashes $\endgroup$ Commented Oct 16 at 15:43
  • 6
    $\begingroup$ @naturallyInconsistent The em-dash thing is especially infuriating. Yes, it's true that they're "more" prevalent in AI output than in the average human-created online content. But that's mostly a reflection of the fact that AI models were trained on plenty of text sourced from published, printed works, and em dashes ARE common in human-written books, magazines, etc. (You know — the things nobody actually reads anymore. So they assume humans have somehow lost the ability to use em dashes, when really it's just the 140-character culture that's em-dash impaired.) $\endgroup$ Commented Oct 17 at 20:28
  • 2
    $\begingroup$ @Ferd thanks for an insightful look into something that I had simply not paid attention to. I've just been using alt-0150 and alt-0151 to produce the en-dashes and em-dashes on the computer, and my smartphone keyboard can do both kinds of dashes too. But of course most hoomans are lazy and would not go out of their way to type those if they don't actually have to. $\endgroup$ Commented Oct 17 at 20:49
  • 1
    $\begingroup$ Just for the record, I don't personally use em-dashes at all as an indicator for AI text. That was only relevant 1-2 years ago. AI generated physics now still has a very distinctive flavor, but it's different. $\endgroup$ Commented Oct 18 at 19:03
  • 1
    $\begingroup$ @naturallyInconsistent *nod* I use Linux with a Compose key (Menu), so for me it's just Menu-hyphen-hyphen-hyphen (or Menu-hyphen-hyphen-period for an en dash), and as a result I've been using them regularly, even online, for over 2 decades. But a twitter (I KNOW!!) search for "em dashes" will turn up almost nothing but people discussing them in the context of AI, now. (Right now it's a mix of "I, a real human writer, use em dashes!" and "Nobody uses em dashes except ChatGPT".) $\endgroup$ Commented Oct 20 at 23:55
  • 1
    $\begingroup$ This one makes me the saddest: "I was just informed by a client that I'm no longer allowed to use em dashes in the work I produce for them because of the ridiculous AI perception. 🫠 I'd like to vote myself off this island now, thank you." $\endgroup$ Commented Oct 20 at 23:58
  • $\begingroup$ @FeRD don't stick too much to that. I mean, it is clear that AI currently only uses em dashes, and so if we use em dashes along with en dashes, then we can assert our hoomanity. The whole point of typographical conventions is that they are conventions; we get to choose to swap towards en dashes instead of em dashes if we so wish. $\endgroup$ Commented Oct 21 at 0:31
  • $\begingroup$ If you see it, flag it. I flagged two dozen answers by one AI-using user. Fifteen of these flags were marked as helpful but nine were declined. I’m now banned from flagging for a while. With a different user, I flagged a post where his linked “paper” said “Developed via GPT-5 reasoning”. Even that flag was declined and I got scolded for raising it. I’m done with flagging. $\endgroup$ Commented Oct 25 at 17:21
  • 1
    $\begingroup$ @Ghoster When you raised the first dozen flags targeting a specific user, I went through that user's fifty-plus answers and deleted slightly more than half of them. There was a clear evolution in that user's contributions from acceptable contributions to obvious chatbot copypasta. As soon as your first dozen flags were processed, you flagged a dozen of the answers that I had just decided were okay. I left those to be re-evaluated by other moderators. $\endgroup$ Commented Oct 25 at 18:55
  • 2
    $\begingroup$ Your recollection that no suspension was issued is incorrect. We have a policy of not publicly discussing suspensions and the like in public, so I won't elaborate further — I've already said more than I would in other cases. For "please evaluate every post by this user," custom flags probably are better than standard flags. The flag system, like the voting system, works better for material that you come across organically than it does if you go to someone's profile and flag everything. $\endgroup$ Commented Oct 25 at 20:38
  • 1
    $\begingroup$ @Ghoster Please see the current version of the answer (v4) for an argument in favor of "very low quality" rather than "rude/abusive" flags for crowdsourcing detection. $\endgroup$ Commented Oct 25 at 20:41
  • 6
    $\begingroup$ @rob I really think that we need another flag for this. Sometimes the answers are neither very low quality nor rude/abusive. They are just suspiciously AI-like. Such posts could easily be dismissed from the queue as not being low quality etc., since they aren't $\endgroup$ Commented Oct 28 at 0:56
7
$\begingroup$

Here are my thoughts (not quite a rant).

It sucks when a robot comes for your job (or online hobby in this case). It sucks even more when you realize that the robot is probably better at some aspect of your job than you are.

I've noticed in the past few months that the robots (AI/LLM chatbots) have gotten a lot better at answering well-posed classical mechanics problems. I'm not entirely surprised by this since (in some sense) the theoretical analysis "just" involves finite manipulations of discrete symbols, all of which can be represented intangibly. The robot has been programmed not only to predict words in a text sequence, but also to organize lots and lots of prior written knowledge in a useful way. I assume that many of the classical mechanics textbooks are part of its training data.

Many of the old tricks that the robot used to fall for seem to be getting fixed. However, the robot is still pretty bad at reasoning about real events involving tangible objects in physical space, presumably because this knowledge in humans/animals was developed over millions of years of real-world experience, and it seems so obvious that no one has bothered to try to write it down (and therefore such knowledge has not been digested and organized by the robot), or maybe it is simply not possible or easy to write down.

Returning to classical mechanics problems, I've been challenging the robot with hard problems from old textbooks. And lately I've been surprised by how well it does. I was really surprised that this robot helped me find typos in the "official" solutions to problems in ancient textbooks (e.g., Whittaker).

I think the robot is still not great at answering bad questions. (You know those bad questions we see all the time with false premises, faulty logic, typos, and no punctuation.) But, I'm also bad at answering those questions, probably for different reasons...

So, is the robot coming for the job of a theoretical physicist? Probably not, but it seems to be quite good at the formal symbol manipulation part of that job.

The experimental physicist's job seems much more secure against the robot threat, since it necessarily deals in the tangible (all those wires and pipes wrapped in aluminum foil down in the basement of the physics building).


None of the above thoughts (not a rant!) really addresses OP's question/comment. Just some thoughts from one human to another, transmitted across the void.

$\endgroup$
1
  • 5
    $\begingroup$ To supplement this: try cutting-and-pasting most numerically driven questions from Halliday-Resnick into ChatGPT and it will produce a correct step-by-step solution. $\endgroup$ Commented Oct 16 at 22:33
6
$\begingroup$

One way to inject just a little bit of friction into the system is to have the poster of a question or answer click a button answering a question like

Was AI used in formulating this question or this answer?

Journals have now started asking versions of this question before you submit either a paper or a referee report. Of course this would never prevent determined and malicious users from answering “no” even if they actually did use AI or an LLM, but it would be a constant reminder of the site policy (to which one could link).

As to your last part: some people get quite annoyed because they lose 2 reputation points from a random downvote, so the flip side is that they presumably get a high from an upvote, even if the answer or the question was AI-generated. In a world where some obsess over how many “Likes” they have, it’s not hard to conceive that climbing the reputation ladder by any means (especially as you can boast how smart you are to your fiends and friends) is something some users aspire to.

$\endgroup$
4
  • 2
    $\begingroup$ "you can boast how smart you are to your fiends" - I thinking maybe that last word isn't a typo :-) $\endgroup$ Commented Oct 16 at 12:56
  • 1
    $\begingroup$ @JohnRennie unintended pun which I partially corrected… :D $\endgroup$ Commented Oct 16 at 13:15
  • $\begingroup$ Note that the answer box now includes such a reminder. $\endgroup$ Commented Oct 16 at 20:03
    $\begingroup$ This is how the site works: create a lot of upvoted content and you become a moderator; it gives power in exchange for contributed effort. I think we can be happy while they only post AI content. The time will come when they will also vote with AI assistants. $\endgroup$ Commented Oct 21 at 11:50
5
$\begingroup$

This whole system also seems a bit pointless, both for the answerers (isn't it boring to paste a question, wait, and paste an answer you don't understand?) and for the askers (why not ask the LLM directly, instead of waiting hours for somebody else to do it for you?).

Cynically, this system might work for spammers, who get to build a reputation (either as asker or answerer) as a seemingly "normal" user with insightful contributions, and for StackExchange Inc, which gets to prop up activity on an increasingly abandoned property.

Naively, it could be because the askers don't know if their question is something an LLM can answer correctly, or they don't trust LLMs at all (and of course, it's ironic that they then get LLM answers).

A potential redeeming value of LLM answers is that they can be fixed/improved by others. Personally, I don't buy this, but the site owner or LLM companies might use that argument to push further LLM integration.

$\endgroup$
5
$\begingroup$

I imagine the draw to answering posts by copy/pasting AI generated responses is similar to that of video games and pinball machines. Push a few buttons and your score goes up. It is easy. I imagine it makes some posters feel like they are smart.


Ironically, AI might be a solution to @AcuriousMind's problem of detecting AI generated responses. One way to make an LLM's responses more human-like is a Generative Adversarial Network. This has two parts. A generator is trained to produce the most human-like responses possible. A discriminator is trained to detect which responses are genuinely human and which are AI generated. The output of the discriminator shows shortcomings of the generator and is used to train it better. And repeat. As the generator gets better, so does the discriminator.
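
For concreteness, here is a minimal toy sketch of that adversarial loop (purely illustrative, in PyTorch): it trains on made-up numerical "style" features rather than real text, and all the names, sizes, and data in it are placeholders, not anything from an actual detection tool.

```python
import torch
import torch.nn as nn

DIM = 16  # dimensionality of a toy "writing style" feature vector (purely illustrative)

# Generator: maps random noise to fake "human-like" feature vectors.
generator = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, DIM))
# Discriminator: outputs a logit scoring how "human" a feature vector looks.
discriminator = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(1000):
    # Stand-in for features extracted from genuinely human-written posts.
    human = torch.randn(32, DIM) + 1.0
    noise = torch.randn(32, DIM)

    # Train the discriminator: score human features as 1, generated ones as 0.
    fake = generator(noise).detach()
    d_loss = (loss_fn(discriminator(human), torch.ones(32, 1))
              + loss_fn(discriminator(fake), torch.zeros(32, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the generator: try to fool the discriminator into scoring its output as human.
    g_loss = loss_fn(discriminator(generator(noise)), torch.ones(32, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

The discriminator that falls out of such a loop is, in effect, the detector; the catch is that it is only as good as the generator it was trained against.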


Personally, I don't see AI as replacing physicists any time soon. When I started my career, calculations were done with slide rules. Tools are much better now. They have helped physicists make tremendous advances in physics. Even though one physicist can now do the work of many, physicists are not scarce. I expect AI will be another such tool.

That said, I expect it will have an impact on this site. In 5 years, AI will be answering many questions well enough that they won't be asked here. If they are, we will treat it as if the poster had not checked Google before posting.


As an aside, the StackExchange sites are a truly great resource for training AI models. Millions of questions grouped by site. Tags that index topics. Answers labelled with scores that reflect their quality. Each answer attributed to a user who can also be used as a quality measure. Images tagged with descriptions. You can be very sure that companies have paid to use this data for training.

The plus for us is that when LLMs get good at answering physics questions, it will be because of our answers. They will benefit humanity. They will also benefit the owners of the LLMs. This is little different from now. As it is, we are only paid for our answers with scores and recognition. The minus side is that we will lose that recognition.

$\endgroup$
2
  • $\begingroup$ Last paragraph - just like all the other artists who are having their work stolen. I guess. Note that the rules about attribution etc. in the license are absolutely clear, so if AI is using our material (and I'm sure it is) they are breaking the license. "You should be aware that all Public Content you contribute is available for public copy and redistribution, and all such Public Content must have appropriate attribution.." $\endgroup$ Commented Nov 4 at 8:01
  • $\begingroup$ It turns out we won't lose all recognition after all. I Googled a programming question yesterday. The top response is always an AI overview. At the bottom was a link to the stackoverflow post from which the overview was drawn. $\endgroup$ Commented Nov 6 at 15:12
