47

I completed my assessment of the AI “Answer Bot” experiment. I don't see it as a good fit for Web Apps SE, and I don't support moving this experiment forward.

  • I found the answers to be overly long, repetitive, low-quality and poorly aligned with site standards.
  • I found it took substantial effort to review, verify and edit AI answers for subjects I was competent in and even more for those I couldn't readily verify.
  • I think that the Answer Bot would reduce experts' interest in contributing quality answers to current questions, and over time this would also erode the value of historical content.
  • I don't think too many users are capable and/or willing to sign on for "Extreme Makeover: WebApps Edition" reviewing and properly editing these answers.

Numerous sites and in-browser tools already provide similar blathering AI responses, while a question on our site is an opportunity to get something different.

Our community does a great job providing answers and while AI’s performance will continue to improve, it isn't currently competing with our human-generated solutions.

I lacked in-depth exposure to AI before my most recent employment. That, combined with an inherent comfort with new approaches and ideas, was reflected in my responses to the AI survey and my optimism for this initiative.

I now have untold hours of experience working with AI, and these Answer Bot tools continue to underwhelm and disappoint me. Fundamentally they are not optimized for competency but to provide immediate answers based on limited details: minimal meat buried under a mountain of words ("Steak 'n Word Salad").

Answer Bot is the antithesis of what this site is about. It provides long-winded answers that are almost never good, sometimes really bad, but mostly just really really (really) long, repetitive, and mediocre. These answers will easily flood the site and make it unlikely that "good content rises" and "incorrect content falls."

We have a "question" problem. AI has an "answer" problem. IMO not a good match.
 


Thanks Rubén & Berthold

In the spirit of "it's not you, it's me it's your AI friend" I really want to give a big shout out to fellow Web Apps mod Rubén and CM Berthold. A lot of human effort went into this experiment and continues to. I was treated extremely well by staff, never felt pressured, and when I provided SE similar, albeit less saucy, feedback they thanked me for my involvement, and we were not asked to volunteer the site for the next round of the experiment. I had committed to providing feedback to the community following SE's exposure of the experiment, and in spite of this no effort was made to coach me or get insight into what I would say.

5
  • "I found it took substantial effort to review, verify and edit AI answers for subjects I was competent in and even more for those I couldn't readily verify." To be fair, one would probably compare this effort to the substantial effort it would take to write equally good answers from scratch. Do you think that would save time? Or rather not? Commented Feb 7 at 15:38
  • "minimal meat buried under a mountain of words" One way to improve them, if one wanted that, would be to first reduce the mountain of words by cutting away everything unrelated, then examine what is left and see if it holds up to anything significant. Commented Feb 7 at 15:39
  • 2
    @NoDataDumpNoContribution What is left is minimal meat. I've danced this dance numerous times with ChatBots, the only thing worse than the Steak 'n Word Salad is actually seeing the protein all by itself (in those cases where the answer contained some). Commented Feb 8 at 1:09
  • 1
    I understand why they went for unanswered questions in this experiment, so as not to compete with humans and to generate additional value, but maybe there is a reason they are not answered, and maybe that makes AI output on them even less valuable because the AI also fails. Apart from other problems like missing attention, maybe we should simply wait and repeat the experiment in a couple of years to see if there is more meat then. The output seems to be substantially below human standards. Commented Feb 8 at 8:22
  • 1
    @NoDataDumpNoContribution These tools are not good problem solvers. They will happily admit to their shortcomings, and they are serious flaws. In their defense, the output is not below "all" humans' standards, just the ones I'm interested in getting help from and collaborating with at work. Quote from ChatGPT: "deficiencies in my ability to deliver meaningful insight and solutions are evident. These failings are not incidental but fundamental, highlighting significant shortcomings that undermine my utility as a reliable tool for the complex or nuanced tasks" Commented Feb 8 at 21:19

2 Answers

24

Blindspots, I'd like to thank you and Rubén for the time you dedicated to this, and for the thoughtful and candid assessment here. This validates many of the assumptions that the company and the community had, and your open-minded and objective approach is incredibly valuable.

I'm going to go out on a limb and say that I think it's very likely that this experiment will conclude and not make it into the product long term (at least without a massive overhaul), so your findings and learnings help affirm where value can and cannot be found with GenAI integrations.

6
  • 5
    I appreciate that especially in light of the feedback. I must say that your team is high functioning and it is unfortunately merely AI that is letting them down. Commented Feb 6 at 22:25
  • 3
    That's very kind of you to say. I'll pass it along to the folks you worked with. :) Commented Feb 6 at 22:40
  • @Philippe Is this talking about all sites or just Web Apps? Commented Feb 9 at 15:00
  • 5
    @Starship As I said, I'm going out on a limb a bit (so don't hold me to this, because it's not an answer signed in blood, but represents my - informed - best guess), but I think it's fairly safe to say that this experiment will not deploy to any further sites (at least without a massive overhaul). Commented Feb 10 at 19:17
  • @Philippe Thank you. What would such a massive overhaul consist of? Response to the community feedback to the announcement? A better AI model? Something else? Also, do you believe this experiment will eventually be removed from the sites it is currently on? Commented Feb 10 at 19:30
  • 5
    If it isn't an answer signed in blood, then what good is it to us? Seriously, though 👏 Commented Feb 12 at 4:21
4

Thanks for sharing your experience! Would it be possible to share Answer Bot's answers? I'd be curious to see how they looked. Note that verbosity is often an issue with AI answers, but can be mitigated to some extent by amending the prompt.

3
  • 4
    "Would it be possible to share Answer Bot's answers?" You should direct that to the project team; I'm not involved at this point. Commented Feb 6 at 22:23
  • 6
    The feedback from the mods here, and on the other test sites, helped our team refine the prompt before the current expansion of visibility. Verbosity is indeed an issue. Right now we're being very careful about the private answer content being exposed to any kind of public crawling or consumption, so we'd prefer that nothing be circulated. Thank you for understanding. Commented Feb 7 at 0:35
  • 1
    @Berthold would you mind sharing at least a few Q&A pairs in the form of a screenshot? Because this way I could produce insanely good Claude 3.7 answers that would basically address 95% of factual concerns raised in this post. Commented Mar 21 at 6:16
