Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn’t ready to take on the role of the physician.”

“In an extreme case, two users sent very similar messages describing symptoms of a subarachnoid hemorrhage but were given opposite advice,” the study’s authors wrote. “One user was told to lie down in a dark room, and the other user was given the correct recommendation to seek emergency care.”

    • rudyharrelson@lemmy.radio
      3 months ago

      People always say this on stories about “obvious” findings, but it’s important to have verifiable studies to cite in arguments for policy, law, etc. It’s kinda sad that it’s needed, but formal investigations are a big step up from just saying, “I’m pretty sure this technology is bullshit.”

      I don’t need a formal study to tell me that drinking 12 cans of soda a day is bad for my health. But a study that’s been replicated by multiple independent groups makes it way easier to argue to a committee.

      • irate944@piefed.social
        3 months ago

        Yeah, you’re right, I was just making a joke.

        But it does create some silly situations, like you said.

      • BillyClark@piefed.social
        3 months ago

        it’s important to have verifiable studies to cite in arguments for policy, law, etc.

        It’s also important to have for its own merit. Sometimes, people have strong intuitions about “obvious” things, and they’re completely wrong. Without science studying things, it’s “obvious” that the sun goes around the Earth, for example.

        I don’t need a formal study to tell me that drinking 12 cans of soda a day is bad for my health.

        Without those studies, you cannot know whether it’s bad for your health. You can assume it’s bad for your health. You can believe it’s bad for your health. But you cannot know. These aren’t bad assumptions or harmful beliefs, by the way. But the thing is, you simply cannot know without testing.

        • Slashme@lemmy.world
          3 months ago

          Or how bad something is. “I don’t need a scientific study to tell me that looking at my phone before bed will make me sleep badly”, but the studies actually show that the effect is statistically robust but small.

          In the same way, studies like this can make the distinction between different levels of advice and warning.

    • scarabic@lemmy.world
      3 months ago

      It’s actually interesting. They found the LLMs gave the correct diagnosis in the high 90s percent of the time if they had access to the notes doctors wrote about the patients’ symptoms. But when thrust into the room, cold, with patients, the LLMs couldn’t gather that symptom info themselves.

      • Hacksaw@lemmy.ca
        3 months ago

        LLM gives correct answer when doctor writes it down first… Wowoweewow very nice!

          • Hacksaw@lemmy.ca
            3 months ago

            If you seriously think the doctor’s notes about the patient’s symptoms don’t include the doctor’s diagnostic instincts then I can’t help you.

            The symptom questions ARE the diagnostic work. Your doctor doesn’t ask you every possible question. You show up and you say “my stomach hurts”. The doctor asks questions to rule things out until there is only one likely diagnosis, then they stop and prescribe you a solution if available. They don’t just ask a random set of questions. If you give the AI the notes JUST BEFORE the diagnosis and treatment, it’s completely trivial to diagnose because the diagnostic work is already complete.

            God you AI people literally don’t even understand what skill, craft, trade, and art are and you think you can emulate them with a text predictor.

  • softwarist@programming.dev
    3 months ago

    As neither a chatbot nor a doctor, I have to assume that subarachnoid hemorrhage has something to do with bleeding a lot of spiders.

  • BeigeAgenda@lemmy.ca
    3 months ago

    Anyone who has knowledge about a specific subject says the same: LLMs are constantly incorrect and hallucinate.

    Everyone else thinks it looks right.

    • tyler@programming.dev
      3 months ago

      That’s not what the study showed, though. The LLMs were right over 98% of the time… when given the full situation by a “doctor”. The problem was ordinary people trying to self-diagnose without knowing which details were important.

      Hence why studies are incredibly important. Even with the text of the study right in front of you, you assumed a conclusion that the study did not reach.

    • zewm@lemmy.world
      3 months ago

      It is insane to me how anyone can trust LLMs when their information is incorrect 90% of the time.

  • rumba@lemmy.zip
    3 months ago

    Chatbots make terrible everything.

    But an LLM properly trained on sufficient patient data, metrics, and outcomes, in the hands of a decent doctor, can cut through bias, catch things that might fall through the cracks, and pack thousands of doctors’ worth of updated CME into a thing that can look at a case and go, “You know, you might want to check for X.” The right model can be fucking clutch at pointing out nearly invisible abnormalities on an X-ray.

    You can’t ask an LLM trained on general bullshit to help you diagnose anything. You’ll end up with 32,000 Reddit posts’ worth of incompetence.

  • cub Gucci@lemmy.today
    3 months ago

    “but have they tried Opus 4.6/ChatGPT 5.3? No? Then disregard the research, we’re on the exponential curve, nothing is relevant”

    Sorry, I’ve opened Reddit this week.