“Your vision is good?” asked the doctor. “Mm-hm,” replied the patient. “And your dentures fit fine?” “Yep,” the patient said. “No problems with them?” the doctor followed up. “Mm,” the patient said, indicating everything was OK.
The back-and-forth would have made perfect sense to the two people talking in the clinic. But to the automatic speech recognition tool tasked with transcribing it and turning it into visit notes, the “mm-hms” and mumbles became a garbled mess. “Your vision is good?” was caught clearly, but the patient’s reply was documented, nonsensically, as “is it,” making the machine’s version of the encounter all but unintelligible.
Medical providers, hoping to reduce physician burnout, are turning to tools sold by Microsoft and others that transcribe patient-provider conversations and draft visit notes. But a recent study found that the speech-to-text engines behind these tools do not accurately record clinically relevant “non-lexical conversational sounds,” or NLCS. The difference between “uh-huh” and “uh-uh” is subtle but very important in a clinical context, especially when taking a medical history, and artificial intelligence tools are still not very good at telling them apart.