Clinical AI Faceoff: OpenAI's ChatGPT for Clinicians vs OpenEvidence vs DoxGPT
This is a free preview of a paid episode. To hear more, visit techysurgeon.substack.com [https://techysurgeon.substack.com?utm_medium=podcast&utm_campaign=CTA_7]
I went live at 6:45 the other morning to open three tabs, ChatGPT for Clinicians, Doximity GPT, and OpenEvidence, and ask them the same questions. A few dozen clinicians and subscribers joined at that hour on a Sunday, which I did not expect and am grateful for.
The headline finding isn’t who won. It’s that soon you won’t be able to tell the three tools apart from the navigation bar. Each one now has an ambient scribe (or some form of one). Each one tracks CME. Each one has a “skills” or “dot flows” tab that, today, mostly amounts to baked prompts dressed up as workflows. OpenEvidence has a feature literally called the dialer; Doximity has had a dialer for a decade. The product surface is converging fast.
A quick disclaimer before we go further: opinions here are mine alone. I have no financial relationship with any of these companies. I selected these three because they appear to be getting the most traction in the marketplace, not because they’re the only ones worth your time. UpToDate Expert AI, Glass Health, Abridge’s embedded answering, and others all deserve their own look.
What each tool is best at right now
ChatGPT for Clinicians is the new entrant. Verification is rigorous — NPI, photo of a driver’s license, a ClearID face match — which I read as a deliberate credibility signal. Underneath, the experience is polished but the clinical answers were the weakest of the three on the queries I ran. There is a skills surface that hints at where this is going, but most of the entries today function as prompts rather than true agentic workflows. I did not see a Business Associate Agreement presented during signup, and I have not yet found a satisfying answer on PHI handling.
Doximity GPT quietly has the best one-off clinical answers right now. The margin is not wide, and the others are good, but on a hip arthroplasty question and a DVT prophylaxis report, Doximity surfaced the PREVENT CLOT trial [https://www.nejm.org/doi/full/10.1056/NEJMoa2205395] and the CRISTAL trial [https://jamanetwork.com/journals/jama/fullarticle/2802988] at the top of the response, where a domain expert would put them. For a clinician, citation prioritization is trust. Doximity also brings a distribution moat the others can’t replicate quickly (the dialer, fax, telehealth, the news network, and Peer Check, where physician experts grade the answers) and a redesigned interface that’s the cleanest of the three.
OpenEvidence has the lowest friction and the fastest latency. They are clearly throwing serious compute at the answer surface. The differentiator most clinicians never find is Deep Consult. Turn it on, answer two or three follow-up questions, and you get a research-grade brief with embedded figures from JAMA and NEJM, made possible by the licensing partnerships OpenEvidence has signed with NEJM Group [https://www.nejm.org/] and other major publishers. When I asked Deep Consult to brief me on secondary fracture prevention for a quality improvement committee, the output was something I could have walked into a department meeting with that morning.
Distribution beats product when the products converge
All three are free. All three answer questions credibly (ChatGPT least so). All three are racing to bolt on the same surrounding capabilities. Doximity wins on installed clinician base. OpenEvidence wins on speed, trajectory, raw capability, and Deep Consult. ChatGPT for Clinicians wins, today, on almost nothing, but the verification gate suggests they intend to be taken seriously, and they hold a foundational model and a patient-facing asset the others don’t.
The chat interface is no longer the moat. The moat is whoever first connects grounded clinical evidence to native multimodal output, real workflow extensibility, and physician-earned trust without forcing the clinician to play copy-paste between four tabs to get there.
This is the worst these tools will ever be. That should change how we evaluate them: I’m less interested in the question “does it work today?” and more interested in “how do we shape what it becomes?” Date several. Marry none. Use the tool that fits the question in front of you. Send the teams behind them living, breathing feedback.
On the Horizon
None of these tools is built for patients. The answer ChatGPT gave me this morning on the updated guidelines for incidental hepatic steatosis was reasonable, and I am a bone surgeon. I read it the way a layperson would, and I would not stake decisions on it without help. The literacy gap between clinical outputs and patient comprehension is not a UX problem. It is a safety problem. Tearing down the gate before we have built tools that respect that gap is how we get harm.
The administrative arms race — prior authorization letters, denial appeals, faster note-writing — is a symptom, not a cure. We went deep and fast on the workflows where the money lives, which are the workflows our payer infrastructure forces clinicians to spend their evenings on. That work is valuable, and it is not patient-facing. The places where AI could actually move outcomes — secondary fracture prevention, fall prevention, post-op care navigation, osteoporosis treatment rates that sit around 20% after a fragility fracture [https://www.bonehealthandosteoporosis.org/] when the evidence base for treatment is overwhelming — are still under-resourced.
Trust and co-design with clinicians is the unlock. OpenAI has not earned it yet for clinical use. Doximity and OpenEvidence have, in different ways, by being physician-forward from day one. That posture is not optional going forward — it is the moat.
The path from clinical intelligence in your pocket to democratized, evidence-based care that actually moves quality and outcomes runs straight through clinicians willing to show up and iterate.
That is the dream. We are not there. We can get there.
Christian Péan, MD, MHS, is an orthopedic trauma surgeon in Durham, North Carolina. He is core faculty at the Duke-Margolis Institute for Health Policy and CEO and co-founder of RevelAi Health [https://revelaihealth.com/], an AI care management platform for value-based care. Opinions are his own.
🔒 For paid subscribers — the full demo and the operator’s notes
The complete screen-share from the morning’s livestream. Side-by-side queries across all three tools, the Deep Consult walkthrough, the prior authorization generation, the acetabular fracture surgical-plan comparison, the live multimodal handoff into a branded HTML committee deck, the connector detour inside Claude, and the clinical trials map I discovered on stage.