This study compared ChatGPT responses to patient questions with responses from physicians on a medical subreddit.

https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2804309

A panel of physicians evaluated the responses while blinded. ChatGPT was found to produce higher quality and more empathetic responses.

Some key caveats though.

The panel didn't evaluate for accuracy of ChatGPT's responses. Given ChatGPT's tendency to completely fabricate information and "hallucinate" this makes me wary of drawing too many conclusions from this study especially in regards to accuracy of medical information.

The responses and questions were taken randomly from the Reddit forum. The only filter was to include responses from verified physicians as some non-physicians can reply. And they only included the first response from the physician, not any follow-up responses. So there might be a lot of variation in quality of the responses and they may not be necessarily complete.

ChatGPT's responses were significantly longer than the physicians' responses. It may have been a byproduct of the source of the responses being Reddit. I imagine in the real world a physician would have a more personable response to their actual patient and a more thorough response. If you look at the sample responses, ChatGPT's verbosity allowed for more empathizing with the patient's situation, while the physicians' responses tended to be more to the point in their answers to the problem.

However all that said, I think this is a good study showcasing how ChatGPT can be a tool for a physician to augment their communication with patients. This was in fact the impetus for the study- physicians spend a lot of time responding to patient messages in virtual patient portals. I think this shows that ChatGPT can help them craft more empathetic responses and increase their productivity. I would like to see it though tested against actual clinical encounters rather than a Reddit forum where two strangers are communicating.

In the sensitivity analyses you can see that verbosity may be doing a lot of work here as physician responses that were longer scored higher than average. ChatGPT was trained to be more verbose, and it may have the byproduct of seeming more empathetic as a result.