Comparative Study of Five Chatbots Including GPT and Gemini
Analysis of 43 Knee Replacement Surgery Questions

A new study has found that medical information provided by artificial intelligence (AI) chatbots can be used as a tool to assist with patient education and clinical consultations.

Siyeong Song, Professor of Orthopedic Surgery at Hallym University Dongtan Sacred Heart Hospital. (Photo: Hallym University Dongtan Sacred Heart Hospital)



A research team led by Professor Si-young Song of the Department of Orthopedics at Hallym University Dongtan Sacred Heart Hospital has announced the results of a study comparing how accurately five AI chatbots answer questions about knee replacement surgery.


The research paper, titled "Comparative Analysis of Answers to Knee Replacement Surgery Questions by GPT-3.5, GPT-4, GPT-4 Omni, Gemini Advanced, and Gemini 1.5," was published in the January issue of the SCIE journal "Journal of Orthopedic Sports Medicine," which covers the fields of orthopedics and sports medicine.


The research team selected 43 questions that patients frequently ask before and after surgery, drawing on Google search trends and consultation with orthopedic specialists. The questions were divided into six categories: surgical overview and procedure; surgical indications and outcomes; side effects and complications; pain and recovery; postoperative activities; and alternatives to surgery and modified surgical techniques.


Each question was presented to five large language model (LLM)-based chatbots—GPT-3.5, GPT-4, GPT-4 Omni, Gemini Advanced, and Gemini 1.5—in the same way. Two orthopedic specialists in knee replacement surgery evaluated the accuracy and relevance of the chatbot answers. The evaluation used a 5-point Likert scale, and the reviewers were blinded to the type of chatbot providing the answer.


According to the analysis, GPT-3.5, GPT-4, GPT-4 Omni, and Gemini 1.5 all demonstrated high accuracy, with average scores above 4.8 across all questions, and the relevance of their answers was rated at 100%. In contrast, Gemini Advanced received an average accuracy score of 4.07 and a relevance score of 83.7%, both lower than those of the other chatbots.
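As a rough check on the relevance figure, the short Python sketch below asks how many of the 43 questions would have to be rated relevant to produce 83.7%. It assumes that relevance was scored as a yes/no judgment per question, which the article does not state, and the function name `matching_counts` is purely illustrative.

```python
# Assumption (not stated in the article): relevance is the share of the
# 43 questions whose answer was judged relevant, rounded to one decimal.
TOTAL_QUESTIONS = 43
REPORTED_RELEVANCE_PCT = 83.7  # figure reported for Gemini Advanced


def matching_counts(total, reported_pct):
    """Return every count k (0..total) whose percentage rounds to reported_pct."""
    return [
        k for k in range(total + 1)
        if round(100 * k / total, 1) == reported_pct
    ]


if __name__ == "__main__":
    print(matching_counts(TOTAL_QUESTIONS, REPORTED_RELEVANCE_PCT))
```

Under that assumption, the only matching count is 36 of 43 questions, which would leave seven answers judged not relevant. If relevance was instead tallied across both reviewers' ratings, the underlying counts would differ.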


Notably, performance differed significantly between chatbots on questions about surgical indications and outcomes, and on questions about alternatives to knee replacement and modified surgical techniques. While GPT-3.5, GPT-4, GPT-4 Omni, and Gemini 1.5 received scores close to 5 in these areas, Gemini Advanced scored statistically significantly lower.


The research team explained that some chatbots tended to respond to certain questions by advising users to "consult a specialist." While this may serve as a safety mechanism to prevent the provision of incorrect information, it could also limit the specificity of information when these chatbots are used as patient education tools.



Professor Song stated, "We have confirmed that the latest AI chatbots can provide medical information on knee replacement surgery with considerable accuracy," adding, "However, AI responses should be used only to support patient education, and surgical decisions must always be made in direct consultation with medical professionals."


This content was produced with the assistance of AI translation services.

© The Asia Business Daily (www.asiae.co.kr). All rights reserved.
