Artificial intelligence (AI) is part of rheumatology, with applications from bench to bedside.
“AI has already become a part of our everyday lives,” said Jamie E. Collins, PhD, Senior Biostatistician, Orthopaedic and Arthritis Center for Outcomes Research, and Associate Professor of Orthopedic Surgery, Harvard Medical School. “We all need to know what AI is, how it works, and how it can work for us.”
Dr. Collins explored the inner workings of AI, the potential benefits, and the already obvious risks during the ARP Distinguished Lecture, AI in Rheumatology Practice — Unpacking the Toolkit, on Sunday, Nov. 17. The session will be available on-demand to all registered ACR Convergence 2024 participants after the meeting through Oct. 10, 2025, by logging into the meeting website.
ChatGPT-4, among the most widely used AI systems, defines AI as the latest attempt to create systems and machines that can perform tasks that require human-like intelligence. Some of those tasks include problem-solving, learning, perception, language understanding, and decision-making. AI systems use algorithms and data to improve their performance and to mimic human cognitive functions.
The key word in that self-definition is mimic, Dr. Collins said. AI isn’t human and doesn’t work the way human minds work.
And AI doesn’t always get it right. ChatGPT and other AI models are known for “hallucinations,” the formal term for fabricated output presented as fact. Generating papers that include fictional references doesn’t mean the model is deliberately trying to deceive, Dr. Collins noted. It means the model is making poor predictions, doesn’t know what to do, and needs more training.
AI is a collection of algorithms. An algorithm is simply a recipe: a set of step-by-step instructions for taking inputs (data in some form) and using them to produce something (the output).
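To make the recipe idea concrete, here is a minimal, hypothetical Python sketch; the joint-count task and its inputs are invented for illustration, not an example from the lecture.

```python
# A toy "algorithm": step-by-step instructions that turn inputs into an output.
# Input: a list of True/False flags marking tender joints (hypothetical data).
# Output: a simple count.
def count_tender_joints(joint_flags):
    total = 0
    for is_tender in joint_flags:  # step 1: examine each joint in turn
        if is_tender:              # step 2: check the flag
            total += 1             # step 3: update the running count
    return total                   # step 4: report the result

print(count_tender_joints([True, False, True, True]))  # -> 3
```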
A neural network is a type of algorithm modeled on human brain function. Neural networks are capable of learning and tweaking their own algorithms to improve their outputs and provide better answers.
Neural networks are also the foundation of deep learning, which refers to algorithms with multiple layers of processing. Most of those layers are hidden, said Dr. Collins, which makes AI a black box: data goes in, is processed within those hidden layers, and a response comes out.
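As a rough illustration, not the lecture’s example, the following sketch traces data through two hidden layers of an untrained network; the weights are random and the layer sizes arbitrary.

```python
import numpy as np

# Minimal forward pass through a small neural network: data goes in,
# is transformed by hidden layers, and a response comes out.
rng = np.random.default_rng(0)

def layer(x, weights, bias):
    # Each layer mixes its inputs (weights), shifts them (bias), and applies
    # a nonlinearity (ReLU) so the network can represent more than straight lines.
    return np.maximum(0.0, x @ weights + bias)

x = rng.normal(size=4)                                        # input: 4 features
h1 = layer(x, rng.normal(size=(4, 8)), rng.normal(size=8))    # hidden layer 1
h2 = layer(h1, rng.normal(size=(8, 8)), rng.normal(size=8))   # hidden layer 2
response = h2 @ rng.normal(size=8)                            # output
print(response)
```

Training, described next, amounts to nudging those weights until the responses improve.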
AI must be trained using data sets curated by humans. Humans then grade the AI outputs to help the model learn to tweak its algorithms more effectively. New, uncurated data is used to test the performance of the model and to continue the learning process.
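A minimal sketch of that train-then-test loop, using scikit-learn and synthetic stand-in data; the classifier and labels here are assumptions for illustration, not anything presented in the session.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))            # 200 synthetic examples, 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels a human curator would supply

# Learn from curated data, then grade performance on data the model never saw.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```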
Accuracy can be improved using reinforcement learning. Humans rate AI answers, then the model uses human responses to reinforce better answers. The thumbs up/thumbs down choice ChatGPT offers with many of its responses is reinforcement learning at work.
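In schematic form, that feedback loop can be as simple as nudging a score for each candidate answer; the names and numbers below are made up for illustration.

```python
# Toy reinforcement from human feedback: keep a score per candidate answer
# and nudge it up or down according to thumbs-up (+1) / thumbs-down (-1).
scores = {"answer_a": 0.0, "answer_b": 0.0}
feedback = [("answer_a", +1), ("answer_b", -1), ("answer_a", +1)]

LEARNING_RATE = 0.5
for answer, rating in feedback:
    scores[answer] += LEARNING_RATE * rating  # reinforce well-rated answers

# After feedback, the model prefers the answer humans rated highly.
print(max(scores, key=scores.get))  # -> "answer_a"
```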
Generative AI can generate new content, such as speech, images, and text, based on past training data.
A large language model (LLM) is a subset of generative AI focused on generating text-based, human-like content.
Natural language processing (NLP) models analyze large quantities of natural language data, either text or speech. NLP can analyze unstructured clinical text such as clinic notes and radiology reports, or act as a scribe, creating visit notes by listening in on patient visits.
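Real clinical NLP relies on trained language models, but a deliberately simplified sketch conveys the input/output shape; the note text and patterns below are invented for illustration.

```python
import re

# Extract structured findings from unstructured clinical text (toy version).
note = "Pt reports morning stiffness > 1 hr; RF positive; started methotrexate."

patterns = {
    "morning_stiffness": r"morning stiffness",
    "rf_positive": r"RF positive",
    "on_methotrexate": r"methotrexate",
}
findings = {name: bool(re.search(pattern, note, re.IGNORECASE))
            for name, pattern in patterns.items()}
print(findings)  # every flag comes back True for this note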
At least 35 articles have evaluated NLP models in rheumatology, Dr. Collins said. One finding is that using NLP to find unrecognized rheumatology patients by analyzing electronic health records is much more accurate than searching for rheumatology-related codes.
AI is presenting novel risks as well as novel benefits. AI requires massive amounts of confidential, often Health Insurance Portability and Accountability Act (HIPAA)-protected, patient data. Regulatory compliance is a looming question.
So are bias and fairness. AI is only as good as its training data, and U.S. medical records are rife with race, ethnicity, and gender bias. A recent study that ran case studies from NEJM Healer through ChatGPT-4 found significant racial and gender bias in the model.
In cases identical except for the racial designation used, Black patients were referred for advanced imaging less often than white patients, 34% versus 43%, and for specialist care less often, 20% versus 24%.
“Without transparency, it is difficult to mitigate bias or to identify errors,” Dr. Collins said. “It is difficult to justify AI-driven recommendations when we do not understand why the model is making a particular recommendation.”
Accountability and liability are other open questions. There is no clear answer as to whether developers, clinicians, healthcare systems and institutions, or some other entity would be responsible for errors or adverse events caused by AI recommendations.
“AI is here to stay whether we like it or not,” Dr. Collins said. “It can help us all do a better job of taking care of patients. We have to ensure that its use is appropriate and ethical.”