The new models are optimized specifically for clinical reasoning, can verbalize their chain of thought, and apply medically recommended planning and decision-making processes
John Snow Labs, the AI for healthcare company, today announced Medical LLM Reasoner, the first commercially available healthcare-specific reasoning large language model (LLM). Unlike traditional LLMs, which rely on simple knowledge recall to mimic reasoning [1,2], these models represent a significant advance in AI-driven medical problem solving: systems that can meaningfully assist healthcare professionals with complex diagnostic, operational, and planning decisions.
The model was trained using a recipe inspired by that of DeepSeek-R1 [3], introducing self-reflection capabilities through reinforcement learning. Developed with NVIDIA tools, Medical LLM Reasoner is being released at the NVIDIA GTC 2025 Conference.
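The DeepSeek-R1 recipe [3] pairs reinforcement learning with simple rule-based rewards that score both the presence of an explicit reasoning trace and the correctness of the final answer. John Snow Labs has not published its exact reward design, so the sketch below only illustrates that general pattern; the tag names and scoring weights are assumptions.

```python
import re

# Illustrative R1-style rule-based reward. The actual reward design used for
# Medical LLM Reasoner is not public; tag names and weights are assumptions.
THINK_RE = re.compile(r"<think>(.+?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.+?)</answer>", re.DOTALL)

def reward(completion: str, reference_answer: str) -> float:
    """Score a completion on format (did it verbalize a chain of thought?)
    and on accuracy (does the final answer match the reference?)."""
    score = 0.0
    if THINK_RE.search(completion):  # format reward: reasoning trace present
        score += 0.2
    answer = ANSWER_RE.search(completion)
    if answer and answer.group(1).strip().lower() == reference_answer.strip().lower():
        score += 1.0                 # accuracy reward: exact-match final answer
    return score

print(reward("<think>BP low, HR elevated...</think><answer>sepsis</answer>", "Sepsis"))
# 1.2
```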
Clinical reasoning is central to healthcare, encompassing the cognitive processes physicians use to evaluate patients, weigh evidence, and make decisions. John Snow Labs’ medical reasoning models are designed to emulate three reasoning patterns common in clinical practice [4]:
- Deductive reasoning – such as systematically applying clinical guidelines, protocols, and established medical knowledge to specific patient scenarios
- Inductive reasoning – such as identifying patterns across individual patient cases and generating hypotheses about underlying causes or connections
- Abductive reasoning – such as making the most plausible inference from limited information, as happens when making time-sensitive decisions about a patient
These models benefit from a reasoning-optimized training dataset, a hybrid training methodology, medical decision tree integration, and self-consistency verification layers. They are designed to elaborate on their thought processes, consider multiple hypotheses, evaluate evidence systematically, and explain conclusions transparently. The Medical LLM Reasoner can track multiple variables, hypotheses, and evidence points simultaneously without losing context.
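The announcement does not detail how the self-consistency verification layer works. A common way to implement such a check, following the standard self-consistency technique, is to sample several independent chains of thought and accept only a majority answer; the sketch below assumes a hypothetical `generate` callable that returns one sampled (chain_of_thought, final_answer) pair.

```python
from collections import Counter

def self_consistent_answer(generate, question: str, n_samples: int = 5) -> str:
    """Generic self-consistency check: sample several independent chains of
    thought and keep the majority final answer. `generate` is a hypothetical
    callable returning (chain_of_thought, final_answer) for one sample."""
    answers = [generate(question)[1] for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    if votes <= n_samples // 2:
        # No clear majority: flag for human review rather than guessing.
        return "UNCERTAIN - defer to clinician"
    return winner
```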
The Medical LLM Reasoner is available in two sizes, 14B and 32B, both with a 32k context window. On the OpenMed benchmarks, the 32B model achieves an average score of 82.57% and the 14B model 80.04%, while also verbalizing the chain of thought leading to each answer. For comparison, the 32B reasoning models from Qwen2.5 and DeepSeek-R1 score 82.02% and 79.40%, respectively. The models also perform well on general reasoning benchmarks such as MATH-500 (81.5% for the 32B model) and BIG-Bench Hard (64.8% for the 14B model). The Medical LLM Reasoner is designed to run privately inside each customer’s infrastructure, without any calls to third-party APIs, simplifying compliance when reasoning over confidential medical information.
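In practice, private deployment means loading the weights with a standard local inference stack so that no data leaves the customer's network. Below is a minimal sketch using Hugging Face transformers; the repository name is a placeholder, since the announcement does not specify how the model is packaged or distributed.

```python
# Minimal sketch of fully local inference with Hugging Face transformers.
# The model identifier below is a hypothetical placeholder, not a published checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "johnsnowlabs/medical-llm-reasoner-14b"  # placeholder identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "A 68-year-old presents with acute chest pain and diaphoresis. Differential?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)  # runs entirely on-prem
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```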
The training process ran on a cluster of NVIDIA H100-accelerated servers and used several NVIDIA software libraries, including NCCL for efficient multi-GPU communication during distributed training and TensorRT for inference optimization and deployment testing.
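In a PyTorch-based stack, NCCL is typically engaged by selecting it as the backend for distributed data parallelism, after which gradient all-reduce traffic flows over NCCL automatically. The following generic sketch illustrates that pattern; it is not John Snow Labs' actual training code, and the model layer is a stand-in.

```python
# Generic sketch of how NCCL enters a multi-GPU training stack via PyTorch DDP.
# Launch with: torchrun --nproc_per_node=8 train.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")      # NCCL handles inter-GPU collectives
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(4096, 4096).cuda()   # stand-in for the actual LLM
model = DDP(model, device_ids=[local_rank])  # gradients all-reduced over NCCL
```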
While existing benchmarks effectively measure medical knowledge, they inadequately assess the sophisticated reasoning capabilities that are essential for clinical practice. To address this gap, John Snow Labs is developing new specialized benchmarks for clinical reasoning, consistency, safety, and uncertainty quantification, furthering its commitment to responsible AI.
To learn more about Medical LLM Reasoner, visit: https://www.johnsnowlabs.com/healthcare-llm/.