John Snow Labs, the AI for healthcare company, today announced the release of Generative AI Lab 7.0. The update enables domain experts, such as doctors or lawyers, to evaluate and improve custom-built large language models (LLMs) with precision and transparency. New capabilities include no-code features that streamline auditing and tuning of AI models.
While the Generative AI Lab is already used for testing, tuning, and deploying state-of-the-art (SOTA) language models, this upgrade strengthens its evaluation workflows. By comparing LLM outputs side by side, annotating specific text spans, applying structured scoring, and exporting results, domain experts can quickly produce the feedback needed to train or fine-tune LLMs downstream.
Key features of the release include:
- Customizable project templates for LLM output evaluation with support for HTML content, including hyperlinks to references. Two modes are supported: individual and side-by-side response evaluation. Inter-Annotator Agreement (IAA) charts are also available for these projects.
- Support for Hierarchical Condition Category (HCC) coding enables users to streamline clinical risk adjustment workflows by automatically linking International Classification of Diseases (ICD) codes to HCC categories, prioritizing high-value tasks, and validating codes more efficiently.
- A comprehensive and configurable framework for evaluating AI models across dimensions such as accuracy, bias, robustness, fairness, and performance. Automated test management, visual insights, and data augmentation capabilities help users identify model weaknesses, improve reliability, and streamline iterative model enhancement.
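To make the Inter-Annotator Agreement charts concrete: IAA is commonly quantified with statistics such as Cohen's kappa, which corrects raw agreement between two reviewers for agreement expected by chance. The sketch below is a generic illustration of that statistic, not the Lab's internal implementation.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two reviewers rating the same six LLM responses as "good" / "bad".
a = ["good", "good", "bad", "good", "bad", "bad"]
b = ["good", "bad", "bad", "good", "bad", "good"]
print(round(cohens_kappa(a, b), 3))  # → 0.333
```

Values near 1 indicate strong agreement; values near 0 suggest the reviewers agree no more often than chance, a signal that annotation guidelines may need tightening.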
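The HCC feature's core idea, linking extracted ICD codes to HCC categories and flagging anything that needs human review, can be sketched as a simple lookup. The mapping below is a hypothetical illustrative subset (real mappings come from CMS risk-adjustment model tables and vary by model year), and `link_hcc` is an invented helper, not a Generative AI Lab API.

```python
# Hypothetical subset of an ICD-10 to HCC mapping, for illustration only.
ICD_TO_HCC = {
    "E11.9": "HCC 19 (Diabetes without Complication)",
    "I50.9": "HCC 85 (Congestive Heart Failure)",
    "J44.9": "HCC 111 (Chronic Obstructive Pulmonary Disease)",
}

def link_hcc(icd_codes):
    """Link ICD codes to HCC categories; unmapped codes go to a review queue."""
    linked, unmapped = {}, []
    for code in icd_codes:
        if code in ICD_TO_HCC:
            linked[code] = ICD_TO_HCC[code]
        else:
            unmapped.append(code)  # no risk-adjusting HCC; route to a reviewer
    return linked, unmapped

# Z00.00 (general exam) does not risk-adjust, so it lands in the review queue.
linked, unmapped = link_hcc(["E11.9", "I50.9", "Z00.00"])
```

Separating confidently linked codes from unmapped ones is what lets a tool prioritize high-value tasks: reviewers spend their time on the codes that actually affect risk-adjustment outcomes.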
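One dimension from the evaluation framework, robustness, can be illustrated with a tiny perturbation test: inject small typos into a prompt and check whether the model's answer stays stable. The function and toy model below are hypothetical sketches of the general technique, not the Lab's test suite.

```python
import random

def robustness_report(model, prompt, n_variants=3, seed=0):
    """Perturb the prompt with adjacent-character swaps (typos) and record
    whether the model's output matches the unperturbed baseline."""
    rng = random.Random(seed)
    baseline = model(prompt)
    report = []
    for _ in range(n_variants):
        chars = list(prompt)
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]  # simulate a typo
        perturbed = "".join(chars)
        report.append({"input": perturbed, "stable": model(perturbed) == baseline})
    return report

# Toy stand-in for an LLM: routes text by keyword.
toy_model = lambda text: "cardiology" if "heart" in text else "general"
report = robustness_report(toy_model, "patient reports heart palpitations")
```

A real harness would run many perturbation types (typos, paraphrases, casing changes) and aggregate the stability rate per dimension, which is the kind of result the framework's visual insights would surface.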
Domain experts are often best positioned to develop AI-driven solutions tailored to their specific business needs. However, limited technical skills and resources can pose significant barriers to the adoption of AI solutions. The Generative AI Lab addresses this challenge by providing a user-friendly, no-code platform that empowers teams to build reliable models, identify potential failures, evaluate output quality, and responsibly integrate AI into essential workflows.
“Evaluating custom-built AI models and LLMs for specific use cases is complex and goes beyond relying solely on public benchmarks. Determining their efficacy, safety, and value requires targeted, context-aware testing to ensure models perform reliably in real-world applications,” said David Talby, CEO, John Snow Labs. “With the new structured evaluations and detailed feedback included in the Generative AI Lab, domain experts can improve model quality, reduce errors, and accelerate safe, scalable AI deployments without the support of a data scientist.”
Learn more about Generative AI Lab 7.0, or register for the upcoming training session to see the new side-by-side response evaluation feature in action.