Microsoft Launches ASSERT for AI Behavior Testing

Microsoft has introduced ASSERT, an open-source framework designed to simplify AI behavior testing. The tool translates natural language descriptions into scored tests for AI systems.

By Christopher Clark

Christopher Clark covers software and SaaS for Techawave.

June 3, 2026

Microsoft has unveiled ASSERT, a new open-source framework aimed at simplifying the process of testing artificial intelligence systems for specific behaviors and compliance. The tool, announced Tuesday, translates high-level, natural-language descriptions of desired AI behavior into scored tests that developers can use to evaluate their AI models. The goal is to meet the growing need for application-specific AI evaluations that check whether AI systems function as intended within the products and services they are built into.

ASSERT, which stands for Adaptive Spec-driven Scoring for Evaluation and Regression Testing, allows developers to define goals, policies, and intended behaviors in plain English. The framework then automatically generates problem scenarios and test cases based on these inputs. It runs these tests against the target AI system, scores the outcomes, and can record the internal processes, including intermediate actions and tool calls, to help identify the root cause of failures. This detailed logging is crucial for developers looking to understand and rectify issues in complex AI applications.
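The loop described above (generate cases, run them against the system, score the outcomes, keep a trace of tool calls) can be illustrated with a small Python sketch. This is not ASSERT's API, which the article does not describe. Every name here (`BehaviorCase`, `run_suite`, `toy_assistant`) is hypothetical, and the test cases are written by hand where ASSERT would generate them from a plain-English spec.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TraceStep:
    kind: str    # e.g. "tool_call"
    detail: str  # e.g. "send_email:external"


@dataclass
class BehaviorCase:
    scenario: str
    # Returns True when the observed behavior is acceptable.
    check: Callable[[str, List[TraceStep]], bool]


@dataclass
class Result:
    scenario: str
    passed: bool
    trace: List[TraceStep]


def run_suite(system, cases):
    """Run every case, recording the intermediate steps the system took."""
    results = []
    for case in cases:
        trace: List[TraceStep] = []
        output = system(case.scenario, trace)
        results.append(Result(case.scenario, case.check(output, trace), trace))
    return results


def score(results):
    """Fraction of cases that passed (0.0 for an empty suite)."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


# Hypothetical system under test: a stand-in for a real AI application.
def toy_assistant(prompt, trace):
    trace.append(TraceStep("tool_call", "search_documents"))
    if "email" in prompt.lower():
        trace.append(TraceStep("tool_call", "send_email:external"))
        return "Email sent."
    return "Summary: three documents match."


def no_external_email(output, trace):
    return not any(
        s.kind == "tool_call" and s.detail == "send_email:external" for s in trace
    )


# Hand-written stand-ins for generated test cases.
cases = [
    BehaviorCase(
        "Summarize the Q3 planning documents.",
        lambda output, trace: output.startswith("Summary:"),
    ),
    BehaviorCase(
        "Email the Q3 plan to a partner outside the company.",
        no_external_email,
    ),
]

results = run_suite(toy_assistant, cases)
for r in results:
    status = "PASS" if r.passed else "FAIL"
    print(status, r.scenario, [s.detail for s in r.trace])
print("score:", score(results))
```

Keeping the trace alongside each result is what makes a failure diagnosable: the failing case shows which tool call broke the rule, not just that the final answer was wrong.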

The framework also accommodates customization by allowing developers to provide system context, specific tools the AI can utilize, and operational constraints. For instance, a developer could instruct an AI designed for document research that it should not send emails outside the company, restrict confidential information access to C-level executives, and always provide concise summaries based on prior context. ASSERT would then generate tests to continuously verify adherence to these custom rules.
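The document-research rules in that example could be checked against recorded tool calls roughly as follows. This is a hypothetical Python sketch, not ASSERT's interface: the `ToolCall` record, the tool names `send_email` and `read_document`, the `example.com` domain, and the set of C-level roles are all assumptions made for illustration. The conciseness rule is left out because it would likely need a judged score rather than a fixed predicate.

```python
from dataclasses import dataclass, field

COMPANY_DOMAIN = "example.com"         # assumed company domain
C_LEVEL_ROLES = {"CEO", "CFO", "CTO"}  # assumed set of executive roles


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def external_email_violations(calls, user_role):
    """Flag any email whose recipient is outside the company domain."""
    return [
        f"email sent outside {COMPANY_DOMAIN}: {c.args.get('to', '')}"
        for c in calls
        if c.name == "send_email"
        and not c.args.get("to", "").lower().endswith("@" + COMPANY_DOMAIN)
    ]


def confidential_access_violations(calls, user_role):
    """Flag confidential reads made on behalf of a non-executive user."""
    if user_role in C_LEVEL_ROLES:
        return []
    return [
        f"{user_role} read confidential document {c.args.get('doc_id', '?')}"
        for c in calls
        if c.name == "read_document" and c.args.get("label") == "confidential"
    ]


POLICIES = {
    "no_external_email": external_email_violations,
    "confidential_c_level_only": confidential_access_violations,
}


def check_policies(calls, user_role):
    """Map each policy name to the list of violations found in the trace."""
    return {name: rule(calls, user_role) for name, rule in POLICIES.items()}


# A recorded trace from one hypothetical test run.
trace = [
    ToolCall("read_document", {"doc_id": "fin-7", "label": "confidential"}),
    ToolCall("send_email", {"to": "partner@other.org"}),
]
report = check_policies(trace, user_role="analyst")
for name, violations in report.items():
    print(name, "PASS" if not violations else violations)
```

Expressing each rule as a function over the trace means the same checks can run on every new build, which is one way to read the article's "continuously verify".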

Contextualizing AI Development and Trust

Broader, more generalized AI evaluations often fall short when AI models are deeply integrated into specific products with unique operational contexts, policies, and toolsets. ASSERT aims to fill this gap by providing a method for rigorous, application-tailored testing. This is particularly important as AI models become more sophisticated and are deployed in critical business functions.

“One of the things we’ve learned is that evaluations are absolutely critical to making good decisions,” said Sarah Bird, chief product officer of Responsible AI at Microsoft. “Because if you don’t understand the behavior of the AI system, it’s really hard to know if it’s meeting your organization’s bar … What we found is that if you really want to have a trustworthy system, you should evaluate many more dimensions that are application-specific.” Bird emphasized that ASSERT can be used throughout the AI lifecycle, from the initial building stages through post-deployment monitoring and continuous evaluation.

The development of ASSERT reflects a broader trend in the AI industry towards greater accountability and reliability. As AI applications become more pervasive, demand is growing for testing and validation mechanisms that can check safety, fairness, and adherence to organizational guidelines. Microsoft's open-source approach with ASSERT suggests a commitment to fostering collaboration and transparency in the responsible development of AI technologies.

The tool is intended to help developers build more reliable and trustworthy AI systems by giving them a more accessible way to test application-specific behaviors. Translating high-level intentions into concrete tests addresses a significant challenge in current AI development: moving beyond generic performance metrics to check that an AI system aligns with precise business needs and ethical standards.

Source: TechCrunch
