Guardrails for Safety and Compliance¶
Guardrails are safety measures that help keep responses generated by large language models (LLMs) appropriate and aligned with organizational or regulatory standards. The Agent Platform includes pre-deployed guardrails that scan both user inputs and model outputs to help maintain safe, responsible, and compliant AI interactions.
Note
Guardrails are pre-deployed by the platform. To view the full list, go to Settings in the top navigation bar and select Manage guardrails from the left menu.
Supported Scanners¶
| Scanner | Description |
|---|---|
| Regex | • Validates prompts against user-defined regular expression patterns. • Supports defining desirable (“good”) and undesirable (“bad”) patterns for fine-grained validation (see the sketch after this table). |
| Anonymize | • Removes sensitive data from user prompts. • Helps maintain privacy and prevents exposure of personal information (illustrated together with Deanonymize after this table). |
| Ban topics | • Blocks specific topics (for example, religion) from appearing in prompts. • Helps avoid sensitive or inappropriate discussions. |
| Prompt injection | • Detects attempts to manipulate or override model behavior. • Protects the LLM from malicious or crafted inputs. |
| Toxicity | • Analyzes prompts for toxic or harmful language. • Helps ensure safe and respectful interactions. |
| Bias detection | • Examines model outputs for potential bias. • Helps maintain neutrality and fairness in generated responses. |
| Deanonymize | • Replaces placeholders in model outputs with the original values. • Restores information that was redacted during anonymization, so the final response reads naturally (see the sketch after this table). |
| Relevance | • Measures similarity between the user’s prompt and the model’s output. • Provides a relevance score that indicates whether the response stays contextually aligned (see the sketch after this table). |
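
The scanners above are pre-deployed and configured from Manage guardrails; you do not implement them in code. The sketches that follow are for intuition only. First, a minimal sketch of the Regex scanner’s “good” and “bad” pattern idea, using hypothetical patterns and a hypothetical `regex_scan` helper:

```python
import re

# Hypothetical patterns for illustration only; the real patterns are whatever
# you configure for the Regex scanner under Manage guardrails.
GOOD_PATTERNS = [r"^[\w\s.,?!'-]+$"]                     # prompt may contain only these characters
BAD_PATTERNS = [r"\b\d{16}\b", r"(?i)password\s*[:=]"]   # e.g. card-like numbers, credential leaks

def regex_scan(prompt: str) -> bool:
    """Pass the prompt only if it matches every 'good' pattern and no 'bad' pattern."""
    if any(re.search(p, prompt) for p in BAD_PATTERNS):
        return False
    return all(re.search(p, prompt) for p in GOOD_PATTERNS)

print(regex_scan("What are your store hours?"))          # True
print(regex_scan("My card number is 4111111111111111"))  # False (bad pattern match)
```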
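
Anonymize and Deanonymize work as a pair: placeholders substituted into the prompt are swapped back into the model’s output. A minimal sketch, assuming a toy email-only redaction and an in-memory vault (the platform’s actual detection covers more kinds of personal information):

```python
import re

def anonymize(prompt: str) -> tuple[str, dict]:
    """Replace email addresses with placeholders and remember the originals (a toy vault)."""
    vault: dict[str, str] = {}
    def repl(match: re.Match) -> str:
        placeholder = f"[EMAIL_{len(vault) + 1}]"
        vault[placeholder] = match.group(0)
        return placeholder
    redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", repl, prompt)
    return redacted, vault

def deanonymize(output: str, vault: dict) -> str:
    """Swap the placeholders in the model's output back to the original values."""
    for placeholder, original in vault.items():
        output = output.replace(placeholder, original)
    return output

redacted, vault = anonymize("Send the invoice to jane.doe@example.com")
# redacted -> "Send the invoice to [EMAIL_1]"; the model only ever sees the placeholder.
model_output = "I have emailed the invoice to [EMAIL_1]."   # simulated model response
print(deanonymize(model_output, vault))                      # placeholder restored in the reply
```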
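
The Relevance scanner compares the prompt with the model’s output and produces a score. A minimal sketch using a simple term-overlap cosine similarity (the platform’s actual scoring method is not described here and is likely more sophisticated, for example embedding-based):

```python
import math
import re
from collections import Counter

def relevance_score(prompt: str, output: str) -> float:
    """Cosine similarity over word counts: 1.0 = identical vocabulary, 0.0 = no overlap."""
    def vectorize(text: str) -> Counter:
        return Counter(re.findall(r"\w+", text.lower()))
    a, b = vectorize(prompt), vectorize(output)
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(relevance_score(
    "What are your support hours?",
    "Our support team is available from 9 am to 5 pm on weekdays.",
))  # nonzero because the prompt and answer share vocabulary; unrelated answers score near 0
```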