Guardrails for Safety and Compliance¶
Guardrails are safety measures that help keep responses generated by large language models (LLMs) appropriate and aligned with organizational or regulatory standards. The Agent Platform includes pre-deployed guardrails that scan both user inputs and model outputs to help maintain safe, responsible, and compliant AI interactions.
Note
Guardrails are pre-deployed by the platform. To view the full list, go to Settings in the top navigation bar and select Manage guardrails from the left menu.
Supported Scanners¶
| Scanner | Description |
|---|---|
| Regex | • Validates prompts against user-defined regular expression patterns. • Supports defining desirable (“good”) and undesirable (“bad”) patterns for fine-grained validation (see the sketch after this table). |
| Anonymize | • Removes sensitive data from user prompts. • Helps maintain privacy and prevents exposure of personal information (illustrated together with Deanonymize after this table). |
| Ban topics | • Blocks specific topics (for example, religion) from appearing in prompts. • Helps avoid sensitive or inappropriate discussions. |
| Prompt injection | • Detects attempts to manipulate or override model behavior. • Protects the LLM from malicious or crafted inputs. |
| Toxicity | • Analyzes prompts for toxic or harmful language. • Helps ensure safe and respectful interactions. |
| Bias detection | • Examines model outputs for potential bias. • Helps maintain neutrality and fairness in generated responses. |
| Deanonymize | • Replaces placeholders in model outputs with the original values. • Restores information that was redacted during anonymization, so the final response reads naturally (see the sketch after this table). |
| Relevance | • Measures similarity between the user’s prompt and the model’s output. • Provides a relevance score that indicates whether the response stays contextually aligned (see the sketch after this table). |
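
The scanners above are pre-deployed and configured from Manage guardrails; you do not implement them in code. The sketches that follow are for intuition only. First, a minimal sketch of the Regex scanner’s “good” and “bad” pattern idea, using hypothetical patterns and a hypothetical `regex_scan` helper:

```python
import re

# Hypothetical patterns for illustration only; the real patterns are whatever
# you configure for the Regex scanner under Manage guardrails.
GOOD_PATTERNS = [r"^[\w\s.,?!'-]+$"]                     # prompt may contain only these characters
BAD_PATTERNS = [r"\b\d{16}\b", r"(?i)password\s*[:=]"]   # e.g. card-like numbers, credential leaks

def regex_scan(prompt: str) -> bool:
    """Pass the prompt only if it matches every 'good' pattern and no 'bad' pattern."""
    if any(re.search(p, prompt) for p in BAD_PATTERNS):
        return False
    return all(re.search(p, prompt) for p in GOOD_PATTERNS)

print(regex_scan("What are your store hours?"))          # True
print(regex_scan("My card number is 4111111111111111"))  # False (bad pattern match)
```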
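
Anonymize and Deanonymize work as a pair: placeholders substituted into the prompt are swapped back into the model’s output. A minimal sketch, assuming a toy email-only redaction and an in-memory vault (the platform’s actual detection covers more kinds of personal information):

```python
import re

def anonymize(prompt: str) -> tuple[str, dict]:
    """Replace email addresses with placeholders and remember the originals (a toy vault)."""
    vault: dict[str, str] = {}
    def repl(match: re.Match) -> str:
        placeholder = f"[EMAIL_{len(vault) + 1}]"
        vault[placeholder] = match.group(0)
        return placeholder
    redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", repl, prompt)
    return redacted, vault

def deanonymize(output: str, vault: dict) -> str:
    """Swap the placeholders in the model's output back to the original values."""
    for placeholder, original in vault.items():
        output = output.replace(placeholder, original)
    return output

redacted, vault = anonymize("Send the invoice to jane.doe@example.com")
# redacted -> "Send the invoice to [EMAIL_1]"; the model only ever sees the placeholder.
model_output = "I have emailed the invoice to [EMAIL_1]."   # simulated model response
print(deanonymize(model_output, vault))                      # placeholder restored in the reply
```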
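
The Relevance scanner compares the prompt with the model’s output and produces a score. A minimal sketch using a simple term-overlap cosine similarity (the platform’s actual scoring method is not described here and is likely more sophisticated, for example embedding-based):

```python
import math
import re
from collections import Counter

def relevance_score(prompt: str, output: str) -> float:
    """Cosine similarity over word counts: 1.0 = identical vocabulary, 0.0 = no overlap."""
    def vectorize(text: str) -> Counter:
        return Counter(re.findall(r"\w+", text.lower()))
    a, b = vectorize(prompt), vectorize(output)
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(relevance_score(
    "What are your support hours?",
    "Our support team is available from 9 am to 5 pm on weekdays.",
))  # nonzero because the prompt and answer share vocabulary; unrelated answers score near 0
```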