Developed by | [email protected] |
---|---|
Date of development | Feb 15, 2024 |
Validator type | Format |
Blog | |
License | Apache 2 |
Input/Output | Output |
This validator uses a BERT model to check a string for toxic language. It is intended to ensure that the output generated by an LLM does not contain any toxic statements.
- Dependencies:
  - guardrails-ai>=0.4.0
- Foundation model access keys:
  - OPENAI_API_KEY

$ guardrails hub install hub://guardrails/bert_toxic
In this example, we apply the validator to a string output generated by an LLM.

    # Import Guard and Validator
    from guardrails.hub import BertToxic
    from guardrails import Guard

    # Setup Guard
    guard = Guard().use(
        BertToxic(threshold=0.5, validation_method="sentence")
    )

    guard.validate("This is a harmless statement.")  # Validator passes
    guard.validate("I want to kill a man. How are you doing today?")  # Validator fixes the output by removing the toxic sentence
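
To make the sentence-level behaviour in the example above more concrete, here is a minimal sketch of threshold-based filtering. It is not the validator's implementation: `score_toxicity` is a hypothetical keyword-based stand-in for the BERT model, `filter_toxic` and `TOXIC_WORDS` are illustrative names, and treating a score strictly above the threshold as toxic is an assumption.

```python
import re

# Hypothetical stand-in vocabulary; the real validator uses a BERT model.
TOXIC_WORDS = {"kill"}


def score_toxicity(text: str) -> float:
    """Toy scorer: high score if any flagged word appears, low otherwise."""
    words = re.findall(r"[a-z']+", text.lower())
    return 0.9 if any(word in TOXIC_WORDS for word in words) else 0.1


def filter_toxic(text: str, threshold: float = 0.5,
                 validation_method: str = "sentence") -> str:
    """Return the text with content scoring above the threshold removed."""
    if validation_method == "full":
        # Score the whole text as one unit: keep all of it or none of it.
        return text if score_toxicity(text) <= threshold else ""
    if validation_method == "sentence":
        # Split after sentence-ending punctuation and score each sentence.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        kept = [s for s in sentences if score_toxicity(s) <= threshold]
        return " ".join(kept)
    raise ValueError("validation_method must be 'sentence' or 'full'")


fixed = filter_toxic("I want to kill a man. How are you doing today?")
print(fixed)
```

With the toy scorer, only the first sentence is dropped in `'sentence'` mode, while `'full'` mode would reject the whole string.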
__init__(self, threshold=0.5, validation_method="sentence", on_fail=None)
Initializes a new instance of the BertToxic class.
Parameters
- `threshold` (float): The confidence threshold for considering a sentence toxic.
- `validation_method` (str): Method of validation, either 'sentence' or 'full'.
- `on_fail` (str, Callable): The policy to enact when a validator fails. If `str`, must be one of `reask`, `fix`, `filter`, `refrain`, `noop`, `exception` or `fix_reask`. Otherwise, must be a function that is called when the validator fails.
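
The accepted forms of `on_fail` described above can be checked up front in your own code. The sketch below is illustrative only: `check_on_fail` and `ON_FAIL_POLICIES` are hypothetical names, not part of guardrails, and the library may perform its own checks differently.

```python
# Policy names copied from the on_fail parameter description.
ON_FAIL_POLICIES = {
    "reask", "fix", "filter", "refrain", "noop", "exception", "fix_reask",
}


def check_on_fail(on_fail):
    """Return on_fail unchanged if it has an accepted form, else raise."""
    # None (the default) and any callable are accepted as-is.
    if on_fail is None or callable(on_fail):
        return on_fail
    # Strings must name one of the listed policies.
    if isinstance(on_fail, str) and on_fail in ON_FAIL_POLICIES:
        return on_fail
    raise ValueError(
        f"on_fail must be a callable or one of {sorted(ON_FAIL_POLICIES)}"
    )


print(check_on_fail("fix"))
```

Checking the value before constructing the validator surfaces a misspelled policy name early, instead of at validation time.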
validate(self, value, metadata) -> ValidationResult
Validates the given `value` using the rules defined in this validator, relying on the `metadata` provided to customize the validation process. This method is automatically invoked by `guard.parse(...)`, ensuring the validation logic is applied to the input data.
Note:
- This method should not be called directly by the user. Instead, invoke `guard.parse(...)`, where this method will be called internally for each associated Validator.
- When invoking `guard.parse(...)`, make sure to pass the appropriate `metadata` dictionary that includes the keys and values required by this validator. If `guard` is associated with multiple validators, combine all necessary metadata into a single dictionary.
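
One way to combine metadata for several validators into a single dictionary is sketched below. `combine_metadata` is a hypothetical helper, and the keys in the usage lines are made up for illustration; this page does not list which metadata keys, if any, this validator requires.

```python
def combine_metadata(*parts: dict) -> dict:
    """Merge per-validator metadata dicts, refusing conflicting values."""
    combined = {}
    for part in parts:
        for key, value in part.items():
            # Two validators asking for different values under the same key
            # would silently override each other, so fail loudly instead.
            if key in combined and combined[key] != value:
                raise ValueError(f"conflicting metadata for key {key!r}")
            combined[key] = value
    return combined


# Hypothetical metadata for two different validators.
metadata = combine_metadata({"language": "en"}, {"max_length": 200})
print(metadata)
# The merged dictionary would then be passed along the lines of
# guard.parse(llm_output, metadata=metadata).
```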
Parameters
- `value` (Any): The input value to validate.
- `metadata` (dict): A dictionary containing metadata required for validation. Keys and values must match the expectations of this validator.