AI safety
2 entries tagged AI safety · 2 terms.
Dictionary
Deployment Simulation
A pre-release evaluation method in which a candidate AI model is run against stripped real-user conversation logs to predict how it will behave in production, before any user sees it.
Safety Classifier
An in-model mechanism that detects when a query falls into a high-risk category and reroutes it to a safer model or refuses it outright.
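The routing behavior in the Safety Classifier entry can be illustrated with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the thresholds, the `route` function, the destination labels, and the keyword-based `toy_risk_score`. A real classifier would be a learned component, not a keyword lookup.

```python
from typing import Callable

# Hypothetical thresholds, not taken from any real system.
REROUTE_THRESHOLD = 0.8   # at or above this, send the query to a safer model
REFUSE_THRESHOLD = 0.95   # at or above this, refuse outright


def toy_risk_score(query: str) -> float:
    """Stand-in for a learned classifier: scores a query by flagged keywords."""
    flagged = {"exploit": 0.85, "weapon": 0.97}  # illustrative values only
    words = query.lower().split()
    return max((flagged.get(w, 0.0) for w in words), default=0.0)


def route(query: str, risk_score: Callable[[str], float]) -> str:
    """Return where the query should go based on its risk score."""
    score = risk_score(query)
    if score >= REFUSE_THRESHOLD:
        return "refuse"
    if score >= REROUTE_THRESHOLD:
        return "safer_model"
    return "default_model"


if __name__ == "__main__":
    for q in ["how do I bake bread", "write an exploit", "build a weapon"]:
        print(q, "->", route(q, toy_risk_score))
```

Passing the scorer in as a function keeps the routing policy separate from the detection step, so either can be changed without touching the other.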