AgentDoG 1.5 is Rewriting the Rules of Agentic Safety

May 31, 2026

A new lightweight, scalable alignment framework promises to secure powerful autonomous AI agents without compromising their utility or draining computational resources.

Next-Generation Safety for Evolving Threats: AgentDoG 1.5 introduces an updated safety taxonomy and trajectory-level benchmarks to counteract emergent risks in powerful open-world agents like OpenClaw and Codex.

Unprecedented Efficiency: Using a taxonomy-guided data engine and influence-function purification, the framework trains models (ranging from 0.8B to 8B parameters) on just ~1,000 highly informative samples, matching the performance of leading closed-source models like GPT-5.4 while reducing deployment overhead by two orders of magnitude.

Versatile Deployment and Future Growth: AgentDoG 1.5 acts as a highly effective filter for Supervised Fine-Tuning (SFT), a reward signal for Reinforcement Learning (RL), and a real-time runtime guardrail, though future iterations will need to embrace multimodal inputs and stricter tool-time checks to ensure comprehensive security.

The era of the isolated, conversational AI chatbot is rapidly giving way to a new paradigm: the open-world AI agent. Modern models, such as OpenClaw, boast incredible cross-environment execution capabilities. They can navigate complex systems, write and execute code, and string together tools to accomplish intricate tasks. However, this newfound autonomy introduces a broad new spectrum of safety risks. As advanced frontier AI models drastically lower the barrier to complex cyberattacks, current agent alignment frameworks are proving dangerously inadequate for real-world deployment.

Enter AgentDoG 1.5. Developed to bridge this critical security gap, AgentDoG 1.5 is a lightweight, scalable alignment framework specifically designed for AI agent safety and security. By rethinking how we diagnose and train for safety, this openly released framework is setting a new standard for responsible AI deployment.

Redefining the Safety Taxonomy

To effectively guard against a threat, you first have to understand it. The creators of AgentDoG 1.5 recognized that existing safety classifications were insufficient for the unique execution scenarios of Codex and OpenClaw. Consequently, they comprehensively updated the agent safety taxonomy to accommodate these emergent risks.

Alongside this updated taxonomy, the framework extends ATBench into a trajectory-level benchmark family. Rather than just looking at isolated prompts and responses, this new benchmark evaluates the entire trajectory of an agent’s actions, covering general tool-use scenarios as well as complex execution environments.

Maximum Impact, Minimal Data

Perhaps the most impressive achievement of AgentDoG 1.5 is its ruthless efficiency. Traditional AI training often relies on brute force, feeding models millions of data points at immense computational cost. AgentDoG 1.5 takes a surgical approach.

The framework utilizes an efficient training pipeline that combines a taxonomy-guided data engine with influence-function-based purification. This sophisticated method sifts through vast amounts of data to select only about 1,000 highly informative samples. Using this incredibly lean dataset, the team successfully trained several AgentDoG 1.5 variants—ranging from lightweight 0.8B and 2B models to more robust 4B and 8B parameter models.

Despite the tiny training footprint, these models achieve safety performance comparable to leading closed-source giants like GPT-5.4. Furthermore, by constructing a highly efficient agentic safety SFT and RL training environment, the framework reduces deployment overhead in Docker-level environments by two orders of magnitude (a 100x reduction), making it vastly more accessible for developers.

A Triple Threat in Agentic Safety

AgentDoG 1.5 is not just a theoretical model; it is designed for versatile, hands-on application across the AI development lifecycle:

SFT Data Filtering: It filters high-quality safety data for agentic Supervised Fine-Tuning (SFT). Extensive experiments show that AgentDoG 1.5-filtered SFT drastically improves an agent’s safety and robustness while preserving its crucial function-calling abilities.
Reinforcement Learning (RL): It provides reliable reward signals for agentic safety RL, directly improving the delicate trade-off between an agent’s safety constraints and its functional utility.
Real-Time Guardrails: When deployed as a training-free online guardrail for OpenClaw-style agents, AgentDoG 1.5 actively monitors trajectories and successfully reduces unsafe final deliveries, proving its worth as an effective runtime moderator.

Multimodality and Proactive Defense

While AgentDoG 1.5 represents a massive leap forward and achieves state-of-the-art performance in diverse, interactive scenarios, the landscape of AI safety is always moving.

Currently, the framework operates primarily on text-based trajectories. Yet, real-world agents are increasingly interacting with multimodal environments—navigating Graphical User Interfaces (GUIs), analyzing complex documents, and processing audio and video. Extending this trajectory-level safety diagnosis to multimodal agent traces is a critical next step.

While the current guardrail framework serves as a highly practical intervention point, it cannot retroactively reverse harms caused by earlier external side effects during an agent’s task execution. To build a truly impenetrable safety architecture, future systems will need to combine trajectory-level monitoring with selective tool-time checks, strict permission-aware execution policies, and mandatory human approval for high-risk actions.

As we continue to delegate more power to autonomous systems, frameworks like AgentDoG 1.5 provide the essential leash—ensuring that our AI agents remain not only highly capable, but fundamentally secure.

Source

Project Page