OpenSage Research Integration

Research backing for Tachikoma's OpenSage self-programming agent generation engine.

Overview

Paper: OpenSage: Self-programming Agent Generation Engine
Authors: Hongwei Li, Zhun Wang, Qinrun Dai, et al.
Venue: ICML 2026
arXiv: https://arxiv.org/abs/2602.16891
Implementation: Implementation Status

Problem Statement

Traditional Agent Development Kits (ADKs) follow a human-centered paradigm:

Engineers manually design agent topologies
Developers create toolsets upfront
Humans define fixed memory structures
Scalability and generalizability are limited as a result

This is analogous to early machine learning with handcrafted features.

OpenSage's Solution

AI-centered paradigm: let LLMs create agents, tools, and memory structures.

Three core innovations:

1. Self-Generating Agent Topology

Problem: Static agent structures can't adapt to task requirements.

Solution: Agents dynamically create subagents based on task analysis.

Vertical Decomposition

Complex task → Analyze → Subtask 1, Subtask 2, Subtask 3
                            ↓            ↓           ↓
                        Agent 1    Agent 2   Agent 3
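
The decomposition flow above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the paper's implementation: `analyzeTask`, `spawnSubagent`, and `runVertical` are hypothetical names, and the LLM-driven task analysis is replaced by a split on semicolons so the example runs without a model.

```typescript
// Hypothetical types; none of these names come from the OpenSage paper.
type Subtask = { id: number; description: string };
type SubagentResult = { subtaskId: number; output: string };

// A subagent is modeled as a plain synchronous function.
// A real subagent would wrap an asynchronous LLM call.
type Subagent = () => SubagentResult;

// Stand-in for LLM-driven analysis: split the task text on ";".
function analyzeTask(task: string): Subtask[] {
  return task
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((description, index) => ({ id: index + 1, description }));
}

// Create one dedicated subagent per subtask, so each one only
// carries the context of its own subtask.
function spawnSubagent(subtask: Subtask): Subagent {
  return () => ({
    subtaskId: subtask.id,
    output: `done: ${subtask.description}`,
  });
}

// Parent agent: analyze, spawn, run, and collect the results in order.
function runVertical(task: string): SubagentResult[] {
  return analyzeTask(task).map((subtask) => spawnSubagent(subtask)());
}

const verticalResults = runVertical(
  "reproduce crash; locate root cause; write patch",
);
console.log(verticalResults.map((r) => `${r.subtaskId}: ${r.output}`).join("\n"));
```

Keeping each subagent's input limited to its own subtask is one plausible reading of why decomposition reduces context overflow: no single agent has to hold the whole task history.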

Results from paper:

60.2% resolved rate on CyberGym (vs 39.4% baseline)
20% improvement from vertical decomposition alone
Reduced context overflow (6.4 vs 13.1 summarization events)

Horizontal Ensemble

Task → Multiple Strategies → Agent 1, Agent 2, Agent 3
                               ↓        ↓         ↓
                           Results → Merge → Best Solution
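
A minimal TypeScript sketch of the ensemble flow above, assuming each strategy returns a solution together with a numeric score. The `Strategy` shape, the `runEnsemble` function, the strategy names, and the scores are all hypothetical placeholders; a real merge step might rely on test results or a judging agent instead of a fixed score.

```typescript
// Hypothetical types; the paper does not prescribe these shapes.
type Attempt = { solution: string; score: number };
type Strategy = { name: string; solve: (task: string) => Attempt };
type Candidate = Attempt & { strategy: string };

// Run every strategy on the same task, then merge by keeping
// the highest-scoring candidate.
function runEnsemble(task: string, strategies: Strategy[]): Candidate {
  if (strategies.length === 0) {
    throw new Error("at least one strategy is required");
  }
  const candidates = strategies.map(
    (s): Candidate => ({ strategy: s.name, ...s.solve(task) }),
  );
  return candidates.reduce((best, c) => (c.score > best.score ? c : best));
}

// Placeholder strategies with made-up scores, for illustration only.
const ensembleStrategies: Strategy[] = [
  {
    name: "static-analysis",
    solve: (task) => ({ solution: `static report for ${task}`, score: 0.6 }),
  },
  {
    name: "fuzzing",
    solve: (task) => ({ solution: `crashing input for ${task}`, score: 0.9 }),
  },
  {
    name: "manual-trace",
    solve: (task) => ({ solution: `trace notes for ${task}`, score: 0.4 }),
  },
];

const bestCandidate = runEnsemble("heap overflow in parser", ensembleStrategies);
console.log(`${bestCandidate.strategy}: ${bestCandidate.solution}`);
```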

Results from paper:

15% improvement from ensemble mechanism
Better for tasks with multiple valid approaches
Reduced bias from single approach

2. Dynamic Tool Synthesis

Problem: Fixed toolsets limit agent capabilities and cause hallucinations.

Solution: Agents write their own tools on-demand.

Architecture

Agent Tool Generation:
┌──────────────────────────┐
│  Tool Specification      │
│  - Name, Description     │
│  - Args (Zod schema)     │
│  - Implementation        │
│  - Dependencies          │
└──────────────────────────┘
              ↓
┌──────────────────────────┐
│  Tool Registration       │
│  - Add to tool registry  │
│  - Set permissions       │
│  - Create metadata       │
└──────────────────────────┘
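
The specification and registration steps in the diagram can be sketched as follows. This is an assumption-laden illustration rather than the paper's or Tachikoma's API: `ToolSpec`, `ToolRegistry`, and the `repeat_payload` tool are hypothetical, and a hand-written `parseArgs` function stands in for the Zod schema so the example has no external dependencies.

```typescript
// Hypothetical shapes mirroring the boxes in the diagram above.
type ToolSpec<Args> = {
  name: string;
  description: string;
  // Stand-in for a schema's parse step: returns typed args or throws.
  parseArgs: (raw: unknown) => Args;
  implementation: (args: Args) => string;
  dependencies: string[];
};

type RegisteredTool = {
  name: string;
  description: string;
  dependencies: string[];
  permissions: string[];
  metadata: { createdBy: string };
  invoke: (rawArgs: unknown) => string;
};

class ToolRegistry {
  private readonly tools = new Map<string, RegisteredTool>();

  // Registration: add to the registry, set permissions, create metadata.
  register<Args>(spec: ToolSpec<Args>, permissions: string[], createdBy: string): void {
    if (this.tools.has(spec.name)) {
      throw new Error(`tool already registered: ${spec.name}`);
    }
    this.tools.set(spec.name, {
      name: spec.name,
      description: spec.description,
      dependencies: spec.dependencies,
      permissions,
      metadata: { createdBy },
      // Arguments are always validated before the implementation runs.
      invoke: (rawArgs) => spec.implementation(spec.parseArgs(rawArgs)),
    });
  }

  call(name: string, rawArgs: unknown): string {
    const tool = this.tools.get(name);
    if (!tool) {
      throw new Error(`unknown tool: ${name}`);
    }
    return tool.invoke(rawArgs);
  }
}

// Example of a tool an agent might write: a trivial input generator.
type RepeatArgs = { seed: string; count: number };

const repeatPayload: ToolSpec<RepeatArgs> = {
  name: "repeat_payload",
  description: "Builds a test input by repeating a seed string.",
  parseArgs: (raw) => {
    if (typeof raw !== "object" || raw === null) {
      throw new Error("args must be an object");
    }
    const fields = raw as Record<string, unknown>;
    const seed = fields["seed"];
    const count = fields["count"];
    if (typeof seed !== "string") {
      throw new Error("seed must be a string");
    }
    if (typeof count !== "number" || !Number.isInteger(count) || count < 0) {
      throw new Error("count must be a non-negative integer");
    }
    return { seed, count };
  },
  implementation: ({ seed, count }) => seed.repeat(count),
  dependencies: [],
};

const toolRegistry = new ToolRegistry();
toolRegistry.register(repeatPayload, ["sandbox:execute"], "agent-1");
console.log(toolRegistry.call("repeat_payload", { seed: "AB", count: 3 })); // prints "ABABAB"
```

Routing every call through `parseArgs` means a generated tool cannot be invoked with malformed arguments, which is the role the schema plays in the diagram.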

Results from paper:

25% improvement from domain-specific toolkit
39 tools generated during CyberGym eval
Tools: fuzzers, generators, validators

Enables heterogeneous tool support

3. Hierarchical Memory Management

Problem: Linear memory is inefficient and lacks structure.

Solution: Graph-based memory with nodes and edges.

Graph Structure

Memory Graph:
┌───────────────────────────────────────┐
│ Nodes: Entities (code, concepts)      │
│ Edges: Relationships (uses, creates)  │
│ Embeddings: Similarity search         │
└───────────────────────────────────────┘
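
The three parts of the memory graph can be sketched as a small data structure. This is an illustrative assumption, not the paper's design: `MemoryGraph`, its method names, and the toy 2-dimensional embeddings are hypothetical, and similarity search is done with a brute-force cosine similarity scan.

```typescript
// Hypothetical node and edge shapes based on the diagram above.
type MemoryNode = { id: string; kind: "code" | "concept"; embedding: number[] };
type MemoryEdge = { from: string; to: string; relation: "uses" | "creates" };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    // The "?? 0" fallbacks only satisfy strict index checking;
    // both indices are in range because the lengths match.
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / Math.sqrt(normA * normB);
}

class MemoryGraph {
  private readonly nodes = new Map<string, MemoryNode>();
  private readonly edges: MemoryEdge[] = [];

  addNode(node: MemoryNode): void {
    this.nodes.set(node.id, node);
  }

  addEdge(edge: MemoryEdge): void {
    if (!this.nodes.has(edge.from) || !this.nodes.has(edge.to)) {
      throw new Error("both endpoints must exist before adding an edge");
    }
    this.edges.push(edge);
  }

  // Follow outgoing relationships from one entity.
  neighbors(id: string): string[] {
    return this.edges.filter((e) => e.from === id).map((e) => e.to);
  }

  // Return the ids of the k nodes most similar to the query embedding.
  search(query: number[], k: number): string[] {
    return Array.from(this.nodes.values())
      .map((node) => ({ id: node.id, score: cosineSimilarity(query, node.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map((hit) => hit.id);
  }
}

const memory = new MemoryGraph();
memory.addNode({ id: "fuzzer.ts", kind: "code", embedding: [1, 0] });
memory.addNode({ id: "input-validation", kind: "concept", embedding: [0, 1] });
memory.addNode({ id: "harness.ts", kind: "code", embedding: [0.9, 0.1] });
memory.addEdge({ from: "harness.ts", to: "fuzzer.ts", relation: "uses" });

console.log(memory.search([1, 0], 2));
console.log(memory.neighbors("harness.ts"));
```

Retrieval can then combine both mechanisms: find an entry point by embedding similarity, and expand along edges to pull in related entities.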

Results from paper:

3-5x more efficient retrieval
30% context efficiency gain
Memory agent: +20% compression efficiency
Supports complex knowledge representation

Experimental Results

Benchmarks

Benchmark	Task Type	Baseline (OpenHands)	OpenSage	Relative Improvement
CyberGym	Security vuln	39.4%	60.2%	+52.8%
Terminal-Bench 2.0	Terminal tasks	64.7%	65.2%	+0.8%
SWE-Bench Pro	Software eng	40.2%	59.0%	+46.8%
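
The last column of the table is consistent with a relative improvement over the baseline, computed as (OpenSage - baseline) / baseline. A short sketch that reproduces it from the scores in the table (the helper name `relativeImprovement` is illustrative):

```typescript
// Scores copied from the benchmark table above, in percent.
const benchmarkRows = [
  { name: "CyberGym", baseline: 39.4, openSage: 60.2 },
  { name: "Terminal-Bench 2.0", baseline: 64.7, openSage: 65.2 },
  { name: "SWE-Bench Pro", baseline: 40.2, openSage: 59.0 },
];

// Relative improvement in percent, rounded to one decimal place.
function relativeImprovement(baseline: number, score: number): string {
  return (((score - baseline) / baseline) * 100).toFixed(1);
}

const improvementLines = benchmarkRows.map(
  (row) => `${row.name}: +${relativeImprovement(row.baseline, row.openSage)}%`,
);
console.log(improvementLines.join("\n"));
// CyberGym: +52.8%
// Terminal-Bench 2.0: +0.8%
// SWE-Bench Pro: +46.8%
```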

Ablation Studies

Self-Generating Agent Topology

Configuration	Resolved Rate
Full OpenSage	60.2%
No Horizontal (no ensemble)	52.6%
No Vertical (no decomposition)	42.8%
Neither feature (baseline)	39.4%
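
For reference, the absolute differences between the full system and each ablated configuration can be read off the table in percentage points. The sketch below only does that subtraction; the row labels are shortened, and these point differences are not necessarily the same quantity as the rounded impact figures quoted elsewhere in this document.

```typescript
// Resolved rates copied from the ablation table above, in percent.
const fullResolvedRate = 60.2;
const ablatedRows = [
  { config: "no horizontal ensemble", resolved: 52.6 },
  { config: "no vertical decomposition", resolved: 42.8 },
  { config: "baseline", resolved: 39.4 },
];

// Drop in percentage points relative to the full system.
function pointDrop(full: number, ablated: number): string {
  return (full - ablated).toFixed(1);
}

const ablationLines = ablatedRows.map(
  (row) => `${row.config}: -${pointDrop(fullResolvedRate, row.resolved)} points`,
);
console.log(ablationLines.join("\n"));
// no horizontal ensemble: -7.6 points
// no vertical decomposition: -17.4 points
// baseline: -20.8 points
```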

Key findings:

Vertical decomposition: +20% impact
Horizontal ensemble: +15% impact
Combined: +35% over baseline
Both essential for optimal performance

Tooling System

Configuration	Resolved Rate
Full OpenSage + Domain Tools	60.2%
No Tools (raw terminal)	23.4%
No Domain Tools (basic tools)	36.7%

Key findings:

Domain-specific toolkit: +25% impact
Dynamic tool creation: enables specialization
Tool management essential for heterogeneous tools

Memory System

Configuration	Context Efficiency
Graph Memory + Memory Agent	30% improvement
Graph Memory (no agent)	15% improvement
Linear Memory	baseline

Key findings:

Graph structure: 2x efficiency
Memory agent: +15% efficiency
Compression critical for long sessions

Key Insights from Paper