Building autonomous skill development across 115 RoboCasa kitchen tasks. Strategy: Incremental Comparison — solve tasks one by one, reusing and growing a shared skill library.
The end state: start the system, walk away, come back to find skills developed across all 115 tasks with a growing shared skill library.
python3 task_sequencer.py --strategy incremental — the top-level loop that drives autonomous iteration across all 115 tasks.
task_progress.json tracks solved tasks and success rates — the core of Strategy 2: when planning task N, the planner sees solutions from tasks 1..N-1 and can identify commonalities.
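A hypothetical sketch of what the incremental loop inside task_sequencer.py might look like. The file name task_progress.json comes from the notes above; the progress schema, the solve_task callable, and the resume-by-skipping behavior are assumptions for illustration, not the real implementation.

```python
import json
from pathlib import Path

# Placeholder progress store; the real schema is not shown in the notes.
PROGRESS_FILE = Path("task_progress.json")

def load_progress():
    if PROGRESS_FILE.exists():
        return json.loads(PROGRESS_FILE.read_text())
    return {"solved": {}}  # task name -> measured success rate

def run_incremental(tasks, solve_task):
    """Solve tasks one by one; the planner for task N sees tasks 1..N-1."""
    progress = load_progress()
    for task in tasks:
        if task in progress["solved"]:
            continue  # already solved in an earlier run; skip on resume
        # Prior solutions are passed in so commonalities can be reused.
        rate = solve_task(task, prior=progress["solved"])
        progress["solved"][task] = rate
        # Persist after every task so the run can be interrupted safely.
        PROGRESS_FILE.write_text(json.dumps(progress, indent=2))
    return progress
```

Persisting after every task is what makes the "start the system, walk away" loop restartable.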
_check_success() and _get_obj_cfgs() are used for structural comparison between tasks.

Task switching: currently the sim loads one task at startup (--task flag); a POST /task/switch endpoint on the sim server HTTP API is needed to change environments without restarting the process.

Multi-provider dev agents: dev agents now run on Ollama, Gemini, or Anthropic via OpenClaw from the same orchestrator.
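A minimal client for the task-switch endpoint could look like the sketch below. The port, JSON payload shape, and response fields are assumptions, not the documented API.

```python
import json
import urllib.request

def switch_task(task_name, base_url="http://localhost:8000"):
    """POST /task/switch to change the sim environment without a restart.

    The {"task": ...} payload and JSON response are assumed shapes.
    """
    req = urllib.request.Request(
        f"{base_url}/task/switch",
        data=json.dumps({"task": task_name}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Keeping the sim process alive across task switches avoids paying scene-loading cost 115 times per sweep.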
claude-agent-sdk is used to spawn dev agents. Parallel development: run independent dev agents against multiple sim + agent_server instances in parallel — each env has its own session, kitchen layout, and isolated state.
Wired the root skill's auto-generated test into the pipeline and added a camera frame viewer to the dashboard.
The root skill's auto-generated test calls _check_success(). The pipeline retries dev up to 3 times on failure, then sends the skill to review.

Shared skill library: skills are currently siloed inside each graph's skills/ directory. Building a shared skills/ directory that persists across graphs, so skills from task 1 are available when solving task 2.

Dashboard: real-time hex gallery showing skill development progress. Each hex blinks to show active agent work.
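The retry-then-review policy can be sketched as a small loop. run_dev, check_success, and send_to_review are hypothetical callables standing in for the real pipeline stages.

```python
def develop_skill(run_dev, check_success, send_to_review, max_attempts=3):
    """Retry dev up to max_attempts times; on persistent failure, hand off
    to human review instead of promoting. All callables are placeholders."""
    for attempt in range(1, max_attempts + 1):
        run_dev(attempt)
        if check_success():  # in the real pipeline this wraps _check_success()
            return "promoted"
    send_to_review()
    return "review"
```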
Two key features reduce human involvement in the pipeline:

- Autonomous mode (--autonomous): skips the review gate and auto-promotes completed skills.
- Auto-generated success tests: when task_env is set in graph.json, the root skill automatically gets a test that calls GET /task/success.

Architecture: the orchestrator manages a dependency tree of skills, and AI agents (Claude Code) develop each skill through a structured pipeline.
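The auto-generated root-skill test boils down to asking the sim server whether the task's success condition currently holds. In this hedged sketch, the URL is from the notes but the {"success": bool} response shape is an assumption.

```python
import json
import urllib.request

def task_succeeded(base_url="http://localhost:8000"):
    """GET /task/success and report the task's _check_success() verdict.

    Assumes the endpoint returns JSON like {"success": true}.
    """
    with urllib.request.urlopen(f"{base_url}/task/success") as resp:
        return bool(json.loads(resp.read())["success"])
```

Because the verdict comes from the task's own _check_success(), no per-skill test code needs to be written or reviewed.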
task_env metadata links skill graphs to sim tasks. submit_and_wait.py lets agents test code via the agent server. The sim server exposes the same protocol bridges as real hardware (ZMQ arm, RPC base, WebSocket camera), so the agent server connects transparently.
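Conceptually, submit_and_wait.py is a submit-then-poll helper. This sketch uses injected submit/get_status callables because the real agent-server API is not shown in these notes.

```python
import time

def submit_and_wait(submit, get_status, timeout_s=60.0, poll_s=0.01):
    """Submit a job to the agent server, then poll until it finishes.

    submit() -> job id, get_status(job_id) -> status string; both are
    placeholders for the real agent-server calls.
    """
    job_id = submit()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status in ("done", "failed"):
            return status
        time.sleep(poll_s)  # avoid hammering the server while waiting
    raise TimeoutError(f"job {job_id} did not finish in {timeout_s}s")
```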
Sim server HTTP endpoints: /task/success, /task/info, /reset, /plan, /perceive. GET /task/success calls the task's _check_success(), so no custom test code is needed; POST /reset resets the current environment.

All 115 RoboCasa kitchen tasks are ported to ManiSkill3/SAPIEN with the TidyVerse robot (Franka Panda + mobile base). Each task has a _check_success() method for automated evaluation.
Tasks are instantiated via gym.make("RoboCasa-Pn-P-Counter-To-Cab-v0").

Prior hardware skills: previously developed skills from OpenClaw sessions, available as reference for the sim pipeline.
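A plausible evaluation loop over a ported task, assuming the standard Gymnasium reset/step API and that the _check_success() verdict is surfaced through the step info dict — both assumptions about this port. make_env is injected; in practice it would wrap the gym.make call above.

```python
def success_rate(make_env, policy, episodes=10):
    """Roll out a policy and estimate its success rate.

    Assumes the 5-tuple Gymnasium step API and info["success"] reporting;
    make_env and policy are placeholders.
    """
    env = make_env()
    successes = 0
    for _ in range(episodes):
        obs, info = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
        # Assumed: _check_success() is reported via info["success"].
        successes += bool(info.get("success", False))
    env.close()
    return successes / episodes
```

A rate like this is presumably what task_progress.json records per task.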
- tb-pick-up-object — IBVS visual servoing pick (78% success rate on hardware)
- tb-place-object — depth-based placement
- tb-find-and-pick-up — chained center + pick pipeline
- tb-pick-and-place — full IBVS pick + raise-look-around place

These skills use robot_sdk instead of raw API calls.
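The chained skills above (e.g. tb-find-and-pick-up = center + pick) suggest a simple composition pattern. This is an illustrative sketch only, with skills modeled as callables that take a shared context and return a success boolean.

```python
def chain(*skills):
    """Compose skills into one; abort on the first failure, as a
    pick-then-place pipeline would. Skill signatures are assumptions."""
    def run(ctx):
        for skill in skills:
            if not skill(ctx):
                return False
        return True
    return run
```

Under this model, a shared skill library is a collection of such callables that later graphs can re-chain instead of redeveloping.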