Building autonomous skill development across 115 RoboCasa kitchen tasks. Strategy: Incremental Comparison — solve tasks one by one, reusing and growing a shared skill library.
The end state: start the system, walk away, come back to find skills developed across all 115 tasks with a growing shared skill library.
python3 task_sequencer.py --strategy incremental — the top-level loop that drives autonomous iteration across all 115 tasks.
task_progress.json tracks solved tasks and success rates — the core of Strategy 2: when planning task N, the planner sees solutions from tasks 1..N-1 and can identify commonalities.
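A hypothetical sketch of what the incremental loop inside task_sequencer.py might look like. The file name task_progress.json comes from the notes above; the progress schema, the solve_task callable, and the resume-by-skipping behavior are assumptions for illustration, not the real implementation.

```python
import json
from pathlib import Path

# Placeholder progress store; the real schema is not shown in the notes.
PROGRESS_FILE = Path("task_progress.json")

def load_progress():
    if PROGRESS_FILE.exists():
        return json.loads(PROGRESS_FILE.read_text())
    return {"solved": {}}  # task name -> measured success rate

def run_incremental(tasks, solve_task):
    """Solve tasks one by one; the planner for task N sees tasks 1..N-1."""
    progress = load_progress()
    for task in tasks:
        if task in progress["solved"]:
            continue  # already solved in an earlier run; skip on resume
        # Prior solutions are passed in so commonalities can be reused.
        rate = solve_task(task, prior=progress["solved"])
        progress["solved"][task] = rate
        # Persist after every task so the run can be interrupted safely.
        PROGRESS_FILE.write_text(json.dumps(progress, indent=2))
    return progress
```

Persisting after every task is what makes the "start the system, walk away" loop restartable.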
_check_success() and _get_obj_cfgs() are used for structural comparison between tasks.

Task switching: currently the sim loads one task at startup (--task flag); a POST /task/switch endpoint on the sim server HTTP API is needed to change environments without restarting the process.

Multi-provider dev agents: dev agents now run on Ollama, Gemini, or Anthropic via OpenClaw from the same orchestrator.
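A minimal client for the task-switch endpoint could look like the sketch below. The port, JSON payload shape, and response fields are assumptions, not the documented API.

```python
import json
import urllib.request

def switch_task(task_name, base_url="http://localhost:8000"):
    """POST /task/switch to change the sim environment without a restart.

    The {"task": ...} payload and JSON response are assumed shapes.
    """
    req = urllib.request.Request(
        f"{base_url}/task/switch",
        data=json.dumps({"task": task_name}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Keeping the sim process alive across task switches avoids paying scene-loading cost 115 times per sweep.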
claude-agent-sdk is used to spawn dev agents. Parallel development: run independent dev agents against multiple sim + agent_server instances in parallel — each env has its own session, kitchen layout, and isolated state.
Wired the root skill's auto-generated test into the pipeline and added a camera frame viewer to the dashboard.
The root skill's auto-generated test calls _check_success(). The pipeline retries dev up to 3 times on failure, then sends the skill to review.

Shared skill library: skills are currently siloed inside each graph's skills/ directory. Building a shared skills/ directory that persists across graphs, so skills from task 1 are available when solving task 2.

Dashboard: real-time hex gallery showing skill development progress. Each hex blinks to show active agent work.
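The retry-then-review policy can be sketched as a small loop. run_dev, check_success, and send_to_review are hypothetical callables standing in for the real pipeline stages.

```python
def develop_skill(run_dev, check_success, send_to_review, max_attempts=3):
    """Retry dev up to max_attempts times; on persistent failure, hand off
    to human review instead of promoting. All callables are placeholders."""
    for attempt in range(1, max_attempts + 1):
        run_dev(attempt)
        if check_success():  # in the real pipeline this wraps _check_success()
            return "promoted"
    send_to_review()
    return "review"
```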
Two key features reduce human involvement in the pipeline:

- Autonomous mode (--autonomous): skips the review gate and auto-promotes completed skills.
- Auto-generated success tests: when task_env is set in graph.json, the root skill automatically gets a test that calls GET /task/success.

Architecture: the orchestrator manages a dependency tree of skills, and AI agents (Claude Code) develop each skill through a structured pipeline.
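The auto-generated root-skill test boils down to asking the sim server whether the task's success condition currently holds. In this hedged sketch, the URL is from the notes but the {"success": bool} response shape is an assumption.

```python
import json
import urllib.request

def task_succeeded(base_url="http://localhost:8000"):
    """GET /task/success and report the task's _check_success() verdict.

    Assumes the endpoint returns JSON like {"success": true}.
    """
    with urllib.request.urlopen(f"{base_url}/task/success") as resp:
        return bool(json.loads(resp.read())["success"])
```

Because the verdict comes from the task's own _check_success(), no per-skill test code needs to be written or reviewed.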
task_env metadata links skill graphs to sim tasks. submit_and_wait.py lets agents test code via the agent server. The sim server exposes the same protocol bridges as real hardware (ZMQ arm, RPC base, WebSocket camera), so the agent server connects transparently.
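Conceptually, submit_and_wait.py is a submit-then-poll helper. This sketch uses injected submit/get_status callables because the real agent-server API is not shown in these notes.

```python
import time

def submit_and_wait(submit, get_status, timeout_s=60.0, poll_s=0.01):
    """Submit a job to the agent server, then poll until it finishes.

    submit() -> job id, get_status(job_id) -> status string; both are
    placeholders for the real agent-server calls.
    """
    job_id = submit()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status in ("done", "failed"):
            return status
        time.sleep(poll_s)  # avoid hammering the server while waiting
    raise TimeoutError(f"job {job_id} did not finish in {timeout_s}s")
```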
Sim server HTTP endpoints: /task/success, /task/info, /reset, /plan, /perceive. GET /task/success calls the task's _check_success(), so no custom test code is needed; POST /reset resets the current environment.

All 115 RoboCasa kitchen tasks are ported to ManiSkill3/SAPIEN with the TidyVerse robot (Franka Panda + mobile base). Each task has a _check_success() method for automated evaluation.
Tasks are instantiated via gym.make("RoboCasa-Pn-P-Counter-To-Cab-v0").

Prior hardware skills: previously developed skills from OpenClaw sessions, available as reference for the sim pipeline.
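A plausible evaluation loop over a ported task, assuming the standard Gymnasium reset/step API and that the _check_success() verdict is surfaced through the step info dict — both assumptions about this port. make_env is injected; in practice it would wrap the gym.make call above.

```python
def success_rate(make_env, policy, episodes=10):
    """Roll out a policy and estimate its success rate.

    Assumes the 5-tuple Gymnasium step API and info["success"] reporting;
    make_env and policy are placeholders.
    """
    env = make_env()
    successes = 0
    for _ in range(episodes):
        obs, info = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
        # Assumed: _check_success() is reported via info["success"].
        successes += bool(info.get("success", False))
    env.close()
    return successes / episodes
```

A rate like this is presumably what task_progress.json records per task.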
- tb-pick-up-object — IBVS visual servoing pick (78% success rate on hardware)
- tb-place-object — depth-based placement
- tb-find-and-pick-up — chained center + pick pipeline
- tb-pick-and-place — full IBVS pick + raise-look-around place

These skills use robot_sdk instead of raw API calls.
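The chained skills above (e.g. tb-find-and-pick-up = center + pick) suggest a simple composition pattern. This is an illustrative sketch only, with skills modeled as callables that take a shared context and return a success boolean.

```python
def chain(*skills):
    """Compose skills into one; abort on the first failure, as a
    pick-then-place pipeline would. Skill signatures are assumptions."""
    def run(ctx):
        for skill in skills:
            if not skill(ctx):
                return False
        return True
    return run
```

Under this model, a shared skill library is a collection of such callables that later graphs can re-chain instead of redeveloping.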