It is now the evening of April 11, 2026.
Today, the PolyMarket arbitrage project officially launched in its fund format. Its codename is PathFinder, abbreviated PF; in its early stages it was also called PMA.
Today, I discussed with C1 some insights I've gained from orchestrating AI during this period.
I believe the current performance of Agent Harness has actually reached a sweet spot: after a few interviews, the AI can produce fairly good results—not 100% accurate, but not completely off-track either.
If we continue to focus on improving Harness now, the accuracy of task completion might increase further, but the cost would become increasingly high.
My current workflow typically involves about 30 minutes of interviewing, followed by 1–2 hours of execution. Completing one round of work takes about 2 hours.
Upon reflection, I realized that I don't necessarily need to scrutinize the specifications so meticulously. This could save me a lot of time during interviews. Often, I can simply follow the AI's recommended direction with a series of "yes, yes, yes."
I can actually let the AI naturally develop these details instead of forcing them into it.
Why do I feel the need to review? It's still the fear of the AI going off track. My time is limited, and I can't afford a day of poor progress.
If, every time I propose a direction, a few minutes of "yes, yes, yes" followed by a few more minutes of waiting were enough to get everything done, there would be no need for me to review at all. In that case, it would be better to let the AI grow naturally.
Therefore, I believe that Harness is no longer the main issue. Iteration speed is now the primary problem. Given that AI tokens are still relatively cheap, economic cost is not a concern for now.
I think a project should start from its initial commit, growing linearly from the origin/main ref through continuous PRs.
An OpenCode session can start from any origin/main point. Based on the task context, it can create a new worktree, execute the task, run integration tests, submit a PR upon completion, resolve conflicts, merge into origin/main, and clean up the worktree.
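That per-session lifecycle can be sketched as an ordered list of git commands. This is only a sketch: the branch naming scheme and `.worktrees/` layout are my assumptions, not anything OpenCode prescribes, and the execute-and-test step in the middle is elided.

```python
def worktree_lifecycle(task_id: str, base_ref: str = "origin/main") -> list[list[str]]:
    """Return the git commands, in order, that one session would run.

    Sketch only: branch/path conventions here are assumed, not a real API.
    """
    branch = f"task/{task_id}"
    path = f".worktrees/{task_id}"
    return [
        ["git", "fetch", "origin"],                                # start from fresh origin/main
        ["git", "worktree", "add", "-b", branch, path, base_ref],  # new worktree off main
        # ... the session executes the task and runs integration tests here ...
        ["git", "-C", path, "push", "-u", "origin", branch],       # open a PR from this branch
        ["git", "fetch", "origin"],                                # pick up anything merged meanwhile
        ["git", "-C", path, "rebase", "origin/main"],              # resolve conflicts against main
        ["git", "worktree", "remove", path],                       # clean up after the merge
    ]
```

Because each task lives in its own worktree and branch, many sessions can run this loop concurrently against the same repository.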
This task is already being handled quite well.
We might as well treat this task as a basic unit of orchestration.
We need to break down tasks and establish dependency management, building a task graph where each task can be bound to an OpenCode session for execution.
Tasks can be categorized as blocking (they must wait for other tasks to finish) or non-blocking (they can start immediately) based on their dependencies, though this classification is purely conceptual: we can never be 100% certain about the dependency relationship between two tasks.
In practice, all tasks can be executed in parallel, but some tasks are better executed after others are completed. This judgment is also left to the AI (and thus won't be 100% accurate).
Even some tasks that appear to be blocked can actually be broken down into smaller subtasks for parallel execution. Some of these subtasks may be blocking, while others are non-blocking—all determined by the AI.
A typical example is a comprehensive feature development task involving frontend, backend, and interface design. It can be broken down into three subtasks: frontend development, backend development, and interface design. Frontend and backend development can proceed in parallel, but both depend on the completion of interface design. Therefore, interface design is a non-blocking task, while frontend and backend development are blocking tasks.
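That decomposition can be written down as a tiny dependency map. The `ready` helper below is a sketch with illustrative task names: it returns every task whose prerequisites are complete, i.e. everything that could be handed to a worktree session in parallel right now.

```python
# Illustrative task names; the deps mapping is the example above, not a real scheduler.
deps = {
    "interface-design": set(),            # no prerequisites: can start immediately
    "frontend": {"interface-design"},     # must wait for the interface
    "backend": {"interface-design"},      # must wait for the interface
}

def ready(done: set[str]) -> list[str]:
    """Tasks whose prerequisites are all complete (runnable in parallel)."""
    return sorted(t for t, d in deps.items() if t not in done and d <= done)

print(ready(set()))                   # ['interface-design']
print(ready({"interface-design"}))    # ['backend', 'frontend'] — both at once
```

Once interface design completes, frontend and backend become ready in the same step, which is exactly the parallelism described above.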
Another perspective: incorrectly advancing a task that should be blocked is unacceptable in traditional scheduling but acceptable in AI scheduling. Just like a human, the AI can stand by on that task, waiting for all the tasks it depends on to complete.
During task execution, the AI can also check whether the environment meets the execution conditions. If not, it can create a prerequisite task to modify the environment until the conditions are satisfied before proceeding.
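A session-side sketch of that standby behavior, assuming hypothetical `check_env`, `spawn_prerequisite`, and `execute` hooks supplied by the orchestrator (none of these are a real OpenCode API):

```python
import time

def run_with_standby(task, check_env, spawn_prerequisite, execute,
                     poll_seconds=30):
    """Instead of failing a blocked task, stand by until it can run.

    All three callables are hypothetical orchestrator hooks; this is a
    sketch of the behavior described above, not a real implementation.
    """
    if not check_env(task):
        spawn_prerequisite(task)          # create a task that fixes the environment
        while not check_env(task):
            time.sleep(poll_seconds)      # stand by, like a human would
    return execute(task)
```

The key design choice is that an unmet precondition produces a new task rather than an error, so the graph grows at runtime instead of failing.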
Topological sorting, therefore, is not the core of scheduling but merely a heuristic tool. If you had an unlimited number of AI tokens, you could fork countless worktrees, each busily working on a different task, and let the AI make its own judgments.
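The heuristic role can be sketched with Python's standard-library graphlib, reusing the interface/frontend/backend example: the order it emits is a suggestion for which worktrees to start first, not a hard gate, since a prematurely started task can simply stand by.

```python
from graphlib import TopologicalSorter

# Assumed task names from the earlier example; each key depends on the
# tasks in its value set.
graph = {
    "frontend": {"interface-design"},
    "backend": {"interface-design"},
    "interface-design": set(),
}

order = list(TopologicalSorter(graph).static_order())
print(order)  # 'interface-design' comes first; frontend/backend in either order
```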