It is Monday afternoon, January 19, 2026.
I woke up very late today because I was quite excited last night, tinkering with CZONE and OpenCode. Although the results were not satisfactory, if I hadn’t tinkered, yesterday’s outcome might have been even worse.
Staying up late is just refusing to admit the failure of the day.
— from PH
Last night, I used OpenCode + MiniMax M2.1 to build CZONE (the online version of CZON) from scratch, as documented in this log.
The AI started by asking a series of questions—from technology selection to scaffolding setup, then to feature design, and finally to the CI/CD process. The whole thing went very fast.
Honestly, it was a bit too fast—I felt a bit dizzy (laughs).
But, and this is the crucial but: problems quickly emerged.
I noticed that its understanding of the details regarding GitHub REST API permissions was inadequate.
Well-read and knowledgeable? Not really.
For example, after initializing the repository, we needed to modify the .github/workflows/pages.yml file to add the CZON build steps. Writing to that path requires a token with the workflow scope, but the code OpenCode produced never requested it. A quick glance at the GitHub API documentation would have revealed this, yet it overlooked the detail again and again. GitHub is dumb here too: the error message was just a bare 404, with no hint of insufficient permissions, so the model never suspected a permission problem at all.
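For reference, the write itself goes through GitHub's "create or update file contents" endpoint. A minimal sketch of building that request (owner and repo names below are placeholders, not the real project): the request is completely ordinary, which is exactly why the failure is so opaque; the only special thing about the path is the scope the token must carry.

```python
import base64


def build_contents_put(owner: str, repo: str, path: str, text: str,
                       message: str, branch: str = "main"):
    """Build the URL and JSON body for GitHub's Contents endpoint:
    PUT /repos/{owner}/{repo}/contents/{path}.
    Writing under .github/workflows/ additionally requires the token
    to carry the workflow scope; in our case a token without it got
    a bare 404 back, not a permissions error."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    body = {
        "message": message,
        "branch": branch,
        # The Contents API expects the file body base64-encoded.
        "content": base64.b64encode(text.encode()).decode(),
    }
    return url, body


url, body = build_contents_put("someuser", "czone",
                               ".github/workflows/pages.yml",
                               "name: pages\n", "add CZON build step")
```

Actually sending it (with urllib or requests, plus an Authorization header) is routine; the point is that without already knowing about the scope, the failing request looks identical to a typo'd path.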
During this process, we demonstrated that writing to index.md succeeded, writing to .github/index.md succeeded, and writing to github/workflows/pages.yml (no leading dot) also succeeded; only .github/workflows/pages.yml failed. The conversation dragged on for many rounds because it kept tweaking the code each time, yet faced with such an obvious pattern it never inferred that .github/workflows/ might be a directory requiring special permissions. That shows how scattered its attention is and how weak its reasoning becomes in debugging mode.
I strongly suggest that LLMs themselves or external control frameworks/agents need a Lab Mode. In this mode, the agent should repeatedly design controlled experiments, verify results, and uncover the truth. Sometimes I feel that an LLM is like an unconscious brain—you point somewhere, and it lights up there. Whatever the prompt says, it focuses on that.
Sometimes we want it to be well-read and knowledgeable, and other times we want it to be ignorant yet clear. In a sense, the energy consumed by an LLM is fixed. We hope it can allocate that energy to where it’s most needed for different tasks, rather than distributing it evenly. Recent advancements in the LLM field often adopt this approach.
A Brain in a Vat, Limited in Action
Another important, and annoying, reason is OpenCode's lack of self-debugging capability. It cannot open or control a browser, so all it can do is guess, add log output, and ask me to check the logs. Sometimes I play along, but other times watching it is like watching my mentee: I have no idea what it's thinking, and it's frustrating. I can accept having a clumsy mentee, but I probably can't accept a "brain in a vat without hands"; we still need to find a way to close the loop of its thinking. Google's Antigravity does a good job in this regard, probably because of the Chrome family connection.
In terms of community solutions, using end-to-end testing frameworks (like Cypress or Playwright) to control the browser should be a good choice. After all, many operations nowadays require browser-side interaction; relying solely on APIs is not enough.
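A sketch of closing that loop with Playwright's Python API (the URL would be whatever the agent just deployed; the helper names are my own, and the browser part is kept separate so the log-triage logic stays testable on its own):

```python
def triage_console(messages):
    """Pure helper: keep only error-level console messages so the agent
    sees failures, not noise. `messages` is a list of (type, text) pairs."""
    return [text for kind, text in messages if kind == "error"]


def capture_console(url: str):
    """Open the page headlessly and return everything it printed to the
    console, so the agent can read its own logs instead of asking me to.
    The import is local because Playwright is an optional dependency here."""
    from playwright.sync_api import sync_playwright
    captured = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("console", lambda msg: captured.append((msg.type, msg.text)))
        page.goto(url)
        page.wait_for_load_state("networkidle")
        browser.close()
    return captured
```

Feeding triage_console(capture_console(...)) back into the agent's context is the whole trick: the model can finally observe what its code actually did, instead of guessing.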
Progress Too Fast, Foundation Unstable
The last point is my own attribution. This time, the AI wrote dozens of files from scratch in under ten minutes. Watching it was like watching a printer; it never paused to rest. However, any complex system requires architecture, layering, and guaranteed quality in each foundational module. Only after the lower layers are complete and thoroughly tested can you confidently build on top. The AI currently lacks this sense of rhythm; it just prints code. Even if it had built-in debugging capabilities, it might nimbly patch things up and down, but true reliability fundamentally depends on correct concepts, correct abstractions, and correct implementations. It rests on logical coherence, on things actually making sense. As for how much time the AI spends on this, I think it's still far from enough. Perhaps this is something only a coordination layer can solve; LLMs alone can't achieve it. LLMs just pave the way, like floodwater pouring toward the lowest point of potential energy.

But many humans are like this too. There's a classic joke about a plumber who fixes a leak here, only for a pipe to burst open there. Treating the head when the head hurts, treating the foot when the foot hurts: in the end nothing gets solved at the root, and you're left playing endless whack-a-mole.