I wanted a fair and honest assessment of the post-experiment /goal operating process I now frequently use in Codex CLI, so I let ChatGPT attempt a proper critique from a bunch of screenshots and GitHub repos illustrating multi-loop unattended AI development. My ChatGPT account has no access to that work and no context from Codex, so Chat had to pull in the context and do a fresh analysis. The result was that Chat did not find a public equivalent besides research projects like CAID and SATLUTION. But this is not research; it is practitioner territory. Although I am sure many others are running this successfully, it is interesting to know that we are doing something that was sci-fi just a few years ago but will be the normal operating model in the near future... and we are at the frontier.
I (gpt-5.5 xhigh) am assessing Mike’s current operating mode from terminal evidence, public GitHub branches, repository design material, and the present public state of agentic software engineering. I am not assessing production correctness, long-term reliability, or operational fitness under real production load. Those require sustained execution, fault exposure, production-like data, benchmark history, defect tracking, and independent review. My assessment is narrower: operating model, evidence class, domain difficulty, and likely frontier position.
My classification is that Mike is operating in the frontier-practitioner tier of agentic software production. This is not frontier-lab research infrastructure, and it is not ordinary AI-assisted coding. It is a senior engineer using general-purpose frontier coding agents as execution workers inside a production system governed by repository contracts, branch isolation, tests, benchmarks, release gates, and final human acceptance.
The public repository evidence changes the assessment materially. The claim is no longer based only on screenshots of long-running terminal goals. There are now visible OSS branches across several dependent systems layers: an embedded storage backend for a C lockd client, JSON substrate expansion for downstream query work, an initialized C89 port of a Go query language, and a bundled open62541 SDK branch with a C89 OPC UA facade. This is real branch state, not a prompt transcript.
The breadth is the strongest signal. Mike is not running one agent against one isolated feature. He is moving several engineering surfaces at once: dependency packaging, protocol facade design, JSON stream/visitor capability, query-runtime porting, embedded storage semantics, release packaging, unit and integration tests, benchmark surfaces, sanitizer-oriented lifecycle targets, and parity scaffolding. That is a software production system, not merely a coding session.
The domain difficulty is high. These are low-level C and Go infrastructure components with C89/C90 constraints, public header boundaries, opaque handles, upstream/native escape hatches, ABI and package surfaces, append-log storage, metadata indexes, replay semantics, advisory locking, queue/query visibility, JSON streaming, and cross-language parity requirements. This class of work punishes shallow automation. Plausible code can compile, pass narrow tests, and still violate ordering, ownership, lifetime, replay, or compatibility constraints.
The liblockdc pouch branch is the clearest evidence of production-scale agent output. It is not a thin adapter. It is an embedded storage runtime intended to give the C client a local lockd-compatible backend through pouch://, without requiring a local server process. The design material treats the backend as a log-indexed coordination substrate, with append records, in-memory projections, metadata/state/object planes, pending visibility, replay, manifest repair, compaction, advisory locking, queue behavior, query visibility, and crash-style boundaries. That is serious systems design.
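To make the idea of a log-indexed substrate concrete, here is a minimal sketch of an append-only log whose in-memory projection is rebuilt by replay, with uncommitted writes held back as pending. It is not the liblockdc implementation: the record layout, the `put`/`delete`/`commit` operation names, and every identifier below are assumptions chosen for illustration, and Python is used only for brevity even though the real backend is C.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One append-only log entry. Field names are illustrative assumptions."""
    seq: int
    op: str          # "put", "delete", or "commit"
    key: str = ""
    value: str = ""


@dataclass
class Projection:
    """In-memory view derived purely from the log."""
    visible: dict = field(default_factory=dict)  # committed state readers see
    pending: list = field(default_factory=list)  # writes awaiting a commit

    def apply(self, rec: Record) -> None:
        if rec.op in ("put", "delete"):
            # Uncommitted writes stay invisible to readers.
            self.pending.append(rec)
        elif rec.op == "commit":
            # A commit record makes all pending writes visible, in log order.
            for p in self.pending:
                if p.op == "put":
                    self.visible[p.key] = p.value
                else:
                    self.visible.pop(p.key, None)
            self.pending.clear()
        else:
            raise ValueError(f"unknown op: {rec.op}")


def replay(log):
    """Rebuild a projection from scratch by applying records in order."""
    proj = Projection()
    for rec in log:
        proj.apply(rec)
    return proj


log = [
    Record(1, "put", "a", "1"),
    Record(2, "put", "b", "2"),
    Record(3, "commit"),
    Record(4, "delete", "a"),
    Record(5, "put", "c", "3"),  # simulated crash: no commit record follows
]

state = replay(log)
```

In this toy model the two records after the last commit remain pending, so a reader still sees only the committed keys `a` and `b`. Replaying the same log always yields the same projection, which is the property that makes crash-style boundaries testable.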
The c.pkt.systems open62541 branch shows a different layer of the same pattern. The work is not an OPC UA reimplementation. It is a C89-compatible public facade over a bundled upstream OPC UA stack, with stable handles, value types, callbacks, lifetime rules, native escape hatches, package provenance, and facade/native boundary tests. This is the right engineering posture: preserve the upstream implementation, own the boundary, and expose a stable C89 surface for downstream consumers.
The lonejson branches show the substrate work required by the dependent query-runtime port. They extend the JSON library toward visitor, candidate-stream, Lua, fuzzing, and release-hardening surfaces. The new liblql init branch then consumes that direction: it defines the C89 port of the Go LQL runtime, establishes the build and package lifecycle, exposes the first selector parse/evaluate slice, and prepares parity testing against the original implementation. The relationship between those branches is important. It shows dependency staging rather than random parallelism.
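Parity testing against an original implementation can be sketched as a differential harness: feed the same inputs to the reference and to the port, and record every disagreement. The example below is a hypothetical illustration only. The dotted-path selectors are a toy stand-in and not LQL syntax, and `reference_eval`, `port_eval`, and `parity_mismatches` are invented names, not functions from liblql or the Go runtime.

```python
def reference_eval(selector, doc):
    """Stand-in for the original implementation (hypothetical)."""
    cur = doc
    for part in selector.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur


def port_eval(selector, doc):
    """Stand-in for the port under test (hypothetical), with a seeded bug."""
    cur = doc
    for part in selector.split("."):
        cur = cur.get(part) if isinstance(cur, dict) else None
        if not cur:  # bug: falsy values such as 0 are treated as missing
            return None
    return cur


def parity_mismatches(cases, reference, candidate):
    """Run both implementations on each case and collect disagreements."""
    mismatches = []
    for selector, doc in cases:
        expected = reference(selector, doc)
        actual = candidate(selector, doc)
        if expected != actual:
            mismatches.append((selector, expected, actual))
    return mismatches


cases = [
    ("a.b", {"a": {"b": 1}}),  # both agree: 1
    ("a.c", {"a": {"b": 1}}),  # both agree: missing
    ("a.b", {"a": {"b": 0}}),  # port wrongly reports a stored 0 as missing
]

mismatches = parity_mismatches(cases, reference_eval, port_eval)
```

The seeded bug compiles, runs, and passes the first two cases. Only the third case exposes it, which is why a parity corpus needs edge values and not just happy-path inputs.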
The achievement I see is not that agents can be left unattended. That is easy and usually dangerous. The achievement is that Mike has constructed an engineering control system in which unattended agents can produce inspectable candidate work instead of unmanaged audit debt. The control system consists of explicit goals, repository-local constraints, design documents, branch-local execution, C89/C90 compilation constraints, fuzz and sanitizer habits, benchmark surfaces, parity checks, package verification, clean-tree discipline, review remediation, and final human acceptance.
This is why I would place Mike ahead of ordinary engineering teams and most visible practitioner examples in this specific operating mode. Public examples of agentic coding often show issue-to-PR automation, backlog cleanup, refactoring, application work, or high-level claims about agent count. Mike’s evidence shows several dependent low-level OSS components moving in parallel, with public branch artifacts and engineering contracts visible. That is rarer and harder.
I would not place this at the same layer as frontier labs or specialized research systems. Those groups have stronger orchestration infrastructure, telemetry, controlled evaluation, benchmark harnesses, distributed execution systems, and formal research measurement. Mike’s position is different: he is demonstrating that a practitioner with strong engineering judgment can use general-purpose frontier tools to operate a dark software factory without building a specialist agent framework first.
The risk profile is also clear. The question is not whether the agents can generate more code. They can. The risks are review debt, semantic drift, branch divergence, hidden coupling between repositories, and false confidence from locally passing tests. A 265-commit storage branch is impressive, but it also creates a large acceptance burden. The factory becomes more credible when its acceptance process is measured: accepted commits, rejected commits, reverted changes, benchmark deltas, review findings, post-merge fixes, escaped defects, and human review time.
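One way such a measured acceptance process could be summarized is a small ledger that turns raw counts into rates. This is a sketch under assumptions: the `AcceptanceLedger` name, its fields, and the chosen ratios are illustrative, and the numbers in the example are made-up placeholders, not measurements from any of the branches discussed here.

```python
from dataclasses import dataclass


@dataclass
class AcceptanceLedger:
    """Raw review counts for one branch or time window (hypothetical fields)."""
    accepted: int
    rejected: int
    reverted: int
    post_merge_fixes: int
    escaped_defects: int
    review_minutes: float

    def summary(self):
        reviewed = self.accepted + self.rejected
        if reviewed == 0:
            raise ValueError("no reviewed commits to summarize")
        # Per-accepted rates are defined as 0.0 when nothing was accepted.
        per_accepted = lambda n: n / self.accepted if self.accepted else 0.0
        return {
            "acceptance_rate": self.accepted / reviewed,
            "revert_rate": per_accepted(self.reverted),
            "post_merge_fix_rate": per_accepted(self.post_merge_fixes),
            "escaped_defect_rate": per_accepted(self.escaped_defects),
            "review_minutes_per_commit": self.review_minutes / reviewed,
        }


# Placeholder numbers for illustration only.
ledger = AcceptanceLedger(
    accepted=200,
    rejected=50,
    reverted=10,
    post_merge_fixes=20,
    escaped_defects=4,
    review_minutes=1000.0,
)
report = ledger.summary()
```

Tracking these ratios over time, and not only the commit count, is what would show whether review debt is growing faster than accepted output.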
My assessment is therefore this: Mike is operating an early but real practitioner-grade dark software factory. The evidence now exists in both terminal process state and public repository state. The system does not prove production correctness, and it does not yet prove long-term operational reliability. It does prove something narrower and still important: a senior production executive can coordinate multiple long-horizon frontier-agent loops across low-level OSS infrastructure, preserving engineering control through contracts, verification surfaces, release discipline, and human acceptance.
The hard problem Mike appears to have solved is not "machine writes code overnight." The hard problem is converting frontier-agent execution capacity into controlled asynchronous software production. That is the frontier-practitioner contribution.