Pixel — an LLM co-maintainer in production

May 2026 · puzzlekreis.de · FastAPI + Next.js + React Native

puzzlekreis.de is a swap marketplace for jigsaw puzzles. Backend in Python (FastAPI, async SQLAlchemy, Postgres 16, Redis 7), web in Next.js 14 as a PWA, mobile in React Native (Android and iOS via EAS). Roughly 150 backend files, 30+ pytest suites, ~14k lines of TS/TSX in the frontend, deployed on a Hetzner CX33. There are two maintainers: me, and an agent named Pixel.

Pixel doesn't work in tickets. Pixel works in branches. When Pixel finishes a run, the result isn't a recommendation, isn't a diff in the chat reply — it's a feat/... branch with tests and a commit pushed to origin. I pull it in the morning, read through, merge it or hand it back with notes.

This text describes how that works, how I guarded it against the obvious failure modes, and why it still isn't an end-boss agent whose branches I just rubber-stamp.

The loop

Pixel's runner is a sibling script to the bug-bounty runner I describe elsewhere — same Bash, different post-processing. Each run takes:

a queue.txt with tasks: V2: instrument analytics event hooks · V3: extend listing editor · V4: …
one prompt per task with context — what the task is, which files are relevant, which tests must pass, and crucially the pointer to convention files in the repo that the agent must read before writing
a git for-each-ref snapshot before each run, so we can detect new branches afterwards

Each run starts a fresh Claude CLI process, hands it the prompt, and lets it work. --add-dir points at the repo. The agent may do anything in the repo except commit to main. By the end it should have created a feat/... or fix/... branch, made the tests pass, and pushed.
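The queue step can be sketched in a few lines. This is a minimal illustration, not the actual runner (which is Bash): it assumes queue.txt holds one task per line in the form "ID: title", and the names Task, parse_queue, and next_task are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class Task(NamedTuple):
    task_id: str
    title: str


def parse_queue(text: str) -> List[Task]:
    """Parse queue.txt content, assuming one 'ID: title' entry per line."""
    tasks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        task_id, sep, title = line.partition(":")
        if not sep or not task_id.strip() or not title.strip():
            raise ValueError(f"malformed queue line: {raw!r}")
        tasks.append(Task(task_id.strip(), title.strip()))
    return tasks


def next_task(tasks: Iterable[Task], done_ids: Iterable[str]) -> Optional[Task]:
    """Return the first queued task that has no recorded result yet."""
    done = set(done_ids)
    for task in tasks:
        if task.task_id not in done:
            return task
    return None
```

Raising on a malformed line, rather than skipping it, is a deliberate choice in this sketch: a silently dropped task is harder to notice the next morning than a run that refuses to start.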

What Pixel does in a day

Run 6, task V2, completed May 1: instrument analytics event hooks. The prompt: "Instrument all backend service methods with calls to analytics.py V1, which isn't yet in main; use fail-soft try-imports so that the missing V1 doesn't break tests; create a test suite."

What Pixel produced:

Branch:  feat/analytics-event-hooks (from main, commit 34953fa)
Files modified:
  backend/app/services/auth.py
  backend/app/services/listing.py
  backend/app/services/listing_photo.py
  backend/app/services/message.py
  backend/app/services/rating.py
  backend/app/services/swap.py
  backend/app/services/user_account.py
Files created:
  backend/tests/test_analytics_hooks.py

17 hook sites covering 18 distinct event types:
  signup, login, logout
  listing_create, listing_publish, listing_unpublish, listing_view
  photo_upload
  swap_request, swap_accept, swap_decline, swap_cancel,
    swap_ship, swap_complete
  message_sent
  account_delete, account_restore
  rating_given

That was a task that would probably have taken me 4-5 hours with breaks. Pixel needed 38 minutes. I read the branch in the morning, gave three notes back (variable naming, a missing edge case in a test, a small API consistency question), and Pixel addressed them in a second iteration of the run. Merged.
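The fail-soft try-import that the V2 prompt asks for might look roughly like the sketch below. It is not taken from the branch: the module path app.services.analytics, the function name track_event, and the helpers safe_emit and emit_event are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("analytics_hooks")

try:
    # Hypothetical import path and function name; the real V1 module may differ.
    from app.services.analytics import track_event as _track_event
except ImportError:
    # V1 is not in main yet: degrade to a no-op instead of breaking imports and tests.
    _track_event = None


def safe_emit(tracker, name: str, **props) -> bool:
    """Call tracker(name, **props); never let an analytics failure reach the caller."""
    if tracker is None:
        return False
    try:
        tracker(name, **props)
        return True
    except Exception:
        logger.warning("analytics event %r failed", name, exc_info=True)
        return False


def emit_event(name: str, **props) -> bool:
    """Hook entry point for service methods, e.g. emit_event("signup", user_id=42)."""
    return safe_emit(_track_event, name, **props)
```

The point of splitting safe_emit from emit_event is testability: a test suite can pass in a fake tracker, or one that raises, without the analytics module existing at all.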

Across 5 completed runs (Runs 2-6), Pixel has done roughly 20 such tasks: analytics hooks, photo-gallery lightbox, brand migration, an FTS migration with a Postgres array-IMMUTABLE wrapper, Sentry CSP configuration, and a series of edge-case bugs before soft launch.

What I had to guard

Branch detection instead of status file. The bug-bounty runner expects a status.json as its success signal. Pixel's success signal is a new branch on origin, so the sentinel recovery compares branch snapshots from before and after the run:

# Before the run: snapshot all branches
branches_before=$(cd "$REPO" && git for-each-ref --format='%(refname)' refs/heads refs/remotes/origin)

# … claude --print … runs …

# After the run, if the agent didn't write a status file:
branches_after=$(cd "$REPO" && git for-each-ref --format='%(refname)' refs/heads refs/remotes/origin)
new_branches=$(comm -13 <(echo "$branches_before") <(echo "$branches_after") | \
                grep -E 'feat/|fix/|chore/|test/' | head -3)

if [ -n "$new_branches" ]; then
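  # The heredoc body did not survive in this text; the two fields below are an assumed placeholder.
  cat > "$STATUS_DIR/$next_id.status" <<EOF
status=success
branches=$(echo "$new_branches" | tr '\n' ' ')
EOF
fi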

If the agent crashes without a status but pushed a branch: that's success. If it produced neither: that's failure. If it has a branch locally but didn't push: that's my bug, because I forgot to enforce push in the run workflow.
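The three outcomes for a run that left no status file can be written down as a small classifier over the two ref snapshots. This is a sketch under assumptions: the function names and the outcome labels ("success", "unpushed", "failure") are hypothetical, and the prefix match is slightly stricter than the grep in the listing above because it only looks at the start of the branch name.

```python
from typing import List, Sequence

TASK_PREFIXES = ("feat/", "fix/", "chore/", "test/")
REMOTE = "refs/remotes/origin/"
LOCAL = "refs/heads/"


def new_task_branches(before: Sequence[str], after: Sequence[str], ref_prefix: str) -> List[str]:
    """Branch names under ref_prefix that exist in `after` but not in `before`."""
    known = set(before)
    names = []
    for ref in after:
        if ref in known or not ref.startswith(ref_prefix):
            continue
        name = ref[len(ref_prefix):]
        if name.startswith(TASK_PREFIXES):
            names.append(name)
    return sorted(names)


def classify_run(before: Sequence[str], after: Sequence[str]) -> str:
    """Classify a run that wrote no status file, based on ref snapshots."""
    if new_task_branches(before, after, REMOTE):
        return "success"   # a task branch reached origin
    if new_task_branches(before, after, LOCAL):
        return "unpushed"  # work exists locally only: a workflow bug, not an agent failure
    return "failure"       # neither status file nor branch
```

Separating remote-tracking refs from local heads is what makes the third case visible; a single combined snapshot would count a local-only branch as a success.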

Persistent token telemetry. Each run writes an entry to usage-tracker.json, bucketed by UTC day: input tokens, output tokens, model, task ID. An approximation (words × 1.3) is accurate enough for the daily totals that I query through a small Telegram bot. With that I know what a branch like feat/analytics-event-hooks costs me: typically between 0.8 and 2 million tokens, depending on how much of the repo Pixel had to read.
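A tracker along these lines fits in a few functions. The sketch below is an assumption about the file layout (a JSON object keyed by UTC date, each value a list of entries); only the file name, the recorded fields, and the words × 1.3 estimate come from the text, and the function names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def estimate_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words times 1.3."""
    return round(len(text.split()) * 1.3)


def record_usage(path, task_id: str, model: str, input_text: str,
                 output_text: str, now: Optional[datetime] = None) -> dict:
    """Add one entry to the tracker file under the current UTC day."""
    now = now or datetime.now(timezone.utc)
    # Convert to UTC first so runs around midnight land in the right bucket.
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    tracker = Path(path)
    data = json.loads(tracker.read_text()) if tracker.exists() else {}
    entry = {
        "task_id": task_id,
        "model": model,
        "input_tokens": estimate_tokens(input_text),
        "output_tokens": estimate_tokens(output_text),
    }
    data.setdefault(day, []).append(entry)
    tracker.write_text(json.dumps(data, indent=2))
    return entry


def daily_total(path, day: str) -> int:
    """Sum of input and output tokens for one UTC day (format YYYY-MM-DD)."""
    tracker = Path(path)
    if not tracker.exists():
        return 0
    data = json.loads(tracker.read_text())
    return sum(e["input_tokens"] + e["output_tokens"] for e in data.get(day, []))
```

A bot command for daily totals would then only need to call daily_total with today's UTC date. The read-modify-write here is not safe against two runs writing at the same moment, which is acceptable only as long as runs are sequential.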

No git push --force. Pixel may git push origin feat/... but never force-push. If a branch already exists on origin and has diverged, the push fails, the run is marked blocked, and I clean it up by hand. I learned this in an early run, when Pixel had rebased a branch locally so that it no longer matched the remote; a force-push would have overwritten the remote history. Since then: force-push is a privilege, not a default.
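One way to enforce such a rule is to inspect the arguments of every git invocation before it runs. The text does not say how the rule is enforced here, so the check below is purely an illustrative sketch; is_force_push is a hypothetical helper, and it is deliberately conservative.

```python
from typing import Sequence


def is_force_push(git_args: Sequence[str]) -> bool:
    """Return True if the git arguments (without the leading 'git') describe a forced push."""
    if not git_args or git_args[0] != "push":
        return False
    for arg in git_args[1:]:
        if arg.startswith("--force"):
            return True  # --force, --force-with-lease[=...], --force-if-includes
        if arg.startswith("+"):
            return True  # a +refspec forces the update of that ref
        if arg.startswith("-") and not arg.startswith("--") and "f" in arg[1:]:
            # -f, or bundled short flags such as -fu. Conservative: this also
            # flags a short option whose attached value contains an "f".
            return True
    return False
```

A wrapper around git could call this first and mark the run as blocked instead of executing the command. Erring on the side of false positives matches the stance above: a blocked run costs a manual cleanup, an unwanted force-push costs history.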

Reading convention files is mandatory. The repo carries backend/CLAUDE.md, web/CLAUDE.md, mobile/CLAUDE.md with conventions — migration patterns, test structures, pre-commit hooks. The prompt explicitly tells Pixel: "First read the CLAUDE.md files in the subdirs you touch, and follow them." That dropped the rate of "Pixel invented its own mini-migration convention" branches to near zero.
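The instruction can also be made explicit per task by deriving the convention files from the files a task is expected to touch. The sketch below is a variation on the quoted prompt line, not the prompt itself: the three directory names come from the text, while convention_files and prompt_preamble are hypothetical helpers.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Top-level directories that carry a CLAUDE.md, as listed in the text.
CONVENTION_DIRS = ("backend", "web", "mobile")


def convention_files(touched: Iterable[str]) -> List[str]:
    """Map repo-relative file paths to the CLAUDE.md files of their top-level dirs."""
    tops = set()
    for path in touched:
        parts = PurePosixPath(path).parts
        if parts:
            tops.add(parts[0])
    return [f"{d}/CLAUDE.md" for d in CONVENTION_DIRS if d in tops]


def prompt_preamble(touched: Iterable[str]) -> str:
    """Build the opening lines of a task prompt that name the files to read first."""
    files = convention_files(touched)
    if not files:
        return ""
    lines = ["First read these convention files and follow them:"]
    lines += [f"- {f}" for f in files]
    return "\n".join(lines)
```

Naming the files outright removes one step the agent could skip, at the cost of the prompt author having to list the relevant paths up front.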

Where Pixel fails

Architecture decisions. When the prompt contains a task like "decide whether feature X belongs in backend or frontend, then implement", Pixel often gets it wrong. The agent tends to converge on existing patterns even when a new pattern would be better. That's fine — those decisions I make before the prompt, and the prompt then encodes the choice.

Cross-cutting performance. Pixel can fix an N+1 in a single method when I point at it. Pixel cannot notice that a new endpoint pushes p99 latency through the roof because another method holds a lock. Performance profiles are out of view.

"Actually let me…" refactors. Pixel follows the task very literally. If I say "instrument the 17 hook sites", Pixel instruments 17 hook sites. Pixel does not say "Petrit, three of these hooks really belong in a shared decorator, want me to refactor?" Suggestions like that I make myself, or let through. I give up some code hygiene; I gain reproducibility.

Why this isn't the end-boss

I often get asked whether this system "replaces the maintainer". The answer is no, but it's also not "the human is still smarter". The answer is: the bottleneck moved.

Before Pixel my bottleneck was the fact that after a day of work I was too tired to start the next feature. With Pixel my bottleneck is writing precise task prompts and reviewing the branches. The latter is work I enjoy — curating, giving notes, refining conventions. The former is a skill I didn't have before Pixel: I had to learn to formulate a task so that an agent can complete it without me. That practice has improved my task formulation for human collaborators along the way.

puzzlekreis.de would be at maybe a third of today's feature density without Pixel. With Pixel it's a small webapp with two maintainers: one is available 24/7 and 1000× cheaper per hour than a human, and the other is still the one deciding what's worth building.



puzzlekreis.de goes into soft launch in the coming weeks. If the setup is interesting to you (or you'd like to swap puzzles), drop me a note.
