Pixel — an LLM co-maintainer in production
May 2026 · puzzlekreis.de · FastAPI + Next.js + React Native
puzzlekreis.de is a swap marketplace for jigsaw puzzles. Backend in Python (FastAPI, async SQLAlchemy, Postgres 16, Redis 7), web in Next.js 14 as a PWA, mobile in React Native (Android and iOS via EAS). Roughly 150 backend files, 30+ pytest suites, ~14k TS/TSX in the frontend, deployed on a Hetzner CX33. There are two maintainers: me, and an agent named Pixel.
Pixel works in branches rather than tickets. When a run finishes, there's a feat/... branch with tests sitting on origin — not a diff in a chat reply that I'd have to carry over by hand. I pull it in the morning, read through, merge it or hand it back with notes. Rubber-stamping branches is not part of the setup.
The loop
Pixel's runner is a sibling script to the bug-bounty runner I describe elsewhere — same Bash, different post-processing. Each run takes:
- a queue.txt with tasks: V2: instrument analytics event hooks · V3: extend listing editor · V4: …
- one prompt per task with context: what the task is, which files are relevant, which tests must pass, and crucially the pointer to convention files in the repo that the agent must read before writing
- a git for-each-ref snapshot before each run, so we can detect new branches afterwards
Pixel opens a fresh Claude CLI process, gets the prompt, and works. --add-dir points at the repo. The agent may do anything in the repo that doesn't commit to main. By the end it should have created a feat/... or fix/... branch, made tests pass, and pushed.
What Pixel does in a day
Run 6, task V2, completed May 1: instrument analytics event hooks. The prompt: "Instrument all backend service methods with calls to analytics.py V1, which isn't yet in main; use fail-soft try-imports so that the missing V1 doesn't break tests; create a test suite."
What Pixel produced:
Branch: feat/analytics-event-hooks (from main, commit 34953fa)
Files modified:
backend/app/services/auth.py
backend/app/services/listing.py
backend/app/services/listing_photo.py
backend/app/services/message.py
backend/app/services/rating.py
backend/app/services/swap.py
backend/app/services/user_account.py
Files created:
backend/tests/test_analytics_hooks.py
17 hook sites covering 18 distinct event types:
signup, login, logout
listing_create, listing_publish, listing_unpublish, listing_view
photo_upload
swap_request, swap_accept, swap_decline, swap_cancel,
swap_ship, swap_complete
message_sent
account_delete, account_restore
rating_given
That was a task that would probably have taken me 4-5 hours with breaks. Pixel needed 38 minutes. I read the branch in the morning, gave three notes back (variable naming, a missing edge case in a test, a small API consistency question), and Pixel addressed them in a second iteration of the run. Merged.
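The fail-soft try-import that the prompt asks for can be sketched roughly as follows. The import path app.services.analytics, the function name track_event, and the helpers load_tracker and make_emitter are assumptions for illustration; the real analytics.py V1 interface is not shown in this post.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("analytics_hooks")

Tracker = Callable[..., Any]


def load_tracker() -> Optional[Tracker]:
    """Return the analytics entry point, or None while V1 is not merged."""
    try:
        # Hypothetical import path and name; adjust to the real analytics.py.
        from app.services.analytics import track_event
    except ImportError:
        return None
    return track_event


def make_emitter(tracker: Optional[Tracker]) -> Callable[..., bool]:
    """Wrap a tracker so that a hook site can never break its caller."""

    def emit(event_type: str, **props: Any) -> bool:
        if tracker is None:
            return False  # analytics module missing: skip silently
        try:
            tracker(event_type, **props)
            return True
        except Exception:
            # Fail-soft: log the problem and let the service method continue.
            logger.warning("analytics event %s dropped", event_type, exc_info=True)
            return False

    return emit


emit_event = make_emitter(load_tracker())
```

A hook site in a service method then reduces to a single call such as emit_event("swap_accept", swap_id=swap.id), and the test suite can pass whether or not V1 is present.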
Across five completed runs (Runs 2-6), Pixel has done roughly 20 such tasks: analytics hooks, a photo-gallery lightbox, a brand migration, an FTS migration with a Postgres array-IMMUTABLE wrapper, Sentry CSP configuration, and a series of edge-case bugs before soft launch.
What I had to guard
Branch detection instead of status file. The bug-bounty runner expects a status.json as its success interface. Pixel's interface is a new branch on origin. Sentinel recovery works accordingly:
# Before the run: snapshot all branches
branches_before=$(cd "$REPO" && git for-each-ref --format='%(refname)' refs/heads refs/remotes/origin)
# … claude --print … runs …
# After the run, if the agent didn't write a status file:
branches_after=$(cd "$REPO" && git for-each-ref --format='%(refname)' refs/heads refs/remotes/origin)
new_branches=$(comm -13 <(echo "$branches_before") <(echo "$branches_after") | \
  grep -E 'feat/|fix/|chore/|test/' | head -3)
if [ -n "$new_branches" ]; then
  cat > "$STATUS_DIR/$next_id.status" <
If the agent crashes without a status but pushed a branch: that's success. If it produced neither: that's failure. If it has a branch locally but didn't push: that's my bug, because I forgot to enforce push in the run workflow.
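The three outcomes can also be written as a small pure function over the two git for-each-ref snapshots. This is a sketch, not the runner's actual code: the function names and return labels are made up for illustration. Unlike the Bash snippet, which counts any new ref as a hit, it separates pushed branches from local-only ones, which is one possible way to catch the third case.

```python
from typing import Iterable, Set

PREFIXES = ("feat/", "fix/", "chore/", "test/")
LOCAL = "refs/heads/"
REMOTE = "refs/remotes/origin/"


def new_branches(before: Iterable[str], after: Iterable[str], namespace: str) -> Set[str]:
    """Short names of task branches that appeared under one ref namespace."""
    added = set(after) - set(before)
    names = {ref[len(namespace):] for ref in added if ref.startswith(namespace)}
    return {name for name in names if name.startswith(PREFIXES)}


def classify_run(before: Iterable[str], after: Iterable[str]) -> str:
    """Map a before/after ref snapshot to success, unpushed, or failure."""
    before, after = list(before), list(after)
    if new_branches(before, after, REMOTE):
        return "success"   # a branch reached origin
    if new_branches(before, after, LOCAL):
        return "unpushed"  # work exists locally but was never pushed
    return "failure"       # nothing new at all
```

Feeding it the output of the two snapshots, split into lines, gives the runner one label per run to write into the status file.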
Persistent token telemetry. Each run writes a line to usage-tracker.json, bucketed by UTC day: input tokens, output tokens, model, task ID. The counts are approximated (words × 1.3), which is enough for daily totals that I query through a small Telegram bot. With that I know what a "feat/analytics-hooks" costs me: typically between 0.8 and 2 million tokens, depending on how much of the repo Pixel had to read.
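A minimal sketch of such a tracker, assuming the file is a JSON object keyed by UTC date with a list of entries per day. The post only states the fields and the UTC bucketing, so the layout and the function names approx_tokens, record_usage, and daily_total are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def approx_tokens(text: str) -> int:
    """Rough estimate: whitespace-separated words times 1.3."""
    return round(len(text.split()) * 1.3)


def record_usage(tracker: Path, task_id: str, model: str,
                 prompt: str, output: str,
                 now: Optional[datetime] = None) -> dict:
    """Append one run to the bucket of its UTC day and return the entry."""
    now = now or datetime.now(timezone.utc)
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    data = json.loads(tracker.read_text()) if tracker.exists() else {}
    entry = {
        "task_id": task_id,
        "model": model,
        "input_tokens": approx_tokens(prompt),
        "output_tokens": approx_tokens(output),
    }
    data.setdefault(day, []).append(entry)
    tracker.write_text(json.dumps(data, indent=2))
    return entry


def daily_total(tracker: Path, day: str) -> int:
    """Sum of input and output tokens for one UTC day (0 if none recorded)."""
    data = json.loads(tracker.read_text()) if tracker.exists() else {}
    return sum(e["input_tokens"] + e["output_tokens"] for e in data.get(day, []))
```

A bot command would then only need to call daily_total with today's UTC date.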
No git push --force. Pixel may git push origin feat/... but never force-push. If a branch exists and has diverged, the push fails, the run is marked blocked, and I clean it up by hand. I learned this in an early run, when Pixel had rebased a branch locally so that it no longer matched the remote, and a force-push would have erased the difference. Since then: force-push is a privilege, not a default.
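How the rule is enforced is not spelled out here. One option, assumed for this sketch, is that the runner vets the argument list of every git push before executing it. The function name is_force_push is hypothetical; the flags it checks (--force, -f, --force-with-lease, and a leading + on a refspec) are the standard git ways to force an update.

```python
from typing import Sequence

FORCE_FLAGS = {"--force", "-f", "--force-with-lease"}


def is_force_push(args: Sequence[str]) -> bool:
    """True if the arguments after `git push` would allow a forced update."""
    for arg in args:
        if arg in FORCE_FLAGS or arg.startswith("--force-with-lease="):
            return True
        # Combined short options such as -uf also carry the force flag.
        # This check is deliberately conservative and may over-block.
        if arg.startswith("-") and not arg.startswith("--") and "f" in arg[1:]:
            return True
        # A refspec with a leading plus forces the update of that one ref.
        if arg.startswith("+"):
            return True
    return False
```

With a plain push, git itself rejects the non-fast-forward update of a diverged branch, which is the failure that marks the run as blocked.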
Reading convention files is mandatory. The repo carries backend/CLAUDE.md, web/CLAUDE.md, mobile/CLAUDE.md with conventions — migration patterns, test structures, pre-commit hooks. The prompt explicitly tells Pixel: "First read the CLAUDE.md files in the subdirs you touch, and follow them." That dropped the rate of "Pixel invented its own mini-migration convention" branches to near zero.
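If the task prompt already names the relevant files, the matching convention files can be listed explicitly instead of being left to the agent. The sketch below is a variation on the prompt line quoted above, not the actual prompt builder; convention_files and prompt_preamble are hypothetical names, and the three top-level directories are taken from the CLAUDE.md paths mentioned.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

CONVENTION_ROOTS = ("backend", "web", "mobile")


def convention_files(touched: Iterable[str]) -> List[str]:
    """CLAUDE.md paths for every top-level subdir that the task touches."""
    roots = {PurePosixPath(p).parts[0] for p in touched if PurePosixPath(p).parts}
    return [f"{root}/CLAUDE.md" for root in CONVENTION_ROOTS if root in roots]


def prompt_preamble(touched: Iterable[str]) -> str:
    """Instruction block to put at the top of a task prompt."""
    files = convention_files(touched)
    if not files:
        return ""
    listing = "\n".join(f"- {f}" for f in files)
    return "First read these convention files and follow them:\n" + listing + "\n"
```

For the analytics task, passing the seven service files and the new test file would yield a preamble that points only at backend/CLAUDE.md.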
Where Pixel fails
Architecture decisions. When the prompt contains a task like "decide whether feature X belongs in backend or frontend, then implement", Pixel often gets it wrong. The agent tends to converge on existing patterns even when a new pattern would be better. So I make those decisions before the prompt; the prompt then encodes the choice.
Cross-cutting performance. Pixel can fix an N+1 query in a single method when I point at it. But when a new endpoint pushes p99 latency through the roof because an entirely different method holds a lock, Pixel won't get there. Performance profiles are out of view.
"Actually let me…" refactors. Pixel follows the task very literally. If I say "instrument the 17 hook sites", Pixel instruments 17 hook sites. The objection "three of these hooks really belong in a shared decorator, want me to refactor?" doesn't come up on its own. I make suggestions like that myself. It costs a bit of code hygiene now and then; in return I know pretty precisely what a run is going to produce.
Does this replace the maintainer?
No. What got replaced is my bottleneck.
Before Pixel, the bottleneck was that after a day of work I was too tired to start the next feature. Now it's writing precise task prompts and reviewing the branches. Reviewing suits me: curating, giving notes, refining conventions. I had to learn the prompt writing first: formulating a task so someone can complete it without follow-up questions is a skill of its own, and it's been helping me at work ever since, whenever I hand tasks to people.
puzzlekreis.de would be at maybe a third of today's feature density without Pixel. It remains a small webapp with two maintainers — one of them is available around the clock, the other decides what gets built at all.
puzzlekreis.de is live now. If the setup is interesting to you (or you'd like to swap puzzles), drop me a note.