{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Irvine Afri Dwicahya",
  "home_page_url": "https://irvineafri.com",
  "feed_url": "https://irvineafri.com/feed.json",
  "description": "Backend engineer. Payments, lending, and the boring parts of distributed systems.",
  "items": [
    {
      "id": "https://irvineafri.com/blog/porting-to-react-native-overnight-with-an-agent-loop",
      "url": "https://irvineafri.com/blog/porting-to-react-native-overnight-with-an-agent-loop",
      "title": "Porting a Next.js app to React Native overnight, with an agent loop",
      "content_html": "\u003cblockquote\u003e\n\u003cp\u003eHow I ran a single-threaded fleet of Claude Code agents against a 115-task queue and got 50 tasks (5 of 12 milestones) shipped to \u003ccode\u003emain\u003c/code\u003e while I slept, without ever letting an agent touch git.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThis is a working journal, not a polished pitch. The system is still running as I write. The numbers below are accurate as of the most recent milestone push.\u003c/p\u003e\n\u003ch2 id=\"the-setup\"\u003eThe setup\u003c/h2\u003e\n\u003cp\u003eI run KelasJenius, an interactive learning platform for Indonesian SMP/SMA students. The web app (Next.js 14 + Fastify + Postgres) is mature: auth, subscriptions, quizzes, duels over WebSocket, AI tutor, parent portal, the works.\u003c/p\u003e\n\u003cp\u003eThe mobile app needs to ship to the App Store and Google Play with 1-for-1 parity to the web student experience, plus native Apple IAP and Google Play Billing. That's a ~15-week solo project at full-time pace. I do not have 15 weeks. 
I have nights and weekends.\u003c/p\u003e\n\u003cp\u003eSo I built a system that lets a swarm of one agent at a time, working in sequence, walk a dependency-ordered task queue while I sleep.\u003c/p\u003e\n\u003cp\u003eIn the last ~36 hours of wall-clock time the loop has shipped:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eM0 (backend prep), 7 tasks: bearer-token auth, WS auth via query param, \u003ccode\u003e/api/version/check\u003c/code\u003e, device registration, IAP migration scaffolding, CORS for native, refresh-in-body\u003c/li\u003e\n\u003cli\u003eM1 (mobile foundation), 11 tasks: Expo SDK 54 + RN 0.81.5 scaffold, monorepo wiring, NativeWind, providers, MMKV+TanStack, \u003ccode\u003eapiFetch\u003c/code\u003e with sliding renewal, smoke test\u003c/li\u003e\n\u003cli\u003eM2 (design system \u003ccode\u003epackages/ui-mobile\u003c/code\u003e), 9 tasks: tokens, primitives (KjButton/Pressable/Card/Screen/Text/Input/Skeleton), 506 SVG icons ported by codemod, theme provider, toast, motion (KjXpPopup/KjStreakFlame), data badges, dev gallery\u003c/li\u003e\n\u003cli\u003eM3 (auth shell), 8 tasks: login, register, forgot/reset/verify-email, \u003ccode\u003euseCurrentUser\u003c/code\u003e, profile tab + settings sheet, force-upgrade gate\u003c/li\u003e\n\u003cli\u003eM4 (core content), 9 tasks: KaTeX-via-WebView, lesson reader, dashboard, subjects tree, paywall placeholder, offline cache, offline banner\u003c/li\u003e\n\u003cli\u003eM5 (quiz), in progress, 4 of 10 done: state machine + 32 tests, session screen UI, confirm-before-submit, reveal animations next\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThat's 50 of 115 tasks, with 1,565+ tests green at the most recent milestone gate.\u003c/p\u003e\n\u003cp\u003e\u003cimg src=\"/posts/porting-to-react-native-overnight-with-an-agent-loop/proof2.png\" alt=\"TASKS.md status counts at the M4 milestone push, timestamped\"\u003e\u003c/p\u003e\n\u003cp\u003eThe shipping isn't the interesting part. 
The interesting part is how few design choices it took to make the work boring enough to ship overnight without me at the keyboard.\u003c/p\u003e\n\u003ch2 id=\"the-shape-of-the-problem\"\u003eThe shape of the problem\u003c/h2\u003e\n\u003cp\u003eLong-horizon agent work fails in three predictable places.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDrift across tasks.\u003c/strong\u003e Agent #4 builds a thing on top of Agent #2's misunderstanding of the spec. The error compounds.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eUntracked state.\u003c/strong\u003e \u0026quot;Which tasks are done? Which are blocked? What did the last agent change?\u0026quot; If the answers live in chat scrollback, you've already lost.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGit becomes the contention point.\u003c/strong\u003e Twelve agents force-pushing over each other, or one agent amending a commit a downstream agent already pulled. The repo's history is the single most valuable artifact, and touching it carelessly destroys the run.\u003c/p\u003e\n\u003cp\u003eThe system I'll describe addresses each of those head-on. 
The architecture is boring on purpose.\u003c/p\u003e\n\u003ch2 id=\"the-three-artifacts-that-run-everything\"\u003eThe three artifacts that run everything\u003c/h2\u003e\n\u003cp\u003eEverything the loop needs lives in three places under \u003ccode\u003eapps/mobile/plans/\u003c/code\u003e:\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003eapps/mobile/plans/\n├── TASKS.md                          ← the queue (115 rows, 12 milestones)\n├── logs/agents.log                   ← append-only audit log\n├── 00-README.md                      ← the master plan\n├── 01-architecture-decisions.md      ← non-negotiables, locked\n├── 02-phase-0-backend-prep.md        ← preconditions + concrete tasks\n├── 03-phase-1-foundation.md\n├── 04-phase-2-design-system.md\n├── 05-phase-3-auth-shell.md\n├── 06-phase-4-core-content.md\n├── 07-phase-5-quiz-daily.md\n├── 08-phase-6-duel-realtime.md\n├── 09-phase-7-social-leaderboard.md\n└── 10-phase-8-advanced.md\n\u003c/code\u003e\u003c/pre\u003e\n\u003ch3 id=\"tasksmd-the-queue\"\u003eTASKS.md, the queue\u003c/h3\u003e\n\u003cp\u003eThe queue is a flat markdown table per phase. 
Every row is one task with: ID, title, link to a spec section in the phase doc, deps, status, commit SHA, last-updated timestamp.\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003e| ID    | Title                       | Spec link                              | Deps        | Status      | Commit  | Updated              |\n| T4.6  | Lesson reader screen        | 06-phase-4-core-content.md#task-46     | T4.2, T4.5  | done        | 3fd8fa6 | 2026-05-12T12:30:00Z |\n| T5.1a | Quiz state machine module   | 07-phase-5-quiz-daily.md#task-51       | T4.6        | done        |         | 2026-05-12T15:27:00Z |\n| T5.1b | Quiz session screen UI      | 07-phase-5-quiz-daily.md#task-51       | T5.1a, T2.7 | done        |         | 2026-05-12T16:45:00Z |\n| T5.1c | Confirm-before-submit flow  | 07-phase-5-quiz-daily.md#task-51       | T5.1b       | done        |         | 2026-05-12T17:30:00Z |\n| T5.1d | Reveal animations + haptics | 07-phase-5-quiz-daily.md#task-51       | T5.1c       | in_progress |         | 2026-05-12T18:00:00Z |\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eFour statuses: \u003ccode\u003etodo\u003c/code\u003e, \u003ccode\u003ein_progress\u003c/code\u003e, \u003ccode\u003edone\u003c/code\u003e, \u003ccode\u003eblocked\u003c/code\u003e (external).\u003c/p\u003e\n\u003cp\u003eStatus counts live at the top of the file and must equal working-tree truth, not git history. The orchestrator and any watchdog reconcile against the working-tree file:\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003e- todo: 51\n- in_progress: 1\n- done: 53\n- blocked (external): 10\n- Total: 115\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eThis is the only shared state between the orchestrator and any agent. There is no database, no task service, no Jira sync. 
The file is the queue.\u003c/p\u003e\n\u003ch3 id=\"agentslog-the-audit-trail\"\u003eagents.log, the audit trail\u003c/h3\u003e\n\u003cp\u003eEvery agent return appends one structured block:\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003e## 2026-05-12T08:25:00Z — T5.1a done (milestone-pending)\n- agent: a8ae91c3e6ac30d62\n- duration: 12m 11s\n- files: apps/mobile/lib/quiz/quizMachine.ts (new — pure reducer + useQuizMachine hook),\n         apps/mobile/lib/quiz/__tests__/quizMachine.test.ts (new, 32 tests),\n         apps/mobile/plans/TASKS.md\n- tests: 32 new tests added; workspace total now 57\n- summary: Quiz state machine shipped. Pattern: useReducer + pure reducer + custom hook.\n  ... [paragraph of substance: what changed, what was decided, why] ...\n  Critical API correction: spec mentioned `GET /sessions/:id/next-question` but that\n  endpoint does NOT exist in the live API. I verified against `apps/api/src/routes/sessions.ts`.\n  The actual web flow loads all questions upfront via\n  `GET /subjects/:s/topics/:t/subtopics/:st/questions`. The machine loads the full\n  question array at startSession and advances client-side.\n  Bug fixed during review: `questionsAnswered` was off-by-one; corrected to length.\n- notes: M5 status 1/7 done. Handoff to T5.1b: consume useQuizMachine, call startSession\n  on mount, observe state, drive selection via selectAnswer(optionId), submission via\n  submitAnswer()...\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003e\u003ccode\u003emilestone-pending\u003c/code\u003e is a placeholder. When the milestone pushes, the orchestrator rewrites these to the real short SHA in a follow-up commit.\u003c/p\u003e\n\u003cp\u003eThe \u003ccode\u003enotes:\u003c/code\u003e field at the bottom is the most important part. 
Every agent ends its entry with a handoff to the next agent: the precondition the next task can rely on, the path the previous agent actually used (not what the spec said), and any decision the next agent does not need to re-litigate.\u003c/p\u003e\n\u003cp\u003eThis is how drift gets contained. The next agent reads the last 1–3 log entries before claiming, so it inherits a precise mental model of what is true in the working tree right now instead of reasoning from the spec alone.\u003c/p\u003e\n\u003cp\u003e\u003cimg src=\"/posts/porting-to-react-native-overnight-with-an-agent-loop/proof3.png\" alt=\"A tail of the real agents.log, timestamps showing the overnight run\"\u003e\u003c/p\u003e\n\u003ch3 id=\"the-phase-docs\"\u003eThe phase docs\u003c/h3\u003e\n\u003cp\u003eEach phase doc is self-contained. It states:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003ePreconditions (\u0026quot;M3 must be green; bearer auth must exist\u0026quot;)\u003c/li\u003e\n\u003cli\u003eConcrete file paths (\u0026quot;create \u003ccode\u003eapp/(dashboard)/subjects/[slug]/[topicSlug]/[subtopicSlug]/index.tsx\u003c/code\u003e\u0026quot;)\u003c/li\u003e\n\u003cli\u003eCode sketches, just enough to anchor the structure, never enough to copy-paste\u003c/li\u003e\n\u003cli\u003eAcceptance checklist (\u0026quot;done when: WebView renders KaTeX block math correctly, light/dark theme switches without remount lag, no console errors\u0026quot;)\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe locked-in \u003ccode\u003e01-architecture-decisions.md\u003c/code\u003e is the bedrock: Expo SDK 54, New Architecture on, NativeWind v4, TanStack Query + MMKV, expo-secure-store for JWT, KaTeX via WebView. An agent that proposes Zustand or AsyncStorage for tokens gets reverted.\u003c/p\u003e\n\u003ch2 id=\"the-execution-model-and-how-it-evolved\"\u003eThe execution model (and how it evolved)\u003c/h2\u003e\n\u003cp\u003eI tried three workflows in the first 24 hours. 
The third one stuck.\u003c/p\u003e\n\u003ch3 id=\"attempt-1-branch--pr--merge-bot-per-task\"\u003eAttempt 1: branch + PR + merge-bot per task\u003c/h3\u003e\n\u003cp\u003eEach task spawns a worktree, the agent works, opens a PR, a merge-bot watches CI and merges.\u003c/p\u003e\n\u003cp\u003eWhy it failed: per-task PRs created 100+ tiny PRs and a merge queue. Cross-task drift surfaced in PR review, which the agent then had to relitigate. Cognitive cost per task was too high to be worth the audit trail.\u003c/p\u003e\n\u003ch3 id=\"attempt-2-main-only-direct-push-per-task\"\u003eAttempt 2: main-only direct push per task\u003c/h3\u003e\n\u003cp\u003eEach agent works directly on \u003ccode\u003emain\u003c/code\u003e, runs \u003ccode\u003epnpm verify\u003c/code\u003e, commits and pushes if green.\u003c/p\u003e\n\u003cp\u003eWhy it failed: two problems. Rollback granularity was per-task, which is fine if a single agent broke something but useless if a \u003cem\u003esequence\u003c/em\u003e of agents had compounded a subtle error. And the git log was an unreadable wall of 50+ commits per evening, with the actual feature unit (e.g. \u0026quot;M4 core content\u0026quot;) spread across nine commits and three days of intermediate state.\u003c/p\u003e\n\u003cp\u003eThere was also an integrity issue: agents occasionally forgot to flip TASKS.md to \u003ccode\u003edone\u003c/code\u003e, and the orchestrator's bookkeeping had to chase the agent's git history instead of the agent's reported state.\u003c/p\u003e\n\u003ch3 id=\"attempt-3-main-only-milestone-batched-no-git-agents-current\"\u003eAttempt 3: main-only, milestone-batched, no-git agents (current)\u003c/h3\u003e\n\u003cp\u003eThis is the model that's been running. The rules:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eAgents do not touch git. At all. 
They edit code, run \u003ccode\u003epnpm verify\u003c/code\u003e, edit \u003ccode\u003eTASKS.md\u003c/code\u003e in the working tree, and return.\u003c/li\u003e\n\u003cli\u003eThe only git command an agent may run is the initial \u003ccode\u003egit switch main \u0026amp;\u0026amp; git pull --rebase origin main\u003c/code\u003e at start.\u003c/li\u003e\n\u003cli\u003eWhen all tasks in a milestone group reach \u003ccode\u003estatus=done\u003c/code\u003e in the working tree, the orchestrator captures one squash-style commit per milestone and pushes.\u003c/li\u003e\n\u003cli\u003eA separate \u003ccode\u003echore(plans): record M\u0026lt;n\u0026gt; SHA\u003c/code\u003e follow-up commit backfills the SHA into the \u003ccode\u003eCommit\u003c/code\u003e column of each row in that milestone.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe orchestrator loop is a dozen lines of pseudocode:\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003eloop:\n  assert in_progress == 0                # working-tree TASKS.md, not origin/main\n  next = lowest-numbered todo with all Deps == done\n  if next is None:\n    if working-tree milestone is complete: push milestone (see below), goto loop\n    elif any blocked exist:                surface blocker, halt loop\n    else:                                  all done, exit loop\n  spawn 1 agent on `next`                 # agent does NOT touch git beyond initial pull\n  wait for agent to return with status=done in working-tree TASKS.md\n  append entry to apps/mobile/plans/logs/agents.log (working tree, no commit)\n  if this task is the last in its milestone group: push milestone\n  goto loop\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eThe milestone push itself:\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003e1. git status                            # confirm working tree contains only milestone code + TASKS + log\n2. git add -A\n3. pnpm verify                           # one more time; catches integration drift between tasks\n4. 
git commit -m \u0026quot;\u0026lt;prefix\u0026gt;: \u0026lt;name\u0026gt; — Tx.y..Tx.z [M\u0026lt;n\u0026gt;]\u0026quot; with co-author trailer\n5. capture short SHA\n6. backfill SHA into TASKS.md Commit columns; replace milestone-pending in agents.log\n   chore(plans): record M\u0026lt;n\u0026gt; SHA (\u0026lt;sha\u0026gt;)\n7. git push origin main\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eThis trades rollback granularity (you can only revert a whole milestone) for shippable units (every commit on \u003ccode\u003emain\u003c/code\u003e is a complete, tested, parity-checked feature group). Per-task \u003ccode\u003epnpm verify\u003c/code\u003e is still the per-task quality gate; the per-milestone re-verify catches anything that snuck between tasks.\u003c/p\u003e\n\u003cp\u003eThe per-milestone re-verify has caught a real bug exactly once so far: a type drift between T2.3c and T2.4a where a primitive's prop signature shifted while icons were being ported. The orchestrator fixed it inline as part of the milestone commit (no separate task) and noted the drift in the log.\u003c/p\u003e\n\u003cp\u003e\u003cvideo controls muted playsinline preload=\"metadata\" src=\"/posts/porting-to-react-native-overnight-with-an-agent-loop/video-ai-agent.mp4\" aria-label=\"A timelapse of the agent loop running overnight\"\u003e\u003c/video\u003e\u003c/p\u003e\n\u003ch2 id=\"the-pnpm-verify-gate-the-only-quality-contract\"\u003eThe \u003ccode\u003epnpm verify\u003c/code\u003e gate, the only quality contract\u003c/h2\u003e\n\u003cp\u003eThere is no PR review. There are no human checkpoints during the loop. \u003ccode\u003epnpm verify\u003c/code\u003e is the single quality contract. 
It runs:\u003c/p\u003e\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth\u003eGate\u003c/th\u003e\n\u003cth\u003eMechanism\u003c/th\u003e\n\u003cth\u003eScope\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd\u003eEmoji ban\u003c/td\u003e\n\u003ctd\u003egrep over Unicode ranges, allowlisted paths\u003c/td\u003e\n\u003ctd\u003eevery UI workspace (incl. \u003ccode\u003eapps/mobile\u003c/code\u003e)\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003etype-check\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003epnpm -r --if-present run type-check\u003c/code\u003e (\u003ccode\u003etsc --noEmit\u003c/code\u003e)\u003c/td\u003e\n\u003ctd\u003eevery workspace\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003elint\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003epnpm -r --if-present run lint\u003c/code\u003e (eslint / next lint / expo lint)\u003c/td\u003e\n\u003ctd\u003eevery workspace\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eunit tests\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003epnpm --filter \u0026lt;ws\u0026gt; test\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e14 workspaces\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003ebuild\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003epnpm -r --if-present run build\u003c/code\u003e (skipped pre-commit, runs on verify / pre-push)\u003c/td\u003e\n\u003ctd\u003eevery workspace\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eIt runs automatically on \u003ccode\u003egit commit\u003c/code\u003e via \u003ccode\u003ecore.hooksPath=.githooks\u003c/code\u003e. Emergency override is \u003ccode\u003eVERIFY_SKIP=1\u003c/code\u003e. 
The rule: only for genuine fires, fix the root cause next.\u003c/p\u003e\n\u003cp\u003eTwo design choices that pay rent.\u003c/p\u003e\n\u003cp\u003eA per-workspace \u003ccode\u003etest:unit\u003c/code\u003e / \u003ccode\u003etest:integration\u003c/code\u003e split. \u003ccode\u003eapps/api\u003c/code\u003e and \u003ccode\u003epackages/db\u003c/code\u003e own DB-backed integration suites that need a live Postgres on port 14002. The unit slice runs everywhere (fresh clone, sandbox, CI), and the per-task agent loop only runs \u003ccode\u003etest:unit\u003c/code\u003e. The orchestrator runs \u003ccode\u003etest:integration\u003c/code\u003e once at milestone boundaries on a machine with the test DB up. This split is the difference between a 6-second per-task gate and a 90-second one.\u003c/p\u003e\n\u003cp\u003eA self-asserting matrix. \u003ccode\u003epackages/types/src/__tests__/verify-coverage.test.ts\u003c/code\u003e declares which workspaces are expected to participate in which gates. If a new workspace is added without being wired into \u003ccode\u003escripts/verify.sh\u003c/code\u003e, that test fails. The gate audits itself.\u003c/p\u003e\n\u003ch2 id=\"lessons-the-hard-won-kind\"\u003eLessons (the hard-won kind)\u003c/h2\u003e\n\u003ch3 id=\"1-specs-are-guidance-code-is-truth\"\u003e1. Specs are guidance. Code is truth.\u003c/h3\u003e\n\u003cp\u003eThe single most common failure mode across 53 completed tasks was spec drift. The phase docs were written upfront, the code evolved, the agents trusted the spec. Examples from the log:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eT5.1a: spec said \u003ccode\u003eGET /sessions/:id/next-question\u003c/code\u003e. That endpoint does not exist in the live API. The agent verified against \u003ccode\u003eapps/api/src/routes/sessions.ts\u003c/code\u003e, found that the actual web flow loads questions upfront via a different endpoint, and built the state machine around the real shape. 
The handoff note to T5.1b documented the correction so the next agent didn't re-discover it.\u003c/li\u003e\n\u003cli\u003eT0.1: spec said \u003ccode\u003eapps/api/src/middleware/auth.ts\u003c/code\u003e but the live code path was \u003ccode\u003eapps/api/src/plugins/auth.ts\u003c/code\u003e. Agent updated the real file. Handoff noted the canonical path so T0.7 didn't trip on the same thing.\u003c/li\u003e\n\u003cli\u003eT4.3: spec pseudocode used \u003ccode\u003eKjScreen onRefresh/refreshing\u003c/code\u003e props. The real component takes \u003ccode\u003erefreshControl={\u0026lt;RefreshControl/\u0026gt;}\u003c/code\u003e. Multiple primitive prop drifts caught in one task.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe rule I baked into every agent's spawn prompt:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eIf the spec disagrees with the live code, the live code wins. Update the spec section's path/shape if you're sure, and document the correction in your handoff note.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe cost is one extra \u003ccode\u003egrep\u003c/code\u003e per task. The benefit is that every subsequent agent inherits a corrected model.\u003c/p\u003e\n\u003ch3 id=\"2-force-the-handoff-dont-trust-the-agent-to-volunteer-it\"\u003e2. Force the handoff. Don't trust the agent to volunteer it.\u003c/h3\u003e\n\u003cp\u003eHalf the value of the \u003ccode\u003eagents.log\u003c/code\u003e entries is the bottom \u003ccode\u003enotes:\u003c/code\u003e field. The first dozen agents barely filled it in. So the spawn prompt became explicit:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eYour final report MUST include a \u003ccode\u003eHandoff\u003c/code\u003e paragraph for the next dependent task: the precondition it can rely on, the path you actually used (not what the spec said), and any decision it does not need to re-litigate.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eAfter this change, every entry has a usable handoff. 
The pattern is so reliable that I caught one bug just by reading the previous entry's handoff against the current task's spec. They disagreed, the previous agent had been right, and the spec was stale.\u003c/p\u003e\n\u003ch3 id=\"3-agents-bail-mid-investigation-make-them-flip-the-row-before-they-exit\"\u003e3. Agents bail mid-investigation. Make them flip the row before they exit.\u003c/h3\u003e\n\u003cp\u003eThis was the most expensive failure mode. An agent finishes the code, runs \u003ccode\u003epnpm verify\u003c/code\u003e, sees green, then, instead of flipping the \u003ccode\u003eTASKS.md\u003c/code\u003e row to \u003ccode\u003edone\u003c/code\u003e, drops out of the loop with \u0026quot;Let me check the actual component interfaces\u0026quot; as its final line. The work is done. The bookkeeping is not.\u003c/p\u003e\n\u003cp\u003eWhen the orchestrator goes to spawn the next agent, it sees the previous row still marked \u003ccode\u003ein_progress\u003c/code\u003e and refuses (the precondition is \u003ccode\u003ein_progress == 0\u003c/code\u003e). The orchestrator then has to finish the bookkeeping by hand.\u003c/p\u003e\n\u003cp\u003eThe fix in the spawn prompt:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eBefore reporting, you MUST: (1) run \u003ccode\u003epnpm verify\u003c/code\u003e to completion, (2) flip your row in TASKS.md to \u003ccode\u003edone\u003c/code\u003e, (3) decrement \u003ccode\u003ein_progress\u003c/code\u003e and increment \u003ccode\u003edone\u003c/code\u003e in the status counts. Report only after these three things are visible in the working tree.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003ePlus an explicit confirmation line in the report:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026quot;TASKS.md flipped to \u003ccode\u003edone\u003c/code\u003e, counts updated.\u0026quot;\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eAfter this change the bail rate dropped to roughly zero. 
Two agents that did bail in M4 (T4.4, T4.7) were caught by the orchestrator at the spawn precondition check and the row was finalized in seconds, with the agent's actual work intact in the working tree.\u003c/p\u003e\n\u003ch3 id=\"4-codex-review-as-a-cheap-second-opinion\"\u003e4. Codex review as a cheap second opinion\u003c/h3\u003e\n\u003cp\u003eAfter any non-trivial implementation, I run:\u003c/p\u003e\n\u003cpre tabindex=\"0\" style=\"color:#e5e5e5;background-color:#000;\"\u003e\u003ccode\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003ecodex \u003cspan style=\"color:#fff;font-weight:bold\"\u003eexec\u003c/span\u003e --sandbox read-only \u003cspan style=\"color:#0ff;font-weight:bold\"\u003e\u0026#34;Review for bugs and logic errors\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003cp\u003eIt's a different model with a fresh context window reading the diff cold. It catches things the implementing agent missed because the implementing agent was deep inside its own assumptions.\u003c/p\u003e\n\u003cp\u003eThe KjLessonWebView task (T4.2) is a clean example. The implementing agent shipped it. Codex flagged two real issues: (1) \u003ccode\u003eonHeightChange\u003c/code\u003e presence was incorrectly switching the WebView to content-height layout mode, and (2) \u003ccode\u003eDOM_READY_JS\u003c/code\u003e was running twice (once inside \u003ccode\u003ebuildKatexDoc\u003c/code\u003e's \u003ccode\u003eDOMContentLoaded\u003c/code\u003e handler, again via \u003ccode\u003einjectedJavaScript\u003c/code\u003e). Both got fixed in the same commit before the milestone push.\u003c/p\u003e\n\u003cp\u003eI treat Codex as a peer reviewer with zero relationship to the agent that wrote the code. The cost is one tool call per task. The catch rate is meaningful.\u003c/p\u003e\n\u003ch3 id=\"5-codemod-what-you-can\"\u003e5. 
Codemod what you can\u003c/h3\u003e\n\u003cp\u003e\u003ccode\u003epackages/ui/src/icons/icon-renderers.tsx\u003c/code\u003e has 519 named SVG icons used across the web app. The naive approach (hand-port each to \u003ccode\u003ereact-native-svg\u003c/code\u003e) was budgeted at three days.\u003c/p\u003e\n\u003cp\u003eInstead, T2.4a hand-ported the first 30 to establish the pattern: default export function, \u003ccode\u003ereact-native-svg\u003c/code\u003e elements, \u003ccode\u003eSvgComponentProps\u003c/code\u003e props. Then T2.4b ran a codemod at \u003ccode\u003epackages/ui-mobile/scripts/port-icons.mjs\u003c/code\u003e over the remaining 489. Of those, 476 ported cleanly. The other 13 needed a hand-port because they use \u003ccode\u003e\u0026lt;text\u0026gt;\u003c/code\u003e SVG elements or \u003ccode\u003e.map()\u003c/code\u003e in their renderers. The skip list lives at \u003ccode\u003epackages/ui-mobile/src/icons/skipped.ts\u003c/code\u003e so the parity test can prove every web icon is either ported or explicitly skipped.\u003c/p\u003e\n\u003cp\u003eT2.4c added a parity gate test: walk every icon in the web registry, assert it exists in the mobile registry or in the skip list. If a new web icon ships, the mobile gate fails until the icon is either ported or skipped. That gate runs as part of \u003ccode\u003epnpm verify\u003c/code\u003e.\u003c/p\u003e\n\u003cp\u003eThe whole sub-phase shipped in under three hours of wall clock, including the codemod write itself. Three days saved.\u003c/p\u003e\n\u003ch3 id=\"6-three-file-env-var-rule\"\u003e6. 
Three-file env-var rule\u003c/h3\u003e\n\u003cp\u003eWhenever any service reads \u003ccode\u003eprocess.env.X\u003c/code\u003e, the rule is:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eAdd the var with a safe default to \u003ccode\u003e.env.example\u003c/code\u003e\u003c/li\u003e\n\u003cli\u003eAdd a \u003ccode\u003eVAR=${VAR}\u003c/code\u003e placeholder to \u003ccode\u003e.env.dokploy\u003c/code\u003e\u003c/li\u003e\n\u003cli\u003eSet the real value in the Dokploy production env config\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eMiss any of the three and the next deploy silently breaks. I shipped two regressions by skipping this rule before I automated the check. Both took longer to debug than the rule takes to follow. The \u003ccode\u003edeploy-supervisor\u003c/code\u003e skill now scans \u003ccode\u003eprocess.env.X\u003c/code\u003e references against \u003ccode\u003e.env.dokploy\u003c/code\u003e at push time and refuses to deploy if any var is missing.\u003c/p\u003e\n\u003cp\u003eThe same principle applies to the mobile build: every new env var consumed by \u003ccode\u003eapps/mobile\u003c/code\u003e (currently \u003ccode\u003eEXPO_PUBLIC_API_BASE_URL\u003c/code\u003e, \u003ccode\u003eAPP_VARIANT\u003c/code\u003e) goes through all three steps. If a future agent tries to read a new var without registering it, \u003ccode\u003edeploy-supervisor\u003c/code\u003e blocks the push.\u003c/p\u003e\n\u003ch3 id=\"7-plan-up-front-execute-without-thinking\"\u003e7. Plan up front. Execute without thinking.\u003c/h3\u003e\n\u003cp\u003eThe 19 plan documents (\u003ccode\u003e00-README.md\u003c/code\u003e through \u003ccode\u003e10-phase-8-advanced.md\u003c/code\u003e plus parity matrix and conventions) total roughly 130 KB of markdown. They were written before any code was. 
They include:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eLocked architecture decisions (no agent may re-litigate)\u003c/li\u003e\n\u003cli\u003eConcrete file paths per task\u003c/li\u003e\n\u003cli\u003eCode sketches just detailed enough to anchor structure\u003c/li\u003e\n\u003cli\u003e\u0026quot;Done when\u0026quot; checklists\u003c/li\u003e\n\u003cli\u003eA glossary\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eWriting this upfront felt slow. It's the highest-leverage decision I've made on this project. Every minute spent writing a clear \u0026quot;Done when\u0026quot; line in T4.6 saved an hour of agent thrashing during execution. Agents that hit ambiguity stall and start asking the orchestrator questions, which means I get paged in the middle of the night.\u003c/p\u003e\n\u003cp\u003eThe phase docs are written for \u0026quot;an autonomous coding agent (or human engineer) picking up cold.\u0026quot; That framing forces self-containment.\u003c/p\u003e\n\u003ch2 id=\"what-the-math-actually-looks-like\"\u003eWhat the math actually looks like\u003c/h2\u003e\n\u003cp\u003eWall clock over the recent two-day window:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eM0 (backend prep): 7 tasks, ~1.5 hours\u003c/li\u003e\n\u003cli\u003eM1 (foundation): 11 tasks, ~3.5 hours including dependency churn\u003c/li\u003e\n\u003cli\u003eM2 (design system): 9 tasks, ~6 hours (the codemod sub-phase compressed what was budgeted as 3 days)\u003c/li\u003e\n\u003cli\u003eM3 (auth shell): 8 tasks, ~3 hours\u003c/li\u003e\n\u003cli\u003eM4 (core content): 9 tasks, ~4 hours including the KaTeX prototype and offline cache\u003c/li\u003e\n\u003cli\u003eM5 (quiz): 4 of 10 tasks shipped so far, ~1 hour\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eTotal: ~19 hours of agent wall-clock for what the original plan estimated as ~7 weeks of solo founder calendar time. Not all of that was overnight, but most of M2–M4 ran while I was asleep. 
The orchestrator sent push notifications on milestone completions and on blocker surfacing; I woke up to a working \u003ccode\u003e(dashboard)/subjects/[slug]/[topicSlug]/[subtopicSlug]\u003c/code\u003e lesson reader I had not touched.\u003c/p\u003e\n\u003cp\u003eThings the system has not had to deal with yet:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eNative module integration (Apple IAP, Google Play Billing, Phase 9)\u003c/li\u003e\n\u003cli\u003eReal device testing (currently sim-only; release pipeline is Phase 11)\u003c/li\u003e\n\u003cli\u003eA merge conflict (single-threaded execution + \u003ccode\u003egit pull --rebase\u003c/code\u003e at agent start prevents this entirely)\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eI expect Phase 9 (IAP) to be the model's first real stress test, because eight of those tasks are \u003ccode\u003eblocked\u003c/code\u003e on external Apple/Google account state that no agent can resolve.\u003c/p\u003e\n\u003ch2 id=\"what-id-tell-someone-setting-this-up-tomorrow\"\u003eWhat I'd tell someone setting this up tomorrow\u003c/h2\u003e\n\u003col\u003e\n\u003cli\u003eWrite the plan docs first. All of them. Before any code. The plan docs are the spec the agents read. If they're vague, the agents will fight the same battle three times across three tasks.\u003c/li\u003e\n\u003cli\u003eThe queue is one markdown file. Not a database, not a task service. Drift between the file and the system breaks everything. Make the file the system.\u003c/li\u003e\n\u003cli\u003eAgents must not touch git. Let them code. Let them test. Let them flip the tracker. Push from one place, one time per milestone group. Audit log is append-only.\u003c/li\u003e\n\u003cli\u003eThe pre-commit hook is your QA team. \u003ccode\u003epnpm verify\u003c/code\u003e runs every gate every time. If it can't catch a class of bug, harden it once. Don't review by hand.\u003c/li\u003e\n\u003cli\u003eForce the handoff in the spawn prompt. 
The next agent's success depends on the previous agent's last paragraph. Make that paragraph contractual.\u003c/li\u003e\n\u003cli\u003eA second model reviews everything. Codex (or any agent with a fresh context window and read-only access) catches assumption-blindness from the implementing agent. It's the cheapest review you'll ever do.\u003c/li\u003e\n\u003cli\u003eSpecs are guidance. Code is truth. Bake this into the spawn prompt verbatim. Agents that trust the spec over the code will compound errors.\u003c/li\u003e\n\u003cli\u003ePlan for the bail. Agents will exit mid-task. Make the orchestrator's precondition (\u003ccode\u003ein_progress == 0\u003c/code\u003e) self-healing: if a row is stuck \u003ccode\u003ein_progress\u003c/code\u003e, finalize it from the working-tree state and move on. Do not block the loop on a bail.\u003c/li\u003e\n\u003cli\u003eMilestone-batch the commits. Per-task commits are unreadable. Per-milestone commits are shippable units. The trade-off (coarser rollback granularity) is worth it for clean history and a clear push contract.\u003c/li\u003e\n\u003cli\u003ePush notifications on milestone completion and on blockers. Otherwise you wake up to a system that paused at 3 a.m. waiting for a question you could have answered in 30 seconds.\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch2 id=\"the-bits-i-havent-solved-yet\"\u003eThe bits I haven't solved yet\u003c/h2\u003e\n\u003cp\u003eHonest list:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003ePhase 9 (IAP) has 8 externally-blocked tasks. Apple Developer enrollment, Small Business Program, App Store Connect product setup, Google Play Console, User Choice Billing application. The loop walks around them via the dependency graph, but the eventual unblocking is a sequence of two-day-each turnaround items that no automation can compress.\u003c/li\u003e\n\u003cli\u003eReal device testing. The smoke test passes on iOS Simulator and Android Emulator. 
Real-device QA on a TestFlight build is currently a manual gate scheduled for Phase 11.\u003c/li\u003e\n\u003cli\u003eSpec drift detection. Agents flag drift in their handoff notes, but the spec doc itself is never auto-updated. After M5 closes I plan a sweep agent that ingests every \u003ccode\u003eagents.log\u003c/code\u003e \u003ccode\u003eSpec drift:\u003c/code\u003e note and proposes corrections to the phase docs.\u003c/li\u003e\n\u003cli\u003eLong-form lessons learned never propagate back to the spawn prompt. The ten lessons in the previous section live in this blog post and in my head. They should live in a \u003ccode\u003eCONTRIBUTING-FOR-AGENTS.md\u003c/code\u003e that every spawn loads. That refactor is on the list.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2 id=\"closing\"\u003eClosing\u003c/h2\u003e\n\u003cp\u003eNone of this is novel. Every individual ingredient (append-only audit logs, single-threaded queues, pre-commit verification gates, milestone-batched commits, codemods for boring transforms, second-model review) is engineering practice from before LLMs existed.\u003c/p\u003e\n\u003cp\u003eWhat changed is that the things you used to need a team for now run on a laptop with an agent that you brief like a junior engineer. There's no clever prompt to copy. The work is writing a plan boring enough to execute mechanically, building a pre-commit gate strict enough to be the only reviewer, and refusing to let an agent touch the git history.\u003c/p\u003e\n\u003cp\u003eThe hard part of solo engineering used to be doing the work. 
Now the hard part is deciding what work to do, and writing it down clearly enough that the agent doesn't have to ask.\u003c/p\u003e\n\u003chr\u003e\n\u003cp\u003e\u003cem\u003eSources: \u003ccode\u003eapps/mobile/plans/TASKS.md\u003c/code\u003e, \u003ccode\u003eapps/mobile/plans/logs/agents.log\u003c/code\u003e, \u003ccode\u003eapps/mobile/plans/00-README.md\u003c/code\u003e, \u003ccode\u003eapps/mobile/plans/01-architecture-decisions.md\u003c/code\u003e, \u003ccode\u003eapps/mobile/CLAUDE.md\u003c/code\u003e, \u003ccode\u003escripts/verify.sh\u003c/code\u003e. All numbers and quotes are from the actual files; nothing has been edited for narrative effect.\u003c/em\u003e\u003c/p\u003e\n",
      "summary": "How a single-threaded fleet of Claude Code agents walked a 115-task queue and shipped 50 tasks (5 of 12 milestones) to main while I slept, without ever touching git.",
      "date_published": "2026-05-12T00:00:00Z",
      "tags": [
        "ai",
        "agents",
        "mobile",
        "react-native",
        "kelasjenius"
      ]
    },
    {
      "id": "https://irvineafri.com/blog/autonomous-agents-in-an-indonesian-company",
      "url": "https://irvineafri.com/blog/autonomous-agents-in-an-indonesian-company",
      "title": "Autonomous agents inside an Indonesian company",
      "content_html": "\u003cblockquote\u003e\n\u003cp\u003eNumbers are real but rounded. Rupiah figures use IDR 16,000/USD as\nthe lazy exchange anchor I keep in my head. Calibrated against a\n2026 Q1 production run on GCP \u003ccode\u003easia-southeast2\u003c/code\u003e, hitting OpenAI\nvia Azure Singapore, Anthropic in \u003ccode\u003eus-west\u003c/code\u003e, and a self-hosted\nLlama 3.3 70B for the cheap stuff.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eMost \u0026quot;agent\u0026quot; articles pretend the loop is solved. Call the LLM,\nparse the tool call, run it, feed the result back. Done. That's the\ndemo loop. The production loop is a different animal, and once you\nship one of these for an Indonesian company with rupiah on the line\nand an OJK auditor on speed-dial, the differences stop being\nacademic.\u003c/p\u003e\n\u003cp\u003eI've been running autonomous agents inside that kind of company for\nabout a year. This is the writeup I wish somebody had handed me on\nday one. The audience is engineers who already know what an MCP\nserver is, what a tool-call schema looks like, and roughly what an\n\u003ccode\u003eo1\u003c/code\u003e-style reasoning trace costs per token. I'm skipping the\nmarketing layer.\u003c/p\u003e\n\u003ch2 id=\"what-agent-means-here\"\u003eWhat \u0026quot;agent\u0026quot; means here\u003c/h2\u003e\n\u003cp\u003eA long-running process that takes a goal, plans, calls tools,\nwatches the world, retries, escalates when it gets stuck, and\nproduces a durable artifact. Not a chatbot. Not a single LLM call\nin a retry loop. Something with state that survives a process\nrestart, and a coordinator that decides when the work is done.\u003c/p\u003e\n\u003cp\u003eThe agent we run most often does collections triage. 
Given a\ndelinquent borrower, it pulls the loan history, checks the WhatsApp\nengagement, drafts a tailored outreach, fires the first contact,\nwatches the response, and either escalates to a human collector or\nschedules a follow-up. End to end: 40 to 90 seconds wall-clock,\n20 to 50 LLM calls, 6 to 12 tool calls. Runs about 12,000 times a\nday at peak.\u003c/p\u003e\n\u003cp\u003eThat's the shape. Now the parts.\u003c/p\u003e\n\u003ch2 id=\"1-orchestration\"\u003e1. Orchestration\u003c/h2\u003e\n\u003cp\u003eFirst decision: graph framework or hand-rolled. We tried both.\nLangGraph, BAML, Inngest are all wonderful for the walkthrough\ndemo. They become a tax the moment your control flow stops being\na DAG. And real agent control flow is \u003cem\u003enot\u003c/em\u003e a DAG. It has loops,\ndynamic branches based on tool output, and at-least-once retries\nthat need state-machine guarantees the framework's abstractions\nweren't built to express. We spent more time fighting the\nframework than we saved.\u003c/p\u003e\n\u003cp\u003eSo we wrote our own. State machine over Postgres + RabbitMQ. The\nshape:\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003e[pending]\n   │\n   ▼\n[running]  ◄────┐\n   │            │  resumed after\n   ▼            │  tool callback\n[awaiting_tool]─┘\n   │\n   ▼\n[completed | failed | escalated]\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eEvery transition writes a row to \u003ccode\u003eagent_runs.events\u003c/code\u003e (append-only)\nand updates \u003ccode\u003eagent_runs.state\u003c/code\u003e atomically, in the same transaction.\nThat single decision is load-bearing. 
Every model call, every tool\ncall, every external observation lands in the database as an event.\nIf a worker dies mid-run, and they do, often, because Indonesian\ndata centres lose power in ways AWS post-mortems don't capture,\nanother worker reads the log and resumes from the last consistent\nstate.\u003c/p\u003e\n\u003cp\u003eThe pseudocode that earns its keep:\u003c/p\u003e\n\u003cpre tabindex=\"0\" style=\"color:#e5e5e5;background-color:#000;\"\u003e\u003ccode\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#fff;font-weight:bold\"\u003edef\u003c/span\u003e step(run_id):\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#fff;font-weight:bold\"\u003ewith\u003c/span\u003e txn():\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        run = lock_run(run_id)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#fff;font-weight:bold\"\u003eif\u003c/span\u003e run.state == \u003cspan style=\"color:#0ff;font-weight:bold\"\u003e\u0026#39;awaiting_tool\u0026#39;\u003c/span\u003e:\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            \u003cspan style=\"color:#fff;font-weight:bold\"\u003ereturn\u003c/span\u003e  \u003cspan style=\"color:#007f7f\"\u003e# someone else\u0026#39;s problem\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        events = load_events(run_id)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        next_action = plan(run, events)  \u003cspan style=\"color:#007f7f\"\u003e# an LLM call\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#fff;font-weight:bold\"\u003eif\u003c/span\u003e next_action.kind == \u003cspan style=\"color:#0ff;font-weight:bold\"\u003e\u0026#39;tool\u0026#39;\u003c/span\u003e:\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            event = emit(\u003cspan style=\"color:#0ff;font-weight:bold\"\u003e\u0026#39;tool_call.requested\u0026#39;\u003c/span\u003e, next_action)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            run.state = \u003cspan style=\"color:#0ff;font-weight:bold\"\u003e\u0026#39;awaiting_tool\u0026#39;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            run.save()\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            enqueue_tool(event)         \u003cspan style=\"color:#007f7f\"\u003e# RabbitMQ delayed-message exchange\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        \u003cspan style=\"color:#fff;font-weight:bold\"\u003eelif\u003c/span\u003e next_action.kind == \u003cspan style=\"color:#0ff;font-weight:bold\"\u003e\u0026#39;finish\u0026#39;\u003c/span\u003e:\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            run.state = \u003cspan style=\"color:#0ff;font-weight:bold\"\u003e\u0026#39;completed\u0026#39;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            run.save()\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e            emit(\u003cspan style=\"color:#0ff;font-weight:bold\"\u003e\u0026#39;run.completed\u0026#39;\u003c/span\u003e, next_action.result)\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003cp\u003eThe trick is that 
\u003ccode\u003eawaiting_tool\u003c/code\u003e is a real, stable state with its\nown timeout. Tools are \u003cem\u003ejobs\u003c/em\u003e, not function calls. Calling a tool\nmeans publishing a message. A callback later delivers the result.\nThat's what makes a 90-second agent run with three external HTTP\nhops survivable when one of those hops takes 12 seconds because\nsome upstream API is having a moment.\u003c/p\u003e\n\u003ch2 id=\"2-memory\"\u003e2. Memory\u003c/h2\u003e\n\u003cp\u003eThere are three kinds, and they have nothing to do with each\nother. Pretending they're one thing (a \u0026quot;memory layer,\u0026quot; a vector\nstore) is the most common mistake I see.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRun-local memory\u003c/strong\u003e is the scratchpad inside one agent run.\nEverything the agent has seen so far, including its own intermediate\nreasoning. We store it as the event log on \u003ccode\u003eagent_runs\u003c/code\u003e. Replaying\nthe events deterministically reconstructs the prompt for the next\nstep. Token budget: 32k before we summarise.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEpisodic memory\u003c/strong\u003e is what this agent remembers about \u003cem\u003ethis\nborrower\u003c/em\u003e across past runs. We tried vector stores: \u003ccode\u003epgvector\u003c/code\u003e,\nWeaviate, Qdrant. 
Burned three months chasing retrieval relevance.\nWhat actually shipped was a structured episodic table:\u003c/p\u003e\n\u003cpre tabindex=\"0\" style=\"color:#e5e5e5;background-color:#000;\"\u003e\u003ccode\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#fff;font-weight:bold\"\u003eCREATE\u003c/span\u003e \u003cspan style=\"color:#fff;font-weight:bold\"\u003eTABLE\u003c/span\u003e borrower_episodes (\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  borrower_id   \u003cspan style=\"color:#fff;font-weight:bold\"\u003ebigint\u003c/span\u003e,\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  episode_at    timestamptz,\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  channel       \u003cspan style=\"color:#fff;font-weight:bold\"\u003etext\u003c/span\u003e,        \u003cspan style=\"color:#007f7f\"\u003e-- \u0026#39;wa\u0026#39;, \u0026#39;voice\u0026#39;, \u0026#39;sms\u0026#39;\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#007f7f\"\u003e\u003c/span\u003e  outcome       \u003cspan style=\"color:#fff;font-weight:bold\"\u003etext\u003c/span\u003e,        \u003cspan style=\"color:#007f7f\"\u003e-- \u0026#39;paid\u0026#39;, \u0026#39;pkpu\u0026#39;, \u0026#39;no_answer\u0026#39;, ...\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#007f7f\"\u003e\u003c/span\u003e  notes         \u003cspan style=\"color:#fff;font-weight:bold\"\u003etext\u003c/span\u003e,\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e  vector        vector(\u003cspan style=\"color:#ff0;font-weight:bold\"\u003e768\u003c/span\u003e)  \u003cspan style=\"color:#007f7f\"\u003e-- mE5, 
multilingual\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#007f7f\"\u003e\u003c/span\u003e);\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003cp\u003eRetrieval is \u003ccode\u003eWHERE borrower_id = $1 ORDER BY episode_at DESC LIMIT 20\u003c/code\u003e.\nThe vector column is reserved for the rare \u0026quot;find episodes\nsemantically like this one\u0026quot; query that shows up maybe once a week.\nThe vector index is the cherry on top, not the cake. People keep\nflipping that around.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eProcedural memory\u003c/strong\u003e is the prompt. We version every system prompt\nwith \u003ccode\u003egit\u003c/code\u003e, hash it, and stamp the hash on every run. When somebody\n\u0026quot;fixes\u0026quot; a regression by editing the prompt, we can replay the\noffending run against both versions and see which one it was born\nunder. Sounds boring. Will save you a sprint the first time a\nquality drop bisects to a four-word edit.\u003c/p\u003e\n\u003ch2 id=\"3-tools\"\u003e3. Tools\u003c/h2\u003e\n\u003cp\u003eThe mistake is one big tool with a hundred arguments. The shape\nthat survives is many small tools, each with a tight, validated\ninput schema, each idempotent.\u003c/p\u003e\n\u003cp\u003eEvery tool gets:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eA Zod-style schema for inputs.\u003c/li\u003e\n\u003cli\u003eA canonical idempotency key derived from inputs + run id.\u003c/li\u003e\n\u003cli\u003eA timeout. 
\u003ccode\u003ep99\u003c/code\u003e of normal latency × 3, capped at 30 seconds\nfor the synchronous request, longer for the async job.\u003c/li\u003e\n\u003cli\u003eA circuit breaker per downstream system.\u003c/li\u003e\n\u003cli\u003eAn audit row in \u003ccode\u003eagent_tool_calls\u003c/code\u003e with the full request and\nresponse payloads, encrypted at rest.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe audit table isn't optional. Indonesian fintechs have auditors,\nand when an auditor asks \u0026quot;what did this agent do on borrower xyz?\u0026quot;,\nthe answer needs to be one query. I've watched a peer team scramble\nfor two days reconstructing this from logs after the fact. Don't be\nthat team.\u003c/p\u003e\n\u003cp\u003eA failure that quietly costs you: the LLM hallucinates a tool name\nthat doesn't exist, or hallucinates an argument with the\nslightly-wrong type. The framework most tutorials show you swallows\nthis and feeds a string error back to the model, hoping it\nself-corrects. In production you want the orchestrator to detect\n\u0026quot;hallucinated tool / schema\u0026quot; as a \u003cem\u003ecategory\u003c/em\u003e of failure, count it,\nalert when it spikes, and fall back to a smaller, stricter model\nfor the next attempt. We've watched \u003ccode\u003egpt-5\u003c/code\u003e regress on a Wednesday\nafternoon because of a quiet upstream model update. That's where\nthis metric earns its keep.\u003c/p\u003e\n\u003ch2 id=\"4-permissions\"\u003e4. Permissions\u003c/h2\u003e\n\u003cp\u003eThe dangerous question: what is the agent allowed to do?\u003c/p\u003e\n\u003cp\u003eThe lazy answer is \u0026quot;whatever its tools let it do.\u0026quot; That answer\nships exactly once. After that, compliance puts a hold on every\nagent project for six months. 
I've seen it happen.\u003c/p\u003e\n\u003cp\u003eWhat works:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eTools declare a \u003cem\u003ecapability\u003c/em\u003e (\u003ccode\u003epayment.disburse\u003c/code\u003e,\n\u003ccode\u003eborrower.send_wa\u003c/code\u003e, \u003ccode\u003eborrower.read_pii\u003c/code\u003e).\u003c/li\u003e\n\u003cli\u003eEach agent run is bound to an \u003cem\u003eactor\u003c/em\u003e, not a service account.\nFor autonomous runs, the actor is a synthetic identity tied to\nthe workflow definition (\u003ccode\u003eagent:collections-tier-1\u003c/code\u003e).\u003c/li\u003e\n\u003cli\u003eThe orchestrator enforces capability scoping \u003cem\u003ebefore\u003c/em\u003e the tool\nis dispatched, against a per-actor policy table.\u003c/li\u003e\n\u003cli\u003eCapabilities have soft and hard caps. \u003ccode\u003epayment.disburse\u003c/code\u003e for\n\u003ccode\u003eagent:collections-tier-1\u003c/code\u003e has a hard cap of IDR 0 (it cannot\nmove money) and a soft cap of zero in any policy revision.\nEscalating beyond it requires a human approver in the event\nlog, full stop.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe enforcement point matters. Don't put it in the tool. Put it in\nthe dispatcher. Tools assume their inputs are already authorised.\nThat's one audit boundary. Putting the check in N tools means N\naudit boundaries, written by N engineers, each of whom forgot\nsomething different. I learned this the slow way.\u003c/p\u003e\n\u003ch2 id=\"5-reliability\"\u003e5. Reliability\u003c/h2\u003e\n\u003cp\u003eLLM endpoints are not reliable infrastructure. 
Treat them like\nflaky third-party APIs, because that is what they are.\u003c/p\u003e\n\u003cp\u003eProduction reliability budget for a single agent run, last quarter:\u003c/p\u003e\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth\u003eSource\u003c/th\u003e\n\u003cth\u003eFailure rate (Q1 2026)\u003c/th\u003e\n\u003cth\u003eMitigation\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd\u003eOpenAI 5xx\u003c/td\u003e\n\u003ctd\u003e0.4%\u003c/td\u003e\n\u003ctd\u003eretry × 2 with jitter\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eAnthropic 5xx\u003c/td\u003e\n\u003ctd\u003e0.6%\u003c/td\u003e\n\u003ctd\u003eretry × 2 with jitter\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eOpenAI rate-limit\u003c/td\u003e\n\u003ctd\u003e1.1%\u003c/td\u003e\n\u003ctd\u003emodel-level priority queue\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eTool timeout\u003c/td\u003e\n\u003ctd\u003e0.9%\u003c/td\u003e\n\u003ctd\u003eper-tool circuit breaker\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eHallucinated schema\u003c/td\u003e\n\u003ctd\u003e0.3%\u003c/td\u003e\n\u003ctd\u003estrict-mode reattempt\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eIndo network\u003c/td\u003e\n\u003ctd\u003e0.2%\u003c/td\u003e\n\u003ctd\u003econnection pool warming + retry\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eCompose those naively and you get a 3.5% per-call failure rate.\nAcross a 30-LLM-call run, the unmitigated joint failure probability\nis around 65%. Mitigations bring it under 2%. 
The gap between \u0026quot;demo\nworks\u0026quot; and \u0026quot;demo works on Friday afternoon when GPT is degraded\u0026quot; is\nexactly this list.\u003c/p\u003e\n\u003cp\u003eTwo patterns I keep coming back to:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eIdempotent at the agent level, not just the tool level.\u003c/strong\u003e If a\nworker crashes mid-step and another resumes, the resumer should\nproduce the same effects, not duplicate ones. The event log is what\nenforces this. The resumer reads \u0026quot;tool X was already requested with\nidempotency key K\u0026quot; and skips re-emitting. The resume is silent.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA \u003ccode\u003eresume\u003c/code\u003e is not a \u003ccode\u003eretry\u003c/code\u003e.\u003c/strong\u003e Resume picks up after the last\ndurable state. Retry replays the last step. Both are needed, in\ndifferent scenarios. Conflating them is how you send a borrower\nthe same WhatsApp twice.\u003c/p\u003e\n\u003ch2 id=\"6-observability\"\u003e6. Observability\u003c/h2\u003e\n\u003cp\u003eTracing an agent is harder than tracing a microservice. 
A single\nrun has dozens of LLM calls, dozens of tool calls, branching\nreasoning, prompt-version changes, and a result that may not be\n\u0026quot;success\u0026quot; or \u0026quot;failure\u0026quot; but \u0026quot;escalated to human.\u0026quot;\u003c/p\u003e\n\u003cp\u003eWhat worked for us: OpenTelemetry for transport, Langfuse for the\nagent-aware UI, and a custom trace structure where every event in\nthe agent's event log emits its own span.\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003erun.collections_triage  74,231 ms\n├─ plan.step.0           1,482 ms   gpt-5  · 2.4k/0.3k tok\n├─ tool.borrower.read      210 ms\n├─ plan.step.1           1,623 ms   gpt-5  · 4.1k/0.5k tok\n├─ tool.wa.history       1,820 ms\n├─ plan.step.2             842 ms   haiku  · 1.2k/0.1k tok\n├─ tool.outreach.draft   3,118 ms\n├─ tool.outreach.send   12,344 ms   ← retry × 2\n├─ plan.step.3           1,099 ms   gpt-5\n└─ run.completed\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eThat view puts model timing, tool timing, per-step token cost,\nand retries onto one screen. When a teammate Slacks me \u0026quot;this run\nwas slow,\u0026quot; I can answer in under 30 seconds.\u003c/p\u003e\n\u003cp\u003eThe metric that earned its keep: \u003cstrong\u003eescalation rate per\nsub-workflow\u003c/strong\u003e. Not per agent. Not per model. Per \u003cem\u003enamed step in\nthe workflow\u003c/em\u003e. When a particular step starts escalating more\noften, it almost always points to a model regression, a prompt\nedit, or a downstream tool returning a new error shape. None of\nthose show up on a top-level success metric.\u003c/p\u003e\n\u003ch2 id=\"7-scaling\"\u003e7. Scaling\u003c/h2\u003e\n\u003cp\u003eThe bottleneck is rarely compute. 
It's almost always one of: rate\nlimits at the model provider, latency at a downstream tool, or\nworker concurrency tuned wrong.\u003c/p\u003e\n\u003cp\u003eCost shape for our collections triage agent at 12,000 runs/day:\u003c/p\u003e\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth\u003eComponent\u003c/th\u003e\n\u003cth\u003ePer run\u003c/th\u003e\n\u003cth\u003eDaily\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd\u003eGPT-5 plan steps\u003c/td\u003e\n\u003ctd\u003e$0.014\u003c/td\u003e\n\u003ctd\u003e$168\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eHaiku 4.5 sub-steps\u003c/td\u003e\n\u003ctd\u003e$0.002\u003c/td\u003e\n\u003ctd\u003e$24\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eSelf-hosted Llama 3.3\u003c/td\u003e\n\u003ctd\u003e$0.0008\u003c/td\u003e\n\u003ctd\u003e$9.60\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003ePostgres / RMQ infra\u003c/td\u003e\n\u003ctd\u003e(amortised)\u003c/td\u003e\n\u003ctd\u003e$42\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eObservability stack\u003c/td\u003e\n\u003ctd\u003e(amortised)\u003c/td\u003e\n\u003ctd\u003e$18\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cstrong\u003eTotal\u003c/strong\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cstrong\u003e$0.018\u003c/strong\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cstrong\u003e$216 / day\u003c/strong\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eIn rupiah that's about IDR 3.5M/day, or IDR ~290 per run. The\nhuman collector who would otherwise make the first call costs\nroughly IDR 14,000 per touch, all-in. Unit economics work, but\nonly because we keep the planner cheap (Haiku on the easy steps,\nGPT-5 only when the plan branches into something nontrivial) and\nwe cap out-of-budget runs at the orchestrator level. 
Without that\ncap, the first model spike caught us at 4× the budget for ten\nhours straight.\u003c/p\u003e\n\u003cp\u003eThe scaling lever that mattered most was \u003cem\u003emoving inference to\n\u003ccode\u003easia-southeast\u003c/code\u003e\u003c/em\u003e. Cross-region calls to OpenAI's US endpoints\nwere adding 180-220 ms median per call. On a 30-call run that's\nabout 6 seconds of pure latency tax. Once we routed bulk traffic\nthrough Azure OpenAI in Singapore and kept Anthropic in \u003ccode\u003eus-west\u003c/code\u003e\nonly for the long-context steps, p99 dropped from ~118 seconds to\n~71. That is the difference between a borrower picking up the\nphone and not.\u003c/p\u003e\n\u003ch2 id=\"8-failure-recovery\"\u003e8. Failure recovery\u003c/h2\u003e\n\u003cp\u003eEvery agent run is a finite state machine; failures land in named\nrecovery states; each recovery state has a manual override.\u003c/p\u003e\n\u003cp\u003eThe states that matter beyond \u003ccode\u003efailed\u003c/code\u003e:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003estuck\u003c/code\u003e: three consecutive plan steps failed to produce a\nrecognisable next action. Push to a queue read by a human\ntriager. Replay-friendly.\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003eescalated\u003c/code\u003e: agent returned \u0026quot;hand off.\u0026quot; A human picks up the\nfull event log inside our internal ops UI and continues from\nthe last state with a \u003ccode\u003ehuman_resume\u003c/code\u003e event.\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003equarantined\u003c/code\u003e: schema-validation failures that look adversarial\n(e.g., the agent kept emitting tool-calls with \u003ccode\u003eborrower_id\u003c/code\u003e set\nto the \u003cem\u003ecoordinator's\u003c/em\u003e user ID). These don't replay. 
They alert\non PagerDuty.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eA specific lesson, paid in production: \u003cstrong\u003edon't auto-retry\n\u003ccode\u003eescalated\u003c/code\u003e.\u003c/strong\u003e If a human said this needs eyes, an automatic\nresume two hours later because of a queue redelivery will surprise\nthat human in the worst possible way. Resume only on explicit\nhuman action. Ask me how I learned this. Actually, don't.\u003c/p\u003e\n\u003ch2 id=\"9-agent-coordination\"\u003e9. Agent coordination\u003c/h2\u003e\n\u003cp\u003eMulti-agent setups are oversold and undersold at the same time.\nMost \u0026quot;multi-agent\u0026quot; systems are one orchestrator plus a few\nnarrow-skill agents. We have three:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eA \u003cstrong\u003eplanner\u003c/strong\u003e that owns the run and chooses sub-tasks.\u003c/li\u003e\n\u003cli\u003eA \u003cstrong\u003eresearcher\u003c/strong\u003e that does retrieval and summarisation against\nepisodic memory and the loan/transaction history.\u003c/li\u003e\n\u003cli\u003eA \u003cstrong\u003edrafter\u003c/strong\u003e that writes outbound messages in Bahasa Indonesia,\nfine-tuned for collections tone (firm, lawful, never threatening).\nThe fine-tune mattered. The off-the-shelf model wrote outputs\nthat read as condescending in Bahasa formal.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eCoordination is just the planner calling the others as tools. They\nhave their own tool-call surfaces, their own audit trails, their\nown per-task token budgets. They don't share memory directly. They\nshare it through the orchestrator's event log.\u003c/p\u003e\n\u003cp\u003eThe \u0026quot;agents that talk to other agents in an ad-hoc swarm\u0026quot; pattern\nsounds clever and produces remarkable demos. In production it's a\ndebugging nightmare. Replays are non-deterministic, blame is\ndiffuse, unit tests are basically impossible. 
We don't run it.\nMaybe in 2027 the tooling catches up.\u003c/p\u003e\n\u003ch2 id=\"10-long-running-execution\"\u003e10. Long-running execution\u003c/h2\u003e\n\u003cp\u003eSome workflows take days. Our loan-restructuring agent runs as a\nsaga: it waits for the borrower to respond, escalates internally,\nschedules a callback for next Monday, and so on. The agent run can\nbe alive for two weeks of wall-clock time across maybe 90 seconds\nof compute.\u003c/p\u003e\n\u003cp\u003eThis works because the orchestrator is the durable state, not the\nprocess. Workers are stateless; they grab a run, advance it one\nstep, release it. A \u003ccode\u003ecron\u003c/code\u003e-style scheduler nudges runs whose\n\u003ccode\u003enext_check_at\u003c/code\u003e is in the past. The runs themselves don't sit in\nmemory waiting; they sit in Postgres.\u003c/p\u003e\n\u003cp\u003eThe thing that kept biting us: \u003cstrong\u003ewall-clock timeouts inside\nprompts.\u003c/strong\u003e \u0026quot;If you haven't received a response in 24 hours,\nescalate\u0026quot; worked great until daylight savings. Jakarta doesn't\nobserve DST, but our customers' phones sync from carriers that\nsometimes report the wrong time, and the agent's notion of \u0026quot;24 hours\u0026quot; was\ninferred from the prompt, not the clock. We pulled every time\ncalculation out of the model and into the orchestrator. The agent\nonly sees \u003ccode\u003etime_since_last_contact: 26h13m\u003c/code\u003e as a structured input,\nnever raw timestamps. Days got easier.\u003c/p\u003e\n\u003ch2 id=\"what-you-actually-buy\"\u003eWhat you actually buy\u003c/h2\u003e\n\u003cp\u003eWhen the system works, the agent isn't smarter than a junior\ncollector. It's \u003cem\u003emore consistent\u003c/em\u003e. Available at 02:00. Doesn't\nforget the borrower's last interaction. Doesn't let an inflammatory\nmessage slip through. Triages 12,000 cases a day without burnout.\nThat's the value. 
The model is a small part of it.\u003c/p\u003e\n\u003cp\u003eThe infrastructure (the durable orchestrator, the event log, the\npermission enforcement, the observability) is what makes it real.\nYou can swap GPT-5 for Claude tomorrow and the system keeps\nrunning. You can't swap the orchestrator without rewriting the\ncompany.\u003c/p\u003e\n\u003cp\u003eIf you're building one of these for an Indonesian company, three\nthings land harder than the tutorials suggest:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003e\u003cstrong\u003eData residency.\u003c/strong\u003e Pin inference to \u003ccode\u003easia-southeast\u003c/code\u003e. The\nlatency wins are real and the OJK conversation gets shorter.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eBahasa drafting tone.\u003c/strong\u003e Off-the-shelf produces outputs that\nread as condescending in Bahasa formal. You will fine-tune.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eWhatsApp.\u003c/strong\u003e Every workflow ends at WhatsApp. Build the WA\ntool first, and treat its quirks (Cloud API rate limits,\ntemplate approvals, the 24-hour service window) as first-class\ninfra constraints. They are.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe rest is engineering.\u003c/p\u003e\n",
      "summary": "A year of running autonomous agents in production at an Indonesian fintech. What actually breaks (orchestration, memory, permissions, reliability, observability, cost), and the writeup I wish someone had handed me on day one.",
      "date_published": "2026-05-09T00:00:00Z",
      "tags": [
        "ai",
        "agents",
        "infrastructure",
        "indonesia",
        "backend"
      ]
    },
    {
      "id": "https://irvineafri.com/blog/cutting-api-latency-with-a-data-transfer-layer",
      "url": "https://irvineafri.com/blog/cutting-api-latency-with-a-data-transfer-layer",
      "title": "How I cut a lending app's API latency by ~30%",
      "content_html": "\u003cp\u003eMost \u0026quot;I made the API faster\u0026quot; posts read like magic-trick demos.\nClever caching layer in act two, latency graph drops in act three,\napplause. The Kredit Pintar transfer-layer work didn't feel like\nthat. It felt like a slow, deliberate audit that paid off because\nnobody had done one in a while.\u003c/p\u003e\n\u003cp\u003eThis is what actually happened.\u003c/p\u003e\n\u003ch2 id=\"where-we-started\"\u003eWhere we started\u003c/h2\u003e\n\u003cp\u003eKredit Pintar is a lending app with more than five million monthly\nactive users. The backend is mostly Java on Spring Boot, MySQL\nunderneath, a busy mesh of services on Kubernetes with Argo CD\nshipping changes. The data-transfer layer (the code that takes a\nrequest, talks to whatever systems we depend on, and shapes a\nresponse back to the caller) had grown organically. That's the\npolite way of saying every owner who'd touched it had added the\nfield they needed and left.\u003c/p\u003e\n\u003cp\u003eThe symptom showed up on the graphs. P50 and P95 on a handful of\nhot endpoints had been creeping up. Nothing dramatic, nothing\npager-worthy, just enough that on-call kept flagging it in weekly\nreviews.\u003c/p\u003e\n\u003ch2 id=\"two-weeks-of-reading\"\u003eTwo weeks of reading\u003c/h2\u003e\n\u003cp\u003eThe first two weeks I didn't write any new code. I read code.\nThen I read traces. Then I read more code. Looking back, I wish\nI'd spent two days up front on better profiling tooling. By the\ntime I had the picture clear, I'd already half-formed the wrong\nhypothesis twice.\u003c/p\u003e\n\u003cp\u003eTwo patterns surfaced once I'd done enough of that:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003e\u003cstrong\u003eRedundant serialisation.\u003c/strong\u003e The same payload was being\nserialised, sent across a hop, deserialised, then re-serialised\none or two hops downstream. 
Fields nobody ever read travelled\nthe whole way for free.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eChatty round trips.\u003c/strong\u003e A surprising number of \u0026quot;one logical\nrequest\u0026quot; flows were actually three sequential calls under the\nhood. Each cheap on its own. The latencies stacked.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eA token-bucket rate limiter is the kind of thing every fintech\nbackend grows somewhere on the hot path. The shape below is the\nsame one that lives behind \u003ccode\u003e/api/lab/latency\u003c/code\u003e on this site —\n\u003ccode\u003e/labs/latency\u003c/code\u003e runs it live against three handler variants:\u003c/p\u003e\n\u003cdiv data-runnable-go=\"cGFja2FnZSBtYWluCgppbXBvcnQgKAoJImZtdCIKCSJ0aW1lIgopCgp0eXBlIGJ1Y2tldCBzdHJ1Y3QgewoJdG9rZW5zICAgZmxvYXQ2NAoJY2FwICAgICAgZmxvYXQ2NAoJcmF0ZSAgICAgZmxvYXQ2NCAvLyB0b2tlbnMgcGVyIHNlY29uZAoJbGFzdFRpY2sgdGltZS5UaW1lCn0KCmZ1bmMgKGIgKmJ1Y2tldCkgdGFrZShub3cgdGltZS5UaW1lKSBib29sIHsKCWIudG9rZW5zICs9IG5vdy5TdWIoYi5sYXN0VGljaykuU2Vjb25kcygpICogYi5yYXRlCglpZiBiLnRva2VucyA+IGIuY2FwIHsKCQliLnRva2VucyA9IGIuY2FwCgl9CgliLmxhc3RUaWNrID0gbm93CglpZiBiLnRva2VucyA8IDEgewoJCXJldHVybiBmYWxzZQoJfQoJYi50b2tlbnMtLQoJcmV0dXJuIHRydWUKfQoKZnVuYyBtYWluKCkgewoJYiA6PSAmYnVja2V0e2NhcDogNSwgcmF0ZTogMiwgdG9rZW5zOiA1LCBsYXN0VGljazogdGltZS5Ob3coKX0KCWZvciBpIDo9IDA7IGkgPCAxMDsgaSsrIHsKCQlpZiBiLnRha2UodGltZS5Ob3coKSkgewoJCQlmbXQuUHJpbnRmKCJyZXEgJTJkOiBva1xuIiwgaSkKCQl9IGVsc2UgewoJCQlmbXQuUHJpbnRmKCJyZXEgJTJkOiByYXRlLWxpbWl0ZWRcbiIsIGkpCgkJfQoJCXRpbWUuU2xlZXAoMTUwICogdGltZS5NaWxsaXNlY29uZCkKCX0KfQo=\"\u003e\u003c/div\u003e\n\u003ch2 id=\"what-i-actually-changed\"\u003eWhat I actually changed\u003c/h2\u003e\n\u003cp\u003eThere was no single magical change. The win was the cumulative\neffect of small ones:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eA clearer contract between the API surface and the systems\nbehind it. 
One round trip per logical operation where it used\nto be two or three.\u003c/li\u003e\n\u003cli\u003eTighter request shapes. Fields nobody downstream consumed\nstopped travelling over the wire.\u003c/li\u003e\n\u003cli\u003eBackwards-compatible adapters at the seams, so the rewrite\ncould ship in chunks and reach production traffic gradually\ninstead of one terrifying cutover.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe unglamorous list is the win. The graph dropped because of the\nlist, not because of any one item on it.\u003c/p\u003e\n\u003ch2 id=\"keeping-myself-honest\"\u003eKeeping myself honest\u003c/h2\u003e\n\u003cp\u003eTwo things kept me honest, and both saved me at least once.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTraffic mirror in staging.\u003c/strong\u003e I replayed real production\nrequests against the new and old paths side by side and diffed\nthe responses. The first time it flagged a regression, I was sure\nthe regression wasn't there. It was a one-character bug in a default-value fallback, and\nthat diff was the only reason I caught it before customers did.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSlow rollout.\u003c/strong\u003e Small percentage of real traffic at first, with\nthe old path still hot enough to fall back to. Boring. Effective.\nThe day the new path emitted a malformed response under one\nspecific timezone offset, rollback was a single config flip.\u003c/p\u003e\n\u003ch2 id=\"the-result\"\u003eThe result\u003c/h2\u003e\n\u003cp\u003eAverage API latency on the rewritten paths dropped by roughly\n\u003cstrong\u003e30%\u003c/strong\u003e. P95 followed it down. The team shipped seven major\nfeatures in the same window without slipping the rewrite or each\nother.\u003c/p\u003e\n\u003ch2 id=\"what-id-do-differently\"\u003eWhat I'd do differently\u003c/h2\u003e\n\u003cp\u003eSpend more of the early days on profiling tooling. The instinct\non a project like this is to start writing the new layer right\naway. 
The higher-leverage move is to make it cheap to know where\ntime is actually being spent, and \u003cem\u003ethen\u003c/em\u003e start writing.\u003c/p\u003e\n\u003cp\u003eThe other lesson, which I keep relearning: the boring, careful\naudit is almost always faster than the clever rewrite. Most\nperformance wins at scale aren't hidden. They're sitting in the\ncode, waiting for somebody to read it slowly. The hard part\nisn't the change. The hard part is taking two weeks to read\nfirst.\u003c/p\u003e\n",
      "summary": "How I cut average API latency by ~30% on the Kredit Pintar lending backend. No clever caching trick, just a slow audit of a transfer layer nobody had looked at in a while.",
      "date_published": "2026-01-16T00:00:00Z",
      "tags": [
        "backend",
        "performance",
        "java",
        "spring-boot"
      ]
    },
    {
      "id": "https://irvineafri.com/blog/indonesian-payment-rails-cheatsheet",
      "url": "https://irvineafri.com/blog/indonesian-payment-rails-cheatsheet",
      "title": "A backend engineer's cheatsheet for Indonesian payment rails",
      "content_html": "\u003cblockquote\u003e\n\u003cp\u003eWorking notes from three years of wiring Indonesian payment rails into\nbank and lending backends. The companion lab is at\n\u003ca href=\"/labs/rails\"\u003e/labs/rails\u003c/a\u003e — same data, sortable.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThere's a moment, the first time you wire up Indonesian payments,\nwhen you realise the question \u0026quot;how do I take payment?\u0026quot; has a dozen\ndifferent answers. Each has its own latency story, idempotency\ncontract, and refund path. The overseas tutorials don't help. They\nexplain Stripe and Adyen, and \u003cem\u003eneither of those is the rail\u003c/em\u003e. The\nrail is BI-FAST or QRIS or GPN, sitting underneath an acquirer or\na wallet that may also be the rail.\u003c/p\u003e\n\u003cp\u003eThis is the map I wish somebody had handed me on day one.\u003c/p\u003e\n\u003ch2 id=\"the-five-families-that-matter\"\u003eThe five families that matter\u003c/h2\u003e\n\u003cp\u003eYou can group every domestic rail into five families:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003e\u003cstrong\u003eInstant inter-bank\u003c/strong\u003e: BI-FAST. Real-time, 24/7, capped at IDR 250M.\u003csup id=\"fnref:1\"\u003e\u003ca href=\"#fn:1\" class=\"footnote-ref\" role=\"doc-noteref\"\u003e1\u003c/a\u003e\u003c/sup\u003e\nThe default rail for retail transfers since 2021.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eQR\u003c/strong\u003e: QRIS. One QR code reads in every wallet and every bank app.\nInteroperability is the whole point. Speed is incidental.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eDomestic switching\u003c/strong\u003e: GPN. Routes domestic debit-card\ntransactions through Indonesian switches. Cheaper than international\nschemes, slower to dispute.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eClearing and high-value\u003c/strong\u003e: SKN (batch clearing) and BI-RTGS (high\nvalue, real-time gross). 
Different shapes, different occasions.\nPayroll goes on SKN, treasury goes on RTGS.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eClosed-loop wallets\u003c/strong\u003e: OVO, GoPay, DANA, ShopeePay, LinkAja. Each\nis its own network, plus a QRIS interface, plus an in-app SDK.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eCards (Visa/Mastercard) sit slightly outside this taxonomy. Still\nubiquitous for cross-border and high-AOV, still the only rail with a\nreal chargeback story, still the most expensive.\u003c/p\u003e\n\u003ch2 id=\"latency-but-honest\"\u003eLatency, but honest\u003c/h2\u003e\n\u003cp\u003eThe lab page sorts by latency, and that's misleading without context.\n\u0026quot;Latency\u0026quot; here is end-to-end, from \u0026quot;I called the API\u0026quot; to \u0026quot;the\ncounterparty sees the money\u0026quot;. Under that definition:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eWallet APIs (OVO, GoPay, DANA) are fast, typically 2 to 3 seconds,\nbecause both legs sit inside the wallet's perimeter.\u003c/li\u003e\n\u003cli\u003eBI-FAST is also fast, typically 5 seconds, but p99 climbs into the\ntens of seconds when the receiving bank drags its feet.\u003c/li\u003e\n\u003cli\u003eQRIS \u003cem\u003eacks\u003c/em\u003e in 3 seconds, but merchant settlement is T+1.\u003c/li\u003e\n\u003cli\u003eSKN is \u003cem\u003ebatch\u003c/em\u003e. Four windows per business day. The \u0026quot;latency\u0026quot; is\neffectively the wait until the next window.\u003c/li\u003e\n\u003cli\u003eRTGS is real-time, but business hours only.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eTwo practical implications:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eIf your customer is staring at a screen, you want a wallet, QRIS,\nor BI-FAST. SKN is for things they don't watch land.\u003c/li\u003e\n\u003cli\u003eIf your reconciliation runs daily, the difference between 3 seconds\nand 30 seconds is invisible. 
Pick on cost and on idempotency\nsemantics, not on raw speed.\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch2 id=\"idempotency-stories--read-these-carefully\"\u003eIdempotency stories — read these carefully\u003c/h2\u003e\n\u003cp\u003eEvery rail says \u0026quot;we're idempotent,\u0026quot; and \u003cem\u003eevery rail means a different\nthing by it\u003c/em\u003e.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eBI-FAST\u003c/strong\u003e: unique transaction ID per request. Reuse returns the\nprior result, including the prior error. The sender bank is the\nsource of truth.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eQRIS\u003c/strong\u003e: one QR string is one transaction. Double-scan is blocked\nat the PJP layer. Your job is to not reuse the QR.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eGPN cards\u003c/strong\u003e: the ARN (Acquirer Reference Number) is the\nfingerprint. If your retry doesn't carry the same merchant\ntransaction reference, the issuer treats it as a brand new\nauthorization.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eOVO / GoPay / DANA\u003c/strong\u003e: partner-supplied idempotency key on a custom\nheader. The wallet's API stores the key and replays the prior\nresponse on retry. The retention window varies. Assume 24 hours and\nverify in the API docs.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eSKN\u003c/strong\u003e: batch + reference. Reverse clearing is your only out.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eVA\u003c/strong\u003e: the VA \u003cem\u003enumber\u003c/em\u003e is the idempotency token. Once a VA is paid,\npaying it again either bounces or creates a duplicate at the\nacquirer's discretion. 
Not a contract you want to lean on.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe rule I've internalised: \u003cstrong\u003ecarry an idempotency key on every\nexternal call, whether the rail demands one or not.\u003c/strong\u003e Even when the\nrail enforces uniqueness for you, your code reaches the rail through\nwrappers and middleware, and the wrappers will retry. If the wrapper\nretries silently and the rail accepts the retry as new, your ledger\nis wrong. Fix that on your side.\u003c/p\u003e\n\u003ch2 id=\"refund-paths--the-unsexy-column\"\u003eRefund paths — the unsexy column\u003c/h2\u003e\n\u003cp\u003eThis is the one that bites in production. The lab has the row-by-row\ndetail; the headline is:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eWallets\u003c/strong\u003e have proper refund APIs. Use them.\u003csup id=\"fnref:2\"\u003e\u003ca href=\"#fn:2\" class=\"footnote-ref\" role=\"doc-noteref\"\u003e2\u003c/a\u003e\u003c/sup\u003e\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eVA, BI-FAST, SKN\u003c/strong\u003e have no scheme refund. You fire a \u003cem\u003enew\u003c/em\u003e\ncounter-transfer, and your accounting reflects it as such.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCards\u003c/strong\u003e have the strongest dispute story (90-day chargebacks) and\nthe weakest refund-to-customer-satisfaction ratio.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eQRIS\u003c/strong\u003e sits awkwardly in between. 
In-session reversal works.\nLater reversals go through the PJP, which means support tickets.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eIf you're building a customer-facing product, refund-path quality is\nthe single biggest reason to prefer wallets over VAs, even when the\nMDR looks worse on paper.\u003c/p\u003e\n\u003ch2 id=\"the-simulator-companion\"\u003eThe simulator companion\u003c/h2\u003e\n\u003cp\u003eThe \u003ca href=\"/labs/payments\"\u003epayment-flow simulator\u003c/a\u003e is the live-coding\ncompanion to all of this. It encodes the same patterns: idempotent\ndebits, double-delivered webhooks, timeout-with-retry,\npartial-failure reconciliation. It doesn't pick a specific rail.\nPair the two: the cheatsheet for \u0026quot;what shape is this rail?\u0026quot;, the\nsimulator for \u0026quot;what does it do under failure?\u0026quot;.\u003c/p\u003e\n\u003ch2 id=\"what-this-isnt\"\u003eWhat this isn't\u003c/h2\u003e\n\u003cp\u003eNot a regulatory primer. Cite Bank Indonesia and OJK directly for\nthat. Not a contract. The numbers are public-range estimates and\nwill be wrong for the largest merchants. Not exhaustive. The\ne-money schemes (LinkAja, ShopeePay) and the corporate rails\n(CMS, host-to-host) sit alongside this list and don't fit on one\nscreen.\u003c/p\u003e\n\u003cp\u003eWhat it is: the page I wish I could have shown my past self the\nweek before I started writing the M-Syariah Payment API. If you're\nthat engineer right now, this is for you.\u003c/p\u003e\n\u003cdiv class=\"footnotes\" role=\"doc-endnotes\"\u003e\n\u003chr\u003e\n\u003col\u003e\n\u003cli id=\"fn:1\"\u003e\n\u003cp\u003eThe IDR 250M cap is the scheme-level ceiling per BI's PADG\n23/25/PADG/2021. 
Sending banks can apply tighter caps; check your\nissuer.\u0026#160;\u003ca href=\"#fnref:1\" class=\"footnote-backref\" role=\"doc-backlink\"\u003e\u0026#x21a9;\u0026#xfe0e;\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\u003cli id=\"fn:2\"\u003e\n\u003cp\u003eTest the refund path on day one of integration, not\nweek three. Most outages I've seen on payment integrations were\nrefund-shaped, not authorization-shaped.\u0026#160;\u003ca href=\"#fnref:2\" class=\"footnote-backref\" role=\"doc-backlink\"\u003e\u0026#x21a9;\u0026#xfe0e;\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ol\u003e\n\u003c/div\u003e\n",
      "summary": "The map I wish someone had handed me on day one. BI-FAST, QRIS, GPN, SKN, RTGS, OVO, GoPay, DANA, virtual accounts. What each rail is for, what its latency actually is, and which idempotency story to trust.",
      "date_published": "2025-12-24T00:00:00Z",
      "tags": [
        "payments",
        "fintech",
        "indonesia",
        "backend"
      ]
    },
    {
      "id": "https://irvineafri.com/blog/integrating-ovo-gopay-dana-into-syariah-banking",
      "url": "https://irvineafri.com/blog/integrating-ovo-gopay-dana-into-syariah-banking",
      "title": "Integrating OVO, GoPay, and DANA into a Sharia core banking system",
      "content_html": "\u003cp\u003eIf you live in Indonesia, you probably moved money through OVO,\nGoPay, or DANA this morning without thinking about it. That\n\u0026quot;without thinking about it\u0026quot; is the whole game in payments. Inside\nthe bank that connects to those wallets, it's also the part that\neats the most engineering time.\u003c/p\u003e\n\u003cp\u003eThis is what I picked up designing the Payment API at Bank Mega\nSyariah, the one that wired our core banking platform into all\nthree wallets.\u003c/p\u003e\n\u003ch2 id=\"why-this-is-hard-before-you-write-any-code\"\u003eWhy this is hard before you write any code\u003c/h2\u003e\n\u003cp\u003eThe first time someone says \u0026quot;let's integrate three e-wallets,\u0026quot; it\nsounds like roughly three times the work of integrating one. It\nisn't. Each wallet has:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eIts own dialect for requests and responses.\u003c/li\u003e\n\u003cli\u003eIts own webhook model — when it fires, how it retries, what it\nguarantees about delivery.\u003c/li\u003e\n\u003cli\u003eIts own reconciliation cadence and statement format.\u003c/li\u003e\n\u003cli\u003eIts own definition of success and failure, and a wider gap than\nyou'd expect between \u0026quot;we accepted your message\u0026quot; and \u0026quot;the money\nmoved\u0026quot;.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eMultiply that by the bank side. The core banking system is the\nsource of truth. Ledger postings have to be exact. A lost message\nmeans a real person is missing real money. What looked like one\nproject becomes three half-projects plus the glue that joins them.\u003c/p\u003e\n\u003cp\u003eThe glue is the project. Most of my time was the glue.\u003c/p\u003e\n\u003ch2 id=\"the-shape-i-ended-up-with\"\u003eThe shape I ended up with\u003c/h2\u003e\n\u003cp\u003eOne unified Payment API in front of the core, with thin adapters\nper biller behind it. 
The internal contract is one shape; each\nwallet's dialect lives in its adapter and doesn't leak inward.\nThat sentence is the whole architecture. Everything else was\ndetails.\u003c/p\u003e\n\u003cp\u003eThe pieces I'd call out, in order of how badly each one bites if\nyou skimp on it:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e1. Strong idempotency keys on every external call.\u003c/strong\u003e A network\nblip should never end with the user double-charged. Getting this\nright at the start is cheap. Getting it wrong is a regulator\nasking why, three months in, two specific accounts are out by\nIDR 47,500.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2. Webhooks: separate \u0026quot;did the message arrive\u0026quot; from \u0026quot;is the\nledger consistent\u0026quot;.\u003c/strong\u003e It's tempting to do both in one handler.\nDon't. You'll lose either reliability or correctness, and you'll\nfind out which one at 3am.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3. A daily reconciliation job that proves the ledger.\u003c/strong\u003e The\nunglamorous, schedule-driven thing that catches the cases your\nlive code missed. Treat it as a first-class part of the product,\nnot a clean-up phase you'll add later when there's time. There's\nnever time.\u003c/p\u003e\n\u003ch2 id=\"what-surprised-me\"\u003eWhat surprised me\u003c/h2\u003e\n\u003cp\u003eHow much of the work is naming things. \u0026quot;Pending\u0026quot; in OVO's world is\nnot \u0026quot;pending\u0026quot; in your ledger's world. \u0026quot;Failed\u0026quot; might be retryable\nor it might be terminal. Different wallet, different answer. The\ndiscipline of writing the \u003cem\u003einternal\u003c/em\u003e contract, the names and\nstates the rest of the bank's code sees, mattered more than any\none integration.\u003c/p\u003e\n\u003cp\u003eOnce we had a clean internal vocabulary, adding a fourth wallet\nwould have taken a week, not a quarter. 
We never did add a fourth,\nbut the hypothetical was the proof that the design worked.\u003c/p\u003e\n\u003ch2 id=\"the-thing-nobody-tells-you-about-payment-integrations\"\u003eThe thing nobody tells you about payment integrations\u003c/h2\u003e\n\u003cp\u003eThe UI is the easy part. The first time the M-Syariah app showed\na green tick that said \u0026quot;transfer successful,\u0026quot; it was thrilling.\nThe real work was making that tick \u003cem\u003enot lie\u003c/em\u003e. Under packet loss.\nUnder timeouts. When the wallet is briefly down on a Saturday\nafternoon. When their webhook arrives twice, fifteen minutes\napart. When their webhook never arrives at all, and your\nreconciliation job has to figure it out the next morning.\u003c/p\u003e\n\u003cp\u003eIf the green tick is honest, you've done the hard work. If it's\noptimistic, you're a support ticket waiting to happen. There's no\nthird option.\u003c/p\u003e\n\u003ch2 id=\"lessons\"\u003eLessons\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eTreat reconciliation as a product feature, not an operational\nafterthought. Design it on day one. It's the only thing that\ncatches what live code missed.\u003c/li\u003e\n\u003cli\u003eThe internal contract is the most important part of any\nmulti-provider integration. The adapters are mechanical; the\ncontract is the design.\u003c/li\u003e\n\u003cli\u003e\u0026quot;Idempotent\u0026quot; is a property of \u003cem\u003ethe system\u003c/em\u003e, not just \u003cem\u003ethe call\u003c/em\u003e.\nIt only holds when storage, retries, and consumers all\ncooperate. Any one of them silently retrying breaks the property.\u003c/li\u003e\n\u003cli\u003eTest the refund path on day one of integration, not week three.\nMost of the production outages I saw on payment work were\nrefund-shaped, not authorization-shaped.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eIf I were doing the same work today on a greenfield stack, the\nshape would still be this one. 
Different language, different\ncloud, maybe an event-sourced ledger instead of the postings\nmodel. But the unified-API-with-thin-adapters spine, strong\nidempotency, reconciliation as a feature, those would be on the\nwall on day one.\u003c/p\u003e\n",
      "summary": "Notes from wiring three Indonesian e-wallets into the Bank Mega Syariah core. Idempotency, reconciliation, why the UI is the easy part, and the things I'd still do the same way.",
      "date_published": "2025-03-16T00:00:00Z",
      "tags": [
        "backend",
        "payments",
        "fintech",
        "indonesia"
      ]
    },
    {
      "id": "https://irvineafri.com/blog/the-train-that-taught-me-distributed-systems",
      "url": "https://irvineafri.com/blog/the-train-that-taught-me-distributed-systems",
      "title": "The train that taught me distributed systems",
      "content_html": "\u003cp\u003eWhen someone asks me about distributed systems, the example I keep\nreaching for is a model train. People look at me funny, fair enough.\nBut the project I keep coming back to in interviews, in my head, and\nevery time I draw a state machine on a whiteboard, is a miniature\nrailway I helped build at UGM in 2022.\u003c/p\u003e\n\u003cp\u003eSo here's what that train taught me, in software-people words.\u003c/p\u003e\n\u003ch2 id=\"what-it-actually-was\"\u003eWhat it actually was\u003c/h2\u003e\n\u003cp\u003eA model train you could drive over the web. We sat in a small lab\nin the Faculty of Engineering. There were rails on a desk and a\nRaspberry Pi acting as the brain. You opened a web app, picked a\ntrain, set a speed, switched a light on or off. Down at the rails,\na protocol called Digital Command Control encoded those\ninstructions onto the same pair of wires that carried the power.\u003c/p\u003e\n\u003cp\u003eThe Raspberry Pi was the whole stack. Backend in Go, frontend in\nFlask + Python, hardware loop running off the same board. My\nBachelor's thesis later pushed the work further with a Python\nprototype of the DCC controller proper, hitting millisecond\nprecision on the wire. There's a tiny simulation of the same idea\non this site at \u003ca href=\"/labs/train\"\u003e/labs/train\u003c/a\u003e. It's a toy. The\noriginal was a slightly bigger toy.\u003c/p\u003e\n\u003ch2 id=\"the-lessons-that-travelled\"\u003eThe lessons that travelled\u003c/h2\u003e\n\u003cp\u003eI didn't know the term \u0026quot;distributed systems\u0026quot; yet. Looking back,\nthe lab project was a small, complete one:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLatency budgets are real.\u003c/strong\u003e DCC cares about timing in\nmilliseconds. If your encoder slips, the decoder on the train gets\nconfused, and the locomotive sits there blinking. That's a debugger\nyou can hear. 
Years later, when an SRE complained that p99 was up by\n20ms, I knew exactly what he meant. I'd stood next to a train that\nwent silent because of less.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eState machines beat ad-hoc logic.\u003c/strong\u003e A train can be moving,\nstopped, accelerating, switching tracks, or in an error state.\nThe moment I drew that as a graph and made the transitions\nexplicit, the bugs almost stopped. On every backend project\nsince, I draw the state machine first. It's the cheapest\ndebugging investment I know.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eThe frontend is also distributed.\u003c/strong\u003e A web app that talks to a\nPi running a hardware loop is not the same as a web app that talks\nto a database. The browser doesn't know that. Figuring out what to\nshow the user when the train is \u003cem\u003eprobably\u003c/em\u003e fine but you haven't\nheard back yet is a small version of every distributed-systems\nproblem you'll hit later. \u0026quot;Probably fine\u0026quot; turned out to be the\nwhole job.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHardware tells the truth.\u003c/strong\u003e Software lies all the time. A\nmisbehaving distributed system can hide behind retries and logs.\nA locomotive that sits there blinking can't hide behind\nanything. There's something honest about building systems where\nthe failure mode is visible. I miss it sometimes, working in\nfintech, where most failures stay invisible until reconciliation\nday.\u003c/p\u003e\n\u003ch2 id=\"why-i-keep-talking-about-it\"\u003eWhy I keep talking about it\u003c/h2\u003e\n\u003cp\u003eYears later I work in fintech. I think about ledgers, idempotency,\ntimeouts, reconciliation. 
None of that is very far from a model\ntrain and a state machine on a Pi.\u003c/p\u003e\n\u003cp\u003eThe path from \u0026quot;model train on a desk\u0026quot; to \u0026quot;five-million-MAU lending\napp\u0026quot; is shorter than it sounds, if you stay curious about the\nseams.\u003c/p\u003e\n",
      "summary": "My favourite project is still a model train I helped wire up at UGM in 2022. Here's what a Raspberry Pi and a pair of rails taught me before I knew the word \"microservice\".",
      "date_published": "2023-01-15T00:00:00Z",
      "tags": [
        "career",
        "embedded",
        "distributed-systems",
        "university"
      ]
    }
  ]
}