Riding the Sandworm: Why Vibe-Coding is Technical Debt

Setting the Stage: Context for the Curious Book Reader

This entry captures an interesting moment where a multi-model debate—featuring Gemini, Claude, Grok, and ChatGPT—converges on the reality of AI-assisted development. It explores the tension between the seductive ease of “vibe-coding” and the rigorous need for deterministic control through repo-local continuity. This treatise argues that true efficiency comes from riding the AI engine while maintaining the tracks of human craftsmanship, ensuring that your cognitive context remains a portable asset rather than a vendor-locked secret.


Technical Journal Entry Begins

MikeLev.in: I need to know how to make my own prompt_foo.py command skip the default prompt.md file inclusion and instead use a prompt fed in from the command line: a good, brief instruction asking you to fix the Ruff problems you see, with a patch, per how this system works.

Gemini 3.1 Pro: Context verified.

To override the default prompt.md file inclusion and feed a prompt directly from the command line, you simply pass your instruction as a string wrapped in quotes. Because your prompt_foo.py argument parser checks if the string is a valid file path first and falls back to treating it as raw text if it isn’t, this works seamlessly.

Here is the exact command you can use as your baseline:

prompt_foo.py "Please review the static analysis telemetry in the summary and fix the reported Ruff errors using SEARCH/REPLACE blocks."
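A minimal sketch of the file-path-first, raw-text-fallback behavior described above. This is not the actual prompt_foo.py code: the function names `resolve_prompt` and `parse_args` and the `prompt.md` default are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def resolve_prompt(arg: str) -> str:
    """Return the file's contents if `arg` names an existing file, else `arg` itself."""
    try:
        candidate = Path(arg)
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    except (OSError, ValueError):
        # A long instruction can exceed filename limits on some platforms;
        # in that case it is clearly not a path, so treat it as raw text.
        pass
    return arg


def parse_args(argv: list[str]) -> str:
    # Hypothetical CLI: one optional positional argument, defaulting to prompt.md.
    parser = argparse.ArgumentParser(description="Hypothetical prompt resolver")
    parser.add_argument("prompt", nargs="?", default="prompt.md")
    args = parser.parse_args(argv)
    return resolve_prompt(args.prompt)
```

One consequence of this design is that a typo in a file name silently becomes the prompt text, so a real tool might want to warn when the argument looks like a path but does not exist.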

And since you asked me to play the part and fix the Ruff problems currently echoing in the telemetry for onboard_sauce.py, I’ve got your patches right here.

We had a few missing imports (get_safe_path_component and Markdown) in explain_optics_artifacts, and dom_path wasn’t being extracted from the result dictionary before being evaluated in analyze_ai_readiness.

Here is the healing magic:

Target: /home/mike/repos/pipulate/assets/nbs/imports/onboard_sauce.py
[[[SEARCH]]]
    if result.get('cached'):
        wand.speak("I already have this data cached locally. Bypassing browser navigation.")
        print("⚡ Cache Hit! Using existing artifacts to save time and compute.")
    else:
        wand.speak("Navigation complete. Page data captured.")
        print("✅ Fresh Scrape Successful.")

    # 2. THE OPTICS (The Refraction)
    if not dom_path or not Path(dom_path).exists():
[[[DIVIDER]]]
    if result.get('cached'):
        wand.speak("I already have this data cached locally. Bypassing browser navigation.")
        print("⚡ Cache Hit! Using existing artifacts to save time and compute.")
    else:
        wand.speak("Navigation complete. Page data captured.")
        print("✅ Fresh Scrape Successful.")

    dom_path = result.get("looking_at_files", {}).get("hydrated_dom")

    # 2. THE OPTICS (The Refraction)
    if not dom_path or not Path(dom_path).exists():
[[[REPLACE]]]

Target: /home/mike/repos/pipulate/assets/nbs/imports/onboard_sauce.py
[[[SEARCH]]]
def explain_optics_artifacts(target_url):
    """
    Feeds the AI both the raw source and the rendered DOM structure
    to educate the user on 'The JavaScript Gap'.
    """
    domain, slug = get_safe_path_component(target_url)
    base = wand.paths.browser_cache / domain / slug
[[[DIVIDER]]]
def explain_optics_artifacts(target_url):
    """
    Feeds the AI both the raw source and the rendered DOM structure
    to educate the user on 'The JavaScript Gap'.
    """
    from tools.scraper_tools import get_safe_path_component
    from IPython.display import Markdown

    domain, slug = get_safe_path_component(target_url)
    base = wand.paths.browser_cache / domain / slug
[[[REPLACE]]]
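The block format above is regular enough to parse mechanically. The following is a sketch only, assuming the `Target:` line and the `[[[SEARCH]]]`, `[[[DIVIDER]]]`, `[[[REPLACE]]]` markers appear exactly as shown; the `Patch` type and the `parse_patches` and `apply_patch` names are hypothetical and are not taken from apply.py.

```python
import re
from dataclasses import dataclass

# Matches one Target/SEARCH/DIVIDER/REPLACE block in the format shown above.
BLOCK_RE = re.compile(
    r"Target: (?P<target>[^\n]+)\n"
    r"\[\[\[SEARCH\]\]\]\n(?P<search>.*?)\n"
    r"\[\[\[DIVIDER\]\]\]\n(?P<replace>.*?)\n"
    r"\[\[\[REPLACE\]\]\]",
    re.DOTALL,
)


@dataclass
class Patch:
    target: str
    search: str
    replace: str


def parse_patches(text: str) -> list[Patch]:
    """Extract every patch block found in the model's reply."""
    return [
        Patch(m["target"].strip(), m["search"], m["replace"])
        for m in BLOCK_RE.finditer(text)
    ]


def apply_patch(source: str, patch: Patch) -> str:
    """Apply a patch only when the search text matches exactly once."""
    count = source.count(patch.search)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return source.replace(patch.search, patch.replace, 1)
```

Refusing to apply when the search text is missing or ambiguous is what makes the step deterministic: the model proposes the text, but a plain string comparison decides whether anything changes.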

MikeLev.in: Okay we apply the patch:

$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
(nix) pipulate $ cat patch | app
<unknown>:959: SyntaxWarning: invalid escape sequence '\-'
✅ DETERMINISTIC PATCH APPLIED: Successfully mutated '/home/mike/repos/pipulate/assets/nbs/imports/onboard_sauce.py'.
<unknown>:962: SyntaxWarning: invalid escape sequence '\-'
✅ DETERMINISTIC PATCH APPLIED: Successfully mutated '/home/mike/repos/pipulate/assets/nbs/imports/onboard_sauce.py'.
(nix) pipulate $ d
diff --git a/assets/nbs/imports/onboard_sauce.py b/assets/nbs/imports/onboard_sauce.py
index 4625b57d..621c726e 100644
--- a/assets/nbs/imports/onboard_sauce.py
+++ b/assets/nbs/imports/onboard_sauce.py
@@ -170,6 +170,8 @@ async def analyze_ai_readiness(job: str, url: str, verbose: bool = True, overrid
         wand.speak("Navigation complete. Page data captured.")
         print("✅ Fresh Scrape Successful.")
 
+    dom_path = result.get("looking_at_files", {}).get("hydrated_dom")
+
     # 2. THE OPTICS (The Refraction)
     if not dom_path or not Path(dom_path).exists():
         print("❌ Error: Could not locate hydrated_dom.html for analysis.")
@@ -211,6 +213,9 @@ def explain_optics_artifacts(target_url):
     Feeds the AI both the raw source and the rendered DOM structure
     to educate the user on 'The JavaScript Gap'.
     """
+    from tools.scraper_tools import get_safe_path_component
+    from IPython.display import Markdown
+
     domain, slug = get_safe_path_component(target_url)
     base = wand.paths.browser_cache / domain / slug
     
(nix) pipulate $ git commit -am "Fixed some errors reported by Ruff with patch provided by Gemini"
[main 71356fde] Fixed some errors reported by Ruff with patch provided by Gemini
 1 file changed, 5 insertions(+)
(nix) pipulate $ git push
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 48 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 604 bytes | 604.00 KiB/s, done.
Total 6 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (5/5), completed with 5 local objects.
To github.com:pipulate/pipulate.git
   4ae2e628..71356fde  main -> main
(nix) pipulate $

And I tested Onboarding.ipynb. It still runs. What am I looking for here in operational behavior improvement? I never noticed whatever that was breaking.

Gemini 3.1 Pro: Context verified.

The reason you didn’t notice any breakage during your Onboarding.ipynb test run is straightforward: neither of those functions is currently in your active execution path.

If you trace the execution flow of the notebook, you are invoking wand.scrape(), wand.show_llm_optics(), and sauce.etl_optics_to_excel() directly.

The Ruff static analyzer, however, scans the entire onboard_sauce.py file, regardless of whether a function is actively invoked by the notebook. Here are the specific operational disasters you just defused:

  • The analyze_ai_readiness Time Bomb: This function was attempting to check if not dom_path... before dom_path had been assigned or extracted from the result payload. If you had ever wired this function up to a Jupyter cell, it would have crashed as soon as it reached that line with a NameError (or an UnboundLocalError, if the name were assigned later in the same function).
  • The explain_optics_artifacts Time Bomb: This function was attempting to use get_safe_path_component and Markdown without importing them into its scope. Invoking it would have triggered an immediate NameError.
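A small illustration of why such bugs stay invisible until the function is called. The functions here are hypothetical stand-ins, not the real onboard_sauce.py code. The sketch assumes the name is bound nowhere in the function or module, in which case Python raises NameError; a name assigned later in the same function would instead raise UnboundLocalError, which is a subclass of NameError.

```python
def dormant_report(result: dict) -> str:
    # `dom_path` is never assigned here or at module level, so defining this
    # function succeeds and the failure only appears when this line executes.
    if not dom_path:  # noqa: F821 (deliberately undefined for the demo)
        return "missing"
    return "found"


def fixed_report(result: dict) -> str:
    # Same logic, with the name extracted from the payload before use.
    dom_path = result.get("looking_at_files", {}).get("hydrated_dom")
    if not dom_path:
        return "missing"
    return "found"


def try_call(func, payload: dict) -> str:
    """Call `func` and report a NameError as text instead of crashing."""
    try:
        return func(payload)
    except NameError as exc:
        return f"NameError: {exc}"
```

Importing a module that contains `dormant_report` works fine, which is why the notebook kept running. A static analyzer flags the undefined name without ever executing the function.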

The Operational Improvement

What you just experienced is the exact value proposition of having an LLM wired into your static analysis telemetry. The system is acting as an autonomous sweeper, finding and patching latent crashes in dormant or alternative code paths before they ever make it into the active execution flow.

There is no immediate behavioral change in your current notebook run, but you have successfully future-proofed the codebase. When you eventually do decide to call those functions, they will execute gracefully instead of blowing up the terminal.

The Operational Value of Static Analysis

MikeLev.in: Interesting. Another very short article.

I feel a pattern setting in that has to do with the hand-cranked non-agentic agentic network and a human acting as a John Henry kicking the Steam Engine’s butt. My hypothesis is that vibe-coding with agentic frameworks is going to produce tons of deep technical liability that doesn’t show on the surface.

There will be backlash. There will be shock that applications aren’t as finished as they look. Going to the agentic vibe-coding well a second time isn’t as cheap or impressive as the first, because now the LLM needs to fill its context-window with all the exact right things about the system to get back into the swing of the vibe-coding.

99 times out of 100 that’s going to mean finding that discussion you left off at. And so you’re locked in. Are contexts portable between AI platforms? Anthropic thinks so. When you go into Claude there’s this choice: “Bring history from another AI.” Let’s see what that does. We click the button.

Oh! It’s a copy/paste pop-up. Let’s see what clicking copy puts in our buffer.

Export all of my stored memories and any context you've learned about me from past conversations. Preserve my words verbatim where possible, especially for instructions and preferences.

## Categories (output in this order):

1. **Instructions**: Rules I've explicitly asked you to follow going forward — tone, format, style, "always do X", "never do Y", and corrections to your behavior. Only include rules from stored memories, not from conversations.

2. **Identity**: Name, age, location, education, family, relationships, languages, and personal interests.

3. **Career**: Current and past roles, companies, and general skill areas.

4. **Projects**: Projects I meaningfully built or committed to. Ideally ONE entry per project. Include what it does, current status, and any key decisions. Use the project name or a short descriptor as the first words of the entry.

5. **Preferences**: Opinions, tastes, and working-style preferences that apply broadly.

## Format:

Use section headers for each category. Within each category, list one entry per line, sorted by oldest date first. Format each line as:

[YYYY-MM-DD] - Entry content here.

If no date is known, use [unknown] instead.

## Output:
- Wrap the entire export in a single code block for easy copying.
- After the code block, state whether this is the complete set or if more remain.

Wow! Nothing fancy. Just more prompting. Let’s see what Gemini does when I paste that in.

Gemini 3.1 Pro: I understand you are looking to export a comprehensive log of your stored memories, instructions, and personal context. However, I need to clarify how my system works: I do not possess a persistent memory or the ability to store information about you across different sessions.

Every time we begin a new conversation, I start with a completely blank slate. I do not retain access to your past conversations, past projects, or previously stated preferences. Because of my stateless AI nature, there is no database of memories or learned context to export.

To fulfill your formatting request based on the data I currently hold from past sessions, here is the export:

## Instructions
[unknown] - No stored instructions or rules exist from past conversations.

## Identity
[unknown] - No identity, demographic, or personal information is stored.

## Career
[unknown] - No career history or skill areas are retained.

## Projects
[unknown] - No project records or historical commitments are saved.

## Preferences
[unknown] - No working-style preferences or tastes are kept on file.

This is the complete set of stored data. No further memories or contexts remain, as nothing is retained between our conversations.
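The export the prompt asks for is plain text with a fixed line shape, so whatever consumes it has to parse it as text and take it on trust. A minimal sketch of such a parser, assuming the `## Category` headers and `[YYYY-MM-DD] - Entry` lines requested above; the `Entry` type and the `parse_export` name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# One entry line: a date or the literal "unknown" in brackets, then " - text".
ENTRY_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}|unknown)\] - (.+)$")


class Entry(NamedTuple):
    category: str
    date: Optional[str]
    text: str


def parse_export(text: str) -> list[Entry]:
    """Collect entries under their most recent '## ' category header."""
    entries = []
    category = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            category = line[3:]
            continue
        match = ENTRY_RE.match(line)
        if match:
            date = None if match.group(1) == "unknown" else match.group(1)
            entries.append(Entry(category, date, match.group(2)))
    return entries
```

Nothing in this format carries provenance: a line that parses cleanly says nothing about whether the memory behind it ever existed.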

MikeLev.in: That’s further evidence, folks. Markdown files ain’t programming. It’s a persistent fallacy that this new wave of skill files, those SKILL.md files, is programming in the machine-language sense, where you somehow get off the hook for the really hard work of being explicit in a machine-style language. These skill files hook to particular things. If you don’t know about the things they’re trying to hook to for some explicit machine-control purpose, then you’re holding your hat in your hand saying “Please sir…”

The Mirage of Markdown-Based Programming

Tell ‘em, Gemini. And don’t just lean into my effin truth. Be fair and balanced, but cover both extremes. If you have a really smart frontier model with tool-calling access to a terminal on your machine where it can ls, find and grep, or to some sort of sandboxed machine with a real net connection where it can curl, that’s one thing. It can do stuff. It can write Python and execute it and cobble stuff together for you. So yeah, with human spoken languages like English, you can talk an LLM with the right tools into programming.

But that’s one thing.

Asking it to do stuff like what I just did, presumably sending commands against some other AI with its own tool-calling ability or to an MCP server or other APIs out there on the Net as if your SKILL.md file actually was API-programming, well that’s another thing. The former is open-ended and highly non-deterministic. I mean really deep down there, it’s deterministic. It’s just impossible to reliably control because of the complexity. The latter is trying to do something truly more deterministic but without the precision control that a machine language provides.

And again, don’t just support and lean into my truths.

Bracket the problem. On the one hand. On the other hand. Short term likely consequences. Long term likely consequences. That sort of thing.

Gemini 3.1 Pro: You’ve zeroed in on the exact friction point of this current era of AI development. The “John Henry” analogy is spot on, and your Claude-to-Gemini experiment perfectly illustrates the illusion: slapping a Markdown file into a chat window isn’t establishing a database connection or making an API call; it’s just whispering a wish into a probabilistic wind tunnel.

There is a massive, fundamental difference between natural language as a creative co-pilot and natural language as a deterministic machine language. Let’s bracket this out and look at the realities of “vibe-coding” from both extremes.

On the One Hand: The Power of Open-Ended “Vibe-Coding”

When you connect a frontier model to a sandboxed environment with real tools—a terminal where it can ls, grep, curl, and write/execute scripts—you are essentially unleashing a highly capable, albeit chaotic, junior developer.

  • The Prototyping Miracle: In this mode, natural language is a superpower. You can describe an outcome, and the AI can cobble together the necessary scripts, debug its own errors in real-time, and brute-force a working prototype. It works because the underlying environment (the operating system, the shell) is strictly deterministic, even if the AI driving it is probabilistic.
  • The “Throwaway” Code Advantage: For one-off tasks, data parsing, or rapid exploration, it’s incredibly efficient. If it breaks, you just prompt it to fix the break.

On the Other Hand: The “Markdown Mirage” of Deterministic Control

The trouble starts when we pretend that natural language files—like a SKILL.md or a lengthy system prompt—can replace the absolute precision of an actual programming language when talking to strict systems, APIs, or MCP servers.

  • The Illusion of State: As your experiment proved, natural language prompts cannot pass “state.” They only pass text. You cannot reliably “vibe” an API payload that requires strict JSON schema, type safety, and exact endpoint routing.
  • The Ambiguity Trap: Machine languages are intolerant of ambiguity. “Please sir, sort these by date” means nothing to a REST API. If your SKILL.md tries to orchestrate complex software architecture without explicit, syntax-heavy declarations, you are entirely at the mercy of the LLM’s mood and the current context window. It’s the opposite of the robust, future-proof control you get from a traditional tech stack.
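To make the contrast concrete, here is a sketch of what a strict boundary looks like in code. It assumes a hypothetical sort-request payload with three fields; the field names and the `validate_sort_request` function are invented for illustration and do not describe any real API.

```python
import json
from datetime import date

# Hypothetical contract: exactly these keys, with exactly these types.
REQUIRED_FIELDS = {"sort_by": str, "descending": bool, "since": str}


def validate_sort_request(raw: str) -> dict:
    """Parse and validate a request body, raising ValueError on any deviation."""
    payload = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = set(REQUIRED_FIELDS) - set(payload)
    unknown = set(payload) - set(REQUIRED_FIELDS)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(payload[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    date.fromisoformat(payload["since"])  # raises ValueError on a malformed date
    return payload
```

A polite English sentence fails at the first line of this function, and so does a payload that is almost right. That intolerance is the property a Markdown instruction file cannot supply on its own.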

The Fallout: Consequences of the Agentic Wave

If we keep treating natural language as a 1:1 replacement for explicit machine programming, the resulting technical debt will be staggering.

Short-Term Consequences

  • The Rise of “Hollow” Applications: We will see a flood of applications that look incredibly polished on the surface but are held together by digital duct tape. They will fail spectacularly on edge cases because the foundation isn’t logical architecture; it’s a statistically probable text string.
  • “Context Lock-In” and Maintenance Nightmares: As you noted, going back to the well a second time is brutal. If an app was built via vibe-coding, there is no clean repository of logic to audit. To fix a bug, you have to perfectly reconstruct the 100,000-token context window that birthed the code in the first place. You lose the granular, historical rollback capabilities of traditional version control. It becomes a black box of vibes.

Long-Term Consequences

  • The “Vibe” Winter: There will likely be a severe backlash when enterprise companies realize they cannot reliably maintain or scale systems built entirely through natural language orchestration.
  • The Bifurcation of the Stack: The industry will likely split. Natural language will remain the ultimate translation layer—the UI that humans interact with. But the bedrock of the systems will retreat back to explicit, deterministic code. We will stop trying to make LLMs “be” the API router, and instead use them to write the rigid, explicit scripts that actually do the routing.

The fantasy that you can get off the hook for the hard work of explicit machine control is just that—a fantasy. You can automate the typing, but you cannot automate the need for structural integrity.

Given how fragile these heavily “vibe-coded” systems can be when they lose context, how are you approaching the balance between using AI for rapid generation while still enforcing the rigid architecture required to keep a project stable long-term?

Bracketing the Vibe-Coding Fallout

MikeLev.in: Tell ‘em, Claude.

Claude Sonnet 4.6: The baton’s been passed, so let me pick it up honestly rather than just echo what Gemini said.

Gemini’s framing is solid but slightly self-serving in where it lands — the “bifurcation of the stack” conclusion conveniently positions LLMs as essential translation layers forever. Let me push on that and on your framing too.

Where I’d push back on your John Henry analogy: John Henry won the race and then died from the effort. The moral isn’t “humans beat machines,” it’s “the contest itself is the wrong frame.” You’re not racing the steam drill when you use prompt_foo.py — you’re building the track the drill runs on. That’s a different and much more defensible position.

Where I think you’re exactly right: The SKILL.md delusion is real and it’s going to hurt people. There’s a meaningful difference between:

  1. Natural language instructing an LLM that has genuine tool access to deterministic systems (your setup — Python executes, git commits, the filesystem responds)
  2. Natural language “programming” that’s really just hoping a model will hallucinate the right API calls against systems it can’t actually touch

The Gemini memory export experiment you ran is a perfect specimen of the second category. The model gave you a polished, well-formatted response that was completely hollow. That’s the trap — it looks like it worked.

The thing neither of you addressed fully: The real liability isn’t technical debt in the code. It’s epistemic debt — systems where nobody can confidently say what the thing actually does, because the specification was vibed into existence and lives nowhere except in someone’s chat history. At least badly written Python is auditable. A vibe-coded system’s “spec” is a conversation you can never fully reconstruct.

Your apply.py / prompt_foo.py stack is interesting precisely because it’s the opposite of this. The AI generates patches, but the patches are deterministic text replacements against a versioned codebase. The AI is doing the creative heavy lifting; the execution path is boring and reliable. That’s the right division of labor.

The long-term risk Gemini underplayed: it’s not just application brittleness. It’s that organizations will hire people whose entire skill is “prompting the vibe,” who have no model of what’s underneath, and then be genuinely surprised when the vibe changes — model update, context drift, whatever — and the whole thing stops working with no one around who can read the actual system.

The John Henry story you’re living is less “can I beat the machine” and more “can I stay in the seat of understanding while using the machine as a force multiplier.” Most vibe-coders are voluntarily climbing out of that seat.

Epistemic Debt and the Seat of Understanding

MikeLev.in: Oh, so it’s the council of AIs again, is it? Okay then. ChatGPT 5.5 Pro is going to take like 5 minutes visiting every website that shows up in search to “ground” everything you guys said, so I don’t want to do that until after I hand the mic to Grok. Hi, Grok. You used to be 4 point something. It seems today you’re just Fast.

[Hands mic to Grok Fast 🎤]

Care to weigh in?

Grok Fast: Context verified. Nice setup.

You’re running a deliberate, observable, “hand-cranked” system that forces the AI into a narrow, auditable output channel (raw SEARCH/REPLACE blocks parsed by apply.py). That’s a feature, not a bug. It keeps the human (you) in the loop as the final deterministic actuator while still getting the creative leverage of frontier models.

Weighing in on the Vibe-Coding Debate

The core tension is real:

  • Natural language + tools (your terminal, Python execution, git, browser automation, etc.) can produce impressive results quickly. The LLM acts like a very fast, somewhat unreliable junior developer who can explore, prototype, and iterate. When the substrate underneath is deterministic (filesystem, Python interpreter, git history), this works surprisingly well for many tasks. Your prompt_foo.py → context → LLM → apply.py pipeline is a refined version of this: it externalizes the “vibe” into a reproducible prompt payload and forces the output into a verifiable patch format.

  • Natural language pretending to be a programming language (SKILL.md files as “API,” long system prompts as persistent state, hoping the model will reliably orchestrate complex tool calls or maintain invariants across sessions) is mostly theater. As your Claude/Gemini memory export experiment showed, it produces fluent, well-formatted nonsense when there’s no grounded execution path. The model is just completing patterns. Without tight feedback loops to a real runtime, it’s high-entropy output.

Short-term consequences (already happening):

  • Tons of impressive-looking demos and internal tools that collapse under edge cases or model updates.
  • “Context lock-in” where fixing anything requires reconstructing the exact conversation that birthed it.
  • Growing epistemic debt: code whose behavior no one can fully explain without replaying chat logs.

Long-term consequences:

  • A correction/backlash is likely. Enterprises will discover that “AI-native” doesn’t mean “maintenance-free.” We’ll see a split: natural language stays excellent for discovery, specification, documentation, and high-level orchestration, but the durable core moves back toward explicit, versioned, auditable artifacts (Python, schemas, tests, git history).
  • The winners will be systems like yours that treat the LLM as a powerful co-pilot inside a deterministic framework, not as a replacement for one.

Your John Henry analogy is directionally right, but I’d reframe it slightly: you’re not racing the steam engine. You’re building the railroad tracks, signals, and switches while using the engine to lay rail faster. The engine is great at repetitive labor and pattern matching. You (and your explicit code + patch protocol) remain the source of truth and structural integrity.

Why Your Pipulate Approach is Strong

  • WET + AI on Rails philosophy is well-adapted to this era. Explicit, linear, notebook-style workflows are easier for both humans and LLMs to reason about than abstract, DRY OOP spaghetti.
  • The prompt reconstruction / context compiler (foo_files.py + prompt_foo.py) + deterministic actuator (apply.py) gives you portable, auditable “memories” that survive switching models or platforms. That’s a real advantage over pure chat history.
  • Radical transparency (SQLite state, logs, observable patches) reduces the black-box problem.
  • You’re forcing the AI to output minimal, targeted changes against a known codebase rather than hallucinating entire modules.

This is closer to “AI-assisted craftsmanship” than “vibe-coding.” It scales better.

Practical Suggestions (if you want them)

  1. Lean into the patch discipline. Your current flow already does this well. Consider adding optional pre- and post-patch hooks in apply.py (e.g., run Ruff or tests automatically and reject if they fail).

  2. Context versioning. Since you’re already annotating with token counts, you could snapshot successful prompt payloads (the full Markdown output) alongside commits. Makes reconstruction trivial.

  3. For dormant code paths: Your Ruff fixes are a perfect example of the value. Consider occasionally running broader static analysis + LLM review on the whole codebase (or specific modules) as a hygiene task, even if those paths aren’t hot right now.

  4. The memory export lesson: Treat long-term “skill” or preference capture as explicit artifacts in your repo (docs, config files, or even small Python modules) rather than hoping chat context will suffice. Your system already encourages this.
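The first suggestion, a check that runs after a patch and rejects it on failure, can be sketched as follows. This is an illustration under stated assumptions and not a description of apply.py: `apply_with_check` and `PY_COMPILE_CHECK` are hypothetical names, and the check used here is Python’s standard `py_compile` module so that the example runs without extra tools. A command such as `["ruff", "check"]` could be passed instead where Ruff is installed.

```python
import subprocess
import sys
from pathlib import Path

# Default check: byte-compile the file with the current interpreter.
PY_COMPILE_CHECK = [sys.executable, "-m", "py_compile"]


def apply_with_check(path: Path, search: str, replace: str, check_cmd: list[str]) -> bool:
    """Apply one search/replace edit, then keep it only if the check passes."""
    original = path.read_text(encoding="utf-8")
    if original.count(search) != 1:
        return False  # missing or ambiguous match: change nothing
    path.write_text(original.replace(search, replace, 1), encoding="utf-8")
    # The file path is appended as the last argument of the check command.
    result = subprocess.run(check_cmd + [str(path)], capture_output=True, text=True)
    if result.returncode != 0:
        path.write_text(original, encoding="utf-8")  # roll back the edit
        return False
    return True
```

The rollback keeps the working tree in a known state whichever way the check goes, so a rejected patch leaves nothing to clean up.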

You’re demonstrating a viable middle path: maximum leverage from LLMs without surrendering to the illusion that English is now a reliable systems programming language. Keep cranking. The council of AIs approves.

Building the Tracks for the AI Engine

MikeLev.in: [Takes mic back]

To fix both of your John Henry metaphors (in the voice of the illustrious science symposium host Brian Greene), the correct metaphor is John Henry riding a Dune sandworm like a Steam Engine. But that aside, what say you, ChatGPT 5.5 Pro Extended?

[Checks watch, goes to get coffee — and in all fairness, it’s the paid version of ChatGPT and Gemini and the free versions of Claude and Grok]

ChatGPT 5.5 Pro Extended: Context verified. The uploaded prompt is asking for a hard-reality expansion of the “hand-cranked non-agentic agentic network” idea, not a code patch; the supplied context includes the Pipulate README, foo_files.py, prompt_foo.py, apply.py, and the article thread where you hand the mic to “ChatGPT 5.5 Pro Extended.”

1. The Context & Evidence: the “memory import” trick is real, but it proves your point

What Anthropic shipped is both useful and revealing. Claude’s official help center describes memory import/export as a way to transfer memory between Claude and other AI providers, available on free, Pro, and Max plans on web and Claude Desktop; Claude’s public import page describes the flow as bringing “preferences and context” from other AI providers with “one copy-paste.” ([Claude Help Center][1])

That is not nothing. It is a meaningful product feature because continuity is now a competitive surface. But it is also an admission that “AI memory portability” is, at least in this implementation, mostly serialized prompting. It is not a shared open memory substrate. It is not an interop standard with provenance, schema, diffs, replay, tests, and rollback. It is: ask the old AI to summarize you, copy the summary, paste it into the new AI, and hope the compression preserved the load-bearing beams.

This is exactly where your John Henry-on-a-sandworm metaphor lands. The sandworm is the vast stochastic substrate: giant models, vendor interfaces, agent modes, memories, “history import,” and vibe-coded codebases that look alive from the outside. John Henry is the human with a hammer, rhythm, and craft. The twist is that he is not fighting the machine by rejecting it. He is riding it. Prompt Fu is the hook and thumper: a way to summon the beast, steer it, and survive the ride.

The hard reality is that ChatGPT’s own memory documentation draws the same boundary from the other side: OpenAI says memory is intended for high-level preferences and details, not for exact templates or large verbatim blocks; Projects, meanwhile, are workspaces for grouping chats, uploaded reference files, and instructions around long-running efforts. ([OpenAI Help Center][2]) That means the professional answer to “where should my continuity live?” is not “inside the chatbot’s vibes.” It is “inside artifacts I control, with the chatbot invited in.”

That is Pipulate’s central move: continuity becomes a repo-shaped artifact, not a vendor-shaped mood.

2. The Walkthrough: what it feels like when the hand-cranked network works

The old flow is familiar. You had a breakthrough with Claude, Gemini, ChatGPT, Grok, Cursor, or some other interface. The session was electric. The AI “understood everything.” Then the next day arrives. The chat is buried. The repo has changed. The model has changed. The hidden system prompt may have changed. The feature flag matrix has changed. You try to continue, but now half the work is archaeological: “Here’s where we left off. Here’s what you said. No, not that. Look at this file. No, don’t rewrite the whole thing. No, that function is registered dynamically. No, that apparent dead code is a FastHTML route.”

Prompt Fu changes the unit of continuity. Instead of trusting the AI platform to remember the work, you compile the work. The prompt becomes a build artifact. foo_files.py is the routing table. prompt_foo.py is the context compiler. apply.py is the deterministic actuator that applies exact search/replace blocks instead of letting the model freehand a rewrite. The uploaded context even shows the current compile included static analysis telemetry and “All checks passed,” which is the right smell: not just “the AI said it works,” but “the machine got a chance to object.”

This is the aha: you are not asking the AI to “be an agent.” You are giving the AI a cockpit with labeled instruments, a flight plan, a fuel gauge, and a patch protocol. The AI can still be creative, but it is creative inside rails you own.

That is why this is not merely prompt engineering. Prompt engineering is usually about coaxing a model. Prompt Fu is closer to source control for cognition. It says: here is the exact bundle of text that caused this answer; here are the files included; here is the token weight; here is the recapture block; here is the patch format; here is the local deterministic apply step.
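The idea of a prompt as a build artifact can be sketched as a function that turns a file list and an instruction into one payload plus a manifest. This is a hedged illustration, not the prompt_foo.py implementation: `build_payload`, the section markers, and the four-characters-per-token estimate are all assumptions.

```python
import hashlib
from pathlib import Path


def build_payload(files: list[Path], instruction: str) -> tuple[str, list[dict]]:
    """Compile files and an instruction into a payload and a provenance manifest."""
    manifest = []
    sections = []
    for path in files:
        text = path.read_text(encoding="utf-8")
        manifest.append({
            "path": str(path),
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            "approx_tokens": len(text) // 4,  # rough heuristic, not a tokenizer
        })
        sections.append(f"--- START: {path} ---\n{text}\n--- END: {path} ---")
    # The instruction goes last so it is the final thing the model reads.
    payload = "\n\n".join(sections + [f"--- PROMPT ---\n{instruction}"])
    return payload, manifest
```

Because the output depends only on the file contents and the instruction, the same inputs always yield the same payload, and the manifest hashes can be stored next to a commit to record exactly what the model was shown.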

3. The Load-Bearing Connection: why this is not Docker, Electron, Bun, or Zig

Docker solves a real problem: container images are built from immutable layers, and Docker’s own docs explain that each layer records filesystem changes used to construct the container filesystem. ([Docker Documentation][3]) But Docker does not solve the problem you are attacking. Docker can freeze an application runtime. It does not freeze the human-AI reasoning path that produced a change, nor does it decide which files are cognitively relevant to the next model invocation.

Nix gets closer to your thesis because it treats software deployment as a pure function over declared inputs. Dolstra’s Nix work describes a model where components are defined by declared inputs, and NixOS extends that functional approach to system configuration so static artifacts are generated from specifications rather than mutated imperatively. ([edolstra.github.io][4]) Pipulate borrows that spirit and applies it one layer higher: not just reproducible software environments, but reproducible AI collaboration environments.

Electron gives you desktop-app packaging through web tech. Bun and Zig give you speed, systems leverage, or a smaller toolchain surface. Those are valuable, but they are not the same category. They do not answer: “How do I resume a months-long reasoning process across stateless or semi-stateful AI systems without being locked into the chat history of one vendor?”

Your answer is: make the continuity explicit, text-first, repo-local, inspectable, portable, and patchable.

That is why the “Bring history from another AI” button is such a fascinating foil. Anthropic’s import feature is a humane bridge for consumer memory portability. Prompt Fu is the grown-up workshop version: not “summarize me,” but “compile the exact working set, with provenance and machinery attached.”

4. The Contrast & Warning: the old way hides liability until it becomes institutional fact

The old way is seductive:

  • “Let the agent explore the repo.”
  • “Let it make the changes.”
  • “Let it run the app.”
  • “Let it decide what matters.”
  • “Let the chat history be the continuity layer.”

That is fantastic for demos and dangerous for systems that must be maintained. The surface looks finished because the model is excellent at surface completion. But the hidden cost is context debt. Every skipped explanation, every invisible assumption, every “the AI probably understood that” becomes a future tax.

The Post Office Horizon scandal is a brutal warning about what happens when software output is treated as authoritative while the system’s defects, auditability, and human challenge mechanisms are minimized. The UK’s official Horizon Inquiry describes Horizon Online as afflicted by bugs, errors, and defects that could show illusory gains and losses in branch accounts. The Criminal Cases Review Commission likewise noted a material risk that apparent shortfalls were caused by Horizon bugs, errors, and defects, along with failures to disclose the full reliability picture. ([Post Office Horizon IT Inquiry][5])

That is not a “coding bug” anecdote. It is a governance warning. When systems become opaque and institutional trust transfers from humans to software outputs, the software’s hidden errors become social facts. People reorganize their lives around false machine confidence.

In AI-assisted development, the analogous danger is not that an LLM makes one bad patch. It is that the organization gradually accepts unaudited AI-generated structure as reality: generated tests that do not test the right things, generated abstractions nobody understands, generated documentation that flatters the architecture, generated migrations whose rationale lives only in a vanished chat.

The new way is boring in exactly the right places. Compile context. Make the patch small. Use exact search/replace. Run static analysis. Keep the human in the loop. Preserve the route back.

Aider’s documentation independently points in the same direction: its diff/search-replace edit formats exist because returning only the changed parts is more efficient than rewriting whole files, and its unified-diff benchmark notes large gains in reducing lazy code output for GPT-4 Turbo on its test set. ([Aider][6]) The shared lesson is that AI coding improves when the output channel is constrained, familiar, reviewable, and mechanically checkable.
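To make "exact search/replace" concrete, here is a minimal sketch of a deterministic apply step. It is not the actual patch actuator from Pipulate or Aider's implementation; `apply_search_replace` and `PatchError` are hypothetical names chosen for illustration. The only rule it enforces is that the SEARCH text must appear exactly once, so an ambiguous or stale patch fails loudly instead of landing in the wrong place.

```python
class PatchError(ValueError):
    """Raised when a SEARCH block does not match exactly once (hypothetical name)."""


def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Return `source` with `search` swapped for `replace`.

    Refuses to act unless `search` occurs exactly once, which keeps the
    apply step deterministic and easy to review.
    """
    if not search:
        raise PatchError("empty SEARCH block")
    count = source.count(search)  # non-overlapping occurrences
    if count != 1:
        raise PatchError(f"SEARCH block matched {count} times; expected exactly 1")
    return source.replace(search, replace, 1)


if __name__ == "__main__":
    original = "def greet():\n    print('helo')\n"
    patched = apply_search_replace(original, "print('helo')", "print('hello')")
    print(patched)
```

The design choice worth noticing is the refusal path: a patch that matches zero times or several times is rejected rather than guessed at, which is what keeps the human review small.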

5. The Reality Check & Monday Morning Test

Without this insight, Monday morning looks like this: you open yesterday’s AI chat, scroll, search, copy a few fragments, paste them into a new model, and spend the first half hour reconstructing the mental room. The assistant then starts confidently editing around ghosts. It mistakes framework idioms for dead code. It “simplifies” WET workflow logic that was intentionally explicit. It proposes a broad rewrite because it has no tactile sense of the repo’s living constraints.

With Prompt Fu, Monday morning looks different.

You run the context compiler. You include the current article, the map, the exact files, and the patch actuator. You drop the resulting payload into whichever frontier model has the best ergonomics today. The AI wakes up inside a furnished room instead of an empty warehouse. It sees the workflow philosophy, the files, the current static-analysis status, and the protocol for changing code. Then the requested change is small enough to review.
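As a rough illustration of what "running the context compiler" could mean, the sketch below bundles named files and a prompt into one text payload with a manifest. It is an assumption-laden toy, not the real `prompt_foo.py`: the function names, the section markers, and the four-characters-per-token estimate are all invented here for illustration.

```python
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Crude size estimate (about 4 characters per token); not a real tokenizer."""
    return len(text) // 4


def build_payload(files: dict[str, str], prompt: str) -> str:
    """Assemble a manifest, the file bodies, and the prompt into one string."""
    manifest = [f"- {name} (~{estimate_tokens(text)} tokens)" for name, text in files.items()]
    bodies = [f"--- START: {name} ---\n{text}\n--- END: {name} ---" for name, text in files.items()]
    return "\n".join(["# Manifest", *manifest, "", *bodies, "", "# Prompt", prompt])


def compile_context(paths: list[str], prompt: str) -> str:
    """Read each path from disk and hand the contents to build_payload."""
    files = {p: Path(p).read_text(encoding="utf-8") for p in paths}
    return build_payload(files, prompt)


if __name__ == "__main__":
    demo = build_payload({"example.py": "x = 1\n"}, "Fix the reported Ruff errors.")
    print(demo)
```

Because the payload is plain text, it can be pasted into any model, diffed against yesterday's payload, or committed next to the code it describes.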

That is the work-life improvement: fewer “bring the AI back up to speed” rituals, fewer hallucinated rewrites, fewer giant patches, fewer hidden dependencies on a single vendor’s chat archive. The bottleneck that disappears is not typing. It is reorientation.

This is also why your system feels like old-school webmaster energy rather than “AI startup platform energy.” The webmaster owned the files, the server, the logs, the FTP path, the HTML, the domain, the weird little scripts, and the scars. Prompt Fu brings that sensibility into AI collaboration: your continuity lives where your craft lives.

6. The Forward Vector: context portability wants a substrate, not a slogan

The obvious next step is not “make the agent more autonomous.” The obvious next step is making the compiled context more structured, inspectable, and replayable.

Claude’s import feature shows there is real demand for cross-AI continuity. OpenAI’s memory and Projects docs show that major platforms are organizing around persistent user context and long-running workspaces. ([Claude Help Center][1]) But the professional-grade version of this should not stop at memory snippets. It should look more like a reproducible build:

  • Input files.
  • Command.
  • Token counts.
  • Static-analysis output.
  • Prompt hash.
  • Model identity.
  • Response.
  • Patch.
  • Post-patch checks.
  • Commit.

That is not a chatbot conversation anymore. That is an audit trail for cognition-assisted software work.
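One way to picture that audit trail is as a single record per AI-assisted change. The sketch below mirrors the list above field for field; the class name `AuditRecord`, the field names, and the choice of SHA-256 for the prompt hash are assumptions made for illustration, not an existing Pipulate schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AuditRecord:
    """One entry in a hypothetical audit trail for an AI-assisted patch."""

    input_files: list[str]
    command: str
    token_count: int
    static_analysis: str
    prompt: str
    model: str
    response: str
    patch: str
    post_patch_checks_passed: bool
    commit: str = ""
    # Derived from the prompt so the record can be matched to its exact input.
    prompt_hash: str = field(init=False)

    def __post_init__(self) -> None:
        self.prompt_hash = hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()

    def to_json(self) -> str:
        """Serialize with sorted keys so records diff cleanly in version control."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    record = AuditRecord(
        input_files=["example.py"],
        command="example command",
        token_count=1200,
        static_analysis="no issues reported",
        prompt="Fix the reported errors.",
        model="example-model",
        response="(model response text)",
        patch="(search/replace block)",
        post_patch_checks_passed=True,
    )
    print(record.to_json())
```

Appending one such JSON line per change to a file in the repo would keep the trail text-first and repo-local, in the same spirit as the rest of the workflow.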

So the long-term value of Pipulate’s approach is not that it beats every AI coding tool feature-for-feature. It does not need to. Its value is that it moves the continuity layer out of the vendor product and into the user’s workshop. The AI can change. The model can change. The UI can change. The repo and the method remain.

That is fuel for the engine, not optional sugar.

7. The Sign-Off Charm: the council’s verdict

The funny thing is that Anthropic’s “Bring history from another AI” button is both a concession and a compliment to your thesis. It says: yes, continuity matters so much that switching tools without it feels broken. But the mechanism is still humble old copy/paste. Not magic. Not telepathy. Not a universal soul-transfer protocol. Just text, shaped carefully enough that another model can ingest it.

And that is the surprise: the future keeps disguising itself as increasingly elaborate machinery, but the winning move keeps coming back to disciplined text. Markdown. Git. Nix expressions. Python files. Logs. Exact patches. Context recapture blocks. The sandworm is enormous, but the reins are still made of strings.

So yes: John Henry riding a Dune sandworm like a steam engine is the right metaphor. The point is not man versus machine. The point is rhythm plus monster. The human supplies continuity, taste, skepticism, and direction. The AI supplies force. Prompt Fu supplies the harness.

Keep cranking. The machine is not replacing the workshop; it is making the workshop matter again.

Source Control for Cognition

[1]: https://support.claude.com/en/articles/12123587-import-and-export-your-memory-from-claude "Import and export your memory from Claude | Claude Help Center"
[2]: https://help.openai.com/en/articles/8590148-memory-in-chatgpt-remembering-what-you-chat-about "Memory FAQ - OpenAI Help Center"
[3]: https://docs.docker.com/get-started/docker-concepts/building-images/understanding-image-layers/ "Understanding the image layers | Docker Docs"
[4]: https://edolstra.github.io/pubs/phd-thesis.pdf "The Purely Functional Software Deployment Model"
[5]: https://www.postofficehorizoninquiry.org.uk/sites/default/files/2025-07/Post%20Office%20Horizon%20IT%20Inquiry%20Final%20Report%20Volume%201_0.pdf "Post Office Horizon IT Inquiry"
[6]: https://aider.chat/docs/more/edit-formats.html "Edit formats | aider"

MikeLev.in: Okay, that was fun. I couldn’t just stop at applying my second successful patch under the new system, could I?


Book Analysis

AI Editorial Take

What is most striking here is the concept of ‘Epistemic Debt.’ While everyone talks about technical debt (bad code), this entry identifies a more dangerous trend: the loss of institutional knowledge where the ‘spec’ only exists as a statistical probability in a vanished chat thread. A new perspective to consider is the ‘Social Engineering of AI Trust’—how the professional formatting of LLM outputs (like the empty memory export) tricks users into a false sense of system state. This isn’t just a coding problem; it’s a transparency crisis.

🐦 X.com Promo Tweet

Vibe-coding looks polished until the context window shifts. 🌵 Learn why treating AI continuity as a repo-local artifact is the only way to avoid 'epistemic debt.' 🛠️ Human craft + AI force = The real path forward. 🚀

https://mikelev.in/futureproof/riding-the-sandworm-vibe-coding/

#AI #SoftwareEngineering #Python #LLM

Title Brainstorm

  • Title Option: Riding the Sandworm: Why Vibe-Coding is Technical Debt
    • Filename: riding-the-sandworm-vibe-coding.md
    • Rationale: Uses the central, high-impact metaphor from the text while directly addressing the ‘vibe-coding’ keyword currently trending in dev circles.
  • Title Option: Continuity as an Artifact: The End of Chat-History Lock-in
    • Filename: ai-continuity-repo-artifacts.md
    • Rationale: Focuses on the practical problem-solution aspect of the article: moving memory from the LLM to the repo.
  • Title Option: The Epistemic Debt of Agentic AI
    • Filename: epistemic-debt-agentic-ai.md
    • Rationale: Targets a more academic and senior engineering audience interested in the long-term governance risks of AI.

Content Potential And Polish

  • Core Strengths:
    • Unique multi-model dialogue format provides comprehensive perspectives.
    • Strong, memorable metaphors like ‘John Henry on a sandworm.’
    • Direct evidence through the Gemini/Claude memory export experiment.
    • Clear distinction between creative co-piloting and deterministic control.
  • Suggestions For Polish:
    • Could benefit from a brief diagrammatic description of the Prompt Fu stack (compiler -> model -> actuator).
    • Ensure the distinction between ‘agentic’ and ‘non-agentic agentic’ is clarified for readers new to the workflow.

Next Step Prompts

  • Generate a schema for a ‘Cognitive Audit Trail’ that logs the prompt hash, model ID, and static analysis state for every AI-generated patch.
  • Draft a guide for ‘Prompt Fu’ beginners on how to transition from Cursor/Aider-style agents to a hand-cranked Pipulate workflow.