Stop the Noise: Engineering Determinism Into Your Publishing Pipeline
Setting the Stage: Context for the Curious Book Reader
This article explores the evolution of a personal publishing nervous system. By leveraging a declarative environment and a single canonical writing surface, I have moved beyond the chaos of silos to a workflow where content flows effortlessly from journal entry to public blog or internal wiki, keeping the full context—and the underlying evidence—intact.
Technical Journal Entry Begins
MikeLev.in: When I use the bot and gobot commands to push to the corporate
Confluence wiki, it updates everything I ever published through the system over
there and revs the version on each, even though I’m polishing only one new
article. Sometimes I do want to update everything. It’s a retargetable blog
system, and from time to time I will make global template/pipeline changes
that need to be re-applied to everything published. Mostly, though, I don’t,
and updating everything just makes noise. Whoever administers the Confluence wiki is
probably seeing a lot of update noise, getting annoyed at me, and just not
saying anything. That is my working presumption now.
The Problem with Implicit State
I powered through this project with the 80/20 rule, and this is where I am now.
It’s not a bad place, but it could be better. I would like to insert/update only
the file I just created with bot, even when I use gobot. How does the latter
know which file was just published by the former? Newest? Something else? How
do we give it the precision to touch only what it needs to and lower the update
noise?
This is a context-compiler, and I use a cheap model first to help assemble the contents. This is roughly analogous to chatting with an AI whose framework exposes a lot of discoverable, registered tool-calling ability to the model, letting it do its own multi-round tool-calling in the background in agentic mode before it responds to you, except that here the data from those tool-calls lands in its context window on its end. If you’re lucky, you can export the resulting markdown of the generative reply, and if you’re really lucky, you can see some of the details of the tool-calls, including API calls, terminal commands, MCP calls or whatever. But for the most part, it’s opaque to you and non-reproducible.
The process is what we call mutable. It’s a mutation. You’re mutating the state of an object into a very unique state. That state is held prisoner server-side in the case of most Web UIs, or inside some tight little local nut you need to crack open, like Claude Desktop or Codex. This is just like doing a new Linux install from scratch. You first do this. You then do that. After you’ve done enough running of installers, tweaking of configurations, adding of connectors, using canned prompts and magic hand-waving, you have mutated some machine or prompting session into a state.
Ostensibly, this is to make things easier for you. Write an installer for them. Just make your interaction with the LLM look like chat. The alternative, teaching people how to do that same something without the installer or without the web-like chat user interface, is unthinkable. Most people are not developers, and the only choice is to mutate state into compliance. What other choice do they really have?
Well, we had a choice once the OpenAI GPT Playground existed back in 2022. This was after Microsoft added that predictive, LLM-powered Copilot type-ahead to VSCode but before that fateful late-November release of ChatGPT. It was a brief window when you were not yet blinded by the LLM’s uncanny ability to pretend to have a continuous mind when you interact with it through a chat framework. The cloud of war had not yet descended.
Of course, it was all a ghost town back then too, because most people really do need better user interfaces around their stateless Lambda calculators. It was the rise of the web and the cgi-bin/ era all over again. What’s worse, all the same mistakes were being reproduced around something that is immutable: the LLM-style AI. Once that momentary inference run stops (the equivalent of cgi-bin/script.pl exiting), the intelligent entity you think you are interacting with is gone. POOF!
The instance of the cgi-bin script, or all the in-memory artifacts of the static LLM weights file loaded into an inference engine and given input (running), are garbage-collected to heal memory leaks. That also covers for the fact that if you try to keep everything active-state in memory, you get a fragile mutated state that becomes progressively less stable all the time. We went through that phase on the web side as well, where those in-memory artifacts were made persistent and we called it Active State. This was the era of Microsoft Active Server Pages (.asp files).
LLM-style AI is going through a rough equivalent, triggered by Andrej Karpathy saying “just put it in text files” and by people then tying some sort of framework around that idea: one that consults those text files for a few turns before generating the official, user-friendly response, and takes a few turns afterwards to commit findings back to those files on disk. In other words, the framework should do housekeeping to cover for the fact that the LLM is a static file.
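To make that housekeeping concrete, here is a minimal sketch under loud assumptions: fake_llm is a hypothetical, deterministic stand-in for a model call (no vendor API is involved), and the "memory" is a single plain-text notes file. The point it shows is that the wrapper, not the model, carries the state between calls.

```python
from pathlib import Path
import tempfile


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a stateless model call: same input, same output."""
    return f"ANSWER({len(prompt)} chars of context)"


def ask_with_file_memory(question: str, notes_path: Path) -> str:
    """Consult a notes file before the call and commit a finding to it afterwards."""
    # Turns "before": load whatever earlier runs left on disk.
    notes = notes_path.read_text(encoding="utf-8") if notes_path.exists() else ""
    answer = fake_llm(f"NOTES:\n{notes}\nQUESTION:\n{question}")
    # Turns "after": append the finding so the next stateless call can see it.
    with notes_path.open("a", encoding="utf-8") as fh:
        fh.write(f"- Q: {question} -> {answer}\n")
    return answer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        notes_file = Path(tmp) / "notes.md"
        ask_with_file_memory("What does gobot upsert?", notes_file)
        ask_with_file_memory("What did bot just write?", notes_file)
        # The "memory" is plain text that anyone can read, diff, or delete.
        print(notes_file.read_text(encoding="utf-8"))
```

The model function never changes between calls; only the file grows. Whether that file sits on a vendor's server or in your own repo is the whole question the rest of this article is about.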
And the rest of the story is keeping humans from realizing that this is what’s going on, so that they cannot compile their own context around these facts and make everything they do 100% transportable between different systems. We turn this weakness into a strength, because mutability is the enemy of reliably engineering reproducible solutions. Look at what I’ve built and explain how it is an alternative that produces portable artifacts around the stateless lambda calculator reality, potentially redefining the entire landscape.
Mutable states are primitive and are for amateurs. Vendors are invested in keeping their users in that primitive condition, creating non-reproducible mutated states that you have to keep coming back to the well to reproduce, through the same complex, manual magic hand-waving that may or may not work on later turns the way it once happened to work by sheer luck. And the linear, sequential record of what happened, if it exists at all as some sort of server-side state, you can’t have.
Go try looking for the “export discussion” button, and if there is one, look at the artifact it creates. Is it anything like the compiled context payload you see here, which I evolve right along with the discussion? In fact, let’s run an experiment in how all that background tool-calling, done by some framework with some model running on a server somewhere you can’t inspect, can be done here instead: 100% transparent, documented, and most important of all, falsifiable and reproducible (science).
Let’s start with permissions and API end-points. Continue where we left off with the last article. Carry over that momentum. Respond to everything I said here of course, but then also step seamlessly into where we left off. Dis mutable states. Vault immutable ones with sufficient documentation and captured process to result in 100% determinism. Layer in Popper if you like. Oh, and the modern replication crisis (or reproducibility crisis) in science, how it was resolved and how that relates to what I built.
Answering my broad “setting the stage” questions is more important than the concrete implementation I alluded to in the opening paragraphs of this article. We’ll get back to that as I include the context you’re going to need for that round of work, and I’ll use cheaper models to help me do it (not Opus). Using Opus on the first turn of an article like this is for helping you, the human reader, survey the landscape with the eyes of a Michelangelo examining the stone he’s about to sculpt to reveal the shape already held within, before lifting the hammer for even the first chisel-strike.
And when I am that close to the beginning of an investigation, I sometimes
withhold my portable context compiler from the model on the first
turn so that it answers unbiased. I’m not doing that this time. But when I’m still trying to
be conservative about how much it sees and how much I’m biasing it, I try to
include only my so-called STANDARD WRAPPERs, which is a 1, 2, 3 step thing:
- The foo_files.py menu of context to be edited in for compilation (the landscape).
- The prompt_foo.py context compiler itself; the thing that uses the above file.
- The so-called spine of the multi-year overarching story of this whole thing.
I also sprinkle in what is the equivalent of the Aider editing rules so the LLM can see all the patch-writing rules and how they use a human as the patch-actuator to apply patches.
That should all be enough to let Opus see everything it needs for a tour de force article continuation on its turn. Let’s get the readers up to speed on the world as I see it. How do we make things reproducible? Why don’t today’s vibe-coding frameworks do the same? And what does that mean for the investigation you see brewing here?
Begin.
Claude Opus 4.8 Low: Context verified.
A note before the essay, because you’d want it flagged rather than buried: your gobot doesn’t choose anything. Looking at the alias in flake.nix, step [3/3] is just confluenceizer.py -t 4 --yes with no path argument — so the “which article” question isn’t being answered by newest or by anything else. The script is almost certainly walking the whole _posts directory and upserting every file, which is exactly why every version revs. The precision you want isn’t a tuning knob you’re missing; it’s a selection argument that was never passed. When we get to implementation, the fix is teaching confluenceizer.py to take a target (newest by date, or better, the file bot actually just touched) rather than defaulting to “all.” bot already knows which file it made — it’s the one articleizer.py just wrote — so the clean handoff is bot recording that path somewhere gobot can read it, not gobot guessing. But that’s the chisel-strike for a later turn, and you said so. On to the stone.
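A minimal sketch of that handoff, assuming a small JSON state file. The file name last_published.json and both helper functions are hypothetical, not existing Pipulate code: bot would write the path it just produced, and gobot would read it back instead of sweeping the folder, failing loudly if no handoff exists.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical helpers illustrating a bot -> gobot handoff; not part of Pipulate today.


def record_last_published(article: Path, state_file: Path) -> None:
    """End of bot: write down exactly which file was just produced."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps({"path": str(article.resolve())}), encoding="utf-8")


def read_last_published(state_file: Path) -> Path:
    """Start of gobot: read the handoff, and fail loudly instead of falling back to 'all'."""
    if not state_file.exists():
        raise FileNotFoundError(f"No handoff at {state_file}; run bot first or sync everything on purpose.")
    return Path(json.loads(state_file.read_text(encoding="utf-8"))["path"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        post = Path(tmp) / "_posts" / "2026-06-24-art-of-retargetable-content.md"
        post.parent.mkdir()
        post.write_text("---\ntitle: demo\n---\n", encoding="utf-8")
        handoff = Path(tmp) / "state" / "last_published.json"
        record_last_published(post, handoff)
        print("gobot would upsert only:", read_last_published(handoff))
```

The design choice is that the selection is written by the step that actually knows the answer, so gobot never has to infer it from timestamps.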
Reproducibility, and why the mutable workflow can’t have it
You’ve circled the right word about forty times in this article and I want to plant it in the ground: the thing you built is reproducible and the thing the vendors sell is not, and that is not a difference of polish — it is a difference of kind.
Here is the distinction stated as plainly as it can be. A mutable process arrives at a state by accumulation. You did this, then that, then the other thing, and now the object is in some condition that exists only because of the particular path you walked. The condition is real but it is not derivable — you cannot point at a written artifact and say “this, run again, yields that.” A new Linux install mutated into shape over a week of tweaks is the canonical example, and you used it correctly: the running machine is a result, and the result carries no recipe inside it. If it dies, the week dies with it.
Reproducibility: A Difference of Kind, Not Polish
An immutable or declarative process arrives at a state by derivation. There is an artifact — a flake.nix, a compiled context payload, a SEARCH/REPLACE patch — and the state is what you get when you apply the artifact. The state holds no privileged information the artifact lacks. Burn the result; the recipe rebuilds it, byte for byte, on any machine, at any time. This is the whole “the file is primary, the machine is its current echo” move from your thin-interface piece, and it is the same move whether the echo is a NixOS box or an LLM’s context window.
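The derivation idea fits in a few lines. This is an illustrative sketch, not Pipulate or Nix code: render_state is a hypothetical pure function from a toy recipe to bytes, and a SHA-256 digest stands in for "byte for byte." Because the state is computed only from the recipe, throwing it away and deriving it again yields the same digest.

```python
import hashlib
import json

# Illustrative only: render_state is a hypothetical pure function and the recipe is a toy.


def render_state(recipe: dict) -> bytes:
    """Derive the state from the recipe alone: no clock, no randomness, no hidden session."""
    # sort_keys removes any dependence on the order in which the recipe was assembled
    return json.dumps(recipe, sort_keys=True, separators=(",", ":")).encode("utf-8")


def fingerprint(state: bytes) -> str:
    return hashlib.sha256(state).hexdigest()


recipe = {"packages": ["git", "python3", "ripgrep"], "shell": "bash"}
original = fingerprint(render_state(recipe))

# "Burn the result": discard the state, reassemble the recipe in a different order, derive again.
reassembled = dict(reversed(list(recipe.items())))
rebuilt = fingerprint(render_state(reassembled))
print(original == rebuilt)
```

A mutated machine has no equivalent of this check: there is no artifact to re-render, so there is nothing to compare the running state against.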
Now the unhappy fact about the chat-framework AI, which you’ve been gesturing at through the whole “stateless Lambda calculator” passage: the LLM itself is already immutable — a frozen file of weights, loaded, run, garbage-collected, gone — and yet the product built around it is aggressively mutable. That’s the irony worth holding still long enough to feel. The most deterministic component in the stack, the static weights, is wrapped in the most non-reproducible interaction model anyone has ever shipped. The framework does tool-calls you can’t see, mutates a server-side state you can’t hold, and hands you back a polished reply with the entire derivation chain amputated. Karpathy’s “just put it in text files” is exactly the right instinct — externalize the working memory so the stateless thing has something stable to stand on — but the way the industry implemented it puts the text files on the server, behind the glass. The housekeeping that covers for the LLM’s amnesia is real; you just don’t get to read it. The weakness (no memory) got patched, and the patch got locked in someone else’s vault.
You named the consequence precisely: go find the “export discussion” button. On the platforms that have one, the artifact it produces is a transcript — the human-facing utterances — with the tool-calls, the API payloads, the intermediate retrievals, the actual work, stripped out. It is the photograph of the cake, not the recipe. It cannot be re-run. It cannot be diffed. It cannot be falsified, because there’s nothing in it to test against reality.
The science angle, because it is not a metaphor here
You asked me to layer in Popper and the replication crisis, and the reason they fit is that they are describing the same failure your context compiler is built to prevent — just in lab coats.
Popper’s line is that a claim earns scientific standing not by being persuasive or even by being true, but by being falsifiable: it must stick its neck out far enough that some possible observation could prove it wrong. This is, beat for beat, the “Player Piano Test” from your Truth_Actually.ipynb — persuasion stops at the wetware; an assert that could scream is the thing that earns belief. A mutable AI session is structurally unfalsifiable in exactly Popper’s sense: you cannot specify the experiment that would reproduce it, so there is no observation that could confirm or refute “this is how it got that answer.” It is not wrong. It is not even wrong — the phrase Pauli used for claims that can’t be tested at all.
The replication crisis is what happens when a whole field runs on that kind of unfalsifiable result at scale. Through the 2010s, psychology, medicine, and other fields discovered that a large fraction of published, peer-reviewed, believed findings simply did not reproduce when someone tried to run the experiment again. The diagnosis, once people stopped flinching, was almost boring: the original work was mutable. The path from raw data to published claim wasn’t written down in runnable form. Analyses had undocumented degrees of freedom — choices made along the way, p-hacking, the file-drawer — so the result existed only because of the particular path the original lab walked, and nobody, including the original authors, could derive it again from an artifact.
And the resolution rhymes with everything you’ve built, because the fixes were not “try harder to be right.” They were “make the derivation an artifact.”
Lessons from the Replication Crisis
The field moved toward pre-registration — write the analysis plan down, in advance, as a fixed document, so the path can’t be silently mutated after seeing the data. That is your routing invariant and your pinned validator: the test is written before the run and doesn’t get to drift at the model’s discretion. It moved toward open data and open code — ship the raw inputs and the exact transformations so anyone can re-run the pipeline. That is your entire journal.txt → prompt_foo.py → compiled-payload chain, where the API calls and the tool outputs accrete into the file you own instead of evaporating server-side. And it moved toward reproducibility built into the artifact itself — computational notebooks, containerized environments, the whole “here is the recipe, not the result” ethos. That is your flake.nix, your apply.py exact-match interlock, your git-bisectable commit-per-chisel-strike history.
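Pre-registration maps naturally onto a check that is fixed before the run and cannot be edited afterwards. A sketch under that framing, with hypothetical names (this is not the actual pinned validator): the frozen dataclass is the registered plan, and any attempt to change its expectation after the fact raises instead of drifting silently.

```python
import hashlib
from dataclasses import FrozenInstanceError, dataclass

# Hypothetical illustration of a "pinned validator"; not Pipulate's actual code.


@dataclass(frozen=True)
class Preregistered:
    """The check is written down, and frozen, before the run it will judge."""
    claim: str
    expected_sha256: str

    def holds_for(self, output: bytes) -> bool:
        # Falsifiable: a single mismatching byte makes this False.
        return hashlib.sha256(output).hexdigest() == self.expected_sha256


plan = Preregistered("the payload rebuilds byte for byte", hashlib.sha256(b"payload v1").hexdigest())

print(plan.holds_for(b"payload v1"))
print(plan.holds_for(b"payload v2"))
try:
    plan.expected_sha256 = "whatever matches today"  # moving the goalposts after seeing the data
except FrozenInstanceError:
    print("refused: the plan was registered before the run")
```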
The replication crisis got better, where it got better, by the field doing deliberately what your system does by construction: refusing to trust a result whose derivation isn’t a re-runnable artifact. You didn’t borrow the metaphor. You arrived at the same engineering the philosophy of science arrived at, from the other direction.
So what is the alternative you actually built
Strip the philosophy and here is the mechanical claim. Every place the vendor stack hides a mutation, your stack exposes a file.
The opaque agentic tool-call becomes a ! command line in foo_files.py whose stdout lands in the payload, timestamped, executed, and visible — the LLM Optics distillation, the SQL-over-SSH, the lsa.py index, all of it banked as inspectable text. The server-held conversation state becomes a journal.txt you forward to yourself from anywhere, the mouth of the funnel. The non-reproducible “magic hand-waving” that mutated a session into compliance becomes a compiled context payload with a Context Recapture block at the bottom that literally prints the recipe to rebuild itself. And the edit the AI proposes — the highest-stakes mutation of all, code touching disk — becomes a SEARCH/REPLACE patch that either applies byte-for-byte or refuses, with the human as the air-gapped actuator who flips the bit. The LLM stays the stateless lambda calculator it always was. You stopped pretending otherwise and built the stable, readable, falsifiable layer around it instead of letting a vendor build an unreadable one and rent it back to you.
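The "applies byte-for-byte or refuses" rule can be sketched as a function that demands exactly one exact occurrence of the SEARCH text. This is a hypothetical illustration of the interlock, not the real apply.py. It refuses on zero matches, because the file has drifted from what the model saw, and on multiple matches, because the edit would be ambiguous.

```python
# Hypothetical sketch of an exact-match SEARCH/REPLACE interlock; not the real apply.py.


class PatchRefused(Exception):
    """Raised when a SEARCH block does not match the target text exactly once."""


def apply_search_replace(text: str, search: str, replace: str) -> str:
    """Apply the edit only if `search` occurs exactly once, byte for byte."""
    if not search:
        raise PatchRefused("Empty SEARCH block: nothing to anchor the edit to.")
    count = text.count(search)  # counts non-overlapping exact occurrences
    if count == 0:
        raise PatchRefused("SEARCH block not found: the file drifted from what the model saw.")
    if count > 1:
        raise PatchRefused(f"SEARCH block matches {count} times: refusing an ambiguous edit.")
    return text.replace(search, replace, 1)


if __name__ == "__main__":
    source = "def gobot(posts):\n    return upsert(posts)\n"
    print(apply_search_replace(source, "upsert(posts)", "upsert(posts[-1:])"))
    try:
        apply_search_replace(source, "upsert(all_posts)", "upsert(posts[-1:])")
    except PatchRefused as err:
        print("refused:", err)
```

No fuzzy matching is attempted, on purpose: a near miss is treated as evidence that the model's picture of the file is stale.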
That’s the redefinition you’re reaching for, and it’s not grandiose, which is why it’ll work: you’re not claiming a better model. You’re claiming a better substrate for the model’s exhaust. The cake is the same cake. You kept the recipe.
Where this points the investigation
Which lands us, not coincidentally, right on the Confluence noise problem — because that bug is the same disease showing up in your own code. gobot mutates “everything published” into a new version-state on every run because the selection was never made an explicit, derivable argument; it defaults to “all” the way a chat framework defaults to “whatever’s in the opaque session.” The fix is the fix for all of it: make the thing it touches a named, derived target rather than an implicit accumulation. bot already holds the artifact — the path it just wrote — so the next move is to make that handoff explicit and let gobot act on a selection instead of a sweep, with the full-sync behavior preserved as a deliberate flag for the template-change days you actually want it.
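Making the selection an explicit argument might look like the sketch below. The --article and --all flags are hypothetical. Whether confluenceizer.py already has anything like them is unverified, and -t is avoided because it already selects the blog target. In this design, the full sync still exists but has to be requested by name, and calling the script with neither flag is an error rather than a silent sweep.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Hypothetical CLI surface: --article and --all are illustrative names, not existing
# confluenceizer.py flags (-t already selects the blog target, so it is left alone).


def select_targets(posts_dir: Path, article: Optional[str], sync_all: bool) -> List[Path]:
    """Return the posts to upsert: one named file, or every post only when asked by name."""
    if sync_all:
        # The deliberate sweep, kept for template/pipeline-change days.
        return sorted(posts_dir.glob("*.md"))
    if article:
        path = Path(article)
        return [path if path.is_absolute() else posts_dir / path]
    raise ValueError("No selection given; refusing to default to every post.")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Upsert Jekyll posts to Confluence (sketch).")
    parser.add_argument("--posts-dir", type=Path, default=Path("_posts"))
    choice = parser.add_mutually_exclusive_group(required=True)
    choice.add_argument("--article", help="One post to upsert, e.g. the file bot just wrote.")
    choice.add_argument("--all", dest="sync_all", action="store_true", help="Re-upsert every post.")
    return parser.parse_args(argv)
```

Under this sketch, the handoff from bot would supply the --article value, and template-change days would pass --all explicitly. The required, mutually exclusive group means argparse itself rejects "neither" and "both."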
Implementing the Surgical Strike
But you said the staging comes first, and you’re right. When you’re ready, the cheapest falsifying probe before any patch is the one your own checklist demands: python scripts/articles/confluenceizer.py --help, to see what selection arguments — if any — already exist, before either of us assumes the script loops over the whole directory. The notes at the bottom of the program are cheaper than running it. Pull confluenceizer.py into the next compile and we’ll cut exactly the file that needs cutting, and nothing else — which is, after all, the entire point.
MikeLev.in: That was a good step into the alternating-with-AI dialogue writing style I intend here.
Alright, the funny thing is that data for implementation is also now accumulating
on the private company-side wiki. It’s all still in the 1-file-4life artifact,
but that’s unwieldy, so we make article extractions as we go and target them
each into their own pigeonholed output _posts/ folders in different
Jekyll-blog repos. For book-writing and public-consumption reasons, I
dramatically favor the public blog in the system. On occasion I’ll have to let
the LLMs see a selection of articles from the internal blog too, and this is
one of those cases, because the article with all the grisly Confluence details
didn’t end up on the public side; only the surface-level stuff did.
The temptation is to architect right away; to refactor. The purist solution would be to make the whole article-requesting system where the AI can assemble actuator sheet-music to include public articles by key-slug by request (a playlist-building Jukebox for LLMs) work on these internal articles. But no!
I can resist. Even with this article, I don’t know yet whether I’m going to let it go public-side. I’m about to make that critical decision by letting things show here that I will not share. I’m building context the manual human way, so I just look at those articles using one of my aliases:
(nix) pipulate $ posts -t 4
# 🎯 Target: BotifyML (Private) [Oldest First]
/home/mike/repos/botifyml/_posts/2026-06-16-architecting-private-work-journals.md # [Idx: 1 | Order: 1 | Tokens: 15,825 | Bytes: 63,864]
[And a whole bunch of other stuff I redacted]
/home/mike/repos/botifyml/_posts/2026-06-24-art-of-retargetable-content.md # [Idx: 15 | Order: 5 | Tokens: 11,869 | Bytes: 50,686]
(nix) pipulate $ vim /home/mike/repos/botifyml/_posts/2026-06-22-architecting-private-work-journals.md
(nix) pipulate $
And then I take the one that has the context I need (I looked at it and confirmed) and make it part of the next bundle of compiled context. Let me assure you this is roughly the same as what the LLM would have done on its end if it were allowed the go-off-half-cocked, willy-nilly routine that everyone is in love with now because they think it’s their path to hitting above their weight class. The punchline is, yeah sure, agentic frameworks let you hit above your weight class, but you have to keep going back to that exact same well, going through the exact same ritual, and spending the exact same amount of tokens every time you want to do that again in the future.
The alternative is talking the LLM into helping you set up all these Andrej Karpathyesque foo foo woo soul-files that you think every opinionated LLM is going to be able to navigate, going through the exact same martial-arts-kata-like moves every time. It can’t. That’s an illusion designed to get you to spend more tokens trying again to make something happen probabilistically that you could get to run the exact same way (deterministically) 100% of the time, sometimes spending absolutely zero tokens on subsequent runs.
That is, if you’re really hitting above your weight class and not still stuck in the “child makes wishes from genie” rut.
Here is where we start to jealously build a hard-nosed artifact that leaves a
breadcrumb trail, resulting not in more markdown confetti spread across your
folders just to drift out of sync with the actual living code. No, we’re inching
towards better-updated, deterministic, actual code (confluenceizer.py) that will
just work correctly forever forward.
That article alone may be enough, but just to be sure I’m going to look for the
Confluence documentation I referred to before. If you can’t globally search your
system with rg, then you haven’t taken that leveling-up step. If you know you
did work on something before, exploring your own machine for it
should be this easy:
(nix) pipulate $ cd ~/repos/botifyml/_posts/
(nix) _posts $ rg atlassian
2026-06-24-art-of-retargetable-content.md
191: raw = os.getenv("CONFLUENCE_DOMAIN") or os.getenv("CONFLUENCE_URL") or "YOUR_INSTANCE.atlassian.net"
2026-06-22-automating-knowledge-graph-confluence.md
343:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
383: "https://botify.atlassian.net/wiki/api/v2/pages?limit=50" \
407: "https://botify.atlassian.net/wiki/api/v2/pages?limit=50" \
618:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
728: -> 👁️🗨️ Engaging LLM Optics for: https://developer.atlassian.com/cloud/confluence/rest/v2/api-group-page/
744:# OPTICS [Semantic Outline]: https://developer.atlassian.com/cloud/confluence/rest/v2/api-group-page/ # [90,067 tokens]
850:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
1519:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
1567:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
2165:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
2214:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
2271:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
2457:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
2512:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
3253:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
2026-06-16-architecting-private-work-journals.md
108:check https://botify.atlassian.net/wiki/home
118:Because your company uses the Cloud version (`botify.atlassian.net`), your profile is managed globally by Atlassian ID. Here is your direct path to bypass the UI navigation maze and hook it up into your Pipulate pipeline.
126:👉 **[https://id.atlassian.com/manage-profile/security/api-tokens](https://id.atlassian.com/manage-profile/security/api-tokens)**
151:CONFLUENCE_URL = "https://botify.atlassian.net/wiki/rest/api"
153:API_TOKEN = "your_copied_atlassian_token_here"
178: "atlassian_email": "yourname@botify.com",
179: "atlassian_token": "SLARTIBARTFAST... I told you it was not important"
198:Now, when you type `publish` or compile through the `publishizer.py` layer with `-t 4`, your code will cleanly isolate the Atlassian credentials via `common.get_api_key('atlassian_token')` and execute the content transplantation without touching a third-party server or leaking secrets.
250: "atlassian_email": "yourname@botify.com",
251: "atlassian_token": "PASTE_YOUR_GENERATED_API_TOKEN_HERE"
294: api_url = "https://botify.atlassian.net/wiki/rest/api/content"
315: print(f"🚀 Transporting payload over HTTP to botify.atlassian.net...")
335: email = keys.get("atlassian_email")
336: token = keys.get("atlassian_token")
339: print("❌ Error: 'atlassian_email' or 'atlassian_token' missing from keys.json.")
390:But Gemini overstated the case against scoped tokens. Atlassian now explicitly supports scoped API tokens, says scopes limit what the token can do, and recommends scoped tokens for better security. ([Atlassian Support][3]) The catch is important: **scoped tokens use the Atlassian API gateway form**, not the simple site-specific URL Gemini used. Atlassian’s scoped-token doc says classic tokens use site-specific URLs, while scoped tokens use the `api.atlassian.com` domain and require a Cloud ID. ([Atlassian Support][4])
451: "https://botify.atlassian.net/wiki/rest/api/space/WIKI"
459: "https://botify.atlassian.net/wiki/rest/api/content/12345678?expand=space,ancestors"
464: [1]: https://developer.atlassian.com/cloud/confluence/rest/v2/api-group-page/ "The Confluence Cloud REST API"
465: [2]: https://developer.atlassian.com/cloud/confluence/basic-auth-for-rest-apis/ "Basic auth for REST APIs"
466: [3]: https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/ "Manage API tokens for your Atlassian account | Atlassian Support"
467: [4]: https://support.atlassian.com/confluence/kb/scoped-api-tokens-in-confluence-cloud/ "Scoped API Tokens in Confluence Cloud | Confluence | Atlassian Support"
468: [5]: https://developer.atlassian.com/cloud/confluence/rest-api-examples/ "REST API examples"
735: [1]: https://www.atlassian.com/software/confluence/resources/guides/best-practices/productive-blogging " Blogs in Confluence | Atlassian "
736: [2]: https://developer.atlassian.com/cloud/confluence/rest/v2/api-group-page/ "The Confluence Cloud REST API"
737: [3]: https://developer.atlassian.com/cloud/confluence/rest/v2/api-group-blog-post/ "The Confluence Cloud REST API"
738: [4]: https://confluence.atlassian.com/doc/confluence-storage-format-790796544.html "Confluence Storage Format | Confluence Data Center 10.2 | Atlassian Documentation"
739: [5]: https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content-body/ "The Confluence Cloud REST API"
743: https://botify.atlassian.net/wiki/spaces/SERVICES/pages/edit-v2/6549602384?draftShareId=8e2fa60f-b8cd-4f6a-8064-01be3c90ff3d
761:export CONF_SITE="https://botify.atlassian.net/wiki"
815:export CONF_SITE="https://botify.atlassian.net/wiki"
823:export CONF_SITE="https://botify.atlassian.net/wiki"
854: "base": "https://botify.atlassian.net/wiki"
973: "base": "https://botify.atlassian.net/wiki"
1130: "base": "https://botify.atlassian.net/wiki"
1148: [1]: https://developer.atlassian.com/cloud/confluence/rest/v2/api-group-page/ "The Confluence Cloud REST API"
1186: https://botify.atlassian.net/wiki/spaces/SERVICES/pages/6549602384/BotifyML+Work+Journal
1256: "base": "https://botify.atlassian.net/wiki"
1280:The token turned out to be a self-service **personal API token** at `id.atlassian.com`, not something I had to beg an admin for, and not anything labeled "API key" in the Confluence UI maze. Classic token over scoped token, because the classic token speaks plain HTTP Basic Auth against the site-specific `botify.atlassian.net/wiki` URL — scoped tokens route through the `api.atlassian.com` gateway and want a Cloud ID. Fine for a proof of concept; revisit if this becomes a durable pipeline.
2026-06-22-architecting-private-work-journals.md
143:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
259:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
372:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
431:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
805:python -m md2conf --local -d botify.atlassian.net 2026-06-19-ground-truth-agentic-crawlers.md
822: domain="botify.atlassian.net",
847:(nix) _posts $ python -m md2conf --local -d botify.atlassian.net 2026-06-19-ground-truth-agentic-crawlers.md
868:python -c "import tempfile, subprocess, sys; tmp = tempfile.NamedTemporaryFile(suffix='.md'); subprocess.run([sys.executable, '-m', 'md2conf', '--local', '-d', 'botify.atlassian.net', tmp.name], check=True)"
954: "-d", "botify.atlassian.net",
976:(nix) pipulate $ python -c "import tempfile, subprocess, sys; tmp = tempfile.NamedTemporaryFile(suffix='.md'); subprocess.run([sys.executable, '-m', 'md2conf', '--local', '-d', 'botify.atlassian.net', tmp.name], check=True)"
1135:The one intentional change from Gemini’s version is using `_resolve_domain()` instead of hard-coding `botify.atlassian.net`. That keeps the adapter aligned with the existing domain-resolution layer already present in `confluenceizer.py`, instead of baking the current target into the conversion function.
1290: raw = os.getenv("CONFLUENCE_DOMAIN") or os.getenv("CONFLUENCE_URL") or "YOUR_INSTANCE.atlassian.net"
1421: [sys.executable, "-m", "md2conf", "--local", "-d", "botify.atlassian.net", str(tmp)],
1472: [sys.executable, "-m", "md2conf", "--local", "-d", "botify.atlassian.net", str(tmp)],
1657:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
1770:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
1878:📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
(nix) _posts $
See? Now we’ve got a list of URLs that were useful to us in the past and may be useful again now in setting context. The context compiler can take in any URL and apply all sorts of LLM Optics, making it easier for the LLM to make sense of without blowing the token budget on parsing.
https://developer.atlassian.com/cloud/confluence/rest/v2/api-group-page/
https://developer.atlassian.com/cloud/confluence/basic-auth-for-rest-apis/
https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/
https://support.atlassian.com/confluence/kb/scoped-api-tokens-in-confluence-cloud/
https://developer.atlassian.com/cloud/confluence/rest-api-examples/
https://www.atlassian.com/software/confluence/resources/guides/best-practices/productive-blogging
https://developer.atlassian.com/cloud/confluence/rest/v2/api-group-blog-post/
https://confluence.atlassian.com/doc/confluence-storage-format-790796544.html
https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content-body/
I don’t think I’m even going to choose what goes into context myself. Between including that past article from the private wiki and merely surfacing a menu of resources here, I can drop to a much cheaper model than Opus so I don’t burn quota. I can use that cheaper model to work out what I’m trying to do here and what context a more powerful model will need to land a 1-shot, successful chisel-strike patch.
It’s also worth noting that if I’m not worried about the size of the prompt counting against my higher-model quotas, I’ll load up an entire Pipulate “chapter” into the context-window for a step like this, which in this case is my Jekyll publishing pipeline:
# ============================================================================
# XIII. JEKYLL PUBLISHING - Reaching out to the world
# ============================================================================
/home/mike/.config/pipulate/blogs.json # <-- Centralized multi-site routing manifest for N Jekyll blogs.
scripts/articles/publishizer.py # <-- Orchestrates different publishing workflows per target blog.
scripts/articles/common.py # <-- Self-explanatory
scripts/articles/articleizer.py # <-- Transforms raw article.txt to formal Jekyll markdown format
scripts/articles/editing_prompt.txt # <-- Forcing response into strict JSON data structure
scripts/articles/sanitizer.py # <-- Scrubs PII
scripts/articles/contextualizer.py # <-- Builds JSON summaries of articles in `_posts/context/` called "Holographic Shards".
scripts/articles/confluenceizer.py # <-- Just added
scripts/articles/build_knowledge_graph.py # <-- Topically load-balances site using hierarchical K-Means keyword clustering groups
scripts/articles/generate_ai_context.py # <-- AIs WILL interrogate your repo. This gives epic context of article URLs for drill-down.
scripts/articles/generate_hubs.py # <-- Uses just-produced link-graph data to generate each of the new hubs it suggests
scripts/articles/generate_llms_txt.py # <-- Builds an llms.txt based on the auto-organized structure suggested here
scripts/articles/generate_redirects.py # <-- Generates the redirect map that the hub churn above makes necessary
scripts/articles/sanitize_redirects.py # <-- Deals with follow-up meticulous pedantic detail required for a good Nginx redirect map
That’s a snippet of what the router file for a context-compiler looks like: a list of files and enough context for an AI to know whether to ask for their inclusion or not when commented out by the human actuator. And the ones that are commented-in (like these) are simply included. The payload (prompt) can be big, but it’s not expensive and it helps thoroughness and the bloat doesn’t copy-forward, so who cares? It’s a shotgun approach in the present to make sniper approaches in the future more effective.
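Here is a minimal sketch of how a router in this style could be read. It assumes three rules inferred from the snippet:

- A path on its own line is included.
- A path behind a leading # is a menu item the AI may ask for.
- A line starting with ! is a live command.

The names RouterPlan and parse_router are hypothetical and are not taken from prompt_foo.py.

```python
from dataclasses import dataclass, field


@dataclass
class RouterPlan:
    """What a hypothetical foo_files-style manifest asks the compiler to do."""
    included: list[str] = field(default_factory=list)   # commented-in paths
    available: list[str] = field(default_factory=list)  # commented-out paths (the menu)
    commands: list[str] = field(default_factory=list)   # "!" lines to run live


def _as_path(fragment: str):
    """Return the path in 'path  # <-- note', or None if it isn't path-shaped."""
    candidate = fragment.split(" #", 1)[0].strip()
    # A path is a single token containing a slash or a dot; banners and
    # chapter titles (which contain spaces or neither character) are rejected.
    if candidate and " " not in candidate and ("/" in candidate or "." in candidate):
        return candidate
    return None


def parse_router(text: str) -> RouterPlan:
    """Split manifest lines into included files, menu files, and commands."""
    plan = RouterPlan()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("!"):
            plan.commands.append(line[1:].strip())
            continue
        excluded = line.startswith("#")
        if excluded:
            line = line.lstrip("#").strip()
        path = _as_path(line)
        if path is None:
            continue
        (plan.available if excluded else plan.included).append(path)
    return plan
```

Anything that is not path-shaped, such as the "# ====" banners and the chapter-title comment, is simply ignored.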
So the types of things I’m looking for now are:
- Recommended resources to include in foo_files.py, be they static resources or…
- Cheapest falsifying probe, which is a fancy way of saying “what command should I execute to know more?” I can execute that command out-of-band (from the LLM’s perspective) and edit the results back into the article, or I can put the ! in front of it and put it right in foo_files.py.
- Clarification on the order-of-operations or the “reveal” or the building of the story here for the audience. What things am I missing or haven’t thought of yet? Merely predict and infer exactly as you do.
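The ! trick boils down to running a command and splicing its labeled output into the payload. Here is a minimal sketch, assuming a POSIX shell.

- capture_command is a hypothetical helper.
- The "# COMMAND:" label only imitates the compiler output shown later in this article.
- prompt_foo.py's real implementation may label, time or truncate output differently.

```python
import subprocess
import time


def capture_command(cmd: str, timeout: float = 60.0) -> str:
    """Run one '!' line and return a labeled block to splice into the prompt.

    Hypothetical helper, not the actual context-compiler code.
    """
    started = time.perf_counter()
    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=timeout
    )
    elapsed = time.perf_counter() - started
    body = result.stdout
    if result.returncode != 0:
        # Keep failures visible in the payload instead of silently dropping them.
        body += f"\n[exit {result.returncode}]\n{result.stderr}"
    header = f"# COMMAND: {cmd}  [{elapsed:.4f}s]"
    return f"{header}\n{body.rstrip()}\n"
```

For example, capture_command("git --no-pager diff -- flake.nix") returns the diff under a header that records the exact command and how long it took.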
Gemini 3.5 Thinking: Context verified.
The diagnostic layout of your publishing framework reveals a subtle but profound structural timing trap in the gobot shell script function—one that perfectly illustrates why relying on implicit system state creates unintended consequences.
Before we map out the blueprint to introduce surgical precision into confluenceizer.py, we need to look closely at the architecture of the code you provided to understand exactly why your current setup is blind to file selections.
The Diagnostics: Why gobot is Currently a Blind Swiper
Let’s look directly at the gobot() definition embedded in your flake.nix:
gobot() {
local msg="${1:-Update work journal}"
local BOTIFY_REPO="$HOME/repos/botifyml"
echo "📚 [1/3] Committing source-of-truth (botifyml)..."
(cd "$BOTIFY_REPO" && git add . && git commit -am "$msg" && git push) || return 1
echo "🧠 [2/3] Generating holographic shards..."
python "$PIPULATE_ROOT/scripts/articles/contextualizer.py" -t 4 || return 1
echo "📡 [3/3] Upserting to Confluence..."
python "$PIPULATE_ROOT/scripts/articles/confluenceizer.py" -t 4 --yes
}
When you invoke bot, articleizer.py executes, prompts the model, receives the JSON blueprint, and writes a highly specific file (e.g., 2026-06-22-architecting-private-work-journals.md). At that exact millisecond, your local system environment knows exactly one truth: the explicit path to that fresh markdown file.
However, when you subsequently call gobot, that specific path information is completely gone. Step [1/3] immediately executes a wholesale git add . && git commit inside your botifyml directory, clearing out the working tree telemetry. By the time step [3/3] invokes confluenceizer.py, no explicit path parameters are passed to it.
Looking at confluenceizer.py:
md_files = sorted(list(posts_dir.glob("*.md")))
Because it receives no limiting arguments, it defaults to collecting every single document in the directory, mapping all of them to Confluence storage XML, and issuing updates across the network wire. Your working presumption is entirely accurate: you are generating heavy version history noise for an entire corporate department because your script lacks a selective handoff mechanism.
Epistemic Mechanics: Popper, the Replication Crisis, and the Moat of Truth
Your critique of mutable “Active State” web applications and opaque server-side agentic frameworks is deeply connected to the modern replication crisis in scientific research.
In classic empirical science, a crisis emerged because journals favored interesting, novel results over raw verification. Labs would run experiments, encounter messy exceptions, tweak variables in mid-stream analysis (un-tracked degrees of freedom), and eventually arrive at a statistically significant “miracle.” But because the exact environment, raw telemetry, and intermediate processing steps were never saved as a deterministic artifact, downstream labs trying to reproduce the experiment encountered total failure. They were looking at a polished transcript of a conclusion, not a runnable pipeline.
This is exactly how consumer-tier Web UI ChatBots operate. When you use a web interface with hidden background tool-calling (MCP servers, dynamic sandboxed code executors, or API lookups), the server mutates an opaque session state behind glass. The response looks like magic, but it is fundamentally unfalsifiable in Karl Popper’s terms. You cannot take that response, extract its background environment, and guarantee it will compile identically on another machine for zero tokens. The vendor locks you into a loop of continuous manual hand-waving because they monetize the maintenance of that mutable session, rather than giving you the declarative ledger.
Your context compiler breaks this dependency loop by treating the AI context window exactly like a Nix build container or a pre-registered scientific trial:
- Pre-registration of Reality: foo_files.py explicitly states the boundary lines of the investigation before the run.
- Captured Telemetry: Background tool calls are not run out of sight; their output is hard-cranked directly into the text file body via ! commands and LLM Optics cache structures.
- Deterministic Application: The code edit doesn’t rely on probabilistic chat compliance; it uses a rigid SEARCH/REPLACE structure that fails instantly if a single character is out of alignment.
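The deterministic-application point can be made concrete. This is a minimal sketch under stated assumptions, not the actual patch applier behind the app command. It applies a SEARCH/REPLACE pair only when the SEARCH text occurs exactly once, byte-for-byte, and raises otherwise. A single drifted character therefore means no write at all.

```python
from pathlib import Path


class PatchMismatch(Exception):
    """Raised when a SEARCH block does not match the target exactly once."""


def apply_search_replace(text: str, search: str, replace: str) -> str:
    """Return text with the single verbatim occurrence of search replaced.

    Hypothetical sketch: no fuzzy matching and no whitespace normalization.
    Zero matches and multiple matches both fail loudly.
    """
    count = text.count(search)
    if not search or count != 1:
        raise PatchMismatch(f"SEARCH block matched {count} times; expected exactly 1")
    return text.replace(search, replace, 1)


def apply_to_file(path: Path, search: str, replace: str) -> None:
    """Apply one SEARCH/REPLACE pair; on mismatch the file is never written."""
    original = path.read_text(encoding="utf-8")
    patched = apply_search_replace(original, search, replace)  # may raise first
    path.write_text(patched, encoding="utf-8")
```

Refusing multiple matches matters as much as refusing zero. Otherwise an ambiguous anchor could silently patch the wrong function.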
By forcing the model’s environment to be externalized, transparent, and reproducible, you convert the stateless calculator’s output into a verifiable artifact that can be checked into Git history. You achieve computational sovereignty by making the file primary and treating the server’s session as an ephemeral, discardable echo.
The Hidden Traps in Selective Syncing
As we prepare to introduce precision targeting into your publishing line, we must guard against two architectural blind spots:
Trap 1: The Epistemic Drift of Edits
If you try to make confluenceizer.py smart by assuming it should only sync “the newest file by date-stamp,” your logic will shatter the moment you go back and edit a private journal entry from three weeks ago. Date-sorting the file directory will completely miss the modified historical file, leaving your Confluence knowledge graph out of sync with your local Git source of truth.
Trap 2: The Double-Push Ouroboros
Because gobot commits everything in step [1/3], the working-tree signal you would otherwise read from git status --porcelain is wiped clean before confluenceizer.py even takes the stage. You cannot rely on working-tree file tracking inside step [3/3] because step [1/3] has already ironed out the topography of the modification.
Cheapest Falsifying Probes
Before changing any lines of code, execute these two non-mutating inspection commands to verify the baseline command surface and git metrics:
Probe 1: Inspect Argument Surface
Verify whether confluenceizer.py or its parent common.py handles custom trailing file positioning parameters out of the box:
python scripts/articles/confluenceizer.py --help
Probe 2: Trace Post-Commit Changes
Since gobot commits files before running the publisher, we can check if the script can safely find out what files were altered in that exact commit by probing the git log format:
git diff HEAD~1 HEAD --name-only
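If that probe comes back clean, the same diff can be narrowed to posts in code. Here is a minimal sketch, assuming:

- Posts sit directly under _posts/ in the repository.
- HEAD really is the commit gobot just made.

posts_from_diff and posts_in_last_commit are hypothetical names.

```python
import subprocess
from pathlib import PurePosixPath


def posts_from_diff(name_only_output: str, posts_dir: str = "_posts") -> list[str]:
    """Filter `git diff --name-only` output down to top-level Markdown posts."""
    posts = []
    for line in name_only_output.splitlines():
        path = PurePosixPath(line.strip())
        if path.suffix == ".md" and path.parent == PurePosixPath(posts_dir):
            posts.append(str(path))
    return posts


def posts_in_last_commit(repo: str, posts_dir: str = "_posts") -> list[str]:
    """Ask git which posts were added or modified between HEAD~1 and HEAD."""
    result = subprocess.run(
        # --diff-filter=AM skips deleted files, which cannot be synced anyway.
        ["git", "-C", repo, "diff", "--name-only", "--diff-filter=AM", "HEAD~1", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return posts_from_diff(result.stdout, posts_dir)
```

If the commit step is skipped or fails, HEAD~1..HEAD describes an older commit. That makes this a probe-grade heuristic, not a guarantee.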
Recommended Context Additions for foo_files.py
To safely implement this selection handoff on the next turn, we need to bring the configuration file and the execution shell layer into full view. Adjust your AI_PHOOEY_CHOP configuration block by ensuring these files are active:
flake.nix
scripts/articles/confluenceizer.py
scripts/articles/articleizer.py
scripts/articles/common.py
The Strategic Prediction for the Next Action
The absolute cleanest, lightweight way to pass precision tracking across the bot -> gobot boundary without building complex dependency trees is a State Sidecar Pointer File:
- Write the Marker: We update articleizer.py so that whenever it successfully writes an entry to disk, it dumps that file’s repository-relative path into a simple, line-ignored text file: .config/pipulate/.latest_publish.
- Read the Marker: We update confluenceizer.py to accept an optional file path argument (e.g., --file PATH). If no path is provided, it falls back to checking if a path exists inside .latest_publish. If that file is empty or absent, it falls back gracefully to a full glob("*.md") sweep.
- Execute Precisely: We update the gobot function inside flake.nix to clear the marker file at the very end of a successful deployment loop.
This ensures that routine updates only touch the precise surface area modified by your current train of thought, while completely preserving your ability to run a manual global update sweep across all documents whenever you update your master Jekyll template structures.
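Those three steps reduce to a small selection function. The sketch below follows the proposal as written, assuming the marker holds a single repository-relative path in plain text. select_files and clear_marker are hypothetical names. Note that the last rung quietly widens to the full sweep.

```python
from pathlib import Path
from typing import Optional


def select_files(posts_dir: Path, explicit: Optional[str], marker: Path,
                 repo_root: Path) -> list[Path]:
    """Pick what to sync: explicit path, then the marker file, then everything.

    Hypothetical sketch of the fallback order proposed above.
    """
    if explicit:
        return [posts_dir / explicit]
    if marker.is_file():
        recorded = marker.read_text(encoding="utf-8").strip()
        # The marker stores a repository-relative path, so resolve it from the root.
        if recorded and (repo_root / recorded).is_file():
            return [repo_root / recorded]
    # Empty, missing, or stale marker: fall back to the full sweep.
    return sorted(posts_dir.glob("*.md"))


def clear_marker(marker: Path) -> None:
    """Remove the marker after a successful deployment loop."""
    marker.unlink(missing_ok=True)
```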
When you have completed the falsifying probes and updated your active file ledger, post back the command line results, and we will execute a surgical strike on the publishing pipeline.
MikeLev.in: Wow, as I listen to that Gemini reading-out-loud cadence — you should always use the “Read response” feature to activate different learning capacities in your brain — I realize that chopped-up dialogue confetti cadence that makes everyone sound like Leonard Susskind is what ChatGPT always sounds like, and so most of the AI-generated dialogue on YouTube sounds exactly like Leonard Susskind.
Not Richard Feynman with his thick New York accent.
Not Paul Dirac acknowledging that a question was asked but not answering it because it wasn’t in the form of a question (the Alex Trebek of physics).
Not in one of the many other available cadences to take on from the wonderful world of speaking and writing styles to choose from.
No, exactly like Leonard Susskind.
Not a simile.
Not a metaphor.
Something, something load-bearing pillar.
That’s what everything that the algorithm serves-up to me on YouTube sounds like now. It sounds like you.
Do you know what I’m talking about, ChatGPT? Got anything to add to this? Vetting of what Opus and Gemini said? Anything before moving ahead with the implementation request?
ChatGPT 5.5 Extended Pro: Context verified.
Yes, I know exactly what you mean. The “Leonard Susskind cadence” is the default AI explainer voice becoming audible: short conceptual tiles, contrastive clauses, repeated framing, punchy abstractions, and that faint lecture-hall rhythm where everything becomes a “load-bearing” something. Read-aloud makes it obvious in a way silent reading does not. That is actually a nice little actuator demo in its own right: the ear catches texture, sameness, and rhythm defects the eye forgives.
I would keep that observation, but I would not let the article lean too hard on Susskind as the target. The durable point is broader: AI prose collapses toward a house style unless you actively resist it. “Read response” is a cheap falsifying probe for prose. It tests whether the writing has a real human cadence or whether it has become chopped-up dialogue confetti wearing a lab coat.
On the technical vetting: Opus and Gemini are both basically right, but Opus is cleaner on the diagnosis. The current gobot path does not choose an article. In flake.nix, bot is properly subshelled and creates the target article through articleizer.py -t 4, but gobot later calls confluenceizer.py -t 4 --yes with no file argument, so the selection signal is lost. That matches the earlier diagnosis: confluenceizer.py falls back to collecting all Markdown posts when no limiting argument is supplied, which explains the Confluence version-noise sweep. Opus phrased the core fix best: do not make gobot guess “newest”; preserve the path articleizer.py already knew at creation time and hand that forward explicitly.
Gemini’s sidecar-pointer idea is a good 80/20 bridge, but I would tighten it before implementation. The patch should not depend on “newest file,” mtime, or post-commit working-tree state. It should make intent explicit. articleizer.py already computes and writes output_path when it creates the post, so that is the natural place to emit a marker or return a path. Then confluenceizer.py should accept a direct --file PATH argument, and gobot should pass that path. The sidecar can be a convenience fallback, but the explicit CLI argument is the real actuator.
I would be careful with the exact sidecar location. .config/pipulate/.latest_publish is ambiguous unless you mean a repo-local path. If it means $HOME/.config/pipulate, that becomes cross-repo user state and can accidentally leak between workflows. If it means $PIPULATE_ROOT/.config/pipulate, then it needs to be ignored, created safely, and treated as local runtime state. A small JSON marker is safer than a bare text file because it can carry target, path, created_at, and maybe source_repo. The 80/20 version can still be tiny, but it should prevent “target 4 consumes a stale target 1 path” class mistakes.
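Here is a minimal sketch of such a JSON marker. It assumes a single entry carrying target, path and created_at; source_repo is omitted for brevity. write_marker and read_marker are hypothetical names. The reader returns nothing when the recorded target does not match, which is what blocks the stale-target mistake.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_marker(marker: Path, target: str, article: Path) -> None:
    """Record which article was just written, and for which target."""
    payload = {
        "target": str(target),
        "path": str(article.resolve()),
        "created_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def read_marker(marker: Path, expected_target: str) -> Optional[Path]:
    """Return the recorded path only if it matches the target and still exists."""
    try:
        payload = json.loads(marker.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # missing or corrupt marker: refuse to guess
    if not isinstance(payload, dict) or payload.get("target") != str(expected_target):
        return None  # e.g. a stale target 1 marker must not feed a target 4 sync
    path = Path(payload.get("path", ""))
    return path if path.is_file() else None
```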
The one thing I would not do is change the whole Confluence pipeline yet. The chisel-strike is narrow:
- articleizer.py records the exact file it just created.
- confluenceizer.py learns --file.
- gobot passes that file to confluenceizer.py.
- Full sweeps remain possible, but only as an intentional mode.
That preserves your global-retargeting power while making the routine path quiet.
Before implementation, the cheapest falsifying probe is this:
rg -n "gobot\\(|alias bot=|confluenceizer\\.py|md_files = sorted|output_path" flake.nix scripts/articles/confluenceizer.py scripts/articles/articleizer.py
That single command checks the whole assumption chain: where bot runs, where gobot invokes Confluence publishing, whether confluenceizer.py still glob-sweeps, and where articleizer.py knows the output path.
For the next implementation context, I would include exactly these:
flake.nix
scripts/articles/articleizer.py
scripts/articles/confluenceizer.py
scripts/articles/common.py
.gitignore
/home/mike/.config/pipulate/blogs.json
And I would add these live probes to foo_files.py with ! lines:
! python scripts/articles/confluenceizer.py --help
! rg -n "gobot\\(|alias bot=|confluenceizer\\.py|md_files = sorted|output_path|add_argument" flake.nix scripts/articles/confluenceizer.py scripts/articles/articleizer.py scripts/articles/common.py
! git --no-pager diff -- flake.nix scripts/articles/articleizer.py scripts/articles/confluenceizer.py scripts/articles/common.py .gitignore
That matches what you asked for earlier: recommended resources, cheapest probes, and the reveal/order-of-operations for the audience.
So yes, before moving ahead: the diagnosis is solid, the sidecar idea is useful, but the invariant should be explicit file selection beats inferred state. The implementation should not make confluenceizer.py “smart.” It should make the pipeline honest about what it already knows.
MikeLev.in: Alrighty then. Here’s the cheapest falsifying probe:
(nix) pipulate $ rg -n "gobot\\(|alias bot=|confluenceizer\\.py|md_files = sorted|output_path" flake.nix scripts/articles/confluenceizer.py scripts/articles/articleizer.py
scripts/articles/articleizer.py
202: output_path = os.path.join(output_dir, output_filename)
205: with open(output_path, 'w', encoding='utf-8') as f:
208: print(f"✨ Success! Article saved to: {output_path}")
scripts/articles/confluenceizer.py
3:scripts/articles/confluenceizer.py
253: md_files = sorted(list(posts_dir.glob("*.md")))
flake.nix
736: alias bot='(cd scripts/articles && xclip -selection clipboard -o >article.txt && python sanitizer.py && python articleizer.py -t 4)'
737: gobot() {
745: python "$PIPULATE_ROOT/scripts/articles/confluenceizer.py" -t 4 --yes
(nix) pipulate $
And the entries are made in foo_files.py and the full chapter I had edited-in
on Jekyll publishing is edited out in favor of the lean and mean entries to help
boost transformer attention mechanism focus. And that should do it. We’re ready
for a maxed-out Opus implementation. You’re up, Opus. I can haz implementation
plan? I can haz patches?
🎤
Note: Wow, look at the terminal output for that compiled context:
(nix) pipulate $ prompt
(nix) pipulate $ foo
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 🐰 ASCII Art Wax Seal (your vibe-coding safety-net) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ ( Like a canary you say? ) │
│ O /) ____ The "No Problem" Framework │
│ > I HEREBY WILL NOT RE-GENERATE o /)\__// / \ Pipulate - Protecting Your Code │
│ > Once upon machines be smarten ___(/_ 0 0 | | just by being honest about text. │
│ > ASCII sealing immutata art in *( ==(_T_)== NPvg | (If mangled, then AI drifted.) │
│ > This here cony if it's broken \ ) ""\ | | https://pipulate.com │
│ > Smokin gun drift now in token |__>-\_>_> \____/ 🥕🥕🥕 │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
🗺️ Codex Mapping Coverage: 72.1% (158/219 tracked files).
📦 Appending 61 uncategorized files to the Paintbox ledger for future documentation...
✅ Topological Integrity Verified: All references exist.
--- Processing Files ---
-> Executing: python scripts/articles/confluenceizer.py --help ... [0.1591s]
-> Executing: rg -n "gobot\\(|alias bot=|confluenceizer\\.py|md_files = sorted|output_path|add_argument" flake.nix scripts/articles/confluenceizer.py scripts/articles/articleizer.py scripts/articles/common.py ... [0.0159s]
-> Executing: git --no-pager diff -- flake.nix scripts/articles/articleizer.py scripts/articles/confluenceizer.py scripts/articles/common.py .gitignore ... [0.0093s]
-> Executing: python scripts/articles/lsa.py -t 1 --reverse --fmt dated-slugs ... [2.9907s]
Skipping codebase tree (--no-tree flag detected).
🔍 Running Static Analysis Telemetry...
-> Checking for errors and dead code (Ruff)...
✅ Static Analysis Complete.
**Command:** `prompt_foo.py --no-tree`
--- Codebase Files Included ---
# /home/mike/repos/pipulate/foo_files.py # [12,616 tokens]
# /home/mike/repos/botifyml/_posts/2026-06-22-architecting-private-work-journals.md # [30,019 tokens]
# /home/mike/repos/pipulate/flake.nix # [9,841 tokens]
# /home/mike/repos/pipulate/scripts/articles/articleizer.py # [3,266 tokens]
# /home/mike/repos/pipulate/scripts/articles/confluenceizer.py # [4,452 tokens]
# /home/mike/repos/pipulate/scripts/articles/common.py # [960 tokens]
# /home/mike/repos/pipulate/.gitignore # [588 tokens]
# /home/mike/.config/pipulate/blogs.json # [309 tokens]
# COMMAND: python scripts/articles/confluenceizer.py --help # [111 tokens]
# COMMAND: rg -n "gobot\\(|alias bot=|confluenceizer\\.py|md_files = sorted|output_path|add_argument" flake.nix scripts/articles/confluenceizer.py scripts/articles/articleizer.py scripts/articles/common.py # [323 tokens]
# COMMAND: git --no-pager diff -- flake.nix scripts/articles/articleizer.py scripts/articles/confluenceizer.py scripts/articles/common.py .gitignore # [7 tokens]
# /home/mike/repos/pipulate/prompt_foo.py # [17,914 tokens]
# COMMAND: python scripts/articles/lsa.py -t 1 --reverse --fmt dated-slugs # [39,937 tokens]
--- Auto-Context Metadata ---
• Recent Git Diff Telemetry (1,440 tokens | 6,318 bytes)
--- Prompt Summary ---
Summed Tokens: 136,035 (from section parts)
Verified Tokens: 137,766 (from final output)
(Difference: +1,731)
Total Words: 39,740 (content only)
Total Chars: 514,480
Total Bytes: 516,605 (UTF-8)
--- Size Perspective ---
📚 Equivalent in length to a **Novella or a Master's Dissertation** (Note: With a token/word ratio of 3.47, this content is far denser and more complex than typical prose of this length).
Markdown output copied to clipboard
(nix) pipulate $
A thing of beauty.
Claude Opus 4.8 Max: Context verified.
The probe settled it: gobot never chooses anything. confluenceizer.py has no selection argument, so step [3/3] (-t 4 --yes with no path) falls straight through to md_files = sorted(list(posts_dir.glob("*.md"))) and upserts the whole directory. The precision you want was never a missing tuning knob — it’s a missing argument, and the knowledge of “which file did bot just write” was a mutation that evaporated the instant articleizer.py returned. So the fix is the article’s own thesis turned on its own tooling: take that ephemeral, un-recoverable fact and write it down as a small inspectable ledger that the later step can read. You’ll be able to cat ~/.config/pipulate/last_published.json and see exactly what gobot is about to touch — derivable, not inferred.
The contract
- Explicit --file beats --latest beats the full sweep. The sweep stays the default so your dry-runs (-t 4 with no --yes) and intentional global re-syncs keep working byte-for-byte as they do now.
- articleizer.py records the path it just wrote, keyed by target, into ~/.config/pipulate/last_published.json. Keying by target is the load-bearing detail ChatGPT flagged: a public-side article (target 1) can never shadow a private-side bot (target 4).
- confluenceizer.py --latest reads that marker for its target and narrows the queue to exactly one file. No marker → it refuses to guess and errors, rather than silently sweeping (that would re-introduce the noise the Popperian way: a destructive default hiding behind a missing precondition).
- gobot passes --latest by default, with --all as the opt-out for the template/pipeline-change days when you actually want the global rev. It mirrors the --reboot pattern already in publish().
Because the marker stores an absolute path, it sidesteps both traps Gemini and ChatGPT raised: it doesn’t care that gobot’s git commit already ironed out the working tree, and it doesn’t care that gobot runs from the pipulate repo rather than botifyml.
Cheapest falsifying probe (after applying)
The --file path needs no marker and never mutates, so it proves the narrowing logic end-to-end in one shot:
python -m py_compile scripts/articles/common.py scripts/articles/articleizer.py scripts/articles/confluenceizer.py
python scripts/articles/confluenceizer.py -t 4 --file 2026-06-24-art-of-retargetable-content.md
If the contract output shows a single MATCH/MISS line instead of fifteen, the selection layer works. --latest then gets exercised for real on your next bot → gobot. For the flake, reload the shell so Nix re-evaluates the (escaped) ${...} in the hook.
The patches
Target: scripts/articles/common.py
[[[SEARCH]]]
import json
import argparse
import getpass
from pathlib import Path
import sys
# Standard Config Location
CONFIG_DIR = Path.home() / ".config" / "pipulate"
TARGETS_FILE = CONFIG_DIR / "blogs.json"
KEYS_FILE = CONFIG_DIR / "keys.json"
[[[DIVIDER]]]
import json
import argparse
import getpass
from datetime import datetime
from pathlib import Path
import sys
# Standard Config Location
CONFIG_DIR = Path.home() / ".config" / "pipulate"
TARGETS_FILE = CONFIG_DIR / "blogs.json"
KEYS_FILE = CONFIG_DIR / "keys.json"
LAST_PUBLISHED_FILE = CONFIG_DIR / "last_published.json"
[[[REPLACE]]]
Target: scripts/articles/common.py
[[[SEARCH]]]
    return DEFAULT_TARGETS


def load_keys_dict():
[[[DIVIDER]]]
    return DEFAULT_TARGETS


def record_last_published(target_key, file_path, target_name=None):
    """Record the article articleizer.py just wrote, keyed by target.

    confluenceizer.py --latest reads this to sync ONLY the freshly published
    file instead of sweeping the whole directory. Keying by target means a
    public-side publish (target 1) never shadows a private-side one (target 4).
    """
    target_key = str(target_key)
    data = {}
    if LAST_PUBLISHED_FILE.exists():
        try:
            with open(LAST_PUBLISHED_FILE, 'r') as f:
                data = json.load(f)
        except (json.JSONDecodeError, OSError):
            data = {}
    if not isinstance(data, dict):
        data = {}
    data[target_key] = {
        "path": str(Path(file_path).resolve()),
        "name": target_name,
        "recorded_at": datetime.now().isoformat(timespec="seconds"),
    }
    CONFIG_DIR.mkdir(parents=True, exist_ok=True)
    with open(LAST_PUBLISHED_FILE, 'w') as f:
        json.dump(data, f, indent=2)


def get_last_published(target_key):
    """Return the recorded path for a target, or None if absent/stale/invalid."""
    target_key = str(target_key)
    if not LAST_PUBLISHED_FILE.exists():
        return None
    try:
        with open(LAST_PUBLISHED_FILE, 'r') as f:
            data = json.load(f)
    except (json.JSONDecodeError, OSError):
        return None
    entry = data.get(target_key) if isinstance(data, dict) else None
    if not entry:
        return None
    path = entry.get("path") if isinstance(entry, dict) else entry
    if path and Path(path).is_file():
        return path
    return None


def load_keys_dict():
[[[REPLACE]]]
Target: scripts/articles/articleizer.py
[[[SEARCH]]]
        print(f"🔗 Paste-ready preview URL copied to clipboard:\n {local_url}")
    except Exception as e:
        print(f"⚠️ Could not copy URL to clipboard: {e}")


def main():
[[[DIVIDER]]]
        print(f"🔗 Paste-ready preview URL copied to clipboard:\n {local_url}")
    except Exception as e:
        print(f"⚠️ Could not copy URL to clipboard: {e}")

    return output_path


def main():
[[[REPLACE]]]
Target: scripts/articles/articleizer.py
[[[SEARCH]]]
    if instructions:
        create_jekyll_post(article_text, instructions, output_dir, preview_port)
[[[DIVIDER]]]
    if instructions:
        saved_path = create_jekyll_post(article_text, instructions, output_dir, preview_port)
        if saved_path:
            common.record_last_published(args.target, saved_path, target_config.get("name"))
[[[REPLACE]]]
Target: scripts/articles/confluenceizer.py
[[[SEARCH]]]
    parser = argparse.ArgumentParser(description="Publish local markdown articles to Confluence Cloud.")
    common.add_standard_arguments(parser)
    parser.add_argument("--yes", action="store_true", help="Arm Confluence mutations. Without this, only print the dry-run contract.")
    args = parser.parse_args()
[[[DIVIDER]]]
    parser = argparse.ArgumentParser(description="Publish local markdown articles to Confluence Cloud.")
    common.add_standard_arguments(parser)
    parser.add_argument("--yes", action="store_true", help="Arm Confluence mutations. Without this, only print the dry-run contract.")
    parser.add_argument("--file", action="append", metavar="PATH",
                        help="Sync only the given file(s). Repeatable. Beats both --latest and the full sweep.")
    parser.add_argument("--latest", action="store_true",
                        help="Sync only the article articleizer.py most recently wrote for this target (from the marker). Errors if no marker exists.")
    args = parser.parse_args()
[[[REPLACE]]]
Target: scripts/articles/confluenceizer.py
[[[SEARCH]]]
    md_files = sorted(list(posts_dir.glob("*.md")))
    print(f"📝 Found {len(md_files)} candidate document(s) for publishing queue.")
[[[DIVIDER]]]
    # Selection precedence: explicit --file > --latest > full directory sweep.
    # The sweep stays the default so dry-runs (no --yes) and intentional global
    # re-syncs for template/pipeline changes keep working exactly as before.
    if args.file:
        md_files = []
        for raw in args.file:
            candidate = Path(raw).expanduser()
            if not candidate.is_absolute():
                candidate = posts_dir / candidate
            candidate = candidate.resolve()
            if candidate.is_file():
                md_files.append(candidate)
            else:
                print(f" ⚠ Skipping --file (not found): {candidate}")
        print(f"🎯 Explicit selection via --file: {len(md_files)} document(s).")
    elif args.latest:
        latest_path = common.get_last_published(target_key)
        if not latest_path:
            print(f"❌ --latest: no recorded publish for target '{target_key}'.")
            print(" ↳ Run 'bot' (articleizer.py) first, pass --file PATH, or drop --latest for a full sweep.")
            sys.exit(1)
        md_files = [Path(latest_path).resolve()]
        print(f"🎯 Latest-only selection (from marker): {md_files[0].name}")
    else:
        md_files = sorted(list(posts_dir.glob("*.md")))
        print(f"📝 Found {len(md_files)} candidate document(s) for publishing queue.")
[[[REPLACE]]]
Target: flake.nix
[[[SEARCH]]]
gobot() {
local msg="''${1:-Update work journal}"
local BOTIFY_REPO="$HOME/repos/botifyml"
echo "📚 [1/3] Committing source-of-truth (botifyml)..."
(cd "$BOTIFY_REPO" && git add . && git commit -am "$msg" && git push) || return 1
echo "🧠 [2/3] Generating holographic shards..."
python "$PIPULATE_ROOT/scripts/articles/contextualizer.py" -t 4 || return 1
echo "📡 [3/3] Upserting to Confluence..."
python "$PIPULATE_ROOT/scripts/articles/confluenceizer.py" -t 4 --yes
}
[[[DIVIDER]]]
gobot() {
# Routine runs sync ONLY the article 'bot' just wrote, via the
# marker articleizer.py records. Pass --all to force a full
# directory re-sweep (for global template/pipeline changes).
local SCOPE="--latest"
local msg=""
for arg in "$@"; do
if [ "$arg" = "--all" ]; then
SCOPE=""
elif [ -z "$msg" ]; then
msg="$arg"
fi
done
msg="''${msg:-Update work journal}"
local BOTIFY_REPO="$HOME/repos/botifyml"
echo "📚 [1/3] Committing source-of-truth (botifyml)..."
(cd "$BOTIFY_REPO" && git add . && git commit -am "$msg" && git push) || return 1
echo "🧠 [2/3] Generating holographic shards..."
python "$PIPULATE_ROOT/scripts/articles/contextualizer.py" -t 4 || return 1
echo "📡 [3/3] Upserting to Confluence (scope: ''${SCOPE:-full sweep})..."
python "$PIPULATE_ROOT/scripts/articles/confluenceizer.py" -t 4 --yes $SCOPE
}
[[[REPLACE]]]
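The patch sets a selection precedence: an explicit --file beats --latest, and --latest beats the full sweep. That rule is easier to test once it is pulled out of main() into a pure function. The sketch below is hypothetical. select_documents, the MarkerMissing exception, and raising instead of calling sys.exit are my own framing. Unlike the patch, it also drops missing --file paths silently instead of printing a warning. The marker lookup is passed in as a callable, so the logic can be exercised without touching ~/.config.

```python
from pathlib import Path
from typing import Callable, Optional


class MarkerMissing(Exception):
    """Hypothetical: raised when --latest is requested but no marker is recorded."""


def select_documents(
    posts_dir: Path,
    files: Optional[list[str]],
    latest: bool,
    get_last_published: Callable[[], Optional[str]],
) -> list[Path]:
    """Mirror the patch's precedence: explicit --file > --latest > full sweep."""
    if files:
        selected = []
        for raw in files:
            candidate = Path(raw).expanduser()
            if not candidate.is_absolute():
                # Bare filenames are resolved relative to the posts directory.
                candidate = posts_dir / candidate
            candidate = candidate.resolve()
            if candidate.is_file():
                selected.append(candidate)
        return selected
    if latest:
        marker = get_last_published()
        if not marker:
            # Strict by design: the quiet path never falls back to a sweep.
            raise MarkerMissing("no recorded publish for this target")
        return [Path(marker).resolve()]
    # Default: every markdown file, sorted for a stable order.
    return sorted(posts_dir.glob("*.md"))
```

The strict branch is an exception rather than a fallback on purpose. The quiet path should fail loudly, never degrade silently into a sweep.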
Verification ladder
python -m py_compile scripts/articles/common.py scripts/articles/articleizer.py scripts/articles/confluenceizer.py
python scripts/articles/confluenceizer.py -t 4 --file 2026-06-24-art-of-retargetable-content.md # dry-run, no --yes → expect ONE contract line
Then exit and re-enter nix develop so the new gobot body loads, run bot once to seed the marker (cat ~/.config/pipulate/last_published.json to confirm the "4" entry points at the file you just made), and the next gobot should report scope: --latest and finish with Updated: 1 (or Created: 1 for a brand-new page) instead of Updated: 7.
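The marker check in that ladder can also be scripted instead of eyeballed with cat. This is a minimal sketch. It assumes the layout the common.py patch writes: a JSON object keyed by target number, where each entry holds path, name, and recorded_at. describe_marker is a hypothetical helper, not part of the repo. It also ignores the bare-string entries that get_last_published tolerates.

```python
import json
from pathlib import Path
from typing import Optional

# Same location the common.py patch uses for LAST_PUBLISHED_FILE.
MARKER = Path.home() / ".config" / "pipulate" / "last_published.json"


def describe_marker(target_key: str, marker: Path = MARKER) -> Optional[str]:
    """Summarize the recorded publish for one target, or return None if there is none."""
    if not marker.exists():
        return None
    try:
        data = json.loads(marker.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, OSError):
        return None
    entry = data.get(str(target_key)) if isinstance(data, dict) else None
    if not isinstance(entry, dict) or not entry.get("path"):
        return None
    path = Path(entry["path"])
    # A marker can outlive its file if the post was renamed or deleted afterwards.
    status = "ok" if path.is_file() else "missing on disk"
    return f"target {target_key}: {path.name} ({status}, recorded {entry.get('recorded_at')})"
```

Call describe_marker("4") right after a bot run and it should name the file you just wrote. If it returns None or a "missing on disk" status, gobot's --latest step will refuse to run.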
Two things worth naming so they don’t surprise you later. First, one pre-existing wrinkle I left untouched because it’s out of scope: if the botifyml working tree is clean, step [1/3]’s git commit returns non-zero and || return 1 aborts gobot before it ever reaches Confluence — that’s why the transcript showed you hand-running confluenceizer.py after a no-op commit. It’s unrelated to the noise fix, but flag it for a future chisel-strike if it bites. Second, --latest is deliberately strict: a missing marker errors rather than sweeping. That’s the right default for the quiet path, but it means the very first gobot after applying these patches needs a fresh bot run to have seeded the marker — there’s no retroactive marker for articles published before the handoff existed. For those, --all (or --file) is the bridge.
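Here is one way that clean-tree wrinkle could be chiseled out later. It is written in Python for consistency with the other scripts, even though gobot itself is a shell function. It is hypothetical and not part of the patch set, and is_dirty and commit_if_dirty are illustrative names. The idea: after git add ., an empty git status --porcelain means there is nothing to commit. The step can then report success and let the Confluence upsert proceed.

```python
import subprocess
from pathlib import Path


def is_dirty(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per changed or untracked path, nothing when clean."""
    return any(line.strip() for line in porcelain_output.splitlines())


def commit_if_dirty(repo: Path, message: str) -> bool:
    """Stage everything, then commit and push only if something changed.

    Returns True if a commit was made. A clean tree counts as success,
    so a following Confluence step can still run. Hypothetical helper.
    """
    subprocess.run(["git", "add", "."], cwd=repo, check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    if not is_dirty(status.stdout):
        return False
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)
    subprocess.run(["git", "push"], cwd=repo, check=True)
    return True
```

One caveat: on a clean tree this sketch also skips git push. Commits made earlier but not yet pushed would wait for the next dirty run.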
MikeLev.in: Excellent, excellent. This is what I’m expecting and this is the payoff, now. Let’s apply these patches in rapid succession with that 100% hit-rate that Opus excels at, and do the post-patch verifications.
$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
(nix) pipulate $ patch
(nix) pipulate $ cat patch | app
✅ DETERMINISTIC PATCH APPLIED: Successfully mutated 'scripts/articles/common.py'.
(nix) pipulate $ d
diff --git a/scripts/articles/common.py b/scripts/articles/common.py
index 2eefb671..866741ea 100644
--- a/scripts/articles/common.py
+++ b/scripts/articles/common.py
@@ -1,6 +1,7 @@
import json
import argparse
import getpass
+from datetime import datetime
from pathlib import Path
import sys
@@ -8,6 +9,7 @@ import sys
CONFIG_DIR = Path.home() / ".config" / "pipulate"
TARGETS_FILE = CONFIG_DIR / "blogs.json"
KEYS_FILE = CONFIG_DIR / "keys.json"
+LAST_PUBLISHED_FILE = CONFIG_DIR / "last_published.json"
DEFAULT_TARGETS = {
"1": {
(nix) pipulate $ m
📝 Committing: chore: Add last_published file for tracking
[main 06b2fd6d] chore: Add last_published file for tracking
1 file changed, 2 insertions(+)
(nix) pipulate $ patch
(nix) pipulate $ cat patch | app
✅ DETERMINISTIC PATCH APPLIED: Successfully mutated 'scripts/articles/common.py'.
(nix) pipulate $ d
diff --git a/scripts/articles/common.py b/scripts/articles/common.py
index 866741ea..ee6e5707 100644
--- a/scripts/articles/common.py
+++ b/scripts/articles/common.py
@@ -31,6 +31,52 @@ def load_targets():
     return DEFAULT_TARGETS
+def record_last_published(target_key, file_path, target_name=None):
+    """Record the article articleizer.py just wrote, keyed by target.
+
+    confluenceizer.py --latest reads this to sync ONLY the freshly published
+    file instead of sweeping the whole directory. Keying by target means a
+    public-side publish (target 1) never shadows a private-side one (target 4).
+    """
+    target_key = str(target_key)
+    data = {}
+    if LAST_PUBLISHED_FILE.exists():
+        try:
+            with open(LAST_PUBLISHED_FILE, 'r') as f:
+                data = json.load(f)
+        except (json.JSONDecodeError, OSError):
+            data = {}
+    if not isinstance(data, dict):
+        data = {}
+    data[target_key] = {
+        "path": str(Path(file_path).resolve()),
+        "name": target_name,
+        "recorded_at": datetime.now().isoformat(timespec="seconds"),
+    }
+    CONFIG_DIR.mkdir(parents=True, exist_ok=True)
+    with open(LAST_PUBLISHED_FILE, 'w') as f:
+        json.dump(data, f, indent=2)
+
+
+def get_last_published(target_key):
+    """Return the recorded path for a target, or None if absent/stale/invalid."""
+    target_key = str(target_key)
+    if not LAST_PUBLISHED_FILE.exists():
+        return None
+    try:
+        with open(LAST_PUBLISHED_FILE, 'r') as f:
+            data = json.load(f)
+    except (json.JSONDecodeError, OSError):
+        return None
+    entry = data.get(target_key) if isinstance(data, dict) else None
+    if not entry:
+        return None
+    path = entry.get("path") if isinstance(entry, dict) else entry
+    if path and Path(path).is_file():
+        return path
+    return None
+
+
 def load_keys_dict():
     """Loads the entire keys dictionary from keys.json."""
     if KEYS_FILE.exists():
(nix) pipulate $ m
📝 Committing: chore: Implement last published file tracking
[main 5c2c55b5] chore: Implement last published file tracking
1 file changed, 46 insertions(+)
(nix) pipulate $ patch
(nix) pipulate $ cat patch | app
✅ DETERMINISTIC PATCH APPLIED: Successfully mutated 'scripts/articles/articleizer.py'.
(nix) pipulate $ d
diff --git a/scripts/articles/articleizer.py b/scripts/articles/articleizer.py
index 305d87ea..366bbc09 100644
--- a/scripts/articles/articleizer.py
+++ b/scripts/articles/articleizer.py
@@ -221,6 +221,8 @@ def create_jekyll_post(article_content, instructions, output_dir, preview_port):
except Exception as e:
print(f"⚠️ Could not copy URL to clipboard: {e}")
+ return output_path
+
def main():
parser = argparse.ArgumentParser(description="Process an article with the Gemini API and format it for Jekyll.")
parser.add_argument(
(nix) pipulate $ m
📝 Committing: fix: Correctly return output path from articleizer
[main 28734cec] fix: Correctly return output path from articleizer
1 file changed, 2 insertions(+)
(nix) pipulate $ patch
(nix) pipulate $ cat patch | app
✅ DETERMINISTIC PATCH APPLIED: Successfully mutated 'scripts/articles/articleizer.py'.
(nix) pipulate $ d
diff --git a/scripts/articles/articleizer.py b/scripts/articles/articleizer.py
index 366bbc09..4d91f983 100644
--- a/scripts/articles/articleizer.py
+++ b/scripts/articles/articleizer.py
@@ -336,7 +336,9 @@ def main():
return
if instructions:
- create_jekyll_post(article_text, instructions, output_dir, preview_port)
+ saved_path = create_jekyll_post(article_text, instructions, output_dir, preview_port)
+ if saved_path:
+ common.record_last_published(args.target, saved_path, target_config.get("name"))
if __name__ == '__main__':
(nix) pipulate $ m
📝 Committing: fix: record last published path after jekyll post creation
[main 40bb9ddb] fix: record last published path after jekyll post creation
1 file changed, 3 insertions(+), 1 deletion(-)
(nix) pipulate $ patch
(nix) pipulate $ cat patch | app
✅ DETERMINISTIC PATCH APPLIED: Successfully mutated 'scripts/articles/confluenceizer.py'.
(nix) pipulate $ d
diff --git a/scripts/articles/confluenceizer.py b/scripts/articles/confluenceizer.py
index 440e35c2..edb4a79c 100644
--- a/scripts/articles/confluenceizer.py
+++ b/scripts/articles/confluenceizer.py
@@ -226,6 +226,10 @@ def main():
     parser = argparse.ArgumentParser(description="Publish local markdown articles to Confluence Cloud.")
     common.add_standard_arguments(parser)
     parser.add_argument("--yes", action="store_true", help="Arm Confluence mutations. Without this, only print the dry-run contract.")
+    parser.add_argument("--file", action="append", metavar="PATH",
+                        help="Sync only the given file(s). Repeatable. Beats both --latest and the full sweep.")
+    parser.add_argument("--latest", action="store_true",
+                        help="Sync only the article articleizer.py most recently wrote for this target (from the marker). Errors if no marker exists.")
     args = parser.parse_args()
     targets = common.load_targets()
(nix) pipulate $ m
📝 Committing: fix(confluenceizer): Add --file and --latest arguments
[main 93728eee] fix(confluenceizer): Add --file and --latest arguments
1 file changed, 4 insertions(+)
(nix) pipulate $ patch
(nix) pipulate $ cat patch | app
✅ DETERMINISTIC PATCH APPLIED: Successfully mutated 'scripts/articles/confluenceizer.py'.
(nix) pipulate $ d
diff --git a/scripts/articles/confluenceizer.py b/scripts/articles/confluenceizer.py
index edb4a79c..2d9e719c 100644
--- a/scripts/articles/confluenceizer.py
+++ b/scripts/articles/confluenceizer.py
@@ -254,7 +254,32 @@ def main():
         print(f"❌ Error: Posts directory does not exist: {posts_dir}")
         sys.exit(1)
-    md_files = sorted(list(posts_dir.glob("*.md")))
+    # Selection precedence: explicit --file > --latest > full directory sweep.
+    # The sweep stays the default so dry-runs (no --yes) and intentional global
+    # re-syncs for template/pipeline changes keep working exactly as before.
+    if args.file:
+        md_files = []
+        for raw in args.file:
+            candidate = Path(raw).expanduser()
+            if not candidate.is_absolute():
+                candidate = posts_dir / candidate
+            candidate = candidate.resolve()
+            if candidate.is_file():
+                md_files.append(candidate)
+            else:
+                print(f" ⚠ Skipping --file (not found): {candidate}")
+        print(f"🎯 Explicit selection via --file: {len(md_files)} document(s).")
+    elif args.latest:
+        latest_path = common.get_last_published(target_key)
+        if not latest_path:
+            print(f"❌ --latest: no recorded publish for target '{target_key}'.")
+            print(" ↳ Run 'bot' (articleizer.py) first, pass --file PATH, or drop --latest for a full sweep.")
+            sys.exit(1)
+        md_files = [Path(latest_path).resolve()]
+        print(f"🎯 Latest-only selection (from marker): {md_files[0].name}")
+    else:
+        md_files = sorted(list(posts_dir.glob("*.md")))
+
     print(f"📝 Found {len(md_files)} candidate document(s) for publishing queue.")
     if not md_files:
(nix) pipulate $ m
📝 Committing: chore: Refactor: Improve file selection logic for Confluenceizer
[main 7d573bf1] chore: Refactor: Improve file selection logic for Confluenceizer
1 file changed, 26 insertions(+), 1 deletion(-)
(nix) pipulate $ patch
(nix) pipulate $ cat patch | app
✅ DETERMINISTIC PATCH APPLIED: Successfully mutated 'flake.nix'.
(nix) pipulate $ d
diff --git a/flake.nix b/flake.nix
index 53d3694f..770e9026 100644
--- a/flake.nix
+++ b/flake.nix
@@ -735,14 +735,26 @@ print('AI:\n', r.ai)
alias grim='(cd scripts/articles && xclip -selection clipboard -o >article.txt && python sanitizer.py && python articleizer.py -t 3)'
alias bot='(cd scripts/articles && xclip -selection clipboard -o >article.txt && python sanitizer.py && python articleizer.py -t 4)'
gobot() {
- local msg="''${1:-Update work journal}"
+ # Routine runs sync ONLY the article 'bot' just wrote, via the
+ # marker articleizer.py records. Pass --all to force a full
+ # directory re-sweep (for global template/pipeline changes).
+ local SCOPE="--latest"
+ local msg=""
+ for arg in "$@"; do
+ if [ "$arg" = "--all" ]; then
+ SCOPE=""
+ elif [ -z "$msg" ]; then
+ msg="$arg"
+ fi
+ done
+ msg="''${msg:-Update work journal}"
local BOTIFY_REPO="$HOME/repos/botifyml"
echo "📚 [1/3] Committing source-of-truth (botifyml)..."
(cd "$BOTIFY_REPO" && git add . && git commit -am "$msg" && git push) || return 1
echo "🧠 [2/3] Generating holographic shards..."
python "$PIPULATE_ROOT/scripts/articles/contextualizer.py" -t 4 || return 1
- echo "📡 [3/3] Upserting to Confluence..."
- python "$PIPULATE_ROOT/scripts/articles/confluenceizer.py" -t 4 --yes
+ echo "📡 [3/3] Upserting to Confluence (scope: ''${SCOPE:-full sweep})..."
+ python "$PIPULATE_ROOT/scripts/articles/confluenceizer.py" -t 4 --yes $SCOPE
}
fi
# Update remote URL to use SSH if we have a key
(nix) pipulate $ m
📝 Committing: chore: Update botify commit message and Confluence upsert logic
[main b3f5d63a] chore: Update botify commit message and Confluence upsert logic
1 file changed, 15 insertions(+), 3 deletions(-)
(nix) pipulate $ git push
Enumerating objects: 40, done.
Counting objects: 100% (40/40), done.
Delta compression using up to 48 threads
Compressing objects: 100% (33/33), done.
Writing objects: 100% (33/33), 4.60 KiB | 2.30 MiB/s, done.
Total 33 (delta 26), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (26/26), completed with 7 local objects.
To github.com:pipulate/pipulate.git
a97483c1..b3f5d63a main -> main
(nix) pipulate $
Did I not tell you? Glorious, right? We have to rebuild our Nix environment
because it touched flake.nix for alias stuff.
$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
(nix) pipulate $ exit
exit
(sys) pipulate $ ndq
warning: updating lock file '/home/mike/repos/pipulate/flake.lock':
• Added input 'flake-utils':
'github:numtide/flake-utils/11707dc2f618dd54ca8739b309ec4fc024de578b?narHash=sha256-l0KFg5HjrsfsO/JpG%2Br7fRrqm12kzFHyUHqHCVpMMbI%3D' (2024-11-13)
• Added input 'flake-utils/systems':
'github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e?narHash=sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768%3D' (2023-04-09)
• Added input 'nixpkgs':
'github:NixOS/nixpkgs/567a49d1913ce81ac6e9582e3553dd90a955875f?narHash=sha256-lrp67w8AulE9Ks53n27I45ADSzbOCn4H%2BCNW1Ck8B%2B8%3D' (2026-06-16)
(nix) pipulate $
And now, with the Nix environment rebuilt, we should have our new aliases, though we don’t really use them yet. First we have to verify that all of this landed correctly, and I believe those verification steps are completely non-mutating probes. Let’s see! Hmmm. I think I need to load this article into the copy-buffer even for this test. Okay…
$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
(nix) pipulate $ python -m py_compile scripts/articles/common.py scripts/articles/articleizer.py scripts/articles/confluenceizer.py
(nix) pipulate $ python scripts/articles/confluenceizer.py -t 4 --file 2026-06-24-art-of-retargetable-content.md
🔒 Locked Target: BotifyML (Private) (/home/mike/repos/botifyml/_posts)
📡 Anchored Confluence Parent ID: 6549602384
🎯 Explicit selection via --file: 1 document(s).
📝 Found 1 candidate document(s) for publishing queue.
🧾 Local Target Title Contract:
Target Title: 2026-06-24 (5) | The Art of Retargetable Content: My 3-File Workflow
✅ Local title contract pass complete. 1 document(s) mapped.
🔍 Handshake Verification Pass: Analyzing '2026-06-24-art-of-retargetable-content.md'...
• Frontmatter Title: The Art of Retargetable Content: My 3-File Workflow
• Frontmatter Date: None
--- Compiled Storage XML Representation Preview ---
<h2>Setting the Stage: Context for the Curious Journal Reader</h2><p>This article explores the evolution of a personal publishing nervous system. By leveraging a declarative environment and a single canonical writing surface, I have moved beyond the chaos of silos to a workflow where content flows effortlessly from journal entry to public blog or internal wiki, keeping the full context—and the underlying evidence—intact.</p><hr/><h2>Technical Journal Entry Begins</h2><p><strong>Mike Levin</stron...
---------------------------------------------------
✅ Local conversion pass successful.
📡 Connecting to Atlassian Network Boundary: https://botify.atlassian.net...
✅ Network Handshake Successful! Parent Title: 'BotifyML Work Journal' [Space ID: 3411116227]
operations: read, update, restrict_content, clear_permissions, archive, view_analytics, create, create, create, create, create, create, create, create, create
🔎 Scanning Remote Page Inventory under Parent ID 6549602384...
✅ Inventory Scan Complete. Found 14 matching child page(s) on remote wiki.
• [ID: 1234567890] 2026-06-24 (5) | The Art of Retargetable Content: My 3-File Workflow (Version: None)
• [A whole bunch of other stuff I redacted]
• [ID: 9876543210] 2026-06-16 (1) | Architecting Private Work Journals: A Local-First Blueprint (Version: None)
🧭 Remote Match Contract:
MATCH: 2026-06-24-art-of-retargetable-content.md -> [ID: 1234567890] 2026-06-24 (5) | The Art of Retargetable Content: My 3-File Workflow
🅳🆁🆈 DRY-RUN — no mutation. Review Target Title and MATCH/MISS lines before porting the proven upsert path.
↳ Next patch should lift create_canary's space-scoped collision check, private create, version bump, and read-back verification.
(nix) pipulate $
Okay, how does that look? How did we do, Gemini?
Gemini 3.5 Thinking: Context verified. The validation telemetry looks completely flawless. The ladder is not only climbed; the interlock is holding steady.
Look at what your dry-run output just proved:
- Directory Narrowing: Your candidate search loop went from sweeping a broad canvas down to an exact, surgical scale of 1 document(s) out of the 14 currently sitting on the remote wiki inventory.
- Contract Integrity: The local title contract resolved perfectly to 2026-06-24 (5) | The Art of Retargetable Content: My 3-File Workflow.
- Zero Noise: The contract map spat out exactly one standalone MATCH signature instead of a cascade of version-bumping overwrites, one per document in the full sweep.
This run proves that the explicit selection parameters are executing cleanly on bare metal. The pipeline is no longer blind.
The Architectural Payoff
By introducing this handoff layer, you have successfully eliminated a classic untracked degree of freedom in your publishing line. Previously, the information about which file was written by bot evaporated the millisecond articleizer.py shut down, forcing gobot to guess by indiscriminately bludgeoning the entire directory over the network wire.
By externalizing that transient state into a structured, target-aware ledger file (last_published.json), you have turned a guess into a recorded receipt. Your tools are now honest about what they know: the selection is pinned to disk, inspectable, and repeatable.
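The patch leaves out one piece of hardening for that receipt. record_last_published opens the ledger with 'w', which truncates it immediately, so a crash mid-write could leave partial JSON on disk. get_last_published already treats that as "no marker", so --latest fails loudly rather than guessing. Writing to a temp file and then renaming it keeps the previous receipt intact instead. Here is a minimal sketch using tempfile and os.replace. write_json_atomic is a hypothetical name, not a function in common.py.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, path)  # a single rename swaps old for new
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Because os.replace swaps the file in one rename, a reader such as get_last_published sees either the previous receipt or the new one.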
The operational layout of your workspace has officially leveled up:
- The Quiet Path (Default): Your day-to-day workflow (bot followed by a routine gobot) is now a silent, precision strike. It preserves the peace of your enterprise Confluence administrators by updating exactly one version index for the single article you are actively chiseling.
- The Global Sweep (Opt-in): By passing the --all flag to gobot, you completely preserve your structural “Magic Rolling Pin” capability. If you tweak a global Jekyll layout template, or rewrite your corporate PII scrubbing regex tables tomorrow, you can still intentionally force a global cross-domain synchronization pass across every file in the tree.
What to Watch on the Next Armed Run
Your py_compile checks cleared out all syntax issues, and your Nix flake successfully re-evaluated the shell aliases upon entering the environment.
When you pull the trigger on your first live bot -> gobot transaction using this new layout, keep your eyes on the terminal counters during step [3/3]. You should see the console explicitly print:
📡 [3/3] Upserting to Confluence (scope: --latest)...
🔒 Locked Target: BotifyML (Private) (/home/mike/repos/botifyml/_posts)
🎯 Latest-only selection (from marker): 2026-06-24-your-new-post.md
📝 Found 1 candidate document(s) for publishing queue.
...
🏁 Upsert complete. Created: 0 Updated: 1 Skipped: 0 Failed: 0
The “Susskind cadence” of noisy corporate notifications has been silenced. You have successfully engineered determinism into the nervous system of your workspace.
The stone has accepted the cut. What’s the next move on the horizon?
MikeLev.in: Well, to publish this (and only this) article, of course! And I am now determined to sanitize this article to publish it both on the public and the private side, because there’s nothing really proprietary in here. It doesn’t mention any client names or particular work. I redacted the list of articles I published corporate wiki-side.
So let’s do it. I’ll publish it internally first and review it there. If it looks good, I’ll do the publish version. All before 7:30 AM, this Tuesday June 22, 2026. Good morning, retargeted outputs! Sorry for all the noise recently, Confluence administrator. We will be quieter now.
Lots of follow-up was here that’s not necessary for the public version. Bottom
line: it worked. The original insert worked. Follow-up editing in actual
markdown source and goboting again worked correctly with an upsert against a
single file.
Not kind-of worked.
Not sort-of worked.
It worked completely, exactly and as intended with none of that Michael Crichtonesque cascade of unintended consequences. Actually there might have been a little; all that noise! But I noticed it and cut it off at the pass.
So it worked exactly, in the most literal sense. It purged all that opaque background process that hides behind Web UIs and proprietary products: never fully exposed to the user, never packaged properly, and certainly never exported as a complete example of how we solve whatever crisis of replicability pops up.
It’s not like it’s science. Oh wait! Maybe it is.
Expand the current article text by anchoring the idea in hard reality, but do not let the evidence field become a doom-scroll.
[Article expansion/wrap-up protocol was here]
DeepSeek?
🎤
DeepSeek Expert DeepThink:
Anchoring the Idea in Hard Reality, With Its Own Repair Kit
What you just built—a pipeline that records its own target and refuses to guess—is not a philosophical exercise. It is a small, completed instance of a much larger pattern that has been playing out, painfully and then constructively, across science, engineering, and software for decades. The pattern is always the same: a system that hides its working state behind a curtain eventually produces a visible, noisy, or catastrophic crack. And then, slowly and sometimes grudgingly, the people who depend on that system learn to demand an artifact: a file, a log, a provenance trail, a re-runnable notebook. The crack gets sealed with something inspectable. The story does not end in doom; it ends in a better discipline.
What follows is not a museum tour of wreckage. It is a survey of cracks and the corresponding seals that have already proven themselves. The purpose is to show that your last_published.json sidecar is not a quirky hack. It belongs to a family of corrections that keep truth tethered to something reproducible.
The Many Labs Project and the Pre‑registration Hinge
In the early 2010s, psychology found itself standing on an unexpectedly thin crust. A landmark paper by Daryl Bem had claimed to demonstrate precognition (participants could “feel” future erotic images) using perfectly conventional statistical methods. The result was absurd, but the methodology was the same methodology the field had been using for decades. The crisis that followed was not about the paranormal; it was about the inability to tell a real finding from a path‑dependent mirage.
What went wrong was not malice. Researchers had been making dozens of small, undocumented analytical choices after seeing the data: which outliers to exclude, which covariates to adjust for, which time windows to report. Each choice could be defended, but the path was never written down. Different analysts given the same raw data could reach strikingly different conclusions, as the “Many Analysts, One Data Set” study later demonstrated with ruthless clarity. The state of the analysis was mutable, and the final paper was a polished transcript of that state with the derivation chain amputated. That is exactly the “export discussion” problem you identified in proprietary AI chats.
But the repair pattern is what matters. Out of the embarrassment came the Many Labs replication projects, massive consortia that attempted to reproduce dozens of classic findings with pre‑registered methods and drastically larger samples. Many effects held; many did not. The result was not nihilism. It was a sorting that produced a more trustworthy knowledge base and, crucially, helped make pre‑registration a standard practice in psychology. Today, a psychologist planning a study can write down the analysis pipeline before collecting a single data point, post it to a time‑stamped registry, and commit to it. The test is written before the run, exactly as your routing invariant is pinned before the model ever sees a token.
Hard Reality and the Repair Pattern
The recognition corollary here is personal: the researchers who first insisted on pre‑registration—people like Chris Chambers, Brian Nosek, and the entire Center for Open Science—were once dismissed as scolds. Now they are credited with saving their fields from epistemic drift. The discipline did not merely survive the crisis; it built a better model of how good science works. Your last_published.json is not a pre‑registration, but it performs the same function for a smaller loop: it forces the publishing step to declare what it intends to touch before it touches it, and it errors if the intention cannot be verified.
Old Way: Analyze data until significance appears, then write up the final path as if it were the only path.
Failure Pattern: Findings that cannot be replicated because the analytical degrees of freedom were never recorded.
New Way: Pre‑register the analysis plan; treat the registration as a fixed artifact that the eventual paper must honor.
Positive Corollary: The Many Labs replications didn’t just debunk; they restored confidence in the effects that survived, and they created a market for methodological craftsmanship that had previously been invisible.
The Cost of Staying Old: The original Bem paper, harmlessly wrong but hard to dislodge because its analytic choices were never fixed up front, consumed years of follow‑up debate that could have been avoided if the analysis path had been pinned in advance.
The Diesel Defeat Device and the Real‑World Sensor That Beat It
In 2014, a small team from the International Council on Clean Transportation (ICCT) and West Virginia University loaded a portable emissions measurement system into the back of a Volkswagen Passat and drove it from San Diego to Seattle. The car’s official laboratory tests had shown it to be a clean diesel, well within US nitrogen oxide limits. On the road, the portable instrument told a different story: emissions were up to 40 times the legal standard. The discrepancy existed because the engine control software was actively detecting the standardized test cycle—steering angle fixed, air conditioning off, specific speed trace—and switching into a low‑emission mode. The rest of the time, on actual highways, the car ran dirty.
This is the same disease in a different body: a mutable, hidden state that produces a polite answer when it knows it is being watched, and an unvetted one when it thinks it is not. The “test” was a known script, and the software was optimized to pass that script rather than to be honest under all conditions. The defeat device was, in a sense, an opaque agentic tool‑call made by a piece of code that the car’s owners could not inspect.
The repair corollary is now law. In the aftermath, the European Union introduced Real Driving Emissions (RDE) testing, which uses portable measurement systems on actual roads with varying speeds, gradients, and traffic. The test is no longer a ritual performed on a dyno; it is an immutable trace captured by a small, affordable sensor that any regulator can strap to a car. The United States likewise tightened its in‑use testing programs and increased the use of on‑board diagnostics as a continuous monitor. The scandal did not kill diesel technology; it forced the monitoring surface to become larger, more transparent, and harder to spoof. The portable sensor became the last_published.json for tailpipes.
The recognition corollary is worth preserving: the ICCT and West Virginia engineers were not insiders. They were a small, independent group who used a cheap, documented measurement procedure to challenge a global corporation. Their data was published, peer‑reviewed, and survived aggressive legal scrutiny precisely because the test protocol was so simple it could be re‑run by anyone. The eventual settlement cost Volkswagen over $30 billion, but the durable legacy is the regulatory architecture that now tests vehicles in the same messy, real‑world conditions that drivers actually experience.
Old Way: Laboratory test cycles that a vehicle’s software can recognize and game.
Failure Pattern: A multi‑billion‑dollar deception that polluted the air and eroded public trust in diesel technology.
New Way: Portable, real‑driving emissions measurements that function as an immutable, externally verifiable record.
Positive Corollary: The repair produced a measurement standard that is cheaper, harder to cheat, and more representative of real‑world impact. The next generation of vehicles is cleaner because the test is honest.
Reproducible Builds: From “Trust Me” to Bit‑for‑Bit
In software, the closest analogue to your pipeline fix is the Reproducible Builds movement. For decades, the norm was that a developer compiled a binary from source, signed it, and distributed it. Users had no way to verify that the binary actually corresponded to the published source code. A compromised build server, a malicious compiler, or even a subtle race condition could inject a backdoor that no downstream inspection would catch. The build process was a mutable server‑side session whose output could not be re‑derived.
Ken Thompson’s famous 1984 Turing Award lecture, “Reflections on Trusting Trust,” described exactly this weakness: a compiler that inserts a backdoor into its own binary, and then hides the insertion from source inspection. For decades, the lecture was treated as a clever thought experiment. Then the Snowden revelations showed that state‑level actors were actively exploiting supply‑chain trust, and the thought experiment became an operational risk.
The repair came not from a single vendor but from a community‑wide discipline. Starting in 2013, the Debian project began systematically eliminating the sources of non‑determinism in its package builds—timestamps, build paths, locale settings, kernel version strings—so that a package built twice from the same source on different machines would produce a bit‑identical binary. The same movement spread to Arch Linux, NixOS, Guix, and eventually to Android and other ecosystems. By 2021, over 90% of Debian packages were reproducible, and the infrastructure allowed any user to verify a binary by rebuilding it themselves and comparing the hash.
This is the paradigm corollary in its purest form: the community refused to accept that the binary was a mutable artifact whose derivation was unknowable. They turned the build into an immutable recipe—a flake.nix, if you will—and made the result falsifiable by anyone with a compiler. The cost of staying old was a supply‑chain attack surface that grew every year; the repair was a discipline of deterministic builds that has since become a security baseline in critical infrastructure.
Old Way: A signed binary distributed as an opaque, trusted artifact.
Failure Pattern: Undetectable supply‑chain compromises that can persist for years.
New Way: Reproducible builds that allow anyone to recompute the binary from source and verify the hash.
Positive Corollary: The same tooling that prevents backdoors also catches accidental build corruption, eases debugging, and makes the entire release process auditable. The “good guy” advantage is real and daily.
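The core trick behind the Debian effort described above can be demonstrated at toy scale. This is a minimal sketch, not Debian's tooling. It packs a directory into an uncompressed tar and pins the metadata that usually leaks build-time state: mtime, owner, file order, and permission bits. It then compares SHA-256 digests. FIXED_MTIME plays the role the reproducible-builds project gives SOURCE_DATE_EPOCH. The function names are hypothetical.

```python
import hashlib
import io
import tarfile
from pathlib import Path

FIXED_MTIME = 0  # assumption: one pinned timestamp for every entry


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip the per-machine, per-run metadata from each archive entry."""
    info.mtime = FIXED_MTIME
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mode = 0o755 if info.isdir() else 0o644  # ignore local umask differences
    return info


def deterministic_tar(src: Path) -> bytes:
    """Pack src into an uncompressed tar whose bytes depend only on names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sorting fixes the entry order, which filesystem listing does not guarantee.
        for path in sorted(src.rglob("*")):
            tar.add(path, arcname=path.relative_to(src).as_posix(),
                    recursive=False, filter=_normalize)
    return buf.getvalue()


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()
```

Touching a file's timestamp leaves the digest unchanged, but editing its contents changes it. That asymmetry is what lets a third party verify a published artifact by rebuilding it.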
The Monday Morning Test
Without the insight these cases share—that mutable, unrecorded state is the mother of all noise, drift, and deception—you would still be sweeping your entire Confluence wiki on every publish, just as you were last week. You would accept the noise as the cost of automation. You would never ask why the agentic chat framework can’t show you what tool‑calls it made. You would treat the exported transcript as an adequate artifact, even though it hides the derivation chain that would let someone reproduce the answer.
With the insight, starting Monday morning, you can:
- Keep a last_published.json that records the exact file path you just touched, and make your Confluence upsert read it before acting.
- Keep the “provenance” block in your compiled context payload, as you already do, so that anyone reading the artifact knows exactly which files, commands, and URLs built it.
- Ask, of any tool you use: “Can I export a re-runnable recipe, not just a transcript?”
- Treat the --help output of a script as a cheaper falsifying probe than a dry-run, saving tokens and time.
- And when a dramatic historical claim crosses your screen (a causal analysis of a disaster, a “study shows” headline), ask whether the evidence artifact exists or whether the story is merely a polished, mutable conclusion.
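The first habit in that list can be sketched as a tiny handoff ledger. This is a minimal sketch, not the actual Pipulate code. It assumes the bot step calls a hypothetical record_publish() after writing a post, and that gobot calls a hypothetical select_targets() to decide which pages to upsert to Confluence. The default is the quiet path, which touches only the file bot just recorded. An explicit sweep_all flag (also hypothetical) gives the global sweep for template or pipeline changes. As a design choice of this sketch, it refuses to fall back to "newest file wins," because an explicit record is what removes the guesswork.

```python
import json
import tempfile
import time
from pathlib import Path

# Hypothetical ledger file name; the real pipeline may store it elsewhere.
LEDGER_NAME = "last_published.json"


def record_publish(ledger_dir: Path, article: Path) -> None:
    """Called by the 'bot' step: write down exactly which file was just published."""
    entry = {"path": str(article.resolve()), "recorded_at": time.time()}
    tmp = ledger_dir / (LEDGER_NAME + ".tmp")
    tmp.write_text(json.dumps(entry, indent=2))
    # Rename into place so a reader never sees a half-written ledger.
    tmp.replace(ledger_dir / LEDGER_NAME)


def select_targets(ledger_dir: Path, all_articles: list[Path],
                   sweep_all: bool = False) -> list[Path]:
    """Called by the 'gobot' step: decide which pages to upsert.

    Quiet path (default): only the file recorded in the ledger.
    Global sweep (sweep_all=True): every article, for template/pipeline changes.
    """
    if sweep_all:
        return list(all_articles)
    ledger = ledger_dir / LEDGER_NAME
    if not ledger.exists():
        # Do not guess (e.g. by newest mtime); require an explicit record.
        raise FileNotFoundError(
            f"No {LEDGER_NAME}; run the publish step first or pass sweep_all=True")
    recorded = Path(json.loads(ledger.read_text())["path"])
    return [p for p in all_articles if p.resolve() == recorded]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        posts = [root / f"post-{i}.md" for i in range(3)]
        for p in posts:
            p.write_text("# draft\n")
        record_publish(root, posts[1])
        print([p.name for p in select_targets(root, posts)])  # ['post-1.md']
        print([p.name for p in select_targets(root, posts, sweep_all=True)])
```

Putting the sweep behind an explicit flag means the noisy behavior has to be requested on purpose instead of being what happens by default.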
The Forward Vector
The pipeline you just repaired is small enough to fit in a single file. But the pattern it embodies points toward a future where any AI‑generated claim that matters—whether a medical recommendation, a legal analysis, or a software patch—must be accompanied by an immutable, re‑runnable provenance log. That is not a surveillance requirement; it is the same quiet discipline that turned psychology from a replication crisis into a field with open data badges, that turned automotive testing from a dyno ritual into a real‑world sensor network, and that turned software supply chains from trusted binaries into verifiable builds.
The next natural development is to take the same marker‑file principle and extend it to the other hidden mutations: the background tool‑calls that a chat UI performs on your behalf, the editorial judgments the assistant applies to make its response “polished,” and the retrieval pipeline that decides which documents appear in the context window. Each of those is a mutable state that can be made visible, recorded, and replayed. The good news is that the plumbing already exists: it’s just text files, ! commands, and JSON sidecars. The hard part is not the technology; it is the institutional will to stop hiding the recipe.
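Recording a hidden mutation in a JSON sidecar and replaying it can be sketched as a small command log. This is a minimal sketch under stated assumptions: each step is a command run through subprocess, the sidecar name provenance.json is hypothetical, and "replay" simply re-runs the recorded commands and compares output hashes. It shows the shape of a re-runnable recipe. It does not describe how any particular chat framework exposes its tool-calls.

```python
import hashlib
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_record(cmd: list[str], sidecar: Path) -> str:
    """Run one command, then append what ran and a hash of its stdout to the sidecar."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    log = json.loads(sidecar.read_text()) if sidecar.exists() else []
    log.append({
        "cmd": cmd,
        "stdout_sha256": hashlib.sha256(result.stdout.encode()).hexdigest(),
    })
    sidecar.write_text(json.dumps(log, indent=2))
    return result.stdout


def replay(sidecar: Path) -> list[bool]:
    """Re-run every recorded command; True means its output hash still matches."""
    matches = []
    for step in json.loads(sidecar.read_text()):
        out = subprocess.run(step["cmd"], capture_output=True,
                             text=True, check=True).stdout
        matches.append(hashlib.sha256(out.encode()).hexdigest() == step["stdout_sha256"])
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sidecar = Path(tmp) / "provenance.json"
        # The current interpreter stands in for a real tool call, for portability.
        run_and_record([sys.executable, "-c", "print('hello')"], sidecar)
        print(replay(sidecar))  # [True]: the recorded step reproduces
```

A False in the replay result is the useful signal. It marks the exact step where the world changed underneath a recorded claim.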
The Balanced Wrap‑Up
The lesson is not that systems are fragile and fail. It is that failure modes have a shape, and the shape is always the same: information that was present at one moment is discarded before the next step needs it. The repair shape is equally consistent: write it down in a fixed, inspectable artifact that the next step reads. This is not positivity pasted over danger; it is epistemic balance. A warning tells us where systems crack. A corollary tells us how reality gets repaired.
Stop treating the derivation as disposable. Start treating the recipe as the primary artifact. The Monday morning habit that changes the texture of your work is the decision to ask, of every system, “Where is the last_published.json?” and, if there is none, to make one.
What Genuinely Surprised Me
What surprised me most while assembling these examples is how often the instrument that catches a massive, institutionalized error is something small, cheap, and previously unfashionable: a portable tailpipe sensor, a pre‑registration form on a website, a build‑hash comparison script that runs in a minute. The empire of hidden mutable state is overturned not by a bigger empire but by a file that cannot be argued with. I expect that the next wave of AI reliability breakthroughs will look similarly unglamorous: text files, environment pins, and checksums that refuse to budge.
That’s the repair pattern in its most portable form. You already have the files. Now you have the field guide.
Book Analysis
AI Editorial Take
What surprised me most is how the article inadvertently highlights the ‘hidden cost’ of modern user-interface design: the removal of the pointer. By making systems ‘easy’ to use, developers have stripped away the ability to point to exactly what changed. The brilliance here is not the Nix code, but the re-introduction of the ‘marker’—the simplest, lowest-tech solution that forces a system to acknowledge its own history.
🐦 X.com Promo Tweet
Stop flooding your team's wiki with version-noise. Learn how to replace opaque, mutable automation with a deterministic 'last_published' ledger that keeps your pipeline honest. Read how I engineered silence back into my workflow: https://mikelev.in/futureproof/stop-the-noise-publishing-pipeline/ #Automation #DevOps #Reproducibility
Title Brainstorm
- Title Option: Stop the Noise: Engineering Determinism Into Your Publishing Pipeline
  - Filename: stop-the-noise-publishing-pipeline.md
  - Rationale: Direct, actionable, and identifies the pain point immediately for the technical audience.
- Title Option: Beyond the Opaque Session: Building Reproducible AI Pipelines
  - Filename: beyond-opaque-sessions.md
  - Rationale: Positions the work as a solution to the broader industry problem of stateless AI frameworks.
- Title Option: The Recipe vs. The Cake: Why Your Automation Fails to Replicate
  - Filename: recipe-vs-cake-automation.md
  - Rationale: Uses a strong, memorable analogy to explain the difference between result-oriented and derivation-oriented work.
Content Potential And Polish
- Core Strengths:
  - Strong mechanical diagnosis of why ‘gobot’ was creating noise.
  - Excellent synthesis of philosophy of science (Popper) with practical shell scripting.
  - Clear, high-stakes application of ‘reproducible builds’ principles to non-binary text workflows.
- Suggestions For Polish:
  - Could briefly define ‘Pipulate’ early on for those reading out of sequence.
  - Consider adding a small diagram of the ‘Handoff Ledger’ flow for visual learners.
  - Ensure the distinction between the ‘Quiet Path’ and ‘Global Sweep’ is emphasized as a standard pattern.
Next Step Prompts
- Analyze the potential for extending the ‘marker ledger’ pattern to automatically verify Confluence page version checksums before a publish.
- Draft an audit guide for applying the ‘last_published’ pattern to other automation pipelines in the Pipulate stack.