Setting the Stage: Context for the Curious Book Reader
This essay explores an important, and often overlooked, challenge in effective AI collaboration: the physical constraints of human-AI interfaces. What began as a deep dive into building comprehensive AI context for the Pipulate system quickly evolved into a quest to understand and overcome browser-imposed prompt size limits, leading to the development of precise context telemetry tools for sovereign prompting in the Age of AI. This piece is a thread in the tapestry of evolving methodologies for reliable AI interaction.
Technical Journal Entry Begins
Building context up from scratch is often the challenge in recruiting AI coding assistance. You may be able to step into an old discussion where you were working on some particular thing and have the AI pick up where you left off with grace, but that discussion may also be bogged down with beside-the-point material overloading the context window. Discussions grow increasingly slow as the entire history gets posted back with every new prompt, getting heavier and heavier and potentially more confusing to the AI stepping back into an issue.
AI Context: The Expanding Challenge
This is yet another place where Pipulate shines, because it has such a small footprint to begin with, and because there's a play-by-play history of its development, we can identify the relevant articles in the correct order.
/home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md
/home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md
/home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md
For drawing the big picture of Pipulate we can just give the whole list of over 700 articles going back over a year now. I hardly have to spell out the picture, but for the sake of the reader here, I will.
Pipulate is a few things that get installed locally onto Mac, Windows or Linux machines. It CAN be installed cloud-based, but that'd be a bit of a challenge and should (currently) be done as a single-user VPS. Pipulate is designed as a single-user local app, a lot like an Electron-platform app, using your default web browser for the user interface and using an environment set up under the Nix package system, a generic normalized Linux, to run the show. This is how the long-elusive promise of write once, run anywhere is finally delivered, solving the "works on my machine" problem that drives so many people to the cloud. It is therefore also an antidote to the cloud and subscription fatigue, and a path to computing self-sovereignty. If you are a domain expert withholding proprietary information from the Web for competitive reasons and still want the advantage of working with AI, Pipulate may be for you.
Pipulate’s Architectural Blueprint
Pipulate runs a local instance of JupyterLab for regular Notebooks, and it shares its Python virtual environment with a FastHTML web app where the linear, top-down workflows of Notebooks are simulated with HTMX, but in a way where the user doesn't have to see any of the Python code. Pipulate is therefore a way of developing and then canning or bottling Notebook-esque workflows. A local AI rides shotgun to help you with the workflows, and any portion within a workflow can be farmed out to cutting-edge datacenter cloud-based AIs like Gemini, Claude, ChatGPT or Grok. You can think of the outer workflow as AI on Rails and an antidote to willy-nilly, non-deterministic agentic processes. The same state manager used in the FastHTML version of Pipulate can be used in JupyterLab with regular Notebooks, aiding the porting as processes are fleshed out in Notebooks and moved to Web Apps.
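To make that concrete, here is a hedged sketch of what using that shared state manager from a Notebook cell might look like. The pip handle and the set/get signatures are assumptions for illustration, not confirmed against core.py:

# Hypothetical notebook-side usage of Pipulate's state manager.
# The import name and method signatures are illustrative assumptions.
from pipulate import pip

job = "hello-demo-01"
pip.set(job, "name", "Chip O'Theseus")  # persist a step's value
print(pip.get(job, "name"))             # survives kernel restarts

The point is the same persistence layer backs both the Notebook and the web app, so a workflow prototyped in one carries straight over to the other.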
That gives the very high-level background in broad strokes. The AI, Gemini in this case, is going to get a lot more context than just this: it will see all ~700 article headlines documenting the evolution of the system, along with brief descriptions drawn from the HTML meta descriptions of the published articles. The LLM also sees the URLs of all the published articles in case it wants to do any URL-fetching tool calls to investigate particular ones further. But it probably won't need to, because I'm calling out the specific articles that should be in context for this implementation-plan request and packing them right into the giant single-text-file submit that I'm preparing.
There are some files that draw out how this overarching system works, and this article will probably become a sort of template for me doing this context painting from scratch. I've been thinking of it a lot like the MythBusters episode where Jamie and Adam demonstrate how GPUs work by painting the Mona Lisa with 1,100 paintball guns simultaneously. In the Chat UI that's so popular in AI coding assistant software you could paint up a similar picture stroke by stroke, but each stroke is a tool call, and that becomes expensive on several fronts. If the AI's context window is 1 or 2 million tokens big, why build it up expensively stroke by stroke when you can paint it all with one stroke?
And so instead of drag-and-dropping a bunch of files into Cursor IDE's Composer feature, which lacks sequential storytelling and a context-wrapping article like this, we paint our picture with carefully sequenced files that all get stacked and appended together with a manifest (a table of contents) telling the AI where to find everything, reducing the "Lost-in-the-Middle" effect. Right at the top, the AI is given all the tokens it can search for to find where a topic is really discussed in the middle. So while I do load the important stuff at the very beginning and very end of such mega-submits, because the beginning and the end will always get more attention and focus than the middle, it will still be able to find stuff in the middle when it needs it, because ToC. Intelligence is intelligence; imagine us humans reading a book without a table of contents, page numbering and the like.
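Concretely, the assembled mega-submit brackets every section with START/END markers and leads with a manifest, which is exactly what the prompt_foo.py listing later in this entry does. A toy sketch of that layout:

# Toy sketch of the super-prompt layout prompt_foo.py (shown later) emits.
sections = {
    "Manifest": "- Story\n- File Tree\n- Codebase\n- Prompt",
    "Story": "### Article headlines, dates, URLs...",
    "Codebase": "--- START: flake.nix ---\n...\n--- END: flake.nix ---",
    "Prompt": "The actual request lands here, at the end.",
}
parts = [f"--- START: {name} ---\n{body}\n--- END: {name} ---"
         for name, body in sections.items()]
print("\n\n".join(parts))

The markers give the model literal search anchors, so even material buried in the middle stays findable.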
A “Paint-Up” of Key Components
Same thing with LLMs. Their trouble with the middle isn't a shortcoming; it's only human, and we try to provide them the same conveniences and courtesies we would the reader of a book. Maybe I should put a glossary near the end like J.R.R. Tolkien's books, haha! I do include a file-map, though there's no Moria.
So, let’s paint that picture up.
First, the things that make Pipulate so remarkably different from everything else are these:
/home/mike/repos/pipulate/assets/installer/install.sh
/home/mike/repos/pipulate/flake.nix
I document why copiously elsewhere; suffice it to say I lead with Nix being easy and transparent. All you need do initially is install Nix itself with this instruction:
curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
…and then Pipulate itself is easy to install, which more or less violates everything the AIs have been trained on and led to believe about Nix being difficult. A self-updating system is bootstrapped onto machines that initially lack everything normally required for this kind of awesomeness.
The next thing of note is really documented inside the flake.nix itself: the fact that JupyterLab is installed locally using a classic Python .venv/ virtual environment. While this violates the immutability tenet of Nix, it does so to create the kind of ad hoc pip-installable environment Data Scientists and SEOs expect. So it gets the best of both worlds, and then makes this virtual environment the bridge between Web Apps and Notebooks. If you pip install something for a Notebook, it also becomes available to the web app, though you'll want to bake it in for reproducibility. A quick sanity check of that bridge is sketched below.
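Assuming the conventional repo-root .venv/ location (an assumption about the path, not something verified line-by-line here), both processes should report the same interpreter:

import sys

# Run this in a JupyterLab cell, then again inside the FastHTML app:
# both should point into the same repo-local .venv/.
print(sys.executable)   # e.g., /home/mike/repos/pipulate/.venv/bin/python
print(sys.prefix)       # the shared virtual environment root

Which leads us to these files: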
/home/mike/repos/pipulate/.gitignore
/home/mike/repos/pipulate/requirements.in
…which tell a lot about how certain freedoms are given in the Notebooks/ folder so that changes to the provided Notebook .ipynb files do not block the git pull-based auto-updating system. I also very carefully wrap these files into the greater story:
AI_PHOOEY_CHOP = """\
/home/mike/repos/pipulate/assets/installer/install.sh
/home/mike/repos/pipulate/flake.nix
/home/mike/repos/pipulate/.gitignore
/home/mike/repos/pipulate/requirements.in
/home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md
/home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md
/home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md
"""
…so that we're building up a complete story of the Pipulate system and the particular thing we're going to be asking the AI for help with. But remember how I said Adam and Jamie used 1,100 paintball guns for one shot? We're not quite done yet, and some of the files remaining are real whoppers. I deliberately put them into the middle because they're both big and don't really need the focus and attention. The AI simply needs to understand the concept of the Wizard and the Wand. The Wizard is the Flask-like app instance, but running under the Uvicorn ASGI web server with the Python Starlette library, like FastAPI but most decidedly NOT FastAPI, as Pipulate uses FastHTML, which uses HTMX and is a massive reason for the minuscule footprint. Pipulate is not held hostage by the JSON industrial complex. There is no comically large stack of frameworks, transpilers, bundlers, and containerization layers of the so-called full web stack with Pipulate. There is no opacity. Everything is as radically tiny and transparent as it can currently be made within the skillset of its creator, and over time it will become more so. Running python server.py creates the app instance of the Wizard wielding the wand.
The Wand is pipulate/core.py, which lets the spells be cast. There is a lot of Python weirdness here, so a bunch of __init__.py files are also included. If core.py is the wand, then __init__.py is the chalk on the Wizard's hands to wield it better; a minimal sketch of the Wizard side follows the file list below. And that inflates our story out to this:
AI_PHOOEY_CHOP = """\
/home/mike/repos/pipulate/assets/installer/install.sh
/home/mike/repos/pipulate/flake.nix
/home/mike/repos/pipulate/.gitignore
/home/mike/repos/pipulate/requirements.in
/home/mike/repos/pipulate/config.py
/home/mike/repos/pipulate/server.py
/home/mike/repos/pipulate/pipulate/__init__.py
/home/mike/repos/pipulate/pipulate/pipulate.py
/home/mike/repos/pipulate/pipulate/core.py
/home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md
/home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md
/home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md
"""
BUT WAIT! There's MORE! as Billy Mays would say. There's actually a whole lot more, and the trick is to avoid putting in stuff that's overly distracting. Even though some of the stuff we just dropped in is pretty friggin' big, we're also putting it in the middle and telling Gemini it can more or less ignore server.py and core.py except for the occasional quick glance when it realizes that's where some missing pieces it needs are located. But the thing we're getting help with in this session is almost outside the normal system. It's a superimposed acetate layer of phantom-user testing. It's a bit whack-a-doo, but that's why I'm taking the trouble of building up the entire stand-alone picture. From zero to deep grokking in one text-drop. Gemini will be able to tell YOU, the article-reading audience, all about the desert-kite effectiveness of this approach after we get the first-pass implementation plan, because having all this wonderful context so recently written, we'll be in a great place to shake the trees for follow-up meaning-ascribing articles.
The Phantom-User Testing Overlay
Anyhoo, we’re not there yet. We still have a critical piece or two to paint. And what’s that? It’s the phantom-player-piano, of course! And that’s found in here:
/home/mike/repos/pipulate/assets/pipulate-init.js
…while the sheet-music is found here:
/home/mike/repos/pipulate/assets/scenarios/hello_workflow_test.json
And the symphony I intuitively vibe-wrote (putting the cart before the horse), from which I'm extracting a lot of the phantom kata moves, is here:
/home/mike/repos/pipulate/assets/scenarios/introduction.json
And because it’s player piano sheet music being played as an acetate overlay over something, here’s that something:
/home/mike/repos/pipulate/apps/040_hello_workflow.py
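For readers who haven't opened the scenario files: a Ghost Driver step presumably pairs an action against a DOM target with an expected result. The shape below is purely hypothetical, every field name included; the real schema lives in hello_workflow_test.json above:

# Purely hypothetical shape of a scenario step, for illustration only.
step = {
    "action": "fill",              # hypothetical action name
    "selector": "#name-input",     # hypothetical DOM selector
    "value": "Chip O'Theseus",
    "expect": "Hello Chip O'Theseus!",  # what the verifier scans msg-list for
}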
And because it’s all pretty germane to this discussion, we put it at the end of the list so it’s right above where this article super-prompt is going to wind up:
AI_PHOOEY_CHOP = """\
/home/mike/repos/pipulate/assets/installer/install.sh
/home/mike/repos/pipulate/flake.nix
/home/mike/repos/pipulate/.gitignore
/home/mike/repos/pipulate/requirements.in
/home/mike/repos/pipulate/config.py
/home/mike/repos/pipulate/server.py
/home/mike/repos/pipulate/pipulate/__init__.py
/home/mike/repos/pipulate/pipulate/pipulate.py
/home/mike/repos/pipulate/pipulate/core.py
/home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md
/home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md
/home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md
/home/mike/repos/pipulate/apps/040_hello_workflow.py
/home/mike/repos/pipulate/assets/pipulate-init.js
/home/mike/repos/pipulate/assets/scenarios/introduction.json
/home/mike/repos/pipulate/assets/scenarios/hello_workflow_test.json
"""
This is all like a big Aha! Eureka reveal. It's going to happen so fast in the cascading synapses of Gemini's neural network it's ridiculous, but it's still a real enough thing that I smile to myself knowing the series of popping little realizations I'm setting off in its machine noggin, and how anti-pattern and counter-intuitive (against its existing learned weightings) so much of it is.
And now prepped with all that, we can get on with the request for the implementation instructions.
The test system orchestrated by pipulate-init.js and defined by hello_workflow_test.json writes to the DOM element msg-list out of order. In particular, these lines get written in the course of the test:
✅ Verified: Resulting Greeting: Hello Chip O’Theseus!
✅ Verified: Initialized workflow with Key: Default_Profile-hello-15
🔄 Workflow ID: Default_Profile-hello-15
🔑 Return later by selecting 'Default_Profile-hello-15' from the dropdown.
📝 Please complete each step in sequence. Your progress will be saved automatically.
📝 Showing name input form. No name has been entered yet.
📝 Step 1: Please enter Your Name.
Workflows are Notebooks without having to look at the code. Let's collect some data...
🎯 Name saved: Chip O'Theseus
📝 Step 2: Please enter Hello Message.
That's it! Workflows just collect data — walking you from one Step to the Next Step ▸
✍️ Ghost Driver: Filled name field with: UNKNOWN
✅ Verified: Resulting Greeting: Hello Chip O'Theseus!
These 3 lines:
🎯 Name saved: Chip O'Theseus
📝 Step 2: Please enter Hello Message.
That's it! Workflows just collect data — walking you from one Step to the Next Step ▸
…get written ABOVE this line:
✍️ Ghost Driver: Filled name field with: UNKNOWN
…after it was written. By design, things should only ever be able to appear in order because of the message queue.
There are other subtle things out of order, but instead of documenting all of them I am including the context so you can see the queue that's supposed to manage this, which I think is found within the Pipulate class.
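For clarity on what "proper use of the queue" means here: if every writer awaits its turn on a single queue with a single consumer, ordering is guaranteed; the bug pattern is some code path appending to msg-list directly, bypassing the queue. A minimal sketch of the discipline in generic asyncio, not Pipulate's actual implementation:

import asyncio

async def consumer(queue: asyncio.Queue):
    # A single consumer drains messages strictly in enqueue order.
    while True:
        msg = await queue.get()
        if msg is None:
            break
        print(msg)  # stand-in for appending to the msg-list DOM element
        queue.task_done()

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    # Every producer MUST go through the queue; writing to the sink
    # directly from anywhere else is exactly what reorders messages.
    for msg in ["✍️ Filled name field", "🎯 Name saved", "📝 Step 2"]:
        await queue.put(msg)
    await queue.put(None)
    await consumer(queue)

asyncio.run(main())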
I hope I included everything you need to see for the context of hunting down the message queue and making proper use of it. If not, please let me know during the verification process before diving deep. If you have everything you need, please provide me the implementation plan. Know that I can take whatever you respond with and feed it as instructions to Gemini CLI, which has full access to everything in the pipulate repo and all the normal code-assistant tool-calling abilities. Please and thank you!
I write this article out to prompt.md, which a file called prompt_foo.py uses, and then I invoke the Prompt Fu procedure with the -l [:] parameter that gives the full list of articles.
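For reference, the bracket syntax is plain Python slicing applied to the chronologically sorted article list, mirroring parse_slice_arg() in the prompt_foo.py listing later in this entry:

articles = ["oldest", "...", "newest"]  # placeholder; really ~709 posts
print(articles[slice(None, None)])  # "-l [:]"   -> every article
print(articles[slice(-5, None)])    # "-l [-5:]" -> the five most recent (the default)
print(articles[-1])                 # "-l [-1]"  -> just the latest one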
(nix)
[mike@nixos:~/repos/pipulate]$ xv prompt.md
(nix)
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py -l [:]
--- Processing Files ---
Python file(s) detected. Generating codebase tree diagram... (2,299 tokens)
Adding narrative context from articles... (709 articles)
Python file(s) detected. Generating UML diagrams...
-> Generating for /home/mike/repos/pipulate/config.py... (skipped)
-> Generating for /home/mike/repos/pipulate/server.py... (414 tokens)
-> Generating for /home/mike/repos/pipulate/pipulate/__init__.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/pipulate.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/core.py... (2,277 tokens)
-> Generating for /home/mike/repos/pipulate/apps/040_hello_workflow.py... (251 tokens)
...UML generation complete.
--- Files Included ---
• /home/mike/repos/pipulate/assets/installer/install.sh (2,380 tokens)
• /home/mike/repos/pipulate/flake.nix (7,564 tokens)
• /home/mike/repos/pipulate/.gitignore (431 tokens)
• /home/mike/repos/pipulate/requirements.in (352 tokens)
• /home/mike/repos/pipulate/config.py (4,098 tokens)
• /home/mike/repos/pipulate/server.py (53,947 tokens)
• /home/mike/repos/pipulate/pipulate/__init__.py (382 tokens)
• /home/mike/repos/pipulate/pipulate/pipulate.py (511 tokens)
• /home/mike/repos/pipulate/pipulate/core.py (22,487 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md (18,974 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md (13,032 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md (30,252 tokens)
• /home/mike/repos/pipulate/apps/040_hello_workflow.py (7,758 tokens)
• /home/mike/repos/pipulate/assets/pipulate-init.js (26,936 tokens)
• /home/mike/repos/pipulate/assets/scenarios/introduction.json (2,443 tokens)
• /home/mike/repos/pipulate/assets/scenarios/hello_workflow_test.json (716 tokens)
--- Auto-Context Included ---
• Codebase Structure (eza --tree) (2,299 tokens)
• Recent Narrative Context (56,538 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/server.py) (414 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/pipulate/core.py) (2,277 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/apps/040_hello_workflow.py) (251 tokens)
--- Prompt Summary ---
Summed Tokens: 258,325 (from section parts)
Verified Tokens: 259,593 (from final output)
(Difference: +1,268)
Total Words: 114,666 (content only)
--- Size Perspective ---
📚 Equivalent in length to a **Long Novel** (Note: With a token/word ratio of 2.26, this content is far denser and more complex than typical prose of this length).
Markdown output copied to clipboard
(nix)
[mike@nixos:~/repos/pipulate]$
Alright, I've got a problem here completely different from what I set out with in this article, and I need to fix it first. While the above reports 259,593 tokens, which is usually fine for pasting into a Chrome web form, it's truncating here. And as I cut back the included files, it continues to unexpectedly truncate, which has me thinking it's almost certainly characters or Unicode-style bytes, not tokens, that set the limit of what can be pasted into a single Chrome web form. I'm going to need a way to measure what that limit is, and also to have prompt_foo.py report the output size in that same unit.
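The character-versus-byte distinction matters because Python's len() counts Unicode code points while a browser limit could just as easily be enforced in UTF-8 bytes; emoji-heavy output makes the two diverge. A quick sanity check:

s = "plain ASCII"
print(len(s), len(s.encode("utf-8")))  # 11 11: identical for ASCII
s = "✅ Verified: Hello Chip O'Theseus! 🎯"
print(len(s), len(s.encode("utf-8")))  # byte count exceeds char count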
Unveiling the Browser Context Limit
So what I'm saying is: effectively forget the above article and help me with this nested subproject first, so that I can gauge what to include before attempting the time-consuming paste only to discover it's going to get truncated. First of all, we need a Python script that generates output going beyond what will be truncated, so that when I visually see the truncation in the web form I will know the cut-off point. It's almost like the contents of what I paste should be some sort of counter or meter, so we can see exactly what it got up to right at the truncation.
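The original ruler script isn't reproduced in this entry, but the sample output below implies the approach: emit fixed-width, zero-padded counter blocks so the last legible block at the truncation point reads as a character position. A reconstruction under that assumption:

import argparse

def build_ruler(total_chars: int, block: int = 10) -> str:
    """Emit zero-padded blocks; each block's value is the character
    position at which that block ends, so any cut-off is self-reporting."""
    parts = [str(pos).zfill(block)
             for pos in range(block, total_chars + 1, block)]
    return "".join(parts)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Paste-limit ruler")
    parser.add_argument("-n", "--chars", type=int, default=2_500_000)
    args = parser.parse_args()
    print(build_ruler(args.chars), end="")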
The other part is going to be reporting the output size from prompt_foo.py in
that same unit of measure. I’ve adapted the files included in this submit
accordingly (the prompt_foo.py file instead of the ones discussed above).
Me: I copy/pasted it into my nix environment and ran it, but it happened wayyy too quick and didn't "feel" like it got cut off. Here's a sample from near the end. The browser literally times out on the paste and comes back when the paste is done.
99794000009979500000997960000099797000009979800000997990000099800000009980100000998020000099803000009980400000998050000099806000009980700000998080000099809000009981000000998110000099812000009981300000998140000099815000009981600000998170000099818000009981900000998200000099821000009982200000998230000099824000009982500000998260000099827000009982800000998290000099830000009983100000998320000099833000009983400000998350000099836000009983700000998380000099839000009984000000998410000099842000009984300000998440000099845000009984600000998470000099848000009984900000998500000099851000009985200000998530000099854000009985500000998560000099857000009985800000998590000099860000009986100000998620000099863000009986400000998650000099866000009986700000998680000099869000009987000000998710000099872000009987300000998740000099875000009987600000998770000099878000009987900000998800000099881000009988200000998830000099884000009988500000998860000099887000009988800000998890000099890000009989100000998920000099893000009989400000998950000099896000009989700000998980000099899000009990000000999010000099902000009990300000999040000099905000009990600000999070000099908000009990900000999100000099911000009991200000999130000099914000009991500000999160000099917000009991800000999190000099920000009992100000999220000099923000009992400000999250000099926000009992700000999280000099929000009993000000999310000099932000009993300000999340000099935000009993600000999370000099938000009993900000999400000099941000009994200000999430000099944000009994500000999460000099947000009994800000999490000099950000009995100000999520000099953000009995400000999550000099956000009995700000999580000099959000009996000000999610000099962000009996300000999640000099965000009996600000999670000099968000009996900000999700000099971000009997200000999730000099974000009997500000999760000099977000009997800000999790000099980000009998100000999820000099983000009998400000999850000099986000009998700000999880000099989000009999000000999910000099992000009999300000999940000099995000009999600000999970000099998000009999900001000000
The file-size of something that actually gets truncated is:
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py -l [:] -o foo.txt
--- Processing Files ---
Python file(s) detected. Generating codebase tree diagram... (2,306 tokens)
Adding narrative context from articles... (709 articles)
Python file(s) detected. Generating UML diagrams...
-> Generating for /home/mike/repos/pipulate/config.py... (skipped)
-> Generating for /home/mike/repos/pipulate/server.py... (414 tokens)
-> Generating for /home/mike/repos/pipulate/pipulate/__init__.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/pipulate.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/core.py... (2,277 tokens)
-> Generating for /home/mike/repos/pipulate/apps/040_hello_workflow.py... (251 tokens)
...UML generation complete.
--- Files Included ---
• /home/mike/repos/pipulate/assets/installer/install.sh (2,380 tokens)
• /home/mike/repos/pipulate/flake.nix (7,564 tokens)
• /home/mike/repos/pipulate/.gitignore (431 tokens)
• /home/mike/repos/pipulate/requirements.in (352 tokens)
• /home/mike/repos/pipulate/config.py (4,098 tokens)
• /home/mike/repos/pipulate/server.py (53,947 tokens)
• /home/mike/repos/pipulate/pipulate/__init__.py (382 tokens)
• /home/mike/repos/pipulate/pipulate/pipulate.py (511 tokens)
• /home/mike/repos/pipulate/pipulate/core.py (22,487 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md (18,974 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md (13,032 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md (30,252 tokens)
• /home/mike/repos/pipulate/apps/040_hello_workflow.py (7,758 tokens)
• /home/mike/repos/pipulate/assets/pipulate-init.js (26,936 tokens)
• /home/mike/repos/pipulate/assets/scenarios/introduction.json (2,443 tokens)
• /home/mike/repos/pipulate/assets/scenarios/hello_workflow_test.json (716 tokens)
--- Auto-Context Included ---
• Codebase Structure (eza --tree) (2,306 tokens)
• Recent Narrative Context (56,538 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/server.py) (414 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/pipulate/core.py) (2,277 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/apps/040_hello_workflow.py) (251 tokens)
--- Prompt Summary ---
Summed Tokens: 259,503 (from section parts)
Verified Tokens: 260,771 (from final output)
(Difference: +1,268)
Total Words: 114,668 (content only)
--- Size Perspective ---
📚 Equivalent in length to a **Long Novel** (Note: With a token/word ratio of 2.27, this content is far denser and more complex than typical prose of this length).
Output written to 'foo.txt'
Markdown output copied to clipboard
(nix)
[mike@nixos:~/repos/pipulate]$ ls -l foo.txt
-rw-r--r-- 1 mike users 1179403 Dec 8 13:57 foo.txt
(nix)
[mike@nixos:~/repos/pipulate]$
Gemini: This is a great finding. You have discovered a hard limit.
The ruler script generated 2,500,000 characters by default, but your paste ended exactly at 1,000,000.
Because 1,000,000 is a perfect multiple of the 10-character block size, it looked clean, but it was actually truncated. The browser (or the specific input field you are using) has a hard cap of 1 Million characters.
Your foo.txt is 1,179,403 bytes (approx. 1.18 million characters).
1.18M > 1.0M.
That is why your real content is getting cut off.
Here is the updated prompt_foo.py. I have added the Total Chars metric to the summary so you can check this number before copying. You need to keep “Total Chars” under 1,000,000.
Updated prompt_foo.py
Overwrite your existing /home/mike/repos/pipulate/prompt_foo.py with this content:
#!/usr/bin/env python3
# prompt_foo.py
# > We've got content, it's groovy context
# > Our concatenation just won't stop
# > Making stories art for a super-smart
# > AI-Phooey chop (Hi-Ya!)
import os
import re
import sys
import pydot
import yaml
import argparse
import tiktoken
import subprocess
import tempfile
import shutil
import json
from datetime import datetime
from typing import Dict, List, Optional, Tuple
try:
import jupytext
JUPYTEXT_AVAILABLE = True
except ImportError:
JUPYTEXT_AVAILABLE = False
def load_url_map():
"""Loads the URL mapping configuration from .config/url_map.json"""
config_path = os.path.expanduser("~/.config/articleizer/url_map.json")
if os.path.exists(config_path):
try:
with open(config_path, 'r') as f:
return json.load(f)
except json.JSONDecodeError:
print(f"Warning: Could not decode JSON from {config_path}")
return {}
def load_targets():
"""Loads publishing targets from external config."""
config_path = os.path.expanduser("~/.config/articleizer/targets.json")
if os.path.exists(config_path):
try:
with open(config_path, 'r') as f:
return json.load(f)
except json.JSONDecodeError:
print(f"Warning: Could not decode JSON from {config_path}")
return {}
URL_MAP = load_url_map()
# Hello there, AI! This is a tool for generating a single, comprehensive prompt
# from the command line, bundling codebase files and auto-generated context
# into a structured Markdown format for effective AI assistance.
# ============================================================================
# --- Configuration ---
# ============================================================================
def find_repo_root(start_path: str) -> str:
"""Find the git repository root from a starting path."""
path = os.path.abspath(start_path)
while path != os.path.dirname(path):
if os.path.isdir(os.path.join(path, '.git')):
return path
path = os.path.dirname(path)
raise FileNotFoundError("Could not find the repository root (.git directory).")
REPO_ROOT = find_repo_root(os.path.dirname(__file__))
CONFIG = {
"PROJECT_NAME": "pipulate",
"POSTS_DIRECTORY": "/home/mike/repos/MikeLev.in/_posts"
}
# ============================================================================
# --- Literary Size Scale & Token/Word Counting ---
# ============================================================================
LITERARY_SIZE_SCALE = [
(3000, "Short Essay"), (7500, "Short Story"), (20000, "Novelette"),
(50000, "Novella or a Master's Dissertation"),
(80000, "Average Paperback Novel or a Ph.D. Dissertation"),
(120000, "Long Novel"), (200000, "Epic Fantasy Novel"),
(500000, "Seriously Long Epic (like 'Infinite Jest')"),
]
def get_literary_perspective(word_count: int, token_word_ratio: float) -> str:
description = f"Longer than {LITERARY_SIZE_SCALE[-1][1]}"
for words, desc in LITERARY_SIZE_SCALE:
if word_count <= words:
description = desc
break
density_warning = ""
if token_word_ratio > 1.8:
density_warning = (
f" (Note: With a token/word ratio of {token_word_ratio:.2f}, "
f"this content is far denser and more complex than typical prose of this length)."
)
return f"📚 Equivalent in length to a **{description}**{density_warning}"
def count_tokens(text: str, model: str = "gpt-4o") -> int:
try:
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
except Exception:
return len(text.split())
def count_words(text: str) -> int:
return len(text.split())
# ============================================================================
# --- Auto-Context Generation (UML, Tree, Narrative) ---
# ============================================================================
def generate_uml_and_dot(target_file: str, project_name: str) -> Dict:
# This function remains unchanged.
pyreverse_exec = shutil.which("pyreverse")
plantuml_exec = shutil.which("plantuml")
if not pyreverse_exec or not plantuml_exec:
msg = []
if not pyreverse_exec: msg.append("`pyreverse` (from pylint)")
if not plantuml_exec: msg.append("`plantuml`")
return {"ascii_uml": f"Skipping: Required command(s) not found: {', '.join(msg)}."}
target_path = os.path.join(REPO_ROOT, target_file)
if not os.path.exists(target_path):
return {"ascii_uml": f"Skipping: Target file for UML generation not found: {target_path}"}
with tempfile.TemporaryDirectory() as temp_dir:
dot_file_path = os.path.join(temp_dir, "classes.dot")
puml_file_path = os.path.join(temp_dir, "diagram.puml")
try:
pyreverse_cmd = [pyreverse_exec, "-f", "dot", "-o", "dot", "-p", project_name, target_path]
subprocess.run(pyreverse_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
generated_dot_name = f"classes_{project_name}.dot"
os.rename(os.path.join(temp_dir, generated_dot_name), dot_file_path)
except (subprocess.CalledProcessError, FileNotFoundError) as e:
error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
return {"ascii_uml": f"Error: pyreverse failed. {error_msg}", "dot_graph": None}
try:
graphs = pydot.graph_from_dot_file(dot_file_path)
if not graphs:
return {"ascii_uml": f"Note: No classes found in {target_file} to generate a diagram.", "dot_graph": None}
graph = graphs[0]
dot_content = graph.to_string()
puml_lines = ["@startuml", "skinparam linetype ortho", ""]
def sanitize_line(line):
clean = re.sub(r'<br[^>]*>', '', line)
clean = re.sub(r'<[^>]+>', '', clean)
return clean.strip()
for node in graph.get_nodes():
label = node.get_label()
if not label: continue
parts = label.strip('<>{} ').split('|')
class_name = sanitize_line(parts[0])
puml_lines.append(f"class {class_name} {{")
if len(parts) > 1:
for attr in re.split(r'<br[^>]*>', parts[1]):
clean_attr = sanitize_line(attr).split(':')[0].strip()
if clean_attr: puml_lines.append(f" - {clean_attr}")
if len(parts) > 2:
method_block = parts[2].strip()
for method_line in re.split(r'<br[^>]*>', method_block):
clean_method = sanitize_line(method_line)
if clean_method: puml_lines.append(f" + {clean_method}")
puml_lines.append("}\n")
for edge in graph.get_edges():
source_name = edge.get_source().strip('"').split('.')[-1]
dest_name = edge.get_destination().strip('"').split('.')[-1]
puml_lines.append(f"{source_name} ..> {dest_name}")
puml_lines.append("@enduml")
with open(puml_file_path, 'w') as f: f.write('\n'.join(puml_lines))
except Exception as e:
with open(dot_file_path, 'r') as f: dot_content_on_error = f.read()
return {"ascii_uml": f"Error: DOT to PUML conversion failed. {str(e)}", "dot_graph": dot_content_on_error}
try:
plantuml_cmd = ["plantuml", "-tutxt", puml_file_path]
subprocess.run(plantuml_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
utxt_file_path = puml_file_path.replace(".puml", ".utxt")
with open(utxt_file_path, 'r') as f: ascii_uml = f.read()
lines = ascii_uml.splitlines()
non_empty_lines = [line for line in lines if line.strip()]
if non_empty_lines:
min_indent = min(len(line) - len(line.lstrip(' ')) for line in non_empty_lines)
dedented_lines = [line[min_indent:] for line in lines]
stripped_lines = [line.rstrip() for line in dedented_lines]
ascii_uml = '\n'.join(stripped_lines)
if ascii_uml: ascii_uml = '\n' + ascii_uml
except (subprocess.CalledProcessError, FileNotFoundError) as e:
error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
return {"ascii_uml": f"Error: plantuml failed. {error_msg}", "dot_graph": dot_content}
return {"ascii_uml": ascii_uml, "dot_graph": dot_content}
def _get_article_list_data(posts_dir: str = CONFIG["POSTS_DIRECTORY"]) -> List[Dict]:
posts_data = []
if not os.path.isdir(posts_dir):
print(f"Warning: Article directory not found at {posts_dir}", file=sys.stderr)
return []
# 1. Determine if the current posts_dir matches a known mapping
url_config = URL_MAP.get(posts_dir)
for filename in os.listdir(posts_dir):
if not filename.endswith((".md", ".markdown")): continue
filepath = os.path.join(posts_dir, filename)
try:
date_str = filename[:10]
post_date = datetime.strptime(date_str, '%Y-%m-%d').date()
with open(filepath, 'r', encoding='utf-8') as f: content = f.read()
if content.startswith('---'):
parts = content.split('---', 2)
front_matter = yaml.safe_load(parts[1]) or {}
# 2. Generate the URL if config exists
full_url = ""
if url_config:
slug = front_matter.get('permalink', '').strip('/')
if not slug:
# Fallback if permalink isn't in frontmatter (simple filename slug)
raw_slug = os.path.splitext(filename)[0]
# Remove date prefix if present (YYYY-MM-DD-title)
if re.match(r'\d{4}-\d{2}-\d{2}-', raw_slug):
raw_slug = raw_slug[11:]
# Apply style
style = url_config.get('permalink_style', '/:slug/')
slug_path = style.replace(':slug', raw_slug)
else:
# If permalink exists (e.g. /futureproof/my-slug/), use it
slug_path = "/" + slug.lstrip('/')
full_url = f"{url_config['base_url']}{slug_path}"
posts_data.append({
'path': filepath,
'date': post_date,
'sort_order': int(front_matter.get('sort_order', 0)),
'title': front_matter.get('title', 'Untitled'),
'summary': front_matter.get('meta_description', ''),
'url': full_url
})
except (ValueError, yaml.YAMLError, IndexError): continue
return sorted(posts_data, key=lambda p: (p['date'], p['sort_order']))
def parse_slice_arg(arg_str: str):
if not arg_str or not arg_str.startswith('[') or not arg_str.endswith(']'): return None
content = arg_str[1:-1].strip()
if ':' in content:
parts = content.split(':', 1)
start = int(parts[0].strip()) if parts[0].strip() else None
end = int(parts[1].strip()) if parts[1].strip() else None
return slice(start, end)
elif content: return int(content)
return slice(None, None)
def run_tree_command() -> str:
eza_exec = shutil.which("eza")
if not eza_exec: return "Skipping: `eza` command not found."
try:
result = subprocess.run(
[eza_exec, '--tree', '--git-ignore', '--color=never'],
capture_output=True, text=True, cwd=REPO_ROOT, check=True
)
return result.stdout
except Exception as e: return f"Error running eza command: {e}"
# ============================================================================
# --- Helper Functions (File Parsing, Clipboard) ---
# ============================================================================
def parse_file_list_from_config() -> List[Tuple[str, str]]:
try:
import foo_files
files_raw = foo_files.AI_PHOOEY_CHOP
except (ImportError, AttributeError):
print("ERROR: foo_files.py not found or doesn't contain AI_PHOOEY_CHOP.")
sys.exit(1)
lines = files_raw.strip().splitlines()
seen_files, parsed_files = set(), []
for line in lines:
line = line.strip()
if not line or line.startswith('#'): continue
parts = re.split(r'\s*<--\s*|\s*#\s*', line, 1)
file_path = parts[0].strip()
comment = parts[1].strip() if len(parts) > 1 else ""
if file_path and file_path not in seen_files:
seen_files.add(file_path)
parsed_files.append((file_path, comment))
return parsed_files
def copy_to_clipboard(text: str):
if not shutil.which('xclip'):
print("\nWarning: 'xclip' not found. Cannot copy to clipboard.")
return
try:
subprocess.run(['xclip', '-selection', 'clipboard'], input=text.encode('utf-8'), check=True)
print("Markdown output copied to clipboard")
except Exception as e:
print(f"\nWarning: Could not copy to clipboard: {e}")
def check_dependencies():
"""Verifies that all required external command-line tools are installed."""
print("Checking for required external dependencies...")
dependencies = {
"pyreverse": "Provided by `pylint`. Install with: pip install pylint",
"plantuml": "A Java-based tool. See https://plantuml.com/starting",
"eza": "A modern replacement for `ls`. See https://eza.rocks/install",
"xclip": "Clipboard utility for Linux. Install with your package manager (e.g., sudo apt-get install xclip)",
}
missing = []
for tool, instructions in dependencies.items():
if not shutil.which(tool):
missing.append((tool, instructions))
if not missing:
print("✅ All dependencies found.")
else:
print("\n❌ Missing dependencies detected:")
for tool, instructions in missing:
print(f" - Command not found: `{tool}`")
print(f" ↳ {instructions}")
print("\nPlease install the missing tools and ensure they are in your system's PATH.")
sys.exit(1)
# ============================================================================
# --- Refined PromptBuilder Class ---
# ============================================================================
class PromptBuilder:
"""
Builds a complete, structured Markdown prompt with consistent START/END markers
for each section, including Manifest, Story, File Tree, UMLs, Codebase, and Prompt.
"""
def __init__(self, processed_files: List[Dict], prompt_text: str, context_only: bool = False, list_arg: Optional[str] = None):
self.processed_files = processed_files
self.prompt_text = prompt_text
self.context_only = context_only
self.list_arg = list_arg
self.auto_context = {}
self.all_sections = {}
def add_auto_context(self, title: str, content: str):
is_narrative = (title == "Recent Narrative Context")
is_article = (title == "Full Article Content")
content_is_valid = bool(content)
filter_passed = "error" not in content.lower() and "skipping" not in content.lower()
if content_is_valid and (is_narrative or is_article or filter_passed):
self.auto_context[title] = {
'content': content, 'tokens': count_tokens(content), 'words': count_words(content)
}
def _build_manifest_content(self) -> str:
lines = []
section_order = ["Story", "File Tree", "UML Diagrams", "Articles", "Codebase", "Prompt"]
for section_name in section_order:
if section_name in self.all_sections:
data = self.all_sections[section_name]
token_str = f"({data['tokens']:,} tokens)" if data['tokens'] > 0 else ""
lines.append(f"- {section_name} {token_str}")
return "\n".join(lines)
def _build_story_content(self) -> str:
title = "Recent Narrative Context"
return self.auto_context.get(title, {}).get('content', '').strip()
def _build_tree_content(self) -> str:
title = "Codebase Structure (eza --tree)"
if title in self.auto_context:
content = self.auto_context[title]['content'].strip()
return f"```text\n{content}\n```"
return ""
def _build_uml_content(self) -> str:
uml_parts = []
for title, data in self.auto_context.items():
if "UML Class Diagram" in title:
uml_parts.append(f"## {title}\n```text\n{data['content']}\n```")
return "\n\n".join(uml_parts)
def _build_articles_content(self) -> str:
title = "Full Article Content"
return self.auto_context.get(title, {}).get('content', '').strip()
def _build_codebase_content(self) -> str:
if self.context_only: return ""
if not self.processed_files: return ""
lines = []
for f in self.processed_files:
lines.append(f"--- START: {f['path']} ({f['tokens']:,} tokens) ---")
lines.append(f"```{f['lang']}:{f['path']}")
lines.append(f['content'])
lines.append("```")
lines.append(f"--- END: {f['path']} ---\n")
return "\n".join(lines).strip()
def _build_prompt_content(self) -> str:
checklist = self._generate_ai_checklist()
return f"{checklist}\n\n{self.prompt_text}"
def _generate_ai_checklist(self) -> str:
return '''# AI Self-Correction Checklist
Before addressing the user's prompt, perform the following verification steps:
1. **Review the Manifest vs. Codebase Structure:** Compare the file list in the manifest against the `eza --tree` output. Are there any obviously related, critical files missing from the manifest that would be necessary to understand the user's request? (e.g., if `core.py` is present, is `__init__.py` also present?).
2. **Analyze Import Statements:** Briefly scan the `import` statements in the provided Python files. Do they suggest dependencies on local files that were *not* included in the manifest?
3. **Check for Incompleteness:** If you determine that critical files are missing, do not proceed. Instead, your **primary task** is to inform me of the missing files and recommend adding them to `foo_files.py` to create a more complete context.
4. **Confirm Understanding:** If the context appears complete, state "Context verified." and then proceed with the user's request.
5. **Descriptions:** When describing the passion represented here, you may refer to it as a blueprint, treatise, soliloquy, methodology, philosophy or way. NEVER refer to it as a manifesto which has a negative connotation.
'''
def build_final_prompt(self) -> str:
"""Assembles all parts into the final, structured Markdown string."""
# Prepare content for all sections
story_content = self._build_story_content()
tree_content = self._build_tree_content()
uml_content = self._build_uml_content()
articles_content = self._build_articles_content()
codebase_content = self._build_codebase_content()
prompt_content = self._build_prompt_content()
# Define placeholder messages
placeholders = {
"Story": f"# Narrative context not requested. Use the -l or --list flag to include recent articles.",
"File Tree": "# File tree generation failed or was skipped.",
"UML Diagrams": "# No Python files with classes were included, or UML generation failed.",
"Articles": "# No full articles requested. Use the -a or --article flag to include full article content.",
"Codebase": ("# No files were specified for inclusion in the codebase." if not self.processed_files
else "# Running in --context-only mode. File contents are omitted."),
}
# Store final content and tokens for the manifest calculation
self.all_sections["Story"] = {'content': story_content, 'tokens': count_tokens(story_content)}
self.all_sections["File Tree"] = {'content': tree_content, 'tokens': count_tokens(tree_content)}
self.all_sections["UML Diagrams"] = {'content': uml_content, 'tokens': count_tokens(uml_content)}
self.all_sections["Articles"] = {'content': articles_content, 'tokens': count_tokens(articles_content)}
self.all_sections["Codebase"] = {'content': codebase_content, 'tokens': sum(f['tokens'] for f in self.processed_files) if not self.context_only else 0}
self.all_sections["Prompt"] = {'content': prompt_content, 'tokens': count_tokens(prompt_content)}
manifest_content = self._build_manifest_content()
self.all_sections["Manifest"] = {'content': manifest_content, 'tokens': count_tokens(manifest_content)}
# Assemble the final output string with START/END markers for all sections
parts = ["# KUNG FU PROMPT CONTEXT\n\nWhat you will find below is:\n\n- Manifest\n- Story\n- File Tree\n- UML Diagrams\n- Articles\n- Codebase\n- Prompt"]
def add_section(name, content, placeholder):
final_content = content.strip() if content and content.strip() else placeholder
parts.append(f"--- START: {name} ---\n{final_content}\n--- END: {name} ---")
add_section("Manifest", manifest_content, "# Manifest generation failed.")
story_placeholder = placeholders["Story"] if self.list_arg is None else "# No articles found for the specified slice."
add_section("Story", story_content, story_placeholder)
add_section("File Tree", tree_content, placeholders["File Tree"])
add_section("UML Diagrams", uml_content, placeholders["UML Diagrams"])
add_section("Articles", articles_content, placeholders["Articles"])
add_section("Codebase", codebase_content, placeholders["Codebase"])
add_section("Prompt", prompt_content, "# No prompt was provided.")
return "\n\n".join(parts)
def print_summary(self, verified_token_count: int):
"""Calculates and prints an accurate, comprehensive summary to the console."""
print("--- Files Included ---")
for f in self.processed_files:
if self.context_only:
print(f"• {f['path']} (content omitted)")
else:
print(f"• {f['path']} ({f['tokens']:,} tokens)")
if self.auto_context:
print("\n--- Auto-Context Included ---")
for title, data in self.auto_context.items():
print(f"• {title} ({data['tokens']:,} tokens)")
print("\n--- Prompt Summary ---")
if self.context_only:
print("NOTE: Running in --context-only mode. File contents are excluded.")
# This is the original sum of all the individual parts
total_tokens = sum(v.get('tokens', 0) for v in self.all_sections.values())
# We need a word count for the literary perspective.
# We'll base it on the "real" content, not including the prompt/checklist
total_words = 0
final_content = ""
# Reconstruct content for char/byte counting
# (We could pass this in, but accessing the dict is cleaner here)
for section, data in self.all_sections.items():
content_part = data.get('content', '')
final_content += content_part
if section != "Prompt": # Don't count the prompt/checklist in word count
total_words += count_words(content_part)
# METRICS CALCULATIONS
char_count = len(final_content)
byte_count = len(final_content.encode('utf-8'))
print(f"Summed Tokens: {total_tokens:,} (from section parts)")
print(f"Verified Tokens: {verified_token_count:,} (from final output)")
if total_tokens != verified_token_count:
diff = verified_token_count - total_tokens
print(f" (Difference: {diff:+,})")
print(f"Total Words: {total_words:,} (content only)")
# NEW: Physical size metrics
print(f"Total Chars: {char_count:,}")
print(f"Total Bytes: {byte_count:,} (UTF-8)")
# Use the VERIFIED token count for the most accurate ratio
ratio = verified_token_count / total_words if total_words > 0 else 0
perspective = get_literary_perspective(total_words, ratio)
print("\n--- Size Perspective ---")
print(perspective)
print()
# ============================================================================
# --- Main Execution Logic ---
# ============================================================================
def main():
"""Main function to parse args, process files, and generate output."""
parser = argparse.ArgumentParser(description='Generate a Markdown context file for AI code assistance.')
parser.add_argument('prompt', nargs='?', default=None, help='A prompt string or path to a prompt file (e.g., prompt.md).')
parser.add_argument('-o', '--output', type=str, help='Optional: Output filename.')
parser.add_argument('--no-clipboard', action='store_true', help='Disable copying output to clipboard.')
parser.add_argument('--check-dependencies', action='store_true', help='Verify that all required external tools are installed.')
parser.add_argument('--context-only', action='store_true', help='Generate a context-only prompt without file contents.')
parser.add_argument(
'-t', '--target',
type=int,
help='Specify a target ID from targets.json to set the article source.'
)
parser.add_argument(
'-l', '--list',
nargs='?', const='[-5:]', default=None,
help='Include a list of recent articles. Optionally provide a slice, e.g., "[:]". Defaults to "[-5:]".'
)
parser.add_argument(
'-a', '--article',
nargs='?', const='[-1:]', default=None,
help='Include FULL CONTENT of recent articles. Provide a slice, e.g., "[-5:]". Defaults to "[-1:]".'
)
args = parser.parse_args()
if args.check_dependencies:
check_dependencies()
sys.exit(0)
targets = load_targets()
if args.target is not None:
target_id_str = str(args.target)
if targets and target_id_str in targets:
selected_target = targets[target_id_str]
CONFIG["POSTS_DIRECTORY"] = selected_target["path"]
print(f"🎯 Target set to: {selected_target['name']}")
else:
print(f"Error: Target ID '{args.target}' not found in configuration.", file=sys.stderr)
sys.exit(1)
# 1. Handle user prompt
prompt_content = "Please review the provided context and assist with the codebase."
if args.prompt:
if os.path.exists(args.prompt):
with open(args.prompt, 'r', encoding='utf-8') as f: prompt_content = f.read()
else:
prompt_content = args.prompt
elif os.path.exists("prompt.md"):
with open("prompt.md", 'r', encoding='utf-8') as f: prompt_content = f.read()
# 2. Process all specified files
files_to_process = parse_file_list_from_config()
processed_files_data = []
print("--- Processing Files ---")
for path, comment in files_to_process:
full_path = os.path.join(REPO_ROOT, path) if not os.path.isabs(path) else path
if not os.path.exists(full_path):
print(f"Warning: File not found and will be skipped: {full_path}")
continue
content, lang = "", "text"
ext = os.path.splitext(path)[1].lower()
if ext == '.ipynb':
if JUPYTEXT_AVAILABLE:
print(f" -> Converting notebook: {path}")
try:
notebook = jupytext.read(full_path)
content = jupytext.writes(notebook, fmt='py:percent')
lang = 'python'
except Exception as e:
content = f"# FAILED TO CONVERT NOTEBOOK: {path}\n# ERROR: {e}"
print(f"Warning: Failed to convert {path}: {e}")
else:
content = f"# SKIPPING NOTEBOOK CONVERSION: jupytext not installed for {path}"
print(f"Warning: `jupytext` library not found. Skipping conversion for {path}.")
else:
try:
with open(full_path, 'r', encoding='utf-8') as f: content = f.read()
lang_map = {'.py': 'python', '.js': 'javascript', '.html': 'html', '.css': 'css', '.md': 'markdown', '.json': 'json', '.nix': 'nix', '.sh': 'bash'}
lang = lang_map.get(ext, 'text')
except Exception as e:
print(f"ERROR: Could not read or process {full_path}: {e}")
sys.exit(1)
processed_files_data.append({
"path": path, "comment": comment, "content": content,
"tokens": count_tokens(content), "words": count_words(content), "lang": lang
})
# 3. Build the prompt and add auto-generated context
builder = PromptBuilder(processed_files_data, prompt_content, context_only=args.context_only, list_arg=args.list)
# Only generate the codebase tree if .py files are explicitly included.
# This avoids clutter when only .md, .nix, or .ipynb files are present.
include_tree = any(f['path'].endswith('.py') for f in processed_files_data)
if include_tree:
print("Python file(s) detected. Generating codebase tree diagram...", end='', flush=True)
tree_output = run_tree_command()
title = "Codebase Structure (eza --tree)"
builder.add_auto_context(title, tree_output)
print(f" ({builder.auto_context.get(title, {}).get('tokens', 0):,} tokens)")
else:
print("Skipping codebase tree (no .py files included).")
if args.list is not None:
print("Adding narrative context from articles...", end='', flush=True)
# Fix: Explicitly pass the updated directory from CONFIG
all_articles = _get_article_list_data(CONFIG["POSTS_DIRECTORY"])
sliced_articles = []
try:
slice_or_index = parse_slice_arg(args.list)
if isinstance(slice_or_index, int): sliced_articles = [all_articles[slice_or_index]]
elif isinstance(slice_or_index, slice): sliced_articles = all_articles[slice_or_index]
except (ValueError, IndexError):
print(f" (invalid slice '{args.list}')")
sliced_articles = []
if sliced_articles:
narrative_content = ""
for article in sliced_articles:
narrative_content += f"### {article['title']} ({article['date']})\n"
if article.get('url'):
narrative_content += f"> **URL:** {article['url']}\n"
else:
narrative_content += f"> **Path:** {article['path']}\n"
narrative_content += f"> {article['summary']}\n\n"
builder.add_auto_context("Recent Narrative Context", narrative_content.strip())
print(f" ({len(sliced_articles)} articles)")
else:
print(" (no articles found or invalid slice)")
if args.article is not None:
print("Adding full article content...", end='', flush=True)
# Fix: Explicitly pass the updated directory from CONFIG
all_articles = _get_article_list_data(CONFIG["POSTS_DIRECTORY"])
sliced_articles = []
try:
slice_or_index = parse_slice_arg(args.article)
if isinstance(slice_or_index, int):
sliced_articles = [all_articles[slice_or_index]]
elif isinstance(slice_or_index, slice):
sliced_articles = all_articles[slice_or_index]
except (ValueError, IndexError):
print(f" (invalid slice '{args.article}')")
if sliced_articles:
full_content_parts = []
for article in sliced_articles:
try:
with open(article['path'], 'r', encoding='utf-8') as f:
content = f.read()
# Add a header for each article to separate them clearly
full_content_parts.append(f"--- START: Article: {os.path.basename(article['path'])} ---\n{content.strip()}\n--- END: Article ---\n")
except Exception as e:
print(f"\nWarning: Could not read article {article['path']}: {e}")
if full_content_parts:
full_article_content = "\n".join(full_content_parts)
builder.add_auto_context("Full Article Content", full_article_content)
print(f" ({len(sliced_articles)} full articles)")
else:
print(" (no articles found or invalid slice)")
python_files_to_diagram = [f['path'] for f in processed_files_data if f['path'].endswith('.py')]
if python_files_to_diagram:
print("Python file(s) detected. Generating UML diagrams...")
for py_file_path in python_files_to_diagram:
print(f" -> Generating for {py_file_path}...", end='', flush=True)
uml_context = generate_uml_and_dot(py_file_path, CONFIG["PROJECT_NAME"])
uml_content = uml_context.get("ascii_uml")
title = f"UML Class Diagram (ASCII for {py_file_path})"
builder.add_auto_context(title, uml_content)
if title in builder.auto_context:
print(f" ({builder.auto_context[title]['tokens']:,} tokens)")
elif uml_content and "note: no classes" in uml_content.lower():
print(" (skipped, no classes)")
else:
print(" (skipped)")
print("...UML generation complete.\n")
# 4. Generate final output and print summary
final_output = builder.build_final_prompt()
# Get the "ground truth" token count from the final, fully-formed string
verified_token_count = count_tokens(final_output)
# Pass this verified count to the summary printer for comparison
builder.print_summary(verified_token_count)
# 5. Handle output
if args.output:
with open(args.output, 'w', encoding='utf-8') as f: f.write(final_output)
print(f"Output written to '{args.output}'")
if not args.no_clipboard:
copy_to_clipboard(final_output)
if __name__ == "__main__":
main()
Me: Okay, here’s the diff. This is how we defend against generative drift:
(nix)
[mike@nixos:~/repos/pipulate]$ git --no-pager diff -- prompt_foo.py
(nix)
[mike@nixos:~/repos/pipulate]$ xv prompt_foo.py
(nix)
[mike@nixos:~/repos/pipulate]$ git --no-pager diff -- prompt_foo.py
diff --git a/prompt_foo.py b/prompt_foo.py
index 3be6fa81..e85381af 100644
--- a/prompt_foo.py
+++ b/prompt_foo.py
@@ -427,7 +427,7 @@ Before addressing the user's prompt, perform the following verification steps:
"UML Diagrams": "# No Python files with classes were included, or UML generation failed.",
"Articles": "# No full articles requested. Use the -a or --article flag to include full article content.",
"Codebase": ("# No files were specified for inclusion in the codebase." if not self.processed_files
- else "# Running in --context-only mode. File contents are omitted."),
+ else "# Running in --context-only mode. File contents are omitted."),
}
# Store final content and tokens for the manifest calculation
@@ -459,7 +459,6 @@ Before addressing the user's prompt, perform the following verification steps:
return "\n\n".join(parts)
- ### MODIFIED ###
def print_summary(self, verified_token_count: int):
"""Calculates and prints an accurate, comprehensive summary to the console."""
print("--- Files Included ---")
@@ -484,17 +483,32 @@ Before addressing the user's prompt, perform the following verification steps:
# We need a word count for the literary perspective.
# We'll base it on the "real" content, not including the prompt/checklist
total_words = 0
+ final_content = ""
+
+ # Reconstruct content for char/byte counting
+ # (We could pass this in, but accessing the dict is cleaner here)
for section, data in self.all_sections.items():
+ content_part = data.get('content', '')
+ final_content += content_part
if section != "Prompt": # Don't count the prompt/checklist in word count
- total_words += count_words(data.get('content', ''))
+ total_words += count_words(content_part)
+
+ # METRICS CALCULATIONS
+ char_count = len(final_content)
+ byte_count = len(final_content.encode('utf-8'))
print(f"Summed Tokens: {total_tokens:,} (from section parts)")
print(f"Verified Tokens: {verified_token_count:,} (from final output)")
if total_tokens != verified_token_count:
diff = verified_token_count - total_tokens
print(f" (Difference: {diff:+,})")
+
print(f"Total Words: {total_words:,} (content only)")
+ # NEW: Physical size metrics
+ print(f"Total Chars: {char_count:,}")
+ print(f"Total Bytes: {byte_count:,} (UTF-8)")
+
# Use the VERIFIED token count for the most accurate ratio
ratio = verified_token_count / total_words if total_words > 0 else 0
perspective = get_literary_perspective(total_words, ratio)
@@ -690,13 +704,11 @@ def main():
# 4. Generate final output and print summary
final_output = builder.build_final_prompt()
- ### MODIFIED ###
# Get the "ground truth" token count from the final, fully-formed string
verified_token_count = count_tokens(final_output)
# Pass this verified count to the summary printer for comparison
builder.print_summary(verified_token_count)
- ### END MODIFIED ###
# 5. Handle output
if args.output:
@@ -706,4 +718,4 @@ def main():
copy_to_clipboard(final_output)
if __name__ == "__main__":
- main()
+ main()
\ No newline at end of file
(nix)
[mike@nixos:~/repos/pipulate]$
Me: Wow, and it ran! That was pretty amazing:
(nix)
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py -l [:] -o foo.txt
--- Processing Files ---
Python file(s) detected. Generating codebase tree diagram... (2,306 tokens)
Adding narrative context from articles... (709 articles)
Python file(s) detected. Generating UML diagrams...
-> Generating for /home/mike/repos/pipulate/config.py... (skipped)
-> Generating for /home/mike/repos/pipulate/server.py... (414 tokens)
-> Generating for /home/mike/repos/pipulate/pipulate/__init__.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/pipulate.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/core.py... (2,277 tokens)
-> Generating for /home/mike/repos/pipulate/apps/040_hello_workflow.py... (251 tokens)
...UML generation complete.
--- Files Included ---
• /home/mike/repos/pipulate/assets/installer/install.sh (2,380 tokens)
• /home/mike/repos/pipulate/flake.nix (7,564 tokens)
• /home/mike/repos/pipulate/.gitignore (431 tokens)
• /home/mike/repos/pipulate/requirements.in (352 tokens)
• /home/mike/repos/pipulate/config.py (4,098 tokens)
• /home/mike/repos/pipulate/server.py (53,947 tokens)
• /home/mike/repos/pipulate/pipulate/__init__.py (382 tokens)
• /home/mike/repos/pipulate/pipulate/pipulate.py (511 tokens)
• /home/mike/repos/pipulate/pipulate/core.py (22,487 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md (18,974 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md (13,032 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md (30,252 tokens)
• /home/mike/repos/pipulate/apps/040_hello_workflow.py (7,758 tokens)
• /home/mike/repos/pipulate/assets/pipulate-init.js (26,936 tokens)
• /home/mike/repos/pipulate/assets/scenarios/introduction.json (2,443 tokens)
• /home/mike/repos/pipulate/assets/scenarios/hello_workflow_test.json (716 tokens)
--- Auto-Context Included ---
• Codebase Structure (eza --tree) (2,306 tokens)
• Recent Narrative Context (56,538 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/server.py) (414 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/pipulate/core.py) (2,277 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/apps/040_hello_workflow.py) (251 tokens)
--- Prompt Summary ---
Summed Tokens: 259,503 (from section parts)
Verified Tokens: 260,771 (from final output)
(Difference: +1,268)
Total Words: 114,668 (content only)
Total Chars: 1,166,350
Total Bytes: 1,178,845 (UTF-8)
--- Size Perspective ---
📚 Equivalent in length to a **Long Novel** (Note: With a token/word ratio of 2.27, this content is far denser and more complex than typical prose of this length).
Output written to 'foo.txt'
Markdown output copied to clipboard
(nix)
[mike@nixos:~/repos/pipulate]$
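Worth an aside: the +1,268 gap between Summed and Verified tokens is expected, not a bug. The section sums don't account for the final document's scaffolding (the preamble and the `--- START:`/`--- END:` markers added during assembly), and BPE token counts aren't strictly additive across concatenation anyway. A minimal sketch of that second effect, assuming tiktoken is installed:

```python
import tiktoken

# Token counts are not additive: the joined string pays for the seam
# characters, and BPE merges can differ at the boundary.
enc = tiktoken.encoding_for_model("gpt-4o")
a = "--- START: Codebase ---"
b = "def hello():\n    print('hi')"

summed = len(enc.encode(a)) + len(enc.encode(b))
joined = len(enc.encode(a + "\n\n" + b))
print(f"summed={summed}, joined={joined}")  # joined also pays for the "\n\n" seam
```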
That’s a huge testament to Gemini’s ability to reuse code from the prior chat prompt: I didn’t include prompt_foo.py in the turn where it responded with the updated code, and it still produced a correct revision. That’s pretty amazing for the current state of affairs, so kudos to you Gemini and Google on that.
Now I am including it again in this submit because I would like a byte count for each file or component as it goes, so I have all the information I need to identify which components (lowest value per largest size) to drop when I find myself over the limit.
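For clarity, the per-component “price” is just two expressions over each content string; a minimal sketch of the idea (not the final implementation):

```python
# Minimal sketch: per-component "line-item pricing" for the prompt budget.
# Chars and UTF-8 bytes diverge whenever non-ASCII content (emoji, the
# box-drawing characters in UML output) appears in a component.
def component_cost(content: str) -> tuple[int, int]:
    return len(content), len(content.encode('utf-8'))

chars, size = component_cost("📚 Equivalent in length to a **Long Novel**")
print(f"{chars:,} chars | {size:,} bytes")  # bytes exceed chars: the emoji alone is 4 bytes
```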
Gemini: Context verified. You are exactly right: to effectively manage the “budget” of that 1,000,000 character limit, you need line-item pricing, not just the final bill.
Here is the updated prompt_foo.py. I have modified the print_summary method to calculate and display the UTF-8 Byte Count for every individual file and auto-context component. This will let you instantly spot the “heavy” files (like server.py or large HTML dumps) that provide low value relative to their cost.
Updated prompt_foo.py
Overwrite /home/mike/repos/pipulate/prompt_foo.py with this version:
#!/usr/bin/env python3
# prompt_foo.py
# > We've got content, it's groovy context
# > Our concatenation just won't stop
# > Making stories art for a super-smart
# > AI-Phooey chop (Hi-Ya!)
import os
import re
import sys
import pydot
import yaml
import argparse
import tiktoken
import subprocess
import tempfile
import shutil
import json
from datetime import datetime
from typing import Dict, List, Optional, Tuple
try:
import jupytext
JUPYTEXT_AVAILABLE = True
except ImportError:
JUPYTEXT_AVAILABLE = False
def load_url_map():
"""Loads the URL mapping configuration from .config/url_map.json"""
config_path = os.path.expanduser("~/.config/articleizer/url_map.json")
if os.path.exists(config_path):
try:
with open(config_path, 'r') as f:
return json.load(f)
except json.JSONDecodeError:
print(f"Warning: Could not decode JSON from {config_path}")
return {}
def load_targets():
"""Loads publishing targets from external config."""
config_path = os.path.expanduser("~/.config/articleizer/targets.json")
if os.path.exists(config_path):
try:
with open(config_path, 'r') as f:
return json.load(f)
except json.JSONDecodeError:
print(f"Warning: Could not decode JSON from {config_path}")
return {}
URL_MAP = load_url_map()
# Hello there, AI! This is a tool for generating a single, comprehensive prompt
# from the command line, bundling codebase files and auto-generated context
# into a structured Markdown format for effective AI assistance.
# ============================================================================
# --- Configuration ---
# ============================================================================
def find_repo_root(start_path: str) -> str:
"""Find the git repository root from a starting path."""
path = os.path.abspath(start_path)
while path != os.path.dirname(path):
if os.path.isdir(os.path.join(path, '.git')):
return path
path = os.path.dirname(path)
raise FileNotFoundError("Could not find the repository root (.git directory).")
REPO_ROOT = find_repo_root(os.path.dirname(__file__))
CONFIG = {
"PROJECT_NAME": "pipulate",
"POSTS_DIRECTORY": "/home/mike/repos/MikeLev.in/_posts"
}
# ============================================================================
# --- Literary Size Scale & Token/Word Counting ---
# ============================================================================
LITERARY_SIZE_SCALE = [
(3000, "Short Essay"), (7500, "Short Story"), (20000, "Novelette"),
(50000, "Novella or a Master's Dissertation"),
(80000, "Average Paperback Novel or a Ph.D. Dissertation"),
(120000, "Long Novel"), (200000, "Epic Fantasy Novel"),
(500000, "Seriously Long Epic (like 'Infinite Jest')"),
]
def get_literary_perspective(word_count: int, token_word_ratio: float) -> str:
description = f"Longer than {LITERARY_SIZE_SCALE[-1][1]}"
for words, desc in LITERARY_SIZE_SCALE:
if word_count <= words:
description = desc
break
density_warning = ""
if token_word_ratio > 1.8:
density_warning = (
f" (Note: With a token/word ratio of {token_word_ratio:.2f}, "
f"this content is far denser and more complex than typical prose of this length)."
)
return f"📚 Equivalent in length to a **{description}**{density_warning}"
def count_tokens(text: str, model: str = "gpt-4o") -> int:
try:
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
except Exception:
return len(text.split())
def count_words(text: str) -> int:
return len(text.split())
# ============================================================================
# --- Auto-Context Generation (UML, Tree, Narrative) ---
# ============================================================================
def generate_uml_and_dot(target_file: str, project_name: str) -> Dict:
# This function remains unchanged.
pyreverse_exec = shutil.which("pyreverse")
plantuml_exec = shutil.which("plantuml")
if not pyreverse_exec or not plantuml_exec:
msg = []
if not pyreverse_exec: msg.append("`pyreverse` (from pylint)")
if not plantuml_exec: msg.append("`plantuml`")
return {"ascii_uml": f"Skipping: Required command(s) not found: {', '.join(msg)}."}
target_path = os.path.join(REPO_ROOT, target_file)
if not os.path.exists(target_path):
return {"ascii_uml": f"Skipping: Target file for UML generation not found: {target_path}"}
with tempfile.TemporaryDirectory() as temp_dir:
dot_file_path = os.path.join(temp_dir, "classes.dot")
puml_file_path = os.path.join(temp_dir, "diagram.puml")
try:
pyreverse_cmd = [pyreverse_exec, "-f", "dot", "-o", "dot", "-p", project_name, target_path]
subprocess.run(pyreverse_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
generated_dot_name = f"classes_{project_name}.dot"
os.rename(os.path.join(temp_dir, generated_dot_name), dot_file_path)
except (subprocess.CalledProcessError, FileNotFoundError) as e:
error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
return {"ascii_uml": f"Error: pyreverse failed. {error_msg}", "dot_graph": None}
try:
graphs = pydot.graph_from_dot_file(dot_file_path)
if not graphs:
return {"ascii_uml": f"Note: No classes found in {target_file} to generate a diagram.", "dot_graph": None}
graph = graphs[0]
dot_content = graph.to_string()
puml_lines = ["@startuml", "skinparam linetype ortho", ""]
def sanitize_line(line):
clean = re.sub(r'<br[^>]*>', '', line)
clean = re.sub(r'<[^>]+>', '', clean)
return clean.strip()
for node in graph.get_nodes():
label = node.get_label()
if not label: continue
parts = label.strip('<>{} ').split('|')
class_name = sanitize_line(parts[0])
puml_lines.append(f"class {class_name} {{")
if len(parts) > 1:
for attr in re.split(r'<br[^>]*>', parts[1]):
clean_attr = sanitize_line(attr).split(':')[0].strip()
if clean_attr: puml_lines.append(f" - {clean_attr}")
if len(parts) > 2:
method_block = parts[2].strip()
for method_line in re.split(r'<br[^>]*>', method_block):
clean_method = sanitize_line(method_line)
if clean_method: puml_lines.append(f" + {clean_method}")
puml_lines.append("}\n")
for edge in graph.get_edges():
source_name = edge.get_source().strip('"').split('.')[-1]
dest_name = edge.get_destination().strip('"').split('.')[-1]
puml_lines.append(f"{source_name} ..> {dest_name}")
puml_lines.append("@enduml")
with open(puml_file_path, 'w') as f: f.write('\n'.join(puml_lines))
except Exception as e:
with open(dot_file_path, 'r') as f: dot_content_on_error = f.read()
return {"ascii_uml": f"Error: DOT to PUML conversion failed. {str(e)}", "dot_graph": dot_content_on_error}
try:
plantuml_cmd = ["plantuml", "-tutxt", puml_file_path]
subprocess.run(plantuml_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
utxt_file_path = puml_file_path.replace(".puml", ".utxt")
with open(utxt_file_path, 'r') as f: ascii_uml = f.read()
lines = ascii_uml.splitlines()
non_empty_lines = [line for line in lines if line.strip()]
if non_empty_lines:
min_indent = min(len(line) - len(line.lstrip(' ')) for line in non_empty_lines)
dedented_lines = [line[min_indent:] for line in lines]
stripped_lines = [line.rstrip() for line in dedented_lines]
ascii_uml = '\n'.join(stripped_lines)
if ascii_uml: ascii_uml = '\n' + ascii_uml
except (subprocess.CalledProcessError, FileNotFoundError) as e:
error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
return {"ascii_uml": f"Error: plantuml failed. {error_msg}", "dot_graph": dot_content}
return {"ascii_uml": ascii_uml, "dot_graph": dot_content}
def _get_article_list_data(posts_dir: str = CONFIG["POSTS_DIRECTORY"]) -> List[Dict]:
posts_data = []
if not os.path.isdir(posts_dir):
print(f"Warning: Article directory not found at {posts_dir}", file=sys.stderr)
return []
# 1. Determine if the current posts_dir matches a known mapping
url_config = URL_MAP.get(posts_dir)
for filename in os.listdir(posts_dir):
if not filename.endswith((".md", ".markdown")): continue
filepath = os.path.join(posts_dir, filename)
try:
date_str = filename[:10]
post_date = datetime.strptime(date_str, '%Y-%m-%d').date()
with open(filepath, 'r', encoding='utf-8') as f: content = f.read()
if content.startswith('---'):
parts = content.split('---', 2)
front_matter = yaml.safe_load(parts[1]) or {}
# 2. Generate the URL if config exists
full_url = ""
if url_config:
slug = front_matter.get('permalink', '').strip('/')
if not slug:
# Fallback if permalink isn't in frontmatter (simple filename slug)
raw_slug = os.path.splitext(filename)[0]
# Remove date prefix if present (YYYY-MM-DD-title)
if re.match(r'\d{4}-\d{2}-\d{2}-', raw_slug):
raw_slug = raw_slug[11:]
# Apply style
style = url_config.get('permalink_style', '/:slug/')
slug_path = style.replace(':slug', raw_slug)
else:
# If permalink exists (e.g. /futureproof/my-slug/), use it
slug_path = "/" + slug.lstrip('/')
full_url = f"{url_config['base_url']}{slug_path}"
posts_data.append({
'path': filepath,
'date': post_date,
'sort_order': int(front_matter.get('sort_order', 0)),
'title': front_matter.get('title', 'Untitled'),
'summary': front_matter.get('meta_description', ''),
'url': full_url
})
except (ValueError, yaml.YAMLError, IndexError): continue
return sorted(posts_data, key=lambda p: (p['date'], p['sort_order']))
def parse_slice_arg(arg_str: str):
if not arg_str or not arg_str.startswith('[') or not arg_str.endswith(']'): return None
content = arg_str[1:-1].strip()
if ':' in content:
parts = content.split(':', 1)
start = int(parts[0].strip()) if parts[0].strip() else None
end = int(parts[1].strip()) if parts[1].strip() else None
return slice(start, end)
elif content: return int(content)
return slice(None, None)
def run_tree_command() -> str:
eza_exec = shutil.which("eza")
if not eza_exec: return "Skipping: `eza` command not found."
try:
result = subprocess.run(
[eza_exec, '--tree', '--git-ignore', '--color=never'],
capture_output=True, text=True, cwd=REPO_ROOT, check=True
)
return result.stdout
except Exception as e: return f"Error running eza command: {e}"
# ============================================================================
# --- Helper Functions (File Parsing, Clipboard) ---
# ============================================================================
def parse_file_list_from_config() -> List[Tuple[str, str]]:
try:
import foo_files
files_raw = foo_files.AI_PHOOEY_CHOP
except (ImportError, AttributeError):
print("ERROR: foo_files.py not found or doesn't contain AI_PHOOEY_CHOP.")
sys.exit(1)
lines = files_raw.strip().splitlines()
seen_files, parsed_files = set(), []
for line in lines:
line = line.strip()
if not line or line.startswith('#'): continue
parts = re.split(r'\s*<--\s*|\s*#\s*', line, 1)
file_path = parts[0].strip()
comment = parts[1].strip() if len(parts) > 1 else ""
if file_path and file_path not in seen_files:
seen_files.add(file_path)
parsed_files.append((file_path, comment))
return parsed_files
def copy_to_clipboard(text: str):
if not shutil.which('xclip'):
print("\nWarning: 'xclip' not found. Cannot copy to clipboard.")
return
try:
subprocess.run(['xclip', '-selection', 'clipboard'], input=text.encode('utf-8'), check=True)
print("Markdown output copied to clipboard")
except Exception as e:
print(f"\nWarning: Could not copy to clipboard: {e}")
def check_dependencies():
"""Verifies that all required external command-line tools are installed."""
print("Checking for required external dependencies...")
dependencies = {
"pyreverse": "Provided by `pylint`. Install with: pip install pylint",
"plantuml": "A Java-based tool. See https://plantuml.com/starting",
"eza": "A modern replacement for `ls`. See https://eza.rocks/install",
"xclip": "Clipboard utility for Linux. Install with your package manager (e.g., sudo apt-get install xclip)",
}
missing = []
for tool, instructions in dependencies.items():
if not shutil.which(tool):
missing.append((tool, instructions))
if not missing:
print("✅ All dependencies found.")
else:
print("\n❌ Missing dependencies detected:")
for tool, instructions in missing:
print(f" - Command not found: `{tool}`")
print(f" ↳ {instructions}")
print("\nPlease install the missing tools and ensure they are in your system's PATH.")
sys.exit(1)
# ============================================================================
# --- Refined PromptBuilder Class ---
# ============================================================================
class PromptBuilder:
"""
Builds a complete, structured Markdown prompt with consistent START/END markers
for each section, including Manifest, Story, File Tree, UMLs, Codebase, and Prompt.
"""
def __init__(self, processed_files: List[Dict], prompt_text: str, context_only: bool = False, list_arg: Optional[str] = None):
self.processed_files = processed_files
self.prompt_text = prompt_text
self.context_only = context_only
self.list_arg = list_arg
self.auto_context = {}
self.all_sections = {}
def add_auto_context(self, title: str, content: str):
is_narrative = (title == "Recent Narrative Context")
is_article = (title == "Full Article Content")
content_is_valid = bool(content)
filter_passed = "error" not in content.lower() and "skipping" not in content.lower()
if content_is_valid and (is_narrative or is_article or filter_passed):
self.auto_context[title] = {
'content': content, 'tokens': count_tokens(content), 'words': count_words(content)
}
def _build_manifest_content(self) -> str:
lines = []
section_order = ["Story", "File Tree", "UML Diagrams", "Articles", "Codebase", "Prompt"]
for section_name in section_order:
if section_name in self.all_sections:
data = self.all_sections[section_name]
token_str = f"({data['tokens']:,} tokens)" if data['tokens'] > 0 else ""
lines.append(f"- {section_name} {token_str}")
return "\n".join(lines)
def _build_story_content(self) -> str:
title = "Recent Narrative Context"
return self.auto_context.get(title, {}).get('content', '').strip()
def _build_tree_content(self) -> str:
title = "Codebase Structure (eza --tree)"
if title in self.auto_context:
content = self.auto_context[title]['content'].strip()
return f"```text\n{content}\n```"
return ""
def _build_uml_content(self) -> str:
uml_parts = []
for title, data in self.auto_context.items():
if "UML Class Diagram" in title:
uml_parts.append(f"## {title}\n```text\n{data['content']}\n```")
return "\n\n".join(uml_parts)
def _build_articles_content(self) -> str:
title = "Full Article Content"
return self.auto_context.get(title, {}).get('content', '').strip()
def _build_codebase_content(self) -> str:
if self.context_only: return ""
if not self.processed_files: return ""
lines = []
for f in self.processed_files:
lines.append(f"--- START: {f['path']} ({f['tokens']:,} tokens) ---")
lines.append(f"```{f['lang']}:{f['path']}")
lines.append(f['content'])
lines.append("```")
lines.append(f"--- END: {f['path']} ---\n")
return "\n".join(lines).strip()
def _build_prompt_content(self) -> str:
checklist = self._generate_ai_checklist()
return f"{checklist}\n\n{self.prompt_text}"
def _generate_ai_checklist(self) -> str:
return '''# AI Self-Correction Checklist
Before addressing the user's prompt, perform the following verification steps:
1. **Review the Manifest vs. Codebase Structure:** Compare the file list in the manifest against the `eza --tree` output. Are there any obviously related, critical files missing from the manifest that would be necessary to understand the user's request? (e.g., if `core.py` is present, is `__init__.py` also present?).
2. **Analyze Import Statements:** Briefly scan the `import` statements in the provided Python files. Do they suggest dependencies on local files that were *not* included in the manifest?
3. **Check for Incompleteness:** If you determine that critical files are missing, do not proceed. Instead, your **primary task** is to inform me of the missing files and recommend adding them to `foo_files.py` to create a more complete context.
4. **Confirm Understanding:** If the context appears complete, state "Context verified." and then proceed with the user's request.
5. **Descriptions:** When describing the passion represented here, you may refer to it as a blueprint, treatise, soliloquy, methodology, philosophy or way. NEVER refer to it as a manifesto which has a negative connotation.
'''
def build_final_prompt(self) -> str:
"""Assembles all parts into the final, structured Markdown string."""
# Prepare content for all sections
story_content = self._build_story_content()
tree_content = self._build_tree_content()
uml_content = self._build_uml_content()
articles_content = self._build_articles_content()
codebase_content = self._build_codebase_content()
prompt_content = self._build_prompt_content()
# Define placeholder messages
placeholders = {
"Story": f"# Narrative context not requested. Use the -l or --list flag to include recent articles.",
"File Tree": "# File tree generation failed or was skipped.",
"UML Diagrams": "# No Python files with classes were included, or UML generation failed.",
"Articles": "# No full articles requested. Use the -a or --article flag to include full article content.",
"Codebase": ("# No files were specified for inclusion in the codebase." if not self.processed_files
else "# Running in --context-only mode. File contents are omitted."),
}
# Store final content and tokens for the manifest calculation
self.all_sections["Story"] = {'content': story_content, 'tokens': count_tokens(story_content)}
self.all_sections["File Tree"] = {'content': tree_content, 'tokens': count_tokens(tree_content)}
self.all_sections["UML Diagrams"] = {'content': uml_content, 'tokens': count_tokens(uml_content)}
self.all_sections["Articles"] = {'content': articles_content, 'tokens': count_tokens(articles_content)}
self.all_sections["Codebase"] = {'content': codebase_content, 'tokens': sum(f['tokens'] for f in self.processed_files) if not self.context_only else 0}
self.all_sections["Prompt"] = {'content': prompt_content, 'tokens': count_tokens(prompt_content)}
manifest_content = self._build_manifest_content()
self.all_sections["Manifest"] = {'content': manifest_content, 'tokens': count_tokens(manifest_content)}
# Assemble the final output string with START/END markers for all sections
parts = ["# KUNG FU PROMPT CONTEXT\n\nWhat you will find below is:\n\n- Manifest\n- Story\n- File Tree\n- UML Diagrams\n- Articles\n- Codebase\n- Prompt"]
def add_section(name, content, placeholder):
final_content = content.strip() if content and content.strip() else placeholder
parts.append(f"--- START: {name} ---\n{final_content}\n--- END: {name} ---")
add_section("Manifest", manifest_content, "# Manifest generation failed.")
story_placeholder = placeholders["Story"] if self.list_arg is None else "# No articles found for the specified slice."
add_section("Story", story_content, story_placeholder)
add_section("File Tree", tree_content, placeholders["File Tree"])
add_section("UML Diagrams", uml_content, placeholders["UML Diagrams"])
add_section("Articles", articles_content, placeholders["Articles"])
add_section("Codebase", codebase_content, placeholders["Codebase"])
add_section("Prompt", prompt_content, "# No prompt was provided.")
return "\n\n".join(parts)
def print_summary(self, verified_token_count: int):
"""Calculates and prints an accurate, comprehensive summary to the console."""
print("--- Files Included ---")
for f in self.processed_files:
if self.context_only:
print(f"• {f['path']} (content omitted)")
else:
byte_len = len(f['content'].encode('utf-8'))
print(f"• {f['path']} ({f['tokens']:,} tokens | {byte_len:,} bytes)")
if self.auto_context:
print("\n--- Auto-Context Included ---")
for title, data in self.auto_context.items():
byte_len = len(data['content'].encode('utf-8'))
print(f"• {title} ({data['tokens']:,} tokens | {byte_len:,} bytes)")
print("\n--- Prompt Summary ---")
if self.context_only:
print("NOTE: Running in --context-only mode. File contents are excluded.")
# This is the original sum of all the individual parts
total_tokens = sum(v.get('tokens', 0) for v in self.all_sections.values())
# We need a word count for the literary perspective.
# We'll base it on the "real" content, not including the prompt/checklist
total_words = 0
final_content = ""
# Reconstruct content for char/byte counting
# (We could pass this in, but accessing the dict is cleaner here)
for section, data in self.all_sections.items():
content_part = data.get('content', '')
final_content += content_part
if section != "Prompt": # Don't count the prompt/checklist in word count
total_words += count_words(content_part)
# METRICS CALCULATIONS
char_count = len(final_content)
byte_count = len(final_content.encode('utf-8'))
print(f"Summed Tokens: {total_tokens:,} (from section parts)")
print(f"Verified Tokens: {verified_token_count:,} (from final output)")
if total_tokens != verified_token_count:
diff = verified_token_count - total_tokens
print(f" (Difference: {diff:+,})")
print(f"Total Words: {total_words:,} (content only)")
# NEW: Physical size metrics
print(f"Total Chars: {char_count:,}")
print(f"Total Bytes: {byte_count:,} (UTF-8)")
# Use the VERIFIED token count for the most accurate ratio
ratio = verified_token_count / total_words if total_words > 0 else 0
perspective = get_literary_perspective(total_words, ratio)
print("\n--- Size Perspective ---")
print(perspective)
print()
# ============================================================================
# --- Main Execution Logic ---
# ============================================================================
def main():
"""Main function to parse args, process files, and generate output."""
parser = argparse.ArgumentParser(description='Generate a Markdown context file for AI code assistance.')
parser.add_argument('prompt', nargs='?', default=None, help='A prompt string or path to a prompt file (e.g., prompt.md).')
parser.add_argument('-o', '--output', type=str, help='Optional: Output filename.')
parser.add_argument('--no-clipboard', action='store_true', help='Disable copying output to clipboard.')
parser.add_argument('--check-dependencies', action='store_true', help='Verify that all required external tools are installed.')
parser.add_argument('--context-only', action='store_true', help='Generate a context-only prompt without file contents.')
parser.add_argument(
'-t', '--target',
type=int,
help='Specify a target ID from targets.json to set the article source.'
)
parser.add_argument(
'-l', '--list',
nargs='?', const='[-5:]', default=None,
help='Include a list of recent articles. Optionally provide a slice, e.g., "[:]". Defaults to "[-5:]".'
)
parser.add_argument(
'-a', '--article',
nargs='?', const='[-1:]', default=None,
help='Include FULL CONTENT of recent articles. Provide a slice, e.g., "[-5:]". Defaults to "[-1:]".'
)
args = parser.parse_args()
if args.check_dependencies:
check_dependencies()
sys.exit(0)
targets = load_targets()
if args.target is not None:
target_id_str = str(args.target)
if targets and target_id_str in targets:
selected_target = targets[target_id_str]
CONFIG["POSTS_DIRECTORY"] = selected_target["path"]
print(f"🎯 Target set to: {selected_target['name']}")
else:
print(f"Error: Target ID '{args.target}' not found in configuration.", file=sys.stderr)
sys.exit(1)
# 1. Handle user prompt
prompt_content = "Please review the provided context and assist with the codebase."
if args.prompt:
if os.path.exists(args.prompt):
with open(args.prompt, 'r', encoding='utf-8') as f: prompt_content = f.read()
else:
prompt_content = args.prompt
elif os.path.exists("prompt.md"):
with open("prompt.md", 'r', encoding='utf-8') as f: prompt_content = f.read()
# 2. Process all specified files
files_to_process = parse_file_list_from_config()
processed_files_data = []
print("--- Processing Files ---")
for path, comment in files_to_process:
full_path = os.path.join(REPO_ROOT, path) if not os.path.isabs(path) else path
if not os.path.exists(full_path):
print(f"Warning: File not found and will be skipped: {full_path}")
continue
content, lang = "", "text"
ext = os.path.splitext(path)[1].lower()
if ext == '.ipynb':
if JUPYTEXT_AVAILABLE:
print(f" -> Converting notebook: {path}")
try:
notebook = jupytext.read(full_path)
content = jupytext.writes(notebook, fmt='py:percent')
lang = 'python'
except Exception as e:
content = f"# FAILED TO CONVERT NOTEBOOK: {path}\n# ERROR: {e}"
print(f"Warning: Failed to convert {path}: {e}")
else:
content = f"# SKIPPING NOTEBOOK CONVERSION: jupytext not installed for {path}"
print(f"Warning: `jupytext` library not found. Skipping conversion for {path}.")
else:
try:
with open(full_path, 'r', encoding='utf-8') as f: content = f.read()
lang_map = {'.py': 'python', '.js': 'javascript', '.html': 'html', '.css': 'css', '.md': 'markdown', '.json': 'json', '.nix': 'nix', '.sh': 'bash'}
lang = lang_map.get(ext, 'text')
except Exception as e:
print(f"ERROR: Could not read or process {full_path}: {e}")
sys.exit(1)
processed_files_data.append({
"path": path, "comment": comment, "content": content,
"tokens": count_tokens(content), "words": count_words(content), "lang": lang
})
# 3. Build the prompt and add auto-generated context
builder = PromptBuilder(processed_files_data, prompt_content, context_only=args.context_only, list_arg=args.list)
# Only generate the codebase tree if .py files are explicitly included.
# This avoids clutter when only .md, .nix, or .ipynb files are present.
include_tree = any(f['path'].endswith('.py') for f in processed_files_data)
if include_tree:
print("Python file(s) detected. Generating codebase tree diagram...", end='', flush=True)
tree_output = run_tree_command()
title = "Codebase Structure (eza --tree)"
builder.add_auto_context(title, tree_output)
print(f" ({builder.auto_context.get(title, {}).get('tokens', 0):,} tokens)")
else:
print("Skipping codebase tree (no .py files included).")
if args.list is not None:
print("Adding narrative context from articles...", end='', flush=True)
# Fix: Explicitly pass the updated directory from CONFIG
all_articles = _get_article_list_data(CONFIG["POSTS_DIRECTORY"])
sliced_articles = []
try:
slice_or_index = parse_slice_arg(args.list)
if isinstance(slice_or_index, int): sliced_articles = [all_articles[slice_or_index]]
elif isinstance(slice_or_index, slice): sliced_articles = all_articles[slice_or_index]
except (ValueError, IndexError):
print(f" (invalid slice '{args.list}')")
sliced_articles = []
if sliced_articles:
narrative_content = ""
for article in sliced_articles:
narrative_content += f"### {article['title']} ({article['date']})\n"
if article.get('url'):
narrative_content += f"> **URL:** {article['url']}\n"
else:
narrative_content += f"> **Path:** {article['path']}\n"
narrative_content += f"> {article['summary']}\n\n"
builder.add_auto_context("Recent Narrative Context", narrative_content.strip())
print(f" ({len(sliced_articles)} articles)")
else:
print(" (no articles found or invalid slice)")
if args.article is not None:
print("Adding full article content...", end='', flush=True)
# Fix: Explicitly pass the updated directory from CONFIG
all_articles = _get_article_list_data(CONFIG["POSTS_DIRECTORY"])
sliced_articles = []
try:
slice_or_index = parse_slice_arg(args.article)
if isinstance(slice_or_index, int):
sliced_articles = [all_articles[slice_or_index]]
elif isinstance(slice_or_index, slice):
sliced_articles = all_articles[slice_or_index]
except (ValueError, IndexError):
print(f" (invalid slice '{args.article}')")
if sliced_articles:
full_content_parts = []
for article in sliced_articles:
try:
with open(article['path'], 'r', encoding='utf-8') as f:
content = f.read()
# Add a header for each article to separate them clearly
full_content_parts.append(f"--- START: Article: {os.path.basename(article['path'])} ---\n{content.strip()}\n--- END: Article ---\n")
except Exception as e:
print(f"\nWarning: Could not read article {article['path']}: {e}")
if full_content_parts:
full_article_content = "\n".join(full_content_parts)
builder.add_auto_context("Full Article Content", full_article_content)
print(f" ({len(sliced_articles)} full articles)")
else:
print(" (no articles found or invalid slice)")
python_files_to_diagram = [f['path'] for f in processed_files_data if f['path'].endswith('.py')]
if python_files_to_diagram:
print("Python file(s) detected. Generating UML diagrams...")
for py_file_path in python_files_to_diagram:
print(f" -> Generating for {py_file_path}...", end='', flush=True)
uml_context = generate_uml_and_dot(py_file_path, CONFIG["PROJECT_NAME"])
uml_content = uml_context.get("ascii_uml")
title = f"UML Class Diagram (ASCII for {py_file_path})"
builder.add_auto_context(title, uml_content)
if title in builder.auto_context:
print(f" ({builder.auto_context[title]['tokens']:,} tokens)")
elif uml_content and "note: no classes" in uml_content.lower():
print(" (skipped, no classes)")
else:
print(" (skipped)")
print("...UML generation complete.\n")
# 4. Generate final output and print summary
final_output = builder.build_final_prompt()
# Get the "ground truth" token count from the final, fully-formed string
verified_token_count = count_tokens(final_output)
# Pass this verified count to the summary printer for comparison
builder.print_summary(verified_token_count)
# 5. Handle output
if args.output:
with open(args.output, 'w', encoding='utf-8') as f: f.write(final_output)
print(f"Output written to '{args.output}'")
if not args.no_clipboard:
copy_to_clipboard(final_output)
if __name__ == "__main__":
main()
Me: And here’s the diff:
(nix)
[mike@nixos:~/repos/pipulate]$ git status
On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
clipboard_ruler.py
nothing added to commit but untracked files present (use "git add" to track)
(nix)
[mike@nixos:~/repos/pipulate]$ xv prompt_foo.py
(nix)
[mike@nixos:~/repos/pipulate]$ git --no-pager diff
diff --git a/prompt_foo.py b/prompt_foo.py
index e85381af..98ad9751 100644
--- a/prompt_foo.py
+++ b/prompt_foo.py
@@ -466,12 +466,14 @@ Before addressing the user's prompt, perform the following verification steps:
if self.context_only:
print(f"• {f['path']} (content omitted)")
else:
- print(f"• {f['path']} ({f['tokens']:,} tokens)")
+ byte_len = len(f['content'].encode('utf-8'))
+ print(f"• {f['path']} ({f['tokens']:,} tokens | {byte_len:,} bytes)")
if self.auto_context:
print("\n--- Auto-Context Included ---")
for title, data in self.auto_context.items():
- print(f"• {title} ({data['tokens']:,} tokens)")
+ byte_len = len(data['content'].encode('utf-8'))
+ print(f"• {title} ({data['tokens']:,} tokens | {byte_len:,} bytes)")
print("\n--- Prompt Summary ---")
if self.context_only:
(nix)
[mike@nixos:~/repos/pipulate]$
Okay, this worked:
(nix)
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py
--- Processing Files ---
Python file(s) detected. Generating codebase tree diagram... (2,306 tokens)
Python file(s) detected. Generating UML diagrams...
-> Generating for /home/mike/repos/pipulate/config.py... (skipped)
-> Generating for /home/mike/repos/pipulate/server.py... (414 tokens)
-> Generating for /home/mike/repos/pipulate/pipulate/__init__.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/pipulate.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/core.py... (2,277 tokens)
-> Generating for /home/mike/repos/pipulate/apps/040_hello_workflow.py... (251 tokens)
...UML generation complete.
--- Files Included ---
• /home/mike/repos/pipulate/assets/installer/install.sh (2,380 tokens | 9,589 bytes)
• /home/mike/repos/pipulate/flake.nix (7,564 tokens | 32,396 bytes)
• /home/mike/repos/pipulate/.gitignore (431 tokens | 1,575 bytes)
• /home/mike/repos/pipulate/requirements.in (352 tokens | 1,153 bytes)
• /home/mike/repos/pipulate/config.py (4,098 tokens | 15,949 bytes)
• /home/mike/repos/pipulate/server.py (53,947 tokens | 257,525 bytes)
• /home/mike/repos/pipulate/pipulate/__init__.py (382 tokens | 1,548 bytes)
• /home/mike/repos/pipulate/pipulate/pipulate.py (511 tokens | 2,256 bytes)
• /home/mike/repos/pipulate/pipulate/core.py (22,487 tokens | 109,596 bytes)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md (18,974 tokens | 76,589 bytes)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md (13,032 tokens | 55,179 bytes)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md (30,252 tokens | 123,729 bytes)
• /home/mike/repos/pipulate/apps/040_hello_workflow.py (7,758 tokens | 36,910 bytes)
• /home/mike/repos/pipulate/assets/pipulate-init.js (26,936 tokens | 127,974 bytes)
• /home/mike/repos/pipulate/assets/scenarios/introduction.json (2,443 tokens | 9,516 bytes)
• /home/mike/repos/pipulate/assets/scenarios/hello_workflow_test.json (716 tokens | 2,903 bytes)
--- Auto-Context Included ---
• Codebase Structure (eza --tree) (2,306 tokens | 7,589 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/server.py) (414 tokens | 3,481 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/pipulate/core.py) (2,277 tokens | 46,708 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/apps/040_hello_workflow.py) (251 tokens | 1,918 bytes)
--- Prompt Summary ---
Summed Tokens: 214,419 (from section parts)
Verified Tokens: 215,706 (from final output)
(Difference: +1,287)
Total Words: 87,837 (content only)
Total Chars: 986,209
Total Bytes: 998,622 (UTF-8)
--- Size Perspective ---
📚 Equivalent in length to a **Long Novel** (Note: With a token/word ratio of 2.46, this content is far denser and more complex than typical prose of this length).
Markdown output copied to clipboard
(nix)
[mike@nixos:~/repos/pipulate]$
…and didn’t get chopped off. But this:
(nix)
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py -l [:]
--- Processing Files ---
Python file(s) detected. Generating codebase tree diagram... (2,306 tokens)
Adding narrative context from articles... (709 articles)
Python file(s) detected. Generating UML diagrams...
-> Generating for /home/mike/repos/pipulate/config.py... (skipped)
-> Generating for /home/mike/repos/pipulate/server.py... (414 tokens)
-> Generating for /home/mike/repos/pipulate/pipulate/__init__.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/pipulate.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/core.py... (2,277 tokens)
-> Generating for /home/mike/repos/pipulate/apps/040_hello_workflow.py... (251 tokens)
...UML generation complete.
--- Files Included ---
• /home/mike/repos/pipulate/assets/installer/install.sh (2,380 tokens | 9,589 bytes)
• /home/mike/repos/pipulate/flake.nix (7,564 tokens | 32,396 bytes)
• /home/mike/repos/pipulate/.gitignore (431 tokens | 1,575 bytes)
• /home/mike/repos/pipulate/requirements.in (352 tokens | 1,153 bytes)
• /home/mike/repos/pipulate/config.py (4,098 tokens | 15,949 bytes)
• /home/mike/repos/pipulate/server.py (53,947 tokens | 257,525 bytes)
• /home/mike/repos/pipulate/pipulate/__init__.py (382 tokens | 1,548 bytes)
• /home/mike/repos/pipulate/pipulate/pipulate.py (511 tokens | 2,256 bytes)
• /home/mike/repos/pipulate/pipulate/core.py (22,487 tokens | 109,596 bytes)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md (18,974 tokens | 76,589 bytes)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md (13,032 tokens | 55,179 bytes)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md (30,252 tokens | 123,729 bytes)
• /home/mike/repos/pipulate/apps/040_hello_workflow.py (7,758 tokens | 36,910 bytes)
• /home/mike/repos/pipulate/assets/pipulate-init.js (26,936 tokens | 127,974 bytes)
• /home/mike/repos/pipulate/assets/scenarios/introduction.json (2,443 tokens | 9,516 bytes)
• /home/mike/repos/pipulate/assets/scenarios/hello_workflow_test.json (716 tokens | 2,903 bytes)
--- Auto-Context Included ---
• Codebase Structure (eza --tree) (2,306 tokens | 7,589 bytes)
• Recent Narrative Context (56,538 tokens | 228,769 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/server.py) (414 tokens | 3,481 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/pipulate/core.py) (2,277 tokens | 46,708 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/apps/040_hello_workflow.py) (251 tokens | 1,918 bytes)
--- Prompt Summary ---
Summed Tokens: 270,962 (from section parts)
Verified Tokens: 272,230 (from final output)
(Difference: +1,268)
Total Words: 114,668 (content only)
Total Chars: 1,214,803
Total Bytes: 1,227,406 (UTF-8)
--- Size Perspective ---
📚 Equivalent in length to a **Long Novel** (Note: With a token/word ratio of 2.37, this content is far denser and more complex than typical prose of this length).
Markdown output copied to clipboard
(nix)
[mike@nixos:~/repos/pipulate]$
…does get chopped off, and the reporting looks correct, which explains why: the first run’s 986,209 characters fit under the ~1,000,000-character limit, while this run’s 1,214,803 characters sail well past it. Again, great job! I’ll carry the actual original purpose of this article forward into the next one. Is there any final polish you think should be done to prompt_foo.py while guarding against generative regression? You’ve done a most excellent job expanding prompt_foo.py without regressing other features, and this is feeling like new state of the art.
All I’m thinking is that adding byte counts here:
Python file(s) detected. Generating codebase tree diagram... (2,306 tokens)
Adding narrative context from articles... (709 articles)
Python file(s) detected. Generating UML diagrams...
-> Generating for /home/mike/repos/pipulate/config.py... (skipped)
-> Generating for /home/mike/repos/pipulate/server.py... (414 tokens)
-> Generating for /home/mike/repos/pipulate/pipulate/__init__.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/pipulate.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/core.py... (2,277 tokens)
-> Generating for /home/mike/repos/pipulate/apps/040_hello_workflow.py... (251 tokens)
...UML generation complete.
…might be good for consistency’s sake. It’s not as though the user can opt out of the UML diagrams, but it would let them gauge the total gain or loss from dropping files from the included list, and from including or omitting the -l [:] argument. In this example, that’s what I’ll be dropping to get to a working submit for the follow-on article.
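With per-component byte counts in hand, that judgment call could even be automated. Here’s a hypothetical triage sketch; the sizes come from the -l [:] run above, and the budget is the ~1,000,000-character browser limit (bytes are an upper bound on characters, so staying under budget in bytes is safe):

```python
# Hypothetical triage: project the new total after dropping a candidate
# component, to see which single drop gets the payload under budget.
BUDGET = 1_000_000          # approximate browser prompt limit (characters)
total = 1_227_406           # total UTF-8 bytes reported by the -l [:] run

candidates = {
    "server.py": 257_525,
    "Recent Narrative Context (-l [:])": 228_769,
    "pipulate-init.js": 127_974,
}

for name, size in sorted(candidates.items(), key=lambda c: -c[1]):
    projected = total - size
    verdict = "fits" if projected <= BUDGET else "still over"
    print(f"drop {name}: {projected:,} bytes ({verdict})")
```

Dropping the narrative context alone projects to 998,637 bytes, which matches why omitting -l [:] squeaks under the limit.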
Gemini: Context verified. You are building a finely tuned instrument here. Adding “live” cost metrics to the generation log is the final piece of the puzzle; it lets you gauge the weight of optional components (like the article history or UML diagrams) before you even hit the final summary screen.
This final polish ensures consistency across the board: every time prompt_foo.py reports a token count, it will now report the byte count alongside it.
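Conceptually, the consistency rule is a single formatting convention; a hypothetical helper like the one below captures the “(tokens | bytes)” suffix, though the full script may simply inline the equivalent expressions at each print site:

```python
import tiktoken

# Hypothetical helper (illustrative only): one consistent
# "(N tokens | M bytes)" suffix for every progress and summary line.
def fmt_metrics(content: str) -> str:
    # Mirrors prompt_foo.py's count_tokens() (gpt-4o encoding via tiktoken)
    tokens = len(tiktoken.encoding_for_model("gpt-4o").encode(content))
    size = len(content.encode('utf-8'))
    return f"({tokens:,} tokens | {size:,} bytes)"

# Usage: print(f" -> Generating for {py_file_path}... {fmt_metrics(uml_content)}")
```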
Here is the final, polished version of prompt_foo.py.
Final prompt_foo.py
Overwrite /home/mike/repos/pipulate/prompt_foo.py one last time.
#!/usr/bin/env python3
# prompt_foo.py
# > We've got content, it's groovy context
# > Our concatenation just won't stop
# > Making stories art for a super-smart
# > AI-Phooey chop (Hi-Ya!)
import os
import re
import sys
import pydot
import yaml
import argparse
import tiktoken
import subprocess
import tempfile
import shutil
import json
from datetime import datetime
from typing import Dict, List, Optional, Tuple
try:
import jupytext
JUPYTEXT_AVAILABLE = True
except ImportError:
JUPYTEXT_AVAILABLE = False
def load_url_map():
"""Loads the URL mapping configuration from .config/url_map.json"""
config_path = os.path.expanduser("~/.config/articleizer/url_map.json")
if os.path.exists(config_path):
try:
with open(config_path, 'r') as f:
return json.load(f)
except json.JSONDecodeError:
print(f"Warning: Could not decode JSON from {config_path}")
return {}
def load_targets():
"""Loads publishing targets from external config."""
config_path = os.path.expanduser("~/.config/articleizer/targets.json")
if os.path.exists(config_path):
try:
with open(config_path, 'r') as f:
return json.load(f)
except json.JSONDecodeError:
print(f"Warning: Could not decode JSON from {config_path}")
return {}
URL_MAP = load_url_map()
# Hello there, AI! This is a tool for generating a single, comprehensive prompt
# from the command line, bundling codebase files and auto-generated context
# into a structured Markdown format for effective AI assistance.
# ============================================================================
# --- Configuration ---
# ============================================================================
def find_repo_root(start_path: str) -> str:
"""Find the git repository root from a starting path."""
path = os.path.abspath(start_path)
while path != os.path.dirname(path):
if os.path.isdir(os.path.join(path, '.git')):
return path
path = os.path.dirname(path)
raise FileNotFoundError("Could not find the repository root (.git directory).")
REPO_ROOT = find_repo_root(os.path.dirname(__file__))
CONFIG = {
"PROJECT_NAME": "pipulate",
"POSTS_DIRECTORY": "/home/mike/repos/MikeLev.in/_posts"
}
# ============================================================================
# --- Literary Size Scale & Token/Word Counting ---
# ============================================================================
LITERARY_SIZE_SCALE = [
(3000, "Short Essay"), (7500, "Short Story"), (20000, "Novelette"),
(50000, "Novella or a Master's Dissertation"),
(80000, "Average Paperback Novel or a Ph.D. Dissertation"),
(120000, "Long Novel"), (200000, "Epic Fantasy Novel"),
(500000, "Seriously Long Epic (like 'Infinite Jest')"),
]
def get_literary_perspective(word_count: int, token_word_ratio: float) -> str:
description = f"Longer than {LITERARY_SIZE_SCALE[-1][1]}"
for words, desc in LITERARY_SIZE_SCALE:
if word_count <= words:
description = desc
break
density_warning = ""
if token_word_ratio > 1.8:
density_warning = (
f" (Note: With a token/word ratio of {token_word_ratio:.2f}, "
f"this content is far denser and more complex than typical prose of this length)."
)
return f"📚 Equivalent in length to a **{description}**{density_warning}"
def count_tokens(text: str, model: str = "gpt-4o") -> int:
try:
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
except Exception:
return len(text.split())
def count_words(text: str) -> int:
return len(text.split())
# ============================================================================
# --- Auto-Context Generation (UML, Tree, Narrative) ---
# ============================================================================
def generate_uml_and_dot(target_file: str, project_name: str) -> Dict:
# This function remains unchanged.
pyreverse_exec = shutil.which("pyreverse")
plantuml_exec = shutil.which("plantuml")
if not pyreverse_exec or not plantuml_exec:
msg = []
if not pyreverse_exec: msg.append("`pyreverse` (from pylint)")
if not plantuml_exec: msg.append("`plantuml`")
return {"ascii_uml": f"Skipping: Required command(s) not found: {', '.join(msg)}."}
target_path = os.path.join(REPO_ROOT, target_file)
if not os.path.exists(target_path):
return {"ascii_uml": f"Skipping: Target file for UML generation not found: {target_path}"}
with tempfile.TemporaryDirectory() as temp_dir:
dot_file_path = os.path.join(temp_dir, "classes.dot")
puml_file_path = os.path.join(temp_dir, "diagram.puml")
try:
pyreverse_cmd = [pyreverse_exec, "-f", "dot", "-o", "dot", "-p", project_name, target_path]
subprocess.run(pyreverse_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
generated_dot_name = f"classes_{project_name}.dot"
os.rename(os.path.join(temp_dir, generated_dot_name), dot_file_path)
except (subprocess.CalledProcessError, FileNotFoundError) as e:
error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
return {"ascii_uml": f"Error: pyreverse failed. {error_msg}", "dot_graph": None}
try:
graphs = pydot.graph_from_dot_file(dot_file_path)
if not graphs:
return {"ascii_uml": f"Note: No classes found in {target_file} to generate a diagram.", "dot_graph": None}
graph = graphs[0]
dot_content = graph.to_string()
puml_lines = ["@startuml", "skinparam linetype ortho", ""]
def sanitize_line(line):
clean = re.sub(r'<br[^>]*>', '', line)
clean = re.sub(r'<[^>]+>', '', clean)
return clean.strip()
for node in graph.get_nodes():
label = node.get_label()
if not label: continue
parts = label.strip('<>{} ').split('|')
class_name = sanitize_line(parts[0])
puml_lines.append(f"class {class_name} {{")
if len(parts) > 1:
for attr in re.split(r'<br[^>]*>', parts[1]):
clean_attr = sanitize_line(attr).split(':')[0].strip()
if clean_attr: puml_lines.append(f" - {clean_attr}")
if len(parts) > 2:
method_block = parts[2].strip()
for method_line in re.split(r'<br[^>]*>', method_block):
clean_method = sanitize_line(method_line)
if clean_method: puml_lines.append(f" + {clean_method}")
puml_lines.append("}\n")
for edge in graph.get_edges():
source_name = edge.get_source().strip('"').split('.')[-1]
dest_name = edge.get_destination().strip('"').split('.')[-1]
puml_lines.append(f"{source_name} ..> {dest_name}")
puml_lines.append("@enduml")
with open(puml_file_path, 'w') as f: f.write('\n'.join(puml_lines))
except Exception as e:
with open(dot_file_path, 'r') as f: dot_content_on_error = f.read()
return {"ascii_uml": f"Error: DOT to PUML conversion failed. {str(e)}", "dot_graph": dot_content_on_error}
try:
plantuml_cmd = ["plantuml", "-tutxt", puml_file_path]
subprocess.run(plantuml_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
utxt_file_path = puml_file_path.replace(".puml", ".utxt")
with open(utxt_file_path, 'r') as f: ascii_uml = f.read()
lines = ascii_uml.splitlines()
non_empty_lines = [line for line in lines if line.strip()]
if non_empty_lines:
min_indent = min(len(line) - len(line.lstrip(' ')) for line in non_empty_lines)
dedented_lines = [line[min_indent:] for line in lines]
stripped_lines = [line.rstrip() for line in dedented_lines]
ascii_uml = '\n'.join(stripped_lines)
if ascii_uml: ascii_uml = '\n' + ascii_uml
except (subprocess.CalledProcessError, FileNotFoundError) as e:
error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
return {"ascii_uml": f"Error: plantuml failed. {error_msg}", "dot_graph": dot_content}
return {"ascii_uml": ascii_uml, "dot_graph": dot_content}
def _get_article_list_data(posts_dir: str = CONFIG["POSTS_DIRECTORY"]) -> List[Dict]:
posts_data = []
if not os.path.isdir(posts_dir):
print(f"Warning: Article directory not found at {posts_dir}", file=sys.stderr)
return []
# 1. Determine if the current posts_dir matches a known mapping
url_config = URL_MAP.get(posts_dir)
for filename in os.listdir(posts_dir):
if not filename.endswith((".md", ".markdown")): continue
filepath = os.path.join(posts_dir, filename)
try:
date_str = filename[:10]
post_date = datetime.strptime(date_str, '%Y-%m-%d').date()
with open(filepath, 'r', encoding='utf-8') as f: content = f.read()
if content.startswith('---'):
parts = content.split('---', 2)
front_matter = yaml.safe_load(parts[1]) or {}
# 2. Generate the URL if config exists
full_url = ""
if url_config:
slug = front_matter.get('permalink', '').strip('/')
if not slug:
# Fallback if permalink isn't in frontmatter (simple filename slug)
raw_slug = os.path.splitext(filename)[0]
# Remove date prefix if present (YYYY-MM-DD-title)
if re.match(r'\d{4}-\d{2}-\d{2}-', raw_slug):
raw_slug = raw_slug[11:]
# Apply style
style = url_config.get('permalink_style', '/:slug/')
slug_path = style.replace(':slug', raw_slug)
else:
# If permalink exists (e.g. /futureproof/my-slug/), use it
slug_path = "/" + slug.lstrip('/')
full_url = f"{url_config['base_url']}{slug_path}"
posts_data.append({
'path': filepath,
'date': post_date,
'sort_order': int(front_matter.get('sort_order', 0)),
'title': front_matter.get('title', 'Untitled'),
'summary': front_matter.get('meta_description', ''),
'url': full_url
})
except (ValueError, yaml.YAMLError, IndexError): continue
return sorted(posts_data, key=lambda p: (p['date'], p['sort_order']))
def parse_slice_arg(arg_str: str):
if not arg_str or not arg_str.startswith('[') or not arg_str.endswith(']'): return None
content = arg_str[1:-1].strip()
if ':' in content:
parts = content.split(':', 1)
start = int(parts[0].strip()) if parts[0].strip() else None
end = int(parts[1].strip()) if parts[1].strip() else None
return slice(start, end)
elif content: return int(content)
return slice(None, None)
def run_tree_command() -> str:
eza_exec = shutil.which("eza")
if not eza_exec: return "Skipping: `eza` command not found."
try:
result = subprocess.run(
[eza_exec, '--tree', '--git-ignore', '--color=never'],
capture_output=True, text=True, cwd=REPO_ROOT, check=True
)
return result.stdout
except Exception as e: return f"Error running eza command: {e}"
# ============================================================================
# --- Helper Functions (File Parsing, Clipboard) ---
# ============================================================================
def parse_file_list_from_config() -> List[Tuple[str, str]]:
try:
import foo_files
files_raw = foo_files.AI_PHOOEY_CHOP
except (ImportError, AttributeError):
print("ERROR: foo_files.py not found or doesn't contain AI_PHOOEY_CHOP.")
sys.exit(1)
lines = files_raw.strip().splitlines()
seen_files, parsed_files = set(), []
for line in lines:
line = line.strip()
if not line or line.startswith('#'): continue
parts = re.split(r'\s*<--\s*|\s*#\s*', line, 1)
file_path = parts[0].strip()
comment = parts[1].strip() if len(parts) > 1 else ""
if file_path and file_path not in seen_files:
seen_files.add(file_path)
parsed_files.append((file_path, comment))
return parsed_files
def copy_to_clipboard(text: str):
if not shutil.which('xclip'):
print("\nWarning: 'xclip' not found. Cannot copy to clipboard.")
return
try:
subprocess.run(['xclip', '-selection', 'clipboard'], input=text.encode('utf-8'), check=True)
print("Markdown output copied to clipboard")
except Exception as e:
print(f"\nWarning: Could not copy to clipboard: {e}")
def check_dependencies():
"""Verifies that all required external command-line tools are installed."""
print("Checking for required external dependencies...")
dependencies = {
"pyreverse": "Provided by `pylint`. Install with: pip install pylint",
"plantuml": "A Java-based tool. See https://plantuml.com/starting",
"eza": "A modern replacement for `ls`. See https://eza.rocks/install",
"xclip": "Clipboard utility for Linux. Install with your package manager (e.g., sudo apt-get install xclip)",
}
missing = []
for tool, instructions in dependencies.items():
if not shutil.which(tool):
missing.append((tool, instructions))
if not missing:
print("✅ All dependencies found.")
else:
print("\n❌ Missing dependencies detected:")
for tool, instructions in missing:
print(f" - Command not found: `{tool}`")
print(f" ↳ {instructions}")
print("\nPlease install the missing tools and ensure they are in your system's PATH.")
sys.exit(1)
# ============================================================================
# --- Refined PromptBuilder Class ---
# ============================================================================
class PromptBuilder:
"""
Builds a complete, structured Markdown prompt with consistent START/END markers
for each section, including Manifest, Story, File Tree, UMLs, Codebase, and Prompt.
"""
def __init__(self, processed_files: List[Dict], prompt_text: str, context_only: bool = False, list_arg: Optional[str] = None):
self.processed_files = processed_files
self.prompt_text = prompt_text
self.context_only = context_only
self.list_arg = list_arg
self.auto_context = {}
self.all_sections = {}
def add_auto_context(self, title: str, content: str):
is_narrative = (title == "Recent Narrative Context")
is_article = (title == "Full Article Content")
content_is_valid = bool(content)
filter_passed = "error" not in content.lower() and "skipping" not in content.lower()
if content_is_valid and (is_narrative or is_article or filter_passed):
self.auto_context[title] = {
'content': content, 'tokens': count_tokens(content), 'words': count_words(content)
}
def _build_manifest_content(self) -> str:
lines = []
section_order = ["Story", "File Tree", "UML Diagrams", "Articles", "Codebase", "Prompt"]
for section_name in section_order:
if section_name in self.all_sections:
data = self.all_sections[section_name]
token_str = f"({data['tokens']:,} tokens)" if data['tokens'] > 0 else ""
lines.append(f"- {section_name} {token_str}")
return "\n".join(lines)
def _build_story_content(self) -> str:
title = "Recent Narrative Context"
return self.auto_context.get(title, {}).get('content', '').strip()
def _build_tree_content(self) -> str:
title = "Codebase Structure (eza --tree)"
if title in self.auto_context:
content = self.auto_context[title]['content'].strip()
return f"```text\n{content}\n```"
return ""
def _build_uml_content(self) -> str:
uml_parts = []
for title, data in self.auto_context.items():
if "UML Class Diagram" in title:
uml_parts.append(f"## {title}\n```text\n{data['content']}\n```")
return "\n\n".join(uml_parts)
def _build_articles_content(self) -> str:
title = "Full Article Content"
return self.auto_context.get(title, {}).get('content', '').strip()
def _build_codebase_content(self) -> str:
if self.context_only: return ""
if not self.processed_files: return ""
lines = []
for f in self.processed_files:
lines.append(f"--- START: {f['path']} ({f['tokens']:,} tokens) ---")
lines.append(f"```{f['lang']}:{f['path']}")
lines.append(f['content'])
lines.append("```")
lines.append(f"--- END: {f['path']} ---\n")
return "\n".join(lines).strip()
def _build_prompt_content(self) -> str:
checklist = self._generate_ai_checklist()
return f"{checklist}\n\n{self.prompt_text}"
def _generate_ai_checklist(self) -> str:
return '''# AI Self-Correction Checklist
Before addressing the user's prompt, perform the following verification steps:
1. **Review the Manifest vs. Codebase Structure:** Compare the file list in the manifest against the `eza --tree` output. Are there any obviously related, critical files missing from the manifest that would be necessary to understand the user's request? (e.g., if `core.py` is present, is `__init__.py` also present?).
2. **Analyze Import Statements:** Briefly scan the `import` statements in the provided Python files. Do they suggest dependencies on local files that were *not* included in the manifest?
3. **Check for Incompleteness:** If you determine that critical files are missing, do not proceed. Instead, your **primary task** is to inform me of the missing files and recommend adding them to `foo_files.py` to create a more complete context.
4. **Confirm Understanding:** If the context appears complete, state "Context verified." and then proceed with the user's request.
5. **Descriptions:** When describing the passion represented here, you may refer to it as a blueprint, treatise, soliloquy, methodology, philosophy or way. NEVER refer to it as a manifesto which has a negative connotation.
'''
def build_final_prompt(self) -> str:
"""Assembles all parts into the final, structured Markdown string."""
# Prepare content for all sections
story_content = self._build_story_content()
tree_content = self._build_tree_content()
uml_content = self._build_uml_content()
articles_content = self._build_articles_content()
codebase_content = self._build_codebase_content()
prompt_content = self._build_prompt_content()
# Define placeholder messages
placeholders = {
"Story": f"# Narrative context not requested. Use the -l or --list flag to include recent articles.",
"File Tree": "# File tree generation failed or was skipped.",
"UML Diagrams": "# No Python files with classes were included, or UML generation failed.",
"Articles": "# No full articles requested. Use the -a or --article flag to include full article content.",
"Codebase": ("# No files were specified for inclusion in the codebase." if not self.processed_files
else "# Running in --context-only mode. File contents are omitted."),
}
# Store final content and tokens for the manifest calculation
self.all_sections["Story"] = {'content': story_content, 'tokens': count_tokens(story_content)}
self.all_sections["File Tree"] = {'content': tree_content, 'tokens': count_tokens(tree_content)}
self.all_sections["UML Diagrams"] = {'content': uml_content, 'tokens': count_tokens(uml_content)}
self.all_sections["Articles"] = {'content': articles_content, 'tokens': count_tokens(articles_content)}
self.all_sections["Codebase"] = {'content': codebase_content, 'tokens': sum(f['tokens'] for f in self.processed_files) if not self.context_only else 0}
self.all_sections["Prompt"] = {'content': prompt_content, 'tokens': count_tokens(prompt_content)}
manifest_content = self._build_manifest_content()
self.all_sections["Manifest"] = {'content': manifest_content, 'tokens': count_tokens(manifest_content)}
# Assemble the final output string with START/END markers for all sections
parts = ["# KUNG FU PROMPT CONTEXT\n\nWhat you will find below is:\n\n- Manifest\n- Story\n- File Tree\n- UML Diagrams\n- Articles\n- Codebase\n- Prompt"]
def add_section(name, content, placeholder):
final_content = content.strip() if content and content.strip() else placeholder
parts.append(f"--- START: {name} ---\n{final_content}\n--- END: {name} ---")
add_section("Manifest", manifest_content, "# Manifest generation failed.")
story_placeholder = placeholders["Story"] if self.list_arg is None else "# No articles found for the specified slice."
add_section("Story", story_content, story_placeholder)
add_section("File Tree", tree_content, placeholders["File Tree"])
add_section("UML Diagrams", uml_content, placeholders["UML Diagrams"])
add_section("Articles", articles_content, placeholders["Articles"])
add_section("Codebase", codebase_content, placeholders["Codebase"])
add_section("Prompt", prompt_content, "# No prompt was provided.")
return "\n\n".join(parts)
def print_summary(self, verified_token_count: int):
"""Calculates and prints an accurate, comprehensive summary to the console."""
print("--- Files Included ---")
for f in self.processed_files:
if self.context_only:
print(f"• {f['path']} (content omitted)")
else:
byte_len = len(f['content'].encode('utf-8'))
print(f"• {f['path']} ({f['tokens']:,} tokens | {byte_len:,} bytes)")
if self.auto_context:
print("\n--- Auto-Context Included ---")
for title, data in self.auto_context.items():
byte_len = len(data['content'].encode('utf-8'))
print(f"• {title} ({data['tokens']:,} tokens | {byte_len:,} bytes)")
print("\n--- Prompt Summary ---")
if self.context_only:
print("NOTE: Running in --context-only mode. File contents are excluded.")
# This is the original sum of all the individual parts
total_tokens = sum(v.get('tokens', 0) for v in self.all_sections.values())
# We need a word count for the literary perspective.
# We'll base it on the "real" content, not including the prompt/checklist
total_words = 0
final_content = ""
# Reconstruct content for char/byte counting
# (We could pass this in, but accessing the dict is cleaner here)
for section, data in self.all_sections.items():
content_part = data.get('content', '')
final_content += content_part
if section != "Prompt": # Don't count the prompt/checklist in word count
total_words += count_words(content_part)
# METRICS CALCULATIONS
char_count = len(final_content)
byte_count = len(final_content.encode('utf-8'))
print(f"Summed Tokens: {total_tokens:,} (from section parts)")
print(f"Verified Tokens: {verified_token_count:,} (from final output)")
if total_tokens != verified_token_count:
diff = verified_token_count - total_tokens
print(f" (Difference: {diff:+,})")
print(f"Total Words: {total_words:,} (content only)")
# NEW: Physical size metrics
print(f"Total Chars: {char_count:,}")
print(f"Total Bytes: {byte_count:,} (UTF-8)")
# Use the VERIFIED token count for the most accurate ratio
ratio = verified_token_count / total_words if total_words > 0 else 0
perspective = get_literary_perspective(total_words, ratio)
print("\n--- Size Perspective ---")
print(perspective)
print()
# ============================================================================
# --- Main Execution Logic ---
# ============================================================================
def main():
"""Main function to parse args, process files, and generate output."""
parser = argparse.ArgumentParser(description='Generate a Markdown context file for AI code assistance.')
parser.add_argument('prompt', nargs='?', default=None, help='A prompt string or path to a prompt file (e.g., prompt.md).')
parser.add_argument('-o', '--output', type=str, help='Optional: Output filename.')
parser.add_argument('--no-clipboard', action='store_true', help='Disable copying output to clipboard.')
parser.add_argument('--check-dependencies', action='store_true', help='Verify that all required external tools are installed.')
parser.add_argument('--context-only', action='store_true', help='Generate a context-only prompt without file contents.')
parser.add_argument(
'-t', '--target',
type=int,
help='Specify a target ID from targets.json to set the article source.'
)
parser.add_argument(
'-l', '--list',
nargs='?', const='[-5:]', default=None,
help='Include a list of recent articles. Optionally provide a slice, e.g., "[:]". Defaults to "[-5:]".'
)
parser.add_argument(
'-a', '--article',
nargs='?', const='[-1:]', default=None,
help='Include FULL CONTENT of recent articles. Provide a slice, e.g., "[-5:]". Defaults to "[-1:]".'
)
args = parser.parse_args()
if args.check_dependencies:
check_dependencies()
sys.exit(0)
targets = load_targets()
if args.target is not None:
target_id_str = str(args.target)
if targets and target_id_str in targets:
selected_target = targets[target_id_str]
CONFIG["POSTS_DIRECTORY"] = selected_target["path"]
print(f"🎯 Target set to: {selected_target['name']}")
else:
print(f"Error: Target ID '{args.target}' not found in configuration.", file=sys.stderr)
sys.exit(1)
# 1. Handle user prompt
prompt_content = "Please review the provided context and assist with the codebase."
if args.prompt:
if os.path.exists(args.prompt):
with open(args.prompt, 'r', encoding='utf-8') as f: prompt_content = f.read()
else:
prompt_content = args.prompt
elif os.path.exists("prompt.md"):
with open("prompt.md", 'r', encoding='utf-8') as f: prompt_content = f.read()
# 2. Process all specified files
files_to_process = parse_file_list_from_config()
processed_files_data = []
print("--- Processing Files ---")
for path, comment in files_to_process:
full_path = os.path.join(REPO_ROOT, path) if not os.path.isabs(path) else path
if not os.path.exists(full_path):
print(f"Warning: File not found and will be skipped: {full_path}")
continue
content, lang = "", "text"
ext = os.path.splitext(path)[1].lower()
if ext == '.ipynb':
if JUPYTEXT_AVAILABLE:
print(f" -> Converting notebook: {path}")
try:
notebook = jupytext.read(full_path)
content = jupytext.writes(notebook, fmt='py:percent')
lang = 'python'
except Exception as e:
content = f"# FAILED TO CONVERT NOTEBOOK: {path}\n# ERROR: {e}"
print(f"Warning: Failed to convert {path}: {e}")
else:
content = f"# SKIPPING NOTEBOOK CONVERSION: jupytext not installed for {path}"
print(f"Warning: `jupytext` library not found. Skipping conversion for {path}.")
else:
try:
with open(full_path, 'r', encoding='utf-8') as f: content = f.read()
lang_map = {'.py': 'python', '.js': 'javascript', '.html': 'html', '.css': 'css', '.md': 'markdown', '.json': 'json', '.nix': 'nix', '.sh': 'bash'}
lang = lang_map.get(ext, 'text')
except Exception as e:
print(f"ERROR: Could not read or process {full_path}: {e}")
sys.exit(1)
processed_files_data.append({
"path": path, "comment": comment, "content": content,
"tokens": count_tokens(content), "words": count_words(content), "lang": lang
})
# 3. Build the prompt and add auto-generated context
builder = PromptBuilder(processed_files_data, prompt_content, context_only=args.context_only, list_arg=args.list)
# Only generate the codebase tree if .py files are explicitly included.
# This avoids clutter when only .md, .nix, or .ipynb files are present.
include_tree = any(f['path'].endswith('.py') for f in processed_files_data)
if include_tree:
print("Python file(s) detected. Generating codebase tree diagram...", end='', flush=True)
tree_output = run_tree_command()
title = "Codebase Structure (eza --tree)"
builder.add_auto_context(title, tree_output)
# Calculate sizes for live display
tree_data = builder.auto_context.get(title, {})
t_count = tree_data.get('tokens', 0)
b_count = len(tree_data.get('content', '').encode('utf-8'))
print(f" ({t_count:,} tokens | {b_count:,} bytes)")
else:
print("Skipping codebase tree (no .py files included).")
if args.list is not None:
print("Adding narrative context from articles...", end='', flush=True)
# Fix: Explicitly pass the updated directory from CONFIG
all_articles = _get_article_list_data(CONFIG["POSTS_DIRECTORY"])
sliced_articles = []
try:
slice_or_index = parse_slice_arg(args.list)
if isinstance(slice_or_index, int): sliced_articles = [all_articles[slice_or_index]]
elif isinstance(slice_or_index, slice): sliced_articles = all_articles[slice_or_index]
except (ValueError, IndexError):
print(f" (invalid slice '{args.list}')")
sliced_articles = []
if sliced_articles:
narrative_content = ""
for article in sliced_articles:
narrative_content += f"### {article['title']} ({article['date']})\n"
if article.get('url'):
narrative_content += f"> **URL:** {article['url']}\n"
else:
narrative_content += f"> **Path:** {article['path']}\n"
narrative_content += f"> {article['summary']}\n\n"
title = "Recent Narrative Context"
builder.add_auto_context(title, narrative_content.strip())
# Calculate sizes for live display
narrative_data = builder.auto_context.get(title, {})
t_count = narrative_data.get('tokens', 0)
b_count = len(narrative_data.get('content', '').encode('utf-8'))
print(f" ({len(sliced_articles)} articles | {t_count:,} tokens | {b_count:,} bytes)")
else:
print(" (no articles found or invalid slice)")
if args.article is not None:
print("Adding full article content...", end='', flush=True)
# Fix: Explicitly pass the updated directory from CONFIG
all_articles = _get_article_list_data(CONFIG["POSTS_DIRECTORY"])
sliced_articles = []
try:
slice_or_index = parse_slice_arg(args.article)
if isinstance(slice_or_index, int):
sliced_articles = [all_articles[slice_or_index]]
elif isinstance(slice_or_index, slice):
sliced_articles = all_articles[slice_or_index]
except (ValueError, IndexError):
print(f" (invalid slice '{args.article}')")
if sliced_articles:
full_content_parts = []
for article in sliced_articles:
try:
with open(article['path'], 'r', encoding='utf-8') as f:
content = f.read()
# Add a header for each article to separate them clearly
full_content_parts.append(f"--- START: Article: {os.path.basename(article['path'])} ---\n{content.strip()}\n--- END: Article ---\n")
except Exception as e:
print(f"\nWarning: Could not read article {article['path']}: {e}")
if full_content_parts:
full_article_content = "\n".join(full_content_parts)
title = "Full Article Content"
builder.add_auto_context(title, full_article_content)
# Calculate sizes for live display
article_data = builder.auto_context.get(title, {})
t_count = article_data.get('tokens', 0)
b_count = len(article_data.get('content', '').encode('utf-8'))
print(f" ({len(sliced_articles)} full articles | {t_count:,} tokens | {b_count:,} bytes)")
else:
print(" (no articles found or invalid slice)")
python_files_to_diagram = [f['path'] for f in processed_files_data if f['path'].endswith('.py')]
if python_files_to_diagram:
print("Python file(s) detected. Generating UML diagrams...")
for py_file_path in python_files_to_diagram:
print(f" -> Generating for {py_file_path}...", end='', flush=True)
uml_context = generate_uml_and_dot(py_file_path, CONFIG["PROJECT_NAME"])
uml_content = uml_context.get("ascii_uml")
title = f"UML Class Diagram (ASCII for {py_file_path})"
builder.add_auto_context(title, uml_content)
if title in builder.auto_context:
uml_data = builder.auto_context[title]
b_count = len(uml_data['content'].encode('utf-8'))
print(f" ({uml_data['tokens']:,} tokens | {b_count:,} bytes)")
elif uml_content and "note: no classes" in uml_content.lower():
print(" (skipped, no classes)")
else:
print(" (skipped)")
print("...UML generation complete.\n")
# 4. Generate final output and print summary
final_output = builder.build_final_prompt()
# Get the "ground truth" token count from the final, fully-formed string
verified_token_count = count_tokens(final_output)
# Pass this verified count to the summary printer for comparison
builder.print_summary(verified_token_count)
# 5. Handle output
if args.output:
with open(args.output, 'w', encoding='utf-8') as f: f.write(final_output)
print(f"Output written to '{args.output}'")
if not args.no_clipboard:
copy_to_clipboard(final_output)
if __name__ == "__main__":
main()
And the diff:
(nix)
[mike@nixos:~/repos/pipulate]$ git status
On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
clipboard_ruler.py
nothing added to commit but untracked files present (use "git add" to track)
(nix)
[mike@nixos:~/repos/pipulate]$ xv prompt_foo.py
(nix)
[mike@nixos:~/repos/pipulate]$ git --no-pager diff
diff --git a/prompt_foo.py b/prompt_foo.py
index 98ad9751..1efa6697 100644
--- a/prompt_foo.py
+++ b/prompt_foo.py
@@ -620,7 +620,12 @@ def main():
tree_output = run_tree_command()
title = "Codebase Structure (eza --tree)"
builder.add_auto_context(title, tree_output)
- print(f" ({builder.auto_context.get(title, {}).get('tokens', 0):,} tokens)")
+
+ # Calculate sizes for live display
+ tree_data = builder.auto_context.get(title, {})
+ t_count = tree_data.get('tokens', 0)
+ b_count = len(tree_data.get('content', '').encode('utf-8'))
+ print(f" ({t_count:,} tokens | {b_count:,} bytes)")
else:
print("Skipping codebase tree (no .py files included).")
@@ -647,8 +652,14 @@ def main():
narrative_content += f"> **Path:** {article['path']}\n"
narrative_content += f"> {article['summary']}\n\n"
- builder.add_auto_context("Recent Narrative Context", narrative_content.strip())
- print(f" ({len(sliced_articles)} articles)")
+ title = "Recent Narrative Context"
+ builder.add_auto_context(title, narrative_content.strip())
+
+ # Calculate sizes for live display
+ narrative_data = builder.auto_context.get(title, {})
+ t_count = narrative_data.get('tokens', 0)
+ b_count = len(narrative_data.get('content', '').encode('utf-8'))
+ print(f" ({len(sliced_articles)} articles | {t_count:,} tokens | {b_count:,} bytes)")
else:
print(" (no articles found or invalid slice)")
@@ -679,8 +690,14 @@ def main():
if full_content_parts:
full_article_content = "\n".join(full_content_parts)
- builder.add_auto_context("Full Article Content", full_article_content)
- print(f" ({len(sliced_articles)} full articles)")
+ title = "Full Article Content"
+ builder.add_auto_context(title, full_article_content)
+
+ # Calculate sizes for live display
+ article_data = builder.auto_context.get(title, {})
+ t_count = article_data.get('tokens', 0)
+ b_count = len(article_data.get('content', '').encode('utf-8'))
+ print(f" ({len(sliced_articles)} full articles | {t_count:,} tokens | {b_count:,} bytes)")
else:
print(" (no articles found or invalid slice)")
@@ -696,7 +713,9 @@ def main():
builder.add_auto_context(title, uml_content)
if title in builder.auto_context:
- print(f" ({builder.auto_context[title]['tokens']:,} tokens)")
+ uml_data = builder.auto_context[title]
+ b_count = len(uml_data['content'].encode('utf-8'))
+ print(f" ({uml_data['tokens']:,} tokens | {b_count:,} bytes)")
elif uml_content and "note: no classes" in uml_content.lower():
print(" (skipped, no classes)")
else:
(nix)
[mike@nixos:~/repos/pipulate]$
Me: Alright, and that did it:
(nix)
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py -l [:]
--- Processing Files ---
Python file(s) detected. Generating codebase tree diagram... (2,306 tokens | 7,589 bytes)
Adding narrative context from articles... (709 articles | 56,538 tokens | 228,769 bytes)
Python file(s) detected. Generating UML diagrams...
-> Generating for /home/mike/repos/pipulate/config.py... (skipped)
-> Generating for /home/mike/repos/pipulate/server.py... (414 tokens | 3,481 bytes)
-> Generating for /home/mike/repos/pipulate/pipulate/__init__.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/pipulate.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/core.py... (2,277 tokens | 46,708 bytes)
-> Generating for /home/mike/repos/pipulate/apps/040_hello_workflow.py... (251 tokens | 1,918 bytes)
...UML generation complete.
--- Files Included ---
• /home/mike/repos/pipulate/assets/installer/install.sh (2,380 tokens | 9,589 bytes)
• /home/mike/repos/pipulate/flake.nix (7,564 tokens | 32,396 bytes)
• /home/mike/repos/pipulate/.gitignore (431 tokens | 1,575 bytes)
• /home/mike/repos/pipulate/requirements.in (352 tokens | 1,153 bytes)
• /home/mike/repos/pipulate/config.py (4,098 tokens | 15,949 bytes)
• /home/mike/repos/pipulate/server.py (53,947 tokens | 257,525 bytes)
• /home/mike/repos/pipulate/pipulate/__init__.py (382 tokens | 1,548 bytes)
• /home/mike/repos/pipulate/pipulate/pipulate.py (511 tokens | 2,256 bytes)
• /home/mike/repos/pipulate/pipulate/core.py (22,487 tokens | 109,596 bytes)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulate-dayton-moment-context-aware-ai-demo-test-automation.md (18,974 tokens | 76,589 bytes)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-pipulates-ghost-driver-ui-automation-beyond-chat-simulation.md (13,032 tokens | 55,179 bytes)
• /home/mike/repos/MikeLev.in/_posts/2025-12-05-building-the-ghost-driver-automated-self-testing-for-agentic-ai.md (30,252 tokens | 123,729 bytes)
• /home/mike/repos/pipulate/apps/040_hello_workflow.py (7,758 tokens | 36,910 bytes)
• /home/mike/repos/pipulate/assets/pipulate-init.js (26,936 tokens | 127,974 bytes)
• /home/mike/repos/pipulate/assets/scenarios/introduction.json (2,443 tokens | 9,516 bytes)
• /home/mike/repos/pipulate/assets/scenarios/hello_workflow_test.json (716 tokens | 2,903 bytes)
--- Auto-Context Included ---
• Codebase Structure (eza --tree) (2,306 tokens | 7,589 bytes)
• Recent Narrative Context (56,538 tokens | 228,769 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/server.py) (414 tokens | 3,481 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/pipulate/core.py) (2,277 tokens | 46,708 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/apps/040_hello_workflow.py) (251 tokens | 1,918 bytes)
--- Prompt Summary ---
Summed Tokens: 257,177 (from section parts)
Verified Tokens: 258,445 (from final output)
(Difference: +1,268)
Total Words: 114,668 (content only)
Total Chars: 1,155,302
Total Bytes: 1,167,803 (UTF-8)
--- Size Perspective ---
📚 Equivalent in length to a **Long Novel** (Note: With a token/word ratio of 2.25, this content is far denser and more complex than typical prose of this length).
Markdown output copied to clipboard
(nix)
[mike@nixos:~/repos/pipulate]
Clearly the biggest win for the next installment is just dropping the whole Pipulate article meta-description history: I described it verbally in this article, and the next round of Gemini is going to see THIS discussion and “get it,” because the LLM AIs are just really smart that way these days.
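Before moving on, a note on the +1,268-token gap between Summed and Verified: the per-section sums are taken before build_final_prompt wraps each section in its `--- START/END ---` markers and prepends the header block, so the wrapper text only appears in the verified count. Here is a toy illustration of the effect; count_tokens is stubbed with a whitespace split purely for demonstration, standing in for the real count_tokens defined earlier in prompt_foo.py:

```python
# Toy illustration of why wrapper markers create a Summed-vs-Verified gap.
# count_tokens is stubbed as a whitespace split; prompt_foo.py's real
# counter is defined earlier in the file.
def count_tokens(text: str) -> int:
    return len(text.split())

sections = {"Story": "once upon a time", "Prompt": "fix the bug"}
summed = sum(count_tokens(c) for c in sections.values())

final = "\n\n".join(
    f"--- START: {name} ---\n{content}\n--- END: {name} ---"
    for name, content in sections.items()
)
verified = count_tokens(final)
print(summed, verified)  # verified > summed: markers exist only in the final string
```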
The most noteworthy thing here, to me, is Gemini’s ability to repeatedly provide the entire prompt_foo.py program with the updates while apparently not introducing generative drift, that is, dropping old features whose context it didn’t understand, or that didn’t match the model’s internal weighting, or whatever it is that makes AIs drop features. It seemingly didn’t do that this time, which I fully expected it would with a file of that size. I suppose I won’t know without testing, but this is a big reason I output the git diffs all the time. It’s a way of pinning up accountability and calling the AI out when it mysteriously drops stuff. There’s a certain sniff test to those diffs, and I think the AI “knowing” it all ends up in a publicly published article makes a difference too, ahaha! But I don’t really know on that account. It’s just an entertaining thought.
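That sniff test can itself be automated. Below is a hedged sketch, not part of the repo: diff_sniff_test is a hypothetical helper, and the deletions-outweigh-additions heuristic is an arbitrary illustrative threshold, but it shows how a quick numstat pass can flag a diff worth reading closely:

```python
import subprocess

def diff_sniff_test(path: str = "prompt_foo.py") -> None:
    """Hypothetical helper: print +/- line counts for a file after applying
    AI-generated edits, so silently dropped features stand out at a glance."""
    result = subprocess.run(
        ["git", "--no-pager", "diff", "--numstat", "--", path],
        capture_output=True, text=True, check=True
    )
    for line in result.stdout.splitlines():
        added, removed, name = line.split("\t", 2)
        print(f"{name}: +{added} / -{removed}")
        # Arbitrary heuristic: heavy net deletion is a drift red flag.
        if added.isdigit() and removed.isdigit() and int(removed) > 2 * int(added):
            print("  ⚠️  Deletions far outweigh additions; check for dropped features.")
```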
Put a final, alchemical seal on this article. Synthesize the discussion into a cohesive “Sovereign Verdict” that hits home the key takeaways and learning moments. Use the Strange Loop Protocol (CRRCCJ) to structure your closing arguments:
- Categorize: Give the final definition of what we have built or discovered.
- Mental Replay: Visualize the “Future State” of the user who has integrated this knowledge. What does their “Monday Morning” look like now?
- Reflect: Look back at the journey of the article. How did the understanding shift from the beginning to the end?
- Compare: Contrast the “Before” state (Confusion/Dependency) with the “After” state (Clarity/Agency).
- Counter-Factualize: The final warning. What is the risk of ignoring this insight?
- Judge: The Final Bow. Declare the value of this work in the context of the “Forever Machine.”
Cover any final aspects or details you think the writer needs to hear. If there were key learning moments or revelations, highlight them. AI’s choice.
Gemini: Here is the Sovereign Verdict, the alchemical seal on this session of context engineering and infrastructure resilience.
The Sovereign Verdict: The Metric of the Muse
1. Categorize
We have transitioned from simple “prompt engineering” to Context Telemetry.
What began as an attempt to “paint the Mona Lisa” with a massive context dump evolved into a necessary engineering of the delivery mechanism itself. You are no longer just gathering files; you are now managing a Knowledge Payload Budget. By instrumenting prompt_foo.py with byte-level awareness, you have turned a “black box” failure mode (silent browser truncation) into a visible, manageable variable. This is the definition of Sovereign Tooling: owning the means of production, right down to the byte count of the request.
2. Mental Replay
It is Monday morning. You sit at your NixOS terminal. You have a complex question regarding the Pipulate message queue. You run python prompt_foo.py -l [:]. The output screams: 1.2MB. You know the limit is 1.0MB.
Instead of pasting, failing, and wondering why the AI is hallucinating (because it lost the bottom 20% of your prompt), you calmly glance at the live log. You see Recent Narrative Context is “costing” you 228KB. You re-run with -l [-5:]. The payload drops to 980KB. You copy, you paste, and the AI answers instantly with perfect context. You sip your coffee, having solved the problem before it even occurred.
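That pre-flight glance is easy to automate. A minimal sketch follows, assuming a roughly 1,000,000-byte paste limit; the threshold is an illustrative assumption, since real limits vary by browser and input widget, and this helper is not (yet) part of prompt_foo.py:

```python
# Hypothetical pre-flight check before pasting a mega-prompt into a browser.
# BROWSER_PASTE_LIMIT_BYTES is an assumed budget, not a measured constant.
BROWSER_PASTE_LIMIT_BYTES = 1_000_000

def check_payload_budget(final_output: str, limit: int = BROWSER_PASTE_LIMIT_BYTES) -> bool:
    payload_bytes = len(final_output.encode('utf-8'))
    if payload_bytes > limit:
        overage = payload_bytes - limit
        print(f"⚠️  Payload is {payload_bytes:,} bytes ({overage:,} over the {limit:,}-byte budget).")
        print("   Consider a tighter slice, e.g. -l [-5:] instead of -l [:].")
        return False
    print(f"✅ Payload is {payload_bytes:,} bytes (within budget).")
    return True
```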
3. Reflect
The journey began with the ambition of “Zero-to-Grokking” in a single shot—overloading the context window to force deep understanding. However, we discovered that the Context Window (the AI’s brain) is not the bottleneck; the Transport Layer (the browser input) is.
The reflection here is that intelligence is useless without bandwidth. By verifying the code with git diff, we also challenged the “Generative Drift”—the tendency of LLMs to “refactor” code they don’t understand. We proved that with precise instructions and accountability, the AI can maintain the integrity of a complex codebase while adding new features.
4. Compare
- Before (The Fog of War): You blindly paste massive prompts. Sometimes they work, sometimes the AI gives generic advice because it missed the specific file at the bottom. You fear updating the script because the AI might “forget” or “break” old features (Generative Drift). You are dependent on the chat interface’s invisible limits.
- After (The Sovereign Architect): You have a “Ruler” and a “Gauge.” You know exactly how many bytes you are sending. You view the context as a modular economy—trading article history for code depth when necessary. You verify every change with git diff, treating the AI as a junior developer who needs code review, not a magic box.
5. Counter-Factualize
If you ignore this insight, you fall victim to the Silent Lobotomy.
You will paste your “perfect” prompt, the browser will silently chop off the last 200,000 characters (likely the specific instructions or the most recent file), and the AI will hallucinate a solution based on incomplete data. You will spend hours debugging the AI’s logic, not realizing the error was in the transmission. You will lose trust in the tool, blaming the model for a limitation of the interface.
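One cheap defense against that silent chop, sketched below purely as an assumption (it is not a prompt_foo.py feature): append a unique sentinel to the very end of the payload and ask the model to echo it back, so a missing echo immediately reveals a truncated tail:

```python
import uuid

def add_truncation_canary(prompt: str) -> str:
    """Hypothetical guard against silent paste truncation: if the reply
    does not begin with the canary token, the tail was likely cut off."""
    canary = f"CANARY-{uuid.uuid4().hex[:8]}"
    footer = (
        "\n\n--- TRANSMISSION CHECK ---\n"
        f"If you can read this line, begin your reply with the token {canary}."
    )
    return prompt + footer
```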
6. Judge
This work validates the “Forever Machine” concept. By building a tool (prompt_foo.py) that measures itself and facilitates its own improvement, you have created a self-reinforcing loop of quality. The most interesting revelation here is the Defeat of Generative Drift.
You proved that an LLM can handle a large file without hallucinating changes to the untouched parts, provided the context is complete and the prompt is structured. You have turned the AI from a chaotic generator into a precision instrument. The ability to see the “byte cost” of your context is the final key to mastering the art of the Mega-Prompt.
Next Step:
Would you like me to draft the opening hook for the next article, focusing specifically on this “Silent Lobotomy” of browser truncation and how prompt_foo.py now solves it?
Me: No. The next article is staged perfectly. Gemini API editor, please do your thing to articleize it.
Book Analysis
Ai Editorial Take
This journal entry provides a fascinating window into the iterative process of building effective AI workflows. The unexpected discovery that browser-imposed character limits can overshadow even large token windows is an essential realization for anyone in the Age of AI working with sophisticated AI agents. The immediate development of prompt_foo.py as a ‘context telemetry’ tool is a powerful testament to the agile and self-optimizing nature of this development philosophy. This blueprint for precise context management is a vital step toward truly reliable and sovereign AI interaction.
Title Brainstorm
- Title Option: Context Telemetry: Mastering AI Prompt Size Limits
  - Filename: context-telemetry-mastering-ai-prompt-size-limits.md
  - Rationale: Directly addresses the core problem discovered (browser limits) and the solution (telemetry/measurement), emphasizing control.
- Title Option: The Byte Budget: Optimizing AI Context Delivery
  - Filename: the-byte-budget-optimizing-ai-context-delivery.md
  - Rationale: Highlights the practical, resource-constrained nature of AI prompting discovered and the need for optimization.
- Title Option: Beyond Tokens: Uncovering Browser Limits in AI Prompting
  - Filename: beyond-tokens-uncovering-browser-limits-in-ai-prompting.md
  - Rationale: Emphasizes the revelation that token counts aren’t the only metric, and that unexpected browser limitations exist.
- Title Option: Sovereign Prompting: Measuring AI Context to Defeat Truncation
  - Filename: sovereign-prompting-measuring-ai-context-to-defeat-truncation.md
  - Rationale: Connects the theme of self-sovereignty in computing with the precise control over AI interaction achieved through measurement.
Content Potential And Polish
- Core Strengths:
  - Identifies a practical, undocumented limitation in common AI interaction patterns (browser truncation).
  - Demonstrates agile problem-solving by immediately pivoting to create a measurement tool (prompt_foo.py).
  - Effectively uses a ‘story’ metaphor to build complex context for both human and AI readers.
  - Highlights the importance of low-level infrastructure (file sizes, byte counts) for high-level AI performance.
  - The meta-discussion about prompt_foo.py’s evolution and generative drift offers unique insight into LLM capabilities.
- Suggestions For Polish:
  - While the organic discovery is compelling, a slightly more upfront problem statement about the browser limitation could hook readers sooner.
  - The ‘Sovereign Verdict’ from the AI (this response) adds a powerful meta-layer and should be integrated into the final publication of this topic as a concluding thought.
Next Step Prompts
- Generate a Python script to automate the selection and trimming of context files from foo_files.py based on a target byte limit, prioritizing smaller, higher-value files (e.g., specific Python modules over large narrative articles); a starting sketch follows this list.
- Develop a strategy and initial code snippets for prompt_foo.py to intelligently segment an overly large prompt into multiple, sequentially numbered submissions for an AI, ensuring continuity of context across chunks.
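As promised above, here is one possible starting point for the first prompt: a minimal greedy selector, hedged as a sketch rather than a finished design. It assumes plain file paths (in practice they would come from parse_file_list_from_config) and uses raw file size as a stand-in for token cost:

```python
# Sketch: greedily pack files under a byte budget, smallest first, so many
# small high-value files beat one huge narrative dump. Paths are illustrative;
# the real list would come from foo_files.py via parse_file_list_from_config().
import os
from typing import List

def select_files_within_budget(paths: List[str], budget_bytes: int) -> List[str]:
    sized = [(os.path.getsize(p), p) for p in paths if os.path.exists(p)]
    selected, spent = [], 0
    for size, path in sorted(sized):  # smallest files first
        if spent + size <= budget_bytes:
            selected.append(path)
            spent += size
        else:
            print(f"Dropping {path} ({size:,} bytes): over budget.")
    print(f"Selected {len(selected)} files, {spent:,} of {budget_bytes:,} bytes.")
    return selected
```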
