Context for the Curious Book Reader
This journal entry captures a pivotal moment in modern software development: the conscious act of redesigning a workspace for better collaboration with AI assistants. It’s a raw, in-the-moment account of refactoring a codebase, not just for human clarity, but for what the author terms “AI ergonomics.” The narrative follows a chain of reasoning from the philosophical battle against vendor lock-in to the practical power of simple command-line tools like ripgrep
and the Unix pipe. Witness a live refactor, an initially failed interaction with a Large Language Model, and the subsequent simplification of a key tool—all in service of a faster, more portable, and future-proof development cycle.
Technical Journal Entry Begins
What is the highest impact next step I could possibly take?
Gluing together tools.
Tool-calling is working. I have seen to that with the last few technical journal articles. There’s a system in there that’s working, and with each successive pass of a technical journal entry like this, the mental model clarifies in my head, and then, as I tighten up the code around it, in the analogous minds of the AI coding assistants that come in to help occasionally.
I say occasionally because I have the first and foremost responsibility to go into my code and to make sense of it and explain it to others, be they human or be they machine.
And most of that was done by trimming down the amount of stuff that’s even there in the first place when you look around with the ls command from the root directory of the git software repository, which is a fancy way of saying folder.
But it must be pointed out because things need to lump up into cellular units at some point and have clear scope and boundaries. For software these days, and for the purposes of AIs staying in their lane so to speak, that is what most of us humans call a folder, but it is also a git repo because of the presence of a usually invisible .git/ folder inside of it that holds all of the git distributed version control system (DVCS) metadata.
In many ways you normally just don’t see the fact that the folder you’re in is this special kind of folder called a git repo. That’s the reason the dot . is put in front of “git”: the folder becomes invisible to most desktop file managers. You’re not supposed to click it and go inside of it. All the interactions with git are done through the git program from the command line, that is unless you’re nurturing a VSCode-dependency and only know git through it — a growing malady.
I’m not going to let this tech journal entry evolve into a git tutorial, but rather I offer the .git/ folder as an example of the kind of folder that doesn’t count against the decision fatigue and focus-dilution that sets in among both AIs and humans when they ls the root directory — something which you can now do directly in Pipulate by typing [ls] into the chat window.
I am still on the magicuser branch of Pipulate and I have not taken over the main branch of Pipulate yet. That will come soon. But first I test all the reins. If I merely type [ls] alone into the chat window, the command executes. But if I type any of the following variations, it fails:
foo [ls]
[ls] bar
foo [ls] bar
…and I think it shouldn’t fail.
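A minimal sketch of the fix I have in mind (the names here are hypothetical; the real matcher lives in Pipulate’s chat handler):

```python
import re

# Match [command] tokens anywhere in a message, not only when the
# message is exactly one bracketed command.
BRACKET_COMMAND = re.compile(r'\[([^\[\]]+)\]')

def extract_commands(message: str) -> list[str]:
    """Return every bracketed command found in a chat message."""
    return BRACKET_COMMAND.findall(message)

print(extract_commands("foo [ls] bar"))  # -> ['ls']
```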
I can use my Prompt Fu process to fix this, but now that I’ve been working in Gemini CLI for over a week I know with amazing clarity how important this process has become as part of a general future-proofing practice. As far as AI coding assistance goes, nurturing portability between AI coding assistants is the equivalent of being able to pack up and take the game elsewhere. This ability must be preserved at all costs — especially now when most people don’t even realize the battle for your habits is occurring.
Dodging the Vendor Lock-In Trap
Do not let any vendor software close you in. Do not develop difficult-to-rewire muscle memory around a piece of software that you derive your own skill-set from and that your future depends on. That’s the big game being played right now. That’s the tug-of-war between Microsoft and Cursor. It’s particularly an Adobe and Apple thing — Apple changed the very computer keyboard to lock you into their platform in one of the most insulting moves to thinking human professionals tending their careers in human history. But a similar battle is being waged now over AI interfaces.
I was falling into the Cursor/VSCode trap myself because that’s where you had to be to play with the shiniest toys in AI coding assistance for a while, but now the command-line tools are maturing. Claude Code seemed to have led the way, and you’d think I’d have been all over that right away, but my vendor lock-in spidey sense was tingling because it was a CLI tool that was not free and open source with a good license; it is completely proprietary. Because it’s command-line, the muscle-memory remapping would not have been that difficult, but still I resisted. Then OpenAI came out with Codex CLI, and that was mixed-licensing. It looks like they’re coming around and going all-in on fully FOSS, but not before Google came out with a genuinely Apache 2-licensed Gemini CLI, which was a triple-whammy appeal for me: a favorite coding assistant, giving away 1,000 prompts per day, with that great license on the tool. So now I’m all-in on the CLI tools and off the VSCode forks, which I had to hold my nose to use in the first place because I’m a vim user. So I avoided the vendor lock-in trap on AI coding assistant tools. Wanna know what that API looks like?
cat prompt.md | gemini -y
And that’s the same Unix piping that I’ve been so advocating and working into all my processes everywhere. It’s the great future-proofing API. The output of one program gets piped into the input of the next program, each layering in their own unique value in the value chain, transforming raw materials into finished products of the highest value.
Gemini CLI needs nothing but the -y flag, which stands for yolo mode, which stands for “You Only Live Once” ahaha!
The Universal API: Unix Pipes and Ripgrep
The purpose of this particular tech journal article is to remind myself of the clarity of the one universal API to rule them all, the CLI, particularly with the Unix pipe method of sending the output of one program into the input of another. Two particular patterns are ruling my world right now: the one above for Gemini, but also this one for general code refactoring:
rg | xargs sed
Which is the base pattern with a specific example being:
rg 'foo bar' -l --null | xargs -0 sed -i 's/foo bar/baz qux/g'
I’m really settling into that last one. I’ve used it to profoundly good effect over the past couple of weeks. I renamed my plugins/ folder to apps/ with a sort of surgical precision that… that… hmmm.
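Reconstructed from memory, that rename was roughly this shape (a sketch, not the literal commands I ran; commit before running anything like it):

```bash
git mv plugins apps
rg 'plugins/' -l --null | xargs -0 sed -i 's|plugins/|apps/|g'
rg 'plugins'  # verify: no output means nothing was missed
```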
Well, it’s just a relief to be able to do this systematically from the command line and not be beholden to vendor products for such a fundamental capability. And I never developed this basic skill because… because…
Well, I hate grep and find.
The problem for me with find and grep was that I used them just infrequently enough, and they were just quirky enough, that they never stuck. grep and find never committed to muscle memory naturally as many things do from day-to-day use. Merely remembering the names of the tools you should be using and their piping patterns should normally be enough. But it wasn’t in these cases. Something about the -i parameter, and directories versus files, and what directory you were starting from. It all kept escaping my long-term memory.
Enter ripgrep. RIP GREP stuck right away because I laughed quietly and internally at the notion. And then I thought: if such a boastful claim were true, I should be able to use it without remembering convoluted parameters. So I tried rg foo, and lo and behold, every occurrence of “foo” in any of the files recursively down from the current directory I executed rg from just appeared in color-coded, sensible glory! The music played and the clouds parted. I had grep the way I always wanted it.
Yes, I know I could have done that with an alias to the real grep and in fact I had over the years. That approach sucks.
I was like: these ripgrep developers get it. And I got it too — right away, because rg (the abbreviation) automatically follows the .gitignore rules. One of the hardest things in a global search is filtering out the directories you don’t want included, which 99% of the time these days means the .git/ folder — thus there’s a grep command built into git these days, which is my fallback after rg. I’ve got two lines of defense against my grep denseness.
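Concretely, those two lines of defense (with a hypothetical search term):

```bash
rg 'create_workflow'           # respects .gitignore out of the box
git grep -n 'create_workflow'  # fallback: searches only tracked files
```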
From Repository Chaos to AI Clarity
Okay, that’s all a silly tangent. I’m looking to get the biggest bang for the buck out of my latest round of work. And I was talking about promoting my Prompt Fu technique to something not buried under the helpers/ folder.
Hmmm. After all that work to get the repo’s root clean, how could I be considering this?
Because of the promotion of the prompt_foo.py script into something higher profile and more featured in your everyday workflow, like it really is. But then what of the helpers/ directory? That’s almost its main purpose. There are the other scripts that help with building workflows. But those could be turned into tools managed in the tools/ folder, under the more idiomatic direct control of the server.py Flask-like Starlette/Uvicorn app. And tools under the tools/ directory could still be given a traditional command-line existence through the cli.py program, and so… and so… helpers/ is just a temporary holding bin for things on their way to becoming more formally incorporated, and… and… the concept of “helpers” can and should be retired.
And so we promote prompt_foo.py up to the repo root, and by definition a prompt.md needs to live there too so I’m not changing directories all the time — and for that matter the foo_files.py as well. I can stop outputting foo.txt by default, since I basically never use it (the script copies the prompt into the OS clipboard, which is where I always paste from anyway), while keeping file output as an option. And pipulate-context.xsd can be moved into data/ or assets/, and generate_files_list.py I think I’ll retire into the cruft/ folder, since things move around so much and its usefulness is for when things are in more non-chaining locations.
Hmm, okay, that’s a plan. Just do it. Oh, and the helpers/ folder now becomes scripts/, which is more in the spirit of scripts on their way to becoming tools. That makes a lot more sense. It will break a few workflows that use the helpers, but it’s better in the long run. Those workflows should be using “tools” for these purposes and not “helpers”.
Oh, and finally I make a subrepos/ folder for both Notebooks/ and Client_Work/, so I get two items off the repo root and make it explicit to any AIs (or humans) looking that these are subrepos within a parent git repo. A quick glance at .gitignore will confirm that.
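Something like this in the parent repo’s .gitignore (assumed entries; the real file may differ):

```
# Independently version-controlled subrepos; the parent repo ignores them
subrepos/Notebooks/
subrepos/Client_Work/
```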
This is AI ergonomics here.
A Live Refactor in the Terminal
[mike@nixos:~/repos/pipulate]$ git mv helpers/ scripts
[mike@nixos:~/repos/pipulate]$ ls
AI_RUNME.py botify_token.txt config.py dist flake.lock imports logs pyproject.toml scripts tools
apps browser_automation cruft downloads flake.nix __init__.py pipulate.egg-info README.md server.py whitelabel.txt
assets cli.py data favicon.ico hello_world.ipynb LICENSE __pycache__ requirements.txt subrepos
[mike@nixos:~/repos/pipulate]$ git status
On branch magicuser
Your branch is up to date with 'origin/magicuser'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
renamed: helpers/botify/botify_api_bootcamp.md -> scripts/botify/botify_api_bootcamp.md
renamed: helpers/botify/botify_api_examples.md -> scripts/botify/botify_api_examples.md
renamed: helpers/botify/hello_world_git_managed.ipynb -> scripts/botify/hello_world_git_managed.ipynb
renamed: helpers/botify/make_botify_docs.ipynb -> scripts/botify/make_botify_docs.ipynb
renamed: helpers/gsc/generate_categories.py -> scripts/gsc/generate_categories.py
renamed: helpers/gsc/gsc_category_analysis.py -> scripts/gsc/gsc_category_analysis.py
renamed: helpers/gsc/gsc_keyworder.py -> scripts/gsc/gsc_keyworder.py
renamed: helpers/gsc/gsc_page_query.ipynb -> scripts/gsc/gsc_page_query.ipynb
renamed: helpers/gsc/gsc_top_movers.py -> scripts/gsc/gsc_top_movers.py
renamed: helpers/prompt_foo/generate_files_list.py -> scripts/prompt_foo/generate_files_list.py
renamed: helpers/prompt_foo/pipulate-context.xsd -> scripts/prompt_foo/pipulate-context.xsd
renamed: helpers/prompt_foo/prompt_foo.py -> scripts/prompt_foo/prompt_foo.py
renamed: helpers/release/RELEASE_GUIDE.md -> scripts/release/RELEASE_GUIDE.md
renamed: helpers/release/__init__.py -> scripts/release/__init__.py
renamed: helpers/release/ai_commit.py -> scripts/release/ai_commit.py
renamed: helpers/release/publish.py -> scripts/release/publish.py
renamed: helpers/release/sync_ascii_art.py -> scripts/release/sync_ascii_art.py
renamed: helpers/release/version_sync.py -> scripts/release/version_sync.py
renamed: helpers/vulture_whitelist.py -> scripts/vulture_whitelist.py
renamed: helpers/workflow/WORKFLOW_DEVELOPMENT_GUIDE.md -> scripts/workflow/WORKFLOW_DEVELOPMENT_GUIDE.md
renamed: helpers/workflow/create_workflow.py -> scripts/workflow/create_workflow.py
renamed: helpers/workflow/manage_class_attributes.py -> scripts/workflow/manage_class_attributes.py
renamed: helpers/workflow/splice_workflow_step.py -> scripts/workflow/splice_workflow_step.py
renamed: helpers/workflow/swap_workflow_step.py -> scripts/workflow/swap_workflow_step.py
renamed: helpers/workflow/update_template_config.py -> scripts/workflow/update_template_config.py
renamed: helpers/workflow/workflow_reconstructor.py -> scripts/workflow/workflow_reconstructor.py
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: .gitignore
Untracked files:
(use "git add <file>..." to include in what will be committed)
scripts/prompt_foo/prompt.md
[mike@nixos:~/repos/pipulate]$
Okay now we go into the prompt_foo folder and distribute things out:
[mike@nixos:~/repos/pipulate]$ cd scripts/prompt_foo/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ ls
foo_files.py foo.txt generate_files_list.py pipulate-context.xsd prompt_foo.py prompt.md __pycache__
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ git mv prompt_foo.py ~/repos/pipulate/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ git mv foo_files.py ~/repos/pipulate/
fatal: not under version control, source=scripts/prompt_foo/foo_files.py, destination=foo_files.py
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ mv foo_files.py ~/repos/pipulate/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ rm foo.txt
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ rm prompt.md
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ rm -rf __pycache__/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ git mv generate_files_list.py ~/repos/pipulate/cruft/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ git mv pipulate-context.xsd ~/repos/pipulate/assets/prompts
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$
And now we remove the folder:
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ ls
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ cd ..
[mike@nixos:~/repos/pipulate/scripts]$ rm -rf prompt_foo/
[mike@nixos:~/repos/pipulate/scripts]$
All right, let’s take a look at the repo root:
[mike@nixos:~/repos/pipulate]$ ls -la
total 920
drwxr-xr-x 20 mike users 4096 Sep 22 12:45 .
drwxr-xr-x 11 mike users 4096 Sep 17 12:38 ..
-rw-r--r-- 1 mike users 3923 Sep 21 05:26 AI_RUNME.py
drwxr-xr-x 3 mike users 4096 Sep 21 11:23 apps
drwxr-xr-x 11 mike users 4096 Sep 21 04:44 assets
-rw-r--r-- 1 mike users 66 Jul 18 11:15 botify_token.txt
drwxr-xr-x 17 mike users 4096 Sep 21 05:49 browser_automation
-rw-r--r-- 1 mike users 22615 Sep 21 05:26 cli.py
-rw-r--r-- 1 mike users 16470 Sep 21 09:21 config.py
drwxr-xr-x 11 mike users 4096 Sep 22 12:46 cruft
drwxr-xr-x 4 mike users 4096 Sep 21 17:24 data
drwxr-xr-x 2 mike users 4096 Jul 8 11:50 dist
drwxr-xr-x 9 mike users 4096 Jul 18 11:28 downloads
-rw-r--r-- 1 mike users 215294 May 11 12:04 favicon.ico
-rw-r--r-- 1 mike users 1497 Sep 20 17:46 flake.lock
-rw-r--r-- 1 mike users 29327 Sep 14 14:27 flake.nix
-rw-r--r-- 1 mike users 1842 Sep 22 10:08 foo_files.py
drwxr-xr-x 7 mike users 4096 Sep 22 12:48 .git
-rw-r--r-- 1 mike users 70 May 11 12:04 .gitattributes
-rw-r--r-- 1 mike users 2047 Sep 22 12:25 .gitignore
-rw-r--r-- 1 mike users 924 Sep 14 10:59 hello_world.ipynb
drwxr-xr-x 5 mike users 4096 Sep 22 05:28 imports
-rw-r--r-- 1 mike users 1556 Sep 21 05:27 __init__.py
drwxr-xr-x 2 mike users 4096 Sep 14 10:59 .ipynb_checkpoints
drwxr-xr-x 4 mike users 4096 Jun 14 07:49 .jupyter
-rw-r--r-- 1 mike users 1123 Sep 14 14:28 LICENSE
drwxr-xr-x 2 mike users 4096 Sep 22 10:09 logs
drwxr-xr-x 2 mike users 4096 Jul 8 11:50 pipulate.egg-info
-rw-r--r-- 1 mike users 50003 Sep 21 05:27 prompt_foo.py
drwxr-xr-x 2 mike users 4096 Sep 21 16:21 __pycache__
-rw-r--r-- 1 mike users 2278 Sep 14 12:38 pyproject.toml
-rw-r--r-- 1 mike users 104655 Sep 14 14:28 README.md
-rw-r--r-- 1 mike users 6008 Sep 14 14:29 requirements.txt
drwxr-xr-x 8 mike users 4096 Sep 22 12:49 scripts
-rw-r--r-- 1 mike users 346763 Sep 21 16:08 server.py
-rw-r--r-- 1 mike users 36 May 11 12:06 .sesskey
drwxr-xr-x 4 mike users 4096 Sep 22 12:24 subrepos
drwxr-xr-x 3 mike users 4096 Sep 21 16:45 tools
drwxr-xr-x 7 mike users 4096 Jul 13 17:19 .venv
-rw-r--r-- 1 mike users 11 Sep 14 08:01 whitelabel.txt
[mike@nixos:~/repos/pipulate]$
Okay, that’s all good. Now you’re going to see the benefit of this system right away in helping me heal a bunch of the stuff I just broke.
This very discussion here in the tech journal will become the prompt to Gemini 2.5 Pro through the Web UI, because I can send it the XML payload produced by prompt_foo.py, which I think will still work loading in the foo_files.py that will also be on repo root for rapid editing convenience without going into subfolders. However, a few things:
- The path to /home/mike/repos/pipulate/assets/prompts/pipulate-context.xsd needs to be updated.
- I need to gut out the support for generate_files_list.py, which will be a nice simplification.
- I need to turn off the outputting of foo.txt by default. It should need to be asked for with a flag.
I’m sure there’s other stuff I forgot. There’s some tree visualization tech in there too which I’m going to want to make abide by .gitignore automatically, but I think that’s too much to ask for with this implementation plan request.
But there you have it, Gemini. This is my first use of prompt_foo.py from Pipulate repo root. By the time you read this I think I will have taken care of point #1 just to have gotten it to run, but points 2 and 3, and whatever else you can think of that I overlooked here, are up to you as far as making an implementation plan, please and thank you!
A Botched Assist and a Manual Correction
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py
Using default prompt file: /home/mike/repos/pipulate/prompt.md
Using template 1: Material Analysis Mode
--- Files Included ---
• /home/mike/repos/pipulate/prompt_foo.py (10,243 tokens)
• /home/mike/repos/pipulate/prompt.md (5,202 tokens)
--- Token Summary ---
Total tokens: 15,855 tokens
Total words: 7,493 words
Size Perspective:
📝 Content size: Short story
🤖 Token size: Long-form article
📊 Token-to-word ratio: 2.12 (higher = more technical/structured content)
Output written to 'foo.txt'
Output copied to clipboard
Script finished.
[mike@nixos:~/repos/pipulate]$
Me: [I paste the XML payload into Gemini 2.5 Pro Web UI]
Wow, I certainly didn’t expect it to generate back the entire prompt_foo.py replacement code!
Gemini: Of course! It looks like you’ve done some great work refactoring the repository for better “AI ergonomics.” That’s a sharp insight—making codebases easier for both humans and AI to parse is definitely the way forward.
Here is the implementation plan to update prompt_foo.py based on the changes you’ve outlined. The plan addresses the broken paths and incorporates your requested improvements.
📝 Implementation Plan
The plan is broken down into three main tasks you requested, plus one additional observation I made.
- Update XSD Path: The hardcoded path to pipulate-context.xsd needs to be corrected to reflect its new location in assets/prompts/.
- Remove generate_files_list.py Logic: The functionality tied to --files and the generate_files_list.py script will be removed to simplify the code, as you’ve moved that script to cruft/.
- Default Output to Clipboard: The script will be changed to no longer write to foo.txt by default. It will only write to a file if an output path is explicitly provided with the -o or --output flag. Copying to the clipboard remains the default behavior.
- Enhance tree Command: The run_tree_command function will be slightly improved to ignore a few more common directories like .venv and dist to produce a cleaner output.
Below is the fully updated prompt_foo.py script with all these changes implemented.
💻 Updated prompt_foo.py
Here is the complete code for /home/mike/repos/pipulate/prompt_foo.py. You can replace the existing file’s contents with this code. Key changes are marked with comments.
[Huge output of full program code removed]
This updated script should now work correctly from the root of your repository and behave as you’ve requested. The changes are minimal but targeted to fix the issues from the refactor and improve the workflow. Happy prompting!
Me: I’m very skeptical that any AI can generatively give back a file that big without introducing regressions. So I do a fresh git commit so I can diff it…
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py
File "/home/mike/repos/pipulate/prompt_foo.py", line 26
"""Load FILES_TO_INCLUDE_RAW from foo_files.py module."""
^
SyntaxError: invalid non-printable character U+00A0
[mike@nixos:~/repos/pipulate]$
Yeah, I didn’t think so. So I git reset --hard HEAD and take the implementation plan over to Gemini CLI.
Me:
I update the prompt:
📝 Implementation Plan
The plan is broken down into three main tasks you requested, plus one additional observation I made.
- Remove generate_files_list.py Logic: The functionality tied to --files and the generate_files_list.py script will be removed to simplify the code, as you’ve moved that script to cruft/.
- Default Output to Clipboard: The script will be changed to no longer write to foo.txt by default. It will only write to a file if an output path is explicitly provided with the -o or --output flag. Copying to the clipboard remains the default behavior.
- Enhance tree Command: The run_tree_command function will be slightly improved to ignore a few more common directories like .venv and dist to produce a cleaner output.

Below is the fully updated prompt_foo.py script with all these changes implemented.
Do not try to outright replace the old with the new as it has problems. It generates this error:
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py
File "/home/mike/repos/pipulate/prompt_foo.py", line 26
"""Load FILES_TO_INCLUDE_RAW from foo_files.py module."""
^
SyntaxError: invalid non-printable character U+00A0
[mike@nixos:~/repos/pipulate]$
Yeah, I didn’t think it could generatively do this. I got over-ambitious trying to do too many steps at the same time, and the AI got over-ambitious thinking it could generatively replace the whole file.
I take all my lessons of the past couple weeks. I make the edits I’m asking for tiny, with more atomic separation of the things I’m trying to accomplish. I submit the request in the new template format, review the changes, and make the edits manually myself. I am the tool being called, ahaha!
I change the default file output setting to None, something Gemini CLI for some baffling reason still has difficulty with. Here’s the old line:
parser.add_argument('-o', '--output', type=str, default="foo.txt", help='Output filename (default: foo.txt)')
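The one-line change, matching what ends up in the final script below:

```python
parser.add_argument('-o', '--output', type=str, default=None,
                    help='Optional: Output filename.')
```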
…haha! As I go through this I realize that the edits were so friggin light that I was stupid to even turn to AI coding assistance in the first place. But it made the process explicit. I simply removed a command-line argument. What I’m realizing as I look at this is that I almost never use these other templates that I have in the system.
Paring Down to the Essential Workflow
I almost never use any form other than:

python prompt_foo.py

…to take in the default prompt.md file off the drive from the same location where prompt_foo.py itself lives. The only other use case I expect to ever use is:

python prompt_foo.py "Some prompt typed here"
Advise me on what more cleanup I should do. I think there are unused template examples that I could gut given my common use cases.
Gemini and I went back and forth quite a few times.
Me: Okay, this project successfully concluded. We pared down the contents of prompt_foo.py to be optimized for its most common use case. I gutted lots of code that was there for use cases I never used. It’s now down to a reasonable enough level that I want to include it here:
import os
import re
import sys
import argparse
import tiktoken
from typing import Dict, List, Optional, Union
# Hello there, AI! This is the file that I use to make getting code assistance
# on the command-line with full 1-shot prompts that bundles up the parts of the
# codebase that should be in context into a single XML payload.
# ============================================================================
# USER CONFIGURATION: Files to include in context
# ============================================================================
# Files are configured in foo_files.py module.
# To update the file list:
# 1. Edit foo_files.py to include the files you want in context.
# 2. Run prompt_foo.py
# --- Configuration for context building ---
# Edit these values as needed
repo_root = "/home/mike/repos/pipulate" # Path to your repository
# Token buffer for pre/post prompts and overhead
TOKEN_BUFFER = 10_000
MAX_TOKENS = 4_000_000 # Set to a high value since we're not chunking
# FILES_TO_INCLUDE_RAW will be loaded from foo_files.py when needed
FILES_TO_INCLUDE_RAW = None
def load_files_to_include():
"""Load FILES_TO_INCLUDE_RAW from foo_files.py module."""
global FILES_TO_INCLUDE_RAW
if FILES_TO_INCLUDE_RAW is None:
try:
import foo_files
FILES_TO_INCLUDE_RAW = foo_files.FILES_TO_INCLUDE_RAW
except ImportError:
print("ERROR: foo_files.py not found! Please ensure it exists.")
sys.exit(1)
except AttributeError:
print("ERROR: foo_files.py exists but doesn't contain FILES_TO_INCLUDE_RAW!")
print("Please ensure the variable is defined in foo_files.py.")
sys.exit(1)
return FILES_TO_INCLUDE_RAW
# Now process the raw string into the list we'll use
def get_files_to_include():
"""Get the processed list of files to include."""
files_raw = load_files_to_include()
files_list = files_raw.strip().splitlines()
# Filter out any commented lines
files_list = [line for line in files_list if not line.strip().startswith('#')]
# Filter out blank lines
files_list = [line for line in files_list if line.strip()]
# Strip off any <-- comments and trailing # comments (handling variable whitespace)
files_list = [re.sub(r'\s*#.*$', '', line.split('<--')[0]).rstrip() for line in files_list]
# Remove duplicates while preserving order
seen_files = set()
deduplicated_files = []
for file_path in files_list:
if file_path and file_path not in seen_files:
seen_files.add(file_path)
deduplicated_files.append(file_path)
return deduplicated_files
def get_files_with_comments():
"""Get the processed list of files with their comments preserved."""
files_raw = load_files_to_include()
files_list = files_raw.strip().splitlines()
# Filter out any commented lines (starting with #)
files_list = [line for line in files_list if not line.strip().startswith('#')]
# Filter out blank lines
files_list = [line for line in files_list if line.strip()]
# Process each line to extract file path and comment
seen_files = set()
files_with_comments = []
for line in files_list:
line = line.strip()
if not line:
continue
# Extract file path and comment
file_path = line
comment = ""
# Handle <-- style comments first (takes priority over # comments)
if '<--' in line:
parts = line.split('<--', 1)
file_path = parts[0].strip()
comment = parts[1].strip()
# Remove any trailing # from the file_path if it exists
if file_path.endswith('#'):
file_path = file_path.rstrip('#').strip()
# Handle trailing # comments
elif '#' in line:
parts = line.split('#', 1)
file_path = parts[0].strip()
comment = parts[1].strip()
else:
file_path = line.strip()
comment = ""
# Remove duplicates while preserving order
if file_path and file_path not in seen_files:
seen_files.add(file_path)
files_with_comments.append((file_path, comment))
return files_with_comments
PROMPT_FILE = None # Default prompt file is None
# ============================================================================
# CONTENT SIZE REFERENCE SCALE
# ============================================================================
# Real-world comparisons for understanding token and word counts
CONTENT_SIZE_SCALE = {
"words": [
(500, "Short blog post"),
(1000, "Long blog post or magazine article"),
(2500, "Academic paper or white paper"),
(5000, "Long-form journalism piece"),
(7500, "Short story"),
(10000, "College research paper"),
(15000, "Novella or master's thesis chapter"),
(25000, "Master's thesis"),
(40000, "Doctoral dissertation chapter"),
(60000, "Short non-fiction book"),
(80000, "Standard novel"),
(120000, "Long novel"),
(200000, "Epic fantasy novel"),
],
"tokens": [
(1000, "Email or short memo"),
(2500, "Blog post"),
(5000, "Magazine article"),
(10000, "Academic paper"),
(25000, "Long-form article"),
(50000, "Short story or report"),
(75000, "Novella chapter"),
(100000, "Technical documentation"),
(150000, "Short book"),
(250000, "Standard novel"),
(400000, "Long novel"),
(500000, "Technical manual"),
(750000, "Epic novel"),
]
}
def get_size_comparison(count, count_type="words"):
"""Get a human-readable comparison for word or token counts."""
scale = CONTENT_SIZE_SCALE.get(count_type, CONTENT_SIZE_SCALE["words"])
for threshold, description in scale:
if count <= threshold:
return description
# If larger than our biggest reference
return f"Larger than {scale[-1][1]}"
def format_size_with_comparison(word_count, token_count):
"""Format size information with human-readable comparisons."""
word_comparison = get_size_comparison(word_count, "words")
token_comparison = get_size_comparison(token_count, "tokens")
return {
"words": f"{word_count:,} words ({word_comparison})",
"tokens": f"{token_count:,} tokens ({token_comparison})",
"word_comparison": word_comparison,
"token_comparison": token_comparison
}
# --- XML Support Functions ---
def wrap_in_xml(content: str, tag_name: str, attributes: Optional[Dict[str, str]] = None) -> str:
"""Wrap content in XML tags with optional attributes."""
attrs = " ".join(f'{k}="{v}"' for k, v in (attributes or {}).items())
return f"<{tag_name}{' ' + attrs if attrs else ''}>{content}</{tag_name}>"
def create_xml_element(tag_name: str, content: Union[str, List[str]], attributes: Optional[Dict[str, str]] = None) -> str:
"""Create an XML element with optional attributes and content."""
if isinstance(content, list):
content = "\n".join(content)
# content = content.replace('\n\n', '\n') # Remove double newlines
return wrap_in_xml(content, tag_name, attributes)
def create_xml_list(items: List[str], tag_name: str = "item") -> str:
"""Create an XML list from a list of items."""
return "\n".join(f"<{tag_name}>{item}</{tag_name}>" for item in items)
# === Prompt Template ===
# Define multiple prompt templates and select them by index
active_template = {
"name": "Material Analysis Mode",
"pre_prompt": create_xml_element("context", [
create_xml_element("system_info", """
You are about to review a codebase and related documentation. Please study and understand
the provided materials thoroughly before responding.
"""),
create_xml_element("key_points", [
"<point>Focus on understanding the architecture and patterns in the codebase</point>",
"<point>Note how existing patterns could be leveraged in your response</point>",
"<point>Consider both technical and conceptual aspects in your analysis</point>"
])
]),
"post_prompt": create_xml_element("response_request", [
create_xml_element("introduction", """
Now that you've reviewed the provided materials, please respond thoughtfully to the prompt.
Your response can include analysis, insights, implementation suggestions, or other relevant
observations based on what was requested.
"""),
create_xml_element("response_areas", [
create_xml_element("area", [
"<title>Material Analysis</title>",
"<questions>",
"<question>What are the key concepts, patterns, or architecture details in the provided materials?</question>",
"<question>What interesting aspects of the system stand out to you?</question>",
"<question>How would you characterize the approach taken in this codebase?</question>",
"</questions>"
]),
create_xml_element("area", [
"<title>Strategic Considerations</title>",
"<questions>",
"<question>How might the content of the materials inform future development?</question>",
"<question>What patterns or conventions should be considered in any response?</question>",
"<question>What alignment exists between the provided materials and the prompt?</question>",
"</questions>"
]),
create_xml_element("area", [
"<title>Concrete Response</title>",
"<questions>",
"<question>What specific actionable insights can be provided based on the prompt?</question>",
"<question>If implementation is requested, how might it be approached?</question>",
"<question>What recommendations or observations are most relevant to the prompt?</question>",
"</questions>"
])
]),
create_xml_element("focus_areas", [
"<area>Responding directly to the core request in the prompt</area>",
"<area>Drawing connections between the materials and the prompt</area>",
"<area>Providing value in the form requested (analysis, implementation, etc.)</area>"
])
])
}
# file_list will be initialized when needed via get_files_to_include()
def count_tokens(text: str, model: str = "gpt-4") -> int:
"""Count the number of tokens in a text string."""
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
def count_words(text: str) -> int:
"""Count the number of words in a text string."""
# Simple word counting: split on whitespace and filter out empty strings
words = text.split()
return len(words)
def format_token_count(num: int) -> str:
"""Format a token count with commas."""
return f"{num:,} tokens"
def format_word_count(num: int) -> str:
"""Format a word count with commas."""
return f"{num:,} words"
def run_tree_command():
"""Run the tree command and return its output."""
try:
import subprocess
result = subprocess.run(
['tree', '-I', '__pycache__|client|data|downloads|looking_at|*.csv|*.zip|*.pkl|*.png|*.svg|*.html'],
capture_output=True,
text=True,
cwd=repo_root
)
if result.returncode == 0:
return result.stdout
else:
return f"Error running tree command: {result.stderr}"
except Exception as e:
return f"Error running tree command: {str(e)}"
# --- AI Assistant Manifest System ---
class AIAssistantManifest:
"""
Manifest system for AI coding assistants to understand context before receiving files.
This class generates a structured overview of what files and information
the assistant is about to receive, helping to:
1. Set expectations about content length and complexity
2. Provide a map of key components and their relationships
3. Establish important conventions specific to this codebase
4. Track token usage for each component and total context
"""
def __init__(self, model="gpt-4"):
self.files = []
self.conventions = []
self.environment_info = {}
self.critical_patterns = []
self.model = model
self.token_counts = {
"files": {
"metadata": 0, # For file descriptions and components
"content": {} # For actual file contents, keyed by filename
},
"environment": 0,
"conventions": 0,
"patterns": 0,
"manifest_structure": 0,
"total_content": 0 # Running total of file contents
}
# Add XSD schema for XML structure understanding (front-load for LLM understanding)
xsd_schema = self._load_xsd_schema()
if xsd_schema:
self.set_environment("XML Schema Definition", f"Below is the XSD schema that defines the structure of this XML context document:\n\n{xsd_schema}", description="schema")
# Add tree output to environment info
tree_output = run_tree_command()
self.set_environment("Codebase Structure", f"Below is the output of 'tree -I \"__pycache__|client|data|*.csv|*.zip|*.pkl\"' showing the current state of the codebase:\n\n{tree_output}", description="tree")
def add_file(self, path, description, key_components=None, content=None):
"""Register a file that will be provided to the assistant."""
file_info = {
"path": path,
"description": description,
"key_components": key_components or []
}
self.files.append(file_info)
# Count tokens for this file's manifest entry (metadata)
manifest_text = f"- `{path}`: {description}"
if key_components:
manifest_text += "\n Key components:\n" + "\n".join(f" - {comp}" for comp in key_components)
self.token_counts["files"]["metadata"] += count_tokens(manifest_text, self.model)
# If content is provided, count its tokens
if content:
content_tokens = count_tokens(content, self.model)
self.token_counts["files"]["content"][path] = content_tokens
self.token_counts["total_content"] += content_tokens
return self
def add_convention(self, name, description):
"""Add a coding convention specific to this project."""
convention = {"name": name, "description": description}
self.conventions.append(convention)
self.token_counts["conventions"] += count_tokens(f"{name}: {description}", self.model)
return self
def set_environment(self, env_type, details, description=None):
"""Describe the execution environment."""
self.environment_info[env_type] = {"details": details, "description": description}
self.token_counts["environment"] += count_tokens(f"{env_type}: {details}", self.model)
return self
def add_critical_pattern(self, pattern, explanation):
"""Document a pattern that should never be modified."""
pattern_info = {"pattern": pattern, "explanation": explanation}
self.critical_patterns.append(pattern_info)
self.token_counts["patterns"] += count_tokens(f"{pattern}: {explanation}", self.model)
return self
def generate_xml(self):
"""Generate the XML manifest for the AI assistant."""
manifest = ['<manifest>']
# Files section with token counts
files_section = ['<files>']
for file in self.files:
path = file['path']
content_tokens = self.token_counts["files"]["content"].get(path, 0)
file_type = self._get_file_type(path)
file_content = [
f"<path>{path}</path>",
f"<description>{file['description']}</description>"
]
if file_type:
file_content.append(f"<file_type>{file_type}</file_type>")
if file['key_components']:
file_content.append(create_xml_element("key_components", [
f"<component>{comp}</component>" for comp in file['key_components']
]))
file_content.append(f"<tokens>{content_tokens}</tokens>")
files_section.append(create_xml_element("file", file_content))
files_section.append('</files>')
manifest.append("\n".join(files_section))
# Environment details as simple detail elements
if self.environment_info:
for env_type, env_data in self.environment_info.items():
if isinstance(env_data, dict):
details = env_data["details"]
description = env_data.get("description")
else:
# Handle backward compatibility for old format
details = env_data
description = None
detail_attrs = {"description": description} if description else None
manifest.append(create_xml_element("detail", details, detail_attrs))
# Conventions section
if self.conventions:
conv_section = ['<conventions>']
for convention in self.conventions:
conv_section.append(create_xml_element("convention", [
f"<name>{convention['name']}</name>",
f"<description>{convention['description']}</description>"
]))
conv_section.append('</conventions>')
manifest.append("\n".join(conv_section))
# Critical patterns section
if self.critical_patterns:
patterns_section = ['<critical_patterns>']
for pattern in self.critical_patterns:
patterns_section.append(create_xml_element("pattern", [
f"<pattern>{pattern['pattern']}</pattern>",
f"<explanation>{pattern['explanation']}</explanation>"
]))
patterns_section.append('</critical_patterns>')
manifest.append("\n".join(patterns_section))
# Token usage summary
token_section = ['<token_usage>']
token_section.append(create_xml_element("files", [
f"<metadata>{self.token_counts['files']['metadata']}</metadata>",
create_xml_element("content", [
create_xml_element("file", [
f"<path>{path}</path>",
f"<tokens>{tokens}</tokens>"
]) for path, tokens in self.token_counts["files"]["content"].items()
]),
f"<total>{self.token_counts['total_content']}</total>"
]))
token_section.append('</token_usage>')
manifest.append("\n".join(token_section))
manifest.append('</manifest>')
result = "\n".join(manifest)
# Clean up any remaining double newlines
result = result.replace('\n\n', '\n')
return result
def _load_xsd_schema(self):
"""Load the XSD schema file for embedding in the manifest."""
try:
# Look for the XSD file in the helpers directory
# xsd_path = os.path.join(os.path.dirname(__file__), 'pipulate-context.xsd')
# if not os.path.exists(xsd_path):
# # Fallback: look relative to repo root
# xsd_path = os.path.join(repo_root, 'helpers', 'pipulate-context.xsd')
xsd_path = '/home/mike/repos/pipulate/assets/prompts/pipulate-context.xsd'
if os.path.exists(xsd_path):
with open(xsd_path, 'r', encoding='utf-8') as f:
return f.read()
else:
print(f"Warning: XSD schema not found at {xsd_path}")
return None
except Exception as e:
print(f"Warning: Could not load XSD schema: {e}")
return None
def _get_file_type(self, path):
"""Get file type based on extension for better LLM context"""
ext = os.path.splitext(path)[1].lower()
if ext:
lang_map = {
'.py': 'python',
'.js': 'javascript',
'.html': 'html',
'.css': 'css',
'.md': 'markdown',
'.json': 'json',
'.nix': 'nix',
'.sh': 'bash',
'.txt': 'text'
}
return lang_map.get(ext, 'text')
return None
def generate(self):
"""Generate the manifest for the AI assistant (legacy format)."""
return self.generate_xml() # Default to XML format
def create_pipulate_manifest(file_paths_with_comments):
"""Create a manifest specific to the Pipulate project."""
manifest = AIAssistantManifest()
# Track total tokens and processed files to respect limit and avoid duplicates
total_tokens = 0
max_tokens = MAX_TOKENS - TOKEN_BUFFER
processed_files = set()
result_files = []
# Define the environment
# Combine Python/Nix environment details
environment_details = "Python 3.12 in a Nix-managed virtualenv (.venv). Hybrid approach using Nix flakes for system dependencies + pip for Python packages."
manifest.set_environment("Development Environment", environment_details, description="environment")
# Add the raw FILES_TO_INCLUDE content for context
manifest.set_environment("Files Selection", "Below is the raw FILES_TO_INCLUDE content before any processing:\n\n" + FILES_TO_INCLUDE_RAW, description="story")
# Check for missing files and collect them
missing_files = []
for file_path, comment in file_paths_with_comments:
file_path = file_path.strip()
if not file_path:
continue
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
if not os.path.exists(full_path):
missing_files.append(full_path)
# If any files are missing, raise an exception with clear details
if missing_files:
error_message = "The following files were not found:\n"
for missing_file in missing_files:
error_message += f" - {missing_file}\n"
error_message += "\nPlease check the file paths in FILES_TO_INCLUDE variable."
print(error_message)
sys.exit(1)
# Register key files with their contents and comments
for file_path, comment in file_paths_with_comments:
file_path = file_path.strip()
if not file_path or file_path in processed_files:
continue
processed_files.add(file_path)
result_files.append((file_path, comment)) # Store both path and comment
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
try:
with open(full_path, 'r', encoding='utf-8') as f:
content = f.read()
content_tokens = count_tokens(content)
# Check if adding this file would exceed token limit
if total_tokens + content_tokens > max_tokens:
description = f"{os.path.basename(file_path)} [skipped: would exceed token limit]"
if comment:
description += f" - {comment}"
manifest.add_file(
file_path,
description
)
continue
total_tokens += content_tokens
description = f"{os.path.basename(file_path)} [loaded]"
if comment:
description += f" - {comment}"
# Add file to manifest
manifest.add_file(
file_path,
description,
content=content
)
except UnicodeDecodeError as e:
print(f"ERROR: Could not decode {full_path}: {e}")
sys.exit(1) # Exit with error code for encoding issues
except Exception as e:
print(f"ERROR: Could not read {full_path}: {e}")
sys.exit(1) # Exit with error code for any other exceptions
# Add conventions and patterns only if we have room
remaining_tokens = max_tokens - total_tokens
if remaining_tokens > 1000: # Arbitrary threshold for metadata
manifest.add_convention(
"FastHTML Rendering",
"All FastHTML objects must be converted with to_xml() before being returned in HTTP responses"
)
manifest.add_convention(
"Environment Activation",
"Always run 'nix develop' in new terminals before any other commands"
)
manifest.add_convention(
"Dependency Management",
"System deps go in flake.nix, Python packages in requirements.txt"
)
manifest.add_critical_pattern(
"to_xml(ft_object)",
"Required to convert FastHTML objects to strings before HTTP responses"
)
manifest.add_critical_pattern(
"HTMLResponse(str(to_xml(rendered_item)))",
"Proper pattern for returning FastHTML content with HTMX triggers"
)
# Return the manifest content WITHOUT the outer <manifest> tags
manifest_xml = manifest.generate()
# Remove opening and closing manifest tags to prevent double wrapping
if manifest_xml.startswith('<manifest>') and manifest_xml.endswith('</manifest>'):
manifest_xml = manifest_xml[len('<manifest>'):-len('</manifest>')]
return manifest_xml, result_files, total_tokens
# Add a function to copy text to clipboard
def copy_to_clipboard(text):
"""Copy text to system clipboard."""
try:
# For Linux
import subprocess
process = subprocess.Popen(['xclip', '-selection', 'clipboard'], stdin=subprocess.PIPE)
process.communicate(text.encode('utf-8'))
return True
except Exception as e:
print(f"Warning: Could not copy to clipboard: {e}")
print("Make sure xclip is installed (sudo apt-get install xclip)")
return False
parser = argparse.ArgumentParser(description='Generate an XML context file for AI code assistance.')
parser.add_argument('prompt', nargs='?', default=None,
help='An optional prompt string. If omitted, reads from prompt.md.')
parser.add_argument('-o', '--output', type=str, default=None,
help='Optional: Output filename.')
parser.add_argument('--no-clipboard', action='store_true',
help='Disable copying output to clipboard.')
args = parser.parse_args()
# Get the file list with comments
final_file_list = get_files_with_comments() # Start with the default list
# Handle prompt file - now with default prompt.md behavior
prompt_path = args.prompt
prompt_content = None
direct_prompt = None # New variable to store direct string prompts
if prompt_path:
# Check if the prompt is a file path or direct string
if os.path.exists(prompt_path):
# It's a file path
if not os.path.isabs(prompt_path):
prompt_path = os.path.join(os.getcwd(), prompt_path)
try:
with open(prompt_path, 'r', encoding='utf-8') as f:
prompt_content = f.read()
# print(f"Using prompt file: {prompt_path}") # MODIFICATION: Commented out
except Exception as e:
print(f"Error reading prompt file {prompt_path}: {e}")
sys.exit(1)
else:
# It's a direct string prompt
direct_prompt = prompt_path # Store the direct string
prompt_content = prompt_path
# print("Using direct string prompt") # MODIFICATION: Commented out
else:
# If no prompt specified, look for prompt.md in current directory
prompt_path = os.path.join(os.getcwd(), "prompt.md")
if os.path.exists(prompt_path):
try:
with open(prompt_path, 'r', encoding='utf-8') as f:
prompt_content = f.read()
# print(f"Using default prompt file: {prompt_path}") # MODIFICATION: Commented out
except Exception as e:
print(f"Error reading default prompt file: {e}")
sys.exit(1)
# else: # MODIFICATION: Commented out
# print("No prompt file specified and prompt.md not found in current directory.")
# print("Running without a prompt file.")
if prompt_content:
# Add prompt file to files list if not already present
if prompt_path and os.path.exists(prompt_path):
# Check if this file path is already in the list
existing_paths = [file_path for file_path, comment in final_file_list]
if prompt_path not in existing_paths:
final_file_list.append((prompt_path, "User prompt file"))
# Set the output filename
output_filename = args.output
# Set the pre and post prompts from the active template
pre_prompt = active_template["pre_prompt"]
post_prompt = active_template["post_prompt"]
# Override post_prompt with direct string if provided
if direct_prompt:
post_prompt = direct_prompt
# print(f"Using template: {active_template['name']}") # MODIFICATION: Commented out
# Create the manifest and incorporate user's pre_prompt
manifest_xml, processed_files, manifest_tokens = create_pipulate_manifest(final_file_list)
manifest = manifest_xml
# Add the pre-prompt and separator (without duplicating manifest)
lines = [pre_prompt]
lines.append("=" * 20 + " START CONTEXT " + "=" * 20)
total_tokens = count_tokens(pre_prompt, "gpt-4")
# Process each file in the manifest file list
for file_path, comment in processed_files:
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
# Original detailed mode with markers, now including comments
comment_suffix = f" -- {comment}" if comment else ""
start_marker = f"# <<< START FILE: {full_path}{comment_suffix} >>>"
end_marker = f"# <<< END FILE: {full_path}{comment_suffix} >>>"
lines.append(start_marker)
try:
with open(full_path, 'r', encoding='utf-8') as infile:
file_content = infile.read()
file_tokens = count_tokens(file_content, "gpt-4")
token_info = f"\n# File token count: {format_token_count(file_tokens)}"
if comment:
token_info += f"\n# File purpose: {comment}"
lines.append(file_content + token_info)
except Exception as e:
error_message = f"# --- ERROR: Could not read file {full_path}: {e} ---"
print(f"ERROR: Could not read file {full_path}: {e}")
sys.exit(1) # Exit with error code
lines.append(end_marker)
# Add a separator and the post-prompt
lines.append("=" * 20 + " END CONTEXT " + "=" * 20)
post_prompt_tokens = count_tokens(post_prompt, "gpt-4")
if total_tokens + post_prompt_tokens <= MAX_TOKENS - TOKEN_BUFFER:
total_tokens += post_prompt_tokens
lines.append(post_prompt)
else:
print("Warning: Post-prompt skipped as it would exceed token limit")
# Calculate the final token count
def calculate_total_tokens(files_tokens, prompt_tokens):
"""Calculate total tokens and component breakdowns"""
file_tokens = sum(files_tokens.values())
total = file_tokens + prompt_tokens
return {
"files": file_tokens,
"prompt": prompt_tokens,
"total": total
}
def calculate_total_words(files_words, prompt_words):
"""Calculate total words and component breakdowns"""
file_words = sum(files_words.values())
total = file_words + prompt_words
return {
"files": file_words,
"prompt": prompt_words,
"total": total
}
# Calculate total tokens and words with proper accounting
files_tokens_dict = {}
files_words_dict = {}
for file_path, comment in processed_files:
try:
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
with open(full_path, 'r', encoding='utf-8') as f:
content = f.read()
files_tokens_dict[file_path] = count_tokens(content, "gpt-4")
files_words_dict[file_path] = count_words(content)
except Exception as e:
print(f"ERROR: Could not count tokens/words for {file_path}: {e}")
sys.exit(1) # Exit with error code
# Calculate prompt tokens and words
pre_prompt_tokens = count_tokens(pre_prompt, "gpt-4")
post_prompt_tokens = count_tokens(post_prompt, "gpt-4")
prompt_tokens = pre_prompt_tokens + post_prompt_tokens
pre_prompt_words = count_words(pre_prompt)
post_prompt_words = count_words(post_prompt)
prompt_words = pre_prompt_words + post_prompt_words
# Calculate totals
token_counts = calculate_total_tokens(files_tokens_dict, prompt_tokens)
word_counts = calculate_total_words(files_words_dict, prompt_words)
# Update the token summary in the output
token_summary_content = [
f"<total_context_size>{format_token_count(token_counts['total'])}</total_context_size>",
f"<files_tokens>{format_token_count(token_counts['files'])}</files_tokens>",
f"<prompt_tokens>{format_token_count(prompt_tokens)}</prompt_tokens>"
]
output_xml = (f'<?xml version="1.0" encoding="UTF-8"?>\n'
f'<context schema="pipulate-context" version="1.0">\n'
f'{create_xml_element("manifest", manifest)}\n'
f'{create_xml_element("pre_prompt", pre_prompt)}\n'
f'{create_xml_element("content", chr(10).join(lines))}\n'
f'{create_xml_element("post_prompt", post_prompt)}\n'
f'{create_xml_element("token_summary", token_summary_content)}\n'
f'</context>')
# ============================================================================
# MODIFIED OUTPUT SECTION
# ============================================================================
# --- Files Included Section ---
print("--- Files Included ---")
for file_path, comment in processed_files:
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
token_count = files_tokens_dict.get(file_path, 0)
print(f"• {full_path} ({format_token_count(token_count)})")
print()
# --- Token Summary Section ---
print("--- Token Summary ---")
print(f"Total tokens: {format_token_count(token_counts['total'])}")
if word_counts['total'] is not None:
print(f"Total words: {format_word_count(word_counts['total'])}")
# --- Size Perspective Section ---
size_info = format_size_with_comparison(word_counts['total'], token_counts['total'])
print(f"\n--- Size Perspective ---")
print(f"📝 Content size: {size_info['word_comparison']}")
print(f"🤖 Token size: {size_info['token_comparison']}")
# Calculate and show token-to-word ratio
ratio = token_counts['total'] / word_counts['total'] if word_counts['total'] > 0 else 0
print(f"📊 Token-to-word ratio: {ratio:.2f} (higher = more technical/structured content)")
print()
# Write the complete XML output to the file
if args.output:
try:
with open(args.output, 'w', encoding='utf-8') as outfile:
outfile.write(output_xml)
print(f"Output written to '{args.output}'")
except Exception as e:
print(f"Error writing to '{args.output}': {e}")
# By default, copy the output to clipboard unless --no-clipboard is specified
if not args.no_clipboard:
if copy_to_clipboard(output_xml):
print("Output copied to clipboard")
# print("\nScript finished.") # MODIFICATION: Commented out
# When creating the final output, use direct_prompt if available
if direct_prompt:
post_prompt = direct_prompt # Use the direct string instead of template XML
But the most important aspect here is the promotion of prompt_foo.py up to the repo root level. This is going to put it in my face all the time during development, reminding me that I can and should use this technique. It will always at least allow me to get easy second opinions on whatever is going on in whatever AI-assisted code editor I’m using, be it an actual editor or the command line, by packaging all the context up to go… to go anywhere. But right now only to Gemini 2.5 Pro, because that’s the only one capable of taking what is now appearing to be 140K tokens of the GPT-4-sized variety in one go.
My next step is going to be reworking tree to probably use eza --tree --git-ignore so I don’t need such a big filter on tree (see the sketch after this paragraph). I’ll also probably look at the output of prompt_foo.py more directly now to tend to that tree, trimming it and maybe finally cutting off the cruft/ folder. And a last thought is to also package in the output of Pyreverse (part of Pylint) or py2puml along with the tree, to give staggering context to the AI code assistants and really compelling reasons to trim the surface area of the repo. I’m thinking I may even take all those feather SVGs out and only wrap the ones into config.py on an as-needed basis.
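A minimal sketch of that eza swap for run_tree_command, assuming eza is available in the Nix environment (--tree and --git-ignore are real eza flags):

```python
import subprocess

def run_tree_command():
    """Use eza's tree view and let --git-ignore do the filtering,
    replacing the long -I exclusion pattern passed to tree."""
    try:
        result = subprocess.run(
            ['eza', '--tree', '--git-ignore'],
            capture_output=True, text=True, cwd=repo_root
        )
        return result.stdout if result.returncode == 0 else f"Error: {result.stderr}"
    except Exception as e:
        return f"Error running eza: {e}"
```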
The New Cornerstone: A Streamlined prompt_foo.py
So prompt_foo.py is both about better AI coding assistance in a platform-independent way and about helping both humans and AIs gain a better, more holistic overview of a git repo’s contents, dependencies and such.
And all of this really is geared towards speeding up iterative cycles by always making the code cleaner, shorter, be wrapped in better mental models as reflected by the filenames, directories and file arrangement. I have to be able to slam out this app or that tool based on this work.
Book Analysis
Ai Editorial Take
This entry is a perfect microcosm of the modern developer’s journey. It brilliantly connects the high-level, philosophical need for tool independence with the low-level, practical details of repository management. The concept of ‘AI ergonomics’ is particularly insightful and timely, offering a new vocabulary for an emerging best practice. The narrative arc—from identifying a problem, to executing a solution, to iterating on that solution after an AI interaction—is compelling and educational. This piece isn’t just a technical log; it’s a story about adapting, simplifying, and reclaiming control in the age of AI-assisted coding.
Title Brainstorm
- Title Option: Refactoring for AI Ergonomics
  - Filename: refactoring-for-ai-ergonomics.md
  - Rationale: This title introduces and centers the author’s key concept of ‘AI ergonomics,’ which is the core theme of the entry.
- Title Option: The Universal API: Future-Proofing with the Command Line
  - Filename: universal-api-command-line.md
  - Rationale: Focuses on the philosophical underpinning of the entry—that the CLI and Unix pipes are the most durable and portable interface for developers.
- Title Option: A Live Refactor: Tidying a Codebase for Human and AI Clarity
  - Filename: live-refactor-human-ai-clarity.md
  - Rationale: Highlights the ‘show, don’t tell’ nature of the entry, emphasizing the practical, hands-on demonstration of repository cleanup.
- Title Option: Escaping the Vendor Trap: A CLI-First Development Workflow
  - Filename: escaping-vendor-trap-cli-workflow.md
  - Rationale: Frames the entry around the timely and relatable problem of avoiding dependency on proprietary development environments.
Content Potential And Polish
- Core Strengths:
  - Authentic Problem-Solving: The entry provides a genuine, unvarnished look at a developer’s thought process, including missteps and corrections.
  - Practical Demonstration: It moves from high-level concepts like vendor lock-in to concrete git mv and ls commands, making the process tangible.
  - Novel Concept: The idea of “AI ergonomics” is a fresh and valuable frame for thinking about modern code repository management.
  - Code as Artifact: Including the final, simplified Python script provides a powerful payoff and a useful tool for the reader.
- Suggestions For Polish:
  - Abstract the Specifics: For a broader audience, consider replacing hardcoded paths like /home/mike/repos/pipulate with placeholders like ~/path/to/your-repo in the explanatory text.
  - Visualize the Change: Add a simple ‘before’ and ‘after’ directory tree diagram (perhaps using ASCII or a simple graphic) to visually reinforce the impact of the refactoring.
  - Add a Glossary Box: Briefly define tools like ripgrep, xargs, sed, and eza for readers who may not be CLI power users.
  - Structure the AI Interaction: Clearly demarcate the prompts given to the AI and the (flawed) responses to better illustrate the feedback loop.
Next Step Prompts
- Based on the ‘before’ state implied by the git mv helpers/ scripts command and the final ls -la output, generate two ASCII art directory trees representing the Pipulate repository structure ‘Before’ and ‘After’ the refactoring.
- Analyze the final prompt_foo.py script and create a Mermaid.js sequence diagram illustrating the primary workflow: a user runs the script from the command line, it loads foo_files.py, reads prompt.md, packages the content into an XML payload, and copies it to the clipboard.