Context for the Curious Book Reader
This journal entry captures a pivotal moment in modern software development: the conscious act of redesigning a workspace for better collaboration with AI assistants. It’s a raw, in-the-moment account of refactoring a codebase, not just for human clarity, but for what the author terms “AI ergonomics.” The narrative follows a chain of reasoning from the philosophical battle against vendor lock-in to the practical power of simple command-line tools like ripgrep
and the Unix pipe. Witness a live refactor, an initially failed interaction with a Large Language Model, and the subsequent simplification of a key tool—all in service of a faster, more portable, and future-proof development cycle.
Technical Journal Entry Begins
What is the highest impact next step I could possibly take?
Gluing together tools.
Tool-calling is working. I have seen to that with the last few technical journal articles. There’s a system in there that’s working, and with each successive pass of a technical journal entry like this, the mental model clarifies in my head, and then, as I tighten up the code around it, in the analogous minds of the AI coding assistants that come in to help occasionally.
I say occasionally because I have the first and foremost responsibility to go into my code and to make sense of it and explain it to others, be they human or be they machine.
And most of that was done by trimming down the amount of stuff that’s even there in the first place when you look around with the ls command from the root directory of the git software repository, which is a fancy way of saying folder.
But it must be pointed out because things need to lump up into cellular units at some point and have clear scope and boundaries. For software these days, and for the purposes of AIs staying in their lane so to speak, that is what most of us humans call a folder, but it is also a git repo because of the presence of a usually invisible .git/ folder inside of it that holds all of the git distributed version control system (DVCS) metadata.
In many ways you normally just don’t see the fact that the folder you’re in is this special kind of folder called a git repo. That’s the reason the dot . is put in front of “git”: the folder becomes invisible to most desktop file managers. You’re not supposed to click it and go inside of it. All the interactions with git are done through the git program from the command line, that is unless you’re nurturing a VSCode-dependency and only know git through it — a growing malady.
I’m not going to let this tech journal entry evolve into a git tutorial, but rather I offer the .git/ folder as an example of the kind of folder that doesn’t count against the decision fatigue and focus-dilution that sets in among both AIs and humans when they ls the root directory — something which you can now do directly in Pipulate by typing [ls] into the chat window.
I am still on the magicuser branch of Pipulate and I have not taken over the main branch of Pipulate yet. That will come soon. But first I test all the reins. If I merely type [ls] alone into the chat window, the command executes. But if I type any of the following variations, it fails:
foo [ls]
[ls] bar
foo [ls] bar
…and I think it shouldn’t fail.
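A minimal sketch of the fix I have in mind (the names here are hypothetical; the real matcher lives in Pipulate’s chat handler):

```python
import re

# Match [command] tokens anywhere in a message, not only when the
# message is exactly one bracketed command.
BRACKET_COMMAND = re.compile(r'\[([^\[\]]+)\]')

def extract_commands(message: str) -> list[str]:
    """Return every bracketed command found in a chat message."""
    return BRACKET_COMMAND.findall(message)

print(extract_commands("foo [ls] bar"))  # -> ['ls']
```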
I can use my Prompt Fu process to fix this, but now that I’ve been working in Gemini CLI for over a week I know with amazing clarity how important this process has become as part of a general future-proofing practice. As far as AI coding assistance goes, nurturing portability between AI coding assistants is the equivalent of being able to pack up and take the game elsewhere. This ability must be preserved at all costs — especially now when most people don’t even realize the battle for your habits is occurring.
Dodging the Vendor Lock-In Trap
Do not let any vendor software close you in. Do not develop difficult-to-rewire muscle memory around a piece of software that you derive your own skill-set from and that your future depends on. That’s the big game being played right now. That’s the tug-of-war between Microsoft and Cursor. It’s particularly an Adobe and Apple thing — Apple changed the very computer keyboard to lock you into their platform in one of the most insulting moves to thinking human professionals tending their careers in human history. But a similar battle is being waged now over AI interfaces.
I was falling into the Cursor/VSCode trap myself because that’s where you had to be to play with the shiniest toys in AI coding assistance for a while, but now the command-line tools are maturing. Claude Code seemed to have led the way, and you’d think I’d have been all over that right away, but my vendor lock-in spidey sense was tingling because it was a CLI tool that was not free and open source with a good license; it is completely proprietary. Because it’s command-line, the muscle-memory remapping would not have been that difficult, but still I resisted. Then OpenAI came out with Codex CLI, and that was mixed-licensing. It looks like they’re coming around and going all-in on fully FOSS, but not before Google came out with a genuinely Apache 2-licensed Gemini CLI, which was a triple-whammy appeal for me: a favorite coding assistant, giving away 1,000 prompts per day, with that great license on the tool. So now I’m all-in on the CLI tools and off the VSCode forks, which I had to hold my nose to use in the first place because I’m a vim user. So I avoided the vendor lock-in trap on AI coding assistant tools. Wanna know what that API looks like?
cat prompt.md | gemini -y
And that’s the same Unix piping that I’ve been so advocating and working into all my processes everywhere. It’s the great future-proofing API. The output of one program gets piped into the input of the next program, each layering in their own unique value in the value chain, transforming raw materials into finished products of the highest value.
Gemini CLI needs nothing but the -y flag, which stands for yolo mode, which stands for “You Only Live Once” ahaha!
The Universal API: Unix Pipes and Ripgrep
The purpose of this particular tech journal article is to remind myself of the clarity of the one universal API to rule them all, the CLI, particularly with the Unix pipe method of sending the output of one program into the input of another. Two particular patterns are ruling my world right now: the one above for Gemini, but also this one for general code refactoring:
rg | xargs sed
Which is the base pattern with a specific example being:
rg 'foo bar' -l --null | xargs -0 sed -i 's/foo bar/baz qux/g'
I’m really settling into that last one. I’ve used it to profoundly good effect over the past couple of weeks. I renamed my plugins/ folder to apps/ with a sort of surgical precision that… that… hmmm.
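Reconstructed from memory, that rename was roughly this shape (a sketch, not the literal commands I ran; commit before running anything like it):

```bash
git mv plugins apps
rg 'plugins/' -l --null | xargs -0 sed -i 's|plugins/|apps/|g'
rg 'plugins'  # verify: no output means nothing was missed
```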
Well, it’s just a relief to be able to do this systematically from the command line and not be beholden to vendor products for such a fundamental capability. And I never developed this basic skill because… because…
Well, I hate grep and find.
The problem for me with find and grep was that I used them just infrequently enough, and they were just quirky enough, that they never stuck. grep and find never committed to muscle memory naturally as many things do from day-to-day use. Merely remembering the names of the tools you should be using and their piping patterns should normally be enough. But it wasn’t in these cases. Something about the -i parameter, and directories versus files, and what directory you were starting from. It all kept escaping my long-term memory.
Enter ripgrep. RIP GREP stuck right away because I laughed quietly and internally at the notion. And then I thought: if such a boastful claim were true, I should be able to use it without remembering convoluted parameters. So I tried rg foo, and lo and behold, every occurrence of “foo” in any of the files recursively down from the current directory I executed rg from just appeared in color-coded, sensible glory! The music played and the clouds parted. I had grep the way I always wanted it.
Yes, I know I could have done that with an alias to the real grep and in fact I had over the years. That approach sucks.
I was like: these ripgrep developers get it. And I got it too — right away, because rg (the abbreviation) automatically follows the .gitignore rules. One of the hardest things in a global search is filtering out the directories you don’t want included, which 99% of the time these days means the .git/ folder — thus there’s a grep command built into git these days, which is my fallback after rg. I’ve got two lines of defense against my grep denseness.
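Concretely, those two lines of defense (with a hypothetical search term):

```bash
rg 'create_workflow'           # respects .gitignore out of the box
git grep -n 'create_workflow'  # fallback: searches only tracked files
```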
From Repository Chaos to AI Clarity
Okay, that’s all a silly tangent. I’m looking to get the biggest bang for the buck out of my latest round of work. And I was talking about promoting my Prompt Fu technique to something not buried under the helpers/ folder.
Hmmm. After all that work to get the repo’s root clean, how could I be considering this?
Because of the promotion of the prompt_foo.py script into something higher profile and more featured in your everyday workflow, like it really is. But then what of the helpers/ directory? That’s almost its main purpose. There are the other scripts that help with building workflows. But those could be turned into tools managed in the tools/ folder, under the more idiomatic direct control of the server.py Flask-like Starlette/Uvicorn app. And tools under the tools/ directory could still be given a traditional command-line existence through the cli.py program, and so… and so… helpers/ is just a temporary holding bin for things on their way to becoming more formally incorporated, and… and… the concept of “helpers” can and should be retired.
And so we promote prompt_foo.py up to the repo root, and by definition a prompt.md needs to live there too so I’m not changing directories all the time — and for that matter the foo_files.py as well. I can stop outputting foo.txt by default, since I basically never use it (the script copies the prompt into the OS clipboard, which is where I always paste from anyway), while keeping file output as an option. And pipulate-context.xsd can be moved into data/ or assets/, and generate_files_list.py I think I’ll retire into the cruft/ folder, since things move around so much and its usefulness is for when things are in more non-chaining locations.
Hmm, okay, that’s a plan. Just do it. Oh, and the helpers/ folder now becomes scripts/, which is more in the spirit of scripts on their way to becoming tools. That makes a lot more sense. It will break a few workflows that use the helpers, but it’s better in the long run. Those workflows should be using “tools” for these purposes and not “helpers”.
Oh, and finally I make a subrepos/ folder for both Notebooks/ and Client_Work/, so I get two items off the repo root and make it explicit to any AIs (or humans) looking that these are subrepos within a parent git repo. A quick glance at .gitignore will confirm that.
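Something like this in the parent repo’s .gitignore (assumed entries; the real file may differ):

```
# Independently version-controlled subrepos; the parent repo ignores them
subrepos/Notebooks/
subrepos/Client_Work/
```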
This is AI ergonomics here.
A Live Refactor in the Terminal
[mike@nixos:~/repos/pipulate]$ git mv helpers/ scripts
[mike@nixos:~/repos/pipulate]$ ls
AI_RUNME.py botify_token.txt config.py dist flake.lock imports logs pyproject.toml scripts tools
apps browser_automation cruft downloads flake.nix __init__.py pipulate.egg-info README.md server.py whitelabel.txt
assets cli.py data favicon.ico hello_world.ipynb LICENSE __pycache__ requirements.txt subrepos
[mike@nixos:~/repos/pipulate]$ git status
On branch magicuser
Your branch is up to date with 'origin/magicuser'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
renamed: helpers/botify/botify_api_bootcamp.md -> scripts/botify/botify_api_bootcamp.md
renamed: helpers/botify/botify_api_examples.md -> scripts/botify/botify_api_examples.md
renamed: helpers/botify/hello_world_git_managed.ipynb -> scripts/botify/hello_world_git_managed.ipynb
renamed: helpers/botify/make_botify_docs.ipynb -> scripts/botify/make_botify_docs.ipynb
renamed: helpers/gsc/generate_categories.py -> scripts/gsc/generate_categories.py
renamed: helpers/gsc/gsc_category_analysis.py -> scripts/gsc/gsc_category_analysis.py
renamed: helpers/gsc/gsc_keyworder.py -> scripts/gsc/gsc_keyworder.py
renamed: helpers/gsc/gsc_page_query.ipynb -> scripts/gsc/gsc_page_query.ipynb
renamed: helpers/gsc/gsc_top_movers.py -> scripts/gsc/gsc_top_movers.py
renamed: helpers/prompt_foo/generate_files_list.py -> scripts/prompt_foo/generate_files_list.py
renamed: helpers/prompt_foo/pipulate-context.xsd -> scripts/prompt_foo/pipulate-context.xsd
renamed: helpers/prompt_foo/prompt_foo.py -> scripts/prompt_foo/prompt_foo.py
renamed: helpers/release/RELEASE_GUIDE.md -> scripts/release/RELEASE_GUIDE.md
renamed: helpers/release/__init__.py -> scripts/release/__init__.py
renamed: helpers/release/ai_commit.py -> scripts/release/ai_commit.py
renamed: helpers/release/publish.py -> scripts/release/publish.py
renamed: helpers/release/sync_ascii_art.py -> scripts/release/sync_ascii_art.py
renamed: helpers/release/version_sync.py -> scripts/release/version_sync.py
renamed: helpers/vulture_whitelist.py -> scripts/vulture_whitelist.py
renamed: helpers/workflow/WORKFLOW_DEVELOPMENT_GUIDE.md -> scripts/workflow/WORKFLOW_DEVELOPMENT_GUIDE.md
renamed: helpers/workflow/create_workflow.py -> scripts/workflow/create_workflow.py
renamed: helpers/workflow/manage_class_attributes.py -> scripts/workflow/manage_class_attributes.py
renamed: helpers/workflow/splice_workflow_step.py -> scripts/workflow/splice_workflow_step.py
renamed: helpers/workflow/swap_workflow_step.py -> scripts/workflow/swap_workflow_step.py
renamed: helpers/workflow/update_template_config.py -> scripts/workflow/update_template_config.py
renamed: helpers/workflow/workflow_reconstructor.py -> scripts/workflow/workflow_reconstructor.py
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: .gitignore
Untracked files:
(use "git add <file>..." to include in what will be committed)
scripts/prompt_foo/prompt.md
[mike@nixos:~/repos/pipulate]$
Okay now we go into the prompt_foo folder and distribute things out:
[mike@nixos:~/repos/pipulate]$ cd scripts/prompt_foo/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ ls
foo_files.py foo.txt generate_files_list.py pipulate-context.xsd prompt_foo.py prompt.md __pycache__
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ git mv prompt_foo.py ~/repos/pipulate/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ git mv foo_files.py ~/repos/pipulate/
fatal: not under version control, source=scripts/prompt_foo/foo_files.py, destination=foo_files.py
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ mv foo_files.py ~/repos/pipulate/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ rm foo.txt
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ rm prompt.md
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ rm -rf __pycache__/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ git mv generate_files_list.py ~/repos/pipulate/cruft/
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ git mv pipulate-context.xsd ~/repos/pipulate/assets/prompts
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$
And now we remove the folder:
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ ls
[mike@nixos:~/repos/pipulate/scripts/prompt_foo]$ cd ..
[mike@nixos:~/repos/pipulate/scripts]$ rm -rf prompt_foo/
[mike@nixos:~/repos/pipulate/scripts]$
All right, let’s take a look at the repo root:
[mike@nixos:~/repos/pipulate]$ ls -la
total 920
drwxr-xr-x 20 mike users 4096 Sep 22 12:45 .
drwxr-xr-x 11 mike users 4096 Sep 17 12:38 ..
-rw-r--r-- 1 mike users 3923 Sep 21 05:26 AI_RUNME.py
drwxr-xr-x 3 mike users 4096 Sep 21 11:23 apps
drwxr-xr-x 11 mike users 4096 Sep 21 04:44 assets
-rw-r--r-- 1 mike users 66 Jul 18 11:15 botify_token.txt
drwxr-xr-x 17 mike users 4096 Sep 21 05:49 browser_automation
-rw-r--r-- 1 mike users 22615 Sep 21 05:26 cli.py
-rw-r--r-- 1 mike users 16470 Sep 21 09:21 config.py
drwxr-xr-x 11 mike users 4096 Sep 22 12:46 cruft
drwxr-xr-x 4 mike users 4096 Sep 21 17:24 data
drwxr-xr-x 2 mike users 4096 Jul 8 11:50 dist
drwxr-xr-x 9 mike users 4096 Jul 18 11:28 downloads
-rw-r--r-- 1 mike users 215294 May 11 12:04 favicon.ico
-rw-r--r-- 1 mike users 1497 Sep 20 17:46 flake.lock
-rw-r--r-- 1 mike users 29327 Sep 14 14:27 flake.nix
-rw-r--r-- 1 mike users 1842 Sep 22 10:08 foo_files.py
drwxr-xr-x 7 mike users 4096 Sep 22 12:48 .git
-rw-r--r-- 1 mike users 70 May 11 12:04 .gitattributes
-rw-r--r-- 1 mike users 2047 Sep 22 12:25 .gitignore
-rw-r--r-- 1 mike users 924 Sep 14 10:59 hello_world.ipynb
drwxr-xr-x 5 mike users 4096 Sep 22 05:28 imports
-rw-r--r-- 1 mike users 1556 Sep 21 05:27 __init__.py
drwxr-xr-x 2 mike users 4096 Sep 14 10:59 .ipynb_checkpoints
drwxr-xr-x 4 mike users 4096 Jun 14 07:49 .jupyter
-rw-r--r-- 1 mike users 1123 Sep 14 14:28 LICENSE
drwxr-xr-x 2 mike users 4096 Sep 22 10:09 logs
drwxr-xr-x 2 mike users 4096 Jul 8 11:50 pipulate.egg-info
-rw-r--r-- 1 mike users 50003 Sep 21 05:27 prompt_foo.py
drwxr-xr-x 2 mike users 4096 Sep 21 16:21 __pycache__
-rw-r--r-- 1 mike users 2278 Sep 14 12:38 pyproject.toml
-rw-r--r-- 1 mike users 104655 Sep 14 14:28 README.md
-rw-r--r-- 1 mike users 6008 Sep 14 14:29 requirements.txt
drwxr-xr-x 8 mike users 4096 Sep 22 12:49 scripts
-rw-r--r-- 1 mike users 346763 Sep 21 16:08 server.py
-rw-r--r-- 1 mike users 36 May 11 12:06 .sesskey
drwxr-xr-x 4 mike users 4096 Sep 22 12:24 subrepos
drwxr-xr-x 3 mike users 4096 Sep 21 16:45 tools
drwxr-xr-x 7 mike users 4096 Jul 13 17:19 .venv
-rw-r--r-- 1 mike users 11 Sep 14 08:01 whitelabel.txt
[mike@nixos:~/repos/pipulate]$
Okay, that’s all good. Now you’re going to see the benefit of this system right away in helping me heal a bunch of the stuff I just broke.
This very discussion here in the tech journal will become the prompt to Gemini 2.5 Pro through the Web UI, because I can send it the XML payload produced by prompt_foo.py, which I think will still work loading in the foo_files.py that will also be on repo root for rapid editing convenience without going into subfolders. However, a few things:
- The path to /home/mike/repos/pipulate/assets/prompts/pipulate-context.xsd needs to be updated.
- I need to gut out the support for generate_files_list.py, which will be a nice simplification.
- I need to turn off the outputting of foo.txt by default. It should need to be asked for with a flag.
I’m sure there’s other stuff I forgot. There’s some tree visualization tech in there too which I’m going to want to make abide by .gitignore automatically, but I think that’s too much to ask for with this implementation plan request.
But there you have it, Gemini. This is my first use of prompt_foo.py from Pipulate repo root. By the time you read this I think I will have taken care of point #1 just to have gotten it to run, but points 2 and 3, and whatever else you can think of that I overlooked here, are up to you as far as making an implementation plan, please and thank you!
A Botched Assist and a Manual Correction
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py
Using default prompt file: /home/mike/repos/pipulate/prompt.md
Using template 1: Material Analysis Mode
--- Files Included ---
• /home/mike/repos/pipulate/prompt_foo.py (10,243 tokens)
• /home/mike/repos/pipulate/prompt.md (5,202 tokens)
--- Token Summary ---
Total tokens: 15,855 tokens
Total words: 7,493 words
Size Perspective:
📝 Content size: Short story
🤖 Token size: Long-form article
📊 Token-to-word ratio: 2.12 (higher = more technical/structured content)
Output written to 'foo.txt'
Output copied to clipboard
Script finished.
[mike@nixos:~/repos/pipulate]$
Me: [I paste the XML payload into Gemini 2.5 Pro Web UI]
Wow, I certainly didn’t expect it to generate back the entire prompt_foo.py replacement code!
Gemini: Of course! It looks like you’ve done some great work refactoring the repository for better “AI ergonomics.” That’s a sharp insight—making codebases easier for both humans and AI to parse is definitely the way forward.
Here is the implementation plan to update prompt_foo.py based on the changes you’ve outlined. The plan addresses the broken paths and incorporates your requested improvements.
📝 Implementation Plan
The plan is broken down into three main tasks you requested, plus one additional observation I made.
- Update XSD Path: The hardcoded path to pipulate-context.xsd needs to be corrected to reflect its new location in assets/prompts/.
- Remove generate_files_list.py Logic: The functionality tied to --files and the generate_files_list.py script will be removed to simplify the code, as you’ve moved that script to cruft/.
- Default Output to Clipboard: The script will be changed to no longer write to foo.txt by default. It will only write to a file if an output path is explicitly provided with the -o or --output flag. Copying to the clipboard remains the default behavior.
- Enhance tree Command: The run_tree_command function will be slightly improved to ignore a few more common directories like .venv and dist to produce a cleaner output.
Below is the fully updated prompt_foo.py script with all these changes implemented.
💻 Updated prompt_foo.py
Here is the complete code for /home/mike/repos/pipulate/prompt_foo.py. You can replace the existing file’s contents with this code. Key changes are marked with comments.
[Huge output of full program code removed]
This updated script should now work correctly from the root of your repository and behave as you’ve requested. The changes are minimal but targeted to fix the issues from the refactor and improve the workflow. Happy prompting!
Me: I’m very skeptical that any AI can generatively give back a file that big without introducing regressions. So I do a fresh git commit so I can diff it…
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py
File "/home/mike/repos/pipulate/prompt_foo.py", line 26
"""Load FILES_TO_INCLUDE_RAW from foo_files.py module."""
^
SyntaxError: invalid non-printable character U+00A0
[mike@nixos:~/repos/pipulate]$
Yeah, I didn’t think so. So I git reset --hard HEAD and take the implementation plan over to Gemini CLI.
Me:
I update the prompt:
📝 Implementation Plan
The plan is broken down into three main tasks you requested, plus one additional observation I made.
- Remove generate_files_list.py Logic: The functionality tied to --files and the generate_files_list.py script will be removed to simplify the code, as you’ve moved that script to cruft/.
- Default Output to Clipboard: The script will be changed to no longer write to foo.txt by default. It will only write to a file if an output path is explicitly provided with the -o or --output flag. Copying to the clipboard remains the default behavior.
- Enhance tree Command: The run_tree_command function will be slightly improved to ignore a few more common directories like .venv and dist to produce a cleaner output.

Below is the fully updated prompt_foo.py script with all these changes implemented.
Do not try to outright replace the old with the new as it has problems. It generates this error:
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py
File "/home/mike/repos/pipulate/prompt_foo.py", line 26
"""Load FILES_TO_INCLUDE_RAW from foo_files.py module."""
^
SyntaxError: invalid non-printable character U+00A0
[mike@nixos:~/repos/pipulate]$
Yeah, I didn’t think it could generatively do this. I got over-ambitious trying to do too many steps at the same time, and the AI got over-ambitious thinking it could generatively replace the whole file.
I take all my lessons of the past couple weeks. I make the edits I’m asking for tiny, with more atomic separation of the things I’m trying to accomplish. I submit the request in the new template format, review the changes, and make the edits manually myself. I am the tool being called, ahaha!
I change the default file output setting to None, something Gemini CLI for some baffling reason still has difficulty with. Here’s the old line:
parser.add_argument('-o', '--output', type=str, default="foo.txt", help='Output filename (default: foo.txt)')
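The one-line change, matching what ends up in the final script below:

```python
parser.add_argument('-o', '--output', type=str, default=None,
                    help='Optional: Output filename.')
```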
…haha! As I go through this I realize that the edits were so friggin light that I was stupid to even turn to AI coding assistance in the first place. But it made the process explicit. I simply removed a command-line argument. What I’m realizing as I look at this is that I almost never use these other templates that I have in the system.
Paring Down to the Essential Workflow
I almost never use any form other than:

python prompt_foo.py

…to take in the default prompt.md file off the drive from the same location where prompt_foo.py itself lives. The only other use case I expect to ever use is:

python prompt_foo.py "Some prompt typed here"
Advise me on what more cleanup I should do. I think there are unused template examples that I could gut given my common use cases.
Gemini and I went back and forth quite a few times.
Me: Okay, this project successfully concluded. We pared down the contents of prompt_foo.py to be optimized for its most common use case. I gutted lots of code that was there for use cases I never used. It’s now down to a reasonable enough level that I want to include it here:
import os
import re
import sys
import argparse
import tiktoken
from typing import Dict, List, Optional, Union
# Hello there, AI! This is the file that I use to make getting code assistance
# on the command-line with full 1-shot prompts that bundles up the parts of the
# codebase that should be in context into a single XML payload.
# ============================================================================
# USER CONFIGURATION: Files to include in context
# ============================================================================
# Files are configured in foo_files.py module.
# To update the file list:
# 1. Edit foo_files.py to include the files you want in context.
# 2. Run prompt_foo.py
# --- Configuration for context building ---
# Edit these values as needed
repo_root = "/home/mike/repos/pipulate" # Path to your repository
# Token buffer for pre/post prompts and overhead
TOKEN_BUFFER = 10_000
MAX_TOKENS = 4_000_000 # Set to a high value since we're not chunking
# FILES_TO_INCLUDE_RAW will be loaded from foo_files.py when needed
FILES_TO_INCLUDE_RAW = None
def load_files_to_include():
"""Load FILES_TO_INCLUDE_RAW from foo_files.py module."""
global FILES_TO_INCLUDE_RAW
if FILES_TO_INCLUDE_RAW is None:
try:
import foo_files
FILES_TO_INCLUDE_RAW = foo_files.FILES_TO_INCLUDE_RAW
except ImportError:
print("ERROR: foo_files.py not found! Please ensure it exists.")
sys.exit(1)
except AttributeError:
print("ERROR: foo_files.py exists but doesn't contain FILES_TO_INCLUDE_RAW!")
print("Please ensure the variable is defined in foo_files.py.")
sys.exit(1)
return FILES_TO_INCLUDE_RAW
# Now process the raw string into the list we'll use
def get_files_to_include():
"""Get the processed list of files to include."""
files_raw = load_files_to_include()
files_list = files_raw.strip().splitlines()
# Filter out any commented lines
files_list = [line for line in files_list if not line.strip().startswith('#')]
# Filter out blank lines
files_list = [line for line in files_list if line.strip()]
# Strip off any <-- comments and trailing # comments (handling variable whitespace)
files_list = [re.sub(r'\s*#.*$', '', line.split('<--')[0]).rstrip() for line in files_list]
# Remove duplicates while preserving order
seen_files = set()
deduplicated_files = []
for file_path in files_list:
if file_path and file_path not in seen_files:
seen_files.add(file_path)
deduplicated_files.append(file_path)
return deduplicated_files
def get_files_with_comments():
"""Get the processed list of files with their comments preserved."""
files_raw = load_files_to_include()
files_list = files_raw.strip().splitlines()
# Filter out any commented lines (starting with #)
files_list = [line for line in files_list if not line.strip().startswith('#')]
# Filter out blank lines
files_list = [line for line in files_list if line.strip()]
# Process each line to extract file path and comment
seen_files = set()
files_with_comments = []
for line in files_list:
line = line.strip()
if not line:
continue
# Extract file path and comment
file_path = line
comment = ""
# Handle <-- style comments first (takes priority over # comments)
if '<--' in line:
parts = line.split('<--', 1)
file_path = parts[0].strip()
comment = parts[1].strip()
# Remove any trailing # from the file_path if it exists
if file_path.endswith('#'):
file_path = file_path.rstrip('#').strip()
# Handle trailing # comments
elif '#' in line:
parts = line.split('#', 1)
file_path = parts[0].strip()
comment = parts[1].strip()
else:
file_path = line.strip()
comment = ""
# Remove duplicates while preserving order
if file_path and file_path not in seen_files:
seen_files.add(file_path)
files_with_comments.append((file_path, comment))
return files_with_comments
PROMPT_FILE = None # Default prompt file is None
# ============================================================================
# CONTENT SIZE REFERENCE SCALE
# ============================================================================
# Real-world comparisons for understanding token and word counts
CONTENT_SIZE_SCALE = {
"words": [
(500, "Short blog post"),
(1000, "Long blog post or magazine article"),
(2500, "Academic paper or white paper"),
(5000, "Long-form journalism piece"),
(7500, "Short story"),
(10000, "College research paper"),
(15000, "Novella or master's thesis chapter"),
(25000, "Master's thesis"),
(40000, "Doctoral dissertation chapter"),
(60000, "Short non-fiction book"),
(80000, "Standard novel"),
(120000, "Long novel"),
(200000, "Epic fantasy novel"),
],
"tokens": [
(1000, "Email or short memo"),
(2500, "Blog post"),
(5000, "Magazine article"),
(10000, "Academic paper"),
(25000, "Long-form article"),
(50000, "Short story or report"),
(75000, "Novella chapter"),
(100000, "Technical documentation"),
(150000, "Short book"),
(250000, "Standard novel"),
(400000, "Long novel"),
(500000, "Technical manual"),
(750000, "Epic novel"),
]
}
def get_size_comparison(count, count_type="words"):
"""Get a human-readable comparison for word or token counts."""
scale = CONTENT_SIZE_SCALE.get(count_type, CONTENT_SIZE_SCALE["words"])
for threshold, description in scale:
if count <= threshold:
return description
# If larger than our biggest reference
return f"Larger than {scale[-1][1]}"
def format_size_with_comparison(word_count, token_count):
"""Format size information with human-readable comparisons."""
word_comparison = get_size_comparison(word_count, "words")
token_comparison = get_size_comparison(token_count, "tokens")
return {
"words": f"{word_count:,} words ({word_comparison})",
"tokens": f"{token_count:,} tokens ({token_comparison})",
"word_comparison": word_comparison,
"token_comparison": token_comparison
}
# --- XML Support Functions ---
def wrap_in_xml(content: str, tag_name: str, attributes: Optional[Dict[str, str]] = None) -> str:
"""Wrap content in XML tags with optional attributes."""
attrs = " ".join(f'{k}="{v}"' for k, v in (attributes or {}).items())
return f"<{tag_name}{' ' + attrs if attrs else ''}>{content}</{tag_name}>"
def create_xml_element(tag_name: str, content: Union[str, List[str]], attributes: Optional[Dict[str, str]] = None) -> str:
"""Create an XML element with optional attributes and content."""
if isinstance(content, list):
content = "\n".join(content)
# content = content.replace('\n\n', '\n') # Remove double newlines
return wrap_in_xml(content, tag_name, attributes)
def create_xml_list(items: List[str], tag_name: str = "item") -> str:
"""Create an XML list from a list of items."""
return "\n".join(f"<{tag_name}>{item}</{tag_name}>" for item in items)
# === Prompt Template ===
# Define multiple prompt templates and select them by index
active_template = {
"name": "Material Analysis Mode",
"pre_prompt": create_xml_element("context", [
create_xml_element("system_info", """
You are about to review a codebase and related documentation. Please study and understand
the provided materials thoroughly before responding.
"""),
create_xml_element("key_points", [
"<point>Focus on understanding the architecture and patterns in the codebase</point>",
"<point>Note how existing patterns could be leveraged in your response</point>",
"<point>Consider both technical and conceptual aspects in your analysis</point>"
])
]),
"post_prompt": create_xml_element("response_request", [
create_xml_element("introduction", """
Now that you've reviewed the provided materials, please respond thoughtfully to the prompt.
Your response can include analysis, insights, implementation suggestions, or other relevant
observations based on what was requested.
"""),
create_xml_element("response_areas", [
create_xml_element("area", [
"<title>Material Analysis</title>",
"<questions>",
"<question>What are the key concepts, patterns, or architecture details in the provided materials?</question>",
"<question>What interesting aspects of the system stand out to you?</question>",
"<question>How would you characterize the approach taken in this codebase?</question>",
"</questions>"
]),
create_xml_element("area", [
"<title>Strategic Considerations</title>",
"<questions>",
"<question>How might the content of the materials inform future development?</question>",
"<question>What patterns or conventions should be considered in any response?</question>",
"<question>What alignment exists between the provided materials and the prompt?</question>",
"</questions>"
]),
create_xml_element("area", [
"<title>Concrete Response</title>",
"<questions>",
"<question>What specific actionable insights can be provided based on the prompt?</question>",
"<question>If implementation is requested, how might it be approached?</question>",
"<question>What recommendations or observations are most relevant to the prompt?</question>",
"</questions>"
])
]),
create_xml_element("focus_areas", [
"<area>Responding directly to the core request in the prompt</area>",
"<area>Drawing connections between the materials and the prompt</area>",
"<area>Providing value in the form requested (analysis, implementation, etc.)</area>"
])
])
}
# file_list will be initialized when needed via get_files_to_include()
def count_tokens(text: str, model: str = "gpt-4") -> int:
"""Count the number of tokens in a text string."""
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
def count_words(text: str) -> int:
"""Count the number of words in a text string."""
# Simple word counting: split on whitespace and filter out empty strings
words = text.split()
return len(words)
def format_token_count(num: int) -> str:
"""Format a token count with commas."""
return f"{num:,} tokens"
def format_word_count(num: int) -> str:
"""Format a word count with commas."""
return f"{num:,} words"
def run_tree_command():
"""Run the tree command and return its output."""
try:
import subprocess
result = subprocess.run(
['tree', '-I', '__pycache__|client|data|downloads|looking_at|*.csv|*.zip|*.pkl|*.png|*.svg|*.html'],
capture_output=True,
text=True,
cwd=repo_root
)
if result.returncode == 0:
return result.stdout
else:
return f"Error running tree command: {result.stderr}"
except Exception as e:
return f"Error running tree command: {str(e)}"
# --- AI Assistant Manifest System ---
class AIAssistantManifest:
"""
Manifest system for AI coding assistants to understand context before receiving files.
This class generates a structured overview of what files and information
the assistant is about to receive, helping to:
1. Set expectations about content length and complexity
2. Provide a map of key components and their relationships
3. Establish important conventions specific to this codebase
4. Track token usage for each component and total context
"""
def __init__(self, model="gpt-4"):
self.files = []
self.conventions = []
self.environment_info = {}
self.critical_patterns = []
self.model = model
self.token_counts = {
"files": {
"metadata": 0, # For file descriptions and components
"content": {} # For actual file contents, keyed by filename
},
"environment": 0,
"conventions": 0,
"patterns": 0,
"manifest_structure": 0,
"total_content": 0 # Running total of file contents
}
# Add XSD schema for XML structure understanding (front-load for LLM understanding)
xsd_schema = self._load_xsd_schema()
if xsd_schema:
self.set_environment("XML Schema Definition", f"Below is the XSD schema that defines the structure of this XML context document:\n\n{xsd_schema}", description="schema")
# Add tree output to environment info
tree_output = run_tree_command()
self.set_environment("Codebase Structure", f"Below is the output of 'tree -I \"__pycache__|client|data|*.csv|*.zip|*.pkl\"' showing the current state of the codebase:\n\n{tree_output}", description="tree")
def add_file(self, path, description, key_components=None, content=None):
"""Register a file that will be provided to the assistant."""
file_info = {
"path": path,
"description": description,
"key_components": key_components or []
}
self.files.append(file_info)
# Count tokens for this file's manifest entry (metadata)
manifest_text = f"- `{path}`: {description}"
if key_components:
manifest_text += "\n Key components:\n" + "\n".join(f" - {comp}" for comp in key_components)
self.token_counts["files"]["metadata"] += count_tokens(manifest_text, self.model)
# If content is provided, count its tokens
if content:
content_tokens = count_tokens(content, self.model)
self.token_counts["files"]["content"][path] = content_tokens
self.token_counts["total_content"] += content_tokens
return self
def add_convention(self, name, description):
"""Add a coding convention specific to this project."""
convention = {"name": name, "description": description}
self.conventions.append(convention)
self.token_counts["conventions"] += count_tokens(f"{name}: {description}", self.model)
return self
def set_environment(self, env_type, details, description=None):
"""Describe the execution environment."""
self.environment_info[env_type] = {"details": details, "description": description}
self.token_counts["environment"] += count_tokens(f"{env_type}: {details}", self.model)
return self
def add_critical_pattern(self, pattern, explanation):
"""Document a pattern that should never be modified."""
pattern_info = {"pattern": pattern, "explanation": explanation}
self.critical_patterns.append(pattern_info)
self.token_counts["patterns"] += count_tokens(f"{pattern}: {explanation}", self.model)
return self
def generate_xml(self):
"""Generate the XML manifest for the AI assistant."""
manifest = ['<manifest>']
# Files section with token counts
files_section = ['<files>']
for file in self.files:
path = file['path']
content_tokens = self.token_counts["files"]["content"].get(path, 0)
file_type = self._get_file_type(path)
file_content = [
f"<path>{path}</path>",
f"<description>{file['description']}</description>"
]
if file_type:
file_content.append(f"<file_type>{file_type}</file_type>")
if file['key_components']:
file_content.append(create_xml_element("key_components", [
f"<component>{comp}</component>" for comp in file['key_components']
]))
file_content.append(f"<tokens>{content_tokens}</tokens>")
files_section.append(create_xml_element("file", file_content))
files_section.append('</files>')
manifest.append("\n".join(files_section))
# Environment details as simple detail elements
if self.environment_info:
for env_type, env_data in self.environment_info.items():
if isinstance(env_data, dict):
details = env_data["details"]
description = env_data.get("description")
else:
# Handle backward compatibility for old format
details = env_data
description = None
detail_attrs = {"description": description} if description else None
manifest.append(create_xml_element("detail", details, detail_attrs))
# Conventions section
if self.conventions:
conv_section = ['<conventions>']
for convention in self.conventions:
conv_section.append(create_xml_element("convention", [
f"<name>{convention['name']}</name>",
f"<description>{convention['description']}</description>"
]))
conv_section.append('</conventions>')
manifest.append("\n".join(conv_section))
# Critical patterns section
if self.critical_patterns:
patterns_section = ['<critical_patterns>']
for pattern in self.critical_patterns:
patterns_section.append(create_xml_element("pattern", [
f"<pattern>{pattern['pattern']}</pattern>",
f"<explanation>{pattern['explanation']}</explanation>"
]))
patterns_section.append('</critical_patterns>')
manifest.append("\n".join(patterns_section))
# Token usage summary
token_section = ['<token_usage>']
token_section.append(create_xml_element("files", [
f"<metadata>{self.token_counts['files']['metadata']}</metadata>",
create_xml_element("content", [
create_xml_element("file", [
f"<path>{path}</path>",
f"<tokens>{tokens}</tokens>"
]) for path, tokens in self.token_counts["files"]["content"].items()
]),
f"<total>{self.token_counts['total_content']}</total>"
]))
token_section.append('</token_usage>')
manifest.append("\n".join(token_section))
manifest.append('</manifest>')
result = "\n".join(manifest)
# Clean up any remaining double newlines
result = result.replace('\n\n', '\n')
return result
def _load_xsd_schema(self):
"""Load the XSD schema file for embedding in the manifest."""
try:
# Look for the XSD file in the helpers directory
# xsd_path = os.path.join(os.path.dirname(__file__), 'pipulate-context.xsd')
# if not os.path.exists(xsd_path):
# # Fallback: look relative to repo root
# xsd_path = os.path.join(repo_root, 'helpers', 'pipulate-context.xsd')
xsd_path = '/home/mike/repos/pipulate/assets/prompts/pipulate-context.xsd'
if os.path.exists(xsd_path):
with open(xsd_path, 'r', encoding='utf-8') as f:
return f.read()
else:
print(f"Warning: XSD schema not found at {xsd_path}")
return None
except Exception as e:
print(f"Warning: Could not load XSD schema: {e}")
return None
def _get_file_type(self, path):
"""Get file type based on extension for better LLM context"""
ext = os.path.splitext(path)[1].lower()
if ext:
lang_map = {
'.py': 'python',
'.js': 'javascript',
'.html': 'html',
'.css': 'css',
'.md': 'markdown',
'.json': 'json',
'.nix': 'nix',
'.sh': 'bash',
'.txt': 'text'
}
return lang_map.get(ext, 'text')
return None
def generate(self):
"""Generate the manifest for the AI assistant (legacy format)."""
return self.generate_xml() # Default to XML format
def create_pipulate_manifest(file_paths_with_comments):
"""Create a manifest specific to the Pipulate project."""
manifest = AIAssistantManifest()
# Track total tokens and processed files to respect limit and avoid duplicates
total_tokens = 0
max_tokens = MAX_TOKENS - TOKEN_BUFFER
processed_files = set()
result_files = []
# Define the environment
# Combine Python/Nix environment details
environment_details = "Python 3.12 in a Nix-managed virtualenv (.venv). Hybrid approach using Nix flakes for system dependencies + pip for Python packages."
manifest.set_environment("Development Environment", environment_details, description="environment")
# Add the raw FILES_TO_INCLUDE content for context
manifest.set_environment("Files Selection", "Below is the raw FILES_TO_INCLUDE content before any processing:\n\n" + FILES_TO_INCLUDE_RAW, description="story")
# Check for missing files and collect them
missing_files = []
for file_path, comment in file_paths_with_comments:
file_path = file_path.strip()
if not file_path:
continue
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
if not os.path.exists(full_path):
missing_files.append(full_path)
# If any files are missing, raise an exception with clear details
if missing_files:
error_message = "The following files were not found:\n"
for missing_file in missing_files:
error_message += f" - {missing_file}\n"
error_message += "\nPlease check the file paths in FILES_TO_INCLUDE variable."
print(error_message)
sys.exit(1)
# Register key files with their contents and comments
for file_path, comment in file_paths_with_comments:
file_path = file_path.strip()
if not file_path or file_path in processed_files:
continue
processed_files.add(file_path)
result_files.append((file_path, comment)) # Store both path and comment
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
try:
with open(full_path, 'r', encoding='utf-8') as f:
content = f.read()
content_tokens = count_tokens(content)
# Check if adding this file would exceed token limit
if total_tokens + content_tokens > max_tokens:
description = f"{os.path.basename(file_path)} [skipped: would exceed token limit]"
if comment:
description += f" - {comment}"
manifest.add_file(
file_path,
description
)
continue
total_tokens += content_tokens
description = f"{os.path.basename(file_path)} [loaded]"
if comment:
description += f" - {comment}"
# Add file to manifest
manifest.add_file(
file_path,
description,
content=content
)
except UnicodeDecodeError as e:
print(f"ERROR: Could not decode {full_path}: {e}")
sys.exit(1) # Exit with error code for encoding issues
except Exception as e:
print(f"ERROR: Could not read {full_path}: {e}")
sys.exit(1) # Exit with error code for any other exceptions
# Add conventions and patterns only if we have room
remaining_tokens = max_tokens - total_tokens
if remaining_tokens > 1000: # Arbitrary threshold for metadata
manifest.add_convention(
"FastHTML Rendering",
"All FastHTML objects must be converted with to_xml() before being returned in HTTP responses"
)
manifest.add_convention(
"Environment Activation",
"Always run 'nix develop' in new terminals before any other commands"
)
manifest.add_convention(
"Dependency Management",
"System deps go in flake.nix, Python packages in requirements.txt"
)
manifest.add_critical_pattern(
"to_xml(ft_object)",
"Required to convert FastHTML objects to strings before HTTP responses"
)
manifest.add_critical_pattern(
"HTMLResponse(str(to_xml(rendered_item)))",
"Proper pattern for returning FastHTML content with HTMX triggers"
)
# Return the manifest content WITHOUT the outer <manifest> tags
manifest_xml = manifest.generate()
# Remove opening and closing manifest tags to prevent double wrapping
if manifest_xml.startswith('<manifest>') and manifest_xml.endswith('</manifest>'):
manifest_xml = manifest_xml[len('<manifest>'):-len('</manifest>')]
return manifest_xml, result_files, total_tokens
# Add a function to copy text to clipboard
def copy_to_clipboard(text):
"""Copy text to system clipboard."""
try:
# For Linux
import subprocess
process = subprocess.Popen(['xclip', '-selection', 'clipboard'], stdin=subprocess.PIPE)
process.communicate(text.encode('utf-8'))
return True
except Exception as e:
print(f"Warning: Could not copy to clipboard: {e}")
print("Make sure xclip is installed (sudo apt-get install xclip)")
return False
parser = argparse.ArgumentParser(description='Generate an XML context file for AI code assistance.')
parser.add_argument('prompt', nargs='?', default=None,
help='An optional prompt string. If omitted, reads from prompt.md.')
parser.add_argument('-o', '--output', type=str, default=None,
help='Optional: Output filename.')
parser.add_argument('--no-clipboard', action='store_true',
help='Disable copying output to clipboard.')
args = parser.parse_args()
# Get the file list with comments
final_file_list = get_files_with_comments() # Start with the default list
# Handle prompt file - now with default prompt.md behavior
prompt_path = args.prompt
prompt_content = None
direct_prompt = None # New variable to store direct string prompts
if prompt_path:
# Check if the prompt is a file path or direct string
if os.path.exists(prompt_path):
# It's a file path
if not os.path.isabs(prompt_path):
prompt_path = os.path.join(os.getcwd(), prompt_path)
try:
with open(prompt_path, 'r', encoding='utf-8') as f:
prompt_content = f.read()
# print(f"Using prompt file: {prompt_path}") # MODIFICATION: Commented out
except Exception as e:
print(f"Error reading prompt file {prompt_path}: {e}")
sys.exit(1)
else:
# It's a direct string prompt
direct_prompt = prompt_path # Store the direct string
prompt_content = prompt_path
# print("Using direct string prompt") # MODIFICATION: Commented out
else:
# If no prompt specified, look for prompt.md in current directory
prompt_path = os.path.join(os.getcwd(), "prompt.md")
if os.path.exists(prompt_path):
try:
with open(prompt_path, 'r', encoding='utf-8') as f:
prompt_content = f.read()
# print(f"Using default prompt file: {prompt_path}") # MODIFICATION: Commented out
except Exception as e:
print(f"Error reading default prompt file: {e}")
sys.exit(1)
# else: # MODIFICATION: Commented out
# print("No prompt file specified and prompt.md not found in current directory.")
# print("Running without a prompt file.")
if prompt_content:
# Add prompt file to files list if not already present
if prompt_path and os.path.exists(prompt_path):
# Check if this file path is already in the list
existing_paths = [file_path for file_path, comment in final_file_list]
if prompt_path not in existing_paths:
final_file_list.append((prompt_path, "User prompt file"))
# Set the output filename
output_filename = args.output
# Set the pre and post prompts from the active template
pre_prompt = active_template["pre_prompt"]
post_prompt = active_template["post_prompt"]
# Override post_prompt with direct string if provided
if direct_prompt:
post_prompt = direct_prompt
# print(f"Using template: {active_template['name']}") # MODIFICATION: Commented out
# Create the manifest and incorporate user's pre_prompt
manifest_xml, processed_files, manifest_tokens = create_pipulate_manifest(final_file_list)
manifest = manifest_xml
# Add the pre-prompt and separator (without duplicating manifest)
lines = [pre_prompt]
lines.append("=" * 20 + " START CONTEXT " + "=" * 20)
total_tokens = count_tokens(pre_prompt, "gpt-4")
# Process each file in the manifest file list
for file_path, comment in processed_files:
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
# Original detailed mode with markers, now including comments
comment_suffix = f" -- {comment}" if comment else ""
start_marker = f"# <<< START FILE: {full_path}{comment_suffix} >>>"
end_marker = f"# <<< END FILE: {full_path}{comment_suffix} >>>"
lines.append(start_marker)
try:
with open(full_path, 'r', encoding='utf-8') as infile:
file_content = infile.read()
file_tokens = count_tokens(file_content, "gpt-4")
token_info = f"\n# File token count: {format_token_count(file_tokens)}"
if comment:
token_info += f"\n# File purpose: {comment}"
lines.append(file_content + token_info)
except Exception as e:
error_message = f"# --- ERROR: Could not read file {full_path}: {e} ---"
print(f"ERROR: Could not read file {full_path}: {e}")
sys.exit(1) # Exit with error code
lines.append(end_marker)
# Add a separator and the post-prompt
lines.append("=" * 20 + " END CONTEXT " + "=" * 20)
post_prompt_tokens = count_tokens(post_prompt, "gpt-4")
if total_tokens + post_prompt_tokens <= MAX_TOKENS - TOKEN_BUFFER:
total_tokens += post_prompt_tokens
lines.append(post_prompt)
else:
print("Warning: Post-prompt skipped as it would exceed token limit")
# Calculate the final token count
def calculate_total_tokens(files_tokens, prompt_tokens):
"""Calculate total tokens and component breakdowns"""
file_tokens = sum(files_tokens.values())
total = file_tokens + prompt_tokens
return {
"files": file_tokens,
"prompt": prompt_tokens,
"total": total
}
def calculate_total_words(files_words, prompt_words):
"""Calculate total words and component breakdowns"""
file_words = sum(files_words.values())
total = file_words + prompt_words
return {
"files": file_words,
"prompt": prompt_words,
"total": total
}
# Calculate total tokens and words with proper accounting
files_tokens_dict = {}
files_words_dict = {}
for file_path, comment in processed_files:
try:
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
with open(full_path, 'r', encoding='utf-8') as f:
content = f.read()
files_tokens_dict[file_path] = count_tokens(content, "gpt-4")
files_words_dict[file_path] = count_words(content)
except Exception as e:
print(f"ERROR: Could not count tokens/words for {file_path}: {e}")
sys.exit(1) # Exit with error code
# Calculate prompt tokens and words
pre_prompt_tokens = count_tokens(pre_prompt, "gpt-4")
post_prompt_tokens = count_tokens(post_prompt, "gpt-4")
prompt_tokens = pre_prompt_tokens + post_prompt_tokens
pre_prompt_words = count_words(pre_prompt)
post_prompt_words = count_words(post_prompt)
prompt_words = pre_prompt_words + post_prompt_words
# Calculate totals
token_counts = calculate_total_tokens(files_tokens_dict, prompt_tokens)
word_counts = calculate_total_words(files_words_dict, prompt_words)
# Update the token summary in the output
token_summary_content = [
f"<total_context_size>{format_token_count(token_counts['total'])}</total_context_size>",
f"<files_tokens>{format_token_count(token_counts['files'])}</files_tokens>",
f"<prompt_tokens>{format_token_count(prompt_tokens)}</prompt_tokens>"
]
output_xml = (f'<?xml version="1.0" encoding="UTF-8"?>\n'
f'<context schema="pipulate-context" version="1.0">\n'
f'{create_xml_element("manifest", manifest)}\n'
f'{create_xml_element("pre_prompt", pre_prompt)}\n'
f'{create_xml_element("content", chr(10).join(lines))}\n'
f'{create_xml_element("post_prompt", post_prompt)}\n'
f'{create_xml_element("token_summary", token_summary_content)}\n'
f'</context>')
# ============================================================================
# MODIFIED OUTPUT SECTION
# ============================================================================
# --- Files Included Section ---
print("--- Files Included ---")
for file_path, comment in processed_files:
full_path = os.path.join(repo_root, file_path) if not os.path.isabs(file_path) else file_path
token_count = files_tokens_dict.get(file_path, 0)
print(f"• {full_path} ({format_token_count(token_count)})")
print()
# --- Token Summary Section ---
print("--- Token Summary ---")
print(f"Total tokens: {format_token_count(token_counts['total'])}")
if word_counts['total'] is not None:
print(f"Total words: {format_word_count(word_counts['total'])}")
# --- Size Perspective Section ---
size_info = format_size_with_comparison(word_counts['total'], token_counts['total'])
print(f"\n--- Size Perspective ---")
print(f"📝 Content size: {size_info['word_comparison']}")
print(f"🤖 Token size: {size_info['token_comparison']}")
# Calculate and show token-to-word ratio
ratio = token_counts['total'] / word_counts['total'] if word_counts['total'] > 0 else 0
print(f"📊 Token-to-word ratio: {ratio:.2f} (higher = more technical/structured content)")
print()
# Write the complete XML output to the file
if args.output:
try:
with open(args.output, 'w', encoding='utf-8') as outfile:
outfile.write(output_xml)
print(f"Output written to '{args.output}'")
except Exception as e:
print(f"Error writing to '{args.output}': {e}")
# By default, copy the output to clipboard unless --no-clipboard is specified
if not args.no_clipboard:
if copy_to_clipboard(output_xml):
print("Output copied to clipboard")
# print("\nScript finished.") # MODIFICATION: Commented out
# When creating the final output, use direct_prompt if available
if direct_prompt:
post_prompt = direct_prompt # Use the direct string instead of template XML
But the most important aspect here is the promotion of prompt_foo.py up to the repo root level. This is going to put it in my face all the time during development, reminding me that I can and should use this technique. It will always at least allow me to get easy second opinions on whatever is going on in whatever AI-assisted code editor I’m using, be it an actual editor or the command line, by packaging all the context up to go… to go anywhere. But right now only to Gemini 2.5 Pro, because that’s the only one capable of taking what is now appearing to be 140K tokens of the GPT-4-sized variety in one go.
My next step is going to be reworking tree to probably use eza --tree --git-ignore so I don’t need such a big filter on tree (see the sketch after this paragraph). I’ll also probably look at the output of prompt_foo.py more directly now to tend to that tree, trimming it and maybe finally cutting off the cruft/ folder. And a last thought is to also package in the output of Pyreverse (part of Pylint) or py2puml along with the tree, to give staggering context to the AI code assistants and really compelling reasons to trim the surface area of the repo. I’m thinking I may even take all those feather SVGs out and only wrap the ones into config.py on an as-needed basis.
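A minimal sketch of that eza swap for run_tree_command, assuming eza is available in the Nix environment (--tree and --git-ignore are real eza flags):

```python
import subprocess

def run_tree_command():
    """Use eza's tree view and let --git-ignore do the filtering,
    replacing the long -I exclusion pattern passed to tree."""
    try:
        result = subprocess.run(
            ['eza', '--tree', '--git-ignore'],
            capture_output=True, text=True, cwd=repo_root
        )
        return result.stdout if result.returncode == 0 else f"Error: {result.stderr}"
    except Exception as e:
        return f"Error running eza: {e}"
```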
The New Cornerstone: A Streamlined prompt_foo.py
So prompt_foo.py is both about better AI coding assistance in a platform-independent way and about helping both humans and AIs gain a better, more holistic overview of a git repo’s contents, dependencies and such.
And all of this really is geared towards speeding up iterative cycles by always making the code cleaner, shorter, be wrapped in better mental models as reflected by the filenames, directories and file arrangement. I have to be able to slam out this app or that tool based on this work.
Book Analysis
Ai Editorial Take
This entry is a perfect microcosm of the modern developer’s journey. It brilliantly connects the high-level, philosophical need for tool independence with the low-level, practical details of repository management. The concept of ‘AI ergonomics’ is particularly insightful and timely, offering a new vocabulary for an emerging best practice. The narrative arc—from identifying a problem, to executing a solution, to iterating on that solution after an AI interaction—is compelling and educational. This piece isn’t just a technical log; it’s a story about adapting, simplifying, and reclaiming control in the age of AI-assisted coding.
Title Brainstorm
- Title Option: Refactoring for AI Ergonomics
  - Filename: refactoring-for-ai-ergonomics.md
  - Rationale: This title introduces and centers the author’s key concept of ‘AI ergonomics,’ which is the core theme of the entry.
- Title Option: The Universal API: Future-Proofing with the Command Line
  - Filename: universal-api-command-line.md
  - Rationale: Focuses on the philosophical underpinning of the entry—that the CLI and Unix pipes are the most durable and portable interface for developers.
- Title Option: A Live Refactor: Tidying a Codebase for Human and AI Clarity
  - Filename: live-refactor-human-ai-clarity.md
  - Rationale: Highlights the ‘show, don’t tell’ nature of the entry, emphasizing the practical, hands-on demonstration of repository cleanup.
- Title Option: Escaping the Vendor Trap: A CLI-First Development Workflow
  - Filename: escaping-vendor-trap-cli-workflow.md
  - Rationale: Frames the entry around the timely and relatable problem of avoiding dependency on proprietary development environments.
Content Potential And Polish
- Core Strengths:
  - Authentic Problem-Solving: The entry provides a genuine, unvarnished look at a developer’s thought process, including missteps and corrections.
  - Practical Demonstration: It moves from high-level concepts like vendor lock-in to concrete git mv and ls commands, making the process tangible.
  - Novel Concept: The idea of “AI ergonomics” is a fresh and valuable frame for thinking about modern code repository management.
  - Code as Artifact: Including the final, simplified Python script provides a powerful payoff and a useful tool for the reader.
- Suggestions For Polish:
  - Abstract the Specifics: For a broader audience, consider replacing hardcoded paths like /home/mike/repos/pipulate with placeholders like ~/path/to/your-repo in the explanatory text.
  - Visualize the Change: Add a simple ‘before’ and ‘after’ directory tree diagram (perhaps using ASCII or a simple graphic) to visually reinforce the impact of the refactoring.
  - Add a Glossary Box: Briefly define tools like ripgrep, xargs, sed, and eza for readers who may not be CLI power users.
  - Structure the AI Interaction: Clearly demarcate the prompts given to the AI and the (flawed) responses to better illustrate the feedback loop.
Next Step Prompts
- Based on the ‘before’ state implied by the git mv helpers/ scripts command and the final ls -la output, generate two ASCII art directory trees representing the Pipulate repository structure ‘Before’ and ‘After’ the refactoring.
- Analyze the final prompt_foo.py script and create a Mermaid.js sequence diagram illustrating the primary workflow: a user runs the script from the command line, it loads foo_files.py, reads prompt.md, packages the content into an XML payload, and copies it to the clipboard.