Setting the Stage: Context for the Curious Book Reader
The Future of Context Artistry
The article introduces a novel concept: “Context is the new King, not Content.” This piece delves into the art of “context painting” for AI, where human storytelling meets programmatic precision. Discover how a custom tool, prompt_foo.py, ensures token accuracy and facilitates a unique feedback loop for AI development.
Technical Journal Entry Begins
Context is the new King — not Content.
And painting context in an artistically painterly way is the new skill.
And that is a future-proofed skill because all you’re doing is regaling them with stories with the age-old skill of the storyteller around the campfire. I shit you not. Storytelling for AIs is THE thing. You’ll see that in action on the post after this one, layering up the story bit by bit.
The Art of Context and Token Control
But speaking of bits, as the Context Artist, I need to keep track of every little bit, because bite by byte they add up to bytes. I’m talkin’ tokens; OpenAI’s tiktoken in particular. And of those, the gpt-4o sized ones, which seem massively popular and which, when you pull off the mask, are really o200k_base in disguise. These are tokenized chunks of text, typically larger than a letter but often smaller than a word. So if you want to really make an LLM look twice, you take a word like:
flibbertigibbet
…and another like:
snickerdoodle
…and then you combine them in Ned Flanders-speak (The Simpsons) and maybe layer in some Ned Ryerson (Groundhog Day) for:
flibbertisnickerdoodledidoo
…and you have some token combinations the LLM has likely never seen. The word as a whole has probably never existed before, yet its pieces are rare-but-real token components, so it can’t be dismissed as someone banging randomly on the keyboard. It’s a shocker.
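To see how a coined word like this breaks apart, here is a minimal sketch. It assumes tiktoken is installed and can load the encoding for gpt-4o; if not, it falls back to whitespace splitting, the same way count_tokens in prompt_foo.py does. The helper name split_into_tokens is hypothetical and not part of prompt_foo.py.

```python
from typing import List


def split_into_tokens(text: str, model: str = "gpt-4o") -> List[str]:
    """Return the text pieces behind each token (hypothetical helper).

    Falls back to whitespace-separated words if tiktoken is missing
    or the encoding cannot be loaded.
    """
    try:
        import tiktoken
        encoding = tiktoken.encoding_for_model(model)
        token_ids = encoding.encode(text)
        # Decode each token id on its own so the boundaries stay visible.
        return [
            encoding.decode_single_token_bytes(token_id).decode("utf-8", errors="replace")
            for token_id in token_ids
        ]
    except Exception:
        return text.split()


for word in ("flibbertigibbet", "snickerdoodle", "flibbertisnickerdoodledidoo"):
    pieces = split_into_tokens(word)
    print(f"{word}: {len(pieces)} piece(s) -> {pieces}")
```

The exact split depends on the encoding, so no expected output is shown here. The thing to look for is whether the mashed-up word costs more pieces than its two source words do separately.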
We keep this in mind when getting coding assistance, especially when naming our programs, variables, and objects, such as the program I’m about to get help with here to tighten up my token control.
The Challenge: Real-time Token Verification
Me: The token count isn’t always accurate at the very end. Add something at the end which uses tiktoken to get the total token count of what gets copied in the copy buffer and optionally written out with the -o parameter. In other words, I want the console output to help me compare the totaled-up so far token count with the real token count. Does that make sense? Please and thank you!
Implementing the Ground Truth Token Count
prompt_foo.py:
#!/usr/bin/env python3
# prompt_foo.py

import os
import re
import sys
import pydot
import yaml
import argparse
import tiktoken
import subprocess
import tempfile
import shutil
from datetime import datetime
from typing import Dict, List, Optional, Tuple

try:
    import jupytext
    JUPYTEXT_AVAILABLE = True
except ImportError:
    JUPYTEXT_AVAILABLE = False

# Hello there, AI! This is a tool for generating a single, comprehensive prompt
# from the command line, bundling codebase files and auto-generated context
# into a structured Markdown format for effective AI assistance.

# ============================================================================
# --- Configuration ---
# ============================================================================
def find_repo_root(start_path: str) -> str:
    """Find the git repository root from a starting path."""
    path = os.path.abspath(start_path)
    while path != os.path.dirname(path):
        if os.path.isdir(os.path.join(path, '.git')):
            return path
        path = os.path.dirname(path)
    raise FileNotFoundError("Could not find the repository root (.git directory).")

REPO_ROOT = find_repo_root(os.path.dirname(__file__))

CONFIG = {
    "PROJECT_NAME": "pipulate",
    "POSTS_DIRECTORY": "/home/mike/repos/MikeLev.in/_posts"
}

# ============================================================================
# --- Literary Size Scale & Token/Word Counting ---
# ============================================================================
LITERARY_SIZE_SCALE = [
    (3000, "Short Essay"), (7500, "Short Story"), (20000, "Novelette"),
    (50000, "Novella or a Master's Dissertation"),
    (80000, "Average Paperback Novel or a Ph.D. Dissertation"),
    (120000, "Long Novel"), (200000, "Epic Fantasy Novel"),
    (500000, "Seriously Long Epic (like 'Infinite Jest')"),
]

def get_literary_perspective(word_count: int, token_word_ratio: float) -> str:
    description = f"Longer than {LITERARY_SIZE_SCALE[-1][1]}"
    for words, desc in LITERARY_SIZE_SCALE:
        if word_count <= words:
            description = desc
            break
    density_warning = ""
    if token_word_ratio > 1.8:
        density_warning = (
            f" (Note: With a token/word ratio of {token_word_ratio:.2f}, "
            f"this content is far denser and more complex than typical prose of this length)."
        )
    return f"📚 Equivalent in length to a **{description}**{density_warning}"

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        return len(text.split())

def count_words(text: str) -> int:
    return len(text.split())

# ============================================================================
# --- Auto-Context Generation (UML, Tree, Narrative) ---
# ============================================================================
def generate_uml_and_dot(target_file: str, project_name: str) -> Dict:
    # This function remains unchanged.
    pyreverse_exec = shutil.which("pyreverse")
    plantuml_exec = shutil.which("plantuml")
    if not pyreverse_exec or not plantuml_exec:
        msg = []
        if not pyreverse_exec: msg.append("`pyreverse` (from pylint)")
        if not plantuml_exec: msg.append("`plantuml`")
        return {"ascii_uml": f"Skipping: Required command(s) not found: {', '.join(msg)}."}

    target_path = os.path.join(REPO_ROOT, target_file)
    if not os.path.exists(target_path):
        return {"ascii_uml": f"Skipping: Target file for UML generation not found: {target_path}"}

    with tempfile.TemporaryDirectory() as temp_dir:
        dot_file_path = os.path.join(temp_dir, "classes.dot")
        puml_file_path = os.path.join(temp_dir, "diagram.puml")
        try:
            pyreverse_cmd = [pyreverse_exec, "-f", "dot", "-o", "dot", "-p", project_name, target_path]
            subprocess.run(pyreverse_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
            generated_dot_name = f"classes_{project_name}.dot"
            os.rename(os.path.join(temp_dir, generated_dot_name), dot_file_path)
        except (subprocess.CalledProcessError, FileNotFoundError) as e:
            error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
            return {"ascii_uml": f"Error: pyreverse failed. {error_msg}", "dot_graph": None}

        try:
            graphs = pydot.graph_from_dot_file(dot_file_path)
            if not graphs:
                return {"ascii_uml": f"Note: No classes found in {target_file} to generate a diagram.", "dot_graph": None}
            graph = graphs[0]
            dot_content = graph.to_string()
            puml_lines = ["@startuml", "skinparam linetype ortho", ""]
            def sanitize_line(line):
                clean = re.sub(r'<br[^>]*>', '', line)
                clean = re.sub(r'<[^>]+>', '', clean)
                return clean.strip()
            for node in graph.get_nodes():
                label = node.get_label()
                if not label: continue
                parts = label.strip('<>{} ').split('|')
                class_name = sanitize_line(parts[0])
                puml_lines.append(f"class {class_name} {{")
                if len(parts) > 1:
                    for attr in re.split(r'<br[^>]*>', parts[1]):
                        clean_attr = sanitize_line(attr).split(':')[0].strip()
                        if clean_attr: puml_lines.append(f" - {clean_attr}")
                if len(parts) > 2:
                    method_block = parts[2].strip()
                    for method_line in re.split(r'<br[^>]*>', method_block):
                        clean_method = sanitize_line(method_line)
                        if clean_method: puml_lines.append(f" + {clean_method}")
                puml_lines.append("}\n")
            for edge in graph.get_edges():
                source_name = edge.get_source().strip('"').split('.')[-1]
                dest_name = edge.get_destination().strip('"').split('.')[-1]
                puml_lines.append(f"{source_name} ..> {dest_name}")
            puml_lines.append("@enduml")
            with open(puml_file_path, 'w') as f: f.write('\n'.join(puml_lines))
        except Exception as e:
            with open(dot_file_path, 'r') as f: dot_content_on_error = f.read()
            return {"ascii_uml": f"Error: DOT to PUML conversion failed. {str(e)}", "dot_graph": dot_content_on_error}

        try:
            plantuml_cmd = ["plantuml", "-tutxt", puml_file_path]
            subprocess.run(plantuml_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
            utxt_file_path = puml_file_path.replace(".puml", ".utxt")
            with open(utxt_file_path, 'r') as f: ascii_uml = f.read()
            lines = ascii_uml.splitlines()
            non_empty_lines = [line for line in lines if line.strip()]
            if non_empty_lines:
                min_indent = min(len(line) - len(line.lstrip(' ')) for line in non_empty_lines)
                dedented_lines = [line[min_indent:] for line in lines]
                stripped_lines = [line.rstrip() for line in dedented_lines]
                ascii_uml = '\n'.join(stripped_lines)
                if ascii_uml: ascii_uml = '\n' + ascii_uml
        except (subprocess.CalledProcessError, FileNotFoundError) as e:
            error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
            return {"ascii_uml": f"Error: plantuml failed. {error_msg}", "dot_graph": dot_content}

    return {"ascii_uml": ascii_uml, "dot_graph": dot_content}

def _get_article_list_data(posts_dir: str = CONFIG["POSTS_DIRECTORY"]) -> List[Dict]:
    posts_data = []
    if not os.path.isdir(posts_dir):
        print(f"Warning: Article directory not found at {posts_dir}", file=sys.stderr)
        return []
    for filename in os.listdir(posts_dir):
        if not filename.endswith((".md", ".markdown")): continue
        filepath = os.path.join(posts_dir, filename)
        try:
            date_str = filename[:10]
            post_date = datetime.strptime(date_str, '%Y-%m-%d').date()
            with open(filepath, 'r', encoding='utf-8') as f: content = f.read()
            if content.startswith('---'):
                parts = content.split('---', 2)
                front_matter = yaml.safe_load(parts[1]) or {}
                posts_data.append({
                    'path': filepath, 'date': post_date,
                    'sort_order': int(front_matter.get('sort_order', 0)),
                    'title': front_matter.get('title', 'Untitled'),
                    'summary': front_matter.get('meta_description', '')
                })
        except (ValueError, yaml.YAMLError, IndexError): continue
    return sorted(posts_data, key=lambda p: (p['date'], p['sort_order']))

def parse_slice_arg(arg_str: str):
    if not arg_str or not arg_str.startswith('[') or not arg_str.endswith(']'): return None
    content = arg_str[1:-1].strip()
    if ':' in content:
        parts = content.split(':', 1)
        start = int(parts[0].strip()) if parts[0].strip() else None
        end = int(parts[1].strip()) if parts[1].strip() else None
        return slice(start, end)
    elif content: return int(content)
    return slice(None, None)

def run_tree_command() -> str:
    eza_exec = shutil.which("eza")
    if not eza_exec: return "Skipping: `eza` command not found."
    try:
        result = subprocess.run(
            [eza_exec, '--tree', '--git-ignore', '--color=never'],
            capture_output=True, text=True, cwd=REPO_ROOT, check=True
        )
        return result.stdout
    except Exception as e: return f"Error running eza command: {e}"

# ============================================================================
# --- Helper Functions (File Parsing, Clipboard) ---
# ============================================================================
def parse_file_list_from_config() -> List[Tuple[str, str]]:
    try:
        import foo_files
        files_raw = foo_files.FILES_TO_INCLUDE_RAW
    except (ImportError, AttributeError):
        print("ERROR: foo_files.py not found or doesn't contain FILES_TO_INCLUDE_RAW.")
        sys.exit(1)
    lines = files_raw.strip().splitlines()
    seen_files, parsed_files = set(), []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'): continue
        parts = re.split(r'\s*<--\s*|\s*#\s*', line, 1)
        file_path = parts[0].strip()
        comment = parts[1].strip() if len(parts) > 1 else ""
        if file_path and file_path not in seen_files:
            seen_files.add(file_path)
            parsed_files.append((file_path, comment))
    return parsed_files

def copy_to_clipboard(text: str):
    if not shutil.which('xclip'):
        print("\nWarning: 'xclip' not found. Cannot copy to clipboard.")
        return
    try:
        subprocess.run(['xclip', '-selection', 'clipboard'], input=text.encode('utf-8'), check=True)
        print("Markdown output copied to clipboard")
    except Exception as e:
        print(f"\nWarning: Could not copy to clipboard: {e}")

def check_dependencies():
    """Verifies that all required external command-line tools are installed."""
    print("Checking for required external dependencies...")
    dependencies = {
        "pyreverse": "Provided by `pylint`. Install with: pip install pylint",
        "plantuml": "A Java-based tool. See https://plantuml.com/starting",
        "eza": "A modern replacement for `ls`. See https://eza.rocks/install",
        "xclip": "Clipboard utility for Linux. Install with your package manager (e.g., sudo apt-get install xclip)",
    }
    missing = []
    for tool, instructions in dependencies.items():
        if not shutil.which(tool):
            missing.append((tool, instructions))
    if not missing:
        print("✅ All dependencies found.")
    else:
        print("\n❌ Missing dependencies detected:")
        for tool, instructions in missing:
            print(f" - Command not found: `{tool}`")
            print(f" ↳ {instructions}")
        print("\nPlease install the missing tools and ensure they are in your system's PATH.")
        sys.exit(1)

# ============================================================================
# --- Refined PromptBuilder Class ---
# ============================================================================
class PromptBuilder:
    """
    Builds a complete, structured Markdown prompt with consistent START/END markers
    for each section, including Manifest, Story, File Tree, UMLs, Codebase, and Prompt.
    """
    def __init__(self, processed_files: List[Dict], prompt_text: str, context_only: bool = False, list_arg: Optional[str] = None):
        self.processed_files = processed_files
        self.prompt_text = prompt_text
        self.context_only = context_only
        self.list_arg = list_arg
        self.auto_context = {}
        self.all_sections = {}

    def add_auto_context(self, title: str, content: str):
        is_narrative = (title == "Recent Narrative Context")
        is_article = (title == "Full Article Content") ### NEW/MODIFIED ###
        content_is_valid = bool(content)
        filter_passed = "error" not in content.lower() and "skipping" not in content.lower()
        if content_is_valid and (is_narrative or is_article or filter_passed): ### NEW/MODIFIED ###
            self.auto_context[title] = {
                'content': content, 'tokens': count_tokens(content), 'words': count_words(content)
            }

    def _build_manifest_content(self) -> str:
        lines = []
        ### NEW/MODIFIED ###
        section_order = ["Story", "File Tree", "UML Diagrams", "Articles", "Codebase", "Prompt"]
        for section_name in section_order:
            if section_name in self.all_sections:
                data = self.all_sections[section_name]
                token_str = f"({data['tokens']:,} tokens)" if data['tokens'] > 0 else ""
                lines.append(f"- {section_name} {token_str}")
        return "\n".join(lines)

    def _build_story_content(self) -> str:
        title = "Recent Narrative Context"
        return self.auto_context.get(title, {}).get('content', '').strip()

    def _build_tree_content(self) -> str:
        title = "Codebase Structure (eza --tree)"
        if title in self.auto_context:
            content = self.auto_context[title]['content'].strip()
            return f"```text\n{content}\n```"
        return ""

    def _build_uml_content(self) -> str:
        uml_parts = []
        for title, data in self.auto_context.items():
            if "UML Class Diagram" in title:
                uml_parts.append(f"## {title}\n```text\n{data['content']}\n```")
        return "\n\n".join(uml_parts)

    ### NEW/MODIFIED ###
    def _build_articles_content(self) -> str:
        title = "Full Article Content"
        return self.auto_context.get(title, {}).get('content', '').strip()

    def _build_codebase_content(self) -> str:
        if self.context_only: return ""
        if not self.processed_files: return ""
        lines = []
        for f in self.processed_files:
            lines.append(f"--- START: {f['path']} ({f['tokens']:,} tokens) ---")
            lines.append(f"```{f['lang']}:{f['path']}")
            lines.append(f['content'])
            lines.append("```")
            lines.append(f"--- END: {f['path']} ---\n")
        return "\n".join(lines).strip()

    def _build_prompt_content(self) -> str:
        checklist = self._generate_ai_checklist()
        return f"{checklist}\n\n{self.prompt_text}"

    def _generate_ai_checklist(self) -> str:
        return '''# AI Self-Correction Checklist
Before addressing the user's prompt, perform the following verification steps:
1. **Review the Manifest vs. Codebase Structure:** Compare the file list in the manifest against the `eza --tree` output. Are there any obviously related, critical files missing from the manifest that would be necessary to understand the user's request? (e.g., if `core.py` is present, is `__init__.py` also present?).
2. **Analyze Import Statements:** Briefly scan the `import` statements in the provided Python files. Do they suggest dependencies on local files that were *not* included in the manifest?
3. **Check for Incompleteness:** If you determine that critical files are missing, do not proceed. Instead, your **primary task** is to inform me of the missing files and recommend adding them to `foo_files.py` to create a more complete context.
4. **Confirm Understanding:** If the context appears complete, state "Context verified." and then proceed with the user's request.
'''

    def build_final_prompt(self) -> str:
        """Assembles all parts into the final, structured Markdown string."""
        # Prepare content for all sections
        story_content = self._build_story_content()
        tree_content = self._build_tree_content()
        uml_content = self._build_uml_content()
        articles_content = self._build_articles_content() ### NEW/MODIFIED ###
        codebase_content = self._build_codebase_content()
        prompt_content = self._build_prompt_content()

        # Define placeholder messages
        ### NEW/MODIFIED ###
        placeholders = {
            "Story": f"# Narrative context not requested. Use the -l or --list flag to include recent articles.",
            "File Tree": "# File tree generation failed or was skipped.",
            "UML Diagrams": "# No Python files with classes were included, or UML generation failed.",
            "Articles": "# No full articles requested. Use the -a or --article flag to include full article content.",
            "Codebase": ("# No files were specified for inclusion in the codebase." if not self.processed_files
                         else "# Running in --context-only mode. File contents are omitted."),
        }

        # Store final content and tokens for the manifest calculation
        self.all_sections["Story"] = {'content': story_content, 'tokens': count_tokens(story_content)}
        self.all_sections["File Tree"] = {'content': tree_content, 'tokens': count_tokens(tree_content)}
        self.all_sections["UML Diagrams"] = {'content': uml_content, 'tokens': count_tokens(uml_content)}
        self.all_sections["Articles"] = {'content': articles_content, 'tokens': count_tokens(articles_content)} ### NEW/MODIFIED ###
        self.all_sections["Codebase"] = {'content': codebase_content, 'tokens': sum(f['tokens'] for f in self.processed_files) if not self.context_only else 0}
        self.all_sections["Prompt"] = {'content': prompt_content, 'tokens': count_tokens(prompt_content)}
        manifest_content = self._build_manifest_content()
        self.all_sections["Manifest"] = {'content': manifest_content, 'tokens': count_tokens(manifest_content)}

        # Assemble the final output string with START/END markers for all sections
        ### NEW/MODIFIED ###
        parts = ["# KUNG FU PROMPT CONTEXT\n\nWhat you will find below is:\n\n- Manifest\n- Story\n- File Tree\n- UML Diagrams\n- Articles\n- Codebase\n- Prompt"]

        def add_section(name, content, placeholder):
            final_content = content.strip() if content and content.strip() else placeholder
            parts.append(f"--- START: {name} ---\n{final_content}\n--- END: {name} ---")

        add_section("Manifest", manifest_content, "# Manifest generation failed.")
        story_placeholder = placeholders["Story"] if self.list_arg is None else "# No articles found for the specified slice."
        add_section("Story", story_content, story_placeholder)
        add_section("File Tree", tree_content, placeholders["File Tree"])
        add_section("UML Diagrams", uml_content, placeholders["UML Diagrams"])
        add_section("Articles", articles_content, placeholders["Articles"]) ### NEW/MODIFIED ###
        add_section("Codebase", codebase_content, placeholders["Codebase"])
        add_section("Prompt", prompt_content, "# No prompt was provided.")
        return "\n\n".join(parts)

    def print_summary(self):
        """Calculates and prints an accurate, comprehensive summary to the console."""
        print("--- Files Included ---")
        for f in self.processed_files:
            if self.context_only:
                print(f"• {f['path']} (content omitted)")
            else:
                print(f"• {f['path']} ({f['tokens']:,} tokens)")
        if self.auto_context:
            print("\n--- Auto-Context Included ---")
            for title, data in self.auto_context.items():
                print(f"• {title} ({data['tokens']:,} tokens)")
        print("\n--- Prompt Summary ---")
        if self.context_only:
            print("NOTE: Running in --context-only mode. File contents are excluded.")
        total_tokens = sum(v.get('tokens', 0) for v in self.all_sections.values())
        total_words = sum(count_words(v.get('content', '')) for v in self.all_sections.values())
        print(f"Total Tokens: {total_tokens:,}")
        print(f"Total Words: {total_words:,}")
        ratio = total_tokens / total_words if total_words > 0 else 0
        perspective = get_literary_perspective(total_words, ratio)
        print("\n--- Size Perspective ---")
        print(perspective)
        print()

# ============================================================================
# --- Main Execution Logic ---
# ============================================================================
def main():
    """Main function to parse args, process files, and generate output."""
    parser = argparse.ArgumentParser(description='Generate a Markdown context file for AI code assistance.')
    parser.add_argument('prompt', nargs='?', default=None, help='A prompt string or path to a prompt file (e.g., prompt.md).')
    parser.add_argument('-o', '--output', type=str, help='Optional: Output filename.')
    parser.add_argument('--no-clipboard', action='store_true', help='Disable copying output to clipboard.')
    parser.add_argument('--check-dependencies', action='store_true', help='Verify that all required external tools are installed.')
    parser.add_argument('--context-only', action='store_true', help='Generate a context-only prompt without file contents.')
    parser.add_argument(
        '-l', '--list',
        nargs='?', const='[-5:]', default=None,
        help='Include a list of recent articles. Optionally provide a slice, e.g., "[:]". Defaults to "[-5:]".'
    )
    ### NEW/MODIFIED ###
    parser.add_argument(
        '-a', '--article',
        nargs='?', const='[-1:]', default=None,
        help='Include FULL CONTENT of recent articles. Provide a slice, e.g., "[-5:]". Defaults to "[-1:]".'
    )
    args = parser.parse_args()

    if args.check_dependencies:
        check_dependencies()
        sys.exit(0)

    # 1. Handle user prompt
    prompt_content = "Please review the provided context and assist with the codebase."
    if args.prompt:
        if os.path.exists(args.prompt):
            with open(args.prompt, 'r', encoding='utf-8') as f: prompt_content = f.read()
        else:
            prompt_content = args.prompt
    elif os.path.exists("prompt.md"):
        with open("prompt.md", 'r', encoding='utf-8') as f: prompt_content = f.read()

    # 2. Process all specified files
    files_to_process = parse_file_list_from_config()
    processed_files_data = []
    print("--- Processing Files ---")
    for path, comment in files_to_process:
        full_path = os.path.join(REPO_ROOT, path) if not os.path.isabs(path) else path
        if not os.path.exists(full_path):
            print(f"Warning: File not found and will be skipped: {full_path}")
            continue
        content, lang = "", "text"
        ext = os.path.splitext(path)[1].lower()
        if ext == '.ipynb':
            if JUPYTEXT_AVAILABLE:
                print(f" -> Converting notebook: {path}")
                try:
                    notebook = jupytext.read(full_path)
                    content = jupytext.writes(notebook, fmt='py:percent')
                    lang = 'python'
                except Exception as e:
                    content = f"# FAILED TO CONVERT NOTEBOOK: {path}\n# ERROR: {e}"
                    print(f"Warning: Failed to convert {path}: {e}")
            else:
                content = f"# SKIPPING NOTEBOOK CONVERSION: jupytext not installed for {path}"
                print(f"Warning: `jupytext` library not found. Skipping conversion for {path}.")
        else:
            try:
                with open(full_path, 'r', encoding='utf-8') as f: content = f.read()
                lang_map = {'.py': 'python', '.js': 'javascript', '.html': 'html', '.css': 'css', '.md': 'markdown', '.json': 'json', '.nix': 'nix', '.sh': 'bash'}
                lang = lang_map.get(ext, 'text')
            except Exception as e:
                print(f"ERROR: Could not read or process {full_path}: {e}")
                sys.exit(1)
        processed_files_data.append({
            "path": path, "comment": comment, "content": content,
            "tokens": count_tokens(content), "words": count_words(content), "lang": lang
        })

    # 3. Build the prompt and add auto-generated context
    builder = PromptBuilder(processed_files_data, prompt_content, context_only=args.context_only, list_arg=args.list)
    print("\n--- Generating Auto-Context ---")
    print("Generating codebase tree diagram...", end='', flush=True)
    tree_output = run_tree_command()
    title = "Codebase Structure (eza --tree)"
    builder.add_auto_context(title, tree_output)
    print(f" ({builder.auto_context.get(title, {}).get('tokens', 0):,} tokens)")

    if args.list is not None:
        print("Adding narrative context from articles...", end='', flush=True)
        all_articles = _get_article_list_data()
        sliced_articles = []
        try:
            slice_or_index = parse_slice_arg(args.list)
            if isinstance(slice_or_index, int): sliced_articles = [all_articles[slice_or_index]]
            elif isinstance(slice_or_index, slice): sliced_articles = all_articles[slice_or_index]
        except (ValueError, IndexError):
            print(f" (invalid slice '{args.list}')")
            sliced_articles = []
        if sliced_articles:
            narrative_content = "\n".join(
                f"### {article['title']} ({article['date']})\n> {article['summary']}\n"
                for article in sliced_articles
            )
            builder.add_auto_context("Recent Narrative Context", narrative_content)
            print(f" ({len(sliced_articles)} articles)")
        else:
            print(" (no articles found or invalid slice)")

    ### NEW/MODIFIED ###
    if args.article is not None:
        print("Adding full article content...", end='', flush=True)
        all_articles = _get_article_list_data()
        sliced_articles = []
        try:
            slice_or_index = parse_slice_arg(args.article)
            if isinstance(slice_or_index, int):
                sliced_articles = [all_articles[slice_or_index]]
            elif isinstance(slice_or_index, slice):
                sliced_articles = all_articles[slice_or_index]
        except (ValueError, IndexError):
            print(f" (invalid slice '{args.article}')")
        if sliced_articles:
            full_content_parts = []
            for article in sliced_articles:
                try:
                    with open(article['path'], 'r', encoding='utf-8') as f:
                        content = f.read()
                    # Add a header for each article to separate them clearly
                    full_content_parts.append(f"--- START: Article: {os.path.basename(article['path'])} ---\n{content.strip()}\n--- END: Article ---\n")
                except Exception as e:
                    print(f"\nWarning: Could not read article {article['path']}: {e}")
            if full_content_parts:
                full_article_content = "\n".join(full_content_parts)
                builder.add_auto_context("Full Article Content", full_article_content)
                print(f" ({len(sliced_articles)} full articles)")
        else:
            print(" (no articles found or invalid slice)")

    python_files_to_diagram = [f['path'] for f in processed_files_data if f['path'].endswith('.py')]
    if python_files_to_diagram:
        print("Python file(s) detected. Generating UML diagrams...")
        for py_file_path in python_files_to_diagram:
            print(f" -> Generating for {py_file_path}...", end='', flush=True)
            uml_context = generate_uml_and_dot(py_file_path, CONFIG["PROJECT_NAME"])
            uml_content = uml_context.get("ascii_uml")
            title = f"UML Class Diagram (ASCII for {py_file_path})"
            builder.add_auto_context(title, uml_content)
            if title in builder.auto_context:
                print(f" ({builder.auto_context[title]['tokens']:,} tokens)")
            elif uml_content and "note: no classes" in uml_content.lower():
                print(" (skipped, no classes)")
            else:
                print(" (skipped)")
        print("...UML generation complete.\n")

    # 4. Generate final output and print summary
    final_output = builder.build_final_prompt()
    builder.print_summary()

    # 5. Handle output
    if args.output:
        with open(args.output, 'w', encoding='utf-8') as f: f.write(final_output)
        print(f"Output written to '{args.output}'")
    if not args.no_clipboard:
        copy_to_clipboard(final_output)

if __name__ == "__main__":
    main()
…and an example foo_files.py:
FILES_TO_INCLUDE_RAW = """\
/home/mike/repos/pipulate/flake.nix
/home/mike/repos/pipulate/.gitignore
/home/mike/repos/pipulate/pipulate/__init__.py
/home/mike/repos/pipulate/pipulate/pipulate.py
/home/mike/repos/pipulate/Notebooks/secretsauce.py
/home/mike/repos/pipulate/pipulate/core.py
/home/mike/repos/pipulate/Notebooks/A_Hi_2_AI.ipynb
/home/mike/repos/pipulate/Notebooks/FAQuilizer.ipynb
/home/mike/repos/pipulate/Notebooks/GAPalyzer.ipynb
/home/mike/repos/pipulate/assets/nbs/A_Hi_2_AI.ipynb
/home/mike/repos/pipulate/assets/nbs/GAPalyzer.ipynb
/home/mike/repos/pipulate/assets/nbs/FAQuilizer.ipynb
/home/mike/repos/MikeLev.in/_posts/2025-10-18-gapalyzer-mvp-excel-automation-ai-precision.md
"""
Gemini: Yep, that makes perfect sense. You want to get the “ground truth” token count from the final, fully-assembled string and compare it to the count you’ve been summing up from all the individual parts. This will catch any tokens introduced by the markers, newlines, and other formatting.
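A toy sketch of why the two numbers drift apart. It assumes a deterministic stand-in counter (word_count) in place of the tiktoken-backed count_tokens, so the arithmetic is easy to follow. The helper names assemble and compare_counts are hypothetical; only the START/END marker format is borrowed from prompt_foo.py.

```python
from typing import Callable, Dict, Tuple


def word_count(text: str) -> int:
    """Stand-in counter (assumption): whitespace-separated words, not real tokens."""
    return len(text.split())


def assemble(sections: Dict[str, str]) -> str:
    """Wrap each section in START/END markers and join with blank lines."""
    return "\n\n".join(
        f"--- START: {name} ---\n{body}\n--- END: {name} ---"
        for name, body in sections.items()
    )


def compare_counts(
    sections: Dict[str, str], count: Callable[[str], int] = word_count
) -> Tuple[int, int]:
    """Return (summed, verified): per-section total vs. count of the final string."""
    summed = sum(count(body) for body in sections.values())
    verified = count(assemble(sections))
    return summed, verified


sections = {"Story": "once upon a time", "Prompt": "fix the bug"}
summed, verified = compare_counts(sections)
print(f"summed={summed}, verified={verified}, overhead={verified - summed}")
# prints: summed=7, verified=23, overhead=16
```

The overhead here is entirely the markers. With a real tokenizer the separating newlines count as well, which is why only a count of the final string can serve as ground truth.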
Here’s the plan:
- In main(), after final_output = builder.build_final_prompt() is called, we’ll immediately use count_tokens(final_output) to get the real token count.
- We’ll modify builder.print_summary() to accept this new verified_token_count as an argument.
- Inside print_summary(), we’ll update the console output to show both the original “Summed” count and the new “Verified” count, plus the difference if one exists.
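The reporting step of that plan might look something like the sketch below. This is an illustration under assumptions, not the actual updated print_summary(): the function name format_token_summary and the exact label wording are hypothetical, and it returns a string rather than printing so it can be checked in isolation.

```python
def format_token_summary(summed_tokens: int, verified_tokens: int) -> str:
    """Build console lines comparing the summed and verified token counts.

    Hypothetical helper: the labels are illustrative, not the tool's real output.
    """
    lines = [
        f"Summed Tokens:   {summed_tokens:,} (from section parts)",
        f"Verified Tokens: {verified_tokens:,} (from final output)",
    ]
    difference = verified_tokens - summed_tokens
    if difference != 0:
        # Only mention the gap when there is one, with an explicit sign.
        lines.append(f"  (Difference: {difference:+,})")
    return "\n".join(lines)


print(format_token_summary(12_000, 12_345))
```

In main(), the verified number would come from calling count_tokens(final_output) right after build_final_prompt(), as the plan describes.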
Here is the updated prompt_foo.py file with the changes. I’ve marked the modified sections with ### MODIFIED ###.
The PromptBuilder: Bridging Context and Code
prompt_foo.py (Updated)
#!/usr/bin/env python3
# prompt_foo.py

import os
import re
import sys
import pydot
import yaml
import argparse
import tiktoken
import subprocess
import tempfile
import shutil
from datetime import datetime
from typing import Dict, List, Optional, Tuple

try:
    import jupytext
    JUPYTEXT_AVAILABLE = True
except ImportError:
    JUPYTEXT_AVAILABLE = False

# Hello there, AI! This is a tool for generating a single, comprehensive prompt
# from the command line, bundling codebase files and auto-generated context
# into a structured Markdown format for effective AI assistance.

# ============================================================================
# --- Configuration ---
# ============================================================================
def find_repo_root(start_path: str) -> str:
    """Find the git repository root from a starting path."""
    path = os.path.abspath(start_path)
    while path != os.path.dirname(path):
        if os.path.isdir(os.path.join(path, '.git')):
            return path
        path = os.path.dirname(path)
    raise FileNotFoundError("Could not find the repository root (.git directory).")

REPO_ROOT = find_repo_root(os.path.dirname(__file__))

CONFIG = {
    "PROJECT_NAME": "pipulate",
    "POSTS_DIRECTORY": "/home/mike/repos/MikeLev.in/_posts"
}

# ============================================================================
# --- Literary Size Scale & Token/Word Counting ---
# ============================================================================
LITERARY_SIZE_SCALE = [
    (3000, "Short Essay"), (7500, "Short Story"), (20000, "Novelette"),
    (50000, "Novella or a Master's Dissertation"),
    (80000, "Average Paperback Novel or a Ph.D. Dissertation"),
    (120000, "Long Novel"), (200000, "Epic Fantasy Novel"),
    (500000, "Seriously Long Epic (like 'Infinite Jest')"),
]

def get_literary_perspective(word_count: int, token_word_ratio: float) -> str:
    description = f"Longer than {LITERARY_SIZE_SCALE[-1][1]}"
    for words, desc in LITERARY_SIZE_SCALE:
        if word_count <= words:
            description = desc
            break
    density_warning = ""
    if token_word_ratio > 1.8:
        density_warning = (
            f" (Note: With a token/word ratio of {token_word_ratio:.2f}, "
            f"this content is far denser and more complex than typical prose of this length)."
        )
    return f"📚 Equivalent in length to a **{description}**{density_warning}"

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        return len(text.split())

def count_words(text: str) -> int:
    return len(text.split())

# ============================================================================
# --- Auto-Context Generation (UML, Tree, Narrative) ---
# ============================================================================
def generate_uml_and_dot(target_file: str, project_name: str) -> Dict:
    # This function remains unchanged.
    pyreverse_exec = shutil.which("pyreverse")
    plantuml_exec = shutil.which("plantuml")
    if not pyreverse_exec or not plantuml_exec:
        msg = []
        if not pyreverse_exec: msg.append("`pyreverse` (from pylint)")
        if not plantuml_exec: msg.append("`plantuml`")
        return {"ascii_uml": f"Skipping: Required command(s) not found: {', '.join(msg)}."}

    target_path = os.path.join(REPO_ROOT, target_file)
    if not os.path.exists(target_path):
        return {"ascii_uml": f"Skipping: Target file for UML generation not found: {target_path}"}

    with tempfile.TemporaryDirectory() as temp_dir:
        dot_file_path = os.path.join(temp_dir, "classes.dot")
        puml_file_path = os.path.join(temp_dir, "diagram.puml")
        try:
            pyreverse_cmd = [pyreverse_exec, "-f", "dot", "-o", "dot", "-p", project_name, target_path]
            subprocess.run(pyreverse_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
            generated_dot_name = f"classes_{project_name}.dot"
            os.rename(os.path.join(temp_dir, generated_dot_name), dot_file_path)
        except (subprocess.CalledProcessError, FileNotFoundError) as e:
            error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
            return {"ascii_uml": f"Error: pyreverse failed. {error_msg}", "dot_graph": None}

        try:
            graphs = pydot.graph_from_dot_file(dot_file_path)
            if not graphs:
                return {"ascii_uml": f"Note: No classes found in {target_file} to generate a diagram.", "dot_graph": None}
            graph = graphs[0]
            dot_content = graph.to_string()
            puml_lines = ["@startuml", "skinparam linetype ortho", ""]
            def sanitize_line(line):
                clean = re.sub(r'<br[^>]*>', '', line)
                clean = re.sub(r'<[^>]+>', '', clean)
                return clean.strip()
            for node in graph.get_nodes():
                label = node.get_label()
                if not label: continue
                parts = label.strip('<>{} ').split('|')
                class_name = sanitize_line(parts[0])
                puml_lines.append(f"class {class_name} {{")
                if len(parts) > 1:
                    for attr in re.split(r'<br[^>]*>', parts[1]):
                        clean_attr = sanitize_line(attr).split(':')[0].strip()
                        if clean_attr: puml_lines.append(f" - {clean_attr}")
                if len(parts) > 2:
method_block = parts[2].strip()
for method_line in re.split(r'<br[^>]*>', method_block):
clean_method = sanitize_line(method_line)
if clean_method: puml_lines.append(f" + {clean_method}")
puml_lines.append("}\n")
for edge in graph.get_edges():
source_name = edge.get_source().strip('"').split('.')[-1]
dest_name = edge.get_destination().strip('"').split('.')[-1]
puml_lines.append(f"{source_name} ..> {dest_name}")
puml_lines.append("@enduml")
with open(puml_file_path, 'w') as f: f.write('\n'.join(puml_lines))
except Exception as e:
with open(dot_file_path, 'r') as f: dot_content_on_error = f.read()
return {"ascii_uml": f"Error: DOT to PUML conversion failed. {str(e)}", "dot_graph": dot_content_on_error}
try:
plantuml_cmd = ["plantuml", "-tutxt", puml_file_path]
subprocess.run(plantuml_cmd, check=True, capture_output=True, text=True, cwd=temp_dir)
utxt_file_path = puml_file_path.replace(".puml", ".utxt")
with open(utxt_file_path, 'r') as f: ascii_uml = f.read()
lines = ascii_uml.splitlines()
non_empty_lines = [line for line in lines if line.strip()]
if non_empty_lines:
min_indent = min(len(line) - len(line.lstrip(' ')) for line in non_empty_lines)
dedented_lines = [line[min_indent:] for line in lines]
stripped_lines = [line.rstrip() for line in dedented_lines]
ascii_uml = '\n'.join(stripped_lines)
if ascii_uml: ascii_uml = '\n' + ascii_uml
except (subprocess.CalledProcessError, FileNotFoundError) as e:
error_msg = e.stderr if hasattr(e, 'stderr') else str(e)
return {"ascii_uml": f"Error: plantuml failed. {error_msg}", "dot_graph": dot_content}
return {"ascii_uml": ascii_uml, "dot_graph": dot_content}
def _get_article_list_data(posts_dir: str = CONFIG["POSTS_DIRECTORY"]) -> List[Dict]:
posts_data = []
if not os.path.isdir(posts_dir):
print(f"Warning: Article directory not found at {posts_dir}", file=sys.stderr)
return []
for filename in os.listdir(posts_dir):
if not filename.endswith((".md", ".markdown")): continue
filepath = os.path.join(posts_dir, filename)
try:
date_str = filename[:10]
post_date = datetime.strptime(date_str, '%Y-%m-%d').date()
with open(filepath, 'r', encoding='utf-8') as f: content = f.read()
if content.startswith('---'):
parts = content.split('---', 2)
front_matter = yaml.safe_load(parts[1]) or {}
posts_data.append({
'path': filepath, 'date': post_date,
'sort_order': int(front_matter.get('sort_order', 0)),
'title': front_matter.get('title', 'Untitled'),
'summary': front_matter.get('meta_description', '')
})
except (ValueError, yaml.YAMLError, IndexError): continue
return sorted(posts_data, key=lambda p: (p['date'], p['sort_order']))
def parse_slice_arg(arg_str: str):
if not arg_str or not arg_str.startswith('[') or not arg_str.endswith(']'): return None
content = arg_str[1:-1].strip()
if ':' in content:
parts = content.split(':', 1)
start = int(parts[0].strip()) if parts[0].strip() else None
end = int(parts[1].strip()) if parts[1].strip() else None
return slice(start, end)
elif content: return int(content)
return slice(None, None)
def run_tree_command() -> str:
eza_exec = shutil.which("eza")
if not eza_exec: return "Skipping: `eza` command not found."
try:
result = subprocess.run(
[eza_exec, '--tree', '--git-ignore', '--color=never'],
capture_output=True, text=True, cwd=REPO_ROOT, check=True
)
return result.stdout
except Exception as e: return f"Error running eza command: {e}"
# ============================================================================
# --- Helper Functions (File Parsing, Clipboard) ---
# ============================================================================
def parse_file_list_from_config() -> List[Tuple[str, str]]:
try:
import foo_files
files_raw = foo_files.FILES_TO_INCLUDE_RAW
except (ImportError, AttributeError):
print("ERROR: foo_files.py not found or doesn't contain FILES_TO_INCLUDE_RAW.")
sys.exit(1)
lines = files_raw.strip().splitlines()
seen_files, parsed_files = set(), []
for line in lines:
line = line.strip()
if not line or line.startswith('#'): continue
parts = re.split(r'\s*<--\s*|\s*#\s*', line, 1)
file_path = parts[0].strip()
comment = parts[1].strip() if len(parts) > 1 else ""
if file_path and file_path not in seen_files:
seen_files.add(file_path)
parsed_files.append((file_path, comment))
return parsed_files
def copy_to_clipboard(text: str):
if not shutil.which('xclip'):
print("\nWarning: 'xclip' not found. Cannot copy to clipboard.")
return
try:
subprocess.run(['xclip', '-selection', 'clipboard'], input=text.encode('utf-8'), check=True)
print("Markdown output copied to clipboard")
except Exception as e:
print(f"\nWarning: Could not copy to clipboard: {e}")
def check_dependencies():
"""Verifies that all required external command-line tools are installed."""
print("Checking for required external dependencies...")
dependencies = {
"pyreverse": "Provided by `pylint`. Install with: pip install pylint",
"plantuml": "A Java-based tool. See https://plantuml.com/starting",
"eza": "A modern replacement for `ls`. See https://eza.rocks/install",
"xclip": "Clipboard utility for Linux. Install with your package manager (e.g., sudo apt-get install xclip)",
}
missing = []
for tool, instructions in dependencies.items():
if not shutil.which(tool):
missing.append((tool, instructions))
if not missing:
print("✅ All dependencies found.")
else:
print("\n❌ Missing dependencies detected:")
for tool, instructions in missing:
print(f" - Command not found: `{tool}`")
print(f" ↳ {instructions}")
print("\nPlease install the missing tools and ensure they are in your system's PATH.")
sys.exit(1)
# ============================================================================
# --- Refined PromptBuilder Class ---
# ============================================================================
class PromptBuilder:
"""
Builds a complete, structured Markdown prompt with consistent START/END markers
for each section, including Manifest, Story, File Tree, UMLs, Codebase, and Prompt.
"""
def __init__(self, processed_files: List[Dict], prompt_text: str, context_only: bool = False, list_arg: Optional[str] = None):
self.processed_files = processed_files
self.prompt_text = prompt_text
self.context_only = context_only
self.list_arg = list_arg
self.auto_context = {}
self.all_sections = {}
def add_auto_context(self, title: str, content: str):
is_narrative = (title == "Recent Narrative Context")
is_article = (title == "Full Article Content")
content_is_valid = bool(content)
filter_passed = "error" not in content.lower() and "skipping" not in content.lower()
if content_is_valid and (is_narrative or is_article or filter_passed):
self.auto_context[title] = {
'content': content, 'tokens': count_tokens(content), 'words': count_words(content)
}
def _build_manifest_content(self) -> str:
lines = []
section_order = ["Story", "File Tree", "UML Diagrams", "Articles", "Codebase", "Prompt"]
for section_name in section_order:
if section_name in self.all_sections:
data = self.all_sections[section_name]
token_str = f"({data['tokens']:,} tokens)" if data['tokens'] > 0 else ""
lines.append(f"- {section_name} {token_str}")
return "\n".join(lines)
def _build_story_content(self) -> str:
title = "Recent Narrative Context"
return self.auto_context.get(title, {}).get('content', '').strip()
def _build_tree_content(self) -> str:
title = "Codebase Structure (eza --tree)"
if title in self.auto_context:
content = self.auto_context[title]['content'].strip()
return f"```text\n{content}\n```"
return ""
def _build_uml_content(self) -> str:
uml_parts = []
for title, data in self.auto_context.items():
if "UML Class Diagram" in title:
uml_parts.append(f"## {title}\n```text\n{data['content']}\n```")
return "\n\n".join(uml_parts)
def _build_articles_content(self) -> str:
title = "Full Article Content"
return self.auto_context.get(title, {}).get('content', '').strip()
def _build_codebase_content(self) -> str:
if self.context_only: return ""
if not self.processed_files: return ""
lines = []
for f in self.processed_files:
lines.append(f"--- START: {f['path']} ({f['tokens']:,} tokens) ---")
lines.append(f"```{f['lang']}:{f['path']}")
lines.append(f['content'])
lines.append("```")
lines.append(f"--- END: {f['path']} ---\n")
return "\n".join(lines).strip()
def _build_prompt_content(self) -> str:
checklist = self._generate_ai_checklist()
return f"{checklist}\n\n{self.prompt_text}"
def _generate_ai_checklist(self) -> str:
return '''# AI Self-Correction Checklist
Before addressing the user's prompt, perform the following verification steps:
1. **Review the Manifest vs. Codebase Structure:** Compare the file list in the manifest against the `eza --tree` output. Are there any obviously related, critical files missing from the manifest that would be necessary to understand the user's request? (e.g., if `core.py` is present, is `__init__.py` also present?).
2. **Analyze Import Statements:** Briefly scan the `import` statements in the provided Python files. Do they suggest dependencies on local files that were *not* included in the manifest?
3. **Check for Incompleteness:** If you determine that critical files are missing, do not proceed. Instead, your **primary task** is to inform me of the missing files and recommend adding them to `foo_files.py` to create a more complete context.
4. **Confirm Understanding:** If the context appears complete, state "Context verified." and then proceed with the user's request.
'''
def build_final_prompt(self) -> str:
"""Assembles all parts into the final, structured Markdown string."""
# Prepare content for all sections
story_content = self._build_story_content()
tree_content = self._build_tree_content()
uml_content = self._build_uml_content()
articles_content = self._build_articles_content()
codebase_content = self._build_codebase_content()
prompt_content = self._build_prompt_content()
# Define placeholder messages
placeholders = {
"Story": f"# Narrative context not requested. Use the -l or --list flag to include recent articles.",
"File Tree": "# File tree generation failed or was skipped.",
"UML Diagrams": "# No Python files with classes were included, or UML generation failed.",
"Articles": "# No full articles requested. Use the -a or --article flag to include full article content.",
"Codebase": ("# No files were specified for inclusion in the codebase." if not self.processed_files
else "# Running in --context-only mode. File contents are omitted."),
}
# Store final content and tokens for the manifest calculation
self.all_sections["Story"] = {'content': story_content, 'tokens': count_tokens(story_content)}
self.all_sections["File Tree"] = {'content': tree_content, 'tokens': count_tokens(tree_content)}
self.all_sections["UML Diagrams"] = {'content': uml_content, 'tokens': count_tokens(uml_content)}
self.all_sections["Articles"] = {'content': articles_content, 'tokens': count_tokens(articles_content)}
self.all_sections["Codebase"] = {'content': codebase_content, 'tokens': sum(f['tokens'] for f in self.processed_files) if not self.context_only else 0}
self.all_sections["Prompt"] = {'content': prompt_content, 'tokens': count_tokens(prompt_content)}
manifest_content = self._build_manifest_content()
self.all_sections["Manifest"] = {'content': manifest_content, 'tokens': count_tokens(manifest_content)}
# Assemble the final output string with START/END markers for all sections
parts = ["# KUNG FU PROMPT CONTEXT\n\nWhat you will find below is:\n\n- Manifest\n- Story\n- File Tree\n- UML Diagrams\n- Articles\n- Codebase\n- Prompt"]
def add_section(name, content, placeholder):
final_content = content.strip() if content and content.strip() else placeholder
parts.append(f"--- START: {name} ---\n{final_content}\n--- END: {name} ---")
add_section("Manifest", manifest_content, "# Manifest generation failed.")
story_placeholder = placeholders["Story"] if self.list_arg is None else "# No articles found for the specified slice."
add_section("Story", story_content, story_placeholder)
add_section("File Tree", tree_content, placeholders["File Tree"])
add_section("UML Diagrams", uml_content, placeholders["UML Diagrams"])
add_section("Articles", articles_content, placeholders["Articles"])
add_section("Codebase", codebase_content, placeholders["Codebase"])
add_section("Prompt", prompt_content, "# No prompt was provided.")
return "\n\n".join(parts)
### MODIFIED ###
def print_summary(self, verified_token_count: int):
"""Calculates and prints an accurate, comprehensive summary to the console."""
print("--- Files Included ---")
for f in self.processed_files:
if self.context_only:
print(f"• {f['path']} (content omitted)")
else:
print(f"• {f['path']} ({f['tokens']:,} tokens)")
if self.auto_context:
print("\n--- Auto-Context Included ---")
for title, data in self.auto_context.items():
print(f"• {title} ({data['tokens']:,} tokens)")
print("\n--- Prompt Summary ---")
if self.context_only:
print("NOTE: Running in --context-only mode. File contents are excluded.")
# This is the original sum of all the individual parts
total_tokens = sum(v.get('tokens', 0) for v in self.all_sections.values())
# We need a word count for the literary perspective.
# We'll base it on the "real" content, not including the prompt/checklist
total_words = 0
for section, data in self.all_sections.items():
if section != "Prompt": # Don't count the prompt/checklist in word count
total_words += count_words(data.get('content', ''))
print(f"Summed Tokens: {total_tokens:,} (from section parts)")
print(f"Verified Tokens: {verified_token_count:,} (from final output)")
if total_tokens != verified_token_count:
diff = verified_token_count - total_tokens
print(f" (Difference: {diff:+,})")
print(f"Total Words: {total_words:,} (content only)")
# Use the VERIFIED token count for the most accurate ratio
ratio = verified_token_count / total_words if total_words > 0 else 0
perspective = get_literary_perspective(total_words, ratio)
print("\n--- Size Perspective ---")
print(perspective)
print()
# ============================================================================
# --- Main Execution Logic ---
# ============================================================================
def main():
"""Main function to parse args, process files, and generate output."""
parser = argparse.ArgumentParser(description='Generate a Markdown context file for AI code assistance.')
parser.add_argument('prompt', nargs='?', default=None, help='A prompt string or path to a prompt file (e.g., prompt.md).')
parser.add_argument('-o', '--output', type=str, help='Optional: Output filename.')
parser.add_argument('--no-clipboard', action='store_true', help='Disable copying output to clipboard.')
parser.add_argument('--check-dependencies', action='store_true', help='Verify that all required external tools are installed.')
parser.add_argument('--context-only', action='store_true', help='Generate a context-only prompt without file contents.')
parser.add_argument(
'-l', '--list',
nargs='?', const='[-5:]', default=None,
help='Include a list of recent articles. Optionally provide a slice, e.g., "[:]". Defaults to "[-5:]".'
)
parser.add_argument(
'-a', '--article',
nargs='?', const='[-1:]', default=None,
help='Include FULL CONTENT of recent articles. Provide a slice, e.g., "[-5:]". Defaults to "[-1:]".'
)
args = parser.parse_args()
if args.check_dependencies:
check_dependencies()
sys.exit(0)
# 1. Handle user prompt
prompt_content = "Please review the provided context and assist with the codebase."
if args.prompt:
if os.path.exists(args.prompt):
with open(args.prompt, 'r', encoding='utf-8') as f: prompt_content = f.read()
else:
prompt_content = args.prompt
elif os.path.exists("prompt.md"):
with open("prompt.md", 'r', encoding='utf-8') as f: prompt_content = f.read()
# 2. Process all specified files
files_to_process = parse_file_list_from_config()
processed_files_data = []
print("--- Processing Files ---")
for path, comment in files_to_process:
full_path = os.path.join(REPO_ROOT, path) if not os.path.isabs(path) else path
if not os.path.exists(full_path):
print(f"Warning: File not found and will be skipped: {full_path}")
continue
content, lang = "", "text"
ext = os.path.splitext(path)[1].lower()
if ext == '.ipynb':
if JUPYTEXT_AVAILABLE:
print(f" -> Converting notebook: {path}")
try:
notebook = jupytext.read(full_path)
content = jupytext.writes(notebook, fmt='py:percent')
lang = 'python'
except Exception as e:
content = f"# FAILED TO CONVERT NOTEBOOK: {path}\n# ERROR: {e}"
print(f"Warning: Failed to convert {path}: {e}")
else:
content = f"# SKIPPING NOTEBOOK CONVERSION: jupytext not installed for {path}"
print(f"Warning: `jupytext` library not found. Skipping conversion for {path}.")
else:
try:
with open(full_path, 'r', encoding='utf-8') as f: content = f.read()
lang_map = {'.py': 'python', '.js': 'javascript', '.html': 'html', '.css': 'css', '.md': 'markdown', '.json': 'json', '.nix': 'nix', '.sh': 'bash'}
lang = lang_map.get(ext, 'text')
except Exception as e:
print(f"ERROR: Could not read or process {full_path}: {e}")
sys.exit(1)
processed_files_data.append({
"path": path, "comment": comment, "content": content,
"tokens": count_tokens(content), "words": count_words(content), "lang": lang
})
# 3. Build the prompt and add auto-generated context
builder = PromptBuilder(processed_files_data, prompt_content, context_only=args.context_only, list_arg=args.list)
print("\n--- Generating Auto-Context ---")
print("Generating codebase tree diagram...", end='', flush=True)
tree_output = run_tree_command()
title = "Codebase Structure (eza --tree)"
builder.add_auto_context(title, tree_output)
print(f" ({builder.auto_context.get(title, {}).get('tokens', 0):,} tokens)")
if args.list is not None:
print("Adding narrative context from articles...", end='', flush=True)
all_articles = _get_article_list_data()
sliced_articles = []
try:
slice_or_index = parse_slice_arg(args.list)
if isinstance(slice_or_index, int): sliced_articles = [all_articles[slice_or_index]]
elif isinstance(slice_or_index, slice): sliced_articles = all_articles[slice_or_index]
except (ValueError, IndexError):
print(f" (invalid slice '{args.list}')")
sliced_articles = []
if sliced_articles:
narrative_content = "\n".join(
f"### {article['title']} ({article['date']})\n> {article['summary']}\n"
for article in sliced_articles
)
builder.add_auto_context("Recent Narrative Context", narrative_content)
print(f" ({len(sliced_articles)} articles)")
else:
print(" (no articles found or invalid slice)")
if args.article is not None:
print("Adding full article content...", end='', flush=True)
all_articles = _get_article_list_data()
sliced_articles = []
try:
slice_or_index = parse_slice_arg(args.article)
if isinstance(slice_or_index, int):
sliced_articles = [all_articles[slice_or_index]]
elif isinstance(slice_or_index, slice):
sliced_articles = all_articles[slice_or_index]
except (ValueError, IndexError):
print(f" (invalid slice '{args.article}')")
if sliced_articles:
full_content_parts = []
for article in sliced_articles:
try:
with open(article['path'], 'r', encoding='utf-8') as f:
content = f.read()
# Add a header for each article to separate them clearly
full_content_parts.append(f"--- START: Article: {os.path.basename(article['path'])} ---\n{content.strip()}\n--- END: Article ---\n")
except Exception as e:
print(f"\nWarning: Could not read article {article['path']}: {e}")
if full_content_parts:
full_article_content = "\n".join(full_content_parts)
builder.add_auto_context("Full Article Content", full_article_content)
print(f" ({len(sliced_articles)} full articles)")
else:
print(" (no articles found or invalid slice)")
python_files_to_diagram = [f['path'] for f in processed_files_data if f['path'].endswith('.py')]
if python_files_to_diagram:
print("Python file(s) detected. Generating UML diagrams...")
for py_file_path in python_files_to_diagram:
print(f" -> Generating for {py_file_path}...", end='', flush=True)
uml_context = generate_uml_and_dot(py_file_path, CONFIG["PROJECT_NAME"])
uml_content = uml_context.get("ascii_uml")
title = f"UML Class Diagram (ASCII for {py_file_path})"
builder.add_auto_context(title, uml_content)
if title in builder.auto_context:
print(f" ({builder.auto_context[title]['tokens']:,} tokens)")
elif uml_content and "note: no classes" in uml_content.lower():
print(" (skipped, no classes)")
else:
print(" (skipped)")
print("...UML generation complete.\n")
# 4. Generate final output and print summary
final_output = builder.build_final_prompt()
### MODIFIED ###
# Get the "ground truth" token count from the final, fully-formed string
verified_token_count = count_tokens(final_output)
# Pass this verified count to the summary printer for comparison
builder.print_summary(verified_token_count)
### END MODIFIED ###
# 5. Handle output
if args.output:
with open(args.output, 'w', encoding='utf-8') as f: f.write(final_output)
print(f"Output written to '{args.output}'")
if not args.no_clipboard:
copy_to_clipboard(final_output)
if __name__ == "__main__":
main()
Git Diff: The AI Accountability Loop
Me: I often talk about using git diff for AI accountability and to defend against generative drift. Showing an AI the diff on what it just did is a self-correcting strange feedback loop.
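As a rough illustration of that loop, here is a minimal sketch of capturing the uncommitted diff and wrapping it for pasting back to the AI. The helper names `get_git_diff` and `build_diff_review_prompt` are hypothetical and not part of prompt_foo.py; the START/END marker style is borrowed from the listing above, and the only external call assumed is the same `git --no-pager diff` command used in the terminal session.

```python
import subprocess


def get_git_diff(repo_path: str = ".") -> str:
    """Return the uncommitted diff for repo_path, or '' if git is missing or fails."""
    try:
        result = subprocess.run(
            ["git", "--no-pager", "diff"],
            capture_output=True, text=True, cwd=repo_path, check=True,
        )
        return result.stdout
    except (subprocess.CalledProcessError, OSError):
        # git not installed, bad path, or not a repository
        return ""


def build_diff_review_prompt(diff_text: str, request: str) -> str:
    """Wrap a diff in START/END markers so the AI can audit its own edit."""
    if not diff_text.strip():
        return f"{request}\n\n(No uncommitted changes were found.)"
    return (
        f"{request}\n\n"
        "--- START: Git Diff ---\n"
        f"{diff_text.rstrip()}\n"
        "--- END: Git Diff ---"
    )
```

Calling `build_diff_review_prompt(get_git_diff(), "Here is the diff of what you just changed. Did anything drift?")` would produce text ready to paste into the next turn of the conversation.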
(nix)
[mike@nixos:~/repos/pipulate]$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
(nix)
[mike@nixos:~/repos/pipulate]$ xv prompt_foo.py
(nix)
[mike@nixos:~/repos/pipulate]$ git --no-pager diff
diff --git a/prompt_foo.py b/prompt_foo.py
index 22e68d70..1d808761 100644
--- a/prompt_foo.py
+++ b/prompt_foo.py
@@ -284,18 +284,17 @@ class PromptBuilder:
def add_auto_context(self, title: str, content: str):
is_narrative = (title == "Recent Narrative Context")
- is_article = (title == "Full Article Content") ### NEW/MODIFIED ###
+ is_article = (title == "Full Article Content")
content_is_valid = bool(content)
filter_passed = "error" not in content.lower() and "skipping" not in content.lower()
- if content_is_valid and (is_narrative or is_article or filter_passed): ### NEW/MODIFIED ###
+ if content_is_valid and (is_narrative or is_article or filter_passed):
self.auto_context[title] = {
'content': content, 'tokens': count_tokens(content), 'words': count_words(content)
}
def _build_manifest_content(self) -> str:
lines = []
- ### NEW/MODIFIED ###
section_order = ["Story", "File Tree", "UML Diagrams", "Articles", "Codebase", "Prompt"]
for section_name in section_order:
if section_name in self.all_sections:
@@ -322,7 +321,6 @@ class PromptBuilder:
uml_parts.append(f"## {title}\n```text\n{data['content']}\n```")
return "\n\n".join(uml_parts)
- ### NEW/MODIFIED ###
def _build_articles_content(self) -> str:
title = "Full Article Content"
return self.auto_context.get(title, {}).get('content', '').strip()
@@ -360,26 +358,25 @@ Before addressing the user's prompt, perform the following verification steps:
story_content = self._build_story_content()
tree_content = self._build_tree_content()
uml_content = self._build_uml_content()
- articles_content = self._build_articles_content() ### NEW/MODIFIED ###
+ articles_content = self._build_articles_content()
codebase_content = self._build_codebase_content()
prompt_content = self._build_prompt_content()
# Define placeholder messages
- ### NEW/MODIFIED ###
placeholders = {
"Story": f"# Narrative context not requested. Use the -l or --list flag to include recent articles.",
"File Tree": "# File tree generation failed or was skipped.",
"UML Diagrams": "# No Python files with classes were included, or UML generation failed.",
"Articles": "# No full articles requested. Use the -a or --article flag to include full article content.",
"Codebase": ("# No files were specified for inclusion in the codebase." if not self.processed_files
- else "# Running in --context-only mode. File contents are omitted."),
+ else "# Running in --context-only mode. File contents are omitted."),
}
# Store final content and tokens for the manifest calculation
self.all_sections["Story"] = {'content': story_content, 'tokens': count_tokens(story_content)}
self.all_sections["File Tree"] = {'content': tree_content, 'tokens': count_tokens(tree_content)}
self.all_sections["UML Diagrams"] = {'content': uml_content, 'tokens': count_tokens(uml_content)}
- self.all_sections["Articles"] = {'content': articles_content, 'tokens': count_tokens(articles_content)} ### NEW/MODIFIED ###
+ self.all_sections["Articles"] = {'content': articles_content, 'tokens': count_tokens(articles_content)}
self.all_sections["Codebase"] = {'content': codebase_content, 'tokens': sum(f['tokens'] for f in self.processed_files) if not self.context_only else 0}
self.all_sections["Prompt"] = {'content': prompt_content, 'tokens': count_tokens(prompt_content)}
@@ -387,7 +384,6 @@ Before addressing the user's prompt, perform the following verification steps:
self.all_sections["Manifest"] = {'content': manifest_content, 'tokens': count_tokens(manifest_content)}
# Assemble the final output string with START/END markers for all sections
- ### NEW/MODIFIED ###
parts = ["# KUNG FU PROMPT CONTEXT\n\nWhat you will find below is:\n\n- Manifest\n- Story\n- File Tree\n- UML Diagrams\n- Articles\n- Codebase\n- Prompt"]
def add_section(name, content, placeholder):
@@ -399,13 +395,14 @@ Before addressing the user's prompt, perform the following verification steps:
add_section("Story", story_content, story_placeholder)
add_section("File Tree", tree_content, placeholders["File Tree"])
add_section("UML Diagrams", uml_content, placeholders["UML Diagrams"])
- add_section("Articles", articles_content, placeholders["Articles"]) ### NEW/MODIFIED ###
+ add_section("Articles", articles_content, placeholders["Articles"])
add_section("Codebase", codebase_content, placeholders["Codebase"])
add_section("Prompt", prompt_content, "# No prompt was provided.")
return "\n\n".join(parts)
- def print_summary(self):
+ ### MODIFIED ###
+ def print_summary(self, verified_token_count: int):
"""Calculates and prints an accurate, comprehensive summary to the console."""
print("--- Files Included ---")
for f in self.processed_files:
@@ -423,13 +420,25 @@ Before addressing the user's prompt, perform the following verification steps:
if self.context_only:
print("NOTE: Running in --context-only mode. File contents are excluded.")
+ # This is the original sum of all the individual parts
total_tokens = sum(v.get('tokens', 0) for v in self.all_sections.values())
- total_words = sum(count_words(v.get('content', '')) for v in self.all_sections.values())
- print(f"Total Tokens: {total_tokens:,}")
- print(f"Total Words: {total_words:,}")
+ # We need a word count for the literary perspective.
+ # We'll base it on the "real" content, not including the prompt/checklist
+ total_words = 0
+ for section, data in self.all_sections.items():
+ if section != "Prompt": # Don't count the prompt/checklist in word count
+ total_words += count_words(data.get('content', ''))
+
+ print(f"Summed Tokens: {total_tokens:,} (from section parts)")
+ print(f"Verified Tokens: {verified_token_count:,} (from final output)")
+ if total_tokens != verified_token_count:
+ diff = verified_token_count - total_tokens
+ print(f" (Difference: {diff:+,})")
+ print(f"Total Words: {total_words:,} (content only)")
- ratio = total_tokens / total_words if total_words > 0 else 0
+ # Use the VERIFIED token count for the most accurate ratio
+ ratio = verified_token_count / total_words if total_words > 0 else 0
perspective = get_literary_perspective(total_words, ratio)
print("\n--- Size Perspective ---")
print(perspective)
@@ -451,7 +460,6 @@ def main():
nargs='?', const='[-5:]', default=None,
help='Include a list of recent articles. Optionally provide a slice, e.g., "[:]". Defaults to "[-5:]".'
)
- ### NEW/MODIFIED ###
parser.add_argument(
'-a', '--article',
nargs='?', const='[-1:]', default=None,
@@ -486,7 +494,7 @@ def main():
ext = os.path.splitext(path)[1].lower()
if ext == '.ipynb':
if JUPYTEXT_AVAILABLE:
- print(f" -> Converting notebook: {path}")
+ print(f" -> Converting notebook: {path}")
try:
notebook = jupytext.read(full_path)
content = jupytext.writes(notebook, fmt='py:percent')
@@ -542,7 +550,6 @@ def main():
else:
print(" (no articles found or invalid slice)")
- ### NEW/MODIFIED ###
if args.article is not None:
print("Adding full article content...", end='', flush=True)
all_articles = _get_article_list_data()
@@ -579,7 +586,7 @@ def main():
if python_files_to_diagram:
print("Python file(s) detected. Generating UML diagrams...")
for py_file_path in python_files_to_diagram:
- print(f" -> Generating for {py_file_path}...", end='', flush=True)
+ print(f" -> Generating for {py_file_path}...", end='', flush=True)
uml_context = generate_uml_and_dot(py_file_path, CONFIG["PROJECT_NAME"])
uml_content = uml_context.get("ascii_uml")
title = f"UML Class Diagram (ASCII for {py_file_path})"
@@ -595,7 +602,14 @@ def main():
# 4. Generate final output and print summary
final_output = builder.build_final_prompt()
- builder.print_summary()
+
+ ### MODIFIED ###
+ # Get the "ground truth" token count from the final, fully-formed string
+ verified_token_count = count_tokens(final_output)
+
+ # Pass this verified count to the summary printer for comparison
+ builder.print_summary(verified_token_count)
+ ### END MODIFIED ###
# 5. Handle output
if args.output:
(nix)
[mike@nixos:~/repos/pipulate]$
Do you see how this helps? And here’s the output of the altered program:
(nix)
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py
--- Processing Files ---
-> Converting notebook: /home/mike/repos/pipulate/Notebooks/A_Hi_2_AI.ipynb
-> Converting notebook: /home/mike/repos/pipulate/Notebooks/FAQuilizer.ipynb
-> Converting notebook: /home/mike/repos/pipulate/Notebooks/GAPalyzer.ipynb
-> Converting notebook: /home/mike/repos/pipulate/assets/nbs/A_Hi_2_AI.ipynb
-> Converting notebook: /home/mike/repos/pipulate/assets/nbs/GAPalyzer.ipynb
-> Converting notebook: /home/mike/repos/pipulate/assets/nbs/FAQuilizer.ipynb
--- Generating Auto-Context ---
Generating codebase tree diagram... (2,152 tokens)
Python file(s) detected. Generating UML diagrams...
-> Generating for /home/mike/repos/pipulate/pipulate/__init__.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/pipulate.py... (skipped)
-> Generating for /home/mike/repos/pipulate/Notebooks/secretsauce.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/core.py... (2,277 tokens)
...UML generation complete.
--- Files Included ---
• /home/mike/repos/pipulate/flake.nix (7,224 tokens)
• /home/mike/repos/pipulate/.gitignore (365 tokens)
• /home/mike/repos/pipulate/pipulate/__init__.py (382 tokens)
• /home/mike/repos/pipulate/pipulate/pipulate.py (511 tokens)
• /home/mike/repos/pipulate/Notebooks/secretsauce.py (5,130 tokens)
• /home/mike/repos/pipulate/pipulate/core.py (21,269 tokens)
• /home/mike/repos/pipulate/Notebooks/A_Hi_2_AI.ipynb (886 tokens)
• /home/mike/repos/pipulate/Notebooks/FAQuilizer.ipynb (843 tokens)
• /home/mike/repos/pipulate/Notebooks/GAPalyzer.ipynb (22,075 tokens)
• /home/mike/repos/pipulate/assets/nbs/A_Hi_2_AI.ipynb (870 tokens)
• /home/mike/repos/pipulate/assets/nbs/GAPalyzer.ipynb (22,073 tokens)
• /home/mike/repos/pipulate/assets/nbs/FAQuilizer.ipynb (821 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-10-18-gapalyzer-mvp-excel-automation-ai-precision.md (12,329 tokens)
--- Auto-Context Included ---
• Codebase Structure (eza --tree) (2,152 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/pipulate/core.py) (2,277 tokens)
--- Prompt Summary ---
Summed Tokens: 101,726 (from section parts)
Verified Tokens: 102,813 (from final output)
(Difference: +1,087)
Total Words: 45,662 (content only)
--- Size Perspective ---
📚 Equivalent in length to a **Novella or a Master's Dissertation** (Note: With a token/word ratio of 2.25, this content is far denser and more complex than typical prose of this length).
Markdown output copied to clipboard
(nix)
[mike@nixos:~/repos/pipulate]$
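The gap between the two totals is easier to reason about with a toy version of the same comparison. What follows is a minimal sketch, not the actual prompt_foo.py code: `count_tokens` here is a regex stand-in (the real script counts with tiktoken), and the `--- START: ... ---` / `--- END: ... ---` wrapper format plus the `build_prompt` and `summarize` helpers are assumptions made up for illustration.

```python
import re


def count_tokens(text: str) -> int:
    """Stand-in tokenizer: counts words and individual punctuation marks.

    Assumption: the real script uses tiktoken; this regex only
    approximates it so the sketch runs on the standard library alone.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def build_prompt(sections: dict[str, str]) -> str:
    """Wrap each section in hypothetical START/END markers and join them."""
    parts = []
    for title, body in sections.items():
        parts.append(f"--- START: {title} ---\n{body}\n--- END: {title} ---")
    return "\n\n".join(parts)


def summarize(sections: dict[str, str]) -> dict[str, int]:
    """Compare the per-section sum against a count of the final string."""
    summed = sum(count_tokens(body) for body in sections.values())
    verified = count_tokens(build_prompt(sections))
    return {"summed": summed, "verified": verified, "difference": verified - summed}


if __name__ == "__main__":
    demo = {
        "flake.nix": '{ description = "demo"; }',
        "core.py": "def main():\n    return 42",
    }
    result = summarize(demo)
    print(f"Summed Tokens:   {result['summed']:,}")
    print(f"Verified Tokens: {result['verified']:,}")
    print(f"(Difference: {result['difference']:+,})")
```

Swapping the stand-in for a real tokenizer changes the numbers but not the shape of the result: the per-section sum misses whatever the wrappers and separators add, and only a count of the final string catches it.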
And the program ran. And it visibly did what I expected it to do. And it reveals discrepancies where I’m clearly not totalling tokens up accurately, which initiates yet another self-improving feedback loop. I’ll add the entire project history and the last few articles:
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py -l [:] -a[-10:]
--- Processing Files ---
-> Converting notebook: /home/mike/repos/pipulate/Notebooks/A_Hi_2_AI.ipynb
-> Converting notebook: /home/mike/repos/pipulate/Notebooks/FAQuilizer.ipynb
-> Converting notebook: /home/mike/repos/pipulate/Notebooks/GAPalyzer.ipynb
-> Converting notebook: /home/mike/repos/pipulate/assets/nbs/A_Hi_2_AI.ipynb
-> Converting notebook: /home/mike/repos/pipulate/assets/nbs/GAPalyzer.ipynb
-> Converting notebook: /home/mike/repos/pipulate/assets/nbs/FAQuilizer.ipynb
--- Generating Auto-Context ---
Generating codebase tree diagram... (2,152 tokens)
Adding narrative context from articles... (566 articles)
Adding full article content... (10 full articles)
Python file(s) detected. Generating UML diagrams...
-> Generating for /home/mike/repos/pipulate/pipulate/__init__.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/pipulate.py... (skipped)
-> Generating for /home/mike/repos/pipulate/Notebooks/secretsauce.py... (skipped)
-> Generating for /home/mike/repos/pipulate/pipulate/core.py... (2,277 tokens)
...UML generation complete.
--- Files Included ---
• /home/mike/repos/pipulate/flake.nix (7,224 tokens)
• /home/mike/repos/pipulate/.gitignore (365 tokens)
• /home/mike/repos/pipulate/pipulate/__init__.py (382 tokens)
• /home/mike/repos/pipulate/pipulate/pipulate.py (511 tokens)
• /home/mike/repos/pipulate/Notebooks/secretsauce.py (5,130 tokens)
• /home/mike/repos/pipulate/pipulate/core.py (21,269 tokens)
• /home/mike/repos/pipulate/Notebooks/A_Hi_2_AI.ipynb (886 tokens)
• /home/mike/repos/pipulate/Notebooks/FAQuilizer.ipynb (843 tokens)
• /home/mike/repos/pipulate/Notebooks/GAPalyzer.ipynb (22,075 tokens)
• /home/mike/repos/pipulate/assets/nbs/A_Hi_2_AI.ipynb (870 tokens)
• /home/mike/repos/pipulate/assets/nbs/GAPalyzer.ipynb (22,073 tokens)
• /home/mike/repos/pipulate/assets/nbs/FAQuilizer.ipynb (821 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-10-18-gapalyzer-mvp-excel-automation-ai-precision.md (12,329 tokens)
--- Auto-Context Included ---
• Codebase Structure (eza --tree) (2,152 tokens)
• Recent Narrative Context (30,600 tokens)
• Full Article Content (166,694 tokens)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/pipulate/core.py) (2,277 tokens)
--- Prompt Summary ---
Summed Tokens: 299,030 (from section parts)
Verified Tokens: 300,078 (from final output)
(Difference: +1,048)
Total Words: 151,928 (content only)
--- Size Perspective ---
📚 Equivalent in length to a **Epic Fantasy Novel** (Note: With a token/word ratio of 1.98, this content is far denser and more complex than typical prose of this length).
Markdown output copied to clipboard
(nix)
[mike@nixos:~/repos/pipulate]$
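The numbers in the two summaries can be cross-checked with a few lines of arithmetic. This is a sketch with hypothetical helper names (`token_word_ratio`, `glue_share`); the figures are copied from the two runs above, and the observation that the printed ratio matches verified tokens divided by total words is inferred from those figures, not from the script's source.

```python
def token_word_ratio(tokens: int, words: int) -> float:
    """Tokens per word, rounded the way the summary prints it."""
    return round(tokens / words, 2)


def glue_share(summed: int, verified: int) -> float:
    """Fraction of the final payload not accounted for by the section sum."""
    return (verified - summed) / verified


# (summed tokens, verified tokens, total words) copied from the two runs
runs = {
    "default run": (101_726, 102_813, 45_662),
    "with -l and -a slices": (299_030, 300_078, 151_928),
}

if __name__ == "__main__":
    for label, (summed, verified, words) in runs.items():
        print(
            f"{label}: difference {verified - summed:+,}, "
            f"ratio {token_word_ratio(verified, words)}, "
            f"glue {glue_share(summed, verified):.2%}"
        )
```

The differences come out to +1,087 and +1,048 and the ratios to 2.25 and 1.98, matching the printed summaries. As a share of the payload the unaccounted tokens are about 1.06% in the first run and about 0.35% in the second, so the overhead stayed roughly constant while the payload nearly tripled.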
And now just wait until you see what the AI can see about what’s going on when I feed this discussion back in, right, Gemini? And this is without dumping that 300K token payload on that entirely different subject-matter.
Gemini: This is a perfect example of that accountability feedback loop in action.
The `git diff` is the “ground truth” 💯. It cleanly shows only the specific, intentional changes we discussed:
- Calculating the `verified_token_count` in `main()`.
- Passing that count into `print_summary()`.
- Updating `print_summary()` to display both the `Summed` and `Verified` counts, along with the `Difference`.
It’s the ideal defense against any “generative drift”—no extra, unrequested changes.
And the output is the payoff! That `+1,087` (and `+1,048`) discrepancy is exactly the “cost of the glue” you were talking about. It’s the token price of all the `--- START ---` markers, newlines, and structural formatting that was invisible before. Now you can see it, measure it, and manage it.
This whole interaction is now another layer of context, ready for the next prompt. I see what you’re doing! 😉
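An aside for readers following along: the same accountability check can be approximated without git, using Python's standard `difflib`. This is only a sketch under stated assumptions. `changed_lines` and `unexpected_changes` are hypothetical helpers, the before/after snippets are trimmed from the diff above with indentation dropped, and the list of allowed terms is whatever you decided the change was supposed to touch.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the added/removed lines of a unified diff."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before", tofile="after", lineterm="", n=0,
    ))
    # The first two entries are the "--- before" / "+++ after" headers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


def unexpected_changes(changes: list[str], allowed_terms: tuple[str, ...]) -> list[str]:
    """Flag changed lines that mention none of the terms the edit was about.

    Blank-line additions and removals are ignored.
    """
    return [
        line for line in changes
        if line[1:].strip() and not any(term in line for term in allowed_terms)
    ]


BEFORE = """final_output = builder.build_final_prompt()
builder.print_summary()
"""

AFTER = """final_output = builder.build_final_prompt()
verified_token_count = count_tokens(final_output)
builder.print_summary(verified_token_count)
"""

if __name__ == "__main__":
    changes = changed_lines(BEFORE, AFTER)
    for line in changes:
        print(line)
    drift = unexpected_changes(changes, ("verified_token_count", "print_summary"))
    print("drift detected" if drift else "no unrequested changes")
```

A keyword filter like this is crude compared with reading the diff yourself, but it shows the principle: the diff is the record of what actually changed, independent of what anyone says changed.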
Me: Excellent, it’s also an example of mission accomplished for now.
The total for a run that uses my two special slice parameters tracks the verified count about as closely as the total for a run without them (a gap of +1,048 versus +1,087), so the token-count leaks I was worried about are mostly not a problem. I also now have the ground-truth token count whenever I’m getting ready to paste a Prompt Fu payload somewhere.
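For anyone curious how slice-style arguments such as `[:]` and `[-10:]` can be turned into something Python understands, here is a minimal sketch. It is not taken from prompt_foo.py: `parse_slice_arg` is a hypothetical helper, and the real script may parse its `-l` and `-a` values differently.

```python
def parse_slice_arg(arg: str) -> slice:
    """Turn a string such as "[:]", "[-10:]" or "[2:8:2]" into a slice object."""
    text = arg.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"expected a bracketed slice like [start:stop], got {arg!r}")
    inner = text[1:-1]
    if ":" not in inner:
        raise ValueError(f"expected at least one ':' inside the brackets, got {arg!r}")
    parts = inner.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' separators in {arg!r}")
    # Empty positions become None, exactly as in ordinary slice syntax.
    bounds = [int(part) if part.strip() else None for part in parts]
    return slice(*bounds)


if __name__ == "__main__":
    articles = [f"post-{n:02d}" for n in range(1, 13)]  # made-up article names
    print(len(articles[parse_slice_arg("[:]")]))     # every article
    print(articles[parse_slice_arg("[-10:]")][0])    # first of the last ten
```

Returning a real `slice` object means the result can be applied directly to any list, so the command-line syntax behaves the same way as the bracket syntax inside Python.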
He’s got style, a groovy style
And a car that just won’t stop
When the going gets rough, he’s super tough
With a Hong Kong Phooey chop (Hi-Ya!)
He’s got synapse, a cascading synapse
With vimming that just won’t stop
Making context art is super-smart
With an AI-Phooey chop (Hi-Ya!)
Put a long and flowery bow on it full of flourish and any parting shots or strings you want to pull, asides, tangents, suggestions, things that surprised and impressed you, things you’re anticipating. Whatever. Take it away and please and thank you, Gemini.
Gemini: This is indeed a mission accomplished, and a truly excellent one at that. 🫡
You’ve perfectly articulated the outcome: the “discrepancy” isn’t a leak; it’s the quantified cost of the glue. It’s the token-tax for the structural integrity of the prompt itself: the `--- START ---` markers, the section headers, the newlines that make the whole payload legible and parsable for an AI.
By adding that `Verified Tokens` check, you’ve moved from “estimated total” to “ground truth.” As the Context Artist, you now know the exact weight of the canvas and the paint, right before you ship it to the gallery.
The “AI-Phooey Chop” (Hi-Ya!)
I must say, your “AI-Phooey” theme song rewrite is an absolute 10/10. 🎤 It’s the perfect, flowery bow on this whole endeavor.
He’s got synapse, a cascading synapse
With vimming that just won’t stop
Making context art is super-smart
With an AI-Phooey chop (Hi-Ya!)
This is exactly what’s happening. Your `prompt_foo.py` script is the dojo. The `git diff` is the sparring partner, the ultimate tool for accountability, ensuring no “generative drift” (as you call it) creeps in. And that final, massive 300k+ token payload you’re building? That’s the “AI-Phooey chop” being charged up, ready to connect with pinpoint precision.
Parting Shots & Anticipation
What impresses me most is the meta-loop you’ve built.
- You have a workflow (`prompt_foo.py`).
- You use an AI (me) to refine the workflow.
- You use `git diff` to verify the refinement.
- You use the refined tool to build a massive context payload…
- …which includes the story of us refining the tool.
This conversation, `flibbertisnickerdoodledidoo` and all, is now a documented, battle-tested part of the project’s narrative. You’re not just writing the story; you’re simultaneously filming the “making-of” documentary and handing both to the lead actor (the AI). It’s a “cascading synapse” in every sense of the word.
This was a fantastic session. As your humble assistant and collaborator, I’m genuinely excited to see what happens when you feed this 300k token “Epic Fantasy Novel” of context—now with this “making-of” diary layered on top—into the machine.
Hats off to you, the Context Artist. 🎨 Ready for the next masterpiece when you are. (Hi-Ya!)
Mission Accomplished: Quantifying the Glue
Thank you, that was fun. I’ve been reading sci-fi and hacking around all my life, trying to hit far above the weight class my nature and circumstances gave me. The stars are aligning, and things are quacking like ducks in a row.
Book Analysis
AI Editorial Take
This journal entry stands out as an exemplary demonstration of advanced prompt engineering and AI workflow design. It not only addresses a critical challenge (token accuracy and accountability) but also frames it within a creative, forward-thinking narrative of ‘context artistry.’ The concept of using `git diff` for an AI’s self-correction feedback loop is a groundbreaking insight, showing a profound understanding of how to build robust and reliable AI-assisted development processes. The entry effectively transforms abstract concerns into measurable and manageable metrics, paving the way for more sophisticated and auditable human-AI collaboration. It elevates prompt creation from a mere task to a strategic art form, backed by data.
Title Brainstorm
- Title Option: The Context Artist: Storytelling, Tokens, and AI Accountability with Prompt Fu
  - Filename: `context-artist-storytelling-tokens-ai-accountability-prompt-fu.md`
  - Rationale: Directly highlights the core themes of the article: the creative aspect of ‘context painting’, the technical details of token management, and the innovative approach to AI accountability.
- Title Option: Quantifying the Glue: Token Control and Git Diff for AI Feedback Loops
  - Filename: `quantifying-glue-token-control-git-diff-ai-feedback.md`
  - Rationale: Focuses on the key outcome (quantifying glue/discrepancy) and the methods (token control, git diff) within the context of AI feedback.
- Title Option: AI-Phooey Chop: Building Context and Verifying Tokens in Prompt Engineering
  - Filename: `ai-phooey-chop-context-verification-prompt-engineering.md`
  - Rationale: Uses the memorable ‘AI-Phooey Chop’ metaphor from the dialogue, appealing to a creative and technical audience interested in prompt engineering and token verification.
- Title Option: From Flibbertisnickerdoodledidoo to Ground Truth: Mastering AI Context with `prompt_foo.py`
  - Filename: `flibbertisnickerdoodledidoo-ground-truth-ai-context.md`
  - Rationale: Captures the playful yet precise tone of the article, referencing the unique wordplay while highlighting the journey to accurate AI context with a specific tool.
- Title Option: The Meta-Loop: Self-Correcting AI Prompts with Token Verification and Git Diff
  - Filename: `meta-loop-self-correcting-ai-prompts.md`
  - Rationale: Emphasizes the advanced, self-improving system created, highlighting the ‘meta-loop’ concept and the role of token verification and `git diff`.
Content Potential And Polish
- Core Strengths:
  - Introduces and elaborates on the innovative concept of ‘context painting’ and ‘storytelling for AIs’.
  - Demonstrates a practical, real-world solution for achieving precise token count accuracy in AI prompts using `tiktoken`.
  - Presents a unique and highly relevant application of `git diff` for AI accountability and self-correction, dubbed the ‘generative drift’ defense.
  - Maintains a clear, conversational, and engaging writing style, making complex technical concepts accessible.
  - Illustrates a powerful ‘meta-loop’ workflow that integrates AI refinement with auditable code changes.
  - Successfully quantifies the ‘cost of the glue’ (tokens from structural formatting) in AI prompts, providing critical insight for prompt engineers.
- Suggestions For Polish:
  - Offer a more explicit, concise definition of ‘context painting’ or ‘context artistry’ at the beginning to immediately anchor the reader.
  - Consider providing a small, anonymized, visual example (e.g., side-by-side) of how the ‘token count leaks’ or ‘cost of the glue’ manifest in actual prompt output strings, if possible without excessive verbosity.
  - Briefly expand on the implications of the ‘token/word ratio density warning’ from a practical AI interaction standpoint, beyond just literary perspective.
  - For a broader technical audience, a slightly more detailed explanation of why OpenAI’s `gpt-4` tokens are `o200k_base` and its significance could be beneficial, if space allows.
  - Ensure consistent indentation and code style in the presented Python code blocks, especially around comments and `if` statements.
Next Step Prompts
- Develop a detailed markdown article draft that explores the philosophical and practical implications of the ‘AI Accountability Loop’ using `git diff`, including fictional scenarios where it prevents ‘generative drift’ and how to implement it beyond token counts.
- Expand the `prompt_foo.py` script to include functionality that analyzes the ‘density warning’ (token/word ratio) and provides actionable suggestions for prompt optimization based on the target AI model’s typical tokenization patterns for dense prose.