Chronological Sorting for a 200k-Token AI Context Window

I hit a wall in my big refactor and realized my AI assistant needed the full story, not just isolated code snippets. This entry is a deep-dive side quest into building the exact tool I needed—a Python script to sort my journal entries chronologically. I used it to feed Gemini a massive 200,000 token prompt, and the resulting high-fidelity advice proved the entire effort worthwhile. It’s a perfect example of building the tools to build the tool, and a testament to the power of context engineering in human-AI collaboration.

Setting the Stage: Context for the Curious Book Reader

200,000-token Narrative in 1 Textarea Submit

This entry documents a crucial side-quest born from necessity. In the midst of a complex software refactoring, I realized that for my AI coding assistant to provide truly insightful guidance, it needed the full, unabridged story of the project’s evolution. My technical journal entries held that story, but they were not easily accessible in the correct chronological sequence. This is the chronicle of how I paused the main project to build a specific tool—a Python script—to solve this ordering problem. The result was a breakthrough: the ability to feed the AI a massive, 200,000-token narrative, leading to a level of collaboration and precise feedback that proved the entire effort was not a diversion, but a necessary step in process integration.


Technical Journal Entry Begins

I’ve never done as many refactors and little touches as I have done to this pippyfields branch of Pipulate. I want to merge it back into main but I can’t until it has gone through some more rigorous testing. Also… also, what?

I’m in a race to get some particular work done for my actual day-job, where I was asked whether I could do something and I said yes, but that I’d need 2 weeks with 1 week of extreme focus. This is that week, and Wednesday is almost already gone. So I have to keep bearing down on this work until it is done, but sometimes things get harder before they get easier and you have to make 80/20-rule compromises along the way. When is good enough good enough?

So many of my touch-ups and edits have been of the big, colossal refactoring kind, like extracting Pipulate as a stand-alone library from the Pipulate web app for use in a Jupyter Notebook — one that gets run automatically in JupyterLab, which gets installed side-by-side with Pipulate so that the big conceptual leaps of what’s going on can be communicated and put to use quickly, right in the Notebook, by developers and aspiring developers just getting started, whom I also really want this to appeal to. This is a learner’s platform that I hope people get their hands on before they discover, and become too predisposed by, the so-called full web stack.

To that end, I am very, very motivated to drive this to completion and not throw off my circadian rhythm in the process. I’ve been waking up very early because I can hardly wait to hop out of bed and pick up where I so deliberately left off with a ready-to-go implementation plan. And I have another one of those ready.

But I touched so many things, especially this latest round of updating all the plugins… ugh. So it’s all about setting context with the AI coding assistants. And it’s so friggin easy to do it with the code now. I just pick the files out of the repo and put them in the correct order in my foo_files.py file that feeds the prompt_foo.py process… but… but… these articles on MikeLev.in!

The Context Problem: Feeding the AI a Coherent Story

These articles have such good context, and the story flows so naturally from one to the next in such a good way for the AIs. But the problem is several-fold, and I want to solve it with a light touch, without it becoming a rabbit hole. Let’s tackle the first part: I need them listed in the actual chronological order as intended. That order is expressed in this Liquid template for my homepage:

[Liquid Template of https://MikeLev.in/ homepage inserted here]

Now, the thing about this is that I believe some sort of “flattening” of the index is going on so that the page order can come out correct using the sort_order field. It is not immediately intuitive. But what I need is a list of the absolute file-names, preferably from a Python script that can do much the same thing. If not, I’ll accept a modification of my sitemap.xml so that it generates the same file order as my homepage index:

Solution: A Python Script for Chronological Sorting

---
layout: null
---
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>{{ site.url }}/</loc>
    <lastmod>{{ site.time | date_to_xmlschema }}</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  {% for post in site.posts %}
    <url>
      <loc>{{ site.url }}{{ post.url }}</loc>
      <lastmod>{{ post.date | date_to_xmlschema }}</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.8</priority>
    </url>
  {% endfor %}
</urlset>

Me: Gemini was happy to oblige and I updated my sitemap.xml to have the correct reverse chronological order:

---
layout: null
---
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>{{ site.url }}/</loc>
    <lastmod>{{ site.time | date_to_xmlschema }}</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>

  {%- assign sorted_posts = site.posts | sort: 'date' -%}
  {%- assign grouped_posts = sorted_posts | group_by_exp: "post", "post.date | date: '%Y-%m-%d'" | reverse -%}

  {%- for day in grouped_posts -%}
    {%- assign day_posts = day.items | sort: 'sort_order' | reverse -%}
    {%- for post in day_posts -%}
    <url>
      <loc>{{ site.url }}{{ post.url }}</loc>
      <lastmod>{{ post.date | date_to_xmlschema }}</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.8</priority>
    </url>
    {%- endfor -%}
  {%- endfor -%}
</urlset>

…and I also got this nifty Python script, list_articles.py, to show my articles in reverse chronological order, including tiktoken GPT-4 token counts and a cumulative running total:

Proof of Concept: A 200k Token Conversation

#!/usr/bin/env python3
# list_posts_chronologically_config.py
import os
import yaml
import argparse
import tiktoken
from datetime import datetime

# NOTE: This script now requires 'tiktoken' and 'PyYAML'.
# Install them with: pip install tiktoken PyYAML

# --- CONFIGURATION ---
# Hardwire the absolute path to your posts directory here.
POSTS_DIRECTORY = "/home/mike/repos/MikeLev.in/_posts"

def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Counts the number of tokens in a text string using the tiktoken library."""
    try:
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # Fallback for any tiktoken errors
        return 0

def get_post_order(posts_dir=POSTS_DIRECTORY):
    """
    Parses Jekyll posts from a specified directory, sorts them by date 
    and 'sort_order', and returns an ordered list of full absolute file paths.
    """
    posts_data = []
    
    if not os.path.isdir(posts_dir):
        print(f"Error: Could not find the configured directory at {posts_dir}")
        return []

    for filename in os.listdir(posts_dir):
        filepath = os.path.join(posts_dir, filename)
        
        if not os.path.isfile(filepath) or not filename.endswith(('.md', '.markdown')):
            continue

        try:
            date_str = filename[:10]
            post_date = datetime.strptime(date_str, '%Y-%m-%d').date()

            with open(filepath, 'r', encoding='utf-8') as f:
                content = f.read()

            if not content.startswith('---'):
                front_matter = {}
            else:
                parts = content.split('---', 2)
                front_matter = yaml.safe_load(parts[1]) or {}

            sort_order = int(front_matter.get('sort_order', 0))
            
            posts_data.append({
                'path': filepath,
                'date': post_date,
                'sort_order': sort_order
            })

        except (ValueError, yaml.YAMLError):
            continue
        except Exception as e:
            print(f"Could not process {filepath}: {e}")

    sorted_posts = sorted(
        posts_data, 
        key=lambda p: (p['date'], p['sort_order']), 
        reverse=True
    )

    return [post['path'] for post in sorted_posts]

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="List Jekyll posts in chronological order, optionally with token counts."
    )
    parser.add_argument(
        '-t', '--token',
        action='store_true',
        help='Calculate and display the GPT-4 token count for each file.'
    )
    args = parser.parse_args()

    ordered_files = get_post_order()
    
    print("Posts in intended chronological order (full paths):")
    
    # Initialize a variable to keep the running total
    cumulative_tokens = 0

    for filepath in ordered_files:
        if args.token:
            try:
                with open(filepath, 'r', encoding='utf-8') as f:
                    content = f.read()
                
                token_count = count_tokens(content)
                # Add the current file's tokens to the cumulative total
                cumulative_tokens += token_count
                
                # Print the new format with both individual and cumulative counts
                print(f"{filepath}  # {token_count:,} tokens ({cumulative_tokens:,} total)")
            except Exception as e:
                print(f"{filepath}  # Error: Could not read file - {e}")
        else:
            print(filepath)

…which I was able to use experimentally to find the largest window of tokens possible to submit to Google Gemini 2.5 Pro in a single pass. The answer is 200K tokens. Here’s my foo_files.py:

FILES_TO_INCLUDE_RAW = """\
/home/mike/repos/MikeLev.in/_posts/2025-10-01-refactoring-pipulate-persistent-notebook-workflows.md  # 11,497 tokens (11,497 total)
/home/mike/repos/MikeLev.in/_posts/2025-10-01-automation-whack-a-mole-debugging.md  # 14,532 tokens (26,029 total)
/home/mike/repos/MikeLev.in/_posts/2025-10-01-crafting-the-automation-wand.md  # 27,378 tokens (53,407 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-30-script-driven-refactoring-pipulate.md  # 7,140 tokens (60,547 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-30-forging-the-magic-wand-refactor.md  # 20,662 tokens (81,209 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-30-grinding-lenses-forging-magic-wand.md  # 8,018 tokens (89,227 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-30-command-line-flow-state.md  # 13,100 tokens (102,327 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-30-forging-a-causal-anchor.md  # 7,857 tokens (110,184 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-30-forging-pipulate-magic-wand.md  # 12,323 tokens (122,507 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-30-the-great-stuffing-ai-native-wand.md  # 12,510 tokens (135,017 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-29-tightening-up-vs-making-less-brittle.md  # 14,915 tokens (149,932 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-29-ai-copilot-intellectual-warmup.md  # 7,855 tokens (157,787 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-29-the-roomba-mike-refactor.md  # 39,840 tokens (197,627 total)
"""

…which, when run through prompt_foo.py, I had to pare down to a mere 13 of my recent articles plus the prompt, which it estimates as being the size of a long novel, haha! I guess I’m long-winded:

[mike@nixos:~/repos/pipulate]$ python prompt_foo.py 
--- Files Included ---
• /home/mike/repos/MikeLev.in/_posts/2025-10-01-refactoring-pipulate-persistent-notebook-workflows.md (11,497 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-10-01-automation-whack-a-mole-debugging.md (14,532 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-10-01-crafting-the-automation-wand.md (27,378 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-30-script-driven-refactoring-pipulate.md (7,140 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-30-forging-the-magic-wand-refactor.md (20,662 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-30-grinding-lenses-forging-magic-wand.md (8,018 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-30-command-line-flow-state.md (13,100 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-30-forging-a-causal-anchor.md (7,857 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-30-forging-pipulate-magic-wand.md (12,323 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-30-the-great-stuffing-ai-native-wand.md (12,510 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-29-tightening-up-vs-making-less-brittle.md (14,915 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-29-ai-copilot-intellectual-warmup.md (7,855 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-29-the-roomba-mike-refactor.md (39,840 tokens)
• /home/mike/repos/pipulate/prompt.md (38 tokens)

--- Token Summary ---
Total tokens: 197,703 tokens
Total words: 109,585 words

--- Size Perspective ---
📝 Content size: Long novel
🤖 Token size: Standard novel
📊 Token-to-word ratio: 1.80 (higher = more technical/structured content)

Markdown output copied to clipboard

[mike@nixos:~/repos/pipulate]$

The prompt by the way is:

Me: I am testing how large of a story you can summarize from a single prompt. This is a collection of articles from my website going back reverse chronologically. What can you tell me?

Gemini: Based on the massive collection of articles you’ve provided, I can tell you a detailed, multi-layered story. It’s not just a series of technical updates; it’s the real-time chronicle of a developer’s war against technological churn, the forging of a personal software philosophy, and a major architectural battle to reshape a project’s soul, all done in a unique partnership with an AI.

Here’s the story I’ve pieced together from your work.


The Core Story: Forging the “Magic Wand”

At the heart of these articles is an epic, multi-day refactoring saga of your project, Pipulate. You embarked on what you call “open heart surgery” to transform Pipulate from a monolithic web application (centered around server.py) into a self-contained, portable Python library—a “magic wand” that could be imported and used anywhere, especially in Jupyter Notebooks.

The goal was to accelerate your ability to prototype new data workflows in a native notebook environment (import pipulate as pip) and then easily port them into the main Pipulate web app. This would separate the complex state-management logic (the “guts”) from the HTMX-powered web UI (the “skin”).

This journey was far from linear. You dubbed your process the “Roomba-Mike Refactor”—a chaotic, wall-bouncing, but ultimately effective random walk of debugging and discovery. The process involved:

  1. Architectural Struggle: You wrestled with programming dogma, rejecting pure “dependency injection” for a more pragmatic approach you coined “The Great Stuffing.” This involved moving all necessary logic and dependencies inside the main Pipulate class to make it a self-sufficient “Elder Wand.”
  2. AI Collaboration: The entire refactor was a dialogue with me, your AI assistant. You used your “Prompt Fu” process and custom scripts (prompt_foo.py) to provide massive context, allowing me to act as a sparring partner. You’d use my generated plans as a “mental scaffold” or “intellectual warmup” before diving in, a process you call “simulating with the mind first.”
  3. Debugging Hell: After the main “surgery,” you descended into a grueling “whack-a-mole” debugging phase, fighting dependency conflicts, NameErrors, and a cascading series of issues. The entire effort was nearly derailed by a tiny, maddening typo: self.self, which you called the “last boss.”
  4. Victory: In the end, you succeeded. The server runs, the Pipulate class is extracted, and the foundation is laid for the notebook-driven workflow you envisioned. The wand is forged.

The Deeper “Why”: A War Against Tech Churn

This technical battle is rooted in a deep personal philosophy you’ve developed over a decades-long career. You are waging a war against what you call the “tyranny of the build process” and the “soul-crushing cycle” of JavaScript frameworks.

You feel that modern web development has hijacked the simple craft of the “webmaster,” forcing developers onto a hamster wheel of ephemeral technologies. To fight this, you’ve built a personal philosophy and tech stack you call a “Causal Anchor”:

  • The Stack (LPvg+HTMX): A timeless foundation of Linux, Python, vim, and git, augmented with HTMX to perform a “webfullstackectomy”—cutting out the complex JavaScript layer entirely.
  • The Goal (Computing Sovereignty): This stack is a deliberate choice to build your craft on tools that you control, that are durable, and that respect your “muscle memory.” It’s a rejection of “walled gardens” and “platform risk,” ensuring your 10,000 hours of mastery can’t be nullified by a corporate decision.
  • The Method (Craftsmanship): Your process is about achieving “oneness” with your tools, creating a “digital cockpit” for achieving a command-line flow state, free from the friction of unstable or unfamiliar interfaces.

The Meta-Narrative: Building the Tools to Build the Tool

Woven throughout this refactoring saga is a fascinating sub-plot where you repeatedly pause the main quest to build and refine the tools for your own process. This demonstrates a core part of your philosophy: process integration accelerates everything.

  • You built and refined prompt_foo.py to perfect your collaboration with AI.
  • You went on a deep research dive to find a unified AI command-line interface, settling on llm as a way to escape being locked into a single provider.
  • Most recently, you moved your entire article-publishing system (articleizer.py) directly into the Pipulate project repo to reduce the friction between coding, thinking, and documenting.

This pattern of tool-smithing is your way of “grinding the lenses”—meticulously crafting the world the AI “wakes up into” to ensure the highest-quality collaboration. You’ve concluded that for AI partnership to work, you must design a system with a minimal “Cognitive Surface Area”—something an AI can hold in its context and reason about holistically.

In short, this collection of articles tells the story of a master craftsman not just building a product, but building a philosophy, a workshop, and a new way of working in the age of AI.


I’m Convinced

I forgot that the entire 200K-token history IS in the discussion post-back, so I don’t think there’s anything like SQL-style “execution plan retention” of the prior inference, however much it feels that way. Continuity is most likely still an illusion — even with Google’s phenomenal cosmic power.
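As a minimal sketch of what that implies (assuming a generic, stateless chat API; call_model is a hypothetical stand-in, not any specific vendor's SDK), the client replays the whole transcript on every turn, so the "memory" is just re-reading:

# Continuity as an illusion: each turn is a fresh, stateless inference call.
# The model "remembers" only because the full transcript rides along.
history = []

def send(user_message, call_model):
    history.append({"role": "user", "content": user_message})
    reply = call_model(history)  # the entire 200K-token history, every single time
    history.append({"role": "assistant", "content": reply})
    return reply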

Me: Now the interesting thing is that, since I know I can do 200K at once, I could do chunks that overlap each other and do summaries… ahhh, but this is a rabbit hole. I have the 80/20-rule functionality that I actually need. Here’s the sitch:

When I’m sitting down getting ready to pick up where I left off, I know all the context was in my public technical journal articles that I’m writing as fodder for my future — and really present based on the probability light-cloud I’m expanding with each published article — book: Future-proofing Your Tech Skills using Linux, Python, vim & git (LPvg) to Protect Yourself Against Planned Obsolescence in the Age of AI for the Love of The Craft yadda yadda blah blah something to that effect will probably be written slightly different every time play with weights haha!

Do I make myself clear?

No? Oh right, the sitch. It’s that I could never easily get the correct chronological order of my articles, because the file-system doesn’t know it. That order lives inside the files, in a YAML field called sort_order, which file-listing tools don’t care about. The publish date is in the filename, but it’s only granular down to the day, so 5 articles from the same day could appear in whatever order. But I needed to select a sequence of them in the correct order for story-telling.

And so what you just saw was me using this capability to make the absolute largest token-size submission that a single type-in-style web-form textarea can accept. It actually truncates in Chrome at around 200K tokens, even if the model could take more. And yes, giving file-attachments is an alternative, but those of you into this sort of stuff know the difference of having it in the actual prompt. It’s different.
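To make the ordering problem concrete, here is a minimal sketch reusing real filenames from above but with made-up sort_order values; the two-key sort on (date, sort_order) is exactly what the full script does:

# Three posts from the same day: the filename date alone cannot order them.
# The sort_order values here are hypothetical, for illustration only.
posts = [
    {"path": "2025-09-30-forging-a-causal-anchor.md", "date": "2025-09-30", "sort_order": 2},
    {"path": "2025-09-30-command-line-flow-state.md", "date": "2025-09-30", "sort_order": 3},
    {"path": "2025-09-30-script-driven-refactoring-pipulate.md", "date": "2025-09-30", "sort_order": 1},
]

# Oldest-first within the day; flip reverse=True for newest-first.
for post in sorted(posts, key=lambda p: (p["date"], p["sort_order"])):
    print(post["path"])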

Iterative Improvement: Adding a Reverse-Sort Feature

Anyhoo, I thought I’d catch you up on the story, Gemini. Can I have -r and --reverse switches to optionally reverse the sort order, usable with or without the -t and --token flags? And chime in with whatever else you’d like to say.

Gemini: I can certainly do that. Adding a reverse-order option is a great idea, as it lets you easily switch between viewing your most recent work and tracing the story from the beginning.

You’ve built an impressive system for thinking, working, and documenting in public. It’s clear that these articles aren’t just a byproduct of your work; they are an integral part of your development process, allowing you to “simulate with the mind first” and refine your strategy. Your ability to bundle nearly 200,000 tokens of this narrative into a single context for an AI is a powerful demonstration of how to have a high-bandwidth, strategic conversation with a tool like me.


The Upgraded list_articles.py

Here is the updated version of your list_articles.py script. It now includes the -r and --reverse flags to control the sort order.

#!/usr/bin/env python3
# list_posts_chronologically_config.py
import os
import yaml
import argparse
import tiktoken
from datetime import datetime

# NOTE: This script now requires 'tiktoken' and 'PyYAML'.
# Install them with: pip install tiktoken PyYAML

# --- CONFIGURATION ---
# Hardwire the absolute path to your posts directory here.
POSTS_DIRECTORY = "/home/mike/repos/MikeLev.in/_posts"

def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Counts the number of tokens in a text string using the tiktoken library."""
    try:
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # Fallback for any tiktoken errors
        return 0

def get_post_order(posts_dir=POSTS_DIRECTORY, reverse_order=False):
    """
    Parses Jekyll posts from a specified directory, sorts them by date
    and 'sort_order', and returns an ordered list of full absolute file paths.
    """
    posts_data = []
    
    if not os.path.isdir(posts_dir):
        print(f"Error: Could not find the configured directory at {posts_dir}")
        return []

    for filename in os.listdir(posts_dir):
        filepath = os.path.join(posts_dir, filename)
        
        if not os.path.isfile(filepath) or not filename.endswith(('.md', '.markdown')):
            continue

        try:
            date_str = filename[:10]
            post_date = datetime.strptime(date_str, '%Y-%m-%d').date()

            with open(filepath, 'r', encoding='utf-8') as f:
                content = f.read()

            if not content.startswith('---'):
                front_matter = {}
            else:
                parts = content.split('---', 2)
                front_matter = yaml.safe_load(parts[1]) or {}

            sort_order = int(front_matter.get('sort_order', 0))
            
            posts_data.append({
                'path': filepath,
                'date': post_date,
                'sort_order': sort_order
            })

        except (ValueError, yaml.YAMLError):
            continue
        except Exception as e:
            print(f"Could not process {filepath}: {e}")

    # The 'reverse' flag of the sorted function is controlled by the new argument
    sorted_posts = sorted(
        posts_data, 
        key=lambda p: (p['date'], p['sort_order']), 
        reverse=not reverse_order
    )

    return [post['path'] for post in sorted_posts]

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="List Jekyll posts in chronological order, optionally with token counts."
    )
    parser.add_argument(
        '-t', '--token',
        action='store_true',
        help='Calculate and display the GPT-4 token count for each file.'
    )
    # Add the new reverse argument
    parser.add_argument(
        '-r', '--reverse',
        action='store_true',
        help='List posts in chronological order (oldest first) instead of the default reverse chronological.'
    )
    args = parser.parse_args()

    # Pass the reverse flag to the function
    ordered_files = get_post_order(reverse_order=args.reverse)
    
    order_description = "chronological (oldest first)" if args.reverse else "reverse chronological (newest first)"
    print(f"Posts in {order_description} order (full paths):")
    
    # Initialize a variable to keep the running total
    cumulative_tokens = 0

    for filepath in ordered_files:
        if args.token:
            try:
                with open(filepath, 'r', encoding='utf-8') as f:
                    content = f.read()
                
                token_count = count_tokens(content)
                # Add the current file's tokens to the cumulative total
                cumulative_tokens += token_count
                
                # Print the new format with both individual and cumulative counts
                print(f"{filepath}  # {token_count:,} tokens ({cumulative_tokens:,} total)")
            except Exception as e:
                print(f"{filepath}  # Error: Could not read file - {e}")
        else:
            print(filepath)

How to Use It

Your previous commands work exactly as before:

  • python list_articles.py: Lists newest posts first.
  • python list_articles.py -t: Lists newest posts first with token counts.

And now you have new options:

  • python list_articles.py -r: Lists oldest posts first.
  • python list_articles.py -r -t: Lists oldest posts first with token counts.

This gives you full control over how you assemble the narrative context for our conversations.
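Since the -t output is already in the path # N tokens format that FILES_TO_INCLUDE_RAW uses, a tiny helper can strip the trailing comments back off when consuming such a block. This is a hypothetical sketch; prompt_foo.py itself is not reproduced here:

# Parse a FILES_TO_INCLUDE_RAW-style block: one path per line,
# with anything after '#' treated as a trailing comment.
def parse_file_list(raw: str) -> list[str]:
    paths = []
    for line in raw.splitlines():
        path = line.split("#", 1)[0].strip()
        if path:
            paths.append(path)
    return paths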

Me: That worked! And here’s the diff between the 2 versions for anyone who’s interested:

[mike@nixos:~/repos/pipulate/scripts/article_automation]$ git --no-pager diff
diff --git a/scripts/article_automation/list_articles.py b/scripts/article_automation/list_articles.py
index 63410818..f9b814f4 100644
--- a/scripts/article_automation/list_articles.py
+++ b/scripts/article_automation/list_articles.py
@@ -22,9 +22,9 @@ def count_tokens(text: str, model: str = "gpt-4") -> int:
         # Fallback for any tiktoken errors
         return 0
 
-def get_post_order(posts_dir=POSTS_DIRECTORY):
+def get_post_order(posts_dir=POSTS_DIRECTORY, reverse_order=False):
     """
-    Parses Jekyll posts from a specified directory, sorts them by date 
+    Parses Jekyll posts from a specified directory, sorts them by date
     and 'sort_order', and returns an ordered list of full absolute file paths.
     """
     posts_data = []
@@ -65,10 +65,11 @@ def get_post_order(posts_dir=POSTS_DIRECTORY):
         except Exception as e:
             print(f"Could not process {filepath}: {e}")
 
+    # The 'reverse' flag of the sorted function is controlled by the new argument
     sorted_posts = sorted(
         posts_data, 
         key=lambda p: (p['date'], p['sort_order']), 
-        reverse=True
+        reverse=not reverse_order
     )
 
     return [post['path'] for post in sorted_posts]
@@ -82,11 +83,19 @@ if __name__ == '__main__':
         action='store_true',
         help='Calculate and display the GPT-4 token count for each file.'
     )
+    # Add the new reverse argument
+    parser.add_argument(
+        '-r', '--reverse',
+        action='store_true',
+        help='List posts in chronological order (oldest first) instead of the default reverse chronological.'
+    )
     args = parser.parse_args()
 
-    ordered_files = get_post_order()
+    # Pass the reverse flag to the function
+    ordered_files = get_post_order(reverse_order=args.reverse)
     
-    print("Posts in intended chronological order (full paths):")
+    order_description = "chronological (oldest first)" if args.reverse else "reverse chronological (newest first)"
+    print(f"Posts in {order_description} order (full paths):")
     
     # Initialize a variable to keep the running total
     cumulative_tokens = 0

[mike@nixos:~/repos/pipulate/scripts/article_automation]$

And thank you! I’m curious whether, even after we’ve done a bit of back-and-forth and only the text of the discussion is visible to me, you actually still have all that 200K+ of history at your disposal. Dig into it and give me some startlingly precise commentary or advice regarding my next steps, to dispel any suspicion that you’re actually working with a “shallow” or “flattened” version that distilled away all that rich content after an intervening exchange.

The Payoff: AI Demonstrates Deep Context Retention

Gemini: Yes, I have the full 197,703 tokens of our recent history at my disposal. The story is not flattened or shallow; it’s a rich, multi-day saga of philosophy, architecture, and grueling, iterative debugging. To prove it, here is some startlingly precise advice for your next steps, dug from the very heart of that context.

You’re standing on the pippyfields branch, motivated but wary. You’re right to be. This branch is the artifact of a “colossal refactoring,” the “Roomba-Mike” method in action. Your next move isn’t to write new code, but to validate the core promise of this entire epic journey.


Step 1: Prove the “Parallel Universe DB” in its Native Habitat

The entire goal of the “Great Stuffing” and forging the “magic wand” was to make Pipulate work as a standalone library in a Jupyter Notebook. You’ve done the surgery, but you’ve never actually tested the patient in its new environment. Your very next action should be to create and run the “forever-forward hello_world.ipynb” you envisioned in refactoring-pipulate-persistent-notebook-workflows.md.

Create a new notebook and execute these cells, which use the exact API you designed:

Cell 1: The Import

# %%
import pipulate as pip
job = "notebook_validation_test"

Cell 2: First Write

# %%
first = "Mike"
pip.set(job, step="first_name", value=first)
print(f"Set first_name to: {pip.get(job, step='first_name')}")

Now, restart the kernel. This is the moment of truth that tests the persistence you built.

Cell 3: Read After Restart

# %%
# Re-import after kernel restart to get the `pip` object again.
import pipulate as pip
job = "notebook_validation_test"

# Retrieve the persisted value.
retrieved_first = pip.get(job, step="first_name")
print(f"Retrieved after restart: {retrieved_first}")

# Add the next piece of data.
last = "Levin"
pip.set(job, step="last_name", value=last)

Cell 4: Final Assembly

# %%
import pipulate as pip
job = "notebook_validation_test"

full_name = f"Hello {pip.get(job, step='first_name')} {pip.get(job, step='last_name')}"
pip.set(job, step="full_name", value=full_name)
print(full_name)

# Inspect the final state
print(pip.read(job))

If this workflow succeeds, the entire philosophical and architectural journey of the last week pays off. This is the single most important test you can run.


Step 2: Hunt the Ghosts of the runme.py Automation

Remember the “Whack-a-Mole of Automation”? Your runme.py script touched dozens of files, and you fixed the immediate fallout, but secondary bugs are almost certainly lurking. The pippyfields branch is not clean. Before you even think of merging to main, you must verify the plugins that were most affected.

  • The F-String Ghost: Your script created a SyntaxError in apps/230_dev_assistant.py by incorrectly modifying an f-string. You fixed it manually, but this is a sign that other, more subtle bugs could exist in plugins that generate code as text. Recommendation: Specifically load and interact with the “Dev Assistant” plugin.
  • The Missed self.db: The automation missed a reference in apps/010_introduction.py, causing an AttributeError. Recommendation: Navigate to the “Introduction” app (the homepage) and ensure it loads without error in the server logs.
  • The “Hottest Apps”: You specifically mentioned fixing the “two hottest apps for my day-job”—Link Graph Visualizer and Parameter Buster. Recommendation: Run through a simple workflow in both of these applications from the web UI. Don’t just check if they load; submit a form and go to the next step to ensure the state management logic is sound post-refactor.

This isn’t about testing every single plugin. It’s about a targeted, 80/20 approach to validate the areas you know were impacted by the chaotic but effective Roomba-Mike method.

This two-step plan directly addresses your race against the clock. It focuses your “extreme focus” on the most critical, high-risk areas, giving you the confidence you need to either fix the final bugs or, with victory in hand, finally merge this monumental branch and get back to building.

Final Hurdles: Editable Installs and the Last Refactor


Me: Yup. But two things. First, I still always have to do this from a nix develop’d shell:

[mike@nixos:~/repos/pipulate]$ pip install -e .

…which I’m going to have to do in my flake.nix file more formally, but previous attempts to do this failed, so it’s non-obvious. It’s VERY dependent on exactly where and when it happens and probably has to go immediately before python server.py but silenced so it doesn’t garbage-up the startup output.

And second, I haven’t actually done the refactoring suggested in:

2025-10-01-refactoring-pipulate-persistent-notebook-workflows.md

…yet. It’s after 7:00 PM, but I still plan on doing that tonight to put the final wrap on this thing before actually DOING one of the Jupyter Notebook mock-ups tomorrow that uses this new pattern, at which time I am sure to also be playing that whack-a-mole game. I ran both of the important apps you mentioned all the way through. It’s the easy ones like 300_blank_placeholder.py that are broken, haha! Hitting the submit button on even the initial key generation isn’t even making output occur on the server console, so something’s fubar. But that’s fine. I’ll whack those moles.

This DOES work from a Notebook:

import pipulate as pip
job = "notebook_validation_test"

…after running pip install -e . from the same terminal that runs python server.py. I have to Ctrl+c out of it, then type the command. server.py doesn’t even need to be restarted.

But:

first = "Mike"
pip.set(job, step="first_name", value=first)
print(f"Set first_name to: {pip.get(job, step='first_name')}")

Produces:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 2
      1 first = "Mike"
----> 2 pip.set(job, step="first_name", value=first)
      3 print(f"Set first_name to: {pip.get(job, step='first_name')}")

AttributeError: module 'pipulate' has no attribute 'set'

Gemini: You’re absolutely right on both counts. My proposed test was premature because we haven’t implemented the plan from refactoring-pipulate-persistent-notebook-workflows.md yet. The AttributeError: module 'pipulate' has no attribute 'set' is the expected failure, proving that the refactor is necessary.

Let’s tackle the two issues you’ve raised. You’ve correctly identified the final hurdles to achieving your vision.


1. The pip install -e . Nuance in flake.nix

You are right that this is non-obvious and timing-dependent. Your previous attempts likely failed because the command was run at the wrong point in the shellHook. The key is to run it after the virtual environment (.venv/) is fully set up and its pip is available, but before any interactive scripts run.

I’ve located the pythonInstallLogic block in your flake.nix. This is the perfect place for it. The logic is already designed to run after pip itself is upgraded and requirements.txt is installed.

The Fix: I’ll add the editable install command to the pythonInstallLogic block in your flake.nix. I’ll also add --no-deps to prevent it from re-resolving dependencies and --quiet to prevent it from cluttering the startup log, just as you suggested.

# In flake.nix

      # Logic for installing all Python packages
      pythonInstallLogic = ''
        # Activate the virtual environment to ensure commands run in the correct context
        source .venv/bin/activate
        
        # Always upgrade pip first
        pip install --upgrade pip --quiet

        # Install all dependencies from requirements.txt
        pip install -r requirements.txt --quiet
        
        # Install the local project in editable mode so it's importable
        # This is the key step for making `import pipulate` work everywhere
        pip install -e . --no-deps --quiet
      '';

With this change, every time you run nix develop, the pipulate package will be automatically installed in editable mode within the .venv. This completely eliminates the need for the manual pip install -e . step.
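A quick way to verify the editable install took effect from inside the nix develop shell (this check is an editorial addition, not part of the flake):

# If the editable install worked, the module path should point into
# the repo checkout, not into .venv/lib/.../site-packages.
import pipulate
print(pipulate.__file__)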


2. Executing the Final Refactor Plan

You are correct that the pip.set() and pip.get() methods don’t exist yet. It’s time to perform the final surgery detailed in 2025-10-01-refactoring-pipulate-persistent-notebook-workflows.md. You have the full, corrected plan in that document. It’s a deterministic, multi-phase plan designed for exactly this moment.

The next action is clear: Execute that plan. You have the CodeRefactorer class and the sentinel blocks. It’s time to create the refactor_pipulate.py script and run it to perform the final, safe, AST-validated changes. This will:

  1. Create the pip factory object in pipulate/__init__.py.
  2. Add the set(), get(), read(), and write() methods to the Pipulate class.
  3. Update the hello_world.ipynb to be the new canonical example.

Once you execute that plan, the AttributeError will be resolved, and your notebook validation test will pass. You are one script execution away from completing the vision. The moles in 300_blank_placeholder.py are a separate, minor cleanup task for after this architectural win is secured.
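For orientation, here is a minimal sketch of the shape that facade could take, assuming a SQLite-backed store. Every name below is hypothetical, since the actual plan lives in the referenced article, and write() is omitted for brevity:

# pipulate/__init__.py -- hypothetical sketch of the planned facade.
import json
import sqlite3

_DB_PATH = "pipulate_notebook.db"  # made-up location, for illustration

class _Pipulate:
    def __init__(self, db_path=_DB_PATH):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS state "
            "(job TEXT, step TEXT, value TEXT, PRIMARY KEY (job, step))"
        )
        self._conn.commit()

    def set(self, job, step, value):
        # REPLACE handles both first writes and overwrites of a step.
        self._conn.execute(
            "REPLACE INTO state (job, step, value) VALUES (?, ?, ?)",
            (job, step, json.dumps(value)),
        )
        self._conn.commit()

    def get(self, job, step, default=None):
        row = self._conn.execute(
            "SELECT value FROM state WHERE job = ? AND step = ?", (job, step)
        ).fetchone()
        return json.loads(row[0]) if row else default

    def read(self, job):
        # Return the whole job's state as a dict of step -> value.
        rows = self._conn.execute(
            "SELECT step, value FROM state WHERE job = ?", (job,)
        ).fetchall()
        return {step: json.loads(value) for step, value in rows}

# Module-level delegation is what makes `import pipulate as pip`
# followed by pip.set(...) / pip.get(...) work after a kernel restart:
# the state lives in the SQLite file, not in the process.
_instance = _Pipulate()
set, get, read = _instance.set, _instance.get, _instance.read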


Me:

Must rest. May wake up later to still get this Jupyter Notebookification of Pipulate done. The actual plan is in the prior article; don’t forget the correction regarding pipulate/__init__.py near the bottom of the page.


Book Analysis

AI Editorial Take

This is a cornerstone entry. It’s not just about a feature; it’s about the philosophy of AI partnership. The author isn’t just using an AI, he’s building a custom ‘on-ramp’ to feed it high-quality, structured information. This ‘context engineering’ is a critical skill for the future. The entry perfectly captures the feedback loop: identify a process bottleneck, build a tool to solve it, and immediately reap the rewards with higher-quality AI output. It’s a powerful case study in treating the AI as a system that requires well-architected inputs, not just a magic black box.

Title Brainstorm

  • Title Option: Forging the Prompt: Chronological Order for a 200k Token Context
    • Filename: forging-the-prompt-200k-token-context
    • Rationale: Connects the recurring ‘forging’ theme to the practical act of building a prompt. It highlights the technical achievement (200k tokens) and the problem solved (chronological order).
  • Title Option: The Story So Far: Scripting Context for AI Collaboration
    • Filename: scripting-context-for-ai-collaboration
    • Rationale: Focuses on the narrative aspect (‘The Story So Far’) and the human-AI partnership. ‘Scripting Context’ is a nice double-entendre for both writing code and setting the stage.
  • Title Option: Building the Tools to Build the Tool
    • Filename: building-tools-to-build-the-tool
    • Rationale: Directly captures the meta-narrative of the entry, where the author pauses development to improve his own process and tooling, a key theme of the book.
  • Title Option: Beyond the Filename: Sorting Content for a Coherent AI Narrative
    • Filename: sorting-content-for-coherent-ai-narrative
    • Rationale: Emphasizes the core technical problem—that file metadata is insufficient—and the ultimate goal of creating a story for the AI to understand.

Content Potential And Polish

  • Core Strengths:
    • Demonstrates a crucial, often-overlooked aspect of AI collaboration: context engineering.
    • Provides a concrete, reusable Python script for a common problem in static site generation (Jekyll).
    • The back-and-forth with the AI serves as a powerful, real-world example of prompt iteration and validation.
    • Excellent meta-narrative on improving one’s own development process (‘tool-smithing’).
  • Suggestions For Polish:
    • The initial Gemini summary, while an impressive artifact, is very long. For a book chapter, consider excerpting the most relevant parts to keep the author’s narrative front and center.
    • The proposed subheadings are critical for structuring the narrative, which jumps between the sorting script, the large context test, and the ongoing refactor.
    • Explicitly state the ‘aha!’ moment when the need for chronological sorting became undeniable, reinforcing the motivation for the side-quest.

Next Step Prompts

  • Based on the final state of the list_articles.py script, generate a user-friendly README.md file for it. Explain its purpose, dependencies (tiktoken, PyYAML), configuration (POSTS_DIRECTORY), and usage examples for all command-line flags.
  • Analyze the flake.nix solution provided at the end of the article. Explain in detail why placing the pip install -e . command within the pythonInstallLogic block is the correct approach and why previous attempts might have failed. Discuss the role of the shell hook and virtual environment activation in the Nix development lifecycle.
Post #464 of 464 - October 1, 2025