Introductory Context
This article delves into the practical aspects of communicating effectively with Large Language Models (LLMs), the powerful AI systems behind tools like ChatGPT and Claude. The author explores a specific technique for improving these interactions: structuring the instructions, or “prompts,” given to the AI using XML-like tags. This method, suggested by AI research company Anthropic, aims to help the AI better distinguish between different parts of a request, such as background information, specific instructions, and examples.
The author applies this XML tagging approach to a custom Python tool they developed, called `context_foo.py`. This tool is designed to gather and format large amounts of information (like source code from a software project or text from blog archives) to be fed into an LLM for analysis. By incorporating XML tags, the author hopes to enhance the AI’s understanding of this complex context, leading to more accurate and insightful responses, particularly when analyzing their web application project, “Pipulate.” The article documents the process of modifying the tool and initial tests using a specific AI model (o4-mini-high).
XML vs. Plain Markdown For Large Prompts
Okay, my next step is making the super-prompts that I use for 1-shot evaluation between the different models use the XML wrapper format for prompts that Anthropic suggests. Let’s start by clarifying what that is, exactly:
Me: What are the Anthropic-proposed XML prompt tags?
Gemini 2.5 Exp: Anthropic recommends using XML-like tags to structure prompts for their Claude AI models. This practice helps the model better understand the different components of a prompt, leading to more accurate and relevant responses.
While Anthropic doesn’t enforce a strict, fixed set of mandatory tags, they provide examples and guidance suggesting descriptive tags to delineate sections like:
- Instructions/Task Definition: Tags like `<instructions>`, `<task>`, or `<prompt>` can enclose the core directions for the AI.
- Context/Documents: Tags such as `<context>`, `<document>`, `<text>`, or `<information>` are used to provide background information, source material, or data the AI should reference.
- Examples (Few-Shot Learning): Tags like `<example>` can encapsulate input/output pairs to guide the model’s response format or style. Often nested tags like `<input>` and `<output>` or `<user>` and `<assistant>` are used within `<example>`.
- User Input: A tag like `<query>`, `<input>`, or `<question>` might specifically hold the user’s question or the data to be processed based on the instructions.
- Role-Playing: Tags like `<user>` and `<assistant>` are commonly used to structure dialogues or indicate turns in a conversation.
- Chain-of-Thought/Reasoning: A tag like `<thinking>` can be used to prompt the model to show its reasoning steps before providing the final answer.
Key Principles:
- Clarity: The main goal is to clearly separate different parts of the prompt (e.g., instructions vs. context vs. examples vs. specific input).
- Descriptiveness: Use tag names that logically describe the content they enclose.
- Consistency: Maintain consistency in tag usage within a single prompt or conversation.
- Nesting: Tags can be nested to create hierarchical structures (e.g., multiple `<document>` tags within a main `<context>` tag).
Using these XML-like tags helps Claude parse complex prompts more effectively, understand the role of each piece of information, and ultimately generate better outputs. Users can often define their own meaningful tags as long as they are used consistently to structure the prompt logically.
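To make that concrete, here is a minimal sketch of a prompt assembled with the tag names described above. The tag choices and content are illustrative only, not an Anthropic-mandated schema:

```python
# Illustrative only: assembling a prompt with Anthropic-style XML tags.
document_text = "Q1 revenue grew 12% year over year, driven by subscriptions."
question = "What drove revenue growth in Q1?"

prompt = f"""<instructions>
Answer the question using only the material in <document>.
Show your reasoning inside <thinking> before giving the final answer.
</instructions>

<document>
{document_text}
</document>

<question>
{question}
</question>"""

print(prompt)
```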
Implementing XML-wrapped Prompt Generation
Navigating Cursor’s Model Selection
Alright, so now we go over to Cursor. Quite frankly, I don’t know which model I’m using in chat mode at any given moment anymore. It’s a bit obfuscated when you keep model selection on Auto-select, which lets Cursor pick the best premium model based on performance and speed. It also appears to reduce the actual slowness of “slow mode,” which happens more and more as you get deeper into the month and Cursor tries cajoling you into a higher paid tier.
I’m not 100% sure, but I think keeping it on Claude 3.7 Sonnet was causing the super-slow slow mode, so I’m experimenting with Auto. My gut tells me that there’s a “going to the well one time too often” effect happening. Cursor appears to be trying to serve you fast despite being in “slow mode,” but if you try keeping it “locked onto” your favorite model, it undermines whatever model load-balancing or round-robinning they might be doing.
Streamlining the Article Content
Additionally, most of the coding is actually done by the time I do the following prompt. That is to spare you an excessively long article. I know some people might like to see the play-by-play evolution of the code, but when there’s a lot of back-and-forth iteration, it would be sizable and undermine the article. So at the end of such a coding session, I ask the AI (which one, I don’t know) to summarize what we just did.
Prompt: Summarize in article format what we just did. Include context regarding the purpose of the @context_foo.py program, and what we just did to theoretically improve it and why this might work.
Response:
Enhancing Context Generation with XML Structure: A Case Study in Prompt Engineering
Introduction
In the realm of large language model (LLM) interactions, the quality and structure of prompts significantly impact the model’s ability to understand and respond effectively. This article explores an enhancement to `context_foo.py`, a Python utility designed to generate comprehensive context files for LLM interactions, particularly when working with large codebases or blog archives.
The Original Context Generator
`context_foo.py` serves a crucial role in the Pipulate project, a Python web application that combines Nix for system dependencies with virtualenv for Python packages. The script’s primary purpose is to:
- Aggregate multiple files into a single context
- Manage token limits for different LLM models
- Generate structured prompts for codebase analysis
- Handle blog post processing and chunking
- Provide clipboard functionality for easy prompt injection
The original implementation used a combination of markdown-style headers and plain text formatting to structure its output. While functional, this approach lacked the semantic clarity that modern LLMs can leverage for better understanding.
The XML Enhancement
Recognizing the potential for improved model comprehension, we implemented XML structure throughout the codebase. The enhancement focused on three main areas:
1. XML Support Functions
```python
from typing import Dict, Optional

def wrap_in_xml(content: str, tag_name: str, attributes: Optional[Dict[str, str]] = None) -> str:
    """Wrap content in XML tags with optional attributes."""
    attrs = " ".join(f'{k}="{v}"' for k, v in (attributes or {}).items())
    return f"<{tag_name}{' ' + attrs if attrs else ''}>\n{content}\n</{tag_name}>"
```
These helper functions provide a clean, type-safe way to create structured XML content, making the code more maintainable and the output more consistent.
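For example, wrapping a file snippet with an attribute produces a tagged block like this (output shown in the comments):

```python
snippet = wrap_in_xml("print('hello')", "document", {"filename": "hello.py"})
print(snippet)
# <document filename="hello.py">
# print('hello')
# </document>
```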
2. Prompt Template Restructuring
The prompt templates were transformed from plain text to structured XML:
<template name="General Codebase Analysis">
<context>
<system_info>
This codebase uses a hybrid approach with Nix for system dependencies...
</system_info>
<key_points>
<point>Always run `nix develop` before any commands</point>
<point>FastHTML objects must be converted with to_xml()</point>
</key_points>
</context>
<task>
<analysis_areas>
<area>Technical Architecture Analysis</area>
<area>Pattern Recognition & Insights</area>
</analysis_areas>
</task>
</template>
3. Manifest and Chunk Processing
The `AIAssistantManifest` class and chunk processing were enhanced to use XML structure (a rough sketch follows the list below), providing:
- Clear semantic boundaries between different types of content
- Structured metadata for files and chunks
- Improved token counting and management
- Better organization of nested content
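As a rough illustration, reusing the `wrap_in_xml` helper from above, a manifest entry might be rendered like this. The function and field names here are hypothetical and are not lifted from `context_foo.py`:

```python
# Hypothetical sketch of rendering one manifest entry as XML;
# AIAssistantManifest's real API in context_foo.py may differ.
def render_file_entry(path: str, description: str, tokens: int) -> str:
    metadata = wrap_in_xml(f"<path>{path}</path>\n<tokens>{tokens}</tokens>", "metadata")
    body = wrap_in_xml(description, "description")
    return wrap_in_xml(f"{metadata}\n{body}", "file")

print(render_file_entry("server.py", "Main FastHTML server and plugin loader", 12000))
```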
Theoretical Benefits
The XML enhancement offers several potential advantages:
- Improved Model Understanding
  - Clear semantic boundaries help models better parse different sections
  - Structured metadata provides explicit context about content types
  - Hierarchical organization makes relationships between components explicit
- Better Token Management
  - More precise token counting for different sections (see the sketch after this list)
  - Clearer separation of metadata vs. content
  - Better organization of different content types
- Enhanced Content Organization
  - Hierarchical structure for complex prompts
  - Clear relationships between different sections
  - Better handling of nested examples and context
- Future Extensibility
  - Easier to add new content types
  - Better support for different model requirements
  - More flexible prompt structure
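A minimal sketch of the per-section token counting point: when each section lives inside its own tag, its token cost can be measured independently. This uses tiktoken as an assumption; the article does not confirm which tokenizer `context_foo.py` actually uses:

```python
# Assumption: tiktoken is available; context_foo.py may count tokens differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

sections = {
    "manifest": "<manifest>...</manifest>",
    "context": "<context>...</context>",
    "task": "<task>...</task>",
}

for name, text in sections.items():
    print(name, len(enc.encode(text)), "tokens")
```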
Why This Might Work
The XML enhancement aligns with several principles of effective prompt engineering:
- Explicit Structure: XML tags provide clear boundaries and relationships between different parts of the prompt, helping the model understand the context better.
- Semantic Clarity: The use of descriptive tag names (e.g., `<system_info>`, `<key_points>`, `<analysis_areas>`) gives the model explicit information about the purpose of each section.
- Consistent Formatting: The structured approach ensures consistent formatting across different types of content, reducing potential confusion.
- Hierarchical Organization: The nested XML structure mirrors how humans naturally organize information, potentially making it easier for models to process and understand the content.
Conclusion
The XML enhancement to `context_foo.py` represents a step forward in prompt engineering, particularly for complex codebase analysis tasks. By providing clearer structure and semantic meaning to the content, we may enable models to better understand and respond to the prompts, leading to more accurate and relevant analysis of codebases and blog archives.
While the effectiveness of this enhancement will need to be validated through practical use, the theoretical foundations suggest it could significantly improve the quality of model responses, particularly for complex analysis tasks involving large codebases or extensive documentation.
Testing Super-prompt With o4-mini-high
The whole purpose of the context foo super-prompt is to feed a whole bunch of files into the big frontier LLM models with lots of context for some purpose. The default purpose is just to perform a code analysis. Here’s an extract of the massive prompt to give you the gist.
=== Prompt Structure ===
--- Manifest ---
• Critical Pattern: to_xml(ft_object)
• Critical Pattern: HTMLResponse(str(to_xml(rendered_item)))
--- Pre-Prompt ---
System Information:
This codebase uses a hybrid approach with Nix for system dependencies and virtualenv for Python packages.
Key things to know:
- Always run `nix develop` before any commands in a new terminal
- FastHTML objects must be converted with to_xml() before returning responses
- The project is organized as a server with plugin-based workflows
Key Points:
• Always run `nix develop` before any commands in a new terminal
• FastHTML objects must be converted with to_xml() before returning responses
• The project is organized as a server with plugin-based workflows
--- Files Included ---
• README.md
• flake.nix
• server.py
• plugins/20_hello_workflow.py
• training/hello_workflow.md
• plugins/30_tasks.py
• training/tasks.md
• .cursorrules
--- Post-Prompt ---
Introduction:
Now that you've reviewed the codebase context, I'd love your insights and analysis!
Dear AI Assistant:
I've provided you with the core architecture of a Python web application that takes an interesting approach to modern web development. I'd appreciate your thoughtful analysis on any of these aspects:
Analysis Areas:
• Technical Architecture Analysis
• Pattern Recognition & Insights
• Forward-Looking Perspective
Focus Areas:
• The interplay between modern and traditional web development approaches
• Architectural decisions that stand out as novel or counterintuitive
• Potential implications for developer experience and system maintenance
--- Token Summary ---
Total tokens: 50,226 tokens (≈$1.51 at GPT-4 rates)
Maximum allowed: 2,087,152 tokens (≈$62.61 at GPT-4 rates)
Remaining: 2,036,926 tokens (≈$61.11 at GPT-4 rates)
=== End Prompt Structure ===
ChatGPT o4-mini-high: Here’s a deep dive into the architecture and design of Pipulate, organized around the areas you highlighted.
Technical Architecture Analysis
Hybrid Nix + virtualenv Approach
- Reproducibility vs. Familiarity
  Nix flakes guarantee bit‑perfect environments across machines, making “works on my machine” a non‑issue. On top of that, a `.venv` lets you pip‑install arbitrary Python packages at runtime. Contrast that with Docker (containerizes apps but still relies on vendor images) or pure pip/requirements.txt (brittle across OSes). Pipulate’s hybrid model gives you both reproducible system deps (via Nix) and flexible Python‑level deps—at the cost of a steeper on‑boarding curve (learning Nix) and slightly more complex tooling.
- Trade‑off
  - Pros: True hermetic builds, cross‑OS compatibility (including WSL), optional CUDA/unfree support.
  - Cons: Nix’s learning curve, potential for slow `nix develop` spins, and the mental overhead of two package managers.
HTMX with Server‑Side State vs. Traditional SPAs
- Push‑Pull Simplicity
  Instead of bundling a JS app, HTMX issues small HTML fragments over AJAX, with state managed entirely on the server. This removes client‑side complexity (no React/Redux) and reduces JS bundle size dramatically.
- Latency & Interactivity
  - Pros: Instant developer feedback, simpler debugging (inspect JSON blobs or HTML), no hydration or diff‑ing.
  - Cons: More network “chattiness” per interaction, potential higher latency per action, limited offline capabilities, and less rich client‑side UX possibilities (e.g., complex drag‑drop beyond simple sortable lists).
- Comparison
  SPAs shine for highly interactive dashboards and offline apps. HTMX shines for form‑centric flows and CRUD, where each interaction naturally maps to a server render.
Plugin System Extensibility
- Auto‑Discovery via Filename Conventions
  Numeric prefixes (`20_hello_workflow.py` → clean name `hello_workflow`) let you reorder menu items. Experimental plugins (`xx_…`) are skipped. Parentheses in filenames skip auto‑registration. This zero‑config convention makes it trivial to add/remove apps (a sketch of this discovery logic follows the list).
- Uniform Constructor Signature
  Workflow classes take `(app, pipulate, pipeline, db)` and expose methods like `landing()`, `step_xx()`, etc. CRUD plugins subclass a shared `BaseCrud`. This shared signature means new plugins “just work” in the menu, chat, and routing.
- Trade‑off
  Low friction to extend, but less explicit: newcomers must learn the naming conventions and signature requirements to plug in correctly.
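Here’s a minimal sketch of that filename-convention discovery, assuming only the conventions described above; the actual logic in Pipulate’s `server.py` will differ in its details:

```python
# Hypothetical sketch of filename-convention plugin discovery, based on the
# conventions described above; Pipulate's actual server.py logic may differ.
import re
from pathlib import Path

def discover_plugins(plugins_dir: str = "plugins") -> dict[str, Path]:
    plugins = {}
    for path in sorted(Path(plugins_dir).glob("*.py")):
        name = path.stem
        if name.startswith("xx_") or "(" in name:
            continue  # experimental or parenthesized files are skipped
        clean_name = re.sub(r"^\d+_", "", name)  # 20_hello_workflow -> hello_workflow
        plugins[clean_name] = path
    return plugins

print(discover_plugins())
```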
Pattern Recognition & Insights
Surprising Patterns
- WET Workflows vs. DRY CRUD
  Workflows are intentionally “Write Everything Twice” (WET) to maximize per‑step customizability, whereas CRUD apps are “Don’t Repeat Yourself” (DRY) via inheritance. You rarely see both extremes coexisting so clearly in one codebase.
- Server‑Side DictLikeDB “Cookies”
  Persisting UI state in a SQLite table but treating it like a cookie jar is a clever hack: fully observable, centralized, and no need for client cookies or Redis (a rough sketch of such a wrapper follows this list).
- Reactive UI Without JS
  Relying on HTMX and pure Python to generate every UI update—no JSX or TS at all—is counterintuitive in a world obsessed with client‑side frameworks.
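A minimal sketch of that “cookie jar” idea, a dict-like wrapper over a SQLite table. This is illustrative only and is not the DictLikeDB implementation from Pipulate’s `server.py`:

```python
# Illustrative only: a dict-like key/value store backed by SQLite, in the
# spirit of the DictLikeDB "server-side cookies" described above.
import sqlite3

class DictLikeStore:
    def __init__(self, path: str = "state.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS store (key TEXT PRIMARY KEY, value TEXT)")

    def __setitem__(self, key: str, value: str):
        self.conn.execute("INSERT OR REPLACE INTO store (key, value) VALUES (?, ?)", (key, value))
        self.conn.commit()

    def __getitem__(self, key: str) -> str:
        row = self.conn.execute("SELECT value FROM store WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

db = DictLikeStore()
db["last_profile_id"] = "1"
print(db["last_profile_id"])
```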
Divergence from Current Trends
- Minimal JavaScript vs. JS Monoliths
  Most modern web apps lean into heavy front‑end frameworks (React, Vue, Svelte). Pipulate inverts that, embracing the server as UI driver.
- Local‑First, Single‑Tenant
  Cloud‑native microservices and multi‑tenant SaaS are the norm. This explicitly local, single‑tenant setup prioritizes privacy and resource control over multi‑user scaling.
Scaling Challenges & Opportunities
- Opportunities
  - Vertical Scaling: Leverage local hardware (GPU via CUDA, Ollama LLM) for heavy tasks.
  - Developer Productivity: Simplified stack can dramatically speed up feature iteration.
- Challenges
  - Concurrency: Single‑tenant SQLite with server‑side state can become a bottleneck under heavy concurrent use.
  - Network Overhead: HTMX’s chatty interactions might strain high‑latency networks or large workflows with many steps.
  - Plugin Isolation: All code lives in one process; a misbehaving plugin could crash the server.
Forward‑Looking Perspective
Alignment with Emerging Patterns
- Edge‑First & Server Components
  React Server Components and “edge” functions share a philosophy of pushing UI logic back to the server—similar spirit to HTMX’s server‑driven UI.
- Low‑Code/No‑Code
  Workflow templates that non‑developers can use mirror the rising demand for low‑code platforms.
Suggestions for Future Evolution
- Modularize State Stores
  Swap SQLite/DictLikeDB for pluggable backends (e.g., JSON files, in‑memory stores) to support ephemeral demos or larger deployments.
- Batch HTMX Requests
  Introduce batching of sequential step loads to reduce round‑trips in deeply pipelined workflows.
- Plugin Sandbox
  Run untrusted or community plugins in isolated processes or threads to prevent a single plugin from bringing down the entire server (a rough sketch of this follows the list).
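As a rough illustration of the sandbox idea, process isolation with a timeout. Nothing like this exists in Pipulate today; it is only a sketch of the suggestion above:

```python
# Hypothetical sketch: run a plugin action in a separate process with a
# timeout, so a hung or crashing plugin can't take down the main server.
from concurrent.futures import ProcessPoolExecutor, TimeoutError

def risky_plugin_action(n: int) -> int:
    return sum(range(n))  # stand-in for real plugin work

def run_sandboxed(fn, *args, timeout: float = 5.0):
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            return {"error": f"{fn.__name__} timed out after {timeout}s"}

if __name__ == "__main__":
    print(run_sandboxed(risky_plugin_action, 1_000_000))
```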
Adapting as Web Tech Advances
- WebAssembly UI Widgets
When WASM‑based UI components mature, selectively embed them into the HTMX‑driven pages for advanced interactions (e.g., data visualizations). - Incremental Adoption of Typed Front‑End
Introduce optional TS‑based components (e.g., for complex graphing) without abandoning the core HTMX approach. - AI‑Driven Workflows
As local LLMs grow more capable, automate even more of the step definitions—perhaps generating step handlers from plain‑language specifications.
Developer Experience & Maintenance Implications
- On‑Ramp Slope
  New contributors must learn Nix + FastHTML + HTMX + Ollama—a steeper but not impossible climb.
- Observability
  Every state change is logged and visible via middleware, making debugging workflows and state transitions transparent.
- Long‑Term Maintainability
  The clear WET vs. DRY dichotomy helps maintainers quickly locate code: workflows are explicit, CRUD apps derive from BaseCrud. However, the reliance on naming conventions means documentation must be kept up‑to‑date to avoid silent failures of plugin registration.
In sum, Pipulate stitches together traditional server‑side rendering principles with modern local‑first AI integration and reproducible environments. It bucks many trends—favoring simplicity, observability, and privacy—while still enabling powerful extensibility. As web development continues to oscillate between heavy client‑side apps and server‑driven UIs, Pipulate stands as a compelling example of the latter, pointing toward a future where lean server logic and local AI live at the core of user‑centric workflows.
o4-mini-high’s MCP Integration Proposal
So this is me switching the prompt that’s used with the context foo script from the default, which asks for a code analysis, to option 1, which asks for a recommended MCP integration plan.
=== Prompt Structure ===
--- Manifest ---
• Critical Pattern: to_xml(ft_object)
• Critical Pattern: HTMLResponse(str(to_xml(rendered_item)))
--- Pre-Prompt ---
System Information:
This codebase uses a hybrid approach with Nix for system dependencies and virtualenv for Python packages.
I'm looking to enhance the Pipulate application by integrating the Model Context Protocol (MCP) to empower
the local Ollama-powered LLM to execute actions directly rather than just generating text about them.
Key Points:
• The app uses local Ollama models via HTTP API calls in the chat_with_llm() function
• The Pipulate class serves as a central coordinator for plugins and functionality
• Plugins are discovered and registered dynamically, with two types: CRUD-based and workflow-based
• FastHTML objects must be converted with to_xml() before returning responses
--- Files Included ---
• README.md
• flake.nix
• server.py
• plugins/20_hello_workflow.py
• training/hello_workflow.md
• plugins/30_tasks.py
• training/tasks.md
• .cursorrules
--- Post-Prompt ---
--- Token Summary ---
Total tokens: 50,310 tokens (≈$1.51 at GPT-4 rates)
Maximum allowed: 2,087,152 tokens (≈$62.61 at GPT-4 rates)
Remaining: 2,036,842 tokens (≈$61.11 at GPT-4 rates)
=== End Prompt Structure ===
o4-mini-high: Below is a proposed plan to fold MCP into Pipulate’s Ollama‑backed chat flow, so that your LLM can actually invoke local actions (and external ones) rather than just talk about them.
1. Architectural Overview
┌───────────────────────────────┐
│ FastHTML + HTMX UI │
└──────────────┬────────────────┘
│ HTTP/WS
▼
┌───────────────────────────────┐
│ Pipulate HTTP/WebSocket │
│ + MCP Client Layer │◄─┐
└──────────────┬────────────────┘ │
│ invokes tools │
▼ │
┌───────────────────────────────┐ │
│ Local MCP-Enabled Plugins │ │
│ (CRUD, Workflows, Tools) │ │
└───────────────────────────────┘ │
│ │
│ │
▼ │
┌───────────────────────────────┐ │
│ Local Ollama LLM Server │──┘
│ (“chat_with_llm” wraps │
│ JSON tool calls) │
└───────────────────────────────┘
- chat_with_llm → streams both text and special JSON “tool call” messages following the MCP spec.
- MCP Client Layer intercepts those tool calls, dispatches to registered Python functions, captures their return, and streams the result back into the conversation.
- Plugins (CRUD or Workflows) register their actions (e.g. `tasks_create_record`, `tasks_toggle_item`, `hello_run_step`) as MCP‑callable tools.
- External MCP Servers can be added to the registry (e.g. a stock‑quote service); calls get forwarded over HTTP with proper auth.
- Security: user consents, whitelists, rate limits, and error sandboxing around each tool invocation.
2. Key Code Modifications
2.1 Extend `chat_with_llm` to be MCP‑aware
```python
async def chat_with_llm(MODEL: str, messages: list, base_app=None):
    url = "http://localhost:11434/api/chat"
    payload = {"model": MODEL, "messages": messages, "stream": True}
    ...
    async for line in response.content:
        chunk = json.loads(line)
        # NEW: if the chunk contains a "tool_call" field, intercept it
        if "tool_call" in chunk:
            # MCP tool invocation: { "name": ..., "args": {...} }
            result = await mcp_dispatcher.invoke(chunk["tool_call"])
            # send the result back to the LLM as a tool_result message
            await pipulate.stream(json.dumps({
                "tool_result": {
                    "tool_name": chunk["tool_call"]["name"],
                    "result": result
                }
            }), verbatim=True)
            continue
        # otherwise stream the chunk as before...
```
2.2 Add an MCP Dispatcher
```python
# mcp.py
import inspect

_tool_registry = {}

def mcp_tool(name: str, description: str, params: dict):
    """Register a function as an MCP-callable tool."""
    def decorator(fn):
        _tool_registry[name] = {
            "fn": fn,
            "description": description,
            "params": params
        }
        return fn
    return decorator

class MCPDispatcher:
    async def invoke(self, call: dict):
        name = call["name"]
        args = call.get("args", {})
        tool = _tool_registry.get(name)
        if not tool:
            raise ValueError(f"Unknown MCP tool: {name}")
        try:
            result = tool["fn"](**args)
            # allow async tools
            if inspect.iscoroutine(result):
                result = await result
            return result
        except Exception as e:
            return {"error": str(e)}

mcp_dispatcher = MCPDispatcher()
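```

A quick standalone check of the dispatcher; the `echo` tool here is made up purely for illustration:

```python
# Illustrative only: register a trivial tool and invoke it through the dispatcher.
import asyncio
from mcp import mcp_tool, mcp_dispatcher

@mcp_tool(name="echo", description="Echo back the given text", params={"text": "str"})
def echo(text: str):
    return {"echoed": text}

result = asyncio.run(mcp_dispatcher.invoke({"name": "echo", "args": {"text": "hello"}}))
print(result)  # {'echoed': 'hello'}
```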
3. Plugin Interface Extension
Enhance your plugin base so methods can be exposed as tools:
# in server.py (or a new mcp_tools.py)
from mcp import mcp_tool
class BaseCrud:
# ...
@mcp_tool(
name="tasks_create_item",
description="Create a new task in the tasks table",
params={"text": "str", "profile_id": "int"}
)
def create_item_tool(self, text: str, profile_id: int):
return self.table.insert({
"text": text,
"done": False,
"priority": 0,
"profile_id": profile_id
}).__dict__
@mcp_tool(
name="tasks_mark_done",
description="Mark an existing task as done",
params={"item_id": "int"}
)
def mark_done_tool(self, item_id: int):
item = self.table[item_id]
item.done = True
updated = self.table.update(item)
return updated.__dict__
Now these appear in `_tool_registry` and can be called by the LLM.
4. Sample Conversion of an Existing Plugin
Below is how you’d adapt `plugins/30_tasks.py` to expose two MCP tools:
from mcp import mcp_tool
class CrudCustomizer(BaseCrud):
# ...
+ @mcp_tool(
+ name="tasks_add",
+ description="Add a new task by text",
+ params={"text": "str"}
+ )
+ def mcp_add(self, text: str):
+ profile_id = self.plugin.db_dictlike.get("last_profile_id", 1)
+ return self.table.insert({
+ "text": text,
+ "done": False,
+ "priority": 0,
+ "profile_id": profile_id
+ }).__dict__
+ @mcp_tool(
+ name="tasks_complete",
+ description="Complete a task by ID",
+ params={"item_id": "int"}
+ )
+ def mcp_complete(self, item_id: int):
+ task = self.table[item_id]
+ task.done = True
+ updated = self.table.update(task)
+ return updated.__dict__
5. Security & Error‑Handling Considerations
- Tool Whitelisting & User Consent
  - Only functions annotated with `@mcp_tool` are callable.
  - Present a one‑time consent dialog listing tool names/descriptions before the LLM can invoke them.
- Parameter Validation
  - Use the `params` schema in `@mcp_tool` to JSON‑schema–validate arguments before calling.
- Sandboxing & Timeouts
  - Run each tool in an `asyncio.wait_for(..., timeout=5)` to guard against hung calls (a sketch combining this with parameter validation follows this list).
  - Catch exceptions in the dispatcher and return structured errors—not raw tracebacks.
- Authentication for External MCP Servers
  - Store credentials in an encrypted local store (`.mcp_credentials.json`, file‑based with OS‑level encryption).
  - Sign outgoing calls with a JWT or HMAC header.
  - Allow target URLs only from a user‑configured whitelist.
- Logging & Auditing
  - Log every MCP call with timestamp, tool name, arguments, and result (or error).
  - Provide a simple UI in Pipulate to review the “MCP Audit Log.”
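A minimal sketch of how the dispatcher’s invoke path could combine validation and a timeout, assuming the simple `params` schema shown earlier (type names as strings) rather than full JSON Schema. This is not Pipulate’s actual code:

```python
# Hypothetical hardening of MCPDispatcher.invoke: crude type checks against the
# params schema plus a timeout around async tool calls. Synchronous tools could
# be pushed through asyncio.to_thread to get the same timeout behavior.
import asyncio
import inspect

_TYPE_MAP = {"str": str, "int": int, "float": float, "bool": bool}

async def invoke_safely(tool: dict, args: dict, timeout: float = 5.0):
    # Validate arguments against the declared params schema.
    for param, type_name in tool["params"].items():
        if param not in args:
            return {"error": f"missing argument: {param}"}
        if not isinstance(args[param], _TYPE_MAP.get(type_name, object)):
            return {"error": f"argument {param} should be {type_name}"}
    try:
        result = tool["fn"](**args)
        if inspect.iscoroutine(result):
            result = await asyncio.wait_for(result, timeout=timeout)
        return result
    except asyncio.TimeoutError:
        return {"error": f"tool timed out after {timeout}s"}
    except Exception as e:
        return {"error": str(e)}
```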
Next Steps
- Implement the `mcp.py` dispatcher and integrate it into `server.py`.
- Annotate key plugin methods with `@mcp_tool`.
- Update `chat_with_llm` to route tool_call messages through the dispatcher.
- Add a consent UI on first launch of MCP mode.
- Test with a simple prompt like: “Please run the tool `tasks_add` with text=‘Write MCP doc’.”
With this in place, your local Ollama model can not only describe actions but actually drive Pipulate plugins and even external services via MCP.
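For a sense of what that plan expects on the wire, here is a hypothetical tool_call/tool_result exchange for the `tasks_add` test above. The exact JSON shape and the result fields are assumptions; the real format would be whatever `chat_with_llm` and the dispatcher finally agree on:

```python
# Hypothetical message shapes only; not a confirmed MCP wire format.
tool_call_from_llm = {
    "tool_call": {"name": "tasks_add", "args": {"text": "Write MCP doc"}}
}

tool_result_streamed_back = {
    "tool_result": {
        "tool_name": "tasks_add",
        "result": {"text": "Write MCP doc", "done": False},
    }
}
```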
Non-Scientific Test
I probably should have tested the prompts with and without the XML wrappers. And as this was my first test of o4-mini-high, I probably should have done a before and after with o3-mini-high. But covering these combinations would have been a 2x2 matrix, a huge article, and taken the rest of the day to do right. And on the pragmatic front, I’m really just trying to move forward and achieve that exponential acceleration effect, and that means not always being 100% scientific about it. It’s a forging forward sort of thing.
I already actually did these tests in Integrating MCP: Giving My Local LLM App the Power to Act when developing `context_foo.py` with the older model and without the XML wrappers, if you care to cross-reference yourself.
But this is really just kicking o4-mini-high’s tires and updating my context foo to use XML wrappers, and so, mission accomplished.
Analysis
- Title/Headline Ideas:
- Enhancing AI Prompts with XML Tags: A `context_foo.py` Case Study
- Structuring LLM Context with XML: Testing Anthropic’s Method and o4-mini-high
- From Plain Text to XML: Improving Prompt Engineering for Codebase Analysis
- Implementing XML Prompt Wrappers for Better AI Responses (Pipulate Project)
- Trialing o4-mini-high and XML Tags for Advanced AI Context Generation
- Strengths:
- Provides concrete, detailed examples of implementing XML tags in Python code and prompt templates.
- Includes raw outputs from AI models, showing the direct results of the prompts.
- Offers authentic insight into a developer’s iterative process, including tool modification, experimentation, and reflection.
- Covers advanced topics like specific prompt engineering techniques, AI model usage (o4-mini-high), and potential MCP integration.
- Demonstrates a practical application (`context_foo.py`, Pipulate analysis) of the concepts discussed.
- Weaknesses:
- Assumes significant prior knowledge of the author’s specific tools (`context_foo.py`, Pipulate), environment (Nix, virtualenv), and related technologies (HTMX, FastHTML, MCP, Ollama).
- Heavy use of technical jargon may make it inaccessible to readers unfamiliar with AI development or the specific tech stack.
- The journal-like, “in-the-moment” structure lacks a formal introduction and conclusion, potentially making it hard to follow for outsiders.
- Limited background context provided within the article itself for the various components discussed.
- The testing methodology is acknowledged as non-rigorous (“Non-Scientific Test”).
- AI Opinion: This article appears to be a valuable technical log documenting a specific experiment in prompt engineering and AI model testing. Its strength lies in its practical detail and authenticity, showing the real-world application of using XML tags to structure complex AI prompts for codebase analysis and exploring advanced concepts like MCP integration. While its clarity is likely high for the author or those deeply familiar with the project, the heavy reliance on specific jargon and assumed context makes it less accessible to a general audience. Its usefulness is significant as a case study for developers working on similar problems (improving LLM interactions for complex tasks, integrating AI tools) or those following the author’s specific project journey.