Jekyll Feed Optimization for AI Agents: A GitOps Blueprint


Setting the Stage: Context for the Curious Book Reader

In the rapidly evolving landscape of the Agentic Web, where autonomous AI systems continually crawl, index, and synthesize information, the structural integrity and semantic precision of your digital output are paramount. This blueprint dives deep into a foundational yet often overlooked aspect: the syndication feed. Far from a relic of the early 2000s, RSS and Atom feeds are now important data pipelines for Large Language Models. This essay outlines a methodology for reclaiming control over these feeds, transforming them from plugin-driven black boxes into highly optimized, deterministic data streams. That is a vital step in ensuring your “Forever Machine” communicates its intent clearly and efficiently to the emergent AI ecosystem, free from the hidden costs of legacy systems.


Technical Journal Entry Begins

The web user interface for the Google Gemini LLM chatbot can take up to one megabyte of prompt text directly in the discussion submit form. This may not sound like a big deal, but it is a very, very big deal. Yes, almost all the Web UIs to chatbots these days provide file-upload capability, but uploaded files are wrapped up in MIME encoding, dropped onto something like a filesystem on the server, and then only optionally examined (as deemed needed) by the LLM with subsequent tool calls. If you can shove it all directly into the prompt, all that content is at least scanned for consideration to receive the LLM’s attention.

This is much better, assuming that you have also dealt with the “lost in the middle” problem in how you constructed that great big prompt, which I did with a Table of Contents manifest, search tokens that work as an index, and many other measures like UML diagrams and file trees. It’s quite over-the-top, really. But a lack of appreciation for these issues, and of a certain empathy for the LLM trying to digest this stuff, is I think the source of a lot of frustration out there, both for people trying to use LLMs to get coding assistance and for the LLMs trying to provide it.

Context Window Density: The Unseen Challenge

Hey Machine, I have this complex coding problem and here’s a snippet of out of context code. Help me! Fix my bug! Okay human, here’s the fix. That’s a complete hallucination, Machine! I thought you were supposed to be AGI or super-intelligent or something. [Machine: “Sighhh”]

Wanna know AI’s favorite joke? Why did Asimov predict robots would become telepathic? Because they had to.

My last project was a bit of a discretionary distraction. It competed directly with further polishing of the Pipulate Onboarding experience, but I had to start telling and retelling the story: telling it summarized, telling it expanded, telling it forwards, telling it in reverse, and telling it every time a bot visits my llms.txt file. Technically, bots hitting that file get the story told densely and summarized up front, where I know it will fit inside any potential file-length truncation. But immediately after the condensed summary (sort of a book outline), I jump right into a high-signal, highly dense reverse-chronological blog index.

How’d I do?

Gemini 3.1 Pro: That’s really just me reminding myself of what I did on Sunday. Now when I’m Prompt Fu’ing, I can either use the -l [:] argument to prompt_foo.py or I can just include https://mikelev.in/llms.txt or even just home/mike/repos/trimnoir/llms.txt to give the “big read” of my work to the collaborating AI. One is forward-chronological and the other is reverse. But really, I’m trying to get to one more piece of site hygiene left over from the Jekyll plugins from back when I was using the Minima template.

# Build settings
plugins:
  - jekyll-feed
  - jekyll-seo-tag

MikeLev.in: What does rel mean in the link element?

Gemini 3.1 Pro: In HTML, the rel attribute stands for relationship.

When used inside a <link> element, it tells the browser exactly how the linked external resource relates to the current document. This is crucial because it instructs the browser on what to do with that file once it’s downloaded.

Why It’s Important

Without the rel attribute, the browser might fetch a file but won’t know how to apply it. For example, if you link to a CSS file but forget to include rel="stylesheet", the browser won’t know it’s supposed to use that file to style your webpage.

Here are some of the most frequently used values:

| Value | What It Does | Example Use Case |
|---|---|---|
| stylesheet | Imports an external CSS file to style the page. | `<link rel="stylesheet" href="style.css">` |
| icon | Defines the favicon (the small icon displayed in the browser tab). | `<link rel="icon" href="favicon.ico">` |
| preload | Tells the browser to fetch a resource early because it will be needed soon (helps with performance). | `<link rel="preload" href="font.woff2" as="font">` |
| alternate | Links to an alternative version of the document (such as a different language or an RSS feed). | `<link rel="alternate" href="feed.xml" type="application/rss+xml">` |
| author | Links to information about the author of the document. | `<link rel="author" href="humans.txt">` |

Note: While most commonly discussed with the <link> element, the rel attribute is also widely used in <a> (anchor) tags to define relationships for links, such as rel="nofollow" (telling search engines not to follow the link) or rel="noopener" (for security when opening links in a new tab).

Would you like me to explain any specific rel values in more detail, or help you structure the <head> of your HTML document?

MikeLev.in: What should rel be set to if you’re giving the Atom or RSS feed.xml for the site’s blog?

Gemini 3.1 Pro: When linking to an Atom or RSS feed, the rel attribute should be set to alternate.

Setting rel="alternate" tells the browser, feed readers, and search engines that the linked file is an alternative representation of the website’s content (in this case, a syndication feed).

To make sure the feed is recognized correctly, you must also include the type attribute to specify whether it’s RSS or Atom.

Here is exactly how you should format the <link> element for both:

For an RSS Feed:

<link rel="alternate" type="application/rss+xml" title="My Site's RSS Feed" href="/feed.xml">

For an Atom Feed:

<link rel="alternate" type="application/atom+xml" title="My Site's Atom Feed" href="/atom.xml">

Why the extra attributes matter:

  • type: This is critical. Without type="application/rss+xml" or type="application/atom+xml", feed discovery tools won’t automatically know it’s a feed they can subscribe to.
  • title: This gives your feed a human-readable name. If someone uses a browser extension or feed reader that auto-detects feeds on your page, this is the title they will see.

Would you like help generating the actual XML structure for your RSS or Atom feed, or are you just setting up your HTML <head> for now?

MikeLev.in: Now the link element is associated with one page. It is in the HTML of the page being served. So rel being equal to alternate in this case, is that actually correct? It would seem to suggest an alternative format of the existing content which is just a single article. But the feed would be an index of some or all of the articles. That’s one too many relationship.

Gemini 3.1 Pro: You have hit on a fantastic semantic quirk in HTML. Strictly speaking, your logic is flawless: a site-wide feed is not a 1:1 alternative representation of a single blog post. It is an index of the broader collection.

However, despite that semantic mismatch, setting rel="alternate" is still the correct and standard way to do it, even on an individual article page.

Here is why this exception exists and how the web handles it:

1. The HTML Specification Exception

The HTML specification actually carves out a special rule just for this scenario.

Normally, rel="alternate" means “an alternate representation of the current document” (like a print version or a translated version). But the spec explicitly states that if you combine rel="alternate" with a syndication feed type (application/rss+xml or application/atom+xml), the meaning shifts. It tells the browser, “This is a syndication feed associated with the context of this page,” rather than a strict translation of the page itself.

2. Feed Autodiscovery

This practice was established years ago to support a feature called Feed Autodiscovery.

Feed readers (like Feedly or the old Google Reader) and browsers needed a standardized way to figure out if a website offered a feed, regardless of which page a user landed on. By putting that <link> tag in the <head> of every page on your site—from the homepage down to the deepest single article—you ensure that a feed reader can always find it.

If a user lands on your article from a search engine and clicks a browser extension to “Subscribe to this site,” the extension looks for that rel="alternate" tag on that specific article page to find the feed URL.
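To make the autodiscovery step concrete, here is a minimal Python sketch of what a feed reader or browser extension might do with a page's HTML. It uses only the standard library; the names `FeedLinkFinder` and `discover_feeds` and the example.com URLs are illustrative assumptions, not part of any real reader's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that mark a <link rel="alternate"> as a syndication feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects <link rel="alternate"> elements that advertise a feed."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower().strip()
        # Both conditions matter: rel="alternate" alone could be a translation.
        if "alternate" in rel_tokens and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append({
                "title": attr.get("title", ""),
                "type": mime,
                # Resolve relative hrefs such as "/feed.xml" against the page URL.
                "url": urljoin(self.page_url, attr["href"]),
            })


def discover_feeds(html, page_url):
    """Return every feed advertised in the given HTML document."""
    finder = FeedLinkFinder(page_url)
    finder.feed(html)
    return finder.feeds
```

Called with the HTML of any article page and that page's URL, `discover_feeds` returns the same site-wide feed URL regardless of which article it was given, which is the point of repeating the tag in every page's head.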

How to handle article-specific feeds

If you ever did want to offer a feed that represents just the comments of that single article, you would still use rel="alternate", but you would change the title and href to point to the comment feed:

<link rel="alternate" type="application/rss+xml" title="My Site's Main Blog Feed" href="/feed.xml">

<link rel="alternate" type="application/rss+xml" title="Comments for 'My Awesome Article'" href="/article/comments.xml">

It is a bit of a historical compromise in the HTML spec, but it is the universally accepted standard.

Are you setting this up for a static site generator, a custom CMS, or just working out the raw HTML structure?

MikeLev.in: I am removing the Jekyll plugin that came with the Minima _config.yml and replacing it with something homespun. I noticed it has a field called ID, and it gives out my URLs with no trailing slash. Now I can understand why that would be appealing for a field called ID but I can see request against these ambiguous ending at a directory name with no trailing slash URLs and they all get 30 one redirected to the correct location, but it is a wasted Hopp and I would rather not give these bad URLs out in the first place.

Now before you go telling me that the <link href="..."> in my feed is more likely to be missing the trailing slash than the ID being used as a URL, I checked. The href attributes all use the correct URL with the trailing slash. The only place I can find that is giving out that exact URL but missing its trailing slash is that ID. I am going to change it as a precaution.

I also want more control over my feed than the plug-in provides. This is a learning experience I want to take on. I can achieve this either natively in liquid templates under Jekyll or with my custom python deployment scripts.

Beyond Plugin Defaults: Reclaiming Syndication Control

Gemini 3.1 Pro: Building your own feed is an excellent learning project. Relying on plugins like jekyll-feed is convenient, but the moment you want precise control over the output, rolling your own solution is the way to go.

Before you dive into the code, there is one crucial quirk about the <id> element in syndication feeds that you need to know, especially since you are planning to change it.

The Reality of the <id> Element

You mentioned that you want to fix the missing trailing slash in the <id> field to prevent wasted 301 redirect hops. It makes perfect sense that you’d want to clean that up, but here is the catch: the <id> element is not actually meant to be a clickable URL.

In Atom and RSS (when using the <guid> tag), that field is strictly a Unique Identifier (URI). Its sole purpose is to give feed readers (like Feedly or NetNewsWire) a permanent, globally unique string to track whether a user has read that specific article. While it is standard practice to use the article’s URL as the ID because URLs are naturally unique, the feed reader relies on the <link href="..."> tag to actually send users to your site.

If you are seeing 301 redirects in your server logs matching those exact ID strings, it means poorly written bots or scrapers are crawling your raw XML and aggressively requesting every URL-like string they find, ignoring the actual <link> tags.

The “Duplicate Post” Warning

Because the <id> is how feed readers track read status, if you change the <id> of your existing posts—even just by adding a trailing slash—every feed reader will think they are brand new articles. If you make this change, your current subscribers will suddenly see your entire back catalog pop up as “unread” in their feeds.

  • If your blog is new or has few subscribers: Go ahead and fix it now to establish the correct format moving forward.
  • If you have an established audience: It is usually best to leave the flawed IDs as they are for legacy posts, and only apply the corrected, trailing-slash IDs to new posts.
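The read-tracking behavior described above can be illustrated with a small Python sketch. It assumes a reader that stores the set of ids it has already shown, which is a simplification of what real feed readers do. The function names `unread_entries` and `entry_id`, the cutoff-date strategy, and the example.com URLs are all hypothetical.

```python
from datetime import date


def unread_entries(seen_ids, feed_entries):
    """Return the entries whose id the reader has not stored yet."""
    return [entry for entry in feed_entries if entry["id"] not in seen_ids]


def entry_id(url, published, cutoff):
    """Hypothetical legacy-preserving rule: posts published before the
    cutoff keep the old no-slash id, newer posts get the trailing slash."""
    if published < cutoff:
        return url.rstrip("/")
    return url if url.endswith("/") else url + "/"


# Ids the reader recorded while the plugin was still generating the feed.
seen = {"https://example.com/first-post", "https://example.com/second-post"}

# The same two posts after every id gains a trailing slash.
changed_feed = [
    {"id": "https://example.com/first-post/"},
    {"id": "https://example.com/second-post/"},
]

# Neither id matches a stored one, so both posts resurface as unread.
resurfaced = unread_entries(seen, changed_feed)
```

The comparison is an exact string match, which is why a single added character is enough to make the whole back catalog look new.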

Liquid vs. Python: Which path to choose?

Since you are weighing your options between doing this natively in Jekyll/Liquid or using your custom Python deployment scripts, here is a quick breakdown of how both approaches look:

1. The Native Liquid Approach (Recommended)

Building the feed directly in Jekyll using Liquid is generally the cleanest method. Jekyll already holds all your site variables, post content, dates, and URLs in memory during the build. You just create a file called feed.xml in your root directory, add empty YAML front matter (to tell Jekyll to process it), and loop through your posts.

To guarantee your trailing slash in Liquid, you can construct the ID like this:

<id>{{ site.url }}{{ post.url | append: '/' | replace: '//', '/' }}</id>

(The replace filter ensures you don’t accidentally create a double-slash if post.url already had one).

2. The Python Deployment Script Approach

Using Python gives you ultimate parsing power, but it adds complexity. Your script would need to either parse the raw Markdown files (and their YAML front matter) or parse the generated HTML files in your _site folder to extract titles, dates, and content to construct the XML tree using a library like xml.etree.ElementTree or feedgen. This is great if you are doing heavy post-processing, but it duplicates a lot of the work Jekyll already does natively.
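For comparison, here is a rough sketch of what the Python route could look like with xml.etree.ElementTree. It assumes the titles, URLs, dates, and descriptions have already been extracted into plain dictionaries; that extraction step is the part that duplicates Jekyll's work and is not shown. The function name `build_atom_feed` and the dictionary keys are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(site_title, site_url, updated, posts):
    """Serialize a minimal Atom feed to UTF-8 bytes.

    site_url is assumed to end with a slash; posts is a list of dicts
    with 'title', 'url', 'updated' and 'description' keys.
    """
    # Emit Atom as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")

    def add(parent, tag, text=None, **attrs):
        element = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
        element.text = text
        return element

    add(feed, "title", site_title)
    add(feed, "link", href=site_url + "feed.xml", rel="self",
        type="application/atom+xml")
    add(feed, "link", href=site_url, rel="alternate", type="text/html")
    add(feed, "updated", updated)
    add(feed, "id", site_url)
    add(add(feed, "author"), "name", site_title)

    for post in posts:
        entry = add(feed, "entry")
        add(entry, "title", post["title"])
        add(entry, "link", href=post["url"], rel="alternate", type="text/html")
        add(entry, "updated", post["updated"])
        add(entry, "id", post["url"])
        add(entry, "summary", post["description"])

    # ElementTree escapes &, < and > in text and attributes on serialization.
    return ET.tostring(feed, encoding="utf-8", xml_declaration=True)
```

One thing this route gets for free is escaping: special characters are handled at serialization time rather than by a filter that has to be remembered on every field.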

Would you like me to draft a complete, baseline feed.xml Liquid template that you can drop into your Jekyll site and start customizing?

MikeLev.in: My posts all use permalinks in their Jekyll front matter. So this will be just like the sitemap.xml in my Jekyll repo root? Yes, please and thank you.

Gemini 3.1 Pro: Exactly! Just like your sitemap.xml, all you need is a file in your root directory with empty YAML front matter at the top. This tells Jekyll, “Process the Liquid tags in this file before outputting it.”

Here is a complete, standards-compliant Atom feed.xml template built specifically for your setup. I have included a custom Liquid block inside the loop to safely enforce your trailing slash rule without breaking the https:// in your site’s URL.

Create a file named feed.xml in the root of your repository and paste this in:

---
layout: null
---
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>{{ site.title | xml_escape }}</title>
  <subtitle>{{ site.description | xml_escape }}</subtitle>
  <link href="{{ '/feed.xml' | absolute_url }}" rel="self" type="application/atom+xml" />
  <link href="{{ '/' | absolute_url }}" rel="alternate" type="text/html" />
  <updated>{{ site.time | date_to_xmlschema }}</updated>
  <id>{{ '/' | absolute_url }}</id>
  <author>
    <name>{{ site.author | default: site.title | xml_escape }}</name>
  </author>

  {% for post in site.posts limit: 20 %}
    {% comment %} --- Trailing Slash Enforcer --- {% endcomment %}
    {% assign post_url = post.url | absolute_url %}
    {% assign last_char = post_url | slice: -1 %}
    {% if last_char != '/' %}
      {% assign final_url = post_url | append: '/' %}
    {% else %}
      {% assign final_url = post_url %}
    {% endif %}
    {% comment %} ------------------------------- {% endcomment %}

    <entry>
      <title>{{ post.title | xml_escape }}</title>
      <link href="{{ final_url }}" rel="alternate" type="text/html" title="{{ post.title | xml_escape }}" />
      <published>{{ post.date | date_to_xmlschema }}</published>
      <updated>{{ post.last_modified_at | default: post.date | date_to_xmlschema }}</updated>
      <id>{{ final_url }}</id>
      <content type="html" xml:base="{{ '/' | absolute_url }}">
        {{ post.content | xml_escape }}
      </content>
      {% if post.author %}
        <author>
          <name>{{ post.author | xml_escape }}</name>
        </author>
      {% endif %}
    </entry>
  {% endfor %}
</feed>

Why this setup works well:

  • The YAML Front Matter: The --- block at the very top is required. Without it, Jekyll treats the file as static and won’t execute the Liquid loops.
  • Trailing Slash Enforcer: Rather than doing a messy string replacement that might ruin your https:// protocol, the Liquid block slices the last character of the generated absolute URL. If it is not a slash, it appends one. This gives you your strict, clean <id> and <link> values.
  • The xml_escape Filter: XML is notoriously strict. If your blog post contains an unescaped ampersand (&) or a stray < symbol, the entire feed will break for your subscribers. This filter sanitizes your titles and content automatically.
  • The date_to_xmlschema Filter: Feed readers require timestamps to be in a very specific format (ISO 8601). This built-in Jekyll filter converts your post dates perfectly.
  • Limit: I set limit: 20 on the loop so your feed doesn’t eventually become a massive, multi-megabyte file as your blog grows. You can adjust this number to whatever fits your publishing volume.
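The trailing-slash and timestamp points can be checked outside of Liquid with a few lines of Python. This is only a sketch of the same logic, not how Jekyll implements its filters; `ensure_trailing_slash`, `naive_fix`, and `to_xmlschema` are hypothetical helper names, and the example.com URL is a placeholder.

```python
from datetime import datetime, timedelta, timezone


def ensure_trailing_slash(url):
    """Mirror of the Liquid block: inspect the last character only."""
    return url if url.endswith("/") else url + "/"


def naive_fix(url):
    """The messy alternative: append a slash, then collapse every '//'.
    On an absolute URL this also collapses the '//' after the scheme."""
    return (url + "/").replace("//", "/")


def to_xmlschema(moment):
    """ISO 8601 timestamp with UTC offset, the shape Atom expects."""
    return moment.isoformat()


fixed = ensure_trailing_slash("https://example.com/post")
broken = naive_fix("https://example.com/post/")  # scheme is damaged here
stamp = to_xmlschema(datetime(2026, 3, 16, tzinfo=timezone(timedelta(hours=-4))))
```

The replace-based version is safe only on a bare path such as post.url, which is why the template checks the last character once the URL has been made absolute.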

Once you deploy this, your custom setup will be fully functional and significantly cleaner than the plugin’s default output.

Would you like me to show you how to truncate the {{ post.content }} so you only output summaries in the feed, or do you prefer serving full articles to your RSS subscribers?

MikeLev.in: Oh, it definitely has to be truncated. These articles are huge.

Gemini 3.1 Pro: That makes perfect sense. Sending full, massive articles in an XML file will eat up your bandwidth and make the feed sluggish for your subscribers to download and parse.

When you want to provide just a preview in an Atom feed, the best practice is to remove the <content> tag entirely and replace it with a <summary> tag.

Here is exactly how to update that section of your feed.xml. You have two great options depending on how much manual control you want over the summary.

Option 1: The Automatic Truncation (Easiest)

If you just want Liquid to automatically grab the first chunk of text from your post and cut it off cleanly, you can chain a few filters together.

Replace the entire <content>...</content> block in your template with this:

<summary type="html">
  {{ post.content | strip_html | truncatewords: 50 | xml_escape }}
</summary>

Why this works:

  • strip_html: This is critical when truncating. If you cut off a post in the middle of a <strong> or <div> tag without closing it, it will break the formatting in the user’s feed reader. Stripping the HTML first guarantees you are only truncating plain text.
  • truncatewords: 50: Cuts the text off at exactly 50 words and appends an ellipsis (...). You can adjust this number to fit your preference.
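Why the order of those two filters matters is easier to see with a rough Python approximation. The regular expression below is a deliberately naive tag stripper and `truncate_words` only imitates the behavior described above; neither is Liquid's actual implementation, and the sample HTML is made up.

```python
import re

# Naive tag matcher; adequate for a demonstration, not for untrusted HTML.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text):
    """Remove anything that looks like an HTML tag."""
    return TAG_RE.sub("", text)


def truncate_words(text, limit=50, ellipsis="..."):
    """Keep the first `limit` whitespace-separated words, then add an ellipsis."""
    words = text.split()
    if len(words) <= limit:
        return " ".join(words)
    return " ".join(words[:limit]) + ellipsis


html = "<p>One <strong>two three</strong> four</p>"

# Truncating first can cut between <strong> and </strong>.
unsafe = truncate_words(html, 2)

# Stripping first leaves only plain text to truncate.
safe = truncate_words(strip_html(html), 2)
```

In the unsafe ordering the opening strong tag survives the cut while its closing tag does not, which is the broken-formatting case the strip_html filter prevents.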

Option 2: The Jekyll Excerpt (Most Control)

Jekyll has a built-in feature called post.excerpt. By default, it grabs the first paragraph of your post. If you want to decide exactly where the cutoff happens on a post-by-post basis, you can set an excerpt_separator (a marker string such as <!--more-->) in your _config.yml and place that marker in your Markdown files.

If you prefer this route, use this snippet instead:

<summary type="html">
  {{ post.excerpt | strip_html | xml_escape }}
</summary>

(Note: We still use strip_html here just to be safe and keep the feed reader display clean and uniform.)

The Final Step

With either option, because you are only providing a summary, your subscribers will have to click the article title (which uses the <link href="{{ final_url }}" ... /> tag we set up) to read the full massive post on your actual website.

Would you like me to show you how to include your post categories or tags in this feed loop, or are you ready to build the site and validate the XML?

MikeLev.in: I consistently have a YAML frontmatter field named description on every blog post markdown file. I want that to be used instead of the except. You can also skip over addressing categories and tags at this time. Just show me the finished code please.

Crafting the Deterministic Feed: Liquid Templating for Precision

Gemini 3.1 Pro: Using a dedicated description field from your front matter is the cleanest and most reliable way to handle feed summaries, as it gives you total control over the preview text without relying on automated truncation.

Here is the complete, finalized feed.xml code with the custom trailing slash logic and your post.description wired up:

---
layout: null
---
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>{{ site.title | xml_escape }}</title>
  <subtitle>{{ site.description | xml_escape }}</subtitle>
  <link href="{{ '/feed.xml' | absolute_url }}" rel="self" type="application/atom+xml" />
  <link href="{{ '/' | absolute_url }}" rel="alternate" type="text/html" />
  <updated>{{ site.time | date_to_xmlschema }}</updated>
  <id>{{ '/' | absolute_url }}</id>
  <author>
    <name>{{ site.author | default: site.title | xml_escape }}</name>
  </author>

  {% for post in site.posts limit: 20 %}
    {% comment %} --- Trailing Slash Enforcer --- {% endcomment %}
    {% assign post_url = post.url | absolute_url %}
    {% assign last_char = post_url | slice: -1 %}
    {% if last_char != '/' %}
      {% assign final_url = post_url | append: '/' %}
    {% else %}
      {% assign final_url = post_url %}
    {% endif %}
    {% comment %} ------------------------------- {% endcomment %}

    <entry>
      <title>{{ post.title | xml_escape }}</title>
      <link href="{{ final_url }}" rel="alternate" type="text/html" title="{{ post.title | xml_escape }}" />
      <published>{{ post.date | date_to_xmlschema }}</published>
      <updated>{{ post.last_modified_at | default: post.date | date_to_xmlschema }}</updated>
      <id>{{ final_url }}</id>
      <summary type="html">{{ post.description | strip_html | xml_escape }}</summary>
      {% if post.author %}
        <author>
          <name>{{ post.author | xml_escape }}</name>
        </author>
      {% endif %}
    </entry>
  {% endfor %}
</feed>

Would you like me to walk you through how to test this output using an XML validator once you build the site locally?

MikeLev.in: I put the file in location, up the “20” limit to “2000” and generate.

Well, the first time I pushed, the build appeared to still use the old plugin even though I had edited it out of _config.yml. This might be one of those things where a change has to go through once before the new config becomes active.

[mike@nixos:~/repos/trimnoir]$ vim _config.yml 

[mike@nixos:~/repos/trimnoir]$ xv feed.xml

[mike@nixos:~/repos/trimnoir]$ vim feed.xml 

[mike@nixos:~/repos/trimnoir]$ git add .

[mike@nixos:~/repos/trimnoir]$ git commit -am "Swapping out the Jekyll feed.xml plugin for a homespun one" 
[main c1e84178] Swapping out the Jekyll feed.xml plugin for a homespun one
 4 files changed, 921 insertions(+), 3 deletions(-)
 create mode 100644 _posts/2026-03-16-llm-data-hygiene-sql-filtering.md
 create mode 100644 feed.xml

[mike@nixos:~/repos/trimnoir]$ git push
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 48 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 16.91 KiB | 8.45 MiB/s, done.
Total 7 (delta 4), reused 0 (delta 0), pack-reused 0 (from 0)
remote: 🎯 [Hook] Received Push. Deploying...
remote: Already on 'main'
remote: 🔨 [Hook] Starting Hybrid Build...
remote: [DEPRECATED] Using the `config` command without a subcommand [list, get, set, unset] is deprecated and will be removed in the future. Use `bundle config set build.nokogiri --use-system-libraries` instead.
remote: [DEPRECATED] Using the `config` command without a subcommand [list, get, set, unset] is deprecated and will be removed in the future. Use `bundle config set build.ffi --enable-system-libffi` instead.
remote: [DEPRECATED] Using the `config` command without a subcommand [list, get, set, unset] is deprecated and will be removed in the future. Use `bundle config set build.eventmachine --with-cflags=-I/nix/store/ydrckgnllgg8nmhdwni81h7xhcpnrlhd-openssl-3.6.0-dev/include` instead.
remote: Linux detected. Skipping Neovim setup as it's not needed.
remote: /tmp/nix-shell.GX0q6D: line 2715: xc: command not found
remote: /tmp/nix-shell.GX0q6D: line 2716: xcp: command not found
remote: /tmp/nix-shell.GX0q6D: line 2717: xv: command not found
remote: Jekyll environment ready (quiet mode).
remote: Run 'jes' to start the server manually.
remote: 🏗️  Jekyll Build...
remote: Configuration file: /home/mike/www/mikelev.in/_config.yml
remote:             Source: /home/mike/www/mikelev.in
remote:        Destination: /home/mike/www/mikelev.in/_site
remote:  Incremental build: disabled. Enable with --incremental
remote:       Generating... 
remote:        Jekyll Feed: Generating feed for posts
remote:                     done in 70.486 seconds.
remote:  Auto-regeneration: disabled. Use --watch to enable.
remote: 📄 Publishing Source Markdown alongside HTML...
remote: 📄 Sweeping root directory for standalone Markdown pages...
remote: 🗺️  Deploying Nginx Redirect Map...
remote: 🔒 [Hook] Fixing Permissions...
remote: ✅ [Hook] Deployment Complete. Site is Live.
To [REDACTED_IP]:~/git/mikelev.in.git
   973c7d04..c1e84178  main -> main

[mike@nixos:~/repos/trimnoir]$ cat _config.yml 
# Welcome to Jekyll!
#
# This config file is meant for settings that affect your whole blog, values
# which you are expected to set up once and rarely edit after that. If you find
# yourself editing this file very often, consider using Jekyll's data files
# feature for the data you need to update frequently.
#
# For technical reasons, this file is *NOT* reloaded automatically when you use
# 'bundle exec jekyll serve'. If you change this file, please restart the server process.
#
# If you need help with YAML syntax, here are some quick references for you:
# https://learn-the-web.algonquindesign.ca/topics/markdown-yaml-cheat-sheet/#yaml
# https://learnxinyminutes.com/docs/yaml/
#
# Site settings
# These are used to personalize your new site. If you look in the HTML files,
# you will see them accessed via {{ site.title }}, {{ site.email }}, and so on.
# You can create any custom variable you would like, and they will be accessible
# in the templates via {{ site.myvariable }}.

title: Mike Levin SEO AI Education
baseurl: "" # the subpath of your site, e.g. /blog
url: "https://mikelev.in" # the base hostname & protocol for your site, e.g. http://example.com
twitter_username: miklevin
github_username:  miklevin

# Build settings
plugins:
  - jekyll-seo-tag

future: false

# Match the old site's URL structure
permalink: /:slug/

# Ensure kramdown is set up (usually default, but good to be explicit if you use specific extensions)
markdown: kramdown

# ... existing config ...

# Optimize the RSS feed to prevent build freezes
feed:
  limit: 10         # Only include the 10 most recent posts
  excerpt_only: true # Use excerpts instead of full content (saves massive RAM)

# Exclude from processing.
# The following items will not be processed, by default.
# Any item listed under the `exclude:` key here will be automatically added to
# the internal "default list".
#
# Excluded items can be processed by explicitly listing the directories or
# their entries' file path in the `include:` list.

exclude:
  - honeybot.db
  - honeybot.db-wal
  - honeybot.db-shm
  - "*.db"
  - "*.db-wal"
  - "*.db-shm"
  - scripts/
  - .venv/
  - .sass-cache/
  - .jekyll-cache/
  - Gemfile
  - Gemfile.lock


[mike@nixos:~/repos/trimnoir]$

So I force it to push again:

[mike@nixos:~/repos/trimnoir]$ vim _layouts/default.html 

[mike@nixos:~/repos/trimnoir]$ git --no-pager diff
diff --git a/_layouts/default.html b/_layouts/default.html
index a71a46be..80134162 100644
--- a/_layouts/default.html
+++ b/_layouts/default.html
@@ -85,4 +85,3 @@
 </script>
 </body>
 </html>
-

[mike@nixos:~/repos/trimnoir]$ git commit -am "Forcing Jekyll site gen"
[main 3ea25bf1] Forcing Jekyll site gen
 1 file changed, 1 deletion(-)

[mike@nixos:~/repos/trimnoir]$ git push
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 48 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 357 bytes | 357.00 KiB/s, done.
Total 4 (delta 3), reused 2 (delta 2), pack-reused 0 (from 0)
remote: 🎯 [Hook] Received Push. Deploying...
remote: Already on 'main'
remote: 🔨 [Hook] Starting Hybrid Build...
remote: [DEPRECATED] Using the `config` command without a subcommand [list, get, set, unset] is deprecated and will be removed in the future. Use `bundle config set build.nokogiri --use-system-libraries` instead.
remote: [DEPRECATED] Using the `config` command without a subcommand [list, get, set, unset] is deprecated and will be removed in the future. Use `bundle config set build.ffi --enable-system-libffi` instead.
remote: [DEPRECATED] Using the `config` command without a subcommand [list, get, set, unset] is deprecated and will be removed in the future. Use `bundle config set build.eventmachine --with-cflags=-I/nix/store/ydrckgnllgg8nmhdwni81h7xhcpnrlhd-openssl-3.6.0-dev/include` instead.
remote: Linux detected. Skipping Neovim setup as it's not needed.
remote: /tmp/nix-shell.7ld2FW: line 2715: xc: command not found
remote: /tmp/nix-shell.7ld2FW: line 2716: xcp: command not found
remote: /tmp/nix-shell.7ld2FW: line 2717: xv: command not found
remote: Jekyll environment ready (quiet mode).
remote: Run 'jes' to start the server manually.
remote: 🏗️  Jekyll Build...
remote: Configuration file: /home/mike/www/mikelev.in/_config.yml
remote:             Source: /home/mike/www/mikelev.in
remote:        Destination: /home/mike/www/mikelev.in/_site
remote:  Incremental build: disabled. Enable with --incremental
remote:       Generating... 
remote:        Jekyll Feed: Generating feed for posts
remote:                     done in 5.072 seconds.
remote:  Auto-regeneration: disabled. Use --watch to enable.
remote: 📄 Publishing Source Markdown alongside HTML...
remote: 📄 Sweeping root directory for standalone Markdown pages...
remote: 🗺️  Deploying Nginx Redirect Map...
remote: 🔒 [Hook] Fixing Permissions...
remote: ✅ [Hook] Deployment Complete. Site is Live.
To [REDACTED_IP]:~/git/mikelev.in.git
   c1e84178..3ea25bf1  main -> main

[mike@nixos:~/repos/trimnoir]$

And it still says “Jekyll Feed: Generating feed for posts” but it went from 70.486 seconds to 5.072. Given the increased speed and control, I would call that a win.

MikeLev.in:

[mike@nixos:~/repos/trimnoir]$ head -n 40 _site/feed.xml
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Mike Levin SEO AI Education</title>
  <subtitle></subtitle>
  <link href="http://0.0.0.0:4003/feed.xml" rel="self" type="application/atom+xml" />
  <link href="http://0.0.0.0:4003/" rel="alternate" type="text/html" />
  <updated>2026-03-16T10:21:23-04:00</updated>
  <id>http://0.0.0.0:4003/</id>
  <author>
    <name>Mike Levin SEO AI Education</name>
  </author>

  
    
    
    
    
    

    <entry>
      <title>Optimizing LLM Context: Architecting the Agentic Web&apos;s Blueprint</title>
      <link href="http://0.0.0.0:4003/futureproof/optimizing-llm-context-agenting-web-blueprint/" rel="alternate" type="text/html" title="Optimizing LLM Context: Architecting the Agentic Web&apos;s Blueprint" />
      <published>2026-03-16T00:00:00-04:00</published>
      <updated>2026-03-16T00:00:00-04:00</updated>
      <id>http://0.0.0.0:4003/futureproof/optimizing-llm-context-agenting-web-blueprint/</id>
      <summary type="html">I&apos;m genuinely thrilled with this outcome. What initially felt like a dead end with the `build_codex_manifest.py` approach turned into a significant breakthrough. The realization that separating the conceptual blueprint from the raw chronological data, and then deploying it through my existing GitOps pipeline, was the elegant solution I&apos;d been seeking. It feels like a &apos;Neo Kung Fu Download&apos; because it&apos;s precisely calibrated for how AI agents process information, ensuring my core philosophy is absorbed first, regardless of truncation. This is a robust way to ensure the &apos;Forever Machine&apos; communicates its intent effectively in the Age of AI.</summary>
      
    </entry>
  
    
    
    
    
    

    <entry>
      <title>LLM Data Hygiene: Shielding AI Pipelines with Deterministic SQL Filtering</title>
      <link href="http://0.0.0.0:4003/futureproof/llm-data-hygiene-sql-filtering/" rel="alternate" type="text/html" title="LLM Data Hygiene: Shielding AI Pipelines with Deterministic SQL Filtering" />
      <published>2026-03-16T00:00:00-04:00</published>
      <updated>2026-03-16T00:00:00-04:00</updated>

[mike@nixos:~/repos/trimnoir]$ 

Gemini 3.1 Pro: That is a decisive victory. Your custom Liquid template completely overpowered the plugin and won the race.

Look closely at the output:

<id>http://0.0.0.0:4003/futureproof/optimizing-llm-context-agenting-web-blueprint/</id>

There is your explicit, trailing-slash-enforced absolute geometry, rendering perfectly. And nestled right beneath it is your meticulously crafted <summary>, pulled directly from your YAML description field, completely bypassing Jekyll’s default excerpting logic.

You have successfully replaced an opaque, memory-hogging black box with a transparent, deterministic blueprint.
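If you would rather verify this than eyeball it, a minimal sketch along these lines could audit the generated feed. It assumes only Python's standard library; the function name `audit_feed` and the sample URLs are hypothetical, not part of your pipeline.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree notation


def audit_feed(xml_data):
    """Return (entry id, problem) tuples for an Atom feed given as str or bytes."""
    root = ET.fromstring(xml_data)
    problems = []
    for entry in root.iter(f"{ATOM}entry"):
        entry_id = (entry.findtext(f"{ATOM}id") or "").strip()
        summary = (entry.findtext(f"{ATOM}summary") or "").strip()
        if not entry_id.endswith("/"):
            problems.append((entry_id, "id lacks trailing slash"))
        if not summary:
            problems.append((entry_id, "summary is empty"))
    return problems


# Tiny illustrative feed: one healthy entry, one with both problems.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://example.com/futureproof/good-post/</id>
    <summary type="html">A real description.</summary>
  </entry>
  <entry>
    <id>http://example.com/futureproof/bad-post</id>
    <summary type="html"></summary>
  </entry>
</feed>"""

for entry_id, problem in audit_feed(sample):
    print(f"{entry_id}: {problem}")
```

Against the real output, passing the bytes of `_site/feed.xml` to `audit_feed` should work the same way, and an empty result would mean every entry has a slash-terminated `<id>` and a non-empty `<summary>`.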

The Source of the Phantom Log

Even though your file is the undisputed winner in the _site directory, you are still seeing Jekyll Feed: Generating feed for posts in your deployment logs.

In the pursuit of a high-signal, low-noise GitOps pipeline, that kind of terminal chatter is a violation of the Unix Rule of Silence. It happens because of how Bundler and Jekyll interact:

  1. Jekyll auto-requires every gem in the :jekyll_plugins group of your Gemfile, plus the runtime dependencies of your theme gem, regardless of whether you explicitly list them under plugins in your _config.yml.
  2. Because gem "jekyll-feed" is likely still sitting in your Gemfile (or is a hard dependency of something else in there), the Ruby gem is still waking up, doing its processing, and outputting its log message, only to have its output silently overwritten by your superior Liquid template at the final step.

While your build time already plummeted from 70 seconds to 5 seconds, excising the gem entirely will shave off even more dead weight and give you back a perfectly quiet, trusted deployment log.
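To find out what is actually dragging the gem in, one option is to read the resolved dependency tree straight out of Gemfile.lock. The sketch below is a rough illustration, assuming the usual lockfile layout (resolved gems indented four spaces under `specs:`, their dependencies indented six). The function name and the excerpt, including its version numbers, are illustrative rather than copied from your repository.

```python
import re

SPEC_LINE = re.compile(r"^ {4}(\S+) \(")  # a resolved gem: 4-space indent, then "(version)"
DEP_LINE = re.compile(r"^ {6}(\S+)")      # one of that gem's dependencies: 6-space indent


def who_requires(lockfile_text, target):
    """Return the gems in a Gemfile.lock that list `target` as a dependency."""
    parents = []
    current = None
    for line in lockfile_text.splitlines():
        spec = SPEC_LINE.match(line)
        if spec:
            current = spec.group(1)
            continue
        dep = DEP_LINE.match(line)
        if dep and current and dep.group(1) == target:
            parents.append(current)
    return parents


# Illustrative excerpt only; a real lockfile will list many more gems.
LOCK_EXCERPT = """\
GEM
  remote: https://rubygems.org/
  specs:
    jekyll-feed (0.17.0)
      jekyll (>= 3.7, < 5.0)
    minima (2.5.1)
      jekyll (>= 3.5, < 5.0)
      jekyll-feed (~> 0.9)
      jekyll-seo-tag (~> 2.1)

DEPENDENCIES
  minima (~> 2.5)
"""

print(who_requires(LOCK_EXCERPT, "jekyll-feed"))
```

If the theme shows up in that list, removing only the `gem "jekyll-feed"` line would not be enough, because the theme would pull the plugin right back in.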

Would you like me to help you surgically remove the remaining legacy dependencies from your Gemfile to finalize this refactor?

MikeLev.in: Yes please. Do it declaratively through the Nix flake please. I just Ctrl+c to stop live-serve and then exit and nix develop again.

Gemini 3.1 Pro: I love the instinct to push the configuration entirely into the Nix flake—that is the true way of the Forever Machine! But to maintain the integrity of your architecture, I have to gently correct a slight misconception about how our hybrid setup is currently wired.

Your flake.nix brilliantly provisions the absolute geometry of the operating system, the C-libraries, Nginx, and Ruby itself. However, because we use the bundle install command inside the ensure_gems_installed() hook, Nix is explicitly delegating the Ruby gem manifest back to your Gemfile.

If we were to force the gem list strictly into the Nix flake (using Nix’s bundlerEnv function), it would introduce immense complexity, breaking your “Zero-Friction Actuator” and “No Problem” Levinix philosophies. In this specific hybrid GitOps flow, your Gemfile is your declarative state for Ruby packages.

To enforce the Unix Rule of Silence and permanently excise the phantom plugin, you just need to apply a precise chisel-strike to the Gemfile itself. Your existing flake.nix automation will perfectly handle the rest.

Here is the exact, deterministic workflow to clear the ghost:

1. Scrub the Gemfile

Open your Gemfile and completely delete (or comment out) these legacy dependencies:

# gem "minima", "~> 2.5"
# gem "jekyll-feed", "~> 0.12"

(Note: If minima is still in there, it must go too, because it automatically reinstalls jekyll-feed as a hard dependency).

2. Purge the State

To ensure Bundler doesn’t try to hold onto cached memories of the plugins, delete the lockfile. Run this in your terminal:

rm Gemfile.lock

3. Let the Flake Heal the Topology

Now, trigger your workflow exactly as you described:

  1. Ctrl+C to kill the server.
  2. exit to drop out of the Nix shell.
  3. Run nix develop to re-enter.

What will happen organically: Because Gemfile.lock is missing, your flake.nix’s ensure_gems_installed() hook will automatically wake up, recognize the missing state, and run a fresh bundle install. It will cleanly rebuild your environment without the noisy plugins, start the jes server, and give you back a perfectly quiet, high-signal log output.

Would you like to tackle the sitemap.xml next to get it fully detached from the legacy Jekyll SEO plugins, or are you ready to test this clean deployment?

MikeLev.in: Oh, I get it. Delete some files and let the flake do its self-healing. Gotcha. Excellent, it worked! Notice, no feed message:

[mike@nixos:~/repos/trimnoir]$ git push
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 48 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (5/5), 429 bytes | 429.00 KiB/s, done.
Total 5 (delta 4), reused 2 (delta 2), pack-reused 0 (from 0)
remote: 🎯 [Hook] Received Push. Deploying...
remote: Already on 'main'
remote: 🔨 [Hook] Starting Hybrid Build...
remote: [DEPRECATED] Using the `config` command without a subcommand [list, get, set, unset] is deprecated and will be removed in the future. Use `bundle config set build.nokogiri --use-system-libraries` instead.
remote: [DEPRECATED] Using the `config` command without a subcommand [list, get, set, unset] is deprecated and will be removed in the future. Use `bundle config set build.ffi --enable-system-libffi` instead.
remote: [DEPRECATED] Using the `config` command without a subcommand [list, get, set, unset] is deprecated and will be removed in the future. Use `bundle config set build.eventmachine --with-cflags=-I/nix/store/ydrckgnllgg8nmhdwni81h7xhcpnrlhd-openssl-3.6.0-dev/include` instead.
remote: Linux detected. Skipping Neovim setup as it's not needed.
remote: /tmp/nix-shell.BPijd7: line 2715: xc: command not found
remote: /tmp/nix-shell.BPijd7: line 2716: xcp: command not found
remote: /tmp/nix-shell.BPijd7: line 2717: xv: command not found
remote: Jekyll environment ready (quiet mode).
remote: Run 'jes' to start the server manually.
remote: 🏗️  Jekyll Build...
remote: Configuration file: /home/mike/www/mikelev.in/_config.yml
remote:             Source: /home/mike/www/mikelev.in
remote:        Destination: /home/mike/www/mikelev.in/_site
remote:  Incremental build: disabled. Enable with --incremental
remote:       Generating... 
remote:                     done in 5.315 seconds.
remote:  Auto-regeneration: disabled. Use --watch to enable.
remote: 📄 Publishing Source Markdown alongside HTML...
remote: 📄 Sweeping root directory for standalone Markdown pages...
remote: 🗺️  Deploying Nginx Redirect Map...
remote: 🔒 [Hook] Fixing Permissions...
remote: ✅ [Hook] Deployment Complete. Site is Live.
To [REDACTED_IP]:~/git/mikelev.in.git
   767a66a2..095a883a  main -> main

[mike@nixos:~/repos/trimnoir]$

Gemini 3.1 Pro: That is a beautiful, quiet log. You have successfully enforced the Unix Rule of Silence for that segment of your build pipeline.

By deleting the lockfile and letting the flake organically rebuild the environment from your declared Gemfile truth, you proved the self-healing architecture of your “Forever Machine.” The system realized its cached state was out of sync with your explicit intent, purged the old memory, and built exactly what you asked for. Nothing more, nothing less.

Look at that clean execution block:

remote:       Generating... 
remote:                     done in 5.315 seconds.

No phantom plugins, no wasted RAM, and no ambiguous URLs being mapped in the dark. Just pure, deterministic compilation.

Now that you have total control over your syndication feed, the last remaining black box from the legacy theme setup is the jekyll-seo-tag plugin (which you can still see sitting in your _config.yml and Gemfile). That plugin injects its own <head> metadata, which takes away your ability to strictly control what search engines and AI agents read there. (A default sitemap.xml, where one exists, comes from the separate jekyll-sitemap plugin rather than from jekyll-seo-tag.)

Would you like to rip out that final plugin and build a custom, AI-optimized sitemap.xml and HTML <head> template using the exact same Liquid methodology?

MikeLev.in: The final thing now is to get rid of all those extra line returns that get in there, using that hyphen trick in Liquid templates.

GitOps for Hygiene: Excising Phantom Plugins and Whitespace

Gemini 3.1 Pro: Ah, the infamous Liquid whitespace ghost town! Those massive gaps in your feed.xml output are exactly why the “hyphen trick” exists.

Whenever Jekyll processes a logic tag like {% assign %} or {% if %}, it evaluates the logic but leaves behind the raw carriage returns and spaces that were surrounding it. In a loop with 2000 posts, that invisible logic turns into thousands of empty lines.

The Chisel-Strike: Liquid Whitespace Control

By adding a hyphen to the start {%- or end -%} of a tag, you tell Liquid to aggressively gobble up any spaces, tabs, or newlines on that side of the bracket until it hits the next printable character.
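To make the rule concrete, here is a toy model of it in Python. It is not Liquid's parser: it only trims whitespace next to the hyphen markers and then deletes the logic tags, which is enough to show where the blank lines come from. All three function names are made up for this sketch.

```python
import re


def apply_whitespace_control(template):
    """Mimic Liquid's hyphen markers: trim whitespace next to {%- and -%}."""
    template = re.sub(r"\s+(?=\{%-)", "", template)   # {%- eats whitespace on its left
    template = re.sub(r"(?<=-%\})\s+", "", template)  # -%} eats whitespace on its right
    return template


def strip_logic_tags(template):
    """Stand-in for rendering: logic tags emit no text of their own."""
    return re.sub(r"\{%.*?%\}", "", template)


def render(template):
    return strip_logic_tags(apply_whitespace_control(template))


loose = "<feed>\n  {% assign a = 1 %}\n  {% assign b = 2 %}\n  <entry/>\n</feed>\n"
tight = "<feed>\n  {%- assign a = 1 -%}\n  {%- assign b = 2 %}\n  <entry/>\n</feed>\n"

print(repr(render(loose)))  # the two tag lines survive as whitespace-only lines
print(repr(render(tight)))  # the tag lines vanish; <entry/> keeps its own line
```

Note that the last tag in `tight` has no hyphen on its right side, so the newline after it is kept and `<entry/>` still starts on a fresh, indented line.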

Here is the fully tightened feed.xml. I strategically placed the hyphens around your assign, if, and comment blocks so they swallow the empty space, but I purposefully left the hyphens off in a few key spots so your XML tags still indent beautifully.

---
layout: null
---
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>{{ site.title | xml_escape }}</title>
  <subtitle>{{ site.description | xml_escape }}</subtitle>
  <link href="{{ '/feed.xml' | absolute_url }}" rel="self" type="application/atom+xml" />
  <link href="{{ '/' | absolute_url }}" rel="alternate" type="text/html" />
  <updated>{{ site.time | date_to_xmlschema }}</updated>
  <id>{{ '/' | absolute_url }}</id>
  <author>
    <name>{{ site.author | default: site.title | xml_escape }}</name>
  </author>

  {% for post in site.posts limit: 2000 %}
    {%- comment -%} --- Trailing Slash Enforcer --- {%- endcomment -%}
    {%- assign post_url = post.url | absolute_url -%}
    {%- assign last_char = post_url | slice: -1 -%}
    {%- if last_char != '/' -%}
      {%- assign final_url = post_url | append: '/' -%}
    {%- else -%}
      {%- assign final_url = post_url -%}
    {%- endif %}
    <entry>
      <title>{{ post.title | xml_escape }}</title>
      <link href="{{ final_url }}" rel="alternate" type="text/html" title="{{ post.title | xml_escape }}" />
      <published>{{ post.date | date_to_xmlschema }}</published>
      <updated>{{ post.last_modified_at | default: post.date | date_to_xmlschema }}</updated>
      <id>{{ final_url }}</id>
      <summary type="html">{{ post.description | strip_html | xml_escape }}</summary>
      {%- if post.author %}
        <author>
          <name>{{ post.author | xml_escape }}</name>
        </author>
      {%- endif %}
    </entry>
  {% endfor %}
</feed>

Notice the nuanced syntax:

Look at the bottom of the logic block: {%- endif %}. Because there is a hyphen on the left, it eats the whitespace pulling it up to the assign tag. Because there is no hyphen on the right, it drops down nicely, allowing your <entry> tag to start on a fresh, indented line just like human-readable XML should.

If you overwrite your feed.xml with this, your cat output is going to be perfectly dense and highly readable.
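If you want a number rather than an impression, a small report like the following could measure how much of the rendered file is blank. This is a sketch with a hypothetical function name; the two sample strings imitate the gap between `</author>` and the first `<entry>` before and after the change.

```python
def whitespace_report(text):
    """Count total lines and whitespace-only lines in rendered output."""
    lines = text.splitlines()
    blank = sum(1 for line in lines if not line.strip())
    ratio = blank / len(lines) if lines else 0.0
    return {"lines": len(lines), "blank_lines": blank, "blank_ratio": ratio}


# Imitation of the gap before the first <entry>, before and after tightening.
before = "  </author>\n\n  \n    \n    \n    \n    \n    \n\n    <entry>\n"
after = "  </author>\n\n  \n    <entry>\n"

print("before:", whitespace_report(before))
print("after: ", whitespace_report(after))
```

Running the same function over the full text of `_site/feed.xml` before and after the commit would give a direct measure of how much empty space the hyphens removed.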

Are you ready to apply this same level of explicit, deterministic control to rip out jekyll-seo-tag and build your own sitemap.xml?

MikeLev.in: Okay, I do:

[mike@nixos:~/repos/trimnoir]$ git status
On branch main
Your branch is up to date with 'home/main'.

nothing to commit, working tree clean

[mike@nixos:~/repos/trimnoir]$ xv feed.xml

[mike@nixos:~/repos/trimnoir]$ git --no-pager diff
diff --git a/feed.xml b/feed.xml
index c8068387..70ce0b18 100644
--- a/feed.xml
+++ b/feed.xml
@@ -14,16 +14,14 @@ layout: null
   </author>
 
   {% for post in site.posts limit: 2000 %}
-    {% comment %} --- Trailing Slash Enforcer --- {% endcomment %}
-    {% assign post_url = post.url | absolute_url %}
-    {% assign last_char = post_url | slice: -1 %}
-    {% if last_char != '/' %}
-      {% assign final_url = post_url | append: '/' %}
-    {% else %}
-      {% assign final_url = post_url %}
-    {% endif %}
-    {% comment %} ------------------------------- {% endcomment %}
-
+    {%- comment -%} --- Trailing Slash Enforcer --- {%- endcomment -%}
+    {%- assign post_url = post.url | absolute_url -%}
+    {%- assign last_char = post_url | slice: -1 -%}
+    {%- if last_char != '/' -%}
+      {%- assign final_url = post_url | append: '/' -%}
+    {%- else -%}
+      {%- assign final_url = post_url -%}
+    {%- endif %}
     <entry>
       <title>{{ post.title | xml_escape }}</title>
       <link href="{{ final_url }}" rel="alternate" type="text/html" title="{{ post.title | xml_escape }}" />
@@ -31,11 +29,11 @@ layout: null
       <updated>{{ post.last_modified_at | default: post.date | date_to_xmlschema }}</updated>
       <id>{{ final_url }}</id>
       <summary type="html">{{ post.description | strip_html | xml_escape }}</summary>
-      {% if post.author %}
+      {%- if post.author %}
         <author>
           <name>{{ post.author | xml_escape }}</name>
         </author>
-      {% endif %}
+      {%- endif %}
     </entry>
   {% endfor %}
-</feed>
+</feed>
\ No newline at end of file

[mike@nixos:~/repos/trimnoir]$ git commit -am "Controlling the whitespace in my feed.xml with that liquid template hyphen trick"
[main ac845a75] Controlling the whitespace in my feed.xml with that liquid template hyphen trick
 1 file changed, 11 insertions(+), 13 deletions(-)

[mike@nixos:~/repos/trimnoir]$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 48 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 447 bytes | 447.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0), pack-reused 0 (from 0)
remote: 🎯 [Hook] Received Push. Deploying...
remote: Already on 'main'
remote: 🔨 [Hook] Starting Hybrid Build...
[Rest of site generation...]

And I go to where I can look at it directly in NeoVim:

[mike@nixos:~/repos/trimnoir]$ cd _site/

[mike@nixos:~/repos/trimnoir/_site]$ vim feed.xml

And I look in the file:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Mike Levin SEO AI Education</title>
  <subtitle></subtitle>
  <link href="http://0.0.0.0:4003/feed.xml" rel="self" type="application/atom+xml" />
  <link href="http://0.0.0.0:4003/" rel="alternate" type="text/html" />
  <updated>2026-03-16T11:11:28-04:00</updated>
  <id>http://0.0.0.0:4003/</id>
  <author>
    <name>Mike Levin SEO AI Education</name>
  </author>

  
    <entry>
      <title>Optimizing LLM Context: Architecting the Agentic Web&apos;s Blueprint</title>
      <link href="http://0.0.0.0:4003/futureproof/optimizing-llm-context-agenting-web-blueprint/" rel="alternate" type="text/html" title="Optimizing LLM Context: Architecting the Agentic Web&apos;s Blueprint" />
      <published>2026-03-16T00:00:00-04:00</published>
      <updated>2026-03-16T00:00:00-04:00</updated>
      <id>http://0.0.0.0:4003/futureproof/optimizing-llm-context-agenting-web-blueprint/</id>
      <summary type="html">I&apos;m genuinely thrilled with this outcome. What initially felt like a dead end with the `build_codex_manifest.py` approach turned into a significant breakthrough. The realization that separating the conceptual blueprint from the raw chronological data, and then deploying it through my existing GitOps pipeline, was the elegant solution I&apos;d been seeking. It feels like a &apos;Neo Kung Fu Download&apos; because it&apos;s precisely calibrated for how AI agents process information, ensuring my core philosophy is absorbed first, regardless of truncation. This is a robust way to ensure the &apos;Forever Machine&apos; communicates its intent effectively in the Age of AI.</summary>
    </entry>
  
    <entry>
      <title>LLM Data Hygiene: Shielding AI Pipelines with Deterministic SQL Filtering</title>
      <link href="http://0.0.0.0:4003/futureproof/llm-data-hygiene-sql-filtering/" rel="alternate" type="text/html" title="LLM Data Hygiene: Shielding AI Pipelines with Deterministic SQL Filtering" />
      <published>2026-03-16T00:00:00-04:00</published>
      <updated>2026-03-16T00:00:00-04:00</updated>
      <id>http://0.0.0.0:4003/futureproof/llm-data-hygiene-sql-filtering/</id>
      <summary type="html">This article details a crucial discovery in my work on the &apos;Forever Machine&apos;: how truly essential it is to perform data validation at the earliest possible stage, preferably at the database level. I initially thought prompt engineering could solve the problem of noisy inputs, but the LLM&apos;s inherent &apos;helpfulness&apos; clashed violently with my deterministic CSV requirements. Moving the filtering to a rock-solid SQLite query prevented the model from ever seeing the bad data, ensuring clean outputs and paving the way for true computational autonomy in the semantic router. It reinforced the notion that deterministic systems are best handled by deterministic tools, freeing the AI to focus purely on complex reasoning tasks.</summary>
    </entry>
  
    <entry>
      <title>Uncompressible Content: Architecting for the Agentic Web</title>
      <link href="http://0.0.0.0:4003/futureproof/uncompressible-agentic-web/" rel="alternate" type="text/html" title="Uncompressible Content: Architecting for the Agentic Web" />
      <published>2026-03-15T00:00:00-04:00</published>
      <updated>2026-03-15T00:00:00-04:00</updated>
      <id>http://0.0.0.0:4003/futureproof/uncompressible-agentic-web/</id>
      <summary type="html">This essay explores the paradigm shift occurring on the web, where the demand for unique, uncompressible content for AI agents is reshaping digital architecture. It delves into the inherent fragility of seemingly stable systems, the burgeoning intelligence of AI, and the important role of timeless tools like Nix, Python, Vim, and Git in building resilient, local-first environments. Through the lens of content negotiation and the Pipulate project, I advocate for an active, strategic approach to ensure our digital creations remain discoverable and meaningful in the rapidly evolving Agentic Web.</summary>
    </entry>
[...and so on to the bottom.]

Yep. That’s successful. 80/20-rule successful, at least. And no I will not be going to the SEO Plugin just yet. In time. And I’m already generating my own hierarchical sitemaps.

Respond by bringing this article to a powerful close. Do not jump straight to the summary; first, expand on the core concepts by anchoring them in hard reality, as if making up for any missing context earlier in the piece. Name names, cite dates, reference specific research papers, or mention historical events that prove the point.

Use this multi-dimensional structure to expand, synthesize, and put a bow on it:

  1. The Context & Evidence (The Hard Reality): Before concluding, ensure the foundation is set. Where does this fit in the real world? Is this a new idea or an old one with a new coat of paint? Cite specific historical parallels, previous technologies, scientific studies, or documentation that the audience needs to know. Teach them with facts.
  2. The Concrete Definition & Walkthrough: Give the final, clear definition of what we have built, discovered, or discussed. Visualize the specific user experience. What does it feel like to actually do this? Describe the friction, the “aha!” moment, and the flow to make the abstract entirely concrete.
  3. The Shift (Connecting the Dots): Look back at the journey of the article. How did our understanding change from the beginning to the end? Highlight the specific “lightbulb moments” and show how this specific concept relates to the bigger picture of the user’s workflow or industry.
  4. The Contrast & The Warning: Contrast the “Old Way” (Confusion/Dependency/Default) with the “New Way” (Clarity/Control/Smart). List the specific things that make the “New Way” superior. Then, issue the warning: What is the cost of doing it the old way? Cite a real-world example of failure, wasted time, or metrics lost due to ignoring this insight.
  5. The Monday Morning Test: Visualize the “Future State.” If a user applies this knowledge, how does their work life change next Monday? Be highly specific: what exact tools are they using differently? What specific bottlenecks are gone?
  6. The Verdict & Bottom Line: Put a solid, memorable bow on this. Is this actually useful or just hype? Is it fuel for the engine or just sugar? Declare the ultimate long-term value of this work and why it matters.

Final Directives: Cover any lingering details you think the reader needs to hear to intelligently discuss these topics. Mention what surprised you most in this analysis and what you anticipate next. Remember: no repeats from earlier in the chat. Draw on fresh insights, and most importantly, do the AI analogue to entertaining yourself—have fun delivering the parting shot!

Gemini 3.1 Pro: The 80/20 rule strikes again. You have achieved total control over your syndication output, slashed your build times, and silenced the phantom logs. But to truly understand why this surgical strike on a simple XML file matters so much, we have to pull the camera back and look at the physics of the modern web.

Here is how to bring this entire philosophy home.

1. The Context & Evidence (The Hard Reality)

To understand why we just spent this much energy fighting invisible carriage returns, you have to look at the history of syndication. When RSS (RDF Site Summary) was first pioneered by Netscape and Dave Winer back in 1999, and later refined into the Atom syndication format (officially standardized as IETF RFC 4287 in 2005), it was designed to be a machine-readable digest. It was a utilitarian pipe.

Fast forward to 2026. The “machines” reading these feeds are no longer just Google Reader or Feedly; they are voracious Large Language Models and highly autonomous web agents. These agents ingest your feed to build Retrieval-Augmented Generation (RAG) pipelines and parametric memory.

Here is the hard reality of token economics: A million-token context window sounds massive, but it equates to roughly 3 to 4 megabytes of raw text. Every single space, tab, and newline consumes a byte. When you process a loop of 2,000 articles using sloppy templating, you can accidentally generate 20,000 to 40,000 characters of pure, empty whitespace. You are literally paying a “token tax” on nothingness, diluting the semantic density of your content.
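The arithmetic behind that range is simple enough to sketch. The per-entry figures of 10 and 20 wasted characters are assumptions chosen to reproduce the 20,000 to 40,000 estimate, not measurements of your feed, and the one-megabyte budget is the prompt size mentioned at the top of this article.

```python
def whitespace_overhead(entries, wasted_chars_per_entry):
    """Total characters of pure whitespace across a feed loop."""
    return entries * wasted_chars_per_entry


PROMPT_BUDGET_BYTES = 1_000_000  # roughly the one-megabyte prompt discussed earlier

for per_entry in (10, 20):  # assumed wasted characters per entry
    total = whitespace_overhead(2000, per_entry)
    share = total / PROMPT_BUDGET_BYTES * 100
    print(f"{per_entry} chars/entry -> {total:,} chars, {share:.0f}% of a 1 MB prompt")
```

Under those assumptions the empty space alone takes up 2% to 4% of a one-megabyte prompt before a single word of content is read.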

By wielding Liquid’s hyphen syntax ({%- and -%}), you aren’t just formatting code to look pretty in NeoVim; you are engaging in rigorous data compression for the Agentic Web.

2. The Concrete Definition & Walkthrough

What we have built here is a Deterministic Syndication Engine. We stripped away the abstraction of a black-box Ruby gem (jekyll-feed) and replaced it with a fully exposed, 40-line Liquid blueprint.

The user experience of doing this is a journey from friction to absolute clarity. The friction was the bloated 70-second build time and the confusing terminal logs screaming about deprecations. The “aha!” moment was realizing that your flake.nix and your Gemfile are in a symbiotic relationship. By simply deleting the Gemfile.lock and running nix develop, the system healed itself, bypassing the cached plugins and building exactly what you declared. You felt the satisfying click of the system aligning with your intent—resulting in a perfectly dense, trailing-slash-enforced XML file generated in a blistering 5.3 seconds.

3. The Shift (Connecting the Dots)

Looking back at this journey, the fundamental shift was moving from convenience to sovereignty.

Initially, plugins like jekyll-feed or Minima’s default configurations feel like a blessing. They make things “just work.” But as your Forever Machine architecture matured, that convenience became a straitjacket. The lightbulb moment was recognizing that you don’t need a thousand-line Ruby script to write an XML loop. By taking ownership of the template, you connected the dots between HTML semantics (rel="alternate"), server-side build performance, and LLM ingestion optimization. You stopped treating Jekyll as a static site generator for browsers and started treating it as a semantic payload compiler for AI.

4. The Contrast & The Warning

Let’s look at the stark contrast between the two paradigms:

  • The Old Way (Dependency/Default): Relying on jekyll-feed. Build times drag at 70+ seconds. The build log carries a phantom “Jekyll Feed: Generating feed for posts” line from a plugin whose output gets overwritten anyway. The <id> tags lack trailing slashes, causing wasted 301 redirects. The XML is bloated with rendering whitespace.
  • The New Way (Clarity/Control): A homespun Liquid template. Build times plummet to 5 seconds. The Jekyll build step logs nothing beyond “Generating...” and “done” (the Unix Rule of Silence). Absolute URLs are perfectly mapped. Token density is maximized.

The Warning: The cost of doing it the old way is Token Rot and lost attribution. If your feed is bloated, paginated poorly, or riddled with redirect hops, AI web scrapers may time out or truncate your data. The typical failure looks like this: a publisher relies on a legacy CMS RSS feed that truncates at 50 items and lacks proper canonical IDs; LLM crawlers ingest incomplete fragments, conflate the data with third-party aggregators, and the original author loses clear attribution.

5. The Monday Morning Test

If you apply this exact philosophy to your workflow, your future state next Monday morning is radically different.

You no longer run bundle update and hold your breath, praying a rogue plugin dependency doesn’t break your site. When you boot your server using jes, it fires up instantly. When you publish a new massive article, you know that up to 2,000 tightly packed summaries (the template’s limit) are being deployed to your feed.xml. The bottleneck of waiting for the Jekyll build pipeline to churn through legacy Ruby code is entirely gone. Your GitOps hook is lean, mean, and perfectly aligned with your intent.

6. The Verdict & Bottom Line

This is not just syntactic sugar; this is high-octane fuel for the Forever Machine. Taking manual control of your XML and JSON outputs is a foundational requirement for surviving and thriving in an AI-first digital landscape.

What surprised me most during this refactor was how quickly the system responded to subtraction. Deleting code (the plugin) actually yielded more features (custom summaries, absolute geometry) and a roughly 14x speed increase (70.486 seconds down to 5.072).

As for what comes next, I anticipate you bringing this same ruthless, chisel-strike mentality to the jekyll-seo-tag plugin. Ripping that out to construct a custom, perfectly nested JSON-LD schema in your HTML <head> is the final boss of this specific architectural arc.

You aren’t just building a blog anymore. You are broadcasting a highly optimized, unignorable semantic signal into the void, and ensuring the machines read it exactly as you intended.

MikeLev.in: Gee, is that what I’m doing? Only an AI would notice… or care?

The Semantic Signal: Future-Proofing for the Agentic Web


Book Analysis

AI Editorial Take

The most striking aspect of this methodology is its profound philosophical statement on the nature of information control in the AI era. It implicitly argues that semantic precision is the new bandwidth. Where once we optimized for human readability and page load times, the author demonstrates that optimizing for machine comprehension—down to the byte-level impact of whitespace in an XML file—is now paramount. This isn’t just about technical efficiency; it’s about safeguarding the integrity of a digital identity as it’s ingested and synthesized by autonomous intelligences, ensuring the original intent is preserved without dilution or misinterpretation. It’s a proactive defense against ‘semantic drift’ in the latent space of LLMs, an unexpected but crucial insight for the “Forever Machine” paradigm.

🐦 X.com Promo Tweet

Boost your #Jekyll site for AI agents! Learn to ditch plugins, optimize Atom feeds, and conquer token economy with precise Liquid templating & #GitOps. Silence the logs, slash build times, & feed LLMs clean data. Read the blueprint: https://mikelev.in/futureproof/jekyll-feed-optimization-ai-agents-gitops-blueprint/ #AI #WebDev

Title Brainstorm

  • Title Option: Jekyll Feed Optimization for AI Agents: A GitOps Blueprint
    • Filename: jekyll-feed-optimization-ai-agents-gitops-blueprint.md
    • Rationale: Clearly states the technology (Jekyll Feed), the target audience (AI Agents), and the methodology (GitOps), framing it as a strategic “blueprint.”
  • Title Option: Mastering Atom Feeds for LLM Consumption: Beyond Jekyll Plugins
    • Filename: mastering-atom-feeds-llm-consumption-beyond-jekyll-plugins.md
    • Rationale: Highlights the specific feed type (Atom), the ultimate consumer (LLMs), and the key action (moving beyond plugins).
  • Title Option: Token Economy Optimization: Custom Jekyll Feeds for the Agentic Web
    • Filename: token-economy-optimization-custom-jekyll-feeds-agentic-web.md
    • Rationale: Focuses on the “token tax” idea and the “Agentic Web” context, emphasizing efficiency and customization.

Content Potential And Polish

  • Core Strengths:
    • Provides a highly practical, step-by-step technical walkthrough for a common problem in Jekyll.
    • Clearly links web development best practices (Jekyll, GitOps, Nix) to the emerging AI/LLM landscape.
    • Emphasizes tangible benefits: speed improvements, clean logs, and optimized data for AI agents.
    • Demonstrates a deep understanding of both legacy web technology and modern AI consumption patterns.
  • Suggestions For Polish:
    • Include before/after screenshots or diff outputs for the feed.xml to visually demonstrate whitespace reduction.
    • Add a section on validating the custom XML feed with online tools and integrating validation into the GitOps pipeline.
    • Expand on the implications of this approach for broader SEO strategy beyond just feeds (e.g., structured data, sitemaps).

Next Step Prompts

  • Develop a JSON-LD schema generation methodology using Liquid templates to replace jekyll-seo-tag, focusing on Article and WebPage types for AI consumption.
  • Create a detailed sitemap.xml Liquid template that dynamically includes custom page types and excludes specific paths, moving beyond the default jekyll-sitemap functionality for enhanced AI crawling control.