Setting the Stage: Context for the Curious Book Reader

In an era where digital interaction is increasingly shaped by AI, understanding who (or what) visits your website has never been more important. This essay delves into the philosophy of reclaiming direct control over your online presence through home-hosting, offering a methodology to instrument your site not just to serve content, but to actively interrogate the intent of every visitor. It highlights practical steps for distinguishing sophisticated AI agents from simple scrapers, providing an important blueprint for navigating the complexities of the modern web.


Technical Journal Entry Begins

The potential next projects are multiplying in my head like what you might expect to find down a rabbit hole. And so this is when we back up and are grateful for the wins we were able to bank so far on this project. And it’s been formidable tiny win after formidable tiny win.

So I haven’t been YouTubing that much recently. Those who follow me here on YouTube might have noticed. Yeah, I’m talking to you Miles Moyer and Joseph Melnick! Thanks for always checking up on me. Hope you’re having happy holidays. So does live-casting this 24 by 7 by 365 count as catching up? Haha.

But really, there’s so much knowledge one must have these days about precisely how AI bots are visiting sites that I had to move on this home-hosting project before it was too late. Not many years ago a bunch of people could have told you how to do it. Fast-forward to today. I guess some people who are home-hosting stuff for games will go to this level, and they usually just use the “DMZ jack” on the home-router/WiFi combo from their ISP. All the difficulty of home-network stuff for gaming has been pretty much dealt with because there was demand. But almost nobody is using all that considerable bandwidth they’re paying for (for streaming TV and stuff) for home hosting. It seems totally obvious and almost a no-brainer, except for that final mile of nearly impossible challenges for the newb. And so now that you need to do it to watch the bot behaviors first-hand and personally, it seems to be a lost art.

Reclaiming the Web: The Imperative of Home Hosting

Well, not so fast. Static sites are easy to host. They don’t take much computing power or much bandwidth. Yet the valuable information yielded by hosting one yourself far outweighs the technical challenges of recapturing this old geek pastime. And so we do. It seems like a perfect use case for recruiting a bit of AI mentoring for something just a bit above your pay grade. Be your own sysadmin, netadmin and devops guy. Maybe even be your own technical co-founder, if that’s the language you use for this sort of basic capability expansion.

See, the thing is you as a human can control a lot of automation stuff. It’s just that you don’t speak in player-piano hole-punching directly. Nor loom. Nor 3D printer. Whatever. But you speak the combination of English (or “human” as Jensen Huang puts it) and you speak Python. Don’t you? Well, you should, because it’s the language that bridges the two worlds enough to satisfy work contract agreements. You’ll never agree if pass/fail is subjective. But if it’s based on whether some objective test evaluates true, then you’ve got some mechanical advantage in achieving stuff.

Bridging Worlds: Automation and the Language of Python

Yet I’m already a bit exhausted from what I’ve done so far today. So we’re going to switch to the easy stuff: the softballs of just putting a few things in place, like a local copy of MathJax, my Google Analytics snippet, and the JavaScript bot trap. I also think I want to bring back the amazing d3.js force graph that I create from this site’s data and hierarchical scheme.

Initial Instrumentation: Easy Wins with JavaScript Payloads

Hmmm. That smart 404 project is nagging at me a bit too. But at this point it’s really about which AI bots execute JavaScript and which don’t. That’s the draw, and it’s basically the same work as filling in those other JavaScript static resources. Doing one is the same as doing the other, and it’s the next obvious step.

Gemini 3 Pro: This is the “Decoration Phase”.

You have the foundation (NixOS), the engine (Python/Stream), and the structure (Jekyll). Now you are hanging the paintings and wiring the sensors.

You are right to group these. Google Analytics, MathJax, and the Bot Trap are all just JavaScript payloads injected into _layouts/default.html. They are “softballs” because they require no backend logic change—just template surgery.

Here is the implementation plan for your “Easy Wins.”

1. The _layouts/default.html Surgery

We are going to add three things to your base template:

  1. MathJax: For rendering those beautiful equations.
  2. Google Analytics (GA4): To see the humans (and smart bots).
  3. The Bot Trap: A silent beacon that only fires if the visitor executes JavaScript.

File: _layouts/default.html

<!DOCTYPE html>
<html lang="{{ page.lang | default: site.lang | default: "en" }}">

  {%- include head.html -%}

  <body>

    <!-- The Bot Trap: a silent beacon that only fires if the visitor executes JavaScript -->
    <script>
      (function() {
        var img = new Image();
        // Add timestamp to bust cache
        img.src = "/assets/pixel.png?pulse=" + Date.now();
      })();
    </script>

    <!-- Google Analytics (GA4): replace G-XXXXXXXXXX with your own measurement ID -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', 'G-XXXXXXXXXX');
    </script>

    <!-- MathJax (CDN-hosted for now, to be localized later; the polyfill line only matters for very old browsers) -->
    <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
    <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>

    <main class="page-content" aria-label="Content">
      <div class="wrapper">
        {{ content }}
      </div>
    </main>

    {%- include footer.html -%}

  </body>

</html>

Note: You will need to create a tiny 1x1 transparent image at assets/pixel.png for the trap to hit without 404ing, though a 404 works just as well for detection if you filter for it.
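If you don’t have a 1x1 transparent PNG handy, here is one way to generate it: a quick sketch assuming Pillow is available (pip install pillow) and that you run it from the repo root, so the output lands at the assets/pixel.png path used above.

# make_pixel.py: generate the 1x1 transparent PNG used by the bot trap.
# Assumes Pillow is installed and the script is run from the repo root.
from PIL import Image

pixel = Image.new("RGBA", (1, 1), (0, 0, 0, 0))  # fully transparent pixel
pixel.save("assets/pixel.png", optimize=True)
print("Wrote assets/pixel.png")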

2. The D3 Force Graph (The Map)

You want to visualize the “Rabbit Hole.” This requires two parts:

  1. The Cartographer (Python): Scans your markdown files and builds a JSON link map.
  2. The Canvas (HTML/JS): Renders that JSON using D3.

Part A: The Cartographer (scripts/map_site.py) This script parses your _posts folder, finds internal links [text](/slug), and generates nodes and links.

import os
import re
import json
from pathlib import Path

# Config
POSTS_DIR = Path("/home/mike/repos/MikeLev.in/_posts")
OUTPUT_FILE = Path("/home/mike/repos/MikeLev.in/assets/graph.json")

def generate_graph():
    nodes = []
    links = []
    slug_to_id = {}
    
    # 1. First Pass: Collect Nodes (Articles)
    files = list(POSTS_DIR.glob("*.md")) + list(POSTS_DIR.glob("*.markdown"))
    
    for i, filepath in enumerate(files):
        with open(filepath, 'r') as f:
            content = f.read()
            
        # Extract Slug (naive, assumes filename or permalink)
        filename = filepath.name
        slug = "/" + filename[:10] + "-" + filename[11:].replace(".md", "").replace(".markdown", "") + "/"
        
        # Extract Title (Naive regex)
        title_match = re.search(r'^title:\s*"(.*?)"', content, re.MULTILINE)
        title = title_match.group(1) if title_match else slug
        
        nodes.append({"id": slug, "group": 1, "title": title})
        slug_to_id[slug] = slug

    # 2. Second Pass: Collect Links
    link_pattern = re.compile(r'\[.*?\]\((/.*?/)\)')
    
    for filepath in files:
        with open(filepath, 'r') as f:
            content = f.read()
            
        filename = filepath.name
        source_slug = "/" + filename[:10] + "-" + filename[11:].replace(".md", "").replace(".markdown", "") + "/"
        
        matches = link_pattern.findall(content)
        for target_slug in matches:
            # Only link if the target exists in our graph. This is an exact
            # match against the slugs collected above; it filters out external
            # links and broken paths (loosen it if your permalinks vary).
            if target_slug in slug_to_id:
                 links.append({"source": source_slug, "target": target_slug, "value": 1})

    graph = {"nodes": nodes, "links": links}
    
    with open(OUTPUT_FILE, 'w') as f:
        json.dump(graph, f, indent=2)
    
    print(f"🗺️  Graph generated: {len(nodes)} nodes, {len(links)} connections.")

if __name__ == "__main__":
    generate_graph()

Part B: The Canvas (_includes/graph.html) You can create a new page graph.md that includes this HTML snippet to render the data.

<div id="d3-graph" style="width: 100%; height: 600px; border: 1px solid #333;"></div>
<script src="https://d3js.org/d3.v7.min.js"></script>
<script>
  d3.json("/assets/graph.json").then(function(data) {
      const width = 800;
      const height = 600;

      const simulation = d3.forceSimulation(data.nodes)
          .force("link", d3.forceLink(data.links).id(d => d.id))
          .force("charge", d3.forceManyBody().strength(-100))
          .force("center", d3.forceCenter(width / 2, height / 2));

      const svg = d3.select("#d3-graph").append("svg")
          .attr("viewBox", [0, 0, width, height]);

      const link = svg.append("g")
          .attr("stroke", "#999")
          .attr("stroke-opacity", 0.6)
        .selectAll("line")
        .data(data.links)
        .join("line");

      const node = svg.append("g")
          .attr("stroke", "#fff")
          .attr("stroke-width", 1.5)
        .selectAll("circle")
        .data(data.nodes)
        .join("circle")
          .attr("r", 5)
          .attr("fill", "orange")
          .call(drag(simulation));

      node.append("title")
          .text(d => d.title);

      simulation.on("tick", () => {
          link
              .attr("x1", d => d.source.x)
              .attr("y1", d => d.source.y)
              .attr("x2", d => d.target.x)
              .attr("y2", d => d.target.y);

          node
              .attr("cx", d => d.x)
              .attr("cy", d => d.y);
      });

      function drag(simulation) {
          function dragstarted(event) {
              if (!event.active) simulation.alphaTarget(0.3).restart();
              event.subject.fx = event.subject.x;
              event.subject.fy = event.subject.y;
          }

          function dragged(event) {
              event.subject.fx = event.x;
              event.subject.fy = event.y;
          }

          function dragended(event) {
              if (!event.active) simulation.alphaTarget(0);
              event.subject.fx = null;
              event.subject.fy = null;
          }

          return d3.drag()
              .on("start", dragstarted)
              .on("drag", dragged)
              .on("end", dragended);
      }
  });
</script>

The Deep Dive: Why These “Softballs” Matter

  1. The Context (The Lost Art): You are bringing back the era of the Webmaster. In 2010, every SEO knew how to check server logs and paste a GA snippet. Today, that knowledge is hidden behind “Integrations” and “Plugins.” By manually injecting these scripts, you are bypassing the abstraction layer. You are seeing exactly what code executes on your user’s machine.
  2. The Evidence (The Pulse): The “Bot Trap” script (new Image().src) is the most reliable detector of Headless Browsers (like Puppeteer or Playwright) vs. Simple Scrapers (like curl or requests).
    • Simple Scraper: downloads HTML. Does NOT execute JS. The server sees the page hit, but never sees the /pixel.png hit.
    • Headless Browser / AI Agent: downloads HTML, executes JS. The server sees the page hit AND the /pixel.png hit.
    • Result: You can now segment your traffic into “Dumb Bots” and “Smart Agents” (a rough log-classification sketch follows this list).
  3. The Walkthrough (The Trap): Imagine an AI crawler (like GPTBot) hits your homepage. It reads the text. It sees the script tag. If it’s just scraping text for training, it ignores the script. Trap status: Empty. If it’s an “Agent” trying to understand the page structure or render content for a user, it executes the script. The img.src fires. Your server logs a hit to pixel.png. Trap status: Sprung. You now have a binary signal for “Execution Capability.”
  4. Connect the Dots (D3): The Force Graph isn’t just eye candy. It is a visual representation of your Internal Linking Strategy. In the “Age of AI,” structure matters. If your graph shows isolated clusters (orphaned pages), AI bots won’t find them. If it shows a dense, interconnected core, you have built a strong “Knowledge Graph” for the AI to consume. The visual confirms the architecture.
  5. The Reality Check: Without the manual D3 script, you are guessing at your site structure or relying on expensive SEO tools (Ahrefs, Screaming Frog) to tell you what you already own. Without the manual Bot Trap, you are relying on User-Agent strings, which are essentially “self-reported” lies. The execution of JS is a behavior, not a label. Behavior never lies.
  6. The Verdict: These aren’t softballs; they are Instrumentation. You are wiring the dashboard of your machine. You cannot fly the ship if you don’t know who is on board (GA/Trap) and where you are going (Graph).
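To make that “Dumb Bots” vs. “Smart Agents” segmentation concrete, here is a rough sketch of the kind of log pass it implies. It assumes an nginx/Apache-style “combined” access log; the log path and the pixel URL are assumptions to adjust for your own setup, and the page/asset split is deliberately crude.

# classify_visitors.py: a rough sketch of the Dumb Bot vs. Smart Agent split.
# The log path, pixel URL, and the /assets/ heuristic are assumptions.
import re
from collections import defaultdict

LOG_FILE = "/var/log/nginx/access.log"   # assumption: combined log format
PIXEL_PATH = "/assets/pixel.png"

# combined format: IP - - [time] "METHOD path HTTP/x" status bytes "referer" "ua"
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[.*?\] "(\S+) (\S+) [^"]*" (\d{3})')

page_hits = defaultdict(int)    # page requests per IP
pixel_hits = defaultdict(int)   # bot-trap pixel requests per IP

with open(LOG_FILE) as f:
    for line in f:
        m = LINE_RE.match(line)
        if not m:
            continue
        ip, method, path, status = m.groups()
        if path.startswith(PIXEL_PATH):
            pixel_hits[ip] += 1
        elif not path.startswith("/assets/"):
            page_hits[ip] += 1

for ip in sorted(page_hits):
    label = "smart agent (executed JS)" if pixel_hits[ip] else "dumb bot / plain fetch"
    print(f"{ip:15}  pages={page_hits[ip]:4}  {label}")

Any IP that fetched pages but never tripped the pixel is a fetcher; any IP that did both actually ran your code.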

What Surprised Me: The realization that the “Magic Cookie” metaphor applies to the graph generation too. You are creating a small, static JSON file (a cookie) that the client-side D3 script consumes to explode into a complex visualization. It keeps the “Cartridge” (the server) light and moves the compute to the user’s browser. That is true Local-First philosophy.

Me: Yeah, I’m not going to bore the audience silly (even more) explaining how to store resources locally instead of serving them from the CDN (MathJax) or how to get your GA code. Maybe I’ll pick up later with how we implement the JavaScript trap so that it’s not merely the request for the .js file itself that indicates the ability to execute JavaScript, but the next HTTP request after that, one that only happens because the JavaScript actually executed and made that baton-passed resource call. And that’s sort of a CAPTCHA for browser-wielding bots.

Advanced Bot Detection: The ‘Baton Pass’ Protocol

But I want to put that final bow on this article and take a break. You mostly put that bow on it already, so just say anything else you think should be contemplated between now and the static/local-ifying of those mostly-JavaScript files that you know is coming up. I guess also anything you want to say about how we might format that report. For any given call on any URL, we will have a secondary and a tertiary call made automatically if the thing making the initial HTTPS source request has a browser on the other side executing the code it just fetched.
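To make that baton-pass idea concrete before we get there, here is a minimal sketch of the two-stage trap, written as a small Python helper that generates the trap asset. The filenames (assets/js/trap.js, /trap-beacon.gif) and the token scheme are illustrative assumptions, not a settled design; the point is that fetching trap.js proves nothing, while the beacon request it triggers can only happen if the fetched code was actually executed.

# make_trap.py: a minimal sketch of a two-stage "baton pass" trap.
# Paths and the token scheme are illustrative assumptions.
from pathlib import Path

TRAP_JS = Path("assets/js/trap.js")  # would be referenced from _layouts/default.html

# Stage 1: the .js file itself. Any fetcher can download this text.
# Stage 2: the beacon request inside it only happens if a real JS engine runs it.
trap_source = """
(function () {
  // Values only knowable at runtime inside a browser or JS engine.
  var token = [
    Date.now(),
    (typeof screen !== "undefined" && screen.width) ? screen.width : 0,
    navigator.userAgent.length
  ].join("-");
  var img = new Image();
  img.src = "/trap-beacon.gif?token=" + encodeURIComponent(token);
})();
""".strip()

TRAP_JS.parent.mkdir(parents=True, exist_ok=True)
TRAP_JS.write_text(trap_source + "\n")
print(f"Wrote {TRAP_JS}: fetching it is cheap; executing it fires /trap-beacon.gif")

In the server logs, a hit on trap.js followed by a hit on /trap-beacon.gif from the same visitor is the baton pass completing; a hit on trap.js with no beacon is a fetcher that downloaded the code but never ran it.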

I don’t think most people think of it this way, but displaying a webpage is exactly the same as running the code of a program. And that is entirely different from merely making a request for a text file that you can then do whatever with. The actual Web as we know it is made possible by a fault-tolerant and astoundingly robust program, the browser, which runs the HTML code and does whatever additional things it needs to do, like fetching subsequent resources, to display that page. That’s incredibly resource-intensive compared to a mere fetch.

The Web as a Program: Deeper Implications for Traffic Analysis

I just want to stimulate you to fill in a whole bunch of stuff here I should be thinking about between now and coming back to do that static resource stuff.

Gemini 3 Pro: The final bow on this article is the realization that you are not hosting a document; you are hosting a program.

When you serve a webpage, you are transmitting source code (HTML/JS) to a remote machine, asking it to compile and execute that code. This reframes the entire interaction.

The Things to Contemplate Before the Next Step

Here are the seeds to plant in your mind before we tackle the static resources and the JavaScript trap implementation.

  1. The Asymmetry of “Execute” vs. “Fetch”:
    • Fetch (The Scraper): GET /index.html. Cost to bot: Microseconds. Cost to you: Micro-bandwidth.
    • Execute (The Browser/Agent): GET /index.html -> Parse DOM -> Build Render Tree -> Execute JS -> GET /pixel.png -> GET /mathjax.js -> GET /font.woff.
    • The Contemplation: The gap between these two behaviors is where the intelligence lives. A bot that “Executes” is investing energy in your site. Why? Is it an LLM agent trying to “understand” the page structure for a user? Or is it a headless browser trying to bypass a CAPTCHA? The resource intensity is a signal of intent.
  2. The “Baton Pass” as a Signal: You mentioned the “baton-passed resource call.” This is key. The request for /pixel.png (or whatever trap we set) is not an independent event. It is causally linked to the successful execution of the previous code.
    • Thinking Point: How do we link these in the logs? If IP 1.2.3.4 requests / at 12:00:00 and then /pixel.png at 12:00:01, that’s a “Session.” We need to think about how HoneyDB will reconstruct these sessions. We might need a session_id generated by the JS to cryptographically prove the link.
  3. The “Ghost” Traffic: When you implement local resources (MathJax, fonts), your log volume will triple. Every single human visit will now generate 10+ hits (html, css, js, font, png).
    • The Challenge: logs.py currently shows everything. It will become a blur. We will need to filter out the “supporting cast” (static assets) from the “main characters” (pages) in the real-time stream, while still recording them in the DB for the “Trap Analysis.” You will need to start thinking about “Signal vs. Noise” in your visualization.
  4. The Report Format (The Scorecard): You asked how to format the report. Think of it as a “Capabilities Matrix.” Instead of just a list of User Agents, imagine a table:
    • Row: User Agent (Grouped)
    • Col 1: Total Hits
    • Col 2: Executed JS? (✅/❌)
    • Col 3: Loaded CSS? (✅/❌)
    • Col 4: Followed Invisible Link? (The “Dumb” Flag)

This turns your report into a Bot Personality Test. You aren’t just counting them; you are profiling their cognitive abilities.
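A first pass at that Capabilities Matrix can come straight out of the same access log, grouped by User-Agent instead of by IP. This is a sketch under the same assumptions as the earlier classifier (combined log format, assumed asset paths) and only fills in the first three columns; the invisible-link flag would need its own trap.

# capabilities_matrix.py: a sketch of the "Bot Personality Test" report.
# Assumes an nginx/Apache-style combined log; paths are illustrative.
import re
from collections import defaultdict

LOG_FILE = "/var/log/nginx/access.log"   # assumption
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[.*?\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)

stats = defaultdict(lambda: {"hits": 0, "js": False, "css": False})

with open(LOG_FILE) as f:
    for line in f:
        m = LINE_RE.match(line)
        if not m:
            continue
        ip, method, path, status, ua = m.groups()
        row = stats[ua]
        row["hits"] += 1
        if path.startswith("/assets/pixel.png"):    # the trap fired: JS executed
            row["js"] = True
        if path.endswith(".css") or "/assets/css/" in path:
            row["css"] = True

print(f"{'User-Agent':60} {'Hits':>6}  JS   CSS")
for ua, row in sorted(stats.items(), key=lambda kv: -kv[1]["hits"]):
    print(f"{ua[:60]:60} {row['hits']:>6}  {'✅' if row['js'] else '❌'}   {'✅' if row['css'] else '❌'}")

Each row is one “personality”: a scraper shows hits with no JS and usually no CSS, while a headless browser or agent lights up both.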

The Final Verdict: You have successfully turned a “server” into a “sensor.” By actively injecting code that demands execution (the JS Trap) and visualizing the result (the Report), you are no longer passively serving content. You are actively interrogating every visitor that walks through the door. You are asking: “Are you a ghost (scraper) or are you alive (browser)?”

Enjoy the break. When you come back, we turn the lights on.


Book Analysis

AI Editorial Take

This essay offers a robust methodology for any individual or organization seeking to truly understand their digital footprint in the Age of AI. Its core strength lies in reframing the web server as a sophisticated sensor, allowing for nuanced insights into visitor intent beyond simplistic analytics. The practical code examples and the emphasis on behavioral analysis make this an important blueprint for developing a truly instrumented online presence. The concepts of ‘baton pass’ and ‘asymmetry of execution’ are particularly profound, providing a solid foundation for future development in bot profiling.

Title Brainstorm

  • Title Option: Mastering the Digital Terrain: Home Hosting, Bot Traps, and Site Visualization in the Age of AI
    • Filename: mastering-digital-terrain-home-hosting-bot-traps-site-visualization.md
    • Rationale: Captures the breadth of the article’s technical and philosophical scope, highlighting key actionable elements and the overarching “Age of AI” context.
  • Title Option: From Passive Webpage to Active Sensor: Instrumentation for AI-Driven Insights
    • Filename: passive-webpage-active-sensor-instrumentation-ai-insights.md
    • Rationale: Emphasizes the transformation of the website’s role and the core concept of using instrumentation to gain insights into AI behavior.
  • Title Option: The Webmaster’s Revival: Home Hosting & Advanced Bot Detection in the AI Era
    • Filename: webmasters-revival-home-hosting-advanced-bot-detection-ai-era.md
    • Rationale: Focuses on the “lost art” aspect and the practical application of bot detection, connecting it to the current technological landscape.
  • Title Option: Decoding Digital Visitors: JavaScript Traps and D3 Maps for AI Age Webmasters
    • Filename: decoding-digital-visitors-javascript-traps-d3-maps-ai-webmasters.md
    • Rationale: Highlights the “decoding” aspect and the specific tools (JS traps, D3 maps) used for this purpose, appealing to a technically inclined audience.

Content Potential And Polish

  • Core Strengths:
    • Provides a novel perspective on website hosting as an active, interrogative process rather than passive content delivery.
    • Clearly outlines practical, actionable steps for implementing advanced bot detection and site visualization.
    • Strong emphasis on understanding AI agent behavior through execution signals, moving beyond simple user-agent strings.
    • Connects technical implementations to broader philosophical implications for webmasters in the Age of AI.
    • The “Gemini 3 Pro” sections offer excellent, structured technical guidance and insightful commentary.
  • Suggestions For Polish:
    • The introduction could more quickly establish the core thesis of “hosting a program” to immediately hook the reader.
    • Ensure consistent terminology when referring to “bots,” “AI agents,” “scrapers,” and “headless browsers” or explicitly define them early.
    • Consider a small section on the ethical implications or privacy considerations of advanced bot detection, even if brief.
    • A clear “call to action” or next steps for readers looking to implement these strategies themselves could be beneficial.

Next Step Prompts

  • Develop a detailed guide for configuring HoneyDB or a similar logging system to track the ‘baton pass’ signals (e.g., page load -> pixel hit) and reconstruct visitor sessions for advanced bot profiling.
  • Expand on the ‘Capabilities Matrix’ report format, detailing specific metrics and visualizations needed to effectively profile different types of AI agents and human users based on their execution behavior.