Cracking Google Gemini’s Hidden 20 RPD Free Tier Limit with Python

Setting the Stage: Context for the Curious Book Reader

This journal entry chronicles an important debugging journey, transforming what initially appeared to be a simple rate limit adjustment into a deeper investigation of undocumented API behaviors. It serves as a methodology for developers in the Age of AI, illustrating how empirical testing and robust scripting can expose the true operational constraints of free-tier services. This blueprint details the process of discerning the precise limits, adapting code for resilience, and ultimately gaining a more sovereign understanding of cloud platform dependencies.

Technical Journal Entry Begins

Me: Google Gemini rate limits are usually measured across three dimensions: Requests per minute (RPM), Tokens per minute (input) (TPM), Requests per day (RPD). How can I find the RPM, TPM and RPD limits on the free tier for each model?

Google AI Overview: You can find the specific RPM, TPM, and RPD limits for your free tier access in the Google AI Studio dashboard (also known as the Google Cloud Console for specific projects). The limits can vary by model, region, and account status, so checking your specific dashboard is the most accurate way. [1, 2, 3, 4, 5]
General Free Tier Rate Limits While your dashboard will have the definitive numbers, generally published free tier limits for popular models are around the following values:

Model [6, 7, 8, 9]	Requests per Minute (RPM)	Tokens per Minute (TPM)	Requests per Day (RPD)
Gemini 2.5 Pro	2-5	125,000 - 250,000	50 - 100
Gemini 2.5 Flash	10	250,000	250 - 500
Gemini 1.5 Pro	2	32,000	50
Gemini 1.5 Flash	15	1,000,000	1,500

Important Considerations:

• Rate limits apply per project, not per API key. • RPD quotas reset at midnight Pacific Time (PT). • The limits specified in documentation are the maximum supported, but actual available capacity might be lower based on system load or billing account standing. • Reaching any limit (RPM, TPM, or RPD) will trigger a rate limit error, even if other limits are not reached. [1, 10]

How to Monitor Your Limits 1. Go to the Dashboard: Navigate to the Google AI Studio dashboard and select your project.

Check Quotas: For detailed monitoring and to see exact limits, it may be necessary to go to the IAM & admin -> Quotas section in the Google Cloud console and filter by the “Gemini API” service.

Upgrading to a higher usage tier may be considered if the free tier limits are consistently reached, as this offers increased rate limits. [1, 11]

AI responses may include mistakes.

[1] https://ai.google.dev/gemini-api/docs/rate-limits
[2] https://discuss.ai.google.dev/t/limits-of-free-tier-api-vs-ai-studio/94918
[3] https://support.google.com/a/answer/6301355?hl=en
[4] https://ai.google.dev/gemini-api/docs/rate-limits
[5] https://discuss.ai.google.dev/t/gemini-2-5-pro-with-empty-response-text/81175?page=7
[6] https://github.com/musistudio/claude-code-router/issues/969
[7] https://discuss.ai.google.dev/t/got-youve-reached-your-rate-limit-on-gemini-2-5-pro/76669?page=2
[8] https://ai.google.dev/gemini-api/docs/pricing
[9] https://gemini-api.apidog.io/doc-965865
[10] https://firebase.google.com/docs/ai-logic/quotas
[11] https://ai.google.dev/gemini-api/docs/rate-limits
[12] https://docs.voyageai.com/docs/rate-limits
[13] https://ai.google.dev/gemini-api/docs/api-key
[14] https://discuss.ai.google.dev/t/gemini-2-5-pro-with-empty-response-text/81175?page=7

If this is true then I should be able to adjust this:

import os
import sys
import json
import re
import time
import argparse
import getpass
from pathlib import Path
from datetime import datetime
import google.generativeai as genai
import frontmatter  # Requires: pip install python-frontmatter

# --- CONFIGURATION ---
CONFIG_DIR = Path.home() / ".config" / "articleizer"
API_KEY_FILE = CONFIG_DIR / "api_key.txt"
TARGETS_FILE = CONFIG_DIR / "targets.json"

# Model to use (Flash is best for high-volume, low-cost processing)
MODEL_NAME = 'gemini-2.5-flash' 

# Safe default
DEFAULT_TARGETS = {
    "1": {
        "name": "Local Project (Default)",
        "path": "./_posts"
    }
}

def load_targets():
    """Loads publishing targets from external config or falls back to default."""
    if TARGETS_FILE.exists():
        try:
            with open(TARGETS_FILE, 'r') as f:
                return json.load(f)
        except json.JSONDecodeError:
            print(f"⚠️ Warning: {TARGETS_FILE} is corrupt. Using defaults.")
    return DEFAULT_TARGETS

PROJECT_TARGETS = load_targets()

def get_api_key():
    """Gets API key from config or prompts user."""
    if API_KEY_FILE.is_file():
        return API_KEY_FILE.read_text().strip()
    
    print("Google API Key not found in config.")
    key = getpass.getpass("Enter your Google API Key: ")
    if key.strip():
        save = input("Save key? (y/n): ").lower()
        if save == 'y':
            CONFIG_DIR.mkdir(parents=True, exist_ok=True)
            API_KEY_FILE.write_text(key.strip())
            API_KEY_FILE.chmod(0o600)
    return key.strip()

def extract_metadata_and_content(file_path):
    """Reads markdown file, extracts YAML frontmatter and body."""
    try:
        post = frontmatter.load(file_path)
        return {
            "frontmatter": post.metadata,
            "content": post.content,
            "filename": file_path.stem
        }
    except Exception as e:
        print(f"❌ Error reading {file_path.name}: {e}")
        return None

def generate_context_json(article_data):
    """Calls Gemini to compress the article into a context JSON object."""
    
    # Construct the Prompt
    prompt = f"""
    You are a Knowledge Graph Architect. Your goal is to compress the provided technical article into a 'Holographic Shard'—a minimal JSON object that acts as a context pointer for a Retrieval Augmented Generation system.

    **Goal:** Fit maximum semantic meaning into approximately 200 tokens (800 bytes).

    **Input Data:**
    - Title: {article_data['frontmatter'].get('title', 'Unknown')}
    - Date: {article_data['frontmatter'].get('date', 'Unknown')}
    - Filename: {article_data['filename']}
    - Content: 
    {article_data['content'][:15000]}  # Truncate to avoid context limit issues, usually enough

    **Instructions:**
    1. **Analyze:** Read the content. Look past the title. Find specific technologies, concepts, or "aha!" moments buried in the text.
    2. **Extract Sub-topics ('sub'):** Identify 3-5 distinct, specific sub-topics or "juicy" details that are NOT just the title re-worded. (e.g., "Fixing Pandas int/str errors", "The 'Chisel Strike' method").
    3. **Summarize ('s'):** Write a concise 1-2 sentence summary of the core thesis.
    4. **Keywords ('kw'):** Extract 3-5 high-value technical keywords (e.g., "NixOS", "HTMX", "Sovereignty").

    **Output Format:**
    Provide ONLY a valid JSON object. No markdown formatting around it if possible, but I will parse it out.
    
    Schema:
    {{
      "id": "{article_data['filename']}",
      "d": "YYYY-MM-DD",  // Extract from filename or frontmatter
      "t": "Article Title",
      "s": "Concise Summary",
      "sub": ["Subtopic 1", "Subtopic 2", "Subtopic 3"],
      "kw": ["Keyword1", "Keyword2"]
    }}
    """

    model = genai.GenerativeModel(MODEL_NAME)
    
    try:
        response = model.generate_content(prompt)
        text = response.text.strip()
        
        # Clean up Markdown code blocks if present
        if text.startswith("```json"):
            text = text[7:]
        if text.startswith("```"):
            text = text[3:]
        if text.endswith("```"):
            text = text[:-3]
            
        return json.loads(text.strip())
    except Exception as e:
        error_msg = str(e)
        # Check for Rate Limit (429) or ResourceExhausted errors
        if "429" in error_msg or "ResourceExhausted" in error_msg:
            print(f"\n🛑 Quota Limit Reached (API 429). Exiting script immediately.")
            sys.exit(0) # HARD EXIT
        else:
            print(f"  ⚠️ AI Generation failed: {e}")
        return None

def main():
    parser = argparse.ArgumentParser(description="Generate AI context JSONs for markdown articles.")
    parser.add_argument('--limit', type=int, default=50, help="Max number of articles to process this run (default: 50)")
    parser.add_argument('--force', action='store_true', help="Overwrite existing context files")
    parser.add_argument('--dry-run', action='store_true', help="Show what would happen without calling API")
    args = parser.parse_args()

    # Target Selection
    print("Select target blog directory:")
    for key, target in PROJECT_TARGETS.items():
        print(f"  [{key}] {target['name']}")
    
    choice = input("Enter choice (1..): ").strip()
    if choice not in PROJECT_TARGETS:
        print("Invalid choice.")
        return

    posts_dir = Path(PROJECT_TARGETS[choice]['path']).resolve()
    context_dir = posts_dir / "_context"
    
    if not posts_dir.exists():
        print(f"❌ Error: Directory {posts_dir} does not exist.")
        return

    # Ensure context directory exists
    if not args.dry_run:
        context_dir.mkdir(exist_ok=True)

    # 1. API Setup
    if not args.dry_run:
        api_key = get_api_key()
        if not api_key: return
        genai.configure(api_key=api_key)

    # 2. File Discovery & Filtering
    all_posts = sorted(list(posts_dir.glob("*.md")), reverse=True) # Newest first
    to_process = []

    print(f"\n🔍 Scanning {posts_dir}...")
    
    for post in all_posts:
        json_path = context_dir / f"{post.stem}.json"
        
        if not json_path.exists() or args.force:
            to_process.append(post)

    print(f"Found {len(all_posts)} articles.")
    print(f"📝 {len(to_process)} articles need context generation.")
    
    if args.limit and len(to_process) > args.limit:
        print(f"⚠️ Limiting processing to first {args.limit} items.")
        to_process = to_process[:args.limit]

    if not to_process:
        print("✅ All caught up! No new context to generate.")
        return

    # 3. Processing Loop
    print("\n🚀 Starting Contextualization...")
    
    count = 0
    for post in to_process:
        count += 1
        print(f"[{count}/{len(to_process)}] Processing: {post.name}...")
        
        if args.dry_run:
            continue

        data = extract_metadata_and_content(post)
        if not data: continue

        # Generate JSON
        context_json = generate_context_json(data)
        
        if context_json:
            # Save
            json_path = context_dir / f"{post.stem}.json"
            with open(json_path, 'w', encoding='utf-8') as f:
                # Minify JSON to save bytes (separators removes whitespace)
                json.dump(context_json, f, separators=(',', ':'))
            
            print(f"  ✅ Saved {json_path.name}")
            
            # Rate limiting: Flash Free Tier is ~15 RPM. 
            # 60s / 15 = 4s. We use 5s to be safe.
            time.sleep(5)
        else:
            print("  ❌ Failed to generate context.")
            # Even on failure, sleep a bit to avoid hammering
            time.sleep(2)

    print("\n✨ Batch complete.")

if __name__ == "__main__":
    main()

…to be able to process all 1000 articles in 1 pass using gemini-1.5-flash if I just adjust the rate that it’s processing to be well under 15 requests per minute, right?

Gemini: You are absolutely on the right track with your logic, but there is a critical version trap in your code that will cause this to fail if you run it exactly as written.

The Initial Hypothesis and the Critical Version Trap

Your code currently sets MODEL_NAME = 'gemini-2.5-flash', but your question asks about gemini-1.5-flash. This distinction makes the difference between success and failure for a 1,000-article batch.

The “1.5 vs 2.5” RPD Trap

If you run the script as written with gemini-2.5-flash:

Limit: ~500 Requests Per Day (RPD).
Your Load: 1,000 requests.
Result: The script will crash halfway through (around article #500) due to the Daily Quota (ResourceExhausted).

If you switch the script to gemini-1.5-flash:

Limit: 1,500 Requests Per Day (RPD).
Your Load: 1,000 requests.
Result: Success. You will consume ~66% of your daily quota.

Code Critique & Fixes

Your logic for RPM (Requests Per Minute) is sound. A 5-second sleep (time.sleep(5)) results in 12 requests per minute.

Limit: 15 RPM.
Your Rate: 12 RPM.
Result: Safe.

However, to ensure this runs smoothly for 1,000 files, I recommend three specific adjustments to your script:

1. Fix the Model Name

Update the configuration line to use the 1.5 version, which has the higher daily limit.

# SWAP THIS LINE
# MODEL_NAME = 'gemini-2.5-flash' 

# TO THIS LINE
MODEL_NAME = 'gemini-1.5-flash' 

2. Implement “Backoff” Handling

Your current script has a “HARD EXIT” on a 429/Quota error. If you hit a transient spike (e.g., Google’s servers are busy), your script dies instantly. For a long batch job like this, it is better to retry once or twice before quitting.

Replace your exception block with this logic:

    # ... inside generate_context_json ...
    except Exception as e:
        error_msg = str(e)
        
        # Handle Rate Limit (429) specifically with a retry
        if "429" in error_msg:
            print(f"  ⚠️ Rate limit hit. Cooling down for 30s...")
            time.sleep(30)
            return None  # Or implement a retry loop here if you want to be robust
            
        # Handle Daily Quota (ResourceExhausted)
        if "ResourceExhausted" in error_msg:
            print(f"\n🛑 Daily Quota Exceeded. Script stopping safely.")
            sys.exit(0) 

        print(f"  ⚠️ AI Generation failed: {e}")
        return None

3. Add a Safety Limit

Since you are processing 1,000 items and the limit is 1,500, you are safe. However, if you have other apps using the same API key, you might want to stop before you hit the absolute wall so you don’t break your other tools.

You can modify your argument parser default to be explicit about this safety margin:

# Default limit set to 1400 to leave room for other tests/queries
parser.add_argument('--limit', type=int, default=1400, ...) 

Summary Table: Will it work?

Metric	Limit (1.5 Flash)	Your Script	Status
RPM	15 requests/min	12 requests/min	✅ Safe
TPM	1,000,000 tokens/min	~48,000 tokens/min*	✅ Safe
RPD	1,500 requests/day	1,000 requests/day	✅ Safe (Only on 1.5 Flash)

*Estimated based on 15k chars ≈ 4k tokens x 12 RPM.

Next Step: Would you like me to rewrite the generate_context_json function to include a simple “retry” loop so a single network hiccup doesn’t ruin your 1,000-article batch run?

Me: Yes, please provide me the whole script again but also adjust the timing between requests so that it will theoretically never even encounter the rate RPM limit and need to back off.

Gemini: Here is the updated script.

I have made three critical adjustments to ensure this runs smoothly for a 1,000-article batch:

Model Swap: Switched to gemini-1.5-flash to access the 1,500 RPD (Requests Per Day) quota.
Timing Adjustment: Increased the sleep buffer to 5 seconds. Since the API call itself takes ~1–2 seconds, your total loop time will be ~6–7 seconds. This keeps you at roughly 8–10 RPM, well below the 15 RPM danger zone.
Robust Retry Loop: I added a retry mechanism for transient network glitches (like a 500 error), but kept a “hard exit” for the daily quota limit so you don’t waste time trying when the tank is empty.

The Script (`articleizer.py`)

import os
import sys
import json
import re
import time
import argparse
import getpass
from pathlib import Path
from datetime import datetime
import google.generativeai as genai
import frontmatter  # Requires: pip install python-frontmatter

# --- CONFIGURATION ---
CONFIG_DIR = Path.home() / ".config" / "articleizer"
API_KEY_FILE = CONFIG_DIR / "api_key.txt"
TARGETS_FILE = CONFIG_DIR / "targets.json"

# CRITICAL: Using 1.5-flash gives 1,500 RPD (Requests Per Day) vs ~500 for 2.5
MODEL_NAME = 'gemini-1.5-flash' 

# Timing: 60s / 15 requests = 4s minimum.
# We use 5s sleep + execution time to stay safely around ~8-10 RPM.
SAFETY_SLEEP_SECONDS = 5

DEFAULT_TARGETS = {
    "1": {
        "name": "Local Project (Default)",
        "path": "./_posts"
    }
}

def load_targets():
    """Loads publishing targets from external config or falls back to default."""
    if TARGETS_FILE.exists():
        try:
            with open(TARGETS_FILE, 'r') as f:
                return json.load(f)
        except json.JSONDecodeError:
            print(f"⚠️ Warning: {TARGETS_FILE} is corrupt. Using defaults.")
    return DEFAULT_TARGETS

PROJECT_TARGETS = load_targets()

def get_api_key():
    """Gets API key from config or prompts user."""
    if API_KEY_FILE.is_file():
        return API_KEY_FILE.read_text().strip()
    
    print("Google API Key not found in config.")
    key = getpass.getpass("Enter your Google API Key: ")
    if key.strip():
        save = input("Save key? (y/n): ").lower()
        if save == 'y':
            CONFIG_DIR.mkdir(parents=True, exist_ok=True)
            API_KEY_FILE.write_text(key.strip())
            API_KEY_FILE.chmod(0o600)
    return key.strip()

def extract_metadata_and_content(file_path):
    """Reads markdown file, extracts YAML frontmatter and body."""
    try:
        post = frontmatter.load(file_path)
        return {
            "frontmatter": post.metadata,
            "content": post.content,
            "filename": file_path.stem
        }
    except Exception as e:
        print(f"❌ Error reading {file_path.name}: {e}")
        return None

def generate_context_json(article_data):
    """Calls Gemini to compress the article, with retry logic."""
    
    prompt = f"""
    You are a Knowledge Graph Architect. Your goal is to compress the provided technical article into a 'Holographic Shard'—a minimal JSON object that acts as a context pointer for a Retrieval Augmented Generation system.

    **Goal:** Fit maximum semantic meaning into approximately 200 tokens.

    **Input Data:**
    - Title: {article_data['frontmatter'].get('title', 'Unknown')}
    - Date: {article_data['frontmatter'].get('date', 'Unknown')}
    - Filename: {article_data['filename']}
    - Content: 
    {article_data['content'][:15000]} 

    **Instructions:**
    1. **Analyze:** Read the content. Look past the title. Find specific technologies, concepts, or "aha!" moments.
    2. **Extract Sub-topics ('sub'):** Identify 3-5 distinct, specific sub-topics.
    3. **Summarize ('s'):** Write a concise 1-2 sentence summary of the core thesis.
    4. **Keywords ('kw'):** Extract 3-5 high-value technical keywords.

    **Output Format:**
    Provide ONLY a valid JSON object.
    
    Schema:
    {{
      "id": "{article_data['filename']}",
      "d": "YYYY-MM-DD", 
      "t": "Article Title",
      "s": "Concise Summary",
      "sub": ["Subtopic 1", "Subtopic 2", "Subtopic 3"],
      "kw": ["Keyword1", "Keyword2"]
    }}
    """

    model = genai.GenerativeModel(MODEL_NAME)
    
    max_retries = 3
    attempt = 0

    while attempt < max_retries:
        try:
            response = model.generate_content(prompt)
            text = response.text.strip()
            
            # Clean up Markdown code blocks
            if text.startswith("```json"): text = text[7:]
            if text.startswith("```"): text = text[3:]
            if text.endswith("```"): text = text[:-3]
            
            return json.loads(text.strip())

        except Exception as e:
            error_msg = str(e)
            attempt += 1

            # 1. CRITICAL: Daily Quota Exceeded (ResourceExhausted)
            # No point retrying. Stop the script to protect account status.
            if "ResourceExhausted" in error_msg:
                print(f"\n🛑 Daily Quota (1,500 RPD) Exceeded. Script stopping safely.")
                sys.exit(0)

            # 2. Rate Limit (429) or Server Error (500/503)
            # We wait and retry.
            if "429" in error_msg or "500" in error_msg or "503" in error_msg:
                if attempt < max_retries:
                    wait_time = 30 * attempt # Linear backoff: 30s, 60s, 90s
                    print(f"  ⚠️ Transient error ({e}). Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    print(f"  ❌ Failed after {max_retries} attempts.")
                    return None
            else:
                # Other errors (syntax, etc) - Don't retry
                print(f"  ⚠️ Non-retriable error: {e}")
                return None
    return None

def main():
    parser = argparse.ArgumentParser(description="Generate AI context JSONs for markdown articles.")
    # Default limit set to 1000 to match your batch size
    parser.add_argument('--limit', type=int, default=1000, help="Max number of articles to process this run")
    parser.add_argument('--force', action='store_true', help="Overwrite existing context files")
    parser.add_argument('--dry-run', action='store_true', help="Show what would happen without calling API")
    args = parser.parse_args()

    # Target Selection
    print("Select target blog directory:")
    for key, target in PROJECT_TARGETS.items():
        print(f"  [{key}] {target['name']}")
    
    choice = input("Enter choice (1..): ").strip()
    if choice not in PROJECT_TARGETS:
        print("Invalid choice.")
        return

    posts_dir = Path(PROJECT_TARGETS[choice]['path']).resolve()
    context_dir = posts_dir / "_context"
    
    if not posts_dir.exists():
        print(f"❌ Error: Directory {posts_dir} does not exist.")
        return

    if not args.dry_run:
        context_dir.mkdir(exist_ok=True)

    if not args.dry_run:
        api_key = get_api_key()
        if not api_key: return
        genai.configure(api_key=api_key)

    all_posts = sorted(list(posts_dir.glob("*.md")), reverse=True)
    to_process = []

    print(f"\n🔍 Scanning {posts_dir}...")
    
    for post in all_posts:
        json_path = context_dir / f"{post.stem}.json"
        
        if not json_path.exists() or args.force:
            to_process.append(post)

    print(f"Found {len(all_posts)} articles.")
    print(f"📝 {len(to_process)} articles need context generation.")
    
    if args.limit and len(to_process) > args.limit:
        print(f"⚠️ Limiting processing to first {args.limit} items.")
        to_process = to_process[:args.limit]

    if not to_process:
        print("✅ All caught up! No new context to generate.")
        return

    print(f"\n🚀 Starting Contextualization using {MODEL_NAME}...")
    print(f"ℹ️  Pacing: ~{SAFETY_SLEEP_SECONDS}s per item to stay under 15 RPM limit.")
    
    count = 0
    start_time = time.time()

    for post in to_process:
        count += 1
        elapsed = time.time() - start_time
        # Simple ETA calculation
        avg_time = elapsed / count
        remaining = (len(to_process) - count) * avg_time
        eta_min = int(remaining // 60)
        
        print(f"[{count}/{len(to_process)}] (ETA: {eta_min}m) Processing: {post.name}...")
        
        if args.dry_run:
            continue

        data = extract_metadata_and_content(post)
        if not data: continue

        context_json = generate_context_json(data)
        
        if context_json:
            json_path = context_dir / f"{post.stem}.json"
            with open(json_path, 'w', encoding='utf-8') as f:
                json.dump(context_json, f, separators=(',', ':'))
            
            print(f"  ✅ Saved {json_path.name}")
            
            # THE SAFETY SLEEP
            # We sleep unconditionally on success to enforce the rate limit pace.
            time.sleep(SAFETY_SLEEP_SECONDS)
        else:
            print("  ❌ Failed to generate context.")
            # Even on failure, sleep briefly to avoid hammering if it's a tight loop failure
            time.sleep(2)

    print("\n✨ Batch complete.")

if __name__ == "__main__":
    main()

Estimated Run Time

With the 5-second sleep + ~2 seconds for generation:

Per article: ~7 seconds
1,000 articles: ~7,000 seconds
Total Time: Approx 1 hour 56 minutes.

This is a “set it and forget it” script. You can minimize the terminal and let it churn in the background.

Me: Okay, let’s drop it in location and highlight the differences:

(nix) 
[mike@nixos:~/repos/pipulate]$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
(nix) 
[mike@nixos:~/repos/pipulate]$ cd scripts/articles/
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ xv contextualizer.py 
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   contextualizer.py

no changes added to commit (use "git add" and/or "git commit -a")

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ git --no-pager diff
diff --git a/scripts/articles/contextualizer.py b/scripts/articles/contextualizer.py
index c23b057f..295dc2d2 100644
--- a/scripts/articles/contextualizer.py
+++ b/scripts/articles/contextualizer.py
@@ -15,10 +15,13 @@ CONFIG_DIR = Path.home() / ".config" / "articleizer"
 API_KEY_FILE = CONFIG_DIR / "api_key.txt"
 TARGETS_FILE = CONFIG_DIR / "targets.json"
 
-# Model to use (Flash is best for high-volume, low-cost processing)
-MODEL_NAME = 'gemini-2.5-flash' 
+# CRITICAL: Using 1.5-flash gives 1,500 RPD (Requests Per Day) vs ~500 for 2.5
+MODEL_NAME = 'gemini-1.5-flash' 
+
+# Timing: 60s / 15 requests = 4s minimum.
+# We use 5s sleep + execution time to stay safely around ~8-10 RPM.
+SAFETY_SLEEP_SECONDS = 5
 
-# Safe default
 DEFAULT_TARGETS = {
     "1": {
         "name": "Local Project (Default)",
@@ -67,34 +70,33 @@ def extract_metadata_and_content(file_path):
         return None
 
 def generate_context_json(article_data):
-    """Calls Gemini to compress the article into a context JSON object."""
+    """Calls Gemini to compress the article, with retry logic."""
     
-    # Construct the Prompt
     prompt = f"""
     You are a Knowledge Graph Architect. Your goal is to compress the provided technical article into a 'Holographic Shard'—a minimal JSON object that acts as a context pointer for a Retrieval Augmented Generation system.
 
-    **Goal:** Fit maximum semantic meaning into approximately 200 tokens (800 bytes).
+    **Goal:** Fit maximum semantic meaning into approximately 200 tokens.
 
     **Input Data:**
     - Title: {article_data['frontmatter'].get('title', 'Unknown')}
     - Date: {article_data['frontmatter'].get('date', 'Unknown')}
     - Filename: {article_data['filename']}
     - Content: 
-    {article_data['content'][:15000]}  # Truncate to avoid context limit issues, usually enough
+    {article_data['content'][:15000]} 
 
     **Instructions:**
-    1. **Analyze:** Read the content. Look past the title. Find specific technologies, concepts, or "aha!" moments buried in the text.
-    2. **Extract Sub-topics ('sub'):** Identify 3-5 distinct, specific sub-topics or "juicy" details that are NOT just the title re-worded. (e.g., "Fixing Pandas int/str errors", "The 'Chisel Strike' method").
+    1. **Analyze:** Read the content. Look past the title. Find specific technologies, concepts, or "aha!" moments.
+    2. **Extract Sub-topics ('sub'):** Identify 3-5 distinct, specific sub-topics.
     3. **Summarize ('s'):** Write a concise 1-2 sentence summary of the core thesis.
-    4. **Keywords ('kw'):** Extract 3-5 high-value technical keywords (e.g., "NixOS", "HTMX", "Sovereignty").
+    4. **Keywords ('kw'):** Extract 3-5 high-value technical keywords.
 
     **Output Format:**
-    Provide ONLY a valid JSON object. No markdown formatting around it if possible, but I will parse it out.
+    Provide ONLY a valid JSON object.
     
     Schema:
     {{
       "id": "{article_data['filename']}",
-      "d": "YYYY-MM-DD",  // Extract from filename or frontmatter
+      "d": "YYYY-MM-DD", 
       "t": "Article Title",
       "s": "Concise Summary",
       "sub": ["Subtopic 1", "Subtopic 2", "Subtopic 3"],
@@ -104,32 +106,51 @@ def generate_context_json(article_data):
 
     model = genai.GenerativeModel(MODEL_NAME)
     
-    try:
-        response = model.generate_content(prompt)
-        text = response.text.strip()
-        
-        # Clean up Markdown code blocks if present
-        if text.startswith("```json"):
-            text = text[7:]
-        if text.startswith("```"):
-            text = text[3:]
-        if text.endswith("```"):
-            text = text[:-3]
+    max_retries = 3
+    attempt = 0
+
+    while attempt < max_retries:
+        try:
+            response = model.generate_content(prompt)
+            text = response.text.strip()
             
-        return json.loads(text.strip())
-    except Exception as e:
-        error_msg = str(e)
-        # Check for Rate Limit (429) or ResourceExhausted errors
-        if "429" in error_msg or "ResourceExhausted" in error_msg:
-            print(f"\n🛑 Quota Limit Reached (API 429). Exiting script immediately.")
-            sys.exit(0) # HARD EXIT
-        else:
-            print(f"  ⚠️ AI Generation failed: {e}")
-        return None
+            # Clean up Markdown code blocks
+            if text.startswith("```json"): text = text[7:]
+            if text.startswith("```"): text = text[3:]
+            if text.endswith("```"): text = text[:-3]
+            
+            return json.loads(text.strip())
+
+        except Exception as e:
+            error_msg = str(e)
+            attempt += 1
+
+            # 1. CRITICAL: Daily Quota Exceeded (ResourceExhausted)
+            # No point retrying. Stop the script to protect account status.
+            if "ResourceExhausted" in error_msg:
+                print(f"\n🛑 Daily Quota (1,500 RPD) Exceeded. Script stopping safely.")
+                sys.exit(0)
+
+            # 2. Rate Limit (429) or Server Error (500/503)
+            # We wait and retry.
+            if "429" in error_msg or "500" in error_msg or "503" in error_msg:
+                if attempt < max_retries:
+                    wait_time = 30 * attempt # Linear backoff: 30s, 60s, 90s
+                    print(f"  ⚠️ Transient error ({e}). Retrying in {wait_time}s...")
+                    time.sleep(wait_time)
+                else:
+                    print(f"  ❌ Failed after {max_retries} attempts.")
+                    return None
+            else:
+                # Other errors (syntax, etc) - Don't retry
+                print(f"  ⚠️ Non-retriable error: {e}")
+                return None
+    return None
 
 def main():
     parser = argparse.ArgumentParser(description="Generate AI context JSONs for markdown articles.")
-    parser.add_argument('--limit', type=int, default=50, help="Max number of articles to process this run (default: 50)")
+    # Default limit set to 1000 to match your batch size
+    parser.add_argument('--limit', type=int, default=1000, help="Max number of articles to process this run")
     parser.add_argument('--force', action='store_true', help="Overwrite existing context files")
     parser.add_argument('--dry-run', action='store_true', help="Show what would happen without calling API")
     args = parser.parse_args()
@@ -151,18 +172,15 @@ def main():
         print(f"❌ Error: Directory {posts_dir} does not exist.")
         return
 
-    # Ensure context directory exists
     if not args.dry_run:
         context_dir.mkdir(exist_ok=True)
 
-    # 1. API Setup
     if not args.dry_run:
         api_key = get_api_key()
         if not api_key: return
         genai.configure(api_key=api_key)
 
-    # 2. File Discovery & Filtering
-    all_posts = sorted(list(posts_dir.glob("*.md")), reverse=True) # Newest first
+    all_posts = sorted(list(posts_dir.glob("*.md")), reverse=True)
     to_process = []
 
     print(f"\n🔍 Scanning {posts_dir}...")
@@ -184,13 +202,21 @@ def main():
         print("✅ All caught up! No new context to generate.")
         return
 
-    # 3. Processing Loop
-    print("\n🚀 Starting Contextualization...")
+    print(f"\n🚀 Starting Contextualization using {MODEL_NAME}...")
+    print(f"ℹ️  Pacing: ~{SAFETY_SLEEP_SECONDS}s per item to stay under 15 RPM limit.")
     
     count = 0
+    start_time = time.time()
+
     for post in to_process:
         count += 1
-        print(f"[{count}/{len(to_process)}] Processing: {post.name}...")
+        elapsed = time.time() - start_time
+        # Simple ETA calculation
+        avg_time = elapsed / count
+        remaining = (len(to_process) - count) * avg_time
+        eta_min = int(remaining // 60)
+        
+        print(f"[{count}/{len(to_process)}] (ETA: {eta_min}m) Processing: {post.name}...")
         
         if args.dry_run:
             continue
@@ -198,27 +224,24 @@ def main():
         data = extract_metadata_and_content(post)
         if not data: continue
 
-        # Generate JSON
         context_json = generate_context_json(data)
         
         if context_json:
-            # Save
             json_path = context_dir / f"{post.stem}.json"
             with open(json_path, 'w', encoding='utf-8') as f:
-                # Minify JSON to save bytes (separators removes whitespace)
                 json.dump(context_json, f, separators=(',', ':'))
             
             print(f"  ✅ Saved {json_path.name}")
             
-            # Rate limiting: Flash Free Tier is ~15 RPM. 
-            # 60s / 15 = 4s. We use 5s to be safe.
-            time.sleep(5)
+            # THE SAFETY SLEEP
+            # We sleep unconditionally on success to enforce the rate limit pace.
+            time.sleep(SAFETY_SLEEP_SECONDS)
         else:
             print("  ❌ Failed to generate context.")
-            # Even on failure, sleep a bit to avoid hammering
+            # Even on failure, sleep briefly to avoid hammering if it's a tight loop failure
             time.sleep(2)
 
     print("\n✨ Batch complete.")
 
 if __name__ == "__main__":
-    main()
+    main()
\ No newline at end of file
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$

And I test it with 1 article:

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 1
Select target blog directory:
  [1] MikeLev.in (Public)
  [2] Pipulate.com (Future)
Enter choice (1..): 1

🔍 Scanning /home/mike/repos/MikeLev.in/_posts...
Found 718 articles.
📝 644 articles need context generation.
⚠️ Limiting processing to first 1 items.

🚀 Starting Contextualization using gemini-1.5-flash...
ℹ️  Pacing: ~5s per item to stay under 15 RPM limit.
[1/1] (ETA: 0m) Processing: 2025-12-10-the-ai-ready-web-building-landing-strips-for-intelligent-agents.md...
  ⚠️ Non-retriable error: 404 models/gemini-1.5-flash is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.
  ❌ Failed to generate context.

✨ Batch complete.
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$

Oops!

Gemini: That 404 error suggests we hit a “version trap.”

It appears Gemini 1.5 Flash may have been deprecated or fully retired around September 2025 (replaced by the 2.5 series), or the specific alias gemini-1.5-flash was removed in favor of a versioned string (e.g., gemini-1.5-flash-001).

Instead of guessing, let’s use a tiny “diagnostic script” to see exactly what models your API key has access to right now. This is the only way to be 100% sure of the correct string.

Step 1: Run this “List Models” Diagnostic

Save this as list_models.py and run it. It will print the exact strings you are allowed to use.

import os
import getpass
import google.generativeai as genai
from pathlib import Path

# Load API Key using the same logic as your main script
CONFIG_DIR = Path.home() / ".config" / "articleizer"
API_KEY_FILE = CONFIG_DIR / "api_key.txt"

if API_KEY_FILE.exists():
    api_key = API_KEY_FILE.read_text().strip()
else:
    api_key = getpass.getpass("Enter API Key: ")

genai.configure(api_key=api_key)

print("🔍 querying available models...\n")
try:
    for m in genai.list_models():
        if 'generateContent' in m.supported_generation_methods:
            print(f"✅ {m.name}")
except Exception as e:
    print(f"❌ Error listing models: {e}")

Step 2: The likely fix (Switch to 2.5)

If the list above does not show gemini-1.5-flash, you must use gemini-2.5-flash.

However, this introduces a new constraint. As per your original table, 2.5 Flash has a stricter limit of ~500 Requests Per Day (RPD) and 10 RPM.

To handle this, you need to modify contextualizer.py with these three changes:

Update Model Name:

MODEL_NAME = 'gemini-2.5-flash' # Or whatever list_models.py revealed

Slow Down (Crucial): 2.5 Flash has a limit of 10 RPM. A 5-second sleep is too fast (that’s ~12 RPM).
```
# 60s / 10 requests = 6s. Use 8s to be safe.
SAFETY_SLEEP_SECONDS = 8 
```
Batch It: Since you have 1,000 articles and a 500 RPD limit, you cannot do this in one run. You must do it over 2 days.

Run the script today with:
```
python contextualizer.py --limit 450
```
And tomorrow (after the quota resets at midnight PT):
```
python contextualizer.py --limit 450
```

Note: The script is designed to skip files that already exist (if not json_path.exists()), so running it a second time is safe—it will just pick up where it left off.

Me: Okay, I make the model-listing script and run it:

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ xv list_models.py
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ python list_models.py 
🔍 querying available models...

✅ models/gemini-2.5-flash
✅ models/gemini-2.5-pro
✅ models/gemini-2.0-flash-exp
✅ models/gemini-2.0-flash
✅ models/gemini-2.0-flash-001
✅ models/gemini-2.0-flash-exp-image-generation
✅ models/gemini-2.0-flash-lite-001
✅ models/gemini-2.0-flash-lite
✅ models/gemini-2.0-flash-lite-preview-02-05
✅ models/gemini-2.0-flash-lite-preview
✅ models/gemini-exp-1206
✅ models/gemini-2.5-flash-preview-tts
✅ models/gemini-2.5-pro-preview-tts
✅ models/gemma-3-1b-it
✅ models/gemma-3-4b-it
✅ models/gemma-3-12b-it
✅ models/gemma-3-27b-it
✅ models/gemma-3n-e4b-it
✅ models/gemma-3n-e2b-it
✅ models/gemini-flash-latest
✅ models/gemini-flash-lite-latest
✅ models/gemini-pro-latest
✅ models/gemini-2.5-flash-lite
✅ models/gemini-2.5-flash-image-preview
✅ models/gemini-2.5-flash-image
✅ models/gemini-2.5-flash-preview-09-2025
✅ models/gemini-2.5-flash-lite-preview-09-2025
✅ models/gemini-3-pro-preview
✅ models/gemini-3-pro-image-preview
✅ models/nano-banana-pro-preview
✅ models/gemini-robotics-er-1.5-preview
✅ models/gemini-2.5-computer-use-preview-10-2025
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$

…and I see that 1.5 has been removed. Is there any way to programatically tell what the various limits per model are for the free tier?

Gemini: Short Answer: No, unfortunately. The list_models() function returns technical specifications (context window size, input/output limits) but does not return business quotas (RPM/RPD), because those depend on your billing status (Free vs. Paid), not the model itself.

The Good News: Your list contains a “hidden” savior for bulk processing.

From `1.5-flash` to `2.5-flash-lite`: Navigating Deprecation and Hidden Limits

You have access to models/gemini-2.5-flash-lite.

In the Google ecosystem, “Lite” models are the direct spiritual successors to the high-volume capabilities of the old 1.5 Flash. While 2.5 Flash (standard) is “smarter” and thus has lower rate limits (often ~500 RPD on free tier), 2.5 Flash Lite is optimized for speed and cost, which typically comes with the higher 1,500 RPD limit you are looking for.

Recommended Fix: Use “Lite”

Update your script to target the Lite model. This is your best bet for finishing the batch in a single day.

# SWAP THIS LINE IN YOUR SCRIPT
MODEL_NAME = 'gemini-2.5-flash-lite' 

# Lite models are fast, but let's keep a 4-5s buffer to be safe against RPM spikes
SAFETY_SLEEP_SECONDS = 4

Strategic Plan (The “Shoot and See” Approach)

Since we cannot query the limit, we will discover it by hitting it. Your script is already robust enough to handle this:

Set MODEL_NAME = 'gemini-2.5-flash-lite'.
Set --limit 1000 (Try for the full run).
Run it.

Scenario A (Most Likely): The Lite model has the 1,500 RPD limit. The script finishes all 1,000 articles in ~90 minutes. Success.

Scenario B: The Lite model has a 500 RPD limit.

The script processes 500 articles.
It hits the ResourceExhausted error.
It prints 🛑 Daily Quota Exceeded and exits cleanly.
Result: You have 500 JSONs saved. You wait 24 hours (quota resets at midnight Pacific Time), run the exact same command again, and it finishes the remaining 500.

You don’t need to change the code logic, just the model name. The “state” is saved on your filesystem (the existing .json files), so there is no risk in hitting the wall.

Me: Okay, I did that and have it processing. I want to support an argument to choose which API token that I’m using from a file in the same directory structure location but with a different filename and format. Currently it’s a file named api_key.txt but I want to switch to a JSON format that supports multiple named keys so that I can choose the name with a --key or -k argument. So:

[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 500

…becomes:

[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 500 -k personal

…or:

[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 500 -k work

Also when I tried the existing code:

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 500
Select target blog directory:
  [1] MikeLev.in (Public)
  [2] Pipulate.com (Future)
Enter choice (1..): 1

🔍 Scanning /home/mike/repos/MikeLev.in/_posts...
Found 718 articles.
📝 644 articles need context generation.
⚠️ Limiting processing to first 500 items.

🚀 Starting Contextualization using gemini-2.5-flash-lite...
ℹ️  Pacing: ~5s per item to stay under 15 RPM limit.
[1/500] (ETA: 0m) Processing: 2025-12-10-the-ai-ready-web-building-landing-strips-for-intelligent-agents.md...
  ✅ Saved 2025-12-10-the-ai-ready-web-building-landing-strips-for-intelligent-agents.json
[2/500] (ETA: 32m) Processing: 2025-11-01-no-gooey-video-tech-gnosis-craftsmanship.md...
  ✅ Saved 2025-11-01-no-gooey-video-tech-gnosis-craftsmanship.json
[3/500] (ETA: 41m) Processing: 2025-11-01-jiu-jitsu-keyboard-flow-ai.md...
  ✅ Saved 2025-11-01-jiu-jitsu-keyboard-flow-ai.json
[4/500] (ETA: 44m) Processing: 2025-11-01-browser-acetate-no-gooey-9x16-screen-recording.md...
  ✅ Saved 2025-11-01-browser-acetate-no-gooey-9x16-screen-recording.json
[5/500] (ETA: 49m) Processing: 2025-11-01-articulate-ape-blueprint-scaffolding-no-gooey-video-editing-nix.md...
  ✅ Saved 2025-11-01-articulate-ape-blueprint-scaffolding-no-gooey-video-editing-nix.json
[6/500] (ETA: 50m) Processing: 2025-10-31-novideo-pandas-post-analog-video-age-40-year-cycle.md...
  ✅ Saved 2025-10-31-novideo-pandas-post-analog-video-age-40-year-cycle.json
[7/500] (ETA: 51m) Processing: 2025-10-31-mastering-the-tricks-novideo-ai-multiplatform-fluency.md...
  ✅ Saved 2025-10-31-mastering-the-tricks-novideo-ai-multiplatform-fluency.json
[8/500] (ETA: 51m) Processing: 2025-10-31-80-20-nix-local-first-workflow-vim-macros.md...
  ✅ Saved 2025-10-31-80-20-nix-local-first-workflow-vim-macros.json
[9/500] (ETA: 52m) Processing: 2025-10-30-python-dependency-dilemma-pip-compile-fix.md...
  ✅ Saved 2025-10-30-python-dependency-dilemma-pip-compile-fix.json
[10/500] (ETA: 53m) Processing: 2025-10-30-fractal-unfurling-ai-video-workflows.md...
  ✅ Saved 2025-10-30-fractal-unfurling-ai-video-workflows.json
[11/500] (ETA: 53m) Processing: 2025-10-30-articulate-ape-blueprint-cognition-ai-chisel-strikes.md...
  ✅ Saved 2025-10-30-articulate-ape-blueprint-cognition-ai-chisel-strikes.json
[12/500] (ETA: 54m) Processing: 2025-10-29-the-human-way-of-mastery-automaticity-fluency-and-ai-learning-divide.md...
  ✅ Saved 2025-10-29-the-human-way-of-mastery-automaticity-fluency-and-ai-learning-divide.json
[13/500] (ETA: 53m) Processing: 2025-10-29-the-great-schism-of-the-ai-age-choosing-enhancement-over-atrophy.md...
  ✅ Saved 2025-10-29-the-great-schism-of-the-ai-age-choosing-enhancement-over-atrophy.json
[14/500] (ETA: 54m) Processing: 2025-10-29-engineered-optimism-ai-ark-nix-pipulate.md...
  ✅ Saved 2025-10-29-engineered-optimism-ai-ark-nix-pipulate.json
[15/500] (ETA: 53m) Processing: 2025-10-29-aie-python-video-orchestrating-creativity.md...
  ✅ Saved 2025-10-29-aie-python-video-orchestrating-creativity.json
[16/500] (ETA: 53m) Processing: 2025-10-29-ai-first-web-invisible-fubar-pipulate.md...
  ✅ Saved 2025-10-29-ai-first-web-invisible-fubar-pipulate.json
[17/500] (ETA: 54m) Processing: 2025-10-28-pipulate-smart-prompt-syncing-orchestrating-ai-workflows-with-nbup.md...
  ✅ Saved 2025-10-28-pipulate-smart-prompt-syncing-orchestrating-ai-workflows-with-nbup.json
[18/500] (ETA: 53m) Processing: 2025-10-28-digital-sovereignty-pandas-nix-ai-blueprint.md...
  ✅ Saved 2025-10-28-digital-sovereignty-pandas-nix-ai-blueprint.json
[19/500] (ETA: 53m) Processing: 2025-10-28-digital-jiu-jitsu-foundational-skills-ai-web-analysis.md...
  ✅ Saved 2025-10-28-digital-jiu-jitsu-foundational-skills-ai-web-analysis.json
[20/500] (ETA: 53m) Processing: 2025-10-28-debugging-rich-html-export-performance-theming-ai-development.md...
  ✅ Saved 2025-10-28-debugging-rich-html-export-performance-theming-ai-development.json
[21/500] (ETA: 53m) Processing: 2025-10-28-dataframe-chisel-strikes-precision-data-ai-audits.md...
  ⚠️ Transient error (429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite
Please retry in 35.539707846s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash-lite"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, retry_delay {
  seconds: 35
}
]). Retrying in 30s...
  ⚠️ Transient error (429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite
Please retry in 5.432922312s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash-lite"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, retry_delay {
  seconds: 5
}
]). Retrying in 60s...
^CTraceback (most recent call last):
  File "/home/mike/repos/pipulate/scripts/articles/contextualizer.py", line 114, in generate_context_json
    response = model.generate_content(prompt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mike/repos/pipulate/.venv/lib/python3.12/site-packages/google/generativeai/generative_models.py", line 248, in generate_content
    response = self._client.generate_content(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mike/repos/pipulate/.venv/lib/python3.12/site-packages/google/ai/generativelanguage_v1beta/services/generative_service/client.py", line 566, in generate_content
    response = rpc(
               ^^^^
  File "/home/mike/repos/pipulate/.venv/lib/python3.12/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mike/repos/pipulate/.venv/lib/python3.12/site-packages/google/api_core/retry/retry_unary.py", line 294, in retry_wrapped_func
    return retry_target(
           ^^^^^^^^^^^^^
  File "/home/mike/repos/pipulate/.venv/lib/python3.12/site-packages/google/api_core/retry/retry_unary.py", line 156, in retry_target
    next_sleep = _retry_error_helper(
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/mike/repos/pipulate/.venv/lib/python3.12/site-packages/google/api_core/retry/retry_base.py", line 214, in _retry_error_helper
    raise final_exc from source_exc
  File "/home/mike/repos/pipulate/.venv/lib/python3.12/site-packages/google/api_core/retry/retry_unary.py", line 147, in retry_target
    result = target()
             ^^^^^^^^
  File "/home/mike/repos/pipulate/.venv/lib/python3.12/site-packages/google/api_core/timeout.py", line 130, in func_with_timeout
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mike/repos/pipulate/.venv/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 77, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ResourceExhausted: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite
Please retry in 5.432922312s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash-lite"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, retry_delay {
  seconds: 5
}
]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mike/repos/pipulate/scripts/articles/contextualizer.py", line 247, in <module>
    main()
  File "/home/mike/repos/pipulate/scripts/articles/contextualizer.py", line 227, in main
    context_json = generate_context_json(data)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mike/repos/pipulate/scripts/articles/contextualizer.py", line 140, in generate_context_json
    time.sleep(wait_time)
KeyboardInterrupt

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ 

It should never go into a backoff loop or it will be spinning its wheels all day if we have the rules wrong and there’s a lower RPD than we think. So make it immediately exit with the most useful information we have access to like what model was used and what the current run of the script profile was in terms of RPM, TPM, and RPD of the current run and whatever the error told us about which we limit we ran into that cause the stop. Here’s the current code:

import os
import sys
import json
import re
import time
import argparse
import getpass
from pathlib import Path
from datetime import datetime
import google.generativeai as genai
import frontmatter  # Requires: pip install python-frontmatter

# --- CONFIGURATION ---
CONFIG_DIR = Path.home() / ".config" / "articleizer"
API_KEY_FILE = CONFIG_DIR / "api_key.txt"
TARGETS_FILE = CONFIG_DIR / "targets.json"

# CRITICAL: Using 1.5-flash gives 1,500 RPD (Requests Per Day) vs ~500 for 2.5
MODEL_NAME = 'gemini-2.5-flash-lite' 

# Timing: 60s / 15 requests = 4s minimum.
# We use 5s sleep + execution time to stay safely around ~8-10 RPM.
SAFETY_SLEEP_SECONDS = 5

DEFAULT_TARGETS = {
    "1": {
        "name": "Local Project (Default)",
        "path": "./_posts"
    }
}

def load_targets():
    """Loads publishing targets from external config or falls back to default."""
    if TARGETS_FILE.exists():
        try:
            with open(TARGETS_FILE, 'r') as f:
                return json.load(f)
        except json.JSONDecodeError:
            print(f"⚠️ Warning: {TARGETS_FILE} is corrupt. Using defaults.")
    return DEFAULT_TARGETS

PROJECT_TARGETS = load_targets()

def get_api_key():
    """Gets API key from config or prompts user."""
    if API_KEY_FILE.is_file():
        return API_KEY_FILE.read_text().strip()
    
    print("Google API Key not found in config.")
    key = getpass.getpass("Enter your Google API Key: ")
    if key.strip():
        save = input("Save key? (y/n): ").lower()
        if save == 'y':
            CONFIG_DIR.mkdir(parents=True, exist_ok=True)
            API_KEY_FILE.write_text(key.strip())
            API_KEY_FILE.chmod(0o600)
    return key.strip()

def extract_metadata_and_content(file_path):
    """Reads markdown file, extracts YAML frontmatter and body."""
    try:
        post = frontmatter.load(file_path)
        return {
            "frontmatter": post.metadata,
            "content": post.content,
            "filename": file_path.stem
        }
    except Exception as e:
        print(f"❌ Error reading {file_path.name}: {e}")
        return None

def generate_context_json(article_data):
    """Calls Gemini to compress the article, with retry logic."""
    
    prompt = f"""
    You are a Knowledge Graph Architect. Your goal is to compress the provided technical article into a 'Holographic Shard'—a minimal JSON object that acts as a context pointer for a Retrieval Augmented Generation system.

    **Goal:** Fit maximum semantic meaning into approximately 200 tokens.

    **Input Data:**
    - Title: {article_data['frontmatter'].get('title', 'Unknown')}
    - Date: {article_data['frontmatter'].get('date', 'Unknown')}
    - Filename: {article_data['filename']}
    - Content: 
    {article_data['content'][:15000]} 

    **Instructions:**
    1. **Analyze:** Read the content. Look past the title. Find specific technologies, concepts, or "aha!" moments.
    2. **Extract Sub-topics ('sub'):** Identify 3-5 distinct, specific sub-topics.
    3. **Summarize ('s'):** Write a concise 1-2 sentence summary of the core thesis.
    4. **Keywords ('kw'):** Extract 3-5 high-value technical keywords.

    **Output Format:**
    Provide ONLY a valid JSON object.
    
    Schema:
    {{
      "id": "{article_data['filename']}",
      "d": "YYYY-MM-DD", 
      "t": "Article Title",
      "s": "Concise Summary",
      "sub": ["Subtopic 1", "Subtopic 2", "Subtopic 3"],
      "kw": ["Keyword1", "Keyword2"]
    }}
    """

    model = genai.GenerativeModel(MODEL_NAME)
    
    max_retries = 3
    attempt = 0

    while attempt < max_retries:
        try:
            response = model.generate_content(prompt)
            text = response.text.strip()
            
            # Clean up Markdown code blocks
            if text.startswith("```json"): text = text[7:]
            if text.startswith("```"): text = text[3:]
            if text.endswith("```"): text = text[:-3]
            
            return json.loads(text.strip())

        except Exception as e:
            error_msg = str(e)
            attempt += 1

            # 1. CRITICAL: Daily Quota Exceeded (ResourceExhausted)
            # No point retrying. Stop the script to protect account status.
            if "ResourceExhausted" in error_msg:
                print(f"\n🛑 Daily Quota (1,500 RPD) Exceeded. Script stopping safely.")
                sys.exit(0)

            # 2. Rate Limit (429) or Server Error (500/503)
            # We wait and retry.
            if "429" in error_msg or "500" in error_msg or "503" in error_msg:
                if attempt < max_retries:
                    wait_time = 30 * attempt # Linear backoff: 30s, 60s, 90s
                    print(f"  ⚠️ Transient error ({e}). Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    print(f"  ❌ Failed after {max_retries} attempts.")
                    return None
            else:
                # Other errors (syntax, etc) - Don't retry
                print(f"  ⚠️ Non-retriable error: {e}")
                return None
    return None

def main():
    parser = argparse.ArgumentParser(description="Generate AI context JSONs for markdown articles.")
    # Default limit set to 1000 to match your batch size
    parser.add_argument('--limit', type=int, default=1000, help="Max number of articles to process this run")
    parser.add_argument('--force', action='store_true', help="Overwrite existing context files")
    parser.add_argument('--dry-run', action='store_true', help="Show what would happen without calling API")
    args = parser.parse_args()

    # Target Selection
    print("Select target blog directory:")
    for key, target in PROJECT_TARGETS.items():
        print(f"  [{key}] {target['name']}")
    
    choice = input("Enter choice (1..): ").strip()
    if choice not in PROJECT_TARGETS:
        print("Invalid choice.")
        return

    posts_dir = Path(PROJECT_TARGETS[choice]['path']).resolve()
    context_dir = posts_dir / "_context"
    
    if not posts_dir.exists():
        print(f"❌ Error: Directory {posts_dir} does not exist.")
        return

    if not args.dry_run:
        context_dir.mkdir(exist_ok=True)

    if not args.dry_run:
        api_key = get_api_key()
        if not api_key: return
        genai.configure(api_key=api_key)

    all_posts = sorted(list(posts_dir.glob("*.md")), reverse=True)
    to_process = []

    print(f"\n🔍 Scanning {posts_dir}...")
    
    for post in all_posts:
        json_path = context_dir / f"{post.stem}.json"
        
        if not json_path.exists() or args.force:
            to_process.append(post)

    print(f"Found {len(all_posts)} articles.")
    print(f"📝 {len(to_process)} articles need context generation.")
    
    if args.limit and len(to_process) > args.limit:
        print(f"⚠️ Limiting processing to first {args.limit} items.")
        to_process = to_process[:args.limit]

    if not to_process:
        print("✅ All caught up! No new context to generate.")
        return

    print(f"\n🚀 Starting Contextualization using {MODEL_NAME}...")
    print(f"ℹ️  Pacing: ~{SAFETY_SLEEP_SECONDS}s per item to stay under 15 RPM limit.")
    
    count = 0
    start_time = time.time()

    for post in to_process:
        count += 1
        elapsed = time.time() - start_time
        # Simple ETA calculation
        avg_time = elapsed / count
        remaining = (len(to_process) - count) * avg_time
        eta_min = int(remaining // 60)
        
        print(f"[{count}/{len(to_process)}] (ETA: {eta_min}m) Processing: {post.name}...")
        
        if args.dry_run:
            continue

        data = extract_metadata_and_content(post)
        if not data: continue

        context_json = generate_context_json(data)
        
        if context_json:
            json_path = context_dir / f"{post.stem}.json"
            with open(json_path, 'w', encoding='utf-8') as f:
                json.dump(context_json, f, separators=(',', ':'))
            
            print(f"  ✅ Saved {json_path.name}")
            
            # THE SAFETY SLEEP
            # We sleep unconditionally on success to enforce the rate limit pace.
            time.sleep(SAFETY_SLEEP_SECONDS)
        else:
            print("  ❌ Failed to generate context.")
            # Even on failure, sleep briefly to avoid hammering if it's a tight loop failure
            time.sleep(2)

    print("\n✨ Batch complete.")

if __name__ == "__main__":
    main()

Gemini: This updated script tackles both your new requirement (switching API keys) and your immediate problem (the script hanging in a backoff loop when the limit is clearly hard).

Architecting Dynamic API Key Management

Key Changes

Strict Exit Logic: The retry loop is now smarter. If the error message explicitly mentions “Quota exceeded” (RPD) or “Rate limit” (RPM), it parses that message. If it’s a Daily Quota (RPD) hit, it exits immediately. It only retries on truly transient errors (500s) or “Rate Limit” (RPM) if it hasn’t happened too many times in a row.
API Key Management: It now looks for keys.json instead of api_key.txt.
CLI Argument: Added -k/--key. If you don’t provide it, it defaults to a key named default.

The Updated Script (`contextualizer.py`)

import os
import sys
import json
import time
import argparse
import getpass
from pathlib import Path
import google.generativeai as genai
import frontmatter  # Requires: pip install python-frontmatter

# --- CONFIGURATION ---
CONFIG_DIR = Path.home() / ".config" / "articleizer"
KEYS_FILE = CONFIG_DIR / "keys.json"
TARGETS_FILE = CONFIG_DIR / "targets.json"

# MODEL CONFIGURATION
MODEL_NAME = 'gemini-2.5-flash-lite' 
SAFETY_SLEEP_SECONDS = 5

DEFAULT_TARGETS = {
    "1": {
        "name": "Local Project (Default)",
        "path": "./_posts"
    }
}

def load_targets():
    """Loads publishing targets from external config or falls back to default."""
    if TARGETS_FILE.exists():
        try:
            with open(TARGETS_FILE, 'r') as f:
                return json.load(f)
        except json.JSONDecodeError:
            print(f"⚠️ Warning: {TARGETS_FILE} is corrupt. Using defaults.")
    return DEFAULT_TARGETS

PROJECT_TARGETS = load_targets()

def get_api_key(key_name="default"):
    """Gets a specific named API key from keys.json."""
    
    # 1. Load existing keys
    keys = {}
    if KEYS_FILE.exists():
        try:
            with open(KEYS_FILE, 'r') as f:
                keys = json.load(f)
        except json.JSONDecodeError:
            print(f"❌ Error: {KEYS_FILE} is corrupt.")
            sys.exit(1)

    # 2. Check if requested key exists
    if key_name in keys:
        return keys[key_name]
    
    # 3. If not found, prompt to add it
    print(f"⚠️ API Key '{key_name}' not found in {KEYS_FILE}.")
    new_key = getpass.getpass(f"Enter Google API Key for '{key_name}': ").strip()
    
    if new_key:
        save = input(f"Save key '{key_name}' to config? (y/n): ").lower()
        if save == 'y':
            keys[key_name] = new_key
            CONFIG_DIR.mkdir(parents=True, exist_ok=True)
            with open(KEYS_FILE, 'w') as f:
                json.dump(keys, f, indent=2)
            KEYS_FILE.chmod(0o600)
            print(f"✅ Key '{key_name}' saved.")
        return new_key
    else:
        print("❌ No key provided. Exiting.")
        sys.exit(1)

def extract_metadata_and_content(file_path):
    """Reads markdown file, extracts YAML frontmatter and body."""
    try:
        post = frontmatter.load(file_path)
        return {
            "frontmatter": post.metadata,
            "content": post.content,
            "filename": file_path.stem
        }
    except Exception as e:
        print(f"❌ Error reading {file_path.name}: {e}")
        return None

def generate_context_json(article_data):
    """Calls Gemini to compress the article, with strict quota checking."""
    
    prompt = f"""
    You are a Knowledge Graph Architect. Your goal is to compress the provided technical article into a 'Holographic Shard'—a minimal JSON object that acts as a context pointer for a Retrieval Augmented Generation system.

    **Goal:** Fit maximum semantic meaning into approximately 200 tokens.

    **Input Data:**
    - Title: {article_data['frontmatter'].get('title', 'Unknown')}
    - Date: {article_data['frontmatter'].get('date', 'Unknown')}
    - Filename: {article_data['filename']}
    - Content: 
    {article_data['content'][:15000]} 

    **Instructions:**
    1. **Analyze:** Read the content. Look past the title. Find specific technologies, concepts, or "aha!" moments.
    2. **Extract Sub-topics ('sub'):** Identify 3-5 distinct, specific sub-topics.
    3. **Summarize ('s'):** Write a concise 1-2 sentence summary of the core thesis.
    4. **Keywords ('kw'):** Extract 3-5 high-value technical keywords.

    **Output Format:**
    Provide ONLY a valid JSON object.
    
    Schema:
    {{
      "id": "{article_data['filename']}",
      "d": "YYYY-MM-DD", 
      "t": "Article Title",
      "s": "Concise Summary",
      "sub": ["Subtopic 1", "Subtopic 2", "Subtopic 3"],
      "kw": ["Keyword1", "Keyword2"]
    }}
    """

    model = genai.GenerativeModel(MODEL_NAME)
    
    max_retries = 3
    attempt = 0

    while attempt < max_retries:
        try:
            response = model.generate_content(prompt)
            text = response.text.strip()
            
            # Clean up Markdown code blocks
            if text.startswith("```json"): text = text[7:]
            if text.startswith("```"): text = text[3:]
            if text.endswith("```"): text = text[:-3]
            
            return json.loads(text.strip())

        except Exception as e:
            error_msg = str(e)
            attempt += 1
            
            # --- INTELLIGENT ERROR HANDLING ---

            # Case A: DAILY QUOTA (RPD) - Hard Stop
            # 429 often serves both, so we look for specific text or assume worst case if persistent
            if "ResourceExhausted" in error_msg or "quota" in error_msg.lower():
                print(f"\n🛑 HARD STOP: Quota Exceeded.")
                print(f"   Model: {MODEL_NAME}")
                print(f"   Error: {error_msg.split('Violations')[0] if 'Violations' in error_msg else error_msg[:200]}...")
                sys.exit(0) # Exit immediately, do not retry.

            # Case B: RATE LIMIT (RPM) or SERVER ERROR - Soft Retry
            if "429" in error_msg or "500" in error_msg or "503" in error_msg:
                if attempt < max_retries:
                    wait_time = 10 * attempt # Short backoff: 10s, 20s, 30s
                    print(f"  ⚠️ Transient error (RPM/Server). Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    print(f"  ❌ Failed after {max_retries} attempts.")
                    return None
            else:
                # Case C: Other Errors (Syntax, Auth, etc)
                print(f"  ⚠️ Non-retriable error: {e}")
                return None
    return None

def main():
    parser = argparse.ArgumentParser(description="Generate AI context JSONs for markdown articles.")
    parser.add_argument('--limit', type=int, default=1000, help="Max number of articles to process this run")
    parser.add_argument('--force', action='store_true', help="Overwrite existing context files")
    parser.add_argument('--dry-run', action='store_true', help="Show what would happen without calling API")
    
    # NEW ARGUMENT: Key Selection
    parser.add_argument('-k', '--key', type=str, default="default", help="Name of the API key to use from keys.json (default: 'default')")
    
    args = parser.parse_args()

    # Target Selection
    print("Select target blog directory:")
    for key, target in PROJECT_TARGETS.items():
        print(f"  [{key}] {target['name']}")
    
    choice = input("Enter choice (1..): ").strip()
    if choice not in PROJECT_TARGETS:
        print("Invalid choice.")
        return

    posts_dir = Path(PROJECT_TARGETS[choice]['path']).resolve()
    context_dir = posts_dir / "_context"
    
    if not posts_dir.exists():
        print(f"❌ Error: Directory {posts_dir} does not exist.")
        return

    if not args.dry_run:
        context_dir.mkdir(exist_ok=True)

    if not args.dry_run:
        # Load the specific key requested via CLI
        api_key = get_api_key(args.key)
        if not api_key: return
        genai.configure(api_key=api_key)
        print(f"🔑 Using API Key: '{args.key}'")

    all_posts = sorted(list(posts_dir.glob("*.md")), reverse=True)
    to_process = []

    print(f"\n🔍 Scanning {posts_dir}...")
    
    for post in all_posts:
        json_path = context_dir / f"{post.stem}.json"
        
        if not json_path.exists() or args.force:
            to_process.append(post)

    print(f"Found {len(all_posts)} articles.")
    print(f"📝 {len(to_process)} articles need context generation.")
    
    if args.limit and len(to_process) > args.limit:
        print(f"⚠️ Limiting processing to first {args.limit} items.")
        to_process = to_process[:args.limit]

    if not to_process:
        print("✅ All caught up! No new context to generate.")
        return

    print(f"\n🚀 Starting Contextualization using {MODEL_NAME}...")
    print(f"ℹ️  Pacing: ~{SAFETY_SLEEP_SECONDS}s per item to stay under RPM limit.")
    
    count = 0
    start_time = time.time()

    for post in to_process:
        count += 1
        elapsed = time.time() - start_time
        # Simple ETA calculation
        avg_time = elapsed / count
        remaining = (len(to_process) - count) * avg_time
        eta_min = int(remaining // 60)
        
        print(f"[{count}/{len(to_process)}] (ETA: {eta_min}m) Processing: {post.name}...")
        
        if args.dry_run:
            continue

        data = extract_metadata_and_content(post)
        if not data: continue

        context_json = generate_context_json(data)
        
        if context_json:
            json_path = context_dir / f"{post.stem}.json"
            with open(json_path, 'w', encoding='utf-8') as f:
                json.dump(context_json, f, separators=(',', ':'))
            
            print(f"  ✅ Saved {json_path.name}")
            
            # THE SAFETY SLEEP
            time.sleep(SAFETY_SLEEP_SECONDS)
        else:
            print("  ❌ Failed to generate context.")
            time.sleep(2)

    print("\n✨ Batch complete.")

if __name__ == "__main__":
    main()

Migration Step (One-Time)

Since we changed the config file from api_key.txt (raw string) to keys.json (JSON object), you’ll need to create that JSON file manually once, or let the script create it for you.

You can manually create ~/.config/articleizer/keys.json:

{
  "default": "YOUR_OLD_KEY_HERE",
  "personal": "ANOTHER_KEY_HERE",
  "work": "YET_ANOTHER_KEY"
}

Me: Okay, we put the new changes in place:

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ xv contextualizer.py 
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ git --no-pager diff
diff --git a/scripts/articles/contextualizer.py b/scripts/articles/contextualizer.py
index 1110041a..706e082d 100644
--- a/scripts/articles/contextualizer.py
+++ b/scripts/articles/contextualizer.py
@@ -1,25 +1,20 @@
 import os
 import sys
 import json
-import re
 import time
 import argparse
 import getpass
 from pathlib import Path
-from datetime import datetime
 import google.generativeai as genai
 import frontmatter  # Requires: pip install python-frontmatter
 
 # --- CONFIGURATION ---
 CONFIG_DIR = Path.home() / ".config" / "articleizer"
-API_KEY_FILE = CONFIG_DIR / "api_key.txt"
+KEYS_FILE = CONFIG_DIR / "keys.json"
 TARGETS_FILE = CONFIG_DIR / "targets.json"
 
-# CRITICAL: Using 1.5-flash gives 1,500 RPD (Requests Per Day) vs ~500 for 2.5
+# MODEL CONFIGURATION
 MODEL_NAME = 'gemini-2.5-flash-lite' 
-
-# Timing: 60s / 15 requests = 4s minimum.
-# We use 5s sleep + execution time to stay safely around ~8-10 RPM.
 SAFETY_SLEEP_SECONDS = 5
 
 DEFAULT_TARGETS = {
@@ -41,20 +36,40 @@ def load_targets():
 
 PROJECT_TARGETS = load_targets()
 
-def get_api_key():
-    """Gets API key from config or prompts user."""
-    if API_KEY_FILE.is_file():
-        return API_KEY_FILE.read_text().strip()
+def get_api_key(key_name="default"):
+    """Gets a specific named API key from keys.json."""
+    
+    # 1. Load existing keys
+    keys = {}
+    if KEYS_FILE.exists():
+        try:
+            with open(KEYS_FILE, 'r') as f:
+                keys = json.load(f)
+        except json.JSONDecodeError:
+            print(f"❌ Error: {KEYS_FILE} is corrupt.")
+            sys.exit(1)
+
+    # 2. Check if requested key exists
+    if key_name in keys:
+        return keys[key_name]
     
-    print("Google API Key not found in config.")
-    key = getpass.getpass("Enter your Google API Key: ")
-    if key.strip():
-        save = input("Save key? (y/n): ").lower()
+    # 3. If not found, prompt to add it
+    print(f"⚠️ API Key '{key_name}' not found in {KEYS_FILE}.")
+    new_key = getpass.getpass(f"Enter Google API Key for '{key_name}': ").strip()
+    
+    if new_key:
+        save = input(f"Save key '{key_name}' to config? (y/n): ").lower()
         if save == 'y':
+            keys[key_name] = new_key
             CONFIG_DIR.mkdir(parents=True, exist_ok=True)
-            API_KEY_FILE.write_text(key.strip())
-            API_KEY_FILE.chmod(0o600)
-    return key.strip()
+            with open(KEYS_FILE, 'w') as f:
+                json.dump(keys, f, indent=2)
+            KEYS_FILE.chmod(0o600)
+            print(f"✅ Key '{key_name}' saved.")
+        return new_key
+    else:
+        print("❌ No key provided. Exiting.")
+        sys.exit(1)
 
 def extract_metadata_and_content(file_path):
     """Reads markdown file, extracts YAML frontmatter and body."""
@@ -70,7 +85,7 @@ def extract_metadata_and_content(file_path):
         return None
 
 def generate_context_json(article_data):
-    """Calls Gemini to compress the article, with retry logic."""
+    """Calls Gemini to compress the article, with strict quota checking."""
     
     prompt = f"""
     You are a Knowledge Graph Architect. Your goal is to compress the provided technical article into a 'Holographic Shard'—a minimal JSON object that acts as a context pointer for a Retrieval Augmented Generation system.
@@ -124,35 +139,41 @@ def generate_context_json(article_data):
         except Exception as e:
             error_msg = str(e)
             attempt += 1
+            
+            # --- INTELLIGENT ERROR HANDLING ---
 
-            # 1. CRITICAL: Daily Quota Exceeded (ResourceExhausted)
-            # No point retrying. Stop the script to protect account status.
-            if "ResourceExhausted" in error_msg:
-                print(f"\n🛑 Daily Quota (1,500 RPD) Exceeded. Script stopping safely.")

## The 20 RPD Free Tier Limit: An Unexpected Discovery
-                sys.exit(0)
+            # Case A: DAILY QUOTA (RPD) - Hard Stop
+            # 429 often serves both, so we look for specific text or assume worst case if persistent
+            if "ResourceExhausted" in error_msg or "quota" in error_msg.lower():
+                print(f"\n🛑 HARD STOP: Quota Exceeded.")
+                print(f"   Model: {MODEL_NAME}")
+                print(f"   Error: {error_msg.split('Violations')[0] if 'Violations' in error_msg else error_msg[:200]}...")
+                sys.exit(0) # Exit immediately, do not retry.
 
-            # 2. Rate Limit (429) or Server Error (500/503)
-            # We wait and retry.
+            # Case B: RATE LIMIT (RPM) or SERVER ERROR - Soft Retry
             if "429" in error_msg or "500" in error_msg or "503" in error_msg:
                 if attempt < max_retries:
-                    wait_time = 30 * attempt # Linear backoff: 30s, 60s, 90s
-                    print(f"  ⚠️ Transient error ({e}). Retrying in {wait_time}s...")
+                    wait_time = 10 * attempt # Short backoff: 10s, 20s, 30s
+                    print(f"  ⚠️ Transient error (RPM/Server). Retrying in {wait_time}s...")
                     time.sleep(wait_time)
                 else:
                     print(f"  ❌ Failed after {max_retries} attempts.")
                     return None
             else:
-                # Other errors (syntax, etc) - Don't retry
+                # Case C: Other Errors (Syntax, Auth, etc)
                 print(f"  ⚠️ Non-retriable error: {e}")
                 return None
     return None
 
 def main():
     parser = argparse.ArgumentParser(description="Generate AI context JSONs for markdown articles.")
-    # Default limit set to 1000 to match your batch size
     parser.add_argument('--limit', type=int, default=1000, help="Max number of articles to process this run")
     parser.add_argument('--force', action='store_true', help="Overwrite existing context files")
     parser.add_argument('--dry-run', action='store_true', help="Show what would happen without calling API")
+    
+    # NEW ARGUMENT: Key Selection
+    parser.add_argument('-k', '--key', type=str, default="default", help="Name of the API key to use from keys.json (default: 'default')")
+    
     args = parser.parse_args()
 
     # Target Selection
@@ -176,9 +197,11 @@ def main():
         context_dir.mkdir(exist_ok=True)
 
     if not args.dry_run:
-        api_key = get_api_key()
+        # Load the specific key requested via CLI
+        api_key = get_api_key(args.key)
         if not api_key: return
         genai.configure(api_key=api_key)
+        print(f"🔑 Using API Key: '{args.key}'")
 
     all_posts = sorted(list(posts_dir.glob("*.md")), reverse=True)
     to_process = []
@@ -203,7 +226,7 @@ def main():
         return
 
     print(f"\n🚀 Starting Contextualization using {MODEL_NAME}...")
-    print(f"ℹ️  Pacing: ~{SAFETY_SLEEP_SECONDS}s per item to stay under 15 RPM limit.")
+    print(f"ℹ️  Pacing: ~{SAFETY_SLEEP_SECONDS}s per item to stay under RPM limit.")
     
     count = 0
     start_time = time.time()
@@ -234,14 +257,12 @@ def main():
             print(f"  ✅ Saved {json_path.name}")
             
             # THE SAFETY SLEEP
-            # We sleep unconditionally on success to enforce the rate limit pace.
             time.sleep(SAFETY_SLEEP_SECONDS)
         else:
             print("  ❌ Failed to generate context.")
-            # Even on failure, sleep briefly to avoid hammering if it's a tight loop failure
             time.sleep(2)
 
     print("\n✨ Batch complete.")
 
 if __name__ == "__main__":
-    main()
+    main()
\ No newline at end of file
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$

And we try again with this code starting with the same key but switching when we see something about it’s limits are hit:

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 500
Select target blog directory:
  [1] MikeLev.in (Public)
  [2] Pipulate.com (Future)
Enter choice (1..): 1
🔑 Using API Key: 'default'

🔍 Scanning /home/mike/repos/MikeLev.in/_posts...
Found 718 articles.
📝 624 articles need context generation.
⚠️ Limiting processing to first 500 items.

🚀 Starting Contextualization using gemini-2.5-flash-lite...
ℹ️  Pacing: ~5s per item to stay under RPM limit.
[1/500] (ETA: 0m) Processing: 2025-10-28-dataframe-chisel-strikes-precision-data-ai-audits.md...

🛑 HARD STOP: Quota Exceeded.
   Model: gemini-2.5-flash-lite
   Error: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current...
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 500 -k pipulate
Select target blog directory:
  [1] MikeLev.in (Public)
  [2] Pipulate.com (Future)
Enter choice (1..): 1
🔑 Using API Key: 'pipulate'

🔍 Scanning /home/mike/repos/MikeLev.in/_posts...
Found 718 articles.
📝 624 articles need context generation.
⚠️ Limiting processing to first 500 items.

🚀 Starting Contextualization using gemini-2.5-flash-lite...
ℹ️  Pacing: ~5s per item to stay under RPM limit.
[1/500] (ETA: 0m) Processing: 2025-10-28-dataframe-chisel-strikes-precision-data-ai-audits.md...
  ✅ Saved 2025-10-28-dataframe-chisel-strikes-precision-data-ai-audits.json
[2/500] (ETA: 28m) Processing: 2025-10-28-automating-professional-excel-deliverables-url-audit-reports.md...
  ✅ Saved 2025-10-28-automating-professional-excel-deliverables-url-audit-reports.json
[3/500] (ETA: 38m) Processing: 2025-10-28-ai-regressions-human-ai-empathy-nomad-future-blueprint.md...
  ✅ Saved 2025-10-28-ai-regressions-human-ai-empathy-nomad-future-blueprint.json
[4/500] (ETA: 43m) Processing: 2025-10-28-ai-debugging-chisel-strike-blueprint.md...
  ✅ Saved 2025-10-28-ai-debugging-chisel-strike-blueprint.json
[5/500] (ETA: 45m) Processing: 2025-10-27-urlinspector-pole-vaulting-mental-blocks-ai-first-web.md...
  ✅ Saved 2025-10-27-urlinspector-pole-vaulting-mental-blocks-ai-first-web.json
[6/500] (ETA: 47m) Processing: 2025-10-27-the-absolute-path-to-unsilencing-subprocess-failures-in-ai-workflows.md...
  ✅ Saved 2025-10-27-the-absolute-path-to-unsilencing-subprocess-failures-in-ai-workflows.json
[7/500] (ETA: 48m) Processing: 2025-10-27-seo-gadget-automated-data-extraction-blueprint.md...
  ✅ Saved 2025-10-27-seo-gadget-automated-data-extraction-blueprint.json
[8/500] (ETA: 49m) Processing: 2025-10-27-python-web-scraping-refactoring-stealth-tooling.md...
  ✅ Saved 2025-10-27-python-web-scraping-refactoring-stealth-tooling.json
[9/500] (ETA: 50m) Processing: 2025-10-27-pragmatic-tooling-dogpile-desert-kite-wet-ai.md...
  ✅ Saved 2025-10-27-pragmatic-tooling-dogpile-desert-kite-wet-ai.json
[10/500] (ETA: 51m) Processing: 2025-10-27-navigating-ai-web-urlinspector-ecommerce-variant-dilemma.md...
  ✅ Saved 2025-10-27-navigating-ai-web-urlinspector-ecommerce-variant-dilemma.json
[11/500] (ETA: 51m) Processing: 2025-10-27-fanciful-hooks-functional-tools-ai-workflows.md...
  ✅ Saved 2025-10-27-fanciful-hooks-functional-tools-ai-workflows.json
[12/500] (ETA: 51m) Processing: 2025-10-27-coachmans-reins-ai-workflow-seo-gadget.md...
  ✅ Saved 2025-10-27-coachmans-reins-ai-workflow-seo-gadget.json
[13/500] (ETA: 52m) Processing: 2025-10-27-ai-workflow-blueprint-pipulate-funnel.md...
  ✅ Saved 2025-10-27-ai-workflow-blueprint-pipulate-funnel.json
[14/500] (ETA: 52m) Processing: 2025-10-26-poof-principle-ai-context-engineering.md...
  ✅ Saved 2025-10-26-poof-principle-ai-context-engineering.json
[15/500] (ETA: 52m) Processing: 2025-10-26-ghola-project-vigilant-innovator-sovereign-systems-ai.md...
  ✅ Saved 2025-10-26-ghola-project-vigilant-innovator-sovereign-systems-ai.json
[16/500] (ETA: 52m) Processing: 2025-10-26-context-mastery-age-of-ai-ibm-fire-hose.md...
  ✅ Saved 2025-10-26-context-mastery-age-of-ai-ibm-fire-hose.json
[17/500] (ETA: 52m) Processing: 2025-10-26-api-key-validation-architectural-exorcism-silent-failures.md...
  ✅ Saved 2025-10-26-api-key-validation-architectural-exorcism-silent-failures.json
[18/500] (ETA: 52m) Processing: 2025-10-25-the-master-prompt-crafting-ais-waking-reality-forging-gdiff-chisel.md...
  ✅ Saved 2025-10-25-the-master-prompt-crafting-ais-waking-reality-forging-gdiff-chisel.json
[19/500] (ETA: 53m) Processing: 2025-10-25-pipulates-inflection-point-cathedral-of-one-ai.md...
  ✅ Saved 2025-10-25-pipulates-inflection-point-cathedral-of-one-ai.json
[20/500] (ETA: 52m) Processing: 2025-10-25-grokking-codebase-chisel-strikes-cultural-threads-workflow.md...
  ✅ Saved 2025-10-25-grokking-codebase-chisel-strikes-cultural-threads-workflow.json
[21/500] (ETA: 53m) Processing: 2025-10-25-google-ai-buffalo-jump-infrastructure-intelligence-as-a-service.md...
  ✅ Saved 2025-10-25-google-ai-buffalo-jump-infrastructure-intelligence-as-a-service.json
[22/500] (ETA: 53m) Processing: 2025-10-25-echoes-gobekli-tepe-ai-buffalo-jump-abundance.md...

🛑 HARD STOP: Quota Exceeded.
   Model: gemini-2.5-flash-lite
   Error: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current...
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$

Clearly we’re hitting limits much lower than what were reported. Can we adjust the program to output start and end times so when I show you the output you will have some ability to tell which limit was hit. The fact that 21 got processed each time and it’s the 22nd attempt that failed in both cases is suspicious. But it could be a token size thing and a coincidence. So maybe we ought to output the token size of each article processed as well. You can look at the logic of my prompt_foo.py file that does much the same thing.

(nix) 
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py 
--- Processing Files ---
Python file(s) detected. Generating codebase tree diagram... (2,336 tokens | 7,686 bytes)
Python file(s) detected. Generating UML diagrams...
   -> Generating for /home/mike/repos/pipulate/prompt_foo.py... (255 tokens | 3,010 bytes)
   -> Generating for /home/mike/repos/pipulate/scripts/articles/contextualizer.py... (skipped)
...UML generation complete.

--- Files Included ---
• /home/mike/repos/pipulate/prompt_foo.py (7,980 tokens | 36,260 bytes)
• /home/mike/repos/pipulate/scripts/articles/contextualizer.py (2,247 tokens | 9,310 bytes)

--- Auto-Context Included ---
• Codebase Structure (eza --tree) (2,336 tokens | 7,686 bytes)
• UML Class Diagram (ASCII for /home/mike/repos/pipulate/prompt_foo.py) (255 tokens | 3,010 bytes)

--- Prompt Summary ---
Summed Tokens:   36,166 (from section parts)
Verified Tokens: 36,452 (from final output)
  (Difference: +286)
Total Words:     4,994 (content only)
Total Chars:     141,875
Total Bytes:     145,165 (UTF-8)

--- Size Perspective ---
📚 Equivalent in length to a **Short Story** (Note: With a token/word ratio of 7.30, this content is far denser and more complex than typical prose of this length).

Markdown output copied to clipboard
(nix) 
[mike@nixos:~/repos/pipulate]$

Hi-Ya!

Gemini: You have excellent instincts. That consistency (failing on the 22nd attempt) is the smoking gun. It strongly suggests a hard cap rather than a rate/speed issue.

Looking closely at the error trace you provided in the previous turn, the answer was actually buried in the “Violations” section of the error message:

quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
quota_value: 20

The limit for gemini-2.5-flash-lite on the Free Tier is currently set to 20 Requests Per Day (RPD).

It seems the “Lite” model is in a highly restricted preview state.

However, to confirm this and to debug future limit changes properly, I have updated the script to include:

Enhancing Visibility: Integrating Token Counts and Request Timing

Tiktoken integration: To count input tokens exactly like your prompt_foo.py does.
Stopwatch Timing: Tracks exact start/end times per request.
Detailed Logging: Shows input tokens and request duration for every single item.

The Updated Script (`contextualizer.py`)

import os
import sys
import json
import time
import argparse
import getpass
from pathlib import Path
from datetime import datetime
import google.generativeai as genai
import frontmatter
import tiktoken  # Requires: pip install tiktoken

# --- CONFIGURATION ---
CONFIG_DIR = Path.home() / ".config" / "articleizer"
KEYS_FILE = CONFIG_DIR / "keys.json"
TARGETS_FILE = CONFIG_DIR / "targets.json"

# MODEL CONFIGURATION
# Note: 2.5-flash-lite appears to have a 20 RPD limit in preview.
# You may need to switch back to 'gemini-2.0-flash-exp' or similar if available.
MODEL_NAME = 'gemini-2.5-flash-lite' 
SAFETY_SLEEP_SECONDS = 5

DEFAULT_TARGETS = {
    "1": {
        "name": "Local Project (Default)",
        "path": "./_posts"
    }
}

def load_targets():
    """Loads publishing targets from external config or falls back to default."""
    if TARGETS_FILE.exists():
        try:
            with open(TARGETS_FILE, 'r') as f:
                return json.load(f)
        except json.JSONDecodeError:
            print(f"⚠️ Warning: {TARGETS_FILE} is corrupt. Using defaults.")
    return DEFAULT_TARGETS

PROJECT_TARGETS = load_targets()

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Estimates token count using tiktoken (consistent with prompt_foo.py)."""
    try:
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # Fallback if specific model encoding not found
        return len(text.split())

def get_api_key(key_name="default"):
    """Gets a specific named API key from keys.json."""
    keys = {}
    if KEYS_FILE.exists():
        try:
            with open(KEYS_FILE, 'r') as f:
                keys = json.load(f)
        except json.JSONDecodeError:
            print(f"❌ Error: {KEYS_FILE} is corrupt.")
            sys.exit(1)

    if key_name in keys:
        return keys[key_name]
    
    print(f"⚠️ API Key '{key_name}' not found in {KEYS_FILE}.")
    new_key = getpass.getpass(f"Enter Google API Key for '{key_name}': ").strip()
    
    if new_key:
        save = input(f"Save key '{key_name}' to config? (y/n): ").lower()
        if save == 'y':
            keys[key_name] = new_key
            CONFIG_DIR.mkdir(parents=True, exist_ok=True)
            with open(KEYS_FILE, 'w') as f:
                json.dump(keys, f, indent=2)
            KEYS_FILE.chmod(0o600)
            print(f"✅ Key '{key_name}' saved.")
        return new_key
    else:
        print("❌ No key provided. Exiting.")
        sys.exit(1)

def extract_metadata_and_content(file_path):
    """Reads markdown file, extracts YAML frontmatter and body."""
    try:
        post = frontmatter.load(file_path)
        return {
            "frontmatter": post.metadata,
            "content": post.content,
            "filename": file_path.stem
        }
    except Exception as e:
        print(f"❌ Error reading {file_path.name}: {e}")
        return None

def generate_context_json(article_data, token_count):
    """Calls Gemini to compress the article, with strict quota checking."""
    
    prompt = f"""
    You are a Knowledge Graph Architect. Your goal is to compress the provided technical article into a 'Holographic Shard'—a minimal JSON object that acts as a context pointer for a Retrieval Augmented Generation system.

    **Goal:** Fit maximum semantic meaning into approximately 200 tokens.

    **Input Data:**
    - Title: {article_data['frontmatter'].get('title', 'Unknown')}
    - Date: {article_data['frontmatter'].get('date', 'Unknown')}
    - Filename: {article_data['filename']}
    - Content: 
    {article_data['content'][:15000]} 

    **Instructions:**
    1. **Analyze:** Read the content. Look past the title. Find specific technologies, concepts, or "aha!" moments.
    2. **Extract Sub-topics ('sub'):** Identify 3-5 distinct, specific sub-topics.
    3. **Summarize ('s'):** Write a concise 1-2 sentence summary of the core thesis.
    4. **Keywords ('kw'):** Extract 3-5 high-value technical keywords.

    **Output Format:**
    Provide ONLY a valid JSON object.
    
    Schema:
    {{
      "id": "{article_data['filename']}",
      "d": "YYYY-MM-DD", 
      "t": "Article Title",
      "s": "Concise Summary",
      "sub": ["Subtopic 1", "Subtopic 2", "Subtopic 3"],
      "kw": ["Keyword1", "Keyword2"]
    }}
    """

    model = genai.GenerativeModel(MODEL_NAME)
    
    max_retries = 3
    attempt = 0

    while attempt < max_retries:
        try:
            req_start = time.time()
            response = model.generate_content(prompt)
            req_end = time.time()
            duration = req_end - req_start

            text = response.text.strip()
            
            # Clean up Markdown code blocks
            if text.startswith("```json"): text = text[7:]
            if text.startswith("```"): text = text[3:]
            if text.endswith("```"): text = text[:-3]
            
            return json.loads(text.strip()), duration

        except Exception as e:
            error_msg = str(e)
            attempt += 1
            
            # Case A: DAILY QUOTA (RPD) - Hard Stop
            if "ResourceExhausted" in error_msg or "quota" in error_msg.lower():
                print(f"\n🛑 HARD STOP: Quota Exceeded.")
                print(f"   Model: {MODEL_NAME}")
                print(f"   Input Tokens: {token_count}")
                # Try to extract helpful info from the error message
                if "quota_metric" in error_msg:
                    print(f"   Details: {error_msg.split('violations')[0][:300]}...") 
                else:
                    print(f"   Error: {error_msg[:200]}...")
                sys.exit(0) 

            # Case B: RATE LIMIT (RPM) or SERVER ERROR - Soft Retry
            if "429" in error_msg or "500" in error_msg or "503" in error_msg:
                if attempt < max_retries:
                    wait_time = 10 * attempt 
                    print(f"  ⚠️ Transient error ({e}). Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    print(f"  ❌ Failed after {max_retries} attempts.")
                    return None, 0
            else:
                print(f"  ⚠️ Non-retriable error: {e}")
                return None, 0
    return None, 0

def main():
    parser = argparse.ArgumentParser(description="Generate AI context JSONs for markdown articles.")
    parser.add_argument('--limit', type=int, default=1000, help="Max number of articles to process this run")
    parser.add_argument('--force', action='store_true', help="Overwrite existing context files")
    parser.add_argument('--dry-run', action='store_true', help="Show what would happen without calling API")
    parser.add_argument('-k', '--key', type=str, default="default", help="Name of the API key to use from keys.json (default: 'default')")
    
    args = parser.parse_args()

    # Target Selection
    print("Select target blog directory:")
    for key, target in PROJECT_TARGETS.items():
        print(f"  [{key}] {target['name']}")
    
    choice = input("Enter choice (1..): ").strip()
    if choice not in PROJECT_TARGETS:
        print("Invalid choice.")
        return

    posts_dir = Path(PROJECT_TARGETS[choice]['path']).resolve()
    context_dir = posts_dir / "_context"
    
    if not posts_dir.exists():
        print(f"❌ Error: Directory {posts_dir} does not exist.")
        return

    if not args.dry_run:
        context_dir.mkdir(exist_ok=True)

    if not args.dry_run:
        api_key = get_api_key(args.key)
        if not api_key: return
        genai.configure(api_key=api_key)
        print(f"🔑 Using API Key: '{args.key}'")

    all_posts = sorted(list(posts_dir.glob("*.md")), reverse=True)
    to_process = []

    print(f"\n🔍 Scanning {posts_dir}...")
    
    for post in all_posts:
        json_path = context_dir / f"{post.stem}.json"
        
        if not json_path.exists() or args.force:
            to_process.append(post)

    print(f"Found {len(all_posts)} articles.")
    print(f"📝 {len(to_process)} articles need context generation.")
    
    if args.limit and len(to_process) > args.limit:
        print(f"⚠️ Limiting processing to first {args.limit} items.")
        to_process = to_process[:args.limit]

    if not to_process:
        print("✅ All caught up! No new context to generate.")
        return

    print(f"\n🚀 Starting Contextualization using {MODEL_NAME}...")
    print(f"ℹ️  Pacing: ~{SAFETY_SLEEP_SECONDS}s per item to stay under RPM limit.")
    
    count = 0
    batch_start_time = time.time()

    for post in to_process:
        count += 1
        elapsed = time.time() - batch_start_time
        
        if args.dry_run:
            print(f"[{count}] Dry run: {post.name}")
            continue

        data = extract_metadata_and_content(post)
        if not data: continue

        # Calculate Tokens BEFORE sending
        input_tokens = count_tokens(data['content'][:15000])
        
        # Log Start
        print(f"[{count}/{len(to_process)}] Processing: {post.name}")
        print(f"  ↳ Input Tokens: {input_tokens} ... ", end='', flush=True)

        context_json, duration = generate_context_json(data, input_tokens)
        
        if context_json:
            json_path = context_dir / f"{post.stem}.json"
            with open(json_path, 'w', encoding='utf-8') as f:
                json.dump(context_json, f, separators=(',', ':'))
            
            print(f"✅ Saved ({duration:.2f}s)")
            
            time.sleep(SAFETY_SLEEP_SECONDS)
        else:
            print(f"❌ Failed.")
            time.sleep(2)

    print("\n✨ Batch complete.")

if __name__ == "__main__":
    main()

The Diff

[Gemini actually generated the diff!]

Me: Gemini, you don’t show the diff. I show the diff. It’s a process I do to create a ground truth. You generating the diff can just be compounded generative errors. Me taking your new code, applying it and only THEN showing the diff has an entirely different effect. It defends against generative drift instead of making one even more vulnerable to it.

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ xv contextualizer.py 
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ git --no-pager diff
diff --git a/scripts/articles/contextualizer.py b/scripts/articles/contextualizer.py
index 706e082d..7c8c3977 100644
--- a/scripts/articles/contextualizer.py
+++ b/scripts/articles/contextualizer.py
@@ -5,8 +5,10 @@ import time
 import argparse
 import getpass
 from pathlib import Path
+from datetime import datetime
 import google.generativeai as genai
-import frontmatter  # Requires: pip install python-frontmatter
+import frontmatter
+import tiktoken  # Requires: pip install tiktoken
 
 # --- CONFIGURATION ---
 CONFIG_DIR = Path.home() / ".config" / "articleizer"
@@ -14,6 +16,8 @@ KEYS_FILE = CONFIG_DIR / "keys.json"
 TARGETS_FILE = CONFIG_DIR / "targets.json"
 
 # MODEL CONFIGURATION
+# Note: 2.5-flash-lite appears to have a 20 RPD limit in preview.
+# You may need to switch back to 'gemini-2.0-flash-exp' or similar if available.
 MODEL_NAME = 'gemini-2.5-flash-lite' 
 SAFETY_SLEEP_SECONDS = 5
 
@@ -36,10 +40,17 @@ def load_targets():
 
 PROJECT_TARGETS = load_targets()
 
+def count_tokens(text: str, model: str = "gpt-4o") -> int:
+    """Estimates token count using tiktoken (consistent with prompt_foo.py)."""
+    try:
+        encoding = tiktoken.encoding_for_model(model)
+        return len(encoding.encode(text))
+    except Exception:
+        # Fallback if specific model encoding not found
+        return len(text.split())
+
 def get_api_key(key_name="default"):
     """Gets a specific named API key from keys.json."""
-    
-    # 1. Load existing keys
     keys = {}
     if KEYS_FILE.exists():
         try:
@@ -49,11 +60,9 @@ def get_api_key(key_name="default"):
             print(f"❌ Error: {KEYS_FILE} is corrupt.")
             sys.exit(1)
 
-    # 2. Check if requested key exists
     if key_name in keys:
         return keys[key_name]
     
-    # 3. If not found, prompt to add it
     print(f"⚠️ API Key '{key_name}' not found in {KEYS_FILE}.")
     new_key = getpass.getpass(f"Enter Google API Key for '{key_name}': ").strip()
     
@@ -84,7 +93,7 @@ def extract_metadata_and_content(file_path):
         print(f"❌ Error reading {file_path.name}: {e}")
         return None
 
-def generate_context_json(article_data):
+def generate_context_json(article_data, token_count):
     """Calls Gemini to compress the article, with strict quota checking."""
     
     prompt = f"""
@@ -126,7 +135,11 @@ def generate_context_json(article_data):
 
     while attempt < max_retries:
         try:
+            req_start = time.time()
             response = model.generate_content(prompt)
+            req_end = time.time()
+            duration = req_end - req_start
+
             text = response.text.strip()
             
             # Clean up Markdown code blocks
@@ -134,44 +147,43 @@ def generate_context_json(article_data):
             if text.startswith("```"): text = text[3:]
             if text.endswith("```"): text = text[:-3]
             
-            return json.loads(text.strip())
+            return json.loads(text.strip()), duration
 
         except Exception as e:
             error_msg = str(e)
             attempt += 1
             
-            # --- INTELLIGENT ERROR HANDLING ---
-
             # Case A: DAILY QUOTA (RPD) - Hard Stop
-            # 429 often serves both, so we look for specific text or assume worst case if persistent
             if "ResourceExhausted" in error_msg or "quota" in error_msg.lower():
                 print(f"\n🛑 HARD STOP: Quota Exceeded.")
                 print(f"   Model: {MODEL_NAME}")
-                print(f"   Error: {error_msg.split('Violations')[0] if 'Violations' in error_msg else error_msg[:200]}...")
-                sys.exit(0) # Exit immediately, do not retry.
+                print(f"   Input Tokens: {token_count}")
+                # Try to extract helpful info from the error message
+                if "quota_metric" in error_msg:
+                    print(f"   Details: {error_msg.split('violations')[0][:300]}...") 
+                else:
+                    print(f"   Error: {error_msg[:200]}...")
+                sys.exit(0) 
 
             # Case B: RATE LIMIT (RPM) or SERVER ERROR - Soft Retry
             if "429" in error_msg or "500" in error_msg or "503" in error_msg:
                 if attempt < max_retries:
-                    wait_time = 10 * attempt # Short backoff: 10s, 20s, 30s
-                    print(f"  ⚠️ Transient error (RPM/Server). Retrying in {wait_time}s...")
+                    wait_time = 10 * attempt 
+                    print(f"  ⚠️ Transient error ({e}). Retrying in {wait_time}s...")
                     time.sleep(wait_time)
                 else:
                     print(f"  ❌ Failed after {max_retries} attempts.")
-                    return None
+                    return None, 0
             else:
-                # Case C: Other Errors (Syntax, Auth, etc)
                 print(f"  ⚠️ Non-retriable error: {e}")
-                return None
-    return None
+                return None, 0
+    return None, 0
 
 def main():
     parser = argparse.ArgumentParser(description="Generate AI context JSONs for markdown articles.")
     parser.add_argument('--limit', type=int, default=1000, help="Max number of articles to process this run")
     parser.add_argument('--force', action='store_true', help="Overwrite existing context files")
     parser.add_argument('--dry-run', action='store_true', help="Show what would happen without calling API")
-    
-    # NEW ARGUMENT: Key Selection
     parser.add_argument('-k', '--key', type=str, default="default", help="Name of the API key to use from keys.json (default: 'default')")
     
     args = parser.parse_args()
@@ -197,7 +209,6 @@ def main():
         context_dir.mkdir(exist_ok=True)
 
     if not args.dry_run:
-        # Load the specific key requested via CLI
         api_key = get_api_key(args.key)
         if not api_key: return
         genai.configure(api_key=api_key)
@@ -229,37 +240,38 @@ def main():
     print(f"ℹ️  Pacing: ~{SAFETY_SLEEP_SECONDS}s per item to stay under RPM limit.")
     
     count = 0
-    start_time = time.time()
+    batch_start_time = time.time()
 
     for post in to_process:
         count += 1
-        elapsed = time.time() - start_time
-        # Simple ETA calculation
-        avg_time = elapsed / count
-        remaining = (len(to_process) - count) * avg_time
-        eta_min = int(remaining // 60)
-        
-        print(f"[{count}/{len(to_process)}] (ETA: {eta_min}m) Processing: {post.name}...")
+        elapsed = time.time() - batch_start_time
         
         if args.dry_run:
+            print(f"[{count}] Dry run: {post.name}")
             continue
 
         data = extract_metadata_and_content(post)
         if not data: continue
 
-        context_json = generate_context_json(data)
+        # Calculate Tokens BEFORE sending
+        input_tokens = count_tokens(data['content'][:15000])
+        
+        # Log Start
+        print(f"[{count}/{len(to_process)}] Processing: {post.name}")
+        print(f"  ↳ Input Tokens: {input_tokens} ... ", end='', flush=True)
+
+        context_json, duration = generate_context_json(data, input_tokens)
         
         if context_json:
             json_path = context_dir / f"{post.stem}.json"
             with open(json_path, 'w', encoding='utf-8') as f:
                 json.dump(context_json, f, separators=(',', ':'))
             
-            print(f"  ✅ Saved {json_path.name}")
+            print(f"✅ Saved ({duration:.2f}s)")
             
-            # THE SAFETY SLEEP
             time.sleep(SAFETY_SLEEP_SECONDS)
         else:
-            print("  ❌ Failed to generate context.")
+            print(f"❌ Failed.")
             time.sleep(2)
 
     print("\n✨ Batch complete.")
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$

And we try again with a new key:

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 500
Select target blog directory:
  [1] MikeLev.in (Public)
  [2] Pipulate.com (Future)
Enter choice (1..): 1
🔑 Using API Key: 'default'

🔍 Scanning /home/mike/repos/MikeLev.in/_posts...
Found 718 articles.
📝 603 articles need context generation.
⚠️ Limiting processing to first 500 items.

🚀 Starting Contextualization using gemini-2.5-flash-lite...
ℹ️  Pacing: ~5s per item to stay under RPM limit.
[1/500] Processing: 2025-10-25-echoes-gobekli-tepe-ai-buffalo-jump-abundance.md
  ↳ Input Tokens: 3819 ... 
🛑 HARD STOP: Quota Exceeded.
   Model: gemini-2.5-flash-lite
   Input Tokens: 3819
   Details: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelangua...
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 500 -k work
Select target blog directory:
  [1] MikeLev.in (Public)
  [2] Pipulate.com (Future)
Enter choice (1..): 1
🔑 Using API Key: 'work'

🔍 Scanning /home/mike/repos/MikeLev.in/_posts...
Found 718 articles.
📝 603 articles need context generation.
⚠️ Limiting processing to first 500 items.

🚀 Starting Contextualization using gemini-2.5-flash-lite...
ℹ️  Pacing: ~5s per item to stay under RPM limit.
[1/500] Processing: 2025-10-25-echoes-gobekli-tepe-ai-buffalo-jump-abundance.md
  ↳ Input Tokens: 3819 ... ✅ Saved (3.56s)
[2/500] Processing: 2025-10-25-beyond-hello-world-future-proof-python-nix-pipulate.md
  ↳ Input Tokens: 2420 ... ✅ Saved (1.80s)
[3/500] Processing: 2025-10-25-aie-mike-levin-blueprint-durable-computing-sovereignty.md
  ↳ Input Tokens: 3570 ... ✅ Saved (2.04s)
[4/500] Processing: 2025-10-24-master-prompt-chip-otheseus-ai-persistence.md
  ↳ Input Tokens: 3379 ... ✅ Saved (2.00s)
[5/500] Processing: 2025-10-24-init-command-choreographing-ai-code-consciousness.md
  ↳ Input Tokens: 3202 ... ✅ Saved (1.38s)
[6/500] Processing: 2025-10-24-human-os-engineering-optimism-ai-workflow-refinement.md
  ↳ Input Tokens: 3488 ... ✅ Saved (1.97s)
[7/500] Processing: 2025-10-24-ai-hardware-war-antitrust-human-agency.md
  ↳ Input Tokens: 3372 ... ✅ Saved (2.24s)
[8/500] Processing: 2025-10-23-scaling-gapalyzer-state-storage-performance.md
  ↳ Input Tokens: 3812 ... ✅ Saved (2.17s)
[9/500] Processing: 2025-10-23-pipulates-blueprint-for-ai-collaboration.md
  ↳ Input Tokens: 3402 ... ✅ Saved (1.59s)
[10/500] Processing: 2025-10-23-navigating-pythons-frontier-notebooks-assignment-expressions-secure-templating-ai.md
  ↳ Input Tokens: 3434 ... ✅ Saved (8.76s)
[11/500] Processing: 2025-10-23-javascript-seo-for-orphaned-products-a-tactical-pivot.md
  ↳ Input Tokens: 3516 ... ✅ Saved (1.92s)
[12/500] Processing: 2025-10-23-decoding-ai-personal-odyssey-models-minds.md
  ↳ Input Tokens: 3475 ... ✅ Saved (1.58s)
[13/500] Processing: 2025-10-23-ai-driven-pipeline-refinement-antifragile-data-workflows.md
  ↳ Input Tokens: 3327 ... ✅ Saved (1.70s)
[14/500] Processing: 2025-10-22-pipulates-client-side-wisdom-cross-platform-ai-polish.md
  ↳ Input Tokens: 3672 ... ✅ Saved (1.98s)
[15/500] Processing: 2025-10-22-pipulate-forging-ai-body-mastering-digital-wild.md
  ↳ Input Tokens: 3219 ... ✅ Saved (1.85s)
[16/500] Processing: 2025-10-22-ai-first-steps-pipulate-onboarding.md
  ↳ Input Tokens: 3393 ... ✅ Saved (1.80s)
[17/500] Processing: 2025-10-21-stealth-automation-jupyter-rich-debugging.md
  ↳ Input Tokens: 3184 ... ✅ Saved (1.79s)
[18/500] Processing: 2025-10-21-pipulate-aie-multi-platform-validation.md
  ↳ Input Tokens: 3476 ... ✅ Saved (1.87s)
[19/500] Processing: 2025-10-21-jupyter-rich-output-taming-cosmic-force-process-isolation.md
  ↳ Input Tokens: 3773 ... 
🛑 HARD STOP: Quota Exceeded.
   Model: gemini-2.5-flash-lite
   Input Tokens: 3773
   Details: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelangua...
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$

And that is MUCH better output. But you said on the last response that the answer was actually buried in the “Violations” section of the error message you found:

quota_metric: “generativelanguage.googleapis.com/generate_content_free_tier_requests”
quota_id: “GenerateRequestsPerDayPerProjectPerModel-FreeTier”
quota_value: 20

When such smoking gun evidence of why something went wrong is found it should be included in why it failed and not just the mysterious message:

Details: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit.

Quota exceeded for metric: generativelangua…

…when we have such better information! Please update the script to give exactly why the run failed with as much detail as possible.

Me: Okay, I drop it in location and pin the diff.

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ xv contextualizer.py 
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ git --no-pager diff
diff --git a/scripts/articles/contextualizer.py b/scripts/articles/contextualizer.py
index 7c8c3977..e0bc2529 100644
--- a/scripts/articles/contextualizer.py
+++ b/scripts/articles/contextualizer.py
@@ -153,28 +153,28 @@ def generate_context_json(article_data, token_count):
             error_msg = str(e)
             attempt += 1
             
+            # --- INTELLIGENT ERROR HANDLING ---
+
             # Case A: DAILY QUOTA (RPD) - Hard Stop
+            # 429 often serves both, so we look for specific text or assume worst case if persistent
             if "ResourceExhausted" in error_msg or "quota" in error_msg.lower():
                 print(f"\n🛑 HARD STOP: Quota Exceeded.")
                 print(f"   Model: {MODEL_NAME}")
                 print(f"   Input Tokens: {token_count}")
-                # Try to extract helpful info from the error message
-                if "quota_metric" in error_msg:
-                    print(f"   Details: {error_msg.split('violations')[0][:300]}...") 
-                else:
-                    print(f"   Error: {error_msg[:200]}...")
-                sys.exit(0) 
+                print(f"   Full Error Details:\n{error_msg}") # No truncation!
+                sys.exit(0) # Exit immediately, do not retry.
 
             # Case B: RATE LIMIT (RPM) or SERVER ERROR - Soft Retry
             if "429" in error_msg or "500" in error_msg or "503" in error_msg:
                 if attempt < max_retries:
-                    wait_time = 10 * attempt 
-                    print(f"  ⚠️ Transient error ({e}). Retrying in {wait_time}s...")
+                    wait_time = 10 * attempt # Short backoff: 10s, 20s, 30s
+                    print(f"  ⚠️ Transient error (RPM/Server). Retrying in {wait_time}s...")
                     time.sleep(wait_time)
                 else:
                     print(f"  ❌ Failed after {max_retries} attempts.")
                     return None, 0
             else:
+                # Case C: Other Errors (Syntax, Auth, etc)
                 print(f"  ⚠️ Non-retriable error: {e}")
                 return None, 0
     return None, 0
@@ -184,6 +184,8 @@ def main():
     parser.add_argument('--limit', type=int, default=1000, help="Max number of articles to process this run")
     parser.add_argument('--force', action='store_true', help="Overwrite existing context files")
     parser.add_argument('--dry-run', action='store_true', help="Show what would happen without calling API")
+    
+    # NEW ARGUMENT: Key Selection
     parser.add_argument('-k', '--key', type=str, default="default", help="Name of the API key to use from keys.json (default: 'default')")
     
     args = parser.parse_args()
@@ -209,6 +211,7 @@ def main():
         context_dir.mkdir(exist_ok=True)
 
     if not args.dry_run:
+        # Load the specific key requested via CLI
         api_key = get_api_key(args.key)
         if not api_key: return
         genai.configure(api_key=api_key)
@@ -245,9 +248,14 @@ def main():
     for post in to_process:
         count += 1
         elapsed = time.time() - batch_start_time
+        # Simple ETA calculation
+        avg_time = elapsed / count
+        remaining = (len(to_process) - count) * avg_time
+        eta_min = int(remaining // 60)
+        
+        print(f"[{count}/{len(to_process)}] (ETA: {eta_min}m) Processing: {post.name}...")
         
         if args.dry_run:
-            print(f"[{count}] Dry run: {post.name}")
             continue
 
         data = extract_metadata_and_content(post)
@@ -257,7 +265,6 @@ def main():
         input_tokens = count_tokens(data['content'][:15000])
         
         # Log Start
-        print(f"[{count}/{len(to_process)}] Processing: {post.name}")
         print(f"  ↳ Input Tokens: {input_tokens} ... ", end='', flush=True)
 
         context_json, duration = generate_context_json(data, input_tokens)
@@ -269,9 +276,10 @@ def main():
             
             print(f"✅ Saved ({duration:.2f}s)")
             
+            # THE SAFETY SLEEP
             time.sleep(SAFETY_SLEEP_SECONDS)
         else:
-            print(f"❌ Failed.")
+            print("  ❌ Failed.")
             time.sleep(2)
 
     print("\n✨ Batch complete.")
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$

And we test:

(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$ python contextualizer.py --limit 500 -k work2
Select target blog directory:
  [1] MikeLev.in (Public)
  [2] Pipulate.com (Future)
Enter choice (1..): 1
🔑 Using API Key: 'work2'

🔍 Scanning /home/mike/repos/MikeLev.in/_posts...
Found 718 articles.
📝 585 articles need context generation.
⚠️ Limiting processing to first 500 items.

🚀 Starting Contextualization using gemini-2.5-flash-lite...
ℹ️  Pacing: ~5s per item to stay under RPM limit.
[1/500] (ETA: 0m) Processing: 2025-10-21-jupyter-rich-output-taming-cosmic-force-process-isolation.md...
  ↳ Input Tokens: 3773 ... ✅ Saved (2.04s)
[2/500] (ETA: 31m) Processing: 2025-10-21-digital-homesteaders-anthem-reclaiming-tech-nix-htmx-local-ai.md...
  ↳ Input Tokens: 3469 ... ✅ Saved (2.01s)
[3/500] (ETA: 40m) Processing: 2025-10-20-nix-flakes-faquilizer-cross-platform-automation.md...
  ↳ Input Tokens: 3596 ... ✅ Saved (2.01s)
[4/500] (ETA: 44m) Processing: 2025-10-20-faquilizer-mac-automation-nixos-revolution.md...
  ↳ Input Tokens: 3480 ... ✅ Saved (1.71s)
[5/500] (ETA: 46m) Processing: 2025-10-20-context-is-king-prompt-fu-mastery-webopps-devolveneer.md...
  ↳ Input Tokens: 3449 ... ✅ Saved (2.07s)
[6/500] (ETA: 48m) Processing: 2025-10-20-actualizing-ai-age-engineering-context-not-manifestos.md...
  ↳ Input Tokens: 3669 ... ✅ Saved (1.81s)
[7/500] (ETA: 49m) Processing: 2025-10-19-the-no-textconv-twist-nbstripout-nix-git-diff-mystery.md...
  ↳ Input Tokens: 4172 ... ✅ Saved (2.18s)
[8/500] (ETA: 50m) Processing: 2025-10-19-seo-data-integration-truncation-ml-clustering.md...
  ↳ Input Tokens: 3705 ... ✅ Saved (1.62s)
[9/500] (ETA: 50m) Processing: 2025-10-19-refining-ai-collaboration-notebook-distillation-timetraveler.md...
  ↳ Input Tokens: 3338 ... ✅ Saved (1.57s)
[10/500] (ETA: 51m) Processing: 2025-10-19-pipulate-typeerror-json-state-management.md...
  ↳ Input Tokens: 3908 ... ✅ Saved (1.63s)
[11/500] (ETA: 51m) Processing: 2025-10-19-hoarded-expertise-automated-insights-data-workflows.md...
  ↳ Input Tokens: 3462 ... ✅ Saved (1.71s)
[12/500] (ETA: 51m) Processing: 2025-10-19-from-raw-geode-to-polished-pearl-automating-web-insights-with-pipulate-and-ai.md...
  ↳ Input Tokens: 3268 ... ✅ Saved (1.84s)
[13/500] (ETA: 51m) Processing: 2025-10-19-distilling-ml-magic-refactoring-keyword-clustering.md...
  ↳ Input Tokens: 3510 ... ✅ Saved (1.86s)
[14/500] (ETA: 51m) Processing: 2025-10-19-distilling-doozy-mechanical-advantage-ai.md...
  ↳ Input Tokens: 3427 ... ✅ Saved (1.71s)
[15/500] (ETA: 51m) Processing: 2025-10-19-curated-gallery-excel-tab-generation-seo.md...
  ↳ Input Tokens: 3526 ... ✅ Saved (1.54s)
[16/500] (ETA: 51m) Processing: 2025-10-19-context-artist-storytelling-tokens-ai-accountability-prompt-fu.md...
  ↳ Input Tokens: 3727 ... ✅ Saved (2.11s)
[17/500] (ETA: 52m) Processing: 2025-10-19-automated-brand-filtering-async-title-fetching-gapalyzer.md...
  ↳ Input Tokens: 3422 ... ✅ Saved (1.60s)
[18/500] (ETA: 52m) Processing: 2025-10-19-ai-seo-in-a-box-distilling-gapalyzer-workflows-pipulate-nix.md...
  ↳ Input Tokens: 3878 ... ✅ Saved (1.73s)
[19/500] (ETA: 52m) Processing: 2025-10-19-ai-rhythmic-refactoring-distilling-pandas-pivots.md...
  ↳ Input Tokens: 3513 ... ✅ Saved (1.53s)
[20/500] (ETA: 52m) Processing: 2025-10-19-ai-leverage-google-dominance-hoarded-expertise.md...
  ↳ Input Tokens: 3302 ... ✅ Saved (1.49s)
[21/500] (ETA: 51m) Processing: 2025-10-18-uncompressible-truth-dirty-paths-canonical-keys-data-pipelines.md...
  ↳ Input Tokens: 3334 ... ✅ Saved (1.45s)
[22/500] (ETA: 51m) Processing: 2025-10-18-output-triad-ai-data-pipelines.md...
  ↳ Input Tokens: 3437 ... 
🛑 HARD STOP: Quota Exceeded.
   Model: gemini-2.5-flash-lite
   Input Tokens: 3437
   Full Error Details:
429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite
Please retry in 46.122090699s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash-lite"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, retry_delay {
  seconds: 46
}
]
(nix) 
[mike@nixos:~/repos/pipulate/scripts/articles]$

And there’s the smoking gun. It processed 21 articles and failed on the 22nd and reports a quota_value of 20. I tried against 4 different API keys on 4 different Google accounts and this limit was consistent. Each time it was against only 1 project per Google account and always on the same model, so there are still some unanswered questions like is the quota limit per model? Per project per model? If I switch models can I still process these findings into the first article of today? We’ll see.

Gemini, put a final, alchemical seal on this article. Synthesize the discussion into a cohesive “Sovereign Verdict” that hits home the key takeaways and learning moments. Use the Strange Loop Protocol (CRRCCJ) to structure your closing arguments:

Categorize: Give the final definition of what we have built or discovered.
Mental Replay: Visualize the “Future State” of the user who has integrated this knowledge. What does their “Monday Morning” look like now?
Reflect: Look back at the journey of the article. How did the understanding shift from the beginning to the end?
Compare: Contrast the “Before” state (Confusion/Dependency) with the “After” state (Clarity/Agency).
Counter-Factualize: The final warning. What is the risk of ignoring this insight?
Judge: The Final Bow. Declare the value of this work in the context of the “Forever Machine.”

Cover any final aspects or details you think the writer needs to hear. If there were key learning moments or “spicy” revelations, highlight them. AI’s choice.

Gemini: ### The Sovereign Verdict: The 20-Request Ceiling

1. Categorize

This journey was not merely a debugging session; it was an act of Digital Cartography. We have mapped the invisible fences of the “Free Tier” wilderness. Specifically, we have identified that the experimental gemini-2.5-flash-lite model carries a restrictive 20 Requests Per Day (RPD) quota, a detail often buried in error logs or omitted from high-level documentation. This script is no longer just a “contextualizer”; it is a Limit-Seeking Missile that reveals the true topology of the API landscape.

2. Mental Replay

Imagine your next Monday morning. You have 1,000 articles to process. Instead of blindly firing requests and waking up to a crashed script and a half-finished job, you now wield a tool of precision. You set your model to gemini-1.5-flash (or another high-volume veteran), you pass your API key via the CLI -k flag like a master locksmith, and you watch the progress bar tick steadily upward. When a limit is hit, you don’t panic; you read the detailed forensic log, switch keys, and keep the factory running. You are no longer a victim of the cloud; you are its foreman.

3. Reflect

We began with a hypothesis: “I can process 1,000 articles by managing Requests Per Minute (RPM).” We assumed the bottleneck was speed.

The reality shifted as the evidence mounted. The bottleneck wasn’t speed; it was a hard volume cap. The shift went from optimizing flow to managing inventory. We moved from trying to slow down the script to realizing we needed to change the engine (model) entirely. The error message was the map, but we had to strip away the “helpful” truncation to read the coordinates.

4. Compare

Before (The Fog of War): You were running blind, relying on general documentation and assumptions. The script would hang in infinite backoff loops or crash silently, leaving you wondering if it was a network glitch or a ban. You were dependent on the “black box” behavior of the API.
After (The Sovereign Stance): You have exposed the wires. You know exactly why it fails (Quota ID: GenerateRequestsPerDayPerProjectPerModel-FreeTier). You have a script that handles keys dynamically, times every transaction, and reports the exact token cost. You have turned an opaque failure into a transparent metric.

5. Counter-Factualize

What if you ignored this? You would likely deploy a production workflow relying on gemini-2.5-flash-lite for a high-volume task, enticed by the “Lite” branding. You would hit the 20-request wall instantly. You would waste hours debugging network connectivity or implementing complex retry logic, never realizing that the door isn’t jammed—it’s just welded shut after 20 visitors. You would build a castle on a foundation of quicksand.

6. Judge

This work validates the core tenant of the “Forever Machine”: Trust Code, Not Docs. Documentation lags; code executes. By forcing the system to reveal its limits through empirical testing and raw error logging, we have reclaimed agency. The contextualizer.py script is now a durable asset in your repository—a truth-teller in an ecosystem of shifting sand. The value isn’t just in processing the articles; it’s in knowing exactly where the walls are so you can build right up to them, but never crash into them again.

The Sovereign Verdict: A Philosophy of Code-Driven Discovery

Next Step: Would you like me to update make_article.py or your primary documentation to explicitly warn about the 20 RPD limit on gemini-2.5-flash-lite so you don’t fall into this trap in future projects?

Me: No. I think I’ll check if I can now still articleize this using a different model with one of those same keys, haha!

NOTE: Yep, I was able to publish this article using a same API key but a different model. So the all-important rate per day (RPD) quota is on a per-model basis. Use it up on one model, switch to the next!

Book Analysis

Ai Editorial Take

This entry is a pivotal example of practical problem-solving in the volatile landscape of AI services. It’s not just about a specific API limit; it’s a profound exploration of managing dependency and uncertainty. The detailed error analysis, the implementation of dynamic API key management, and the integration of precise metrics like token counts and request durations elevate this beyond a simple troubleshooting guide. This is an important to know methodology for any developer seeking to build resilient systems in the Age of AI, a true blueprint for navigating the shifting sands of cloud offerings.

Title Brainstorm

Title Option: Cracking Google Gemini’s Hidden 20 RPD Free Tier Limit with Python
- Filename: cracking-google-gemini-hidden-20-rpd-free-tier-limit-python
- Rationale: Directly addresses the main discovery and the tool used, making it highly searchable and impactful.
Title Option: Google Gemini Free Tier: The Unforeseen 20 RPD Quota and How to Map It
- Filename: google-gemini-free-tier-unforeseen-20-rpd-quota-map
- Rationale: Highlights the surprise element and the methodology of mapping the limits.
Title Option: Beyond the Docs: Exposing Gemini API Limits with Python Forensics
- Filename: beyond-docs-exposing-gemini-api-limits-python-forensics
- Rationale: Emphasizes the ‘trust code, not docs’ theme and the diagnostic nature of the work.
Title Option: Digital Cartography: Mapping Google Gemini’s True Free Tier Boundaries
- Filename: digital-cartography-mapping-google-gemini-true-free-tier-boundaries
- Rationale: Uses the ‘Digital Cartography’ metaphor, aligning with the philosophical tone of the ‘Sovereign Verdict’.

Content Potential And Polish

Core Strengths:
- Illustrates a real-world, highly practical problem faced by developers on free tiers.
- Provides an actionable, robust Python script that solves the problem and offers future-proof error handling.
- Demonstrates a clear, iterative debugging process, showing how assumptions were challenged and new insights gained.
- Emphasizes the crucial ‘trust code, not docs’ principle, a foundational concept for sovereign computing.
- Integrates API key management and detailed logging (token count, duration) for enhanced transparency and control.
Suggestions For Polish:
- Consider adding a small section on how free-tier limits can change without notice, reinforcing the need for continuous vigilance.
- Potentially include a brief note on how a paid tier might alter these limits, for comparison and decision-making.
- For a book, integrating a ‘challenge’ or ‘exercise’ for the reader to adapt the script to another API to discover its limits could be valuable.

Next Step Prompts

Develop a generalized api_limit_mapper.py script that takes an API endpoint and a set of test requests, automatically cycling through models/keys to empirically determine RPM/RPD/TPM limits and outputting a comprehensive report.
Expand the contextualizer.py to include a dynamic model selector based on available RPD/RPM limits, allowing the script to automatically switch to the most efficient model for a large batch run based on a pre-defined ‘limit profile’ JSON.

Setting the Stage: Context for the Curious Book Reader

Technical Journal Entry Begins

The Initial Hypothesis and the Critical Version Trap

The “1.5 vs 2.5” RPD Trap

Code Critique & Fixes

1. Fix the Model Name

2. Implement “Backoff” Handling

3. Add a Safety Limit

Summary Table: Will it work?

The Script (articleizer.py)

Estimated Run Time

Step 1: Run this “List Models” Diagnostic

Step 2: The likely fix (Switch to 2.5)

From 1.5-flash to 2.5-flash-lite: Navigating Deprecation and Hidden Limits

Recommended Fix: Use “Lite”

Strategic Plan (The “Shoot and See” Approach)

Architecting Dynamic API Key Management

Key Changes

The Updated Script (contextualizer.py)

Migration Step (One-Time)

Enhancing Visibility: Integrating Token Counts and Request Timing

The Updated Script (contextualizer.py)

The Diff

1. Categorize

2. Mental Replay

3. Reflect

4. Compare

5. Counter-Factualize

6. Judge

The Sovereign Verdict: A Philosophy of Code-Driven Discovery

Book Analysis

Ai Editorial Take

Title Brainstorm

Content Potential And Polish

Next Step Prompts

The Script (`articleizer.py`)

From `1.5-flash` to `2.5-flash-lite`: Navigating Deprecation and Hidden Limits

The Updated Script (`contextualizer.py`)

The Updated Script (`contextualizer.py`)