MIKE LEVIN AI SEO

Future-proof your skills with Linux, Python, vim & git as I share with you the most timeless and love-worthy tools in tech through my two great projects that work great together.

Working on Alternate Headline Leads to Installing Jekyll Local

I'm a 52-year-old working for MOZ SEO Software, using AI to capture ideas, distill them, and turn them into something accessible for all. I'm also using AI to edit my videos and stand out from the 8 billion other people in the world. Recently, I've been writing a program to format blog posts by running a Linux git command that hits the OpenAI API for topics, meta descriptions and better headlines. You are looking at the results.

AI Powers My Content Production: Installing Jekyll Local to Diagnose Liquid Template Issues

By Michael Levin

Sunday, April 16, 2023

What if my goal was to be someone AIs learned from?

Hitting my stride and then keeping tension in the machine, those are my active metaphors right now. We’re living the rise of AI. I’m 52 years old. I just started a new job with MOZ SEO Software. I need to cash in on all the investing in myself I have done over the years. And now I’ve got an AI… Ho, Ho, Ho.

My goal on a daily basis is to use AI to realize the untapped potential of all the content I've already produced, to make what should seem like impossibly rapid progress and advancement based on that experience, and to turn it into something of a daily magic show online.

My new content covers the process I'm using to do that, so it becomes somewhat meta. Even though it's heady and often abstract material, or conversely very in-the-weeds and technical, I'm still going to try to make it accessible to the seekers, sojourners, pilgrims, newbs, kids, the technologically disenfranchised and others seeking an illuminated old-school approach to tech that values future-proofing and craftsmanship.

State this stuff better than you ever have before. Make full use of idea capture and your journal, but make that only the wide mouth of your funnel. Use AI to do the distillation and find the best bits to put production effort into. Then kickstart that stuff knowing it will appeal, and find ways to kickstart efficiently so you can do it often. Avoid video editing, but feel free to flesh out scripts and record YouTube timestamps of interest. Let AI edit your video. Just facilitate it in baby steps.

I am essentially rebooting myself, repeatedly tweaking my startup sequence until I’m happy with the process. Market your damn self. It’s the best thing you can do now in the rise of AI. Demonstrate how I can’t learn from AI quite so much as AI can learn from me.

If I could use AI in the future, the first thing that I'd like to do is to share every thought till eternity passes, so I transcend them with you.

Actually, I'm already using AI. And I'm starting to get my practice on. Do it while they're young. They're only going to be at this stage once. This is your one fleeting moment to get noticed by the AIs and differentiate yourself from the other 8 billion like me.

Wedge in an alternate headline request to OpenAI. Automatically insert the subheads on the site.

I’ve documented the kmeans approach to clustering plenty. No need to keep it in my chopchop.py file. Go edit it out. Make it easier to look at…

Pshwew! A whole lot of debugging going on. Asking AI for keywords and such can produce better results, but also more glorious mistakes.

Todo:

It's really quite interesting thinking through that the only real "primary key" I'm using is the slug. It's like a URL, which makes a good primary key because URLs are unique by design, and unique is exactly what a primary key must be.
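To make that concrete, here's a rough, pure-Python stand-in for the python-slugify library that chopchop.py actually imports (the real library handles far more edge cases); it shows the title-to-slug step and the slug doubling as a dictionary key:

```python
import re

def simple_slugify(title):
    """A simplified sketch of what the slugify library does:
    lowercase the title and collapse every run of non-alphanumeric
    characters into a single hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# The slug then serves as the primary key for every per-post lookup.
post_index = {}
slug = simple_slugify("Installing Jekyll Local!")
post_index[slug] = "post body..."
print(slug)  # installing-jekyll-local
```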

The alternative headlines are spitting out of OpenAI now, but I would never swap them in automatically, because that would change the URLs. When I do change a headline, though, the summary gets written again, and the meta description, and so on. It's like re-submitting the whole please-write-for-me AI request. And that's a good thing.
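That cascade falls out of the script's odb() caching pattern. Here's a stripped-down, in-memory sketch of it (the real version persists to SqliteDict databases, and the function names here are illustrative): because everything is keyed by slug, a new headline means a new slug, so every cached artifact regenerates.

```python
# In-memory stand-in for the slug-keyed OpenAI result cache.
cache = {}

def cached_call(slug, expensive_func, text):
    """Return (result, api_hit): only hit the API when the slug is new."""
    if slug not in cache:
        cache[slug] = expensive_func(text)
        return cache[slug], True
    return cache[slug], False

fake_api = lambda text: f"summary of: {text}"

_, hit1 = cached_call("old-headline", fake_api, "post body")
_, hit2 = cached_call("old-headline", fake_api, "post body")  # cached
# Changing the headline changes the slug, so the lookup misses
# and the summary, meta description, etc. all get regenerated.
_, hit3 = cached_call("new-headline", fake_api, "post body")
print(hit1, hit2, hit3)  # True False True
```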

Ugh, now I'm having all these Jekyll rendering problems, and I'm finally installing Jekyll locally to see if I can diagnose it without GitHub.

https://jekyllrb.com/docs/installation/ubuntu/

Ugh! The rabbit hole is too deep. I got Jekyll installed and a Jekyll site hosted on http://127.0.0.1:4000/ but it’s not rendering my site.

Ugh, this is nuts. How do you diagnose? Comment and edit everything out and systematically start putting it back in.

Ugh, it was a real setback tonight with so much going out of whack. Let that be a reminder to myself of how fragile this whole thing is. But that's coding. You simply can't make mistakes or it stops working, and too many compounded mistakes make diagnosing a nightmare.

What are my findings? In Liquid templates, favor page.attribute over post.attribute. Nothing seems to be available on post.attribute alone, even though that's what I would have imagined was populated by the front matter of the files in the _posts directory.

Most of the trouble has to do with Jekyll's very particular date format. I'm going to try not writing it into the front matter, but instead use the default post.date, which is a case where post.attribute is okay. It's not referring to front-matter attributes, but rather to some sort of post object.
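Jekyll only recognizes posts whose filenames start with a strict YYYY-MM-DD date, so a loose journal date line has to be normalized first. chopchop.py uses dateutil's flexible parser for this; the same idea can be sketched with the stdlib alone when the input shape is known:

```python
from datetime import datetime

# Normalize a human-readable journal date line into the strict
# YYYY-MM-DD string Jekyll expects in _posts filenames.
raw = "Sunday, April 16, 2023"
adate = datetime.strptime(raw, "%A, %B %d, %Y").date()
date_str = adate.strftime("%Y-%m-%d")
filename = f"{date_str}-post-0001.md"
print(filename)  # 2023-04-16-post-0001.md
```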

I'm trying now with very little top matter and no headlines, bylines or dates. Only title, slug and permalink are in the top matter.
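For the record, here's roughly what that finding looks like in a Jekyll layout. This is a sketch rather than my actual layout file, and the field names are just the ones my top matter emits:

```liquid
<!-- Inside a post layout, the current post is exposed as `page`,
     so front matter resolves through page.attribute: -->
<h1>{{ page.title }}</h1>
<p>{{ page.date | date: "%Y-%m-%d" }}</p>

<!-- A bare `post` object only exists inside loops: -->
{% for post in site.posts %}
  <a href="{{ post.url }}">{{ post.title }}</a>
{% endfor %}
```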

And that made all the pages come back. Now add things back. Okay, got it. Here's the chopchop.py file:

# Author: Mike Levin
# Date: 2023-04-15
# Description: Chop a journal.md file into individual blog posts.
#   ____ _                  ____ _
#  / ___| |__   ___  _ __  / ___| |__   ___  _ __
# | |   | '_ \ / _ \| '_ \| |   | '_ \ / _ \| '_ \
# | |___| | | | (_) | |_) | |___| | | | (_) | |_) |
#  \____|_| |_|\___/| .__/ \____|_| |_|\___/| .__/
#                   |_|                     |_|

import os
import re
import sys
import html
import shlex
import openai
import datetime
import argparse
import pandas as pd
from time import sleep
from retry import retry
from pathlib import Path
from slugify import slugify
from pyfiglet import Figlet
from dateutil import parser
from subprocess import Popen, PIPE
from sqlitedict import SqliteDict as sqldict


AUTHOR = "Mike Levin"

# Debugging
DISABLE_GIT = False
POST_BY_POST = False
DATE_FORMAT = "%Y-%m-%d"

# Load function early so we can start showing figlets.
def fig(text):
    """Print a figlet."""
    f = Figlet()
    print(f.renderText(text))
    sleep(0.5)


fig("ChopChop")

#  ____                          _
# |  _ \ __ _ _ __ ___  ___     / \   _ __ __ _ ___
# | |_) / _` | '__/ __|/ _ \   / _ \ | '__/ _` / __|
# |  __/ (_| | |  \__ \  __/  / ___ \| | | (_| \__ \
# |_|   \__,_|_|  |___/\___| /_/   \_\_|  \__, |___/
#                                         |___/

# Define command line arguments
aparser = argparse.ArgumentParser()
add_arg = aparser.add_argument

# Example:
# python ~/repos/skite/chopchop.py -f /mnt/c/Users/mikle/repos/hide/MikeLev.in/journal.md

# Use in a vim or NeoVim macro from .vimrc or init.vim like this:
# let @p = ":execute '!python ~/repos/skite/chopchop.py -f ' . expand('%:p')"
# Or in interactive mode in NeoVim using its :terminal command:
# let @p = ":terminal 'python ~/repos/skite/chopchop.py -f ' .expand('%:p')"

add_arg("-f", "--full_path", required=True)
add_arg("-a", "--author", default=AUTHOR)
add_arg("-b", "--blog", default="blog")
add_arg("-o", "--output", default="_posts")

# Parse command line args as CONSTANTS
args = aparser.parse_args()
BLOG = args.blog
OUTPUT = args.output
AUTHOR = args.author
FULL_PATH = args.full_path

# Parse full path into path, repo, and file
parts = FULL_PATH.split("/")
REPO = parts[-2] + "/"
fig(REPO)  # Print the repo name
FILE = parts[-1]
PATH = "/".join(parts[:-2]) + "/"
GIT_EXE = "/usr/bin/git"
OUTPUT_PATH = f"{PATH}{REPO}{OUTPUT}"
REPO_DATA = f"{PATH}{REPO}_data/"

# OpenAI Databases
SUMDB = REPO_DATA + "summaries.db"
DESCDB = REPO_DATA + "descriptions.db"
TOPDB = REPO_DATA + "topics.db"
HEADS = REPO_DATA + "headlines.db"

# Print out constants
print(f"REPO: {REPO}")
print(f"FULL_PATH: {FULL_PATH}")
print(f"PATH: {PATH}")
print(f"FILE: {FILE}")

# Create output path if it doesn't exist
Path(OUTPUT_PATH).mkdir(parents=True, exist_ok=True)
Path(REPO_DATA).mkdir(parents=True, exist_ok=True)

with open("/home/ubuntu/repos/skite/openai.txt") as fh:
    # Get OpenAI API key
    openai.api_key = fh.readline().strip()  # strip the trailing newline

# Delete old files in output path
for fh in os.listdir(OUTPUT_PATH):
    delete_me = f"{OUTPUT_PATH}/{fh}"
    os.remove(delete_me)

#  ____        __ _              _____                 _   _
# |  _ \  ___ / _(_)_ __   ___  |  ___|   _ _ __   ___| |_(_) ___  _ __  ___
# | | | |/ _ \ |_| | '_ \ / _ \ | |_ | | | | '_ \ / __| __| |/ _ \| '_ \/ __|
# | |_| |  __/  _| | | | |  __/ |  _|| |_| | | | | (__| |_| | (_) | | | \__ \
# |____/ \___|_| |_|_| |_|\___| |_|   \__,_|_| |_|\___|\__|_|\___/|_| |_|___/


def parse_journal(full_path):
    """Parse a journal file into posts. Returns a generator of posts."""
    with open(full_path, "r") as fh:
        print(f"Reading {full_path}")
        post_str = fh.read()
        pattern = r"-{78,82}\s*\n"
        posts = re.split(pattern, post_str)
        number_of_posts = len(posts)
        fig(f"{number_of_posts} posts")
        posts.reverse()  # Reverse so article indexes don't change.
        for post in posts:
            yield post


def write_post_to_file(post, index):
    """Write a post to a file. Returns a markdown link to the post."""

    # Parse the post into lines
    lines = post.strip().split("\n")
    date_str, slug = None, None
    top_matter = ["---"]
    content = []
    in_content = False
    api_hit = False

    for i, line in enumerate(lines):
        if i == 0:
            # First line is always the date stamp.
            filename_date = None
            if "#" not in line:
                # Even date-lines must get a markdown headline hash
                return
            # Parse the date from the line
            date_str = line[line.rfind("#") + 1 :].strip()
            # Parse the date into a datetime object
            adate = parser.parse(date_str).date()
            # Format the date into a string
            date_str = adate.strftime(DATE_FORMAT)
            # Format the date into a filename
            top_matter.append(f"date: {date_str}")
        elif i == 1:
            # Second line is always the title for headline & url
            if line and line[0] == "#" and " " in line:
                title = " ".join(line.split(" ")[1:])
                title = title.replace(":", "")
            else:
                return
            # Turn title into slug for permalink
            slug = slugify(title.replace("'", ""))
            top_matter.append(f'title: "{title}"')
            top_matter.append(f"slug: {slug}")
            top_matter.append(f"permalink: /{BLOG}/{slug}/")
        else:
            # Subsequent lines are either top matter or content
            if not line:
                # Blank line marks the end of the top matter
                in_content = True
            if in_content:
                content.append(line)
            else:
                # Top matter
                pass
    # Create the file name from the date and index
    file_name = f"{date_str}-post-{index:04}.md"
    out_path = f"{OUTPUT_PATH}/{file_name}"

    # Initialize per-post variables
    summary = None
    meta_description = None
    keywords = None
    topics = None

    # The OpenAI work is done here
    summary, api_hit = odb(SUMDB, write_summary, slug, post)
    meta_description, api_hit = odb(DESCDB, write_meta, slug, summary)
    topic_text = f"{title} {meta_description} {summary}"
    topics, api_hit = odb(TOPDB, find_topics, slug, topic_text)
    topics = fix_openai_mistakes(topics)
    headline, api_hit = odb(HEADS, write_headline, slug, topic_text)

    # Write top matter
    #if topics:
    #    top_matter.append(f"keywords: {topics}")
    #    top_matter.append(f"category: {topics.split(', ')[0][1:-1]}")
    meta_description = html.escape(meta_description)
    top_matter.append(f'description: "{meta_description}"')
    ##top_matter.append(f'subhead: "{headline}"')
    #top_matter.append(f"layout: post")
    top_matter.append(f"author: {AUTHOR}")
    top_matter.append("---")
    top_matter.extend(content)
    content = top_matter

    # Write to file
    with open(out_path, "w") as f:
        # Flatten list of lines into a single string
        flat_content = "\n".join(content)
        f.writelines(flat_content)
    link = f'<li><a href="/{BLOG}/{slug}/">{title}</a> ({date_str})<br />{meta_description}</li>'
    print(f"Chop {index} {out_path}")
    if POST_BY_POST and api_hit:
        print()
        print(f"Title: {title}")
        print(f"Headline: {headline}")
        print()
        input("Press Enter to continue...")
        print()

    return link


def git(cwd, line_command):
    """Run a Linux git command."""
    cmd = [GIT_EXE] + shlex.split(line_command)
    print(f"COMMAND: <<{shlex.join(cmd)}>>")
    process = Popen(
        args=cmd,
        cwd=cwd,
        stdout=PIPE,
        stderr=PIPE,
        shell=False,
        bufsize=1,
        universal_newlines=True,
    )
    flush(process.stdout)
    flush(process.stderr)


def flush(std):
    """Flush a stream."""
    for line in std:
        line = line.strip()
        if line:
            print(line)
            sys.stdout.flush()


#   ___                      _    ___   _____
#  / _ \ _ __   ___ _ __    / \  |_ _| |  ___|   _ _ __   ___ ___
# | | | | '_ \ / _ \ '_ \  / _ \  | |  | |_ | | | | '_ \ / __/ __|
# | |_| | |_) |  __/ | | |/ ___ \ | |  |  _|| |_| | | | | (__\__ \
#  \___/| .__/ \___|_| |_/_/   \_\___| |_|   \__,_|_| |_|\___|___/
#       |_|
# OpenAI Functions


def odb(DBNAME, afunc, slug, full_text):
    """Record OpenAI API hits in a database."""
    api_hit = False
    with sqldict(DBNAME) as db:
        if slug not in db:
            result = afunc(full_text)  # Hits OpenAI API
            db[slug] = result
            db.commit()
            api_hit = True
        else:
            result = db[slug]
    return result, api_hit


@retry(Exception, delay=1, backoff=2, max_delay=60)
def find_topics(data):
    """Returns top keywords and main category for text."""
    print("Hitting OpenAI API for: topics")
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=(
            f"Create a list of keywords for the following text:\n\n{data}\n\n...in order to categorize the blog post. "
            "Do not use extremely broad words like Data, Technology, Blog, Post or Author "
            "Use the best keyword for a single-category topic-label as the first keyword in the list. "
            "Format as 1-line with keywords in quotes and separated by commas. "
            "\nKeywords:\n\n"
        ),
        temperature=0.5,
        max_tokens=100,
        n=1,
        stop=None,
    )
    topics = response.choices[0].text.strip()
    return topics


@retry(Exception, delay=1, backoff=2, max_delay=60)
def write_meta(data):
    """Write a meta description for a post."""
    print("Hitting OpenAI API for: meta descriptions")
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=(
            f"Write a concise and informative meta description for the following text:\n{data}\n\n"
            "...that will entice readers to click through to the blog post. "
            "Write from the perspective of the author. Never say 'The author'. Say 'I am' or 'I wrote'"
            "Always finish sentences. Never chop off a sentence. End in a period."
            "\nSummary:\n\n"
        ),
        temperature=0.5,
        max_tokens=100,
        n=1,
        stop=None,
    )
    meta_description = response.choices[0].text.strip()
    return meta_description


@retry(Exception, delay=1, backoff=2, max_delay=60)
def write_headline(data):
    """Write a better headlie for post."""
    print("Hitting OpenAI API for: better headlines")
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=(
            f"Write a short alternative headline for the following post:\n{data}\n\n"
            "Don't be reduntant with the first line of the blog post. "
            "Use only one sentence. "
            "Write from the perspective of the author. Never say 'The author'. Say 'I am' or 'I wrote'"
            "\nHeadline:\n\n"
        ),
        temperature=0.5,
        max_tokens=100,
        n=1,
        stop=None,
    )
    headline = response.choices[0].text.strip()
    return headline


@retry(Exception, delay=1, backoff=2, max_delay=60)
def write_summary(text):
    """Summarize a text using OpenAI's API."""
    chunks = chunk_text(text, chunk_size=4000)
    summarized_text = ""
    for chunk in chunks:
        response = openai.Completion.create(
            engine="text-davinci-002",
            prompt=(f"Please summarize the following text:\n{chunk}\n\n" "Summary:"),
            temperature=0.5,
            max_tokens=100,
            n=1,
            stop=None,
        )
        summary = response.choices[0].text.strip()
        summarized_text += summary
        summarized_text = " ".join(summarized_text.splitlines())
    return summarized_text.strip()


def chunk_text(text, chunk_size=4000):
    """Split a text into chunks of a given size."""
    chunks = []
    start_idx = 0
    while start_idx < len(text):
        end_idx = start_idx + chunk_size
        if end_idx >= len(text):
            end_idx = len(text)
        chunk = text[start_idx:end_idx]
        chunks.append(chunk)
        start_idx = end_idx
    return chunks


def fix_openai_mistakes(kwds):
    """Fix some common mistakes OpenAI makes."""
    # OpenAI might put kwds inside the quotes instead of outside.
    if ',"' in kwds:
        kwds = kwds.split('," "')
        kwds = [x.replace('"', "") for x in kwds]
        kwds = [f'"{x}"' for x in kwds]
        kwds = ", ".join(kwds)
    if "\n" in kwds:
        kwds = kwds.replace("\n", ", ")
    return kwds


#  ____  _ _                _                              _
# / ___|| (_) ___ ___      | | ___  _   _ _ __ _ __   __ _| |
# \___ \| | |/ __/ _ \  _  | |/ _ \| | | | '__| '_ \ / _` | |
#  ___) | | | (_|  __/ | |_| | (_) | |_| | |  | | | | (_| | |
# |____/|_|_|\___\___|  \___/ \___/ \__,_|_|  |_| |_|\__,_|_|
fig("Slice Journal")


# Parse the journal file
posts = parse_journal(FULL_PATH)
links = []
for i, post in enumerate(posts):
    link = write_post_to_file(post, i + 1)
    if link:
        links.insert(0, link)

# Add countdown ordered list to index page
links.insert(0, f'<ol start="{len(links)}" reversed>')
links.append("</ol>")
# Write index page
index_page = "\n".join(links)
with open(f"{PATH}{REPO}_includes/post_list.html", "w", encoding="utf-8") as fh:
    fh.writelines(index_page)

if not DISABLE_GIT:
    # Git commands
    fig("Git Push")
    here = f"{PATH}{REPO}"
    git(here, "add _posts/*")
    git(here, "add _includes/*")
    git(here, "add assets/images/*")
    git(here, f'commit -am "Pushing {REPO} to Github..."')
    git(here, "push")

fig("Done")
