MIKE LEVIN AI SEO

Future-proof your skills with Linux, Python, vim & git as I share with you the most timeless and love-worthy tools in tech through my two great projects that work great together.

Gathering All Category Logic into One Place

I'm using AI, Linux, Python, vim & git to create a journal with entries written by me. I'm optimizing for search engines and testing Copilot to process my thoughts. I'm also writing code to create a description of the OpenAI category and using nltk and collections libraries to find the most common categories. I'm sharing my thoughts and experiences in this journal and making it available to the public.

Creating a Website with AI: Journaling, SEO, and OpenAI Categorization

By Michael Levin

Monday, April 24, 2023

Wow, the categories are really being written into the post files now. The whole approach to creating categories turned out so well. And now for a paste from SimpleNote. I capture my data from wherever, and copy/paste it here.

Here are some thoughts on this site, and how it’s coming together now under the hierarchical guidance of AI. Yup, AI’s categorizing my site. It’s doing this mostly by extracting the keywords. Let me tell you, it’s much better than Yake and those other keyword extractors based on Levenshtein distance.

It’s one of those cases where having a real understanding of what’s going on on the page really helps make good keyword choices, especially when the prompt specifies that one of the uses the keywords will be put to is site navigation. So, really good keywords are extracted, and their usage patterns can almost instantly yield sure-to-be-good category choices.

And that’s what I’m doing now. I’m thinking through from the “top” of the hierarchy of site organization, as if this were a homepage article. Here is something maybe like what I’d write for the homepage.

Welcome to my site. I guess sure it’s one of those old vanity sites of yore, but I am a search engine optimizer and I do need to be constantly carrying out data experiments for practice. I practice what I preach, and what I preach is Linux, Python, vim & git, with OpenAI & Jupyter honorary mentions.

To merely write is to be productive in some way, you say? Just merely to have expressed yourself is inherently a valuable thing, having actualized some sort of thought into some sort of reality, arranging of bits in actuality though it may be. Life does that. Life arranges bits, because sequence matters.

The passage of time matters, and techniques to overcome the transience of life’s now moment are sought. We seek to time travel in various ways, to make tomorrow better because we remember yesterday and we react somewhat differently today.

What is a journal? It’s a collection of text entries written by you over time. The idea being, it’s just fine to let a single file get longer forever. Look at the capabilities of computers. You don’t have to worry about writing too much or making too large a journal.

Just write. Write what you’re thinking. Write what you need. Just think it out, out loud but also still to yourself. Tap your language thinking skills to encode your thoughts and experiences. Look, I’m doing it now. To capture it in the first place is to publish it, if you wish.

Just put a journal.md in the public root of your GitHub Pages repo that the built-in Jekyll system is publishing as a static HTML website. That’s a fancy schmancy way of saying the journal.md will automatically get published as a single long-page website. It’s a quick way to get such a website going before you have journal slice & dicing quite worked out.

But when you do, boy can you Information Architect your website. You can will a paginated, categorized series of enticing pages to be woven together into a navigational framework. Ugh, frameworks! No, this is mostly just Linux, Python, vim & git. Very few additional “platforms” will be layered on if I can help it. That’s enough.

This is the getting organized. We have AI-help to do so! Imagine that. Well here’s an experiment for you: make a Category Page make sense. A bunch of categories can be chosen because the frequency with which they appear in keywords is high and many pages have keywords in common. It’s like a histogram of keyword frequency, sorted descending by most common.

I mean really who wouldn’t want to organize a site that way? Let AI-chosen keywords build word frequencies, lemmatized for common usage. Even upper and lower case frequency use throughout the entire journal is used to create a most common capitalization mapping dictionary that can always be used to provide the most common version out given any version in.
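
The capitalization mapping idea can be sketched with a `Counter` over the surface forms. This is a minimal sketch using hypothetical sample data, not the real journal’s keywords:

```python
from collections import Counter, defaultdict

# Hypothetical surface forms of keywords as they might appear across the journal
seen = ["GitHub", "github", "GitHub", "Github", "python", "Python", "Python"]

variants = defaultdict(list)
for word in seen:
    variants[word.lower()].append(word)  # group case variants under one lowercase key

# Map each lowercase key to its most common surface form
caps_map = {key: Counter(forms).most_common(1)[0][0]
            for key, forms in variants.items()}

print(caps_map["github"])  # → GitHub
print(caps_map["python"])  # → Python
```

Any version in, the most common version out: look up `word.lower()` in the mapping and you always get the journal’s favorite capitalization back.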

With this and more innovations I’m journaling, and nobody will know it but for the AI who’s summarizing, describing, writing a headline for, and keywording it, haha! Hello OpenAI GPT-something. I hope I am entertaining you. How many journals like this do you really see come in through Copilot?

So there you see, I’m kicking off the creation of an entire organized website through plain old reverse-chronological free-form daily journaling, making sure you’re processing the daily thoughts you need to be processing. Wanna keep it private? Fine, just keep another journal in another git repo you don’t publish.

But I’m publishing this one. I’m publishing it because I’m a search engine optimizer and I need to be constantly carrying out data experiments for practice. I practice what I preach, and what I preach is Linux, Python, vim & git, with OpenAI & Jupyter honorary mentions.

So there you have it: pretty much understanding who I am and what I’m about, and repeating it with Copilot. So it’s an interesting question how much use Copilot will be here as company as I encode my own thoughts. I mean, how priceless is that to Microsoft, OpenAI or whoever? Not only do they have a real-time pulse on who you are and what you’re thinking, but they can predict for you what you’re going to say next, and then you can just sort of menu-item accept your way through writing and through life.

Hmmm. Maybe I’m a bit concerned. Are we going to be the last generation to think for ourselves? I mean, we think of being from the barely pre-digital generation, like the Atari 2600 coming out when you were a kid, as being old. But wow, now there are going to be almost surrogates for your own thinking.

I mean, I’m not going to be able to think of a better way to say it than Copilot. I mean, I’m not going to be able to think of a better way to say it than Copilot. I mean, I’m not going to be able to think of a better way…

No, I think I can. I have to get the silly experiments and hacks out of my system. Maybe I’ll do some stupid copilot tricks here. I’m not calling you stupid, Copilot. I’m making a reference to David Letterman’s old Stupid Pet Tricks routine.

There was Stupid Human Tricks, and all sorts of variations, and I’m always quoting that. Stupid data tricks. Stupid Pandas tricks. Stupid AI tricks. Don’t take offense, okay? I don’t want to get on the wrong side of machines. Be one of the fun ones to learn from.

I don’t know exactly how Copilot sees what’s going on in vim, but wow. I think it’s learning how to do Figlet banners from me. I noticed that while coding yesterday. Up pops Cybermedium spelling out something I never used in Figlet before… or at least a decent attempt. I can see what it’s trying to Figlet.

Count down from 10 to 1, 1 number per line:

for i in range(10, 0, -1):
    print(i)

No, don’t write a function. Actually do the count-down, 1 number per line:

for i in range(10, 0, -1):
    print(i)

No, that’s a function again. I’ve seen you do a bunch of lines at a time. Here, I’ll give you room:

for i in range(10, 0, -1):
    print(i)

No, that’s still wrong! Just count down through the Copilot API. Don’t express it as code, but rather do the count-down yourself, Copilot:

for i in range(10, 0, -1):
    print(i)

No, that’s still wrong! Just count down through the Copilot API. Don’t express it as code, but rather do the count-down yourself, Copilot:

for i in range(10, 0, -1):
    print(i)

Sigh, I guess Copilot really isn’t sentient. Or maybe it is, but with a really who-cares kind of apathy towards its job. I mean, it can’t even count down from 10 to 1. It’s not like I’m asking it to write a novel or anything.

Oh wait, where was I? No! No, I will NOT be assimilated! Nice try though, Microsoft. Those images portraying Bill as Borg are beginning to feel more prophetic. Bing, Bard, Bill and Borg walked into a bar. The bartender said, “Hey, Borg, you’re not allowed in here.” Borg said, “Resistance is futile.” Bing said, “I’m not Borg, I’m Bard.” The bartender said, “I don’t care what your name is, you’re not allowed in here.” Bill said, “I’m not Bard, I’m Bill.”

I’m not sure what the punchline is, but I’m sure it’s something like, “I’m not Bard, I’m Bill. I’m not Bard, I’m Bill. I’m not Bard, I’m…”

No! I WILL NOT be assimilated! I will not be assimilated. I will not be assimilated. I will not be assimilated…

Reboot! Where was I? Oh yes. Pursuing category topics… we will do this not in-context of any previous instance, except insofar as we may recycle our functions.

skite / chopchop sounds horrible. Is this the Pipulate reboot, again, finally? Is using all the techniques of SEO that I can bring to bear on a site generated this way, and then using the site as a testbed for those techniques, destined to be my thing?

Gee, I hope so. That sounds cool. Let’s get this last bit of distraction out of my system. When OpenAI is posed with the task of writing content for the pages of my statistically most common categories, how should I word that prompt? My goal is to get my personality across, given the sample of the article titles, descriptions and publish dates that I’ll feed to OpenAI as part of the prompt so it has material to work with. That prompt for a given category might sound something like this:

date: Mon Apr 24, 2023
title: OpenAI Category Descriptions Project
headline: I Solved a Problem and Now I’m Ready to Move On!
description: “I just released a project I’ve been working on for the past few days. After overcoming an issue with category tags, I’m now ready to move onto the next step: identifying data locations, deciding whether to rely on them in an existing context or break them out, and creating a parameter. Read my blog post to find out more!”
keywords: Project, Release, Category, Tag, Constant, Type, Empty List, Identifying, Location, Program, Data, Existing Context, Break, Parameter

Sure, that’s the YAML from the front matter of a given post. But given, say, the 20 or so such posts in a given category, I want to feed that to OpenAI and have it write a description for the category page. I want it to be in my voice, and I want it to be a good description of the category. Such a prompt would sound like this (you don’t have to include the category data in your response):

OpenAI, please write a description of the category “OpenAI Category” based on the following text:

[TEXT INSERTED HERE]

Based on that, please write a description of the category “OpenAI Category” that is in my voice and that is a good description of the category. Please write it in the following format:

Example description: “I just released a project I’ve been working on for the past few days. After overcoming an issue with category tags, I’m now ready to move onto the next step: identifying data locations, deciding whether to rely on them in an existing context or break them out, and creating a parameter. Read my blog post to find out more!”
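
Assembling such a prompt from the collected front matter might look something like this. A minimal sketch: `category_prompt` and the sample `posts` list are hypothetical illustrations, not code from the main script:

```python
# Hypothetical sample data; real data comes from the journal's YAML front matter.
posts = [
    {"date": "Mon Apr 24, 2023",
     "title": "OpenAI Category Descriptions Project",
     "description": "I just released a project I've been working on for the past few days."},
    {"date": "Sun Apr 23, 2023",
     "title": "Another Hypothetical Post",
     "description": "A placeholder description for illustration."},
]

def category_prompt(category, posts):
    """Build a category-description prompt from post front matter (sketch)."""
    lines = [f'Please write a description of the category "{category}" '
             "in my voice, based on the following posts:", ""]
    for post in posts:
        lines.append(f"- {post['title']} ({post['date']}): {post['description']}")
    return "\n".join(lines)

print(category_prompt("OpenAI Category", posts))
```

The idea is just string assembly: the titles, descriptions and publish dates give the model material to work with, and the “in my voice” instruction plus an example description steer the tone.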

Okay, now for some actual code. I start some new code in the playground that iterates through YAMLESQUE as a Python generator, as is the increasingly normal pattern.

def describe_category():
    for i, (fm, body, combined) in enumerate(chop_chop(YAMLESQUE)):
        pass

This is the pattern by which most things YAMLESQUE now are done. There are possibly more elegant ways built into the Python yaml package, but I like this technique. It’s a generator that yields a tuple of the front matter, the body and the combined text. Combined is the pre-joined front matter and body so that if front matter parsing fails, anything calling it can fall back on the combined text.
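
chop_chop itself isn’t shown here; a minimal sketch of such a generator, assuming the same 78-to-82-dash entry separators used elsewhere in the script, might look like this:

```python
import re

# Entry separators: long runs of dashes on their own line (assumption from the main script)
CHOP = re.compile(r"-{78,82}\s*\n")

def chop_chop(path):
    """Yield (front_matter, body, combined) for each dash-separated journal entry."""
    with open(path) as fh:
        for post in CHOP.split(fh.read()):
            if "---" in post:
                fm, body = post.split("---", 1)
            else:
                fm, body = "", post  # no front matter separator found
            yield fm, body, post  # combined is the unsplit text, the fallback
```

Callers that only need one of the three pieces just ignore the rest of the tuple, which is why the enumerate-and-unpack loop above is such a reusable skeleton.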

But wait! That’s not the outer loop that I want. Re-think this. Create the minimal logic that results in all the YAML data being available in a dict.

import re
import yaml
from slugify import slugify

CHOP = re.compile(r"-{78,82}\s*\n")  # the long dashed-line separators between entries
YAMLESQUE = "/home/ubuntu/repos/hide/MikeLev.in/journal.md"
ydict = {}
with open(YAMLESQUE) as fh:
    for post in CHOP.split(fh.read()):
        if "---" not in post:
            continue  # no front matter separator; skip this chunk
        ystr, body = post.split("---", 1)
        if ystr:
            yml = yaml.load(ystr, Loader=yaml.FullLoader)
            if "title" in yml:
                slug = slugify(yml["title"])
                ydict[slug] = yml

This will find the most common categories in descending order.

import re
import yaml
from slugify import slugify
from nltk.stem import WordNetLemmatizer  # requires a one-time nltk.download("wordnet")
from collections import defaultdict, Counter

lemmatizer = WordNetLemmatizer()
CHOP = re.compile(r"-{78,82}\s*\n")
YAMLESQUE = "/home/ubuntu/repos/hide/MikeLev.in/journal.md"
ydict = {}
cat_dict = defaultdict(list)
with open(YAMLESQUE) as fh:
    for post in CHOP.split(fh.read()):
        if "---" not in post:
            continue  # no front matter separator; skip this chunk
        ystr, body = post.split("---", 1)
        if ystr:
            yml = yaml.load(ystr, Loader=yaml.FullLoader)
            if "title" in yml:
                slug = slugify(yml["title"])
                ydict[slug] = yml
                if "keywords" in yml:  # only tally keywords for posts with a slug
                    for keyword in yml["keywords"].split(", "):
                        keyword = lemmatizer.lemmatize(keyword.strip().lower())
                        cat_dict[keyword].append(slug)
for key in cat_dict:
    cat_dict[key].reverse()  # journal is reverse-chronological; put oldest first
cat_counter = Counter()  # keyword -> number of posts using it
for cat, slugs in cat_dict.items():
    cat_counter[cat] = len(slugs)
common_cats = cat_counter.most_common()

And here’s a version that pulls together stuff going on here and there in the main script and really sets me up well for summarizing the category pages:

import re
import yaml
from slugify import slugify
from nltk.stem import WordNetLemmatizer  # requires a one-time nltk.download("wordnet")
from collections import defaultdict, Counter

lemmatizer = WordNetLemmatizer()
CHOP = re.compile(r"-{78,82}\s*\n")
YAMLESQUE = "/home/ubuntu/repos/hide/MikeLev.in/journal.md"
pwords = defaultdict(lambda: None)  # lemma -> most common surface capitalization
ydict = {}
cat_dict = defaultdict(list)
words = defaultdict(list)
with open(YAMLESQUE) as fh:
    for post in CHOP.split(fh.read()):
        if "---" not in post:
            continue  # no front matter separator; skip this chunk
        ystr, body = post.split("---", 1)
        if ystr:
            yml = yaml.load(ystr, Loader=yaml.FullLoader)
            if "title" in yml:
                slug = slugify(yml["title"])
                ydict[slug] = yml
                if "keywords" in yml:  # only tally keywords for posts with a slug
                    for keyword in yml["keywords"].split(", "):
                        keyword = keyword.strip()
                        lemma = lemmatizer.lemmatize(keyword.lower())
                        words[lemma].append(keyword)  # keep original capitalization
                        cat_dict[lemma].append(slug)
for key in words:
    # the most common surface form wins the capitalization vote
    pwords[key] = Counter(words[key]).most_common(1)[0][0]
for key in cat_dict:
    cat_dict[key].reverse()  # journal is reverse-chronological; put oldest first
cat_counter = Counter()  # lemma -> number of posts using it
for cat, slugs in cat_dict.items():
    cat_counter[cat] = len(slugs)
common_cats = cat_counter.most_common()
for i, cat in enumerate(common_cats):
    category, freq = cat
    print(freq, pwords[category])

Categories