Future-proof your skills with Linux, Python, vim & git as I share with you the most timeless and love-worthy tools in tech through my two great projects that work great together.

OpenAI Vs. KMeans for Keyword Clustering

I recently used OpenAI to create a prompt to extract the top keywords and main category for my article. Using my old keywords.db and topics.db files, I re-generated my Meta Descriptions and Keywords based on OpenAI prompts. I discovered that intelligence is a product, and I'm excited to share the results of my experiment with you!

Discovering the Benefits of OpenAI for Keyword Clustering

By Michael Levin

Saturday, April 15, 2023

My adventures of late have shown me that I am very comfortable in vim, and now also NeoVim, but have a difficult time getting comfortable with VSCode. Another thing I learned recently is that my whole KMeans approach to keyword grouping produces terrible results, and I’m going to try with OpenAI now. I can’t treat the database nearly as disposable.

Get a prompt down for OpenAI to do this work.

Extract the top keywords that best represent the article’s content and main ideas. Additionally, please identify the main category to which the article belongs.

I have an old keywords.db and topics.db file that I can reuse. Preserving the code to demonstrate the kmeans approach is worth it… for awhile.

I really need to activate an interactive mode and put some good output to see how OpenAI is doing on the keyword process. I can run the Yake keyword extractor and KMeans grouping as much as I like at no cost, but OpenAI API has a cost.

Wow, a realization is settling in on me. The craft of writing that prompt is so much more important than it would seem at first. I want to re-write the meta descriptions even though it will cost a bit. They’re currently too dry. I won’t re-write the summaries because of how expensive it is to chunk large text.

Alright, I re-generated all my Meta Descriptions and Keywords based on OpenAI prompts rather than Yake + Kmeans.

Intelligence is a product. I’m still absorbing that fact. I never used virtual assistants or Mechanical Turk, though I’ve been aware of them. Now that such intelligence is penny’s per request and the quality level is predictable (even if not perfect), it’s time to hop on that bandwagon. Cheap labor as a service. IaaS is already used for Infrastructure. Maybe like E2 for Elastic Computing Cloud, Amazon’s IQ as a service could be I2 for Intelligence and Infrastructure. Or maybe IQ2.

Though frankly, Amazon they can keep their infrastructure as the most interesting projects are going to be on our own hardware. Incubating AIs quickly using already trained models as a starting point is already a thing. Trick is, both business and governments aren’t going to want you to be able to do those things.

Let’s work on that future. Moore’s Law will give us personal massive GPU clusters in the next 5 to 10 years of the sort Elon Musk just bought this March for his new AI startup. Max Tegmark’s call for a moratorium on GPT-5 makes a lot of sense knowing that. He needs time to catch up now with OpenAI.

But you are really wrangling a human-like thing when it comes to the mistakes it could be making.