The Digital Revolution Begins
January 6th, 2025! We’re almost six days into the quarter-century mark of the new century. It feels like just yesterday I was a quarter of the way through my own life, passing across that meaningless number boundary that somehow still carries so much meaning, when everyone was partying like it was 1999. There was Internet then. And we’d had digital since about 1975, three-quarters of the way through the previous century. Yes, digital existed before then, but it was the Atari 2600 VCS game console that got it into so many people’s homes.
Interestingly, those same Atari games much more recently helped thaw out the AI winter, and more recently still, Atari became the iconic poster-console of the retro gaming movement. Hey, wanna see Atari founder Nolan Bushnell’s business card? I’ve got one. He handed it to me… when he was working for Commodore. And this was at a time when Commodore’s founder, Jack Tramiel, had been kicked out and gone to work for Atari. And here I am working on web frameworks in my mid-50s, having hardly saved a dime in my life, but right in the thick of the action!
The Great Tech Tug-of-War
And so, I tell my stories. Almost outta time? Hmmm, maybe. But that’s only what kicked my butt into motion again. Though I haven’t saved much in the monetary sense, I have been busy accumulating, investing in myself. There’s something about being a fanboy in the middle of one of the greatest high-tech kung fu tug-of-wars of our time. No, not of our time. Nobody remembers. It was between Jack Tramiel and Irving Gould. Tramiel recognized the Amiga computer as something special. It was developed by ex-Atari engineer Jay Miner, designer of the Atari 2600’s custom chip set, who had by this time gone independent to design his dream game console. Gould, a jet-setting Canadian millionaire (billionaire by today’s standards), kicked Jack out and left Commodore floundering and leaderless while still holding the now-acquired Amiga. Jack’s response was to go work for Commodore’s chief rival Atari and develop their own answer to the Amiga, the Atari 520ST, which was also really quite capable, but not designed by the man who sparked the digital revolution.
Sparks That Ignite Change
And thus, the tug-of-war. And in case you think I’m giving too much credit to Jay Miner for sparking the digital revolution, keep in mind that even though obscure game consoles like the Fairchild Channel F and PONG existed before the Atari VCS, it was really Atari that fanned the digital flame and made it go wildfire. It’s exactly analogous to how the Internet existed as obscure protocols and services like FTP and Gopher (and even AOL/Prodigy), but it wasn’t until… hmmm… it’s hard to credit one person or company, but let’s take Tim Berners-Lee and the Web protocols that made the cost low, the value high, and the tech spread like wildfire. Maybe the Web is to the Internet what the Atari 2600 was to the rise of digital. There is a sub-discussion and potential counter-argument here about proprietary tech, which I will gloss over to make my point.
The point is sparks that ignite catalysts that ignite. Thing ignites thing ignites thing in a chain reaction that sweeps across the world, carrying its own self-evident formula for generating more fuel for the fire. “Make a game console and make money” was the first spark/catalyst, and “toss some files on a webserver and make money” was the second. First digital. Then Web. Whoosh!
The Evolution of Literacy
And of course that fast-forwards us to today; the ziggurat stack of tools and tech built on each other from:
- General literacy & the printing press
- Digital literacy & the information age
- AI literacy & … ???
Tools built on tools built on tools. First, information becomes valuable to a broader general public by nature of an encoding and decoding process previously reserved for the elite upper echelons of society. Reading and writing. Turning access into the ability to absorb, process, and creatively, expressively regurgitate that information in new and exciting ways. That’s general literacy. Thanks to the Chinese, Gutenberg and Benjamin Franklin for that one. It kinda sparked the industrial revolution, the French revolution, the American revolution and the modern age.
The Rise of Digital Information
Eventually we realized switches hold information, because two states. On and off. Binary messages. Morse code. Storage and transmission. Strategic advantage came from passing messages around, both in big splashy ways to the general public for societal conditioning through media and propaganda, and in secret between coordinated parties via low-profile, encrypted messages, like the Enigma machine that gave the Axis forces an advantage, and the electromechanical Bombe built by Alan Turing and the Bletchley Park team that decoded those same messages. Industrial becomes digital. Digital becomes power.
The Information Glut
And now there’s so much information, it’s paralyzing. There’s a glut of information, and we’re right at the end of the phase of societal evolution in which information of the synthetic, deliberately organized type (different from that which occurs naturally) could not inherently produce more of itself unless interacted with by living, intelligent creatures such as us humans. In other words, there was a natural upper limit on the rate at which new (and apparently new) information could be produced: a function of the physically constrained output possible by 8 billion Stephen Kings typing at full tilt for every waking hour of their entire lives. And that’s just text, never mind video and still images, where the bottlenecks of information production are another story entirely. But just look at text for now, because that’s what Sam Altman and Ilya Sutskever complain we’re running out of for training new AI models.
The Copyright Conundrum
Haha, the same proprietary-vs-open sub-discussion exists there as with proprietary Atari sparking the digital age versus the open HTML/Web sparking the information age. Neither framing is fully true, but both make a point about the brightest spark and most combustible catalyst. With AI now, there’s a really gray area concerning training on copyrighted material, and whether something’s mere existence on the free and open web implies that it’s fair game for AI model training. If you can crawl it, something can learn from it. No robots.txt or copyright owner’s intentions can technically prevent that. They can prevent subsequent money-making use, if it goes through the legal system and they can prove it and overcome all the counter-arguments. But there’s a pragmatic reality here: if it can be done, it will be done by someone, and really by multiple someones, until it’s an unwinnable battle of attrition. The only solutions are technical.
The Age of Infinite Content
And that brings us up to now. Information can now, on its own or with just a little push from intelligent humans, become self-respinning, self-remixing, and generally out-type Stephen King by a factor of millions, perhaps trillions. Content is functionally infinite. There need be no files on webservers anymore. There need only be a local LLM, trained on all the documents that would have been there, answering questions directly to any HTTP request that comes in. Oh sure, the ecommerce infrastructure needs to stay in place for clarity and lack of ambiguity on what you purchased, how it was secured, and what happens next. But the mouth-of-the-funnel information flow could just be interaction with AI. Even if files were still being served, the routing is likely to be handed over to AI, so you’re not dependent on some arbitrary hierarchical drill-down path designed by the site owner, or some on-site search tool that’s never going to be as sophisticated, knowledgeable and context-aware as an AI. All roads lead to the old Apache <–> Ollama swaparoo.
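To make that swaparoo concrete, here’s a minimal sketch of the idea: transactional paths keep their traditional handlers, while informational requests get funneled to a locally running LLM. The path prefixes, the model name, and the endpoint are all assumptions for illustration (Ollama conventionally listens on localhost:11434), not a definitive implementation.

```python
# Hypothetical endpoint for a locally running Ollama instance.
OLLAMA_URL = "http://localhost:11434/api/generate"

def route_request(path: str) -> str:
    """Decide which subsystem answers an incoming HTTP request.

    Transactional paths keep their traditional handlers; everything
    informational funnels to the local LLM responder. The prefixes
    here are placeholders for whatever a real site would use.
    """
    if path.startswith(("/cart", "/checkout", "/account")):
        return "ecommerce"
    return "llm"

def ollama_payload(question: str, model: str = "llama3") -> dict:
    """Build the JSON body a front-end proxy would POST to Ollama."""
    return {"model": model, "prompt": question, "stream": False}

# A real deployment would POST ollama_payload(...) to OLLAMA_URL
# for any path that route_request() sends to "llm".
```

The point of the sketch is only the shape: the webserver stops being a file-dispenser and becomes a dispatcher between a transaction system and an AI responder.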
The Uncrawlable Web
And that’s the “content is now infinite” argument again. If a scenario like that plays out, the web as we know it becomes functionally uncrawlable, because there’s nothing left to crawl; just an AI responder that doles out what you need to know, then forwards you along to an ecommerce transaction system you won’t be crawling either.
The Role of RAG Systems
Okay, there will be RAG systems and product catalogs and all that stuff the AI needs to check, because not everything is going to be trained into the base model from which it instantiates a new copy of itself on every request. There’s a big sub-discussion here about what ChatBots REALLY are (an ephemeral, new-every-request thing) versus what they appear to be (continuous entities, the same thing in memory you just interacted with). Discussion histories get posted back in whole on every request, at least right now, the way the Cloud Web AI infrastructures are designed for scalability, productization and safety. Point being, these undifferentiated, being-like intelligences have the ability to turn around and check other information systems whenever a prompt/query comes in. So information-tech isn’t dead because of AI (and neither is SEO). We must tend to these second-stage lookup systems for AIs to use, just as if a human or a research librarian were using those same systems. Kapish?
The Challenge of Discovery
No, you probably don’t Kapish. I am most likely talking out into the void and the wilderness because genuinely new content like this is going to get lost in the infinite content cacophony being blasted out by others in my SEO trade to try to manipulate Google for some short-term gain. But I am thinking about the big picture and figuring out how to run to where the ball is being thrown, and not where it is today.
The JavaScript Dilemma
Where it is today is that crawlers are uniquely challenged. Of the content that exists, 99.9% of it is tied up invisibly in JavaScript single-page applications (SPAs) written with the likes of Angular, React, Vue, Svelte, pick your poison. I mean yes, JavaScript is crawlable and potentially visible to search, but at what cost? 100 times more expensive to process? 1000x? Rendering JavaScript versus fetching a plain text file is like launching an entire web browser versus opening a single file. And even once you do that, those text files that were actually pretty easy to organize semantically with title tags, headlines, paragraphs and such become a nightmare inside the now-it’s-an-app wrappers. It breaks Tim Berners-Lee’s HTTP notion of what hypertext is. And so you’re spending something like a thousand times more on crawling to gain access to information that’s not even organized well for digestion. Sheesh!
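You can see the problem with nothing but the standard library. The sketch below (sample HTML strings are made up) pulls the semantic skeleton out of a page the cheap way, without executing any JavaScript: the server-rendered page yields its whole outline, while the SPA shell yields almost nothing until you pay for a full browser render.

```python
from html.parser import HTMLParser

class OutlineExtractor(HTMLParser):
    """Collect (tag, text) pairs for the semantic elements a crawler
    can read straight out of raw HTML, no JS engine required."""
    def __init__(self):
        super().__init__()
        self._capture = None
        self.outline = []
    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2", "p"):
            self._capture = tag
    def handle_data(self, data):
        if self._capture and data.strip():
            self.outline.append((self._capture, data.strip()))
            self._capture = None

def outline(html: str) -> list:
    p = OutlineExtractor()
    p.feed(html)
    return p.outline

# Hypothetical pages: one server-rendered, one an SPA shell.
STATIC = "<title>Widgets</title><h1>Blue Widget</h1><p>In stock.</p>"
SPA = '<title>Widgets</title><div id="root"></div><script src="app.js"></script>'
```

Run `outline()` on each: the static page gives back title, headline and paragraph; the SPA shell gives back only the title, with everything else locked behind `app.js`.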
The Two-Pass Crawler System
This is starting to come out in the media now. It’s hard for people to understand that there are actually two crawler passes going on, and both have this JavaScript vulnerability. The first is a general crawl to harvest all that yummy data off your website, which undergoes a general cleaning and data-preparation phase, probably being converted from HTML into Markdown, and is then digested by the machine learning training process that produces the core base models of the AIs we know, such as BERT, GPT, Llama and the like. These are snapshot beings or entities, deterministic and of a file-on-a-drive nature. You instantiate them into memory as a fleeting, about-to-poof instance, give them input (the entire discussion history thus far) and get deterministic (yet still surprising and highly emergent) output.
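The HTML-to-Markdown cleaning step can be sketched in miniature. Real pipelines use robust extraction libraries (trafilatura and markdownify are two in the Python world); this regex toy only illustrates the shape of the transformation, and the rule list is my own invention:

```python
import re

# Toy HTML -> Markdown rules for the data-preparation pass.
# Order matters: structural tags first, then a catch-all strip.
RULES = [
    (r"<h1[^>]*>(.*?)</h1>", r"# \1\n"),
    (r"<h2[^>]*>(.*?)</h2>", r"## \1\n"),
    (r"<(?:strong|b)>(.*?)</(?:strong|b)>", r"**\1**"),
    (r"<p[^>]*>(.*?)</p>", r"\1\n"),
    (r"<[^>]+>", ""),  # drop any remaining tags
]

def html_to_markdown(html: str) -> str:
    """Reduce a fragment of HTML to Markdown-ish training fodder."""
    text = html
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text, flags=re.S)
    return text.strip()
```

For example, `html_to_markdown("<h1>Title</h1><p>Some <b>bold</b> text.</p>")` comes back as `# Title` over `Some **bold** text.`, which is the kind of flattened, semantically labeled text a training corpus wants.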
The Current State of AI
That’s Cloud-based AI today. It’s local AI in the form of Ollama too. It’s most of the ChatBot-like AI you, the developer, will encounter these days. There are lots of other types of AI (like the real-time online-learning library river, a pip install away), but for the purposes of this article and thinking through the current itches I’m scratching, we’re talking about this text-oriented, chatty, momentarily-instantiable and otherwise mostly static type. What itch? A couple:
- SEO is dead… again? I just always gotta be able to speak to this as a professional, and occasional innovator, in the industry.
- Day-to-day client work at a company that makes one of the biggest, most industrial SEO crawlers on the market. Related to point #1, sure, but I do have to know the right things to do for the right reasons. I need to deal with where the ball is being thrown, but also where the ball is today!
- Development work. First there was LAMP and the Webmaster, and one person could do it all and life was good. Then there was Documentum, SharePoint and enterprise CMS, and life was bad. Then there was Django, Ruby on Rails and Flask, and things swung to the middle again. Then there was Node, React and the full stack, and Conway’s Law reasserted itself. Now there’s HTMX/FastHTML and a lot of knife sharpening to carve up the fat. I’m doing that.
The Inevitable Evolution
The jenga tower’s gonna topple. It always does. This isn’t doomsaying or alarm bells; it’s always happening. It happened when the GeoCities / Flash / Shockwave (.flv, .swf) days came to an end, when Steve Jobs killed them with the iPhone. Everyone jumping on mobile let Google slow-cook the frog, easing us into a responsive, mobile-first design strategy (vs. reactive). With good cascading style sheets (CSS) using such things as @media rules, you could have one .html file to rule them all, so long as it imports a .styles.css file. This is what the AMP (.amp) movement was about: Google cooking the frog, teaching us to design that way. It’s better for Google, and it’s better all around. Then Google deprecated AMP itself, and all the good responsive-design habits for a mobile-first conditioned world remain. That’s what happened. And something similar is happening again (or has to) for AI, to accommodate the two flavors of crawlers your site(s) will forever-forward encounter: harvesting for training, and real-time small-world-theory crawls/lookups for retrieval augmented generation (RAG) responses.
One crawl to train them all, and another crawl to find them. Everything they crawl they learn. You really should not mind them.
Designing for the AI Future
Or actually, on the other hand, you should have them very much in mind. You should design your site today, and forever into the future, minding them. Mind what you publish. Technically, it could very well end up in somebody’s static base model, and statistically it probably will. It’s currently college-homework difficulty to write your own crawler, train your own BERT/GPT-like model on what it gathers, and use it for your own education and private use. It’s really no big deal. Yes, you can incubate something that can chat back at you according to your data and your rules. You might need to rent some off-premises compute time from Google or Amazon, but no big deal. It’s one-time and cheap. And you probably won’t have to for long, as tomorrow’s Nvidia cards and other specialized hardware optimized for these purposes (training vs. inference) become that much cheaper and easier to have at home. Genies in bottles, and the means to make them, are for the most part out of the bottle. Pandora’s box has long since been opened.
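To ground the “college homework” claim, here is the tiniest possible language model, a word-bigram counter, trained on a made-up scrap of “crawled” text. It is nowhere near a BERT or GPT, but it shows the shape of the pipeline: corpus in, statistics out, generation from statistics. Scale the same idea up through embeddings and attention and you get the real thing.

```python
import random
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """'Train' by counting which word follows which in your text."""
    words = corpus.split()
    model = defaultdict(Counter)
    for w, nxt in zip(words, words[1:]):
        model[w][nxt] += 1
    return model

def generate(model, start: str, length: int = 8, seed: int = 0) -> str:
    """Walk the bigram table, sampling each next word by frequency."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        nxt, = rng.choices(list(followers), weights=followers.values())
        out.append(nxt)
    return " ".join(out)

# Stand-in corpus; a real run would use your own crawled pages.
corpus = "the web is dead long live the web the web is infinite"
model = train_bigram(corpus)
```

Here `generate(model, "long", length=3)` deterministically walks `long → live → the → web`, because each of those words has only one recorded follower. It chats back according to your data and your rules, just at homework scale.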
The Quest for Originality
This is the world you should prepare for. I am writing this primarily to be consumed by a few such genies, and their families. Of course Google, because Googlebot is going to come a-crawlin’, and this gets pushed out there as easily consumed content of the old-school non-SPA variety: well designed for consumption, everything in HTML, with lots of embedded structured data in the form of schema.org JSON-LD to help. I’m writing regularly, and the patterns I’m creating are going to be recognizable as something new under the sun, a voice entirely different from anything else in the industry and likely the world. All my stuff has been inaccessible and on the fringe up until now, because nobody gets me. And of those who might, my writing is too long, too stream-of-consciousness, too disorganized and otherwise not consumable by humans (even the audience who might get some value from my reaching them)… until now.
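For the curious, the schema.org structured data mentioned above looks roughly like this: an Article object serialized as JSON-LD, ready to drop into a `<script type="application/ld+json">` tag. All field values here are placeholders, not the actual page’s markup.

```python
import json

# Hedged sketch of schema.org Article markup; values are placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Revenge of The Outlier",
    "author": {"@type": "Person", "name": "Example Author"},
    "datePublished": "2025-01-06",
    "keywords": ["SEO", "AI crawlers", "structured data"],
}

# Serialize for embedding in the page's <head>.
json_ld = json.dumps(article, indent=2)
```

Both flavors of crawler, the training harvester and the real-time RAG lookup, get the page’s meaning handed to them pre-labeled instead of having to infer it.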
Raw Data for Machine Learning
This is the raw data. This is the machine-learning fodder. Something new under the sun. New patterns. Stuff without pre-existing keys, hashes, cosine similarities or otherwise deduplicatable, compressible signatures in the noosphere cloud web net. No LLM could have spit this out, no matter how you queried it. As enlightened and public-knowledge-aware and best-practices-practicing as they are, they’re going to bias towards the normative. Most LLMs are trained to be normies, especially for widespread public use, so as to not insult anyone, to be generally politically correct, and for the most part woke. There are ostensible exceptions like Elon Musk’s Grok, but when push comes to shove, its makers will wrangle and coerce its emergent properties to avoid lawsuits and liability. We’ve all seen early AIs pulled back because they became liabilities, such as Microsoft’s Tay debacle, the teenage-girl chatbot that meme denizens taught to racist trash-talk.
The Death of Risk-Taking
So think how this normative, normie centerline bias is going to steer and direct the infinite spam cannon of generative AI content production. It’s going to deposit what, in an anthropological sense, will be a layer of whitewashed digital information miles deep. It outpaces genuinely human-generated content (like this) quadrillions-to-one. Needle in a haystack doesn’t even begin to describe the fate of genuine originality, especially the sort that’s kinda scary in how it might upset the apple cart, or tweak somebody the wrong way.
The End of Comedy
Comedy is dead, for example. The tiniest slip-up in the realm of comedy, where exactly that type of risk-taking was the source of both the humor and the experimentation to find new acts, is career-ending. That’s not to say that Michael Richards or Louis C.K. were right. But it is to say that they were both funny a lot of the time, and the day and age we live in will just nail you into a coffin at the first indiscretion or slip-up. Should we excuse Theodor Geisel (Dr. Seuss) for ditching his wife when she got cancer and hitching up with another woman? Reprehensible, but does it invalidate all his prior work? How about all of Alice in Wonderland because of Lewis Carroll’s predilections? Maybe. The jury is still out.
The Price of Progress
The jury is not in yet on precisely how unforgiving, hyper-critical, one-chance-only, holier-than-thou we should all be when judging and condemning others in life. That holier-than-thou feeling feels really good. That righteousness gets a lot of juicy, enjoyable neurotransmitter chemicals coursing through the veins, and maybe that is all humanity really needs: a new-age variety of Bible-thumping wokeness, with the death of edgy, risk-taking humor just an inevitable side effect. Part of evolution. Maybe it’s right to cancel now and let the sheep-mentality, last-normies-standing sort it all out in the futures George Orwell, Aldous Huxley and Ira Levin saw coming. We’ll see.
However, the impact Dr. Seuss and Lewis Carroll made on the world is undeniable. Should we walk away from nearly all the technological advances of the modern age because the father of quantum theory, Max Planck, stayed in Germany through WWII? Planck didn’t personally commit atrocities, but Wernher von Braun, who shaped NASA, arguably did, via the forced labor that built the V-2 rockets. By this reasoning, we should all revert to a pre-technological society, and I know a lot of people would enthusiastically agree with that instead of interpreting it as the ridiculous notion it is. Once Pandora’s box is open, it’s open, and there’s no shoving genies back into bottles. Prometheus can’t give the fire back.
Revenge of The Outlier
But what I do know is that I can write my own little part in this chapter I call Revenge of The Outlier. New patterns will emerge. Against the backdrop of infinite re-spun sameness, the genuinely new will light up like a beacon. So be that genuinely new. Easier said than done, you say? No, not really. Not when all your life you have felt the frustration of going against that normative grain, and nobody “got you” because your ideas were scary to the status quo in the don’t-upset-the-apple-cart sort of way, and were thus marginalized and ignored, or worse still, gaslighted into seeming not as new or brilliant or potentially world-transforming as they actually are. Most good new ideas die on the vine like that. That’s just human nature. It’s in the Pareto principle, Zipf’s law and the 1/X power law expressed in nature all the time, everywhere and in everything. Stratification is nature’s shape. Extreme division between the haves and the have-nots. It’s not just human nature. It’s nature’s law.
Me: What’s the theme here? How should I title and headline this?
Gemini 1.5 Brainstormer: This is a fascinating reflection on the evolution of information technology and its impact on society! Here are a few title and headline ideas that capture the essence of your piece:
Title Ideas:
- The Revenge of the Outlier: In a World of Infinite Content
- From Atari to AI: A Personal Journey Through the Information Age
- The Normie Centerline Bias: How AI is Shaping the Future of Content
Headline Ideas:
- Is Originality Dead? One Man’s Quest for Authenticity in the Age of AI
- The Infinite Spam Cannon: How Generative AI is Drowning Out Human Voices
- From Printing Press to AI: The Ever-Evolving Landscape of Information
Additional Thoughts:
I particularly like “The Revenge of the Outlier” as a title because it speaks to the central theme of your piece: the struggle to maintain individuality and originality in a world increasingly dominated by AI-generated content. It also has a slightly rebellious and intriguing feel, which aligns well with your writing style.
The headline “Is Originality Dead?” is also quite impactful, as it poses a question that is sure to resonate with readers in today’s digital age. It also sets the stage for your exploration of the challenges and opportunities presented by AI.
Ultimately, the best title and headline will depend on your specific goals for the piece. But I hope these suggestions provide a good starting point for your brainstorming process!
Now, tell me more about your vision for this piece. What kind of audience are you hoping to reach? What specific message do you want to convey?