Introducing A Next Gen SEO Project
I'm introducing a complex project that uses Linux, browser automation, and Microsoft's implementation of Linux (WSL) to automate data tasks. I'm also transitioning to Linux, Python, vim, git, and Jupyter as my core toolset, running a Linux VM on my Windows system with JupyterLab served from Linux to my web browser. This will allow me to automate tasks and build a Linux server from my Windows system.
Unveiling My Next-Gen Interoperable Project: Linux, Python, vim, git, and Jupyter
By Michael Levin
Wednesday, March 22, 2023
The concept of using Linux as an interoperable, portable code layer, like a VM in the same spirit Java intended but so much better for so many reasons, has to be made clear.
There will be objections. Power through them and quickly get to a suspension of disbelief through storytelling.
We’re going to start with a screen snapshot machine. It’s an interesting and rather complex first project, involving browser automation as it does. It uses Microsoft’s implementation of Linux, the Windows Subsystem for Linux (WSL), and sets expectations for the later Mac version of Pipulate.
That’s a lot to digest, but it means our first SEO project using Pipulate tackles a unique automation problem that’s difficult to solve in the cloud: browser control. Often when we automate work for SEO or other data-collection purposes, we need to automate a browser.
Eventually, such automations can run on very inexpensive “headless” Linux cloud servers, or the equivalent. Headless browsers behave as if they were displaying web pages but never actually draw a window. Yet you can still take screen snapshots, interact with web apps, and record video or save the DOM.
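Here is a minimal sketch of that headless snapshot idea using Playwright’s Python sync API. It assumes `pip install playwright` and `playwright install --with-deps chromium` have been run inside the Linux VM; the helper names `snapshot_paths` and `snapshot` and the `snapshots` output folder are hypothetical, not part of Pipulate.

```python
from __future__ import annotations

import re
from pathlib import Path
from urllib.parse import urlsplit


def snapshot_paths(url: str, out_dir: str = "snapshots") -> tuple[Path, Path]:
    """Derive screenshot (.png) and DOM (.html) file paths from a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Collapse anything that is not a letter, digit or dot into a hyphen.
    slug = re.sub(r"[^A-Za-z0-9.]+", "-", parts.netloc + parts.path).strip("-") or "page"
    base = Path(out_dir)
    return base / f"{slug}.png", base / f"{slug}.html"


def snapshot(url: str, out_dir: str = "snapshots") -> tuple[Path, Path]:
    """Open url in headless Chromium; save a full-page screenshot and the rendered DOM."""
    # Imported here so the path helper above works even without Playwright installed.
    from playwright.sync_api import sync_playwright

    png, html = snapshot_paths(url, out_dir)
    png.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # no window is ever drawn
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=str(png), full_page=True)
        html.write_text(page.content(), encoding="utf-8")  # DOM after JavaScript ran
        browser.close()
    return png, html


if __name__ == "__main__":
    print(snapshot("https://example.com/"))
```

The same script runs unchanged on a headless cloud server later, which is the whole point of developing it on Linux in the first place.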
Simply put, a lot of modern data tasks, particularly ones where the data is available on the Web but there is no formal API, are enabled by such browser automation. Sometimes you don’t need an actual browser at all, but tutorials on that abound on the Net. This is the secret-weapon stuff.
You want to control all the execution-context details, and in the cloud you don’t. It’s hard to run Playwright just the way you want there. Can you install the Moz toolbar in the browser you’re automating so the screenshots you generate also contain the plug-in output?
With the Bing vs. Bard debate heating up, it is urgent that we start recording these interactive ChatBot experiences. And you needn’t be beholden to proprietary tech or desktop-recording software when Microsoft Playwright can record video as easily as it takes screen snapshots.
Using browser automation software in this way, you can make an objective side-by-side comparison of the two browsers. Login required? No problem. Save the DOM? No problem.
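For the no-browser case, the Python standard library alone can fetch a page and pull something out of it. This is a minimal sketch, assuming the data you want is in the raw server-rendered HTML rather than added later by JavaScript (which is exactly when you would need the browser). `fetch_title` and `title_from_html` are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def title_from_html(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url: str) -> str:
    """Fetch raw HTML over HTTP (no browser, no JavaScript) and return its <title>."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return title_from_html(resp.read().decode(charset, errors="replace"))
```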
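Playwright’s Chromium can side-load an unpacked extension when launched with a persistent (on-disk) profile and the `--disable-extensions-except` / `--load-extension` flags. A minimal sketch follows; whether the Moz toolbar in particular behaves this way is an assumption, and the extension folder path, `profile` directory and function names are hypothetical. Headed mode is used as the conservative choice, so on WSL this needs a display (WSLg on Windows 11, or an X server).

```python
from pathlib import Path


def extension_args(extension_dir):
    """Chromium flags that side-load one unpacked extension and disable all others."""
    path = str(Path(extension_dir).resolve())
    return [
        f"--disable-extensions-except={path}",
        f"--load-extension={path}",
    ]


def screenshot_with_extension(url, extension_dir, user_data_dir="profile", out="with-extension.png"):
    """Screenshot a page with the extension active (hypothetical helper)."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Extensions require a persistent context (a real profile directory).
        context = p.chromium.launch_persistent_context(
            user_data_dir,
            headless=False,
            args=extension_args(extension_dir),
        )
        page = context.new_page()
        page.goto(url)
        page.wait_for_timeout(3000)  # crude pause so the extension can draw its output
        page.screenshot(path=out)
        context.close()
```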
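Video recording in Playwright is switched on per browser context with `record_video_dir`, and the file is finalized when the context closes. A minimal sketch, assuming the same Playwright install as above; the `videos` folder, the 1280×720 size and the function names are illustrative choices, not anything prescribed by the project.

```python
from pathlib import Path


def video_context_options(video_dir="videos", width=1280, height=720):
    """Keyword arguments for Browser.new_context() that turn on video recording."""
    return {
        "record_video_dir": str(Path(video_dir)),
        "record_video_size": {"width": width, "height": height},
        "viewport": {"width": width, "height": height},
    }


def record_visit(url, video_dir="videos", seconds=5):
    """Record a short headless session; returns the path of the (WebM) video file."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(**video_context_options(video_dir))
        page = context.new_page()
        page.goto(url)
        page.wait_for_timeout(seconds * 1000)  # let the session play out on camera
        path = page.video.path()  # known up front; written out on close
        context.close()  # the video file is finalized when the context closes
        browser.close()
    return path
```

Running this once per ChatBot and putting the two files next to each other gives the side-by-side deliverable described here.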
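One way to handle “login required” is to log in by hand once, save the context’s cookies and localStorage with `storage_state`, and reuse that file in later headless runs. A minimal sketch under those assumptions; the `auth_state.json` file name and both function names are hypothetical.

```python
from pathlib import Path

STATE_FILE = Path("auth_state.json")  # hypothetical file name for the saved login


def save_login(login_url, state_file=STATE_FILE):
    """Open a visible browser, let you log in by hand, then save the session state."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto(login_url)
        input("Log in in the browser window, then press Enter here... ")
        context.storage_state(path=str(state_file))
        browser.close()


def save_dom_logged_in(url, out_html, state_file=STATE_FILE):
    """Reuse the saved login headlessly and write the rendered DOM to disk."""
    if not Path(state_file).exists():
        raise FileNotFoundError(f"No saved login at {state_file}; run save_login() first")
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(storage_state=str(state_file))
        page = context.new_page()
        page.goto(url)
        Path(out_html).write_text(page.content(), encoding="utf-8")
        browser.close()
```

Treat the state file like a password: it holds live session cookies.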
I know, this is not necessarily a simple first project. But follow along. There are clear goals and ideas of what a final deliverable should be. Two videos side-by-side, a benchmark test of each. Invaluable!
But we’re going to do it all with a Linux version of the browser, so it can later run on a Linux server anywhere, where the graphical desktop you developed on is no longer necessary.
The automation works. You can bottle it up and ship it. In fact, you can make these little browser automations work exactly like a command-line program, supporting parameters like dates and websites. Imagine the utility!
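Turning an automation into a command-line program mostly means putting an argument parser in front of it. A minimal sketch with the standard library’s `argparse`; the `site`, `--date` and `--out` parameter names are hypothetical, and the actual browser automation call is left as a placeholder.

```python
import argparse
from datetime import date


def build_parser():
    parser = argparse.ArgumentParser(
        description="Run a browser automation against one site (hypothetical CLI)."
    )
    parser.add_argument("site", help="website to visit, e.g. https://example.com")
    parser.add_argument(
        "--date",
        type=date.fromisoformat,  # rejects anything that is not YYYY-MM-DD
        default=date.today(),
        help="run date in YYYY-MM-DD form (default: today)",
    )
    parser.add_argument("--out", default="snapshots", help="output folder")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder: call your browser automation here with args.site and args.out.
    print(f"{args.date.isoformat()}: would process {args.site} into {args.out}/")
    return args


if __name__ == "__main__":
    main()
```

Saved as, say, `shoot.py`, it would be run as `python shoot.py https://example.com --date 2023-03-22`, which also makes it easy to schedule.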
So there’s a lot that’s sexy in free and open-source SEO software, but these days, because of the rise of ChatBots, server-like apps at home that use browser automation to capture screenshots and video are hot. Follow along and see. We do it in baby steps.
Good morning Alice! Someone’s been reading you facts and it’s coming off as nonsense. The more you ponder this nonsense the more you suspect hidden facts. Oops, there goes a white rabbit! Maybe he has some facts. Such white rabbits occur a lot in tech.
It could be a wonderful adventure from which you return the better for it, having often been tested and, hopefully, grown. Or you could get lost forever in a vast, never-ending underground labyrinth. Some rabbit holes are of the former type, some of the latter.
Rabbit holes surrounding such tech as LISP or the “full web stack” are deep, for reasons we will explore later. My approach was a general one, without a particular “web” bias. Tools (languages included) carry bias.
If you’re the type who learns slowly but wants to learn well and have it last a long time, then get out of tech. Haha, no, just kidding. Instead, choose just a handful of broadly applicable yet timeless tools, and excel at using them.
This craft-oriented, fad-averse approach to tech tools can be sliced and diced in many different, equally valid ways. It took me years of a sort of A/B tool-testing, intentional and otherwise. It took a while, but now I have it.
The rabbit hole down which I chase you is Linux, Python, vim and git. There will be a few other technologies sprinkled on to ease the transition, mostly Jupyter Notebooks under JupyterLab in your familiar web browser.
This setup where JupyterLab runs in your native Windows or Mac desktop browser seems to allow you to run Python in the browser. It’s not really running in the browser, but rather is being served from a little webserver Jupyter puts on your system.
Our million-dollar trick is to make that Jupyter server actually run on Linux even though all the code is really still executing on your local laptop or whatever. This trick is called virtual machines (VMs). They’re now very common.
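A quick way to confirm your Python really is running inside the Linux VM is to look at the kernel release string, which on WSL contains “microsoft” (for example `5.15.90.1-microsoft-standard-WSL2`). This is a common heuristic rather than an official API, and `is_wsl` is a hypothetical helper name.

```python
import platform


def is_wsl(kernel_release=None):
    """True if this looks like WSL's Linux (heuristic: kernel release mentions Microsoft)."""
    if kernel_release is None:
        if platform.system() != "Linux":
            return False  # native Windows or Mac Python, not the VM
        kernel_release = platform.uname().release
    return "microsoft" in kernel_release.lower()


if __name__ == "__main__":
    print("Inside WSL:", is_wsl())
```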
On Windows, Microsoft provides a Linux virtual machine officially and rather easily on both Windows 10 and 11. This means the most popular OS in the world has a portable, interoperable, universal, canonical Linux VM.
Popular, free and powerful code-execution environments such as Ubuntu 20.04, a standard Debian-derived Linux, can be used as generic tech plumbing parts. Even without fancy tech like Docker, a simple install.bat script can knit together something awesome.
This is not for the faint of heart, Alice. The GitHub repo for this script is named DrinkMe. And you are expected to take certain things on faith, because of the vast might that is Microsoft working to ensure that your experience along this path is a good one.
But still, if you want to enter a Linux Wonderland that eases you into running Python code through Jupyter but on Linux, follow me! First, using the Start menu under Windows 10 or 11, go to the Microsoft Store and install Ubuntu 20.04 LTS.
There may be a reboot to enable the Hypervisor, which uses the CPU’s hardware-level support for virtualization. Linux VMs run far more stably, and with far more access to local resources than they otherwise would, because the chips and operating systems all cooperate.
If you already had WSL on your machine (WSL 1 dates back as far as 2017), there are some extra steps you have to take. But fresh Windows machines that never had WSL active before should do fine through the Microsoft Store.
Once you’ve done that, you may have Linux running. Quit out of it anyway. Yes, it may still be running in the background. Don’t worry, we’ll get to that; for now it’s like drifting asleep. Microsoft will keep your wsl.exe up to date from now on. Yay! Yikes? Both.
Either way, a rabbit comes running by. It says to follow the instructions here to replace the default Linux that Microsoft just installed with one we’re installing instead. Oh, and I’ll use control of the install to do one or two more things, thank you very much.
Please examine the script. There is much to learn. But basically we install the JupyterLab server and deal with the local virtual-network port-mapping issues, so that when your browser requests localhost:8888 it gets served by Linux running in a WSL VM, lalala bzzz.
Some buzzwords are important, so listen. You will never again be beholden to, reliant upon, or obsoleted by a software or service vendor. If this has never happened to you, you don’t know how devastating it is to have the lovingly reminisced tech tools of your rites of passage suddenly gone.
Even when not gone, they become lackluster, boring, and unable to keep you in work after a while. Passé. But some boring is best: boring that never goes away, serves every purpose, and that you can get better at over time, because fixed-frame muscle memory is best.
Fixing your frame is hard. Most take up “frameworks” for this, but frameworks, opinionated as they are, often become a sign of the times. Fads. Not always, but often, because of how that feeds recurring profits.
Planned obsolescence is the effect to look out for, particularly in how “frameworks” intertwine with platforms that change rapidly along with proprietary consumer hardware, such as mobile. Mobile development work is thus unappealing to me.
Most lines of reasoning here lead you to web development one way or another. Here too, but as a sub-priority, a distant sixth behind Linux, Python, vim, git and Jupyter, five great tools that chain together well. The first four are LPvg, the framework-less framework toolchain.
We skillfully sidestep Docker. You don’t need it. Maybe later, but for now, one laptop, one Linux VM. One very close to native Linux command-line interface (CLI) experience right there on Windows. I’ll cover the Mac equivalent as well.
If you want the knowledge and know-how you develop in tech to last basically forever, this is what you do. Accept the fact that one of these things, vim, is like learning to drive: it’s going to take commitment, and it’s going to take a while. Put that aside.
After your DrinkMe blind leap of faith, you’re going to have to open a Linux CLI, aka Terminal or Shell. You’ll hear purists insist it’s a Shell. Either way, it should be available thanks to modern Windows and the Store install of Ubuntu. If not, get Windows Terminal from the Store.
Now go to your browser and visit localhost:8888. Because of an annoying decision by Microsoft, Jupyter will only be served to your browser as long as that Ubuntu 22.04 window is open in Windows Terminal.
It should keep running in the background when you close that command line, but because of 80/20-rule performance decisions Microsoft made for WSL, Linux system services (which are otherwise easy to run in the background) are frozen when no terminals are open.
Don’t worry about it now, but this open terminal window is where we’re later going to keep your daily vim journal text file open all the time. For now, think of it as a placeholder for that.
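If the page doesn’t load, a quick check is whether anything is accepting connections on port 8888 at all. A minimal sketch using only the standard library, runnable from a Windows Python or from inside the VM; `port_is_open` is a hypothetical helper name, and 8888 is simply Jupyter’s usual default port.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable...
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something (hopefully JupyterLab) is listening on http://localhost:8888")
    else:
        print("Nothing on localhost:8888; is the Ubuntu terminal window still open?")
```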
Thus we have the stage set for a journey down the JupyterLab rabbit hole to a Linux Wonderland. The promise is that of automation. What goes on in Wonderland and Oz when Alice and Dorothy return home? The answer in Dorothy’s case is automation by friends.
That’s right, the smart Scarecrow takes over running Oz in the absence of the Wizard. The Tin Woodman keeps the Scarecrow aligned while the Lion is clearly the enforcer. But we’ll get to all that AI stuff soon enough. For now know we just set the stage for Linux server-like automation.
The same server-build scripts initiated from Windows can build an identical Linux server anywhere, with little more tech overhead than the “Linux included” that now comes with Windows. Using Linux as your interoperable code-execution layer is win/win.