Mike Levin SEO

Future-proof your technology-skills with Linux, Python, vim & git... and me!

My Switch To FOSS (Debian, QEMU, Mercurial, vi & Python) For My 5th-Gen Framework

by Mike Levin SEO & Datamaster, 01/05/2011

Things are going fairly well with my work at the office. I’ve created my fifth-generation framework for doing “general stuff”. The first four, which date back to before Active Server Pages circa 1996, were some variation of auto-CRUD (create, read, update, delete) of the sort that we’ve come to know as Ruby on Rails. Today, we hardly even need these joyful frameworks, because office apps like spreadsheets are now Web-based.

You rarely hear it mentioned, I think maybe because we’re all still so infatuated with Web application development frameworks, but the whole reason for building nine out of ten of these apps, easy Web-based collaboration with security, has suddenly become obsolete, because achieving the same goals is simply easier under Google Docs, with a better Web-based UI than any of these frameworks provide.

We are now in a world where you can pretty much assume the average office-schmoe can do in a few seconds what it used to take an app developer a week or more to accomplish. Sure, you might object to relying on Google for this (today), but you rely on silicon chips and electricity already. Eliminating dependencies is an illusion. You always rely on something, and unless you’re going off the grid, it’s just a matter of choosing your dependencies. The trick is to use abstraction layers, so you can wire your stuff into different but equivalent service providers as they become available. Some day, a locally hosted API-compatible alternative to Google Docs will come along, and I can easily re-wire my work to that.
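To make the abstraction-layer idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `SheetBackend` interface, the `InMemoryBackend` stand-in, and the `log_keyword` function are names invented for illustration, not part of any real Google API. The point is only that the application code talks to an interface, so a different but equivalent provider can be swapped in later.

```python
from typing import Protocol


class SheetBackend(Protocol):
    """Hypothetical interface: anything that can read and append rows by sheet name."""

    def read_rows(self, sheet: str) -> list[list[str]]: ...

    def append_row(self, sheet: str, row: list[str]) -> None: ...


class InMemoryBackend:
    """Stand-in provider that keeps rows in a dict.

    A hosted spreadsheet service, or a locally hosted alternative,
    would implement the same two methods.
    """

    def __init__(self) -> None:
        self._sheets: dict[str, list[list[str]]] = {}

    def read_rows(self, sheet: str) -> list[list[str]]:
        return [list(row) for row in self._sheets.get(sheet, [])]

    def append_row(self, sheet: str, row: list[str]) -> None:
        self._sheets.setdefault(sheet, []).append(list(row))


def log_keyword(backend: SheetBackend, keyword: str, rank: int) -> int:
    """Application code: depends only on the interface, never on a provider."""
    backend.append_row("rankings", [keyword, str(rank)])
    return len(backend.read_rows("rankings"))


backend = InMemoryBackend()
log_keyword(backend, "debian", 3)
count = log_keyword(backend, "qemu", 7)
```

Re-wiring to another provider then means writing one new class with the same two methods; `log_keyword` and everything built on it stays untouched.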

It’s with these changing-world realizations in mind that I set out on my fifth-generation system designed to re-capture the super-powers that I lost in the Web’s evolution. Or rather: what good is being an old-school superhero if everyone on the planet just got better powers than you? Every single average Joe was just bitten by a radioactive spider, and so now how can you be a superhero in a world of superheroes?

And even if such supposed next-generation powers could exist, how can we ensure their longevity and marketability? These are questions I struggled with. Is it even worth becoming an old-school programmer anymore, or should you just use whatever built-in scripting language a vendor provides you, like Google Apps Script built into Google Docs? And as voice recognition and gesture recognition approach perfection, keyboards themselves may be going away. But that’s a long way off still, I think. It’s really a matter of choosing the least of multiple evils, and planning for maybe the next twenty years or so, keeping the last twenty in mind.

Speaking of the last twenty years, an important criterion in my mind is to not get “stung” again. I started on the Commodore Amiga computer with the ARexx programming language for inter-process communication and automation. I then begrudgingly moved to Windows as the Amiga died, endlessly complaining about how much worse things had just gotten. I mastered VBScript just in time for it to be deprecated in favor of .NET. Life is too short to keep HAVING TO learn new programming languages because of vendor decisions and the platform changing underneath you. Joel Spolsky calls this “fire-and-motion”, where the vendors keep you scrambling from one framework to the next for various nefarious reasons. What this denies you is the benefit of compounding returns. The giant “undo button” is symbolically pressed on you every few years. What if you had to keep learning new spoken languages because English kept going away? Yes indeed, a certain timelessness had become a key criterion in my platform selection for my fifth-generation framework.

For example, I was no longer willing to be beholden to a particular vendor’s OS or integrated development environment (IDE). I wanted stuff that’s been around for 20 years and is likely to be around for another 20. That’s a pretty tall order, especially if you expect it to also be modern, or competitive with modern systems, or marketable in the job market. The trick here is to plan it so that your growing familiarity with, and eventual spontaneous mastery of, timeless tools quickly overcomes any advantages provided by trendy bells-and-whistles tools, like VisualStudio.NET or even Eclipse. The criteria ended up being easy, modern, well-maintained and updated open systems on which important commercial ventures were based, although it certainly wouldn’t hurt if they had been around for 20 years. Some, but not all, of my fifth-generation selections actually met that 20-year criterion.

After much soul-searching, painful deliberation, and cursing the technology for not being in a better state, I chose *Nix systems on any hardware as my core foundation, and the Debian distribution of Linux in particular. It installs easily on a wide variety of hardware, including ARM and MIPS, subsequent software installations are easy, and it’s the underpinnings of Ubuntu, which is the underpinnings of Google’s ChromeOS. Yep, I know Google is a vendor, and this sounds like being beholden to a vendor, but it’s a lesser-of-two-evils compromise, especially when you consider it is actually the basic underpinnings of ChromeOS that I chose, and not ChromeOS itself. There are plenty of reasons to be optimistic about Debian.

Next, I needed a hardware platform on which to run it. I can’t recount to you the number of times I lost years of work because my access to hardware had been shut off from switching jobs or hardware eventually being retired. Traditional backups are never good enough, because you need to reproduce the server environment to get your code running again off of backups, and unless you personally keep a copy of every incremental backup, you will never have the latest. The solution turned out to be a combination of an operating system that runs on nearly all hardware (the aforementioned Debian), virtualization, and finally a distributed revision control system. These three pieces work together to form a sort of magic lifeboat.

Virtualization is one of the principal technologies making the much-touted “cloud” possible. As CPU manufacturers like Intel figure out how to cram more and more processors onto a single chip, there are very few ways today in mainstream software and the non-research world to take advantage of such multi-processing power other than partitioning it up into lots of smaller computers that look like full-fledged computers to their remote users, but are actually a partitioned resource from a larger computer. This trend of partitioning up single machines into lots of little machines is virtualization, and there are tons of variations on this and tons of ways to accomplish it. My need was portability of an encapsulated computer that I could move between Macs and Windows without rebooting or installing. I essentially wanted a development system on a key-chain, which I could reproduce on other virtualization systems and even any “real” hardware easily, whenever and wherever I wanted.

I ended up selecting QEMU, the only truly free and open source (FOSS) virtualization platform that I know of. It’s maybe the slowest one out there, but it has several advantages over the competition. Not only can it run on Macs and Windows without so much as an install or reboot, but it can also emulate other hardware platforms, such as ARM. Google chose it as the virtual testing environment for Android apps before installing them on a phone. Being FOSS means that I can freely distribute complete working virtual machines with little to no licensing issues. This is just an amazing thing. Think about a file you can keep on a USB key-chain, stick in an OS X box, pull up your development work, carry it to a Windows box, continue your work without so much as copying a file, and distribute to friends so they can pull up the identical environment. I like this process so much, I made a website dedicated to helping others do the same, called ShankServer.org.
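For a rough idea of what booting such a portable disk image involves, here is a small Python sketch that only assembles a QEMU command line. The image name `debian.img`, the memory size, the forwarded port, and the `qemu-system-x86_64` binary name are all assumptions for illustration; the binary name and preferred options vary between QEMU versions and host platforms.

```python
def qemu_command(disk_image, memory_mb=512, ssh_port=2222,
                 binary="qemu-system-x86_64"):
    """Build an argument list that boots disk_image with user-mode networking.

    The host port ssh_port is forwarded to port 22 inside the guest,
    so the guest can be reached with ssh on localhost.
    """
    return [
        binary,
        "-m", str(memory_mb),   # guest RAM in megabytes
        "-hda", disk_image,     # the portable disk image file
        "-netdev", f"user,id=net0,hostfwd=tcp::{ssh_port}-:22",
        "-device", "e1000,netdev=net0",
    ]


# "debian.img" is a hypothetical image file sitting on the USB key.
cmd = qemu_command("debian.img")
print(" ".join(cmd))
```

The list can be handed to `subprocess.Popen(cmd)` on any machine where QEMU and the image are present; nothing here launches anything by itself.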

A virtualized Linux server was only half the formula for a platform impervious to the giant undo button. The other half was a distributed revision control system. I cannot overstate how a real DRCS is not the same thing as an old-fashioned centralized version control system like the ubiquitous CVS. This is one of the places where my 20-year track-record rule could not apply, because they didn’t have this stuff back then. What a DRCS gives you that CVS does not is a complete edit-by-edit history of the entire project on every participating node, of which there are infinite potential ones, thanks to the aforementioned virtualization approach. Think of the entire edit history and working copies of your work as being impervious to even a nuclear bomb, because you’d have to wipe out every node everywhere to get rid of the fully-functional copy of your work with its entire accompanying history. It also provides a code deployment strategy.

There are really two competing, nearly identical DRCS packages out there. The first, git, was tempered in the fire that is the Linux kernel itself. And the choice of using git would have been a total no-brainer, were it not for the existence of a nearly identical and marginally easier package called Mercurial. Mercurial had nearly identical design goals, and was developed at about the same time as git, along with having almost identical command syntax. For typical day-to-day use, learning one is the same as learning the other. But what swayed me towards Mercurial was the fact that the documentation was so much better, and it was written in a language that happened to be the one I ultimately chose for programming my gen-5 system, opening the door to allowing me to integrate revision-control features into my own software… but I get ahead of myself. I don’t really want to talk about the language selection until I’ve at least mentioned the text editor.
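As a hedged illustration of wiring revision control into your own software, the sketch below drives the `hg` command-line tool from Python through `subprocess`. This is the plainest approach, not the deeper integration that Mercurial being written in Python makes possible. The repository path `/srv/framework` and the commit message are made up for the example.

```python
import subprocess


def hg_command(repo, *args):
    """Build an hg command line; --repository selects which repo to act on."""
    return ["hg", "--repository", repo, *args]


def run_hg(repo, *args):
    """Run hg and return its standard output.

    Raises subprocess.CalledProcessError if hg exits with an error,
    and FileNotFoundError if hg is not installed.
    """
    result = subprocess.run(hg_command(repo, *args), check=True,
                            capture_output=True, text=True)
    return result.stdout


# Hypothetical repository path and message, built here but not executed.
commit = hg_command("/srv/framework", "commit", "--addremove",
                    "-m", "nightly snapshot")
push = hg_command("/srv/framework", "push")
```

Calling `run_hg("/srv/framework", "log", "--limit", "5")` on a machine with Mercurial and a real repository would return the five most recent changesets as text.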

That’s right. Part of my fifth-generation system platform includes the selection of a text editor as my programming environment instead of an IDE like VisualStudio.NET or Eclipse. I’m doing this for a few reasons, chiefly that two text editors that satisfy the 20-year rule also satisfy the capability requirements. IDEs are one of the biggest vulnerabilities to fire-and-motion, where you become the slave of a particular vendor, helpless but to suffer regular setbacks, and move along with them from one version to the next. I pick on VisualStudio.NET, but with Oracle’s acquisition of Sun, we can see that even Eclipse has issues. A sufficiently powerful FOSS text editor is not a bad choice as an IDE.

There are two granddaddy text editors that perform well as programming environments for different reasons, and the choice between the two is like religion. The first is emacs, and as its name sort of implies, it’s all about macros. It’s customizable to the point of nearly reading your mind. I chose the other one, vi, precisely because it does not depend on that kind of customization. I want to be able to sit down at nearly any machine, know the text editor is there, and be productive in seconds without having to set up my familiar customized environment. vi has the additional advantage of being part of the POSIX Unix standard, meaning that at least a minimal copy of it is already installed on almost any machine you sit down at. There’s a pretty good chance you could log into the tiny embedded version of Unix in your WiFi router or mobile phone, and fire up vi. Plus, using vi is a lot like playing a video-game: you get used to fancy key combinations and an in-game “language” that lets you make very sophisticated statements, like copy this block of text from here to there, without the use of a mouse, and you get better and faster over time. The more you use vi, the better you get, until you start looking like those super-geeks in the movies who just keep typing away doing awesome stuff without ever reaching for the mouse. I’m actually using Vi IMproved (vim), a compatible but updated version of vi that is quickly replacing it as the standard.

Okay, now for the last choice, and it’s the biggest: language. Selecting a language was one of the more difficult steps in choosing a timeless platform. Few still in-use languages have withstood the twenty-year test, except for FORTRAN and LISP, each with over half a century under its belt. And indeed LISP is the recipient of ongoing cases being made for it on the modern Web, by the likes of the prolific Yahoo-Store programmer and venture capitalist, Paul Graham, and the ongoing winners of robotics competitions who swear by it. LISP is what you would call a highly reflective language, or a meta-language. That is, it likes to talk about itself, because there is no difference between code and data. In other words, it is very easy to write code that writes code, making it the darling of the artificial intelligence crowd. As such, you can occasionally write things that, some believe, are simply unachievable in other languages.

While the case for LISP is strong, the main drawback is that it is just so different from other languages that you alienate yourself from the programming community at large, and everything that is actually mainstream looks foreign to you. Namely, in LISP, operators come before arguments. So to add one and one, most languages express it as 1+1, but in LISP, it’s expressed as (+ 1 1). Now, there’s nothing inherently wrong with that, and it even has advantages in managing complexity, but it’s just not how the mainstream world operates, and could put the computer language learner at a disadvantage in the larger world (or job market) context. Even though you’re choosing your primary language, you want it to be broadly marketable and similar enough to other languages so that you can at least make some sense of them merely by looking at the code.
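To show what operator-first notation buys you, here is a toy evaluator written in Python rather than LISP. It is only a sketch: expressions are nested tuples such as `("+", 1, 1)`, and the three-operator table `OPS` is an arbitrary choice for the example. Note how the expression is ordinary data that a program can build or inspect, and how an operator happily takes any number of arguments.

```python
import operator
from functools import reduce

# A deliberately tiny operator table, chosen just for this illustration.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr):
    """Evaluate a nested tuple written operator-first, e.g. ("+", 1, 1)."""
    if not isinstance(expr, tuple):
        return expr  # a bare number evaluates to itself
    op, *args = expr
    values = [evaluate(arg) for arg in args]  # evaluate sub-expressions first
    return reduce(OPS[op], values)            # fold the operator left to right


two = evaluate(("+", 1, 1))                              # LISP: (+ 1 1)
nested = evaluate(("*", ("+", 1, 2), ("-", 10, 4), 2))   # LISP: (* (+ 1 2) (- 10 4) 2)
```

Here `two` is 2 and `nested` is 36, since (1 + 2) * (10 - 4) * 2 = 3 * 6 * 2.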

So what other languages are a consideration? And what factors do you look at? Well, for speed, the only real choice is C++, thanks to decades of compiler and processor optimization. It is the language of hardware drivers and operating system creation. Things like Photoshop are written in C++, and a big goal is the ability to pre-compile fast-executing binary files that can be ported to different, but similar, processor architectures. It’s why Linux can be compiled on such a large diversity of hardware and probably a major reason Apple felt compelled to switch the Mac over to Intel-x86 processors (easier software ports). But the cost of code execution speed is programmer responsibility, so much so that you have to dedicate your professional career to being a C++ programmer in order to stay sharp and write stable code. For similar reasons, I discounted Java as my language choice, but only after several attempts to adopt it. It turns out that one of my criteria is the ability to program casually on the side. I would be a crappy C++ or Java programmer.

On the other extreme of programming (choosing usability over performance) is the now nearly ubiquitous JavaScript. JavaScript is ubiquitous for one simple reason: it’s the language built into nearly every Web browser. Whenever your browser does something fancy, chances are JavaScript is being used. And the language has matured incredibly fast recently under all that pressure, facilitated by a performance war kicked off by Google Chrome. JavaScript meets the “casual programmer” criterion, and shares some of the “meta” characteristics with LISP. For example, the code for representing an “object” in JavaScript is the same as the command for creating it, making it very simple to transfer data between different systems, so much so that JSON (JavaScript Object Notation) is rapidly replacing other very popular forms of data interchange (XML). Unfortunately, there is no mature mainstream way to run JavaScript on servers for “normal” (non-browser-based) programming. It didn’t use to be that way. JavaScript was actually the language for Netscape Commerce Server and the secondary language for Active Server Pages, but it never made the transition into the Web 2.0 world, except in disparate attempts to keep it alive in specialized server niches, with such endeavors as Helma, node.js and Flusspferd. But a default Apache module, it is not. There is sadly no such thing as mod_js. This unfortunately excluded JavaScript from being my language choice, because most of my work is server-based.
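JSON’s usefulness as an interchange format is easy to see from the server side too. The following is a small sketch using Python’s standard `json` module; the `record` contents are invented for the example. The text that goes over the wire looks almost exactly like the literal that created the data in the first place.

```python
import json

# A made-up record; any mix of dicts, lists, strings and numbers works.
record = {"keyword": "debian", "rank": 3, "tags": ["linux", "foss"]}

# Serialize to text that any JSON-speaking system can read...
text = json.dumps(record, sort_keys=True)

# ...and parse it back into native data structures on the other side.
restored = json.loads(text)

print(text)  # {"keyword": "debian", "rank": 3, "tags": ["linux", "foss"]}
```

After the round trip, `restored` compares equal to `record`.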

I had to look to other languages, of which there are tons out there. Some are relatively new, and for highly specialized purposes like optimization (Haskell) or concurrency (Erlang). I discounted these, because I wanted a popular general purpose language. This led me to the mainstream scripting-like languages, which are usually referred to as the “P” in the LAMP platform (Linux, Apache, MySQL and Python/PERL/PHP/Ruby). Ruby is an honorary P language, thanks to belonging philosophically to the set, and the breakout popularity of the joyful framework, Ruby on Rails, based on it. First, we ask why not the default “p”—PERL.

I had learned PERL in the earliest days of the Web, and would never touch it again with a 100-foot pole. Like JavaScript on browsers, PERL is the pee-in-the-pool of servers because it is used for nearly every Linux management script that isn’t written as a Unix shell script. PERL has some excellent ways of running on the server, namely a mod_perl Apache default module, but the language is an eclectic hodgepodge, with about 1000 ways to solve any problem, and a community of people who relish competing to devise the most obscure and confusing ones. It’s like being Alice in Wonderland, with all the people you encounter basically being mean to you for fun. Because PERL satisfied the 20-year rule, it could very easily have become my choice, but because I was going to wrap up a good deal of my personal identity in this language choice, PERL just wasn’t for me. It didn’t match my personality. I wanted clean, sharp, precise. I discounted PHP for similar reasons. It felt too much like VBScript to me. Been there, done that.

Clean, sharp, precise, you say? Well then Ruby’s for me! It’s almost 20 years old. It’s very meta, like LISP. And it has all the power of PERL, particularly the built-in Regular Expressions, and a super-cool way of iterating commands with a single line. It’s a well thought-out language, with a very cool philosophy that manifests as a sparse, easy-to-read syntax by which there tends to be one correct (and easily readable) way of doing each thing, which helps a lot with code maintenance. It forces you into object oriented programming (OO) in a good way. There’s also this really nice system for community module expansion, called RubyGems, and unlike the Alice in Wonderland feeling of PERL, you feel a bit more like Dorothy in Oz, where everyone wants to be your friend. I really loved everything about Ruby, except for a few minor details that made me keep my mind open to alternatives. Namely, the lack of API “client libraries” written in Ruby, which are shortcuts for programmers to interact with external services. This caused me a great deal of pain in attempting to start my fifth-generation framework, which needed to do considerable interaction with the outside world through APIs, particularly Google’s. In fact, leveraging other services, right down to the Google Docs user interface, was really going to be the main characteristic of this next-generation framework. If I wanted these short-cuts for myself, I was going to have to deal with the other “P” language that was the elephant in the corner… Python.

What the heck was Python aside from that other “P” language I heard nothing about? Well, it turns out that in the early days, almost 25% of Google was written in Python, and to deal with the frustration that it’s not more C++-like, they’re developing their own panacea language “Go” to solve the language conundrum. But that’s a story for another time. Google actually hired Python’s creator (who acknowledges it’s named for Monty Python), and it continues to be open source, maintained and aggressively advanced by committee despite its creator being in Google’s employ. Expounding on Python’s benefits here would take another whole article. But in short, it shares many of the benefits of Ruby, but has a variety of server support, built-in list-handlers, and performance optimization options to make your head spin. Also, Python has a certain beauty and elegance reminiscent of Ruby, characterized by “perfect indentation” and the absence of curly-brackets or any other code-block delimiter (aside from indents). It’s hard to express the subtle psychological boost in confidence of knowing your code is indented perfectly. And finally, with the object-oriented mandate of Ruby removed, I felt like my shackles were removed, and I became 10x more productive, not being forced through the OO learning curve. I could have continued working object oriented (Python gives you that choice). I just didn’t.
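Here is a small sketch of that choice in practice; the function and class names are invented for the example. The first version is a plain function with nothing but indentation marking its blocks. The second wraps the same logic in a class, for the cases where keeping state around is worth it.

```python
def word_counts(text):
    """Procedural style: a plain function, no class required."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


class WordCounter:
    """Object-oriented style: same logic, but it accumulates across calls."""

    def __init__(self):
        self.counts = {}

    def feed(self, text):
        for word, n in word_counts(text).items():
            self.counts[word] = self.counts.get(word, 0) + n
        return self  # returning self allows chained calls


flat = word_counts("vi vim vi")
counter = WordCounter().feed("vi vim").feed("vi")
```

Both `flat` and `counter.counts` end up as `{"vi": 2, "vim": 1}`; which style to use is a matter of taste and of how much state the job really needs.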

Well, there you have it. Unlike the previous four incarnations of my generalized system, in which I completely followed the Microsoft path of least resistance, from a primitive pre-VBScript Web technology called IDC/HTX up to the brink of .NET, I stopped at the edge to take stock before falling into that abyss and giving away the rest of my soul to a vendor. I analyzed what my most painful problems were over the years, and they turned out to stem from an effect described by Joel Spolsky as “fire-and-motion”, wherein everything changes around you, and you never get the benefit of long-term advancement on a single, stable platform. Fire-and-motion was aggravated by loss-of-work, resulting from the shifting conditions of life. This is a one-two punch in which even if you still had your now obsolete work, you couldn’t even run it.

The cure, which I have determined to have a 20-year track record and the promise of a 20-year future, is a Debian server that can run on ANY hardware, but which I often have running on a QEMU virtual machine that can be pulled up under Windows and OS X. On this Debian virtual server, I’m using the Mercurial distributed revision control system to ensure that all my work magically flows out to all the many other instances of my system that I have running (easy, due to permissively licensed virtualization software), making every node a full-history potential master repository. On these nodes, I’m using vi (or vim) as my text editor, in which I’m writing Python code that easily interacts with APIs all over the planet, but particularly Google’s. The Python code runs fast and efficiently under various contexts, including Apache, Unix daemons, and scheduled cron tasks.

With such a platform, I am NOT re-inventing the wheel on things that could be done better under the increasingly sophisticated Web services, such as Google Docs, but rather I am using it as my agent to knit together these services.