Debugging an HTML5 window.postMessage Event

by Mike Levin SEO & Datamaster, 02/01/2012

As productive as yesterday was, today is full of meetings. Hopefully, the second half of the day will be more productive than the first. I have about 45 minutes before my next meeting, and I want to make the best of it, getting myself on the right track so that I can hit the ground running as soon as the opportunity arises.

Yesterday, I took care of a fundamental piece of making my work public: getting rid of the need to manually reformat spreadsheets for readability after they are created through the Google Spreadsheets API. To do that, I resorted to the Google Documents List API so that I could start out with a previously existing, pre-formatted spreadsheet. I’ve been calling them templates, but they’re just another Google Spreadsheet sitting there with columns already named, cell word-wrapping turned off, and a little bit of text formatting.

I resisted this approach over the years for purity reasons, which I now see are greatly outweighed by the benefit of things just making more sense to the user, something that is necessary for introducing the app to the public. The workaround will always bug me, and I hereby request that the Google Spreadsheets API team add formatting capabilities to the API.

Today, the thing I need to enforce is “stop means stop”. In other words, there’s a task running between an external server and the Google APIs, and you’re privileged to watch the data populate in real time in Google Spreadsheets, but for all intents and purposes, you’re just a passive observer with no reliable back-channel of communication to actually make the task stop. So, if you ask for a 1000-page site crawl, you might have to wait for all 1000 pages to be collected before you can do anything else with that document, which is a hugely frustrating position to be in.

You might say: “Wait! You have a pop-up panel showing with a close-gadget. Surely clicking that close-gadget could reliably tell the task to stop.” But I’m using Apache2 here, and the crawl is essentially one continuous page-load for its entire duration. Clicking the close-gadget would be like saying: “Stop loading this page that I already asked for!” The most graceful way to do that is to make the page stop loading on its own, that is, reach completion. So any communication back-channel we might have access to needs to make the already-occurring page-load decide to stop loading, or more specifically, exit out of whatever loop it might be in.
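To make the idea concrete, here is a minimal sketch in Python of a long-running loop that decides on its own to stop. The names `crawl`, `fetch`, and `should_stop` are hypothetical stand-ins for illustration, not the app's actual code; the only point is that the stop request gets checked inside the loop, once per iteration.

```python
from typing import Callable, Iterable, List


def crawl(urls: Iterable[str],
          fetch: Callable[[str], str],
          should_stop: Callable[[], bool]) -> List[str]:
    """Process URLs one at a time, checking for a stop request before each."""
    results = []
    for url in urls:
        # The loop itself has to notice the stop request and break out,
        # which lets the page-load reach completion normally.
        if should_stop():
            break
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins: a fake fetcher, and a stop signal that
    # fires once three pages have been collected.
    fetched = []

    def fake_fetch(url):
        fetched.append(url)
        return "<html>%s</html>" % url

    pages = crawl(("page%d" % i for i in range(1000)),
                  fake_fetch,
                  lambda: len(fetched) >= 3)
    print(len(pages))
```

The catch is that a stop request is only honored between iterations, so one slow fetch still has to finish before the loop can exit.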

State, state, state… oh, how I hate state. My app is built to be nomadic, shifting and flowing from server to server, instance to instance. There should be no configuration files and no maintaining of session between page-loads… or for that matter, session within the same page-load, since a single page-load can take so incredibly long with this system. Consequently, I resist applications that involve local locking-files and such. But Python made this approach so easy with shelves that I could hardly resist. And as such, I write name/value pairs into a database in the tmp directory to track who is locked and who is released, and have a very simple set of supporting functions: lock, unlock, islocked and isunlocked. The interface is ideal, but the reliability is somewhat less than ideal, and it has to be rock-solid for public consumption.
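For the curious, a shelve-backed version of those four functions might look roughly like the sketch below. This is an illustration written for Python 3, not my actual code: the `LOCK_DB` path, the `db_path` parameter, and the choice to store plain True/False values are all assumptions.

```python
import os
import shelve
import tempfile

# Hypothetical location: all the post says is that the database
# lives in the tmp directory.
LOCK_DB = os.path.join(tempfile.gettempdir(), "task_locks")


def lock(key, db_path=LOCK_DB):
    """Mark `key` as locked."""
    with shelve.open(db_path) as db:
        db[key] = True


def unlock(key, db_path=LOCK_DB):
    """Mark `key` as released."""
    with shelve.open(db_path) as db:
        db[key] = False


def islocked(key, db_path=LOCK_DB):
    """Return True only if `key` is currently marked as locked."""
    with shelve.open(db_path) as db:
        # A key that was never written counts as unlocked.
        return bool(db.get(key, False))


def isunlocked(key, db_path=LOCK_DB):
    """Return True if `key` is released or was never locked."""
    return not islocked(key, db_path)
```

One caveat worth keeping in mind: the standard library documentation says shelve does not support concurrent read/write access to the same shelf, so a sketch like this leans on the reads and writes being short and infrequent.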

My first guard against this is going to be limiting the number of rows that can be processed in the spreadsheet to a fixed amount every time you click the bookmarklet. But that’s subject matter for another blog post. This article is about the fail-safe mechanism and not the usage throttle. Soooooo…

The locking/unlocking method that I created used to work really well, but now it’s flaky. I reviewed the code, and nothing in the code has changed, but I am changing the servers all the time, shifting from rack servers to SheevaPlugs to qemu virtual machines to Rackspace cloud servers. Execution context is very dynamic, and there is the possibility that any reading and writing of local files is subject to permission fubar, even in /tmp, which should be reliably writable and readable by any user.
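One quick way to probe that suspicion on any given server is to round-trip a throwaway shelf in the directory in question, running as the same user the web server runs as. The sketch below is a hypothetical diagnostic, not part of the app; the function name `shelf_roundtrip_ok` and the probe file name are made up for illustration.

```python
import glob
import os
import shelve
import tempfile


def shelf_roundtrip_ok(directory=None):
    """Return True if a shelf can be created, written, and read back."""
    directory = directory or tempfile.gettempdir()
    # Hypothetical probe name; the pid keeps parallel runs from colliding.
    path = os.path.join(directory, "shelf_probe_%d" % os.getpid())
    try:
        with shelve.open(path) as db:
            db["probe"] = "ok"
        with shelve.open(path) as db:
            return db.get("probe") == "ok"
    except Exception as exc:
        # Permission problems and dbm backend errors both land here.
        print("shelf check failed in %s: %r" % (directory, exc))
        return False
    finally:
        # Some dbm backends add their own file extensions, so remove
        # everything that starts with the probe path.
        for leftover in glob.glob(glob.escape(path) + "*"):
            os.remove(leftover)


if __name__ == "__main__":
    print(shelf_roundtrip_ok())
```

A passing check only says the directory is usable by the current user at that moment; it would not catch a shelf that was created earlier by a different user and left behind with restrictive permissions.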

Before going and basing a solution on this assumption, I need to test my hypothesis. I am going to try to stop a simple 25-page site crawl and confirm that the “Stop” link actually isn’t working…

Ugh! An 800-row URL validation job is being started, and for some reason, because of it, my site-crawl can’t even start. That doesn’t bode well for the public version, except for the fact that the load-balancer will mitigate it. In the meantime, I have an issue with doing my own development work without interfering with their work. Shall I solve this now? I have the means by just switching over to a virtual machine of SheevaPlug… no, not right now.

Okay, I was able to stop a crawl. My method probably is working, and it’s just on question-mark replacement where it’s not working. Test that. Yep, confirmed. It’s the canceling of question-mark replacement that’s fubar’d. Okay…

…okay. Occam’s friggin’ razor. I had properly implemented file locking and unlocking through Python shelves, and debugging demonstrated to me that I was just passing variables around incorrectly in JavaScript. I am relying on a feature of HTML5, window.postMessage, whereby window event listeners can pass messages between page components. It’s a happy little feature that HTML5 had to add in order to open up a little pinhole of communication that all the cross-site scripting (XSS) hacking had forced browsers to shut off.

Okay, so I fixed the bug, and now hitting the stop gadget stops my application, whether it’s crawling a site from scratch, or doing the question-mark replacement with already populated rows. By the way, for anyone bothering to read this: it will make sense once the app is made public. This will probably just serve as fun historical notes and a view into the sausage factory. I feel that I’m going to let most blog entries like this one fade into oblivion by only allowing them to be reached through the archive via date links. That’s my way of using this blog for day-to-day notes like this without polluting the (eventually) pristine and designed experience I wish to present here.

I know from experience now, however, that I can’t stop there, satisfied. The technology of “stopping” my application is fragile. There is a pop-up panel in an iframe. That iframe contains a graphic with a JavaScript link set with variables passed across frames, which when clicked sends messages through HTML5-dependent window event listeners. It’s something of a miracle this whole system works at all. I have in fact had to make several adjustments as browsers evolved, and the thing hardly runs in MSIE at all now, due to XSS exploit lock-downs, including some ridiculous precautions.

Okay, don’t concern yourself with that at all now. Instead, build in a backdoor way of stopping the process that can be easily implemented, and easily communicated among users. See if you can’t slam this out in a half-hour!

There are two loops where this would apply: a crawl-loop and a question-mark replacement loop. And potentially listpop and dedupe behavior, but forget about those for now. How SIMPLY can this be implemented? How similar can it be to the existing lock/release system? Okay, I ran out of time for today. I got a nice chisel-strike done today, in what turned out to be debugging. I have another chisel-strike project, which will be fail-safing. I would have liked to have lumped them together and done them both today, but let that be a lesson to me. This is where time gets lost. I just have to evaluate when and where it’s worth it.