Mike Levin SEO

Future-proof your technology-skills with Linux, Python, vim & git... and me!

I’m going to use Apache ReWrite Rules to handle URL rewriting in ezscrape tab

by Mike Levin SEO & Datamaster, 07/16/2012

Okay, it’s a bit frustrating to be off-plan, and now even off my off-plan plan. But it’s what I have to do to put food on the table. I am now in a race to re-demonstrate my value as a tool-maker whose work benefits everyone across the board. Now that Tiger is public, I can do this in a large, flashy way that gets some cross-verification factors happening. This is a Peter Drucker knowledge worker thing. This is a personal brand thing. This is a career management thing. You should be becoming most valuable for what comes totally naturally to you and is of immense value to the industry in general. And Tiger should be a tool to help you remain completely immersed in all the right things. Finish ezscraping!

And I have a 5:00 to 5:30 meeting coming up now, so the terribly broken momentum of the day continues right on into the closing hours of the day. Ugh! Okay, let’s see what you can do ANYWAY! Think!

Okay, there’s one thing I want to nip in the bud right now. I want URL rewriting to be a common and easy thing to do in ezscrape, and I don’t want it to become a hacking vector through Python, or to have to keep it hidden from everyone but internal users. So a rewrite rule can’t contain Python code, no matter how easy that would make things. It should be pure regex. Consequently, I need a URL rewrite syntax that is easily understood and won’t waste TWO fields in the scrapelist object. There seems to be only one obvious choice: Apache mod_rewrite URL rewrite rules!

RewriteRule Pattern Substitution [Flags]

RewriteRule ^/user/(.*)$ /user/$1/videos [NC]
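Since the whole rule fits in one string, it can live in a single field and be split apart when needed. Here is a minimal sketch of that split, assuming the parts are separated by whitespace and the flags (if any) come last in square brackets. The name parse_rule is made up for illustration and is not anything in Tiger or Apache.

```python
def parse_rule(rule):
    """Split a mod_rewrite-style rule string into (pattern, substitution, flags).

    Hypothetical helper: assumes whitespace-separated parts with no spaces
    inside the pattern or the substitution.
    """
    parts = rule.split()
    # The leading "RewriteRule" keyword is optional in this sketch.
    if parts and parts[0] == "RewriteRule":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError("a rule needs at least a pattern and a substitution")
    pattern, substitution = parts[0], parts[1]
    # Flags look like "[NC]" or "[NC,L]"; strip the brackets, split on commas.
    flags = parts[2].strip("[]").split(",") if len(parts) > 2 else []
    return pattern, substitution, flags


print(parse_rule("RewriteRule ^/user/(.*)$ /user/$1/videos [NC]"))
# ('^/user/(.*)$', '/user/$1/videos', ['NC'])
```

Nothing in the rule is ever executed as code. It is only data handed to the regex engine, which is the whole point of keeping Python out of it.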

But in Tiger, this URL rewriting can’t even assume it’s using a URL as input (although in a great many cases it will be). As with Twitter, it’s common enough for the main input to be a one-word username that I can’t take the protocol and domain for granted and leave them out of the pattern the way Apache does. The pattern has to match the whole input string. So, the rewrite rule to go from a YouTube channel URL to a YouTube videos URL would be…

^http://www\.youtube\.com/user/(.*)$ http://www.youtube.com/user/$1/videos
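To see how a rule like this might be applied with nothing but Python's re module, here is a rough sketch. The function name rewrite, the channel name somechannel, and the choice to pass non-matching input through unchanged are all my assumptions for illustration. The pattern starts with ^http (no slash after the caret) because the input string begins with the protocol, and the dots in the domain are escaped so they only match literal dots.

```python
import re


def rewrite(value, pattern, substitution, ignore_case=False):
    """Apply one mod_rewrite-style pattern/substitution pair to a string.

    Hypothetical helper: input that does not match comes back unchanged
    (an assumption, so a bare username would pass straight through).
    """
    flags = re.IGNORECASE if ignore_case else 0
    # Keep any backslashes in the substitution literal, then turn
    # Apache-style $1 backreferences into Python's \g<1> form.
    template = substitution.replace("\\", "\\\\")
    template = re.sub(r"\$(\d)", r"\\g<\1>", template)
    new_value, count = re.subn(pattern, template, value, count=1, flags=flags)
    return new_value if count else value


pattern = r"^http://www\.youtube\.com/user/(.*)$"
substitution = "http://www.youtube.com/user/$1/videos"

print(rewrite("http://www.youtube.com/user/somechannel", pattern, substitution))
# http://www.youtube.com/user/somechannel/videos

print(rewrite("somechannel", pattern, substitution))
# somechannel
```

One thing to watch: (.*) is greedy, so a URL that already ends in /videos would get a second /videos appended. A tighter capture such as ([^/]+) would avoid that.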