XML, CMS and SEO

by Mike Levin SEO & Datamaster, 07/11/2005

Tonight was a big night for me because I reasserted control over my website. My main homepage had fallen badly out of date, partly because the new job I started in NYC was all-consuming, and partly because my custom-built CMS and its data resided on my prior employer’s server. I pulled my personal website data down as XML files and wrote an “absorption” program to plug the data back into the new version of the custom CMS. With a little field mapping, I can now probably pull data from just about any XML source, making it easy to migrate from Blogger or TypePad to my CMS, or vice versa.
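To make “a little field mapping” concrete, here is a minimal sketch of an absorption step using Python’s standard `xml.etree.ElementTree`. The `<export>`/`<entry>` layout and the element names in `FIELD_MAP` are hypothetical stand-ins, not the real Blogger, TypePad, or custom CMS schemas; the point is only that one small dictionary decides which source element becomes which CMS field, and unmapped elements are dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a source export's element names to the
# field names a custom CMS might expect. Adjust the keys per source.
FIELD_MAP = {
    "headline": "title",
    "body": "content",
    "pubdate": "date",
}

def absorb(xml_text, field_map=FIELD_MAP, record_tag="entry"):
    """Turn each <record_tag> element into a dict keyed by CMS field names."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.iter(record_tag):
        record = {}
        for child in entry:
            target = field_map.get(child.tag)
            if target is not None:
                # Strip surrounding whitespace; empty elements become "".
                record[target] = (child.text or "").strip()
        records.append(record)
    return records

sample = """<export>
  <entry>
    <headline>XML, CMS and SEO</headline>
    <pubdate>2005-07-11</pubdate>
    <body>Tonight was a big night.</body>
    <ignored>not mapped</ignored>
  </entry>
</export>"""

print(absorb(sample))
```

Migrating in the other direction is the same idea with the dictionary inverted.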

Separation of data from presentation delivers on its promise in Web development. It doesn’t matter which CMS you’re using, so long as you can pull the data out of it, transform it, and render it properly for display. Rendering rules vary from system to system, but each has some kind of rendering engine, and I expect to be able to choose the rendering method on a per-record basis. In other words, should this record render with a transform from an XSLT style sheet, with Wiki-style RegEx replacement, with simple CSS styling, or with some combination of all three?
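One way to picture per-record rendering is a lookup table of renderers keyed by a field on the record itself. The sketch below is an assumption, not any particular CMS’s design: the `render` field name, the `''italic''`/`'''bold'''` wiki syntax, and the `entry` CSS class are all hypothetical. An XSLT renderer would slot into the same table, but Python’s standard library has no XSLT processor, so it is left out here.

```python
import html
import re

def render_wiki(text):
    # Wiki-style RegEx replacement (assumed syntax): '''bold''' and ''italic''.
    # Escape &, <, > first; single quotes are left alone so the markup survives.
    text = html.escape(text, quote=False)
    # Bold before italic so triple quotes are not eaten by the italic rule.
    text = re.sub(r"'''(.+?)'''", r"<strong>\1</strong>", text)
    text = re.sub(r"''(.+?)''", r"<em>\1</em>", text)
    return text

def render_css(text, css_class="entry"):
    # Plain escaped text in an element that a style sheet can target.
    return f'<div class="{css_class}">{html.escape(text)}</div>'

# Hypothetical registry; an XSLT-based renderer could be added here.
RENDERERS = {"wiki": render_wiki, "css": render_css}

def render(record):
    # Each record names its own rendering rule; unknown or missing -> CSS.
    renderer = RENDERERS.get(record.get("render", "css"), render_css)
    return renderer(record["content"])

print(render({"render": "wiki", "content": "A '''big''' night, ''really''."}))
print(render({"content": "Plain <text>"}))
```

A “combination of all” would just mean letting the `render` field hold a list of renderer names and applying them in order.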

Because everyone agrees on XML as the interoperable data format, and because we have such powerful standardized transformation technologies in XSLT and regular expression matching, data can come from almost anywhere, be reshaped in almost any way, and be plugged into almost anything. The particular choice of content management system matters less and less, so long as it offers enough well-thought-out fields to work with when transforming into HTML elements. In other words, something needs to become the filename, something needs to become the title, and so on. For syndication, a well-thought-out abstract is better than an abruptly chopped-off paragraph with an ellipsis.
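Here is a small sketch of turning a record’s fields into page-level pieces: a filename derived from the title, the title itself, and an abstract. The field names (`title`, `content`, `abstract`) and the 160-character limit are assumptions for illustration. The fallback logic reflects the preference above: use a hand-written abstract when one exists, and only otherwise cut the body at a sentence boundary, resorting to a word boundary plus an ellipsis as a last resort.

```python
import re

def slugify(title):
    # Lowercase and collapse runs of non-alphanumerics into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def abstract_for(record, limit=160):
    # Prefer a well-thought-out, hand-written abstract when present.
    if record.get("abstract"):
        return record["abstract"]
    text = record.get("content", "")
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Try to end on a full sentence inside the limit.
    end = max(cut.rfind(". "), cut.rfind("! "), cut.rfind("? "))
    if end != -1:
        return cut[: end + 1]
    # Last resort: chop at a word boundary and add an ellipsis.
    space = cut.rfind(" ")
    return (cut[:space] if space != -1 else cut) + "..."

def page_fields(record):
    # Hypothetical mapping of record fields to page-level elements.
    return {
        "filename": slugify(record["title"]) + ".html",
        "title": record["title"],
        "abstract": abstract_for(record),
    }

print(page_fields({"title": "XML, CMS and SEO", "content": "Tonight was a big night."}))
```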

Why did I want to grab back control of my system just now? Because it’s time to start tipping my hand regarding the systems I have developed, and I need a soapbox. I’ve created a hyper-competitive system for organic search engine optimization. I’m firmly in the ranks of the white hat optimizers; I just happen to be a white hatter who can transform any data from any source into perfectly constructed pages employing the search optimization formula du jour. Wait, you say. What about PageRank? The control these transforms give you over internal link structure plays directly into PageRank. You have a certain amount of your own authority to distribute among your pages, and most people don’t do it deliberately, because constructing the proper internal link structures is very, very labor intensive (without the right tools, of course).
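To illustrate how internal links move your own authority around, here is the textbook PageRank calculation (as originally published by Brin and Page, with the conventional 0.85 damping factor) run by power iteration over a tiny hypothetical site. This is a simplified model, not how any search engine actually scores pages today, and it ignores inbound links from other sites; the page names are made up. It simply shows that the pages you link to most, from pages that themselves hold rank, end up with the larger share.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share regardless of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical internal link structure for a four-page site.
site = {
    "index": ["seo", "xml", "cms"],
    "xml": ["index", "seo"],
    "cms": ["index", "seo"],
    "seo": ["index"],
}
for page, score in sorted(pagerank(site).items(), key=lambda kv: -kv[1]):
    print(f"{page:6s} {score:.3f}")
```

Changing which pages link where, and rerunning, is exactly the kind of experiment that is tedious by hand but trivial once pages are generated from data.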