Mike Levin SEO

Future-proof your technology-skills with Linux, Python, vim & git... and me!

Every Program Eventually Has To Learn How to Email

by Mike Levin SEO & Datamaster, 05/24/2012

Wow, it’s 4:30 PM and I’m only just starting to do my daily journal entry. Today was dominated by meetings, but I also burned some time trying to advance my mission with Tiger2 and getting over the first hurdles. I’m making some interesting discoveries about the price of those qemu binaries for Mac, Linux and PCs being so small. They have a ton of library dependencies. I need to figure out which libraries are common to all systems and leave those out, but then take the more unusual libraries and include those with my Levinux distribution so that it truly works like it’s self-contained, and no install is necessary. But I lost some valuable time, and I really need to deliver on Tameka’s item, which is required for a client deliverable.

This Tameka thing is going to take a whole day dedicated to it. But see if you can’t think through a logical and reasonable way to output CSV files from Tiger without hard-wiring it to this function. Make it generic to the entire system. Coming out as CSV would merely be a side-effect of… of what? Maybe this can be handled like the Influencer tab in Tweet capture, or maybe like listpopping. So, you’ve got a worksheet with lots of separate Google Analytics profiles, and you want to get potentially up to 10,000 keywords PER ROW, which in itself would clobber memory on whatever server it ended up on. But the solution is to email to the user whatever the output of that cell was… or perhaps the entire row. Each row becomes an email.

THAT’S where the difficulty resides. The data is just too large to shove into a cell in a spreadsheet, and I can’t keep it saved on the local file system, because these servers are just load-balanced webheads. There are different ways to tackle this. Different generalizations. You have to be very clear about what it’s doing. There’s also the possibility that I don’t do this in a generalized fashion, and that it’s just hardwired into the function, and just sets a pattern that I could reuse on those occasions when such strangeness is necessary. So, basically, this is just Python programming. You almost have to forget about Tiger being there, except for providing part of the loop structure.

Any function can create a temporary location where no other user or function can collide with it. So, basically when Tameka runs this, a folder is made from her email address, and a sub-folder with the function name. It’s created when the function is first encountered, and destroyed when Tiger stops. Whenever Tiger is restarted, any existing folder in that space that may have been left there gets destroyed. Fresh slate each time. That alone is going to be a bit of a programming challenge: merely WHERE these files are going to go. At least I have a little bit of file writing going on for locks.
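Here is a minimal sketch of that storage request, just to make the folder layout concrete. The names `STORAGE_ROOT`, `request_storage` and `wipe_storage` are hypothetical, and the root location under the system temp directory is an assumption; none of this is Tiger's actual code.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumption: a scratch root under the system temp directory.
# Where Tiger would really keep these folders is still undecided.
STORAGE_ROOT = Path(tempfile.gettempdir()) / "tiger_storage"


def _safe(name):
    """Turn an email address or function name into a single safe folder name."""
    cleaned = re.sub(r"[^A-Za-z0-9._@-]", "_", name)
    # Stripping leading/trailing dots rules out "." and ".." as folder names.
    return cleaned.strip(".") or "_"


def request_storage(user_email, function_name, root=STORAGE_ROOT):
    """Return <root>/<user>/<function>, creating it on first use."""
    folder = Path(root) / _safe(user_email) / _safe(function_name)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def wipe_storage(root=STORAGE_ROOT):
    """Fresh slate: remove anything left behind by an earlier run."""
    shutil.rmtree(root, ignore_errors=True)
```

Calling `wipe_storage()` at startup and shutdown would give the "destroyed when Tiger stops, destroyed again on restart" behavior, and every function would call `request_storage()` the first time it needs somewhere to write.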

At some point after question-mark replacement has occurred, there has to be an emailing phase, where something steps through all the function-named folders for that user, and emails them the zipped-up contents of each of those folders, then deletes the folders. That would be the nice generic behavior, but Tameka actually COMBINES all the files into one for each column. She does that manually right now, but it should be able to be done programmatically at this point much more easily than manually later. What I’ve described is some pretty solid generic behavior that any function could take advantage of knowing that the email phase is sitting there lurking. But concatenating all the files… Hmmmmm.

Okay, how to append together… oh, okay. It’s just up to each function. Any function can request storage, and the system will generically generate a collision-proof location for storage. The function can then proceed to do whatever the heck it wants to in that location. Some functions will drop a file-per-row, but others will always append to a file of the same name. It can always check for the existence of a file of a certain name, and if it doesn’t find it, create it, and then proceed to append to it. In that way, the last and middle rows are all doing the same things as the first row.
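The check-then-append idea might look something like this in Python. `append_row` is a hypothetical helper, and writing an optional header line when the file is first created is my own assumption, not something decided above.

```python
import csv
from pathlib import Path


def append_row(folder, filename, row, header=None):
    """Append one CSV row to folder/filename, creating the file if needed."""
    path = Path(folder) / filename
    is_new = not path.exists()
    # Mode "a" creates the file when it is missing, so the first row and
    # every later row go through exactly the same code path.
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new and header is not None:
            writer.writerow(header)  # assumption: header only on creation
        writer.writerow(row)
    return path
```

Opening in append mode already covers the "if it doesn’t exist, create it" step; the explicit existence check is only there to decide whether to write the header.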

Okay, we have a plan. This won’t feel like hacking or anything. This is a clean, solid approach. I will be able to reuse the capability whenever it comes up. The discussion goes: Oh, that will generate too much data to shove into a spreadsheet… No problem, I’ll just email it to you. But won’t that be too many emails? Won’t it trip off a spam filter? No, because I can only generate one email per column (that needs it) per Tiger-click. And typically only one column will be capable of sending the email. And it will only ever send an email if a question-mark gets replaced.

The question-mark gets replaced with the word “cached” which, when the email phase occurs, gets replaced with the word “zipped”… maybe with the file-size reported, and the last cell in the row gets replaced with the word “sent”. There will have to be some sort of stateful memory of where it left off to know what cell to stick the word “sent” into, but that should be no problem, especially if it’s in the same function.

So break it down. You’re adding something exactly like the lock and unlock function that any other function can use as a resource. And just like lock and unlock, it’s creating file system objects using the user’s name to ensure uniqueness… but now, it’s also using the function name. This establishes a folder that no other user or function can collide with. That function can do whatever the heck it wants to in that space. Some will deposit a file-per-row, while others will append to a file of the same name every time. The details of that file-name and contents of the file are entirely up to the function.

So for Tameka, I’ll be taking the gaprofile value, a keyword and a hit-count for that keyword and appending it onto the end of a file each time. The name of the file is only for identification by the person who it is ultimately going to be sent to. But because of the unique location, the filename could be the same even across columns, and it would still work without collisions. The file, in Tameka’s case, will wind up as a CSV file, because I will write the function so that it does.

After all rows containing question-marks are processed, comes the email phase. The email phase checks the file system for the existence of such file-deposit locations. For each directory it finds, it spins through the files inside that directory, zips them up into an attachment (unless they’re pictures), deleting each file as it finishes with it. That way at the end, you either have a single archive or a bunch of pics. Another spin-through of each directory is performed, adding as a file-attachment each file it finds. If a zip was created, it will only have one file. If it’s pics (which are already compressed, don’t get much advantage from file-compression, and are not visible in emails that way), then it will have multiple attachments. The email is sent to the address from the bookmarklet, and then the files and directories are deleted.
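To think the email phase through, here is a rough Python sketch using only the standard library. Everything named here is hypothetical: `PICTURE_SUFFIXES`, `zip_non_pictures`, `build_email`, `email_phase`, the `output.zip` archive name and the `tiger@example.com` sender are all placeholders. Actual delivery is left to a `send` callable passed in by the caller, so the sketch makes no claim about which mail server would be used.

```python
import mimetypes
import shutil
import zipfile
from email.message import EmailMessage
from pathlib import Path

# Assumption: which extensions count as "pictures" is a guess.
PICTURE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def zip_non_pictures(folder, archive_name="output.zip"):
    """Bundle every non-picture file into one archive, then delete the originals."""
    folder = Path(folder)
    loose = [p for p in sorted(folder.iterdir())
             if p.is_file()
             and p.suffix.lower() not in PICTURE_SUFFIXES
             and p.name != archive_name]
    if not loose:
        return None
    archive = folder / archive_name
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in loose:
            bundle.write(path, arcname=path.name)
    # Originals are removed only once the archive is closed, a slightly
    # more cautious order than deleting each file as it is added.
    for path in loose:
        path.unlink()
    return archive


def build_email(folder, recipient, sender="tiger@example.com"):
    """First pass zips; second pass attaches whatever is left in the folder."""
    folder = Path(folder)
    zip_non_pictures(folder)
    message = EmailMessage()
    message["To"] = recipient
    message["From"] = sender  # placeholder address
    message["Subject"] = f"Tiger output: {folder.name}"
    message.set_content("This output was too large for a spreadsheet cell.")
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        guessed, _ = mimetypes.guess_type(path.name)
        maintype, subtype = (guessed or "application/octet-stream").split("/", 1)
        message.add_attachment(path.read_bytes(), maintype=maintype,
                               subtype=subtype, filename=path.name)
    return message


def email_phase(user_root, recipient, send):
    """One email per function folder; each folder is removed after sending."""
    for folder in sorted(Path(user_root).iterdir()):
        if folder.is_dir():
            send(build_email(folder, recipient))
            shutil.rmtree(folder)
```

In real use, `send` could be the `send_message` method of an `smtplib.SMTP` connection; for a dry run, passing `print` or a list's `append` shows what would go out without sending anything.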

I’m set up well for tomorrow, thought-work and momentum-wise.