Write a Linux Scheduler Service in Python

by Mike Levin

Monday, August 01, 2022

These are the notes for a more involved video soon to come that takes this scheduling stuff to the next step: scheduling external scripts and then data pipelining.

Have you heard of Data Pipelining?

I’m not talking about operating system data pipes that use those vertical bars (|) in Unix, Linux and AmigaDOS to funnel data out of one program into another, although there are similarities. Rather, the data pipeline process means adopting workflow conventions so that everyone in an organization getting data from point A to point B is doing it the same, correct way.

Honestly, it’s the sort of stuff that stinks of extra moving parts and technical liability to me unless it’s really, really called for. Probably the most popular data pipeline system is Apache Airflow. In the Python world, Luigi is pretty popular. I would rather not touch either with a 100-foot pole. I prefer a lightweight approach. Luckily, lightweight data pipelining exists, and one option is Huey, which you saw me install just after creating a new Linux container and installing Python 3.10 in the last video.

In this video I’m going to make a precursor to the data pipelining video based on a more basic concept of scheduling.

I like to:

pip install schedule

This gives me “scheduling for humans” in Python. You can read about it at [https://pypi.org/project/schedule/](https://pypi.org/project/schedule/). Their example is this:

import schedule
import time

def job():
    print("I'm working...")

schedule.every(10).seconds.do(job)
schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)

I’ve used this sort of scheduling to great effect for many years, and when data pipelining came along I resisted, because this gave me most of what I wanted with the least amount of effort. Scheduled items in the above scenario are plain Python functions.

End of story! But this does not provide a lot of flexibility in terms of managing files and keeping schedulable tasks as separate folders (git/GitHub repos). Processes that need to get scheduled, in my way of thinking, usually turn out to be their own scripts, usually in their own repos (folders or directories). There’s a certain “tied-together-ness” implied here: all scheduled items would have to live in that one file or be importable as Python modules. Launching separate scripts calls for subprocess:

import shlex
from datetime import datetime
from os import environ
from sys import stdout
from subprocess import Popen, PIPE


environ["PYTHONUNBUFFERED"] = "1"


# A task defined as an ordinary Python function
def pulse():
    anow = f"{datetime.now()}"
    print(anow)
    with open("/tmp/pulse.txt", 'a') as fh:
        fh.write(anow + '\n')


def run(command, cwd=None):
    process = Popen(
        shlex.split(command),
        stdout=PIPE,
        cwd=cwd,
        bufsize=1,
        universal_newlines=True,
        shell=False,
    )
    for line in process.stdout:
        line = line.rstrip()
        print(line)
        # Put logging here
        stdout.flush()
    return process.wait()  # reap the child and report its exit code


def onepulse():
    pyx = "/home/ubuntu/py310/bin/python3.10"
    cwd = "/home/ubuntu/github/pulse/"
    cmd = f"{pyx} {cwd}onepulse.py"
    run(cmd, cwd=cwd)
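Wiring this into the scheduler is then one line per task, e.g. schedule.every().day.at("06:30").do(onepulse) (the time of day is my placeholder, not from the video). Because run() streams the child’s stdout line by line, the service’s own log shows the external script’s output in real time. Here is a self-contained demonstration of that streaming pattern, using sys.executable and a one-liner in place of the hard-coded interpreter path and onepulse.py:

```python
import shlex
import sys
from subprocess import Popen, PIPE

def run(command, cwd=None):
    """Stream a child process's stdout line by line, then reap it."""
    process = Popen(shlex.split(command), stdout=PIPE,
                    universal_newlines=True, cwd=cwd)
    for line in process.stdout:
        print(line.rstrip(), flush=True)
    return process.wait()  # exit code of the child

# Any external script works; this one-liner stands in for onepulse.py.
code = 'import datetime; print(datetime.datetime.now().isoformat())'
exit_code = run(f'{sys.executable} -c "{code}"')
```

The shlex.split() call honors the shell-style quoting around the -c payload without actually invoking a shell, which keeps the command safe to build from variables.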