
Turning Python Huey Data Pipeline into Schedule Repo Alternative

I recently discovered Huey, a task scheduler for Python that is similar to Schedule but with the added convenience of a huey_consumer.py file that is automatically added to the command-line path after a pip install. I'm currently in the process of transitioning my scheduler and data pipeline task queue from two Linux containers to Huey on my Windows system.

Transitioning My Scheduler and Data Pipeline to Huey on Windows

By Michael Levin

Saturday, August 6, 2022

My previous videos walked through setting up a scheduler and a proper data pipeline task queue on two different Linux containers on my Windows system. I kept these processes separate because I understood:

pip install schedule

…but I didn’t understand:

pip install huey

In so many ways they are the same.

In the case of Schedule though, you simply feed the file that imports schedule and defines the scheduling to the Python interpreter:

python scheduler.py

…however, in the case of Huey, there is a huey_consumer.py file that is magically in the command-line path after a pip install huey. That’s a bit freaky to me, but nbdev, which I’ve been using for a while now, does it as well. However, nbdev eliminates the .py file extension and makes it feel like a native OS command (across OSes, which is an awesome trick). Huey doesn’t hide the .py extension. My ~/py310/bin directory currently contains:

(py310) ubuntu@Huey:~/py310/bin$ ls -la
total 56
drwxrwxr-x 1 ubuntu ubuntu  222 Aug  1 19:25 .
drwxrwxr-x 1 ubuntu ubuntu   56 Aug  1 19:23 ..
-rw-r--r-- 1 ubuntu ubuntu 1987 Aug  1 19:23 activate
-rw-r--r-- 1 ubuntu ubuntu  913 Aug  1 19:23 activate.csh
-rw-r--r-- 1 ubuntu ubuntu 2055 Aug  1 19:23 activate.fish
-rw-r--r-- 1 ubuntu ubuntu 9033 Aug  1 19:23 Activate.ps1
-rwxrwxr-x 1 ubuntu ubuntu  978 Aug  1 19:25 huey_consumer
-rwxrwxr-x 1 ubuntu ubuntu 1555 Aug  1 19:25 huey_consumer.py
-rwxrwxr-x 1 ubuntu ubuntu  238 Aug  1 19:25 pip
-rwxrwxr-x 1 ubuntu ubuntu  238 Aug  1 19:25 pip3
-rwxrwxr-x 1 ubuntu ubuntu  238 Aug  1 19:25 pip3.10
lrwxrwxrwx 1 ubuntu ubuntu   10 Aug  1 19:23 python -> python3.10
lrwxrwxrwx 1 ubuntu ubuntu   10 Aug  1 19:23 python3 -> python3.10
lrwxrwxrwx 1 ubuntu ubuntu   19 Aug  1 19:23 python3.10 -> /usr/bin/python3.10
(py310) ubuntu@Huey:~/py310/bin$

Interesting! There’s also a huey_consumer file without a .py extension, but it’s still Python. Its contents differ from the .py file too.
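A likely explanation for the extensionless file, offered as an assumption rather than something I have confirmed against Huey's packaging: pip and setuptools generate small wrapper scripts for any console_scripts entry points a package declares, while plain script files are copied into bin/ as they are. The sketch below uses only the standard library to list the console-script entry points registered in the active environment. The helper name console_scripts is made up for illustration.

```python
from importlib import metadata


def console_scripts():
    """Map each console-script command name to its 'module:function' target."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a select() method for filtering by group.
        selected = eps.select(group="console_scripts")
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get("console_scripts", [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    for name, target in sorted(console_scripts().items()):
        print(f"{name} -> {target}")
```

Run inside the py310 environment, a huey_consumer line in the output would support the guess. If it is absent, the wrapper comes from somewhere else.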

Huey is an alternative to Celery and Luigi. I wonder if those are this screwy. Huey is screwy. The naming conventions boggle. But I have to wrap my mind around it. Forget the huey_consumer that has no extension for now.

The demo.py file that I made per the minimal configuration in the documentation looks like this:

# demo.py
from huey import SqliteHuey
from huey import crontab

huey = SqliteHuey(filename='/tmp/demo.db')

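# Register add() as a task the consumer runs when it is enqueued.
@huey.task()
def add(a, b):
    return a + b

# Register a periodic task; the crontab spec matches every third minute.
@huey.periodic_task(crontab(minute='*/3'))
def every_three_minutes():
    print('This task runs every three minutes')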
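The crontab import is what drives the "every three minutes" behavior. As far as I understand Huey's documented API, crontab(...) returns a function that takes a datetime and says whether a periodic task is due at that minute. Here is a standard-library sketch of that idea. The name every_n_minutes is a hypothetical stand-in, not Huey's implementation.

```python
from datetime import datetime


def every_n_minutes(n):
    """Return a validator that is True when the minute is a multiple of n."""
    def validate(dt):
        return dt.minute % n == 0
    return validate


# Roughly the same intent as a crontab spec of minute='*/3'.
is_due = every_n_minutes(3)

if __name__ == "__main__":
    print(is_due(datetime(2022, 8, 6, 12, 3)))  # minute 3 is a multiple of 3
    print(is_due(datetime(2022, 8, 6, 12, 4)))  # minute 4 is not
```

Seen this way, the schedule is just a yes/no question asked about each minute, which is a different feel from Schedule's chained every(...).do(...) style.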

And this is turned into a service with a file in /etc/systemd/system/ named huey.service which looks like this:

[Unit]
Description=Run Python script to handle scheduling

[Service]
ExecStart=/usr/bin/screen -dmS huey /home/ubuntu/py310/bin/huey_consumer.py demo.huey
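
The demo.huey argument on the ExecStart line reads as a dotted path: the module demo, then the huey object defined inside it. A minimal sketch of how such a path can be resolved with the standard library follows. load_object is a hypothetical helper for illustration, not Huey's own loader, and json.dumps stands in for demo.huey so the example runs without Huey installed.

```python
import importlib


def load_object(dotted_path):
    """Import the module part of 'package.module.attr' and return attr."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


if __name__ == "__main__":
    # json.dumps is only a stand-in target for the demonstration.
    dumps = load_object("json.dumps")
    print(dumps({"a": 1}))
```

This also suggests why demo.py has to be importable from wherever the service starts: the lookup begins with an ordinary module import.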


And so moving this work over to my other container that has only primitive scheduling is just a matter of doing a pip install huey and putting those 2 files in place. But instead of running 2 scheduler services, I’ll replace the old one with Huey. I wish Huey had as good an API for setting schedules as Schedule does.