Turning Python Huey Data Pipeline into Schedule Repo Alternative
by Mike LevinSaturday, August 06, 2022
My previous videos walked through setting up a scheduler and a proper data pipeline dask queue on two different Linux containers on my Windows system. I kept these processes separate because I understood:
pip install schedule
…bit I didn’t understand:
pip install huey
In so many ways they are the same.
- They both allow you to schedule tasks with Python decorators at the top of Python functions.
- They both require you to run a long-running Python script for the scheduled events to occur.
In the case of Schedule though, you simply feed the file that imports schedule and defines the scheduling to the Python interpreter:
…however, in the case of Huey, they have a huey_consumer.py file that is magically in the command-line path after a pip install huey. That’s a bit freaky to me, but nbdev does it as well, which I’ve been using for awhile now. However nbdev eliminates the .py file-extension and makes it feel like a native OS command (across OS’s, which is an awesome trick). Huey doesn’t hide the .py extension. My ~/py310/bin directory currently contains:
(py310) ubuntu@Huey:~/py310/bin$ ls -la total 56 drwxrwxr-x 1 ubuntu ubuntu 222 Aug 1 19:25 . drwxrwxr-x 1 ubuntu ubuntu 56 Aug 1 19:23 .. -rw-r--r-- 1 ubuntu ubuntu 1987 Aug 1 19:23 activate -rw-r--r-- 1 ubuntu ubuntu 913 Aug 1 19:23 activate.csh -rw-r--r-- 1 ubuntu ubuntu 2055 Aug 1 19:23 activate.fish -rw-r--r-- 1 ubuntu ubuntu 9033 Aug 1 19:23 Activate.ps1 -rwxrwxr-x 1 ubuntu ubuntu 978 Aug 1 19:25 huey_consumer -rwxrwxr-x 1 ubuntu ubuntu 1555 Aug 1 19:25 huey_consumer.py -rwxrwxr-x 1 ubuntu ubuntu 238 Aug 1 19:25 pip -rwxrwxr-x 1 ubuntu ubuntu 238 Aug 1 19:25 pip3 -rwxrwxr-x 1 ubuntu ubuntu 238 Aug 1 19:25 pip3.10 lrwxrwxrwx 1 ubuntu ubuntu 10 Aug 1 19:23 python -> python3.10 lrwxrwxrwx 1 ubuntu ubuntu 10 Aug 1 19:23 python3 -> python3.10 lrwxrwxrwx 1 ubuntu ubuntu 19 Aug 1 19:23 python3.10 -> /usr/bin/python3.10 (py310) ubuntu@Huey:~/py310/bin$
Interesting! There’s a huey_consumer file but without a .py extension there, but it’s still Python. It’s different from the .py file too.
Huey is an alternative to Celery and Luigi. I wonder if those are this screwy. Huey is screwy. The naming conventions boggle. But I have to wrap my mind around it. Forget the huey_consumer that has no extension for now.
The demo.py file that I made per the minimal configuration in the documentation looks like this:
# demo.py from huey import SqliteHuey from huey import crontab huey = SqliteHuey(filename='/tmp/demo.db') @huey.task() def add(a, b): return a + b @huey.periodic_task(crontab(minute='*/3')) def every_three_minutes(): print('This task runs every three minutes')
And this is turned into a service with a file in /etc/systemd/system/ named huey.service which looks like this:
[Unit] Description=Run Python script to handle scheduling [Service] Type=forking Restart=always RestartSec=5 User=ubuntu Group=ubuntu WorkingDirectory=/home/ubuntu/github/demo/ ExecStart=/usr/bin/screen -dmS huey /home/ubuntu/py310/bin/huey_consumer.py demo.huey StandardOutput=syslog StandardError=syslog [Install] WantedBy=multi-user.target
And so moving this work over to my other container that has only primitive scheduling is just a matter of doing a pip install huey and putting those 2 files in place. But instead of 2 scheduler services, I’ll replace the old one with Huey. I wish huey had such a good API for setting scheduling as pip install schedule.