Future-proof your tech-skills with Linux, Python, vim & git as I share
with you the most timeless and love-worthy tools in tech and on staying
valuable while machines learn... and beyond.
Use Perceptual Image Hash as Database Primary Key for Cats
by Mike Levin
Thursday, August 04, 2022
Next step? Switch pip install scheduler to pip install huey!
But first! I started showing Python persistent dictionaries yesterday using
sqlitedict and sqlite3. I alluded to the fact that:
The “This Cat Does Not Exist” images I’m downloading could go into a database
The primary key for that cat record could be a perceptual image hash
Showing all the database keys would be the same as showing image thumbnails
Because the primary key is a perceptual image hash, it will naturally dedupe
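Here is a minimal sketch of that dedupe idea, using only the sqlite3 module from the standard library. The table name, column names and hash strings are made up for illustration. A real hash would come from the image itself.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (hash TEXT PRIMARY KEY, image BLOB)")

# Hypothetical (hash, image bytes) pairs. The first and third entries
# stand in for the same cat being downloaded twice.
downloads = [
    ("c3a1f0e49b5d2c78", b"cat-a"),
    ("9f00e1b2c3d4a5f6", b"cat-b"),
    ("c3a1f0e49b5d2c78", b"cat-a"),
]

for key, data in downloads:
    # INSERT OR IGNORE skips any row whose primary key is already present.
    con.execute(
        "INSERT OR IGNORE INTO cats (hash, image) VALUES (?, ?)", (key, data)
    )
con.commit()

# Listing the keys is the "show me every cat" operation.
keys = [row[0] for row in con.execute("SELECT hash FROM cats ORDER BY hash")]
print(keys)
```

Three inserts go in, two rows come out. The primary key constraint does the deduping, so there is no separate "have I seen this cat?" check to write.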
And it’s done. I captured the session for your YouTube viewing pleasure. Go
through this video and you’ll see how we:
Use the Python PIL and ImageHash libraries to calculate a perceptual image
hash for each cat
Use that image hash as the new filename for the cat, replacing sequential
numbering like cat1.jpg, cat2.jpg
Use that image hash as the database primary key in a SQLite3 database
Put that SQLite3 database in a location shared by Windows, Linux, hosts and
containers
Access that SQLite3 database from a Jupyter Notebook on a Windows system in
JupyterLab Desktop
Step through the database and display the cat images from the Notebook
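The steps above could be sketched roughly like this. This is not the code from the video. The function names (perceptual_hash, ingest), the hash_fn parameter and the cats table layout are all hypothetical, and the choice of imagehash.phash is an assumption, since ImageHash offers several hash functions. It assumes Pillow and ImageHash are installed (pip install Pillow ImageHash).

```python
import sqlite3
from pathlib import Path


def perceptual_hash(path):
    """Return a hex perceptual hash string for the image at `path`."""
    # Third-party imports live here so the rest of the sketch only
    # needs the standard library.
    from PIL import Image
    import imagehash

    with Image.open(path) as img:
        # phash is one of several ImageHash functions (an assumption here).
        return str(imagehash.phash(img))


def ingest(folder, db_path, hash_fn=perceptual_hash):
    """Rename every .jpg in `folder` to <hash>.jpg and record it in SQLite."""
    folder = Path(folder)
    con = sqlite3.connect(str(db_path))
    con.execute(
        "CREATE TABLE IF NOT EXISTS cats (hash TEXT PRIMARY KEY, filename TEXT)"
    )
    # Build the file list up front because files get renamed in the loop.
    for src in sorted(folder.glob("*.jpg")):
        key = hash_fn(src)
        dest = folder / f"{key}.jpg"
        if dest != src:
            # replace() overwrites an existing file, so a duplicate cat
            # collapses onto the file that already carries its hash.
            src.replace(dest)
        # The primary key makes a repeated hash a no-op in the database.
        con.execute(
            "INSERT OR IGNORE INTO cats (hash, filename) VALUES (?, ?)",
            (key, dest.name),
        )
    con.commit()
    return con
```

Calling ingest("cats", "cats.db") on a hypothetical cats folder would turn cat1.jpg, cat2.jpg and so on into hash-named files plus one row per unique cat. From a notebook you could then loop over SELECT hash, filename FROM cats and show each file with IPython.display.Image. One caveat: two different but very similar images can share a perceptual hash, and this sketch would treat them as the same cat.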