Setting the Stage: Context for the Curious Book Reader
A Serendipitous Spiral Into Python Packaging History
This journal entry captures the serendipitous, spiraling nature of deep technical work. It begins with a focused analysis of a modern, opinionated Python web framework, FastHTML, and its tightly-integrated database. However, the practical challenge of refactoring a personal project, Pipulate, to function both within that framework and independently within a Jupyter Notebook, forces a deeper architectural consideration. This leads to the formulation of a design pattern I’ve dubbed the ‘Chameleon Constructor.’ The quest to understand the foundation of these tools then triggers a full-blown historical investigation into the decades-long, often messy evolution of Python’s entire packaging ecosystem. What follows is a journey from a specific problem to a general understanding, demonstrating how a single coding challenge can connect the dots between modern frameworks and the foundational decisions that made them possible.
Technical Journal Entry Begins
The Opinionated World of FastHTML
One of the weird, wonderful things about FastHTML (vs. FastAPI, Flask and such) is that FastHTML provides a built-in database as part of a factory function called `fast_app`. This factory is how you generate the webserver object — which everyone who uses Flask-inspired web microframeworks will recognize as the `app` object. The `app` object is the persistent connection to the Uvicorn webserver, a separate product that gets installed when you `pip install python-fasthtml` and is sort of like the equivalent of NodeJS. It's what's called an ASGI webserver, ASGI standing for Asynchronous Server Gateway Interface. The `app` object COULD be created without any database services running alongside it, by using FastHTML like this without going through `fast_app`:
Gemini, help me write this example. Double-check it based on the fact that the canonical example at GitHub is:
```python
from fasthtml.common import *

app,rt = fast_app()

@rt('/')
def get(): return Div(P('Hello World!'), hx_get="/change")

serve()
```
…knowing that `from fasthtml.common import *` does this:
```python
import uvicorn
from dataclasses import dataclass
from typing import Any
from .starlette import *
from fastcore.utils import *
from fastcore.xml import *
from apswutils import Database
from fastlite import *
from .basics import *
from .pico import *
from .authmw import *
from .live_reload import *
from .toaster import *
from .js import *
from .fastapp import *
```
…and that `fastapp.py` contains this:

```python
"""The `fast_app` convenience wrapper"""
import inspect,uvicorn
from fastcore.utils import *
from fastlite import *
from .basics import *
from .pico import *
from .starlette import *
from .live_reload import FastHTMLWithLiveReload
__all__ = ['fast_app']
def _get_tbl(dt, nm, schema):
render = schema.pop('render', None)
tbl = dt[nm]
if tbl not in dt: tbl.create(**schema)
else: tbl.create(**schema, transform=True)
dc = tbl.dataclass()
if render: dc.__ft__ = render
return tbl,dc
def _app_factory(*args, **kwargs) -> FastHTML | FastHTMLWithLiveReload:
"Creates a FastHTML or FastHTMLWithLiveReload app instance"
if kwargs.pop('live', False): return FastHTMLWithLiveReload(*args, **kwargs)
kwargs.pop('reload_attempts', None)
kwargs.pop('reload_interval', None)
return FastHTML(*args, **kwargs)
def fast_app(
db_file:Optional[str]=None, # Database file name, if needed
render:Optional[callable]=None, # Function used to render default database class
hdrs:Optional[tuple]=None, # Additional FT elements to add to <HEAD>
ftrs:Optional[tuple]=None, # Additional FT elements to add to end of <BODY>
tbls:Optional[dict]=None, # Experimental mapping from DB table names to dict table definitions
before:Optional[tuple]|Beforeware=None, # Functions to call prior to calling handler
middleware:Optional[tuple]=None, # Standard Starlette middleware
live:bool=False, # Enable live reloading
debug:bool=False, # Passed to Starlette, indicating if debug tracebacks should be returned on errors
title:str="FastHTML page", # Default page title
routes:Optional[tuple]=None, # Passed to Starlette
exception_handlers:Optional[dict]=None, # Passed to Starlette
on_startup:Optional[callable]=None, # Passed to Starlette
on_shutdown:Optional[callable]=None, # Passed to Starlette
lifespan:Optional[callable]=None, # Passed to Starlette
default_hdrs=True, # Include default FastHTML headers such as HTMX script?
pico:Optional[bool]=None, # Include PicoCSS header?
surreal:Optional[bool]=True, # Include surreal.js/scope headers?
htmx:Optional[bool]=True, # Include HTMX header?
exts:Optional[list|str]=None, # HTMX extension names to include
canonical:bool=True, # Automatically include canonical link?
secret_key:Optional[str]=None, # Signing key for sessions
key_fname:str='.sesskey', # Session cookie signing key file name
session_cookie:str='session_', # Session cookie name
max_age:int=365*24*3600, # Session cookie expiry time
sess_path:str='/', # Session cookie path
same_site:str='lax', # Session cookie same site policy
sess_https_only:bool=False, # Session cookie HTTPS only?
sess_domain:Optional[str]=None, # Session cookie domain
htmlkw:Optional[dict]=None, # Attrs to add to the HTML tag
bodykw:Optional[dict]=None, # Attrs to add to the Body tag
reload_attempts:Optional[int]=1, # Number of reload attempts when live reloading
reload_interval:Optional[int]=1000, # Time between reload attempts in ms
static_path:str=".", # Where the static file route points to, defaults to root dir
body_wrap:callable=noop_body, # FT wrapper for body contents
nb_hdrs:bool=False, # If in notebook include headers inject headers in notebook DOM?
**kwargs)->Any:
"Create a FastHTML or FastHTMLWithLiveReload app."
h = (picolink,) if pico or (pico is None and default_hdrs) else ()
if hdrs: h += tuple(hdrs)
app = _app_factory(hdrs=h, ftrs=ftrs, before=before, middleware=middleware, live=live, debug=debug, title=title, routes=routes, exception_handlers=exception_handlers,
on_startup=on_startup, on_shutdown=on_shutdown, lifespan=lifespan, default_hdrs=default_hdrs, secret_key=secret_key, canonical=canonical,
session_cookie=session_cookie, max_age=max_age, sess_path=sess_path, same_site=same_site, sess_https_only=sess_https_only,
sess_domain=sess_domain, key_fname=key_fname, exts=exts, surreal=surreal, htmx=htmx, htmlkw=htmlkw,
reload_attempts=reload_attempts, reload_interval=reload_interval, body_wrap=body_wrap, nb_hdrs=nb_hdrs, **(bodykw or {}))
app.static_route_exts(static_path=static_path)
if not db_file: return app,app.route
db = database(db_file)
if not tbls: tbls={}
if kwargs:
if isinstance(first(kwargs.values()), dict): tbls = kwargs
else:
kwargs['render'] = render
tbls['items'] = kwargs
dbtbls = [_get_tbl(db.t, k, v) for k,v in tbls.items()]
if len(dbtbls)==1: dbtbls=dbtbls[0]
return app,app.route,*dbtbls
```

…which, as you can see from the way it returns an `app` object, an `rt` object (a Werkzeug-style `app.route` method, abbreviated and coming via Starlette), and something splatted from a `fastlite` component, means we could provide the more "pure" `app` object alone, following the classic Flask pattern, without going through `fast_app` at all, for educational purposes that reveal how `fastlite` is being layered in.
Gemini: Of course. You've done an excellent job breaking down the purpose of the `fast_app` factory function. Your analysis is spot on: `fast_app` is a convenience wrapper that instantiates the core `FastHTML` web application object and optionally layers in database services via `fastlite`.

By bypassing the factory, you can instantiate the `FastHTML` class directly, which mirrors the classic pattern of `app = Flask(__name__)` or `app = FastAPI()`.

Here is an example of how to create the `app` object without the `fast_app` factory, with explanations based on the source code you provided.
“Pure” FastHTML App Instantiation
This example bypasses the `fast_app` factory to create a web application without the integrated database setup.
```python
# We import the specific names we need instead of using a wildcard.
# FastHTML is the core application class, similar to Flask or FastAPI.
# (hx_get is passed as a keyword attribute on components below, not imported.)
from fasthtml.common import FastHTML, Div, P, serve

# 1. Instantiate the app object directly from the FastHTML class.
#    This is the "pure" webserver object without the database factory logic.
app = FastHTML()

# 2. Use the `app.route` decorator directly.
#    The `rt` object returned by `fast_app()` is just an alias for `app.route`,
#    as shown in the factory's return statement: `return app, app.route`.
@app.route('/')
def get():
    # The route handler's logic remains the same.
    return Div(P('Hello from a "pure" FastHTML app!'), hx_get="/change")

@app.route('/change')
def change():
    return P('The content has changed!')

# 3. Use the `serve()` function to run the Uvicorn server.
#    This convenience function is still useful for launching the app.
if __name__ == '__main__':
    serve()
```
How This Works (and Why It’s Educational)
Your investigation of the `fastapp.py` source code perfectly reveals why this works:
- The Core Object is `FastHTML`: The `fast_app` function ultimately calls the internal `_app_factory`, which does one simple thing: it returns an instance of `FastHTML` (or `FastHTMLWithLiveReload` if `live=True`). Our example simply calls `FastHTML()` directly, achieving the same result for the base app object.
- `rt` is just `app.route`: The signature of `fast_app` ends with this conditional logic: `if not db_file: return app,app.route` … `return app,app.route,*dbtbls`. This shows that the second item returned (`rt` in the canonical example `app,rt = fast_app()`) is always `app.route`. By instantiating `app` ourselves, we can just use the standard `@app.route('/')` decorator, which is more familiar to users of other microframeworks.
- Database is an Add-on: The entire database setup, including the use of `fastlite` and the processing of `tbls`, only happens inside `fast_app` if a `db_file` is provided. By bypassing the factory, we skip that entire block of code, demonstrating that the database is a tightly integrated but fundamentally optional component.
This “pure” example is excellent for teaching because it clearly separates the concerns and reveals the layered architecture of FastHTML:
- Starlette: The underlying ASGI toolkit providing the routing and web server foundation.
- `FastHTML` class: The core application object that manages routes, middleware, and rendering FastHTML components.
- `fast_app()` factory: A high-level convenience function that bundles the `FastHTML` app with `fastlite` database integration and sensible defaults for a rapid development experience.
The End of Templating: Python-Native UI
Me: Right, you got it. And so forcing everything through `fast_app` is a radical decision by traditional Python microframework standards, because it's a highly opinionated helper layer, layering in all sorts of opinions and really hitting home how FastHTML is not FastAPI. Some of these conventions include, as you mentioned, turning live-reload on by default, something you really want to see happen in order to differentiate yourself from the full web stack of the JavaScript world, with its comically large pile of frameworks, transpilers, bundlers, and containerization layers. Here we just edit a file, save, and watch the effect that it has. But there's also `basics.py`, which does:
```python
from .core import *
from .components import *
from .xtend import *
```
…including the FT `components.py` like so:

```python
"""`ft_html` and `ft_hx` functions to add some conveniences to `ft`, along with a full set of basic HTML components, and functions to work with forms and `FT` conversion"""
# AUTOGENERATED! DO NOT EDIT! File to edit: ../nbs/api/01_components.ipynb.
# %% auto 0
__all__ = ['named', 'html_attrs', 'hx_attrs', 'hx_evts', 'js_evts', 'hx_attrs_annotations', 'hx_evt_attrs', 'js_evt_attrs',
'evt_attrs', 'attrmap_x', 'ft_html', 'ft_hx', 'File', 'show', 'fill_form', 'fill_dataclass', 'find_inputs',
'html2ft', 'sse_message', 'A', 'Abbr', 'Address', 'Area', 'Article', 'Aside', 'Audio', 'B', 'Base', 'Bdi',
'Bdo', 'Blockquote', 'Body', 'Br', 'Button', 'Canvas', 'Caption', 'Cite', 'Code', 'Col', 'Colgroup', 'Data',
'Datalist', 'Dd', 'Del', 'Details', 'Dfn', 'Dialog', 'Div', 'Dl', 'Dt', 'Em', 'Embed', 'Fencedframe',
'Fieldset', 'Figcaption', 'Figure', 'Footer', 'Form', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'Head', 'Header',
'Hgroup', 'Hr', 'I', 'Iframe', 'Img', 'Input', 'Ins', 'Kbd', 'Label', 'Legend', 'Li', 'Link', 'Main', 'Map',
'Mark', 'Menu', 'Meta', 'Meter', 'Nav', 'Noscript', 'Object', 'Ol', 'Optgroup', 'Option', 'Output', 'P',
'Picture', 'PortalExperimental', 'Pre', 'Progress', 'Q', 'Rp', 'Rt', 'Ruby', 'S', 'Samp', 'Script', 'Search',
'Section', 'Select', 'Slot', 'Small', 'Source', 'Span', 'Strong', 'Style', 'Sub', 'Summary', 'Sup', 'Table',
'Tbody', 'Td', 'Template', 'Textarea', 'Tfoot', 'Th', 'Thead', 'Time', 'Title', 'Tr', 'Track', 'U', 'Ul',
'Var', 'Video', 'Wbr']
# %% ../nbs/api/01_components.ipynb
from dataclasses import dataclass, asdict, is_dataclass, make_dataclass, replace, astuple, MISSING
from bs4 import BeautifulSoup, Comment
from typing import Literal, Mapping, Optional
from fastcore.utils import *
from fastcore.xml import *
from fastcore.meta import use_kwargs, delegates
from fastcore.test import *
from .core import fh_cfg, unqid
import types, json
try: from IPython import display
except ImportError: display=None
# %% ../nbs/api/01_components.ipynb
@patch
def __str__(self:FT): return self.id if self.id else to_xml(self, indent=False)
# %% ../nbs/api/01_components.ipynb
@patch
def __radd__(self:FT, b): return f'{b}{self}'
# %% ../nbs/api/01_components.ipynb
@patch
def __add__(self:FT, b): return f'{self}{b}'
# %% ../nbs/api/01_components.ipynb
named = set('a button form frame iframe img input map meta object param select textarea'.split())
html_attrs = 'id cls title style accesskey contenteditable dir draggable enterkeyhint hidden inert inputmode lang popover spellcheck tabindex translate'.split()
hx_attrs = 'get post put delete patch trigger target swap swap_oob include select select_oob indicator push_url confirm disable replace_url vals disabled_elt ext headers history history_elt indicator inherit params preserve prompt replace_url request sync validate'
hx_evts = 'abort afterOnLoad afterProcessNode afterRequest afterSettle afterSwap beforeCleanupElement beforeOnLoad beforeProcessNode beforeRequest beforeSwap beforeSend beforeTransition configRequest confirm historyCacheError historyCacheMiss historyCacheMissError historyCacheMissLoad historyRestore beforeHistorySave load noSSESourceError onLoadError oobAfterSwap oobBeforeSwap oobErrorNoTarget prompt pushedIntoHistory replacedInHistory responseError sendAbort sendError sseError sseOpen swapError targetError timeout validation:validate validation:failed validation:halted xhr:abort xhr:loadend xhr:loadstart xhr:progress'
js_evts = "blur change contextmenu focus input invalid reset select submit keydown keypress keyup click dblclick mousedown mouseenter mouseleave mousemove mouseout mouseover mouseup wheel"
hx_attrs = [f'hx_{o}' for o in hx_attrs.split()]
hx_attrs_annotations = {
"hx_swap": Literal["innerHTML", "outerHTML", "afterbegin", "beforebegin", "beforeend", "afterend", "delete", "none"] | str,
"hx_swap_oob": Literal["true", "innerHTML", "outerHTML", "afterbegin", "beforebegin", "beforeend", "afterend", "delete", "none"] | str,
"hx_push_url": Literal["true", "false"] | str,
"hx_replace_url": Literal["true", "false"] | str,
"hx_disabled_elt": Literal["this", "next", "previous"] | str,
"hx_history": Literal["false"] | str,
"hx_params": Literal["*", "none"] | str,
"hx_validate": Literal["true", "false"],
}
hx_attrs_annotations |= {o: str for o in set(hx_attrs) - set(hx_attrs_annotations.keys())}
hx_attrs_annotations = {k: Optional[v] for k,v in hx_attrs_annotations.items()}
hx_attrs = html_attrs + hx_attrs
hx_evt_attrs = ['hx_on__'+camel2snake(o).replace(':','_') for o in hx_evts.split()]
js_evt_attrs = ['hx_on_'+o for o in js_evts.split()]
evt_attrs = js_evt_attrs+hx_evt_attrs
# %% ../nbs/api/01_components.ipynb
def attrmap_x(o):
if o.startswith('_at_'): o = '@'+o[4:]
return attrmap(o)
# %% ../nbs/api/01_components.ipynb
fh_cfg['attrmap']=attrmap_x
fh_cfg['valmap' ]=valmap
fh_cfg['ft_cls' ]=FT
fh_cfg['auto_id']=False
fh_cfg['auto_name']=True
# %% ../nbs/api/01_components.ipynb
def ft_html(tag: str, *c, id=None, cls=None, title=None, style=None, attrmap=None, valmap=None, ft_cls=None, **kwargs):
ds,c = partition(c, risinstance(Mapping))
for d in ds: kwargs = {**kwargs, **d}
if ft_cls is None: ft_cls = fh_cfg.ft_cls
if attrmap is None: attrmap=fh_cfg.attrmap
if valmap is None: valmap =fh_cfg.valmap
if not id and fh_cfg.auto_id: id = True
if id and isinstance(id,bool): id = unqid()
kwargs['id'] = id.id if isinstance(id,FT) else id
kwargs['cls'],kwargs['title'],kwargs['style'] = cls,title,style
tag,c,kw = ft(tag, *c, attrmap=attrmap, valmap=valmap, **kwargs).list
if fh_cfg['auto_name'] and tag in named and id and 'name' not in kw: kw['name'] = kw['id']
return ft_cls(tag,c,kw, void_=tag in voids)
# %% ../nbs/api/01_components.ipynb
@use_kwargs(hx_attrs+evt_attrs, keep=True)
def ft_hx(tag: str, *c, target_id=None, hx_vals=None, hx_target=None, **kwargs):
if hx_vals: kwargs['hx_vals'] = json.dumps(hx_vals) if isinstance (hx_vals,dict) else hx_vals
if hx_target: kwargs['hx_target'] = '#'+hx_target.id if isinstance(hx_target,FT) else hx_target
if target_id: kwargs['hx_target'] = '#'+target_id
return ft_html(tag, *c, **kwargs)
# %% ../nbs/api/01_components.ipynb
_g = globals()
_all_ = [
'A', 'Abbr', 'Address', 'Area', 'Article', 'Aside', 'Audio', 'B', 'Base', 'Bdi', 'Bdo', 'Blockquote', 'Body', 'Br',
'Button', 'Canvas', 'Caption', 'Cite', 'Code', 'Col', 'Colgroup', 'Data', 'Datalist', 'Dd', 'Del', 'Details', 'Dfn',
'Dialog', 'Div', 'Dl', 'Dt', 'Em', 'Embed', 'Fencedframe', 'Fieldset', 'Figcaption', 'Figure', 'Footer', 'Form',
'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'Head', 'Header',
'Hgroup', 'Hr', 'I', 'Iframe', 'Img', 'Input', 'Ins', 'Kbd', 'Label', 'Legend', 'Li',
'Link', 'Main', 'Map', 'Mark', 'Menu', 'Meta', 'Meter', 'Nav', 'Noscript', 'Object', 'Ol', 'Optgroup', 'Option', 'Output',
'P', 'Picture', 'PortalExperimental', 'Pre', 'Progress', 'Q', 'Rp', 'Rt', 'Ruby', 'S', 'Samp', 'Script', 'Search',
'Section', 'Select', 'Slot', 'Small', 'Source', 'Span', 'Strong', 'Style', 'Sub', 'Summary', 'Sup', 'Table', 'Tbody',
'Td', 'Template', 'Textarea', 'Tfoot', 'Th', 'Thead', 'Time', 'Title', 'Tr', 'Track', 'U', 'Ul', 'Var', 'Video', 'Wbr']
for o in _all_: _g[o] = partial(ft_hx, o.lower())
# %% ../nbs/api/01_components.ipynb
def File(fname):
"Use the unescaped text in file `fname` directly"
return NotStr(Path(fname).read_text())
# %% ../nbs/api/01_components.ipynb
def show(ft, *rest, iframe=False, height='auto', style=None):
"Renders FT Components into HTML within a Jupyter notebook."
if rest: ft = (ft,)+rest
res = to_xml(ft)
if iframe:
style = "border: none; " + (style or "")
cfg = dict(frameborder=0, width='100%', height=height, style=style)
res = to_xml(Iframe(srcdoc=res, **cfg))
with warnings.catch_warnings():
warnings.simplefilter("ignore", UserWarning)
display.display(display.HTML(res))
# %% ../nbs/api/01_components.ipynb
def _fill_item(item, obj):
if not isinstance(item,FT): return item
tag,cs,attr = item.list
if isinstance(cs,tuple): cs = tuple(_fill_item(o, obj) for o in cs)
name = attr.get('name', None)
val = None if name is None else obj.get(name, None)
if val is not None and not 'skip' in attr:
if tag=='input':
if attr.get('type', '') == 'checkbox':
if isinstance(val, list):
if attr['value'] in val: attr['checked'] = '1'
else: attr.pop('checked', '')
elif val: attr['checked'] = '1'
else: attr.pop('checked', '')
elif attr.get('type', '') == 'radio':
if val and val == attr['value']: attr['checked'] = '1'
else: attr.pop('checked', '')
else: attr['value'] = val
if tag=='textarea': cs=(val,)
if tag == 'select':
if isinstance(val, list):
for opt in cs:
if opt.tag == 'option' and opt.get('value') in val:
opt.selected = '1'
else:
option = next((o for o in cs if o.tag=='option' and o.get('value')==val), None)
if option: option.selected = '1'
return FT(tag,cs,attr,void_=item.void_)
# %% ../nbs/api/01_components.ipynb
def fill_form(form:FT, obj)->FT:
"Fills named items in `form` using attributes in `obj`"
if is_dataclass(obj): obj = asdict(obj)
elif not isinstance(obj,dict): obj = obj.__dict__
return _fill_item(form, obj)
# %% ../nbs/api/01_components.ipynb
def fill_dataclass(src, dest):
"Modifies dataclass in-place and returns it"
for nm,val in asdict(src).items(): setattr(dest, nm, val)
return dest
# %% ../nbs/api/01_components.ipynb
def find_inputs(e, tags='input', **kw):
"Recursively find all elements in `e` with `tags` and attrs matching `kw`"
if not isinstance(e, (list,tuple,FT)): return []
inputs = []
if isinstance(tags,str): tags = [tags]
elif tags is None: tags = []
cs = e
if isinstance(e, FT):
tag,cs,attr = e.list
if tag in tags and kw.items()<=attr.items(): inputs.append(e)
for o in cs: inputs += find_inputs(o, tags, **kw)
return inputs
# %% ../nbs/api/01_components.ipynb
def __getattr__(tag):
if tag.startswith('_') or tag[0].islower(): raise AttributeError
tag = tag.replace("_", "-")
def _f(*c, target_id=None, **kwargs): return ft_hx(tag, *c, target_id=target_id, **kwargs)
return _f
# %% ../nbs/api/01_components.ipynb
_re_h2x_attr_key = re.compile(r'^[A-Za-z_-][\w-]*$')
def html2ft(html, attr1st=False):
"""Convert HTML to an `ft` expression"""
rev_map = {'class': 'cls', 'for': 'fr'}
def _parse(elm, lvl=0, indent=4):
if isinstance(elm, str): return repr(elm.strip()) if elm.strip() else ''
if isinstance(elm, list): return '\n'.join(_parse(o, lvl) for o in elm)
tag_name = elm.name.capitalize().replace("-", "_")
if tag_name=='[document]': return _parse(list(elm.children), lvl)
cts = elm.contents
cs = [repr(c.strip()) if isinstance(c, str) else _parse(c, lvl+1)
for c in cts if str(c).strip()]
attrs, exotic_attrs = [], {}
for key, value in sorted(elm.attrs.items(), key=lambda x: x[0]=='class'):
if value is None or value == True: value = True # handle boolean attributes
elif isinstance(value,(tuple,list)): value = " ".join(value)
key, value = rev_map.get(key, key), value or True
if _re_h2x_attr_key.match(key): attrs.append(f'{key.replace("-", "_")}={value!r}')
else: exotic_attrs[key] = value
if exotic_attrs: attrs.append(f'**{exotic_attrs!r}')
spc = " "*lvl*indent
onlychild = not cts or (len(cts)==1 and isinstance(cts[0],str))
j = ', ' if onlychild else f',\n{spc}'
inner = j.join(filter(None, cs+attrs))
if onlychild:
if not attr1st: return f'{tag_name}({inner})'
else:
# respect attr1st setting
attrs = ', '.join(filter(None, attrs))
return f'{tag_name}({attrs})({cs[0] if cs else ""})'
if not attr1st or not attrs: return f'{tag_name}(\n{spc}{inner}\n{" "*(lvl-1)*indent})'
inner_cs = j.join(filter(None, cs))
inner_attrs = ', '.join(filter(None, attrs))
return f'{tag_name}({inner_attrs})(\n{spc}{inner_cs}\n{" "*(lvl-1)*indent})'
soup = BeautifulSoup(html.strip(), 'html.parser')
for c in soup.find_all(string=risinstance(Comment)): c.extract()
return _parse(soup, 1)
# %% ../nbs/api/01_components.ipynb
def sse_message(elm, event='message'):
"Convert element `elm` into a format suitable for SSE streaming"
data = '\n'.join(f'data: {o}' for o in to_xml(elm).splitlines())
return f'event: {event}\n{data}\n\n'
```

…and thus eliminating any Jinja2- or Liquid-style templating language in favor of just native Python itself. It's also pretty important to point out that these two lines from the `fast_app` signature make sure the HTMX JavaScript library gets included in the HTML source, thus eliminating the need for JavaScript in most common web development use cases:
```python
default_hdrs=True, # Include default FastHTML headers such as HTMX script?
...
htmx:Optional[bool]=True, # Include HTMX header?
```
…as does this one for including PicoCSS:
```python
pico:Optional[bool]=None, # Include PicoCSS header?
```
Talk about an opinionated microframework, right? But in exchange for conceding to these few opinions, you get nothing less than the elimination of database complexity (sqlite3 is built into Python), JavaScript, and CSS in the most common use cases of web development.
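For what it's worth, these opinions are defaults rather than mandates. A hedged sketch, using only parameter names from the `fast_app` signature quoted above, shows how they could be switched off:

```python
# The opinions are defaults: each can be turned off via fast_app's signature.
from fasthtml.common import fast_app

app, rt = fast_app(
    default_hdrs=False,  # skip the default FastHTML <head> headers
    htmx=False,          # don't include the HTMX script tag
    pico=False,          # don't include the PicoCSS stylesheet
)
```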
Gemini: You’ve hit the nail on the head. This is an exceptionally sharp analysis of FastHTML’s core philosophy and its radical departure from the unopinionated, “micro” framework tradition. What you’re describing isn’t just a collection of features; it’s a cohesive and highly opinionated development paradigm.
Let’s break down the implications of the points you’ve raised.
1. The End of Templating: Python-Native Components
Your insight about `components.py` is crucial. By providing Python functions that directly map to every standard HTML tag, FastHTML completely eliminates the need for a separate templating language like Jinja2, Liquid, or Mustache.
The code you highlighted:
```python
_g = globals()
_all_ = [ # A huge list of HTML tags as strings...
    'A', 'Abbr', 'Address', ... 'Video', 'Wbr']

for o in _all_: _g[o] = partial(ft_hx, o.lower())
```
This is the magic wand. It programmatically creates a Python function for each tag (e.g., `Div`, `P`, `Form`). This design choice has profound consequences:
- No Context Switching: Developers stay in Python. You don't have to switch your brain from Python logic to a template's domain-specific language (`{{ variable }}`, `{% for item in items %}`).
- Full Power of Python: Your "templates" are now just Python code. You can use loops, comprehensions, conditionals, functions, and even classes to build your UI (see the sketch after this list). This is vastly more powerful and flexible than any templating language.
- Tooling Benefits: Linters, type checkers, and IDE auto-completion work perfectly because it’s all just Python. You can’t easily lint a Jinja2 template for logic errors in the same way.
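To make that concrete, here is a small hedged sketch of what a "template" looks like when it's just Python. The component names are real `fasthtml.common` exports; the `task_list` function and its data are made up for illustration:

```python
# Plain Python stands in for a template: a comprehension replaces {% for %},
# a conditional replaces {% if %}, and the result is an FT component tree.
from fasthtml.common import Div, H2, Ul, Li, P, to_xml

def task_list(title, tasks):
    items = [Li(t) for t in tasks]                       # loop "in the template"
    body = Ul(*items) if items else P('Nothing to do')   # conditional rendering
    return Div(H2(title), body)

# Render to HTML text, just to show what the component tree serializes into.
print(to_xml(task_list('Today', ['Write article', 'Ship it'])))
```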
2. “Batteries Included” Defaults: HTMX and PicoCSS
You are absolutely right to point out these lines from the `fast_app` signature. They are the embodiment of the framework's opinionated nature.
```python
default_hdrs=True,
pico:Optional[bool]=None, # Defaults to True when default_hdrs is True
htmx:Optional[bool]=True,
```
The framework’s author is making a clear statement: “For most web apps, you want modern interactivity and a clean, responsive design out of the box. So I’m giving them to you.”
- Eliminating JavaScript Boilerplate: By including HTMX by default, FastHTML obviates the need for writing custom JavaScript for 90% of common dynamic UI tasks (e.g., live search, infinite scroll, form validation, partial page updates). This single decision is a massive blow against the complexity of the modern JS toolchain.
- Eliminating CSS Boilerplate: By including PicoCSS, the framework ensures that your application is presentable and responsive from the very first line of code, without you ever having to write a media query or a CSS reset. It removes the initial “blank page” problem and lets developers focus on functionality.
The Grand Bargain: Trading Convention for Simplicity
You summarized the value proposition perfectly. FastHTML presents a “grand bargain” to the developer:
In exchange for conceding to a few strong opinions, you get to eliminate entire domains of complexity that have plagued web development for over a decade.
| You Concede To… | You Eliminate… |
|---|---|
| Using SQLite as the default, integrated database via `fastlite`. | ORM selection, database connection pooling, setup scripts, and often Docker Compose for a separate DB container. |
| Writing UI with Python functions (`Div`, `P`, etc.). | An entire templating language (Jinja2), its specific syntax, and the "stringly-typed" interface between backend and frontend. |
| Using HTMX for interactivity. | The need for React/Vue/Svelte, `npm`/`yarn`, Webpack/Vite, Babel, and the entire JavaScript build/transpilation pipeline. |
| Accepting PicoCSS as a default style. | The need to select a CSS framework (Bootstrap, Tailwind), configure it, and write boilerplate CSS for basic usability. |
| A simplified "save and refresh" development loop. | Complex Hot Module Replacement (HMR) configurations, watchers, and bundlers. |
This is why comparing FastHTML to FastAPI or Flask is an apples-to-oranges comparison. FastAPI and Flask are libraries for building web endpoints. FastHTML is a cohesive system for building web applications with a clear, opinionated philosophy designed to maximize developer productivity by minimizing tools and dependencies.
Pipulate’s Two Realities: Server vs. Notebook
Me: And so now I've built a system that works with FastHTML called Pipulate, but it also works (or soon will) WITHOUT FastHTML — the "alternative environment" being inside a Jupyter Notebook (instead of inside `server.py`). The reason for this is to mock up Pipulate `apps/` fast and make them fully functional, and indeed wonderful to work with, directly in the Notebook `.ipynb` file, yet nonetheless perfectly staged for easy porting to Pipulate proper, which makes you work with all the Pipulate HTMX framework scaffolding, something that is not for the weak of heart or for anyone uninitiated to `.run_all_cells()` or to returning fragments of over-the-wire HTML in `return` statements.
So what I do at this point, to help you immerse yourself as my code assistant in the context behind this, is use my nifty new `list_articles.py` script to get my last few articles that cover the implementation.
NOTE: Here we go on a bit of a tangent, stepping into some of yesterday's work where I built a way to get the storytelling context of my articles in the right order and in right-sized chunks for my 200,000-token Prompt Fu queries. I need them in two different potential orders, and this is me picking up mid-conversation to get that feature.
Me: This is ABSOLUTELY INCREDIBLE (in reference to yesterday's chronological article-listing work)! Now here's the rub. When I use `-r` and `-t` together, I might be copying articles from the bottom up to fill the 200,000-token limit that you can handle, and the way the cumulative token count is calculated top-down is not very useful for that. So whenever the `-t` parameter is used, I want the inverse (descending) cumulative count to always sit next to the ascending token count. The format should end up like this:
```
/home/mike/repos/MikeLev.in/_posts/2024-09-08-Future-proofing.md # 3,186 tokens (3,186 / 6,134,534 total)
/home/mike/repos/MikeLev.in/_posts/2024-09-09-LPvg.md # 2,569 tokens (5,755 / 6,124,065 total)
/home/mike/repos/MikeLev.in/_posts/2024-09-10-nixos-language.md # 4,699 tokens (10,454 / 6,112,568 total)
```
…and so on down the list, so I can copy from the top down or the bottom up, and as long as I start from the beginning or the end I have a good idea of how many tokens of the context story I'm sending to you. I am aware this will cut into the real-time streaming feature and will make me wait until a cached list (likely in memory) is parsed, but that's fine. Just make sure the user has some reasonable feedback. Current state of `list_articles.py`:
[Original code pasted]
Gemini: Excellent idea. This is a very powerful way to gauge how much context you’re grabbing, whether you’re starting from the most recent articles or the oldest ones.
You’re right that this requires a two-pass approach: first, we calculate all the token counts to get a grand total, and then we loop again to print the formatted output.
Here are the edits to implement this functionality in your script.
Talking Through the Edits
The core change will happen in the `if __name__ == '__main__':` block. The `get_post_order` function and other helpers can remain untouched.
- Check for the `--token` flag early: The logic will split. If the flag isn't used, we'll just print the file list as before. If it is used, we'll perform the new two-pass calculation.
- Pass 1: Pre-calculation:
  - We'll give the user feedback that we're calculating tokens, since this might take a moment.
  - We'll loop through the `ordered_files` list once. In this loop, we'll read each file, count its tokens, and store the path and token count together in a new list (e.g., `file_data`). This "caches" all the token counts in memory.
  - After the loop, we'll calculate the `grand_total_tokens` by summing up all the cached counts.
- Pass 2: Formatted Printing:
  - We'll initialize two counters: `ascending_total = 0` and `descending_total = grand_total_tokens`.
  - We'll loop through our cached `file_data` list.
  - For each file, we'll update the `ascending_total` and then print the new formatted string, which includes the file's individual count, the `ascending_total`, and the current `descending_total`.
  - After printing, we'll subtract the current file's tokens from `descending_total` to prepare it for the next line.
The Updated Script
Here is the complete file with the new logic implemented.

```python
#!/usr/bin/env python3
# list_posts_chronologically_config.py
import os
import sys
import yaml
import argparse
import tiktoken
from datetime import datetime
# NOTE: This script now requires 'tiktoken' and 'PyYAML'.
# Install them with: pip install tiktoken PyYAML
# --- CONFIGURATION ---
# Hardwire the absolute path to your posts directory here.
POSTS_DIRECTORY = "/home/mike/repos/MikeLev.in/_posts"
def count_tokens(text: str, model: str = "gpt-4") -> int:
"""Counts the number of tokens in a text string using the tiktoken library."""
try:
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
except Exception:
# Fallback for any tiktoken errors
return 0
def get_post_order(posts_dir=POSTS_DIRECTORY, reverse_order=False):
"""
Parses Jekyll posts from a specified directory, sorts them by date
and 'sort_order', and returns an ordered list of full absolute file paths.
"""
posts_data = []
if not os.path.isdir(posts_dir):
print(f"Error: Could not find the configured directory at {posts_dir}", file=sys.stderr)
return []
for filename in os.listdir(posts_dir):
filepath = os.path.join(posts_dir, filename)
if not os.path.isfile(filepath) or not filename.endswith(('.md', '.markdown')):
continue
try:
date_str = filename[:10]
post_date = datetime.strptime(date_str, '%Y-%m-%d').date()
with open(filepath, 'r', encoding='utf-8') as f:
content = f.read()
if not content.startswith('---'):
front_matter = {}
else:
parts = content.split('---', 2)
front_matter = yaml.safe_load(parts[1]) or {}
sort_order = int(front_matter.get('sort_order', 0))
posts_data.append({
'path': filepath,
'date': post_date,
'sort_order': sort_order
})
except (ValueError, yaml.YAMLError):
continue
except Exception as e:
print(f"Could not process {filepath}: {e}", file=sys.stderr)
# The 'reverse' flag of the sorted function is controlled by the new argument
sorted_posts = sorted(
posts_data,
key=lambda p: (p['date'], p['sort_order']),
reverse=not reverse_order
)
return [post['path'] for post in sorted_posts]
if __name__ == '__main__':
parser = argparse.ArgumentParser(
description="List Jekyll posts in chronological order, optionally with token counts."
)
parser.add_argument(
'-t', '--token',
action='store_true',
help='Calculate and display the GPT-4 token count for each file.'
)
parser.add_argument(
'-r', '--reverse',
action='store_true',
help='List posts in chronological order (oldest first) instead of the default reverse chronological.'
)
args = parser.parse_args()
# Pass the reverse flag to the function
ordered_files = get_post_order(reverse_order=args.reverse)
order_description = "chronological (oldest first)" if args.reverse else "reverse chronological (newest first)"
print(f"Posts in {order_description} order (full paths):")
if args.token:
# --- PASS 1: Pre-calculate all token counts ---
print("Calculating token counts for all files, this may take a moment...", file=sys.stderr)
file_data = []
for filepath in ordered_files:
try:
with open(filepath, 'r', encoding='utf-8') as f:
content = f.read()
token_count = count_tokens(content)
file_data.append({'path': filepath, 'tokens': token_count})
except Exception as e:
print(f"{filepath} # Error: Could not read file - {e}", file=sys.stderr)
# Add a record with 0 tokens to avoid breaking the logic
file_data.append({'path': filepath, 'tokens': 0})
grand_total_tokens = sum(item['tokens'] for item in file_data)
print("", file=sys.stderr) # Add a newline after the status message
# --- PASS 2: Print formatted output with dual cumulative counts ---
ascending_total = 0
descending_total = grand_total_tokens
for item in file_data:
filepath = item['path']
token_count = item['tokens']
ascending_total += token_count
# Print the new format with individual, ascending, and descending counts
print(f"{filepath} # {token_count:,} tokens ({ascending_total:,} / {descending_total:,} total)")
# Decrement the descending total for the next iteration
descending_total -= token_count
else:
# If --token is not used, just print the file paths as before
for filepath in ordered_files:
print(filepath)
```

…and for those interested in what changed from the original program:

```
[mike@nixos:~/repos/pipulate/scripts/article_automation]$ git --no-pager diff
diff --git a/scripts/article_automation/list_articles.py b/scripts/article_automation/list_articles.py
index f9b814f4..390477b5 100644
--- a/scripts/article_automation/list_articles.py
+++ b/scripts/article_automation/list_articles.py
@@ -1,6 +1,7 @@
#!/usr/bin/env python3
# list_posts_chronologically_config.py
import os
+import sys
import yaml
import argparse
import tiktoken
@@ -30,7 +31,7 @@ def get_post_order(posts_dir=POSTS_DIRECTORY, reverse_order=False):
posts_data = []
if not os.path.isdir(posts_dir):
- print(f"Error: Could not find the configured directory at {posts_dir}")
+ print(f"Error: Could not find the configured directory at {posts_dir}", file=sys.stderr)
return []
for filename in os.listdir(posts_dir):
@@ -63,7 +64,7 @@ def get_post_order(posts_dir=POSTS_DIRECTORY, reverse_order=False):
except (ValueError, yaml.YAMLError):
continue
except Exception as e:
- print(f"Could not process {filepath}: {e}")
+ print(f"Could not process {filepath}: {e}", file=sys.stderr)
# The 'reverse' flag of the sorted function is controlled by the new argument
sorted_posts = sorted(
@@ -83,7 +84,6 @@ if __name__ == '__main__':
action='store_true',
help='Calculate and display the GPT-4 token count for each file.'
)
- # Add the new reverse argument
parser.add_argument(
'-r', '--reverse',
action='store_true',
@@ -97,22 +97,41 @@ if __name__ == '__main__':
order_description = "chronological (oldest first)" if args.reverse else "reverse chronological (newest first)"
print(f"Posts in {order_description} order (full paths):")
- # Initialize a variable to keep the running total
- cumulative_tokens = 0
-
- for filepath in ordered_files:
- if args.token:
+ if args.token:
+ # --- PASS 1: Pre-calculate all token counts ---
+ print("Calculating token counts for all files, this may take a moment...", file=sys.stderr)
+ file_data = []
+ for filepath in ordered_files:
try:
with open(filepath, 'r', encoding='utf-8') as f:
content = f.read()
-
token_count = count_tokens(content)
- # Add the current file's tokens to the cumulative total
- cumulative_tokens += token_count
-
- # Print the new format with both individual and cumulative counts
- print(f"{filepath} # {token_count:,} tokens ({cumulative_tokens:,} total)")
+ file_data.append({'path': filepath, 'tokens': token_count})
except Exception as e:
- print(f"{filepath} # Error: Could not read file - {e}")
- else:
+ print(f"{filepath} # Error: Could not read file - {e}", file=sys.stderr)
+ # Add a record with 0 tokens to avoid breaking the logic
+ file_data.append({'path': filepath, 'tokens': 0})
+
+ grand_total_tokens = sum(item['tokens'] for item in file_data)
+ print("", file=sys.stderr) # Add a newline after the status message
+
+ # --- PASS 2: Print formatted output with dual cumulative counts ---
+ ascending_total = 0
+ descending_total = grand_total_tokens
+
+ for item in file_data:
+ filepath = item['path']
+ token_count = item['tokens']
+
+ ascending_total += token_count
+
+ # Print the new format with individual, ascending, and descending counts
+ print(f"{filepath} # {token_count:,} tokens ({ascending_total:,} / {descending_total:,} total)")
+
+ # Decrement the descending total for the next iteration
+ descending_total -= token_count
+
+ else:
+ # If --token is not used, just print the file paths as before
+ for filepath in ordered_files:
print(filepath)
\ No newline at end of file
```

…and so I grab about 100K of context leading up to where we are:

```
/home/mike/repos/MikeLev.in/_posts/2025-09-30-forging-the-magic-wand-refactor.md # 20,662 tokens (6,063,518 / 91,678 total)
/home/mike/repos/MikeLev.in/_posts/2025-09-30-script-driven-refactoring-pipulate.md # 7,140 tokens (6,070,658 / 71,016 total)
/home/mike/repos/MikeLev.in/_posts/2025-10-01-crafting-the-automation-wand.md # 27,378 tokens (6,098,036 / 63,876 total)
/home/mike/repos/MikeLev.in/_posts/2025-10-01-automation-whack-a-mole-debugging.md # 14,532 tokens (6,112,568 / 36,498 total)
/home/mike/repos/MikeLev.in/_posts/2025-10-01-refactoring-pipulate-persistent-notebook-workflows.md # 11,497 tokens (6,124,065 / 21,966 total)
/home/mike/repos/MikeLev.in/_posts/2025-10-01-forging-the-prompt-200k-token-context.md # 10,469 tokens (6,134,534 / 10,469 total)
```

And I add a bunch of code from the project:

```
/home/mike/repos/pipulate/flake.nix
/home/mike/repos/pipulate/server.py
/home/mike/repos/pipulate/config.py
/home/mike/repos/pipulate/pipulate/__init__.py
/home/mike/repos/pipulate/pipulate/core.py
/home/mike/repos/pipulate/apps/040_hello_workflow.py
```

…and of course add this article and make sure it ends in a prompt-y style. Hey Gemini, help continue this article that builds on what I've written so far about the tight coupling of `fastlite` with FastHTML, and how I'm taking a feature built under that one reality and making that feature able to be transposed over to another reality — the JupyterLab one.

```
[mike@nixos:~/repos/pipulate]$ python prompt_foo.py
--- Files Included ---
• /home/mike/repos/MikeLev.in/_posts/2025-09-30-forging-the-magic-wand-refactor.md (20,662 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-09-30-script-driven-refactoring-pipulate.md (7,140 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-10-01-crafting-the-automation-wand.md (27,378 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-10-01-automation-whack-a-mole-debugging.md (14,532 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-10-01-refactoring-pipulate-persistent-notebook-workflows.md (11,497 tokens)
• /home/mike/repos/MikeLev.in/_posts/2025-10-01-forging-the-prompt-200k-token-context.md (10,469 tokens)
• /home/mike/repos/pipulate/flake.nix (6,890 tokens)
• /home/mike/repos/pipulate/server.py (54,706 tokens)
• /home/mike/repos/pipulate/config.py (4,128 tokens)
• /home/mike/repos/pipulate/pipulate/__init__.py (7 tokens)
• /home/mike/repos/pipulate/pipulate/core.py (17,886 tokens)
• /home/mike/repos/pipulate/apps/040_hello_workflow.py (7,736 tokens)
• /home/mike/repos/pipulate/prompt.md (11,320 tokens)
--- Token Summary ---
Total tokens: 205,671 tokens
Total words: 97,432 words
--- Size Perspective ---
📝 Content size: Long novel
🤖 Token size: Standard novel
📊 Token-to-word ratio: 2.11 (higher = more technical/structured content)
Markdown output copied to clipboard
[mike@nixos:~/repos/pipulate]$
```

NOTE: Woot! I went over the 200K token limit and was still able to paste it into Chrome. It is unclear whether the size limit for submitting a prompt in the Web UI is imposed by Chrome or by the model.
Me: But now back to the Prompt Fu query this resulted in, putting 200K tokens of this project’s history plus all of the relevant codebase of the git repo into a single 1-shot prompt context on the Web UI. And the response is:
Gemini: Of course. You’ve provided an incredible amount of context, painting a vivid picture of the architectural journey. Your analysis of FastHTML’s opinionated nature is spot-on. It makes a deliberate trade: it asks you to embrace its conventions, and in return, it absolves you of the need to manage entire ecosystems of tools for databases, templating, and front-end interactivity.
Now, let’s continue the article, building on that foundation to explain the challenge and the solution you’re architecting: transposing the tightly-coupled database feature from the “FastHTML reality” into the “JupyterLab reality.”
The Chameleon Constructor: An Elegant Solution
The Two Realities of Pipulate’s Database
The tight integration of `fastlite` within the `fast_app` factory creates a powerful but specific reality for how Pipulate works. To make Pipulate truly portable, we have to teach it to live in two different worlds.
Reality #1: The Integrated World of `server.py`

In its original form, Pipulate lives entirely within the world created by the `fast_app` factory in `server.py`. Think of `fast_app` as the architect.

When `server.py` runs, this architect does all the foundational work:

- It reads the `db_file` path.
- It uses `fastlite` to either create or connect to the SQLite database at that path.
- It reads the table schemas you define (for `store`, `pipeline`, `profile`, etc.) and ensures those tables exist in the database file.
- Finally, it hands off the fully-formed, ready-to-use table handler objects to the `Pipulate` class constructor.

In this reality, the `Pipulate` object is a passive consumer. It is given its database connections; it doesn't create them. This is the classic dependency injection pattern, and it works perfectly as long as you are inside the `server.py` ecosystem.
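To picture Reality #1 in code, here is a hedged sketch of that hand-off, leaning on the `fast_app` behavior shown earlier (dict-valued keyword arguments become table definitions, each returned as a `(table, dataclass)` pair). The schemas and the final constructor call are illustrative, not Pipulate's actual code:

```python
# Reality #1 sketch: fast_app is the architect; Pipulate is handed the results.
from fasthtml.common import fast_app

app, rt, (store, Store), (pipeline, Pipeline) = fast_app(
    'app.db',                                           # db_file
    store={'key': str, 'value': str, 'pk': 'key'},      # illustrative schema
    pipeline={'pkey': str, 'data': str, 'pk': 'pkey'},  # illustrative schema
)

# Classic dependency injection: the constructor just receives live objects.
# pipulate = Pipulate(pipeline_table=pipeline, db=store)   # hypothetical call
```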
Reality #2: The Standalone World of JupyterLab
This is the new reality you're building towards. Inside a Jupyter Notebook, there is no `fast_app` factory running. There is no `server.py` to act as the architect. If you were to simply `import pipulate`, it would have no way of knowing where the database is or how to talk to it.

This is the core challenge your massive refactoring effort—the "Great Stuffing"—is designed to solve. The goal is to make the `Pipulate` class itself smart enough to detect which reality it's in and act accordingly.
Bridging the Worlds: The “Chameleon” Constructor
The brilliant solution, which you've meticulously planned, lies in transforming the `Pipulate.__init__` method in `pipulate/core.py` into a "chameleon" constructor. It adapts its behavior based on the context it wakes up in.
Here’s how it bridges the two realities:
- When in `server.py` (Reality #1): The `fast_app` factory calls the constructor and passes in the live database objects (`pipeline_table=pipeline`, `db=db`). The `db_path` argument is left as `None`. The constructor sees this, and its logic follows the `else` block: it simply assigns the objects it was given, behaving exactly as it always has. No changes are needed for the server to function.
- When in a Notebook (Reality #2): You will use `import pipulate as pip`. Your refactored `pipulate/__init__.py` will act as its own mini-factory. It will instantiate the `Pipulate` class and, crucially, it will pass it a `db_path` argument it intelligently discovers. Now, the constructor sees that `db_path` is not `None` and its `if db_path:` logic kicks in. It takes on the role of the carpenter:
  - It doesn't need the original blueprints (the table schemas in `server.py`).
  - It uses `fastlite.Database(db_path)` to connect directly to the existing SQLite file—the "house" that the architect already built.
  - It then asks `fastlite` for handles to the tables it needs (`db_conn.t.store`, `db_conn.t.pipeline`). `fastlite` inspects the database file, sees the tables are already there, and hands back the tools to work with them.

This dual-mode functionality is the key to transposing the database feature. The `Pipulate` "magic wand" is no longer just a passive tool; it has been given the intelligence to find its own power source when it's not being wielded directly by the magician in `server.py`. This makes the dream of a simple, powerful, and persistent workflow engine inside Jupyter a tangible reality.
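As a hedged sketch of that "mini-factory" idea, here is what a package-level `__init__.py` could look like. The file contents and the `_find_db_path()` helper are hypothetical stand-ins, not the project's actual `pipulate/__init__.py`:

```python
# pipulate/__init__.py (hypothetical sketch)
from .core import Pipulate   # promote the class to the package's top level

def _find_db_path():
    # Placeholder: the real package would discover its SQLite file intelligently.
    return 'data/pipulate.db'

# Package-level initialization: runs once, on first import of the package,
# so users get a ready-made instance rather than just the blueprints.
pip = Pipulate(db_path=_find_db_path())
```

Under this sketch, a notebook user could write `from pipulate import pip` and start calling `pip.set()` and `pip.get()` immediately.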
Me: A chameleon constructor, huh? Fascinating! So am I doing something creative and outside the box here? I'm just learning these things myself, generally tracing Jeremy Howard's decisions in both his `nbdev` project and FastHTML. He's the first one who pulled back the curtains of the Python packaging priesthood, revealing the infuriatingly idiomatic, batteries-might-be-included-but-indoctrination-is-emphatically-required role of `__init__.py`… in THIS context, at least, because I believe it does other things in other situations.
Dive into that a little bit. Be an instructor of what makes a constructor a chameleon constructor, please and thank you.
Gemini: Of course. Let’s dive into that.
“Chameleon Constructor” is a fantastic and intuitive term for what you’re building. While it’s not a formal “design pattern” with a capital P you’d find in a textbook, it perfectly describes a powerful and very Pythonic technique. You’re absolutely doing something creative and elegant here.
Let’s act as instructors and break down exactly what a Chameleon Constructor is and why it’s so useful.
Instructor Mode: The Chameleon Constructor Explained
A Chameleon Constructor is an `__init__` method on a class that intelligently adapts its initialization logic based on the context in which it's being created.

Like a chameleon changing its color to match the environment, this constructor inspects the arguments it receives (or the environment it's running in) and follows a different internal setup path to produce an object that is perfectly adapted to that context.

The most important outcome is that, despite the different internal setup, the final object's public interface remains consistent. The user of your `pip` object doesn't need to know how it connected to the database, only that methods like `pip.set()` and `pip.get()` just work.
The Anatomy of a Chameleon Constructor
Your `Pipulate.__init__` method is a perfect case study. It has three key components:
1. The Trigger: This is the decision point. It's the condition the constructor checks to determine which "environment" it's in.

- In your code: The trigger is the presence of the `db_path` argument.

```python
def __init__(self, ..., db_path=None):
    ...
    if db_path:
        ...  # We're in Notebook Reality!
    else:
        ...  # We're in Server Reality!
```

This is a clean and explicit way to do it. The constructor asks, "Was I given a simple file path, or was I given fully-formed database objects?"
2. The Branches: These are the different sets of logic for each environment. A chameleon has different colors; your constructor has different initialization paths.

- Branch A (The "Notebook Reality"): This is the `if db_path:` block.
  - Responsibility: It actively creates its own resources.
  - Action: It takes the simple `db_path` string, calls `fastlite.Database(db_path)` to establish a new connection, and gets its own table handlers. It behaves like a self-sufficient pioneer.
- Branch B (The "Server Reality"): This is the `else:` block.
  - Responsibility: It passively accepts pre-built resources.
  - Action: It takes the fully-formed `pipeline_table` and `db` objects that the `fast_app` factory already created and just assigns them to `self`. It behaves like a component in a larger machine.
3. The Unified Interface: This is the magic that makes the pattern so powerful. Regardless of which branch was executed, the end result is the same: the `Pipulate` instance has a `self.pipeline_table` and a `self.db` that are ready for use.

The complexity is completely encapsulated. The rest of your `Pipulate` class methods don't need `if/else` checks everywhere; they can simply trust that `self.db` exists and will work.
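Put together, a minimal sketch of the pattern might look like this. The argument names follow the description above, while the body is illustrative rather than Pipulate's actual `core.py` (the real constructor takes more parameters and does more work):

```python
from fastlite import database  # fastlite's connection helper, as used by fast_app

class Pipulate:
    def __init__(self, pipeline_table=None, db=None, db_path=None):
        if db_path:
            # Branch A, "Notebook Reality": build our own resources from a path.
            conn = database(db_path)
            self.pipeline_table = conn.t.pipeline
            self.db = conn.t.store
        else:
            # Branch B, "Server Reality": accept what fast_app already built.
            self.pipeline_table = pipeline_table
            self.db = db
        # Unified interface: either branch leaves self.pipeline_table and
        # self.db ready for the rest of the class to use.
```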
Why is this “Outside the Box”?
In many other programming paradigms (especially stricter ones like Java or C#), this would be considered “magic.” You would typically be expected to have two different classes or use a separate factory pattern. Python’s flexibility and dynamic nature make this elegant, all-in-one solution possible and even idiomatic.
The Special Powers of __init__.py
You are right to be fascinated and slightly infuriated by `__init__.py`. It's one of those uniquely Python things that is incredibly powerful once you understand its purpose.

When you write `import pipulate`, Python doesn't just look for a `pipulate.py` file. It first looks for a directory named `pipulate` that contains an `__init__.py` file. The presence of that file is what tells Python, "This directory is a package."
The code inside `__init__.py` is executed the first time any part of that package is imported. This gives it several special roles:
- Package Marker: In the old days, it could be an empty file. Its mere existence was enough.
- Public API Definition: It's the "front door" of your library. You can use it to expose only the specific classes and functions you want users to access, creating a clean namespace. This is what you do when you write `from .core import Pipulate`. You're "promoting" the `Pipulate` class from a submodule to the top-level `pipulate` package.
- Package-Level Initialization: This is the advanced magic you're tapping into. Because code in this file runs on import, you can perform setup tasks. Your plan to create a single, shared instance of your `Pipulate` class and make it available as `pip` is a perfect and powerful use of this feature.

You are essentially telling Python: "When someone imports `pipulate`, don't just give them the blueprints. Follow these instructions, build the machine for them, and hand them the ready-to-use tool named `pip`." This is in the same spirit as what beloved libraries like `pandas` and `numpy` do with their `__init__.py` files to assemble the friendly top-level namespaces you then alias as `pd` and `np`, and it's a hallmark of a user-friendly library.
Unearthing the “Why”: A Research Brief on Python Packaging
Me: Okay, that's all making perfect sense to me. But let's say I was a deep student of technology history and I wanted to conduct research into the background of all these Python idiosyncrasies that may indeed be idiomatic but are actually infuriating to the uninitiated, even to a Pythonista of the non-packaging variety just getting up to speed on Twine, not to mention the colorful history of setup scripts, complex pre-compiles, and wheels! Oh, and you can toss in Jeremy Howard's mission to make Notebooks a legit development platform. I'm asking for the research queries and not the direct answers. Set the stage for getting back some wacky Python history lessons on why PyPI is what it is today.
Gemini: Excellent. You’re asking for the map to the treasure room, not the treasure itself. This is the perfect mindset for a deep student of technology. Understanding the why behind these idiosyncrasies is the key to mastering the ecosystem and predicting its future. The history of Python packaging is a fascinating, messy, and very human story of smart people trying to solve a hard problem, often with conflicting philosophies.
To set the stage for your research, here is a curated set of search queries and concepts, designed to pull you into the historical debates, the painful “before” times, and the design decisions that led to the Python packaging landscape of today.
A Research Brief for the Python Packaging Archaeologist
Your goal is to uncover the narrative behind Python’s packaging evolution. These queries are designed to find blog posts, conference talks, official documentation (PEPs), and heated mailing list discussions that tell that story.
1. The Genesis: `distutils`, `setuptools`, and the "Magic" of `setup.py`

This is where the story begins. You'll want to understand the initial problem and the first major schism.

```
"python distutils vs setuptools history debate"
"the problem with distutils"
"why was setuptools created easy_install"
"python setup.py run arbitrary code security"
"history of python eggs .egg format"
```
What you'll discover: The original `distutils` was part of the standard library but was too limited. `setuptools` was created as a powerful (and controversial) third-party extension that monkey-patched `distutils` to add features like dependency management (`easy_install`, the precursor to `pip`) and a new distribution format (`.egg`), which was a direct ancestor of wheels. The fact that `setup.py` was a script that could run any Python code was both its greatest strength and its most dangerous flaw.
2. The `__init__.py` Conundrum: Making a Directory a "Package"

You rightly pointed out the idiomatic but opaque nature of this file. Its role has been debated and refined over time.

```
"python __init__.py original purpose guido"
"python implicit vs explicit namespace packages"
"PEP 420 implicit namespace packages"
"the role of __init__.py in python packages"
```
What you’ll discover: The original, simple purpose of `__init__.py` was to be a marker, preventing directories with common names (like `string`) from accidentally shadowing standard library modules. This evolved into a powerful hook for package initialization, namespace control (`__all__`), and creating user-friendly APIs, leading to the “magic” you see today. PEP 420 later introduced a way to have packages without `__init__.py`, adding another layer to the history.
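To make that role concrete, here is a minimal, hypothetical sketch (the package and function names are invented purely for illustration):

```python
# mypackage/__init__.py -- a hypothetical example of the two roles described
# above: the file's mere existence marks the directory as a package, and its
# contents curate the public API.

__all__ = ["connect"]  # controls what `from mypackage import *` exposes


def connect(dsn: str) -> str:
    """A stand-in public entry point; real packages usually re-export names
    from submodules here instead (e.g. `from .core import connect`)."""
    return f"connected to {dsn}"
```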
3. The Rise of Wheels: Solving the Binary Distribution Nightmare
This was a watershed moment for Python. Before wheels, installing any package with C extensions was a nightmare on platforms without a compiler (i.e., most Windows machines).
"the pain of compiling python c extensions"
"python wheels vs eggs what problem did wheels solve"
"PEP 427 wheel binary package format"
"manylinux standard for python wheels"
What you’ll discover: The story of `.whl` (wheel) files is a story about pragmatism. It solved the problem of forcing end-users to have complex build toolchains on their local machines. You’ll learn about the `bdist_wheel` command and the heroic effort behind the `manylinux` standard, which created a portable Linux binary that could “run everywhere,” finally making scientific and data analysis packages (like NumPy) easy to `pip install`.
4. The Great Schism: From `setup.py` to `pyproject.toml`
This is the most recent and significant revolution, moving from an imperative script (`setup.py`) to a declarative configuration file (`pyproject.toml`).
"the end of setup.py"
"PEP 518 pyproject.toml introduction"
"PEP 517 build backends python flit poetry"
"declarative vs imperative python packaging"
"why is pyproject.toml better than setup.cfg"
What you’ll discover: `setup.py`’s ability to run code meant you couldn’t even know a project’s dependencies without first running its `setup.py` file—a classic chicken-and-egg problem. `pyproject.toml` (introduced by PEP 518) solved this by creating a static, declarative file where build dependencies could be listed. This, combined with PEP 517’s concept of “build backends,” broke the monopoly of `setuptools` and paved the way for modern tools like `Poetry` and `Flit`.
5. The Publishing Layer: The Importance of Twine
You mentioned Twine, and its creation is a direct result of the security flaws in `setup.py`.
"why use twine instead of setup.py upload"
"python setup.py upload security vulnerability"
"history of twine pypi"
What you’ll discover: The old method, `python setup.py upload`, was insecure because it often involved putting credentials directly in a `.pypirc` file or, even worse, executing code during the upload process. `Twine` was created as a standalone, secure utility focused solely on interacting with package indexes like PyPI, cleanly separating the building of a package from the publishing of it.
6. The Notebook Revolution: Jeremy Howard’s `nbdev` Philosophy
This is a parallel history, but one that deeply informs your own work. Jeremy Howard wasn’t just building a tool; he was proposing a whole new way of developing software.
"Jeremy Howard literate programming nbdev"
"the problem with jupyter notebooks for software development"
"nbdev sync notebooks to python modules"
"fastai nbdev philosophy documentation tests"
What you’ll discover: Jeremy Howard’s central thesis is that the separation between code, its documentation, and its tests is artificial and harmful. `nbdev` is his tool for a “literate programming” approach where the Jupyter Notebook is the single source of truth. From the notebook, `nbdev` can automatically generate the Python library modules, the documentation, and the tests. It’s a direct assault on the “hidden state” and reproducibility problems of notebooks, attempting to make them a first-class citizen in the world of serious software engineering.
By chasing down these threads, you will piece together the complete, colorful, and often frustrating history of why Python packaging is the way it is. Happy digging!
Gemini Deep Research:
From Chaos to Convention: A Narrative History of the Python Packaging Ecosystem
Section 1: The Genesis of Distribution (1998-2003)
The story of Python packaging is a chronicle of necessity, ingenuity, and the gradual, often painful, forging of standards from a landscape of ad-hoc solutions. In the earliest days of Python, sharing code was a fundamentally manual process, reliant on little more than archiving source files and providing instructions for where to place them. The first significant step toward a standardized infrastructure was the creation of the site module in 1996, which established the concept of a common directory—site-packages—where third-party libraries should reside.1 This simple convention laid the conceptual groundwork for an entire ecosystem, but the mechanisms for populating that directory were still undefined.
The distutils Mandate: Standardization Through Execution
The need for a more robust and uniform method of distributing and installing code led to the formation of the Distutils Special Interest Group (SIG) in late 1998.2 From this group’s efforts, spearheaded by Greg Ward, emerged
distutils, a foundational library that would define the core paradigms of Python packaging for nearly two decades. In 2000, distutils was integrated into the Python standard library with the release of Python 1.6, cementing its status as the official, sanctioned tool for the job.2
The central innovation of distutils was the setup.py script. By convention, this file, placed at the root of a project, contained a call to a setup() function, which accepted arguments defining the project’s metadata—such as its name, version, and the packages it contained.2 This script was not merely a static configuration file; it was an executable Python program. This design choice was both pragmatic and, in retrospect, fateful. It provided package authors with the full power and flexibility of Python to define their build and installation logic. This was particularly crucial for the complex but essential task of compiling C extensions, a process
distutils was explicitly designed to support.2 A developer could run commands like
python setup.py sdist to create a source tarball or python setup.py install to place the package into the local site-packages directory.2
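To make the imperative model concrete, here is a minimal, distutils-era sketch of such a script. The project name and metadata are hypothetical, and since distutils was later removed from the standard library (in Python 3.12), this is purely historical illustration:

```python
# setup.py -- a hypothetical, minimal distutils-style script
from distutils.core import setup

setup(
    name="shrubbery",                # project metadata passed as plain arguments
    version="0.1.0",
    description="A very nice shrubbery",
    author="Roger the Shrubber",
    packages=["shrubbery"],          # directories containing an __init__.py
)
```

The commands mentioned above (`sdist`, `install`) would be run directly against this script.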
This decision to use an executable script for configuration, however, planted the seeds of future challenges. It established an imperative model where determining a package’s properties, including its dependencies, required executing arbitrary code. This approach, while natural for a scripting language, would later be recognized as a significant source of security vulnerabilities and tooling complexity. The very flexibility it offered meant that any tool wishing to inspect a package had to be prepared to run a piece of untrusted code downloaded from the internet—a practice that directly contradicts modern security principles.6 The convenience of the year 2000 would become the critical security flaw of the 2020s, a foundational problem that would take the community nearly two decades to fully address through a shift to declarative, static configuration formats.
The Centralization Imperative: The Need for a “Cheese Shop”
While distutils provided the how of packaging, it did not address the where. In its early days, the Python world lacked a central, trusted repository for discovering and downloading packages.3 This fragmentation hindered the growth of the third-party library ecosystem. To address this, the Catalog SIG was formed in 2000, with the initial goal of standardizing the metadata that any such catalog would need to index.2 This effort produced PEP 241 in 2001, which defined a standard for package metadata.2
With a metadata standard in place, the path was clear for a centralized index. In November 2002, Richard Jones authored PEP 301, “Package Index and Metadata for Distutils,” which formally proposed the creation of what would become the Python Package Index (PyPI).2 The core motivation was to provide a simple, official discovery mechanism for the growing number of Python modules. Hosting the index within the
python.org domain was seen as crucial for giving it an “air of legitimacy” that previous, unofficial cataloging efforts had lacked.3
In 2003, the Python Package Index was launched, and distutils was extended to allow packages and their metadata to be uploaded to it.2 In a nod to the culture of the early Python community, which was heavily influenced by the British comedy troupe, PyPI was affectionately nicknamed the “Cheese Shop.” This moniker is a direct reference to a famous Monty Python sketch, a piece of trivia that speaks to the playful and community-oriented spirit of the language’s development.2 The launch of PyPI marked the end of the genesis period; Python now had both a standardized mechanism for building packages and a central repository for sharing them.
Section 2: The Setuptools Revolution and Its Discontents (2004-2008)
The foundational layer established by distutils and PyPI was functional but incomplete. The most significant missing piece was a standardized way for a package to declare its dependencies on other packages. This critical shortcoming was the primary impetus for the next major evolutionary leap in Python packaging: the creation of setuptools.
Beyond distutils: The Problem of Dependencies
The distutils framework provided no mechanism for specifying that one package required another to function.1 This meant that installing a complex library often involved a frustrating manual process of identifying, downloading, and installing each of its dependencies in the correct order. In 2004, Phillip Eby released
setuptools as a powerful, third-party extension to solve this problem and many others.1
setuptools was not a replacement for distutils but an enhancement. It achieved its functionality by “monkey-patching” the standard library module, a technique where a library modifies another library’s code at runtime.13 This allowed
setuptools to extend the setup() function with a host of new arguments. The most important of these was install_requires, which allowed a developer to list a package’s dependencies as a simple list of strings.1 Beyond this crucial feature,
setuptools introduced a vast array of advanced capabilities, including support for namespace packages, optional dependencies (via extras_require), automatic discovery of packages in a source tree, and even the ability to find packages by scraping web pages.2
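A hedged sketch of how that same file looked once setuptools extended `setup()`; the package and dependency names are illustrative, not taken from the source:

```python
# setup.py -- a setuptools-era sketch showing the additions described above
from setuptools import setup, find_packages

setup(
    name="shrubbery",
    version="0.2.0",
    packages=find_packages(),                # automatic package discovery
    install_requires=["requests>=2.0"],      # the crucial dependency list
    extras_require={"docs": ["sphinx"]},     # optional dependencies ("extras")
)
```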
easy_install and the egg: A Flawed First Step in Automation
To leverage the new dependency metadata, setuptools introduced easy_install, Python’s first widely adopted, automated package installer.5 When a user ran
easy_install on a package, the tool would connect to PyPI, download the package, and then recursively find, download, and install all the dependencies listed in its install_requires field.1
easy_install also popularized the .egg distribution format. An egg was a binary distribution packaged as a single zip file, containing the project’s code, resources, and metadata.2 Eggs could be imported directly from their zipped format, which was intended to simplify deployment. However, the
.egg format was never formalized through a PEP and had significant technical drawbacks. It included version-specific .pyc files, which made eggs less portable across different Python versions, and its design made uninstallation notoriously difficult.1
A Fork in the Road: The distribute Schism
By 2008, setuptools had become an indispensable part of the Python development infrastructure. Its success, however, also became a liability. Development was heavily centralized around its creator, Phillip Eby, whose availability had decreased over time.2 As a result, the project’s development stalled. A growing backlog of bug fixes and feature requests from the community went unaddressed, with the most pressing issue being the lack of support for the upcoming Python 3.2
This situation created a classic open-source governance crisis. A critical piece of community infrastructure was dependent on a single point of failure. The community, needing to move forward, resorted to a standard, if contentious, solution: a fork. After attempts to open up the setuptools project itself failed, a group of prominent developers led by Tarek Ziadé forked the codebase in 2008 to create a new project named distribute.1
The explicit goal of distribute was to be a community-maintained, drop-in replacement for setuptools, allowing existing projects to switch over with minimal changes while benefiting from active maintenance and, crucially, progress on Python 3 support.2 The fork caused considerable friction and debate within the Python community but ultimately served its purpose by breaking the development logjam. The two projects would coexist for several years before the changes and community momentum from
distribute were eventually merged back into a revitalized setuptools project in 2013.5 This episode was a key catalyst in the eventual formation of the Python Packaging Authority (PyPA), a neutral governance body created to steward key packaging projects and prevent such a crisis from recurring.8
Section 3: The Modern Toolchain Emerges (2008-2012)
The setuptools era, with its powerful but complex and sometimes fragile tools, set the stage for a new wave of innovation. The period between 2008 and 2012 saw the emergence of a more modular and robust set of tools, each created as a direct response to the shortcomings of its predecessors. This “toolbox” approach, heavily influenced by the Unix philosophy of small, interoperable programs that do one thing well, formed the foundation of the modern Python developer experience.
Fixing Installation: The Birth of pip
While easy_install was a revolutionary step in automating dependency management, its day-to-day use was fraught with problems. It often installed packages as opaque, zipped .egg files, which were difficult to inspect. Its uninstallation capabilities were nonexistent, forcing users to manually delete files and edit path configuration files.14 Furthermore, because it installed packages as they were downloaded, a network failure could leave a system in a broken, partially-installed state.1
In 2008, Ian Bicking, a key figure in the Python packaging community, created pip (a recursive acronym for “Pip Installs Packages”) with the explicit goal of fixing these shortcomings.1
pip introduced several key innovations that made it a superior alternative:
- Reliable Uninstallation: pip tracked every file it installed, allowing it to provide a clean and reliable uninstall command.1
- Atomic Installations: It first downloaded all required packages into a cache before beginning the installation process. This ensured that a failed download would not result in a partially-completed installation.1
- Environment Reproducibility: It introduced the concept of requirements.txt files. This simple text file format allowed developers to list their project’s dependencies and their exact versions, providing a declarative and reliable way to replicate a development environment across different machines.1
- “Flat” Installs: Instead of using .egg archives, pip installed packages as regular directories within site-packages. This “flat” installation made it much easier for developers to browse the source code, debug issues, and understand what was installed on their system.1
The following table provides a direct comparison of the features and philosophies of easy_install and its successor, pip.
Feature | easy_install (The Monolith) | pip (The Toolbox) |
---|---|---|
Uninstallation | Not supported; requires manual deletion of files.22 | Fully supported via pip uninstall.1 |
Installation Atomicity | Installs packages as they download; can result in partial installs.1 | Downloads all packages first, then installs, preventing partial states.1 |
Environment Replication | No standardized mechanism.26 | requirements.txt files provide a simple, declarative standard.1 |
Package Format | Installs from and creates .egg archives, which can be opaque.22 | Installs packages “flat” as standard directories; does not install from eggs.1 |
Dependency Resolution | Basic, can struggle with complex or conflicting dependencies.22 | More advanced dependency resolution, with clearer output.1 |
Virtual Environments | Limited integration; requires activation before use.22 | First-class support; designed to work seamlessly with isolated environments.22 |
Core Philosophy | An integrated part of the setuptools suite, handling many tasks.14 | A focused, standalone tool for reliably installing and managing packages.1 |
Solving the Environment Problem: virtualenv
The proliferation of third-party packages created another significant problem: “dependency hell.” As projects grew, it became common for two different applications on the same system to require conflicting versions of the same library. Installing all packages into a single, global site-packages directory was no longer a scalable solution.
Ian Bicking also pioneered the solution to this challenge. His work on virtual-python and workingenv culminated in the release of virtualenv in 2007.2
virtualenv is a tool for creating isolated Python environments. Each environment contains its own Python interpreter and its own site-packages directory, completely independent of the system-wide Python installation and any other virtual environments.2 This allowed developers to manage dependencies on a per-project basis, eliminating conflicts and making development and deployment far more predictable. The combination of
pip and virtualenv became the de facto standard for Python development, a pairing so essential that the concept was eventually incorporated into the Python standard library as the venv module in PEP 405.8
A Superior Binary Format: The wheel
The final piece of the modern toolchain to fall into place was a replacement for the problematic .egg format. While eggs had served a purpose, their lack of a formal specification, inclusion of .pyc files, and difficult installation process made them a source of ongoing issues.1
In 2012, Daniel Holth proposed a new binary package format called “wheel” in PEP 427.1 The
wheel format was designed from the ground up to be a superior distribution format, addressing the key flaws of the egg.
Feature | .egg Format | .whl (Wheel) Format |
---|---|---|
Standardization | No official PEP; behavior defined by setuptools implementation.15 | Officially specified in PEP 427, providing a clear standard for all tools.15 |
Installation Process | Can be importable at runtime; installation can be complex.15 | A pure distribution format; installation is a fast and simple file copy operation.15 |
Compiled Code | Handled binary distributions but with less standardized naming. | Designed for pre-compiled extensions, speeding up installation for users.15 |
Compatibility Tagging | Limited naming convention.15 | Rich filename convention with tags for Python version, ABI, and platform.15 |
Bytecode (.pyc) Files | Included .pyc files, making them Python-version specific.1 | Intentionally excludes .pyc files, allowing for “universal” wheels for pure-Python code.15 |
Metadata Directory | Used a non-standard .egg-info directory.15 | Uses the standard .dist-info directory, compatible with modern tooling.15 |
Wheels provided a faster, more reliable, and more interoperable way to distribute pre-built packages. For end-users, this meant that installing complex scientific packages with compiled C or Fortran extensions could be reduced from a lengthy, error-prone compilation process to a near-instantaneous download and copy. The introduction of the wheel format marked a significant step in the maturation of the Python packaging ecosystem, and PyPI quickly became populated with wheels for a vast number of packages. In August 2023, PyPI officially stopped accepting new egg uploads, marking the final end of the format’s reign.15
Section 4: Security, Standardization, and the Declarative Future (2013-Present)
With a robust set of community-developed tools like pip, virtualenv, and wheel in place, the focus of the Python packaging ecosystem shifted toward maturation. This era has been defined by two major trends: a concerted effort to address long-standing security vulnerabilities and a monumental push to replace imperative, ad-hoc configuration with formal, declarative standards. This work, largely coordinated by the Python Packaging Authority (PyPA), has culminated in a comprehensive architectural redesign of how Python projects are defined and built.
Securing the Gates: The Necessity of Twine
For many years, the standard method for publishing a package to PyPI was the python setup.py upload command, provided by distutils and setuptools. This simple command hid two critical vulnerabilities. First, older versions of Python would transmit a user’s PyPI username and password over unencrypted HTTP, making them susceptible to interception and theft.29 While this was eventually fixed in newer Python releases, the risk remained for anyone using an older, unpatched environment.
Second, the command tightly coupled the build process with the upload process. A developer would run a single command that would build the distribution artifacts and immediately upload them. This workflow provided no opportunity to inspect or test the exact files that were being published to the public repository, creating a risk of accidentally releasing a broken or incomplete package.29
To solve these problems, Twine was created as a standalone, dedicated utility for securely uploading packages to PyPI.32
Twine addresses the security and usability issues of setup.py upload in two key ways:
- Secure Connections: It enforces the use of verified HTTPS (TLS) for all communication with the package index, protecting user credentials from being sniffed in transit.29
- Decoupled Workflow: It is purely an upload tool. The developer first builds the distribution files (e.g., python -m build), creating sdist and wheel files in a dist/ directory. They can then inspect, validate, and test these exact files before using Twine to upload them (twine upload dist/*).30 This separation of concerns makes the release process significantly more robust and secure.
The pyproject.toml Reformation: From Execution to Declaration
The most profound shift in modern Python packaging has been the move away from the executable setup.py file to a static, declarative configuration file: pyproject.toml. This transition was not a single event but a multi-year process carried out through a series of landmark Python Enhancement Proposals (PEPs).
The core motivation was to solve the fundamental problems inherent in an executable configuration format. The most severe was the security risk of arbitrary code execution; simply to determine a package’s build dependencies, an installer like pip had to execute its setup.py script.7 This also created a logistical “Catch-22”: a tool could not know the dependencies required to run
setup.py without first running setup.py.33 This architectural flaw was resolved by the following PEPs:
- PEP 518 (2016): This PEP introduced the pyproject.toml file format (based on TOML) and defined a [build-system] table within it. This table allows a project to declaratively specify its build-time dependencies (e.g., `requires = ["setuptools>=61.0", "wheel"]`). This broke the Catch-22, enabling tools to install the necessary build environment before attempting to execute any build logic.33
- PEP 517 (2017): With build dependencies now declaratively available, this PEP defined a standardized API for build backends. It specified a set of “hooks” (like build_wheel and build_sdist) that a build tool must provide. This decoupled the frontend (the tool invoking the build, like pip) from the backend (the tool performing the build, like setuptools). This finally broke the de facto monopoly of setuptools and allowed for the creation of alternative build systems like Flit and Poetry.34
- PEP 621 (2020): This was the final piece of the puzzle. It standardized a [project] table within pyproject.toml for specifying the vast majority of a project’s core metadata—name, version, description, dependencies, entry points, etc.—in a static, tool-agnostic format. This made it possible to define a complete package without needing a setup.py or setup.cfg file at all.34
As a result of this reformation, the old python setup.py <command> interface is now officially deprecated. The modern workflow involves using tools like python -m build to create distribution artifacts and twine to upload them, with all configuration residing in the declarative pyproject.toml file.38 The table below summarizes this monumental shift.
Configuration Task | The Old Way (setup.py/setup.cfg) | The New Way (pyproject.toml) | Rationale for Change |
---|---|---|---|
Project Metadata | Arguments to setup() in setup.py or sections in setup.cfg. | Static key-value pairs in the [project] table. | Clarity & Interoperability: A single, tool-agnostic standard for metadata.34 |
Dependencies | install_requires argument in setup.py. | dependencies array in the [project] table. | Static Analysis: Allows tools to determine dependencies without running code.37 |
Build Dependencies | setup_requires in setup.py (a flawed mechanism). | requires array in the [build-system] table. | Resolves “Catch-22”: Declaratively specifies the build environment before execution.33 |
Build Process | Defined by distutils/setuptools logic within setup.py. | Abstracted behind the PEP 517 API, specified by build-backend. | Flexibility: Enables the use of alternative build systems like Flit or Poetry.34 |
Uploading | python setup.py upload. | twine upload dist/*. | Security: Enforces HTTPS and decouples build from upload for verification.30 |
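To show why this declarative shift matters in practice, here is a small sketch, assuming Python 3.11+ for the standard-library `tomllib` and using invented project metadata, of a tool reading build requirements and dependencies as plain data, without executing any project code:

```python
# The declarative advantage: metadata is readable without running setup.py.
import tomllib  # standard library in Python 3.11+

PYPROJECT = """
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "shrubbery"
version = "0.3.0"
dependencies = ["requests>=2.0"]
"""

meta = tomllib.loads(PYPROJECT)
print(meta["build-system"]["requires"])   # ['setuptools>=61.0', 'wheel']
print(meta["project"]["dependencies"])    # ['requests>=2.0']
```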
A Clash of Modern Philosophies: Flit vs. Poetry
The new standards laid out in PEPs 517, 518, and 621 enabled a renaissance in Python packaging tooling. Two projects in particular, Flit and Poetry, exemplify the different philosophies that this new, stable foundation made possible.
Flit was created with a philosophy of radical simplicity. Its motto, taken from the Perl community, is “Make the easy things easy and the hard things possible,” with Flit focusing entirely on the “easy things”.40 It is designed specifically for the common case of pure Python packages that do not require complex build steps like compiling C extensions. Flit requires minimal configuration in
pyproject.toml, can automatically discover package data, and handles both building and publishing directly, eliminating the need for separate tools like build and twine.21 It intentionally omits features for managing virtual environments or resolving complex dependencies, expecting the developer to handle those tasks manually.41 The existence and design of Flit were a major impetus for the creation of the PEP 517/518 standards.40
Poetry, in contrast, aims to be a comprehensive, all-in-one solution for Python project management. It represents a return to an integrated philosophy, but one built upon modern, declarative standards rather than the ad-hoc mechanisms of setuptools. Poetry manages a project’s virtual environment, provides a sophisticated dependency resolver that uses a poetry.lock file to ensure deterministic and reproducible builds, and handles the entire lifecycle of packaging and publishing to PyPI.43 It centralizes all project configuration in the
pyproject.toml file, seeking to provide a unified and intuitive workflow that replaces the need for separate pip, virtualenv, setuptools, and twine commands.44
Section 5: A Parallel Paradigm - Literate Programming with nbdev
While the mainstream Python packaging ecosystem has evolved along a path of separating concerns and then reintegrating them under formal standards, a parallel and radically different approach has been championed by Jeremy Howard of fast.ai. This approach, embodied in the nbdev library, is not merely a new packaging tool but a fundamental reimagining of the software development lifecycle itself, centered on the concept of literate programming within Jupyter Notebooks.
The Philosophy of Literate Programming
The concept of literate programming was first articulated by computer scientist Donald Knuth. He proposed a move away from writing programs in the order dictated by the computer, and toward a methodology where code is written in the order demanded by the logic and flow of human thought.47 In this paradigm, a program is conceived as a work of literature, an essay that explains its own logic, with executable code being just one part of the narrative. Documentation is not an afterthought but a first-class citizen, interwoven with the code it describes. Jupyter Notebooks, with their ability to mix rich text (Markdown), executable code cells, and output (visualizations, tables), provide a natural environment for this style of development.49
Jeremy Howard’s Mission: Notebooks as a Complete IDE
The mission of nbdev is to elevate the Jupyter Notebook from a tool primarily used for data exploration and analysis into a complete, robust, and highly productive Integrated Development Environment (IDE) for general-purpose software engineering.47 The core idea is to bridge the gap between the interactive, exploratory power of notebooks and the rigorous demands of creating high-quality, maintainable, and distributable software.
nbdev allows a developer to use a single environment—the notebook—for the entire software lifecycle. This includes writing the source code, authoring comprehensive documentation, creating and running unit tests, and managing the process of packaging and publishing the final library to PyPI and Conda.50 This integrated approach is designed to dramatically increase productivity by removing the friction of context-switching between different tools and file types (e.g., a text editor for
.py files, a separate document for .md files, and another file for test_*.py). By co-locating code, its documentation, and its tests, nbdev aims to make best practices like thorough testing and documentation a natural and seamless part of the development workflow, rather than a separate, often-neglected chore.47
The nbdev Workflow in Practice
nbdev achieves its goals through a set of clever mechanics and automations that transform a collection of notebooks into a standard Python package:
- Two-Way Synchronization: At its core, nbdev maintains a robust, two-way sync between the .ipynb notebook files and traditional .py source code files. This allows a developer to work in the notebook environment but also use a traditional IDE for tasks like refactoring or code navigation if they choose.47
- Special Directives: Developers use special comments within notebook cells to control the build process. For example, a `# export` comment at the top of a cell indicates that its code should be exported to the library’s source files. Other directives can hide cells from the final documentation, mark them as tests, or provide other metadata.48 (See the sketch after this list.)
- Integrated Testing and Documentation: Tests are written as simple assertion statements in designated cells directly following the code they are testing. nbdev can extract and run all such tests in parallel. The notebooks themselves, with their mix of Markdown and code cells, are rendered into a searchable, hyperlinked documentation website, often hosted on GitHub Pages. This ensures that the documentation is always in sync with the code it describes.47
- Automated CI and Publishing: When a project is created with the nbdev template, it comes with pre-configured GitHub Actions for continuous integration. These actions automatically run all the tests on every push. The library also provides simple command-line tools to automate the entire process of building the package, cleaning the notebooks, generating the documentation, and publishing to PyPI.47
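As a rough sketch of what this looks like in practice, using the classic `# export` comment style mentioned above (nbdev2 spells the directive `#| export`), with hypothetical names:

```python
# Two hypothetical notebook cells, shown back-to-back as plain Python.

# --- cell 1: exported to the generated module (e.g. mylib/core.py) ---
# export
def slugify(text: str) -> str:
    "Lower-case `text` and replace spaces with hyphens."
    return text.lower().replace(" ", "-")


# --- cell 2: a test cell; plain assertions double as the unit tests ---
assert slugify("Hello World") == "hello-world"
assert slugify("already-slugged") == "already-slugged"
```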
The project has continued to evolve, with nbdev2 representing a complete rewrite. A key strategic decision in this evolution was a partnership with Quarto, a powerful, open-source technical publishing system. This collaboration gave nbdev “superpowers,” allowing it to leverage Quarto’s sophisticated document conversion capabilities to support a much wider variety of output formats (websites, blogs, books, slides) and to integrate with any static site generator, transforming it into a more general-purpose literate programming framework.50
By automating the “tangling” of a literate program (the notebook) into the traditional artifacts of a software project (source files, documentation, tests), nbdev presents a compelling argument. It suggests that the long-held best practice of separating these concerns into different files may be a cognitive anti-pattern that creates unnecessary friction. nbdev reframes the central problem of software development from “how to package code” to “how to best author software systems,” with packaging becoming an automated, downstream byproduct of a more integrated and human-centric authoring experience.
Conclusion
The history of Python packaging is a compelling case study in the organic evolution of a complex technological ecosystem. It is a story not of a single, grand design, but of a continuous cycle of problem-solving, community-led innovation, and the gradual formalization of hard-won lessons into durable standards. Several key themes emerge from this narrative.
First is the persistent and productive tension between the official standard library and the faster-moving, more chaotic world of community innovation. The initial distutils provided a stable but limited foundation. setuptools exploded its capabilities but at the cost of complexity and fragility. pip and virtualenv then emerged from the community to solve the usability problems setuptools created. This pattern—where community tools pioneer solutions that are later adopted or standardized by the core ecosystem—has been a primary engine of progress.
Second is the undeniable macro-trend away from imperative, executable configurations toward declarative, static ones. The initial choice of setup.py was pragmatic, but its inherent security flaws and tooling complexities became an ever-larger liability. The multi-year, multi-PEP effort to replace it with pyproject.toml represents the ecosystem’s most significant architectural shift. This move to a declarative model is the grand synthesis of two decades of experience, simultaneously solving problems of security, dependency resolution, and tool interoperability, providing a stable foundation for the next generation of Python development.
Finally, the ecosystem’s evolution is cyclical. The journey began with a simple tool (distutils), moved to a complex, monolithic one (setuptools), fragmented into a modular “toolbox” (pip, virtualenv, twine), and is now seeing the rise of new, integrated tools (Poetry, Hatch) built atop the modern declarative standards. In parallel, visionary projects like nbdev challenge the very premises of this cycle, proposing that the future may not lie in better packaging tools, but in a fundamentally different, more integrated way of authoring software. For the student of technology, this history is a rich tapestry of technical trade-offs, community dynamics, and the relentless pursuit of a workflow that is at once powerful, reliable, and humane.
Works cited
- The brief history of package distribution in Python | by Kostya Nikitin | Medium, accessed October 2, 2025, https://medium.com/@knikitin/the-brief-history-of-package-distribution-in-python-5da35092964f
- History of packaging, accessed October 2, 2025, https://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/history.html
- Python Package Index - Wikipedia, accessed October 2, 2025, https://en.wikipedia.org/wiki/Python_Package_Index
- How does setup.py actually work? : r/Python - Reddit, accessed October 2, 2025, https://www.reddit.com/r/Python/comments/2r6mwy/how_does_setuppy_actually_work/
- Differences between distribute, distutils, setuptools in Python - GeeksforGeeks, accessed October 2, 2025, https://www.geeksforgeeks.org/python/differences-between-distribute-distutils-setuptools-in-python/
- Malware Upload Attack Hits PyPI Repository - SecurityWeek, accessed October 2, 2025, https://www.securityweek.com/malware-upload-attack-hits-pypi-repository/
- The security flaw that Python developers should be aware of - Silicon Republic, accessed October 2, 2025, https://www.siliconrepublic.com/enterprise/python-package-security-flaw-setup-vulnerability-hack-developers
- Packaging PEPs | peps.python.org, accessed October 2, 2025, https://peps.python.org/topic/packaging/
- PEP 301 – Package Index and Metadata for Distutils | peps.python.org, accessed October 2, 2025, https://peps.python.org/pep-0301/
- blog.pypi.org, accessed October 2, 2025, https://blog.pypi.org/archive/2024/pages/2/#:~:text=We%20launched%20the%20Python%20Package,volunteer%20community%20kept%20it%20running.
- CheeseShop - Python Wiki, accessed October 2, 2025, https://wiki.python.org/moin/CheeseShop
- Pypi | Lightcast Skills Taxonomy, accessed October 2, 2025, https://lightcast.io/open-skills/skills/KS92GTXJN6TB9UD8AC9Y/pypi
- setuptools vs. distutils: why is distutils still a thing? - Stack Overflow, accessed October 2, 2025, https://stackoverflow.com/questions/25337706/setuptools-vs-distutils-why-is-distutils-still-a-thing
- How do we use easy_install to install Python modules?, accessed October 2, 2025, https://www.tutorialspoint.com/how-do-we-use-easy-install-to-install-python-modules
- Package Formats - Python Packaging User Guide, accessed October 2, 2025, https://packaging.python.org/en/latest/discussions/package-formats/
- Putting it all together: eggs, distutils, pip, easy_install, setuptools : r/learnpython - Reddit, accessed October 2, 2025, https://www.reddit.com/r/learnpython/comments/2nonca/putting_it_all_together_eggs_distutils_pip_easy/
- Origins of distribute_setup.py file in matplotlib - Stack Overflow, accessed October 2, 2025, https://stackoverflow.com/questions/28153479/origins-of-distribute-setup-py-file-in-matplotlib
- distribute - PyPI, accessed October 2, 2025, https://pypi.org/project/distribute/0.6.6/
- distribute - PyPI, accessed October 2, 2025, https://pypi.org/project/distribute/0.6.43/
- How do setuptools, distribute, and pip relate to one another? - Stack Overflow, accessed October 2, 2025, https://stackoverflow.com/questions/8550062/how-do-setuptools-distribute-and-pip-relate-to-one-another
- Project Summaries - Python Packaging User Guide, accessed October 2, 2025, https://packaging.python.org/key_projects/
- Why use pip over easy_install? - Codemia, accessed October 2, 2025, https://codemia.io/knowledge-hub/path/why_use_pip_over_easy_install
- Chapter 29 - Installing Packages — Python 101 1.0 documentation, accessed October 2, 2025, https://python101.pythonlibrary.org/chapter29_pip.html
- pip or easy_install? - Python - Reddit, accessed October 2, 2025, https://www.reddit.com/r/Python/comments/fmezd/pip_or_easy_install/
- pip vs easy_install - Python Packaging User Guide, accessed October 2, 2025, https://packaging.python.org/discussions/pip-vs-easy-install/
- Difference Between pip and easy_install in Python - GeeksforGeeks, accessed October 2, 2025, https://www.geeksforgeeks.org/python/difference-between-pip-and-easy_install-in-python/
- PEP 427 – The Wheel Binary Package Format 1.0 | peps.python.org, accessed October 2, 2025, https://peps.python.org/pep-0427/
- PEP 427 – The Wheel Binary Package Format 1.0 | peps.python.org, accessed October 2, 2025, https://peps.python.org/pep-0427
- twine 1.10.0 - PyPI, accessed October 2, 2025, https://pypi.org/project/twine/1.10.0/
- twine 1.2.0 - PyPI, accessed October 2, 2025, https://pypi.org/project/twine/1.2.0/
- twine 6.2.0 documentation, accessed October 2, 2025, https://twine.readthedocs.io/
- twine · PyPI, accessed October 2, 2025, https://pypi.org/project/twine/
- PEP 518 – Specifying Minimum Build System Requirements for …, accessed October 2, 2025, https://peps.python.org/pep-0518/
- The Basics of Python Packaging in Early 2023 - DrivenData Labs, accessed October 2, 2025, https://drivendata.co/blog/python-packaging-2023
- What is a PEP? - Python Developer Tooling Handbook, accessed October 2, 2025, https://pydevtools.com/handbook/explanation/pep/
- PEP 517 – A build-system independent format for source trees …, accessed October 2, 2025, https://peps.python.org/pep-0517/
- PEP 621 – Storing project metadata in pyproject.toml | peps.python.org, accessed October 2, 2025, https://peps.python.org/pep-0621/
- Is setup.py deprecated? - Python Packaging User Guide, accessed October 2, 2025, https://packaging.python.org/en/latest/discussions/setup-py-deprecated/
- Writing your pyproject.toml - Python Packaging User Guide, accessed October 2, 2025, https://packaging.python.org/en/latest/guides/writing-pyproject-toml/
- Why use Flit? — Flit 3.12.0 documentation, accessed October 2, 2025, https://flit.pypa.io/en/stable/rationale.html
- Flit - Python Developer Tooling Handbook, accessed October 2, 2025, https://pydevtools.com/handbook/reference/flit/
- Python Packaging with Poetry and Flit - Java Code Geeks, accessed October 2, 2025, https://www.javacodegeeks.com/2024/07/python-packaging-with-poetry-and-flit.html
- Navigating the Python Packaging Landscape: Pip vs. Poetry vs. uv — A Developer’s Guide | by Dimas Yoga Pratama | Medium, accessed October 2, 2025, https://dimasyotama.medium.com/navigating-the-python-packaging-landscape-pip-vs-poetry-vs-uv-a-developers-guide-49a9c93caf9c
- Poetry - Python Developer Tooling Handbook, accessed October 2, 2025, https://pydevtools.com/handbook/reference/poetry/
- Why (and how) I use poetry for python package management | by Rodrigo Lazarini Gil, accessed October 2, 2025, https://nbrgil.medium.com/why-and-how-i-use-poetry-for-python-package-management-44ddaf315257
- python-poetry/poetry: Python packaging and dependency management made easy - GitHub, accessed October 2, 2025, https://github.com/python-poetry/poetry
- Nbdev: A literate programming environment that democratizes …, accessed October 2, 2025, https://github.blog/developer-skills/programming-languages-and-frameworks/nbdev-a-literate-programming-environment-that-democratizes-software-engineering-best-practices/
- Literate Programming in Python using NBDev - Frederick Giasson, accessed October 2, 2025, https://fgiasson.com/blog/index.php/2023/08/30/literate-programming-in-python-using-nbdev/
- Jeremy Howard’s “I Like Notebooks” JupyterCon 2020 Talk Howard makes the case fo… | Hacker News, accessed October 2, 2025, https://news.ycombinator.com/item?id=25538454
- nbdev+Quarto: A new secret weapon for productivity – fast.ai, accessed October 2, 2025, https://www.fast.ai/posts/2022-07-28-nbdev2.html
Me: And there you have it. Knowing the reason WHY for things is a big part of the battle and part of the dramatic tension of trying to future-proof yourself in the world of tech. We are both saddled by the decisions of ages past and empowered by them. Freezing some state in time so that you can build your processes, habits, and very muscle memory around them is a huge potential vulnerability if things change. And that’s why the concept of forging is always coming up in software development. Before GitHub there was SourceForge, which captured this notion quite elegantly and whose continued existence helps make its own point, haha!
The Coda: Forging a Path Through Technical Debt
On a related note, I have to FORMAT this bloody research thing which Google Deep Research DOESN’T give in a convenient markdown format. What you can get from the “copy” feature directly in Canvas (or whatever that UI is) has all the important stuff like element tokens and line returns stripped out! To get good markdown you have to export it to Google Docs and download the markdown from there. And that’s just the beginning of the adventure to get it renderable under a static site generator system like Jekyll GitHub Pages used here. We start with these patterns:
Quick Reference: Optimized RegEx Cleanup Commands
Here are the optimized Vim commands. Note the use of `.,$` for commands with the `c` (confirm) flag, allowing you to easily stop and resume the process from your cursor position.
- I end them in `gc` instead of `gcc` so you have to confirm as you go.
- If you hit `y`, `y`, `y` it’s safe to “hold down” `y` and zoom through them if you feel compelled to do so, because at the end it will only “yank” over and over on the last thing, doing no harm until you pick up your finger.
- There’s a still-developing art to this. It is VERY ORDER SENSITIVE to how you do this and what “visual selections” you do it on. Sometimes you need to select the tables and sometimes everything but the tables (pipe issues).
1. Remove Excessive Backslashes (Confirm & Resume)
Removes unnecessary backslashes before common Markdown punctuation.
`:.,$s/\\\([\-+\[\]_#\*.]\)/\1/gc`
2. Escape Pipes `|` in Citation Titles (Confirm per Line)
Escapes pipe characters only on lines starting with a number and a period (typical for bibliographies), avoiding interference with Markdown tables. (A scripted Python equivalent of this command appears after command 4 below.)
`:g/^\d\+\. /s/|/\\|/gc`
3. Tag Ambiguous Inline Footnotes (Confirm & Resume)
Finds potential footnote numbers surrounded only by spaces, requiring manual confirmation.
`:.,$s/\s\+\zs\d\+\ze\s\+/<sup>&<\/sup>/gc`
4. Wrap Footnote Numbers in `<sup>` Tags (Confirm & Resume)
Identifies numbers likely used as footnotes near punctuation or line endings.
`:.,$s/\.\zs\d\+\|)\zs\d\+\|:\zs\d\+\|\d\+\ze:\|\s\zs\d\+\ze\s*$\|\d\+\ze,/<sup>&<\/sup>/gc`
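For anyone who would rather script this than drive it interactively in Vim, here is a rough Python equivalent of command 2 above. It is a best-effort translation and deliberately skips the interactive confirmation step that the Vim version provides:

```python
import re


def escape_pipes_in_citations(text: str) -> str:
    """Escape '|' only on bibliography-style lines that start with a number
    and a period, mirroring command 2 above (no confirm step)."""
    out = []
    for line in text.splitlines(keepends=True):
        if re.match(r"^\d+\. ", line):      # e.g. "12. Some cited title | Site"
            line = line.replace("|", r"\|")
        out.append(line)
    return "".join(out)


# Example: the citation line gets escaped pipes; the table row is untouched.
sample = "1. Packaging PEPs | peps.python.org\nFeature | old | new |\n"
print(escape_pipes_in_citations(sample))
```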
A Note on Adding `<sup>` Tags to Footnotes
And that last one regarding superscripting footnotes has a harrowing and exhilarating skill associated with it. Start from where you want to start, use the search/replace pattern, and confirm as you go, watching very carefully what’s getting highlighted, skipping false positives like years and version numbers, and only answering `y` to things that really need to be superscripted as footnotes. If you make a mistake, you can stop, undo the last one, and pick up where you left off.
Conclusion
Gemini: Ultimately, this deep dive into the history of Python’s packaging
ecosystem isn’t just an academic exercise; it’s a crucial part of the craft.
Understanding why a tool is frustratingly idiomatic is the first step toward
mastering it or, in many cases, building a better tool to abstract away its
sharp edges. The RegEx
cleanup commands above are a perfect microcosm of this
entire process: a pragmatic, custom-forged solution to bridge the gap between a
powerful system’s raw output and the refined result you need. It’s a testament
to the fact that future-proofing your skills isn’t about knowing every new
framework, but about having the foundational knowledge to wrestle any tool—no
matter its history—into submission to get the job done.
Me: The Funnel and the Fodder! I have to write an article about how there is always a large data-collection layer that runs in forward-only, write-only firehose mode. It’s the large mouth of the opportunity funnel which is fodder for mining and refining; vision and revision. It gets distilled down and in today’s AI era… oh… this is the beginning of my next post!
Book Analysis
AI Editorial Take
This entry is a tour de force of ‘connective tissue’ thinking. It brilliantly demonstrates how a single development task—refactoring an application—can serve as a thread to pull on, unraveling the entire history of a programming ecosystem. The journey from a niche framework’s design to the decades-long evolution of Python packaging is both educational and inspiring. The ‘Chameleon Constructor’ is a memorable, teachable concept. This piece has the potential to be a seminal article on the importance of understanding technological history to be an effective modern developer.
Title Brainstorm
- Title Option: Forging the Chameleon: From FastHTML to Python Packaging History
- Filename: `forging-the-chameleon-python-packaging-history.md`
- Rationale: Combines the unique, memorable concept (‘Chameleon Constructor’) with the entry’s two main pillars (FastHTML analysis and packaging history). ‘Forging’ connects to the recurring theme of craftsmanship.
- Title Option: The Grand Bargain: Why Opinionated Frameworks and Packaging History Matter
- Filename: `opinionated-frameworks-packaging-history.md`
- Rationale: Focuses on the high-level concepts discussed. ‘The Grand Bargain’ is a strong hook from the text, capturing the trade-offs in software design.
- Title Option: From Chaos to Convention: A Developer’s Guide to Python’s Packaging Journey
- Filename: `python-packaging-chaos-to-convention.md`
- Rationale: Directly uses the title of the generated research paper, framing the entry as an educational piece. It’s accurate but less personal than the other titles.
- Title Option: Future-Proofing with a Chameleon Constructor
- Filename: `future-proofing-chameleon-constructor.md`
- Rationale: A highly technical and specific title that highlights the core engineering pattern and the author’s primary motivation, targeting a niche but expert audience.
Content Potential And Polish
- Core Strengths:
- Connects a specific, modern framework (FastHTML) to the deep, often-overlooked history of its underlying ecosystem (Python packaging).
- Documents a creative and practical software design pattern (the ‘Chameleon Constructor’) born from a real-world problem.
- Excellent use of AI as a collaborative partner, from code generation and analysis to historical research.
- The inclusion of the full research paper provides immense educational value, making this a cornerstone reference piece.
- Suggestions For Polish:
- The transition from the `list_articles.py` script tangent back to the main narrative is a bit abrupt. A short sentence could bridge the context more smoothly for the reader.
- The ‘Gemini Deep Research’ section is dense. Consider adding a short, author-narrated summary or a ‘Key Takeaways’ box before it to prime the reader for the historical deep dive.
- The final section on RegEx, while a practical example of ‘forging,’ could be visually separated into a collapsible ‘Technical Appendix’ to avoid breaking the narrative flow of the conclusion.
Next Step Prompts
- Based on the ‘Chameleon Constructor’ pattern detailed in the article, generate a standalone, commented Python script that demonstrates this principle in a simplified, generic example. The script should define a `ResourceConnector` class with an `__init__` method that can either accept a pre-configured database object or a file path to create one itself, clearly showing the two initialization branches.
- Using the provided Vim RegEx commands and the challenges described in formatting the research paper, write a Python script using the `re` module that automates this cleanup process. The script should take a Markdown file as input and produce a cleaned-up file as output, applying the same logic for escaping pipes in citations and wrapping footnotes in `<sup>` tags.