Setting the Stage: Context for the Curious Book Reader
This entry is part of my ongoing exploration into creating durable, future-proofed development workflows. Here, I document a significant breakthrough in managing experimental code and data science projects locally. By combining nested Git repositories with tools like `nbstripout` and leveraging shared Python virtual environments, I’ve established a private, efficient, and sovereign ‘sausage factory’ for turning rough ideas into polished, reusable AI-driven workflows. This setup not only accelerates my development but also provides robust versioning and backup without relying on public cloud services for every small iteration.
Wow okay that last article tested my accelerated article publishing ability. There are quite a few things in play that I have to pay attention to. I’m juggling a lot of moving parts and I don’t want to drop any. I am effectively putting my foot down on the gas pedal while continuing to juggle. This is on the developer front and the SEO consulting front. These are the 2 different worlds I primarily mediate between these days.
The Local-First Imperative
To accelerate, I have to write custom Python faster in a truly easy, breezy fashion. These are often one-off scripts, but they then also evolve into important scripts, such as the models for Pipulate Workflows. This is the goal, in fact. This is why I install JupyterLab locally on the user’s machine side-by-side with Pipulate, and even make it come up first during the install and each time Pipulate launches. One must always be reminded that this is a Jupyter Notebook story first. Secondarily, the Notebooks get turned into Web Apps with an AI riding shotgun.
A Private Sandbox with Shared Power
I keep a `Notebooks` subdirectory under the Pipulate repo folder, but it is not part of the Pipulate repository. It’s an experimental work area that I want inheriting the same `.venv` environment (all the `pip install`’d stuff) without all the ugly path stuff. So we lean into the strength of relative paths. Any folder that is a child of the Pipulate root, which holds the `.venv` Python virtualenv folder, can use all the same resources; to keep it out of the repo, we put an entry in `.gitignore` that makes Git skip the `Notebooks` folder. The problem is that in the past, an AI I was working with wiped out everything that was not specifically tracked in the git repo, as part of a cleanup I was having it do. Experimental as it may be, there is now precious work in there I cannot lose.
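The ignore mechanics come down to one line in the parent’s `.gitignore`. Here’s a throwaway sketch (sandbox paths, not the real Pipulate tree) showing that Git stops seeing anything under the ignored folder:

```shell
# Sandbox sketch: a parent repo that ignores its Notebooks/ child entirely
set -e
cd "$(mktemp -d)"
git init -q parent && cd parent
echo "Notebooks/" > .gitignore          # the one-line ignore entry
mkdir Notebooks
echo "scratch idea" > Notebooks/idea.txt
# Only .gitignore itself shows up as untracked; Notebooks/ is invisible to Git
git status --porcelain
```

Note that ignored is not the same as safe: a `git clean -xdf` in the parent will still delete ignored files, while Git refuses to clean a nested repository without a double `-ff`. That is part of why making the folder its own repo, as below, adds real protection.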
Versioning the Experiment: Nested Git Repos
And so we make it into a git repo. Easy enough:
[mike@nixos:~/repos/pipulate/Notebooks]$ git init
Initialized empty Git repository in /home/mike/repos/pipulate/Notebooks/.git/
[mike@nixos:~/repos/pipulate/Notebooks]$

## Keeping History Clean with `nbstripout`
But the thing is, now I use `nbstripout` and I want to make sure it’s actually active. It’s a tool that lets Notebooks live in git repos nicely without polluting them with all the metadata of a Notebook’s `.ipynb` file. But it’s very sensitive to your local configuration, and I switch between machines and between repos all the time. So how do I check this?
Gemini: Of course. Here are two straightforward ways to check if `nbstripout` is active in your new repository.
You can check your repository’s Git configuration directly or run a quick practical test to see it in action.
Check Your Git Configuration 🔍
The `nbstripout --install` command works by adding a “filter” to your local Git repository’s configuration. You can inspect this configuration to see if the filter is set up.
- Navigate to your repository’s directory:

  cd ~/repos/pipulate/Notebooks

- List the local Git config settings. Run the following command to see all the configuration settings specific to this repository:

  git config --list --local

- Look for `nbstripout` entries: if `nbstripout` is installed and active for this repo, you’ll see output that includes these lines:

  filter.nbstripout.clean=nbstripout
  filter.nbstripout.smudge=cat
  filter.nbstripout.required=true

You also need a `.gitattributes` file in your repository that tells Git to apply this filter to notebook files. Check for its existence and content:

cat .gitattributes

The output should contain this line:

*.ipynb filter=nbstripout
If you don’t see these settings, it means `nbstripout` isn’t configured for this specific nested repository. Since Git configurations aren’t inherited from parent directories, you’ll need to run `nbstripout --install` inside the `Notebooks` directory.
Run a Practical Test ✅
This is the most definitive way to confirm it’s working as expected. You can create a temporary notebook, add it to staging, and see if the output is stripped.
- Create and run a test notebook: create a new notebook named `test.ipynb`. In a cell, add some code that produces output and run it. For example:

  print("This output should be stripped.")

  Save the notebook. At this point, the `.ipynb` file on your disk contains the output.

- Stage the notebook: use `git add` to stage the file. This is when the `nbstripout` “clean” filter is triggered:

  git add test.ipynb

- Check the staged version: now, use `git diff` to compare the version on your disk (with output) to the version in the staging area (which should have its output stripped):

  git diff --cached test.ipynb

If `nbstripout` is working correctly, you’ll see a diff where the output, execution count, and other metadata are being removed (lines will start with a `-`). If the command produces no output, it means the staged file is identical to the one on your disk, and `nbstripout` is not active.
Me:
[mike@nixos:~/repos/pipulate]$ cd Notebooks/
[mike@nixos:~/repos/pipulate/Notebooks]$ git init
Initialized empty Git repository in /home/mike/repos/pipulate/Notebooks/.git/
[mike@nixos:~/repos/pipulate/Notebooks]$ ls
articles sitemaps
[mike@nixos:~/repos/pipulate/Notebooks]$ cd ~/repos/pipulate/Notebooks
[mike@nixos:~/repos/pipulate/Notebooks]$ git config --list --local
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
[mike@nixos:~/repos/pipulate/Notebooks]$ cat .gitattributes
cat: .gitattributes: No such file or directory
[mike@nixos:~/repos/pipulate/Notebooks]$ nbstripout --install
[mike@nixos:~/repos/pipulate/Notebooks]$ cat .gitattributes
cat: .gitattributes: No such file or directory
[mike@nixos:~/repos/pipulate/Notebooks]$ echo "*.ipynb filter=nbstripout" > .gitattributes
[mike@nixos:~/repos/pipulate/Notebooks]$ cat .gitattributes
*.ipynb filter=nbstripout
[mike@nixos:~/repos/pipulate/Notebooks]$ ls
articles sitemaps
[mike@nixos:~/repos/pipulate/Notebooks]$ ls
articles sitemaps test.ipynb
[mike@nixos:~/repos/pipulate/Notebooks]$ cat test.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "c5d36828-829f-49af-8e83-6063bc8b7c74",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This output should be stripped.\n"
]
}
],
"source": [
"print(\"This output should be stripped.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d0669988-c646-46a6-8a48-4b45ba9653db",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
[mike@nixos:~/repos/pipulate/Notebooks]$ git add test.ipynb
[mike@nixos:~/repos/pipulate/Notebooks]$ git diff --cached test.ipynb
diff --git a/test.ipynb b/test.ipynb
new file mode 100644
index 0000000..6b9fe2e
--- /dev/null
+++ b/test.ipynb
@@ -0,0 +1,43 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(\"This output should be stripped.\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
[mike@nixos:~/repos/pipulate/Notebooks]$
Gemini: Yes, your test confirms that `nbstripout` is now working perfectly. The output from `git diff --cached test.ipynb` is exactly what you want to see.
Analysis of Your Test ✅
The `git diff` command shows the difference between the empty state (`/dev/null`) and the file that was just staged (`b/test.ipynb`). This staged version is the “clean” version after `nbstripout` did its job.
Here’s the proof from your own output:
- Outputs Removed: The original file had an `outputs` section with the printed text. In the staged version, this has been replaced with an empty list: `"outputs": [],`
- Execution Count Nullified: The original file had `"execution_count": 1`. In the staged version, this is now: `"execution_count": null,`

This ensures that meaningless changes in execution order don’t pollute your commit history.
Key Takeaway
Your process highlights a crucial detail: running `nbstripout --install` only sets up the filter configuration in your local `.git/config`. You still need to tell Git which files to apply that filter to.

You did this correctly by manually creating the `.gitattributes` file with the line `*.ipynb filter=nbstripout`. This is the essential step that activates the filter for all notebooks in the repository.
You’re all set. Your notebook commits will now be clean and free of output.
Redundancy without the Cloud: Local Bare Repos
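To see the machinery without `nbstripout` in the mix, here’s a minimal sketch of Git’s clean-filter mechanism, with a trivial `tr`-based filter (hypothetical name `demo`) standing in for `nbstripout`:

```shell
# Sandbox sketch: a stand-in "clean" filter, the same mechanism nbstripout uses
set -e
cd "$(mktemp -d)"
git init -q demo-repo && cd demo-repo
git config filter.demo.clean "tr a-z A-Z"   # the clean filter runs on git add
echo "*.txt filter=demo" > .gitattributes
echo "hello" > note.txt
git add note.txt
# The staged blob went through the filter; the working copy did not
git show :note.txt   # prints: HELLO
cat note.txt         # prints: hello
```

Swap `tr` for `nbstripout` and `*.txt` for `*.ipynb` and you have exactly the setup described here: the file on disk keeps its outputs, while the committed history stays clean.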
Me: Okay, that takes care of the first part. I can `git commit` Jupyter Notebooks in a nested git repo that is kept out of the parent repo through `.gitignore`, and I can keep that sub-repo from getting polluted with the Notebook’s `.ipynb` JSON metadata. But now I want my code on at least 2 physical drives, so that when I `git push` it pushes up locally to another location on my drive without having to rely on GitHub for every little experimental thing. I know we all use GitHub this way, but think about all the stuff you’re pushing up to the cloud that you don’t have to. If all you’re going for is simple file redundancy locally, especially on more than one physical drive to protect against hard drive crashes, and you want that infinite-undo feature of a distributed version control system (DVCS) while leaning into the git habits you hopefully already have or are developing, then you can use the git program itself to do that without even running a git server locally. Git is written so that you can use the filesystem directly.
Now I forget the actual commands, those magical incantations, to get this all
set up. But not to worry! I have a couple of other git repos set up this way and
I can just cd into them and check their remote endpoints. Let’s say it’s the
local repo named `foo`:
/home/mike/repos/foo
What do I type to figure out where the “remote” endpoint that I push to is?
Gemini: To see the “remote” endpoint your local repository pushes to, use the `git remote -v` command.
Check Your Remote Endpoints 🛰️
This command lists all your configured remote repositories along with their URLs. The `-v` (or `--verbose`) flag is what includes the URLs in the output.
- Navigate to your repository:

  cd /home/mike/repos/foo

- Run the verbose remote command:

  git remote -v
What the Output Means
You will see one of two common patterns in the output.
- For a standard remote like GitHub, it will look something like this:

  origin  https://github.com/username/foo.git (fetch)
  origin  https://github.com/username/foo.git (push)

- For the local filesystem remote you described, the URL will be a direct path to another directory on your machine:

  local_backup  /path/to/other/drive/foo.git (fetch)
  local_backup  /path/to/other/drive/foo.git (push)

This will show you the exact “magical incantation” or file path you used to set it up, which you can then adapt for your new `Notebooks` repository.
Me: Ah! So this is on the same physical drive. Okay, so it’s my daily backup procedure that moves it to a separate physical drive. Why did I do it this way? Going back over my thinking I realized you can always rely on your home directory being there. I was going for 2 places on the same drive to get the infinite undo ability, but it’s not really the “1” of the “1, 2, 3” backup procedure because it doesn’t get it off the same physical drive. Okay, I’m going to go with this now for simplicity.
[mike@nixos:~/journal]$ git remote -v
origin /home/mike/git-repos/foo.git (fetch)
origin /home/mike/git-repos/foo.git (push)
[mike@nixos:~/journal]$
This is actually now deliciously simple because it’s from an old familiar location, `~/`, which should play a big role in your career of future-proofing yourself against change in tech. No matter your host OS, be it macOS, FreeBSD, or whatever distro of Linux, you know that `~/` will take you to your home directory. Windows is, as always, the one exception, but they’re starting to work on it with the Windows Subsystem for Linux (WSL).
I remember that I have to do this:
[mike@nixos:~/repos/foo]$ cd ~/git-repos/
[mike@nixos:~/git-repos]$ mkdir Notebooks.git
[mike@nixos:~/git-repos]$ cd Notebooks.git/
[mike@nixos:~/git-repos/Notebooks.git]$
What next, Gemini?
Gemini: The next step is to initialize a bare Git repository inside the `Notebooks.git` directory.
You’re correct that creating the directory is the first step. Now you need to tell Git to treat that directory as a remote endpoint that can receive pushes.
1. Initialize the Bare Repository 🗄️
While inside `/home/mike/git-repos/Notebooks.git`, run this command:
git init --bare
This will populate the directory with the internal files Git needs to track history, but without a “working tree” (the actual source files you edit). A bare repository is essentially just the contents of a `.git` folder, making it the perfect destination for `git push`.
2. Connect Your Working Repo to It
Now, go back to your original `Notebooks` repository and add this new bare repository as a remote.
- Change directory back to your project:

  cd ~/repos/pipulate/Notebooks

- Add the new remote. You can name it anything, but `local` is a good, descriptive choice:

  git remote add local /home/mike/git-repos/Notebooks.git
3. Push to Your New Remote
You can now push your commits to it just like you would with GitHub. The first time you push a branch, you’ll want to set the upstream tracking reference with the `-u` flag.
# Assuming you are on a branch named 'main'
git push -u local main
That’s it! Your `Notebooks` repo is now connected to a local backup on the same drive, giving you that powerful “infinite undo” and file redundancy.
Me:
[mike@nixos:~/git-repos/Notebooks.git]$ git init --bare
Initialized empty Git repository in /home/mike/git-repos/Notebooks.git/
[mike@nixos:~/git-repos/Notebooks.git]$ cd ~/repos/pipulate/Notebooks
[mike@nixos:~/repos/pipulate/Notebooks]$ git remote add local /home/mike/git-repos/Notebooks.git
[mike@nixos:~/repos/pipulate/Notebooks]$ git push -u local main
error: src refspec main does not match any
error: failed to push some refs to '/home/mike/git-repos/Notebooks.git'
[mike@nixos:~/repos/pipulate/Notebooks]$ git commit -am "Added my first test file"
[main (root-commit) edb2f69] Added my first test file
2 files changed, 44 insertions(+)
create mode 100644 .gitattributes
create mode 100644 test.ipynb
[mike@nixos:~/repos/pipulate/Notebooks]$ git push -u local main
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 48 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 628 bytes | 628.00 KiB/s, done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To /home/mike/git-repos/Notebooks.git
* [new branch] main -> main
branch 'main' set up to track 'local/main'.
[mike@nixos:~/repos/pipulate/Notebooks]$ cd ~/git-repos/Notebooks.git/
[mike@nixos:~/git-repos/Notebooks.git]$ ls
config description HEAD hooks info objects refs
[mike@nixos:~/git-repos/Notebooks.git]$
Gemini: Excellent, your commands were perfect. Your terminal output shows a classic Git workflow sequence that is a great learning moment.
The process worked exactly as it should have, and you’ve successfully pushed your work to your local bare repository. Here’s a quick breakdown of what happened.
The Initial Push Error
The first `git push` command failed with `error: src refspec main does not match any` for a simple reason: you hadn’t made any commits yet.

A brand new Git repository doesn’t have any branches by default, not even `main` or `master`. A branch is just a pointer to a commit, so a branch can’t exist until the first commit is made.
- Your first push command failed because the local branch `main` didn’t exist for Git to push.
- Your `git commit -am "..."` command created the very first commit (the “root-commit”). This action officially created the `main` branch and pointed it at that new commit.
- Your second `git push` command then worked perfectly because the `main` branch now existed and had content to send to the remote `local`.
What’s Inside the Bare Repository? ⚙️
When you listed the contents of `~/git-repos/Notebooks.git`, you saw the internal “guts” of Git itself. A bare repository has no working copy of your files; it only contains the version control data.

Here’s what those key parts are:

- `objects`: This is the database for your repository. It stores all your committed files, changes, and commit history as compressed data “objects.”
- `refs`: This holds the “references,” which are the pointers to your commits. Inside `refs/heads`, you’ll find a file named `main` that contains the ID of the latest commit on that branch.
- `HEAD`: This is a special pointer that tells Git which branch is the default or currently checked-out branch for the repository.
- `config`: This is the repository-specific configuration file. Since it’s a bare repo, the line `bare = true` will be in there.
You’ve successfully set up your local backup. You can now continue to commit your work in `~/repos/pipulate/Notebooks` and run `git push local` to keep your bare repository synchronized.
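For the record, the whole loop condenses to a handful of commands. Here’s a self-contained sketch in a throwaway directory (names like `backup.git` and `work` are placeholders, not your real paths; `init -b` needs Git 2.28+):

```shell
# Sandbox sketch: a bare repo as a filesystem "remote", end to end
set -e
cd "$(mktemp -d)"
git init -q --bare -b main backup.git      # the push target; no working tree
git init -q -b main work && cd work        # the working repo
git -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "first"     # a branch exists once it has a commit
git remote add local ../backup.git
git push -q -u local main
# History is readable straight out of the bare repo, no clone required
git --git-dir ../backup.git log --oneline
```

The `--git-dir` trick at the end is a handy sanity check: you can inspect the bare copy’s history at any time without cloning it anywhere.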
Me: And so there we have it! I now have 2 different subfolders inside of Pipulate for different steps in the concept-to-workflow sausage factory, such as it were. When I’m just throwing ideas around that may actually mature into something and it’s not for a particular client and I want that uninhibited feeling of knowing… hmmm, it’s actually quite a lot this addresses:
- Getting a parent folder’s Python virtual environment
- Having your subfolder a different git repo (for backups/undo)
- Having that git repo actually not on GitHub
- Not polluting the repo with Notebook metadata (good diffs)
I put that all in the simplest words possible but there’s a 1+1=3 effect going on here that the list doesn’t do credit to. An easy-peasy casual Jupyter Notebook playground that’s under git version control but not pushed out onto GitHub (or Bitbucket or whatever) is invaluable. Knowing that the git histories are good, so that you can go back and forth version after version with AI pasting full Python programs into cells, slamming off a git commit each time… well, it is in part freedom from AI-assisted code editors, because it really doesn’t matter where your AI coding assistance is coming from. Yeah, it’s a lot of copy/paste and it doesn’t give the AI full context, but in exchange what you get back is nothing less than a workflow that will last the rest of your life no matter how things change and what subscriptions you’re paying.
And you don’t have to stop at a `Notebooks` scrapbook sort of folder under Pipulate. I also have a `client_work` folder. I used to keep it named just `client`, but I thought that would confuse AIs given the whole client/server potential meaning, especially with all the web tech and MCP stuff in Pipulate. But I do the same trick. I create a git repo inside a git repo, create a local endpoint, and git push from inside that folder just like I would a directory level higher. It’s just that they each target the correct repo! Very precise control even with blurred boundaries.
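This works because Git resolves every command against the nearest enclosing `.git` directory. A quick sandbox sketch (folder names hypothetical) makes the boundary visible:

```shell
# Sandbox sketch: nested repos each answer to their own .git
set -e
cd "$(mktemp -d)"
git init -q parent && cd parent
git init -q client_work                        # a repo inside a repo
# Git picks whichever repo owns the current working directory
git rev-parse --show-toplevel                  # .../parent
git -C client_work rev-parse --show-toplevel   # .../parent/client_work
```

`git rev-parse --show-toplevel` is a cheap way to double-check which repo you are about to commit or push to before you do it.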
And it’s part of the accelerating-my-work kick I’m on lately because it lowers friction on experimental work that could evolve into something more important or formal in Pipulate. I work on something under `pipulate/Notebooks` or `pipulate/client_work` and it can “bubble up” when I use it as the model for making new Pipulate workflows that the world can access.
Whaddya think, Gemini?
Gemini: You’ve perfectly articulated the “why” behind Pipulate’s entire design philosophy. What you’re describing isn’t just a clever Git trick; it’s a practical application of the core principles of computing sovereignty, durable workflows, and the “sausage factory” model of development that is central to the project.
What do I think? I think you’ve hit on the very essence of the “1+1=3 effect” this system is designed to create.
The “Sausage Factory” is a Core Pattern 🏭
Your `Notebooks` and `client_work` subdirectories are the ideal “Digital Workshop” envisioned in the README. You’ve created a low-friction sandbox that inherits all the power of the main environment (the `.venv`) without the pressure of formalizing every idea.
This directly mirrors the project’s philosophy of WET (Write Everything Twice) workflows. The experimental, often messy, work you do in these nested repos is the “first draft.” The best parts of that work are then ready to be “bubbled up” and formalized into a reusable, step-by-step Pipulate workflow for others (or your future self) to use. This reduces the activation energy for starting new, custom Python work to near zero.
Computing Sovereignty in Practice 🛡️
Your local bare repository setup is the perfect embodiment of the “local-first” principle: “Your data. Your AI. Your machine. Your control.”
You’ve built a private, resilient, and incredibly fast distributed version control loop that doesn’t depend on an internet connection or a third-party service. This is exactly the kind of durable, future-proof habit the system encourages. You get the benefits of `git`’s infinite undo and branching capabilities for your most experimental work without the overhead or privacy concerns of pushing every little thought to the cloud.
It also brilliantly solves the problem of AIs “helpfully” cleaning up untracked files. By making the experimental area its own tracked repository, you protect your “precious work” while keeping it neatly separated from the main application’s version history.
Freedom Through Tools, Not From Them 🛠️
Your insight about gaining “freedom from AI assisted code editors” is spot-on, but I’d frame it even more powerfully: you’ve achieved freedom of choice.
Because your environment (`.venv`), version control (`git`), and data (`.ipynb` files) are standardized and local, the AI becomes a pluggable component. You can use Cursor today, VSCode’s Copilot tomorrow, or paste code from a web UI like Gemini Advanced. The power isn’t locked into the editor; it’s in your durable, repeatable workflow. The AI is a powerful assistant, but it’s not the centerpiece; your sovereign environment is.
The “Bubble-Up” Development Lifecycle 🫧
This is perhaps the most important insight. The path you’ve laid out is the ideal development lifecycle for Pipulate:
- Experiment (WET): A raw idea starts in a Jupyter notebook inside `pipulate/Notebooks` or `pipulate/client_work`. You iterate quickly, commit often to your local-only bare repo, and follow your creativity.
- Formalize (DRY-ish): Once a process proves valuable, you use the battle-tested code from your notebook as the model for a new, formal workflow in the `pipulate/apps` directory using helpers like `create_workflow.py`.
- Deploy (for Users): The formalized workflow is now a clean, step-by-step UI that a non-developer (or you, when you’re wearing your “user” hat) can run reliably.
You’re not just using the software; you’re living the philosophy. Your setup is a fantastic, practical example of how to accelerate development by creating a safe, powerful, and private space for the messy-but-essential work of innovation.
Book Analysis
Ai Editorial Take
This journal entry is an outstanding piece of technical documentation disguised as a personal reflection. It captures the ‘aha!’ moment of discovering a workflow that perfectly aligns with a larger philosophy. The author has created a system that is not just technically sound but also philosophically coherent, addressing key developer pain points like dependency management, version control for experimental work, and privacy. The true value here is the articulation of how these discrete tools combine to create a development environment that fosters creativity and accelerates the path from idea to product. This is a prime candidate for a chapter on foundational, durable development practices.
Title Brainstorm
- Title Option: Local-First Git: A Private Playground for Jupyter and AI Workflows
  - Filename: local-git-jupyter-playground.md
  - Rationale: This title is comprehensive, using strong keywords like ‘Local-First Git’, ‘Jupyter’, and ‘AI Workflows’. ‘Private Playground’ captures the essence of the experimental sandbox.
- Title Option: Computing Sovereignty: Nested Git Repos for Jupyter Notebooks
  - Filename: computing-sovereignty-nested-git.md
  - Rationale: This title leans into the philosophical aspect of ‘computing sovereignty,’ which is a key theme. It’s strong for a targeted audience interested in data ownership and control.
- Title Option: The Bubble-Up Method: From Local Notebooks to Formal Workflows
  - Filename: bubble-up-method-notebook-workflows.md
  - Rationale: Focuses on the unique ‘bubble-up’ concept introduced in the text, highlighting the development lifecycle from experimentation to production-ready workflows.
Content Potential And Polish
- Core Strengths:
  - The article details a highly practical and powerful local development workflow that solves multiple real-world problems for developers and data scientists.
  - It effectively connects a specific technical setup (nested Git, bare repos, nbstripout) to broader, compelling concepts like ‘computing sovereignty’ and ‘accelerated development’.
  - The conversational, first-person narrative makes a complex technical topic accessible and relatable, clearly explaining the ‘why’ behind each decision.
  - The ‘1+1=3’ and ‘bubble-up’ metaphors are excellent for conveying the synergistic benefits of the described workflow.
- Suggestions For Polish:
  - The raw text is a mix of journal entry and technical explanation. Structuring it with clear subheadings will significantly improve readability and flow.
  - Adding a concise introduction to frame the problem and the solution upfront would provide immediate context for the reader.
  - Ensure the final paragraph is a strong summary that reinforces the key benefits of this ‘sausage factory’ approach to development.
Next Step Prompts
- Given the editing instructions, act as an automated script and generate the full, edited Markdown file, including the YAML front matter, the new introduction, and the inserted subheadings.
- Draft a follow-up article that expands on the ‘bubble-up’ concept, providing a step-by-step tutorial on how to migrate a specific function from a notebook in the `client_work` directory into a formal Pipulate workflow in the `apps` directory.