
Beyond Spreadsheets: Mastering Data Pivots with Python

Pivoting is one of the most difficult but essential actions, whether in business or programming. I’m exploring the parallel between the quick, intuitive pivot of an athlete and the methodical data pivot of an analyst. My core argument is that we need to escape the ‘magical incantations’ of proprietary spreadsheet software and embrace the clarity and power of tools like Python Pandas to truly master this skill, making it repeatable and scalable.

Setting the Stage: Context for the Curious Book Reader

In the world of data science, business, and even personal development, the ability to ‘pivot’—to change perspective and reframe a problem—is a critical skill. This essay explores the concept of pivoting, moving from the intuitive, momentary shifts we make in life to the structured, analytical pivots performed on datasets. It argues that traditional spreadsheet tools like Excel, while familiar, often obscure the core concepts and create dependencies. This entry makes the case for moving to more powerful, open-source tools like Python’s Pandas library, demonstrating how a clear mental model and repeatable code can unlock deeper insights and make the difficult act of pivoting second nature.


Pivots are the hardest things to do. You have some momentum going in some direction and understand things from that perspective, but then you have some critical key insight — or think you may have one — if you can only change directions and look at things in a new way. But you’re barreling along, sailing through the work, accumulating more and more of a logfile-like trail in your wake. You can’t stop to pivot. It’s like turning an oil tanker. Stopping and pivoting is for smaller, nimbler ships.

The Oil Tanker Problem: Why Pivoting is Hard

This is true in life, business, programming, and data science. You have some sort of rote, mechanical, reproducible, reliable process down. You lean on it a lot. If you do this, then that happens, and you like that, so you do more of this. This leads to that. And every time that happens it leaves some sort of mark or history — an entry in a log-file, a bank statement, a git history, or a social media feed. That long-running trail, that wake, is concrete evidence of your work.

Processes like that are hard to change. It’s hard even to know whether you need to change them, because the trends are hidden from you. Should you pivot? It’s hard both to know whether you should and to actually do so even if you knew.

There’s even more to question here, such as whether the important indicators that you need to change can be inferred from what’s only recently in your wake, or whether you need the whole feed — your history and log-file back to the beginning. And even if you thought you needed it all, there are reasons that may not be practical: keeping records that far back and making them readily available for easy examination is not always easy or cheap. Time often erases. So focusing on more recent trends is often the choice.

I’ve been talking in generalities, mixing the concept of pivoting like a basketball player on the court with pivoting like a data analyst in a spreadsheet. One is an intuitive, instant decision. The other is jockeying data around into a better format to see whether you should make some decision. That the concept of pivoting applies in both cases suggests much about life. Pivoting intuitively and fast is a big advantage because people and AIs think you’ve committed to one direction, and their ability to predict and defend against you is lowered. Pivoting breaks expectations and bucks trends.

From Intuition to Insight: Two Kinds of Pivots

Nvidia pivoted from gaming cards in home computers to massive GPU farms in data centers for AI, which eventually led to its becoming the first trillion-dollar chipmaker. People using gaming cards for Bitcoin mining likely showed up in a spreadsheet-style pivot of the analytical kind, informing business-practice adjustments of the strategic kind. Doing the math on doing the math… haha! The example becomes very meta in the case of Nvidia, where the hardware itself helps perform the spreadsheet-style pivot calculations.

Being able to do pivots of the one sort makes doing pivots of the other sort easier. Choose a chunk of your log-file (or whatever) and turn it at a right angle, making your rows into columns and your columns into rows… pivoting, see? Most people don’t. It’s one of those mental-model intuitive leaps that’s worth playing around with until you get it. Draw a tic-tac-toe board on a piece of paper. Rotate that piece of paper from portrait mode to landscape mode. That’s a data pivot.
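
Here’s a minimal sketch of that piece-of-paper rotation in Pandas: a toy example with placeholder grid values, showing the .T transpose, which is the purest form of a pivot.

import pandas as pd

# A tiny 3x3 grid, like a tic-tac-toe board drawn on paper
board = pd.DataFrame(
    [["X", "O", "X"],
     ["O", "X", "O"],
     ["O", "O", "X"]],
    index=["row1", "row2", "row3"],
    columns=["col1", "col2", "col3"],
)

print(board)    # portrait mode
print(board.T)  # landscape mode: rows become columns, columns become rows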

But if you have thousands of rows, that’s going to become thousands of columns, and you’ll have all that horizontal scrolling and columns meaninglessly labeled with indexes like “AB, AC, AD… ZX, ZY, ZZ… AAA” and so on. This is just the start of the befuddlement of trying to do pivots in spreadsheet programs. You have to “collapse” many rows into fewer rows, grouping them by some criteria in the process of the pivot, so the result is not a bloody mess and instead yields the kind of decision-making insights you were seeking in attempting such a seemingly foolhardy maneuver.

And it’s not easy even in the software that’s specifically designed to make it easy. Oh, Excel jockeys will disagree and say “Ohhhh it’s so easy, here let me show you…” and then proceed with a rapid sequence of magical-incantation hand gestures, clicking boxes and dragging fields with such speed that the deeper concepts can barely be understood, much less learned and repeated in automation — a big part of the goal here: deep understanding and repeatability WITHOUT a dependency on proprietary vendor products.

Why are proprietary vendor products such a bad dependency for your ability to pivot easily? First, because they cost money and everything has gone subscription, you may have access one year but not another. You can literally be cut off from the ability to apply your skills by failing to pay. Plus, if you do step away from the cash register in the cloud for a few years or change jobs, the user interface changes so much between stints that you have to relearn it every time. That’s not to mention the limitations of such spreadsheet tools in handling really big data without the next and the next and the next kooky proprietary dependency (Power Query, Power BI, etc.).

A Better Way: The Power of Code-Based Pivots

The Excel and Google Sheets spreadsheet programs are not your friends. If you’re ready for life after Excel — ready to make that pivot to make making pivots easier — you’re ready for Python Pandas. The pedants out there might shout that it’s not just a Pandas world anymore, because sometimes you optimize for rows (Pandas) and sometimes you optimize for columns (Polars), and that is totally true. But this is where I point out that they are effectively API-compatible.

Huh? API compatibility means you don’t get stuck in that cycle of always having to scrap what you know and retrain every couple of years. When you learn Pandas, you learn Polars wayyy more directly than the corollary holds between Microsoft Excel and Google Sheets. Switching tools for subtle benefits is easier once you’ve pivoted off of spreadsheet software. Learn once; know and apply forever. It’s all close enough that even your automatic muscle memory will hardly know the difference. And when it’s time to scale to big data, it’s just as easy. The interface barely changes.
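
To make the muscle-memory claim concrete, here’s a minimal sketch of the same aggregation in both libraries. This assumes a reasonably recent Polars release; its method names have shifted slightly across versions, so treat the Polars half as illustrative rather than gospel.

import pandas as pd
import polars as pl  # assumes Polars is installed: pip install polars

data = {"user_agent": ["Chrome", "Firefox", "Chrome"], "hits": [1, 1, 1]}

# Pandas: group rows and sum a column
print(pd.DataFrame(data).groupby("user_agent")["hits"].sum())

# Polars: nearly the same gesture, column-optimized under the hood
print(pl.DataFrame(data).group_by("user_agent").agg(pl.col("hits").sum()))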

The temptation at this point is to spell out the vastly clearer mental models of data pivots under Python with Pandas and Polars versus Excel and Sheets. But if you’re here, I assume you’re familiar with the trials and tribulations of pivots under GUI software. Let’s focus instead on what happens when thousands or millions of rows become columns. You have to lump up, group, or aggregate many rows down into single summary values. There are different ways to do this, but probably the clearest is to state explicitly which field’s values become the new columns and what function is used to aggregate them together. It’s often “sum” or “count”. And a classic example is web log-files. This might be done to get a count of the most popular user-agents browsing your site.

From Raw Logs to Actionable Summary

Gemini? Can you crank out an example that I can append here in the article?

Gemini: Of course. Here is a Python code example using the Pandas library that demonstrates the kind of pivot you described.

This example takes a simple, log-file-like dataset of web traffic and pivots it to create a summary table showing the count of different user-agents that accessed the site each day. It’s a classic example of turning a long list of events into a wide, insightful summary.


Python Pandas Pivot Example: Counting User-Agents

Here’s how you can take raw, log-style data and pivot it into a useful summary using Python’s Pandas library. This code is self-contained and can be run in any environment with Pandas installed.

The Code

import pandas as pd
import io

# 1. Simulate a raw log-file data stream (as a string)
# This represents the "long running trail" you mentioned.
log_data = """
date,ip_address,user_agent,status_code
2025-09-15,192.168.1.1,Chrome,200
2025-09-15,192.168.1.2,Firefox,200
2025-09-15,192.168.1.3,Chrome,404
2025-09-15,192.168.1.4,Safari,200
2025-09-16,192.168.1.1,Chrome,200
2025-09-16,192.168.1.5,Safari,301
2025-09-16,192.168.1.2,Firefox,200
2025-09-16,192.168.1.1,Chrome,200
2025-09-16,192.168.1.6,Chrome,404
"""

# Read the raw data into a Pandas DataFrame
# A DataFrame is essentially a powerful, in-memory spreadsheet.
df = pd.read_csv(io.StringIO(log_data))

print("--- Original Log Data (The 'Wake') ---")
print(df)

# 2. Perform the Pivot
# This is the core action. We are turning rows into a new, summarized table.
# - index='date': The unique values in the 'date' column will become the new rows.
# - columns='user_agent': The unique values in 'user_agent' will become the new columns.
# - values='ip_address': We'll use the 'ip_address' column to perform our calculation on.
# - aggfunc='count': The aggregation function. We're counting the occurrences.
# - fill_value=0: If a user_agent didn't appear on a certain day, show 0 instead of a blank space (NaN).
pivot_table = df.pivot_table(
    index='date',
    columns='user_agent',
    values='ip_address',
    aggfunc='count',
    fill_value=0
)

print("\n--- Pivoted for Insight ---")
print("Counts of each User-Agent per Day:")
print(pivot_table)

The Output

Running the code above will produce the following output:

--- Original Log Data (The 'Wake') ---
         date  ip_address user_agent  status_code
0  2025-09-15  192.168.1.1     Chrome          200
1  2025-09-15  192.168.1.2    Firefox          200
2  2025-09-15  192.168.1.3     Chrome          404
3  2025-09-15  192.168.1.4     Safari          200
4  2025-09-16  192.168.1.1     Chrome          200
5  2025-09-16  192.168.1.5     Safari          301
6  2025-09-16  192.168.1.2    Firefox          200
7  2025-09-16  192.168.1.1     Chrome          200
8  2025-09-16  192.168.1.6     Chrome          404

--- Pivoted for Insight ---
Counts of each User-Agent per Day:
user_agent  Chrome  Firefox  Safari
date                               
2025-09-15       2        1       1
2025-09-16       3        1       1

Why This Approach Works Well

  • Clarity and Intent: The code explicitly states what you are grouping by (index), what you are turning into new columns (columns), and how you are summarizing the data (aggfunc='count'). There are no hidden menus or sequences of clicks to remember.
  • Reproducibility: This code can be saved, shared, and re-run on new data with zero extra effort. It forms a reliable, automatable process.
  • Scalability: While this example has nine rows, the exact same code would work on a file with nine million rows without changing a single character. It scales directly in a way that spreadsheet programs often struggle with (see the sketch after this list).
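
A quick way to convince yourself of that scalability point: generate a few thousand synthetic rows and feed them to the identical pivot_table call. The random traffic below is purely a hypothetical stand-in for a much larger access log.

import pandas as pd
import numpy as np

# Synthetic stand-in for a much larger access log
rng = np.random.default_rng(0)
n = 9_000
df = pd.DataFrame({
    "date": rng.choice(["2025-09-15", "2025-09-16"], size=n),
    "ip_address": [f"192.168.1.{i % 254 + 1}" for i in range(n)],
    "user_agent": rng.choice(["Chrome", "Firefox", "Safari"], size=n),
})

# Exactly the same pivot_table call as above, character for character
pivot_table = df.pivot_table(
    index='date',
    columns='user_agent',
    values='ip_address',
    aggfunc='count',
    fill_value=0
)
print(pivot_table)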

Book Analysis

Author’s Imprint

Pivoting is one of the most difficult but essential actions, whether in business or programming. I’m exploring the parallel between the quick, intuitive pivot of an athlete and the methodical data pivot of an analyst. My core argument is that we need to escape the ‘magical incantations’ of proprietary spreadsheet software and embrace the clarity and power of tools like Python Pandas to truly master this skill, making it repeatable and scalable.

Title Brainstorm

  • Title Option: Beyond Spreadsheets: Mastering Data Pivots with Python
    • Filename: mastering-data-pivots-with-python.md
    • Rationale: This title is direct, benefit-oriented, and uses strong keywords (Pivots, Python, Spreadsheets). It appeals to a technical audience looking to level up their skills.
  • Title Option: The Art of the Pivot: From Life to Log Files
    • Filename: art-of-the-pivot-log-files.md
    • Rationale: This title is more conceptual and philosophical, connecting a real-world concept to a technical application. It’s great for attracting readers interested in the ‘why’ behind the code.
  • Title Option: Breaking Free from Excel: How Pandas Simplifies Data Pivoting
    • Filename: pandas-simplifies-data-pivoting.md
    • Rationale: This title targets a specific pain point (frustration with Excel) and offers a direct solution (Pandas). It’s highly practical and SEO-friendly for that specific audience.

Content Potential And Polish

  • Core Strengths:
    • Connects a high-level, abstract concept (pivoting) to a concrete, practical technical skill.
    • Provides a strong, opinionated argument against vendor lock-in and for open-source tools.
    • The Nvidia example is a perfect, timely illustration of the business implications of data pivots.
    • The dialogue with Gemini provides a ready-made, practical code example that grounds the entire essay.
  • Suggestions For Polish:
    • Consider adding a small diagram or visual aid to illustrate the ‘rows to columns’ concept for readers who are completely new to it.
    • Expand slightly on the Polars vs. Pandas point to clarify what ‘optimizing for columns’ means in a practical sense for a reader.
    • Add a concluding ‘Key Takeaways’ bulleted list to summarize the core arguments for quick scanning.

AI Editorial Take

This entry is an excellent foundation for a chapter on ‘Fundamental Data Operations’ or ‘Developing a Data-Centric Mindset.’ Its strength lies in its effective use of analogy, comparing the abstract difficulty of changing direction in life and business to the concrete action of reshaping data. The author’s voice is strong and persuasive, acting as a guide leading the reader away from frustrating GUI tools toward a more powerful, code-based approach.

The raw, conversational nature of the dialogue with an AI to generate the code example is a unique strength. It shouldn’t be polished away; rather, it should be framed as a case study in modern, AI-assisted development. It demonstrates not just what the code is, but how it can be generated and integrated in a real-world workflow, making the content authentic and highly relevant.

Next Step Prompts

  • Based on the provided article about data pivoting, generate a short, illustrative analogy to explain the difference between row-optimized dataframes (like Pandas) and column-optimized dataframes (like Polars) for a beginner.
  • Expand the provided Pandas pivot table example. Add a second pivot that uses a different aggregation function, such as ‘mean’ or ‘sum’ on a new numeric column, and explain the different insight it provides.