Mike Levin SEO

Future-proof your technology-skills with Linux, Python, vim & git... and me!

Running Python on Google Sheets

The time comes in every Python data jockey’s career when the best way to enter or display data is through Google Sheets.

OAuth2 Login is The Biggest Challenge

There are many ways to go about logging into Google Services with OAuth2. My way is the easiest:

pip install ohawf

Okay, let’s get some barebones Python code together as my own go-to reference for this sort of work. First and foremost, you need to login which is no easy task:

import ohawf
import pandas as pd  # optional
from googleapiclient.discovery import build

creds = ohawf.get()

This will give a long gobbledygook link that you copy/paste into a browser to walk you through Web-based Google OAuth2 authentication which will give you a token that you copy/paste back into the place you ran it from, probably a Jupyter Notebook. Next, you create a Google Sheets serice.

service = build("sheets", "v4", credentials=creds)
doc = "[Insert document key here]"  # Found on the GSheets URL
sheet = service.spreadsheets()

You’re welcome. You will find sample code on the Internet that seems to intentionally obfuscate this process. I don’t know why. Perhaps it is some sort of right of passage Google is making you go through. But if you’re lucky enough to find my site, I can reduce the pain pretty significantly. Once you have a sheets object created, you can define a range that includes the Tab name and select a rectangular region (the range) out of the GSheet:

tabname = "For Example"
arange = "A2:D1000"
result = sheet.values().get(spreadsheetId=doc, range=f"'{tabname}'!{arange}").execute()
columns = "Column Names Here".split()
df = pd.DataFrame(result["values"], columns=columns)  # Optional

Now you’re sitting on a Pandas DataFrame (df) that you can do stuff with. You couldn’t image the amount of framework-inventing I’ve done around this step, suffice to say Pandas and Python is all the framework you actually really need. You might use the Pandas apply function to do something with every row in the dataframe, or you might forego Pandas dataframes entirely. I only even introduce them here because SQL-like stuff such as joins become much easier when your list-of-lists exists as a DataFrame. But there are so many options here you could write a book on it. Anyhoo, let’s say you edited data in your dataframe and simply want to push it back into location. There’s ways built into Pandas, but we’re going to use standard Python and the GSheets API.

Now say you weren’t using Pandas DataFrames and you wanted to edit the data in the rectangular region you just selected. Well, result[“values”] is just a list of lists so you can step through it row by row and make a new one making whatever changes you want to along the way:

table = []
for arow in result["values"]
    newrow = [f"foo {x}" for x in arow]

The table we use here oddly can’t be a list of tuples or else the JSON conversion would fail. Instead, it likes the less memory-efficient construct of a list of lists, but hey, if it works it works. I showed a native Python example above, but if you’re using Pandas dataframes and have to go back to a list of lists for the Google Sheets API, here’s how to do it.

table = df.values.tolist()  # If you're using Pandas DataFrames

In either case, you’re eventually sitting on a list of lists variable called table and we can push it back into the same rectangular region (range) from which we selected it:

payload = {
    "value_input_option": "USER_ENTERED",
    "data": [{"range": arange, "values": table}],
request = sheet.values().batchUpdate(spreadsheetId=doc, body=payload).execute()

Simple, right? I don’t know why the Google Spreadsheets API Python examples don’t make it this easy. You pull a rectangular region out of a Google Sheet. You manipulate it. You push it back into that same rectangular region in the Google Sheet. There are other things you may wish to do, but this is the main workhorse.

Again, you’re welcome.