Something’s up in my personal test site’s Google Search Console (GSC) — and that something is impressions. Looking at a hockey-stick curve like this makes you wonder: what’s the story? There is a per-keyword, per-URL story here — an analysis every SEO should be able to do: identifying sudden movers and shakers using steep-slope linear regression. That’s sortable sparklines to you Excel wonks. I’ll try to walk you through it gently.
You Need More Granular Data Than GSC Lets You Download
How many times have you wanted to download more than 1,000 pages or queries from
the GSC web user interface, said “oh, you can do that with the API,” and then
never actually done it? This page is the resource to make it as easy as
reasonably possible for you. And the win is about more than getting past 1,000 rows.
It’s not having to switch between the queries and pages tabs on the final download.
It’s having per-query/per-page data in the same table.
Per Keyword Per URL
It’s a little tough to wrap your mind around at first. But every URL (page) has a list of keywords (queries) that lead to it. Therefore, there is an association between keywords and pages that gets lost using the normal GSC Web UI download feature. You lose the granularity required to do this sort of analysis. Getting the data through the API is the only way.
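To make that concrete, here is a rough sketch (not lifted from the scripts below) of what asking for both dimensions in one request looks like, and the shape of a row the API sends back. The keys list is what keeps the query and the page glued together:

```python
# Illustrative request body: both dimensions in one call keeps the
# query-to-page association intact.
request_body = {
    "startDate": "2025-04-08",
    "endDate": "2025-04-08",
    "dimensions": ["query", "page"],
    "rowLimit": 25000,
}

# Each row in the response carries both dimension values in its "keys" list:
# {"keys": ["cursor jupyter notebook", "https://mikelev.in/futureproof/..."],
#  "clicks": 2, "impressions": 9, "ctr": 0.2222, "position": 11.1}
```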
The crucial first step is setting up authentication. The choice is between the rabbit-hole abyss of OAuth and the comparatively simple service account.
GSC Authentication: GCP Service Account
The most common and relatively easy method for GSC authentication, especially for automated scripts, is using a Service Account. This involves creating a special account in Google Cloud Platform (GCP) and downloading a key file that your script will use to prove its identity.
Steps to Get Your Service Account Key File and Grant Access:
- Enable the API in GCP: Go to the Google Cloud Platform console, find your project (or create one), and enable the “Google Search Console API”.
- Create a Service Account:
- Navigate to “IAM & Admin” > “Service Accounts”.
- Click “Create Service Account”.
- Give it a name and description.
- Copy the generated email address (you’ll need this later).
- Grant it a role (this walkthrough uses “Owner”, though a more restrictive role might suffice).
- Generate and Download the Key File:
- Find your newly created service account in the list.
- Click the three dots menu (…) next to it and select “Manage keys”.
- Click “ADD KEY” > “Create new key”.
- Choose JSON as the key type and click “Create”.
- A .json file will download to your computer. This is your service account key file – keep it secure!
- Grant Permissions in GSC:
- Go to your Google Search Console property.
- Navigate to “Settings” > “Users and permissions”.
- Click “Add user”.
- Paste the service account email address you copied earlier.
- Set the permission level to “Full” access.
- Click “Add”.
With the downloaded .json
key file and the permissions granted in GSC, your script can now authenticate and interact with the GSC API.
Getting Most Recent Date With Data in GSC
If you made it this far, then you’ve gotten over the biggest hurdle. You can
work in a Jupyter Notebook like Google Colab or the one that comes with
Pipulate, or you can work in a plain old Python file like I’m doing in this
example, gsc_top_movers.py. The completed file can be found in the Pipulate
project at pipulate/precursors/gsc_top_movers.py, but for this article we are
going to build it up step by step. I mean, they’re going to be pretty big steps.
Step 1:
#!/usr/bin/env python
# coding: utf-8
"""
Finds the most recent date for which Google Search Console (GSC) data
is available for a specified site using a service account.
Required pip installs:
pip install google-api-python-client google-auth
"""
import os
import sys
import time
from datetime import datetime, timedelta
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
# --- Configuration ---
# Set your GSC Property URL here (e.g., "sc-domain:example.com" or "https://www.example.com/")
# IMPORTANT: Make sure the service account has access to this property!
SITE_URL = "sc-domain:mikelev.in"
# Path to your service account key JSON file
# Assumes the key file is in the same directory as the script. Adjust if needed.
SERVICE_ACCOUNT_KEY_FILE = os.path.join(os.path.dirname(__file__), 'service-account-key.json')
# Required Google API scopes
SCOPES = ['https://www.googleapis.com/auth/webmasters']
# GSC data typically has a delay. Start checking this many days before today.
START_OFFSET_DAYS = 2
# Maximum number of past days to check before giving up
MAX_DAYS_TO_CHECK = 10
# Delay between API checks (in seconds) to respect rate limits
API_CHECK_DELAY = 0.5
# --- End Configuration ---
def authenticate_gsc():
"""Authenticates with Google API using service account credentials."""
if not os.path.exists(SERVICE_ACCOUNT_KEY_FILE):
print(f"Error: Service account key file not found at: {SERVICE_ACCOUNT_KEY_FILE}")
print("Please ensure the file exists and the path is correct.")
sys.exit(1)
try:
credentials = service_account.Credentials.from_service_account_file(
SERVICE_ACCOUNT_KEY_FILE, scopes=SCOPES)
webmasters_service = build('webmasters', 'v3', credentials=credentials)
print("✓ Successfully authenticated with Google Search Console API.")
return webmasters_service
except Exception as e:
print(f"Error during GSC authentication: {e}")
sys.exit(1)
def check_date_has_data(service, site_url, check_date):
"""
Checks if GSC has any performance data for the given site and date.
Args:
service: The authenticated GSC API service object.
site_url (str): The GSC property URL.
check_date (date): The date to check for data.
Returns:
bool: True if data exists for the date, False otherwise.
"""
date_str = check_date.strftime('%Y-%m-%d')
# Create a minimal query request for the specific date
request_body = {
'startDate': date_str,
'endDate': date_str,
'dimensions': ['query'], # Using a minimal dimension
'rowLimit': 1 # We only need 1 row to confirm data exists
}
try:
# Execute the query via the searchanalytics endpoint
response = service.searchanalytics().query(
siteUrl=site_url,
body=request_body
).execute()
# Check if the 'rows' key exists and contains any data
return bool(response.get('rows')) # More concise check
except HttpError as e:
# Handle common API errors gracefully
print(f"\nAPI Error querying for date {date_str}: {e.resp.status} {e.resp.reason}")
if e.resp.status == 403:
print(" (This could be a permission issue or rate limiting.)")
elif e.resp.status == 429:
print(" (Rate limit exceeded. Consider increasing API_CHECK_DELAY.)")
# Continue checking previous days despite error on this date
return False
except Exception as e:
# Catch other potential exceptions
print(f"\nUnexpected error querying for date {date_str}: {e}")
return False
def find_most_recent_gsc_data_date(service, site_url):
"""
Iterates backward from recent dates to find the latest date with GSC data.
Args:
service: The authenticated GSC API service object.
site_url (str): The GSC property URL to check.
Returns:
date or None: The most recent date with data, or None if not found within limit.
"""
most_recent_data_date = None
days_checked = 0
current_date = datetime.now().date() - timedelta(days=START_OFFSET_DAYS)
print(f"\nFinding most recent data for site: {site_url}")
print(f"Starting check from date: {current_date}")
while days_checked < MAX_DAYS_TO_CHECK:
print(f"Checking date {current_date}...", end=" ", flush=True)
if check_date_has_data(service, site_url, current_date):
print("✓ Data found!")
most_recent_data_date = current_date
break # Exit loop once data is found
else:
print("✗ No data")
current_date -= timedelta(days=1)
days_checked += 1
time.sleep(API_CHECK_DELAY) # Pause before the next check
return most_recent_data_date
def main():
"""Main execution function."""
# 1. Authenticate and get the GSC service object
gsc_service = authenticate_gsc()
# 2. Find the most recent date with data
latest_date = find_most_recent_gsc_data_date(gsc_service, SITE_URL)
# 3. Report the result
if latest_date:
print(f"\nSuccess: Most recent GSC data available is for: {latest_date.strftime('%Y-%m-%d')}")
else:
last_checked_date = datetime.now().date() - timedelta(days=START_OFFSET_DAYS + MAX_DAYS_TO_CHECK -1)
print(f"\nWarning: Could not find GSC data within the last {MAX_DAYS_TO_CHECK} checked days")
print(f" (Checked back to {last_checked_date.strftime('%Y-%m-%d')}).")
print(" Please verify the service account has permissions for the site,")
print(" and check the GSC interface for data availability.")
if __name__ == "__main__":
# Ensure SITE_URL is set
if not SITE_URL:
print("Error: Please set the GSC_SITE_URL environment variable or update the 'SITE_URL' variable in the script.")
print("Example usage: GSC_SITE_URL=\"sc-domain:example.com\" python precursors/gsc_movers_and_shakers.py")
sys.exit(1)
main()
The glorious output of this script is a simple date: the most recent date in Google Search Console that actually has data. Many scripts out there simply assume the data lags by 2, 3, or 4 days, but I like to know. If the Web UI shows data up to a certain day, I want my scripts to automatically choose that same day. The only value you should have to hardwire is your domain, and even that could be brought in through command-line args (a quick sketch of that follows), but I’m keeping that extra code out of the script (for now).
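If you do want to pass the property on the command line instead of hardwiring it, a minimal sketch (a hypothetical variation, not part of the script in this article) could look like this:

```python
# Hypothetical variation: accept the GSC property as a command-line argument
# instead of hardwiring SITE_URL in the script.
import argparse

parser = argparse.ArgumentParser(description="Find the most recent GSC data date.")
parser.add_argument("--site-url", default="sc-domain:mikelev.in",
                    help='e.g. "sc-domain:example.com" or "https://www.example.com/"')
args = parser.parse_args()
SITE_URL = args.site_url
```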
[mike@nixos:~/repos/pipulate/precursors]$ python gsc_top_movers.py
✓ Successfully authenticated with Google Search Console API.
Finding most recent data for site: sc-domain:mikelev.in
Starting check from date: 2025-04-08
Checking date 2025-04-08... ✓ Data found!
Success: Most recent GSC data available is for: 2025-04-08
Downloading Most Recent Day of GSC Data
Now, the GSC API supports start and end dates and various metrics and dimensions.
For the sake of an easy-ish example, we will start by downloading just one day of
data, the most recent day we just found, into a Pandas DataFrame, asking for only
the query and page dimensions. We will also ask for the maximum rowLimit of 25,000
and deliberately start at startRow 0 so we can step through and empty out all the
responses into a single DataFrame…
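Stripped of error handling, the pagination pattern boils down to the loop below. This is just a sketch assuming an authenticated service object, SITE_URL, and a request_body like the one above; the full, error-handled version is in the script that follows:

```python
# Pagination sketch: assumes an authenticated `service` object, SITE_URL,
# and a request_body containing 'rowLimit'. Not standalone.
all_rows = []
start_row = 0
while True:
    request_body["startRow"] = start_row
    response = service.searchanalytics().query(siteUrl=SITE_URL, body=request_body).execute()
    rows = response.get("rows", [])
    if not rows:
        break                                  # nothing (more) to fetch
    all_rows.extend(rows)
    if len(rows) < request_body["rowLimit"]:
        break                                  # a short page means the last page
    start_row += len(rows)                     # otherwise request the next page
```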
#!/usr/bin/env python
# coding: utf-8
"""
Finds the most recent date for which Google Search Console (GSC) data
is available for a specified site using a service account, then downloads
that day's query/page data into a Pandas DataFrame.
Required pip installs:
    pip install google-api-python-client google-auth pandas
"""
import os
import sys
import time
from datetime import datetime, timedelta
import pandas as pd
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
# --- Configuration ---
# Set your GSC Property URL here (e.g., "sc-domain:example.com" or "https://www.example.com/")
# IMPORTANT: Make sure the service account has access to this property!
SITE_URL = "sc-domain:mikelev.in"
# Path to your service account key JSON file
# Assumes the key file is in the same directory as the script. Adjust if needed.
SERVICE_ACCOUNT_KEY_FILE = os.path.join(os.path.dirname(__file__), 'service-account-key.json')
# Required Google API scopes
SCOPES = ['https://www.googleapis.com/auth/webmasters']
# GSC data typically has a delay. Start checking this many days before today.
START_OFFSET_DAYS = 2
# Maximum number of past days to check before giving up
MAX_DAYS_TO_CHECK = 10
# Delay between API checks (in seconds) to respect rate limits
API_CHECK_DELAY = 0.5
# --- End Configuration ---
def authenticate_gsc():
"""Authenticates with Google API using service account credentials."""
if not os.path.exists(SERVICE_ACCOUNT_KEY_FILE):
print(f"Error: Service account key file not found at: {SERVICE_ACCOUNT_KEY_FILE}")
print("Please ensure the file exists and the path is correct.")
sys.exit(1)
try:
credentials = service_account.Credentials.from_service_account_file(
SERVICE_ACCOUNT_KEY_FILE, scopes=SCOPES)
webmasters_service = build('webmasters', 'v3', credentials=credentials)
print("✓ Successfully authenticated with Google Search Console API.")
return webmasters_service
except Exception as e:
print(f"Error during GSC authentication: {e}")
sys.exit(1)
def check_date_has_data(service, site_url, check_date):
"""
Checks if GSC has any performance data for the given site and date.
Args:
service: The authenticated GSC API service object.
site_url (str): The GSC property URL.
check_date (date): The date to check for data.
Returns:
bool: True if data exists for the date, False otherwise.
"""
date_str = check_date.strftime('%Y-%m-%d')
# Create a minimal query request for the specific date
request_body = {
'startDate': date_str,
'endDate': date_str,
'dimensions': ['query'], # Using a minimal dimension
'rowLimit': 1 # We only need 1 row to confirm data exists
}
try:
# Execute the query via the searchanalytics endpoint
response = service.searchanalytics().query(
siteUrl=site_url,
body=request_body
).execute()
# Check if the 'rows' key exists and contains any data
return bool(response.get('rows')) # More concise check
except HttpError as e:
# Handle common API errors gracefully
print(f"\nAPI Error querying for date {date_str}: {e.resp.status} {e.resp.reason}")
if e.resp.status == 403:
print(" (This could be a permission issue or rate limiting.)")
elif e.resp.status == 429:
print(" (Rate limit exceeded. Consider increasing API_CHECK_DELAY.)")
# Continue checking previous days despite error on this date
return False
except Exception as e:
# Catch other potential exceptions
print(f"\nUnexpected error querying for date {date_str}: {e}")
return False
def find_most_recent_gsc_data_date(service, site_url):
"""
Iterates backward from recent dates to find the latest date with GSC data.
Args:
service: The authenticated GSC API service object.
site_url (str): The GSC property URL to check.
Returns:
date or None: The most recent date with data, or None if not found within limit.
"""
most_recent_data_date = None
days_checked = 0
current_date = datetime.now().date() - timedelta(days=START_OFFSET_DAYS)
print(f"\nFinding most recent data for site: {site_url}")
print(f"Starting check from date: {current_date}")
while days_checked < MAX_DAYS_TO_CHECK:
print(f"Checking date {current_date}...", end=" ", flush=True)
if check_date_has_data(service, site_url, current_date):
print("✓ Data found!")
most_recent_data_date = current_date
break # Exit loop once data is found
else:
print("✗ No data")
current_date -= timedelta(days=1)
days_checked += 1
time.sleep(API_CHECK_DELAY) # Pause before the next check
return most_recent_data_date
def fetch_all_gsc_data(service, site_url, request_body):
"""
Fetches all available data rows for a given GSC query, handling pagination.
Args:
service: The authenticated GSC API service object.
site_url (str): The GSC property URL.
request_body (dict): The initial query request body. Must include
'rowLimit' and 'startRow'.
Returns:
list: A list containing all the row data dictionaries fetched from GSC.
Returns an empty list if an error occurs or no data is found.
"""
all_rows = []
start_row = request_body.get('startRow', 0)
row_limit = request_body.get('rowLimit', 1000) # Default GSC limit if not specified
page_count = 0
print(f"\nFetching data with dimensions: {request_body.get('dimensions', [])}")
while True:
page_count += 1
request_body['startRow'] = start_row
print(f"Fetching page {page_count} (starting row {start_row})...", end=" ", flush=True)
try:
response = service.searchanalytics().query(
siteUrl=site_url,
body=request_body
).execute()
current_rows = response.get('rows', [])
if not current_rows:
if page_count == 1:
print("No data found for this query.")
else:
print("✓ No more data.")
break # Exit loop if no rows are returned
print(f"✓ Retrieved {len(current_rows)} rows.")
all_rows.extend(current_rows)
# GSC API max rowLimit is 25000, but let's use the requested rowLimit
# If fewer rows than the limit are returned, it's the last page
if len(current_rows) < row_limit:
print("✓ Reached last page of results.")
break # Exit loop if last page is fetched
start_row += len(current_rows)
time.sleep(API_CHECK_DELAY) # Be nice to the API between pages
except HttpError as e:
print(f"\nAPI Error during data fetch (page {page_count}): {e.resp.status} {e.resp.reason}")
if e.resp.status in [403, 429]:
print(" (Consider permissions, quotas, or increasing API_CHECK_DELAY.)")
print(" Aborting data fetch for this request.")
return [] # Return empty list on error
except Exception as e:
print(f"\nUnexpected error during data fetch (page {page_count}): {e}")
print(" Aborting data fetch for this request.")
return [] # Return empty list on error
print(f"✓ Finished fetching. Total rows retrieved: {len(all_rows)}")
return all_rows
def process_gsc_data_to_dataframe(gsc_data_list):
"""
Converts the raw list of GSC data rows into a processed Pandas DataFrame.
Args:
gsc_data_list (list): A list of row data dictionaries from fetch_all_gsc_data.
Returns:
pandas.DataFrame: A DataFrame with processed GSC data, including separate
'query' and 'page' columns, and numeric metrics.
Returns an empty DataFrame if input is empty or invalid.
"""
if not gsc_data_list:
print("No data to process into DataFrame.")
return pd.DataFrame()
df = pd.DataFrame(gsc_data_list)
# Split the 'keys' column (list) into separate dimension columns
if 'keys' in df.columns and not df['keys'].empty:
# Assuming dimensions are ['query', 'page'] as requested
try:
df['query'] = df['keys'].apply(lambda x: x[0] if isinstance(x, list) and len(x) > 0 else None)
df['page'] = df['keys'].apply(lambda x: x[1] if isinstance(x, list) and len(x) > 1 else None)
df = df.drop('keys', axis=1)
print("✓ Split 'keys' into 'query' and 'page' columns.")
except Exception as e:
print(f"Warning: Could not split 'keys' column reliably: {e}")
# Keep the 'keys' column if splitting fails
if 'query' in df.columns:
df = df.drop('query', axis=1)
if 'page' in df.columns:
df = df.drop('page', axis=1)
# Ensure standard metric columns exist and convert to numeric types
metric_cols = ['clicks', 'impressions', 'ctr', 'position']
for col in metric_cols:
if col in df.columns:
df[col] = pd.to_numeric(df[col], errors='coerce') # Coerce errors to NaN
# Optional: Fill NaNs created by coercion if needed, e.g., df[col] = df[col].fillna(0)
else:
# Add the column with default value if missing (optional)
# df[col] = 0
print(f"Note: Metric column '{col}' not found in raw data.")
# Optional: Convert CTR to percentage for display
if 'ctr' in df.columns:
df['ctr'] = df['ctr'] * 100
print("✓ Converted 'ctr' to percentage.")
print(f"✓ Processed data into DataFrame with {len(df)} rows and {len(df.columns)} columns.")
return df
def main():
"""Main execution function."""
# 1. Authenticate and get the GSC service object
gsc_service = authenticate_gsc()
# 2. Find the most recent date with data
latest_date = find_most_recent_gsc_data_date(gsc_service, SITE_URL)
# --- ADD THE FOLLOWING ---
# 3. If a date was found, fetch and process the data for that day
if latest_date:
print(f"\nAttempting to download data for the most recent date: {latest_date.strftime('%Y-%m-%d')}")
# Define the request for that single day
single_day_request = {
'startDate': latest_date.strftime('%Y-%m-%d'),
'endDate': latest_date.strftime('%Y-%m-%d'),
'dimensions': ['query', 'page'],
'rowLimit': 25000, # Use max row limit per page fetch
'startRow': 0
}
# Fetch data using the pagination function
raw_gsc_data = fetch_all_gsc_data(gsc_service, SITE_URL, single_day_request)
# Process the raw data into a DataFrame
df_gsc_data = process_gsc_data_to_dataframe(raw_gsc_data)
# Display results if DataFrame is not empty
if not df_gsc_data.empty:
print("\n--- GSC Data DataFrame Preview (first 10 rows) ---")
# Set display options for better preview
pd.set_option('display.max_rows', 20)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 100)
pd.set_option('display.max_colwidth', 50)
print(df_gsc_data.head(10))
print("\n--- DataFrame Info ---")
df_gsc_data.info()
print("\n--- Basic Stats ---")
# Use describe() for numeric columns, handle potential missing columns
numeric_cols = df_gsc_data.select_dtypes(include=['number']).columns
if not numeric_cols.empty:
print(df_gsc_data[numeric_cols].describe())
else:
print("No numeric columns found for describe().")
# Example of accessing specific data
# print(f"\nTotal Clicks: {df_gsc_data['clicks'].sum()}")
# print(f"Total Impressions: {df_gsc_data['impressions'].sum()}")
else:
print("\nNo data was processed into the DataFrame.")
# --- END OF ADDED SECTION ---
else:
# This part was already there: handle case where no date was found
last_checked_date = datetime.now().date() - timedelta(days=START_OFFSET_DAYS + MAX_DAYS_TO_CHECK - 1)
print(f"\nWarning: Could not find GSC data within the last {MAX_DAYS_TO_CHECK} checked days")
print(f" (Checked back to {last_checked_date.strftime('%Y-%m-%d')}).")
print(" Cannot proceed with data download.")
# Removed the previous permission/interface check message as it's less relevant here
# --- Ensure the final block still calls main() ---
if __name__ == "__main__":
# Ensure SITE_URL is set (this check can be simplified or enhanced)
if not SITE_URL or SITE_URL == "sc-domain:yourdomain.com": # Added check for default placeholder
print("Error: Please update the 'SITE_URL' variable in the script configuration.")
sys.exit(1)
main()
The first program has been expanded with several new functions and a reworked main function. Here’s the explanation.
- Import pandas: We add import pandas as pd at the top to handle data manipulation.
- fetch_all_gsc_data function: This function is crucial. It takes the initial request and repeatedly calls the GSC API, increasing the startRow each time, until all data matching the request (for the specified date and dimensions) is collected. It includes basic error handling and delays between calls.
- process_gsc_data_to_dataframe function: This takes the raw list of data returned by the API and transforms it into a structured Pandas DataFrame. It splits the keys list into proper query and page columns and ensures the metric columns (clicks, impressions, ctr, position) are numeric.
- Updated main function:
  - After successfully finding the latest_date, it defines a single_day_request dictionary using that date for both startDate and endDate, specifying the desired dimensions, and setting the maximum rowLimit.
  - It calls fetch_all_gsc_data to get the raw data.
  - It calls process_gsc_data_to_dataframe to structure the data.
  - Finally, if data was successfully processed, it prints a preview of the DataFrame using df_gsc_data.head(10) and shows basic info and stats using df_gsc_data.info() and df_gsc_data.describe().
Running The 1-Day Version of GSC Script
We’re building up here. I know it’s a lot of code to look at, but this will let you build up the same way, going from basic GSC access that finds the most recent day with data to downloading the data for that day.
Now, running the script will not only find the most recent date but also download and display the query and page performance data for that specific day.
[mike@nixos:~/repos/pipulate/precursors]$ python gsc_top_movers.py
✓ Successfully authenticated with Google Search Console API.
Finding most recent data for site: sc-domain:mikelev.in
Starting check from date: 2025-04-08
Checking date 2025-04-08... ✓ Data found!
Attempting to download data for the most recent date: 2025-04-08
Fetching data with dimensions: ['query', 'page']
Fetching page 1 (starting row 0)... ✓ Retrieved 3261 rows.
✓ Reached last page of results.
✓ Finished fetching. Total rows retrieved: 3261
✓ Split 'keys' into 'query' and 'page' columns.
✓ Converted 'ctr' to percentage.
✓ Processed data into DataFrame with 3261 rows and 6 columns.
--- GSC Data DataFrame Preview (first 10 rows) ---
clicks impressions ctr position query \
0 2 9 22.222222 11.111111 cursor jupyter notebook
1 1 27 3.703704 41.851852 cursor ai
2 1 2 50.000000 3.500000 cursor ai jupyter
3 1 2 50.000000 7.000000 cursor and jupyter notebook
4 1 1 100.000000 2.000000 fasthtml
5 1 3 33.333333 25.666667 fasthtml
6 1 1 100.000000 23.000000 function calling in llms
7 1 3 33.333333 1.000000 grok markdown
8 1 3 33.333333 7.000000 how to run jupyter notebook in cursor
9 1 7 14.285714 4.285714 nix flake python
page
0 https://mikelev.in/futureproof/jupyter-noteboo...
1 https://mikelev.in/futureproof/ai-agentic-mode...
2 https://mikelev.in/futureproof/jupyter-noteboo...
3 https://mikelev.in/futureproof/jupyter-noteboo...
4 https://mikelev.in/futureproof/foo/
5 https://mikelev.in/futureproof/unpacking-fasth...
6 https://mikelev.in/futureproof/function-callin...
7 https://mikelev.in/futureproof/grok3-markdown-...
8 https://mikelev.in/futureproof/jupyter-noteboo...
9 https://mikelev.in/futureproof/nix-flakes/
--- DataFrame Info ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3261 entries, 0 to 3260
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 clicks 3261 non-null int64
1 impressions 3261 non-null int64
2 ctr 3261 non-null float64
3 position 3261 non-null float64
4 query 3261 non-null object
5 page 3261 non-null object
dtypes: float64(2), int64(2), object(2)
memory usage: 153.0+ KB
--- Basic Stats ---
clicks impressions ctr position
count 3261.000000 3261.000000 3261.000000 3261.000000
mean 0.003680 1.697639 0.165658 74.311338
std 0.065429 2.811669 3.456809 23.902479
min 0.000000 1.000000 0.000000 1.000000
25% 0.000000 1.000000 0.000000 64.000000
50% 0.000000 1.000000 0.000000 80.000000
75% 0.000000 2.000000 0.000000 92.000000
max 2.000000 87.000000 100.000000 194.000000
Downloading 4 Individual Days of GSC Data
Now the beautiful thing here is that because all our data is for a particular day, which happens to be the most recent day that has GSC data, we can start to gather the data we need for really beautiful directional analysis per keyword, per URL. In GSC parlance, a keyword is a query and a URL is a page. So what we do next is subtract one day from the date and do the same thing we just did.
In fact, we can define a day range and create a Pandas DataFrame for each day, N days back from the most recent one that has data. For the sake of simplicity, and to be kind to your computer’s memory, we will only go 3 days back, for a total of 4 days’ worth of data, or 4 separate Pandas DataFrames.
Here’s how we do that…
#!/usr/bin/env python
# coding: utf-8
"""
Fetches Google Search Console (GSC) data for the 4 most recent days available
for a specified site using a service account. Saves each day to a CSV file.
Required pip installs:
pip install google-api-python-client google-auth pandas
"""
import os
import sys
import time
from datetime import datetime, timedelta
import pandas as pd
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
# --- Configuration ---
# Set your GSC Property URL here (e.g., "sc-domain:example.com" or "https://www.example.com/")
SITE_URL = "sc-domain:mikelev.in"
# Path to your service account key JSON file
# Assumes the key file is in the same directory as the script. Adjust if needed.
SCRIPT_DIR = os.path.dirname(__file__)
SERVICE_ACCOUNT_KEY_FILE = os.path.join(SCRIPT_DIR, 'service-account-key.json')
# Required Google API scopes
SCOPES = ['https://www.googleapis.com/auth/webmasters']
# GSC data typically has a delay. Start checking this many days before today.
START_OFFSET_DAYS = 2
# Maximum number of past days to check before giving up
MAX_DAYS_TO_CHECK = 10
# Delay between API checks (in seconds) to respect rate limits
API_CHECK_DELAY = 0.5
# Number of consecutive days of data to fetch (ending on the most recent date)
NUM_DAYS_TO_FETCH = 4
# Directory to store/load cached daily GSC data CSV files
CACHE_DIR = os.path.join(SCRIPT_DIR, 'gsc_cache')
# --- End Configuration ---
def authenticate_gsc():
"""Authenticates with Google API using service account credentials."""
if not os.path.exists(SERVICE_ACCOUNT_KEY_FILE):
print(f"Error: Service account key file not found at: {SERVICE_ACCOUNT_KEY_FILE}")
print("Please ensure the file exists and the path is correct.")
sys.exit(1)
try:
credentials = service_account.Credentials.from_service_account_file(
SERVICE_ACCOUNT_KEY_FILE, scopes=SCOPES)
webmasters_service = build('webmasters', 'v3', credentials=credentials)
print("✓ Successfully authenticated with Google Search Console API.")
return webmasters_service
except Exception as e:
print(f"Error during GSC authentication: {e}")
sys.exit(1)
def check_date_has_data(service, site_url, check_date):
"""Checks if GSC has any performance data for the given site and date."""
date_str = check_date.strftime('%Y-%m-%d')
request_body = {
'startDate': date_str, 'endDate': date_str,
'dimensions': ['query'], 'rowLimit': 1
}
try:
response = service.searchanalytics().query(
siteUrl=site_url, body=request_body).execute()
return bool(response.get('rows'))
except HttpError as e:
print(f"\nAPI Error checking date {date_str}: {e.resp.status} {e.resp.reason}")
return False
except Exception as e:
print(f"\nUnexpected error checking date {date_str}: {e}")
return False
def find_most_recent_gsc_data_date(service, site_url):
"""Iterates backward to find the latest date with GSC data."""
most_recent_data_date = None
days_checked = 0
current_date = datetime.now().date() - timedelta(days=START_OFFSET_DAYS)
print(f"\nFinding most recent data date for site: {site_url}")
print(f"Starting check from date: {current_date}")
while days_checked < MAX_DAYS_TO_CHECK:
print(f"Checking date {current_date}...", end=" ", flush=True)
if check_date_has_data(service, site_url, current_date):
print("✓ Data found!")
most_recent_data_date = current_date
break
else:
print("✗ No data")
current_date -= timedelta(days=1)
days_checked += 1
time.sleep(API_CHECK_DELAY)
if not most_recent_data_date:
last_checked_date = current_date + timedelta(days=1) # Date where check stopped
print(f"\nWarning: Could not find data within {MAX_DAYS_TO_CHECK} checked days (back to {last_checked_date}).")
return most_recent_data_date
def fetch_all_gsc_data(service, site_url, request_body):
"""Fetches all available data rows for a given GSC query, handling pagination."""
all_rows = []
start_row = request_body.get('startRow', 0)
row_limit = request_body.get('rowLimit', 25000) # Use max GSC limit
page_count = 0
print(f"\nFetching data with dimensions: {request_body.get('dimensions', [])} for {request_body.get('startDate')}")
while True:
page_count += 1
request_body['startRow'] = start_row
print(f"Fetching page {page_count} (starting row {start_row})...", end=" ", flush=True)
try:
response = service.searchanalytics().query(
siteUrl=site_url, body=request_body).execute()
current_rows = response.get('rows', [])
if not current_rows:
if page_count == 1: print("No data found for this query.")
else: print("✓ No more data.")
break
print(f"✓ Retrieved {len(current_rows)} rows.")
all_rows.extend(current_rows)
if len(current_rows) < row_limit:
print("✓ Reached last page of results.")
break
start_row += len(current_rows)
time.sleep(API_CHECK_DELAY)
except HttpError as e:
print(f"\nAPI Error fetching page {page_count}: {e.resp.status} {e.resp.reason}")
return [] # Return empty list on error
except Exception as e:
print(f"\nUnexpected error fetching page {page_count}: {e}")
return []
print(f"✓ Finished fetching {request_body.get('startDate')}. Total rows: {len(all_rows)}")
return all_rows
def process_gsc_data_to_dataframe(gsc_data_list, data_date):
"""Converts raw GSC data list into a processed Pandas DataFrame."""
if not gsc_data_list:
print(f"No data to process into DataFrame for {data_date}.")
return pd.DataFrame()
df = pd.DataFrame(gsc_data_list)
if 'keys' in df.columns and not df['keys'].empty:
try:
df['query'] = df['keys'].apply(lambda x: x[0] if isinstance(x, list) and len(x) > 0 else None)
df['page'] = df['keys'].apply(lambda x: x[1] if isinstance(x, list) and len(x) > 1 else None)
df = df.drop('keys', axis=1)
except Exception as e:
print(f"Warning: Could not split 'keys' column for {data_date}: {e}")
metric_cols = ['clicks', 'impressions', 'ctr', 'position']
for col in metric_cols:
if col in df.columns:
df[col] = pd.to_numeric(df[col], errors='coerce')
else:
print(f"Note: Metric column '{col}' not found for {data_date}.")
if 'ctr' in df.columns: df['ctr'] = df['ctr'] * 100
print(f"✓ Processed data for {data_date} into DataFrame ({len(df)} rows).")
return df
def get_gsc_data_for_day(service, site_url, data_date):
"""
Fetches GSC data for a specific day, using cache if available.
Args:
service: Authenticated GSC service object.
site_url (str): Target GSC property URL.
data_date (date): The specific date to fetch data for.
Returns:
pandas.DataFrame: DataFrame containing data for the specified date.
Returns an empty DataFrame if fetching fails or no data.
"""
date_str = data_date.strftime('%Y-%m-%d')
cache_filename = os.path.join(CACHE_DIR, f"gsc_data_{date_str}.csv")
# Check cache first
if os.path.exists(cache_filename):
try:
print(f"✓ Loading data for {date_str} from cache: {cache_filename}")
# Specify dtypes to avoid warnings and ensure consistency
df = pd.read_csv(cache_filename, dtype={
'clicks': 'Int64', # Use nullable integer type
'impressions': 'Int64',
'ctr': 'float64',
'position': 'float64',
'query': 'object',
'page': 'object'
}, parse_dates=False) # Dates handled separately
# Ensure CTR is percentage if loaded from CSV
if 'ctr' in df.columns and df['ctr'].max() <= 1.0:
df['ctr'] = df['ctr'] * 100
return df
except Exception as e:
print(f"Warning: Could not load cache file {cache_filename}. Error: {e}. Re-fetching.")
# If not in cache or cache fails, fetch from API
print(f"Fetching data for {date_str} from GSC API...")
request = {
'startDate': date_str, 'endDate': date_str,
'dimensions': ['query', 'page'],
'rowLimit': 25000, 'startRow': 0
}
raw_data = fetch_all_gsc_data(service, site_url, request)
df = process_gsc_data_to_dataframe(raw_data, date_str)
# Save to cache if data was fetched successfully and is not empty
if not df.empty:
try:
os.makedirs(CACHE_DIR, exist_ok=True) # Ensure cache directory exists
df.to_csv(cache_filename, index=False)
print(f"✓ Saved data for {date_str} to cache: {cache_filename}")
except Exception as e:
print(f"Warning: Could not save data for {date_str} to cache. Error: {e}")
return df
def main():
"""Main execution function."""
# 1. Authenticate and get the GSC service object
gsc_service = authenticate_gsc()
# 2. Find the most recent date with data
latest_date = find_most_recent_gsc_data_date(gsc_service, SITE_URL)
if not latest_date:
print("\nCannot proceed without a valid recent date.")
sys.exit(1)
print(f"\nSuccess: Most recent GSC data available is for: {latest_date.strftime('%Y-%m-%d')}")
# 3. Determine the date range (last NUM_DAYS_TO_FETCH days)
dates_to_fetch = [latest_date - timedelta(days=i) for i in range(NUM_DAYS_TO_FETCH)][::-1] # Reverse to get oldest first
print(f"\nPreparing to fetch/load data for {NUM_DAYS_TO_FETCH} days:")
for d in dates_to_fetch:
print(f" - {d.strftime('%Y-%m-%d')}")
# 4. Fetch/load data for each day into a dictionary of DataFrames
daily_dataframes = {}
all_data_fetched = True
for target_date in dates_to_fetch:
df_day = get_gsc_data_for_day(gsc_service, SITE_URL, target_date)
if df_day.empty and not os.path.exists(os.path.join(CACHE_DIR, f"gsc_data_{target_date.strftime('%Y-%m-%d')}.csv")):
# Check if the file exists even if df is empty, maybe it was an empty day
print(f"Warning: Failed to fetch or load data for {target_date.strftime('%Y-%m-%d')}. Trend analysis might be incomplete.")
all_data_fetched = False
daily_dataframes[target_date] = df_day
# 5. Confirmation and next steps preview
if all_data_fetched:
print(f"\n✓ Successfully fetched/loaded data for all {len(daily_dataframes)} target days.")
# Optionally display info about one of the DataFrames
if latest_date in daily_dataframes and not daily_dataframes[latest_date].empty:
print(f"\n--- Preview of data for the latest date ({latest_date.strftime('%Y-%m-%d')}): ---")
print(daily_dataframes[latest_date].head())
print(f"(Total rows for {latest_date.strftime('%Y-%m-%d')}: {len(daily_dataframes[latest_date])})")
print("\nDataFrames are stored in the 'daily_dataframes' dictionary, keyed by date.")
print("Next step: Combine these DataFrames and perform trend analysis.")
else:
print("\nWarning: Data fetching/loading was incomplete for one or more days.")
print("Proceeding with available data, but trend analysis might be affected.")
# Still print available data info
print(f"\nAvailable DataFrames ({len(daily_dataframes)}): {list(daily_dataframes.keys())}")
# The script now ends here. The next logical step (trend analysis)
# would follow, using the 'daily_dataframes' dictionary.
if __name__ == "__main__":
if not SITE_URL or SITE_URL == "sc-domain:yourdomain.com":
print("Error: Please update the 'SITE_URL' variable in the script configuration.")
sys.exit(1)
main()
Explanation of Changes:
- Configuration: Added NUM_DAYS_TO_FETCH (set to 4) and CACHE_DIR variables.
- get_gsc_data_for_day function: This new function encapsulates the logic for getting data for a single specific date.
  - It constructs the expected cache filename (gsc_data_YYYY-MM-DD.csv) inside the CACHE_DIR.
  - It first checks if this file exists. If yes, it loads the data using pd.read_csv (specifying dtypes for robustness) and returns the DataFrame.
  - If the file doesn’t exist (or loading fails), it proceeds to call fetch_all_gsc_data to get data from the API.
  - It then calls process_gsc_data_to_dataframe.
  - If the fetched data is not empty, it saves the DataFrame to the cache file using df.to_csv(index=False).
  - It returns the resulting DataFrame (which might be empty if fetching failed or the day truly had no data).
- Updated main function:
  - After finding latest_date, it calculates the list of dates_to_fetch (the 4 days ending on latest_date).
  - It initializes an empty dictionary daily_dataframes to store the results.
  - It loops through dates_to_fetch. In each iteration:
    - It calls get_gsc_data_for_day for the current date.
    - It stores the returned DataFrame in the daily_dataframes dictionary, using the date object as the key.
    - It includes a basic check (all_data_fetched) to see if data was obtained for all requested days.
  - After the loop, it prints a confirmation message and a preview of the data for the most recent day. It explicitly mentions that the data is now ready for the next step (trend analysis) and stored in the daily_dataframes dictionary.
Now, when you run this script, it will fetch (or load from cache) the data for each of the last 4 days and store them in memory, ready for the trend analysis that comes next.
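As a side benefit of the cache, you can rebuild the same dictionary of DataFrames later without touching the API at all, which is handy when iterating on the analysis step. A rough sketch, assuming the gsc_cache directory and file naming used above:

```python
# Hypothetical helper: rebuild daily_dataframes purely from the CSV cache,
# with no API calls. Assumes the cache layout written by the script above.
import glob
import os
import pandas as pd

CACHE_DIR = "gsc_cache"
daily_dataframes = {}
for path in sorted(glob.glob(os.path.join(CACHE_DIR, "gsc_data_*.csv"))):
    date_str = os.path.basename(path)[len("gsc_data_"):-len(".csv")]
    daily_dataframes[pd.to_datetime(date_str).date()] = pd.read_csv(path)
```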
Here’s sample output:
[mike@nixos:~/repos/pipulate/precursors]$ python gsc_top_movers.py
✓ Successfully authenticated with Google Search Console API.
Finding most recent data date for site: sc-domain:mikelev.in
Starting check from date: 2025-04-08
Checking date 2025-04-08... ✓ Data found!
Success: Most recent GSC data available is for: 2025-04-08
Preparing to fetch/load data for 4 days:
- 2025-04-05
- 2025-04-06
- 2025-04-07
- 2025-04-08
Fetching data for 2025-04-05 from GSC API...
Fetching data with dimensions: ['query', 'page'] for 2025-04-05
Fetching page 1 (starting row 0)... ✓ Retrieved 214 rows.
✓ Reached last page of results.
✓ Finished fetching 2025-04-05. Total rows: 214
✓ Processed data for 2025-04-05 into DataFrame (214 rows).
✓ Saved data for 2025-04-05 to cache: /home/mike/repos/pipulate/precursors/gsc_cache/gsc_data_2025-04-05.csv
Fetching data for 2025-04-06 from GSC API...
Fetching data with dimensions: ['query', 'page'] for 2025-04-06
Fetching page 1 (starting row 0)... ✓ Retrieved 223 rows.
✓ Reached last page of results.
✓ Finished fetching 2025-04-06. Total rows: 223
✓ Processed data for 2025-04-06 into DataFrame (223 rows).
✓ Saved data for 2025-04-06 to cache: /home/mike/repos/pipulate/precursors/gsc_cache/gsc_data_2025-04-06.csv
Fetching data for 2025-04-07 from GSC API...
Fetching data with dimensions: ['query', 'page'] for 2025-04-07
Fetching page 1 (starting row 0)... ✓ Retrieved 1791 rows.
✓ Reached last page of results.
✓ Finished fetching 2025-04-07. Total rows: 1791
✓ Processed data for 2025-04-07 into DataFrame (1791 rows).
✓ Saved data for 2025-04-07 to cache: /home/mike/repos/pipulate/precursors/gsc_cache/gsc_data_2025-04-07.csv
Fetching data for 2025-04-08 from GSC API...
Fetching data with dimensions: ['query', 'page'] for 2025-04-08
Fetching page 1 (starting row 0)... ✓ Retrieved 3261 rows.
✓ Reached last page of results.
✓ Finished fetching 2025-04-08. Total rows: 3261
✓ Processed data for 2025-04-08 into DataFrame (3261 rows).
✓ Saved data for 2025-04-08 to cache: /home/mike/repos/pipulate/precursors/gsc_cache/gsc_data_2025-04-08.csv
✓ Successfully fetched/loaded data for all 4 target days.
--- Preview of data for the latest date (2025-04-08): ---
clicks impressions ctr position query page
0 2 9 22.222222 11.111111 cursor jupyter notebook https://mikelev.in/futureproof/jupyter-noteboo...
1 1 27 3.703704 41.851852 cursor ai https://mikelev.in/futureproof/ai-agentic-mode...
2 1 2 50.000000 3.500000 cursor ai jupyter https://mikelev.in/futureproof/jupyter-noteboo...
3 1 2 50.000000 7.000000 cursor and jupyter notebook https://mikelev.in/futureproof/jupyter-noteboo...
4 1 1 100.000000 2.000000 fasthtml https://mikelev.in/futureproof/foo/
(Total rows for 2025-04-08: 3261)
DataFrames are stored in the 'daily_dataframes' dictionary, keyed by date.
Next step: Combine these DataFrames and perform trend analysis.
Top Movers & Shakers Directional Trending
Now you may be asking why go through all the trouble of downloading 4 individual days of GSC data when you could have just given a date-range starting from 3-days prior to the first day that has data. The answer is that we now have enough granular per-day data to do trend-lines! If you’re an Excel wonk, that’s sparklines to you. And if you’re a stats geek, that’s enough to calculate the linear regression coefficient slope. A steep positive slope shows sudden upward movement. And so, we will be joining all 4 of our dataframes and adding a linear regression coefficient slope column like so…
Top Movers & Shakers Directional Trending
Now you may be asking why go through all the trouble of downloading 4 individual days of GSC data when you could have just given a date-range starting from 3-days prior to the first day that has data. The answer is that we now have enough granular per-day data to do trend-lines! If you’re an Excel wonk, that’s sparklines to you. And if you’re a stats geek, that’s enough to calculate the linear regression coefficient slope. A steep positive slope shows sudden upward movement. And so, we will be joining all 4 of our dataframes and adding a linear regression coefficient slope column.
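Before diving into the full script, here is the slope calculation in miniature: fit a straight line through a short series (day index on the x-axis) and read off its slope. The numbers below are made up purely for illustration.

```python
# Toy illustration of the per-row trend slope: a 4-day impression series
# fitted with a degree-1 polynomial; the first coefficient is the slope.
import numpy as np

impressions = [1, 2, 9, 27]            # illustrative 4-day series
x = np.arange(len(impressions))        # day indices 0, 1, 2, 3
slope, intercept = np.polyfit(x, impressions, 1)
print(round(slope, 2))                 # 8.5 -> steep positive slope, a sudden mover
```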
This final version of the script takes the daily_dataframes dictionary (containing the data for our 4 chosen days, loaded either from the cache or fetched via the API) and performs the core trend analysis.
Here is the complete, final script (gsc_top_movers.py):
#!/usr/bin/env python
# coding: utf-8
"""
Fetches Google Search Console (GSC) data for the 4 most recent days available,
calculates trend slopes for impressions, clicks, and position for each
page/query combination, and identifies top movers. Saves daily data to CSV cache.
Required pip installs:
pip install google-api-python-client google-auth pandas numpy scipy
"""
import os
import sys
import time
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
# Need scipy for linregress OR numpy for polyfit - we'll use polyfit here
# from scipy.stats import linregress
# --- Configuration ---
# Set your GSC Property URL here (e.g., "sc-domain:example.com" or "https://www.example.com/")
SITE_URL = "sc-domain:mikelev.in"
# Path to your service account key JSON file
# Assumes the key file is in the same directory as the script. Adjust if needed.
SCRIPT_DIR = os.path.dirname(__file__)
SERVICE_ACCOUNT_KEY_FILE = os.path.join(SCRIPT_DIR, 'service-account-key.json')
# Required Google API scopes
SCOPES = ['https://www.googleapis.com/auth/webmasters']
# GSC data typically has a delay. Start checking this many days before today.
START_OFFSET_DAYS = 2
# Maximum number of past days to check before giving up
MAX_DAYS_TO_CHECK = 10
# Delay between API checks (in seconds) to respect rate limits
API_CHECK_DELAY = 0.5
# Number of consecutive days of data to fetch (ending on the most recent date)
NUM_DAYS_TO_FETCH = 4
# Directory to store/load cached daily GSC data CSV files
CACHE_DIR = os.path.join(SCRIPT_DIR, 'gsc_cache')
# --- End Configuration ---
# --- Functions from previous steps (authenticate_gsc, check_date_has_data, find_most_recent_gsc_data_date, fetch_all_gsc_data, process_gsc_data_to_dataframe, get_gsc_data_for_day) ---
# (Full definitions repeated below so this final script stands alone.)
def authenticate_gsc():
"""Authenticates with Google API using service account credentials."""
if not os.path.exists(SERVICE_ACCOUNT_KEY_FILE):
print(f"Error: Service account key file not found at: {SERVICE_ACCOUNT_KEY_FILE}")
sys.exit(1)
try:
credentials = service_account.Credentials.from_service_account_file(
SERVICE_ACCOUNT_KEY_FILE, scopes=SCOPES)
webmasters_service = build('webmasters', 'v3', credentials=credentials)
print("✓ Successfully authenticated with Google Search Console API.")
return webmasters_service
except Exception as e:
print(f"Error during GSC authentication: {e}")
sys.exit(1)
def check_date_has_data(service, site_url, check_date):
"""Checks if GSC has any performance data for the given site and date."""
date_str = check_date.strftime('%Y-%m-%d')
request_body = {
'startDate': date_str, 'endDate': date_str,
'dimensions': ['query'], 'rowLimit': 1
}
try:
response = service.searchanalytics().query(
siteUrl=site_url, body=request_body).execute()
return bool(response.get('rows'))
except HttpError as e:
print(f"\nAPI Error checking date {date_str}: {e.resp.status} {e.resp.reason}")
return False
except Exception as e:
print(f"\nUnexpected error checking date {date_str}: {e}")
return False
def find_most_recent_gsc_data_date(service, site_url):
"""Iterates backward to find the latest date with GSC data."""
most_recent_data_date = None
days_checked = 0
current_date = datetime.now().date() - timedelta(days=START_OFFSET_DAYS)
print(f"\nFinding most recent data date for site: {site_url}")
print(f"Starting check from date: {current_date}")
while days_checked < MAX_DAYS_TO_CHECK:
print(f"Checking date {current_date}...", end=" ", flush=True)
if check_date_has_data(service, site_url, current_date):
print("✓ Data found!")
most_recent_data_date = current_date
break
else:
print("✗ No data")
current_date -= timedelta(days=1)
days_checked += 1
time.sleep(API_CHECK_DELAY)
if not most_recent_data_date:
last_checked_date = current_date + timedelta(days=1) # Date where check stopped
print(f"\nWarning: Could not find data within {MAX_DAYS_TO_CHECK} checked days (back to {last_checked_date}).")
return most_recent_data_date
def fetch_all_gsc_data(service, site_url, request_body):
"""Fetches all available data rows for a given GSC query, handling pagination."""
all_rows = []
start_row = request_body.get('startRow', 0)
row_limit = request_body.get('rowLimit', 25000) # Use max GSC limit
page_count = 0
print(f"\nFetching data with dimensions: {request_body.get('dimensions', [])} for {request_body.get('startDate')}")
while True:
page_count += 1
request_body['startRow'] = start_row
print(f"Fetching page {page_count} (starting row {start_row})...", end=" ", flush=True)
try:
response = service.searchanalytics().query(
siteUrl=site_url, body=request_body).execute()
current_rows = response.get('rows', [])
if not current_rows:
if page_count == 1:
print("No data found for this query.")
else:
print("✓ No more data.")
break
print(f"✓ Retrieved {len(current_rows)} rows.")
all_rows.extend(current_rows)
if len(current_rows) < row_limit:
print("✓ Reached last page of results.")
break
start_row += len(current_rows)
time.sleep(API_CHECK_DELAY)
except HttpError as e:
print(f"\nAPI Error fetching page {page_count}: {e.resp.status} {e.resp.reason}")
return [] # Return empty list on error
except Exception as e:
print(f"\nUnexpected error fetching page {page_count}: {e}")
return []
print(f"✓ Finished fetching {request_body.get('startDate')}. Total rows: {len(all_rows)}")
return all_rows
def process_gsc_data_to_dataframe(gsc_data_list, data_date):
"""Converts raw GSC data list into a processed Pandas DataFrame."""
if not gsc_data_list:
# Keep this print statement for clarity during processing
# print(f"No data to process into DataFrame for {data_date}.")
return pd.DataFrame() # Return empty DF if no raw data
df = pd.DataFrame(gsc_data_list)
if 'keys' in df.columns and not df['keys'].empty:
try:
df['query'] = df['keys'].apply(lambda x: x[0] if isinstance(x, list) and len(x) > 0 else None)
df['page'] = df['keys'].apply(lambda x: x[1] if isinstance(x, list) and len(x) > 1 else None)
df = df.drop('keys', axis=1)
except Exception as e:
print(f"Warning: Could not split 'keys' column for {data_date}: {e}")
metric_cols = ['clicks', 'impressions', 'ctr', 'position']
for col in metric_cols:
if col in df.columns:
df[col] = pd.to_numeric(df[col], errors='coerce')
# else: # No need to print this repeatedly if a metric is consistently missing
# print(f"Note: Metric column '{col}' not found for {data_date}.")
if 'ctr' in df.columns:
df['ctr'] = df['ctr'] * 100
# Don't print success here, wait until after combining DFs if needed
# print(f"✓ Processed data for {data_date} into DataFrame ({len(df)} rows).")
return df
def get_gsc_data_for_day(service, site_url, data_date):
"""Fetches GSC data for a specific day, using cache if available."""
date_str = data_date.strftime('%Y-%m-%d')
cache_filename = os.path.join(CACHE_DIR, f"gsc_data_{date_str}.csv")
if os.path.exists(cache_filename):
try:
print(f"📂 CACHE: Loading data for {date_str} from cache: {cache_filename}")
df = pd.read_csv(cache_filename, dtype={
'clicks': 'Int64', 'impressions': 'Int64', 'ctr': 'float64',
'position': 'float64', 'query': 'object', 'page': 'object'
}, parse_dates=False)
if 'ctr' in df.columns and not df['ctr'].empty and df['ctr'].max() <= 1.0:
df['ctr'] = df['ctr'] * 100
return df
except Exception as e:
print(f"⚠️ Warning: Could not load cache file {cache_filename}. Error: {e}. Re-fetching.")
print(f"🔄 API: Fetching data for {date_str} from GSC API...")
request = {
'startDate': date_str, 'endDate': date_str,
'dimensions': ['query', 'page'],
'rowLimit': 25000, 'startRow': 0
}
raw_data = fetch_all_gsc_data(service, site_url, request)
df = process_gsc_data_to_dataframe(raw_data, date_str)
if not df.empty:
try:
os.makedirs(CACHE_DIR, exist_ok=True)
df.to_csv(cache_filename, index=False)
print(f"💾 Saved data for {date_str} to cache: {cache_filename}")
except Exception as e:
print(f"⚠️ Warning: Could not save data for {date_str} to cache. Error: {e}")
return df
# --- New Function for Trend Analysis ---
def analyze_trends(daily_dataframes_dict):
"""
Combines daily DataFrames and calculates trend slopes and time series.
Args:
daily_dataframes_dict (dict): Dict mapping date obj to daily GSC DataFrame.
Returns:
pandas.DataFrame: DataFrame with page, query, slopes, and time series lists.
Returns empty DataFrame if input is invalid or processing fails.
"""
if not daily_dataframes_dict or len(daily_dataframes_dict) < 2:
print("Error: Need at least 2 days of data for trend analysis.")
return pd.DataFrame()
# Get sorted dates from the dictionary keys
dates = sorted(daily_dataframes_dict.keys())
num_days = len(dates)
# Prepare list of dataframes to concatenate, adding date column
dfs_to_concat = []
for date, df in daily_dataframes_dict.items():
if not df.empty:
df_copy = df.copy() # Work on a copy
df_copy['date'] = pd.to_datetime(date) # Ensure date is datetime object
dfs_to_concat.append(df_copy)
if not dfs_to_concat:
print("Error: No valid DataFrames found to combine for trend analysis.")
return pd.DataFrame()
# Combine DataFrames
combined_df = pd.concat(dfs_to_concat, ignore_index=True)
print(f"\n✓ Combined data for {len(dates)} days into a single DataFrame ({len(combined_df)} rows total).")
# Add Day Index for regression
date_to_index = {pd.to_datetime(date): i for i, date in enumerate(dates)}
combined_df['day_index'] = combined_df['date'].map(date_to_index)
# Define metrics for trend analysis
metrics = ['impressions', 'clicks', 'position']
# --- Inner function to process each page/query group ---
def analyze_single_group(group):
# Convert the groupby result to a DataFrame if it's a Series
if isinstance(group, pd.Series):
group = group.to_frame().T
# Ensure date is in datetime format
if 'date' in group.columns:
group['date'] = pd.to_datetime(group['date'])
group = group.set_index('date')
# Create a complete date range
full_date_index = pd.to_datetime(dates)
# Create empty results dictionary
results = {}
# Handle each metric
for metric in metrics:
# Default values in case metric is missing
results[f'{metric}_ts'] = [0] * num_days # Default time series
results[f'{metric}_slope'] = np.nan # Default slope
# Skip processing if metric is not in the group's columns
if metric not in group.columns:
continue
# Get metric values for each date (align with date index)
metric_by_date = {}
for _, row in group.reset_index().iterrows():
if 'date' in row and pd.notnull(row['date']) and metric in row and pd.notnull(row[metric]):
dt = pd.to_datetime(row['date'])
metric_by_date[dt] = row[metric]
# Create time series array aligned with full date range
ts_values = []
for dt in full_date_index:
if dt in metric_by_date:
value = metric_by_date[dt]
# Handle impressions and clicks as integers (filled with 0)
if metric in ['impressions', 'clicks']:
ts_values.append(int(value) if pd.notnull(value) else 0)
# Handle position as float (can be NaN)
else:
ts_values.append(round(float(value), 1) if pd.notnull(value) else np.nan)
else:
# For missing dates: 0 for counts, NaN for position
ts_values.append(0 if metric in ['impressions', 'clicks'] else np.nan)
results[f'{metric}_ts'] = ts_values
# Calculate regression slope if we have enough valid data points
x = np.arange(num_days)
y = np.array(ts_values)
valid_mask = ~np.isnan(y)
valid_count = np.sum(valid_mask)
if valid_count >= 2: # Need at least two points for regression
try:
# Extract valid x and y values
x_valid = x[valid_mask]
y_valid = y[valid_mask]
# Calculate slope using numpy's polyfit
coeffs = np.polyfit(x_valid, y_valid, 1)
results[f'{metric}_slope'] = coeffs[0]
except (np.linalg.LinAlgError, ValueError, TypeError) as e:
# Handle computation errors
print(f"Warning: Slope calculation error for metric {metric}: {e}")
results[f'{metric}_slope'] = np.nan
return pd.Series(results)
# --- End of inner function ---
print("✓ Starting trend analysis by grouping page/query combinations...")
# Apply the analysis function to each group
# Handle potential errors during apply (e.g., unexpected data types)
try:
# Using include_groups=False to avoid deprecation warning
analysis_results = combined_df.groupby(['page', 'query'], observed=True, dropna=False).apply(
analyze_single_group, include_groups=False).reset_index()
print(f"✓ Trend analysis complete. Found {len(analysis_results)} unique page/query combinations.")
except Exception as e:
print(f"Error during groupby/apply operation for trend analysis: {e}")
return pd.DataFrame() # Return empty DF on error
# Optional: Round slopes for cleaner display
for metric in metrics:
col_name = f'{metric}_slope'
if col_name in analysis_results.columns:
analysis_results[col_name] = analysis_results[col_name].round(2)
return analysis_results
# --- Modify the main() function ---
def main():
"""Main execution function."""
# 1. Authenticate
gsc_service = authenticate_gsc()
# 2. Find most recent date
latest_date = find_most_recent_gsc_data_date(gsc_service, SITE_URL)
if not latest_date:
print("\nCannot proceed without a valid recent date.")
sys.exit(1)
print(f"\nSuccess: Most recent GSC data available is for: {latest_date.strftime('%Y-%m-%d')}")
# 3. Determine dates to fetch
dates_to_fetch = [latest_date - timedelta(days=i) for i in range(NUM_DAYS_TO_FETCH)][::-1]
print(f"\nPreparing to fetch/load data for {NUM_DAYS_TO_FETCH} days:")
print(f" Dates: {[d.strftime('%Y-%m-%d') for d in dates_to_fetch]}")
# 4. Fetch/load daily data
daily_dataframes = {}
api_calls_made = 0
cache_loads_made = 0
all_data_loaded = True
for target_date in dates_to_fetch:
date_str = target_date.strftime('%Y-%m-%d')
cache_filename = os.path.join(CACHE_DIR, f"gsc_data_{date_str}.csv")
cache_exists = os.path.exists(cache_filename)
df_day = get_gsc_data_for_day(gsc_service, SITE_URL, target_date)
# Track which source was used
if cache_exists and not df_day.empty:
cache_loads_made += 1
elif not df_day.empty:
api_calls_made += 1
# Even if df_day is empty, store it to represent the day
daily_dataframes[target_date] = df_day
# Check if loading/fetching failed *and* cache doesn't exist
if df_day.empty and not cache_exists:
print(f"⚠️ Warning: Data could not be fetched or loaded for {date_str}. Trend analysis might be incomplete.")
all_data_loaded = False
if not all_data_loaded:
print("\n⚠️ Warning: Data loading was incomplete. Proceeding with available data.")
print(f"\n📊 Data source summary: {cache_loads_made} days loaded from cache, {api_calls_made} days fetched from API")
# --- ADD TREND ANALYSIS STEP ---
# 5. Perform Trend Analysis
trend_results_df = analyze_trends(daily_dataframes)
# 6. Display Trend Analysis Results
if not trend_results_df.empty:
# Configure pandas display settings for better table formatting
pd.set_option('display.max_rows', 30)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)
# Create a simplified display dataframe with cleaner formatting
display_df = trend_results_df.copy()
# Process the dataframe for display
# 1. Extract page path from full URL
if 'page' in display_df.columns:
display_df['page'] = display_df['page'].str.replace(r'https://mikelev.in/futureproof/', '', regex=True)
# Further trim endings
display_df['page'] = display_df['page'].str.replace(r'/$', '', regex=True)
# 2. Convert time series lists to more compact string representations
for col in ['impressions_ts', 'clicks_ts', 'position_ts']:
if col in display_df.columns:
# Format time series values more compactly
display_df[col] = display_df[col].apply(
lambda x: '[' + ','.join([str(int(i)) if isinstance(i, (int, float)) and not pd.isna(i) else '-' for i in x]) + ']'
)
# 3. Format slope values to 1 decimal place
for col in ['impressions_slope', 'clicks_slope', 'position_slope']:
if col in display_df.columns:
display_df[col] = display_df[col].apply(
lambda x: f"{x:.1f}" if pd.notnull(x) else "-"
)
print("\n--- Top 15 by Impression Increase ---")
top_impressions = display_df.sort_values('impressions_slope', ascending=False, na_position='last').head(15)
print(top_impressions.to_string(index=False))
print("\n--- Top 15 by Position Improvement (Lower is Better) ---")
# Ensure we're working with original numeric values for sorting
numeric_df = trend_results_df.copy()
numeric_df = numeric_df.dropna(subset=['position_slope'])
# Sort by position_slope (lowest/most negative is best improvement)
top_positions_idx = numeric_df.sort_values('position_slope', ascending=True).head(15).index
# Display using the formatted display dataframe
print(display_df.loc[top_positions_idx].to_string(index=False))
# --- ADD NEW SECTION: Top Performers by Combined Score ---
print("\n--- Top 15 High-Impact Queries (Best Position + Most Impressions) ---")
# Get the most recent day in each time series
latest_data_df = trend_results_df.copy()
# Extract the most recent values from the time series lists
def get_latest_nonzero_value(ts_list, default=None):
"""Get the latest non-zero (or non-nan) value in a time series list."""
if not isinstance(ts_list, list):
return default
for value in reversed(ts_list): # Start from the most recent
if isinstance(value, (int, float)) and not pd.isna(value) and value > 0:
return value
return default
latest_data_df['latest_position'] = latest_data_df['position_ts'].apply(
lambda x: get_latest_nonzero_value(x, default=100)
)
latest_data_df['latest_impressions'] = latest_data_df['impressions_ts'].apply(
lambda x: get_latest_nonzero_value(x, default=0)
)
# Calculate combined score: impressions / position (higher score is better)
# This prioritizes high impressions and low positions
latest_data_df['impact_score'] = latest_data_df.apply(
lambda row: row['latest_impressions'] / max(row['latest_position'], 0.1)
if pd.notnull(row['latest_position']) and row['latest_position'] > 0
else 0,
axis=1
)
# Sort by the combined score and get top 15
top_impact_idx = latest_data_df.sort_values('impact_score', ascending=False).head(15).index
# Add the score to the display dataframe
display_df['latest_position'] = latest_data_df['latest_position'].round(1)
display_df['latest_impressions'] = latest_data_df['latest_impressions'].astype(int)
display_df['impact_score'] = latest_data_df['impact_score'].round(1)
# Truncate long queries for display
def truncate_query(query, max_length=50):
"""Truncate query string if longer than max_length."""
if len(query) <= max_length:
return query
return query[:max_length-3] + "..."
# Create a display version with truncated queries
display_df_truncated = display_df.copy()
display_df_truncated['query'] = display_df_truncated['query'].apply(truncate_query)
# Display the top impact queries with truncated query text
print(display_df_truncated.loc[top_impact_idx][['page', 'query', 'latest_position', 'latest_impressions', 'impact_score', 'impressions_ts', 'position_ts']].to_string(index=False))
print("\n--- DataFrame Info ---")
trend_results_df.info()
# Save the final trend analysis results using the original dataframe (with full URLs)
final_output_filename = os.path.join(SCRIPT_DIR, f"gsc_trend_analysis_{dates_to_fetch[0].strftime('%Y%m%d')}_to_{dates_to_fetch[-1].strftime('%Y%m%d')}.csv")
try:
trend_results_df.to_csv(final_output_filename, index=False)
print(f"\n✓ Saved final trend analysis to: {final_output_filename}")
except Exception as e:
print(f"\nWarning: Failed to save final trend analysis results. Error: {e}")
else:
print("\nTrend analysis could not be completed.")
print("\nScript finished.")
if __name__ == "__main__":
if not SITE_URL or SITE_URL == "sc-domain:yourdomain.com":
print("Error: Please update the 'SITE_URL' variable in the script configuration.")
sys.exit(1)
main()
Here’s how it works:
- Combine and Prepare: The `analyze_trends` function first concatenates the individual daily DataFrames into one large `combined_df`. Crucially, it adds a `date` column during this process and then calculates a numerical `day_index` (0, 1, 2, 3) representing the time sequence, which is necessary for the regression calculation.
- Group and Analyze: The script then groups this combined data by each unique `page` and `query` combination. For every group, it performs the following:
  - Time Alignment: It aligns each group against the full 4-day date range, filling any missing days for a specific page/query pair with `NaN` (Not a Number). This is vital for comparing trends apples-to-apples.
  - Time Series Extraction: It stores the sequence of impressions, clicks, and position values over the 4 days in lists (e.g., `impressions_ts`). These are ready for potential sparkline visualization later. Missing values for impressions/clicks are treated as 0, while missing positions remain `NaN`.
  - Slope Calculation: Using `numpy.polyfit`, it calculates the linear regression slope for impressions, clicks, and position against the day index. This slope quantifies the trend (e.g., a positive impression slope means impressions are trending up). The calculation only proceeds if at least two data points are available for a given metric within the 4-day window; otherwise, the slope is `NaN`.
- Identify Top Movers: The script uses the calculated slopes to surface interesting changes:
  - Impression Increase: It sorts the results by `impressions_slope` in descending order to find page/query combinations gaining the most visibility.
  - Position Improvement: It sorts by `position_slope` in ascending order (most negative slope first) to find combinations whose average ranking improved the most.
  - High Impact: Recognizing that trends are only part of the story, it adds a crucial third view. It looks at the most recent day's data for each page/query pair, calculates an `impact_score` (latest impressions divided by latest position), and sorts by this score. This surfaces queries that are currently driving significant impressions from good ranking positions, regardless of the short-term trend.
- Format and Save:
  - For clearer console output, the script formats the results: page URLs are shortened, time-series lists are displayed compactly (e.g., `[10,15,20,27]`), and slopes are shown signed, to one decimal place.
  - The raw, unformatted results of the trend analysis (including the full page URLs, slopes, and time-series lists) are saved to a CSV file named with the date range (e.g., `gsc_trend_analysis_20250405_to_20250408.csv`). This allows for more in-depth analysis or visualization later.
- Caching: Throughout the process, the script leverages the caching mechanism established earlier, loading daily data from local CSV files when available to minimize redundant API calls and speed up execution on subsequent runs. It also reports how many days were loaded from cache versus fetched live.
This comprehensive analysis provides multiple angles to understand recent performance shifts – identifying rapidly growing terms, ranking improvements, and currently impactful queries – giving actionable insights for SEO efforts.
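If the slope step feels abstract, here is that one piece of the math in isolation. This is a minimal sketch, not part of the script above; the `slope` helper is just for illustration and the 4-day series is made up:

```python
import numpy as np

# Illustrative 4-day series for one page/query pair (not real data)
impressions_ts = [0, 0, 21, 20]          # missing days filled with 0
position_ts = [np.nan, np.nan, 75, 70]   # missing days stay NaN

def slope(ts):
    """Least-squares slope of a short series against its day index (0, 1, 2, ...)."""
    y = np.array(ts, dtype=float)
    x = np.arange(len(y))
    mask = ~np.isnan(y)
    if mask.sum() < 2:                    # need at least two points to fit a line
        return np.nan
    return np.polyfit(x[mask], y[mask], 1)[0]   # polyfit returns [slope, intercept]

print(round(slope(impressions_ts), 1))   # positive slope: impressions trending up
print(round(slope(position_ts), 1))      # negative slope: average position improving
```

The sign of the slope is what the sorting keys off: positive for growing impressions, negative for an improving (falling) average position.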
Running The Trend Analysis Script
Executing this final script performs the entire process: authentication, finding the latest date, fetching/caching 4 days of data, combining the data, calculating trend slopes, and finally displaying the top “movers and shakers” based on impression gains and position improvements.
Here’s an example of what the formatted output might look like (data will vary):
[mike@nixos:~/repos/pipulate/precursors]$ python gsc_top_movers.py
✓ Successfully authenticated with Google Search Console API.
Finding most recent data date for site: sc-domain:mikelev.in
Starting check from date: 2025-04-09
Checking date 2025-04-09... ✓ Data found!
Success: Most recent GSC data available is for: 2025-04-09
Preparing to fetch/load data for 4 days:
Dates: ['2025-04-06', '2025-04-07', '2025-04-08', '2025-04-09']
📂 CACHE: Loading data for 2025-04-06 from cache: /home/mike/repos/pipulate/precursors/gsc_cache/gsc_data_2025-04-06.csv
📂 CACHE: Loading data for 2025-04-07 from cache: /home/mike/repos/pipulate/precursors/gsc_cache/gsc_data_2025-04-07.csv
📂 CACHE: Loading data for 2025-04-08 from cache: /home/mike/repos/pipulate/precursors/gsc_cache/gsc_data_2025-04-08.csv
📂 CACHE: Loading data for 2025-04-09 from cache: /home/mike/repos/pipulate/precursors/gsc_cache/gsc_data_2025-04-09.csv
📊 Data source summary: 4 days loaded from cache, 0 days fetched from API
✓ Combined data for 4 days into a single DataFrame (7591 rows total).
✓ Starting trend analysis by grouping page/query combinations...
✓ Trend analysis complete. Found 6499 unique page/query combinations.
--- Top 15 by Impression Increase ---
page query impressions_ts impressions_slope clicks_ts clicks_slope position_ts position_slope
grok-better-than seo recommendations [0,0,0,31] 9.3 [0,0,0,0] 0.0 [-,-,-,99] -
ai-agentic-mode-debugging git cherry pick [0,0,21,20] 8.1 [0,0,0,0] 0.0 [-,-,75,70] -4.7
grok-better-than seo concepts [0,0,0,27] 8.1 [0,0,0,0] 0.0 [-,-,-,82] -
grok-better-than htmx [0,0,0,24] 7.2 [0,0,0,0] 0.0 [-,-,-,70] -
static-site-generator-ai-content-strategy seo methodology [0,0,32,11] 6.5 [0,0,0,0] 0.0 [-,-,92,96] 3.9
grok-better-than git cherry pick [0,0,0,21] 6.3 [0,0,0,0] 0.0 [-,-,-,84] -
grok-better-than python seo [0,0,0,21] 6.3 [0,0,0,0] 0.0 [-,-,-,80] -
ai-agentic-mode-debugging cursor ai [0,0,27,11] 6.0 [0,0,1,0] 0.1 [-,-,41,43] 1.5
static-site-generator-ai-content-strategy striking distance keywords [0,0,26,11] 5.9 [0,0,0,0] 0.0 [-,-,63,64] 0.7
grok-better-than google keyword planner tool [0,0,0,19] 5.7 [0,0,0,0] 0.0 [-,-,-,77] -
static-site-generator-ai-content-strategy git cherry pick [0,0,21,12] 5.7 [0,0,0,0] 0.0 [-,-,91,89] -2.6
static-site-generator-ai-content-strategy python for seo [0,0,44,4] 5.6 [0,0,0,0] 0.0 [-,-,95,96] 0.9
grok-better-than guide to seo [0,0,0,18] 5.4 [0,0,0,0] 0.0 [-,-,-,98] -
ai-agentic-mode-debugging git revert [0,0,17,12] 5.3 [0,0,0,0] 0.0 [-,-,72,75] 3.1
grok-better-than seo explanation [0,0,0,17] 5.1 [0,0,0,0] 0.0 [-,-,-,96] -
--- Top 15 by Position Improvement (Lower is Better) ---
page query impressions_ts impressions_slope clicks_ts clicks_slope position_ts position_slope
llm-seo-software-development webassembly dom [0,1,1,0] 0.0 [0,0,0,0] 0.0 [-,124,70,-] -54.0
llm-seo-software-development cheek holes [0,1,1,0] 0.0 [0,0,0,0] 0.0 [-,97,69,-] -28.0
static-site-generator-ai-content-strategy wordpress metadata [0,0,2,1] 0.5 [0,0,0,0] 0.0 [-,-,96,68] -28.0
ai-agentic-mode-debugging how to fix my beats [0,0,1,1] 0.4 [0,0,0,0] 0.0 [-,-,95,72] -23.0
cursor-ai-terminal-agent-mode squashfs [0,1,1,0] 0.0 [0,0,0,0] 0.0 [-,115,95,-] -20.0
from-blog-to-book-ai-powered-ia jupyter lite [0,0,1,1] 0.4 [0,0,0,0] 0.0 [-,-,76,56] -20.0
ai-agentic-mode-debugging hamster wheel [0,0,20,2] 2.6 [0,0,0,0] 0.0 [-,-,76,57] -19.2
ai-agentic-mode-debugging compare two branches git [0,0,2,1] 0.5 [0,0,0,0] 0.0 [-,-,99,81] -18.5
from-blog-to-book-ai-powered-ia chunking strategies for rag [0,0,1,1] 0.4 [0,0,0,0] 0.0 [-,-,90,72] -18.0
vscode-jupyter-notebook-paths does cursor work with jupyter notebook [1,1,0,0] -0.4 [0,0,0,0] 0.0 [29,11,-,-] -18.0
ai-agentic-mode-debugging git revert changes to file [0,0,1,2] 0.7 [0,0,0,0] 0.0 [-,-,97,79] -18.0
from-blog-to-book-ai-powered-ia prolimit [0,0,3,1] 0.6 [0,0,0,0] 0.0 [-,-,81,64] -17.7
ai-agentic-mode-debugging github pipeline [0,0,2,1] 0.5 [0,0,0,0] 0.0 [-,-,88,71] -17.5
llm-seo-software-development open webui [0,4,2,0] -0.2 [0,0,0,0] 0.0 [-,78,61,-] -17.3
static-site-generator-ai-content-strategy parse yaml [0,0,1,1] 0.4 [0,0,0,0] 0.0 [-,-,101,85] -16.0
--- Top 15 High-Impact Queries (Best Position + Most Impressions) ---
page query latest_position latest_impressions impact_score impressions_ts position_ts
grok3-markdown-problem grok was unable to finish replying. please try ... 7.8 19 2.4 [26,9,12,19] [7,7,8,7]
jupyter-notebook-vscode-cursor cursor ai jupyter notebook 6.7 12 1.8 [9,4,5,12] [6,7,6,6]
grok3-markdown-problem grok glitch 6.9 12 1.7 [3,7,12,12] [3,5,5,6]
mcp-with-local-llm local llm mcp 6.1 8 1.3 [4,8,0,0] [6,6,-,-]
open-source-seo-software seo software linux 21.7 25 1.2 [5,47,9,25] [20,22,23,21]
grok3-markdown-problem grok was unable to finish replying 8.7 10 1.1 [4,2,1,10] [8,9,9,8]
grok3-markdown-problem grok markdown 2.7 3 1.1 [3,4,3,3] [3,1,1,2]
https://mikelev.in michael levin blog 3.8 4 1.1 [3,1,2,4] [4,4,3,3]
grok3-markdown-problem grok was unable to finish replying. 8.8 9 1.0 [7,12,5,9] [7,7,7,8]
grok-better-than entity you are doing the search for: {"name": "... 2.0 2 1.0 [0,0,0,2] [-,-,-,2]
nixos-warbler-files-disappeared nixos warbler 1.0 1 1.0 [1,1,0,0] [1,1,-,-]
grok3-markdown-problem grok problems 1.0 1 1.0 [1,0,0,1] [9,-,-,1]
peak-data-musk-sutskever-wrong peak data theory 1.0 1 1.0 [2,0,1,1] [5,-,3,1]
grok3-markdown-problem grok 3 prompt limit 1.0 1 1.0 [0,0,0,1] [-,-,-,1]
jupyter-notebook-vscode-cursor cursor jupyter notebook 6.2 6 1.0 [12,9,9,6] [9,8,11,6]
--- DataFrame Info ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6499 entries, 0 to 6498
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 page 6499 non-null object
1 query 6499 non-null object
2 impressions_ts 6499 non-null object
3 impressions_slope 6499 non-null float64
4 clicks_ts 6499 non-null object
5 clicks_slope 6499 non-null float64
6 position_ts 6499 non-null object
7 position_slope 914 non-null float64
dtypes: float64(3), object(5)
memory usage: 406.3+ KB
✓ Saved final trend analysis to: /home/mike/repos/pipulate/precursors/gsc_trend_analysis_20250406_to_20250409.csv
Script finished.
This provides a powerful, automated way to monitor recent performance changes and identify potential content optimization opportunities directly from your GSC data. The final CSV contains all the calculated data for further analysis or visualization.
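One caveat if you reload that CSV later: `to_csv` serializes the list-valued `*_ts` columns as plain strings (e.g. `"[0, 0, 21, 20]"`), so they need to be parsed back into real lists before you can compute on them. A quick sketch, assuming the filename from the run above:

```python
import ast
import pandas as pd

# Filename is just an example -- use whatever your run actually produced
df = pd.read_csv("gsc_trend_analysis_20250406_to_20250409.csv")

# Convert the stringified lists back into Python lists.
# 'nan' inside position_ts isn't valid literal syntax, so swap it for None first.
for col in ["impressions_ts", "clicks_ts", "position_ts"]:
    df[col] = df[col].apply(
        lambda s: ast.literal_eval(s.replace("nan", "None")) if isinstance(s, str) else s
    )

print(df[["page", "query", "impressions_slope"]].head())
```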
Voila! Sparklines
Do I really have to draw the sparklines for you? The data’s ready to bring into Excel or whatever. If you’re in a web app environment, there are plenty of libraries that can take these little 4-point arrays and draw you sparklines. A sign of good data preparation is that the text representation is so good you hardly have to visualize. The visualization component merits a whole other article, so I’ll assume anyone interested in an article like this who got this far can glance at the numbers above and “get it”. There are your movers and shakers, right at the top of the lists, plain as day.
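That said, if you want a quick visual without leaving the terminal, Unicode block characters go a long way for a 4-point series. A rough sketch, not part of the script above (the `sparkline` helper is just a name I made up):

```python
import math

BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map a short numeric series onto Unicode block characters."""
    nums = [v for v in values if isinstance(v, (int, float)) and not math.isnan(float(v))]
    if not nums:
        return "-" * len(values)
    lo, hi = min(nums), max(nums)
    span = (hi - lo) or 1
    out = []
    for v in values:
        if isinstance(v, (int, float)) and not math.isnan(float(v)):
            out.append(BARS[int((v - lo) / span * (len(BARS) - 1))])
        else:
            out.append(" ")          # blank for missing days
    return "".join(out)

print(sparkline([0, 0, 21, 20]))     # e.g. ▁▁█▇
```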
The site went from not being in search at all to suddenly having positions, terrible as those positions may be. If you’re actually here on my site reading this, may I suggest playing with the page-width slider so you can see the maximum number of columns in those tables without horizontal scrolling. Things like average position are all the way over on the right, so while you may see keywords suddenly picking up impressions, the full story is in what average position those impressions occurred at.
Your site’s impression traffic may surge just because you wander into the space of keywords “monitored” by tools like SEMRush. I mean, who’s going to page 10 of search results? And now that search results need to be “rendered” with a web browser (scraping SERPs isn’t as easy as it used to be), bots are going to look more like genuine impressions. So my surge could just be bots monitoring SERPs, especially on terms like “seo software”. But you’ve got to be found at all before you can dominate. So, it’s a start.
And Here’s a Prompt for AI to Help Analyze Data
Analyze the Google Search Console trend analysis output previously provided for the site `sc-domain:mikelev.in` (covering the period April 7-10, 2025). Based only on that data, provide a prioritized list of actionable traffic growth suggestions.
Your goal is to identify the highest-impact opportunities revealed by the trends, including both broad strategic directions and specific content pieces. Structure your response to cover the following areas, ensuring each point includes specific examples from the data (pages, queries, metrics) and concrete recommended actions:
1. Top Movers Opportunities:
* Identify the most promising opportunities among pages/queries showing strong positive impression growth.
* Specifically highlight queries gaining significant impressions and improving in position simultaneously.
* Recommend specific actions (e.g., optimize page X for query Y, expand content on topic Z) to capitalize on this momentum.
2. Position Improvement Momentum:
* Which pages making significant ranking jumps (large negative `position_slope`) represent the best targets for further optimization to secure or improve those gains?
* Are there thematic patterns among the fastest climbers that suggest broader content opportunities?
* Recommend actions to consolidate these ranking improvements.
3. High-Impact & Unrealized Potential Optimization:
* Based on the “High-Impact Queries” list, which content currently delivers the most value and how can it be reinforced or expanded?
* Identify the top 1-3 individual page/query pairs representing the largest “unrealized potential” (high impressions but poor ranking). Clearly state the page, query, impressions, and position for these top examples.
* Critically, based specifically on analyzing this unrealized potential data, identify and recommend the single most promising new article topic to pursue to capture this untapped traffic. Justify why this specific topic is the best choice over other unrealized potential candidates.
* Recommend specific optimization strategies for these high-potential items (e.g., create the new targeted article, significantly enhance/rewrite existing page, technical SEO review, consolidate competing pages).
4. Prioritized Strategic Recommendations:
* Synthesize the findings from the above points.
* What are the top 2-3 overarching strategic priorities? Ensure these recommendations explicitly cover both broad themes (like refining generalist pages such as `grok-better-than`, or amplifying existing winners like `open-source-seo-software`) AND the specific, standout individual topic opportunity (like the best new article topic identified in point 3). Justify these priorities with data trends.
Please ensure recommendations are concrete and directly linked to the patterns observed in the provided GSC trend data output.
Gemini 2.5:
Okay, here is an analysis of the Google Search Console trend data for `sc-domain:mikelev.in` (covering April 7-10, 2025) with prioritized, actionable traffic growth suggestions based only on the provided data.
1. Top Movers Opportunities:
- Observation: The page `grok-better-than` is showing significant positive impression growth (`impressions_slope`) for several distinct query clusters: SEO tools (e.g., “google keyword planner”, “keyword planner tool”), technical SEO concepts (“python for seo”, “seo for beginners”, “seo concepts”), and Git commands (“git cherry pick”).
- Highest Promise (Impression Growth + Position Improvement): Several queries for `grok-better-than` show both strong positive `impressions_slope` and negative `position_slope` (ranking improvement):
  - “google keyword planner”: `impressions_slope` 9.2, `position_slope` -1.2 (Pos ~79)
  - “python for seo”: `impressions_slope` 8.3, `position_slope` -2.4 (Pos ~82)
  - “git cherry pick”: `impressions_slope` 6.6, `position_slope` -2.6 (Pos ~82)
  - “seo for beginners”: `impressions_slope` 4.6, `position_slope` -4.9 (Pos ~93)
- Recommendations:
  - Action: Prioritize optimizing the `grok-better-than` page. Enhance its content to specifically and comprehensively address “google keyword planner”, “python for seo”, “git cherry pick”, and “seo for beginners”. Use these terms in headings (H2s, H3s) and body content.
  - Action: Improve internal linking to the `grok-better-than` page from other relevant site content using anchor text related to these growing queries (e.g., link from a Git tutorial using “git cherry pick”, link from an SEO overview using “python for seo”).
  - Action: Although “keyword planner tool” (`impressions_slope` 8.2) saw its position slightly worsen (`position_slope` 0.2, Pos ~86), the high impression growth warrants ensuring this specific phrasing is also well-integrated into the `grok-better-than` page.
2. Position Improvement Momentum:
- Observation: Several pages show significant ranking jumps (large negative `position_slope`). The most dramatic jumps often have low initial impression volume, but consistent themes emerge among moderate-to-strong climbers.
- Best Targets & Thematic Patterns:
  - Git Commands: Pages `ai-agentic-mode-debugging` and `grok-better-than` are seeing ranking improvements for various Git queries like “compare two branches git” (-18.5), “git revert changes to file” (-18.0), “git reset --hard head” (-12.3 for `grok-better-than`), etc. This indicates a growing relevance or successful optimization push for Git-related content.
  - Python/Jupyter: The `grok-better-than` page is also climbing for queries like “stream python” (-24.0, Pos ~65) and “jupyter project” (-23.0, Pos ~75).
  - Specific Tools/Platforms: `open-source-seo-software` is making steady gains for specific queries like “seo tool open source” (-10.0, Pos ~21) and “open source seo toolkit” (-9.2, Pos ~24).
- Recommendations:
  - Action: Consolidate gains for Git content. Review `ai-agentic-mode-debugging` and `grok-better-than` to ensure comprehensive coverage, clear explanations, and practical examples for the improving Git queries (diff, revert, reset, cherry-pick). Add internal links between these specific command explanations.
  - Action: Reinforce the improving Python/Jupyter rankings on `grok-better-than`. Add more code examples, use cases, or clarify explanations for “stream python” and “jupyter project”.
  - Action: Solidify the improved rankings for `open-source-seo-software` by updating the tool list, adding more detailed reviews/comparisons, or improving the page structure for queries like “seo tool open source” and “open source seo toolkit”.
3. High-Impact & Unrealized Potential Optimization:
- Current High-Impact Content:
  - `grok-better-than` for “entity you are doing the search for: …” (Pos 2.0, Imp 11) - Likely navigational, maintain page quality.
  - `jupyter-notebook-vscode-cursor` for “cursor ai jupyter notebook” (Pos 5.3, Imp 10) - Delivers value.
  - `grok3-markdown-problem` for “grok was unable to finish replying.” (Pos 7.7, Imp 11) - Strong performance for a specific problem.
  - `open-source-seo-software` for “open source seo software” (Pos 16.6, Imp 19) - High impact despite weaker rank, significant potential.
- Top 1-3 Unrealized Potential (High Impressions, Poor Rank):
  - Page: `static-site-generator-ai-content-strategy`, Query: “seo techniques”, Latest Impressions: 64, Latest Position: 93.1
  - Page: `llm-seo-software-development`, Query: “python for seo”, Latest Impressions: 42, Latest Position: 81.0
  - Page: `github-pages-blog-arrows`, Query: “github pages”, Latest Impressions: 27, Latest Position: 48.3
- Single Most Promising New Article Topic:
  - Topic: SEO Techniques
  - Justification: The query “seo techniques” shows the highest impression volume (64) among all identified high-potential opportunities where the site currently ranks very poorly (Pos 93.1). This indicates strong audience demand for this topic that the site isn’t currently meeting effectively with existing content (the page ranking, `static-site-generator-ai-content-strategy`, is likely too niche or unfocused for such a broad query). Creating a dedicated, comprehensive article specifically targeting “SEO Techniques” represents the single largest opportunity revealed by this data to capture significant untapped traffic. While “python for seo” also has high volume (42 impressions, Pos 81.0), “seo techniques” has ~50% more impressions, suggesting a larger potential audience reach.
- Recommendations:
  - Action: Create a new, comprehensive, standalone article titled and focused on “A Guide to Modern SEO Techniques”. Structure it logically covering on-page, off-page, technical, local, etc. Link to this new pillar content from other relevant pages.
  - Action: Significantly enhance `llm-seo-software-development` to better target “python for seo” (add code examples, libraries, use cases) OR ensure it links prominently to the `grok-better-than` page section covering this (if that page remains the primary target).
  - Action: Enhance `github-pages-blog-arrows` to be the definitive resource on the site for “github pages”, covering setup, SEO best practices, SSGs, etc.
  - Action: Reinforce `jupyter-notebook-vscode-cursor` by adding more detail, visuals, or potentially a video tutorial for “cursor ai jupyter notebook”. Update `open-source-seo-software` regularly.
4. Prioritized Strategic Recommendations:
Based on the synthesis of the above points, the top 2-3 strategic priorities are:
- Create a Pillar Content Piece on “SEO Techniques”: This directly addresses the single largest identified “unrealized potential” opportunity (Query: “seo techniques”, Imp: 64, Pos: 93.1). Capturing even a fraction of this traffic by creating a dedicated, high-quality article could significantly boost overall site visibility and traffic. This is the highest-priority specific action derived from the data.
- Refine Broad Pages & Capitalize on Thematic Strengths (Dev Tools/SEO):
  - Refine `grok-better-than`: This page attracts high impression growth across diverse topics (SEO tools like “google keyword planner”, Git commands like “git cherry pick”, “python for seo”) but ranks poorly. Clarify its focus, improve structure, and ensure it deeply covers the high-volume queries it’s gaining traction for, potentially spinning off overly distinct topics into dedicated posts.
  - Amplify Tech/Dev Tool Winners: Reinforce successful pages like `jupyter-notebook-vscode-cursor` (Pos 5.3 for its primary query) and high-potential ones like `open-source-seo-software` (Pos 16.6, Imp 19). Continue building out content around improving themes like Git, Python for SEO, NixOS, and HTMX where momentum is visible.
Getting The Final, Final Code
These projects are living documents. I go back and I tweak and improve them over
time. Articles like this really only capture their state where I left off and
was motivated to wrap improvements back into the article. The final final state
may be somewhat beyond this. For example, I wrapped in that prompt to the
program itself, so I can copy/paste the output of the program directly into the
Web UI of a ChatBot to get the recommendations. If you want the latest latest
versions, check out the `precursors` folder in the `pipulate` repo on GitHub.
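Baking the prompt into the program is about as simple as it sounds. Here is a hypothetical sketch of the idea, not the actual repo code:

```python
# Hypothetical sketch -- the real implementation lives in the pipulate repo.
# The idea: keep the chatbot prompt in the script and print it after the tables,
# so the entire console output can be pasted into an AI chat in one go.

ANALYSIS_PROMPT = """Analyze the Google Search Console trend analysis output above
for the site {site}. Based only on that data, provide a prioritized list of
actionable traffic growth suggestions covering top movers, position improvements,
high-impact queries, and unrealized potential."""

def print_analysis_prompt(site_url: str) -> None:
    """Print the canned AI prompt beneath the trend tables."""
    print("\n--- Prompt for AI Analysis ---")
    print(ANALYSIS_PROMPT.format(site=site_url))

# At the very end of main():
#     print_analysis_prompt(SITE_URL)
```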
These types of things are all precursors to Pipulate apps.
Gemini 2.5’s Take
Okay, here is the analysis and generated outputs based only on the provided article text:
- Author Summary:
- Title/Headline Ideas:
- Find Your SEO Movers & Shakers: GSC API Trend Analysis with Python
- Beyond 1000 Rows: Unlocking GSC Granularity with Python & Trend Slopes
- GSC API & Python: Step-by-Step Guide to Performance Trend Analysis
- Automating GSC Insight: Identify Keyword & Page Momentum with Python
- From Hockeystick to Action: Python GSC Analysis for SEO Opportunities
- Strengths:
- Practical How-To: Provides a clear, step-by-step guide with evolving code examples.
- Complete Code: Includes full Python scripts for authentication, data fetching, processing, and analysis.
- Addresses a Real Need: Solves the common GSC limitation of accessing granular, per-query/per-page data beyond UI limits.
- Specific Analysis Technique: Introduces a concrete method (linear regression slope) for identifying trends.
- Actionable Outcome: The process aims to surface actionable insights (“movers and shakers”, high-impact queries).
- Good Practice: Incorporates caching to avoid redundant API calls.
- Extensibility: Includes an AI prompt for further strategic analysis, adding practical value.
- Weaknesses:
- Complexity/Length: The code becomes quite extensive, potentially intimidating for less experienced Python users.
- Assumed Knowledge: Requires familiarity with Python, Pandas, Google Cloud Platform (GCP), and API authentication concepts (service accounts).
- Limited Interpretation: The article focuses heavily on the how (code) and defers deep strategic interpretation of the results, relying on the AI prompt for that.
- Linear Structure: The incremental build-up, while good for teaching, can make it hard to grasp the final script without reading everything.
- Readability: Multiple large code blocks showing incremental changes might impact overall readability compared to a single final script with explanations.
- AI Opinion: This is a high-quality, technically detailed article demonstrating a practical approach to advanced Google Search Console analysis using Python and the GSC API. Its strengths lie in the step-by-step code development, clear explanation of the need for API access, and the introduction of specific trend analysis techniques (linear regression slopes). The inclusion of working code, caching, and a final AI analysis prompt adds significant value. While potentially challenging for Python novices due to its length and complexity, it’s highly useful for technical SEOs seeking to automate granular performance monitoring and identify actionable opportunities beyond the standard GSC interface.