Running Python in Snowflake: Traceability Across Stored Procedures, Notebooks and ML Jobs

Apr 21, 2026 | 20 min. | FDE

Author: Kristina Kazmina

Productionizing data pipelines is an important part of any successful data engineering project. It is not enough to build clean, scalable pipelines; they should also be built in a way that makes future development easy, enables troubleshooting, and provides good visibility once a feature goes to production.

While SQL remains the foundation for implementing business logic, Python is useful for workloads where logic is easier to express in code, e.g. dynamic query generation, config-driven transformations, or external API calls. The challenge with Python compared to SQL is that SQL queries run on a warehouse and are automatically tracked in Query History, while Python runs are not. Depending on how you deploy Python, a pipeline run leaves traces in different places, with different levels of detail and different access controls.

This article covers three options for running Python in Snowflake (stored procedures, notebooks, and Snowflake ML jobs) and compares them on ease of deployment, logging, and run visibility.

We use the same simple pipeline to illustrate each option.

Table of Contents

Implemented Scenario
Option 1: Stored Procedures
Option 2: Notebooks
Option 3: Snowflake ML Jobs
Comparison

Implemented Scenario

All three examples use the same workload: a minimal SCD Type 1 merge where Python builds a MERGE statement from a config dictionary and executes it. The point is to have something realistic enough to generate meaningful logs and run history, while simulating a data engineering workload that does not require a large cluster for Python execution.
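To make the workload concrete, here is a minimal sketch of how a MERGE statement could be built from a config dictionary. It is an illustration, not the code from the source repository: the function name, the config keys, and the table and column names are all hypothetical.

```python
def build_scd1_merge(config: dict) -> str:
    """Build an SCD Type 1 MERGE statement from a config dictionary.

    Identifiers are interpolated as-is, so the config is assumed to come
    from a trusted source (hypothetical keys: target, source, key_columns,
    update_columns).
    """
    keys = config["key_columns"]
    cols = config["update_columns"]
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in keys)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    insert_cols = ", ".join(keys + cols)
    insert_vals = ", ".join(f"s.{c}" for c in keys + cols)
    return (
        f"MERGE INTO {config['target']} AS t\n"
        f"USING {config['source']} AS s\n"
        f"ON {on_clause}\n"
        # SCD Type 1: overwrite changed attributes, keep no history
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


# Hypothetical config; table and column names are placeholders.
CONFIG = {
    "target": "PY_LOGS_DEMO.PUBLIC.CUSTOMER_DIM",
    "source": "PY_LOGS_DEMO.PUBLIC.CUSTOMER_STG",
    "key_columns": ["CUSTOMER_ID"],
    "update_columns": ["NAME", "EMAIL"],
}
merge_sql = build_scd1_merge(CONFIG)
```

The resulting string can be handed to whichever execution path is being compared, which is what keeps the three options comparable.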

A link to the source code is provided at the end of this article; the article can also be read in the source repository.

Option 1: Stored Procedures

One option for running Python code in Snowflake is a stored procedure. A Python stored procedure handler is a regular Python function that receives a Snowpark Session as its first argument. From there, Snowpark provides a way to interact with Snowflake tables, and the queries are pushed down to run inside a Snowflake warehouse. The handler can be written inline as text in the CREATE PROCEDURE body (convenient for short scripts, but awkward for interactive development) or kept in a .py file on an internal stage. Stored procedures accept parameters of SQL-compatible data types, and the handler function's arguments must match them.
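As a rough sketch of that shape, assuming a handler kept in a staged file: the procedure name ROW_COUNT, the stage CODE_STAGE, the file row_count.py, and the runtime version are hypothetical and not taken from the source repository.

```python
def main(session, target_table: str) -> str:
    """Hypothetical handler: Snowflake passes the Snowpark Session first,
    followed by the arguments declared in the procedure signature."""
    # session.sql() defines the query; collect() executes it on the
    # warehouse and returns the result rows.
    rows = session.sql(f"SELECT COUNT(*) FROM {target_table}").collect()
    return f"{target_table}: {rows[0][0]} rows"


# Assumed DDL for a handler stored in row_count.py on an internal stage.
# The VARCHAR parameter maps to the handler's `target_table: str` argument.
CREATE_PROC_SQL = """
CREATE OR REPLACE PROCEDURE PY_LOGS_DEMO.PUBLIC.ROW_COUNT(TARGET_TABLE VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
PACKAGES = ('snowflake-snowpark-python')
IMPORTS = ('@PY_LOGS_DEMO.PUBLIC.CODE_STAGE/row_count.py')
HANDLER = 'row_count.main'
"""
```

Because the handler only relies on `session.sql(...).collect()`, it can be exercised locally with a stub session before it is deployed.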

Run visibility

Query History

Every procedure CALL is a SQL query, so it lands in Query History automatically and includes basic information about the procedure call itself. The pushed-down queries initiated from Python procedure code (for example, via Session.sql) also appear in Query History. Other parts of the Python code execution, unfortunately, cannot be tracked as easily.
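For illustration, a query along these lines could pull recent statements that mention the procedure from Query History. It is a sketch under assumptions: it uses the INFORMATION_SCHEMA.QUERY_HISTORY table function, and the helper name and the text filter are hypothetical.

```python
def query_history_sql(procedure_name: str, limit: int = 100) -> str:
    """Build a query for recent statements whose text mentions the procedure,
    such as the CALL itself."""
    return f"""
SELECT QUERY_ID, QUERY_TEXT, EXECUTION_STATUS, ERROR_MESSAGE, START_TIME
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => {limit}))
WHERE QUERY_TEXT ILIKE '%{procedure_name}%'
ORDER BY START_TIME DESC
""".strip()


history_sql = query_history_sql("SCD1_MERGE")
```

Pushed-down queries whose text does not mention the procedure name would need a different filter, for example on the time window of the CALL.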

In this example, we use the stored procedure PY_LOGS_DEMO.PUBLIC.SCD1_MERGE, which prepares DDL, generates source data, and loads it into an SCD1 target table. Information about the SQL-related parts (procedure call, pushed-down queries) appears in Query History:

Figure: Query History after a successful SCD1_MERGE call

Python Logger

The traceability of Python code can be increased with the Python logging package. To capture structured in-run log messages, an event table can be created (once) and attached to the database where the procedure lives. Alternatively, the global default event table can be used. Log capture must be enabled per procedure:
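A hedged sketch of that setup, with the statements assembled in Python so they can be run through Session.sql or a worksheet. The event table name EVENTS and the VARCHAR argument type in the procedure signature are assumptions, not details from the article.

```python
def log_capture_statements(
    database: str,
    event_table: str,
    procedure_signature: str,
    level: str = "INFO",
) -> list[str]:
    """Return the one-time setup statements for capturing procedure logs."""
    return [
        # 1. Create the event table (once).
        f"CREATE EVENT TABLE IF NOT EXISTS {event_table}",
        # 2. Attach it to the database where the procedure lives.
        f"ALTER DATABASE {database} SET EVENT_TABLE = {event_table}",
        # 3. Enable log capture for this procedure at the chosen level.
        f"ALTER PROCEDURE {procedure_signature} SET LOG_LEVEL = '{level}'",
    ]


statements = log_capture_statements(
    database="PY_LOGS_DEMO",
    event_table="PY_LOGS_DEMO.PUBLIC.EVENTS",  # hypothetical name
    # ALTER PROCEDURE needs the argument types; VARCHAR is an assumption.
    procedure_signature="PY_LOGS_DEMO.PUBLIC.SCD1_MERGE(VARCHAR)",
)
```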

Standard Python logging then routes to the event table:
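A minimal sketch of what the handler side might look like, using only the standard logging module. The logger name and the run_merge helper are hypothetical.

```python
import logging

# The logger name ends up in the SCOPE column of the event table.
logger = logging.getLogger("scd1_merge")


def run_merge(session, merge_sql: str) -> str:
    """Execute the merge and emit INFO messages around it."""
    logger.info("Starting SCD1 merge")
    session.sql(merge_sql).collect()
    logger.info("SCD1 merge finished")
    return "OK"
```

No Snowflake-specific handler is configured in the code; which messages are captured is governed by the log level set on the procedure.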

Python logs from the procedure call can be viewed in the event table (the SCOPE column includes the logger name). Among other things, a query against this event table shows the following data for the PY_LOGS_DEMO.PUBLIC.SCD1_MERGE call:

Figure: Event table output after a successful SCD1_MERGE call

All the information from log messages can only be seen in the event table and does not appear in Query History.
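A query of roughly this shape could be used to read those log records. Treat it as a sketch: the event table name is hypothetical, and the attribute keys used here (SCOPE['name'], RECORD['severity_text'], RESOURCE_ATTRIBUTES['snow.executable.name']) are assumptions about the event table layout that should be checked against your account.

```python
def event_log_query(event_table: str, procedure_name: str) -> str:
    """Build a query that lists log messages emitted by one procedure."""
    return f"""
SELECT
    TIMESTAMP,
    SCOPE['name']::STRING           AS logger_name,
    RECORD['severity_text']::STRING AS severity,
    VALUE::STRING                   AS message,
    RECORD_ATTRIBUTES
FROM {event_table}
WHERE RECORD_TYPE = 'LOG'
  AND RESOURCE_ATTRIBUTES['snow.executable.name']::STRING ILIKE '{procedure_name}%'
ORDER BY TIMESTAMP DESC
""".strip()


logs_sql = event_log_query("PY_LOGS_DEMO.PUBLIC.EVENTS", "SCD1_MERGE")
```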

Note that log entries appear in the event table with a small delay, and enabling log capture results in additional storage costs.

Error handling

SQL errors surface in Query History automatically. For example, a positional INSERT that fails with a column count mismatch produces such an error.

The procedure CALL and the failing INSERT both appear in Query History with Failed status:

Figure: Query History showing failed CALL and INSERT

The error shown for the procedure CALL in Query History is truncated at a fixed length that cannot be configured, and it is not user-friendly, although it does include the custom Python exception text when one is provided.

The failed INSERT shows the readable SQL and error message in Query History:

Figure: Failed INSERT SQL text and error message

Python exceptions are not logged automatically. With a catch, log with traceback, and re-raise pattern, the error lands in the event table with proper context and a configurable error message. So when this code is used inside the procedure:
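A minimal sketch of the catch, log with traceback, and re-raise pattern, offered as an illustration rather than the exact code from the source repository. The logger name and the run_step helper are hypothetical.

```python
import logging

logger = logging.getLogger("scd1_merge")


def run_step(session, sql: str, step_name: str) -> None:
    """Run one SQL step; log any failure with its traceback, then re-raise."""
    try:
        session.sql(sql).collect()
    except Exception:
        # logger.exception logs at ERROR level and attaches the current
        # traceback, so the log entry carries the full failure context.
        logger.exception("Step %s failed", step_name)
        # Re-raise so the procedure CALL still ends in a failed state.
        raise
```

Re-raising matters: swallowing the exception would log the error but leave the CALL marked as successful in Query History.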


The event table contains the error entry with the respective traceback:

Figure: Event table showing logged error after failed SCD1_MERGE call

The custom error handling becomes more useful for Python errors that are not represented by a failure of an underlying SQL query. In addition to the error message, the extra keyword argument can be used to set key-value pairs that are written to a queryable field in the RECORD_ATTRIBUTES column of the event table, which can improve subsequent log analysis.

The following code provides a dictionary in the extra argument:
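As a small sketch of that usage, with a hypothetical helper and hypothetical attribute keys:

```python
import logging

logger = logging.getLogger("scd1_merge")


def log_merge_result(target_table: str, rows_merged: int) -> None:
    """Log a completion message with structured key-value context."""
    logger.info(
        "SCD1 merge finished",
        # Keys passed via `extra` become attributes of the log record;
        # avoid names that clash with built-in LogRecord fields
        # such as "message" or "name".
        extra={"target_table": target_table, "rows_merged": rows_merged},
    )
```

Keeping the message text constant and moving the variable parts into `extra` makes the log entries easier to filter and aggregate later.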


The resulting key-value pairs then appear in the RECORD_ATTRIBUTES column of the event table.

In addition to access via SQL query, an event table can also be accessed from the Snowflake UI (Traces & Logs tab in Monitoring). This view provides easy-to-use filters and visualization of aggregate event statistics.

Summary

Pros:

Run traceability at the level of SQL queries: every CALL and every pushed-down query is in Query History with no extra setup
The workload can be triggered via SQL
Can reuse existing warehouse compute and does not require additional compute infrastructure
Possible to capture structured, queryable logs via Python's standard logging package

Cons:

Interactive development is complicated: every code change requires uploading the file to stage and redeploying. When the procedure has an inline body, the code is represented as text with no syntax highlighting.
No dedicated UI that can show info / debug messages
Log entries appear in the event table with a small delay
Enabling log capture adds costs for Snowflake-managed resources
Non-SQL, Python-type parameters are not natively supported in procedure signatures

Option 2: Notebooks

Snowflake Notebooks combine SQL and Python cells and can run on warehouse compute or in a container on a dedicated compute pool. Notebooks are well suited for interactive development. The same notebook can also be deployed to run non-interactively, including run initiation from SQL via EXECUTE NOTEBOOK.

As with procedures, the code can be stored in the .ipynb file on an internal stage, and a notebook object can be created from this file.

Parameters can be passed as string arguments to EXECUTE NOTEBOOK. They are declared as text input widgets inside the notebook and accessed as variables, so type support is limited and the pattern is less convenient than procedure signatures.
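To illustrate the string-only nature of these parameters, here is a sketch that builds an EXECUTE NOTEBOOK statement from Python. The helper and the notebook name SCD1_NB are hypothetical, and the statement shape (string literals in parentheses after the notebook name) is an assumption to verify against the Snowflake documentation.

```python
def execute_notebook_sql(notebook: str, *args: str) -> str:
    """Build an EXECUTE NOTEBOOK statement with string arguments."""
    # Every argument is passed as a SQL string literal; embedded single
    # quotes are doubled so the literal stays well-formed.
    quoted = ", ".join("'" + a.replace("'", "''") + "'" for a in args)
    return f"EXECUTE NOTEBOOK {notebook}({quoted})"


run_sql = execute_notebook_sql(
    "PY_LOGS_DEMO.PUBLIC.SCD1_NB",  # hypothetical notebook name
    "CUSTOMER_DIM",
    "2026-04-21",
)
```

Any non-string value, such as a date or a number, has to be serialized by the caller and parsed again inside the notebook.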