Simplified Databricks Workflows Git Versioning with a Databricks App

Feb 27, 2025
10 min.
data engineering
Author
Dmitriy Alergant, Principal Cloud Architect

Introduction

Databricks Workflow Jobs are powerful tools for orchestrating data pipelines, analytics tasks, and other workloads. Anyone working with multiple Databricks environments (Dev, UAT, Prod) quickly realizes the need for seamless workflow promotion between environments, as well as robust version control for workflow definitions.

Naturally, Git repositories — whether GitHub, Azure DevOps, or another flavor — are the go-to solution everybody wants to use.

Databricks and T1A recommend the “Jobs-as-Code” approach with Databricks Asset Bundles, which suggests packaging workflow definitions (YAML files) alongside the underlying notebook code. This method, deployable via CLI, integrates smoothly into CI/CD pipelines, making it a mature, native, and automation-friendly solution. We actively use Databricks Asset Bundles in some of our larger projects with great success.
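For orientation, an Asset Bundle is driven by a databricks.yml descriptor that declares jobs next to the code they orchestrate and maps them onto target environments. The sketch below is purely illustrative (the bundle name, hosts, and paths are placeholders, not from a real project):

  bundle:
    name: analytics_pipelines

  targets:
    dev:
      workspace:
        host: https://adb-1111111111111111.11.azuredatabricks.net
    prod:
      workspace:
        host: https://adb-2222222222222222.22.azuredatabricks.net

  resources:
    jobs:
      nightly_etl:
        name: nightly-etl
        tasks:
          - task_key: main
            notebook_task:
              notebook_path: ./notebooks/nightly_etl.py

Deploying to an environment is then a single CLI call, e.g. databricks bundle deploy -t dev.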

However, in our experience, some smaller or less technically mature client teams have found CLI-based and YAML-first Databricks Asset Bundles intimidating or overwhelming, often resorting to manual copy/paste of Workflows JSON definitions between environments.

Exploring a More Accessible Approach

Recently, Databricks introduced Apps functionality that allows deploying custom, 3rd-party, or partner-provided web apps inside the Databricks Workspace. These apps can offer:

  • Tailored, interactive user experiences with rich full-stack Web UIs
  • Seamless yet secure and governed access to Lakehouse assets, Unity Catalog, and Databricks REST APIs
  • The ability for the apps to leverage efficient Databricks Compute and GenAI resources

We set out to experiment with Databricks Apps to create a user-friendly, interactive UI that streamlines Workflow Jobs synchronization with Git Repositories. The goal? A simplified solution for teams not yet ready to adopt Databricks Asset Bundles but still needing structured version control and cross-environment promotions for job definitions.

Solution: A Databricks App for Workflow Versioning

We designed a Databricks App (which we plan to release as open source) that provides an intuitive Web UI for:

  • API-driven export and import of Workflow Job Definitions
  • Storing these definitions as JSON files within Workspace Git Folders
  • Enabling push and pull actions to a Git repository for version control and change promotion between environments
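The core of the export capability boils down to a few Jobs API calls. Here is a minimal sketch of the idea using the databricks-sdk Python package (this is our illustration, not the app's actual code; the Git Folder path is a placeholder):

  import io
  import json

  from databricks.sdk import WorkspaceClient
  from databricks.sdk.service.workspace import ImportFormat

  GIT_FOLDER = "/Workspace/Users/tech.lead@company.com/jobs-sync"  # placeholder path

  w = WorkspaceClient()  # resolves host and credentials from the environment

  # Write each Workflow Job definition to a JSON file inside the Git Folder
  for job in w.jobs.list(expand_tasks=True):
      settings = job.settings.as_dict()
      w.workspace.upload(
          f"{GIT_FOLDER}/{settings['name']}.json",
          io.BytesIO(json.dumps(settings, indent=2, sort_keys=True).encode("utf-8")),
          format=ImportFormat.AUTO,
          overwrite=True,
      )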

Usage Flow

A Data Engineering development team works in the DEV environment, where they create and modify notebooks, write SQL queries, and configure orchestration Workflow Jobs - including adding new jobs, updating existing ones, and removing obsolete ones.

A Tech Lead wants to commit all Workflow Job changes to a Git Folder to ensure these updates can be replicated and promoted to UAT and then PROD.

Export (DEV → Git Repo)

  1. The Tech Lead navigates to the deployed App URL in the DEV Workspace. Authentication is handled by Databricks itself, as long as the user is a Workspace Admin or has “Can Use” permissions for the App
  2. Confirm that “Export Mode” is selected
  3. Click “Export Workflow Jobs to JSON Files”
  4. Check results and/or detailed logs, addressing any errors if needed
  5. Click “Push to Git Repo” to open the Workspace Git Folder page:
    • Click on the Branch Name to open the Git popup window
    • Review changes and provide a commit message
    • Click “Commit & Push” to sync with the Git Repository
Workflow Jobs Synchronizer UI


Push to Git Repo

Import (Git Repo → UAT or PROD)

  1. Use Pull Requests to promote your changes to the target branch
  2. Navigate to the deployed App URL in the target environment (UAT or PROD)
  3. Confirm that “Import Mode” is selected
  4. Click “Pull from Git Repository” to open the Workspace page for the Git Folder. Click on the Branch Name to enter the Git popup window and pull the changes
  5. Click “Validate Job Definition JSON Files” to pre-process and validate the files in the Git Folder, compare them against existing Workflow Jobs in the Workspace, and identify changes
  6. Click “Import New and Changed Jobs” and/or “Remove Deleted Jobs” to deploy updates
  7. Address any errors discovered during the validation or import steps, adding new resource name mappings to the configuration file (if needed), or re-attempting the import process
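Under the hood, validation and import amount to a diff between the JSON files in the Git Folder and the jobs that already exist in the target workspace. A simplified sketch of that idea (again ours, not the app's actual code, and omitting the resource name mapping overrides described below):

  import json

  from databricks.sdk import WorkspaceClient

  GIT_FOLDER = "/Workspace/Users/admin@company.com/jobs-sync"  # placeholder path

  w = WorkspaceClient()

  # Map existing job names to job IDs in this workspace
  existing = {j.settings.name: j.job_id for j in w.jobs.list()}

  for obj in w.workspace.list(GIT_FOLDER):
      if not obj.path.endswith(".json"):
          continue
      settings = json.loads(w.workspace.download(obj.path).read())
      if settings["name"] in existing:
          # Job already exists: overwrite its definition in place
          w.api_client.do("POST", "/api/2.1/jobs/reset",
                          body={"job_id": existing[settings["name"]],
                                "new_settings": settings})
      else:
          # New job: create it from the exported definition
          w.api_client.do("POST", "/api/2.1/jobs/create", body=settings)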
PR Merged

Pre-Import Validation

Imported Jobs

Deleted Jobs

Deployment as a Databricks App

Deployment instructions are provided in the README.md of the GitHub repo.

The app needs to be deployed separately in each Databricks Workspace where you plan to synchronize Workflow Jobs with a Git Repo via exporting or importing.

CAUTION: Databricks Apps run on always-on compute, which can lead to high costs. Recommended approach:

  • Manually stop and start the app as needed to perform workflow promotions
  • Monitor usage to control costs

If or when Databricks implements an auto-pause / auto-resume feature for rarely used apps (such as this one), it would help optimize costs. Until then, manual intervention is required.
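If you prefer scripting to clicking, recent versions of the databricks-sdk expose an Apps API that can automate the stop/start cycle around a promotion session. A minimal sketch, assuming a deployed app named workflow-jobs-synchronizer (a placeholder name):

  from databricks.sdk import WorkspaceClient

  APP_NAME = "workflow-jobs-synchronizer"  # placeholder app name

  w = WorkspaceClient()
  w.apps.start(name=APP_NAME)   # resume compute before the promotion session
  # ... perform the export or import via the app UI ...
  w.apps.stop(name=APP_NAME)    # stop compute again to avoid always-on charges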

Alternative Deployment Option: Azure Container Apps

This App can run outside the Databricks Environment as a Docker Container — whether in local Docker or any managed containerized environment such as Azure Container Apps. The deployment environment must handle user authentication and TLS termination, since these are not built into the app itself (the Databricks environment provides them natively).

When deploying as a standalone container (outside of Databricks Apps), certain additional environment variables, such as the Databricks Host and Token, must be set so the app can communicate with Databricks REST APIs. See README.md for details.
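For example, a local trial run might look like the following (the image name is a placeholder, and while DATABRICKS_HOST and DATABRICKS_TOKEN are the standard variable names used by Databricks client tooling, the README.md is authoritative for the exact list):

  docker run -p 8080:8080 \
    -e DATABRICKS_HOST=https://adb-1234567890123456.7.azuredatabricks.net \
    -e DATABRICKS_TOKEN=<personal-access-token> \
    workflow-jobs-synchronizer:latest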

Running the app as an Azure Container App can be significantly cheaper than an always-on Databricks App, based on current pricing. This is relevant unless or until Databricks introduces an auto-stop/auto-resume feature for apps.

You will still need one deployed copy of the app per Databricks Workspace environment, since each instance points to a single Databricks Workspace where it facilitates either the Export or the Import process (rarely both).

Resource Names Mapping and Overrides

When promoting workflows between environments, some job configuration aspects differ and require overrides, including:

  • Compute Resources: named persistent resources (e.g., Clusters, SQL Warehouses) must be mapped unless Job Clusters or Serverless Compute are used.
  • “Run As” Users & Service Principals: different environments may have distinct user roles or authentication setups.
  • Other Potential Overrides: some additional settings may require customization depending on the use case.

At this time (v0.2), the App supports configurable mapping overrides on the receiving side (for Importing) only for Compute Resources and Run As principals. The Job Definition files in the Git Repo will reflect the original workflow configuration of the DEV environment (as exported).

Administrators of the target (importing) environment manage overrides via a Resource Name Mapping File, maintained as a Workspace File in each target environment.

Example resource name mapping file:

  {
    "compute_name_mappings": {
      "cluster_name_mappings": {
        "bi-users-dev-cluster": "bi-users-prod",
        "rajesh-dev-cluster": "etl-prod",
        "unknown": "etl-prod"
      },
      "warehouse_name_mappings": {
        "starter-dev-warehouse": "etl-prod-warehouse-small",
        "unknown": "etl-prod-warehouse-small"
      }
    },
    "run_as_mappings": {
      "developer.lastname@company.com": {
        "user_name": "super.admin@company.com"
      },
      "1234d931-d019-48a3-b606-431cc316ecdd": {
        "user_name": "super.admin@company.com"
      },
      "another.developer@company.com": {
        "service_principal_name": "692bc6d0-ffa3-11ed-be56-0242ac120002"
      },
      "default": {
        "service_principal_name": "692bc6d0-ffa3-11ed-be56-0242ac120002"
      }
    }
  }
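The "unknown" key (and, for run_as_mappings, the "default" key) acts as a catch-all for source names that have no explicit mapping. A small illustration of that lookup logic (ours, not necessarily the app's exact implementation):

  def map_resource_name(source_name: str, mappings: dict,
                        fallback_key: str = "unknown") -> str:
      # Use the explicit mapping when present; otherwise fall back to the
      # catch-all entry; otherwise keep the source name unchanged.
      return mappings.get(source_name, mappings.get(fallback_key, source_name))

  cluster_mappings = {"bi-users-dev-cluster": "bi-users-prod", "unknown": "etl-prod"}
  assert map_resource_name("bi-users-dev-cluster", cluster_mappings) == "bi-users-prod"
  assert map_resource_name("brand-new-dev-cluster", cluster_mappings) == "etl-prod"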


Give It a Try — We Value Your Feedback

If this approach resonates with you, we invite you to deploy the app in your own Databricks Workspace. It’s open source, and you can find the GitHub repository linked below:

➡️ GitHub Repository

Have any issues, bugs, or feature requests? Feel free to open an issue in the GitHub repository, and we’ll do our best to address it promptly!

Conclusion and Takeaways

Databricks Apps offer a powerful mechanism for building user-centric tools with rich, interactive, modern UIs that handle complex background tasks within the Databricks platform.

Databricks Asset Bundles (DABs) remain the officially recommended, native, CI/CD-friendly approach to Git version control and promotion of Workflows, notebook code, and other bundled assets - when Data Engineering teams are technically mature and capable of using it.

➡️ This app is functional (as a first version) and can be used for Git version control and promotion of workflows by client teams that are not quite ready for DABs.

💡 There is still room for further improvement of the app, which we may address in subsequent versions, subject to client interest:

  • Support for additional job attributes overrides in the mapping file (e.g., notification email addresses)
  • The ability to export only selected jobs instead of all jobs in a workspace — making it easier to limit the scope of changes reflected in specific commits and PRs. Currently the app exports all Workflow Jobs at once.

💡 We also identified several suggestions for the Databricks Apps platform itself, i.e., capabilities we wish the vendor would add or improve at the platform level:

  • Add support for ReactJS and other modern rich frontend frameworks, or, ideally, a generic Dockerfile build process. Databricks Apps currently center around Python-driven frameworks like Streamlit or Dash, with limited support for modern frontend frameworks like ReactJS. A more flexible deployment model — such as a generic Dockerfile build process — would allow seamless integration of full-featured React frontends. Right now, if a developer wants a React frontend, it must be pre-built and hosted externally, or committed as static artifacts in Git, which is an anti-pattern. Due to this limitation, we had to rely on AlpineJS instead of React.

    NOTE: an unpopular choice of frontend framework (AlpineJS) is a noticeable impediment to AI-assisted coding. Modern agentic AI development tools such as Cursor, using frontier LLMs, do a fantastic job of quickly churning out interactive frontend UIs in React, but struggle far more to produce correct AlpineJS and plain JavaScript.
  • Improve the way Environment Variables are managed for app deployments, e.g. by allowing admins to specify values at the App Compute settings level. Currently, all non-default env var values need to be set in the app.yaml file directly in the Git Folder from which the app is deployed (see the hypothetical snippet after this list), likely leaving this file uncommitted to Git. This approach is cumbersome, non-intuitive, and leads to pull conflicts any time the original app.yaml file is modified in the upstream GitHub Repository.
  • Support smaller compute capacities, and/or auto-pause and quick auto-resume of deployed apps (similar to Heroku or Render.io) to save costs and avoid the need to manually Stop and Start infrequently used apps. The current price of an always-on App deployed on 2 CPU cores / 6 GB RAM (an unchangeable default) is somewhat steep and can reach $350 USD/month in Azure for each deployed app, unless it is manually stopped and started.
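To illustrate the environment variables point above: today, an override typically lives in the app.yaml next to the app code, along the lines of this hypothetical snippet (the variable name is made up):

  command: ["python", "app.py"]
  env:
    - name: "GIT_FOLDER_PATH"    # hypothetical variable name
      value: "/Workspace/Users/tech.lead@company.com/jobs-sync"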
