
Genie Forge: Accelerating Genie Space Deployment

May 27, 2026
7 min.
FDE
Author: Oleg Mikhov

Genie Space Setup Challenge

When teams first experiment with Genie Spaces, the setup can look deceptively simple: connect the data, add a few instructions, and start asking questions. In practice, a reliable Genie Space depends on the quality of the context behind it: trusted tables, business definitions, joins, examples, and evaluation cases. The real work is collecting that context, filtering out what is not relevant, and shaping it around a focused use case.

Genie Forge: From Internal Tool to Databricks App

Genie Forge began as an internal tool we built for our own delivery work. We used it to speed up setups, reduce repetitive manual effort, and create artifacts that could be reviewed.

The solution was built around two keystones.

Usage-aware setup. Gathering the right context from people is often the most time-consuming part of a Genie Space project. Analysts, engineers, and business stakeholders know the data, but their knowledge is spread across teams and calendars. Genie Forge reduces that dependency by collecting as much context as possible from the existing environment: table metadata, historical queries, dashboards, lineage, join patterns, and a short description of the intended use case. It uses those signals to prepare the first version of the Genie Space setup.
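To make the idea of mining the existing environment concrete, here is a minimal sketch that ranks tables by how often they appear in historical queries. It is not Genie Forge's actual implementation: the query list, the `table_usage` helper, and the simplified regular expression are assumptions for illustration, and a real setup would read query history from the workspace and use a proper SQL parser.

```python
import re
from collections import Counter

# Hypothetical query history; in practice this would come from the workspace's query logs.
HISTORICAL_QUERIES = [
    "SELECT s.store_id, SUM(s.amount) FROM sales s "
    "JOIN stores st ON s.store_id = st.store_id GROUP BY s.store_id",
    "SELECT * FROM sales WHERE sale_date >= '2026-01-01'",
    "SELECT p.name FROM promotions p JOIN sales s ON s.promo_id = p.promo_id",
]

# Matches the identifier that follows FROM or JOIN.
# Simplified on purpose: it ignores subqueries, CTEs, and quoted identifiers.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(queries):
    """Count how often each table name appears after FROM or JOIN."""
    counts = Counter()
    for query in queries:
        counts.update(name.lower() for name in TABLE_REF.findall(query))
    return counts


usage = table_usage(HISTORICAL_QUERIES)
# Most frequently used tables first: candidates for the first version of the space.
ranked_tables = [name for name, _ in usage.most_common()]
print(ranked_tables)
```

The same counting approach could be extended to join pairs or filter columns, which is one way usage signals can narrow a large schema down to a focused use case.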

Autorefinement (autoresearch) loop. Genie Forge helps generate evaluation cases, test how the Genie Space behaves, and refine the setup through iterations. Instructions, semantic definitions, joins, and SQL examples can be adjusted based on measured behavior, not informal trial and error. The goal is not just to create a space, but to make it answer the intended questions more reliably.
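The loop described above can be sketched as a simple evaluate, propose, compare cycle. This is an illustrative sketch under stated assumptions, not the tool's real code: `refine`, `toy_evaluate`, `toy_propose`, and the rule strings are hypothetical stand-ins for an actual evaluation run against a Genie Space and an actual step that proposes new instructions.

```python
# Toy stand-in for business rules the space should know (hypothetical).
REQUIRED_RULES = {
    "active promotion = end date in the future",
    "revenue excludes refunds",
}


def toy_evaluate(config):
    """Toy score from 0 to 100: share of required rules present in the instructions."""
    present = REQUIRED_RULES & set(config["instructions"])
    return round(100 * len(present) / len(REQUIRED_RULES))


def toy_propose(config):
    """Toy proposer: add one missing rule, if any (sorted for determinism)."""
    missing = sorted(REQUIRED_RULES - set(config["instructions"]))
    if not missing:
        return config
    return {"instructions": config["instructions"] + [missing[0]]}


def refine(config, evaluate, propose, max_rounds=3):
    """Keep a proposed config only when its measured score beats the current best."""
    best, best_score = config, evaluate(config)
    history = [best_score]
    for _ in range(max_rounds):
        candidate = propose(best)
        score = evaluate(candidate)
        history.append(score)
        if score > best_score:  # accept only measured improvements
            best, best_score = candidate, score
    return best, best_score, history


best, best_score, history = refine({"instructions": []}, toy_evaluate, toy_propose)
print(best_score, history)
```

The point of the design is the acceptance test inside the loop: a change is kept because a score improved, not because it looked plausible.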

After we showed it to Databricks, the feedback was clear: if this workflow could help teams adopt Genie more broadly, it should be available to them and easier to use. That pushed us to turn Genie Forge into a Databricks App.

Genie Forge Demo
Genie Forge - Short Demo

Three Design Decisions

During development of Genie Forge, we made three architecture choices to keep the workflow fast without making it opaque. The setup still needed to be reviewable, traceable, and safe to improve through iterations.

First, we use a separate semantic layer built with views or metric views. The goal is to simplify what we expose to the Genie Space and make the data easier for Genie to understand. Raw tables often contain more detail than the use case needs, or their fields are not in the best shape for natural language analysis. Views let us apply transformations such as CASE WHEN logic, text-to-numeric conversion, JSON field extraction, cleaned labels, or predefined metric logic before exposing the data to Genie.
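As a small illustration of the view-based semantic layer, the sketch below applies CASE WHEN labels and text-to-numeric conversion in a view and then queries the view instead of the raw table. It uses Python's built-in `sqlite3` purely as a local stand-in for a SQL warehouse; the table, column, and view names are hypothetical, and the syntax for views or metric views on Databricks may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Raw table: cryptic status codes and amounts stored as text (hypothetical data).
    CREATE TABLE raw_orders (order_id INTEGER, status_code TEXT, amount_text TEXT);
    INSERT INTO raw_orders VALUES (1, 'C', '19.90'), (2, 'X', '5.00'), (3, 'C', '100');

    -- Semantic view: readable labels and numeric amounts.
    -- This view, not the raw table, is what would be exposed to the space.
    CREATE VIEW orders_semantic AS
    SELECT
        order_id,
        CASE status_code
            WHEN 'C' THEN 'Completed'
            WHEN 'X' THEN 'Cancelled'
            ELSE 'Unknown'
        END AS order_status,
        CAST(amount_text AS REAL) AS amount
    FROM raw_orders;
    """
)

# A question like "total amount by order status" now maps to simple SQL.
rows = conn.execute(
    "SELECT order_status, SUM(amount) FROM orders_semantic "
    "GROUP BY order_status ORDER BY order_status"
).fetchall()
print(rows)
```

Keeping the cleanup in the view means the instructions given to Genie do not have to explain status codes or type conversions at all.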

Second, we use MLflow with custom judges for evaluation tracking instead of relying only on built-in benchmarks. Genie Forge generates its own evaluation tests. Some are SQL-grounded: we analyze historical queries and create business questions with expected SQL patterns. Others are synthetic and not tied to a specific query; these help check whether the response shape, structure, and business meaning look reasonable when SQL history is limited. MLflow lets us compare runs and decide which version is strong enough to promote.
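A SQL-grounded evaluation case with expected SQL patterns might look roughly like the sketch below. This is an assumption-laden illustration, not Genie Forge's judge: `EvalCase`, `pattern_judge`, the regular expressions, and the `promotion_end_date` column are hypothetical, and the toy scores are unrelated to any scores reported in this article. Synthetic cases, which check response shape and business meaning, would need a different kind of judge. In a real setup, per-run scores like these could be logged to MLflow, for example with `mlflow.log_metric`, so that runs can be compared.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    question: str
    required_patterns: list = field(default_factory=list)   # regexes the SQL should match
    forbidden_patterns: list = field(default_factory=list)  # regexes the SQL should not match


def pattern_judge(case, generated_sql):
    """Score from 0 to 100: share of pattern checks the generated SQL passes."""
    checks = [
        bool(re.search(p, generated_sql, re.IGNORECASE)) for p in case.required_patterns
    ]
    checks += [
        not re.search(p, generated_sql, re.IGNORECASE) for p in case.forbidden_patterns
    ]
    return round(100 * sum(checks) / len(checks)) if checks else 0


case = EvalCase(
    question="How many promotions are active today?",
    required_patterns=[r"promotion_end_date\s*>"],
    forbidden_patterns=[r"promotion_end_date\s+IS\s+NULL"],
)

# SQL produced by two hypothetical versions of the space.
sql_v1 = "SELECT COUNT(*) FROM promotions WHERE promotion_end_date IS NULL"
sql_v2 = "SELECT COUNT(*) FROM promotions WHERE promotion_end_date > CURRENT_DATE"

scores = {"v1": pattern_judge(case, sql_v1), "v2": pattern_judge(case, sql_v2)}
best_version = max(scores, key=scores.get)
print(scores, best_version)
```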

Third, we stage everything in Git. Genie Space configuration can be represented as JSON and updated through the API, even when parts of it are hard to manage directly in the UI. Genie Forge serializes the setup into versioned files so every change can be reviewed, compared, redeployed, or reverted. If an experiment makes the space worse, we can return to a previous version; if it improves results, we can promote it with a clear record of what changed.
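The review step this enables can be sketched with deterministic JSON serialization and a plain text diff. The configuration shape below is hypothetical and is not the actual Genie Space API schema; `serialize` and the file labels are illustrative names only.

```python
import difflib
import json


def serialize(config):
    """Deterministic JSON: sorted keys and fixed indentation keep diffs small."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


# Two versions of a hypothetical space configuration.
v1 = {
    "title": "Retail promotions",
    "tables": ["retail.promotions_semantic"],
    "instructions": ["Use the promotions view for promotion questions."],
}
v2 = {
    "instructions": [
        "Use the promotions view for promotion questions.",
        "An active promotion has an end date in the future.",
    ],
    "title": "Retail promotions",
    "tables": ["retail.promotions_semantic"],
}

# Key order in the Python dicts differs, but the serialized form does not depend on it.
diff = list(
    difflib.unified_diff(
        serialize(v1).splitlines(),
        serialize(v2).splitlines(),
        fromfile="space.json@v1",
        tofile="space.json@v2",
        lineterm="",
    )
)
print("\n".join(diff))
```

Once the serialized file is committed, the same diff appears in a normal Git review, and reverting an experiment is an ordinary revert of that commit.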

Genie Forge - High-level architecture

A Concrete Example: From Weak Assumption to Better Instructions

Here is an example of how Genie Forge autorefinement improves a Genie Space during setup in a retail domain.

In one evaluation run, the Genie Space was tested on a question about active promotions. After the initial setup, Genie made a reasonable but incorrect assumption: it treated active promotions as records where “Promotion End Date” was NULL.

That was not correct for this dataset. In this case, all promotions had predefined timelines, so an active promotion meant a promotion with an end date in the future.
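The difference between the two definitions is easy to see on a few rows. The sketch below uses made-up promotions and a fixed reference date, both assumptions for illustration: because every promotion has an end date, the NULL-based rule returns nothing, while the future-end-date rule returns the promotions that are still running.

```python
from datetime import date

# Hypothetical promotions; every row has a predefined end date, as in the dataset described.
PROMOTIONS = [
    {"name": "Spring Sale", "end_date": date(2026, 3, 31)},
    {"name": "Summer Kickoff", "end_date": date(2026, 6, 30)},
    {"name": "Back to School", "end_date": date(2026, 9, 15)},
]


def active_by_null_end_date(promotions):
    """The incorrect assumption: active means there is no end date."""
    return [p["name"] for p in promotions if p["end_date"] is None]


def active_by_future_end_date(promotions, today):
    """The corrected rule: active means the end date is after today."""
    return [p["name"] for p in promotions if p["end_date"] > today]


today = date(2026, 5, 27)  # fixed reference date so the example is reproducible
wrong = active_by_null_end_date(PROMOTIONS)
right = active_by_future_end_date(PROMOTIONS, today)
print(wrong, right)
```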

Genie Forge detected the failure through the evaluation test, captured the incorrect assumption, and updated the Genie Space instructions with clearer business logic. After the autorefinement cycle, the evaluation score for this test improved from 32 to 73.

MLflow log of Genie Forge autorefinement


Genie Space Updated Instructions

This is the value of the autorefinement loop: the issue was not fixed by manually guessing what to change. It was found through evaluation, translated into clearer instructions, and made reviewable before promotion.

More Than Faster Setup

For us, Genie Forge is more than an accelerator. It is a way to help business users and analysts better understand what a well-built Genie Space should look like and how it should be managed.

In many companies, the demand is not for one Genie Space, but for many Genie Spaces across teams, domains, and use cases. Genie Forge helps reduce the cost and effort of each setup while also educating teams on how to manage spaces with more confidence.

Based on our delivery work, Genie Forge has reduced initial setup effort by 60-70% compared with our previous manual process, in which preparing context, setup artifacts, and the first evaluation pass typically required 25-30 hours per space. We also see a 15-25% uplift in response accuracy compared with the initial setup baseline, driven by usage-led design and autorefinement against defined evaluation cases.

Full demo: https://www.youtube.com/watch?v=TYfCrGrOZcc&t

More materials: t1a.com/gf
