Our comprehensive approach included an internal audit, the design of the Databricks architecture, and support in recruiting a Head of Data Science.
Project highlights
- Internal audit and platform selection: we conducted a thorough internal audit to understand Finom's current data infrastructure and business needs. Based on our findings, Databricks was chosen for its robust features and scalability.
- Databricks architecture design: our team designed a comprehensive Databricks architecture tailored to Finom's requirements. It integrates MLflow for model management and deployment, and Google Cloud Functions for defining call logic (a minimal MLflow sketch follows this list).
- Recruitment of a Head of Data Science: we are actively supporting Finom in this search, ensuring they have the right leadership to drive their data initiatives forward.
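To make the MLflow piece concrete, here is a minimal sketch of how a model could be logged and registered in this kind of setup. The experiment path, the toy classifier, and the registered model name `finom_recommender` are our own illustrative assumptions, not Finom's actual pipeline.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical experiment path -- illustrative only.
mlflow.set_experiment("/Shared/recommendation-models")

# Toy training data standing in for the real feature pipeline.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

with mlflow.start_run():
    mlflow.log_param("n_estimators", model.n_estimators)
    # Registering the model makes it addressable by name from the
    # deployment side (e.g. "models:/finom_recommender/Production").
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="finom_recommender",  # hypothetical name
    )
```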
Model deployment and automation
- Production models: currently, one production model operates in recommendation mode, deployed as a stateless Docker container on GCP's managed Kubernetes service (GKE). We are working to automate the model deployment cycle so that new models can first run "on the side" in shadow mode before going live, and the process can stabilize for active use (sketched after this list).
- Rule-based models: while there are no rule-based models in production yet, several are planned. Our goal is an automated deployment process that requires no DevOps involvement, so these models can serve business-critical processes (see the pyfunc sketch after this list).
- LLM-based models: Finom has an LLM model in production with limited scope, integrated with Intercom and the OpenAI API. We are centralizing template management and ensuring seamless integration with their new data warehouse (see the template sketch after this list).
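The "on the side" pattern from the first bullet can be sketched roughly as follows: the production model answers the request while a candidate scores the same input in shadow mode, with its output logged rather than returned. The registry names and the logging sink are assumptions.

```python
import logging

import mlflow.pyfunc

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

# Hypothetical registry stages: "Production" serves traffic,
# "Staging" runs "on the side" for comparison only.
live_model = mlflow.pyfunc.load_model("models:/finom_recommender/Production")
shadow_model = mlflow.pyfunc.load_model("models:/finom_recommender/Staging")

def predict(features):
    """Return the live prediction; score the shadow candidate silently."""
    live_pred = live_model.predict(features)
    try:
        shadow_pred = shadow_model.predict(features)
        # Log the shadow output for offline comparison; never return it.
        log.info("live=%s shadow=%s", live_pred, shadow_pred)
    except Exception:
        # A shadow failure must never affect live traffic.
        log.exception("shadow model failed")
    return live_pred
```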
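For the second bullet, one way to give rule-based models the same automated path, assuming the MLflow-centred setup described above, is to wrap the rules as an MLflow pyfunc model so they deploy like any other model, with no DevOps step. The transaction-limit rule here is invented purely for illustration.

```python
import mlflow.pyfunc
import pandas as pd

class TransactionLimitRule(mlflow.pyfunc.PythonModel):
    """Invented rule: flag transactions above a configurable limit."""

    def __init__(self, limit: float = 10_000.0):
        self.limit = limit

    def predict(self, context, model_input: pd.DataFrame) -> pd.Series:
        return model_input["amount"] > self.limit

with mlflow.start_run():
    # The rule deploys through the same registry path as ML models,
    # so no separate DevOps step is needed.
    mlflow.pyfunc.log_model(
        artifact_path="rule_model",
        python_model=TransactionLimitRule(limit=10_000.0),
        registered_model_name="transaction_limit_rule",  # hypothetical
    )
```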
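And for the third bullet, centralized template management could look like this sketch: templates live in a single registry (a plain dict here; in practice they would be versioned in the warehouse) and are rendered before the OpenAI call. The template text, the model name, and the `answer` helper are all hypothetical.

```python
from openai import OpenAI

# A plain dict stands in for the central template registry; in
# practice templates would be versioned in the data warehouse.
PROMPT_TEMPLATES = {
    "support_reply": (
        "You are a support assistant for a fintech product. "
        "Answer the customer's question concisely.\n\n"
        "Question: {question}"
    ),
}

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(template_name: str, **kwargs) -> str:
    """Render a centrally managed template and call the OpenAI API."""
    prompt = PROMPT_TEMPLATES[template_name].format(**kwargs)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; the actual choice may differ
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# e.g. answer("support_reply", question="How do I export my transactions?")
```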
We proposed several centralized model deployment options, ultimately recommending Databricks for its built-in MLflow, seamless integration with Kafka, and multi-cloud capabilities. This approach minimizes licensing costs and leverages existing expertise while enhancing scalability and functionality.
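As a rough illustration of that Kafka integration on Databricks, a Structured Streaming job could consume events and score them with a model from the MLflow registry. The topic, brokers, model name, payload schema, and table paths below are placeholders.

```python
import mlflow.pyfunc
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided on Databricks

# Placeholder topic and brokers.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
    .selectExpr("CAST(value AS STRING) AS payload")
)

# Extract a feature from the JSON payload (schema is assumed).
features = events.select(
    F.get_json_object("payload", "$.amount").cast("double").alias("amount")
)

# Wrap the registered model as a Spark UDF and score the stream.
score = mlflow.pyfunc.spark_udf(spark, "models:/transaction_scorer/Production")
scored = features.withColumn("score", score(F.col("amount")))

query = (
    scored.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/scores")
    .toTable("warehouse.scores")
)
```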
Data warehouse and reporting
Finom's reporting is built on GCP's BigQuery and Dataform, processing 30 million transactions per quarter. We suggested keeping BigQuery for reporting while leveraging Databricks SQL for a unified data warehouse, enabling comprehensive data lineage and transparent control over changes and their impact on reports.
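To illustrate how the two systems can coexist, Databricks can read the BigQuery reporting tables directly via the Spark BigQuery connector, assuming the connector and GCP credentials are configured on the cluster; the project, dataset, and table names below are made up.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided on Databricks

# Hypothetical project/dataset/table names.
transactions = (
    spark.read.format("bigquery")
    .option("table", "finom-project.reporting.transactions")
    .load()
)

# Persist a curated copy as a Delta table in the unified warehouse,
# where lineage and change control live in Databricks.
transactions.write.format("delta").mode("overwrite").saveAsTable(
    "warehouse.transactions"
)
```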
Future roadmap
The roadmap includes migrating existing ETL processes from Postgres to Databricks, configuring CI/CD, and gradually shifting reporting from BigQuery to Databricks. This will consolidate everything into a single centralized data warehouse for all purposes, enhancing data management and operational efficiency.
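A possible first step in that migration, assuming the Postgres sources are reachable over JDBC, might look like the sketch below; hostnames, credentials, and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided on Databricks

# Placeholder connection details -- supply real values via secrets.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://pg-host:5432/finom")
    .option("dbtable", "public.transactions")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Land the source table as Delta; downstream ETL then runs in
# Databricks instead of Postgres.
df.write.format("delta").mode("overwrite").saveAsTable("raw.transactions")
```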