How We Reduced Data Maintenance Costs by 60% for a Digital Native Company

Our digital native client faced challenges in managing an increasing volume of data and optimizing analytical processes. The company heavily relied on data to drive its marketing, operational, and product initiatives. The goal of the project was to modernize the data consumption model and align it with the needs of a rapidly growing organization.

Scope of Work

The collaboration involved a comprehensive migration and refactoring of disorganized data processes that had grown unchecked over the years, accumulating technical debt that demanded urgent attention. These processes were moved to the modern DBT (Data Build Tool) Core environment. The key phases of the project included:

Analysis of the Existing Data Consumption Model

Our team of analysts mapped all existing data transformation processes, identifying key areas for optimization. The analysis revealed that the previous model suffered from long refresh times, produced inconsistent results, and did not allow the infrastructure to scale dynamically with growing business needs. Critical pain points included slow data refresh rates and a lack of standardization in the SQL code.

The team reviewed approximately 20 data projects, covering diverse sources such as user monitoring systems, marketing tools, and financial data.

Migration of Data Processes to DBT

We migrated the existing SQL procedures (BigQuery Routines) to DBT Core, structuring them around the recommended layered data model of staging, intermediate, and mart models. Additionally, we introduced core models containing the key dimensions and metrics used across most marts. The marts were designed to be cost-effective while meeting the client's reporting needs.
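
Below is a minimal sketch of how a mart sits on top of a staging model in DBT; the table and column names are illustrative, not the client's actual schema.

```sql
-- models/staging/stg_events.sql (illustrative names only)
-- Staging layer: light cleanup and renaming of a raw source table.
select
    event_id,
    user_id,
    event_type,
    timestamp(event_ts) as event_at
from {{ source('raw', 'events') }}

-- models/marts/mart_daily_activity.sql
-- Mart layer: a business-facing aggregate built only from ref()'d models,
-- so DBT can track lineage and build everything in dependency order.
select
    date(event_at) as activity_date,
    count(distinct user_id) as active_users,
    count(*) as events
from {{ ref('stg_events') }}
group by 1
```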

Our team developed universal macros that replaced repetitive SQL code, increasing process consistency and simplifying management and updates. We also redesigned the prediction process, using macros to unify and simplify it.
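
As an illustration only (the macro below is hypothetical, not the client's actual code), a universal macro replaces a filter that would otherwise be copy-pasted across many models with a single parameterized definition.

```sql
-- macros/last_n_days.sql (hypothetical example of a reusable macro)
-- Encapsulates a date filter that was previously duplicated across models.
{% macro last_n_days(date_column, n_days=30) %}
    {{ date_column }} >= date_sub(current_date(), interval {{ n_days }} day)
{% endmacro %}

-- Usage inside any model:
-- select * from {{ ref('stg_events') }} where {{ last_n_days('event_at', 7) }}
```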

Performance Optimization

Our specialists implemented partitioning and clustering in the existing BigQuery procedures before the migration was complete, so that the optimizations took effect as early as possible and processing costs dropped even before the cutover. The new DBT-based models also leveraged partitioning and clustering, resulting in a fourfold reduction in data refresh times and keeping processing costs under control despite growing data volumes.
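
For reference, this is roughly how partitioning and clustering are declared on a DBT model targeting BigQuery; the model and column names here are illustrative, not the client's actual configuration.

```sql
-- models/marts/fct_events.sql (illustrative configuration)
{{ config(
    materialized = 'table',
    partition_by = {'field': 'event_date', 'data_type': 'date'},
    cluster_by   = ['user_id']
) }}

-- Partitioning by date lets BigQuery scan only the days a query touches;
-- clustering by user_id co-locates related rows and further reduces bytes billed.
select
    event_id,
    user_id,
    date(event_at) as event_date,
    event_type
from {{ ref('stg_events') }}
```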

Changes in the data structure, including switching STRUCT columns to BigQuery's native JSON type, reduced table processing costs by nearly 30% and cut table creation time by 85%.
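
The snippet below sketches the kind of change involved, using hypothetical table and column names: TO_JSON converts a STRUCT value to BigQuery's native JSON type, and JSON_VALUE extracts individual fields on read.

```sql
-- Illustrative only: converting a STRUCT column to the native JSON type.
create or replace table analytics.events_json as
select
    event_id,
    to_json(event_properties) as event_properties   -- STRUCT -> JSON
from analytics.events_struct;

-- Reading a single field from the JSON column:
select
    event_id,
    json_value(event_properties, '$.campaign_id') as campaign_id
from analytics.events_json;
```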

Process Orchestration

We implemented Apache Airflow to manage data refresh scheduling, enabling more precise task planning and dynamic adaptation to changing requirements. By leveraging DAGs (Directed Acyclic Graphs) with shared dependencies, we executed processes in parallel, significantly reducing data processing times and eliminating the need for manual intervention. Advanced error notifications allowed the team to focus on strategic tasks rather than routine monitoring.

We also implemented CI/CD with Google Cloud Build, streamlining version control and accelerating deployments. This ensured faster, more transparent synchronization between teams while eliminating manual deployment delays and minimizing the risk of code conflicts.

Data Validation and Cleansing

We implemented DBT tests to automatically monitor data accuracy at various processing stages. These tests detect non-unique keys, unexpected missing values, and sudden drops in key metrics, allowing issues to be identified proactively and the business to be alerted early.
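
DBT's built-in generic tests cover uniqueness and null checks, while custom checks can be written as plain SQL "singular" tests that fail when they return any rows. The sketch below, with illustrative names, flags days where a key metric drops by more than half versus its trailing average.

```sql
-- tests/assert_no_sudden_revenue_drop.sql (illustrative singular test)
-- A DBT singular test fails the build when this query returns rows.
with daily as (
    select order_date, sum(revenue) as revenue
    from {{ ref('fct_orders') }}
    group by 1
),
compared as (
    select
        order_date,
        revenue,
        avg(revenue) over (
            order by order_date
            rows between 7 preceding and 1 preceding
        ) as trailing_avg
    from daily
)
select *
from compared
where trailing_avg is not null
  and revenue < 0.5 * trailing_avg
```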

Engineers also developed mechanisms for correcting historical data, eliminating errors at the data source while documenting every change that was made.
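
One simple way to implement such a mechanism (shown here only as a hypothetical sketch, not the client's actual solution) is to keep reviewed corrections in a dedicated table, together with a reason and author, and apply them with a MERGE.

```sql
-- Hypothetical sketch: documented corrections are stored in a dedicated table
-- (order_id, corrected_amount, reason, corrected_by) and merged into the source.
merge into analytics.orders as target
using analytics.order_corrections as corr
    on target.order_id = corr.order_id
when matched then
    update set target.amount = corr.corrected_amount;
```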

Reporting and Visualization

Thanks to the newly designed data marts, dashboards now run faster and consume fewer data resources.

Client Benefits

Key Results

  • Reduced data refresh time from ~8 hours to 2 hours.
  • Lowered data infrastructure maintenance costs by ~60% through BigQuery optimizations.
  • Standardization and better data organization enabled faster execution of repetitive tasks, quicker integration of new data sources, and improved team efficiency.
  • Minimized risk of manual errors through automation and data quality testing.

Business Value

  • Increased operational efficiency for analytical, marketing, and product teams.
  • Improved data infrastructure scalability, enabling further organizational growth.
  • Better cost management, supporting strategic investment decisions.
  • Faster onboarding of new employees into BI projects and greater self-sufficiency for advanced BI users.

Conclusion

The data migration and optimization project for our digital native client in the technology industry is an example of a comprehensive approach to modernizing data consumption models. By implementing DBT, Airflow, and CI/CD, the company significantly improved the efficiency and manageability of its data processes, eliminating technical debt while lowering the barrier to entry for new analysts.

As a result, advanced BI users can now utilize tools more effectively and perform self-service analytics. This project demonstrates how proper data management can deliver measurable benefits at both operational and strategic levels.

Need expert help optimizing your Data Warehouse?

Let’s talk
