Key points
- Polski Światłowód Otwarty inherited incompatible databases from parent companies, which complicated data integration and hindered growth.
- Alterdata designed and built a secure, scalable serverless platform to address legacy issues and create a future-proof data management system.
- The solutions used included containerized applications for integration with data sources in Cloud Run, cost-effective Google Cloud Storage as a data lake, seamless data integration via BigQuery, and Google Cloud Composer with Airflow for data processing management.
- After successful project delivery, PŚO entrusted Alterdata with running its data management platform and reporting under a convenient Data Team as a Service model.
Fiber to the home: leader’s journey towards 6M connections
Polski Światłowód Otwarty (Polish Open Fiber) is the largest wholesale-only open broadband network provider in Poland. It currently reaches 4 million households and aims to reach 6 million connections by 2028.
PŚO was formed from the merger of two major telecom operators: UPC Polska and Play, to provide cutting-edge fiber internet access services. The company continuously expands its network and introduces solutions that allow its partners to start providing services to their customers quickly.
Data Dilemma: inherited efficiency challenges demand strategic solutions
Shortly after its launch, Polski Światłowód Otwarty faced the challenge of managing and integrating multiple data sources, including those inherited from its parent companies. Those included:
- 50 different systems
- 13 systems integrated into a Data Lake (6 of them outside PŚO's control), with 20 data sources in total
- 210 tables, approximately 1 billion rows and over 240 active logical gibibytes of data, with the largest table containing 565,422,508 rows and over 93 active logical gibibytes
PŚO could have created an independent unit within the parent company's on-premise data warehouse or separate data marts, but this wouldn't solve the performance or data quality issues. PŚO decided that building its data warehouse from scratch was the best choice.
Sky’s the limit: PŚO aims at cloud integration
PŚO aimed to gain independence from the outdated, predefined panels and dashboards of its parent organizations. The company wanted to integrate diverse data sources and processing practices using the latest, future-proof cloud technology but lacked the necessary skills. It also needed a single, reliable data source.
Most of all, the company needed seasoned experts to explain available market solutions and recommend the best fit for their business.
“We wanted to move away from legacy data sources in both companies’ systems and build a new data platform based on the latest cloud technologies. We chose Alterdata.io because of their expertise and experience with Google Cloud, as well as with external systems.”
- Aleksander Tomczyk, Product Owner at PŚO
The project delivery timetable was also crucial. PŚO aimed at rapid development from the start, and needed a new reporting and data analysis solution up and running within 3 to 6 months. This further added to the project’s complexity.
Another issue was manpower. The recently founded company preferred a small, agile team, and the tight project timeframe practically ruled out building an in-house data department.
Engineering of success: how to craft a seamless data ecosystem
Developing a solution for PŚO was a multi-faceted project, which involved:
- Data integration and collection
- Data management
- Data security and process compliance with regulations
- Data transformation
- Coordination of the processes that perform these tasks
Our priority was to ensure clean extraction of data from both relational (SQL Server, PostgreSQL, Oracle) and non-relational (MongoDB) databases into a coherent system, with support for various data extraction strategies.
We began by organizing team workshops with the company, to define their main requirements for the new system and analyze data acquisition strategies.
Together, we established the main requirements for the new system:
- Scalability (system, setup and maintenance costs)
- Connectivity
- Smooth integration with various data sources
- Preferably serverless solutions
- High efficiency, which allows for reporting faster than ever before
- Low implementation cost
It was also crucial to establish several data retrieval strategies:
- Incremental (partial data refresh)
- Full refresh (when partial refresh isn't possible)
- Merge (when data has retroactively changed)
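The three strategies above can be illustrated with a minimal, in-memory sketch. It is not the project's implementation: the function names, the `updated_at` cursor column and the `id` key are assumptions made for illustration, and a real pipeline would run the equivalent logic against the warehouse rather than Python lists.

```python
from typing import Dict, List

Row = Dict[str, object]


def full_refresh(target: List[Row], source: List[Row]) -> List[Row]:
    """Discard whatever was loaded and reload everything from the source."""
    return [dict(row) for row in source]


def incremental(target: List[Row], source: List[Row], cursor: str) -> List[Row]:
    """Append only source rows whose cursor value is newer than anything loaded."""
    high_water_mark = max((row[cursor] for row in target), default=None)
    new_rows = [
        dict(row)
        for row in source
        if high_water_mark is None or row[cursor] > high_water_mark
    ]
    return target + new_rows


def merge(target: List[Row], source: List[Row], key: str) -> List[Row]:
    """Upsert by key: changed rows replace old versions, new rows are added."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        merged[row[key]] = dict(row)
    return list(merged.values())


# Hypothetical sample data: row 1 changed retroactively, row 3 is new.
loaded = [
    {"id": 1, "updated_at": "2024-01-01", "status": "planned"},
    {"id": 2, "updated_at": "2024-01-02", "status": "planned"},
]
source = [
    {"id": 1, "updated_at": "2024-01-01", "status": "built"},
    {"id": 2, "updated_at": "2024-01-02", "status": "planned"},
    {"id": 3, "updated_at": "2024-01-03", "status": "planned"},
]

after_incremental = incremental(loaded, source, cursor="updated_at")
after_merge = merge(loaded, source, key="id")
after_full = full_refresh(loaded, source)
```

In this sample, the incremental load picks up the new row but misses the retroactive change to row 1, which is exactly the case the merge strategy exists for.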
After thoroughly assessing the client’s current and future needs, we decided to introduce a coherent data management solution to streamline and standardize data source integration and simplify making changes.
Alterdata solution: streamlined and scalable, next-gen data management
Alterdata implemented an advanced Data Hub architecture based on the Google Cloud ecosystem. This solution enabled independent, precise data management and centralization, ensuring a flexible and scalable, unified data system.
The project’s key element was deploying a Data Lake and an Enterprise Data Warehouse, including building, orchestrating and monitoring ELT processes. This allowed PŚO to efficiently collect, process and analyze data in real time.
Tools of the trade: DevOps, Cloud Run and Google Cloud for PŚO’s data revolution
We followed a DevOps methodology and used Terraform as our Infrastructure as Code tool to automate resource deployment across three environments, ensuring fast, reliable implementations and maintaining operational continuity.
We also built containerized applications tailored to PŚO's needs and deployed them in Cloud Run to integrate with source data. This approach eliminated concerns about scalability, costs, or infrastructure management.
Cloud Run also enabled secure connections through the UPC network to access data sources in the company’s data centers. Cloud Run Jobs, an extension of this service, allowed PŚO to easily import large source tables manually without complex configurations.
Next, we used Google Cloud Storage (GCS) to store pre-processed data, transformed by the Cloud Run application. GCS allowed us to handle large amounts of data without upfront costs and improved data management.
Using BigQuery enabled us to build a Data Warehouse for business analytics and data transformation, improving the data processing workflow. Google Cloud Composer and Airflow allowed us to manage multiple data sources and around 20 systems.
Airflow’s task management and process triggering based on task execution facilitated the management of complex dataflows and their transformation processes. This allows PŚO to effectively manage a diverse and expanding data infrastructure.
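The idea of triggering tasks based on the completion of other tasks can be sketched with Python's standard library alone. This is not Airflow code and not the project's pipeline: the task names below are hypothetical, and `graphlib.TopologicalSorter` stands in for the scheduler to show which tasks become runnable once their upstream tasks have finished.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it waits for.
dependencies = {
    "extract_crm": set(),
    "extract_billing": set(),
    "load_crm": {"extract_crm"},
    "load_billing": {"extract_billing"},
    "transform_customers": {"load_crm", "load_billing"},
    "refresh_report": {"transform_customers"},
}


def run_in_waves(graph):
    """Return groups of tasks in the order they become runnable."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all upstream tasks are done
        waves.append(ready)
        sorter.done(*ready)  # marking tasks done unlocks their downstream tasks
    return waves


waves = run_in_waves(dependencies)
for number, tasks in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(tasks)}")
```

Tasks within one wave do not depend on each other, so an orchestrator is free to run them in parallel.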
We also implemented analytical tools, such as Power BI, and provided user training. This improved data accessibility, brought the benefits of data democratization and sped up report and dashboard generation.
“Alterdata demonstrated excellent expertise, professionalism and an individual approach to the client throughout the entire system implementation process. Its engineering team and specialists were extremely committed to the project, delivering high-quality solutions in line with our expectations and current IT market standards.”
- Aleksander Tomczyk, Product Owner at PŚO
Ready for the future: a scalable data platform that adapts on the fly
Flexibility and scalability
A key feature of our solution for PŚO is its ability to adapt to even the most sudden changes. It integrates new data sources efficiently and is easy to modify as needed.
Event-based approach ready
The platform we used enabled PŚO to fully embrace the event-based approach. Google Cloud Composer sensors detect the desired events as soon as they occur, so the company can analyze them without delay.
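Conceptually, a sensor is a task that keeps checking a condition until it holds and only then lets downstream work start. The sketch below shows that idea in plain Python; it is not Airflow's sensor implementation. The function `wait_for_event` and the simulated file arrival are hypothetical, and the parameter names `poke_interval` and `timeout` simply echo common sensor vocabulary.

```python
import time
from typing import Callable


def wait_for_event(
    poke: Callable[[], bool],
    poke_interval: float = 1.0,
    timeout: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `poke` until it returns True or the timeout budget is used up."""
    waited = 0.0
    while True:
        if poke():
            return True  # event detected, downstream tasks may start
        if waited >= timeout:
            return False  # give up so the failure can be reported
        sleep(poke_interval)
        waited += poke_interval


# Simulated event: a file that "arrives" on the third check.
arrivals = iter([False, False, True])
detected = wait_for_event(lambda: next(arrivals), poke_interval=0.0)
print("event detected:", detected)
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting.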
Easy configuration management and modification
The configuration file is the key element of our solution: it simplifies system management and makes configuration more intuitive and effective. Users have full control over the settings and can quickly adapt to changing business environments.
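A configuration-driven setup of this kind might look roughly like the sketch below. The real file format and field names are not described here, so everything in it is an assumption: the JSON layout, the `sources`, `strategy`, `cursor` and `key` fields, and the `load_config` helper are all hypothetical.

```python
import json

# Hypothetical rule: which extra field each retrieval strategy needs.
REQUIRED_BY_STRATEGY = {
    "incremental": "cursor",  # column used to find new rows
    "merge": "key",           # column used to match changed rows
    "full_refresh": None,     # nothing extra needed
}

# Hypothetical configuration text; in practice this would live in a file.
CONFIG_TEXT = """
{
  "sources": [
    {"name": "crm", "engine": "postgresql", "table": "customers",
     "strategy": "merge", "key": "id"},
    {"name": "billing", "engine": "oracle", "table": "invoices",
     "strategy": "incremental", "cursor": "updated_at"},
    {"name": "inventory", "engine": "mongodb", "table": "assets",
     "strategy": "full_refresh"}
  ]
}
"""


def load_config(text):
    """Parse the configuration and reject entries that cannot be executed."""
    config = json.loads(text)
    for source in config["sources"]:
        strategy = source.get("strategy")
        if strategy not in REQUIRED_BY_STRATEGY:
            raise ValueError(f"{source.get('name')}: unknown strategy {strategy!r}")
        required = REQUIRED_BY_STRATEGY[strategy]
        if required and required not in source:
            raise ValueError(f"{source.get('name')}: {strategy} needs '{required}'")
    return config["sources"]


sources = load_config(CONFIG_TEXT)
for source in sources:
    print(source["name"], "->", source["strategy"])
```

Validating the file up front means a new source can be added by editing configuration only, and a mistake surfaces before any pipeline runs.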
Large-file support
The new PŚO solution is optimized for handling large files and fast processing of huge volumes of data without performance loss. This ensures uninterrupted data analysis, even with the largest data loads.
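How large loads are handled internally is not detailed here. One common technique, shown below as an assumption rather than as the project's method, is to stream rows in fixed-size chunks so that memory use stays flat regardless of table size. The helper `in_chunks` and the stand-in `fake_source` generator are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def in_chunks(rows: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `chunk_size` rows without loading the whole input."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def fake_source(row_count: int) -> Iterator[dict]:
    """Stand-in for a database cursor that produces rows lazily."""
    for row_id in range(row_count):
        yield {"id": row_id}


chunk_sizes = [len(chunk) for chunk in in_chunks(fake_source(10), chunk_size=4)]
print(chunk_sizes)
```

Each chunk could then be written out as a separate object, which also makes a failed transfer cheaper to retry.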
DAG Generator
The platform we implemented generates dependency graphs (DAGs, directed acyclic graphs) and supports sensor-based orchestration, making process configuration simpler and more effective. This speeds up and clarifies the design and management of data flows.
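The sketch below illustrates what generating a dependency graph from declarative source definitions can look like. It is a simplified stand-in, not the platform's generator: the `generate_dag` function, the task naming scheme, the `wait_for` field and the final `transform_warehouse` task are all assumptions. It produces a plain mapping of tasks to their upstream tasks and uses the standard library to confirm the result is a valid, acyclic ordering.

```python
from graphlib import TopologicalSorter


def generate_dag(sources, final_task="transform_warehouse"):
    """Build a {task: set of upstream tasks} mapping from source definitions."""
    graph = {final_task: set()}
    for source in sources:
        name = source["name"]
        extract, load = f"extract_{name}", f"load_{name}"
        upstream = set()
        if source.get("wait_for"):
            # Sources gated on an event get a sensor task in front of extraction.
            sensor = f"sensor_{name}"
            graph[sensor] = set()
            upstream.add(sensor)
        graph[extract] = upstream
        graph[load] = {extract}
        graph[final_task].add(load)  # transformation waits for every load
    return graph


# Hypothetical source definitions.
source_definitions = [
    {"name": "crm", "wait_for": "export_finished"},
    {"name": "billing"},
]

dag = generate_dag(source_definitions)
# static_order() raises graphlib.CycleError if the generated graph is not acyclic.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Adding a source to the definitions adds its sensor, extract and load tasks to the graph without anyone writing orchestration code by hand.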
“For us, cooperation with Alterdata was not only effective but also inspiring. We recommend Alterdata as a reliable and competent partner in corporate data digitization.”
Strategic success: cloud goes up, costs go down, PŚO gains data autonomy
PŚO achieved all expected immediate goals
Alterdata-implemented solutions gave PŚO complete data independence from their parent companies within the required timeframe. They can manage old and new data efficiently and gain real-time, actionable insights to drive their business.
The company now benefits from a flexible and scalable serverless data retrieval system and a unified data source for reliable reporting, which reduces data acquisition time and provides greater insight quality.
The new solution's high efficiency cut data generation time by several hours compared with the systems used previously. This gives PŚO a competitive edge: business users start each working day with access to current information.
Alterdata’s cost-effective solution
Moving to Google Cloud relieved the IT team of the high maintenance and management costs of the parent companies’ on-site infrastructure. Already familiar with this data management solution, PŚO could hit the ground running and was satisfied with Google's customer support and overall experience.
The client gained long-term, strategic advantages
After successful transformation, the flexible and scalable platform, with its key features, is ready for future challenges. Its new infrastructure can now quickly adapt to accommodate more data and new data sources, allowing PŚO to make better business decisions.