Process & Story
Boussias is a leading B2B media and events company operating in Greece and Cyprus. Since 1980, the organization has been setting industry standards in trade publishing, conferences, and awards. With a growing footprint and an increasingly digital operation, Boussias needed a modern, scalable approach to data management.
Boussias faced a familiar data dilemma: disconnected systems, manual processes, and limited access to insights. Beyond solving immediate operational inefficiencies, the company wanted to future-proof its data infrastructure to support advanced analytics and AI initiatives.
Problem
The core issues included:
- Data scattered across multiple formats, including flat files, Excel sheets, databases, real-time streaming data, and APIs
- Data silos across departments
- Heavy reliance on spreadsheets
- Manual reconciliation of entities
- Fragmented data sources with no clear lineage
- Lack of a unified, robust foundation to enable AI-driven projects and data innovation
The company needed more than a one-off solution. It needed a data infrastructure that could grow with the business, be managed by an internal team in the future, and integrate seamlessly with existing systems.
Solution
We designed and implemented a modern, scalable data lake architecture for Boussias, built on Google Cloud Platform, to unify fragmented data sources, streamline data operations, and lay a strong foundation for analytics and future AI initiatives.
Outcomes
- Key data sources integrated: Salesforce (CRM) and MariaDB (eCommerce data) connected to the centralized data lake
- Over 3 million data rows migrated and ready for analysis and reporting
- 1,000+ structured tables established across medallion architecture layers
- End-to-end pipeline delivered, from ingestion with Fivetran to business intelligence exposure via Metabase, all in just 3 months
- Full knowledge transfer completed within one week, enabling seamless handover to the internal team
Cloud-native, scalable data infrastructure
At the core of the solution was a robust and cost-effective infrastructure based on Google Cloud Platform. We used BigQuery as the central data warehouse to ingest and store structured, semi-structured, and unstructured data. Because BigQuery is serverless, it scales on demand, so the business can process millions of records efficiently without managing infrastructure.
Within 3 months, Boussias went from disconnected data sources to a production-ready system ingesting two critical data sources:
- Salesforce, the CRM platform
- eCommerce MariaDB database, housing transactional data
Every layer of the platform, from ingestion to analytics, was fully automated and prepared for additional sources to be plugged in. The solution migrated over 3 million records across 1,000+ structured tables, laying the groundwork for straightforward expansion as new data sources are added.
Layered medallion architecture for clear data governance
To ensure clarity and traceability, we implemented a medallion architecture with bronze (raw), silver (cleaned), and gold (business-ready) layers. This approach provided a systematic pipeline for refining and promoting data across stages, significantly reducing manual data wrangling.
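As an illustration only, the bronze-to-silver-to-gold promotion described above can be sketched in plain Python. The sample rows, field names, and the functions `to_silver` and `to_gold` are hypothetical and are not taken from the Boussias project, where the layers live in BigQuery tables rather than in-memory lists.

```python
# Hypothetical raw rows as they might land in the bronze layer (illustrative data).
BRONZE_ORDERS = [
    {"order_id": "1001", "email": " Ana@Example.com ", "amount": "120.50"},
    {"order_id": "1002", "email": "nikos@example.com", "amount": "80"},
    {"order_id": "1002", "email": "nikos@example.com", "amount": "80"},  # duplicate
    {"order_id": "1003", "email": "", "amount": "not-a-number"},  # unusable
]


def to_silver(rows):
    """Bronze -> silver: parse types, normalize emails, drop bad and duplicate rows."""
    seen_ids = set()
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # amount cannot be parsed, so the row stays in bronze only
        email = row["email"].strip().lower()
        if not email or row["order_id"] in seen_ids:
            continue  # missing email or an order already promoted
        seen_ids.add(row["order_id"])
        silver.append(
            {"order_id": int(row["order_id"]), "email": email, "amount": amount}
        )
    return silver


def to_gold(silver_rows):
    """Silver -> gold: aggregate to a business-ready view (revenue per customer)."""
    totals = {}
    for row in silver_rows:
        totals[row["email"]] = totals.get(row["email"], 0.0) + row["amount"]
    return totals


silver_orders = to_silver(BRONZE_ORDERS)
gold_revenue = to_gold(silver_orders)
print(gold_revenue)
```

The point of the layering is that each stage has one job: bronze keeps the raw record untouched, silver applies cleaning rules once, and gold only aggregates data that has already been cleaned.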
We incorporated CI/CD pipelines to deploy transformations and models safely and efficiently, enabling iterative development while ensuring data quality through built-in validation checks.
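A validation check that gates promotion between layers might look like the following minimal sketch. The helper names `validate` and `promote`, the rules they apply, and the sample batches are assumptions made for illustration; they do not reproduce the checks used in the actual pipeline.

```python
def validate(rows, required_fields, unique_key):
    """Return a list of readable problems; an empty list means the batch passes."""
    problems = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # Rule 1: every required field must be present and non-empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing {field}")
        # Rule 2: the key column must be unique within the batch.
        key = row.get(unique_key)
        if key in seen_keys:
            problems.append(f"row {index}: duplicate {unique_key}={key}")
        seen_keys.add(key)
    return problems


def promote(rows, required_fields, unique_key):
    """Promote a batch to the next layer only if it passes validation."""
    problems = validate(rows, required_fields, unique_key)
    if problems:
        raise ValueError("; ".join(problems))
    return list(rows)


good_batch = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
bad_batch = [{"id": 1, "name": ""}, {"id": 1, "name": "c"}]

promoted = promote(good_batch, ["id", "name"], "id")
bad_problems = validate(bad_batch, ["id", "name"], "id")
```

Raising an error rather than silently dropping rows means a CI/CD run fails visibly when a transformation produces invalid data, which is the behavior a deployment gate usually needs.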
Automated ingestion from diverse sources
Using Fivetran, we connected the initial core systems (Salesforce and an eCommerce MariaDB database) to automate data ingestion. This eliminated the need for manual exports and provided a real-time, always-updated data pipeline that could be easily extended to additional sources like Excel files, APIs, and streaming data.
This automation allowed Boussias to shift from time-consuming, spreadsheet-based workflows to instantaneous, reliable access to unified data.
Powerful transformation and modeling framework
Data cleansing, augmentation, and transformation were managed via SQLMesh, supported by Python for advanced data logic. We implemented logic to:
- Clean inconsistent data at the source
- Enrich records with contextual metadata
- Automatically promote and validate datasets through transformation stages
- Monitor lineage and manage schema evolution seamlessly
This framework made the entire pipeline transparent, auditable, and easy to evolve for future needs.
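To make the enrichment and schema-evolution steps listed above more concrete, here is a small sketch in plain Python. It deliberately does not use the SQLMesh API; the function names `enrich` and `align_to_schema`, the metadata columns `_source` and `_loaded_at`, and the sample record and timestamp are all hypothetical.

```python
from datetime import datetime, timezone


def enrich(record, source, loaded_at):
    """Return a copy of the record with contextual metadata columns added."""
    enriched = dict(record)  # copy so the original record is left unchanged
    enriched["_source"] = source  # which upstream system the row came from
    enriched["_loaded_at"] = loaded_at.isoformat()  # when the row was ingested
    return enriched


def align_to_schema(record, columns):
    """Handle simple schema evolution.

    Columns missing from an older record are filled with None, and fields
    that are not part of the target schema are dropped.
    """
    return {column: record.get(column) for column in columns}


# Illustrative values only.
loaded_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
raw_contact = {"id": 7, "email": "maria@example.com", "legacy_flag": "x"}

enriched_contact = enrich(raw_contact, "salesforce", loaded_at)
target_columns = ["id", "email", "phone", "_source", "_loaded_at"]
aligned_contact = align_to_schema(enriched_contact, target_columns)
print(aligned_contact)
```

Carrying the source and load time on every row is one simple way to keep lineage visible downstream, since any figure in a report can be traced back to the system and load that produced it.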
Business Intelligence for all
To empower non-technical users, we integrated Metabase, an open-source BI platform. Business users could instantly explore curated datasets, create dashboards, and build reports, with no SQL or engineering support required. The platform became the single source of truth for insights across departments.