Data Product Thinking: Treating Data as a Product in a Data Mesh Environment

A significant shift is underway in the data-driven landscape of the modern business world. Instead of seeing data as a by-product of business processes, forward-thinking organizations are now embracing Data Product Thinking, fundamentally reorienting their perspective to treat data as a product. Spurred by the revolutionary Data Mesh approach, this paradigm shift is dramatically reshaping how businesses create, manage, and utilize their data.

At its core, Data Product Thinking encapsulates the idea that data, like any other product, should be designed, created, and managed to meet the needs of its data consumers. From raw data harvested by data engineers to the sophisticated data products developed and deployed by data product managers and developers, every element in the data lifecycle serves a purpose and brings value to the business.

But this isn’t just about managing databases, data pipelines, or ensuring data quality. It’s about a profound shift in data management, moving away from monolithic data warehouses to a distributed, domain-oriented data mesh architecture. Organizations can turn their data into a strategic tool that drives business success and competitive advantage by creating reusable data assets and products that cater to specific business needs.

In this article, we’ll dive deep into the concept of treating data as a product within a Data Mesh environment, the roles involved, and how it’s influencing the future of data management. This approach to data is not just a passing trend – it’s the future of data infrastructure and a key driver of business value. So, let’s explore the new world of data products and the benefits they can bring to your business.

TL;DR

• Data as a Product is an approach that transforms the way organizations view and handle data, treating it as a standalone, valuable product rather than just a by-product of operations. This requires the data to be self-describing, discoverable, trustworthy, and secure.

• This methodology is instrumental in the Data Mesh architecture, which decentralizes data ownership to domain teams. It promotes greater agility, scalability, and adaptability in data management.

• Data product managers/owners, as a part of domain teams, act as a vital bridge between data and domain experts. Their understanding of the product and its associated data is paramount to successful data product development.

• The lifecycle of data products in a Data Mesh involves creation and development stages. During creation, data contracts are formulated and new data is added to an enterprise product catalog. In the development phase, connectors to the self-serve data platform are built.

• Ready to leverage the Data Mesh architecture? Contact nexocode’s data consultants today for tailored strategies to navigate your data management transformation journey.

Key Principles of Data Mesh - A Quick Recap

In our constantly evolving business landscape, the demand for high-quality, actionable data has never been greater. Central to this new wave of data-focused strategies is the Data Mesh. This innovative approach seeks to redefine how businesses handle their data management, shifting from a centralized model to a more distributed, domain-focused one.

Data Mesh is built around four fundamental principles:

Domain-Oriented Decentralized Data Ownership and Architecture

The first principle of Data Mesh asserts that data ownership should reside with the specific domain teams that best understand and utilize the data. This approach ensures that the teams responsible for the data products are those who are most familiar with the data sources and their value.

Data as a Product

This is where treating data as a product comes into play. Each domain team is responsible for the full lifecycle of their data product, from inception to retirement. This brings about a shift in mindset where data isn’t just a by-product of operations but is considered a standalone product with its own intrinsic value.

Self-Serve Data Infrastructure as a Platform

This principle emphasizes that a data infrastructure should be designed to be self-serve for data consumers, data analysts, and data scientists. This ensures the accessibility of data and enables domain teams to manage their data products independently.

Federated Computational Governance

Data quality, security, and privacy governance are shared across the domain data teams within the federated data governance model, ensuring a high level of data quality and accountability in the Data Mesh.

These principles embody the core tenets of the Data Mesh, revolutionizing how organizations view, treat, and manage their data. In the following sections, we delve into one of these principles — treating data as a product — and its transformative impact on how organizations think about and interact with their data.

Data mesh principles

Embracing Data Product Thinking

Central to the Data Mesh approach is the concept of Data Product Thinking. It’s a perspective that redefines the way data teams view, manage, and interact with their data assets. By treating data as a product, organizations can optimize their data management strategies, aligning their data with their business objectives more efficiently and effectively.

What Does it Mean to Treat Data as a Product?

Treating data as a product implies that data isn’t merely an output of operations, but a standalone, valuable asset that can create business value and competitive advantage. This shift in perspective means that data must have defined quality standards, a lifecycle, and a dedicated team for its development and maintenance — namely, the data product team. Each data product is designed to serve the needs of specific data consumers, ensuring that the data is not just available but valuable, usable, and fit for purpose.

Data Products vs. Data as a Product

The terms data product and data as a product may sound similar, but there’s a critical distinction. A data product is often a well-defined output that serves specific use cases, like a report, a dashboard, or a dataset used to train machine learning algorithms. On the other hand, data as a product is a broader concept that encapsulates the entire journey of data — from raw data to a refined, valuable asset. This concept emphasizes the lifecycle of data, the people involved (data product managers, data engineers, data analysts, etc.), and the processes (like data pipelines) that transform raw data into a valuable data product.

Why Is Data as a Product an Essential Concept in Today’s Data-Driven Landscape?

The concept of data as a product aligns perfectly with today’s data-driven landscape. As businesses become more reliant on data for their decision-making processes, treating data as a valuable asset rather than just a by-product of operations can lead to more meaningful insights and better business decisions.

With the Data Mesh architecture, data is decentralized and owned by domain-focused teams who know best how to use and maintain their data. By viewing their data as a product, these teams can ensure data quality, develop data products that truly meet the needs of their consumers, and continuously adapt and improve their data products in line with changing business requirements and objectives.

This shift in perspective doesn’t just lead to improved data management — it’s a crucial part of gaining a competitive advantage in our increasingly data-driven world.

Traits of Successful Data Products

Key traits of successful data products

• Discoverable: Successful data products are easily discoverable. They provide information about their existence, purpose, owner, and key metrics. They exist in an environment that is constantly evolving, so they share their source of origin, real-time information such as timeliness and quality metrics, as well as other crucial information including top use cases and applications enabled by their data. Discoverability allows data users to confidently search, find, and use the data they need.
• Understandable: Once discovered, data users must be able to understand the data product. This means comprehending the semantics and syntax of the data it presents. It also involves grasping the relationship between entities and adjacent data products. For a data product to be usable, data users should understand how the data is presented and serialized, and how they can access and query it. All of this is supplemented with sample datasets and example consumer code.
• Trustworthy: The trustworthiness of a data product is pivotal for its successful use. Data users need to know that the data product is truthful, meaning that it represents the reality of events and business facts. To instill trust, data products must clearly communicate their service level objectives (SLOs) and guarantee their fulfillment. Aspects like timeliness, completeness, data lineage, and operational qualities contribute to the trustworthiness of the data.
• Addressable: Successful data products provide a unique and permanent address that users can access either programmatically or manually. This addressing system must adapt to the continuous changes in data and the mesh topology, including schema evolution, new data time slices, newly supported syntaxes, and changing run-time behavioral information.
• Interoperable and Composable: Effective data products must standardize certain elements like field types, identifiers, global addresses, metadata fields, and schema linking to facilitate interoperability and composability. These standardizations enable users to link data across domains easily and compose them in insightful ways.
• Natively Accessible: The spectrum of data user personas necessitates that data products be natively accessible. This means that data analysts, data scientists, or analytical application developers should be able to access and use data products with their preferred tools and methods.
• Valuable on Its Own: A successful data product is valuable on its own. It carries a dataset that holds inherent value for the data users, contributing to business growth and customer satisfaction. It’s not merely the value derived from its correlation with other data products, but the value it carries as a standalone product.

Data product thinking - Venn Diagram

Each of these traits contributes to a holistic and user-centric approach to designing data products, ensuring they meet the needs of the data users while adhering to the overarching organizational goals. They are integral in forming the foundation of a robust, effective, and user-friendly data product.
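To make these traits a little more concrete, here is a minimal Python sketch of how a team might record catalog metadata for a data product and flag which traits are not yet backed by any metadata. The `DataProductDescriptor` class, its field names, and the mapping from fields to traits are all assumptions made for illustration; they are not part of any Data Mesh standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    """Hypothetical catalog metadata for one data product."""
    name: str
    owner: str = ""
    description: str = ""
    address: str = ""                                   # unique, stable URI
    slos: dict = field(default_factory=dict)            # e.g. {"freshness_hours": 24}
    schema: dict = field(default_factory=dict)          # field name -> type name
    sample_query: str = ""                              # example consumer code
    output_formats: list = field(default_factory=list)  # e.g. ["parquet", "sql"]


def missing_traits(p: DataProductDescriptor) -> list:
    """Return the traits whose supporting metadata is absent (assumed mapping)."""
    checks = {
        "discoverable": bool(p.owner and p.description),
        "understandable": bool(p.schema and p.sample_query),
        "trustworthy": bool(p.slos),
        "addressable": bool(p.address),
        "natively accessible": bool(p.output_formats),
    }
    return [trait for trait, ok in checks.items() if not ok]


# A draft entry that only names an owner and a description so far.
draft = DataProductDescriptor(
    name="Example Orders",  # hypothetical product name
    owner="Sales Domain",
    description="Cleaned order events for analytics.",
)
gaps = missing_traits(draft)
print(gaps)
```

A check like this only confirms that the metadata exists. Whether a product is truly trustworthy or valuable on its own still depends on the quality of the data and on how well the domain team maintains it.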

Evolving Roles: Data Product Managers/Owners

The role of Data Product Managers (DPMs) or Data Product Owners has gained significance in the new data-oriented business environment. These individuals are key figures in developing, managing, and improving data products, playing a crucial part in the interface between domain experts, data scientists, data engineers, and business analysts.

As an integral part of the domain team, DPMs work closely with domain and data experts to transform business needs into data requirements and to ensure that these requirements are met. Their goal is to provide data products that are not only compliant with FAIR principles (Findability, Accessibility, Interoperability, and Reusability) but also bring measurable value to the business.

The success of an organization’s data-centric approach will rely heavily on the effectiveness of DPMs in deriving value from data products, underlining the critical importance of their role in the future of data-driven businesses.

The Lifecycle of Data Products in a Data Mesh Environment

The lifecycle of data products in a Data Mesh environment begins with their creation, where raw data is transformed into valuable assets. This process combines careful prioritization and planning, detailed curation, and precise execution, and it forms the foundation for data-driven decision-making and strategic initiatives.

Creation of Data Products: From Raw Data to Valuable Assets

The creation of data products pivots on a series of steps, including data collection, preprocessing, and cleaning. After identifying relevant data sources and structuring the gathered data, it is processed into a suitable format for further analysis. Crucially, a data contract is also developed during this phase, outlining the data usage and handling guidelines. Once created, these assets are added to an enterprise product catalog, enhancing the discoverability of the newly available data.
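As a rough illustration of the creation phase, the sketch below models a data contract and its registration in a catalog. It assumes a simple in-memory `ProductCatalog`; the class names, the contract fields, and the `search` method are hypothetical and stand in for whatever catalog tooling your organization actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: who owns the data and how it may be used."""
    product_id: str
    owner: str
    schema: dict           # field name -> type name
    refresh_schedule: str
    contains_pii: bool


class ProductCatalog:
    """In-memory stand-in for an enterprise data product catalog."""

    def __init__(self):
        self._entries = {}

    def register(self, contract: DataContract) -> None:
        # Product IDs must be unique so that each product stays addressable.
        if contract.product_id in self._entries:
            raise ValueError(f"{contract.product_id} is already registered")
        self._entries[contract.product_id] = contract

    def search(self, field_name: str) -> list:
        """Find product IDs whose schema exposes the given field."""
        return sorted(
            pid for pid, c in self._entries.items() if field_name in c.schema
        )


catalog = ProductCatalog()
catalog.register(DataContract(
    product_id="orders-v1",  # hypothetical ID
    owner="Sales Domain",
    schema={"order_id": "string", "customer_id": "string", "amount": "decimal"},
    refresh_schedule="daily",
    contains_pii=True,
))
matches = catalog.search("customer_id")
print(matches)
```

Registering the contract together with the product is what makes the new data discoverable: consumers can search the catalog by field or owner before they ever touch the data itself.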

Developing Data Products: Data Pipelines and Dataset Instances

With the transformation of raw data into valuable assets complete, the focus shifts to the development of data products. This entails crafting data pipelines (sequential data processing steps) and generating dataset instances, the tangible outputs of these pipelines. Each pipeline is custom-built to fulfill particular business objectives. For organizations employing Apache Kafka as their data mesh backbone, this typically means building or reusing connectors that publish data products to Kafka. The self-serve data platform team is then responsible for supporting anyone who wants to use these data products (data consumers) by providing them with connectors for data consumption.
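The idea of a pipeline producing a dataset instance that is then published for consumers can be sketched in a few lines of Python. Note that `InMemoryBackbone` below is a toy stand-in written for this example, not a Kafka client or connector, and the topic name, step functions, and record fields are likewise assumptions.

```python
from collections import defaultdict


class InMemoryBackbone:
    """Toy stand-in for an event streaming backbone; not a Kafka client."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, record):
        # Deliver the record to every consumer subscribed to the topic.
        for handler in self._subscribers[topic]:
            handler(record)


def run_pipeline(records, steps):
    """Apply each processing step in order; the result is a dataset instance."""
    for step in steps:
        records = step(records)
    return list(records)


def drop_incomplete(records):
    return (r for r in records if r.get("amount") is not None)


def add_amount_cents(records):
    return ({**r, "amount_cents": round(r["amount"] * 100)} for r in records)


raw = [{"order_id": 1, "amount": 19.99}, {"order_id": 2, "amount": None}]
dataset = run_pipeline(raw, [drop_incomplete, add_amount_cents])

backbone = InMemoryBackbone()
received = []  # a consumer that simply collects what it is sent
backbone.subscribe("orders.cleaned", received.append)
for record in dataset:
    backbone.publish("orders.cleaned", record)
print(received)
```

In a real deployment, the publish and subscribe steps would be handled by connectors provided through the self-serve platform, so domain teams can concentrate on the pipeline steps themselves.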

Data products published to the event streaming backbone and data consumers subscribing to them

Data as a Product Examples

To illustrate what a data product looks like within the framework of a Data Mesh, let’s delve into an example: the “Customer Purchase History” dataset from a hypothetical retail company. This data product encompasses valuable information about customer transactions and is a key asset for teams like marketing and sales. The following details provide an overview of what one could find in a data product catalog entry for this specific data product:

Data Product Name: Customer Purchase History

Data Contract:

• Data Product ID: DPH123
• Data Owner: Marketing Domain
• Data Product Manager: Jane Doe (Contact: jane.doe@company.com)
• SLA: Data refreshed daily at 12:00 AM UTC; 99.5% availability
• Data Confidentiality: Contains Personally Identifiable Information (PII), needs to be handled according to GDPR and company’s privacy policies
• Data Quality Checks: Every data ingestion is followed by automated data quality checks including completeness, validity, accuracy, consistency, and uniformity.

Description: This data product includes historical data of all customer transactions across all the company’s retail outlets and online platforms. It consists of individual transaction data, payment method, basket size, timestamp, store location, and product details. It’s primarily used by the marketing, sales, and strategy teams for customer segmentation, sales prediction, and personalized marketing campaigns.

Technical Information:

• Data Format: Parquet
• Data Size: ~500 GB updated daily
• API Access: Yes
• Access Endpoint: https://api.company.com/data/dph123
• Data Dictionary: Available in attached document

Usage: To access the data product, users can connect via the provided API endpoint or download directly in the preferred format. Users can filter data by various attributes such as date range, store location, product category, etc.

Related Documentation: Link to API Documentation, Data Dictionary, Usage Guidelines, GDPR compliance details

Version: v2.0.3

The listing in the data product catalog for this dataset would include all of this information. This would allow potential users to understand what the data product contains, who is responsible for it, how to access it, what the SLA is, and how to use it properly. It also clarifies data privacy expectations, given the presence of PII in the dataset.
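The catalog entry mentions automated quality checks after every ingestion. Below is a minimal sketch of what two of them, completeness and validity, could look like for this dataset. The field names, the set of allowed payment methods, and the sample records are assumptions made for the example; the catalog entry above does not specify them.

```python
from datetime import datetime

# Assumed field names for the purchase history records.
REQUIRED_FIELDS = ("transaction_id", "timestamp", "store_location", "payment_method")
ALLOWED_PAYMENT_METHODS = {"card", "cash", "voucher"}  # assumed value set


def completeness(records):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS) for r in records
    )
    return complete / len(records)


def is_valid(record):
    """Validity: the timestamp parses as ISO 8601 and the payment method is known."""
    try:
        datetime.fromisoformat(record["timestamp"])
    except (KeyError, TypeError, ValueError):
        return False
    return record.get("payment_method") in ALLOWED_PAYMENT_METHODS


records = [
    {"transaction_id": "t1", "timestamp": "2024-01-05T10:15:00",
     "store_location": "Store 12", "payment_method": "card"},
    {"transaction_id": "t2", "timestamp": "not-a-date",
     "store_location": "Online", "payment_method": "cash"},
    {"transaction_id": "t3", "timestamp": "2024-01-05T11:00:00",
     "store_location": "", "payment_method": "crypto"},
]
completeness_ratio = completeness(records)
validity_flags = [is_valid(r) for r in records]
print(completeness_ratio, validity_flags)
```

Publishing the results of such checks alongside the data product is one way to support the trustworthiness trait, since consumers can see whether the promised quality level is actually being met.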

In a streaming data context, a fitting example might be a “Real-Time Inventory Status” data product from the same retail company. This data product could provide near real-time updates about the availability of each product in each store or warehouse. Here’s how this data product might be described in the data product catalog:

Title: Real-Time Inventory Status

Description: The Real-Time Inventory Status data product provides near real-time updates about the availability of each product in each store or warehouse. It allows various departments across the organization to track product availability and make data-driven decisions.

Domain: Supply Chain and Inventory Management
Domain Team: Warehouse Operations Team
Data Product Manager: Jane Doe
Data Steward: John Smith
Data Source(s): Warehouse Management Systems, In-store Point of Sales Systems

Technical Information:

• Data Contract: An agreed format of inventory status messages including Product ID, Store ID, Current Quantity, Last Updated Timestamp, etc.
• Data Platform: Apache Kafka (with Kafka Connect for sourcing data, and Kafka Streams or KSQL for processing it)
• Data Frequency: Near real-time, with updates every time an inventory change event occurs (e.g., purchase, return, restocking)
• Data Quality Metrics: Metrics to track the freshness of the data (time from event occurrence to availability in Kafka), accuracy (comparing against periodic physical inventory counts), etc.

Usage: This data product can be consumed by multiple teams across the organization. For example, the Sales Team for monitoring product availability, the Marketing Team for planning campaigns based on product availability, the Supply Chain Team for better inventory planning and so on.

Related Documentation: [Link to technical documentation about how to consume from Kafka], [Link to business documentation about the meaning of each field in the data contract]

Access and Security: All access to this data product will require appropriate authentication and authorization. Strict GDPR regulations and privacy laws will be adhered to, ensuring that no sensitive information is leaked.

Data Discoverability: This data product will be discoverable in the central self-serve data platform’s catalog.

Version: 1.0.0 (with clear documentation of what changes with each version)

Here, data is not only generated by daily operations, but it is also continuously streamed to Apache Kafka in near real-time. This streaming data is then made available to various teams for timely and data-driven decision-making.
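To give a feel for the data contract and the freshness metric described above, here is a small Python sketch of an inventory status message and a freshness calculation. The `InventoryStatus` class, its JSON encoding, and the sample values are assumptions for illustration; a real contract would more likely be enforced through a schema format managed on the platform.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class InventoryStatus:
    """Hypothetical message shape based on the fields named in the contract."""
    product_id: str
    store_id: str
    current_quantity: int
    last_updated: str  # ISO 8601 timestamp of the inventory change event

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "InventoryStatus":
        data = json.loads(payload)
        if data["current_quantity"] < 0:
            raise ValueError("current_quantity must not be negative")
        return cls(**data)


def freshness_seconds(message: InventoryStatus, available_at: datetime) -> float:
    """Seconds between the event and the moment it became available to consumers."""
    event_time = datetime.fromisoformat(message.last_updated)
    return (available_at - event_time).total_seconds()


msg = InventoryStatus("P-100", "S-7", 42, "2024-01-05T10:15:00+00:00")
round_trip = InventoryStatus.from_json(msg.to_json())
lag = freshness_seconds(
    round_trip, datetime(2024, 1, 5, 10, 15, 3, tzinfo=timezone.utc)
)
print(round_trip, lag)
```

Tracking the lag per message makes the freshness metric measurable, so the team can compare it against whatever service level objective they publish for the product.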

The Future of Data Management with Data as a Product

As we move further into the era of digital transformation, the concept of Data as a Product emerges as a powerful paradigm. It represents a significant shift from the traditional, monolithic data management approach, granting organizations the ability to scale and adapt quickly in the data-centric business environment. By embodying a decentralized, product-oriented model, the data mesh architecture unlocks the potential to treat data as valuable, standalone products that serve specific business needs, are owned by domain teams, are delivered through self-serve data platforms, and are governed under a federated model.

With the application of data product thinking, your organization can embrace a more agile, robust, and efficient way of leveraging data. It paves the way for a future where every stakeholder can discover, understand, trust, and use data autonomously to drive actionable insights and impactful results.

Transitioning towards a Data as a Product mindset may require rethinking your current data strategies and structures. If you’re considering this shift, nexocode’s data engineering experts are ready to guide your journey. With deep experience in data product management and data mesh implementation, we can help you craft and execute a strategy tailored to your organization’s unique requirements.

To explore more about how your organization can benefit from this approach, contact nexocode’s data engineering experts. The future of data management is here, and it’s more promising than ever.

What is meant by "Data as a Product"? "Data as a Product" is a concept where data is treated as a standalone, valuable asset rather than just an output of business operations. It requires the data to be self-describing, discoverable, secure, and trustworthy.
How does the Data as a Product approach benefit businesses? This approach benefits businesses by making data more manageable, useful, and efficient. It promotes interoperability, domain orientation, self-serve access, and decentralized governance, making it easier for different teams to utilize the data.
What is the role of a Data Product Manager in a Data Mesh architecture? In a Data Mesh architecture, a Data Product Manager acts as a bridge between data and domain experts, guiding the development and usage of data products. They are part of the domain team and have an intimate understanding of the product and its associated data.

About the author

Dorota Owczarek

AI Product Lead & Design Thinking Facilitator

With over ten years of professional experience in designing and developing software, Dorota is quick to recognize the best ways to serve users and stakeholders by shaping strategies and ensuring their execution by working closely with engineering and design teams.
She acts as a Product Leader, covering the ongoing AI agile development processes and operationalizing AI throughout the business.
