Data Product Thinking: Treating Data as a Product in a Data Mesh Environment

Dorota Owczarek - August 21, 2023

A significant shift is underway in the data-driven landscape of the modern business world. Instead of seeing data as a by-product of business processes, forward-thinking organizations are now embracing Data Product Thinking, fundamentally reorienting their perspective to treat data as a product. Spurred by the revolutionary Data Mesh approach, this paradigm shift is dramatically reshaping how businesses create, manage, and utilize their data.

At its core, Data Product Thinking encapsulates the idea that data, like any other product, should be designed, created, and managed to meet the needs of its data consumers. From raw data harvested by data engineers to the sophisticated data products developed and deployed by data product managers and developers, every element in the data lifecycle serves a purpose and brings value to the business.

But this isn’t just about managing databases, data pipelines, or ensuring data quality. It’s about a profound shift in data management, moving away from monolithic data warehouses to a distributed, domain-oriented data mesh architecture. Organizations can turn their data into a strategic tool that drives business success and competitive advantage by creating reusable data assets and products that cater to specific business needs.

In this article, we’ll dive deep into the concept of treating data as a product within a Data Mesh environment, the roles involved, and how it’s influencing the future of data management. This approach to data is not just a passing trend – it’s the future of data infrastructure and a key driver of business value. So, let’s explore the new world of data products and the benefits they can bring to your business.

TL;DR

Data as a Product is an approach that transforms the way organizations view and handle data, treating it as a standalone, valuable product rather than just a by-product of operations. This requires the data to be self-describing, discoverable, trustworthy, and secure.

This methodology is instrumental in the Data Mesh architecture, which decentralizes data ownership to domain teams. It promotes greater agility, scalability, and adaptability in data management.

Data product managers/owners, as a part of domain teams, act as a vital bridge between data and domain experts. Their understanding of the product and its associated data is paramount to successful data product development.

The lifecycle of data products in a Data Mesh involves creation and development stages. During creation, data contracts are formulated and new data is added to an enterprise product catalog. In the development phase, connectors to the self-serve data platform are built.

Ready to leverage the Data Mesh architecture? Contact nexocode’s data consultants today for tailored strategies to navigate your data management transformation journey.

Key Principles of Data Mesh - A Quick Recap

In our constantly evolving business landscape, the demand for high-quality, actionable data has never been greater. Central to this new wave of data-focused strategies is the Data Mesh. This approach redefines how businesses handle their data management, shifting from a centralized model to a more distributed, domain-focused one.

Data Mesh is built around four fundamental principles:

Domain-Oriented Decentralized Data Ownership and Architecture

The first principle of Data Mesh asserts that data ownership should reside with the specific domain teams that best understand and utilize the data. This approach ensures that the teams responsible for the data products are those who are most familiar with the data sources and their value.

Data as a Product

This is where treating data as a product comes into play. Each domain team is responsible for the full lifecycle of their data product, from inception to retirement. This brings about a shift in mindset where data isn’t just a by-product of operations but is considered a standalone product with its own intrinsic value.

Self-Serve Data Infrastructure as a Platform

This principle emphasizes that a data infrastructure should be designed to be self-serve for data consumers, data analysts, and data scientists. This ensures the accessibility of data and enables domain teams to manage their data products independently.

Federated Computational Governance

Data quality, security, and privacy governance are shared across the domain data teams within the federated data governance model, ensuring a high level of data quality and accountability in the Data Mesh.

These principles embody the core tenets of the Data Mesh, revolutionizing how organizations view, treat, and manage their data. In the following sections, we delve into one of these principles — treating data as a product — and its transformative impact on how organizations think about and interact with their data.

Data mesh principles

Embracing Data Product Thinking

Central to the Data Mesh approach is the concept of Data Product Thinking. It’s a perspective that redefines the way data teams view, manage, and interact with their data assets. By treating data as a product, organizations can optimize their data management strategies, aligning their data with their business objectives more efficiently and effectively.

What Does it Mean to Treat Data as a Product?

Treating data as a product implies that data isn’t merely an output of operations, but a standalone, valuable asset that can create business value and competitive advantage. This shift in perspective means that data must have defined quality standards, a lifecycle, and a dedicated team for its development and maintenance — namely, the data product team. Each data product is designed to serve the needs of specific data consumers, ensuring that the data is not just available but valuable, usable, and fit for purpose.

Data Products vs. Data as a Product

The terms data product and data as a product may sound similar, but there’s a critical distinction. A data product is often a well-defined output that serves specific use cases, like a report, a dashboard, or a dataset used to train machine learning algorithms. On the other hand, data as a product is a broader concept that encapsulates the entire journey of data — from raw data to a refined, valuable asset. This concept emphasizes the lifecycle of data, the people involved (data product managers, data engineers, data analysts, etc.), and the processes (like data pipelines) that transform raw data into a valuable data product.

Why Is Data as a Product an Essential Concept in Today’s Data-Driven Landscape?

The concept of data as a product aligns perfectly with today’s data-driven landscape. As businesses become more reliant on data for their decision-making processes, treating data as a valuable asset rather than just a by-product of operations can lead to more meaningful insights and better business decisions.

With the Data Mesh architecture, data is decentralized and owned by domain-focused teams who know best how to use and maintain their data. By viewing their data as a product, these teams can ensure data quality, develop data products that truly meet the needs of their consumers, and continuously adapt and improve their data products in line with changing business requirements and objectives.

This shift in perspective doesn’t just lead to improved data management — it’s a crucial part of gaining a competitive advantage in our increasingly data-driven world.

Traits of Successful Data Products

Key traits of successful data products

  1. Discoverable: Successful data products are easily discoverable. They provide information about their existence, purpose, owner, and key metrics. They exist in an environment that is constantly evolving, so they share their source of origin, real-time information such as timeliness and quality metrics, as well as other crucial information including top use cases and applications enabled by their data. Discoverability allows data users to confidently search, find, and use the data they need.
  2. Understandable: Once discovered, data users must be able to understand the data product. This means comprehending the semantics and syntax of the data it presents. It also involves grasping the relationship between entities and adjacent data products. For a data product to be usable, data users should understand how the data is presented and serialized, and how they can access and query it. All of this is supplemented with sample datasets and example consumer code.
  3. Trustworthy: The trustworthiness of a data product is pivotal for its successful use. Data users need to confidently know that the data product is truthful - that it represents the reality of events and business facts. To instill trust, data products must clearly communicate their service level objectives (SLOs) and guarantee their fulfillment. Aspects like timeliness, completeness, data lineage, and operational qualities contribute to the trustworthiness of the data.
  4. Addressable: Successful data products provide a unique and permanent address that users can access either programmatically or manually. This addressing system must adapt to the continuous changes in data and the mesh topology, including schema evolution, new data time slices, newly supported syntaxes, and changing run-time behavioral information.
  5. Interoperable and Composable: Effective data products must standardize certain elements like field types, identifiers, global addresses, metadata fields, and schema linking to facilitate interoperability and composability. These standardizations enable users to link data across domains easily and compose them in insightful ways.
  6. Natively Accessible: The spectrum of data user personas necessitates that data products be natively accessible. This means that data analysts, data scientists, or analytical application developers should be able to access and use data products with their preferred tools and methods.
  7. Valuable on Its Own: A successful data product is valuable on its own. It carries a dataset that holds inherent value for the data users, contributing to business growth and customer satisfaction. It’s not merely the value derived from its correlation with other data products, but the value it carries as a standalone product.
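One way to make these traits concrete is to record them as fields on a catalog descriptor and check which ones are actually filled in. The sketch below is illustrative only: the `DataProductDescriptor` class, its field names, and the `missing_traits` helper are assumptions made for this example, not part of any Data Mesh standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    """Hypothetical catalog descriptor; all field names are illustrative."""
    name: str
    owner: str = ""
    description: str = ""
    address: str = ""                                    # stable, unique address
    schema: dict = field(default_factory=dict)           # field name -> type
    slos: dict = field(default_factory=dict)             # declared service level objectives
    access_methods: list = field(default_factory=list)   # e.g. ["api", "sql"]


def missing_traits(product: DataProductDescriptor) -> list:
    """Return the traits for which the descriptor carries no evidence."""
    checks = {
        "discoverable": bool(product.owner and product.description),
        "understandable": bool(product.schema),
        "trustworthy": bool(product.slos),
        "addressable": bool(product.address),
        "natively accessible": bool(product.access_methods),
    }
    return [trait for trait, ok in checks.items() if not ok]


draft = DataProductDescriptor(
    name="Customer Purchase History",
    owner="Marketing Domain",
    description="Historical customer transactions",
    schema={"transaction_id": "string", "timestamp": "timestamp"},
)
print(missing_traits(draft))
```

A check like this only shows whether the metadata exists. Whether the product is valuable on its own is a judgment the owning domain team still has to make.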

Data product thinking - Venn Diagram

Each of these traits contributes to a holistic and user-centric approach to designing data products, ensuring they meet the needs of the data users while adhering to the overarching organizational goals. They are integral in forming the foundation of a robust, effective, and user-friendly data product.

Evolving Roles: Data Product Managers/Owners

The role of Data Product Managers (DPMs) or Data Product Owners has gained significance in the new data-oriented business environment. These individuals are key figures in developing, managing, and improving data products, playing a crucial part in the interface between domain experts, data scientists, data engineers, and business analysts.

As an integral part of the domain team, DPMs work closely with domain and data experts to transform business needs into data requirements and to ensure that these requirements are met. Their goal is to provide data products that are not only compliant with FAIR principles (Findability, Accessibility, Interoperability, and Reusability) but also bring measurable value to the business.

The success of an organization’s data-centric approach will rely heavily on the effectiveness of DPMs in deriving value from data products, underlining the critical importance of their role in the future of data-driven businesses.

The Lifecycle of Data Products in a Data Mesh Environment

The lifecycle of data products in a Data Mesh environment begins with their creation, where raw data is transformed into valuable assets. This process combines careful prioritization and planning, detailed curation, and precise execution, and it forms the foundation for data-driven decision-making and strategic initiatives.

Creation of Data Products: From Raw Data to Valuable Assets

The creation of data products pivots on a series of steps, including data collection, preprocessing, and cleaning. After identifying relevant data sources and structuring the gathered data, it is processed into a suitable format for further analysis. Crucially, a data contract is also developed during this phase, outlining the data usage and handling guidelines. Once created, these assets are added to an enterprise product catalog, enhancing the discoverability of the newly available data.
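The creation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `DataContract` fields, the `clean` and `register` helpers, the product ID `purchases-v1`, and the dictionary standing in for an enterprise product catalog are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical data contract: who owns the data and how it may be used."""
    product_id: str
    owner: str
    required_fields: tuple
    usage_guidelines: str


def clean(records, required_fields):
    """Drop records missing a required field and strip whitespace from strings."""
    cleaned = []
    for record in records:
        if any(record.get(f) in (None, "") for f in required_fields):
            continue
        cleaned.append(
            {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        )
    return cleaned


catalog = {}  # stand-in for an enterprise product catalog


def register(contract, catalog):
    """Add the contract to the catalog, refusing duplicate product IDs."""
    if contract.product_id in catalog:
        raise ValueError(f"{contract.product_id} is already registered")
    catalog[contract.product_id] = contract


raw = [
    {"customer_id": " C1 ", "amount": 19.99},
    {"customer_id": "", "amount": 5.00},  # incomplete, will be dropped
]
contract = DataContract(
    product_id="purchases-v1",
    owner="Sales Domain",
    required_fields=("customer_id", "amount"),
    usage_guidelines="Internal analytics only",
)
rows = clean(raw, contract.required_fields)
register(contract, catalog)
```

Keeping the contract immutable (`frozen=True`) reflects the idea that consumers rely on it; a change would be published as a new version rather than edited in place.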

Developing Data Products: Data Pipelines and Dataset Instances

With the transformation of raw data into valuable assets complete, the focus shifts to the development of data products. This entails crafting data pipelines (sequential data processing steps) and generating dataset instances, the tangible outputs of these pipelines. Each pipeline is custom-built to fulfill particular business objectives. For organizations employing Apache Kafka as their data mesh backbone, this typically means building or reusing connectors that publish data products to Kafka. The self-serve data platform team is then responsible for supporting anyone who wants to use these data products (data consumers) by providing them with connectors for data consumption.
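To show the producer and consumer roles without requiring a running broker, the following sketch uses a toy in-memory publish/subscribe class. It is not a Kafka client and does not reflect any Kafka API; `InMemoryBackbone`, the topic name `customer-purchases`, and the event fields are assumptions made for illustration.

```python
from collections import defaultdict


class InMemoryBackbone:
    """Toy stand-in for an event streaming backbone such as Kafka.

    This is NOT a Kafka client; it only mimics topic-based
    publish/subscribe so the two roles are visible.
    """

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every event on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver the event to every handler subscribed to the topic."""
        for handler in self._subscribers[topic]:
            handler(event)


backbone = InMemoryBackbone()
marketing_view = []
sales_view = []

# Two consuming teams subscribe to the same data product independently.
backbone.subscribe("customer-purchases", marketing_view.append)
backbone.subscribe("customer-purchases", sales_view.append)

# The owning domain's pipeline publishes a dataset instance as events.
for event in [{"order": 1, "total": 30.0}, {"order": 2, "total": 12.5}]:
    backbone.publish("customer-purchases", event)
```

The point of the sketch is the decoupling: the publishing domain does not know who consumes its data product, and each consumer builds its own view from the same stream.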

Data products published to the event streaming backbone and data consumers subscribing to them

Data as a Product Examples

To illustrate what a data product looks like within the framework of a Data Mesh, let’s delve into an example: the “Customer Purchase History” dataset from a hypothetical retail company. This data product encompasses valuable information about customer transactions and is a key asset for teams like marketing and sales. The following details provide an overview of what one could find in a data product catalog entry for this specific data product:

Data Product Name: Customer Purchase History

Data Contract:

Data Product ID: DPH123
Data Owner: Marketing Domain
Data Product Manager: Jane Doe (Contact: jane.doe@company.com)
SLA: Data refreshed daily at 12:00 AM UTC; 99.5% availability
Data Confidentiality: Contains Personally Identifiable Information (PII), needs to be handled according to GDPR and company’s privacy policies
Data Quality Checks: Every data ingestion is followed by automated data quality checks including completeness, validity, accuracy, consistency, and uniformity.
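As a rough illustration of what automated quality checks after ingestion might look like, the sketch below computes two of the listed dimensions, completeness and validity, over a small batch. The function names, the sample records, and the 0.9 pass threshold are assumptions for this example; the contract above does not specify them.

```python
def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    return complete / len(records)


def validity(records, field, predicate):
    """Share of records whose value for `field` satisfies `predicate`."""
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r.get(field))) / len(records)


batch = [
    {"transaction_id": "T1", "payment_method": "card", "basket_size": 3},
    {"transaction_id": "T2", "payment_method": "cash", "basket_size": 0},
    {"transaction_id": "T3", "payment_method": "", "basket_size": 2},
    {"transaction_id": "T4", "payment_method": "card", "basket_size": -1},
]

report = {
    "completeness": completeness(batch, ["transaction_id", "payment_method"]),
    "validity": validity(
        batch, "basket_size", lambda v: isinstance(v, int) and v > 0
    ),
}

THRESHOLD = 0.9  # hypothetical pass mark, not taken from the contract
failed = [name for name, score in report.items() if score < THRESHOLD]
```

In practice, a failed check would typically block publication of the batch or flag it in the catalog, so that consumers can see the quality status alongside the data.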

Description: This data product includes historical data of all customer transactions across all the company’s retail outlets and online platforms. It consists of individual transaction data, payment method, basket size, timestamp, store location, and product details. It’s primarily used by the marketing, sales, and strategy teams for customer segmentation, sales prediction, and personalized marketing campaigns.

Technical Information:

Data Format: Parquet
Data Size: ~500 GB updated daily
API Access: Yes
Access Endpoint: https://api.company.com/data/dph123
Data Dictionary: Available in attached document

Usage: To access the data product, users can connect via the provided API endpoint or download directly in the preferred format. Users can filter data by various attributes such as date range, store location, product category, etc.
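A consumer might build a filtered request against the access endpoint roughly as follows. The endpoint comes from the catalog entry above, but the query parameter names (`start_date`, `end_date`, `store_location`, `product_category`) are assumptions; the real names would be defined in the API documentation. The sketch only constructs the URL and sends no request.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.company.com/data/dph123"  # endpoint from the catalog entry


def build_query_url(start_date, end_date, store_location=None, product_category=None):
    """Build a filtered request URL; the parameter names are hypothetical."""
    params = {"start_date": start_date, "end_date": end_date}
    if store_location:
        params["store_location"] = store_location
    if product_category:
        params["product_category"] = product_category
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url("2023-01-01", "2023-01-31", store_location="New York")
print(url)
```

Because the dataset contains PII, a real request would also carry credentials and be subject to the authorization rules set out in the data contract.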

Related Documentation: Link to API Documentation, Data Dictionary, Usage Guidelines, GDPR compliance details

Version: v2.0.3

The listing in the data product catalog for this dataset would include all of this information. This would allow potential users to understand what the data product contains, who is responsible for it, how to access it, what the SLA is, and how to use it properly. It also clarifies data privacy expectations, given the presence of PII in the dataset.
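To make the availability figure in the SLA tangible, it helps to convert it into a downtime budget. The short calculation below is a sketch; the helper name and the choice of a 30-day month and a 365-day year as reference periods are assumptions.

```python
def allowed_downtime_hours(availability_pct, period_hours):
    """Maximum downtime, in hours, that still meets the availability target."""
    return round((100 - availability_pct) / 100 * period_hours, 2)


per_month = allowed_downtime_hours(99.5, 30 * 24)   # 30-day month
per_year = allowed_downtime_hours(99.5, 365 * 24)   # 365-day year
print(per_month, per_year)  # 3.6 43.8
```

In other words, a 99.5% availability target leaves the data product team about 3.6 hours of downtime per month before the SLA is breached.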

In a streaming data context, a fitting example might be a “Real-Time Inventory Status” data product from the same retail company. This data product could provide near real-time updates about the availability of each product in each store or warehouse. Here’s how this data product might be described in the data product catalog:

Title: Real-Time Inventory Status

Description: The Real-Time Inventory Status data product provides near real-time updates about the availability of each product in each store or warehouse. It allows various departments across the organization to track product availability and make data-driven decisions.

Domain: Supply Chain and Inventory Management
Domain Team: Warehouse Operations Team
Data Product Manager: Jane Doe
Data Steward: John Smith
Data Source(s): Warehouse Management Systems, In-store Point of Sales Systems

Technical Information:

Data Contract: An agreed format of inventory status messages including Product ID, Store ID, Current Quantity, Last Updated Timestamp, etc.
Data Platform: Apache Kafka (with Kafka Connect for sourcing data, and Kafka Streams or KSQL for processing it)
Data Frequency: Near real-time, with updates every time an inventory change event occurs (e.g., purchase, return, restocking)
Data Quality Metrics: Metrics to track the freshness of the data (time from event occurrence to availability in Kafka), accuracy (comparing against periodic physical inventory counts), etc.
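The data contract and the freshness metric above could be expressed in code along these lines. This is a sketch under assumptions: the JSON encoding, the snake_case field names, the `InventoryStatus` class, and both helper functions are hypothetical, since the catalog entry only names the fields.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InventoryStatus:
    """One inventory status message, following the agreed field list."""
    product_id: str
    store_id: str
    current_quantity: int
    last_updated: datetime


def parse_message(raw: str) -> InventoryStatus:
    """Parse one JSON message and validate it against the agreed fields."""
    data = json.loads(raw)
    quantity = data["current_quantity"]
    if not isinstance(quantity, int) or quantity < 0:
        raise ValueError(f"invalid quantity: {quantity!r}")
    return InventoryStatus(
        product_id=str(data["product_id"]),
        store_id=str(data["store_id"]),
        current_quantity=quantity,
        last_updated=datetime.fromisoformat(data["last_updated"]),
    )


def freshness_seconds(status: InventoryStatus, available_at: datetime) -> float:
    """Delay between the inventory event and its availability to consumers."""
    return (available_at - status.last_updated).total_seconds()


raw = json.dumps({
    "product_id": "SKU-42",
    "store_id": "STORE-7",
    "current_quantity": 18,
    "last_updated": "2023-08-21T10:00:00+00:00",
})
status = parse_message(raw)
seen_at = datetime(2023, 8, 21, 10, 0, 3, tzinfo=timezone.utc)
print(freshness_seconds(status, seen_at))  # 3.0
```

Rejecting malformed messages at the boundary keeps contract violations visible to the producing team instead of letting them surface later in a consumer's dashboard.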

Usage: This data product can be consumed by multiple teams across the organization. For example, the Sales Team for monitoring product availability, the Marketing Team for planning campaigns based on product availability, the Supply Chain Team for better inventory planning and so on.

Related Documentation: [Link to technical documentation about how to consume from Kafka], [Link to business documentation about the meaning of each field in the data contract]

Access and Security: All access to this data product will require appropriate authentication and authorization. Strict GDPR regulations and privacy laws will be adhered to, ensuring that no sensitive information is leaked.

Data Discoverability: This data product will be discoverable in the central self-serve data platform’s catalog.

Version: 1.0.0 (with clear documentation of what changes with each version)

Here, data is not only generated by daily operations, but it is also continuously streamed to Apache Kafka in near real-time. This streaming data is then made available to various teams for timely and data-driven decision-making.

The Future of Data Management with Data as a Product

As we traverse further into the era of digital transformation, the concept of Data as a Product emerges as a powerful paradigm. It represents a significant shift from the traditional, monolithic data management approach, granting organizations the ability to scale and adapt quickly in the data-centric business environment. By embodying a decentralized, product-oriented model, the data mesh architecture unlocks the potential to treat data as valuable, standalone products that serve specific business needs, are owned by domain teams, and are governed through self-serve data platforms.

With the application of data product thinking, your organization can embrace a more agile, robust, and efficient way of leveraging data. It paves the way for a future where every stakeholder can discover, understand, trust, and use data autonomously to drive actionable insights and impactful results.

Transitioning towards a Data as a Product mindset may require rethinking your current data strategies and structures. If you’re considering this shift, nexocode’s data engineering experts are ready to guide your journey. With deep experience in data product management and data mesh implementation, we can help you craft and execute a strategy tailored to your organization’s unique requirements.

To explore more about how your organization can benefit from this approach, contact nexocode’s data engineering experts. The future of data management is here, and it’s more promising than ever.

What is meant by "Data as a Product"?

"Data as a Product" is a concept where data is treated as a standalone, valuable asset rather than just an output of business operations. It requires the data to be self-describing, discoverable, secure, and trustworthy.

How does the Data as a Product approach benefit businesses?

This approach benefits businesses by making data more manageable, useful, and efficient. It promotes interoperability, domain orientation, self-serve access, and decentralized governance, making it easier for different teams to utilize the data.

What is the role of a Data Product Manager in a Data Mesh architecture?

In a Data Mesh architecture, a Data Product Manager acts as a bridge between data and domain experts, guiding the development and usage of data products. They are part of the domain team and have an intimate understanding of the product and its associated data.

About the author

Dorota Owczarek

AI Product Lead & Design Thinking Facilitator

With over ten years of professional experience in designing and developing software, Dorota is quick to recognize the best ways to serve users and stakeholders by shaping strategies and ensuring their execution by working closely with engineering and design teams.
She acts as a Product Leader, covering the ongoing AI agile development processes and operationalizing AI throughout the business.
